Apify Discord Mirror

Updated last year

how to stop skipping of urls due to error handling

At a glance

The community member is scraping product data from URLs and encountering issues where certain tags are not found, causing the entire scraping process to fail. They have tried using try-catch statements and if-else statements to handle missing tags, but these approaches have not worked. Another community member recommends using the Cheerio parser with the Crawlee library for a nicer parsing API, and provides an example of how to handle missing tags using the .catch(() => '') approach. The original community member confirms that the error catching built into the scraping tags finally worked for them.

When I am scraping product data from product URLs, I want to check whether a tag is available and, if it is not, use a different tag. If a tag simply isn't found, I don't want the crawler to throw a full error for that one missing element and fail to scrape and save the rest of the data.
How do I avoid this "skipping" by overriding or changing the default behavior of the crawler?

I have even tried try-catch and if-else statements to handle a tag not being found, and nothing works.
Plain Text
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '');
let originalPrice = salePrice;

if (newTag) {
  originalPrice = newTag;
} else {
  return;
}
3 comments
I recommend using the Cheerio $ parser via parseWithCheerio (https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlingContext#parseWithCheerio) for a nicer parsing API.
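
A minimal sketch of the Cheerio route, assuming the request handler has already obtained $ with `const $ = await parseWithCheerio()`. In Cheerio, a selector that matches nothing yields an empty selection instead of throwing, and .text() on it returns an empty string, so a missing tag does not abort the handler. The helper name pickPrices is hypothetical; the selectors are the ones from the question.

```javascript
// Hypothetical helper: takes a Cheerio `$` function, for example the one
// returned by `await parseWithCheerio()` inside a Crawlee requestHandler.
function pickPrices($) {
    // An unmatched selector gives an empty selection, so .text() is ''
    // here rather than an exception.
    const salePrice = $('span.price-value').first().text().trim();
    const newTag = $('span.price-ns').first().text().trim();

    // Prefer the alternate tag when it has text, otherwise fall back
    // to the sale price (which may itself be '').
    const originalPrice = newTag || salePrice;
    return { salePrice, originalPrice };
}
```

Inside the handler this could be called as `const { salePrice, originalPrice } = pickPrices(await parseWithCheerio());`, and the remaining product fields can still be saved when either tag is absent.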

Otherwise, you can do something like this:
Plain Text
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '').catch(() => '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '').catch(() => '');

const originalPrice = newTag || salePrice || '';
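
The same pattern can be wrapped in a small helper so that every optional field is read the same way. This is only a sketch: textOrEmpty and readPrices are hypothetical names, and page is assumed to be a Puppeteer page, whose $eval rejects when the selector matches nothing. Note also that the `return` in the `else` branch of the original snippet ends the handler early whenever span.price-ns has no text, which by itself may explain why those pages appeared to be skipped.

```javascript
// Hypothetical helper: `page` is anything exposing Puppeteer's
// page.$eval(selector, fn), which rejects when no element matches.
async function textOrEmpty(page, selector) {
    return page
        .$eval(selector, (el) => el.textContent?.trim() || '')
        // Turn the "element not found" rejection into an empty string.
        .catch(() => '');
}

async function readPrices(page) {
    const salePrice = await textOrEmpty(page, 'span.price-value');
    const newTag = await textOrEmpty(page, 'span.price-ns');

    // No early return: a missing tag only leaves that field empty.
    return { salePrice, originalPrice: newTag || salePrice || '' };
}
```

With this shape, the handler can go on to save the rest of the product data regardless of which price tags were present.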
Thank you, the error catching built into the scraping tag finally worked. Thanks again!