wise-white · 2y ago

How to stop URLs from being skipped due to error handling

When I am scraping product data from product URLs, I sometimes check whether a tag is available and fall back to a different tag if it isn't. If a tag simply isn't found, I don't want the crawler to throw a full error and skip scraping and saving the rest of the data for that page. How do I avoid this "skipping" by overriding or changing the crawler's default behavior? I have even tried try/catch statements and if/else statements to handle a product tag not being found, and nothing works.
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '');
let originalPrice = salePrice;

if (newTag) {
  originalPrice = newTag;
} else {
  return;
}
2 Replies
Lukas Krivka · 2y ago
I recommend using the $ cheerio parser with https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlingContext#parseWithCheerio for nicer parsing API. Otherwise, you can do something like
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '').catch(() => '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '').catch(() => '');

const originalPrice = newTag || salePrice || '';
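A runnable sketch of that .catch fallback pattern. Since page.$eval rejects its promise when no element matches the selector, chaining .catch(() => '') converts the rejection into an empty string instead of crashing the request handler. The fakePage stand-in below (and the scrapePrices helper name) are illustrative only, used here in place of a real Puppeteer page:

```javascript
// Stand-in for a Puppeteer page: $eval resolves for a known selector
// and rejects for anything else, mimicking "element not found".
const fakePage = {
  $eval: (selector, fn) =>
    selector === 'span.price-value'
      ? Promise.resolve(fn({ textContent: ' $19.99 ' }))
      : Promise.reject(new Error(`no element matches ${selector}`)),
};

async function scrapePrices(page) {
  // A missing selector rejects; .catch(() => '') swallows the error
  // so the rest of the handler still runs and saves its data.
  const salePrice = await page
    .$eval('span.price-value', (el) => el.textContent?.trim() || '')
    .catch(() => '');
  const newTag = await page
    .$eval('span.price-ns', (el) => el.textContent?.trim() || '')
    .catch(() => '');

  // Prefer the "no sale" tag when present, otherwise fall back.
  return { salePrice, originalPrice: newTag || salePrice || '' };
}

scrapePrices(fakePage).then(console.log);
```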
wise-white (OP) · 2y ago
Thank you, the error catching chained onto the scraping call finally worked, thanks again!
