wise-white · 2y ago

How to stop URLs from being skipped due to error handling

When I am scraping product data from product URLs, I sometimes check whether a tag is available and fall back to a different tag if it isn't. If a tag simply isn't found, I don't want the crawler to throw a full error and skip scraping and saving the rest of the data for that page. How do I avoid this "skipping" by overriding or changing the crawler's default behavior? I have even tried try/catch statements and if/else statements to handle a product tag not being found, and nothing works.
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '');
let originalPrice = salePrice;

if (newTag) {
  originalPrice = newTag;
} else {
  return;
}
2 Replies
Lukas Krivka · 2y ago
I recommend using the $ cheerio parser with https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlingContext#parseWithCheerio for nicer parsing API. Otherwise, you can do something like
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '').catch(() => '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '').catch(() => '');

const originalPrice = newTag || salePrice || '';
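A runnable sketch of that .catch fallback pattern. Since page.$eval rejects its promise when no element matches the selector, chaining .catch(() => '') converts the rejection into an empty string instead of crashing the request handler. The fakePage stand-in below (and the scrapePrices helper name) are illustrative only, used here in place of a real Puppeteer page:

```javascript
// Stand-in for a Puppeteer page: $eval resolves for a known selector
// and rejects for anything else, mimicking "element not found".
const fakePage = {
  $eval: (selector, fn) =>
    selector === 'span.price-value'
      ? Promise.resolve(fn({ textContent: ' $19.99 ' }))
      : Promise.reject(new Error(`no element matches ${selector}`)),
};

async function scrapePrices(page) {
  // A missing selector rejects; .catch(() => '') swallows the error
  // so the rest of the handler still runs and saves its data.
  const salePrice = await page
    .$eval('span.price-value', (el) => el.textContent?.trim() || '')
    .catch(() => '');
  const newTag = await page
    .$eval('span.price-ns', (el) => el.textContent?.trim() || '')
    .catch(() => '');

  // Prefer the "no sale" tag when present, otherwise fall back.
  return { salePrice, originalPrice: newTag || salePrice || '' };
}

scrapePrices(fakePage).then(console.log);
```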
wise-white (OP) · 2y ago
Thank you, the error catching chained onto the scraping call finally worked, thanks again!
