yelping-magenta, 2y ago

enqueueLinks works without options but with options it does not work

Hi everyone, I was trying, just for fun, to crawl the idealista page for some homes. While doing so I noticed that enqueueLinks() works without options, but as soon as I add a label and a selector it does not select anything and stops. Any idea what it could be? Start URL example: https://www.idealista.com/venta-viviendas/barcelona/sant-marti/
requestHandler: async ({ page, log, request, enqueueLinks }) => {
    if (request.label === 'LIST') {
        await sleep(5000);
    } else if (request.label === 'HOME') {
        await sleep(5000);
    } else {
        log.info(`Processing: ${request.url}`);
        // Check if the cookie consent banner is there.
        await page.waitForSelector('#didomi-notice-agree-button');
        const cookieAcceptButton = await page.$('#didomi-notice-agree-button');
        if (cookieAcceptButton) {
            await page.click('#didomi-notice-agree-button');
        }

        // Check if there are more pages and queue them up.
        await page.waitForLoadState('networkidle');
        await page.waitForSelector('li.next');
        const nextButton = await page.$('li.next');
        if (nextButton) {
            await enqueueLinks({
                selector: 'li.next',
                strategy: 'all',
                label: 'LIST',
            });
        }
    }
},
1 Reply
Alexey Udovydchenko
Your label logic is probably not correct. Also keep in mind that the actor might face anti-bot protection, so the actual content may not be the same as what you see in your real browser. Try https://crawlee.dev/api/puppeteer-crawler/namespace/puppeteerUtils#saveSnapshot and check the HTML content of the page.
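
To make both points concrete, here is a minimal sketch rather than a verified fix. It assumes the "next" link is an `<a>` nested inside `li.next` (enqueueLinks collects `href` attributes from the matched elements, and an `<li>` has none), and it routes requests labelled 'LIST' through the same pagination logic instead of only sleeping. The helper names `acceptCookiesIfPresent`, `logReceivedPage`, `handleListPage` and `routeRequest` are hypothetical; each takes the same context object that Crawlee passes to requestHandler. The selectors are copied from the question and have not been checked against the live site.

```javascript
// Dismiss the cookie banner if it is present, without failing when it is not.
async function acceptCookiesIfPresent(page) {
    const button = await page.$('#didomi-notice-agree-button');
    if (button) {
        await button.click();
        return true;
    }
    return false;
}

// Log what the crawler actually received, to help spot an anti-bot or
// captcha page that differs from what a real browser shows.
async function logReceivedPage({ page, request, log }) {
    const title = await page.title();
    const html = await page.content();
    log.info(`${request.url} -> title "${title}", ${html.length} chars of HTML`);
    return html;
}

// Handle the start URL (no label) and the paginated 'LIST' pages alike.
async function handleListPage(context) {
    const { page, enqueueLinks } = context;
    await logReceivedPage(context);
    await acceptCookiesIfPresent(page);

    // Assumption: the pagination link is an <a> inside li.next.
    await enqueueLinks({ selector: 'li.next a', label: 'LIST' });
}

// Route by label so that 'LIST' pages keep paginating.
async function routeRequest(context) {
    if (context.request.label === 'HOME') {
        // Detail-page extraction would go here.
        return 'HOME';
    }
    await handleListPage(context);
    return 'LIST';
}
```

With this in place, the crawler option would reduce to `requestHandler: (context) => routeRequest(context)`. If the logged title or HTML length looks like a block page, the selector is not the problem and the anti-bot protection is.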
