dead-brown
Apify & Crawlee • 2y ago
5 replies

Requests timing out - best practices?

Hello everyone! I'm trying to scrape a grocery store website and running into some difficulties. I'm using Playwright/Crawlee and running on the Apify platform. Any assistance would be greatly appreciated!

I have a huge number of URLs to use as starting points for my scrape, and I initiate the scrape with something like this (note: startUrls is an array containing several hundred URLs):

await crawler.run(startUrls);
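For reference, the crawler behind that call is constructed roughly like this (a simplified sketch, not my exact code; the option values are illustrative, though the option names are the documented Crawlee ones and match the timeouts in the warnings below):

```javascript
// Sketch of the crawler options involved in the timeout warnings below.
// Values here are illustrative, not a recommendation.
const crawlerOptions = {
    // governs the "requestHandler timed out after 30 seconds" warning
    requestHandlerTimeoutSecs: 120,
    // governs the "Navigation timed out after 60 seconds" warning
    navigationTimeoutSecs: 90,
    // how many times a failed request is reclaimed before being marked failed
    maxRequestRetries: 5,
};

// With Crawlee installed, the crawler itself would be built roughly like:
// const crawler = new PlaywrightCrawler({ requestHandler: router, ...crawlerOptions });
// await crawler.run(startUrls);
console.log(crawlerOptions);
```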


Then, in my router.addDefaultHandler callback, I scroll through each page and enqueue more links. So what I'm trying to do is quite extensive, and I expect the scrape to take many hours.
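That enqueueing step looks roughly like this (a simplified sketch; the normalizeLinks helper and the selector are illustrative stand-ins, not my exact code):

```javascript
// Pure helper: resolve relative hrefs against the page URL and drop duplicates,
// so the same product page isn't enqueued twice. (Illustrative stand-in.)
function normalizeLinks(baseUrl, hrefs) {
    const seen = new Set();
    for (const href of hrefs) {
        try {
            seen.add(new URL(href, baseUrl).href);
        } catch {
            // skip malformed hrefs
        }
    }
    return [...seen];
}

// With Crawlee installed, the handler itself would look roughly like:
// router.addDefaultHandler(async ({ page, enqueueLinks }) => {
//     await playwrightUtils.infiniteScroll(page); // scroll to load lazy content
//     await enqueueLinks({ selector: 'a' });      // enqueue discovered links
// });

console.log(normalizeLinks('https://example.com/aisle', ['/p/1', 'p/2', '/p/1']));
// → [ 'https://example.com/p/1', 'https://example.com/p/2' ]
```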

When I run my scraper, it works well up to a point, but then I start getting more and more errors like:

- PlaywrightCrawler: Reclaiming failed request back to the list or queue. requestHandler timed out after 30 seconds.
- WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds.
- PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.goto: net::ERR_SOCKET_NOT_CONNECTED


And eventually, the entire thing grinds to a halt with something like:

2024-05-05T14:49:15.152Z /home/myuser/node_modules/playwright-core/lib/server/chromium/crPage.js:492
2024-05-05T14:49:15.156Z     this._firstNonInitialNavigationCommittedReject(new Error('Page closed'));
2024-05-05T14:49:15.158Z                                                    ^
2024-05-05T14:49:15.160Z
2024-05-05T14:49:15.162Z Error: Page closed
2024-05-05T14:49:15.164Z     at FrameSession.dispose (/home/myuser/node_modules/playwright-core/lib/server/chromium/crPage.js:492:52)


[To be continued...]