exotic-emerald•17mo ago
Requests timing out - best practices?
Hello everyone! I am trying to scrape a grocery store website and I'm running into some difficulties. I'm using Playwright/Crawlee and running on the Apify platform. Any assistance would be greatly appreciated!
I have a huge number of URLs to use as starting points for my scrape, and I am initiating the scrape with something like this (note: startUrls is an array containing several hundred URLs):
Then, in each callback of router.addDefaultHandler, I scroll further through each page, enqueuing more links. So what I'm trying to do is quite extensive, and I expect the scrape to take many hours.
When I run my scraper, it works well up to a point, but then I start getting more and more errors like:
And eventually, the entire thing grinds to a halt with something like:
[To be continued...]
5 Replies
exotic-emeraldOP•17mo ago
[Continued from above]
I'm wondering if there are some best practices I'm missing here.
It seems like I'm being throttled by the website? I tried to change my proxy to residential (which I do have a subscription for) and it does not seem to help, unfortunately. I'm reproducing the code below, in case I'm doing something wrong.
It looks like the request handler is timing out. Try increasing the timeout using requestHandlerTimeoutSecs.
You can also take a look at infiniteScroll; it might be helpful in your case.
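To make the two suggestions above concrete, here is a minimal sketch of how they might fit together. It is not the original poster's code: the numeric values, the `crawlerOptions` and `scrollOptions` names, and the `runCrawler` helper are all assumptions made for illustration, and the bare `enqueueLinks()` call would need a selector or glob that matches the real site.

```javascript
// Sketch only: the numbers below are illustrative assumptions, not tuned settings.
const crawlerOptions = {
    // Upper bound for one run of the request handler (scrolling plus enqueuing).
    requestHandlerTimeoutSecs: 300,
    // Separate upper bound for the initial page navigation.
    navigationTimeoutSecs: 60,
    // Fewer parallel pages is gentler on a site that may be throttling.
    maxConcurrency: 5,
};

// Options passed to infiniteScroll. timeoutSecs is kept below the handler
// timeout so that scrolling stops before the handler itself is aborted.
const scrollOptions = { timeoutSecs: 120, waitForSecs: 4 };

// Hypothetical helper that wires the settings into a PlaywrightCrawler.
async function runCrawler(startUrls) {
    // Imported lazily so the settings above can be inspected without loading Crawlee.
    const { PlaywrightCrawler, playwrightUtils } = await import('crawlee');

    const crawler = new PlaywrightCrawler({
        ...crawlerOptions,
        async requestHandler({ page, enqueueLinks }) {
            // Scroll until no new content loads or scrollOptions.timeoutSecs elapses.
            await playwrightUtils.infiniteScroll(page, scrollOptions);
            // Enqueue the links found on the fully scrolled page.
            await enqueueLinks();
        },
    });

    await crawler.run(startUrls);
}
```

Calling `await runCrawler(startUrls)` would start the crawl. The point of the design is that the scroll timeout and the handler timeout are set together, so a long infinite scroll ends on its own terms instead of being cut off by the handler timeout.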
playwrightUtils | API | Crawlee
exotic-emeraldOP•16mo ago
I appreciate that, but why would 30 seconds not be enough to load a basic webpage? I am afraid some kind of throttling is going on.
It could be many factors. Maybe try switching to datacenter proxies to see if that helps.