Davido 4mo ago

Throttle on 429 responses

Hi, I'm using a cheerio crawler and things are generally working well. Occasionally, though, I get a Cloudflare 429 page, which surfaces as an error in waitForSelector because the response is Cloudflare's page rather than the one I expected. Should Crawlee catch these responses and wait or slow down without intervention? For now I catch the error and manually pause the autoscaled pool for 10 seconds. Should I be tuning other knobs as well, or instead? I haven't configured maxRequestsPerMinute yet because I'm not sure how to find or tune a good value.
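
To make the manual workaround above concrete, here is a minimal, dependency-free sketch of deciding how long to pause after a 429. It is not Crawlee's built-in behavior. The helper names (`parseRetryAfterMs`, `throttleDelayMs`), the two-minute cap, and the doubling on repeated 429s are all assumptions for illustration; only the 10-second base pause comes from the post. It also assumes the server may send a standard `Retry-After` header, which is not guaranteed.

```typescript
// Hypothetical helpers, not part of Crawlee.
const BASE_DELAY_MS = 10_000;  // the 10 s pause mentioned in the post
const MAX_DELAY_MS = 120_000;  // assumed upper bound on any single pause

// Parse a Retry-After header, which is either a number of seconds or an HTTP date.
// Returns undefined when the header is missing or unparseable.
function parseRetryAfterMs(header: string | undefined, nowMs: number = Date.now()): number | undefined {
    if (header === undefined) return undefined;
    const trimmed = header.trim();
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
    const dateMs = Date.parse(trimmed);
    if (Number.isNaN(dateMs)) return undefined;
    return Math.max(0, dateMs - nowMs);
}

// Decide how long to pause after a response.
// consecutive429s counts 429 responses seen in a row, including this one.
function throttleDelayMs(
    statusCode: number,
    retryAfter: string | undefined,
    consecutive429s: number,
    nowMs: number = Date.now(),
): number {
    if (statusCode !== 429) return 0;
    // Prefer the server's own hint when it provides one.
    const fromHeader = parseRetryAfterMs(retryAfter, nowMs);
    if (fromHeader !== undefined) return Math.min(fromHeader, MAX_DELAY_MS);
    // Otherwise double the base pause for each additional consecutive 429.
    const exponent = Math.max(0, consecutive429s - 1);
    return Math.min(BASE_DELAY_MS * 2 ** exponent, MAX_DELAY_MS);
}
```

In a crawler, the returned delay could drive the same pause-and-resume step on the autoscaled pool described above, in place of a fixed 10 seconds. Setting maxRequestsPerMinute and a lower maximum concurrency would complement this by reducing how often the 429 appears in the first place.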
2 Replies
Hall 4mo ago
Someone will reply to you shortly. In the meantime, this might help:
lemurio 3mo ago
Scaling our crawlers | Crawlee for JavaScript · Build reliable cra...
Crawlee helps you build and maintain your crawlers. It's open source, but built by developers who scrape millions of pages every day for a living.
Anti-scraping protections | Academy | Apify Documentation
Understand the various anti-scraping measures different sites use to prevent bots from accessing them, and how to appear more human to fix these issues.
Bypassing Cloudflare browser check | Academy | Apify Documentation
Learn how to bypass Cloudflare browser challenge with Crawlee.
