rare-sapphire, 2y ago

Mysterious retryOnBlocked property

This looks like a great feature: https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlerOptions#retryOnBlocked
The docs say: "If set to true, the crawler will automatically try to bypass any detected bot protection. Currently supports: Cloudflare Bot Management, Google Search Rate Limiting."
Can we have some information about how to use it? Are there any prerequisites or side effects? Does it need any special settings in PlaywrightCrawler? For example, I have maxRequestRetries=0. Is it OK to use retryOnBlocked in that case?
1 Reply
Lukas Krivka, 2y ago
You just need to turn it on and it will check for these protections. It will probably be upgraded in the future. The detection code is here: https://github.com/apify/crawlee/blob/master/packages/utils/src/internals/blocked.ts
maxRequestRetries=0 can be used, but in that case it will just fail completely.
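For anyone landing here later, a minimal sketch of what "just turn it on" could look like. The two option names (retryOnBlocked, maxRequestRetries) come from the Crawlee docs linked above; the values, the `blockAwareOptions` name, and the `totalAttempts` helper are my own illustration and not part of Crawlee. The crawler wiring is left in comments so the snippet runs without Crawlee installed.

```typescript
// Plain options object. Both keys are documented Crawlee crawler options;
// the values are illustrative assumptions, not recommendations.
const blockAwareOptions = {
    // Ask the crawler to check for the supported bot-protection pages.
    retryOnBlocked: true,
    // Leave room for retries. According to the reply above, with 0 it
    // "will just fail completely".
    maxRequestRetries: 3,
};

// Hypothetical helper (not a Crawlee API): the first try plus the allowed retries.
function totalAttempts(maxRequestRetries: number): number {
    return 1 + Math.max(0, maxRequestRetries);
}

console.log(totalAttempts(blockAwareOptions.maxRequestRetries)); // prints 4
console.log(totalAttempts(0)); // prints 1, so there is no second chance

// With Crawlee installed, the object would be spread into the constructor:
//   import { PlaywrightCrawler } from 'crawlee';
//   const crawler = new PlaywrightCrawler({
//       ...blockAwareOptions,
//       requestHandler: async ({ page, log }) => {
//           log.info(await page.title());
//       },
//   });
//   await crawler.run(['https://example.com']);
```

The point of the helper is only to make the maxRequestRetries=0 case visible: a single attempt, with nothing left over once that attempt fails.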
