rare-sapphire•2y ago
Mysterious retryOnBlocked property
Actually this should be a great thing!
https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlerOptions#retryOnBlocked
If set to true, the crawler will automatically try to bypass any detected bot protection. Currently supports: Cloudflare Bot Management, Google Search Rate Limiting.
Can we have some information about ... how to use this thing? Any prerequisites? Side effects? Does it need any special settings in PlaywrightCrawler? Example: I have maxRequestRetries=0 - is it OK to use retryOnBlocked in such a case?
1 Reply
You just need to turn it on and it will check for these. It will probably be upgraded in the future:
https://github.com/apify/crawlee/blob/master/packages/utils/src/internals/blocked.ts
maxRequestRetries=0 can be used, but in that case it will just fail completely.
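For anyone landing here later, a minimal sketch of the two options discussed above, written as a plain object so it runs on its own. The name crawlerOptions and the value 3 are illustrative assumptions, not something from the Crawlee docs or the reply; only retryOnBlocked and maxRequestRetries come from this thread.

```typescript
// Sketch only: the two PlaywrightCrawler options discussed in this thread.
// `crawlerOptions` is a hypothetical name; in a real project you would pass
// these fields to `new PlaywrightCrawler({ ...crawlerOptions, requestHandler })`,
// with PlaywrightCrawler imported from 'crawlee'.
const crawlerOptions = {
    // Turn on the built-in blocked-page checks quoted from the docs above.
    retryOnBlocked: true,
    // Per the reply above, 0 is allowed but the request then just fails,
    // so this sketch keeps a non-zero value (3 is an illustrative choice).
    maxRequestRetries: 3,
};

console.log(crawlerOptions);
```

If you really need maxRequestRetries=0 for other reasons, the reply suggests retryOnBlocked will not help much in that setup, so it may be worth testing both settings against a page you know gets blocked.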