Apify Discord Mirror

Updated last year

Mysterious retryOnBlocked property

At a glance

Community members are discussing the retryOnBlocked option of Crawlee's PlaywrightCrawler. When set to true, the option makes the crawler automatically try to bypass any detected bot protection; it currently supports Cloudflare Bot Management and Google Search Rate Limiting. The original poster asks how to use the feature, whether it has prerequisites or side effects, and whether it can be combined with maxRequestRetries=0. One community member replies that you only need to turn the option on, and that it will probably be upgraded in the future. They add that maxRequestRetries=0 can be used, but that in that case it will just fail completely.

Actually this should be a great thing!

https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlerOptions#retryOnBlocked

If set to true, the crawler will automatically try to bypass any detected bot protection.
Currently supports:
Cloudflare Bot Management
Google Search Rate Limiting

Can we have some information about how to use this thing?
Any prerequisites? Side effects?
Does it need any special settings in PlaywrightCrawler?
Example: I have maxRequestRetries=0. Is it OK to use retryOnBlocked in that case?
1 comment
You just need to turn it on and it will check for these. It will probably be upgraded in the future.
https://github.com/apify/crawlee/blob/master/packages/utils/src/internals/blocked.ts

maxRequestRetries=0 can be used, but in that case a blocked request will just fail completely.
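For reference, here is a minimal sketch of what "turning it on" might look like, based only on the option names mentioned in this thread. The value 3 for maxRequestRetries, the crawlerOptions name, and the example.com URL are illustrative assumptions, not recommendations from the reply. The actual crawler construction is shown in comments so the snippet stays dependency-free.

```typescript
// Options sketch for Crawlee's PlaywrightCrawler.
// Only `retryOnBlocked` and `maxRequestRetries` come from the thread;
// the concrete values and the variable name are assumptions for illustration.
const crawlerOptions = {
    // Let the crawler try to get past detected bot protection
    // (per the docs quoted above: Cloudflare Bot Management, Google Search Rate Limiting).
    retryOnBlocked: true,

    // Keep a nonzero retry budget. Per the reply above, maxRequestRetries=0
    // is allowed, but a blocked request then just fails.
    maxRequestRetries: 3,
};

// Usage (requires the `crawlee` package and Playwright to be installed):
//
//   import { PlaywrightCrawler } from 'crawlee';
//
//   const crawler = new PlaywrightCrawler({
//       ...crawlerOptions,
//       requestHandler: async ({ request, log }) => {
//           log.info(`Loaded ${request.url}`);
//       },
//   });
//
//   await crawler.run(['https://example.com']); // placeholder URL
```

No other PlaywrightCrawler settings are mentioned in the reply as prerequisites, so the sketch deliberately leaves everything else at its defaults.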