conscious-sapphire•17mo ago
AdaptivePlaywrightCrawler: programmatically deciding when to render JS
Using the new adaptive Playwright crawler, is it possible to programmatically decide when to render JS?
For example, use HTTP crawling by default, but if some condition is met (for example, finding the word 'captcha' in the loaded URL), switch to JS rendering and try to unblock the page.
A related question, for which I didn't find an answer in the docs: how does the AdaptivePlaywrightCrawler decide whether to render JS or not?
4 Replies
No official support at the moment, since the related classes are private (https://github.com/apify/crawlee/blob/master/packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts#L143), so you are not expected to inherit from them and add your own logic. However, you can reuse or browse the current code, e.g. see how the prediction works based on a custom ratio: https://github.com/apify/crawlee/blob/master/packages/playwright-crawler/src/internals/utils/rendering-type-prediction.ts#L32
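Until something like this is supported, the decision itself can live in your own code. Below is a minimal sketch of the condition described in the question, not Crawlee's implementation: the names `HttpResult`, `RenderingChoice`, `BLOCK_MARKERS` and `chooseRendering` are all hypothetical, and the marker list and status codes are assumptions you would tune for the sites you crawl.

```typescript
// Hypothetical helper, not part of Crawlee: decides from a plain HTTP
// response whether the page should be retried in a real browser.
interface HttpResult {
    url: string;        // final URL after redirects
    statusCode: number; // HTTP status of the plain HTTP request
    body: string;       // raw HTML returned by the plain HTTP request
}

type RenderingChoice = 'http' | 'browser';

// Assumed block markers; adjust for the sites being crawled.
const BLOCK_MARKERS = ['captcha', 'enable javascript'];

function chooseRendering(result: HttpResult): RenderingChoice {
    // Search the URL and the body together, case-insensitively.
    const haystack = `${result.url}\n${result.body}`.toLowerCase();
    const looksBlocked = BLOCK_MARKERS.some((marker) => haystack.includes(marker));
    // Assumption: 403 and 429 are treated as signs of an anti-bot layer.
    const blockedStatus = result.statusCode === 403 || result.statusCode === 429;
    return looksBlocked || blockedStatus ? 'browser' : 'http';
}

// Example: a redirect to a captcha page is flagged for browser rendering.
const choice = chooseRendering({
    url: 'https://example.com/captcha?next=/products',
    statusCode: 200,
    body: '<html><body>Please verify you are human</body></html>',
});
console.log(choice);
```

One way to use this, as an assumption rather than an official pattern, is to call it from the request handler of an HTTP-based crawler and hand the URLs it flags as 'browser' to a separate browser-based crawler; that wiring is not shown here.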
conscious-sapphireOP•17mo ago
OK, thanks for the clarification. Any chance of adding this to the roadmap? For example, with Scrapy it's easy to mix HTTP-only and JS rendering within the same crawler; it would be great to have the same in Crawlee.
No plans at the moment. Please open a GitHub issue with a feature request; if it becomes popular and attracts feedback from other users, the feature will be considered.
conscious-sapphireOP•17mo ago
Got it, here is the issue https://github.com/apify/crawlee/issues/2446