conscious-sapphire · 17mo ago

AdaptivePlaywrightCrawler: programmatically deciding when to render JS

Using the new AdaptivePlaywrightCrawler, is it possible to programmatically decide when to render JS? For example, use HTTP crawling by default, but if some condition is met (say, finding the word 'captcha' in the loaded URL), switch to JS rendering and try to unblock the page. A related question, which I couldn't find answered in the docs: how does the AdaptivePlaywrightCrawler decide whether to render JS or not?
4 Replies
Alexey Udovydchenko
No official support atm, since the related classes are private https://github.com/apify/crawlee/blob/master/packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts#L143 so you are not expected to inherit from them and add your own logic. However, you can reuse or browse the current code, e.g. see how the prediction works based on a custom ratio https://github.com/apify/crawlee/blob/master/packages/playwright-crawler/src/internals/utils/rendering-type-prediction.ts#L32
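In the meantime, the condition from the question (HTTP first, browser only when the response looks blocked) can be kept as a plain function outside the crawler. A minimal sketch of that decision step follows. It is not Crawlee API and not how the adaptive crawler predicts rendering type: `chooseFetchMode`, `HttpResult`, `FetchMode`, the marker list, and the status codes are all hypothetical names and defaults picked for illustration.

```typescript
// Hypothetical helper, NOT part of Crawlee: given the result of a plain HTTP
// request, decide whether the URL should be retried in a real browser.
type FetchMode = 'http' | 'browser';

interface HttpResult {
    loadedUrl: string; // final URL after redirects
    statusCode: number;
    body: string;
}

// Assumption: these markers and status codes are illustrative defaults only.
const DEFAULT_MARKERS = ['captcha'];
const BLOCKED_STATUS_CODES = [403, 429];

function chooseFetchMode(result: HttpResult, markers: string[] = DEFAULT_MARKERS): FetchMode {
    // Search the loaded URL and the body together, case-insensitively.
    const haystack = (result.loadedUrl + '\n' + result.body).toLowerCase();
    const hasMarker = markers.some((marker) => haystack.indexOf(marker.toLowerCase()) !== -1);
    const blockedStatus = BLOCKED_STATUS_CODES.indexOf(result.statusCode) !== -1;
    return (hasMarker || blockedStatus) ? 'browser' : 'http';
}
```

One way to use it would be to call this from the request handler of an HTTP-only crawler and hand any URL that comes back as 'browser' to a separate browser-based crawler. That wiring is left out here and is untested.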
conscious-sapphire (OP) · 17mo ago
OK, thanks for the clarification. Any chance of adding this to the roadmap? With Scrapy, for example, it's easy to mix HTTP-only and JS rendering within the same crawler; it would be great to have the same in Crawlee.
Alexey Udovydchenko
No plans at the moment. Please open a GitHub issue with a feature request; if it becomes popular and attracts feedback from other users, the feature will be considered.
conscious-sapphire (OP) · 17mo ago
GitHub
AdaptivePlaywrightCrawler: programmatically deciding when to rend...
Which package is the feature request for? If unsure which one to select, leave blank @crawlee/playwright (PlaywrightCrawler) Feature Add the possibility to programmatically decide when to render JS...
