Crawlee Hybrid Crawler?
I've noticed that I often end up writing the exact same type of crawler: it first uses CheerioCrawler and then falls back to PlaywrightCrawler for failed requests. The only annoying part is the different syntax between the two ($ and load for Cheerio vs. page for Playwright). For code reuse purposes I end up writing a lot of code that looks like this:
Or like:
And it got me thinking: why doesn't Crawlee have a generalized crawler for this exact purpose? Something similar to your adaptive crawler, but less opaque. I can't tell why or when the adaptive crawler will use Cheerio. I want ALL requests to start on Cheerio, and only the failed ones (failed according to my own crawling logic, i.e. content I expect to be present in the page) to go to Playwright. Thanks!
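To make the requested behaviour concrete, here is a minimal, library-agnostic sketch of the routing: every URL starts on the cheap static path, and only URLs whose extraction check fails are retried on the rendered path. It deliberately uses no Crawlee APIs; `Fetcher`, `Extractor`, `HybridResult`, and `crawlHybrid` are hypothetical names invented for illustration, and the two fetchers are assumed to return the page HTML as a string.

```typescript
// Hypothetical types and names; none of these are Crawlee APIs.
type Fetcher = (url: string) => Promise<string>;
// Returns the extracted data, or null when the expected content is missing.
type Extractor<T> = (html: string) => T | null;

interface HybridResult<T> {
    url: string;
    data: T | null;
    via: 'static' | 'browser' | 'none';
}

async function crawlHybrid<T>(
    urls: string[],
    fetchStatic: Fetcher,   // cheap path, e.g. plain HTTP fetch parsed with Cheerio
    fetchRendered: Fetcher, // expensive path, e.g. HTML taken from a browser page
    extract: Extractor<T>,  // shared parsing logic, used on both paths
): Promise<HybridResult<T>[]> {
    const results: HybridResult<T>[] = [];
    const retry: string[] = [];

    // Pass 1: ALL URLs start on the static path.
    for (const url of urls) {
        let data: T | null = null;
        try {
            data = extract(await fetchStatic(url));
        } catch {
            // A fetch or parse error is treated like a failed check.
        }
        if (data !== null) results.push({ url, data, via: 'static' });
        else retry.push(url);
    }

    // Pass 2: only URLs that failed the check go to the rendered path.
    for (const url of retry) {
        let data: T | null = null;
        try {
            data = extract(await fetchRendered(url));
        } catch {
            // Still failing; reported below with via = 'none'.
        }
        results.push({ url, data, via: data !== null ? 'browser' : 'none' });
    }

    // Note: static successes come first, then the retried URLs.
    return results;
}
```

In Crawlee terms, the two passes would roughly correspond to a CheerioCrawler run followed by a PlaywrightCrawler run over the URLs that failed the check. Because a Playwright page can hand back its HTML via `page.content()`, the same string-based extractor can in principle serve both paths, which is the code reuse being asked for.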
2 Replies
I'm also tracking this in a GitHub feature request: https://github.com/apify/crawlee/issues/3155
Thanks for the idea. The ticket is already in the repo, so the team should take a look at it soon.