conscious-sapphire•2y ago
Is anyone scraping Indeed with Apify and having Cloudflare captcha issues over the past two weeks?
14 Replies
Indeed now has much better bot protection. What used to work with Cheerio now needs Playwright with very good proxies.
It's not an Apify issue.
The bot protection is a lot better, especially the fingerprinting.
I'm also using other tools and facing the same issues.
continuing-cyan•2y ago
I'm using Playwright and Residential Proxies.
Try going headful, running under xvfb.
Try adding explicit waits so the page's scripts have finished loading.
The whole trick with captchas is to learn what triggers them and avoid those triggers as much as possible.
Just throwing residential proxies at the problem does not solve it.
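One way to learn what triggers the captcha is to tag every response as challenged or not and compare the challenge rate across scraper settings (headless vs. headful, with or without waits, and so on). Below is a minimal TypeScript sketch of that idea. The names `ResponseSnapshot`, `detectCloudflareChallenge` and `challengeRateBySetting` are hypothetical helpers, not part of Crawlee or the Apify SDK, and the "Just a moment" title check is an assumption about what the Cloudflare interstitial usually looks like.

```typescript
// Hypothetical helpers for illustration; not part of Crawlee or the Apify SDK.
interface ResponseSnapshot {
  url: string;
  status: number;
  headers: Record<string, string>; // header names are expected in lower case
  title: string; // text of the page <title>
}

type BlockReason = 'cf-mitigated-header' | 'challenge-title' | null;

function detectCloudflareChallenge(res: ResponseSnapshot): BlockReason {
  // Cloudflare marks challenge responses with a `cf-mitigated: challenge` header.
  const mitigated = (res.headers['cf-mitigated'] || '').toLowerCase();
  if (mitigated === 'challenge') return 'cf-mitigated-header';
  // Assumption: the interstitial page is usually titled "Just a moment...".
  if (/just a moment/i.test(res.title)) return 'challenge-title';
  return null;
}

// Fraction of challenged responses per scraper setting (e.g. "headless", "headful").
function challengeRateBySetting(
  runs: Array<{ setting: string; res: ResponseSnapshot }>,
): Map<string, number> {
  const totals = new Map<string, { blocked: number; total: number }>();
  for (const run of runs) {
    const entry = totals.get(run.setting) || { blocked: 0, total: 0 };
    entry.total += 1;
    if (detectCloudflareChallenge(run.res) !== null) entry.blocked += 1;
    totals.set(run.setting, entry);
  }
  const rates = new Map<string, number>();
  totals.forEach((entry, setting) => rates.set(setting, entry.blocked / entry.total));
  return rates;
}
```

Changing one setting at a time and watching which change moves the rate is what shows you the trigger. If the rate stays the same with and without a proxy, the proxy is probably not what is being detected.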
continuing-cyan•2y ago
The only way I've gotten it working locally right now is via puppeteer-real-browser. I'm not sure if that will work if I wrap it in Apify code and deploy it to the platform.
puppeteer-real-browser is just a collection of settings for Chrome.
Nothing magic happens.
Crawlee should do the same, no?
continuing-cyan•2y ago
That's what I was thinking, so I rewrote our code to use the latest Apify SDK and Crawlee, but no luck. So I started going down the rabbit hole of other potential solutions.
Can you share a URL that always gets a captcha no matter how many times you retry?
That URL really does not load for me in any automated browser, with or without proxies, so the proxy is not the issue. I did not try the real-browser plugin for Puppeteer; if that works locally, it should work on the platform too.
continuing-cyan•2y ago
Thanks for checking on that for me.
How do you integrate puppeteer-real-browser with Crawlee / Apify, @danimalweb & @NeoNomade?