conscious-sapphire•2y ago

Is anyone scraping Indeed with Apify and having Cloudflare captcha issues over the past two weeks?
14 Replies
HonzaS•2y ago
Indeed now has much better bot protection. What used to work with Cheerio now needs Playwright with very good proxies.
NeoNomade•2y ago
It's not an Apify problem. Bot protection has become a lot better, especially fingerprinting. I'm also using other tools and facing the same issues.
continuing-cyan•2y ago
I'm using Playwright and residential proxies.
NeoNomade•2y ago
Try running headful under Xvfb, and use targeted waits so the page's scripts have finished loading. The whole trick with captchas is to learn what triggers them and avoid it as much as possible. Just throwing residential proxies at the problem doesn't solve it.
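A minimal sketch of that advice in Crawlee, assuming Crawlee v3 with the `playwright` package installed: a headful `PlaywrightCrawler` behind a proxy that waits for the page's scripts before scraping. `PROXY_URLS` and `CONTENT_SELECTOR` are hypothetical placeholders, and this is not a confirmed way past Indeed's captcha.

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// Hypothetical placeholder: substitute your own residential proxy endpoint.
const PROXY_URLS = ['http://username:password@residential-proxy.example.com:8000'];

// Hypothetical placeholder: an element that only appears once the page's scripts have rendered it.
const CONTENT_SELECTOR = '#main-content';

export function buildCrawler(): PlaywrightCrawler {
    const proxyConfiguration = new ProxyConfiguration({ proxyUrls: PROXY_URLS });

    return new PlaywrightCrawler({
        proxyConfiguration,
        // Launch a visible (headful) browser. On a Linux server without a display,
        // run the script under Xvfb, e.g. `xvfb-run -a node main.js`.
        headless: false,
        // Crawlee injects generated browser fingerprints by default; stated explicitly here.
        browserPoolOptions: { useFingerprints: true },
        async requestHandler({ page, request, log }) {
            // Wait until network activity settles so the page's own scripts have loaded.
            await page.waitForLoadState('networkidle');
            // Then wait for a specific element rather than sleeping for a fixed time.
            await page.waitForSelector(CONTENT_SELECTOR, { timeout: 30_000 });
            log.info(`Loaded ${request.url}: ${await page.title()}`);
        },
    });
}
```

Usage would be something like `await buildCrawler().run(['https://www.indeed.com/'])`, started on a server via `xvfb-run -a node main.js` so the headful browser has a virtual display. Waiting on a concrete element is usually more robust than fixed sleeps, but which waits actually avoid the captcha has to be found by experiment.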
continuing-cyan•2y ago
The only way I've gotten it working locally so far is with puppeteer-real-browser. I'm not sure whether it will still work if I wrap it in Apify code and deploy it to the platform.
NeoNomade•2y ago
puppeteer-real-browser is just a collection of settings for Chrome; nothing magic happens.
HonzaS•2y ago
Crawlee should do the same, no?
continuing-cyan•2y ago
That's what I was thinking, so I rewrote our code to use the latest Apify SDK and Crawlee, but no luck. So I started going down the rabbit hole with other potential solutions.
HonzaS•2y ago
Can you share a URL that always gets a captcha no matter how many times you retry?
HonzaS•2y ago
That URL really does not load for me in any automated browser, with or without proxies, so the proxy is not the issue. I did not try the real-browser plugin for Puppeteer; if it works locally, it should work on the platform too.
continuing-cyan•2y ago
Thanks for checking on that for me.
Louis Deconinck•9mo ago
How do you integrate puppeteer-real-browser with Crawlee / Apify, @danimalweb & @NeoNomade?
