extended-salmon · 2y ago

PlaywrightCrawler runs into timeout on Apify but works locally

I have a crawler that performs very simple GET requests to a well-known marketplace website. One crawler instance sends only 2-3 requests; after that I set up a new crawler with a different configuration. Locally my crawler works just fine, but when I run it on Apify I get the following error for most of the requests:

    ERROR PlaywrightCrawler: Request failed and reached maximum retries. page.goto: Timeout 10000ms exceeded.

I only see this behaviour on one specific website; the other websites I crawl work fine. So I assume I'm running into a blocking mechanism. However, I'm already using residential proxies:

    proxyConfiguration = await Actor.createProxyConfiguration({
        groups: ['RESIDENTIAL'],
        countryCode: 'DE',
    });

Interestingly, if I mark the session as bad after a timeout and try again, it seems to work. But this approach is extremely slow and cumbersome, since I have to reset the session after pretty much every request. Any ideas?
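For readers trying the session workaround described above, here is a minimal sketch of how it might be automated rather than done by hand. It assumes Crawlee's `errorHandler`, `useSessionPool`, `persistCookiesPerSession`, `navigationTimeoutSecs` and `maxRequestRetries` crawler options and the session's `retire()` method. The helper names `isNavigationTimeout` and `buildCrawlerOptions`, the 30-second timeout and the retry count are illustrative assumptions, not values from the thread.

```javascript
// Assumption: the timeout surfaces with the message quoted in the question,
// e.g. "page.goto: Timeout 10000ms exceeded."
function isNavigationTimeout(error) {
    const message = error instanceof Error ? error.message : String(error);
    return /page\.goto: Timeout \d+ms exceeded/.test(message);
}

// Hypothetical helper: returns a plain options object meant to be passed to
// `new PlaywrightCrawler(options)` together with your own requestHandler.
function buildCrawlerOptions(proxyConfiguration) {
    return {
        proxyConfiguration,
        useSessionPool: true,
        persistCookiesPerSession: true,
        navigationTimeoutSecs: 30, // assumed value; the question used 10 s
        maxRequestRetries: 5,      // assumed value
        // Called when a request fails and is about to be retried.
        // Retiring the session makes the retry use a fresh session.
        errorHandler: async ({ session }, error) => {
            if (session && isNavigationTimeout(error)) {
                session.retire();
            }
        },
    };
}
```

This only rotates the session when the failure looks like a navigation timeout, so other errors keep their session. Whether it is faster than marking every session bad depends on how the site blocks requests.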
5 Replies
HonzaS · 2y ago
Try headful mode, try Puppeteer with the stealth plugin, or try Playwright with Firefox. You can also try buying proxies from another provider; they might work better.
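As a rough sketch of the headful-Firefox suggestion: Crawlee's `launchContext` option accepts a `launcher` and `launchOptions`. The helper name `buildHeadfulFirefoxOptions` is hypothetical, and the launcher is passed in as an argument so the snippet stays self-contained. In a real project it would be `firefox` imported from the `playwright` package.

```javascript
// Hypothetical helper: builds the launch-related part of the crawler options.
// `firefoxLauncher` is expected to be `firefox` from the `playwright` package.
function buildHeadfulFirefoxOptions(firefoxLauncher) {
    return {
        launchContext: {
            // Use Firefox instead of the default Chromium.
            launcher: firefoxLauncher,
            // Run with a visible browser window (headful mode).
            launchOptions: { headless: false },
        },
    };
}

// Intended usage (not executed here):
//   import { PlaywrightCrawler } from 'crawlee';
//   import { firefox } from 'playwright';
//   const crawler = new PlaywrightCrawler({
//       ...buildHeadfulFirefoxOptions(firefox),
//       requestHandler: async ({ page }) => { /* ... */ },
//   });
```

Headful mode needs a display to be available in the environment where the crawler runs, so check that the runtime image supports it.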
Lukas Krivka · 2y ago
Can you share the URL? Some websites have very aggressive challenges where the browser fingerprint we use might not match the Apify hardware. Our team would look into that.
Lukas Krivka · 2y ago
Anti-scraping protections | Academy | Apify Documentation
Understand the various anti-scraping measures different sites use to prevent bots from accessing them, and how to appear more human to fix these issues.
HonzaS · 2y ago
For example, https://www.g2.com/products/salesforce-salesforce-sales-cloud/reviews works in a regular browser but does not work in an automated browser.
Lukas Krivka · 2y ago
Yeah, we are aware of this one. We were able to get through locally but not on the platform.