extended-salmon•2y ago
PlaywrightCrawler runs into timeout on Apify but works locally
I have a crawler that performs very simple GET requests to a well-known marketplace website. One crawler instance only sends 2-3 requests; after that, I set up a new crawler with a different configuration. Locally, my crawler works just fine, but if I run it on Apify I get the following error for most of the requests:
ERROR PlaywrightCrawler: Request failed and reached maximum retries. page.goto: Timeout 10000ms exceeded.
I only get this behaviour on one specific website; the other websites I'm crawling work fine. So I'm assuming I'm running into a blocking mechanism. However, I'm already using residential proxies:
proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'DE',
});
Interestingly, if I mark the session as bad after a timeout and try again, it seems to work. But this approach is extremely slow and cumbersome, since I have to reset the session after pretty much every request.
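For reference, the manual retry-on-new-session workaround described above can be automated with Crawlee's `errorHandler`, which runs after each failed request before the retry is scheduled. A minimal sketch, assuming Crawlee v3+ and the `proxyConfiguration` from the post; the retry count and timeout check are illustrative, not tuned:

```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestRetries: 5,
    // Called after each failed request, before the retry is enqueued.
    errorHandler: async ({ session }, error) => {
        if (error.message.includes('Timeout')) {
            // Retire the session so the retry gets a fresh proxy session
            // instead of reusing the (likely blocked) one.
            session?.retire();
        }
    },
    requestHandler: async ({ page }) => {
        // ... extract data ...
    },
});
```

This keeps the "new session on timeout" behaviour without manually marking sessions bad after every request.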
Any ideas?
Try headful mode, try Puppeteer with the stealth plugin, or try Playwright with Firefox. You can also try buying other proxies that might work better.
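The headful-mode and Firefox suggestions could look roughly like this in Crawlee (a sketch, assuming Crawlee v3 with Playwright installed; the timeout value is an illustrative bump over the 10 s from the error above):

```javascript
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const crawler = new PlaywrightCrawler({
    launchContext: {
        launcher: firefox,                  // Firefox instead of the default Chromium
        launchOptions: { headless: false }, // headful mode
    },
    navigationTimeoutSecs: 60, // give slow anti-bot challenges more time
    requestHandler: async ({ page }) => {
        // ... extract data ...
    },
});
```

Some anti-bot vendors fingerprint headless Chromium specifically, so switching the browser engine or running headful can change the outcome on its own.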
Can you share the URL? Some websites have very aggressive challenges, and the fingerprint we use might not match the Apify hardware. Our team could explore that.
A few generic tricks here - https://docs.apify.com/academy/anti-scraping
https://www.g2.com/products/salesforce-salesforce-sales-cloud/reviews for example: it works in a normal browser but not in an automated browser.
Yeah, we are aware of this one; we were able to get through locally but not on the platform.