extended-salmon•2y ago
PlaywrightCrawler runs into timeout on Apify but works locally
I have a crawler that performs very simple GET requests to a well-known marketplace website. One crawler instance only sends 2-3 requests; after that, I set up a new crawler with a different configuration. Locally, my crawler works just fine, but if I run it on Apify I get the following error for most of the requests:
ERROR PlaywrightCrawler: Request failed and reached maximum retries. page.goto: Timeout 10000ms exceeded.
I only get this behaviour on one specific website; the other websites I'm crawling work fine. So I'm assuming I'm running into a blocking mechanism. However, I'm already using residential proxies:
proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'DE',
});
Interestingly, if I mark the session as bad after a timeout and try again, it seems to work. But this approach is extremely slow and cumbersome, since I have to reset the session after pretty much every request.
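For reference, the manual retry-on-new-session workaround described above can be automated with Crawlee's `errorHandler`, which runs after each failed request before the retry is scheduled. A minimal sketch, assuming Crawlee v3+ and the `proxyConfiguration` from the post; the retry count and timeout check are illustrative, not tuned:

```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestRetries: 5,
    // Called after each failed request, before the retry is enqueued.
    errorHandler: async ({ session }, error) => {
        if (error.message.includes('Timeout')) {
            // Retire the session so the retry gets a fresh proxy session
            // instead of reusing the (likely blocked) one.
            session?.retire();
        }
    },
    requestHandler: async ({ page }) => {
        // ... extract data ...
    },
});
```

This keeps the "new session on timeout" behaviour without manually marking sessions bad after every request.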
Any ideas?
Try headful mode, try Puppeteer with the stealth plugin, or try Playwright with Firefox. You can also try buying other proxies that might work better.
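The headful-mode and Firefox suggestions could look roughly like this in Crawlee (a sketch, assuming Crawlee v3 with Playwright installed; the timeout value is an illustrative bump over the 10 s from the error above):

```javascript
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const crawler = new PlaywrightCrawler({
    launchContext: {
        launcher: firefox,                  // Firefox instead of the default Chromium
        launchOptions: { headless: false }, // headful mode
    },
    navigationTimeoutSecs: 60, // give slow anti-bot challenges more time
    requestHandler: async ({ page }) => {
        // ... extract data ...
    },
});
```

Some anti-bot vendors fingerprint headless Chromium specifically, so switching the browser engine or running headful can change the outcome on its own.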
Can you share the URL? Some websites have very aggressive challenges, and the fingerprint we use might not match the Apify hardware. Our team could explore that.
A few generic tricks here - https://docs.apify.com/academy/anti-scraping
https://www.g2.com/products/salesforce-salesforce-sales-cloud/reviews for example: it works in a normal browser but not in an automated browser.
Yeah, we are aware of this one; we were able to get through locally but not on the platform.