Ways to minimize traffic (save money) when crawling-scraping?
1. Block requests for static resources
It can be done either with
preNavigationHooks, see https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlerOptions#preNavigationHooks
or with
blockRequests, see https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#blockRequests
As far as I know, blockRequests has some limitations (does it work in incognito mode with Firefox as the launcher?). This was discussed in this forum, see:
crawlee-js: How to avoid requesting some static resources?
crawlee-js: Disable image in playwright
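A minimal sketch of the preNavigationHooks variant, assuming Playwright's page.route() API is used for the blocking. The names BLOCKED_TYPES, shouldBlock and blockHeavyResources are made up for illustration and are not part of Crawlee, and the list of blocked resource types is only a guess at what is heavy on a typical site.

```typescript
// Resource types that often account for most of the transferred bytes.
// This list is an assumption; tune it for the sites you crawl.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Minimal structural types covering only the Playwright calls used below,
// so the sketch compiles without importing Playwright.
interface RouteLike {
    request(): { resourceType(): string };
    abort(): Promise<void>;
    continue(): Promise<void>;
}
interface PageLike {
    route(pattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Hypothetical helper: decide whether a request should be dropped.
function shouldBlock(resourceType: string): boolean {
    return BLOCKED_TYPES.has(resourceType);
}

// Hypothetical helper: abort blocked resource types, let everything else through.
function blockHeavyResources(page: PageLike): Promise<void> {
    return page.route('**/*', (route) =>
        shouldBlock(route.request().resourceType()) ? route.abort() : route.continue(),
    );
}

// Wiring into Crawlee (assumes the `crawlee` and `playwright` packages are installed):
//
//   import { PlaywrightCrawler } from 'crawlee';
//   const crawler = new PlaywrightCrawler({
//       preNavigationHooks: [async ({ page }) => { await blockHeavyResources(page); }],
//       requestHandler: async ({ page }) => { /* scrape here */ },
//   });
//
// The blockRequests alternative would be called from the same hook, e.g.
//   async ({ blockRequests }) => { await blockRequests({ extraUrlPatterns: ['.mp4'] }); }
```

One thing to keep in mind: the Playwright docs note that enabling routing disables the HTTP cache, so this variant may work against the cache idea in point 2. As far as I can tell from the Crawlee docs, blockRequests does not use request interception and so should not have that side effect.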
2. Use cache
As far as I understand, you cannot have both: cache AND incognito mode.
Well, there is the
experimentalContainers option: in theory it should allow both cache and incognito.
I tried it, see "PlaywrightCrawler - how often browser fingerprints are changed?"
It looks like it's not really "incognito" when fingerprint.com recognizes you even when your IP is different.
(You can prove me wrong: maybe my test was flawed, who knows?)
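For reference, a sketch of how the option is switched on. It assumes the useIncognitoPages and experimentalContainers fields described in Crawlee's PlaywrightLaunchContext docs; check them against your Crawlee version. Whether this really isolates sessions is exactly what the test above puts in doubt.

```typescript
// Launch options kept as a plain object so the sketch has no dependencies.
const launchContext = {
    // Incognito pages do not share cookies or cache, so leave this off when you want caching.
    useIncognitoPages: false,
    // Experimental: documented as similar to incognito pages but on a persistent
    // context, so the cache can be reused.
    experimentalContainers: true,
};

// Wiring (assumes the `crawlee` package is installed):
//
//   import { PlaywrightCrawler } from 'crawlee';
//   const crawler = new PlaywrightCrawler({ launchContext, requestHandler: async () => {} });
```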
3. Something else to reduce traffic?
Please suggest...
4. Actually I care more about money than about traffic...
So one idea is to use a "Datacenter proxy" instead of a "Residential" one...
I see Datacenter proxies for about $0.7 per GB, much cheaper than Residential.
Does it make sense to try?
What is your experience with Datacenter proxies?
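To put numbers on the comparison, here is a rough sketch of the arithmetic. The function name, the page count and the average page weight are made up for illustration; only the $0.7 per GB figure comes from the price mentioned above. It also assumes billing at 1 GB = 1000 MB, which may differ by provider.

```typescript
// Rough proxy bill: pages * average page weight * price per GB.
// Assumes 1 GB = 1000 MB; check how your provider counts traffic.
function estimateProxyCostUsd(pages: number, avgPageMb: number, pricePerGbUsd: number): number {
    const totalGb = (pages * avgPageMb) / 1000;
    return totalGb * pricePerGbUsd;
}

// Example inputs: page count and page weight are placeholder numbers,
// 0.7 is the datacenter price per GB quoted above.
const datacenterCost = estimateProxyCostUsd(100_000, 2, 0.7);
console.log(datacenterCost.toFixed(2));
```

Running the same function with your Residential price per GB gives the other side of the comparison. It does not account for requests that get blocked on a Datacenter IP and have to be retried.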

