Ways to minimize traffic (save money) when crawling-scraping?

  1. Block images, media files and similar things

This can be done either with preNavigationHooks, see https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlerOptions#preNavigationHooks

or with blockRequests, see https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#blockRequests

As far as I know, blockRequests has some limitations (does it work in incognito mode with Firefox as the launcher?). This was discussed on this forum, see:
crawlee-js: How to avoid requesting some static resources?
crawlee-js: Disable image in playwright
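
For reference, here is a minimal sketch of the preNavigationHooks variant. It assumes Playwright's standard `page.route()` interception with `route.abort()` / `route.continue()`; the names `shouldBlock`, `blockHeavyResources`, `RouteLike` and `RequestLike`, as well as the lists of blocked types and extensions, are made up for illustration and should be tuned per site.

```typescript
// Minimal structural types mirroring the parts of Playwright's Route/Request
// used below, so the sketch type-checks without importing Playwright.
interface RequestLike {
  resourceType(): string;
  url(): string;
}

interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Resource types that usually carry most of the bytes (assumed list).
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Hypothetical fallback for URLs whose resource type is not one of the above.
const BLOCKED_EXTENSIONS = /\.(png|jpe?g|gif|webp|svg|mp4|webm|woff2?)(\?.*)?$/i;

// Decide whether a request should be dropped before it costs any traffic.
function shouldBlock(resourceType: string, url: string): boolean {
  return BLOCKED_TYPES.has(resourceType) || BLOCKED_EXTENSIONS.test(url);
}

// Route handler: abort heavy resources, let everything else through.
async function blockHeavyResources(route: RouteLike): Promise<void> {
  const request = route.request();
  if (shouldBlock(request.resourceType(), request.url())) {
    await route.abort();
  } else {
    await route.continue();
  }
}

// Wiring into Crawlee (shown as a comment, not executed here):
//
//   const crawler = new PlaywrightCrawler({
//     preNavigationHooks: [
//       async ({ page }) => {
//         await page.route('**/*', blockHeavyResources);
//       },
//     ],
//     // requestHandler, proxyConfiguration, ...
//   });
```

One caveat: the Playwright docs note that enabling routing disables the HTTP cache, so this approach may work against the caching idea in the next point.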

  2. Use cache

As far as I understand, you cannot have both cache AND incognito mode.
Well, there is the experimentalContainers option; in theory it should allow both cache and incognito.
I tried it, see "PlaywrightCrawler - how often browser fingerprints are changed?"
It does not look really "incognito" when fingerprint.com recognizes you even when your IP is different.
(You can prove me wrong. Maybe my test was flawed, who knows?)
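
To make the two setups concrete, here is a rough sketch of the launchContext flags involved. It assumes the `useIncognitoPages` and `experimentalContainers` fields of Crawlee's launch context; the `LaunchContextSketch` interface and the constant names are hypothetical and only stand in for the real Crawlee types.

```typescript
// Structural stand-in for the relevant launchContext fields
// (the real type comes from Crawlee's crawler options).
interface LaunchContextSketch {
  useIncognitoPages?: boolean;
  experimentalContainers?: boolean;
}

// Option A: pages share one browser context, so cached resources can be
// reused between pages, but cookies and storage are shared as well.
const sharedCacheLaunchContext: LaunchContextSketch = {
  useIncognitoPages: false,
};

// Option B: the experimental container mode mentioned above, which is meant
// to isolate pages while still keeping a cache. Treat it as unstable.
const containerLaunchContext: LaunchContextSketch = {
  experimentalContainers: true,
};

// Either object would be passed as `launchContext` to `new PlaywrightCrawler({...})`.
```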

  3. Something else to reduce traffic?

Please suggest...

  4. Actually, I care more about money than about traffic...

So one idea is to use "Datacenter" proxies instead of "Residential" ones...
I see Datacenter proxies for about $0.7 per GB, much cheaper than Residential.
Does it make sense to try?
What is your experience with Datacenter proxies?
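
To compare options in money terms, here is a back-of-the-envelope sketch. Only the $0.7 per GB datacenter price comes from the offer mentioned above; the page count, average page size, blocked fraction and the residential price are placeholder assumptions, and `CrawlEstimate` / `estimateCostUSD` are made-up names.

```typescript
interface CrawlEstimate {
  pages: number;           // number of pages fetched
  avgPageMB: number;       // average transfer per page before blocking, in MB
  blockedFraction: number; // share of bytes saved by blocking images/media (0..1)
  pricePerGB: number;      // proxy price in USD per GB
}

// Estimated proxy bill: traffic that still goes through the proxy, times price.
function estimateCostUSD(e: CrawlEstimate): number {
  const totalGB = (e.pages * e.avgPageMB * (1 - e.blockedFraction)) / 1024;
  return totalGB * e.pricePerGB;
}

// Placeholder workload: 100k pages, 2 MB each, 60% of bytes blocked.
const workload = { pages: 100_000, avgPageMB: 2, blockedFraction: 0.6 };

// Datacenter price taken from the offer quoted above.
const datacenterCost = estimateCostUSD({ ...workload, pricePerGB: 0.7 });

// Residential price is a hypothetical placeholder; substitute your provider's rate.
const residentialCost = estimateCostUSD({ ...workload, pricePerGB: 8 });

console.log(`datacenter: $${datacenterCost.toFixed(2)}, residential: $${residentialCost.toFixed(2)}`);
```

With these placeholder numbers the datacenter bill is roughly $55. The comparison only holds if the target site does not block datacenter IPs more often, since retries add traffic too.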