sacred-emeraldS

Saving bandwidth with PlaywrightCrawler: how to block googletagmanager, google-analytics, etc.?

I already block images as described in [1], and this helps save some bandwidth.
Next step: looking at the statistics in my proxy service, I see a significant number of requests like these:

https://www.googletagmanager.com/gtag/js?id=...
https://connect.facebook.net/en_US/fbevents.js
https://www.google-analytics.com/analytics.js
https://fonts.googleapis.com/css?family=Lato


Can somebody show me example code that blocks these domains? (Even better: code that blocks all domains from a given list.)
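
A minimal sketch of the "given list" part, assuming the blocklist is just the four hosts above and that subdomains should be blocked too (so www.googletagmanager.com matches googletagmanager.com). BLOCKED_DOMAINS and isBlockedUrl are hypothetical names, not Crawlee or Playwright APIs:

```javascript
// Hypothetical blocklist built from the proxy statistics above.
const BLOCKED_DOMAINS = [
  'googletagmanager.com',
  'connect.facebook.net',
  'google-analytics.com',
  'fonts.googleapis.com',
];

// Hypothetical helper: true if the URL's host is a listed domain
// or a subdomain of one. Requiring a leading "." for the suffix match
// avoids false hits such as "notgoogle-analytics.com".
function isBlockedUrl(url, blockedDomains) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not a parseable absolute URL, so never blocked
  }
  return blockedDomains.some(
    (domain) => hostname === domain || hostname.endsWith(`.${domain}`),
  );
}
```

Matching on the parsed hostname rather than on the raw URL string keeps query strings or paths that merely mention a domain from triggering a block.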

I assume it should be something in PlaywrightCrawler.preNavigationHooks, right?
Setup: PlaywrightCrawler with Firefox as the launcher (so Chrome-specific hacks probably won't work).
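
One possible approach, sketched under assumptions: in a preNavigationHook, register a Playwright route that aborts every request whose host is on the list. page.route() (which accepts a predicate receiving a URL object) and route.abort() are standard Playwright APIs and are not Chromium-specific, so this should also work with Firefox. createBlockDomainsHook and BLOCKED_DOMAINS are hypothetical names:

```javascript
// Hypothetical blocklist; replace with your own domains.
const BLOCKED_DOMAINS = [
  'googletagmanager.com',
  'connect.facebook.net',
  'google-analytics.com',
  'fonts.googleapis.com',
];

// Hypothetical factory returning a Crawlee preNavigationHook.
// Crawlee calls each hook with the crawling context (which carries
// the Playwright `page`) before navigating to the request URL.
function createBlockDomainsHook(blockedDomains) {
  // Playwright passes a URL object to predicate matchers.
  const isBlocked = (url) =>
    blockedDomains.some(
      (d) => url.hostname === d || url.hostname.endsWith(`.${d}`),
    );

  return async ({ page }) => {
    // Only requests matching the predicate are intercepted; they are
    // aborted inside the browser, so they should never reach the
    // network (or the proxy). All other requests proceed untouched.
    await page.route(isBlocked, (route) => route.abort());
  };
}
```

To wire it up, pass `preNavigationHooks: [createBlockDomainsHook(BLOCKED_DOMAINS)]` together with `launchContext: { launcher: firefox }` (with `firefox` imported from `playwright`) in the PlaywrightCrawler options. Crawlee also ships `playwrightUtils.blockRequests()`, but as far as I know it relies on the Chrome DevTools Protocol and therefore only works in Chromium.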

(I'm not good at writing JavaScript from scratch, so I need some help.)

[1] crawlee-js: Ways to minimize traffic (save money) when crawling-scraping?