national-gold•2y ago
Make PlaywrightCrawler less unique and avoid blocking? (canvas/fonts/plugins/permissions...)
I checked my crawler (PlaywrightCrawler) against this fingerprint test: https://amiunique.org/fingerprint
I used a US residential proxy and took 3 screenshots, see below.
It seems there are some areas where Crawlee could do better (be less unique, less detectable)!
Here is the list (these items are red on the screenshots):
- User Agent (I used the fingerprint generator for this!)
- Canvas
- Navigator properties
- List of fonts
- List of plugins
- Permissions
Some settings in my PlaywrightCrawler:
useFingerprints: true,
useFingerprintCache: false,
launcher: firefox
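For reference, a minimal sketch of where these three settings would sit in a PlaywrightCrawler configuration. It assumes Crawlee 3.x, where `useFingerprints` lives under `browserPoolOptions`, `useFingerprintCache` under `fingerprintOptions`, and `launcher` under `launchContext`. The function names `buildFingerprintOptions` and `runFingerprintCrawler` and the request handler are illustrative, not taken from the post.

```javascript
// Sketch only: maps the three settings from the post onto crawler options.
// buildFingerprintOptions is a hypothetical helper name.
function buildFingerprintOptions(launcher) {
    return {
        browserPoolOptions: {
            useFingerprints: true,
            fingerprintOptions: {
                // Generate a fresh fingerprint instead of reusing a cached one.
                useFingerprintCache: false,
            },
        },
        // The browser launcher, e.g. `firefox` from the `playwright` package.
        launchContext: { launcher },
        // Illustrative handler: log the title of each visited page.
        requestHandler: async ({ page, request, log }) => {
            log.info(`${request.url}: ${await page.title()}`);
        },
    };
}

async function runFingerprintCrawler() {
    // Loaded lazily; requires `crawlee` and `playwright` to be installed.
    const { PlaywrightCrawler } = await import('crawlee');
    const { firefox } = await import('playwright');
    const crawler = new PlaywrightCrawler(buildFingerprintOptions(firefox));
    await crawler.run(['https://amiunique.org/fingerprint']);
}
```

Calling `runFingerprintCrawler()` would start the crawl against the fingerprint test page mentioned above.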
Regarding the list of plugins: I use some JS code (a pluginContent string) taken from here: https://discord.com/channels/801163717915574323/1059483872271798333
and inject it into the page this way:
Well, this code/hack simulates the presence of some PDF plugins, but I have the impression there are better solutions for plugins/fonts/permissions...
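The injection snippet itself did not survive in this thread. One common way to inject a script string before any page script runs is Playwright's `page.addInitScript` called from a Crawlee `preNavigationHooks` entry. The sketch below shows that approach and is not necessarily what the poster used. The `pluginContent` value here is a harmless placeholder, not the plugin-spoofing code from the linked Discord message, and the helper names are hypothetical.

```javascript
// Placeholder only: the real pluginContent string from the Discord link is not reproduced here.
const pluginContent = 'window.__pluginContentInjected = true;';

// buildInjectionOptions is a hypothetical helper name.
function buildInjectionOptions(scriptSource) {
    return {
        preNavigationHooks: [
            async ({ page }) => {
                // Registers the script so it runs in every new document
                // before the page's own scripts.
                await page.addInitScript({ content: scriptSource });
            },
        ],
        // Illustrative handler: read back the flag set by the placeholder script.
        requestHandler: async ({ page, log }) => {
            const injected = await page.evaluate(() => window.__pluginContentInjected === true);
            log.info(`init script ran: ${injected}`);
        },
    };
}

async function runInjectionCrawler() {
    // Loaded lazily; requires `crawlee` and `playwright` to be installed.
    const { PlaywrightCrawler } = await import('crawlee');
    const crawler = new PlaywrightCrawler(buildInjectionOptions(pluginContent));
    await crawler.run(['https://amiunique.org/fingerprint']);
}
```

Registering the script in a pre-navigation hook, rather than in the request handler, matters because the handler runs after the page has loaded, when fingerprinting scripts may already have read `navigator.plugins`.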



9 Replies
Hi @new_in_town,
If you have ideas for improving Crawlee, I suggest you raise an issue or open a PR at https://github.com/apify/crawlee/issues.
Some improvements are already in progress, thanks for the suggestion. cc @petrpatek.
Some other tricks you can try: https://docs.apify.com/academy/anti-scraping
national-goldOP•2y ago
Thanks @Lukas Krivka !
Well, fixing the "too unique User Agent": I think this should be done somewhere in the libraries... maybe in the fingerprint generator?
For other things like the list of fonts, list of plugins etc., using puppeteer-extra-plugin-stealth is probably a good idea.
I mean this part of the documentation:
"Try Puppeteer with the puppeteer-extra-plugin-stealth plugin. Generally, Crawlee's default configuration should have stronger bypassing but some features might land first in the stealth plugin."
I have two problems with puppeteer-extra-plugin-stealth:
1. I need an example: how to use this plugin with Crawlee?
2. What to do if I already built my JS program/crawler with PlaywrightCrawler?
like-gold•2y ago
there's a playwright extra stealth plugin too by the same maintainers
national-goldOP•2y ago
we are talking about this: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra
right?
any example how to use it with PlaywrightCrawler ?
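A minimal sketch of how this could look, assuming the `playwright-extra` and `puppeteer-extra-plugin-stealth` packages are installed next to `crawlee`. An existing PlaywrightCrawler should not need a rewrite: the only change is that the launcher passed in `launchContext` comes from `playwright-extra` after the plugin has been registered on it with `.use()`. The sketch uses `chromium` because the stealth evasions were written mainly with Chromium in mind; the `firefox` export can be passed the same way, though which evasions take effect there is not covered here. The helper names are hypothetical.

```javascript
// buildStealthOptions is a hypothetical helper name.
// `launcher` is a playwright-extra launcher; `stealthPlugin` is the plugin factory.
function buildStealthOptions(launcher, stealthPlugin) {
    // Register the stealth plugin on the launcher before any browser starts.
    launcher.use(stealthPlugin());
    return {
        launchContext: {
            // The rest of the crawler configuration stays as it was.
            launcher,
            launchOptions: { headless: true },
        },
        // Illustrative handler: log the title of each visited page.
        requestHandler: async ({ page, request, log }) => {
            log.info(`${request.url}: ${await page.title()}`);
        },
    };
}

async function runStealthCrawler() {
    // Loaded lazily; requires crawlee, playwright-extra and
    // puppeteer-extra-plugin-stealth to be installed.
    const { PlaywrightCrawler } = await import('crawlee');
    const { chromium } = await import('playwright-extra');
    const { default: stealthPlugin } = await import('puppeteer-extra-plugin-stealth');
    const crawler = new PlaywrightCrawler(buildStealthOptions(chromium, stealthPlugin));
    await crawler.run(['https://amiunique.org/fingerprint']);
}
```

Calling `runStealthCrawler()` starts the crawl; re-running the fingerprint test afterwards is the simplest way to see which of the red items actually change.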
see
Using puppeteer-extra and playwright-extra | Crawlee
puppeteer-extra and playwright-extra are community-built