national-gold•2y ago

Make PlaywrightCrawler less unique and avoid blocking? (canvas/fonts/plugins/permissions...)

I checked my program (PlaywrightCrawler) against https://amiunique.org/fingerprint. I used a US residential proxy and took 3 screenshots (attached to this post).
It seems there are some areas where Crawlee could do better (be less unique, less detectable). Here is the list (these items are red on the screenshots):
- User Agent (I used the fingerprint generator for this!)
- Canvas
- Navigator properties
- List of fonts
- List of plugins
- Permissions
Some settings in my PlaywrightCrawler: useFingerprints: true, useFingerprintCache: false, launcher: firefox
Regarding the list of plugins: I use some JS code (the pluginContent string) taken from https://discord.com/channels/801163717915574323/1059483872271798333 and inject it into the page this way:
preNavigationHooks: [
    // Inject the plugin-spoofing script so it runs before any script on the page
    async ({ page, request }) => {
        await page.addInitScript({ content: pluginContent });
    },
],
This hack simulates the presence of some PDF plugins, but I have the impression there are better solutions for plugins, fonts and permissions.
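For readers who cannot open the Discord link: the pluginContent string itself is not shown in this thread. A minimal sketch of what such an init script might look like follows, assuming the goal is only to make navigator.plugins report a couple of PDF viewer entries. The helper name buildPluginInitScript and the plugin entries are hypothetical, not the code from the linked message.

```javascript
// Hypothetical helper (not the snippet from the Discord thread): builds the
// source of an init script that makes navigator.plugins return fake entries.
function buildPluginInitScript(pluginNames) {
    const plugins = pluginNames.map((name) => ({
        name,
        filename: 'internal-pdf-viewer',
        description: 'Portable Document Format',
    }));
    // The returned string is meant to run inside the page, before site scripts.
    return `(() => {
        const plugins = ${JSON.stringify(plugins)};
        Object.defineProperty(navigator, 'plugins', {
            get: () => plugins,
            configurable: true,
        });
    })();`;
}

const pluginContent = buildPluginInitScript(['PDF Viewer', 'Chrome PDF Viewer']);

// Usage inside a PlaywrightCrawler, with the same hook as in the post:
// preNavigationHooks: [
//     async ({ page }) => {
//         await page.addInitScript({ content: pluginContent });
//     },
// ],
```

Note that this returns a plain array rather than a real PluginArray, so a detector that inspects prototypes can still tell the difference. That limitation is one reason a maintained evasion library may be a better fit than a hand-written script.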
9 Replies
Pepa J•2y ago
Hi @new_in_town, if you have ideas for improving Crawlee, I suggest you raise an issue or open a PR at https://github.com/apify/crawlee/issues.
Lukas Krivka•2y ago
Some improvements are already in progress, thanks for the suggestion. cc @petrpatek.
Lukas Krivka•2y ago
Some other tricks you can try: https://docs.apify.com/academy/anti-scraping
national-goldOP•2y ago
Thanks @Lukas Krivka! I think the "too unique User Agent" should be fixed somewhere in the libraries, maybe in the fingerprint generator? For other things like the list of fonts and the list of plugins, using puppeteer-extra-plugin-stealth is probably a good idea. I mean this part of the documentation:
Try Puppeteer with the puppeteer-extra-plugin-stealth plugin. Generally, Crawlee's default configuration should have stronger bypassing but some features might land first in the stealth plugin.
I have two problems with puppeteer-extra-plugin-stealth:
1. I need an example of how to use this plugin with Crawlee.
2. What should I do if I have already built my JS crawler with PlaywrightCrawler?
like-gold•2y ago
There's a playwright-extra stealth plugin too, by the same maintainers.
national-goldOP•2y ago
We are talking about this: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra, right? Is there any example of how to use it with PlaywrightCrawler?
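A rough sketch of one way to combine the two, assuming the crawlee, playwright-extra and puppeteer-extra-plugin-stealth packages are installed: register the plugin on the playwright-extra browser type, then pass that browser type to Crawlee as launchContext.launcher. The helper name buildStealthCrawlerOptions is hypothetical and exists only to keep the wiring in one place. The real imports are shown as comments so the helper itself runs without any packages.

```javascript
// Hypothetical helper: registers plugins on a playwright-extra browser type
// and returns options that can be spread into `new PlaywrightCrawler(...)`.
function buildStealthCrawlerOptions(launcher, plugins) {
    for (const plugin of plugins) {
        launcher.use(plugin); // playwright-extra's plugin registration call
    }
    return {
        launchContext: {
            launcher, // tells Crawlee to launch browsers through playwright-extra
            launchOptions: { headless: true },
        },
    };
}

// Wiring with the real packages (assumed to be installed):
// import { PlaywrightCrawler } from 'crawlee';
// import { chromium } from 'playwright-extra';
// import stealthPlugin from 'puppeteer-extra-plugin-stealth';
//
// const crawler = new PlaywrightCrawler({
//     ...buildStealthCrawlerOptions(chromium, [stealthPlugin()]),
//     requestHandler: async ({ page, request, log }) => {
//         log.info(`${request.url}: ${await page.title()}`);
//     },
// });
// await crawler.run(['https://amiunique.org/fingerprint']);
```

An existing PlaywrightCrawler program should only need the launcher swapped; request handlers and hooks stay as they are. The stealth evasions were written mainly with Chromium in mind, so whether they help with the firefox launcher mentioned earlier in the thread is unverified here.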
Lukas Krivka•2y ago
see
