Apify Discord Mirror

Updated 5 months ago

Make PlaywrightCrawler less unique and avoid blocking? (canvas/fonts/plugins/permissions...)

At a glance

The community member is using a PlaywrightCrawler program and has found that it is too unique and detectable, with issues such as the User Agent, Canvas, Navigator properties, list of fonts, list of plugins, and permissions. They have tried using a fingerprint generator and injecting some JavaScript code to simulate PDF plugins, but believe there may be better solutions. The comments suggest raising an issue or PR on the Crawlee repository, as some improvements are already in progress. The community members also discuss using the puppeteer-extra-plugin-stealth or the playwright-extra plugin to address the detection issues, but need examples on how to use these plugins with Crawlee and PlaywrightCrawler.

Useful resources
I checked my program (PlaywrightCrawler) against this thing: https://amiunique.org/fingerprint
Used US residential proxy, did 3 screenshots, see below
It seems - there are some areas where Crawlee could do better (be less unique, less detectable)!

Here is the list (these things are red on the screenshots):
  • User Agent (I used fingerprint generator for this!)
  • Canvas
  • Navigator properties
  • List of fonts
  • List of plugins
  • Permissions
Some settings in my PlaywrightCrawler:
useFingerprints: true, useFingerprintCache: false, launcher: firefox
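
For readers unsure where these settings go, here is a minimal sketch of how they might map onto PlaywrightCrawler options. `buildCrawlerOptions` is a hypothetical helper, not part of Crawlee, and restricting the generated fingerprints to desktop Firefox is an assumption about what could make the User Agent look less unusual next to a Firefox launcher, not a confirmed fix.

```javascript
// Hypothetical helper (not part of Crawlee): collects the settings from the
// post into one options object that can be spread into PlaywrightCrawler.
function buildCrawlerOptions(launcher) {
    return {
        // launcher: firefox (from the post)
        launchContext: { launcher },
        browserPoolOptions: {
            // useFingerprints: true (from the post)
            useFingerprints: true,
            fingerprintOptions: {
                // useFingerprintCache: false (from the post)
                useFingerprintCache: false,
                // Assumption: only generate desktop Firefox fingerprints so the
                // injected User Agent matches the browser actually launched.
                fingerprintGeneratorOptions: {
                    browsers: ['firefox'],
                    devices: ['desktop'],
                },
            },
        },
    };
}

// Usage (requires `crawlee` and `playwright` to be installed):
//   import { PlaywrightCrawler } from 'crawlee';
//   import { firefox } from 'playwright';
//   const crawler = new PlaywrightCrawler({
//       ...buildCrawlerOptions(firefox),
//       requestHandler: async ({ page }) => { /* ... */ },
//   });
```

Keeping the options in a plain object makes it easy to try different generator constraints and re-check the result on amiunique.org.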

Regarding list of plugins: I use some JS code (pluginContent string) taken from here: https://discord.com/channels/801163717915574323/1059483872271798333
and inject it into page this way:
    preNavigationHooks: [
        // Inject the plugin-spoofing script before any page script runs
        async ({ page }) => {
            await page.addInitScript({ content: pluginContent });
        },
    ],

Well, this code/hack... it simulates presence of some PDF plugins... but I have an impression there are better solutions for plugins/fonts/permissions...
Attachments
2.png
3.png
1.png
10 comments
Hi,
If you have ideas for improving Crawlee, I suggest you raise an issue or PR at https://github.com/apify/crawlee/issues.
Some improvements are already in progress; thanks for the suggestion.
Thanks!
Well, fixing the "too unique User Agent": I think this should be done somewhere in the libraries... maybe in the fingerprint generator?
Other things like the list of fonts, list of plugins, etc.: using puppeteer-extra-plugin-stealth is probably a good idea for these.

I mean this part of documentation:
Try Puppeteer with the puppeteer-extra-plugin-stealth plugin. Generally, Crawlee's default configuration should have stronger bypassing but some features might land first in the stealth plugin.

I have two problems with puppeteer-extra-plugin-stealth:
  1. I need an example: how to use this plugin with Crawlee
  2. What to do if I already built my JS program/crawler with PlaywrightCrawler ?
there's a playwright extra stealth plugin too by the same maintainers
we are talking about this: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra
right?

any example how to use it with PlaywrightCrawler ?
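
A minimal sketch of one way this could look, assuming `crawlee`, `playwright-extra` and `puppeteer-extra-plugin-stealth` are installed. `buildStealthCrawlerOptions` and `main` are hypothetical names used for illustration. The sketch uses `chromium` rather than Firefox on the assumption that the stealth evasions mainly target Chromium-based browsers; whether they help with a Firefox launcher is not confirmed here.

```javascript
// Hypothetical helper: registers the stealth plugin on a playwright-extra
// launcher and returns PlaywrightCrawler options that use that launcher.
function buildStealthCrawlerOptions(launcher, stealthPlugin) {
    // playwright-extra launchers expose .use() for registering plugins
    launcher.use(stealthPlugin);
    return {
        launchContext: {
            // Pass the wrapped launcher instead of the one from 'playwright'
            launcher,
            launchOptions: { headless: true },
        },
        // Existing hooks and handlers of an already-built crawler can stay as they are
        async requestHandler({ page, request, log }) {
            log.info(`${request.url}: ${await page.title()}`);
        },
    };
}

async function main() {
    // Loaded lazily so the helper above can be used without these packages
    const { PlaywrightCrawler } = await import('crawlee');
    const { chromium } = await import('playwright-extra');
    const { default: stealthPlugin } = await import('puppeteer-extra-plugin-stealth');

    const crawler = new PlaywrightCrawler(
        buildStealthCrawlerOptions(chromium, stealthPlugin()),
    );
    await crawler.run(['https://amiunique.org/fingerprint']);
}

// Call main() to start the crawl.
```

For a crawler already built on PlaywrightCrawler, the only change in this sketch is which launcher goes into `launchContext`; the rest of the crawler code is untouched.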