Apify Discord Mirror

Updated 2 years ago

PlaywrightCrawler - how often are browser fingerprints changed?

At a glance

The community members are discussing how browser fingerprints behave when PlaywrightCrawler is used with different settings. They tested the fingerprint.com/demo site and found that without incognito mode, the site assigned the same ID even with different IP addresses. With incognito mode, the site assigned a different ID for each request.

The community members also tested other bot detection/fingerprinting sites ([2] and [3]) and found that those sites reported the same fingerprint across different IP addresses. They are looking for ways to randomize the fingerprint to avoid detection, such as using prePageCreateHooks in PlaywrightCrawler to set randomized page options like locale, timezone, and user agent.

There is no explicitly marked answer, but the community members are collaborating to find a solution to the issue of consistent fingerprints across different IP addresses.

Are browser fingerprints changed
  • every request?
  • every 1 min?
  • every... I do not know what else ))
And how is changing browser fingerprints related
to using or not using PlaywrightCrawler.launchContext.useIncognitoPages?

I am asking this because I saw a situation where two attempts to open a bot detection site,
https://fingerprint.com/demo/, resulted in the same "ID" - in other words, the site was
able to identify me! Screenshots attached.

Interval between requests - 3 min.
Different IPs (from the pool of "rotating" IPs).
Without incognito
Attachments
f2.png
f1.png
8 comments
I used the site [1] to test PlaywrightCrawler with different settings. Here is what I saw:


Without incognito mode - as described above - the site [1] assigns the same ID even when different IPs are used.


Plain Text
    launchContext: {
        experimentalContainers: false,
        useIncognitoPages: true,
        launcher: firefox,
    },

With incognito mode turned on (the configuration just above), the site [1] cannot detect Crawlee - it assigns a
different ID for every request.


Plain Text
    launchContext: {
        experimentalContainers: true,
        useIncognitoPages: false,
        launcher: firefox,
    },

With experimental containers turned on instead of incognito (the configuration just above), the site [1] assigns the same ID even when different IPs are used.

In all cases, the session pool and persistent cookies were turned off:
Plain Text
    useSessionPool: false,
    persistCookiesPerSession: false,
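
The configurations tested in this comment can be collected into one small helper so the test matrix is easy to rerun. This is only a sketch: `buildCrawlerOptions` and the variant names are hypothetical, the option names are the ones used in this thread, and the "plain" variant assumes that `experimentalContainers` was off in the very first test. The launcher (for example `firefox` from `playwright`) is passed in so the snippet itself has no dependencies.

```javascript
// Hypothetical helper: returns PlaywrightCrawler options for one test variant.
// Option names come from the thread; the helper and variant names are made up.
const VARIANTS = {
    // Assumption: the first test ran with both flags off.
    plain:      { useIncognitoPages: false, experimentalContainers: false },
    incognito:  { useIncognitoPages: true,  experimentalContainers: false },
    containers: { useIncognitoPages: false, experimentalContainers: true },
};

function buildCrawlerOptions(variant, launcher) {
    const flags = VARIANTS[variant];
    if (!flags) throw new Error(`Unknown variant: ${variant}`);
    return {
        // Turned off in every test run, as stated above.
        useSessionPool: false,
        persistCookiesPerSession: false,
        launchContext: { ...flags, launcher },
    };
}

// Prints the launchContext flags for each variant (launcher left undefined here).
for (const name of Object.keys(VARIANTS)) {
    console.log(name, buildCrawlerOptions(name).launchContext);
}
```

With Crawlee installed, the returned object would be spread into the crawler constructor next to the request handler, one variant per run.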


[1] https://fingerprint.com/demo/

So is this "as designed", is my test wrong, or is it something else?
The fingerprint and proxy IP should be attached to a browser. Depending on your browserPoolOptions, it will open some number of pages per browser. By default it is 100 requests per browser, I think. Those should have the same fingerprint and proxy IP. Each new browser should get a different IP and fingerprint.

Incognito and experimental containers, on the other hand, should use a different fingerprint/IP for each request.
But of course, there might be bugs
Thanks for all the testing
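
If the fingerprint is tied to the browser, one way to reason about how often it can change is to count browser launches. The sketch below assumes that the "100 requests per browser" mentioned above corresponds to the browser pool's `retireBrowserAfterPageCount` option and that requests are processed one after another; `estimateFingerprintCount` is a hypothetical helper, not part of Crawlee.

```javascript
// Hypothetical helper: lower bound on the number of distinct browsers (and
// therefore fingerprints) used for `requestCount` requests when each browser
// is retired after `retireBrowserAfterPageCount` pages. Concurrency can only
// increase the real number, because several browsers may be open at once.
function estimateFingerprintCount(requestCount, retireBrowserAfterPageCount = 100) {
    if (requestCount <= 0) return 0;
    return Math.ceil(requestCount / retireBrowserAfterPageCount);
}

// Browser pool options that would retire the browser after every single page,
// assuming the option names above are correct for your Crawlee version.
const rotateEveryPage = {
    useFingerprints: true,
    retireBrowserAfterPageCount: 1,
};

console.log(estimateFingerprintCount(250)); // 3
console.log(estimateFingerprintCount(250, rotateEveryPage.retireBrowserAfterPageCount)); // 250
```

Retiring the browser after every page trades speed for rotation, since each request then pays the cost of a browser launch.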
"Incognito and experimental containers should on the other hand use different fingerprint/IP for each request."

This seems to be true for incognito - a different ID (on fingerprint.com/demo/) is assigned every time.

But something is wrong in the case of experimental containers - the same ID all the time!
I did additional tests with bot detection/fingerprinting sites.

Used PlaywrightCrawler with this configuration:
Plain Text
    launchContext: {
        experimentalContainers: false,
        useIncognitoPages: true,
        launcher: firefox,
    },

No session pool and no persistent cookies. PlaywrightCrawler.browserPoolOptions.useFingerprints = true.

  • Test results with [1] are the same as before - it assigns a different ID for every request. Fine.
  • Test results with [2] and [3] - SAME ID FOR EVERY REQUEST. The IP was different with each request.
On the screenshot from site [2] (attached below), look at "FP ID" at the top and at "visits".
"Visits: 4" means the site knows this is my visit number 4.

Screenshots from site [3] are self-explanatory: a huge "Your fingerprint:" in the middle.
The fingerprint remains the same when I visit the site from a different IP.

Not good news.
So, what can be done against it?
The FingerprintGenerator is there, so how can it be used against [2] and [3]?


[1] https://fingerprint.com/demo/
[2] https://abrahamjuliot.github.io/creepjs/
[3] https://noscriptfingerprint.com/
Attachment
01-abrahamjuliot.github.io.png
We could use the following trick (to fool bot detection on [2] and [3]):
use the prePageCreateHooks and set randomized pageOptions there.
I am already doing this for locale + timezoneId, and it works:
Plain Text
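    browserPoolOptions: {
        prePageCreateHooks: [
            // Runs before each page is created; pageOptions is mutated in place.
            (pageId, browserController, pageOptions) => {
                pageOptions.locale      = myLocaleSingle; // a locale string
                pageOptions.timezoneId  = myTimezone;     // a timezone ID string
            },
        ],
    },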
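
The hook above uses fixed values; a randomized version might pick a locale/timezone pair per page. A minimal sketch, assuming a hand-made pool: `PROFILES` and `createRandomProfileHook` are hypothetical names, and only the `(pageId, browserController, pageOptions)` hook signature comes from this thread.

```javascript
// Hypothetical pool of locale/timezone pairs. Values are picked as pairs so
// that a page never reports, say, a German locale with a US timezone.
const PROFILES = [
    { locale: 'en-US', timezoneId: 'America/New_York' },
    { locale: 'en-GB', timezoneId: 'Europe/London' },
    { locale: 'de-DE', timezoneId: 'Europe/Berlin' },
];

// Returns a hook with the (pageId, browserController, pageOptions) signature.
// `random` is injectable so the choice can be made deterministic in tests.
function createRandomProfileHook(profiles = PROFILES, random = Math.random) {
    return (pageId, browserController, pageOptions) => {
        const profile = profiles[Math.floor(random() * profiles.length)];
        pageOptions.locale = profile.locale;
        pageOptions.timezoneId = profile.timezoneId;
    };
}

// Standalone demonstration: call the hook by hand on an empty options object.
const demoPageOptions = {};
createRandomProfileHook()('page-1', null, demoPageOptions);
console.log(demoPageOptions);
```

In the crawler, the hook would go into `browserPoolOptions.prePageCreateHooks` in place of the inline function. Ideally the timezone would also match the geography of the proxy IP in use, which this sketch does not attempt.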


According to Playwright documentation https://playwright.dev/docs/api/class-browser#browser-new-page
we can set these things (only the most interesting options listed here):

colorScheme
deviceScaleFactor
extraHTTPHeaders ???
screen.width
screen.height
timezoneId
locale
userAgent !!!
viewport.width
viewport.height

How can I get these values from the FingerprintGenerator from within PlaywrightCrawler?
I would like to try, please help...
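
One possible direction, sketched under assumptions: generate a fingerprint object and copy the matching fields into pageOptions inside the hook. The field names below (`navigator.userAgent`, `navigator.language`, `screen.width`, `screen.height`, `screen.devicePixelRatio`) are assumed to match what the `fingerprint-generator` package produces and should be checked against the installed version; `fingerprintToPageOptions` and `createFingerprintHook` are hypothetical helpers. A hand-written sample object stands in for a generated fingerprint so the snippet runs on its own.

```javascript
// Maps a fingerprint object to Playwright page options from the list above.
// ASSUMPTION: the fingerprint exposes navigator.userAgent, navigator.language
// and screen.width / screen.height / screen.devicePixelRatio.
function fingerprintToPageOptions(fingerprint) {
    const { navigator, screen } = fingerprint;
    return {
        userAgent: navigator.userAgent,
        locale: navigator.language,
        deviceScaleFactor: screen.devicePixelRatio,
        screen: { width: screen.width, height: screen.height },
        // Simplification: viewport is set equal to the screen size.
        viewport: { width: screen.width, height: screen.height },
    };
}

// Hook factory. `getFingerprint` is any function returning such an object;
// with the real package it might wrap a FingerprintGenerator instance
// (assumed API, not verified here).
function createFingerprintHook(getFingerprint) {
    return (pageId, browserController, pageOptions) => {
        Object.assign(pageOptions, fingerprintToPageOptions(getFingerprint()));
    };
}

// Hand-written sample standing in for a generated fingerprint.
const sampleFingerprint = {
    navigator: {
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
        language: 'en-US',
    },
    screen: { width: 1920, height: 1080, devicePixelRatio: 1 },
};

const demoOptions = {};
createFingerprintHook(() => sampleFingerprint)('page-1', null, demoOptions);
console.log(demoOptions.userAgent, demoOptions.viewport);
```

These options only change what Playwright emulates for the page. Signals outside them (for example canvas or font rendering) are not covered, and with useFingerprints turned on Crawlee may already apply its own generated fingerprint, so the two could conflict.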