exotic-emerald • 11mo ago

Crawlee Playwright is detected as a bot

On this page, Crawlee Playwright is detected as a bot due to CDP (Chrome DevTools Protocol): https://www.browserscan.net/bot-detection This is a known issue, also discussed at https://github.com/berstend/puppeteer-extra/issues/899 I'm wondering whether Crawlee can come up with a solution.
16 Replies
Saurav Jain • 11mo ago
Checking with the team and will get back to you.
Lukas Celnar • 11mo ago
Hi @Jeno, we are working on a solution that will not use Playwright and should be harder to block. Meanwhile, you can check out these tips: https://docs.apify.com/academy/anti-scraping#quick-start
exotic-emerald (OP) • 11mo ago
That's exciting news!
passive-yellow • 11mo ago
Very interesting. @Jeno, please look at this screenshot. As far as I understand, "Normal" = "Nothing detected", right? Code: Playwright + Crawlee + Firefox + rotating proxies. The exact same program is detected on another site, see here: https://discord.com/channels/801163717915574323/1296398744870457394
exotic-emerald (OP) • 10mo ago
Can you give a hint about what stage that new solution is at? Weeks or months away? Or the next major version?
hunterleung. • 10mo ago
I want to know that too. It is a great feature. 👍
azzouzana • 10mo ago
👀
eager-peach • 9mo ago
👀
hunterleung. • 9mo ago
👀
xenial-black • 9mo ago
I wonder, is this an issue with fingerprint-suite or the PlaywrightCrawler implementation?
GitHub - apify/fingerprint-suite: Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
exotic-emerald (OP) • 8mo ago
It's a lot of things. I have read somewhere that CF (Cloudflare) can detect the synthetic mouse actions generated by Playwright and Puppeteer. The only solution I had success with was Puppeteer Real Browser. It passes CF easily.
hunterleung. • 7mo ago
How can we use real browsers at scale?
exotic-emerald (OP) • 6mo ago
Crawlee quietly added Camoufox and it's amazing! Puppeteer Real Browser was a pain to use. I am using Camoufox with simple datacenter proxies and easily passing everything. It is available as a template. Make sure you use SOCKS5 proxies to avoid some edge cases.
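A minimal sketch of the socks5 tip above, assuming the proxies are kept as a plain list of URL strings. The helper `keepSocks5` and the placeholder addresses are hypothetical and are not part of Crawlee or Camoufox. It only filters the list by URL scheme and does not check that a proxy actually works.

```typescript
// Hypothetical helper (not a Crawlee or Camoufox API): keep only SOCKS5 proxy URLs.
function keepSocks5(proxyUrls: string[]): string[] {
    return proxyUrls.filter((raw) => {
        try {
            // The WHATWG URL parser exposes the scheme as `protocol`, with a trailing colon.
            return new URL(raw).protocol === 'socks5:';
        } catch {
            // Entries that cannot be parsed as URLs are dropped.
            return false;
        }
    });
}

// Placeholder addresses from the documentation-only 192.0.2.0/24 range.
const candidates = [
    'socks5://192.0.2.10:1080',
    'http://192.0.2.11:8000',
    'not a url',
];

const socks5Only = keepSocks5(candidates);
console.log(socks5Only);
```

The filtered list could then be handed to whatever proxy setting the crawler uses, for example the `proxyUrls` option of Crawlee's `ProxyConfiguration`. Whether a given proxy provider and browser combination supports SOCKS5 still needs to be checked separately.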
Saurav Jain • 6mo ago
Haha, it's still in beta 🙂 We are about to post it on social media this week! Glad you liked it 😄
