Configuring Playwright + Crawlee (JS) to bypass certain sites
I have noticed that some pages which appear completely normal are sometimes hard to fetch content from. After some investigation, I suspect it has something to do with the sites being behind Cloudflare. Do you have any suggestions on how to get past this? I believe in some cases it's simply a matter of dismissing popups and accepting cookies. I do have the stealth plugin added, but it still doesn't get through.
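One way to confirm the Cloudflare suspicion is to check what the failing requests actually receive. This is a hypothetical helper, not part of Crawlee or Playwright. `PageSnapshot` and `looksLikeCloudflareChallenge` are illustrative names. The markers it checks are assumptions based on commonly observed Cloudflare challenge pages, not a guaranteed signature:

- a `cf-mitigated: challenge` header
- or a `server: cloudflare` header combined with a 403/503 status or a "Just a moment..." title

```typescript
// Hypothetical helper: heuristically flags responses that look like a Cloudflare challenge page.
// The markers below are assumptions from commonly observed behaviour, not an official signature.
interface PageSnapshot {
    status: number;
    headers: Record<string, string>;
    title: string;
}

function looksLikeCloudflareChallenge(snap: PageSnapshot): boolean {
    // Normalize header names and values to lower case so comparisons are case-insensitive.
    const headers: Record<string, string> = Object.fromEntries(
        Object.entries(snap.headers).map(([k, v]) => [k.toLowerCase(), v.toLowerCase()]),
    );

    // Assumed marker: challenge responses carry `cf-mitigated: challenge`.
    if (headers['cf-mitigated'] === 'challenge') return true;

    const servedByCloudflare = headers['server'] === 'cloudflare';
    const challengeStatus = snap.status === 403 || snap.status === 503;
    // Assumed marker: the interstitial page is usually titled "Just a moment...".
    const challengeTitle = /just a moment/i.test(snap.title);

    return servedByCloudflare && (challengeStatus || challengeTitle);
}
```

Inside a `PlaywrightCrawler` `requestHandler`, the inputs could come from the navigation `response` in the crawling context (`response.status()`, `response.headers()`) and from `await page.title()`. Logging the result helps tell a Cloudflare block apart from a cookie popup or some other page issue.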
Post to #💻hire-freelancers.
Use a Cloudflare-bypassing package; there are also many ways to get past Cloudflare Turnstile.
You can try Camoufox:
https://crawlee.dev/js/docs/guides/avoid-blocking#camoufox
Just make sure you're on a recent version of Node.js (e.g. 22) to be able to run Camoufox.
It's not bulletproof, but it works better than the regular Playwright crawler.
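For reference, here is a minimal sketch of the Camoufox setup with `PlaywrightCrawler`, modeled on the linked guide. It makes several assumptions:

- the `crawlee`, `playwright` and `camoufox-js` packages are installed
- the Camoufox browser binary has already been downloaded (see the camoufox-js docs)
- the file runs as an ES module, so top-level `await` works

The start URL and `headless` value are placeholders. Because it depends on external packages and a browser download, treat it as configuration to adapt and check against the guide.

```typescript
// Minimal sketch modeled on the Camoufox section of Crawlee's "Avoid getting blocked" guide.
// Assumes crawlee, playwright and camoufox-js are installed, the Camoufox binary is downloaded,
// and the file runs as an ES module (for top-level await) on a recent Node.js.
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';
import { launchOptions } from 'camoufox-js';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        // Camoufox does its own fingerprint spoofing, so disable Crawlee's
        // built-in fingerprints to avoid the two conflicting.
        useFingerprints: false,
    },
    launchContext: {
        // Camoufox is a modified Firefox, so launch it through Playwright's Firefox launcher.
        launcher: firefox,
        // camoufox-js generates the Playwright launch options for the Camoufox browser.
        launchOptions: await launchOptions({ headless: true }),
    },
    async requestHandler({ request, page, log }) {
        // Log the page title so blocked pages (e.g. a challenge interstitial) are easy to spot.
        log.info(`${request.url} -> ${await page.title()}`);
    },
});

// Placeholder start URL; replace it with the site that is being blocked.
await crawler.run(['https://example.com']);
```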