Rói · 2mo ago

Configuring Playwright + Crawlee JS to bypass certain sites

I have noticed that some pages that look completely normal are sometimes hard to fetch content from. After some investigation, I suspect it is because the site is behind Cloudflare. Do you have any suggestions on how to get past this? I believe that in certain cases it's simply a matter of dismissing popups and accepting cookies. I do have the stealth plugin added, but it still does not get through.
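To tell a Cloudflare challenge apart from a cookie popup, one option is to inspect the response before parsing it. The sketch below is only a heuristic: `ResponseSummary` and `looksLikeCloudflareChallenge` are hypothetical names, and the signals it checks (a `cf-mitigated: challenge` header, or a `server: cloudflare` header together with a 403/503 status and a "Just a moment..." title) are assumptions based on commonly seen challenge responses, not an official detection API.

```typescript
// Hypothetical helper: a rough check for a Cloudflare challenge page.
// All signals here are assumptions from commonly observed responses.
interface ResponseSummary {
    status: number;
    headers: Record<string, string>; // header names expected in lowercase
    title: string;
}

function looksLikeCloudflareChallenge(res: ResponseSummary): boolean {
    const server = (res.headers['server'] ?? '').toLowerCase();
    const mitigated = (res.headers['cf-mitigated'] ?? '').toLowerCase();

    // Strongest signal: the response is explicitly marked as a challenge.
    if (mitigated === 'challenge') return true;

    // Weaker fallback: Cloudflare-served, blocked status, interstitial title.
    const blockedStatus = res.status === 403 || res.status === 503;
    const challengeTitle = /just a moment/i.test(res.title);
    return server.includes('cloudflare') && blockedStatus && challengeTitle;
}
```

Inside a Playwright-based request handler, the summary could be built from `response.status()`, `response.headers()` and `await page.title()`. If the check returns false and the content is still missing, a cookie or consent popup is the more likely cause.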
3 Replies
azzouzana · 2mo ago
post to #💻hire-freelancers
thenetaji · 2mo ago
Use a Cloudflare-bypassing package; there are also many ways to get past Cloudflare Turnstile.
fierDeToiMonGrand
You can try https://crawlee.dev/js/docs/guides/avoid-blocking#camoufox. Just make sure you have a recent version of Node, like 22, to be able to run Camoufox. It is not bulletproof, but it is better than the normal Playwright crawler.
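For reference, a minimal sketch of what that setup might look like, loosely following the linked guide. It assumes the `crawlee`, `playwright` and `camoufox-js` packages are installed, an ESM project (for top-level `await`), and a Crawlee version that exposes the `closeCookieModals` context helper. The start URL is a placeholder.

```typescript
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';
import { launchOptions } from 'camoufox-js';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        // Camoufox does its own fingerprinting, so Crawlee's built-in
        // fingerprint injection is turned off to avoid conflicts.
        useFingerprints: false,
    },
    launchContext: {
        // Camoufox is a Firefox build, so it is launched via Playwright's firefox.
        launcher: firefox,
        launchOptions: await launchOptions({ headless: true }),
    },
    async requestHandler({ page, request, log, closeCookieModals }) {
        // Tries to dismiss common cookie/consent popups before reading content.
        await closeCookieModals();
        log.info(`${request.url}: ${await page.title()}`);
    },
});

// Placeholder URL; replace with the site you are crawling.
await crawler.run(['https://example.com']);
```

The stealth plugin is left out here on the assumption that Camoufox's own fingerprinting takes over that job. As noted above, none of this guarantees getting past a challenge.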
