sacred-emerald
Apify & Crawlee · 3y ago · 10 replies

Got captcha and HTTP 403 using PlaywrightCrawler

Got captcha and HTTP 403 when accessing wellfound.com

I get a captcha every time I access links like these (basically, any job ad on wellfound):
https://wellfound.com/company/kalepa/jobs/2651640-tech-lead-manager-full-stack-europe
https://wellfound.com/company/pinatacloud/jobs/2655889-principal-software-engineer
https://wellfound.com/company/wingspanapp/jobs/2629420-senior-software-engineer

Screenshot attached.

This is not Cloudflare protection; it's some other anti-bot system.

I am using:
- US residential proxies from smartproxy.com
- PlaywrightCrawler with useSessionPool: false and persistCookiesPerSession: false
- headless Firefox, both as launcher and in fingerprintGeneratorOptions browsers
- my locale is en-US, timezone is America/New_York (to match US proxies)
- in fingerprintGeneratorOptions, devices: ['desktop']
- in launchContext: { useIncognitoPages: true }
- I set pluginContent in preNavigationHooks to fix the "plugin length" problem, as described here: crawlee-js, "Crawlee vs bot detection systems - Plugins length is not OK"
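
For reference, the setup above might look roughly like this in code. This is only a sketch, not the exact script: the proxy URL is a placeholder, the request handler is a stub, and the locale/timezone settings and the pluginContent hook are left out because the post does not say how they were written.

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
import { BrowserName, DeviceCategory } from '@crawlee/browser-pool';
import { firefox } from 'playwright';

// Placeholder proxy URL (assumption): substitute the real residential proxy endpoint.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://USERNAME:PASSWORD@proxy.example.com:8000'],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    // Sessions and per-session cookies are switched off, as listed above.
    useSessionPool: false,
    persistCookiesPerSession: false,
    launchContext: {
        launcher: firefox,
        useIncognitoPages: true,
        launchOptions: { headless: true },
    },
    browserPoolOptions: {
        useFingerprints: true,
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                // Generated fingerprints are limited to desktop Firefox,
                // matching the launched browser.
                browsers: [{ name: BrowserName.firefox }],
                devices: [DeviceCategory.desktop],
            },
        },
    },
    preNavigationHooks: [
        async () => {
            // The pluginContent workaround from the linked thread would go here;
            // it is not reproduced in this sketch.
        },
    ],
    // Stub handler: only logs the page title, to see whether the captcha page came back.
    requestHandler: async ({ page, request, log }) => {
        log.info(`${request.url}: ${await page.title()}`);
    },
});

await crawler.run([
    'https://wellfound.com/company/kalepa/jobs/2651640-tech-lead-manager-full-stack-europe',
]);
```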

And still this site detects me as a robot!
Any ideas how to overcome this?

UPDATE1: the IP in the screenshot is somewhere in US/Texas...

UPDATE2: when I open these links in my desktop browser in incognito mode, I get this captcha too...
wellfound.com-01.png