Getting a CAPTCHA and HTTP 403 with PlaywrightCrawler
I get a CAPTCHA every time I open links like these (basically any job ad on Wellfound):
https://wellfound.com/company/kalepa/jobs/2651640-tech-lead-manager-full-stack-europe
https://wellfound.com/company/pinatacloud/jobs/2655889-principal-software-engineer
https://wellfound.com/company/wingspanapp/jobs/2629420-senior-software-engineer
Screenshot attached.
I am using:
- US residential proxies from smartproxy.com
- PlaywrightCrawler with useSessionPool: false and persistCookiesPerSession: false
- headless Firefox, both as the launcher and in the fingerprintGeneratorOptions browsers
- locale en-US and timezone America/New_York (to match the US proxies)
- devices: ['desktop'] in fingerprintGeneratorOptions
- launchContext: { useIncognitoPages: true }
- pluginContent set in preNavigationHooks to fix the "plugin length" problem, as described in the crawlee-js thread "Crawlee vs bot detection systems - Plugins length is not OK"

And still this site detects me as a robot!
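Roughly, the configuration listed above looks like the sketch below. It assumes Crawlee v3 running as an ES module (for top-level await) with the playwright package installed. The proxy URL is a hypothetical placeholder, the timezone setting is left out, and the pluginContent patch from the linked thread is not reproduced, so treat this as an outline of the setup rather than the exact code.

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
import { BrowserName, DeviceCategory } from '@crawlee/browser-pool';
import { firefox } from 'playwright';

// Hypothetical placeholder: substitute the real smartproxy.com credentials and endpoint.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://USERNAME:PASSWORD@proxy.example.com:10000'],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    // No session rotation and no cookie persistence between requests.
    useSessionPool: false,
    persistCookiesPerSession: false,
    launchContext: {
        launcher: firefox, // launch Firefox instead of the default Chromium
        launchOptions: { headless: true },
        useIncognitoPages: true, // each page gets its own browser context
    },
    browserPoolOptions: {
        useFingerprints: true,
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                browsers: [BrowserName.firefox],
                devices: [DeviceCategory.desktop],
                locales: ['en-US'],
            },
        },
    },
    // Timezone (America/New_York) configuration is omitted from this sketch.
    preNavigationHooks: [
        async (_crawlingContext, _gotoOptions) => {
            // The pluginContent patch from the linked crawlee-js thread goes here;
            // its exact code is not reproduced in this sketch.
        },
    ],
    async requestHandler({ request, response, page, log }) {
        // Log the HTTP status and page title so 403 / CAPTCHA pages are easy to spot.
        log.info(`${response?.status()} ${request.url}: ${await page.title()}`);
    },
});

await crawler.run([
    'https://wellfound.com/company/kalepa/jobs/2651640-tech-lead-manager-full-stack-europe',
    'https://wellfound.com/company/pinatacloud/jobs/2655889-principal-software-engineer',
    'https://wellfound.com/company/wingspanapp/jobs/2629420-senior-software-engineer',
]);
```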
Any ideas how to overcome this?
UPDATE 1: the IP in the screenshot is located in Texas, US.
UPDATE 2: when I open these links in incognito mode in my desktop browser, I get this CAPTCHA too.

