Apify Discord Mirror

Updated 3 months ago

bot detection (captcha) changed, Playwright+Crawlee+Firefox+rotating proxies does not help any more

At a glance

The community member has a program that uses Playwright, Crawlee, Firefox, and rotating proxies to scrape jobs from wellfound.com. The program worked well until recently, when the community member started receiving HTTP 403 errors and captchas. The community member has tried various approaches, such as switching between Chrome and Firefox, using the stealthPlugin() for Chrome, and accessing the site through a "sticky session" proxy, but the captcha issue persists. The community member believes that the site's bot detection has improved, and is seeking a solution to this problem. There is no explicitly marked answer in the comments.

I have a program (Playwright+Crawlee+Firefox+rotating proxies) that scrapes jobs from wellfound.com. In May 2024 and earlier it worked quite well for many months, despite the captcha protection on the site.

Today I get HTTP 403 and a captcha (from ct.captcha-delivery.com). My code has not changed!
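To tell the new failure apart from an ordinary error, a scraper can classify each response before parsing it. Below is a minimal sketch, assuming the block shows up as described above (HTTP 403 and/or markup that references captcha-delivery.com). The names `ResponseSnapshot` and `classifyResponse` are hypothetical and are not part of Crawlee or Playwright.

```typescript
// Hypothetical helper: classify a fetched page as ok, blocked, or captcha.
// The 403 status and the captcha-delivery.com host come from the post;
// the type and function names are assumptions made for illustration.
interface ResponseSnapshot {
  status: number; // HTTP status code of the main document
  body: string;   // HTML of the main document
}

type Verdict = "ok" | "blocked" | "captcha";

const CAPTCHA_HOST = "captcha-delivery.com";

function classifyResponse(res: ResponseSnapshot): Verdict {
  // A reference to the captcha host in the markup is the strongest signal.
  if (res.body.toLowerCase().includes(CAPTCHA_HOST)) {
    return "captcha";
  }
  // A bare 403 without captcha markup is still treated as a block.
  if (res.status === 403) {
    return "blocked";
  }
  return "ok";
}
```

Logging the verdict per request makes it easier to see whether a change (browser, plugin, proxy session) actually reduces the captcha rate.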

Proxies: iproyal.com, "residential-proxies", session time 1 min ("sticky session"). What I did: in the same session I accessed URL1 and then URL2. URL1 has no captcha; URL2 contains the info I need and is (and was) protected with a captcha. In the past, the trick of accessing "URL1 and then URL2 in the same session" worked well. Today I get a captcha when accessing URL2.
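The "URL1 and then URL2 in the same session" trick can be sketched as follows. This is only an illustration of the ordering, not the poster's actual code: `PageResult`, `Visit`, and `warmUpThenFetch` are hypothetical names, and `visit` is assumed to be a caller-supplied function that loads a URL through one browser context bound to one sticky proxy session.

```typescript
// Hypothetical sketch of the warm-up pattern described in the post.
// `visit` is assumed to reuse the same browser context and proxy session
// for every call, so cookies set on URL1 are still present on URL2.
interface PageResult {
  status: number;
  body: string;
}

type Visit = (url: string) => Promise<PageResult>;

async function warmUpThenFetch(
  visit: Visit,
  warmUpUrl: string, // URL1: not captcha-protected
  targetUrl: string, // URL2: holds the data, captcha-protected
): Promise<PageResult> {
  const warmUp = await visit(warmUpUrl);
  if (warmUp.status !== 200) {
    // Do not spend the session on URL2 if the warm-up already failed.
    throw new Error(`Warm-up request failed with HTTP ${warmUp.status}`);
  }
  // The second request must go through the same session as the first.
  return visit(targetUrl);
}
```

With Playwright, `visit` could wrap `page.goto(url)` on a single page and read the status and HTML from it. Both requests have to finish within the sticky-session lifetime (1 minute in the setup above).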

What I tried: I switched between Chrome and Firefox in my code. For Chrome, I tried both with and without chromium.use(stealthPlugin()).

I still see that captcha. I also tried accessing the site with a normal GUI browser (Firefox) through an iproyal.com "sticky session", accessing URL1 and then URL2: no captcha.
This means the proxies are still OK; they are not being detected!

Bottom line: something changed, bot detection improved.
What is our answer?

Is it something similar to this: https://discord.com/channels/801163717915574323/1293244368249032895/1293244368249032895
@Jeno, what solution did you find?
1 comment
This is it:
Attachment
SPOILER_data-dome.png