
Problem with scraping a site that requires login

🎭PlaywrightCrawler👨‍💻Web-Scraping

I rent out a paid actor to customers, and it is now failing because the target site recently added an anti-bot measure that blocks scraping past page 10 unless you are logged in. I implemented Google login and store the session cookies in a shared key-value store for the actor to use, and this seemed to work fine. However, Google has since flagged the account and its logins as bot activity and terminated the account, so login fails and scraping fails with it. Before the account was terminated, the site I scrape also seemed to throttle my requests. That was without a proxy, so it might be possible to circumvent, but throttling had never been an issue with this site before.
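To make the cookie reuse concrete, here is a minimal sketch of the check that decides between reusing the stored session cookies and logging in again. It is illustrative only: `StoredCookie`, `usableCookies`, `needsLogin`, and the 300-second safety margin are hypothetical names and values, not the actor's actual code. The only thing taken from Playwright is the `expires` convention on its cookie objects (Unix time in seconds, `-1` for session cookies).

```typescript
// Subset of the cookie shape Playwright returns from browserContext.cookies().
// `expires` is a Unix timestamp in seconds; -1 marks a session cookie.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number;
}

// Hypothetical helper: keep only cookies that stay valid for at least
// `marginSeconds` past `nowSeconds`. Session cookies (expires === -1) are kept.
function usableCookies(
  cookies: StoredCookie[],
  nowSeconds: number,
  marginSeconds: number = 300,
): StoredCookie[] {
  return cookies.filter(
    (c) => c.expires === -1 || c.expires > nowSeconds + marginSeconds,
  );
}

// Hypothetical helper: a fresh login is needed when any cookie the site
// requires for an authenticated session is missing or about to expire.
// `requiredNames` is an assumption; the real cookie names depend on the site.
function needsLogin(
  cookies: StoredCookie[],
  requiredNames: string[],
  nowSeconds: number,
  marginSeconds: number = 300,
): boolean {
  const valid = new Set(
    usableCookies(cookies, nowSeconds, marginSeconds).map((c) => c.name),
  );
  return requiredNames.some((name) => !valid.has(name));
}
```

The intent is to log in only when `needsLogin` returns true, so the stored session is reused for as long as it stays valid.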

The site offers login via Google, Facebook, Apple, or email. I chose Google because email login sends a one-time code to the inbox on every login, which I couldn't automate.

I have been trying to resolve this for the past week and was successful until Google terminated the account.

I am using Crawlee with Playwright and run only one browser context at a time, so as not to overwhelm the site or send it requests in batches.

Does anyone have experience dealing with anti-bot measures like this reliably?
