Apify & Crawlee • 4mo ago • 14 replies
Casper

Problem with scraping a site that requires login

🎭 PlaywrightCrawler · 👨‍💻 Web-Scraping
I rent out a paid actor to customers, and it is now failing because of a recent anti-bot mitigation on the target site that blocks scraping past page 10 without logging in. I implemented Google login and store the session cookies in a shared key-value store for the actor to use, and this seemed to work fine. However, Google has flagged the account and its logins as a bot and has since terminated the account, so login fails and scraping fails with it. Before the Google account was terminated, the site I scrape also seemed to throttle my requests. That was without a proxy, so it might be possible to circumvent, but throttling has never been an issue with this site before.
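For context, the cookie reuse described above can be sketched roughly as follows. This is a minimal illustration and not the actor's actual code: it assumes the stored cookies have the shape Playwright uses (`expires` as Unix time in seconds, `-1` for session cookies), and the cookie names, the domain, and the five-minute safety margin are all made up. The idea is to check the stored cookies before each run and only perform a fresh login when a required cookie is missing or about to expire, which keeps the number of logins low.

```typescript
// Shape of a stored cookie (only the fields used here), following Playwright's
// convention: `expires` is Unix time in seconds, and -1 marks a session cookie.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number;
}

// Keep session cookies and cookies that stay valid for at least `marginSec` more seconds.
function usableCookies(cookies: StoredCookie[], nowMs: number, marginSec = 300): StoredCookie[] {
  const nowSec = nowMs / 1000;
  return cookies.filter((c) => c.expires === -1 || c.expires - marginSec > nowSec);
}

// A fresh login is needed when any cookie the site relies on is missing or about to expire.
// `requiredNames` is hypothetical: the real names depend on the target site.
function needsLogin(cookies: StoredCookie[], requiredNames: string[], nowMs: number): boolean {
  const valid = new Set(usableCookies(cookies, nowMs).map((c) => c.name));
  return requiredNames.some((name) => !valid.has(name));
}

// Example with made-up cookie names and a fixed clock so the result is reproducible.
const now = 1700000000000;
const stored: StoredCookie[] = [
  { name: "session_id", value: "abc", domain: ".example.com", path: "/", expires: 1700086400 },
  { name: "csrf", value: "xyz", domain: ".example.com", path: "/", expires: 1700000100 },
];

console.log(needsLogin(stored, ["session_id"], now)); // false: session_id is valid for another day
console.log(needsLogin(stored, ["session_id", "csrf"], now)); // true: csrf expires within the margin
```

In a real run, `stored` would be read from the shared key-value store and the usable cookies handed to the browser context before navigation.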

The site offers Google, Facebook, Apple, or email login. I chose Google because email login sends a login code to the mailbox on every login, which I couldn't automate.

I have been trying to resolve this for the past week and was successful until the Google account was terminated.

I am using Crawlee with Playwright and run only 1 concurrent browser context so that I don't overwhelm the site or send requests in bursts.
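One way to keep a single-context crawl from hitting the site in a fixed rhythm when it starts throttling is exponential backoff with jitter. The sketch below is library-independent and only an illustration: the function names, the default delays, and the idea that throttling shows up as a retryable result (for example an HTTP 429 response) are assumptions, not something confirmed for this site.

```typescript
// Delay before retry number `attempt` (0-based): exponential growth capped at `maxMs`,
// scaled by a random factor in [0.5, 1) so retries do not follow a fixed rhythm.
// `random` is injectable so the function can be checked deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs = 2000,
  maxMs = 60000,
  random: () => number = Math.random,
): number {
  const capped = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.round(capped * (0.5 + random() / 2));
}

// Promise-based pause.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `task`, and while `shouldRetry` reports a throttled result, wait and try again,
// up to `maxAttempts` total attempts. Returns the last result either way.
async function withBackoff<T>(
  task: () => Promise<T>,
  shouldRetry: (result: T) => boolean,
  maxAttempts = 5,
): Promise<T> {
  let result = await task();
  for (let attempt = 0; attempt < maxAttempts - 1 && shouldRetry(result); attempt++) {
    await sleep(backoffDelayMs(attempt));
    result = await task();
  }
  return result;
}

// Lower bound of each delay (random factor pinned to 0.5) for the first four retries.
console.log([0, 1, 2, 3].map((a) => backoffDelayMs(a, 2000, 60000, () => 0)));
// [ 1000, 2000, 4000, 8000 ]
```

A caller would wrap one page fetch in `withBackoff` and pass a `shouldRetry` that recognises however the site signals throttling.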

Do you have experience dealing with such anti-bot measures reliably?