harsh-harlequin, 2y ago

Getting past Google's sign-in page

I'm writing a script to scrape part of Google Maps. I've got Chromium to open, go to Google Maps, and click the cookie consent button, but I'm not sure how to get past the Google sign-in page. What's the secret for this? Thanks.
15 Replies
Lukas Krivka, 2y ago
You shouldn't need to sign in just because of cookie consent
harsh-harlequin (OP), 2y ago
The following script ends up in the signup page.

const puppeteer = require('puppeteer');
const { newInjectedPage } = require('fingerprint-injector');

(async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await newInjectedPage(browser, {
        fingerprintOptions: {
            devices: ['desktop'],
            operatingSystems: ['windows'],
        },
    });

    try {
        await page.goto('https://www.google.com/maps/search/malls+in+London');

        const consentButtonSelector = '[jsname="V67aGc"]'; // Consent button selector

        // Wait for the consent button to be visible
        await page.waitForSelector(consentButtonSelector, { visible: true, timeout: 10000 });

        // Wait for 3 seconds before clicking the button
        await page.waitForTimeout(3000);

        // Click the consent button
        await page.click(consentButtonSelector);

        // Wait for any navigation that might occur after clicking the consent button
        await page.waitForNavigation({ timeout: 10000 });

        // Add your scraping logic after the consent interaction
        // ...
    } catch (error) {
        console.error('An error occurred:', error);
    }

    // Uncomment the line below if you want to keep the browser open for debugging
    // await page.waitForTimeout(20000); // Adjust or remove for production
    // await browser.close();
})();
Lukas Krivka, 2y ago
Not sure why. When I click on '[action^="https://consent.google"] button', it just goes to the website.
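A minimal sketch of the click described in this reply, assuming a Puppeteer `page` that has already navigated to the Maps URL. The selector is the one quoted above; the helper name `acceptConsentIfShown` and the 10-second default timeout are illustrative, and Google can change the consent markup at any time. Starting `waitForNavigation` before the click avoids missing a navigation that fires immediately.

```javascript
// Selector quoted in the reply above; not guaranteed to stay valid.
const CONSENT_BUTTON = '[action^="https://consent.google"] button';

/**
 * Clicks the consent button if one is present on the current page.
 * `page` is expected to be a Puppeteer Page (hypothetical helper).
 * Resolves to true if a button was clicked, false if none was found.
 */
async function acceptConsentIfShown(page, timeout = 10000) {
    // page.$ resolves to null when nothing matches the selector.
    const button = await page.$(CONSENT_BUTTON);
    if (!button) {
        return false;
    }

    // Begin waiting for the navigation before clicking, then await both.
    await Promise.all([
        page.waitForNavigation({ timeout }),
        button.click(),
    ]);
    return true;
}
```

In either script from this thread it could be called as `await acceptConsentIfShown(page);` right after `page.goto(...)`. Note that `page.$` returns the first match in document order, which may or may not be the "accept" button if several forms post to the consent host.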
harsh-harlequin (OP), 2y ago
Oh… well, on the one hand that's good news, I guess. On the other, I've got absolutely no idea what to do about it.
harsh-harlequin (OP), 2y ago
I assume you're using Chromium with Puppeteer?
Lukas Krivka, 2y ago
Yep, with Crawlee
harsh-harlequin (OP), 2y ago
I also tried with Crawlee, but I got the same problem. I'm new to this and normally just use Google Sheets to scrape, or some simple Python. For anything needing more than 1k lines I use Octoparse.
harsh-harlequin (OP), 2y ago
I've also tried this:

const { PuppeteerCrawler } = require('crawlee');

const crawler = new PuppeteerCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            // Set a common user-agent to avoid detection of automated browsing
            args: ['--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"'],
        },
    },
    requestHandler: async ({ page, request }) => {
        const consentButtonSelector = '[jsname="V67aGc"]';

        // Before clicking the consent button, set some cookies/localStorage if needed
        await page.waitForSelector(consentButtonSelector, { visible: true });
        await page.click(consentButtonSelector);

        // Check if we have been redirected to the sign-in page
        await page.waitForNavigation();
        const currentUrl = page.url();

        // If redirected to sign-in, handle accordingly, otherwise proceed
        if (currentUrl.includes('accounts.google.com')) {
            // Logic to handle sign-in page
        } else {
            // We are on the correct page, proceed with scraping
            const pageTitle = await page.title();
            console.log(`Title of ${request.url}: ${pageTitle}`);
        }

        // Additional scraping logic will go here
    },
});

(async () => {
    await crawler.addRequests(['https://www.google.com/maps/search/malls+in+London']);
    await crawler.run();
})();
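The `currentUrl.includes('accounts.google.com')` check in the script above would also match a URL that merely carries that host inside a query parameter (for example a `continue=` value). A slightly stricter sketch compares the parsed hostname instead. `isGoogleSignInUrl` is a hypothetical helper name, and treating `accounts.google.com` as the only sign-in host is an assumption carried over from the script.

```javascript
/**
 * Returns true only when the URL's host is accounts.google.com.
 * Hypothetical helper; the host name is taken from the script above.
 */
function isGoogleSignInUrl(url) {
    try {
        return new URL(url).hostname === 'accounts.google.com';
    } catch {
        // new URL() throws on strings that are not absolute URLs.
        return false;
    }
}
```

In the request handler it could replace the substring test: `if (isGoogleSignInUrl(page.url())) { ... }`.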
harsh-harlequin (OP), 2y ago
Do you think I need to modify the Chromium settings in some way?
Lukas Krivka, 2y ago
I still have no idea why it would redirect to sign-up. It never did for me, manually or with Puppeteer.
harsh-harlequin (OP), 2y ago
If I do it manually, it doesn't redirect. Are you signed in to a Google account in Chromium?
Lukas Krivka, 2y ago
No. No idea why it would happen. Make sure you use proxies.
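One way to act on the proxy suggestion with plain Puppeteer is Chromium's `--proxy-server` switch, with any credentials supplied separately through `page.authenticate`. The sketch below only builds the options. `proxyOptions` is a hypothetical helper and `proxy.example.com` is a placeholder, not a real proxy. With Crawlee, passing a `ProxyConfiguration` to the crawler is likely the more natural route.

```javascript
/**
 * Hypothetical helper: splits a proxy URL into Puppeteer launch options
 * and optional credentials for page.authenticate().
 */
function proxyOptions(proxyUrl) {
    const { protocol, host, username, password } = new URL(proxyUrl);
    return {
        launchOptions: {
            headless: false,
            // Chromium takes the proxy address without embedded credentials.
            args: [`--proxy-server=${protocol}//${host}`],
        },
        // URL keeps credentials percent-encoded, so decode them here.
        credentials: username
            ? {
                username: decodeURIComponent(username),
                password: decodeURIComponent(password),
            }
            : null,
    };
}

// Usage (assumes puppeteer is installed and the placeholder is replaced):
// const { launchOptions, credentials } =
//     proxyOptions('http://user:pass@proxy.example.com:8000');
// const browser = await puppeteer.launch(launchOptions);
// const page = await browser.newPage();
// if (credentials) await page.authenticate(credentials);
```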
harsh-harlequin (OP), 2y ago
OK. Thanks. I'll keep trying.
