rival-black (3y ago)

Accessing browser.newPage() inside PuppeteerCrawler

Hi, I'm trying to integrate puppeteer-extra-plugin-recaptcha into my crawler, and I've gotten everything working except for one bit: the documentation says I need to create a new page with
const page = await browser.newPage()
However, I can't figure out where to hook in that call so the captcha integration works properly. My guess is that it needs to happen in preNavigationHooks, maybe through crawlingContext? Any ideas or pointers would be greatly appreciated!
5 Replies
Oleg V. (3y ago)
preNavigationHooks in PuppeteerCrawlerOptions should work for you: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks. Example:
preNavigationHooks: [
    async ({ page }, gotoOptions) => {
        gotoOptions.waitUntil = 'domcontentloaded';
        // `page` is the page Crawlee already opened for this request; use it here
        await page.title();
    },
],
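Put together, a crawler using a pre-navigation hook might look like the sketch below. It assumes Crawlee v3 with the `crawlee` and `puppeteer` packages installed; the viewport size and the `run` helper are illustrative choices, not part of Crawlee. The point is that both the hook and the `requestHandler` receive a page Crawlee has already created.

```javascript
// Minimal sketch, assuming Crawlee v3 with the `crawlee` and `puppeteer` npm
// packages installed. Crawlee creates the Page; the hooks only receive it.

/** Runs before Crawlee navigates the page to request.url. */
async function preNavigationHook({ page }, gotoOptions) {
    // Treat navigation as finished once the DOM has been parsed.
    gotoOptions.waitUntil = 'domcontentloaded';
    // `page` is a regular Puppeteer Page, so its methods can be used here.
    await page.setViewport({ width: 1280, height: 800 });
}

/** Runs after navigation and receives the same Page object. */
async function requestHandler({ page, request, log }) {
    log.info(`${request.url}: ${await page.title()}`);
}

/** Builds and runs the crawler; `crawlee` is only loaded when this is called. */
async function run(startUrls) {
    const { PuppeteerCrawler } = require('crawlee');
    const crawler = new PuppeteerCrawler({
        preNavigationHooks: [preNavigationHook],
        requestHandler,
    });
    await crawler.run(startUrls);
}
```

Calling `run(['https://crawlee.dev'])` starts the crawl. Loading `crawlee` inside `run` only keeps the hook functions usable on their own; a top-level `require` works just as well.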
rival-black (OP, 3y ago)
That part makes sense, but is there any way to set the page to the result of browser.newPage(), or does Crawlee do that automatically in the background?
rare-sapphire (3y ago)
Crawlee already calls browser.newPage() for you in the background, and the `page` you get in the crawling context is that page. Launching browsers or creating pages yourself can cause issues.
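If some setup really has to run at the moment a page is created (where you would otherwise call `browser.newPage()` yourself), a sketch assuming Crawlee v3's `browserPoolOptions.postPageCreateHooks` follows. These hooks run after Crawlee's browser pool has opened a page and before navigation. The 60-second timeout and the `run` helper are illustrative assumptions.

```javascript
// Sketch, assuming Crawlee v3: hooks in `browserPoolOptions.postPageCreateHooks`
// run right after Crawlee's browser pool has opened a new page, before navigation.

/** Called once for every page the pool creates. */
function afterPageCreated(page) {
    // The hook also receives a browser controller as its second argument,
    // which this example does not need.
    // This is the same Page later passed to preNavigationHooks and requestHandler.
    page.setDefaultNavigationTimeout(60_000);
}

/** Builds and runs the crawler; `crawlee` is only loaded when this is called. */
async function run(startUrls) {
    const { PuppeteerCrawler } = require('crawlee');
    const crawler = new PuppeteerCrawler({
        browserPoolOptions: { postPageCreateHooks: [afterPageCreated] },
        async requestHandler({ request, log }) {
            log.info(`Processed ${request.url}`);
        },
    });
    await crawler.run(startUrls);
}
```

This keeps page creation inside Crawlee, which is what avoids the issues mentioned above, while still giving you a per-page setup point.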
Lukas Krivka (3y ago)
You can pass your own launcher to the crawler, i.e. basically the object you get from Puppeteer: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerLaunchContext#launcher
Lukas Krivka (3y ago)
Basically, you import puppeteer, wrap it with puppeteer-extra, and then pass that into the crawler options.
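Following that description, here is a sketch of wiring puppeteer-extra and the recaptcha plugin into PuppeteerCrawler. Assumptions: Crawlee v3; the `puppeteer`, `puppeteer-extra` and `puppeteer-extra-plugin-recaptcha` packages; the 2captcha provider; and a token in a `TWOCAPTCHA_TOKEN` environment variable (a hypothetical name). `createLauncher`, `buildCrawlerOptions` and `run` are illustrative helpers. Because the plugin is registered on the launcher, the pages Crawlee opens through it should already have `solveRecaptchas()`, so no manual `browser.newPage()` is needed.

```javascript
// Sketch, assuming these npm packages: crawlee, puppeteer, puppeteer-extra,
// puppeteer-extra-plugin-recaptcha. TWOCAPTCHA_TOKEN is a hypothetical env var.

/** Wraps vanilla Puppeteer with puppeteer-extra and registers the recaptcha plugin. */
function createLauncher(token) {
    const vanillaPuppeteer = require('puppeteer');
    const { addExtra } = require('puppeteer-extra');
    const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
    const puppeteer = addExtra(vanillaPuppeteer);
    puppeteer.use(RecaptchaPlugin({ provider: { id: '2captcha', token }, visualFeedback: true }));
    return puppeteer;
}

/** Builds PuppeteerCrawler options around the given launcher. */
function buildCrawlerOptions(launcher) {
    return {
        // Crawlee launches browsers through this object instead of plain puppeteer.
        launchContext: { launcher },
        async requestHandler({ page, request, log }) {
            // solveRecaptchas() is added to pages by the recaptcha plugin.
            const { solved } = await page.solveRecaptchas();
            log.info(`${request.url}: solved ${solved.length} captcha(s)`);
        },
    };
}

/** Creates the launcher and runs the crawler on the given URLs. */
async function run(startUrls) {
    const { PuppeteerCrawler } = require('crawlee');
    const launcher = createLauncher(process.env.TWOCAPTCHA_TOKEN);
    const crawler = new PuppeteerCrawler(buildCrawlerOptions(launcher));
    await crawler.run(startUrls);
}
```

Keeping `buildCrawlerOptions` separate from `createLauncher` means the crawler setup does not care which launcher it gets; plain `require('puppeteer')` can be passed in when captcha solving is not needed.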