rival-black•3y ago
Accessing browser.newPage() inside PuppeteerCrawler
Hi, I'm trying to integrate puppeteer-extra-plugin-recaptcha into my crawler, and I've gotten everything working except for one bit: the documentation says I need to create a new page with `browser.newPage()`.
However, I can't figure out where I can hook the page with that call to get the captcha integration working properly. My thoughts were that it would need to be done in the preNavigationHooks - maybe through crawlingContext?
Any ideas/pointers would be greatly appreciated!
5 Replies
`preNavigationHooks` in `PuppeteerCrawlerOptions` should work for you:
https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks
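The example originally attached to this reply did not survive, so here is a minimal sketch of what a `preNavigationHooks` entry might look like. The hook name `beforeNavigation`, the viewport size, the `waitUntil` value, and the URL are placeholders, not anything from the thread. The point is that the hook receives a `page` Crawlee has already opened.

```js
// Runs before each page.goto(). Crawlee has already created `page` by this point,
// so there is no need to call browser.newPage() yourself.
async function beforeNavigation({ page, request, log }, gotoOptions) {
  await page.setViewport({ width: 1280, height: 800 }); // placeholder size
  gotoOptions.waitUntil = 'networkidle2'; // passed on to page.goto()
  log.info(`About to navigate to ${request.url}`);
}

// Options object for PuppeteerCrawler; the hook is registered here.
const crawlerOptions = {
  preNavigationHooks: [beforeNavigation],
  requestHandler: async ({ page, request, log }) => {
    log.info(`${request.url}: ${await page.title()}`);
  },
};

// Assumes `crawlee` and `puppeteer` are installed; required lazily so the
// options above can be inspected without launching a browser.
async function main() {
  const { PuppeteerCrawler } = require('crawlee');
  const crawler = new PuppeteerCrawler(crawlerOptions);
  await crawler.run(['https://example.com']); // placeholder URL
}
```

Call `main()` from your entry point to start the crawl.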
rival-blackOP•3y ago
that part makes sense, but is there any way to set the page to browser.newPage() or does it do that in the background automatically?
rare-sapphire•3y ago
In the background, Crawlee already runs `browser.newPage()` for you. Launching browsers or creating pages yourself can cause issues. You can pass in your own launcher (basically what you get from Puppeteer) to the crawler: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerLaunchContext#launcher
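As a rough sketch of the launcher approach for the recaptcha plugin from the original question: wrap Puppeteer with puppeteer-extra, register the plugin, and hand the result to `launchContext.launcher`. The helper `buildCrawlerOptions`, the `TWOCAPTCHA_TOKEN` environment variable, and the URL are hypothetical names for illustration; check the plugin's README for the provider settings that apply to you.

```js
// Build PuppeteerCrawler options around whatever launcher the caller supplies.
function buildCrawlerOptions(launcher) {
  return {
    launchContext: {
      launcher, // e.g. puppeteer-extra with plugins registered
      launchOptions: { headless: true },
    },
    requestHandler: async ({ page, request, log }) => {
      // solveRecaptchas() is added to pages by puppeteer-extra-plugin-recaptcha.
      await page.solveRecaptchas();
      log.info(`Handled ${request.url}`);
    },
  };
}

// Assumes crawlee, puppeteer, puppeteer-extra and the recaptcha plugin are installed.
async function main() {
  const { PuppeteerCrawler } = require('crawlee');
  const puppeteerExtra = require('puppeteer-extra');
  const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');

  puppeteerExtra.use(
    RecaptchaPlugin({
      // TWOCAPTCHA_TOKEN is a hypothetical environment variable name.
      provider: { id: '2captcha', token: process.env.TWOCAPTCHA_TOKEN },
    }),
  );

  const crawler = new PuppeteerCrawler(buildCrawlerOptions(puppeteerExtra));
  await crawler.run(['https://example.com']); // placeholder URL
}
```

Call `main()` from your entry point. Crawlee still opens every page itself, so the plugin's page methods are available inside `requestHandler` without a manual `browser.newPage()`.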
Basically, you import puppeteer, wrap it with puppeteer-extra, and then provide that to the crawler in its options.