stormy-gold (3y ago)

puppeteer.connect()

Hey there! Is there still a way to connect Crawlee to a remote browser instance using the browserWSEndpoint parameter, the way you normally would when calling puppeteer.connect()?
3 Replies
Lukas Krivka (3y ago)
You can first create your Puppeteer instance, connect, and then pass it to the crawler via the launcher: https://crawlee.dev/docs/examples/playwright-crawler-firefox
Sorry, this is nonsense: connect() gives you back a single browser instance. In this case I think you would need to dig deeper into the browser-pool subpackage and provide your own plugin, but I don't know exactly how right now: https://github.com/apify/crawlee/tree/master/packages/browser-pool
Lukas Krivka (3y ago)
The key part you would need to override in the plugin is here: https://github.com/apify/crawlee/blob/master/packages/browser-pool/src/browser-pool.ts#LL633C35-L633C71
You would also need to make sure there is at most one browser open at a time if you are not going to connect to new browsers.
Lukas Krivka (3y ago)
It might be easier to just use BasicCrawler and create new pages there manually.
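A rough sketch of that BasicCrawler idea, in case it helps. createRemotePageHandler is a made-up helper name (not part of Crawlee), and PageLike / BrowserLike are minimal stand-in types that a real Puppeteer Browser and Page should satisfy. The wiring at the bottom is left as comments because it needs a live endpoint; 'ws://host:3000' is just a placeholder and I have not run it against a real remote browser.

```typescript
// Minimal structural types so the sketch does not depend on a specific
// Puppeteer version; a real Puppeteer Browser/Page should satisfy them.
interface PageLike {
    goto(url: string): Promise<unknown>;
    title(): Promise<string>;
    close(): Promise<void>;
}

interface BrowserLike {
    newPage(): Promise<PageLike>;
}

// Hypothetical helper (not a Crawlee API): builds a request handler that
// opens a fresh page on the already-connected browser for each request and
// always closes that page afterwards, even if navigation or onPage throws.
function createRemotePageHandler(
    browser: BrowserLike,
    onPage: (page: PageLike, url: string) => Promise<void>,
) {
    return async ({ request }: { request: { url: string } }): Promise<void> => {
        const page = await browser.newPage();
        try {
            await page.goto(request.url);
            await onPage(page, request.url);
        } finally {
            await page.close();
        }
    };
}

// Assumed wiring (placeholder endpoint and URL):
//
//   import { BasicCrawler } from 'crawlee';
//   import puppeteer from 'puppeteer';
//
//   const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://host:3000' });
//   const crawler = new BasicCrawler({
//       requestHandler: createRemotePageHandler(browser, async (page, url) => {
//           console.log(url, await page.title());
//       }),
//   });
//   await crawler.run(['https://example.com']);
//   await browser.disconnect(); // detach without closing the remote browser
```

Since every request shares the one connected browser, the crawler's maxConcurrency option is probably the simplest way to cap how many pages are open at once. You lose the browser-pool features (browser rotation, retiring browsers) this way, which seems to be the trade-off of not writing a plugin.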
