Apify Discord Mirror

Updated last month

Connecting to a remote browser instance?

At a glance

The post asks whether there is a way to specify a WebSocket endpoint in the PlaywrightCrawler config to connect to a remote browser. Community members suggest a few potential solutions:

1. Write a custom PlaywrightPlugin that uses this.library.connectOverCDP() instead of this.library.launch(), and provide it to the PlaywrightCrawler via the browserPoolOptions option.

2. Use a BasicCrawler and manage the browser page manually, connecting to the remote browser using chromium.connectOverCDP().

However, community members note that these solutions may not be straightforward, and there is an open issue on the Crawlee repository about this feature request.

Useful resources
Is there a way we can specify a web socket endpoint in the PlaywrightCrawler config (or somewhere else) so we can connect to a remote browser?
Hi,

It looks like the solution is not straightforward. You may try to write your own PlaywrightPlugin, replacing every this.library.launch() call with this.library.connectOverCDP('http://hostname:port') (e.g. http://localhost:9222), and then provide it to the PlaywrightCrawler via the browserPoolOptions parameter (check the code of PlaywrightCrawlerOptions for more details).
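For context, this.library in a PlaywrightPlugin is the Playwright browser type passed to the plugin (e.g. chromium), so the replacement being suggested boils down to something like the following standalone Playwright call. This is only a sketch: the endpoint is a placeholder and assumes a Chromium-based browser already running with --remote-debugging-port=9222.

import { chromium } from 'playwright';

// Placeholder endpoint; connectOverCDP also accepts a ws:// DevTools URL.
const CDP_ENDPOINT = 'http://localhost:9222';

const browser = await chromium.connectOverCDP(CDP_ENDPOINT);
// connectOverCDP exposes the browser's default context as contexts()[0].
const context = browser.contexts()[0] ?? (await browser.newContext());
const page = await context.newPage();
await page.goto('https://example.com');
console.log(await page.title());
await browser.close(); // closes the CDP connection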
Thanks for the response! Unfortunately, when I try that I get an error:
Error: browserPoolOptions.browserPlugins is disallowed. Use launchContext.launcher instead.
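For reference, the documented role of launchContext.launcher in PlaywrightCrawlerOptions is to choose which Playwright browser type gets launched locally (e.g. firefox instead of chromium); it is not a place to pass a remote endpoint, so the hint in the error message does not by itself solve this thread's problem. A minimal sketch of its intended use, with an illustrative start URL:

import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const crawler = new PlaywrightCrawler({
    launchContext: {
        // A Playwright BrowserType; Crawlee launches it locally.
        launcher: firefox,
    },
    async requestHandler({ request, page, log }) {
        log.info(`Title of ${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://example.com']);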
Hello! I have the same task with a remote browser.
Did you find a solution with launchContext.launcher? Could you share it?
I was thinking about another solution: you can create a BasicCrawler and manage your browser page by yourself, for example:
Plain Text
import { chromium } from 'playwright';
import { newInjectedContext } from 'fingerprint-injector';

const BROWSER_URL = 'http://127.0.0.1:9222'; // or something like 'ws://127.0.0.1:36775/devtools/browser/a292f96c-7332-4ce8-82a9-7411f3bd280a'

// ... inside your BasicCrawler
async requestHandler({ request, sendRequest, log }) {
    // Initialize your browser
    const browser = await chromium.connectOverCDP(BROWSER_URL);
    const context = await newInjectedContext(browser); // See https://github.com/apify/fingerprint-suite
    const page = await context.newPage();
    
    try {
        await page.goto(request.url, {timeout: 20000});
    
        // ... extract data here
        
    } finally {
        await page.close();
        await context.close();
        await browser.close();
    }
}
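For completeness, here is a sketch of how the snippet above could be wired into a runnable crawler. The per-request connectOverCDP and newInjectedContext flow is taken from the snippet; the BasicCrawler wiring, the start URL, and the assumption that the remote browser was started with --remote-debugging-port=9222 are illustrative.

import { BasicCrawler } from 'crawlee';
import { chromium } from 'playwright';
import { newInjectedContext } from 'fingerprint-injector';

// Assumes a Chromium-based browser is already running with its CDP
// interface exposed, e.g. started with --remote-debugging-port=9222.
const BROWSER_URL = 'http://127.0.0.1:9222';

const crawler = new BasicCrawler({
    async requestHandler({ request, log }) {
        // Connect to the remote browser for this request.
        const browser = await chromium.connectOverCDP(BROWSER_URL);
        const context = await newInjectedContext(browser); // See https://github.com/apify/fingerprint-suite
        const page = await context.newPage();
        try {
            await page.goto(request.url, { timeout: 20000 });
            log.info(`Loaded ${request.url}: ${await page.title()}`);
            // ... extract data here
        } finally {
            await page.close();
            await context.close();
            await browser.close(); // closes the CDP connection
        }
    },
});

await crawler.run(['https://example.com']);

Connecting and disconnecting per request keeps the handler self-contained; reusing a single connection across requests would avoid the repeated handshake but needs its own lifecycle handling.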
Yeah, I don't think this is possible with e.g. PlaywrightCrawler, but if there were bigger demand it could technically be implemented. There is actually an issue for this: https://github.com/apify/crawlee/issues/1822
Hi @Lukas Krivka

Any update on this feature? I checked the GitHub issue and it is still open. We are building scraper functionality into our AI agent and hoping to use Crawlee for the scraping part, but we need to connect to a remote browser.