Apify Discord Mirror

Updated last month

Connecting to a remote browser instance?

At a glance

The post asks whether there is a way to specify a WebSocket endpoint in the PlaywrightCrawler config to connect to a remote browser. Community members suggest a few potential solutions:

1. Write a custom PlaywrightPlugin that uses this.library.connectOverCDP() instead of this.library.launch(), and provide it to the PlaywrightCrawler via the browserPoolOptions option.

2. Use a BasicCrawler and manage the browser page manually, connecting to the remote browser using chromium.connectOverCDP().

However, community members note that these solutions may not be straightforward, and there is an open issue on the Crawlee repository about this feature request.

Useful resources
Is there a way we can specify a web socket endpoint in the PlaywrightCrawler config (or somewhere else) so we can connect to a remote browser?
Hi,

It looks like the solution is not straightforward. You may try to write your own PlaywrightPlugin, replacing every this.library.launch() call with this.library.connectOverCDP('http://hostname:port') (e.g. http://localhost:9222), and then provide it to the PlaywrightCrawler via the browserPoolOptions parameter (check the code of PlaywrightCrawlerOptions for more details).
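For context, this.library in a PlaywrightPlugin is the Playwright browser type passed to the plugin (e.g. chromium), so the replacement being suggested boils down to something like the following standalone Playwright call. This is only a sketch: the endpoint is a placeholder and assumes a Chromium-based browser already running with --remote-debugging-port=9222.

import { chromium } from 'playwright';

// Placeholder endpoint; connectOverCDP also accepts a ws:// DevTools URL.
const CDP_ENDPOINT = 'http://localhost:9222';

const browser = await chromium.connectOverCDP(CDP_ENDPOINT);
// connectOverCDP exposes the browser's default context as contexts()[0].
const context = browser.contexts()[0] ?? (await browser.newContext());
const page = await context.newPage();
await page.goto('https://example.com');
console.log(await page.title());
await browser.close(); // closes the CDP connection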
Thanks for the response! Unfortunately, when I try that I get an error:
Error: browserPoolOptions.browserPlugins is disallowed. Use launchContext.launcher instead.
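For reference, the documented role of launchContext.launcher in PlaywrightCrawlerOptions is to choose which Playwright browser type gets launched locally (e.g. firefox instead of chromium); it is not a place to pass a remote endpoint, so the hint in the error message does not by itself solve this thread's problem. A minimal sketch of its intended use, with an illustrative start URL:

import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const crawler = new PlaywrightCrawler({
    launchContext: {
        // A Playwright BrowserType; Crawlee launches it locally.
        launcher: firefox,
    },
    async requestHandler({ request, page, log }) {
        log.info(`Title of ${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://example.com']);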
Hello! I have the same task with a remote browser.
Did you find a solution with launchContext.launcher? Could you share it?
I was thinking about another solution: you can create a BasicCrawler and manage your browser page by yourself, for example:
Plain Text
import { chromium } from 'playwright';
import { newInjectedContext } from 'fingerprint-injector';

const BROWSER_URL = 'http://127.0.0.1:9222'; // or something like 'ws://127.0.0.1:36775/devtools/browser/a292f96c-7332-4ce8-82a9-7411f3bd280a'

// ... inside your BasicCrawler
async requestHandler({ request, sendRequest, log }) {
    // Initialize your browser
    const browser = await chromium.connectOverCDP(BROWSER_URL);
    const context = await newInjectedContext(browser); // See https://github.com/apify/fingerprint-suite
    const page = await context.newPage();
    
    try {
        await page.goto(request.url, {timeout: 20000});
    
        // ... extract data here
        
    } finally {
        await page.close();
        await context.close();
        await browser.close();
    }
}
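For completeness, here is a sketch of how the snippet above could be wired into a runnable crawler. The per-request connectOverCDP and newInjectedContext flow is taken from the snippet; the BasicCrawler wiring, the start URL, and the assumption that the remote browser was started with --remote-debugging-port=9222 are illustrative.

import { BasicCrawler } from 'crawlee';
import { chromium } from 'playwright';
import { newInjectedContext } from 'fingerprint-injector';

// Assumes a Chromium-based browser is already running with its CDP
// interface exposed, e.g. started with --remote-debugging-port=9222.
const BROWSER_URL = 'http://127.0.0.1:9222';

const crawler = new BasicCrawler({
    async requestHandler({ request, log }) {
        // Connect to the remote browser for this request.
        const browser = await chromium.connectOverCDP(BROWSER_URL);
        const context = await newInjectedContext(browser); // See https://github.com/apify/fingerprint-suite
        const page = await context.newPage();
        try {
            await page.goto(request.url, { timeout: 20000 });
            log.info(`Loaded ${request.url}: ${await page.title()}`);
            // ... extract data here
        } finally {
            await page.close();
            await context.close();
            await browser.close(); // closes the CDP connection
        }
    },
});

await crawler.run(['https://example.com']);

Connecting and disconnecting per request keeps the handler self-contained; reusing a single connection across requests would avoid the repeated handshake but needs its own lifecycle handling.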
Yeah, I don't think this is possible with e.g. PlaywrightCrawler, but if there were bigger demand it could technically be implemented. There is actually an issue for this: https://github.com/apify/crawlee/issues/1822
Hi @Lukas Krivka

Any update on this feature? I checked the GitHub issue and it is still open. We are building scraper functionality into our AI agent and hoping to use Crawlee for the scraping part, but we need to connect to a remote browser.