Apify Discord Mirror

Updated last year

How to access browser instance in Playwright Crawler?

At a glance

The community member is trying to port their scrapers from Selenium/Python to Crawlee, mainly due to the anti-bot protections built into Crawlee. They are having trouble translating their functions 1-to-1 from Selenium to Crawlee, as a lot of it depends on the Selenium driver or Playwright's browser instance. For example, they need to click on an element to get a link because there's a redirect in between, and they need to wait for it before grabbing the link, but they can't use enqueueLinksByClickingElements because they need it in the same request for their dataset to be complete.

The community member asks if it's possible to achieve this functionality with Crawlee, or if there are any workarounds they can use. Another community member suggests using context.browserController.browser to access the full browser API that Playwright provides, which seems to solve the issue. The community members test this approach and confirm that it works.

I have been trying to port our scrapers from Selenium/Python to Crawlee, mainly because of the anti-bot protections already built into it. The issue I am facing is that I am having a hard time translating our functions 1-to-1 from Selenium to Crawlee, because a lot of it depends on the Selenium driver or, in Playwright's case, the browser instance. For example:

I need to click on an element to get the link because there's a redirect in between, and I need to wait for it before grabbing the link. I can't use enqueueLinksByClickingElements because I need it in the same request for my dataset to be complete.

There are other such issues I am having trouble with. I know we have Page exposed, but that's just a single tab in a browser's context, and I need more control than that for my use case.

Is this something that's possible with Crawlee? Or are there any workarounds that I can use for this same functionality?
14 comments
Hey,
Are you talking about context.page.browser()?

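import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    requestHandler: async (context) => {
        // Not context.page.browser(): Playwright's Page has no browser() method.
        // The browser instance is exposed on the browser controller instead.
        const browser = context.browserController.browser;
    },
});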
Will this give me the full browser API that Playwright has? If so, then this is exactly what I need.
I went through the docs on Apify
but couldn't figure out how to get those inside the crawler.
If this is the same, then I can simply refer to Playwright's docs for opening new tabs etc. and have the crawler handle the anti-bot stuff.
Not the most optimized thing, but I'll refactor once I am more familiar with Crawlee.
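To illustrate the "opening new tabs" idea, here is a minimal sketch. It assumes, as the thread reports, that context.browserController.browser behaves like a Playwright Browser. It uses only Playwright's Browser.newPage, Page.goto, Page.title and Page.close. The BrowserLike and TabLike interfaces and the readTitleInNewTab helper are hypothetical names introduced here so that the snippet compiles without Playwright installed.

```typescript
// Structural subsets of Playwright's Browser and Page (hypothetical
// interfaces; the real Playwright objects satisfy them).
interface TabLike {
    goto(url: string): Promise<unknown>;
    title(): Promise<string>;
    close(): Promise<void>;
}

interface BrowserLike {
    newPage(): Promise<TabLike>;
}

// Opens `url` in a separate tab, reads its title, and always closes the tab
// afterwards so it does not leak if navigation fails.
async function readTitleInNewTab(browser: BrowserLike, url: string): Promise<string> {
    const tab = await browser.newPage();
    try {
        await tab.goto(url);
        return await tab.title();
    } finally {
        await tab.close();
    }
}
```

Inside a requestHandler this would be called as readTitleInNewTab(context.browserController.browser, someUrl), where someUrl is a placeholder. One caveat: in Playwright, browser.newPage() creates the page in a fresh browser context, so it will not share cookies with context.page, and a tab opened by hand is presumably not tracked by Crawlee the way context.page is. If the same cookies are needed, Playwright's page.context().newPage() may be the better fit.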
Please test it and let us know if it solves your problem.
will do thanks!
I just tried this snippet, and it just says browser does not exist?
I am sorry, it should be context.browserController.browser instead. Can you try it?
it worked thanks!
I'll try out the functions from Playwright's docs and see how it works.
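For the click-then-redirect case from the original question, something along these lines may work inside a single requestHandler, without enqueueLinksByClickingElements. This is a sketch only. It relies on Playwright's page.click, page.waitForURL and page.url. The PageLike interface, the resolveRedirectedLink helper and the selector are hypothetical names introduced here, and the structural type exists only so that the snippet compiles without Playwright installed.

```typescript
// Structural subset of Playwright's Page used by this sketch (hypothetical
// interface; a real Playwright Page satisfies it).
interface PageLike {
    url(): string;
    click(selector: string): Promise<void>;
    waitForURL(
        predicate: (url: { href: string }) => boolean,
        options?: { timeout?: number },
    ): Promise<void>;
}

// Clicks `selector`, waits until the page URL differs from the starting URL,
// and returns the URL the page ended up on.
async function resolveRedirectedLink(
    page: PageLike,
    selector: string,
    timeoutMs = 30000,
): Promise<string> {
    const startUrl = page.url();
    // Start waiting before the click so a fast redirect is not missed;
    // Promise.all also surfaces a failure from either step.
    await Promise.all([
        page.waitForURL((url) => url.href !== startUrl, { timeout: timeoutMs }),
        page.click(selector),
    ]);
    return page.url();
}
```

Inside the requestHandler this would be called as await resolveRedirectedLink(context.page, 'a.redirect') (the selector is a placeholder), and the returned URL can be stored with the rest of the record in the same request. If the redirect passes through an intermediate page, the predicate would need to match the final URL more specifically. If the click opens a new tab instead, Playwright's page.waitForEvent('popup') is the call to look at.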