How to access the browser instance in Playwright Crawler?
At a glance
The community member is trying to port their scrapers from Selenium/Python to Crawlee, mainly because of the anti-bot protections built into Crawlee. They are having trouble translating their functions 1-to-1 from Selenium to Crawlee, because much of their code depends on the Selenium driver or, in Playwright's case, the browser instance. For example, they need to click an element to get a link because there is a redirect in between, and they must wait for the redirect to finish before grabbing the link. They can't use enqueueLinksByClickingElements, because the link has to be captured in the same request for their dataset to be complete.
The community member asks whether this is possible with Crawlee, or whether there are workarounds they can use. Another community member suggests using context.browserController.browser to access the full browser API that Playwright provides. The community members tested this approach and confirmed that it works.
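A minimal sketch of the suggested approach, in TypeScript since Crawlee is a Node.js library. Inside a PlaywrightCrawler request handler the crawling context carries `browserController`, and `browserController.browser` is the browser that the current `page` belongs to, so an extra tab can be opened and closed within the same request. The `TabLike`, `BrowserLike` and `ContextWithBrowser` interfaces and the `withExtraTab` helper are hypothetical: they are hand-written subsets of Playwright's `Page` and `Browser` so the sketch runs without installing anything, while the real crawling context should still fit them. Depending on the crawler's launch options, `browserController.browser` may be a thin wrapper around a single browser context rather than a full Playwright `Browser`, so this sketch only relies on `newPage()`.

```typescript
// Structural subsets of Playwright's Page and Browser (assumption: only the
// members used here), so this sketch runs without installing Crawlee.
interface TabLike {
  goto(url: string): Promise<unknown>;
  url(): string;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<TabLike>;
}

// The part of Crawlee's crawling context this sketch relies on.
interface ContextWithBrowser {
  browserController: { browser: BrowserLike };
}

// Hypothetical helper: opens an extra tab on the crawler's browser, runs
// `work` in it, and always closes the tab afterwards, since pages opened
// directly on the browser are (presumably) not cleaned up by Crawlee.
async function withExtraTab<T>(
  ctx: ContextWithBrowser,
  work: (tab: TabLike) => Promise<T>,
): Promise<T> {
  const tab = await ctx.browserController.browser.newPage();
  try {
    return await work(tab);
  } finally {
    await tab.close();
  }
}

// Possible wiring inside a PlaywrightCrawler (not executed here):
//
// const crawler = new PlaywrightCrawler({
//   async requestHandler(ctx) {
//     const detailUrl = await withExtraTab(ctx, async (tab) => {
//       await tab.goto('https://example.com/detail'); // placeholder URL
//       return tab.url();
//     });
//     await Dataset.pushData({ url: ctx.request.url, detailUrl });
//   },
// });
```

Closing the tab in a finally block keeps a failed lookup from leaking pages for the rest of the crawl.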
I have been trying to port our scrapers from Selenium/Python to Crawlee, mainly because of the anti-bot protections already built into it. The issue I am facing is that I'm having a hard time translating our functions 1-to-1 from Selenium to Crawlee, because a lot of them depend on the Selenium driver or, in Playwright's case, the browser instance. For example:
I need to click on an element to get the link because there's a redirect in between, and I need to wait for the redirect before grabbing the link. I can't use enqueueLinksByClickingElements because I need the link in the same request for my dataset to be complete.
There are other such issues I'm having trouble with. I know we have Page exposed, but that's just a single tab in a browser context, and I need more control than that for my use case.
Is this possible with Crawlee, or are there any workarounds I can use to get the same functionality?
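For the click-then-redirect case above, the work can stay inside the same request handler: click, wait for the main frame's URL to change, then read it. A minimal sketch, assuming the click navigates the current tab; `clickAndGetFinalUrl`, its `isFinal` parameter and the `ClickablePage` interface are hypothetical, with `ClickablePage` written as a structural subset of Playwright's `Page` so the real `page` from the crawling context can be passed in. Server-side (HTTP 3xx) redirects never commit an intermediate URL in the main frame, but a JavaScript or meta-refresh hop does, which is why the caller can pass a stricter `isFinal` predicate.

```typescript
// Structural subset of Playwright's Page (assumption: only the members used
// here), so the real `page` from a PlaywrightCrawler context can be passed in.
interface ClickablePage {
  url(): string;
  click(selector: string): Promise<void>;
  waitForURL(
    url: (url: URL) => boolean,
    options?: { timeout?: number },
  ): Promise<void>;
}

// Hypothetical helper: click `selector`, wait until the main frame has moved
// to a URL that differs from the starting one and satisfies `isFinal`, then
// return that URL. Everything happens within the current request.
async function clickAndGetFinalUrl(
  page: ClickablePage,
  selector: string,
  isFinal: (url: URL) => boolean = () => true,
  timeoutMs = 30_000,
): Promise<string> {
  const startUrl = page.url();
  // Register the wait before clicking so a fast redirect is not missed.
  await Promise.all([
    page.waitForURL((url) => url.href !== startUrl && isFinal(url), {
      timeout: timeoutMs,
    }),
    page.click(selector),
  ]);
  return page.url();
}
```

Inside the request handler this would be called as `await clickAndGetFinalUrl(page, 'a.redirect-link')` (the selector is a placeholder), after which the URL can go into the same dataset item as the rest of the page's data. Because the click navigates the crawler's own page away, any other data from the original page should be extracted before calling it. If the element opens a new tab instead of navigating the current one, Playwright's `page.waitForEvent('popup')` is the usual counterpart to `waitForURL`.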