conscious-sapphire•2y ago
How to access browser instance in Playwright Crawler?
I have been trying to port our scrapers from Selenium/Python to Crawlee, mainly because of the anti-bot protections already built into it. I am having a hard time translating our functions 1-to-1 from Selenium to Crawlee because a lot of them depend on the Selenium `driver` or, in Playwright's case, the `browser` instance. For example, I need to click on an element to get a link because there's a redirect in between, and I need to wait for the redirect before grabbing the link. I can't use `enqueueLinksByClickingElements` because I need the link in the same request for my dataset to be complete.
There are other such issues I am having trouble with. I know we have `Page` exposed, but that's just a single tab in a browser's context, and I need more control than that for my use case.
Is this something that's possible with Crawlee? Or are there any workarounds that I can use for the same functionality?
7 Replies
Hey @AltairSama2, are you talking about `context.page.browser()`?
conscious-sapphireOP•2y ago
Will this give me the full browser API that Playwright has? If so, then this is exactly what I need.
I went through the docs on Apify but couldn't figure out how to get those inside the crawler.
If this is the same, then I can simply refer to Playwright's docs for opening new tabs etc. and have the crawler handle the anti-bot stuff.
Not the most optimized thing, but I'll refactor once I am more familiar with Crawlee.
Please test it and let us know if it solves your problem.
conscious-sapphireOP•2y ago
will do thanks!
I just tried this snippet, and it just says browser does not exist?
@AltairSama2 I am sorry, it should be `context.browserController.browser` instead. Can you try it?
conscious-sapphireOP•2y ago
sure thanks!
it worked thanks!
I'll try out the functions from Playwright's docs and see how it works
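
For anyone landing here later, below is a minimal sketch of how the `browserController.browser` suggestion from this thread might be used for the click-and-wait-for-redirect case in the original question. It is not an official Crawlee recipe. The helper name `resolveRedirectInNewTab`, the selector, and the URL pattern are hypothetical, and the small `...Like` interfaces are stand-ins for Playwright's `Browser`, `BrowserContext`, and `Page` types so the snippet type-checks on its own. The only calls used (`contexts()`, `newContext()`, `newPage()`, `goto()`, `click()`, `waitForURL()`, `url()`, `close()`) are regular Playwright methods.

```typescript
// Structural stand-ins for the few Playwright methods used below.
// In a real project, import `Browser` and `Page` from 'playwright' instead.
interface PageLike {
    goto(url: string): Promise<unknown>;
    click(selector: string): Promise<void>;
    waitForURL(url: RegExp): Promise<void>;
    url(): string;
    close(): Promise<void>;
}
interface BrowserContextLike {
    newPage(): Promise<PageLike>;
}
interface BrowserLike {
    contexts(): BrowserContextLike[];
    newContext(): Promise<BrowserContextLike>;
}

// Hypothetical helper: opens an extra tab, clicks `selector`, waits until the
// tab's URL matches `targetPattern`, and returns the URL it ended up on.
async function resolveRedirectInNewTab(
    browser: BrowserLike,
    startUrl: string,
    selector: string,
    targetPattern: RegExp,
): Promise<string> {
    // Reuse an existing browser context if there is one, otherwise create one.
    const context = browser.contexts()[0] ?? (await browser.newContext());
    const tab = await context.newPage();
    try {
        await tab.goto(startUrl);
        await tab.click(selector);
        // Resolves once the tab has navigated to a URL matching the pattern.
        await tab.waitForURL(targetPattern);
        return tab.url();
    } finally {
        // Always close the extra tab so it does not leak between requests.
        await tab.close();
    }
}

// Assumed wiring inside a Crawlee PlaywrightCrawler (not executed here;
// the selector and pattern are placeholders):
//
//   import { PlaywrightCrawler, Dataset } from 'crawlee';
//
//   const crawler = new PlaywrightCrawler({
//       async requestHandler({ request, browserController }) {
//           const browser = browserController.browser;
//           const finalUrl = await resolveRedirectInNewTab(
//               browser, request.url, 'a.outbound', /\/target\//);
//           await Dataset.pushData({ url: request.url, finalUrl });
//       },
//   });
```

Because the helper runs inside the same `requestHandler` call, the resolved link can be stored alongside the rest of that request's data. One caveat I would expect but have not verified: a tab you open yourself may not go through the same page-level setup Crawlee applies to the pages it manages, so it is worth testing against the target site.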