conscious-sapphire•2y ago

How to access the browser instance in PlaywrightCrawler?

I have been trying to port our scrapers from Selenium/Python to Crawlee, mainly because of the anti-bot protections already built into it. The issue I'm facing is that I'm having a hard time translating our functions 1-to-1 from Selenium to Crawlee, because a lot of them depend on the Selenium driver (or, in Playwright's case, the browser instance). For example, I need to click on an element to get a link, but there's a redirect in between, so I have to wait for it before grabbing the URL. I can't use enqueueLinksByClickingElements because I need the link within the same request for my dataset to be complete. I'm having trouble with other issues like this too. I know the Page is exposed, but that's just a single tab in a browser context, and I need more control than that for my use case. Is this possible with Crawlee, or are there any workarounds I can use to get the same functionality?
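A minimal sketch of the click-then-wait-for-redirect case, kept inside the same request handler using only standard Playwright Page methods (`click`, `waitForURL`, `url`, `goBack`). The helper name `clickAndGetRedirectedUrl`, the selector, and the URL pattern are hypothetical placeholders, not Crawlee API.

```js
/**
 * Hypothetical helper: click an element that triggers a redirect, wait until
 * the page lands on a URL matching `urlPattern`, and return that final URL.
 * `page` is a Playwright Page (in Crawlee, the `page` from the handler context).
 */
async function clickAndGetRedirectedUrl(page, selector, urlPattern, { goBack = true, timeout = 30_000 } = {}) {
    // Start waiting before clicking, so a fast redirect cannot be missed.
    await Promise.all([
        page.waitForURL(urlPattern, { timeout }),
        page.click(selector),
    ]);
    const finalUrl = page.url();
    // Optionally return to the original page so the handler can keep scraping it.
    if (goBack) await page.goBack();
    return finalUrl;
}

// Usage inside a PlaywrightCrawler (not executed here; selector and pattern are placeholders):
// requestHandler: async ({ page }) => {
//     const link = await clickAndGetRedirectedUrl(page, 'a.download', /\/files\//);
//     await Dataset.pushData({ url: page.url(), link });
// },
```

Because `waitForURL` can resolve immediately when the current URL already matches, the pattern should match only the post-redirect URL, not the page you start on.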
7 Replies
Pepa J•2y ago
Hey @AltairSama2, are you talking about context.page.browser()?
```js
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    requestHandler: async (context) => {
        // Playwright's Page has no browser() method, so context.page.browser() throws.
        // Crawlee exposes the browser it launched on the browser controller:
        const browser = context.browserController.browser;
    },
});
```
conscious-sapphire (OP)•2y ago
Will this give me the full browser API that Playwright has? If so, this is exactly what I need. I went through the docs on Apify but couldn't figure out how to get it inside the crawler. If it's the same object, I can simply refer to Playwright's docs for opening new tabs etc. and let the crawler handle the anti-bot stuff. It's not the most optimized approach, but I'll refactor once I'm more familiar with Crawlee.
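For opening extra tabs, one option is to stay at the BrowserContext level: `page.context().newPage()` opens a tab in the same Playwright BrowserContext as the crawler's page, so it shares that context's cookies and storage. The sketch also covers a click that opens a new tab, using `page.waitForEvent('popup')`. Both helper names are hypothetical. Pages opened this way are not created through Crawlee's browser pool, so I would not assume that per-page hooks Crawlee applies (for example fingerprinting) also apply to them.

```js
/**
 * Hypothetical helper: open `url` in an extra tab of the same BrowserContext,
 * run `extract(tab)`, and always close the tab afterwards.
 */
async function scrapeInNewTab(page, url, extract) {
    const tab = await page.context().newPage();
    try {
        await tab.goto(url);
        return await extract(tab);
    } finally {
        await tab.close(); // avoid leaking tabs for the lifetime of the browser
    }
}

/**
 * Hypothetical helper: click an element that opens a new tab (popup),
 * wait for it to load, run `extract(popup)`, then close it.
 */
async function clickAndReadPopup(page, selector, extract) {
    // Register the popup listener before clicking so the event is not missed.
    const [popup] = await Promise.all([
        page.waitForEvent('popup'),
        page.click(selector),
    ]);
    try {
        await popup.waitForLoadState();
        return await extract(popup);
    } finally {
        await popup.close();
    }
}
```

Closing the extra tabs in `finally` matters more under a crawler than in a one-off script, because the browser is reused across many requests.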
Pepa J•2y ago
Please test it and let us know if it solves your problem.
conscious-sapphire (OP)•2y ago
Will do, thanks! I just tried this snippet, and it says browser does not exist?
Pepa J•2y ago
@AltairSama2 Sorry, it should be context.browserController.browser instead. Can you try it?
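A small sketch of going past the single tab from inside a handler, starting from `context.browserController.browser` as in the reply above. `contexts()`, `pages()`, `url()` and `BrowserContext.browser()` are standard Playwright methods; the helper names are hypothetical. Playwright documents that `BrowserContext.browser()` returns `null` for a persistent context, which may be the case under Crawlee depending on launch options, hence the fallback. I have not checked that the object on `browserController.browser` behaves exactly like a Playwright `Browser` in every launch mode, so treat `contexts()` on it as an assumption to verify.

```js
/**
 * Hypothetical helper: collect the URL of every open tab in the browser that
 * Crawlee launched for the current request. Pass the requestHandler context.
 */
function listOpenTabUrls({ browserController }) {
    // Crawlee's handle on the launched browser.
    const browser = browserController.browser;
    // Standard Playwright Browser API: every context, then every page in it.
    return browser.contexts().flatMap((ctx) => ctx.pages().map((p) => p.url()));
}

/**
 * Hypothetical helper: Playwright-only route starting from the page.
 * Page has no browser() method; BrowserContext.browser() exists but returns
 * null for persistent contexts, so fall back to Crawlee's handle.
 */
function getBrowser({ page, browserController }) {
    return page.context().browser() ?? browserController.browser;
}

// Usage inside a PlaywrightCrawler (not executed here):
// requestHandler: async (context) => {
//     context.log.info(`Open tabs: ${listOpenTabUrls(context).join(', ')}`);
// },
```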
conscious-sapphire (OP)•2y ago
Sure, thanks! It worked. I'll try out the functions from Playwright's docs and see how it goes.