The community member is exploring ways to optimize web crawling speed using Playwright. They are interested in a method to navigate to new URLs without closing and reopening pages each time, and a way to disable the rendering of images, fonts, and stylesheets while still accessing the DOM. One community member suggests that the existing PlaywrightCrawler cannot re-use pages, and they had a similar problem in a previous post where they wrote their own crawler. Another community member mentions that the blockRequests method may be helpful for disabling rendering.
I'm exploring ways to optimize web crawling speed using Playwright. I'm curious if there's a method to navigate to new URLs without closing and reopening pages each time. Essentially, I'd like to update the URL in the address bar and trigger navigation on the same page.
Additionally, is there a way to disable the rendering of images, fonts, and stylesheets, assuming I only need access to the DOM? Any insights or tips would be greatly appreciated!
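For the second part, one approach that is often used with plain Playwright is request interception: `page.route` registers a handler, and requests whose `resourceType()` is `image`, `font`, or `stylesheet` are aborted while everything else continues. The sketch below is only an illustration under a few assumptions: `RouteLike`, `RoutablePage`, `shouldBlock`, and `blockHeavyResources` are hypothetical names, and the two interfaces are stand-ins shaped after the small part of Playwright's `Route` and `Page` that is used, so the snippet compiles without the library.

```typescript
// Hypothetical stand-ins for the subset of Playwright's Route and Page used here.
interface RouteLike {
  request(): { resourceType(): string };
  abort(): Promise<void>;
  continue(): Promise<void>;
}

interface RoutablePage {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Resource types from the question; this list is an assumption and can be adjusted.
const BLOCKED_RESOURCE_TYPES = new Set<string>(["image", "font", "stylesheet"]);

// True when a request of this resource type should be dropped.
function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Registers a catch-all route that aborts blocked resource types
// and lets every other request through unchanged.
async function blockHeavyResources(page: RoutablePage): Promise<void> {
  await page.route("**/*", (route) =>
    shouldBlock(route.request().resourceType()) ? route.abort() : route.continue(),
  );
}
```

With real Playwright, the idea would be to call `blockHeavyResources(page)` once, before the first `page.goto`. The HTML document and scripts still load, so the DOM stays available; only the blocked resource types are skipped.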
Hi, I'm afraid you can't reuse pages with the existing PlaywrightCrawler. I had a similar problem in this previous post, where I ended up writing my own crawler (I shared my code).
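For anyone going the custom-crawler route, the core idea of reusing a page can be sketched as a loop that calls `goto` on the same page for each URL. This is not the code from the post mentioned above, just a minimal illustration: `NavigablePage` and `crawlOnSinglePage` are hypothetical names, and the interface only mirrors the `goto` and `content` methods of a Playwright `Page` so the snippet stays self-contained.

```typescript
// Hypothetical stand-in for the two Playwright Page methods used below.
interface NavigablePage {
  goto(url: string): Promise<unknown>;
  content(): Promise<string>;
}

// Visits each URL in order on the same page and collects the serialized HTML.
// No page is closed or reopened between navigations.
async function crawlOnSinglePage(
  page: NavigablePage,
  urls: string[],
): Promise<Map<string, string>> {
  const htmlByUrl = new Map<string, string>();
  for (const url of urls) {
    await page.goto(url);
    htmlByUrl.set(url, await page.content());
  }
  return htmlByUrl;
}
```

With real Playwright, a page from `browser.newPage()` could be passed in, with `browser.close()` called once at the end. One thing to keep in mind is that cookies and storage carry over between navigations on the same page, which may or may not be what you want.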