foreign-sapphire · 16mo ago

Enqueue Links from a New Window

Hi. I'm attempting to scrape data from a very JS-heavy website using Crawlee and Playwright. The links I'm interested in are created by a JS function that opens the content in a new window. I've implemented the enqueueLinksByClickingElements function with a very specific selector. Playwright reports successfully clicking the links, but I suspect the resulting requests are not being intercepted.
DEBUG Playwright Click Elements: enqueueLinksByClickingElements: There are 1 elements to click.
DEBUG Playwright Click Elements: enqueueLinksByClickingElements: Successfully clicked 1 elements out of 1
DEBUG PlaywrightCrawler: Crawled 1/2 pages, 0 failed requests.
DEBUG PlaywrightCrawler: Crawled 1/2 pages, 0 failed requests.
INFO PlaywrightCrawler: All requests from the queue have been processed, the crawler will shut down.
I've also passed a transformRequestFunction that sets useExtendedUniqueKey to true. Is there a way I can:
1. Take a screenshot after Playwright clicks the element?
2. Log the intercepted requests?
Thanks!
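For reference, my setup looks roughly like this (simplified; the selector is a placeholder for my actual one, and I'm fetching the queue via Crawlee's getRequestQueue() helper):

import { PlaywrightCrawler, playwrightUtils } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, crawler }) {
        // Click the JS-generated links and enqueue whatever they open.
        await playwrightUtils.enqueueLinksByClickingElements({
            page,
            requestQueue: await crawler.getRequestQueue(),
            selector: 'a.opens-in-new-window', // placeholder for my very specific selector
            transformRequestFunction: (request) => {
                // Include method/payload in the deduplication key.
                request.useExtendedUniqueKey = true;
                return request;
            },
        });
    },
});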
4 Replies
ondro_k · 16mo ago
Hi, to intercept requests made by the website's JS, try page.route (https://playwright.dev/docs/api/class-page#page-route). You can take a screenshot by calling page.screenshot (https://playwright.dev/docs/screenshots#full-page-screenshots) right after the click.
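A minimal sketch of both ideas inside a PlaywrightCrawler, with the route registered in a preNavigationHook so it is in place before the page loads (the screenshot path is just an example):

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page, log }) => {
            // Log every request the page makes, then let it through.
            await page.route('**/*', (route) => {
                log.debug(`Intercepted: ${route.request().method()} ${route.request().url()}`);
                return route.continue();
            });
        },
    ],
    async requestHandler({ page }) {
        // ...enqueueLinksByClickingElements(...) here...
        // Screenshot right after the click(s), for debugging.
        await page.screenshot({ path: 'after-click.png', fullPage: true });
    },
});

One caveat: if the click opens a brand-new window, routes registered on the original page won't see the popup's traffic; registering the route on the browser context (browserContext.route()) or handling the page's 'popup' event covers pages opened from it.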
foreign-sapphire (OP) · 16mo ago
Thanks!
xenial-black · 12mo ago
@ondro_k, do you know whether page.route will intercept ALL external requests? I'm scraping a site where some of the pages (not all of them) make an external request to Algolia. I need to intercept those requests so I can read some information from the request headers. I also need to make sure I only move on to the next step (scraping the HTML) once those requests have finished, since some pages make the Algolia request and some don't. Any idea how I can handle this?
Oleg V. · 12mo ago
You can try something like:
await page.route('**/*', (route) => {
    // Abort Algolia calls; or do whatever you need with the request here.
    return route.request().url().includes('algolia') ? route.abort() : route.continue();
});
Also this article might be handy: https://medium.com/@kbalaji.kks/playwright-network-insights-how-to-intercept-modify-delete-and-analyze-network-calls-cde402f103e6
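If you also need something from the Algolia request headers and want to scrape the HTML only after that request has finished (on the pages where it happens at all), a sketch along these lines could work; the 'algolia' substring match and the 5-second timeout are placeholders:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page, request }) => {
            // Record the Algolia request headers if the page ever makes such a call.
            request.userData.algoliaHeaders = null;
            page.on('request', (req) => {
                if (req.url().includes('algolia')) {
                    request.userData.algoliaHeaders = req.headers();
                }
            });
        },
    ],
    async requestHandler({ page, request, log }) {
        // Wait briefly for a possible Algolia response; pages that never call
        // Algolia would make waitForResponse throw, hence the catch.
        await page
            .waitForResponse((res) => res.url().includes('algolia'), { timeout: 5000 })
            .catch(() => {});

        if (request.userData.algoliaHeaders) {
            log.info('Algolia request headers', request.userData.algoliaHeaders);
        }
        // ...scrape the HTML here...
    },
});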
