rare-sapphire•2y ago
Error Target page, context or browser closed
Hello fellow developers,
I'm facing a consistent issue with Playwright in the Crawlee library context. Every time I perform an async operation on a locator instance, the page unexpectedly closes.
Here's the simplified code where the issue is evident:
The issue specifically happens at the line const result = await test.count(). Each time this line executes, the page closes and the operation fails.
Some key points:
- The problem consistently occurs every time this code is executed.
- I'm using the latest versions of Playwright and Crawlee.
- The issue seems to be tied to the await operation on the locator instance.
I'm stumped as to why this is happening. Is this a known issue with Playwright or Crawlee, or could there be something wrong with my implementation? Any insights, suggestions, or similar experiences would be incredibly helpful.
Thanks a lot in advance for any assistance!
PS: I'm adding a video recorded with headless: false to show how it looks.
PPS: Here is the discussion on GitHub with more details: https://github.com/apify/crawlee/discussions/2185
10 Replies
rare-sapphireOP•2y ago
@Helper could someone let me know what could be the problem here?
Hi @Wojciech, hard to say on this one. What is the value of the element variable there? I don't see it in the debugger variables toolbar. Is it just a regular page object?
rare-sapphireOP•2y ago
I crawl on a Frames object @Pepa J
Frames | Playwright
Introduction
rare-sapphireOP•2y ago
I tried that and it did not work. As far as I can tell from testing, the problem is probably that I resolve my promise too early: when I do operations like await page.title(), await page.content(), etc. inside requestHandler, everything works fine, but my logic looks different:
After debugging, it looks like the page closes after I call the first await on the result of my resolve function.
@Pepa J
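The code from the post is not shown here, so the following is only a minimal sketch of the failure mode described above. It assumes the page is closed as soon as the handler returns. FakePage, runCrawler and both count functions are hypothetical stand-ins written for illustration, not Crawlee or Playwright APIs.

```javascript
// Stand-in for a Playwright page: calls reject once the page has been closed.
// FakePage is hypothetical and only models the behaviour discussed in the thread.
class FakePage {
  constructor() {
    this.closed = false;
  }

  locator(selector) {
    return {
      count: async () => {
        // Simulate the round trip to the browser
        await new Promise((resolve) => setTimeout(resolve, 0));
        if (this.closed) {
          throw new Error('Target page, context or browser closed');
        }
        return 3; // pretend three elements match the selector
      },
    };
  }

  close() {
    this.closed = true;
  }
}

// Models the lifecycle: the page exists only while the handler is running.
async function runCrawler(requestHandler) {
  const page = new FakePage();
  try {
    return await requestHandler({ page });
  } finally {
    page.close();
  }
}

// Problematic: the locator escapes the handler and is used after the page is closed.
async function countOutsideHandler() {
  const locator = await runCrawler(async ({ page }) => page.locator('a'));
  return locator.count(); // rejects with the "closed" error
}

// Working: the async call is awaited while the handler is still running.
async function countInsideHandler() {
  return runCrawler(async ({ page }) => await page.locator('a').count());
}
```

In this model, countOutsideHandler() rejects with the same error text as the thread title, while countInsideHandler() resolves normally, which would match the observation that awaiting inside requestHandler works.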
@Wojciech I am sorry, I cannot really follow your code. The only element that can syntactically contain a <body> element is the <html> element. That is why I asked for the value of the element parameter there. I already linked you the official documentation on how to work with frames in Playwright, and unfortunately we don't know anything about the website that you are scraping. I suggest you check the link I provided, and maybe some further examples, to get an idea of how to scrape frames/iframes.
rare-sapphireOP•2y ago
The problem lies in the way PlaywrightCrawler handles crawling (the page is open only within the scope of the requestHandler callback), so I worked around it by passing a callback to my function, playwrightCrawleePageResolver.resolve([url], callback), and this callback is executed in requestHandler.
I am sorry, I am not able to follow what you are trying to achieve.
"PlaywrightCrawler handles crawling (the page is open only within the scope of requestHandler callback)"
Yes, it does, so you should keep all the logic within the requestHandler. Of course you may call your own functions and methods; just keep in mind that when calling async functions you need the await keyword. Otherwise nothing waits for the result of your function, and the page might get closed before your function has finished.
rare-sapphireOP•2y ago
I encountered a challenge with maintaining all operations within the requestHandler. This stems from the fact that our crawler is initialized just once, yet it needs to continuously crawl the ever-changing internet for data. This dynamic nature of data necessitated a custom approach, allowing the crawler to dynamically accept varying arguments and functions. The key takeaway is that I've successfully resolved this issue. Thank you for your time and assistance.
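The final solution is not shown in the thread, so this is only a rough sketch of how the callback approach mentioned earlier could look: a resolver created once, reused with different callbacks, where each callback is awaited before the page closes. FakePage, PageResolver and collectTitles are hypothetical names. In a real setup the awaited callback call would presumably sit inside the PlaywrightCrawler requestHandler.

```javascript
// Stand-in for a Playwright page; rejects once closed, as discussed in the thread.
class FakePage {
  constructor(url) {
    this.url = url;
    this.closed = false;
  }

  async title() {
    // Simulate the round trip to the browser
    await new Promise((resolve) => setTimeout(resolve, 0));
    if (this.closed) {
      throw new Error('Target page, context or browser closed');
    }
    return `Title of ${this.url}`;
  }

  close() {
    this.closed = true;
  }
}

// Hypothetical resolver: created once, then reused with different URLs and callbacks.
class PageResolver {
  async resolve(urls, callback) {
    const results = [];
    for (const url of urls) {
      const page = new FakePage(url);
      try {
        // Plays the role of requestHandler: the callback is awaited
        // here, so it finishes before the page is closed.
        results.push(await callback({ page, url }));
      } finally {
        page.close();
      }
    }
    return results;
  }
}

const resolver = new PageResolver();

// The callback returns plain data, never page or locator objects,
// so nothing needs the page after the handler has finished.
async function collectTitles(urls) {
  return resolver.resolve(urls, async ({ page, url }) => ({
    url,
    title: await page.title(),
  }));
}
```

The point of the design is that only serializable values leave the callback. If the callback returned page.title() without await inside the object, the pending promise would outlive the page and could reject with the closed-page error.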