Large threaded Kubernetes scrape → "Target page, context or browser has been closed"
Ironically, @cryptorex just posted a similar issue (TargetClosedError: Target page, context or browser has been closed (I've tried a lot)), but I wanted to provide some additional context to see if they're related.
I'm:
- Running a node app with worker threads (usually 32 of them)
- Running multiple containers in kubernetes
- Each thread grabs 5 domains from my postgres DB (of 5 million!)
- Loops through each domain
- Creates a new PlaywrightCrawler with unique-named storages (to prevent collision / global deletion from crawlers in other threads)
- Queues the domains home page
- Controllers then queue up some additional pages based on what's found on the home page
- The results are processed in real-time and pushed to the database (since we don't want to wait until all 5M are complete)
- The thread-specific storages are then deleted using drop()
This works flawlessly... for about 60 minutes... afterwards, I get plagued with
Target page, context or browser has been closed. It first appears at around the one-hour mark and then incrementally increases in frequency until I'm getting more failed records than successful (at which point I kill or restart the cluster).

What I've tried:
- Setting browserPoolOptions like retireBrowserAfterPageCount: 100 and closeInactiveBrowserAfterSecs: 200, in hopes that this would clear any sort of cache/memory that could be stacking up
- Calling await crawler.teardown();
- A cron to restart my cluster
- Ensuring the EBS volumes are not running out of space (they're 20GB each and seem to be 50% full when crashing)
- Ensuring the pods have plenty of memory (running EC2s with 64GB memory and 16 CPUs / 32 threads). They seem to handle the load in the first hour just fine.
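For concreteness, the browser-recycling settings above were applied roughly like this (a sketch of the crawler options, assuming Crawlee's PlaywrightCrawler; the values are the ones from my list, not recommendations):

```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  browserPoolOptions: {
    // Retire each browser after it has served 100 pages...
    retireBrowserAfterPageCount: 100,
    // ...and close browsers that sit idle for 200 seconds,
    // hoping to release whatever memory/cache is stacking up.
    closeInactiveBrowserAfterSecs: 200,
  },
  // requestHandler, requestQueue, etc. omitted
});

// When a batch finishes:
// await crawler.teardown();
```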