Large threaded Kubernetes scrape: "Target page, context or browser has been closed"
I'm:
- Running a node app with worker threads (usually 32 of them)
- Running multiple containers in kubernetes
Each thread:
- Grabs 5 domains from my postgres DB (of 5 million!)
- Loops through each domain
- Creates a new PlaywrightCrawler with unique-named storages (to prevent collision / global deletion from crawlers in other threads)
- Queues the domain's home page
- Controllers then queue up some additional pages based on what's found on the home page
- The results are processed in real-time and pushed to the database (since we don't want to wait until all 5M are complete)
- The thread-specific storages are then deleted using drop()
The Problem
This works flawlessly... for about 60 minutes. After that, I get plagued with
Target page, context or browser has been closed. It first appears at around the one-hour mark, then increases in frequency until I'm getting more failed records than successful ones (at which point I kill or restart the cluster).
What I've tried:
- browserPoolOptions like retireBrowserAfterPageCount: 100 and closeInactiveBrowserAfterSecs: 200
- await crawler.teardown(); in hopes that this would clear any sort of cache/memory that could be stacking up
- A cron to restart my cluster
- Ensuring the EBS volumes are not running out of space (they're 20GB each and only ~50% full when the errors start)
- Ensuring the pods have plenty of memory (running EC2s with 64GB memory and 16 CPUs / 32 threads). They seem to handle the load in the first hour just fine.
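In case it helps, here's roughly what each thread's setup/teardown path looks like, condensed (a sketch; crawlBatch, the storage names, and the handler body are illustrative, but the browserPoolOptions values are the ones I'm actually using):

```javascript
import { PlaywrightCrawler, RequestQueue, Dataset } from 'crawlee';

// Sketch of one batch in one worker thread. The intent is that everything
// this crawler allocates is named after the thread, so drop() can remove
// it without touching other threads' storages.
async function crawlBatch(prefix, domains) {
  const queue = await RequestQueue.open(`${prefix}-queue`);
  const dataset = await Dataset.open(`${prefix}-results`);

  const crawler = new PlaywrightCrawler({
    requestQueue: queue,
    browserPoolOptions: {
      retireBrowserAfterPageCount: 100,   // recycle browsers regularly
      closeInactiveBrowserAfterSecs: 200, // reap idle browsers
    },
    async requestHandler({ page, enqueueLinks, pushData }) {
      // ... extract from the home page, enqueue the additional pages,
      // and push results to the DB in real time
    },
  });

  await crawler.run(domains.map((d) => `https://${d}/`));

  // Tear the crawler down, then drop the thread-specific storages.
  await crawler.teardown();
  await Promise.all([queue.drop(), dataset.drop()]);
}
```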
Since it happens gradually, I suspect there's a leak or a store that isn't being cleared out somewhere?
