Coincidentally, @cryptorex just posted a similar issue (https://discord.com/channels/801163717915574323/1255531330704375828/1255531330704375828), but I wanted to provide some additional context to see if the two are related.
I'm:
- Running a Node app with worker threads (usually 32 of them)
- Running multiple containers in Kubernetes
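For context, the spawning side looks roughly like this. A minimal sketch, assuming a hypothetical `./worker.js` entry point that runs the per-thread crawl loop described below:

```ts
import { Worker } from 'node:worker_threads';
import os from 'node:os';

// One worker per logical CPU (32 on these machines).
const THREAD_COUNT = os.cpus().length;

for (let i = 0; i < THREAD_COUNT; i++) {
  // ./worker.js is a hypothetical entry point for the crawl loop.
  const worker = new Worker('./worker.js', { workerData: { threadId: i } });
  worker.on('error', (err) => console.error(`worker ${i} errored:`, err));
  worker.on('exit', (code) => console.log(`worker ${i} exited with code ${code}`));
}
```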
Each thread:
- Grabs 5 domains from my Postgres DB (out of 5 million!)
- Loops through each domain
- Creates a new PlaywrightCrawler with uniquely named storages (to prevent collision / global deletion from crawlers in other threads)
- Queues the domain's home page
- Controllers then queue up some additional pages based on what's found on the home page
- The results are processed in real time and pushed to the database (since we don't want to wait until all 5M are complete)
- The thread-specific storages are then deleted using drop() (see the sketch below)
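To make the lifecycle concrete, here's a minimal sketch of what each thread does. getNextDomains() and saveRecord() are hypothetical stand-ins for the Postgres layer; everything else is standard Crawlee:

```ts
import { PlaywrightCrawler, RequestQueue } from 'crawlee';

// Hypothetical stand-ins for the Postgres layer.
const getNextDomains = async (count: number): Promise<string[]> => ['example.com'];
const saveRecord = async (record: Record<string, unknown>): Promise<void> => {
  console.log('would save', record);
};

export async function crawlBatch(threadId: number): Promise<void> {
  const domains = await getNextDomains(5); // 5 domains per batch

  for (const domain of domains) {
    // Uniquely named queue so crawlers in other threads can't collide
    // with (or globally delete) this thread's storage.
    const requestQueue = await RequestQueue.open(`rq-${threadId}-${domain}`);

    const crawler = new PlaywrightCrawler({
      requestQueue,
      requestHandler: async ({ request, page, enqueueLinks }) => {
        // Queue additional pages found on the home page...
        await enqueueLinks({ strategy: 'same-domain' });
        // ...and push results to the database in real time.
        await saveRecord({ url: request.url, title: await page.title() });
      },
    });

    await crawler.run([`https://${domain}`]);

    // Delete the thread-specific storage once the domain is done.
    await requestQueue.drop();
  }
}
```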
The problem: this works flawlessly... for about 60 minutes. After that, I get plagued with `Target page, context or browser has been closed`. It first shows up at roughly the one-hour mark and then increases in frequency until I'm getting more failed records than successful ones (at which point I kill or restart the cluster).
What I've tried:
- browserPoolOptions like retireBrowserAfterPageCount: 100 and closeInactiveBrowserAfterSecs: 200
- await crawler.teardown(); in hopes that this would clear out any sort of cache/memory that could be stacking up
- A cron to restart my cluster 🤣
- Ensuring the EBS volumes are not running out of space (they're 20 GB each and only ~50% full when the crashes start)
- Ensuring the pods have plenty of memory (running EC2s with 64 GB of memory and 16 CPUs / 32 threads). They seem to handle the load in the first hour just fine.
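For reference, here's roughly how the first two mitigations are wired in. A minimal sketch; the option names are real Crawlee browserPoolOptions, and the values are the ones I used:

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  browserPoolOptions: {
    // Recycle each browser before it accumulates too much state.
    retireBrowserAfterPageCount: 100,
    // Close browsers that sit idle so dead ones don't pile up.
    closeInactiveBrowserAfterSecs: 200,
  },
  requestHandler: async ({ page }) => {
    /* ...extract and save... */
  },
});

await crawler.run(['https://example.com']);

// Explicit teardown after each batch, hoping it releases any
// cached pages/contexts/browsers that might be stacking up.
await crawler.teardown();
```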
I suspect there's a leak, or a store that isn't being cleared out, since it builds up gradually?