Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available memory.
Hi. I am running a Playwright crawler on my Linux VPS. The VPS has an 8-core CPU and 15533 MB of memory.
But I get many warnings like:
WARN PlaywrightCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available memory.
So how should I fix this?
Thanks for your help.
First of all, configure the CRAWLEE_MEMORY_MBYTES env var to something higher.
Then also pass the --max-old-space-size flag to Node when running your spider.
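On the command line that's something like `CRAWLEE_MEMORY_MBYTES=12288 node --max-old-space-size=12288 main.js`. If I remember right, Crawlee v3 also lets you set the same limit in code via Configuration; a minimal sketch, where 12288 is just an example value:
```ts
import { Configuration, PlaywrightCrawler } from 'crawlee';

// Raise Crawlee's memory ceiling in code instead of via the
// CRAWLEE_MEMORY_MBYTES env var. 12288 MB is just an example for
// a ~15.5 GB VPS; leave headroom for the OS and the browsers.
const config = new Configuration({ memoryMbytes: 12288 });

const crawler = new PlaywrightCrawler(
    {
        async requestHandler({ page }) {
            // ... your scraping logic ...
            await page.close();
        },
    },
    config, // crawler constructors accept a Configuration as a 2nd argument
);
```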
Thanks very much. I will try it.
inland-turquoise•16mo ago
@NeoNomade @hunterleung. did you fix this? Wondering why even 1 browser is using 12 GB?!
It's not 1 browser. You usually end up with this amount of RAM used when you have high concurrency and forget to close pages.
inland-turquoise•16mo ago
@NeoNomade thank you for your response. struggling over here scaling my crawlers.
Doesn't Crawlee automatically close pages?
Nope.
await page.close() is your weapon
inland-turquoise•16mo ago
omg, after the default handler it doesn't close the page?
Automatically, no.
inland-turquoise•16mo ago
any other suggestions?
await page.close() at the end of your handler
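Something like this, as a minimal sketch; the maxConcurrency value is just an example, and the try/finally makes sure the page closes even if your handler throws:
```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxConcurrency: 10, // example cap; high concurrency multiplies per-page memory
    async requestHandler({ page, request, log }) {
        try {
            log.info(`Scraping ${request.url}`);
            // ... extract data from the page ...
        } finally {
            // Explicitly release the page so its memory is freed
            // even when the handler throws.
            await page.close();
        }
    },
});

await crawler.run(['https://example.com']);
```
If memory still climbs after that, lowering maxConcurrency is the other easy lever.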
inland-turquoise•16mo ago
haha ok.
ty
@NeoNomade I already have that.
Then something else is wrong. You can use the Chrome debugger to see what objects are holding your memory.
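For example, start the process with `node --inspect main.js` and open chrome://inspect to take heap snapshots. Or, a minimal sketch for dumping snapshots programmatically with Node's built-in v8 module (the interval is just an example):
```ts
import { writeHeapSnapshot } from 'node:v8';

// Sketch: dump a heap snapshot every 60 s. Open the resulting
// .heapsnapshot files in Chrome DevTools > Memory to see which
// objects are retaining memory. Note: taking a snapshot pauses
// the process and the files can be large.
setInterval(() => {
    const file = writeHeapSnapshot();
    console.log(`Heap snapshot written to ${file}`);
}, 60_000);
```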
inland-turquoise•16mo ago
We read a lot of PDFs.
Is there any way to ensure that gets cleaned up?
Don't know what package you are using for that. Check the docs of that particular package.
Also, I don't think those activities should be mixed. Scraping is scraping; PDF parsing is something different.
Indeed, memory usage can go bananas if you scrape and read PDFs and mix in other stuff, like unclosed connections to S3. Divide your processes, clean the code.
inland-turquoise•16mo ago
Totally agree, we don't read the PDFs on the same machine. Sorry, that was convoluted. We hit a lot of PDFs, then intercept the request and send the URL to SQS to download and parse.
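Roughly this pattern, as a simplified sketch: the queue URL is a placeholder, the .pdf suffix check stands in for our real detection, and it assumes AWS SDK v3:
```ts
import { PlaywrightCrawler } from 'crawlee';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

// Region and credentials come from the environment; the queue URL is a placeholder.
const sqs = new SQSClient({});
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/pdf-jobs';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, enqueueLinks }) {
        // ... scrape the HTML page as usual ...
        await enqueueLinks({
            // Divert PDF links to SQS for a separate downloader/parser
            // instead of crawling them in this process.
            transformRequestFunction: (req) => {
                if (req.url.toLowerCase().endsWith('.pdf')) {
                    // Fire-and-forget for brevity; real code should handle errors.
                    void sqs.send(new SendMessageCommand({
                        QueueUrl: QUEUE_URL,
                        MessageBody: req.url,
                    }));
                    return false; // skip: don't enqueue the PDF here
                }
                return req;
            },
        });
        await page.close();
    },
});
```
That keeps the heavy PDF work off the crawler box entirely.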
There are a ton of possibilities. If you can share some of your code, it would be easier to spot the issue.
inland-turquoise•16mo ago
@NeoNomade would love to chat and even pay you to look it over with me. Would you mind getting on a call with me?
@bmax what time is it at your place?
inland-turquoise•16mo ago
6PM PST
We can do it tomorrow or another time.
Here it is 4 am. If we can do it in 5-6 hours, it would be great for me.
Kids are sleeping and it's hard to have a call at this hour
inland-turquoise•16mo ago
haha understandable! Up at 4am 😂
I'll add you and we can figure it out. Thank you.
Awesome. Thank you.