hunterleung.
hunterleung.•3y ago

Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available mem

Hi. I am running a Playwright crawler on my Linux VPS. The VPS has an 8-core CPU and 15533 MB of memory, but I get many warnings like: WARN PlaywrightCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available memory. How should I fix this? Thanks for your help.
23 Replies
NeoNomade
NeoNomade•3y ago
First of all, set the CRAWLEE_MEMORY_MBYTES env var to something higher. Then also use the --max-old-space-size Node flag when running your spider.
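A minimal sketch of how that advice could be applied, assuming a typical Crawlee + Playwright setup (passing memoryMbytes via Configuration and the 8192 MB figure are illustrative choices, not from the thread):

```ts
import { PlaywrightCrawler, Configuration } from 'crawlee';

// Raise the memory budget the AutoscaledPool snapshots against.
// Equivalent to exporting CRAWLEE_MEMORY_MBYTES=8192 before starting.
const config = new Configuration({ memoryMbytes: 8192 });

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);
        // ... scraping logic ...
    },
}, config);

await crawler.run(['https://example.com']);

// Start the process with a larger V8 heap as well, e.g.:
//   CRAWLEE_MEMORY_MBYTES=8192 node --max-old-space-size=8192 main.js
```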
hunterleung.
hunterleung.OP•3y ago
Thanks very much. I will try it.
inland-turquoise
inland-turquoise•16mo ago
@NeoNomade @hunterleung. did you fix this? Wondering why even one browser is using 12 GB?!
NeoNomade
NeoNomade•16mo ago
It's not one browser. You usually end up with this amount of RAM used when you have high concurrency and forget to close pages.
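As a hedged sketch of keeping concurrency bounded (the maxConcurrency value here is illustrative):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Cap how many pages are open in parallel so memory can't
    // balloon while many pages are still alive.
    maxConcurrency: 10,
    async requestHandler({ page, request }) {
        // ... scraping logic ...
    },
});
```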
inland-turquoise
inland-turquoise•16mo ago
@NeoNomade thank you for your response. Struggling over here scaling my crawlers. Doesn't Crawlee automatically close pages?
NeoNomade
NeoNomade•16mo ago
Nope. await page.close() is your weapon
inland-turquoise
inland-turquoise•16mo ago
omg, after the default handler it doesn't close the page?
NeoNomade
NeoNomade•16mo ago
Automatically, no.
inland-turquoise
inland-turquoise•16mo ago
any other suggestions?
NeoNomade
NeoNomade•16mo ago
await page.close() at the end of your handler
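A rough sketch of that pattern (whether an explicit close is still needed may depend on the Crawlee version; closing an already-closed page is harmless in Playwright):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        const title = await page.title();
        log.info(`${request.url}: ${title}`);
        // Release the page as soon as the handler is done with it,
        // per the advice above, so it stops holding memory.
        await page.close();
    },
});
```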
inland-turquoise
inland-turquoise•16mo ago
haha ok. ty @NeoNomade I already have that.
NeoNomade
NeoNomade•16mo ago
Then something else is wrong. You can use the Chrome debugger to see what objects are holding your memory.
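One way to get a snapshot for the Chrome DevTools memory inspector is Node's built-in v8 module; a sketch, with the SIGUSR2 trigger being an illustrative choice:

```ts
import { writeHeapSnapshot } from 'node:v8';

// On SIGUSR2 (e.g. `kill -USR2 <pid>`), dump the heap to a
// .heapsnapshot file. Load it in Chrome DevTools -> Memory
// to see which objects are retaining the crawler's memory.
process.on('SIGUSR2', () => {
    const file = writeHeapSnapshot();
    console.log(`Heap snapshot written to ${file}`);
});
```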
inland-turquoise
inland-turquoise•16mo ago
We read a lot of PDFs; is there any way to ensure that gets cleaned up?
NeoNomade
NeoNomade•16mo ago
I don't know what package you are using for that; check the docs of that particular package. Also, I don't think those activities should be mixed: scraping is scraping, PDF parsing is something different. Memory usage can indeed go bananas if you scrape, read PDFs, and mix in other stuff like unclosed connections to S3. Divide your processes and clean up the code.
inland-turquoise
inland-turquoise•16mo ago
Totally agree, we don't read the PDFs on the same machine; sorry, that was convoluted. We hit a lot of PDFs, intercept the requests, and send the URLs to SQS to download and parse elsewhere.
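For reference, a rough sketch of that interception pattern in a PlaywrightCrawler; the '**/*.pdf' pattern and the sendToSqs hand-off are illustrative assumptions, not the poster's actual code:

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page, log }) => {
            // Stop the browser from downloading PDFs; just record the
            // URL so it can be forwarded (e.g. to SQS) and parsed on
            // a separate machine.
            await page.route('**/*.pdf', async (route) => {
                const pdfUrl = route.request().url();
                log.info(`Skipping PDF in browser: ${pdfUrl}`);
                // await sendToSqs(pdfUrl); // hypothetical hand-off
                await route.abort();
            });
        },
    ],
    async requestHandler({ page }) {
        // ... normal scraping logic ...
    },
});
```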
NeoNomade
NeoNomade•16mo ago
There are a ton of possibilities. If you can share some of your code, it would be easier to spot the issue.
inland-turquoise
inland-turquoise•16mo ago
@NeoNomade would love to chat and even pay you to look it over with me. Would you mind getting on a call with me?
NeoNomade
NeoNomade•16mo ago
@bmax what time is it at your place?
inland-turquoise
inland-turquoise•16mo ago
6 PM PST. We can do tomorrow or another time.
NeoNomade
NeoNomade•16mo ago
It's 4 AM here. If we can do it in 5-6 hours, that would be great for me. The kids are sleeping and it's hard to have a call at this hour.
inland-turquoise
inland-turquoise•16mo ago
haha understandable! Up at 4 AM 😂 I'll add you and we can figure it out. Thank you.
NeoNomade
NeoNomade•16mo ago
Awesome. Thank you.
