hunterleung.
hunterleung.•3y ago

Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available mem

Hi. I am running a Playwright crawler on my Linux VPS. The VPS has an 8-core CPU and 15533 MB of memory, but I get many warnings like: WARN PlaywrightCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available memory. How should I fix this? Thanks for your help.
23 Replies
NeoNomade
NeoNomade•3y ago
First of all, set the CRAWLEE_MEMORY_MBYTES env var to something higher. Then also use the --max-old-space-size Node flag when running your spider.
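A minimal sketch of how that advice could be applied, assuming a typical Crawlee + Playwright setup (passing memoryMbytes via Configuration and the 8192 MB figure are illustrative choices, not from the thread):

```ts
import { PlaywrightCrawler, Configuration } from 'crawlee';

// Raise the memory budget the AutoscaledPool snapshots against.
// Equivalent to exporting CRAWLEE_MEMORY_MBYTES=8192 before starting.
const config = new Configuration({ memoryMbytes: 8192 });

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);
        // ... scraping logic ...
    },
}, config);

await crawler.run(['https://example.com']);

// Start the process with a larger V8 heap as well, e.g.:
//   CRAWLEE_MEMORY_MBYTES=8192 node --max-old-space-size=8192 main.js
```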
hunterleung.
hunterleung.OP•3y ago
Thanks very much. I will try it.
inland-turquoise
inland-turquoise•16mo ago
@NeoNomade @hunterleung. did you fix this? Wondering why even one browser is using 12 GB?!
NeoNomade
NeoNomade•16mo ago
It's not one browser. You usually end up with this amount of RAM used when you have high concurrency and forget to close pages.
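As a hedged sketch of keeping concurrency bounded (the maxConcurrency value here is illustrative):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Cap how many pages are open in parallel so memory can't
    // balloon while many pages are still alive.
    maxConcurrency: 10,
    async requestHandler({ page, request }) {
        // ... scraping logic ...
    },
});
```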
inland-turquoise
inland-turquoise•16mo ago
@NeoNomade thank you for your response. Struggling over here scaling my crawlers. Doesn't Crawlee automatically close pages?
NeoNomade
NeoNomade•16mo ago
Nope. await page.close() is your weapon
inland-turquoise
inland-turquoise•16mo ago
omg, after the default handler it doesn't close the page?
NeoNomade
NeoNomade•16mo ago
Automatically, no.
inland-turquoise
inland-turquoise•16mo ago
any other suggestions?
NeoNomade
NeoNomade•16mo ago
await page.close() at the end of your handler
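A rough sketch of that pattern (whether an explicit close is still needed may depend on the Crawlee version; closing an already-closed page is harmless in Playwright):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        const title = await page.title();
        log.info(`${request.url}: ${title}`);
        // Release the page as soon as the handler is done with it,
        // per the advice above, so it stops holding memory.
        await page.close();
    },
});
```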
inland-turquoise
inland-turquoise•16mo ago
haha ok. ty @NeoNomade I already have that.
NeoNomade
NeoNomade•16mo ago
Then something else is wrong. You can use the Chrome debugger to see what objects are holding your memory.
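One way to get a snapshot for the Chrome DevTools memory inspector is Node's built-in v8 module; a sketch, with the SIGUSR2 trigger being an illustrative choice:

```ts
import { writeHeapSnapshot } from 'node:v8';

// On SIGUSR2 (e.g. `kill -USR2 <pid>`), dump the heap to a
// .heapsnapshot file. Load it in Chrome DevTools -> Memory
// to see which objects are retaining the crawler's memory.
process.on('SIGUSR2', () => {
    const file = writeHeapSnapshot();
    console.log(`Heap snapshot written to ${file}`);
});
```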
inland-turquoise
inland-turquoise•16mo ago
We read a lot of PDFs; is there any way to ensure that gets cleaned up?
NeoNomade
NeoNomade•16mo ago
I don't know what package you are using for that; check the docs of that particular package. Also, I don't think those activities should be mixed: scraping is scraping, PDF parsing is something different. Memory usage can indeed go bananas if you scrape, read PDFs, and mix in other stuff like unclosed connections to S3. Divide your processes and clean up the code.
inland-turquoise
inland-turquoise•16mo ago
Totally agree, we don't read the PDFs on the same machine; sorry, that was convoluted. We hit a lot of PDFs, intercept the requests, and send the URLs to SQS to download and parse elsewhere.
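For reference, a rough sketch of that interception pattern in a PlaywrightCrawler; the '**/*.pdf' pattern and the sendToSqs hand-off are illustrative assumptions, not the poster's actual code:

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page, log }) => {
            // Stop the browser from downloading PDFs; just record the
            // URL so it can be forwarded (e.g. to SQS) and parsed on
            // a separate machine.
            await page.route('**/*.pdf', async (route) => {
                const pdfUrl = route.request().url();
                log.info(`Skipping PDF in browser: ${pdfUrl}`);
                // await sendToSqs(pdfUrl); // hypothetical hand-off
                await route.abort();
            });
        },
    ],
    async requestHandler({ page }) {
        // ... normal scraping logic ...
    },
});
```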
NeoNomade
NeoNomade•16mo ago
There are a ton of possibilities. If you can share some of your code, it would be easier to spot the issue.
inland-turquoise
inland-turquoise•16mo ago
@NeoNomade would love to chat and even pay you to look it over with me. Would you mind getting on a call with me?
NeoNomade
NeoNomade•16mo ago
@bmax what time is it at your place?
inland-turquoise
inland-turquoise•16mo ago
6 PM PST. We can do tomorrow or another time.
NeoNomade
NeoNomade•16mo ago
It's 4 AM here. If we can do it in 5-6 hours, that would be great for me. The kids are sleeping and it's hard to have a call at this hour.
inland-turquoise
inland-turquoise•16mo ago
haha understandable! Up at 4 AM 😂 I'll add you and we can figure it out. Thank you.
NeoNomade
NeoNomade•16mo ago
Awesome. Thank you.
