hunterleung.
hunterleung.•3y ago

Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available memory

Hi. I am running a Playwright crawler on my Linux VPS. The VPS has an 8-core CPU and 15533 MB of memory, but I get many warnings like: WARN PlaywrightCrawler:AutoscaledPool:Snapshotter: Memory is critically overloaded. Using 12184 MB of 3883 MB (314%). Consider increasing available memory. How should I fix this? Thanks for your help.
NeoNomade
NeoNomade•3y ago
First of all, set the CRAWLEE_MEMORY_MBYTES env var to something higher. Then also use the --max-old-space-size flag when running your spider.
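Something like this, for example (a minimal sketch; the handler body and URL are placeholders, and 12288 MB is just an illustrative value you would tune for your VPS):

```ts
// Sketch: raise Crawlee's memory ceiling for this process.
// By default Crawlee assumes only ~25% of system RAM (3883 MB on a 15533 MB VPS),
// which is why the snapshotter reports the pool as overloaded.
import { PlaywrightCrawler, Configuration } from 'crawlee';

const config = new Configuration({
    memoryMbytes: 12288, // same effect as setting CRAWLEE_MEMORY_MBYTES=12288
});

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request }) {
        // ... your scraping logic ...
    },
}, config);

await crawler.run(['https://example.com']);

// And run Node itself with a larger heap as well, e.g.:
//   CRAWLEE_MEMORY_MBYTES=12288 node --max-old-space-size=12288 main.js
```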
hunterleung.
hunterleung.OP•3y ago
Thanks very much. I will try it.
stormy-gold
stormy-gold•2y ago
@NeoNomade @hunterleung. did you fix this? Wondering why even 1 browser is using 12 GB?!
NeoNomade
NeoNomade•2y ago
It's not 1 browser. You usually end up with this amount of RAM used when you have high concurrency and forget to close pages.
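If it's the concurrency, something like this keeps it bounded (a sketch; the numbers are placeholders to tune for your machine):

```ts
// Sketch: cap how many requests/pages run at once so memory stays bounded,
// and recycle browsers before they accumulate too much state.
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    minConcurrency: 1,
    maxConcurrency: 10, // hard upper bound on parallel requests
    browserPoolOptions: {
        retireBrowserAfterPageCount: 100, // restart a browser after N pages
    },
    async requestHandler({ page }) {
        // ... scraping logic ...
    },
});
```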
stormy-gold
stormy-gold•2y ago
@NeoNomade thank you for your response. Struggling over here scaling my crawlers. Doesn't Crawlee automatically close pages?
NeoNomade
NeoNomade•2y ago
Nope. await page.close() is your weapon.
stormy-gold
stormy-gold•2y ago
omg, after the default handler it doesn't close the page?
NeoNomade
NeoNomade•2y ago
Automatically, no.
stormy-gold
stormy-gold•2y ago
any other suggestions?
NeoNomade
NeoNomade•2y ago
await page.close() at the end of your handler
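And for any extra pages you open yourself inside the handler (popups, extra tabs), a try/finally makes sure they get closed even when the handler throws. A sketch (the URL is a placeholder):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request }) {
        // Any extra page you open yourself is your responsibility to close.
        const popup = await page.context().newPage();
        try {
            await popup.goto('https://example.com/details'); // placeholder URL
            // ... extract data from the popup ...
        } finally {
            await popup.close(); // release the tab and its memory even on errors
        }
        // Close the handler's own page at the end, as suggested above.
        await page.close();
    },
});
```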
stormy-gold
stormy-gold•2y ago
haha ok. ty @NeoNomade I already have that.
NeoNomade
NeoNomade•2y ago
Then something else is wrong. You can use the Chrome debugger to see which objects are holding your memory.
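One way to do that is to take heap snapshots and open them in the Chrome DevTools Memory tab to compare what is growing. A sketch, assuming a Node version that ships v8.writeHeapSnapshot (the interval is arbitrary):

```ts
// Sketch: periodically dump heap snapshots that can be loaded and diffed
// in Chrome DevTools (Memory tab) to find what is holding memory.
import { writeHeapSnapshot } from 'node:v8';

setInterval(() => {
    const file = writeHeapSnapshot(); // writes a .heapsnapshot file in the cwd
    console.log(`Heap snapshot written to ${file}`);
}, 10 * 60 * 1000); // every 10 minutes

// Alternatively, attach the inspector and take snapshots interactively
// via chrome://inspect:
//   node --inspect main.js
```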
stormy-gold
stormy-gold•2y ago
We read a lot of PDFs. Is there any way to ensure that gets cleaned up?
NeoNomade
NeoNomade•2y ago
I don't know what package you are using for that; check the docs of that particular package. Also, I don't think those activities should be mixed. Scraping is scraping; PDF parsing is something different. Memory usage can indeed go bananas if you scrape and parse PDFs and mix in other stuff, like unclosed connections to S3. Divide your processes and clean up the code.
stormy-gold
stormy-gold•2y ago
Totally agree, we don't read the PDFs on the same machine. Sorry, that was convoluted. We hit a lot of PDFs, intercept the request, and send the URL to SQS to download and parse.
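Roughly this pattern, if it helps (a sketch; sendToQueue is a hypothetical stand-in for our real SQS client):

```ts
// Sketch: instead of downloading PDFs in the browser, grab the URL,
// hand it off to a queue, and abort the request so no memory is spent on it.
import { PlaywrightCrawler } from 'crawlee';

// Hypothetical helper; in practice this would wrap an SQS SendMessage call.
async function sendToQueue(url: string): Promise<void> {
    console.log(`queued for download/parse: ${url}`);
}

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            await page.route('**/*.pdf', async (route) => {
                await sendToQueue(route.request().url());
                await route.abort(); // never pull the PDF bytes into the browser
            });
        },
    ],
    async requestHandler({ page }) {
        // ... normal HTML scraping ...
    },
});
```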
NeoNomade
NeoNomade•2y ago
There are a ton of possibilities. If you can share some of your code, it would be easier to spot the issue.
stormy-gold
stormy-gold•2y ago
@NeoNomade would love to chat and even pay you to look it over with me. Would you mind getting on a call with me?
NeoNomade
NeoNomade•2y ago
@bmax what time is it at your place?
stormy-gold
stormy-gold•2y ago
6 PM PST. We can do it tomorrow or another time.
NeoNomade
NeoNomade•2y ago
It's 4 am here. If we can do it in 5-6 hours that would be great for me. The kids are sleeping and it's hard to have a call at this hour.
stormy-gold
stormy-gold•2y ago
haha understandable! Up at 4 am 😂 I'll add you and we can figure it out. Thank you.
NeoNomade
NeoNomade•2y ago
Awesome. Thank you.
