The community member is experiencing an "out of memory" (OOM) error while trying to scrape a list of small and medium-sized businesses in Ontario, excluding franchises, using Apify. The error keeps occurring even though the system has 32 GB of memory, with only 1-2 GB in use before the error appears. The community member asks for help resolving this issue.
In the comments, another community member suggests that the issue is likely a memory leak in the scraper. They offer several general tips for identifying and handling memory leaks in scrapers, such as avoiding retaining large data in memory, optimizing dependencies, using batch processing, limiting concurrent tasks, and avoiding retaining references. They also recommend reviewing the scraper's logic step by step to pinpoint the source of the issue.
I keep getting this error message: "The Actor hit an OOM (out of memory) condition. You can resurrect it with more memory to continue where you left off."
It keeps resurrecting from failed status and then running into the same issue. However, out of 32 GB of memory, it only uses 1-2 GB before the error appears.
It keeps failing no matter what I try.
Any help would be awesome.
I have a list of business industries that I need to scrape from Google Maps, based in Ontario. One thing to note: I'm targeting small/medium-sized businesses only, and NO franchise businesses. I'm new to Apify, so I'm not sure if there's a way to filter out franchise businesses from a scrape.
Most likely, your scraper has a memory leak that's causing the runs to fail. Here are some general tips to identify and handle memory leaks in scrapers:
Avoid Retaining Large Data in Memory: If you're storing large objects like responses or HTML, make sure to release them as soon as they're no longer needed (see the first sketch after this list).
Optimize Dependencies: Check for memory-intensive libraries and replace them with lightweight alternatives if possible.
Batch Processing: Break your tasks into smaller batches to avoid processing too much data at once and overwhelming memory (see the batching sketch below).
Limit Concurrent Tasks: If your scraper is running too many tasks in parallel, reduce the concurrency to balance memory usage (see the concurrency sketch below).
Avoid Retaining References: Check for closures or global variables that unnecessarily hold references to objects, preventing them from being garbage-collected (the first sketch below illustrates this too).
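To make the "retaining data" and "retaining references" tips concrete: assuming your actor runs on Node.js with the Apify SDK and Crawlee (the usual setup for Apify templates), a minimal sketch of the pattern looks like this. The selector and start URL are hypothetical placeholders; the point is to push each record to the dataset instead of accumulating everything in a long-lived array.

```ts
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

// Anti-pattern: a module-level array like this keeps every scraped
// record (and whatever those records reference) alive for the whole run.
// const allResults: Record<string, unknown>[] = [];

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // Hypothetical selector, just for illustration.
        const name = $('h1').first().text().trim();

        // Push each record straight to the dataset instead of retaining
        // it in memory. Once the handler returns, the parsed HTML and
        // these locals become eligible for garbage collection.
        await Actor.pushData({ url: request.url, name });
    },
});

await crawler.run(['https://example.com']); // placeholder start URL
await Actor.exit();
```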
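For the batch-processing tip, the idea is simply to slice the workload so only one batch's worth of data is in memory at a time. A generic sketch, not tied to any Apify API; `processInBatches` and `processItem` are hypothetical names standing in for your per-item logic:

```ts
// Process items in fixed-size batches; each batch finishes (and its
// intermediate objects become collectible) before the next one starts.
async function processInBatches<T>(
    items: T[],
    batchSize: number,
    processItem: (item: T) => Promise<void>,
): Promise<void> {
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        await Promise.all(batch.map(processItem));
    }
}
```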
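And for limiting concurrent tasks: if the scraper is built on Crawlee, the crawler constructors take a `maxConcurrency` option that caps parallel requests. The value here is just a starting point to tune against your memory allocation:

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Cap parallel requests so peak memory stays predictable.
    // 10 is an arbitrary starting point; lower it if runs still hit OOM.
    maxConcurrency: 10,
    async requestHandler({ request }) {
        // Hypothetical placeholder for the actual scraping logic.
        console.log(`Processing ${request.url}`);
    },
});
```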
But anyway, it’s a good idea to review your logic step by step to pinpoint where the issue might be.