How to set the RAM used by the crawler

At a glance

A community member wants to set the RAM available to a crawler in their Python code but cannot work out how to do it. Other members suggest using the Configuration class from the crawlee library and setting its memory_mbytes parameter. The script then runs without errors, but the original poster is unsure whether the setting takes effect, because crawling speed does not seem to have improved. There is no explicitly marked answer; the last reply points to a pending pull request on parallelism that may be related.

Grespino

I've scoured the docs and used ChatGPT/Perplexity. I cannot for the life of me work out how to set the RAM available to the crawler. I want to give it 20 GB; I have a 32 GB system.

11 comments

Exp

Hi, please refer to this link:
https://docs.apify.com/platform/limits

Grespino

I'm not using Apify. Is this not the place for Crawlee Python questions?

Exp

Are you going to set the RAM in your local Python code?

Exp

In most cases, we discuss Crawlee on Apify Actors here.

Mantisus

Hey @Grespino

Use https://crawlee.dev/python/api/class/Configuration#memory_mbytes

Grespino

Hi, so I’ve found that, but clearly I am not a good developer, because I haven’t figured out what I actually need to write in the code to use it.

from crawlee import Configuration

Doesn’t work for me.

Mantisus

Is it not working when used in this way?

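import asyncio

from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
from crawlee.configuration import Configuration


async def main() -> None:
    # memory_mbytes is given in megabytes: 20480 MB = 20 GB.
    crawler = HttpCrawler(
        configuration=Configuration(memory_mbytes=20480)
    )
    # Register a handler (it receives an HttpCrawlingContext) and run the crawler here.


asyncio.run(main())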
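The value 20480 passed to memory_mbytes in the snippet above is the 20 GB from the question expressed in megabytes. As a minimal sketch of that arithmetic, assuming 1 GB = 1024 MB: the helper name `gb_to_mbytes` is hypothetical and is not part of crawlee.

```python
# Hypothetical helper (not part of crawlee): convert gigabytes to the
# megabyte value that a parameter such as memory_mbytes expects.
MB_PER_GB = 1024  # assumption: binary convention, 1 GB = 1024 MB


def gb_to_mbytes(gigabytes: float) -> int:
    """Return `gigabytes` expressed in whole megabytes."""
    if gigabytes <= 0:
        raise ValueError("gigabytes must be positive")
    return int(gigabytes * MB_PER_GB)


if __name__ == "__main__":
    print(gb_to_mbytes(20))  # 20 * 1024 = 20480
```

The result could then be passed as `Configuration(memory_mbytes=gb_to_mbytes(20))`.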

Grespino

from crawlee.configuration import Configuration

That's the correction I was probably looking for. I've just tried it on my MacBook and it starts crawling without errors. When I get home I'll try it on my actual computer and let you know how it goes.

Grespino

OK, so it works in the script, but I'm not quite sure if it's working properly. It doesn't seem to be going faster, although that could be because I am only trying to do one domain at the moment rather than many in parallel?

Here's the autoscaling output from the console:

[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0
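To compare these numbers across runs, the log line can be split into its `name = value` pairs. The following is a minimal sketch using only the standard library. The function name `parse_autoscaling_line` is hypothetical, and the parser assumes nothing about the log format beyond what the line above shows.

```python
import re

# Log line copied verbatim from the thread.
LOG_LINE = (
    "[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; "
    "desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; "
    "client_info = 0.0"
)

# Matches "name = number" pairs such as "mem = 0.0" or "current_concurrency = 25".
_PAIR = re.compile(r"(\w+)\s*=\s*([0-9]*\.?[0-9]+)")


def parse_autoscaling_line(line: str) -> dict[str, float]:
    """Return every `name = value` pair found in `line` as a dict of floats."""
    return {key: float(value) for key, value in _PAIR.findall(line)}


if __name__ == "__main__":
    for name, value in parse_autoscaling_line(LOG_LINE).items():
        print(f"{name}: {value}")
```

For the line from the thread, this yields a current_concurrency of 25 against a desired_concurrency of 200, with mem reported as 0.0.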

Mantisus

There is a PR related to parallelism that is waiting to be merged: https://github.com/apify/crawlee-python/pull/780

Your issue may be related to the problem it solves.

Apify Discord Mirror