Apify Discord Mirror

Updated 2 months ago

How to set the RAM used by the crawler

At a glance

The community member is trying to set the RAM available to a crawler in their Python code, but is having trouble figuring out how to do it. The community members suggest using the Configuration class from the crawlee library and setting the memory_mbytes parameter. However, the community member is still unsure if it's working properly, as the crawling speed doesn't seem to have improved. There is no explicitly marked answer, but the community members provide suggestions and try to help the original poster resolve the issue.

I've scoured the docs and used ChatGPT/Perplexity. For the life of me, I cannot work out how to set the RAM available to the crawler. I want to give it 20 GB; I have a 32 GB system.
11 comments
I'm not using Apify. Is this not the place for Crawlee Python questions?
Are you going to set the RAM in your local Python code?
In most cases, we discuss Crawlee for Apify Actors here.
Hi, so I've found that, but clearly I am not a good developer because I haven't figured out what I actually need to write in the code to use it.

Plain Text
from crawlee import Configuration


Doesn’t work for me.
Is it not working when used in this way?

Plain Text
from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
from crawlee.configuration import Configuration


async def main() -> None:
    crawler = HttpCrawler(
        configuration=Configuration(memory_mbytes=20480)
    )
Plain Text
from crawlee.configuration import Configuration
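
For reference, the 20480 passed as memory_mbytes above is simply 20 GB expressed in megabytes (20 × 1024). A minimal sketch of that conversion follows. The helper names gb_to_mbytes and build_configuration are hypothetical and not part of crawlee; the only crawlee call used is the Configuration(memory_mbytes=...) constructor already shown in this thread, and it is imported lazily so the conversion itself can be run without crawlee installed.

```python
def gb_to_mbytes(gigabytes: float) -> int:
    """Convert gigabytes to megabytes, using 1 GB = 1024 MB."""
    return int(gigabytes * 1024)


def build_configuration(gigabytes: float):
    """Return a crawlee Configuration capped at the given memory (hypothetical helper).

    Returns None when crawlee is not installed, so this file still imports cleanly.
    """
    try:
        # Import path taken from the corrected snippet in this thread.
        from crawlee.configuration import Configuration
    except ImportError:
        return None
    return Configuration(memory_mbytes=gb_to_mbytes(gigabytes))


# 20 GB out of a 32 GB machine, as in the original question.
memory_mbytes = gb_to_mbytes(20)
print(memory_mbytes)  # 20480
```

The returned object would then be passed as configuration=... when constructing the crawler, as in the HttpCrawler snippet above.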


That's the correction I was probably looking for. I've just tried it on my MacBook and it starts crawling without errors. When I get home I'll try it on my actual computer and let you know how it goes.
OK, so it works in the script, but I'm not quite sure if it's working properly. It doesn't seem to be going faster, although that could be because I am only crawling one domain at the moment rather than many in parallel?

Here's the autoscaling output from the console:

[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0
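
One way to read a status line like the one above is to split it into its key = value pairs. The sketch below uses only the Python standard library and the exact log line quoted in this thread. The function name parse_autoscaling_line is hypothetical, and the interpretation of the fields is an assumption rather than something confirmed here: if mem reflects memory pressure, a value of 0.0 suggests memory is not what is holding the crawl back.

```python
import re

# The status line quoted in the thread, copied verbatim.
LOG_LINE = (
    "[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; "
    "desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; "
    "client_info = 0.0"
)


def parse_autoscaling_line(line: str) -> dict[str, float]:
    """Extract every `name = number` pair from a status line (hypothetical helper)."""
    pairs = re.findall(r"(\w+)\s*=\s*([0-9.]+)", line)
    return {name: float(value) for name, value in pairs}


stats = parse_autoscaling_line(LOG_LINE)

# Current concurrency is below the desired value, and mem is 0.0.
print(stats["current_concurrency"], stats["desired_concurrency"], stats["mem"])
```

With these numbers, the pool is running 25 tasks while aiming for 200, which fits the guess that a single-domain crawl, rather than the memory limit, is the constraint.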
There is a PR related to parallelism that is expected to be merged: https://github.com/apify/crawlee-python/pull/780

The problem it solves may be related to this one.