How to maximize throughput with concurrency settings?
I am deploying Crawlee in my Kubernetes cluster. At first I did not set a maximum (or minimum) number of concurrent tasks, but Crawlee kept taking more and more memory until the pod was killed by the cluster for exceeding its memory limit. Now I have set a conservative maximum, but I see that resources are heavily underutilized at times, depending on the page I am crawling. Is there a correct way to do this that I am not seeing? Or is it possible that there is a bug in how the maximum number of concurrent tasks is determined, so that Crawlee spawns too many of them?
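For context, crawlee-python exposes these knobs through ConcurrencySettings, which is passed to the crawler constructor. A minimal sketch, assuming a recent crawlee-python version (the import paths and the concrete numbers are illustrative, not recommendations):

```python
import asyncio

from crawlee import ConcurrencySettings
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    # Bounds for the autoscaled pool: Crawlee scales between min and max
    # based on system load, starting from desired_concurrency.
    concurrency_settings = ConcurrencySettings(
        min_concurrency=5,       # never drop below 5 parallel tasks
        desired_concurrency=10,  # starting point for autoscaling
        max_concurrency=50,      # hard upper bound
    )

    crawler = BeautifulSoupCrawler(concurrency_settings=concurrency_settings)

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```

Leaving min_concurrency unset keeps the pool free to scale down under memory pressure, while max_concurrency acts as the hard safety cap.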
There is an open issue dedicated to optimizing concurrency, covering the fact that the crawler does not reach optimal concurrency: https://github.com/apify/crawlee-python/issues/1224. It is currently in the backlog, though.
"but crawlee kept on taking more and more memory until the pod was killed by the cluster (took too much memory)"
This seems like an error, as Crawlee should not scale up once RAM usage reaches 80%. The problem may be in your configuration, though. Could you create an issue with the details?
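If the pod is still being OOM-killed, one thing worth checking is whether Crawlee knows the pod's actual memory limit; inside a container it may otherwise size its budget from the node's memory. A minimal sketch, assuming a recent crawlee-python version where Configuration accepts memory_mbytes (also settable via the CRAWLEE_MEMORY_MBYTES environment variable); the 3072 MiB figure is a hypothetical value for a pod with a 4 GiB limit:

```python
from crawlee.configuration import Configuration
from crawlee.crawlers import BeautifulSoupCrawler

# Cap Crawlee's memory budget below the pod's limit so the autoscaled pool
# throttles before Kubernetes OOM-kills the container. The same effect can
# be achieved by exporting CRAWLEE_MEMORY_MBYTES=3072 in the pod spec.
config = Configuration(memory_mbytes=3072)  # hypothetical: pod limit is 4 GiB

crawler = BeautifulSoupCrawler(configuration=config)
```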