Python crawlers running in parallel

Hi, I have a custom Python + requests Actor that works great. It's pretty simple: it works through a list of starting URLs and pulls out one piece of information per URL.
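For context, the Actor is roughly this (a simplified sketch; `extract_info` stands in for my real extraction logic, and in reality the URLs come from the Actor input and the results go to a dataset):

```python
import re
import requests

def extract_info(html: str) -> str | None:
    # Stand-in for my real per-URL extraction: grab the page <title>.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def run(start_urls: list[str]) -> list[dict]:
    # Sequential: one URL at a time.
    results = []
    for url in start_urls:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        results.append({"url": url, "value": extract_info(resp.text)})
    return results
```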

My question is: if (for example) one run of 1,000 input URLs takes an hour to complete, I would like to parallelize it 4 ways so that I can process 4,000 URLs in an hour.

What's the best way to do this? I could kick off 4 copies of the run with segmented data, but this seems like something Apify could support natively.
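By "segmented data" I mean something like this: split the input URL list into N chunks and start one run per chunk (just a sketch; the chunk count and how the runs get started are placeholders):

```python
def chunk(urls: list[str], n_parts: int) -> list[list[str]]:
    # Split the URL list into n_parts roughly equal slices,
    # one slice per Actor run.
    return [urls[i::n_parts] for i in range(n_parts)]

# Example: 4 runs, each getting ~1,000 of the 4,000 URLs.
# for part in chunk(start_urls, 4):
#     ...start a separate Actor run with `part` as its input...
```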

I saw that if I were using Crawlee (and therefore JS) I could use autoscaling: https://docs.apify.com/platform/actors/running/usage-and-resources. But is there a way to build a single Python-based Actor that uses more threads/CPU cores if needed?
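What I'm imagining is something like this inside a single Actor run, using just the standard library rather than anything Apify-specific (a sketch; `fetch_one` is a placeholder for my existing per-URL request + extraction):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_one(url: str) -> dict:
    # Placeholder for my existing per-URL logic.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return {"url": url, "status": resp.status_code}

def run_parallel(start_urls: list[str], workers: int = 4) -> list[dict]:
    # requests is blocking, so threads overlap the network waits;
    # ~4x throughput with 4 workers if the work is I/O-bound.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, start_urls))
```

Is something along those lines the recommended approach, or does the platform offer a more native way to scale a Python Actor?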