Apify Discord Mirror

Updated 3 months ago

How to set concurrency, CPUs, and memory correctly

At a glance

The community member is asking for help choosing the right concurrency, memory, and CPU settings for web scraping with PlaywrightCrawler. Another community member responds that the best concurrency settings depend on the available resources, the use case, and the website being scraped. They suggest referring to the PlaywrightCrawler and BasicCrawler documentation to set the crawling options, including the concurrency settings.

Hello, I would like to use PlaywrightCrawler for scraping, but the documentation does not make it clear how to correctly set up concurrency, memory, CPUs, etc. Can someone help me out? What is the best practice for setting up this crawler so that it scrapes in parallel? Thanks in advance!
Hello! The best concurrency settings really depend on the context, for instance, the available resources, the use case, and the website being scraped. You can set the crawling options when creating the PlaywrightCrawler: see https://crawlee.dev/python/api/class/PlaywrightCrawler#__init__ and https://crawlee.dev/python/api/class/BasicCrawler#__init__. For instance, you can set concurrency_settings: https://crawlee.dev/python/api/class/ConcurrencySettings.