Crawling one link at a time
Hi all, taking my first steps with Crawlee and trying out the basic example on https://crawlee.dev/python.
Is there a (clean) way to run the crawling one step at a time, with a step being a single execution of request_handler?
Hello, maybe you can use the concurrency_settings argument of the crawler and set max_concurrency=1. Alternatively, you could probably add a lock inside your request_handler.
Limiting concurrency:
https://crawlee.dev/python/api/class/BasicCrawler#__init__
https://crawlee.dev/python/api/class/ConcurrencySettings
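A minimal sketch of the concurrency approach, assuming a recent crawlee release where the crawler classes live under crawlee.crawlers (older versions import them from modules like crawlee.beautifulsoup_crawler):

```python
import asyncio

from crawlee import ConcurrencySettings
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    # max_concurrency=1 means the crawler never runs more than one request at a time.
    crawler = BeautifulSoupCrawler(
        concurrency_settings=ConcurrencySettings(max_concurrency=1),
    )

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        await context.enqueue_links()

    await crawler.run(['https://crawlee.dev/python'])


if __name__ == '__main__':
    asyncio.run(main())
```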
Adding lock:
https://docs.python.org/3/library/asyncio-sync.html#lock
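And a sketch of the lock approach: the crawler may still schedule requests concurrently, but the handler body runs for one request at a time (same import-path assumption as above):

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

# One lock shared by all handler invocations.
handler_lock = asyncio.Lock()


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        async with handler_lock:
            # Everything inside this block executes for a single request at a time.
            context.log.info(f'Processing {context.request.url} ...')
            await context.enqueue_links()

    await crawler.run(['https://crawlee.dev/python'])


if __name__ == '__main__':
    asyncio.run(main())
```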
Neat, thank you! I was looking for something built into the API, but that sounds like a simple enough solution!