Urbanus · 2w ago

Crawling one link at a time

Hi all, taking my first steps with Crawlee and trying out the basic example on https://crawlee.dev/python. Is there a (clean) way to run the crawl one step at a time, with a step being a single execution of request_handler?
2 Replies
Josef · 2w ago
Hello, maybe you can use the concurrency_settings argument of the crawler and set max_concurrency=1. Alternatively, you could add a lock inside your request_handler. Examples of both are sketched below.

Limiting concurrency:
https://crawlee.dev/python/api/class/BasicCrawler#__init__
https://crawlee.dev/python/api/class/ConcurrencySettings

Adding a lock:
https://docs.python.org/3/library/asyncio-sync.html#lock
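The first option would look roughly like this. This is a minimal sketch based on the basic example from the docs; the import paths changed between Crawlee versions (older releases use crawlee.beautifulsoup_crawler instead of crawlee.crawlers), so check against your installed version:

```python
import asyncio

from crawlee import ConcurrencySettings
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    # max_concurrency=1 caps the autoscaled pool at a single task,
    # so at most one request_handler runs at any given moment.
    crawler = BeautifulSoupCrawler(
        concurrency_settings=ConcurrencySettings(max_concurrency=1),
    )

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        await context.enqueue_links()

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```

With max_concurrency=1 the requests are processed strictly one after another, which should match your definition of a "step".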
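The lock variant would look something like the sketch below. Note it serializes only the handler body, not the crawler's scheduling, so the crawler may still fetch several pages concurrently while the handlers wait on the lock:

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()
    handler_lock = asyncio.Lock()  # only one coroutine may hold this at a time

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        # Several handler coroutines may be scheduled concurrently,
        # but the lock guarantees their bodies never overlap.
        async with handler_lock:
            context.log.info(f'Processing {context.request.url} ...')
            await context.enqueue_links()

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())
```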
Urbanus (OP) · 2w ago
Neat, thank you! I was looking for something built into the API, but that sounds like a simple enough solution!
