Crawling one link at a time
Hi all, taking my first steps with Crawlee and trying out the basic example on https://crawlee.dev/python.
Is there a (clean) way to run the crawling one step at a time, with a step being a single execution of request_handler?
Hello, maybe you can use the concurrency_settings argument of the crawler and set max_concurrency=1. Alternatively, you could probably add a lock inside your request_handler.
Limiting concurrency:
https://crawlee.dev/python/api/class/BasicCrawler#__init__
https://crawlee.dev/python/api/class/ConcurrencySettings
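A minimal sketch of the concurrency approach, assuming a recent crawlee release where the crawler classes live under crawlee.crawlers (older versions import them from modules like crawlee.beautifulsoup_crawler):

```python
import asyncio

from crawlee import ConcurrencySettings
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    # max_concurrency=1 means the crawler never runs more than one request at a time.
    crawler = BeautifulSoupCrawler(
        concurrency_settings=ConcurrencySettings(max_concurrency=1),
    )

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        await context.enqueue_links()

    await crawler.run(['https://crawlee.dev/python'])


if __name__ == '__main__':
    asyncio.run(main())
```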
Adding lock:
https://docs.python.org/3/library/asyncio-sync.html#lock
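And a sketch of the lock approach: the crawler may still schedule requests concurrently, but the handler body runs for one request at a time (same import-path assumption as above):

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

# One lock shared by all handler invocations.
handler_lock = asyncio.Lock()


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        async with handler_lock:
            # Everything inside this block executes for a single request at a time.
            context.log.info(f'Processing {context.request.url} ...')
            await context.enqueue_links()

    await crawler.run(['https://crawlee.dev/python'])


if __name__ == '__main__':
    asyncio.run(main())
```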
Neat, thank you! I was looking for something built into the API, but that sounds like a simple enough solution!