extended-salmon
extended-salmon8mo ago

Max Depth option

Hello! Just wondering whether it is possible to set max depth for the crawl? Previous posts (2023) seems to make use of 'userData' to track the depth. Thank you.
4 Replies
Hall
Hall8mo ago
Someone will reply to you shortly. In the meantime, this might help:
optimistic-gold
optimistic-gold8mo ago
Hi, this can be set. You can add a maxdepth parameter in the INPUT_SCHEMA.json file, then retrieve it from the code and implement the specified depth functionality.
lemurio
lemurio8mo ago
@mjh Hey, when using python, you can use max_crawl_depth (https://crawlee.dev/python/api/class/BasicCrawlerOptions#max_crawl_depth) . Otherwise, you can track the current depth using request.userData. When adding new links in the request handler, just increment the depth by one and use that for each newly enqueued request (https://crawlee.dev/api/core/class/Request)
Request | API | Crawlee · Build reliable crawlers. Fast.
Crawlee helps you build and maintain your crawlers. It's open source, but built by developers who scrape millions of pages every day for a living.
BasicCrawlerOptions | API | Crawlee for Python · Fast, reliable Pyt...
Crawlee helps you build and maintain your Python crawlers. It's open source and modern, with type hints for Python to help you catch bugs early.
extended-salmon
extended-salmonOP8mo ago
Got this. thank you

Did you find this page helpful?