adverse-sapphire · 15mo ago

Prevent automatic reclaim of failed requests

Hi everyone! Hope you're all doing well. I have a small question about Crawlee. My use case is a little simpler than a full crawl: I just want to scrape a single URL every few seconds. To do this, I create a RequestList with just one URL and start the crawler. Sometimes the crawler hits HTTP errors and the request fails. I don't mind, since I'm going to run the crawler again after a few seconds anyway, so I'd prefer failed requests to be ignored rather than automatically reclaimed. Is there a way of doing this?
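For reference, this is roughly what I'm doing (the URL and interval are placeholders, and I vary the uniqueKey so the repeated URL isn't deduplicated by the request queue):

import { BasicCrawler } from 'crawlee';

const URL = 'https://example.com/status'; // placeholder target

const crawler = new BasicCrawler({
    async requestHandler({ request, log }) {
        // fetch and process the single page here
        log.info(`Scraping ${request.url}`);
    },
});

// Poll the same URL every few seconds.
for (;;) {
    await crawler.run([{ url: URL, uniqueKey: `${URL}#${Date.now()}` }]);
    await new Promise((resolve) => setTimeout(resolve, 5000));
}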
4 Replies
Hall · 15mo ago
This post has been pushed to the community knowledgebase. Any replies in this thread will be synced to the community site.
adverse-sapphire · 15mo ago
You can simply set the maxRequestRetries option to 0:
const crawler = new BasicCrawler({
    maxRequestRetries: 0,
    ...
});
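If you also want the final failure handled quietly instead of logged as an error, you could combine it with a failedRequestHandler. A rough sketch (untested):

import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    maxRequestRetries: 0,
    async requestHandler({ request, log }) {
        // fetch and process the page here
        log.info(`Handling ${request.url}`);
    },
    // With zero retries this runs on the first failure, replacing the
    // default handler that logs the request as failed.
    failedRequestHandler({ request }, error) {
        console.warn(`Ignoring failure for ${request.url}: ${error.message}`);
    },
});

await crawler.run(['https://example.com']);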
adverse-sapphire (OP) · 15mo ago
Maybe I misunderstood how the library works, but wouldn't that just make the request reach the failed status faster? Correct me if I'm wrong, but what I understood is:
- the URL is added to the requests;
- if a request fails, it is retried up to maxRequestRetries times;
- if it still fails, it is marked as failed and can be reclaimed.
Oleg V. · 15mo ago
I guess you can use the noRetry option: https://crawlee.dev/api/next/core/class/Request#noRetry
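Something like this, passing it in the request options (just a sketch, not tested):

import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    async requestHandler({ request, log }) {
        log.info(`Handling ${request.url}`);
    },
});

// noRetry marks this particular request as non-retryable, so a failure
// won't cause it to be reclaimed and retried.
await crawler.run([{ url: 'https://example.com', noRetry: true }]);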
