Apify Discord Mirror

Updated last year

Prevent Crawler from adding failed request to default RequestQueue

At a glance
The community member is using a PuppeteerCrawler to scrape product URLs and wants to prevent failed requests from being added to the default RequestQueue. They purposely throw an error when a request fails, expecting the failed request to go back to the RequestList, but it is instead added to the default RequestQueue, which is not the desired behavior. The comments suggest not throwing an error if the same request should not be retried, and instead handling retries inside the scraping logic, since retrying on errors is how blocking is resolved.
Is there a way to prevent the crawler from adding a failed request to the default RequestQueue?

Plain Text
const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestHandler: router,
    maxRequestRetries: 25,
    requestList: await RequestList.open(null, [initUrl]),
    requestHandlerTimeoutSecs: 2000,
    maxConcurrency: 1,
}, config);

I'm using the default RequestQueue to add productUrls, and they're being handled inside the defaultRequestHandler, but when some of them fail, I purposely throw an Error, expecting the failed request (which is the initUrl) to go back to the RequestList, but it goes to the default RequestQueue too, which is not what I want.
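For context, when a crawler is given both a RequestList and a RequestQueue, requests from the list are enqueued into the queue before processing, so a retry triggered by a thrown error is reclaimed through the queue rather than the list. Below is a minimal sketch of the setup described above; the handler names, label, and selector are assumptions, not the original code.

Plain Text
import { createPuppeteerRouter } from 'crawlee';

const router = createPuppeteerRouter();

// Default handler for the initUrl: collect product links and push them
// into the default RequestQueue (selector and label are assumptions).
router.addDefaultHandler(async ({ enqueueLinks }) => {
    await enqueueLinks({
        selector: 'a.product-link',
        label: 'PRODUCT',
    });
});

// Product handler: throwing here tells the crawler the request failed,
// so it reclaims the request for another attempt through the default
// RequestQueue until maxRequestRetries is exhausted.
router.addHandler('PRODUCT', async ({ page, request }) => {
    const title = await page.title();
    if (!title) throw new Error(`Failed to scrape ${request.url}`);
});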
2 comments
Do not throw an error if you do not want the same request to be retried; the retry-on-error logic in the scraper exists to resolve blocking through retries.
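A minimal sketch of that advice, assuming the router from above and a hypothetical scrapeProduct() helper: handle the error inside the handler (or retry the scraping logic locally) instead of rethrowing, so the crawler marks the request as handled and does not reclaim it into the default RequestQueue. Crawlee also exposes request.noRetry and a NonRetryableError for cases where a failing request should not be retried at all.

Plain Text
router.addHandler('PRODUCT', async ({ page, request, log }) => {
    try {
        // scrapeProduct() is a hypothetical helper standing in for the
        // real extraction logic.
        await scrapeProduct(page);
    } catch (err) {
        // Swallow the error (or retry in place) so the crawler does not
        // re-enqueue this request.
        log.warning(`Scraping failed for ${request.url}: ${err.message}`);
        // If a retry should never happen for this request, an alternative
        // is to disable retries and rethrow:
        // request.noRetry = true;
        // throw err;
    }
});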