foreign-sapphire
foreign-sapphire•3y ago

How to make crawlee try to refetch?

If the response body of the HTTP API I crawl does not meet expectations but the HTTP status is 200, how can I mark the request as failed and have Crawlee fetch it again with the next proxy?
7 Replies
extended-salmon
extended-salmon•3y ago
From what I understood, you want to make a new request based on the data you receive from the initial request? If so, you can use the context object in the requestHandler to send a new request right away or to enqueue one, like this:
import { HttpCrawler, type RequestOptions } from '@crawlee/http';

const crawler = new HttpCrawler({
    async requestHandler({ crawler, sendRequest, request }) {
        // Send a request right away and get the response
        const { body } = await sendRequest({
            url: request.url,
        });

        // RequestOptions with a custom uniqueKey so Crawlee does not treat it as a duplicate request
        const newRequest: RequestOptions = {
            url: request.url,
            uniqueKey: Date.now().toString(),
        };

        // Enqueue the request again. Guard this with a check on `body` in real code,
        // otherwise every handled request enqueues another one and the crawl never ends.
        await crawler.addRequests([newRequest]);
    },
});

await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
]);
unwilling-turquoise
unwilling-turquoise•3y ago
Wouldn't that make Crawlee think it's a duplicate?
extended-salmon
extended-salmon•3y ago
Good point. From my understanding it shouldn't be a problem if you use sendRequest for the new request, but if you use crawler.addRequests you have to generate a uniqueKey manually for each RequestOptions so that it is not marked as a duplicate. I have updated my snippet to show how to do this.
unwilling-turquoise
unwilling-turquoise•3y ago
thanks
Alexey Udovydchenko
Alexey Udovydchenko•3y ago
throw new Error("REASONOFRETRY")
Lukas Krivka
Lukas Krivka•3y ago
You can also call session.retire() before the throw to ensure the session is discarded. Normally, a failed request only increases the session's error score.
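Putting the two suggestions together, a minimal sketch might look like the following. The names ensureValidResponse, isExpectedPayload, SessionLike and ContextLike are hypothetical and not part of Crawlee, and the "must be JSON with a data field" rule is only an assumed stand-in for whatever "meets expectations" means for your API. The small interfaces mirror only the context fields used here (request, body, session), so the snippet runs without the library installed.

```typescript
// Structural stand-ins for the parts of Crawlee's handler context used here
// (assumption: your real handler receives the actual Crawlee context instead).
interface SessionLike {
    retire(): void;
}

interface ContextLike {
    request: { url: string };
    body: string | { toString(): string };
    session?: SessionLike;
}

// Hypothetical check: replace with your own definition of an acceptable payload.
function isExpectedPayload(payload: unknown): boolean {
    return typeof payload === 'object' && payload !== null && 'data' in payload;
}

// Returns the parsed payload, or throws when a 200 response carries an unusable body.
function ensureValidResponse({ request, body, session }: ContextLike): unknown {
    let payload: unknown;
    try {
        payload = JSON.parse(body.toString());
    } catch {
        payload = undefined; // not JSON at all: treat it as a bad response
    }

    if (!isExpectedPayload(payload)) {
        // Discard the session so the retry does not reuse it.
        session?.retire();
        // Throwing inside the handler marks the request as failed so it gets retried.
        throw new Error(`Unexpected response body from ${request.url}`);
    }
    return payload;
}

// Wiring sketch (not executed here), assuming a proxyConfiguration is already set up:
// const crawler = new HttpCrawler({
//     useSessionPool: true,
//     proxyConfiguration,
//     maxRequestRetries: 5,
//     requestHandler: async (ctx) => {
//         const payload = ensureValidResponse(ctx);
//         // ...process payload...
//     },
// });
```

Retries are capped by maxRequestRetries, so a request that keeps returning a bad body eventually ends up in the failed-request handling instead of looping forever.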
foreign-sapphire
foreign-sapphireOP•3y ago
I got it, thanks all 💓
