rival-black
rival-black3y ago

Best practice to stop/crash the actor/crawler on high ratio of errors?

The following snippet works well for me, but it smells... does somebody have a cleaner approach?
// Every 3s, check the ratio of finished (= success) and failed requests and stop the process if it's too bad
setInterval(() => {
    const { requestsFinished, requestsFailed } = crawler.stats.state;
    if (requestsFailed > requestsFinished + 10) { // once failures exceed finishes by 10, stop trying bro
        console.warn(`💣 Too many failed requests, stopping! (${requestsFailed} failed, ${requestsFinished} finished)`);
        process.exit(1);
    }
}, 3000);
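
For reference, a gentler variant of the same watchdog (a sketch only, assuming Crawlee exposes crawler.autoscaledPool and its abort() method) stops the crawl itself instead of killing the whole process:

// Same check, but abort the crawler gracefully instead of calling process.exit(1)
const watchdog = setInterval(async () => {
    const { requestsFinished, requestsFailed } = crawler.stats.state;
    if (requestsFailed > requestsFinished + 10) {
        console.warn(`💣 Too many failed requests, aborting! (${requestsFailed} failed, ${requestsFinished} finished)`);
        clearInterval(watchdog); // stop checking once we've decided to abort
        await crawler.autoscaledPool?.abort(); // stops processing further requests; crawler.run() then resolves
    }
}, 3000);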
3 Replies
HonzaS
HonzaS3y ago
There is now some message on Apify which I guess comes from the crawler when there are problems. So maybe you can use that if you find out what generates that message.
rival-black
rival-blackOP3y ago
This @HonzaS guy knows stuff 🙏
Alexey Udovydchenko
You can use stats: https://crawlee.dev/api/browser-crawler/class/BrowserCrawler#stats. However, the approach itself is not safe - you are supposed to handle sessions and/or bot protection to resolve blocking through logic, not by hammering the website with many runs. I.e. set concurrency, max request retries, logic for session.markBad(), etc., and implement a scalable crawler.
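
Below is a minimal sketch of that kind of setup, assuming Crawlee's CheerioCrawler (the other crawler classes take the same options); errorHandler and failedRequestHandler are Crawlee options, and the concrete limits are illustrative only:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    maxConcurrency: 10,             // don't hammer the site with parallel requests
    maxRequestRetries: 3,           // give up on a single request after a few attempts
    useSessionPool: true,           // rotate sessions instead of retrying blindly
    persistCookiesPerSession: true,
    async requestHandler({ request, $ }) {
        // ... extraction logic ...
    },
    // Runs before a failed request is retried - a good place to flag a blocked session
    errorHandler({ session }) {
        session?.markBad();
    },
    // Runs after the last retry has failed
    failedRequestHandler({ request }) {
        console.warn(`Request ${request.url} failed too many times.`);
    },
});

await crawler.run(['https://crawlee.dev']);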
