Apify Discord Mirror

Updated 5 months ago

Best practice to stop/crash the actor/crawler on high ratio of errors?

At a glance

The post presents a code snippet that, every 3 seconds, compares the counts of finished and failed requests and stops the process once failed requests exceed finished requests by more than 10. The community members discuss potential improvements to this approach.

One community member suggests using the message feature on Apify to handle problems with the crawler. Another community member praises the original poster's knowledge.

A third community member recommends using the stats feature from the Crawlee library, but notes that the original approach is not safe. They suggest resolving blocking through session handling and bot-protection logic rather than by stopping the process, and recommend setting concurrency and max request retries and implementing a scalable crawler.

There is no explicitly marked answer in the comments.

The following snippet works well for me, but it smells... does somebody have a cleaner approach?

JavaScript
// Every 3s, compare the counts of finished (= successful) and failed requests and stop the process if failures get too far ahead
setInterval(() => {
  const { requestsFinished, requestsFailed } = crawler.stats.state
  if (requestsFailed > requestsFinished + 10) { // when failed 10 more than finished, stop trying bro
    console.warn(`💣 Too many failed requests, stopping! (${requestsFailed} failed, ${requestsFinished} finished)`)
    process.exit(1)
  }
}, 3000)
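
For context, a slightly gentler variant (a sketch, not from the thread) would abort the crawler through its autoscaled pool instead of killing the whole process, so that crawler.run() can resolve and any teardown logic still runs. This assumes a running Crawlee crawler instance named crawler; the 10-request threshold and 3-second interval are kept from the snippet above.

JavaScript
// Sketch: same check as above, but asks the crawler's autoscaled pool to abort
// instead of calling process.exit(1). Assumes `crawler` is already running,
// so crawler.autoscaledPool is defined.
const watchdog = setInterval(async () => {
  const { requestsFinished, requestsFailed } = crawler.stats.state
  if (requestsFailed > requestsFinished + 10) {
    console.warn(`Too many failed requests, aborting! (${requestsFailed} failed, ${requestsFinished} finished)`)
    clearInterval(watchdog)
    await crawler.autoscaledPool?.abort()
  }
}, 3000)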
3 comments
There is now a message on Apify which, I guess, comes from the crawler when there are problems. So maybe you can use that if you find out what is generating that message.
[Attachment: image.png]
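
If the goal is to surface this state in the Apify Console yourself rather than rely on the automatic message, a minimal sketch with the Apify SDK v3 could look like the following (assuming the apify package and that Actor.setStatusMessage and Actor.fail are available in your SDK version; the counts are hypothetical stand-ins for crawler.stats.state):

JavaScript
import { Actor } from 'apify';

await Actor.init();

// Hypothetical counts; in practice taken from crawler.stats.state
const requestsFailed = 42;
const requestsFinished = 10;

if (requestsFailed > requestsFinished + 10) {
  // Shows a custom status message for the run in the Apify Console
  await Actor.setStatusMessage(`Too many failed requests (${requestsFailed} failed, ${requestsFinished} finished)`);
  // Ends the run as failed with a non-zero exit code
  await Actor.fail('Aborted: failure rate too high');
}

await Actor.exit();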
This guy knows stuff 🙏
You can use stats: https://crawlee.dev/api/browser-crawler/class/BrowserCrawler#stats. However, the approach itself is not safe - you are supposed to handle sessions and/or bot protection to resolve blocking by logic, not by hammering the web site with many runs. I.e. set concurrency, max request retries, logic for session.markBad(), etc., and implement a scalable crawler.
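
To make that concrete, here is a rough sketch of the "resolve blocking by logic" setup with Crawlee's PlaywrightCrawler; the specific retry count, concurrency, and the 401/403/429 check are illustrative assumptions, not from the thread.

JavaScript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  maxRequestRetries: 5,          // give each request a few chances before it counts as failed
  maxConcurrency: 10,            // scale gradually instead of hammering the site
  useSessionPool: true,
  persistCookiesPerSession: true,
  async requestHandler({ page, request, response, session, log }) {
    // Treat typical bot-protection responses as a bad session and retry with another one
    if (response && [401, 403, 429].includes(response.status())) {
      session?.markBad();
      throw new Error(`Blocked on ${request.url} (status ${response.status()})`);
    }
    log.info(`Processing ${request.url}`);
    // ... extract data here ...
  },
  failedRequestHandler({ request, log }) {
    // Runs only after all retries are exhausted
    log.warning(`Request ${request.url} failed too many times.`);
  },
});

await crawler.run(['https://crawlee.dev']);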