multiple-amethyst•2y ago

How to close the crawler from a RequestHandler?

Hey folks, I want to stop the scraper/crawler if I hit some arbitrary condition. Is there a way to do so from inside the RequestHandler? The closest function I found is crawler.teardown(), but it can't be executed inside a handler.
5 Replies
Alexey Udovydchenko•2y ago
Instead of await crawler.run(), call crawler.run() without await, then call teardown from your own code outside the crawler once your condition or event has been handled.
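A minimal sketch of that control flow, for illustration only. `StubCrawler` is a hypothetical stand-in that only mimics the `run()` / `teardown()` shape mentioned in this thread; it is not the real crawler class, and whether the real crawler shuts down cleanly when used this way is not confirmed here.

```typescript
// Hypothetical stand-in for a crawler: run() stays pending until teardown() is called.
class StubCrawler {
  running = false;
  private finish: () => void = () => {};

  run(): Promise<void> {
    this.running = true;
    // The promise settles only when teardown() calls `finish`.
    return new Promise<void>((resolve) => {
      this.finish = resolve;
    });
  }

  async teardown(): Promise<void> {
    this.running = false;
    this.finish();
  }
}

const crawler = new StubCrawler();

// Start the crawl WITHOUT awaiting, so our own code keeps control.
const runPromise = crawler.run();
let finished = false;
runPromise.then(() => {
  finished = true; // flips once the run has actually ended
});
const runningAfterStart = crawler.running;

// Later, when our own condition or event fires, stop from the outside.
void crawler.teardown();
const runningAfterTeardown = crawler.running;
```

The point is only the ordering: the run is started, control returns immediately, and the stop decision lives in code outside the crawler.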
multiple-amethystOP•2y ago
The issue is that the conditions are triggered in specific routes of a site. For example, we have a resume function in our Selenium scrapers which checks for duplicates, and if some n number appear in a row we stop scraping, assuming the rest of the data will have been scraped already too. We have a couple of other such conditions as well, so it would be helpful if something like this were available inside the request handlers.

Similar question: how do I stop the request handler flow if some condition is satisfied? E.g. if some element is not present, stop the function right there. Will a simple return; suffice, since we don't return anything anyway and just enqueue links?

Can someone help with this? We could really use this functionality to avoid duplicate/redundant scrapes etc.

Or is there a way we can empty out the request queue? I think this might work, since the crawler will stop as soon as it sees there's nothing to scrape.
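To make the "n duplicates in a row" condition concrete, here is a rough sketch of the bookkeeping, independent of any crawler library. `DuplicateStreak` and its `record` method are hypothetical names for illustration; what to do once it returns true (stop enqueueing, tear down, etc.) is left open.

```typescript
// Hypothetical helper: tracks consecutive duplicates and signals when to stop.
class DuplicateStreak {
  private seen = new Set<string>();
  private streak = 0;
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records a key; returns true once `limit` duplicates have appeared in a row.
  record(key: string): boolean {
    if (this.seen.has(key)) {
      this.streak += 1;
    } else {
      this.seen.add(key);
      this.streak = 0; // a fresh item breaks the run of duplicates
    }
    return this.streak >= this.limit;
  }
}

const streak = new DuplicateStreak(3);
// "a" and "b" are new; the next three keys are all repeats.
const results = ["a", "b", "a", "b", "a"].map((key) => streak.record(key));
console.log(results);
```

Only the fifth call reports that the stop condition is met, because that is the third duplicate in a row.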
Lukas Krivka•2y ago
For reference, this was answered here: https://discord.com/channels/801163717915574323/1075487274424352888/1179490876850974810. If you want to stop the request handler itself, you need to have a condition at that point; JS doesn't allow cancelling functions/promises from the outside.
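A small sketch of what "a condition at that point" can look like: guard clauses at the top of the handler that simply return. `PageLike`, `state`, `queue`, and `handlePage` are hypothetical names for illustration, not library types, and the handler is written synchronously for brevity; a plain `return` ends an async function the same way.

```typescript
// Hypothetical stand-in for whatever a real handler receives.
interface PageLike {
  url: string;
  hasTitle: boolean;
  links: string[];
}

// Shared flag that any handler invocation (or outside code) can flip.
const state = { stopRequested: false };

// Stand-in for the request queue; a real handler would enqueue links instead.
const queue: string[] = [];

function handlePage(page: PageLike): void {
  if (state.stopRequested) return; // a stop was requested elsewhere: do nothing
  if (!page.hasTitle) return; // required element missing: bail out early
  queue.push(...page.links);
}

handlePage({ url: "https://example.com/1", hasTitle: true, links: ["https://example.com/2"] });
handlePage({ url: "https://example.com/2", hasTitle: false, links: ["https://example.com/3"] });
state.stopRequested = true;
handlePage({ url: "https://example.com/4", hasTitle: true, links: ["https://example.com/5"] });
console.log(queue);
```

Only the first call enqueues anything. A handler that has already passed its checks still runs to completion, since nothing can cancel it from the outside.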
multiple-amethystOP•2y ago
Yeah, that should not be an issue since we can control the number of requests via maxConcurrency. Thanks!