Skip request in preNavigationHooks

Is it possible to skip the request for a URL in preNavigationHooks? If something occurs in preNavigationHooks, I don't want the request to be made or to reach the request handler at all. The only thing that has worked for me is throwing a NonRetryableError, but I don't think this is ideal. request.skipNavigation is not ideal either, because the request itself still occurs. At the moment I'm using NonRetryableError, but my logs are ugly. How do I suppress the logs? And I suspect that too many NonRetryableErrors will cause Crawlee to fail?
4 Replies
Olexandra, 4w ago
Hi @André Mácola! Setting skipNavigation to true on the request might make a bit more sense in your case. As you said, the request will still reach the requestHandler, but you can return right away with little to no logging. If you throw a NonRetryableError, it will be logged like every other error that causes a request to fail. There is no official way to change how this logging works, but you can try to monkey-patch it. If you're interested, you could try searching the history of this server.
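A minimal sketch of that early-return idea, assuming the only thing that matters is the `skipNavigation` flag on the request. To keep it runnable without installing anything, `SkippableRequest` is a local stand-in for the relevant fields of Crawlee's request object, and `shouldSkip`, `flagSkipped`, `handleRequest` and `processedUrls` are hypothetical names, not part of Crawlee. In a real crawler, `flagSkipped` would go into the `preNavigationHooks` array and `handleRequest` would be the `requestHandler`.

```typescript
// Local stand-in for the request fields this sketch touches (assumption:
// the real Crawlee request exposes `url` and a writable `skipNavigation`).
interface SkippableRequest {
    url: string;
    skipNavigation?: boolean;
}

// Hypothetical condition; replace with whatever "something occurs" means for you.
const shouldSkip = (url: string): boolean => url.includes('/private/');

// Shaped like a preNavigationHooks entry: flag the request instead of throwing.
async function flagSkipped({ request }: { request: SkippableRequest }): Promise<void> {
    if (shouldSkip(request.url)) {
        request.skipNavigation = true;
    }
}

// Collected here only so the effect is easy to observe.
const processedUrls: string[] = [];

// Shaped like a requestHandler: return before touching any page data.
async function handleRequest({ request }: { request: SkippableRequest }): Promise<void> {
    if (request.skipNavigation) return; // nothing thrown, so no error is logged
    processedUrls.push(request.url);
}
```

Because nothing is thrown, the skipped request is not reported as a failed request the way a NonRetryableError would be.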
André Mácola (OP), 3w ago
There is an issue about this on GitHub: https://github.com/apify/crawlee/issues/3107
GitHub
Rethink how skipNavigation works · Issue #3107 · apify/crawlee
The current implementation causes the requestHandler to break the type contract (e.g., context.$ will be undefined with CheerioCrawler if skipNavigation is used). We could avoid calling the request...
Matous, 3w ago
Hey, I don't think that issue is really connected to your case, but if you are worried about this possibility, you could still combine skipNavigation with middleware (fired from requestHandler before the route is selected): https://crawlee.dev/js/api/core/class/Router#use. Or, since you are already setting skipNavigation, you could also set a specific label like "SKIP" and just return from the route handler...
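A rough sketch of the "SKIP" label idea, written so that it runs on its own. `LabeledRequest`, `isUnwanted`, `markSkipped`, `routes`, `route` and `handledUrls` are hypothetical names for illustration, and the small `Map` only imitates label-based routing. With Crawlee itself, the equivalent would presumably be a router from `createCheerioRouter()` with `router.addHandler('SKIP', async () => {})` and a default handler, plus `markSkipped` in `preNavigationHooks`.

```typescript
// Local stand-in for the request fields used here (assumption: the real
// request object has writable `label` and `skipNavigation`).
interface LabeledRequest {
    url: string;
    label?: string;
    skipNavigation?: boolean;
}
type LabeledContext = { request: LabeledRequest };
type LabeledHandler = (ctx: LabeledContext) => Promise<void>;

const SKIP_LABEL = 'SKIP';

// Hypothetical rule for this sketch: do not process PDF links.
const isUnwanted = (url: string): boolean => url.endsWith('.pdf');

// Shaped like a preNavigationHooks entry: skip navigation and relabel the request.
const markSkipped: LabeledHandler = async ({ request }) => {
    if (!isUnwanted(request.url)) return;
    request.skipNavigation = true;
    request.label = SKIP_LABEL;
};

// Collected here only so the effect is easy to observe.
const handledUrls: string[] = [];
const defaultHandler: LabeledHandler = async ({ request }) => {
    handledUrls.push(request.url);
};

// Label -> handler table standing in for the router.
const routes = new Map<string, LabeledHandler>([
    [SKIP_LABEL, async () => { /* skipped on purpose: nothing to do */ }],
]);

// Pick the handler by label, falling back to the default one.
const route: LabeledHandler = async (ctx) => {
    const handler = routes.get(ctx.request.label ?? '') ?? defaultHandler;
    await handler(ctx);
};
```

Keeping the no-op in its own route means the other handlers never see a skipped request, so they do not have to check `skipNavigation` themselves.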
Solution
André Mácola
hmmm I like the idea with SKIP label.. I'll try that. Thanks
