skipNavigation per route label, instead of manually adding it to each request with given label
At a glance
The community member wants the crawler to fetch and parse pages automatically for every route handler except one when using the Cheerio, JSDOM, and LinkeDOM crawlers and their routers. Currently, they have to remember to set skipNavigation every time they add such a request to the request queue. One suggested workaround is to throw a NonRetryableError in a preNav hook and monkeypatch the log to hide the error, but by the community member's reading of the source this makes the whole _runRequestHandler fail, so the request handler never runs. What they want is a way to skip navigation dynamically while still running the request handler, so that they can do custom navigation there, for example via gotScraping.
Use case: When using Cheerio, JSDOM, LinkeDOM crawlers and their routers. I often wanna automatically request+parse all the route handlers except one. ATM I have to remember to specify skipNavigation at every point of adding the request to request queue. (IIUC)
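A minimal sketch of one way to avoid repeating the flag at every call site: funnel all requests through a single helper that sets skipNavigation based on the label. The helper name `applySkipNavigation`, the `RequestLike` type, and the `DETAIL_API` label are hypothetical; only the `url`, `label`, and `skipNavigation` fields are modeled here.

```typescript
// Minimal local stand-in for the request fields this sketch relies on.
interface RequestLike {
    url: string;
    label?: string;
    skipNavigation?: boolean;
}

// Hypothetical helper: returns a copy of the list in which every request
// whose label is in `skipLabels` has skipNavigation set to true.
// Requests with other labels (or no label) are passed through unchanged.
function applySkipNavigation<T extends RequestLike>(
    requests: T[],
    skipLabels: ReadonlySet<string>,
): T[] {
    return requests.map((req) =>
        req.label !== undefined && skipLabels.has(req.label)
            ? { ...req, skipNavigation: true }
            : req,
    );
}

// Labels whose handlers do their own fetching (assumed label name).
const SKIP_LABELS = new Set<string>(['DETAIL_API']);

const prepared = applySkipNavigation(
    [
        { url: 'https://example.com/list', label: 'LIST' },
        { url: 'https://example.com/api/item/1', label: 'DETAIL_API' },
    ],
    SKIP_LABELS,
);
```

The prepared array would then be handed to whatever call adds requests to the queue, so the label-to-flag mapping lives in one place. This still sets the flag at enqueue time; it does not make the skip dynamic.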
Hey! If you know upfront that you want to skip it, then defining it when adding the request sounds good enough. For a dynamic skip, what we do is throw NonRetryableError in a pre-navigation hook and then monkeypatch the log so that the error is not logged. Ugly, but it works.
If I'm reading the source code correctly, throwing in a preNav hook would cause the whole _runRequestHandler to fail, so the requestHandler would not run at all.
I would like to dynamically skipNavigation, but still run the requestHandler so I can do "custom navigation" there, e.g. via gotScraping.
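To illustrate the shape of the desired handler, here is a rough sketch in which the handler fetches the page itself whenever the crawler skipped navigation. Everything here is an assumption for illustration: `HandlerContext`, `Fetcher`, and `makeHandler` are hypothetical names, and the injected `fetchBody` function stands in for a real HTTP client such as gotScraping.

```typescript
// Minimal local stand-ins; a real crawling context carries many more fields.
interface SkippableRequest {
    url: string;
    skipNavigation?: boolean;
}

interface HandlerContext {
    request: SkippableRequest;
    body?: string; // present only when the crawler navigated for us
}

// Placeholder for a real HTTP client call.
type Fetcher = (url: string) => Promise<string>;

// Hypothetical factory: builds a handler that does its own "navigation"
// for requests flagged with skipNavigation and otherwise uses the body
// the crawler already downloaded.
function makeHandler(fetchBody: Fetcher) {
    return async ({ request, body }: HandlerContext): Promise<string> => {
        if (request.skipNavigation) {
            // Custom navigation: the handler is responsible for the fetch.
            return fetchBody(request.url);
        }
        if (body === undefined) {
            throw new Error(`No body available for ${request.url}`);
        }
        return body;
    };
}
```

This only covers the case where skipNavigation is already set when the handler runs; deciding the skip dynamically, after the request is enqueued, is the open part of the question.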