Apify Discord Mirror

Updated last year

skipNavigation per route label, instead of manually adding it to each request with given label

At a glance

The community member wants a way to set skipNavigation once per route label when using the Cheerio, JSDOM, and LinkeDOM crawlers, so that every route handler except one gets the automatic request and parse. Currently, they have to remember to specify skipNavigation everywhere a request is added to the request queue. One reply suggests that, for dynamic skipping, a NonRetryableError can be thrown in a pre-navigation hook, with the log monkey-patched to hide the error. The community member points out that throwing there would fail the whole _runRequestHandler, so the request handler would not run at all. They would like to skip navigation dynamically while still running the request handler, so they can do custom navigation there, for example via gotScraping.

Use case: When using Cheerio, JSDOM, LinkeDOM crawlers and their routers. I often wanna automatically request+parse all the route handlers except one.
ATM I have to remember to specify skipNavigation at every point of adding the request to request queue. (IIUC)
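
For illustration, the "set it once per label" idea can be approximated today with a small helper applied right before requests are enqueued. This is only a sketch: `withSkipNavigation` and `RequestLike` are hypothetical names, not part of any library, and the only thing assumed from the crawler is that request options accept `label` and `skipNavigation`.

```typescript
// Minimal structural type with only the fields this sketch touches.
// `label` and `skipNavigation` mirror the request options of the same names.
interface RequestLike {
    url: string;
    label?: string;
    skipNavigation?: boolean;
}

// Hypothetical helper (not a library API): returns copies of the requests,
// setting skipNavigation on those whose label is in `labelsToSkip`.
// Requests with other labels, or no label, are returned unchanged.
function withSkipNavigation(
    requests: RequestLike[],
    labelsToSkip: ReadonlySet<string>,
): RequestLike[] {
    return requests.map((req) =>
        req.label !== undefined && labelsToSkip.has(req.label)
            ? { ...req, skipNavigation: true }
            : req,
    );
}
```

Every enqueue call would then go through the helper, e.g. `crawler.addRequests(withSkipNavigation(requests, new Set(['API'])))`, where the `API` label is made up. It still has to be applied at each call site (or inside one shared wrapper), so it reduces the repetition rather than removing it.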

Just food for thought, not urgent 🙂
Attachment
Screen_2024-01-05_at_13.55.56.png
3 comments
or maybe you have some nice out-of-the-box idea/pattern for how to do this? 🥽
Hey 🙂 If you know upfront that you want to skip it, then defining it when adding the request sounds good enough. For dynamic skip, what we do is throw NonRetryableError in the pre-nav hook and then monkey-patch log so that it is not logged. Ugly, but works.
If I'm reading the source code correctly, throwing in preNav hook would cause whole _runRequestHandler to fail = not running the requestHandler altogether

I would like to dynamically skipNavigation, but still run requestHandler so I can do "custom navigation" there, e.g. via gotScraping
Attachment
image.png
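
A rough sketch of the handler side of that idea, for a request that is already marked with skipNavigation. It does not solve deciding the skip at runtime, which is the open question in this thread. All names here (`HandlerContext`, `Fetcher`, `makeHandler`) are hypothetical, and the injected `fetchBody` function stands in for a gotScraping call so the sketch has no dependencies.

```typescript
// Structural stand-in for the crawling context; only the fields used below.
interface HandlerContext {
    request: { url: string; skipNavigation?: boolean };
    body?: string; // assumed to be present only when the crawler navigated
}

// Stand-in for a custom HTTP call (e.g. something built on gotScraping).
type Fetcher = (url: string) => Promise<string>;

// Builds a handler that does its own "navigation" when the crawler skipped it,
// and otherwise uses the body the crawler already downloaded.
function makeHandler(fetchBody: Fetcher) {
    return async (ctx: HandlerContext): Promise<string> => {
        if (ctx.request.skipNavigation) {
            return fetchBody(ctx.request.url);
        }
        return ctx.body ?? '';
    };
}
```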