Apify Discord Mirror

Updated 2 years ago

How can I dynamically add a JS function string to `preNavigationHooks`?

At a glance

The community member is trying to dynamically add a JavaScript function, supplied as a string, to the preNavigationHooks array in the CheerioCrawlerOptions. However, when they push the string directly, the crawler fails with the error "hook is not a function".

In the comments, a community member suggests using eval(jsFunction), but the original poster notes that eval can be harmful. The original poster then shares their own solution: a helper (evalArrayJSFunction) turns a string containing an array of hooks into an array of functions, and a custom _runHookWithEnhancedContext function wraps each hook so that it also receives customData.

Another community member suggests that in this case, it might be better to simply extend the Crawler class, but there is no explicitly marked answer in the post.

I would like to dynamically add a string (containing a JS function) to the preNavigationHooks array in CheerioCrawlerOptions [1]:

Plain Text
import { CheerioCrawler, log } from 'crawlee';

const crawlerOptions = {
  // ...other options
  preNavigationHooks: [],
};
const jsFunction = "async ({ page, request }) => { log.info(`preNavigationHook ${request.url}`); }";
crawlerOptions.preNavigationHooks.push(/* ??? WHAT ??? */);
const myCrawler = new CheerioCrawler(crawlerOptions);


If I do `crawlerOptions.preNavigationHooks.push(jsFunction);`, I get this error when I run the crawler:
WARN CheerioCrawler: Reclaiming failed request back to the list or queue. TypeError: hook is not a function
    at CheerioCrawler._executeHooks (D:\Developpement\NodeJS\Nowis_Scraper\node_modules\@crawlee\basic\internals\basic-crawler.js:834:23)
    at CheerioCrawler._handleNavigation (D:\Developpement\NodeJS\Nowis_Scraper\node_modules\@crawlee\http\internals\http-crawler.js:326:20)
    at CheerioCrawler._runRequestHandler (D:\Developpement\NodeJS\Nowis_Scraper\node_modules\@crawlee\http\internals\http-crawler.js:286:24)

[1] https://crawlee.dev/api/cheerio-crawler/interface/CheerioCrawlerOptions#preNavigationHooks
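The trace shows the error being thrown inside the crawler's hook loop (`_executeHooks`), which calls each entry of preNavigationHooks; a string cannot be called, so the call throws. A minimal sketch of that failure, plus a guard that catches it before the crawler starts. Both runHooks (a hypothetical simplification of the hook loop, not Crawlee's actual code) and assertHooksAreFunctions are illustrative helpers, not part of any library.

```javascript
// Hypothetical, simplified stand-in for the crawler's hook loop; Crawlee's
// real _executeHooks differs, but it also calls each entry as a function.
async function runHooks(hooks, ctx) {
  for (const hook of hooks) {
    await hook(ctx); // TypeError when `hook` is a string, not a function.
  }
}

// Hypothetical guard that rejects non-callable hooks up front.
function assertHooksAreFunctions(hooks) {
  hooks.forEach((hook, i) => {
    if (typeof hook !== 'function') {
      throw new TypeError(`preNavigationHooks[${i}] is a ${typeof hook}, expected a function`);
    }
  });
  return hooks;
}

const jsFunction = "async ({ page, request }) => { log.info(`preNavigationHook ${request.url}`); }";
const ctx = { request: { url: 'https://example.com' } };

// Passing the raw string reproduces the error from the question.
runHooks([jsFunction], ctx).catch((err) => console.log(err.message)); // "hook is not a function"
```

Calling assertHooksAreFunctions(crawlerOptions.preNavigationHooks) before constructing the crawler turns the per-request warning into a single, immediate error that names the offending index.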
3 comments
You can try `push(eval(jsFunction))`.
Thanks for your suggestion. But eval can be harmful.
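If the hook has to arrive as a string, it must be compiled into a function before being pushed. A minimal sketch, assuming the string comes from a trusted source: it uses the `Function` constructor instead of eval, and passes dependencies such as a logger in explicitly, because code built with `new Function` only sees the global scope, not the module's local variables (so `log` in the original string would otherwise be undefined). compileHook and the fake logger are hypothetical. Note that this still executes arbitrary code, so for untrusted input it carries the same risk as eval.

```javascript
// Hypothetical helper: compile a hook source string into a function.
// WARNING: like eval, this runs arbitrary code; only use trusted strings.
function compileHook(source, deps = {}) {
  const names = Object.keys(deps);
  // `new Function` bodies only see the global scope, so dependencies
  // (e.g. a logger) are passed in as named parameters.
  const factory = new Function(...names, `"use strict"; return (${source});`);
  const hook = factory(...names.map((name) => deps[name]));
  if (typeof hook !== 'function') {
    throw new TypeError('Hook source did not evaluate to a function');
  }
  return hook;
}

// A stand-in logger instead of Crawlee's `log`.
const messages = [];
const fakeLog = { info: (msg) => messages.push(msg) };

const jsFunction = "async ({ page, request }) => { log.info(`preNavigationHook ${request.url}`); }";
const hook = compileHook(jsFunction, { log: fakeLog });

hook({ request: { url: 'https://example.com' } }).then(() => {
  console.log(messages); // [ 'preNavigationHook https://example.com' ]
});
```

In a real crawler setup, the logger passed in would be the one imported from `crawlee`, and the returned function can be pushed into preNavigationHooks directly.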

Inspired by _runHookWithEnhancedContext [1], I wrote the following JS function:
Plain Text
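  // Method of the class that holds the actor input in `this.input`.
  _runHookWithEnhancedContext(hooks) {
    return hooks.map((hook) => (ctx, ...args) => {
      const { customData } = this.input;
      // Return the hook's result so async hooks are awaited,
      // and forward any extra arguments unchanged.
      return hook({ ...ctx, customData }, ...args);
    });
  }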

Then I do this:
Plain Text
const preNavigationHooks = "[ async (context) => { context.log.info(context.customData);\n context.log.info(`preNavigationHook ${context.request.url}`); }, ]";
// Transforms a pre/post navigation hooks string into an array of functions.
// evalArrayJSFunction is a custom helper (not shown).
const evaledPreNavigationHooks = evalArrayJSFunction(preNavigationHooks, 'preNavigationHooks');
// Push into the crawler options array, not into the string itself.
crawlerOptions.preNavigationHooks.push(...this._runHookWithEnhancedContext(evaledPreNavigationHooks));

[1] https://github.com/apify/apify-sdk-js/blob/master/packages/actor-scraper/cheerio-scraper/src/internals/crawler_setup.ts#L236
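The approach in this reply can be sketched end to end as follows. This is a minimal sketch under stated assumptions: compileHookArray is a hypothetical stand-in for evalArrayJSFunction, whose code is not shown in the thread. It is built on `new Function`, which executes the string as code just like eval, so it is only suitable for trusted input. ScraperSetup is a hypothetical class holding the input that provides customData. The wrapper returns each hook's result so that async hooks can be awaited.

```javascript
// Hypothetical stand-in for evalArrayJSFunction (its code is not shown).
// WARNING: like eval, this executes the string as code; trusted input only.
function compileHookArray(source, fieldName) {
  const value = new Function(`"use strict"; return (${source});`)();
  if (!Array.isArray(value)) {
    throw new TypeError(`${fieldName} must evaluate to an array`);
  }
  value.forEach((item, i) => {
    if (typeof item !== 'function') {
      throw new TypeError(`${fieldName}[${i}] is not a function`);
    }
  });
  return value;
}

// Hypothetical class that owns the input providing `customData`.
class ScraperSetup {
  constructor(input) {
    this.input = input;
  }

  // Wrap each hook so its context also carries `customData`; return the
  // hook's result so async hooks can be awaited, and forward extra args.
  _runHookWithEnhancedContext(hooks) {
    return hooks.map((hook) => (ctx, ...args) => {
      const { customData } = this.input;
      return hook({ ...ctx, customData }, ...args);
    });
  }
}

const preNavigationHooks = "[ async (context) => { context.log.info(context.customData);\n context.log.info(`preNavigationHook ${context.request.url}`); }, ]";

const setup = new ScraperSetup({ customData: { source: 'demo' } });
const evaledPreNavigationHooks = compileHookArray(preNavigationHooks, 'preNavigationHooks');

// Push into the options object (which would then go to new CheerioCrawler).
const crawlerOptions = { preNavigationHooks: [] };
crawlerOptions.preNavigationHooks.push(...setup._runHookWithEnhancedContext(evaledPreNavigationHooks));

// Simulate one hook call with a fake context and a console-based logger.
const fakeContext = {
  request: { url: 'https://example.com' },
  log: { info: (msg) => console.log(msg) },
};
crawlerOptions.preNavigationHooks[0](fakeContext);
```

Because the hook string uses `context.log` rather than a module-level `log`, it works even though `new Function` code cannot see module scope; that is a useful convention to keep for string-defined hooks.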
I guess in this case it would be better to simply extend the Crawler class.