hurt-tomato
Apify & Crawlee · 3y ago

Passing user data to the crawler?

Hello,
I am trying to find how to best handle the output of my scraper with Datasets.
I have a main request handler that dispatches to sub-handlers based on labels, and I would like one Dataset per label/sub-handler, each holding data in a specific format (basically a database table).
I could probably open and close named Datasets every time I process a request, but since (as far as I understand) Datasets are stored on disk, that seems quite wasteful in terms of disk I/O.
Is there a way to pass my datasets to the crawler so that any request handler can access them?

I know about Request's userData, but that would require passing the datasets explicitly to every new Request I create.
I would like to avoid global variables, especially since I would have to initialize them, which would be TypeScript-unfriendly.
I thought useState would be what I was looking for, but answers on this server suggest I am quite wrong, and I am fairly certain that SessionPools are not what I am looking for either.
If that makes any difference, I am using the CheerioCrawler.
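
To make the question concrete, here is a rough sketch of the shape I have in mind, written without Crawlee so it runs on its own. All names in it (`Sink`, `MemorySink`, `createRegistry`, `makeHandler`, the `PRODUCT` label) are hypothetical stand-ins, not Crawlee APIs. In real code the `open` callback would presumably wrap `Dataset.open(label)`, so the cached value would be a Promise that handlers await before calling `pushData`.

```typescript
type Row = Record<string, unknown>;

// Hypothetical stand-in for a Crawlee Dataset: anything that accepts rows.
interface Sink {
  pushData(row: Row): void;
}

// Memoize one resource per label so each one is opened at most once per run.
// `open` is an assumption: it could wrap Dataset.open(label), in which case
// T would be a Promise and callers would await the result.
function createRegistry<T>(open: (label: string) => T): (label: string) => T {
  const cache = new Map<string, T>();
  return (label: string): T => {
    if (!cache.has(label)) {
      cache.set(label, open(label));
    }
    return cache.get(label) as T;
  };
}

// In-memory sink used only for this illustration.
class MemorySink implements Sink {
  readonly rows: Row[] = [];
  pushData(row: Row): void {
    this.rows.push(row);
  }
}

// Sub-handlers receive the registry as an argument and keep it in a closure,
// so nothing has to travel through userData on every Request.
function makeHandler(getSink: (label: string) => Sink, label: string) {
  return (row: Row): void => {
    getSink(label).pushData(row);
  };
}

let openCount = 0; // counts how often a sink is actually opened
const getSink = createRegistry((label: string) => {
  openCount += 1;
  return new MemorySink();
});

const handleProduct = makeHandler(getSink, 'PRODUCT');
handleProduct({ title: 'A' });
handleProduct({ title: 'B' });
```

Both calls to `handleProduct` go through the same cached sink, so the opener runs once per label rather than once per request. What I cannot tell is whether Crawlee already offers something equivalent.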

Thanks!