Apify & Crawlee • 4y ago
6 replies
worthy-azure

How to share an object between requests with Crawlee on Apify

Hello. While scraping a website, I need access to an object that is shared between all requests. I keep some data in this object, and every request can read from and write to it. When all requests have been handled, I run some validation and calculations on the data and write the result to the Dataset.

It was easy in Apify SDK v2. I created an instance of the object and passed it as a parameter to the handleXY methods, like this:
const myData = new MyData();

const crawlerOptions = {
  handlePageFunction: async (context) => {
    // Dispatch on the request type and hand the shared object to the handler.
    switch (context.request.userData.type) {
      case "pageA":
        await handleBranch(myData);
        break;
      default:
        await handleStart(myData);
    }
  },
};

const crawler = new Apify.PuppeteerCrawler(crawlerOptions);
await crawler.run();
await Apify.pushData(myData.getData());


This works without any problems. I need to achieve the same behavior with Crawlee, and I want to use routing. Since I can't pass any parameters to the handlers, I create an instance of myData, attach it to the crawler, and then read it back from the crawler inside the handler, like this:

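// main.js
import { PuppeteerCrawler } from 'crawlee';
import { router } from './routes.js';

const crawler = new PuppeteerCrawler({ requestHandler: router });
crawler.myData = new MyData();

// routes.js
import { createPuppeteerRouter } from 'crawlee';

export const router = createPuppeteerRouter();

router.addDefaultHandler(async ({ crawler }) => {
  const myData = crawler.myData;
});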


However, I found that the run is sometimes restarted. It handles some requests, then a new Docker instance is created, and that instance handles the rest of the requests. When the new instance is created, the myData instance is lost.

2022-10-11T12:56:01.157Z INFO  Request N
2022-10-11T12:56:20.894Z ACTOR: Pulling Docker image from repository.
2022-10-11T12:56:42.031Z ACTOR: Creating Docker container.
2022-10-11T12:56:42.303Z ACTOR: Starting Docker container.
2022-10-11T12:56:54.251Z INFO Request N + 1


How can I solve this issue? Do I have to serialize this object to a Dataset/KeyValueStore? What about parallel requests? The best solution for me would be to keep all requests in one Docker instance. Is that possible somehow?
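
The serialization option asked about above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed solution. The internals of MyData are not shown in the post, so the class here is hypothetical and holds plain JSON-serializable data. MemoryStore is a hypothetical in-memory stand-in for a persistent key-value store with async getValue/setValue methods, and MY_DATA is a made-up record name. On the platform, the save step would presumably be triggered periodically and just before the container is replaced.

```javascript
// Hypothetical shared-state class; the real MyData is not shown in the post.
class MyData {
  constructor(items = {}) {
    this.items = items; // plain JSON-serializable data only
  }
  add(key, value) {
    this.items[key] = value;
  }
  getData() {
    return { ...this.items };
  }
}

// In-memory stand-in for a key-value store (assumption: the real store
// exposes async getValue/setValue and survives a container restart).
class MemoryStore {
  constructor() {
    this.records = new Map();
  }
  async getValue(key) {
    const raw = this.records.get(key);
    return raw === undefined ? null : JSON.parse(raw);
  }
  async setValue(key, value) {
    this.records.set(key, JSON.stringify(value));
  }
}

const STATE_KEY = 'MY_DATA'; // hypothetical record name

// Restore previously saved state, or start fresh on the first run.
async function loadMyData(store) {
  const saved = await store.getValue(STATE_KEY);
  return new MyData(saved ?? {});
}

// Save the current state; call this periodically and before a restart.
async function saveMyData(store, myData) {
  await store.setValue(STATE_KEY, myData.getData());
}

// Simulate two processes that share one persistent store.
async function simulateRestart() {
  const store = new MemoryStore();

  const first = await loadMyData(store); // first container
  first.add('pageA', 1);
  await saveMyData(store, first); // state written before the restart

  const second = await loadMyData(store); // new container, same store
  second.add('pageB', 2);
  return second.getData();
}

simulateRestart().then((data) => console.log(data)); // logs { pageA: 1, pageB: 2 }
```

In this sketch the second "container" sees the data written by the first only because the state was saved before the restart. Anything changed after the last save would still be lost, so how often to save is a trade-off.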