Running Crawlee multiple times with the same URL

Hi!

I am trying to build a crawler using PuppeteerCrawler. The crawler is started by sending a POST request to an API endpoint, implemented with Azure Durable Functions.

The first time I call the API it works as expected. The next time I call it, I get no output. This is the log output from the second run:

INFO  PuppeteerCrawler: Initializing the crawler.
INFO  PuppeteerCrawler: All requests from the queue have been processed, the crawler will shut down.


How do I configure Crawlee so that every call to the API runs a fresh crawl?

Here is my current implementation. This function is called from an orchestrator function.

import os from "os";
import { AzureFunction, Context } from "@azure/functions";
import { PuppeteerCrawler } from "crawlee";

// CrawlingParameters is defined elsewhere in the project.
const activityFunction: AzureFunction = async function (
  context: Context,
  crawlerParameters: CrawlingParameters
) {
  // Store crawl state in the temp directory; the function's working
  // directory is not writable.
  process.env.CRAWLEE_STORAGE_DIR = os.tmpdir();

  const nettsider = new Set<{ title: string; url: string }>();
  const crawler = new PuppeteerCrawler({
    async requestHandler({ request, page, enqueueLinks }) {
      nettsider.add({ title: await page.title(), url: request.url });
      // Note the escaped dots: /.pdf$/ would also match e.g. "xpdf".
      await enqueueLinks({ exclude: [/\.pdf$/, /\.doc$/] });
    },
    maxRequestsPerCrawl: crawlerParameters.maxLenker,
  });
  await crawler.run([crawlerParameters.startUrl]);

  return Array.from(nettsider);
};
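If the problem is that the default request queue persists in CRAWLEE_STORAGE_DIR between invocations (so the start URL is already marked as handled on the second run), one possible fix might be to give each invocation non-persistent storage. This is an untested sketch based on my reading of the Crawlee docs, using the `persistStorage` option of `Configuration`:

```typescript
import { PuppeteerCrawler, Configuration } from "crawlee";

// Untested sketch: pass a per-invocation Configuration with storage
// persistence disabled, so every call starts with a fresh request queue.
const crawler = new PuppeteerCrawler(
  {
    async requestHandler({ request, page, enqueueLinks }) {
      // ...same handler as above...
    },
  },
  new Configuration({ persistStorage: false })
);
```

Alternatively, calling `await purgeDefaultStorages()` (exported from `crawlee`) before `crawler.run()` might reset the queue between runs, but I have not verified either approach in a Durable Functions activity.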