Apify & Crawlee • 2mo ago
3 replies
nehalist

Running the same crawler in parallel

Web-Scraping
I've got a config file like

jobs:
  - name: "amazon.de"
    enabled: true

    crawler:
      id: "test"
      enabled: true
      config:
        urls:
          - "https://example.com"

  - name: "amazon.fr"
    enabled: true

    crawler:
      id: "test"
      enabled: true
      config:
        urls:
          - "https://example.com"


this config is processed via p-queue and, depending on the crawler id, I want to run a specific crawler, e.g.:

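import { PlaywrightCrawler } from "crawlee";
import { z } from "zod";

// Assumed: a module-level variable shared by all jobs (its declaration is not shown in the post).
let crawler: PlaywrightCrawler | undefined;

// createCrawler is the poster's own helper, not part of Crawlee.
export const testCrawler = createCrawler({
  id: "test",

  configSchema: z.object({
    urls: z.array(z.string()),
  }),

  handler: async ({ urls }) => {
    // The instance is created only once, so every job reuses it.
    if (!crawler) {
      crawler = new PlaywrightCrawler({
        async requestHandler({ request, log }) {
          log.info(`Processing: ${request.url}`);
        },
      });
    }
    // Called once per job, even while an earlier run is still in progress.
    await crawler.run(urls);
  },
});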


different sites might use the same crawler but a different `requestHandler`. currently, when running this, I get:

This crawler instance is already running, you can add more requests to it via crawler.addRequests()


so it's not possible to spawn multiple crawlers of the same type at the same time? that would kinda mess up my mental model (and the current implementation) a bit. if so, I guess I need to "collect" all data before running the actual crawler? since different crawler "definitions" (e.g. `testCrawler`) require different configurations, this could get messy.
Solution
Hey, there is a slight problem: your code first checks whether a crawler already exists (`if (!crawler)`) and creates one if not. But if one already exists, it still calls `await crawler.run(urls)` on that same instance. That is the issue: you can have multiple crawlers, but you can't have multiple `crawler.run` calls on the same instance at the same time.
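To make the difference concrete, here is a minimal sketch that does not use Crawlee itself. `StubCrawler`, `sharedHandler`, `perJobHandler` and `runTwoJobs` are hypothetical names invented for this illustration; the stub only imitates the "already running" guard from the error message, so treat it as a model of the behaviour rather than the library's actual implementation.

```typescript
// Minimal shape that the stand-in below implements; a real crawler is assumed to fit it.
interface RunnableCrawler {
  run(urls: string[]): Promise<void>;
}

// Hypothetical stand-in that imitates the "already running" guard.
class StubCrawler implements RunnableCrawler {
  private running = false;
  readonly processed: string[] = [];

  async run(urls: string[]): Promise<void> {
    if (this.running) {
      throw new Error("This crawler instance is already running");
    }
    this.running = true;
    try {
      for (const url of urls) {
        // Yield once so a second call can arrive while this one is still in flight.
        await Promise.resolve();
        this.processed.push(url);
      }
    } finally {
      this.running = false;
    }
  }
}

// Pattern from the question: every job reuses one lazily created instance.
function sharedHandler(factory: () => RunnableCrawler) {
  let crawler: RunnableCrawler | undefined;
  return async (urls: string[]): Promise<void> => {
    if (!crawler) crawler = factory();
    await crawler.run(urls);
  };
}

// Per-job pattern: each call builds its own instance, so runs never overlap on one object.
function perJobHandler(factory: () => RunnableCrawler) {
  return async (urls: string[]): Promise<void> => {
    const crawler = factory();
    await crawler.run(urls);
  };
}

// Starts two jobs in parallel and reports how many of them succeeded.
async function runTwoJobs(
  handler: (urls: string[]) => Promise<void>,
): Promise<number> {
  const jobs = [
    handler(["https://example.com/a"]),
    handler(["https://example.com/b"]),
  ];
  const outcomes = await Promise.all(
    jobs.map((job) => job.then(() => true, () => false)),
  );
  return outcomes.filter(Boolean).length;
}
```

With Crawlee, the factory would presumably be something like `() => new PlaywrightCrawler({ requestHandler })`, which also lets each job pass its own `requestHandler`. One caveat worth checking in the Crawlee docs: separate crawler instances in the same process may still share the default request queue unless each one is given its own.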