NeoNomade
Apify & Crawlee · 3y ago
11 replies

CheerioCrawler hangs with 12 million URLs

import { CheerioCrawler, RequestList, log } from 'crawlee';

const requestList = await RequestList.open('My-ReqList', allUrls, { persistStateKey: 'My-ReqList' });
console.log(requestList.length());
const crawler = new CheerioCrawler({
  requestList,
  proxyConfiguration,
  requestHandler: router,
  minConcurrency: 32,
  maxConcurrency: 256,
  maxRequestRetries: 20,
  navigationTimeoutSecs: 6,
  loggingInterval: 30,
  useSessionPool: true,
  failedRequestHandler({ request }) {
      log.debug(`Request ${request.url} failed 20 times.`);
  },
});
await crawler.run();


allUrls is a list containing 12 million URLs. I'm trying to load them into the CheerioCrawler, but the process hangs at around 14 GB of RAM and never even logs requestList.length().
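For context, one commonly suggested workaround (not part of the original post) is to avoid materializing one giant in-memory RequestList and instead enqueue the URLs in bounded batches, e.g. into a Crawlee RequestQueue via `requestQueue.addRequests(batch)`. The sketch below shows only the generic batching logic; `enqueueBatch` is a hypothetical stand-in for the real enqueue call, and `allUrls` is the array from the post.

```javascript
// Sketch of bounded batching (assumption: each batch is enqueued separately,
// e.g. with Crawlee's `requestQueue.addRequests(batch)`), so no single call
// ever has to handle 12 million request objects at once.
function* chunks(items, size) {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Hypothetical stand-in for an async enqueue such as RequestQueue.addRequests().
async function enqueueBatch(batch) {
  return batch.length; // pretend every request in the batch was accepted
}

async function enqueueAll(allUrls, batchSize = 1000) {
  let enqueued = 0;
  for (const batch of chunks(allUrls, batchSize)) {
    // Map plain URL strings to request-shaped objects before enqueueing.
    enqueued += await enqueueBatch(batch.map((url) => ({ url })));
  }
  return enqueued;
}
```

With a batch size of 1000, memory usage per enqueue call stays roughly constant regardless of how many total URLs there are.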

Can anybody help, please?