PlaywrightCrawler: new request results are bleeding into old requests. RequestQueue issue?

Hello, first some code:

The `crawl` function:

    import * as crawlee from 'crawlee';

    async function crawl (jobId, websiteURL, cb) {
      const crawler = new crawlee.PlaywrightCrawler({
        // Use the requestHandler to process each of the crawled pages.
        async requestHandler({ request, page, enqueueLinks, log }) {
          // Collect the src of every <img> on the page, skipping duplicates.
          const imageSrcs = await page.$$eval('img', imgs => imgs.map(img => img.src));
          for (const src of imageSrcs) {
            if (!cb.includes(src)) {
              cb.push(src);
            }
          }

          // Extract links from the current page
          // and add them to the crawling queue.
          await enqueueLinks();
        },
        sessionPoolOptions: { persistStateKey: jobId, persistStateKeyValueStoreId: jobId },
      });

      await crawler.run([websiteURL]);
      await crawler.teardown();

      return cb;
    }


`setInterval` calls this function:

    async function fetchImagesUrls (uid, jobId, websiteURL) {
      console.log('Fetching images...');

      // Pass a fresh array for this job's results (my "own callback").
      const results = await crawl(jobId, websiteURL, cb = []);
      console.log(results);

      return results;
    }


**Background**: I'm calling `fetchImagesUrls` from a `setInterval` callback that simulates a cron job. I purposely have `setInterval` pick up Job #1 (its details are fetched from a DB), and once Job #1 starts, I make Job #2 available for processing.
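
For context, the "cron" loop looks roughly like this (simplified; `getNextJob` is a stand-in for my actual DB query):

    setInterval(async () => {
      // getNextJob is a placeholder for the real DB lookup.
      const job = await getNextJob();
      if (job) {
        await fetchImagesUrls(job.uid, job.jobId, job.websiteURL);
      }
    }, 5000);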

**Behavior**: Job #1 and Job #2 are now running from two separate calls; however, their results get mixed into each other (image URLs from one job's site show up in the other job's results).

I've tried `useState()` and my own callback (as shown above). Is there a way to make each new call isolated to its own result set?
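
My current suspicion is that both crawlers write to the same default `RequestQueue`, so links enqueued by Job #1's crawler end up being processed by Job #2's (and vice versa). Would giving each job its own named queue fix this? A minimal, untested sketch of what I mean, assuming `RequestQueue.open(jobId)` returns an isolated per-job queue and `drop()` deletes it when the job finishes:

    import { PlaywrightCrawler, RequestQueue } from 'crawlee';

    async function crawlIsolated (jobId, websiteURL, cb) {
      // Give each job its own named queue so concurrent jobs
      // don't share the default request queue.
      const requestQueue = await RequestQueue.open(jobId);

      const crawler = new PlaywrightCrawler({
        requestQueue,
        async requestHandler({ page, enqueueLinks }) {
          const imageSrcs = await page.$$eval('img', imgs => imgs.map(img => img.src));
          for (const src of imageSrcs) {
            if (!cb.includes(src)) cb.push(src);
          }
          // enqueueLinks() should now add links to this job's queue only.
          await enqueueLinks();
        },
      });

      await crawler.run([websiteURL]);
      await crawler.teardown();
      await requestQueue.drop(); // remove the per-job queue when done

      return cb;
    }

If that's the right direction, I'm not sure I have the API usage right.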

I understand I might be missing something regarding JS fundamentals, but some guidance would be much appreciated. Thanks!