Hello, first some code:
The crawl function:
const crawlee = require('crawlee'); // or: import * as crawlee from 'crawlee';

async function crawl(jobId, websiteURL, cb) {
    const crawler = new crawlee.PlaywrightCrawler({
        // Use the requestHandler to process each of the crawled pages.
        async requestHandler({ request, page, enqueueLinks, log }) {
            // Collect the src of every <img> on the page.
            const imageSrcs = await page.$$eval('img', imgs => imgs.map(img => img.src));
            for (const src of imageSrcs) {
                if (!cb.includes(src)) {
                    cb.push(src);
                }
            }
            // Extract links from the current page
            // and add them to the crawling queue.
            await enqueueLinks();
        },
        sessionPoolOptions: { persistStateKey: jobId, persistStateKeyValueStoreId: jobId },
    });
    await crawler.run([websiteURL]);
    await crawler.teardown();
    return cb;
}
setInterval calls this function:
async function fetchImagesUrls(uid, jobId, websiteURL) {
    console.log('Fetching images...');
    // Pass a fresh array on every call so each job has its own accumulator.
    const results = await crawl(jobId, websiteURL, []);
    console.log(results);
    return results;
}
Background: I'm calling fetchImagesUrls from a setInterval callback that simulates a 'cron job'. I purposely make setInterval pick up Job#1 (its details are fetched from a DB), and once Job#1 has started I make Job#2 available for processing.
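For reference, here is a stripped-down version of that polling loop; getNextPendingJob and the 60-second interval are placeholders standing in for my real DB lookup and schedule:

setInterval(async () => {
    // Placeholder for the real DB query that returns the next pending job, if any.
    const job = await getNextPendingJob();
    if (!job) return;
    // Kick off the crawl for this job. Because setInterval keeps firing,
    // a later tick can start another job while this one is still running.
    const images = await fetchImagesUrls(job.uid, job.id, job.websiteURL);
    // ...store images for this job in the DB...
}, 60000);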
Behavior: Now Job#1 and Job#2 are running from two different calls; however, the results are getting mixed into each other.
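Concretely, the effect looks roughly like this (the URLs and job IDs are made up for illustration):

// Two ticks of the interval overlap, so two jobs crawl concurrently.
const jobA = fetchImagesUrls('user-1', 'job-1', 'https://site-a.example');
const jobB = fetchImagesUrls('user-1', 'job-2', 'https://site-b.example');
console.log(await jobA); // ends up containing image URLs from both sites
console.log(await jobB); // same mixed set here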
I've tried useState() and my own accumulator array (the cb shown here). Is there a way to make each call keep its results isolated in its own set? I understand I might be missing something about JS fundamentals, but any guidance would be much appreciated. Thanks!