absent-sapphire•2y ago

How do I detect when all request handlers have finished?

Hello, I'm using JSDOMCrawler, and inside requestHandler I have some logic that scrapes the data and pushes it into an array called messages. When the crawler finishes, I want to push all these messages to AWS SQS for further processing, but requestHandler runs asynchronously, and because of this I don't get all the messages at the end. Is there any solution for this scenario?
8 Replies
HonzaS•2y ago
You can push those messages to a dataset and, after the crawler finishes, send everything from the dataset to AWS.
absent-sapphire (OP)•2y ago
I don't want to use a dataset. Is there any other way to achieve this? My logic inside requestHandler is quite simple:
1- Create one entry in Mongo.
2- Upload the scraped data to AWS S3.
3- Push a message with all the details to an array.
But the most important thing I want to know is when everything has been crawled, so that I can proceed with further processing.
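A minimal sketch of the three steps above, assuming each one returns a promise. `createMongoEntry` and `uploadToS3` are hypothetical stand-ins that only sleep; real code would call the MongoDB driver and the S3 client there. The point is that every step is awaited, so the handler's promise settles only after the message has been pushed:

```javascript
// Hypothetical stand-ins for the three steps; real code would call the
// MongoDB driver and the AWS S3 client here instead of sleeping.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function createMongoEntry(url) {
  await sleep(10); // simulated database round trip
  return { id: `doc:${url}` };
}

async function uploadToS3(url) {
  await sleep(20); // simulated upload
  return `hypothetical-bucket/${encodeURIComponent(url)}.json`;
}

const messages = [];

// Each step is awaited, so the returned promise settles only after the
// message is in the array. A caller that awaits this handler cannot move
// on while any of the three steps is still pending.
async function requestHandler(url) {
  const entry = await createMongoEntry(url);
  const s3Key = await uploadToS3(url);
  messages.push({ url, mongoId: entry.id, s3Key });
}

// Stand-in for the crawler: run three handlers concurrently and wait for all.
const urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'];
const finished = Promise.all(urls.map(requestHandler)).then(() => {
  console.log(`collected ${messages.length} messages`); // collected 3 messages
});
```

Crawlee waits for the promise returned by requestHandler (up to its handler timeout), so the same structure inside a real handler means crawler.run() should not resolve while one of these steps is still running.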
Pepa J•2y ago
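Hi @AlgoAlchemist, if I understand correctly, you just need to wait for everything to be scraped. In that case you can do something like this:
import { CheerioCrawler } from 'crawlee';

const messages = [];
const crawler = new CheerioCrawler({
    // ...
    requestHandler: async ({ request, $ }) => {
        // ...
        // Example fields only; push whatever data you scraped.
        messages.push({ url: request.url, title: $('title').text() });
    },
});

// run() resolves only after every request has been handled.
await crawler.run();

// Here I have all the messages and I do whatever I want with them.
console.log(messages);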
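Once crawler.run() has resolved, the collected array can be forwarded to SQS. A hedged sketch, assuming batched sends: SQS's SendMessageBatch accepts at most 10 entries per request, so the hypothetical helper `toSqsBatches` below only splits the messages and builds the entries. The actual `SendMessageBatchCommand` call from `@aws-sdk/client-sqs` is left as a comment so the block runs without AWS credentials:

```javascript
// Hypothetical helper: split collected messages into SendMessageBatch-sized
// groups. SQS accepts at most 10 entries per batch request, and each entry
// needs an Id that is unique within its batch.
const SQS_MAX_BATCH = 10;

function toSqsBatches(messages, batchSize = SQS_MAX_BATCH) {
  const batches = [];
  for (let start = 0; start < messages.length; start += batchSize) {
    const chunk = messages.slice(start, start + batchSize);
    batches.push(
      chunk.map((message, i) => ({
        Id: String(start + i), // unique within the batch
        MessageBody: JSON.stringify(message),
      })),
    );
  }
  return batches;
}

// After `await crawler.run()`, each batch would become the Entries of one
// request, e.g. (assumed client setup, not shown here):
// await sqs.send(new SendMessageBatchCommand({ QueueUrl, Entries: batch }));
const demo = Array.from({ length: 23 }, (_, n) => ({ url: `https://example.com/${n}` }));
const batches = toSqsBatches(demo);
console.log(batches.map((b) => b.length)); // [ 10, 10, 3 ]
```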
absent-sapphire (OP)•2y ago
Hello @Pepa J, thank you for your suggestion, but this fails when I set maxConcurrency.
Pepa J•2y ago
@AlgoAlchemist, what do you mean by "it fails"?
absent-sapphire (OP)•2y ago
@Pepa J When we set maxConcurrency, requestHandler keeps running in the background even after the crawler has stopped, so in the end fewer messages make it into the array.
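One plausible cause of this symptom, which is an assumption since the handler code was never posted: some async call inside requestHandler (for example the S3 upload) is started but not awaited. The handler then returns early, the request counts as done, and the pending work finishes after run() has resolved. The sketch below reproduces that without Crawlee; `runWithConcurrency` is a hypothetical stand-in for a crawler with maxConcurrency:

```javascript
// Hypothetical stand-in for a crawler with maxConcurrency: at most `limit`
// handlers run at once, and it resolves when every handler promise settles.
async function runWithConcurrency(items, limit, handler) {
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const item = items[next++];
      await handler(item);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const urls = Array.from({ length: 6 }, (_, n) => `https://example.com/${n}`);

async function demo() {
  // Fire-and-forget: the slow work is not awaited, so each handler returns
  // immediately and the pushes happen only after the run has resolved.
  const lost = [];
  await runWithConcurrency(urls, 2, async (url) => {
    sleep(20).then(() => lost.push(url)); // missing `await`
  });
  const lostCount = lost.length; // 0 at this point

  // Awaited: the run cannot resolve until every push has happened.
  const kept = [];
  await runWithConcurrency(urls, 2, async (url) => {
    await sleep(20);
    kept.push(url);
  });
  return { lostCount, keptCount: kept.length };
}

const result = demo().then((r) => {
  console.log(r); // { lostCount: 0, keptCount: 6 }
  return r;
});
```

In the fire-and-forget case the pushes still happen eventually, but only after the run has returned, which matches the "fewer messages at the end" symptom regardless of the concurrency limit.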
MEE6•2y ago
@AlgoAlchemist just advanced to level 1! Thanks for your contributions! 🎉
Pepa J•2y ago
@AlgoAlchemist, can you make a minimal reproducible example and post it here? It seems to me the problem has to be somewhere else.
