fair-rose, 3y ago

Avoid sharing same CheerioCrawler instance across multiple calls

Hi, I'm initializing a new CheerioCrawler() whenever my server receives a POST request. Even though I create a new CheerioCrawler instance for each request, the instances seem to be affected by settings from previous requests, such as maxRequestsPerCrawl: 20. I've tried calling await crawler.teardown(), but to no avail. What's even worse is that when two requests come in at the same time, the requestHandler of one call is polluted by the other! How do I isolate instances of CheerioCrawler from each other?
4 Replies
Oleg V., 3y ago
Hi, it's not 100% clear how you implemented your crawler. Can you please provide a reproduction or a code snippet?
Lukas Krivka, 3y ago
You need to create a separate instance of RequestQueue and pass it to each crawler
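
For anyone landing here later, a minimal sketch of that suggestion, assuming Crawlee v3 on Node.js. The likely cause of the cross-talk is that crawlers created without an explicit requestQueue all fall back to the same default queue. The names uniqueQueueName and crawlOnce are hypothetical helpers made up for illustration, not part of Crawlee, and dropping the queue at the end is one possible cleanup step rather than a confirmed requirement.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: builds a queue name that is unique for every incoming request.
function uniqueQueueName(prefix = 'crawl'): string {
    return `${prefix}-${randomUUID()}`;
}

// Hypothetical per-request entry point; call it from your POST handler.
async function crawlOnce(startUrls: string[], maxRequestsPerCrawl = 20): Promise<string[]> {
    // Imported lazily to keep this sketch self-contained; a top-level import works as well.
    const { CheerioCrawler, RequestQueue } = await import('crawlee');

    // A dedicated queue per call, so concurrent crawls never see each other's requests.
    const requestQueue = await RequestQueue.open(uniqueQueueName());
    const titles: string[] = [];

    const crawler = new CheerioCrawler({
        requestQueue,
        maxRequestsPerCrawl,
        requestHandler: async ({ request, $ }) => {
            // Results are collected in a local array that belongs to this call only.
            titles.push(`${request.url}: ${$('title').text()}`);
        },
    });

    try {
        await crawler.run(startUrls);
    } finally {
        // Assumed cleanup: remove the per-request queue so storage does not pile up.
        await requestQueue.drop();
    }
    return titles;
}
```

Inside a POST handler this would be called as something like `const titles = await crawlOnce([url]);`, where `url` comes from the request body. Because both the queue and the results array are created inside the function, two simultaneous requests do not share any state through them.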
fair-rose (OP), 3y ago
Awesome, this seems to do the trick, thanks @Lukas Krivka! I just pass a UUID to every new instance of RequestQueue. Is there any specific cleanup I need to do to make sure Crawlee is cleaned up after every crawl?
