other-emerald, 2y ago

Avoid sharing same CheerioCrawler instance across multiple calls

Hi, I'm initializing a new CheerioCrawler() whenever my server receives a POST request. Even though I create a new CheerioCrawler instance for each request, the instances seem to be affected by config from previous requests, such as maxRequestsPerCrawl: 20. I've tried calling await crawler.teardown(); but to no avail. What's even worse, when two requests come in at the same time, the requestHandler of one call is polluted by the other! How do I isolate instances of CheerioCrawler from each other?
4 Replies
Oleg V., 2y ago
Hi, it's not 100% clear how you implemented your crawler. Can you please provide a reproduction or a code snippet?
Lukas Krivka, 2y ago
You need to create a separate instance of RequestQueue and pass it to each crawler. Otherwise every crawler uses the same default queue.
other-emerald (OP), 2y ago
Awesome, this seems to do the trick, thanks @Lukas Krivka! I just pass a UUID to every new instance of RequestQueue. Is there any specific cleanup I need to do to make sure Crawlee is cleaned up after every crawl?
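
For anyone landing here later, here is a minimal sketch of the approach described above: one uniquely named RequestQueue per incoming request, passed to a fresh CheerioCrawler. The names uniqueQueueName and crawlOnce are hypothetical helpers made up for illustration, and collecting page titles is just a stand-in for a real requestHandler. Calling requestQueue.drop() afterwards is one possible cleanup step, not a confirmed answer to the cleanup question; it assumes you do not need the queue once the crawl has finished.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: builds a queue name that is unique per incoming request.
// The name contains only letters, digits and hyphens.
export function uniqueQueueName(prefix = 'crawl'): string {
    return `${prefix}-${randomUUID()}`;
}

// Hypothetical per-request entry point; call it from your POST handler.
export async function crawlOnce(
    startUrls: string[],
    maxRequestsPerCrawl = 20,
): Promise<string[]> {
    // Imported lazily inside the function; a top-level
    // `import { CheerioCrawler, RequestQueue } from 'crawlee'` works just as well.
    const { CheerioCrawler, RequestQueue } = await import('crawlee');

    // A dedicated queue, so handled-request counts and pending URLs
    // are not shared with crawlers created for other requests.
    const requestQueue = await RequestQueue.open(uniqueQueueName());

    // Results live in a local array, so concurrent calls cannot mix them up.
    const titles: string[] = [];

    const crawler = new CheerioCrawler({
        requestQueue,
        maxRequestsPerCrawl,
        async requestHandler({ $ }) {
            titles.push($('title').text());
        },
    });

    try {
        await crawler.run(startUrls);
    } finally {
        // Assumed cleanup: remove the per-request queue from storage.
        await requestQueue.drop();
    }
    return titles;
}
```

Usage from a request handler would look like `const titles = await crawlOnce(['https://example.com']);`, where the URL is only a placeholder.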