sunny-green•2y ago
Avoid sharing same CheerioCrawler instance across multiple calls
Hi, I'm initializing a new CheerioCrawler() whenever a POST request is received by my server, but even though I make a new instance of CheerioCrawler for each request, they seem to be affected by configs like maxRequestsPerCrawl: 20 from previous requests. I've tried await crawler.teardown(); but to no avail. What's even worse, for two requests that come in at the same time, the requestHandler is polluted by the separate calls! How do I isolate instances of CheerioCrawler from each other?
Hi,
It's not 100% clear how you implemented your crawler.
Can you please provide some reproduction / code snippet?
You need to create a separate instance of RequestQueue and pass it to each crawler.
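Something like this, as a minimal sketch (the runIsolatedCrawl helper and the handler body are just placeholders; the uniquely named queue is what keeps concurrent crawls isolated):

```ts
import { randomUUID } from 'node:crypto';
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Hypothetical helper: one isolated crawl per incoming POST request.
async function runIsolatedCrawl(startUrls: string[]) {
    // A uniquely named queue, so concurrent crawls never share requests.
    const requestQueue = await RequestQueue.open(randomUUID());

    const crawler = new CheerioCrawler({
        requestQueue,
        maxRequestsPerCrawl: 20,
        async requestHandler({ request, $ }) {
            // Only sees requests added to this crawler's own queue.
            console.log(`${request.url}: ${$('title').text()}`);
        },
    });

    await crawler.run(startUrls);
    return requestQueue;
}
```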
sunny-greenOP•2y ago
Awesome, this seems to do the trick, thanks @Lukas Krivka!
I just pass a uuid to every new instance of RequestQueue... Is there any specific cleanup I need to do to make sure Crawlee is cleaned up after every crawl?
You can use the drop() method:
https://crawlee.dev/api/core/class/RequestQueue#drop
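E.g. something along these lines after each crawl (sketch, reusing the requestQueue and crawler from the snippet above):

```ts
try {
    await crawler.run(startUrls);
} finally {
    // Remove the per-crawl queue from storage so old queues don't pile up.
    await requestQueue.drop();
}
```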