other-emerald•2y ago
Avoid sharing the same CheerioCrawler instance across multiple calls
Hi, I'm initializing a `new CheerioCrawler()` whenever my server receives a POST request. Even though I create a new `CheerioCrawler` instance for each request, the crawlers seem to be affected by options like `maxRequestsPerCrawl: 20` from previous requests. I've tried `await crawler.teardown();`, but to no avail. Worse, when two requests come in at the same time, the `requestHandler` gets polluted by the other call! How do I isolate `CheerioCrawler` instances from each other?
Hi,
It's not 100% clear how you implemented your crawler.
Can you please provide a reproduction or a code snippet?
You need to create a separate `RequestQueue` instance for each crawler and pass it in through the crawler's `requestQueue` option.
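A minimal sketch of that advice, assuming Crawlee v3 (`RequestQueue.open()` and the `requestQueue` crawler option); the helper name `crawlIsolated` and the title-collecting handler are hypothetical. Each call opens a uniquely named queue (named with `randomUUID()`, the same idea the OP uses below) and collects results in a local array. Concurrent POST requests therefore don't pull each other's pending URLs, and each fresh queue starts with no handled requests for `maxRequestsPerCrawl` to count.

```typescript
import { randomUUID } from 'node:crypto';
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Hypothetical helper: one isolated crawl per incoming POST request.
export async function crawlIsolated(startUrls: string[], maxRequests = 20): Promise<string[]> {
    // A uniquely named queue per call, instead of the shared default queue.
    const requestQueue = await RequestQueue.open(randomUUID());

    // Results live in this call's closure, so concurrent calls cannot mix them.
    const titles: string[] = [];

    const crawler = new CheerioCrawler({
        requestQueue,
        maxRequestsPerCrawl: maxRequests,
        async requestHandler({ request, $ }) {
            titles.push(`${request.url}: ${$('title').text()}`);
        },
    });

    await crawler.run(startUrls);
    return titles;
}
```

In an HTTP handler you would call something like `await crawlIsolated(body.urls)` per request; because nothing is shared between calls, two simultaneous requests each get only their own pages back.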
other-emeraldOP•2y ago
Awesome, this seems to do the trick. Thanks, @Lukas Krivka!
I just pass a UUID as the name of every new `RequestQueue`. Is there any specific cleanup I need to do to make sure Crawlee is cleaned up after every crawl?
You can use the `drop()` method:
https://crawlee.dev/api/core/class/RequestQueue#drop
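A sketch of that cleanup, again assuming Crawlee v3; the helper name `crawlAndCleanUp` is hypothetical. The queue is dropped in a `finally` block, so the per-request queue is deleted even if the crawl throws, and UUID-named queues don't accumulate in storage over the server's lifetime.

```typescript
import { randomUUID } from 'node:crypto';
import { CheerioCrawler, RequestQueue } from 'crawlee';

// Hypothetical helper: runs one isolated crawl, then deletes its queue.
export async function crawlAndCleanUp(startUrls: string[]): Promise<number> {
    const requestQueue = await RequestQueue.open(randomUUID());
    let pages = 0; // counter local to this call

    const crawler = new CheerioCrawler({
        requestQueue,
        maxRequestsPerCrawl: 20,
        async requestHandler() {
            pages += 1;
        },
    });

    try {
        await crawler.run(startUrls);
    } finally {
        // drop() removes the queue and its stored requests from storage.
        await requestQueue.drop();
    }
    return pages;
}
```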