frail-apricot•2y ago
purging request queue
Hello everyone,
I'm trying to integrate Crawlee into an Express server so that I can start crawling a site when I hit a specific route.
Everything works fine for the first request, but when I make a second request the URLs are no longer crawled.
From what I understand, this is because the already-crawled URLs are stored with their IDs.
How do I empty the URL table?
I tried setting CRAWLEE_PURGE_ON_START = 'true' without much success.
The first crawl:
The second crawl (on the same URL):
Thanks in advance to anyone who can help me.
2 Replies
frail-apricotOP•2y ago
It's always the way: you ask the question and then find the solution yourself a few minutes later.
For those looking for the solution: https://github.com/apify/crawlee/discussions/2026
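For context, here's a minimal sketch of the kind of fix that discussion points at: purging Crawlee's default storages before each run, so URLs handled by a previous run are crawled again. The route path, crawler class, and target URL below are illustrative assumptions, not from the original post.
```ts
import express from 'express';
import { CheerioCrawler, purgeDefaultStorages } from 'crawlee';

const app = express();

// Hypothetical route that kicks off a fresh crawl on every request.
app.get('/crawl', async (req, res) => {
    // Clear the default request queue/dataset/key-value store so
    // already-seen URLs are not skipped on subsequent requests.
    await purgeDefaultStorages();

    const crawler = new CheerioCrawler({
        async requestHandler({ request, $ }) {
            console.log(`Crawled ${request.url}: ${$('title').text()}`);
        },
    });

    await crawler.run(['https://crawlee.dev']);
    res.send('Crawl finished');
});

app.listen(3000);
```
The key point is that the automatic purge-on-start only happens when the process starts, so a long-lived Express server has to trigger the purge itself before each subsequent run.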
xenial-black•2y ago
also, with named queues you have to purge them manually, Crawlee won't purge them
and the way to purge one is by calling
queue.drop()
and then instantiating the queue again
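A minimal sketch of that drop-and-reopen pattern; the queue name and URL are illustrative:
```ts
import { RequestQueue } from 'crawlee';

// Named queues are not purged automatically, even with purge-on-start.
const queue = await RequestQueue.open('my-queue');

// ... run a crawl that consumes this queue ...

// Drop the queue entirely (removes all requests and their handled state).
await queue.drop();

// Re-open under the same name to get a fresh, empty queue.
const freshQueue = await RequestQueue.open('my-queue');
await freshQueue.addRequest({ url: 'https://crawlee.dev' });
```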