popular-magenta
Apify & Crawlee • 3y ago • 3 replies

purging request queue

Hello everyone,

I'm trying to integrate Crawlee into an Express server so that I can start crawling a site when a specific route is requested.
Everything works fine for the first request, but when I make a second request, the URLs are no longer crawled.
From what I understand, this is because the URLs that were already crawled are stored along with their IDs.
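
To make that guess concrete, here is a toy model of the behaviour, not Crawlee's actual code. It assumes the queue remembers a unique key for every request it has ever accepted (Crawlee calls this the request's `uniqueKey`) and silently skips anything it has seen before. `ToyRequestQueue`, `crawl`, `runId` and the example.com URLs are all hypothetical names made up for this sketch.

```typescript
// Toy model only: NOT Crawlee's implementation, just the deduplication idea.
type QueuedRequest = { url: string; uniqueKey: string };

class ToyRequestQueue {
  // Keys of every request ever accepted, including ones already handled.
  private seen = new Set<string>();
  private pending: QueuedRequest[] = [];

  // Returns true if the request was enqueued, false if it was a duplicate.
  add(url: string, uniqueKey: string = url): boolean {
    if (this.seen.has(uniqueKey)) return false;
    this.seen.add(uniqueKey);
    this.pending.push({ url, uniqueKey });
    return true;
  }

  // "Handles" everything pending and returns how many requests were processed.
  drain(): number {
    const count = this.pending.length;
    this.pending = [];
    return count;
  }
}

// One crawl run: enqueue the URLs, then process whatever was accepted.
// A non-empty runId (hypothetical) is prefixed to the key to make it unique per run.
function crawl(queue: ToyRequestQueue, urls: string[], runId = ""): number {
  for (const url of urls) queue.add(url, runId ? `${runId}:${url}` : url);
  return queue.drain();
}

const urls = ["https://example.com/", "https://example.com/about"];
const shared = new ToyRequestQueue();

const firstRun = crawl(shared, urls);                 // 2: both URLs are new
const secondRun = crawl(shared, urls);                // 0: both keys were already seen
const freshRun = crawl(new ToyRequestQueue(), urls);  // 2: a new queue has no history
const keyedRun = crawl(shared, urls, "run-2");        // 2: same queue, different keys

console.log({ firstRun, secondRun, freshRun, keyedRun });
```

In this model the second run processes nothing because the shared queue still holds the old keys, which matches the `requestsTotal: 0` in the second log below. The last two runs hint at the two directions a fix could take: start each crawl with a queue that has no history, or give each run's requests distinct keys.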

How do I empty the URL table?
I tried setting CRAWLEE_PURGE_ON_START = 'true', without much success.

The first crawl:
INFO  PlaywrightCrawler: Starting the crawler.
INFO  PlaywrightCrawler: All requests from the queue have been processed, the crawler will shut down.
INFO  PlaywrightCrawler: Final request statistics: {"requestsFinished":2,"requestsFailed":0,"retryHistogram":[2],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":1498,"requestsFinishedPerMinute":36,"requestsFailedPerMinute":0,"requestTotalDurationMillis":2996,"requestsTotal":2,"crawlerRuntimeMillis":3334}
INFO  PlaywrightCrawler: Finished! Total 2 requests: 2 succeeded, 0 failed. {"terminal":true}


The second crawl (on the same URL):
INFO  PlaywrightCrawler: Starting the crawler.
INFO  PlaywrightCrawler: All requests from the queue have been processed, the crawler will shut down.
INFO  PlaywrightCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":0,"retryHistogram":[],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":239}      
INFO  PlaywrightCrawler: Finished! Total 0 requests: 0 succeeded, 0 failed. {"terminal":true}




Thanks in advance to anyone who can help me. 🙏