cdog
Apify & Crawlee • 3w ago
6 replies

Why is the request queue's metadata file not cleaned when purge_on_start=True?

Hello,

I find it odd that when
Configuration(purge_on_start=True)
is set, the
storage/request_queues/default/__metadata__.json
file is not purged by the method
crawlee.storage_clients._file_system._request_queue_client.FileSystemRequestQueueClient.purge
along with the request files.

What is the reason behind this? Which use case does this cover?
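
To make the symptom concrete, here is a toy model of what I believe happens. It is not Crawlee code: ToyQueue, purge and simulate_run are hypothetical names, and the assumption that a purge removes the request files but leaves the counters untouched is only my reading of the observed behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy stand-in for a file-backed request queue. Not Crawlee code."""

    request_files: list[str] = field(default_factory=list)
    # These two counters stand in for the values kept in __metadata__.json.
    handled_request_count: int = 0
    total_request_count: int = 0

    def purge(self) -> None:
        # Assumption: only the request files are removed, the counters survive.
        self.request_files.clear()


def simulate_run(queue: ToyQueue, urls: list[str]) -> str:
    """Purge, enqueue the URLs, handle them, and return the first status line."""
    queue.purge()
    queue.request_files.extend(urls)
    queue.total_request_count += len(urls)
    # The status line is built before anything is handled in this run.
    status = f"Crawled 0/{queue.total_request_count} pages"
    queue.handled_request_count += len(urls)
    return status


queue = ToyQueue()
urls = ["https://example.com/a", "https://example.com/b"]
print(simulate_run(queue, urls))  # first run
print(simulate_run(queue, urls))  # second run
```

In this model the second run reports a total of 4 even though only 2 requests exist after the purge, which is the same pattern as in my logs.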


If you run the repro code (it crawls 2 URLs) multiple times in a row and check the log and the metadata JSON, you see the following:

# first run
[BeautifulSoupCrawler] INFO  Crawled 0/2 pages, 0 failed requests, desired concurrency 1.

# second run
[BeautifulSoupCrawler] INFO  Crawled 0/4 pages, 0 failed requests, desired concurrency 1.


The 0/4 in the 2nd run is misleading because only two requests were scheduled and the requests from the previous run had been purged.
After the 2nd run, the __metadata__.json content contains handled_request_count: 4 and total_request_count: 4, which is what gets printed to the logs.
I would expect 2 for both values. This is the file content after the 2nd run:
{
  "id": "7xvLZTJolTixoRk1x",
  "name": null,
  "accessed_at": "2026-01-02 15:10:06.721976+00:00",
  "created_at": "2026-01-02 15:09:48.267199+00:00",
  "modified_at": "2026-01-02 15:10:06.719162+00:00",
  "had_multiple_clients": false,
  "handled_request_count": 4,
  "pending_request_count": 0,
  "total_request_count": 4
}
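
For anyone who wants to check the counters between runs without opening the file by hand, this is a small standard-library sketch. The path and the key names are the ones from this post; read_request_counts is a hypothetical helper name of mine, not part of Crawlee.

```python
import json
from pathlib import Path

# Path taken from this post; adjust it if your storage directory differs.
METADATA_PATH = Path("storage/request_queues/default/__metadata__.json")


def read_request_counts(path: Path = METADATA_PATH) -> dict[str, int]:
    """Return the three request counters stored in the queue metadata file."""
    metadata = json.loads(path.read_text(encoding="utf-8"))
    keys = ("handled_request_count", "pending_request_count", "total_request_count")
    return {key: metadata[key] for key in keys}


# Only print when the file exists, so the script is safe to run before the first crawl.
if METADATA_PATH.exists():
    print(read_request_counts())
```

Running it after each crawl makes it easy to see whether the handled and total counts keep growing across runs.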


I've attached the repro code as a file because of the post length limit.

Thank you,
CL