foreign-sapphire · 3y ago

Crawlee seems to be getting a cached version of an XML file

I'm starting my crawler with the first request being https://site.com/sitemap.xml. Then I read all the URLs in the sitemap and check the modified date (the website does update the modified date in the sitemap), and only crawl the pages that were modified. The problem is that the crawler in production does that once every hour, and it always gets the same version of the sitemap.xml. If I run it after a while on my PC, it finds modified URLs, crawls the pages and gets the updates. I'm enqueuing the XML with await crawler.run([{ url: "sitemap.xml", label: "SITEMAP" }]); Is there a way to add headers and prevent caching here?
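For reference, a minimal sketch of the setup described above, assuming a CheerioCrawler and a router with a SITEMAP handler (the crawler class, the lastRun cutoff, and the no-cache headers are illustrative assumptions, not the OP's actual code):

```ts
import { CheerioCrawler, createCheerioRouter } from 'crawlee';

const router = createCheerioRouter();

router.addHandler('SITEMAP', async ({ $, crawler, log }) => {
    // Illustrative cutoff: the time of the previous hourly run.
    const lastRun = new Date(Date.now() - 60 * 60 * 1000);
    const modified: string[] = [];
    $('url').each((_, el) => {
        const loc = $(el).find('loc').text();
        const lastmod = $(el).find('lastmod').text();
        if (loc && lastmod && new Date(lastmod) > lastRun) modified.push(loc);
    });
    log.info(`Enqueuing ${modified.length} modified URLs`);
    await crawler.addRequests(modified);
});

router.addDefaultHandler(async ({ request, log }) => {
    log.info(`Crawling updated page ${request.url}`);
});

const crawler = new CheerioCrawler({
    requestHandler: router,
    // Make sure XML content types are accepted for the sitemap response.
    additionalMimeTypes: ['application/xml', 'text/xml'],
});

await crawler.run([{
    url: 'https://site.com/sitemap.xml',
    label: 'SITEMAP',
    // Request headers are passed along by the HTTP crawlers; browser crawlers
    // need a pre-navigation hook instead (see the replies below).
    headers: { 'Cache-Control': 'no-cache', Pragma: 'no-cache' },
}]);
```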
8 Replies
flat-fuchsia · 3y ago
Try using a residential proxy, maybe.
foreign-sapphire (OP) · 3y ago
Isn't there a way to set headers for the crawlee request?
optimistic-gold · 3y ago
You can do that in pre-navigation hooks, or add skipNavigation: true to the request object when enqueuing and manually send the request in the route handler via sendRequest from the context object provided in the handler arguments. Not sure what the production env is in your case, is that the Apify platform? The easiest way to verify whether it is a cache issue is by adding a query string to the request - this way the cache will be invalidated in most cases.
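A rough sketch of the skipNavigation + sendRequest suggestion, assuming a PlaywrightCrawler (the regex-based sitemap parsing and the header values are illustrative):

```ts
import { PlaywrightCrawler, createPlaywrightRouter } from 'crawlee';

const router = createPlaywrightRouter();

router.addHandler('SITEMAP', async ({ sendRequest, crawler, log }) => {
    // The sitemap is never opened in the browser (skipNavigation: true below),
    // so we fetch it ourselves and control the headers directly.
    const { body } = await sendRequest({
        headers: { 'Cache-Control': 'no-cache', Pragma: 'no-cache' },
    });
    const urls = [...body.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);
    log.info(`Found ${urls.length} URLs in the sitemap`);
    await crawler.addRequests(urls);
});

router.addDefaultHandler(async ({ request, log }) => {
    log.info(`Processing ${request.url}`);
});

const crawler = new PlaywrightCrawler({ requestHandler: router });

await crawler.run([{
    url: 'https://site.com/sitemap.xml',
    label: 'SITEMAP',
    skipNavigation: true, // do not open the sitemap in the browser at all
}]);
```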
foreign-sapphire (OP) · 3y ago
Sorry, I didn't say that. But that's what I did to validate it's a cache issue. I added sitemap.xml?random=RANDOM and it worked. However, not all websites I crawl support adding random query strings. Some give me an error if I try to add a query string the site is not expecting.
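For sites that do tolerate extra query parameters, the cache-buster can be appended when enqueuing (the parameter name here is arbitrary):

```ts
// A changing query string also gives the request a fresh uniqueKey,
// so a persisted request queue will not treat it as already handled.
const sitemapUrl = new URL('https://site.com/sitemap.xml');
sitemapUrl.searchParams.set('nocache', Date.now().toString());
await crawler.run([{ url: sitemapUrl.toString(), label: 'SITEMAP' }]);
```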
optimistic-gold · 3y ago
Have you already tried setting cache control headers?
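For a browser-based crawler, one way to send cache-control headers is a pre-navigation hook. A sketch assuming PlaywrightCrawler (note that a request Cache-Control header mainly tells intermediate caches/CDNs to revalidate):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // Applied to every request made by this page, including the navigation itself.
            await page.setExtraHTTPHeaders({
                'Cache-Control': 'no-cache',
                Pragma: 'no-cache',
            });
        },
    ],
    requestHandler: async ({ request, log }) => {
        log.info(`Loaded ${request.url}`);
    },
});
```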
optimistic-gold · 3y ago
Found interesting info in the docs (https://playwright.dev/docs/api/class-page#page-route): "Enabling routing disables http cache."
Not sure if that works. You may try this as well: page.route('**', route => route.continue());
optimistic-gold · 3y ago
But use your glob pattern instead of the wildcard if that approach works, of course.
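Combining the two replies above into a sketch (again assuming PlaywrightCrawler; the '**' glob is the placeholder wildcard from the quoted snippet):

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // Per the Playwright docs, enabling routing disables the HTTP cache
            // for matched requests. Narrow '**' to your own glob if this works.
            await page.route('**', (route) => route.continue());
        },
    ],
    requestHandler: async ({ request, log }) => {
        log.info(`Loaded ${request.url}`);
    },
});
```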
Alexey Udovydchenko
In the Apify cloud, every run gets non-cached results from the start: the Actor instance is created on each run and destroyed on finish, so there is no "cache". If you are getting cached output on your own server, ensure the Actor is executed by the Apify CLI as "apify run -p".
