correct-apricot•2y ago
How can I manage URLs that do not need to be crawled?
Hello. In Apify, how can I manage URLs that do not need to be crawled? I want to crawl a single website with approximately 300,000 pages every day. However, I want to skip pages that do not need a fresh crawl, such as those that were crawled recently or whose content rarely changes.
#crawlee-js #apify-platform
1 Reply
Store them in a separate dataset, then load them into a Set and use that to filter URLs when enqueuing.
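A minimal sketch of that idea, assuming each stored record holds a URL and the time it was last crawled. The `CrawlRecord` shape and the helper names `buildSkipSet` and `filterUrls` are hypothetical, not part of Crawlee or the Apify SDK; the code only shows the Set-based filtering step.

```typescript
// Hypothetical shape of a record in the "already crawled" dataset.
interface CrawlRecord {
    url: string;
    lastCrawledAt: string; // ISO 8601 timestamp, e.g. "2024-01-09T12:00:00Z"
}

// Build a Set of URLs that were crawled less than `maxAgeMs` before `now`.
// Records with an unparseable or future timestamp are not added,
// so those URLs will be crawled again.
function buildSkipSet(records: CrawlRecord[], now: number, maxAgeMs: number): Set<string> {
    const skip = new Set<string>();
    for (const record of records) {
        const age = now - Date.parse(record.lastCrawledAt);
        if (age >= 0 && age < maxAgeMs) {
            skip.add(record.url);
        }
    }
    return skip;
}

// Keep only the candidate URLs that are not in the skip Set.
function filterUrls(candidates: string[], skip: Set<string>): string[] {
    return candidates.filter((url) => !skip.has(url));
}
```

In a Crawlee crawler, the records could be loaded from a named dataset (for example with `Dataset.open` and `getData`), and the same `skip.has(url)` check could be applied inside the `transformRequestFunction` option of `enqueueLinks`, returning a falsy value for URLs to skip. A Set of about 300,000 URL strings should fit comfortably in memory, and each lookup is constant time on average.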