xenial-black•2y ago
Hello everyone, is there a way to avoid scraping the same pages even if the crawler is restarted? I'm currently working on a news website crawler, but with each run of the scraper I'm encountering up to 80% duplicated news from previous runs. Any suggestions on how to address this effectively?
3 Replies
If running locally, you can set the environment variable CRAWLEE_PURGE_ON_START to false, and the crawler will reuse the same request queue across runs.
https://crawlee.dev/api/3.8/core/interface/ConfigurationOptions#purgeOnStart
probable-pink•2y ago
Thanks
quickest-silver•2y ago
If running on Apify, try naming your request queue — named queues persist between runs instead of being wiped.