sacred-rose
Apify & Crawlee · 3y ago
2 replies

Best practices for skipping already-crawled links when an Actor runs on a cron schedule

Hi, I'm building an Actor that goes through a list and then visits each item's page to extract information. The items themselves don't really change. New items can appear in the list and old ones can be removed, but once an item's details have been extracted, there's no need to extract them again on later Actor runs. The Actor runs on a schedule, e.g. twice a day.

I'm planning to use PostgreSQL and Prisma to store the extracted item details. Is it reasonable to query the target database during the crawl inside the Actor (e.g. to check whether a URL was already scraped)? Or is there a better solution, possibly using Apify's built-in tools?
Thanks
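
To make the question concrete, here is a rough sketch of the "check if the URL was already scraped" step I have in mind. It is not tied to any real API: `SeenStore`, `InMemorySeenStore`, `normalizeUrl` and `selectUnscraped` are all hypothetical names, and the in-memory store is only a stand-in for whatever persists between runs (a PostgreSQL table or some Apify storage).

```typescript
// Hypothetical interface: anything that remembers which URLs were already scraped.
interface SeenStore {
  // Returns the subset of `urls` that were scraped in a previous run.
  hasMany(urls: string[]): Promise<Set<string>>;
  // Records a URL once its details have been stored successfully.
  markDone(url: string): Promise<void>;
}

// In-memory stand-in so the sketch runs on its own.
// A real store would have to persist between Actor runs.
class InMemorySeenStore implements SeenStore {
  private readonly seen = new Set<string>();

  async hasMany(urls: string[]): Promise<Set<string>> {
    return new Set(urls.filter((u) => this.seen.has(u)));
  }

  async markDone(url: string): Promise<void> {
    this.seen.add(url);
  }
}

// Assumption: the fragment and a trailing slash do not identify a different item.
function normalizeUrl(raw: string): string {
  const withoutFragment = raw.split("#")[0];
  return withoutFragment.endsWith("/")
    ? withoutFragment.slice(0, -1)
    : withoutFragment;
}

// Given the URLs found on the list page, keep only those not scraped before.
// The store is asked once for the whole batch rather than once per URL.
async function selectUnscraped(
  listUrls: string[],
  store: SeenStore,
): Promise<string[]> {
  const normalized = Array.from(new Set(listUrls.map(normalizeUrl)));
  const already = await store.hasMany(normalized);
  return normalized.filter((u) => !already.has(u));
}
```

The idea would be to call `selectUnscraped` right after parsing the list page, enqueue only the returned URLs, and call `markDone` after an item has been saved. With Prisma, I assume `hasMany` could be a single `findMany` with a `where: { url: { in: urls } }` filter on a table that has a unique `url` column, so the database is hit once per list page instead of once per link.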