extended-salmon•2y ago
Request queues and reducing write usage
Hello, I'm creating a supermarket data scraper. The supermarket I'm scraping has a sitemap where the URLs for every product are listed. Currently I'm loading those in like this:
And then passing them to my crawler:
However, this writes all of them to the default request queue again. Writing 23,000+ items to the request queue on every run costs me at least $0.50 each time. Is there any way I can write to the request queue (or another place) once, and then read from there on subsequent runs?
1 Reply
But the list of URLs from the sitemap is dynamic, isn't it?
That's why you need to update / re-scrape it if you want up-to-date information from your target site.
In your case you can use a named request queue:
Or you can try storing all the URLs in a named key-value store, if that makes sense for you.
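One possible reading of the key-value store idea, as a rough sketch only: keep the set of already-known URLs in a single record, compare each fresh sitemap against it, and hand only the unseen URLs to the crawler. Everything named here is an assumption for illustration. The plain `dict` stands in for a named key-value store, and `STORE_KEY` and `urls_to_enqueue` are hypothetical names, not part of any SDK.

```python
import json

# Hypothetical record key; a real named key-value store would hold this record.
STORE_KEY = "SITEMAP_URLS"


def load_known_urls(store):
    """Return the set of URLs saved by earlier runs (empty on the first run)."""
    raw = store.get(STORE_KEY)
    return set(json.loads(raw)) if raw else set()


def save_known_urls(store, urls):
    """Persist the known URLs as one JSON record (a single write per run)."""
    store[STORE_KEY] = json.dumps(sorted(urls))


def urls_to_enqueue(store, sitemap_urls):
    """Return only the sitemap URLs not seen in previous runs, in sitemap order."""
    known = load_known_urls(store)
    # dict.fromkeys removes duplicates within the sitemap while keeping order.
    new_urls = [url for url in dict.fromkeys(sitemap_urls) if url not in known]
    save_known_urls(store, known | set(sitemap_urls))
    return new_urls


if __name__ == "__main__":
    store = {}  # stand-in for a named key-value store
    print(urls_to_enqueue(store, ["https://example.com/p/1", "https://example.com/p/2"]))
    print(urls_to_enqueue(store, ["https://example.com/p/2", "https://example.com/p/3"]))
```

With this approach each run costs one read and one write of the URL record, plus queue writes for new products only. It does not by itself re-crawl products that were already seen, so it fits best when known URLs are re-read from the stored record rather than re-added to the queue.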