like-gold•2y ago
Waiting for all requests to be added before hitting a request handler
Hi!
I have two request handlers registered on my router. Let's call them SITEMAP and PRODUCT.
I get all of the URLs from sitemap.xml and push them into the queue with addRequestsBatched, with waitForAllRequestsToBeAdded set to true. My URLs have the "PRODUCT" label, so they will hit the "PRODUCT" route.
The problem is that even though I have 30k requests, as soon as the first batch of 1000 is added, my "PRODUCT" route starts crawling them.
Is there any possibility of not letting that route handle the requests until the queue has loaded all of the URLs?
What I am trying to achieve is this: if my node process crashes, for example, after I have managed to load all of the URLs, the crawling from the "PRODUCT" route will resume seamlessly, as all of the URLs from the sitemap are already loaded.
Thanks!
2 Replies
Hi @Ayar ,
I am not sure if I understand it correctly, but the simplest solution could be to run two Crawlers, one after another. You may need to use a different RequestQueue for each Crawler, and you may use one Crawler for one specific label. Once the first Crawler finishes handling all the Label1 requests, you may start the second one to handle the Label2 requests.
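A rough sketch of that ordering, in case it helps: everything is loaded into the queue first, and the crawler for the second label is started only after loading has finished. This is not Crawlee's own API. `QueueLike`, `LabeledRequest`, `toProductRequests`, `chunk`, `fillQueue`, and `runInOrder` are hypothetical names made up for the illustration, and I am assuming your real queue exposes an `addRequests`-style method that resolves once a batch has been stored.

```typescript
// Hypothetical minimal shapes for the sketch; a real crawler library's types are richer.
interface LabeledRequest {
  url: string;
  label: string;
}

interface QueueLike {
  // Assumed to resolve only after the given batch has been stored.
  addRequests(requests: LabeledRequest[]): Promise<unknown>;
}

// Turn sitemap URLs into de-duplicated requests routed to one handler label.
function toProductRequests(urls: string[], label = 'PRODUCT'): LabeledRequest[] {
  return Array.from(new Set(urls)).map((url) => ({ url, label }));
}

// Split a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Phase 1: resolves only after every batch has been handed to the queue.
async function fillQueue(queue: QueueLike, urls: string[], batchSize = 1000): Promise<number> {
  const requests = toProductRequests(urls);
  for (const batch of chunk(requests, batchSize)) {
    await queue.addRequests(batch);
  }
  return requests.length;
}

// Phase 2 (the second crawler) is started only once phase 1 has resolved.
async function runInOrder(
  queue: QueueLike,
  urls: string[],
  startProductCrawler: () => Promise<void>,
): Promise<number> {
  const added = await fillQueue(queue, urls);
  await startProductCrawler();
  return added;
}
```

Here `startProductCrawler` would be whatever starts your second Crawler on the already filled queue, so no PRODUCT request is handled before loading is complete.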
You may also fill the requestQueue first without even starting the Crawler.
like-gold OP•2y ago
Hello Pepa, and thank you for your answer. That's what I was thinking of doing, but I wanted to make sure that I was not missing something and that I understand correctly what can be done in this case.
Thanks a lot for your response and I wish you a great day!