other-emerald•11mo ago
How to throttle enqueuing URLs to the next router
Hello guys. I have a router that scrapes a list of URLs and enqueues them to the next router.
However, I want to limit the enqueuing so that the requests to the website are throttled.
I've tried adding it to the crawler configuration, but it doesn't work as intended. Even when I set a limit on requests per minute or requests per crawl, the limit isn't respected.
Initially I thought it was because the limit is only checked after a given URL list has been enqueued, so if someone enqueues a list bigger than the limit in one go, the limit wouldn't take effect.
E.g. the limit is 10 requests, and I enqueue 25 requests as a single array.
So I manually split the job-URLs array into multiple smaller batches.
However, this doesn't work either. The batches are definitely enqueued with sleep intervals between them, but the next router is still called for all of them at once after every batch has been enqueued.
Apify Community
other-emeraldOP•11mo ago
Here's my Cheerio config.
xenial-black•11mo ago
Hey,
If I understand correctly, you are trying to limit the frequency of requests sent to the server, right? If so, you should enqueue all of the requests at once and set the `maxRequestsPerMinute` field; CheerioCrawler will then automatically limit the rate at which requests are sent to the server.
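To make the distinction concrete, here is a small self-contained TypeScript model of what a per-minute limit does. It is not Crawlee's actual implementation: the helper `scheduleRequests`, the example URLs and the sliding 60-second window are assumptions for illustration. All 25 URLs from the question are enqueued at t = 0, and only the limiter decides when each one may start. In the real crawler you don't write this yourself; you just set `maxRequestsPerMinute` on `CheerioCrawler`.

```typescript
// Conceptual model only (not Crawlee's implementation): every request is
// already in the queue, and a per-minute limiter decides when each one starts.
interface ScheduledRequest {
  url: string;
  startMs: number; // earliest time the request may be processed
}

// Hypothetical helper: computes the earliest start times such that at most
// `maxPerMinute` requests start within any 60-second window.
function scheduleRequests(
  queue: string[],
  maxPerMinute: number,
  enqueuedAtMs = 0,
): ScheduledRequest[] {
  if (maxPerMinute < 1 || Math.floor(maxPerMinute) !== maxPerMinute) {
    throw new RangeError("maxPerMinute must be a positive integer");
  }
  const windowMs = 60000;
  const starts: number[] = [];
  return queue.map((url) => {
    const i = starts.length;
    // Request i must wait until the request `maxPerMinute` positions earlier
    // started at least one full window ago.
    const startMs =
      i < maxPerMinute
        ? enqueuedAtMs
        : Math.max(enqueuedAtMs, starts[i - maxPerMinute] + windowMs);
    starts.push(startMs);
    return { url, startMs };
  });
}

// All 25 job URLs are enqueued in one go at t = 0 (hypothetical URLs).
const urls: string[] = [];
for (let i = 0; i < 25; i++) {
  urls.push(`https://example.com/job/${i}`);
}
const plan = scheduleRequests(urls, 10);

// Count how many requests start in each minute.
const perMinute: number[] = [];
for (const { startMs } of plan) {
  const minute = Math.floor(startMs / 60000);
  perMinute[minute] = (perMinute[minute] || 0) + 1;
}
perMinute.forEach((count, minute) => {
  console.log(`minute ${minute}: ${count} requests`);
});
// minute 0: 10 requests
// minute 1: 10 requests
// minute 2: 5 requests
```

Enqueuing everything at once does not cause a burst beyond the limit: the queue just holds the requests, and the limiter spreads them out. The model uses a simple sliding window, so Crawlee's own scheduling may differ in detail; treat the numbers as illustrative.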
Enqueuing only adds the requests to the `RequestQueue`, which then feeds the crawler automatically. The `maxRequestsPerMinute` field does not limit the enqueuing rate; it limits the number of requests processed per minute. In general, there is no real advantage to throttling the enqueuing itself.
other-emeraldOP•11mo ago
Thank you @Milunnn. I had already tried that, but it wasn't working for some reason.
Now it's working, though. Thanks for the help 🙂