metropolitan-bronze · 2y ago

Stop all queued requests and wait until a certain condition is met

Hello everyone. I am scraping a site that requires authentication, and I have made a postNavigation hook that re-validates my authentication when my auth session becomes invalid. Is there a way to stop all requests when the first invalid auth session is found? The thing is that with concurrency enabled, several parallel requests hit an invalid session while the first request that hit it is still re-validating, so I end up with several parallel logins, which I would like to prevent.
8 Replies
national-gold · 2y ago
Hello @Byto, you can pause the crawler by calling the pause method on its autoscaled pool (https://crawlee.dev/api/core/class/AutoscaledPool#pause) and then resume it by calling the resume method (https://crawlee.dev/api/core/class/AutoscaledPool#resume).
metropolitan-bronze (OP) · 2y ago
Seriously, thank you so much! How can I connect that autoscaled pool to my crawler? @vojtechmaslan
national-gold · 2y ago
The crawler uses an instance of AutoscaledPool under the hood to manage concurrent scraping. You can access it like this: crawler.autoscaledPool.pause();
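In the OP's setup that could look roughly like this (a minimal sketch, not from the thread: `reLogin` and the `#login-form` check are placeholders for your own login flow and invalid-session detection):

```ts
import { PuppeteerCrawler } from 'crawlee';

// Placeholder: re-runs your login flow and refreshes the stored session.
async function reLogin(): Promise<void> { /* ... */ }

// Shared promise so only the first request that detects an invalid
// session triggers a re-login; parallel requests just await it.
let reLoginInFlight: Promise<void> | null = null;

const crawler = new PuppeteerCrawler({
    postNavigationHooks: [
        async ({ page }) => {
            // Placeholder check: adapt to how your site signals an invalid session.
            const sessionInvalid = (await page.$('#login-form')) !== null;
            if (!sessionInvalid) return;

            if (!reLoginInFlight) {
                // Stop new requests from starting; in-flight ones will finish.
                void crawler.autoscaledPool?.pause();
                reLoginInFlight = reLogin().finally(() => {
                    reLoginInFlight = null;
                    crawler.autoscaledPool?.resume();
                });
            }
            await reLoginInFlight;
        },
    ],
    async requestHandler() { /* scrape the page */ },
});

await crawler.run(['https://example.com']);
```

Note that pause() resolves only once in-flight requests have finished, so the sketch deliberately doesn't await it inside the hook, which itself runs as part of an in-flight request.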
plain-purple · 2y ago
Same issue here.. how?
metropolitan-bronze (OP) · 2y ago
I ditched the idea and just used Node events to wait for a specific event to resolve. That way the actual crawler logic doesn't stop; it just waits for something else to finish. In my use case I have two crawlers: one with Puppeteer and one with Cheerio. The Puppeteer crawler handles a login form and returns a cookie that I inject into the Cheerio crawler to get a specific authenticated page. Then, before going to the next page with the Puppeteer crawler, I wait for an event to resolve from the Cheerio crawler.
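Stripped down to the event wiring, that pattern might look like this (a sketch with made-up names: `authEvents` and `'auth-page-done'` are plain Node.js events, not Crawlee APIs):

```ts
import { EventEmitter, once } from 'node:events';

// Shared emitter both crawlers can reach; 'auth-page-done' is a
// made-up event name for this sketch.
const authEvents = new EventEmitter();

// Called from the Cheerio crawler's requestHandler once the
// authenticated page has been scraped.
export function signalAuthPageDone(result: unknown): void {
    authEvents.emit('auth-page-done', result);
}

// Awaited in the Puppeteer crawler's requestHandler before it
// moves on to the next page.
export async function waitForAuthPage(): Promise<unknown> {
    const [result] = await once(authEvents, 'auth-page-done');
    return result;
}
```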
plain-purple · 2y ago
Ah, I see.. thanks a lot for your summary, got the idea! 🙂
metropolitan-bronze (OP) · 2y ago
Np, it feels hacky, but honestly I didn't find a better way to make one crawler wait for another crawler to finish.