metropolitan-bronze · 2y ago

Stop all queued requests and wait until a certain condition is met

Hello everyone. I am scraping a site that requires authentication, and I have made a postNavigation hook that re-validates my authentication when my auth session becomes invalid. Is there a way to stop all requests when the first invalid auth session is found? The thing is that with concurrency enabled, several parallel requests hit an invalid session while the first request that hit it is still re-validating, so I end up with several parallel logins, which I would like to prevent.
8 Replies
national-gold · 2y ago
Hello @Byto, you can pause the crawler by calling the pause method on its autoscaled pool (https://crawlee.dev/api/core/class/AutoscaledPool#pause) and then resume it by calling the resume method (https://crawlee.dev/api/core/class/AutoscaledPool#resume).
metropolitan-bronze (OP) · 2y ago
Seriously, thank you so much! How can I connect that autoscaled pool to my crawler? @vojtechmaslan
national-gold · 2y ago
The crawler uses an instance of AutoscaledPool under the hood to manage concurrent scraping. You can access it like this: crawler.autoscaledPool.pause();
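In the OP's setup that could look roughly like this (a minimal sketch, not from the thread: `reLogin` and the `#login-form` check are placeholders for your own login flow and invalid-session detection):

```ts
import { PuppeteerCrawler } from 'crawlee';

// Placeholder: re-runs your login flow and refreshes the stored session.
async function reLogin(): Promise<void> { /* ... */ }

// Shared promise so only the first request that detects an invalid
// session triggers a re-login; parallel requests just await it.
let reLoginInFlight: Promise<void> | null = null;

const crawler = new PuppeteerCrawler({
    postNavigationHooks: [
        async ({ page }) => {
            // Placeholder check: adapt to how your site signals an invalid session.
            const sessionInvalid = (await page.$('#login-form')) !== null;
            if (!sessionInvalid) return;

            if (!reLoginInFlight) {
                // Stop new requests from starting; in-flight ones will finish.
                void crawler.autoscaledPool?.pause();
                reLoginInFlight = reLogin().finally(() => {
                    reLoginInFlight = null;
                    crawler.autoscaledPool?.resume();
                });
            }
            await reLoginInFlight;
        },
    ],
    async requestHandler() { /* scrape the page */ },
});

await crawler.run(['https://example.com']);
```

Note that pause() resolves only once in-flight requests have finished, so the sketch deliberately doesn't await it inside the hook, which itself runs as part of an in-flight request.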
plain-purple · 2y ago
Same issue here.. how?
metropolitan-bronze (OP) · 2y ago
I ditched the idea and just used Node events to wait for a specific event to resolve. That way the actual crawler logic doesn't stop; it just waits for something else to finish. In my use case I have two crawlers: one with Puppeteer and one with Cheerio. The Puppeteer crawler handles a login form and returns a cookie that I inject into the Cheerio crawler to get a specific authenticated page. Then, before going to the next page with the Puppeteer crawler, I wait for an event to resolve from the Cheerio crawler.
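Stripped down to the event wiring, that pattern might look like this (a sketch with made-up names: `authEvents` and `'auth-page-done'` are plain Node.js events, not Crawlee APIs):

```ts
import { EventEmitter, once } from 'node:events';

// Shared emitter both crawlers can reach; 'auth-page-done' is a
// made-up event name for this sketch.
const authEvents = new EventEmitter();

// Called from the Cheerio crawler's requestHandler once the
// authenticated page has been scraped.
export function signalAuthPageDone(result: unknown): void {
    authEvents.emit('auth-page-done', result);
}

// Awaited in the Puppeteer crawler's requestHandler before it
// moves on to the next page.
export async function waitForAuthPage(): Promise<unknown> {
    const [result] = await once(authEvents, 'auth-page-done');
    return result;
}
```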
plain-purple · 2y ago
Ah, I see.. thanks a lot for your summary, got the idea! 🙂
metropolitan-bronze (OP) · 2y ago
Np, it feels hacky, but honestly I didn't find a better way to make one crawler wait for another crawler to finish.