Gracefully closing the crawler with keepalive flag true
At a glance
The community member is using a Puppeteer crawler with keepAlive set to true and crawler.run() (without await). This runs the crawler indefinitely, and new requests added to the queue are processed. The community member wants to gracefully close the crawler, processing all pending requests in the queue before shutting down. However, using crawler.teardown() abruptly closes the crawler instances without processing the pending requests.
In the comments, another community member suggests using the isFinishedFunction option to define a custom function that checks if all requests have been handled and the crawler can be safely shut down. Another comment simply states that the community member can check if the queue is empty.
I'm using a Puppeteer crawler with keepAlive set to true and calling crawler.run() without await.
This keeps the crawler running indefinitely, and any new requests I insert into the request queue get processed.
(I'm using a non-persisted request queue.)
What I want is to close the crawler gracefully: when I get a signal to shut down, I want to process all the pending requests in the request queue first and only then kill the crawler.
Right now, if I call crawler.teardown(), it abruptly closes the crawler instances without processing the pending requests.
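A rough sketch of the isFinishedFunction idea from the comments: keep a "shutdown requested" flag and only report the crawler as finished once that flag is set and the queue has nothing left. The helper createGracefulShutdown and the FakeQueue class are hypothetical names made up for illustration, not part of Crawlee. The wiring shown in the trailing comment is an assumption, and whether a custom isFinishedFunction takes effect while keepAlive is true should be checked against the Crawlee version in use.

```typescript
// Minimal shape this sketch needs from a queue. Crawlee's RequestQueue
// has an async isFinished() method, so it should satisfy this interface.
interface QueueLike {
    isFinished(): Promise<boolean>;
}

// Hypothetical helper (not a Crawlee API): tracks a shutdown flag and
// exposes a function shaped like an isFinishedFunction callback.
function createGracefulShutdown(queue: QueueLike) {
    let shutdownRequested = false;
    return {
        // Call this from a signal handler instead of crawler.teardown().
        requestShutdown(): void {
            shutdownRequested = true;
        },
        // Reports "finished" only after shutdown was requested AND
        // the queue has no pending or in-progress requests.
        isFinishedFunction: async (): Promise<boolean> => {
            if (!shutdownRequested) return false;
            return queue.isFinished();
        },
    };
}

// In-memory stand-in for a request queue, for illustration only.
class FakeQueue implements QueueLike {
    private pending: string[] = [];

    add(url: string): void {
        this.pending.push(url);
    }

    handleNext(): string | undefined {
        return this.pending.shift();
    }

    async isFinished(): Promise<boolean> {
        return this.pending.length === 0;
    }
}

// Assumed wiring with Crawlee (unverified, adapt to your version):
//
// const shutdown = createGracefulShutdown(requestQueue);
// const crawler = new PuppeteerCrawler({
//     requestQueue,
//     keepAlive: true,
//     autoscaledPoolOptions: { isFinishedFunction: shutdown.isFinishedFunction },
//     requestHandler: async ({ page }) => { /* ... */ },
// });
// process.on('SIGTERM', () => shutdown.requestShutdown());
```

With this shape, the signal handler only flips the flag, and the pending requests keep being processed until the queue reports that it is finished.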