How to keep sessions alive while crawling?
I'm using PuppeteerCrawler to crawl a site that has both public and authenticated pages.
The site has many subdomains, and each subdomain has its own session. Each session lasts 15 minutes, and the timer is refreshed whenever a request to an authenticated page carries the session cookies. However, if the session cookies are attached to a request for an unauthenticated page, the session expires.
Before I start crawling, I make a POST request to the login endpoint of each subdomain and store the returned session cookies in memory in a JavaScript Map. For requests that need to be authenticated, I pull the session cookies out of the map and set them in a preNavigationHook (using page.setCookie).
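Here is roughly what that setup looks like, simplified. The example.com domain, the /login path, the credentials, and the requiresAuth flag in userData are placeholders for my actual code:

```js
import { PuppeteerCrawler } from 'crawlee';

const sessionCookies = new Map(); // subdomain -> array of cookie objects

// Turn Set-Cookie headers into the shape page.setCookie() expects
// (simplified: keeps name/value only and pins the cookie to the subdomain).
function toPuppeteerCookies(res, subdomain) {
    return res.headers.getSetCookie().map((header) => {
        const pair = header.split(';')[0];
        const eq = pair.indexOf('=');
        return {
            name: pair.slice(0, eq),
            value: pair.slice(eq + 1),
            domain: `${subdomain}.example.com`,
        };
    });
}

// POST to the subdomain's login endpoint and cache the session cookies.
async function login(subdomain) {
    const res = await fetch(`https://${subdomain}.example.com/login`, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify({
            user: process.env.CRAWL_USER,
            pass: process.env.CRAWL_PASS,
        }),
    });
    sessionCookies.set(subdomain, toPuppeteerCookies(res, subdomain));
}

const crawler = new PuppeteerCrawler({
    preNavigationHooks: [
        async ({ page, request }) => {
            // Only attach session cookies on authenticated pages; sending
            // them with a request to a public page expires the session.
            if (!request.userData.requiresAuth) return;
            const subdomain = new URL(request.url).hostname.split('.')[0];
            const cookies = sessionCookies.get(subdomain);
            if (cookies) await page.setCookie(...cookies);
        },
    ],
    async requestHandler({ page, request }) {
        // ... scrape the page ...
    },
});
```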
My problem is that when there are a lot of requests in the queue, some of the sessions can expire by the time the crawler gets to those requests, because they have been sitting for 15+ minutes. I could check each page to see whether I am actually authenticated and re-fetch the session cookies if not, but I am wondering if there is a better way.
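The workaround I have in mind would look something like this, leaning on the crawler's built-in retries; the '#login-form' selector is a stand-in for whatever marks the site's login page, and login() refers to the sketch above:

```js
const crawler = new PuppeteerCrawler({
    maxRequestRetries: 3,
    // ... preNavigationHooks as above ...
    async requestHandler({ page, request }) {
        // If we were bounced to the login page, the session had expired:
        // refresh the cookies, then throw so the request is retried and
        // the preNavigationHook attaches the fresh cookies next time.
        if (request.userData.requiresAuth && (await page.$('#login-form'))) {
            const subdomain = new URL(request.url).hostname.split('.')[0];
            await login(subdomain);
            throw new Error('session expired, retrying with fresh cookies');
        }
        // ... scrape the page ...
    },
});
```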