colossal-harlequinC

Handle a 401 in errorHandler by detecting login form and gracefully continuing if present

Hello there!

I'm working on a page crawler that can handle logging into sites, and then crawling around as that user. We've had a lot of success so far with Crawlee (PuppeteerCrawler) by detecting the login in requestHandler, logging in, and then continuing with the crawl.

Recently we were asked to support "logging in" to a simple password protection screen on a Netlify site.

On navigation to the page, the server returns a 401 status code but still renders the password login form. Because of the 401 status code, Crawlee calls the errorHandler. Inside that error handler, I'm able to detect the form and log in, but then I'm not sure how to save the crawl from that point.
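
To make the detection step concrete, here is a rough sketch of the check described above: treat a response as a password screen only when the status is 401 and the rendered HTML still contains a form with a password input. This is not the actual crawler code. The names PageSnapshot and isPasswordGate are hypothetical, not part of Crawlee, and the regex-based form check is a simplifying assumption.

```typescript
// Hypothetical helper for illustration; not part of Crawlee's API.
interface PageSnapshot {
  status: number; // HTTP status code of the navigation response
  html: string;   // HTML of the rendered page
}

// Returns true when a 401 response still renders a password form,
// i.e. the kind of password protection screen described above.
function isPasswordGate(snapshot: PageSnapshot): boolean {
  if (snapshot.status !== 401) return false;
  // Assumption: the screen contains a <form> with a password input.
  const hasForm = /<form\b/i.test(snapshot.html);
  const hasPasswordInput = /<input\b[^>]*type=["']?password["']?/i.test(snapshot.html);
  return hasForm && hasPasswordInput;
}
```

In a real crawl the status and HTML would come from the navigation response and the page content. A check like this only decides whether to attempt the login; it does not by itself keep the session alive afterwards.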

I can enqueue links from the page, but the next request it tries to load gets the 401 error again. I'm guessing a little bit, but I think the page is closed at the end of the errorHandler, and this causes me to lose my logged-in session?

Is there anything I can do to abort the error handling flow from errorHandler and let the crawl continue as normal with the same page session?

I attempted to add a code example but hit the message limit. I can try in a follow-up comment.