HonzaS · 3y ago

WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Detected a session error,

Whole error: WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session... What does this error mean? It shows up when a webpage does not exist, for example here: http://www.cool-rent.eu/. Aside from the fact that this is a really strange error message, the crawler then keeps retrying even though I have maxRequestRetries: 0. Can anything be done about it? I have tried useSessionPool: false but it did not help. Thanks
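For reference, a minimal sketch of the setup being described (assuming Crawlee v3.5.0; only maxRequestRetries and useSessionPool come from the report, the handler body is illustrative):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestRetries: 0,  // no retries expected, yet session errors are still retried
    useSessionPool: false, // tried as a workaround in the report; did not help
    async requestHandler({ request }) {
        // Illustrative handler; never reached for the dead domain below.
        console.log(`Processed ${request.url}`);
    },
});

await crawler.run(['http://www.cool-rent.eu/']);
```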
8 Replies
HonzaS (OP) · 3y ago
OK, I have tried using an older version of Crawlee, 3.3.0, and it works as it should: it just displays this error: ERROR CheerioCrawler: Request failed and reached maximum retries. RequestError: getaddrinfo ENOTFOUND www.cool-rent.eu. So I am not sure what happened in version 3.5.0.
xenial-black · 3y ago
Hi Honza, thanks for submitting this issue. We have indeed changed the way proxy errors are handled in Crawlee v3.5.0 (relevant PR here: https://github.com/apify/crawlee/pull/2002). With the new mechanism, proxy and blocking errors are retried by default without increasing the request retry count; instead, they have a separate limit of 10 session retries per request, and after that the crawl is interrupted, as this is a clear telltale sign that something is really wrong with the proxy config.

Unfortunately, I cannot reproduce your case: http://www.cool-rent.eu/ is unreachable (I cannot even resolve the server's IP address). Crawlee v3.5.0 without proxies processes this correctly, returning the same ENOTFOUND error as 3.3.0. With proxies, I receive a 502 error (from the proxy server); however, Crawlee does not recognize this as a session error (which is, imho, correct behaviour), and it is processed as a regular 5xx error with errorHandler.

Can you please share more details about the proxies (or Apify proxy groups) you have used? Did you use proxies in the 3.3.0 case as well? Thanks!
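To make the mechanism above concrete, a hedged sketch (the option and handler names are from the Crawlee v3.5+ API; the limit shown is the default described above, and the handler bodies are illustrative):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Session/proxy errors rotate the session without increasing the
    // request retry count; this caps those rotations per request.
    maxSessionRotations: 10, // the default mentioned above
    // Errors Crawlee does not attribute to the session (e.g. the 502
    // described above) count against maxRequestRetries and pass through here.
    errorHandler({ request }, error) {
        console.warn(`Retrying ${request.url}: ${error.message}`);
    },
    async requestHandler({ request }) {
        console.log(`Processed ${request.url}`);
    },
});
```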
HonzaS (OP) · 3y ago
Hi vroom, I have managed to reproduce this on the platform, so I have shared the run URL via private message. Any news? Now the actor just crashes; here is the run: https://console.apify.com/view/runs/12EeBfXyGV3IoLUn9. There must be some issue with that PR.
deep-jade · 3y ago
GitHub
feat: exceeding maxSessionRotations calls failedRequestHandler by b...
Any request exceeding the maxSessionRotations limit currently kills the crawler. This was intended for early exit on too many hard proxy errors, but proved to be somewhat confusing for users using ...
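Going by the PR description above, after that change a request exhausting maxSessionRotations is handed to failedRequestHandler instead of killing the whole crawl. A hedged sketch of what that could look like (the rotation limit and handler bodies are illustrative):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    maxSessionRotations: 5, // give up on hard proxy errors sooner
    // Per the PR above, a request exceeding the rotation limit should land
    // here rather than interrupting the crawler.
    failedRequestHandler({ request }, error) {
        console.error(`Giving up on ${request.url}: ${error.message}`);
    },
    async requestHandler({ request }) {
        console.log(`Processed ${request.url}`);
    },
});
```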
HonzaS (OP) · 2y ago
After some time, I am again working on an actor that takes a lot of URLs, some of which do not exist anymore, and the crawler again shows that rotating-session error. Is this expected behaviour? I would expect a 404 error, for example:
WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session...
2024-01-13T11:13:43.450Z page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at https://cudaops.com/
2024-01-13T11:13:43.451Z Call log:
2024-01-13T11:13:43.452Z - navigating to "https://cudaops.com/", waiting until "load"
Anybody? It is really annoying.
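One possible workaround for dead domains, sketched here under the assumption that the URL list is known up front (this is not something suggested in the thread): resolve each hostname with Node's built-in dns module before enqueueing, so unresolvable hosts never reach the crawler. Note that with proxies, DNS is ultimately resolved on the proxy side, so this is only a local heuristic:

```ts
import { lookup } from 'node:dns/promises';
import { PlaywrightCrawler } from 'crawlee';

// Keep only URLs whose hostname still resolves; ENOTFOUND hosts are dropped
// up front instead of triggering session-rotation retries in the crawler.
async function filterResolvable(urls: string[]): Promise<string[]> {
    const results = await Promise.all(urls.map(async (url) => {
        try {
            await lookup(new URL(url).hostname);
            return url;
        } catch {
            console.warn(`Skipping unresolvable host: ${url}`);
            return null;
        }
    }));
    return results.filter((u): u is string => u !== null);
}

const crawler = new PlaywrightCrawler({
    async requestHandler({ request }) {
        console.log(`Processed ${request.url}`);
    },
});

await crawler.run(await filterResolvable(['https://cudaops.com/', 'https://apify.com/']));
```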
xenophobic-harlequin · 17mo ago
@HonzaS did you figure this out?
HonzaS (OP) · 17mo ago
Nope, I think this issue is still not solved.
Oleg V. · 17mo ago
Hey, guys. I will try to check it with our team. I have also run into it several times myself. Thanks for your patience :)
