WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session...
What does this error mean? It shows up when a webpage does not exist, for example here: http://www.cool-rent.eu/
Aside from this being a really weird error message, it then retries even though I have maxRequestRetries: 0.
Can anything be done about it?
I have tried useSessionPool: false, but it did not help.
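For reference, a minimal sketch of the setup described above (the handler body is a placeholder; the URL is the dead domain from the question):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestRetries: 0,   // expecting no retries at all
    useSessionPool: false,  // attempted workaround, did not help
    requestHandler: async ({ request }) => {
        console.log(`Processed ${request.url}`);
    },
});

// The domain no longer resolves, which triggers the session error above.
await crawler.run(['http://www.cool-rent.eu/']);
```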
Thanks
OK, I have tried using an older version of Crawlee, 3.3.0, and it works like it should; it just displays this error:
ERROR CheerioCrawler: Request failed and reached maximum retries. RequestError: getaddrinfo ENOTFOUND www.cool-rent.eu
so I am not sure what happened in version 3.5.0.
xenial-black•3y ago
Hi Honza, thanks for submitting this issue. We've indeed changed the way proxy errors are handled in Crawlee v3.5.0 (relevant PR here: https://github.com/apify/crawlee/pull/2002). With this new mechanism, proxy and blocking errors are retried by default without increasing the request retry count; instead, they have a separate limit of 10 session retries per request, and after that, the crawl is interrupted, as this is a clear telltale sign that something is really wrong with the proxy config.
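For illustration, that separate limit is exposed as the maxSessionRotations crawler option (a minimal sketch; the value of 10 matches the default described above):

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Session/proxy errors no longer consume maxRequestRetries;
    // they count against this separate per-request limit instead.
    maxSessionRotations: 10, // the v3.5.0 default
    maxRequestRetries: 0,
    requestHandler: async ({ request }) => {
        console.log(`Processed ${request.url}`);
    },
});
```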
Unfortunately, I cannot reproduce your case: http://www.cool-rent.eu/ is unreachable (I cannot even resolve the server's IP address). Crawlee v3.5.0 without proxies processes this correctly, returning the same ENOTFOUND error as 3.3.0. With proxies, I receive a 502 error (from the proxy server); however, Crawlee does not recognize this error (which is, imho, correct behaviour), and it is processed as a regular 5xx error with errorHandler. Can you please share more details about the proxies (or Apify proxy groups) you have used? Did you use proxies in the 3.3.0 case as well? Thanks!
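As an aside, such 5xx errors can be observed in errorHandler, which Crawlee invokes before each retry (a minimal sketch; the logging body is a placeholder):

```ts
import { CheerioCrawler, log } from 'crawlee';

const crawler = new CheerioCrawler({
    // errorHandler runs before a failed request is retried,
    // so the proxy's 502 responses can be inspected here.
    errorHandler: async ({ request }, error) => {
        log.warning(`Retrying ${request.url}: ${error.message}`);
    },
    requestHandler: async ({ request }) => {
        console.log(`Processed ${request.url}`);
    },
});
```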
Hi vroom, I have managed to reproduce this on the platform, so I have shared the run URL via private message.
Any news? Now the actor just crashes; here is the run: https://console.apify.com/view/runs/12EeBfXyGV3IoLUn9. There must be some issue with that PR.
deep-jade•3y ago
This is solved now: https://github.com/apify/crawlee/pull/2029
GitHub: feat: exceeding maxSessionRotations calls failedRequestHandler by b...
Any request exceeding the maxSessionRotations limit currently kills the crawler. This was intended for early exit on too many hard proxy errors, but proved to be somewhat confusing for users using ...
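With that change, a request exceeding maxSessionRotations should end up in failedRequestHandler instead of killing the whole crawl (a minimal sketch assuming the post-#2029 behaviour; the handler body is a placeholder):

```ts
import { CheerioCrawler, log } from 'crawlee';

const crawler = new CheerioCrawler({
    maxSessionRotations: 5,
    // After #2029, requests exceeding maxSessionRotations land here
    // rather than crashing the crawler.
    failedRequestHandler: async ({ request }, error) => {
        log.error(`Giving up on ${request.url}: ${error.message}`);
    },
    requestHandler: async ({ request }) => {
        console.log(`Processed ${request.url}`);
    },
});
```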
After some time, I am again working on an actor that takes a lot of URLs, some of which do not exist anymore, and the crawler again shows that rotating-proxy error. Is this expected behaviour? I would expect the 404 error, for example.
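One possible workaround while this is open: filter out URLs whose domains no longer resolve before handing them to the crawler (a hypothetical sketch; resolvableOnly is an illustrative helper, not a Crawlee API):

```ts
import { lookup } from 'node:dns/promises';

// Hypothetical helper: drop URLs whose hostnames no longer resolve,
// so dead domains never reach the crawler's session-rotation logic.
async function resolvableOnly(urls: string[]): Promise<string[]> {
    const checked = await Promise.all(urls.map(async (url) => {
        try {
            await lookup(new URL(url).hostname);
            return url;
        } catch {
            return null; // e.g. ENOTFOUND for dead domains
        }
    }));
    return checked.filter((u): u is string => u !== null);
}

console.log(await resolvableOnly(['http://www.cool-rent.eu/', 'https://crawlee.dev/']));
```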
Anybody? It is really annoying.
xenophobic-harlequin•17mo ago
@HonzaS did you figure this out?
Nope, I think this issue is still not solved.
Hey, guys.
I will try to check it with our team.
I have also faced it several times.
Thanks for your patience!