Apify Discord Mirror

Updated 5 months ago

WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session...

At a glance

The community member is experiencing an issue with the Crawlee library, where they are seeing a "WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session..." error when trying to crawl a website that does not exist (e.g., http://www.cool-rent.eu/). They have tried setting maxRequestRetries: 0 and useSessionPool: false but the issue persists.

In the comments, another community member suggests that the issue may be related to changes made in Crawlee version 3.5.0, as the older version 3.3.0 did not exhibit the same behavior. The Crawlee team member responds that they have made changes to the way proxy errors are handled in version 3.5.0, and that the issue may be related to the proxy configuration. They request more details about the proxies used by the community member.

The community member manages to reproduce the issue on the Apify platform and shares the run URL with the Crawlee team via private message. A fix is attempted in a later pull request (https://github.com/apify/crawlee/pull/2029), but the community member reports that the actor then crashes and suspects a problem with that PR.

After some time, the community member runs into the same rotating-session warning again on another actor that crawls many URLs, some of which no longer exist; other community members confirm they have faced it as well, and the issue remains unresolved.

Whole error: WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session...
What does this error mean? It shows up when there is no webpage, for example here: http://www.cool-rent.eu/
Aside from this being a really weird error message, it then retries even when I have maxRequestRetries: 0.
Can anything be done about it?
I have tried useSessionPool: false but it did not help.
Thanks
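
For reference, a minimal sketch of the setup being described (assuming a recent Crawlee 3.x release; the handler bodies are illustrative, not the poster's actual code):

TypeScript
import { CheerioCrawler } from 'crawlee';

// Minimal reproduction of the reported setup.
const crawler = new CheerioCrawler({
    maxRequestRetries: 0,   // the poster expects a single failed attempt, no retries
    useSessionPool: false,  // disabling the session pool did not help, per the report
    requestHandler: async ({ request, $ }) => {
        console.log(`Processed ${request.url}: ${$('title').text()}`);
    },
    failedRequestHandler: async ({ request }, error) => {
        console.log(`Request ${request.url} failed: ${error.message}`);
    },
});

await crawler.run(['http://www.cool-rent.eu/']);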
10 comments
OK, I have tried using an older version of Crawlee, 3.3.0, and it works as it should: it just displays this error: ERROR CheerioCrawler: Request failed and reached maximum retries. RequestError: getaddrinfo ENOTFOUND www.cool-rent.eu. So I am not sure what happened in version 3.5.0.
Hi Honza, thanks for submitting this issue. We've indeed changed the way proxy errors are handled in Crawlee v3.5.0 (relevant PR here - https://github.com/apify/crawlee/pull/2002). With this new mechanism, proxy and blocking errors are retried by default without increasing the request retry count (instead, they have a separate limit of 10 session retries per request - and after that, the crawl is interrupted as this is a clear telltale sign that something is really wrong with the proxy config).

Unfortunately, I cannot reproduce your case - http://www.cool-rent.eu/ is unreachable (I cannot even resolve the server's IP address). Crawlee v3.5.0 without proxies processes this correctly by returning the same ENOTFOUND error as 3.3.0. With proxies, I receive a 502 error (from the proxy server) - however, Crawlee does not recognize this error (which is imho correct behaviour) and the error is processed as a regular 5xx error with errorHandler. Can you please share more details about the proxies (or Apify proxy groups) you have used? Have you used proxies even in the 3.3.0 case?

Thanks!
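
As a rough illustration of the mechanism described above - session errors are retried separately from maxRequestRetries, and unrecognized proxy errors go through errorHandler. The maxSessionRotations option and the placeholder proxy URL below are assumptions for illustration (the option may require a newer 3.x release), not details taken from this thread:

TypeScript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxy configuration, not the poster's actual proxies.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://user:pass@proxy.example.com:8000'],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxRequestRetries: 0,
    maxSessionRotations: 3, // lower the separate session-retry limit (default 10) if rotation loops are unwanted
    errorHandler: async ({ request }, error) => {
        // Errors Crawlee does not classify as session/blocking errors (e.g. a 502 from the proxy)
        // are handled here as regular retryable errors.
        console.log(`Retryable error for ${request.url}: ${error.message}`);
    },
    requestHandler: async ({ request, $ }) => {
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['http://www.cool-rent.eu/']);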
Hi vroom, I have managed to reproduce this on the platform, so I have shared the run URL via private message.
Any news? Now the actor just crashes, here is the run: https://console.apify.com/view/runs/12EeBfXyGV3IoLUn9 - there must be some issue with that PR.
After some time: I am again working on an actor that takes a lot of URLs, and some of them do not exist anymore,
and the crawler again shows that rotating-session warning. Is this expected behaviour? I would expect a 404 error.

for example

Plain Text
WARN  PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session...
2024-01-13T11:13:43.450Z page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at https://cudaops.com/
2024-01-13T11:13:43.451Z Call log:
2024-01-13T11:13:43.452Z   - navigating to "https://cudaops.com/", waiting until "load"
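
A sketch of one way to surface the underlying navigation error per URL once retries are exhausted (this is an assumption-based workaround for visibility, not a fix for the rotation behaviour; the URL and retry count are illustrative):

TypeScript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxRequestRetries: 1,
    failedRequestHandler: async ({ request }, error) => {
        // Logs the final error for dead URLs, e.g. net::ERR_TUNNEL_CONNECTION_FAILED or ENOTFOUND.
        console.log(`Gave up on ${request.url}: ${error.message}`);
    },
    requestHandler: async ({ request, page }) => {
        console.log(`${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://cudaops.com/']);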
Anybody? It is really annoying.
Did you figure this out?
Nope, I think this issue is still not solved.
Hey, guys.
I will try to check it with our team.
I have also run into it several times.

Thanks for your patience!