constant-blue · 8mo ago

Handling of 4xx and 5xx in default handler (Python)

I built a crawler for crawling websites and am now trying to add functionality to also handle error pages/links (4xx and 5xx responses). I was not able to find any documentation about this. So the question is: is it supported, and if so, where should I look?
6 Replies
Hall · 8mo ago
Someone will reply to you shortly. (This post was marked as solved by rast42.)
Mantisus · 8mo ago
Hey @rast42. Standard Crawlee has its own behavior for handling error status codes:
- 5xx: the request is retried
- 403, 429, 401: the session is rotated, if sessions are used
- other 4xx: the request is marked as failed without retrying

If you want to handle any of these statuses yourself, you can use ignore_http_error_status_codes.
fascinating-indigo · 8mo ago
Do we need to list all the codes in this setting, or can we set it to ignore all codes at once?
Mantisus · 8mo ago
You need to include all of them. Something like:
list(range(400, 600))
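A minimal sketch of how that could be wired up, assuming a recent version of Crawlee for Python (import paths and the exact constructor signature vary between releases; the URL is a placeholder):

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext


async def main() -> None:
    # Treat every 4xx/5xx as a normal response, so the default handler
    # receives error pages instead of Crawlee retrying or failing them.
    crawler = HttpCrawler(
        ignore_http_error_status_codes=list(range(400, 600)),
    )

    @crawler.router.default_handler
    async def default_handler(context: HttpCrawlingContext) -> None:
        status = context.http_response.status_code
        if status >= 400:
            context.log.info(f'Error page ({status}): {context.request.url}')
        else:
            context.log.info(f'OK ({status}): {context.request.url}')

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```

With that in place, 404 pages and server errors flow through the same default handler as successful responses, and you can branch on the status code there.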
fascinating-indigo · 8mo ago
Crazy. Is there no better way to override the error handling?
Mantisus · 8mo ago
Could you give an example of the kind of behavior you want to achieve? Perhaps error_handler is a better fit for your case: https://crawlee.dev/python/api/class/BasicCrawler#error_handler
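For reference, a sketch of what the error_handler hook (and its companion failed_request_handler) can look like; the handler signatures here are my reading of the linked API docs and may differ slightly between versions:

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext


async def main() -> None:
    crawler = HttpCrawler(max_request_retries=2)

    @crawler.router.default_handler
    async def default_handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')

    # Called after a request fails but before it is retried; lets you
    # inspect the error or tweak the request between attempts.
    @crawler.error_handler
    async def error_handler(context, error) -> None:
        context.log.warning(f'Attempt on {context.request.url} failed: {error}')

    # Called once per request, after all retries are exhausted.
    @crawler.failed_request_handler
    async def failed_handler(context, error) -> None:
        context.log.error(f'Giving up on {context.request.url}: {error}')

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```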
