memo23 · 3d ago

Retry requests with different headers, etc.

Can we get this page in a JS version as well: https://crawlee.dev/python/docs/guides/error-handling ? I'm also interested in best practices for changing headers when a 403 or 429 is encountered, so that I don't retry the same request with the same headers and only a different IP.
Olexandra · 3d ago
Hi, thanks for your questions.

Header rotation: First you need a pool of header sets and getter logic to pick from it (randomised or index-based). You can build that manually or use a package like header-generator. If you want to keep track of which header set from the pool has already been used, you can base the choice on request.retryCount. Then, if your Actor handles request sending automatically, you can adjust the headers in errorHandler once one of the mentioned errors occurs. Alternatively, you can remove 403 and 429 from the list of blocked statuses under sessionPoolOptions and rotate the headers inside requestHandler instead; just don't forget to throw an exception afterwards so the request is retried. If you use BasicCrawler, you simply get a new set of headers every time you send a request inside requestHandler.

Documentation: I'll reach out to our team to share your interest in the mentioned documentation page.
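Here's a minimal sketch of the errorHandler approach with a CheerioCrawler; the pool size, the pickHeaders helper, the HeaderGenerator options, and the example URL are all illustrative, so adapt them to your Actor:

```ts
import { CheerioCrawler } from 'crawlee';
import { HeaderGenerator } from 'header-generator';

// Build a small pool of realistic header sets up front.
const headerGenerator = new HeaderGenerator({
    browsers: ['chrome', 'firefox'],
    devices: ['desktop'],
});
const headerPool = Array.from({ length: 5 }, () => headerGenerator.getHeaders());

// Pick a header set based on how many times the request was already retried,
// so each retry goes out with a different set than the previous attempt.
const pickHeaders = (retryCount: number) => headerPool[retryCount % headerPool.length];

const crawler = new CheerioCrawler({
    // errorHandler runs after a failed attempt (e.g. 403/429) and before the
    // retry, which makes it a good place to swap in the next header set.
    errorHandler: async ({ request }) => {
        request.headers = {
            ...request.headers,
            ...pickHeaders(request.retryCount),
        };
    },
    requestHandler: async ({ request, $ }) => {
        // Normal scraping logic goes here.
        console.log(`${request.url} -> ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
```

If you'd rather handle 403/429 yourself inside requestHandler, you could empty sessionPoolOptions.blockedStatusCodes, rotate request.headers there, and throw so the request is retried, as described above.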
