Apify Discord Mirror

Updated 2 years ago

failedRequestHandler, error argument, detailed error message lost

At a glance

The community member is using PlaywrightCrawler and its failedRequestHandler to handle errors. Errors such as "SSL_ERROR_BAD_CERT_DOMAIN" appear in the log, but inspecting the error argument of failedRequestHandler with JSON.stringify(error) shows only {"name":"Error"}, without the detailed error message. Another member suggests passing ignoreHTTPSErrors: true in the launchOptions, which fixes the SSL_ERROR_BAD_CERT_DOMAIN issue. However, the original poster wants to access the detailed error message in code, not just in the log. The solution is to read error.message instead of calling JSON.stringify(error).

I am using PlaywrightCrawler and the failedRequestHandler to handle errors.
Something like this:
```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // ...
    async failedRequestHandler({ request, response, page, log }, error) {
        // ...
    },
});
```

And sometimes I see errors in the log:
```
ERROR failedRequestHandler: Request failed and reached maximum retries. page.goto: SSL_ERROR_BAD_CERT_DOMAIN
```


But when I inspect the error argument of failedRequestHandler with JSON.stringify(error), I see only this: {"name":"Error"}

It seems the detailed error message I see in the log is not accessible through the error argument.

So, how can I access the detailed error message in code?
Try passing ignoreHTTPSErrors in launchContext.launchOptions.
For example:
```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  launchContext: {
    launchOptions: {
      headless: false,
      ignoreHTTPSErrors: true,
    },
  },
});
```
πŸ‘ yes, this is the fix for the SSL_ERROR_BAD_CERT_DOMAIN problem. Great!
However - there might be another 100 different errors... and I would like to see the error messages in the error argument mentioned above (I can not always look into log files, I think we have this error for this purpose!)
Maybe https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#postNavigationHooks? It sounds like you want to add logic around the response, so hooks might be a better fit.
Maybe add a response listener before starting the navigation.
One option is to use page.on() [1], something like this:
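```python
import logging
import sys

log = logging.getLogger(__name__)

def handle_response(response):
    if response.status == 500:
        log.error("Error: " + str(response.status))
        sys.exit(1)

# `page` is an existing Playwright Page; register the listener before navigating
page.on("response", handle_response)
```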

[1] https://playwright.dev/python/docs/api/class-page?_highlight=page.on#pageonresponse
Thanks for your responses, but let's keep simple things simple.

PlaywrightCrawler has functions that handle situations when something goes wrong:
errorHandler()
failedRequestHandler()

Both functions receive an error argument.
These functions are called at the right time, when a timeout, an SSL error, or something else happens.
Great.

But the argument contains... nothing: {"name":"Error"} is not helpful.
On the other hand, the information about the error exists! I see it in the Crawlee log!

It is a bug in Crawlee.
Well, this is finally solved:
https://github.com/apify/crawlee/discussions/1755

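Short version: it was a bad idea to call JSON.stringify(error) to check the contents of error, because the message and stack of an Error object are non-enumerable properties, which JSON.stringify skips.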
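A minimal plain-JavaScript sketch that reproduces the {"name":"Error"} output without Crawlee. The explicit `err.name = "Error"` assignment is an assumption: it gives the error an enumerable own `name` property, which is one plausible reason the thread saw `name` but nothing else. The helper `errorToJSON` is hypothetical, showing one way to copy the fields into a plain object so they survive serialization.

```javascript
// Simulate the error from the log. Assumption: the error carries an
// enumerable own `name`, reproducing the {"name":"Error"} seen in the thread.
const err = new Error("page.goto: SSL_ERROR_BAD_CERT_DOMAIN");
err.name = "Error";

// JSON.stringify serializes only enumerable own properties;
// `message` and `stack` on an Error are non-enumerable, so they are skipped.
console.log(JSON.stringify(err)); // {"name":"Error"}

// The detailed text is still there; it just has to be read directly.
console.log(err.message); // page.goto: SSL_ERROR_BAD_CERT_DOMAIN

// Hypothetical helper: copy the interesting fields into a plain object
// before stringifying, so they appear in the JSON output.
function errorToJSON(error) {
  return JSON.stringify({
    name: error.name,
    message: error.message,
    stack: error.stack,
  });
}

console.log(errorToJSON(err));
```

Copying fields explicitly keeps the output predictable; passing a replacer array such as `JSON.stringify(err, ["name", "message", "stack"])` is a shorter alternative with the same effect.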

It is enough to read error.message.
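A minimal sketch of reading error.message inside a handler, assuming the `(context, error)` signature shown in the question. The function name `logFailure` and the stub context are hypothetical: the stub stands in for Crawlee's crawling context (only `request.url` and `log.error` are used) so the handler can be exercised without launching a browser.

```javascript
// Hypothetical shared handler: both errorHandler and failedRequestHandler
// receive (context, error), so the same function can serve either option.
async function logFailure({ request, log }, error) {
  // Read the fields directly instead of JSON.stringify(error).
  const details = `${request.url}: ${error.message}`;
  log.error(`Request failed: ${details}`);
  return details;
}

// Stub context standing in for Crawlee's crawling context (assumption:
// only request.url and log.error are needed by this handler).
const logged = [];
const stubContext = {
  request: { url: "https://example.com" },
  log: { error: (msg) => logged.push(msg) },
};

logFailure(stubContext, new Error("page.goto: SSL_ERROR_BAD_CERT_DOMAIN"))
  .then((details) => console.log(details));
```

With Crawlee, the function would be wired in as `errorHandler: logFailure` and/or `failedRequestHandler: logFailure` in the `PlaywrightCrawler` options. errorHandler runs on each failed attempt before a retry, while failedRequestHandler runs only once the retries are exhausted, which matches the "reached maximum retries" log line above.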