Eric · 3w ago

Whole crawler dies because "failed to lookup address information: Name or service not known"

I am not able to reproduce it in a simple example (it may be a transient error), but I have gotten this error regularly and it kills the crawler completely.
Traceback:
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1366, in __run_task_function
    if not (await self._is_allowed_based_on_robots_txt_file(request.url)):
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1566, in _is_allowed_based_on_robots_txt_file
    robots_txt_file = await self._get_robots_txt_file_for_url(url)
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1589, in _get_robots_txt_file_for_url
    robots_txt_file = await self._find_txt_file_for_url(url)
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1599, in _find_txt_file_for_url
    return await RobotsTxtFile.find(url, self._http_client)
  File "crawlee/_utils/robots.py", line 48, in find
    return await cls.load(str(robots_url), http_client, proxy_info)
  File "crawlee/_utils/robots.py", line 59, in load
    response = await http_client.send_request(url, proxy_info=proxy_info)
  File "crawlee/http_clients/_impit.py", line 167, in send_request
    response = await client.request(
impit.ConnectError: Failed to connect to the server.
Reason: hyper_util::client::legacy::Error(
    Connect,
    ConnectError(
        "dns error",
        Custom {
            kind: Uncategorized,
            error: "failed to lookup address information: Name or service not known",
        },
    ),
)
exited with code 1
This is my crawler:
crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    playwright_crawler_specific_kwargs={
        "browser_type": "firefox",
        "headless": True,
    },
    max_session_rotations=10,
    retry_on_blocked=True,
    max_request_retries=5,
    keep_alive=True,
    respect_robots_txt_file=True,
)
I am on version 1.0.4 and I was crawling crawlee.dev (though it doesn't fail on a specific page).
5 Replies
Eric (OP) · 3w ago
I think it is related to the new release because I had not seen this error before upgrading to 1.0.4 (from 1.0.3)
Exp · 3w ago
This error specifically shows up while Crawlee tries to download the robots.txt file. You can try setting
respect_robots_txt_file=False
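For example, applied to the config from above (a minimal sketch; the import path is the one used by recent crawlee versions and may differ on yours):

# Workaround sketch: same setup as above, but with robots.txt handling turned
# off so a failed robots.txt lookup cannot crash the run.
from crawlee.crawlers import AdaptivePlaywrightCrawler  # import path assumed

crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    playwright_crawler_specific_kwargs={
        "browser_type": "firefox",
        "headless": True,
    },
    max_session_rotations=10,
    retry_on_blocked=True,
    max_request_retries=5,
    keep_alive=True,
    respect_robots_txt_file=False,  # disable until the robots.txt fix ships
)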
Eric (OP) · 3w ago
Thanks, I tried it and you are right, the error doesn't appear. I would like to respect robots.txt, though...
Mantisus · 3w ago
Thank you for bringing this to our attention. This is a bug, and we will aim to fix it in the next release.
Vlada Dusek · 3w ago
GitHub: fix: Improve error handling for RobotsTxtFile.load by Mantisus ·...
Description: This PR adds error handling for RobotsTxtFile.load. This prevents crawler failures related to network errors, DNS errors for non-existent domains (e.g., https://placeholder.com/), or u...
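The idea of the fix, roughly, is to catch network-level errors while fetching robots.txt and fall back to an allow-all policy instead of letting the exception propagate and kill the crawler. Below is a standalone sketch of that approach, not the actual PR code; httpx and urllib.robotparser appear here purely for illustration, while the real change lives inside crawlee's RobotsTxtFile.load and uses crawlee's own HTTP client and parser.

# Standalone illustration of the approach: treat robots.txt fetch failures
# (DNS errors, connection failures, timeouts) as "allow everything" rather
# than raising and aborting the whole crawl.
import urllib.robotparser

import httpx


async def fetch_robots_or_allow_all(robots_url: str) -> urllib.robotparser.RobotFileParser:
    parser = urllib.robotparser.RobotFileParser(robots_url)
    try:
        async with httpx.AsyncClient(timeout=10) as client:
            response = await client.get(robots_url)
        if response.status_code >= 400:
            # Missing or inaccessible robots.txt: nothing is disallowed.
            parser.allow_all = True
        else:
            parser.parse(response.text.splitlines())
    except httpx.HTTPError:
        # Network/DNS failures must not kill the crawl either.
        parser.allow_all = True
    return parser

The fallback behavior in the except branch is the relevant part: the crawler keeps running and simply treats the site as unrestricted when robots.txt cannot be fetched.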
