Whole crawler dies because "failed to lookup address information: Name or service not known"
I am not able to reproduce it in a simple example (it may be a transient error), but I have gotten this error regularly and it kills the crawler completely.
I am on version 1.0.4 and I was crawling crawlee.dev (though it doesn't fail on a specific page).
Traceback:
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1366, in __run_task_function
    if not (await self._is_allowed_based_on_robots_txt_file(request.url)):
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1566, in _is_allowed_based_on_robots_txt_file
    robots_txt_file = await self._get_robots_txt_file_for_url(url)
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1589, in _get_robots_txt_file_for_url
    robots_txt_file = await self._find_txt_file_for_url(url)
  File "crawlee/crawlers/_basic/_basic_crawler.py", line 1599, in _find_txt_file_for_url
    return await RobotsTxtFile.find(url, self._http_client)
  File "crawlee/_utils/robots.py", line 48, in find
    return await cls.load(str(robots_url), http_client, proxy_info)
  File "crawlee/_utils/robots.py", line 59, in load
    response = await http_client.send_request(url, proxy_info=proxy_info)
  File "crawlee/http_clients/_impit.py", line 167, in send_request
    response = await client.request(
impit.ConnectError: Failed to connect to the server.
Reason: hyper_util::client::legacy::Error(
    Connect,
    ConnectError(
        "dns error",
        Custom {
            kind: Uncategorized,
            error: "failed to lookup address information: Name or service not known",
        },
    ),
)
exited with code 1

This is my crawler:
crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    playwright_crawler_specific_kwargs={
        "browser_type": "firefox",
        "headless": True,
    },
    max_session_rotations=10,
    retry_on_blocked=True,
    max_request_retries=5,
    keep_alive=True,
    respect_robots_txt_file=True,
)
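For now I'm working around it by retrying the whole run when a transient connection error escapes. This is only a sketch of that idea: the `ConnectError` class below is a stand-in for `impit.ConnectError` from the traceback (so the example is self-contained), `run_with_retries` is my own helper, and the attempt/backoff values are arbitrary — none of this comes from the crawlee API.

```python
import asyncio

# Stand-in for impit.ConnectError so this sketch runs without crawlee/impit.
class ConnectError(Exception):
    pass

async def run_with_retries(run, *, attempts=3, backoff=1.0):
    """Call an async zero-arg callable, retrying on transient connect errors.

    Re-raises the error only after the final attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await run()
        except ConnectError:
            if attempt == attempts:
                raise
            # Linear backoff before the next attempt.
            await asyncio.sleep(backoff * attempt)

# Demo: a flaky task that fails once with a DNS-style error, then succeeds.
calls = 0

async def flaky():
    global calls
    calls += 1
    if calls < 2:
        raise ConnectError("dns error")
    return "ok"

result = asyncio.run(run_with_retries(flaky, backoff=0))
print(result)  # → ok
```

In the real crawler the same loop would wrap `await crawler.run(...)` and catch `impit.ConnectError` instead of the stand-in class.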