Miro
Apify & Crawlee · 9mo ago
4 replies

enqueue_links does not find any links

Hello, I've run into a weird issue: enqueue_links does not find any links on one webpage, https://nanlab.tech, no matter which strategy I choose. I also tried extract_links. It finds all links with strategy all, but it extracts nothing with same-origin or same-hostname, and it raises an error with same-domain. I am using Crawlee for Python 0.6.10 (the latest version) with Playwright for scraping. Any idea what might be the issue?

Here is the handler:
@self.crawler.router.default_handler
async def request_handler(context: PlaywrightCrawlingContext) -> None:  # type: ignore
    text = await context.page.content()
    self._data[context.request.url.strip()] = {
        "html": text,
        "timestamp": (
            datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        ),
    }

    await asyncio.sleep(self._sleep_between_requests)
    links = await context.extract_links()
    print("---------------------------------------------------", len(links), links)
    await context.enqueue_links(exclude=[self._blocked_extensions])

I am also setting max_requests to 100 and max_crawl_depth to 2 when creating the crawler.
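
For anyone debugging something similar, here is a rough, standard-library-only sketch of how the four strategies named above are generally understood to compare a link with the page it was found on. This is not Crawlee's implementation: the helper names (matches_strategy, _origin, _naive_domain) are made up for illustration, the same-domain check is a naive "last two labels" approximation rather than a public-suffix lookup, and the www link is a hypothetical example, not something taken from the actual page.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def _origin(url: str) -> tuple[str, str, int | None]:
    """Return (scheme, hostname, port), the parts that make up an origin."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname or "", parts.port)


def _naive_domain(url: str) -> str:
    """ASSUMPTION: the last two host labels approximate the registrable domain.

    A real implementation would consult the public suffix list
    (this shortcut is wrong for hosts such as example.co.uk).
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def matches_strategy(page_url: str, link_url: str, strategy: str) -> bool:
    """Hypothetical helper: would link_url pass the given strategy filter?"""
    if strategy == "all":
        return True
    if strategy == "same-origin":
        return _origin(page_url) == _origin(link_url)
    if strategy == "same-hostname":
        return urlsplit(page_url).hostname == urlsplit(link_url).hostname
    if strategy == "same-domain":
        return _naive_domain(page_url) == _naive_domain(link_url)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    page = "https://nanlab.tech"
    link = "https://www.nanlab.tech/about"  # hypothetical link, for illustration only
    for name in ("all", "same-origin", "same-hostname", "same-domain"):
        print(name, matches_strategy(page, link, name))
```

Running a few of the links printed by extract_links through a check like this, against the URL the crawler was started with, may show whether they differ from it in scheme, hostname, or subdomain, which would explain why only strategy all returns anything.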