embarrassing-maroon
Apify & Crawlee • 3y ago
6 replies

Keeping track of the parent page with PlaywrightCrawler

Hi! I'm using Crawlee as an e2e test for broken links and generated diagrams in our documentation website. So far it's been successful; the only thing I'm missing is a way to tell which page actually contained the broken link.

For example, this is the snippet I use to find pages that display the 404 message:

async requestHandler({ request, page, enqueueLinks, log }) {
  // check if Docusaurus handled 404
  const isDocusaurus404 = await page
    .locator(".terminal-body")
    .getByText("404")
    .count();

  if (isDocusaurus404) {
    console.log({ url: page.url() });
  }

  await enqueueLinks();
},


This will log the actual URL that does not exist, but I can't tell which page contained that URL. What's the easiest way to find this information? Sort of like a History API? Thanks!
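One possible approach, sketched here rather than confirmed for this exact setup, is to tag each enqueued request with the page it was found on using the `userData` option of `enqueueLinks`, then read it back from `request.userData` when the 404 is detected. The key name `parentUrl` is made up for this example, and the handler is written as a standalone function so it can be passed to `PlaywrightCrawler` as `requestHandler`.

```javascript
// Sketch: remember which page each link was found on via request.userData.
// `parentUrl` is a hypothetical key name chosen for this example.
async function requestHandler({ request, page, enqueueLinks, log }) {
  // check if Docusaurus handled 404 (count() returns the number of matches)
  const docusaurus404Count = await page
    .locator(".terminal-body")
    .getByText("404")
    .count();

  if (docusaurus404Count > 0) {
    // parentUrl is undefined for start URLs, which have no referring page
    log.info("Broken link found", {
      url: page.url(),
      parentUrl: request.userData.parentUrl,
    });
  }

  // Tag every request enqueued from this page with the URL of this page.
  await enqueueLinks({
    userData: { parentUrl: request.loadedUrl ?? request.url },
  });
}

// Assumed wiring: new PlaywrightCrawler({ requestHandler })
```

One caveat: Crawlee deduplicates requests by URL, so if several pages link to the same broken URL, only the first page that enqueued it would show up as `parentUrl`.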