ambitious-aqua · 3y ago

How can I skip .pdf files in PuppeteerCrawler?

I want to exclude all .pdf and .docx files from crawling.
1 Reply
metropolitan-bronze · 3y ago
See playwrightUtils.blockRequests:

"Forces the Playwright browser tab to block loading URLs that match a provided pattern. This is useful to speed up crawling of websites, since it reduces the amount of data that needs to be downloaded from the web, but it may break some websites or unexpectedly prevent loading of resources.

By default, the function will block all URLs including the following patterns: [".css", ".jpg", ".jpeg", ".png", ".svg", ".gif", ".woff", ".pdf", ".zip"]"

https://crawlee.dev/api/playwright-crawler/namespace/playwrightUtils#blockRequests
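A couple of caveats on the above: the default pattern list quoted there includes ".pdf" but not ".docx", and blockRequests concerns URLs loaded by the browser tab rather than which links get queued. Since the question is about PuppeteerCrawler, note that Crawlee also exposes puppeteerUtils.blockRequests. If the goal is to keep document links out of the queue entirely, a plain URL filter may be enough. Below is a minimal sketch of such a filter; `shouldSkipUrl` and `SKIPPED_EXTENSIONS` are hypothetical names made up for this example, not part of Crawlee.

```javascript
// Hypothetical helper (not part of Crawlee): decides whether a URL points
// at a document file that the crawler should not visit.
const SKIPPED_EXTENSIONS = ['.pdf', '.docx'];

function shouldSkipUrl(url, extensions = SKIPPED_EXTENSIONS) {
    let pathname;
    try {
        // Look only at the path so query strings and fragments are ignored.
        pathname = new URL(url).pathname.toLowerCase();
    } catch {
        // Not an absolute URL; do not skip it here.
        return false;
    }
    return extensions.some((ext) => pathname.endsWith(ext));
}

console.log(shouldSkipUrl('https://example.com/files/report.PDF?download=1')); // true
console.log(shouldSkipUrl('https://example.com/docs/')); // false
```

One way to wire this in, assuming links are queued with enqueueLinks, is to call the helper from its transformRequestFunction option and return false for URLs that should be skipped, otherwise return the request unchanged.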
