Apify Discord Mirror

Updated 5 months ago

Blocking certain requests

At a glance

The community member is trying to block certain requests in Puppeteer, but the initial approach does not seem to work. Another community member suggests an alternative approach using Playwright, which involves intercepting and aborting certain resource types. However, the community member notes that the Playwright approach does not work with Puppeteer.

The community member then provides a revised solution for Puppeteer, which involves setting request interception and aborting certain resource types. This approach is reported to work well, and the community member also mentions using it in conjunction with a Puppeteer ad-blocker. The community members express gratitude for the solution.

There is no explicitly marked answer, but the community members provide a working solution for blocking certain requests in Puppeteer.

I'm trying to block some requests in Puppeteer, but it doesn't seem to work when I run the script in headed mode:
Plain Text
const blockedResourceTypes = ['webp', 'svg', 'mp4', 'jpeg', 'gif', 'avif', 'font']
const crawler = new PuppeteerCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            devtools: true,
            defaultViewport:{ width: 1920, height: 6000 },
            args: [
                '--disable-dev-shm-usage',
            ]
        },
        useIncognitoPages: true,
    },
    proxyConfiguration,
    requestHandler: router,
    maxConcurrency: 16,
    maxRequestRetries: 15,
    maxRequestsPerMinute: 2,
    navigationTimeoutSecs: 120,
    useSessionPool: true,
    failedRequestHandler({ request }) {
        log.debug(`Request ${request.url} failed 15 times.`);
    },

    preNavigationHooks: [
        async ({ addInterceptRequestHandler }) => {
            await addInterceptRequestHandler((request) => {
                if (blockedResourceTypes.includes(request.resourceType())) {
                    return request.respond({
                        status: 200,
                        body: 'useless shit',
                    });
                }
                return request.continue();
            });
        },
    ],
});


Any ideas?

I mostly go for something like this (it should be almost identical for JS and Puppeteer):
Plain Text
const abortAssets: PlaywrightHook = async ({ page }) => {
    const RESOURCE_EXCLUSIONS = ['image', 'media', 'font', 'stylesheet'];
    await page.route('**/*', (route) => {
        if (RESOURCE_EXCLUSIONS.includes(route.request().resourceType())) {
            return route.abort();
        }
        return route.continue();
    });
};


const playwrightCrawler = new PlaywrightCrawler({
    // ...
    preNavigationHooks: [
        abortAssets,
    ],
    // ...
});
Routes certainly don't work with Puppeteer.

Rewrote it for Puppeteer/JS:

Plain Text
const abortAssets = async ({ page }) => {
    const RESOURCE_EXCLUSIONS = ['image', 'media', 'font', 'stylesheet'];
    await page.setRequestInterception(true);

    // page.on() registers the listener synchronously, so no await is needed.
    page.on('request', (request) => {
        if (RESOURCE_EXCLUSIONS.includes(request.resourceType())) {
            return request.abort();
        }
        return request.continue();
    });
};


const crawler = new PuppeteerCrawler({
    preNavigationHooks: [
        abortAssets,
    ],
    headless: false,
    proxyConfiguration,
    requestHandler: router,
});
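
A possible reason the original attempt blocked almost nothing: Puppeteer's `request.resourceType()` reports broad categories such as `image`, `media`, or `font`, not file formats like `webp` or `svg`, so only `font` in the original list could ever match. Below is a minimal sketch that combines both checks, assuming you also want to match by file extension. The helper name `shouldBlock` and the two lists are hypothetical and not part of Puppeteer or Crawlee.

```javascript
// Categories as reported by Puppeteer's request.resourceType().
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'media', 'font']);
// File extensions to block regardless of category (illustrative list).
const BLOCKED_EXTENSIONS = new Set(['webp', 'svg', 'mp4', 'jpeg', 'gif', 'avif']);

// Hypothetical helper: returns true when a request should be aborted.
function shouldBlock(resourceType, url) {
    if (BLOCKED_RESOURCE_TYPES.has(resourceType)) return true;
    let pathname;
    try {
        // Use only the path so query strings do not hide the extension.
        pathname = new URL(url).pathname;
    } catch {
        return false; // not a parseable absolute URL, let it through
    }
    const dot = pathname.lastIndexOf('.');
    if (dot === -1) return false;
    return BLOCKED_EXTENSIONS.has(pathname.slice(dot + 1).toLowerCase());
}

// Pre-navigation hook in the same shape as the Puppeteer version above.
const abortAssets = async ({ page }) => {
    await page.setRequestInterception(true);
    page.on('request', (request) => {
        if (shouldBlock(request.resourceType(), request.url())) {
            return request.abort();
        }
        return request.continue();
    });
};
```

The hook is passed to `preNavigationHooks` exactly like `abortAssets` in the crawler config above. Keeping the decision in a plain function makes it easy to check without launching a browser.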


Actually, I found a decent article about it on Google: https://www.scrapingbee.com/blog/block-requests-puppeteer/
Thanks, will try immediately.
Works absolutely great, also in conjunction with the Puppeteer ad-blocker.
Thank you very much!
Can I ask another thing here?
If it is a different topic, please create another thread 🙂
new thread created