old-apricot
old-apricot•4y ago

page.setRequestInterception(true)

How can I use page.setRequestInterception(true) in PuppeteerCrawler (without using raw Puppeteer)?
9 Replies
Oleg V.
Oleg V.•4y ago
preNavigationHooks is the right place for it: https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks Example:
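preNavigationHooks: [async ({ page }, gotoOptions) => {
    gotoOptions.waitUntil = 'domcontentloaded';
    // Enable interception before the navigation starts.
    await page.setRequestInterception(true);

    page.on('request', async (req) => {
        if (req.url().includes('something-you-re-looking-for')) {
            // your logic
        }
        await req.continue(); // if you don't call this, the request will hang indefinitely
    });
    // Leave interception enabled: this hook runs before the page navigates, so
    // switching it off here would disable it for the navigation itself and
    // req.continue() would fail.
}],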
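If the "your logic" part is about blocking some requests, here is a minimal sketch of how the listener could be factored out. `makeInterceptor` and `shouldBlock` are hypothetical names made up for illustration, not part of Puppeteer or Crawlee, and the sketch assumes a Puppeteer version that has `req.isInterceptResolutionHandled()`:

```javascript
// Hypothetical helper: builds a 'request' listener for a page that has
// request interception enabled. `shouldBlock` is an assumed predicate that
// receives the Puppeteer request and returns true when it should be aborted.
function makeInterceptor(shouldBlock) {
    return async function onRequest(req) {
        // Another listener may already have continued or aborted this request.
        if (req.isInterceptResolutionHandled()) return;

        if (shouldBlock(req)) {
            await req.abort();
        } else {
            // Every intercepted request must be resolved, otherwise it hangs.
            await req.continue();
        }
    };
}

// Wiring it into the crawler options (sketch only, not executed here):
//
// preNavigationHooks: [async ({ page }) => {
//     await page.setRequestInterception(true);
//     page.on('request', makeInterceptor((req) => req.resourceType() === 'image'));
// }],
```

Blocking images is only an example predicate; any check on `req.url()` or `req.resourceType()` should work the same way.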
old-apricot
old-apricotOP•4y ago
thank you
Alexey Udovydchenko
Alexey Udovydchenko•4y ago
You're enabling HTTP request tracking, so after that you need page.on('request') (or 'response'). If you're not adding your own logic to process requests or responses, then it makes no sense to enable interception, right? 😉
old-apricot
old-apricotOP•4y ago
I want to get the Ajax response that is triggered when the main URL is requested
genetic-orange
genetic-orange•4y ago
@songoku is the Ajax request made on the page?
old-apricot
old-apricotOP•4y ago
yes
genetic-orange
genetic-orange•4y ago
Within your preNavigationHooks, you can add a function that looks like this. It will listen for responses:
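async ({ page }) => {
    page.on('response', function handleResponse(res) {
        if (res.url().includes('foo')) {
            // do something

            // Stop listening only once the response we want has been seen;
            // removing the listener outside this check would drop it after
            // the very first response, matching or not.
            page.off('response', handleResponse);
        }
    });
}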
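If more than one matching response is expected, a rough sketch of a collector could look like this. `collectResponses` and the `collectors` WeakMap in the comments are hypothetical glue invented for illustration, not Crawlee or Puppeteer APIs; the only page methods it relies on are `page.on` and `page.off`:

```javascript
// Hypothetical helper: records every response whose URL passes `matches`
// and returns a handle for reading them and detaching the listener.
function collectResponses(page, matches) {
    const captured = [];

    const onResponse = (res) => {
        if (matches(res.url())) captured.push(res);
    };
    page.on('response', onResponse);

    return {
        captured,
        // Detach the listener once the responses are no longer needed.
        stop: () => page.off('response', onResponse),
    };
}

// Wiring it into the crawler options (sketch only, not executed here):
//
// const collectors = new WeakMap(); // keyed by page, assumed glue code
//
// preNavigationHooks: [async ({ page }) => {
//     collectors.set(page, collectResponses(page, (url) => url.includes('foo')));
// }],
// requestHandler: async ({ page }) => {
//     const { captured, stop } = collectors.get(page);
//     stop();
//     for (const res of captured) console.log(await res.json());
// },
```

This assumes the Ajax call has already completed by the time the request handler runs; if it fires later, the handler would need to wait for it before calling `stop()`.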
Lukas Krivka
Lukas Krivka•4y ago
Yes, you don't need request interception at all. You only care about responses. You can also scrape the Ajax endpoint directly, as explained here: https://developers.apify.com/academy/api-scraping