ugly-tan (17mo ago)

PuppeteerCrawler waitForResponse times out; it seems to skip the desired request

I'm trying to get the data from an AJAX POST call (GraphQL) on a webpage, but it doesn't seem to work. I ran the crawler in headful mode with the Network tab open: the request is made and the response is there, but waitForResponse never catches it. Here's my code:
```javascript
// Apify SDK v3: the crawler classes and `log` come from the `crawlee` package.
import { PuppeteerCrawler, log } from 'crawlee';

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestQueue,
    maxRequestRetries: 5,
    navigationTimeoutSecs: 180,
    requestHandlerTimeoutSecs: 180,
    async requestHandler({ request, page }) {
        // ...
        log.warning('GraphQL starting to wait');

        await page.waitForNetworkIdle();

        log.warning('IDLE!!!');

        // waitForRequest only matches requests issued after this call. No timeout is
        // passed, so Puppeteer's 30 s default applies (the 30000ms timeout in the logs).
        await page.waitForRequest(
            (req) => req.url().includes(URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH),
        );

        log.warning('GraphQL request is done');

        const response = await page.waitForResponse(
            (httpResponse) => httpResponse.status() === 200
                && httpResponse.url().includes(URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH),
            { timeout: 180 * 1000 },
        );

        log.warning('GraphQL response arrived');

        const data = await response.json();
        // ...
    },
});
```
As you can see, I also added waitForNetworkIdle for testing, and it finishes before waitForResponse, which is strange. See the logs:
```
INFO Page opened. {"label":"vehicle","url":"https://www.autotrader.co.uk/car-details/202307270142806?sort=relevance&advertising-location=at_cars&make=Audi&model=A2&postcode=PO16%207GZ&fromsra"}
WARN GraphQL starting to wait
WARN IDLE!!!
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Timed out after waiting 30000ms
INFO Page opened. {"label":"vehicle","url":"https://www.autotrader.co.uk/car-details/202307270142806?sort=relevance&advertising-location=at_cars&make=Audi&model=A2&postcode=PO16%207GZ&fromsra"}
WARN GraphQL starting to wait
WARN IDLE!!!
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. Timed out after waiting 30000ms
```
Maybe I'm missing something? By the way, the code was written for Apify SDK version 1 and was working fine. After upgrading to v3 it either stopped working or runs really, really slowly.
8 Replies
memo23 (17mo ago)
I would use the page.on('response') event and add a condition for that particular URL. If you keep struggling, DM me.
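A minimal sketch of the page.on('response') idea, assuming a Puppeteer `Page` (an EventEmitter with `on`/`off`). `captureResponse` and its `urlPart`/`timeoutMs` parameters are hypothetical names; `urlPart` plays the role of `URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH`. The listener still has to be registered before the GraphQL call fires, so in a crawler it would belong in a pre-navigation hook rather than in `requestHandler`.

```javascript
// Hypothetical helper: resolves with the first 200 response whose URL contains `urlPart`.
// `page` is assumed to be a Puppeteer Page (or anything with on/off for 'response').
function captureResponse(page, urlPart, timeoutMs = 60_000) {
    return new Promise((resolve, reject) => {
        const onResponse = (response) => {
            // Ignore every response except the one we care about.
            if (response.status() === 200 && response.url().includes(urlPart)) {
                cleanup();
                resolve(response);
            }
        };
        const timer = setTimeout(() => {
            cleanup();
            reject(new Error(`No response matching "${urlPart}" within ${timeoutMs} ms`));
        }, timeoutMs);
        // Stop the timer and detach the listener once we are done either way.
        function cleanup() {
            clearTimeout(timer);
            page.off('response', onResponse);
        }
        // The listener is attached immediately, so it only sees responses from now on.
        page.on('response', onResponse);
    });
}
```

The returned promise rejects on timeout, so whoever stores it (for example on the crawling context) should eventually await it or attach a `.catch()`.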
ugly-tan (OP, 17mo ago)
@memo23
ugly-tan (OP, 17mo ago)
https://stackoverflow.com/questions/77397585/how-to-wait-for-specific-ajax-request-in-puppeteer-crawler A few months ago I was fixing this exact scraper and ran into the same issue. I was able to solve it with waitForResponse, and it worked fine with Apify SDK v1. Now, with Apify SDK v3, it doesn't work as expected.
(Stack Overflow: "How to wait for specific AJAX request in Puppeteer crawler")
memo23 (17mo ago)
DM me.
Alexey Udovydchenko
You need to start waiting for the response in preNavigationHooks, as described here: https://docs.apify.com/academy/node-js/how_to_fix_target-closed#page-closed-solution
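To illustrate why the hook placement matters: like Puppeteer's `page.waitForResponse`, the toy page below only sees responses that arrive after the wait has started. `ToyPage`, its simulated `goto`, and the `/graphql` URL are hypothetical stand-ins (not the Puppeteer API), so the sketch runs without a browser. Waiting after navigation, as in the original `requestHandler`, misses the response and times out; starting the wait first, as a pre-navigation hook does, catches it.

```javascript
// Toy stand-in for a Puppeteer page (NOT the real API). waitForResponse only
// sees responses emitted after it was called.
class ToyPage {
    constructor() { this.listeners = new Set(); }

    waitForResponse(predicate, { timeout = 30_000 } = {}) {
        return new Promise((resolve, reject) => {
            const listener = (res) => {
                if (predicate(res)) {
                    clearTimeout(timer);
                    this.listeners.delete(listener);
                    resolve(res);
                }
            };
            const timer = setTimeout(() => {
                this.listeners.delete(listener);
                reject(new Error(`Timed out after waiting ${timeout}ms`));
            }, timeout);
            this.listeners.add(listener);
        });
    }

    // Simulates page.goto(): the site fires its GraphQL call during navigation.
    async goto(url) {
        const res = { url: () => `${url}/graphql`, status: () => 200 };
        for (const listener of [...this.listeners]) listener(res);
    }
}

const isGraphQL = (res) => res.url().includes('/graphql');

// Wrong order (like the original requestHandler): navigate first, then wait.
async function waitAfterNavigation() {
    const page = new ToyPage();
    await page.goto('https://example.com');
    return page.waitForResponse(isGraphQL, { timeout: 100 }); // nothing left to catch
}

// Right order (like a preNavigationHook): start waiting, navigate, then await.
async function waitBeforeNavigation() {
    const page = new ToyPage();
    const responsePromise = page.waitForResponse(isGraphQL, { timeout: 100 });
    await page.goto('https://example.com');
    return responsePromise;
}
```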
ugly-tan (OP, 17mo ago)
Thank you very much, this is exactly what I was looking for 🥹 I've tried the above example, and in my case `context` is always undefined:
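```javascript
preNavigationHooks: [
    async ({ page, context }) => {
        // `context` is destructured from the hook's first argument (the crawling
        // context); the log below shows it is undefined, so the assignment throws.
        log.info('context', { context, type: typeof context });
        context.responsePromise = page
            .waitForResponse(`https://www.autotrader.co.uk${URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH}`)
            .catch((e) => e);
    },
],
```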
```
INFO context {"type":"undefined"}
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. TypeError: Cannot set properties of undefined (setting 'responsePromise')
INFO context {"type":"undefined"}
WARN PuppeteerCrawler: Reclaiming failed request back to the list or queue. TypeError: Cannot set properties of undefined (setting 'responsePromise')
```
ugly-tan (OP, 17mo ago)
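```javascript
preNavigationHooks: [
    async (context) => {
        // Runs before page.goto(), so the wait starts before the GraphQL call fires.
        // `context` here is the crawling context itself; properties set on it are
        // available in requestHandler.
        context.responsePromise = context.page
            .waitForResponse((httpResponse) => httpResponse.url().includes(URL_PROPERTIES_DICTIONARY.GRAPHQL_PATH))
            // Resolve with the error instead of rejecting, so a timeout is not an unhandled rejection.
            .catch((e) => e);
    },
],
async requestHandler({ request, page, responsePromise }) {
    // ...
```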
Basically, this is what I ended up doing, and it works. @Alexey Udovydchenko, thank you very much for sharing the right article.
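A hedged sketch of the consuming side. Because the hook uses `.catch((e) => e)`, `responsePromise` resolves with an `Error` (for example a timeout) instead of rejecting, so `requestHandler` should check for that before calling `response.json()`. `readGraphQLJson` is a hypothetical helper name; `status()` and `json()` are the Puppeteer `HTTPResponse` methods already used in the question.

```javascript
// Hypothetical helper for requestHandler: unwraps the promise stored by the hook.
async function readGraphQLJson(responsePromise) {
    const response = await responsePromise;
    // The hook's .catch((e) => e) turns a timeout into a resolved Error value.
    if (response instanceof Error) {
        // Re-throw so the crawler treats this attempt as failed.
        throw response;
    }
    if (response.status() !== 200) {
        throw new Error(`GraphQL responded with HTTP ${response.status()}`);
    }
    return response.json();
}
```

Inside `requestHandler({ request, page, responsePromise })` it would be called as `const data = await readGraphQLJson(responsePromise);`. Throwing lets the crawler retry the request up to `maxRequestRetries` instead of silently continuing with an `Error` object.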
