Apify Discord Mirror

Updated 5 months ago

infinite scrolling

At a glance

The community member is trying to implement infinite scrolling to scrape all products while the page is being scrolled down. They have provided code that scrapes product URLs and adds them to a dataset, but they are having trouble with the infinite scrolling implementation.

In the comments, other community members suggest the following solutions:

1. Scroll up a bit after each scroll down so that all products render properly, using the scrollDownAndUp option of Crawlee's infiniteScroll utility.

2. Implement the infinite scrolling in different ways, such as waiting for the scroll to finish and then selecting all the products, adding the infinite scroll to a Promise.all or Promise.race to run another function alongside it, or using the stopScrollCallback option to collect products and stop the scroll when no more are found.
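The Promise.all idea from point 2 can be sketched with stubs. Everything below is illustrative: fakeInfiniteScroll stands in for Crawlee's playwrightUtils.infiniteScroll, the shared rendered array stands in for the live DOM, and collectWhileScrolling is a hypothetical collector, not a Crawlee API.

```typescript
// Shared stand-in for the live DOM: links appear here as the page "scrolls".
const rendered: string[] = [];
let scrolling = false;

// Stub scroller standing in for playwrightUtils.infiniteScroll: reveals one
// product link per step, then signals that scrolling has finished.
async function fakeInfiniteScroll(steps: number): Promise<void> {
    scrolling = true;
    for (let i = 0; i < steps; i++) {
        rendered.push(`/products/item-${i}`);
        await new Promise((resolve) => setTimeout(resolve, 5));
    }
    scrolling = false;
}

// Collector that runs alongside the scroll, picking up newly rendered links.
async function collectWhileScrolling(): Promise<string[]> {
    const seen = new Set<string>();
    while (scrolling) {
        for (const url of rendered) seen.add(url);
        await new Promise((resolve) => setTimeout(resolve, 5));
    }
    for (const url of rendered) seen.add(url); // final sweep after the scroll ends
    return [...seen];
}

// Promise.all keeps both running until the scroll completes.
async function main(): Promise<string[]> {
    const [, urls] = await Promise.all([fakeInfiniteScroll(10), collectWhileScrolling()]);
    return urls;
}

main().then((urls) => console.log(`collected ${urls.length} product links`));
```

In the real handler, the scroller slot of Promise.all would hold the infiniteScroll call and the collector would query the page for product links.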

The community members also discuss how to implement these solutions within the router, but there is no explicitly marked answer.

Useful resources
trying to get infinite scrolling to render in all products while scraping them as the page is being scrolled down
i looked at the documentation but didn't understand how to do this:
TypeScript
// kotnRouter, kotnPw, linksCount, Dataset and infiniteScroll are declared
// elsewhere in the actor.
kotnRouter.addHandler('KOTN_DETAIL', async ({ log, page, parseWithCheerio }) => {
    log.info(`Scraping product URLs`);

    const $ = await parseWithCheerio();

    const productUrls: string[] = [];

    $('a').each((_, el) => {
        let productUrl = $(el).attr('href');
        if (productUrl) {
            // Turn relative links into absolute ones before filtering
            if (!productUrl.startsWith('https://')) {
                productUrl = 'https://www.kotn.com' + productUrl;
            }
            // Keep only product pages (the original nesting accidentally
            // skipped links that were already absolute)
            if (productUrl.includes('/products')) {
                productUrls.push(productUrl);
            }
        }
    });

    // Push unique URLs to the dataset
    const uniqueProductUrls = Array.from(new Set(productUrls));

    await Dataset.pushData({
        urls: uniqueProductUrls,
    });

    // addRequests accepts an array, so a single call replaces the per-link Promise.all
    await kotnPw.addRequests(uniqueProductUrls.map((link) => ({ url: link, label: 'KOTN_PRODUCT' })));

    linksCount += uniqueProductUrls.length;

    await infiniteScroll(page, {
        maxScrollHeight: 0,
    });

    console.log(uniqueProductUrls);
    console.log(`Total product links scraped so far: ${linksCount}`);
    // Run the kotnPw crawler once after pushing the first product requests
    if (linksCount === uniqueProductUrls.length) {
        await kotnPw.run();
    }
});
6 comments
i also want to make sure it scrolls up a little bit every time it scrolls fully down, so everything renders in properly
To make it scroll up a bit every time after it scrolls down, you can use this option:
https://crawlee.dev/api/3.1/playwright-crawler/namespace/playwrightUtils#scrollDownAndUp
For scraping the products, you can either:

Wait for the scroll to finish and then select all the products and add them to the queue.

Or

You can add the infiniteScroll to a Promise.all or Promise.race in order for it to keep scrolling while you run another function alongside it in the same Promise.all or Promise.race.

Or

You can run the infiniteScroll function, and inside the stopScrollCallback option, you can collect the products and stop it once you don't find more.
https://crawlee.dev/api/3.1/playwright-crawler/namespace/playwrightUtils#stopScrollCallback
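A minimal sketch of the stopScrollCallback pattern, with a stubbed scroller and fake page. FakePage, infiniteScrollStub, and collectAllProducts are all hypothetical stand-ins for Playwright's real page and Crawlee's playwrightUtils.infiniteScroll; only the callback contract (awaited between scroll steps, stops the scroll when it returns true) mirrors the documented option.

```typescript
// Stand-in for a lazy-loading page: each "scroll" reveals one more batch.
type FakePage = { batches: string[][]; rendered: string[] };

// Stub of the stopScrollCallback contract: after every scroll step the
// callback is awaited, and scrolling stops once it resolves to true.
async function infiniteScrollStub(
    page: FakePage,
    opts: { stopScrollCallback: () => Promise<boolean> },
): Promise<void> {
    for (const batch of page.batches) {
        page.rendered.push(...batch); // "render" the next batch
        if (await opts.stopScrollCallback()) return;
    }
    await opts.stopScrollCallback(); // final check once the page stops growing
}

// Collect product links between scrolls; stop when a step yields nothing new.
async function collectAllProducts(page: FakePage): Promise<string[]> {
    const seen = new Set<string>();
    await infiniteScrollStub(page, {
        stopScrollCallback: async () => {
            const before = seen.size;
            for (const url of page.rendered) {
                if (url.includes('/products')) seen.add(url);
            }
            return seen.size === before; // no new products => stop scrolling
        },
    });
    return [...seen];
}

const demoPage: FakePage = {
    batches: [
        ['/products/a', '/products/b'],
        ['/products/b', '/products/c'],
        ['/about'],
    ],
    rendered: [],
};

// logs the three unique /products URLs
collectAllProducts(demoPage).then((urls) => console.log(urls.sort()));
```

In the real handler, the callback body would instead query the page (or parseWithCheerio) for product links and push any new ones to the request queue.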
how do you implement this in the router? do you write it under a playwrightUtils class, or what do you do
Hey, you can either use the context-aware method from the handler's context object, or you can use the method from playwrightUtils/puppeteerUtils, which needs the page object as an argument.