Apify Discord Mirror

Updated 2 years ago

Retry using the browser

At a glance

The post asks how to scrape with CheerioCrawler first and, if the response is 401 or 403, retry the request with PuppeteerCrawler. Community members suggest the following:

1. Push the failed requests to an array and then run the PuppeteerCrawler after.

2. Use a single PuppeteerCrawler: send a plain HTTP request first (skipNavigation) and, if the response is not 200, add the request back to the crawler so that it is loaded in the browser.

3. The best practice is to throw an error and the crawler will retry the whole request.

There is no explicitly marked answer in the comments.

How can I make it first try to scrape with CheerioCrawler and, if the response is 403 or 401, try again with PuppeteerCrawler?
5 comments
The easiest way is probably to push the failed requests to an array on the side and then run the PuppeteerCrawler afterwards. You can have more than one crawler in a single script.
How do I push the failed requests?
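A minimal sketch of the "array on the side" idea, written without any Crawlee calls so that it runs on its own. The names `crawlWithFallback`, `httpFetch`, and `browserFetch` are hypothetical and not from the thread: `httpFetch` stands in for the CheerioCrawler pass and `browserFetch` for the PuppeteerCrawler pass. In a real script, the first loop would presumably live in the CheerioCrawler handlers, and the collected URLs would be passed to the second crawler's `run()`.

```javascript
// Status codes treated as "blocked" in this thread.
const BLOCKED_STATUS_CODES = new Set([401, 403]);

// Hypothetical helper: httpFetch and browserFetch are injected async functions
// that resolve to { statusCode, data } for a given URL.
async function crawlWithFallback(urls, httpFetch, browserFetch) {
  const results = [];
  const failedUrls = []; // the "array on the side"

  // First pass: cheap HTTP requests only.
  for (const url of urls) {
    const { statusCode, data } = await httpFetch(url);
    if (BLOCKED_STATUS_CODES.has(statusCode)) {
      failedUrls.push(url); // remember it for the browser pass
    } else {
      results.push({ url, via: 'http', statusCode, data });
    }
  }

  // Second pass: only the URLs that were blocked go through the browser.
  for (const url of failedUrls) {
    const { statusCode, data } = await browserFetch(url);
    results.push({ url, via: 'browser', statusCode, data });
  }

  return { results, failedUrls };
}
```

The point of the split is that the browser is started only for the URLs that actually need it, while everything else stays on the cheaper HTTP path.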
How do you like the idea of doing this?
import { PuppeteerCrawler, Dataset } from 'crawlee';
import * as cheerio from 'cheerio';

const crawler = new PuppeteerCrawler({
  async requestHandler({ request, sendRequest, parseWithCheerio }) {
    if (request.skipNavigation) {
      // First attempt: plain HTTP request, no browser navigation.
      const { statusCode, body } = await sendRequest();
      if (statusCode === 200) {
        const $ = cheerio.load(body);
        const title = $('h1').text();
        await Dataset.pushData({ title, url: request.url });
      } else {
        // Any non-200 response (including 401/403): re-add the URL without
        // skipNavigation so that the browser loads it. useExtendedUniqueKey is
        // used here so the re-added request is not dropped as a duplicate.
        // Maybe there is a keepDuplicateUrls option 🤔
        await crawler.addRequests([{ url: request.url, useExtendedUniqueKey: true }]);
      }
    } else {
      // Second attempt: the page was loaded in the browser.
      const $ = await parseWithCheerio();
      const title = $('h1').text();
      await Dataset.pushData({ title, url: request.url });
    }
  },
});

await crawler.run([{ url: 'https://nowsecure.nl', skipNavigation: true }]);
The best practice is to just throw an error; the crawler will then retry the whole request.
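A minimal sketch of the "throw an error" suggestion, assuming the status code is available inside the request handler. `throwIfBlocked` is a hypothetical helper name, not a Crawlee API; the idea is only that an error thrown from the handler marks the request as failed, so the crawler's retry logic picks it up.

```javascript
// Hypothetical helper: call it in a request handler before parsing the page.
// Throws for the "blocked" status codes discussed in this thread (401 and 403).
function throwIfBlocked(statusCode, url) {
  if (statusCode === 401 || statusCode === 403) {
    throw new Error(`Blocked with status ${statusCode} for ${url}`);
  }
}
```

On its own, a retry runs the same handler again, so any switch to the browser still has to be decided inside the handler or by re-adding the request as in the sample above.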