NeoNomade · 7mo ago

CheerioCrawler headerGenerator help

Hello! I've been reading the docs but couldn't find clear information about this. With Puppeteer or Playwright, we can tweak the fingerprintGenerator in browserPool. For Cheerio, we have the headerGenerator from got. How can we adjust it inside the CheerioCrawler?
3 Replies
Hall · 7mo ago
Someone will reply to you shortly. In the meantime, this might help:
Louis Deconinck · 7mo ago
Here's an example of how to work with headerGeneratorOptions using the BasicCrawler. I would assume it works the same way for the CheerioCrawler: https://crawlee.dev/docs/next/guides/got-scraping#useheadergenerator
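A rough sketch of how that approach might carry over to CheerioCrawler: set `headerGeneratorOptions` on the got options inside a pre-navigation hook. The `useHeaderGenerator` and `headerGeneratorOptions` option names come from got-scraping. The `GotOptionsLike` type and the `withHeaderGenerator` helper are hypothetical stand-ins so the snippet runs on its own, and whether CheerioCrawler forwards these options unchanged is an assumption worth verifying against your Crawlee version.

```typescript
// Hypothetical, trimmed-down stand-ins for got-scraping's option types.
interface HeaderGeneratorOptionsLike {
    browsers?: Array<{ name: string; minVersion?: number; maxVersion?: number }>;
    devices?: string[];
    locales?: string[];
    operatingSystems?: string[];
}

interface GotOptionsLike {
    headers?: Record<string, string | undefined>;
    useHeaderGenerator?: boolean;
    headerGeneratorOptions?: HeaderGeneratorOptionsLike;
}

// Hypothetical helper: returns a hook that enables the header generator and
// merges the given generator options into the got options before a request.
function withHeaderGenerator(generatorOptions: HeaderGeneratorOptionsLike) {
    return (_crawlingContext: unknown, gotOptions: GotOptionsLike): void => {
        gotOptions.useHeaderGenerator = true;
        gotOptions.headerGeneratorOptions = {
            ...gotOptions.headerGeneratorOptions,
            ...generatorOptions,
        };
    };
}

// Example values only; pick the browsers, devices and locales you need.
const headerGeneratorHook = withHeaderGenerator({
    browsers: [{ name: 'firefox', minVersion: 100 }],
    devices: ['desktop'],
    locales: ['en-US'],
    operatingSystems: ['linux'],
});

// Assumed wiring (not executed here):
// new CheerioCrawler({ preNavigationHooks: [headerGeneratorHook], /* ... */ });
```

Keeping the logic in a small factory makes it easy to check in isolation before plugging it into a real crawler.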
optimistic-gold · 7mo ago
Hi! You could also try adding the following option to CheerioCrawler:
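// OptionsInit is a type exported by got-scraping
preNavigationHooks: [
    async (crawlingContext, opts: OptionsInit) => {
        // Keep the existing headers, then add or override your own
        opts.headers = {
            ...opts.headers,
            // your headers
        };
    },
],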
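To illustrate what the spread in that hook does, here is a minimal, self-contained sketch. `HeadersOnlyOptions` and `withExtraHeaders` are hypothetical names introduced for the example (a real hook would receive got-scraping's `OptionsInit`), and the header values are placeholders.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Hypothetical stand-in for the part of the got options that the hook touches.
interface HeadersOnlyOptions {
    headers?: HeaderMap;
}

// Hypothetical helper: builds a hook that merges extra headers over the
// existing ones. Keys listed later in the spread win on conflict.
function withExtraHeaders(extraHeaders: HeaderMap) {
    return (_crawlingContext: unknown, opts: HeadersOnlyOptions): void => {
        opts.headers = { ...opts.headers, ...extraHeaders };
    };
}

const addLanguageHeader = withExtraHeaders({
    'accept-language': 'en-US,en;q=0.9',
});

// Simulate what happens to the options object before a request.
const exampleOptions: HeadersOnlyOptions = {
    headers: { accept: 'text/html', 'accept-language': 'de-DE' },
};
addLanguageHeader({}, exampleOptions);

// 'accept' is untouched; 'accept-language' is now 'en-US,en;q=0.9'.
console.log(exampleOptions.headers);
```

Object spread is case-sensitive, so `'Accept-Language'` and `'accept-language'` would end up as two separate keys; it is safest to match the casing already present in the headers object.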
