sensitive-blue (3y ago)

How to add data to the SDK_CRAWLER_STATISTICS

I want to add some counts to the SDK_CRAWLER_STATISTICS JSON. I found a workaround, which is to write my own record like this: await KeyValueStore.setValue("statistics", statistics); but I would prefer to add the counts to the existing statistics.
1 Reply
metropolitan-bronze (3y ago)
You can add a property to the statistics state: https://crawlee.dev/api/core/class/Statistics#state For example, here I add a mydata count:
import {
    CheerioCrawler, // https://crawlee.dev/api/cheerio-crawler
    Configuration,
    sleep,
    log,
} from 'crawlee';

// Emit the persistState event every 10 seconds.
const config = new Configuration({
    persistStateIntervalMillis: 10_000,
});

const crawler = new CheerioCrawler({
    async requestHandler({ request, response, body, contentType, $ }) {
        if (request.url === 'https://www.example.com/1') {
            // Custom counter stored on the statistics state.
            this.stats.state.mydata += 5;
            await sleep(5_000);
            await crawler.addRequests(['https://www.example.com/2']);
        }
        if (request.url === 'https://www.example.com/2') {
            this.stats.state.mydata += 10;
            await sleep(10_000);
        }
    },
}, config);

// Initialise the custom counter before the run so that += starts from a number.
crawler.stats.state.mydata = 0;
await crawler.run(['https://www.example.com/1']);
log.info(JSON.stringify(crawler.stats.toJSON()));
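One thing to watch with the snippet above: += on a property that was never initialised gives NaN, which is why mydata is set to 0 before the run. If you have several custom counters, a small guard can help. This is only a sketch: incrementCounter is a hypothetical helper (not part of Crawlee), and the plain object below stands in for crawler.stats.state, which I am assuming behaves like an ordinary mutable object.

```javascript
// Hypothetical helper (not a Crawlee API): bump a custom counter on a
// state object, treating a missing or non-numeric value as 0.
function incrementCounter(state, key, by = 1) {
    const current = Number.isFinite(state[key]) ? state[key] : 0;
    state[key] = current + by;
    return state[key];
}

// Stand-in for crawler.stats.state (assumption: a plain mutable object).
const state = { requestsFinished: 0 };

// No prior "state.mydata = 0" is needed here.
incrementCounter(state, 'mydata', 5);
incrementCounter(state, 'mydata', 10);

console.log(JSON.stringify(state)); // {"requestsFinished":0,"mydata":15}
```

Inside a request handler you would then call something like incrementCounter(crawler.stats.state, 'mydata', 5) instead of using += directly.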
Statistics | API | Crawlee
The statistics class provides an interface to collecting and logging run statistics for requests. All statistic information is saved on key value store under the key SDK_CRAWLER_STATISTICS_*, persists between migrations and abort/resurrect