harsh-harlequin · 13mo ago

request queue data

I'd like to access the request queue data that's held in memory and written to the storage directory, so that I can add fields such as the url, handledAt, etc. to my output. How would I go about that? Thanks.
ondro_k · 13mo ago
Hey, the request data is stored in crawlingContext.request, which is accessible from your requestHandler:
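import { PuppeteerCrawler, Dataset } from 'crawlee';

const crawler = new PuppeteerCrawler({
    requestHandler: async (crawlingContext) => {
        // Metadata of the request currently being processed (url, userData, retryCount, ...)
        const { url } = crawlingContext.request;
        // ... your own scraping logic goes here, e.g.:
        const data = { title: await crawlingContext.page.title() };
        // Merge the scraped data with the request URL and store it in the default dataset
        await Dataset.pushData({ ...data, url });
    },
});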
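If you need more than the URL, the same crawlingContext.request object carries other metadata. A minimal sketch, assuming a hypothetical helper named summarizeRequest (not part of Crawlee) that copies url, loadedUrl, uniqueKey, retryCount and userData.label into a plain object you can spread into pushData:

```javascript
// Hypothetical helper (not a Crawlee API): copies selected metadata from a
// Crawlee Request, or any object with the same property names, into a plain record.
function summarizeRequest(request) {
    const { url, loadedUrl, uniqueKey, retryCount, userData } = request;
    return {
        url,
        // loadedUrl is the URL actually loaded (may differ after redirects); fall back to url if unset
        loadedUrl: loadedUrl ?? url,
        uniqueKey,
        retryCount,
        // label is conventionally stored in userData; undefined if not set
        label: userData?.label,
    };
}

// Usage inside a requestHandler (sketch):
// await Dataset.pushData({ ...data, ...summarizeRequest(crawlingContext.request) });
```

Keeping the helper a plain function means it doesn't depend on a running crawler, so you can pick whichever request fields you need without touching the handler itself.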
According to the docs (https://crawlee.dev/api/3.10/core/class/Request#handledAt), handledAt is only set after the request has been processed, so it will always be undefined inside your requestHandler. However, you can simply use new Date().toISOString() to create your own timestamp.
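A minimal sketch of that workaround, assuming a hypothetical helper withTimestamp and a hypothetical output field named scrapedAt (neither is part of Crawlee; pick any name you like). The clock is passed in as a parameter so the function is easy to test:

```javascript
// Hypothetical helper (not a Crawlee API): attaches your own ISO-8601 timestamp
// to a record, because request.handledAt is not yet set while requestHandler runs.
// `scrapedAt` is an arbitrary, hypothetical field name.
function withTimestamp(record, now = new Date()) {
    return { ...record, scrapedAt: now.toISOString() };
}

// Usage inside a requestHandler (sketch):
// await Dataset.pushData(withTimestamp({ ...data, url: crawlingContext.request.url }));
```

Note that this records when your handler ran, which is close to, but not the same moment as, when Crawlee later marks the request as handled.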
harsh-harlequin (OP) · 13mo ago
@ondro_k Great, thanks for the info.
