vwkd
4d ago

How to aggregate user data from duplicate URLs

How can I save all the categories of a product that is listed in multiple categories? Currently, I pass the category in userData from the category handler to the detail handler. However, since a URL is only scraped once, only the first category gets saved and all the others are discarded. Here's a minimal example.
import { Router } from "crawlee";

export const router = Router.create();

// ... other handlers

// imagine the category handler is called twice
router.addHandler("CATEGORY", async ({ enqueueLinks }) => {
    // ... parsing

    // result of first call
    await enqueueLinks({
        urls: ["https://example.com/my-product"],
        userData: { category: "foo" },
        label: "DETAIL",
    });

    // result of second call
    await enqueueLinks({
        urls: ["https://example.com/my-product"],
        userData: { category: "bar" },
        label: "DETAIL",
    });
});

router.addHandler("DETAIL", async ({ request }) => {
    // todo: how to get both "foo" and "bar" here?
    const category = request.userData.category;

    // ... saving
});
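
One possible workaround, sketched here without Crawlee so that it runs on its own: keep the categories outside the request queue, in a map keyed by product URL, and look them up in the DETAIL handler instead of reading request.userData.category. The names categoriesByUrl, recordCategory and categoriesFor are hypothetical helpers, not Crawlee APIs. The sketch assumes a single-process crawl and that the raw URL string is a good enough key; Crawlee deduplicates on the request's uniqueKey, which may be a normalized form of the URL.

```typescript
// Hypothetical in-memory store (not part of Crawlee): product URL -> categories seen so far.
const categoriesByUrl = new Map<string, Set<string>>();

// Would be called in the CATEGORY handler for each product URL, next to enqueueLinks.
function recordCategory(url: string, category: string): void {
    const seen = categoriesByUrl.get(url) ?? new Set<string>();
    seen.add(category); // a Set ignores repeated categories
    categoriesByUrl.set(url, seen);
}

// Would be called in the DETAIL handler with request.url.
function categoriesFor(url: string): string[] {
    return Array.from(categoriesByUrl.get(url) ?? []);
}

// Simulates the two CATEGORY handler calls from the example above.
recordCategory("https://example.com/my-product", "foo");
recordCategory("https://example.com/my-product", "bar");

// Logs both categories, "foo" and "bar", in insertion order.
console.log(categoriesFor("https://example.com/my-product"));
```

A caveat with this approach: the DETAIL handler may run before every category page has been processed, so the lookup can still be incomplete at that moment. Merging the categories into the saved records after the crawl finishes would avoid that ordering problem.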