How can I pass data extracted in the first part of the scraper to items that will be extracted later?

Hi. I'm extracting product prices. The main page gives me all the information I need except the fees. If I visit each product page individually, I can get both the price and the fees, but sometimes I lose the fee information because I get blocked on some products. I want to handle this situation: if I extract the fees, I want to add them to my product_item, but if I get blocked, I want to leave the fees empty. I'm using the "Router" class as the Crawlee team explains here: https://crawlee.dev/python/docs/introduction/refactoring. When I enqueue the URL extracted from the first page as shown below, I cannot pass along the data I extracted earlier:

await context.enqueue_links(url='product_url', label='PRODUCT_WITH_FEES')

I want something like this:

await context.enqueue_links(url='product_url', label='PRODUCT_WITH_FEES', data=product_item)  # product_item is a dict

But I cannot do the above. How can I do it?
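One common way to get this effect is to attach the partial item to the request itself, so the handler for the product page can read it back. As far as I know, Crawlee for Python requests carry a `user_data` mapping for this purpose, but check the exact call signatures against the version you use. The sketch below does not use Crawlee at all: `SketchRequest`, `listing_handler`, `product_handler` and `run_sketch` are hypothetical names that model the pattern with the standard library only, and the URL is a placeholder.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchRequest:
    """Hypothetical stand-in for a crawler request; not a Crawlee class."""
    url: str
    label: str
    user_data: dict[str, Any] = field(default_factory=dict)


async def listing_handler(queue: asyncio.Queue) -> None:
    # Data already scraped from the main page.
    product_item = {'product_id': 1234, 'price': '50$'}
    # Attach the partial item to the request instead of passing it as an argument.
    await queue.put(SketchRequest(
        url='https://example.com/product/1234',  # placeholder URL
        label='PRODUCT_WITH_FEES',
        user_data={'product_item': product_item},
    ))


async def product_handler(request: SketchRequest) -> dict[str, Any]:
    # The detail-page handler reads the partial item back from the request.
    product_item = dict(request.user_data['product_item'])
    # Placeholder for the value that would be scraped from the product page.
    product_item['fees'] = '3$'
    return product_item


async def run_sketch() -> list[dict[str, Any]]:
    # Process every queued request with the detail-page handler.
    queue: asyncio.Queue = asyncio.Queue()
    await listing_handler(queue)
    results = []
    while not queue.empty():
        results.append(await product_handler(queue.get_nowait()))
    return results
```

Calling `asyncio.run(run_sketch())` returns the merged items. In a real crawler, the queue and the dispatch by label would be handled by the framework; only the idea of carrying `product_item` inside the request is the point here.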

My final data should look like one of the following.

If I handle the data correctly, I want something like this:

product_item = {'product_id': 1234, 'price': '50$', 'fees': '3$'}

If I get blocked, I want something like this:

product_item = {'product_id': 1234, 'price': '50$', 'fees': ''}
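The two outcomes above can come from a single merge step that is called both when the product page was scraped and when the request was blocked. This is a minimal sketch: `finalize_product_item` is a hypothetical helper, not part of Crawlee, and it assumes the listing-page item is available in both code paths (for example because it travelled with the request). I believe Crawlee crawlers let you register a handler for requests that ultimately fail, which would be the natural place for the blocked case, but verify that against the documentation for your version.

```python
from typing import Any, Optional


def finalize_product_item(product_item: dict[str, Any], fees: Optional[str]) -> dict[str, Any]:
    """Return a copy of the listing-page item with 'fees' set, or '' when unavailable."""
    merged = dict(product_item)  # copy so the original item is not mutated
    merged['fees'] = fees if fees is not None else ''
    return merged


listing_item = {'product_id': 1234, 'price': '50$'}

# Product page scraped successfully: pass the extracted fees.
with_fees = finalize_product_item(listing_item, '3$')

# Request blocked: pass None so the fees field is stored as an empty string.
blocked = finalize_product_item(listing_item, None)
```

Keeping the fallback in one function means the success handler and the failure path produce items with the same keys, so the dataset stays uniform.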

Apify & Crawlee•2y ago•

5 replies

conventional-black
