I have an actor that first scrapes raw

I have an Actor that first scrapes raw results and then processes them. I want to make both data collections (before and after processing) available to the end user. This is for a PPE (pay-per-event) Actor. What is the best way to do this?
10 Replies
azzouzana · 2w ago
Probably use named datasets? The default dataset would store the processed data, and a separate dataset called "raw" would store the original data. You could also let users choose whether the default dataset should contain the raw or processed data, and save the other data in the named dataset
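A minimal sketch of that routing decision, for illustration only. The function name `planDatasetWrites` and the input option `defaultContains` are hypothetical, and the Apify SDK calls are shown in comments so the snippet runs on its own:

```javascript
// Decide which collection goes to the default dataset and which goes to the
// named one. `defaultContains` is a hypothetical Actor input option that
// accepts 'processed' (the default here) or 'raw'.
function planDatasetWrites(rawItems, processedItems, defaultContains = 'processed') {
    const rawIsDefault = defaultContains === 'raw';
    return {
        defaultItems: rawIsDefault ? rawItems : processedItems,
        namedDatasetName: rawIsDefault ? 'processed' : 'raw',
        namedItems: rawIsDefault ? processedItems : rawItems,
    };
}

const plan = planDatasetWrites([{ text: 'ok' }], [{ sentiment: 'neutral' }]);

// Inside an Actor run, the plan could then be written with the Apify SDK:
//   await Actor.pushData(plan.defaultItems);
//   const other = await Actor.openDataset(plan.namedDatasetName);
//   await other.pushData(plan.namedItems);
```

Keeping the decision in a plain function makes it easy to check without starting a run.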
Louis Deconinck (OP) · 2w ago
We have to clear named datasets manually at the start of each run, right? Is a named dataset separate for each user or shared across all users?
azzouzana · 2w ago
They're unique to each user. As far as I know, you cannot delete items from a dataset (unlike a key-value store). A potential solution is to drop and recreate it, or better, use a name made of a prefix plus _ plus some unique string (run ID, hash of the input, etc.), depending on your use case.
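The per-run naming idea could look roughly like the sketch below. The helper `runScopedDatasetName` is hypothetical. It joins the parts with a hyphen instead of an underscore, on the assumption that dataset names are limited to letters, digits, and hyphens; that restriction is worth verifying against the platform before relying on it.

```javascript
// Build a dataset name that is unique per run, e.g. "reviews-<runId>".
// Assumption: names may only contain letters, digits and hyphens, so any
// other character in the suffix is collapsed into a single hyphen.
function runScopedDatasetName(prefix, runId) {
    const suffix = String(runId)
        .replace(/[^a-zA-Z0-9]+/g, '-') // replace disallowed characters
        .replace(/^-+|-+$/g, '');       // trim leading/trailing hyphens
    return suffix ? `${prefix}-${suffix}` : prefix;
}

// Locally there is no run ID, so fall back to a fixed suffix.
const datasetName = runScopedDatasetName('reviews', process.env.ACTOR_RUN_ID || 'local');

// Inside an Actor run this name could be passed to the SDK:
//   const reviewsDataset = await Actor.openDataset(datasetName);
```

Because every run gets its own dataset, nothing has to be dropped at startup, though old datasets accumulate and may need cleaning up separately.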
Strijdhagen · 2w ago
If one of the datasets is always returned, you could also return the second dataset as JSON within the first dataset.
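A small sketch of that nesting approach, assuming the processed result is a single analysis object. The helper `attachRawReviews` and the field name `rawReviews` are made up for illustration:

```javascript
// Embed the raw records inside the processed item so that one dataset item
// carries both. `rawReviews` is a hypothetical field name.
function attachRawReviews(analysis, rawReviews) {
    return { ...analysis, rawReviews };
}

const item = attachRawReviews(
    { summary: 'Mostly positive' },
    [{ rating: 5, text: 'Great' }],
);

// Inside an Actor run, the combined item could be stored in the default dataset:
//   await Actor.pushData(item);
```

This keeps everything in the default dataset, at the cost of larger items; for big raw collections it is worth checking the platform's item size limits first.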
Louis Deconinck (OP) · 2w ago
Is it possible to have both show up in the Output tab? Only the analysis overview, which is stored in the default dataset, shows up there. How do I make the other dataset show up as well? It also doesn't appear under 'Storage'. When I run it locally, both show up. My setup for the named dataset is like this:
const reviewsDataset = await Actor.openDataset('reviews');
await reviewsDataset.drop();
const newReviewsDataset = await Actor.openDataset('reviews');
I have this as my output_schema.json:
{
    "actorOutputSchemaVersion": 1,
    "title": "Analysis Results",
    "description": "Scraped reviews and analysis results.",
    "properties": {
        "analysis": {
            "type": "string",
            "title": "Analysis Overview",
            "description": "AI analysis of the reviews.",
            "template": "{{links.apiDefaultDatasetUrl}}/items?view=overview"
        },
        "reviews": {
            "type": "string",
            "title": "Scraped Reviews",
            "description": "Raw reviews in named dataset 'reviews'.",
            "template": "{{links.apiBaseUrl}}/datasets/reviews/items"
        }
    }
}
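One thing that may be worth checking in the `reviews` template: the Apify API addresses a dataset either by its ID or by `username~dataset-name`, so a bare `reviews` in the path might not resolve to the named dataset. The sketch below only shows how such a URL is formed; the helper `namedDatasetItemsUrl` and the username `jane-doe` are hypothetical, and whether an output schema template can express this is not confirmed in this thread.

```javascript
// Build the API URL for the items of a named dataset.
// The API accepts either a dataset ID or "username~dataset-name" in the path.
const API_BASE_URL = 'https://api.apify.com/v2';

function namedDatasetItemsUrl(username, datasetName) {
    const datasetRef = encodeURIComponent(`${username}~${datasetName}`);
    return `${API_BASE_URL}/datasets/${datasetRef}/items`;
}

// Hypothetical username, for illustration only.
const reviewsUrl = namedDatasetItemsUrl('jane-doe', 'reviews');
console.log(reviewsUrl);
```

If the dataset is private, the request would also need an API token.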
azzouzana · 2w ago
When you go to the Storage tab, how many datasets are there (when you click on Default)? I think you already have two, since there seems to be an option to select one.
azzouzana · 2w ago
If that's the case then this is something with the output schema https://docs.apify.com/platform/actors/development/actor-definition/output-schema (which I don't have much experience with, never had to)
Actor output schema | Platform | Apify Documentation
Learn how to define and present output of your Actor.
azzouzana · 2w ago
Let me know when you figure this out please 🙏
manual-pink · 2w ago
Me too 🙏. I'm interested in the best way to show more than one dataset in the UI.
ellativity · 2w ago
This should be doable using the output schema. I shared a quick summary of schemas last week: https://discord.com/channels/801163717915574323/801236795332231168/1441418040506978396
If you're interested, this is a great question for the Actor schema webinar on 9 December. More details to follow very soon (today).
