ambitious-aqua
ambitious-aqua8mo ago

Best way to deal with multiple datasets ?

What is the best way to compile datasets from multiple agents into a single dataset? Each agent runs its own tasks, and each task has its own set of runs, each producing a separate dataset. I found this very confusing. Zapier and similar tools don't process the data the way I need; it requires additional transformation. I thought the platform would have an option for this, since Storage lets you create your own dataset, but I found there is no way to link existing datasets to it internally. Could you explain and advise? Thanks
8 Replies
ambitious-aqua
ambitious-aquaOP8mo ago
Re: I raised the same concern over chat support.
I am using multiple scraper agents, all running different tasks. Each task has its own set of runs and its own dataset. How can I combine all the data into a single dataset using just the Apify platform?
rare-sapphire
rare-sapphire8mo ago
I believe that named datasets can be "shared" across actors. So have each of your agents save to a predetermined dataset name (passed in as input arg?). Then the master dataset aggregation actor would read from all those named datasets to generate a new one.
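A minimal sketch of the aggregation idea above, assuming the `apify-client` Python package and its `datasets().get_or_create(name=...)`, `dataset(id).iterate_items()` and `dataset(id).push_items(...)` calls behave as I understand them. The function name, the dataset names and the batch size are hypothetical, not something from the platform:

```python
from typing import Any, Dict, Iterable, List


def merge_named_datasets(
    client: Any,
    source_names: Iterable[str],
    target_name: str,
    batch_size: int = 500,
) -> int:
    """Copy every item from each named source dataset into one named target.

    `client` is expected to look like apify_client.ApifyClient (assumption).
    Returns the number of items copied.
    """
    # Named datasets are looked up (or created if missing) by name.
    target_id = client.datasets().get_or_create(name=target_name)["id"]
    target = client.dataset(target_id)

    copied = 0
    for name in source_names:
        # Note: get_or_create makes an empty dataset if the name does not exist yet.
        source_id = client.datasets().get_or_create(name=name)["id"]
        batch: List[Dict[str, Any]] = []
        for item in client.dataset(source_id).iterate_items():
            batch.append(item)
            if len(batch) >= batch_size:
                target.push_items(batch)  # push in chunks to keep requests small
                copied += len(batch)
                batch = []
        if batch:
            target.push_items(batch)
            copied += len(batch)
    return copied


# Hypothetical usage (needs `pip install apify-client` and an API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   merge_named_datasets(client, ["scraper-a", "scraper-b"], "all-scrapers")
```

This copies items as they are; it does not remove duplicates, so running it twice over the same sources would append the same items again.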
memo23
memo238mo ago
@De you either write an actor to fetch all the datasets and merge them, or you use a key-value store and push all the data there as well, like I do for deduplication, etc.
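A rough sketch of the deduplication step mentioned above, in plain Python. It assumes each scraped item has a unique field (here a hypothetical `url` key) and that the set of already-seen keys is loaded from, and saved back to, a key-value store between runs; that load/save step is left out because it depends on your setup:

```python
from typing import Any, Dict, Iterable, List, Set


def filter_new_items(
    items: Iterable[Dict[str, Any]],
    seen_keys: Set[str],
    key_field: str = "url",  # hypothetical unique field; use whatever your items have
) -> List[Dict[str, Any]]:
    """Return only items whose key has not been seen yet.

    `seen_keys` is updated in place, so it can be persisted after the run
    (for example as a JSON list in a key-value store) and reused next time.
    """
    fresh: List[Dict[str, Any]] = []
    for item in items:
        key = item.get(key_field)
        if key is None:
            # No key to compare on, so the item cannot be deduplicated; keep it.
            fresh.append(item)
            continue
        key = str(key)
        if key in seen_keys:
            continue  # already merged in an earlier run or earlier in this batch
        seen_keys.add(key)
        fresh.append(item)
    return fresh
```

Only the items returned by `filter_new_items` would then be pushed to the merged dataset, so repeated runs do not add the same record twice.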
ambitious-aqua
ambitious-aquaOP8mo ago
I have two tasks atm
ambitious-aqua
ambitious-aquaOP8mo ago
One is the scraper; the other is another actor I'm using to merge all the data from the scrape. But I need this done for multiple actors/Twitter scrapers, each of which will fetch different items, with everything merged into a single dataset. Could you let me know whether this is possible only via actors that merge datasets, or whether the Apify platform itself has a way to merge datasets automatically? The requirement is that after every run, all scraper agents add the scraped data to a centralised database automatically, without any manual merging.
rare-sapphire
rare-sapphire8mo ago
Maybe you can dump datasets into a shared S3 bucket? Something like https://console.apify.com/actors/zPS5oJWp7gcpJmxeX/input (have not tried it myself)
ambitious-aqua
ambitious-aquaOP8mo ago
thanks will try it out
