A community member is looking for a way to redirect the output of multiple runs of the same scraper into the same existing dataset, appending the new records. The discussion brings up the Apify SDK's named-dataset feature, but the original poster is focused on the REST API, since it must be integrated with existing Java code. The community members explain that each run has its own default dataset and that there is no way to instruct an Actor to use a pre-existing custom dataset. The suggested solution is to create an "Integration Actor" that redirects the output to a custom dataset.
<answer>Ah, yes: if you are using a pre-existing Actor, there is no way to redirect the output, unless the Actor has a parameter that supports a custom dataset. Otherwise, you can create an "Integration Actor" that redirects the output to a custom dataset.</answer>
Is there a way to redirect the output of multiple runs of the same scraper to the same existing dataset, appending the new records? The order doesn't matter. Due to the limitations of the scraper I am using, I need to perform thousands of runs, each producing a very small amount of output that I would like to add to an existing dataset (obviously with the same format or schema). I skimmed through the Apify API documentation and did not find anything about it.
Ah, yes: if you are using a pre-existing Actor, there is no way to redirect the output, unless the Actor has a parameter that supports a custom dataset. Otherwise, you can create an "Integration Actor" that redirects the output to a custom dataset.
Thanks for the quick answer. You are redirecting me to the SDK; is there an equivalent method in the REST API? (I have to integrate the calls with existing Java code.) Put another way, I'm asking if there is a way to call/run an Actor specifying an existing dataset. I'm new to Apify, so I don't have a clear picture of the platform at the moment.
If I understand correctly, each run has its own new storage; there is no way to specify an existing one. To do a merge, I need to take every single storage created by a run and put its contents into the overall, previously created remote storage. This is what the SDK does too, I guess.
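For reference, starting a run over the raw REST API and picking up the id of the default dataset it creates might look like the sketch below. It assumes the v2 endpoint `POST /v2/acts/{actorId}/runs` and the `defaultDatasetId` field of the run object; the Actor id and token are placeholders, and a real JSON library should replace the regex extraction:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunActor {
    static final String BASE = "https://api.apify.com/v2";

    // Endpoint that starts a run of an Actor; the run gets a fresh default dataset.
    static String runUrl(String actorId, String token) {
        return BASE + "/acts/" + actorId + "/runs?token=" + token;
    }

    // Crude extraction of "defaultDatasetId" from the run JSON
    // (use a proper JSON library in production code).
    static String extractDefaultDatasetId(String runJson) {
        Matcher m = Pattern
                .compile("\"defaultDatasetId\"\\s*:\\s*\"([^\"]+)\"")
                .matcher(runJson);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String token = System.getenv("APIFY_TOKEN"); // placeholder: your API token
        String actorId = args[0];                    // placeholder, e.g. "user~my-scraper"
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(runUrl(actorId, token)))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{}")) // Actor input
                .build();
        String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println("default dataset: " + extractDefaultDatasetId(body));
    }
}
```

The `defaultDatasetId` printed at the end is what a later merge step would read from.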
Ok, that makes sense. I don't want to sound rude, but I'm focused on the REST API, not the SDK. As I said, I need to integrate it with existing Java code, so I need to go down to a lower level than the one offered by the SDK. Given the links above, I understand how to create a remote dataset and how to store local data in it. From your answers I gather that there is no way to instruct an Actor to use an existing (custom) dataset: you need to take the data from the default dataset created by the run and move/copy it to the custom dataset.
I can't tell an Actor to use a certain pre-existing dataset when I call it. The Actor instance uses its own default dataset, end of story. Then I can move/copy this dataset into a larger one.
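The move/copy step described above can be sketched against the REST API as follows. This is a minimal sketch, assuming the v2 endpoints `GET /v2/datasets/{id}/items` and `POST /v2/datasets/{id}/items` (POSTing a JSON array appends its elements as individual records); the dataset ids and token are placeholders, and the custom dataset is assumed to have been created once beforehand (e.g. via `POST /v2/datasets?name=...`):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DatasetMerge {
    static final String BASE = "https://api.apify.com/v2";

    // Items of a dataset as a JSON array (GET), or append target (POST).
    static String itemsUrl(String datasetId, String token) {
        return BASE + "/datasets/" + datasetId + "/items?token=" + token + "&format=json";
    }

    // Copy every item of the run's default dataset into the custom dataset.
    static int copyItems(HttpClient client, String fromDatasetId,
                         String toDatasetId, String token) throws Exception {
        // 1. Read all items from the run's default dataset as a JSON array.
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create(itemsUrl(fromDatasetId, token)))
                .GET().build();
        String items = client.send(get, HttpResponse.BodyHandlers.ofString()).body();

        // 2. Append the whole array to the custom dataset.
        HttpRequest post = HttpRequest.newBuilder()
                .uri(URI.create(itemsUrl(toDatasetId, token)))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(items))
                .build();
        return client.send(post, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        String token = System.getenv("APIFY_TOKEN"); // placeholder: your API token
        // args[0]: the run's defaultDatasetId; args[1]: id of the custom dataset.
        int status = copyItems(HttpClient.newHttpClient(), args[0], args[1], token);
        System.out.println("push status: " + status);
    }
}
```

Running this once after each of the thousands of small runs would accumulate all records in the one custom dataset, which matches the "order doesn't matter, just append" requirement in the original question.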