Apify Discord Mirror

Updated 5 months ago

Saving data in an Apify Actor and cleaning it

At a glance

The community member is trying to save data scraped from their actors to a JSON file, but they are not getting the expected output. They want to save the data to the Apify console so they can then use MongoDB to store it in their database. The community member has already set up their MongoDB schema.

The community member has provided some code that reads data from a "rawData.json" file, appends new data to it, and writes the updated data back to the file. However, they are unsure if this will work in an Actor (a script running on the Apify platform) or if it only works on their local computer.

Another community member suggests using the Apify Dataset to save the data, since it writes a new file to the storages folder for each item in the dataset. They add that the data could also be sent directly to MongoDB, depending on the use case.

The community members discuss whether the fs module needs to be installed, and it is clarified that the fs module is part of the Node.js installation and does not need to be installed separately.

There is no explicitly marked answer in the comments, but the community members provide suggestions and guidance to help the original community member solve their problem.

I've tried saving the data I scrape from my actors to a rawData.json file,

however I don't get a JSON output even though the scraping works.

How would I save the data to the Apify console so that I can then use MongoDB to take that data and put it in my database?

I have my MongoDB schema already set up, so how would I save the data to the Apify console and access it?

Would I have to save it to the Apify dataset? If so, how? And how would I also put it through a cleaning process in the same actor or, if possible, a different actor, and THEN save it to a MongoDB database?

Here's what I have for saving the JSON file so far:
6 comments
import * as fs from 'node:fs'; // built into Node.js, nothing to install

bambawRouter.addHandler('BAMBAW_PRODUCT', async ({ page, request }) => {
    try {
        console.log('Scraping products');

        const site = 'Bambaw';

        const title = await page.$eval('h1.product__title', (el) => el.textContent?.trim() || '');

        const descriptions = await ......

        const productData = {
            url: request.loadedUrl,
            site,
            title,
            descriptions,
            originalPrice,
            salePrice,
            shippingInfo,
            reviewScore,
            reviewNumber,
        };

        productList.push(productData);

        console.log('Scraped', productList.length, 'products');
        // Read the existing data from the rawData.json file
        let rawData: any = {};
        try {
            const rawDataStr = fs.readFileSync('rawData.json', 'utf8');
            rawData = JSON.parse(rawDataStr);
        } catch (error) {
            console.log('Error reading rawData.json:', error);
        }

        // Append the new data to the existing data
        if (rawData.productList) {
            rawData.productList.push(productData);
        } else {
            rawData.productList = [productData];
        }

        // Write the updated data back to the rawData.json file
        fs.writeFileSync('rawData.json', JSON.stringify(rawData, null, 2));
        console.log('rawData.json updated for Bambaw');
    } catch (error) {
        console.log('Error scraping product:', error);
        // reclaimRequest returns a promise, so wait for it before leaving the handler
        await bambawQueue.reclaimRequest(request);
        return;
    }
});
Hmm... this should generally work. The question might be where the file is saved. You can find examples of working with Dataset here: https://crawlee.dev/api/core/class/Dataset (this will generate a new file in the storages folder for each item in the dataset). You should even be able to send it to MongoDB directly, depending on your use case.
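To make the Dataset suggestion concrete, here is a minimal sketch of swapping the read/append/write cycle on rawData.json for a single push call. The `Product` types, `ProductSink`, `MemorySink`, and `saveProduct` are hypothetical names invented for this example; the in-memory sink only exists so the sketch runs without Crawlee installed. In a real crawler the equivalent call would most likely be `await Dataset.pushData(productData)` from the `crawlee` package.

```typescript
// Shape of one scraped product. Field names follow the post; the types are assumptions.
interface Product {
    url: string;
    site: string;
    title: string;
}

// Anything that can accept one item asynchronously.
// Crawlee's Dataset offers a pushData method that plays this role.
interface ProductSink {
    pushData(item: Product): Promise<void>;
}

// Hypothetical in-memory stand-in for a dataset, used only for this demo.
class MemorySink implements ProductSink {
    readonly items: Product[] = [];

    async pushData(item: Product): Promise<void> {
        this.items.push(item);
    }
}

// One call per scraped product: no file to read back, parse, and rewrite.
async function saveProduct(sink: ProductSink, product: Product): Promise<void> {
    await sink.pushData(product);
}

const sink = new MemorySink();
const done = saveProduct(sink, {
    url: 'https://example.com/products/1',
    site: 'Bambaw',
    title: 'Example product',
});
```

Because each item is stored on its own, concurrent handlers do not overwrite each other's writes the way two simultaneous rewrites of one JSON file could.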
Would I have to install the fs dependency? If so, how?
No, the fs module is part of the Node.js installation.
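As a small illustration, the sketch below imports `fs` straight from Node.js with no package to install, and repeats the append logic from the handler above against an absolute path. A relative path such as 'rawData.json' resolves against the process's working directory, so printing the resolved path is one way to answer "where is the file saved". The helper name `appendToJsonFile` and the temp-directory target are assumptions made for this example.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Append one record to a JSON file shaped like { "productList": [...] }.
// Returns the number of items in the file after the write.
function appendToJsonFile(filePath: string, item: unknown): number {
    let data: { productList: unknown[] } = { productList: [] };
    if (fs.existsSync(filePath)) {
        const parsed = JSON.parse(fs.readFileSync(filePath, 'utf8'));
        // Reuse the existing contents only if they have the expected shape.
        if (Array.isArray(parsed.productList)) data = parsed;
    }
    data.productList.push(item);
    fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
    return data.productList.length;
}

// Demo target inside the OS temp directory (an assumption for this sketch).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'rawdata-'));
const demoFile = path.resolve(demoDir, 'rawData.json');

const firstCount = appendToJsonFile(demoFile, { title: 'A' });
const secondCount = appendToJsonFile(demoFile, { title: 'B' });

// Logging the absolute path shows exactly where the data ended up.
console.log(`Wrote ${secondCount} items to ${demoFile}`);
```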
Does this work in an Actor? It only seems to work on my local computer.
I am not sure where you run it. You should be fully in control of where and how you run it. Are you running it on the Apify platform? If so, you can send me a link to the run in a DM so I can check it.
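The thread does not get to the cleaning question from the original post, so here is one possible sketch of a cleaning step that could run before the data is stored. Everything in it is an assumption: the `RawProduct` and `CleanProduct` shapes, the idea that prices arrive as strings such as '€12,99', and the helper names `parsePrice` and `cleanProduct`. The cleaned objects could then be pushed to a dataset or passed to the MongoDB Node.js driver, for example with `collection.insertMany(cleaned)`.

```typescript
// Raw fields as they might come off the page (assumed to be strings).
interface RawProduct {
    url: string;
    site: string;
    title: string;
    originalPrice: string;
    salePrice: string;
}

// Cleaned record with numeric prices; null marks a missing or unparsable price.
interface CleanProduct {
    url: string;
    site: string;
    title: string;
    originalPrice: number | null;
    salePrice: number | null;
}

// Extract the first number from a price string.
// Limitation: a comma is treated as a decimal separator, so thousands
// separators such as "1,299.00" are NOT handled by this sketch.
function parsePrice(text: string): number | null {
    const match = text.replace(/\s/g, '').match(/\d+(?:[.,]\d+)?/);
    if (!match) return null;
    const value = Number(match[0].replace(',', '.'));
    return Number.isFinite(value) ? value : null;
}

// Trim strings, collapse runs of whitespace in the title, and parse prices.
function cleanProduct(raw: RawProduct): CleanProduct {
    return {
        url: raw.url.trim(),
        site: raw.site.trim(),
        title: raw.title.replace(/\s+/g, ' ').trim(),
        originalPrice: parsePrice(raw.originalPrice),
        salePrice: parsePrice(raw.salePrice),
    };
}

const cleaned = cleanProduct({
    url: ' https://example.com/products/1 ',
    site: 'Bambaw',
    title: '  Bamboo \n Straws ',
    originalPrice: '€12,99',
    salePrice: '',
});
```

Keeping the cleaning in a pure function like this means it can run in the same actor right after scraping, or in a separate actor that reads the stored items, without changing the function itself.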