flat-fuchsia (7mo ago)

Disable writing to disk

By default, data is written to ./storage. Is there a way to turn this off and use memory instead?
6 Replies
Hall (7mo ago)
Someone will reply to you shortly. In the meantime, this might help:
Louis Deconinck (7mo ago)
Like storing the data in an array variable?
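To make the array idea concrete, here is a minimal sketch, assuming you only care about the scraped output. `ScrapedItem` and `collect` are hypothetical names used for illustration, not part of Crawlee; in a real crawler you would call something like `collect` from your request handler instead of pushing to a dataset. Note that this only covers the output: the request queue and stats would still go to ./storage unless storage persistence is turned off separately.

```typescript
// Hypothetical record shape; adjust it to whatever your handler extracts.
interface ScrapedItem {
    url: string;
    title: string;
}

// Scraped output is kept only in this array for the lifetime of the process.
const results: ScrapedItem[] = [];

// Hypothetical helper standing in for the part of a request handler
// that would otherwise push the item to a dataset.
function collect(item: ScrapedItem): void {
    results.push(item);
}

// Example calls with made-up data.
collect({ url: 'https://example.com/', title: 'Example Domain' });
collect({ url: 'https://example.com/about', title: 'About' });
```

The trade-off is that everything in `results` is lost when the process exits, so this suits short runs where you consume the data right away.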
azzouzana (7mo ago)
What data do you want to stop writing to disk? Scraping output, crawling stats/queues, or something else? I don't think you should disable it altogether (especially the Crawlee stats/queues).
Marco (7mo ago)
If the problem is that data, like queues, is persisted across runs, you can try using apify run --purge: https://docs.apify.com/cli/docs/reference#apify-run
ratty-blush (7mo ago)
Configuration.set("persistStorage", false)
Setting this before starting your crawler should do the trick. Btw, you can also change the storage dir using something like
const storageClient = new MemoryStorage({
    localDataDirectory: crawlStoragePath,
    persistStorage: true,
});
if you simply want to change the storage location.
Pepa J (7mo ago)
I'll just add another example:
import { MemoryStorage } from '@crawlee/memory-storage';
import { PlaywrightCrawler } from 'crawlee';
import { RequestQueue } from 'apify';

// Open a request queue that uses its own MemoryStorage client.
export const memoryRequestQueue = await RequestQueue.open(null, {
    storageClient: new MemoryStorage(),
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration, // assumed to be defined elsewhere
    requestQueue: memoryRequestQueue,
    // ...
});
etc.
