Safe to parallelize Dataset writes across processes?
Context:
• Crawlee v3.13.10, Node 22
• Linux (ext4), using storage-local
• Multiple forked workers share one RequestQueueV2 (with request locking)
• Each worker pushes its results to the same shared dataset
Is it safe for N processes to push to the same dataset concurrently with storage-local (no corruption/partial writes)? Any guarantees about atomicity / ordering?
If not recommended, what’s the best pattern?
• per-worker datasets then merge?
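To make the per-worker option concrete, here is a minimal sketch of what I have in mind. It deliberately uses plain Node `fs` and JSONL files rather than Crawlee's storage so that it runs standalone; `workerFile`, `pushItem`, and `mergeWorkerFiles` are hypothetical helpers of mine, not Crawlee APIs. With Crawlee I assume the equivalent would be each worker opening its own named dataset (for example via `Dataset.open()` with a per-worker name) and the parent merging them after all workers exit.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Item = Record<string, unknown>;

// Hypothetical naming scheme: one output file per worker, so no two
// processes ever write to the same file.
function workerFile(dir: string, workerId: number): string {
  return path.join(dir, `results-worker-${workerId}.jsonl`);
}

// Called inside a worker: append one JSON line to that worker's own file.
function pushItem(dir: string, workerId: number, item: Item): void {
  fs.appendFileSync(workerFile(dir, workerId), JSON.stringify(item) + "\n");
}

// Called once by the parent after all workers have exited.
// Files are read in lexical filename order; items keep their
// per-worker insertion order.
function mergeWorkerFiles(dir: string): Item[] {
  const merged: Item[] = [];
  const files = fs
    .readdirSync(dir)
    .filter((f) => /^results-worker-\d+\.jsonl$/.test(f))
    .sort();
  for (const file of files) {
    const lines = fs.readFileSync(path.join(dir, file), "utf8").split("\n");
    for (const line of lines) {
      if (line.trim() !== "") merged.push(JSON.parse(line) as Item);
    }
  }
  return merged;
}

// Demo: simulate two workers within a single process.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "per-worker-"));
pushItem(dir, 1, { url: "https://example.com/a", worker: 1 });
pushItem(dir, 2, { url: "https://example.com/b", worker: 2 });
pushItem(dir, 1, { url: "https://example.com/c", worker: 1 });

const merged = mergeWorkerFiles(dir);
console.log(`merged ${merged.length} items`);
```

The point of this layout is that each file has exactly one writer, so ordering is only defined within a worker and the merge step decides the global order.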
Any flags/settings I should use to make this robust? Thanks!