EC Volume Vacuum

SeaweedFS Enterprise includes EC volume vacuum, a background worker that continuously monitors your erasure-coded volumes for deleted data and compacts them by removing deleted needles. When EC volumes accumulate many deleted entries, vacuum reclaims wasted space and improves storage efficiency without manual intervention.

The key design advantage is that compaction happens on dedicated workers, not directly on volume servers. The worker collects all EC shards for a target volume into its local -workingDir, vacuums the data locally, and then distributes the compacted shards back to volume servers. Volume servers are not overloaded with the CPU and disk work of compaction; they only serve controlled shard transfers and receive the replacement shards.

What Can It Do?

  • Detect Deleted Data: Identifies EC volumes where deleted needles are consuming significant space (configurable threshold).
  • Compact Volumes: Removes deleted needles from EC shards, reclaiming storage space across the cluster.
  • Offload Heavy Work: Performs shard compaction in the worker’s local working directory instead of on volume servers.
  • Improve Efficiency: Eliminates deleted data so parity shards no longer carry overhead for needles that will never be read again.
  • Reduce Volume Server Load: Keeps volume servers focused on normal serving work while workers handle vacuum CPU, temporary disk, and coordination.
  • Rack-Aware Optimization: Maintains shard distribution across failure domains during compaction.

Why Do You Need It?

In erasure-coded systems, when files are deleted, the space they occupied in the EC shards remains allocated. Over time, this creates “storage waste”:

Scenario        Deleted Ratio   Problem                                                      EC Vacuum Solution
Fresh volume    5%              Minimal waste, no action needed                              No vacuum triggered
Aging dataset   30%             30% of shard space wasted on deleted data                    Detects and triggers vacuum
Old archive     50%             Half the volume is deleted, still consumes parity overhead   Compacts to reclaim 50% of the space

For example, a 100GB EC 10+4 volume with 40% deleted data:

  • Without vacuum: Still consumes 100GB + parity (140GB total with 10+4 ratio)
  • After vacuum: Compacted to ~60GB + parity (~84GB total), a 40% space savings
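The arithmetic above can be checked with a short sketch (the helper name is illustrative, not part of SeaweedFS):

```python
# Illustrative sketch of EC space accounting for the 10+4 example above.
# The function name is hypothetical; it is not part of the SeaweedFS API.

def total_with_parity(data_gb: float, data_shards: int, parity_shards: int) -> float:
    """Logical data size plus parity overhead for a k+m EC layout."""
    return data_gb * (data_shards + parity_shards) / data_shards

live_gb = 100 * (1 - 0.40)                   # 40% of a 100GB volume is deleted

before = total_with_parity(100, 10, 4)       # 140.0 GB on disk before vacuum
after = total_with_parity(live_gb, 10, 4)    # 84.0 GB on disk after compaction

print(before, after, 1 - after / before)     # 140.0 84.0 0.4 (40% saved)
```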

How Does It Work?

The EC Vacuum worker runs as a scheduled background task on your cluster. When a volume crosses the deleted-data threshold, the worker pulls the full EC shard set into local storage, compacts it, verifies the result, and places the compacted shards back onto the cluster.

              EC Volume Vacuum: worker-local compaction

   Volume Servers                                      Volume Servers
  holding old shards                                  receiving shards

  +-----------+        1. stream EC shards             +-----------+
  | volume A  | -----\                                /| volume D  |
  +-----------+      \                              /  +-----------+
  +-----------+       \                            /   +-----------+
  | volume B  | --------> +--------------------+ ----> | volume E  |
  +-----------+          | EC Vacuum Worker    |       +-----------+
  +-----------+       /  | - collect shards    |   \   +-----------+
  | volume C  | -----/   | - vacuum locally    |    \> | volume F  |
  +-----------+          | - verify output     |       +-----------+
                         | - distribute shards |
                         +--------------------+
                             local -workingDir

       2. remove deleted needles locally
       3. rebuild compacted EC shards
       4. distribute compacted shards back with rack-aware placement

Detection Phase

The worker periodically scans EC volumes in the cluster:

  1. Analyzes each EC volume’s shard composition
  2. Calculates the ratio of deleted needles vs. total data
  3. Compares against the configured threshold (default: 30%)
  4. Only processes volumes that exceed the threshold
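As a sketch, the filtering in steps 2-4 amounts to the following (the stats structure and function names are illustrative, not the actual SeaweedFS Enterprise implementation):

```python
# Hypothetical sketch of the detection filter; field and function names
# are illustrative, not the real SeaweedFS Enterprise internals.
from dataclasses import dataclass

@dataclass
class EcVolumeStats:
    volume_id: int
    total_bytes: int      # live + deleted data across the volume's shards
    deleted_bytes: int    # space still held by deleted needles

def select_vacuum_candidates(volumes, threshold=0.30, max_jobs=100):
    """Return volumes whose deleted ratio meets the threshold, worst first."""
    candidates = [
        v for v in volumes
        if v.total_bytes > 0 and v.deleted_bytes / v.total_bytes >= threshold
    ]
    candidates.sort(key=lambda v: v.deleted_bytes / v.total_bytes, reverse=True)
    return candidates[:max_jobs]   # cap jobs queued per detection cycle

stats = [
    EcVolumeStats(1, 100, 5),    # fresh volume: 5% deleted, skipped
    EcVolumeStats(2, 100, 30),   # aging dataset: 30% deleted, selected
    EcVolumeStats(3, 100, 50),   # old archive: 50% deleted, selected first
]
print([v.volume_id for v in select_vacuum_candidates(stats)])  # [3, 2]
```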

Compaction Phase

When a volume triggers vacuum:

  1. Analyze: Determines which shards can be safely compacted
  2. Collect: Streams all required EC shards from volume servers into the worker’s local -workingDir
  3. Vacuum Locally: Removes deleted needles and rebuilds compacted shard files on the worker
  4. Verify: Validates that compacted shards match the original data
  5. Distribute: Places compacted shards back across the cluster using rack-aware placement
  6. Cleanup: Removes old oversized shards and temporary data
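The heart of steps 3-4 can be sketched as follows. Real EC shards store Reed-Solomon-coded stripes; this simplified model treats a volume as a flat needle list, and all names are illustrative:

```python
# Minimal sketch of "vacuum locally" and "verify": drop deleted needles,
# then check that every live needle survived compaction unchanged.
from dataclasses import dataclass

@dataclass(frozen=True)
class Needle:
    needle_id: int
    data: bytes
    deleted: bool = False

def vacuum_needles(needles):
    """Keep only live needles; these would then be re-encoded into EC shards."""
    return [n for n in needles if not n.deleted]

def verify(original, compacted):
    """Every live needle must be present and byte-identical after compaction."""
    live = {n.needle_id: n.data for n in original if not n.deleted}
    return {n.needle_id: n.data for n in compacted} == live

volume = [Needle(1, b"a"), Needle(2, b"b", deleted=True), Needle(3, b"c")]
compacted = vacuum_needles(volume)
assert verify(volume, compacted)             # safe to distribute and clean up
print([n.needle_id for n in compacted])      # [1, 3]
```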

Technology Advantages

  • No in-place volume-server vacuum: Volume servers are not asked to compact EC shards on their own disks. They serve shard data to the worker and receive compacted replacements.
  • Resource isolation: CPU, temporary disk I/O, and shard rewrite work are concentrated on worker nodes that operators can size independently.
  • Predictable cluster impact: Global and per-worker concurrency limits control how many volumes are vacuumed at once.
  • Network-efficient placement: The worker can distribute compacted shards back to appropriate target nodes while preserving rack-aware EC placement.
  • Operational simplicity: Adding more worker capacity increases vacuum throughput without changing the volume server role.

Safety Guarantees

  • Only proceeds when sufficient healthy shards exist (>= data shard count)
  • Validates data integrity before replacing original shards
  • Keeps original shards available until compacted replacements are verified and distributed
  • Shard distribution follows the same rack-aware rules as EC repair
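The first guarantee is a simple precondition: with a k+m layout, any k healthy shards suffice to reconstruct the volume, so vacuum is skipped below that count. A sketch (names are illustrative):

```python
# Sketch of the healthy-shard precondition: vacuum may only proceed when at
# least data_shards (k) healthy shards exist, the minimum needed to
# reconstruct the volume. Not the actual SeaweedFS implementation.

def can_vacuum(healthy_shards: int, data_shards: int) -> bool:
    return healthy_shards >= data_shards

# For a 10+4 layout (10 data + 4 parity shards):
assert can_vacuum(14, 10)      # all shards healthy: proceed
assert can_vacuum(10, 10)      # exactly enough to reconstruct: proceed
assert not can_vacuum(9, 10)   # unrecoverable without repair first: skip
```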

Configuration

EC Vacuum runs as a plugin worker with configurable thresholds:

Setting                   Default   Description
Detection interval        30 min    How often to scan for high-deletion volumes
Detection timeout         10 min    Maximum time allowed for a detection scan
Min interval              300 s     Minimum time between consecutive detection runs
Deleted ratio threshold   0.30      Trigger vacuum when >= 30% of a volume's data is deleted
Max jobs per cycle        100       Maximum vacuum jobs queued per detection cycle
Global concurrency        4         Total concurrent vacuum jobs across the cluster
Per-worker concurrency    1         Concurrent vacuum jobs per worker node

You can also restrict vacuum to specific collections to focus on particular data sets.

Deployment

EC Vacuum runs as a plugin worker process, integrated with your EC infrastructure. Start one or more weed worker processes that connect to the admin server.

Starting a Worker

EC Vacuum is automatically included when the erasure_coding handler is enabled:

# Start a worker for EC tasks (includes EC encoding, EC repair, and EC vacuum)
weed worker -admin=admin.example.com:23646 -jobType=erasure_coding \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=2

# Start a worker handling all heavy tasks
weed worker -admin=admin.example.com:23646 -jobType=heavy \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=4

# Start a worker handling all available task types
weed worker -admin=admin.example.com:23646 -jobType=all \
  -workingDir=/var/lib/seaweedfs-plugin

Key Options

Flag           Description
-admin         Admin server gRPC address (required)
-jobType       Task types to handle: erasure_coding (or ec), heavy, all
-workingDir    Directory for collected input shards and compacted output during vacuums
-maxExecute    Maximum concurrent job executions per worker (default: 4)
-metricsPort   Prometheus metrics port for monitoring
-id            Stable worker ID across restarts; auto-generated if omitted

Production Recommendations

  • Run at least 2 worker instances for high availability
  • Allocate sufficient -workingDir disk space for the largest concurrent EC vacuum job, including collected input shards and compacted output
  • Set metricsPort to monitor vacuum progress and troubleshoot issues
  • Consider grouping EC workers by region/data center to minimize cross-network transfers
  • Vacuum runs alongside EC repair without requiring manual coordination
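A rough way to size -workingDir, reusing the 100GB 10+4 example from earlier. The 2x factor for holding input shards and compacted output simultaneously is an assumption for this sketch, not a documented formula:

```python
# Back-of-the-envelope -workingDir sizing sketch. Assumes the worker keeps
# both the collected input shards and the compacted output on disk at once;
# treat the result as a starting point, not a documented requirement.

def working_dir_gb(volume_gb, data_shards, parity_shards, max_execute):
    shard_overhead = (data_shards + parity_shards) / data_shards
    per_job = volume_gb * shard_overhead * 2   # input shards + compacted output
    return per_job * max_execute

# Largest EC volume ~100 GB, 10+4 layout, -maxExecute=2:
print(working_dir_gb(100, 10, 4, 2))  # 560.0 GB
```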

Kubernetes

Workers are supported in the SeaweedFS Helm chart:

worker:
  enabled: true
  replicas: 2
  jobType: "heavy"
  maxExecute: 2
  workingDir: "/var/lib/seaweedfs-plugin"
  metricsPort: 9327

Health Checks

Workers expose HTTP endpoints when -metricsPort is set:

  • /health — always returns 200
  • /ready — returns 200 only when connected to admin
  • /metrics — Prometheus metrics

Monitoring and Observability

EC Vacuum integrates with SeaweedFS observability:

Metric                      Purpose
vacuum_jobs_detected        Volumes requiring vacuum in the last detection cycle
vacuum_jobs_executed        Number of successful compactions
vacuum_jobs_failed          Number of failed compactions
vacuum_bytes_reclaimed      Total storage space freed across all compactions
vacuum_shard_rebuild_time   Time taken to rebuild shards during compaction

Key Benefits for Enterprise

  1. Storage Optimization: Reclaim wasted space in EC volumes without costly rebalancing
  2. Volume Server Protection: Heavy compaction runs on workers, so volume servers are not overloaded by vacuum CPU and local disk rewrite work
  3. Automatic Operation: Vacuum runs continuously in the background
  4. Custom EC Support: Works with any custom EC ratio (e.g., 20+4, 16+6)
  5. Rack-Aware Placement: Maintains shard distribution for maximum failure domain diversity
  6. Detailed Tracking: Activity events during each vacuum for full observability
  7. Scalable Efficiency: Global coordination prevents overwhelming the cluster with concurrent compactions
  8. Works with EC Repair: Complements EC repair to maintain both data integrity and storage efficiency

How EC Vacuum Complements Other Enterprise Features

  • With EC Repair: EC repair restores fault tolerance; EC vacuum improves efficiency
  • With Self-Healing: Self-healing detects shard issues; EC vacuum handles scheduled compaction
  • With Custom EC Ratios: Higher-density ratios like 20+4 benefit most from vacuum’s space reclamation