EC Volume Vacuum
SeaweedFS Enterprise includes EC volume vacuum, a background worker that continuously monitors your erasure-coded volumes for deleted data and compacts them by removing deleted needles. When EC volumes accumulate many deleted entries, vacuum reclaims wasted space and improves storage efficiency without manual intervention.
The key design advantage is that compaction happens on dedicated workers, not directly on volume servers. The worker collects all EC shards for a target volume into its local -workingDir, vacuums the data locally, and then distributes the compacted shards back to volume servers. Volume servers are not overloaded with the CPU and disk work of compaction; they only serve controlled shard transfers and receive the replacement shards.
What Can It Do?
- Detect Deleted Data: Identifies EC volumes where deleted needles are consuming significant space (configurable threshold).
- Compact Volumes: Removes deleted needles from EC shards, reclaiming storage space across the cluster.
- Offload Heavy Work: Performs shard compaction in the worker’s local working directory instead of on volume servers.
- Improve Efficiency: Eliminates deleted data so the cluster no longer pays parity overhead on space occupied by deleted needles.
- Reduce Volume Server Load: Keeps volume servers focused on normal serving work while workers handle vacuum CPU, temporary disk, and coordination.
- Rack-Aware Optimization: Maintains shard distribution across failure domains during compaction.
Why Do You Need It?
In erasure-coded systems, when files are deleted, the space they occupied in the EC shards remains allocated. Over time, this creates “storage waste”:
| Scenario | Deleted Ratio | Problem | EC Vacuum Solution |
|---|---|---|---|
| Fresh volume | 5% | Minimal waste, no action needed | No vacuum triggered |
| Aging dataset | 30% | 30% of shard space wasted on deleted data | Detects and triggers vacuum |
| Old archive | 50% | Half the volume is deleted, still consumes parity overhead | Compacts to reclaim 50% space |
For example, a 100GB EC 10+4 volume with 40% deleted data:
- Without vacuum: Still consumes 100GB + parity (140GB total with 10+4 ratio)
- After vacuum: Compacted to ~60GB + parity (~84GB total), a 40% space savings
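The before/after arithmetic can be sketched as a small helper (an illustration, not SeaweedFS code; the 1.4x factor is simply data-plus-parity shards divided by data shards for a 10+4 layout):

```python
def ec_footprint_gb(logical_gb: float, reclaimed_ratio: float,
                    data_shards: int = 10, parity_shards: int = 4) -> float:
    """On-disk footprint of an EC volume after reclaiming a fraction of its data."""
    live_gb = logical_gb * (1.0 - reclaimed_ratio)
    overhead = (data_shards + parity_shards) / data_shards  # 1.4x for 10+4
    return live_gb * overhead

before = ec_footprint_gb(100, 0.0)  # deleted data still on disk: 100 GB + parity
after = ec_footprint_gb(100, 0.4)   # 40% reclaimed: ~60 GB + parity
```

The same helper shows why higher-density layouts change the numbers: with 20+4, overhead drops to 1.2x, so the absolute savings from vacuum shrink even though the reclaimed ratio is the same.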
How Does It Work?
The EC Vacuum worker runs as a scheduled background task on your cluster. When a volume crosses the deleted-data threshold, the worker pulls the full EC shard set into local storage, compacts it, verifies the result, and places the compacted shards back onto the cluster.
EC Volume Vacuum: worker-local compaction
```
Volume Servers                                            Volume Servers
holding old shards                                       receiving shards

+-----------+  1. stream EC shards                        +-----------+
| volume A  | -----\                             /----->  | volume D  |
+-----------+       \                           /         +-----------+
+-----------+        \                         /          +-----------+
| volume B  | ------> +--------------------+ ----->       | volume E  |
+-----------+         | EC Vacuum Worker   |              +-----------+
+-----------+        /| - collect shards   |\             +-----------+
| volume C  | -----/  | - vacuum locally   | \----->      | volume F  |
+-----------+         | - verify output    |              +-----------+
                      | - distribute shards|
                      +--------------------+
                        local -workingDir

2. remove deleted needles locally
3. rebuild compacted EC shards
4. distribute compacted shards back with rack-aware placement
```
Detection Phase
The worker periodically scans EC volumes in the cluster:
- Analyzes each EC volume’s shard composition
- Calculates the ratio of deleted needles vs. total data
- Compares against the configured threshold (default: 30%)
- Only processes volumes that exceed the threshold
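The decision at the heart of this phase reduces to a ratio comparison. A minimal sketch with illustrative names (the real worker derives these byte counts from shard metadata):

```python
def needs_vacuum(deleted_bytes: int, total_bytes: int,
                 threshold: float = 0.30) -> bool:
    """True when deleted needles make up at least `threshold` of the volume."""
    if total_bytes == 0:
        return False  # empty or unreadable volume: never trigger vacuum
    return deleted_bytes / total_bytes >= threshold

# 30 GB deleted out of a 100 GB volume crosses the default 30% threshold.
```

Volumes below the threshold are skipped entirely, which keeps detection cycles cheap even on clusters with thousands of EC volumes.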
Compaction Phase
When a volume triggers vacuum:
- Analyze: Determines which shards can be safely compacted
- Collect: Streams all required EC shards from volume servers into the worker’s local `-workingDir`
- Vacuum Locally: Removes deleted needles and rebuilds compacted shard files on the worker
- Verify: Validates that compacted shards match the original data
- Distribute: Places compacted shards back across the cluster using rack-aware placement
- Cleanup: Removes old oversized shards and temporary data
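The local vacuum step can be modeled on needle records (a simplified sketch; actual EC shards are binary files, and the tuple layout here is purely illustrative):

```python
def vacuum_needles(needles):
    """Drop deleted needles, preserving order; return survivors and bytes reclaimed.

    Each needle is modeled as (needle_id, size_bytes, deleted).
    """
    live = [n for n in needles if not n[2]]
    reclaimed = sum(size for _, size, deleted in needles if deleted)
    return live, reclaimed

needles = [(1, 50, False), (2, 30, True), (3, 20, False)]
live, reclaimed = vacuum_needles(needles)
# needles 1 and 3 survive; 30 bytes are reclaimed
```

After this compaction, the worker re-encodes the surviving data into a fresh EC shard set before the verify and distribute phases.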
Technology Advantages
- No in-place volume-server vacuum: Volume servers are not asked to compact EC shards on their own disks. They serve shard data to the worker and receive compacted replacements.
- Resource isolation: CPU, temporary disk I/O, and shard rewrite work are concentrated on worker nodes that operators can size independently.
- Predictable cluster impact: Global and per-worker concurrency limits control how many volumes are vacuumed at once.
- Network-efficient placement: The worker can distribute compacted shards back to appropriate target nodes while preserving rack-aware EC placement.
- Operational simplicity: Adding more worker capacity increases vacuum throughput without changing the volume server role.
Safety Guarantees
- Only proceeds when sufficient healthy shards exist (>= data shard count)
- Validates data integrity before replacing original shards
- Keeps original shards available until compacted replacements are verified and distributed
- Shard distribution follows the same rack-aware rules as EC repair
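The first guarantee is a simple precondition: Reed-Solomon coding can reconstruct the original data only from at least the data-shard count of healthy shards, so the gate amounts to (illustrative sketch, not the actual SeaweedFS check):

```python
def safe_to_vacuum(healthy_shards: int, data_shards: int = 10) -> bool:
    """Vacuum may proceed only if the original data is fully reconstructible."""
    return healthy_shards >= data_shards

# With a 10+4 layout, 10 or more healthy shards allow vacuum; 9 or fewer block it.
```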
Configuration
EC Vacuum runs as a plugin worker with configurable thresholds:
| Setting | Default | Description |
|---|---|---|
| Detection interval | 30 min | How often to scan for high-deletion volumes |
| Detection timeout | 10 min | Maximum time for a detection scan |
| Min interval | 300s | Minimum seconds between detection runs |
| Deleted ratio threshold | 0.30 | Trigger vacuum if >= 30% of volume is deleted |
| Max jobs per cycle | 100 | Maximum vacuum jobs per detection cycle |
| Global concurrency | 4 | Total concurrent vacuum jobs across cluster |
| Per-worker concurrency | 1 | Concurrent vacuum jobs per worker node |
You can also filter vacuums by collection to focus on specific data sets.
Deployment
EC Vacuum runs as a plugin worker process, integrated with your EC infrastructure. Start one or more weed worker processes that connect to the admin server.
Starting a Worker
EC Vacuum is automatically included when the erasure_coding handler is enabled:
```shell
# Start a worker for EC tasks (includes EC encoding, EC repair, and EC vacuum)
weed worker -admin=admin.example.com:23646 -jobType=erasure_coding \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=2

# Start a worker handling all heavy tasks
weed worker -admin=admin.example.com:23646 -jobType=heavy \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=4

# Start a worker handling all available task types
weed worker -admin=admin.example.com:23646 -jobType=all \
  -workingDir=/var/lib/seaweedfs-plugin
```
Key Options
| Flag | Description |
|---|---|
| `-admin` | Admin server gRPC address (required) |
| `-jobType` | Task types: `erasure_coding` (or `ec`), `heavy`, `all` |
| `-workingDir` | Directory for collected input shards and compacted output during vacuums |
| `-maxExecute` | Max concurrent job executions per worker (default: 4) |
| `-metricsPort` | Prometheus metrics port for monitoring |
| `-id` | Stable worker ID across restarts; auto-generated if omitted |
Production Recommendations
- Run at least 2 worker instances for high availability
- Allocate sufficient `-workingDir` disk space for the largest concurrent EC vacuum job, including collected input shards and compacted output
- Set `-metricsPort` to monitor vacuum progress and troubleshoot issues
- Consider grouping EC workers by region/data center to minimize cross-network transfers
- Vacuum works alongside EC repair without coordination
Kubernetes
Workers are supported in the SeaweedFS Helm chart:
```yaml
worker:
  enabled: true
  replicas: 2
  jobType: "heavy"
  maxExecute: 2
  workingDir: "/var/lib/seaweedfs-plugin"
  metricsPort: 9327
```
Health Checks
Workers expose HTTP endpoints when `-metricsPort` is set:
- `/health`: always returns 200
- `/ready`: returns 200 only when connected to admin
- `/metrics`: Prometheus metrics
Monitoring and Observability
EC Vacuum integrates with SeaweedFS observability:
| Metric | Purpose |
|---|---|
| `vacuum_jobs_detected` | Number of volumes requiring vacuum in the last detection cycle |
| `vacuum_jobs_executed` | Number of successful compactions |
| `vacuum_jobs_failed` | Number of failed compactions |
| `vacuum_bytes_reclaimed` | Total storage space freed across all compactions |
| `vacuum_shard_rebuild_time` | Time taken to rebuild shards during compaction |
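These metrics plug into standard Prometheus alerting. As a hedged sketch, a rule that fires when compactions start failing might look like this (thresholds, labels, and group names are placeholders to adapt to your deployment):

```yaml
groups:
  - name: seaweedfs-ec-vacuum
    rules:
      - alert: ECVacuumJobsFailing
        # Fires if any vacuum compaction failed during the past hour.
        expr: increase(vacuum_jobs_failed[1h]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "SeaweedFS EC vacuum compactions are failing"
```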
Key Benefits for Enterprise
- Storage Optimization: Reclaim wasted space in EC volumes without costly rebalancing
- Volume Server Protection: Heavy compaction runs on workers, so volume servers are not overloaded by vacuum CPU and local disk rewrite work
- Automatic Operation: Vacuum runs continuously in the background
- Custom EC Support: Works with any custom EC ratio (e.g., 20+4, 16+6)
- Rack-Aware Placement: Maintains shard distribution for maximum failure domain diversity
- Detailed Tracking: Activity events during each vacuum for full observability
- Scalable Efficiency: Global coordination prevents overwhelming the cluster with concurrent compactions
- Works with EC Repair: Complements EC repair to maintain both data integrity and storage efficiency
How EC Vacuum Complements Other Enterprise Features
- With EC Repair: EC repair restores fault tolerance; EC vacuum improves efficiency
- With Self-Healing: Self-healing detects shard issues; EC vacuum handles scheduled compaction
- With Custom EC Ratios: Higher-density ratios like 20+4 benefit most from vacuum’s space reclamation