EC Volume Vacuum

SeaweedFS Enterprise includes EC volume vacuum, a background worker that continuously monitors your erasure-coded volumes for deleted data and compacts them by removing deleted needles. When EC volumes accumulate many deleted entries, vacuum reclaims wasted space and improves storage efficiency without manual intervention.

The key design advantage is that compaction happens on dedicated workers, not directly on volume servers. The worker collects all EC shards for a target volume into its local -workingDir, vacuums the data locally, and then distributes the compacted shards back to volume servers. Volume servers are not overloaded with the CPU and disk work of compaction; they only serve controlled shard transfers and receive the replacement shards.

What Can It Do?

  • Detect Deleted Data: Identifies EC volumes where deleted needles are consuming significant space (configurable threshold).
  • Compact Volumes: Removes deleted needles from EC shards, reclaiming storage space across the cluster.
  • Offload Heavy Work: Performs shard compaction in the worker’s local working directory instead of on volume servers.
  • Improve Efficiency: Eliminates deleted data so parity shards no longer carry overhead for needles that will never be read again.
  • Reduce Volume Server Load: Keeps volume servers focused on normal serving work while workers handle vacuum CPU, temporary disk, and coordination.
  • Rack-Aware Optimization: Maintains shard distribution across failure domains during compaction.

Why Do You Need It?

In erasure-coded systems, when files are deleted, the space they occupied in the EC shards remains allocated. Over time, this creates “storage waste”:

Scenario        Deleted Ratio   Problem                                                      EC Vacuum Solution
Fresh volume    5%              Minimal waste, no action needed                              No vacuum triggered
Aging dataset   30%             30% of shard space wasted on deleted data                    Detects and triggers vacuum
Old archive     50%             Half the volume is deleted, still consumes parity overhead   Compacts to reclaim 50% of the space

For example, a 100GB EC 10+4 volume with 40% deleted data:

  • Without vacuum: Still consumes 100GB + parity (140GB total with 10+4 ratio)
  • After vacuum: Compacted to ~60GB + parity (~84GB total), a 40% space savings
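The arithmetic above can be checked with a short sketch (the helper name is illustrative, not part of SeaweedFS):

```python
# Illustrative sketch of EC space accounting for the 10+4 example above.
# The function name is hypothetical; it is not part of the SeaweedFS API.

def total_with_parity(data_gb: float, data_shards: int, parity_shards: int) -> float:
    """Logical data size plus parity overhead for a k+m EC layout."""
    return data_gb * (data_shards + parity_shards) / data_shards

live_gb = 100 * (1 - 0.40)                   # 40% of a 100GB volume is deleted

before = total_with_parity(100, 10, 4)       # 140.0 GB on disk before vacuum
after = total_with_parity(live_gb, 10, 4)    # 84.0 GB on disk after compaction

print(before, after, 1 - after / before)     # 140.0 84.0 0.4 (40% saved)
```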

How Does It Work?

The EC Vacuum worker runs as a scheduled background task on your cluster. When a volume crosses the deleted-data threshold, the worker pulls the full EC shard set into local storage, compacts it, verifies the result, and places the compacted shards back onto the cluster.

              EC Volume Vacuum: worker-local compaction

   Volume Servers                                      Volume Servers
  holding old shards                                  receiving shards

  +-----------+        1. stream EC shards             +-----------+
  | volume A  | -----\                                /| volume D  |
  +-----------+      \                              /  +-----------+
  +-----------+       \                            /   +-----------+
  | volume B  | --------> +--------------------+ ----> | volume E  |
  +-----------+          | EC Vacuum Worker    |       +-----------+
  +-----------+       /  | - collect shards    |   \   +-----------+
  | volume C  | -----/   | - vacuum locally    |    \> | volume F  |
  +-----------+          | - verify output     |       +-----------+
                         | - distribute shards |
                         +--------------------+
                             local -workingDir

       2. remove deleted needles locally
       3. rebuild compacted EC shards
       4. distribute compacted shards back with rack-aware placement

Detection Phase

The worker periodically scans EC volumes in the cluster:

  1. Analyzes each EC volume’s shard composition
  2. Calculates the ratio of deleted needles vs. total data
  3. Compares against the configured threshold (default: 30%)
  4. Only processes volumes that exceed the threshold
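As a sketch, the filtering in steps 2-4 amounts to the following (the stats structure and function names are illustrative, not the actual SeaweedFS Enterprise implementation):

```python
# Hypothetical sketch of the detection filter; field and function names
# are illustrative, not the real SeaweedFS Enterprise internals.
from dataclasses import dataclass

@dataclass
class EcVolumeStats:
    volume_id: int
    total_bytes: int      # live + deleted data across the volume's shards
    deleted_bytes: int    # space still held by deleted needles

def select_vacuum_candidates(volumes, threshold=0.30, max_jobs=100):
    """Return volumes whose deleted ratio meets the threshold, worst first."""
    candidates = [
        v for v in volumes
        if v.total_bytes > 0 and v.deleted_bytes / v.total_bytes >= threshold
    ]
    candidates.sort(key=lambda v: v.deleted_bytes / v.total_bytes, reverse=True)
    return candidates[:max_jobs]   # cap jobs queued per detection cycle

stats = [
    EcVolumeStats(1, 100, 5),    # fresh volume: 5% deleted, skipped
    EcVolumeStats(2, 100, 30),   # aging dataset: 30% deleted, selected
    EcVolumeStats(3, 100, 50),   # old archive: 50% deleted, selected first
]
print([v.volume_id for v in select_vacuum_candidates(stats)])  # [3, 2]
```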

Compaction Phase

When a volume triggers vacuum:

  1. Analyze: Determines which shards can be safely compacted
  2. Collect: Streams all required EC shards from volume servers into the worker’s local -workingDir
  3. Vacuum Locally: Removes deleted needles and rebuilds compacted shard files on the worker
  4. Verify: Validates that compacted shards match the original data
  5. Distribute: Places compacted shards back across the cluster using rack-aware placement
  6. Cleanup: Removes old oversized shards and temporary data
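The heart of steps 3-4 can be sketched as follows. Real EC shards store Reed-Solomon-coded stripes; this simplified model treats a volume as a flat needle list, and all names are illustrative:

```python
# Minimal sketch of "vacuum locally" and "verify": drop deleted needles,
# then check that every live needle survived compaction unchanged.
from dataclasses import dataclass

@dataclass(frozen=True)
class Needle:
    needle_id: int
    data: bytes
    deleted: bool = False

def vacuum_needles(needles):
    """Keep only live needles; these would then be re-encoded into EC shards."""
    return [n for n in needles if not n.deleted]

def verify(original, compacted):
    """Every live needle must be present and byte-identical after compaction."""
    live = {n.needle_id: n.data for n in original if not n.deleted}
    return {n.needle_id: n.data for n in compacted} == live

volume = [Needle(1, b"a"), Needle(2, b"b", deleted=True), Needle(3, b"c")]
compacted = vacuum_needles(volume)
assert verify(volume, compacted)             # safe to distribute and clean up
print([n.needle_id for n in compacted])      # [1, 3]
```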

Technology Advantages

  • No in-place volume-server vacuum: Volume servers are not asked to compact EC shards on their own disks. They serve shard data to the worker and receive compacted replacements.
  • Resource isolation: CPU, temporary disk I/O, and shard rewrite work are concentrated on worker nodes that operators can size independently.
  • Predictable cluster impact: Global and per-worker concurrency limits control how many volumes are vacuumed at once.
  • Network-efficient placement: The worker can distribute compacted shards back to appropriate target nodes while preserving rack-aware EC placement.
  • Operational simplicity: Adding more worker capacity increases vacuum throughput without changing the volume server role.

Safety Guarantees

  • Only proceeds when sufficient healthy shards exist (>= data shard count)
  • Validates data integrity before replacing original shards
  • Keeps original shards available until compacted replacements are verified and distributed
  • Shard distribution follows the same rack-aware rules as EC repair
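The first guarantee is a simple precondition: with a k+m layout, any k healthy shards suffice to reconstruct the volume, so vacuum is skipped below that count. A sketch (names are illustrative):

```python
# Sketch of the healthy-shard precondition: vacuum may only proceed when at
# least data_shards (k) healthy shards exist, the minimum needed to
# reconstruct the volume. Not the actual SeaweedFS implementation.

def can_vacuum(healthy_shards: int, data_shards: int) -> bool:
    return healthy_shards >= data_shards

# For a 10+4 layout (10 data + 4 parity shards):
assert can_vacuum(14, 10)      # all shards healthy: proceed
assert can_vacuum(10, 10)      # exactly enough to reconstruct: proceed
assert not can_vacuum(9, 10)   # unrecoverable without repair first: skip
```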

Configuration

EC Vacuum runs as a plugin worker with configurable thresholds:

Setting                   Default   Description
Detection interval        30 min    How often to scan for high-deletion volumes
Detection timeout         10 min    Maximum time allowed for a detection scan
Min interval              300 s     Minimum time between consecutive detection runs
Deleted ratio threshold   0.30      Trigger vacuum when >= 30% of a volume's data is deleted
Max jobs per cycle        100       Maximum vacuum jobs queued per detection cycle
Global concurrency        4         Total concurrent vacuum jobs across the cluster
Per-worker concurrency    1         Concurrent vacuum jobs per worker node

You can also restrict vacuum to specific collections to focus on particular data sets.

Deployment

EC Vacuum runs as a plugin worker process, integrated with your EC infrastructure. Start one or more weed worker processes that connect to the admin server.

Starting a Worker

EC Vacuum is automatically included when the erasure_coding handler is enabled:

# Start a worker for EC tasks (includes EC encoding, EC repair, and EC vacuum)
weed worker -admin=admin.example.com:23646 -jobType=erasure_coding \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=2

# Start a worker handling all heavy tasks
weed worker -admin=admin.example.com:23646 -jobType=heavy \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=4

# Start a worker handling all available task types
weed worker -admin=admin.example.com:23646 -jobType=all \
  -workingDir=/var/lib/seaweedfs-plugin

Key Options

Flag           Description
-admin         Admin server gRPC address (required)
-jobType       Task types to handle: erasure_coding (or ec), heavy, all
-workingDir    Directory for collected input shards and compacted output during vacuums
-maxExecute    Maximum concurrent job executions per worker (default: 4)
-metricsPort   Prometheus metrics port for monitoring
-id            Stable worker ID across restarts; auto-generated if omitted

Production Recommendations

  • Run at least 2 worker instances for high availability
  • Allocate sufficient -workingDir disk space for the largest concurrent EC vacuum job, including collected input shards and compacted output
  • Set metricsPort to monitor vacuum progress and troubleshoot issues
  • Consider grouping EC workers by region/data center to minimize cross-network transfers
  • Vacuum runs alongside EC repair without requiring manual coordination
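A rough way to size -workingDir, reusing the 100GB 10+4 example from earlier. The 2x factor for holding input shards and compacted output simultaneously is an assumption for this sketch, not a documented formula:

```python
# Back-of-the-envelope -workingDir sizing sketch. Assumes the worker keeps
# both the collected input shards and the compacted output on disk at once;
# treat the result as a starting point, not a documented requirement.

def working_dir_gb(volume_gb, data_shards, parity_shards, max_execute):
    shard_overhead = (data_shards + parity_shards) / data_shards
    per_job = volume_gb * shard_overhead * 2   # input shards + compacted output
    return per_job * max_execute

# Largest EC volume ~100 GB, 10+4 layout, -maxExecute=2:
print(working_dir_gb(100, 10, 4, 2))  # 560.0 GB
```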

Kubernetes

Workers are supported in the SeaweedFS Helm chart:

worker:
  enabled: true
  replicas: 2
  jobType: "heavy"
  maxExecute: 2
  workingDir: "/var/lib/seaweedfs-plugin"
  metricsPort: 9327

Health Checks

Workers expose HTTP endpoints when -metricsPort is set:

  • /health — always returns 200
  • /ready — returns 200 only when connected to admin
  • /metrics — Prometheus metrics

Monitoring and Observability

EC Vacuum integrates with SeaweedFS observability:

Metric                      Purpose
vacuum_jobs_detected        Volumes requiring vacuum in the last detection cycle
vacuum_jobs_executed        Number of successful compactions
vacuum_jobs_failed          Number of failed compactions
vacuum_bytes_reclaimed      Total storage space freed across all compactions
vacuum_shard_rebuild_time   Time taken to rebuild shards during compaction

Key Benefits for Enterprise

  1. Storage Optimization: Reclaim wasted space in EC volumes without costly rebalancing
  2. Volume Server Protection: Heavy compaction runs on workers, so volume servers are not overloaded by vacuum CPU and local disk rewrite work
  3. Automatic Operation: Vacuum runs continuously in the background
  4. Custom EC Support: Works with any custom EC ratio (e.g., 20+4, 16+6)
  5. Rack-Aware Placement: Maintains shard distribution for maximum failure domain diversity
  6. Detailed Tracking: Activity events during each vacuum for full observability
  7. Scalable Efficiency: Global coordination prevents overwhelming the cluster with concurrent compactions
  8. Works with EC Repair: Complements EC repair to maintain both data integrity and storage efficiency

How EC Vacuum Complements Other Enterprise Features

  • With EC Repair: EC repair restores fault tolerance; EC vacuum improves efficiency
  • With Self-Healing: Self-healing detects shard issues; EC vacuum handles scheduled compaction
  • With Custom EC Ratios: Higher-density ratios like 20+4 benefit most from vacuum’s space reclamation