Automatic EC Shard Repair
SeaweedFS Enterprise includes automatic EC shard repair, a background worker that continuously monitors your cluster for missing, duplicate, or inconsistent erasure coding (EC) shards and fixes them without manual intervention. Each repair runs end to end: workers scan, detect, plan, rebuild from healthy shards, distribute across racks, and clean up.
When to use it
- A disk fails or a node drops out — EC shards go missing, leaving the volume with less redundancy than its EC layout provides and closer to data loss. Repair rebuilds them automatically and restores full fault tolerance.
- A previous repair or rebalance left junk behind — extra or duplicate shard copies pile up. Repair detects and cleans them.
- Corruption or partial writes produce mismatched shards — shards with inconsistent sizes across copies are spotted and removed.
- You run EC at scale and can’t babysit it — disk failures and node outages are inevitable; repair handles them continuously without operator intervention.
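The three shard conditions in the list above (missing, duplicate, mismatched) can be sketched as a small classifier. This is an illustration only, not SeaweedFS code: the input format, the `classify_shards` function, and the node names are hypothetical, and it assumes "mismatched" means that copies of the same shard disagree on size.

```sh
#!/bin/sh
# Illustration only: this is not SeaweedFS code and the input format is made up.
# Input on stdin: one line per shard copy, "<shard_id> <server> <size_bytes>".

# classify_shards TOTAL_SHARDS
# Prints one line per problem shard: "missing N", "mismatched N", or "duplicate N".
classify_shards() {
  awk -v total="$1" '
    {
      copies[$1]++
      if (!($1 in size)) size[$1] = $3          # first copy seen sets the reference size
      else if (size[$1] != $3) mismatch[$1] = 1 # a later copy disagrees on size
    }
    END {
      for (id = 0; id < total; id++) {
        if (!(id in copies))      print "missing " id
        else if (id in mismatch)  print "mismatched " id
        else if (copies[id] > 1)  print "duplicate " id
      }
    }'
}

# Hypothetical 4+2 layout (6 shards): shard 1 has an extra copy,
# shard 2 has copies that disagree on size, shard 4 is absent.
classify_shards 6 <<'EOF'
0 node-a 1000
1 node-a 1000
1 node-b 1000
2 node-b 1000
2 node-c 900
3 node-c 1000
5 node-d 1000
EOF
```

For the sample inventory this reports `duplicate 1`, `mismatched 2`, and `missing 4`. A real repair worker would act on such findings; the sketch only shows how the three cases differ.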
How to use it
EC Repair runs as a plugin worker process, separate from your master and volume servers. Start one or more weed worker processes that connect to the admin server:
# Start a worker for EC tasks (includes both EC encoding and EC repair)
weed worker -admin=admin.example.com:23646 -jobType=erasure_coding \
    -workingDir=/var/lib/seaweedfs-plugin -maxExecute=2
EC Repair is automatically included when the erasure_coding handler is enabled, and ships with sensible defaults — no tuning required to get started. Run at least two workers for high availability.
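As a minimal sketch of running more than one worker, the script below starts several `weed worker` processes with the same flags as the command above. The worker count, the per-worker working directories, the `run_worker` helper, and the `DRY_RUN` switch are assumptions made for illustration; whether workers need separate working directories is not stated here.

```sh
#!/bin/sh
# Sketch only. Uses the flags shown in the command above; everything else
# (worker count, per-worker directories, DRY_RUN) is an assumption.
ADMIN="${ADMIN:-admin.example.com:23646}"
WORKERS="${WORKERS:-2}"
BASE_DIR="${BASE_DIR:-/var/lib/seaweedfs-plugin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually start them

run_worker() {
  # $1 = worker index; each worker gets its own working directory (assumption)
  dir="$BASE_DIR/worker-$1"
  set -- weed worker -admin="$ADMIN" -jobType=erasure_coding \
    -workingDir="$dir" -maxExecute=2
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    mkdir -p "$dir" && "$@" &
  fi
}

i=1
while [ "$i" -le "$WORKERS" ]; do
  run_worker "$i"
  i=$((i + 1))
done

# When workers were really started, keep the script alive until they exit.
[ "$DRY_RUN" = "1" ] || wait
```

Workers on the same host only cover a crashed worker process. To survive a host failure, run the same command on separate machines.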
Benefits
- Hands-off durability — disk failures and node outages are repaired automatically, keeping your data fully protected without operator intervention.
- Prevents data loss — a volume with missing shards is restored to full fault tolerance quickly.
- Safe by design — repairs only proceed when enough healthy shards exist, preventing destructive operations on already-degraded data.
- Rack-aware rebuilds — rebuilt shards are placed to maximize failure-domain diversity across data centers and racks.
- Works with any EC ratio — supports custom layouts (e.g. 20+4, 16+6), with batch processing, deduplication, and load-balanced repairs across the cluster.
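Two of the benefits above lend themselves to a short sketch: the safety check and rack-aware placement. The code assumes that "enough healthy shards" means at least as many healthy shards as data shards, which is the standard Reed-Solomon requirement, and it uses "pick the rack holding the fewest shards of this volume" as one simple placement heuristic. Both functions and the input format are hypothetical and are not the product's actual algorithm.

```sh
#!/bin/sh
# Illustration only, not SeaweedFS code.

# repair_plan DATA PARITY HEALTHY
# Returns 0 and prints the rebuild count when repair is possible,
# returns 1 when fewer than DATA healthy shards remain.
repair_plan() {
  data=$1 parity=$2 healthy=$3
  missing=$((data + parity - healthy))
  if [ "$healthy" -lt "$data" ]; then
    echo "unrecoverable: $healthy healthy shards, $data required"
    return 1
  fi
  echo "rebuild $missing shards from $healthy healthy ones"
}

# pick_rack
# stdin: "<rack> <shards_of_this_volume_on_rack>" per line.
# Prints the rack holding the fewest shards (ties broken by rack name).
pick_rack() {
  sort -k2,2n -k1,1 | head -n 1 | cut -d' ' -f1
}

repair_plan 20 4 22          # rebuild 2 shards from 22 healthy ones
repair_plan 16 6 15 || true  # unrecoverable: 15 healthy shards, 16 required
printf 'rack-a 3\nrack-b 1\nrack-c 2\n' | pick_rack   # rack-b
```

With a 20+4 layout, any 20 of the 24 shards are sufficient to rebuild the rest, so losing 2 shards is repairable. With 16+6, dropping to 15 healthy shards is not, and a safe repair process should refuse to proceed.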
Want the internals — the detection/placement/execution stages, full worker flags, configuration defaults, Kubernetes deployment, and health checks? See the Automatic EC Repair technical reference.