Automatic EC Shard Repair

SeaweedFS Enterprise includes automatic EC shard repair, a background worker that continuously monitors your cluster for missing, duplicate, or inconsistent erasure-coding (EC) shards and fixes them without manual intervention. Workers run the whole cycle end to end: scan → detect → plan → rebuild from healthy shards → distribute across racks → clean up.

A lost shard leaves a volume one step closer to data loss. Repair rebuilds it and restores full fault tolerance, hands-off.

When to use it

A disk fails or a node drops out: EC shards go missing, and the volume can survive fewer further failures before data is lost. Repair rebuilds them automatically and restores full fault tolerance.
A previous repair or rebalance left junk behind: extra or duplicate shard copies pile up. Repair detects them and cleans them up.
Corruption or partial writes produce mismatched shards: shards whose sizes are inconsistent across copies are detected and removed.
You run EC at scale and can’t babysit it: disk failures and node outages are inevitable, and repair handles them continuously without operator intervention.
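
The three conditions above (missing, duplicate, and size-mismatched shards) can be pictured as a check over a shard inventory. The following is only an illustrative sketch, not how the repair worker is implemented: the inventory format (one `shard_id node size_bytes` line per shard copy), the `classify_shards` function name, and the sample data are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch: classify the shards of ONE EC volume.
# Input on stdin: one line per shard copy, "<shard_id> <node> <size_bytes>".
# $1 = total number of shards the layout expects (e.g. 14 for a 10+4 layout).
classify_shards() {
  awk -v total="$1" '
    {
      copies[$1]++                                      # copies seen for this shard id
      if (!($1 in first_size)) first_size[$1] = $3      # remember the first size seen
      else if ($3 != first_size[$1]) mismatch[$1] = 1   # copies disagree on size
    }
    END {
      # Walk the expected shard ids in order and report anything unhealthy.
      for (id = 0; id < total; id++) {
        if (!(id in copies))      print "missing " id
        else if (id in mismatch)  print "mismatched " id
        else if (copies[id] > 1)  print "duplicate " id
      }
    }
  '
}

# Example with a made-up 4-shard inventory:
printf '%s\n' \
  '0 nodeA 1048576' \
  '1 nodeB 1048576' \
  '1 nodeC 1048576' \
  '2 nodeA 1048576' \
  '2 nodeD 524288' | classify_shards 4
# prints:
#   duplicate 1
#   mismatched 2
#   missing 3
```

Size alone does not reveal which copy of a mismatched shard is the bad one, so a real repair needs more information than this sketch uses.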

How to use it

EC Repair runs as a plugin worker process, separate from your master and volume servers. Start one or more weed worker processes that connect to the admin server:

# Start a worker for EC tasks (includes both EC encoding and EC repair)
weed worker -admin=admin.example.com:23646 -jobType=erasure_coding \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=2

EC Repair is automatically included when the erasure_coding handler is enabled, and ships with sensible defaults — no tuning required to get started. Run at least two workers for high availability.

The repair loop runs continuously on workers, so shard health never depends on an operator noticing.

Benefits

Hands-off durability: shards lost to disk failures and node outages are rebuilt automatically, keeping your data fully protected without operator intervention.
Prevents data loss: a volume with missing shards is restored to full fault tolerance quickly.
Safe by design: repairs proceed only when enough healthy shards exist, preventing destructive operations on already-degraded data.
Rack-aware rebuilds: rebuilt shards are placed to maximize failure-domain diversity across data centers and racks.
Works with any EC ratio: supports custom layouts (e.g. 20+4, 16+6), with batch processing, deduplication, and load-balanced repairs across the cluster.
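
To make the "enough healthy shards" rule concrete for an arbitrary data+parity layout, here is a small sketch. It assumes the usual property of Reed-Solomon-style codes that any k of the k+m shards are enough to reconstruct the rest. The exact threshold the repair worker applies is not stated on this page, and the helper names `remaining_tolerance` and `can_repair` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a layout with DATA data shards and PARITY parity shards.

# remaining_tolerance DATA HEALTHY
# Prints how many more shards can be lost while the volume stays recoverable
# (a negative number means the volume is already unrecoverable).
remaining_tolerance() {
  echo $(( $2 - $1 ))
}

# can_repair DATA PARITY HEALTHY
# Succeeds only if the volume is degraded (at least one shard missing) but
# still has at least DATA healthy shards to rebuild from.
can_repair() {
  [ "$3" -lt $(( $1 + $2 )) ] && [ "$3" -ge "$1" ]
}

# Example: a 20+4 layout that has lost 3 shards (21 healthy).
if can_repair 20 4 21; then
  echo "repairable, can still lose $(remaining_tolerance 20 21) more shard(s)"
else
  echo "not repairable, or nothing to repair"
fi
# prints: repairable, can still lose 1 more shard(s)
```

The same check works unchanged for 16+6 or any other ratio, because it depends only on the data-shard count and the number of healthy shards.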

Want the internals — the detection/placement/execution stages, full worker flags, configuration defaults, Kubernetes deployment, and health checks? See the Automatic EC Repair technical reference.