Automatic EC Shard Repair

SeaweedFS Enterprise includes automatic EC shard repair, a background worker that continuously monitors your cluster for missing, duplicate, or inconsistent erasure coding shards — and fixes them without manual intervention. Workers scan → detect → plan → rebuild from healthy shards → distribute across racks → clean up, end-to-end.

Figure: Automatic EC shard repair, illustrated end to end with an EC 4+2 example. Background workers scan the cluster, detect missing, extra, or mismatched shards, plan repairs, rebuild shards using Reed-Solomon math, distribute them to target nodes, and clean up inconsistencies.

When to use it

  • A disk fails or a node drops out, so EC shards go missing and the volume can tolerate fewer further failures (in the worst case, one more loss means data loss). Repair rebuilds them automatically and restores full fault tolerance.
  • A previous repair or rebalance left junk behind — extra or duplicate shard copies pile up. Repair detects and cleans them.
  • Corruption or partial writes produce mismatched shards — shards with inconsistent sizes across copies are spotted and removed.
  • You run EC at scale and can’t babysit it — disk failures and node outages are inevitable; repair handles them continuously without operator intervention.
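
To make the first three conditions concrete, here is a minimal shell sketch of how a shard inventory could be classified. It is illustrative only and is not SeaweedFS's detection logic: the function name classify_shards and the input format (one "shard_id node size_bytes" line per shard copy) are hypothetical, and it assumes "mismatched" means two copies of the same shard ID report different sizes.

```shell
# classify_shards TOTAL  (reads the inventory on stdin)
# Hypothetical input format: "<shard_id> <node> <size_bytes>", one line per shard copy.
# Prints one finding per problematic shard ID, in ascending ID order.
classify_shards() {
  awk -v total="$1" '
    {
      count[$1]++                                  # copies seen for this shard ID
      if (!($1 in size)) size[$1] = $3             # remember the first size seen
      else if (size[$1] != $3) mismatch[$1] = 1    # a later copy disagrees on size
    }
    END {
      for (id = 0; id < total; id++) {
        if (!(id in count))      print "missing " id
        else if (id in mismatch) print "mismatched " id
        else if (count[id] > 1)  print "duplicate " id
      }
    }'
}

# EC 4+2 example: 6 shards with IDs 0..5
printf '%s\n' \
  '0 node-a 1048576' \
  '1 node-b 1048576' \
  '2 node-c 1048576' \
  '2 node-d 1048576' \
  '4 node-e 1048576' \
  '5 node-f 1048576' \
  '5 node-a 524288' | classify_shards 6
```

For this inventory the sketch reports shard 2 as duplicate, shard 3 as missing, and shard 5 as mismatched. A real repair worker would additionally have to decide which copy to keep, which this sketch does not attempt.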

How to use it

EC Repair runs as a plugin worker process, separate from your master and volume servers. Start one or more weed worker processes that connect to the admin server:

# Start a worker for EC tasks (includes both EC encoding and EC repair)
weed worker -admin=admin.example.com:23646 -jobType=erasure_coding \
  -workingDir=/var/lib/seaweedfs-plugin -maxExecute=2

EC Repair is automatically included when the erasure_coding handler is enabled, and ships with sensible defaults — no tuning required to get started. Run at least two workers for high availability.

Benefits

  • Hands-off durability — disk failures and node outages are repaired automatically, keeping your data fully protected without operator intervention.
  • Prevents data loss — a volume with missing shards is restored to full fault tolerance quickly.
  • Safe by design — repairs only proceed when enough healthy shards exist, preventing destructive operations on already-degraded data.
  • Rack-aware rebuilds — rebuilt shards are placed to maximize failure-domain diversity across data centers and racks.
  • Works with any EC ratio — supports custom layouts (e.g. 20+4, 16+6), with batch processing, deduplication, and load-balanced repairs across the cluster.
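
The "safe by design" rule can be illustrated with a little arithmetic. With Reed-Solomon coding, any k distinct shards of a k+m layout are enough to reconstruct the rest, so a rebuild is only possible while at least k healthy shards remain. The sketch below applies that rule; the function name repair_status and its output format are hypothetical, it assumes the healthy count refers to distinct shard IDs, and it says nothing about how the actual worker implements its check.

```shell
# repair_status DATA PARITY HEALTHY
# Prints "<state> missing=<n> margin=<n>", where margin is the number of
# further shard losses the volume can absorb right now (negative if too few remain).
repair_status() {
  data=$1
  parity=$2
  healthy=$3
  missing=$((data + parity - healthy))
  margin=$((healthy - data))
  if [ "$healthy" -lt "$data" ]; then
    state=unrecoverable      # fewer than k shards: reconstruction is impossible
  elif [ "$missing" -eq 0 ]; then
    state=healthy            # nothing to rebuild
  else
    state=repairable         # at least k shards remain: safe to rebuild
  fi
  echo "$state missing=$missing margin=$margin"
}

repair_status 4 2 5     # one shard lost in an EC 4+2 volume
repair_status 4 2 4     # two shards lost: still repairable, but no margin left
repair_status 4 2 3     # three shards lost: cannot be rebuilt
repair_status 20 4 24   # a fully healthy 20+4 volume
```

The second call shows the situation repair is meant to shorten: the volume is still readable and repairable, but one more loss would make it unrecoverable.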

Want the internals — the detection/placement/execution stages, full worker flags, configuration defaults, Kubernetes deployment, and health checks? See the Automatic EC Repair technical reference.