Data Recovery

SeaweedFS Enterprise Data Recovery: recover fast from accidental or malicious deletes by finding and restoring deleted data within the retention window.

SeaweedFS Enterprise Data Recovery adds a retention-based recovery layer for delete events. When -deletionRetention is set to a non-zero duration on the master, SeaweedFS keeps recently deleted bytes through vacuum and uses the filer’s metadata log to discover recoverable deletions. Operators can search, inspect, and restore files from the Admin UI or API without first restoring a cluster snapshot.

Modern storage systems are touched by people, scripts, CI/CD jobs, AI agents, sync tools, and S3 clients. Any one of them can delete the wrong prefix, remove a production dataset, or hide a versioned object behind a delete marker. Attackers and compromised credentials can do the same deliberately. Data Recovery gives operators a practical response window: find what was deleted, verify the blast radius, and restore the affected files before the retention window closes.

Data Recovery is designed for fast, in-place recovery from accidental deletes, automation mistakes, destructive AI-agent actions, malicious deletes, and S3 delete-marker incidents. It complements backups and disaster recovery rather than replacing them: off-cluster backups, replication, and storage-level protection are still needed for disk loss, site loss, credential compromise, and data that has aged past the retention window.

Why It Matters

Deletion is one of the fastest ways to turn a routine mistake into an outage. A single command, workflow, agent action, or compromised account can remove thousands of objects in seconds. Traditional recovery often starts with finding the right backup, restoring a large snapshot, and then manually reconciling what changed. SeaweedFS Enterprise keeps the recovery path close to the data: search the deletion log, select the impacted files, and restore only what is needed.

Use Data Recovery when:

  • An AI agent deletes the wrong files: Autonomous cleanup, migration, or data-preparation agents can act quickly and at scale. Recover the affected paths without rolling back unrelated work.
  • A script or operator deletes the wrong prefix: Restore selected files or batches after a bad rm, lifecycle job, sync rule, or maintenance command.
  • A malicious actor removes data: If deletes are discovered inside the retention window, operators can inspect the affected paths and restore recoverable files while the incident response continues.
  • An S3 client creates delete markers: Remove the latest delete marker so the previous object version becomes visible again.
  • A directory tree is partially removed: Restore selected children and optionally recreate missing parent directories.

What Can It Do?

  • Recover deleted files and selected prefixes: Find deletions within the retention window and restore one file, a selected batch, or results filtered by path.
  • Keep deleted bytes recoverable: Delay garbage collection so tombstoned data remains readable until the retention window expires.
  • Reconstruct deletion context: Replay the filer metadata log to show what was deleted, when, and from which path, without running a separate catalog service.
  • Handle S3 delete markers: Recognize delete markers in versioned buckets and remove the marker so the previous object version becomes current again.
  • Support operational restore workflows: Filter by path prefix, path glob, and time range; then restore in place or to another path.
  • Protect live targets: Choose fail, overwrite, or rename when the restore target already exists.
  • Recover parent directories: Recreate missing parent directories and reuse original metadata when it is still in the metadata log.
  • Preserve object semantics: Restore inline data, chunk manifests, and server-side encryption metadata by copying retained bytes to fresh storage.

How Does It Work?

1. Delayed Garbage Collection (Retention)

Normally, when a file is deleted, SeaweedFS marks its needles as tombstones and reclaims the space on a later volume vacuum. With Data Recovery enabled, the master enforces a retention window: tombstones younger than the window are preserved through vacuum, so the underlying bytes remain readable and recoverable.

Enable it by setting a non-zero retention on the master:

weed master -deletionRetention=72h
# or, when running the combined server:
weed server -master.deletionRetention=72h

Notes:

  • The default is 0, which disables Data Recovery.
  • The value must be non-negative and is not hot-reloadable — restart the master to change it.
  • The auto-installed default 25 TB trial license caps retention at 1 hour; install a full license to use longer windows.
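The retention rule can be modeled in a few lines of Python. This is a hypothetical sketch, not SeaweedFS source: `parse_duration` handles only Go-style `h`/`m`/`s` durations such as the `72h` used above, `effective_retention` applies the 1-hour trial cap from the notes, and `survives_vacuum` keeps a tombstone while it is younger than the window. Times are plain seconds for readability.

```python
import re

# Hypothetical model of the master's retention rule; not SeaweedFS source code.
TRIAL_CAP_SECONDS = 3600  # the trial license caps retention at 1 hour
_UNITS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Parse a Go-style duration such as '72h' or '1h30m' into seconds.

    Only h/m/s units are handled; anything else (including a leading '-')
    is rejected, which also enforces the non-negative requirement.
    """
    if text == "0":
        return 0
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject input containing anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def effective_retention(flag_value: str, trial_license: bool) -> int:
    """Retention window in seconds; 0 means Data Recovery is disabled."""
    seconds = parse_duration(flag_value)
    if trial_license:
        seconds = min(seconds, TRIAL_CAP_SECONDS)
    return seconds


def survives_vacuum(deleted_at: int, now: int, retention: int) -> bool:
    """A tombstoned needle is kept while it is younger than the window."""
    return retention > 0 and now - deleted_at < retention
```

Whether the window boundary is exclusive, as assumed here, or inclusive is not specified above; treat that edge case as an assumption of the sketch.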

2. Metadata Log Analysis

The Admin server subscribes to the filer’s metadata change log and replays events within the retention window on demand. Each deletion is classified as either:

  • A normal deletion: the original file entry was removed.
  • An S3 delete marker: a versioned object was hidden behind a delete marker under <bucket>/<object>/.versions/.
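A rough sketch of that classification follows. The `LogEvent` shape, its `kind` vocabulary, and the `is_delete_marker` flag are hypothetical stand-ins for the real filer event format; the only documented fact it relies on is that delete markers live under `<bucket>/<object>/.versions/`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEvent:  # hypothetical, simplified view of a filer metadata-log event
    ts_ns: int                      # event time in nanoseconds
    kind: str                       # assumed vocabulary: "create", "update", "delete"
    path: str                       # full path of the affected entry
    is_delete_marker: bool = False  # assumed flag on S3 version entries


def classify(event: LogEvent) -> Optional[str]:
    """Return 'normal', 'delete_marker', or None for non-deletion events."""
    parts = event.path.strip("/").split("/")
    # A delete marker is a new entry inside <bucket>/<object>/.versions/.
    in_versions_dir = len(parts) >= 4 and parts[-2] == ".versions"
    if event.kind == "create" and event.is_delete_marker and in_versions_dir:
        return "delete_marker"
    if event.kind == "delete":
        return "normal"  # the original file entry was removed
    return None
```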

The scan is stateless and paginated. Each request reads a fresh window from the log, so the Admin server does not maintain a separate recovery catalog.
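The stateless, paginated scan can be illustrated with a cursor that the caller passes back on each request, combined with the path-prefix, glob, and time-range filters exposed in the Admin UI. `Deletion`, `scan_page`, and every parameter name are hypothetical; timestamps are nanoseconds, and the cursor is simply the timestamp of the last event returned, which assumes timestamps are unique.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional, Tuple


class Deletion(NamedTuple):  # hypothetical, already-classified deletion event
    ts_ns: int
    path: str


def scan_page(log: List[Deletion], now_ns: int, retention_ns: int, *,
              prefix: str = "", glob: Optional[str] = None,
              since_ns: Optional[int] = None, until_ns: Optional[int] = None,
              cursor_ns: Optional[int] = None, limit: int = 100
              ) -> Tuple[List[Deletion], Optional[int]]:
    """Return one page of matching deletions plus a cursor for the next page.

    No state is kept between calls; the caller passes back the cursor.
    """
    start = now_ns - retention_ns  # nothing older than the window is recoverable
    if since_ns is not None:
        start = max(start, since_ns)
    if cursor_ns is not None:
        start = max(start, cursor_ns + 1)  # resume strictly after the last returned event
    page: List[Deletion] = []
    for d in sorted(log):  # order by timestamp (the first tuple field)
        if d.ts_ns < start or (until_ns is not None and d.ts_ns > until_ns):
            continue
        if not d.path.startswith(prefix):
            continue
        if glob is not None and not fnmatchcase(d.path, glob):  # '*' also matches '/'
            continue
        page.append(d)
        if len(page) == limit:
            return page, d.ts_ns
    return page, None
```

Because the cursor carries all of the paging state, nothing needs to be stored between requests. Python's `fnmatch` lets `*` match `/`, so the real glob semantics may be stricter.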

3. Restore

For a normal deletion, SeaweedFS rereads the retained chunks from the volumes, including data still marked deleted, uploads fresh copies, and recreates the file entry at the chosen path. The restore path preserves inline data, chunk manifests, and server-side encryption metadata. For an S3 delete marker, SeaweedFS removes the marker so the previous version becomes visible again.
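The restore step can be illustrated with an in-memory model. `Chunk`, `Entry`, `ToyVolumes`, and `restore_entry` are hypothetical, and chunk manifests are not modeled. The point is the data flow described above: retained bytes are read even though the needle is tombstoned, written to a fresh needle, and the new entry points at the fresh copy while carrying over inline data and extended attributes such as SSE metadata.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Chunk:  # hypothetical chunk reference: volume file id plus position in the file
    file_id: str
    offset: int
    size: int


@dataclass
class Entry:  # hypothetical, simplified filer entry
    path: str
    chunks: List[Chunk]
    inline: bytes = b""  # small files stored directly in the entry
    extended: Dict[str, bytes] = field(default_factory=dict)  # e.g. SSE metadata


class ToyVolumes:
    """In-memory stand-in for volume servers; tombstoned needles stay readable."""

    def __init__(self) -> None:
        self.needles: Dict[str, bytes] = {}
        self.tombstoned: Set[str] = set()
        self._next_id = itertools.count(1)

    def read(self, file_id: str, include_deleted: bool = False) -> bytes:
        if file_id in self.tombstoned and not include_deleted:
            raise KeyError(f"{file_id} is deleted")
        return self.needles[file_id]

    def write(self, data: bytes) -> str:
        file_id = f"new,{next(self._next_id)}"
        self.needles[file_id] = data
        return file_id


def restore_entry(deleted: Entry, volumes: ToyVolumes, target_path: str) -> Entry:
    """Copy retained chunk bytes to fresh needles and rebuild the entry."""
    fresh = [
        Chunk(volumes.write(volumes.read(c.file_id, include_deleted=True)), c.offset, c.size)
        for c in deleted.chunks
    ]
    # Inline data and extended attributes (including SSE metadata) are carried over as-is.
    return Entry(target_path, fresh, deleted.inline, dict(deleted.extended))
```

In this model, the restored entry no longer depends on the original needles, which stay tombstoned and can be reclaimed once they age out of the retention window.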

Options at restore time:

  • Target path: restore in place or to a new location.
  • Conflict mode: fail (leave an existing target untouched), overwrite, or rename (append a timestamp suffix).
  • Parent recovery: recreate missing parent directories, restoring their original metadata where the log still has it.
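The conflict modes and parent recovery can be sketched as two small helpers. `resolve_target` and `missing_parents` are hypothetical names, `exists` stands in for a filer lookup, and the rename suffix format is an assumption, since the exact timestamp format is not documented here.

```python
from datetime import datetime
from posixpath import dirname
from typing import Callable, List


def resolve_target(path: str, exists: Callable[[str], bool], mode: str, now: datetime) -> str:
    """Pick the path a restore should write to under the given conflict mode."""
    if not exists(path):
        return path  # no conflict: every mode restores to the requested path
    if mode == "fail":
        raise FileExistsError(path)  # leave the existing target untouched
    if mode == "overwrite":
        return path
    if mode == "rename":
        # Hypothetical suffix format; the actual timestamp format may differ.
        return f"{path}.{now.strftime('%Y%m%d-%H%M%S')}"
    raise ValueError(f"unknown conflict mode: {mode}")


def missing_parents(path: str, exists: Callable[[str], bool]) -> List[str]:
    """Parent directories that must be recreated, outermost first."""
    missing = []
    parent = dirname(path)
    while parent not in ("", "/") and not exists(parent):
        missing.append(parent)
        parent = dirname(parent)
    return missing[::-1]
```

Restoring each parent's original metadata from the log, when it is still available, is left out of this sketch.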

Using It from the Admin UI

  1. Enable retention on the master (above) and open the Admin UI.
  2. Go to Management -> Data Recovery (/object-store/undelete).
  3. Click Scan, then filter by path prefix, glob, or time range.
  4. Restore a single file with the row action, or select many and use Batch Restore.

The same operations are available over HTTP for automation:

Endpoint                               Method  Purpose
/api/undelete/settings                 GET     Whether recovery is enabled and the retention window
/api/undelete/search                   GET     List recoverable deletions (filterable, streamable)
/api/undelete/event/{eventId}          GET     Details for a single deletion
/api/undelete/event/{eventId}/restore  POST    Restore one file
/api/undelete/restore-batch            POST    Restore many files at once
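For automation, the endpoints can be called from any HTTP client. The sketch below only builds `urllib.request.Request` objects so it can run without a server. The Admin server address, the `prefix` and `limit` query parameters, and the JSON field names (`eventIds`, `targetPath`, `conflictMode`, `restoreParents`) are assumptions, not the documented schema; check your Admin server before relying on them.

```python
import json
from typing import List, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

ADMIN_URL = "http://localhost:23646"  # assumed Admin server address; adjust for your deployment
CONFLICT_MODES = {"fail", "overwrite", "rename"}


def search_request(base: str, prefix: Optional[str] = None, limit: Optional[int] = None) -> Request:
    # "prefix" and "limit" are hypothetical query parameter names.
    params = {}
    if prefix is not None:
        params["prefix"] = prefix
    if limit is not None:
        params["limit"] = str(limit)
    query = urlencode(params)
    url = f"{base}/api/undelete/search" + (f"?{query}" if query else "")
    return Request(url, method="GET")


def _json_post(url: str, body: dict) -> Request:
    return Request(url, data=json.dumps(body).encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/json"})


def restore_request(base: str, event_id: str, target_path: Optional[str] = None,
                    conflict_mode: str = "fail", restore_parents: bool = False) -> Request:
    if conflict_mode not in CONFLICT_MODES:
        raise ValueError(f"unknown conflict mode: {conflict_mode}")
    # Hypothetical JSON field names; the real request schema may differ.
    body = {"conflictMode": conflict_mode, "restoreParents": restore_parents}
    if target_path is not None:
        body["targetPath"] = target_path
    return _json_post(f"{base}/api/undelete/event/{quote(event_id, safe='')}/restore", body)


def batch_restore_request(base: str, event_ids: List[str], conflict_mode: str = "fail",
                          restore_parents: bool = False) -> Request:
    if conflict_mode not in CONFLICT_MODES:
        raise ValueError(f"unknown conflict mode: {conflict_mode}")
    body = {"eventIds": event_ids, "conflictMode": conflict_mode,
            "restoreParents": restore_parents}
    return _json_post(f"{base}/api/undelete/restore-batch", body)
```

To send a request, pass it to `urllib.request.urlopen`, for example `urlopen(search_request(ADMIN_URL, prefix="/buckets/logs/"))`. Authentication, if your deployment requires it, is not shown.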

Why Do Customers Need It?

  • Shorter recovery time: Restore a mistakenly deleted file or selected batch directly from the Admin UI instead of starting with a snapshot restore.
  • Protection against high-blast-radius deletes: Keep a configurable recovery window for bad scripts, wrong-prefix deletes, and delete-heavy operational mistakes.
  • S3-aware recovery: Resolve versioned-bucket delete-marker incidents without manually listing and removing markers.
  • Lower operational overhead: Recovery uses retained data and the existing metadata log, so there is no separate recovery catalog to deploy.
  • Better incident review: Operators can inspect the path, timestamp, object type, and restore target before taking action.

Requirements & Limits

  • Requires a valid SeaweedFS Enterprise license and -deletionRetention greater than 0.
  • Only deletions inside the retention window are recoverable; older events age out of the log and their data is vacuumed away.
  • Objects with a TTL may not be recoverable. TTL expiry runs on its own schedule, independent of the deletion-retention window, so a TTL object’s data can be reclaimed once its TTL elapses — even inside the recovery window.
  • fs.log.purge will not prune metadata-log entries inside the retention window by default (use -force to override), preserving what Data Recovery depends on.
  • Data Recovery complements backups and disaster recovery. Keep independent backups for storage failure, cluster loss, regulatory retention, and security incidents that go beyond recoverable delete events.
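The limits above can be summarized as two checks. This is a hypothetical model in seconds, not the behavior of `fs.log.purge` or the vacuum code: `is_recoverable` treats an object whose TTL has elapsed as unrecoverable, because TTL expiry may already have reclaimed its data, and `purge_cutoff` shows how refusing to purge inside the retention window (unless forced) keeps the log entries Data Recovery needs.

```python
from typing import Optional


def is_recoverable(deleted_at: int, now: int, retention: int,
                   created_at: int = 0, ttl: Optional[int] = None) -> bool:
    """Conservative check (all values in seconds) for whether a deletion can be restored."""
    if retention <= 0:
        return False  # Data Recovery is disabled
    if now - deleted_at >= retention:
        return False  # aged out of the retention window
    if ttl is not None and now >= created_at + ttl:
        return False  # TTL expiry may already have reclaimed the data
    return True


def purge_cutoff(requested_cutoff: int, now: int, retention: int, force: bool = False) -> int:
    """Log entries older than the returned timestamp may be purged."""
    if force or retention <= 0:
        return requested_cutoff
    # Never purge entries newer than the start of the retention window.
    return min(requested_cutoff, now - retention)
```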

Note: If your enterprise license expires, SeaweedFS falls back to the open source behavior and Data Recovery is disabled. Data already retained on disk remains until the normal vacuum reclaims it.