Sealed Directories
SeaweedFS Enterprise includes sealed directories, a way to shrink the filer’s metadata store by packing a cold directory’s child entries into compressed, volume-stored segment chunks — leaving the filer with just the directory entry and a small manifest index. The directory stays listable and readable; only writes under it are paused until it is unsealed. On large hash-fanout trees this reduces the filer store for the sealed data by roughly 18–58× (see Example savings below).
The filer keeps one metadata record per file and directory. When a dataset grows into the billions of small objects, that metadata store — not the raw data — becomes the scaling limit: it drives filer memory, disk, and backup size. Sealing folds the cold, rarely-changing part of the namespace out of that store and into ordinary volumes, where it is as cheap to keep as any other data (and can be erasure-coded or cloud-tiered).
Why it is needed
Every entry in SeaweedFS costs a metadata record in the filer store (LevelDB, RocksDB, or SQL). That cost is the same whether a file was written a second ago or a decade ago — cold data keeps paying full price:
- Memory and disk pressure on the filer. A namespace of hundreds of millions of entries makes the metadata store large and slow to open, scan, and compact.
- Backups scale with entry count, not activity. Metadata backup and replication carry every cold record, forever.
- Directory listings and lookups slow down as the store grows, even for hot paths, because everything shares one store.
Yet most of that data never changes. Archives, finished experiments, last quarter’s logs, model checkpoints, and the deep leaves of a hash-fanout layout are written once and then only read. Sealing lets you take that cold metadata out of the hot store without moving or re-encoding the data itself.
How it works
When a directory is sealed, its child entries are serialized, zstd-compressed, and written as segment chunks into regular SeaweedFS volumes. The filer replaces the individual child records with a single manifest on the directory entry — a small, sorted index of (first name, last name, chunk) per segment. The original child records are then purged from the store.
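To make the manifest idea concrete, here is a minimal Go sketch of a point lookup against a sorted (first name, last name, chunk) index. The segment type, its field names, and the chunk IDs are hypothetical. The sketch assumes segments are sorted by name and do not overlap. It leaves out fetching and decompressing the chunk.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a hypothetical in-memory manifest row: the first and last
// child names packed into one segment chunk, and the chunk that holds them.
type segment struct {
	First, Last string
	ChunkID     string
}

// findSegment returns the segment whose [First, Last] range covers name.
// It assumes manifest is sorted by name with non-overlapping ranges.
func findSegment(manifest []segment, name string) (segment, bool) {
	// Index of the first segment whose Last is >= name.
	i := sort.Search(len(manifest), func(i int) bool {
		return manifest[i].Last >= name
	})
	if i == len(manifest) || name < manifest[i].First {
		return segment{}, false // name falls outside every segment
	}
	return manifest[i], true
}

func main() {
	manifest := []segment{
		{First: "a000", Last: "a0ff", ChunkID: "chunk-1"},
		{First: "a100", Last: "a1ff", ChunkID: "chunk-2"},
		{First: "b000", Last: "b0ff", ChunkID: "chunk-3"},
	}
	for _, name := range []string{"a150", "b010", "c000"} {
		seg, ok := findSegment(manifest, name)
		fmt.Println(name, ok, seg.ChunkID)
	}
}
```

A binary search keeps lookups logarithmic in the number of segments. Only the one segment that can hold the name then needs to be fetched and decompressed.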
- Reads keep working. Listing or looking up inside a sealed directory is served from the manifest, with the decompressed segments cached in memory. A sealed directory is indistinguishable from a normal one to a reader.
- Writes are fenced. Creating, deleting, renaming, or modifying anything under a sealed directory is rejected until the directory is unsealed — sealing is meant for data that has gone cold.
- The data is just volume data. The segment chunks live in ordinary volumes, so they inherit erasure coding, cloud tiering, and replication like everything else. Sibling directories sealed in one pass share needles, keeping uploads and needle counts low.
- Reversible. Running fs.unseal /path in weed shell materializes the children back into the filer store, exactly as they were.
Sealing is crash-safe end to end: an evented fence, a build journal, and replay-on-recovery mean a filer (or worker) that dies mid-seal is finished or rolled back automatically, and the change converges across filer peers.
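The recovery guarantee can be pictured as a small state machine over journal phases. The Go sketch below is hypothetical. The phase names, their order, and the choice of the manifest commit as the point of no return are all assumptions made for illustration, not the documented journal format.

```go
package main

import "fmt"

// sealPhase is a hypothetical journal state for one seal operation.
type sealPhase int

const (
	phaseFenced   sealPhase = iota // writes fenced; nothing packed yet
	phaseSegments                  // segment chunks written to volumes
	phaseManifest                  // manifest committed on the directory entry
	phasePurged                    // child records removed; seal complete
)

// recoveryAction decides what a restarted filer or worker does with a
// journal entry left by an interrupted seal. In this sketch the manifest
// commit is treated as the point of no return.
func recoveryAction(p sealPhase) string {
	switch {
	case p < phaseManifest:
		// Child records are still authoritative: discard orphaned
		// segments and lift the fence.
		return "roll back"
	case p == phaseManifest:
		// The manifest is durable: finish by purging the child records.
		return "finish"
	default:
		return "done"
	}
}

func main() {
	names := []string{"fenced", "segments", "manifest", "purged"}
	for p := phaseFenced; p <= phasePurged; p++ {
		fmt.Printf("%-8s -> %s\n", names[p], recoveryAction(p))
	}
}
```

Using one durable write as the commit point lets a half-finished operation be resolved in a single direction on restart. Before that point, rolling back loses nothing. After it, finishing is always possible.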
Example savings
How much you save depends on the workload — how much of the tree you seal and how large the directories are. Larger directories amortize the per-manifest overhead better, so they pack more efficiently and reach a higher reduction; a layout of many tiny directories reaches a lower one.
The figures below come from an offline repack of a live cluster’s filer store: 15.77 million entries across 457,795 directories, averaging 62 bytes per entry — a hash-fanout layout dominated by small directories (most hold 10–63 entries).
| What was sealed | Filer-store size for those entries | Residual index kept in the filer store | Reduction |
|---|---|---|---|
| Larger directories only (~6.5% of entries) | 67.3 MiB | 1.2 MiB | ≈58× |
| The whole tree (tiny directories included) | 933.5 MiB | 52.4 MiB | ≈18× |
On this fanout tree, sealing essentially everything shrank the filer store for that data by about 18×; restricting sealing to the larger directories reached about 58× on the portion sealed. Broadly, expect roughly 18–58× less filer metadata for a sealed cold subtree — toward the higher end when directories are large.
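The figures above can be cross-checked with a little arithmetic. This Go snippet recomputes the raw store size from the entry count and average entry size, then computes the reduction ratios from the table's MiB values. With these rounded inputs it prints about 932.4 MiB, 56.1x, and 17.8x. The table's ≈58× presumably comes from unrounded byte counts, which are not given here.

```go
package main

import "fmt"

const mib = 1 << 20 // bytes per MiB

// reduction is the ratio of the original filer-store size to the
// residual index left behind after sealing.
func reduction(beforeMiB, afterMiB float64) float64 {
	return beforeMiB / afterMiB
}

func main() {
	// 15.77 million entries at about 62 bytes each.
	fmt.Printf("estimated store size: %.1f MiB\n", 15.77e6*62/mib)

	fmt.Printf("larger directories only: %.1fx\n", reduction(67.3, 1.2))
	fmt.Printf("whole tree: %.1fx\n", reduction(933.5, 52.4))
}
```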
Two things to keep in mind about where the data goes:
- The packed child data does not disappear — it moves out of the replicated filer store into ordinary volumes, where it additionally compresses about 4.3–4.8× and can be erasure-coded or cloud-tiered (much cheaper per byte than the filer store).
- Recursive sealing also folds the directory entries themselves into their parents’ manifests, so a fully sealed tree’s residual store cost approaches a single row at the sealed root — the residual figures above are the conservative per-directory bound, so real savings can be higher.
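To estimate where the packed bytes end up, the sketch below takes the whole-tree filer-store size from the table and divides it by the quoted 4.3 to 4.8× compression ratio. It then multiplies the result by a storage overhead factor. That factor is an assumption for illustration: 2 here, as if the volume were kept in two replicas. Erasure coding or cloud tiering would change it.

```go
package main

import "fmt"

// volumeMiB estimates the volume-side footprint of packed child data:
// the filer-store size divided by the compression ratio, times an
// assumed storage overhead factor (replicas or erasure-coding parity).
func volumeMiB(filerMiB, compression, overhead float64) float64 {
	return filerMiB / compression * overhead
}

func main() {
	const wholeTreeMiB = 933.5 // whole-tree filer-store size from the table
	for _, ratio := range []float64{4.3, 4.8} {
		fmt.Printf("compression %.1fx: %.1f MiB stored once, %.1f MiB with 2 replicas\n",
			ratio, volumeMiB(wholeTreeMiB, ratio, 1), volumeMiB(wholeTreeMiB, ratio, 2))
	}
}
```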
Automatic sealing
You can seal on demand from the shell or the Admin UI, or make it a standing policy with the auto_seal background worker. Policy is an ordered list of path-pattern rules stored on the filer at /etc/seaweedfs/seal.conf:
```json
{
  "rules": [
    { "pattern": "/data/**", "idleSeconds": 2592000, "minEntries": 64 },
    { "pattern": "/data/hot/**", "exclude": true },
    { "pattern": "/data/logs/**", "idleSeconds": 7776000 }
  ]
}
```
Here everything under /data is sealed once it has been idle for 30 days and has at least 64 entries, /data/hot is never sealed, and /data/logs waits 90 days. Rules use gitignore-style globs (** spans path separators) and the last matching rule wins, so carve-outs and per-subtree thresholds are easy to express. The idle window is measured from the child files' modification times, so a directory that is still being written is never sealed out from under an active workload.
Managing it
The Admin UI has a Sealed Directories page to edit the rules, seal or unseal a specific directory on demand (with a dry-run preview), and see the effect. The file browser marks sealed directories so operators always know what is read-only. Metrics report per-operation rejections on sealed directories and the count of committed seals, unseals, and repairs.
When to use it
- Very large namespaces where the filer metadata store is the bottleneck — billions of small files, where per-entry metadata dominates memory, disk, and backup size.
- Cold, immutable datasets — archives, completed experiments, aged logs, model checkpoints, or the deep leaves of a hash-fanout tree that are read but never rewritten.
- Keeping listings fast — folding cold entries out of the hot store keeps the working set small.
Because a sealed directory is read-only until unsealed, it is not for data that is still being written — point the rules at prefixes you know have gone cold, and unseal (or let a rule not match) anything that needs to change.