zstd Storage Compression

SeaweedFS Enterprise compresses newly written data with zstd (Zstandard) by default, where the open source edition uses gzip. Zstandard typically gives a better compression ratio and several times faster decompression. It is fully backward compatible with existing gzip data and requires no configuration.

The codec only affects new writes. Existing gzip data keeps reading exactly as before, gzip and zstd data coexist, and there is nothing to migrate.

Why zstd?

gzip (DEFLATE) is over 30 years old. Zstandard is a modern compression algorithm built for the read-heavy, large-scale workloads that storage systems serve. The practical wins:

Property            | gzip (DEFLATE) | zstd
Compression ratio   | Baseline       | Typically 10–30% smaller
Decompression speed | Baseline       | Several times faster (often 3–5×)
Compression speed   | Baseline       | Comparable or faster at default levels
Tunable levels      | Limited        | Wide range (fast → maximum ratio)

Better ratio = lower cost

Every byte saved by compression is a byte you don’t have to store, replicate, or erasure-code. Because zstd typically compresses text, logs, JSON, CSV, and similar data noticeably smaller than gzip, the savings compound across replication and EC overhead. Combined with Customizable Erasure Coding, zstd reduces your raw data before parity overhead is applied.
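To make the compounding concrete, here is a small arithmetic sketch. The dataset size and both compression ratios are hypothetical numbers chosen for illustration, not measurements, and stored_gb is a helper invented for this example.

```shell
# stored_gb LOGICAL_GB COMPRESSED_PCT OVERHEAD_X100
# Prints the raw capacity consumed in GB (integer, rounded down).
#   COMPRESSED_PCT: compressed size as a percentage of the logical size
#   OVERHEAD_X100:  redundancy multiplier times 100
#                   (300 = three full copies, 120 = 20+4 erasure coding)
stored_gb() {
  echo $(( $1 * $2 * $3 / 10000 ))
}

# Hypothetical: 1000 GB of logs, gzip output at 25% and zstd output at 20%
# of the original size (zstd 20% smaller than gzip).
stored_gb 1000 25 300   # gzip, three copies       -> 750
stored_gb 1000 20 300   # zstd, three copies       -> 600
stored_gb 1000 20 120   # zstd, 20+4 erasure coding -> 240
```

The 50 GB saved per copy becomes 150 GB across three copies, because every redundant copy or parity shard is built from the already-compressed bytes.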

Faster reads

In a storage system, data is typically written once and read many times. zstd’s standout property is decompression speed: it typically decodes several times faster than gzip. That means lower read latency and less CPU spent serving your traffic.

Tunable per workload

The compression level is a simple knob. Run latency-sensitive clusters at a fast level and archival clusters at a high-ratio level — whatever fits the workload.

Only compresses what benefits

SeaweedFS only keeps the compressed copy when the data actually gets smaller. Already-compressed formats (JPEG, MP4, ZIP, etc.) are stored as-is, so you never pay CPU to “compress” data that won’t shrink.
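The "keep it only if it shrinks" rule can be sketched as a size comparison. This is an illustration of the idea, not SeaweedFS's implementation: keep_compressed is a hypothetical helper, and gzip stands in for the codec only because it is installed almost everywhere.

```shell
# keep_compressed FILE
# Prints "compressed" if the compressed output is smaller than the input,
# otherwise "as-is". gzip is a stand-in codec for this sketch.
keep_compressed() {
  # $(( ... )) normalizes wc output, which some platforms pad with spaces.
  original=$(( $(wc -c < "$1") ))
  packed=$(( $(gzip -c "$1" | wc -c) ))
  if [ "$packed" -lt "$original" ]; then
    echo "compressed"
  else
    echo "as-is"
  fi
}

# Usage (paths are placeholders):
#   keep_compressed access.log   # repetitive text usually shrinks
#   keep_compressed photo.jpg    # already-compressed data usually does not
```

A real implementation can also skip the attempt entirely for formats known to be compressed; this sketch only shows the final size check.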

How It Works

You don’t have to track which codec was used for which data — SeaweedFS detects it automatically on read. This has two important consequences:

  • No migration. Existing gzip data reads identically forever; only new writes use zstd.
  • Mixed data is fine. gzip and zstd data coexist across your cluster, so you can adopt zstd gradually.
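This page does not describe how detection is implemented. One common approach, shown here as an assumption, is to look at the leading magic bytes: a zstd frame starts with 28 b5 2f fd and a gzip stream starts with 1f 8b. The helper name detect_codec is hypothetical.

```shell
# detect_codec FILE
# Prints "zstd", "gzip", or "none" based on the first bytes of FILE.
# Assumption: detection by magic number; SeaweedFS may do it differently.
detect_codec() {
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
  case "$magic" in
    28b52ffd) echo "zstd" ;;   # zstd frame magic 0xFD2FB528, little-endian
    1f8b*)    echo "gzip" ;;   # gzip header bytes ID1, ID2
    *)        echo "none" ;;
  esac
}

# Usage (path is a placeholder):
#   detect_codec some-blob.bin
```

Because each blob identifies its own codec, gzip and zstd data can sit side by side without any external bookkeeping.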

Reading compressed data never requires an enterprise license — only writing zstd does. If a license expires or does not permit zstd, new writes simply fall back to gzip while all existing data stays fully readable.

Configuration

zstd is the default, so most deployments need no configuration. Two flags let you tune or change it per process:

Flag                | Default | Description
-compression.method | zstd    | Codec for newly stored data: zstd (default) or gzip. zstd is used automatically when the license permits it, and falls back to gzip otherwise.
-compression.level  | 0       | zstd level for newly stored data. 0 means a balanced default. Higher values favor a smaller output at more CPU; lower values favor speed. Follows the zstd CLI level scale. Ignored for gzip.

These flags are available on every process that writes data:

weed volume · weed filer · weed s3 · weed mount · weed upload · weed server · weed mini

(weed server and weed mini apply the setting to their embedded components.)

Examples

# Default: new data is compressed with zstd automatically — no flag needed
weed volume -dir=/data -mserver=master:9333

# Tune the zstd level for a colder dataset (higher ratio, more CPU)
weed filer -master=master:9333 -compression.level=19

# Lower level for a latency-sensitive workload (faster, slightly larger output)
weed s3 -filer=filer:8888 -compression.level=3

# Opt back out to gzip if you need to
weed volume -dir=/data -mserver=master:9333 -compression.method=gzip

# All-in-one for a small deployment (zstd by default)
weed server -dir=/data -s3

Kubernetes (Helm values)

# zstd is the default — set extra args only to tune the level or opt out
volume:
  extraArgs:
    - "-compression.level=3"
filer:
  extraArgs:
    - "-compression.method=gzip"   # opt back out to gzip

What happens at startup

  • When the license permits zstd (including the free development tier), new data is compressed with zstd automatically.
  • When the license does not permit zstd, SeaweedFS logs a warning and falls back to gzip for new writes — it does not fail to start, and existing data stays readable.
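The two startup rules above amount to a small decision function. The sketch below is illustrative only: effective_codec is a hypothetical helper, the "yes"/"no" license argument is an assumption, and the warning text is made up rather than SeaweedFS's actual log line.

```shell
# effective_codec REQUESTED LICENSE_ALLOWS_ZSTD
#   REQUESTED:           value of -compression.method (zstd or gzip)
#   LICENSE_ALLOWS_ZSTD: "yes" or "no" (hypothetical input for this sketch)
# Prints the codec used for new writes. Never fails: it falls back instead.
effective_codec() {
  if [ "$1" = "zstd" ] && [ "$2" != "yes" ]; then
    # Illustrative message, not the real log output.
    echo "warning: license does not permit zstd, using gzip for new writes" >&2
    echo "gzip"
  else
    echo "$1"
  fi
}

effective_codec zstd yes   # -> zstd
effective_codec zstd no    # -> gzip (plus a warning on stderr)
effective_codec gzip yes   # -> gzip (explicit opt-out)
```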

Processes do not all need to use the same codec, because the codec is detected automatically on read.

Rollout and Compatibility

Upgrade volume servers first

Upgrade all volume servers to the enterprise binary before rolling zstd out to writers. Because zstd is the default, a writer (filer, S3 gateway, mount, etc.) on the enterprise binary will produce zstd data automatically — so your volume servers must be ready for it first.

Until every volume server is upgraded, pin your writers to gzip with -compression.method=gzip. Upgraded volume servers can always read zstd data, even if they are unlicensed or run with -compression.method=gzip themselves, because reading is unconditional.
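One way to encode the pinning rule in a deployment script is sketched below. writer_codec_flag is a hypothetical helper, and how you count upgraded volume servers is left to your own inventory or tooling; only the -compression.method=gzip flag comes from this page.

```shell
# writer_codec_flag UPGRADED TOTAL
# Prints the flag that pins a writer to gzip while any volume server is
# still on the old binary, and prints nothing once all are upgraded.
writer_codec_flag() {
  if [ "$1" -lt "$2" ]; then
    echo "-compression.method=gzip"
  fi
}

# Example: 7 of 10 volume servers upgraded, so the filer stays on gzip.
echo "weed filer -master=master:9333 $(writer_codec_flag 7 10)"
```

Once the counts match, the helper prints nothing and the writer starts with the default codec.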

Downgrading to open source

Open source SeaweedFS cannot read zstd data. Do not downgrade a cluster that already contains zstd data to a pure open source build. A license is only needed to write zstd, not to read it — so staying on the enterprise binary keeps all data readable even after a license expires.

Key Benefits for Enterprise

  1. Lower storage cost: zstd typically compresses your data smaller than gzip, and the savings compound across replication and erasure coding.
  2. Faster reads: zstd decodes several times faster than gzip, lowering read latency and CPU on the hot path.
  3. No migration: existing gzip data keeps reading, and gzip and zstd data coexist.
  4. Zero-risk reads: reading never requires a license, so data stays readable across license changes and gradual rollouts.
  5. On by default: zstd is used automatically with no configuration; tune it with -compression.level or opt out with -compression.method=gzip.
  6. Tunable per workload: fast levels for latency-sensitive clusters, high-ratio levels for archival data.

How zstd Compression Complements Other Enterprise Features

  • With Customizable Erasure Coding: zstd shrinks the data before parity overhead is applied, multiplying the storage savings of high-density EC ratios like 20+4.
  • With Self-Healing Storage: self-healing continues to protect zstd data exactly as it does gzip data.