Iceberg Table Maintenance — Internals, API & Limits
Technical reference for Iceberg Table Maintenance: the five maintenance operations, how detection and scheduling work, the most common threshold and configuration settings, engine compatibility, and concurrency safety.
How it works
- Automatic detection: A detection scan runs periodically (hourly by default) across your table buckets and flags tables that have crossed configurable thresholds, for example too many small files, too many snapshots, or too many manifests. The admin server then schedules maintenance jobs for the flagged tables.
- Runs on workers, not query engines: Maintenance executes on dedicated worker nodes, so it does not take compute away from your query engines.
- Catalog-native and concurrency-safe: Each operation commits a new snapshot through the catalog’s normal metadata path and aborts without committing if the table’s current snapshot changed while the job was planning. This keeps maintenance consistent with concurrent reads and writes.
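The bail-out rule above is a form of optimistic concurrency: record which snapshot was current when planning started, and commit only if it is still current at commit time. The sketch below illustrates that idea against a hypothetical in-memory catalog. Table, Catalog, CommitConflict, and run_maintenance_once are invented for illustration and are not SeaweedFS APIs. For comparison, the Iceberg REST catalog protocol expresses this kind of check through commit requirements such as assert-ref-snapshot-id.

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the table head moved between planning and commit."""


@dataclass
class Table:
    # Hypothetical stand-in for a catalog's table entry.
    current_snapshot_id: int
    snapshots: list = field(default_factory=list)


class Catalog:
    """Toy catalog that commits with a compare-and-swap on the table head."""

    def __init__(self, table: Table):
        self._table = table
        self._lock = threading.Lock()

    def head(self) -> int:
        return self._table.current_snapshot_id

    def commit(self, expected_head: int, new_snapshot_id: int) -> None:
        # Swap the head atomically, but only if no one committed since planning.
        with self._lock:
            if self._table.current_snapshot_id != expected_head:
                raise CommitConflict(
                    f"head moved: expected {expected_head}, "
                    f"found {self._table.current_snapshot_id}"
                )
            self._table.snapshots.append(new_snapshot_id)
            self._table.current_snapshot_id = new_snapshot_id


def run_maintenance_once(catalog: Catalog, new_snapshot_id: int, concurrent_write=None) -> bool:
    """Plan against the current head, then commit. Returns False if it bailed out."""
    planned_head = catalog.head()
    # Planning work (reading manifests, choosing files) would happen here.
    if concurrent_write is not None:
        concurrent_write()  # simulate another writer committing mid-planning
    try:
        catalog.commit(planned_head, new_snapshot_id)
        return True
    except CommitConflict:
        return False  # bail out; this sketch does not retry


catalog = Catalog(Table(current_snapshot_id=1, snapshots=[1]))
print(run_maintenance_once(catalog, new_snapshot_id=2))  # True: head was still 1
print(run_maintenance_once(
    catalog, new_snapshot_id=3,
    concurrent_write=lambda: catalog.commit(catalog.head(), 99),
))  # False: a concurrent writer committed snapshot 99 first
```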
Maintenance operations
The maintenance worker runs five operations, applied in this order:
- Compact data files: Merges small Parquet data files within a partition into larger ones, grouped by partition spec and partition key. Fewer, larger files mean faster scans and less per-file overhead. Compaction bin-packs by default, or can re-sort rows by the table’s sort order for tighter data clustering.
- Rewrite delete files: Consolidates Iceberg position delete files so reads spend less time reconciling merge-on-read deletes.
- Expire snapshots: Removes old table snapshots and cleans up their manifest-list files, always keeping the newest few regardless of age.
- Remove orphan files: Collects every file still referenced by a live snapshot, then deletes unreferenced leftovers from earlier writes and interrupted commits once they are older than the orphan age threshold.
- Rewrite manifests: Consolidates many small manifest files into fewer, larger ones to cut query-planning overhead.
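Compaction planning, as described in the first operation, can be sketched as follows. Files are grouped by partition spec and partition key, only files below target_file_size_mb count as merge candidates, and a partition is skipped until it has at least min_input_files candidates. Both settings are described under Configuration. The source only says compaction "bin-packs", so the first-fit-decreasing packing used here is an illustrative choice, and DataFile and plan_binpack are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int       # partition spec the file was written with
    partition: tuple   # partition key values
    size_bytes: int


def plan_binpack(files, target_file_size_mb=256, min_input_files=5):
    """Map (spec_id, partition) -> list of file groups, each packed up to the target size."""
    target = target_file_size_mb * MB
    groups = defaultdict(list)
    for f in files:
        if f.size_bytes < target:  # only files below the target are merge candidates
            groups[(f.spec_id, f.partition)].append(f)

    plan = {}
    for key, candidates in groups.items():
        if len(candidates) < min_input_files:
            continue  # not enough small files in this partition yet
        bins = []  # each bin is [total_bytes, [files]]
        # First-fit decreasing: place the largest files first.
        for f in sorted(candidates, key=lambda x: x.size_bytes, reverse=True):
            for b in bins:
                if b[0] + f.size_bytes <= target:
                    b[0] += f.size_bytes
                    b[1].append(f)
                    break
            else:
                bins.append([f.size_bytes, [f]])
        # A single-file bin would just rewrite the file unchanged, so drop it.
        plan[key] = [b[1] for b in bins if len(b[1]) > 1]
    return plan
```

For example, six 100 MB files in one partition produce three groups of two files each (a third 100 MB file would push a group past 256 MB), while a partition with only three small files is left alone until more accumulate.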
Configuration
Maintenance is threshold-driven. The most common settings (defaults shown):
| Setting | Default | Purpose |
|---|---|---|
| operations | all | Which operations to run (compact, rewrite_position_delete_files, expire_snapshots, remove_orphans, rewrite_manifests) |
| target_file_size_mb | 256 | Target size for compacted data files; smaller files are merge candidates |
| min_input_files | 5 | Minimum small files in a partition before compaction runs |
| rewrite_strategy | binpack | binpack, or sort to cluster rows by the table’s sort order |
| snapshot_retention_hours | 168 | Age (7 days) past which snapshots may be expired |
| max_snapshots_to_keep | 5 | Newest snapshots always kept, regardless of age |
| orphan_older_than_hours | 72 | Minimum age (3 days) before an unreferenced file is treated as an orphan |
| min_manifests_to_rewrite | 5 | Minimum manifests before they are consolidated |
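The two snapshot settings in the table interact: a snapshot is a candidate for expiry only if it is older than snapshot_retention_hours and is not among the max_snapshots_to_keep newest snapshots. The sketch below encodes that rule. It assumes the current snapshot is also the newest one, so it is always protected. Snapshot and snapshots_to_expire are hypothetical names, and treating a snapshot that sits exactly at the cutoff as still retained is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int  # commit time in epoch milliseconds


def snapshots_to_expire(snapshots, now_ms, snapshot_retention_hours=168, max_snapshots_to_keep=5):
    """Return IDs of snapshots that are past retention AND outside the newest N."""
    cutoff_ms = now_ms - snapshot_retention_hours * 3600 * 1000
    newest_first = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    protected = {s.snapshot_id for s in newest_first[:max_snapshots_to_keep]}
    return sorted(
        s.snapshot_id
        for s in snapshots
        if s.timestamp_ms < cutoff_ms and s.snapshot_id not in protected
    )


DAY_MS = 24 * 3600 * 1000
daily = [Snapshot(i, i * DAY_MS) for i in range(10)]  # one snapshot per day, days 0-9
print(snapshots_to_expire(daily, now_ms=10 * DAY_MS))  # [0, 1, 2]
```

With the defaults, the cutoff on day 10 is day 3, so only the snapshots from days 0 to 2 are eligible. A table with just four snapshots keeps all of them no matter how old they are, because every one falls inside the newest five.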
Delete-file compaction has its own thresholds (delete_target_file_size_mb, delete_min_input_files, and related knobs) for grouping and sizing rewritten delete files.
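The orphan_older_than_hours guard in the table exists because a file that a writer has uploaded but not yet committed looks exactly like an orphan: no snapshot references it yet. Only unreferenced files older than the window are deleted. The sketch below shows that selection rule. find_orphans is a hypothetical helper, and it assumes the caller has already listed the table's files with their last-modified times and collected every path reachable from live snapshots.

```python
def find_orphans(all_files: dict, referenced_paths: set, now_ms: int, orphan_older_than_hours: int = 72):
    """Return unreferenced paths whose last-modified time is older than the safety window.

    all_files: path -> last-modified time in epoch milliseconds.
    referenced_paths: every path reachable from a live snapshot
    (data, delete, manifest, and manifest-list files).
    """
    cutoff_ms = now_ms - orphan_older_than_hours * 3600 * 1000
    return sorted(
        path
        for path, mtime_ms in all_files.items()
        if path not in referenced_paths and mtime_ms < cutoff_ms
    )
```

A file written an hour ago by an in-flight commit is skipped even though nothing references it yet. The same unreferenced file would be removed once it is more than 72 hours old.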
Engine compatibility
Works with any engine that reads Iceberg tables through the SeaweedFS Iceberg REST catalog, including Apache Spark, Trino, Dremio, and DuckDB.
For the full configuration reference and the latest options, see the Iceberg Table Maintenance wiki.