Standard

Iceberg Table Spec

Summary

What it is

The specification that defines how a logical table is represented as metadata files, manifest lists, manifests, and data files on object storage. It provides ACID transactions, schema evolution, hidden partitioning, and time travel.

Where it fits

The Iceberg spec is the blueprint that Apache Iceberg implements. It defines the metadata tree structure that turns a collection of Parquet files on S3 into a reliable, evolvable table, and it lets any engine that implements the spec read the same table consistently.
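
The metadata tree can be pictured with a minimal sketch. The class and field names below (TableMetadata, plan_files, the bucket path) are illustrative assumptions, not the spec's actual field names or any library's API. The sketch only shows the shape of the tree: table metadata points at snapshots, each snapshot points at a manifest list, each manifest list points at manifests, and manifests point at data files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataFile:
    path: str          # e.g. a Parquet object key
    record_count: int


@dataclass
class Manifest:
    path: str
    data_files: List[DataFile]


@dataclass
class ManifestList:
    path: str
    manifests: List[Manifest]


@dataclass
class Snapshot:
    snapshot_id: int
    manifest_list: ManifestList


@dataclass
class TableMetadata:
    location: str
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def commit(self, snapshot: Snapshot) -> None:
        # Toy stand-in for a commit: record the snapshot and make it current.
        self.snapshots[snapshot.snapshot_id] = snapshot
        self.current_snapshot_id = snapshot.snapshot_id

    def plan_files(self, snapshot_id: Optional[int] = None) -> List[str]:
        # Walk snapshot -> manifest list -> manifests -> data files.
        sid = self.current_snapshot_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        snap = self.snapshots[sid]
        return [f.path for m in snap.manifest_list.manifests for f in m.data_files]


m1 = Manifest("m1.avro", [DataFile("data/a.parquet", 100)])
m2 = Manifest("m2.avro", [DataFile("data/b.parquet", 50)])

table = TableMetadata(location="s3://bucket/warehouse/events")  # hypothetical path
table.commit(Snapshot(1, ManifestList("snap-1.avro", [m1])))
table.commit(Snapshot(2, ManifestList("snap-2.avro", [m1, m2])))  # reuses m1

print(table.plan_files())               # files visible in the current snapshot
print(table.plan_files(snapshot_id=1))  # files visible in the older snapshot
```

Because a reader starts from one snapshot and only follows pointers downward, every engine that walks the same tree sees the same set of files. Reading an older snapshot ID is, in this toy model, what time travel amounts to.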

Misconceptions / Traps

  • The spec defines behavior, not implementation. Different engines (Spark, Flink, Trino) may implement the spec at different levels of completeness.
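  • Metadata accumulates with every write: each commit adds a new snapshot and manifest list, and typically at least one new manifest. Without regular maintenance (expiring snapshots, removing orphan files), metadata keeps growing and query planning can slow down.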
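
The cleanup concern above can be illustrated with a toy model. It is a sketch under stated assumptions: snapshots are plain dict entries, snapshot IDs are assumed to increase with time (real expiry is driven by timestamps and retention settings), and the function names are hypothetical. Real engines expose this work as maintenance operations, for example the expire_snapshots and remove_orphan_files procedures in Iceberg's Spark integration.

```python
from typing import Dict, List, Set, Tuple

# Toy model: snapshot id -> manifest paths that snapshot references.
SnapshotLog = Dict[int, List[str]]


def append_commit(snapshots: SnapshotLog, snapshot_id: int, new_manifest: str) -> None:
    # An append keeps the parent's manifests and adds one new manifest.
    parent = snapshots[max(snapshots)] if snapshots else []
    snapshots[snapshot_id] = parent + [new_manifest]


def rewrite_manifests_commit(snapshots: SnapshotLog, snapshot_id: int, merged: str) -> None:
    # A manifest rewrite replaces many small manifests with one merged manifest.
    snapshots[snapshot_id] = [merged]


def expire_snapshots(snapshots: SnapshotLog, retain_last: int) -> Tuple[SnapshotLog, Set[str]]:
    # Keep the newest `retain_last` snapshots; report manifests nothing references anymore.
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    keep_ids = sorted(snapshots)[-retain_last:]
    retained = {sid: snapshots[sid] for sid in keep_ids}
    still_referenced = {m for ms in retained.values() for m in ms}
    all_manifests = {m for ms in snapshots.values() for m in ms}
    return retained, all_manifests - still_referenced


history: SnapshotLog = {}
for i in (1, 2, 3):
    append_commit(history, i, f"m{i}.avro")
rewrite_manifests_commit(history, 4, "merged.avro")

retained, removable = expire_snapshots(history, retain_last=1)
print(sorted(retained))   # snapshot ids that survive
print(sorted(removable))  # manifests that are now safe to delete
```

Until the old snapshots are expired, their manifests must stay on storage because time travel to those snapshots still needs them. That is why the metadata only shrinks after an explicit expiry step.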

Key Connections

  • enables Lakehouse Architecture — the specification that makes Iceberg-based lakehouses possible
  • solves Schema Evolution (column-ID-based evolution), Partition Pruning Complexity (partition specs in metadata)
  • scoped_to Table Formats, Lakehouse
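
The column-ID-based evolution mentioned above can be sketched as follows. This is a simplified illustration with hypothetical names (file_rows, read_rows), not how any engine actually stores or reads data. It assumes rows in a data file are keyed by field ID, and that the table schema maps each field ID to its current column name.

```python
from typing import Any, Dict, List, Optional

# Toy data file: values are keyed by field ID, never by column name.
file_rows: List[Dict[int, Any]] = [
    {1: 101, 2: "click"},
    {1: 102, 2: "view"},
]

# Schema versions map field ID -> current column name.
schema_v1: Dict[int, str] = {1: "id", 2: "event"}
# v2 renames field 2 and adds field 3; the data file is not rewritten.
schema_v2: Dict[int, str] = {1: "id", 2: "event_type", 3: "country"}


def read_rows(rows: List[Dict[int, Any]], schema: Dict[int, str]) -> List[Dict[str, Optional[Any]]]:
    # Resolve each column by field ID; IDs missing from an old file read as None.
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]


print(read_rows(file_rows, schema_v1))
print(read_rows(file_rows, schema_v2))
```

Because lookups go through the ID, a rename only changes the name in the schema, and a newly added column reads as null from files written before it existed. Neither change requires touching existing data files.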

Definition

What it is

A specification defining how a logical table is represented as a set of data files, manifests, manifest lists, and table metadata files (which track snapshots) on object storage. It provides ACID semantics, schema evolution, hidden partitioning, and time travel.
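
Hidden partitioning can be illustrated with a small sketch. It assumes a table partitioned by a day transform on a timestamp column (days since 1970-01-01). The file paths, the manifest_entries mapping, and the prune function are hypothetical and only show the idea: the query filters on the timestamp itself, and the partition value recorded in metadata is used to skip files.

```python
from datetime import datetime, timezone
from typing import Dict, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    # Partition value: whole days since 1970-01-01 UTC.
    return (ts - EPOCH).days


def utc(year: int, month: int, day: int, hour: int = 0) -> datetime:
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# Toy manifest content: data file path -> recorded partition value day(event_ts).
manifest_entries: Dict[str, int] = {
    "data/f1.parquet": day_transform(utc(2024, 1, 1)),
    "data/f2.parquet": day_transform(utc(2024, 1, 2)),
    "data/f3.parquet": day_transform(utc(2024, 1, 3)),
}


def prune(entries: Dict[str, int], ts_from: datetime, ts_to: datetime) -> List[str]:
    # Convert the timestamp range into a day range, then keep only matching files.
    lo, hi = day_transform(ts_from), day_transform(ts_to)
    return [path for path, day in entries.items() if lo <= day <= hi]


# The query mentions only the timestamp column, never a partition column.
print(prune(manifest_entries, utc(2024, 1, 2, 8), utc(2024, 1, 2, 20)))
```

The user never writes a filter on a derived "day" column. The partition spec stored in table metadata tells the planner how to derive it, which is what makes the partitioning "hidden".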

Why it exists

Files on S3 have no inherent table structure. The Iceberg spec adds a metadata layer that turns a collection of Parquet files into a reliable, evolvable table — without requiring a database server.

Primary use cases

Defining lakehouse tables on S3; multi-engine table access (Spark, Trino, and Flink can all read the same Iceberg table); schema evolution without rewriting data.

Relationships

Resources