Standard

Iceberg Table Spec

Summary

What it is

The specification that defines how a logical table is represented as metadata files, manifest lists, manifests, and data files on object storage. It provides ACID semantics, schema evolution, hidden partitioning, and time travel.

Where it fits

The Iceberg spec is the blueprint that Apache Iceberg implements. It defines the metadata tree that turns a collection of data files on S3 (typically Parquet; ORC and Avro are also supported) into a reliable, evolvable table, and it lets any engine that implements the spec read the same table consistently.
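The metadata tree can be pictured with a small in-memory model. This is only a sketch: the class names, the `scan` method, and the `s3://bucket/...` paths are hypothetical, and real manifests and manifest lists are Avro files carrying per-file statistics, not Python lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Manifest:
    # A manifest lists data files (hypothetical stand-in for the Avro file).
    data_files: List[str]


@dataclass
class Snapshot:
    snapshot_id: int
    # Stand-in for the manifest list: the manifests that make up this snapshot.
    manifests: List[Manifest]


@dataclass
class TableMetadata:
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def commit(self, snapshot: Snapshot) -> None:
        # Each commit records a new snapshot and moves the current pointer.
        self.snapshots[snapshot.snapshot_id] = snapshot
        self.current_snapshot_id = snapshot.snapshot_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[str]:
        # Walk metadata -> snapshot -> manifests -> data files.
        sid = self.current_snapshot_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        snap = self.snapshots[sid]
        return [f for m in snap.manifests for f in m.data_files]


m1 = Manifest(["s3://bucket/tbl/data/a.parquet"])
m2 = Manifest(["s3://bucket/tbl/data/b.parquet"])

table = TableMetadata()
table.commit(Snapshot(1, [m1]))        # first append
table.commit(Snapshot(2, [m1, m2]))    # second append reuses manifest m1

current_files = table.scan()           # latest snapshot: both files
old_files = table.scan(snapshot_id=1)  # time travel: only the first file
```

Because older snapshots stay in the metadata, reading an earlier table state is just a matter of starting the walk from a different snapshot ID.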

Misconceptions / Traps

The spec defines behavior, not implementation. Different engines (Spark, Flink, Trino) may implement the spec at different levels of completeness.
Metadata accumulates with every write: each commit adds a snapshot and a manifest list, and usually one or more manifests. Without regular maintenance (expire snapshots, remove orphan files), metadata overhead and scan-planning time grow.
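Why expiring snapshots frees metadata can be shown with a toy model. This is a sketch under simplifying assumptions: `expire_snapshots` here is a hypothetical local function that keeps a fixed number of snapshots, the manifest names are made up, and real maintenance jobs typically expire by age and must also account for data files.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical history: snapshot id -> manifests referenced by that snapshot.
history: Dict[int, List[str]] = {
    1: ["m1.avro"],
    2: ["m1.avro", "m2.avro"],
    3: ["m3.avro"],             # e.g. a rewrite that replaced m1 and m2
    4: ["m3.avro", "m4.avro"],
}


def expire_snapshots(
    snapshots: Dict[int, List[str]], keep_last: int
) -> Tuple[Dict[int, List[str]], List[str]]:
    """Keep the newest `keep_last` snapshots; report manifests nothing references."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    kept_ids = sorted(snapshots)[-keep_last:]
    kept = {sid: snapshots[sid] for sid in kept_ids}
    still_referenced: Set[str] = {m for ms in kept.values() for m in ms}
    all_manifests: Set[str] = {m for ms in snapshots.values() for m in ms}
    # Only manifests unreachable from every kept snapshot are safe to delete.
    removable = sorted(all_manifests - still_referenced)
    return kept, removable


kept, removable = expire_snapshots(history, keep_last=2)
```

In this example snapshots 3 and 4 survive, and `m1.avro` and `m2.avro` become removable because no remaining snapshot points at them. Until the old snapshots are expired, those manifests must be retained.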

Key Connections

enables Lakehouse Architecture — the specification that makes Iceberg-based lakehouses possible
solves Schema Evolution (column-ID-based evolution), Partition Pruning Complexity (partition specs in metadata)
scoped_to Table Formats, Lakehouse
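Column-ID-based evolution can be illustrated with a few lines of Python. This is a simplified sketch: the dictionaries below are hypothetical stand-ins for a stored row and for two schema versions, and the point is only that columns are resolved by field ID, not by name.

```python
from typing import Any, Dict, Optional

# A row in a data file written under the original schema. Values are keyed by
# field ID, not by column name (hypothetical in-memory stand-in).
row_in_file: Dict[int, Any] = {1: 42, 2: "alice"}

# Schema at write time: field ID -> column name.
schema_v1: Dict[int, str] = {1: "id", 2: "name"}

# Later schema: field 2 renamed, field 3 added. No data file is rewritten.
schema_v2: Dict[int, str] = {1: "id", 2: "full_name", 3: "email"}


def read_row(row: Dict[int, Any], schema: Dict[int, str]) -> Dict[str, Optional[Any]]:
    """Project a stored row through a schema, resolving columns by field ID."""
    # Fields added after the file was written are absent and read as None here.
    return {name: row.get(field_id) for field_id, name in schema.items()}


as_v1 = read_row(row_in_file, schema_v1)
as_v2 = read_row(row_in_file, schema_v2)
```

Under `schema_v2` the old file still yields `"alice"` for the renamed column, and the added column reads as missing, which is why a rename or an add does not force a rewrite of existing data.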

Definition

What it is

A specification defining how a logical table is represented on object storage as data files, manifests, manifest lists, and metadata files that record snapshots. Provides ACID semantics, schema evolution, hidden partitioning, and time travel.
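Hidden partitioning means the partition value is derived from a source column by a transform recorded in metadata, so queries filter on the column itself. A minimal sketch, assuming a `day` transform on a timestamp column; the file names and the `plan_scan` helper are hypothetical, and real planning also uses per-file column statistics.

```python
from datetime import date, datetime, timezone
from typing import Dict, List

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 in UTC, the kind of value a `day` transform stores."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


# Hypothetical manifest entries: data file -> stored partition value.
manifest: Dict[str, int] = {
    "data/f1.parquet": day_transform(datetime(2024, 3, 1, 8, 0, tzinfo=timezone.utc)),
    "data/f2.parquet": day_transform(datetime(2024, 3, 2, 9, 30, tzinfo=timezone.utc)),
    "data/f3.parquet": day_transform(datetime(2024, 3, 3, 23, 59, tzinfo=timezone.utc)),
}


def plan_scan(entries: Dict[str, int], start: datetime, end: datetime) -> List[str]:
    """Turn a filter on the timestamp column into a filter on partition values."""
    lo, hi = day_transform(start), day_transform(end)
    return sorted(f for f, day in entries.items() if lo <= day <= hi)


# The query filters on the timestamp itself; no partition column is mentioned.
files = plan_scan(
    manifest,
    start=datetime(2024, 3, 2, 0, 0, tzinfo=timezone.utc),
    end=datetime(2024, 3, 2, 23, 59, tzinfo=timezone.utc),
)
```

Only `data/f2.parquet` is selected. The reader never writes a predicate on a separate partition column, which is what "hidden" refers to.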

Why it exists

Files on S3 have no inherent table structure. The Iceberg spec adds a metadata layer that turns a collection of Parquet files into a reliable, evolvable table — without requiring a database server.
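One way to picture how commits stay atomic without a database server managing the data is an optimistic compare-and-swap on a single pointer to the current metadata file. The sketch below is an assumption-laden illustration: the `Catalog` class and the metadata file names are hypothetical, and an in-process lock stands in for whatever atomic operation a real catalog provides.

```python
import threading
from typing import Optional


class Catalog:
    """Hypothetical catalog holding one pointer: the current metadata file."""

    def __init__(self, location: Optional[str] = None) -> None:
        self._location = location
        self._lock = threading.Lock()

    def current(self) -> Optional[str]:
        with self._lock:
            return self._location

    def swap(self, expected: Optional[str], new: str) -> bool:
        # Atomic compare-and-swap: succeeds only if nobody committed in between.
        with self._lock:
            if self._location != expected:
                return False
            self._location = new
            return True


catalog = Catalog("metadata/v1.metadata.json")

# Two writers start from the same base version.
base_a = catalog.current()
base_b = catalog.current()

# Writer A commits first and wins.
a_ok = catalog.swap(base_a, "metadata/v2-a.metadata.json")

# Writer B's swap fails because the pointer moved; it retries on the new base.
b_ok = catalog.swap(base_b, "metadata/v2-b.metadata.json")
b_retry_ok = catalog.swap(catalog.current(), "metadata/v3-b.metadata.json")
```

Readers always see either the old metadata file or the new one, never a half-written table state, because the data and metadata files themselves are written once and not modified in place.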

Primary use cases

Defining lakehouse tables on S3, multi-engine table access (Spark, Trino, Flink can all read the same Iceberg table), schema evolution without rewriting data.

Relationships

Outbound Relationships

scoped_to

Table Formats, Lakehouse

enables

Lakehouse Architecture

solves

Schema Evolution, Partition Pruning Complexity

Resources

Spec (High)

iceberg.apache.org/spec/

The authoritative Iceberg Table Specification defining the metadata tree, manifest files, snapshot structure, schema evolution rules, and partitioning model.

GitHub (High)

github.com/apache/iceberg

Canonical repository containing the reference Java implementation and the specification source documents.

Docs (High)

iceberg.apache.org/docs/latest/

Official documentation covering API usage, configuration, integrations with Spark/Flink/Trino, and migration guides.