Topic

Table Formats

Summary

What it is

The category of specifications (Iceberg, Delta, Hudi) that bring table semantics — schema, partitioning, ACID transactions, time-travel — to collections of files on object storage.

Where it fits

Table formats bridge the gap between raw files on S3 and the structured tables that SQL engines expect. They are the enabling layer for lakehouse architectures.

Misconceptions / Traps

  • Table formats are specifications, not databases. They define how metadata and data files are organized — the query engine is separate.
  • Choosing a table format is increasingly a convergent decision. Iceberg has become the de facto standard, but Delta and Hudi remain relevant in their ecosystems.
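
The first point can be made concrete with a small sketch. It is not any real table format: the layout (data/*.csv plus metadata/current.json), the function names, and the use of CSV instead of Parquet are all assumptions chosen to keep the example self-contained. It shows a writer that follows an agreed layout and two independent readers that both find the table's files through the metadata rather than by listing the directory.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_table(root: Path, rows_per_file: list) -> None:
    """Writer following the toy layout: data/*.csv plus metadata/current.json."""
    (root / "data").mkdir(parents=True)
    (root / "metadata").mkdir()
    names = []
    for i, rows in enumerate(rows_per_file):
        name = f"data/part-{i:03d}.csv"
        with open(root / name, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["id", "city"])
            writer.writeheader()
            writer.writerows(rows)
        names.append(name)
    # The metadata file, not the directory listing, defines the table.
    meta = {"schema": ["id", "city"], "data_files": names}
    (root / "metadata" / "current.json").write_text(json.dumps(meta))


def resolve_files(root: Path) -> list:
    """All a reader needs from the toy 'spec': where the file list lives."""
    meta = json.loads((root / "metadata" / "current.json").read_text())
    return [root / name for name in meta["data_files"]]


def engine_count(root: Path) -> int:
    """One hypothetical 'engine': counts rows."""
    total = 0
    for path in resolve_files(root):
        with open(path, newline="") as fh:
            total += sum(1 for _ in csv.DictReader(fh))
    return total


def engine_filter(root: Path, city: str) -> list:
    """A second, independent 'engine': returns the ids for one city."""
    ids = []
    for path in resolve_files(root):
        with open(path, newline="") as fh:
            ids.extend(r["id"] for r in csv.DictReader(fh) if r["city"] == city)
    return ids


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "events"
        write_table(root, [
            [{"id": 1, "city": "Oslo"}],
            [{"id": 2, "city": "Lima"}, {"id": 3, "city": "Oslo"}],
        ])
        # A stray file that the metadata does not reference is not part of the table.
        (root / "data" / "orphan.csv").write_text("id,city\n9,Oslo\n")
        return engine_count(root), engine_filter(root, "Oslo")
```

Calling demo() returns (3, ['1', '3']): both readers agree on the table contents and ignore the orphan file, even though neither shares code with the writer beyond the layout convention.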

Key Connections

  • Table Formats scoped_to S3 — all table formats operate on S3-stored files
  • Iceberg Table Spec, Delta Lake Protocol, Apache Hudi Spec scoped_to Table Formats — the three major specifications
  • Apache Parquet scoped_to Table Formats — the dominant data file format under all three
  • Schema Evolution scoped_to Table Formats — the problem table formats exist to solve
  • Metadata Overhead at Scale scoped_to Table Formats — the problem table formats introduce

Definition

What it is

The category of specifications (Iceberg, Delta, Hudi) that bring table semantics — schema, partitioning, ACID transactions, time-travel — to collections of files stored on object storage.

Why it exists

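Raw files on S3 have no transactional guarantees that span multiple files, no schema enforcement, and no efficient way to track which files belong to a logical table. Table format specifications solve this by adding a metadata layer on top of the files: it records the schema and the exact set of data files that make up each version of the table.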
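
As a rough illustration of what such a metadata layer does, here is a minimal in-memory sketch. ToyTable, Snapshot, and the method names are hypothetical and do not correspond to the Iceberg, Delta, or Hudi APIs; real formats persist this state as metadata files on object storage and rely on an atomic pointer swap or log append for commits. The sketch covers three ideas from the definition: tracking which files belong to the table, rejecting conflicting commits, and reading an older version (time-travel).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class Snapshot:
    """Immutable record of the data files in the table at one version."""
    snapshot_id: int
    schema: tuple
    data_files: frozenset


@dataclass
class ToyTable:
    """Hypothetical metadata layer: an append-only log of snapshots."""
    schema: tuple
    snapshots: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Snapshot 0 is the empty table.
        self.snapshots.append(Snapshot(0, self.schema, frozenset()))

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, base_id: int, added: Iterable[str],
               removed: Iterable[str] = ()) -> Snapshot:
        # Optimistic concurrency: the commit is rejected if another writer
        # has already moved the table past the snapshot this writer read.
        if base_id != self.current.snapshot_id:
            raise RuntimeError(f"commit conflict: table changed since snapshot {base_id}")
        files = (self.current.data_files - set(removed)) | set(added)
        snap = Snapshot(base_id + 1, self.schema, frozenset(files))
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, snapshot_id: int) -> frozenset:
        # Time-travel: ids are sequential here, so they double as list indexes.
        return self.snapshots[snapshot_id].data_files
```

For example, after table = ToyTable(schema=("id", "name")), a first commit adding part-000.parquet and a second commit replacing it with part-001.parquet, table.files_as_of(1) still returns the first file while table.current lists only the second. A writer that then tries to commit against snapshot 1 gets a RuntimeError instead of silently overwriting the newer version.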

Relationships

Outbound Relationships

scoped_to

Resources