Pain Point

Schema Evolution

Summary

What it is

Changing data schemas (adding columns, renaming fields, altering types) in S3-stored datasets without breaking downstream consumers.

Where it fits

Schema evolution is the recurring tension between "business requirements change" and "existing data and queries must keep working." Every table format exists in part to solve this problem.

Misconceptions / Traps

  • Not all schema changes are equal. Adding a column is generally safe across the major table formats; renaming or changing types has format-specific behavior and risks (for example, Iceberg tracks columns by ID, so renames are metadata-only, while Delta Lake needs column mapping enabled to rename safely).
  • Schema evolution in the table format does not automatically propagate to downstream tools (dashboards, ML pipelines). Consumer-side schema awareness is still required.
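The second bullet can be made concrete with an Avro-style reader/writer schema resolution sketch: a consumer reading old records under a newer schema fills added columns from defaults, which is why column additions are backward compatible only when the consumer is schema-aware. The field names, defaults, and `resolve` helper below are illustrative assumptions, not any real dataset or library API.

```python
# Hypothetical sketch of reader-schema resolution (Avro-style).
# A default of None marks a required field with no default.

def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a record written under an older schema onto a newer
    reader schema, filling added fields from their defaults."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]
        elif default is not None:
            out[field] = default  # added column: default fills the gap
        else:
            raise KeyError(f"field {field!r} missing and has no default")
    return out

# Old record predates the added 'region' column.
old_record = {"user_id": 1, "amount": 9.99}
reader_schema = {"user_id": None, "amount": None, "region": "unknown"}

print(resolve(old_record, reader_schema))
# {'user_id': 1, 'amount': 9.99, 'region': 'unknown'}
```

The same logic is why a rename is risky: to an unaware reader it looks like a drop plus an add, and the "dropped" field has no default to fall back on.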

Key Connections

  • Apache Iceberg, Delta Lake, and Apache Hudi solve Schema Evolution — table-format support
  • The Iceberg Table Spec, Delta Lake Protocol, and Apache Avro solve Schema Evolution — specification-level solutions
  • Schema Inference solves Schema Evolution — LLM-assisted schema suggestion
  • Write-Audit-Publish catches schema-breaking changes before they reach consumers
  • Scoped to: Table Formats, Data Lake
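The Write-Audit-Publish connection can be sketched as a schema-compatibility audit that runs against a staged table before it is published. The change classification below mirrors the common table-format rules from the Misconceptions section (additions are safe; drops and type changes are breaking); the `audit_schema` helper and the column names are assumptions for illustration, not part of any real WAP tooling.

```python
# Hypothetical audit step in a write-audit-publish flow: diff the
# staged table's schema against the published one before swapping.

def audit_schema(published: dict, staged: dict) -> list:
    """Return a list of breaking changes; an empty list means safe to publish."""
    breaking = []
    for col, typ in published.items():
        if col not in staged:
            breaking.append(f"dropped column {col!r}")
        elif staged[col] != typ:
            breaking.append(f"type change on {col!r}: {typ} -> {staged[col]}")
    # Columns present only in `staged` are additions and are allowed.
    return breaking

published = {"user_id": "long", "amount": "double"}
staged = {"user_id": "long", "amount": "string", "region": "string"}

print(audit_schema(published, staged))
# ["type change on 'amount': double -> string"]
```

In a real pipeline the publish step (e.g. a branch fast-forward or table swap) would be gated on this list being empty, so schema-breaking writes never reach consumers.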

Definition

What it is

The challenge of changing data schemas (adding columns, renaming fields, altering types) in S3-stored datasets without breaking downstream consumers, queries, or pipelines.
