Schema Evolution
Summary
What it is
Changing data schemas (adding columns, renaming fields, altering types) in S3-stored datasets without breaking downstream consumers.
Where it fits
Schema evolution sits at the recurring tension between "business requirements change" and "existing data and queries must keep working." Every table format exists in part to solve this problem.
Misconceptions / Traps
- Not all schema changes are equal. Adding a column is safe in all table formats; renaming or changing types has format-specific behavior and risks.
- Schema evolution in the table format does not automatically propagate to downstream tools (dashboards, ML pipelines). Consumer-side schema awareness is still required.
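The difference between safe and risky changes comes down to how readers resolve columns in old data files. A minimal Python sketch (helper names are hypothetical, not any real table-format API) of plain name-based resolution, which makes column adds safe but renames silently lossy:

```python
# Name-based column resolution, as when reading raw Parquet/CSV
# without a table format. Each dict stands in for a record from an
# old data file written under an earlier schema.

def read_with_schema(record: dict, schema: list[str]) -> dict:
    """Project an old record onto the current schema by column *name*.
    Columns added after the file was written resolve to None (safe);
    renamed columns also resolve to None (data appears lost)."""
    return {col: record.get(col) for col in schema}

old_record = {"id": 1, "user_name": "ada"}

# Safe: adding a column -> old files simply read None for it.
added = read_with_schema(old_record, ["id", "user_name", "email"])
assert added == {"id": 1, "user_name": "ada", "email": None}

# Risky: renaming user_name -> username. By-name lookup misses the
# old data entirely, even though the values are still in the file.
renamed = read_with_schema(old_record, ["id", "username"])
assert renamed == {"id": 1, "username": None}
```

This is exactly the failure mode that Iceberg's column IDs (see Resources below) are designed to avoid.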
Key Connections
- Apache Iceberg, Delta Lake, Apache Hudi — solve Schema Evolution (table format support)
- Iceberg Table Spec, Delta Lake Protocol, Apache Avro — solve Schema Evolution (specification-level solutions)
- Schema Inference — solves Schema Evolution (LLM-assisted schema suggestion)
- Write-Audit-Publish — catches schema-breaking changes before they reach consumers
- scoped_to: Table Formats, Data Lake
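The Write-Audit-Publish connection above can be sketched in a few lines of Python. All names here are illustrative; real implementations typically use Iceberg branches or staging tables, and the "publish" step is an atomic metadata swap:

```python
# Write-Audit-Publish sketch: new data lands in a staging area, an
# audit step rejects schema-breaking changes, and only audited data
# is published to consumers. Names and structures are hypothetical.

published = {"schema": ["id", "user_name"],
             "rows": [{"id": 1, "user_name": "ada"}]}

def audit(staged_schema: list[str], live_schema: list[str]) -> bool:
    """Pass only additive changes: every live column must survive."""
    return all(col in staged_schema for col in live_schema)

def write_audit_publish(staged: dict) -> bool:
    if not audit(staged["schema"], published["schema"]):
        return False          # breaking change stopped before consumers
    published.update(staged)  # an atomic swap in a real system
    return True

# An additive change passes; a dropped or renamed column is rejected.
assert write_audit_publish({"schema": ["id", "user_name", "email"], "rows": []})
assert not write_audit_publish({"schema": ["id", "username"], "rows": []})
```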
Definition
What it is
The challenge of changing data schemas (adding columns, renaming fields, altering types) in S3-stored datasets without breaking downstream consumers, queries, or pipelines.
Relationships
Outbound Relationships
scoped_to: Table Formats, Data Lake
Resources
Official Apache Iceberg documentation on schema evolution (add, drop, rename, reorder columns) as pure metadata operations with no file rewrites required.
The Iceberg specification detailing how unique column IDs enable safe, side-effect-free schema evolution at the format level.
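The column-ID mechanism the specification describes can be sketched in a few lines of Python (illustrative structures only, not the actual Iceberg file layout): data files key values by a stable field ID assigned at column creation, while the current schema in table metadata maps IDs to display names, so a rename touches only metadata and no data files are rewritten.

```python
# Sketch of Iceberg-style field-ID resolution (illustrative only).
# Data files key values by a stable, unique field ID assigned when
# the column was first added to the table.

data_file_record = {1: 42, 2: "ada"}  # field-id -> stored value

# The schema lives in table metadata: field-id -> current column name.
schema_v1 = {1: "id", 2: "user_name"}
schema_v2 = {1: "id", 2: "username"}  # rename: a metadata-only change

def read(record: dict, schema: dict) -> dict:
    """Resolve columns by field ID, then label with the current name."""
    return {name: record.get(fid) for fid, name in schema.items()}

assert read(data_file_record, schema_v1) == {"id": 42, "user_name": "ada"}
# After the rename, the same data file still reads correctly; no
# files are rewritten, which is why the operation is side-effect-free.
assert read(data_file_record, schema_v2) == {"id": 42, "username": "ada"}
```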