Delta Lake
Summary
What it is
An open table format and storage layer providing ACID transactions, scalable metadata, and schema enforcement on data stored in object storage. Originally developed at Databricks.
Where it fits
Delta Lake is the table format native to the Databricks ecosystem. It competes with Apache Iceberg and Apache Hudi but has the strongest integration with Spark-based platforms. On S3, Delta Lake requires external coordination for atomic commits because S3 lacks an atomic rename operation.
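Why atomic rename matters can be seen in a toy sketch of Delta's optimistic commit protocol: each transaction tries to create the next zero-padded version file in the `_delta_log/` directory with put-if-absent semantics. On a local filesystem, `O_CREAT | O_EXCL` provides that atomicity; plain S3 historically does not, which is why an external coordinator is needed. Function and file names here are illustrative, not the actual implementation.

```python
import json
import os
import tempfile

def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Attempt to commit `actions` as the given log version.

    Returns False if another writer already claimed this version,
    mimicking Delta's optimistic concurrency (the loser re-reads the
    log and retries with version + 1).
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_CREAT | O_EXCL = atomic put-if-absent on a real filesystem.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # lost the race for this version
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True
```

Two writers racing for the same version illustrate the guarantee: exactly one `try_commit` call for version N succeeds, and the other must retry at N + 1.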
Misconceptions / Traps
- Delta Lake on S3 requires a DynamoDB-based log store or equivalent for multi-writer safety. Without it, concurrent writes can corrupt the transaction log.
- "Delta" and "Databricks" are closely associated, but Delta is open-source. However, some advanced features (liquid clustering, predictive optimization) are Databricks-proprietary.
Key Connections
- implements: Lakehouse Architecture — provides ACID on data lakes
- depends_on: Delta Lake Protocol, Apache Parquet — protocol spec and data format
- solves: Schema Evolution — schema enforcement with evolution support
- constrained_by: Vendor Lock-In (Databricks ecosystem affinity), Lack of Atomic Rename (S3 limitation)
- scoped_to: Table Formats, Lakehouse
Definition
What it is
An open table format and storage layer that brings ACID transactions, scalable metadata handling, and schema enforcement to data stored on object storage.
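The schema-enforcement idea can be illustrated with a toy validator: a writer rejects records whose fields do not match the table schema, analogous to Delta refusing a mismatched write unless schema evolution is explicitly enabled. The schema, function, and flag names here are invented for the sketch.

```python
# Toy illustration of schema enforcement vs. schema evolution.
# TABLE_SCHEMA and validate() are invented names, not Delta APIs.
TABLE_SCHEMA = {"id": int, "event": str}

def validate(record: dict, allow_evolution: bool = False) -> bool:
    """Reject unknown columns unless evolution is allowed,
    then type-check the known columns."""
    extra = set(record) - set(TABLE_SCHEMA)
    if extra and not allow_evolution:
        raise ValueError(f"schema mismatch: unexpected columns {sorted(extra)}")
    return all(isinstance(record.get(k), t) for k, t in TABLE_SCHEMA.items())
```

In Delta itself the equivalent switch is opting in to schema evolution on write (e.g. merge-schema options), rather than a per-record flag.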
Why it exists
To enable reliable data pipelines on data lakes by providing transaction guarantees that raw file storage lacks. Originally developed at Databricks to address data quality and consistency problems in Spark-based pipelines.
Primary use cases
ACID-compliant data lakes, streaming and batch unification, audit-ready data pipelines, time-travel queries.
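The time-travel use case above can be sketched with small helpers that build the corresponding SQL strings. The `VERSION AS OF` / `TIMESTAMP AS OF` syntax is from the Delta Lake documentation; the helper names and table name are invented for this sketch.

```python
# Helpers that build Delta time-travel SQL. The AS OF syntax is Delta's;
# the function names and arguments are illustrative.
def as_of_version(table: str, version: int) -> str:
    """Query a table as it existed at a specific commit version."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"

def as_of_timestamp(table: str, ts: str) -> str:
    """Query a table as it existed at (or just before) a timestamp."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"
```

In a Spark session these strings would be passed to `spark.sql(...)`; each query is resolved by replaying the transaction log up to the requested version or timestamp.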
Resources
Official Delta Lake documentation covering table protocol, API usage with Spark/Flink/Trino, and storage configuration including S3.
Primary Delta Lake open-source repository maintained by Databricks and the community, including the protocol spec and Spark connector.
The Delta Lake protocol specification defines the transaction log format and storage requirements critical for S3-based Delta tables.
Delta Lake's storage configuration documentation covers S3 multi-cluster writes, DynamoDB-based log store, and credentials setup.