Architecture

Separation of Storage and Compute

Summary

What it is

The design pattern of keeping data in S3 while running independent, elastically scaled compute engines against it.

Where it fits

This is the foundational architectural principle of the S3 ecosystem. Every query engine, table format, and data pipeline in this index assumes storage and compute are separate — data stays in S3, compute spins up and down on demand.
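
A minimal sketch of the pattern: a throwaway, in-process engine (DuckDB here, assumed installed, with AWS credentials already configured in the environment) is spun up for one query against a hypothetical s3://my-bucket/events/ prefix and then discarded, while the data itself never leaves S3.

    import duckdb

    # Ephemeral compute: an in-process engine created just for this query.
    con = duckdb.connect()

    # The httpfs extension lets DuckDB read s3:// paths directly over HTTP(S).
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    con.execute("SET s3_region = 'us-east-1';")  # hypothetical region

    # Only the bytes this query needs cross the network; S3 stays the source of truth.
    rows = con.execute("""
        SELECT event_date, count(*) AS events
        FROM read_parquet('s3://my-bucket/events/*.parquet')
        GROUP BY event_date
    """).fetchall()

    con.close()  # compute disappears; the data in S3 is untouched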

Misconceptions / Traps

  • Separation of storage and compute does not mean "no local storage." Caching, spill-to-disk, and local indexes are still used; the principle is that the source of truth lives in S3 (see the sketch after this list).
  • Network latency between compute and S3 is the fundamental trade-off. Every query pays the cost of reading over HTTP instead of local disk.
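
To make the first point concrete, a hedged sketch of a read-through cache in which local disk only accelerates access while S3 remains the source of truth; the bucket, key, and cache directory are hypothetical, and boto3 is assumed to be installed with credentials configured.

    import os
    import boto3

    CACHE_DIR = "/tmp/s3-cache"   # local disk is an accelerator, not the source of truth
    s3 = boto3.client("s3")

    def fetch(bucket: str, key: str) -> str:
        """Return a local path for an S3 object, downloading it only on a cache miss."""
        local_path = os.path.join(CACHE_DIR, bucket, key)
        if not os.path.exists(local_path):                 # miss: read through to S3
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, key, local_path)
        return local_path                                  # hit: served from local disk

    # Losing the cache loses nothing: every object can be re-fetched from S3.
    path = fetch("my-bucket", "events/2024/01/part-0000.parquet")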

Key Connections

  • depends_on S3 API — the interface that enables decoupling
  • solves Vendor Lock-In — swap compute engines without moving data
  • constrained_by Cold Scan Latency, Egress Cost — the costs of network-based data access
  • ClickHouse implements Separation of Storage and Compute
  • scoped_to S3, Object Storage

Definition

What it is

The design pattern of keeping data exclusively in object storage (S3) while running elastically scaled compute engines against it. Compute capacity and storage capacity scale independently of each other.

Why it exists

Coupled storage-and-compute systems (traditional databases, HDFS with co-located compute) force you to scale both together. Decoupling allows you to pay for storage at S3 prices and spin compute up or down on demand.

Primary use cases

Elastic analytics (scale query engines independently of data volume), multi-engine access (multiple query engines read the same S3 data), and cost optimization (pay for storage at S3 rates and for compute only while it runs).
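
As an illustration of multi-engine access, a sketch in which two independent engines read the same hypothetical s3://my-bucket/events/ Parquet prefix without copying or exporting it (assumes pyarrow built with S3 support, DuckDB installed, and credentials in the environment).

    import duckdb
    import pyarrow.dataset as ds

    PATH = "s3://my-bucket/events/"   # one copy of the data, owned by no single engine

    # Engine 1: PyArrow scans the Parquet files straight from S3.
    dataset = ds.dataset(PATH, format="parquet")
    print(dataset.count_rows())

    # Engine 2: DuckDB queries the very same files; no copy, no export job.
    con = duckdb.connect()
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    print(con.execute(
        "SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet')"
    ).fetchone())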

Relationships

Outbound Relationships

depends_on S3 API — the interface that enables decoupling

Inbound Relationships

ClickHouse implements Separation of Storage and Compute

Resources