Architecture

Separation of Storage and Compute

Summary

What it is

The design pattern of keeping data in S3 while running independent, elastically scaled compute engines against it.

Where it fits

This is the foundational architectural principle of the S3 ecosystem. Every query engine, table format, and data pipeline in this index assumes storage and compute are separate — data stays in S3, compute spins up and down on demand.
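
A minimal sketch of the pattern: a throwaway, in-process engine (DuckDB here, assumed installed, with AWS credentials already configured in the environment) is spun up for one query against a hypothetical s3://my-bucket/events/ prefix and then discarded, while the data itself never leaves S3.

    import duckdb

    # Ephemeral compute: an in-process engine created just for this query.
    con = duckdb.connect()

    # The httpfs extension lets DuckDB read s3:// paths directly over HTTP(S).
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    con.execute("SET s3_region = 'us-east-1';")  # hypothetical region

    # Only the bytes this query needs cross the network; S3 stays the source of truth.
    rows = con.execute("""
        SELECT event_date, count(*) AS events
        FROM read_parquet('s3://my-bucket/events/*.parquet')
        GROUP BY event_date
    """).fetchall()

    con.close()  # compute disappears; the data in S3 is untouched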

Misconceptions / Traps

  • Separation of storage and compute does not mean "no local storage." Caching, spill-to-disk, and local indexes are still used; the principle is that the source of truth lives in S3 (see the sketch after this list).
  • Network latency between compute and S3 is the fundamental trade-off. Every query pays the cost of reading over HTTP instead of local disk.
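
To make the first point concrete, a hedged sketch of a read-through cache in which local disk only accelerates access while S3 remains the source of truth; the bucket, key, and cache directory are hypothetical, and boto3 is assumed to be installed with credentials configured.

    import os
    import boto3

    CACHE_DIR = "/tmp/s3-cache"   # local disk is an accelerator, not the source of truth
    s3 = boto3.client("s3")

    def fetch(bucket: str, key: str) -> str:
        """Return a local path for an S3 object, downloading it only on a cache miss."""
        local_path = os.path.join(CACHE_DIR, bucket, key)
        if not os.path.exists(local_path):                 # miss: read through to S3
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, key, local_path)
        return local_path                                  # hit: served from local disk

    # Losing the cache loses nothing: every object can be re-fetched from S3.
    path = fetch("my-bucket", "events/2024/01/part-0000.parquet")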

Key Connections

  • depends_on S3 API — the interface that enables decoupling
  • solves Vendor Lock-In — swap compute engines without moving data
  • constrained_by Cold Scan Latency, Egress Cost — the costs of network-based data access
  • ClickHouse implements Separation of Storage and Compute
  • scoped_to S3, Object Storage

Definition

What it is

The design pattern of keeping data exclusively in object storage (S3) while running elastically scaled compute engines against it. Compute capacity and storage capacity scale independently of each other.

Why it exists

Coupled storage-and-compute systems (traditional databases, HDFS with co-located compute) force you to scale both together. Decoupling allows you to pay for storage at S3 prices and spin compute up or down on demand.

Primary use cases

Elastic analytics (scale query engines independently of data volume), multi-engine access (multiple query engines read the same S3 data), and cost optimization (pay for storage at S3 rates and for compute only while it runs).
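
As an illustration of multi-engine access, a sketch in which two independent engines read the same hypothetical s3://my-bucket/events/ Parquet prefix without copying or exporting it (assumes pyarrow built with S3 support, DuckDB installed, and credentials in the environment).

    import duckdb
    import pyarrow.dataset as ds

    PATH = "s3://my-bucket/events/"   # one copy of the data, owned by no single engine

    # Engine 1: PyArrow scans the Parquet files straight from S3.
    dataset = ds.dataset(PATH, format="parquet")
    print(dataset.count_rows())

    # Engine 2: DuckDB queries the very same files; no copy, no export job.
    con = duckdb.connect()
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    print(con.execute(
        "SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet')"
    ).fetchone())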

Relationships

Outbound Relationships

depends_on S3 API — the interface that enables decoupling

Inbound Relationships

ClickHouse implements Separation of Storage and Compute

Resources