Tiered Storage

Summary

What it is

Tiered storage moves data between hot, warm, and cold storage tiers based on access frequency. S3 itself offers tiering through its storage classes (Standard, Infrequent Access, Glacier).

Where it fits

Tiered storage is the cost optimization layer for S3 data. It keeps frequently accessed data on fast, more expensive storage and moves archival data to slower, cheaper storage. This is a critical pattern for large data lakes, where the bulk of the data (often 80% or more) is rarely accessed.
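The effect of that split on the storage bill can be sketched with simple arithmetic. The per-GB prices below are round placeholder values chosen for illustration, not quoted provider rates, and the 100 TB lake and its 20/30/50 split are hypothetical. Retrieval and request fees are deliberately left out.

```python
# Placeholder per-GB-month prices (assumptions for illustration only,
# not actual provider pricing).
ASSUMED_PRICE_PER_GB_MONTH = {
    "hot": 0.020,
    "warm": 0.010,
    "cold": 0.002,
}


def monthly_cost(total_gb, share_by_tier, prices=ASSUMED_PRICE_PER_GB_MONTH):
    """Return the monthly storage cost for data split across tiers.

    share_by_tier maps a tier name to the fraction of data stored there.
    """
    if abs(sum(share_by_tier.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1.0")
    return sum(
        total_gb * share * prices[tier]
        for tier, share in share_by_tier.items()
    )


total_gb = 100_000  # hypothetical 100 TB data lake

# Everything kept in the hot tier versus 80% moved out of it.
all_hot = monthly_cost(total_gb, {"hot": 1.0})
tiered = monthly_cost(total_gb, {"hot": 0.2, "warm": 0.3, "cold": 0.5})
savings_fraction = 1 - tiered / all_hot

print(f"all hot: {all_hot:.2f}  tiered: {tiered:.2f}  "
      f"saved: {savings_fraction:.0%}")
```

With real prices the shape of the result is the same, but retrieval charges on the colder tiers can erase the saving if the "cold" data turns out to be read often.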

Misconceptions / Traps

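Retrieval from the archive tiers (Glacier Flexible Retrieval, Glacier Deep Archive) is asynchronous: objects must be restored before they can be read, and a restore takes minutes to hours. Glacier Instant Retrieval is the exception, with millisecond access but a per-GB retrieval charge. Do not tier data that might be needed for interactive queries into the archive tiers.
S3 Intelligent-Tiering automates tier transitions but has per-object monitoring and automation charges. For predictable access patterns, explicit lifecycle rules are usually cheaper.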
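For the predictable case, an explicit lifecycle rule might look like the sketch below. It builds the rule in the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts. The helper names, the rule ID, the "logs/" prefix, and the 30/90/365-day thresholds are illustrative assumptions, not recommendations.

```python
def build_lifecycle_rule(rule_id, prefix, warm_after_days=30,
                         cold_after_days=90, archive_after_days=365):
    """Build one S3 lifecycle rule that steps objects down through tiers.

    The day thresholds are hypothetical defaults; they count days since
    object creation, not days since last access.
    """
    if not (warm_after_days < cold_after_days < archive_after_days):
        raise ValueError("transition days must be strictly increasing")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": cold_after_days, "StorageClass": "GLACIER"},
            {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


def apply_lifecycle(bucket, rules):
    """Apply the rules to a bucket (needs boto3 and AWS credentials).

    Note: this call replaces any lifecycle configuration already set
    on the bucket.
    """
    import boto3  # third-party; imported here so the builder runs without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# Hypothetical rule for objects under the "logs/" prefix.
rule = build_lifecycle_rule("tier-logs", "logs/")
print(rule)
```

Calling apply_lifecycle("my-bucket", [rule]) would install the rule, where "my-bucket" is a placeholder name. Keeping the builder separate from the API call makes the rule easy to review and test before it touches a real bucket.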

Key Connections

solves Egress Cost — keeps hot data close to compute, cold data in cheap tiers
constrained_by Vendor Lock-In — tiering policies and pricing are provider-specific
scoped_to S3, Object Storage

Definition

What it is

The pattern of moving data between hot, warm, and cold storage tiers based on access frequency, with S3 or S3-compatible stores serving as one or more tiers.

Why it exists

Not all data is accessed equally. Frequently queried data benefits from fast (expensive) storage; archival data can reside in cheap, slow tiers. S3 itself offers tiering through its storage classes (Standard, Infrequent Access, Glacier), and tiering across providers can reduce cost further.
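The core decision, assigning a tier from access recency, can be sketched as follows. The 30-day and 180-day thresholds, the object keys, and the assumption that the application records a last-access timestamp per object are all hypothetical; S3 lifecycle rules themselves act on object age rather than last access.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds separating the three tiers.
WARM_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=180)


def choose_tier(last_accessed, now):
    """Return "hot", "warm", or "cold" based on time since last access."""
    age = now - last_accessed
    if age < WARM_AFTER:
        return "hot"
    if age < COLD_AFTER:
        return "warm"
    return "cold"


# Fixed reference time so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Hypothetical catalog: object key -> last access time.
catalog = {
    "events/2024-05-28.parquet": datetime(2024, 5, 28, tzinfo=timezone.utc),
    "events/2024-03-01.parquet": datetime(2024, 3, 1, tzinfo=timezone.utc),
    "events/2023-01-15.parquet": datetime(2023, 1, 15, tzinfo=timezone.utc),
}

placement = {key: choose_tier(ts, now) for key, ts in catalog.items()}
for key, tier in placement.items():
    print(f"{key}: {tier}")
```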

Primary use cases

Cost optimization for large data lakes, lifecycle management of S3-stored datasets, compliance archival.

Relationships

Outbound Relationships

scoped_to

S3, Object Storage

solves

Egress Cost

constrained_by

Vendor Lock-In

Resources

Docs, High

aws.amazon.com/s3/storage-classes/

Official AWS documentation of the S3 storage classes (including Standard, Intelligent-Tiering, Glacier, and Deep Archive). It is the canonical reference for S3-based tiered storage.

Docs, High

kafka.apache.org/41/operations/tiered-storage/

Official Apache Kafka documentation for tiered storage (KIP-405), which offloads older log segments to remote storage such as S3 or HDFS while keeping recent data on local broker disks.

Docs, High

docs.confluent.io/platform/current/clusters/tiered-storage.h...

Confluent's production-grade tiered storage documentation showing how Kafka integrates with S3 for virtually unlimited, low-cost retention.