Architecture

Tiered Storage

Summary

What it is

Moving data between hot, warm, and cold storage tiers based on access frequency. S3 itself offers tiering through its storage classes (Standard, Standard-Infrequent Access, and the Glacier classes).

Where it fits

Tiered storage is the cost optimization layer for S3 data. It keeps frequently accessed data on fast, more expensive storage and moves archival data to slower, cheaper storage. This is a critical pattern for large data lakes, where 80% or more of the data is rarely accessed.

Misconceptions / Traps

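  • Retrieval from archival tiers (Glacier Flexible Retrieval, Glacier Deep Archive) has latency measured in minutes to hours, and objects must be restored before they can be read. Do not move data that might be needed for interactive queries into these tiers. Glacier Instant Retrieval is the exception: it offers millisecond access, at a higher per-GB retrieval cost.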
  • S3 Intelligent-Tiering automates tier transitions but charges a per-object monitoring and automation fee. For predictable access patterns, explicit lifecycle rules are usually cheaper.
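
An explicit lifecycle rule for a predictable access pattern might look like the following sketch. It builds the rule as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The helper names, the rule ID, the `logs/` prefix, the bucket name, and the 30/90-day thresholds are illustrative assumptions, not recommendations.

```python
from typing import Any


def build_lifecycle_config(
    rule_id: str, prefix: str, ia_after_days: int, glacier_after_days: int
) -> dict[str, Any]:
    """Build a lifecycle configuration that steps objects down through tiers.

    Hypothetical helper: the checks below are a simplified subset of the
    constraints S3 enforces on lifecycle transitions.
    """
    # S3 does not allow a transition to Standard-IA before day 30.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions require at least 30 days")
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition must come after Standard-IA")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle_config(bucket: str, config: dict[str, Any]) -> None:
    """Send the configuration to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = build_lifecycle_config(
    "tier-down-logs", "logs/", ia_after_days=30, glacier_after_days=90
)
# apply_lifecycle_config("example-data-lake", config)  # hypothetical bucket name
```

Keeping the rule construction separate from the API call lets the thresholds be reviewed and tested without touching a real bucket.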

Key Connections

  • solves Egress Cost — keeps hot data close to compute, cold data in cheap tiers
  • constrained_by Vendor Lock-In — tiering policies and pricing are provider-specific
  • scoped_to S3, Object Storage

Definition

What it is

The pattern of moving data between hot, warm, and cold storage tiers based on access frequency, with S3 or S3-compatible stores serving as one or more tiers.
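
The core decision in this pattern, which tier an object belongs in given how recently it was accessed, can be sketched as below. The 30-day and 180-day windows and the function name `choose_tier` are assumptions made for illustration; real thresholds depend on the workload.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; tune them to the actual access pattern.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=180)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based on time since last access."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2024, 7, 1, tzinfo=timezone.utc)
recent = choose_tier(datetime(2024, 6, 20, tzinfo=timezone.utc), now)
stale = choose_tier(datetime(2023, 1, 1, tzinfo=timezone.utc), now)
```

Note that S3 lifecycle rules key on object age since creation rather than last access, so an access-based policy like this one would need access data from elsewhere (for example, server access logs or Intelligent-Tiering).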

Why it exists

Not all data is accessed equally. Frequently queried data benefits from fast (expensive) storage; archival data can reside in cheap, slow tiers. S3 itself offers tiering through its storage classes (Standard, Standard-Infrequent Access, and the Glacier classes), and cross-provider tiering can add further cost optimization.
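
The cost argument can be made concrete with a small model. The per-GB prices and the 20/30/50 split below are placeholder numbers chosen for easy arithmetic, not actual provider pricing, and the model deliberately ignores retrieval, transition, and request fees.

```python
# Placeholder monthly storage prices per GB; NOT real provider pricing.
PRICE_PER_GB_MONTH = {"hot": 0.020, "warm": 0.010, "cold": 0.002}


def monthly_storage_cost(gb_by_tier: dict[str, float]) -> float:
    """Sum the storage cost of a dataset split across tiers."""
    return sum(PRICE_PER_GB_MONTH[tier] * gb for tier, gb in gb_by_tier.items())


total_gb = 100_000.0

# Baseline: everything stays in the hot tier.
all_hot = monthly_storage_cost({"hot": total_gb})

# Tiered: assume 20% hot, 30% warm, 50% cold.
tiered = monthly_storage_cost(
    {"hot": 0.2 * total_gb, "warm": 0.3 * total_gb, "cold": 0.5 * total_gb}
)

savings_fraction = 1 - tiered / all_hot
```

With these placeholder numbers the tiered layout comes to about 800 per month against 2,000 for keeping everything hot, a saving of roughly 60%. Retrieval fees on the cold tier would reduce that saving if the data is read more often than assumed.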

Primary use cases

Cost optimization for large data lakes, lifecycle management of S3-stored datasets, compliance archival.

Relationships

Outbound Relationships

constrained_by

Resources