Pain Point

Object Listing Performance

Summary

What it is

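The slowness and cost of listing large numbers of objects in S3's flat namespace using prefix-based scans. LIST responses are paginated at up to 1,000 keys per request, so enumerating N objects under one prefix takes at least N/1,000 sequential round trips.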
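As a rough illustration of why pagination matters, the sketch below estimates how many LIST requests a sequential enumeration needs. The function names and the 0.1-second per-page latency are assumptions made for illustration, not measured S3 figures; only the 1,000-key page size comes from the text above.

```python
import math

PAGE_SIZE = 1000  # maximum keys returned per LIST request (from the text above)


def list_request_count(num_objects: int, page_size: int = PAGE_SIZE) -> int:
    """Number of paginated LIST requests needed to enumerate num_objects keys.

    Even an empty listing costs one request, hence the floor of 1.
    """
    return max(1, math.ceil(num_objects / page_size))


def sequential_list_seconds(num_objects: int, seconds_per_page: float = 0.1) -> float:
    """Estimated wall-clock time when pages are fetched one after another.

    seconds_per_page is a hypothetical latency, not a measured value.
    """
    return list_request_count(num_objects) * seconds_per_page


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 10_000_000):
        print(n, list_request_count(n), sequential_list_seconds(n))
```

Under these assumptions, 10 million objects need 10,000 sequential requests, which is on the order of 1,000 seconds at 0.1 s per page.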

Where it fits

Object listing is a hidden bottleneck in many S3 workloads. Partition discovery, garbage collection, and table snapshots all start with listing, and at millions of objects LIST calls can dominate job startup time. Originates from: **S3 API**.

Misconceptions / Traps

  • S3 prefixes are not directories. A prefix is a plain string match against the object key, so a prefix scan gains nothing from directory-like structure: it is a linear scan over the matching keys, filtered server-side.
  • S3 Inventory (an offline listing report) is often a better fit than real-time LIST for large-scale enumeration, but its reports can lag the bucket's current state by 24-48 hours.
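The first point can be sketched with a toy in-memory model of a flat, sorted key space. This is not S3's implementation; `list_prefix`, the sample keys, and the returned `(contents, common_prefixes)` shape are hypothetical and only loosely mirror the idea of a prefix-plus-delimiter listing.

```python
from bisect import bisect_left


def list_prefix(sorted_keys, prefix="", delimiter=None):
    """Toy prefix listing over a flat, lexicographically sorted key list.

    Returns (contents, common_prefixes). With a delimiter, keys that contain
    the delimiter after the prefix are rolled up into one common prefix,
    which is what makes a flat namespace look like folders.
    """
    contents, common_prefixes = [], []
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if not common_prefixes or common_prefixes[-1] != rolled:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


KEYS = sorted([
    "logs/2024/a.json",
    "logs/2024/b.json",
    "logs/2025/c.json",
    "logs-archive/x.json",  # not "inside" logs/, yet it matches prefix "logs"
])

if __name__ == "__main__":
    print(list_prefix(KEYS, "logs"))        # pure string match on the key
    print(list_prefix(KEYS, "logs/", "/"))  # folder-like view via delimiter
```

Listing with the prefix `logs` also returns `logs-archive/x.json`, because the match is on the key string rather than on any directory boundary; the folder-like view only appears once a delimiter is applied on top of the same flat key list.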

Key Connections

  • AWS S3 constrained_by Object Listing Performance — inherent API limitation
  • DuckDB, Trino constrained_by Object Listing Performance — query engines pay the listing cost
  • Table formats reduce listing dependency by maintaining manifests of their data files, but the metadata files themselves must still be listed
  • scoped_to S3, Object Storage
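The manifest idea in the list above can be sketched as follows. This is a deliberately simplified, hypothetical layout (a single JSON document naming every data file); real table formats use their own, more elaborate metadata structures, and the function names here are illustrative only.

```python
import json
import math


def list_requests_needed(num_files: int, page_size: int = 1000) -> int:
    """LIST pages required to discover num_files data files by scanning."""
    return max(1, math.ceil(num_files / page_size))


def write_manifest(data_files) -> str:
    """Serialize the set of data files into one manifest document."""
    return json.dumps({"data_files": sorted(data_files)})


def read_manifest(manifest_json: str):
    """Recover the data file list from a manifest without any listing."""
    return json.loads(manifest_json)["data_files"]


if __name__ == "__main__":
    # Hypothetical table with 2,500 data files.
    files = [f"table/data/part-{i:05d}.parquet" for i in range(2500)]
    manifest = write_manifest(files)
    print("LIST pages without a manifest:", list_requests_needed(len(files)))
    print("files recovered from manifest:", len(read_manifest(manifest)))
```

In this sketch a reader that already knows the manifest's key fetches one object instead of paging through the data prefix. Finding that manifest in the first place is the residual cost the manifest bullet above points to.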

Relationships

Outbound Relationships

Inbound Relationships

constrained_by

Resources