Object Listing Performance
Summary
What it is
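The slowness and cost of listing large numbers of objects in S3's flat namespace using prefix-based scans. Results are paginated: each ListObjectsV2 request returns at most 1,000 keys, so enumerating N objects takes at least ⌈N / 1,000⌉ requests, issued one after another within a single listing.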
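The pagination loop can be sketched in Python as below. This is a minimal sketch, assuming a callable shaped like boto3's `list_objects_v2` (keyword arguments `Bucket`, `Prefix`, `ContinuationToken`; a response dict with `Contents`, `IsTruncated`, `NextContinuationToken`). The function name `iter_keys` is hypothetical. Because each request needs the token returned by the previous one, a single listing cannot fetch its pages in parallel.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A callable shaped like boto3's S3 client method list_objects_v2(**kwargs) -> dict.
ListPage = Callable[..., Dict[str, Any]]


def iter_keys(list_page: ListPage, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, following ListObjectsV2 continuation tokens.

    Each request returns at most 1,000 keys, and each request needs the token
    from the previous response, so pages are fetched strictly one after another.
    """
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            kwargs["ContinuationToken"] = token
        page = list_page(**kwargs)
        # S3 omits "Contents" entirely when a page has no keys.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        token = page["NextContinuationToken"]
```

With boto3 this would be called as `iter_keys(boto3.client("s3").list_objects_v2, "my-bucket", "logs/")`, where the bucket and prefix are placeholders. boto3's built-in `get_paginator("list_objects_v2")` wraps the same loop.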
Where it fits
Object listing is a hidden bottleneck in S3 operations. Partition discovery, garbage collection, and table snapshots all start with listing, and at millions of objects LIST calls can dominate job startup time. Originates from: **S3 API**.
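A back-of-envelope calculation shows why listing dominates startup. The sketch below is hypothetical: `listing_estimate` is an illustrative helper, and the per-request latency and price are inputs the caller must supply, not measured or quoted figures.

```python
import math


def listing_estimate(num_objects: int, seconds_per_request: float,
                     usd_per_1000_requests: float, keys_per_page: int = 1000):
    """Rough cost of enumerating `num_objects` with one sequential listing.

    `seconds_per_request` and `usd_per_1000_requests` are assumptions supplied
    by the caller; real latency and pricing vary by region, bucket, and key layout.
    """
    # Each LIST page holds at most `keys_per_page` keys.
    requests = math.ceil(num_objects / keys_per_page)
    return {
        "requests": requests,
        # Pages of one listing are fetched one after another, so latency adds up.
        "sequential_seconds": requests * seconds_per_request,
        "usd": requests / 1000 * usd_per_1000_requests,
    }
```

For example, 10,000,000 objects at an assumed 0.1 s per request and an assumed $0.005 per 1,000 LIST requests gives 10,000 requests, about 1,000 seconds (roughly 17 minutes) for one sequential listing, and about $0.05. Under these assumptions the request charge is small; the wall-clock time is what stalls job startup.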
Misconceptions / Traps
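- S3 prefixes are not directories. A prefix scan gets no help from a directory-like structure: it is a sequential, paginated scan over keys in lexicographic order, filtered server-side by prefix.
- S3 Inventory (a scheduled, offline listing report) is often a better choice than real-time LIST for large-scale enumeration, but its output is stale: reports are delivered daily or weekly, and the first report can take up to 48 hours to arrive.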
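Since a single prefix scan is sequential, a common way to speed up enumeration is to split the key space into disjoint prefixes and list them concurrently. This is a minimal sketch, assuming the caller already knows a set of non-overlapping prefixes (for example from a partitioned or hash-sharded key layout) and passes a boto3-style `list_objects_v2` callable. `list_prefix` and `list_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Sequence

# A callable shaped like boto3's S3 client method list_objects_v2(**kwargs) -> dict.
ListPage = Callable[..., Dict[str, Any]]


def list_prefix(list_page: ListPage, bucket: str, prefix: str) -> List[str]:
    """One sequential, paginated listing of a single prefix."""
    keys: List[str] = []
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            kwargs["ContinuationToken"] = token
        page = list_page(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]


def list_parallel(list_page: ListPage, bucket: str,
                  prefixes: Sequence[str], workers: int = 8) -> List[str]:
    """List disjoint prefixes concurrently and merge the results in key order.

    Assumes no prefix is a prefix of another; overlapping prefixes would
    return some keys twice.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each prefix is still paged sequentially; only separate prefixes overlap in time.
        parts = pool.map(lambda p: list_prefix(list_page, bucket, p), prefixes)
        return sorted(k for part in parts for k in part)
```

boto3 low-level clients are documented as thread-safe, so one client's `list_objects_v2` can be shared across the worker threads. The speedup is bounded by how evenly the keys spread across the chosen prefixes.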
Key Connections
- AWS S3: constrained_by Object Listing Performance (inherent API limitation)
- DuckDB, Trino: constrained_by Object Listing Performance (query engines pay the listing cost)
- Table formats reduce listing dependency by maintaining manifests, but the metadata itself must still be listed
- scoped_to: S3, Object Storage
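The manifest trade-off in the connections above can be sketched as follows. The layout is hypothetical and much simpler than Iceberg or Delta Lake: each commit writes a zero-padded `metadata/vNNNNNNNN.json` file that names the table's live data files. Only the small `metadata/` prefix is listed; the large `data/` prefix is never scanned. `plan_table_scan` and the callables it takes are illustrative assumptions.

```python
import json
from typing import Callable, Iterable, List

# Hypothetical layout (not Iceberg's or Delta Lake's actual format):
#   <table>/metadata/v00000001.json, v00000002.json, ...  small commit manifests
#   <table>/data/...                                       many data files
ListKeys = Callable[[str], Iterable[str]]  # prefix -> keys (e.g. a paginated LIST)
GetObject = Callable[[str], bytes]         # key -> object body


def plan_table_scan(list_keys: ListKeys, get_object: GetObject, table: str) -> List[str]:
    """Return the data files of the latest snapshot without listing data/."""
    # Only the small metadata/ prefix is listed; zero-padded version numbers
    # make the newest manifest the lexicographically largest key.
    manifests = [k for k in list_keys(f"{table}/metadata/") if k.endswith(".json")]
    if not manifests:
        raise FileNotFoundError(f"no manifests under {table}/metadata/")
    latest = max(manifests)
    # One GET replaces ceil(N / 1000) LIST requests over data/.
    manifest = json.loads(get_object(latest))
    return list(manifest["files"])
```

A side effect of this design is that files under `data/` not named in the latest manifest (for example, leftovers from a failed write) are ignored, whereas a plain prefix listing would pick them up.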
Resources
- Official AWS API reference for ListObjectsV2, documenting the 1,000-object-per-request limit and the pagination mechanism that constrain listing performance.
- AWS's official performance design patterns, covering S3 Inventory as an alternative to listing, prefix parallelization, and caching strategies for large-scale object enumeration.
- A deep engineering investigation into why S3 ListObjects can take 120+ seconds, showing how delete markers and versioning cause severe performance degradation.