StarRocks
Summary
What it is
An MPP analytical database with native lakehouse capabilities, able to query data on S3 directly, including Parquet and ORC files and Iceberg tables.
Where it fits
StarRocks sits between pure lakehouse query engines (Trino) and dedicated analytical databases (ClickHouse). It can query S3 data directly, as Trino does, and can also cache hot data locally for sub-second performance. That combination makes it a strong candidate when you need low-latency analytics over lakehouse data.
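The direct-query-plus-local-cache idea can be illustrated with a toy read-through LRU cache. This is a minimal sketch under stated assumptions, not StarRocks' data cache implementation: `ReadThroughCache`, `fake_s3_get`, `run_query`, the `s3://bucket/...` paths, and the latency constants `S3_FETCH_MS` and `LOCAL_READ_MS` are all hypothetical, with made-up values, and the "S3 fetch" is simulated in memory.

```python
from collections import OrderedDict
from typing import Callable

# Hypothetical per-object costs, chosen only to show the shape of the trade-off.
S3_FETCH_MS = 50.0   # assumed cost of fetching one object from S3
LOCAL_READ_MS = 1.0  # assumed cost of reading a cached copy from local disk


class ReadThroughCache:
    """Toy LRU read-through cache in front of an object store (not StarRocks code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def read(self, key: str, fetch: Callable[[str], bytes]) -> tuple[bytes, float]:
        """Return (data, simulated latency in ms) for one object."""
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: mark as most recently used
            return self.entries[key], LOCAL_READ_MS
        data = fetch(key)  # miss: cold path to the object store
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data, S3_FETCH_MS


def fake_s3_get(key: str) -> bytes:
    """Hypothetical stand-in for an S3 GET; returns dummy bytes."""
    return f"contents of {key}".encode()


def run_query(cache: ReadThroughCache, keys: list[str]) -> float:
    """Read every object in `keys` once and return total simulated latency (ms)."""
    return sum(cache.read(key, fake_s3_get)[1] for key in keys)


files = [f"s3://bucket/table/part-{i}.parquet" for i in range(4)]
cache = ReadThroughCache(capacity=8)
print(run_query(cache, files))  # cold scan: 200.0
print(run_query(cache, files))  # warm scan: 4.0
```

With a cache larger than the working set, the second scan pays only local reads. With `capacity=2` and the same four files, LRU eviction thrashes and every scan pays the full simulated S3 cost again, which is why the local cache has to be sized for the hot data set.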
Misconceptions / Traps
- StarRocks' external-table performance on uncached S3 data is roughly comparable to Trino's. Its latency advantage comes from local caching and materialized views, both of which require provisioning and managing local storage.
- The shared-data architecture (compute separated from S3-backed storage) is a newer feature. Evaluate its maturity for your use case before deploying it to production.
Key Connections
- depends_on: Apache Parquet (reads Parquet files from S3)
- used_by: Lakehouse Architecture (queries lakehouse data)
- constrained_by: Cold Scan Latency (first-query performance limited by S3 access)
- scoped_to: S3, Lakehouse
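The cold-scan constraint can be made concrete with a back-of-envelope estimate. The sketch below assumes a deliberately simple model that is not a StarRocks formula: objects are fetched in waves of concurrent GETs, and each wave costs one first-byte latency plus the time to stream one object. The function name `estimate_cold_scan_seconds` and all input numbers are hypothetical illustrations, not measurements.

```python
import math

GIB = 1024**3
MIB = 1024**2


def estimate_cold_scan_seconds(
    total_bytes: int,
    object_bytes: int,
    parallel_requests: int,
    first_byte_latency_s: float,
    per_stream_bytes_per_s: float,
) -> float:
    """Back-of-envelope time to read `total_bytes` from S3 with nothing cached.

    Assumed model (not a StarRocks formula): objects are fetched in waves of
    `parallel_requests` concurrent GETs, and each wave costs one first-byte
    latency plus the time to stream one object at `per_stream_bytes_per_s`.
    """
    n_objects = math.ceil(total_bytes / object_bytes)
    waves = math.ceil(n_objects / parallel_requests)
    per_wave = first_byte_latency_s + object_bytes / per_stream_bytes_per_s
    return waves * per_wave


# Illustrative inputs only: 10 GiB in 128 MiB objects, 50 ms first-byte
# latency, 80 MiB/s per stream.
for parallelism in (8, 16):
    t = estimate_cold_scan_seconds(10 * GIB, 128 * MIB, parallelism, 0.05, 80 * MIB)
    print(parallelism, round(t, 2))
```

Under these assumed inputs the scan takes about 16.5 s at 8 concurrent requests and about 8.25 s at 16. More parallelism shrinks the number of waves, but only a warm local cache removes the S3 term from repeat queries entirely.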
Definition
What it is
A massively parallel processing (MPP) analytical database with native lakehouse capabilities, able to query data on S3 directly, including Parquet and ORC files and Iceberg tables.
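"Massively parallel processing" means a query is split across many workers, each processing a slice of the data, with their partial results combined at the end. The Python sketch below simulates that two-phase (local, then global) aggregation, using threads as stand-ins for nodes. The row data and the names `partition`, `local_aggregate`, `merge`, and `mpp_sum_by_region` are hypothetical and do not reflect StarRocks' actual planner or exchange operators.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy rows: (region, revenue). In a real MPP engine these would be spread
# across many files and nodes; here each "node" gets one Python list.
ROWS = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 8), ("apac", 1)]


def partition(rows: list[tuple[str, int]], n_nodes: int) -> list[list[tuple[str, int]]]:
    """Round-robin rows across n_nodes (a hypothetical scheduling policy)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_aggregate(rows: list[tuple[str, int]]) -> Counter:
    """Phase 1, on each node: partial SUM(revenue) GROUP BY region."""
    partial: Counter = Counter()
    for region, revenue in rows:
        partial[region] += revenue
    return partial


def merge(partials: list[Counter]) -> dict[str, int]:
    """Phase 2, on the coordinator: add the partial sums together."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)


def mpp_sum_by_region(rows: list[tuple[str, int]], n_nodes: int = 3) -> dict[str, int]:
    """Scatter rows to simulated nodes, aggregate in parallel, then gather."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, partition(rows, n_nodes)))
    return merge(partials)


print(sorted(mpp_sum_by_region(ROWS).items()))  # [('apac', 4), ('eu', 17), ('us', 13)]
```

Because SUM is associative, merging per-node partial sums gives the same answer as a single-node scan no matter how the rows are partitioned. Aggregates that do not decompose this way, such as an exact median, need a different plan.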
Why it exists
Organizations want sub-second analytics without ETL. StarRocks can query S3 data directly (like Trino) but also ingest and cache hot data locally for faster performance, bridging the gap between pure lakehouse queries and dedicated analytical databases.
Primary use cases
Low-latency analytics over S3 lakehouse data, materialized views over Iceberg tables, real-time dashboards on S3-backed datasets.
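Materialized views trade freshness and local storage for scan volume: the aggregate is computed once from the base table, and later queries read the small precomputed result. The toy model below counts rows "scanned from S3" to make that visible. `BaseTable` and `MaterializedView` are hypothetical in-memory classes. In StarRocks, materialized views are defined in SQL, and the engine handles refresh scheduling (and, where supported, query rewrite); this sketch reproduces none of that machinery.

```python
from dataclasses import dataclass, field


@dataclass
class BaseTable:
    """Hypothetical in-memory stand-in for an Iceberg table on S3."""
    rows: list[tuple[str, int]] = field(default_factory=list)
    rows_scanned: int = 0  # how many rows queries have read from "S3"

    def scan(self) -> list[tuple[str, int]]:
        self.rows_scanned += len(self.rows)
        return list(self.rows)


@dataclass
class MaterializedView:
    """Toy precomputed SUM(value) GROUP BY key over a BaseTable."""
    base: BaseTable
    result: dict[str, int] = field(default_factory=dict)

    def refresh(self) -> None:
        # Full recompute from the base table; real engines may refresh
        # incrementally or per partition, which this sketch ignores.
        totals: dict[str, int] = {}
        for key, value in self.base.scan():
            totals[key] = totals.get(key, 0) + value
        self.result = totals

    def query(self, key: str) -> int:
        # Answered from the precomputed result without touching the base table.
        return self.result.get(key, 0)


table = BaseTable(rows=[("eu", 10), ("us", 5), ("eu", 7)])
mv = MaterializedView(table)
mv.refresh()                      # one full scan: 3 rows read
for _ in range(100):
    mv.query("eu")                # 100 queries, no further scans
print(mv.query("eu"), table.rows_scanned)  # 17 3

table.rows.append(("eu", 1))      # new data lands in the base table
print(mv.query("eu"))             # still 17: stale until the next refresh
mv.refresh()
print(mv.query("eu"), table.rows_scanned)  # 18 7
```

The view answers repeated dashboard queries without rescanning the base table, at the cost of staleness between refreshes and storage for the precomputed result.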
Resources
- Official StarRocks documentation covering this MPP analytical database engine and its S3-backed storage architecture.
- Primary StarRocks repository, with the C++/Java source for the shared-data architecture that separates compute from S3-based storage.
- StarRocks' S3 shared-data deployment guide, covering how to configure S3 as the primary storage backend.