Technology

StarRocks

Summary

What it is

An MPP analytical database with native lakehouse capabilities, able to directly query S3 data stored as Parquet or ORC files or managed as Apache Iceberg tables.

Where it fits

StarRocks sits between pure lakehouse query engines (Trino) and dedicated analytical databases (ClickHouse). It can query S3 data directly like Trino but can also cache hot data locally for sub-second performance, making it a strong choice when you need low-latency analytics over lakehouse data.

Misconceptions / Traps

On uncached S3 data, StarRocks' external table performance is comparable to Trino's. The latency advantage comes from local caching and materialized views, both of which require managing local storage.
Shared-data architecture on S3 is a newer feature. Evaluate maturity for your use case before production deployment.
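
The cold-versus-cached distinction above can be illustrated with a toy read-through cache. This is a minimal sketch, not StarRocks code: the class `ReadThroughCache`, the function `scan_latency_ms`, and both latency constants are hypothetical names and made-up numbers chosen only to show why the first scan pays the S3 cost and repeat scans do not.

```python
from dataclasses import dataclass, field

# Hypothetical per-block latencies for illustration; not measured StarRocks figures.
S3_READ_MS = 100.0
LOCAL_READ_MS = 1.0


@dataclass
class ReadThroughCache:
    """Toy cache: the first read of a block goes to S3, repeat reads are local."""
    cached_blocks: set = field(default_factory=set)

    def read(self, block_id: str) -> float:
        """Return the simulated latency (ms) for reading one block."""
        if block_id in self.cached_blocks:
            return LOCAL_READ_MS
        # Cache miss: fetch from S3 and keep a local copy for later reads.
        self.cached_blocks.add(block_id)
        return S3_READ_MS


def scan_latency_ms(cache: ReadThroughCache, block_ids: list) -> float:
    """Sum per-block latencies, as if blocks were read one after another."""
    return sum(cache.read(b) for b in block_ids)


cache = ReadThroughCache()
blocks = ["part-0", "part-1", "part-2"]
cold = scan_latency_ms(cache, blocks)  # every block fetched from S3
warm = scan_latency_ms(cache, blocks)  # every block served from local storage
print(f"cold scan: {cold} ms, warm scan: {warm} ms")
```

The local copies are what make the warm scan fast, which is why the latency benefit comes with the operational cost of provisioning and managing local storage.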

Key Connections

depends_on Apache Parquet — reads Parquet files from S3
used_by Lakehouse Architecture — queries lakehouse data
constrained_by Cold Scan Latency — first-query performance limited by S3 access
scoped_to S3, Lakehouse

Definition

What it is

A massively parallel processing (MPP) analytical database with native lakehouse capabilities, able to directly query data on S3 stored as Parquet or ORC files or managed as Apache Iceberg tables.

Why it exists

Organizations want sub-second analytics without ETL. StarRocks can query S3 data directly (like Trino) but also ingest and cache hot data locally for faster performance, bridging the gap between pure lakehouse queries and dedicated analytical databases.

Primary use cases

Low-latency analytics over S3 lakehouse data, materialized views over Iceberg tables, real-time dashboards on S3-backed datasets.
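
As a rough illustration of the materialized-view use case, the sketch below assembles an asynchronous materialized view statement over a table reached through an external catalog. The helper `build_async_mv_sql` and the catalog, database, table, and column names are hypothetical. The statement shape (`CREATE MATERIALIZED VIEW ... REFRESH ASYNC EVERY (INTERVAL n HOUR) AS ...`) reflects the StarRocks SQL syntax as I understand it; some versions also expect a `DISTRIBUTED BY` clause, so check the reference for your release before running it.

```python
import re

# Dotted identifier such as catalog.database.table (letters, digits, underscores).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def _check(name: str) -> str:
    """Reject anything that is not a plain dotted identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_async_mv_sql(mv_name: str, source_table: str, group_col: str,
                       measure_col: str, refresh_hours: int = 1) -> str:
    """Build a CREATE MATERIALIZED VIEW statement that pre-aggregates one measure.

    Assumption: source_table is addressed as catalog.database.table through an
    external (for example Iceberg) catalog that has already been created.
    """
    if refresh_hours < 1:
        raise ValueError("refresh_hours must be at least 1")
    return (
        f"CREATE MATERIALIZED VIEW {_check(mv_name)}\n"
        f"REFRESH ASYNC EVERY (INTERVAL {int(refresh_hours)} HOUR)\n"
        f"AS SELECT {_check(group_col)}, "
        f"SUM({_check(measure_col)}) AS total_{measure_col}\n"
        f"FROM {_check(source_table)}\n"
        f"GROUP BY {group_col}"
    )


# Hypothetical names, for illustration only.
sql = build_async_mv_sql(
    mv_name="sales_by_region",
    source_table="iceberg_catalog.sales.orders",
    group_col="region",
    measure_col="amount",
)
print(sql)
```

Because StarRocks accepts MySQL-protocol clients, a statement like this could be submitted through any MySQL-compatible driver; the helper only builds the text and does not connect to anything.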

Relationships

Outbound Relationships

scoped_to

S3, Lakehouse

depends_on

Apache Parquet

used_by

Lakehouse Architecture

constrained_by

Cold Scan Latency

Resources

Docs (High)

docs.starrocks.io/

Official StarRocks documentation for the MPP analytical database engine, including its native S3-backed storage architecture.

GitHub (High)

github.com/StarRocks/starrocks

Primary StarRocks repository with the C++/Java source for the shared-data architecture that separates compute from S3-based storage.

Docs (Medium)

docs.starrocks.io/docs/deployment/shared_data/s3/

StarRocks' S3 shared-data deployment guide covers configuring S3 as the primary storage backend.