Technology

StarRocks

Summary

What it is

An MPP analytical database with native lakehouse capabilities, able to query data on S3 directly, both as Parquet and ORC files and as Iceberg tables.

Where it fits

StarRocks sits between pure lakehouse query engines (Trino) and dedicated analytical databases (ClickHouse). It can query S3 data directly like Trino but can also cache hot data locally for sub-second performance, making it a strong choice when you need low-latency analytics over lakehouse data.

Misconceptions / Traps

  • StarRocks' external table performance on S3 is comparable to Trino's, not better. The latency advantage comes from its local caching and materialized views, both of which require managing local storage.
  • Shared-data architecture on S3 is a newer feature. Evaluate maturity for your use case before production deployment.
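
The caching point above can be made concrete with a toy model. The sketch below is not StarRocks code: the class name, block ids, and both latency figures are assumptions chosen only to show why a first (cold) scan pays the S3 cost while a repeated (warm) scan does not.

```python
from dataclasses import dataclass, field


@dataclass
class ReadThroughCache:
    """Toy model of a local cache in front of object storage.

    The latencies are illustrative assumptions, not StarRocks or S3 measurements.
    """
    remote_ms: float = 200.0  # assumed cost of fetching one block from S3
    local_ms: float = 2.0     # assumed cost of reading one block from local disk
    cached: set = field(default_factory=set)

    def scan(self, blocks):
        """Return the simulated latency in ms for scanning the given block ids."""
        total = 0.0
        for block in blocks:
            if block in self.cached:
                total += self.local_ms
            else:
                total += self.remote_ms
                self.cached.add(block)  # populate the cache on first read
        return total


cache = ReadThroughCache()
hot_blocks = ["part-0", "part-1", "part-2"]  # hypothetical block ids
cold = cache.scan(hot_blocks)  # every block is fetched from remote storage
warm = cache.scan(hot_blocks)  # every block is now served from the local cache
print(f"cold scan: {cold} ms, warm scan: {warm} ms")
```

The model sums block reads serially, which a real MPP engine would not do, but the ratio between the two scans is the point: the speedup exists only for data that already sits on local storage.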

Key Connections

  • depends_on Apache Parquet — reads Parquet files from S3
  • used_by Lakehouse Architecture — queries lakehouse data
  • constrained_by Cold Scan Latency — first-query performance limited by S3 access
  • scoped_to S3, Lakehouse

Definition

What it is

A massively parallel processing (MPP) analytical database with native lakehouse capabilities, able to query data on S3 directly, both as Parquet and ORC files and as Iceberg tables.

Why it exists

Organizations want sub-second analytics without ETL. StarRocks can query S3 data directly (like Trino) but also ingest and cache hot data locally for faster performance, bridging the gap between pure lakehouse queries and dedicated analytical databases.

Primary use cases

Low-latency analytics over S3 lakehouse data, materialized views over Iceberg tables, real-time dashboards on S3-backed datasets.
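
As a rough illustration of the materialized-view use case, the sketch below builds the two SQL statements such a setup would typically need: one registering an Iceberg catalog and one defining an asynchronously refreshed materialized view over a table in it. The statement shapes follow StarRocks' documented `CREATE EXTERNAL CATALOG` and `CREATE MATERIALIZED VIEW` syntax as best understood here; exact properties vary by version and metastore type, so check them against the documentation for your release. All catalog, database, table, column, and host names are hypothetical.

```python
def iceberg_catalog_sql(catalog: str, metastore_uri: str, region: str) -> str:
    """Build a CREATE EXTERNAL CATALOG statement for an Iceberg catalog.

    Assumes a Hive metastore and instance-profile S3 credentials; other
    setups (Glue, REST catalog, access keys) need different properties.
    """
    return (
        f"CREATE EXTERNAL CATALOG {catalog}\n"
        "PROPERTIES (\n"
        '    "type" = "iceberg",\n'
        '    "iceberg.catalog.type" = "hive",\n'
        f'    "hive.metastore.uris" = "{metastore_uri}",\n'
        '    "aws.s3.use_instance_profile" = "true",\n'
        f'    "aws.s3.region" = "{region}"\n'
        ");"
    )


def daily_rollup_mv_sql(mv_name: str, source_table: str, refresh_hours: int) -> str:
    """Build an async materialized view that pre-aggregates an Iceberg table.

    The event_date and amount columns are hypothetical.
    """
    return (
        f"CREATE MATERIALIZED VIEW {mv_name}\n"
        "DISTRIBUTED BY HASH(event_date)\n"
        f"REFRESH ASYNC EVERY (INTERVAL {refresh_hours} HOUR)\n"
        "AS\n"
        "SELECT event_date, COUNT(*) AS events, SUM(amount) AS revenue\n"
        f"FROM {source_table}\n"
        "GROUP BY event_date;"
    )


catalog_stmt = iceberg_catalog_sql(
    "lake", "thrift://metastore.example.internal:9083", "us-east-1"
)
mv_stmt = daily_rollup_mv_sql("daily_revenue_mv", "lake.sales.orders", 1)
print(catalog_stmt)
print(mv_stmt)
```

Because StarRocks speaks the MySQL wire protocol, statements like these could be sent through any MySQL-compatible client. Dashboards would then query the materialized view, which StarRocks stores and refreshes itself, instead of scanning the Iceberg table on S3 for every request.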

Relationships

Outbound Relationships

scoped_to
depends_on
constrained_by

Resources