LLM Capability

Natural Language Querying

Summary

What it is

Using LLMs to translate natural language questions into executable queries (SQL, API calls) over S3-backed datasets.

Where it fits

Natural language querying is the accessibility layer of S3-backed data systems. It lets business users ask questions in plain language and get results from Iceberg tables, Parquet files, or other S3-backed data without knowing SQL.

Misconceptions / Traps

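  • Natural-language-to-SQL is not a solved problem. LLMs can generate plausible-looking SQL that is wrong, for example a query that runs without error but joins on the wrong key or applies the wrong filter. Guardrails (schema validation, result sampling, SQL review) are essential.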
  • Query accuracy depends heavily on schema metadata quality. Well-documented columns, table descriptions, and sample values substantially improve the accuracy of LLM-generated SQL.
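The schema-validation and result-sampling guardrails can be sketched as two checks that run before any answer reaches a user: ask the engine to plan the generated query without executing it (which surfaces unknown tables and columns), then cap how many rows come back for review. This is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for Trino or DuckDB, whose EXPLAIN output and error types differ. `validate_sql` and `sample_results` are hypothetical helpers, and the prefix and semicolon checks are deliberately conservative rather than a real SQL parser.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical guardrail helpers. Assumption: Python's built-in sqlite3 stands
# in for a lakehouse engine such as Trino or DuckDB.


def _normalize(sql: str) -> str:
    # Drop surrounding whitespace and any trailing semicolons.
    return sql.strip().rstrip(";").strip()


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return an error message if `sql` fails the guardrails, else None."""
    query = _normalize(sql)
    # Conservative: any remaining ';' is treated as a second statement,
    # even if it sits inside a string literal.
    if ";" in query:
        return "multiple statements are not allowed"
    # Read-only check by prefix; a real deployment would use a SQL parser
    # (this simple check also rejects CTEs that start with WITH).
    if not query.lower().startswith("select"):
        return "only SELECT queries are allowed"
    # Schema validation: planning the query without running it makes the
    # engine report unknown tables and columns.
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {query}")
    except sqlite3.Error as exc:
        return f"schema check failed: {exc}"
    return None


def sample_results(conn: sqlite3.Connection, sql: str, limit: int = 5) -> List[Tuple]:
    """Run an already validated query, returning at most `limit` rows for review."""
    query = _normalize(sql)
    return conn.execute(f"SELECT * FROM ({query}) LIMIT {int(limit)}").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 120.0), (2, "NA", 80.5), (3, "EU", 42.0)],
    )
    generated = "SELECT region, SUM(amount) FROM orders GROUP BY region;"
    error = validate_sql(conn, generated)
    print(error or sample_results(conn, generated))
```

Planning instead of executing keeps the schema check cheap, and the row cap gives a reviewer a quick sample before a full scan runs against S3-backed tables.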

Key Connections

  • depends_on General-Purpose LLM — requires language understanding and SQL generation
  • augments Trino, DuckDB — generates SQL for these engines
  • scoped_to LLM-Assisted Data Systems, Lakehouse

Definition

What it is

Using LLMs to translate natural language questions into executable queries (SQL, API calls) over S3-backed datasets, making data accessible to non-technical users.
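One common way to set up the translation step is to send the model the relevant table and column metadata together with the user's question, which is also why metadata quality affects accuracy so much. The sketch below only assembles such a prompt. `Column`, `Table`, and `build_prompt` are hypothetical names, the prompt wording is illustrative, and the LLM call itself (whichever model or API is used) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical metadata model. Assumption: descriptions and sample values
# would come from a catalog or data dictionary; loading them is not shown.


@dataclass
class Column:
    name: str
    sql_type: str
    description: str = ""
    sample_values: List[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    description: str
    columns: List[Column]


def build_prompt(question: str, tables: List[Table], dialect: str = "Trino") -> str:
    """Assemble schema context plus the user's question into one prompt string."""
    lines = [
        f"Translate the question into one {dialect} SQL SELECT statement.",
        "Use only the tables and columns listed below.",
        "",
    ]
    for table in tables:
        lines.append(f"Table {table.name}: {table.description}")
        for col in table.columns:
            entry = f"  - {col.name} ({col.sql_type})"
            if col.description:
                entry += f": {col.description}"
            if col.sample_values:
                # Sample values help the model match filters such as region = 'EU'.
                entry += f" e.g. {', '.join(col.sample_values)}"
            lines.append(entry)
        lines.append("")
    lines += [f"Question: {question}", "SQL:"]
    return "\n".join(lines)


if __name__ == "__main__":
    orders = Table(
        name="sales.orders",
        description="One row per customer order",
        columns=[
            Column("order_ts", "timestamp", "Order time in UTC"),
            Column("region", "varchar", "Sales region code", ["EU", "NA", "APAC"]),
            Column("amount_usd", "decimal(12,2)", "Order total in USD"),
        ],
    )
    print(build_prompt("What was revenue by region last month?", [orders]))
```

The `dialect` parameter reflects that Trino and DuckDB SQL differ in some functions and syntax, so naming the target engine in the prompt is a reasonable default.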

Why it exists

S3-backed lakehouses contain valuable data accessible only through SQL or programming interfaces. Natural language querying removes this barrier, allowing business users to ask questions in plain language and get results from Iceberg, Parquet, or other S3-backed data.

Primary use cases

Self-service analytics over lakehouse data, natural language to SQL for Trino/DuckDB queries, conversational interfaces over S3-backed datasets.
