LLM Capability

Semantic Search

Summary

What it is

Querying S3-derived vector embeddings to find content by meaning rather than exact keyword match.

Where it fits

Semantic search is the retrieval layer that makes LLMs useful over S3 data. It powers the "R" in RAG — finding the most relevant S3-stored documents for a given query without requiring exact keyword matches.

Misconceptions / Traps

  • Semantic search is approximate, not exact. Results are ranked by similarity score, not matched precisely. False positives are possible and must be handled.
  • Semantic search requires embedding generation as a prerequisite. You cannot search semantically without first vectorizing the S3 data.
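Both traps above can be made concrete with a minimal sketch. The snippet below ranks pre-computed document vectors by cosine similarity against a query vector and then applies a score threshold to filter out likely false positives. The corpus vectors and the threshold value are illustrative placeholders; in practice both the embeddings and a sensible cutoff come from your embedding model and evaluation data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy pre-computed embeddings (hypothetical values; real vectors come
# from running an embedding model over the S3 objects' content).
corpus = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.9, 0.0],
    "doc-c": [0.7, 0.3, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

# Results are ranked, not matched exactly: every document gets a score.
ranked = sorted(
    ((cosine(query_vec, vec), key) for key, vec in corpus.items()),
    reverse=True,
)

# Handling false positives: drop hits below a tuned threshold
# (0.8 here is an arbitrary illustrative value).
THRESHOLD = 0.8
hits = [key for score, key in ranked if score >= THRESHOLD]
```

Note that `doc-b` still receives a score; it is the threshold, not the ranking, that keeps it out of the result set.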

Key Connections

  • depends_on Embedding Model — needs vectors to search
  • enables Hybrid S3 + Vector Index — the retrieval mechanism for the pattern
  • augments Lakehouse Architecture — adds semantic retrieval to structured data
  • scoped_to LLM-Assisted Data Systems, Vector Indexing on Object Storage

Definition

What it is

The ability to query S3-derived vector embeddings to find content by meaning rather than exact keyword match.

Why it exists

S3 offers no native way to search objects by their content. Semantic search, built on embeddings generated from S3 data, allows users to find relevant documents, records, or media by describing what they need in natural language.
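The flow described above can be sketched end to end: embed the query with the same model that vectorized the S3 data, score it against the stored vectors, and return the best-matching object keys. Everything here is a stand-in: `toy_embed` substitutes for a real embedding model, and the dict `index` substitutes for a real vector index; only the S3-key-to-vector mapping shape reflects the pattern itself.

```python
import math

def semantic_search(query, index, embed, k=3):
    """Return the k S3 object keys whose embeddings best match `query`.

    `index` maps S3 keys to vectors previously generated from the
    objects' content; `embed` must be the same embedding function
    that produced those vectors.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    q = embed(query)
    scored = sorted(index.items(), key=lambda kv: cos(q, kv[1]), reverse=True)
    return [key for key, _ in scored[:k]]

# Hypothetical stand-in for an embedding model: maps three known
# words onto three axes. Real systems call a model endpoint here.
VOCAB = {"invoice": 0, "photo": 1, "log": 2}

def toy_embed(text):
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB[word]] += 1.0
    return vec

# Illustrative index of S3 keys -> embeddings (fabricated values).
index = {
    "s3://bucket/finance/inv-001.pdf": [1.0, 0.0, 0.1],
    "s3://bucket/media/beach.jpg":     [0.0, 1.0, 0.0],
    "s3://bucket/ops/app.log":         [0.1, 0.0, 1.0],
}

top = semantic_search("find my invoice", index, toy_embed, k=1)
```

The key design point is that the query never touches S3 object content at search time: retrieval happens entirely against the precomputed vectors, and the returned S3 keys are then used to fetch the underlying objects.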

Primary use cases

Document retrieval for RAG over S3 data, knowledge discovery in S3-stored archives, content recommendation from S3-backed media libraries.

Relationships

Inbound Relationships

Resources