Model Class

Small / Distilled Model

Summary

What it is

A compact model (typically under 10B parameters) suitable for local or edge deployment, often distilled from a larger model to retain key capabilities at lower cost.

Where it fits

Small models make LLM-over-S3 workloads economically viable at scale. They can run on commodity hardware for embedding generation, classification, and metadata extraction, avoiding per-token API costs and, when compute is colocated with the data, egress charges.
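
As a rough illustration of this pattern, the sketch below pulls a handful of objects from S3 and embeds them with a compact open model running locally. The bucket name and prefix are hypothetical placeholders, and all-MiniLM-L6-v2 is just one example of a small embedding model; the sketch assumes boto3 and sentence-transformers are installed.

    import boto3
    from sentence_transformers import SentenceTransformer

    # Example small embedding model (~22M parameters); swap in your own.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    s3 = boto3.client("s3")

    # Hypothetical bucket/prefix holding plain-text documents.
    resp = s3.list_objects_v2(Bucket="my-data-bucket", Prefix="docs/")
    texts = []
    for obj in resp.get("Contents", []):
        body = s3.get_object(Bucket="my-data-bucket", Key=obj["Key"])["Body"]
        texts.append(body.read().decode("utf-8", errors="replace"))

    # Embeddings are computed locally; no per-token API fees are incurred.
    embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)
    print(embeddings.shape)  # (num_documents, 384) for this model

For a production-scale corpus you would iterate keys with a paginator (s3.get_paginator("list_objects_v2")) rather than a single list call, which returns at most 1,000 keys.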

Misconceptions / Traps

  • "Small" does not mean "bad." Distilled models retain 90%+ of the teacher model's capability for specific tasks. But they are less versatile than full-size models.
  • Quantized models (4-bit, 8-bit) trade numerical precision for lower memory use and higher throughput. Test on your own data before assuming quality is acceptable; see the spot-check sketch after this list.
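
One minimal way to run that test: wrap the full-precision and quantized variants in predict functions and measure how often they agree on a sample of your own documents. The function names in the commented usage (full_precision_classify, quantized_classify) are hypothetical placeholders, not a real API.

    from typing import Callable, Iterable

    def agreement_rate(
        predict_a: Callable[[str], str],
        predict_b: Callable[[str], str],
        samples: Iterable[str],
    ) -> float:
        """Fraction of inputs on which two model variants produce the same label."""
        samples = list(samples)
        matches = sum(predict_a(s) == predict_b(s) for s in samples)
        return matches / len(samples)

    # Hypothetical usage: sample a few hundred S3 documents and check how
    # often the 4-bit variant matches full precision before committing.
    # rate = agreement_rate(full_precision_classify, quantized_classify, docs)
    # print(f"agreement: {rate:.1%}")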

Key Connections

  • enables Embedding Generation — can generate embeddings locally
  • scoped_to LLM-Assisted Data Systems

Definition

What it is

A compact model (typically under 10B parameters) suitable for local or edge deployment, often distilled from a larger model to retain key capabilities at lower computational cost.
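
For concreteness, distillation classically trains the student to match the teacher's softened output distribution (Hinton et al., 2015). A minimal PyTorch sketch of that loss term follows; the temperature value is an illustrative default, and in practice this term is usually mixed with ordinary cross-entropy on ground-truth labels.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL divergence between softened teacher and student distributions."""
        t = temperature
        soft_teacher = F.softmax(teacher_logits / t, dim=-1)
        log_student = F.log_softmax(student_logits / t, dim=-1)
        # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t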

Why it exists

Processing large volumes of S3-stored data through cloud LLM APIs is expensive. Small models can run on local or in-region hardware, enabling cost-effective embedding generation, classification, and metadata extraction at scale without per-token charges, and without egress fees when compute stays colocated with the data.
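
A back-of-envelope calculation makes the gap concrete. Every number below is an assumption chosen for illustration (token density and API pricing vary widely between providers), not a quoted price.

    # Illustrative only: embedding 1 TB of S3-stored text via a hosted API.
    TOKENS_PER_BYTE = 0.25        # rough heuristic: ~4 bytes of English text per token
    corpus_bytes = 1e12           # 1 TB
    api_price_per_mtok = 0.10     # assumed $/1M tokens; check your provider

    tokens = corpus_bytes * TOKENS_PER_BYTE
    api_cost = tokens / 1e6 * api_price_per_mtok
    print(f"~{tokens / 1e9:.0f}B tokens -> ~${api_cost:,.0f} in per-token fees")
    # -> ~250B tokens -> ~$25,000 under these assumptions, before egress.
    # Local inference trades that recurring bill for a fixed hardware cost.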

Primary use cases

Local embedding generation for S3-stored content, on-premise data classification, edge inference for IoT data stored in S3.
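
As a concrete sketch of the classification use case, a compact zero-shot classifier can label documents entirely on-premise. The model choice and label set below are illustrative assumptions.

    from transformers import pipeline

    # facebook/bart-large-mnli (~400M parameters) is one common compact
    # zero-shot classifier; any small NLI model works the same way.
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    labels = ["invoice", "contract", "log file", "source code"]
    result = classifier("Payment is due within 30 days of receipt.", labels)
    print(result["labels"][0], f"{result['scores'][0]:.2f}")  # top label and its score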

Relationships

Outbound Relationships

Resources