RAG (Retrieval-Augmented Generation)

Augmenting model responses by retrieving relevant documents from an external knowledge base at inference time.

Full Definition

Retrieval-Augmented Generation (RAG) is an architecture that connects a language model to an external knowledge store. When a query arrives, a retrieval system (typically dense vector search over a vector database) fetches the most relevant documents. These documents are then injected into the model's context alongside the query, and the model generates its response conditioned on both its parametric knowledge and the retrieved context. RAG addresses the knowledge cutoff problem, because retrieved documents can be more recent than the model's training data. It reduces hallucination, because responses are grounded in specific sources. It also enables deployment over private knowledge bases without retraining the model. It is the dominant architecture for enterprise AI applications and has largely superseded naive fine-tuning for knowledge-intensive tasks.
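The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. A toy bag-of-words vector stands in for a dense embedding model, an in-memory list stands in for the vector database, and the final language-model call is omitted. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, not a library API.

```python
import math
import re
from collections import Counter

# ASSUMPTION: a toy bag-of-words term-count vector stands in for a real
# dense embedding model.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Retrieval step: score every document against the query and keep the top k.
    # An in-memory list stands in for a vector database here.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Augmentation step: inject the retrieved documents alongside the query.
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

docs = [
    "The refund window for all products is 30 days from delivery.",
    "Our office is closed on public holidays.",
    "Refund payments go to the original payment method.",
]
question = "How many days is the refund window?"
top = retrieve(question, docs, k=2)
prompt = build_prompt(question, top)
print(prompt)  # this string would then be sent to the language model
```

In a real deployment, `embed` would call an embedding model, the document vectors would be computed once and stored in a vector database, and `prompt` would be passed to the language model for the generation step.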

Examples

A legal AI tool that embeds 50,000 case documents into a Pinecone vector database and retrieves the top-5 most relevant cases for each user query before generating an answer.

A customer support bot that fetches the relevant product manual section before answering a technical question, citing the exact page number.
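For the support bot to cite an exact page, the source metadata has to stay attached to each retrieved chunk all the way into the prompt. This is a minimal sketch that assumes retrieval has already returned the chunks. The `Chunk` record, its `source` and `page` fields, `build_cited_prompt`, and the manual text are all hypothetical and for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    # HYPOTHETICAL record for a retrieved manual section; field names are illustrative.
    text: str
    source: str
    page: int

def build_cited_prompt(question: str, chunks: list[Chunk]) -> str:
    # Number each chunk and keep its source and page next to the text, so the
    # model can cite "[1]" and the application can map that back to a page.
    lines = [
        f"[{i}] ({c.source}, p. {c.page}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    ]
    return (
        "Answer the question using only the sources below. "
        "Cite the source number and page for every claim.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )

retrieved = [
    Chunk("Hold the reset button for 10 seconds to restore factory settings.", "Router Manual", 42),
    Chunk("The status LED blinks amber while a reset is in progress.", "Router Manual", 43),
]
prompt = build_cited_prompt("How do I factory-reset the router?", retrieved)
print(prompt)
```

Putting the page number inside the prompt is only half of the job. The model still has to follow the citation instruction, so applications often check each cited number against the retrieved chunks before displaying the answer.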

Related Terms

Vector Database

A database optimised for storing and querying high-dimensional embedding vectors…

Embedding

A dense numerical vector that represents a token, sentence, or document in a con…

Grounding

Connecting model outputs to verifiable external sources to reduce hallucination …