Definition
Retrieval-Augmented Generation (RAG) is a technique that enhances large language models by retrieving relevant documents from a knowledge base before generating responses. This grounds the AI’s output in factual, up-to-date information rather than relying solely on training data.
Why it matters
RAG is particularly valuable for knowledge-intensive domains where accuracy and currency are critical. Traditional LLMs may generate plausible but outdated or incorrect information. RAG addresses this by:
- Grounding responses in sources — every answer references specific documents from the knowledge base
- Maintaining currency — knowledge bases can be updated without expensive model retraining
- Reducing hallucination — the model generates from retrieved facts, not memorized patterns
- Enabling auditability — citations allow users to verify AI-generated responses
How it works
Question → Embed → Search KB → Retrieve docs → Generate → Response
              │                       │
              └── vector similarity ──┘
- User submits a question
- System converts the question to an embedding and searches the knowledge base
- Most relevant documents are retrieved
- LLM generates response using retrieved context
- Response includes source citations for verification
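The steps above can be sketched end to end. This is a toy sketch, not a production implementation: the embedding function below is a term-frequency stand-in for a real embedding model, and the generation step is a string template standing in for an LLM call.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector.
    A real system would use a learned dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Vector similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, kb, k=2):
    """Rank knowledge-base documents by similarity to the question
    and return the top k."""
    q = embed(question)
    ranked = sorted(kb, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def generate(question, docs):
    """Placeholder for the LLM call: a real system would send the
    question plus the retrieved context to a language model and
    return its answer alongside the source citations."""
    context = " ".join(docs)
    return f"Answer based on: {context}", docs

kb = [
    "RAG retrieves documents before generating a response.",
    "Embeddings are vector representations of text.",
    "Fine-tuning modifies model weights.",
]
question = "What does RAG retrieve?"
answer, sources = generate(question, retrieve(question, kb))
```

In a real deployment the knowledge base would live in a vector index rather than a Python list, and `sources` would be rendered as citations alongside the answer, enabling the auditability described above.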
Common questions
Q: How is RAG different from fine-tuning?
A: Fine-tuning permanently modifies model weights with new data. RAG retrieves information at query time, making it easier to update and audit. RAG is preferred when source material changes frequently.
Q: Can RAG hallucinate?
A: RAG significantly reduces hallucination by grounding responses in retrieved documents, but it does not eliminate it: answer quality depends on the completeness of the knowledge base and the accuracy of retrieval. If relevant documents are missing, or the retriever surfaces the wrong passages, the model can still generate unsupported claims.
Q: Why not just use a search engine?
A: Search engines return documents; RAG synthesizes information across multiple sources into a coherent answer with proper context.
Related terms
- LLM — the generation component that produces natural language responses
- Embeddings — vector representations enabling semantic search
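Semantic search over embeddings typically reduces to cosine similarity between vectors. A minimal sketch, using made-up three-dimensional vectors as stand-ins for real embedding-model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    near 1.0 means semantically similar, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

q  = [0.9, 0.1, 0.3]  # hypothetical question embedding
d1 = [0.8, 0.2, 0.3]  # document on the same topic
d2 = [0.1, 0.9, 0.2]  # unrelated document

sim1 = cosine_similarity(q, d1)  # high: d1 would be retrieved first
sim2 = cosine_similarity(q, d2)  # low: d2 would rank below d1
```

Real embedding vectors have hundreds or thousands of dimensions, and production systems use approximate nearest-neighbor indexes instead of comparing against every document, but the similarity computation is the same.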
References
Lewis et al. (2020), “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”, NeurIPS.
Gao et al. (2023), “Retrieval-Augmented Generation for Large Language Models: A Survey”, arXiv.
Izacard & Grave (2021), “Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering”, EACL.