Retrieval-Augmented Generation (RAG)
An AI pattern that retrieves relevant documents from a vector database and injects them into the LLM prompt, so the model can answer from custom knowledge it was not trained on.
RAG combines (1) an embedding model that turns documents and queries into vectors, (2) a vector store (pgvector, Pinecone, Qdrant, Weaviate) that performs fast nearest-neighbour search, and (3) an LLM that conditions its answer on the retrieved snippets. It is the dominant production pattern for "chat with your docs" use cases such as Slack history, codebases, policy documents, and support tickets. Modern RAG pipelines add hybrid search (vector + BM25), re-rankers, query rewriting, and citation enforcement.
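The three components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `embed`, `InMemoryStore`, and `build_prompt` are hypothetical names, the bag-of-words `embed` function only stands in for a real embedding model, the linear scan stands in for a real vector store, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector store; scans every item linearly."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, doc: str) -> None:
        self.items.append((embed(doc), doc))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(question: str, snippets: list[str]) -> str:
    """Inject retrieved snippets into the prompt, numbered so they can be cited."""
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer using only the numbered context below and cite the snippet number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


store = InMemoryStore()
store.add("Refunds are issued within 14 days of purchase.")
store.add("The office is closed on public holidays.")
store.add("Support tickets are answered within one business day.")

question = "How many days do I have to request a refund?"
snippets = store.search(question, k=1)
prompt = build_prompt(question, snippets)
print(prompt)  # this string would be sent to the LLM of your choice
```

In a real system, `embed` would call an embedding model and `InMemoryStore.search` would be replaced by a query against one of the vector stores named above; the prompt-assembly step stays essentially the same.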
Related terms
A database optimized for similarity search over high-dimensional embedding vectors — the backbone of RAG and semantic search.
Dense numerical vector representations of text (or images, code, audio) where semantically similar inputs map to nearby vectors.
A neural network with billions of parameters trained on broad text corpora to predict and generate language — the engine behind ChatGPT, Claude, and Gemini.
AI systems where an LLM plans and executes multi-step tasks by calling external tools, accessing files, browsing, and adjusting its own approach based on results.