AWS Bedrock RAG in 2026: Knowledge Bases, Embedding Pipeline, and Retrieval-Augmented Generation
Build a RAG system with AWS Bedrock: Knowledge Bases with S3 and OpenSearch, embedding pipeline, retrieval queries, Claude integration, and Terraform configuration.
AWS Bedrock's Knowledge Bases provide fully managed RAG: point them at an S3 bucket and the service handles chunking, embedding, and vector storage in OpenSearch Serverless or Aurora pgvector, exposing a single API for retrieval. There is no vector database to manage and no embedding model to deploy.
This post covers Bedrock Knowledge Bases setup, the ingestion pipeline (S3 → embed → vector store), retrieval with RetrieveAndGenerate, custom RAG with Retrieve + InvokeModel for more control, streaming responses, and Terraform configuration.
Architecture
Documents (S3)
  ↓ Ingestion job
Chunks + Embeddings (Titan Embeddings v2)
  ↓
Vector Store (OpenSearch Serverless or Aurora pgvector)
  ↓ Retrieval
Top-K relevant chunks
  ↓ Augmented prompt
Claude Sonnet (or another Bedrock model)
  ↓
Answer with citations
Terraform: Knowledge Base Setup
# terraform/bedrock-kb.tf
# S3 bucket for knowledge base documents
resource "aws_s3_bucket" "kb_documents" {
bucket = "${var.name}-${var.environment}-kb-documents"
tags = var.common_tags
}
resource "aws_s3_bucket_versioning" "kb_documents" {
bucket = aws_s3_bucket.kb_documents.id
versioning_configuration { status = "Enabled" }
}
# OpenSearch Serverless collection (vector store)
resource "aws_opensearchserverless_collection" "kb" {
name = "${var.name}-${var.environment}-kb"
type = "VECTORSEARCH"
tags = var.common_tags
depends_on = [
aws_opensearchserverless_security_policy.encryption,
aws_opensearchserverless_security_policy.network,
aws_opensearchserverless_access_policy.kb,
]
}
resource "aws_opensearchserverless_security_policy" "encryption" {
name = "${var.name}-${var.environment}-kb-enc"
type = "encryption"
policy = jsonencode({
Rules = [{ ResourceType = "collection", Resource = ["collection/${var.name}-${var.environment}-kb"] }]
AWSOwnedKey = true
})
}
resource "aws_opensearchserverless_security_policy" "network" {
name = "${var.name}-${var.environment}-kb-net"
type = "network"
policy = jsonencode([{
Rules = [
{ ResourceType = "dashboard", Resource = ["collection/${var.name}-${var.environment}-kb"] },
{ ResourceType = "collection", Resource = ["collection/${var.name}-${var.environment}-kb"] }
]
AllowFromPublic = true # simplest setup; restrict with VPC endpoints in production
}])
}
resource "aws_opensearchserverless_access_policy" "kb" {
name = "${var.name}-${var.environment}-kb-access"
type = "data"
policy = jsonencode([{
Rules = [
{
ResourceType = "index"
Resource = ["index/${var.name}-${var.environment}-kb/*"]
Permission = ["aoss:*"]
},
{
ResourceType = "collection"
Resource = ["collection/${var.name}-${var.environment}-kb"]
Permission = ["aoss:*"]
}
]
Principal = [
aws_iam_role.bedrock_kb.arn,
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
]
}])
}
# IAM role for Bedrock Knowledge Base
resource "aws_iam_role" "bedrock_kb" {
name = "${var.name}-${var.environment}-bedrock-kb"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Service = "bedrock.amazonaws.com" }
Action = "sts:AssumeRole"
Condition = {
StringEquals = { "aws:SourceAccount" = data.aws_caller_identity.current.account_id }
}
}]
})
}
resource "aws_iam_role_policy" "bedrock_kb" {
name = "kb-permissions"
role = aws_iam_role.bedrock_kb.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = ["s3:GetObject", "s3:ListBucket"]
Resource = [
aws_s3_bucket.kb_documents.arn,
"${aws_s3_bucket.kb_documents.arn}/*"
]
},
{
Effect = "Allow"
Action = ["aoss:APIAccessAll"]
Resource = aws_opensearchserverless_collection.kb.arn
},
{
Effect = "Allow"
Action = ["bedrock:InvokeModel"]
Resource = "arn:aws:bedrock:${var.region}::foundation-model/amazon.titan-embed-text-v2:0"
}
]
})
}
# Bedrock Knowledge Base
resource "aws_bedrockagent_knowledge_base" "main" {
name = "${var.name}-${var.environment}-kb"
role_arn = aws_iam_role.bedrock_kb.arn
knowledge_base_configuration {
type = "VECTOR"
vector_knowledge_base_configuration {
embedding_model_arn = "arn:aws:bedrock:${var.region}::foundation-model/amazon.titan-embed-text-v2:0"
}
}
storage_configuration {
type = "OPENSEARCH_SERVERLESS"
opensearch_serverless_configuration {
collection_arn = aws_opensearchserverless_collection.kb.arn
vector_index_name = "bedrock-kb-index"
field_mapping {
vector_field = "embedding"
text_field = "AMAZON_BEDROCK_TEXT_CHUNK"
metadata_field = "AMAZON_BEDROCK_METADATA"
}
}
}
tags = var.common_tags
}
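One prerequisite the Terraform above does not cover: the vector index itself. The knowledge base expects the index named in vector_index_name (bedrock-kb-index) to already exist inside the collection; the AWS provider does not create it, so it is typically created with the opensearch Terraform provider or a one-off API call. Below is a sketch of the index body, with field names matching the field_mapping above; the 1024 dimension assumes Titan Embed Text v2's default output size.

```typescript
// Index body for the pre-created "bedrock-kb-index" vector index (sketch).
// Field names must match the knowledge base field_mapping; the vector
// dimension must match the embedding model (Titan Embed Text v2 defaults to 1024).
const bedrockKbIndexBody = {
  settings: { index: { knn: true } },
  mappings: {
    properties: {
      embedding: {
        type: "knn_vector",
        dimension: 1024,
        method: { name: "hnsw", engine: "faiss", space_type: "l2" },
      },
      AMAZON_BEDROCK_TEXT_CHUNK: { type: "text" },
      AMAZON_BEDROCK_METADATA: { type: "text", index: false },
    },
  },
};
```

If the index is missing or its dimension does not match the embedding model, knowledge base creation or the first ingestion job fails.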
# Data source: S3 documents
resource "aws_bedrockagent_data_source" "documents" {
knowledge_base_id = aws_bedrockagent_knowledge_base.main.id
name = "documents"
data_source_configuration {
type = "S3"
s3_configuration {
bucket_arn = aws_s3_bucket.kb_documents.arn
}
}
vector_ingestion_configuration {
chunking_configuration {
chunking_strategy = "FIXED_SIZE"
fixed_size_chunking_configuration {
max_tokens = 512
overlap_percentage = 20
}
}
}
}
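To build intuition for the FIXED_SIZE settings above, here is an illustrative word-based sketch of windowed chunking. Bedrock's actual chunker works on tokens, so real boundaries differ, but the arithmetic is the same: 512-unit windows with 20% overlap mean consecutive chunks share roughly 102 units.

```typescript
// Illustrative fixed-size chunking with overlap (word-based approximation;
// Bedrock chunks on tokens, so real boundaries will differ).
function chunkFixedSize(words: string[], maxTokens: number, overlapPct: number): string[][] {
  const overlap = Math.floor((maxTokens * overlapPct) / 100);
  const step = maxTokens - overlap; // each chunk starts `step` words after the previous one
  const chunks: string[][] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxTokens));
    if (start + maxTokens >= words.length) break; // last chunk reached the end
  }
  return chunks;
}

// With maxTokens=512 and 20% overlap, consecutive chunks share 102 words.
const words = Array.from({ length: 1000 }, (_, i) => `w${i}`);
const chunks = chunkFixedSize(words, 512, 20);
```

Larger overlap improves recall across chunk boundaries at the cost of more embeddings to store and search.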
Ingestion: Upload Documents and Sync
// lib/bedrock/knowledge-base.ts
import {
BedrockAgentClient,
StartIngestionJobCommand,
GetIngestionJobCommand,
} from "@aws-sdk/client-bedrock-agent";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
const agent = new BedrockAgentClient({ region: process.env.AWS_REGION });
const s3 = new S3Client({ region: process.env.AWS_REGION });
const KB_ID = process.env.BEDROCK_KB_ID!;
const DS_ID = process.env.BEDROCK_DS_ID!;
const KB_S3_BUCKET = process.env.KB_S3_BUCKET!;
// Upload a document to S3 (Knowledge Base will ingest on next sync)
export async function uploadDocument(
key: string,
content: string | Buffer,
contentType: "text/plain" | "text/markdown" | "application/pdf" = "text/plain"
) {
await s3.send(new PutObjectCommand({
Bucket: KB_S3_BUCKET,
Key: key,
Body: content,
ContentType: contentType,
Metadata: {
// Note: Bedrock does not surface S3 object metadata in retrieval results;
// filterable retrieval metadata requires a companion `<key>.metadata.json`
// sidecar file next to the document.
source: key,
uploadedAt: new Date().toISOString(),
},
}));
}
// Trigger ingestion job (processes new/changed S3 documents)
export async function syncKnowledgeBase(): Promise<string> {
const { ingestionJob } = await agent.send(new StartIngestionJobCommand({
knowledgeBaseId: KB_ID,
dataSourceId: DS_ID,
}));
return ingestionJob!.ingestionJobId!;
}
// Wait for ingestion to complete
export async function waitForIngestion(jobId: string, timeoutMs = 300_000): Promise<void> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const { ingestionJob } = await agent.send(new GetIngestionJobCommand({
knowledgeBaseId: KB_ID,
dataSourceId: DS_ID,
ingestionJobId: jobId,
}));
const status = ingestionJob?.status;
if (status === "COMPLETE") return;
if (status === "FAILED") throw new Error(`Ingestion failed: ${ingestionJob?.failureReasons}`);
await new Promise((r) => setTimeout(r, 5000));
}
throw new Error("Ingestion timed out");
}
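Worth knowing: the S3 object metadata set in uploadDocument is not what Bedrock uses for retrieval filtering. Knowledge Bases read filterable attributes from a companion sidecar object stored next to the document and named with a .metadata.json suffix. A sketch of building one (the attribute names are our own illustration):

```typescript
// Sketch: build the sidecar body Bedrock Knowledge Bases read filterable
// metadata from. The sidecar lives next to the document in S3, named
// "<document-key>.metadata.json". Attribute names below are illustrative.
function buildMetadataSidecar(
  docKey: string,
  attributes: Record<string, string | number | boolean>
) {
  return {
    key: `${docKey}.metadata.json`,
    body: JSON.stringify({ metadataAttributes: attributes }),
  };
}

const sidecar = buildMetadataSidecar("docs/handbook.pdf", { category: "handbook", year: 2026 });
// sidecar.key → "docs/handbook.pdf.metadata.json"
// Upload with PutObjectCommand alongside the document, then re-run syncKnowledgeBase().
```

The attributes become available for metadata filters in Retrieve and RetrieveAndGenerate calls after the next ingestion job.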
RetrieveAndGenerate: One-Call RAG
// lib/bedrock/rag.ts
import {
BedrockAgentRuntimeClient,
RetrieveAndGenerateCommand,
RetrieveCommand,
type RetrieveAndGenerateCommandInput,
} from "@aws-sdk/client-bedrock-agent-runtime";
import {
BedrockRuntimeClient,
InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";
const agentRuntime = new BedrockAgentRuntimeClient({ region: process.env.AWS_REGION });
const bedrockRuntime = new BedrockRuntimeClient({ region: process.env.AWS_REGION });
const KB_ID = process.env.BEDROCK_KB_ID!;
export interface RAGResult {
answer: string;
citations: Array<{
text: string;
source: string;
score: number;
}>;
}
// Simple RAG: retrieve + generate in one API call
export async function retrieveAndGenerate(
query: string,
options: { maxResults?: number; systemPrompt?: string } = {}
): Promise<RAGResult> {
const input: RetrieveAndGenerateCommandInput = {
input: { text: query },
retrieveAndGenerateConfiguration: {
type: "KNOWLEDGE_BASE",
knowledgeBaseConfiguration: {
knowledgeBaseId: KB_ID,
modelArn: "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6-v1:0",
retrievalConfiguration: {
vectorSearchConfiguration: {
numberOfResults: options.maxResults ?? 5,
},
},
generationConfiguration: {
// Only set a custom prompt template when a system prompt is provided;
// the template must keep the $search_results$ and $query$ placeholders.
...(options.systemPrompt
? {
promptTemplate: {
textPromptTemplate: `${options.systemPrompt}\n\n$search_results$\n\nQuestion: $query$\n\nAnswer based only on the provided context.`,
},
}
: {}),
inferenceConfig: {
textInferenceConfig: {
maxTokens: 1024,
temperature: 0.1, // Low temperature for factual retrieval
},
},
},
},
},
};
const response = await agentRuntime.send(new RetrieveAndGenerateCommand(input));
const answer = response.output?.text ?? "";
const citations = (response.citations ?? []).flatMap((citation) =>
(citation.retrievedReferences ?? []).map((ref) => ({
text: ref.content?.text ?? "",
source: (ref.location?.s3Location?.uri ?? "").split("/").pop() ?? "",
score: 0, // Score not available in RetrieveAndGenerate response
}))
);
return { answer, citations };
}
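The citations array maps naturally onto a numbered source list for display. A small helper (the function name is ours, not part of the SDK):

```typescript
// Render RAGResult citations as a numbered source list, de-duplicating
// repeated source files while keeping first-seen order.
function formatSources(citations: Array<{ text: string; source: string; score: number }>): string {
  const seen: string[] = [];
  for (const c of citations) {
    if (c.source && !seen.includes(c.source)) seen.push(c.source);
  }
  return seen.map((s, i) => `[${i + 1}] ${s}`).join("\n");
}

const sourcesList = formatSources([
  { text: "...", source: "handbook.pdf", score: 0 },
  { text: "...", source: "faq.md", score: 0 },
  { text: "...", source: "handbook.pdf", score: 0 },
]);
// sourcesList → "[1] handbook.pdf\n[2] faq.md"
```

Appending this list under the answer lets users verify which documents the response drew from.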
Custom RAG: Retrieve + Stream
For more control, retrieve the chunks yourself, build your own prompt, and stream the response:
// lib/bedrock/custom-rag.ts
import {
BedrockAgentRuntimeClient,
RetrieveCommand,
} from "@aws-sdk/client-bedrock-agent-runtime";
import {
BedrockRuntimeClient,
InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";
const agentRuntime = new BedrockAgentRuntimeClient({ region: process.env.AWS_REGION });
const bedrockRuntime = new BedrockRuntimeClient({ region: process.env.AWS_REGION });
const KB_ID = process.env.BEDROCK_KB_ID!;
export async function* retrieveAndStreamAnswer(
query: string,
systemPrompt: string,
options: { maxResults?: number } = {}
): AsyncGenerator<string> {
// Step 1: Retrieve relevant chunks
const { retrievalResults } = await agentRuntime.send(new RetrieveCommand({
knowledgeBaseId: KB_ID,
retrievalQuery: { text: query },
retrievalConfiguration: {
vectorSearchConfiguration: {
numberOfResults: options.maxResults ?? 5,
overrideSearchType: "HYBRID", // Semantic + keyword search
},
},
}));
// Filter low-relevance results
const relevantChunks = (retrievalResults ?? [])
.filter((r) => (r.score ?? 0) > 0.5)
.map((r) => ({
text: r.content?.text ?? "",
source: r.location?.s3Location?.uri?.split("/").pop() ?? "unknown",
score: r.score ?? 0,
}));
if (relevantChunks.length === 0) {
yield "I don't have information about that in my knowledge base.";
return;
}
// Step 2: Build augmented prompt
const context = relevantChunks
.map((c, i) => `[${i + 1}] Source: ${c.source}\n${c.text}`)
.join("\n\n---\n\n");
const userMessage = `Context from knowledge base:\n\n${context}\n\n---\n\nQuestion: ${query}`;
// Step 3: Stream response from Claude
const stream = await bedrockRuntime.send(new InvokeModelWithResponseStreamCommand({
modelId: "anthropic.claude-sonnet-4-6-v1:0",
contentType: "application/json",
accept: "application/json",
body: JSON.stringify({
anthropic_version: "bedrock-2023-05-31",
max_tokens: 1024,
system: systemPrompt,
messages: [{ role: "user", content: userMessage }],
}),
}));
// Stream text chunks
for await (const chunk of stream.body ?? []) {
if (chunk.chunk?.bytes) {
const parsed = JSON.parse(new TextDecoder().decode(chunk.chunk.bytes));
if (parsed.type === "content_block_delta" && parsed.delta?.type === "text_delta") {
yield parsed.delta.text;
}
}
}
}
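With a larger top-K, the context assembled in step 2 can exceed the model's input budget. A simple guard, sketched with a character budget (illustrative; a production version would count tokens):

```typescript
// Keep highest-scoring chunks while total context stays under a character
// budget. Characters are a rough proxy; count tokens in production.
function fitToBudget<T extends { text: string; score: number }>(chunks: T[], maxChars: number): T[] {
  const sorted = [...chunks].sort((a, b) => b.score - a.score);
  const kept: T[] = [];
  let used = 0;
  for (const chunk of sorted) {
    if (used + chunk.text.length > maxChars) continue; // skip chunks that would overflow
    kept.push(chunk);
    used += chunk.text.length;
  }
  return kept;
}

const kept = fitToBudget(
  [
    { text: "aaaa", score: 0.9 },
    { text: "bbbb", score: 0.8 },
    { text: "cc", score: 0.7 },
  ],
  6
);
// keeps the 0.9 chunk (4 chars) and the 0.7 chunk (2 chars); the 0.8 chunk would overflow
```

Dropping the lowest-value overflow chunks keeps latency and cost predictable as the knowledge base grows.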
API Route: Streaming RAG Endpoint
// app/api/ai/ask/route.ts
import { NextRequest } from "next/server";
import { getWorkspaceContext } from "@/lib/auth/workspace-context";
import { retrieveAndStreamAnswer } from "@/lib/bedrock/custom-rag";
export async function POST(req: NextRequest) {
const ctx = await getWorkspaceContext();
if (!ctx) return new Response("Unauthorized", { status: 401 });
const { query } = await req.json();
if (!query?.trim()) return new Response("Query required", { status: 400 });
const systemPrompt = `You are a helpful assistant for ${ctx.workspace.name}.
Answer questions based only on the provided context.
If the context doesn't contain the answer, say so clearly.
Cite your sources using [1], [2], etc.`;
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
try {
for await (const chunk of retrieveAndStreamAnswer(query, systemPrompt)) {
controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text: chunk })}\n\n`));
}
controller.enqueue(encoder.encode("data: [DONE]\n\n"));
} catch (err) {
controller.enqueue(
encoder.encode(`data: ${JSON.stringify({ error: "Generation failed" })}\n\n`)
);
} finally {
controller.close();
}
},
});
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
Connection: "keep-alive",
},
});
}
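On the client, the stream arrives as raw text whose reads can split mid-event (EventSource cannot be used here because the endpoint is a POST). A minimal parser for the data frames this route emits, carrying any partial frame over to the next read:

```typescript
// Parse complete `data: ...` SSE frames out of a growing buffer, returning
// the payloads plus whatever incomplete frame remains for the next read.
function parseSSE(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? ""; // last piece may be an incomplete frame
  for (const part of parts) {
    for (const line of part.split("\n")) {
      if (line.startsWith("data: ")) events.push(line.slice(6));
    }
  }
  return { events, rest };
}

// Example: two complete frames plus a partial one still in flight
const { events, rest } = parseSSE('data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: [DO');
// events → ['{"text":"Hel"}', '{"text":"lo"}'], rest → "data: [DO"
```

The caller feeds rest plus the next chunk from the fetch body reader back into parseSSE, stopping when it sees the [DONE] payload.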
Cost Estimation
| Component | Cost |
|---|---|
| Titan Embeddings v2 | $0.00002/1K tokens (ingestion) |
| Claude Sonnet | $3/M input tokens, $15/M output tokens |
| OpenSearch Serverless | ~$0.24/OCU-hour (min 2 OCU = ~$350/month) |
| Aurora pgvector (alternative) | db.t3.medium ~$60/month |
| S3 storage | $0.023/GB/month |
For small-to-medium RAG workloads (under 100K documents), Aurora pgvector is significantly cheaper than OpenSearch Serverless. Use OpenSearch Serverless for millions of documents or when you need full-text hybrid search.
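The numbers behind that recommendation, sketched with the prices quoted above (treat them as estimates that change over time):

```typescript
// Rough monthly-cost comparison using the prices from the table above.
// Assumes embedding is a one-time ingestion cost; only the vector store recurs.
function monthlyVectorStoreCost(store: "opensearch" | "aurora"): number {
  // OpenSearch Serverless: 2 OCU minimum at ~$0.24/OCU-hour, ~730 h/month
  if (store === "opensearch") return 2 * 0.24 * 730;
  // Aurora pgvector on db.t3.medium: ~$60/month
  return 60;
}

function ingestionEmbeddingCost(totalTokens: number): number {
  return (totalTokens / 1000) * 0.00002; // Titan Embeddings v2 pricing
}

// 100K documents at ~1K tokens each ≈ 100M tokens ≈ $2 one-time embedding cost
const embedCost = ingestionEmbeddingCost(100_000_000);
```

Embedding cost is negligible either way; the vector store's fixed monthly floor (roughly $350 vs $60 here) is what dominates the decision at small scale.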
See Also
- OpenAI Function Calling: function calling for structured RAG outputs
- AWS Lambda Container: running RAG in Lambda containers
- AWS Secrets Manager: storing Bedrock credentials
- AWS Step Functions: orchestrating multi-step RAG pipelines
Working With Viprasol
We build RAG systems on AWS Bedrock for SaaS products, from internal documentation search to customer-facing AI assistants. Our AI team has shipped Bedrock Knowledge Base integrations with streaming responses, citation tracking, and relevance filtering.
What we deliver:
- Bedrock Knowledge Base setup with S3 data source and OpenSearch/pgvector
- Document ingestion pipeline with sync automation
- Custom RAG with retrieve + stream for full control
- Streaming SSE endpoint for real-time response display
- Cost analysis: OpenSearch Serverless vs Aurora pgvector for your document volume
See our AI/ML services or contact us to build your RAG system on AWS Bedrock.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.