Vector Databases: Choosing the Right One for Semantic Search and RAG
Vector database comparison in 2026 — pgvector vs Pinecone vs Weaviate vs Qdrant, embedding models, ANN search, RAG implementation, and when a vector DB is the right choice.
Vector databases store high-dimensional embeddings and retrieve the nearest neighbors at scale. They're the retrieval layer behind RAG (Retrieval-Augmented Generation), semantic search, recommendation systems, and anomaly detection.
Choosing the right one depends on your scale, existing stack, query patterns, and whether you want a managed service or full control. This guide gives you the technical tradeoffs and production implementation patterns for each major option.
What a Vector Database Actually Does
Traditional databases answer: "Give me rows where column = value."
Vector databases answer: "Give me the K most similar vectors to this query vector."
Similarity is measured with a distance or similarity metric — cosine similarity, Euclidean distance, or dot product — and computed using Approximate Nearest Neighbor (ANN) algorithms such as HNSW and IVF, which trade a tiny accuracy loss for orders-of-magnitude speed improvements.
Text → Embedding Model → [0.12, -0.34, 0.89, ..., 0.07] (1536 dimensions for text-embedding-3-small)
↓
Vector Database stores and indexes
↓
Query: "How do I reset my password?"
→ Embed query → Find 5 most similar stored vectors → Return their text content
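To make the similarity step concrete, here is the brute-force version of that query in plain NumPy: the exact computation that ANN indexes like HNSW approximate at scale. The corpus and query vectors are random placeholders.

import numpy as np

# Toy corpus: 1,000 stored embeddings of dimension 1536 (random placeholders)
corpus = np.random.rand(1000, 1536).astype(np.float32)
query = np.random.rand(1536).astype(np.float32)

# Cosine similarity = dot product of L2-normalized vectors
corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)
similarities = corpus_norm @ query_norm

# Exact top-5 nearest neighbors; ANN indexes (HNSW, IVF) approximate this
top_k = np.argsort(similarities)[::-1][:5]
print(top_k, similarities[top_k])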
Embedding Models (2026)
Before choosing a vector database, choose your embedding model — it determines vector dimensions and accuracy:
| Model | Dimensions | Cost | Best For |
|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | $0.02/1M tokens | General purpose, cost-effective |
| OpenAI text-embedding-3-large | 3072 | $0.13/1M tokens | Higher accuracy tasks |
| Cohere embed-v3 | 1024 | $0.10/1M tokens | Multilingual, search-optimized |
| Google textembedding-gecko | 768 | GCP pricing | Google Cloud native |
| BGE-M3 (open source) | 1024 | Free (self-host) | Cost-sensitive, multilingual |
| Nomic Embed (open source) | 768 | Free (self-host) | Long documents |
For most RAG applications, text-embedding-3-small is the right default: excellent quality, low cost, and its 1536 dimensions are well supported by every major vector DB.
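A practical detail worth knowing: the text-embedding-3 models accept an optional `dimensions` parameter that shortens the returned vector, trading a little accuracy for smaller indexes and faster search. A minimal sketch (the 512 value is purely illustrative):

import openai

client = openai.OpenAI()

# Request a shortened embedding; useful when index memory is tight
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
    dimensions=512,  # illustrative value; the model's default is 1536
)
embedding = response.data[0].embedding
print(len(embedding))  # 512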
🤖 AI Is Not the Future — It Is Right Now
Businesses using AI automation cut manual work by 60–80%. We build production-ready AI systems — RAG pipelines, LLM integrations, custom ML models, and AI agent workflows.
- LLM integration (OpenAI, Anthropic, Gemini, local models)
- RAG systems that answer from your own data
- AI agents that take real actions — not just chat
- Custom ML models for prediction, classification, detection
Vector Database Comparison
pgvector (PostgreSQL Extension)
The simplest starting point. If you're already on PostgreSQL, pgvector adds vector storage and similarity search as a first-class extension.
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Table with vector column
CREATE TABLE document_chunks (
id BIGSERIAL PRIMARY KEY,
source TEXT NOT NULL,
chunk_index INT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI text-embedding-3-small dimensions
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- HNSW index for fast approximate nearest neighbor search
CREATE INDEX idx_embeddings_hnsw ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
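-- Query-time recall/speed knob for HNSW (value below is illustrative; pgvector's default is 40)
SET hnsw.ef_search = 100;  -- higher = better recall, slower queries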
-- Similarity search query
SELECT
id,
source,
content,
1 - (embedding <=> $1::vector) AS similarity
FROM document_chunks
WHERE 1 - (embedding <=> $1::vector) > 0.7 -- Minimum similarity threshold
ORDER BY embedding <=> $1::vector -- <=> is cosine distance operator
LIMIT 5;
# Python: store and query embeddings with pgvector
import os

import numpy as np
import openai
import psycopg2
from pgvector.psycopg2 import register_vector

client = openai.OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def connect():
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    register_vector(conn)  # adapt numpy arrays to the vector column type
    return conn

def store_chunk(conn, source: str, chunk_index: int, content: str):
    embedding = np.array(get_embedding(content))
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO document_chunks (source, chunk_index, content, embedding)
               VALUES (%s, %s, %s, %s)""",
            (source, chunk_index, content, embedding)
        )
    conn.commit()

def semantic_search(conn, query: str, limit: int = 5) -> list[dict]:
    query_embedding = np.array(get_embedding(query))
    with conn.cursor() as cur:
        cur.execute(
            """SELECT id, source, content,
                      1 - (embedding <=> %s) AS similarity
               FROM document_chunks
               ORDER BY embedding <=> %s
               LIMIT %s""",
            (query_embedding, query_embedding, limit)
        )
        rows = cur.fetchall()
    return [
        {"id": r[0], "source": r[1], "content": r[2], "similarity": float(r[3])}
        for r in rows
    ]
pgvector strengths:
- No new infrastructure — runs inside PostgreSQL
- Full SQL queries — filter by metadata, join with other tables
- ACID transactions
- Same backup/monitoring as your existing DB
pgvector limitations:
- Performance at very large scale (>10M vectors) requires careful tuning
- Memory-intensive (HNSW index must fit in memory for best performance)
- Not designed for billion-scale vector search
Best for: < 5M vectors, existing PostgreSQL stack, need SQL joins with relational data
Pinecone (Managed)
Fully managed, serverless vector database. No infrastructure to manage.
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Create index
pc.create_index(
name="documents",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("documents")
# Upsert vectors
vectors = [
{
"id": f"doc-{i}",
"values": get_embedding(chunk),
"metadata": {"source": source, "chunk_index": i, "content": chunk}
}
for i, chunk in enumerate(chunks)
]
index.upsert(vectors=vectors, namespace="production")
# Query with metadata filtering
results = index.query(
vector=get_embedding(query),
top_k=5,
namespace="production",
filter={"source": {"$eq": "user-manual.pdf"}}, # Filter by metadata
include_metadata=True
)
for match in results.matches:
print(f"Score: {match.score:.4f} | {match.metadata['content'][:100]}")
Pinecone strengths:
- Fully managed — no ops overhead
- Scales to billions of vectors
- Fast query latency (~10–30ms p99)
- Namespace isolation for multi-tenancy
Pinecone limitations:
- Vendor lock-in
- Cost: $70–$100+/month for meaningful scale
- No SQL — metadata filtering only
Best for: Teams that want zero vector DB ops, high-scale production workloads, startups willing to pay for a managed service
Qdrant (Open Source / Managed)
High-performance, written in Rust. Self-host or use Qdrant Cloud.
from qdrant_client import QdrantClient, models
client = QdrantClient(url="http://localhost:6333") # Or Qdrant Cloud URL
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=models.VectorParams(
size=1536,
distance=models.Distance.COSINE
)
)
# Upsert points
client.upsert(
collection_name="documents",
points=[
models.PointStruct(
id=i,
vector=get_embedding(chunk),
payload={
"source": source,
"content": chunk,
"chunk_index": i
}
)
for i, chunk in enumerate(chunks)
]
)
# Semantic search with filter
results = client.search(
collection_name="documents",
query_vector=get_embedding(query),
query_filter=models.Filter(
must=[
models.FieldCondition(
key="source",
match=models.MatchValue(value="user-manual.pdf")
)
]
),
limit=5,
with_payload=True
)
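If you filter on a payload field often (like `source` above), adding a payload index makes those filters much cheaper. A minimal sketch against the collection defined above:

# Index the "source" payload field as a keyword for fast exact-match filtering
client.create_payload_index(
    collection_name="documents",
    field_name="source",
    field_schema=models.PayloadSchemaType.KEYWORD,
)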
Qdrant strengths:
- Rust performance — very fast, low memory
- Rich filtering (nested conditions, geo search)
- Open source — full control
- Qdrant Cloud available for managed option
Qdrant limitations:
- Newer ecosystem vs Pinecone
- Self-hosted requires ops knowledge
Best for: Performance-sensitive applications, teams comfortable with self-hosting, cost-sensitive at scale
Weaviate
Schema-based, with a GraphQL API and native multi-modal support. The snippets below use the v3 Python client (the newer v4 client exposes a different, collections-based API).
import weaviate
client = weaviate.Client(url="http://localhost:8080")
# Schema definition (enforced)
client.schema.create_class({
"class": "Document",
"vectorizer": "text2vec-openai", # Weaviate handles embedding automatically
"moduleConfig": {
"text2vec-openai": {"model": "text-embedding-3-small"}
},
"properties": [
{"name": "content", "dataType": ["text"]},
{"name": "source", "dataType": ["string"]},
{"name": "chunkIndex", "dataType": ["int"]},
]
})
# Insert (Weaviate auto-embeds via configured vectorizer)
client.data_object.create(
data_object={"content": chunk, "source": source, "chunkIndex": i},
class_name="Document"
)
# Semantic search
result = (
client.query
.get("Document", ["content", "source"])
.with_near_text({"concepts": [query]})
.with_limit(5)
.with_additional(["certainty"])
.do()
)
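Weaviate also supports hybrid search, which blends BM25 keyword scoring with vector similarity. A minimal sketch using the same v3 client (`alpha` weights the vector side; 0.5 is an even blend):

# Hybrid search: alpha=1.0 is pure vector, alpha=0.0 is pure keyword (BM25)
result = (
    client.query
    .get("Document", ["content", "source"])
    .with_hybrid(query=query, alpha=0.5)
    .with_limit(5)
    .do()
)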
Best for: Auto-vectorization workflows, multi-modal (text + images), GraphQL-native teams
Decision Framework
| Criteria | pgvector | Pinecone | Qdrant | Weaviate |
|---|---|---|---|---|
| Scale < 5M vectors | ✅ Best | ✅ | ✅ | ✅ |
| Scale 5M–100M | ⚠️ Needs tuning | ✅ | ✅ | ✅ |
| Scale > 100M | ❌ | ✅ | ✅ | ✅ |
| Existing PostgreSQL | ✅ Best | — | — | — |
| Zero ops preference | — | ✅ Best | ✅ Cloud | ✅ Cloud |
| Cost sensitivity | ✅ Cheapest | ❌ Expensive | ✅ | ✅ |
| SQL joins needed | ✅ Best | ❌ | ❌ | ❌ |
| Multi-tenancy | ⚠️ RLS | ✅ Namespaces | ✅ Collections | ✅ Tenancy |
Short version:
- Existing PostgreSQL + small-medium scale → pgvector
- Managed, willing to pay → Pinecone
- Self-hosted, high performance → Qdrant
- Auto-vectorization, multi-modal → Weaviate
⚡ Your Competitors Are Already Using AI — Are You?
We build AI systems that actually work in production — not demos that die in a Colab notebook. From data pipeline to deployed model to real business outcomes.
- AI agent systems that run autonomously — not just chatbots
- Integrates with your existing tools (CRM, ERP, Slack, etc.)
- Explainable outputs — know why the model decided what it did
- Free AI opportunity audit for your business
RAG Pipeline with pgvector (Complete Example)
// Complete RAG implementation using pgvector
import OpenAI from 'openai';
import { Pool } from 'pg';
const openai = new OpenAI();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function embedText(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
async function retrieveContext(query: string, limit = 5): Promise<string[]> {
const queryEmbedding = await embedText(query);
const result = await pool.query<{ content: string; similarity: number }>(
`SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM document_chunks
WHERE 1 - (embedding <=> $1::vector) > 0.65
ORDER BY embedding <=> $1::vector
LIMIT $2`,
[JSON.stringify(queryEmbedding), limit]
);
return result.rows.map(r => r.content);
}
async function ragAnswer(question: string): Promise<string> {
const contextChunks = await retrieveContext(question);
if (contextChunks.length === 0) {
return "I don't have information about that in my knowledge base.";
}
const context = contextChunks
.map((chunk, i) => `[${i + 1}] ${chunk}`)
.join('\n\n');
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `Answer questions using only the provided context.
If the answer isn't in the context, say so. Be concise and accurate.
Context:
${context}`,
},
{ role: 'user', content: question },
],
temperature: 0.2,
max_tokens: 500,
});
return response.choices[0].message.content ?? '';
}
Implementation Costs
| Scope | Investment |
|---|---|
| pgvector setup + RAG pipeline | $5,000–$15,000 |
| Pinecone integration + ingestion pipeline | $8,000–$20,000 |
| Full semantic search feature | $15,000–$35,000 |
| Enterprise knowledge base (ingestion + search + chat) | $40,000–$100,000 |
Infrastructure: pgvector adds ~$0 to existing PostgreSQL costs; Pinecone starts at $70/month; Qdrant Cloud from $25/month.
Working With Viprasol
We build vector search and RAG systems — document ingestion pipelines, embedding management, similarity search, and chat interfaces over private knowledge bases.
→ Semantic search consultation →
→ AI & Machine Learning Services →
→ ChatGPT API Integration →
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.
Want to Implement AI in Your Business?
From chatbots to predictive models — harness the power of AI with a team that delivers.
Free consultation • No commitment • Response within 24 hours
Ready to automate your business with AI agents?
We build custom multi-agent AI systems that handle sales, support, ops, and content — across Telegram, WhatsApp, Slack, and 20+ other platforms. We run our own business on these systems.