Advanced PostgreSQL Full-Text Search: Rankings, Phrase Search, Multi-Language, and pg_trgm Hybrid
Build advanced full-text search in PostgreSQL. Covers tsvector/tsquery with rankings, phrase search, multi-language dictionaries, fuzzy matching with pg_trgm, hybrid search combining FTS and trigrams, and search index design.
PostgreSQL's full-text search (tsvector/tsquery) handles most SaaS search needs up to tens of millions of rows. But basic FTS — to_tsvector('english', text) @@ to_tsquery('english', query) — only scratches the surface. Production search needs rankings, phrase search, multi-language support, and graceful handling of partial words and typos.
This guide goes deep on advanced FTS patterns and the pg_trgm hybrid approach that makes PostgreSQL search feel genuinely good.
Review: The Basics
-- Generate tsvector (weighted) and query it
SELECT
title,
ts_rank(search_vector, query) AS rank
FROM documents,
to_tsquery('english', 'postgresql & search') query
WHERE search_vector @@ query
ORDER BY rank DESC;
The starting point. Now let's make it production-grade.
Weighted Search Vectors
Different fields matter differently. Weight them:
-- Weighted tsvector: title (A) > description (B) > body (C)
-- ts_rank default weights: D=0.1, C=0.2, B=0.4, A=1.0
-- Note: the text -> regconfig cast is only STABLE, so it cannot appear directly
-- in a generated column. Wrap it (and array_to_string, also STABLE) in helper
-- functions declared IMMUTABLE -- the standard workaround.
CREATE OR REPLACE FUNCTION doc_tsvector(lang TEXT, content TEXT)
RETURNS TSVECTOR LANGUAGE SQL IMMUTABLE AS
$$ SELECT to_tsvector(lang::regconfig, COALESCE(content, '')) $$;

CREATE OR REPLACE FUNCTION tags_tsvector(tags TEXT[])
RETURNS TSVECTOR LANGUAGE SQL IMMUTABLE AS
$$ SELECT to_tsvector('simple', COALESCE(array_to_string(tags, ' '), '')) $$;

CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  workspace_id UUID NOT NULL,
  title TEXT NOT NULL,
  description TEXT,
  body TEXT,
  tags TEXT[],
  language TEXT NOT NULL DEFAULT 'english',
  author_id UUID,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  -- Stored, computed tsvector so the GIN index has a concrete column
  search_vector TSVECTOR GENERATED ALWAYS AS (
    setweight(doc_tsvector(language, title), 'A') ||
    setweight(doc_tsvector(language, description), 'B') ||
    setweight(doc_tsvector(language, body), 'C') ||
    setweight(tags_tsvector(tags), 'B')
  ) STORED
);
CREATE INDEX idx_documents_fts ON documents USING GIN(search_vector);
CREATE INDEX idx_documents_workspace ON documents(workspace_id, created_at DESC);
-- Query with ranking
SELECT
id,
title,
ts_rank_cd(search_vector, query, 32) AS rank, -- 32 = rescale rank to 0..1 as rank/(rank+1)
ts_headline(
'english',
COALESCE(body, description, ''),
query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=30, MinWords=15, MaxFragments=2'
) AS excerpt
FROM documents,
websearch_to_tsquery('english', $1) query -- Better than to_tsquery for user input
WHERE
workspace_id = $2
AND search_vector @@ query
ORDER BY rank DESC, created_at DESC
LIMIT 20;
websearch_to_tsquery vs to_tsquery:
- to_tsquery('english', 'project management') → ERROR (expects an explicit & operator between terms)
- websearch_to_tsquery('english', 'project management') → 'project' & 'management' ✅
- websearch_to_tsquery('english', '"project management"') → phrase query ✅
- websearch_to_tsquery('english', 'project -test') → exclusion ✅
Always use websearch_to_tsquery for user-facing search.
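websearch_to_tsquery tolerates arbitrary user input without throwing, but it still pays to normalize the raw query string before binding it as a parameter. A minimal sketch; the helper name and the 200-character cap are our own choices, not a PostgreSQL API:

```typescript
// Normalize raw user input before it becomes the websearch_to_tsquery
// argument: trim, collapse whitespace runs, and cap the length so
// pathological inputs don't inflate parse cost or pollute search logs.
export function normalizeSearchInput(raw: string, maxLen = 200): string {
  return raw
    .trim()
    .replace(/\s+/g, " ") // collapse runs of whitespace into single spaces
    .slice(0, maxLen);    // cap length; tsquery parsing cost grows with input
}
```

The normalized string is then passed as a bound parameter, never interpolated into the SQL text.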
Phrase Search
-- Exact phrase: words adjacent in order
SELECT id, title
FROM documents
WHERE search_vector @@ phraseto_tsquery('english', 'database migration')
AND workspace_id = $1;
-- websearch_to_tsquery handles quotes for phrase search
SELECT id, title
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', '"blue green deployment"')
AND workspace_id = $1;
Multi-Language Search
-- Store language per document; query uses the stored language
-- Documents table has: language TEXT DEFAULT 'english'
-- Multi-language query: search across all documents regardless of language
-- Generate tsvector per document using its own language config
CREATE OR REPLACE FUNCTION search_multilang(
p_workspace_id UUID,
p_query TEXT,
p_limit INT DEFAULT 20
)
RETURNS TABLE(id BIGINT, title TEXT, rank REAL, language TEXT)
LANGUAGE SQL STABLE AS $$
WITH queries AS (
SELECT lang, websearch_to_tsquery(lang::regconfig, p_query) AS q
FROM unnest(ARRAY['english','spanish','french','german','portuguese']::text[]) AS lang
WHERE p_query IS NOT NULL
)
SELECT
  d.id,
  d.title,
  ts_rank_cd(d.search_vector, q.q) AS rank,
  d.language
FROM documents d
JOIN queries q ON d.language = q.lang
WHERE d.workspace_id = p_workspace_id
  AND d.search_vector @@ q.q
-- Each document is matched only against the query parsed with its own language
-- config, so there is at most one row per document: no DISTINCT ON or GROUP BY
-- needed. Order by rank so LIMIT keeps the best matches, not the lowest ids.
ORDER BY rank DESC
LIMIT p_limit;
$$;
-- Simpler: store normalized search_vector using the document's own language
-- (already done in GENERATED ALWAYS AS above)
-- For cross-language, also maintain a 'simple' dictionary vector (no stemming)
ALTER TABLE documents ADD COLUMN search_vector_simple TSVECTOR GENERATED ALWAYS AS (
to_tsvector('simple', COALESCE(title, '') || ' ' || COALESCE(description, '') || ' ' || COALESCE(body, ''))
) STORED;
CREATE INDEX idx_documents_fts_simple ON documents USING GIN(search_vector_simple);
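If the application stores ISO language codes ('en', 'es', …) rather than PostgreSQL regconfig names, a thin mapping layer keeps the documents.language column valid. A sketch under that assumption; the code-to-config table below is illustrative, not exhaustive:

```typescript
// Map ISO 639-1 codes to PostgreSQL text search configuration names.
// Only configs shipped with a default PostgreSQL build are listed;
// unknown codes fall back to 'simple' (no stemming, safe for any language).
const LANG_TO_REGCONFIG: Record<string, string> = {
  en: "english",
  es: "spanish",
  fr: "french",
  de: "german",
  pt: "portuguese",
  it: "italian",
  nl: "dutch",
};

export function regconfigFor(isoCode: string): string {
  return LANG_TO_REGCONFIG[isoCode.toLowerCase()] ?? "simple";
}
```

The result is written into documents.language at insert time, so the generated search_vector always references a real config.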
Fuzzy Search with pg_trgm
FTS requires dictionary words — "managment" (typo) won't match "management". pg_trgm handles typos via trigram similarity:
-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Trigram index on title + description for fuzzy matching
CREATE INDEX idx_documents_trgm ON documents
USING GIN((title || ' ' || COALESCE(description, '')) gin_trgm_ops);
-- Fuzzy title search: similarity > 0.3 (0 = no match, 1 = identical)
SELECT id, title, similarity(title, $1) AS sim
FROM documents
WHERE
workspace_id = $2
AND title % $1 -- % operator: similarity threshold (default 0.3)
ORDER BY sim DESC
LIMIT 10;
-- Fuzzy search across title + description combined
SELECT
id,
title,
similarity(title || ' ' || COALESCE(description, ''), $1) AS sim
FROM documents
WHERE
workspace_id = $2
AND (title || ' ' || COALESCE(description, '')) % $1
ORDER BY sim DESC
LIMIT 10;
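To build intuition for what the % operator computes, here is pg_trgm's single-word similarity reimplemented in TypeScript: the word is lowercased and padded (two spaces in front, one behind), split into sliding 3-character windows, and scored as shared trigrams over the union of distinct trigrams. This is a sketch of the single-word case only; real pg_trgm also splits multi-word strings and strips non-alphanumerics.

```typescript
// Approximate pg_trgm trigram extraction for a single word:
// pad to "  word ", slide a 3-char window, keep distinct trigrams.
function trigrams(word: string): Set<string> {
  const padded = `  ${word.toLowerCase()} `;
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// similarity = |shared| / |union| over the two distinct trigram sets,
// which is what the % operator compares against pg_trgm.similarity_threshold.
export function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  return shared / (ta.size + tb.size - shared);
}
```

The typo pair from above illustrates why fuzzy search works: "managment" shares most of its trigrams with "management", so its similarity clears the default 0.3 threshold comfortably.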
Hybrid Search: FTS + Trigram
The best of both worlds: FTS for relevance, trigrams for typo tolerance and partial matching:
// lib/search/hybrid-search.ts
import { prisma } from "@/lib/prisma";
interface SearchResult {
id: string;
title: string;
excerpt: string;
rank: number;
matchType: "exact" | "fuzzy" | "prefix";
}
export async function hybridSearch(
workspaceId: string,
query: string,
limit = 20
): Promise<SearchResult[]> {
if (!query.trim()) return [];
const results = await prisma.$queryRaw<SearchResult[]>`
WITH
-- Stage 1: FTS with ranking (high precision)
fts_results AS (
SELECT
id::text,
title,
ts_rank_cd(search_vector, q) * 2.0 AS rank, -- Weight FTS higher
ts_headline(
'english',
COALESCE(description, LEFT(body, 500), ''),
q,
'StartSel=<mark>, StopSel=</mark>, MaxWords=25, MinWords=10, MaxFragments=1'
) AS excerpt,
'exact'::text AS match_type
FROM documents,
websearch_to_tsquery('english', ${query}) q
WHERE
workspace_id = ${workspaceId}::uuid
AND search_vector @@ q
),
-- Stage 2: Trigram fuzzy (catches typos, partial words)
trgm_results AS (
SELECT
id::text,
title,
similarity(title || ' ' || COALESCE(description, ''), ${query}) AS rank,
LEFT(COALESCE(description, body, ''), 200) AS excerpt,
'fuzzy'::text AS match_type
FROM documents
WHERE
workspace_id = ${workspaceId}::uuid
AND (title || ' ' || COALESCE(description, '')) % ${query}
AND id::text NOT IN (SELECT id FROM fts_results) -- No duplicates
),
-- Stage 3: Prefix match for autocomplete-style queries
prefix_results AS (
SELECT
id::text,
title,
0.5 AS rank,
LEFT(COALESCE(description, body, ''), 200) AS excerpt,
'prefix'::text AS match_type
FROM documents
WHERE
workspace_id = ${workspaceId}::uuid
AND title ILIKE ${query + "%"}
AND id::text NOT IN (SELECT id FROM fts_results)
AND id::text NOT IN (SELECT id FROM trgm_results)
)
SELECT id, title, excerpt, rank::real AS rank, match_type AS "matchType" -- alias to match the SearchResult interface
FROM (
SELECT * FROM fts_results
UNION ALL
SELECT * FROM trgm_results
UNION ALL
SELECT * FROM prefix_results
) combined
ORDER BY rank DESC, title
LIMIT ${limit}
`;
return results;
}
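When the three stages come from separate queries instead of one CTE, the same dedup-and-rank logic can run application-side. A sketch in plain TypeScript; the types mirror the SearchResult interface above:

```typescript
interface StagedResult {
  id: string;
  title: string;
  excerpt: string;
  rank: number;
  matchType: "exact" | "fuzzy" | "prefix";
}

// Merge result stages in priority order: a document already matched by an
// earlier (higher-precision) stage is never re-added by a later stage,
// mirroring the NOT IN guards in the SQL CTEs. Final order: rank desc, title.
export function mergeStages(...stages: StagedResult[][]): StagedResult[] {
  const seen = new Set<string>();
  const merged: StagedResult[] = [];
  for (const stage of stages) {
    for (const r of stage) {
      if (seen.has(r.id)) continue;
      seen.add(r.id);
      merged.push(r);
    }
  }
  return merged.sort(
    (a, b) => b.rank - a.rank || a.title.localeCompare(b.title)
  );
}
```

Call order encodes precedence: mergeStages(ftsResults, trgmResults, prefixResults) keeps the exact match for any document that also fuzzy-matched.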
Search Suggestions / Autocomplete
-- Fast autocomplete: prefix match on title
-- A GIN gin_trgm_ops index accelerates LIKE/ILIKE regardless of where the
-- wildcard falls, but it must cover the searched expression: this query needs
-- a trigram index on title alone (idx_documents_trgm above indexes
-- title || ' ' || description, which this predicate cannot use)
CREATE INDEX idx_documents_title_trgm ON documents USING GIN(title gin_trgm_ops);
SELECT DISTINCT title
FROM documents
WHERE
workspace_id = $1
AND title ILIKE $2 || '%' -- $2 = user input
ORDER BY title
LIMIT 8;
-- Alternatively, use a materialized suggestions table refreshed periodically
CREATE MATERIALIZED VIEW search_suggestions AS
SELECT DISTINCT
workspace_id,
LOWER(title) AS suggestion,
COUNT(*) AS frequency
FROM documents
-- Filter on a published/status flag here if your schema tracks one;
-- the documents table defined above does not carry such a column.
GROUP BY workspace_id, LOWER(title);
CREATE INDEX idx_suggestions ON search_suggestions
USING GIN(suggestion gin_trgm_ops);
CREATE INDEX idx_suggestions_workspace ON search_suggestions(workspace_id, suggestion);
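On the client, suggestion dropdowns usually bold the part the user has not typed yet. A small sketch of that split; the shape of the return value is our own convention:

```typescript
// Split a suggestion into the typed prefix and the completion remainder,
// so the UI can render e.g. "Data" + bold("base Migration").
// Case-insensitive, matching the ILIKE prefix query above; preserves the
// suggestion's original casing in both halves.
export function splitSuggestion(
  suggestion: string,
  typed: string
): { prefix: string; remainder: string } {
  if (suggestion.toLowerCase().startsWith(typed.toLowerCase())) {
    return {
      prefix: suggestion.slice(0, typed.length),
      remainder: suggestion.slice(typed.length),
    };
  }
  // Non-prefix match (e.g. a fuzzy suggestion): render it unhighlighted.
  return { prefix: "", remainder: suggestion };
}
```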
Search Analytics
Track what users search for and what gets no results:
CREATE TABLE search_queries (
id BIGSERIAL PRIMARY KEY,
workspace_id UUID NOT NULL,
user_id UUID,
query TEXT NOT NULL,
result_count INTEGER NOT NULL,
clicked_id UUID, -- Did user click a result?
searched_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_search_queries_workspace ON search_queries(workspace_id, searched_at DESC);
CREATE INDEX idx_search_queries_zero ON search_queries(workspace_id)
WHERE result_count = 0; -- Partial index for zero-result queries
-- Top zero-result queries (missing content to create)
SELECT query, COUNT(*) AS searches
FROM search_queries
WHERE workspace_id = $1 AND result_count = 0
AND searched_at > NOW() - INTERVAL '30 days'
GROUP BY query
ORDER BY searches DESC
LIMIT 20;
-- Search click-through rate by query
SELECT
query,
COUNT(*) AS searches,
COUNT(clicked_id) AS clicks,
ROUND(100.0 * COUNT(clicked_id) / COUNT(*), 1) AS ctr_pct
FROM search_queries
WHERE workspace_id = $1
AND searched_at > NOW() - INTERVAL '30 days'
GROUP BY query
HAVING COUNT(*) >= 5
ORDER BY searches DESC
LIMIT 20;
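The CTR aggregation above is easy to spot-check in application code, for example in tests against logged rows. A sketch; the row shape mirrors the search_queries table, with field names in our own camelCase mapping:

```typescript
interface SearchLogRow {
  query: string;
  clickedId: string | null;
}

// Aggregate click-through rate per query, mirroring the SQL above:
// COUNT(*) per query, clicks counted only where clicked_id is non-null,
// CTR reported as a percentage rounded to one decimal place,
// queries below the minimum search count filtered out (HAVING COUNT(*) >= 5).
export function clickThroughRates(
  rows: SearchLogRow[],
  minSearches = 5
): Array<{ query: string; searches: number; clicks: number; ctrPct: number }> {
  const byQuery = new Map<string, { searches: number; clicks: number }>();
  for (const row of rows) {
    const agg = byQuery.get(row.query) ?? { searches: 0, clicks: 0 };
    agg.searches++;
    if (row.clickedId !== null) agg.clicks++;
    byQuery.set(row.query, agg);
  }
  return [...byQuery.entries()]
    .filter(([, a]) => a.searches >= minSearches)
    .map(([query, a]) => ({
      query,
      searches: a.searches,
      clicks: a.clicks,
      ctrPct: Math.round((1000 * a.clicks) / a.searches) / 10,
    }))
    .sort((a, b) => b.searches - a.searches);
}
```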
TypeScript Search API
// app/api/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { hybridSearch } from "@/lib/search/hybrid-search";
import { prisma } from "@/lib/prisma";
import { z } from "zod";
const SearchSchema = z.object({
q: z.string().min(1).max(200),
limit: z.coerce.number().min(1).max(50).default(20),
});
export async function GET(req: NextRequest) {
const session = await auth();
if (!session?.user) return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
const parsed = SearchSchema.safeParse(
Object.fromEntries(req.nextUrl.searchParams)
);
if (!parsed.success) {
return NextResponse.json({ error: parsed.error.issues[0].message }, { status: 400 });
}
const { q, limit } = parsed.data;
const results = await hybridSearch(session.user.organizationId, q, limit);
// Log search for analytics (fire-and-forget)
prisma.searchQuery.create({
data: {
workspaceId: session.user.organizationId,
userId: session.user.id,
query: q,
resultCount: results.length,
},
}).catch(console.error);
return NextResponse.json({ results, query: q });
}
When to Move Beyond PostgreSQL FTS
| Signal | Action |
|---|---|
| <5M searchable documents | PostgreSQL FTS + pg_trgm (this guide) |
| 5M–50M documents | PostgreSQL with aggressive indexing + materialized suggestions |
| >50M documents | Elasticsearch or OpenSearch |
| Faceted search (filter by 10+ attributes simultaneously) | Elasticsearch |
| Semantic/vector search (meaning, not keywords) | pgvector or Pinecone |
| Real-time personalization | Algolia or custom vector store |
Cost and Timeline Estimates
| Scope | Team | Timeline | Cost Range |
|---|---|---|---|
| Basic FTS with ranking | 1 dev | 1–2 days | $300–600 |
| Hybrid FTS + pg_trgm + highlights | 1 dev | 3–5 days | $800–1,500 |
| Full search (multi-language + analytics + autocomplete) | 1–2 devs | 1–2 weeks | $2,500–5,000 |
| Elasticsearch migration for scale | 2 devs | 3–5 weeks | $8,000–18,000 |
See Also
- PostgreSQL Full-Text Search Basics
- AWS OpenSearch for Analytics and Search
- SaaS Search and Autocomplete
- PostgreSQL JSONB Patterns
- PostgreSQL Materialized Views
Working With Viprasol
Search is often the first feature that feels broken to users — and fixing it after the fact is harder than doing it right. Our team builds PostgreSQL search with weighted tsvectors, phrase search, and pg_trgm fuzzy matching, plus the search analytics that show you what users are looking for and not finding.
What we deliver:
- Weighted stored tsvector (GENERATED ALWAYS AS) with GIN index
- websearch_to_tsquery with ts_headline excerpts and markup
- pg_trgm hybrid: FTS + fuzzy + prefix in a single ranked query
- Multi-language tsvector configuration
- Search analytics (zero-result queries, click-through rate)
Talk to our team about your search implementation →
Or explore our cloud and database services.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.