
Advanced PostgreSQL Full-Text Search: Rankings, Phrase Search, Multi-Language, and pg_trgm Hybrid

Build advanced full-text search in PostgreSQL. Covers tsvector/tsquery with rankings, phrase search, multi-language dictionaries, fuzzy matching with pg_trgm, hybrid search combining FTS and trigrams, and search index design.

Viprasol Tech Team
April 3, 2027
14 min read

PostgreSQL's full-text search (tsvector/tsquery) handles most SaaS search needs up to tens of millions of rows. But basic FTS — to_tsvector('english', text) @@ to_tsquery('english', query) — only scratches the surface. Production search needs rankings, phrase search, multi-language support, and graceful handling of partial words and typos.

This guide goes deep on advanced FTS patterns and the pg_trgm hybrid approach that makes PostgreSQL search feel genuinely good.

Review: The Basics

-- Generate tsvector (weighted) and query it
SELECT
  title,
  ts_rank(search_vector, query) AS rank
FROM documents,
  to_tsquery('english', 'postgresql & search') query
WHERE search_vector @@ query
ORDER BY rank DESC;

The starting point. Now let's make it production-grade.

Weighted Search Vectors

Different fields matter differently. Weight them:

-- Weighted tsvector: title (A) > description (B) > body (C)
-- ts_rank weights A > B > C > D
-- A generated column may only use IMMUTABLE expressions, and casting a
-- column value to regconfig is merely STABLE, so wrap the cast in a helper
-- declared IMMUTABLE (safe as long as the dictionary configs never change).
CREATE FUNCTION doc_tsvector(config TEXT, content TEXT)
RETURNS TSVECTOR
IMMUTABLE LANGUAGE SQL AS
$$ SELECT to_tsvector(config::regconfig, COALESCE(content, '')) $$;

CREATE TABLE documents (
  id            BIGSERIAL PRIMARY KEY,
  workspace_id  UUID NOT NULL,
  title         TEXT NOT NULL,
  description   TEXT,
  body          TEXT,
  tags          TEXT[],
  language      TEXT NOT NULL DEFAULT 'english',
  author_id     UUID,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  -- Stored, computed tsvector for index performance
  search_vector TSVECTOR GENERATED ALWAYS AS (
    setweight(doc_tsvector(language, title), 'A') ||
    setweight(doc_tsvector(language, description), 'B') ||
    setweight(doc_tsvector(language, body), 'C') ||
    setweight(to_tsvector('simple', COALESCE(array_to_string(tags, ' '), '')), 'B')
  ) STORED
);

CREATE INDEX idx_documents_fts ON documents USING GIN(search_vector);
CREATE INDEX idx_documents_workspace ON documents(workspace_id, created_at DESC);
-- Query with ranking
SELECT
  id,
  title,
  ts_rank_cd(search_vector, query, 32) AS rank,  -- 32 = rank/(rank+1), scales scores into 0..1
  ts_headline(
    'english',
    COALESCE(body, description, ''),
    query,
    'StartSel=<mark>, StopSel=</mark>, MaxWords=30, MinWords=15, MaxFragments=2'
  ) AS excerpt
FROM documents,
  websearch_to_tsquery('english', $1) query  -- Better than to_tsquery for user input
WHERE
  workspace_id = $2
  AND search_vector @@ query
ORDER BY rank DESC, created_at DESC
LIMIT 20;

websearch_to_tsquery vs to_tsquery:

  • to_tsquery('english', 'project management') → ERROR (needs & operator)
  • websearch_to_tsquery('english', 'project management') → 'project' & 'management'
  • websearch_to_tsquery('english', '"project management"') → phrase query ✅
  • websearch_to_tsquery('english', 'project -test') → exclude ✅

Always use websearch_to_tsquery for user-facing search.

Phrase Search

-- Exact phrase: words adjacent in order
SELECT id, title
FROM documents
WHERE search_vector @@ phraseto_tsquery('english', 'database migration')
  AND workspace_id = $1;

-- websearch_to_tsquery handles quotes for phrase search
SELECT id, title
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', '"blue green deployment"')
  AND workspace_id = $1;

Multi-Language Search

-- Store language per document; query uses the stored language
-- Documents table has: language TEXT DEFAULT 'english'

-- Multi-language query: search across all documents regardless of language
-- Generate tsvector per document using its own language config
CREATE OR REPLACE FUNCTION search_multilang(
  p_workspace_id UUID,
  p_query TEXT,
  p_limit INT DEFAULT 20
)
RETURNS TABLE(id BIGINT, title TEXT, rank REAL, language TEXT)
LANGUAGE SQL STABLE AS $$
  WITH queries AS (
    SELECT lang, websearch_to_tsquery(lang::regconfig, p_query) AS q
    FROM unnest(ARRAY['english','spanish','french','german','portuguese']::text[]) AS lang
    WHERE p_query IS NOT NULL
  )
  -- The join on language gives each document at most one matching query,
  -- so no DISTINCT or GROUP BY is needed; order by rank, not by id.
  SELECT
    d.id,
    d.title,
    ts_rank_cd(d.search_vector, q.q) AS rank,
    d.language
  FROM documents d
  JOIN queries q ON d.language = q.lang AND d.search_vector @@ q.q
  WHERE d.workspace_id = p_workspace_id
  ORDER BY rank DESC
  LIMIT p_limit;
$$;
-- Simpler: store normalized search_vector using the document's own language
-- (already done in GENERATED ALWAYS AS above)
-- For cross-language, also maintain a 'simple' dictionary vector (no stemming)
ALTER TABLE documents ADD COLUMN search_vector_simple TSVECTOR GENERATED ALWAYS AS (
  to_tsvector('simple', COALESCE(title, '') || ' ' || COALESCE(description, '') || ' ' || COALESCE(body, ''))
) STORED;

CREATE INDEX idx_documents_fts_simple ON documents USING GIN(search_vector_simple);

Fuzzy Search with pg_trgm

FTS matches on stemmed lexemes, so a typo like "managment" never stems to the same lexeme as "management" and simply returns nothing. pg_trgm handles typos via trigram similarity:

-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index on title + description for fuzzy matching
CREATE INDEX idx_documents_trgm ON documents
  USING GIN((title || ' ' || COALESCE(description, '')) gin_trgm_ops);

-- Fuzzy title search: similarity > 0.3 (0 = no match, 1 = identical)
SELECT id, title, similarity(title, $1) AS sim
FROM documents
WHERE
  workspace_id = $2
  AND title % $1  -- % operator: similarity threshold (default 0.3)
ORDER BY sim DESC
LIMIT 10;

-- Fuzzy search across title + description combined
SELECT
  id,
  title,
  similarity(title || ' ' || COALESCE(description, ''), $1) AS sim
FROM documents
WHERE
  workspace_id = $2
  AND (title || ' ' || COALESCE(description, '')) % $1
ORDER BY sim DESC
LIMIT 10;
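
Under the hood, pg_trgm's similarity() is a Jaccard-style overlap of trigram sets. A rough Python mirror of the idea (an approximation for intuition, not the exact C implementation):

```python
import re

def trigrams(s: str) -> set[str]:
    # Approximates pg_trgm's extraction: lowercase, split on
    # non-alphanumerics, pad each word with two leading spaces and one
    # trailing space, then take every 3-character window.
    grams: set[str] = set()
    for word in re.split(r"[^a-z0-9]+", s.lower()):
        if word:
            padded = f"  {word} "
            grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of the two trigram sets, like pg_trgm's similarity().
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0
```

In this sketch, similarity("management", "managment") comes out to roughly 0.62, comfortably above the 0.3 default threshold of the % operator, which is why the typo still matches.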

Hybrid Search: FTS + Trigram

The best of both worlds: FTS for relevance, trigrams for typo tolerance and partial matching:

// lib/search/hybrid-search.ts
import { prisma } from "@/lib/prisma";

interface SearchResult {
  id: string;
  title: string;
  excerpt: string;
  rank: number;
  matchType: "exact" | "fuzzy" | "prefix";
}

export async function hybridSearch(
  workspaceId: string,
  query: string,
  limit = 20
): Promise<SearchResult[]> {
  if (!query.trim()) return [];

  const results = await prisma.$queryRaw<SearchResult[]>`
    WITH
    -- Stage 1: FTS with ranking (high precision)
    fts_results AS (
      SELECT
        id::text,
        title,
        ts_rank_cd(search_vector, q) * 2.0 AS rank,  -- Weight FTS higher
        ts_headline(
          'english',
          COALESCE(description, LEFT(body, 500), ''),
          q,
          'StartSel=<mark>, StopSel=</mark>, MaxWords=25, MinWords=10, MaxFragments=1'
        ) AS excerpt,
        'exact'::text AS match_type
      FROM documents,
        websearch_to_tsquery('english', ${query}) q
      WHERE
        workspace_id = ${workspaceId}::uuid
        AND search_vector @@ q
    ),

    -- Stage 2: Trigram fuzzy (catches typos, partial words)
    trgm_results AS (
      SELECT
        id::text,
        title,
        similarity(title || ' ' || COALESCE(description, ''), ${query}) AS rank,
        LEFT(COALESCE(description, body, ''), 200) AS excerpt,
        'fuzzy'::text AS match_type
      FROM documents
      WHERE
        workspace_id = ${workspaceId}::uuid
        AND (title || ' ' || COALESCE(description, '')) % ${query}
        AND id::text NOT IN (SELECT id FROM fts_results)  -- No duplicates
    ),

    -- Stage 3: Prefix match for autocomplete-style queries
    prefix_results AS (
      SELECT
        id::text,
        title,
        0.5 AS rank,
        LEFT(COALESCE(description, body, ''), 200) AS excerpt,
        'prefix'::text AS match_type
      FROM documents
      WHERE
        workspace_id = ${workspaceId}::uuid
        AND title ILIKE ${query + "%"}
        AND id::text NOT IN (SELECT id FROM fts_results)
        AND id::text NOT IN (SELECT id FROM trgm_results)
    )

    SELECT id, title, excerpt, rank::real AS rank, match_type AS "matchType"  -- quoted alias so the raw row maps to the camelCase field
    FROM (
      SELECT * FROM fts_results
      UNION ALL
      SELECT * FROM trgm_results
      UNION ALL
      SELECT * FROM prefix_results
    ) combined
    ORDER BY rank DESC, title
    LIMIT ${limit}
  `;

  return results;
}
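
The query above merges stages by scaling FTS ranks 2× so they outrank trigram similarities. An alternative that sidesteps comparing scores on different scales is reciprocal rank fusion; a minimal sketch (the function and the conventional k=60 constant are illustrative, not from the query above):

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60, limit: int = 20) -> list[str]:
    # Reciprocal rank fusion: each list contributes 1 / (k + position) per
    # document, so documents ranked well by several stages float to the top
    # without the stages' raw scores needing to share a scale.
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

Feed it the id lists from the FTS, trigram, and prefix stages (best match first in each) and take the fused order.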

Search Suggestions / Autocomplete

-- Fast autocomplete: case-insensitive prefix match on title
-- The trigram index built earlier covers title || description, so give
-- title its own trigram index for this ILIKE to be indexable:
-- CREATE INDEX idx_documents_title_trgm ON documents USING GIN(title gin_trgm_ops);
SELECT DISTINCT title
FROM documents
WHERE
  workspace_id = $1
  AND title ILIKE $2 || '%'  -- $2 = user input
ORDER BY title
LIMIT 8;

-- Alternatively, use a materialized suggestions table refreshed periodically
CREATE MATERIALIZED VIEW search_suggestions AS
SELECT
  workspace_id,
  LOWER(title) AS suggestion,
  COUNT(*) AS frequency
FROM documents
GROUP BY workspace_id, LOWER(title);

CREATE INDEX idx_suggestions ON search_suggestions
  USING GIN(suggestion gin_trgm_ops);

CREATE INDEX idx_suggestions_workspace ON search_suggestions(workspace_id, suggestion);
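
For unit-testing autocomplete behavior without a database, the prefix lookup can be mirrored in memory. A toy sketch (the function name and shape are mine, not an API from this article):

```python
def suggest(titles: list[str], prefix: str, limit: int = 8) -> list[str]:
    # In-memory mirror of the ILIKE-prefix query: case-insensitive
    # startswith, de-duplicated and alphabetically ordered, capped at limit.
    p = prefix.lower()
    matches = {t.lower() for t in titles if t.lower().startswith(p)}
    return sorted(matches)[:limit]
```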

Search Analytics

Track what users search for and what gets no results:

CREATE TABLE search_queries (
  id            BIGSERIAL PRIMARY KEY,
  workspace_id  UUID NOT NULL,
  user_id       UUID,
  query         TEXT NOT NULL,
  result_count  INTEGER NOT NULL,
  clicked_id    UUID,                    -- Did user click a result?
  searched_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_search_queries_workspace ON search_queries(workspace_id, searched_at DESC);
CREATE INDEX idx_search_queries_zero ON search_queries(workspace_id)
  WHERE result_count = 0;  -- Partial index for zero-result queries

-- Top zero-result queries (missing content to create)
SELECT query, COUNT(*) AS searches
FROM search_queries
WHERE workspace_id = $1 AND result_count = 0
  AND searched_at > NOW() - INTERVAL '30 days'
GROUP BY query
ORDER BY searches DESC
LIMIT 20;

-- Search click-through rate by query
SELECT
  query,
  COUNT(*) AS searches,
  COUNT(clicked_id) AS clicks,
  ROUND(100.0 * COUNT(clicked_id) / COUNT(*), 1) AS ctr_pct
FROM search_queries
WHERE workspace_id = $1
  AND searched_at > NOW() - INTERVAL '30 days'
GROUP BY query
HAVING COUNT(*) >= 5
ORDER BY searches DESC
LIMIT 20;

TypeScript Search API

// app/api/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { hybridSearch } from "@/lib/search/hybrid-search";
import { prisma } from "@/lib/prisma";
import { z } from "zod";

const SearchSchema = z.object({
  q: z.string().min(1).max(200),
  limit: z.coerce.number().min(1).max(50).default(20),
});

export async function GET(req: NextRequest) {
  const session = await auth();
  if (!session?.user) return NextResponse.json({ error: "Unauthorized" }, { status: 401 });

  const parsed = SearchSchema.safeParse(
    Object.fromEntries(req.nextUrl.searchParams)
  );
  if (!parsed.success) {
    return NextResponse.json({ error: parsed.error.issues[0].message }, { status: 400 });
  }

  const { q, limit } = parsed.data;

  const results = await hybridSearch(session.user.organizationId, q, limit);

  // Log search for analytics (fire-and-forget)
  prisma.searchQuery.create({
    data: {
      workspaceId: session.user.organizationId,
      userId: session.user.id,
      query: q,
      resultCount: results.length,
    },
  }).catch(console.error);

  return NextResponse.json({ results, query: q });
}

When to Move Beyond PostgreSQL FTS

  • <5M searchable documents → PostgreSQL FTS + pg_trgm (this guide)
  • 5M–50M documents → PostgreSQL with aggressive indexing + materialized suggestions
  • >50M documents → Elasticsearch or OpenSearch
  • Faceted search (filter by 10+ attributes simultaneously) → Elasticsearch
  • Semantic/vector search (meaning, not keywords) → pgvector or Pinecone
  • Real-time personalization → Algolia or a custom vector store

Cost and Timeline Estimates

  • Basic FTS with ranking: 1 dev, 1–2 days, $300–600
  • Hybrid FTS + pg_trgm + highlights: 1 dev, 3–5 days, $800–1,500
  • Full search (multi-language + analytics + autocomplete): 1–2 devs, 1–2 weeks, $2,500–5,000
  • Elasticsearch migration for scale: 2 devs, 3–5 weeks, $8,000–18,000

Working With Viprasol

Search is often the first feature that feels broken to users — and fixing it after the fact is harder than doing it right. Our team builds PostgreSQL search with weighted tsvectors, phrase search, and pg_trgm fuzzy matching, plus the search analytics that show you what users are looking for and not finding.

What we deliver:

  • Weighted stored tsvector (GENERATED ALWAYS AS) with GIN index
  • websearch_to_tsquery with ts_headline excerpts and markup
  • pg_trgm hybrid: FTS + fuzzy + prefix in a single ranked query
  • Multi-language tsvector configuration
  • Search analytics (zero-result queries, click-through rate)

Talk to our team about your search implementation →

Or explore our cloud and database services.

About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
