Back to Blog

PostgreSQL Advanced Indexing: Covering Indexes, Expression Indexes, Index-Only Scans, and Bloat

Master advanced PostgreSQL indexing. Covers covering indexes with INCLUDE, expression indexes for computed values, index-only scans, partial indexes, index bloat monitoring and REINDEX CONCURRENTLY, and multi-column index column ordering.

Viprasol Tech Team
May 5, 2027
13 min read

Adding a B-tree index on a single column is just the beginning. Advanced PostgreSQL indexing — covering indexes that eliminate table lookups, expression indexes for computed values, partial indexes that index only the rows you query, and bloat monitoring to catch indexes that silently degrade — can transform query performance by orders of magnitude.

This guide goes beyond basic indexing to the techniques that make a real difference at scale.

1. Covering Indexes (INCLUDE)

A covering index stores additional columns alongside the index key, enabling index-only scans — the query fetches everything from the index without touching the table.

-- Query: fetch invoice amount and paid_at for a workspace
-- Without covering index:
EXPLAIN ANALYZE
SELECT id, amount, paid_at
FROM invoices
WHERE workspace_id = $1 AND status = 'paid'
ORDER BY paid_at DESC;

-- Execution plan:
-- Index Scan using idx_invoices_workspace on invoices
--   Filter: (status = 'paid')
--   Heap Fetches: 45231    ← reads 45K table pages to get amount and paid_at

-- WITH covering index: INCLUDE adds amount and paid_at to the index leaf pages
CREATE INDEX idx_invoices_workspace_paid
  ON invoices(workspace_id, status, paid_at DESC)
  INCLUDE (amount);
  -- Can also: INCLUDE (amount, currency, customer_id) for wider coverage

-- Now EXPLAIN shows:
-- Index Only Scan using idx_invoices_workspace_paid on invoices
--   Heap Fetches: 0    ← zero table reads!

-- When to use INCLUDE:
-- ✅ Query selects a few extra columns beyond the filter/sort columns
-- ✅ Table is large (row fetches are expensive)
-- ❌ Don't include columns that are updated frequently (index maintenance cost)
-- ❌ Don't include columns you ORDER BY or filter on (put those in key, not INCLUDE)

2. Expression Indexes

-- Case-insensitive email lookup
-- ❌ Without expression index: full seq scan (can't use idx on email for lower())
SELECT * FROM users WHERE lower(email) = lower($1);

-- ✅ Expression index: index the result of the expression
CREATE INDEX idx_users_email_lower ON users(lower(email));

-- Query must use the same expression to use the index:
SELECT * FROM users WHERE lower(email) = lower($1);  -- Uses index ✅
SELECT * FROM users WHERE email = $1;                 -- Doesn't use it ❌ (different expression)

-- Date-based partitioning without partitioned tables
CREATE INDEX idx_events_year_month ON events(
  EXTRACT(YEAR FROM created_at)::int,
  EXTRACT(MONTH FROM created_at)::int
);
SELECT * FROM events
WHERE EXTRACT(YEAR FROM created_at) = 2027
  AND EXTRACT(MONTH FROM created_at) = 4;

-- JSON field extraction
CREATE INDEX idx_metadata_user_id
  ON events((metadata->>'userId'));
-- Indexes the text value of the JSON field
SELECT * FROM events WHERE metadata->>'userId' = $1;

-- Computed string for full-text search
CREATE INDEX idx_products_search
  ON products USING GIN(
    to_tsvector('english', name || ' ' || coalesce(description, ''))
  );
-- Query:
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', $1);

☁️ Is Your Cloud Costing Too Much?

Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.

  • AWS, GCP, Azure certified engineers
  • Infrastructure as Code (Terraform, CDK)
  • Docker, Kubernetes, GitHub Actions CI/CD
  • Typical audit recovers $500–$3,000/month in savings

3. Partial Indexes

-- Index only the rows you actually query — much smaller, faster to maintain

-- Active records only (most queries filter deleted_at IS NULL)
CREATE INDEX idx_projects_active
  ON projects(workspace_id, updated_at DESC)
  WHERE deleted_at IS NULL;
-- Index size: orders of magnitude smaller when most records are deleted

-- Unread notifications (active processing queue)
CREATE INDEX idx_notifications_unread
  ON notifications(user_id, created_at DESC)
  WHERE read_at IS NULL;
-- This index might be 1% the size of a full index

-- Pending jobs (hot path in job processing)
CREATE INDEX idx_jobs_pending
  ON background_jobs(scheduled_at ASC, priority DESC)
  WHERE status = 'pending';
-- Only indexes the small fraction of jobs not yet processed

-- Partial unique constraint: one active record per user per type
CREATE UNIQUE INDEX idx_subscriptions_active_unique
  ON subscriptions(workspace_id, plan)
  WHERE status = 'active' AND cancelled_at IS NULL;
-- Allows multiple cancelled subscriptions (historical records) per workspace
-- Enforces only one active subscription per workspace per plan

-- IMPORTANT: Query WHERE clause must match index WHERE clause
-- This uses the partial index:
SELECT * FROM projects WHERE workspace_id = $1 AND deleted_at IS NULL ORDER BY updated_at DESC;
-- This does NOT use it (missing deleted_at IS NULL):
SELECT * FROM projects WHERE workspace_id = $1 ORDER BY updated_at DESC;

4. Multi-Column Index Column Order

-- Rule: equality filters first, range/sort last

-- Query pattern: WHERE status = ? AND created_at > ? ORDER BY created_at DESC
-- ✅ Correct order: status (equality) before created_at (range + sort)
CREATE INDEX idx_invoices_status_created
  ON invoices(status, created_at DESC);

-- ❌ Wrong order: created_at first means status filter can't use the index efficiently
CREATE INDEX idx_invoices_created_status
  ON invoices(created_at DESC, status);  -- status lookup requires full index scan per date range

-- For workspace-scoped multi-tenant apps: workspace_id ALWAYS goes first
CREATE INDEX idx_projects_ws_status_created
  ON projects(workspace_id, status, created_at DESC)
  WHERE deleted_at IS NULL;
-- Supports: WHERE workspace_id = ? AND status = ? ORDER BY created_at DESC
-- Also supports: WHERE workspace_id = ? ORDER BY ... (leftmost prefix)
-- Does NOT support efficiently: WHERE status = ? (missing workspace_id)

-- The leading column rule: an index on (A, B, C) supports queries filtering on:
-- A           ✅
-- A, B        ✅
-- A, B, C     ✅
-- B           ❌ (no A prefix — not efficient)
-- B, C        ❌
-- A, C        ✅ but only A is selective; C may need re-check

⚙️ DevOps Done Right — Zero Downtime, Full Automation

Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.

  • Staging + production environments with feature flags
  • Automated security scanning in the pipeline
  • Uptime monitoring + alerting + runbook automation
  • On-call support handover docs included

5. Index Bloat Monitoring

-- Detect bloated indexes (dead index entries from UPDATE/DELETE)
-- Requires pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
  i.relname                                AS index_name,
  pg_size_pretty(pg_relation_size(i.oid)) AS index_size,
  s.leaf_fragmentation                    AS bloat_pct
FROM
  pg_class t
  JOIN pg_index x ON t.oid = x.indrelid
  JOIN pg_class i ON i.oid = x.indexrelid
  CROSS JOIN LATERAL pgstatindex(i.relname) s
WHERE t.relkind = 'r'
  AND t.relname NOT LIKE 'pg_%'
  AND s.leaf_fragmentation > 30   -- More than 30% bloat
ORDER BY pg_relation_size(i.oid) DESC;

-- Simple bloat estimate (without pgstattuple — no extension needed)
SELECT
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 10 * 1024 * 1024  -- Indexes > 10MB
  AND idx_scan < 100                                     -- Barely used
ORDER BY pg_relation_size(indexrelid) DESC;

6. REINDEX Without Downtime

-- REINDEX TABLE blocks all reads and writes — never use on production
-- REINDEX TABLE CONCURRENTLY: builds new index while allowing reads/writes

-- Rebuild a bloated index without locking
REINDEX INDEX CONCURRENTLY idx_invoices_workspace_paid;

-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY invoices;

-- Note: CONCURRENTLY takes longer and temporarily has twice the index storage
-- It also cannot run inside a transaction block

-- Alternative: CREATE INDEX CONCURRENTLY + DROP old index
-- This is safer because you can verify the new index before dropping the old one
CREATE INDEX CONCURRENTLY idx_invoices_workspace_paid_v2
  ON invoices(workspace_id, status, paid_at DESC) INCLUDE (amount);

-- Verify it works (check EXPLAIN uses the new index)
EXPLAIN SELECT id, amount FROM invoices WHERE workspace_id = $1 AND status = 'paid';

-- Drop the old bloated index
DROP INDEX CONCURRENTLY idx_invoices_workspace_paid;

-- Rename new to canonical name
ALTER INDEX idx_invoices_workspace_paid_v2 RENAME TO idx_invoices_workspace_paid;

7. Index Usage Statistics

-- Find the most-used indexes (worth protecting from bloat)
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan                                       AS times_used,
  pg_size_pretty(pg_relation_size(indexrelid))   AS size,
  idx_tup_read                                   AS tuples_read,
  idx_tup_fetch                                  AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 20;

-- Find tables with high seq scan rate (likely missing indexes)
SELECT
  relname                                   AS table_name,
  seq_scan,
  idx_scan,
  n_live_tup                                AS live_rows,
  round(seq_scan::numeric / nullif(seq_scan + idx_scan, 0) * 100, 1) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 0 AND n_live_tup > 10000
ORDER BY seq_scan DESC
LIMIT 20;

-- Check if index-only scans are being used
-- High heap_fetches with low idx_scan suggests visibility map isn't up to date
-- Run VACUUM to fix
SELECT
  relname,
  indexrelname,
  idx_scan,
  idx_tup_fetch,
  idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%workspace%'
ORDER BY idx_scan DESC;

8. Visibility Map and Index-Only Scans

-- Index-only scans require up-to-date visibility map
-- If heap_fetches is high, VACUUM more aggressively

SELECT
  relname,
  n_dead_tup,
  last_autovacuum,
  last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'invoices';

-- Force VACUUM to update visibility map
VACUUM (ANALYZE, VERBOSE) invoices;

-- Check visibility map status
SELECT
  relname,
  all_visible,     -- Pages where all rows are visible to all transactions
  all_frozen       -- Pages with all rows frozen
FROM pg_class c
JOIN pg_visibility_map_summary(c.oid) v ON true
WHERE relname = 'invoices';
-- If all_visible is low, index-only scans will fall back to heap fetches

Indexing Decision Matrix

Query PatternIndex Type
WHERE col = ?B-tree on col
WHERE col = ? ORDER BY other DESCB-tree on (col, other DESC)
WHERE col IN (?, ?, ?)B-tree on col
WHERE col LIKE 'prefix%'B-tree on col
WHERE col LIKE '%suffix'GIN with pg_trgm
WHERE lower(col) = ?Expression: lower(col)
WHERE jsonb_col->>'key' = ?Expression: (jsonb_col->>'key')
WHERE col @@ tsqueryGIN on tsvector
WHERE array_col @> ARRAY[?]GIN on array
WHERE status = 'active' (most rows inactive)Partial WHERE status = 'active'
ORDER BY col on large tableB-tree on col with sort direction
WHERE a = ? AND b = ? — select c, dCovering: (a, b) INCLUDE (c, d)

Cost and Timeline

Indexing is inexpensive — the cost is analyst time and the disk space for index storage:

ActivityTimeStorage Overhead
Add a covering index (CONCURRENTLY)5–30 min (build time)10–50% of table size
Expression index5–20 minVaries
Partial index1–10 min (only indexes matching rows)1–50% of full index
REINDEX CONCURRENTLY (bloat fix)10 min – 2 hours2× index during build

See Also


Working With Viprasol

The right index can turn a 30-second query into 30 milliseconds. Our team audits your query patterns using pg_stat_statements and EXPLAIN ANALYZE, identifies missing covering indexes, expression indexes for case-insensitive lookups, and partial indexes for your active-record query patterns — all applied CONCURRENTLY with zero production downtime.

What we deliver:

  • Covering index audit: identify queries with high heap_fetches that benefit from INCLUDE
  • Expression index setup for lower(email), JSONB fields, and computed columns
  • Partial index strategy for soft-deleted records and status-filtered hot paths
  • Index bloat scan + REINDEX CONCURRENTLY runbook
  • Column ordering review for all composite indexes (equality before range)

Talk to our team about your PostgreSQL indexing strategy →

Or explore our cloud and database services.

Share this article:

About the Author

V

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA DevelopmentAI Agent SystemsSaaS DevelopmentAlgorithmic Trading

Need DevOps & Cloud Expertise?

Scale your infrastructure with confidence. AWS, GCP, Azure certified team.

Free consultation • No commitment • Response within 24 hours

Viprasol · Big Data & Analytics

Making sense of your data at scale?

Viprasol builds end-to-end big data analytics solutions — ETL pipelines, data warehouses on Snowflake or BigQuery, and self-service BI dashboards. One reliable source of truth for your entire organisation.