PostgreSQL Advanced Indexing: Covering Indexes, Expression Indexes, Index-Only Scans, and Bloat
Master advanced PostgreSQL indexing. Covers covering indexes with INCLUDE, expression indexes for computed values, index-only scans, partial indexes, index bloat monitoring and REINDEX CONCURRENTLY, and multi-column index column ordering.
Adding a B-tree index on a single column is just the beginning. Advanced PostgreSQL indexing — covering indexes that eliminate table lookups, expression indexes for computed values, partial indexes that index only the rows you query, and bloat monitoring to catch indexes that silently degrade — can transform query performance by orders of magnitude.
This guide goes beyond basic indexing to the techniques that make a real difference at scale.
1. Covering Indexes (INCLUDE)
A covering index stores additional columns alongside the index key, enabling index-only scans — the query fetches everything from the index without touching the table.
-- Query: fetch invoice amount and paid_at for a workspace
-- Without covering index:
EXPLAIN ANALYZE
SELECT id, amount, paid_at
FROM invoices
WHERE workspace_id = $1 AND status = 'paid'
ORDER BY paid_at DESC;
-- Execution plan (abridged):
-- Index Scan using idx_invoices_workspace on invoices
--   Filter: (status = 'paid')
-- Every matching row triggers a heap visit to fetch amount and paid_at;
-- with ~45K matching rows, that is ~45K random table reads
-- WITH covering index: INCLUDE adds amount and paid_at to the index leaf pages
CREATE INDEX idx_invoices_workspace_paid
ON invoices(workspace_id, status, paid_at DESC)
INCLUDE (amount);
-- Can also: INCLUDE (amount, currency, customer_id) for wider coverage
-- Now EXPLAIN shows:
-- Index Only Scan using idx_invoices_workspace_paid on invoices
-- Heap Fetches: 0 ← zero table reads!
-- When to use INCLUDE:
-- ✅ Query selects a few extra columns beyond the filter/sort columns
-- ✅ Table is large (row fetches are expensive)
-- ❌ Don't include columns that are updated frequently (index maintenance cost)
-- ❌ Don't include columns you ORDER BY or filter on (put those in key, not INCLUDE)
2. Expression Indexes
-- Case-insensitive email lookup
-- ❌ Without an expression index: seq scan (a plain index on email can't serve lower(email))
SELECT * FROM users WHERE lower(email) = lower($1);
-- ✅ Expression index: index the result of the expression
CREATE INDEX idx_users_email_lower ON users(lower(email));
-- Query must use the same expression to use the index:
SELECT * FROM users WHERE lower(email) = lower($1); -- Uses index ✅
SELECT * FROM users WHERE email = $1; -- Doesn't use it ❌ (different expression)
-- Date-based partitioning without partitioned tables
CREATE INDEX idx_events_year_month ON events(
EXTRACT(YEAR FROM created_at)::int,
EXTRACT(MONTH FROM created_at)::int
);
-- The query must repeat the exact indexed expression, cast included:
SELECT * FROM events
WHERE EXTRACT(YEAR FROM created_at)::int = 2027
AND EXTRACT(MONTH FROM created_at)::int = 4;
-- JSON field extraction
CREATE INDEX idx_metadata_user_id
ON events((metadata->>'userId'));
-- Indexes the text value of the JSON field
SELECT * FROM events WHERE metadata->>'userId' = $1;
-- Computed string for full-text search
CREATE INDEX idx_products_search
ON products USING GIN(
to_tsvector('english', name || ' ' || coalesce(description, ''))
);
-- Query:
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || coalesce(description, '')) @@ plainto_tsquery('english', $1);
3. Partial Indexes
-- Index only the rows you actually query — much smaller, faster to maintain
-- Active records only (most queries filter deleted_at IS NULL)
CREATE INDEX idx_projects_active
ON projects(workspace_id, updated_at DESC)
WHERE deleted_at IS NULL;
-- Index size: orders of magnitude smaller when most records are deleted
-- Unread notifications (active processing queue)
CREATE INDEX idx_notifications_unread
ON notifications(user_id, created_at DESC)
WHERE read_at IS NULL;
-- This index might be 1% the size of a full index
-- Pending jobs (hot path in job processing)
CREATE INDEX idx_jobs_pending
ON background_jobs(scheduled_at ASC, priority DESC)
WHERE status = 'pending';
-- Only indexes the small fraction of jobs not yet processed
-- Partial unique constraint: one active record per user per type
CREATE UNIQUE INDEX idx_subscriptions_active_unique
ON subscriptions(workspace_id, plan)
WHERE status = 'active' AND cancelled_at IS NULL;
-- Allows multiple cancelled subscriptions (historical records) per workspace
-- Enforces only one active subscription per workspace per plan
-- IMPORTANT: the planner must be able to prove the query's WHERE clause implies the index's WHERE clause
-- This uses the partial index:
SELECT * FROM projects WHERE workspace_id = $1 AND deleted_at IS NULL ORDER BY updated_at DESC;
-- This does NOT use it (missing deleted_at IS NULL):
SELECT * FROM projects WHERE workspace_id = $1 ORDER BY updated_at DESC;
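Extra predicates do not disqualify the partial index; as long as the index predicate is present (or implied), the planner can still use it. A sketch against the idx_projects_active index above (the 30-day filter is an illustrative extra condition):

```sql
-- Still qualifies for idx_projects_active: deleted_at IS NULL is present,
-- and the extra updated_at condition is simply checked during the scan
SELECT id, name
FROM projects
WHERE workspace_id = $1
  AND deleted_at IS NULL
  AND updated_at > now() - interval '30 days'
ORDER BY updated_at DESC;
```

Confirm with EXPLAIN: the plan should show an Index Scan (or Index Only Scan) using idx_projects_active, not a sequential scan.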
4. Multi-Column Index Column Order
-- Rule: equality filters first, range/sort last
-- Query pattern: WHERE status = ? AND created_at > ? ORDER BY created_at DESC
-- ✅ Correct order: status (equality) before created_at (range + sort)
CREATE INDEX idx_invoices_status_created
ON invoices(status, created_at DESC);
-- ❌ Wrong order: with created_at leading, the status filter cannot narrow the scan
CREATE INDEX idx_invoices_created_status
ON invoices(created_at DESC, status); -- the scan reads the entire date range and filters status row by row
-- For workspace-scoped multi-tenant apps: workspace_id ALWAYS goes first
CREATE INDEX idx_projects_ws_status_created
ON projects(workspace_id, status, created_at DESC)
WHERE deleted_at IS NULL;
-- Supports: WHERE workspace_id = ? AND status = ? ORDER BY created_at DESC
-- Also supports: WHERE workspace_id = ? ORDER BY ... (leftmost prefix)
-- Does NOT support efficiently: WHERE status = ? (missing workspace_id)
-- The leading column rule: an index on (A, B, C) supports queries filtering on:
-- A ✅
-- A, B ✅
-- A, B, C ✅
-- B ❌ (no A prefix — not efficient)
-- B, C ❌
-- A, C ✅ (index narrows on A; the C condition is filtered inside the scan)
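The prefix rule above is easy to verify on your own data with EXPLAIN. A sketch using an illustrative three-column index (table and column names are assumptions, not from this article):

```sql
-- Illustrative three-column index:
CREATE INDEX idx_orders_abc ON orders(customer_id, status, created_at);

-- Uses the index (full prefix):
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped' AND created_at > '2027-01-01';

-- Uses the index (leading column only):
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Typically falls back to a sequential scan (no leading column):
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
```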
5. Index Bloat Monitoring
-- Detect bloated indexes (dead index entries from UPDATE/DELETE)
-- Requires the pgstattuple extension; pgstatindex works on B-tree indexes only
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
i.relname AS index_name,
pg_size_pretty(pg_relation_size(i.oid)) AS index_size,
s.avg_leaf_density AS leaf_density_pct, -- healthy B-trees sit around 90%
s.leaf_fragmentation AS fragmentation_pct
FROM
pg_class t
JOIN pg_index x ON t.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
JOIN pg_am am ON am.oid = i.relam
CROSS JOIN LATERAL pgstatindex(i.oid) s
WHERE t.relkind = 'r'
AND am.amname = 'btree'
AND t.relname NOT LIKE 'pg_%'
AND s.avg_leaf_density < 60 -- low leaf density = bloated pages
ORDER BY pg_relation_size(i.oid) DESC;
-- Without pgstattuple: flag large, rarely-used indexes (drop candidates — not a true bloat measure)
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 10 * 1024 * 1024 -- Indexes > 10MB
AND idx_scan < 100 -- Barely used
ORDER BY pg_relation_size(indexrelid) DESC;
6. REINDEX Without Downtime
-- Plain REINDEX takes an ACCESS EXCLUSIVE lock on the index: it blocks writes
-- to the table and any reads that use that index; avoid it on production
-- REINDEX ... CONCURRENTLY (PostgreSQL 12+) rebuilds while allowing reads and writes
-- Rebuild a bloated index without locking
REINDEX INDEX CONCURRENTLY idx_invoices_workspace_paid;
-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY invoices;
-- Note: CONCURRENTLY takes longer and temporarily has twice the index storage
-- It also cannot run inside a transaction block
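Long rebuilds can be watched from a second session via pg_stat_progress_create_index (PostgreSQL 12+), which reports both CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY progress:

```sql
-- Watch an in-flight concurrent (re)index build from another session
SELECT
  p.pid,
  p.command,      -- e.g. 'REINDEX CONCURRENTLY'
  p.phase,        -- e.g. 'building index: scanning table'
  p.blocks_done,
  p.blocks_total,
  round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index p;
```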
-- Alternative: CREATE INDEX CONCURRENTLY + DROP old index
-- This is safer because you can verify the new index before dropping the old one
CREATE INDEX CONCURRENTLY idx_invoices_workspace_paid_v2
ON invoices(workspace_id, status, paid_at DESC) INCLUDE (amount);
-- Verify it works (check EXPLAIN uses the new index)
EXPLAIN SELECT id, amount FROM invoices WHERE workspace_id = $1 AND status = 'paid';
-- Drop the old bloated index
DROP INDEX CONCURRENTLY idx_invoices_workspace_paid;
-- Rename new to canonical name
ALTER INDEX idx_invoices_workspace_paid_v2 RENAME TO idx_invoices_workspace_paid;
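One caveat worth automating: if a CONCURRENTLY build fails or is cancelled, it leaves behind an INVALID index that still incurs write overhead. A quick catalog check (adjust the schema filter to taste):

```sql
-- Find leftover INVALID indexes from failed CONCURRENTLY builds
SELECT
  n.nspname AS schema,
  c.relname AS index_name,
  pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;

-- Drop the invalid index and retry the build:
-- DROP INDEX CONCURRENTLY <index_name>;
```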
7. Index Usage Statistics
-- Find the most-used indexes (worth protecting from bloat)
SELECT
schemaname,
tablename,
indexname,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 20;
-- Find tables with high seq scan rate (likely missing indexes)
SELECT
relname AS table_name,
seq_scan,
idx_scan,
n_live_tup AS live_rows,
round(seq_scan::numeric / nullif(seq_scan + idx_scan, 0) * 100, 1) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 0 AND n_live_tup > 10000
ORDER BY seq_scan DESC
LIMIT 20;
-- Check whether index-only scans are actually happening
-- pg_stat_user_indexes has no heap-fetch counter; check EXPLAIN ANALYZE instead.
-- If Index Only Scans report high "Heap Fetches", the visibility map is stale; run VACUUM.
SELECT
relname,
indexrelname,
idx_scan,
idx_tup_fetch,
idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%workspace%'
ORDER BY idx_scan DESC;
8. Visibility Map and Index-Only Scans
-- Index-only scans require up-to-date visibility map
-- If heap_fetches is high, VACUUM more aggressively
SELECT
relname,
n_dead_tup,
last_autovacuum,
last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'invoices';
-- Force VACUUM to update visibility map
VACUUM (ANALYZE, VERBOSE) invoices;
-- Check visibility map status (requires the pg_visibility extension)
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT
c.relname,
v.all_visible, -- pages where all rows are visible to all transactions
v.all_frozen   -- pages where all rows are frozen
FROM pg_class c
CROSS JOIN LATERAL pg_visibility_map_summary(c.oid) v
WHERE c.relname = 'invoices';
-- If all_visible is low relative to total pages, index-only scans fall back to heap fetches
Indexing Decision Matrix
| Query Pattern | Index Type |
|---|---|
| WHERE col = ? | B-tree on col |
| WHERE col = ? ORDER BY other DESC | B-tree on (col, other DESC) |
| WHERE col IN (?, ?, ?) | B-tree on col |
| WHERE col LIKE 'prefix%' | B-tree on col (text_pattern_ops under non-C collations) |
| WHERE col LIKE '%suffix' | GIN with pg_trgm |
| WHERE lower(col) = ? | Expression: lower(col) |
| WHERE jsonb_col->>'key' = ? | Expression: (jsonb_col->>'key') |
| WHERE col @@ tsquery | GIN on tsvector |
| WHERE array_col @> ARRAY[?] | GIN on array |
| WHERE status = 'active' (most rows inactive) | Partial WHERE status = 'active' |
| ORDER BY col on large table | B-tree on col with sort direction |
| WHERE a = ? AND b = ? (select c, d) | Covering: (a, b) INCLUDE (c, d) |
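The suffix-match row is the only extension-backed entry in the matrix, so here is a minimal sketch of it (the files table and name column are illustrative):

```sql
-- Trigram index for suffix / substring LIKE patterns
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_files_name_trgm ON files USING GIN (name gin_trgm_ops);

-- Both of these can now use the trigram index:
SELECT * FROM files WHERE name LIKE '%.pdf';
SELECT * FROM files WHERE name ILIKE '%report%';
```

Trigram indexes are larger than B-trees and slower to update, so reserve them for columns where suffix or substring matching is a real query pattern.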
Cost and Timeline
Indexing is inexpensive — the cost is analyst time and the disk space for index storage:
| Activity | Time | Storage Overhead |
|---|---|---|
| Add a covering index (CONCURRENTLY) | 5–30 min (build time) | 10–50% of table size |
| Expression index | 5–20 min | Varies |
| Partial index | 1–10 min (only indexes matching rows) | 1–50% of full index |
| REINDEX CONCURRENTLY (bloat fix) | 10 min – 2 hours | 2× index during build |
See Also
- PostgreSQL Performance Tuning
- PostgreSQL EXPLAIN ANALYZE Patterns
- PostgreSQL Partitioning Advanced Patterns
- PostgreSQL Full-Text Search Advanced
- PostgreSQL Schema Design for SaaS
Working With Viprasol
The right index can turn a 30-second query into 30 milliseconds. Our team audits your query patterns using pg_stat_statements and EXPLAIN ANALYZE, identifies missing covering indexes, expression indexes for case-insensitive lookups, and partial indexes for your active-record query patterns — all applied CONCURRENTLY with zero production downtime.
What we deliver:
- Covering index audit: identify queries with high heap_fetches that benefit from INCLUDE
- Expression index setup for lower(email), JSONB fields, and computed columns
- Partial index strategy for soft-deleted records and status-filtered hot paths
- Index bloat scan + REINDEX CONCURRENTLY runbook
- Column ordering review for all composite indexes (equality before range)
Talk to our team about your PostgreSQL indexing strategy →
Or explore our cloud and database services.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.