PostgreSQL Performance Tuning: EXPLAIN ANALYZE, Index Types, Vacuum, and pg_stat
Deep-dive PostgreSQL performance tuning guide. Read EXPLAIN ANALYZE output, choose the right index type (B-tree, GIN, BRIN, partial), configure autovacuum, and use pg_stat views to find bottlenecks.
PostgreSQL performance problems follow a predictable pattern: the application works fine at launch, slows down as data grows, and becomes a crisis when the team has no idea where to look. The good news is that PostgreSQL's tooling for diagnosing problems is excellent — EXPLAIN ANALYZE, pg_stat_statements, pg_stat_user_tables, and pg_stat_user_indexes give you everything you need to find and fix bottlenecks systematically.
This guide covers reading query plans, choosing indexes, and tuning autovacuum — the three areas that solve 90% of PostgreSQL performance problems.
Reading EXPLAIN ANALYZE Output
EXPLAIN shows the planner's estimated plan without running the query. EXPLAIN ANALYZE actually executes the statement and reports estimated vs. actual row counts and timing, so wrap data-modifying queries in BEGIN ... ROLLBACK to avoid persisting their effects.
-- Always use ANALYZE + BUFFERS together
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
u.id,
u.email,
COUNT(o.id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2027-01-01'
AND u.status = 'active'
GROUP BY u.id, u.email
ORDER BY total_spent DESC
LIMIT 20;
Sample output:
Limit (cost=4521.32..4521.37 rows=20 width=52)
(actual time=312.455..312.461 rows=20 loops=1)
Buffers: shared hit=1842 read=3201
-> Sort (cost=4521.32..4534.82 rows=5400 width=52)
(actual time=312.453..312.455 rows=20 loops=1)
Sort Key: (sum(o.total_amount)) DESC
Sort Method: top-N heapsort Memory: 27kB
Buffers: shared hit=1842 read=3201
-> HashAggregate (cost=4194.00..4302.00 rows=5400 width=52)
(actual time=308.211..311.994 rows=5382 loops=1)
Group Key: u.id
Batches: 1 Memory Usage: 1553kB
Buffers: shared hit=1842 read=3201
-> Hash Left Join (cost=1823.00..3897.00 rows=59400 width=36)
(actual time=45.122..289.452 rows=58941 loops=1)
Hash Cond: (o.user_id = u.id)
Buffers: shared hit=1842 read=3201
-> Seq Scan on orders o (cost=0.00..1823.00 rows=82300 width=16)
(actual time=0.021..89.344 rows=82300 loops=1)
Buffers: shared hit=842 read=1201
-> Hash (cost=1598.00..1598.00 rows=18000 width=28)
(actual time=44.918..44.919 rows=18000 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1130kB
Buffers: shared hit=1000 read=2000
-> Seq Scan on users u (cost=0.00..1598.00 rows=18000 width=28)
(actual time=0.015..39.213 rows=18000 loops=1)
Filter: ((status = 'active') AND (created_at > '2027-01-01'))
Rows Removed by Filter: 27000
Buffers: shared hit=1000 read=2000
Planning Time: 1.234 ms
Execution Time: 312.761 ms
What to look for:
| Signal | Meaning | Action |
|---|---|---|
| Seq Scan on a large table | No usable index | Add an index on the filter columns |
| Rows Removed by Filter is high | Index exists but is not selective enough | Use a partial or composite index |
| Estimated rows far from actual rows | Stale statistics | Run ANALYZE tablename |
| Buffers: read=N is high (vs. hit=N) | Cache misses, reading from disk | Increase shared_buffers, or add an index so fewer rows are read |
| loops=N is large on an inner plan node | Nested loop with many iterations | Encourage a hash join, or add an index on the join key |
| Sort Method: external merge Disk | Sort spilling to disk | Increase work_mem for this session |
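As a hedged illustration of acting on these signals, one plausible fix for the sample plan above is a partial index matching the users filter. The index name below is an assumption for this sketch, not something from the plan:

```sql
-- Hypothetical: a partial index covering the users filter, so the
-- planner can skip the 27,000 rows currently removed by the filter
CREATE INDEX idx_users_active_created
ON users (created_at)
WHERE status = 'active';

-- Refresh planner statistics, then re-run EXPLAIN ANALYZE to compare
ANALYZE users;
```

Re-running the query after this should replace the Seq Scan on users with an index scan; the seq scan on orders is addressed by the join-key index shown in the next section.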
Index Types
B-tree (Default)
B-tree indexes support =, <, >, <=, >=, BETWEEN, IN, IS NULL, and LIKE 'prefix%' (the prefix-LIKE case only under the C collation, or with a text_pattern_ops operator class).
-- Standard B-tree
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Composite B-tree — column order matters!
-- Put equality columns first, range column last
CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);
-- Supports: WHERE user_id = X AND status = Y AND created_at > Z
-- Not efficient for: WHERE status = Y alone (leading column user_id is missing)
-- Covering index (INCLUDE avoids table heap fetch)
CREATE INDEX idx_orders_user_covering
ON orders(user_id)
INCLUDE (status, total_amount, created_at);
-- SELECT status, total_amount, created_at WHERE user_id = X
-- can be satisfied from index alone (index-only scan)
Partial Index
Index a subset of rows — much smaller, faster to maintain:
-- Only index active subscriptions (most queries filter on active)
CREATE INDEX idx_subscriptions_active
ON subscriptions(workspace_id, plan)
WHERE status = 'active';
-- Only index recent orders
CREATE INDEX idx_orders_recent
ON orders(user_id, created_at DESC)
WHERE created_at > '2026-01-01';
-- Only index pending jobs
CREATE INDEX idx_jobs_pending
ON background_jobs(created_at, priority)
WHERE status = 'pending';
-- Tiny index: only unprocessed rows, not millions of completed ones
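One caveat worth checking: the planner uses a partial index only when the query's WHERE clause implies the index predicate. A quick sketch against the subscriptions index above (42 is an arbitrary example value):

```sql
-- Uses idx_subscriptions_active: the predicate status = 'active' matches
EXPLAIN SELECT plan
FROM subscriptions
WHERE workspace_id = 42 AND status = 'active';

-- Cannot use it: nothing in this query implies status = 'active'
EXPLAIN SELECT plan
FROM subscriptions
WHERE workspace_id = 42;
```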
GIN Index (Generalized Inverted Index)
For full-text search, JSONB, and arrays:
-- Full-text search
CREATE INDEX idx_articles_fts ON articles
USING GIN(to_tsvector('english', title || ' ' || body));
-- JSONB containment (@>) and existence (?)
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
-- Supports: WHERE metadata @> '{"plan": "pro"}'
-- Supports: WHERE metadata ? 'custom_field'
-- Array containment
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
-- Supports: WHERE tags @> ARRAY['postgresql']
-- Supports: WHERE tags && ARRAY['postgresql', 'mysql']
-- pg_trgm for ILIKE and fuzzy search
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING GIN(email gin_trgm_ops);
-- Supports: WHERE email ILIKE '%smith%'
-- Supports: similarity(email, 'johm') > 0.3
BRIN Index (Block Range INdex)
For naturally-ordered append-only data (timestamps, serial IDs):
-- Typically orders of magnitude smaller than a B-tree for time-series data
-- Good for: WHERE created_at > '2027-01-01' on a 100M row table
CREATE INDEX idx_events_created_brin
ON events USING BRIN(created_at)
WITH (pages_per_range = 128);
-- BRIN works because new rows always have later timestamps
-- Row order in heap must correlate with index column
-- NOT suitable for random UUIDs or frequently updated columns
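Before committing to BRIN, it helps to check physical correlation: pg_stats.correlation near 1.0 (or -1.0) means heap order tracks the column, which is exactly what BRIN needs to prune block ranges.

```sql
-- Heap-order correlation for the candidate column; values near +/-1
-- indicate BRIN will prune block ranges effectively
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events'
AND attname = 'created_at';

-- Compare on-disk sizes of the indexes on the table
SELECT indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'events';
```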
Expression Index
Index the result of an expression:
-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Supports: WHERE LOWER(email) = LOWER($1)
-- Date part extraction
CREATE INDEX idx_events_year_month ON events(
EXTRACT(YEAR FROM created_at),
EXTRACT(MONTH FROM created_at)
);
-- Supports: WHERE EXTRACT(YEAR FROM created_at) = 2027
-- Computed status priority
CREATE INDEX idx_jobs_priority ON jobs(
CASE status
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
WHEN 'normal' THEN 3
ELSE 4
END
);
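Two follow-ups worth knowing: expression indexes keep their own statistics, so an ANALYZE after creation helps the planner, and the query must use the same expression the index was built on. The email value below is a placeholder for illustration:

```sql
-- Gather statistics on the new indexed expression
ANALYZE users;

-- The query expression must match the indexed expression; LOWER(email)
-- matches idx_users_email_lower, but email ILIKE '...' does not
EXPLAIN SELECT id FROM users
WHERE LOWER(email) = LOWER('Ada@Example.com');
```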
Finding Missing Indexes
-- Sequential scans on large tables (missing index candidates)
SELECT
schemaname,
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / NULLIF(seq_scan, 0) AS avg_rows_per_seq_scan,
pg_size_pretty(pg_total_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND seq_tup_read > 100000
ORDER BY seq_tup_read DESC
LIMIT 20;
-- Unused indexes (wasting write overhead and storage)
SELECT
schemaname,
tablename,
indexname,
idx_scan AS scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(indexrelid) DESC;
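Once an index is confirmed unused (check over a full business cycle, since a monthly report may be its only consumer), it can be dropped without blocking writes. The index name below is a placeholder:

```sql
-- CONCURRENTLY avoids an exclusive lock on the table, but cannot run
-- inside a transaction block
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy;
```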
-- Duplicate indexes (same definition apart from the index name)
-- Note: indexdef embeds the index name, so strip it before grouping
SELECT
tablename,
array_agg(indexname) AS duplicate_indexes,
array_agg(indexdef) AS definitions
FROM pg_indexes
WHERE schemaname = 'public'
GROUP BY
tablename,
regexp_replace(indexdef, '^CREATE( UNIQUE)? INDEX \S+ ', '')
HAVING COUNT(*) > 1;
pg_stat_statements: Finding Slow Queries
-- Enable pg_stat_statements (add to postgresql.conf)
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.max = 10000
-- pg_stat_statements.track = all
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 20 slowest queries by total time
SELECT
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(stddev_exec_time::numeric, 2) AS stddev_ms,
calls,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_of_total,
LEFT(query, 120) AS query_preview
FROM pg_stat_statements
WHERE calls > 10
ORDER BY total_exec_time DESC
LIMIT 20;
-- Queries with highest row reads (table scan candidates)
SELECT
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
rows / NULLIF(calls, 0) AS avg_rows,
shared_blks_read / NULLIF(calls, 0) AS avg_blocks_read,
LEFT(query, 120) AS query
FROM pg_stat_statements
WHERE calls > 10
ORDER BY shared_blks_read DESC
LIMIT 20;
-- Cache hit ratio per query (low = too many disk reads)
SELECT
round(mean_exec_time::numeric, 2) AS mean_ms,
calls,
round(
100.0 * shared_blks_hit /
NULLIF(shared_blks_hit + shared_blks_read, 0),
2
) AS cache_hit_pct,
LEFT(query, 120) AS query
FROM pg_stat_statements
WHERE calls > 10
AND (shared_blks_hit + shared_blks_read) > 0
ORDER BY cache_hit_pct ASC
LIMIT 20;
-- Reset statistics after analysis
SELECT pg_stat_statements_reset();
Autovacuum Tuning
Dead tuples accumulate from UPDATEs and DELETEs. Autovacuum reclaims space and updates statistics, but default settings are optimized for small tables:
-- Check table bloat and autovacuum stats
SELECT
schemaname,
relname AS table_name,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
autovacuum_count,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;
-- Table-level autovacuum tuning for high-churn tables
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum when 1% is dead (default: 20%)
autovacuum_analyze_scale_factor = 0.005, -- analyze when 0.5% changed (default: 10%)
autovacuum_vacuum_cost_delay = 2, -- sleep (ms) after each cost_limit batch (default: 2)
autovacuum_vacuum_cost_limit = 400 -- work per cycle (default: 200)
);
-- For write-heavy tables, reduce cost delay further
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.02,
autovacuum_vacuum_cost_delay = 0, -- no throttling for urgent vacuums
autovacuum_vacuum_cost_limit = 800
);
-- Manual vacuum (non-blocking) for an urgent case
VACUUM (ANALYZE, VERBOSE) orders;
-- VACUUM FULL reclaims space but takes full table lock — use sparingly
-- Only run in maintenance window with no active queries
VACUUM FULL orders;
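While a manual or automatic vacuum runs, its progress is visible in pg_stat_progress_vacuum, which is handy for judging whether an urgent vacuum on a big table is nearly done:

```sql
-- One row per backend currently vacuuming
SELECT pid,
relid::regclass AS table_name,
phase,
heap_blks_scanned,
heap_blks_total
FROM pg_stat_progress_vacuum;
```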
postgresql.conf Autovacuum Settings
# postgresql.conf
# Global autovacuum settings (override per-table for high-churn tables)
autovacuum = on
autovacuum_max_workers = 5 # Default 3 — increase for many tables
autovacuum_naptime = 15s # How often autovacuum checks (default: 1min)
autovacuum_vacuum_threshold = 50 # Min dead tuples before vacuum
autovacuum_vacuum_scale_factor = 0.1 # % of table (default: 0.2 = 20%)
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05 # (default: 0.1 = 10%)
autovacuum_vacuum_cost_delay = 2ms # Throttle to avoid I/O saturation
autovacuum_vacuum_cost_limit = 400 # Work units per cost delay cycle (default: 200)
# Table statistics — affects query planning
default_statistics_target = 200 # Default 100 — increase for complex queries
Memory Configuration
# postgresql.conf — key memory settings
# Shared buffer pool (data cache)
# Rule: 25% of RAM (not more than 40%)
shared_buffers = 4GB # For 16GB RAM server
# Work memory per sort/hash operation
# WARNING: each sort/hash uses up to this much
# Rule: (RAM - shared_buffers) / max_connections / 2
work_mem = 64MB # For low-concurrency workloads
# Reduce to 4-16MB for high-concurrency
# Maintenance operations (VACUUM, CREATE INDEX)
maintenance_work_mem = 512MB
# OS cache estimate (PostgreSQL does not control this)
# Helps planner decide between index scan vs seq scan
effective_cache_size = 12GB # ~75% of RAM
# WAL buffering (write performance)
wal_buffers = 64MB
# Checkpoint behavior (trade durability for write performance)
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
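After editing postgresql.conf, it is worth confirming the values actually in effect; some settings, such as shared_buffers, only apply after a restart:

```sql
-- setting is shown in units of "unit"; pending_restart = true means the
-- new value is not yet active
SELECT name, setting, unit, pending_restart
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem',
'maintenance_work_mem', 'effective_cache_size');
```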
Connection Pool Sizing
-- Current connection usage
SELECT
count(*) FILTER (WHERE state = 'active') AS active,
count(*) FILTER (WHERE state = 'idle') AS idle,
count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_transaction,
count(*) AS total,
(SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections,
round(100.0 * count(*) / (SELECT setting::int FROM pg_settings WHERE name = 'max_connections'), 1) AS pct_used
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- Queries running longer than 30 seconds
SELECT
pid,
now() - query_start AS duration,
state,
LEFT(query, 200) AS query
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - INTERVAL '30 seconds'
AND backend_type = 'client backend'
ORDER BY duration DESC;
-- Kill a long-running query (use carefully)
SELECT pg_cancel_backend(pid); -- sends SIGINT (graceful)
SELECT pg_terminate_backend(pid); -- sends SIGTERM (force)
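The idle_in_transaction count above deserves attention: such sessions hold locks and block vacuum. One common guard is a server-wide timeout; the 5-minute value here is an example, not a universal recommendation:

```sql
-- Terminate sessions that sit idle inside an open transaction too long
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();
```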
Monitoring Query: Database Health Dashboard
-- One-stop health check view
SELECT
'Cache Hit Ratio' AS metric,
round(
100.0 * sum(blks_hit) / NULLIF(sum(blks_hit + blks_read), 0),
2
)::text || '%' AS value,
CASE
WHEN 100.0 * sum(blks_hit) / NULLIF(sum(blks_hit + blks_read), 0) >= 99 THEN '✅'
WHEN 100.0 * sum(blks_hit) / NULLIF(sum(blks_hit + blks_read), 0) >= 95 THEN '⚠️'
ELSE '❌'
END AS status
FROM pg_stat_database
WHERE datname = current_database()
UNION ALL
SELECT
'Total Dead Tuples',
to_char(sum(n_dead_tup), 'FM999,999,999'),
CASE WHEN sum(n_dead_tup) < 1000000 THEN '✅' ELSE '⚠️' END
FROM pg_stat_user_tables
UNION ALL
SELECT
'Index Hit Ratio',
round(
100.0 * sum(idx_blks_hit) / NULLIF(sum(idx_blks_hit + idx_blks_read), 0),
2
)::text || '%',
CASE
WHEN 100.0 * sum(idx_blks_hit) / NULLIF(sum(idx_blks_hit + idx_blks_read), 0) >= 99 THEN '✅'
ELSE '⚠️'
END
FROM pg_statio_user_indexes
UNION ALL
SELECT
'Tables Needing Vacuum',
count(*)::text,
CASE WHEN count(*) = 0 THEN '✅' ELSE '⚠️' END
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
AND (last_autovacuum < now() - INTERVAL '1 hour' OR last_autovacuum IS NULL);
Cost and Timeline Estimates
| Engagement Type | Team | Duration | Cost Range |
|---|---|---|---|
| Performance audit (find top 10 bottlenecks) | 1 DBA | 1–2 days | $500–1,500 |
| Index strategy implementation | 1 DBA/dev | 1 week | $2,000–4,000 |
| Full tuning (indexes + autovacuum + memory + partitioning) | 1–2 engineers | 2–3 weeks | $6,000–14,000 |
| Ongoing DBA retainer | 1 DBA | Monthly | $2,000–6,000/month |
See Also
- PostgreSQL EXPLAIN ANALYZE Deep Dive
- PostgreSQL Partitioning for Large Tables
- PostgreSQL Materialized Views for Analytics
- PostgreSQL Window Functions for Analytics
- PostgreSQL Connection Pooling with PgBouncer
Working With Viprasol
PostgreSQL performance problems rarely announce themselves clearly — a query that runs fine at 100K rows falls apart at 10M. Our team has tuned PostgreSQL databases from startup scale to hundreds of millions of rows, using the same systematic approach: pg_stat_statements to find the queries, EXPLAIN ANALYZE to understand the plans, and targeted indexes and autovacuum configuration to fix them.
What we deliver:
- Query plan analysis and index recommendations
- Autovacuum and memory configuration tuned to your workload
- pg_stat_statements setup with alerting on slow query regressions
- Partitioning strategy for large tables
- Written optimization report with before/after benchmarks
Talk to our team about PostgreSQL performance →
Or explore our cloud infrastructure services.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.