PostgreSQL Replication: Streaming Replication, Read Replicas, Logical Replication, and Failover
A single PostgreSQL primary is a single point of failure and a single write bottleneck. Replication solves both: read replicas distribute read load, and standby servers provide failover capability. Understanding the types of replication and their tradeoffs is essential for production database architecture.
Replication Types
| Type | What Replicates | Granularity | Use Case |
|---|---|---|---|
| Streaming (physical) | All WAL records — entire cluster | Cluster-wide, byte-for-byte copy | Read replicas, standby failover |
| Logical | Decoded change events (INSERT/UPDATE/DELETE) | Table-level, selective | Cross-version migration, CDC, partial replication |
| Cascading | Replica replicates from another replica | Like streaming | Reduce primary load (many replicas) |
Streaming Replication Setup
# Primary server: postgresql.conf
wal_level = replica # Minimum for replication
max_wal_senders = 10 # Max concurrent replication connections
wal_keep_size = 1GB # Keep 1GB of WAL for replicas to catch up
synchronous_commit = on # With synchronous_standby_names set, 'on' also waits for the standby (no data loss on failover)
# 'off' skips the commit-time WAL flush wait: faster writes, small loss window on crash
# For synchronous replication (zero data loss, but slower writes):
# synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
# Primary server: pg_hba.conf — allow replication connections
host replication replicator 10.0.0.0/8 scram-sha-256
# On primary: create replication user
psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'strong-password';"
# On replica: create base backup from primary
pg_basebackup \
  --host=primary-db.internal \
  --username=replicator \
  --pgdata=/var/lib/postgresql/17/main \
  --wal-method=stream \
  --slot=replica1_slot \
  --create-slot \
  --checkpoint=fast \
  --progress \
  --verbose
# Replica server: postgresql.conf
hot_standby = on # Allow reads on replica (essential for read replicas)
primary_conninfo = 'host=primary-db.internal user=replicator password=strong-password application_name=replica1'
primary_slot_name = 'replica1_slot' # Replication slot (primary keeps WAL until replica confirms receipt)
recovery_target_timeline = 'latest'
# Signal the replica to start as standby
touch /var/lib/postgresql/17/main/standby.signal
systemctl start postgresql
Read Replica Routing in Your Application
// lib/db.ts — primary/replica routing with Prisma
import { PrismaClient } from '@prisma/client';

// Separate clients for primary (writes) and replica (reads)
const primaryClient = new PrismaClient({
  datasources: { db: { url: process.env.DATABASE_URL } },
  log: ['error'],
});

const replicaClient = new PrismaClient({
  datasources: { db: { url: process.env.DATABASE_REPLICA_URL } },
  log: ['error'],
});

// Wrapper that routes by operation type
export const db = {
  users: {
    // Read operations → replica
    findUnique: (...args: Parameters<typeof primaryClient.users.findUnique>) =>
      replicaClient.users.findUnique(...args),
    findMany: (...args: Parameters<typeof primaryClient.users.findMany>) =>
      replicaClient.users.findMany(...args),
    // Write operations → primary
    create: (...args: Parameters<typeof primaryClient.users.create>) =>
      primaryClient.users.create(...args),
    update: (...args: Parameters<typeof primaryClient.users.update>) =>
      primaryClient.users.update(...args),
    delete: (...args: Parameters<typeof primaryClient.users.delete>) =>
      primaryClient.users.delete(...args),
    upsert: (...args: Parameters<typeof primaryClient.users.upsert>) =>
      primaryClient.users.upsert(...args),
  },
  // Transactions and raw statements always go to primary
  $transaction: (...args: Parameters<typeof primaryClient.$transaction>) =>
    primaryClient.$transaction(...args),
  $executeRaw: (...args: Parameters<typeof primaryClient.$executeRaw>) =>
    primaryClient.$executeRaw(...args),
};
Important caveat: After a write on the primary, a subsequent read from the replica may not see the new data yet (replication lag). For reads that must be consistent with a just-completed write, route them to the primary:
// After writing: read from primary to avoid stale replica read
async function createOrderAndReturnFresh(data: CreateOrderData): Promise<Order> {
  const order = await primaryClient.orders.create({ data });
  // Read from PRIMARY; a replica read here could be stale for ~100ms
  const fresh = await primaryClient.orders.findUnique({ where: { id: order.id } });
  return fresh!; // non-null: the row was just created on this primary
}
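A common middle ground is "read your own writes" routing: after a session writes, its reads go to the primary for a short window, and to the replica otherwise. A minimal sketch; the 500 ms window and the session-key scheme are assumptions, so tune the window to your observed replication lag:

```typescript
// Track the last write time per logical session (user, request, tenant...)
const lastWriteAt = new Map<string, number>();
const READ_YOUR_WRITES_WINDOW_MS = 500;

// Call this after every write routed to the primary
export function recordWrite(sessionId: string): void {
  lastWriteAt.set(sessionId, Date.now());
}

// Reads inside the window go to the primary; everything else can hit the replica
export function shouldReadFromPrimary(sessionId: string): boolean {
  const wroteAt = lastWriteAt.get(sessionId);
  return wroteAt !== undefined && Date.now() - wroteAt < READ_YOUR_WRITES_WINDOW_MS;
}

// Usage:
// const client = shouldReadFromPrimary(sessionId) ? primaryClient : replicaClient;
```

In a multi-process deployment the map would live in shared state (e.g. a cookie or Redis), since each app server otherwise only sees its own writes.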
Monitoring Replication Lag
-- On primary: check replica lag
SELECT
application_name,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
-- Lag in bytes
(sent_lsn - replay_lsn)::bigint AS replay_lag_bytes,
-- Lag in time
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
-- On replica: check own lag
SELECT
EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS replication_lag_seconds,
pg_is_in_recovery() AS is_replica,
pg_last_wal_receive_lsn() AS received_lsn,
pg_last_wal_replay_lsn() AS replayed_lsn;
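The LSNs in these views are strings like `0/16B3740`: two hex words, the high word counting 4 GB segments. If you want to compute byte lag in application code rather than in SQL, the arithmetic is straightforward; a small sketch:

```typescript
// Convert a pg_lsn string ('hi/lo', both hex) to an absolute byte position
export function lsnToBytes(lsn: string): bigint {
  const [hi, lo] = lsn.split('/');
  return (BigInt(`0x${hi}`) << 32n) + BigInt(`0x${lo}`);
}

// Byte lag between two LSNs, e.g. sent_lsn vs replay_lsn
export function lagBytes(sentLsn: string, replayLsn: string): bigint {
  return lsnToBytes(sentLsn) - lsnToBytes(replayLsn);
}

// lagBytes('0/2000000', '0/1000000') → 16777216n (16 MB)
```

This mirrors what `pg_wal_lsn_diff()` computes server-side.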
// lib/replication-monitor.ts — alert on high lag
export async function checkReplicationLag(): Promise<void> {
  const result = await primaryClient.$queryRaw<Array<{ replay_lag_bytes: bigint; application_name: string }>>`
    SELECT application_name, (sent_lsn - replay_lsn)::bigint AS replay_lag_bytes
    FROM pg_stat_replication;
  `;
  for (const replica of result) {
    // Prisma returns bigint columns as JS BigInt; convert before dividing
    const lagMB = Number(replica.replay_lag_bytes) / (1024 * 1024);
    if (lagMB > 100) {
      await alerting.send({
        severity: 'warning',
        title: `Replica ${replica.application_name} lag: ${lagMB.toFixed(0)}MB`,
        description: 'Replica is falling behind. Check for: slow network, long queries, autovacuum contention.',
      });
    }
  }
}
Logical Replication
Logical replication decodes WAL into SQL-level events. Unlike physical replication, it's selective (table-level) and works across major PostgreSQL versions — making it essential for zero-downtime major version upgrades.
-- On source (old version): create publication
CREATE PUBLICATION migration_pub FOR TABLE orders, users, products;
-- On target (new version): create subscription
CREATE SUBSCRIPTION migration_sub
CONNECTION 'host=old-db.internal user=replicator password=xxx dbname=app'
PUBLICATION migration_pub;
-- Monitor subscription progress
SELECT
subname,
pid,
received_lsn,
latest_end_lsn,
latest_end_time
FROM pg_stat_subscription;
Zero-downtime major version migration steps:
1. Spin up new PostgreSQL 17 instance alongside existing PostgreSQL 15
2. Create schema on new instance (migrations only, no data; logical replication does not replicate DDL)
3. Start logical replication: new subscribes to old's publication
4. Wait for replication lag to reach < 1 second
5. Maintenance window (seconds):
   a. Block writes on old instance (set to read-only)
   b. Wait for replication to fully catch up (lag = 0)
   c. Copy sequence values to the new instance (sequences are not replicated logically)
   d. Promote new instance to primary
   e. Update DATABASE_URL in app config
   f. Restart app servers
6. Monitor for 1 hour
7. Decommission old instance
Automatic Failover with Patroni
Patroni manages PostgreSQL high availability — detecting primary failure and promoting a replica automatically:
# patroni.yml — on each PostgreSQL node
scope: postgres-cluster
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.1:8008

etcd3:
  hosts:
    - etcd-1:2379
    - etcd-2:2379
    - etcd-3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10                  # Must satisfy ttl >= loop_wait + 2 * retry_timeout
    maximum_lag_on_failover: 10485760  # 10MB — don't promote a replica with > 10MB lag
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 4GB
        wal_level: replica

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.1:5432
  data_dir: /var/lib/postgresql/17/main
  authentication:
    replication:
      username: replicator
      password: strong-password
    superuser:
      username: postgres
      password: postgres-password
# Monitor cluster health
patronictl -c /etc/patroni.yml list
# | Member    | Host          | Role    | State   | TL | Lag in MB |
# | pg-node-1 | 10.0.1.1:5432 | Leader  | running |  3 |           |
# | pg-node-2 | 10.0.1.2:5432 | Replica | running |  3 |         0 |
# | pg-node-3 | 10.0.1.3:5432 | Replica | running |  3 |         0 |
# Planned switchover (e.g., for maintenance); 'failover' is for when the leader is already down
patronictl -c /etc/patroni.yml switchover postgres-cluster --leader pg-node-1 --candidate pg-node-2
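Applications rarely call patronictl; Patroni's REST API is designed for load-balancer health checks. `GET /primary` returns 200 only on the current leader (503 elsewhere), and `GET /replica` returns 200 on healthy replicas. A sketch of a leader probe using the node IPs from the config above:

```typescript
const PATRONI_NODES = ['10.0.1.1', '10.0.1.2', '10.0.1.3'];

// Ask each node's Patroni REST API whether it is the current leader
export async function findPrimary(): Promise<string | null> {
  for (const host of PATRONI_NODES) {
    try {
      const res = await fetch(`http://${host}:8008/primary`, {
        signal: AbortSignal.timeout(2000),
      });
      if (res.status === 200) return host; // only the leader answers 200 here
    } catch {
      // node down or unreachable; try the next one
    }
  }
  return null; // no leader reachable: cluster may be mid-failover
}
```

In production you would point an HAProxy or NLB health check at the same endpoint instead of polling from the app.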
RDS Multi-AZ vs Read Replicas
For teams on AWS RDS, the two options:
| | RDS Multi-AZ | RDS Read Replica |
|---|---|---|
| Purpose | High availability (failover) | Read scaling |
| Sync | Synchronous | Asynchronous |
| Failover time | 60–120 seconds | Not automatic |
| Can read from standby? | ❌ (Multi-AZ standby is not readable) | ✅ |
| Cost | 2× instance cost | One extra instance per replica |
| Location | Standby in a different AZ, same region | Same region or cross-region |
# terraform/rds.tf
resource "aws_db_instance" "primary" {
  identifier              = "app-postgres"
  engine                  = "postgres"
  engine_version          = "17.2"
  instance_class          = "db.r6g.large"
  allocated_storage       = 100
  multi_az                = true # Creates synchronous standby in different AZ
  backup_retention_period = 7    # Days of automated backups
  deletion_protection     = true
}

resource "aws_db_instance" "read_replica" {
  identifier          = "app-postgres-read"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.r6g.large"
  # No allocated_storage needed — inherited from the source
}
Working With Viprasol
We architect and implement PostgreSQL replication infrastructure — streaming replication, read replica routing, logical replication for zero-downtime migrations, Patroni HA setup, and RDS Multi-AZ configuration.
→ Talk to our team about database reliability and infrastructure.
See Also
- PostgreSQL Performance — PgBouncer, autovacuum, and partitioning
- Database Migrations — zero-downtime schema changes
- Database Indexing — query optimization before scaling out
- Cloud Solutions — managed database infrastructure
- Infrastructure as Code — Terraform for RDS provisioning
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.