PostgreSQL Replication: Streaming Replication, Read Replicas, Logical Replication, and Failover
A single PostgreSQL primary is a single point of failure and a single write bottleneck. Replication solves both: read replicas distribute read load, and standby servers provide failover capability. Understanding the types of replication and their tradeoffs is essential for production database architecture.
Replication Types
| Type | What Replicates | Granularity | Use Case |
|---|---|---|---|
| Streaming (physical) | All WAL records — entire cluster | Cluster-wide, byte-for-byte copy | Read replicas, standby failover |
| Logical | Decoded change events (INSERT/UPDATE/DELETE) | Table-level, selective | Cross-version migration, CDC, partial replication |
| Cascading | Replica replicates from another replica | Like streaming | Reduce primary load (many replicas) |
Streaming Replication Setup
# Primary server: postgresql.conf
wal_level = replica # Minimum for replication
max_wal_senders = 10 # Max concurrent replication connections
wal_keep_size = 1GB # Keep 1GB of WAL for replicas to catch up
synchronous_commit = on # With synchronous_standby_names set, 'on' also waits for the standby (no data loss on failover)
# 'off' skips the commit-time WAL flush wait: faster writes, small loss window on crash
# For synchronous replication (zero data loss, but slower writes):
# synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
# Primary server: pg_hba.conf — allow replication connections
host replication replicator 10.0.0.0/8 scram-sha-256
# On primary: create replication user
psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'strong-password';"
# On replica: create base backup from primary
pg_basebackup \
  --host=primary-db.internal \
  --username=replicator \
  --pgdata=/var/lib/postgresql/17/main \
  --wal-method=stream \
  --slot=replica1_slot \
  --create-slot \
  --checkpoint=fast \
  --progress \
  --verbose
# Replica server: postgresql.conf
hot_standby = on # Allow reads on replica (essential for read replicas)
primary_conninfo = 'host=primary-db.internal user=replicator password=strong-password application_name=replica1'
primary_slot_name = 'replica1_slot' # Replication slot (primary keeps WAL until replica confirms receipt)
recovery_target_timeline = 'latest'
# Signal the replica to start as standby
touch /var/lib/postgresql/17/main/standby.signal
systemctl start postgresql
Read Replica Routing in Your Application
// lib/db.ts — primary/replica routing with Prisma
import { PrismaClient } from '@prisma/client';

// Separate clients for primary (writes) and replica (reads)
const primaryClient = new PrismaClient({
  datasources: { db: { url: process.env.DATABASE_URL } },
  log: ['error'],
});

const replicaClient = new PrismaClient({
  datasources: { db: { url: process.env.DATABASE_REPLICA_URL } },
  log: ['error'],
});

// Wrapper that routes by operation type
export const db = {
  users: {
    // Read operations → replica
    findUnique: (...args: Parameters<typeof primaryClient.users.findUnique>) =>
      replicaClient.users.findUnique(...args),
    findMany: (...args: Parameters<typeof primaryClient.users.findMany>) =>
      replicaClient.users.findMany(...args),
    // Write operations → primary
    create: (...args: Parameters<typeof primaryClient.users.create>) =>
      primaryClient.users.create(...args),
    update: (...args: Parameters<typeof primaryClient.users.update>) =>
      primaryClient.users.update(...args),
    delete: (...args: Parameters<typeof primaryClient.users.delete>) =>
      primaryClient.users.delete(...args),
    upsert: (...args: Parameters<typeof primaryClient.users.upsert>) =>
      primaryClient.users.upsert(...args),
  },
  // Transactions and raw statements always go to primary
  $transaction: (...args: Parameters<typeof primaryClient.$transaction>) =>
    primaryClient.$transaction(...args),
  $executeRaw: (...args: Parameters<typeof primaryClient.$executeRaw>) =>
    primaryClient.$executeRaw(...args),
};
Important caveat: After a write on the primary, a subsequent read from the replica may not see the new data yet (replication lag). For reads that must be consistent with a just-completed write, route them to the primary:
// After writing: read from primary to avoid stale replica read
async function createOrderAndReturnFresh(data: CreateOrderData): Promise<Order> {
  const order = await primaryClient.orders.create({ data });
  // Read from PRIMARY; a replica read here could be stale for ~100ms
  const fresh = await primaryClient.orders.findUnique({ where: { id: order.id } });
  return fresh!; // non-null: the row was just created on this primary
}
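A common middle ground is "read your own writes" routing: after a session writes, its reads go to the primary for a short window, and to the replica otherwise. A minimal sketch; the 500 ms window and the session-key scheme are assumptions, so tune the window to your observed replication lag:

```typescript
// Track the last write time per logical session (user, request, tenant...)
const lastWriteAt = new Map<string, number>();
const READ_YOUR_WRITES_WINDOW_MS = 500;

// Call this after every write routed to the primary
export function recordWrite(sessionId: string): void {
  lastWriteAt.set(sessionId, Date.now());
}

// Reads inside the window go to the primary; everything else can hit the replica
export function shouldReadFromPrimary(sessionId: string): boolean {
  const wroteAt = lastWriteAt.get(sessionId);
  return wroteAt !== undefined && Date.now() - wroteAt < READ_YOUR_WRITES_WINDOW_MS;
}

// Usage:
// const client = shouldReadFromPrimary(sessionId) ? primaryClient : replicaClient;
```

In a multi-process deployment the map would live in shared state (e.g. a cookie or Redis), since each app server otherwise only sees its own writes.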
Monitoring Replication Lag
-- On primary: check replica lag
SELECT
application_name,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
-- Lag in bytes
(sent_lsn - replay_lsn)::bigint AS replay_lag_bytes,
-- Lag in time
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
-- On replica: check own lag
SELECT
EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS replication_lag_seconds,
pg_is_in_recovery() AS is_replica,
pg_last_wal_receive_lsn() AS received_lsn,
pg_last_wal_replay_lsn() AS replayed_lsn;
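The LSNs in these views are strings like `0/16B3740`: two hex words, the high word counting 4 GB segments. If you want to compute byte lag in application code rather than in SQL, the arithmetic is straightforward; a small sketch:

```typescript
// Convert a pg_lsn string ('hi/lo', both hex) to an absolute byte position
export function lsnToBytes(lsn: string): bigint {
  const [hi, lo] = lsn.split('/');
  return (BigInt(`0x${hi}`) << 32n) + BigInt(`0x${lo}`);
}

// Byte lag between two LSNs, e.g. sent_lsn vs replay_lsn
export function lagBytes(sentLsn: string, replayLsn: string): bigint {
  return lsnToBytes(sentLsn) - lsnToBytes(replayLsn);
}

// lagBytes('0/2000000', '0/1000000') → 16777216n (16 MB)
```

This mirrors what `pg_wal_lsn_diff()` computes server-side.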
// lib/replication-monitor.ts — alert on high lag
export async function checkReplicationLag(): Promise<void> {
  const result = await primaryClient.$queryRaw<Array<{ replay_lag_bytes: bigint; application_name: string }>>`
    SELECT application_name, (sent_lsn - replay_lsn)::bigint AS replay_lag_bytes
    FROM pg_stat_replication;
  `;
  for (const replica of result) {
    // Prisma returns bigint columns as JS BigInt; convert before dividing
    const lagMB = Number(replica.replay_lag_bytes) / (1024 * 1024);
    if (lagMB > 100) {
      await alerting.send({
        severity: 'warning',
        title: `Replica ${replica.application_name} lag: ${lagMB.toFixed(0)}MB`,
        description: 'Replica is falling behind. Check for: slow network, long queries, autovacuum contention.',
      });
    }
  }
}
Logical Replication
Logical replication decodes WAL into SQL-level events. Unlike physical replication, it's selective (table-level) and works across major PostgreSQL versions — making it essential for zero-downtime major version upgrades.
-- On source (old version): create publication
CREATE PUBLICATION migration_pub FOR TABLE orders, users, products;
-- On target (new version): create subscription
CREATE SUBSCRIPTION migration_sub
CONNECTION 'host=old-db.internal user=replicator password=xxx dbname=app'
PUBLICATION migration_pub;
-- Monitor subscription progress
SELECT
subname,
pid,
received_lsn,
latest_end_lsn,
latest_end_time
FROM pg_stat_subscription;
Zero-downtime major version migration steps:
1. Spin up new PostgreSQL 17 instance alongside existing PostgreSQL 15
2. Create schema on new instance (migrations only, no data; logical replication does not replicate DDL)
3. Start logical replication: new subscribes to old's publication
4. Wait for replication lag to reach < 1 second
5. Maintenance window (seconds):
   a. Block writes on old instance (set to read-only)
   b. Wait for replication to fully catch up (lag = 0)
   c. Copy sequence values to the new instance (sequences are not replicated logically)
   d. Promote new instance to primary
   e. Update DATABASE_URL in app config
   f. Restart app servers
6. Monitor for 1 hour
7. Decommission old instance
Automatic Failover with Patroni
Patroni manages PostgreSQL high availability — detecting primary failure and promoting a replica automatically:
# patroni.yml — on each PostgreSQL node
scope: postgres-cluster
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.1:8008

etcd3:
  hosts:
    - etcd-1:2379
    - etcd-2:2379
    - etcd-3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10                  # Must satisfy ttl >= loop_wait + 2 * retry_timeout
    maximum_lag_on_failover: 10485760  # 10MB — don't promote a replica with > 10MB lag
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 4GB
        wal_level: replica

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.1:5432
  data_dir: /var/lib/postgresql/17/main
  authentication:
    replication:
      username: replicator
      password: strong-password
    superuser:
      username: postgres
      password: postgres-password
# Monitor cluster health
patronictl -c /etc/patroni.yml list
# | Member    | Host          | Role    | State   | TL | Lag in MB |
# | pg-node-1 | 10.0.1.1:5432 | Leader  | running |  3 |           |
# | pg-node-2 | 10.0.1.2:5432 | Replica | running |  3 |         0 |
# | pg-node-3 | 10.0.1.3:5432 | Replica | running |  3 |         0 |
# Planned switchover (e.g., for maintenance); 'failover' is for when the leader is already down
patronictl -c /etc/patroni.yml switchover postgres-cluster --leader pg-node-1 --candidate pg-node-2
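Applications rarely call patronictl; Patroni's REST API is designed for load-balancer health checks. `GET /primary` returns 200 only on the current leader (503 elsewhere), and `GET /replica` returns 200 on healthy replicas. A sketch of a leader probe using the node IPs from the config above:

```typescript
const PATRONI_NODES = ['10.0.1.1', '10.0.1.2', '10.0.1.3'];

// Ask each node's Patroni REST API whether it is the current leader
export async function findPrimary(): Promise<string | null> {
  for (const host of PATRONI_NODES) {
    try {
      const res = await fetch(`http://${host}:8008/primary`, {
        signal: AbortSignal.timeout(2000),
      });
      if (res.status === 200) return host; // only the leader answers 200 here
    } catch {
      // node down or unreachable; try the next one
    }
  }
  return null; // no leader reachable: cluster may be mid-failover
}
```

In production you would point an HAProxy or NLB health check at the same endpoint instead of polling from the app.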
RDS Multi-AZ vs Read Replicas
For teams on AWS RDS, the two options:
| | RDS Multi-AZ | RDS Read Replica |
|---|---|---|
| Purpose | High availability (failover) | Read scaling |
| Sync | Synchronous | Asynchronous |
| Failover time | 60–120 seconds | Not automatic |
| Can read from standby? | ❌ (Multi-AZ standby is not readable) | ✅ |
| Cost | 2× instance cost | One extra instance per replica |
| Location | Standby in a different AZ, same region | Same region or cross-region |
# terraform/rds.tf
resource "aws_db_instance" "primary" {
  identifier              = "app-postgres"
  engine                  = "postgres"
  engine_version          = "17.2"
  instance_class          = "db.r6g.large"
  allocated_storage       = 100
  multi_az                = true # Creates synchronous standby in different AZ
  backup_retention_period = 7    # Days of automated backups
  deletion_protection     = true
}

resource "aws_db_instance" "read_replica" {
  identifier          = "app-postgres-read"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.r6g.large"
  # No allocated_storage needed — inherited from the source
}
Working With Viprasol
We architect and implement PostgreSQL replication infrastructure — streaming replication, read replica routing, logical replication for zero-downtime migrations, Patroni HA setup, and RDS Multi-AZ configuration.
→ Talk to our team about database reliability and infrastructure.
See Also
- PostgreSQL Performance — PgBouncer, autovacuum, and partitioning
- Database Migrations — zero-downtime schema changes
- Database Indexing — query optimization before scaling out
- Cloud Solutions — managed database infrastructure
- Infrastructure as Code — Terraform for RDS provisioning
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.