PostgreSQL Backup and Restore in 2026: pg_dump, WAL Archiving, PITR, and Restore Testing
Production PostgreSQL backup strategy: pg_dump for logical backups, WAL archiving with pgBackRest, point-in-time recovery, RDS automated backups, and restore testing automation.
Most teams think about backup strategy after their first data loss incident. The correct time to think about it is now, when the cost of getting it wrong is zero. Backups that aren't tested aren't backups — they're a false sense of security. And RDS automated backups, while necessary, aren't sufficient on their own for most SaaS products.
This post covers the full backup strategy: logical backups with pg_dump, WAL-based continuous archiving with pgBackRest, point-in-time recovery, AWS RDS snapshot configuration, and — most importantly — automated restore testing.
Backup Strategy Layers
| Layer | Tool | RPO | What It Protects Against |
|---|---|---|---|
| Logical (schema + data) | pg_dump / pg_dumpall | Daily | Accidental DROP TABLE, corruption |
| Continuous WAL archiving | pgBackRest | Seconds | Server failure, need for PITR |
| Cloud snapshots | RDS automated backups | 5 min | AZ failure, instance corruption |
| Cross-region copy | S3 replication / RDS copy | RPO + copy time | Region failure |
| Point-in-time recovery | WAL replay | Any point | "Restore to 10 min ago" |
pg_dump: Logical Backups
#!/bin/bash
# scripts/pg-dump-backup.sh
# Run daily via Lambda cron or EC2 cron
set -euo pipefail
DB_HOST="${PGHOST}"
DB_NAME="${PGDATABASE}"
DB_USER="${PGUSER}"
S3_BUCKET="${BACKUP_S3_BUCKET}"
DATE=$(date +%Y-%m-%d-%H%M%S)
BACKUP_FILE="/tmp/backup-${DB_NAME}-${DATE}.dump"
echo "Starting pg_dump backup: ${DB_NAME}"
# Custom format: compressed, parallel-restore capable.
# --format=custom  binary format, faster restore than plain SQL
# --compress=9     maximum compression
# --no-password    never prompt; supply credentials via PGPASSWORD or ~/.pgpass
# (note: inline comments after a trailing backslash break bash line continuation,
# which is why the flag explanations live up here)
pg_dump \
  --host="${DB_HOST}" \
  --username="${DB_USER}" \
  --dbname="${DB_NAME}" \
  --format=custom \
  --compress=9 \
  --no-password \
  --verbose \
  --file="${BACKUP_FILE}"
# Upload to S3 with server-side encryption.
# STANDARD_IA is cheaper for rarely-read objects like backups.
aws s3 cp "${BACKUP_FILE}" \
  "s3://${S3_BUCKET}/logical/${DATE:0:7}/${DATE}.dump" \
  --storage-class STANDARD_IA \
  --sse aws:kms
# Clean up local file
rm -f "${BACKUP_FILE}"
echo "Backup complete: s3://${S3_BUCKET}/logical/${DATE:0:7}/${DATE}.dump"
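Because the script above timestamps keys with `%Y-%m-%d-%H%M%S`, the newest dump is also the lexicographically largest key, so a restore script can select it with a plain `sort`. A minimal sketch (the hard-coded keys are illustrative; a real script would pipe in the output of `aws s3 ls --recursive`):

```shell
# Pick the most recent dump from a list of date-stamped S3 keys.
# Keys follow the logical/YYYY-MM/YYYY-MM-DD-HHMMSS.dump layout used above,
# so lexicographic order equals chronological order.
latest_dump() {
  sort | tail -n 1
}

# Illustration with sample keys; in practice:
#   aws s3 ls "s3://${S3_BUCKET}/logical/" --recursive | awk '{print $4}' | latest_dump
printf '%s\n' \
  "logical/2026-02/2026-02-28-020000.dump" \
  "logical/2026-03/2026-03-01-020000.dump" \
  "logical/2026-03/2026-03-02-020000.dump" | latest_dump
# → logical/2026-03/2026-03-02-020000.dump
```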
# Restore from a pg_dump custom-format backup.
# --jobs=4      parallel restore; roughly match available CPU cores
# --no-owner    skip restoring ownership (useful when restoring as a different user)
pg_restore \
  --host="${RESTORE_HOST}" \
  --username="${DB_USER}" \
  --dbname="${RESTORE_DB}" \
  --format=custom \
  --jobs=4 \
  --verbose \
  --no-owner \
  /path/to/backup.dump
pgBackRest: Continuous WAL Archiving to S3
pgBackRest is the modern replacement for pg_basebackup + WAL-E. It supports:
- Full, differential, and incremental backups
- WAL archiving to S3
- Parallel backup and restore
- Backup verification
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=myapp-production-pgbackrest
repo1-s3-region=us-east-1
repo1-s3-key=<aws-access-key>
repo1-s3-key-secret=<aws-secret-key>
repo1-path=/pgbackrest
# Retention: keep 7 full backups and 14 differentials
repo1-retention-full=7
repo1-retention-diff=14
# Encryption at rest
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-strong-passphrase
log-level-console=info
log-level-file=detail
[myapp]
pg1-path=/var/lib/postgresql/15/main
pg1-port=5432
# postgresql.conf changes for WAL archiving (postgresql.conf uses # comments, not --)
archive_mode = on
archive_command = 'pgbackrest --stanza=myapp archive-push %p'
wal_level = replica   # minimum level required for WAL archiving
# Initial setup
pgbackrest --stanza=myapp stanza-create
# First full backup
pgbackrest --stanza=myapp --type=full backup
# Schedule via cron:
# Full backup: weekly
# 0 2 * * 0 pgbackrest --stanza=myapp --type=full backup
# Differential backup: daily
# 0 2 * * 1-6 pgbackrest --stanza=myapp --type=diff backup
# Verify backup integrity
pgbackrest --stanza=myapp verify
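`verify` tells you the repository is intact; you also want to know the newest backup is recent. `pgbackrest info --output=json` exposes each backup's completion time as an epoch timestamp, which a cron job can compare against a freshness threshold. A minimal sketch (the `jq` path in the comment reflects pgBackRest's info JSON as I understand it; check it against your version's output):

```shell
# Alert if the newest backup is older than a threshold.
#   last_stop: epoch seconds of the last backup's completion, e.g. from
#     pgbackrest info --stanza=myapp --output=json | jq '.[0].backup[-1].timestamp.stop'
#   max_age:   allowed age in seconds (93600 = 26h, a daily schedule plus slack)
backup_fresh() {
  local last_stop="$1" max_age="$2" now
  now=$(date +%s)
  if [ $(( now - last_stop )) -gt "$max_age" ]; then
    echo "STALE: last backup finished $(( (now - last_stop) / 3600 ))h ago"
    return 1
  fi
  echo "OK"
}

backup_fresh "$(date +%s)" 93600   # prints OK for a just-finished backup
```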
Point-In-Time Recovery (PITR)
# Scenario: user deleted all orders at 14:32 UTC
# Restore to 14:30 UTC (2 minutes before)
# Step 1: Stop the current database
systemctl stop postgresql
# Step 2: Restore the base backup and replay WAL to the target time
# (the data directory must be empty, or add --delta to overwrite in place)
pgbackrest --stanza=myapp \
  --type=time \
  --target="2026-03-04 14:30:00+00" \
  --target-action=promote \
  restore
# Step 3: Start PostgreSQL (it will replay WAL up to 14:30)
systemctl start postgresql
# Step 4: Verify data is as expected
psql -c "SELECT COUNT(*) FROM orders WHERE created_at < '2026-03-04 14:30:00';"
# Step 5: Once verified, promote to normal operation
# (already promoted via --target-action=promote)
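Rather than hand-editing the target timestamp under pressure, you can compute "two minutes before the incident" directly. This assumes GNU date (standard on Linux; BSD/macOS `date` uses `-v` flags instead):

```shell
# Compute a PITR target 2 minutes before a known incident time (GNU date).
INCIDENT="2026-03-04 14:32:00 UTC"
TARGET=$(date -u -d "${INCIDENT} - 2 minutes" +"%Y-%m-%d %H:%M:%S+00")
echo "${TARGET}"
# → 2026-03-04 14:30:00+00

# Then feed it to the restore:
#   pgbackrest --stanza=myapp --type=time --target="${TARGET}" \
#     --target-action=promote restore
```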
AWS RDS: Automated Backup Configuration
# terraform/rds.tf
resource "aws_rds_cluster" "main" {
cluster_identifier = "${var.name}-${var.environment}"
engine = "aurora-postgresql"
engine_version = "16.2"
database_name = var.db_name
master_username = var.db_username
manage_master_user_password = true # Rotate via Secrets Manager
# Backup configuration
backup_retention_period = var.environment == "production" ? 35 : 7
# 35 days retention in production (max for Aurora)
# PITR available for any point in this window
preferred_backup_window = "03:00-04:00" # Off-peak UTC
copy_tags_to_snapshot = true
deletion_protection = var.environment == "production"
skip_final_snapshot = var.environment != "production"
# Note: timestamp() changes on every plan; pin this value or add
# final_snapshot_identifier to ignore_changes to avoid perpetual diffs
final_snapshot_identifier = "${var.name}-${var.environment}-final-${formatdate("YYYY-MM-DD", timestamp())}"
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
tags = var.common_tags
lifecycle {
ignore_changes = [availability_zones]
}
}
# Manual cluster snapshot for DR. Note: this creates the snapshot in the
# same region; actually getting it to another region requires a separate
# copy step (AWS Backup copy rules, or aws rds copy-db-cluster-snapshot).
resource "aws_db_cluster_snapshot" "cross_region" {
count = var.environment == "production" ? 1 : 0
db_cluster_identifier = aws_rds_cluster.main.id
db_cluster_snapshot_identifier = "${var.name}-${var.environment}-cross-region"
}
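The snapshot resource above lives in the source region; copying it to a DR region is a separate step. Here is a hedged sketch around the real `aws rds copy-db-cluster-snapshot` command. The account ID, regions, and KMS alias are placeholders, and encrypted snapshots must be copied with a KMS key that is valid in the destination region. `DRY_RUN=1` prints the command instead of executing it, which is how it is exercised here:

```shell
# Copy a manual cluster snapshot to a disaster-recovery region.
# Placeholders: account 123456789012, source us-east-1, alias/my-dr-key.
copy_snapshot_cross_region() {
  local snapshot_id="$1" dr_region="$2"
  local cmd="aws rds copy-db-cluster-snapshot \
    --source-db-cluster-snapshot-identifier arn:aws:rds:us-east-1:123456789012:cluster-snapshot:${snapshot_id} \
    --target-db-cluster-snapshot-identifier ${snapshot_id}-dr \
    --source-region us-east-1 \
    --region ${dr_region} \
    --kms-key-id alias/my-dr-key"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"
  fi
}

DRY_RUN=1 copy_snapshot_cross_region myapp-production-cross-region us-west-2
```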
RDS PITR Restore
# AWS CLI: restore RDS cluster to a point in time
aws rds restore-db-cluster-to-point-in-time \
--db-cluster-identifier myapp-production-restored \
--source-db-cluster-identifier myapp-production \
--restore-to-time "2026-03-04T14:30:00Z" \
--db-subnet-group-name myapp-production-subnet-group \
--vpc-security-group-ids sg-0abc123 \
--tags Key=Purpose,Value=PITRRestore
# Wait for restore to complete
aws rds wait db-cluster-available \
--db-cluster-identifier myapp-production-restored
# Connect and verify
psql "postgresql://user:pass@myapp-production-restored.cluster-xxxx.us-east-1.rds.amazonaws.com/myapp"
Automated Restore Testing
A backup that isn't tested isn't a backup. Run this weekly:
// scripts/test-backup-restore.ts
// Run via Lambda cron: every Sunday at 4am UTC
import {
RDSClient,
RestoreDBClusterToPointInTimeCommand,
DeleteDBClusterCommand,
DescribeDBClustersCommand,
} from "@aws-sdk/client-rds";
// Helpers assumed to exist elsewhere in the codebase (not shown here):
// getClusterEndpoint, createTestConnection, sendAlert
const rds = new RDSClient({ region: process.env.AWS_REGION });
async function testRestoreAndValidate() {
const testClusterId = `${process.env.SOURCE_CLUSTER_ID}-restore-test-${Date.now()}`;
try {
console.log("Step 1: Restoring cluster to 1 hour ago...");
await rds.send(new RestoreDBClusterToPointInTimeCommand({
DBClusterIdentifier: testClusterId,
SourceDBClusterIdentifier: process.env.SOURCE_CLUSTER_ID!,
RestoreToTime: new Date(Date.now() - 60 * 60 * 1000), // 1 hour ago
DBSubnetGroupName: process.env.SUBNET_GROUP!,
VpcSecurityGroupIds: [process.env.SECURITY_GROUP_ID!],
}));
// Wait up to 20 minutes for restore
await waitForCluster(testClusterId, 20 * 60 * 1000);
console.log("Step 2: Restore complete.");
// Step 3: Run validation queries against restored cluster
const endpoint = await getClusterEndpoint(testClusterId);
const testDb = createTestConnection(endpoint);
const [userCount, orderCount, recentOrder] = await Promise.all([
testDb.user.count(),
testDb.order.count(),
testDb.order.findFirst({ orderBy: { createdAt: "desc" } }),
]);
console.log(`Validation: ${userCount} users, ${orderCount} orders`);
console.log(`Most recent order: ${recentOrder?.id} at ${recentOrder?.createdAt}`);
if (userCount === 0) throw new Error("Restore validation failed: no users found");
if (orderCount === 0) throw new Error("Restore validation failed: no orders found");
console.log("✅ Restore test PASSED");
// Send success notification
await sendAlert({
type: "success",
message: `Weekly restore test passed: ${userCount} users, ${orderCount} orders`,
});
} catch (err) {
console.error("❌ Restore test FAILED:", err);
await sendAlert({ type: "failure", message: String(err) });
throw err;
} finally {
// Always attempt cleanup, but catch errors so a failed delete
// (e.g. the cluster was never created) cannot mask the original failure
console.log("Cleaning up test cluster...");
try {
await rds.send(new DeleteDBClusterCommand({
DBClusterIdentifier: testClusterId,
SkipFinalSnapshot: true,
}));
console.log("Test cluster deleted.");
} catch (cleanupErr) {
console.error("Cleanup failed:", cleanupErr);
}
}
}
async function waitForCluster(clusterId: string, timeoutMs: number) {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const { DBClusters } = await rds.send(new DescribeDBClustersCommand({
DBClusterIdentifier: clusterId,
}));
if (DBClusters?.[0]?.Status === "available") return;
await new Promise((r) => setTimeout(r, 30_000)); // Poll every 30s
}
throw new Error(`Cluster ${clusterId} did not become available within ${timeoutMs}ms`);
}
Backup Monitoring
-- Check WAL archiving status
SELECT
archived_count,
last_archived_wal,
last_archived_time,
failed_count,
last_failed_wal,
last_failed_time
FROM pg_stat_archiver;
-- Alert if failed_count > 0 or last_archived_time > 1 hour ago
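The alert rule in that comment fits in a small cron script. Once the two values are fetched (the psql invocation in the comment below is one way, assuming `pg_stat_archiver` access), the check itself is plain arithmetic:

```shell
# Evaluate pg_stat_archiver values, e.g. fetched with:
#   psql -Atc "SELECT failed_count,
#                     extract(epoch FROM last_archived_time)::bigint
#              FROM pg_stat_archiver"
archiver_ok() {
  local failed_count="$1" last_archived_epoch="$2" now
  now=$(date +%s)
  if [ "$failed_count" -gt 0 ]; then
    echo "ALERT: ${failed_count} WAL archive failures"
    return 1
  fi
  if [ $(( now - last_archived_epoch )) -gt 3600 ]; then
    echo "ALERT: no WAL archived in the last hour"
    return 1
  fi
  echo "OK"
}

archiver_ok 0 "$(date +%s)"   # prints OK on a healthy archiver
```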
Cost Estimate
| Backup Type | Storage | Cost/Month |
|---|---|---|
| pg_dump (daily, 30 day retention, 10GB compressed) | 300 GB S3-IA | ~$5 |
| WAL archiving (continuous, 7 day retention) | ~50 GB | ~$1.15 |
| RDS automated backup (35 days, 100 GB DB) | ~3.5 TB | ~$87 (backup storage up to the DB's own size is free) |
| Cross-region snapshot copy | Same as above | +$87 |
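The pg_dump row is simple arithmetic: 30 daily dumps of ~10 GB each at S3 Standard-IA, taken here as ~$0.0125/GB-month (a pricing assumption; check current rates), plus a little for requests and transfer:

```shell
# Back-of-envelope storage cost: 30 dumps x 10 GB x $0.0125/GB-month
awk 'BEGIN { printf "$%.2f/month\n", 30 * 10 * 0.0125 }'
# → $3.75/month
```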
See Also
- PostgreSQL Schema Migrations — Backup before migrations
- AWS RDS Aurora — Aurora-specific backup features
- AWS Secrets Manager — Securing backup encryption keys
- AWS Lambda Scheduled — Scheduling backup and restore tests
Working With Viprasol
We design and implement PostgreSQL backup strategies for production SaaS databases — from simple pg_dump schedules through continuous WAL archiving with automated weekly restore testing. Our team has implemented backup systems that have successfully restored production data multiple times.
What we deliver:
- Layered backup strategy design (logical + WAL + snapshots)
- pgBackRest configuration with S3 archiving
- RDS backup retention and cross-region copy configuration
- Automated restore testing Lambda (weekly, validates row counts)
- CloudWatch alarms for WAL archiving failures and backup age
Explore our cloud infrastructure services or contact us to implement a tested backup strategy.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.