PostgreSQL Backup and Restore in 2026: pg_dump, WAL Archiving, PITR, and Restore Testing
Production PostgreSQL backup strategy: pg_dump for logical backups, WAL archiving with pgBackRest, point-in-time recovery, RDS automated backups, and restore testing automation.
Most teams think about backup strategy after their first data loss incident. The correct time to think about it is now, when the cost of getting it wrong is zero. Backups that aren't tested aren't backups — they're a false sense of security. And RDS automated backups, while necessary, aren't sufficient on their own for most SaaS products.
This post covers the full backup strategy: logical backups with pg_dump, WAL-based continuous archiving with pgBackRest, point-in-time recovery, AWS RDS snapshot configuration, and — most importantly — automated restore testing.
Backup Strategy Layers
| Layer | Tool | RPO | What It Protects Against |
|---|---|---|---|
| Logical (schema + data) | pg_dump / pg_dumpall | Daily | Accidental DROP TABLE, corruption |
| Continuous WAL archiving | pgBackRest | Seconds | Server failure, need for PITR |
| Cloud snapshots | RDS automated backups | 5 min | AZ failure, instance corruption |
| Cross-region copy | S3 replication / RDS copy | RPO + copy time | Region failure |
| Point-in-time recovery | WAL replay | Any point | "Restore to 10 min ago" |
pg_dump: Logical Backups
#!/bin/bash
# scripts/pg-dump-backup.sh
# Run daily via Lambda cron or EC2 cron
set -euo pipefail
DB_HOST="${PGHOST}"
DB_NAME="${PGDATABASE}"
DB_USER="${PGUSER}"
S3_BUCKET="${BACKUP_S3_BUCKET}"
DATE=$(date +%Y-%m-%d-%H%M%S)
BACKUP_FILE="/tmp/backup-${DB_NAME}-${DATE}.dump"
echo "Starting pg_dump backup: ${DB_NAME}"
# Custom format: compressed, parallel-restore capable.
# --format=custom  binary format, faster restore than plain SQL
# --compress=9     maximum compression
# --no-password    never prompt; supply credentials via PGPASSWORD or ~/.pgpass
# (note: inline comments after a trailing backslash break bash line continuation,
# which is why the flag explanations live up here)
pg_dump \
  --host="${DB_HOST}" \
  --username="${DB_USER}" \
  --dbname="${DB_NAME}" \
  --format=custom \
  --compress=9 \
  --no-password \
  --verbose \
  --file="${BACKUP_FILE}"
# Upload to S3 with server-side encryption.
# STANDARD_IA is cheaper for rarely-read objects like backups.
aws s3 cp "${BACKUP_FILE}" \
  "s3://${S3_BUCKET}/logical/${DATE:0:7}/${DATE}.dump" \
  --storage-class STANDARD_IA \
  --sse aws:kms
# Clean up local file
rm -f "${BACKUP_FILE}"
echo "Backup complete: s3://${S3_BUCKET}/logical/${DATE:0:7}/${DATE}.dump"
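Because the script above timestamps keys with `%Y-%m-%d-%H%M%S`, the newest dump is also the lexicographically largest key, so a restore script can select it with a plain `sort`. A minimal sketch (the hard-coded keys are illustrative; a real script would pipe in the output of `aws s3 ls --recursive`):

```shell
# Pick the most recent dump from a list of date-stamped S3 keys.
# Keys follow the logical/YYYY-MM/YYYY-MM-DD-HHMMSS.dump layout used above,
# so lexicographic order equals chronological order.
latest_dump() {
  sort | tail -n 1
}

# Illustration with sample keys; in practice:
#   aws s3 ls "s3://${S3_BUCKET}/logical/" --recursive | awk '{print $4}' | latest_dump
printf '%s\n' \
  "logical/2026-02/2026-02-28-020000.dump" \
  "logical/2026-03/2026-03-01-020000.dump" \
  "logical/2026-03/2026-03-02-020000.dump" | latest_dump
# → logical/2026-03/2026-03-02-020000.dump
```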
# Restore from a pg_dump custom-format backup.
# --jobs=4      parallel restore; roughly match available CPU cores
# --no-owner    skip restoring ownership (useful when restoring as a different user)
pg_restore \
  --host="${RESTORE_HOST}" \
  --username="${DB_USER}" \
  --dbname="${RESTORE_DB}" \
  --format=custom \
  --jobs=4 \
  --verbose \
  --no-owner \
  /path/to/backup.dump
pgBackRest: Continuous WAL Archiving to S3
pgBackRest is the modern replacement for pg_basebackup + WAL-E. It supports:
- Full, differential, and incremental backups
- WAL archiving to S3
- Parallel backup and restore
- Backup verification
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=myapp-production-pgbackrest
repo1-s3-region=us-east-1
repo1-s3-key=<aws-access-key>
repo1-s3-key-secret=<aws-secret-key>
repo1-path=/pgbackrest
# Retention: keep 7 full backups and 14 differentials
repo1-retention-full=7
repo1-retention-diff=14
# Encryption at rest
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-strong-passphrase
log-level-console=info
log-level-file=detail
[myapp]
pg1-path=/var/lib/postgresql/15/main
pg1-port=5432
# postgresql.conf changes for WAL archiving (postgresql.conf uses # comments, not --)
archive_mode = on
archive_command = 'pgbackrest --stanza=myapp archive-push %p'
wal_level = replica   # minimum level required for WAL archiving
# Initial setup
pgbackrest --stanza=myapp stanza-create
# First full backup
pgbackrest --stanza=myapp --type=full backup
# Schedule via cron:
# Full backup: weekly
# 0 2 * * 0 pgbackrest --stanza=myapp --type=full backup
# Differential backup: daily
# 0 2 * * 1-6 pgbackrest --stanza=myapp --type=diff backup
# Verify backup integrity
pgbackrest --stanza=myapp verify
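`verify` tells you the repository is intact; you also want to know the newest backup is recent. `pgbackrest info --output=json` exposes each backup's completion time as an epoch timestamp, which a cron job can compare against a freshness threshold. A minimal sketch (the `jq` path in the comment reflects pgBackRest's info JSON as I understand it; check it against your version's output):

```shell
# Alert if the newest backup is older than a threshold.
#   last_stop: epoch seconds of the last backup's completion, e.g. from
#     pgbackrest info --stanza=myapp --output=json | jq '.[0].backup[-1].timestamp.stop'
#   max_age:   allowed age in seconds (93600 = 26h, a daily schedule plus slack)
backup_fresh() {
  local last_stop="$1" max_age="$2" now
  now=$(date +%s)
  if [ $(( now - last_stop )) -gt "$max_age" ]; then
    echo "STALE: last backup finished $(( (now - last_stop) / 3600 ))h ago"
    return 1
  fi
  echo "OK"
}

backup_fresh "$(date +%s)" 93600   # prints OK for a just-finished backup
```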
Point-In-Time Recovery (PITR)
# Scenario: user deleted all orders at 14:32 UTC
# Restore to 14:30 UTC (2 minutes before)
# Step 1: Stop the current database
systemctl stop postgresql
# Step 2: Restore the base backup and replay WAL to the target time
# (the data directory must be empty, or add --delta to overwrite in place)
pgbackrest --stanza=myapp \
  --type=time \
  --target="2026-03-04 14:30:00+00" \
  --target-action=promote \
  restore
# Step 3: Start PostgreSQL (it will replay WAL up to 14:30)
systemctl start postgresql
# Step 4: Verify data is as expected
psql -c "SELECT COUNT(*) FROM orders WHERE created_at < '2026-03-04 14:30:00';"
# Step 5: Once verified, promote to normal operation
# (already promoted via --target-action=promote)
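Rather than hand-editing the target timestamp under pressure, you can compute "two minutes before the incident" directly. This assumes GNU date (standard on Linux; BSD/macOS `date` uses `-v` flags instead):

```shell
# Compute a PITR target 2 minutes before a known incident time (GNU date).
INCIDENT="2026-03-04 14:32:00 UTC"
TARGET=$(date -u -d "${INCIDENT} - 2 minutes" +"%Y-%m-%d %H:%M:%S+00")
echo "${TARGET}"
# → 2026-03-04 14:30:00+00

# Then feed it to the restore:
#   pgbackrest --stanza=myapp --type=time --target="${TARGET}" \
#     --target-action=promote restore
```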
AWS RDS: Automated Backup Configuration
# terraform/rds.tf
resource "aws_rds_cluster" "main" {
cluster_identifier = "${var.name}-${var.environment}"
engine = "aurora-postgresql"
engine_version = "16.2"
database_name = var.db_name
master_username = var.db_username
manage_master_user_password = true # Rotate via Secrets Manager
# Backup configuration
backup_retention_period = var.environment == "production" ? 35 : 7
# 35 days retention in production (max for Aurora)
# PITR available for any point in this window
preferred_backup_window = "03:00-04:00" # Off-peak UTC
copy_tags_to_snapshot = true
deletion_protection = var.environment == "production"
skip_final_snapshot = var.environment != "production"
# Note: timestamp() changes on every plan; pin this value or add
# final_snapshot_identifier to ignore_changes to avoid perpetual diffs
final_snapshot_identifier = "${var.name}-${var.environment}-final-${formatdate("YYYY-MM-DD", timestamp())}"
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
tags = var.common_tags
lifecycle {
ignore_changes = [availability_zones]
}
}
# Manual cluster snapshot for DR. Note: this creates the snapshot in the
# same region; actually getting it to another region requires a separate
# copy step (AWS Backup copy rules, or aws rds copy-db-cluster-snapshot).
resource "aws_db_cluster_snapshot" "cross_region" {
count = var.environment == "production" ? 1 : 0
db_cluster_identifier = aws_rds_cluster.main.id
db_cluster_snapshot_identifier = "${var.name}-${var.environment}-cross-region"
}
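The snapshot resource above lives in the source region; copying it to a DR region is a separate step. Here is a hedged sketch around the real `aws rds copy-db-cluster-snapshot` command. The account ID, regions, and KMS alias are placeholders, and encrypted snapshots must be copied with a KMS key that is valid in the destination region. `DRY_RUN=1` prints the command instead of executing it, which is how it is exercised here:

```shell
# Copy a manual cluster snapshot to a disaster-recovery region.
# Placeholders: account 123456789012, source us-east-1, alias/my-dr-key.
copy_snapshot_cross_region() {
  local snapshot_id="$1" dr_region="$2"
  local cmd="aws rds copy-db-cluster-snapshot \
    --source-db-cluster-snapshot-identifier arn:aws:rds:us-east-1:123456789012:cluster-snapshot:${snapshot_id} \
    --target-db-cluster-snapshot-identifier ${snapshot_id}-dr \
    --source-region us-east-1 \
    --region ${dr_region} \
    --kms-key-id alias/my-dr-key"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"
  fi
}

DRY_RUN=1 copy_snapshot_cross_region myapp-production-cross-region us-west-2
```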
RDS PITR Restore
# AWS CLI: restore RDS cluster to a point in time
aws rds restore-db-cluster-to-point-in-time \
--db-cluster-identifier myapp-production-restored \
--source-db-cluster-identifier myapp-production \
--restore-to-time "2026-03-04T14:30:00Z" \
--db-subnet-group-name myapp-production-subnet-group \
--vpc-security-group-ids sg-0abc123 \
--tags Key=Purpose,Value=PITRRestore
# Wait for restore to complete
aws rds wait db-cluster-available \
--db-cluster-identifier myapp-production-restored
# Connect and verify
psql "postgresql://user:pass@myapp-production-restored.cluster-xxxx.us-east-1.rds.amazonaws.com/myapp"
Automated Restore Testing
A backup that isn't tested isn't a backup. Run this weekly:
// scripts/test-backup-restore.ts
// Run via Lambda cron: every Sunday at 4am UTC
import {
RDSClient,
RestoreDBClusterToPointInTimeCommand,
DeleteDBClusterCommand,
DescribeDBClustersCommand,
} from "@aws-sdk/client-rds";
// Helpers assumed to exist elsewhere in the codebase (not shown here):
// getClusterEndpoint, createTestConnection, sendAlert
const rds = new RDSClient({ region: process.env.AWS_REGION });
async function testRestoreAndValidate() {
const testClusterId = `${process.env.SOURCE_CLUSTER_ID}-restore-test-${Date.now()}`;
try {
console.log("Step 1: Restoring cluster to 1 hour ago...");
await rds.send(new RestoreDBClusterToPointInTimeCommand({
DBClusterIdentifier: testClusterId,
SourceDBClusterIdentifier: process.env.SOURCE_CLUSTER_ID!,
RestoreToTime: new Date(Date.now() - 60 * 60 * 1000), // 1 hour ago
DBSubnetGroupName: process.env.SUBNET_GROUP!,
VpcSecurityGroupIds: [process.env.SECURITY_GROUP_ID!],
}));
// Wait up to 20 minutes for restore
await waitForCluster(testClusterId, 20 * 60 * 1000);
console.log("Step 2: Restore complete.");
// Step 3: Run validation queries against restored cluster
const endpoint = await getClusterEndpoint(testClusterId);
const testDb = createTestConnection(endpoint);
const [userCount, orderCount, recentOrder] = await Promise.all([
testDb.user.count(),
testDb.order.count(),
testDb.order.findFirst({ orderBy: { createdAt: "desc" } }),
]);
console.log(`Validation: ${userCount} users, ${orderCount} orders`);
console.log(`Most recent order: ${recentOrder?.id} at ${recentOrder?.createdAt}`);
if (userCount === 0) throw new Error("Restore validation failed: no users found");
if (orderCount === 0) throw new Error("Restore validation failed: no orders found");
console.log("✅ Restore test PASSED");
// Send success notification
await sendAlert({
type: "success",
message: `Weekly restore test passed: ${userCount} users, ${orderCount} orders`,
});
} catch (err) {
console.error("❌ Restore test FAILED:", err);
await sendAlert({ type: "failure", message: String(err) });
throw err;
} finally {
// Always attempt cleanup, but catch errors so a failed delete
// (e.g. the cluster was never created) cannot mask the original failure
console.log("Cleaning up test cluster...");
try {
await rds.send(new DeleteDBClusterCommand({
DBClusterIdentifier: testClusterId,
SkipFinalSnapshot: true,
}));
console.log("Test cluster deleted.");
} catch (cleanupErr) {
console.error("Cleanup failed:", cleanupErr);
}
}
}
async function waitForCluster(clusterId: string, timeoutMs: number) {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const { DBClusters } = await rds.send(new DescribeDBClustersCommand({
DBClusterIdentifier: clusterId,
}));
if (DBClusters?.[0]?.Status === "available") return;
await new Promise((r) => setTimeout(r, 30_000)); // Poll every 30s
}
throw new Error(`Cluster ${clusterId} did not become available within ${timeoutMs}ms`);
}
Backup Monitoring
-- Check WAL archiving status
SELECT
archived_count,
last_archived_wal,
last_archived_time,
failed_count,
last_failed_wal,
last_failed_time
FROM pg_stat_archiver;
-- Alert if failed_count > 0 or last_archived_time > 1 hour ago
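The alert rule in that comment fits in a small cron script. Once the two values are fetched (the psql invocation in the comment below is one way, assuming `pg_stat_archiver` access), the check itself is plain arithmetic:

```shell
# Evaluate pg_stat_archiver values, e.g. fetched with:
#   psql -Atc "SELECT failed_count,
#                     extract(epoch FROM last_archived_time)::bigint
#              FROM pg_stat_archiver"
archiver_ok() {
  local failed_count="$1" last_archived_epoch="$2" now
  now=$(date +%s)
  if [ "$failed_count" -gt 0 ]; then
    echo "ALERT: ${failed_count} WAL archive failures"
    return 1
  fi
  if [ $(( now - last_archived_epoch )) -gt 3600 ]; then
    echo "ALERT: no WAL archived in the last hour"
    return 1
  fi
  echo "OK"
}

archiver_ok 0 "$(date +%s)"   # prints OK on a healthy archiver
```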
Cost Estimate
| Backup Type | Storage | Cost/Month |
|---|---|---|
| pg_dump (daily, 30 day retention, 10GB compressed) | 300 GB S3-IA | ~$5 |
| WAL archiving (continuous, 7 day retention) | ~50 GB | ~$1.15 |
| RDS automated backup (35 days, 100 GB DB) | ~3.5 TB | ~$87 (backup storage up to the DB's own size is free) |
| Cross-region snapshot copy | Same as above | +$87 |
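The pg_dump row is simple arithmetic: 30 daily dumps of ~10 GB each at S3 Standard-IA, taken here as ~$0.0125/GB-month (a pricing assumption; check current rates), plus a little for requests and transfer:

```shell
# Back-of-envelope storage cost: 30 dumps x 10 GB x $0.0125/GB-month
awk 'BEGIN { printf "$%.2f/month\n", 30 * 10 * 0.0125 }'
# → $3.75/month
```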
See Also
- PostgreSQL Schema Migrations — Backup before migrations
- AWS RDS Aurora — Aurora-specific backup features
- AWS Secrets Manager — Securing backup encryption keys
- AWS Lambda Scheduled — Scheduling backup and restore tests
Working With Viprasol
We design and implement PostgreSQL backup strategies for production SaaS databases — from simple pg_dump schedules through continuous WAL archiving with automated weekly restore testing. Our team has implemented backup systems that have successfully restored production data multiple times.
What we deliver:
- Layered backup strategy design (logical + WAL + snapshots)
- pgBackRest configuration with S3 archiving
- RDS backup retention and cross-region copy configuration
- Automated restore testing Lambda (weekly, validates row counts)
- CloudWatch alarms for WAL archiving failures and backup age
Explore our cloud infrastructure services or contact us to implement a tested backup strategy.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.