Back to Blog

AWS RDS Aurora 2026: Serverless v2, Read Replicas & Failover Guide

Deploy production PostgreSQL on AWS Aurora: Serverless v2 auto-scaling, read replica routing, failover testing, Performance Insights query analysis, and Terraform IaC for Aurora clusters.

Viprasol Tech Team
13 min read
Updated 2026

AWS RDS vs Aurora in 2026: Architecture, Cost, and When to Choose Each

Quick answer. Amazon Aurora is PostgreSQL/MySQL-compatible but re-architected: storage spans six copies across three Availability Zones and writes commit after four confirm, giving 99.99% availability and fast point-in-time recovery. Choose Aurora for bursty or mission-critical workloads (Serverless v2 auto-scales); choose plain RDS for predictable, cost-sensitive ones.

Aurora is PostgreSQL-compatible, but it's not just managed PostgreSQL. The storage layer is fundamentally different — distributed across six copies in three AZs, with writes acknowledged after four copies confirm — giving you 5 nines availability and point-in-time recovery without running your own replication. Aurora Serverless v2 adds automatic capacity scaling from 0.5 to 256 ACUs, making it cost-effective for variable workloads that would otherwise require over-provisioned instances.

This post covers production Aurora setup: Serverless v2 cluster configuration, read replica routing, failover behavior and testing, Performance Insights for query analysis, and Terraform IaC.

Aurora vs Standard RDS PostgreSQL

FeatureAurora PostgreSQLRDS PostgreSQL
StorageDistributed (6 copies, 3 AZs)Single-AZ EBS (Multi-AZ = 2 copies)
Failover time~30 seconds~60–120 seconds
Read replicasUp to 15 (in-cluster)Up to 5 (separate instances)
Replica lag~10–20ms~100ms–several seconds
Serverless optionv2 (sub-minute scaling)No
Storage auto-scalingYes (to 128TB)Yes (to 64TB)
Cost~20% higher than RDSBaseline
Global databasesYes (cross-region)No (manual replication)

1. Aurora Serverless v2 Cluster (Terraform)

# infrastructure/aurora/main.tf

locals {
  cluster_identifier = "${var.project}-${var.environment}-aurora"
  db_name           = replace(var.project, "-", "_")
}

# Subnet group
resource "aws_db_subnet_group" "aurora" {
  name       = "${local.cluster_identifier}-subnet-group"
  subnet_ids = var.private_subnet_ids

  tags = {
    Name        = "${local.cluster_identifier}-subnet-group"
    Environment = var.environment
  }
}

# Security group: only accept connections from ECS tasks / Lambda
resource "aws_security_group" "aurora" {
  name        = "${local.cluster_identifier}-sg"
  description = "Aurora cluster security group"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = var.allowed_security_group_ids
    description     = "PostgreSQL from app tier"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "${local.cluster_identifier}-sg" }
}

# Parameter group for PostgreSQL tuning
resource "aws_rds_cluster_parameter_group" "aurora" {
  family      = "aurora-postgresql16"
  name        = "${local.cluster_identifier}-params"
  description = "Custom parameter group for ${local.cluster_identifier}"

  parameter {
    name  = "shared_preload_libraries"
    value = "pg_stat_statements,auto_explain"
  }
  parameter {
    name  = "log_min_duration_statement"
    value = "1000"  # Log queries > 1 second
  }
  parameter {
    name  = "auto_explain.log_min_duration"
    value = "500"   # Auto-explain queries > 500ms
  }
  parameter {
    name  = "auto_explain.log_analyze"
    value = "1"
  }
  parameter {
    name  = "work_mem"
    value = "65536"  # 64MB per sort/hash operation
  }
  parameter {
    name  = "random_page_cost"
    value = "1.0"   # Aurora storage is fast; default 4.0 is too conservative
  }
}

# Aurora cluster
resource "aws_rds_cluster" "main" {
  cluster_identifier     = local.cluster_identifier
  engine                 = "aurora-postgresql"
  engine_version         = "16.4"
  database_name          = local.db_name
  master_username        = "postgres"
  manage_master_user_password = true  # AWS Secrets Manager rotation

  db_subnet_group_name   = aws_db_subnet_group.aurora.name
  vpc_security_group_ids = [aws_security_group.aurora.id]
  db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.aurora.name

  # Serverless v2 capacity range
  serverlessv2_scaling_configuration {
    min_capacity = var.environment == "prod" ? 2.0 : 0.5  # ACUs
    max_capacity = var.environment == "prod" ? 64.0 : 8.0
  }

  # Backup and recovery
  backup_retention_period   = var.environment == "prod" ? 14 : 7
  preferred_backup_window   = "03:00-04:00"  # UTC
  deletion_protection       = var.environment == "prod"
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = var.environment == "prod" ? "${local.cluster_identifier}-final" : null

  # Storage encryption
  storage_encrypted = true
  kms_key_id        = var.kms_key_arn

  # CloudWatch logging
  enabled_cloudwatch_logs_exports = ["postgresql"]

  # Performance Insights (covered later)
  # Enabled at instance level

  # Apply immediately for non-prod
  apply_immediately = var.environment != "prod"

  tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
  }

  lifecycle {
    ignore_changes = [
      # Don't recreate cluster if master password changes (managed by Secrets Manager)
      master_password,
    ]
  }
}

# Writer instance (Serverless v2)
resource "aws_rds_cluster_instance" "writer" {
  identifier         = "${local.cluster_identifier}-writer"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.serverless"  # Required for Serverless v2
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  db_subnet_group_name    = aws_db_subnet_group.aurora.name
  db_parameter_group_name = aws_rds_cluster_parameter_group.aurora.name

  performance_insights_enabled          = true
  performance_insights_retention_period = var.environment == "prod" ? 731 : 7  # days
  performance_insights_kms_key_id       = var.kms_key_arn

  monitoring_interval = 60  # Enhanced Monitoring every 60 seconds
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  auto_minor_version_upgrade = true

  tags = {
    Project     = var.project
    Environment = var.environment
    Role        = "writer"
  }
}

# Read replica (Serverless v2)
resource "aws_rds_cluster_instance" "reader" {
  count = var.environment == "prod" ? 2 : 1

  identifier         = "${local.cluster_identifier}-reader-${count.index + 1}"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  db_subnet_group_name = aws_db_subnet_group.aurora.name

  performance_insights_enabled          = true
  performance_insights_retention_period = var.environment == "prod" ? 731 : 7
  performance_insights_kms_key_id       = var.kms_key_arn

  promotion_tier = count.index + 1  # Lower = promoted first on failover

  tags = {
    Project     = var.project
    Environment = var.environment
    Role        = "reader-${count.index + 1}"
  }
}

# Outputs
output "writer_endpoint" {
  value = aws_rds_cluster.main.endpoint  # Always points to current writer
}

output "reader_endpoint" {
  value = aws_rds_cluster.main.reader_endpoint  # Load-balanced across replicas
}

output "cluster_identifier" {
  value = aws_rds_cluster.main.cluster_identifier
}

☁️ Is Your Cloud Costing Too Much?

Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.

  • AWS, GCP, Azure certified engineers
  • Infrastructure as Code (Terraform, CDK)
  • Docker, Kubernetes, GitHub Actions CI/CD
  • Typical audit recovers $500–$3,000/month in savings

2. Read Replica Routing in Application Code

Aurora provides two endpoints: writer (single instance, always the current primary) and reader (load-balanced across all replicas). Route read-heavy queries to the reader endpoint.

// src/lib/db/aurora-client.ts
import { PrismaClient } from '@prisma/client';

// Writer: for mutations and strongly consistent reads
export const dbWriter = new PrismaClient({
  datasourceUrl: process.env.DATABASE_URL_WRITER, // Aurora cluster endpoint
  log: process.env.NODE_ENV === 'development' ? ['query', 'warn', 'error'] : ['warn', 'error'],
});

// Reader: for analytics, reports, non-critical reads
// Aurora reader endpoint load-balances across all replicas
export const dbReader = new PrismaClient({
  datasourceUrl: process.env.DATABASE_URL_READER, // Aurora reader endpoint
});

// Helper: use reader for queries, writer for mutations
export async function readQuery<T>(fn: (db: PrismaClient) => Promise<T>): Promise<T> {
  return fn(dbReader);
}

export async function writeQuery<T>(fn: (db: PrismaClient) => Promise<T>): Promise<T> {
  return fn(dbWriter);
}

// Usage:
// const users = await readQuery(db => db.user.findMany({ take: 100 }));
// await writeQuery(db => db.user.create({ data: { ... } }));
# .env
# Writer endpoint (cluster endpoint — auto-fails over to new primary)
DATABASE_URL_WRITER="postgresql://postgres:${PASSWORD}@${CLUSTER_ENDPOINT}:5432/myapp"

# Reader endpoint (load-balanced across replicas)
DATABASE_URL_READER="postgresql://postgres:${PASSWORD}@${READER_ENDPOINT}:5432/myapp"

PgBouncer Connection Pooling with Aurora

Aurora Serverless v2 supports up to 5,000 connections, but connection overhead still adds up. Use RDS Proxy (AWS-managed PgBouncer) or self-managed PgBouncer.

# RDS Proxy for connection pooling
resource "aws_db_proxy" "main" {
  name                   = "${local.cluster_identifier}-proxy"
  debug_logging          = false
  engine_family          = "POSTGRESQL"
  idle_client_timeout    = 1800
  require_tls            = true
  role_arn               = aws_iam_role.rds_proxy.arn
  vpc_security_group_ids = [aws_security_group.aurora.id]
  vpc_subnet_ids         = var.private_subnet_ids

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "DISABLED"
    secret_arn  = aws_rds_cluster.main.master_user_secret[0].secret_arn
  }

  tags = { Name = "${local.cluster_identifier}-proxy" }
}

resource "aws_db_proxy_default_target_group" "main" {
  db_proxy_name = aws_db_proxy.main.name

  connection_pool_config {
    connection_borrow_timeout    = 120  # seconds to wait for a connection
    max_connections_percent      = 90   # Use up to 90% of max_connections
    max_idle_connections_percent = 50
  }
}

resource "aws_db_proxy_target" "main" {
  db_cluster_identifier = aws_rds_cluster.main.cluster_identifier
  db_proxy_name         = aws_db_proxy.main.name
  target_group_name     = aws_db_proxy_default_target_group.main.name
}

3. Failover Testing

Aurora promotes a read replica to writer when the current writer fails. Test this in non-production environments to verify your application handles the ~30-second reconnection window.

#!/usr/bin/env bash
# scripts/test-aurora-failover.sh

CLUSTER_ID="${1:?Usage: test-aurora-failover.sh <cluster-id>}"

echo "Initiating Aurora failover for cluster: $CLUSTER_ID"
echo "Watch your application logs — there should be a ~30s reconnection window"

# Trigger failover (promotes lowest promotion_tier replica)
aws rds failover-db-cluster \
  --db-cluster-identifier "$CLUSTER_ID" \
  --region us-east-1

# Monitor until failover completes
echo "Monitoring failover status..."
while true; do
  STATUS=$(aws rds describe-db-clusters \
    --db-cluster-identifier "$CLUSTER_ID" \
    --query 'DBClusters[0].Status' \
    --output text)
  
  WRITER=$(aws rds describe-db-clusters \
    --db-cluster-identifier "$CLUSTER_ID" \
    --query 'DBClusters[0].DBClusterMembers[?IsClusterWriter==`true`].DBInstanceIdentifier' \
    --output text)

  echo "$(date): Status=$STATUS, Writer=$WRITER"
  
  if [ "$STATUS" = "available" ]; then
    echo "Failover complete! New writer: $WRITER"
    break
  fi
  
  sleep 5
done

Application Retry Logic for Failover

// src/lib/db/retry.ts
import { PrismaClient, Prisma } from '@prisma/client';

// Aurora failover causes ~30s of connection errors
// This retry wrapper handles the reconnection window
const FAILOVER_ERROR_CODES = [
  'P1001',  // Can't reach database
  'P1002',  // Database connection timed out
  'P1008',  // Operations timed out
  'P1017',  // Server closed connection unexpectedly
];

export async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5,
  baseDelayMs: number = 1000
): Promise<T> {
  let lastError: Error;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;

      const isRetryable =
        err instanceof Prisma.PrismaClientKnownRequestError &&
        FAILOVER_ERROR_CODES.includes(err.code);

      const isConnectionError =
        err.message?.includes('ECONNREFUSED') ||
        err.message?.includes('Connection terminated') ||
        err.message?.includes('read ECONNRESET');

      if (!isRetryable && !isConnectionError) throw err;
      if (attempt === maxRetries) throw err;

      const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
      console.warn(`DB error (attempt ${attempt + 1}/${maxRetries + 1}), retrying in ${Math.round(delay)}ms:`, err.message);
      await new Promise((r) => setTimeout(r, delay));
    }
  }

  throw lastError!;
}

aws - AWS RDS Aurora 2026: Serverless v2, Read Replicas & Failover Guide

⚙️ DevOps Done Right — Zero Downtime, Full Automation

Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.

  • Staging + production environments with feature flags
  • Automated security scanning in the pipeline
  • Uptime monitoring + alerting + runbook automation
  • On-call support handover docs included

4. Performance Insights

Performance Insights shows DB load by wait event, SQL query, and user — identifying bottlenecks without needing direct database access.

Read Performance Insights via AWS SDK

// src/scripts/analyze-performance.ts
import {
  PIClient,
  GetResourceMetricsCommand,
  DescribeDimensionKeysCommand,
} from '@aws-sdk/client-pi';

const pi = new PIClient({ region: 'us-east-1' });

const RESOURCE_ID = process.env.PI_RESOURCE_ID!; // dbi-xxxxx from RDS console

// Get top SQL queries by average active sessions (last hour)
async function getTopQueries() {
  const endTime = new Date();
  const startTime = new Date(endTime.getTime() - 60 * 60 * 1000);

  const { Keys } = await pi.send(
    new DescribeDimensionKeysCommand({
      ServiceType: 'RDS',
      Identifier: RESOURCE_ID,
      StartTime: startTime,
      EndTime: endTime,
      Metric: 'db.load.avg',
      GroupBy: {
        Group: 'db.sql',
        Dimensions: ['db.sql.statement'],
        Limit: 10,
      },
    })
  );

  console.log('\n📊 Top 10 Queries by DB Load (last hour):');
  Keys?.forEach((key, i) => {
    const stmt = key.Dimensions?.['db.sql.statement'] ?? 'N/A';
    const load = key.Total?.toFixed(4) ?? '0';
    console.log(`\n${i + 1}. Load: ${load} avg active sessions`);
    console.log(`   SQL: ${stmt.slice(0, 200)}...`);
  });
}

// Get wait events (identify bottlenecks: LWLock, I/O, lock contention)
async function getWaitEvents() {
  const endTime = new Date();
  const startTime = new Date(endTime.getTime() - 60 * 60 * 1000);

  const { Keys } = await pi.send(
    new DescribeDimensionKeysCommand({
      ServiceType: 'RDS',
      Identifier: RESOURCE_ID,
      StartTime: startTime,
      EndTime: endTime,
      Metric: 'db.load.avg',
      GroupBy: {
        Group: 'db.wait_event',
        Dimensions: ['db.wait_event.name'],
        Limit: 10,
      },
    })
  );

  console.log('\n⏳ Top Wait Events:');
  Keys?.forEach((key) => {
    const event = key.Dimensions?.['db.wait_event.name'] ?? 'N/A';
    const load = key.Total?.toFixed(4) ?? '0';
    console.log(`  ${event}: ${load}`);
  });
}

await getTopQueries();
await getWaitEvents();

Cost Reference

ConfigurationMonthly Cost (us-east-1)Use Case
Serverless v2, 0.5–8 ACU (dev)$20–120/mo (usage-based)Development / staging
Serverless v2, 2–32 ACU + 1 reader (prod)$300–900/moSmall-medium SaaS
db.r8g.2xlarge + 1 reader~$1,400/moHigh-throughput, predictable load
db.r8g.4xlarge (Global DB)~$4,200/moMulti-region with cross-region reads
+ RDS Proxy+$0.015/hour (~$11/mo)Connection pooling
+ Performance Insights (7 days)FreeQuery analysis
+ Performance Insights (2 years)+$0.02/vCPU/hourLong-term analysis

More on This Topic


Viprasol in Action

Migrating from self-managed PostgreSQL or standard RDS to Aurora, or right-sizing an existing Aurora cluster that's over-provisioned? We design Aurora architectures — Serverless v2 for variable workloads, provisioned for steady high-throughput, Global Databases for multi-region — with Terraform IaC, read replica routing, and Performance Insights dashboards.

Talk to our team → | Explore our cloud solutions →

How an Aurora Serverless Read Replica Improves Scaling and Availability

An Aurora Serverless read replica lets you absorb read-heavy traffic without manually sizing instances. With Aurora Serverless v2, each reader scales its Aurora Capacity Units up and down based on live demand, so reporting dashboards, search queries, and analytics workloads draw from replicas while the writer stays focused on transactions. Because all readers share the same distributed storage volume, replication lag stays low and you avoid the overhead of copying data between nodes.

Adding a reader also strengthens availability. If the primary fails, Aurora can promote a replica during failover, typically restoring writes in well under a minute. Placing readers across multiple Availability Zones spreads risk further.

At Viprasol Tech, our senior engineers design Aurora topologies end to end, tuning capacity ranges, endpoints, and failover priorities so your database scales predictably under real load.

awsaurorapostgresqlterraformrdsclouddevops
Share this article:

About the Author

V

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 1000+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement.

MT4/MT5 EA DevelopmentAI Agent SystemsSaaS DevelopmentAlgorithmic Trading

Need DevOps & Cloud Expertise?

Scale your infrastructure with confidence. AWS, GCP, Azure certified team.

Free consultation • No commitment • Response within 24 hours

Viprasol · Big Data & Analytics

Making sense of your data at scale?

Viprasol builds end-to-end big data analytics solutions — ETL pipelines, data warehouses on Snowflake or BigQuery, and self-service BI dashboards. One reliable source of truth for your entire organisation.