AWS RDS Read Replicas: Routing, Connection Pooling, Lag Monitoring, and Failover Patterns

Scale read-heavy PostgreSQL workloads with AWS RDS read replicas. Covers Prisma read/write splitting, PgBouncer per-replica connection pooling, replica lag monitoring, automatic failover with Aurora, and Terraform setup for multi-replica configurations.

Viprasol Tech Team
June 2, 2027
12 min read

Read replicas offload SELECT queries from your primary RDS instance, letting it focus on writes. The potential win is large: if 80% of your queries are reads, routing them all to a single replica removes up to 80% of the primary's load. The implementation challenge is routing: writes must always go to the primary, reads should go to replicas, and the application has to handle replica lag gracefully.

When Read Replicas Help (and When They Don't)

| Scenario | Read replicas help? | Alternative |
| --- | --- | --- |
| Read-heavy analytics (dashboards, reports) | ✅ Yes | Also consider materialized views |
| Read-heavy app queries (80%+ SELECTs) | ✅ Yes | |
| Write-heavy workloads | ❌ Minimal | Sharding, CQRS |
| Queries needing fresh data (< 100ms lag) | ❌ Risky | Route to primary |
| Full-text search | ❌ No | Use OpenSearch |
| < 100 RPS total | ❌ Overkill | Optimize indexes first |

Terraform: RDS with Read Replicas

# terraform/rds-replicas.tf

# Primary instance
resource "aws_db_instance" "primary" {
  identifier           = "${var.app_name}-primary"
  engine               = "postgres"
  engine_version       = "16.3"
  instance_class       = "db.t4g.large"
  allocated_storage    = 100
  storage_type         = "gp3"
  storage_encrypted    = true

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  multi_az               = true   # Synchronous standby for failover

  # Enable enhanced monitoring for replica lag metrics
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  # Automated backups: required for Point-in-Time Recovery, and a
  # prerequisite for creating read replicas (retention must be > 0)
  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:30-sun:05:30"

  # Custom parameter group
  parameter_group_name = aws_db_parameter_group.postgres16.name

  tags = local.tags
}

# Read replica 1 (same region, different AZ)
resource "aws_db_instance" "replica_1" {
  identifier             = "${var.app_name}-replica-1"
  replicate_source_db    = aws_db_instance.primary.identifier
  instance_class         = "db.t4g.medium"  # Can be smaller than primary
  publicly_accessible    = false
  vpc_security_group_ids = [aws_security_group.rds.id]

  # Replicas don't need backups (primary has them)
  backup_retention_period = 0
  skip_final_snapshot     = true

  # Auto-minor version upgrade
  auto_minor_version_upgrade = true

  tags = merge(local.tags, { Role = "replica" })
}

# Optional: replica in a different region (cross-region)
# resource "aws_db_instance" "replica_eu" {
#   provider              = aws.eu_west_1
#   replicate_source_db   = aws_db_instance.primary.arn  # ARN for cross-region
#   ...
# }

# CloudWatch alarm: replica lag > 30 seconds
resource "aws_cloudwatch_metric_alarm" "replica_lag" {
  alarm_name          = "${var.app_name}-replica-lag"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "ReplicaLag"
  namespace           = "AWS/RDS"
  period              = 60
  statistic           = "Average"
  threshold           = 30    # seconds
  alarm_actions       = [var.sns_alert_topic_arn]

  dimensions = {
    DBInstanceIdentifier = aws_db_instance.replica_1.identifier
  }
}

output "primary_endpoint" {
  value = aws_db_instance.primary.endpoint
}

output "replica_endpoint" {
  value = aws_db_instance.replica_1.endpoint
}
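The two endpoint outputs feed the app's `DATABASE_URL` and `DATABASE_REPLICA_URL`. A small helper can turn an RDS endpoint string (`host:port`) into a Postgres connection URL; this is an illustrative sketch, and `rdsUrl` and its defaults are assumptions, not part of any AWS SDK:

```typescript
// Sketch: build a postgres:// URL from an RDS endpoint ("host:port").
// rdsUrl is a hypothetical helper; defaults are assumptions.
export function rdsUrl(
  endpoint: string,
  user: string,
  password: string,
  db: string
): string {
  const [host, port = "5432"] = endpoint.split(":");
  // Encode the password so special characters survive in the URL
  return `postgresql://${user}:${encodeURIComponent(password)}@${host}:${port}/${db}`;
}
```

Wire it up once at deploy time (for example in a small script that reads `terraform output -raw primary_endpoint`) rather than hard-coding hostnames in the app.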

☁️ Is Your Cloud Costing Too Much?

Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.

  • AWS, GCP, Azure certified engineers
  • Infrastructure as Code (Terraform, CDK)
  • Docker, Kubernetes, GitHub Actions CI/CD
  • Typical audit recovers $500–$3,000/month in savings

Prisma: Read/Write Splitting

// lib/prisma.ts — separate clients for primary and replica

import { PrismaClient } from "@prisma/client";

function createPrismaClient(databaseUrl: string) {
  return new PrismaClient({
    datasources: { db: { url: databaseUrl } },
    log: process.env.NODE_ENV === "development" ? ["query", "error"] : ["error"],
  });
}

// Primary: all writes + reads requiring freshness
const primaryClient = createPrismaClient(process.env.DATABASE_URL!);

// Replica: read-only queries
const replicaClient = process.env.DATABASE_REPLICA_URL
  ? createPrismaClient(process.env.DATABASE_REPLICA_URL)
  : primaryClient;  // Fallback to primary if no replica configured

export const prisma   = primaryClient;
export const prismaRO = replicaClient;   // "RO" = read-only

// Usage pattern:
// import { prisma, prismaRO } from "@/lib/prisma";
// const user = await prismaRO.user.findUnique(...)   ← reads from replica
// await prisma.user.update(...)                       ← writes to primary
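Many app replicas (or serverless functions) can exhaust Postgres connections quickly; PgBouncer in transaction-pooling mode in front of each endpoint keeps connection counts bounded. A `pgbouncer.ini` sketch with one logical database per endpoint, so the app switches targets by database name (hostnames, pool sizes, and the auth file path are assumptions):

```ini
; pgbouncer.ini sketch; hostnames, pool sizes, and auth file path are assumptions
[databases]
; One logical database per endpoint: the app picks primary vs replica by name
appdb         = host=myapp-primary.abc123.us-east-1.rds.amazonaws.com port=5432 dbname=appdb
appdb_replica = host=myapp-replica-1.abc123.us-east-1.rds.amazonaws.com port=5432 dbname=appdb

[pgbouncer]
listen_addr       = 0.0.0.0
listen_port       = 6432
auth_type         = scram-sha-256
auth_file         = /etc/pgbouncer/userlist.txt
pool_mode         = transaction   ; multiplex server connections per transaction
default_pool_size = 20            ; server connections per user/database pair
max_client_conn   = 500
```

With transaction pooling, Prisma should connect with `?pgbouncer=true` appended to the connection URL so it avoids server-side prepared statements, which transaction pooling does not support.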

Read/Write Routing Layer

// lib/db/router.ts — route queries based on operation type

import { prisma, prismaRO } from "@/lib/prisma";
import type { Prisma } from "@prisma/client";

// Read operations that can use replica (stale data acceptable)
export async function readFromReplica<T>(
  query: (client: typeof prismaRO) => Promise<T>
): Promise<T> {
  return query(prismaRO);
}

// Write + read: always use primary
export async function readFromPrimary<T>(
  query: (client: typeof prisma) => Promise<T>
): Promise<T> {
  return query(prisma);
}

// Practical usage:
// Reports, dashboards, list pages → replica
// Post-write reads ("read your writes") → primary

// Example: dashboard stats (stale ok, use replica)
export async function getDashboardStats(workspaceId: string) {
  return readFromReplica((db) =>
    db.project.groupBy({
      by:     ["status"],
      where:  { workspaceId },
      _count: { id: true },
    })
  );
}

// Example: fetch user after updating (must use primary)
export async function updateAndFetchUser(userId: string, data: Prisma.UserUpdateInput) {
  const updated = await prisma.user.update({ where: { id: userId }, data });
  // Read from PRIMARY — replica may be 100-200ms behind
  const fresh = await readFromPrimary((db) =>
    db.user.findUniqueOrThrow({ where: { id: userId } })
  );
  return fresh;
}
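An alternative to always re-reading from the primary is session pinning: after a user writes, route that user's reads to the primary for a short window. A minimal sketch; the 5-second window and the in-memory `Map` are assumptions, and a multi-instance deployment would keep this state in Redis instead:

```typescript
// Sketch: pin a user's reads to the primary for a short window after
// their last write, guaranteeing read-your-writes despite replica lag.
// PIN_WINDOW_MS and the in-memory Map are assumptions.
const PIN_WINDOW_MS = 5_000;
const lastWriteAt = new Map<string, number>();

// Call after every successful write on behalf of this user
export function recordWrite(userId: string, now: number = Date.now()): void {
  lastWriteAt.set(userId, now);
}

// True while the user is inside the pin window (reads go to primary)
export function shouldUsePrimary(userId: string, now: number = Date.now()): boolean {
  const t = lastWriteAt.get(userId);
  return t !== undefined && now - t < PIN_WINDOW_MS;
}
```

The routing helpers above can then consult `shouldUsePrimary(userId)` before choosing a client, instead of forcing every post-write read through `readFromPrimary`.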

⚙️ DevOps Done Right — Zero Downtime, Full Automation

Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.

  • Staging + production environments with feature flags
  • Automated security scanning in the pipeline
  • Uptime monitoring + alerting + runbook automation
  • On-call support handover docs included

Monitoring Replica Lag

// lib/monitoring/replica-lag.ts
import { CloudWatchClient, GetMetricStatisticsCommand } from "@aws-sdk/client-cloudwatch";
import { prisma, prismaRO } from "@/lib/prisma";

const cw = new CloudWatchClient({ region: process.env.AWS_REGION });

export async function getReplicaLagSeconds(): Promise<number> {
  const now  = new Date();
  const past = new Date(now.getTime() - 5 * 60 * 1000); // Last 5 minutes

  const result = await cw.send(
    new GetMetricStatisticsCommand({
      Namespace:  "AWS/RDS",
      MetricName: "ReplicaLag",
      Dimensions: [
        { Name: "DBInstanceIdentifier", Value: process.env.RDS_REPLICA_ID! },
      ],
      StartTime:  past,
      EndTime:    now,
      Period:     60,
      Statistics: ["Average"],
    })
  );

  const points = result.Datapoints ?? [];
  if (points.length === 0) return 0; // No datapoints: assume healthy (the CloudWatch alarm covers outages)

  // Return the worst (highest) average lag seen in the window
  return Math.max(...points.map((p) => p.Average ?? 0));
}

// Health check: skip replica if lagging > 5 seconds
export async function getReadClient() {
  const lagSeconds = await getReplicaLagSeconds();
  if (lagSeconds > 5) {
    console.warn(`[replica] Lag ${lagSeconds}s > 5s — routing to primary`);
    return prisma;   // Fall back to primary
  }
  return prismaRO;
}
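Calling CloudWatch on every request adds latency and API cost. A small TTL cache around the lag lookup limits it to roughly one call per interval; a sketch, where the 15-second TTL and the `withTtlCache` name are assumptions:

```typescript
// Sketch: memoize an async lag lookup with a TTL so getReadClient()
// triggers at most one CloudWatch call per interval. TTL is an assumption.
export function withTtlCache(
  fetchLag: () => Promise<number>,
  ttlMs: number = 15_000
): (now?: number) => Promise<number> {
  let cached = 0;
  let fetchedAt = Number.NEGATIVE_INFINITY;
  return async (now: number = Date.now()): Promise<number> => {
    if (now - fetchedAt >= ttlMs) {
      cached = await fetchLag();
      fetchedAt = now;
    }
    return cached;
  };
}

// Usage: const cachedLag = withTtlCache(getReplicaLagSeconds);
```

`getReadClient()` would then call the cached wrapper instead of hitting CloudWatch directly.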

Aurora: Automatic Failover and Cluster Endpoint

# For Aurora PostgreSQL: use cluster endpoint — auto-routes to current writer
resource "aws_rds_cluster" "aurora" {
  cluster_identifier   = "${var.app_name}-aurora"
  engine               = "aurora-postgresql"
  engine_version       = "16.3"
  database_name        = var.db_name
  master_username      = var.db_username
  master_password      = var.db_password
  storage_encrypted    = true

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7
  preferred_backup_window = "03:00-04:00"

  skip_final_snapshot = false
  final_snapshot_identifier = "${var.app_name}-final"
}

# Writer instance
resource "aws_rds_cluster_instance" "writer" {
  identifier         = "${var.app_name}-writer"
  cluster_identifier = aws_rds_cluster.aurora.id
  instance_class     = "db.t4g.large"
  engine             = aws_rds_cluster.aurora.engine
}

# Reader instance (Aurora auto-promotes to writer on failover)
resource "aws_rds_cluster_instance" "reader" {
  identifier         = "${var.app_name}-reader"
  cluster_identifier = aws_rds_cluster.aurora.id
  instance_class     = "db.t4g.medium"
  engine             = aws_rds_cluster.aurora.engine
}

output "aurora_writer_endpoint" {
  value = aws_rds_cluster.aurora.endpoint        # Always points to current writer
}

output "aurora_reader_endpoint" {
  value = aws_rds_cluster.aurora.reader_endpoint  # Round-robins across readers
}

# Aurora endpoints in Prisma:
#   DATABASE_URL         = aurora_writer_endpoint (writes)
#   DATABASE_REPLICA_URL = aurora_reader_endpoint (reads)
# Aurora handles failover automatically; the app needs no DNS update.

Cost Comparison

| Configuration | Monthly cost (approx.) |
| --- | --- |
| Single db.t4g.large (no replica) | ~$130 |
| Primary db.t4g.large + 1 replica db.t4g.medium | ~$190 |
| Aurora db.t4g.large writer + 1 reader | ~$220 |
| RDS Multi-AZ db.t4g.large (standby, no read traffic) | ~$260 |


Working With Viprasol

Read replicas require correct routing discipline — a single misrouted write to a read-only endpoint causes an error; a single "read your own write" routed to a lagging replica causes a confusing user experience. Our team sets up RDS replicas with Terraform, implements prismaRO for replica reads and prisma for primary writes, adds replica lag CloudWatch alarms, and builds a getReadClient() health-check function that falls back to primary when lag exceeds 5 seconds.

What we deliver:

  • RDS Terraform: primary Multi-AZ + replica (smaller instance class), enhanced monitoring, lag CloudWatch alarm
  • prisma (primary) and prismaRO (replica) Prisma clients with fallback to primary when no replica URL
  • readFromReplica() and readFromPrimary() routing helpers
  • getReplicaLagSeconds() CloudWatch metric query
  • getReadClient() health check: lag > 5s → fall back to primary
  • Aurora cluster Terraform: writer + reader endpoints, auto-failover

Talk to our team about your RDS scaling architecture →

Or explore our cloud infrastructure services.

About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
