AWS RDS Read Replicas: Routing, Connection Pooling, Lag Monitoring, and Failover Patterns
Scale read-heavy PostgreSQL workloads with AWS RDS read replicas. Covers Prisma read/write splitting, PgBouncer per-replica connection pooling, replica lag monitoring, automatic failover with Aurora, and Terraform setup for multi-replica configurations.
Read replicas offload SELECT queries from your primary RDS instance, letting it focus on writes. The performance improvement can be significant: if 80% of your queries are reads, routing those reads to a replica sheds the bulk of the primary's query load. The implementation challenge is routing: making sure writes always go to the primary, reads go to replicas, and your application handles replica lag gracefully.
When Read Replicas Help (and When They Don't)
| Scenario | Read Replicas Help | Alternative |
|---|---|---|
| Read-heavy analytics (dashboards, reports) | ✅ Yes | Also consider materialized views |
| Read-heavy app queries (80%+ SELECTs) | ✅ Yes | |
| Write-heavy workloads | ❌ Minimal | Sharding, CQRS |
| Queries needing fresh data (< 100ms lag) | ❌ Risky | Route to primary |
| Full-text search | ❌ No | Use OpenSearch |
| < 100 RPS total | ❌ Overkill | Optimize indexes first |
Terraform: RDS with Read Replicas
# terraform/rds-replicas.tf
# Primary instance
resource "aws_db_instance" "primary" {
identifier = "${var.app_name}-primary"
engine = "postgres"
engine_version = "16.3"
instance_class = "db.t4g.large"
allocated_storage = 100
storage_type = "gp3"
storage_encrypted = true
db_name = var.db_name
username = var.db_username
password = var.db_password
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
multi_az = true # Synchronous standby for failover
# Enable enhanced monitoring for replica lag metrics
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
# Backup for Point-in-Time Recovery
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:30-sun:05:30"
# Required for read replicas
parameter_group_name = aws_db_parameter_group.postgres16.name
tags = local.tags
}
# Read replica 1 (same region, different AZ)
resource "aws_db_instance" "replica_1" {
identifier = "${var.app_name}-replica-1"
replicate_source_db = aws_db_instance.primary.identifier
instance_class = "db.t4g.medium" # Can be smaller than primary
publicly_accessible = false
vpc_security_group_ids = [aws_security_group.rds.id]
# Replicas don't need backups (primary has them)
backup_retention_period = 0
skip_final_snapshot = true
# Auto-minor version upgrade
auto_minor_version_upgrade = true
tags = merge(local.tags, { Role = "replica" })
}
# Optional: replica in a different region (cross-region)
# resource "aws_db_instance" "replica_eu" {
# provider = aws.eu_west_1
# replicate_source_db = aws_db_instance.primary.arn # ARN for cross-region
# ...
# }
# CloudWatch alarm: replica lag > 30 seconds
resource "aws_cloudwatch_metric_alarm" "replica_lag" {
alarm_name = "${var.app_name}-replica-lag"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "ReplicaLag"
namespace = "AWS/RDS"
period = 60
statistic = "Average"
threshold = 30 # seconds
alarm_actions = [var.sns_alert_topic_arn]
dimensions = {
DBInstanceIdentifier = aws_db_instance.replica_1.identifier
}
}
output "primary_endpoint" {
value = aws_db_instance.primary.endpoint
}
output "replica_endpoint" {
value = aws_db_instance.replica_1.endpoint
}
Prisma: Read/Write Splitting
// lib/prisma.ts — separate clients for primary and replica
import { PrismaClient } from "@prisma/client";
function createPrismaClient(databaseUrl: string) {
return new PrismaClient({
datasources: { db: { url: databaseUrl } },
log: process.env.NODE_ENV === "development" ? ["query", "error"] : ["error"],
});
}
// Primary: all writes + reads requiring freshness
const primaryClient = createPrismaClient(process.env.DATABASE_URL!);
// Replica: read-only queries
const replicaClient = process.env.DATABASE_REPLICA_URL
? createPrismaClient(process.env.DATABASE_REPLICA_URL)
: primaryClient; // Fallback to primary if no replica configured
export const prisma = primaryClient;
export const prismaRO = replicaClient; // "RO" = read-only
// Usage pattern:
// import { prisma, prismaRO } from "@/lib/prisma";
// const user = await prismaRO.user.findUnique(...) ← reads from replica
// await prisma.user.update(...) ← writes to primary
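With separate primary and replica endpoints, it helps to give each endpoint its own PgBouncer pool so a read surge against the replica can't starve write connections to the primary. A minimal sketch of such a config (hostnames, pool sizes, and auth settings are placeholders, not values from this setup):

```ini
; pgbouncer.ini — one logical database per RDS endpoint (hosts are placeholders)
[databases]
app_primary = host=myapp-primary.abc123.us-east-1.rds.amazonaws.com port=5432 dbname=app
app_replica = host=myapp-replica-1.abc123.us-east-1.rds.amazonaws.com port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 500
```

When Prisma connects through PgBouncer in transaction mode, append `?pgbouncer=true` to the connection string so Prisma disables prepared statements, which transaction pooling does not support.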
Read/Write Routing Layer
// lib/db/router.ts — route queries based on operation type
import { prisma, prismaRO } from "@/lib/prisma";
import type { Prisma } from "@prisma/client";
// Read operations that can use replica (stale data acceptable)
export async function readFromReplica<T>(
query: (client: typeof prismaRO) => Promise<T>
): Promise<T> {
return query(prismaRO);
}
// Write + read: always use primary
export async function readFromPrimary<T>(
query: (client: typeof prisma) => Promise<T>
): Promise<T> {
return query(prisma);
}
// Practical usage:
// Reports, dashboards, list pages → replica
// Post-write reads ("read your writes") → primary
// Example: dashboard stats (stale ok, use replica)
export async function getDashboardStats(workspaceId: string) {
return readFromReplica((db) =>
db.project.groupBy({
by: ["status"],
where: { workspaceId },
_count: { id: true },
})
);
}
// Example: fetch user after updating (must use primary)
export async function updateAndFetchUser(userId: string, data: Prisma.UserUpdateInput) {
const updated = await prisma.user.update({ where: { id: userId }, data });
// Read from PRIMARY — replica may be 100-200ms behind
const fresh = await readFromPrimary((db) =>
db.user.findUniqueOrThrow({ where: { id: userId } })
);
return fresh;
}
Monitoring Replica Lag
// lib/monitoring/replica-lag.ts
import { CloudWatchClient, GetMetricStatisticsCommand } from "@aws-sdk/client-cloudwatch";
import { prisma, prismaRO } from "@/lib/prisma"; // used by getReadClient below
const cw = new CloudWatchClient({ region: process.env.AWS_REGION });
export async function getReplicaLagSeconds(): Promise<number> {
const now = new Date();
const past = new Date(now.getTime() - 5 * 60 * 1000); // Last 5 minutes
const result = await cw.send(
new GetMetricStatisticsCommand({
Namespace: "AWS/RDS",
MetricName: "ReplicaLag",
Dimensions: [
{ Name: "DBInstanceIdentifier", Value: process.env.RDS_REPLICA_ID! },
],
StartTime: past,
EndTime: now,
Period: 60,
Statistics: ["Average"],
})
);
const points = result.Datapoints ?? [];
// No datapoints = treat as healthy; alarm separately on missing metrics if needed
if (points.length === 0) return 0;
return Math.max(...points.map((p) => p.Average ?? 0));
}
// Health check: skip replica if lagging > 5 seconds
export async function getReadClient() {
const lagSeconds = await getReplicaLagSeconds();
if (lagSeconds > 5) {
console.warn(`[replica] Lag ${lagSeconds}s > 5s — routing to primary`);
return prisma; // Fall back to primary
}
return prismaRO;
}
Aurora: Automatic Failover and Cluster Endpoint
# For Aurora PostgreSQL: use cluster endpoint — auto-routes to current writer
resource "aws_rds_cluster" "aurora" {
cluster_identifier = "${var.app_name}-aurora"
engine = "aurora-postgresql"
engine_version = "16.3"
database_name = var.db_name
master_username = var.db_username
master_password = var.db_password
storage_encrypted = true
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
backup_retention_period = 7
preferred_backup_window = "03:00-04:00"
skip_final_snapshot = false
final_snapshot_identifier = "${var.app_name}-final"
}
# Writer instance
resource "aws_rds_cluster_instance" "writer" {
identifier = "${var.app_name}-writer"
cluster_identifier = aws_rds_cluster.aurora.id
instance_class = "db.t4g.large"
engine = aws_rds_cluster.aurora.engine
}
# Reader instance (Aurora auto-promotes to writer on failover)
resource "aws_rds_cluster_instance" "reader" {
identifier = "${var.app_name}-reader"
cluster_identifier = aws_rds_cluster.aurora.id
instance_class = "db.t4g.medium"
engine = aws_rds_cluster.aurora.engine
}
output "aurora_writer_endpoint" {
value = aws_rds_cluster.aurora.endpoint # Always points to current writer
}
output "aurora_reader_endpoint" {
value = aws_rds_cluster.aurora.reader_endpoint # Round-robins across readers
}
// Aurora endpoints in Prisma
// DATABASE_URL = aurora_writer_endpoint (writes)
// DATABASE_REPLICA_URL = aurora_reader_endpoint (reads)
// Aurora handles failover automatically — no DNS update needed
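During an Aurora failover the writer endpoint keeps its DNS name, but in-flight connections are dropped, so writes issued around the failover window can fail transiently. A generic retry-with-backoff wrapper handles this; the sketch below makes no attempt to classify errors (a real version should retry only connection-level failures, never constraint violations):

```typescript
// Hypothetical retry helper for transient connection errors around failover.
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      // Exponential backoff: 200ms, 400ms, 800ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}
```

Wrapping critical writes, e.g. `withRetry(() => prisma.user.update(...))`, rides out the brief window while Aurora promotes the reader.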
Cost Comparison
| Configuration | Monthly Cost (approx) |
|---|---|
| Single db.t4g.large (no replica) | ~$130 |
| Primary db.t4g.large + 1 replica db.t4g.medium | ~$190 |
| Aurora db.t4g.large writer + 1 reader | ~$220 |
| RDS Multi-AZ db.t4g.large (standby, no read traffic) | ~$260 |
See Also
- AWS RDS Aurora Serverless
- AWS RDS Aurora
- PostgreSQL Connection Pooling with PgBouncer
- PostgreSQL Performance Tuning
- AWS Cost Optimization
Working With Viprasol
Read replicas require correct routing discipline — a single misrouted write to a read-only endpoint causes an error; a single "read your own write" routed to a lagging replica causes a confusing user experience. Our team sets up RDS replicas with Terraform, implements prismaRO for replica reads and prisma for primary writes, adds replica lag CloudWatch alarms, and builds a getReadClient() health-check function that falls back to primary when lag exceeds 5 seconds.
What we deliver:
- RDS Terraform: primary Multi-AZ + replica (smaller instance class), enhanced monitoring, lag CloudWatch alarm
- `prisma` (primary) and `prismaRO` (replica) Prisma clients, with fallback to primary when no replica URL is set
- `readFromReplica()` and `readFromPrimary()` routing helpers
- `getReplicaLagSeconds()` CloudWatch metric query
- `getReadClient()` health check: lag > 5s → fall back to primary
- Aurora cluster Terraform: writer + reader endpoints, auto-failover
Talk to our team about your RDS scaling architecture →
Or explore our cloud infrastructure services.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.