SaaS Multi-Region Deployment: PostgreSQL Replication, Latency Routing, and Disaster Recovery
Architect multi-region SaaS deployments: PostgreSQL read replicas with replication lag handling, Route 53 latency-based routing, active-passive and active-active patterns, RTO/RPO targets, and Terraform IaC.
Most SaaS products don't need multi-region until they have paying customers on multiple continents or an SLA that requires 99.99% uptime. Getting there prematurely adds enormous operational complexity. But when you do need it — whether for latency, compliance (GDPR data residency), or disaster recovery — you need a clear architecture before you're under pressure.
This post covers the two main patterns (active-passive and active-active), PostgreSQL replication with Aurora Global Database, latency-based routing with Route 53, handling replication lag in your application, and Terraform for the whole thing.
Pattern Comparison
| Pattern | RTO | RPO | Complexity | Cost | Use case |
|---|---|---|---|---|---|
| Single region | Hours (restore from backup) | Up to last backup | Low | Low | < 50K users, no SLA |
| Active-passive (warm standby) | 1–5 min | < 1 min | Medium | 1.5–2x | 99.9% SLA, DR |
| Active-passive (hot standby) | < 1 min | Seconds | Medium-High | 2x | 99.95% SLA |
| Active-active | Near-zero | Near-zero | Very High | 3–4x | 99.99% SLA, global users |
Start with active-passive. Active-active requires solving distributed writes — conflict resolution, two-phase commit or event sourcing — complexity most products don't need.
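As a sanity check, those SLA tiers translate directly into downtime budgets — a quick sketch (plain arithmetic, function name is mine):

```typescript
// Downtime budget implied by an SLA target — useful when picking a pattern.
function downtimeBudgetMinutesPerYear(slaPercent: number): number {
  const minutesPerYear = 365.25 * 24 * 60; // ≈ 525,960
  return minutesPerYear * (1 - slaPercent / 100);
}

for (const sla of [99.9, 99.95, 99.99]) {
  console.log(`${sla}% → ${downtimeBudgetMinutesPerYear(sla).toFixed(0)} min/year`);
}
// 99.9% allows ~526 min/year (~8.8 h); 99.99% allows only ~53 min/year
```

A 99.9% SLA leaves room to fail over manually from a warm standby; at 99.99%, one bad regional incident can consume the whole year's budget, which is what pushes teams toward active-active.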
1. Aurora Global Database (Recommended for AWS)
Aurora Global Database replicates to secondary regions at the storage layer, with typical replication lag under a second, and supports managed failover that usually completes in under a minute.
# infrastructure/aurora-global/main.tf
# Primary cluster (us-east-1)
resource "aws_rds_global_cluster" "main" {
global_cluster_identifier = "${var.project}-global"
engine = "aurora-postgresql"
engine_version = "16.3"
database_name = var.db_name
storage_encrypted = true
}
resource "aws_rds_cluster" "primary" {
provider = aws.us_east_1
cluster_identifier = "${var.project}-primary"
engine = "aurora-postgresql"
engine_version = "16.3"
global_cluster_identifier = aws_rds_global_cluster.main.id
db_subnet_group_name = aws_db_subnet_group.primary.name
vpc_security_group_ids = [aws_security_group.rds.id]
master_username = var.db_username
master_password = random_password.db.result
backup_retention_period = 7
preferred_backup_window = "03:00-04:00"
skip_final_snapshot = false
deletion_protection = true
tags = { Region = "primary", Environment = var.environment }
}
resource "aws_rds_cluster_instance" "primary" {
provider = aws.us_east_1
count = 2 # Writer + 1 reader in primary region
identifier = "${var.project}-primary-${count.index}"
cluster_identifier = aws_rds_cluster.primary.id
instance_class = "db.r8g.xlarge"
engine = "aurora-postgresql"
performance_insights_enabled = true
}
# Secondary cluster (eu-west-1) — read replica region
resource "aws_rds_cluster" "secondary" {
provider = aws.eu_west_1
cluster_identifier = "${var.project}-secondary"
engine = "aurora-postgresql"
engine_version = "16.3"
global_cluster_identifier = aws_rds_global_cluster.main.id
db_subnet_group_name = aws_db_subnet_group.secondary.name
vpc_security_group_ids = [aws_security_group.rds_eu.id]
# Secondary clusters don't take master credentials (storage is replicated)
# For an encrypted global cluster, also set storage_encrypted = true and a
# region-local kms_key_id here (KMS keys don't span regions)
skip_final_snapshot = false
tags = { Region = "secondary", Environment = var.environment }
lifecycle {
ignore_changes = [replication_source_identifier]
}
}
resource "aws_rds_cluster_instance" "secondary" {
provider = aws.eu_west_1
count = 1
identifier = "${var.project}-secondary-${count.index}"
cluster_identifier = aws_rds_cluster.secondary.id
instance_class = "db.r8g.large"
engine = "aurora-postgresql"
}
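The cluster resources above reference `aws.us_east_1` and `aws.eu_west_1` provider aliases, which must be declared somewhere in the module — a minimal sketch:

```hcl
# infrastructure/aurora-global/providers.tf
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu_west_1"
  region = "eu-west-1"
}
```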
2. Route 53 Latency-Based Routing
# infrastructure/dns/main.tf
# Health checks for each region's ALB
resource "aws_route53_health_check" "us_east_1" {
fqdn = aws_lb.us_east_1.dns_name
port = 443
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 10
tags = { Name = "${var.project}-hc-us-east-1" }
}
resource "aws_route53_health_check" "eu_west_1" {
fqdn = aws_lb.eu_west_1.dns_name
port = 443
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 10
tags = { Name = "${var.project}-hc-eu-west-1" }
}
# Latency-based records: Route 53 picks the nearest healthy region
resource "aws_route53_record" "api_us" {
zone_id = data.aws_route53_zone.main.zone_id
name = "api.${var.domain}"
type = "A"
set_identifier = "us-east-1"
latency_routing_policy {
region = "us-east-1"
}
health_check_id = aws_route53_health_check.us_east_1.id
alias {
name = aws_lb.us_east_1.dns_name
zone_id = aws_lb.us_east_1.zone_id
evaluate_target_health = true
}
}
resource "aws_route53_record" "api_eu" {
zone_id = data.aws_route53_zone.main.zone_id
name = "api.${var.domain}"
type = "A"
set_identifier = "eu-west-1"
latency_routing_policy {
region = "eu-west-1"
}
health_check_id = aws_route53_health_check.eu_west_1.id
alias {
name = aws_lb.eu_west_1.dns_name
zone_id = aws_lb.eu_west_1.zone_id
evaluate_target_health = true
}
}
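Latency routing fits active-active, where every region serves traffic. For strict active-passive you may prefer Route 53 failover routing instead, which sends all traffic to the primary while its health check passes. A sketch reusing the health checks above (record names are illustrative):

```hcl
resource "aws_route53_record" "api_primary" {
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = "api.${var.domain}"
  type            = "A"
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.us_east_1.id

  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = aws_lb.us_east_1.dns_name
    zone_id                = aws_lb.us_east_1.zone_id
    evaluate_target_health = true
  }
}

# The standby record mirrors this one with set_identifier = "secondary",
# failover_routing_policy type = "SECONDARY", and the eu-west-1 ALB alias.
```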
3. Application Layer: Read/Write Splitting
// src/lib/db/multi-region.ts
import { PrismaClient } from '@prisma/client';
// Separate clients for write (primary) and read (local replica).
// Exported so the health check and read-after-write modules can reuse them.
export const writeDb = new PrismaClient({
  datasourceUrl: process.env.DATABASE_URL_PRIMARY, // us-east-1 writer
  log: ['error'],
});
export const readDb = new PrismaClient({
  datasourceUrl: process.env.DATABASE_URL_REPLICA, // local regional replica
  log: ['error'],
});
// Typed wrapper that enforces read/write routing
export const db = {
// Reads go to local replica (low latency)
query: readDb,
// Writes always go to primary
mutation: writeDb,
};
// Usage example:
// await db.query.post.findMany({ where: { status: 'published' } });
// await db.mutation.post.create({ data: { ... } });
Handling Replication Lag
// src/lib/db/read-after-write.ts
// Problem: user writes, then immediately reads — might see stale data from replica
// Solution: route reads to primary for a short window after a write
import { AsyncLocalStorage } from 'async_hooks';
import { writeDb, readDb } from './multi-region';
const readAfterWriteStorage = new AsyncLocalStorage<{ expiresAt: number }>();
// Middleware: wrap each request in a context. AsyncLocalStorage.run() only
// sets the store for the duration of its callback, so the whole handler must
// execute inside it — markReadAfterWrite then mutates that shared store.
export function runWithRequestContext<T>(handler: () => Promise<T>): Promise<T> {
  return readAfterWriteStorage.run({ expiresAt: 0 }, handler);
}
export function markReadAfterWrite(windowMs = 5000) {
  // Signal: reads in this request should hit the primary for the next 5s
  const store = readAfterWriteStorage.getStore();
  if (store) store.expiresAt = Date.now() + windowMs;
}
export function getDb() {
  const store = readAfterWriteStorage.getStore();
  if (store && Date.now() < store.expiresAt) {
    // Within read-after-write window — use primary for reads too
    return writeDb;
  }
  return readDb;
}
// In your API routes:
export async function updateUserProfile(userId: string, data: ProfileInput) {
  // Write to primary
  const updated = await writeDb.user.update({ where: { id: userId }, data });
  // Signal: subsequent reads in this request should hit primary
  markReadAfterWrite();
  return updated;
}
// AsyncLocalStorage is per-process. If the follow-up read can land on a
// different instance (or region), use a Redis key with TTL instead:
import { redis } from '../redis';
async function isInReadAfterWriteWindow(userId: string): Promise<boolean> {
  const key = `raw:user:${userId}`;
  return (await redis.exists(key)) === 1;
}
async function setReadAfterWriteWindow(userId: string, ttlMs = 5000): Promise<void> {
  await redis.set(`raw:user:${userId}`, '1', 'PX', ttlMs); // ioredis-style TTL in ms
}
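The routing decision behind the Redis variant can be exercised without a server. Here is a sketch with an in-memory TTL map standing in for Redis (illustrative only — in production the window must live in shared storage, since instances in different regions don't share process memory):

```typescript
// In-memory stand-in for the Redis key-with-TTL approach.
const windows = new Map<string, number>(); // key → expiry timestamp (ms)

function setReadAfterWriteWindow(userId: string, ttlMs = 5000): void {
  windows.set(`raw:user:${userId}`, Date.now() + ttlMs);
}

function isInReadAfterWriteWindow(userId: string): boolean {
  const expiresAt = windows.get(`raw:user:${userId}`);
  return expiresAt !== undefined && Date.now() < expiresAt;
}

// Route reads: primary while inside the window, local replica otherwise
function pickDb(userId: string): 'primary' | 'replica' {
  return isInReadAfterWriteWindow(userId) ? 'primary' : 'replica';
}

setReadAfterWriteWindow('u1');
console.log(pickDb('u1')); // 'primary' — just wrote
console.log(pickDb('u2')); // 'replica' — no recent write
```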
4. Health Check Endpoint
// src/app/api/health/route.ts
import { NextResponse } from 'next/server';
import { writeDb, readDb } from '../../../lib/db/multi-region';
export const dynamic = 'force-dynamic';
export const runtime = 'nodejs';
export async function GET() {
const checks = await Promise.allSettled([
writeDb.$queryRaw`SELECT 1`,
readDb.$queryRaw`SELECT 1`,
checkRedis(),
]);
const [primary, replica, cache] = checks;
const healthy =
primary.status === 'fulfilled' &&
cache.status === 'fulfilled';
// Note: a replica failure alone is reported as degraded but doesn't fail
// the check — reads can fall back to primary, so we avoid a needless failover
const details = {
region: process.env.AWS_REGION ?? 'unknown',
primary: primary.status === 'fulfilled' ? 'ok' : 'degraded',
replica: replica.status === 'fulfilled' ? 'ok' : 'degraded',
cache: cache.status === 'fulfilled' ? 'ok' : 'degraded',
timestamp: new Date().toISOString(),
};
return NextResponse.json(details, {
status: healthy ? 200 : 503,
});
}
async function checkRedis(): Promise<void> {
const { redis } = await import('../../../lib/redis');
await redis.ping();
}
5. Failover Runbook (Automated)
# infrastructure/aurora-global/failover.tf
# CloudWatch alarm + Lambda to trigger Aurora failover automatically
resource "aws_cloudwatch_metric_alarm" "primary_db_down" {
alarm_name = "${var.project}-primary-db-unavailable"
comparison_operator = "LessThanOrEqualToThreshold" # fires when connections drop to 0
evaluation_periods = 3
metric_name = "DatabaseConnections"
namespace = "AWS/RDS"
period = 60
statistic = "Sum"
threshold = 0
treat_missing_data = "breaching" # Missing data = alarm
dimensions = {
DBClusterIdentifier = aws_rds_cluster.primary.cluster_identifier
}
alarm_actions = [aws_sns_topic.ops_alerts.arn]
# For auto-failover: trigger Lambda via SNS that calls
# aws rds failover-global-cluster --global-cluster-identifier ...
}
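For the runbook, the manual equivalent of that Lambda is a single CLI call — a sketch with placeholder identifiers (substitute your global cluster name and the ARN of the secondary cluster to promote):

```shell
# Managed failover of the Aurora global cluster to eu-west-1.
# Identifiers below are placeholders.
aws rds failover-global-cluster \
  --global-cluster-identifier myproject-global \
  --target-db-cluster-identifier arn:aws:rds:eu-west-1:123456789012:cluster:myproject-secondary
```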
Cost Reference
| Architecture | Monthly cost (medium SaaS) | Notes |
|---|---|---|
| Single region (us-east-1) | $800–2,000 | Baseline |
| Active-passive (2 regions) | $1,400–3,500 | ~1.7x single region |
| Active-active (2 regions) | $2,000–5,000 | ~2.5x single region |
| Aurora Global DB (3 regions) | $3,000–8,000 | Includes storage replication |
| Route 53 latency routing | +$2–15/mo | Negligible |
See Also
- AWS ECS Fargate in Production: Task Definitions and Blue/Green Deploys
- AWS CloudFront Edge: Caching Strategy and Lambda@Edge
- Kubernetes Cost Optimization: Right-Sizing, Spot Nodes, and Autoscaling
- Terraform Modules: Reusable Infrastructure and Remote State
- SaaS Audit Logging: Immutable Trails and SOC2 Compliance
Working With Viprasol
Approaching the scale where a single-region outage would cost you customers, or facing GDPR data residency requirements? We design and implement multi-region AWS architectures with Aurora Global Database, Route 53 latency routing, read-after-write consistency handling, and automated failover — with full Terraform IaC and documented runbooks.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.