Zero-Downtime Database Migrations: Expand-Contract, Shadow Tables, and Safe Deploys
Run database migrations without downtime using expand-contract patterns, shadow tables, and safe deploy sequences. Covers Flyway, Liquibase, Prisma, and PostgreSQL-specific techniques.
The most dangerous moment in a software deploy is when the new application code and old database schema are simultaneously in use — or vice versa. A naive migration strategy locks tables, blocks reads, and turns a 5-minute deploy into a 45-minute outage window.
Zero-downtime migrations aren't magic. They're a set of sequenced steps that ensure the database and application code are always compatible, regardless of what's currently deployed.
Why Migrations Break Production
Three failure modes cause most migration-related outages:
1. Table locks on ALTER TABLE
Adding a column with a DEFAULT acquires an ACCESS EXCLUSIVE lock — the strongest lock, blocking all reads and writes. Before PostgreSQL 11 (or with a volatile default such as clock_timestamp()), this also forces a full table rewrite while the lock is held. On a 100GB table, that's 10–30 minutes of downtime.
2. Deploying code before the schema is ready
New code references users.phone_number before the column exists. Every request fails with a SQL error until migration completes.
3. Deploying schema changes that break old code
You drop the users.username column, but old app instances still running (blue/green deploy mid-cutover) immediately start throwing errors.
The Expand-Contract Pattern
The solution is to decouple schema changes from application changes across three phases:
Phase 1: EXPAND → Add new structure, keep old structure
Phase 2: MIGRATE → Backfill data, deploy new application code
Phase 3: CONTRACT → Remove old structure
Example: Renaming a Column
Scenario: rename users.fullname → users.full_name
-- PHASE 1: EXPAND (deploy with old app running)
-- Add new column, keep old one
ALTER TABLE users ADD COLUMN full_name TEXT;
-- Copy existing data (batch this on large tables — see the backfill pattern below)
UPDATE users SET full_name = fullname WHERE full_name IS NULL;
-- Trigger to keep both columns in sync during transition
CREATE OR REPLACE FUNCTION sync_fullname()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- OLD does not exist on INSERT; just fill whichever side is missing
    IF NEW.full_name IS NULL THEN
      NEW.full_name := NEW.fullname;
    ELSIF NEW.fullname IS NULL THEN
      NEW.fullname := NEW.full_name;
    END IF;
  ELSIF NEW.fullname IS DISTINCT FROM OLD.fullname THEN
    NEW.full_name := NEW.fullname;
  ELSIF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
    NEW.fullname := NEW.full_name;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER sync_user_name
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_fullname();
// PHASE 2: MIGRATE — deploy new app code that reads/writes full_name
// But still writes to both columns via the trigger
// Old app instances still work because fullname still exists
// New application code:
export async function getUser(id: string): Promise<User> {
const { rows } = await db.query(
"SELECT id, email, full_name, created_at FROM users WHERE id = $1",
[id]
);
return rows[0];
}
-- PHASE 3: CONTRACT (after all old app instances are drained)
-- Drop the trigger and old column
DROP TRIGGER sync_user_name ON users;
DROP FUNCTION sync_fullname();
ALTER TABLE users DROP COLUMN fullname;
Adding NOT NULL Columns Safely
The naive approach locks the table:
-- ❌ DANGEROUS on PG 10 and earlier (or with a volatile default): full table
-- rewrite under ACCESS EXCLUSIVE lock. PG 11+ fills a non-volatile default
-- without a rewrite, but the staged sequence below is safe everywhere.
ALTER TABLE orders ADD COLUMN processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
The safe approach:
-- STEP 1: Add nullable column (brief ACCESS EXCLUSIVE lock, but no table rewrite)
ALTER TABLE orders ADD COLUMN processed_at TIMESTAMPTZ;
-- STEP 2: Backfill in batches. Note: a DO block runs as one transaction, so row
-- locks accumulate until it commits — on very large tables, drive each batch as
-- its own transaction from an external script instead
DO $$
DECLARE
batch_size INT := 5000;
last_id BIGINT := 0;
max_id BIGINT;
BEGIN
SELECT MAX(id) INTO max_id FROM orders;
WHILE last_id <= max_id LOOP
UPDATE orders
SET processed_at = created_at -- or appropriate default
WHERE id > last_id
AND id <= last_id + batch_size
AND processed_at IS NULL;
last_id := last_id + batch_size;
PERFORM pg_sleep(0.1); -- Throttle to avoid replication lag
END LOOP;
END $$;
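Because the DO block above commits everything in a single transaction, its row locks are held until the very end. For a genuinely online backfill, drive the batches from outside the database and commit each one separately. A minimal sketch in TypeScript — `runBatch` is a hypothetical callback standing in for the real UPDATE through your driver (e.g. the pg library):

```typescript
// Sketch: drive the batched backfill from an external script so each batch
// commits in its own transaction, releasing row locks between batches.
// `runBatch` is a placeholder for the real UPDATE, e.g.:
//   UPDATE orders SET processed_at = created_at
//   WHERE id > $1 AND id <= $2 AND processed_at IS NULL
type RunBatch = (lowerId: number, upperId: number) => Promise<number>;

export async function backfillInBatches(
  maxId: number,
  batchSize: number,
  runBatch: RunBatch,
  sleepMs = 100, // throttle between batches to limit replication lag
): Promise<number> {
  let totalUpdated = 0;
  for (let lastId = 0; lastId <= maxId; lastId += batchSize) {
    // One transaction per call; locks released as soon as each batch commits
    totalUpdated += await runBatch(lastId, lastId + batchSize);
    await new Promise((resolve) => setTimeout(resolve, sleepMs));
  }
  return totalUpdated;
}
```

The same loop also makes progress observable (log `lastId` each iteration) and restartable: the `processed_at IS NULL` predicate makes re-running a partially completed backfill harmless.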
-- STEP 3: Add NOT NULL constraint using NOT VALID (validates without full scan)
ALTER TABLE orders
ADD CONSTRAINT orders_processed_at_not_null
CHECK (processed_at IS NOT NULL) NOT VALID;
-- STEP 4: Validate the constraint (takes only a SHARE UPDATE EXCLUSIVE lock — reads and writes continue)
ALTER TABLE orders VALIDATE CONSTRAINT orders_processed_at_not_null;
-- STEP 5 (optional): Convert to a real NOT NULL after validation
-- On PG 12+ this skips the full-table scan because the validated CHECK constraint already proves it
ALTER TABLE orders ALTER COLUMN processed_at SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_processed_at_not_null;
Adding Indexes Without Locking
-- ❌ Blocks all writes (SHARE lock) for the entire index build; reads still work
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- ✅ Builds concurrently — only takes ShareUpdateExclusiveLock (reads/writes continue)
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
-- Note: CONCURRENTLY is slower and can't run inside a transaction
-- If it fails midway, it leaves an INVALID index behind; clean up with:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_user_id;
-- Then retry
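Even a "fast" ALTER can cause an outage if it queues behind a long-running transaction: while the ALTER waits for its ACCESS EXCLUSIVE lock, every query arriving after it queues behind it in turn. The standard defence is a short lock_timeout plus retries. A sketch of that pattern — `execDdl` is a hypothetical stand-in for your driver's query call (e.g. pg's client.query):

```typescript
// Sketch: run DDL with a short lock_timeout so it gives up quickly instead of
// blocking traffic, then retry with backoff. `execDdl` stands in for a real
// driver call; lock_timeout is a standard PostgreSQL session setting.
export async function runDdlWithRetry(
  execDdl: (sql: string) => Promise<void>,
  ddl: string,
  { attempts = 5, lockTimeout = "2s", backoffMs = 1000 } = {},
): Promise<void> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await execDdl(`SET lock_timeout = '${lockTimeout}'`);
      await execDdl(ddl); // fails fast with a lock-timeout error if blocked
      return;
    } catch (err) {
      if (attempt === attempts) throw err; // out of retries: surface the error
      // Back off before retrying so in-flight transactions can finish
      await new Promise((resolve) => setTimeout(resolve, backoffMs * attempt));
    }
  }
}
```

With this wrapper the worst case for application traffic is a 2-second stall per attempt, rather than an unbounded queue behind a stuck ALTER.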
Shadow Table Pattern for Structural Changes
When you need to change a table's structure dramatically (partition it, change primary key type, add a new sort column), shadow tables let you do it online:
-- Scenario: migrate users table from BIGINT id to UUID id
-- STEP 1: Create new table with target structure
CREATE TABLE users_new (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- built-in since PG 13; needs pgcrypto before that
legacy_id BIGINT UNIQUE NOT NULL, -- keep for foreign key mapping
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- STEP 2: Install CDC trigger on old table
CREATE OR REPLACE FUNCTION mirror_to_users_new()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO users_new (legacy_id, email, name, created_at)
VALUES (NEW.id, NEW.email, NEW.name, NEW.created_at)
ON CONFLICT (legacy_id) DO NOTHING;
ELSIF TG_OP = 'UPDATE' THEN
UPDATE users_new SET email = NEW.email, name = NEW.name
WHERE legacy_id = NEW.id;
ELSIF TG_OP = 'DELETE' THEN
DELETE FROM users_new WHERE legacy_id = OLD.id;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER mirror_users
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION mirror_to_users_new();
-- STEP 3: Backfill historical data (batched)
INSERT INTO users_new (legacy_id, email, name, created_at)
SELECT id, email, name, created_at FROM users
ON CONFLICT (legacy_id) DO NOTHING;
-- STEP 4: Verify row counts match
SELECT
(SELECT COUNT(*) FROM users) AS old_count,
(SELECT COUNT(*) FROM users_new) AS new_count;
-- STEP 5: Cutover (brief lock window)
BEGIN;
LOCK TABLE users IN ACCESS EXCLUSIVE MODE;
-- Final sync of any rows that came in during backfill
INSERT INTO users_new (legacy_id, email, name, created_at)
SELECT id, email, name, created_at FROM users
ON CONFLICT (legacy_id) DO NOTHING;
-- Drop the mirror trigger: it targets users_new, which is about to be renamed
DROP TRIGGER mirror_users ON users;
-- Swap tables
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
-- Update sequences, views, FK references here
COMMIT;
-- STEP 6: After full validation, drop old table
-- DROP TABLE users_old; -- Wait at least 1 week to be safe
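Step 4's count check is a point-in-time snapshot; under live write traffic you want the gap to stay near zero before you take the cutover lock. A sketch of a convergence poll — `getOldCount` and `getNewCount` are hypothetical callbacks that would run the two COUNT(*) queries:

```typescript
// Sketch: poll old and new table counts until they converge (within maxGap)
// before starting the cutover transaction. The count getters are placeholders
// for SELECT COUNT(*) queries against users and users_new.
export async function waitForConvergence(
  getOldCount: () => Promise<number>,
  getNewCount: () => Promise<number>,
  { maxGap = 0, attempts = 30, intervalMs = 2000 } = {},
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    const [oldCount, newCount] = await Promise.all([getOldCount(), getNewCount()]);
    if (Math.abs(oldCount - newCount) <= maxGap) return true; // safe to cut over
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // never converged: investigate before cutting over
}
```

If the gap never closes, the usual culprits are a broken mirror trigger or a backfill that's still running — either way, abort the cutover rather than extending the lock window.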
Migration Tool Comparison
| Tool | Language | Approach | Version Control | Notes |
|---|---|---|---|---|
| Flyway | SQL/Java | Versioned (V1__name.sql) | Git-friendly | AWS RDS support |
| Liquibase | XML/SQL/YAML | Changeset-based | Checksum validation | Docker-native |
| Prisma | TypeScript | Auto-generated | Migration files | Full-stack JS |
| golang-migrate | SQL | Sequential versioned | Git-friendly | Minimal deps |
| Alembic | Python | Auto + manual | Git-friendly | SQLAlchemy native |
Prisma Migration Safety
// schema.prisma — Prisma generates migrations from schema diffs
model User {
id String @id @default(uuid())
email String @unique
fullName String? // Nullable first (expand phase)
legacyName String? // Old column kept during transition
createdAt DateTime @default(now())
}
# Generate the migration without applying it (--create-only), so you can review the SQL first
npx prisma migrate dev --create-only --name add_full_name_column
# Review the generated SQL before applying
cat prisma/migrations/20260904_add_full_name_column/migration.sql
# Apply in production (no prompts)
npx prisma migrate deploy
// prisma/migrations/20260904_add_full_name_column/migration.sql
-- Prisma generates this; review and modify if needed
ALTER TABLE "User" ADD COLUMN "fullName" TEXT;
-- Add your backfill manually before making it NOT NULL:
-- UPDATE "User" SET "fullName" = "legacyName" WHERE "fullName" IS NULL;
Deploy Sequence for Zero Downtime
// scripts/safe-deploy.ts
// Enforces correct migration + deploy ordering
interface DeployStep {
name: string;
phase: "pre-deploy" | "deploy" | "post-deploy";
run: () => Promise<void>;
rollback?: () => Promise<void>;
}
const deploySteps: DeployStep[] = [
{
name: "Run expand migrations",
phase: "pre-deploy",
run: async () => {
// Only additive migrations: new tables, new nullable columns, new indexes
await runMigrations({ tags: ["expand"] });
},
rollback: async () => {
await rollbackMigrations({ tags: ["expand"] });
},
},
{
name: "Deploy application",
phase: "deploy",
run: async () => {
await deployToECS({ desiredCount: 2, maxSurge: "100%", minHealthy: "50%" });
await waitForHealthCheck({ timeout: 300_000 });
},
rollback: async () => {
await rollbackECS();
},
},
{
name: "Run contract migrations",
phase: "post-deploy",
run: async () => {
// Only after all old instances are drained
await waitForOldInstanceDrain({ timeout: 60_000 });
await runMigrations({ tags: ["contract"] });
},
},
];
async function safeDeploy() {
const executed: DeployStep[] = [];
for (const step of deploySteps) {
console.log(`[${step.phase}] ${step.name}`);
try {
await step.run();
executed.push(step);
} catch (error) {
console.error(`Step failed: ${step.name}`, error);
// Rollback in reverse order
for (const completedStep of executed.reverse()) {
if (completedStep.rollback) {
console.log(`Rolling back: ${completedStep.name}`);
await completedStep.rollback().catch(console.error);
}
}
throw error;
}
}
}
Migration Tagging with Flyway
-- V1__create_users.sql (baseline)
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- V2__add_full_name_expand.sql (expand phase)
-- Flyway has no comment-based tags; split expand and contract scripts via a
-- naming convention or separate -locations directories so they run at different stages
ALTER TABLE users ADD COLUMN full_name TEXT;
ALTER TABLE users ADD COLUMN legacy_name TEXT;
UPDATE users SET legacy_name = email; -- illustrative backfill of the transitional column
-- V3__add_full_name_index.sql
-- CONCURRENTLY cannot run inside a transaction, and Flyway wraps each migration
-- in one by default — configure this script to run non-transactionally
CREATE INDEX CONCURRENTLY idx_users_full_name ON users(full_name);
-- V4__drop_legacy_name_contract.sql (contract phase — run only after the deploy completes)
ALTER TABLE users DROP COLUMN legacy_name;
Cost of Migration Downtime
| Table Size | Naive ALTER TABLE | pg_repack | Shadow Table | CONCURRENTLY index |
|---|---|---|---|---|
| 1 GB | 30–120s downtime | ~5 min online | ~10 min online | ~3 min online |
| 10 GB | 5–30 min downtime | ~45 min online | ~90 min online | ~25 min online |
| 100 GB | 1–6h downtime | ~8h online | ~15h online | ~4h online |
| 1 TB | Days of downtime | 3–5 days online | 5–7 days online | 2–3 days online |
pg_repack is the standard tool for large online table rewrites — it builds a shadow copy internally, syncs changes via triggers, then does a brief lock swap. Note that it rewrites the table as-is (for bloat removal or reclustering); arbitrary schema changes still need the manual shadow-table pattern above.
# Install pg_repack (requires extension on server)
psql -c "CREATE EXTENSION pg_repack;"
# Repack a table online (no downtime)
pg_repack --host=db.prod.internal --dbname=myapp --table=orders --jobs=2
Monitoring During Migrations
-- Monitor long-running migrations in real time
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query,
state,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < NOW() - INTERVAL '30 seconds'
ORDER BY duration DESC;
-- Check lock contention (simplified from the PostgreSQL wiki's lock-monitoring
-- query; matches relation-level locks only)
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.relation = blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
AND blocking_locks.granted;
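The queries above tell you what is stuck; deciding what to cancel is the risky part. One pattern is to cancel long-running blockers automatically while whitelisting the migration's own backend, with the actual cancellation going through pg_cancel_backend(pid). A sketch of the pure decision logic — the row shape mirrors the pg_stat_activity columns selected above:

```typescript
// Sketch: given rows derived from pg_stat_activity, pick blocker pids to
// cancel via pg_cancel_backend. Pure logic, so it can be tested without a DB.
interface ActivityRow {
  pid: number;
  durationMs: number; // now() - query_start
  query: string;
}

export function pidsToCancel(
  rows: ActivityRow[],
  thresholdMs: number,
  protectedPids: number[] = [], // e.g. the migration's own backend pid
): number[] {
  return rows
    .filter((r) => r.durationMs >= thresholdMs) // only long-running queries
    .filter((r) => !protectedPids.includes(r.pid)) // never cancel the migration itself
    .map((r) => r.pid);
}
```

Prefer pg_cancel_backend (cancels the current query) over pg_terminate_backend (kills the connection) — the former is far less disruptive to application connection pools.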
See Also
- PostgreSQL Performance Tuning for Production — query optimization
- PostgreSQL Extensions for Production Apps — advanced PG features
- Kubernetes Operators: Automating Complex Workloads — automated DB ops
- DevOps CI/CD Pipeline Setup — deployment automation
Working With Viprasol
Migration accidents are among the most costly engineering incidents — and they're almost entirely preventable with the right sequencing strategy. Our team has run zero-downtime migrations on PostgreSQL databases ranging from 10GB to 8TB, using expand-contract, shadow tables, and pg_repack.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.