
Zero Downtime Deployment: Blue-Green, Canary, and Feature Flags Explained

Zero downtime deployment strategies in 2026 — blue-green deployments, canary releases, feature flags, and rolling updates, with real Kubernetes, AWS, and CI/CD implementation examples.

Viprasol Tech Team
March 22, 2026
13 min read


A production outage during deployment costs money, damages customer trust, and creates exactly the wrong incentive for engineering teams — if deploying is risky, teams deploy less frequently, which makes each deployment larger and riskier.

Zero downtime deployment breaks this cycle. When every deploy is safe, small, and reversible, teams ship more often, catch problems earlier, and build a culture where continuous improvement is the default.

This guide covers the four primary strategies — rolling updates, blue-green, canary, and feature flags — with production-ready implementation examples for Kubernetes, AWS, and common CI/CD pipelines.


Why Deployments Cause Downtime

Before choosing a strategy, understand the failure modes:

  1. Cold cutover: Old version stopped, new version started; gap between them = downtime
  2. Failed healthcheck: New version starts, fails health check, traffic stays on failing instance
  3. Incompatible database migration: New code assumes schema that doesn't exist yet (or vice versa)
  4. Connection draining: In-flight requests killed when pod/instance is terminated
  5. Dependency version mismatch: New service version requires updated sidecar/library not yet deployed

Each strategy addresses some of these. None addresses all of them without the others.
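Failure mode 4 is the one most often fixable in application code. As an illustrative sketch (not tied to any framework — the class name and timings are hypothetical), a server can track in-flight requests and delay shutdown until they drain or a grace period expires:

```typescript
// Hypothetical in-flight request tracker for graceful shutdown.
// enter()/exit() would wrap each request handler; drain() runs on SIGTERM.
class InFlightTracker {
  private count = 0;

  enter(): void { this.count++; }
  exit(): void { this.count--; }
  get active(): number { return this.count; }

  // Resolves true when all in-flight requests finish, or false if the
  // grace period expires first (mirrors terminationGracePeriodSeconds).
  async drain(graceMs: number): Promise<boolean> {
    const deadline = Date.now() + graceMs;
    while (this.count > 0 && Date.now() < deadline) {
      await new Promise((resolve) => setTimeout(resolve, 25));
    }
    return this.count === 0;
  }
}
```

On SIGTERM the process would stop accepting new connections, await `drain()`, then exit; a Kubernetes `preStop` sleep buys time for the load balancer to stop routing new traffic first.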


Strategy 1: Rolling Update

The simplest zero-downtime approach. Replace instances one at a time, waiting for each to become healthy before proceeding.

Kubernetes Rolling Update

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2         # Max extra pods during update
      maxUnavailable: 0   # Never reduce below desired replica count
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api
          image: your-registry/api:${VERSION}
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                # Allow in-flight requests to complete before pod terminates
                command: ["/bin/sh", "-c", "sleep 10"]
      terminationGracePeriodSeconds: 30
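Rolling the manifest out and watching it land uses standard kubectl commands (the deployment name matches the manifest above); this is a CLI fragment for reference, not a full deploy script:

```shell
kubectl apply -f deployment.yaml
# Blocks until the rollout finishes; exits non-zero if it stalls
kubectl rollout status deployment/api-service --timeout=5m
# One-command rollback to the previous ReplicaSet if needed
kubectl rollout undo deployment/api-service
```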

Readiness vs. Liveness probes — get this right:

  • readinessProbe: "Is this pod ready to receive traffic?" — if it fails, Kubernetes removes the pod from load-balancer rotation
  • livenessProbe: "Is this pod still alive?" — if it fails, Kubernetes restarts the pod

// Health check endpoints — these must be fast and accurate
// GET /health/ready — fails if DB is unavailable, queues are backed up, etc.
app.get('/health/ready', async (req, res) => {
  try {
    await db.raw('SELECT 1');  // Quick DB connectivity check
    res.json({ status: 'ready' });
  } catch {
    res.status(503).json({ status: 'not ready', reason: 'database unavailable' });
  }
});

// GET /health/live — only fails if the process itself is broken
app.get('/health/live', (req, res) => {
  res.json({ status: 'alive' });
});

Rolling update limitations:

  • Both versions run simultaneously during rollout — API must be backward-compatible
  • Slow for large clusters (rolling through 50 pods takes time)
  • No traffic control (you can't route only 5% of traffic to the new version)
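The backward-compatibility requirement falls mostly on serialization: while both versions run, each side must ignore fields it doesn't recognize. A minimal sketch of this tolerant-reader pattern (the types and field names are illustrative):

```typescript
// A v1 client parsing a response that may come from a v1 or v2 server.
interface OrderV1 {
  id: string;
  total: number;
}

function parseOrder(json: string): OrderV1 {
  const raw = JSON.parse(json);
  // Copy only the fields this version knows about; anything a newer
  // server version adds (e.g. a shippingMethod field) is ignored.
  return { id: String(raw.id), total: Number(raw.total) };
}
```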

☁️ Is Your Cloud Costing Too Much?

Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.

  • AWS, GCP, Azure certified engineers
  • Infrastructure as Code (Terraform, CDK)
  • Docker, Kubernetes, GitHub Actions CI/CD
  • Typical audit recovers $500–$3,000/month in savings

Strategy 2: Blue-Green Deployment

Two identical environments (blue = current, green = new). Traffic switches atomically from blue to green. Rollback = switch back to blue.

Before deploy:  100% traffic → BLUE (v1.2)
During deploy:  green (v1.3) receives 0% traffic, being prepared
After deploy:   100% traffic → GREEN (v1.3)
Rollback:       100% traffic → BLUE (v1.2) — instant

AWS ALB Blue-Green

#!/bin/bash
# blue-green-deploy.sh

CURRENT_TG=$(aws elbv2 describe-rules \
  --listener-arn "$LISTENER_ARN" \
  --query 'Rules[?Priority==`100`].Actions[0].TargetGroupArn' \
  --output text)

# Determine which is blue and which is green
if [[ "$CURRENT_TG" == "$BLUE_TG_ARN" ]]; then
  NEW_TG="$GREEN_TG_ARN"
  LABEL="green"
else
  NEW_TG="$BLUE_TG_ARN"
  LABEL="blue"
fi

echo "Deploying to $LABEL environment ($NEW_TG)"

# Deploy new version to the inactive target group
aws ecs update-service \
  --cluster "$CLUSTER" \
  --service "api-${LABEL}" \
  --task-definition "api:${NEW_VERSION}" \
  --force-new-deployment

# Wait for service to stabilize
aws ecs wait services-stable \
  --cluster "$CLUSTER" \
  --services "api-${LABEL}"

# Run smoke tests against the inactive environment
./scripts/smoke-test.sh "https://${LABEL}.internal.example.com"

if [[ $? -ne 0 ]]; then
  echo "Smoke tests failed — aborting, traffic stays on current environment"
  exit 1
fi

# Switch traffic to new environment
aws elbv2 modify-rule \
  --rule-arn "$RULE_ARN" \
  --actions "Type=forward,TargetGroupArn=${NEW_TG}"

echo "Traffic switched to $LABEL environment"
echo "Previous environment ($CURRENT_TG) standing by for rollback"
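The `./scripts/smoke-test.sh` step can be any fast end-to-end probe of the inactive environment. A sketch of the idea in TypeScript (the paths and injected fetcher are illustrative, not a fixed API):

```typescript
// Probe critical endpoints and collect failures; an empty result means
// the environment is safe to receive traffic. The fetcher is injected
// so the check is easy to stub in tests.
type Fetcher = (url: string) => Promise<{ status: number }>;

async function smokeTest(
  baseUrl: string,
  paths: string[],
  fetchFn: Fetcher
): Promise<string[]> {
  const failures: string[] = [];
  for (const path of paths) {
    try {
      const res = await fetchFn(`${baseUrl}${path}`);
      if (res.status < 200 || res.status >= 300) {
        failures.push(`${path} returned ${res.status}`);
      }
    } catch (err) {
      failures.push(`${path} unreachable: ${String(err)}`);
    }
  }
  return failures;
}
```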

Blue-green advantages:

  • Instant rollback (switch DNS/load balancer back)
  • New version is fully tested before it receives any production traffic
  • Clean separation — no version mixing during transition

Blue-green limitations:

  • Requires 2x infrastructure cost during deployment
  • Database migrations must be forward/backward compatible
  • Cold-start latency if the new version hasn't warmed caches, connection pools, or JIT before the traffic switch

Strategy 3: Canary Release

Route a small percentage of production traffic to the new version, monitor metrics, then gradually increase.

Phase 1: 1% → new version, 99% → old (5 minutes, watch error rates)
Phase 2: 10% → new version, 90% → old (15 minutes)
Phase 3: 50% → new version, 50% → old (30 minutes)
Phase 4: 100% → new version (cleanup old version)
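Each pause in the schedule above is a go/no-go decision. The core of an automated gate is simple: compare the canary's error rate against a threshold, and hold if there isn't yet enough traffic to judge. A sketch (the thresholds and metric shape are illustrative):

```typescript
// Metrics for the current canary phase, e.g. scraped from Prometheus.
interface PhaseMetrics {
  requests: number;
  errors: number;
}

function shouldPromote(
  m: PhaseMetrics,
  maxErrorRate = 0.01, // 1% error-rate ceiling
  minRequests = 100    // avoid deciding on statistically thin traffic
): boolean {
  if (m.requests < minRequests) return false;
  return m.errors / m.requests < maxErrorRate;
}
```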

Kubernetes with Argo Rollouts

# rollout.yaml — requires Argo Rollouts controller
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-service
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 30
        - pause: { duration: 10m }
        - setWeight: 60
        - pause: { duration: 10m }
        - setWeight: 100
      # A failed analysis step automatically aborts the rollout
      # and returns all traffic to the stable version

---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01   # < 1% error rate
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{status=~"5.."}[2m])) /
            sum(rate(http_requests_total[2m]))

Nginx Canary with Kubernetes Ingress

# Stable ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-stable
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            backend:
              service:
                name: api-stable
                port: { number: 80 }

---
# Canary ingress — nginx.ingress.kubernetes.io/canary annotations control weight
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% to canary
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            backend:
              service:
                name: api-canary
                port: { number: 80 }

Increase canary-weight from 10 → 30 → 60 → 100 as you validate the canary. At 100%, delete the canary ingress.


⚙️ DevOps Done Right — Zero Downtime, Full Automation

Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.

  • Staging + production environments with feature flags
  • Automated security scanning in the pipeline
  • Uptime monitoring + alerting + runbook automation
  • On-call support handover docs included

Strategy 4: Feature Flags

Decouple deployment from release. Code ships to all users, but features are gated by flag state. This is the most powerful and flexible approach — and the most underused.

// Feature flag evaluation — supports percentage rollouts and user targeting
interface FlagConfig {
  enabled: boolean;
  rolloutPercentage: number;  // 0–100
  allowList?: string[];       // Always-on user IDs
  denyList?: string[];        // Always-off user IDs
}

class FeatureFlags {
  // Flag configs would normally come from a database or config service;
  // an in-memory map keeps the example self-contained
  constructor(private configs: Map<string, FlagConfig> = new Map()) {}

  private async getConfig(flagName: string): Promise<FlagConfig> {
    // Unknown flags default to off — fail closed
    return this.configs.get(flagName) ?? { enabled: false, rolloutPercentage: 0 };
  }

  async isEnabled(flagName: string, userId: string): Promise<boolean> {
    const config = await this.getConfig(flagName);

    if (!config.enabled) return false;
    if (config.denyList?.includes(userId)) return false;
    if (config.allowList?.includes(userId)) return true;

    // Stable hash — same user always gets same bucket
    const bucket = this.stableHash(userId + flagName) % 100;
    return bucket < config.rolloutPercentage;
  }

  private stableHash(input: string): number {
    let hash = 0;
    for (let i = 0; i < input.length; i++) {
      const char = input.charCodeAt(i);
      hash = (hash << 5) - hash + char;
      hash = hash & hash; // Convert to 32-bit int
    }
    return Math.abs(hash);
  }
}

// Usage in application code
const flags = new FeatureFlags();

async function handleCheckout(userId: string, cart: Cart) {
  const useNewCheckout = await flags.isEnabled('new-checkout-flow', userId);
  
  if (useNewCheckout) {
    return newCheckoutService.process(cart);
  }
  return legacyCheckoutService.process(cart);
}

Feature flag services (2026):

Tool           Self-Hosted   SaaS   Best For
LaunchDarkly   No            Yes    Enterprise, complex targeting
Unleash        Yes           Yes    Open-source, full control
Flagsmith      Yes           Yes    Mid-market, easy setup
PostHog        Yes           Yes    Combined with product analytics
Custom         Yes           No     Simple use cases, full ownership

For simple use cases, a database-backed feature flag system is 2–3 days of engineering work and eliminates the $20k+/year LaunchDarkly bill.


Database Migration Safety

The most common cause of deployment-related outages is database schema changes that break compatibility. Use expand/contract:

-- ❌ WRONG: Adding NOT NULL column in one step breaks existing app instances
ALTER TABLE orders ADD COLUMN shipping_method TEXT NOT NULL DEFAULT 'standard';

-- ✅ RIGHT: Expand/contract across 3 deployments

-- Deployment 1: Add nullable column (no existing code breaks)
ALTER TABLE orders ADD COLUMN shipping_method TEXT;

-- Between Deployment 1 and 2: Application writes to new column, reads from both
-- Background job backfills existing rows
UPDATE orders SET shipping_method = 'standard' WHERE shipping_method IS NULL;

-- Deployment 2: Add NOT NULL constraint after backfill completes
ALTER TABLE orders ALTER COLUMN shipping_method SET NOT NULL;
ALTER TABLE orders ALTER COLUMN shipping_method SET DEFAULT 'standard';

-- Deployment 3: Clean up any compatibility shims in application code
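Between Deployments 1 and 2 the application has to tolerate both states of the data. A sketch of the read path (the column name follows the SQL above; the fallback matches the intended default):

```typescript
// A row read while the backfill is still running may have a NULL
// shipping_method; fall back to the default the backfill will write.
interface OrderRow {
  id: string;
  shipping_method: string | null;
}

function readShippingMethod(row: OrderRow): string {
  return row.shipping_method ?? "standard";
}
```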

Deployment Strategy Selection Guide

Situation                                              Recommended Strategy
Small team, simple app, <100 users                     Rolling update
Need instant rollback capability                       Blue-green
Releasing risky changes to large user base             Canary
Separating code deploy from feature release            Feature flags
Database schema changes                                Expand/contract + rolling
High-traffic e-commerce, payment flows                 Canary + feature flags
Compliance-sensitive features (healthcare, fintech)    Feature flags with audit trail

Most mature engineering organizations use all four in combination:

  • Rolling updates for routine service deployments
  • Blue-green for major infrastructure changes
  • Canary for risky application changes
  • Feature flags for product releases

Implementation Costs

Scope                                        Investment
Rolling update setup (Kubernetes)            $3,000–$8,000
Blue-green pipeline implementation           $8,000–$20,000
Canary with automated analysis               $15,000–$35,000
Feature flag system (custom)                 $5,000–$15,000
Full deployment platform (all strategies)    $30,000–$70,000

Most teams underinvest here relative to the value. A single prevented outage typically pays for the entire deployment infrastructure investment.


Working With Viprasol

We design and implement deployment pipelines that eliminate deployment-related downtime — from simple Kubernetes rolling updates through full canary release systems with automated analysis and rollback.

Discuss your deployment infrastructure →
Cloud Solutions →
DevOps as a Service →



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
