AWS ECS Fargate in Production: Task Definitions, Service Discovery, and Blue/Green Deploys

Run production workloads on AWS ECS Fargate: configure task definitions with Secrets Manager, set up service discovery with Cloud Map, implement blue/green deployments with CodeDeploy, and monitor with Container Insights.

Viprasol Tech Team
October 22, 2026

ECS Fargate removes the burden of managing EC2 instances: no AMI patching and no capacity planning for the underlying nodes. You define a task (container spec plus CPU and memory requirements), and Fargate runs it. The operational model is closer to Lambda than EC2, but tasks are long-running: they hold persistent connections, and the startup cost is paid once per task rather than per request.
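
Fargate only accepts certain CPU and memory pairings, so the two resource values of a task have to be chosen together. As a rough illustration, the sketch below checks a pair against the published task-size table. `isValidFargateSize`, `gibRange`, and `FARGATE_MEMORY_BY_CPU` are hypothetical names, and the table values are an assumption that should be re-checked against current AWS documentation.

```typescript
// Sketch only: allowed Fargate memory values (MiB) for each CPU setting (CPU units).
// The table is an assumption based on the published Fargate task sizes.
const MIB_PER_GIB = 1024;

// Build an inclusive list of memory sizes in MiB from a range given in GiB.
function gibRange(minGiB: number, maxGiB: number, stepGiB: number): number[] {
  const sizes: number[] = [];
  for (let gib = minGiB; gib <= maxGiB; gib += stepGiB) {
    sizes.push(gib * MIB_PER_GIB);
  }
  return sizes;
}

const FARGATE_MEMORY_BY_CPU: Record<number, number[]> = {
  256: [512, 1024, 2048],
  512: gibRange(1, 4, 1),
  1024: gibRange(2, 8, 1),
  2048: gibRange(4, 16, 1),
  4096: gibRange(8, 30, 1),
  8192: gibRange(16, 60, 4),
  16384: gibRange(32, 120, 8),
};

// Returns true when the CPU/memory pair appears in the table above.
function isValidFargateSize(cpuUnits: number, memoryMiB: number): boolean {
  const allowed = FARGATE_MEMORY_BY_CPU[cpuUnits];
  return allowed !== undefined && allowed.indexOf(memoryMiB) !== -1;
}
```

A check like this could run in CI before `terraform apply`, so an invalid pair fails early instead of at task registration.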

The production setup: Application Load Balancer → ECS Service → Fargate tasks, with blue/green deployments for zero-downtime releases and Container Insights for observability.


Task Definition

# modules/ecs-service/main.tf (Terraform)

resource "aws_ecs_task_definition" "this" {
  family                   = local.full_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"  # Required for Fargate
  cpu                      = var.cpu
  memory                   = var.memory

  # Execution role: allows ECS to pull images and write logs
  execution_role_arn = aws_iam_role.execution.arn
  # Task role: permissions for your application code
  task_role_arn      = aws_iam_role.task.arn

  container_definitions = jsonencode([
    {
      name      = var.service_name
      image     = var.docker_image
      essential = true

      portMappings = [{
        containerPort = var.container_port
        protocol      = "tcp"
      }]

      # Environment variables — non-secret values
      environment = [
        for k, v in var.environment_variables : { name = k, value = v }
      ]

      # Secrets from AWS Secrets Manager or SSM Parameter Store
      # ECS injects these as environment variables at container start
      secrets = [
        for k, v in var.secrets : { name = k, valueFrom = v }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/${local.full_name}"
          "awslogs-region"        = data.aws_region.current.name
          "awslogs-stream-prefix" = "ecs"
          # Group multi-line output into one event: a new log event starts
          # at each line that begins with "{" (one JSON object per event)
          "awslogs-multiline-pattern" = "^{"
        }
      }

      # Runs inside the container, so the image must include curl
      healthCheck = {
        command     = ["CMD-SHELL", "curl -sf http://localhost:${var.container_port}${var.health_check_path} || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60  # Grace period for slow startup
      }

      # Reduce the impact of a compromised container: drop all Linux capabilities
      linuxParameters = {
        capabilities = {
          drop = ["ALL"]
          add  = []
        }
      }

      # Container-level setting (it does not belong inside linuxParameters).
      # Set true if your app doesn't write to the filesystem
      readonlyRootFilesystem = false

      # Stop timeout: seconds between SIGTERM and SIGKILL (Fargate allows up to 120)
      stopTimeout = 60  # Give Node.js time to drain connections
    }
  ])

  tags = local.common_tags
}
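
The `stopTimeout` in the task definition only helps if the application reacts to SIGTERM. The following is a minimal sketch of that handling for a Node.js service, not the article's own implementation. `closeGracefully`, `installSigtermHandler`, and the `Closable` interface are hypothetical names, and the 50-second default is an assumption chosen to stay under the 60-second `stopTimeout`.

```typescript
// Anything with a Node-style close(callback); http.Server satisfies this shape.
interface Closable {
  close(callback: (err?: Error) => void): unknown;
}

type ShutdownResult = "drained" | "timeout";

// Stop accepting new connections and wait for in-flight ones to finish,
// giving up after timeoutMs so the process can exit before ECS sends SIGKILL.
function closeGracefully(server: Closable, timeoutMs: number): Promise<ShutdownResult> {
  return new Promise<ShutdownResult>((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    server.close(() => {
      clearTimeout(timer);
      resolve("drained");
    });
  });
}

// Hypothetical wiring: ECS sends SIGTERM first and SIGKILL once stopTimeout elapses.
// The default of 50000 ms is an assumption that leaves headroom under 60 seconds.
function installSigtermHandler(server: Closable, timeoutMs = 50000): void {
  process.once("SIGTERM", () => {
    closeGracefully(server, timeoutMs).then((result) => {
      process.exit(result === "drained" ? 0 : 1);
    });
  });
}
```

In a real service, the value passed to `installSigtermHandler` would be the `http.Server` returned by `createServer`. Long-lived keep-alive connections can delay `close`, which is why the timeout path exists.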

IAM Roles

# Execution role: ECS control plane permissions
resource "aws_iam_role" "execution" {
  name = "${local.full_name}-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "execution_basic" {
  role       = aws_iam_role.execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

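# Allow ECS to read specific secrets
resource "aws_iam_role_policy" "execution_secrets" {
  name = "secrets-access"
  role = aws_iam_role.execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue",
          "ssm:GetParameters"
        ]
        # Must be plain secret or parameter ARNs. A valueFrom that ends in
        # ":json-key:version-stage:version-id" is not a valid IAM resource.
        Resource = values(var.secrets)
      },
      {
        # kms:Decrypt is authorized against the KMS key, not the secret, and is
        # only needed for customer managed keys. var.secrets_kms_key_arns is an
        # assumed variable (not defined elsewhere here) and must be non-empty.
        Effect   = "Allow"
        Action   = ["kms:Decrypt"]
        Resource = var.secrets_kms_key_arns
      }
    ]
  })
}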

# Task role: application code permissions
resource "aws_iam_role" "task" {
  name = "${local.full_name}-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
      Action    = "sts:AssumeRole"
      Condition = {
        StringEquals = {
          "aws:SourceAccount" = data.aws_caller_identity.current.account_id
        }
      }
    }]
  })
}
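
ECS accepts a Secrets Manager `valueFrom` that points at a single JSON key by appending `:json-key:version-stage:version-id` to the secret ARN. A string in that form is not a valid IAM `Resource`, so a policy built from `values(var.secrets)` would need the suffix removed first. The sketch below shows that transformation in TypeScript purely to illustrate the logic (in Terraform it would more likely be a `regex` or `split` expression); `iamResourceForValueFrom` is a hypothetical helper name.

```typescript
// Sketch: derive the IAM resource ARN for an ECS "valueFrom" reference.
// A Secrets Manager secret ARN has 7 colon-separated parts:
// arn:partition:secretsmanager:region:account:secret:name
const SECRET_ARN_PARTS = 7;

function iamResourceForValueFrom(valueFrom: string): string {
  const parts = valueFrom.split(":");
  const isSecretsManagerArn = parts[0] === "arn" && parts[2] === "secretsmanager";
  if (isSecretsManagerArn && parts.length > SECRET_ARN_PARTS) {
    // Drop the optional ":json-key:version-stage:version-id" suffix
    return parts.slice(0, SECRET_ARN_PARTS).join(":");
  }
  // SSM parameter ARNs and plain secret ARNs are returned unchanged
  return valueFrom;
}
```

For example, a reference ending in `:password::` maps back to the ARN of the secret that contains the `password` key.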

ECS Service with Blue/Green Deployment

# ECS Service configured for CodeDeploy blue/green
resource "aws_ecs_service" "this" {
  name            = local.full_name
  cluster         = data.aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.tasks.id]
    assign_public_ip = false
  }

  # Blue/green via CodeDeploy requires the CODE_DEPLOY deployment controller
  deployment_controller {
    type = "CODE_DEPLOY"
  }

  # Load balancer: blue target group (CodeDeploy manages green)
  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = var.service_name
    container_port   = var.container_port
  }

  # Service discovery: register tasks with Cloud Map
  # A records in awsvpc mode need only the task IP; container_name and
  # container_port are required only for SRV records
  service_registries {
    registry_arn = aws_service_discovery_service.this.arn
  }

  # Prevent Terraform from managing deployment (CodeDeploy handles it)
  lifecycle {
    ignore_changes = [
      task_definition,
      load_balancer,
      desired_count,
    ]
  }

  tags = local.common_tags
}

# Two target groups for blue/green
resource "aws_lb_target_group" "blue" {
  name        = "${local.full_name}-blue"
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"  # Required for Fargate awsvpc mode

  health_check {
    path                = var.health_check_path
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    matcher             = "200-299"
  }

  deregistration_delay = 30  # Wait 30s after removing from LB (connection draining)
}

resource "aws_lb_target_group" "green" {
  name        = "${local.full_name}-green"
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path    = var.health_check_path
    matcher = "200-299"
  }

  deregistration_delay = 30
}

CodeDeploy Blue/Green Configuration

resource "aws_codedeploy_app" "this" {
  compute_platform = "ECS"
  name             = local.full_name
}

resource "aws_codedeploy_deployment_group" "this" {
  app_name               = aws_codedeploy_app.this.name
  deployment_group_name  = local.full_name
  service_role_arn       = aws_iam_role.codedeploy.arn
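  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"

  # ECS deployment groups must use blue/green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }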

  ecs_service {
    cluster_name = data.aws_ecs_cluster.main.name
    service_name = aws_ecs_service.this.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.https.arn]
      }
      test_traffic_route {
        # Test traffic on port 8080 before switching production
        listener_arns = [aws_lb_listener.test.arn]
      }
      target_group { name = aws_lb_target_group.blue.name }
      target_group { name = aws_lb_target_group.green.name }
    }
  }

  blue_green_deployment_config {
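    deployment_ready_option {
      # Shift production traffic as soon as the green tasks are ready.
      # For manual approval, use STOP_DEPLOYMENT and set wait_time_in_minutes
      # to the approval window.
      action_on_timeout    = "CONTINUE_DEPLOYMENT"
      wait_time_in_minutes = 0
    }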

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5  # Keep blue running 5min after cutover (for rollback)
    }
  }
}

Service Discovery with Cloud Map

# Internal service-to-service communication without going through ALB
resource "aws_service_discovery_private_dns_namespace" "main" {
  name = "internal.production"
  vpc  = var.vpc_id
}

resource "aws_service_discovery_service" "this" {
  name = var.service_name

  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.main.id
    dns_records {
      ttl  = 10
      type = "A"
    }
    routing_policy = "MULTIVALUE"
  }

  health_check_custom_config {
    failure_threshold = 1
  }
}

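# Service is now reachable at <service_name>.internal.production, for example:
# api-server.internal.production:3000 (from other tasks in the VPC)
# A records carry only IP addresses, so callers must already know the port.

// src/lib/internal-client.ts
// Use Cloud Map DNS for internal service calls (no ALB hop)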

export const internalApiClient = {
  baseUrl: process.env.NODE_ENV === "production"
    ? "http://api-server.internal.production:3000"  // Cloud Map
    : "http://localhost:3000",
};
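
When more than one internal service is involved, the hard-coded URL can be generalized. The sketch below builds the base URL from the Cloud Map naming pattern `<service>.<namespace>`. `internalBaseUrl` and `InternalUrlOptions` are hypothetical names, and the default namespace is taken from the Terraform in this section.

```typescript
// Sketch: build a base URL for an internal service registered in Cloud Map.
interface InternalUrlOptions {
  service: string;      // Cloud Map service name, e.g. "api-server"
  port: number;         // Container port; A records do not carry it
  production: boolean;  // Whether Cloud Map DNS is available
  namespace?: string;   // Private DNS namespace (assumed default below)
}

function internalBaseUrl(options: InternalUrlOptions): string {
  const { service, port, production, namespace = "internal.production" } = options;
  // Outside production there is no Cloud Map namespace, so fall back to localhost
  const host = production ? `${service}.${namespace}` : "localhost";
  return `http://${host}:${port}`;
}

// Usage: same result as the hard-coded production URL in the client
const apiServerUrl = internalBaseUrl({ service: "api-server", port: 3000, production: true });
```

Passing `production` explicitly keeps the function pure; the caller can still derive it from `process.env.NODE_ENV`.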

Container Insights Monitoring

# Enable Container Insights for the cluster
resource "aws_ecs_cluster" "main" {
  name = "production"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Useful CloudWatch Logs Insights queries for ECS
# Run against the /aws/ecs/containerinsights/<cluster-name>/performance log group

# P99 task CPU utilization by service
fields @timestamp, ServiceName, CpuUtilized, CpuReserved
| filter Type = "Task"
| stats pct(CpuUtilized/CpuReserved*100, 99) as p99_cpu by ServiceName
| sort p99_cpu desc

# Tasks with OOM kills (memory exceeded limit)
# Assumes your log events expose a StopCode field; verify the field name and value first
fields @timestamp, ServiceName, TaskId
| filter Type = "Task" AND StopCode = "OutOfMemoryError"
| stats count() as oom_count by ServiceName
| sort oom_count desc

# Average task startup time (from PENDING to RUNNING)
# Assumes CurrentStatus, startedAt, and createdAt exist as numeric fields in your log events
fields @timestamp, ServiceName, LaunchType, @message
| filter Type = "Task" AND CurrentStatus = "RUNNING"
| stats avg(startedAt - createdAt) as avg_startup_ms by ServiceName

Deployment Script

#!/bin/bash
# scripts/deploy.sh — deploy new image via CodeDeploy blue/green

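# Exit on errors, unset variables, and failures inside pipelines
set -euo pipefail

SERVICE_NAME="${1:?usage: deploy.sh SERVICE_NAME IMAGE_TAG}"
IMAGE_TAG="${2:?usage: deploy.sh SERVICE_NAME IMAGE_TAG}"
CLUSTER="production"
REGION="us-east-1"
export AWS_DEFAULT_REGION="$REGION"  # Run every aws call in the region used for the ECR URI
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)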

# Get current task definition
TASK_DEF=$(aws ecs describe-task-definition \
  --task-definition "${SERVICE_NAME}-production" \
  --query "taskDefinition" \
  --output json)

# Update the image tag and strip read-only fields that register-task-definition rejects
NEW_TASK_DEF=$(echo "$TASK_DEF" | jq \
  --arg IMAGE "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${SERVICE_NAME}:${IMAGE_TAG}" \
  '.containerDefinitions[0].image = $IMAGE | del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')

# Register new task definition revision
NEW_TASK_ARN=$(aws ecs register-task-definition \
  --cli-input-json "$NEW_TASK_DEF" \
  --query "taskDefinition.taskDefinitionArn" \
  --output text)

echo "Registered new task definition: $NEW_TASK_ARN"

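# Build the AppSpec document, then let jq escape it into the revision JSON
# (the nested shell quoting used previously left stray quote characters
# around the task definition ARN and container name)
APPSPEC=$(printf '{"version":0.0,"Resources":[{"TargetService":{"Type":"AWS::ECS::Service","Properties":{"TaskDefinition":"%s","LoadBalancerInfo":{"ContainerName":"%s","ContainerPort":%d}}}}]}' \
  "$NEW_TASK_ARN" "$SERVICE_NAME" 3000)
REVISION=$(jq -n --arg content "$APPSPEC" \
  '{revisionType: "AppSpecContent", appSpecContent: {content: $content}}')

# Create CodeDeploy deployment (blue/green)
DEPLOYMENT_ID=$(aws deploy create-deployment \
  --application-name "${SERVICE_NAME}-production" \
  --deployment-group-name "${SERVICE_NAME}-production" \
  --revision "$REVISION" \
  --query "deploymentId" \
  --output text)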

echo "Started deployment: $DEPLOYMENT_ID"

# Wait for deployment to complete (exits non-zero if the deployment fails)
aws deploy wait deployment-successful --deployment-id "$DEPLOYMENT_ID"
echo "Deployment $DEPLOYMENT_ID succeeded"
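
The deployment script hands CodeDeploy an AppSpec document wrapped in a revision object. As a sketch of the same structure with the escaping left to a serializer, the TypeScript below builds both pieces. `buildEcsAppSpec`, `buildRevision`, and `AppSpecInput` are hypothetical names, and the example ARN is a placeholder, not a real resource.

```typescript
interface AppSpecInput {
  taskDefinitionArn: string;
  containerName: string;
  containerPort: number;
}

// Returns the AppSpec document as a JSON string.
function buildEcsAppSpec(input: AppSpecInput): string {
  const resources = [
    {
      TargetService: {
        Type: "AWS::ECS::Service",
        Properties: {
          TaskDefinition: input.taskDefinitionArn,
          LoadBalancerInfo: {
            ContainerName: input.containerName,
            ContainerPort: input.containerPort,
          },
        },
      },
    },
  ];
  // JSON.stringify would print the number 0.0 as 0, so the version is written
  // literally to match the document used in the shell script.
  return `{"version":0.0,"Resources":${JSON.stringify(resources)}}`;
}

// Wraps the AppSpec in the revision shape used with create-deployment.
function buildRevision(input: AppSpecInput): {
  revisionType: string;
  appSpecContent: { content: string };
} {
  return {
    revisionType: "AppSpecContent",
    appSpecContent: { content: buildEcsAppSpec(input) },
  };
}

// Placeholder values for illustration only
const exampleRevision = buildRevision({
  taskDefinitionArn: "arn:aws:ecs:us-east-1:123456789012:task-definition/api-server-production:42",
  containerName: "api-server",
  containerPort: 3000,
});
```

Serializing `exampleRevision` with `JSON.stringify` should give a value suitable for the `--revision` argument, since the AppSpec content stays a correctly escaped string inside the outer JSON.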
