AWS ECS Fargate in Production: Task Definitions, Service Discovery, and Blue/Green Deploys
Run production workloads on AWS ECS Fargate: configure task definitions with Secrets Manager, set up service discovery with Cloud Map, implement blue/green deployments with CodeDeploy, and monitor with Container Insights.
ECS Fargate removes the burden of managing EC2 instances — no AMI patching, no capacity planning for the underlying nodes. You define a task (container spec + resource requirements), and Fargate runs it. The operational model is closer to Lambda than EC2, but with persistent connections and no cold start penalty.
The production setup: Application Load Balancer → ECS Service → Fargate tasks, with blue/green deployments for zero-downtime releases and Container Insights for observability.
Task Definition
# modules/ecs-service/main.tf (Terraform)
resource "aws_ecs_task_definition" "this" {
family = local.full_name
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc" # Required for Fargate
cpu = var.cpu
memory = var.memory
# Execution role: allows ECS to pull images and write logs
execution_role_arn = aws_iam_role.execution.arn
# Task role: permissions for your application code
task_role_arn = aws_iam_role.task.arn
container_definitions = jsonencode([
{
name = var.service_name
image = var.docker_image
essential = true
portMappings = [{
containerPort = var.container_port
protocol = "tcp"
}]
# Environment variables — non-secret values
environment = [
for k, v in var.environment_variables : { name = k, value = v }
]
# Secrets from AWS Secrets Manager or SSM Parameter Store
# ECS injects these as environment variables at container start
secrets = [
for k, v in var.secrets : { name = k, valueFrom = v }
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = "/ecs/${local.full_name}"
"awslogs-region" = data.aws_region.current.name
"awslogs-stream-prefix" = "ecs"
# Structured JSON logging for CloudWatch Insights queries
"awslogs-multiline-pattern" = "^{"
}
}
healthCheck = {
command = ["CMD-SHELL", "curl -sf http://localhost:${var.container_port}${var.health_check_path} || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60 # Grace period for slow startup
}
# Prevent container escape: drop all Linux capabilities
linuxParameters = {
capabilities = {
drop = ["ALL"]
add = []
}
}
# Top-level container setting (not part of linuxParameters)
readonlyRootFilesystem = false # Set true if your app doesn't write to FS
# Stop timeout: how long to wait for graceful shutdown
stopTimeout = 60 # Give Node.js time to drain connections
}
])
tags = local.common_tags
}
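The awslogs driver writes to the log group named in the configuration above but does not create it (unless `"awslogs-create-group" = "true"` is set and the execution role can call `logs:CreateLogGroup`). A sketch of the matching resource; the 30-day retention is an assumption:

```hcl
# Log group referenced by the awslogs configuration above.
# Managing it in Terraform also lets you set retention and tags.
resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${local.full_name}"
  retention_in_days = 30 # Assumption: adjust to your retention policy
  tags              = local.common_tags
}
```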
IAM Roles
# Execution role: ECS control plane permissions
resource "aws_iam_role" "execution" {
name = "${local.full_name}-execution"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Service = "ecs-tasks.amazonaws.com" }
Action = "sts:AssumeRole"
}]
})
}
resource "aws_iam_role_policy_attachment" "execution_basic" {
role = aws_iam_role.execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Allow ECS to read specific secrets
resource "aws_iam_role_policy" "execution_secrets" {
name = "secrets-access"
role = aws_iam_role.execution.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue",
"ssm:GetParameters"
]
Resource = values(var.secrets)
# Note: if a secret is encrypted with a customer-managed KMS key, also
# grant kms:Decrypt on that key's ARN (not on the secret ARNs)
}]
})
}
# Task role: application code permissions
resource "aws_iam_role" "task" {
name = "${local.full_name}-task"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Service = "ecs-tasks.amazonaws.com" }
Action = "sts:AssumeRole"
Condition = {
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
}
}
}]
})
}
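The task role above grants nothing by default. As an illustration, here is a hedged sketch of an application policy; the `var.app_bucket_arn` input is hypothetical, and the point is to scope the role to exactly what the service code calls, nothing more:

```hcl
# Example task-role policy: what the *application code* may do.
# var.app_bucket_arn is an assumed input for illustration only.
resource "aws_iam_role_policy" "task_app" {
  name = "app-access"
  role = aws_iam_role.task.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:PutObject"]
      Resource = ["${var.app_bucket_arn}/*"]
    }]
  })
}
```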
ECS Service with Blue/Green Deployment
# ECS Service configured for CodeDeploy blue/green
resource "aws_ecs_service" "this" {
name = local.full_name
cluster = data.aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.this.arn
desired_count = var.desired_count
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.tasks.id]
assign_public_ip = false
}
# Blue/green requires the CODE_DEPLOY deployment controller
deployment_controller {
type = "CODE_DEPLOY"
}
# Load balancer: blue target group (CodeDeploy manages green)
load_balancer {
target_group_arn = aws_lb_target_group.blue.arn
container_name = var.service_name
container_port = var.container_port
}
# Service discovery: register tasks with Cloud Map
# With awsvpc mode and DNS A records, specify only the registry ARN
# (container_name/container_port are for SRV records or bridge mode)
service_registries {
registry_arn = aws_service_discovery_service.this.arn
}
# Prevent Terraform from managing deployment (CodeDeploy handles it)
lifecycle {
ignore_changes = [
task_definition,
load_balancer,
desired_count,
]
}
tags = local.common_tags
}
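The service references `aws_security_group.tasks`, which is not shown above. A sketch of what it might look like: only the ALB reaches the container port, plus task-to-task traffic for Cloud Map calls. The `var.alb_security_group_id` input is an assumption:

```hcl
# Security group for the Fargate tasks.
# var.alb_security_group_id is an assumed input (the ALB's SG).
resource "aws_security_group" "tasks" {
  name   = "${local.full_name}-tasks"
  vpc_id = var.vpc_id

  # Only the load balancer may reach the container port
  ingress {
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [var.alb_security_group_id]
  }

  # Task-to-task traffic for Cloud Map service discovery calls
  ingress {
    from_port = var.container_port
    to_port   = var.container_port
    protocol  = "tcp"
    self      = true
  }

  # Outbound: ECR image pulls, Secrets Manager, CloudWatch Logs
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```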
# Two target groups for blue/green
resource "aws_lb_target_group" "blue" {
name = "${local.full_name}-blue"
port = var.container_port
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip" # Required for Fargate awsvpc mode
health_check {
path = var.health_check_path
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
matcher = "200-299"
}
deregistration_delay = 30 # Wait 30s after removing from LB (connection draining)
}
resource "aws_lb_target_group" "green" {
name = "${local.full_name}-green"
port = var.container_port
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip"
health_check {
path = var.health_check_path
matcher = "200-299"
}
deregistration_delay = 30
}
CodeDeploy Blue/Green Configuration
resource "aws_codedeploy_app" "this" {
compute_platform = "ECS"
name = local.full_name
}
resource "aws_codedeploy_deployment_group" "this" {
app_name = aws_codedeploy_app.this.name
deployment_group_name = local.full_name
service_role_arn = aws_iam_role.codedeploy.arn
deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"
# Required for ECS: blue/green with traffic control (the default
# IN_PLACE style is rejected for the ECS compute platform)
deployment_style {
deployment_option = "WITH_TRAFFIC_CONTROL"
deployment_type = "BLUE_GREEN"
}
ecs_service {
cluster_name = data.aws_ecs_cluster.main.name
service_name = aws_ecs_service.this.name
}
load_balancer_info {
target_group_pair_info {
prod_traffic_route {
listener_arns = [aws_lb_listener.https.arn]
}
test_traffic_route {
# Test traffic on port 8080 before switching production
listener_arns = [aws_lb_listener.test.arn]
}
target_group { name = aws_lb_target_group.blue.name }
target_group { name = aws_lb_target_group.green.name }
}
}
blue_green_deployment_config {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
wait_time_in_minutes = 0 # Auto-proceed (change to STOP_DEPLOYMENT for manual approval)
}
terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 5 # Keep blue running 5min after cutover (for rollback)
}
}
# Roll back automatically if the deployment fails its health checks
auto_rollback_configuration {
enabled = true
events = ["DEPLOYMENT_FAILURE"]
}
}
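The deployment group references `aws_iam_role.codedeploy`, which is not shown above. A minimal sketch: CodeDeploy assumes the role and gets the AWS-managed `AWSCodeDeployRoleForECS` policy, which covers updating ECS services and ALB listeners:

```hcl
# Service role referenced by the deployment group above
resource "aws_iam_role" "codedeploy" {
  name = "${local.full_name}-codedeploy"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "codedeploy.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# AWS-managed policy for ECS blue/green deployments
resource "aws_iam_role_policy_attachment" "codedeploy_ecs" {
  role       = aws_iam_role.codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}
```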
Service Discovery with Cloud Map
# Internal service-to-service communication without going through ALB
resource "aws_service_discovery_private_dns_namespace" "main" {
name = "internal.production"
vpc = var.vpc_id
}
resource "aws_service_discovery_service" "this" {
name = var.service_name
dns_config {
namespace_id = aws_service_discovery_private_dns_namespace.main.id
dns_records {
ttl = 10
type = "A"
}
routing_policy = "MULTIVALUE"
}
health_check_custom_config {
failure_threshold = 1
}
}
# Service is now reachable at:
# api-server.internal.production:3000 (from other containers in the VPC)
// src/lib/internal-client.ts
// Use Cloud Map DNS for internal service calls (no ALB hop)
export const internalApiClient = {
baseUrl: process.env.NODE_ENV === "production"
? "http://api-server.internal.production:3000" // Cloud Map
: "http://localhost:3000",
};
Container Insights Monitoring
# Enable Container Insights for the cluster
resource "aws_ecs_cluster" "main" {
name = "production"
setting {
name = "containerInsights"
value = "enabled"
}
}
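Dashboards are only half of monitoring; you also want alarms on the standard `AWS/ECS` service metrics. A sketch of a memory alarm; the 80% threshold and the `var.alerts_sns_topic_arn` input are assumptions to tune for your workload:

```hcl
# Alarm when average service memory stays above 80% for 5 minutes
resource "aws_cloudwatch_metric_alarm" "high_memory" {
  alarm_name          = "${local.full_name}-memory-high"
  namespace           = "AWS/ECS"
  metric_name         = "MemoryUtilization"
  dimensions = {
    ClusterName = "production"
    ServiceName = local.full_name
  }
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 5
  threshold           = 80
  comparison_operator = "GreaterThanThreshold"
  alarm_actions       = [var.alerts_sns_topic_arn] # assumed input
}
```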
# Useful CloudWatch Insights queries for ECS
# P99 task CPU utilization by service
fields @timestamp, ServiceName, CpuUtilized, CpuReserved
| filter Type = "Task"
| stats pct(CpuUtilized/CpuReserved*100, 99) as p99_cpu by ServiceName
| sort p99_cpu desc
# Tasks with OOM kills (memory exceeded limit)
fields @timestamp, ServiceName, TaskId
| filter Type = "Task" AND StopCode = "OutOfMemoryError"
| stats count() as oom_count by ServiceName
| sort oom_count desc
# Average task startup time (from PENDING to RUNNING)
fields @timestamp, ServiceName, LaunchType, @message
| filter Type = "Task" AND CurrentStatus = "RUNNING"
| stats avg(startedAt - createdAt) as avg_startup_ms by ServiceName
Deployment Script
#!/bin/bash
# scripts/deploy.sh — deploy new image via CodeDeploy blue/green
set -euo pipefail
SERVICE_NAME=${1:?usage: deploy.sh <service-name> <image-tag>}
IMAGE_TAG=${2:?usage: deploy.sh <service-name> <image-tag>}
CLUSTER="production"
REGION="us-east-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# Get current task definition
TASK_DEF=$(aws ecs describe-task-definition \
--task-definition "${SERVICE_NAME}-production" \
--query "taskDefinition" \
--output json)
# Update image tag in task definition
NEW_TASK_DEF=$(echo "$TASK_DEF" | jq \
--arg IMAGE "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${SERVICE_NAME}:${IMAGE_TAG}" \
'.containerDefinitions[0].image = $IMAGE | del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')
# Register new task definition revision
NEW_TASK_ARN=$(aws ecs register-task-definition \
--cli-input-json "$NEW_TASK_DEF" \
--query "taskDefinition.taskDefinitionArn" \
--output text)
echo "Registered new task definition: $NEW_TASK_ARN"
# Create CodeDeploy deployment (blue/green)
# Build the AppSpec and revision with jq instead of hand-escaping a
# nested JSON string, which is fragile under shell quoting
APPSPEC=$(jq -n --arg td "$NEW_TASK_ARN" --arg cn "$SERVICE_NAME" '{
version: "0.0",
Resources: [{
TargetService: {
Type: "AWS::ECS::Service",
Properties: {
TaskDefinition: $td,
LoadBalancerInfo: { ContainerName: $cn, ContainerPort: 3000 }
}
}
}]
}')
REVISION=$(jq -n --arg spec "$APPSPEC" \
'{revisionType: "AppSpecContent", appSpecContent: {content: $spec}}')
DEPLOYMENT_ID=$(aws deploy create-deployment \
--application-name "${SERVICE_NAME}-production" \
--deployment-group-name "${SERVICE_NAME}-production" \
--revision "$REVISION" \
--query "deploymentId" \
--output text)
echo "Started deployment: $DEPLOYMENT_ID"
# Wait for deployment to complete
aws deploy wait deployment-successful --deployment-id "$DEPLOYMENT_ID"
echo "Deployment $DEPLOYMENT_ID succeeded"
See Also
- Kubernetes Cost Optimization — ECS vs EKS tradeoffs
- Terraform Module Design — IaC for ECS
- Database Connection Pooling — RDS Proxy with Fargate
- Zero-Trust Security — network security for ECS
Working With Viprasol
ECS Fargate reduces operational overhead significantly compared to self-managed EC2, but getting production-grade deployments right — blue/green with proper health checks, secrets injection from Secrets Manager, Container Insights monitoring — requires careful setup. Our AWS engineers design and implement ECS architectures from scratch or migrate existing EC2 workloads to Fargate.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.