AWS Step Functions: State Machines, Error Handling, Parallel Execution, and Lambda Orchestration

Build production AWS Step Functions workflows: state machine design, Lambda orchestration, error handling with retry/catch, parallel execution, Map state for batch processing, and Terraform IaC.

Viprasol Tech Team
November 25, 2026
13 min read

Complex multi-step serverless workflows — order processing, document pipelines, ML training jobs — are hard to build reliably with Lambda alone. You end up passing state through SQS queues, tracking job progress in DynamoDB, and hand-rolling retry logic. AWS Step Functions provides a managed state machine that handles sequencing, parallel execution, retry with backoff, error catching, and long-running workflows (up to a year) without you managing any of that infrastructure.

This post covers real Step Functions patterns: Express vs Standard workflows, sequential and parallel states, retry/catch configuration, Map state for batch processing, and Terraform IaC for the whole thing.

Express vs Standard Workflows

| Feature | Standard | Express |
|---|---|---|
| Max duration | 1 year | 5 minutes |
| Execution model | Exactly-once | At-least-once |
| Pricing | $0.025/1K state transitions | $1/M executions + duration |
| Use for | Business processes, long-running | High-volume, short-lived |
| Audit history | Full (90-day) | CloudWatch only |

Standard: Order processing, user provisioning, document pipelines. Express: Real-time event processing, API orchestration, streaming data.
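
The comparison above can be read as a simple decision rule. The sketch below is one possible encoding of it, not an AWS API: `WorkflowRequirements`, `chooseWorkflowType`, and the three inputs are hypothetical names chosen for illustration, and the only hard limit used is the 5-minute Express maximum from the table.

```typescript
// Hypothetical helper: the fields and rule order mirror the comparison table,
// they are not part of any AWS SDK.
type WorkflowType = 'STANDARD' | 'EXPRESS';

interface WorkflowRequirements {
  maxDurationSeconds: number;     // longest execution you expect
  needsExactlyOnce: boolean;      // e.g. the workflow charges a card
  needsExecutionHistory: boolean; // audit trail beyond CloudWatch logs
}

const EXPRESS_MAX_DURATION_SECONDS = 5 * 60;

function chooseWorkflowType(req: WorkflowRequirements): WorkflowType {
  // Anything that can outlive the Express limit has to be Standard.
  if (req.maxDurationSeconds > EXPRESS_MAX_DURATION_SECONDS) return 'STANDARD';
  // Exactly-once semantics and full execution history are Standard-only.
  if (req.needsExactlyOnce || req.needsExecutionHistory) return 'STANDARD';
  return 'EXPRESS';
}
```

Under these assumptions an order workflow that charges payment resolves to `STANDARD`, while a 30-second enrichment step with no audit requirement resolves to `EXPRESS`.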


1. State Machine Definition (ASL)

// infrastructure/step-functions/order-processing.asl.json
{
  "Comment": "Order processing workflow with payment, inventory, and fulfillment",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "${ValidateOrderFunctionArn}",
        "Payload.$": "$"
      },
      "ResultPath": "$.validation",
      "Retry": [
        {
          "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0,
          "JitterStrategy": "FULL"
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["ValidationError"],
          "ResultPath": "$.error",
          "Next": "OrderFailed"
        },
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "OrderFailed"
        }
      ],
      "Next": "CheckInventoryAndCharge"
    },

    "CheckInventoryAndCharge": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "CheckInventory",
          "States": {
            "CheckInventory": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "${CheckInventoryFunctionArn}",
                "Payload.$": "$"
              },
              "ResultPath": "$.inventory",
              "Retry": [
                {
                  "ErrorEquals": ["States.TaskFailed"],
                  "IntervalSeconds": 1,
                  "MaxAttempts": 2,
                  "BackoffRate": 2.0
                }
              ],
              "End": true
            }
          }
        },
        {
          "StartAt": "ChargePayment",
          "States": {
            "ChargePayment": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "${ChargePaymentFunctionArn}",
                "Payload.$": "$"
              },
              "ResultPath": "$.payment",
              "Retry": [
                {
                  "ErrorEquals": ["Lambda.ServiceException"],
                  "IntervalSeconds": 2,
                  "MaxAttempts": 3,
                  "BackoffRate": 2.0
                }
              ],
              "Catch": [
                {
                  "ErrorEquals": ["PaymentDeclinedError"],
                  "ResultPath": "$.error",
                  "Next": "HandleDeclinedPayment"
                }
              ],
              "End": true
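            },
            "HandleDeclinedPayment": {
              "Type": "Task",
              "Resource": "arn:aws:states:::lambda:invoke",
              "Parameters": {
                "FunctionName": "${NotifyPaymentDeclinedFunctionArn}",
                "Payload.$": "$"
              },
              "Next": "PaymentDeclined"
            },
            "PaymentDeclined": {
              "Type": "Fail",
              "Error": "PaymentDeclinedError",
              "Cause": "Payment was declined; failing the branch so the order is not fulfilled"
            }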
          }
        }
      ],
      "ResultPath": "$.parallelResults",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "CompensatingTransaction"
        }
      ],
      "Next": "FulfillOrder"
    },

    "FulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "${FulfillOrderFunctionArn}",
        "Payload.$": "$"
      },
      "ResultPath": "$.fulfillment",
      "Next": "SendConfirmation"
    },

    "SendConfirmation": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "${OrderConfirmationTopicArn}",
        "Message.$": "States.JsonToString($.fulfillment)"
      },
      "Next": "OrderComplete"
    },

    "OrderComplete": {
      "Type": "Succeed"
    },

    "CompensatingTransaction": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "${RollbackOrderFunctionArn}",
        "Payload.$": "$"
      },
      "Next": "OrderFailed"
    },

    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be processed"
    }
  }
}
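
To make the `Retry` fields concrete, here is a minimal sketch that computes the nominal wait before each retry from `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. The `RetryPolicy` type and `retryDelays` function are illustrative names, not part of any SDK. The sketch ignores jitter: with `"JitterStrategy": "FULL"`, the actual wait is randomized between zero and the nominal value.

```typescript
// Illustrative only: mirrors the three ASL Retry fields used in the definition above.
interface RetryPolicy {
  intervalSeconds: number; // ASL IntervalSeconds: wait before the first retry
  maxAttempts: number;     // ASL MaxAttempts: maximum number of retries
  backoffRate: number;     // ASL BackoffRate: multiplier applied after each retry
}

// Returns the nominal delay (in seconds) before each retry, without jitter.
function retryDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < policy.maxAttempts; attempt++) {
    delays.push(policy.intervalSeconds * Math.pow(policy.backoffRate, attempt));
  }
  return delays;
}

// The ValidateOrder policy: IntervalSeconds 2, MaxAttempts 3, BackoffRate 2.0
const validateOrderRetry = retryDelays({ intervalSeconds: 2, maxAttempts: 3, backoffRate: 2.0 });
// validateOrderRetry is [2, 4, 8]
```

So a `ValidateOrder` task that keeps hitting `Lambda.ServiceException` waits at most about 14 seconds in total before the error falls through to `Catch`.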

2. Lambda Handler Patterns

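// src/functions/order/validate-order.ts

// Minimal address shape assumed for this example; only `country` is checked below
interface Address {
  country: string;
}

interface OrderInput {
  orderId: string;
  userId: string;
  items: Array<{ productId: string; quantity: number }>;
  shippingAddress: Address;
}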

interface ValidationResult {
  isValid: boolean;
  errors: string[];
}

export const handler = async (event: OrderInput): Promise<ValidationResult> => {
  const errors: string[] = [];

  if (!event.orderId) errors.push('Missing orderId');
  if (!event.items?.length) errors.push('Order must have at least one item');
  if (!event.shippingAddress?.country) errors.push('Invalid shipping address');

  if (errors.length > 0) {
    // Throw named error — Step Functions Catch can match on this
    const err = new Error(errors.join('; '));
    err.name = 'ValidationError';
    throw err;
  }

  return { isValid: true, errors: [] };
};

// src/functions/order/charge-payment.ts
export const handler = async (event: OrderInput & { validation: ValidationResult }) => {
  const { stripe } = await import('../../lib/stripe');

  const amount = await calculateOrderTotal(event.items);

  try {
    const paymentIntent = await stripe.paymentIntents.create(
      {
        amount,
        currency: 'usd',
        customer: await getStripeCustomerId(event.userId),
        confirm: true,
        automatic_payment_methods: { enabled: true, allow_redirects: 'never' },
        metadata: { orderId: event.orderId },
      },
      // The idempotency key is a request option (second argument), not a body parameter.
      // Critical for retry safety: a retried Task reuses the same key.
      { idempotencyKey: `charge-${event.orderId}` },
    );

    return {
      paymentIntentId: paymentIntent.id,
      amount,
      status: paymentIntent.status,
    };
  } catch (err: any) {
    if (err.code === 'card_declined') {
      const declinedError = new Error('Payment declined');
      declinedError.name = 'PaymentDeclinedError';
      throw declinedError;
    }
    throw err; // Bubble up other errors for retry
  }
};
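
Both handlers set `err.name` by hand before throwing. One way to avoid repeating that is a small error hierarchy, sketched below. `WorkflowError` and its subclasses are hypothetical names introduced here, not part of the original handlers. The name is passed as an explicit string so it stays stable if the bundle is minified.

```typescript
// Hypothetical base class: every subclass passes the exact string that
// the state machine's ErrorEquals entries match on.
class WorkflowError extends Error {
  constructor(name: string, message: string) {
    super(message);
    this.name = name; // explicit string, unaffected by class-name mangling
  }
}

class ValidationError extends WorkflowError {
  constructor(message: string) {
    super('ValidationError', message);
  }
}

class PaymentDeclinedError extends WorkflowError {
  constructor(message: string) {
    super('PaymentDeclinedError', message);
  }
}

// Usage inside a handler: throw new ValidationError(errors.join('; '));
```

The strings passed to `super` need to match the `ErrorEquals` values in the state machine definition exactly, since that is what `Catch` compares against.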

3. Map State for Batch Processing

// Process each item in a batch concurrently (up to MaxConcurrency iterations at once)
"ProcessLineItems": {
  "Type": "Map",
  "ItemsPath": "$.items",
  "ItemSelector": {
    "item.$": "$$.Map.Item.Value",
    "orderId.$": "$.orderId",
    "index.$": "$$.Map.Item.Index"
  },
  "MaxConcurrency": 10,
  "ItemProcessor": {
    "ProcessorConfig": { "Mode": "INLINE" },
    "StartAt": "ProcessItem",
    "States": {
      "ProcessItem": {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
          "FunctionName": "${ProcessLineItemFunctionArn}",
          "Payload.$": "$"
        },
        "Retry": [
          {
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 1,
            "MaxAttempts": 3,
            "BackoffRate": 2.0
          }
        ],
        "End": true
      }
    }
  },
  "ResultPath": "$.processedItems",
  "Next": "AggregateResults"
}
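
To see what each iteration receives, the sketch below reproduces the effect of the `ItemSelector` in plain TypeScript: one input object per array element, carrying the element, its index, and the shared `orderId`. `buildIterationInputs` and the interfaces are illustrative names. Step Functions performs this projection itself, so nothing like this runs in your Lambda.

```typescript
interface LineItem {
  productId: string;
  quantity: number;
}

// Shape of the payload each ProcessItem invocation receives, per the ItemSelector.
interface IterationInput {
  item: LineItem;  // $$.Map.Item.Value
  orderId: string; // $.orderId from the Map state's input
  index: number;   // $$.Map.Item.Index (zero-based)
}

// Illustrative equivalent of the ItemSelector projection.
function buildIterationInputs(orderId: string, items: LineItem[]): IterationInput[] {
  return items.map((item, index) => ({ item, orderId, index }));
}

const iterationInputs = buildIterationInputs('ord-123', [
  { productId: 'sku-a', quantity: 1 },
  { productId: 'sku-b', quantity: 2 },
]);
```

Writing the handler for `ProcessItem` against a type like `IterationInput` keeps the Lambda signature in step with the selector.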

4. Terraform Configuration

# infrastructure/step-functions/main.tf

# Load ASL definition from file and substitute Lambda ARNs
locals {
  asl = templatefile(
    "${path.module}/order-processing.asl.json",
    {
      ValidateOrderFunctionArn         = aws_lambda_function.validate_order.arn
      CheckInventoryFunctionArn        = aws_lambda_function.check_inventory.arn
      ChargePaymentFunctionArn         = aws_lambda_function.charge_payment.arn
      FulfillOrderFunctionArn          = aws_lambda_function.fulfill_order.arn
      NotifyPaymentDeclinedFunctionArn = aws_lambda_function.notify_declined.arn
      RollbackOrderFunctionArn         = aws_lambda_function.rollback_order.arn
      OrderConfirmationTopicArn        = aws_sns_topic.order_confirmation.arn
      ProcessLineItemFunctionArn       = aws_lambda_function.process_line_item.arn
    }
  )
}

resource "aws_sfn_state_machine" "order_processing" {
  name     = "${var.project}-order-processing"
  role_arn = aws_iam_role.step_functions.arn

  definition = local.asl

  type = "STANDARD"  # Use STANDARD for order processing (exactly-once)

  logging_configuration {
    log_destination        = "${aws_cloudwatch_log_group.sfn_logs.arn}:*"
    include_execution_data = true
    level                  = "ERROR"  # ALL in dev, ERROR in prod
  }

  tracing_configuration {
    enabled = true  # X-Ray tracing
  }

  tags = {
    Environment = var.environment
    Project     = var.project
  }
}

# CloudWatch log group
resource "aws_cloudwatch_log_group" "sfn_logs" {
  name              = "/aws/states/${var.project}-order-processing"
  retention_in_days = 30
}

# IAM role for Step Functions to invoke Lambda + SNS
resource "aws_iam_role" "step_functions" {
  name = "${var.project}-step-functions-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "states.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "step_functions_policy" {
  name = "${var.project}-sfn-policy"
  role = aws_iam_role.step_functions.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["lambda:InvokeFunction"]
        Resource = [
          aws_lambda_function.validate_order.arn,
          aws_lambda_function.check_inventory.arn,
          aws_lambda_function.charge_payment.arn,
          aws_lambda_function.fulfill_order.arn,
          aws_lambda_function.notify_declined.arn,
          aws_lambda_function.rollback_order.arn,
          aws_lambda_function.process_line_item.arn,
        ]
      },
      {
        Effect   = "Allow"
        Action   = ["sns:Publish"]
        Resource = [aws_sns_topic.order_confirmation.arn]
      },
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogDelivery",
          "logs:PutLogEvents",
          "logs:GetLogDelivery",
          "logs:UpdateLogDelivery",
          "logs:DeleteLogDelivery",
          "logs:ListLogDeliveries",
          "logs:PutResourcePolicy",
          "logs:DescribeResourcePolicies",
          "logs:DescribeLogGroups",
        ]
        Resource = "*"
      },
      {
        Effect   = "Allow"
        Action   = ["xray:PutTraceSegments", "xray:PutTelemetryRecords"]
        Resource = "*"
      }
    ]
  })
}

# Expose the state machine ARN so callers (for example, a Lambda trigger) can start executions
output "state_machine_arn" {
  value = aws_sfn_state_machine.order_processing.arn
}

5. Starting Executions from TypeScript

// src/lib/workflows/order.ts
import {
  SFNClient,
  StartExecutionCommand,
  DescribeExecutionCommand,
} from '@aws-sdk/client-sfn';

const sfn = new SFNClient({ region: process.env.AWS_REGION });

export async function startOrderProcessing(order: OrderInput): Promise<string> {
  const command = new StartExecutionCommand({
    stateMachineArn: process.env.ORDER_STATE_MACHINE_ARN!,
    // Deterministic name makes retries safe for Standard workflows: repeating the call
    // with the same name and input while the execution is running returns that execution.
    // A different input, or an execution that has already closed, raises
    // ExecutionAlreadyExists instead.
    name: `order-${order.orderId}`,
    input: JSON.stringify(order),
  });

  const result = await sfn.send(command);
  return result.executionArn!;
}

// Check execution status

export async function getExecutionStatus(executionArn: string) {
  const result = await sfn.send(
    new DescribeExecutionCommand({ executionArn })
  );

  return {
    status: result.status,         // RUNNING | SUCCEEDED | FAILED | TIMED_OUT | ABORTED
    startedAt: result.startDate,
    completedAt: result.stopDate,
    output: result.output ? JSON.parse(result.output) : null,
    error: result.error,
    cause: result.cause,
  };
}
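
Execution names are limited to 80 characters and reject whitespace and a range of special characters, so an order ID taken straight from user-facing data may not be usable as-is. The sketch below is one conservative way to derive a name. `toExecutionName` is a hypothetical helper, and the allowed set here (letters, digits, hyphen, underscore) is deliberately narrower than what the service accepts.

```typescript
const MAX_EXECUTION_NAME_LENGTH = 80;

// Hypothetical helper: maps anything outside a conservative safe set to '-'
// and truncates to the service's length limit.
function toExecutionName(prefix: string, id: string): string {
  const raw = prefix + '-' + id;
  const safe = raw.replace(/[^A-Za-z0-9_-]/g, '-');
  return safe.slice(0, MAX_EXECUTION_NAME_LENGTH);
}

const executionName = toExecutionName('order', 'ab/12 34');
// executionName is "order-ab-12-34"
```

Replacement and truncation can make two different IDs collide on the same name. If your IDs can be long or contain many special characters, hashing the ID into the name would be a safer choice than this sketch.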

Cost Reference

| Workflow type | Scale | Monthly cost | Notes |
|---|---|---|---|
| Standard | 10K executions, 10 states each | ~$2.50 | $0.025/1K transitions |
| Standard | 1M executions, 10 states | ~$250 | Consider Express for high volume |
| Express | 100M executions, 5s each | ~$350 | $1/M + $0.00001667/GB-second |
| Express | 1B events/month | ~$1,200 | Compare to SQS+Lambda DIY |
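
The Standard figures follow directly from the per-transition price. A minimal sketch of that arithmetic is below, assuming one billed transition per state, as the figures above do. `estimateStandardMonthlyCost` is an illustrative name. Express is left out because its duration charge depends on configured memory and volume, which this reference does not specify.

```typescript
const STANDARD_PRICE_PER_1K_TRANSITIONS = 0.025; // USD, Standard workflows

// Rough monthly estimate for Standard workflows; ignores any free tier.
function estimateStandardMonthlyCost(
  executions: number,
  transitionsPerExecution: number,
): number {
  const transitions = executions * transitionsPerExecution;
  return (transitions / 1000) * STANDARD_PRICE_PER_1K_TRANSITIONS;
}

const smallWorkload = estimateStandardMonthlyCost(10000, 10);   // about 2.50 USD
const largeWorkload = estimateStandardMonthlyCost(1000000, 10); // about 250 USD
```

Retries also count as transitions, so a workflow that retries often costs more than its state count alone suggests.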

Working With Viprasol

Building complex multi-step serverless workflows that need reliable orchestration, compensating transactions on failure, and parallel execution? We design and implement AWS Step Functions state machines for your business processes — with proper error handling, retry strategies, Terraform IaC, and CloudWatch observability.
