
Serverless Cost Optimization: Lambda Cold Starts, Provisioned Concurrency, and Right-Sizing

Reduce AWS Lambda costs and eliminate cold starts — memory right-sizing, provisioned concurrency, ARM Graviton, Lambda layers, reserved concurrency, and when serverless costs more than EC2.

Viprasol Tech Team
May 2, 2026
12 min read


AWS Lambda pricing seems simple: pay per invocation and per GB-second of execution. In practice, serverless bills surprise teams constantly — either through unexpectedly high costs from inefficient functions or through cold start latency that degrades user experience.

This guide covers the techniques that cut Lambda costs 40–70% and eliminate cold start issues without abandoning serverless.


Lambda Pricing Basics (2026)

| Resource | Price |
|---|---|
| Requests | $0.20 per 1M requests |
| Duration (x86) | $0.0000166667 per GB-second |
| Duration (ARM/Graviton2) | $0.0000133334 per GB-second (20% cheaper) |
| Provisioned Concurrency | $0.0000041667 per GB-second (allocated) |
| Free tier | 1M requests + 400,000 GB-seconds per month |

Example cost: API handling 10M requests/month, 200ms avg duration, 512MB memory:

Requests: 10M × $0.20/1M = $2.00
Duration: 10M × 0.2s × 0.5GB × $0.0000166667 = $16.67
Total: ~$18.67/month

Same workload on ARM Graviton:

Duration: 10M × 0.2s × 0.5GB × $0.0000133334 = $13.33
Total: ~$15.33/month (18% cheaper, same compute)
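As a sanity check, the arithmetic above can be wrapped in a small helper (a sketch; the rate constants are the per-GB-second prices quoted in the table, and the free tier is deliberately ignored):

```python
# Published duration rates, per GB-second (see the pricing table above)
PRICE_X86 = 0.0000166667
PRICE_ARM = 0.0000133334
PRICE_PER_MILLION_REQUESTS = 0.20

def lambda_monthly_cost(invocations: int, avg_duration_s: float,
                        memory_gb: float, gb_second_price: float) -> float:
    """Estimate monthly Lambda cost (requests + duration), ignoring the free tier."""
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = invocations * avg_duration_s * memory_gb * gb_second_price
    return request_cost + duration_cost

# 10M requests/month, 200ms average, 512MB
x86 = lambda_monthly_cost(10_000_000, 0.2, 0.5, PRICE_X86)  # ~$18.67
arm = lambda_monthly_cost(10_000_000, 0.2, 0.5, PRICE_ARM)  # ~$15.33
print(f"x86: ${x86:.2f}  arm64: ${arm:.2f}")
```

Plug in your own invocation counts and durations before committing to an architecture switch.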

Memory Right-Sizing

Lambda charges for memory × duration. The counterintuitive finding: more memory often costs less, because higher memory = more CPU = faster execution.

# benchmark_lambda.py — test your function at different memory settings
# Prefer AWS Lambda Power Tuning (a Step Functions state machine) for this:
# https://github.com/alexcasalboni/aws-lambda-power-tuning

import base64
import json
import re

import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

def benchmark_memory(function_name: str, test_payload: dict, memory_sizes: list[int]):
    results = []

    for memory_mb in memory_sizes:
        # Update function memory and wait until the new config is active
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory_mb,
        )
        lambda_client.get_waiter('function_updated').wait(FunctionName=function_name)

        # Run multiple invocations and average the billed duration
        durations = []
        for _ in range(10):
            response = lambda_client.invoke(
                FunctionName=function_name,
                Payload=json.dumps(test_payload),
                LogType='Tail',  # Include the log tail (base64) in the response
            )
            log = base64.b64decode(response['LogResult']).decode('utf-8')
            # Parse the REPORT line from the Lambda log:
            # REPORT RequestId: ... Duration: 45.23 ms Billed Duration: 46 ms ...
            match = re.search(r'Billed Duration:\s*([\d.]+)\s*ms', log)
            if match:
                durations.append(float(match.group(1)))

        avg_duration_ms = sum(durations) / len(durations)
        gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
        cost_per_million = gb_seconds * 0.0000166667 * 1_000_000

        results.append({
            'memory_mb': memory_mb,
            'avg_duration_ms': avg_duration_ms,
            'cost_per_million_invocations': cost_per_million,
        })
        print(f"Memory: {memory_mb}MB | Duration: {avg_duration_ms:.1f}ms | Cost/1M: ${cost_per_million:.2f}")

    return results

Use AWS Lambda Power Tuning — it automates this benchmark across memory settings and produces a cost/performance graph. Most teams find their sweet spot is 512MB–1024MB for Node.js/Python, 1024MB–2048MB for JVM-based functions.
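Once you have benchmark results, picking a configuration is one line: the cheapest memory setting that still meets your latency budget. A sketch operating on the `results` list shape produced by the benchmark above (the example numbers are made up but internally consistent with the x86 rate):

```python
def cheapest_within_sla(results: list[dict], max_duration_ms: float) -> dict:
    """Cheapest memory setting whose average duration meets the latency budget."""
    candidates = [r for r in results if r['avg_duration_ms'] <= max_duration_ms]
    if not candidates:
        raise ValueError(f"No configuration meets the {max_duration_ms}ms budget")
    return min(candidates, key=lambda r: r['cost_per_million_invocations'])

# Example with hypothetical benchmark numbers:
results = [
    {'memory_mb': 256,  'avg_duration_ms': 420.0, 'cost_per_million_invocations': 1.75},
    {'memory_mb': 512,  'avg_duration_ms': 180.0, 'cost_per_million_invocations': 1.50},
    {'memory_mb': 1024, 'avg_duration_ms': 110.0, 'cost_per_million_invocations': 1.83},
]
best = cheapest_within_sla(results, max_duration_ms=200)
print(best)  # picks the 512MB row: faster than 256MB and cheaper than 1024MB
```

Note how the 256MB row is both slower and more expensive than 512MB, which is exactly the counterintuitive effect described above.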


☁️ Is Your Cloud Costing Too Much?

Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.

  • AWS, GCP, Azure certified engineers
  • Infrastructure as Code (Terraform, CDK)
  • Docker, Kubernetes, GitHub Actions CI/CD
  • Typical audit recovers $500–$3,000/month in savings

Cold Starts: Root Causes and Solutions

A cold start happens when Lambda needs to initialize a new execution environment — download your code, start the runtime, run initialization code. This adds 100ms–5s of latency on top of your function's actual execution time.

Cold start latency by runtime (typical):

| Runtime | Cold Start | Warm Execution |
|---|---|---|
| Node.js 20 | 150–400ms | 5–50ms |
| Python 3.12 | 100–300ms | 5–30ms |
| Go 1.21 | 50–150ms | 1–10ms |
| Java 21 (with SnapStart) | 500ms–1s → ~100ms | 10–100ms |
| Java 21 (without SnapStart) | 3–10s | 10–100ms |
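Before optimizing, measure: module-level code runs once per execution environment, so a module-level flag tells you exactly which invocations hit a cold start. A minimal sketch (the `cold_start` and `env_age_s` response fields are our own convention, not a Lambda API):

```python
import time

# Module scope runs once per execution environment — i.e. only on cold start
_INIT_TIME = time.monotonic()
_is_cold = True

def handler(event, context):
    global _is_cold
    cold = _is_cold
    _is_cold = False  # Every later invocation in this environment is warm
    # In a real function, emit these as structured-log fields or metric dimensions
    return {
        'cold_start': cold,
        'env_age_s': round(time.monotonic() - _INIT_TIME, 1),
    }
```

Graph the `cold_start` rate before and after each change below; it tells you which optimization actually paid off.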

Solutions by approach:

1. Reduce Package Size

Smaller deployment packages initialize faster. The cold start is partly I/O — loading your code from S3.

# Audit your bundle
npx source-map-explorer dist/function.js

# Common wins:
# - Use bundler (esbuild/webpack) instead of deploying node_modules/
# - Tree-shake unused imports
# - Move large static assets to S3 (not the Lambda package)
# - Use Lambda Layers for shared dependencies

# Target: < 5MB for Node.js, < 50MB zipped total

// esbuild.config.ts — bundle to a single file
import { build } from 'esbuild';

await build({
  entryPoints: ['src/handler.ts'],
  bundle: true,
  platform: 'node',
  target: 'node20',
  outfile: 'dist/handler.js',
  external: [
    // Don't bundle AWS SDK v3 (available in Lambda runtime)
    '@aws-sdk/*',
  ],
  minify: true,
  sourcemap: 'external',
});

2. Move Heavy Init Outside the Handler

// ❌ Bad: a new DB connection created and torn down on every single invocation
export const handler = async (event: APIGatewayEvent) => {
  const db = new Pool({ connectionString: process.env.DATABASE_URL });
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.pathParameters?.id]);
  await db.end();
  return { statusCode: 200, body: JSON.stringify(result.rows[0]) };
};

// ✅ Good: DB connection created once, reused across invocations
import { Pool } from 'pg';

// Module-level initialization — runs once per execution environment
const db = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 2,  // Lambda: keep pool small (1-2 connections per function)
});

export const handler = async (event: APIGatewayEvent) => {
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.pathParameters?.id]);
  return { statusCode: 200, body: JSON.stringify(result.rows[0]) };
};

3. Provisioned Concurrency

For latency-sensitive functions (user-facing APIs), pre-warm a fixed number of execution environments:

# terraform/lambda.tf
resource "aws_lambda_function" "api" {
  function_name = "api-handler"
  runtime       = "nodejs20.x"
  architectures = ["arm64"]  # Graviton — 20% cheaper
  memory_size   = 512
  timeout       = 30

  # ... rest of config
}

# Provisioned concurrency — keeps N environments warm
resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                  = aws_lambda_function.api.function_name
  qualifier                      = aws_lambda_alias.api_live.name
  provisioned_concurrent_executions = 5  # 5 warm environments
}

# Auto-scale provisioned concurrency with traffic patterns
resource "aws_appautoscaling_target" "lambda_concurrency" {
  max_capacity       = 50
  min_capacity       = 2
  resource_id        = "function:${aws_lambda_function.api.function_name}:live"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"
}

resource "aws_appautoscaling_policy" "lambda_concurrency" {
  name               = "lambda-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_concurrency.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_concurrency.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_concurrency.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
    target_value = 0.7  # Scale up when 70% of provisioned capacity is in use
  }
}

Provisioned concurrency cost: ~$0.0000041667/GB-sec allocated (not invoked) — about 25% of execution cost. For 5 × 512MB functions running 24/7: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day = $27/month. Worth it if cold starts cause user-facing latency.
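The same arithmetic is worth running for your own function count and memory size before enabling it everywhere (a sketch using the allocated-GB-second rate from the pricing table; invocation duration is still billed separately, at a reduced rate):

```python
PC_PRICE_PER_GB_S = 0.0000041667  # Provisioned concurrency, per GB-second allocated

def provisioned_concurrency_monthly(envs: int, memory_gb: float,
                                    hours_per_day: float = 24.0,
                                    days: int = 30) -> float:
    """Monthly allocation cost of keeping `envs` execution environments warm."""
    seconds = hours_per_day * 3600 * days
    return envs * memory_gb * seconds * PC_PRICE_PER_GB_S

# 5 warm 512MB environments, 24/7 — matches the ~$27/month figure above
print(f"${provisioned_concurrency_monthly(5, 0.5):.2f}/month")
# Scheduled scaling: only keep them warm 12h/day and the cost halves
print(f"${provisioned_concurrency_monthly(5, 0.5, hours_per_day=12):.2f}/month")
```

Combining provisioned concurrency with a schedule (scale to zero overnight) is often the cheapest way to get warm starts during business hours.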

4. Java SnapStart

For Java Lambda functions, SnapStart takes a snapshot of the initialized JVM state and restores it on cold start — reducing cold start from 3–10 seconds to ~100ms:

# AWS SAM template
Resources:
  JavaApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      SnapStart:
        ApplyOn: PublishedVersions
      # ... rest of config

Lambda Layers: Shared Dependencies

Lambda Layers let you share code and dependencies across functions without including them in every deployment package:

# Create a layer with shared dependencies
mkdir -p layer/nodejs
cd layer/nodejs
npm install pg ioredis zod  # Shared dependencies
cd ..
zip -r layer.zip nodejs/

aws lambda publish-layer-version \
  --layer-name shared-deps \
  --zip-file fileb://layer.zip \
  --compatible-runtimes nodejs20.x \
  --compatible-architectures arm64

# Attach the layer to functions (Terraform)
resource "aws_lambda_function" "api" {
  layers = [
    aws_lambda_layer_version.shared_deps.arn,
    "arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension-Arm64:20",
  ]
}

⚙️ DevOps Done Right — Zero Downtime, Full Automation

Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.

  • Staging + production environments with feature flags
  • Automated security scanning in the pipeline
  • Uptime monitoring + alerting + runbook automation
  • On-call support handover docs included

When Serverless Costs More Than EC2

Lambda is economical for spiky, unpredictable traffic. At sustained high volume, EC2 or ECS Fargate can be cheaper:

| Monthly Invocations | Lambda Cost | ECS on EC2 (t3.small) | Winner |
|---|---|---|---|
| 1M (spiky) | ~$2 | $15–20 | Lambda |
| 10M | ~$20 | $15–20 | Tie |
| 50M | ~$100 | $15–20 | ECS |
| 500M | ~$1,000 | $50–100 | ECS |

Rule of thumb: if your Lambda functions run > 50% of the time (sustained load), containerized compute is cheaper. Lambda's value is elasticity — scaling to zero and scaling to thousands of concurrent executions without pre-provisioning.
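The crossover in the table falls out of a simple comparison (a sketch; the $17.50/month container figure is just the midpoint of the t3.small range above, so treat the result as an order-of-magnitude estimate):

```python
PRICE_PER_GB_S = 0.0000166667          # x86 duration rate
PRICE_PER_MILLION_REQUESTS = 0.20

def lambda_cost(invocations: float, avg_duration_s: float, memory_gb: float) -> float:
    """Monthly Lambda cost for a given invocation volume (free tier ignored)."""
    return (invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
            + invocations * avg_duration_s * memory_gb * PRICE_PER_GB_S)

def breakeven_invocations(container_monthly: float, avg_duration_s: float,
                          memory_gb: float) -> float:
    """Monthly invocations at which Lambda cost equals a fixed container bill."""
    per_invocation = (PRICE_PER_MILLION_REQUESTS / 1_000_000
                      + avg_duration_s * memory_gb * PRICE_PER_GB_S)
    return container_monthly / per_invocation

# 200ms @ 512MB vs a ~$17.50/month container
print(f"{breakeven_invocations(17.50, 0.2, 0.5) / 1e6:.1f}M invocations/month")
# ≈ 9.4M — consistent with the "Tie" row around 10M in the table
```

Above that volume, every additional invocation makes the fixed-price container look better; below it, Lambda's scale-to-zero wins.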


Cost Monitoring

# Get Lambda cost breakdown per function from AWS Cost Explorer
import boto3

ce = boto3.client('ce', region_name='us-east-1')

response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2026-04-01', 'End': '2026-05-01'},
    Granularity='MONTHLY',
    Filter={
        'Dimensions': {
            'Key': 'SERVICE',
            'Values': ['AWS Lambda'],
        }
    },
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'OPERATION'}],
    Metrics=['BlendedCost'],
)

for result in response['ResultsByTime']:
    for group in result['Groups']:
        operation = group['Keys'][0]
        cost = group['Metrics']['BlendedCost']['Amount']
        print(f"{operation}: ${float(cost):.2f}")

Set AWS Cost Anomaly Detection alerts on Lambda — unexpected cost spikes often indicate runaway recursion or misconfigured event triggers.


Working With Viprasol

We audit and optimize serverless architectures — identifying memory sizing opportunities, implementing provisioned concurrency for latency-sensitive paths, migrating high-volume workloads to more cost-effective compute, and setting up cost monitoring and alerting.

Talk to our cloud team about serverless cost optimization.



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
