Serverless Cost Optimization: Lambda Cold Starts, Provisioned Concurrency, and Right-Sizing

Quick answer. Cut Lambda costs 40-70% by switching to ARM/Graviton2 (20% cheaper at $0.0000133334 per GB-second), right-sizing memory, and trimming execution time. Eliminate cold starts with provisioned concurrency for latency-sensitive functions. Lambda bills $0.20 per million requests plus duration, with 1M requests and 400,000 GB-seconds free monthly.

AWS Lambda pricing seems simple: pay per invocation and per GB-second of execution. In practice, serverless bills surprise teams constantly — either through unexpectedly high costs from inefficient functions or through cold start latency that degrades user experience.

This guide covers the techniques that cut Lambda costs 40–70% and eliminate cold start issues without abandoning serverless.

Lambda Pricing Basics (2026)

Resource	Price
Requests	$0.20 per 1M requests
Duration (x86)	$0.0000166667 per GB-second
Duration (ARM/Graviton2)	$0.0000133334 per GB-second (20% cheaper)
Provisioned Concurrency	$0.0000041667 per GB-second (allocated)
Free tier	1M requests + 400,000 GB-seconds per month

Example cost: API handling 10M requests/month, 200ms avg duration, 512MB memory:

Requests: 10M × $0.20/1M = $2.00
Duration: 10M × 0.2s × 0.5GB × $0.0000166667 = $16.67
Total: ~$18.67/month

Same workload on ARM Graviton:

Duration: 10M × 0.2s × 0.5GB × $0.0000133334 = $13.33
Total: ~$15.33/month (18% cheaper, same compute)
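The arithmetic above can be wrapped in a small helper for comparing workloads. This is a minimal sketch that assumes the per-request and per-GB-second prices from the pricing table and, like the worked example, ignores the free tier. The function name `monthly_lambda_cost` and its parameters are illustrative, not part of any AWS SDK.

```python
# Prices in USD, copied from the pricing table above; adjust for your region.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = {"x86": 0.0000166667, "arm": 0.0000133334}


def monthly_lambda_cost(requests: int, avg_duration_ms: float,
                        memory_mb: int, arch: str = "x86") -> float:
    """Estimate monthly on-demand cost (hypothetical helper, free tier ignored)."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = requests * PRICE_PER_REQUEST
    duration_cost = gb_seconds * PRICE_PER_GB_SECOND[arch]
    return request_cost + duration_cost


if __name__ == "__main__":
    for arch in ("x86", "arm"):
        cost = monthly_lambda_cost(10_000_000, 200, 512, arch)
        print(f"{arch}: ${cost:.2f}/month")
```

Running it with the example workload (10M requests, 200ms, 512MB) reproduces the two totals above, which makes it easy to plug in your own request volume and duration.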

Memory Right-Sizing

Lambda charges for memory × duration. The counterintuitive finding: more memory often costs less. Lambda allocates CPU in proportion to configured memory, so a function given more memory can finish fast enough that its total GB-seconds go down.

# benchmark_lambda.py: test your function at different memory settings
# For a managed version, deploy AWS Lambda Power Tuning (Step Functions state machine)
# https://github.com/alexcasalboni/aws-lambda-power-tuning

import base64
import json
import re

import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

# Matches the REPORT line at the end of each invocation's log tail:
# REPORT RequestId: ... Duration: 45.23 ms Billed Duration: 46 ms ...
REPORT_DURATION = re.compile(r'REPORT RequestId: \S+\s+Duration: ([\d.]+) ms')

def benchmark_memory(function_name: str, test_payload: dict, memory_sizes: list[int]):
    results = []

    for memory_mb in memory_sizes:
        # Update function memory
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory_mb,
        )
        # Block until the configuration update has finished
        lambda_client.get_waiter('function_updated').wait(FunctionName=function_name)

        # Run multiple invocations and average
        durations = []
        for _ in range(10):
            response = lambda_client.invoke(
                FunctionName=function_name,
                LogType='Tail',  # Return the last 4 KB of logs, base64-encoded
                Payload=json.dumps(test_payload),
            )
            log = base64.b64decode(response.get('LogResult', '')).decode('utf-8')
            # Parse duration from the REPORT line in the Lambda logs
            match = REPORT_DURATION.search(log)
            if match:
                durations.append(float(match.group(1)))

        if not durations:
            raise RuntimeError(f"No REPORT line found for {function_name} at {memory_mb}MB")

        avg_duration_ms = sum(durations) / len(durations)
        gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
        cost_per_million = gb_seconds * 0.0000166667 * 1_000_000  # x86 duration price

        results.append({
            'memory_mb': memory_mb,
            'avg_duration_ms': avg_duration_ms,
            'cost_per_million_invocations': cost_per_million,
        })
        print(f"Memory: {memory_mb}MB | Duration: {avg_duration_ms:.1f}ms | Cost/1M: ${cost_per_million:.2f}")

    return results

Use AWS Lambda Power Tuning — it automates this benchmark across memory settings and produces a cost/performance graph. Most teams find their sweet spot is 512MB–1024MB for Node.js/Python, 1024MB–2048MB for JVM-based functions.
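Once you have measurements, picking a setting is a small selection problem: take the cheapest configuration that still meets your latency target. The sketch below assumes result dictionaries shaped like the ones produced by the benchmark script. `pick_memory_setting` and `max_duration_ms` are hypothetical names, and the sample numbers are made up for illustration rather than measured.

```python
from typing import Optional


def pick_memory_setting(results: list[dict],
                        max_duration_ms: Optional[float] = None) -> dict:
    """Return the cheapest result, optionally restricted to a latency cap."""
    candidates = [
        r for r in results
        if max_duration_ms is None or r["avg_duration_ms"] <= max_duration_ms
    ]
    if not candidates:
        raise ValueError("no memory setting meets the latency cap")
    return min(candidates, key=lambda r: r["cost_per_million_invocations"])


# Illustrative numbers only (not real measurements).
sample_results = [
    {"memory_mb": 128, "avg_duration_ms": 900.0, "cost_per_million_invocations": 1.88},
    {"memory_mb": 512, "avg_duration_ms": 200.0, "cost_per_million_invocations": 1.67},
    {"memory_mb": 1024, "avg_duration_ms": 110.0, "cost_per_million_invocations": 1.83},
]

if __name__ == "__main__":
    print(pick_memory_setting(sample_results)["memory_mb"])
    print(pick_memory_setting(sample_results, max_duration_ms=150)["memory_mb"])
```

With these sample values the cheapest option overall is 512MB, but a 150ms latency cap pushes the choice to 1024MB at a slightly higher cost per million invocations.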

Cold Starts: Root Causes and Solutions

A cold start happens when Lambda needs to initialize a new execution environment: download your code, start the runtime, and run initialization code. Depending on the runtime, this adds roughly 50ms to 10s of latency on top of your function's actual execution time.

Cold start latency by runtime (typical):

Runtime	Cold Start	Warm Execution
Node.js 20	150–400ms	5–50ms
Python 3.12	100–300ms	5–30ms
Go 1.21	50–150ms	1–10ms
Java 21 (with SnapStart)	500ms–1s → ~100ms	10–100ms
Java 21 (without SnapStart)	3–10s	10–100ms

Solutions by approach:

1. Reduce Package Size

Smaller deployment packages initialize faster. The cold start is partly I/O — loading your code from S3.

# Audit your bundle
npx source-map-explorer dist/function.js

# Common wins:
# - Use bundler (esbuild/webpack) instead of deploying node_modules/
# - Tree-shake unused imports
# - Move large static assets to S3 (not the Lambda package)
# - Use Lambda Layers for shared dependencies

# Target: < 5MB for Node.js, < 50MB zipped total

// esbuild.config.ts — bundle to single file
import { build } from 'esbuild';

await build({
  entryPoints: ['src/handler.ts'],
  bundle: true,
  platform: 'node',
  target: 'node20',
  outfile: 'dist/handler.js',
  external: [
    // Don't bundle AWS SDK v3 (available in Lambda runtime)
    '@aws-sdk/*',
  ],
  minify: true,
  sourcemap: 'external',
});

2. Move Heavy Init Outside the Handler

// ❌ Bad: a new pool and DB connection are created and torn down on every invocation, warm or cold
export const handler = async (event: APIGatewayEvent) => {
  const db = new Pool({ connectionString: process.env.DATABASE_URL });
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.pathParameters?.id]);
  await db.end();
  return { statusCode: 200, body: JSON.stringify(result.rows[0]) };
};

// ✅ Good: DB connection created once, reused across invocations
import { Pool } from 'pg';
import type { APIGatewayEvent } from 'aws-lambda';  // types from @types/aws-lambda

// Module-level initialization — runs once per execution environment
const db = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 2,  // Lambda: keep pool small (1-2 connections per function)
});

export const handler = async (event: APIGatewayEvent) => {
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.pathParameters?.id]);
  return { statusCode: 200, body: JSON.stringify(result.rows[0]) };
};
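The same pattern applies in Python runtimes: anything created at module scope survives between invocations in a warm environment. The sketch below uses a stand-in `FakeConnection` class instead of a real database driver so that it runs anywhere. The class, its `fetch_user` method, and the event shape are assumptions made for illustration.

```python
import json


class FakeConnection:
    """Stand-in for an expensive client (database pool, SDK client, etc.)."""
    instances_created = 0

    def __init__(self) -> None:
        FakeConnection.instances_created += 1

    def fetch_user(self, user_id: str) -> dict:
        return {"id": user_id}


# Module scope: runs once per execution environment, i.e. on cold start.
_connection = FakeConnection()


def handler(event: dict, context: object) -> dict:
    # Warm invocations reuse _connection instead of reconnecting.
    user_id = (event.get("pathParameters") or {}).get("id", "")
    return {"statusCode": 200, "body": json.dumps(_connection.fetch_user(user_id))}


if __name__ == "__main__":
    handler({"pathParameters": {"id": "1"}}, None)
    handler({"pathParameters": {"id": "2"}}, None)
    print(FakeConnection.instances_created)  # one connection for two calls
```

Calling the handler twice in the same process creates the connection only once, which is what happens across warm invocations on Lambda.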

3. Provisioned Concurrency

For latency-sensitive functions (user-facing APIs), pre-warm a fixed number of execution environments:

# terraform/lambda.tf
resource "aws_lambda_function" "api" {
  function_name = "api-handler"
  runtime       = "nodejs20.x"
  architectures = ["arm64"]  # Graviton: 20% cheaper
  memory_size   = 512
  timeout       = 30
  publish       = true  # Provisioned concurrency requires a published version

  # ... rest of config
}

# Alias that provisioned concurrency and auto-scaling attach to
resource "aws_lambda_alias" "api_live" {
  name             = "live"
  function_name    = aws_lambda_function.api.function_name
  function_version = aws_lambda_function.api.version
}

# Provisioned concurrency: keeps N environments warm
resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_alias.api_live.name
  provisioned_concurrent_executions = 5  # 5 warm environments
}

# Auto-scale provisioned concurrency with traffic patterns
resource "aws_appautoscaling_target" "lambda_concurrency" {
  max_capacity       = 50
  min_capacity       = 2
  resource_id        = "function:${aws_lambda_function.api.function_name}:${aws_lambda_alias.api_live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"
}

resource "aws_appautoscaling_policy" "lambda_concurrency" {
  name               = "lambda-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_concurrency.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_concurrency.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_concurrency.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
    target_value = 0.7  # Scale up when 70% of provisioned capacity is in use
  }
}

Provisioned concurrency cost: ~$0.0000041667/GB-sec allocated (not invoked) — about 25% of execution cost. For 5 × 512MB functions running 24/7: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day = $27/month. Worth it if cold starts cause user-facing latency.
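To rerun that estimate for other sizes or schedules, the formula can be expressed as a short function. This is a sketch under the assumption that the allocation rate from the pricing table applies. It covers only the allocation charge, so request and duration charges come on top. `provisioned_concurrency_cost` and its parameters are illustrative names.

```python
PROVISIONED_PRICE_PER_GB_SECOND = 0.0000041667  # from the pricing table above


def provisioned_concurrency_cost(instances: int, memory_mb: int,
                                 hours_per_day: float = 24.0,
                                 days: int = 30) -> float:
    """Allocation charge only; request and duration charges are billed on top."""
    gb = memory_mb / 1024
    seconds = hours_per_day * 3600 * days
    return instances * gb * seconds * PROVISIONED_PRICE_PER_GB_SECOND


if __name__ == "__main__":
    print(f"24/7:      ${provisioned_concurrency_cost(5, 512):.2f}")
    print(f"12h a day: ${provisioned_concurrency_cost(5, 512, hours_per_day=12):.2f}")
```

The `hours_per_day` parameter is there because the charge scales linearly with provisioned time: if latency only matters during part of the day, keeping environments warm for 12 hours instead of 24 halves the allocation cost.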

4. Java SnapStart

For Java Lambda functions, SnapStart takes a snapshot of the initialized JVM state and restores it on cold start — reducing cold start from 3–10 seconds to ~100ms:

# AWS SAM template
Resources:
  JavaApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      AutoPublishAlias: live  # SnapStart only applies to published versions
      SnapStart:
        ApplyOn: PublishedVersions
      # ... rest of config

Lambda Layers: Shared Dependencies

Lambda Layers let you share code and dependencies across functions without including them in every deployment package:

# Create a layer with shared dependencies
mkdir -p layer/nodejs
cd layer/nodejs
npm install pg ioredis zod  # Shared dependencies
cd ..
zip -r layer.zip nodejs/

aws lambda publish-layer-version \
  --layer-name shared-deps \
  --zip-file fileb://layer.zip \
  --compatible-runtimes nodejs20.x \
  --compatible-architectures arm64

# Attach layer to functions
resource "aws_lambda_function" "api" {
  layers = [
    aws_lambda_layer_version.shared_deps.arn,
    "arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension-Arm64:20",
  ]
}

When Serverless Costs More Than EC2

Lambda is economical for spiky, unpredictable traffic. At sustained high volume, EC2 or ECS Fargate can be cheaper:

Monthly Invocations	Lambda Cost	EC2 / ECS Fargate (t3.small-class)	Winner
1M (spiky)	~$2	$15–20	Lambda
10M	~$20	$15–20	Tie
50M	~$100	$15–20	ECS
500M	~$1,000	$50–100	ECS

Rule of thumb: if your Lambda functions run > 50% of the time (sustained load), containerized compute is cheaper. Lambda's value is elasticity — scaling to zero and scaling to thousands of concurrent executions without pre-provisioning.
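The crossover point in the table can be estimated directly: divide the fixed monthly bill for a container or instance by Lambda's cost per invocation. The sketch below assumes the x86 prices from the pricing table and a hypothetical fixed bill. It ignores the free tier, load balancer costs, and whether a single small instance could actually serve that load. `break_even_invocations` is an illustrative name.

```python
def break_even_invocations(fixed_monthly_cost: float, avg_duration_ms: float,
                           memory_mb: int,
                           price_per_gb_second: float = 0.0000166667,
                           price_per_request: float = 0.20 / 1_000_000) -> int:
    """Monthly invocations at which Lambda spend equals a fixed compute bill."""
    gb_seconds = (avg_duration_ms / 1000) * (memory_mb / 1024)
    cost_per_invocation = price_per_request + gb_seconds * price_per_gb_second
    return int(fixed_monthly_cost / cost_per_invocation)


if __name__ == "__main__":
    # Hypothetical $20/month fixed bill, 200ms at 512MB as in the earlier example.
    n = break_even_invocations(20.0, avg_duration_ms=200, memory_mb=512)
    print(f"Break-even at about {n / 1_000_000:.1f}M invocations/month")
```

For a 200ms, 512MB function and a $20 fixed bill, the break-even lands near 10M invocations per month, in line with the "Tie" row in the table.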

Cost Monitoring

# Get Lambda cost breakdown by operation from AWS Cost Explorer
# (per-function costs require cost allocation tags and a GroupBy of Type 'TAG')
import boto3

ce = boto3.client('ce', region_name='us-east-1')

response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2026-04-01', 'End': '2026-05-01'},
    Granularity='MONTHLY',
    Filter={
        'Dimensions': {
            'Key': 'SERVICE',
            'Values': ['AWS Lambda'],
        }
    },
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'OPERATION'}],
    Metrics=['BlendedCost'],
)

for result in response['ResultsByTime']:
    for group in result['Groups']:
        operation = group['Keys'][0]
        cost = group['Metrics']['BlendedCost']['Amount']
        print(f"{operation}: ${float(cost):.2f}")

Set AWS Cost Anomaly Detection alerts on Lambda — unexpected cost spikes often indicate runaway recursion or misconfigured event triggers.

What Viprasol Offers

We audit and optimize serverless architectures — identifying memory sizing opportunities, implementing provisioned concurrency for latency-sensitive paths, migrating high-volume workloads to more cost-effective compute, and setting up cost monitoring and alerting.

→ Talk to our cloud team about serverless cost optimization.

Understanding AWS Lambda Provisioned Concurrency Pricing ($0.0000041667 per GB-Second)

The $0.0000041667 figure is the per-second rate charged for each GB of memory you keep warm. The charge starts the moment provisioned concurrency is enabled and continues whether or not requests arrive, which is what eliminates cold starts but adds a steady baseline cost. To estimate spend, multiply the function's allocated memory in GB by the number of seconds it stays provisioned, by the number of provisioned instances, and by the rate. Add the standard request and duration charges to see the full picture. Our senior engineers model these trade-offs for clients, comparing provisioned concurrency against on-demand and right-sized memory so you only pay for warmth where latency genuinely matters.
