Serverless Cost Optimization: Lambda Cold Starts, Provisioned Concurrency, and Right-Sizing
Reduce AWS Lambda costs and eliminate cold starts — memory right-sizing, provisioned concurrency, ARM Graviton, Lambda layers, reserved concurrency, and when serverless costs more than EC2.
AWS Lambda pricing seems simple: pay per invocation and per GB-second of execution. In practice, serverless bills surprise teams constantly — either through unexpectedly high costs from inefficient functions or through cold start latency that degrades user experience.
This guide covers the techniques that cut Lambda costs 40–70% and eliminate cold start issues without abandoning serverless.
Lambda Pricing Basics (2026)
| Resource | Price |
|---|---|
| Requests | $0.20 per 1M requests |
| Duration (x86) | $0.0000166667 per GB-second |
| Duration (ARM/Graviton2) | $0.0000133334 per GB-second (20% cheaper) |
| Provisioned Concurrency | $0.0000041667 per GB-second (allocated) |
| Free tier | 1M requests + 400,000 GB-seconds per month |
Example cost: API handling 10M requests/month, 200ms avg duration, 512MB memory:
Requests: 10M × $0.20/1M = $2.00
Duration: 10M × 0.2s × 0.5GB × $0.0000166667 = $16.67
Total: ~$18.67/month
Same workload on ARM Graviton:
Duration: 10M × 0.2s × 0.5GB × $0.0000133334 = $13.33
Total: ~$15.33/month (18% cheaper, same compute)
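The arithmetic above generalizes into a small helper. A minimal sketch (the helper name and rounding are my own; the rates are the 2026 prices from the table, free tier ignored):

```python
# Hypothetical helper: estimate monthly Lambda cost from the published rates.
REQUEST_PRICE_PER_MILLION = 0.20
GB_SECOND_PRICE = {'x86_64': 0.0000166667, 'arm64': 0.0000133334}

def monthly_lambda_cost(invocations: int, avg_duration_s: float,
                        memory_mb: int, arch: str = 'x86_64') -> float:
    """Rough monthly cost in USD, ignoring the free tier."""
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return round(request_cost + gb_seconds * GB_SECOND_PRICE[arch], 2)

print(monthly_lambda_cost(10_000_000, 0.2, 512))           # x86 example above → 18.67
print(monthly_lambda_cost(10_000_000, 0.2, 512, 'arm64'))  # Graviton → 15.33
```

Plugging in your own traffic numbers before and after an optimization makes the savings concrete.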
Memory Right-Sizing
Lambda charges for memory × duration. The counterintuitive finding: more memory often costs less, because higher memory = more CPU = faster execution.
# benchmark_lambda.py — test your function at different memory settings
# Deploy with AWS Lambda Power Tuning (Step Functions state machine)
# https://github.com/alexcasalboni/aws-lambda-power-tuning
import base64
import boto3
import json
import re

lambda_client = boto3.client('lambda', region_name='us-east-1')

def benchmark_memory(function_name: str, test_payload: dict, memory_sizes: list[int]):
    results = []
    for memory_mb in memory_sizes:
        # Update function memory
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory_mb,
        )
        # Wait for the config change to finish propagating
        lambda_client.get_waiter('function_updated').wait(FunctionName=function_name)

        # Run multiple invocations and average
        durations = []
        for _ in range(10):
            response = lambda_client.invoke(
                FunctionName=function_name,
                Payload=json.dumps(test_payload),
                LogType='Tail',  # Include the tail of the execution log in the response
            )
            log = base64.b64decode(response.get('LogResult', '')).decode()
            # Parse billed duration from the REPORT line in the Lambda log:
            # REPORT RequestId: ... Duration: 45.23 ms Billed Duration: 46 ms ...
            match = re.search(r'Billed Duration: ([\d.]+) ms', log)
            if match:
                durations.append(float(match.group(1)))

        avg_duration_ms = sum(durations) / len(durations)
        gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
        cost_per_million = gb_seconds * 0.0000166667 * 1_000_000
        results.append({
            'memory_mb': memory_mb,
            'avg_duration_ms': avg_duration_ms,
            'cost_per_million_invocations': cost_per_million,
        })
        print(f"Memory: {memory_mb}MB | Duration: {avg_duration_ms:.1f}ms | Cost/1M: ${cost_per_million:.2f}")
    return results
Use AWS Lambda Power Tuning — it automates this benchmark across memory settings and produces a cost/performance graph. Most teams find their sweet spot is 512MB–1024MB for Node.js/Python, 1024MB–2048MB for JVM-based functions.
☁️ Is Your Cloud Costing Too Much?
Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.
- AWS, GCP, Azure certified engineers
- Infrastructure as Code (Terraform, CDK)
- Docker, Kubernetes, GitHub Actions CI/CD
- Typical audit recovers $500–$3,000/month in savings
Cold Starts: Root Causes and Solutions
A cold start happens when Lambda needs to initialize a new execution environment — download your code, start the runtime, run initialization code. This adds 100ms–5s of latency on top of your function's actual execution time.
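You can observe environment reuse directly by recording whether a function's module scope ran during the current invocation — module-level code executes exactly once per execution environment. A minimal Python sketch (handler and field names are illustrative, not an AWS API):

```python
import time

# Module scope runs once per execution environment — i.e., on a cold start.
_initialized_at = time.monotonic()
_is_cold = True

def handler(event, context=None):
    global _is_cold
    was_cold = _is_cold
    _is_cold = False  # Later invocations in this environment are warm
    return {
        'cold_start': was_cold,
        'env_age_s': round(time.monotonic() - _initialized_at, 3),
    }
```

Logging this flag (or the `Init Duration` field on the REPORT log line) shows how often real traffic actually hits a cold environment — often far less than teams fear.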
Cold start latency by runtime (typical):
| Runtime | Cold Start | Warm Execution |
|---|---|---|
| Node.js 20 | 150–400ms | 5–50ms |
| Python 3.12 | 100–300ms | 5–30ms |
| Go 1.21 | 50–150ms | 1–10ms |
| Java 21 (with SnapStart) | ~100ms–1s | 10–100ms |
| Java 21 (without SnapStart) | 3–10s | 10–100ms |
Solutions by approach:
1. Reduce Package Size
Smaller deployment packages initialize faster. The cold start is partly I/O — loading your code from S3.
# Audit your bundle
npx source-map-explorer dist/function.js
# Common wins:
# - Use bundler (esbuild/webpack) instead of deploying node_modules/
# - Tree-shake unused imports
# - Move large static assets to S3 (not the Lambda package)
# - Use Lambda Layers for shared dependencies
# Target: < 5MB for Node.js, < 50MB zipped total
// esbuild.config.ts — bundle to single file
import { build } from 'esbuild';

await build({
  entryPoints: ['src/handler.ts'],
  bundle: true,
  platform: 'node',
  target: 'node20',
  outfile: 'dist/handler.js',
  external: [
    // Don't bundle AWS SDK v3 (available in the Lambda runtime)
    '@aws-sdk/*',
  ],
  minify: true,
  sourcemap: 'external',
});
2. Move Heavy Init Outside the Handler
// ❌ Bad: a new DB connection is created and torn down on every invocation
export const handler = async (event: APIGatewayEvent) => {
  const db = new Pool({ connectionString: process.env.DATABASE_URL });
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.pathParameters?.id]);
  await db.end();
  return { statusCode: 200, body: JSON.stringify(result.rows[0]) };
};

// ✅ Good: DB connection created once, reused across invocations
import { Pool } from 'pg';

// Module-level initialization — runs once per execution environment
const db = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 2, // Lambda: keep the pool small (1–2 connections per function)
});

export const handler = async (event: APIGatewayEvent) => {
  const result = await db.query('SELECT * FROM users WHERE id = $1', [event.pathParameters?.id]);
  return { statusCode: 200, body: JSON.stringify(result.rows[0]) };
};
3. Provisioned Concurrency
For latency-sensitive functions (user-facing APIs), pre-warm a fixed number of execution environments:
# terraform/lambda.tf
resource "aws_lambda_function" "api" {
  function_name = "api-handler"
  runtime       = "nodejs20.x"
  architectures = ["arm64"] # Graviton — 20% cheaper
  memory_size   = 512
  timeout       = 30
  # ... rest of config
}

# Provisioned concurrency — keeps N environments warm
resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_alias.api_live.name
  provisioned_concurrent_executions = 5 # 5 warm environments
}

# Auto-scale provisioned concurrency with traffic patterns
resource "aws_appautoscaling_target" "lambda_concurrency" {
  max_capacity       = 50
  min_capacity       = 2
  resource_id        = "function:${aws_lambda_function.api.function_name}:live"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"
}

resource "aws_appautoscaling_policy" "lambda_concurrency" {
  name               = "lambda-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_concurrency.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_concurrency.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_concurrency.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
    target_value = 0.7 # Scale up when 70% of provisioned capacity is in use
  }
}
Provisioned concurrency cost: ~$0.0000041667/GB-sec allocated (not invoked) — about 25% of execution cost. For 5 × 512MB functions running 24/7: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day = $27/month. Worth it if cold starts cause user-facing latency.
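That estimate generalizes into a quick calculator (helper names are my own; a 30-day month is assumed):

```python
PROVISIONED_GB_SECOND = 0.0000041667  # charged on allocation, invoked or not

def provisioned_concurrency_monthly_cost(environments: int, memory_mb: int,
                                         hours_per_day: float = 24.0,
                                         days: int = 30) -> float:
    """Monthly allocation cost in USD for N warm environments."""
    gb = memory_mb / 1024
    seconds = hours_per_day * 3600 * days
    return round(environments * gb * seconds * PROVISIONED_GB_SECOND, 2)

print(provisioned_concurrency_monthly_cost(5, 512))  # 5 × 512MB, always on → 27.0
```

Scheduling provisioned concurrency for business hours only (e.g., `hours_per_day=12`) halves this cost, which the auto-scaling policy above can approximate automatically.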
4. Java SnapStart
For Java Lambda functions, SnapStart takes a snapshot of the initialized JVM state and restores it on cold start — reducing cold start from 3–10 seconds to ~100ms:
# AWS SAM template
Resources:
  JavaApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      SnapStart:
        ApplyOn: PublishedVersions
      # ... rest of config
Lambda Layers: Shared Dependencies
Lambda Layers let you share code and dependencies across functions without including them in every deployment package:
# Create a layer with shared dependencies
mkdir -p layer/nodejs
cd layer/nodejs
npm install pg ioredis zod # Shared dependencies
cd ..
zip -r layer.zip nodejs/
aws lambda publish-layer-version \
--layer-name shared-deps \
--zip-file fileb://layer.zip \
--compatible-runtimes nodejs20.x \
--compatible-architectures arm64
# Attach layer to functions
resource "aws_lambda_function" "api" {
  layers = [
    aws_lambda_layer_version.shared_deps.arn,
    "arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension-Arm64:20",
  ]
}
⚙️ DevOps Done Right — Zero Downtime, Full Automation
Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.
- Staging + production environments with feature flags
- Automated security scanning in the pipeline
- Uptime monitoring + alerting + runbook automation
- On-call support handover docs included
When Serverless Costs More Than EC2
Lambda is economical for spiky, unpredictable traffic. At sustained high volume, EC2 or ECS Fargate can be cheaper:
| Monthly Invocations | Lambda Cost | Containers (t3.small-class instance/task) | Winner |
|---|---|---|---|
| 1M (spiky) | ~$2 | $15–20 | Lambda |
| 10M | ~$20 | $15–20 | Tie |
| 50M | ~$100 | $15–20 | ECS |
| 500M | ~$1,000 | $50–100 | ECS |
Rule of thumb: if your Lambda functions run > 50% of the time (sustained load), containerized compute is cheaper. Lambda's value is elasticity — scaling to zero and scaling to thousands of concurrent executions without pre-provisioning.
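The break-even point is easy to check numerically. A sketch with hypothetical names, where the container bill is whatever flat monthly price your ECS/EC2 footprint costs (x86 Lambda rates assumed):

```python
def cheaper_compute(invocations: int, avg_duration_s: float, memory_mb: int,
                    container_monthly_usd: float) -> str:
    """Compare estimated x86 Lambda cost against a flat container bill."""
    lambda_cost = (invocations / 1_000_000 * 0.20
                   + invocations * avg_duration_s * (memory_mb / 1024) * 0.0000166667)
    return 'lambda' if lambda_cost < container_monthly_usd else 'containers'

print(cheaper_compute(1_000_000, 0.2, 512, 17.50))   # spiky, low volume → lambda
print(cheaper_compute(50_000_000, 0.2, 512, 17.50))  # sustained high volume → containers
```

This ignores the container side's operational overhead (patching, scaling config, idle capacity), which is part of why the crossover in practice sits higher than the raw numbers suggest.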
Cost Monitoring
# Get Lambda cost breakdown per function from AWS Cost Explorer
import boto3
ce = boto3.client('ce', region_name='us-east-1')
response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2026-04-01', 'End': '2026-05-01'},
    Granularity='MONTHLY',
    Filter={
        'Dimensions': {
            'Key': 'SERVICE',
            'Values': ['AWS Lambda'],
        }
    },
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'OPERATION'}],
    Metrics=['BlendedCost'],
)

for result in response['ResultsByTime']:
    for group in result['Groups']:
        operation = group['Keys'][0]
        cost = group['Metrics']['BlendedCost']['Amount']
        print(f"{operation}: ${float(cost):.2f}")
Set AWS Cost Anomaly Detection alerts on Lambda — unexpected cost spikes often indicate runaway recursion or misconfigured event triggers.
Working With Viprasol
We audit and optimize serverless architectures — identifying memory sizing opportunities, implementing provisioned concurrency for latency-sensitive paths, migrating high-volume workloads to more cost-effective compute, and setting up cost monitoring and alerting.
→ Talk to our cloud team about serverless cost optimization.
See Also
- Serverless Architecture — foundational serverless patterns
- Infrastructure as Code — Terraform for Lambda configuration
- Caching Strategies — reduce Lambda invocations with caching
- DevOps Best Practices — CI/CD for serverless functions
- Cloud Solutions — cloud cost optimization and architecture
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.
Need DevOps & Cloud Expertise?
Scale your infrastructure with confidence. AWS, GCP, Azure certified team.
Free consultation • No commitment • Response within 24 hours