API Rate Limiting: Token Bucket, Sliding Window, and Production Implementation
Rate limiting is the control mechanism that prevents your API from being overwhelmed by a single client, whether that's an abusive user, a misconfigured client firing requests in a tight loop, or a DDoS attack. Without it, one bad actor can degrade service for everyone.
This guide covers the algorithms and production implementation patterns that handle millions of requests per day.
The Core Algorithms
Fixed Window Counter
The simplest algorithm: count requests per client per time window.
// Count requests in the current minute
const key = `rl:${clientId}:${Math.floor(Date.now() / 60_000)}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 60); // Set the TTL only when the key is first created
const allowed = count <= 100;
The problem: at window boundaries, a client can fire 100 requests at 11:59:59 and 100 more at 12:00:00, totalling 200 requests in two seconds. The fixed window allows 2× the intended burst.
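To see the boundary burst concretely, here is a minimal in-memory illustration (no Redis; the `counts` map and `fixedWindowAllowed` helper are invented for this sketch):

```typescript
// In-memory fixed-window counter, for illustration only.
const counts = new Map<number, number>();
const LIMIT = 100;
const WINDOW_MS = 60_000;

function fixedWindowAllowed(nowMs: number): boolean {
  const window = Math.floor(nowMs / WINDOW_MS);
  const count = (counts.get(window) ?? 0) + 1;
  counts.set(window, count);
  return count <= LIMIT;
}

// 100 requests one second before the window boundary...
let allowedBefore = 0;
for (let i = 0; i < 100; i++) if (fixedWindowAllowed(59_000)) allowedBefore++;

// ...and 100 more right after it. A fresh window, so all pass again.
let allowedAfter = 0;
for (let i = 0; i < 100; i++) if (fixedWindowAllowed(60_000)) allowedAfter++;

console.log(allowedBefore + allowedAfter); // 200, accepted within one second
```

Both bursts are fully admitted because the counter resets at the boundary, which is exactly the 2× effect described above.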
Sliding Window Log
Track the timestamp of every request. Count requests in the rolling window.
const now = Date.now();
const windowStart = now - 60_000;
const key = `rl:log:${clientId}`;
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, '-inf', windowStart); // Remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add this request
pipeline.zcard(key); // Count in window
pipeline.expire(key, 60);
const results = await pipeline.exec();
const count = results![2][1] as number;
const allowed = count <= 100;
Accurate but memory-intensive at high throughput, since every request stores a timestamp entry.
Sliding Window Counter (Recommended)
A hybrid approach: store request counts for the current and previous windows, then weight the previous window's count by how much of it still overlaps the rolling window. This closely approximates a true sliding window while using O(1) storage per client.
// lib/rateLimiter.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
  limit: number;
}

export async function slidingWindowCounter(
  clientId: string,
  windowMs: number,
  maxRequests: number
): Promise<RateLimitResult> {
  const now = Date.now();
  const windowSec = windowMs / 1000;
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;
  const currentKey = `rl:${clientId}:${currentWindow}`;
  const previousKey = `rl:${clientId}:${previousWindow}`;

  const [currentCount, previousCount] = await Promise.all([
    redis.get(currentKey).then(v => parseInt(v ?? '0', 10)),
    redis.get(previousKey).then(v => parseInt(v ?? '0', 10)),
  ]);

  // Weight the previous window by how much of it still overlaps the rolling window
  const elapsedInCurrentWindow = (now % windowMs) / windowMs;
  const weightedCount =
    previousCount * (1 - elapsedInCurrentWindow) + currentCount;

  const allowed = weightedCount < maxRequests;
  if (allowed) {
    const pipeline = redis.pipeline();
    pipeline.incr(currentKey);
    pipeline.expire(currentKey, Math.ceil(windowSec * 2));
    await pipeline.exec();
  }

  const remaining = Math.max(0, maxRequests - Math.ceil(weightedCount));
  const resetAt = (currentWindow + 1) * windowMs;
  return { allowed, remaining, resetAt, limit: maxRequests };
}
Token Bucket (Best for Bursts)
A client accumulates tokens over time. Each request consumes a token. Allows short bursts while enforcing an average rate.
export async function tokenBucket(
  clientId: string,
  bucketCapacity: number, // Max burst size
  refillRatePerSecond: number,
  tokensConsumed = 1
): Promise<RateLimitResult> {
  const now = Date.now() / 1000; // Unix timestamp in seconds
  const key = `rl:bucket:${clientId}`;

  // Use a Lua script for an atomic read-modify-write
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local tokens_requested = tonumber(ARGV[4])

    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now

    -- Add tokens based on elapsed time
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= tokens_requested then
      tokens = tokens - tokens_requested
      allowed = 1
    end

    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 60)
    return {allowed, math.floor(tokens)}
  `;

  const result = await redis.eval(
    luaScript,
    1,
    key,
    bucketCapacity.toString(),
    refillRatePerSecond.toString(),
    now.toString(),
    tokensConsumed.toString()
  ) as [number, number];

  return {
    allowed: result[0] === 1,
    remaining: result[1],
    // Approximation: time until the next single token refills
    resetAt: Date.now() + (1 / refillRatePerSecond) * 1000,
    limit: bucketCapacity,
  };
}
Middleware Implementation
// middleware/rateLimiter.ts
import { FastifyRequest, FastifyReply } from 'fastify';
import { slidingWindowCounter } from '@/lib/rateLimiter';

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
  keyFn?: (request: FastifyRequest) => string;
}

export function createRateLimiter(config: RateLimitConfig) {
  return async function rateLimitMiddleware(
    request: FastifyRequest,
    reply: FastifyReply
  ) {
    // Default key: per authenticated user, falling back to IP
    const clientId = config.keyFn
      ? config.keyFn(request)
      : (request.headers['x-user-id'] as string) ?? request.ip;

    const result = await slidingWindowCounter(
      clientId,
      config.windowMs,
      config.maxRequests
    );

    // Standard rate limit response headers
    reply.header('X-RateLimit-Limit', config.maxRequests);
    reply.header('X-RateLimit-Remaining', result.remaining);
    reply.header('X-RateLimit-Reset', Math.ceil(result.resetAt / 1000));
    reply.header(
      'X-RateLimit-Policy',
      `${config.maxRequests};w=${config.windowMs / 1000}`
    );

    if (!result.allowed) {
      const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
      reply.header('Retry-After', retryAfter);
      return reply.code(429).send({
        error: 'Too Many Requests',
        message: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
        retryAfter,
      });
    }
  };
}
// Per-route rate limits
const globalLimit = createRateLimiter({ windowMs: 60_000, maxRequests: 100 });
const strictLimit = createRateLimiter({
  windowMs: 60_000,
  maxRequests: 5,
  keyFn: (req) => `auth:${req.ip}`, // Per-IP for auth endpoints
});

// Apply different limits per route
app.post('/auth/login', { preHandler: strictLimit }, loginHandler);
app.post('/auth/forgot-password', { preHandler: strictLimit }, forgotPasswordHandler);
app.get('/api/*', { preHandler: globalLimit }, apiHandler);
Tiered Rate Limits by Plan
SaaS products often have different limits by subscription tier:
// middleware/tieredRateLimit.ts
const PLAN_LIMITS = {
  free: { requestsPerMinute: 30, requestsPerDay: 1_000 },
  starter: { requestsPerMinute: 100, requestsPerDay: 10_000 },
  pro: { requestsPerMinute: 500, requestsPerDay: 100_000 },
  enterprise: { requestsPerMinute: 2_000, requestsPerDay: 1_000_000 },
} as const;

export async function tieredRateLimitMiddleware(
  request: FastifyRequest,
  reply: FastifyReply
) {
  const userId = request.headers['x-user-id'] as string;
  if (!userId) return; // Unauthenticated requests are handled by auth middleware

  // Cache the plan lookup to avoid a DB hit on every request
  const plan = await getCachedUserPlan(userId);
  const limits = PLAN_LIMITS[plan] ?? PLAN_LIMITS.free;

  // Check both per-minute and per-day limits
  const [minuteResult, dayResult] = await Promise.all([
    slidingWindowCounter(`${userId}:min`, 60_000, limits.requestsPerMinute),
    slidingWindowCounter(`${userId}:day`, 86_400_000, limits.requestsPerDay),
  ]);

  // The most restrictive limit wins
  const result = !minuteResult.allowed ? minuteResult : dayResult;

  reply.header('X-RateLimit-Limit-Minute', limits.requestsPerMinute);
  reply.header('X-RateLimit-Limit-Day', limits.requestsPerDay);
  reply.header('X-RateLimit-Remaining-Minute', minuteResult.remaining);
  reply.header('X-RateLimit-Remaining-Day', dayResult.remaining);
  reply.header('X-RateLimit-Plan', plan);

  if (!result.allowed) {
    const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
    reply.header('Retry-After', retryAfter);
    return reply.code(429).send({
      error: 'Rate limit exceeded',
      plan,
      retryAfter,
      upgradeUrl: 'https://yourapp.com/pricing',
    });
  }
}
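The middleware above calls `getCachedUserPlan`, which isn't defined in this article. A minimal sketch, assuming an in-memory TTL cache in front of a database lookup; in production you would typically back this with Redis so all instances share the cache. The `fetchPlanFromDb` helper is a hypothetical stand-in for your real query:

```typescript
type Plan = 'free' | 'starter' | 'pro' | 'enterprise';

const planCache = new Map<string, { plan: Plan; expiresAt: number }>();
const PLAN_CACHE_TTL_MS = 60_000; // Re-check the DB at most once a minute per user

// Hypothetical stand-in for a real database query.
async function fetchPlanFromDb(userId: string): Promise<Plan> {
  return 'free';
}

export async function getCachedUserPlan(userId: string): Promise<Plan> {
  const cached = planCache.get(userId);
  if (cached && cached.expiresAt > Date.now()) return cached.plan; // Cache hit

  const plan = await fetchPlanFromDb(userId);
  planCache.set(userId, { plan, expiresAt: Date.now() + PLAN_CACHE_TTL_MS });
  return plan;
}
```

A short TTL keeps plan upgrades visible within a minute while eliminating nearly all per-request DB traffic.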
Nginx Rate Limiting (Infrastructure Level)
Rate limiting at the Nginx/load balancer level prevents traffic from ever reaching your application:
# nginx.conf
http {
  # Define rate limit zones
  # $binary_remote_addr uses 4 bytes (vs 15+ for $remote_addr)
  limit_req_zone $binary_remote_addr zone=ip_limit:10m rate=10r/s;
  limit_req_zone $http_x_api_key zone=api_key_limit:10m rate=100r/s;

  # Connection limits (separate from request rate)
  limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

  server {
    # Global IP rate limit: 10 req/s, burst of 20, no delay
    limit_req zone=ip_limit burst=20 nodelay;
    limit_req_status 429;        # Return 429 instead of the default 503
    limit_conn conn_limit 100;   # Max 100 concurrent connections per IP

    location /api/ {
      # Per-API-key limit allows legitimate high-volume clients
      limit_req zone=api_key_limit burst=200 nodelay;
      proxy_pass http://api_backend;
    }

    location /auth/ {
      # Stricter limit for auth endpoints
      limit_req zone=ip_limit burst=5 nodelay;
      proxy_pass http://api_backend;
    }

    # Custom error page for 429
    error_page 429 /429.json;
    location = /429.json {
      internal;
      default_type application/json;      # Content-Type for the returned body
      add_header Retry-After 60 always;   # "always" so the header is sent on 4xx
      return 429 '{"error":"Too Many Requests","retryAfter":60}';
    }
  }
}
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Best For |
|---|---|---|---|---|
| Fixed window | O(1) | Medium | Allows 2× burst at boundary | Simple use cases |
| Sliding window log | O(requests) | High | Exact | Low-volume, strict accuracy |
| Sliding window counter | O(1) | High | Accurate | Most production APIs |
| Token bucket | O(1) | High | Explicit burst capacity | APIs with legitimate burst needs |
| Leaky bucket | O(1) | High | Smooths all bursts | Strictly smooth output rate |
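The table mentions the leaky bucket, which the article doesn't otherwise implement. A sketch of the "meter" variant, using an in-memory store and invented names: each request adds water to the bucket, which drains at a constant rate, so sustained throughput is capped at the leak rate.

```typescript
// Leaky bucket as a meter (in-memory sketch, not the article's Redis code).
interface Bucket { level: number; lastLeak: number }
const buckets = new Map<string, Bucket>();

function leakyBucketAllowed(
  clientId: string,
  capacity: number,       // Max "water" the bucket holds
  leakPerSecond: number,  // Steady drain rate
  now = Date.now() / 1000
): boolean {
  const b = buckets.get(clientId) ?? { level: 0, lastLeak: now };

  // Drain in proportion to elapsed time, never below empty
  b.level = Math.max(0, b.level - (now - b.lastLeak) * leakPerSecond);
  b.lastLeak = now;

  if (b.level + 1 > capacity) {
    buckets.set(clientId, b);
    return false; // Bucket full: reject
  }
  b.level += 1; // This request adds one unit of water
  buckets.set(clientId, b);
  return true;
}
```

Unlike the token bucket, which lets a full bucket absorb a burst, the leaky bucket smooths output: once full, requests are rejected until enough time has passed for the level to drain.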
Rate Limit Header Standards
Follow the IETF draft standard for rate limit headers:
RateLimit-Limit: 100 # Requests allowed in the window
RateLimit-Remaining: 75 # Requests remaining
RateLimit-Reset: 1714428600 # Unix timestamp when limit resets
X-RateLimit-Limit: 100 # Also send X- prefixed variants for compatibility
X-RateLimit-Remaining: 75
X-RateLimit-Reset: 1714428600
Retry-After: 30 # On 429: seconds until client can retry
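These headers only help if clients honor them. A small sketch of the client side (the `backoffDelayMs` name and fallback policy are illustrative, not from any standard): prefer the server's `Retry-After` hint, and fall back to exponential backoff when it is absent or unparseable.

```typescript
// Given a 429's Retry-After header value (or null), decide how long to wait.
function backoffDelayMs(retryAfterHeader: string | null, attempt: number): number {
  const retryAfter = Number(retryAfterHeader);
  return Number.isFinite(retryAfter) && retryAfter > 0
    ? retryAfter * 1000     // Server told us exactly when to retry
    : 2 ** attempt * 1000;  // 1s, 2s, 4s, ... when no usable hint is given

}

// Usage inside a retry loop (outline):
//   const res = await fetch(url);
//   if (res.status === 429 && attempt < maxRetries) {
//     await sleep(backoffDelayMs(res.headers.get('Retry-After'), attempt));
//     continue;
//   }

console.log(backoffDelayMs('30', 0)); // 30000: honors Retry-After: 30
console.log(backoffDelayMs(null, 2)); // 4000: third attempt, exponential fallback
```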
Working With Viprasol
We implement rate limiting as part of API development and security hardening engagements, covering per-user, per-IP, and per-endpoint limits, tiered limits by subscription plan, distributed rate limiting with Redis, and infrastructure-level limiting with Nginx or AWS API Gateway.
→ Talk to our API team about rate limiting your application.
See Also
- API Gateway Patterns โ rate limiting at the gateway layer
- Redis Use Cases โ Redis data structures for rate limiting
- API Security Best Practices โ rate limiting as a security control
- Webhook Design Patterns โ rate limiting outbound webhook delivery
- Web Development Services โ API architecture and security
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.
Need a Modern Web Application?
From landing pages to complex SaaS platforms โ we build it all with Next.js and React.
Free consultation โข No commitment โข Response within 24 hours