
API Rate Limiting: Token Bucket, Sliding Window, and Production Implementation

Implement production API rate limiting with token bucket and sliding window algorithms. Covers Redis implementation, Nginx config, and per-user and per-endpoint limits.

Viprasol Tech Team
April 19, 2026
12 min read


Rate limiting is the control mechanism that prevents your API from being overwhelmed by a single client, whether that's an abusive user, a misconfigured client firing requests in a tight loop, or a DDoS attack. Without it, one bad actor can degrade service for everyone.

This guide covers the algorithms and production implementation patterns that handle millions of requests per day.


The Core Algorithms

Fixed Window Counter

The simplest algorithm: count requests per client per time window.

// Count requests in the current minute (the key embeds the window number)
const key = `rl:${clientId}:${Math.floor(Date.now() / 60_000)}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 60);  // Set the TTL only on first increment
const allowed = count <= 100;

The problem: at window boundaries, a client can fire 100 requests at 11:59:59 and 100 more at 12:00:00, putting 200 requests through in about two seconds. The fixed window allows 2× the intended burst.
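To see the boundary in numbers, here is a small sketch using the same `Math.floor(timestamp / 60_000)` keying as the snippet above (the timestamps are illustrative):

```typescript
// Two requests one second apart land in adjacent window keys,
// so each one gets a fresh 100-request budget
const t1 = Date.parse('2026-04-19T11:59:59Z');
const t2 = Date.parse('2026-04-19T12:00:00Z');

const window1 = Math.floor(t1 / 60_000);
const window2 = Math.floor(t2 / 60_000);

// window2 === window1 + 1: the counter resets despite only 1s elapsing
```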

Sliding Window Log

Track the timestamp of every request. Count requests in the rolling window.

const now = Date.now();
const windowStart = now - 60_000;
const key = `rl:log:${clientId}`;

const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, '-inf', windowStart);  // Remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`);   // Add this request
pipeline.zcard(key);                                   // Count in window
pipeline.expire(key, 60);

const results = await pipeline.exec();
const count = results![2][1] as number;
const allowed = count <= 100;

Accurate but memory-intensive at high throughput: every request stores a timestamp entry.
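A back-of-envelope estimate makes the cost concrete (the ~70 bytes per entry is an assumption covering the member string plus sorted-set overhead; real numbers vary by Redis version and encoding):

```typescript
// Memory held by ONE busy client under the sliding window log approach
const requestsPerSecond = 1_000;
const windowSeconds = 60;
const bytesPerEntry = 70; // assumed: member string + sorted-set overhead

const perClientBytes = requestsPerSecond * windowSeconds * bytesPerEntry;
// roughly 4.2 MB for a single client sustaining 1k req/s
```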

Sliding Window Counter (Recommended)

A hybrid approach: store the count for the current and previous window, then compute the weighted average. Accurate to within ~0.1% of the true sliding window, with O(1) storage per client.

// lib/rateLimiter.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
  limit: number;
}

export async function slidingWindowCounter(
  clientId: string,
  windowMs: number,
  maxRequests: number
): Promise<RateLimitResult> {
  const now = Date.now();
  const windowSec = windowMs / 1000;
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;

  const currentKey = `rl:${clientId}:${currentWindow}`;
  const previousKey = `rl:${clientId}:${previousWindow}`;

  const [currentCount, previousCount] = await Promise.all([
    redis.get(currentKey).then(v => parseInt(v ?? '0', 10)),
    redis.get(previousKey).then(v => parseInt(v ?? '0', 10)),
  ]);

  // Weight previous window by how much of the current window has passed
  const elapsedInCurrentWindow = (now % windowMs) / windowMs;
  const weightedCount =
    previousCount * (1 - elapsedInCurrentWindow) + currentCount;

  const allowed = weightedCount < maxRequests;

  if (allowed) {
    const pipeline = redis.pipeline();
    pipeline.incr(currentKey);
    pipeline.expire(currentKey, Math.ceil(windowSec * 2));
    await pipeline.exec();
  }

  const remaining = Math.max(0, maxRequests - Math.ceil(weightedCount));
  const resetAt = (currentWindow + 1) * windowMs;

  return { allowed, remaining, resetAt, limit: maxRequests };
}
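The weighting itself is a pure calculation, worth seeing in isolation (a sketch of the same formula the function above uses):

```typescript
// Weighted count = previous window scaled by its unexpired fraction,
// plus everything counted so far in the current window
function weightedCount(
  previousCount: number,
  currentCount: number,
  elapsedFraction: number // how far into the current window we are, 0..1
): number {
  return previousCount * (1 - elapsedFraction) + currentCount;
}
```

For example, with 80 requests in the previous window, 30 so far in the current one, and the window half elapsed, the estimate is 80 × 0.5 + 30 = 70.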

Token Bucket (Best for Bursts)

A client accumulates tokens over time. Each request consumes a token. Allows short bursts while enforcing an average rate.

export async function tokenBucket(
  clientId: string,
  bucketCapacity: number,  // Max burst size
  refillRatePerSecond: number,
  tokensConsumed = 1
): Promise<RateLimitResult> {
  const now = Date.now() / 1000;  // Unix timestamp in seconds
  const key = `rl:bucket:${clientId}`;

  // Use a Lua script for atomic read-modify-write
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local tokens_requested = tonumber(ARGV[4])

    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now

    -- Add tokens based on elapsed time
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= tokens_requested then
      tokens = tokens - tokens_requested
      allowed = 1
    end

    redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)  -- HSET: HMSET is deprecated
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 60)

    return {allowed, math.floor(tokens)}
  `;

  const result = await redis.eval(
    luaScript,
    1,
    key,
    bucketCapacity.toString(),
    refillRatePerSecond.toString(),
    now.toString(),
    tokensConsumed.toString()
  ) as [number, number];

  return {
    allowed: result[0] === 1,
    remaining: result[1],
    resetAt: Date.now() + (1 / refillRatePerSecond) * 1000,
    limit: bucketCapacity,
  };
}
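The refill math in the Lua script can be mirrored as a pure TypeScript function, which is handy for unit-testing the bucket behavior without Redis (a sketch; the names here are illustrative, not part of the article's API):

```typescript
interface BucketState {
  tokens: number;
  lastRefill: number; // Unix timestamp in seconds
}

// Mirrors the Lua read-modify-write: refill by elapsed time, then try to consume
function refillAndConsume(
  state: BucketState,
  capacity: number,
  refillRatePerSecond: number,
  nowSeconds: number,
  tokensRequested = 1
): { allowed: boolean; state: BucketState } {
  const elapsed = nowSeconds - state.lastRefill;
  let tokens = Math.min(capacity, state.tokens + elapsed * refillRatePerSecond);

  let allowed = false;
  if (tokens >= tokensRequested) {
    tokens -= tokensRequested;
    allowed = true;
  }

  return { allowed, state: { tokens, lastRefill: nowSeconds } };
}
```

A full bucket of capacity 10 allows a 10-request burst at a single instant, then denies until the refill rate restores tokens.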

Middleware Implementation

// middleware/rateLimiter.ts
import { FastifyRequest, FastifyReply } from 'fastify';
import { slidingWindowCounter } from '@/lib/rateLimiter';

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
  keyFn?: (request: FastifyRequest) => string;
}

export function createRateLimiter(config: RateLimitConfig) {
  return async function rateLimitMiddleware(
    request: FastifyRequest,
    reply: FastifyReply
  ) {
    // Default key: per authenticated user, fallback to IP
    const clientId = config.keyFn
      ? config.keyFn(request)
      : (request.headers['x-user-id'] as string) ?? request.ip;

    const result = await slidingWindowCounter(
      clientId,
      config.windowMs,
      config.maxRequests
    );

    // Standard rate limit response headers
    reply.header('X-RateLimit-Limit', config.maxRequests);
    reply.header('X-RateLimit-Remaining', result.remaining);
    reply.header('X-RateLimit-Reset', Math.ceil(result.resetAt / 1000));
    reply.header('X-RateLimit-Policy', `${config.maxRequests};w=${config.windowMs / 1000}`);

    if (!result.allowed) {
      const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
      reply.header('Retry-After', retryAfter);
      return reply.code(429).send({
        error: 'Too Many Requests',
        message: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
        retryAfter,
      });
    }
  };
}

// Per-route rate limits
const globalLimit = createRateLimiter({ windowMs: 60_000, maxRequests: 100 });

const strictLimit = createRateLimiter({
  windowMs: 60_000,
  maxRequests: 5,
  keyFn: (req) => `auth:${req.ip}`,  // Per-IP for auth endpoints
});

// Apply different limits per route
app.post('/auth/login', { preHandler: strictLimit }, loginHandler);
app.post('/auth/forgot-password', { preHandler: strictLimit }, forgotPasswordHandler);
app.get('/api/*', { preHandler: globalLimit }, apiHandler);

๐ŸŒ Looking for a Dev Team That Actually Delivers?

Most agencies sell you a project manager and assign juniors. Viprasol is different โ€” senior engineers only, direct Slack access, and a 5.0โ˜… Upwork record across 100+ projects.

  • React, Next.js, Node.js, TypeScript โ€” production-grade stack
  • Fixed-price contracts โ€” no surprise invoices
  • Full source code ownership from day one
  • 90-day post-launch support included

Tiered Rate Limits by Plan

SaaS products often have different limits by subscription tier:

// middleware/tieredRateLimit.ts
const PLAN_LIMITS = {
  free: { requestsPerMinute: 30, requestsPerDay: 1_000 },
  starter: { requestsPerMinute: 100, requestsPerDay: 10_000 },
  pro: { requestsPerMinute: 500, requestsPerDay: 100_000 },
  enterprise: { requestsPerMinute: 2_000, requestsPerDay: 1_000_000 },
} as const;

export async function tieredRateLimitMiddleware(
  request: FastifyRequest,
  reply: FastifyReply
) {
  const userId = request.headers['x-user-id'] as string;
  if (!userId) return;  // Unauthenticated; handled by auth middleware

  // Cache plan lookup to avoid DB hit on every request
  const plan = await getCachedUserPlan(userId);
  const limits = PLAN_LIMITS[plan] ?? PLAN_LIMITS.free;

  // Check both per-minute and per-day limits
  const [minuteResult, dayResult] = await Promise.all([
    slidingWindowCounter(`${userId}:min`, 60_000, limits.requestsPerMinute),
    slidingWindowCounter(`${userId}:day`, 86_400_000, limits.requestsPerDay),
  ]);

  // Most restrictive limit wins
  const result = !minuteResult.allowed ? minuteResult : dayResult;

  reply.header('X-RateLimit-Limit-Minute', limits.requestsPerMinute);
  reply.header('X-RateLimit-Limit-Day', limits.requestsPerDay);
  reply.header('X-RateLimit-Remaining-Minute', minuteResult.remaining);
  reply.header('X-RateLimit-Remaining-Day', dayResult.remaining);
  reply.header('X-RateLimit-Plan', plan);

  if (!result.allowed) {
    const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
    reply.header('Retry-After', retryAfter);
    return reply.code(429).send({
      error: 'Rate limit exceeded',
      plan,
      retryAfter,
      upgradeUrl: 'https://yourapp.com/pricing',
    });
  }
}
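`getCachedUserPlan` is referenced above but not shown. One possible shape is a short-TTL in-process cache in front of the database lookup (a sketch; `fetchPlanFromDb` is a hypothetical stand-in for the real DB accessor):

```typescript
type Plan = 'free' | 'starter' | 'pro' | 'enterprise';

// Hypothetical stand-in for the real database lookup (not part of the article)
async function fetchPlanFromDb(userId: string): Promise<Plan> {
  // ... query the users table; hard-coded here so the sketch is self-contained
  return 'free';
}

// 60s TTL in-process cache: plan changes propagate within a minute,
// while the hot path avoids a DB round trip per request
const planCache = new Map<string, { plan: Plan; expiresAt: number }>();

export async function getCachedUserPlan(userId: string): Promise<Plan> {
  const cached = planCache.get(userId);
  if (cached && cached.expiresAt > Date.now()) return cached.plan;

  const plan = await fetchPlanFromDb(userId);
  planCache.set(userId, { plan, expiresAt: Date.now() + 60_000 });
  return plan;
}
```

A per-process `Map` is the simplest option; a shared Redis cache would keep plan data consistent across instances at the cost of an extra network hop.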

Nginx Rate Limiting (Infrastructure Level)

Rate limiting at the Nginx/load balancer level prevents traffic from ever reaching your application:

# nginx.conf
http {
  # Define rate limit zones
  # $binary_remote_addr uses 4 bytes (vs 15+ for $remote_addr)
  limit_req_zone $binary_remote_addr zone=ip_limit:10m rate=10r/s;
  limit_req_zone $http_x_api_key zone=api_key_limit:10m rate=100r/s;

  # Connection limits (separate from request rate)
  limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

  server {
    # Global IP rate limit โ€” 10 req/s, burst of 20, no delay
    limit_req zone=ip_limit burst=20 nodelay;
    limit_conn conn_limit 100;  # Max 100 concurrent connections per IP

    location /api/ {
      # Per-API-key limit โ€” allows legitimate high-volume clients
      limit_req zone=api_key_limit burst=200 nodelay;

      proxy_pass http://api_backend;
    }

    location /auth/ {
      # Stricter limit for auth endpoints
      limit_req zone=ip_limit burst=5 nodelay;
      limit_req_status 429;

      proxy_pass http://api_backend;
    }

    # Custom error page for 429
    error_page 429 /429.json;
    location = /429.json {
      internal;
      default_type application/json;
      # "always" is required for add_header to apply to non-2xx/3xx responses
      add_header Retry-After 60 always;
      return 429 '{"error":"Too Many Requests","retryAfter":60}';
    }
  }
}

🚀 Senior Engineers. No Junior Handoffs. Ever.

You get the senior developer, not a project manager who relays your requirements to someone you never meet. Every Viprasol project has a senior lead from kickoff to launch.

  • MVPs in 4–8 weeks, full platforms in 3–5 months
  • Lighthouse 90+ performance scores standard
  • Works across US, UK, AU timezones
  • Free 30-min architecture review, no commitment

Algorithm Comparison

Algorithm              | Memory      | Accuracy | Burst Handling              | Best For
Fixed window           | O(1)        | Medium   | Allows 2× burst at boundary | Simple use cases
Sliding window log     | O(requests) | High     | Exact                       | Low-volume, strict accuracy
Sliding window counter | O(1)        | High     | Accurate                    | Most production APIs
Token bucket           | O(1)        | High     | Explicit burst capacity     | APIs with legitimate burst needs
Leaky bucket           | O(1)        | High     | Smooths all bursts          | Strictly smooth output rate

Rate Limit Header Standards

Follow the IETF draft standard for rate limit headers:

RateLimit-Limit: 100           # Requests allowed in the window
RateLimit-Remaining: 75        # Requests remaining
RateLimit-Reset: 1714428600    # Unix timestamp when limit resets

X-RateLimit-Limit: 100         # Also send X- prefixed variants for compatibility
X-RateLimit-Remaining: 75
X-RateLimit-Reset: 1714428600
Retry-After: 30                # On 429: seconds until client can retry
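On the client side, `Retry-After` may be either delta-seconds or an HTTP-date (per RFC 9110), so a well-behaved client should handle both. A small parsing sketch (the 1-second fallback when the header is absent or unparseable is an assumption, not part of any standard):

```typescript
// Convert a Retry-After header value into a wait time in milliseconds
function retryAfterMs(header: string | null, now = Date.now()): number {
  if (!header) return 1_000; // assumed default when the header is absent

  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1_000); // "30" form

  const date = Date.parse(header); // "Wed, 21 Oct 2026 07:28:30 GMT" form
  return Number.isNaN(date) ? 1_000 : Math.max(0, date - now);
}
```

A retrying client would then sleep for `retryAfterMs(res.headers.get('Retry-After'))` before reissuing the request.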

Working With Viprasol

We implement rate limiting as part of API development and security hardening engagements, covering per-user, per-IP, and per-endpoint limits, tiered limits by subscription plan, distributed rate limiting with Redis, and infrastructure-level limiting with Nginx or AWS API Gateway.

→ Talk to our API team about rate limiting your application.



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
