
API Rate Limiting: Token Bucket, Sliding Window, and Production Implementation

Implement production API rate limiting with token bucket and sliding window algorithms. Covers Redis implementation, Nginx config, and per-user and per-endpoint limits.

Viprasol Tech Team
April 19, 2026
12 min read


Rate limiting is the control mechanism that prevents your API from being overwhelmed by a single client, whether that's an abusive user, a misconfigured client firing requests in a tight loop, or a DDoS attack. Without it, one bad actor can degrade service for everyone.

This guide covers the algorithms and production implementation patterns that handle millions of requests per day.


The Core Algorithms

Fixed Window Counter

The simplest algorithm: count requests per client per time window.

// Count requests in the current minute (the key embeds the window number)
const key = `rl:${clientId}:${Math.floor(Date.now() / 60_000)}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 60);  // Set the TTL only on first increment
const allowed = count <= 100;

The problem: at window boundaries, a client can fire 100 requests at 11:59:59 and 100 more at 12:00:00, putting 200 requests through in about two seconds. The fixed window allows 2× the intended burst.
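To see the boundary in numbers, here is a small sketch using the same `Math.floor(timestamp / 60_000)` keying as the snippet above (the timestamps are illustrative):

```typescript
// Two requests one second apart land in adjacent window keys,
// so each one gets a fresh 100-request budget
const t1 = Date.parse('2026-04-19T11:59:59Z');
const t2 = Date.parse('2026-04-19T12:00:00Z');

const window1 = Math.floor(t1 / 60_000);
const window2 = Math.floor(t2 / 60_000);

// window2 === window1 + 1: the counter resets despite only 1s elapsing
```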

Sliding Window Log

Track the timestamp of every request. Count requests in the rolling window.

const now = Date.now();
const windowStart = now - 60_000;
const key = `rl:log:${clientId}`;

const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, '-inf', windowStart);  // Remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`);   // Add this request
pipeline.zcard(key);                                   // Count in window
pipeline.expire(key, 60);

const results = await pipeline.exec();
const count = results![2][1] as number;
const allowed = count <= 100;

Accurate but memory-intensive at high throughput: every request stores a timestamp entry.
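A back-of-envelope estimate makes the cost concrete (the ~70 bytes per entry is an assumption covering the member string plus sorted-set overhead; real numbers vary by Redis version and encoding):

```typescript
// Memory held by ONE busy client under the sliding window log approach
const requestsPerSecond = 1_000;
const windowSeconds = 60;
const bytesPerEntry = 70; // assumed: member string + sorted-set overhead

const perClientBytes = requestsPerSecond * windowSeconds * bytesPerEntry;
// roughly 4.2 MB for a single client sustaining 1k req/s
```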

Sliding Window Counter (Recommended)

A hybrid approach: store the count for the current and previous window, then compute the weighted average. Accurate to within ~0.1% of the true sliding window, with O(1) storage per client.

// lib/rateLimiter.ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
  limit: number;
}

export async function slidingWindowCounter(
  clientId: string,
  windowMs: number,
  maxRequests: number
): Promise<RateLimitResult> {
  const now = Date.now();
  const windowSec = windowMs / 1000;
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;

  const currentKey = `rl:${clientId}:${currentWindow}`;
  const previousKey = `rl:${clientId}:${previousWindow}`;

  const [currentCount, previousCount] = await Promise.all([
    redis.get(currentKey).then(v => parseInt(v ?? '0', 10)),
    redis.get(previousKey).then(v => parseInt(v ?? '0', 10)),
  ]);

  // Weight previous window by how much of the current window has passed
  const elapsedInCurrentWindow = (now % windowMs) / windowMs;
  const weightedCount =
    previousCount * (1 - elapsedInCurrentWindow) + currentCount;

  const allowed = weightedCount < maxRequests;

  if (allowed) {
    const pipeline = redis.pipeline();
    pipeline.incr(currentKey);
    pipeline.expire(currentKey, Math.ceil(windowSec * 2));
    await pipeline.exec();
  }

  const remaining = Math.max(0, maxRequests - Math.ceil(weightedCount));
  const resetAt = (currentWindow + 1) * windowMs;

  return { allowed, remaining, resetAt, limit: maxRequests };
}
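The weighting itself is a pure calculation, worth seeing in isolation (a sketch of the same formula the function above uses):

```typescript
// Weighted count = previous window scaled by its unexpired fraction,
// plus everything counted so far in the current window
function weightedCount(
  previousCount: number,
  currentCount: number,
  elapsedFraction: number // how far into the current window we are, 0..1
): number {
  return previousCount * (1 - elapsedFraction) + currentCount;
}
```

For example, with 80 requests in the previous window, 30 so far in the current one, and the window half elapsed, the estimate is 80 × 0.5 + 30 = 70.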

Token Bucket (Best for Bursts)

A client accumulates tokens over time. Each request consumes a token. Allows short bursts while enforcing an average rate.

export async function tokenBucket(
  clientId: string,
  bucketCapacity: number,  // Max burst size
  refillRatePerSecond: number,
  tokensConsumed = 1
): Promise<RateLimitResult> {
  const now = Date.now() / 1000;  // Unix timestamp in seconds
  const key = `rl:bucket:${clientId}`;

  // Use a Lua script for atomic read-modify-write
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local tokens_requested = tonumber(ARGV[4])

    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now

    -- Add tokens based on elapsed time
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= tokens_requested then
      tokens = tokens - tokens_requested
      allowed = 1
    end

    redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)  -- HSET: HMSET is deprecated
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 60)

    return {allowed, math.floor(tokens)}
  `;

  const result = await redis.eval(
    luaScript,
    1,
    key,
    bucketCapacity.toString(),
    refillRatePerSecond.toString(),
    now.toString(),
    tokensConsumed.toString()
  ) as [number, number];

  return {
    allowed: result[0] === 1,
    remaining: result[1],
    resetAt: Date.now() + (1 / refillRatePerSecond) * 1000,
    limit: bucketCapacity,
  };
}
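The refill math in the Lua script can be mirrored as a pure TypeScript function, which is handy for unit-testing the bucket behavior without Redis (a sketch; the names here are illustrative, not part of the article's API):

```typescript
interface BucketState {
  tokens: number;
  lastRefill: number; // Unix timestamp in seconds
}

// Mirrors the Lua read-modify-write: refill by elapsed time, then try to consume
function refillAndConsume(
  state: BucketState,
  capacity: number,
  refillRatePerSecond: number,
  nowSeconds: number,
  tokensRequested = 1
): { allowed: boolean; state: BucketState } {
  const elapsed = nowSeconds - state.lastRefill;
  let tokens = Math.min(capacity, state.tokens + elapsed * refillRatePerSecond);

  let allowed = false;
  if (tokens >= tokensRequested) {
    tokens -= tokensRequested;
    allowed = true;
  }

  return { allowed, state: { tokens, lastRefill: nowSeconds } };
}
```

A full bucket of capacity 10 allows a 10-request burst at a single instant, then denies until the refill rate restores tokens.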

Middleware Implementation

// middleware/rateLimiter.ts
import { FastifyRequest, FastifyReply } from 'fastify';
import { slidingWindowCounter } from '@/lib/rateLimiter';

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
  keyFn?: (request: FastifyRequest) => string;
}

export function createRateLimiter(config: RateLimitConfig) {
  return async function rateLimitMiddleware(
    request: FastifyRequest,
    reply: FastifyReply
  ) {
    // Default key: per authenticated user, fallback to IP
    const clientId = config.keyFn
      ? config.keyFn(request)
      : (request.headers['x-user-id'] as string) ?? request.ip;

    const result = await slidingWindowCounter(
      clientId,
      config.windowMs,
      config.maxRequests
    );

    // Standard rate limit response headers
    reply.header('X-RateLimit-Limit', config.maxRequests);
    reply.header('X-RateLimit-Remaining', result.remaining);
    reply.header('X-RateLimit-Reset', Math.ceil(result.resetAt / 1000));
    reply.header('X-RateLimit-Policy', `${config.maxRequests};w=${config.windowMs / 1000}`);

    if (!result.allowed) {
      const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
      reply.header('Retry-After', retryAfter);
      return reply.code(429).send({
        error: 'Too Many Requests',
        message: `Rate limit exceeded. Retry after ${retryAfter} seconds.`,
        retryAfter,
      });
    }
  };
}

// Per-route rate limits
const globalLimit = createRateLimiter({ windowMs: 60_000, maxRequests: 100 });

const strictLimit = createRateLimiter({
  windowMs: 60_000,
  maxRequests: 5,
  keyFn: (req) => `auth:${req.ip}`,  // Per-IP for auth endpoints
});

// Apply different limits per route
app.post('/auth/login', { preHandler: strictLimit }, loginHandler);
app.post('/auth/forgot-password', { preHandler: strictLimit }, forgotPasswordHandler);
app.get('/api/*', { preHandler: globalLimit }, apiHandler);

๐ŸŒ Looking for a Dev Team That Actually Delivers?

Most agencies sell you a project manager and assign juniors. Viprasol is different โ€” senior engineers only, direct Slack access, and a 5.0โ˜… Upwork record across 100+ projects.

  • React, Next.js, Node.js, TypeScript โ€” production-grade stack
  • Fixed-price contracts โ€” no surprise invoices
  • Full source code ownership from day one
  • 90-day post-launch support included

Tiered Rate Limits by Plan

SaaS products often have different limits by subscription tier:

// middleware/tieredRateLimit.ts
const PLAN_LIMITS = {
  free: { requestsPerMinute: 30, requestsPerDay: 1_000 },
  starter: { requestsPerMinute: 100, requestsPerDay: 10_000 },
  pro: { requestsPerMinute: 500, requestsPerDay: 100_000 },
  enterprise: { requestsPerMinute: 2_000, requestsPerDay: 1_000_000 },
} as const;

export async function tieredRateLimitMiddleware(
  request: FastifyRequest,
  reply: FastifyReply
) {
  const userId = request.headers['x-user-id'] as string;
  if (!userId) return;  // Unauthenticated; handled by auth middleware

  // Cache plan lookup to avoid DB hit on every request
  const plan = await getCachedUserPlan(userId);
  const limits = PLAN_LIMITS[plan] ?? PLAN_LIMITS.free;

  // Check both per-minute and per-day limits
  const [minuteResult, dayResult] = await Promise.all([
    slidingWindowCounter(`${userId}:min`, 60_000, limits.requestsPerMinute),
    slidingWindowCounter(`${userId}:day`, 86_400_000, limits.requestsPerDay),
  ]);

  // Most restrictive limit wins
  const result = !minuteResult.allowed ? minuteResult : dayResult;

  reply.header('X-RateLimit-Limit-Minute', limits.requestsPerMinute);
  reply.header('X-RateLimit-Limit-Day', limits.requestsPerDay);
  reply.header('X-RateLimit-Remaining-Minute', minuteResult.remaining);
  reply.header('X-RateLimit-Remaining-Day', dayResult.remaining);
  reply.header('X-RateLimit-Plan', plan);

  if (!result.allowed) {
    const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
    reply.header('Retry-After', retryAfter);
    return reply.code(429).send({
      error: 'Rate limit exceeded',
      plan,
      retryAfter,
      upgradeUrl: 'https://yourapp.com/pricing',
    });
  }
}
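`getCachedUserPlan` is referenced above but not shown. One possible shape is a short-TTL in-process cache in front of the database lookup (a sketch; `fetchPlanFromDb` is a hypothetical stand-in for the real DB accessor):

```typescript
type Plan = 'free' | 'starter' | 'pro' | 'enterprise';

// Hypothetical stand-in for the real database lookup (not part of the article)
async function fetchPlanFromDb(userId: string): Promise<Plan> {
  // ... query the users table; hard-coded here so the sketch is self-contained
  return 'free';
}

// 60s TTL in-process cache: plan changes propagate within a minute,
// while the hot path avoids a DB round trip per request
const planCache = new Map<string, { plan: Plan; expiresAt: number }>();

export async function getCachedUserPlan(userId: string): Promise<Plan> {
  const cached = planCache.get(userId);
  if (cached && cached.expiresAt > Date.now()) return cached.plan;

  const plan = await fetchPlanFromDb(userId);
  planCache.set(userId, { plan, expiresAt: Date.now() + 60_000 });
  return plan;
}
```

A per-process `Map` is the simplest option; a shared Redis cache would keep plan data consistent across instances at the cost of an extra network hop.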

Nginx Rate Limiting (Infrastructure Level)

Rate limiting at the Nginx/load balancer level prevents traffic from ever reaching your application:

# nginx.conf
http {
  # Define rate limit zones
  # $binary_remote_addr uses 4 bytes (vs 15+ for $remote_addr)
  limit_req_zone $binary_remote_addr zone=ip_limit:10m rate=10r/s;
  limit_req_zone $http_x_api_key zone=api_key_limit:10m rate=100r/s;

  # Connection limits (separate from request rate)
  limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

  server {
    # Global IP rate limit โ€” 10 req/s, burst of 20, no delay
    limit_req zone=ip_limit burst=20 nodelay;
    limit_conn conn_limit 100;  # Max 100 concurrent connections per IP

    location /api/ {
      # Per-API-key limit โ€” allows legitimate high-volume clients
      limit_req zone=api_key_limit burst=200 nodelay;

      proxy_pass http://api_backend;
    }

    location /auth/ {
      # Stricter limit for auth endpoints
      limit_req zone=ip_limit burst=5 nodelay;
      limit_req_status 429;

      proxy_pass http://api_backend;
    }

    # Custom error page for 429
    error_page 429 /429.json;
    location = /429.json {
      internal;
      default_type application/json;
      # "always" is required for add_header to apply to non-2xx/3xx responses
      add_header Retry-After 60 always;
      return 429 '{"error":"Too Many Requests","retryAfter":60}';
    }
  }
}

🚀 Senior Engineers. No Junior Handoffs. Ever.

You get the senior developer, not a project manager who relays your requirements to someone you never meet. Every Viprasol project has a senior lead from kickoff to launch.

  • MVPs in 4–8 weeks, full platforms in 3–5 months
  • Lighthouse 90+ performance scores standard
  • Works across US, UK, AU timezones
  • Free 30-min architecture review, no commitment

Algorithm Comparison

Algorithm              | Memory      | Accuracy | Burst Handling              | Best For
Fixed window           | O(1)        | Medium   | Allows 2× burst at boundary | Simple use cases
Sliding window log     | O(requests) | High     | Exact                       | Low-volume, strict accuracy
Sliding window counter | O(1)        | High     | Accurate                    | Most production APIs
Token bucket           | O(1)        | High     | Explicit burst capacity     | APIs with legitimate burst needs
Leaky bucket           | O(1)        | High     | Smooths all bursts          | Strictly smooth output rate

Rate Limit Header Standards

Follow the IETF draft standard for rate limit headers:

RateLimit-Limit: 100           # Requests allowed in the window
RateLimit-Remaining: 75        # Requests remaining
RateLimit-Reset: 1714428600    # Unix timestamp when limit resets

X-RateLimit-Limit: 100         # Also send X- prefixed variants for compatibility
X-RateLimit-Remaining: 75
X-RateLimit-Reset: 1714428600
Retry-After: 30                # On 429: seconds until client can retry
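On the client side, `Retry-After` may be either delta-seconds or an HTTP-date (per RFC 9110), so a well-behaved client should handle both. A small parsing sketch (the 1-second fallback when the header is absent or unparseable is an assumption, not part of any standard):

```typescript
// Convert a Retry-After header value into a wait time in milliseconds
function retryAfterMs(header: string | null, now = Date.now()): number {
  if (!header) return 1_000; // assumed default when the header is absent

  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1_000); // "30" form

  const date = Date.parse(header); // "Wed, 21 Oct 2026 07:28:30 GMT" form
  return Number.isNaN(date) ? 1_000 : Math.max(0, date - now);
}
```

A retrying client would then sleep for `retryAfterMs(res.headers.get('Retry-After'))` before reissuing the request.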

Working With Viprasol

We implement rate limiting as part of API development and security hardening engagements, covering per-user, per-IP, and per-endpoint limits, tiered limits by subscription plan, distributed rate limiting with Redis, and infrastructure-level limiting with Nginx or AWS API Gateway.

→ Talk to our API team about rate limiting your application.



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
