SaaS API Rate Limiting: Token Bucket, Sliding Window, Per-Plan Limits, and Stripe-Style Headers
Build production API rate limiting for SaaS with token bucket and sliding window algorithms, per-plan tier limits, Redis atomic Lua scripts, and Stripe-style rate limit response headers.
API rate limiting is how you prevent one customer from ruining the experience for everyone else. Without it, a single misconfigured script can saturate your database, spike your bill, and degrade service for your entire customer base. With it properly implemented, you protect infrastructure, enforce pricing tiers, and give API consumers the feedback they need to behave well.
This guide covers the two algorithms used in production SaaS systems, how to implement them atomically in Redis, and how to build the per-plan limit system your pricing page will need.
Choosing the Right Algorithm
Token Bucket
Bucket holds N tokens (capacity)
Every T seconds, add R tokens (refill rate)
Each request consumes 1 token
If the bucket is empty → 429
Strengths: Allows burst up to bucket capacity while enforcing average rate. Natural model for API consumers who occasionally burst.
Weakness: A full bucket lets a client fire up to `capacity` requests at once, so downstream services must tolerate bursts even though the average rate holds.
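The mechanics above can be sketched in a few lines of in-process TypeScript (class name and API are illustrative; the production version later in this guide lives in Redis so limits hold across server instances):

```typescript
// Minimal in-memory token bucket. Single-process only; the Redis version
// later in this guide is what you want once you run more than one server.
class InMemoryTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // bucket starts full
    this.lastRefill = now;
  }

  tryConsume(cost = 1, now: number = Date.now()): boolean {
    // Continuous refill: add tokens for elapsed time, capped at capacity
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens < cost) return false; // not enough tokens: reject
    this.tokens -= cost;
    return true;
  }
}
```

A bucket of capacity 3 refilling 1 token/second allows a burst of 3, then sustains one request per second.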
Sliding Window Counter
Track request count in a rolling window (e.g., last 60 seconds)
New request: count requests in [now - window, now]
If count >= limit → 429
Strengths: Smoothly enforced rate with no refill cycle to game. More accurate than fixed window (which allows 2× burst at window boundaries).
Weakness: Slightly more complex to implement correctly.
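A minimal in-process sketch of the sliding window log (illustrative only; the Redis sorted-set version below is the production shape):

```typescript
// Minimal in-memory sliding window log: one timestamp per request.
class InMemorySlidingWindow {
  private timestamps: number[] = [];

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  tryRequest(now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Evict entries older than the window (ZREMRANGEBYSCORE in Redis)
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Because the window slides continuously, a request rejected now becomes admissible exactly when the oldest logged request ages out.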
Fixed Window Counter
Simpler but has the boundary problem: a user making 100 requests at 11:59:50 and 100 more at 12:00:05 effectively makes 200 requests within 15 seconds against a "100 per minute" limit.
Use sliding window for anything customer-facing. Fixed window for internal limits where precision matters less.
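The boundary problem is easy to demonstrate with a toy fixed-window counter (illustrative code):

```typescript
// Fixed-window counter keyed by window index. A burst straddling the
// boundary can pass 2x the limit within a short real-time span.
class FixedWindowCounter {
  private counts = new Map<number, number>();

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  tryRequest(now: number = Date.now()): boolean {
    const windowIndex = Math.floor(now / this.windowMs);
    const count = this.counts.get(windowIndex) ?? 0;
    if (count >= this.limit) return false;
    this.counts.set(windowIndex, count + 1);
    return true;
  }
}
```

With a limit of 2 per 1,000 ms, requests at t = 990, 995, 1005, and 1010 all pass: four requests in 20 ms against a "2 per second" limit.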
Redis Lua Scripts: Atomic Rate Limiting
Rate limit checks must be atomic: checking the count and incrementing it must happen in one operation, or two concurrent requests can both pass a limit that should only allow one.
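To see why, here is the failure mode simulated in-process (hypothetical code standing in for a non-atomic GET-then-SET against Redis):

```typescript
// Simulates the check-then-increment race: both requests read the same
// count before either writes, so both pass a limit that allows only one.
async function raceDemo(): Promise<number> {
  let count = 0; // stands in for the value stored in Redis
  const limit = 1;

  const tryRequest = async (): Promise<boolean> => {
    const current = count;                       // non-atomic GET
    await new Promise((r) => setTimeout(r, 10)); // network round-trip window
    if (current >= limit) return false;          // stale check
    count = current + 1;                         // lost-update SET
    return true;
  };

  const results = await Promise.all([tryRequest(), tryRequest()]);
  return results.filter(Boolean).length; // how many requests were admitted
}
```

Both requests are admitted even though the limit is 1. A Lua script closes this window because Redis executes the entire script atomically, with no interleaved commands.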
Sliding Window in Lua
-- sliding-window.lua
-- KEYS[1] = rate limit key (e.g., "ratelimit:user:123:api_calls")
-- ARGV[1] = current timestamp (milliseconds)
-- ARGV[2] = window size (milliseconds)
-- ARGV[3] = limit (max requests per window)
-- ARGV[4] = unique request ID (for ZADD dedup)
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local req_id = ARGV[4]
local window_start = now - window
-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
-- Count current requests in window
local current = redis.call('ZCARD', key)
if current >= limit then
-- Get the oldest request timestamp to calculate retry-after
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retry_after_ms = window - (now - tonumber(oldest[2]))
return {0, current, limit, retry_after_ms}
end
-- Add this request
redis.call('ZADD', key, now, req_id)
redis.call('PEXPIRE', key, window)
-- Return: {allowed, current_count, limit, retry_after_ms}
return {1, current + 1, limit, 0}
Token Bucket in Lua
-- token-bucket.lua
-- KEYS[1] = bucket key
-- ARGV[1] = current timestamp (milliseconds)
-- ARGV[2] = bucket capacity
-- ARGV[3] = refill rate (tokens per second)
-- ARGV[4] = cost (tokens to consume, usually 1)
local key = KEYS[1]
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refill_rate = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Calculate tokens added since last request
local elapsed_seconds = (now - last_refill) / 1000
local tokens_to_add = elapsed_seconds * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)
if tokens < cost then
-- Not enough tokens: calculate wait until enough have refilled
local tokens_needed = cost - tokens
local wait_seconds = tokens_needed / refill_rate
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {0, math.floor(tokens), capacity, math.ceil(wait_seconds * 1000)}
end
tokens = tokens - cost
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {1, math.floor(tokens), capacity, 0}
TypeScript Rate Limiter Class
// lib/rate-limiter.ts
import { createClient } from "redis";
import crypto from "crypto";
type RateLimitResult = {
allowed: boolean;
current: number;
limit: number;
retryAfterMs: number;
resetAt: Date;
};
export class RateLimiter {
private client: ReturnType<typeof createClient>;
private slidingWindowSha: string | null = null;
private tokenBucketSha: string | null = null;
constructor(client: ReturnType<typeof createClient>) {
this.client = client;
}
private async loadScript(script: string): Promise<string> {
return this.client.scriptLoad(script);
}
async slidingWindow(options: {
key: string;
limit: number;
windowMs: number;
}): Promise<RateLimitResult> {
const { key, limit, windowMs } = options;
const now = Date.now();
const requestId = crypto.randomUUID();
const script = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local req_id = ARGV[4]
local window_start = now - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
local current = redis.call('ZCARD', key)
if current >= limit then
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retry_after_ms = window - (now - tonumber(oldest[2]))
return {0, current, limit, retry_after_ms}
end
redis.call('ZADD', key, now, req_id)
redis.call('PEXPIRE', key, window)
return {1, current + 1, limit, 0}
`;
try {
if (!this.slidingWindowSha) {
this.slidingWindowSha = await this.loadScript(script);
}
const result = await this.client.evalSha(this.slidingWindowSha, {
keys: [key],
arguments: [String(now), String(windowMs), String(limit), requestId],
}) as number[];
return {
allowed: result[0] === 1,
current: result[1],
limit: result[2],
retryAfterMs: result[3],
resetAt: new Date(now + windowMs),
};
} catch (err) {
if ((err as Error).message?.includes("NOSCRIPT")) {
this.slidingWindowSha = null;
return this.slidingWindow(options);
}
throw err;
}
}
async tokenBucket(options: {
key: string;
capacity: number;
refillRatePerSecond: number;
cost?: number;
}): Promise<RateLimitResult> {
const { key, capacity, refillRatePerSecond, cost = 1 } = options;
const now = Date.now();
const script = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refill_rate = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
local elapsed_seconds = (now - last_refill) / 1000
local tokens_to_add = elapsed_seconds * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)
if tokens < cost then
local tokens_needed = cost - tokens
local wait_seconds = tokens_needed / refill_rate
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {0, math.floor(tokens), capacity, math.ceil(wait_seconds * 1000)}
end
tokens = tokens - cost
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {1, math.floor(tokens), capacity, 0}
`;
try {
if (!this.tokenBucketSha) {
this.tokenBucketSha = await this.loadScript(script);
}
const result = await this.client.evalSha(this.tokenBucketSha, {
keys: [key],
arguments: [
String(now),
String(capacity),
String(refillRatePerSecond),
String(cost),
],
}) as number[];
return {
allowed: result[0] === 1,
current: result[1],
limit: result[2],
retryAfterMs: result[3],
resetAt: new Date(now + (result[3] || 1000)),
};
} catch (err) {
if ((err as Error).message?.includes("NOSCRIPT")) {
this.tokenBucketSha = null;
return this.tokenBucket(options);
}
throw err;
}
}
}
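The class above maps the Lua reply array into a `RateLimitResult`. That mapping can be unit-tested without a live Redis; the helper below (name illustrative) mirrors what `slidingWindow` does with the raw reply:

```typescript
// Mirrors the mapping RateLimiter.slidingWindow performs on the raw
// Lua reply {allowed, current, limit, retryAfterMs}.
type RawReply = [number, number, number, number];

function toRateLimitResult(raw: RawReply, now: number, windowMs: number) {
  const [allowed, current, limit, retryAfterMs] = raw;
  return {
    allowed: allowed === 1,
    current,
    limit,
    retryAfterMs,
    resetAt: new Date(now + windowMs), // approximation: window fully resets then
  };
}
```

Keeping this translation in one place means the Lua scripts can change their reply shape without edits scattered across call sites.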
Per-Plan Rate Limits
// lib/rate-limits/plans.ts
export type Plan = "free" | "starter" | "professional" | "enterprise";
interface PlanLimits {
requestsPerMinute: number;
requestsPerHour: number;
requestsPerDay: number;
burstCapacity: number; // token bucket size
burstRefillPerSecond: number; // token bucket refill rate
concurrentRequests: number;
}
export const PLAN_LIMITS: Record<Plan, PlanLimits> = {
free: {
requestsPerMinute: 10,
requestsPerHour: 100,
requestsPerDay: 500,
burstCapacity: 20,
burstRefillPerSecond: 0.167, // 10/min
concurrentRequests: 2,
},
starter: {
requestsPerMinute: 60,
requestsPerHour: 1_000,
requestsPerDay: 10_000,
burstCapacity: 100,
burstRefillPerSecond: 1,
concurrentRequests: 5,
},
professional: {
requestsPerMinute: 300,
requestsPerHour: 10_000,
requestsPerDay: 100_000,
burstCapacity: 500,
burstRefillPerSecond: 5,
concurrentRequests: 20,
},
enterprise: {
requestsPerMinute: 3_000,
requestsPerHour: 100_000,
requestsPerDay: 1_000_000,
burstCapacity: 5_000,
burstRefillPerSecond: 50,
concurrentRequests: 100,
},
};
// Endpoint-specific multipliers (some endpoints cost more)
export const ENDPOINT_COSTS: Record<string, number> = {
"/api/ai/generate": 10, // 10× more expensive
"/api/reports/export": 5,
"/api/bulk/import": 5,
"/api/search": 2,
"/api/webhooks": 1,
};
export function getEndpointCost(pathname: string): number {
for (const [pattern, cost] of Object.entries(ENDPOINT_COSTS)) {
if (pathname.startsWith(pattern)) return cost;
}
return 1;
}
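These numbers compose: an `/api/ai/generate` call costs 10 tokens, so on the free plan (refill 0.167 tokens/second) a drained burst bucket needs roughly a minute before the next AI call is affordable. The arithmetic mirrors the wait calculation in the token bucket Lua script (function name is illustrative):

```typescript
// wait_seconds = (cost - tokens) / refill_rate, as in the Lua script.
function secondsUntilAffordable(
  tokens: number,
  cost: number,
  refillPerSecond: number,
): number {
  if (tokens >= cost) return 0; // affordable now
  return (cost - tokens) / refillPerSecond;
}

// Free plan after a burst: 0 tokens left, AI endpoint costs 10,
// refill is 0.167 tokens/second, so roughly 60 seconds to recover.
```

Running this kind of arithmetic against your plan table before shipping catches configurations where an expensive endpoint is effectively unusable on a low tier.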
Next.js Middleware Rate Limiting
One caveat before the code: Next.js middleware runs on the Edge runtime by default, where Prisma and the node-redis client are not available. Treat this as the shape of the logic; in practice you would use edge-compatible clients (for example an HTTP-based Redis client) or move the checks into route handlers running on the Node.js runtime.
// middleware.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { RateLimiter } from "@/lib/rate-limiter";
import { PLAN_LIMITS, getEndpointCost } from "@/lib/rate-limits/plans";
import { redis } from "@/lib/redis";
import { prisma } from "@/lib/prisma";
import { cache } from "react";
import type { Plan } from "@/lib/rate-limits/plans";
const rateLimiter = new RateLimiter(redis);
// React's cache() dedupes plan lookups within a single request; it does not cache across requests
const getUserPlan = cache(async (userId: string): Promise<Plan> => {
const subscription = await prisma.subscription.findFirst({
where: { userId, status: "active" },
select: { plan: true },
orderBy: { createdAt: "desc" },
});
return (subscription?.plan as Plan) ?? "free";
});
function rateLimit429Response(result: {
current: number;
limit: number;
retryAfterMs: number;
resetAt: Date;
}) {
return NextResponse.json(
{
error: "Too Many Requests",
message: `Rate limit exceeded. Retry after ${Math.ceil(result.retryAfterMs / 1000)} seconds.`,
limit: result.limit,
current: result.current,
retryAfter: Math.ceil(result.retryAfterMs / 1000),
},
{
status: 429,
headers: buildRateLimitHeaders(result),
}
);
}
function buildRateLimitHeaders(result: {
current: number;
limit: number;
retryAfterMs: number;
resetAt: Date;
}): Record<string, string> {
return {
"X-RateLimit-Limit": String(result.limit),
"X-RateLimit-Remaining": String(Math.max(0, result.limit - result.current)),
"X-RateLimit-Reset": String(Math.floor(result.resetAt.getTime() / 1000)),
"X-RateLimit-Reset-After": String(Math.ceil(result.retryAfterMs / 1000)),
"Retry-After": String(Math.ceil(result.retryAfterMs / 1000)),
};
}
export async function middleware(req: NextRequest) {
// Only rate limit API routes
if (!req.nextUrl.pathname.startsWith("/api/")) {
return NextResponse.next();
}
// Skip auth endpoints
if (req.nextUrl.pathname.startsWith("/api/auth")) {
return NextResponse.next();
}
const session = await auth();
// Unauthenticated API requests: strict IP-based limiting
if (!session?.user) {
const ip =
req.headers.get("cf-connecting-ip") ??
req.headers.get("x-forwarded-for")?.split(",")[0].trim() ??
"unknown";
const result = await rateLimiter.slidingWindow({
key: `ratelimit:ip:${ip}`,
limit: 20,
windowMs: 60_000, // 20 per minute for unauthenticated
});
if (!result.allowed) {
return rateLimit429Response(result);
}
const response = NextResponse.next();
Object.entries(buildRateLimitHeaders(result)).forEach(([k, v]) =>
response.headers.set(k, v)
);
return response;
}
const userId = session.user.id;
const plan = await getUserPlan(userId);
const limits = PLAN_LIMITS[plan];
const endpointCost = getEndpointCost(req.nextUrl.pathname);
// Layer 1: Per-minute sliding window
const minuteResult = await rateLimiter.slidingWindow({
key: `ratelimit:${userId}:minute`,
limit: limits.requestsPerMinute,
windowMs: 60_000,
});
if (!minuteResult.allowed) {
return rateLimit429Response(minuteResult);
}
// Layer 2: Per-hour sliding window
const hourResult = await rateLimiter.slidingWindow({
key: `ratelimit:${userId}:hour`,
limit: limits.requestsPerHour,
windowMs: 3_600_000,
});
if (!hourResult.allowed) {
return rateLimit429Response(hourResult);
}
// Layer 3: Token bucket for burst control (endpoint-cost-aware)
const burstResult = await rateLimiter.tokenBucket({
key: `ratelimit:${userId}:burst`,
capacity: limits.burstCapacity,
refillRatePerSecond: limits.burstRefillPerSecond,
cost: endpointCost,
});
if (!burstResult.allowed) {
return rateLimit429Response(burstResult);
}
// All checks passed; add headers to response
const response = NextResponse.next();
const headers = buildRateLimitHeaders(minuteResult);
// Expose remaining burst tokens (the token bucket returns remaining tokens in `current`)
headers["X-RateLimit-Burst-Remaining"] = String(burstResult.current);
headers["X-RateLimit-Plan"] = plan;
Object.entries(headers).forEach(([k, v]) => response.headers.set(k, v));
return response;
}
export const config = {
matcher: ["/api/:path*"],
};
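On the consumer side, a well-behaved client honors these headers. A sketch with an injectable fetch-like function (names are illustrative) so the backoff logic can be exercised without a network:

```typescript
// Retries 429 responses after the number of seconds in Retry-After.
type MinimalResponse = { status: number; headers: Map<string, string> };
type FetchLike = (url: string) => Promise<MinimalResponse>;

async function fetchWithRetry(
  url: string,
  doFetch: FetchLike,
  maxRetries = 3,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch(url);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfterSec = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((r) => setTimeout(r, retryAfterSec * 1000));
  }
}
```

Publishing a snippet like this in your API docs is the cheapest way to teach consumers to back off instead of hammering a 429.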
Stripe-Style Rate Limit Response Headers
Following Stripe's API pattern makes your API developer-friendly:
// Stripe-style headers your API consumers will see:
// X-RateLimit-Limit: 300 → limit for current window
// X-RateLimit-Remaining: 247 → requests left in window
// X-RateLimit-Reset: 1740000060 → Unix timestamp when window resets
// X-RateLimit-Reset-After: 42 → seconds until window resets
// Retry-After: 42 → seconds to wait before retrying (on 429)
// Example 429 response body (Stripe-style):
{
"error": {
"type": "rate_limit_error",
"code": "rate_limited",
"message": "Too many requests made to the API too quickly.",
"param": null,
"doc_url": "https://yourapp.com/docs/api/rate-limits"
}
}
// lib/api/errors.ts - standardized error response format
export function rateLimitErrorResponse(details: {
limit: number;
current: number;
retryAfterMs: number;
resetAt: Date;
plan: string;
}) {
return {
body: {
error: {
type: "rate_limit_error",
code: "rate_limited",
message: `API rate limit exceeded for ${details.plan} plan. Upgrade for higher limits.`,
doc_url: "https://yourapp.com/docs/api/rate-limits",
},
},
headers: {
"X-RateLimit-Limit": String(details.limit),
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": String(Math.floor(details.resetAt.getTime() / 1000)),
"X-RateLimit-Reset-After": String(Math.ceil(details.retryAfterMs / 1000)),
"X-RateLimit-Plan": details.plan,
"Retry-After": String(Math.ceil(details.retryAfterMs / 1000)),
},
status: 429,
};
}
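Consumers can branch on the envelope's `type` field. A small type guard (name illustrative) keeps that check in one place:

```typescript
// Narrow an unknown response body to the rate-limit error envelope.
type RateLimitErrorBody = {
  error: { type: string; code: string; message: string; doc_url?: string };
};

function isRateLimitError(body: unknown): body is RateLimitErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: { type?: unknown } }).error;
  return err?.type === "rate_limit_error";
}
```

A stable, machine-readable `type`/`code` pair is what lets client SDKs retry rate-limit errors automatically while surfacing other errors to the caller.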
Rate Limit Analytics
Track limit violations to understand upgrade pressure:
-- Create table to log rate limit hits
CREATE TABLE rate_limit_events (
id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL,
workspace_id UUID,
plan TEXT NOT NULL,
endpoint TEXT NOT NULL,
limit_type TEXT NOT NULL, -- 'minute', 'hour', 'day', 'burst'
limit_value INTEGER NOT NULL,
current_value INTEGER NOT NULL,
ip_address INET,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_rle_user ON rate_limit_events(user_id, occurred_at DESC);
CREATE INDEX idx_rle_occurred ON rate_limit_events(occurred_at DESC);
-- Most rate-limited users (upgrade candidates)
SELECT
user_id,
plan,
COUNT(*) AS limit_hits,
COUNT(DISTINCT DATE(occurred_at)) AS days_hitting_limits,
MAX(occurred_at) AS last_hit,
MODE() WITHIN GROUP (ORDER BY endpoint) AS most_limited_endpoint
FROM rate_limit_events
WHERE occurred_at > NOW() - INTERVAL '30 days'
GROUP BY user_id, plan
HAVING COUNT(*) > 10
ORDER BY limit_hits DESC
LIMIT 50;
// Log rate limit violations (non-blocking)
async function logRateLimitViolation(params: {
userId: string;
plan: string;
endpoint: string;
limitType: string;
limit: number;
current: number;
}) {
prisma.rateLimitEvent
.create({
data: {
userId: params.userId,
plan: params.plan,
endpoint: params.endpoint,
limitType: params.limitType,
limitValue: params.limit,
currentValue: params.current,
},
})
.catch((err) => console.error("Failed to log rate limit event:", err));
}
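The SQL above answers the question offline; the same aggregation in-process (function name illustrative) is handy for dashboards or tests:

```typescript
// Users who hit limits more than `threshold` times, sorted by hit count;
// mirrors the SQL upgrade-candidates query.
type LimitEvent = { userId: string; plan: string; endpoint: string };

function upgradeCandidates(
  events: LimitEvent[],
  threshold: number,
): { userId: string; hits: number }[] {
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.userId, (counts.get(e.userId) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, hits]) => hits > threshold)
    .sort((a, b) => b[1] - a[1])
    .map(([userId, hits]) => ({ userId, hits }));
}
```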
Cost and Timeline Estimates
| Scope | Team | Timeline | Cost Range |
|---|---|---|---|
| Basic IP rate limiting (in-memory) | 1 dev | 0.5 days | $100–300 |
| Redis sliding window, single tier | 1 dev | 1–2 days | $400–800 |
| Multi-tier per-plan with burst control | 1 dev | 3–5 days | $1,000–2,000 |
| Full system (per-plan + analytics + headers + docs) | 1–2 devs | 1–2 weeks | $2,500–6,000 |
| API gateway managed rate limiting (AWS/Kong) | 1 dev | 2–3 days | $600–1,500 |
Redis running costs: A Redis cluster handling 10,000 rate limit operations/second costs ~$50–200/month (ElastiCache t3.small–medium).
See Also
- Redis Advanced Patterns: Caching, Pub/Sub, and Streams
- SaaS Usage-Based Billing with Stripe Meters
- SaaS Webhook System with Delivery Guarantees
- Next.js Middleware for Auth and Routing
- AWS API Gateway Authentication Patterns
Working With Viprasol
Rate limiting that works correctly is deceptively complex: the edge cases (concurrent requests, Redis script expiry, per-endpoint cost accounting) only surface under load. Our team has built rate limiting systems for SaaS APIs handling millions of requests per day, with the analytics infrastructure to turn limit violations into upgrade conversations.
What we deliver:
- Atomic Redis Lua scripts for sliding window and token bucket
- Per-plan limit configuration aligned to your pricing tiers
- Stripe-compatible response headers with full RFC 6585 compliance
- Rate limit violation logging and upgrade-pressure analytics
- Load testing to validate limits under realistic traffic
Talk to our team about your API infrastructure →
Or explore our SaaS development services to see how we build production-grade products.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.