
SaaS API Rate Limiting: Token Bucket, Sliding Window, Per-Plan Limits, and Stripe-Style Headers

Build production API rate limiting for SaaS with token bucket and sliding window algorithms, per-plan tier limits, Redis atomic Lua scripts, and Stripe-style rate limit response headers.

Viprasol Tech Team
March 6, 2027
13 min read

API rate limiting is how you prevent one customer from ruining the experience for everyone else. Without it, a single misconfigured script can saturate your database, spike your bill, and degrade service for your entire customer base. Implemented properly, rate limiting protects your infrastructure, enforces your pricing tiers, and gives API consumers the feedback they need to behave well.

This guide covers the two algorithms used in production SaaS systems, how to implement them atomically in Redis, and how to build the per-plan limit system your pricing page will need.

Choosing the Right Algorithm

Token Bucket

Bucket holds N tokens (capacity)
Every T seconds, add R tokens (refill rate)
Each request consumes 1 token
If bucket is empty → 429

Strengths: Allows bursts up to bucket capacity while enforcing an average rate. A natural model for API consumers who occasionally burst.

Weakness: a client that sits idle accumulates a full bucket, so downstream systems must still absorb bursts up to capacity.
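As a reference point, the refill-and-consume loop can be sketched in a few lines of TypeScript. This is an in-memory illustration with naming of our own choosing, not the Redis-backed version built later in this article:

```typescript
// In-memory token bucket sketch (illustrative; class and method names are ours)
class InMemoryTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Refill based on elapsed time, then try to spend `cost` tokens
  tryConsume(cost = 1, now: number = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

Per-process state like this breaks down the moment you run more than one server, which is why the production version lives in Redis.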

Sliding Window Counter

Track request count in a rolling window (e.g., last 60 seconds)
New request: count requests in [now - window, now]
If count >= limit → 429

Strengths: Smoothly enforced rate, with no refill cycle to game. More accurate than fixed window (which allows 2× bursts at window boundaries).

Weakness: Slightly more complex to implement correctly.
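The core of the algorithm also fits in a few lines of TypeScript. Again this is an in-memory sketch with our own naming, not the production Redis version:

```typescript
// In-memory sliding window counter (illustrative sketch)
class InMemorySlidingWindow {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  tryRequest(now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Drop entries that have aged out of the rolling window
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Note that the window genuinely slides: a request rejected at t=200ms can succeed once the oldest entries age past the window edge, with no cliff at any fixed boundary.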

Fixed Window Counter

Simpler but has the boundary problem: a user making 100 requests at 11:59:50 and 100 more at 12:00:05 effectively makes 200 requests within 15 seconds against a "100 per minute" limit.

Use sliding window for anything customer-facing; fixed window is acceptable for internal limits where precision matters less.
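The boundary problem is a direct consequence of how fixed windows are keyed. A hypothetical key function (our own sketch) makes it concrete:

```typescript
// Fixed window keying: every request in the same window index shares one counter.
// Two requests 15 seconds apart can land in different windows, each with a fresh limit.
function fixedWindowKey(userId: string, nowMs: number, windowMs: number): string {
  const windowIndex = Math.floor(nowMs / windowMs);
  return `ratelimit:${userId}:${windowIndex}`;
}
```

Requests at 59.95s and 60.005s map to window indices 0 and 1, so each gets its own full quota despite being 55ms apart.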

Redis Lua Scripts: Atomic Rate Limiting

Rate limit checks must be atomic: checking the count and incrementing it must happen in one operation, or two concurrent requests can both pass a limit that should only allow one.
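To see why, here is a deterministic simulation of the check-then-increment race (our own illustration; real concurrency interleaves the same way, just nondeterministically):

```typescript
// Both "requests" read the counter before either writes it, so each decides
// against a stale count -- and both pass a limit that should allow only one
function simulateNonAtomicCheck(limit: number): [boolean, boolean] {
  let count = 0;
  const readA = count; // request A checks the counter
  const readB = count; // request B checks before A has incremented
  const allowedA = readA < limit;
  if (allowedA) count++;
  const allowedB = readB < limit;
  if (allowedB) count++;
  return [allowedA, allowedB];
}
```

A Lua script executed by Redis runs as a single atomic unit, which removes this interleaving entirely.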

Sliding Window in Lua

-- sliding-window.lua
-- KEYS[1] = rate limit key (e.g., "ratelimit:user:123:api_calls")
-- ARGV[1] = current timestamp (milliseconds)
-- ARGV[2] = window size (milliseconds)
-- ARGV[3] = limit (max requests per window)
-- ARGV[4] = unique request ID (for ZADD dedup)

local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local req_id = ARGV[4]
local window_start = now - window

-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

-- Count current requests in window
local current = redis.call('ZCARD', key)

if current >= limit then
  -- Get the oldest request timestamp to calculate retry-after
  local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
  local retry_after_ms = window - (now - tonumber(oldest[2]))
  return {0, current, limit, retry_after_ms}
end

-- Add this request
redis.call('ZADD', key, now, req_id)
redis.call('PEXPIRE', key, window)

-- Return: {allowed, current_count, limit, retry_after_ms}
return {1, current + 1, limit, 0}
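The retry-after arithmetic on the rejection path is easy to sanity-check outside Redis. Here is the same expression as a standalone TypeScript function (our own naming):

```typescript
// Milliseconds until the oldest entry ages out of the window,
// i.e. until one slot frees up for the rejected caller
function slidingRetryAfterMs(windowMs: number, nowMs: number, oldestMs: number): number {
  return windowMs - (nowMs - oldestMs);
}
```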

Token Bucket in Lua

-- token-bucket.lua
-- KEYS[1] = bucket key
-- ARGV[1] = current timestamp (milliseconds)
-- ARGV[2] = bucket capacity
-- ARGV[3] = refill rate (tokens per second)
-- ARGV[4] = cost (tokens to consume, usually 1)

local key = KEYS[1]
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refill_rate = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Calculate tokens added since last request
local elapsed_seconds = (now - last_refill) / 1000
local tokens_to_add = elapsed_seconds * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)

if tokens < cost then
  -- Not enough tokens; calculate time until refill
  local tokens_needed = cost - tokens
  local wait_seconds = tokens_needed / refill_rate
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
  return {0, math.floor(tokens), capacity, math.ceil(wait_seconds * 1000)}
end

tokens = tokens - cost
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))

return {1, math.floor(tokens), capacity, 0}
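As with the sliding window, the wait-time arithmetic on the rejection path can be mirrored in TypeScript for a quick sanity check (standalone sketch, our naming):

```typescript
// Time until enough tokens refill to cover the request's cost
function waitMsForTokens(tokens: number, cost: number, refillPerSecond: number): number {
  const tokensNeeded = cost - tokens;
  return Math.ceil((tokensNeeded / refillPerSecond) * 1000);
}
```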

🚀 SaaS MVP in 8 Weeks. Seriously.

We have launched 50+ SaaS platforms. Multi-tenant architecture, Stripe billing, auth, role-based access, and cloud deployment, all handled by one senior team.

  • Week 1–2: Architecture design + wireframes
  • Week 3–6: Core features built + tested
  • Week 7–8: Launch-ready on AWS/Vercel with CI/CD
  • Post-launch: Maintenance plans from month 3

TypeScript Rate Limiter Class

// lib/rate-limiter.ts
import { createClient } from "redis";
import crypto from "crypto";

type RateLimitResult = {
  allowed: boolean;
  current: number;
  limit: number;
  retryAfterMs: number;
  resetAt: Date;
};

export class RateLimiter {
  private client: ReturnType<typeof createClient>;
  private slidingWindowSha: string | null = null;
  private tokenBucketSha: string | null = null;

  constructor(client: ReturnType<typeof createClient>) {
    this.client = client;
  }

  private async loadScript(script: string): Promise<string> {
    return this.client.scriptLoad(script);
  }

  async slidingWindow(options: {
    key: string;
    limit: number;
    windowMs: number;
  }): Promise<RateLimitResult> {
    const { key, limit, windowMs } = options;
    const now = Date.now();
    const requestId = crypto.randomUUID();

    const script = `
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local window = tonumber(ARGV[2])
      local limit = tonumber(ARGV[3])
      local req_id = ARGV[4]
      local window_start = now - window
      redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
      local current = redis.call('ZCARD', key)
      if current >= limit then
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retry_after_ms = window - (now - tonumber(oldest[2]))
        return {0, current, limit, retry_after_ms}
      end
      redis.call('ZADD', key, now, req_id)
      redis.call('PEXPIRE', key, window)
      return {1, current + 1, limit, 0}
    `;

    try {
      if (!this.slidingWindowSha) {
        this.slidingWindowSha = await this.loadScript(script);
      }

      const result = await this.client.evalSha(this.slidingWindowSha, {
        keys: [key],
        arguments: [String(now), String(windowMs), String(limit), requestId],
      }) as number[];

      return {
        allowed: result[0] === 1,
        current: result[1],
        limit: result[2],
        retryAfterMs: result[3],
        resetAt: new Date(now + windowMs),
      };
    } catch (err) {
      if ((err as Error).message?.includes("NOSCRIPT")) {
        this.slidingWindowSha = null;
        return this.slidingWindow(options);
      }
      throw err;
    }
  }

  async tokenBucket(options: {
    key: string;
    capacity: number;
    refillRatePerSecond: number;
    cost?: number;
  }): Promise<RateLimitResult> {
    const { key, capacity, refillRatePerSecond, cost = 1 } = options;
    const now = Date.now();

    const script = `
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local capacity = tonumber(ARGV[2])
      local refill_rate = tonumber(ARGV[3])
      local cost = tonumber(ARGV[4])
      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or capacity
      local last_refill = tonumber(bucket[2]) or now
      local elapsed_seconds = (now - last_refill) / 1000
      local tokens_to_add = elapsed_seconds * refill_rate
      tokens = math.min(capacity, tokens + tokens_to_add)
      if tokens < cost then
        local tokens_needed = cost - tokens
        local wait_seconds = tokens_needed / refill_rate
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
        return {0, math.floor(tokens), capacity, math.ceil(wait_seconds * 1000)}
      end
      tokens = tokens - cost
      redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
      return {1, math.floor(tokens), capacity, 0}
    `;

    try {
      if (!this.tokenBucketSha) {
        this.tokenBucketSha = await this.loadScript(script);
      }

      const result = await this.client.evalSha(this.tokenBucketSha, {
        keys: [key],
        arguments: [
          String(now),
          String(capacity),
          String(refillRatePerSecond),
          String(cost),
        ],
      }) as number[];

      return {
        allowed: result[0] === 1,
        current: result[1],
        limit: result[2],
        retryAfterMs: result[3],
        resetAt: new Date(now + (result[3] || 1000)),
      };
    } catch (err) {
      if ((err as Error).message?.includes("NOSCRIPT")) {
        this.tokenBucketSha = null;
        return this.tokenBucket(options);
      }
      throw err;
    }
  }
}

Per-Plan Rate Limits

// lib/rate-limits/plans.ts

export type Plan = "free" | "starter" | "professional" | "enterprise";

interface PlanLimits {
  requestsPerMinute: number;
  requestsPerHour: number;
  requestsPerDay: number;
  burstCapacity: number;       // token bucket size
  burstRefillPerSecond: number; // token bucket refill rate
  concurrentRequests: number;
}

export const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: {
    requestsPerMinute: 10,
    requestsPerHour: 100,
    requestsPerDay: 500,
    burstCapacity: 20,
    burstRefillPerSecond: 0.167, // 10/min
    concurrentRequests: 2,
  },
  starter: {
    requestsPerMinute: 60,
    requestsPerHour: 1_000,
    requestsPerDay: 10_000,
    burstCapacity: 100,
    burstRefillPerSecond: 1,
    concurrentRequests: 5,
  },
  professional: {
    requestsPerMinute: 300,
    requestsPerHour: 10_000,
    requestsPerDay: 100_000,
    burstCapacity: 500,
    burstRefillPerSecond: 5,
    concurrentRequests: 20,
  },
  enterprise: {
    requestsPerMinute: 3_000,
    requestsPerHour: 100_000,
    requestsPerDay: 1_000_000,
    burstCapacity: 5_000,
    burstRefillPerSecond: 50,
    concurrentRequests: 100,
  },
};

// Endpoint-specific multipliers (some endpoints cost more)
export const ENDPOINT_COSTS: Record<string, number> = {
  "/api/ai/generate": 10,      // 10Γ— more expensive
  "/api/reports/export": 5,
  "/api/bulk/import": 5,
  "/api/search": 2,
  "/api/webhooks": 1,
};

export function getEndpointCost(pathname: string): number {
  for (const [pattern, cost] of Object.entries(ENDPOINT_COSTS)) {
    if (pathname.startsWith(pattern)) return cost;
  }
  return 1;
}
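A quick check of how the prefix lookup behaves, re-declared here so the snippet stands alone. One subtlety: because the comparison uses startsWith, paths with query strings still match their pattern:

```typescript
// Standalone copy of the cost lookup with a subset of the costs above
const COSTS: Record<string, number> = {
  "/api/ai/generate": 10,
  "/api/search": 2,
};

function endpointCost(pathname: string): number {
  // First matching prefix wins; everything else costs 1
  for (const [pattern, cost] of Object.entries(COSTS)) {
    if (pathname.startsWith(pattern)) return cost;
  }
  return 1;
}
```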

💡 The Difference Between a SaaS Demo and a SaaS Business

Anyone can build a demo. We build SaaS products that handle real load, real users, and real payments, with architecture that does not need to be rewritten at 1,000 users.

  • Multi-tenant PostgreSQL with row-level security
  • Stripe subscriptions, usage billing, annual plans
  • SOC2-ready infrastructure from day one
  • We own zero equity; you own everything

Next.js Middleware Rate Limiting

// middleware.ts
// Note: Prisma and the node-redis client require the Node.js runtime; they
// will not run on the Edge runtime Next.js middleware uses by default.
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { RateLimiter } from "@/lib/rate-limiter";
import { PLAN_LIMITS, getEndpointCost } from "@/lib/rate-limits/plans";
import { redis } from "@/lib/redis";
import { prisma } from "@/lib/prisma";
import { cache } from "react";
import type { Plan } from "@/lib/rate-limits/plans";

const rateLimiter = new RateLimiter(redis);

// Deduplicate plan lookups within a single request (React cache() has no TTL)
const getUserPlan = cache(async (userId: string): Promise<Plan> => {
  const subscription = await prisma.subscription.findFirst({
    where: { userId, status: "active" },
    select: { plan: true },
    orderBy: { createdAt: "desc" },
  });
  return (subscription?.plan as Plan) ?? "free";
});

function rateLimit429Response(result: {
  current: number;
  limit: number;
  retryAfterMs: number;
  resetAt: Date;
}) {
  return NextResponse.json(
    {
      error: "Too Many Requests",
      message: `Rate limit exceeded. Retry after ${Math.ceil(result.retryAfterMs / 1000)} seconds.`,
      limit: result.limit,
      current: result.current,
      retryAfter: Math.ceil(result.retryAfterMs / 1000),
    },
    {
      status: 429,
      headers: buildRateLimitHeaders(result),
    }
  );
}

function buildRateLimitHeaders(result: {
  current: number;
  limit: number;
  retryAfterMs: number;
  resetAt: Date;
}): Record<string, string> {
  return {
    "X-RateLimit-Limit": String(result.limit),
    "X-RateLimit-Remaining": String(Math.max(0, result.limit - result.current)),
    "X-RateLimit-Reset": String(Math.floor(result.resetAt.getTime() / 1000)),
    "X-RateLimit-Reset-After": String(Math.ceil(result.retryAfterMs / 1000)),
    "Retry-After": String(Math.ceil(result.retryAfterMs / 1000)),
  };
}

export async function middleware(req: NextRequest) {
  // Only rate limit API routes
  if (!req.nextUrl.pathname.startsWith("/api/")) {
    return NextResponse.next();
  }

  // Skip auth endpoints
  if (req.nextUrl.pathname.startsWith("/api/auth")) {
    return NextResponse.next();
  }

  const session = await auth();

  // Unauthenticated API requests: strict IP-based limiting
  if (!session?.user) {
    const ip =
      req.headers.get("cf-connecting-ip") ??
      req.headers.get("x-forwarded-for")?.split(",")[0].trim() ??
      "unknown";

    const result = await rateLimiter.slidingWindow({
      key: `ratelimit:ip:${ip}`,
      limit: 20,
      windowMs: 60_000, // 20 per minute for unauthenticated
    });

    if (!result.allowed) {
      return rateLimit429Response(result);
    }

    const response = NextResponse.next();
    Object.entries(buildRateLimitHeaders(result)).forEach(([k, v]) =>
      response.headers.set(k, v)
    );
    return response;
  }

  const userId = session.user.id;
  const plan = await getUserPlan(userId);
  const limits = PLAN_LIMITS[plan];
  const endpointCost = getEndpointCost(req.nextUrl.pathname);

  // Layer 1: Per-minute sliding window
  const minuteResult = await rateLimiter.slidingWindow({
    key: `ratelimit:${userId}:minute`,
    limit: limits.requestsPerMinute,
    windowMs: 60_000,
  });

  if (!minuteResult.allowed) {
    return rateLimit429Response(minuteResult);
  }

  // Layer 2: Per-hour sliding window
  const hourResult = await rateLimiter.slidingWindow({
    key: `ratelimit:${userId}:hour`,
    limit: limits.requestsPerHour,
    windowMs: 3_600_000,
  });

  if (!hourResult.allowed) {
    return rateLimit429Response(hourResult);
  }

  // Layer 3: Token bucket for burst control (endpoint-cost-aware)
  const burstResult = await rateLimiter.tokenBucket({
    key: `ratelimit:${userId}:burst`,
    capacity: limits.burstCapacity,
    refillRatePerSecond: limits.burstRefillPerSecond,
    cost: endpointCost,
  });

  if (!burstResult.allowed) {
    return rateLimit429Response(burstResult);
  }

  // All checks passed; add rate limit headers to the response
  const response = NextResponse.next();
  const headers = buildRateLimitHeaders(minuteResult);
  // tokenBucket reports remaining tokens in `current`, so expose it directly
  headers["X-RateLimit-Burst-Remaining"] = String(burstResult.current);
  headers["X-RateLimit-Plan"] = plan;

  Object.entries(headers).forEach(([k, v]) => response.headers.set(k, v));
  return response;
}

export const config = {
  matcher: ["/api/:path*"],
};
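Well-behaved consumers should honor these limits rather than retry blindly. A hypothetical client-side helper (`retryDelayMs` is our own name, not part of any library) that prefers the Retry-After header and falls back to capped exponential backoff:

```typescript
// Choose how long to wait after a 429: trust Retry-After when present and
// numeric, otherwise back off exponentially, capped at 30 seconds
function retryDelayMs(retryAfterHeader: string | null, attempt: number): number {
  const seconds = retryAfterHeader === null ? NaN : Number(retryAfterHeader);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  return Math.min(30_000, 1000 * 2 ** attempt);
}
```

A fetch wrapper would call this on every 429, sleep for the returned delay, and retry with `attempt + 1`.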

Stripe-Style Rate Limit Response Headers

Following Stripe's API pattern makes your API developer-friendly:

// Stripe-style headers your API consumers will see:
// X-RateLimit-Limit: 300          - limit for current window
// X-RateLimit-Remaining: 247      - requests left in window
// X-RateLimit-Reset: 1740000060   - Unix timestamp when window resets
// X-RateLimit-Reset-After: 42     - seconds until window resets
// Retry-After: 42                 - seconds to wait before retrying (on 429)
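To tie the numbers together, header values like these can be recomputed from a raw limiter result. This is a standalone mirror of the middleware's header builder, with made-up inputs:

```typescript
type HeaderInput = { current: number; limit: number; retryAfterMs: number; resetAtMs: number };

// Same arithmetic the middleware uses when building response headers
function rateLimitHeaders(r: HeaderInput): Record<string, string> {
  const resetAfter = String(Math.ceil(r.retryAfterMs / 1000));
  return {
    "X-RateLimit-Limit": String(r.limit),
    "X-RateLimit-Remaining": String(Math.max(0, r.limit - r.current)),
    "X-RateLimit-Reset": String(Math.floor(r.resetAtMs / 1000)),
    "X-RateLimit-Reset-After": resetAfter,
    "Retry-After": resetAfter,
  };
}
```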

// Example 429 response body (Stripe-style):
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limited",
    "message": "Too many requests made to the API too quickly.",
    "param": null,
    "doc_url": "https://yourapp.com/docs/api/rate-limits"
  }
}

// lib/api/errors.ts - standardized error response format
export function rateLimitErrorResponse(details: {
  limit: number;
  current: number;
  retryAfterMs: number;
  resetAt: Date;
  plan: string;
}) {
  return {
    body: {
      error: {
        type: "rate_limit_error",
        code: "rate_limited",
        message: `API rate limit exceeded for ${details.plan} plan. Upgrade for higher limits.`,
        doc_url: "https://yourapp.com/docs/api/rate-limits",
      },
    },
    headers: {
      "X-RateLimit-Limit": String(details.limit),
      "X-RateLimit-Remaining": "0",
      "X-RateLimit-Reset": String(Math.floor(details.resetAt.getTime() / 1000)),
      "X-RateLimit-Reset-After": String(Math.ceil(details.retryAfterMs / 1000)),
      "X-RateLimit-Plan": details.plan,
      "Retry-After": String(Math.ceil(details.retryAfterMs / 1000)),
    },
    status: 429,
  };
}

Rate Limit Analytics

Track limit violations to understand upgrade pressure:

-- Create table to log rate limit hits
CREATE TABLE rate_limit_events (
  id              BIGSERIAL PRIMARY KEY,
  user_id         UUID NOT NULL,
  workspace_id    UUID,
  plan            TEXT NOT NULL,
  endpoint        TEXT NOT NULL,
  limit_type      TEXT NOT NULL,  -- 'minute', 'hour', 'day', 'burst'
  limit_value     INTEGER NOT NULL,
  current_value   INTEGER NOT NULL,
  ip_address      INET,
  occurred_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_rle_user ON rate_limit_events(user_id, occurred_at DESC);
CREATE INDEX idx_rle_occurred ON rate_limit_events(occurred_at DESC);

-- Most rate-limited users (upgrade candidates)
SELECT
  user_id,
  plan,
  COUNT(*) AS limit_hits,
  COUNT(DISTINCT DATE(occurred_at)) AS days_hitting_limits,
  MAX(occurred_at) AS last_hit,
  MODE() WITHIN GROUP (ORDER BY endpoint) AS most_limited_endpoint
FROM rate_limit_events
WHERE occurred_at > NOW() - INTERVAL '30 days'
GROUP BY user_id, plan
HAVING COUNT(*) > 10
ORDER BY limit_hits DESC
LIMIT 50;

// Log rate limit violations (non-blocking)
async function logRateLimitViolation(params: {
  userId: string;
  plan: string;
  endpoint: string;
  limitType: string;
  limit: number;
  current: number;
}) {
  prisma.rateLimitEvent
    .create({
      data: {
        userId: params.userId,
        plan: params.plan,
        endpoint: params.endpoint,
        limitType: params.limitType,
        limitValue: params.limit,
        currentValue: params.current,
      },
    })
    .catch((err) => console.error("Failed to log rate limit event:", err));
}

Cost and Timeline Estimates

Scope | Team | Timeline | Cost Range
Basic IP rate limiting (in-memory) | 1 dev | 0.5 days | $100–300
Redis sliding window, single tier | 1 dev | 1–2 days | $400–800
Multi-tier per-plan with burst control | 1 dev | 3–5 days | $1,000–2,000
Full system (per-plan + analytics + headers + docs) | 1–2 devs | 1–2 weeks | $2,500–6,000
API gateway managed rate limiting (AWS/Kong) | 1 dev | 2–3 days | $600–1,500

Redis running costs: a Redis cluster handling 10,000 rate limit operations per second runs roughly $50–200/month (ElastiCache t3.small to t3.medium).

Working With Viprasol

Rate limiting that works correctly is deceptively complex: the edge cases (concurrent requests, Redis script expiry, per-endpoint cost accounting) only surface under load. Our team has built rate limiting systems for SaaS APIs handling millions of requests per day, with the analytics infrastructure to turn limit violations into upgrade conversations.

What we deliver:

  • Atomic Redis Lua scripts for sliding window and token bucket
  • Per-plan limit configuration aligned to your pricing tiers
  • Stripe-compatible response headers with full RFC 6585 compliance
  • Rate limit violation logging and upgrade-pressure analytics
  • Load testing to validate limits under realistic traffic

Talk to our team about your API infrastructure →

Or explore our SaaS development services to see how we build production-grade products.

About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
