SaaS API Rate Limiting: Token Bucket, Sliding Window, Per-Plan Limits, and Stripe-Style Headers
Build production API rate limiting for SaaS with token bucket and sliding window algorithms, per-plan tier limits, Redis atomic Lua scripts, and Stripe-style rate limit response headers.
API rate limiting is how you prevent one customer from ruining the experience for everyone else. Without it, a single misconfigured script can saturate your database, spike your bill, and degrade service for your entire customer base. With it properly implemented, you protect infrastructure, enforce pricing tiers, and give API consumers the feedback they need to behave well.
This guide covers the two algorithms used in production SaaS systems, how to implement them atomically in Redis, and how to build the per-plan limit system your pricing page will need.
Choosing the Right Algorithm
Token Bucket
Bucket holds N tokens (capacity)
Every T seconds, add R tokens (refill rate)
Each request consumes 1 token
If the bucket is empty → 429
Strengths: Allows burst up to bucket capacity while enforcing average rate. Natural model for API consumers who occasionally burst.
Weakness: A full bucket lets a client fire up to `capacity` requests at once, so downstream services must tolerate bursts even though the average rate holds.
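The mechanics above can be sketched in a few lines of in-process TypeScript (class name and API are illustrative; the production version later in this guide lives in Redis so limits hold across server instances):

```typescript
// Minimal in-memory token bucket. Single-process only; the Redis version
// later in this guide is what you want once you run more than one server.
class InMemoryTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // bucket starts full
    this.lastRefill = now;
  }

  tryConsume(cost = 1, now: number = Date.now()): boolean {
    // Continuous refill: add tokens for elapsed time, capped at capacity
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens < cost) return false; // not enough tokens: reject
    this.tokens -= cost;
    return true;
  }
}
```

A bucket of capacity 3 refilling 1 token/second allows a burst of 3, then sustains one request per second.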
Sliding Window Counter
Track request count in a rolling window (e.g., last 60 seconds)
New request: count requests in [now - window, now]
If count >= limit → 429
Strengths: Smoothly enforced rate with no refill cycle to game. More accurate than fixed window (which allows 2× burst at window boundaries).
Weakness: Slightly more complex to implement correctly.
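A minimal in-process sketch of the sliding window log (illustrative only; the Redis sorted-set version below is the production shape):

```typescript
// Minimal in-memory sliding window log: one timestamp per request.
class InMemorySlidingWindow {
  private timestamps: number[] = [];

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  tryRequest(now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Evict entries older than the window (ZREMRANGEBYSCORE in Redis)
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Because the window slides continuously, a request rejected now becomes admissible exactly when the oldest logged request ages out.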
Fixed Window Counter
Simpler but has the boundary problem: a user making 100 requests at 11:59:50 and 100 more at 12:00:05 effectively makes 200 requests within 15 seconds against a "100 per minute" limit.
Use sliding window for anything customer-facing. Fixed window for internal limits where precision matters less.
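The boundary problem is easy to demonstrate with a toy fixed-window counter (illustrative code):

```typescript
// Fixed-window counter keyed by window index. A burst straddling the
// boundary can pass 2x the limit within a short real-time span.
class FixedWindowCounter {
  private counts = new Map<number, number>();

  constructor(
    private limit: number,
    private windowMs: number,
  ) {}

  tryRequest(now: number = Date.now()): boolean {
    const windowIndex = Math.floor(now / this.windowMs);
    const count = this.counts.get(windowIndex) ?? 0;
    if (count >= this.limit) return false;
    this.counts.set(windowIndex, count + 1);
    return true;
  }
}
```

With a limit of 2 per 1,000 ms, requests at t = 990, 995, 1005, and 1010 all pass: four requests in 20 ms against a "2 per second" limit.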
Redis Lua Scripts: Atomic Rate Limiting
Rate limit checks must be atomic: checking the count and incrementing it must happen in one operation, or two concurrent requests can both pass a limit that should only allow one.
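To see why, here is the failure mode simulated in-process (hypothetical code standing in for a non-atomic GET-then-SET against Redis):

```typescript
// Simulates the check-then-increment race: both requests read the same
// count before either writes, so both pass a limit that allows only one.
async function raceDemo(): Promise<number> {
  let count = 0; // stands in for the value stored in Redis
  const limit = 1;

  const tryRequest = async (): Promise<boolean> => {
    const current = count;                       // non-atomic GET
    await new Promise((r) => setTimeout(r, 10)); // network round-trip window
    if (current >= limit) return false;          // stale check
    count = current + 1;                         // lost-update SET
    return true;
  };

  const results = await Promise.all([tryRequest(), tryRequest()]);
  return results.filter(Boolean).length; // how many requests were admitted
}
```

Both requests are admitted even though the limit is 1. A Lua script closes this window because Redis executes the entire script atomically, with no interleaved commands.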
Sliding Window in Lua
-- sliding-window.lua
-- KEYS[1] = rate limit key (e.g., "ratelimit:user:123:api_calls")
-- ARGV[1] = current timestamp (milliseconds)
-- ARGV[2] = window size (milliseconds)
-- ARGV[3] = limit (max requests per window)
-- ARGV[4] = unique request ID (for ZADD dedup)
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local req_id = ARGV[4]
local window_start = now - window
-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
-- Count current requests in window
local current = redis.call('ZCARD', key)
if current >= limit then
-- Get the oldest request timestamp to calculate retry-after
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retry_after_ms = window - (now - tonumber(oldest[2]))
return {0, current, limit, retry_after_ms}
end
-- Add this request
redis.call('ZADD', key, now, req_id)
redis.call('PEXPIRE', key, window)
-- Return: {allowed, current_count, limit, retry_after_ms}
return {1, current + 1, limit, 0}
Token Bucket in Lua
-- token-bucket.lua
-- KEYS[1] = bucket key
-- ARGV[1] = current timestamp (milliseconds)
-- ARGV[2] = bucket capacity
-- ARGV[3] = refill rate (tokens per second)
-- ARGV[4] = cost (tokens to consume, usually 1)
local key = KEYS[1]
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refill_rate = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Calculate tokens added since last request
local elapsed_seconds = (now - last_refill) / 1000
local tokens_to_add = elapsed_seconds * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)
if tokens < cost then
-- Not enough tokens: calculate wait until enough have refilled
local tokens_needed = cost - tokens
local wait_seconds = tokens_needed / refill_rate
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {0, math.floor(tokens), capacity, math.ceil(wait_seconds * 1000)}
end
tokens = tokens - cost
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {1, math.floor(tokens), capacity, 0}
TypeScript Rate Limiter Class
// lib/rate-limiter.ts
import { createClient } from "redis";
import crypto from "crypto";
type RateLimitResult = {
allowed: boolean;
current: number;
limit: number;
retryAfterMs: number;
resetAt: Date;
};
export class RateLimiter {
private client: ReturnType<typeof createClient>;
private slidingWindowSha: string | null = null;
private tokenBucketSha: string | null = null;
constructor(client: ReturnType<typeof createClient>) {
this.client = client;
}
private async loadScript(script: string): Promise<string> {
return this.client.scriptLoad(script);
}
async slidingWindow(options: {
key: string;
limit: number;
windowMs: number;
}): Promise<RateLimitResult> {
const { key, limit, windowMs } = options;
const now = Date.now();
const requestId = crypto.randomUUID();
const script = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local req_id = ARGV[4]
local window_start = now - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
local current = redis.call('ZCARD', key)
if current >= limit then
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retry_after_ms = window - (now - tonumber(oldest[2]))
return {0, current, limit, retry_after_ms}
end
redis.call('ZADD', key, now, req_id)
redis.call('PEXPIRE', key, window)
return {1, current + 1, limit, 0}
`;
try {
if (!this.slidingWindowSha) {
this.slidingWindowSha = await this.loadScript(script);
}
const result = await this.client.evalSha(this.slidingWindowSha, {
keys: [key],
arguments: [String(now), String(windowMs), String(limit), requestId],
}) as number[];
return {
allowed: result[0] === 1,
current: result[1],
limit: result[2],
retryAfterMs: result[3],
resetAt: new Date(now + windowMs),
};
} catch (err) {
if ((err as Error).message?.includes("NOSCRIPT")) {
this.slidingWindowSha = null;
return this.slidingWindow(options);
}
throw err;
}
}
async tokenBucket(options: {
key: string;
capacity: number;
refillRatePerSecond: number;
cost?: number;
}): Promise<RateLimitResult> {
const { key, capacity, refillRatePerSecond, cost = 1 } = options;
const now = Date.now();
const script = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refill_rate = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
local elapsed_seconds = (now - last_refill) / 1000
local tokens_to_add = elapsed_seconds * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)
if tokens < cost then
local tokens_needed = cost - tokens
local wait_seconds = tokens_needed / refill_rate
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {0, math.floor(tokens), capacity, math.ceil(wait_seconds * 1000)}
end
tokens = tokens - cost
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('PEXPIRE', key, math.ceil(capacity / refill_rate * 1000 * 2))
return {1, math.floor(tokens), capacity, 0}
`;
try {
if (!this.tokenBucketSha) {
this.tokenBucketSha = await this.loadScript(script);
}
const result = await this.client.evalSha(this.tokenBucketSha, {
keys: [key],
arguments: [
String(now),
String(capacity),
String(refillRatePerSecond),
String(cost),
],
}) as number[];
return {
allowed: result[0] === 1,
current: result[1],
limit: result[2],
retryAfterMs: result[3],
resetAt: new Date(now + (result[3] || 1000)),
};
} catch (err) {
if ((err as Error).message?.includes("NOSCRIPT")) {
this.tokenBucketSha = null;
return this.tokenBucket(options);
}
throw err;
}
}
}
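The class above maps the Lua reply array into a `RateLimitResult`. That mapping can be unit-tested without a live Redis; the helper below (name illustrative) mirrors what `slidingWindow` does with the raw reply:

```typescript
// Mirrors the mapping RateLimiter.slidingWindow performs on the raw
// Lua reply {allowed, current, limit, retryAfterMs}.
type RawReply = [number, number, number, number];

function toRateLimitResult(raw: RawReply, now: number, windowMs: number) {
  const [allowed, current, limit, retryAfterMs] = raw;
  return {
    allowed: allowed === 1,
    current,
    limit,
    retryAfterMs,
    resetAt: new Date(now + windowMs), // approximation: window fully resets then
  };
}
```

Keeping this translation in one place means the Lua scripts can change their reply shape without edits scattered across call sites.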
Per-Plan Rate Limits
// lib/rate-limits/plans.ts
export type Plan = "free" | "starter" | "professional" | "enterprise";
interface PlanLimits {
requestsPerMinute: number;
requestsPerHour: number;
requestsPerDay: number;
burstCapacity: number; // token bucket size
burstRefillPerSecond: number; // token bucket refill rate
concurrentRequests: number;
}
export const PLAN_LIMITS: Record<Plan, PlanLimits> = {
free: {
requestsPerMinute: 10,
requestsPerHour: 100,
requestsPerDay: 500,
burstCapacity: 20,
burstRefillPerSecond: 0.167, // 10/min
concurrentRequests: 2,
},
starter: {
requestsPerMinute: 60,
requestsPerHour: 1_000,
requestsPerDay: 10_000,
burstCapacity: 100,
burstRefillPerSecond: 1,
concurrentRequests: 5,
},
professional: {
requestsPerMinute: 300,
requestsPerHour: 10_000,
requestsPerDay: 100_000,
burstCapacity: 500,
burstRefillPerSecond: 5,
concurrentRequests: 20,
},
enterprise: {
requestsPerMinute: 3_000,
requestsPerHour: 100_000,
requestsPerDay: 1_000_000,
burstCapacity: 5_000,
burstRefillPerSecond: 50,
concurrentRequests: 100,
},
};
// Endpoint-specific multipliers (some endpoints cost more)
export const ENDPOINT_COSTS: Record<string, number> = {
"/api/ai/generate": 10, // 10× more expensive
"/api/reports/export": 5,
"/api/bulk/import": 5,
"/api/search": 2,
"/api/webhooks": 1,
};
export function getEndpointCost(pathname: string): number {
for (const [pattern, cost] of Object.entries(ENDPOINT_COSTS)) {
if (pathname.startsWith(pattern)) return cost;
}
return 1;
}
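These numbers compose: an `/api/ai/generate` call costs 10 tokens, so on the free plan (refill 0.167 tokens/second) a drained burst bucket needs roughly a minute before the next AI call is affordable. The arithmetic mirrors the wait calculation in the token bucket Lua script (function name is illustrative):

```typescript
// wait_seconds = (cost - tokens) / refill_rate, as in the Lua script.
function secondsUntilAffordable(
  tokens: number,
  cost: number,
  refillPerSecond: number,
): number {
  if (tokens >= cost) return 0; // affordable now
  return (cost - tokens) / refillPerSecond;
}

// Free plan after a burst: 0 tokens left, AI endpoint costs 10,
// refill is 0.167 tokens/second, so roughly 60 seconds to recover.
```

Running this kind of arithmetic against your plan table before shipping catches configurations where an expensive endpoint is effectively unusable on a low tier.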
Next.js Middleware Rate Limiting
One caveat before the code: Next.js middleware runs on the Edge runtime by default, where Prisma and the node-redis client are not available. Treat this as the shape of the logic; in practice you would use edge-compatible clients (for example an HTTP-based Redis client) or move the checks into route handlers running on the Node.js runtime.
// middleware.ts
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { RateLimiter } from "@/lib/rate-limiter";
import { PLAN_LIMITS, getEndpointCost } from "@/lib/rate-limits/plans";
import { redis } from "@/lib/redis";
import { prisma } from "@/lib/prisma";
import { cache } from "react";
import type { Plan } from "@/lib/rate-limits/plans";
const rateLimiter = new RateLimiter(redis);
// React's cache() dedupes plan lookups within a single request; it does not cache across requests
const getUserPlan = cache(async (userId: string): Promise<Plan> => {
const subscription = await prisma.subscription.findFirst({
where: { userId, status: "active" },
select: { plan: true },
orderBy: { createdAt: "desc" },
});
return (subscription?.plan as Plan) ?? "free";
});
function rateLimit429Response(result: {
current: number;
limit: number;
retryAfterMs: number;
resetAt: Date;
}) {
return NextResponse.json(
{
error: "Too Many Requests",
message: `Rate limit exceeded. Retry after ${Math.ceil(result.retryAfterMs / 1000)} seconds.`,
limit: result.limit,
current: result.current,
retryAfter: Math.ceil(result.retryAfterMs / 1000),
},
{
status: 429,
headers: buildRateLimitHeaders(result),
}
);
}
function buildRateLimitHeaders(result: {
current: number;
limit: number;
retryAfterMs: number;
resetAt: Date;
}): Record<string, string> {
return {
"X-RateLimit-Limit": String(result.limit),
"X-RateLimit-Remaining": String(Math.max(0, result.limit - result.current)),
"X-RateLimit-Reset": String(Math.floor(result.resetAt.getTime() / 1000)),
"X-RateLimit-Reset-After": String(Math.ceil(result.retryAfterMs / 1000)),
"Retry-After": String(Math.ceil(result.retryAfterMs / 1000)),
};
}
export async function middleware(req: NextRequest) {
// Only rate limit API routes
if (!req.nextUrl.pathname.startsWith("/api/")) {
return NextResponse.next();
}
// Skip auth endpoints
if (req.nextUrl.pathname.startsWith("/api/auth")) {
return NextResponse.next();
}
const session = await auth();
// Unauthenticated API requests: strict IP-based limiting
if (!session?.user) {
const ip =
req.headers.get("cf-connecting-ip") ??
req.headers.get("x-forwarded-for")?.split(",")[0].trim() ??
"unknown";
const result = await rateLimiter.slidingWindow({
key: `ratelimit:ip:${ip}`,
limit: 20,
windowMs: 60_000, // 20 per minute for unauthenticated
});
if (!result.allowed) {
return rateLimit429Response(result);
}
const response = NextResponse.next();
Object.entries(buildRateLimitHeaders(result)).forEach(([k, v]) =>
response.headers.set(k, v)
);
return response;
}
const userId = session.user.id;
const plan = await getUserPlan(userId);
const limits = PLAN_LIMITS[plan];
const endpointCost = getEndpointCost(req.nextUrl.pathname);
// Layer 1: Per-minute sliding window
const minuteResult = await rateLimiter.slidingWindow({
key: `ratelimit:${userId}:minute`,
limit: limits.requestsPerMinute,
windowMs: 60_000,
});
if (!minuteResult.allowed) {
return rateLimit429Response(minuteResult);
}
// Layer 2: Per-hour sliding window
const hourResult = await rateLimiter.slidingWindow({
key: `ratelimit:${userId}:hour`,
limit: limits.requestsPerHour,
windowMs: 3_600_000,
});
if (!hourResult.allowed) {
return rateLimit429Response(hourResult);
}
// Layer 3: Token bucket for burst control (endpoint-cost-aware)
const burstResult = await rateLimiter.tokenBucket({
key: `ratelimit:${userId}:burst`,
capacity: limits.burstCapacity,
refillRatePerSecond: limits.burstRefillPerSecond,
cost: endpointCost,
});
if (!burstResult.allowed) {
return rateLimit429Response(burstResult);
}
// All checks passed; add headers to response
const response = NextResponse.next();
const headers = buildRateLimitHeaders(minuteResult);
// Expose remaining burst tokens (the token bucket returns remaining tokens in `current`)
headers["X-RateLimit-Burst-Remaining"] = String(burstResult.current);
headers["X-RateLimit-Plan"] = plan;
Object.entries(headers).forEach(([k, v]) => response.headers.set(k, v));
return response;
}
export const config = {
matcher: ["/api/:path*"],
};
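On the consumer side, a well-behaved client honors these headers. A sketch with an injectable fetch-like function (names are illustrative) so the backoff logic can be exercised without a network:

```typescript
// Retries 429 responses after the number of seconds in Retry-After.
type MinimalResponse = { status: number; headers: Map<string, string> };
type FetchLike = (url: string) => Promise<MinimalResponse>;

async function fetchWithRetry(
  url: string,
  doFetch: FetchLike,
  maxRetries = 3,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch(url);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfterSec = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((r) => setTimeout(r, retryAfterSec * 1000));
  }
}
```

Publishing a snippet like this in your API docs is the cheapest way to teach consumers to back off instead of hammering a 429.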
Stripe-Style Rate Limit Response Headers
Following Stripe's API pattern makes your API developer-friendly:
// Stripe-style headers your API consumers will see:
// X-RateLimit-Limit: 300 → limit for current window
// X-RateLimit-Remaining: 247 → requests left in window
// X-RateLimit-Reset: 1740000060 → Unix timestamp when window resets
// X-RateLimit-Reset-After: 42 → seconds until window resets
// Retry-After: 42 → seconds to wait before retrying (on 429)
// Example 429 response body (Stripe-style):
{
"error": {
"type": "rate_limit_error",
"code": "rate_limited",
"message": "Too many requests made to the API too quickly.",
"param": null,
"doc_url": "https://yourapp.com/docs/api/rate-limits"
}
}
// lib/api/errors.ts - standardized error response format
export function rateLimitErrorResponse(details: {
limit: number;
current: number;
retryAfterMs: number;
resetAt: Date;
plan: string;
}) {
return {
body: {
error: {
type: "rate_limit_error",
code: "rate_limited",
message: `API rate limit exceeded for ${details.plan} plan. Upgrade for higher limits.`,
doc_url: "https://yourapp.com/docs/api/rate-limits",
},
},
headers: {
"X-RateLimit-Limit": String(details.limit),
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": String(Math.floor(details.resetAt.getTime() / 1000)),
"X-RateLimit-Reset-After": String(Math.ceil(details.retryAfterMs / 1000)),
"X-RateLimit-Plan": details.plan,
"Retry-After": String(Math.ceil(details.retryAfterMs / 1000)),
},
status: 429,
};
}
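Consumers can branch on the envelope's `type` field. A small type guard (name illustrative) keeps that check in one place:

```typescript
// Narrow an unknown response body to the rate-limit error envelope.
type RateLimitErrorBody = {
  error: { type: string; code: string; message: string; doc_url?: string };
};

function isRateLimitError(body: unknown): body is RateLimitErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: { type?: unknown } }).error;
  return err?.type === "rate_limit_error";
}
```

A stable, machine-readable `type`/`code` pair is what lets client SDKs retry rate-limit errors automatically while surfacing other errors to the caller.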
Rate Limit Analytics
Track limit violations to understand upgrade pressure:
-- Create table to log rate limit hits
CREATE TABLE rate_limit_events (
id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL,
workspace_id UUID,
plan TEXT NOT NULL,
endpoint TEXT NOT NULL,
limit_type TEXT NOT NULL, -- 'minute', 'hour', 'day', 'burst'
limit_value INTEGER NOT NULL,
current_value INTEGER NOT NULL,
ip_address INET,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_rle_user ON rate_limit_events(user_id, occurred_at DESC);
CREATE INDEX idx_rle_occurred ON rate_limit_events(occurred_at DESC);
-- Most rate-limited users (upgrade candidates)
SELECT
user_id,
plan,
COUNT(*) AS limit_hits,
COUNT(DISTINCT DATE(occurred_at)) AS days_hitting_limits,
MAX(occurred_at) AS last_hit,
MODE() WITHIN GROUP (ORDER BY endpoint) AS most_limited_endpoint
FROM rate_limit_events
WHERE occurred_at > NOW() - INTERVAL '30 days'
GROUP BY user_id, plan
HAVING COUNT(*) > 10
ORDER BY limit_hits DESC
LIMIT 50;
// Log rate limit violations (non-blocking)
async function logRateLimitViolation(params: {
userId: string;
plan: string;
endpoint: string;
limitType: string;
limit: number;
current: number;
}) {
prisma.rateLimitEvent
.create({
data: {
userId: params.userId,
plan: params.plan,
endpoint: params.endpoint,
limitType: params.limitType,
limitValue: params.limit,
currentValue: params.current,
},
})
.catch((err) => console.error("Failed to log rate limit event:", err));
}
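The SQL above answers the question offline; the same aggregation in-process (function name illustrative) is handy for dashboards or tests:

```typescript
// Users who hit limits more than `threshold` times, sorted by hit count;
// mirrors the SQL upgrade-candidates query.
type LimitEvent = { userId: string; plan: string; endpoint: string };

function upgradeCandidates(
  events: LimitEvent[],
  threshold: number,
): { userId: string; hits: number }[] {
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.userId, (counts.get(e.userId) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, hits]) => hits > threshold)
    .sort((a, b) => b[1] - a[1])
    .map(([userId, hits]) => ({ userId, hits }));
}
```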
Cost and Timeline Estimates
| Scope | Team | Timeline | Cost Range |
|---|---|---|---|
| Basic IP rate limiting (in-memory) | 1 dev | 0.5 days | $100–300 |
| Redis sliding window, single tier | 1 dev | 1–2 days | $400–800 |
| Multi-tier per-plan with burst control | 1 dev | 3–5 days | $1,000–2,000 |
| Full system (per-plan + analytics + headers + docs) | 1–2 devs | 1–2 weeks | $2,500–6,000 |
| API gateway managed rate limiting (AWS/Kong) | 1 dev | 2–3 days | $600–1,500 |
Redis running costs: A Redis cluster handling 10,000 rate limit operations/second costs ~$50–200/month (ElastiCache t3.small–medium).
See Also
- Redis Advanced Patterns: Caching, Pub/Sub, and Streams
- SaaS Usage-Based Billing with Stripe Meters
- SaaS Webhook System with Delivery Guarantees
- Next.js Middleware for Auth and Routing
- AWS API Gateway Authentication Patterns
Working With Viprasol
Rate limiting that works correctly is deceptively complex: the edge cases (concurrent requests, Redis script expiry, per-endpoint cost accounting) only surface under load. Our team has built rate limiting systems for SaaS APIs handling millions of requests per day, with the analytics infrastructure to turn limit violations into upgrade conversations.
What we deliver:
- Atomic Redis Lua scripts for sliding window and token bucket
- Per-plan limit configuration aligned to your pricing tiers
- Stripe-compatible response headers with full RFC 6585 compliance
- Rate limit violation logging and upgrade-pressure analytics
- Load testing to validate limits under realistic traffic
Talk to our team about your API infrastructure →
Or explore our SaaS development services to see how we build production-grade products.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.