
WebSocket Scaling: Sticky Sessions, Redis Pub/Sub Fan-Out, and Horizontal Scaling

Scale WebSocket servers horizontally: implement sticky sessions with nginx, Redis pub/sub fan-out for cross-server message delivery, Socket.io cluster adapter, and connection health monitoring.

Viprasol Tech Team
September 24, 2026
13 min read

WebSockets don't scale like HTTP. An HTTP request can hit any server in the pool — servers are stateless. A WebSocket connection is persistent and stateful, held by one specific server for its entire lifetime. This breaks naive load balancing immediately.

If a message for a user whose connection lives on server B arrives at server A, it gets dropped. The fix requires two things: sticky sessions (route the same client back to the same server) and pub/sub fan-out (broadcast messages across all servers).


The Scaling Problem

User A → connects to Server 1
User B → connects to Server 2

Server 1 receives: "send notification to User B"
Server 1 doesn't have User B's connection → message dropped ❌

Fix: Server 1 publishes to Redis → Server 2 receives via subscription → 
     Server 2 delivers to User B ✅

Sticky Sessions with Nginx

Sticky sessions route the same client to the same upstream server based on IP hash or cookie:

# nginx/nginx.conf

upstream websocket_servers {
  # ip_hash: same IP always hits same server
  # Works well when clients have stable IPs (not mobile)
  ip_hash;

  server ws-server-1:3000;
  server ws-server-2:3000;
  server ws-server-3:3000;

  # Mark servers that are down temporarily
  # server ws-server-4:3000 down;
}

server {
  listen 80;
  listen 443 ssl;

  ssl_certificate     /etc/nginx/ssl/cert.pem;
  ssl_certificate_key /etc/nginx/ssl/key.pem;

  location /ws {
    proxy_pass http://websocket_servers;

    # Critical WebSocket headers
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # Keep connection alive — default 60s is too short
    proxy_read_timeout 3600s;   # 1 hour
    proxy_send_timeout 3600s;
    proxy_connect_timeout 10s;

    # Disable buffering for real-time delivery
    proxy_buffering off;
    proxy_cache off;
  }
}
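Even with generous proxy timeouts, connections will still drop — deploys, network blips, load balancer restarts — so clients need reconnect logic. A minimal sketch of capped exponential backoff; the function name and delay constants are illustrative, not part of the nginx config above:

```typescript
// Capped exponential backoff for client reconnects.
// attempt 0 → 1s, 1 → 2s, 2 → 4s … capped at maxMs.
export function backoffDelay(
  attempt: number,
  baseMs = 1_000,
  maxMs = 30_000
): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Hypothetical browser-side usage:
// let attempt = 0;
// function connect() {
//   const ws = new WebSocket("wss://example.com/ws?token=...");
//   ws.onopen = () => { attempt = 0; };          // reset on success
//   ws.onclose = () => setTimeout(connect, backoffDelay(attempt++));
// }
```

In production, add random jitter to the delay so thousands of clients dropped by one server restart don't reconnect in lockstep.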

Cookie-Based Stickiness (More Reliable Than IP Hash)

Mobile clients frequently change IP. Use a sticky session cookie instead:

upstream websocket_servers {
  # No ip_hash — let the cookie decide
  server ws-server-1:3000;
  server ws-server-2:3000;
  server ws-server-3:3000;

  # Keepalive pool — reuse upstream connections
  keepalive 32;
}

map $cookie_ws_server_id $ws_upstream {
  # No cookie yet → fall back to the load-balanced upstream group
  default           websocket_servers;
  "server-1"        "ws-server-1:3000";
  "server-2"        "ws-server-2:3000";
  "server-3"        "ws-server-3:3000";
}

server {
  location /ws {
    # Route based on cookie
    proxy_pass http://$ws_upstream;
    # ... WebSocket headers as above
  }
}

# Note: nginx's $upstream_addr is an IP:port, so it can't serve as the cookie
# value here — each backend sets its own id on connect, e.g.
#   Set-Cookie: ws_server_id=server-2; Path=/; HttpOnly; SameSite=Strict
# (nginx Plus has a built-in `sticky cookie` directive that does this for you.)
# Also: with a variable in proxy_pass, any hostname not covered by an
# upstream block requires a `resolver` directive.


Redis Pub/Sub Fan-Out

When a server receives a message for a user on a different server, it publishes to Redis. All servers subscribe and deliver to local connections:

// src/websocket/redis-adapter.ts
import { Redis } from "ioredis";
import { WebSocketServer, WebSocket } from "ws";
import { verifyWebSocketToken } from "./auth"; // app-specific: returns userId or null

interface Message {
  type: string;
  targetUserId?: string;
  targetRoom?: string;
  broadcast?: boolean;
  payload: unknown;
  originServerId: string;
}

export class RedisWebSocketAdapter {
  private publisher: Redis;
  private subscriber: Redis;
  private wss: WebSocketServer;
  private serverId: string;

  // Local connection maps
  private userConnections = new Map<string, Set<WebSocket>>();
  private roomConnections = new Map<string, Set<WebSocket>>();
  private connectionUsers = new Map<WebSocket, string>();

  constructor(wss: WebSocketServer, redisUrl: string) {
    this.wss = wss;
    this.serverId = `ws-server-${process.env.HOSTNAME ?? Math.random().toString(36).slice(2)}`;

    // Separate connections for pub and sub (Redis requirement)
    this.publisher = new Redis(redisUrl);
    this.subscriber = new Redis(redisUrl);

    this.subscribeToChannels();
    this.setupConnectionHandlers();
  }

  private subscribeToChannels() {
    // Subscribe to all message channels
    this.subscriber.subscribe(
      "ws:user-message",
      "ws:room-message",
      "ws:broadcast",
      (err) => {
        if (err) console.error("Redis subscription error:", err);
      }
    );

    this.subscriber.on("message", (channel, rawMessage) => {
      const message: Message = JSON.parse(rawMessage);

      // Skip messages we sent ourselves
      if (message.originServerId === this.serverId) return;

      this.deliverLocally(channel, message);
    });
  }

  private setupConnectionHandlers() {
    this.wss.on("connection", (ws, request) => {
      // Extract userId from auth token in query param or header
      const userId = this.authenticateConnection(request);
      if (!userId) {
        ws.close(4001, "Unauthorized");
        return;
      }

      // Register connection
      this.connectionUsers.set(ws, userId);
      if (!this.userConnections.has(userId)) {
        this.userConnections.set(userId, new Set());
      }
      this.userConnections.get(userId)!.add(ws);

      ws.on("close", () => {
        const uid = this.connectionUsers.get(ws);
        if (uid) {
          this.userConnections.get(uid)?.delete(ws);
          if (this.userConnections.get(uid)?.size === 0) {
            this.userConnections.delete(uid);
          }
        }
        this.connectionUsers.delete(ws);
        // Also remove the socket from any rooms it joined
        for (const members of this.roomConnections.values()) {
          members.delete(ws);
        }
      });

      ws.on("message", (data) => {
        try {
          this.handleClientMessage(ws, userId, JSON.parse(data.toString()));
        } catch {
          ws.send(JSON.stringify({ type: "error", error: "invalid JSON" }));
        }
      });

      // Send connection confirmation
      ws.send(JSON.stringify({ type: "connected", serverId: this.serverId }));
    });
  }

  // Send to a specific user (fan-out via Redis if not local)
  async sendToUser(userId: string, payload: unknown): Promise<void> {
    const message: Message = {
      type: "user-message",
      targetUserId: userId,
      payload,
      originServerId: this.serverId,
    };

    // Deliver to local connections first
    this.deliverToLocalUser(userId, payload);

    // Always publish to Redis — other servers may hold connections
    // for the same user (multiple tabs or devices)
    await this.publisher.publish("ws:user-message", JSON.stringify(message));
  }

  // Send to all members of a room
  async sendToRoom(room: string, payload: unknown): Promise<void> {
    const message: Message = {
      type: "room-message",
      targetRoom: room,
      payload,
      originServerId: this.serverId,
    };

    // Deliver to local room members
    this.deliverToLocalRoom(room, payload);

    // Publish so other servers deliver to their members
    await this.publisher.publish("ws:room-message", JSON.stringify(message));
  }

  // Broadcast to all connected clients across all servers
  async broadcast(payload: unknown): Promise<void> {
    const message: Message = {
      type: "broadcast",
      broadcast: true,
      payload,
      originServerId: this.serverId,
    };

    // Deliver locally
    for (const ws of this.wss.clients) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(payload));
      }
    }

    // Publish to other servers
    await this.publisher.publish("ws:broadcast", JSON.stringify(message));
  }

  private deliverLocally(channel: string, message: Message) {
    switch (channel) {
      case "ws:user-message":
        if (message.targetUserId) {
          this.deliverToLocalUser(message.targetUserId, message.payload);
        }
        break;
      case "ws:room-message":
        if (message.targetRoom) {
          this.deliverToLocalRoom(message.targetRoom, message.payload);
        }
        break;
      case "ws:broadcast":
        for (const ws of this.wss.clients) {
          if (ws.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify(message.payload));
          }
        }
        break;
    }
  }

  private deliverToLocalUser(userId: string, payload: unknown): boolean {
    const connections = this.userConnections.get(userId);
    if (!connections || connections.size === 0) return false;

    const serialized = JSON.stringify(payload);
    for (const ws of connections) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(serialized);
      }
    }
    return true;
  }

  private deliverToLocalRoom(room: string, payload: unknown): void {
    const connections = this.roomConnections.get(room);
    if (!connections) return;

    const serialized = JSON.stringify(payload);
    for (const ws of connections) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(serialized);
      }
    }
  }

  private authenticateConnection(request: import("http").IncomingMessage): string | null {
    const url = new URL(request.url ?? "", "http://localhost");
    const token = url.searchParams.get("token");
    if (!token) return null;
    return verifyWebSocketToken(token); // Returns userId or null
  }

  private handleClientMessage(ws: WebSocket, userId: string, message: any) {
    // Handle room join/leave
    if (message.type === "join-room") {
      if (!this.roomConnections.has(message.room)) {
        this.roomConnections.set(message.room, new Set());
      }
      this.roomConnections.get(message.room)!.add(ws);
    } else if (message.type === "leave-room") {
      this.roomConnections.get(message.room)?.delete(ws);
    }
  }
}
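The origin-server filter is the piece that prevents double delivery: every server receives its own publishes back from Redis and must drop them, because local delivery already happened synchronously. Isolated as a pure function (the `Envelope` shape mirrors the `Message` interface above; names are illustrative):

```typescript
interface Envelope {
  originServerId: string;
  payload: unknown;
}

// A server delivers a fanned-out message only if it did NOT publish it.
export function shouldDeliver(msg: Envelope, localServerId: string): boolean {
  return msg.originServerId !== localServerId;
}

// Round-trip through the wire format used above (JSON over Redis pub/sub)
const raw = JSON.stringify({ originServerId: "ws-server-a", payload: { hi: 1 } });
const parsed: Envelope = JSON.parse(raw);
```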

Socket.io Cluster Adapter (Simpler Alternative)

If you're using Socket.io, the @socket.io/redis-adapter handles pub/sub automatically:

// src/websocket/socketio-server.ts
import { createServer } from "http";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";
import { verifyToken } from "./auth"; // app-specific: returns user or null

const httpServer = createServer();
const io = new Server(httpServer, {
  cors: {
    origin: process.env.ALLOWED_ORIGINS?.split(",") ?? [],
    credentials: true,
  },
  // Increase ping timeout for mobile clients
  pingTimeout: 60000,
  pingInterval: 25000,
  // Allow HTTP long-polling fallback
  transports: ["websocket", "polling"],
});

// Redis adapter — pub/sub happens automatically
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();

await Promise.all([pubClient.connect(), subClient.connect()]);
io.adapter(createAdapter(pubClient, subClient));

// Authentication middleware
io.use(async (socket, next) => {
  const token = socket.handshake.auth.token;
  if (!token) return next(new Error("Unauthorized"));

  const user = await verifyToken(token);
  if (!user) return next(new Error("Invalid token"));

  socket.data.userId = user.id;
  socket.data.tenantId = user.tenantId;
  next();
});

io.on("connection", (socket) => {
  const { userId, tenantId } = socket.data;

  // Auto-join user's personal room and tenant room
  socket.join(`user:${userId}`);
  socket.join(`tenant:${tenantId}`);

  socket.on("join-channel", (channelId: string) => {
    socket.join(`channel:${channelId}`);
  });

  socket.on("disconnect", () => {
    console.log(`User ${userId} disconnected from ${socket.id}`);
  });
});

// Emit to a user (works across all servers automatically)
export function sendToUser(userId: string, event: string, data: unknown) {
  io.to(`user:${userId}`).emit(event, data);
}

// Emit to all users in a tenant
export function sendToTenant(tenantId: string, event: string, data: unknown) {
  io.to(`tenant:${tenantId}`).emit(event, data);
}

httpServer.listen(3000);
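The `user:`/`tenant:`/`channel:` room convention above is worth centralising so emitters and join handlers can't drift apart. A tiny helper — a sketch, with names mirroring the rooms used in the server code:

```typescript
// Single source of truth for Socket.io room names.
export const rooms = {
  user: (userId: string) => `user:${userId}`,
  tenant: (tenantId: string) => `tenant:${tenantId}`,
  channel: (channelId: string) => `channel:${channelId}`,
};

// usage: socket.join(rooms.user(userId));
//        io.to(rooms.tenant(tenantId)).emit(event, data);
```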


Connection Health and Heartbeat

// src/websocket/heartbeat.ts
// Detect and clean up dead connections
import { WebSocketServer, WebSocket } from "ws";

// A connection that hasn't answered a ping by the time the next sweep
// runs (one interval later) is considered dead.
const HEARTBEAT_INTERVAL_MS = 30_000; // 30 seconds

interface HeartbeatState {
  lastPing: number;
  isAlive: boolean;
}

const connectionHealth = new WeakMap<WebSocket, HeartbeatState>();

export function enableHeartbeat(wss: WebSocketServer) {
  const interval = setInterval(() => {
    for (const ws of wss.clients) {
      const state = connectionHealth.get(ws);

      if (state && !state.isAlive) {
        // Client didn't respond to last ping — terminate
        console.log("Terminating dead WebSocket connection");
        ws.terminate();
        continue;
      }

      // Mark as not alive until pong received
      connectionHealth.set(ws, { lastPing: Date.now(), isAlive: false });
      ws.ping();
    }
  }, HEARTBEAT_INTERVAL_MS);

  wss.on("connection", (ws) => {
    connectionHealth.set(ws, { lastPing: Date.now(), isAlive: true });

    ws.on("pong", () => {
      const state = connectionHealth.get(ws);
      if (state) {
        state.isAlive = true;
        state.lastPing = Date.now();
      }
    });
  });

  wss.on("close", () => clearInterval(interval));
}

Monitoring WebSocket Servers

// src/websocket/metrics.ts
import { Registry, Gauge, Counter, Histogram } from "prom-client";

const registry = new Registry();

export const wsMetrics = {
  activeConnections: new Gauge({
    name: "websocket_active_connections",
    help: "Number of active WebSocket connections",
    labelNames: ["server_id"],
    registers: [registry],
  }),

  messagesDelivered: new Counter({
    name: "websocket_messages_delivered_total",
    help: "Total messages delivered via WebSocket",
    labelNames: ["type", "server_id"],
    registers: [registry],
  }),

  messageDeliveryLatency: new Histogram({
    name: "websocket_message_delivery_latency_ms",
    help: "WebSocket message delivery latency in milliseconds",
    buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000],
    registers: [registry],
  }),

  redisPublishErrors: new Counter({
    name: "websocket_redis_publish_errors_total",
    help: "Total Redis pub/sub errors",
    registers: [registry],
  }),
};

// Expose metrics endpoint for Prometheus
export async function getMetrics() {
  return registry.metrics();
}
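Delivery latency is easiest to record consistently if every send path goes through one timing wrapper. A sketch assuming the `messageDeliveryLatency` histogram above — the `observe` parameter is a plain callback so the wrapper itself has no prom-client dependency:

```typescript
// Time a synchronous operation and report its duration in ms.
export function timed<T>(observe: (ms: number) => void, fn: () => T): T {
  const start = Date.now();
  try {
    return fn();
  } finally {
    observe(Date.now() - start); // recorded even if fn throws
  }
}

// usage with the histogram defined above:
// timed((ms) => wsMetrics.messageDeliveryLatency.observe(ms),
//       () => ws.send(serialized));
```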

Capacity Planning

Server Size    | Max Connections | Memory/Conn | Notes
1 vCPU, 1GB    | ~5,000          | ~100 B      | Development only
2 vCPU, 4GB    | ~25,000         | ~100 B      | Small production
4 vCPU, 8GB    | ~60,000         | ~100 B      | Mid-scale
8 vCPU, 16GB   | ~100,000        | ~100 B      | High-scale

Rule of thumb: Each idle WebSocket connection uses ~100–200 bytes of memory. Active connections (high message volume) use more CPU than memory.
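Those limits are CPU- and file-descriptor-bound, not memory-bound. A quick back-of-envelope check using the rule-of-thumb figures above (the function and its default headroom fraction are illustrative):

```typescript
// How many idle connections would fit in memory alone?
export function memoryBoundConnections(
  totalBytes: number,
  bytesPerConn = 200,   // upper end of the rule of thumb
  usableFraction = 0.5  // headroom for the runtime, buffers, etc.
): number {
  return Math.floor((totalBytes * usableFraction) / bytesPerConn);
}

// A 4 GB box fits ~10.7M idle connections in memory — yet the table caps
// it at ~25K, because event-loop CPU and kernel limits bind long before RAM.
const fourGb = 4 * 1024 ** 3;
```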

Redis capacity: One Redis pub/sub channel can handle ~100K messages/second. At high scale, use Redis Cluster with channel sharding.
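Channel sharding can be sketched by hashing the room name to one of N channels, so each Redis Cluster node carries a subset of the pub/sub traffic. The channel prefix and shard count below are illustrative:

```typescript
import { createHash } from "node:crypto";

// Deterministically map a room to one of `shards` channels.
// Every server subscribes to all shard channels, but the cluster
// spreads each channel's publish load across nodes.
export function shardChannel(room: string, shards = 8): string {
  const digest = createHash("md5").update(room).digest();
  const index = digest.readUInt32BE(0) % shards;
  return `ws:room-message:${index}`;
}
```

Redis 7 also ships native sharded pub/sub (`SSUBSCRIBE`/`SPUBLISH`), which keys channels to cluster slots without hand-rolled hashing.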


Working With Viprasol

WebSocket scaling requires different architectural thinking than REST APIs. Our engineers have built WebSocket infrastructure handling 500K+ concurrent connections across trading platforms, live collaboration tools, and notification systems — with Redis pub/sub fan-out, connection health monitoring, and zero message drops under failover.



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
