Software Scalability: Horizontal Scaling Patterns for Web Applications
Most applications don't fail under load because of bad code. They fail because of architectural decisions made when traffic was 100 users/day — decisions that work perfectly until they don't.
Scalability isn't about rewriting everything in a faster language. It's about identifying where your system breaks under load, then addressing those bottlenecks systematically. The techniques that handle 10x traffic usually handle 100x as well; you just apply them more aggressively.
This guide covers the practical patterns: stateless services, caching layers, database read scaling, async job processing, and the signals that tell you what to fix next.
The Scalability Stack
Every web application has the same basic scaling stack:
Load Balancer (distributes requests)
↓
Application Servers (stateless, horizontally scalable)
↓
Cache Layer (Redis — avoid hitting DB for common reads)
↓
Database (primary for writes, replicas for reads)
↓
Job Queue (async work — don't block HTTP requests)
↓
Object Storage (S3 — files, large assets, never on disk)
Scaling any layer is straightforward once the architecture is right. Scaling the wrong layer wastes money and doesn't solve the problem.
Principle 1: Stateless Application Servers
The foundation of horizontal scaling. If your application server stores any state in memory (sessions, uploads, local file cache), you can't add more servers — requests will go to different instances and miss the state.
Common stateful patterns that block scaling:
// ❌ BAD: Session stored in server memory
app.use(session({
secret: 'mysecret',
resave: false,
saveUninitialized: false,
// No store configured = in-memory = not scalable
}));
// ✅ GOOD: Session stored in Redis — works across any number of instances
import RedisStore from 'connect-redis';
import { createClient } from 'redis';
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
app.use(session({
store: new RedisStore({ client: redisClient }),
secret: process.env.SESSION_SECRET!,
resave: false,
saveUninitialized: false,
cookie: { secure: true, httpOnly: true, maxAge: 86400000 },
}));
// ❌ BAD: File uploads stored on local filesystem
app.post('/upload', upload.single('file'), (req, res) => {
// req.file.path points to local disk — inaccessible to other servers
res.json({ path: req.file.path });
});
// ✅ GOOD: Files go directly to S3
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import multer from 'multer';
const s3 = new S3Client({ region: 'us-east-1' });
const upload = multer({ storage: multer.memoryStorage() }); // buffer, not disk
app.post('/upload', upload.single('file'), async (req, res) => {
const key = `uploads/${Date.now()}-${req.file.originalname}`;
await s3.send(new PutObjectCommand({
Bucket: process.env.S3_BUCKET!,
Key: key,
Body: req.file.buffer,
ContentType: req.file.mimetype,
}));
res.json({ url: `https://${process.env.CDN_DOMAIN}/${key}` });
});
Principle 2: Caching Strategy
The fastest query is the one you don't make. A well-designed cache layer can eliminate 80–95% of database reads for read-heavy workloads.
Cache-Aside Pattern (Lazy Loading)
class UserService {
private readonly CACHE_TTL = 300; // 5 minutes
async getUser(userId: string): Promise<User | null> {
const cacheKey = `user:${userId}`;
// 1. Check cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached) as User;
}
// 2. Cache miss — query database
const user = await db('users').where({ id: userId }).first();
if (user) {
// 3. Populate cache for next request
await redis.setex(cacheKey, this.CACHE_TTL, JSON.stringify(user));
}
return user ?? null;
}
async updateUser(userId: string, data: Partial<User>): Promise<User> {
const user = await db('users').where({ id: userId }).update(data).returning('*');
// Invalidate cache on write
await redis.del(`user:${userId}`);
return user[0];
}
}
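One caveat with cache-aside: when a hot key expires, every concurrent request misses at once and hammers the database (a cache stampede). A minimal in-process mitigation is single-flight deduplication — a sketch with illustrative names; cross-instance protection would need a Redis lock (`SET key NX PX`), which this omits:

```typescript
// Single-flight wrapper: concurrent misses for the same key share one loader
// call instead of each hitting the database. In-process only.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const p = loader().finally(() => inFlight.delete(key)); // clean up when settled
  inFlight.set(key, p);
  return p;
}
```

Wrapping the database query in `getUser`'s miss branch with `singleFlight(cacheKey, ...)` collapses concurrent misses into one query per instance.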
Write-Through Cache
For data that must be consistent immediately after writes:
async updateUserProfile(userId: string, profile: ProfileUpdate): Promise<void> {
  // Write to the DB first, then cache the row the DB returned — caching a
  // locally merged object can race with concurrent writers
  const [updated] = await db('user_profiles')
    .where({ user_id: userId })
    .update(profile)
    .returning('*');
  await redis.setex(`profile:${userId}`, 300, JSON.stringify(updated));
}
What to Cache (and What Not To)
| Data | Cache? |
|---|---|
| User profile data | ✅ Yes |
| Session data | ✅ Yes |
| Expensive aggregations | ✅ Yes |
| Rate limit counters | ✅ Yes |
| Feature flag state | ✅ Yes |
| Financial balances | ❌ No — must be consistent |
| Inventory counts | ❌ No — stale = overselling |
| Actively written records | ❌ No — invalidation complexity |
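Rate-limit counters map naturally onto Redis `INCR` with a TTL. A minimal fixed-window sketch — the `CounterStore` interface and names are illustrative; with Redis, `incr` would be `INCR` plus `EXPIRE` on first increment (or a small Lua script to make both atomic):

```typescript
// Fixed-window rate limiter. The store is abstracted so the logic is visible;
// the key shape `rl:{user}:{window}` is what you'd use with Redis INCR.
interface CounterStore {
  incr(key: string, ttlSeconds: number): Promise<number>;
}

class RateLimiter {
  constructor(
    private store: CounterStore,
    private limit: number,
    private windowSeconds: number,
  ) {}

  async allow(userId: string): Promise<boolean> {
    // All requests in the same window increment the same key
    const windowStart = Math.floor(Date.now() / 1000 / this.windowSeconds);
    const count = await this.store.incr(`rl:${userId}:${windowStart}`, this.windowSeconds);
    return count <= this.limit;
  }
}
```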
Principle 3: Database Read Scaling
The database is almost always the first bottleneck. Two strategies work together:
Read Replicas
Route read queries to replicas, writes to primary:
// Two separate Knex connections — primary for writes, replica for reads
import Knex from 'knex';
const dbPrimary = Knex({
client: 'pg',
connection: process.env.DATABASE_PRIMARY_URL,
});
const dbReplica = Knex({
client: 'pg',
connection: process.env.DATABASE_REPLICA_URL,
// Bound how long we wait for a connection from the replica pool
pool: { acquireTimeoutMillis: 5000 },
});
class OrderService {
// Reads from replica
async getOrders(userId: string): Promise<Order[]> {
return dbReplica('orders').where({ user_id: userId }).orderBy('created_at', 'desc');
}
// Writes to primary
async createOrder(data: CreateOrderData): Promise<Order> {
const [order] = await dbPrimary('orders').insert(data).returning('*');
return order;
}
}
Replication lag consideration: Replica lag is typically 10–500ms. For reads immediately after a write (e.g., "show me the order I just placed"), read from primary or add a brief delay.
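One way to implement the "read from primary briefly after a write" option, with illustrative names — a multi-instance deployment would keep the write timestamps in Redis so all instances see them, but the `Map` keeps the routing logic visible:

```typescript
// Read-your-writes routing: pin a user's reads to the primary for longer than
// the worst replica lag you observe.
const lastWriteAt = new Map<string, number>();
const PIN_TO_PRIMARY_MS = 1000; // comfortably above the 10–500ms lag range

function recordWrite(userId: string): void {
  lastWriteAt.set(userId, Date.now());
}

function dbFor<T>(userId: string, primary: T, replica: T): T {
  const last = lastWriteAt.get(userId);
  return last !== undefined && Date.now() - last < PIN_TO_PRIMARY_MS
    ? primary
    : replica;
}
```

In the `OrderService` above, `createOrder` would call `recordWrite(userId)` and `getOrders` would pick its connection via `dbFor(userId, dbPrimary, dbReplica)`.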
Index Optimization
Missing indexes are the most common cause of database performance problems. Every foreign key and every WHERE clause column should be indexed unless you've explicitly decided not to.
-- Find slow queries (requires pg_stat_statements extension)
SELECT
query,
calls,
total_exec_time / calls AS avg_ms,
rows::numeric / calls AS avg_rows  -- cast avoids integer division
FROM pg_stat_statements
WHERE calls > 100
ORDER BY avg_ms DESC
LIMIT 20;
-- Find tables with sequential scans (usually means missing index)
SELECT
relname AS table,
seq_scan,
idx_scan,
ROUND(seq_scan::numeric / NULLIF(seq_scan + idx_scan, 0) * 100, 1) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;
-- Check existing indexes on a table
SELECT
indexname,
indexdef,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'orders';
Principle 4: Async Job Processing
Any operation that takes more than 100ms should not run synchronously in an HTTP request. This includes:
- Sending emails / SMS
- Generating reports or exports
- Processing images or files
- Calling external APIs
- Running machine learning inference
- Sending webhooks
// ❌ BAD: Email sent synchronously — user waits for SMTP roundtrip
app.post('/signup', async (req, res) => {
const user = await createUser(req.body);
await sendWelcomeEmail(user.email); // blocks response for 200–800ms
res.json({ user });
});
// ✅ GOOD: Email queued, response returned immediately
import Bull from 'bull';
const emailQueue = new Bull('email', process.env.REDIS_URL!); // Bull accepts a Redis URL string
app.post('/signup', async (req, res) => {
const user = await createUser(req.body);
// Queue job — returns in <5ms
await emailQueue.add('welcome', { userId: user.id, email: user.email });
res.json({ user }); // Response sent immediately
});
// Worker process (separate dyno/container)
emailQueue.process('welcome', async (job) => {
const { userId, email } = job.data;
await sendWelcomeEmail(email);
await db('users').where({ id: userId }).update({ welcome_email_sent: true });
});
Job Queue Options (2026)
| Tool | Best For | Scaling |
|---|---|---|
| BullMQ (Redis) | Node.js, high throughput | Horizontal workers |
| Celery (Python) | Python, complex workflows | Horizontal workers |
| Sidekiq (Ruby) | Ruby/Rails ecosystem | Horizontal workers |
| AWS SQS + Lambda | Serverless, event-driven | Auto-scales |
| AWS SQS + ECS | Controlled scaling, cost | Manual worker scaling |
| Temporal | Complex workflows, durability | Managed or self-hosted |
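Whichever queue you pick, failed jobs need retries with backoff — BullMQ, Celery, and Sidekiq all ship this built in. The shape of the delay curve, as a standalone sketch (names and defaults are illustrative, not any library's API):

```typescript
// Exponential backoff with full jitter: delay grows 2x per attempt up to a
// cap, and the random jitter spreads retries out so failed jobs don't all
// retry at the same instant.
function retryDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp);
}
```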
Principle 5: Connection Pooling
Database connections are expensive to create (10–50ms each). Without pooling, high-concurrency applications exhaust database connections.
// PostgreSQL with PgBouncer (connection pooler) in transaction mode,
// or pg's built-in Pool directly for moderate scale
import { Pool } from 'pg';
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // Max connections in pool
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
// Monitor pool health
setInterval(() => {
console.log({
totalCount: pool.totalCount,
idleCount: pool.idleCount,
waitingCount: pool.waitingCount,
});
}, 60000);
For serverless (Lambda, Vercel Edge) — use RDS Proxy or PgBouncer. Serverless functions can create thousands of connections simultaneously; without a proxy, you'll hit PostgreSQL's connection limit.
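A rough sketch of serverless-friendly pool settings (values are illustrative, not a recommendation for every workload; the keys match pg's `Pool` options):

```typescript
// Serverless pool sizing. Even at max: 1, N concurrent function instances
// still mean ≈ N database connections — which is why RDS Proxy or PgBouncer
// belongs in front regardless.
const serverlessPoolConfig = {
  max: 1,                        // one connection per function instance
  idleTimeoutMillis: 10_000,     // release idle connections between invocations
  connectionTimeoutMillis: 2000, // fail fast if the proxy is saturated
};
```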
Knowing What to Scale Next
Use these signals to identify your next bottleneck:
# Application layer bottleneck signals:
# - CPU > 80% on app servers while DB is idle
# - Response times increase linearly with concurrent users
# Solution: Add more app server instances (horizontal scale)
# Database bottleneck signals:
# - DB CPU > 70%
# - Slow query log filling up
# - Connection wait times increasing
# Solution: Add read replicas, optimize indexes, add caching
# Cache bottleneck signals:
# - Redis memory > 80% used
# - Cache hit rate < 70%
# - Redis CPU spikes
# Solution: Increase Redis memory, review eviction policy, add Redis cluster
# Network bottleneck signals:
# - Large response payloads (> 500KB per request)
# - Many small requests to same service
# Solution: Pagination, compression, HTTP/2, CDN for static assets
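The cache hit rate above comes straight out of Redis `INFO stats`: `keyspace_hits / (keyspace_hits + keyspace_misses)`. A small parsing sketch — the field names are real Redis INFO fields, the helper itself is illustrative:

```typescript
// Compute cache hit rate from the raw INFO stats text that redis clients
// return (lines of "field:value"). Reports 1 when there has been no traffic.
function cacheHitRate(infoStats: string): number {
  const get = (field: string) =>
    Number(infoStats.match(new RegExp(`${field}:(\\d+)`))?.[1] ?? 0);
  const hits = get('keyspace_hits');
  const misses = get('keyspace_misses');
  return hits + misses === 0 ? 1 : hits / (hits + misses);
}
```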
Scalability Cost Ranges (AWS, 2026)
| Tier | Monthly Traffic | Architecture | Monthly Cost |
|---|---|---|---|
| Starter | <100K req/day | 1 ECS task + RDS t3.small | $80–$150 |
| Growth | 1M req/day | 3 ECS tasks + RDS t3.medium + ElastiCache | $300–$600 |
| Scale | 10M req/day | 5–10 ECS tasks + RDS r6g.large + replicas + ElastiCache | $1,200–$2,500 |
| Enterprise | 100M+ req/day | Multi-region, auto-scaling, Aurora + Redis cluster | $8,000–$25,000 |
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.