Software Scalability: Horizontal Scaling Patterns for Web Applications
Most applications don't fail under load because of bad code. They fail because of architectural decisions made when traffic was 100 users/day — decisions that work perfectly until they don't.
Scalability isn't about rewriting everything in a faster language. It's about identifying where your system breaks under load, then addressing those bottlenecks systematically. The techniques that handle 10x traffic usually handle 100x as well; you just apply them more aggressively.
This guide covers the practical patterns: stateless services, caching layers, database read scaling, async job processing, and the signals that tell you what to fix next.
The Scalability Stack
Every web application has the same basic scaling stack:
Load Balancer (distributes requests)
↓
Application Servers (stateless, horizontally scalable)
↓
Cache Layer (Redis — avoid hitting DB for common reads)
↓
Database (primary for writes, replicas for reads)
↓
Job Queue (async work — don't block HTTP requests)
↓
Object Storage (S3 — files, large assets, never on disk)
Scaling any layer is straightforward once the architecture is right. Scaling the wrong layer wastes money and doesn't solve the problem.
Principle 1: Stateless Application Servers
The foundation of horizontal scaling. If your application server stores any state in memory (sessions, uploads, local file cache), you can't add more servers — requests will go to different instances and miss the state.
Common stateful patterns that block scaling:
// ❌ BAD: Session stored in server memory
app.use(session({
secret: 'mysecret',
resave: false,
saveUninitialized: false,
// No store configured = in-memory = not scalable
}));
// ✅ GOOD: Session stored in Redis — works across any number of instances
import RedisStore from 'connect-redis';
import { createClient } from 'redis';
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
app.use(session({
store: new RedisStore({ client: redisClient }),
secret: process.env.SESSION_SECRET!,
resave: false,
saveUninitialized: false,
cookie: { secure: true, httpOnly: true, maxAge: 86400000 },
}));
// ❌ BAD: File uploads stored on local filesystem
app.post('/upload', upload.single('file'), (req, res) => {
// req.file.path points to local disk — inaccessible to other servers
res.json({ path: req.file.path });
});
// ✅ GOOD: Files go directly to S3
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import multer from 'multer';
const s3 = new S3Client({ region: 'us-east-1' });
const upload = multer({ storage: multer.memoryStorage() }); // buffer, not disk
app.post('/upload', upload.single('file'), async (req, res) => {
const key = `uploads/${Date.now()}-${req.file.originalname}`;
await s3.send(new PutObjectCommand({
Bucket: process.env.S3_BUCKET!,
Key: key,
Body: req.file.buffer,
ContentType: req.file.mimetype,
}));
res.json({ url: `https://${process.env.CDN_DOMAIN}/${key}` });
});
Principle 2: Caching Strategy
The fastest query is the one you don't make. A well-designed cache layer can eliminate 80–95% of database reads for read-heavy workloads.
Cache-Aside Pattern (Lazy Loading)
class UserService {
private readonly CACHE_TTL = 300; // 5 minutes
async getUser(userId: string): Promise<User | null> {
const cacheKey = `user:${userId}`;
// 1. Check cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached) as User;
}
// 2. Cache miss — query database
const user = await db('users').where({ id: userId }).first();
if (user) {
// 3. Populate cache for next request
await redis.setex(cacheKey, this.CACHE_TTL, JSON.stringify(user));
}
return user ?? null;
}
async updateUser(userId: string, data: Partial<User>): Promise<User> {
const user = await db('users').where({ id: userId }).update(data).returning('*');
// Invalidate cache on write
await redis.del(`user:${userId}`);
return user[0];
}
}
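One caveat with cache-aside: when a hot key expires, every concurrent request misses at once and hammers the database (a cache stampede). A minimal in-process mitigation is single-flight deduplication — a sketch with illustrative names; cross-instance protection would need a Redis lock (`SET key NX PX`), which this omits:

```typescript
// Single-flight wrapper: concurrent misses for the same key share one loader
// call instead of each hitting the database. In-process only.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const p = loader().finally(() => inFlight.delete(key)); // clean up when settled
  inFlight.set(key, p);
  return p;
}
```

Wrapping the database query in `getUser`'s miss branch with `singleFlight(cacheKey, ...)` collapses concurrent misses into one query per instance.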
Write-Through Cache
For data that must be consistent immediately after writes:
async updateUserProfile(userId: string, profile: ProfileUpdate): Promise<void> {
  // Write to the DB first, then cache the row the DB returned — caching a
  // locally merged object can race with concurrent writers
  const [updated] = await db('user_profiles')
    .where({ user_id: userId })
    .update(profile)
    .returning('*');
  await redis.setex(`profile:${userId}`, 300, JSON.stringify(updated));
}
What to Cache (and What Not To)
| Data | Cache? |
|---|---|
| User profile data | ✅ Yes |
| Session data | ✅ Yes |
| Expensive aggregations | ✅ Yes |
| Rate limit counters | ✅ Yes |
| Feature flag state | ✅ Yes |
| Financial balances | ❌ No — must be consistent |
| Inventory counts | ❌ No — stale = overselling |
| Actively written records | ❌ No — invalidation complexity |
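Rate-limit counters map naturally onto Redis `INCR` with a TTL. A minimal fixed-window sketch — the `CounterStore` interface and names are illustrative; with Redis, `incr` would be `INCR` plus `EXPIRE` on first increment (or a small Lua script to make both atomic):

```typescript
// Fixed-window rate limiter. The store is abstracted so the logic is visible;
// the key shape `rl:{user}:{window}` is what you'd use with Redis INCR.
interface CounterStore {
  incr(key: string, ttlSeconds: number): Promise<number>;
}

class RateLimiter {
  constructor(
    private store: CounterStore,
    private limit: number,
    private windowSeconds: number,
  ) {}

  async allow(userId: string): Promise<boolean> {
    // All requests in the same window increment the same key
    const windowStart = Math.floor(Date.now() / 1000 / this.windowSeconds);
    const count = await this.store.incr(`rl:${userId}:${windowStart}`, this.windowSeconds);
    return count <= this.limit;
  }
}
```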
Principle 3: Database Read Scaling
The database is almost always the first bottleneck. Two strategies work together:
Read Replicas
Route read queries to replicas, writes to primary:
// Two separate Knex connections — primary for writes, replica for reads
import Knex from 'knex';
const dbPrimary = Knex({
client: 'pg',
connection: process.env.DATABASE_PRIMARY_URL,
});
const dbReplica = Knex({
client: 'pg',
connection: process.env.DATABASE_REPLICA_URL,
// Bound how long we wait for a connection from the replica pool
pool: { acquireTimeoutMillis: 5000 },
});
class OrderService {
// Reads from replica
async getOrders(userId: string): Promise<Order[]> {
return dbReplica('orders').where({ user_id: userId }).orderBy('created_at', 'desc');
}
// Writes to primary
async createOrder(data: CreateOrderData): Promise<Order> {
const [order] = await dbPrimary('orders').insert(data).returning('*');
return order;
}
}
Replication lag consideration: Replica lag is typically 10–500ms. For reads immediately after a write (e.g., "show me the order I just placed"), read from primary or add a brief delay.
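One way to implement the "read from primary briefly after a write" option, with illustrative names — a multi-instance deployment would keep the write timestamps in Redis so all instances see them, but the `Map` keeps the routing logic visible:

```typescript
// Read-your-writes routing: pin a user's reads to the primary for longer than
// the worst replica lag you observe.
const lastWriteAt = new Map<string, number>();
const PIN_TO_PRIMARY_MS = 1000; // comfortably above the 10–500ms lag range

function recordWrite(userId: string): void {
  lastWriteAt.set(userId, Date.now());
}

function dbFor<T>(userId: string, primary: T, replica: T): T {
  const last = lastWriteAt.get(userId);
  return last !== undefined && Date.now() - last < PIN_TO_PRIMARY_MS
    ? primary
    : replica;
}
```

In the `OrderService` above, `createOrder` would call `recordWrite(userId)` and `getOrders` would pick its connection via `dbFor(userId, dbPrimary, dbReplica)`.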
Index Optimization
Missing indexes are the most common cause of database performance problems. Every foreign key and every WHERE clause column should be indexed unless you've explicitly decided not to.
-- Find slow queries (requires pg_stat_statements extension)
SELECT
query,
calls,
total_exec_time / calls AS avg_ms,
rows::numeric / calls AS avg_rows  -- cast avoids integer division
FROM pg_stat_statements
WHERE calls > 100
ORDER BY avg_ms DESC
LIMIT 20;
-- Find tables with sequential scans (usually means missing index)
SELECT
relname AS table,
seq_scan,
idx_scan,
ROUND(seq_scan::numeric / NULLIF(seq_scan + idx_scan, 0) * 100, 1) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;
-- Check existing indexes on a table
SELECT
indexname,
indexdef,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'orders';
Principle 4: Async Job Processing
Any operation that takes more than 100ms should not run synchronously in an HTTP request. This includes:
- Sending emails / SMS
- Generating reports or exports
- Processing images or files
- Calling external APIs
- Running machine learning inference
- Sending webhooks
// ❌ BAD: Email sent synchronously — user waits for SMTP roundtrip
app.post('/signup', async (req, res) => {
const user = await createUser(req.body);
await sendWelcomeEmail(user.email); // blocks response for 200–800ms
res.json({ user });
});
// ✅ GOOD: Email queued, response returned immediately
import Bull from 'bull';
const emailQueue = new Bull('email', process.env.REDIS_URL!); // Bull accepts a Redis URL string
app.post('/signup', async (req, res) => {
const user = await createUser(req.body);
// Queue job — returns in <5ms
await emailQueue.add('welcome', { userId: user.id, email: user.email });
res.json({ user }); // Response sent immediately
});
// Worker process (separate dyno/container)
emailQueue.process('welcome', async (job) => {
const { userId, email } = job.data;
await sendWelcomeEmail(email);
await db('users').where({ id: userId }).update({ welcome_email_sent: true });
});
Job Queue Options (2026)
| Tool | Best For | Scaling |
|---|---|---|
| BullMQ (Redis) | Node.js, high throughput | Horizontal workers |
| Celery (Python) | Python, complex workflows | Horizontal workers |
| Sidekiq (Ruby) | Ruby/Rails ecosystem | Horizontal workers |
| AWS SQS + Lambda | Serverless, event-driven | Auto-scales |
| AWS SQS + ECS | Controlled scaling, cost | Manual worker scaling |
| Temporal | Complex workflows, durability | Managed or self-hosted |
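Whichever queue you pick, failed jobs need retries with backoff — BullMQ, Celery, and Sidekiq all ship this built in. The shape of the delay curve, as a standalone sketch (names and defaults are illustrative, not any library's API):

```typescript
// Exponential backoff with full jitter: delay grows 2x per attempt up to a
// cap, and the random jitter spreads retries out so failed jobs don't all
// retry at the same instant.
function retryDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp);
}
```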
Principle 5: Connection Pooling
Database connections are expensive to create (10–50ms each). Without pooling, high-concurrency applications exhaust database connections.
// PostgreSQL with PgBouncer (connection pooler) in transaction mode,
// or pg's built-in Pool directly for moderate scale
import { Pool } from 'pg';
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // Max connections in pool
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
// Monitor pool health
setInterval(() => {
console.log({
totalCount: pool.totalCount,
idleCount: pool.idleCount,
waitingCount: pool.waitingCount,
});
}, 60000);
For serverless (Lambda, Vercel Edge) — use RDS Proxy or PgBouncer. Serverless functions can create thousands of connections simultaneously; without a proxy, you'll hit PostgreSQL's connection limit.
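A rough sketch of serverless-friendly pool settings (values are illustrative, not a recommendation for every workload; the keys match pg's `Pool` options):

```typescript
// Serverless pool sizing. Even at max: 1, N concurrent function instances
// still mean ≈ N database connections — which is why RDS Proxy or PgBouncer
// belongs in front regardless.
const serverlessPoolConfig = {
  max: 1,                        // one connection per function instance
  idleTimeoutMillis: 10_000,     // release idle connections between invocations
  connectionTimeoutMillis: 2000, // fail fast if the proxy is saturated
};
```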
Knowing What to Scale Next
Use these signals to identify your next bottleneck:
# Application layer bottleneck signals:
# - CPU > 80% on app servers while DB is idle
# - Response times increase linearly with concurrent users
# Solution: Add more app server instances (horizontal scale)
# Database bottleneck signals:
# - DB CPU > 70%
# - Slow query log filling up
# - Connection wait times increasing
# Solution: Add read replicas, optimize indexes, add caching
# Cache bottleneck signals:
# - Redis memory > 80% used
# - Cache hit rate < 70%
# - Redis CPU spikes
# Solution: Increase Redis memory, review eviction policy, add Redis cluster
# Network bottleneck signals:
# - Large response payloads (> 500KB per request)
# - Many small requests to same service
# Solution: Pagination, compression, HTTP/2, CDN for static assets
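The cache hit rate above comes straight out of Redis `INFO stats`: `keyspace_hits / (keyspace_hits + keyspace_misses)`. A small parsing sketch — the field names are real Redis INFO fields, the helper itself is illustrative:

```typescript
// Compute cache hit rate from the raw INFO stats text that redis clients
// return (lines of "field:value"). Reports 1 when there has been no traffic.
function cacheHitRate(infoStats: string): number {
  const get = (field: string) =>
    Number(infoStats.match(new RegExp(`${field}:(\\d+)`))?.[1] ?? 0);
  const hits = get('keyspace_hits');
  const misses = get('keyspace_misses');
  return hits + misses === 0 ? 1 : hits / (hits + misses);
}
```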
Scalability Cost Ranges (AWS, 2026)
| Tier | Monthly Traffic | Architecture | Monthly Cost |
|---|---|---|---|
| Starter | <100K req/day | 1 ECS task + RDS t3.small | $80–$150 |
| Growth | 1M req/day | 3 ECS tasks + RDS t3.medium + ElastiCache | $300–$600 |
| Scale | 10M req/day | 5–10 ECS tasks + RDS r6g.large + replicas + ElastiCache | $1,200–$2,500 |
| Enterprise | 100M+ req/day | Multi-region, auto-scaling, Aurora + Redis cluster | $8,000–$25,000 |
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.