Introduction to Load Balancing
Load balancing distributes incoming traffic across multiple servers to ensure no single server becomes overwhelmed. It's fundamental to building scalable, highly available systems.
Why Load Balance?
Scalability
Handle more traffic by adding more servers.
High Availability
If one server fails, others continue serving requests.
Performance
Route requests to the least loaded or nearest server.
Flexibility
Add/remove servers without affecting users.
Types of Load Balancers
Layer 4 (Transport Layer)
Routes based on IP address and TCP/UDP port:
- Faster (no content inspection)
- Can't make routing decisions based on content
- Examples: AWS NLB, HAProxy (TCP mode)
Layer 7 (Application Layer)
Routes based on HTTP content (URL, headers, cookies):
- More flexible routing options
- Can do SSL termination
- Slightly higher latency
- Examples: AWS ALB, Nginx, HAProxy (HTTP mode)
Load Balancing Algorithms
Round Robin
Requests are distributed sequentially across servers:
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A
...
Pros: Simple, even distribution
Cons: Ignores server load and capacity
Use when: Servers are identical, requests are similar
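The rotation shown above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements it; the function name `round_robin` and the server labels are assumptions made for the example.

```python
from itertools import cycle

def round_robin(servers):
    """Return an iterator that yields servers in a fixed, repeating order."""
    return cycle(servers)

picker = round_robin(["A", "B", "C"])
first_four = [next(picker) for _ in range(4)]
print(first_four)  # ['A', 'B', 'C', 'A']
```

The fourth request wraps around to Server A, matching the sequence listed above.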
Weighted Round Robin
Servers with more capacity receive more requests:
Server A (weight 3): Gets 3 of every 6 requests
Server B (weight 2): Gets 2 of every 6 requests
Server C (weight 1): Gets 1 of every 6 requests
Use when: Servers have different capacities
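One simple way to honor the weights is to repeat each server in the rotation as many times as its weight. The sketch below assumes that approach; `weighted_round_robin` is a hypothetical helper, and production balancers often interleave the picks rather than sending consecutive requests to the same server.

```python
from itertools import cycle

def weighted_round_robin(weights):
    """weights maps server name -> integer weight."""
    # Repeat each server `weight` times, then cycle through the expanded list.
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)

picker = weighted_round_robin({"A": 3, "B": 2, "C": 1})
one_cycle = [next(picker) for _ in range(6)]
print(one_cycle)  # ['A', 'A', 'A', 'B', 'B', 'C']
```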
Least Connections
Routes to the server with fewest active connections:
Server A: 10 connections ← Next request
Server B: 25 connections
Server C: 15 connections
Pros: Accounts for request duration
Cons: Doesn't consider server capacity
Use when: Requests have varying durations
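The selection rule is just a minimum over the current connection counts. A minimal sketch, using the counts from the example above (the `least_connections` name and the dictionary layout are assumptions):

```python
def least_connections(active):
    """active maps server name -> number of active connections."""
    return min(active, key=active.get)

active = {"A": 10, "B": 25, "C": 15}
chosen = least_connections(active)
print(chosen)  # A
```

A real balancer would also increment the count when it forwards a request and decrement it when the connection closes.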
Weighted Least Connections
Combines least connections with server weights:
Score = active_connections / weight
Route to lowest score
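The score formula above translates directly into code. The sketch below uses made-up connection counts and weights; `weighted_least_connections` is a hypothetical name.

```python
def weighted_least_connections(servers):
    """servers maps name -> (active_connections, weight)."""
    def score(name):
        connections, weight = servers[name]
        return connections / weight
    return min(servers, key=score)

# Scores: A = 30 / 3 = 10.0, B = 12 / 2 = 6.0, C = 8 / 1 = 8.0
servers = {"A": (30, 3), "B": (12, 2), "C": (8, 1)}
chosen = weighted_least_connections(servers)
print(chosen)  # B
```

Server C has the fewest connections, but Server B wins because its load is lower relative to its capacity.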
IP Hash
Routes based on client IP address:
server_index = hash(client_ip) % server_count
Pros: Same client always hits same server (sticky)
Cons: Can become unbalanced, problematic with NAT (many clients share one IP), and changing the server count remaps most clients
Use when: Session affinity needed, no shared session store
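A sketch of the modulo rule above, under one assumption worth calling out: it uses SHA-256 rather than Python's built-in `hash()`, because string hashes from `hash()` vary between interpreter runs and would break stickiness across restarts. The function name and the example addresses (taken from documentation IP ranges) are illustrative.

```python
import hashlib

def ip_hash(client_ip, servers):
    """Map a client IP to a server using a stable hash and modulo."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["A", "B", "C"]
first = ip_hash("203.0.113.7", servers)
second = ip_hash("203.0.113.7", servers)
print(first == second)  # True
```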
Consistent Hashing
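Servers are placed on a hash ring. Each request is hashed onto the same ring and routed to the first server found moving clockwise from that position:
Ring (clockwise): Server A → Server B → Server C → back to Server A
hash(request) falls between Server A and Server B → routed to Server B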
Pros: Minimal redistribution when servers change
Cons: More complex to implement
Use when: Caching, minimizing cache invalidation
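The ring lookup can be sketched with a sorted list and a binary search. Everything here is an illustrative assumption rather than a reference implementation: the `HashRing` class, the use of SHA-256, and the 100 virtual nodes per server (a common trick to even out the distribution).

```python
import bisect
import hashlib

def _hash(key):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Place each server at `vnodes` positions on the ring.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # First ring position clockwise from the key, wrapping at the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

keys = [f"user-{n}" for n in range(1000)]
before = HashRing(["A", "B", "C"])
after = HashRing(["A", "B"])  # Server C removed
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
# Only keys that lived on the removed server change owner.
print(all(before.lookup(k) == "C" for k in moved))  # True
```

With the modulo scheme from IP Hash, removing a server would instead remap most keys.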
Random
Randomly select a server:
server = servers[random_int(0, server_count - 1)]
Pros: Simple, no state needed
Cons: Can be uneven short-term
Use when: Large number of requests averages out
Least Response Time
Routes to the server with the best combination of low response time and few active connections.
Pros: Considers actual server performance
Cons: Requires response time tracking
Use when: Server performance varies
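How the two signals are combined varies between products. The sketch below assumes one plausible rule, multiplying average response time by the number of connections the server would have after accepting the request; the function name, the scoring formula, and the sample numbers are all assumptions.

```python
def least_response_time(stats):
    """stats maps name -> (avg_response_ms, active_connections)."""
    def score(name):
        avg_ms, connections = stats[name]
        # Assumed scoring rule: latency scaled by connections after this request.
        return avg_ms * (connections + 1)
    return min(stats, key=score)

# Scores: A = 120 * 5 = 600, B = 40 * 10 = 400, C = 45 * 6 = 270
stats = {"A": (120.0, 4), "B": (40.0, 9), "C": (45.0, 5)}
chosen = least_response_time(stats)
print(chosen)  # C
```

Server B responds fastest and Server A has the fewest connections, yet Server C has the best combined score.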
Health Checks
Load balancers must detect unhealthy servers:
Passive Health Checks
Monitor actual traffic for errors:
If server returns 5xx errors → Mark unhealthy
Active Health Checks
Periodically probe servers:
Every 10 seconds:
GET /health → 200 OK → Healthy
GET /health → timeout → Unhealthy
Health Check Configuration
- Interval: How often to check (5-30 seconds)
- Timeout: How long to wait for response
- Threshold: Failures before marking unhealthy
- Recovery: Successes before marking healthy
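The threshold and recovery settings amount to a small state machine per server. A minimal sketch, assuming both settings count consecutive results; the `HealthTracker` class and its default values are hypothetical.

```python
class HealthTracker:
    def __init__(self, fail_threshold=3, recover_threshold=2):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, ok):
        """Record one check result and return the resulting health state."""
        if ok == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.recover_threshold if ok else self.fail_threshold
        if self._streak >= needed:
            self.healthy = ok
            self._streak = 0
        return self.healthy

tracker = HealthTracker()
states = [tracker.record(ok) for ok in [False, False, False, True, True]]
print(states)  # [True, True, False, False, True]
```

The server is marked unhealthy only on the third consecutive failure and returns to rotation after two consecutive successes, which avoids flapping on a single bad probe.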
Session Persistence (Sticky Sessions)
Sometimes requests from the same user must go to the same server:
Cookie-Based
Load balancer sets a cookie identifying the server:
Set-Cookie: SERVERID=server-a
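The routing decision behind that cookie can be sketched as follows. This is an illustration only: `route_with_cookie` and its arguments are hypothetical, and a real balancer would also handle cookie expiry and signing.

```python
def route_with_cookie(cookies, servers, pick_server):
    """Return (server, set_cookie_value); set_cookie_value is None if not needed."""
    server_id = cookies.get("SERVERID")
    if server_id in servers:
        # Known, still-registered server: keep the client where it is.
        return server_id, None
    # No cookie, or it names a server that is gone: pick one and pin it.
    chosen = pick_server(servers)
    return chosen, f"SERVERID={chosen}"

servers = ["server-a", "server-b"]
first = route_with_cookie({}, servers, lambda s: s[0])
repeat = route_with_cookie({"SERVERID": "server-a"}, servers, lambda s: s[1])
print(first)   # ('server-a', 'SERVERID=server-a')
print(repeat)  # ('server-a', None)
```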
IP-Based
Route based on client IP. This is problematic when mobile clients change IP addresses mid-session or when many users share one address behind NAT.
When to Avoid
Sticky sessions reduce flexibility. Prefer:
- Externalized session storage (Redis)
- Stateless architecture (JWT)
SSL/TLS Termination
Load balancers can handle SSL:
SSL Termination
Client → HTTPS → Load Balancer → HTTP → Servers
Pros: Offloads crypto from servers
Cons: Internal traffic is unencrypted
SSL Passthrough
Client → HTTPS → Load Balancer (no decryption) → HTTPS → Servers
Pros: End-to-end encryption
Cons: Can't inspect traffic for content-based (Layer 7) routing
SSL Re-encryption
Client → HTTPS → Load Balancer (decrypt, inspect, re-encrypt) → HTTPS → Servers
The load balancer decrypts traffic so it can route on content, then re-encrypts it for internal communication.
High Availability for Load Balancers
Load balancers shouldn't be single points of failure:
Active-Passive
Primary LB (active) ← Traffic
Secondary LB (standby)
Failover via virtual IP (VIP) or DNS.
Active-Active
LB 1 ← Traffic (50%)
LB 2 ← Traffic (50%)
Both handle traffic; DNS or anycast distributes requests between them.
Global Load Balancing
Distribute traffic across geographic regions:
DNS-Based
Return different IPs based on client location:
US user → US server IP
EU user → EU server IP
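Conceptually, the DNS server does a lookup from client region to address, with a fallback for regions it has no entry for. A toy sketch, assuming a hypothetical `REGION_IPS` table with addresses from documentation-only IP ranges; real GeoDNS services derive the region from the resolver or client IP.

```python
# Hypothetical region-to-address table; IPs are from documentation ranges.
REGION_IPS = {"US": "192.0.2.10", "EU": "198.51.100.10"}
DEFAULT_REGION = "US"

def resolve(client_region):
    """Return the server IP for a region, falling back to the default region."""
    return REGION_IPS.get(client_region, REGION_IPS[DEFAULT_REGION])

print(resolve("EU"))    # 198.51.100.10
print(resolve("APAC"))  # 192.0.2.10
```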
Anycast
The same IP address is announced from multiple locations via BGP, and the network routes each client to the nearest one in routing terms.
GeoDNS + Regional LBs
Global DNS → Regional LB → Local Servers
Popular Load Balancers
Software
- Nginx: HTTP/TCP, widely used
- HAProxy: High performance, feature-rich
- Envoy: Modern, service mesh oriented
- Traefik: Cloud-native, auto-discovery
Cloud
- AWS: ALB (L7), NLB (L4), Global Accelerator
- GCP: Cloud Load Balancing
- Azure: Azure Load Balancer, Application Gateway
Best Practices
- Use health checks: Detect and remove unhealthy servers
- Choose the right algorithm: Match your traffic pattern
- Plan for LB failures: Active-passive or active-active
- Monitor everything: Latency, error rates, server health
- Consider geography: Use global load balancing for distributed users
- Externalize state: Avoid sticky sessions when possible
Conclusion
Load balancing is fundamental to building scalable systems. Understanding the various algorithms and their trade-offs helps you make the right choice for your specific requirements.