Load Balancing Strategies | Archicise

Introduction to Load Balancing

Load balancing distributes incoming traffic across multiple servers to ensure no single server becomes overwhelmed. It's fundamental to building scalable, highly available systems.

Why Load Balance?

Scalability

Handle more traffic by adding more servers.

High Availability

If one server fails, others continue serving requests.

Performance

Route requests to the least loaded or nearest server.

Flexibility

Add/remove servers without affecting users.

Types of Load Balancers

Layer 4 (Transport Layer)

Routes based on IP address and TCP/UDP port:

Faster (no content inspection)
Can't make routing decisions based on content
Examples: AWS NLB, HAProxy (TCP mode)

Layer 7 (Application Layer)

Routes based on HTTP content (URL, headers, cookies):

More flexible routing options
Can do SSL termination
Slightly higher latency
Examples: AWS ALB, Nginx, HAProxy (HTTP mode)
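
To make content-based routing concrete, here is a minimal sketch of a Layer 7 decision that picks a backend pool from the URL path. The pool names and path prefixes are hypothetical and chosen only for illustration. A Layer 4 balancer could not make this choice, because it never sees the path.

```python
# Hypothetical routing table: (path prefix, backend pool name).
ROUTES = [
    ("/api/", "api-pool"),
    ("/static/", "static-pool"),
]
DEFAULT_POOL = "web-pool"


def pool_for_path(path):
    """Return the pool for the first matching prefix, else the default pool."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


print(pool_for_path("/api/users/42"))
print(pool_for_path("/index.html"))
```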

Load Balancing Algorithms

Round Robin

Requests are distributed sequentially across servers:

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A
...

Pros: Simple, even distribution
Cons: Ignores server load and capacity
Use when: Servers are identical, requests are similar
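
The rotation above can be sketched in a few lines. This is a minimal illustration, not a production balancer: the class name `RoundRobinBalancer` and the server labels are assumptions, and it ignores concurrency and server health.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def next_server(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["A", "B", "C"])
first_four = [balancer.next_server() for _ in range(4)]
print(first_four)  # ['A', 'B', 'C', 'A']
```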

Weighted Round Robin

Servers with more capacity receive more requests:

Server A (weight 3): Gets 3 of every 6 requests
Server B (weight 2): Gets 2 of every 6 requests
Server C (weight 1): Gets 1 of every 6 requests

Use when: Servers have different capacities
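
One simple way to sketch weighted round robin is to repeat each server in the rotation as many times as its weight. The function name `weighted_rotation` is an assumption for illustration. Real balancers often interleave the picks so that a heavy server does not receive its share in one burst, which this sketch does not attempt.

```python
from itertools import cycle


def weighted_rotation(weights):
    """Build an endless rotation where each server appears `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one server with a positive weight is required")
    return cycle(expanded)


rotation = weighted_rotation({"A": 3, "B": 2, "C": 1})
one_cycle = [next(rotation) for _ in range(6)]
print(one_cycle)  # ['A', 'A', 'A', 'B', 'B', 'C']
```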

Least Connections

Routes each request to the server with the fewest active connections:

Server A: 10 connections ← Next request
Server B: 25 connections
Server C: 15 connections

Pros: Accounts for request duration
Cons: Doesn't consider server capacity
Use when: Requests have varying durations
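
A minimal sketch of the selection step, using the connection counts from the example above. The dictionary of counters and the function names are assumptions. A real balancer would update the counts as connections open and close, typically under a lock.

```python
def pick_least_connections(active):
    """Return the server name with the fewest active connections."""
    return min(active, key=active.get)


def release(active, server):
    """Record that a connection to `server` has finished."""
    active[server] -= 1


active = {"A": 10, "B": 25, "C": 15}

chosen = pick_least_connections(active)
active[chosen] += 1  # the new request is now an active connection
print(chosen, active[chosen])  # A 11
```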

Weighted Least Connections

Combines least connections with server weights:

Score = active_connections / weight
Route to lowest score
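
The score formula can be tried out directly. In this sketch the server data is made up for illustration, and each entry is assumed to be an `(active_connections, weight)` pair with a positive weight.

```python
def pick_weighted_least_connections(servers):
    """Return the server with the lowest active_connections / weight score."""
    return min(servers, key=lambda name: servers[name][0] / servers[name][1])


# name -> (active_connections, weight); values are illustrative only
servers = {"A": (30, 3), "B": (16, 2), "C": (9, 1)}

# Scores: A = 30 / 3 = 10.0, B = 16 / 2 = 8.0, C = 9 / 1 = 9.0
chosen = pick_weighted_least_connections(servers)
print(chosen)  # B
```

Server C has the fewest raw connections, but B wins because its higher weight means it has more spare capacity relative to its load.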

IP Hash

Routes based on client IP address:

server_index = hash(client_ip) % server_count

Pros: Same client always hits same server (sticky)
Cons: Can become unbalanced, problematic with NAT
Use when: Session affinity needed, no shared session store
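
A minimal sketch of the hash-and-modulo step. It uses SHA-256 from `hashlib` rather than Python's built-in `hash()`, because the built-in string hash is randomized per process by default and would not give a stable mapping across restarts. The function name and the example addresses are assumptions.

```python
import hashlib


def server_for_ip(client_ip, servers):
    """Map a client IP to a server using a hash that is stable across processes."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["A", "B", "C"]
first = server_for_ip("203.0.113.7", servers)
second = server_for_ip("203.0.113.7", servers)
print(first == second)  # True: the same client lands on the same server
```

Because the index depends on `len(servers)`, adding or removing a server remaps most clients, which is the weakness that consistent hashing addresses.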

Consistent Hashing

Servers are placed on a hash ring. Requests are routed to the nearest server clockwise:

        Server A
       /        \
Request →        Server B
       \        /
        Server C

Pros: Minimal redistribution when servers change
Cons: More complex to implement
Use when: Caching, minimizing cache invalidation
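
The ring can be sketched as a sorted list of hash positions searched with `bisect`. This is an illustrative sketch under several assumptions: the class and function names are made up, each server is placed on the ring 100 times ("virtual nodes", a common refinement not described above) to even out the split, and SHA-256 is an arbitrary choice of hash.

```python
import bisect
import hashlib


def ring_position(key):
    """Hash a string to an integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self._replicas = replicas  # virtual nodes per server (assumed value)
        self._ring = []            # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._replicas):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [point for point in self._ring if point[1] != server]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("the ring has no servers")
        # First point at or after the key's position, wrapping past the end.
        index = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["A", "B", "C"])
keys = [f"user-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.remove("C")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only the keys that were on the removed server change owner. Keys that mapped to A or B stay where they were, which is what keeps cache invalidation small.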

Random

Randomly select a server:

import random
server = random.choice(servers)  # pick one server uniformly at random

Pros: Simple, no state needed
Cons: Can be uneven short-term
Use when: Large number of requests averages out
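
A quick simulation illustrates how the distribution evens out over many requests. The server labels, the request count, and the seed are arbitrary assumptions; the exact counts depend on the random number generator.

```python
import random
from collections import Counter

servers = ["A", "B", "C"]
rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulate 30,000 requests, each sent to a uniformly random server.
counts = Counter(rng.choice(servers) for _ in range(30_000))
for name in servers:
    print(name, counts[name])
```

Each server should end up close to 10,000 requests, while a run of only a few dozen requests can be noticeably lopsided.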

Least Response Time

Routes to the server with the fastest response time and the fewest active connections.

Pros: Considers actual server performance
Cons: Requires response time tracking
Use when: Server performance varies
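
There is no single standard formula for combining response time with connection count. The sketch below assumes one plausible choice: track a smoothed (exponentially weighted) average response time per server and pick the lowest `avg_ms * (active + 1)`. The class name, the smoothing factor, and the numbers are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    active: int = 0      # requests currently in flight
    avg_ms: float = 0.0  # smoothed response time in milliseconds

    def record(self, elapsed_ms, alpha=0.2):
        """Fold a new measurement into the moving average."""
        if self.avg_ms == 0.0:
            self.avg_ms = elapsed_ms
        else:
            self.avg_ms = alpha * elapsed_ms + (1 - alpha) * self.avg_ms


def pick_fastest(stats):
    """Return the server with the lowest avg_ms * (active + 1) score."""
    return min(stats, key=lambda name: stats[name].avg_ms * (stats[name].active + 1))


stats = {
    "A": ServerStats(active=4, avg_ms=50.0),   # score 250
    "B": ServerStats(active=1, avg_ms=80.0),   # score 160
    "C": ServerStats(active=2, avg_ms=120.0),  # score 360
}
chosen = pick_fastest(stats)
print(chosen)  # B
```

Server A responds fastest, but its four in-flight requests push its score above B's.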

Health Checks

Load balancers must detect unhealthy servers:

Passive Health Checks

Monitor actual traffic for errors:

If a server repeatedly returns 5xx errors or fails to respond → Mark unhealthy

Active Health Checks

Periodically probe servers:

Every 10 seconds:
  GET /health → 200 OK → Healthy
  GET /health → timeout → Unhealthy

Health Check Configuration

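Interval: How often to probe each server (typically 5-30 seconds)
Timeout: How long to wait for a response before counting the probe as failed
Threshold: Consecutive failures before marking a server unhealthy
Recovery: Consecutive successes before marking a server healthy again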
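
The threshold and recovery settings amount to a small state machine. The sketch below assumes both counts are consecutive and leaves out the probing itself (the `GET /health` call and its timeout). It only consumes probe results, and the class name and default values are assumptions.

```python
class HealthTracker:
    """Tracks one server's health from a stream of probe results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold  # failures before marking unhealthy
        self.healthy_threshold = healthy_threshold      # successes before marking healthy
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, probe_ok):
        """Feed one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0  # the result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [False, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
print(states)
# [True, True, True, True, True, False, False, True]
```

The single success after two failures resets the streak, so the server is marked unhealthy only after three failures in a row, and it returns to service after two successes in a row.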

Session Persistence (Sticky Sessions)

Sometimes requests from the same user must go to the same server:

Cookie-Based

Load balancer sets a cookie identifying the server:

Set-Cookie: SERVERID=server-a
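
A minimal sketch of the routing decision behind that cookie. The function name, the `pick_new` callback, and the fallback behavior when the pinned server is gone are assumptions. Real load balancers usually also sign or obscure the cookie value.

```python
def route_with_cookie(cookies, healthy_servers, pick_new):
    """Return (server, set_cookie); set_cookie is None when no new cookie is needed."""
    pinned = cookies.get("SERVERID")
    if pinned in healthy_servers:
        return pinned, None
    # No cookie yet, or the pinned server is no longer available: choose again.
    server = pick_new(healthy_servers)
    return server, f"SERVERID={server}"


def pick_first(pool):
    """Stand-in for any balancing algorithm."""
    return pool[0]


healthy = ["server-a", "server-b"]

print(route_with_cookie({}, healthy, pick_first))
print(route_with_cookie({"SERVERID": "server-b"}, healthy, pick_first))
print(route_with_cookie({"SERVERID": "server-z"}, healthy, pick_first))
```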

IP-Based

Route based on client IP. This is problematic with mobile clients, whose IP can change mid-session, and with NAT, where many users share one IP.

When to Avoid

Sticky sessions reduce flexibility. Prefer:

Externalized session storage (Redis)
Stateless architecture (JWT)

SSL/TLS Termination

Load balancers can handle SSL/TLS in three ways:

SSL Termination

Client → HTTPS → Load Balancer → HTTP → Servers

Pros: Offloads crypto from servers
Cons: Internal traffic is unencrypted

SSL Passthrough

Client → HTTPS → Load Balancer (no decryption) → HTTPS → Servers

Pros: End-to-end encryption
Cons: Can't inspect traffic for routing

SSL Re-encryption

Client → HTTPS → Load Balancer (decrypts, then re-encrypts) → HTTPS → Servers

The load balancer decrypts traffic so it can inspect and route it, then re-encrypts it before forwarding to the servers.

High Availability for Load Balancers

Load balancers shouldn't be single points of failure:

Active-Passive

Primary LB (active) ← Traffic
Secondary LB (standby)

If the primary fails, the standby takes over via a virtual IP (VIP) or a DNS change.

Active-Active

LB 1 ← Traffic (50%)
LB 2 ← Traffic (50%)

Both handle traffic; DNS or anycast distributes requests between them.

Global Load Balancing

Distribute traffic across geographic regions:

DNS-Based

Return different IPs based on client location:

US user → US server IP
EU user → EU server IP
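
The lookup a geo-aware DNS server performs can be sketched as a simple mapping. The region codes, the fallback region, and the function name are assumptions, and the addresses come from ranges reserved for documentation. How the client's region is determined (usually from the resolver's IP) is outside this sketch.

```python
# Hypothetical regional endpoints using documentation-only IP ranges.
REGION_IPS = {
    "US": "192.0.2.10",
    "EU": "198.51.100.10",
}
DEFAULT_REGION = "US"


def resolve(client_region):
    """Return the IP a geo-aware DNS server might answer with for this client."""
    return REGION_IPS.get(client_region, REGION_IPS[DEFAULT_REGION])


print(resolve("EU"))    # 198.51.100.10
print(resolve("APAC"))  # 192.0.2.10 (falls back to the default region)
```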

Anycast

The same IP is announced from multiple locations, and the network routes each client to the nearest one.

GeoDNS + Regional LBs

Global DNS → Regional LB → Local Servers

Popular Load Balancers

Software

Nginx: HTTP/TCP, widely used
HAProxy: High performance, feature-rich
Envoy: Modern, service mesh oriented
Traefik: Cloud-native, auto-discovery

Cloud

AWS: ALB (L7), NLB (L4), Global Accelerator
GCP: Cloud Load Balancing
Azure: Azure Load Balancer, Application Gateway

Best Practices

Use health checks: Detect and remove unhealthy servers
Choose the right algorithm: Match your traffic pattern
Plan for LB failures: Active-passive or active-active
Monitor everything: Latency, error rates, server health
Consider geography: Use global load balancing for distributed users
Externalize state: Avoid sticky sessions when possible

Conclusion

Load balancing is fundamental to building scalable systems. Understanding the various algorithms and their trade-offs helps you make the right choice for your specific requirements.