Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler
Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler

Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler

A reverse proxy frequently serves as the critical entry point for modern web architectures, handling TLS termination, request routing, and load balancing. While simply alternating requests across two backends (Round Robin) is logically straightforward, it catastrophically fails under heterogeneous workloads. When service time varies, equalizing request counts exacerbates tail latency due to Head-of-Line (HoL) blocking. In this deep-dive, we mathematically dissect load balancing utilizing Queuing Theory, explore Nginx’s C source code for Smooth Weighted Round Robin, and mathematically prove the variance reduction of Consistent Hashing with virtual nodes via MurmurHash3.

1. The Queuing Theory of Load Balancing

Why does Round Robin fail? Let’s model our proxy-to-backend system as an $M/M/c$ queue (Poisson arrivals, Exponential service time, $c$ backend servers). Under Round Robin, the proxy blindly dispatches requests, effectively decoupling the system into $c$ independent $M/M/1$ queues.

According to Little’s Law and the Pollaczek-Khinchine formula, the waiting time in an $M/M/1$ queue is highly sensitive to the variance of service times ($sigma^2$). If one request requires a heavy database aggregation (high $sigma^2$), the specific $M/M/1$ queue it occupies becomes saturated, blocking all subsequent requests assigned to that backend, even if other backends are idle.

In contrast, a Least-Connections algorithm acts dynamically, approximating an $M/M/c$ global queue where requests are dispatched to the first available worker. The probability of delay $P(W > 0)$ in an $M/M/c$ queue is defined by Erlang’s C formula:

$$ C(c, lambda/mu) = frac{frac{(c rho)^c}{c!} frac{1}{1-rho}}{sum_{k=0}^{c-1} frac{(c rho)^k}{k!} + frac{(c rho)^c}{c!} frac{1}{1-rho}} $$

Mathematically, the $M/M/c$ model drastically reduces the variance of waiting times compared to $c$ disjoint $M/M/1$ queues, proving why dynamic active-load-aware routing is strictly superior to static Round Robin.


graph TD
    Proxy[Reverse Proxy / eBPF XDP]
    
    subgraph M/M/c Dynamic Queueing (Least Connections)
        Queue((Global Virtual Queue))
        B1[Backend Node 1]
        B2[Backend Node 2]
        B3[Backend Node 3]
        
        Proxy ==>|O(1) Dispatch| Queue
        Queue -->|Idle Worker Pull| B1
        Queue -->|Idle Worker Pull| B2
        Queue -->|Idle Worker Pull| B3
    end
    
    style Queue fill:#e6f3ff,stroke:#0066cc

2. Source Code Analysis: Nginx Smooth Weighted Round Robin (SWRR)

When weights are introduced (e.g., node A is 3x faster than node B), Nginx does not naively send 3 requests to A, then 1 to B (A, A, A, B). That would cause bursty micro-saturations. Instead, Nginx implemented the Smooth Weighted Round Robin (SWRR) algorithm, written in C inside ngx_http_upstream_module.c.


// Simplified Nginx SWRR core logic
// ngx_http_upstream_round_robin.c
ngx_http_upstream_rr_peer_t *peer, *best = NULL;
ngx_uint_t total = 0;

for (peer = peers->peer; peer; peer = peer->next) {
    if (peer->down || peer->max_fails <= peer->fails) {
        continue;
    }
    
    peer->current_weight += peer->effective_weight;
    total += peer->effective_weight;
    
    if (best == NULL || peer->current_weight > best->current_weight) {
        best = peer;
    }
}

if (best == NULL) { return NULL; }

best->current_weight -= total;
return best;

By constantly accumulating effective_weight and subtracting the total weight from the chosen peer, Nginx ensures a perfectly interleaved distribution (A, B, A, A), minimizing transient queue buildup on heavy nodes.

3. Consistent Hashing and Virtual Node Mathematical Distribution

When caching statefully, requests must route to the same backend based on a key (e.g., User ID). Standard modulo hashing ($H(k) pmod n$) collapses when a node dies, remapping nearly $100%$ of keys and causing a catastrophic cache stampede. Consistent Hashing maps nodes and keys onto a unit circle $[0, 2^{32}-1]$.

However, pure consistent hashing suffers from skewed load variance. If we map $N$ physical nodes, the expected load variance is high. To solve this, we introduce $V$ virtual nodes per physical node. Using MurmurHash3 (which provides excellent avalanche properties), we map $N times V$ virtual nodes onto the ring.

The standard deviation of load across nodes $sigma_{load}$ scales mathematically inversely with the square root of virtual nodes:

$$ sigma_{load} approx frac{1}{sqrt{V}} $$

In production architectures like Envoy’s Maglev or Ketama, $V$ is typically set between $100$ and $256$, ensuring that key distribution is uniform within a $1%$ error margin, completely eliminating hot-spots.

4. eBPF: Preemptive Health Checks at the Kernel Level

Layer 7 HTTP health checks are slow. Waiting for 3 timeouts of 5 seconds means a node stays “healthy” while dropping thousands of packets. High-performance proxies utilize eBPF to monitor kernel TCP metrics directly.

By tracing the tcp_drop kernel function or monitoring the TCP listen backlog queue, an eBPF daemon can detect a struggling backend microsecond-level precision. When the backlog queue depth $Q_d ge Q_{max}$, the eBPF program updates a BPF map. The load balancer reads this map and immediately drains traffic from the node, achieving 0-RTT eviction before a single HTTP 502 Bad Gateway is ever returned to the client.

5. Engineer’s Perspective: Real-World Catastrophes

The Active-Passive Split-Brain via VRRP: High-availability load balancers (HAProxy/Keepalived) use VRRP for Virtual IP failover. In a 10G mesh, a 50ms BGP reconvergence caused a VRRP partition. Both proxies assumed the Active role. We witnessed violent MAC address flapping in the Arista switches, dropping 50% of packets. The fix? BFD (Bidirectional Forwarding Detection) tied to BGP, and an external Raft-based distributed lock (etcd) for VIP fencing.

FAQ

Should a proxy retry every 5xx response?

Absolutely not. Automatic retries of non-idempotent methods (POST) can result in double-billing incidents. Even for GET requests, uncontrolled retries amplify traffic by a factor of $R$. If an upstream service is struggling due to database locks, multiplying the traffic via proxy retries will instantly trigger a cascading failure (Retry Storm). Always implement exponential backoff, jitter, and strict failure budgets.

References

Search questions

FAQ

Who is this article for?

This article is for readers who want a professional-level guide to Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler. It takes about 13 min and focuses on Reverse Proxy, Load Balancing, Health Checks, Python.

What should I read next?

The recommended next step is Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness, so the article connects into a longer learning route instead of ending as an isolated note.

Does this article include runnable code or companion resources?

Yes. Use the run notes, resource cards, and download links on the page to reproduce the example or inspect the companion files.

How does this article fit into the larger site?

It is connected to the article context block, learning routes, resources, and project timeline so readers can move from concept to implementation.

Article context

Network Fundamentals

A reproducible route through DNS, TCP, TLS, HTTP/3, proxy tunnels, load balancing, and shared caches with code and figures.

Level: Professional Reading time: 13 min
  • Reverse Proxy
  • Load Balancing
  • Health Checks
  • Python
Other language version 反向代理负载均衡原理:队列、健康检查和可复现调度实验
Share summary Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler

Compare round robin and load-aware queue selection while reasoning about health checks and retry boundaries.

Download share card Open share center

Companion resources

Leave a Reply

Project timeline

Published posts

  1. DNS Resolution Explained: Build a TTL Cache and Packet Parser in Python A runnable DNS guide covering resolution paths, response headers, TTL cache latency, and deterministic Python/C experiments.
  2. CIDR, Longest Prefix Match, and MTU: Calculate IP Routing Step by Step Calculate CIDR ranges, longest-prefix route choice, and MTU/MSS payload segmentation with runnable Python and C examples.
  3. TCP Reliability and Congestion Window: A Runnable Sequence Number Experiment Track TCP sequence numbers, cumulative ACKs, loss, retransmission, and congestion-window changes with safe local experiments.
  4. HTTPS and TLS 1.3 Handshake: Keys, Certificates, and RTT in Practice Understand TLS 1.3 message flights, certificate authentication, ephemeral key agreement, and handshake latency with a safe teaching model.
  5. HTTP/2, HTTP/3, and CDN Caching: Read Page Speed from a Waterfall A deterministic browser-waterfall model for HTTP/2, HTTP/3, QUIC streams, and CDN cache hits or misses.
  6. Forward Proxy vs Reverse Proxy: Connection Paths, Trust Boundaries, and Latency A reproducible guide to forward proxies, reverse proxies, tunnels, TLS boundaries, and latency segments.
  7. HTTP CONNECT and HTTPS Proxy Tunnels: TLS Boundaries and Handshake Latency An RFC-based explanation of CONNECT tunnels, encrypted HTTPS payloads, and modeled first-request latency.
  8. SOCKS5 Proxy Explained: Protocol Bytes, DNS Resolution Boundaries, and Leakage Risk Decode safe SOCKS5 CONNECT bytes and compare local-DNS and proxy-side hostname resolution boundaries.
  9. Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler Compare round robin and load-aware queue selection while reasoning about health checks and retry boundaries.
  10. Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness Use an RFC 9111 shared-cache model to calculate MISS, HIT, and 304 revalidation latency and correctness boundaries.

Published resources

  1. Network Fundamentals Lab README Setup, no-privilege safety boundary, ten Python experiments, and three C examples.
  2. Network fundamentals full lab bundle Bundles Python/C source, fixed scenarios, ten result CSVs, and protocol/proxy figures.
  3. DNS TTL results CSV HIT/MISS state, expiry, and latency for four fixed lookups.
  4. CIDR and MTU results CSV Longest-prefix route and 3600-byte payload segmentation results.
  5. TCP cwnd events CSV Per-round ACK, window, and deterministic retransmission events.
  6. TLS 1.3 flight results CSV Message direction, timing, and teaching shared value in a fixed RTT model.
  7. HTTP/CDN waterfall results CSV Phase timing for HTTP/2 and HTTP/3 in cold and warm cache models.
  8. Proxy path latency results CSV Phase timing for direct access, forward-proxy tunneling, and reverse-proxy cache paths.
  9. CONNECT/TLS timeline CSV Records CONNECT authority, tunnel establishment, and the encrypted HTTPS-request boundary.
  10. SOCKS5 DNS boundary CSV Stores ATYP, destination bytes, request length, and modeled local DNS counts.
  11. Proxy load-balancing queue CSV Compares backend selection and queue waiting for round robin and least queue.
  12. Proxy cache revalidation CSV Records MISS, HIT, 304 revalidation, object age, and response latency.
  13. Network request path visualizer Adjust TTL, prefixes, loss, handshake RTT, and cache paths in the browser.
  14. Network fundamentals topic share card A 1200x630 SVG card for the DNS, TLS, HTTP/3, proxy tunnel, and caching topic hub.

Next notes

  1. Add IPv6 and QUIC observation notes
  2. Review caching and protocol benefits with real-user metrics
Scroll down