HTTP/2, HTTP/3, and CDN Caching: Read Page Speed from a Waterfall

A fast web application shows up as a short browser network waterfall: rapid DNS resolution, 0-RTT connection resumption, immediate cache hits, and no head-of-line blocking. Simply enabling an HTTP/3 toggle or putting a CDN in front of an origin server rarely produces these results on its own. HTTP/3 (QUIC) and CDN caching are distinct optimizations: one reduces transport-layer round trips and stalls, the other shortens the distance between the client and the content.

In this deep dive, we rip apart the mechanics of HTTP/3 QUIC connection IDs, mathematical models for Cache Hit Ratio (CHR) eviction algorithms, UDP Segmentation Offload (USO/GSO) in the kernel, and the exact architectural pitfalls behind scaling BGP Anycast cache tiers.

1. Protocol Streams and QUIC Kernel Mechanics

HTTP/2 was a major step forward because it multiplexes streams over a single TCP connection, but it remains exposed to TCP-level Head-of-Line (HoL) blocking. Because TCP delivers a strict in-order byte stream, if packet 4 out of 10 is lost, packets 5-10 must wait in the kernel’s receive buffer until packet 4 is retransmitted. Every multiplexed application stream stalls, including streams whose own data arrived intact.

HTTP/3 operates on QUIC (RFC 9000), a transport protocol built atop UDP. QUIC natively understands independent streams. If a packet carrying data for Stream A drops, Stream B continues processing. Furthermore, QUIC introduces Connection IDs (CIDs), allowing clients to roam across IP addresses (e.g., switching from Wi-Fi to Cellular) without breaking the connection.
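
The difference between connection-wide and per-stream ordering can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the function name is hypothetical, each packet arrives at a time equal to its sequence number, and a lost packet arrives after a fixed retransmission delay. It is not how any real TCP or QUIC stack is implemented.

```python
def delivery_times(packets, lost_seq, retransmit_delay, per_stream):
    """Return {seq: time at which the application can read that packet}.

    packets: list of (seq, stream_id) in send order. Packet `seq` arrives at
    time `seq`, except the lost one, which arrives retransmit_delay later.
    per_stream=False models TCP: one ordered byte stream for the connection.
    per_stream=True models QUIC: ordering is enforced only within a stream.
    """
    arrival = {
        seq: (seq + retransmit_delay if seq == lost_seq else seq)
        for seq, _ in packets
    }
    ready = {}
    latest = {}  # ordering domain -> latest delivery time so far
    for seq, stream in packets:
        domain = stream if per_stream else "connection"
        # A packet is readable only after everything before it in its domain.
        t = max(arrival[seq], latest.get(domain, 0))
        latest[domain] = t
        ready[seq] = t
    return ready


packets = [(1, "A"), (2, "B"), (3, "A"), (4, "B"), (5, "A"), (6, "B")]
tcp = delivery_times(packets, lost_seq=3, retransmit_delay=10, per_stream=False)
quic = delivery_times(packets, lost_seq=3, retransmit_delay=10, per_stream=True)
print("TCP :", tcp)   # packets 4-6 all wait for the retransmission of packet 3
print("QUIC:", quic)  # only stream A (packets 3 and 5) waits
```

In this model, losing packet 3 delays every later packet under TCP-style ordering, while under QUIC-style ordering stream B is delivered on time.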

At the kernel level, serving HTTP/3 efficiently means working around the per-datagram cost of the UDP send path, which has historically received far less optimization than TCP. High-performance HTTP/3 servers (for example, nginx or servers built on Cloudflare’s quiche library) rely on the UDP_SEGMENT socket option (UDP GSO, Generic Segmentation Offload) to hand the kernel one large buffer in a single syscall and have it split into many UDP datagrams.

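/* Abridged sketch of the GSO branch in udp_send_skb() (net/ipv4/udp.c).
 * Not verbatim: error handling is omitted and details vary by kernel version. */
static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
                        struct inet_cork *cork)
{
    int datalen = skb->len - skb_transport_offset(skb) - sizeof(struct udphdr);

    /* ... UDP header construction omitted ... */
    if (cork->gso_size && datalen > cork->gso_size) {
        /* Mark the large buffer so it is later split into gso_size-byte datagrams */
        skb_shinfo(skb)->gso_size = cork->gso_size;
        skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
        skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen, cork->gso_size);
    }
    /* ... checksum setup omitted ... */
    return ip_send_skb(sock_net(skb->sk), skb);
}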

By batching QUIC packets via UDP GSO, servers substantially cut the per-packet syscall and network-stack traversal cost, which narrows the CPU gap between QUIC and heavily optimized, hardware-offloaded TCP.
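
The effect of batching can be approximated with simple arithmetic. The sketch below is illustrative only: the helper names, the 1,200-byte packet size, and the batch size of 16 are assumptions, not measurements or kernel limits.

```python
import math


def send_syscalls(num_packets, packets_per_call=1):
    """Number of send syscalls needed to transmit num_packets datagrams."""
    return math.ceil(num_packets / packets_per_call)


def gso_buffer_layout(payload_len, gso_size):
    """Sizes of the datagrams cut from one GSO buffer.

    Every datagram is gso_size bytes except possibly a shorter final one.
    """
    full, tail = divmod(payload_len, gso_size)
    return [gso_size] * full + ([tail] if tail else [])


# Assumed workload: 1,000 QUIC packets, 16 packets per GSO buffer.
print(send_syscalls(1000))                       # one syscall per datagram
print(send_syscalls(1000, packets_per_call=16))  # batched via GSO
print(gso_buffer_layout(5000, 1200))             # how one buffer is segmented
```

The syscall count is only one component of the cost, so the real CPU saving depends on the workload and the kernel version.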

Visualizing QUIC Handshakes and Cache Tiers

The sequence diagram below outlines a request to a BGP Anycast CDN. The client resumes a QUIC session with 0-RTT early data, and the edge then either serves the response from its cache or forwards the request to the origin.


sequenceDiagram
    participant Client
    participant Edge as Anycast Edge (PoP)
    participant Origin as Origin Backend
    
    rect rgb(240, 248, 255)
    Note over Client, Edge: QUIC 0-RTT (Early Data) via Session Ticket
    Client->>Edge: Initial + 0-RTT packet [GET /api/data]
    Edge-->>Client: Handshake Done (1-RTT Confirmed)
    end
    
    alt Cache HIT (Tier 1 Edge RAM)
        Edge-->>Client: 200 OK (Stream 0)
        Note over Client, Edge: Origin bypassed. RTT minimal.
    else Cache MISS (fetch from origin)
        Edge->>Origin: Forward via Internal Backbone (H2/H3)
        Origin-->>Edge: 200 OK
        Edge->>Edge: Write to NVMe Cache Tier
        Edge-->>Client: 200 OK (Stream 0)
    end

2. The Mathematics of Cache Hit Ratios (CHR)

CDN performance is largely determined by the Cache Hit Ratio (CHR). Serving 99% of traffic from the edge depends on request popularity being heavily skewed, which is commonly modeled with Zipf’s Law: the request frequency of an asset is inversely proportional to a power of its popularity rank. The probability \( P_k \) of requesting the \( k \)-th most popular object out of \( N \) total objects is:

$$ P_k = \frac{1/k^s}{\sum_{i=1}^{N} 1/i^s} $$

For web assets, \( s \) is typically near 1. To maximize CHR, CDNs use admission and eviction policies beyond simple LRU (Least Recently Used), such as W-TinyLFU (Window Tiny Least Frequently Used). Separately, distributed caches use consistent hashing or rendezvous hashing to map each request to a specific backend cache shard, so that an object is stored once rather than duplicated across shards.
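
The Zipf formula gives a quick upper bound on CHR. The sketch below assumes an idealized cache that always holds the most popular objects and requests that are drawn independently from the Zipf distribution; the function names and the catalog size of 100,000 objects are illustrative assumptions. Real eviction policies fall below this bound.

```python
def zipf_probabilities(n, s=1.0):
    """Return [P_1, ..., P_n] for a Zipf distribution with exponent s."""
    weights = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(n, cache_size, s=1.0):
    """Hit ratio if the cache always holds the cache_size most popular objects."""
    probs = zipf_probabilities(n, s)
    return sum(probs[:cache_size])


catalog = 100_000  # assumed number of distinct objects
for fraction in (0.01, 0.10, 0.50):
    size = int(catalog * fraction)
    chr_value = ideal_hit_ratio(catalog, size)
    print(f"cache holds {fraction:.0%} of catalog -> ideal CHR {chr_value:.3f}")
```

Because popularity is so skewed, caching a small fraction of the catalog already captures a large share of requests in this model.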

If we model the latency with a fixed network RTT of 36 ms, a DNS lookup of 42 ms uncached or 1 ms cached, an origin path of 62 ms, and an edge cache lookup of 4 ms:

H2 MISS = DNS 42 + TCP+TLS Connect 72 (2 RTT) + Origin Path 62 = 176 ms
H3 MISS = DNS 42 + QUIC Connect 36 (1 RTT) + Origin Path 62 = 140 ms
H3 0-RTT HIT = DNS 1 + Connect 0 (Early Data) + Edge Lookup 4 = 5 ms

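Moving from a cold HTTP/2 MISS to a warm HTTP/3 0-RTT HIT removes 171 ms of waiting in this model (176 ms - 5 ms). The HIT figure counts only DNS, connection setup, and edge lookup; the request itself still needs a round trip to the edge and back.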

3. Deep Debugging: qlog and stale-while-revalidate

To debug QUIC internals, traditional tcpdump captures are not enough on their own, because QUIC encrypts almost all transport-level information: packet numbers are header-protected, and ACKs, flow-control updates, and stream data travel inside the encrypted payload. Engineers instead rely on qlog, a JSON-based structured logging format emitted by many QUIC libraries, which enables detailed analysis of loss, HoL blocking events, and stream flow control.

# Enable qlog on a supported server (e.g., quiche)
export QLOGDIR=/var/log/quic/
# Load the resulting .qlog file into qvis (QUIC Visualizer)

On the caching side, the most dangerous race condition is the Cache Stampede (Thundering Herd). When a highly requested JSON payload expires, thousands of concurrent requests miss the cache and hammer the origin server simultaneously. This is mitigated using stale-while-revalidate in the Cache-Control header:

# The CDN will serve stale data to users while asynchronously fetching an update
Cache-Control: max-age=60, stale-while-revalidate=86400

# Verify Cache Status headers via cURL
curl -I --http3 https://example.com/api/data
# HTTP/3 200
# cf-cache-status: STALE
# alt-svc: h3=":443"; ma=86400
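
The decision a shared cache makes for the header above can be sketched as a small state function. This is a simplified illustration with hypothetical function names; it handles only max-age and stale-while-revalidate and ignores other directives such as s-maxage, no-cache, and must-revalidate.

```python
import re


def parse_cache_control(value):
    """Parse 'max-age=60, stale-while-revalidate=86400' into {name: int}."""
    directives = {}
    for part in value.split(","):
        match = re.fullmatch(r"\s*([\w-]+)\s*=\s*(\d+)\s*", part)
        if match:
            directives[match.group(1).lower()] = int(match.group(2))
    return directives


def cache_decision(age_seconds, cache_control):
    """Classify a cached response by its age in seconds."""
    directives = parse_cache_control(cache_control)
    max_age = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    if age_seconds < max_age:
        return "fresh: serve from cache"
    if age_seconds < max_age + swr:
        return "stale: serve from cache, revalidate in background"
    return "expired: fetch from origin before responding"


header = "max-age=60, stale-while-revalidate=86400"
for age in (30, 61, 90_000):
    print(age, "->", cache_decision(age, header))
```

During the stale window only the background revalidation reaches the origin, which is what flattens the stampede.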

4. Production Architecture Post-Mortem

BGP Anycast and UDP Path Asymmetry

When migrating a massive mobile application API to HTTP/3, we encountered an agonizing issue: QUIC connections were randomly stalling. The root cause lay in our BGP Anycast routing architecture interacting with UDP path asymmetry.

In BGP Anycast, multiple edge locations announce the same IP prefix. Mobile networks frequently change internal IP paths. With TCP, asymmetric routing is usually fine because stateful firewalls track the connection. With UDP (QUIC), the carrier-grade NAT (CGNAT) gateways and stateful edge load-balancers failed to map the returning UDP packets back to the original connection state. Furthermore, a BGP flap would route the client’s UDP packets to a different physical PoP (Point of Presence).

To resolve this, we implemented stateless QUIC connection ID routing at our Layer 4 edge load balancers. Instead of hashing on the 5-tuple (source IP, source port, destination IP, destination port, protocol), our eBPF XDP programs extracted the QUIC Destination Connection ID from the UDP payload and routed the packet consistently to the same backend server, regardless of how many times the client’s IP changed or which PoP received the traffic.
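
The routing idea can be sketched in Python. This is not the eBPF XDP code described above, only an illustration of the parsing and hashing steps; the function names, the backend names, and the fixed 8-byte connection ID length are assumptions. Short-header packets do not encode the connection ID length, so the load balancer has to know it in advance. The sketch also ignores that a client's first Initial packet carries a client-chosen Destination Connection ID.

```python
import hashlib

SHORT_HEADER_CID_LEN = 8  # assumption: the server fleet issues 8-byte CIDs


def extract_dcid(datagram: bytes) -> bytes:
    """Return the Destination Connection ID of the first QUIC packet."""
    if not datagram:
        raise ValueError("empty datagram")
    if datagram[0] & 0x80:
        # Long header: flags (1 byte), version (4), DCID length (1), DCID.
        if len(datagram) < 6:
            raise ValueError("truncated long header")
        dcid_len = datagram[5]
        dcid = datagram[6:6 + dcid_len]
        if len(dcid) != dcid_len:
            raise ValueError("truncated connection ID")
        return dcid
    # Short header: flags (1 byte), then the DCID with no length field.
    dcid = datagram[1:1 + SHORT_HEADER_CID_LEN]
    if len(dcid) != SHORT_HEADER_CID_LEN:
        raise ValueError("truncated connection ID")
    return dcid


def pick_backend(dcid: bytes, backends):
    """Rendezvous (highest-random-weight) hashing keyed on the connection ID."""
    return max(backends, key=lambda b: hashlib.sha256(dcid + b.encode()).digest())


cid = bytes.fromhex("0102030405060708")
long_pkt = bytes([0xC0]) + b"\x00\x00\x00\x01" + bytes([len(cid)]) + cid + b"rest"
short_pkt = bytes([0x40]) + cid + b"payload"
backends = ["backend-a", "backend-b", "backend-c"]  # hypothetical names
print(pick_backend(extract_dcid(long_pkt), backends))
print(pick_backend(extract_dcid(short_pkt), backends))
```

Both packets map to the same backend because the choice depends only on the connection ID, not on the source address or port.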

5. Automated Modeling Script

We use Python to deterministically simulate the latency reductions and calculate the Zipf distribution impact on the CDN tier.

# Shell: run the full model
python src/http_cdn_waterfall.py

# Python: simplified excerpt of the latency model (figures from section 2, in ms)
RTT_MS = 36
total_h2_miss = 42 + (RTT_MS * 2) + 36 + 26  # DNS miss + TCP+TLS (2 RTT) + origin path (62)
total_h3_0rtt = 1 + 0 + 4                    # DNS hit + 0-RTT early data + edge lookup
modeled_saving = total_h2_miss - total_h3_0rtt

print(f"Total time saved migrating from H2 MISS to H3 0-RTT: {modeled_saving} ms")
# Total time saved migrating from H2 MISS to H3 0-RTT: 171 ms

Full datasets of these architectural simulations are stored in http-cdn-waterfall-results.csv.

6. Animated Walkthrough

Animation displaying QUIC 0-RTT bypassing standard 3-way handshakes, paired with edge CDN hits completely isolating the origin database from traffic spikes.

7. Engineering Heuristics & Anti-Patterns

  • Vary Header Misconfigurations: Setting Vary: User-Agent destroys your Cache Hit Ratio. The CDN will cache a different copy for every minor browser version string. Use Vary: Accept-Encoding instead.
  • Ignoring UDP Rate Limits: Many enterprise firewalls treat UDP 443 traffic as suspicious peer-to-peer or DDoS traffic and throttle or block it. Keep HTTP/2 over TCP available on the same origin: clients discover HTTP/3 through the Alt-Svc header and fall back to TCP when UDP does not get through.
  • CPU Saturation on QUIC: Migrating to HTTP/3 can require roughly 2x-3x more CPU at the load balancer than TCP, because QUIC packet processing and cryptography run in user space and hardware offloads are less mature. Measure on your own traffic and provision your proxy fleet accordingly.
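
The Vary problem from the first bullet can be shown with a toy cache key. The sketch below is a simplification with a hypothetical `cache_key` helper and made-up User-Agent strings; real CDNs normalize headers in implementation-specific ways.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    normalized = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, normalized.get(name, "")) for name in names)


# Three requests that differ only in a minor browser version (made-up values).
requests = [
    {"User-Agent": "Browser/120.0.1", "Accept-Encoding": "gzip, br"},
    {"User-Agent": "Browser/120.0.2", "Accept-Encoding": "gzip, br"},
    {"User-Agent": "Browser/121.0.0", "Accept-Encoding": "gzip, br"},
]

variants = {}
for vary in ("User-Agent", "Accept-Encoding"):
    keys = {cache_key("/app.js", headers, vary) for headers in requests}
    variants[vary] = len(keys)
    print(f"Vary: {vary} -> {len(keys)} cached variant(s)")
```

With Vary: User-Agent each browser version gets its own cache entry and its own first-request MISS, while Vary: Accept-Encoding lets all three requests share one entry.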

FAQ

Why did HTTP/3 abandon TCP entirely?

You cannot remove TCP-level HoL blocking without fundamentally changing TCP's in-order delivery semantics, header format, and state machine. Such changes would take decades to deploy because a vast installed base of middleboxes (NATs, firewalls, ISP equipment) hardcodes assumptions about the existing TCP header. QUIC runs over UDP and encrypts most of its transport state, so middleboxes cannot inspect or rewrite it and the protocol can keep evolving.

Does a CDN HIT guarantee a fast Largest Contentful Paint (LCP)?

No. A cache hit improves Time to First Byte (TTFB), not rendering. If the browser receives the HTML almost instantly but must then download, parse, and execute 4 MB of render-blocking JavaScript, your LCP will still be poor.

By mastering routing, TCP reliability, and finally HTTP/3 caching, you now possess the complete end-to-end framework necessary to architect ultra-low-latency global networks.
