English
HTTP/2, HTTP/3, and CDN Caching: Read Page Speed from a Waterfall
A fast web application ultimately manifests as a truncated browser network waterfall: rapid DNS resolution, zero-RTT connection establishment, immediate cache hits, and zero head-of-line blocking. Simply enabling an HTTP/3 toggle or throwing a CDN in front of an origin server rarely produces these results without architectural precision. HTTP/3 (QUIC) and CDN caching are fundamentally distinct optimization vectors—one solves transport-layer physics, the other solves geographic constraints.
In this deep dive, we rip apart the mechanics of HTTP/3 QUIC connection IDs, mathematical models for Cache Hit Ratio (CHR) eviction algorithms, UDP Segmentation Offload (USO/GSO) in the kernel, and the exact architectural pitfalls behind scaling BGP Anycast cache tiers.
1. Protocol Streams and QUIC Kernel Mechanics
HTTP/2 represented a massive leap by multiplexing streams over a single TCP connection, but it exposed a serious flaw: TCP-level Head-of-Line (HoL) blocking. Because TCP delivers a strictly ordered byte stream, if packet 4 out of 10 is lost, packets 5-10 must wait in the kernel's receive buffer until packet 4 is retransmitted. In the meantime, every multiplexed application stream stalls.
HTTP/3 operates on QUIC (RFC 9000), a transport protocol built atop UDP. QUIC natively understands independent streams. If a packet carrying data for Stream A drops, Stream B continues processing. Furthermore, QUIC introduces Connection IDs (CIDs), allowing clients to roam across IP addresses (e.g., switching from Wi-Fi to Cellular) without breaking the connection.
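The difference between the two delivery models can be sketched in a few lines of Python. This is a toy illustration, not a protocol implementation: the stream names, the packet list, and the helper functions `deliverable_tcp` and `deliverable_quic` are all hypothetical, and retransmission is ignored.

```python
def deliverable_tcp(packets, lost):
    """Single ordered byte stream: nothing after the first gap is delivered."""
    delivered = []
    for seq, stream in packets:
        if seq in lost:
            break  # every later packet waits for this retransmission
        delivered.append((seq, stream))
    return delivered


def deliverable_quic(packets, lost):
    """Per-stream ordering: a loss only stalls later data on the same stream."""
    blocked_streams = set()
    delivered = []
    for seq, stream in packets:
        if seq in lost:
            blocked_streams.add(stream)
        elif stream not in blocked_streams:
            delivered.append((seq, stream))
    return delivered


# Ten packets alternating between two hypothetical streams, A and B.
packets = [(seq, "A" if seq % 2 else "B") for seq in range(1, 11)]
lost = {4}  # packet 4 (stream B) is dropped

tcp_delivered = deliverable_tcp(packets, lost)
quic_delivered = deliverable_quic(packets, lost)
print("TCP :", tcp_delivered)
print("QUIC:", quic_delivered)
```

In this toy run the TCP model stops after packet 3, while the QUIC model keeps delivering stream A and holds back only stream B.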
At the kernel level, serving HTTP/3 efficiently requires working around UDP's historically poor send performance. High-performance HTTP/3 servers (such as Nginx with quiche, or Cloudflare's Pingora) rely heavily on UDP_SEGMENT (UDP GSO, Generic Segmentation Offload) to hand the kernel many datagrams' worth of payload in a single syscall.
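/* Simplified sketch of the UDP GSO branch in linux/net/ipv4/udp.c (not verbatim) */
static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
                        struct inet_cork *cork)
{
    struct sock *sk = skb->sk;

    /* UDP_SEGMENT set a segment size: mark the skb so that one large
     * buffer is split into gso_size-byte datagrams lower in the stack
     * or by the NIC. */
    if (cork->gso_size) {
        skb_shinfo(skb)->gso_size = cork->gso_size;
        skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
    }
    /* ... UDP header and checksum setup elided ... */

    /* Both the GSO and the single-datagram path hand the skb to IP. */
    return ip_send_skb(sock_net(sk), skb);
}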
By batching QUIC packets via UDP GSO, servers can reduce per-packet syscall and network-stack overhead by up to an order of magnitude, making QUIC far more competitive computationally with heavily optimized, hardware-offloaded TCP.
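A rough way to see where the saving comes from is to count send syscalls for one response with and without batching. The sketch below is back-of-the-envelope arithmetic only: the 1200-byte datagram size, the batch of 40 segments per call, and the helper `send_calls` are assumptions chosen for illustration, not values taken from any particular server.

```python
import math


def send_calls(total_bytes, segment_size, segments_per_call):
    """Return (datagrams, syscalls) needed to send total_bytes."""
    datagrams = math.ceil(total_bytes / segment_size)
    return datagrams, math.ceil(datagrams / segments_per_call)


RESPONSE_BYTES = 1_000_000  # hypothetical 1 MB response
SEGMENT_SIZE = 1200         # assumed QUIC datagram payload size
GSO_BATCH = 40              # assumed segments handed over per UDP_SEGMENT send

datagrams, plain_calls = send_calls(RESPONSE_BYTES, SEGMENT_SIZE, 1)
_, gso_calls = send_calls(RESPONSE_BYTES, SEGMENT_SIZE, GSO_BATCH)

print(f"datagrams: {datagrams}")
print(f"syscalls without GSO: {plain_calls}")
print(f"syscalls with GSO:    {gso_calls}")
```

Under these assumptions the number of kernel crossings drops from one per datagram to one per batch; the real gain depends on the batch size the server can actually fill.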
Visualizing QUIC Handshakes and Cache Tiers
Below is an architectural diagram outlining a BGP Anycast CDN. The client executes a 0-RTT QUIC handshake, followed by edge caching evaluated via Consistent Hashing.
sequenceDiagram
    participant Client
    participant Edge as Anycast Edge (PoP)
    participant Origin as Origin Backend
    rect rgb(240, 248, 255)
        Note over Client, Edge: QUIC 0-RTT (Early Data) via Session Ticket
        Client->>Edge: Initial (CRYPTO) + 0-RTT packet [GET /api/data]
        Edge-->>Client: Handshake Done (1-RTT Confirmed)
    end
    alt Cache HIT (Tier 1 Edge RAM)
        Edge-->>Client: 200 OK (Stream 0)
        Note over Client, Edge: Origin bypassed. RTT minimal.
    else Cache MISS (object not in edge cache)
        Edge->>Origin: Forward via Internal Backbone (H2/H3)
        Origin-->>Edge: 200 OK
        Edge->>Edge: Write to NVMe Cache Tier
        Edge-->>Client: 200 OK (Stream 0)
    end
2. The Mathematics of Cache Hit Ratios (CHR)
CDN performance is governed mathematically by the Cache Hit Ratio (CHR). Serving 99% of traffic from the edge relies on understanding Zipf's Law, which states that the frequency of an asset request is inversely proportional to its popularity rank. The probability \( P_k \) of requesting the \( k \)-th most popular object out of \( N \) total objects is:
$$ P_k = \frac{1/k^s}{\sum_{i=1}^{N} (1/i^s)} $$
For web assets, \( s \) is typically near 1. To maximize CHR, CDNs use eviction algorithms more advanced than simple LRU (Least Recently Used), such as W-TinyLFU (Window Tiny Least Frequently Used). Separately, modern distributed caches use consistent hashing or rendezvous hashing to map each request to a specific backend cache shard.
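To get a feel for what the formula implies, the following sketch computes the best-case hit ratio of a cache that always holds the most popular objects. It is a simplified model under stated assumptions: requests are independent draws from the Zipf distribution, every object has the same size, and the catalog size `N` and the cache sizes are made-up numbers. Real eviction policies such as LRU or W-TinyLFU land below this bound.

```python
def zipf_probabilities(n, s=1.0):
    """P_k for k = 1..n, as in the formula above."""
    weights = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(n, cache_size, s=1.0):
    """Hit ratio if the cache holds exactly the cache_size most popular objects."""
    return sum(zipf_probabilities(n, s)[:cache_size])


N = 100_000  # hypothetical catalog size
for cache_size in (100, 1_000, 10_000):
    ratio = ideal_hit_ratio(N, cache_size)
    print(f"cache holds top {cache_size:>6} objects -> ideal CHR {ratio:.3f}")
```

The takeaway is that, with \( s \) near 1, a cache holding a small fraction of the catalog can already serve a large share of requests, but pushing toward 99% requires a disproportionately larger cache.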
If we model the latency with network RTT fixed at 36 ms, a DNS miss at 42 ms, a DNS cache hit at 1 ms, an edge cache lookup at 4 ms, and an origin path of 62 ms (36 + 26 in the modeling script):
H2 MISS = DNS 42 + TCP+TLS Connect 72 (2 RTT) + Origin Path 62 = 176 ms
H3 MISS = DNS 42 + QUIC Connect 36 (1 RTT) + Origin Path 62 = 140 ms
H3 0-RTT HIT = DNS 1 + Connect 0 (Early Data) + Edge Lookup 4 = 5 ms
In this model, moving from a cold HTTP/2 MISS to a warm HTTP/3 0-RTT HIT removes 176 - 5 = 171 ms of waiting time.
3. Deep Debugging: qlog and `stale-while-revalidate`
To debug QUIC internals, traditional tcpdump is insufficient because QUIC hides nearly all transport-level information: payloads are encrypted, and header protection masks the packet number. Engineers typically rely on qlog, a JSON-based logging format emitted by QUIC libraries, to analyze stream stalls, loss recovery, and stream flow control.
# Enable qlog on a supported server (e.g., quiche)
export QLOGDIR=/var/log/quic/
# Load the resulting .qlog file into qvis (QUIC Visualizer)
On the caching side, the most dangerous race condition is the Cache Stampede (Thundering Herd). When a highly requested JSON payload expires, thousands of concurrent requests miss the cache and hammer the origin server simultaneously. This is mitigated using stale-while-revalidate in the Cache-Control header:
# The CDN will serve stale data to users while asynchronously fetching an update
Cache-Control: max-age=60, stale-while-revalidate=86400
# Verify Cache Status headers via cURL
curl -I --http3 https://example.com/api/data
# HTTP/3 200
# cf-cache-status: STALE
# alt-svc: h3=":443"; ma=86400
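How a shared cache could interpret the header above can be sketched as a small decision function. This is a minimal illustration, not any CDN's actual logic: the state names and the helpers `parse_cache_control` and `cache_decision` are hypothetical, and only integer-valued directives are parsed.

```python
def parse_cache_control(value):
    """Parse 'max-age=60, stale-while-revalidate=86400' into a dict of ints."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.lower()] = int(arg) if arg.isdigit() else None
    return directives


def cache_decision(age_s, max_age_s, swr_s):
    """Classify a cached object by its age in seconds."""
    if age_s < max_age_s:
        return "FRESH"             # serve from cache, no origin traffic
    if age_s < max_age_s + swr_s:
        return "STALE_REVALIDATE"  # serve stale now, refresh in the background
    return "EXPIRED"               # must fetch from origin before responding


cc = parse_cache_control("max-age=60, stale-while-revalidate=86400")
for age in (30, 61, 90_000):
    state = cache_decision(age, cc["max-age"], cc["stale-while-revalidate"])
    print(f"age={age:>6}s -> {state}")
```

The stampede protection comes from the middle state: requests arriving just after expiry are answered from the stale copy, so the origin sees a background refresh rather than every concurrent miss.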
4. Production Architecture Post-Mortem
BGP Anycast and UDP Path Asymmetry
When migrating a massive mobile application API to HTTP/3, we encountered an agonizing issue: QUIC connections were randomly stalling. The root cause lay in our BGP Anycast routing architecture interacting with UDP path asymmetry.
In BGP Anycast, multiple edge locations announce the same IP prefix. Mobile networks frequently change internal IP paths. With TCP, asymmetric routing is usually fine because stateful firewalls track the connection. With UDP (QUIC), the carrier-grade NAT (CGNAT) gateways and stateful edge load balancers failed to map the returning UDP packets back to the original connection state. Furthermore, a BGP flap would route the client's UDP packets to a different physical PoP (Point of Presence).
To resolve this, we had to implement stateless QUIC connection ID routing at our edge load balancers (Layer 4). Instead of hashing on the 5-tuple (source and destination IP and port, plus protocol), our eBPF XDP programs extracted the QUIC Destination Connection ID, which travels unencrypted in the packet header, from the UDP payload and routed the packet consistently to the same backend server, regardless of how many times the client's IP changed or which PoP received the traffic.
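The routing decision itself can be sketched in Python, although the production version described above ran as eBPF/XDP. The sketch assumes the header layout from the QUIC invariants (a long header carries an explicit Destination Connection ID length, a short header does not) and a fleet that always issues 8-byte connection IDs. The backend names, `SERVER_CID_LEN`, and the use of rendezvous hashing for backend selection are illustrative assumptions, not the original implementation. It also ignores the client's first Initial packet, whose Destination Connection ID is chosen by the client.

```python
import hashlib

SERVER_CID_LEN = 8  # assumption: our servers always issue 8-byte connection IDs


def extract_dcid(datagram: bytes) -> bytes:
    """Pull the Destination Connection ID out of a QUIC packet header."""
    first = datagram[0]
    if first & 0x80:
        # Long header: flags(1) | version(4) | dcid_len(1) | dcid | ...
        dcid_len = datagram[5]
        return datagram[6:6 + dcid_len]
    # Short header: flags(1) | dcid | ... (length known only to the server side)
    return datagram[1:1 + SERVER_CID_LEN]


def pick_backend(dcid: bytes, backends):
    """Rendezvous hashing: the backend with the highest score for this CID wins."""
    def score(backend):
        return hashlib.sha256(backend.encode() + dcid).digest()
    return max(backends, key=score)


BACKENDS = ["quic-backend-1", "quic-backend-2", "quic-backend-3"]  # hypothetical
dcid = bytes(range(SERVER_CID_LEN))

# The same connection seen as a long-header and as a short-header packet.
long_pkt = bytes([0xC0]) + (1).to_bytes(4, "big") + bytes([len(dcid)]) + dcid + bytes([0]) + b"payload"
short_pkt = bytes([0x40]) + dcid + b"payload"

print(pick_backend(extract_dcid(long_pkt), BACKENDS))
print(pick_backend(extract_dcid(short_pkt), BACKENDS))
```

Because the choice depends only on the connection ID and the backend list, any load balancer in any PoP reaches the same answer without sharing per-connection state, and the client's source address never enters the calculation.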
5. Automated Modeling Script
We use Python to deterministically simulate the latency reductions and calculate the Zipf distribution impact on the CDN tier.
python src/http_cdn_waterfall.py
# Simulates QUIC handshake combinations against CDN cache tiers
# Simplified latency model (all values in ms)
RTT_MS = 36
# DNS miss + TCP+TLS connect (2 RTT) + origin path (36 + 26 = 62)
total_h2_miss = 42 + (RTT_MS * 2) + 36 + 26
# DNS hit + 0-RTT early data (no connect wait) + edge cache lookup
total_h3_0rtt = 1 + 0 + 4
modeled_saving = total_h2_miss - total_h3_0rtt
print(f"Total time saved migrating from H2 MISS to H3 0-RTT: {modeled_saving} ms")
Full datasets of these architectural simulations are stored in http-cdn-waterfall-results.csv.
6. Animated Walkthrough
7. Engineering Heuristics & Anti-Patterns
- Vary Header Misconfigurations: Setting `Vary: User-Agent` destroys your Cache Hit Ratio. The CDN will cache a separate copy for every distinct User-Agent string, including minor browser version differences. Use `Vary: Accept-Encoding` instead.
- Ignoring UDP Rate Limits: Many enterprise firewalls treat UDP 443 traffic as suspicious peer-to-peer or DDoS traffic and throttle or block it. Advertise HTTP/3 through the `Alt-Svc` header while keeping HTTP/2 over TCP available, so clients can fall back seamlessly.
- CPU Saturation on QUIC: Migrating to HTTP/3 can require 2x-3x more CPU at the load balancer compared to TCP, due to user-space cryptography and the lack of mature hardware offloads. Provision your proxy fleet accordingly.
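The effect of the `Vary` header on cache fragmentation can be shown with a toy cache-key function. This is a simplified sketch: the `cache_key` helper, the URL, and the synthetic request headers are hypothetical, and real CDNs normalize headers in ways this model does not capture.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus every request header named in Vary."""
    varied = tuple((h.lower(), request_headers.get(h.lower(), "")) for h in vary)
    return (url, varied)


# 50 hypothetical browser versions, each requesting with gzip or br.
requests = [
    {"user-agent": f"Browser/{version}", "accept-encoding": encoding}
    for version in range(50)
    for encoding in ("gzip", "br")
]

keys_user_agent = {cache_key("/app.js", r, ["User-Agent"]) for r in requests}
keys_encoding = {cache_key("/app.js", r, ["Accept-Encoding"]) for r in requests}

print("Vary: User-Agent      ->", len(keys_user_agent), "cached copies")
print("Vary: Accept-Encoding ->", len(keys_encoding), "cached copies")
```

Every extra copy is a separate object that has to miss once before it can hit, which is why a high-cardinality `Vary` header drags the hit ratio down.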
FAQ
Why did HTTP/3 abandon TCP entirely?
You cannot fix TCP-level HoL blocking without fundamentally altering the TCP packet header format and state machine. Such changes would take decades to deploy because countless middleboxes (NATs, firewalls, intrusion prevention systems) hardcode their understanding of the existing TCP header. Building QUIC over UDP lets it encrypt almost all of the transport state, hiding it from interfering middleboxes.
Does a CDN HIT guarantee a fast Largest Contentful Paint (LCP)?
No. A Cache Hit guarantees a fast Time to First Byte (TTFB). If the browser receives the HTML instantly but must then parse 4MB of blocking JavaScript before rendering the DOM, your LCP will still be abysmal.
References
By mastering routing, TCP reliability, and finally HTTP/3 caching, you now possess the complete end-to-end framework necessary to architect ultra-low-latency global networks.
Chinese
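HTTP/2, HTTP/3, and CDN Caching: Understanding Page Load Speed from the Network Waterfall
Extreme page load speed ultimately shows up in the browser's network panel as a tightly compressed waterfall: millisecond-level DNS resolution, zero-RTT connection setup, instant edge cache hits, and data streams free of head-of-line blocking. In production, though, flipping a switch on the server to enable HTTP/3, or buying a CDN and putting it in front, will not achieve this on its own. HTTP/3 (QUIC) works to raise the performance ceiling of the transport layer, while CDN caching flattens the speed-of-light latency imposed by geographic distance.
In this deep dive, we take apart the underlying routing mechanics of HTTP/3 QUIC connection IDs, derive the mathematical model relating Cache Hit Ratio (CHR) to Zipf's Law, go into the Linux kernel to examine UDP GSO offload, and review cache stampede and route-drift incidents in a large-scale BGP Anycast architecture.
1. Rebuilding Protocol Streams and QUIC Kernel Optimization
HTTP/2 was revolutionary: it multiplexed many streams over a single TCP connection. But this also exposed a serious flaw, TCP-level head-of-line (HoL) blocking. Because TCP is a strictly ordered byte-stream protocol, if the 4th packet is lost in transit, the kernel holds packets 5 through 10 in the receive buffer, even though they have already reached the NIC, until packet 4 is retransmitted. During that time every application-layer stream stalls.
HTTP/3 abandons TCP entirely and builds on QUIC (a transport protocol built on UDP, RFC 9000). QUIC natively understands independent streams: packet loss on stream A never blocks stream B. More importantly, QUIC introduces the Connection ID (CID), so even if the user's IP address changes abruptly (for example, when switching from Wi-Fi to a cellular network), the connection survives as long as the CID remains valid, with no need to repeat the costly three-way handshake and TLS negotiation.
At the Linux kernel level, getting UDP-based HTTP/3 to perform on par with TCP depends heavily on the NIC's hardware offload capabilities. High-performance web servers (such as the quiche branch of Nginx, or Cloudflare's Pingora) make heavy use of UDP_SEGMENT (Generic Segmentation Offload, GSO) to handle a large number of datagrams in a single system call.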
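/* Simplified sketch of the UDP GSO send logic in linux/net/ipv4/udp.c (not verbatim) */
static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
                        struct inet_cork *cork)
{
    struct sock *sk = skb->sk;

    /* With GSO enabled, the kernel passes down one large buffer that is
     * split into gso_size-byte datagrams lower in the stack or by the NIC. */
    if (cork->gso_size) {
        skb_shinfo(skb)->gso_size = cork->gso_size;
        skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
    }
    /* ... UDP header and checksum setup elided ... */

    /* Without GSO this is the traditional one-datagram-per-call path. */
    return ip_send_skb(sock_net(sk), skb);
}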
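By pushing QUIC packets down the stack in batches via UDP GSO, the server can cut CPU context-switch overhead by up to an order of magnitude, which lets QUIC's compute cost under very high concurrency just about compete with the highly mature TCP stack.
Visualization: QUIC and Cache Tiers in a BGP Anycast Architecture
Below is a Mermaid architecture diagram showing how a client connects to a BGP Anycast edge node via a 0-RTT handshake and passes through the cache tiers via consistent hashing: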
sequenceDiagram
    participant Client
    participant Edge as Anycast Edge Node (PoP)
    participant Origin as Origin Cluster
    rect rgb(240, 248, 255)
        Note over Client, Edge: QUIC 0-RTT handshake (early data via Session Ticket)
        Client->>Edge: Initial packet + 0-RTT CRYPTO stream [GET /api/data]
        Edge-->>Client: Handshake confirmed (1-RTT completes the state machine)
    end
    alt Cache HIT (edge memory tier)
        Edge-->>Client: 200 OK (served directly on Stream 0)
        Note over Client, Edge: Origin fully bypassed, RTT kept minimal.
    else Cache MISS (Zipf's eviction algorithm runs)
        Edge->>Origin: Pass through over dedicated backbone (H2/H3)
        Origin-->>Edge: 200 OK (rendered from the database)
        Edge->>Edge: Write to NVMe cache tier in parallel
        Edge-->>Client: 200 OK (streamed to the client)
    end
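2. The Mathematics of Cache Hit Ratio (CHR)
The core of a CDN is bound by a strict mathematical measure: the Cache Hit Ratio (CHR). To intercept 99% of traffic at the edge, we must understand Zipf's Law: in Internet traffic, the request frequency of a resource is inversely proportional to its popularity rank. Among \( N \) resources, the probability \( P_k \) that the \( k \)-th most popular resource is requested is:
$$ P_k = \frac{1/k^s}{\sum_{i=1}^{N} (1/i^s)} $$
For modern web applications, \( s \) is usually around 1. To maximize CHR on limited SSD and RAM, top CDN vendors have long since retired plain LRU (Least Recently Used). In its place they use W-TinyLFU (Window Tiny Least Frequently Used), along with Rendezvous Hashing (highest random weight hashing) to distribute requests precisely across backend cache shards.
If we build a mathematical model of the theoretical latency (fixing network RTT at 36 ms, DNS MISS at 42 ms, and edge read at 4 ms):
H2 cold MISS = DNS 42 + TCP/TLS handshake 72 (2 RTT) + origin backbone path 62 = 176 ms
H3 cold MISS = DNS 42 + QUIC handshake 36 (1 RTT) + origin backbone path 62 = 140 ms
H3 0-RTT HIT = DNS 1 (cached) + QUIC handshake 0 + edge read 4 = 5 ms
In this model, switching from a traditional cold HTTP/2 request to an HTTP/3 0-RTT request served from a warm CDN cache eliminates 171 ms of waiting time.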
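3. Hardcore Troubleshooting Tools: qlog and `stale-while-revalidate`
When debugging QUIC, the tcpdump that network engineers reach for most often goes blind. QUIC is extremely privacy-focused: it encrypts not only application data but also the transport-layer packet number and flag bits. Senior SREs have to rely on qlog (a JSON-format logging standard for QUIC internals), loaded into a visualization tool, to locate head-of-line blocking or flow-control window limits.
# Enable qlog export on an Nginx/Quiche proxy server
export QLOGDIR=/var/log/quic/
# Load the resulting .qlog file into the qvis toolchain for frame-level analysis
On the caching side, the most dangerous production incident is the cache stampede (also called cache breakdown or cache avalanche). When the cached JSON for a homepage API serving millions of QPS suddenly expires, all concurrent requests pass straight through the CDN and hit the origin's MySQL database like a tsunami. The one elegant engineering solution is stale-while-revalidate:
# The CDN immediately returns a stale cached copy to users and quietly refreshes it from the origin in the background
Cache-Control: max-age=60, stale-while-revalidate=86400
# Verify cache status using a cURL build with HTTP/3 support
curl -I --http3 https://example.com/api/data
# HTTP/3 200
# cf-cache-status: STALE (shows the asynchronous revalidation policy is in effect)
# alt-svc: h3=":443"; ma=86400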
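4. In-Depth Production Architecture Post-Mortem
The Painful Lesson of BGP Anycast and UDP Path Asymmetry
While rolling out HTTP/3 in a canary release for an app with hundreds of millions of daily active users, we received a large number of complaints about progress bars freezing. After several nights of packet captures, the culprit turned out to be a conflict between the BGP Anycast architecture and the characteristics of UDP.
Under BGP Anycast, every edge node in the country announces the same public IP. Mobile network conditions are harsh, and the routing path can change within seconds. In the TCP era this was barely tolerable, because carrier firewalls and NAT devices are stateful and closely track TCP flags. With UDP (QUIC), however, cheap carrier NAT gateways lose their mapping table as soon as the path shifts slightly, or a BGP route flap sends the client's packets to a different physical data center (PoP). The new PoP does not recognize the UDP packet at all and drops it, leaving the connection hanging.
To rescue the rollout, we had to build stateless QUIC CID routing on the Layer 4 (L4) load balancers. We wrote eBPF XDP programs that no longer hash on the traditional 5-tuple (source IP/port); instead they parse the UDP payload and extract the QUIC Connection ID, which is carried in cleartext. No matter how the client's IP changes, as long as the CID stays the same, eBPF redirects the packet to the same backend server that holds the handshake state, which resolved the stalls during connection migration.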
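5. Automated Latency Calculation Script
We use Python to build a deterministic latency-reduction model and to evaluate the effect of the Zipf distribution on multi-tier storage:
python src/http_cdn_waterfall.py
# Runs handshake simulations against different CDN hit ratios
# Simplified estimate of the time saved (all values in ms)
total_h2_miss = 42 + (36 * 2) + 36 + 26
total_h3_0rtt = 1 + 0 + 4 # 0-RTT early data hit
modeled_saving = total_h2_miss - total_h3_0rtt
print(f"Total time saved upgrading from a traditional H2 setup to H3 0-RTT edge caching: {modeled_saving}ms")
The complete simulation data is archived in http-cdn-waterfall-results.csv.
6. Animated Walkthrough: 0-RTT and Cache Path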
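7. Engineering Pitfall Guide (Anti-Patterns)
- Careless Vary response headers: If you configure `Vary: User-Agent` in Nginx, you are killing your cache hit ratio. The CDN stores the same page as thousands of copies, one per browser UA string, and the hit ratio collapses toward zero. In the vast majority of cases, `Vary: Accept-Encoding` is all you need.
- Ignoring UDP throttling: Many conservative enterprise firewalls (and even some low-quality carriers) still treat heavy UDP flows as DDoS attacks and rate-limit them indiscriminately. If your web server does not send a correct `Alt-Svc` response header to steer browsers into a smooth fallback to HTTP/2, many users will have a worse experience than with HTTP/1.1.
- CPU exhaustion: When moving from kernel-space TCP to HTTP/3 with its user-space crypto stack, CPU usage on your Layer 7 load balancers may well rise by 2x to 3x. Leave ample compute headroom before rolling out at scale.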
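FAQ
Why did HTTP/3 have to start over on UDP instead of simply fixing TCP?
Because TCP's head-of-line blocking is baked into its header format and state machine. Fixing it would require changing the TCP packet format. But millions of middleboxes worldwide (NATs, firewalls, IPS) have the old TCP format hard-coded into ASICs, and non-conforming TCP packets are simply dropped. Only by wrapping the new protocol in a UDP shell that middleboxes largely ignore, and encrypting it end to end, can the transport layer evolve past them.
If the CDN returns a Cache Hit, is the user's Largest Contentful Paint (LCP) guaranteed to be fast?
Definitely not. A cache hit only ensures a very fast Time to First Byte (TTFB). If the browser downloads the HTML instantly but the page is stuffed with tens of megabytes of render-blocking JavaScript frameworks, the user's screen stays blank for a long time and your Core Web Vitals still fail.
References
- RFC 9000: QUIC (A UDP-Based Multiplexed and Secure Transport)
- RFC 9114: HTTP/3
- Cloudflare: The Underlying Architecture Behind Building Pingora
- The Qlog Specification (QUIC logging standard)
From the lowest-level CIDR route slicing and TCP reliable retransmission up to HTTP/3 multiplexing and global cache distribution at the top, you now have the complete technology stack for building ultra-low-latency backbone networks.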