English
Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness
Reverse-proxy caches and Content Delivery Networks (CDNs) are fundamentally shared memory paradigms. They possess the capability to multiplex a single origin representation across millions of clients. Consequently, they must rigorously enforce cache coherency, probabilistic eviction logic, and concurrency locks. Observing a HIT in an access log is superficial; the true engineering challenges lie in Shared Memory (shm) slab fragmentation, cache stampedes (thundering herds), and algorithmic replacement models. Caching without mathematical observability is indistinguishable from serving stale, corrupted, or insecure memory segments.
1. Cache Replacement Algorithms: LRU vs. LFU vs. ARC
When a proxy's memory pool fills, it must evict objects. The naive choice is Least Recently Used (LRU), implemented via a doubly-linked list and a hash map. However, LRU is highly susceptible to "cache pollution" from sequential scans (e.g., a nightly backup script scraping all assets).
Modern edge proxies utilize advanced algorithms like ARC (Adaptive Replacement Cache) or W-TinyLFU. ARC mathematically maintains two LRU lists: $L_1$ for recently seen items and $L_2$ for frequently seen items. A tunable parameter $p$ dictates the boundary between them.
The state transition of ARC operates on a Markov Chain model. If a cache miss hits the ghost list $B_1$ (evicted recent items), $p$ is incremented to favor recency. If it hits $B_2$ (evicted frequent items), $p$ is decremented to favor frequency. This creates an autonomous, mathematically optimal eviction threshold.
$$ p_{new} = minleft(c, p_{old} + maxleft(1, frac{|B_2|}{|B_1|}right)right) $$
2. Nginx Shared Memory (shm) Architecture & Lock Contention
In multiprocess proxies like Nginx, the cache index is stored in a shared memory zone (shm_zone). Because multiple worker processes must read/write to this memory concurrently, it requires kernel-level concurrency control.
Nginx manages this using an internal slab allocator (ngx_slab_alloc) to prevent memory fragmentation, highly analogous to the Linux kernel's slab allocator. Synchronization is achieved via spinlocks (ngx_shmtx_t) built on atomic CPU instructions (CMPXCHG).
// Nginx Spinlock acquisition for Cache Key insertion
// Source: ngx_shmtx.c
void ngx_shmtx_lock(ngx_shmtx_t *mtx) {
ngx_uint_t i, n;
for ( ;; ) {
// Atomic compare-and-swap (fast path)
if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
return;
}
// CPU Pause to reduce Cache Line Bouncing / MESI bus traffic
for (n = 1; n < mtx->spin; n <<= 1) {
for (i = 0; i < n; i++) {
ngx_cpu_pause();
}
if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
return;
}
}
// Fallback to kernel futex yield
ngx_shmtx_wakeup(mtx);
}
}
Under massive high-concurrency MISS rates, spinlock contention causes violent CPU cache line bouncing (MESI protocol invalidations). Understanding this code dictates that you must tune proxy_cache_lock on; to collapse concurrent identical MISS requests into a single origin fetch.
3. Mathematical Eradication of Thundering Herds: X-Fetch Algorithm
When a highly requested object's max-age expires, thousands of concurrent requests will suddenly MISS and hit the origin, causing a Database Meltdown (Thundering Herd / Cache Stampede). Standard stale-while-revalidate helps, but what if the proxy restarts? We use Probabilistic Early Expiration (X-Fetch), native to Varnish.
Instead of expiring exactly at TTL, a request has a probability $P$ of preemptively triggering a background revalidation. As the current time $t$ approaches the expiration time $t_{exp}$, the probability exponentially increases.
$$ P(text{fetch}) = 1 - expleft(-frac{Delta t}{beta cdot text{TTL}}right) $$
Where $Delta t = t_{exp} - t$, and $beta$ is a tuning constant controlling the aggressiveness of the prefetch. By injecting randomized jitter into the expiration decision, the deterministic cache stampede is mathematically smoothed into a manageable curve of origin hits, guaranteeing zero latency spikes.
4. eBPF: Profiling Memory Allocation Penalities
To measure true cache latency, observing HTTP headers is insufficient. Using eBPF, we trace the memory allocation functions within the proxy. By hooking uprobe:nginx:ngx_http_file_cache_read and uprobe:nginx:ngx_slab_alloc, we can plot histograms of disk I/O blocking time vs shared memory lookup time.
If the eBPF histogram shows the 99th percentile ($P_{99}$) of ngx_slab_alloc exceeding 10ms, your shared memory zone is heavily fragmented, or lock contention is severe. The solution is not more cache, but increasing proxy_cache_path keys_zone=name:size and adjusting slab sizes.
5. Architecture Observability Checklist
- Vary & Key Fragmentation: Cache keys must deterministically include required headers via the
Varyheader. Failure to normalizeAccept-Encoding(e.g., gzip vs br) results in redundant origin fetches. - Mutex and Request Collapsing: Utilize cache locks (
proxy_cache_lock) to coalesce simultaneous cache misses. - Probabilistic Prefetching: Implement X-Fetch algorithms or
stale-background-fetchto eradicate origin CPU spikes. - eBPF Slab Monitoring: Continuously monitor proxy memory fragmentation using kernel tracing.
References
Chinese
代理缓存与重新验证:Cache-Control、ETag 和可观测性实验
Open as a full page反向代理缓存与 CDN(内容分发网络)在本质上是极致的共享内存(Shared Memory)并发模型。它们具备将单一源站的资源表示,复用给数以百万计的并发客户端的能力。因此,它们必须受到严苛的缓存一致性协议、概率驱逐逻辑(Probabilistic Eviction)以及并发锁机制的严格制约。在访问日志中仅仅盯着一个 HIT 标签是极其肤浅的;真正的工程挑战潜伏在共享内存池(shm slab)碎片化、缓存雪崩/惊群效应(Thundering Herds)以及算法级别的替换模型中。缺乏数学可观测性的缓存系统,与直接给用户返回过期、损坏、甚至发生数据串联的危险内存段毫无区别。
1. 缓存驱逐算法的极限对决:LRU vs LFU vs ARC
当代理服务器的内存池耗尽时,必须驱逐(Evict)部分对象。最简单的做法是最近最少使用(LRU,Least Recently Used),通常通过双向链表和哈希表来实现。然而,LRU 在面对顺序扫描时的“缓存污染”(Cache Pollution)极其脆弱(例如,一个半夜执行的备份脚本遍历了所有的静态资源)。
现代处于边缘计算前沿的代理使用的是高度进化的算法,如 ARC(Adaptive Replacement Cache) 或 W-TinyLFU。ARC 在数学上维护了两个动态伸缩的 LRU 链表:$L_1$ 用于追踪最近看到的对象,$L_2$ 用于追踪频繁看到的对象。一个可自适应调节的边界参数 $p$ 掌控着它们之间的平衡。
ARC 的状态转移机制运作于马尔可夫链(Markov Chain)模型之上。如果一次 Cache Miss 命中了影子链表 $B_1$(存放刚被驱逐的“最近使用”记录),参数 $p$ 就会增加以倾向于保护“最近性”(Recency)。如果命中了 $B_2$(被驱逐的“频繁使用”记录),$p$ 则会减小以倾向于保护“频率”(Frequency)。这就造就了一个完全自治的、在数学上始终趋近最优解的自适应驱逐阈值:
$$ p_{new} = minleft(c, p_{old} + maxleft(1, frac{|B_2|}{|B_1|}right)right) $$
2. Nginx 共享内存架构与自旋锁争用(Lock Contention)
在 Nginx 这种多进程模型的代理中,缓存的索引字典驻留在共享内存区域(shm_zone)中。因为数十个 Worker 进程必须高度并发地对这块内存执行读写操作,这不可避免地引入了内核级别的并发控制问题。
Nginx 采用了其内部专属的 slab 内存分配器(ngx_slab_alloc)来彻底根除内存碎片,其设计思想与 Linux 内核底层的 slab 分配器如出一辙。而多进程的同步则通过构建在 CPU 原子指令(如 CMPXCHG)之上的自旋锁(ngx_shmtx_t)来实现。
// Nginx 为插入 Cache Key 获取共享内存自旋锁的核心源码
// 源码路径: src/core/ngx_shmtx.c
void ngx_shmtx_lock(ngx_shmtx_t *mtx) {
ngx_uint_t i, n;
for ( ;; ) {
// 原子性的 Compare-And-Swap (极速路径)
if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
return;
}
// CPU 暂停指令 (Pause),用于缓解 Cache Line 弹跳 (Bouncing)
// 极大地降低 MESI 总线嗅探风暴
for (n = 1; n < mtx->spin; n <<= 1) {
for (i = 0; i < n; i++) {
ngx_cpu_pause();
}
if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
return;
}
}
// 退化路径:放弃 CPU 时间片,挂起等待内核 Futex 唤醒
ngx_shmtx_wakeup(mtx);
}
}
在应对海量高并发并发 MISS 的极端场景下,自旋锁的剧烈争用会导致 CPU 缓存行疯狂弹跳(Cache Line Bouncing),引发 MESI 协议的全局失效。深刻理解这段底层 C 源码,就决定了你必须在生产环境开启 proxy_cache_lock on; 配置,强制将瞬间爆发的、对同一资源的并发 MISS 请求坍缩(Collapse)为单次唯一的回源(Origin Fetch)请求。
3. 数学层面消灭惊群效应:X-Fetch 概率算法
当一个访问量极高的对象的 max-age 倒计时归零时,成千上万个并发请求会瞬间 MISS 并像海啸一样击穿至源站,导致数据库瞬间熔断(这被称为 Thundering Herd 或 Cache Stampede)。标准的 stale-while-revalidate 机制有所帮助,但如果代理进程刚重启呢?答案是引入 Varnish 中原生的 概率早期过期算法(Probabilistic Early Expiration, X-Fetch)。
算法并不在 TTL 彻底耗尽时一刀切地过期,而是赋予每个请求一个抢占式触发后台重新验证(Revalidation)的概率 $P$。当当前时间 $t$ 越逼近过期时间 $t_{exp}$ 时,触发提前预取的概率就会呈指数级攀升。
$$ P(text{fetch}) = 1 - expleft(-frac{Delta t}{beta cdot text{TTL}}right) $$
这里的 $Delta t = t_{exp} - t$,而 $beta$ 是一个控制预取激进程度的调优常数。通过在过期决策的临界点注入概率学的抖动(Randomized Jitter),那种毁灭性的、确定性的缓存雪崩,在数学上被完美平滑为了一条温和的源站访问量正态分布曲线,保障了系统 $99.99%$ 的 0 延迟毛刺。
4. eBPF:深挖内存分配的性能暗面
若要洞悉缓存最真实的延迟,仅仅抓取 HTTP 标头无异于隔靴搔痒。借助 eBPF 的无侵入式探针,我们可以直接追踪代理进程内部的内存分配函数。通过 Hook uprobe:nginx:ngx_http_file_cache_read 与 uprobe:nginx:ngx_slab_alloc,我们能精准绘制出 磁盘 I/O 阻塞时间 vs 共享内存寻址时间 的立体直方图。
如果 eBPF 直方图无情地指出 ngx_slab_alloc 的 P99 分位数超越了 10 毫秒的红线,那宣告着你的共享内存区已沦为碎片的废墟,或是锁争用已到了水深火热的地步。此时盲目增加硬盘缓存无济于事,救命稻草是立刻扩容 proxy_cache_path keys_zone=name:size 中的内存索引区大小,并针对性地调优 Slab 分配层级。
5. 终极架构可观测性检查清单
- Vary 头与缓存键(Key)的碎片化防御: 缓存键必须确定性地整合
Vary声明的输入维度。如果你连Accept-Encoding(gzip 与 br) 都不做严格归一化,就会导致海量的冗余回源穿透。 - Mutex 锁与请求坍缩: 在面临高热点穿透时,坚定不移地开启
proxy_cache_lock以实现并发 MISS 请求的安全合并。 - 概率性后台预取: 在架构层面植入 X-Fetch 算法思想或
stale-background-fetch策略,将源站 CPU 尖刺扼杀在摇篮中。 - eBPF Slab 持续监控: 将针对代理内核态内存碎片的 eBPF 探针列为最高级别的 24x7 监控基线。
参考资料
Reverse-proxy caches and Content Delivery Networks (CDNs) are fundamentally shared memory paradigms. They possess the capability to multiplex a single origin representation across millions of clients. Consequently, they must rigorously enforce cache coherency, probabilistic eviction logic, and concurrency locks. Observing a HIT in an access log is superficial; the true engineering challenges lie in Shared Memory (shm) slab fragmentation, cache stampedes (thundering herds), and algorithmic replacement models. Caching without mathematical observability is indistinguishable from serving stale, corrupted, or insecure memory segments.
1. Cache Replacement Algorithms: LRU vs. LFU vs. ARC
When a proxy’s memory pool fills, it must evict objects. The naive choice is Least Recently Used (LRU), implemented via a doubly-linked list and a hash map. However, LRU is highly susceptible to “cache pollution” from sequential scans (e.g., a nightly backup script scraping all assets).
Modern edge proxies utilize advanced algorithms like ARC (Adaptive Replacement Cache) or W-TinyLFU. ARC mathematically maintains two LRU lists: $L_1$ for recently seen items and $L_2$ for frequently seen items. A tunable parameter $p$ dictates the boundary between them.
The state transition of ARC operates on a Markov Chain model. If a cache miss hits the ghost list $B_1$ (evicted recent items), $p$ is incremented to favor recency. If it hits $B_2$ (evicted frequent items), $p$ is decremented to favor frequency. This creates an autonomous, mathematically optimal eviction threshold.
$$ p_{new} = minleft(c, p_{old} + maxleft(1, frac{|B_2|}{|B_1|}right)right) $$
2. Nginx Shared Memory (shm) Architecture & Lock Contention
In multiprocess proxies like Nginx, the cache index is stored in a shared memory zone (shm_zone). Because multiple worker processes must read/write to this memory concurrently, it requires kernel-level concurrency control.
Nginx manages this using an internal slab allocator (ngx_slab_alloc) to prevent memory fragmentation, highly analogous to the Linux kernel’s slab allocator. Synchronization is achieved via spinlocks (ngx_shmtx_t) built on atomic CPU instructions (CMPXCHG).
// Nginx Spinlock acquisition for Cache Key insertion
// Source: ngx_shmtx.c
void ngx_shmtx_lock(ngx_shmtx_t *mtx) {
ngx_uint_t i, n;
for ( ;; ) {
// Atomic compare-and-swap (fast path)
if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
return;
}
// CPU Pause to reduce Cache Line Bouncing / MESI bus traffic
for (n = 1; n < mtx->spin; n <<= 1) {
for (i = 0; i < n; i++) {
ngx_cpu_pause();
}
if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
return;
}
}
// Fallback to kernel futex yield
ngx_shmtx_wakeup(mtx);
}
}
Under massive high-concurrency MISS rates, spinlock contention causes violent CPU cache line bouncing (MESI protocol invalidations). Understanding this code dictates that you must tune proxy_cache_lock on; to collapse concurrent identical MISS requests into a single origin fetch.
3. Mathematical Eradication of Thundering Herds: X-Fetch Algorithm
When a highly requested object’s max-age expires, thousands of concurrent requests will suddenly MISS and hit the origin, causing a Database Meltdown (Thundering Herd / Cache Stampede). Standard stale-while-revalidate helps, but what if the proxy restarts? We use Probabilistic Early Expiration (X-Fetch), native to Varnish.
Instead of expiring exactly at TTL, a request has a probability $P$ of preemptively triggering a background revalidation. As the current time $t$ approaches the expiration time $t_{exp}$, the probability exponentially increases.
$$ P(text{fetch}) = 1 – expleft(-frac{Delta t}{beta cdot text{TTL}}right) $$
Where $Delta t = t_{exp} – t$, and $beta$ is a tuning constant controlling the aggressiveness of the prefetch. By injecting randomized jitter into the expiration decision, the deterministic cache stampede is mathematically smoothed into a manageable curve of origin hits, guaranteeing zero latency spikes.
4. eBPF: Profiling Memory Allocation Penalities
To measure true cache latency, observing HTTP headers is insufficient. Using eBPF, we trace the memory allocation functions within the proxy. By hooking uprobe:nginx:ngx_http_file_cache_read and uprobe:nginx:ngx_slab_alloc, we can plot histograms of disk I/O blocking time vs shared memory lookup time.
If the eBPF histogram shows the 99th percentile ($P_{99}$) of ngx_slab_alloc exceeding 10ms, your shared memory zone is heavily fragmented, or lock contention is severe. The solution is not more cache, but increasing proxy_cache_path keys_zone=name:size and adjusting slab sizes.
5. Architecture Observability Checklist
- Vary & Key Fragmentation: Cache keys must deterministically include required headers via the
Varyheader. Failure to normalizeAccept-Encoding(e.g., gzip vs br) results in redundant origin fetches. - Mutex and Request Collapsing: Utilize cache locks (
proxy_cache_lock) to coalesce simultaneous cache misses. - Probabilistic Prefetching: Implement X-Fetch algorithms or
stale-background-fetchto eradicate origin CPU spikes. - eBPF Slab Monitoring: Continuously monitor proxy memory fragmentation using kernel tracing.
References
Search questions
FAQ
Who is this article for?
This article is for readers who want a professional-level guide to Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness. It takes about 13 min and focuses on HTTP Cache, ETag, Observability, Python.
What should I read next?
Use the related tutorials and project links below the article to continue through the closest topic hub.
Does this article include runnable code or companion resources?
Yes. Use the run notes, resource cards, and download links on the page to reproduce the example or inspect the companion files.
How does this article fit into the larger site?
It is connected to the article context block, learning routes, resources, and project timeline so readers can move from concept to implementation.
Article context
Network Fundamentals
A reproducible route through DNS, TCP, TLS, HTTP/3, proxy tunnels, load balancing, and shared caches with code and figures.
Your next step
Open resourceUse an RFC 9111 shared-cache model to calculate MISS, HIT, and 304 revalidation latency and correctness boundaries.
Download share card Open share centerCompanion resources
Network Fundamentals / GUIDE
Network Fundamentals Lab README
Setup, no-privilege safety boundary, ten Python experiments, and three C examples.
Network Fundamentals / DATASET
Proxy cache revalidation CSV
Records MISS, HIT, 304 revalidation, object age, and response latency.
Network Fundamentals / ARCHIVE
Network fundamentals full lab bundle
Bundles Python/C source, fixed scenarios, ten result CSVs, and protocol/proxy figures.
Project timeline
Published posts
- DNS Resolution Explained: Build a TTL Cache and Packet Parser in Python A runnable DNS guide covering resolution paths, response headers, TTL cache latency, and deterministic Python/C experiments.
- CIDR, Longest Prefix Match, and MTU: Calculate IP Routing Step by Step Calculate CIDR ranges, longest-prefix route choice, and MTU/MSS payload segmentation with runnable Python and C examples.
- TCP Reliability and Congestion Window: A Runnable Sequence Number Experiment Track TCP sequence numbers, cumulative ACKs, loss, retransmission, and congestion-window changes with safe local experiments.
- HTTPS and TLS 1.3 Handshake: Keys, Certificates, and RTT in Practice Understand TLS 1.3 message flights, certificate authentication, ephemeral key agreement, and handshake latency with a safe teaching model.
- HTTP/2, HTTP/3, and CDN Caching: Read Page Speed from a Waterfall A deterministic browser-waterfall model for HTTP/2, HTTP/3, QUIC streams, and CDN cache hits or misses.
- Forward Proxy vs Reverse Proxy: Connection Paths, Trust Boundaries, and Latency A reproducible guide to forward proxies, reverse proxies, tunnels, TLS boundaries, and latency segments.
- HTTP CONNECT and HTTPS Proxy Tunnels: TLS Boundaries and Handshake Latency An RFC-based explanation of CONNECT tunnels, encrypted HTTPS payloads, and modeled first-request latency.
- SOCKS5 Proxy Explained: Protocol Bytes, DNS Resolution Boundaries, and Leakage Risk Decode safe SOCKS5 CONNECT bytes and compare local-DNS and proxy-side hostname resolution boundaries.
- Reverse Proxy Load Balancing: Queues, Health Checks, and a Reproducible Scheduler Compare round robin and load-aware queue selection while reasoning about health checks and retry boundaries.
- Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness Use an RFC 9111 shared-cache model to calculate MISS, HIT, and 304 revalidation latency and correctness boundaries.
Published resources
- Network Fundamentals Lab README Setup, no-privilege safety boundary, ten Python experiments, and three C examples.
- Network fundamentals full lab bundle Bundles Python/C source, fixed scenarios, ten result CSVs, and protocol/proxy figures.
- DNS TTL results CSV HIT/MISS state, expiry, and latency for four fixed lookups.
- CIDR and MTU results CSV Longest-prefix route and 3600-byte payload segmentation results.
- TCP cwnd events CSV Per-round ACK, window, and deterministic retransmission events.
- TLS 1.3 flight results CSV Message direction, timing, and teaching shared value in a fixed RTT model.
- HTTP/CDN waterfall results CSV Phase timing for HTTP/2 and HTTP/3 in cold and warm cache models.
- Proxy path latency results CSV Phase timing for direct access, forward-proxy tunneling, and reverse-proxy cache paths.
- CONNECT/TLS timeline CSV Records CONNECT authority, tunnel establishment, and the encrypted HTTPS-request boundary.
- SOCKS5 DNS boundary CSV Stores ATYP, destination bytes, request length, and modeled local DNS counts.
- Proxy load-balancing queue CSV Compares backend selection and queue waiting for round robin and least queue.
- Proxy cache revalidation CSV Records MISS, HIT, 304 revalidation, object age, and response latency.
- Network request path visualizer Adjust TTL, prefixes, loss, handshake RTT, and cache paths in the browser.
- Network fundamentals topic share card A 1200x630 SVG card for the DNS, TLS, HTTP/3, proxy tunnel, and caching topic hub.
Next notes
- Add IPv6 and QUIC observation notes
- Review caching and protocol benefits with real-user metrics
