代理缓存教程：Cache-Control、ETag、304 与观测指标

Q: 这篇文章适合谁读？

这篇文章适合想用 专业 难度理解“代理缓存与重新验证：Cache-Control、ETag 和可观测性实验”的读者，预计阅读时间约 13 分钟，重点覆盖 HTTP Cache, ETag, Observability, Python。

阅读信息

难度: 专业阅读时间: 13 分钟

HTTP Cache
ETag
Observability
Python

打开知识图谱

中文

代理缓存与重新验证：Cache-Control、ETag 和可观测性实验

反向代理缓存与 CDN（内容分发网络）在本质上是极致的共享内存（Shared Memory）并发模型。它们具备将单一源站的资源表示，复用给数以百万计的并发客户端的能力。因此，它们必须受到严苛的缓存一致性协议、概率驱逐逻辑（Probabilistic Eviction）以及并发锁机制的严格制约。在访问日志中仅仅盯着一个 HIT 标签是极其肤浅的；真正的工程挑战潜伏在共享内存池（shm slab）碎片化、缓存雪崩/惊群效应（Thundering Herds）以及算法级别的替换模型中。缺乏数学可观测性的缓存系统，与直接给用户返回过期、损坏、甚至发生数据串联的危险内存段毫无区别。

1. 缓存驱逐算法的极限对决：LRU vs LFU vs ARC

当代理服务器的内存池耗尽时，必须驱逐（Evict）部分对象。最简单的做法是最近最少使用（LRU，Least Recently Used），通常通过双向链表和哈希表来实现。然而，LRU 在面对顺序扫描时的“缓存污染”（Cache Pollution）极其脆弱（例如，一个半夜执行的备份脚本遍历了所有的静态资源）。

现代处于边缘计算前沿的代理使用的是高度进化的算法，如 ARC（Adaptive Replacement Cache） 或 W-TinyLFU。ARC 在数学上维护了两个动态伸缩的 LRU 链表：$L_1$ 用于追踪最近看到的对象，$L_2$ 用于追踪频繁看到的对象。一个可自适应调节的边界参数 $p$ 掌控着它们之间的平衡。

ARC 的状态转移机制运作于马尔可夫链（Markov Chain）模型之上。如果一次 Cache Miss 命中了影子链表 $B_1$（存放刚被驱逐的“最近使用”记录），参数 $p$ 就会增加以倾向于保护“最近性”（Recency）。如果命中了 $B_2$（被驱逐的“频繁使用”记录），$p$ 则会减小以倾向于保护“频率”（Frequency）。这就造就了一个完全自治的、在数学上始终趋近最优解的自适应驱逐阈值：

$$ p_{new} = minleft(c, p_{old} + maxleft(1, frac{|B_2|}{|B_1|}right)right) $$

2. Nginx 共享内存架构与自旋锁争用（Lock Contention）

在 Nginx 这种多进程模型的代理中，缓存的索引字典驻留在共享内存区域（shm_zone）中。因为数十个 Worker 进程必须高度并发地对这块内存执行读写操作，这不可避免地引入了内核级别的并发控制问题。

Nginx 采用了其内部专属的 slab 内存分配器（ngx_slab_alloc）来彻底根除内存碎片，其设计思想与 Linux 内核底层的 slab 分配器如出一辙。而多进程的同步则通过构建在 CPU 原子指令（如 CMPXCHG）之上的自旋锁（ngx_shmtx_t）来实现。


// Nginx 为插入 Cache Key 获取共享内存自旋锁的核心源码
// 源码路径: src/core/ngx_shmtx.c
void ngx_shmtx_lock(ngx_shmtx_t *mtx) {
    ngx_uint_t  i, n;
    for ( ;; ) {
        // 原子性的 Compare-And-Swap (极速路径)
        if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
            return;
        }
        // CPU 暂停指令 (Pause)，用于缓解 Cache Line 弹跳 (Bouncing)
        // 极大地降低 MESI 总线嗅探风暴
        for (n = 1; n < mtx->spin; n <<= 1) {
            for (i = 0; i < n; i++) {
                ngx_cpu_pause(); 
            }
            if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
                return;
            }
        }
        // 退化路径：放弃 CPU 时间片，挂起等待内核 Futex 唤醒
        ngx_shmtx_wakeup(mtx);
    }
}

在应对海量高并发并发 MISS 的极端场景下，自旋锁的剧烈争用会导致 CPU 缓存行疯狂弹跳（Cache Line Bouncing），引发 MESI 协议的全局失效。深刻理解这段底层 C 源码，就决定了你必须在生产环境开启 proxy_cache_lock on; 配置，强制将瞬间爆发的、对同一资源的并发 MISS 请求坍缩（Collapse）为单次唯一的回源（Origin Fetch）请求。

3. 数学层面消灭惊群效应：X-Fetch 概率算法

当一个访问量极高的对象的 max-age 倒计时归零时，成千上万个并发请求会瞬间 MISS 并像海啸一样击穿至源站，导致数据库瞬间熔断（这被称为 Thundering Herd 或 Cache Stampede）。标准的 stale-while-revalidate 机制有所帮助，但如果代理进程刚重启呢？答案是引入 Varnish 中原生的 概率早期过期算法（Probabilistic Early Expiration, X-Fetch）。

算法并不在 TTL 彻底耗尽时一刀切地过期，而是赋予每个请求一个抢占式触发后台重新验证（Revalidation）的概率 $P$。当当前时间 $t$ 越逼近过期时间 $t_{exp}$ 时，触发提前预取的概率就会呈指数级攀升。

$$ P(text{fetch}) = 1 - expleft(-frac{Delta t}{beta cdot text{TTL}}right) $$

这里的 $Delta t = t_{exp} - t$，而 $beta$ 是一个控制预取激进程度的调优常数。通过在过期决策的临界点注入概率学的抖动（Randomized Jitter），那种毁灭性的、确定性的缓存雪崩，在数学上被完美平滑为了一条温和的源站访问量正态分布曲线，保障了系统 $99.99%$ 的 0 延迟毛刺。

4. eBPF：深挖内存分配的性能暗面

若要洞悉缓存最真实的延迟，仅仅抓取 HTTP 标头无异于隔靴搔痒。借助 eBPF 的无侵入式探针，我们可以直接追踪代理进程内部的内存分配函数。通过 Hook uprobe:nginx:ngx_http_file_cache_read 与 uprobe:nginx:ngx_slab_alloc，我们能精准绘制出磁盘 I/O 阻塞时间 vs 共享内存寻址时间的立体直方图。

如果 eBPF 直方图无情地指出 ngx_slab_alloc 的 P99 分位数超越了 10 毫秒的红线，那宣告着你的共享内存区已沦为碎片的废墟，或是锁争用已到了水深火热的地步。此时盲目增加硬盘缓存无济于事，救命稻草是立刻扩容 proxy_cache_path keys_zone=name:size 中的内存索引区大小，并针对性地调优 Slab 分配层级。

5. 终极架构可观测性检查清单

Vary 头与缓存键（Key）的碎片化防御： 缓存键必须确定性地整合 Vary 声明的输入维度。如果你连 Accept-Encoding (gzip 与 br) 都不做严格归一化，就会导致海量的冗余回源穿透。
Mutex 锁与请求坍缩： 在面临高热点穿透时，坚定不移地开启 proxy_cache_lock 以实现并发 MISS 请求的安全合并。
概率性后台预取： 在架构层面植入 X-Fetch 算法思想或 stale-background-fetch 策略，将源站 CPU 尖刺扼杀在摇篮中。
eBPF Slab 持续监控： 将针对代理内核态内存碎片的 eBPF 探针列为最高级别的 24x7 监控基线。

参考资料

英文

Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness

在独立页面打开

Reverse-proxy caches and Content Delivery Networks (CDNs) are fundamentally shared memory paradigms. They possess the capability to multiplex a single origin representation across millions of clients. Consequently, they must rigorously enforce cache coherency, probabilistic eviction logic, and concurrency locks. Observing a HIT in an access log is superficial; the true engineering challenges lie in Shared Memory (shm) slab fragmentation, cache stampedes (thundering herds), and algorithmic replacement models. Caching without mathematical observability is indistinguishable from serving stale, corrupted, or insecure memory segments.

1. Cache Replacement Algorithms: LRU vs. LFU vs. ARC

When a proxy's memory pool fills, it must evict objects. The naive choice is Least Recently Used (LRU), implemented via a doubly-linked list and a hash map. However, LRU is highly susceptible to "cache pollution" from sequential scans (e.g., a nightly backup script scraping all assets).

Modern edge proxies utilize advanced algorithms like ARC (Adaptive Replacement Cache) or W-TinyLFU. ARC mathematically maintains two LRU lists: $L_1$ for recently seen items and $L_2$ for frequently seen items. A tunable parameter $p$ dictates the boundary between them.

The state transition of ARC operates on a Markov Chain model. If a cache miss hits the ghost list $B_1$ (evicted recent items), $p$ is incremented to favor recency. If it hits $B_2$ (evicted frequent items), $p$ is decremented to favor frequency. This creates an autonomous, mathematically optimal eviction threshold.

$$ p_{new} = minleft(c, p_{old} + maxleft(1, frac{|B_2|}{|B_1|}right)right) $$

2. Nginx Shared Memory (shm) Architecture & Lock Contention

In multiprocess proxies like Nginx, the cache index is stored in a shared memory zone (shm_zone). Because multiple worker processes must read/write to this memory concurrently, it requires kernel-level concurrency control.

Nginx manages this using an internal slab allocator (ngx_slab_alloc) to prevent memory fragmentation, highly analogous to the Linux kernel's slab allocator. Synchronization is achieved via spinlocks (ngx_shmtx_t) built on atomic CPU instructions (CMPXCHG).


// Nginx Spinlock acquisition for Cache Key insertion
// Source: ngx_shmtx.c
void ngx_shmtx_lock(ngx_shmtx_t *mtx) {
    ngx_uint_t  i, n;
    for ( ;; ) {
        // Atomic compare-and-swap (fast path)
        if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
            return;
        }
        // CPU Pause to reduce Cache Line Bouncing / MESI bus traffic
        for (n = 1; n < mtx->spin; n <<= 1) {
            for (i = 0; i < n; i++) {
                ngx_cpu_pause(); 
            }
            if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
                return;
            }
        }
        // Fallback to kernel futex yield
        ngx_shmtx_wakeup(mtx);
    }
}

Under massive high-concurrency MISS rates, spinlock contention causes violent CPU cache line bouncing (MESI protocol invalidations). Understanding this code dictates that you must tune proxy_cache_lock on; to collapse concurrent identical MISS requests into a single origin fetch.

3. Mathematical Eradication of Thundering Herds: X-Fetch Algorithm

When a highly requested object's max-age expires, thousands of concurrent requests will suddenly MISS and hit the origin, causing a Database Meltdown (Thundering Herd / Cache Stampede). Standard stale-while-revalidate helps, but what if the proxy restarts? We use Probabilistic Early Expiration (X-Fetch), native to Varnish.

Instead of expiring exactly at TTL, a request has a probability $P$ of preemptively triggering a background revalidation. As the current time $t$ approaches the expiration time $t_{exp}$, the probability exponentially increases.

$$ P(text{fetch}) = 1 - expleft(-frac{Delta t}{beta cdot text{TTL}}right) $$

Where $Delta t = t_{exp} - t$, and $beta$ is a tuning constant controlling the aggressiveness of the prefetch. By injecting randomized jitter into the expiration decision, the deterministic cache stampede is mathematically smoothed into a manageable curve of origin hits, guaranteeing zero latency spikes.

4. eBPF: Profiling Memory Allocation Penalities

To measure true cache latency, observing HTTP headers is insufficient. Using eBPF, we trace the memory allocation functions within the proxy. By hooking uprobe:nginx:ngx_http_file_cache_read and uprobe:nginx:ngx_slab_alloc, we can plot histograms of disk I/O blocking time vs shared memory lookup time.

If the eBPF histogram shows the 99th percentile ($P_{99}$) of ngx_slab_alloc exceeding 10ms, your shared memory zone is heavily fragmented, or lock contention is severe. The solution is not more cache, but increasing proxy_cache_path keys_zone=name:size and adjusting slab sizes.

5. Architecture Observability Checklist

Vary & Key Fragmentation: Cache keys must deterministically include required headers via the Vary header. Failure to normalize Accept-Encoding (e.g., gzip vs br) results in redundant origin fetches.
Mutex and Request Collapsing: Utilize cache locks (proxy_cache_lock) to coalesce simultaneous cache misses.
Probabilistic Prefetching: Implement X-Fetch algorithms or stale-background-fetch to eradicate origin CPU spikes.
eBPF Slab Monitoring: Continuously monitor proxy memory fragmentation using kernel tracing.

References

代码运行说明

环境: Python 3 + Matplotlib；内存缓存状态模型

安装

cd network-fundamentals-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

运行

python src/proxy_cache_revalidation.py

输入文件: 固定 max-age、validator 成本与四次请求时间点
预期输出: MISS、HIT、REVALIDATED_304、HIT，平均延迟为 16.50 ms。

安装 cd network-fundamentals-lab
安装 python3 -m venv .venv
安装 source .venv/bin/activate
安装 pip install -r requirements.txt
运行 python src/proxy_cache_revalidation.py

1. 缓存驱逐算法的极限对决：LRU vs LFU vs ARC

$$ p_{new} = minleft(c, p_{old} + maxleft(1, frac{|B_2|}{|B_1|}right)right) $$

2. Nginx 共享内存架构与自旋锁争用（Lock Contention）


// Nginx 为插入 Cache Key 获取共享内存自旋锁的核心源码
// 源码路径: src/core/ngx_shmtx.c
void ngx_shmtx_lock(ngx_shmtx_t *mtx) {
    ngx_uint_t  i, n;
    for ( ;; ) {
        // 原子性的 Compare-And-Swap (极速路径)
        if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
            return;
        }
        // CPU 暂停指令 (Pause)，用于缓解 Cache Line 弹跳 (Bouncing)
        // 极大地降低 MESI 总线嗅探风暴
        for (n = 1; n < mtx->spin; n <<= 1) {
            for (i = 0; i < n; i++) {
                ngx_cpu_pause(); 
            }
            if (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid)) {
                return;
            }
        }
        // 退化路径：放弃 CPU 时间片，挂起等待内核 Futex 唤醒
        ngx_shmtx_wakeup(mtx);
    }
}

3. 数学层面消灭惊群效应：X-Fetch 概率算法

$$ P(text{fetch}) = 1 – expleft(-frac{Delta t}{beta cdot text{TTL}}right) $$

这里的 $Delta t = t_{exp} – t$，而 $beta$ 是一个控制预取激进程度的调优常数。通过在过期决策的临界点注入概率学的抖动（Randomized Jitter），那种毁灭性的、确定性的缓存雪崩，在数学上被完美平滑为了一条温和的源站访问量正态分布曲线，保障了系统 $99.99%$ 的 0 延迟毛刺。

4. eBPF：深挖内存分配的性能暗面

5. 终极架构可观测性检查清单

Vary 头与缓存键（Key）的碎片化防御： 缓存键必须确定性地整合 Vary 声明的输入维度。如果你连 Accept-Encoding (gzip 与 br) 都不做严格归一化，就会导致海量的冗余回源穿透。
Mutex 锁与请求坍缩： 在面临高热点穿透时，坚定不移地开启 proxy_cache_lock 以实现并发 MISS 请求的安全合并。
概率性后台预取： 在架构层面植入 X-Fetch 算法思想或 stale-background-fetch 策略，将源站 CPU 尖刺扼杀在摇篮中。
eBPF Slab 持续监控： 将针对代理内核态内存碎片的 eBPF 探针列为最高级别的 24×7 监控基线。

参考资料

搜索问题

常见问题

这篇文章适合谁读？

这篇文章适合想用专业难度理解“代理缓存与重新验证：Cache-Control、ETag 和可观测性实验”的读者，预计阅读时间约 13 分钟，重点覆盖 HTTP Cache, ETag, Observability, Python。

读完后下一步应该看什么？

可以从文末相关阅读、项目页和知识图谱继续进入相邻主题。

这篇文章有没有可运行代码或配套资源？

有。页面里的运行说明、资源卡片和下载入口会指向复现实验所需的命令、数据、代码或说明文件。

这篇文章和整个网站的学习路线有什么关系？

它会通过文章上下文、学习路线、资源库和项目时间线连接到同一主题下的其他内容。

文章上下文

网络基础原理

从 DNS、TCP、TLS 与 HTTP/3 到代理隧道、负载均衡和共享缓存，以可重现的代码和图分析网页请求路径。

难度: 专业阅读时间: 13 分钟

HTTP Cache
ETag
Observability
Python

继续下一步

打开资源

先补基础打开资源

对应语言版本 Proxy Cache Revalidation: Cache-Control, ETag, and Observable Correctness

可分享摘要 代理缓存与重新验证：Cache-Control、ETag 和可观测性实验

依据 RFC 9111 计算共享缓存 MISS、HIT 与 304 revalidation 的时延，并解释缓存 key 和隐私边界。

下载分享图打开分享中心

配套资源

安装、无权限安全边界、十个 Python 实验和三个 C 示例的运行说明。

打开资源关联文章

记录 MISS、HIT、304 重新验证、对象年龄和响应时延。

打开资源关联文章

打包 Python/C 源码、固定场景、十份结果 CSV 与协议/代理图。

打开资源关联文章

发表回复取消回复

要发表评论，您必须先登录。

项目时间线

已发布文章

DNS 解析过程详解：从域名查询到 TTL 缓存的 Python 实验从 RFC DNS 报文与递归查询出发，用 Python 和 C 实验计算 TTL 缓存命中对解析延迟的影响。
CIDR、子网掩码与最长前缀匹配：用代码算清 IP 路由和 MTU 手算 CIDR 网段、最长前缀匹配与 MTU/MSS 分段，并用 Python/C 输出固定路由结果。
TCP 三次握手、重传与拥塞窗口：可运行的序列号实验从 TCP sequence/ACK 和慢启动出发，用确定性丢包曲线与 localhost C socket 实验理解可靠传输。
HTTPS 与 TLS 1.3 握手原理：密钥交换、证书和 RTT 实验解释 TLS 1.3 消息 flight、证书与临时密钥交换，用安全的教学模型计算一次 RTT 握手。
HTTP/2、HTTP/3 与 CDN 缓存：从网络瀑布图理解网页加载速度用确定性 waterfall 模型拆解 HTTP/2、HTTP/3、QUIC stream 和 CDN HIT/MISS 对网页等待时间的影响。
正向代理与反向代理原理：连接路径、信任边界和时延计算从连接方向和 TLS 终止点解释正向代理、反向代理与隧道代理，并用 Python 模型分段计算代理 hop 与缓存收益。
HTTP CONNECT 与 HTTPS 代理隧道：TLS 边界和握手时延以 RFC CONNECT 状态机解释 HTTPS 代理隧道、TLS 可见性和首次加密请求时延。
SOCKS5 代理原理：协议字节、DNS 解析边界与泄漏风险按 RFC 1928 拆解 SOCKS5 CONNECT 字节，通过安全编码实验比较本地 DNS 与代理侧域名解析。
反向代理负载均衡原理：队列、健康检查和可复现调度实验用固定请求队列比较 round robin 与负载感知调度，并解释反向代理健康检查和重试边界。
代理缓存与重新验证：Cache-Control、ETag 和可观测性实验依据 RFC 9111 计算共享缓存 MISS、HIT 与 304 revalidation 的时延，并解释缓存 key 和隐私边界。

已公开资源

Network Fundamentals Lab 说明安装、无权限安全边界、十个 Python 实验和三个 C 示例的运行说明。
网络基础原理完整实验包打包 Python/C 源码、固定场景、十份结果 CSV 与协议/代理图。
DNS TTL 结果 CSV 四次固定查询的 HIT/MISS、过期时间和解析延迟。
CIDR 与 MTU 结果 CSV 最长前缀路由和 3600 B payload 分段计算结果。
TCP cwnd 事件 CSV 逐轮记录 ACK、窗口和固定重传事件。
TLS 1.3 flight 结果 CSV 固定 RTT 模型中的消息方向、时间点和教学共享值。
HTTP/CDN waterfall 结果 CSV HTTP/2 与 HTTP/3 在冷暖缓存模型中的分阶段耗时。
代理路径时延结果 CSV 直接访问、正向代理隧道与反向代理缓存路径的分阶段等待。
CONNECT/TLS 时间线 CSV 记录 CONNECT authority、隧道建立与加密 HTTPS 请求的状态边界。
SOCKS5 DNS 边界 CSV 保存 ATYP、目标字节、请求长度和本机 DNS 解析计数。
代理负载均衡队列 CSV 比较 round robin 与 least queue 的 backend 选择和排队等待。
代理缓存重新验证 CSV 记录 MISS、HIT、304 重新验证、对象年龄和响应时延。
网络请求链路交互演示在浏览器里调整 TTL、前缀、丢包、握手 RTT 与缓存路径。
网络基础原理专题分享图用于分享 DNS、TLS、HTTP/3、代理隧道和缓存专题的 1200x630 SVG 图。

下一步计划

补充 IPv6 与 QUIC 报文观察笔记
继续用真实用户指标复查缓存与协议收益

1. 缓存驱逐算法的极限对决：LRU vs LFU vs ARC

2. Nginx 共享内存架构与自旋锁争用（Lock Contention）

3. 数学层面消灭惊群效应：X-Fetch 概率算法

4. eBPF：深挖内存分配的性能暗面

5. 终极架构可观测性检查清单

参考资料

1. Cache Replacement Algorithms: LRU vs. LFU vs. ARC

2. Nginx Shared Memory (shm) Architecture & Lock Contention

3. Mathematical Eradication of Thundering Herds: X-Fetch Algorithm

4. eBPF: Profiling Memory Allocation Penalities

5. Architecture Observability Checklist

References

1. 缓存驱逐算法的极限对决：LRU vs LFU vs ARC

2. Nginx 共享内存架构与自旋锁争用（Lock Contention）

3. 数学层面消灭惊群效应：X-Fetch 概率算法

4. eBPF：深挖内存分配的性能暗面

5. 终极架构可观测性检查清单

参考资料

这篇文章适合谁读？

读完后下一步应该看什么？

这篇文章有没有可运行代码或配套资源？

这篇文章和整个网站的学习路线有什么关系？

配套资源

Network Fundamentals Lab 说明

代理缓存重新验证 CSV

网络基础原理完整实验包

发表回复 取消回复

项目时间线

发表回复取消回复