Convolutional Neural Network Math Tutorial: Output Size, Receptive Field, im2col, and Hand-Computed Convolution

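Q: Who is this article for?

This article is for readers who want an advanced-level understanding of "Convolution and Receptive Field Math: 5×5 Input, 3×3 Kernel, Padding, and im2col". The estimated reading time is about 13 minutes, and the focus is on Convolution, Receptive Field, and im2col.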


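Convolution and Receptive Field Math: 5×5 Input, 3×3 Kernel, Padding, and im2col

A convolutional layer is more than an image-model component. It is a general computation pattern built on three principles: local connectivity, weight sharing, and preservation of spatial structure. Instead of connecting every input to every output as a dense (fully connected) layer does, a convolution slides a small local window, the kernel, across the input. Understanding convolution from first principles is key to mastering modern deep learning.

This article works through the math behind the convolution operation, shows how the receptive field grows layer by layer, and implements a working 2D convolution from scratch in NumPy.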

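1. The Mathematics of Output Dimensions

When a kernel slides across an input tensor, the spatial size of the output feature map is determined by four quantities: the input size ($W$, $H$), the kernel size ($K$), the padding ($P$), and the stride ($S$).

Along each spatial dimension, the output size is:

$$O = \left\lfloor \frac{I + 2P - K}{S} \right\rfloor + 1$$

where $I$ is the input size along that dimension ($W$ or $H$).

Consider a classic case: a 5x5 input, a 3x3 kernel, padding 0 ("valid" convolution), and stride 1. Plugging in:

$$O = \left\lfloor \frac{5 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 = 3$$

So the output feature map is exactly 3x3. To keep the output at 5x5 with stride 1, we add a padding of 1 ("same" convolution): $\lfloor (5 + 2 - 3)/1 \rfloor + 1 = 5$. In general, for an odd kernel size and stride 1, $P = (K - 1)/2$ preserves the spatial size.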
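A minimal sketch of the output-size formula as a Python helper, useful for checking shapes before writing any convolution code. The function names `conv_output_size` and `same_padding` are illustrative choices, not part of the companion lab.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one spatial axis: floor((I + 2P - K) / S) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    # Floor division implements floor() because the numerator is non-negative here
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves the spatial size for an odd kernel at stride 1."""
    if kernel % 2 == 0:
        raise ValueError("'same' padding is asymmetric for even kernel sizes")
    return (kernel - 1) // 2


print(conv_output_size(5, 3))                           # valid convolution: 3
print(conv_output_size(5, 3, padding=same_padding(3)))  # same convolution: 5
print(conv_output_size(5, 3, padding=1, stride=2))      # strided: 3
```

Calling the helper separately for $H$ and $W$ handles non-square inputs and kernels.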

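2. Hand-Calculating One Output Cell

Each output value of a convolution is the sum of the element-wise products between one local patch of the input and the kernel, which is simply a dot product between the flattened patch and the flattened kernel. (Strictly speaking, this is cross-correlation: deep learning frameworks call the operation convolution but do not flip the kernel, whereas the textbook mathematical convolution flips the kernel in both axes first.)

Figure: 5x5 input, 3x3 kernel, and 3x3 output feature map. The highlighted patch is dotted with the kernel to produce one output cell.

The companion lab's conv2d-results.csv reports -1.000000 at the center position row=1, col=1 (0-based) of the 3x3 output. Aligning the highlighted 3x3 input patch with the kernel, multiplying element-wise, and summing all 9 products by hand gives exactly the same value.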
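The lab's actual input and kernel values are not reproduced in this article, so the sketch below uses a hypothetical input (`np.arange(25)`) and a hypothetical vertical-edge kernel. It computes the center cell (row=1, col=1) as a patch-kernel dot product, then repeats the calculation with the kernel flipped, which is what a textbook (non-cross-correlation) convolution would do.

```python
import numpy as np

# Hypothetical 5x5 input and 3x3 kernel (not the values used by the companion lab)
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])


def output_cell(image, kernel, row, col, stride=1):
    """Dot product of one input patch with the kernel (cross-correlation, no flip)."""
    k_h, k_w = kernel.shape
    r0, c0 = row * stride, col * stride
    patch = image[r0:r0 + k_h, c0:c0 + k_w]
    return float(np.sum(patch * kernel))


center = output_cell(image, kernel, 1, 1)
# Textbook convolution flips the kernel along both axes before the dot product
center_flipped = output_cell(image, np.flip(kernel), 1, 1)
print(center, center_flipped)  # -6.0 6.0
```

The sign change shows why a hand calculation must use the same convention as the code it is checking.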

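3. The Expanding Receptive Field

The receptive field (RF) of a feature is the region of the original input that can influence its value. A single 3x3 convolution sees only a 3x3 local region. Deep networks, however, stack many convolutional layers, so how does the network ever "see" the whole image?

Stack a second 3x3 convolution on top of the first. A single output neuron in layer 2 connects to a 3x3 region of the layer-1 feature map, and each of those 9 layer-1 neurons connects to its own 3x3 region of the original input. With stride 1, the windows of neighboring layer-1 neurons are shifted by one pixel, so together they span 3 + (3 - 1) = 5 pixels in each direction. A single layer-2 neuron therefore has a 5x5 receptive field on the original input.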


```mermaid
graph TD
    sublayer_2["Layer 2 (1x1 output)"] --> sublayer_1["Layer 1 (3x3 feature map)"]
    sublayer_1 --> input["Original input (5x5 receptive field)"]

    style sublayer_2 fill:#f9f,stroke:#333,stroke-width:2px
    style sublayer_1 fill:#bbf,stroke:#333,stroke-width:2px
    style input fill:#bfb,stroke:#333,stroke-width:2px
```
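The 5x5 claim can be checked empirically. This sketch uses hypothetical all-ones 3x3 kernels for both layers, feeds a unit impulse at every one of the 25 input positions, and records how much the single layer-2 output changes. Because both layers are linear, that change is exactly the weight with which each input pixel reaches the output.

```python
import numpy as np


def conv2d_valid(x, k):
    """Plain sliding-window cross-correlation, stride 1, no padding."""
    out_h = x.shape[0] - k.shape[0] + 1
    out_w = x.shape[1] - k.shape[1] + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(x[r:r + k.shape[0], c:c + k.shape[1]] * k)
    return out


k1 = np.ones((3, 3))  # hypothetical layer-1 kernel
k2 = np.ones((3, 3))  # hypothetical layer-2 kernel


def two_layers(x):
    # 5x5 input -> 3x3 (layer 1) -> 1x1 (layer 2)
    return conv2d_valid(conv2d_valid(x, k1), k2)


# Unit impulse at each input position; record the single layer-2 output value
influence = np.zeros((5, 5))
for r in range(5):
    for c in range(5):
        impulse = np.zeros((5, 5))
        impulse[r, c] = 1.0
        influence[r, c] = two_layers(impulse)[0, 0]

print(influence)
print("input pixels that reach the output:", int(np.count_nonzero(influence)))  # 25
```

All 25 input pixels reach the single output, confirming the 5x5 receptive field. The influence is not uniform, though: with these kernels the center pixel reaches the output through 9 paths and each corner through only 1, so pixels near the center of a receptive field tend to carry more weight.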

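Mathematically, the receptive field size $RF_l$ at layer $l$ follows the recurrence:

$$RF_l = RF_{l-1} + (K_l - 1) \prod_{i=1}^{l-1} S_i, \qquad RF_0 = 1$$

where $K_l$ is the kernel size of layer $l$, $S_i$ is the stride of layer $i$, and the empty product for $l = 1$ equals 1. For two stacked 3x3, stride-1 layers: $RF_1 = 1 + 2 = 3$ and $RF_2 = 3 + 2 \cdot 1 = 5$, matching the 5x5 result above.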
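The recurrence is easy to evaluate layer by layer. In this sketch each layer is described by a hypothetical `(kernel_size, stride)` pair; pooling layers fit the same recurrence because they also aggregate a KxK window with some stride. Dilation is ignored.

```python
def receptive_field(layers):
    """Apply RF_l = RF_{l-1} + (K_l - 1) * prod(S_1..S_{l-1}) layer by layer.

    layers: sequence of (kernel_size, stride) pairs, ordered from input to output.
    Returns the receptive field size after each layer.
    """
    rf = 1    # RF_0: a single input pixel
    jump = 1  # product of the strides of all earlier layers (empty product = 1)
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


print(receptive_field([(3, 1), (3, 1)]))          # [3, 5]
print(receptive_field([(3, 1)] * 3))              # [3, 5, 7]
# Hypothetical stack with a 2x2, stride-2 pooling layer in the middle
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # [3, 4, 8]
```

The last example shows why downsampling matters: after the stride-2 layer, every additional 3x3 convolution grows the receptive field by 4 input pixels instead of 2.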

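This is why convolutional networks can start by detecting tiny edges and progressively combine them into textures, shapes, and finally whole objects such as faces or cars: each layer's features cover a larger region of the input than the previous layer's.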

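4. Implementation: im2col and Matrix Multiplication

In practice, walking over the image with nested loops (a sliding window) is very slow, especially in Python. Many convolution backends used by frameworks such as PyTorch and TensorFlow instead vectorize the operation by lowering it to one large matrix multiplication. This technique is called im2col (image to column).

im2col extracts every local patch from the input, flattens each into a 1D vector, and stacks the vectors into a large matrix. The kernel is flattened as well. The convolution then becomes a single, highly optimized matrix multiplication (GEMM).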


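```python
import numpy as np

def conv2d_im2col(image, kernel, stride=1):
    """
    A practical NumPy implementation of 2D convolution using im2col.
    Single channel, no padding ("valid"); like deep learning frameworks,
    it computes cross-correlation (the kernel is not flipped).
    """
    h_in, w_in = image.shape
    k_h, k_w = kernel.shape

    # Compute the output dimensions (padding = 0)
    out_h = (h_in - k_h) // stride + 1
    out_w = (w_in - k_w) // stride + 1

    # Extract patches (the im2col step): one flattened patch per output position
    cols = []
    for r in range(0, h_in - k_h + 1, stride):
        for c in range(0, w_in - k_w + 1, stride):
            patch = image[r:r+k_h, c:c+k_w]
            cols.append(patch.reshape(-1))

    # Shape of im_matrix: (out_h * out_w, k_h * k_w)
    im_matrix = np.vstack(cols)

    # Flatten the kernel into a column vector: (k_h * k_w, 1)
    weight_matrix = kernel.reshape(-1, 1)

    # One matrix multiplication computes every output value
    result = im_matrix @ weight_matrix

    # Reshape back to the feature-map dimensions
    return result.reshape(out_h, out_w)

# Test the implementation
test_img = np.arange(25).reshape(5, 5)
test_kernel = np.ones((3, 3))
output = conv2d_im2col(test_img, test_kernel)
print("Output shape:", output.shape)
print(output)
```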

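For a 5x5 input and a 3x3 kernel with stride 1, there are 9 valid patches of 9 values each, so im_matrix has shape (9, 9). With this test input, the output has shape (3, 3) and holds the sum of each 3x3 patch of np.arange(25): 54, 63, 72 in the first row, 99, 108, 117 in the second, and 144, 153, 162 in the third.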
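The Python loop that builds `cols` is itself slow for large inputs. A minimal sketch of a loop-free variant, assuming NumPy 1.20 or later for `numpy.lib.stride_tricks.sliding_window_view`, also adds zero padding so the "same" case from section 1 can be tested. The names `im2col`, `conv2d_im2col_fast`, and `conv2d_loops` are hypothetical, and the loop version exists only as a cross-check.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def im2col(image, k_h, k_w, stride=1, padding=0):
    """Return the (out_h * out_w, k_h * k_w) patch matrix and the output shape."""
    if padding > 0:
        image = np.pad(image, padding, mode="constant")  # zero padding on every side
    # All stride-1 windows as a read-only view: (H - k_h + 1, W - k_w + 1, k_h, k_w)
    windows = sliding_window_view(image, (k_h, k_w))
    windows = windows[::stride, ::stride]  # keep every `stride`-th window
    out_h, out_w = windows.shape[:2]
    # Reshaping the overlapping windows generally forces a copy: im2col's memory cost
    return windows.reshape(out_h * out_w, k_h * k_w), (out_h, out_w)


def conv2d_im2col_fast(image, kernel, stride=1, padding=0):
    """Cross-correlation via im2col plus one matrix-vector product."""
    cols, (out_h, out_w) = im2col(image, *kernel.shape, stride=stride, padding=padding)
    return (cols @ kernel.reshape(-1)).reshape(out_h, out_w)


def conv2d_loops(image, kernel, stride=1, padding=0):
    """Plain sliding-window reference, used only to cross-check the fast version."""
    image = np.pad(image, padding, mode="constant")
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r * stride:r * stride + k_h, c * stride:c * stride + k_w]
            out[r, c] = np.sum(patch * kernel)
    return out


img = np.arange(25).reshape(5, 5)
k = np.ones((3, 3))
print(conv2d_im2col_fast(img, k))                   # same values as conv2d_im2col
print(conv2d_im2col_fast(img, k, padding=1).shape)  # (5, 5): "same" convolution

rng = np.random.default_rng(0)
x, w = rng.standard_normal((7, 6)), rng.standard_normal((3, 2))
for s in (1, 2):
    for p in (0, 1):
        assert np.allclose(conv2d_im2col_fast(x, w, s, p), conv2d_loops(x, w, s, p))
```

Even in this tiny case the patch matrix holds 81 values for a 25-value input, which previews the memory trade-off discussed in section 7.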

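5. Convolution Dimension Checklist

In convolution code, the most common bug is rarely forgetting the formula; it is a tensor dimension that silently goes wrong at some layer. Use the table below to check each layer while debugging a model.

| Check | What to confirm | Common mistake |
|---|---|---|
| Input layout | NCHW or NHWC | Channel and height axes swapped |
| Output size | Matches the output-size formula | Padding or stride left out, so flatten fails |
| Receptive field | Deep features cover enough of the input | Network too shallow to see more than local texture |
| Parameter count | `out_channels * in_channels * k_h * k_w` weights, plus `out_channels` biases if bias is enabled | Channel counts balloon and exhaust GPU memory |
| Border padding | Whether zero / reflect / replicate suits the task | Noticeably worse predictions for objects near the edges |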
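Several rows of this checklist can be automated. The sketch below walks a hypothetical stack of square-kernel conv layers for one sample in (channels, height, width) order and reports each output shape, the parameter count (assuming every layer has a bias), the number of elements in its im2col patch matrix, and the final flatten size that the first dense layer must expect. The dict-based layer format and the function name `audit_conv_stack` are illustrative assumptions.

```python
def audit_conv_stack(in_shape, layers):
    """Walk a stack of 2D conv layers and report shapes, parameters, and im2col size.

    in_shape: (channels, height, width) of one sample (batch dimension omitted).
    layers:   list of dicts with out_channels, kernel, stride, padding (hypothetical format).
    """
    c, h, w = in_shape
    rows = []
    for layer in layers:
        k, s, p = layer["kernel"], layer["stride"], layer["padding"]
        out_h = (h + 2 * p - k) // s + 1
        out_w = (w + 2 * p - k) // s + 1
        if out_h <= 0 or out_w <= 0:
            raise ValueError(f"layer {layer} shrinks {h}x{w} to nothing")
        # Weights plus one bias per output channel
        params = layer["out_channels"] * c * k * k + layer["out_channels"]
        # Patch matrix for one sample: (c * k * k) rows-worth of values per output position
        im2col_elems = (c * k * k) * (out_h * out_w)
        rows.append(((layer["out_channels"], out_h, out_w), params, im2col_elems))
        c, h, w = layer["out_channels"], out_h, out_w
    flatten_size = c * h * w  # in_features expected by the first dense layer
    return rows, flatten_size


layers = [
    {"out_channels": 16, "kernel": 3, "stride": 1, "padding": 1},
    {"out_channels": 32, "kernel": 3, "stride": 2, "padding": 1},
    {"out_channels": 64, "kernel": 3, "stride": 2, "padding": 1},
]
rows, flatten_size = audit_conv_stack((3, 32, 32), layers)
for shape, params, cols in rows:
    print(f"out {shape}, params {params}, im2col elements {cols}")
print("flatten size:", flatten_size)  # 64 * 8 * 8 = 4096
```

Printing this table once, before building the dense head, catches most of the flatten-size mismatches described in section 7.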

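6. Visualizing the Process

The companion animation (an 8-second clip rendered with Remotion) shows the convolution window scanning the input, the output cells filling in, and the receptive field expanding.

While watching it, notice two properties: the same kernel weights are reused at many spatial positions (weight sharing), and each output initially "sees" only a local region of the input (local connectivity).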
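Both properties become concrete if a valid, stride-1 convolution is rewritten as an ordinary dense matrix. This sketch, with a hypothetical helper `conv_as_dense_matrix` and hypothetical kernel values 1 to 9, builds the 9x25 matrix that maps a flattened 5x5 input to the flattened 3x3 output and checks it against direct sliding-window sums.

```python
import numpy as np


def conv_as_dense_matrix(kernel, in_h, in_w):
    """Build M such that M @ image.ravel() equals a valid, stride-1 convolution."""
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    M = np.zeros((out_h * out_w, in_h * in_w))
    for r in range(out_h):
        for c in range(out_w):
            for i in range(k_h):
                for j in range(k_w):
                    # Output (r, c) only touches input (r + i, c + j): local connectivity
                    M[r * out_w + c, (r + i) * in_w + (c + j)] = kernel[i, j]
    return M


kernel = np.arange(1.0, 10.0).reshape(3, 3)  # hypothetical weights, all distinct and nonzero
image = np.arange(25.0).reshape(5, 5)
M = conv_as_dense_matrix(kernel, 5, 5)

# Reference: direct sliding-window sums
direct = np.array([[np.sum(image[r:r + 3, c:c + 3] * kernel) for c in range(3)]
                   for r in range(3)])
assert np.allclose((M @ image.ravel()).reshape(3, 3), direct)

print(M.shape)                    # (9, 25): a dense layer here would have 225 free weights
print((M != 0).sum(axis=1))       # 9 nonzeros in every row (local connectivity)
print(np.unique(M[M != 0]).size)  # only 9 distinct values (weight sharing)
```

A dense layer of the same shape would learn 225 independent weights; the convolution uses only 81 nonzero entries, all copies of the same 9 parameters.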

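7. Personal Experience: An Engineer's Perspective

Using convolutions in real projects brings several engineering challenges that are not obvious from the math alone:

The memory vs. compute trade-off: im2col is a clever way to keep GPU cores busy through matrix multiplication, but it pays for that with duplicated memory. Because neighboring patches overlap, each input value is copied into up to k_h × k_w rows of the patch matrix, so at stride 1 the patch matrix is roughly k_h × k_w times larger than the input (about 9x for a 3x3 kernel, 27x for a 3x3x3 kernel on a 3D volume). With large inputs such as 3D CT scans in medical imaging, materializing the im2col matrix can easily trigger an out-of-memory (OOM) error. In production C++/CUDA code we therefore tend to use implicit GEMM, which forms patches on the fly instead of materializing them, or, for small kernels, Winograd convolution, which reduces the number of multiplications.

Dimension-mismatch nightmares: The most common error junior engineers hit is a shape-mismatch RuntimeError (such as "size mismatch") at the transition from the last convolutional layer to the first fully connected (dense) layer. Before calling flatten(), make it a habit to print the tensor shape, or to compute the final size by hand with the output-size formula.

Checkerboard artifacts: When transposed convolutions (often misleadingly called "deconvolutions") are used for upsampling in generative adversarial networks (GANs), checkerboard-like artifacts are common. They are most pronounced when the kernel size is not divisible by the stride, because output pixels then receive uneven numbers of kernel contributions. A practical fix I often use is to replace the transposed convolution with nearest-neighbor upsampling followed by an ordinary stride-1 convolution.

Padding effects at the edges: Zero-padding is the framework default, but it surrounds every feature map with an artificial constant-valued border. If the model performs poorly on objects near the image edge, consider switching to "reflect" or "replicate" padding.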
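The uneven-overlap explanation for checkerboard artifacts can be illustrated with a small 1D sketch; it is an illustration, not a GAN experiment. It scatters an all-ones kernel from every input position of a transposed convolution and counts how many kernel taps land on each output position. Padding and output cropping are ignored, and the function name is hypothetical.

```python
import numpy as np


def transposed_conv_coverage(in_len, kernel, stride):
    """Count kernel taps landing on each output position of a 1D transposed convolution.

    Input position i writes its kernel to outputs i*stride .. i*stride + kernel - 1,
    so the output length is (in_len - 1) * stride + kernel (no padding, no cropping).
    """
    out_len = (in_len - 1) * stride + kernel
    coverage = np.zeros(out_len, dtype=int)
    for i in range(in_len):
        coverage[i * stride:i * stride + kernel] += 1
    return coverage


print(transposed_conv_coverage(5, kernel=3, stride=2))  # interior alternates 1 and 2
print(transposed_conv_coverage(5, kernel=4, stride=2))  # interior is a constant 2
```

In 2D the same alternation happens along both axes, which produces the checkerboard. Uniform coverage removes this structural cause but does not by itself guarantee artifact-free output, since learned weights can still be uneven; that is one reason the upsample-then-convolve replacement is often preferred.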



8. Convolution Verification Table

A convolution implementation should be checked at three levels: shape math, numerical output, and architectural side effects. The table below provides a compact audit trail for the example in this article.

| Check | Expected evidence | Why it matters | Failure signal |
|---|---|---|---|
| Output shape | Input size, kernel size, padding, stride, and the computed output dimensions | Shape errors propagate into flatten and dense layers | The formula predicts one shape while the code prints another |
| Single-cell value | A hand-calculated patch-kernel dot product for one output location | Proves that the sliding-window arithmetic is numerically correct | The kernel is flipped inconsistently with the reference (convolution vs. cross-correlation), axes are transposed, or the window is off by one |
| Receptive field | Layer-by-layer RF calculation with the kernel and stride history | Explains which input region influences a deep feature | The architecture changes stride or dilation without updating RF assumptions |
| Memory behavior | `im2col` matrix shape and estimated memory footprint | Vectorization can trade compute efficiency for memory pressure | Large inputs cause OOM once patch extraction duplicates data |

Next up, we transition from rigid, locally-connected convolutions to the flexible, global token-to-token interactions of Attention mechanisms in Transformers.

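Running the Code

Environment: Python 3 + NumPy + Matplotlib

Install

```bash
cd deep-learning-math-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Run

```bash
python src/conv2d_im2col.py
```

Input: a 5x5 input, a 3x3 kernel, and stride 1.
Expected output: the convolution result, the im2col matrix shape, and an SVG of the convolution scan.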

