CNN Convolution Math Tutorial: Output Size, Receptive Fields, and im2col

Reading info

Level: Intermediate Reading time: 13 min

Convolution
Receptive Field
im2col


Convolution and Receptive Field Math: Padding, Stride, Channels, and im2col

A convolutional layer is not just an image-model component. It is a fundamental computation pattern built on three core principles: local connectivity, shared weights, and preserved spatial structure. Rather than connecting every input to every output like a dense layer, convolution slides a localized window—a kernel—across the input data. Understanding convolution from first principles is critical to mastering modern deep learning.

This article explores the mathematics behind the convolution operation, demonstrates how the receptive field grows layer by layer, and implements a working 2D convolution from scratch using NumPy.

1. The Mathematics of Output Dimensions

When you slide a kernel across an input tensor, the spatial dimensions of the output feature map are determined by four factors: Input size ($W, H$), Kernel size ($K$), Padding ($P$), and Stride ($S$).

The formula to compute the output dimension is:

Output_Size = floor((Input_Size + 2 * Padding - Kernel_Size) / Stride) + 1

Let's break down a classic scenario: an input image of 5x5, a kernel of 3x3, a padding of 0 (Valid convolution), and a stride of 1. Plugging these into our formula:

floor((5 + 2*0 - 3) / 1) + 1 = 3

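Thus, our output feature map is exactly 3x3. If we wanted the output to remain 5x5, we would need to add a padding of 1 (Same convolution), assuming a stride of 1. More generally, for an odd kernel size K and a stride of 1, a padding of (K - 1) / 2 preserves the input size.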
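The formula is easy to wrap in a small helper for checking layer shapes before running a model. The sketch below is illustrative only: the function name `conv_output_size` and its argument names are assumptions, not part of the companion lab.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size along one spatial dimension (hypothetical helper)."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    # Integer division implements the floor in the formula.
    return span // stride + 1


print(conv_output_size(5, 3))                        # 3  (valid convolution)
print(conv_output_size(5, 3, padding=1))             # 5  (same convolution)
print(conv_output_size(5, 3, stride=2))              # 2
print(conv_output_size(32, 5, padding=2, stride=2))  # 16
```

Apply the helper once per dimension when height and width differ.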

2. Hand-Calculating One Output Cell

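One convolution output value is the sum of element-wise products between a local patch of the input and the kernel. In other words, it is the dot product of the flattened patch and the flattened kernel. Deep learning frameworks do not flip the kernel before multiplying, so strictly speaking they compute cross-correlation; this article follows the same convention.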

Figure: 5x5 input, 3x3 kernel, and 3x3 output feature map. The highlighted patch is multiplied element-wise with the kernel and summed to produce one output cell.

If the companion lab writes -1.000000 for row=1,col=1 in conv2d-results.csv, it means multiplying the highlighted 3x3 input patch by the 3x3 kernel matrix and summing all 9 entries yields exactly -1.0.
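The hand calculation can be reproduced in a few lines of NumPy. The input and kernel values below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the values used by the companion lab, so the result differs from the -1.0 in conv2d-results.csv.

```python
import numpy as np

# Hypothetical 5x5 input and 3x3 vertical-edge kernel (not the lab's data).
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

row, col = 1, 1                           # output cell to compute
patch = image[row:row + 3, col:col + 3]   # input rows 1..3, columns 1..3
products = patch * kernel                 # nine element-wise products
cell = products.sum()                     # one output value

print(patch)
print(products)
print(cell)  # -6.0

# The same number as a dot product of the two flattened arrays.
assert np.isclose(cell, patch.reshape(-1) @ kernel.reshape(-1))
```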

3. The Expanding Receptive Field

The Receptive Field (RF) is the size of the region in the original input space that affects a specific neural network feature. A single 3x3 convolution sees a 3x3 local region. However, deep neural networks stack multiple convolutional layers. How does the network ever "see" the whole picture?

When you stack a second 3x3 convolution on top of the first one, a single output neuron in the second layer connects to a 3x3 region in the first hidden layer. But each of those 9 neurons in the first hidden layer itself connects to a 3x3 region in the original input. Consequently, a single neuron in layer 2 has an effective receptive field of 5x5 on the original input.


graph TD
    sublayer_2["Layer 2 (1x1 Output)"] --> sublayer_1["Layer 1 (3x3 Feature Map)"]
    sublayer_1 --> input["Original Input (5x5 Receptive Field)"]
    
    style sublayer_2 fill:#f9f,stroke:#333,stroke-width:2px
    style sublayer_1 fill:#bbf,stroke:#333,stroke-width:2px
    style input fill:#bfb,stroke:#333,stroke-width:2px

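Mathematically, the receptive field size $RF_l$ at layer $l$ can be computed recursively, starting from $RF_0 = 1$ (a single input pixel):

$RF_l = RF_{l-1} + (K_l - 1) \cdot \prod_{i=1}^{l-1} S_i$

Here $K_l$ is the kernel size of layer $l$ and $S_i$ is the stride of layer $i$; for the first layer the product is empty and equals 1. For two stacked 3x3 layers with stride 1 this gives $RF_1 = 1 + 2 = 3$ and $RF_2 = 3 + 2 = 5$, which matches the 5x5 region described above.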

This explains how convolutional networks can start by detecting tiny edges and gradually build up to recognizing complex textures, shapes, and eventually entire objects like faces or cars.
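The recursion can be checked with a short loop. This is a minimal sketch that assumes square kernels, no dilation, and a list of hypothetical `(kernel_size, stride)` pairs ordered from the first layer to the last; the name `receptive_field` is illustrative.

```python
def receptive_field(layers):
    """Return the receptive field size after each layer.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    rf = 1      # a single input pixel sees only itself
    jump = 1    # product of the strides of all earlier layers
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        sizes.append(rf)
        jump *= stride
    return sizes


print(receptive_field([(3, 1), (3, 1)]))          # [3, 5]
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
print(receptive_field([(3, 2), (3, 1)]))          # [3, 7]
```

The last line shows why strided layers matter: a stride of 2 in the first layer makes every later kernel step cover twice as many input pixels.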

4. Implementation: im2col and Matrix Multiplication

In practice, iterating over an image with nested Python loops (a sliding window) is extremely slow. A classic way to vectorize the operation, popularized by Caffe and still one of several strategies available to the backends behind PyTorch and TensorFlow, is to transform the convolution into one large matrix multiplication. This technique is known as im2col (Image to Column).

im2col extracts each local patch from the input image, flattens it into a 1D vector, and stacks the vectors into a large matrix (one patch per row in the code below). The kernel is flattened the same way. The convolution then becomes a single, highly optimized matrix multiplication (GEMM).


import numpy as np

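def conv2d_im2col(image, kernel, stride=1):
    """
    2D convolution of a single-channel image via im2col, using NumPy.

    No padding is applied ("valid" mode) and the kernel is not flipped,
    so this is cross-correlation, the operation that deep learning
    frameworks call convolution.
    """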
    h_in, w_in = image.shape
    k_h, k_w = kernel.shape
    
    # Calculate output dimensions
    out_h = (h_in - k_h) // stride + 1
    out_w = (w_in - k_w) // stride + 1
    
    # Extract patches (im2col step)
    # Shape of cols: (out_h * out_w, k_h * k_w)
    cols = []
    for r in range(0, h_in - k_h + 1, stride):
        for c in range(0, w_in - k_w + 1, stride):
            patch = image[r:r+k_h, c:c+k_w]
            cols.append(patch.reshape(-1))
            
    im_matrix = np.vstack(cols)
    
    # Flatten kernel
    weight_matrix = kernel.reshape(-1, 1)
    
    # Perform matrix multiplication
    result = im_matrix @ weight_matrix
    
    # Reshape back to feature map dimensions
    return result.reshape(out_h, out_w)

# Test the implementation
test_img = np.arange(25).reshape(5, 5)
test_kernel = np.ones((3, 3))
output = conv2d_im2col(test_img, test_kernel)
print("Output Shape:", output.shape)
print(output)

For a 5x5 input and 3x3 kernel, there are 9 valid patches, each containing 9 values. The im_matrix will have the shape (9, 9).
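The Python loops that build the patch list can themselves be replaced by a strided view. The sketch below is one possible variant, assuming NumPy 1.20 or newer for `sliding_window_view`; the helper names `im2col_strided` and `conv2d_naive` are illustrative, and the naive loop is included only as a reference to compare against.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def im2col_strided(image, k_h, k_w, stride=1):
    """Build the im2col matrix without Python-level loops."""
    # Shape: (H - k_h + 1, W - k_w + 1, k_h, k_w), a read-only view.
    windows = sliding_window_view(image, (k_h, k_w))
    windows = windows[::stride, ::stride]
    out_h, out_w = windows.shape[:2]
    # reshape copies the overlapping data into a dense matrix.
    return windows.reshape(out_h * out_w, k_h * k_w), (out_h, out_w)


def conv2d_naive(image, kernel, stride=1):
    """Reference sliding-window implementation with explicit loops."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + k_h, c:c + k_w] * kernel)
    return out


image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))

cols, (out_h, out_w) = im2col_strided(image, 3, 3)
fast = (cols @ kernel.reshape(-1)).reshape(out_h, out_w)

print(cols.shape)  # (9, 9)
print(fast)        # top-left cell is 54.0, bottom-right cell is 162.0
assert np.allclose(fast, conv2d_naive(image, kernel))
```

Comparing the vectorized result with the naive loop is a cheap way to catch off-by-one and axis-order mistakes.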

5. Visualizing the Process

The animation shows the convolution window scanning, output cells filling, and the receptive field expanding.

While watching the animation, notice two properties: the same kernel weights are reused across many spatial positions (weight sharing), and each output initially only "sees" a local region of the input (local connectivity).

6. Personal Experience / Engineer's Perspective

Working with convolutions in real-world scenarios introduces several practical challenges that aren't immediately obvious from the math:

The Memory vs. Compute Trade-off: The im2col approach maps convolution onto highly tuned matrix-multiplication kernels, but it pays for that with memory duplication. Because neighbouring patches overlap, each input value is copied into several rows: at stride 1, the im2col matrix holds roughly k_h * k_w times as many values as the input (about 9x for a 3x3 kernel). If you are working with large medical images (e.g., 3D CT scans), materializing that matrix can easily cause an Out-Of-Memory (OOM) error. In production C++/CUDA, we often use more memory-efficient implicit GEMM or Winograd algorithms.

Debugging Dimensionality Nightmares: The number one error junior engineers encounter is the dreaded RuntimeError: size mismatch when transitioning from the final Convolutional layer to the first Fully Connected (Dense) layer. Always log or manually compute your final tensor shape using the output size formula before applying a flatten() operation.

Checkerboard Artifacts: When using transposed convolutions (often wrongly called deconvolutions) for upsampling in Generative Adversarial Networks (GANs), you frequently encounter checkerboard artifacts. These happen when the kernel size is not evenly divisible by the stride, because the kernel footprints then overlap unevenly in the output. A practical fix I often use is to replace transposed convolutions with a nearest-neighbor upsample followed by a standard stride-1 convolution.

Padding Effects on Edges: Zero-padding is the default in most frameworks, but it surrounds the input with an artificial zero-valued (dark) border that every kernel position near the edge sees. If you notice your model performing poorly on objects at the edge of the image, consider switching to "Reflect" or "Replicate" padding.
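The divisibility rule behind checkerboard artifacts can be seen by counting how many kernel footprints land on each output position. The following is a simplified 1D sketch, assuming a transposed convolution with no padding and no output padding; `overlap_counts` is a hypothetical helper, not a framework function.

```python
import numpy as np


def overlap_counts(n_inputs, kernel_size, stride):
    """Count how many kernel footprints cover each output position
    of a 1D transposed convolution (no padding)."""
    out_len = (n_inputs - 1) * stride + kernel_size
    counts = np.zeros(out_len, dtype=int)
    for i in range(n_inputs):
        start = i * stride                  # where input i paints its kernel
        counts[start:start + kernel_size] += 1
    return counts


# Kernel 3, stride 2: 3 is not divisible by 2, so coverage alternates.
print(overlap_counts(4, kernel_size=3, stride=2))  # [1 1 2 1 2 1 2 1 1]

# Kernel 4, stride 2: 4 is divisible by 2, so the interior is uniform.
print(overlap_counts(4, kernel_size=4, stride=2))  # [1 1 2 2 2 2 2 2 1 1]
```

The alternating 2, 1, 2, 1 pattern in the first case is the 1D analogue of the checkerboard: some output pixels receive twice as many contributions as their neighbours.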

7. Convolution Verification Table

A convolution implementation should be checked at three levels: shape math, numerical output, and architectural side effects. The table below provides a compact audit trail for the example in this article.

Check	Expected evidence	Why it matters	Failure signal
Output shape	Input size, kernel size, padding, stride, and computed output dimensions	Shape errors propagate into flatten and dense layers	The formula predicts one shape while the code prints another
Single-cell value	A hand-calculated patch-kernel dot product for one output location	Checks that the sliding-window operation is numerically correct at a known position	The kernel is flipped in one implementation but not in the reference, axes are transposed, or indices are off by one
Receptive field	Layer-by-layer RF calculation with kernel and stride history	Explains what region of the input influences a deep feature	Architecture changes stride or dilation without updating RF assumptions
Memory behavior	`im2col` matrix shape and estimated memory footprint	Vectorization can trade compute efficiency for memory pressure	Large inputs cause OOM after patch extraction duplicates data
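For the memory row of the table, a rough estimate is usually enough to decide whether an explicit im2col matrix is affordable. The sketch below assumes no padding, float32 values (4 bytes each), and a hypothetical helper name `im2col_memory`; real frameworks may tile or avoid the matrix entirely.

```python
def im2col_memory(h, w, k_h, k_w, stride=1, channels=1, bytes_per_value=4):
    """Estimate the im2col matrix shape and memory for one input tensor."""
    out_h = (h - k_h) // stride + 1
    out_w = (w - k_w) // stride + 1
    rows = out_h * out_w                 # one row per output position
    cols = channels * k_h * k_w          # one column per patch element
    input_bytes = channels * h * w * bytes_per_value
    matrix_bytes = rows * cols * bytes_per_value
    return (rows, cols), input_bytes, matrix_bytes


# The 5x5 example from this article.
shape, input_bytes, matrix_bytes = im2col_memory(5, 5, 3, 3)
print(shape, input_bytes, matrix_bytes)  # (9, 9) 100 324

# A larger, hypothetical feature map: 64 channels of 512x512.
big_shape, big_in, big_mat = im2col_memory(512, 512, 3, 3, channels=64)
print(big_shape)
print(f"input: {big_in / 2**20:.0f} MiB, im2col: {big_mat / 2**20:.0f} MiB")
print(f"inflation: {big_mat / big_in:.2f}x")  # close to 9x for a 3x3 kernel
```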

Next up, we transition from rigid, locally-connected convolutions to the flexible, global token-to-token interactions of Attention mechanisms in Transformers.

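Convolution Dimension Checklist

The most common problem in convolution code is not getting the formula wrong, but a tensor dimension quietly going wrong at some layer. Use the table below to check a model layer by layer while debugging.

Check	What to confirm	Common mistake
Input layout	NCHW or NHWC	Channel and height dimensions swapped
Output size	Matches the output size formula	Missing padding or stride makes flatten fail
Receptive field	Deep features cover enough of the input	Network is too shallow and sees only local texture
Parameter count	`out_channels * in_channels * k_h * k_w`	Channel count explodes and exceeds GPU memory
Border padding	zero / reflect / replicate fits the task	Noticeably worse predictions for objects at the edges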

Run notes

Environment: Python 3 + NumPy + Matplotlib

Install

cd deep-learning-math-lab
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run

python src/conv2d_im2col.py

Input: 5x5 input, 3x3 kernel, stride 1
Expected output: Writes convolution output, im2col shape, and convolution-scan SVG output.


