Convolution and Receptive Field Math: Padding, Stride, Channels, and im2col
A convolutional layer is not just an image-model component. It is a computation pattern built from local connections, shared weights, and preserved spatial structure. The best way to understand it is to watch one window scan the input.
This article uses a 5x5 input and a 3x3 kernel, then connects the hand calculation to padding, stride, receptive field, and im2col.
1. Convolution Output Size
out_h = floor((H + 2P - K) / S) + 1
out_w = floor((W + 2P - K) / S) + 1
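Here H and W are the input height and width, K is the kernel size, P is the padding added on each side, and S is the stride. For a 5x5 input, a 3x3 kernel, no padding (P = 0), and stride 1: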
floor((5 + 2*0 - 3) / 1) + 1 = 3
The output feature map is therefore 3x3.
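The formula above can be wrapped in a small helper for checking layer sizes. This is a minimal sketch; the function name `conv_output_size` is an illustrative choice, not something taken from the companion lab.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one axis: floor((size + 2P - K) / S) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    # Integer floor division implements the floor() in the formula.
    return (size + 2 * padding - kernel) // stride + 1


print(conv_output_size(5, 3))                       # 3
print(conv_output_size(5, 3, padding=1))            # 5
print(conv_output_size(5, 3, padding=0, stride=2))  # 2
```

With padding 1 the 5x5 input keeps its size, and with stride 2 it shrinks to 2x2, which is the downsampling effect discussed in the engineering notes.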
2. Hand Calculate One Output Cell
One convolution output value is the sum of elementwise products between a local patch and the kernel.
The companion lab writes -1.000000 for the center output cell (row=1, col=1, counting from zero) in conv2d-results.csv. Aligning the kernel with the highlighted 3x3 patch, multiplying entry by entry, and summing the nine products gives the same value.
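The same multiply-and-sum step can be sketched in NumPy. The input and kernel values below are hypothetical and chosen only for illustration; they are not the lab's data, so the result differs from the -1.000000 in conv2d-results.csv. Like most deep learning frameworks, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

# Hypothetical values for illustration only; not the companion lab's data.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])


def conv_cell(image, kernel, row, col):
    """One output value: elementwise product of patch and kernel, then sum."""
    k = kernel.shape[0]
    patch = image[row:row + k, col:col + k]  # local window at (row, col)
    return float(np.sum(patch * kernel))


center = conv_cell(image, kernel, 1, 1)
print(center)  # -6.0
```

For row=1, col=1 the patch is rows 1 to 3 and columns 1 to 3 of the input. Each patch row contributes its left entry minus its right entry, which is -2 here, so three rows give -6.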
3. Why Receptive Field Grows
A single 3x3 convolution sees a 3x3 local region. After stacking two 3x3 layers, one output in the second layer depends on a 3x3 region in the first layer, and each of those first-layer points depends on its own 3x3 region in the original image. With stride 1, those overlapping regions cover a 5x5 area of the input, so the effective receptive field grows with depth.
This is how CNNs can build from edges and textures toward local shapes and higher-level patterns.
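The growth can be tracked layer by layer with a standard recurrence: each layer adds (K - 1) times the current spacing between neighboring outputs, measured in input pixels. The sketch below assumes plain convolutions without dilation; the name `receptive_field` and the `(kernel, stride)` list format are illustrative assumptions.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from the input side."""
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)]))          # 3
print(receptive_field([(3, 1), (3, 1)]))  # 5
print(receptive_field([(3, 2), (3, 1)]))  # 7
```

The last line shows that a stride in an early layer makes later layers widen the receptive field faster.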
4. im2col: Convolution As Matrix Multiplication
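Many efficient implementations flatten every patch into one row and stack all patches into a matrix. For this 5x5 input and 3x3 kernel (stride 1, no padding), there are 3x3 = 9 output positions and therefore 9 patches, each with 9 values, so the lab reports im2col_shape=9x9. Multiplying that matrix by the flattened 9-element kernel yields the 9 output values, which reshape into the 3x3 feature map.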
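import numpy as np

def im2col(image, kernel_size, stride=1, padding=0):
    padded = np.pad(image, padding)  # zero-pad every side by `padding`
    out_h = (padded.shape[0] - kernel_size) // stride + 1
    out_w = (padded.shape[1] - kernel_size) // stride + 1
    rows = []
    for row in range(out_h):
        for col in range(out_w):
            r, c = row * stride, col * stride  # top-left corner of this patch
            patch = padded[r:r + kernel_size, c:c + kernel_size]
            rows.append(patch.reshape(-1))  # flatten the patch into one row
    return np.vstack(rows)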
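To check that the patch matrix really reproduces the convolution, the sketch below builds it, multiplies by the flattened kernel, and compares against a direct sliding-window loop. The helper name `im2col_patches`, its tuple return value, and the input values are illustrative assumptions, not the lab's code or data.

```python
import numpy as np


def im2col_patches(image, kernel_size, stride=1, padding=0):
    """Return (patch matrix, (out_h, out_w)); one flattened patch per row."""
    padded = np.pad(image, padding)  # zero padding on every side
    out_h = (padded.shape[0] - kernel_size) // stride + 1
    out_w = (padded.shape[1] - kernel_size) // stride + 1
    rows = []
    for row in range(out_h):
        for col in range(out_w):
            r, c = row * stride, col * stride
            rows.append(padded[r:r + kernel_size, c:c + kernel_size].reshape(-1))
    return np.vstack(rows), (out_h, out_w)


# Hypothetical values for illustration only.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)

cols, (out_h, out_w) = im2col_patches(image, 3)
# One matrix-vector product computes every output position at once.
out = (cols @ kernel.reshape(-1)).reshape(out_h, out_w)

# Direct sliding-window reference for comparison.
direct = np.zeros((out_h, out_w))
for row in range(out_h):
    for col in range(out_w):
        direct[row, col] = np.sum(image[row:row + 3, col:col + 3] * kernel)

print(cols.shape)                # (9, 9)
print(out.shape)                 # (3, 3)
print(np.allclose(out, direct))  # True
```

The trade-off is memory: overlapping patches are copied, so the patch matrix holds 81 values for a 25-value input, in exchange for one dense matrix product.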
5. What The Animation Shows
Watch for two properties. The same kernel weights are reused at every position, which is weight sharing. Each output value is computed from only a small window of the input, which is local connectivity.
6. Engineering Notes
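- Compute each layer's output size before writing the classifier head, so the flatten step does not fail with a dimension mismatch.
- A larger stride downsamples the feature map but discards spatial detail.
- Padding controls how quickly the feature map shrinks and how much border pixels contribute: without it, each 3x3 stride-1 layer trims one pixel from every side.
- Multi-channel convolution applies one kernel slice to each input channel and then sums the per-channel results into a single output map; each output channel has its own set of slices.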
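The channel-summing note can be made concrete with a naive loop. This is a sketch under stated assumptions: stride 1, no padding, a `(C_in, H, W)` input layout, and a `(C_out, C_in, K, K)` weight layout. The function name `conv2d_multichannel` and the random test values are illustrative.

```python
import numpy as np


def conv2d_multichannel(x, weights, bias=None):
    """x: (C_in, H, W); weights: (C_out, C_in, K, K); stride 1, no padding."""
    c_out, c_in, k, _ = weights.shape
    assert x.shape[0] == c_in, "input channels must match the kernel"
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, out_h, out_w))
    for o in range(c_out):
        for row in range(out_h):
            for col in range(out_w):
                patch = x[:, row:row + k, col:col + k]  # shape (C_in, K, K)
                # The sum runs over the window and over all input channels.
                out[o, row, col] = np.sum(patch * weights[o])
    if bias is not None:
        out += bias[:, None, None]  # one bias per output channel
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 5))     # 3 input channels, 5x5 each
w = rng.standard_normal((4, 3, 3, 3))  # 4 output channels, 3x3 kernels
y = conv2d_multichannel(x, w)
print(y.shape)  # (4, 3, 3)
print(y.size)   # 36
```

Flattening this output gives 4 * 3 * 3 = 36 features, which is the input size a classifier head placed directly after this layer would need.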
The next article moves to Transformers, where attention replaces local windows with global token-to-token weights.
