Neural Network Basics: From Perceptrons to Multi-Layer Networks
Neural networks are often presented as complicated systems, but the entry-level view can be simple: a neural network is a trainable composition of functions. Each layer transforms its input, and multiple layers together can represent more complex relationships.
This article starts with a single neuron and explains weights, bias, activation functions, forward propagation, and the intuition behind backpropagation. The goal is not to derive every formula, but to make neural network training code easier to read.
While reading, keep one main loop in mind: the network predicts with current parameters, measures loss, then updates parameters in the direction that reduces loss.
1. Start With One Neuron
A simple neuron can be written as:
z = w1 * x1 + w2 * x2 + ... + b
output = activation(z)
The parts are:
- x: input features
- w: weights
- b: bias
- activation: an activation function
Without activation functions, multiple linear layers can still be collapsed into one linear transformation. Activation functions give the network nonlinear expressive power.
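To make this concrete, here is a minimal sketch of one neuron in plain Python with NumPy. The input, weight, and bias values are made up for illustration, and the last two lines check the claim that stacked linear maps collapse into one:

import numpy as np

def relu(z):
    # A common activation: max(0, z), applied elementwise
    return np.maximum(0.0, z)

# Made-up example values: two input features, two weights, one bias
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
b = 0.1

z = np.dot(w, x) + b   # weighted sum plus bias
print(z, relu(z))      # ~0.14 ~0.14

# Without an activation, two linear layers collapse into one
W1 = np.array([[1.0, 2.0], [0.0, 1.0]])
W2 = np.array([[0.5, 0.5], [1.0, -1.0]])
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True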
2. What a Perceptron Can Do
A perceptron can be viewed as an early, simple form of neural network. It computes a weighted sum of inputs, then applies a threshold to produce a class label.
if w1 * x1 + w2 * x2 + b > 0:
    predict 1
else:
    predict 0
This can solve linearly separable problems, where classes can be separated by a line, plane, or higher-dimensional hyperplane.
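As a small illustration, the sketch below trains a perceptron with the classic update rule on the AND function, which is linearly separable. The learning rate and epoch count are arbitrary choices for this toy problem:

import numpy as np

# AND truth table: only (1, 1) maps to class 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1  # arbitrary learning rate for this toy problem

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        # Perceptron rule: move the boundary only when the prediction is wrong
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

for xi in X:
    print(xi, 1 if np.dot(w, xi) + b > 0 else 0)   # all four rows classified correctly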
Real data often contains nonlinear relationships, so we need multi-layer networks and nonlinear activation functions.
3. What Is a Layer?
A layer sends a group of inputs through multiple neurons and returns a group of outputs. Common layer roles include:
- Input layer: receives raw features
- Hidden layer: performs intermediate transformations
- Output layer: returns class probabilities or numeric predictions
A small multi-layer network can be represented as:
input features -> hidden layer 1 -> hidden layer 2 -> output layer
Each layer has its own weights and biases. Training adjusts these parameters together.
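In PyTorch, a network shaped like the diagram above could be sketched as follows. The layer sizes here are arbitrary example values, not a recommendation:

import torch.nn as nn

# Arbitrary sizes: 4 input features, two hidden layers, 3 output classes
model = nn.Sequential(
    nn.Linear(4, 16),   # hidden layer 1: its own weights and bias
    nn.ReLU(),
    nn.Linear(16, 8),   # hidden layer 2: its own weights and bias
    nn.ReLU(),
    nn.Linear(8, 3),    # output layer: one score per class
)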
4. Forward Propagation
Forward propagation means computing from input to output, layer by layer.
x -> layer1 -> activation -> layer2 -> activation -> output
In code, this usually corresponds to a model's forward function. It answers:
Given the current parameters and a batch of inputs, what does the model predict?
Both training and inference use forward propagation. During training, the prediction is also used to compute loss and update parameters.
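A minimal forward function in PyTorch might look like the sketch below. The class name and layer sizes are invented for illustration:

import torch
import torch.nn as nn

class TinyNet(nn.Module):   # hypothetical example module
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 16)
        self.layer2 = nn.Linear(16, 3)

    def forward(self, x):
        # x -> layer1 -> activation -> layer2 -> output
        h = torch.relu(self.layer1(x))
        return self.layer2(h)

model = TinyNet()
batch = torch.randn(32, 4)   # a batch of 32 examples with 4 features each
predictions = model(batch)   # forward propagation
print(predictions.shape)     # torch.Size([32, 3])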
5. The Intuition Behind Backpropagation
Backpropagation calculates how each parameter affects the loss. Intuitively, it asks:
If this weight became slightly larger or smaller, how would the final loss change?
With that information, an optimizer can update parameters in a direction that reduces loss.
prediction -> compute loss -> backpropagate gradients -> update parameters
You do not need to hand-write backpropagation at the beginning. Frameworks such as PyTorch and TensorFlow compute gradients automatically. But you should understand why training code contains steps such as loss.backward() and optimizer.step().
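A tiny autograd sketch, assuming PyTorch, shows that question answered numerically for a single weight:

import torch

w = torch.tensor(2.0, requires_grad=True)   # one trainable weight
x = torch.tensor(3.0)                       # one input
target = torch.tensor(7.0)                  # desired output

loss = (w * x - target) ** 2   # squared error: (6 - 7)^2 = 1
loss.backward()                # backpropagation fills in w.grad

# d(loss)/dw = 2 * (w * x - target) * x = 2 * (-1) * 3 = -6
print(w.grad)                  # tensor(-6.)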
6. A Typical Training Loop
In pseudocode, neural network training often looks like this:
for epoch in range(num_epochs):
    for X_batch, y_batch in train_loader:
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
The loop can be read as five steps:
- Take a batch of training data
- Run forward propagation to get predictions
- Compute loss against the true labels
- Backpropagate gradients
- Let the optimizer update parameters
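Filled in with concrete pieces, the pseudocode becomes runnable. The sketch below assumes PyTorch and uses synthetic data, so the dataset, layer sizes, and learning rate are all invented for illustration:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data for this sketch: 256 samples, 4 features, 3 classes
X = torch.randn(256, 4)
y = torch.randint(0, 3, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5):
    for X_batch, y_batch in train_loader:
        y_pred = model(X_batch)           # forward propagation
        loss = loss_fn(y_pred, y_batch)   # loss against the true labels
        optimizer.zero_grad()             # clear old gradients
        loss.backward()                   # backpropagate
        optimizer.step()                  # update parameters
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")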
7. Why Deep Learning Needs More Data and Compute
Neural networks can express complex patterns, but the cost is real:
- They have many parameters and can overfit
- They usually need more data
- They have a larger tuning space
- Training speed depends more heavily on hardware
This is why it is useful to learn the traditional machine learning workflow first. Once data, features, training, and evaluation are clear, neural networks become easier to reason about.
8. Neural Networks and Large Models
Large language models, image generation models, and speech recognition models are all deep learning systems. They use more complex architectures, larger datasets, and longer training processes.
Even when the model is large, the foundational questions remain similar:
- How is input represented as numbers?
- How does the model transform input into output?
- How does the loss function measure prediction error?
- How does training update parameters?
- Does the evaluation method reflect real use?
Learning neural network basics is not just about training a network right away; it gives you the shared language behind modern AI systems.
9. Common Beginner Misunderstandings
When first learning neural networks, these misunderstandings are common:
- Assuming more layers are always better while ignoring data size, overfitting, and training cost
- Treating activation functions as minor details instead of understanding their nonlinear role
- Focusing only on architecture while ignoring the loss function and evaluation metrics
- Assuming the model is reliable just because training loss goes down
Neural networks are powerful because of their expressive capacity, but reliability still depends on data splits, evaluation, and error analysis.
10. What to Read Next
The previous article is Model Training and Evaluation. To connect the whole series in one runnable exercise, continue with Python AI Mini Practice.
