Adversarial Examples and Robust Evaluation: From FGSM to a scikit-learn Digits Experiment
Adversarial examples are not just random noise. In ML security, they are inputs crafted against an explicit objective so that a small, bounded perturbation changes the model's output. A professional evaluation should report more than clean accuracy: it also needs perturbed accuracy, confidence shifts, failure cases, and the cost of any defense.
This article starts from the FGSM equation and uses the scikit-learn digits dataset for a small, safety-bounded local experiment. The demo targets only a local toy model; it does not access the network or interact with any real service.
1. Threat model
An adversarial evaluation should state at least four boundaries:
- Attacker knowledge: white-box parameters or black-box prediction queries.
- Perturbation budget: how much the input may change, such as an L-infinity epsilon.
- Attack goal: any wrong class or a specific target class.
- Evaluation target: model only, preprocessing pipeline, abstention policy, or full product system.
Without these conditions, the word "robust" is not comparable.
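One way to keep these boundaries explicit is to record them next to every result. The sketch below is a minimal illustration, not part of the lab code; the field names and the example values are assumptions chosen to match the digits demo in section 3.

from dataclasses import dataclass, asdict
import json

@dataclass
class ThreatModel:
    # The four boundaries a robustness claim must state.
    attacker_knowledge: str   # "white-box" or "black-box"
    budget_norm: str          # e.g. "linf"
    budget_epsilon: float     # maximum allowed perturbation
    goal: str                 # "untargeted" or "targeted"
    evaluation_target: str    # "model", "pipeline", "abstention", or "system"

# Illustrative setting for the local digits experiment in section 3.
threat_model = ThreatModel("white-box", "linf", 2.0, "untargeted", "model")
print(json.dumps(asdict(threat_model), indent=2))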
2. The FGSM intuition
FGSM, introduced by Goodfellow et al., uses the gradient of the loss with respect to the input. A common form is:
x_adv = clip(x + epsilon * sign(grad_x J(theta, x, y)))
J is the loss, theta is the model parameter set, x is the input, and y is the true label. The perturbation moves the input in the direction that increases loss fastest under the chosen budget. If the decision boundary is close, a small movement can change the prediction.
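For the softmax regression used in section 3, the input gradient has a closed form, grad_x J = W^T (p - onehot(y)), so no autodiff library is needed. The sketch below is not the lab script itself; it is a minimal reconstruction under stated assumptions: digits pixels lie in 0..16, the attack is untargeted, and epsilon = 2.0 is an illustrative budget.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 8x8 digits, pixel values 0..16; train a multinomial logistic regression.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)

def fgsm(clf, X, y, epsilon):
    # Gradient of cross-entropy w.r.t. the input for softmax regression:
    # grad_x J = (p - onehot(y)) @ W, with W = clf.coef_ of shape (classes, features).
    probs = clf.predict_proba(X)
    onehot = np.eye(len(clf.classes_))[y]
    grad = (probs - onehot) @ clf.coef_
    x_adv = X + epsilon * np.sign(grad)
    return np.clip(x_adv, 0, 16)   # keep the perturbed input in the valid pixel range

X_adv = fgsm(clf, X_test, y_test, epsilon=2.0)
print("clean accuracy:    ", clf.score(X_test, y_test))
print("perturbed accuracy:", clf.score(X_adv, y_test))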
3. Local digits experiment
The lab script trains a multinomial logistic regression model and computes an input-gradient perturbation from the learned weights. Run:
cd ai-security-lab
python src/fgsm_digits_demo.py --quick --out results/fgsm-results.csv
The output CSV contains epsilon, clean_accuracy, perturbed_accuracy, and accuracy_drop. The useful signal is not one score. It is the curve showing how perturbed accuracy changes as epsilon increases.
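A quick way to look at that curve is to read the CSV back and plot perturbed accuracy against epsilon. The column names follow the description above; the file path and the plot styling are illustrative.

import pandas as pd
import matplotlib.pyplot as plt

# Columns described above: epsilon, clean_accuracy, perturbed_accuracy, accuracy_drop.
df = pd.read_csv("results/fgsm-results.csv").sort_values("epsilon")

plt.plot(df["epsilon"], df["perturbed_accuracy"], marker="o", label="perturbed accuracy")
plt.axhline(df["clean_accuracy"].iloc[0], linestyle="--", label="clean accuracy")
plt.xlabel("epsilon (L-infinity budget)")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("results/fgsm-curve.png", dpi=150)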
4. What robust evaluation should report
- Clean accuracy and perturbed accuracy.
- A list of perturbation budgets, not one epsilon.
- Input constraints such as clipping to valid pixel range.
- Failure distribution by class, not just the mean.
- Latency, abstention rate, and false positive cost after defenses.
A claim that accuracy improved after a defense is hard to cite unless the attack budget, evaluation set, and failure handling are clear.
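Most of these fields fall out of the run itself; the per-class breakdown is the one that is easiest to forget. Continuing the sketch from section 2 (it assumes clf, X_test, y_test, and X_adv from that snippet), a few lines make the failure distribution visible:

import numpy as np

# Reuses clf, X_test, y_test, X_adv from the FGSM sketch in section 2.
pred_clean = clf.predict(X_test)
pred_adv = clf.predict(X_adv)

# Per-class accuracy before and after the attack: the mean hides which classes collapse first.
for c in clf.classes_:
    mask = y_test == c
    print(f"class {c}: clean {np.mean(pred_clean[mask] == c):.2f} -> "
          f"perturbed {np.mean(pred_adv[mask] == c):.2f} (n={int(mask.sum())})")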
5. Engineering controls
- Add robustness testing to model release gates (a minimal gate check is sketched after this list).
- Use calibrated confidence and abstention for high-risk inputs.
- Log model version, data version, and attack parameters.
- Do not treat robustness on one benchmark as cross-distribution safety.
- Add human review and anomaly monitoring at the product layer.
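As one concrete shape for the first control, a release gate can be an ordinary test that fails the build when perturbed accuracy drops below an agreed floor. The thresholds, the reference epsilon, and the file path below are placeholders, not values from the lab.

# test_robustness_gate.py -- illustrative gate; thresholds and path are placeholders
import pandas as pd

RESULTS_CSV = "results/fgsm-results.csv"
MIN_CLEAN_ACCURACY = 0.95      # agreed floor for clean accuracy
MIN_PERTURBED_ACCURACY = 0.60  # agreed floor at the reference budget
REFERENCE_EPSILON = 2.0        # budget the gate is defined against

def test_fgsm_release_gate():
    df = pd.read_csv(RESULTS_CSV)
    # Pick the row whose epsilon is closest to the reference budget.
    row = df.loc[(df["epsilon"] - REFERENCE_EPSILON).abs().idxmin()]
    assert row["clean_accuracy"] >= MIN_CLEAN_ACCURACY
    assert row["perturbed_accuracy"] >= MIN_PERTURBED_ACCURACY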
6. Limitations
FGSM is a clear one-step teaching attack, not the strongest possible attack. A full robustness program should also consider PGD, AutoAttack, natural distribution shift, and physical-world transformations. This demo explains evaluation mechanics; it does not prove system safety.
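For comparison, the PGD variant mentioned above is essentially the FGSM step iterated, with a projection back into the epsilon ball after each step. A minimal sketch, reusing the closed-form gradient from section 2 and illustrative step sizes:

import numpy as np

def pgd(clf, X, y, epsilon, alpha, steps):
    # Iterated signed-gradient attack projected onto the L-infinity ball of radius epsilon.
    # alpha is the per-step size; uses the same closed-form gradient as the FGSM sketch.
    onehot = np.eye(len(clf.classes_))[y]
    x_adv = X.copy()
    for _ in range(steps):
        grad = (clf.predict_proba(x_adv) - onehot) @ clf.coef_
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, X - epsilon, X + epsilon)  # project into the budget
        x_adv = np.clip(x_adv, 0, 16)                     # keep pixels valid
    return x_adv

# Example: X_adv_pgd = pgd(clf, X_test, y_test, epsilon=2.0, alpha=0.5, steps=10)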