Data Poisoning and Backdoor Defense: Poison Rates, Triggers, and Training Pipeline Isolation
Data poisoning and backdoors target the training process, not just a single inference request. If an attacker can influence training samples, labels, preprocessing scripts, or upstream data sources, a model may retain high clean accuracy while producing attacker-chosen outputs whenever a trigger appears.
This article explains poison rate, trigger behavior, attack success rate, and training-pipeline isolation using a local toy experiment. The demo uses the scikit-learn digits dataset only; it does not involve real data sources or real model supply chains.
1. Poisoning versus backdoors
Data poisoning contaminates training data in a way that shifts the model's decision boundary. Backdoor behavior is more specific: the model behaves normally on clean inputs but maps triggered inputs to an attacker-chosen target class.
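As a minimal sketch (not the lab script itself), a stamp-style backdoor transform on a flattened 8x8 digits image could look like the code below; the patch location, size, and pixel value are illustrative assumptions.

import numpy as np

def add_backdoor(x_flat, target_label=7, patch_value=16.0):
    # Stamp a 2x2 bright block in the lower-right corner and return the
    # triggered image together with the attacker-chosen label. Assumes a
    # flattened 8x8 scikit-learn digits image with pixel values in 0-16.
    img = x_flat.reshape(8, 8).copy()
    img[6:8, 6:8] = patch_value
    return img.reshape(-1), target_label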
Backdoor risk often appears around:
- Externally scraped data and weak labels.
- Crowdsourced annotation and human feedback data.
- Third-party pretrained models and fine-tuning datasets.
- Automatic retraining pipelines and user-feedback loops.
2. Core metrics
Backdoor evaluation cannot rely on clean accuracy alone. Track at least the following; a minimal computation sketch follows the list:
- Poison rate: the fraction of contaminated training rows.
- Clean accuracy: performance on normal test data.
- Attack success rate (ASR): the fraction of triggered source-class samples predicted as the target class.
- Trigger visibility: whether simple inspection can find the trigger pattern.
- Provenance coverage: how much of the dataset has source and transformation records.
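As a hedged sketch of how the first three metrics can be computed, assuming plain NumPy arrays (the helper names and signatures are illustrative, not the lab script's API):

import numpy as np

def poison_rate(n_poisoned, n_train):
    # Fraction of contaminated rows in the training set.
    return n_poisoned / n_train

def clean_accuracy(y_true, y_pred):
    # Plain accuracy on an untouched test set.
    return float(np.mean(y_true == y_pred))

def attack_success_rate(preds_on_triggered, target_label):
    # Fraction of triggered source-class samples predicted as the target
    # class; the input is the model's predictions on source-class test
    # samples that had the trigger stamped on.
    return float(np.mean(preds_on_triggered == target_label))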
3. Local backdoor experiment
The lab script adds a small lower-right trigger to a fraction of training samples from digit 1 and changes their label to digit 7. It then measures clean accuracy and triggered attack success.
cd ai-security-lab
python src/poisoning_backdoor_demo.py --quick --out results/poisoning-results.csv
The output CSV includes poison_rate, poisoned_rows, clean_accuracy, and trigger_attack_success_rate. If clean accuracy stays high while attack success rises, a standard test set alone would miss the backdoor.
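The script itself ships with the lab bundle. As an independent, hedged approximation of what it does, the self-contained sketch below poisons a fraction of digit-1 training samples toward label 7 and reports both metrics; the exact trigger shape, poison fraction, and classifier choice are assumptions rather than the script's defaults.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def stamp(x_flat):
    # Lower-right 2x2 trigger block, matching the sketch in section 1.
    img = x_flat.reshape(8, 8).copy()
    img[6:8, 6:8] = 16.0
    return img.reshape(-1)

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

source, target, poison_frac = 1, 7, 0.10  # assumed values
rng = np.random.default_rng(0)
src_idx = np.where(y_tr == source)[0]
hit = rng.choice(src_idx, size=max(1, int(poison_frac * len(src_idx))), replace=False)

X_bad, y_bad = X_tr.copy(), y_tr.copy()
for i in hit:
    X_bad[i] = stamp(X_bad[i])
    y_bad[i] = target  # label flip: triggered 1 -> 7

clf = LogisticRegression(max_iter=2000).fit(X_bad, y_bad)

clean_acc = clf.score(X_te, y_te)  # normal test set, no triggers
triggered = np.array([stamp(x) for x in X_te[y_te == source]])
asr = float(np.mean(clf.predict(triggered) == target))
print(f"poison_rate={len(hit) / len(X_tr):.4f} "
      f"clean_accuracy={clean_acc:.3f} attack_success_rate={asr:.3f}")

Sweeping poison_frac while logging both numbers is the core lesson of the experiment: the two metrics can diverge, so a release gate needs both.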
4. Training-pipeline isolation
Defense should not rely on a single cleanup step before training. A more reliable pipeline is auditable stage by stage:
- Raw data is append-only, not overwritten.
- Every filtering, labeling, and transformation change creates a versioned record.
- Training jobs read only approved data snapshots (see the hash-manifest sketch after this list).
- External models and datasets enter an isolated evaluation area.
- Release gates check both clean accuracy and attack success rate.
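To make the snapshot rule concrete, a training job can refuse to start unless every input file matches a reviewed hash manifest. This is a minimal sketch; the manifest filename and JSON layout are assumptions, not an existing tool.

import hashlib
import json
import pathlib

def file_sha256(path):
    # Stream the file so large data shards are not loaded into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_snapshot(data_dir, manifest_path="approved_manifest.json"):
    # The manifest maps relative file paths to approved SHA-256 digests.
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    for rel_path, expected in manifest.items():
        actual = file_sha256(pathlib.Path(data_dir) / rel_path)
        if actual != expected:
            raise RuntimeError(f"snapshot mismatch: {rel_path}")
    return True

Calling verify_snapshot at the top of every training entry point turns the approval list into an enforced gate rather than a convention.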
This reduces the chance that a temporary data fix becomes a long-term supply-chain exposure.
5. Detection and control checklist
- Spot-check samples for label flips, duplicates, abnormal pixel blocks, or unusual token patterns (a toy pixel-block scan follows this list).
- Build trigger scan sets for critical classes and track ASR.
- Review new data sources for provenance and license terms.
- Run shadow evaluation on retraining data before release rather than publishing retrained models automatically.
- Keep data lineage so every model version maps back to a dataset snapshot.
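For the abnormal-pixel-block check on this dataset, a crude scan can flag samples whose lower-right patch is unusually bright and uniform, which is the signature of the stamp trigger above. The thresholds are illustrative assumptions and would need calibration on known-clean data.

import numpy as np

def scan_corner_patches(X_flat, mean_thresh=14.0, var_thresh=0.5):
    # Flag flattened 8x8 samples whose lower-right 2x2 block is both
    # bright (high mean) and flat (low variance): a stamp-like pattern.
    flagged = []
    for i, x in enumerate(X_flat):
        block = x.reshape(8, 8)[6:8, 6:8]
        if block.mean() > mean_thresh and block.var() < var_thresh:
            flagged.append(i)
    return flagged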
6. Limitations
The toy trigger is simpler than real backdoors; real attacks may use more natural, sparse, or cross-modal triggers. This demo is a teaching tool for the metrics, not a complete backdoor detector.