AI Security Threat Modeling: Build a Defense Map with NIST, MITRE ATLAS, and OWASP
AI security should not start after deployment with a generic vulnerability scan. A defensible AI security program starts with threat modeling: assets, actors, trust boundaries, failure modes, evidence, and residual risk.
This article uses the NIST adversarial machine learning taxonomy, MITRE ATLAS, and the OWASP LLM Top 10 to build an engineering map for AI defense. The goal is not to publish an attack playbook. The goal is to make the system reviewable by AI engineers, security engineers, and future maintainers.
1. The model is not the only asset
Conventional application security often starts with APIs, databases, and identities. AI systems add several asset classes:
- Training data: samples, labels, annotation rules, provenance, and filtering policy.
- Model artifacts: weights, tokenizer, feature processors, calibration settings, and evaluation reports.
- Prediction interfaces: inputs, outputs, confidence values, explanations, and bulk-query behavior.
- Context systems: RAG documents, vector indexes, ranking, and agent tool permissions.
- Feedback loops: user feedback, human review, retraining samples, and evaluation logs.
If the asset list says only "model service", it misses data poisoning, model extraction, membership inference, prompt injection, and supply-chain compromise.
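To make the inventory concrete, the sketch below lists one example asset per class. It is a minimal Python illustration; the asset names, owners, and trust boundaries are placeholders, not fields defined by the lab package or by any of the frameworks cited here.

from dataclasses import dataclass

@dataclass
class Asset:
    name: str            # e.g. "prediction API"
    asset_class: str     # one of the five classes listed above
    owner: str           # team accountable for the asset
    trust_boundary: str  # where control over the asset changes hands

ASSETS = [
    Asset("labelled training set", "training data", "data team", "vendor -> internal"),
    Asset("model weights and tokenizer", "model artifacts", "ml team", "build -> registry"),
    Asset("prediction API", "prediction interfaces", "platform team", "internal -> public"),
    Asset("RAG document store", "context systems", "app team", "external docs -> index"),
    Asset("user feedback queue", "feedback loops", "app team", "public -> retraining"),
]

for a in ASSETS:
    print(f"{a.asset_class:22} {a.name:30} boundary: {a.trust_boundary}")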
2. A three-layer threat framework
A practical AI threat model can use three layers. The first layer uses NIST adversarial ML categories for evasion, poisoning, privacy attacks, abuse, and supply-chain risks. The second layer maps behaviors to MITRE ATLAS tactics and techniques. The third layer uses OWASP LLM Top 10 risks for LLM, RAG, and agent applications, including prompt injection, sensitive information disclosure, excessive agency, and supply-chain issues.
This combination keeps the vocabulary stable: NIST gives the risk taxonomy, MITRE ATLAS gives the attack-process view, and OWASP gives common application failure modes.
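As an illustration of the layering, the snippet below tags a single behavior with all three vocabularies. The ATLAS technique name and OWASP entry are the ones commonly used for this pattern, but entry numbers shift between releases, so verify them against the current ATLAS and OWASP LLM Top 10 versions before citing them.

# One behavior, described in all three layers of the framework.
threat = {
    "behavior": "membership inference against the prediction API",
    "nist_aml": "privacy attack",
    "mitre_atlas": "Exfiltration via ML Inference API -> Infer Training Data Membership",
    "owasp_llm": "Sensitive Information Disclosure",
}
for layer, label in threat.items():
    print(f"{layer:12} {label}")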
3. What a useful threat record looks like
A useful record should be more specific than "AI security risk exists". It should reach at least this level of detail:
asset: prediction API
attacker goal: infer whether a sample was in the training data
attack pattern: membership inference from confidence scores
control: reduce confidence precision, rate-limit queries, monitor confidence gap
evidence: train/test confidence distribution and membership AUC
residual risk: labels and model behavior may still leak information
This format can be reused by security review, model review, and runtime monitoring. It also prevents teams from treating one filter or one guardrail as proof that the risk disappeared.
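A record at this level can also be stored mechanically. The sketch below appends it to a CSV risk register; the column names are an illustrative schema and may differ from the headers in the lab's risk register template.

import csv

record = {
    "asset": "prediction API",
    "attacker_goal": "infer whether a sample was in the training data",
    "attack_pattern": "membership inference from confidence scores",
    "control": "reduce confidence precision; rate-limit queries; monitor confidence gap",
    "evidence": "train/test confidence distribution; membership AUC",
    "residual_risk": "labels and model behavior may still leak information",
}

with open("risk-register.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(record))
    if f.tell() == 0:        # new or empty file: write the header row first
        writer.writeheader()
    writer.writerow(record)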
4. How the lab supports the threat model
The AI Security Lab companion package includes a small risk register and an attack-defense matrix. After downloading the package, run:
cd ai-security-lab
python src/privacy_extraction_demo.py --quick --out results/privacy-extraction-results.csv
python src/rag_prompt_injection_guard_demo.py --quick --out results/rag-guard-results.csv
These demos are local toy experiments. They do not access the network, attack real systems, or include secrets. Their value is to turn abstract risks into measurable signals such as membership AUC, surrogate fidelity, blocked document count, and unauthorized tool-call attempts.
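One way to use those signals is as a release gate. The sketch below reads the privacy demo output and flags a high membership AUC; the column name membership_auc and the 0.6 threshold are assumptions made for illustration, so check the actual CSV header and choose a threshold that matches your risk appetite.

import csv

# Read the demo output; skip rows without the assumed membership_auc column.
with open("results/privacy-extraction-results.csv", newline="") as f:
    rows = [r for r in csv.DictReader(f) if r.get("membership_auc")]

worst_auc = max(float(r["membership_auc"]) for r in rows)
THRESHOLD = 0.6   # AUC near 0.5 means little membership signal

print(f"worst-case membership AUC: {worst_auc:.3f}")
if worst_auc > THRESHOLD:
    print("membership signal above threshold: record residual risk before release")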
5. Evidence professional teams should keep
- Data evidence: provenance, license, sampling policy, anomaly treatment, and dataset hashes.
- Model evidence: training parameters, evaluation splits, robustness tests, privacy tests, and version ids.
- Interface evidence: rate limits, output fields, confidence precision, error codes, and audit logs.
- LLM evidence: retrieval sources, prompt templates, tool allowlists, and human approval rules.
- Runtime evidence: alert thresholds, drift checks, abuse queries, and rollback records.
A citable engineering article should state where each piece of evidence comes from. Without an evidence chain, a threat model is only a meeting note.
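Data evidence is the easiest place to start, because artifact hashes are cheap to produce and to verify later. A minimal sketch, assuming the dataset files sit under a local data/ directory; the paths are placeholders, not files shipped with the lab.

import hashlib
import json
import pathlib

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder paths: hash whatever artifacts the pipeline actually ships.
files = [pathlib.Path("data/train.csv"), pathlib.Path("data/labels.csv")]
evidence = {str(p): sha256_of(p) for p in files if p.exists()}

pathlib.Path("results").mkdir(exist_ok=True)
with open("results/data-evidence.json", "w") as f:
    json.dump(evidence, f, indent=2)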
6. Limitations
Threat modeling is not a formal proof. It does not cover every attack and it does not replace red teaming, code review, data governance, or runtime monitoring. Its value is to create a shared map so robustness, data integrity, privacy, and LLM security testing have explicit entry points.
7. References