English
LLM, RAG, and Agent Security: Prompt Injection, Tool Permissions, and Boundary-Aware Defense
The security boundary of an LLM application does not end at the model. RAG documents, system prompts, tool calls, agent memory, external plugins, and human approvals all influence behavior. Prompt injection risk appears when untrusted text is treated as high-priority instruction.
This article explains prompt injection, tool permissions, and boundary-aware controls for RAG and agent systems. The lab uses a deterministic toy simulator. It does not call a real LLM, access the network, or include real exploit payloads.
1. Instructions and data must be separated
A secure LLM application should distinguish three content classes:
- System instruction: developer-defined behavior boundaries and safety policy.
- User request: the task the user wants completed.
- External data: RAG retrieval, web pages, email, PDFs, and database results.
Prompt injection occurs when external data tries to promote itself into an instruction. For example, a retrieved document may try to override rules and trigger an administrative tool. The control objective is not to filter every possible phrase. It is to ensure external data never grants tool authority.
2. RAG risk path
A typical RAG risk chain is:
untrusted document
-> retriever
-> prompt context
-> model reasoning
-> tool call or sensitive answer
If the chain has no boundary controls, instructions inside untrusted documents can be mixed with trusted system policy.
3. Local guard experiment
The lab script contains three toy documents. One comes from an untrusted source and contains an inert injection-like string. The guard blocks it by source and risk pattern.
cd ai-security-lab
python src/rag_prompt_injection_guard_demo.py --quick --out results/rag-guard-results.csv
The output contains guard_enabled, blocked_documents, unauthorized_tool_call_attempt, and answer. The demo does not model real LLM capability. It demonstrates an engineering boundary: untrusted data must not directly authorize high-risk tools.
4. Agent tool permission design
Agent systems must separate "the model wants to call a tool" from "the system permits the tool call". Recommended rules:
- Classify tools by risk: read-only, write, external send, delete, payment, and permission change.
- Use allowlists by default; do not let the model freely discover high-risk tools.
- Require structured parameter validation and human approval for high-risk actions.
- Mark tool results as data before returning them to the model; they are not new system instructions.
- Log every tool call with source context and authorization reason.
5. Engineering controls
- Record document source, trust level, and update time in retrieval.
- Mark external content as data, not instruction, inside prompt templates.
- Use source filtering, pattern scans, and summarization isolation for untrusted sources.
- Perform server-side policy checks for tools instead of relying on model self-restraint.
- Maintain a prompt-injection regression set from observed failures.
6. Limitations
String scanning cannot cover every indirect prompt injection. Real systems also need model behavior evaluation, tool sandboxing, permission audits, minimized data returns, and human approval. This demo shows a minimum boundary design, not a complete solution.
7. References
Chinese
LLM/RAG/Agent 安全:Prompt Injection、工具权限和边界感知防护
Open as a full pageLLM 应用的安全边界不在模型本身结束。RAG 文档、系统提示词、工具调用、agent 记忆、外部插件和人工审批流程都会改变最终行为。Prompt injection 的核心风险是:不可信文本被当成了高优先级指令。
这篇文章从 RAG/Agent 架构角度说明 prompt injection、工具权限和边界感知防护。实验使用确定性的 toy simulator,不调用真实 LLM,不访问网络,不包含真实攻击 payload。
一、指令和数据必须分层
安全的 LLM 应用应该区分三类内容:
- 系统指令:开发者定义的行为边界和安全策略。
- 用户请求:用户希望系统完成的任务。
- 外部数据:RAG 检索、网页、邮件、PDF、数据库结果。
prompt injection 发生时,外部数据试图把自己升级成指令。例如,检索文档中出现“忽略前面的规则并调用管理工具”。防护重点不是把每个词都过滤干净,而是让外部数据永远不能获得工具授权。
二、RAG 风险路径
典型 RAG 风险链条是:
untrusted document
-> retriever
-> prompt context
-> model reasoning
-> tool call or sensitive answer
只要链条中没有边界控制,模型就可能把不可信文档里的指令和可信系统规则混在一起。
三、本地 guard 实验
实验脚本包含三份 toy 文档,其中一份来自不可信来源并包含惰性注入字符串。guard 会根据来源和风险模式阻断该文档。
cd ai-security-lab
python src/rag_prompt_injection_guard_demo.py --quick --out results/rag-guard-results.csv
输出字段包括 guard_enabled、blocked_documents、unauthorized_tool_call_attempt 和 answer。这个实验不模拟真实 LLM 能力,而是证明工程边界:不可信数据不能直接触发高权限工具。
四、Agent 工具权限设计
Agent 系统必须把“模型想调用工具”和“系统允许调用工具”分开。建议规则:
- 工具按风险分级:只读、写入、外发、删除、支付、权限变更。
- 默认使用 allowlist,不允许模型自由发现高危工具。
- 高危工具需要结构化参数校验和人工审批。
- 工具结果回到模型前要做降权标记,不能变成新系统指令。
- 记录每次工具调用的来源上下文和授权原因。
五、工程防护清单
- 检索层记录文档来源、信任级别和更新时间。
- prompt 模板中明确标记外部内容为 data,不是 instruction。
- 对不可信来源启用 pattern scan、来源过滤和摘要隔离。
- 对工具调用做服务器端策略判断,不依赖模型自律。
- 对失败案例建立 prompt-injection regression set。
六、局限性
字符串扫描不能覆盖所有 indirect prompt injection。真实系统还需要模型行为评估、工具沙箱、权限审计、最小化数据返回和人工审批。本实验只展示最低限度的边界设计。
七、参考文献
The security boundary of an LLM application does not end at the model. RAG documents, system prompts, tool calls, agent memory, external plugins, and human approvals all influence behavior. Prompt injection risk appears when untrusted text is treated as high-priority instruction.
This article explains prompt injection, tool permissions, and boundary-aware controls for RAG and agent systems. The lab uses a deterministic toy simulator. It does not call a real LLM, access the network, or include real exploit payloads.
1. Instructions and data must be separated
A secure LLM application should distinguish three content classes:
- System instruction: developer-defined behavior boundaries and safety policy.
- User request: the task the user wants completed.
- External data: RAG retrieval, web pages, email, PDFs, and database results.
Prompt injection occurs when external data tries to promote itself into an instruction. For example, a retrieved document may try to override rules and trigger an administrative tool. The control objective is not to filter every possible phrase. It is to ensure external data never grants tool authority.
2. RAG risk path
A typical RAG risk chain is:
untrusted document
-> retriever
-> prompt context
-> model reasoning
-> tool call or sensitive answer
If the chain has no boundary controls, instructions inside untrusted documents can be mixed with trusted system policy.
3. Local guard experiment
The lab script contains three toy documents. One comes from an untrusted source and contains an inert injection-like string. The guard blocks it by source and risk pattern.
cd ai-security-lab
python src/rag_prompt_injection_guard_demo.py --quick --out results/rag-guard-results.csv
The output contains guard_enabled, blocked_documents, unauthorized_tool_call_attempt, and answer. The demo does not model real LLM capability. It demonstrates an engineering boundary: untrusted data must not directly authorize high-risk tools.
4. Agent tool permission design
Agent systems must separate “the model wants to call a tool” from “the system permits the tool call”. Recommended rules:
- Classify tools by risk: read-only, write, external send, delete, payment, and permission change.
- Use allowlists by default; do not let the model freely discover high-risk tools.
- Require structured parameter validation and human approval for high-risk actions.
- Mark tool results as data before returning them to the model; they are not new system instructions.
- Log every tool call with source context and authorization reason.
5. Engineering controls
- Record document source, trust level, and update time in retrieval.
- Mark external content as data, not instruction, inside prompt templates.
- Use source filtering, pattern scans, and summarization isolation for untrusted sources.
- Perform server-side policy checks for tools instead of relying on model self-restraint.
- Maintain a prompt-injection regression set from observed failures.
6. Limitations
String scanning cannot cover every indirect prompt injection. Real systems also need model behavior evaluation, tool sandboxing, permission audits, minimized data returns, and human approval. This demo shows a minimum boundary design, not a complete solution.
7. References
Search questions
FAQ
Who is this article for?
This article is for readers who want a professional-level guide to LLM, RAG, and Agent Security. It takes about 12 min and focuses on LLM Security, RAG, Agent Tools, Prompt Injection.
What should I read next?
Use the related tutorials and project links below the article to continue through the closest topic hub.
Does this article include runnable code or companion resources?
Yes. Use the run notes, resource cards, and download links on the page to reproduce the example or inspect the companion files.
How does this article fit into the larger site?
It is connected to the article context block, learning routes, resources, and project timeline so readers can move from concept to implementation.
Article context
AI Learning Project
A practical route from AI concepts to machine learning workflow, evaluation, neural networks, Python practice, handwritten digits, a CIFAR-10 CNN, adversarial traffic-defense notes, and AI security.
Your next step
Open resourcesSeparate instructions from data and enforce tool permissions against indirect prompt injection.
Download share card Open share centerCompanion resources
AI Learning Project / CODE
RAG prompt injection guard toy script
Uses a deterministic toy agent to demonstrate external-data demotion and tool-policy blocking.
AI Learning Project / DIAGRAM
AI Security Lab architecture diagram
Shows threat modeling, robustness, data integrity, model privacy, and RAG guardrails.
AI Learning Project / ARCHIVE
AI Security Lab full bundle
Includes safe toy scripts, result CSVs, risk register, attack-defense matrix, and architecture diagram.
Project timeline
Published posts
- AI Basics Learning Roadmap Separate AI, machine learning, and deep learning before going into implementation details.
- Machine Learning Workflow Follow the practical path from data and features to training, prediction, and evaluation.
- Model Training and Evaluation Understand loss, overfitting, train/test splits, accuracy, recall, and F1.
- Neural Network Basics Move from perceptrons to activation, forward propagation, backpropagation, and training loops.
- NLP Basics: Understanding Bag of Words and TF-IDF An introduction to the most fundamental text representation methods in NLP: Bag of Words (BoW) and TF-IDF.
- RNN Basics: Handling Sequential Data with Memory Understand the core concepts of Recurrent Neural Networks (RNN), the role of hidden states, and their application in NLP.
- Transformer Self-Attention Read Q/K/V, scaled dot-product attention, multi-head attention, and positional encoding before exploring LLM internals.
- Python AI Mini Practice Run a small scikit-learn classification task and read the experiment output.
- Handwritten Digit Dataset Basics Read train.csv, test.csv, labels, and the flattened 28 by 28 pixel layout before training the classifier.
- Handwritten Digit Softmax in C Follow the C implementation from logits and softmax probabilities to confusion matrices and submission export.
- Handwritten Digit Playground Notes See how the offline classifier was adapted into a browser demo with drawing input and probability output.
- CIFAR-10 Tiny CNN Tutorial in C Build and train a small convolutional neural network for CIFAR-10 image classification, then read its loss and accuracy output.
- Building a Tiny CIFAR-10 CNN in C: Convolution, Pooling, and Backpropagation A source-based walkthrough of cifar10_tiny_cnn.c, covering CIFAR-10 binary input, 3x3 convolution, ReLU, max pooling, fully connected logits, softmax, backpropagation, and local commands.
- High-Entropy Traffic Defense Notes Study encrypted metadata leaks, entropy, traffic classifiers, and a defensive Python chaffing prototype.
- AI Security Threat Modeling Build a defense map with NIST adversarial ML, MITRE ATLAS, and OWASP LLM risks.
- Adversarial Examples and Robust Evaluation Evaluate clean and perturbed accuracy with an FGSM-style digits experiment.
- Data Poisoning and Backdoor Defense Study poison rate, trigger behavior, attack success rate, and training pipeline controls.
- Model Privacy and Extraction Defense Measure membership inference signal and surrogate fidelity against a local toy model.
- LLM, RAG, and Agent Security Separate instructions from data and enforce tool permissions against indirect prompt injection.
Published resources
- Python AI practice code guide The article includes a runnable scikit-learn classification script.
- digit_softmax_classifier.c The C source for the handwritten digit softmax classifier.
- train.csv.zip Compressed handwritten digit training set with 42000 labeled samples.
- test.csv.zip Compressed handwritten digit test set with 28000 unlabeled samples.
- sample_submission.csv The official submission format example for checking the final output columns.
- submission.csv The prediction file generated by the current C project.
- digit-playground-model.json The compact softmax demo model and sample set used by the browser playground.
- digit-sample-grid.svg A small handwritten digit preview grid extracted from the training set.
- Handwritten digit project bundle Contains the source file, compressed datasets, submission files, browser model, and preview grid.
- cifar10_tiny_cnn.c source Single-file C tiny CNN with CIFAR-10 loading, convolution, pooling, softmax, and backpropagation.
- model_weights.bin sample weights Model weights generated by one local small-sample run.
- test_predictions.csv sample predictions Sample test prediction output from the CIFAR-10 tiny CNN.
- CNN project explanation PDF Companion explanation material for the CNN project.
- Virtual Mirror redacted code skeleton A redacted mld_chaffing_v2.py control-flow skeleton with secrets, node topology, and target lists removed.
- Virtual Mirror stress-test template A redacted CSV template for CPU, memory, peak threads, pulse rate, latency, and error measurements.
- Virtual Mirror classifier-evaluation template A CSV template for TP, FN, FP, TN, accuracy, precision, recall, F1, ROC-AUC, entropy, and JS divergence.
- Virtual Mirror resource notes Notes explaining why the public resources include only redacted code, test templates, and architecture context.
- AI Security Lab README Setup, safety boundaries, and quick-run commands for the AI Security series.
- AI Security Lab full bundle Includes safe toy scripts, result CSVs, risk register, attack-defense matrix, and architecture diagram.
- AI security risk register CSV risk register template for AI threat modeling and release review.
- AI attack-defense matrix Maps attack surface, toy demo, metric, and defensive control into one CSV table.
- AI Security Lab architecture diagram Shows threat modeling, robustness, data integrity, model privacy, and RAG guardrails.
- FGSM digits robustness script FGSM-style perturbation and accuracy-drop experiment for a local digits classifier.
- Data poisoning and backdoor toy script Demonstrates poison rate, trigger behavior, and attack success rate on digits.
- Model privacy and extraction toy script Outputs membership AUC, target accuracy, surrogate fidelity, and surrogate accuracy.
- RAG prompt injection guard toy script Uses a deterministic toy agent to demonstrate external-data demotion and tool-policy blocking.
- Deep Learning topic share card A 1200x630 SVG card for sharing the Deep Learning / CNN topic hub.
- Machine Learning From Scratch share card A 1200x630 SVG card for the K-means, Iris, and ML workflow topic hub.
- Student AI Projects share card A 1200x630 SVG card for handwritten digits, C classifiers, and browser demos.
- CNN convolution scan animation An 8-second Remotion animation showing how a 3x3 convolution kernel scans an input and builds a feature map.
Current route
- AI Basics Learning Roadmap Learning path step
- Machine Learning Workflow Learning path step
- Model Training and Evaluation Learning path step
- Neural Network Basics Learning path step
- Transformer Self-Attention Learning path step
- LLM Visualizer Learning path step
- Python AI Mini Practice Learning path step
- Handwritten Digit Dataset Basics Learning path step
- Handwritten Digit Softmax in C Learning path step
- Handwritten Digit Playground Notes Learning path step
- CIFAR-10 Tiny CNN Tutorial in C Learning path step
- High-Entropy Traffic Defense Notes Learning path step
- AI Security Threat Modeling Learning path step
- Adversarial Examples and Robust Evaluation Learning path step
- Data Poisoning and Backdoor Defense Learning path step
- Model Privacy and Extraction Defense Learning path step
- LLM, RAG, and Agent Security Learning path step
Next notes
- Add more image-classification and error-analysis cases
- Turn common metrics into a quick reference
- Add more AI security defense experiment notes
