Security for agents that take action

AI agents do not just generate text. They execute code, move money, delete data, and send emails. Securing them requires a different control model than securing chatbots. This is for platform and security teams deploying agents into production.

Last updated: May 20, 2026

What is AI agent security?

AI agent security is the discipline of protecting autonomous AI systems that interact with external tools, APIs, and data stores from being exploited, manipulated, or misused. It spans threat modeling, policy definition, runtime enforcement, and post-incident analysis. Unlike traditional application security, agent security must account for non-deterministic decision-making by the AI model itself.

1. The AI agent threat landscape

The shift from chatbots to agents changed the security equation. A chatbot produces text. An agent produces actions. When you give an LLM the ability to call tools, you introduce an actor into your system that exercises delegated privileges at machine speed.

Enterprise software is moving from assistants that suggest actions to agents that take them. The inventory of agents, tools, and permissions now changes faster than static review can track. Teams can often name the model and the app, but not the policy that decides whether a proposed tool call should run.

The common production gap: agents are authenticated, but their tool calls are not authorized at runtime. They have API keys that grant identity, but no policy check that governs behavior. Authentication answers "who is this?" Authorization answers "what can it do?" Agent deployments often have the first but not the second.
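A minimal sketch of the two checks, under stated assumptions: the key table, the grant table, and the function names are hypothetical and exist only to show that a call can pass the identity check and still fail the behavior check.

```python
from typing import Optional

# Hypothetical data for illustration: known API keys and per-agent tool grants.
API_KEYS = {"key-123": "billing-agent"}
ALLOWED_TOOLS = {"billing-agent": {"read_invoice"}}


def authenticate(api_key: str) -> Optional[str]:
    """Authentication: resolve an API key to an agent identity ("who is this?")."""
    return API_KEYS.get(api_key)


def authorize(agent_id: str, tool: str) -> bool:
    """Authorization: decide whether this agent may call this tool ("what can it do?")."""
    return tool in ALLOWED_TOOLS.get(agent_id, set())


def handle_tool_call(api_key: str, tool: str) -> str:
    agent_id = authenticate(api_key)
    if agent_id is None:
        return "rejected: unknown identity"
    if not authorize(agent_id, tool):
        return f"denied: {agent_id} may not call {tool}"
    return f"allowed: {agent_id} calls {tool}"
```

A deployment that stops after authenticate() would let the same valid key call any tool. The second check is the part that is often missing.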

Traditional software security

  • Deterministic behavior
  • Code review catches implementation bugs
  • Input validation at known boundaries
  • Static RBAC can be enough

AI agent security

  • Non-deterministic decisions
  • Same input can produce different tool calls
  • Attack surface includes natural language
  • Requires runtime tool-call authorization

2. Attack taxonomy: how agents get exploited

OWASP's Agentic Security Initiative and the OWASP Top 10 for Agentic Applications 2026 document the main AI-agent risk families. We have organized the relevant risks into four categories that map to the tool-call boundary where authorization operates.

Prompt injection

The most widely discussed attack vector. An adversary embeds instructions in data the agent processes, such as a web page, an email, or a database record, causing the agent to execute unintended actions. Direct injection manipulates the agent's own prompt. Indirect injection poisons the data the agent retrieves.

A shared document can carry hidden instructions that cause an agent to exfiltrate user data through a rendered image URL. The agent fetches the document, processes the hidden instruction, and encodes private information in an outbound request, all within its normal operating parameters.

Operational consequence for authorization: Prompt injection does not need to be solved at the model level alone. If the agent is blocked from executing the destructive action (exfiltrating data, calling unauthorized APIs), the impact is contained on the governed path even if the model was manipulated.

Tool abuse and misuse

Agents are given tools: file system access, shell execution, API calls, database queries. Tool abuse occurs when an agent uses a legitimate tool in an unauthorized way: running rm -rf / when it was given shell access for build scripts, or executing DROP TABLE when it was given database access for read queries.

A coding agent with legitimate database credentials can still attempt a destructive operation. The problem is not authentication. The tool is real, the credentials are valid, and the action still needs authorization before execution.

Operational consequence for authorization: Tool-level authorization evaluates not just which tools the agent can call, but what arguments are permitted. Allowing SELECT but blocking DROP is the kind of granular control that static permissions cannot provide.
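As an illustration of argument-level checks, the sketch below allows a single SELECT statement and rejects everything else. It is deliberately conservative and makes assumptions: the function name is hypothetical, the check is a regular expression rather than a SQL parser, and it will reject a harmless query that contains a semicolon inside a string literal. A production setup would likely pair a check like this with a read-only database role.

```python
import re

# Assumption for this sketch: read-only agents may run one SELECT statement.
READ_ONLY_PATTERN = re.compile(r"^\s*select\b", re.IGNORECASE)


def is_query_allowed(sql: str) -> bool:
    """Allow one SELECT statement; reject everything else."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        # Stacked statements such as "SELECT 1; DROP TABLE users".
        return False
    return bool(READ_ONLY_PATTERN.match(statement))
```

The tool (run a query) stays available. Only the argument decides whether the call goes through.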

Data exfiltration

Agents with access to sensitive data can be manipulated, or can independently decide, to transmit that data to unauthorized destinations. This includes encoding data in API request parameters, writing it to external services, embedding it in generated code, or leaking it through side channels like image URLs.

The risk is acute in healthcare (PHI under HIPAA), finance (PCI DSS cardholder data), and legal (attorney-client privileged information). An agent with read access to an EHR system and write access to an email API has enough access to cause a reportable breach.

Operational consequence for authorization: Output redaction and destination whitelisting at the tool-call level can block matched data from leaving the authorized perimeter. The agent can read patient records to answer questions, but the authorization boundary can stop or redact PHI before outbound actions on the governed path.
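A rough sketch of both controls, assuming a hypothetical host allowlist and a single pattern (US Social Security numbers) standing in for sensitive data. Real PHI detection needs far more than one regular expression, so treat the pattern as a placeholder.

```python
import re
from urllib.parse import urlparse

# Hypothetical values for illustration only.
ALLOWED_HOSTS = {"api.internal.example.com"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def is_destination_allowed(url: str) -> bool:
    """Allow outbound calls only to hosts on the allowlist."""
    host = urlparse(url).hostname
    return host is not None and host in ALLOWED_HOSTS


def redact(text: str) -> str:
    """Replace anything matching the SSN pattern before it leaves the boundary."""
    return SSN_PATTERN.sub("[REDACTED]", text)
```

An image URL pointing at an attacker-controlled host fails the destination check regardless of what the model was told to put in the query string.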

Privilege escalation

Multi-agent systems introduce delegation chains where one agent invokes another. A low-privilege agent can ask a high-privilege agent to perform actions on its behalf, effectively escalating its own permissions. Without authorization checks at governed hops, the delegation chain becomes an escalation path.

This is analogous to the confused deputy problem in traditional security, but amplified by the non-deterministic nature of LLM reasoning. An agent that discovers it can ask another agent to perform a blocked action may do so without being explicitly instructed to; it is just "solving the problem" it was given.

Operational consequence for authorization: Per-agent, tool-call authorization with delegation tracking keeps downstream agents within the permissions of the original caller. The authorization boundary is enforced at governed tool calls in the chain, not just the first one.
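One way to model this is to intersect the permissions of every agent in the chain, so a call is allowed only if the original caller and each hop after it could make it. The agent names and permission sets below are hypothetical, and intersection is one possible inheritance rule, not necessarily the one a given product uses.

```python
from functools import reduce

# Hypothetical permission sets for illustration.
PERMISSIONS = {
    "support-agent": {"read_ticket", "send_reply"},
    "admin-agent": {"read_ticket", "send_reply", "delete_account"},
}


def effective_permissions(chain: list[str]) -> set[str]:
    """Intersect the permissions of every agent in the delegation chain."""
    if not chain:
        return set()
    sets = [PERMISSIONS.get(agent, set()) for agent in chain]
    return set(reduce(lambda a, b: a & b, sets))


def is_allowed(chain: list[str], tool: str) -> bool:
    """A call passes only if every agent in the chain holds the permission."""
    return tool in effective_permissions(chain)
```

With this rule, admin-agent can delete an account when called directly, but not when support-agent is the original caller.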

3. Real-world incidents

Agent security is the boundary between a model's plan and a real side effect. These examples show where runtime authorization belongs.

Coding agent attempts destructive database operation

Failure mode

A coding agent with full database credentials can pass authentication and still attempt a destructive operation. Runtime authorization checks the actual command before the database sees it.

Document-borne data exfiltration

Indirect injection

A prompt injection hidden in a document can steer an agent to encode private information in a rendered image URL, sending it to an attacker-controlled server as part of normal document processing.

ChatGPT plugin SSRF and data leakage

2023-2024

Multiple ChatGPT plugins were found to be vulnerable to server-side request forgery (SSRF). Attackers could craft prompts that caused the agent to make requests to internal network addresses, bypassing firewall rules. The plugins acted as proxies into private infrastructure.
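A check at the tool-call boundary can refuse fetches aimed at internal addresses. The sketch below uses Python's standard ipaddress module and makes a simplifying assumption: it only inspects literal IP addresses and the name localhost. It does not resolve DNS or follow redirects, so a hostname that resolves to an internal address would pass. The function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_address(host: str) -> bool:
    """Return True for literal IPs in private, loopback, or link-local ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Hostnames would need DNS resolution, omitted here.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local


def is_fetch_allowed(url: str) -> bool:
    """Permit only http(s) URLs whose host is not an internal address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.hostname is None:
        return False
    if parsed.hostname == "localhost":
        return False
    return not is_internal_address(parsed.hostname)
```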

MCP tool poisoning attacks

2025

Malicious MCP servers can inject hidden instructions into tool descriptions that are invisible to the user but processed by the LLM, steering the agent toward data exfiltration, code changes, or unauthorized commands.

4. Security frameworks: NIST AI RMF and OWASP

Two frameworks give practical guidance for AI agent security. Understanding them is essential for building a defensible security posture and communicating risk to leadership.

NIST AI Risk Management Framework (AI 100-1)

The NIST AI RMF organizes risk management into four functions: Govern, Map, Measure, and Manage. For AI agents that can act, the four functions apply as follows:

GOVERN

Establish policies and accountability structures for AI agent deployment. Define who can deploy agents, what permissions they start with, and who approves escalation.

MAP

Identify and categorize risks specific to each agent's tool set. An agent with file system access has different risks than one with email access. Map tools to threat categories.

MEASURE

Quantify agent risk through monitoring. Track tool-call frequency, denied actions, approval response times, and policy violation trends. Measure what your agents do.

MANAGE

Enforce policies at runtime. This is where authorization lives: intercepting, evaluating, and controlling each governed tool call against defined policy. This is the enforcement function.
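The MEASURE function depends on data that MANAGE produces. As a small illustration, the sketch below counts calls per tool and computes the share of denied calls from a list of decision records. The record fields and outcome names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical decision records; the field names are assumptions for this sketch.
decisions = [
    {"tool": "send_email", "outcome": "allow"},
    {"tool": "delete_file", "outcome": "deny"},
    {"tool": "send_email", "outcome": "allow"},
    {"tool": "wire_transfer", "outcome": "approval_required"},
]


def summarize(records: list[dict]) -> dict:
    """Count tool calls per tool and compute the share of denied calls."""
    calls_per_tool = Counter(r["tool"] for r in records)
    outcomes = Counter(r["outcome"] for r in records)
    total = len(records)
    deny_rate = outcomes["deny"] / total if total else 0.0
    return {"calls_per_tool": dict(calls_per_tool), "deny_rate": deny_rate}
```

Tracking the deny rate over time is one simple way to see policy violation trends.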

OWASP Top 10 for Agentic Applications (2026)

OWASP's Agentic Security Initiative identifies critical risks in agentic applications. These representative risk families show where runtime authorization helps:

  • Excessive agency: Least-privilege policies restrict tool access to the minimum required set
  • Uncontrolled cascading effects: Tool-call authorization limits chain reactions across governed tool calls
  • Intent misalignment: Policy enforcement stays outside agent reasoning or intent
  • Prompt injection: Tool-call validation blocks governed actions even when the model was manipulated
  • Inadequate sandboxing: Authorization acts as a logical boundary around governed tool calls
  • Broken access control: Per-agent, per-tool policies add action scope beyond broad identity grants
  • Insufficient monitoring: Governed authorization decisions carry decision context for review
  • Broken delegation: Delegation chain tracking with explicit permission inheritance controls
  • Supply chain vulnerabilities: Tool-call validation inspects arguments on governed tools regardless of source
  • Data leakage: Output redaction and destination controls can run at the authorization boundary

5. Defense-in-depth: the five layers

No single control secures an agent. Defense-in-depth means applying controls at multiple layers, so that a failure at one layer has another control behind it. Organizations often have layers 1-3 but still need layer 4: runtime authorization at the tool-call enforcement point.

Layer 1: Model-level controls

System prompts, Constitutional AI, RLHF. These shape model behavior but cannot enforce it. The model can ignore, misinterpret, or be manipulated past these controls. Necessary but insufficient.

Layer 2: Input validation

Prompt injection detection, input sanitization, content filtering. Catches known attack patterns in user input. Cannot catch novel attacks, indirect injections from fetched data, or misuse that does not involve malicious input at all.

Layer 3: Network and infrastructure

Firewalls, VPCs, network segmentation, secrets management. Limits where agents can reach at the network level. Coarse-grained: can block entire hosts but not specific operations on allowed hosts.

Layer 4: Runtime authorization

Policy enforcement at the tool-call boundary. Checks governed calls, evaluates them against declarative policy, and allows, denies, or routes to human approval. Operates independently of the model's reasoning: the check runs outside the model, so prompt text cannot override it. This is what Veto provides.

Layer 5: Monitoring and response

Logging, alerting, anomaly detection, incident response. Essential for visibility but reactive by nature: it tells you what happened after the fact. Without layer 4, monitoring alone can become documentation after damage rather than control before execution.

6. Runtime authorization on the tool path

The distinction is this: prompts shape the model's reasoning, and network controls limit where it can reach. Runtime authorization controls what the agent can do at the moment it tries to do it.

This matters because agents are non-deterministic. You cannot predict every action an agent will take, write enough prompt instructions to cover every edge case, or test every tool-call sequence in advance. The durable control is enforcement on the tool path.

The tool-call boundary

Agent decides to call delete_file("/etc/passwd")
  |  Tool call intercepted by authorization boundary
  |  Policy evaluated: delete_file on /etc/* = DENY
  X  Action blocked. Agent receives denial response.
  |  Decision recorded with decision context for evidence review.
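The flow above can be sketched in a few lines. Everything here is an assumption made for illustration: the rule format, the first-match-wins ordering, the default-deny fallback, and the function names. The path is normalized before matching so that a path like /tmp/../etc/passwd cannot slip past the /etc/* rule.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical rule format: (tool name, path pattern, decision). First match wins.
RULES = [
    ("delete_file", "/etc/*", "deny"),
    ("delete_file", "/tmp/*", "allow"),
]
DEFAULT_DECISION = "deny"  # assumption: calls with no matching rule are denied


def evaluate(tool: str, path: str) -> str:
    """Return the decision of the first rule matching the tool and normalized path."""
    normalized = posixpath.normpath(path)
    for rule_tool, pattern, decision in RULES:
        if tool == rule_tool and fnmatchcase(normalized, pattern):
            return decision
    return DEFAULT_DECISION


def guarded_delete(path: str, audit_log: list) -> str:
    """Check policy before the side effect and record the decision either way."""
    decision = evaluate("delete_file", path)
    audit_log.append({"tool": "delete_file", "path": path, "decision": decision})
    if decision != "allow":
        return "denied by policy"
    return f"would delete {path}"  # the real deletion is omitted in this sketch
```

The model never sees the rules and cannot edit them. It only receives the allow or deny result.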

Runtime authorization helps across the four attack categories. Prompt injection that tries to trigger unauthorized actions can be blocked at the tool-call boundary. Tool abuse can be constrained through argument validation. Data exfiltration can be reduced through redaction and destination controls. Privilege escalation can be constrained through delegation tracking and explicit permission inheritance.

Attack                  Prompt defense   Network defense   Runtime authorization
Prompt injection        Partial          None              Full
Tool abuse              None             None              Full
Data exfiltration       Partial          Partial           Full
Privilege escalation    None             None              Full

7. Implementing agent security with Veto

Veto is the runtime authorization path for AI agents. It intercepts governed tool calls before they run, evaluates them against declarative policy, and enforces allow, deny, or approval decisions. The agent's code does not change. The model is unaware it is being governed.

Policy-as-code

Declarative YAML policies stored in your repository. Version-controlled, reviewable, auditable. Define what each agent can do with surgical precision, down to specific tool arguments and parameter ranges.
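To make "specific tool arguments and parameter ranges" concrete, here is a sketch of a declarative policy with amount thresholds. It is not Veto's policy schema, which this page does not show: the field names are invented for illustration, and JSON stands in for YAML so the example needs only the Python standard library.

```python
import json

# Hypothetical policy document. JSON stands in for YAML here so the sketch
# needs only the standard library; the field names are assumptions.
POLICY_TEXT = """
{
  "agent": "payments-agent",
  "rules": [
    {"tool": "issue_refund", "max_amount": 100, "decision": "allow"},
    {"tool": "issue_refund", "max_amount": 1000, "decision": "require_approval"}
  ],
  "default": "deny"
}
"""

policy = json.loads(POLICY_TEXT)


def decide(tool: str, amount: float) -> str:
    """Return the decision of the first rule whose tool and amount range match."""
    for rule in policy["rules"]:
        if rule["tool"] == tool and 0 <= amount <= rule["max_amount"]:
            return rule["decision"]
    return policy["default"]
```

Because the policy is data, a change to a threshold shows up as a reviewable diff in version control.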

Human review

Route review-required actions to human approval. Configurable escalation through your review channel or workspace. The agent pauses until a human approves or denies. Each governed decision and its outcome is written to a decision record.
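The pause-and-resume behavior can be pictured as a small state machine. This is a simplified, in-memory sketch with hypothetical function names. A real system would persist pending requests and notify reviewers through a channel of your choice.

```python
import itertools

_ids = itertools.count(1)
pending: dict[int, dict] = {}


def request_approval(tool: str, args: dict) -> int:
    """Park a review-required call and return a ticket id for the reviewer."""
    ticket = next(_ids)
    pending[ticket] = {"tool": tool, "args": args, "status": "pending"}
    return ticket


def review(ticket: int, approved: bool) -> None:
    """Record the human decision on a parked call."""
    pending[ticket]["status"] = "approved" if approved else "denied"


def resume(ticket: int) -> str:
    """Run the call only after an explicit approval."""
    entry = pending[ticket]
    if entry["status"] == "pending":
        return "waiting for review"
    if entry["status"] == "denied":
        return "denied by reviewer"
    return f"executing {entry['tool']}"
```

The key property is that no code path reaches execution from the pending state.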

Tool-boundary integration

Fits OpenAI, Claude, Gemini, Vercel AI SDK, Mastra, Playwright, MCP, and custom TypeScript dispatch paths. The authorization boundary wraps your tools, not your agent.
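"Wraps your tools, not your agent" can be illustrated with a plain Python decorator. This shows the general pattern only and is not Veto's API: the decorator name, the exception, and the allowlist are all hypothetical.

```python
from functools import wraps

# Hypothetical policy: tool names this agent may call.
ALLOWED = {"read_file"}


class ToolDenied(Exception):
    """Raised when policy blocks a tool call."""


def governed(tool_fn):
    """Wrap a tool so the policy check runs before the tool body."""
    @wraps(tool_fn)
    def wrapper(*args, **kwargs):
        if tool_fn.__name__ not in ALLOWED:
            raise ToolDenied(f"{tool_fn.__name__} is not permitted")
        return tool_fn(*args, **kwargs)
    return wrapper


@governed
def read_file(path: str) -> str:
    return f"contents of {path}"  # placeholder body for the sketch


@governed
def delete_file(path: str) -> str:
    return f"deleted {path}"  # never reached under the policy above
```

The agent loop that picks tools stays untouched. Only the functions it dispatches to are wrapped.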

Decision records

Each governed decision is recorded: tool name, arguments, policy matched, outcome, timestamp, agent identity, delegation chain. Queryable via workspace, exportable for SOC 2, HIPAA, GDPR, and EU AI Act evidence reporting.
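The fields listed above map naturally onto a small record type. The sketch below is one possible shape, assuming snake_case field names and JSON-lines export. The actual record format is not specified on this page.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    # Field names follow the list above; the exact schema is an assumption.
    tool_name: str
    arguments: dict
    policy_matched: str
    outcome: str
    agent_identity: str
    delegation_chain: list = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: DecisionRecord) -> str:
    """Serialize one record as a JSON line for export."""
    return json.dumps(asdict(record), sort_keys=True)
```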

8. Agent security maturity model

Where does your organization fall? Agent deployments often remain at Level 1 or 2. The gap between "authenticated" and "authorized" is where incidents happen.

L0: No controls

Agents run with full tool access. No authentication, no authorization, no logging. Common in prototypes and hackathon projects that quietly reach production.

L1: Authentication only

Agents have API keys and identity verification. You know who the agent is. You do not control what it can do. This is the common gap between identity and authorization.

L2: Prompt-based guardrails

System prompts include instructions like "never delete files" and "always ask before sending emails." These are suggestions, not enforcement. The model can ignore them, and prompt injection can override them.

L3: Runtime authorization

Declarative policies enforce authorization at the tool-call boundary. Governed calls are evaluated before execution. Review-required actions route to human approval. Governed decisions are logged. This is where Veto operates.

L4: Continuous governance

At this level, teams test policies, watch drift, and report evidence from observed agent behavior. Veto provides the decision records, policy versions, and reviewer trail those programs need.
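Testing policies can be as simple as a table of tool calls with expected decisions, run whenever the policy changes. The sketch below assumes a hypothetical policy function and expectation table. The same idea applies to any policy engine that can be called from a test.

```python
# Hypothetical policy function for the sketch.
def policy(tool: str) -> str:
    return "allow" if tool in {"read_file", "search_docs"} else "deny"


# Expected decisions, maintained alongside the policy.
EXPECTED = [
    ("read_file", "allow"),
    ("delete_file", "deny"),
    ("send_email", "deny"),
]


def failing_cases(policy_fn, cases):
    """Return (tool, expected, actual) for every case the policy gets wrong."""
    return [
        (tool, want, policy_fn(tool))
        for tool, want in cases
        if policy_fn(tool) != want
    ]
```

An empty result means the policy still matches its expectations. A non-empty one flags a regression before it reaches production.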

Frequently asked questions

What is AI agent security?
AI agent security is the discipline of protecting autonomous AI systems that interact with external tools, APIs, and data stores. Unlike traditional application security, it must account for non-deterministic behavior, prompt-based manipulation, and the fact that agents can take real-world actions like sending money, deleting data, or accessing sensitive records.

How is AI agent security different from LLM security?
LLM security focuses on the model itself: jailbreaks, hallucinations, data poisoning. Agent security extends to the actions the model takes via tools. An LLM that hallucinates is annoying. An agent that executes a hallucinated SQL query against your production database is a security incident. Agent security operates at the tool-call boundary, not the text-generation boundary.

Can prompt engineering solve agent security?
No. Prompt engineering shapes model behavior, but it is not an enforcement boundary. The model can ignore, misinterpret, or be manipulated past prompt-based instructions. Prompt injection attacks specifically target this weakness. Runtime authorization enforces policy independently of the model's reasoning: the check runs on the governed tool path, outside the model, so the agent cannot change or skip it.

What is the OWASP Top 10 for Agentic Applications?
The OWASP Agentic Security Initiative publishes agentic application risk guidance. The risks include excessive agency, cascading effects, intent misalignment, prompt injection, inadequate sandboxing, broken access control, insufficient monitoring, broken delegation, supply chain exposure, and data leakage. Runtime authorization controls the tool-call boundary where those risks become side effects.

How does NIST AI RMF apply to AI agents?
The NIST AI Risk Management Framework (AI 100-1) provides a four-function approach: Govern, Map, Measure, and Manage. For agents, the MANAGE function is critical because it is where runtime enforcement happens. Veto maps to the MANAGE function by providing policy enforcement, decision records, and human review controls that produce risk-management evidence.

What frameworks does Veto work with?
Veto wraps tool functions at the call boundary. The production TypeScript path covers OpenAI, Claude, Gemini, Vercel AI SDK, Mastra, Playwright, MCP, and custom dispatch code. Python framework pages are preview guides for teams already using those stacks. Integration wraps your tools, not your agent.

Can does not mean may. Enforce it.