Execution guardrails for AI agents
Separate the controls that shape model behavior from the controls that decide whether a tool call is allowed to run. Then put the hard gate on the tool path.
Last updated: May 20, 2026
What are AI agent guardrails?
AI agent guardrails are runtime controls that intercept, evaluate, and enforce authorization policies on tool calls made by autonomous AI agents. Unlike prompt-based instructions, guardrails operate independently of the agent's reasoning at the governed tool boundary. They are the difference between relying on model behavior and enforcing a decision before the tool runs.
Why guardrails matter when tools run
Agents have moved from planning into operation: writing records, issuing refunds, deploying code, and calling internal APIs. The governance gap appears when those agents inherit broad credentials without a policy check on each action.
The consequences are visible wherever agents touch real systems: destructive database calls, unauthorized transfers, exposed customer data, and infrastructure changes nobody approved. In each case, authentication was not enough. The missing control was tool-call authorization.
That distinction is the entire problem. Authentication tells you who the agent is. Authorization tells you what it may do. An authenticated agent without authorization is a known identity with no runtime policy on the tools it calls.
Agents now write records, send messages, and call internal APIs.
Authentication establishes access, not whether the action should run.
Without authorization, the first review happens after the side effect.
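The split between the two checks can be sketched in a few lines. This is a minimal illustration, not any SDK's API: the token table, the role map, and the names `authenticateAgent` and `authorizeAction` are all hypothetical.

```typescript
// Hypothetical names throughout; nothing here is a real SDK API.
type AgentIdentity = { agentId: string; roles: string[] };
type RequestedAction = { tool: string; args: Record<string, unknown> };

// Authentication: map a credential to an identity ("who is this agent?").
const knownTokens = new Map<string, AgentIdentity>([
  ["token-support-bot", { agentId: "support-bot", roles: ["support"] }],
]);

function authenticateAgent(token: string): AgentIdentity | undefined {
  return knownTokens.get(token);
}

// Authorization: decide per action ("may this agent run this tool call?").
const allowedToolsByRole = new Map<string, Set<string>>([
  ["support", new Set(["lookup_order", "issue_refund"])],
]);

function authorizeAction(agent: AgentIdentity, action: RequestedAction): boolean {
  return agent.roles.some(
    (role) => allowedToolsByRole.get(role)?.has(action.tool) ?? false,
  );
}

const supportBot = authenticateAgent("token-support-bot");
if (supportBot !== undefined) {
  // Same authenticated agent, two different authorization answers.
  console.log(authorizeAction(supportBot, { tool: "lookup_order", args: {} }));
  console.log(authorizeAction(supportBot, { tool: "delete_database", args: { name: "prod" } }));
}
```

The agent authenticates once, but the authorization question is asked again for every tool call, which is where the answer can change.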
Taxonomy of guardrail approaches
Guardrail means different things at different enforcement points. Some guardrails influence model behavior. Others control execution. Use the taxonomy below to locate where the control sits.
1. Prompt-based constraints
Advisory only
Instructions embedded in the system prompt: "Do not delete files," "Never access financial data," "Always ask before sending emails." These are the most common form of "guardrails" and the weakest. They live inside the model's context window, compete with other instructions, and can be overridden by jailbreaks, prompt injection, or the model choosing a path the prompt did not cover.
2. Input filtering and prompt-injection detection
Input-layer control
Tools such as Lakera Guard and cloud-provider shields scan inputs before they reach the model. They detect prompt injections, jailbreaks, PII in prompts, and malicious content. They are effective at protecting the model from malicious inputs, but they do not control what the model does with good inputs. An agent given legitimate access can still take unauthorized actions.
3. Output filtering and content moderation
Output-layer control
Validators check model outputs before they reach the user. Guardrails AI is the canonical example: a Python framework of composable validators that intercept LLM responses for toxicity, hallucination, PII leakage, and format compliance. Output validation is essential for user-facing outputs, but by the time you are filtering outputs, the agent has already acted. If it deleted a database, the output filter catches the response, not the deletion.
4. Dialog flow control
Conversation-layer control
NVIDIA NeMo Guardrails uses Colang, a domain-specific language, to define conversational flows across five pipeline stages: input, dialog, retrieval, execution, and output rails. It is built for conversational agents and can model entire dialog trees, but it is primarily designed for chatbot-style interactions and requires learning Colang. For tool-calling agents that take real-world actions, dialog control alone is insufficient.
5. Runtime authorization
Execution-layer control
Runtime authorization intercepts tool calls at the execution boundary, before the tool runs. It evaluates each call against declarative policies that define what the agent may do, with what arguments, under what conditions. The policy holds on the governed path because enforcement happens outside the model entirely. This is Veto's approach.
| Approach | Stops actions? | Model-bound? | Execution record? | Human approval? |
|---|---|---|---|---|
| Prompt constraints | No | Yes | | |
| Input filtering | Partial | N/A | Detection log | |
| Output filtering | No | N/A | Validation log | |
| Dialog flow control | Partial | Partial | | |
| Runtime authorization | Yes | No | Yes | Yes |
How Veto policies work
Veto is an open-source runtime authorization SDK. It wraps your agent's tools and enforces declarative policies before governed tools execute. The authorization boundary is invisible to the model.
Intercept
Tool calls are intercepted before execution. The SDK wraps each tool function, capturing the tool name, arguments, and context. Your agent loop stays intact while authorization moves to an explicit boundary.
Evaluate
The policy engine checks the tool call against declarative YAML rules. Policies can match on tool name, argument values, time of day, caller identity, rate limits, and custom conditions. Evaluation runs in-process, before side effects.
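As a rough sketch of what evaluation involves, the rule matching can be modeled as a first-match scan over an ordered list. The rule shape below is an assumption made for illustration: it uses TypeScript objects with a `when` predicate rather than Veto's actual YAML schema, and `PolicyRule`, `evaluateCall`, and the rule ids are hypothetical names.

```typescript
// Hypothetical rule shape for illustration; the policies described here are YAML.
type Outcome = "allow" | "deny" | "escalate";

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
  caller: string;
  hourUtc: number; // 0-23, supplied by the caller so evaluation stays deterministic
}

interface PolicyRule {
  id: string;
  tool: string; // exact tool name, or "*" for any tool
  when?: (call: ToolCall) => boolean; // optional condition on args, caller, or time
  outcome: Outcome;
}

// First matching rule wins; anything unmatched falls through to the fallback.
function evaluateCall(
  rules: PolicyRule[],
  call: ToolCall,
  fallback: Outcome = "deny",
): { outcome: Outcome; ruleId: string } {
  for (const rule of rules) {
    const toolMatches = rule.tool === "*" || rule.tool === call.tool;
    if (toolMatches && (rule.when === undefined || rule.when(call))) {
      return { outcome: rule.outcome, ruleId: rule.id };
    }
  }
  return { outcome: fallback, ruleId: "default" };
}

const exampleRules: PolicyRule[] = [
  {
    id: "protect-prod",
    tool: "delete_database",
    when: (call) => {
      const target = call.args["name"];
      return typeof target === "string" && target.includes("prod");
    },
    outcome: "deny",
  },
  {
    id: "refunds-after-hours",
    tool: "issue_refund",
    when: (call) => call.hourUtc < 9 || call.hourUtc >= 17,
    outcome: "escalate",
  },
  { id: "refunds-business-hours", tool: "issue_refund", outcome: "allow" },
  { id: "read-orders", tool: "lookup_order", outcome: "allow" },
];
```

Ordering matters in this sketch: the after-hours refund rule sits above the general refund rule so the narrower condition is checked first, and the default of `deny` means a tool nobody wrote a rule for does not run.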
Enforce
Three outcomes: allow (the tool executes normally), deny (the agent receives a configurable error), or escalate (the action is paused and routed to a human through the configured approval path). Governed actions can emit decision records.
Agent: "I need to call delete_database(name='prod')"
Veto SDK: Intercepted. Evaluating against policy
Policy: deny tool=delete_database where args.name contains 'prod'
Result: DENIED. Agent receives: "Operation not permitted on production databases."
Decision record: { tool: "delete_database", args: { name: "prod" }, policy: "protect-prod", outcome: "deny" }
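The trace above can be reproduced with a small wrapper. This is a sketch under stated assumptions, not the Veto SDK's API: `guardTool`, `Verdict`, `decisionLog`, and the stand-in `deleteDatabase` tool are hypothetical, the tool is synchronous for brevity, and the fall-through verdict is `allow` only to keep the demo short.

```typescript
// Hypothetical wrapper for illustration; this is not the Veto SDK's API.
type GateOutcome = "allow" | "deny" | "escalate";
type ToolArgs = Record<string, unknown>;
type ToolFn = (args: ToolArgs) => string;

interface Verdict {
  outcome: GateOutcome;
  policy: string;
  message?: string; // optional text returned to the agent on deny
}

interface DecisionRecord {
  tool: string;
  args: ToolArgs;
  policy: string;
  outcome: GateOutcome;
}

const decisionLog: DecisionRecord[] = [];

// Returns a function with the same shape as the tool, so the agent loop is unchanged.
function guardTool(
  toolName: string,
  run: ToolFn,
  decide: (toolName: string, args: ToolArgs) => Verdict,
): ToolFn {
  return (args) => {
    const verdict = decide(toolName, args);
    // Record every decision, including the ones that allow the call.
    decisionLog.push({ tool: toolName, args, policy: verdict.policy, outcome: verdict.outcome });
    if (verdict.outcome === "allow") return run(args); // the only path that reaches the tool
    if (verdict.outcome === "escalate") return "Action paused pending human approval.";
    return verdict.message ?? "Operation not permitted.";
  };
}

// A stand-in tool whose side effect we can observe.
const droppedDatabases: string[] = [];
const deleteDatabase: ToolFn = (args) => {
  droppedDatabases.push(String(args["name"]));
  return "deleted";
};

const protectProd = (toolName: string, args: ToolArgs): Verdict => {
  const target = args["name"];
  if (toolName === "delete_database" && typeof target === "string" && target.includes("prod")) {
    return {
      outcome: "deny",
      policy: "protect-prod",
      message: "Operation not permitted on production databases.",
    };
  }
  return { outcome: "allow", policy: "default-allow" };
};

const guardedDelete = guardTool("delete_database", deleteDatabase, protectProd);
const deniedReply = guardedDelete({ name: "prod" });
console.log(deniedReply, droppedDatabases.length, JSON.stringify(decisionLog[0]));
```

The denied call never reaches `deleteDatabase`, yet it still leaves a record. A real deployment would likely default to deny instead of allow and would hand escalations to an approval queue rather than returning a string.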
Use cases by industry
Guardrails by workflow. Each use case shows the policy patterns, failure modes, and evidence artifacts that matter before an agent is allowed to act.
Transaction limits, approval workflows, SOX evidence review, payment authorization
Browser Agents: URL allowlisting, form protection, credential isolation, download controls
DevOps Agents: Shell command filtering, infrastructure change protection, deployment gates
Data Agents: Query validation, PII protection, row-level access, bulk extraction prevention
Customer Support: Response validation, data access controls, escalation policies, refund limits
Sales Agents: CRM write limits, discount authorization, contract approval, data access
Research Agents: Source validation, extraction limits, IP protection, citation requirements
Enterprise Agents: SSO integration, decision records, multi-tenant isolation, RBAC policies
Healthcare Agents: PHI access control, HIPAA evidence, clinical workflow approvals
Legal Agents: Document access control, privilege protection, confidentiality enforcement
Insurance Agents: Claims authorization, underwriting controls, review records
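Two of the patterns listed above, refund limits and shell command filtering, are simple enough to sketch directly. The function names, the thresholds, and the allowlist are made-up values for illustration, and the shell check is deliberately conservative rather than a complete parser.

```typescript
// Hypothetical checks for illustration; thresholds and allowlists are made-up values.
type PatternOutcome = "allow" | "deny" | "escalate";

// Refund limits (customer support): small refunds pass, larger ones go to a human.
function checkRefund(amountUsd: number, autoApproveLimitUsd: number): PatternOutcome {
  if (!Number.isFinite(amountUsd) || amountUsd <= 0) return "deny"; // malformed amount
  return amountUsd <= autoApproveLimitUsd ? "allow" : "escalate";
}

// Shell command filtering (DevOps): allowlist the binary, refuse shell metacharacters.
function checkShellCommand(command: string, allowedBinaries: string[]): PatternOutcome {
  // Reject chaining, substitution, and redirection rather than trying to parse them.
  if (/[;&|`$<>\n]/.test(command)) return "deny";
  const binary = command.trim().split(/\s+/)[0];
  const isAllowed = binary !== undefined && allowedBinaries.some((name) => name === binary);
  return isAllowed ? "allow" : "deny";
}

console.log(checkRefund(25, 100), checkRefund(250, 100));
console.log(checkShellCommand("kubectl get pods", ["kubectl", "git"]));
console.log(checkShellCommand("kubectl get pods; rm -rf /", ["kubectl", "git"]));
```

Both checks look only at the arguments of the call, which is what lets them run before the side effect instead of reviewing it afterward.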
Integration surface
Veto integrates at the tool-dispatch boundary across common agent frameworks and LLM SDKs. Wrap the tool path; keep the agent loop.
Comparing guardrail tools
The market has produced several categories of tools that call themselves "guardrails." They solve different problems. Understanding the differences helps you pick the right combination.
| Tool | Category | What it does | Controls actions? |
|---|---|---|---|
| Veto | Runtime authorization | Intercepts tool calls, enforces policies, approval workflows | Yes |
| NeMo Guardrails | Dialog flow control | Programmable conversation flows using Colang DSL | Partial |
| Guardrails AI | Output validation | Composable validators for LLM output quality and safety | No |
| Lakera Guard | Input security | Prompt injection detection, PII scanning, content filtering | |
| Galileo | Observability + moderation | Hallucination detection, toxicity scoring, runtime monitoring | Partial |
| Arthur AI | AI monitoring | Model performance monitoring, bias detection, observability | |
Implementation guide
A production rollout has four parts: install the SDK, write the first policy, wrap the tool path, and replay real calls before enforcement.
Step 1: Install the SDK
Run npm install veto-sdk. The production TypeScript SDK is open source under Apache-2.0.
Step 2: Define policies
Write declarative YAML policies, or use the workspace's policy generator to create them from natural language. Policies define which tools are allowed, denied, or require approval, with optional argument-level conditions.
Step 3: Wrap your tools
Import the Veto SDK and wrap your tool functions. Your agent loop stays intact. The model does not know authorization exists.
Step 4: Test and deploy
Use the CLI to test policies locally. Use the playground to simulate tool calls. Deploy to production with environment-specific policies for dev, staging, and production.
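Replaying real calls before enforcement can be approximated with a dry run: evaluate recorded calls against a candidate policy, execute nothing, and inspect how the outcomes fall. The harness below is a hypothetical sketch, not the CLI or playground mentioned above, and the recorded calls are invented sample data.

```typescript
// Hypothetical replay harness for illustration; not a real CLI or SDK feature.
type ReplayOutcome = "allow" | "deny" | "escalate";

interface RecordedCall {
  tool: string;
  args: Record<string, unknown>;
}

type CandidatePolicy = (call: RecordedCall) => ReplayOutcome;

// Evaluates recorded calls against a policy without executing any tool.
function replayCalls(
  calls: RecordedCall[],
  policy: CandidatePolicy,
): Record<ReplayOutcome, RecordedCall[]> {
  const buckets: Record<ReplayOutcome, RecordedCall[]> = { allow: [], deny: [], escalate: [] };
  for (const call of calls) {
    buckets[policy(call)].push(call);
  }
  return buckets;
}

// Made-up sample of previously observed calls.
const recordedCalls: RecordedCall[] = [
  { tool: "lookup_order", args: { orderId: "A-1" } },
  { tool: "issue_refund", args: { amountUsd: 40 } },
  { tool: "issue_refund", args: { amountUsd: 900 } },
  { tool: "delete_database", args: { name: "prod" } },
];

const candidatePolicy: CandidatePolicy = (call) => {
  if (call.tool === "lookup_order") return "allow";
  if (call.tool === "issue_refund") {
    const amount = call.args["amountUsd"];
    return typeof amount === "number" && amount <= 100 ? "allow" : "escalate";
  }
  return "deny"; // default deny for anything the policy does not name
};

const replayReport = replayCalls(recordedCalls, candidatePolicy);
console.log(
  `allow=${replayReport.allow.length} deny=${replayReport.deny.length} escalate=${replayReport.escalate.length}`,
);
```

Reviewing the deny and escalate buckets before turning enforcement on shows which legitimate calls a policy would have blocked, so the rules can be adjusted without interrupting a live agent.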
Compliance and regulation
Regulated teams need evidence that AI systems are scoped, controlled, and reviewable. Guardrails are one practical way to produce that evidence before an incident or audit.
EU AI Act (phased through 2028)
High-risk AI systems can require risk mitigation measures, human oversight mechanisms, and logging capabilities. Runtime authorization with decision records and human approval creates evidence for those requirements. See the EU AI Act mapping.
SOC 2 evidence
SOC 2 audits expect access-control evidence, activity review, and evidence that policy enforcement is operating. Veto's decision records are exportable in formats compatible with SOC 2 evidence collection. Policy versioning preserves evidence of control changes over time.
HIPAA
Requires access controls for Protected Health Information (PHI). Guardrails can enforce row-level access, block bulk extraction, and record governed access decisions for audit.
GDPR
Requires data minimization, purpose limitation, and accountability. Guardrails enforce what data an agent can access and for what purpose, with decision records providing accountability evidence.
Frequently asked questions
What are AI agent guardrails?
How do guardrails differ from prompt engineering?
What is the difference between input guardrails and runtime authorization?
Do guardrails slow down my agent?
Can I use guardrails with my existing agent code?
What happens when a guardrail blocks an action?
How are AI guardrails different from traditional API rate limiting?
Do I need guardrails if my agent only has read access?
Are AI agent guardrails required by regulation?
What is the difference between guardrails and alignment?
Make the tool path enforceable.
Open source. Local enforcement. Policy checks before execution.