
Execution guardrails for AI agents

Separate the controls that shape model behavior from the controls that decide whether a tool call is allowed to run. Then put the hard gate on the tool path.

Last updated: May 20, 2026

What are AI agent guardrails?

AI agent guardrails are runtime controls that intercept, evaluate, and enforce authorization policies on tool calls made by autonomous AI agents. Unlike prompt-based instructions, guardrails operate independently of the agent's reasoning at the governed tool boundary. They are the difference between relying on model behavior and enforcing a decision before the tool runs.

Why guardrails matter when tools run

Agents have moved from planning into operation: writing records, issuing refunds, deploying code, and calling internal APIs. The governance gap appears when those agents inherit broad credentials without a policy check on each action.

The consequences are visible wherever agents touch real systems: destructive database calls, unauthorized transfers, exposed customer data, and infrastructure changes nobody approved. In each case, authentication was not enough. The missing control was tool-call authorization.

That distinction is the entire problem. Authentication tells you who the agent is. Authorization tells you what it may do. An agent that is authenticated but not authorized holds valid credentials with no runtime policy on how it uses them.

Live tools: agents now write records, send messages, and call internal APIs.

Valid keys: authentication establishes access, not whether the action should run.

No gate: without authorization, the first review happens after the side effect.

Taxonomy of guardrail approaches

The word "guardrail" means different things at different enforcement points. Some guardrails influence model behavior. Others control execution. Use the taxonomy below to locate where the control sits.

1. Prompt-based constraints

Advisory only

Instructions embedded in the system prompt: "Do not delete files," "Never access financial data," "Always ask before sending emails." These are the most common form of "guardrails" and the weakest. They live inside the model's context window, compete with other instructions, and can be overridden by jailbreaks, prompt injection, or the model choosing a path the prompt did not cover.

Strengths:
Fast to add, weak on the tool path
Useful for intent framing
Works as policy documentation

Limits:
Model can ignore or misinterpret
No decision record
Vulnerable to prompt injection

2. Input filtering and prompt-injection detection

Input-layer control

Tools like Lakera Guard and cloud provider shields scan inputs before they reach the model. They detect prompt injections, jailbreaks, PII in prompts, and malicious content. They are effective at protecting the model from malicious inputs, but they do not control what the model does with legitimate inputs. An agent given legitimate access can still take unauthorized actions.

Strengths:
Blocks prompt injection attacks
Input screening before model use
Works as a pre-processing layer

Limits:
Does not control agent actions
Cannot enforce tool-level policies
No pre-action approval path by itself

3. Output filtering and content moderation

Output-layer control

Validators that check model outputs before they reach the user. Guardrails AI is the canonical example: a Python framework of composable validators that intercept LLM responses for toxicity, hallucination, PII leakage, and format compliance. Essential for user-facing outputs, but by the time you are filtering outputs, the agent has already acted. If it deleted a database, the output filter catches the response, not the deletion.

Strengths:
Catches toxic or incorrect outputs
Good for content quality
Extensible validator ecosystem

Limits:
Post-action: damage already done
Does not govern tool execution
No approval workflows

4. Dialog flow control

Conversation-layer control

NVIDIA NeMo Guardrails uses Colang, a domain-specific language, to define conversational flows across five pipeline stages: input, dialog, retrieval, execution, and output rails. Dialog-flow control is built for conversational agents, modeling entire dialog trees. However, it is primarily designed for chatbot-style interactions and requires learning Colang. For tool-calling agents that take real-world actions, dialog control alone is insufficient.

Strengths:
Fine-grained conversation control
Open source (Apache-2.0)
Parallel rail execution

Limits:
Requires learning Colang DSL
Optimized for chatbots, not tool agents
No action-approval workflow by itself

5. Runtime authorization

Execution-layer control

Runtime authorization intercepts tool calls at the execution boundary, before the tool runs. It evaluates each call against declarative policies that define what the agent may do, with what arguments, under what conditions. Because enforcement happens outside the model entirely, the decision holds on the governed path no matter what the model was prompted to do. This is Veto's approach.

Stops disallowed actions before execution
Model-agnostic: policy matches tool calls, not model vendor
Human review approval flows
Decision record for evidence review
Invisible to the model
Policy-as-code, version-controlled

Approach | Stops actions? | Model-bound? | Execution record? | Human approval?
Prompt constraints | No | Yes | No | No
Input filtering | Partial | N/A | Detection log | No
Output filtering | No | N/A | Validation log | No
Dialog flow control | Partial | Partial | - | No
Runtime authorization | Yes | No | Yes | Yes
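
To make the execution-layer row concrete, here is a minimal TypeScript sketch of a gate on the tool path. Every name in it (`guardTool`, `paymentPolicy`, the `send_payment` tool and its 100-unit limit) is a hypothetical chosen for illustration, not Veto's API. It only shows that the check runs outside the model and before the side effect.

```typescript
// Hypothetical sketch: none of these names come from the Veto SDK.
type ToolArgs = Record<string, unknown>;
type ToolFn = (args: ToolArgs) => string;
type AllowCheck = (toolName: string, args: ToolArgs) => boolean;

// Wrap a tool so the policy check runs before the tool body.
function guardTool(toolName: string, tool: ToolFn, isAllowed: AllowCheck): ToolFn {
  return (args) => {
    if (!isAllowed(toolName, args)) {
      // The tool body is never invoked, so no side effect occurs.
      return `denied: ${toolName}`;
    }
    return tool(args);
  };
}

// A stand-in tool that records its side effects.
const sentPayments: number[] = [];
const sendPayment: ToolFn = (args) => {
  const amount = Number(args.amount);
  sentPayments.push(amount);
  return `sent ${amount}`;
};

// Assumed policy: payments above 100 are not allowed.
// A missing or non-numeric amount becomes NaN and fails the comparison.
const paymentPolicy: AllowCheck = (toolName, args) =>
  toolName !== "send_payment" || Number(args.amount) <= 100;

const guardedSendPayment = guardTool("send_payment", sendPayment, paymentPolicy);

const blocked = guardedSendPayment({ amount: 5000 }); // "denied: send_payment"
const allowed = guardedSendPayment({ amount: 25 });   // "sent 25"
```

After both calls, `sentPayments` holds only the 25-unit payment. The denied call never reached the tool, whatever the model intended.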

Where runtime authorization belongs

The approaches above are not mutually exclusive, and production systems are stronger when they layer several of them. If you start with one layer, runtime authorization gives you the clearest coverage of what the agent can actually do. Here is why.

It protects the world, not the model

Input and output filtering protect the model from untrusted data. Runtime authorization limits which model-selected actions can reach systems, data, and users. When an agent has tools that can write, delete, transfer, or send, the risk is not what goes into the model but what comes out as action.

Outside the prompt by design

The model does not need to see the authorization logic. It requests an action; a separate system decides whether to allow it. Like a valet key: the constraint is structural, not conversational.

It routes human review

Because enforcement happens before execution, you can pause an action and route it to a human for approval. Prompt-based constraints and output filters have no such pause point: the first cannot stop the call, and the second runs after it. Human approval before execution is the safest pattern for high-stakes operations like financial transactions, data deletions, or external communications.
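
One way to picture the pause is a queue that parks the tool call until a reviewer decides. The sketch below is an assumption-laden illustration: `ApprovalQueue`, `escalate`, and `resolve` are hypothetical names, and a real approval path would add persistence, notifications, and reviewer identity.

```typescript
// Hypothetical sketch of pausing a tool call until a human decides.
type PendingAction = {
  id: number;
  description: string;
  run: () => string; // the deferred tool call
};

class ApprovalQueue {
  private pending: PendingAction[] = [];
  private nextId = 1;

  // Park the action instead of running it; return a ticket id for the reviewer.
  escalate(description: string, run: () => string): number {
    const id = this.nextId++;
    this.pending.push({ id, description, run });
    return id;
  }

  // A human decision releases or discards the parked action.
  resolve(id: number, approved: boolean): string {
    const index = this.pending.findIndex((action) => action.id === id);
    if (index === -1) {
      throw new Error(`no pending action with id ${id}`);
    }
    const [action] = this.pending.splice(index, 1);
    return approved ? action.run() : `rejected: ${action.description}`;
  }

  pendingCount(): number {
    return this.pending.length;
  }
}

const refundsIssued: number[] = [];
const approvals = new ApprovalQueue();

// The agent asked for a large refund; the gate parks it rather than executing.
const ticket = approvals.escalate("refund 5000 to order 1042", () => {
  refundsIssued.push(5000);
  return "refund issued";
});
const issuedBeforeReview = refundsIssued.length; // 0: nothing has run yet
const reviewOutcome = approvals.resolve(ticket, true); // runs only after approval
```

The side effect lives inside the deferred `run` callback, so it cannot happen until `resolve` is called with an approval.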

It produces reviewable decision records

Governed decisions can be recorded with the tool name, arguments, matched policy, outcome, timestamp, and approver (if applicable). These are the fields regulators, auditors, and review teams typically ask for: evidence that access controls, oversight, and activity records exist for the AI system being reviewed.
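
The fields listed above map naturally onto a typed record. The shape below is a sketch built only from that list. The type name, the field names, and the one-JSON-object-per-line log format are assumptions, not Veto's actual export format.

```typescript
// Hypothetical record shape built from the fields named above.
type DecisionOutcome = "allow" | "deny" | "escalate";

type DecisionRecord = {
  tool: string;
  args: Record<string, unknown>;
  policy: string;    // name of the matched policy
  outcome: DecisionOutcome;
  timestamp: string; // ISO 8601, UTC
  approver?: string; // present only when a human reviewed the action
};

function buildDecisionRecord(
  tool: string,
  args: Record<string, unknown>,
  policy: string,
  outcome: DecisionOutcome,
  at: Date,
  approver?: string,
): DecisionRecord {
  const record: DecisionRecord = {
    tool,
    args,
    policy,
    outcome,
    timestamp: at.toISOString(),
  };
  if (approver !== undefined) {
    record.approver = approver;
  }
  return record;
}

// Serialize one record per line so the log can be appended to and exported.
function toLogLine(record: DecisionRecord): string {
  return JSON.stringify(record);
}

const sampleRecord = buildDecisionRecord(
  "issue_refund",
  { orderId: "1042", amount: 5000 },
  "refunds-over-limit",
  "escalate",
  new Date(Date.UTC(2026, 0, 15, 9, 30, 0)),
  "ops-lead@example.com",
);
const sampleLine = toLogLine(sampleRecord);
```

Passing the timestamp in as a `Date` rather than reading the clock inside the function keeps the record deterministic and easy to test.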

How Veto policies work

Veto is an open-source runtime authorization SDK. It wraps your agent's tools and enforces declarative policies before governed tools execute. The authorization boundary is invisible to the model.

1. Intercept

Tool calls are intercepted before execution. The SDK wraps each tool function, capturing the tool name, arguments, and context. Your agent loop stays intact while authorization moves to an explicit boundary.
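
A wrapper of this kind can be sketched with a generic higher-order function. The names `intercept`, `CapturedCall`, and `CallContext` are hypothetical and do not reflect the SDK's real signatures. The sketch shows only the capture step, with the wrapped tool keeping its original parameter and return types.

```typescript
// Hypothetical sketch: `intercept` and `CapturedCall` are illustrative names.
type CallContext = { agentId: string; sessionId: string };

type CapturedCall = {
  tool: string;
  args: unknown[];
  context: CallContext;
};

// Wrap a tool so every call is captured first. The wrapper keeps the
// original parameter and return types, so call sites do not change.
function intercept<A extends unknown[], R>(
  tool: string,
  fn: (...args: A) => R,
  context: CallContext,
  onCall: (call: CapturedCall) => void,
): (...args: A) => R {
  return (...args: A): R => {
    onCall({ tool, args, context });
    return fn(...args);
  };
}

// An existing tool, unchanged.
function sendEmail(to: string, subject: string): string {
  return `sent "${subject}" to ${to}`;
}

const captured: CapturedCall[] = [];
const wrappedSendEmail = intercept(
  "send_email",
  sendEmail,
  { agentId: "support-bot", sessionId: "s-1" },
  (call) => captured.push(call),
);

// The agent loop calls the wrapped tool exactly as it called the original.
const emailResult = wrappedSendEmail("ana@example.com", "Your refund");
```

In this sketch `onCall` only observes. A policy decision would replace that hook, which is where evaluation and enforcement come in.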

2. Evaluate

The policy engine checks the tool call against declarative YAML rules. Policies can match on tool name, argument values, time of day, caller identity, rate limits, and custom conditions. Evaluation runs in-process, before side effects.
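
The source does not show Veto's YAML schema, so the sketch below uses a hypothetical in-memory rule format to illustrate matching on tool name, argument values, time of day, and rate limits. The first-match-wins ordering and the default-deny fallback are assumptions made for this example.

```typescript
// Hypothetical rule format; the source does not show Veto's YAML schema.
type RuleEffect = "allow" | "deny" | "escalate";

type EvalCall = {
  tool: string;
  args: Record<string, unknown>;
  caller: string;
  hourUtc: number;         // 0 to 23, passed in so evaluation is deterministic
  callsThisMinute: number; // calls already made by this caller to this tool
};

type PolicyRule = {
  name: string;
  tool: string;
  effect: RuleEffect;
  when?: (call: EvalCall) => boolean; // extra condition; absent means "always"
};

// First matching rule wins; anything unmatched is denied (an assumed default).
function evaluate(rules: PolicyRule[], call: EvalCall): { effect: RuleEffect; rule: string } {
  for (const rule of rules) {
    if (rule.tool === call.tool && (rule.when === undefined || rule.when(call))) {
      return { effect: rule.effect, rule: rule.name };
    }
  }
  return { effect: "deny", rule: "default-deny" };
}

const refundRules: PolicyRule[] = [
  { name: "refund-rate-limit", tool: "issue_refund", effect: "deny",
    when: (c) => c.callsThisMinute >= 5 },
  { name: "refund-after-hours", tool: "issue_refund", effect: "escalate",
    when: (c) => c.hourUtc < 8 || c.hourUtc >= 18 },
  { name: "large-refund", tool: "issue_refund", effect: "escalate",
    when: (c) => {
      const amount = c.args.amount;
      return typeof amount === "number" && amount > 500;
    } },
  { name: "refund-default", tool: "issue_refund", effect: "allow" },
];

const baseCall = { tool: "issue_refund", caller: "support-bot", callsThisMinute: 0 };
const small = evaluate(refundRules, { ...baseCall, args: { amount: 40 }, hourUtc: 10 });  // allow
const large = evaluate(refundRules, { ...baseCall, args: { amount: 900 }, hourUtc: 10 }); // escalate
const night = evaluate(refundRules, { ...baseCall, args: { amount: 40 }, hourUtc: 23 });  // escalate
const burst = evaluate(refundRules, { ...baseCall, args: { amount: 40 }, hourUtc: 10, callsThisMinute: 5 }); // deny
```

Rule order matters under first-match-wins: the rate-limit rule sits first so a burst is denied even when the amount and hour would otherwise pass.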

3. Enforce

Each call ends in one of three outcomes: allow (the tool executes normally), deny (the agent receives a configurable error), or escalate (the action is paused and routed to a human through the configured approval path). Governed actions can emit decision records.
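
The three outcomes fit a discriminated union, which lets the compiler check that every branch is handled. The type and function names here are hypothetical. Only the outcome names (allow, deny, escalate) come from the text above.

```typescript
// Hypothetical types; the outcome names follow the three outcomes above.
type EnforceDecision =
  | { outcome: "allow" }
  | { outcome: "deny"; message: string }
  | { outcome: "escalate"; approvalRoute: string };

type EnforceResult =
  | { status: "executed"; value: string }
  | { status: "blocked"; error: string }
  | { status: "pending"; route: string };

// Apply a decision to a tool call. Only the "allow" branch invokes the tool.
function enforce(decision: EnforceDecision, runTool: () => string): EnforceResult {
  switch (decision.outcome) {
    case "allow":
      return { status: "executed", value: runTool() };
    case "deny":
      return { status: "blocked", error: decision.message };
    case "escalate":
      return { status: "pending", route: decision.approvalRoute };
  }
}

let toolRuns = 0;
const countedTool = () => {
  toolRuns += 1;
  return "ok";
};

const allowedResult = enforce({ outcome: "allow" }, countedTool);
const deniedResult = enforce({ outcome: "deny", message: "Operation not permitted." }, countedTool);
const pendingResult = enforce(
  { outcome: "escalate", approvalRoute: "finance-approvers" },
  countedTool,
);
```

After the three calls `toolRuns` is 1, because the deny and escalate branches return without touching the tool.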

Agent: "I need to call delete_database(name='prod')"
Veto SDK: Intercepted. Evaluating against policy
Policy: deny tool=delete_database where args.name contains 'prod'
Result: DENIED. Agent receives: "Operation not permitted on production databases."
Decision record: { tool: "delete_database", args: { name: "prod" }, policy: "protect-prod", outcome: "deny" }
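
The trace above can be reproduced as a small runnable sketch. The function names and the `default-allow` policy label are hypothetical. The tool name, the `protect-prod` policy name, the denial message, and the record fields are taken from the trace.

```typescript
// Hypothetical sketch reproducing the trace; not the Veto SDK's API.
type TraceRecord = {
  tool: string;
  args: { name: string };
  policy: string;
  outcome: "allow" | "deny";
};

const traceLog: TraceRecord[] = [];
const deletedDatabases: string[] = [];

// The underlying tool: in this sketch it only records what it would delete.
function deleteDatabase(args: { name: string }): string {
  deletedDatabases.push(args.name);
  return `Deleted ${args.name}.`;
}

// Policy "protect-prod": deny delete_database where args.name contains "prod".
function guardedDeleteDatabase(args: { name: string }): string {
  const denied = args.name.includes("prod");
  traceLog.push({
    tool: "delete_database",
    args,
    policy: denied ? "protect-prod" : "default-allow",
    outcome: denied ? "deny" : "allow",
  });
  if (denied) {
    return "Operation not permitted on production databases.";
  }
  return deleteDatabase(args);
}

const agentSees = guardedDeleteDatabase({ name: "prod" });
```

A substring match is deliberately broad: it would also block a database named `product_catalog`. Whether that is the right trade-off depends on your naming conventions.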

Use cases by industry

Guardrails by workflow. Each use case shows the policy patterns, failure modes, and evidence artifacts that matter before an agent is allowed to act.

Browse use cases

Integration surface

Veto integrates at the tool-dispatch boundary across common agent frameworks and LLM SDKs. Wrap the tool path; keep the agent loop.

Browse integrations

Comparing guardrail tools

The market has produced several categories of tools that call themselves "guardrails." They solve different problems. Understanding the differences helps you pick the right combination.

Tool | Category | What it does | Controls actions?
Veto | Runtime authorization | Intercepts tool calls, enforces policies, approval workflows | Yes
NeMo Guardrails | Dialog flow control | Programmable conversation flows using Colang DSL | Partial
Guardrails AI | Output validation | Composable validators for LLM output quality and safety | No
Lakera Guard | Input security | Prompt injection detection, PII scanning, content filtering | No
Galileo | Observability + moderation | Hallucination detection, toxicity scoring, runtime monitoring | Partial
Arthur AI | AI monitoring | Model performance monitoring, bias detection, observability | -

Compare runtime authorization with prompt, output, and monitoring guardrails

Implementation guide

A production rollout has four parts: install the SDK, write the first policy, wrap the tool path, and replay real calls before enforcement.

Step 1: Install the SDK

Install the package from npm:

npm install veto-sdk

The production TypeScript SDK is open source under Apache-2.0.

Step 2: Define policies

Write declarative YAML policies, or use the workspace's policy generator to create them from natural language. Policies define which tools are allowed, denied, or require approval, with optional argument-level conditions.

Step 3: Wrap your tools

Import the Veto SDK and wrap your tool functions. Your agent loop stays intact. The model does not know authorization exists.

Step 4: Test and deploy

Use the CLI to test policies locally. Use the playground to simulate tool calls. Deploy to production with environment-specific policies for dev, staging, and production.
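
Replaying real calls before enforcement can be approximated with a dry-run harness like the one below. It is a sketch that neither uses nor imitates the Veto CLI or playground. `replay`, `stagingPolicy`, and the sample calls are hypothetical stand-ins for a candidate policy and captured agent traffic.

```typescript
// Hypothetical replay harness; it does not use or imitate the Veto CLI.
type LoggedCall = { tool: string; args: Record<string, unknown> };
type ReplayOutcome = "allow" | "deny" | "escalate";
type CandidatePolicy = (call: LoggedCall) => ReplayOutcome;

type ReplayReport = {
  total: number;
  counts: Record<ReplayOutcome, number>;
  wouldBlock: LoggedCall[]; // calls to review before turning enforcement on
};

// Run recorded calls through a policy without executing any tool.
function replay(calls: LoggedCall[], policy: CandidatePolicy): ReplayReport {
  const report: ReplayReport = {
    total: calls.length,
    counts: { allow: 0, deny: 0, escalate: 0 },
    wouldBlock: [],
  };
  for (const call of calls) {
    const outcome = policy(call);
    report.counts[outcome] += 1;
    if (outcome === "deny") {
      report.wouldBlock.push(call);
    }
  }
  return report;
}

// Assumed candidate policy for a staging environment.
const stagingPolicy: CandidatePolicy = (call) => {
  if (call.tool === "delete_database") return "deny";
  if (call.tool === "deploy") return "escalate";
  return "allow";
};

// Stand-in for calls captured from real agent traffic.
const recordedCalls: LoggedCall[] = [
  { tool: "read_record", args: { id: 7 } },
  { tool: "deploy", args: { service: "api" } },
  { tool: "delete_database", args: { name: "staging" } },
  { tool: "read_record", args: { id: 8 } },
];

const replayReport = replay(recordedCalls, stagingPolicy);
```

Reviewing `wouldBlock` before switching from dry run to enforcement shows which legitimate workflows a policy would break.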

Compliance and regulation

Regulated teams need evidence that AI systems are scoped, controlled, and reviewable. Guardrails are one practical way to produce that evidence before an incident or audit.

EU AI Act (phased through 2028)

High-risk AI systems can require risk mitigation measures, human oversight mechanisms, and logging capabilities. Runtime authorization with decision records and human review approval creates evidence for those requirements. See the EU AI Act mapping.

SOC 2 evidence

SOC 2 reviews expect access-control evidence, activity review, and evidence that policy enforcement is operating. Veto's decision records are exportable in formats compatible with SOC 2 evidence collection. Policy versioning preserves evidence of control changes over time.

HIPAA

Requires access controls for Protected Health Information (PHI). Guardrails can enforce row-level access, block bulk extraction, and record governed access decisions for audit.
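
As an illustration of row-level access and bulk-extraction limits on a read tool, consider the sketch below. The field names, the care-team mapping, and the 25-row cap are all assumptions made for this example. It shows the shape of an argument-level check, not a HIPAA compliance control.

```typescript
// Hypothetical read guard; field names and the row cap are assumptions.
type RecordQuery = {
  patientIds: string[]; // rows the agent is asking to read
  requestedBy: string;  // clinician the agent is acting for
};

type ReadDecision = { allowed: boolean; reason: string };

const MAX_ROWS_PER_QUERY = 25; // assumed cap on rows per call

// careTeam maps each clinician to the patients assigned to them.
function checkRead(query: RecordQuery, careTeam: Record<string, string[]>): ReadDecision {
  if (query.patientIds.length > MAX_ROWS_PER_QUERY) {
    return { allowed: false, reason: "bulk extraction: too many rows in one call" };
  }
  const assigned = careTeam[query.requestedBy] ?? [];
  const outOfScope = query.patientIds.filter((id) => !assigned.includes(id));
  if (outOfScope.length > 0) {
    return { allowed: false, reason: `row-level access: ${outOfScope.join(", ")} not assigned` };
  }
  return { allowed: true, reason: "all rows in scope" };
}

const careTeam: Record<string, string[]> = { "dr-lee": ["p-1", "p-2"] };

const inScope = checkRead({ patientIds: ["p-1"], requestedBy: "dr-lee" }, careTeam);
const crossPatient = checkRead({ patientIds: ["p-1", "p-9"], requestedBy: "dr-lee" }, careTeam);
```

Each decision carries a reason string so that it can be written into a decision record for later audit.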

GDPR

Requires data minimization, purpose limitation, and accountability. Guardrails enforce what data an agent can access and for what purpose, with decision records providing accountability evidence.

Frequently asked questions

What are AI agent guardrails?
AI agent guardrails are runtime controls that intercept, evaluate, and enforce authorization policies on governed tool calls made by autonomous AI agents. Unlike prompt-based instructions or output filters, runtime guardrails operate independently of the agent's reasoning at the tool-call boundary. When the tool path is wrapped, the model does not control the policy decision because enforcement happens outside the LLM context.
How do guardrails differ from prompt engineering?
Prompts are suggestions embedded in the model's context window. They can be ignored, misunderstood, overridden by conflicting instructions, or worked around through jailbreaks. Guardrails are enforcement mechanisms that intercept governed tool calls before execution. The model does not see the guardrail logic, so prompt wording does not change the policy decision. Prompts provide guidance; guardrails provide enforcement.
What is the difference between input guardrails and runtime authorization?
Input guardrails (like prompt injection detection) filter what goes into the model. Output guardrails filter what comes out. Runtime authorization is different: it intercepts the actions an agent tries to take, evaluating governed tool calls against policy before the tool executes. These layers are complementary. Input and output filtering protect the model; runtime authorization protects the world the model acts on.
Do guardrails slow down my agent?
Veto's policy evaluation runs in-process before the tool executes. The SDK can execute policy checks locally on the critical path. Cloud features such as team approvals and decision-record retention are separate from auto-approved local checks.
Can I use guardrails with my existing agent code?
Yes. Wrap the tools your app dispatches with the Veto SDK. When tool dispatch is centralized, Veto stays on that execution path while the authorization check remains outside the model's prompt context. The production TypeScript path covers OpenAI, Claude, Gemini, Vercel AI SDK, Mastra, Playwright, and MCP. Python framework pages are preview guides for teams already using those stacks.
What happens when a guardrail blocks an action?
The tool call is intercepted before execution and the agent receives a configurable response. You can return an error message, a fallback value, or route the action to human approval through the configured approval path. Governed actions can emit decision records with tool name, arguments, matched policy, and outcome.
How are AI guardrails different from traditional API rate limiting?
Rate limiting controls how often an agent can call a tool. Guardrails control whether the agent should call it at all, and with what arguments. A rate limiter would let an agent delete a production database once per minute. A guardrail would block the deletion on the governed path unless an approved policy permits it.
Do I need guardrails if my agent only has read access?
Read access still carries risks: data exfiltration, PII exposure, bulk extraction, and compliance violations. Guardrails can help limit which data an agent reads, enforce row-level access controls, and block bulk extraction patterns. If your agent touches sensitive data in any direction, you need a runtime guardrail layer.
Are AI agent guardrails required by regulation?
The EU AI Act is phased. Prohibited-practice rules applied from February 2, 2025, and GPAI obligations from August 2, 2025. Article 50 transparency rules apply from August 2, 2026. A May 7, 2026 Parliament-Council political agreement on the AI Omnibus would set high-risk timing later: December 2, 2027 for specified high-risk areas and August 2, 2028 for product-integrated systems; it still needs formal adoption before entering law. SOC 2 Type II reviewers look for access-control and activity-review evidence. HIPAA requires ePHI access controls. While the laws do not name 'guardrails' specifically, runtime authorization can provide operational evidence for oversight, access control, and incident review.
What is the difference between guardrails and alignment?
Alignment is about training the model to want to do the right thing. Guardrails constrain what the surrounding system will execute, even when the model asks for something else. Alignment is a property of the model. Guardrails are a property of the system around the model. Both matter. Neither is sufficient alone.
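
The rate-limiting answer above separates "how often" from "whether at all", and a short sketch makes the gap visible. Both functions below are hypothetical illustrations. The once-per-minute interval and the production-database rule mirror the example in that answer.

```typescript
// Hypothetical sketch contrasting the two checks; names are illustrative.
type GuardedCall = { tool: string; args: Record<string, unknown>; atMs: number };

// Rate limiter: answers "how often?" It never looks at the tool or arguments.
function makeRateLimiter(minIntervalMs: number): (call: GuardedCall) => boolean {
  let lastAllowedAt = -Infinity;
  return (call) => {
    if (call.atMs - lastAllowedAt < minIntervalMs) return false;
    lastAllowedAt = call.atMs;
    return true;
  };
}

// Guardrail: answers "should this call run at all, with these arguments?"
function guardrailAllows(call: GuardedCall): boolean {
  const isProdDelete =
    call.tool === "delete_database" && String(call.args.name).includes("prod");
  return !isProdDelete;
}

const oncePerMinute = makeRateLimiter(60000);
const dropProd: GuardedCall = { tool: "delete_database", args: { name: "prod" }, atMs: 0 };

const rateLimiterSays = oncePerMinute(dropProd); // true: first call in the window
const guardrailSays = guardrailAllows(dropProd); // false: blocked regardless of rate
```

The two checks are complementary, so a production setup could reasonably run both.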

Make the tool path enforceable.

Open source. Local enforcement. Policy checks before execution.