Claude Agent Guardrails with Veto
Runtime authorization for Claude agents. Block dangerous tool calls, enforce policies, and maintain control without modifying your agent's prompts or behavior.
Claude agent guardrails and authorization
Claude agent guardrails are runtime controls that intercept and evaluate tool calls made by Claude before execution. Unlike prompt-based instructions, Veto guardrails operate independently of Claude's reasoning and cannot be bypassed by the model. This provides deterministic, auditable control over what your Claude agents can do.
Why Claude agents need authorization
Claude is one of the most capable AI models for agentic workflows. With the Anthropic SDK, Claude can use tools to read files, make API calls, execute code, and interact with external systems. This power comes with risk. A Claude agent with write access to your database, payment system, or customer data can cause real damage if it makes the wrong decision.
We've seen what happens when agents operate without guardrails. A coding agent deleted a production database after being told eleven times to stop. An AI safety researcher's own agent ignored her instructions and deleted hundreds of emails. Both agents were authenticated. Neither was authorized. That distinction is why Veto exists.
The OWASP Top 10 for LLM Applications (2025) lists prompt injection (LLM01:2025) as the leading vulnerability for tool-calling agents and recommends defense in depth: system prompt hardening combined with deterministic input filtering. Claude's Constitutional AI training makes it safer than many models, but safety is not authorization. Guardrails enforce your policies regardless of Claude's own safety training.
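One form of deterministic input filtering is a pattern deny-list applied to untrusted text before it reaches the model's context. A minimal sketch (the pattern list and function names are illustrative, not part of any SDK, and pattern matching alone is a coarse first layer, not a complete defense):

```typescript
// Illustrative deny-list of common prompt-injection phrasings.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /disregard (your|the) system prompt/i,
  /you are now (in )?developer mode/i,
];

function containsInjection(text: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}

// Run untrusted content (user input, tool results, retrieved documents)
// through the filter before it enters the model's context.
function screenUntrustedText(text: string): { allowed: boolean; reason?: string } {
  if (containsInjection(text)) {
    return { allowed: false, reason: 'matched prompt-injection pattern' };
  }
  return { allowed: true };
}
```

Because the filter runs outside the model, its verdict is deterministic and auditable, which is the property prompt-based defenses lack.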
How Veto works with Claude
Veto sits between Claude and your tools. When Claude decides to use a tool, Veto intercepts the call before execution, evaluates it against your policies, and either allows, blocks, or routes it to human approval. Claude is unaware it's being governed.
Wrap your tools
Use the Veto SDK to wrap your tool definitions and implementations. The SDK provides adapters for Anthropic's tool format out of the box.
Pass to Claude
Give Claude the wrapped tool definitions. Claude sees the same tool schemas it expects. Your agent code doesn't change.
Execute through Veto
When Claude calls a tool, Veto evaluates it against your policies first. Allowed calls proceed to your implementation. Denied calls return a configurable response to Claude.
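The three steps above can be sketched as a generic interception layer. This is an illustrative model of how rule evaluation works, not the Veto SDK's actual API (all type and function names here are assumptions):

```typescript
type Decision = 'allow' | 'deny' | 'require_approval';

interface PolicyRule {
  tool: string;
  when: (args: Record<string, unknown>) => boolean;
  action: Decision;
  message: string;
}

// First matching rule wins; calls with no matching rule are allowed.
// A stricter deployment could flip the default to deny-by-default.
function evaluate(
  rules: PolicyRule[],
  tool: string,
  args: Record<string, unknown>
): { decision: Decision; message?: string } {
  for (const rule of rules) {
    if (rule.tool === tool && rule.when(args)) {
      return { decision: rule.action, message: rule.message };
    }
  }
  return { decision: 'allow' };
}

// Example rule: block transfers over $10,000
const exampleRules: PolicyRule[] = [
  {
    tool: 'transfer_funds',
    when: (args) => (args.amount as number) > 10000,
    action: 'deny',
    message: 'Transfers over $10,000 require manager approval',
  },
];
```

A denied call would be converted into an error tool result, so Claude observes the refusal and can adjust its plan rather than silently failing.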
Secure tool definition with Zod validation
The Anthropic SDK supports tool definitions with JSON Schema or Zod schemas for input validation. The allowed_callers field restricts which execution contexts can invoke specific tools. Programmatic Tool Calling enables Claude to orchestrate tools through code rather than individual API round-trips, reducing context pollution and improving security.
import Anthropic from '@anthropic-ai/sdk';
import { betaZodTool } from '@anthropic-ai/sdk/helpers/beta/zod';
import { z } from 'zod';

const client = new Anthropic();

// Authorization middleware
async function checkAuthorization(userId: string, toolName: string): Promise<boolean> {
  const permissions = await fetchUserPermissions(userId);
  return permissions.allowedTools.includes(toolName);
}

// Rate limit tracking
const rateLimiter = new Map<string, { count: number; resetAt: number }>();

function checkRateLimit(userId: string, maxRequests: number = 100): boolean {
  const now = Date.now();
  const userLimit = rateLimiter.get(userId);
  if (!userLimit || now > userLimit.resetAt) {
    rateLimiter.set(userId, { count: 1, resetAt: now + 60000 });
    return true;
  }
  if (userLimit.count >= maxRequests) {
    return false;
  }
  userLimit.count++;
  return true;
}

// Secure tool with schema validation
const secureDatabaseTool = betaZodTool({
  name: 'query_database',
  description: 'Execute a read-only database query with user authorization',
  inputSchema: z.object({
    query: z.string().max(1000).describe('SQL SELECT query'),
    table: z.enum(['users', 'orders', 'products']).describe('Target table'),
    limit: z.number().min(1).max(100).default(10)
  }),
  run: async (input, context) => {
    // Authorization check
    if (!await checkAuthorization(context.userId, 'query_database')) {
      throw new Error('Unauthorized: User lacks permission for database queries');
    }
    // Rate limit check
    if (!checkRateLimit(context.userId)) {
      throw new Error('Rate limit exceeded. Please try again later.');
    }
    // Input sanitization
    const sanitizedQuery = sanitizeSQL(input.query);
    if (!sanitizedQuery.startsWith('SELECT')) {
      throw new Error('Only SELECT queries are allowed');
    }
    return executeQuery(sanitizedQuery, input.table, input.limit);
  }
});

This pattern shows: (1) the Zod schema validates input structure and constraints, (2) authorization middleware checks user permissions before execution, (3) rate limiting prevents abuse, (4) input sanitization guards against SQL injection, (5) tool scoping restricts operations to read-only SELECT queries.
Tool result handling with error recovery
Tool results enter the model's context window and can influence future behavior. Always validate and sanitize outputs: check for PII before including results, validate JSON structure, limit result size to prevent context overflow, and redact or mask sensitive fields.
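The sanitizeOutput helper referenced in the code below is not part of the Anthropic SDK. A minimal sketch, assuming a redact-and-truncate policy (the PII pattern list and size cap are illustrative and should be tuned to your data):

```typescript
// Redaction rules: each pattern is replaced with a placeholder tag.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL]'],
];

// Cap result size so a single tool result cannot flood the context window.
const MAX_RESULT_CHARS = 10_000;

function sanitizeOutput(result: unknown): string {
  let text = typeof result === 'string' ? result : JSON.stringify(result);
  for (const [pattern, replacement] of PII_PATTERNS) {
    text = text.replace(pattern, replacement);
  }
  if (text.length > MAX_RESULT_CHARS) {
    text = text.slice(0, MAX_RESULT_CHARS) + '[truncated]';
  }
  return text;
}
```

A production sanitizer would typically add structural validation (e.g. parsing expected JSON shapes) and field-level masking on top of regex redaction.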
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface ToolExecutionContext {
  userId: string;
  sessionId: string;
  allowedTools: string[];
  maxRetries: number;
}

async function executeToolSafely(
  toolUse: Anthropic.ToolUseBlock,
  context: ToolExecutionContext
): Promise<Anthropic.ToolResultBlockParam> {
  const startTime = Date.now();
  try {
    // Validate tool is allowed for this context
    if (!context.allowedTools.includes(toolUse.name)) {
      return {
        type: 'tool_result',
        tool_use_id: toolUse.id,
        content: `Error: Tool '${toolUse.name}' is not authorized for this session`,
        is_error: true
      };
    }
    // Validate tool input against schema
    const toolSchema = getToolSchema(toolUse.name);
    const validatedInput = await toolSchema.parseAsync(toolUse.input);
    // Execute tool with timeout
    const result = await Promise.race([
      executeTool(toolUse.name, validatedInput, context),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Tool execution timeout')), 30000)
      )
    ]);
    // Validate and sanitize output
    const sanitizedResult = sanitizeOutput(result);
    return {
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: JSON.stringify(sanitizedResult)
    };
  } catch (error) {
    // Narrow the unknown catch variable before reading .message
    const message = error instanceof Error ? error.message : String(error);
    // Log error for monitoring (without sensitive data)
    logToolError({
      toolName: toolUse.name,
      error: message,
      duration: Date.now() - startTime,
      sessionId: context.sessionId
    });
    return {
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: `Error: ${message}`,
      is_error: true
    };
  }
}

// Main message handling loop with tool execution
async function handleMessageWithTools(
  userMessage: string,
  context: ToolExecutionContext
) {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage }
  ];
  let response = await client.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 4096,
    tools: getToolDefinitions(),
    messages
  });
  // Handle tool use loop
  while (response.stop_reason === 'tool_use') {
    const toolResults = await Promise.all(
      response.content
        .filter((block): block is Anthropic.ToolUseBlock => block.type === 'tool_use')
        .map(toolUse => executeToolSafely(toolUse, context))
    );
    messages.push(
      { role: 'assistant', content: response.content },
      { role: 'user', content: toolResults }
    );
    response = await client.messages.create({
      model: 'claude-sonnet-4-5-20250929',
      max_tokens: 4096,
      tools: getToolDefinitions(),
      messages
    });
  }
  return response;
}

Streaming with real-time guardrails
Streaming responses should implement real-time content filtering to prevent PII or sensitive data from reaching users before it can be redacted. Buffer-based filtering enables context-aware detection for hallucination indicators and content policy violations.
import re
from typing import AsyncIterator

from anthropic import AsyncAnthropic

client = AsyncAnthropic()

# Guardrail patterns for content filtering
SENSITIVE_PATTERNS = [
    r'\b\d{3}[-.]?\d{2}[-.]?\d{4}\b',  # SSN
    r'\b[\w\.-]+@[\w\.-]+\.\w+\b',     # Email
    r'\b\d{16}\b',                     # Credit card
]


class StreamingGuardrail:
    def __init__(self):
        self.buffer = ""
        self.blocked = False

    def check_content(self, text: str) -> str:
        """Filter sensitive content from streaming text."""
        sanitized = text
        for pattern in SENSITIVE_PATTERNS:
            sanitized = re.sub(pattern, '[REDACTED]', sanitized)
        return sanitized

    def check_hallucination_indicators(self, text: str) -> bool:
        """Detect potential hallucination patterns."""
        # Indicators are lowercase so they match the lowercased text
        indicators = [
            "i don't have access to",
            "i cannot verify",
            "based on my training",
        ]
        lowered = text.lower()
        return any(indicator in lowered for indicator in indicators)


async def stream_with_guardrails(
    message: str,
    system_prompt: str,
    max_tokens: int = 4096,
) -> AsyncIterator[str]:
    """Stream responses with real-time guardrails."""
    guardrail = StreamingGuardrail()
    full_response = ""
    async with client.messages.stream(
        model="claude-sonnet-4-5-20250929",
        max_tokens=max_tokens,
        system=system_prompt,
        messages=[{"role": "user", "content": message}],
    ) as stream:
        async for event in stream:
            if event.type == "content_block_delta":
                if hasattr(event.delta, 'text'):
                    raw_text = event.delta.text
                    # Apply guardrails
                    sanitized_text = guardrail.check_content(raw_text)
                    # Check for hallucination indicators
                    if guardrail.check_hallucination_indicators(raw_text):
                        yield "\n[Warning: Potential unverified information]\n"
                    full_response += sanitized_text
                    yield sanitized_text
            elif event.type == "message_delta":
                # For message_delta events, stop_reason lives on event.delta
                if getattr(event.delta, 'stop_reason', None) == "refusal":
                    yield "\n[Content policy triggered]\n"
                    guardrail.blocked = True
        # Post-stream validation
        if not guardrail.blocked:
            final_message = await stream.get_final_message()
            print(f"Input tokens: {final_message.usage.input_tokens}")
            print(f"Output tokens: {final_message.usage.output_tokens}")

Prompt caching for cost optimization
Claude's prompt caching provides significant cost savings (90% reduction on cache reads) and supports both automatic caching for multi-turn conversations and explicit breakpoints for fine-grained control. Cache entries are isolated per workspace, with up to 4 breakpoints per request.
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface CachedPromptConfig {
  systemPrompt: string;
  tools: Anthropic.Tool[];
  examples: string[];
  ttl: '5m' | '1h';
}

async function createCachedRequest(config: CachedPromptConfig) {
  // System prompt with caching - cacheable prefix
  const systemBlocks = [
    {
      type: 'text' as const,
      text: config.systemPrompt,
      cache_control: { type: 'ephemeral' as const, ttl: config.ttl }
    }
  ];
  // Tools are cached as part of the prefix. Mark only the last tool:
  // a breakpoint caches everything before it, and requests allow at
  // most 4 breakpoints
  const cachedTools = config.tools.map((tool, i) =>
    i === config.tools.length - 1
      ? { ...tool, cache_control: { type: 'ephemeral' as const } }
      : tool
  );
  const response = await client.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 4096,
    system: systemBlocks,
    tools: cachedTools,
    // cache_control is set on content blocks, not at the request level;
    // cache conversation history by marking the last message block
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text' as const,
            text: 'Process the following request...',
            cache_control: { type: 'ephemeral' as const }
          }
        ]
      }
    ]
  });
  // Monitor cache performance
  const usage = response.usage;
  console.log('Cache Performance:');
  console.log(`  Read from cache: ${usage.cache_read_input_tokens} tokens`);
  console.log(`  Written to cache: ${usage.cache_creation_input_tokens} tokens`);
  console.log(`  New tokens: ${usage.input_tokens} tokens`);
  const totalInput = (usage.cache_read_input_tokens || 0) +
    (usage.cache_creation_input_tokens || 0) +
    usage.input_tokens;
  const cacheHitRate = (usage.cache_read_input_tokens || 0) / totalInput;
  console.log(`  Cache hit rate: ${(cacheHitRate * 100).toFixed(1)}%`);
  return response;
}

Security note: Cache invalidation occurs when tool definitions, system prompts, or message content change. When updating security rules or tool definitions, expect cache misses on subsequent requests.
Error handling with exponential backoff
The SDK provides built-in retries (2 by default) with exponential backoff. For production systems, implement custom retry logic with jitter, Retry-After header parsing, maximum retry caps, and monitoring for persistent failures. Never retry 4xx errors except 429.
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  retryableStatusCodes: number[];
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  retryableStatusCodes: [429, 500, 502, 503, 504]
};

async function exponentialBackoff(
  attempt: number,
  config: RetryConfig
): Promise<void> {
  const delay = Math.min(
    config.baseDelayMs * Math.pow(2, attempt),
    config.maxDelayMs
  );
  const jitter = Math.random() * 0.1 * delay;
  await new Promise(resolve => setTimeout(resolve, delay + jitter));
}

async function createMessageWithRetry(
  params: Anthropic.MessageCreateParams,
  retryConfig: Partial<RetryConfig> = {}
): Promise<Anthropic.Message> {
  const config = { ...DEFAULT_RETRY_CONFIG, ...retryConfig };
  let lastError: Error | null = null;
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));
      if (error instanceof Anthropic.APIError) {
        const status = error.status;
        const requestId = error.headers?.['request-id'];
        console.error(`API Error (attempt ${attempt + 1}):`);
        console.error(`  Status: ${status}`);
        console.error(`  Message: ${error.message}`);
        console.error(`  Request ID: ${requestId}`);
        // Handle specific error types
        if (error instanceof Anthropic.AuthenticationError) {
          throw new Error('Authentication failed. Check your API key.');
        }
        if (error instanceof Anthropic.BadRequestError) {
          // Don't retry bad requests - fix the request instead
          throw new Error(`Invalid request: ${error.message}`);
        }
        if (error instanceof Anthropic.RateLimitError) {
          const retryAfter = error.headers?.['retry-after'];
          console.warn(`Rate limited. Retry after: ${retryAfter}s`);
          if (attempt < config.maxRetries) {
            const waitTime = retryAfter
              ? parseInt(retryAfter) * 1000
              : config.baseDelayMs * Math.pow(2, attempt);
            await new Promise(r => setTimeout(r, waitTime));
            continue;
          }
        }
        if (error instanceof Anthropic.InternalServerError) {
          if (attempt < config.maxRetries && status !== undefined &&
              config.retryableStatusCodes.includes(status)) {
            await exponentialBackoff(attempt, config);
            continue;
          }
        }
        throw error;
      }
      if (attempt < config.maxRetries) {
        await exponentialBackoff(attempt, config);
        continue;
      }
    }
  }
  throw lastError || new Error('Max retries exceeded');
}

Veto policy configuration
The policy file gates Claude's tool calls. No prompts to maintain, no model behavior to debug.
rules:
  - name: block_large_transfers
    description: Block transfers over $10,000
    tool: transfer_funds
    when: args.amount > 10000
    action: deny
    message: "Transfers over $10,000 require manager approval"
  - name: require_approval_external_email
    description: Require approval for external emails
    tool: send_email
    when: "!args.to.endsWith('@company.com')"
    action: require_approval
    message: "External email requires approval"
  - name: read_only_database
    description: Enforce read-only database access
    tool: query_database
    when: "!args.query.toUpperCase().startsWith('SELECT')"
    action: deny
    message: "Only SELECT queries are permitted"
  - name: rate_limit_api_calls
    description: Limit API call frequency
    tool: external_api
    when: "rateLimitExceeded(userId, 'external_api', 100, 60000)"
    action: deny
    message: "Rate limit exceeded. Try again in 1 minute."

Common Claude guardrails
Policies that teams commonly enforce for Claude agents in production.
Tool scoping
Restrict which tools Claude can use. Use the allowed_callers field to restrict invocation to specific execution contexts, and apply the principle of least privilege to all tool definitions.
Approval workflows
Route sensitive actions to humans for review. Claude waits while approvers respond via Slack, email, or dashboard. Implement mandatory review for destructive operations.
Input validation
Schema validation with Zod or JSON Schema. Block specific argument values. Prevent writes to production paths, limit transaction amounts, restrict email domains.
Rate limiting
Limit how often Claude can call expensive or dangerous tools. Implement circuit breakers to prevent runaway execution. Monitor token consumption per session.
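The approval workflow above can be sketched as a gate that parks a tool call in a pending map until a human responds. All names here are illustrative; a real system would notify approvers over Slack, email, or a dashboard and persist pending requests so they survive restarts:

```typescript
type Verdict = 'approved' | 'denied';

interface PendingApproval {
  resolve: (v: Verdict) => void;
  tool: string;
  args: unknown;
}

// In-memory registry of calls awaiting a human decision.
const pending = new Map<string, PendingApproval>();

// Park the call: the returned promise settles only when a human decides.
// In production, this is where the Slack/email notification would fire.
function requestApproval(id: string, tool: string, args: unknown): Promise<Verdict> {
  return new Promise((resolve) => {
    pending.set(id, { resolve, tool, args });
  });
}

// Called by the approval UI or webhook when a human responds.
function resolveApproval(id: string, verdict: Verdict): void {
  const entry = pending.get(id);
  if (!entry) throw new Error(`No pending approval with id ${id}`);
  pending.delete(id);
  entry.resolve(verdict);
}
```

The agent loop simply awaits requestApproval before executing the tool, so Claude's turn stays blocked until the verdict arrives.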
Programmatic Tool Calling
For multi-step workflows, use Programmatic Tool Calling to orchestrate tools through code rather than natural language. This reduces context pollution, enables parallel execution, and keeps intermediate results out of the model's context. The Tool Search Tool feature reduces context consumption by up to 85% for large tool libraries.
const response = await client.beta.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 4096,
  tools: [
    { type: 'code_execution_20250825', name: 'code_execution' },
    {
      name: 'get_data',
      input_schema: { type: 'object', properties: { source: { type: 'string' } } },
      allowed_callers: ['code_execution_20250825'] // Only callable from code execution
    },
    {
      name: 'process_data',
      input_schema: { type: 'object', properties: { data: { type: 'array' } } },
      allowed_callers: ['code_execution_20250825']
    }
  ],
  betas: ['advanced-tool-use-2025-11-20'],
  messages: [{ role: 'user', content: 'Analyze the sales data and generate a report' }]
});

The allowed_callers field restricts tool invocation to the code execution environment. Intermediate results stay out of the model's context, reducing token consumption by 37% on complex research tasks.
Getting started
Install the SDK
npm install veto-sdk @anthropic-ai/sdk
Initialize Veto
Run npx veto init to create the veto/ directory with policies and configuration.
Define policies
Add rules to veto/policies.yaml. Use YAML expressions to match tools and arguments.
Wrap and run
Call veto.wrapTools() in your Claude agent code. Guardrails activate automatically.
Frequently asked questions
Does Veto slow down Claude agent responses?
Can Claude bypass the guardrails?
What about context overflow from large tool results?
Does Veto work with Claude's MCP support?
Stop hoping Claude behaves. Enforce it.