Claude Agent Guardrails with Veto

Runtime authorization for Claude agents. Block dangerous tool calls, enforce policies, and maintain control without modifying your agent's prompts or behavior.

Claude agent guardrails and authorization

Claude agent guardrails are runtime controls that intercept and evaluate tool calls made by Claude before execution. Unlike prompt-based instructions, Veto guardrails operate independently of Claude's reasoning and cannot be bypassed by the model. This provides deterministic, auditable control over what your Claude agents can do.

Why Claude agents need authorization

Claude is one of the most capable AI models for agentic workflows. With the Anthropic SDK, Claude can use tools to read files, make API calls, execute code, and interact with external systems. This power comes with risk. A Claude agent with write access to your database, payment system, or customer data can cause real damage if it makes the wrong decision.

We've seen what happens when agents operate without guardrails. A coding agent deleted a production database after being told eleven times to stop. An AI safety researcher's own agent ignored her instructions and deleted hundreds of emails. Both agents were authenticated. Neither was authorized. That distinction is why Veto exists.

OWASP LLM Top 10 (2025) identifies prompt injection (LLM01:2025) as the primary security vulnerability for tool-calling agents, requiring defense-in-depth with system prompt hardening and deterministic input filtering. Claude's constitutional AI approach makes it safer than many models, but safety is not authorization. Guardrails enforce your policies regardless of Claude's own safety training.

How Veto works with Claude

Veto sits between Claude and your tools. When Claude decides to use a tool, Veto intercepts the call before execution, evaluates it against your policies, and either allows, blocks, or routes it to human approval. Claude is unaware it's being governed.

1. Wrap your tools

Use the Veto SDK to wrap your tool definitions and implementations. The SDK provides adapters for Anthropic's tool format out of the box.

2. Pass to Claude

Give Claude the wrapped tool definitions. Claude sees the same tool schemas it expects. Your agent code doesn't change.

3. Execute through Veto

When Claude calls a tool, Veto evaluates it against your policies first. Allowed calls proceed to your implementation. Denied calls return a configurable response to Claude.
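This interception step can be sketched as a small first-match policy evaluator. The sketch is illustrative only: `Rule`, `Decision`, and `evaluate` are hypothetical names for this example, not the Veto SDK's actual API.

```typescript
// Illustrative sketch of the allow / deny / require_approval decision flow.
// Not the real Veto SDK; names and shapes are assumptions.
type Decision = 'allow' | 'deny' | 'require_approval';

interface Rule {
  tool: string;
  when: (args: Record<string, any>) => boolean;
  action: Decision;
}

// First matching rule wins; unmatched calls fall through to the default.
function evaluate(rules: Rule[], tool: string, args: Record<string, any>): Decision {
  for (const rule of rules) {
    if (rule.tool === tool && rule.when(args)) return rule.action;
  }
  return 'allow'; // default-allow shown for brevity
}
```

A production engine would typically default-deny unmatched tools and emit an audit record for every decision, allowed or not.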

Secure tool definition with Zod validation

The Anthropic SDK supports tool definitions with JSON Schema or Zod schemas for input validation. The allowed_callers field restricts which execution contexts can invoke specific tools. Programmatic Tool Calling enables Claude to orchestrate tools through code rather than individual API round-trips, reducing context pollution and improving security.

Secure Tool Definition with Authorization Checks (TypeScript)
import Anthropic from '@anthropic-ai/sdk';
import { betaZodTool } from '@anthropic-ai/sdk/helpers/beta/zod';
import { z } from 'zod';

const client = new Anthropic();

// Authorization middleware
async function checkAuthorization(userId: string, toolName: string): Promise<boolean> {
  const permissions = await fetchUserPermissions(userId);
  return permissions.allowedTools.includes(toolName);
}

// Rate limit tracking
const rateLimiter = new Map<string, { count: number; resetAt: number }>();

function checkRateLimit(userId: string, maxRequests: number = 100): boolean {
  const now = Date.now();
  const userLimit = rateLimiter.get(userId);

  if (!userLimit || now > userLimit.resetAt) {
    rateLimiter.set(userId, { count: 1, resetAt: now + 60000 });
    return true;
  }

  if (userLimit.count >= maxRequests) {
    return false;
  }

  userLimit.count++;
  return true;
}

// Secure tool with schema validation
const secureDatabaseTool = betaZodTool({
  name: 'query_database',
  description: 'Execute a read-only database query with user authorization',
  inputSchema: z.object({
    query: z.string().max(1000).describe('SQL SELECT query'),
    table: z.enum(['users', 'orders', 'products']).describe('Target table'),
    limit: z.number().min(1).max(100).default(10)
  }),
  run: async (input, context) => {
    // Authorization check
    if (!await checkAuthorization(context.userId, 'query_database')) {
      throw new Error('Unauthorized: User lacks permission for database queries');
    }

    // Rate limit check
    if (!checkRateLimit(context.userId)) {
      throw new Error('Rate limit exceeded. Please try again later.');
    }

    // Input sanitization (case-insensitive check so 'select …' is also caught)
    const sanitizedQuery = sanitizeSQL(input.query);
    if (!/^SELECT\b/i.test(sanitizedQuery)) {
      throw new Error('Only SELECT queries are allowed');
    }

    return executeQuery(sanitizedQuery, input.table, input.limit);
  }
});

This pattern shows: (1) Zod schema validates input structure and constraints, (2) Authorization middleware checks user permissions before execution, (3) Rate limiting prevents abuse, (4) Input sanitization protects against SQL injection, (5) Tool scoping restricts operations to read-only SELECT queries.
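The example assumes helpers such as `fetchUserPermissions`, `executeQuery`, and `sanitizeSQL` that are not shown. A minimal `sanitizeSQL` might look like the sketch below; in production, prefer parameterized queries or a real SQL parser over string filtering.

```typescript
// Hypothetical sanitizeSQL helper: reject stacked statements and comment
// sequences, then normalize whitespace. A sketch, not a complete defense --
// parameterized queries remain the primary protection against injection.
function sanitizeSQL(query: string): string {
  const trimmed = query.trim();
  if (trimmed.includes(';') || trimmed.includes('--') || trimmed.includes('/*')) {
    throw new Error('Query contains disallowed characters');
  }
  return trimmed.replace(/\s+/g, ' ');
}
```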

Tool result handling with error recovery

Tool results enter the model's context window and can influence future behavior. Always validate and sanitize outputs: check for PII before including results, validate JSON structure, limit result size to prevent context overflow, and redact or mask sensitive fields.

Tool Result Handling with Validation (TypeScript)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface ToolExecutionContext {
  userId: string;
  sessionId: string;
  allowedTools: string[];
  maxRetries: number;
}

async function executeToolSafely(
  toolUse: Anthropic.ToolUseBlock,
  context: ToolExecutionContext
): Promise<Anthropic.ToolResultBlockParam> {
  const startTime = Date.now();

  try {
    // Validate tool is allowed for this context
    if (!context.allowedTools.includes(toolUse.name)) {
      return {
        type: 'tool_result',
        tool_use_id: toolUse.id,
        content: `Error: Tool '${toolUse.name}' is not authorized for this session`,
        is_error: true
      };
    }

    // Validate tool input against schema
    const toolSchema = getToolSchema(toolUse.name);
    const validatedInput = await toolSchema.parseAsync(toolUse.input);

    // Execute tool with timeout
    const result = await Promise.race([
      executeTool(toolUse.name, validatedInput, context),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Tool execution timeout')), 30000)
      )
    ]);

    // Validate and sanitize output
    const sanitizedResult = sanitizeOutput(result);

    return {
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: JSON.stringify(sanitizedResult)
    };

  } catch (error) {
    // Log error for monitoring (without sensitive data)
    logToolError({
      toolName: toolUse.name,
      error: error.message,
      duration: Date.now() - startTime,
      sessionId: context.sessionId
    });

    return {
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: `Error: ${error.message}`,
      is_error: true
    };
  }
}

// Main message handling loop with tool execution
async function handleMessageWithTools(
  userMessage: string,
  context: ToolExecutionContext
) {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: userMessage }
  ];

  let response = await client.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 4096,
    tools: getToolDefinitions(),
    messages
  });

  // Handle tool use loop
  while (response.stop_reason === 'tool_use') {
    const toolResults = await Promise.all(
      response.content
        .filter((block): block is Anthropic.ToolUseBlock => block.type === 'tool_use')
        .map(toolUse => executeToolSafely(toolUse, context))
    );

    messages.push(
      { role: 'assistant', content: response.content },
      { role: 'user', content: toolResults }
    );

    response = await client.messages.create({
      model: 'claude-sonnet-4-5-20250929',
      max_tokens: 4096,
      tools: getToolDefinitions(),
      messages
    });
  }

  return response;
}
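The `sanitizeOutput` helper assumed above can be sketched as follows. The redaction patterns and the `MAX_RESULT_CHARS` cap are assumptions to illustrate the validate-and-limit step; tune both for your data and model context budget.

```typescript
// Sketch of the sanitizeOutput helper assumed above: redact common PII
// patterns and cap result size before the text re-enters Claude's context.
const MAX_RESULT_CHARS = 10_000; // assumed limit; tune per context budget

function sanitizeOutput(result: unknown): string {
  let text = typeof result === 'string' ? result : JSON.stringify(result);
  // Redact emails and SSN-shaped values
  text = text.replace(/\b[\w.+-]+@[\w.-]+\.\w+\b/g, '[REDACTED_EMAIL]');
  text = text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[REDACTED_SSN]');
  // Truncate oversized results to prevent context overflow
  if (text.length > MAX_RESULT_CHARS) {
    text = text.slice(0, MAX_RESULT_CHARS) + '...[truncated]';
  }
  return text;
}
```

Since this version returns a string, the caller could pass it directly as the tool result `content` rather than stringifying it again.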

Streaming with real-time guardrails

Streaming responses should implement real-time content filtering to prevent PII or sensitive data from reaching users before it can be redacted. Buffer-based filtering enables context-aware detection for hallucination indicators and content policy violations.

Streaming with Content Filtering (Python)
import asyncio
from anthropic import AsyncAnthropic
from typing import AsyncIterator
import re

client = AsyncAnthropic()

# Guardrail patterns for content filtering
SENSITIVE_PATTERNS = [
    r'\b\d{3}[-.]?\d{2}[-.]?\d{4}\b',  # SSN
    r'\b[\w\.-]+@[\w\.-]+\.\w+\b',     # Email
    r'\b\d{16}\b',                       # Credit card
]

class StreamingGuardrail:
    def __init__(self):
        self.buffer = ""
        self.blocked = False

    def check_content(self, text: str) -> str:
        """Filter sensitive content from streaming text."""
        # Note: scanning each delta in isolation can miss patterns split
        # across chunk boundaries; buffer deltas for context-aware filtering
        sanitized = text
        for pattern in SENSITIVE_PATTERNS:
            sanitized = re.sub(pattern, '[REDACTED]', sanitized)
        return sanitized

    def check_hallucination_indicators(self, text: str) -> bool:
        """Detect potential hallucination patterns."""
        # Indicators must be lowercase to match against text.lower()
        indicators = [
            "i don't have access to",
            "i cannot verify",
            "based on my training",
        ]
        lowered = text.lower()
        return any(indicator in lowered for indicator in indicators)

async def stream_with_guardrails(
    message: str,
    system_prompt: str,
    max_tokens: int = 4096
) -> AsyncIterator[str]:
    """Stream responses with real-time guardrails."""

    guardrail = StreamingGuardrail()
    full_response = ""

    async with client.messages.stream(
        model="claude-sonnet-4-5-20250929",
        max_tokens=max_tokens,
        system=system_prompt,
        messages=[{"role": "user", "content": message}]
    ) as stream:
        async for event in stream:
            if event.type == "content_block_delta":
                if hasattr(event.delta, 'text'):
                    raw_text = event.delta.text

                    # Apply guardrails
                    sanitized_text = guardrail.check_content(raw_text)

                    # Check for hallucination indicators
                    if guardrail.check_hallucination_indicators(raw_text):
                        yield "\n[Warning: Potential unverified information]\n"

                    full_response += sanitized_text
                    yield sanitized_text

            elif event.type == "message_delta":
                # stop_reason arrives on event.delta, not on the event itself
                if getattr(event.delta, "stop_reason", None) == "refusal":
                    yield "\n[Content policy triggered]\n"
                    guardrail.blocked = True

        # Post-stream validation (inside the stream context, after iteration)
        if not guardrail.blocked:
            final_message = await stream.get_final_message()
            print(f"Input tokens: {final_message.usage.input_tokens}")
            print(f"Output tokens: {final_message.usage.output_tokens}")

Prompt caching for cost optimization

Claude's prompt caching provides significant cost savings (90% reduction on cache reads) and supports both automatic caching for multi-turn conversations and explicit breakpoints for fine-grained control. Cache entries are isolated per workspace, with up to 4 breakpoints per request.

Prompt Caching with Security Considerations (TypeScript)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface CachedPromptConfig {
  systemPrompt: string;
  tools: Anthropic.Tool[];
  examples: string[];
  ttl: '5m' | '1h';
}

async function createCachedRequest(config: CachedPromptConfig) {
  // System prompt with caching - cacheable prefix
  const systemBlocks = [
    {
      type: 'text' as const,
      text: config.systemPrompt,
      cache_control: { type: 'ephemeral' as const, ttl: config.ttl }
    }
  ];

  // Tools are cached as part of the prompt prefix. A single cache_control
  // breakpoint on the last tool covers all preceding tool definitions;
  // marking every tool would quickly exhaust the 4-breakpoint limit.
  const cachedTools = config.tools.map((tool, i) =>
    i === config.tools.length - 1
      ? { ...tool, cache_control: { type: 'ephemeral' as const } }
      : tool
  );

  const response = await client.messages.create({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 4096,
    system: systemBlocks,
    tools: cachedTools,
    // Note: cache_control is set on content blocks (system, tools, messages),
    // not as a top-level request parameter
    messages: [
      {
        role: 'user',
        content: 'Process the following request...'
      }
    ]
  });

  // Monitor cache performance
  const usage = response.usage;
  console.log('Cache Performance:');
  console.log(`  Read from cache: ${usage.cache_read_input_tokens} tokens`);
  console.log(`  Written to cache: ${usage.cache_creation_input_tokens} tokens`);
  console.log(`  New tokens: ${usage.input_tokens} tokens`);

  const totalInput = (usage.cache_read_input_tokens || 0) +
                     (usage.cache_creation_input_tokens || 0) +
                     usage.input_tokens;
  const cacheHitRate = (usage.cache_read_input_tokens || 0) / totalInput;
  console.log(`  Cache hit rate: ${(cacheHitRate * 100).toFixed(1)}%`);

  return response;
}

Security note: Cache invalidation occurs when tool definitions, system prompts, or message content changes. When updating security rules or tool definitions, expect cache misses on subsequent requests.

Error handling with exponential backoff

The SDK provides built-in retries (2 by default) with exponential backoff. For production systems, implement custom retry logic with jitter, Retry-After header parsing, maximum retry caps, and monitoring for persistent failures. Never retry 4xx errors except 429.

Comprehensive Error Handling (TypeScript)
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  retryableStatusCodes: number[];
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 3,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  retryableStatusCodes: [429, 500, 502, 503, 504]
};

async function exponentialBackoff(
  attempt: number,
  config: RetryConfig
): Promise<void> {
  const delay = Math.min(
    config.baseDelayMs * Math.pow(2, attempt),
    config.maxDelayMs
  );
  const jitter = Math.random() * 0.1 * delay;
  await new Promise(resolve => setTimeout(resolve, delay + jitter));
}

async function createMessageWithRetry(
  params: Anthropic.MessageCreateParams,
  retryConfig: Partial<RetryConfig> = {}
): Promise<Anthropic.Message> {
  const config = { ...DEFAULT_RETRY_CONFIG, ...retryConfig };
  let lastError: Error | null = null;

  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await client.messages.create(params);

    } catch (error) {
      lastError = error;

      if (error instanceof Anthropic.APIError) {
        const status = error.status;
        const requestId = error.headers?.['request-id'];

        console.error(`API Error (attempt ${attempt + 1}):`);
        console.error(`  Status: ${status}`);
        console.error(`  Message: ${error.message}`);
        console.error(`  Request ID: ${requestId}`);

        // Handle specific error types
        if (error instanceof Anthropic.AuthenticationError) {
          throw new Error('Authentication failed. Check your API key.');
        }

        if (error instanceof Anthropic.BadRequestError) {
          // Don't retry bad requests - fix the request instead
          throw new Error(`Invalid request: ${error.message}`);
        }

        if (error instanceof Anthropic.RateLimitError) {
          const retryAfter = error.headers?.['retry-after'];
          console.warn(`Rate limited. Retry after: ${retryAfter}s`);

          if (attempt < config.maxRetries) {
            const waitTime = retryAfter
              ? parseInt(retryAfter) * 1000
              : config.baseDelayMs * Math.pow(2, attempt);
            await new Promise(r => setTimeout(r, waitTime));
            continue;
          }
        }

        if (error instanceof Anthropic.InternalServerError) {
          if (attempt < config.maxRetries && config.retryableStatusCodes.includes(status)) {
            await exponentialBackoff(attempt, config);
            continue;
          }
        }

        throw error;
      }

      if (attempt < config.maxRetries) {
        await exponentialBackoff(attempt, config);
        continue;
      }
    }
  }

  throw lastError || new Error('Max retries exceeded');
}

Veto policy configuration

The policy file gates Claude's tool calls. No prompts to maintain, no model behavior to debug.

veto/policies.yaml
rules:
  - name: block_large_transfers
    description: Block transfers over $10,000
    tool: transfer_funds
    when: args.amount > 10000
    action: deny
    message: "Transfers over $10,000 require manager approval"

  - name: require_approval_external_email
    description: Require approval for external emails
    tool: send_email
    when: "!args.to.endsWith('@company.com')"
    action: require_approval
    message: "External email requires approval"

  - name: read_only_database
    description: Enforce read-only database access
    tool: query_database
    when: "!args.query.toUpperCase().startsWith('SELECT')"
    action: deny
    message: "Only SELECT queries are permitted"

  - name: rate_limit_api_calls
    description: Limit API call frequency
    tool: external_api
    when: "rateLimitExceeded(userId, 'external_api', 100, 60000)"
    action: deny
    message: "Rate limit exceeded. Try again in 1 minute."

Common Claude guardrails

Policies that teams commonly enforce for Claude agents in production.

Tool scoping

Restrict which tools Claude can use. Use the allowed_callers field to restrict invocation to specific execution contexts. Apply principle of least privilege to all tool definitions.

Approval workflows

Route sensitive actions to humans for review. Claude waits while approvers respond via Slack, email, or dashboard. Implement mandatory review for destructive operations.

Input validation

Schema validation with Zod or JSON Schema. Block specific argument values. Prevent writes to production paths, limit transaction amounts, restrict email domains.

Rate limiting

Limit how often Claude can call expensive or dangerous tools. Implement circuit breakers to prevent runaway execution. Monitor token consumption per session.
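The circuit breaker mentioned above can be sketched in a few lines. This is illustrative; the threshold and cooldown values are assumptions, and a real deployment would persist state across processes.

```typescript
// Illustrative circuit breaker: after `threshold` consecutive failures the
// tool is disabled until `cooldownMs` elapses, halting runaway loops.
class CircuitBreaker {
  private failures = 0;
  private openUntil = 0;

  constructor(private threshold = 5, private cooldownMs = 60_000) {}

  canExecute(now: number = Date.now()): boolean {
    return now >= this.openUntil;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openUntil = now + this.cooldownMs; // trip: block calls during cooldown
      this.failures = 0;
    }
  }
}
```

Check `canExecute()` before each tool call, and record the outcome afterward; a tripped breaker is also a useful signal to route the session to human review.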

Programmatic Tool Calling

For multi-step workflows, use Programmatic Tool Calling to orchestrate tools through code rather than natural language. This reduces context pollution, enables parallel execution, and keeps intermediate results out of the model's context. The Tool Search Tool feature reduces context consumption by up to 85% for large tool libraries.

Programmatic Tool Calling with Authorization (TypeScript)
const response = await client.beta.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  tools: [
    { type: 'code_execution_20250825', name: 'code_execution' },
    {
      name: 'get_data',
      input_schema: { type: 'object', properties: { source: { type: 'string' } } },
      allowed_callers: ['code_execution_20250825'] // Only callable from code execution
    },
    {
      name: 'process_data',
      input_schema: { type: 'object', properties: { data: { type: 'array' } } },
      allowed_callers: ['code_execution_20250825']
    }
  ],
  betas: ['advanced-tool-use-2025-11-20'],
  messages: [{ role: 'user', content: 'Analyze the sales data and generate a report' }]
});

The allowed_callers field restricts tool invocation to the code execution environment. Intermediate results stay out of the model's context, reducing token consumption by 37% on complex research tasks.

Getting started

1. Install the SDK

npm install veto-sdk @anthropic-ai/sdk

2. Initialize Veto

Run npx veto init to create the veto/ directory with policies and configuration.

3. Define policies

Add rules to veto/policies.yaml. Use YAML expressions to match tools and arguments.

4. Wrap and run

Call veto.wrapTools() in your Claude agent code. Guardrails activate automatically.

Frequently asked questions

Does Veto slow down Claude agent responses?
Minimal impact. Policy evaluation happens in-process, typically in under 10ms; attribute-based (ABAC) decisions add roughly 5-20ms per access decision. Cache frequently used policies to reduce latency further. Prompt caching is complementary: it cuts input-token cost by up to 90% on cache reads with negligible latency overhead.
Can Claude bypass the guardrails?
No. Guardrails operate at the tool-call boundary, not in prompts. Claude cannot modify, bypass, or reason around policies because they execute in your code, not in the model's context. This addresses OWASP LLM01 (Prompt Injection) by implementing deterministic input filtering.
What about context overflow from large tool results?
Use Programmatic Tool Calling to process results in code before returning to the model. Implement result size limits. The Tool Search Tool reduces context consumption by up to 85% for large tool libraries by enabling on-demand discovery.
Does Veto work with Claude's MCP support?
Yes. Veto provides a managed MCP gateway that intercepts tool calls from any MCP client including Claude Desktop, Cursor, and Windsurf. Point your MCP client at Veto's gateway URL. All tool calls flow through Veto's policy engine.


Stop hoping Claude behaves. Enforce it.