How should multi-tenant AI agents enforce tenant isolation?

Bind tenant context to each governed tool call and evaluate it before execution. Policies should deny cross-tenant reads, writes, exports, and integration calls even if the model attempts them with valid credentials or retrieved context.

Why are vector database filters not enough for tenant isolation?

Vector filters reduce retrieval risk, but agents can still call tools that read, write, export, or mutate tenant data. Runtime authorization checks the action itself and can block cross-tenant tool calls after retrieval and before execution.

What control evidence do enterprise buyers expect for multi-tenant agents?

They expect each decision record to capture tenant ID, actor, tool name, arguments, policy version, decision, timestamp, and approver when required. Where configured, they also expect verification-ready records that support SOC 2, GDPR, and customer security reviews.

Multi-Tenant AI Agent Architecture

Author: Veto

Many enterprise applications are adding task-specific AI agents to multi-tenant SaaS. Unlike standard SaaS where a database row-level policy may be enough, AI agents introduce a fundamentally new isolation challenge: agents work with vector databases, long-term memory, tool access, and external APIs. A wrong permission or a prompt injection can cross tenant boundaries in ways no traditional access control model anticipated.

Why Traditional Tenant Isolation Breaks

In a standard SaaS app, tenant isolation is straightforward: every database query includes a WHERE tenant_id = ? clause, API requests are scoped to an org, and the tenant boundary is treated as an application invariant. AI agents strain this model in five ways:

Agents do not log in. A traditional user has a session with a tenant context. An agent is invoked on behalf of a user, but it does not authenticate as a tenant. Each governed tool call needs explicit tenant scoping.
Vector search is approximate. Semantic search returns nearest neighbors, not exact matches. If you mix tenant embeddings in one index without strict metadata filtering, a query from Tenant A can return Tenant B's documents.
Tool access is lateral. An agent with access to a file system, database, or API can reach across tenant boundaries unless each governed tool enforces tenant scoping independently.
Memory persists across sessions. Agent memory systems (conversation history, learned preferences, cached tool results) can leak information between tenants if not isolated.
Prompt injection crosses boundaries. A malicious user in Tenant A can craft input that causes the agent to access Tenant B's data through tool calls. Prompt-level isolation does not prevent this.
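
The first and third points can be made concrete with a minimal sketch in which the tenant ID comes from the authenticated request and is never read from model-generated arguments. The names `TenantContext`, `CrossTenantError`, and `resolve_tenant_path`, and the `/data/<tenant_id>/` layout, are assumptions for illustration and not part of any SDK.

```python
import posixpath
from dataclasses import dataclass


class CrossTenantError(PermissionError):
    """Raised when a tool call would leave the caller's tenant boundary."""


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str  # taken from the authenticated request, never from the model
    user_id: str


def resolve_tenant_path(ctx: TenantContext, requested: str) -> str:
    """Return a normalized path, or raise if it escapes the tenant prefix."""
    prefix = f"/data/{ctx.tenant_id}/"
    # Normalize first so "../" segments cannot step out of the prefix.
    normalized = posixpath.normpath(posixpath.join(prefix, requested))
    if not (normalized + "/").startswith(prefix):
        raise CrossTenantError(f"{requested!r} is outside {prefix}")
    return normalized
```

Because the prefix is derived from the context object, a prompt-injected argument such as `../tenant_b/secrets.txt` is rejected regardless of what the model was persuaded to request.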

Architecture: Three Isolation Models

There are three patterns for structuring multi-tenant AI agents, each with different cost-isolation tradeoffs:

isolation_models.txt

Model 1: Fully Siloed (highest cost, strongest isolation)
┌──────────────────────────────┬──────────────────────────────┐
│ Tenant A                     │ Tenant B                     │
│ ┌──────────────────────────┐ │ ┌──────────────────────────┐ │
│ │ Agent Instance           │ │ │ Agent Instance           │ │
│ │ Vector DB (dedicated)    │ │ │ Vector DB (dedicated)    │ │
│ │ File Storage (dedicated) │ │ │ File Storage (dedicated) │ │
│ │ Memory Store (dedicated) │ │ │ Memory Store (dedicated) │ │
│ │ Tool Credentials (own)   │ │ │ Tool Credentials (own)   │ │
│ └──────────────────────────┘ │ └──────────────────────────┘ │
└──────────────────────────────┴──────────────────────────────┘
Cost: highest per tenant  Blast radius: single tenant

Model 2: Shared Infra, Logical Isolation (balanced)
┌─────────────────────────────────────────────────────────┐
│                  Shared Infrastructure                  │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Agent Runtime (shared, context-injected per tenant) │ │
│ └──────────────────────┬──────────────────────────────┘ │
│                        │                                │
│ ┌──────────────┐ ┌─────┴──────────┐ ┌──────────────┐    │
│ │ Vector DB    │ │ Veto Policy    │ │ File Storage │    │
│ │ (namespaced) │ │ Engine         │ │ (prefixed)   │    │
│ │ tenant_a/*   │ │ per-tenant     │ │ /tenant_a/*  │    │
│ │ tenant_b/*   │ │ policies       │ │ /tenant_b/*  │    │
│ └──────────────┘ └────────────────┘ └──────────────┘    │
└─────────────────────────────────────────────────────────┘
Cost: lower per tenant  Blast radius: controlled by policy

Model 3: Fully Shared (lowest cost, weakest isolation)
┌─────────────────────────────────────────────────────────┐
│ Single shared agent, single DB, tenant_id column only  │
│ ⚠ NOT RECOMMENDED for AI agents: too many leak vectors │
└─────────────────────────────────────────────────────────┘

Model 2 is where most production systems land. Shared infrastructure keeps costs manageable, and logical isolation through a policy engine blocks cross-tenant access according to the configured policies, without the operational burden of managing hundreds of isolated deployments.

Per-Tenant Policies with Veto

Different tenants have different security requirements. An enterprise customer with a SOC 2 evidence requirement needs approval workflows and reviewable decision records. A developer tenant running locally may only need basic rate limiting. Veto lets you define per-tenant policies that the same agent runtime evaluates at execution time:

policies/tenant-regulated-01.yaml

# Regulated tenant: Enterprise plan, SOC 2 evidence workflow
name: regulated-tenant-01
tenant_id: tenant_regulated_01
plan: enterprise

rules:
  - tool: query_database
    conditions:
      # Enforce tenant scoping on every query
      - match:
          # Backslashes are doubled because \s is not a valid escape in a double-quoted YAML string
          arguments.query: "tenant_id\\s*=\\s*'tenant_regulated_01'"
        action: allow
      - match:
          arguments.query: ".*"
        action: deny
        reason: "Query must include tenant_id = 'tenant_regulated_01'"

  - tool: send_email
    constraints:
      rate_limit: 50/hour
    conditions:
      - match:
          # Escape the dot so it matches a literal "." rather than any character
          arguments.to: "@approved\\.example$"
        action: allow
      - match:
          arguments.to: ".*"
        action: require_approval
        approval:
          channel: approval_channel
          webhook: "https://hooks.slack.com/services/<workspace>/<channel>/<secret>"
          timeout: 600s

  - tool: access_file
    conditions:
      - match:
          arguments.path: "^/data/tenant_regulated_01/"
        action: allow
      - match:
          arguments.path: "^/data/"
        action: deny
        reason: "Cross-tenant file access denied"

# Top-level keys: they cannot sit at the same indent as the rules list items
default_action: deny
logging:
  level: full
  retention: 3years

policies/tenant-local-01.yaml

# Local tenant: Open Source, basic controls
name: local-tenant-01
tenant_id: tenant_local_01
plan: open_source

rules:
  - tool: query_database
    conditions:
      - match:
          # Backslashes are doubled because \s is not a valid escape in a double-quoted YAML string
          arguments.query: "tenant_id\\s*=\\s*'tenant_local_01'"
        action: allow
      - match:
          arguments.query: ".*"
        action: deny

  - tool: send_email
    constraints:
      rate_limit: 10/hour  # lower limit for Open Source
    action: allow

  - tool: access_file
    conditions:
      - match:
          arguments.path: "^/data/tenant_local_01/"
        action: allow
      - match:
          arguments.path: ".*"
        action: deny

# Top-level keys: they cannot sit at the same indent as the rules list items
default_action: deny
logging:
  level: decisions_only  # reduced logging for Open Source
  retention: 90days
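
The two policies above list a narrow allow condition before a broad deny, which only works if conditions are evaluated top-down and the first match wins. The sketch below illustrates that reading with Python's `re` module. It is an assumption about the evaluation order, not Veto's actual engine, and `RULES` and `evaluate` are hypothetical names.

```python
import re
from typing import Any

# Hypothetical in-memory form of the access_file rule from the regulated policy.
RULES = {
    "access_file": [
        {"field": "path", "pattern": r"^/data/tenant_regulated_01/", "action": "allow"},
        {"field": "path", "pattern": r"^/data/", "action": "deny"},
    ],
}
DEFAULT_ACTION = "deny"


def evaluate(tool: str, arguments: dict[str, Any]) -> str:
    """Return the action of the first matching condition, else the default."""
    for cond in RULES.get(tool, []):
        value = str(arguments.get(cond["field"], ""))
        if re.search(cond["pattern"], value):
            return cond["action"]
    return DEFAULT_ACTION


print(evaluate("access_file", {"path": "/data/tenant_regulated_01/report.csv"}))
print(evaluate("access_file", {"path": "/data/tenant_local_01/report.csv"}))
print(evaluate("delete_everything", {}))
```

One caveat under this reading: a prefix regex also matches `/data/tenant_regulated_01/../tenant_local_01/x`, so paths should be normalized before they are evaluated.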

The Runtime: Tenant Context Injection

The agent runtime is shared. For governed requests, Veto receives the tenant context and evaluates the correct policy. There is no per-tenant branching in your agent code: the same protect() call handles tenant-specific policy lookup.

multi_tenant_runtime.py

import os

import anthropic
from veto import Veto, Decision

# SHARED_TOOLS, get_tenant_plan(), and execute_tool() are assumed to be
# defined elsewhere in the application.

# Async client, so the awaited model call does not block the event loop.
client = anthropic.AsyncAnthropic()
veto = Veto(api_key=os.environ["VETO_API_KEY"], project="multi-tenant-agent")

async def handle_agent_request(user_message: str, tenant_id: str, user_id: str):
    """Shared agent code. Veto handles policy routing."""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = await client.messages.create(
            model=os.environ["ANTHROPIC_MODEL"],
            max_tokens=4096,
            tools=SHARED_TOOLS,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            return response

        tool_blocks = [b for b in response.content if b.type == "tool_use"]
        tool_results = []

        for block in tool_blocks:
            # Veto looks up the tenant's policy and evaluates against it
            decision = veto.protect(
                tool=block.name,
                arguments=block.input,
                context={
                    "tenant_id": tenant_id,  # this determines which policy applies
                    "user_id": user_id,
                    "plan": get_tenant_plan(tenant_id),
                }
            )

            if decision.action == Decision.DENY:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"BLOCKED: {decision.reason}",
                    "is_error": True,
                })
            elif decision.action == Decision.APPROVAL_REQUIRED:
                approval = veto.wait_for_approval(
                    decision_id=decision.id, timeout=600
                )
                if approval.granted:
                    result = await execute_tool(block.name, block.input, tenant_id)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
                else:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"DENIED: {approval.reason}",
                        "is_error": True,
                    })
            else:
                result = await execute_tool(block.name, block.input, tenant_id)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

Vector Database Isolation

Vector databases are the highest-risk point of cross-tenant leakage. Semantic search is approximate by design. Two approaches reduce leakage risk:

Namespaces: Create a unique namespace per tenant and scope every query to it. A query never sees vectors outside its namespace, so this leaves the smallest cross-tenant result surface. It costs more to operate (one namespace or index per tenant) but gives the strongest isolation.
Metadata filtering: Stamp every vector with a tenant_id metadata field and enforce a mandatory filter on each query. Lower cost but requires discipline: each write must include the metadata, each read must include the filter. A single missing filter leaks data.

Veto enforces the second approach at the policy level. If an agent's vector_search tool call does not include the correct tenant filter in its arguments, the call is denied before it reaches the database.
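
A minimal in-memory sketch of the metadata-filtering approach follows. It assumes a `filter` argument carrying `tenant_id` on the `vector_search` tool call. That argument shape, and the names `StoredVector`, `filter_is_valid`, and `tenant_search`, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    tenant_id: str  # stamped on every write
    embedding: tuple[float, ...]


def filter_is_valid(arguments: dict, tenant_id: str) -> bool:
    """True only if the tool call pins its filter to the caller's tenant."""
    return arguments.get("filter", {}).get("tenant_id") == tenant_id


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tenant_search(index, tenant_id: str, query, top_k: int = 3) -> list[str]:
    """Rank only the caller's vectors; the tenant filter is not optional."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every vector query")
    # Filter before ranking so other tenants' vectors never become candidates.
    candidates = [v for v in index if v.tenant_id == tenant_id]
    ranked = sorted(candidates, key=lambda v: cosine(v.embedding, query), reverse=True)
    return [v.doc_id for v in ranked[:top_k]]
```

Filtering before ranking matters here: if the filter were applied after taking the top results, another tenant's nearer vectors could crowd out the caller's own documents or leak through a bug in the post-filter.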

Rate Limiting Per Tenant

Global rate limits are too blunt for multi-tenant systems. If your limit is 1,000 tool calls per hour across all tenants, one noisy tenant can starve the others. Veto enforces rate limits per tenant, per tool, tracked by tenant context:

rate_limiting.yaml

# Rate limits scale with plan tier
rate_limits:
  team_cloud:
    query_database: 100/hour
    send_email: 10/hour
    vector_search: 200/hour
    total_tool_calls: 500/hour

  vendor_growth:
    query_database: 1000/hour
    send_email: 100/hour
    vector_search: 2000/hour
    total_tool_calls: 5000/hour

  enterprise:
    query_database: 10000/hour
    send_email: 1000/hour
    vector_search: 20000/hour
    total_tool_calls: 50000/hour
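
To show why keying the counter by tenant prevents one tenant from starving another, here is a small sliding-window limiter. It is a sketch, not Veto's implementation: `TenantRateLimiter` is a hypothetical name, the clock is passed in explicitly to keep the example deterministic, and the aggregate `total_tool_calls` limit is left out for brevity.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # the limits above are expressed per hour


class TenantRateLimiter:
    def __init__(self, limits: dict[str, int], window: float = WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        # One call history per (tenant, tool) pair, so tenants never share a budget.
        self.calls: dict[tuple[str, str], deque] = defaultdict(deque)

    def allow(self, tenant_id: str, tool: str, now: float) -> bool:
        limit = self.limits.get(tool)
        if limit is None:
            return False  # tools without a configured limit are denied
        history = self.calls[(tenant_id, tool)]
        # Drop calls that have aged out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= limit:
            return False
        history.append(now)
        return True
```

With a limit of two emails per hour, a third call from one tenant inside the window is refused while another tenant's first call at the same moment still succeeds.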

Decision Records Per Tenant

Enterprise tenants expect dedicated decision records. When a regulated customer's compliance team asks "show me each governed action the agent took on our data during the review window," you need to produce that report without reconstructing it from scattered service traces. Veto's decision records are tenant-scoped: each record includes the tenant context, and records can be exported per tenant for evidence review.
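
The shape of such a record and a per-tenant export might look like the sketch below. The field list follows the evidence items named at the top of this article. The SHA-256 hash chain is one possible way to make an export tamper-evident and is an assumption here, not a description of Veto's record format. `DecisionRecord` and `export_for_tenant` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    tenant_id: str
    actor: str
    tool: str
    arguments: dict
    policy_version: str
    decision: str
    timestamp: str  # ISO 8601, UTC
    approver: Optional[str] = None


def export_for_tenant(records, tenant_id: str) -> list[dict]:
    """Return one tenant's records, each chained to the previous by SHA-256."""
    exported, prev_hash = [], "0" * 64
    for record in records:
        if record.tenant_id != tenant_id:
            continue  # other tenants' records never enter the export
        body = asdict(record)
        payload = json.dumps(body, sort_keys=True) + prev_hash
        prev_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        exported.append({**body, "chain_hash": prev_hash})
    return exported
```

A reviewer can recompute each `chain_hash` from the record body and the previous hash, so a removed or edited record breaks the chain from that point on.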

Defense in Depth

No single isolation mechanism is sufficient. A production multi-tenant agent system layers defenses:

Network layer: Tenant-scoped credentials for external APIs. An agent acting on behalf of Tenant A cannot use Tenant B's API keys.
Storage layer: Namespaced or prefixed access to files, vectors, and databases. The storage system enforces boundaries independently.
Policy layer (Veto): Runtime authorization on each governed tool call. Even if the storage layer has a misconfiguration, the policy engine blocks cross-tenant access.
Monitoring layer: Anomaly detection on cross-tenant access patterns. If Tenant A's agent suddenly starts querying paths outside its prefix, alert immediately.

If a tenant-scoped credential is misissued, the storage layer still enforces its own boundary. If the storage layer is misconfigured, the policy layer denies the cross-tenant call. If a policy is wrong, monitoring surfaces the anomaly. Each layer reduces the blast radius.
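
The monitoring layer can start very simply. The sketch below counts file accesses that fall outside a tenant's prefix and signals when a threshold is reached. `CrossTenantMonitor`, the `/data/<tenant_id>/` layout, and the default threshold are assumptions for illustration. A production system would feed this from decision records and route alerts to an on-call channel.

```python
from collections import Counter


class CrossTenantMonitor:
    def __init__(self, threshold: int = 1):
        self.threshold = threshold
        self.violations: Counter = Counter()

    def observe(self, tenant_id: str, path: str) -> bool:
        """Record an access; return True when an alert should fire."""
        if path.startswith(f"/data/{tenant_id}/"):
            return False  # inside the tenant's own prefix
        self.violations[tenant_id] += 1
        return self.violations[tenant_id] >= self.threshold
```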

First governed call

Adding multi-tenant support to an existing agent takes two changes in Veto: include tenant_id in your protect() context and define tenant-specific policies. Your agent code stays identical across tenants.