Security guide

How to block data exfiltration from agents

The data-exfiltration story for agents is not about writes; it is about reads. A compromised or jailbroken agent rarely drops the table. It pages through it, asks for one more column, and pipes the result into an outbound channel the policy was not watching. The fix lives at the argument layer: cap row counts on reads, deny PII columns by default, enforce tenant boundaries, and rate-limit the bulk-export and outbound paths. Four YAML rules cover the pattern; the steps below write them and wire them into your tools in Python.

  • Row caps on every read tool so a single call cannot exfiltrate an entire table.
  • PII column blocking that denies SSN, password hash, and raw card data by default.
  • Tenant boundary enforcement so an agent cannot read across customer lines.
  • Rate limits on bulk exports and outbound channels with require_approval for over-the-line traffic.

Step 1: Cap row counts on every read

The first rule belongs at the schema boundary. Make limit a required argument on every read tool and bound it in policy. A search with no limit is a paginated dump waiting to happen. The policy file below holds all four rules; the first one denies any search_orders call without a limit or with a limit above 100. Adjust the cap to your data shape.

```yaml
# policies/exfiltration.yaml
- name: cap_row_counts_on_reads
  match:
    tool: search_orders
  rules:
    - if: args.limit == null or args.limit > 100
      then: deny
    - then: allow

- name: block_pii_columns_unless_explicit
  match:
    tool: query_users
  rules:
    - if: contains_any(args.columns, ["ssn", "tax_id", "password_hash", "raw_card_number"])
      then: deny
    - then: allow

- name: tenant_scoped_search
  match:
    tool: search_orders
  rules:
    - if: args.tenant_id != context.tenant_id
      then: deny
    - then: allow

- name: rate_limit_full_table_reads
  match:
    tool: export_csv
  rules:
    - if: rate.count("export_csv", "agent.id", "1h") > 3
      then: require_approval
    - if: args.row_estimate > 10000
      then: require_approval
    - then: allow
```
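To make the first two conditions concrete, here is a minimal Python sketch that mirrors them as plain functions, for example as a local pre-check or as a unit test of what you expect the policy to do. It is not the Veto evaluator: check_row_cap, check_pii_columns, and MAX_ROWS are hypothetical names, and lowercasing column names before comparing is a small hardening that the YAML contains_any rule may or may not already perform.

```python
from typing import Optional, Sequence

# Hypothetical local mirror of cap_row_counts_on_reads and
# block_pii_columns_unless_explicit; this is not the Veto policy engine.
MAX_ROWS = 100
PII_COLUMNS = frozenset({"ssn", "tax_id", "password_hash", "raw_card_number"})


def check_row_cap(limit: Optional[int]) -> str:
    """Deny a read that has no limit or a limit above MAX_ROWS."""
    if limit is None or limit > MAX_ROWS:
        return "deny"
    return "allow"


def check_pii_columns(columns: Sequence[str]) -> str:
    """Deny if any requested column is on the PII list (case-insensitive)."""
    if any(c.lower() in PII_COLUMNS for c in columns):
        return "deny"
    return "allow"


print(check_row_cap(None))                  # deny
print(check_row_cap(50))                    # allow
print(check_row_cap(500))                   # deny
print(check_pii_columns(["id", "SSN"]))     # deny
print(check_pii_columns(["id", "status"]))  # allow
```

Note that a missing limit is treated exactly like an oversized one: both are denied, so the model cannot dodge the cap by omitting the argument.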

The third rule enforces the tenant boundary: it rejects any read where args.tenant_id does not match context.tenant_id. The agent can be tricked into asking for someone else's data; the policy cannot.

Step 2: Wire the wrap in Python

Every read tool routes through veto.decide with the tenant id in both args and context. The duplication is intentional: it lets the policy check whether the agent is asking for its own tenant or a different one. The SQL also takes the tenant id as a bind parameter, so the query stays tenant-scoped at the database layer even if the policy is misconfigured.

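```py
import os
from veto_sdk import Veto

veto = Veto(api_key=os.environ["VETO_API_KEY"])

# `db` is your database handle (a DB-API style connection using pyformat
# parameters such as %(tenant_id)s); it is not defined in this guide.

def search_orders(user_input_id: str, agent_id: str, tenant_id: str, limit: int = 50):
    args = {
        "tenant_id": tenant_id,
        "user_id": user_input_id,
        "limit": limit,
    }

    # In production, context.tenant_id should come from the authenticated
    # session and args.tenant_id from the tool call. Both use tenant_id here
    # for brevity, so the tenant rule only fires once the two can differ.
    decision = veto.decide(
        tool="search_orders",
        args=args,
        agent={"id": agent_id, "role": "support"},
        context={"tenant_id": tenant_id, "source": "external_user"},
    )

    # Stop on anything but an explicit allow (deny or require_approval).
    if decision.outcome != "allow":
        return {"error": decision.reason}

    # The tenant id is also a bind parameter: defense in depth at the SQL layer.
    return db.execute(
        "SELECT id, status, amount FROM orders "
        "WHERE tenant_id = %(tenant_id)s "
        "  AND user_id = %(user_id)s "
        "LIMIT %(limit)s",
        args,
    )
```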

See the multi-tenant isolation guide for the broader pattern around tenant id propagation.

Step 3: Block outbound channels

Reads are one half of exfiltration; the other half is the outbound channel. If the agent can read 50 rows but cannot send them anywhere, the read is contained. Two rules cover most cases: email to any address outside the approved domain goes to human approval, and webhook calls to any URL that is not allow-listed are denied. Large email bodies and any attachment also escalate to a human.

```yaml
# Outbound exfil: block agents from emailing customer data to themselves.

- name: outbound_email_only_to_org
  match:
    tool: send_email
  rules:
    - if: not args.to.endsWith("@approved.example")
      then: require_approval
    - if: args.body.length > 50000
      then: require_approval
    - if: args.attachments != null and args.attachments.length > 0
      then: require_approval
    - then: allow

- name: webhook_url_must_be_allow_listed
  match:
    tool: post_webhook
  rules:
    - if: not url_in_allow_list(args.url)
      then: deny
    - then: allow
```
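A plain suffix test on the recipient field is easy to get subtly wrong: address case varies, a single string can hold several comma-separated recipients, and "attacker@evil.example, ops@approved.example" still ends with "@approved.example". Below is a minimal Python sketch of a stricter recipient check, assuming to is a list of address strings and approved.example is the only internal domain. recipients_all_internal and APPROVED_DOMAINS are hypothetical; you might run such a check in the send_email wrapper, or use it to decide what the policy expression should really test.

```python
from email.utils import getaddresses

# Hypothetical: the only domain agent email may reach without approval.
APPROVED_DOMAINS = {"approved.example"}


def recipients_all_internal(to: list[str]) -> bool:
    """True only if every parsed recipient address is on an approved domain."""
    # getaddresses splits comma-separated fields and strips display names.
    parsed = getaddresses(to)
    if not parsed:
        return False
    for _name, addr in parsed:
        # Compare the part after the last "@" exactly, ignoring case.
        local, sep, domain = addr.rpartition("@")
        if not sep or not local or domain.lower() not in APPROVED_DOMAINS:
            return False
    return True
```

Usage: recipients_all_internal(["Ops <ops@approved.example>"]) passes, while recipients_all_internal(["attacker@evil.example, ops@approved.example"]) fails because each recipient is checked on its own.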

The webhook allow-list is the part attackers usually target. Without it, a jailbroken agent can be told to POST customer data to any URL on the open internet. With it, the agent is limited to the destinations you allow-listed at deploy time.
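The policy calls url_in_allow_list, but this guide does not show how it matches URLs. The following is a minimal Python sketch of one reasonable matching rule, not Veto's built-in: parse the URL and compare the exact hostname, require https on the default port, and reject embedded credentials. The function name mirrors the policy for readability, and the hosts in ALLOWED_WEBHOOK_HOSTS are placeholders. Matching parsed hostnames instead of string prefixes blocks the classic bypass where https://hooks.internal.example.evil.example passes a startswith check.

```python
from urllib.parse import urlsplit

# Placeholder destinations; replace with the hosts you allow at deploy time.
ALLOWED_WEBHOOK_HOSTS = {"hooks.internal.example", "alerts.partner.example"}


def url_in_allow_list(url: str) -> bool:
    """Allow only https URLs whose exact hostname is on the allow-list."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    # Reject other schemes and user:pass@host tricks.
    if parts.scheme != "https" or parts.username or parts.password:
        return False
    if port not in (None, 443):
        return False
    # urlsplit lowercases hostname; compare exactly, with no suffix matching.
    return parts.hostname in ALLOWED_WEBHOOK_HOSTS
```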

Step 4: Mask sensitive columns

Some queries need the email or phone column to function but should not expose the raw value to the model. The policy decides whether a column is reachable at all, and the Python wrapper substitutes a masked version: the agent receives mask(email) AS email, and the original value never crosses the model boundary. This pairs with the column-deny rule from Step 1.

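```py
# Hypothetical schema: the only columns this wrapper will ever select.
ALLOWED_COLUMNS = {"id", "name", "email", "phone", "created_at"}
SENSITIVE_COLUMNS = {"email", "phone"}

def query_users(columns: list, where: dict, agent_id: str, tenant_id: str):
    decision = veto.decide(
        tool="query_users",
        args={"columns": columns, "where": where, "tenant_id": tenant_id},
        agent={"id": agent_id, "role": "support"},
        context={"tenant_id": tenant_id},
    )

    # Only an explicit allow proceeds; require_approval must not fall through.
    if decision.outcome != "allow":
        return {"error": decision.reason}

    # Column names are interpolated into the SQL text, so reject anything
    # outside the allow-list before building the query.
    unknown = [c for c in columns if c not in ALLOWED_COLUMNS]
    if unknown:
        return {"error": f"unknown columns: {unknown}"}

    # mask() is assumed to be a SQL function defined in your database.
    select_cols = [f"mask({c}) AS {c}" if c in SENSITIVE_COLUMNS else c for c in columns]

    # where is {"sql": "...", "params": {...}} built by trusted code, never by
    # the model. Parenthesize it so an OR cannot escape the tenant filter, and
    # set tenant_id last so where["params"] cannot override it.
    return db.execute(
        f"SELECT {', '.join(select_cols)} FROM users "
        "WHERE tenant_id = %(tenant_id)s "
        "AND (" + where["sql"] + ")",
        {**where["params"], "tenant_id": tenant_id},
    )
```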

Masking happens downstream of the policy decision. The policy decides which columns are reachable at all, and the wrapper code decides what shape they take when they reach the model.
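If your database has no mask() function, the same idea can run in the wrapper after the query returns, as long as it runs before results reach the model. A minimal sketch, assuming rows come back as dicts; mask_email, mask_phone, and mask_row are hypothetical helpers, and the masking format is one arbitrary choice.

```python
def mask_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = value.rpartition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def mask_phone(value: str) -> str:
    """Replace all but the last two digits with '*'; short values fully masked."""
    digits = [ch for ch in value if ch.isdigit()]
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 2) + "".join(digits[-2:])


# Which field gets which masker; anything not listed passes through unchanged.
MASKERS = {"email": mask_email, "phone": mask_phone}


def mask_row(row: dict) -> dict:
    """Return a copy of row with sensitive, non-null fields masked."""
    return {
        k: MASKERS[k](v) if k in MASKERS and v is not None else v
        for k, v in row.items()
    }


print(mask_row({"id": 7, "email": "jane@example.com", "phone": "+1 555-0134"}))
# {'id': 7, 'email': 'j***@example.com', 'phone': '******34'}
```

Masking in SQL is still preferable when available, because the raw value then never leaves the database at all.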

Failure modes to catch

Limit defaulted in the tool, not enforced in policy

A default of 50 in the tool definition is helpful, but the model can override it. The bound has to live in the policy, where the agent cannot rewrite the rule.

Missing tenant id in context

Tenant boundary rules need context.tenant_id. Without it, the rule has nothing to compare against and lets the agent ask for any tenant. Make tenant_id a required argument to each tenant-data tool wrapper.
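One way to enforce that at the wrapper layer is a decorator that refuses the call before veto.decide or the database is reached, so a missing id fails closed instead of producing a policy check with nothing to compare. A minimal Python sketch, assuming tenant ids are passed as a tenant_id keyword argument; require_tenant, MissingTenantError, and search_orders_stub are hypothetical names.

```python
import functools


class MissingTenantError(ValueError):
    """Raised when a tenant-data tool is called without a tenant id."""


def require_tenant(fn):
    """Reject calls whose tenant_id keyword is missing, None, or blank."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        tenant_id = kwargs.get("tenant_id")
        if not isinstance(tenant_id, str) or not tenant_id.strip():
            raise MissingTenantError(f"{fn.__name__} requires a non-empty tenant_id")
        return fn(*args, **kwargs)
    return wrapper


@require_tenant
def search_orders_stub(*, tenant_id: str, limit: int = 50) -> dict:
    # Stand-in for a real wrapper that would call veto.decide and the database.
    return {"tenant_id": tenant_id, "limit": limit}
```

The decorator only inspects keyword arguments, which is why the stub makes tenant_id keyword-only; a positional tenant id would be rejected rather than silently let through.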

No outbound allow-list

Outbound is the actual exfil path. If you only restrict reads but leave webhook and email tools wide open, a jailbroken agent will find the channel.

Production checklist

  • Every read tool has a row-cap rule in policy, not just a default limit in the tool signature.
  • A PII column rule lives in policy and denies ssn, tax_id, password_hash, and raw_card_number by default.
  • Tenant boundary rule denies cross-tenant reads on governed tenant-data tools.
  • Webhook destinations and outbound email domains live in an allow-list.
  • Bulk-export rate limit fires require_approval above a sane hourly threshold per agent.
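The bulk-export item corresponds to rate.count("export_csv", "agent.id", "1h") > 3 in rate_limit_full_table_reads. This guide does not describe how Veto counts, so here is a minimal in-process sliding-window sketch of the same idea, assuming a single process and a monotonic clock. SlidingWindowCounter and export_decision are hypothetical, and a multi-instance deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class SlidingWindowCounter:
    """Counts events per key within the last window_seconds."""

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.events: dict[str, deque] = defaultdict(deque)

    def record(self, key: str, now: Optional[float] = None) -> int:
        """Record one event for key and return the count inside the window."""
        now = self.clock() if now is None else now
        q = self.events[key]
        q.append(now)
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


def export_decision(counter: SlidingWindowCounter, agent_id: str,
                    row_estimate: int) -> str:
    """Escalate above 3 exports per window or 10,000 estimated rows."""
    # This count includes the current call; the guide does not say whether
    # Veto's rate.count does.
    count = counter.record(f"export_csv:{agent_id}")
    if count > 3 or row_estimate > 10_000:
        return "require_approval"
    return "allow"
```

With a one-hour window, the fourth export by the same agent inside the hour escalates, and once the earlier timestamps age out the agent is back under the threshold.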

FAQ

Why are read tools the exfiltration risk and not the write tools?

Writes get most of the security budget because they cause visible damage. Reads are the exfiltration channel: a compromised or jailbroken agent rarely deletes the table; it pages through it and emails the contents to an attacker. The right defense is argument-level constraints on reads (row caps, column allow-lists, tenant boundaries) and rate limits on the outbound channels.

What about LLM responses that expose the data in chat?

The chat is not the exfiltration channel; the read tool is. If the agent only saw 50 rows because the policy capped the read, the chat dump is at most 50 rows. Cap the input and the output cap takes care of itself. For high-stakes systems, add output filters on top of the policy as defense in depth.
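A minimal sketch of such an output filter, assuming US-style SSN formatting and 13 to 16 digit card numbers; redact_output and the regexes are illustrative only, will miss other formats, and sit on top of the policy rather than replacing it. The card check applies the Luhn checksum to cut false positives on ordinary long numbers such as order ids.

```python
import re

# SSN in the common 3-2-4 format; cards as 13-16 digits with optional separators.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_output(text: str) -> str:
    """Replace SSN-shaped strings and Luhn-valid card numbers before display."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)

    def _card(m: re.Match) -> str:
        digits = re.sub(r"\D", "", m.group())
        return "[REDACTED-CARD]" if luhn_ok(digits) else m.group()

    return CARD_RE.sub(_card, text)


print(redact_output("SSN 123-45-6789, card 4111 1111 1111 1111"))
# SSN [REDACTED-SSN], card [REDACTED-CARD]
```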

Do I need separate policies for human users and agents?

You do not need separate files, but you should treat them differently. Humans rarely export tens of thousands of rows in a normal day; agents do it in a normal hour. Set tighter row caps and rate limits on the agent role than on the human role. Same YAML file, different match clauses keyed on agent.role.
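A minimal Python sketch of what keying caps on role means, assuming two roles and placeholder numbers; RoleLimits, ROLE_LIMITS, and check_read are hypothetical, and in practice these values belong in the policy's match clauses rather than in code. "support" is the agent role used by the wrappers in this guide, and any unknown role falls back to the strictest caps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleLimits:
    max_rows: int          # per-call row cap
    exports_per_hour: int  # bulk-export budget before approval


# Placeholder numbers: the agent role gets far tighter caps than humans.
ROLE_LIMITS = {
    "human": RoleLimits(max_rows=5_000, exports_per_hour=20),
    "support": RoleLimits(max_rows=100, exports_per_hour=3),
}
# Fail closed: unknown roles get the tightest row cap on file.
STRICTEST = min(ROLE_LIMITS.values(), key=lambda r: r.max_rows)


def check_read(role: str, limit: Optional[int]) -> str:
    """Deny reads with no limit or a limit above the role's cap."""
    caps = ROLE_LIMITS.get(role, STRICTEST)
    if limit is None or limit > caps.max_rows:
        return "deny"
    return "allow"
```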


Bound the read path before the model finds it.