How to block data exfiltration from agents
The data-exfiltration story for agents is not about writes; it is about reads. A compromised or jailbroken agent rarely drops the table. It pages through it, asks for one more column, and pipes the result into an outbound channel the policy was not watching. The fix lives at the argument layer: cap row counts on reads, deny PII columns by default, enforce tenant boundaries, and rate-limit the bulk-export and outbound paths. Four rules in YAML cover the pattern. Write them once, then wire them into your tool wrappers in Python.
What you'll build
- Row caps on every read tool so a single call cannot exfiltrate an entire table.
- PII column blocking that denies SSN, password hash, and raw card data by default.
- Tenant boundary enforcement so an agent cannot read across customer lines.
- Rate limits on bulk exports and outbound channels with require_approval for over-the-line traffic.
Step 1: Cap row counts on every read
The first rule belongs at the schema boundary. Make limit a required argument on every read tool and bound it in policy. A search with no limit is a paginated dump waiting to happen. Below, the rule denies any call without a limit or with a limit above 100. Adjust the cap to your data shape.
# policies/exfiltration.yaml
- name: cap_row_counts_on_reads
  match:
    tool: search_orders
  rules:
    - if: args.limit == null or args.limit > 100
      then: deny
    - then: allow
- name: block_pii_columns_unless_explicit
  match:
    tool: query_users
  rules:
    - if: contains_any(args.columns, ["ssn", "tax_id", "password_hash", "raw_card_number"])
      then: deny
    - then: allow
- name: tenant_scoped_search
  match:
    tool: search_orders
  rules:
    - if: args.tenant_id != context.tenant_id
      then: deny
    - then: allow
- name: rate_limit_full_table_reads
  match:
    tool: export_csv
  rules:
    - if: rate.count("export_csv", "agent.id", "1h") > 3
      then: require_approval
    - if: args.row_estimate > 10000
      then: require_approval
Tenant boundary rules come next. Reject any read where args.tenant_id does not match context.tenant_id. The agent can be tricked into asking for someone else's data; the policy is not.
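To unit-test the intent of the row-cap and tenant rules without calling the policy service, a local mirror can help. The sketch below is an assumption-laden stand-in, not the policy engine: check_read and MAX_ROWS are hypothetical names, and it fails closed on a non-integer limit and on a missing context tenant, which goes slightly beyond what the YAML states.

```python
from typing import Any, Mapping

MAX_ROWS = 100  # same cap as the cap_row_counts_on_reads rule


def check_read(args: Mapping[str, Any], context: Mapping[str, Any]) -> str:
    """Return "allow" or "deny" for a search_orders-style call.

    Hypothetical local mirror of the YAML rules, not the policy engine itself.
    """
    limit = args.get("limit")
    # Row cap: a missing, non-integer, or oversized limit is denied.
    if not isinstance(limit, int) or isinstance(limit, bool) or limit > MAX_ROWS:
        return "deny"
    # Tenant boundary: the requested tenant must match the caller's tenant.
    # A missing context tenant is treated as a denial (assumption: fail closed).
    caller_tenant = context.get("tenant_id")
    if caller_tenant is None or args.get("tenant_id") != caller_tenant:
        return "deny"
    return "allow"
```

A call such as check_read({"limit": 50, "tenant_id": "t1"}, {"tenant_id": "t1"}) is allowed, while dropping the limit or changing the tenant in args produces a denial.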
Step 2: Wire the wrap in Python
Every read tool routes through veto.decide with the tenant id in both args and context. The duplication is intentional: it lets the policy check whether the agent is asking for its own tenant or a different one. The actual SQL also takes the tenant id as a bind parameter, so the SQL layer still enforces tenant isolation as defense in depth even if the policy is misconfigured.
import os
from veto_sdk import Veto

veto = Veto(api_key=os.environ["VETO_API_KEY"])

# db is assumed to be your application's existing database handle.
def search_orders(user_input_id: str, agent_id: str, tenant_id: str, limit: int = 50):
    args = {
        "tenant_id": tenant_id,
        "user_id": user_input_id,
        "limit": limit,
    }
    decision = veto.decide(
        tool="search_orders",
        args=args,
        agent={"id": agent_id, "role": "support"},
        context={"tenant_id": tenant_id, "source": "external_user"},
    )
    # Fail closed: anything other than an explicit allow stops here.
    if decision.outcome != "allow":
        return {"error": decision.reason}
    # Tenant id, user id, and limit are all bind parameters, never string-formatted.
    return db.execute(
        "SELECT id, status, amount FROM orders "
        "WHERE tenant_id = %(tenant_id)s "
        " AND user_id = %(user_id)s "
        "LIMIT %(limit)s",
        args,
    )
See the multi-tenant isolation guide for the broader pattern around tenant id propagation.
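The wrapper above treats anything other than allow as a stop. One way to make that fail-closed behavior reusable across tools is a small helper, sketched here under assumptions: Decision is a stand-in dataclass carrying only the outcome and reason fields used in this guide, and run_if_allowed is a hypothetical name, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Decision:
    """Stand-in for the decision object; only the two fields used in this guide."""
    outcome: str
    reason: str = ""


def run_if_allowed(decision: Decision, action: Callable[[], Any]) -> Dict[str, Any]:
    """Run `action` only on an explicit allow; every other outcome fails closed."""
    if decision.outcome == "allow":
        return {"ok": True, "result": action()}
    if decision.outcome == "require_approval":
        # The action is not executed; the caller can queue it for human review.
        return {"ok": False, "pending_approval": True, "error": decision.reason}
    # "deny" and any unrecognised outcome both land here.
    return {"ok": False, "pending_approval": False, "error": decision.reason}
```

Passing the database call as a zero-argument callable means the query is never even constructed unless the decision is an explicit allow.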
Step 3: Block outbound channels
Reads are one half of exfil; the other half is the outbound channel. If the agent can read 50 rows but cannot send them anywhere, the read is contained. Two rules cover most cases: let outbound email through only to internal domains, and allow-list webhook destinations. Email to any other domain, large emails, and non-empty attachments escalate to a human.
# Outbound exfil: block agents from emailing customer data to themselves.
- name: outbound_email_only_to_org
  match:
    tool: send_email
  rules:
    - if: not args.to.endsWith("@approved.example")
      then: require_approval
    - if: args.body.length > 50000
      then: require_approval
    - if: args.attachments != null and args.attachments.length > 0
      then: require_approval
- name: webhook_url_must_be_allow_listed
  match:
    tool: post_webhook
  rules:
    - if: not url_in_allow_list(args.url)
      then: deny
The webhook allow-list is the bit attackers usually target. Without it, a jailbroken agent can be told to POST customer data to any URL on the open internet. With it, the agent is bounded to the destinations you allow-listed at deploy time.
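How the allow-list check compares URLs matters, because substring or prefix matching is easy to bypass. The following is a minimal Python sketch of one way such a check could work, assuming exact hostname matching over https. It is not the policy engine's built-in url_in_allow_list, and the host names in ALLOWED_WEBHOOK_HOSTS are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; in practice this would come from deploy-time config.
ALLOWED_WEBHOOK_HOSTS = frozenset({"hooks.internal.example", "alerts.internal.example"})


def url_in_allow_list(url: str, allowed_hosts: frozenset = ALLOWED_WEBHOOK_HOSTS) -> bool:
    """Return True only for https URLs whose exact hostname is allow-listed."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL: fail closed
    if parts.scheme != "https" or host is None:
        return False
    # Reject embedded credentials such as https://user@host/, a common way
    # to disguise the real destination.
    if parts.username is not None or parts.password is not None:
        return False
    # Exact match on the parsed hostname, never a substring or suffix test.
    return host.lower() in allowed_hosts
```

Parsing the URL first and comparing the hostname exactly means that look-alikes such as hooks.internal.example.attacker.test, or an allow-listed name hidden in the query string, do not pass.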
Step 4: Mask sensitive columns
Some queries need the email or phone column to function but should not expose the raw value to the model. The policy can deny the raw column and the Python wrapper substitutes a masked version. The agent sees mask(email) as email; the original never crosses the model boundary. This pairs with the column-deny rule from step 1.
def query_users(columns: list, where: dict, agent_id: str, tenant_id: str):
    decision = veto.decide(
        tool="query_users",
        args={"columns": columns, "where": where, "tenant_id": tenant_id},
        agent={"id": agent_id, "role": "support"},
        context={"tenant_id": tenant_id},
    )
    # Fail closed: anything other than an explicit allow stops here.
    if decision.outcome != "allow":
        return {"error": decision.reason}
    sensitive = {"email", "phone"}
    # Column names are interpolated into the SQL text, so validate them
    # against your known schema before this point.
    select_cols = [c for c in columns if c not in sensitive] + [
        f"mask({c}) AS {c}" for c in columns if c in sensitive
    ]
    # where is a dict holding a SQL fragment ("sql") and its bind parameters ("params").
    return db.execute(
        f"SELECT {', '.join(select_cols)} FROM users "
        "WHERE tenant_id = %(tenant_id)s "
        "AND " + where["sql"],
        {"tenant_id": tenant_id, **where["params"]},
    )
Masking is a downstream concern after the policy decision. The policy decides which columns are even reachable, and the code below it decides what shape they reach the model in.
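The wrapper above relies on a mask() function in the database. Where no such function exists, the same shaping can be done in Python after the fetch; the raw value then lives briefly in the wrapper process but still never reaches the model. The sketch below is illustrative only: mask_email, mask_phone, and mask_row are hypothetical helpers, and the masking formats are arbitrary choices.

```python
def mask_email(value: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local or not domain:
        return "***"  # not a recognisable address; reveal nothing
    return f"{local[0]}***@{domain}"


def mask_phone(value: str) -> str:
    """Keep only the last two digits of the number."""
    digits = [ch for ch in value if ch.isdigit()]
    if len(digits) < 4:
        return "***"
    return "***" + "".join(digits[-2:])


# Same sensitive columns as in the query_users wrapper.
MASKERS = {"email": mask_email, "phone": mask_phone}


def mask_row(row: dict) -> dict:
    """Return a copy of `row` with sensitive columns masked; other columns pass through."""
    masked = {}
    for column, value in row.items():
        if column in MASKERS and value is not None:
            masked[column] = MASKERS[column](str(value))
        else:
            masked[column] = value
    return masked
```

For example, mask_row({"id": 7, "email": "alice@example.com"}) keeps the id and returns the email as a***@example.com.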
Failure modes to catch
Limit defaulted in the tool, not enforced in policy
A default of 50 in the tool definition is helpful, but the model can override it. The bound has to live in the policy, where the agent cannot rewrite the rule.
Missing tenant id in context
Tenant boundary rules need context.tenant_id. Without it, the rule has nothing to compare against and lets the agent ask for any tenant. Make tenant_id a required argument to each tenant-data tool wrapper.
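A wrapper can refuse to build the context at all when the tenant id is missing, so the call never reaches the policy with an empty value. This is a minimal sketch under assumptions: MissingTenantError, require_tenant_id, and build_context are hypothetical names, and the context shape follows the search_orders example in this guide.

```python
from typing import Optional


class MissingTenantError(ValueError):
    """Raised when a tenant-data tool is called without a tenant id."""


def require_tenant_id(tenant_id: Optional[str]) -> str:
    """Return a stripped tenant id or raise; never fall back to a default tenant."""
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        raise MissingTenantError("tenant_id is required for tenant-data tools")
    return tenant_id.strip()


def build_context(tenant_id: Optional[str], source: str) -> dict:
    """Build the context dict passed alongside args, refusing an empty tenant."""
    return {"tenant_id": require_tenant_id(tenant_id), "source": source}
```

Raising instead of defaulting keeps a wiring mistake loud: a tool that forgets to pass the tenant fails immediately rather than running with a rule that has nothing to compare against.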
No outbound allow-list
Outbound is the actual exfil path. If you only restrict reads but leave webhook and email tools wide open, a jailbroken agent will find the channel.
Production checklist
- Every read tool has a row-cap rule in policy with a non-default upper bound.
- PII column deny-list lives in policy and denies ssn, password_hash, and raw_card_number by default.
- Tenant boundary rule denies cross-tenant reads on governed tenant-data tools.
- Webhook destinations and outbound email domains live in an allow-list.
- Bulk-export rate limit fires require_approval above a sane hourly threshold per agent.
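For the bulk-export item in the checklist, it can help to see what an hourly per-agent counter involves. The sketch below is a plain in-memory sliding window, not the policy engine's rate.count. SlidingWindowCounter and export_outcome are hypothetical names, the thresholds are copied from the export_csv rule, and two things are assumptions: that the current call counts toward the window, and that a call matching neither condition is allowed.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowCounter:
    """Counts calls per (tool, agent) over a trailing time window."""

    def __init__(self, window_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._events: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def record(self, tool: str, agent_id: str) -> int:
        """Record one call and return how many fall inside the window, this one included."""
        now = self.clock()
        events = self._events[(tool, agent_id)]
        events.append(now)
        cutoff = now - self.window
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


def export_outcome(count_in_window: int, row_estimate: int) -> str:
    """Mirror of the export_csv thresholds: more than 3 per hour or more than 10000 rows."""
    if count_in_window > 3 or row_estimate > 10000:
        return "require_approval"
    return "allow"  # assumption: calls under both thresholds are allowed
```

The injectable clock keeps the counter testable without sleeping. A multi-process deployment would need shared storage rather than this in-memory dict.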
FAQ
Why are read tools the exfiltration risk and not the write tools?
Writes get most of the security budget because they cause visible damage. Reads are the exfiltration channel: a compromised or jailbroken agent rarely deletes the table; it pages through it and emails the contents to an attacker. The right defense is argument-level constraints on reads (row caps, column allow-lists, tenant boundaries) and rate limits on the outbound channels.
What about LLM responses that expose the data in chat?
The chat is not the exfiltration channel; the read tool is. If the agent only saw 50 rows because the policy capped the read, the chat dump is at most 50 rows. Cap the input and the output cap takes care of itself. For high-stakes systems, add output filters on top of the policy as defense in depth.
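An output filter of the kind mentioned above can be as simple as pattern redaction on the text before it is returned. This is a rough sketch, not a complete PII detector: the patterns cover only US-style SSNs and 13 to 19 digit card-like numbers, they will produce false positives on other long digit runs, and redact_text is a hypothetical name.

```python
import re

# US-style SSN: three digits, two digits, four digits, separated by hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-like number: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact_text(text: str) -> str:
    """Replace SSN-like and card-like substrings with fixed placeholders."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    text = CARD_RE.sub("[REDACTED-CARD]", text)
    return text
```

Running the SSN pattern first keeps the two placeholders distinct; short numbers such as order counts pass through unchanged.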
Do I need separate policies for human users and agents?
Treat them differently. Humans rarely export tens of thousands of rows in a normal day; agents do it in a normal hour. Set tighter row caps and rate limits on the agent role than on the human role. Same YAML file, different match clauses keyed on agent.role.
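The role split can also be mirrored in the wrapper layer for testing or as a second check. The sketch below uses made-up numbers and role names: only the support role appears in this guide, human_analyst and every limit value are assumptions, and an unknown role is given no access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadLimits:
    max_rows: int
    exports_per_hour: int


# Hypothetical numbers and role names; only "support" appears in the guide.
LIMITS_BY_ROLE = {
    "support": ReadLimits(max_rows=100, exports_per_hour=3),
    "human_analyst": ReadLimits(max_rows=1000, exports_per_hour=20),
}

# Unknown roles get limits that deny every read.
NO_ACCESS = ReadLimits(max_rows=0, exports_per_hour=0)


def limits_for(role: str) -> ReadLimits:
    """Look up limits by role, falling back to no access for unknown roles."""
    return LIMITS_BY_ROLE.get(role, NO_ACCESS)


def limit_within_cap(role: str, limit: int) -> bool:
    """True when a requested row limit is positive and within the role's cap."""
    return 0 < limit <= limits_for(role).max_rows
```

Defaulting unknown roles to zero keeps a new or misspelled role from silently inheriting the loosest limits.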
Related guides
The tenant-id propagation pattern that makes the boundary rules work.
Contain prompt injection. Defense in depth: the layer above argument-level exfil controls.
Shadow-test AI agent policies. Read-tool caps in particular need shadow time before enforcement.
AI agent permissions. Argument-level permission model.
Multi-tenant AI agents. Long-form on tenant boundaries and why they matter for exfil.
Bound the read path before the model finds it.