Evidence guide

How to record AI agent actions

Any production agent that touches real systems eventually has to answer the same question, what the agent did and why, for three audiences: a regulator during an audit, a customer during an incident, and an on-call engineer at 3 a.m. The preparation is the same in each case: a uniform decision record with actor, action, arguments, outcome, and policy version. This guide shows the schema Veto writes, how to query it for the common audit questions, and how to package a quarterly evidence bundle for review.

  • A uniform decision record schema with tenant, agent, tool, outcome, and policy version on every row.
  • Three indexes for routine audit queries: tenant + time, agent + time, tool + time.
  • An incident-response playbook with canned queries for tenant forensics and breach-attempt search.
  • A quarterly evidence bundle script that tarballs decisions, approvals, and policy snapshots with checksums.

Step 1: Standardize the schema

Veto writes each governed decision in a canonical shape. The fields below are the ones auditors and forensic investigators reach for. Mirror them into your own store only if you have a reason to run joins against other audit data. Otherwise, query the Veto API directly.

sql
-- Reference schema for an external audit-log store
-- (Veto stores this for you, but mirror it locally if you need to.)

CREATE TABLE agent_decisions (
    decision_id  TEXT PRIMARY KEY,
    timestamp  TIMESTAMPTZ NOT NULL,
    org_id  TEXT NOT NULL,
    tenant_id  TEXT NOT NULL,
    agent_id  TEXT NOT NULL,
    agent_role  TEXT,
    actor_user_id  TEXT,
    tool  TEXT NOT NULL,
    args  JSONB NOT NULL,
    outcome  TEXT NOT NULL CHECK (outcome IN ('allow','deny','require_approval')),
    rule_matched  TEXT,
    policy_version  TEXT,
    approval_id  TEXT,
    request_id  TEXT,
    latency_ms  INTEGER,
    source  TEXT
);

CREATE INDEX idx_decisions_tenant_time ON agent_decisions (tenant_id, timestamp DESC);
CREATE INDEX idx_decisions_agent_time  ON agent_decisions (agent_id, timestamp DESC);
CREATE INDEX idx_decisions_tool_time  ON agent_decisions (tool, timestamp DESC);
CREATE INDEX idx_decisions_outcome  ON agent_decisions (outcome) WHERE outcome != 'allow';

The indexes teams often miss are the partial one on outcome != 'allow' and the composite tenant + time and agent + time indexes. Most audit questions filter to a subset of rows; the indexes keep those queries bounded.
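
To try the schema and the partial index locally before committing to a warehouse, a small stand-in can help. The following is a minimal sketch, assuming SQLite in place of Postgres: TIMESTAMPTZ and JSONB are stored as TEXT, only a subset of the columns is mirrored, and the helper names (open_mirror, insert_decision, non_allowed_for_tenant) are hypothetical.

```python
import json
import sqlite3

# SQLite stand-in for the reference schema (assumption: a real mirror would be Postgres).
# Only a subset of columns is shown; timestamps and args are stored as TEXT.
SCHEMA = """
CREATE TABLE agent_decisions (
    decision_id    TEXT PRIMARY KEY,
    timestamp      TEXT NOT NULL,
    tenant_id      TEXT NOT NULL,
    agent_id       TEXT NOT NULL,
    tool           TEXT NOT NULL,
    args           TEXT NOT NULL,
    outcome        TEXT NOT NULL CHECK (outcome IN ('allow','deny','require_approval')),
    policy_version TEXT
);
CREATE INDEX idx_decisions_tenant_time ON agent_decisions (tenant_id, timestamp DESC);
CREATE INDEX idx_decisions_outcome ON agent_decisions (outcome) WHERE outcome != 'allow';
"""


def open_mirror() -> sqlite3.Connection:
    """Create an in-memory mirror with the table and indexes above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_decision(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one decision record; args is serialized to a JSON string."""
    conn.execute(
        "INSERT INTO agent_decisions VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (
            record["decision_id"],
            record["timestamp"],
            record["tenant_id"],
            record["agent_id"],
            record["tool"],
            json.dumps(record["args"]),
            record["outcome"],
            record.get("policy_version"),
        ),
    )


def non_allowed_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Return (decision_id, outcome) for denied or approval-gated rows, newest first."""
    rows = conn.execute(
        "SELECT decision_id, outcome FROM agent_decisions "
        "WHERE tenant_id = ? AND outcome != 'allow' ORDER BY timestamp DESC",
        (tenant_id,),
    )
    return rows.fetchall()
```

The CHECK constraint rejects any outcome outside the three allowed values, so a malformed mirror row fails at insert time rather than surfacing during an audit.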

Step 2: Query by tenant, agent, and tool

Three filter dimensions cover almost every question that lands in your inbox. Tenant scopes to a customer. Agent scopes to a service. Tool scopes to an action surface. The Python preview exposes them through the decisions.list call. Use the same arguments from the CLI and the REST API.

py
import os
from veto_sdk import Veto

veto = Veto(api_key=os.environ["VETO_API_KEY"])

# Every governed action taken on a tenant in the last 24 hours
decisions = veto.decisions.list(
    tenant_id="tn_customer_01",
    since="24h",
    limit=1000,
)

# Every denied attempt by a specific agent
denies = veto.decisions.list(
    agent_id="ag_support_001",
    outcome="deny",
    since="30d",
)

# All large refunds, regardless of outcome
large_refunds = veto.decisions.list(
    tool="refund_order",
    where="args.amount_cents > 50000",
    since="90d",
)

The where clause accepts the same expression grammar as YAML policies, so you can filter on argument fields without exporting the data first.
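
If the decisions have already been exported to JSONL, a similar filter can be approximated offline. This is a minimal sketch, assuming one JSON object per line with the field names from the reference schema; the helper names are hypothetical, and it is a local approximation rather than how Veto evaluates the where expression.

```python
import json
from typing import Iterable, Iterator, List


def iter_decisions(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def large_refunds(lines: Iterable[str], threshold_cents: int = 50000) -> List[dict]:
    """Keep refund_order decisions whose args.amount_cents exceeds the threshold."""
    matches = []
    for decision in iter_decisions(lines):
        if decision.get("tool") != "refund_order":
            continue
        # args may be absent or null in a lossy export; treat that as no match.
        amount = (decision.get("args") or {}).get("amount_cents")
        if isinstance(amount, (int, float)) and amount > threshold_cents:
            matches.append(decision)
    return matches
```

Typical use is to pass an open file handle, for example large_refunds(open("decisions.jsonl")), since iterating a file yields one line at a time.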

Step 3: Build an incident playbook

A customer reports a missing record. The usual first question is "what touched their tenant in the suspect window?" Two queries cover the common shape: everything that ran, and everything the policy gated or denied. Save these in a runbook so on-call does not have to write them at 3 a.m.

sh
# Incident: a customer reports a missing record. Find every agent
# action that touched their tenant in the suspect window.

curl https://api.veto.so/v1/decisions \
  -H "Authorization: Bearer $VETO_API_KEY" \
  -G \
  --data-urlencode "tenant_id=tn_customer_01" \
  --data-urlencode "from=2026-05-10T14:00:00Z" \
  --data-urlencode "to=2026-05-10T16:00:00Z" \
  --data-urlencode "tool=delete_user,delete_order,execute_sql" \
  > forensic_2026-05-10.jsonl

# Same range, but only the actions that the policy gated or denied.
# Useful when looking for an attempted breach.
curl https://api.veto.so/v1/decisions \
  -H "Authorization: Bearer $VETO_API_KEY" \
  -G \
  --data-urlencode "tenant_id=tn_customer_01" \
  --data-urlencode "from=2026-05-10T14:00:00Z" \
  --data-urlencode "to=2026-05-10T16:00:00Z" \
  --data-urlencode "outcome=deny,require_approval" \
  > gated_2026-05-10.jsonl

Pair the output with the prompt and response logs from your observability stack to reconstruct what the model proposed before policy intervened. That is often the highest-signal artifact for a postmortem.
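
The pairing step can be sketched as a join on a shared identifier. The example below is a minimal sketch, assuming that both the exported decisions and your observability log entries carry the same request_id and that timestamps are ISO 8601 UTC strings in one consistent format (so they sort correctly as text). The function names and the shape of the log entries are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def load_jsonl(text: str) -> List[dict]:
    """Parse a JSONL string into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def pair_with_model_logs(decisions: List[dict], model_logs: List[dict]) -> List[dict]:
    """Attach prompt/response log entries to each decision by request_id.

    Returns the decisions in chronological order, each with its matching
    log entries (an empty list when nothing matches).
    """
    logs_by_request: Dict[str, List[dict]] = defaultdict(list)
    for entry in model_logs:
        request_id = entry.get("request_id")
        if request_id:  # entries without an id cannot be joined
            logs_by_request[request_id].append(entry)

    timeline = []
    for decision in sorted(decisions, key=lambda d: d["timestamp"]):
        request_id = decision.get("request_id")
        matched = logs_by_request.get(request_id, []) if request_id else []
        timeline.append({"decision": decision, "model_logs": matched})
    return timeline
```

Decisions with no matching log entries are kept in the timeline on purpose: a gap between what the policy saw and what the observability stack recorded is itself worth noting in a postmortem.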

Step 4: Package quarterly evidence

For evidence review, export a single tarball per quarter with three files: decisions, approvals, and the policy versions that were live during the window. A checksum file makes the bundle verifiable: reviewers can recompute the SHA-256 hashes and confirm that nothing changed between export and review.

sh
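# Quarterly evidence bundle for review.
# Requires GNU date: %q (quarter) and -d are not available in BSD/macOS date.
# The export window is the trailing 90 days, which approximates a calendar quarter.

DEST=evidence/$(date -u +%Y-Q%q)
mkdir -p "$DEST"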

# All denied or approval-gated decisions for the quarter
veto-cli decisions export \
  --from "$(date -u -d '90 days ago' +%FT00:00:00Z)" \
  --to "$(date -u +%FT23:59:59Z)" \
  --outcome deny,require_approval \
  > "$DEST/decisions.jsonl"

# Approvals with approver identity and time-to-decision
veto-cli approvals export \
  --from "$(date -u -d '90 days ago' +%FT00:00:00Z)" \
  --to "$(date -u +%FT23:59:59Z)" \
  > "$DEST/approvals.jsonl"

# Policy snapshots that were live during the quarter
veto-cli policies snapshots \
  --from "$(date -u -d '90 days ago' +%FT00:00:00Z)" \
  --to "$(date -u +%FT23:59:59Z)" \
  > "$DEST/policy-versions.jsonl"

sha256sum "$DEST"/*.jsonl > "$DEST/checksums.txt"
tar -czf "$DEST.tar.gz" "$DEST"

For the SOC 2 control mapping, see the SOC 2 decision-record guide.
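
On the reviewer side, the checksums.txt file written by sha256sum can be re-checked without shell access. This is a minimal sketch, assuming the standard sha256sum line format (hex digest, separator, path) and that the paths in the file are relative to the directory where the tarball was extracted. The function names are hypothetical, and running sha256sum -c against the same file is an equivalent check.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large exports do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(checksums_file: Path, root: Path) -> List[str]:
    """Return the listed paths that are missing or whose hash does not match."""
    failures = []
    for line in checksums_file.read_text().splitlines():
        if not line.strip():
            continue
        # sha256sum writes "<hex>  <path>" (or "<hex> *<path>" in binary mode).
        expected, _, name = line.partition(" ")
        name = name.lstrip(" *")
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(name)
    return failures
```

An empty list means every file listed in checksums.txt is present and unchanged; any other result names the files to investigate.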

Failure modes to catch

No tenant id on the decision

Without tenant_id, you cannot answer the most common audit question. Pass it on the decide call, even in single-tenant deployments. It becomes the boundary you need as soon as a second customer exists.

Lossy argument logging

Logging only the tool name and outcome makes forensics painful. Keep the full argument payload (with sensitive fields redacted; see the SOC 2 guide for the redaction configuration).

No policy snapshot

A decision record without the policy version that produced the decision is half the story. Veto can attach policy_version. If you mirror the data, preserve the field.
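
The three failure modes above can be caught with a simple check over mirrored or exported records. The following is a minimal sketch, assuming records are dicts with the field names from the reference schema; audit_gaps is a hypothetical helper, not part of the Veto SDK.

```python
from typing import List


def audit_gaps(record: dict) -> List[str]:
    """List the audit-relevant gaps in one decision record."""
    gaps = []
    if not record.get("tenant_id"):
        gaps.append("missing tenant_id")
    # An empty object is acceptable (a tool may take no arguments);
    # a missing or non-object payload means arguments were not logged.
    if not isinstance(record.get("args"), dict):
        gaps.append("missing or non-object args")
    if not record.get("policy_version"):
        gaps.append("missing policy_version")
    return gaps
```

Running this over a sample of recent records, for instance as part of a scheduled job, gives an early signal that a new integration is calling decide without tenant_id or that a mirror pipeline is dropping fields.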

Production checklist

  • Every decide call passes tenant_id, even on single-tenant systems.
  • Indexes exist on (tenant_id, timestamp), (agent_id, timestamp), (tool, timestamp).
  • Incident playbook is in the runbook and tested with a tabletop exercise.
  • Quarterly evidence bundle script runs unattended in CI on the first of each quarter.
  • Checksums file is regenerated when the bundle is created and stored next to the tarball.
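
One detail to watch when the bundle job runs on the first day of a quarter: a label derived from the current date names the quarter that has just started, and a trailing 90-day window only approximates the quarter that has just ended. A minimal sketch for computing the exact bounds of the previous calendar quarter follows; previous_quarter is a hypothetical helper, and calendar quarters starting in January are an assumption (fiscal quarters may differ).

```python
from datetime import date, timedelta
from typing import Tuple


def previous_quarter(today: date) -> Tuple[str, date, date]:
    """Return (label, first_day, last_day) of the calendar quarter before today's."""
    current_q = (today.month - 1) // 3 + 1
    if current_q == 1:
        year, quarter = today.year - 1, 4
    else:
        year, quarter = today.year, current_q - 1

    first_month = 3 * (quarter - 1) + 1
    first_day = date(year, first_month, 1)

    # The last day is one day before the start of the following quarter.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, first_month + 3, 1)
    last_day = next_start - timedelta(days=1)

    return f"{year}-Q{quarter}", first_day, last_day
```

The returned dates can be formatted with isoformat() and used for the export window and the bundle directory name.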

FAQ

How is this different from application logs?

Application logs are operational. They rotate on operational schedules and skip fields the system does not need to keep running. Decision records are evidence. They keep the actor, action, outcome, and reason for the configured retention window, should not mutate, and need to answer the question "what did this agent do on the night of X?" The two have different retention, indexing, and access patterns. Run them separately.

Do I need to store the decision records myself?

Not necessarily. Veto can store decision records with a configured retention window and, when configured, with verification metadata, so many teams do not need to own separate decision-record storage on day one. Teams that need joins against other audit sources often mirror the data into their own warehouse. The export commands in step 4 produce JSONL that drops cleanly into Snowflake, BigQuery, or Athena over S3.

What gets logged for an LLM that does not call any tools?

Nothing, by default. Veto records decisions, which happen at the tool-call boundary. If you also want to log prompts and completions, route those through a separate observability layer; mixing the two is generally a mistake because their retention and access constraints differ.

Have the answer before the question lands.