What is shadow mode validation?
Shadow mode validation runs your AI agent policy in observe-only mode against live production traffic. The engine evaluates each governed decision, logs what it would have done, and changes nothing. The agent executes actions as if the policy were not there, while operators get an exact picture of what enforcement would look like without the risk of breaking the agent by mistake.
Key facts
- Runs the same policy engine as enforcement, but the outcome is recorded instead of applied.
- Fits teams rolling out a new policy or a major rule change to a production agent.
- Without it, the alternative is enforcing unverified policy and discovering its false positives through customer-facing breakage.
- Veto supports per-rule shadow mode so you can promote one rule at a time while keeping the rest of the policy enforced.
In plain English
The same idea that powers shadow deploys for ML models, applied to agent policy. You write a rule. You think it is right. You do not know what fraction of real traffic it will catch, what fraction it will accidentally block, or what edge cases will fire it. Shadow mode gives reviewers the actual distribution of your agent's calls.
The output is a log. For each tool call the agent attempted, the log shows the verdict the shadow rule would have returned and the verdict that was applied. You scan the differences, look for surprises, adjust the rule, repeat. When the surprises drop to zero, you flip the rule to enforcement.
How it works
Mark a rule as shadow in the policy file. The decision point evaluates it on every matching call alongside the enforced rules. The result is recorded in a separate decision stream that you can query, filter, and graph. The agent's actual behavior is unaffected: only the log changes.
Once the rule looks right, remove the shadow tag (or set mode to enforce). The promotion is a single-line diff. Because the rule lives in the same policy file, the change moves through the same code review and CI pipeline as every other policy change.
# YAML: a rule in shadow mode
- name: high_value_payments_will_require_approval
  mode: shadow
  match:
    tool: payment.send
  rules:
    - if: args.amount_cents > 100000
      then: require_approval
Operational consequence
Policy you cannot roll out with evidence is policy you do not roll out. Without shadow mode, each new rule is a gamble: either you enforce it cold and risk false positives blocking real work, or you do not enforce it and the system stays insecure. Shadow mode breaks the false dichotomy by giving you data on the rule's behavior before it changes anything.
The pattern also supports the audit story review teams want to see. Each rule shipped with shadow-mode evidence is a rule that landed with a written argument: "we observed N calls, M would have been gated, here is the distribution, here is why we promoted." That is the kind of paper trail SOC 2 and EU AI Act audits ask for and that ad-hoc policy changes cannot produce.
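The decision flow above (enforced rules decide, shadow rules only log, and the log yields the "we observed N calls, M would have been gated" summary), as an added Python example that is not Veto's code. It assumes a toy in-memory policy mirroring the YAML rule above plus a constructed enforced refund rule, hypothetical condition_holds/rule_verdict/evaluate/shadow_summary functions, and constructed sample traffic.

```python
# Toy decision point, constructed for this page: enforced rules decide, shadow rules only log. Not Veto's engine.
import operator, re
from collections import Counter
POLICY = [  # constructed; mirrors the page's YAML rule, plus one enforced rule for contrast
    {"name": "high_value_payments_will_require_approval", "mode": "shadow",
     "match": {"tool": "payment.send"},
     "rules": [{"if": "args.amount_cents > 100000", "then": "require_approval"}]},
    {"name": "deny_large_refunds", "mode": "enforce", "match": {"tool": "payment.refund"},
     "rules": [{"if": "args.amount_cents > 50000", "then": "deny"}]},
]
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le, "==": operator.eq}
_COND = re.compile(r"^args\.(\w+)\s*(>=|<=|==|>|<)\s*(-?\d+)$")

def condition_holds(cond, args):
    """Evaluate the 'args.<field> <op> <integer>' form used in the page's example."""
    m = _COND.match(cond.strip())
    if not m:
        raise ValueError(f"unsupported condition: {cond!r}")
    field, op, num = m.groups()
    return field in args and _OPS[op](args[field], int(num))

def rule_verdict(rule, call):
    """None when the tool does not match; otherwise the first firing clause's verdict."""
    if rule["match"].get("tool") != call["tool"]:
        return None
    return next((c["then"] for c in rule["rules"] if condition_holds(c["if"], call["args"])), "allow")

def evaluate(call, policy=POLICY):
    """Applied verdict = first non-allow enforced verdict; shadow rules only add log records."""
    enforced = [rule_verdict(r, call) for r in policy if r["mode"] == "enforce"]
    applied = next((v for v in enforced if v not in (None, "allow")), "allow")
    shadow = [{"rule": r["name"], "would": v, "applied": applied, "divergent": v != applied}
              for r in policy if r["mode"] == "shadow"
              for v in [rule_verdict(r, call)] if v is not None]  # only calls the rule matched
    return {"applied": applied, "shadow": shadow}

def shadow_summary(calls, policy=POLICY):
    """Per shadow rule: calls observed, how many verdicts would diverge, verdict distribution."""
    out = {}
    for call in calls:
        for rec in evaluate(call, policy)["shadow"]:
            s = out.setdefault(rec["rule"], Counter())
            s.update({"observed": 1, "divergent": int(rec["divergent"]), "would:" + rec["would"]: 1})
    return out

SAMPLE_CALLS = [  # constructed traffic: three sends (one exactly at the threshold), one refund
    {"tool": "payment.send", "args": {"amount_cents": 2500}},
    {"tool": "payment.send", "args": {"amount_cents": 250000}},
    {"tool": "payment.send", "args": {"amount_cents": 100000}},
    {"tool": "payment.refund", "args": {"amount_cents": 60000}},
]
```

Running shadow_summary over SAMPLE_CALLS reports three calls observed for the shadow rule, one that would have diverged from the applied verdict, and the allow/require_approval distribution; the enforced refund rule still denies the large refund while the shadow rule changes nothing.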
FAQ
Is shadow mode the same as dry-run?
Closely related. Dry-run usually means a one-off evaluation against a test input. Shadow mode runs continuously against live production traffic, logging what the policy would have decided. The signal is much richer because you see the actual distribution of calls, not a curated test set.
How long should I run shadow mode?
Until the numbers stop surprising you. For a new policy, a few days is typical. For a major policy rewrite, a week or two. The signal you want is a stable rate of allow, deny, or escalate that matches your expectations, plus zero unexpected denies in the production traffic.
What do I look for in the shadow log?
False positives (denies on traffic that should be fine) and false negatives (allows on traffic that should have been blocked). Both show up as patterns in the log. Tune the rules until the noise level is acceptable, then flip the policy to enforcement.
Can I shadow only part of a policy?
Yes. Veto lets you tag individual rules as shadow while leaving the rest enforced. That is the typical rollout pattern: add a new rule in shadow, watch it for a few days, promote to enforcement when it looks right. The rest of the policy keeps protecting the system in the meantime.