Veto for ML & AI Engineers

The agent passed eval. Production traffic disagrees.

You built the agent. You ran the evals. You shipped it. Then production traffic produced a tool call your eval suite did not cover. The model did not need to hallucinate: it followed a tool description in a context the description did not cover. The durable fix is not another prompt sentence; it is an action check in the execution path.

Where the control breaks

Eval is not prod. Your suite covers the cases you thought to test and the cases your trace data surfaced. Production sends inputs you did not predict, from users whose phrasing your eval prompts did not cover, in contexts where retrieval pulls back chunks you did not seed. The agent often does not fail loudly: it succeeds at calling tools in ways that produce unauthorized outcomes. A refund tool issued for the wrong amount. A search tool queried with PII it should not have seen. A code-execution tool given a snippet that touches a file it should not.

The instinct is to fix it in the prompt or in the tool description. Sometimes that works for a week. Then the next jailbreak or the next retrieval-shaped input comes through and you are back at the same postmortem. Prompting is not a tool-call safety boundary. The model is downstream of language; safety needs to be downstream of the model. Which means: a check that sits between the model's tool call and the tool's actual execution. Not in the prompt. Not in the loop. In the path.

The other thing you are losing is debugging time. When something goes wrong in production, you trace through agent state, retrieval logs, model responses, and tool outputs. The path from incident to root cause becomes stitching work. You want one decision record that shows: at time T, agent A called tool X with args Y, and this policy allowed, denied, or routed it.

What the control must do

A check between the model and the tool

Wrap the tool dispatch for a LangChain Tool, OpenAI function, Anthropic tool, or MCP server with veto.guard(). Each governed call gets evaluated before execution. The model may propose an action; Veto decides whether the call runs. Prompt engineering is not the enforcement mechanism, and retry loops stay out of the policy path.
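
A minimal sketch of the pattern in plain Python, to show where the check sits. It does not use the Veto SDK: `guard`, `Decision`, `PolicyDenied`, and `refund_policy` are hypothetical names invented for illustration, and the real veto.guard() signature may differ.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


class PolicyDenied(Exception):
    """Raised when the policy rejects a proposed tool call."""


# A policy maps (tool name, keyword args) to a Decision.
Policy = Callable[[str, Dict[str, Any]], Decision]


def guard(policy: Policy) -> Callable:
    """Hypothetical stand-in for a guard wrapper: check first, then execute."""
    def decorator(tool: Callable) -> Callable:
        @wraps(tool)  # keep the tool's name and docstring for the framework
        def wrapper(**kwargs: Any) -> Any:
            decision = policy(tool.__name__, kwargs)
            if not decision.allowed:
                # The tool body never runs, so no side effect occurs.
                raise PolicyDenied(decision.reason)
            return tool(**kwargs)
        return wrapper
    return decorator


def refund_policy(tool_name: str, args: Dict[str, Any]) -> Decision:
    # Illustrative rule: refunds above 100 are not allowed automatically.
    if tool_name == "issue_refund" and args.get("amount", 0) > 100:
        return Decision(False, "refund exceeds the 100 limit")
    return Decision(True, "within limits")


@guard(refund_policy)
def issue_refund(order_id: str, amount: float) -> str:
    """Issue a refund for an order."""
    return f"refunded {amount:.2f} on {order_id}"
```

The decision happens on the arguments the model actually produced, after tool selection and before the tool body runs. Nothing in the prompt changes.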

Per-call decision records for replay

Each governed tool call records agent ID, run ID, argument payload, policy version, and outcome. To debug, review the decision with the same relevant context it was made in. Pivot from MLflow or W&B run metadata to the production decisions the agent made. Postmortems start from the decision record instead of stitched-together traces.
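
As a rough illustration of what such a record could hold, here is a sketch using a plain dataclass. The field names mirror the list above, but the layout, the `record_decision` helper, and the example values are assumptions, not Veto's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class DecisionRecord:
    # Hypothetical layout: one record per governed tool call.
    timestamp: str
    agent_id: str
    run_id: str
    tool: str
    args: Dict[str, Any]
    policy_version: str
    outcome: str  # "allow", "deny", or "route"


def record_decision(agent_id: str, run_id: str, tool: str,
                    args: Dict[str, Any], policy_version: str,
                    outcome: str) -> DecisionRecord:
    """Build a record stamped with the current UTC time."""
    return DecisionRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        run_id=run_id,
        tool=tool,
        args=args,
        policy_version=policy_version,
        outcome=outcome,
    )


record = record_decision(
    agent_id="support-agent",
    run_id="run-001",
    tool="issue_refund",
    args={"order_id": "A1", "amount": 500.0},
    policy_version="v3",
    outcome="deny",
)

# One JSON line per decision is easy to store, search, and replay.
line = json.dumps(asdict(record), sort_keys=True)
```

With a record like this, "at time T, agent A called tool X with args Y" is a single lookup instead of a join across four logs.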

Approval queue for the long tail

Some tool calls are too rare to allow blindly and too useful to deny outright. Route them to a human through the configured review queue. The agent gets back an approved or denied response and continues. The reviewer decision is recorded against identity and reason.
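
The flow can be sketched with an in-memory queue. This is an illustration only: `ReviewQueue`, `PendingCall`, and `run_if_approved` are hypothetical names, and a real review queue would be persistent and asynchronous.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class PendingCall:
    tool: str
    args: Dict[str, Any]
    status: str = "pending"  # "pending", "approved", or "denied"
    reviewer: Optional[str] = None
    reason: Optional[str] = None


@dataclass
class ReviewQueue:
    items: List[PendingCall] = field(default_factory=list)

    def submit(self, tool: str, args: Dict[str, Any]) -> PendingCall:
        """Park a proposed call until a human decides."""
        call = PendingCall(tool, args)
        self.items.append(call)
        return call

    def resolve(self, call: PendingCall, approved: bool,
                reviewer: str, reason: str) -> None:
        """Record the outcome against reviewer identity and reason."""
        call.status = "approved" if approved else "denied"
        call.reviewer = reviewer
        call.reason = reason


def run_if_approved(call: PendingCall, tool_fn: Callable[..., Any]) -> Dict[str, Any]:
    """Execute the tool only after approval; otherwise report the status."""
    if call.status != "approved":
        return {"status": call.status, "reason": call.reason}
    return {"status": "approved", "result": tool_fn(**call.args)}


def delete_account(user_id: str) -> str:
    return f"deleted {user_id}"


queue = ReviewQueue()
call = queue.submit("delete_account", {"user_id": "u-42"})
queue.resolve(call, approved=False, reviewer="alice@example.com",
              reason="no ticket attached")
response = run_if_approved(call, delete_account)
```

The agent receives a structured response either way, so it can continue the run instead of stalling on the rare call.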

Shadow mode before enforcement

Run policies in shadow on real traffic before enforcement. See what would have been blocked and why, without affecting production behavior. Tune the policies, then promote the rule to enforce mode when the false positives are understood.
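
The difference between the two modes fits in a few lines. The sketch below is an assumption about how such a switch might look, not Veto's implementation; `governed_call`, the `mode` strings, and the `no_email_in_query` rule are invented for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# A policy returns (allowed, reason).
Policy = Callable[[str, Dict[str, Any]], Tuple[bool, str]]


def governed_call(tool_fn: Callable[..., Any], args: Dict[str, Any],
                  policy: Policy, mode: str,
                  shadow_log: List[Dict[str, Any]]) -> Any:
    if mode not in ("shadow", "enforce"):
        raise ValueError(f"unknown mode: {mode!r}")
    allowed, reason = policy(tool_fn.__name__, args)
    if not allowed:
        if mode == "enforce":
            raise PermissionError(reason)
        # Shadow: note what would have been blocked, then run unchanged.
        shadow_log.append(
            {"tool": tool_fn.__name__, "args": args, "would_block": reason}
        )
    return tool_fn(**args)


def no_email_in_query(tool_name: str, args: Dict[str, Any]) -> Tuple[bool, str]:
    # Crude illustrative check: treat "@" as a sign of an email address.
    if tool_name == "search" and "@" in args.get("query", ""):
        return False, "query appears to contain an email address"
    return True, "ok"


def search(query: str) -> str:
    return f"results for {query}"


shadow_log: List[Dict[str, Any]] = []
result = governed_call(
    search, {"query": "orders for jo@example.com"},
    no_email_in_query, "shadow", shadow_log,
)
```

In shadow mode the call still runs and `shadow_log` collects the would-be block. Switching `mode` to "enforce" turns the same policy result into a refusal.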

How it fits your stack

Veto wraps tools; it does not replace your agent framework. LangChain Tool definitions stay the same; wrap them at definition time. OpenAI function-calling and Anthropic tool use both work via the SDK. For LangGraph, gate transitions where tool execution happens so denied actions stop at the policy boundary. MCP tools can route through the gateway when you need authorization. For tracing, decision records include a trace_id field you can populate from MLflow, W&B, or your own tracing stack. Vector DB queries flow through the same wrapper if you treat them as tools.

See the LangChain, OpenAI, and Claude integration docs. For human approval, see the approval workflow guide.

Signals worth operating

Production tool-call success rate

Tool calls that executed and produced a valid result without triggering an incident. Argument-level enforcement makes the high-impact tail visible before it becomes another postmortem.

Time spent on incident postmortems

The work shifts from reconstructing agent state across traces to pulling a decision from the decision view. The decision record is already structured.

Policy false-positive rate

Calls Veto blocked that you expected to allow. Surface them in review, then fix the policy at the source. Same relevant args and policy version produce the same outcome.
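
One way to put a number on this, assuming reviewers label each blocked call with what they expected. The definition used here (share of blocked calls that reviewers expected to allow) and the `outcome` and `expected` keys are assumptions for the sketch.

```python
from typing import Dict, List


def false_positive_rate(decisions: List[Dict[str, str]]) -> float:
    """Share of blocked calls that reviewers expected to be allowed."""
    denied = [d for d in decisions if d["outcome"] == "deny"]
    if not denied:
        return 0.0
    false_positives = sum(1 for d in denied if d["expected"] == "allow")
    return false_positives / len(denied)


reviewed = [
    {"tool": "issue_refund", "outcome": "deny", "expected": "deny"},
    {"tool": "issue_refund", "outcome": "deny", "expected": "allow"},
    {"tool": "search", "outcome": "allow", "expected": "allow"},
    {"tool": "search", "outcome": "allow", "expected": "allow"},
]

rate = false_positive_rate(reviewed)  # 1 of the 2 blocked calls was a false positive
```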

Eval-to-prod drift

Tool calls in production that do not appear in your eval distribution. Decision records surface these so you can grow your eval suite from real traffic.
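
A simple way to approximate this is to compare call shapes. The sketch below assumes a call's shape is its tool name plus the set of argument names; that choice, and the `call_signature` and `unseen_in_eval` helpers, are illustrative rather than anything Veto defines.

```python
from typing import Any, Dict, List, Tuple

Call = Dict[str, Any]


def call_signature(call: Call) -> Tuple[str, Tuple[str, ...]]:
    """Reduce a call to (tool name, sorted argument names)."""
    return (call["tool"], tuple(sorted(call["args"])))


def unseen_in_eval(eval_calls: List[Call], prod_calls: List[Call]) -> List[Call]:
    """Production calls whose shape never appears in the eval suite."""
    seen = {call_signature(c) for c in eval_calls}
    return [c for c in prod_calls if call_signature(c) not in seen]


eval_calls = [
    {"tool": "issue_refund", "args": {"order_id": "A1", "amount": 10.0}},
    {"tool": "search", "args": {"query": "open orders"}},
]
prod_calls = [
    {"tool": "issue_refund", "args": {"amount": 20.0, "order_id": "B2"}},
    {"tool": "issue_refund", "args": {"order_id": "C3", "amount": 5.0, "currency": "EUR"}},
    {"tool": "run_code", "args": {"snippet": "print(1)"}},
]

drift = unseen_in_eval(eval_calls, prod_calls)
```

Here `drift` holds the refund call with the extra `currency` argument and the `run_code` call, both candidates for new eval cases.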

Objections to settle early

"We already have guardrails."

You have agent-loop guardrails: prompt scaffolding, output parsing, retry logic. Those make the agent more reliable inside its reasoning loop. Veto is the layer outside the loop, between the agent's decision and the world. Different problem, different solution. The two stack cleanly.

"This is going to add latency."

Policy evaluation is in-process. For high-volume cases, the policy cache stays local; there is no per-call network hop on the authorization path. The boundary that matters is after tool selection and before side effects.

"We do not have time to write policies."

Choose the policies that would have changed the outcome of recent incidents or near misses. Then run shadow mode on real traffic and let the next set come from observed calls. The policy library should grow from production reality, not from a full ontology designed up front.

Frequently asked questions

Don't I already have guardrails in LangChain and the OpenAI SDK?

Those are agent-loop guardrails: they nudge the model, parse outputs, and retry on wrong responses. They sit inside the agent. Veto sits outside, between the agent and the tool. The model may propose an action; Veto decides whether the governed call executes. The two are complementary, not overlapping. Teams keep their existing guardrails and add Veto for tool-call enforcement.

How does this work with LangChain Tools and the OpenAI tool-calling API?

Wrap each tool with veto.guard() before registering it with your agent framework. The wrapper has the same signature as the underlying tool, so LangChain, OpenAI function-calling, and Anthropic tool use can keep the same tool body. For LangGraph, you can also gate transitions at the node level. See the integration guides for code samples.

Can I see the args that caused a deny so I can fix the policy or fix the agent?

Each governed decision records the full input args, the policy that matched, and the outcome. The workspace's replay view shows you what your agent saw at decision time. For postmortems, you can step through each governed tool call the agent made in the incident with policy outcomes inline.

How does this fit with MLflow and W&B for tracking?

Decision records export as JSON with run_id and trace_id fields you can populate from your existing tracing. We have a recipe for wiring MLflow run IDs through to Veto decisions so you can pivot from a tracked run to the agent's actual production tool-call decisions. Weights & Biases can integrate the same way via a custom record schema.
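
To make the pivot concrete, here is a sketch that groups exported decisions by run. It assumes a JSON Lines export (one record per line) and invents the `tool` and `outcome` fields and all sample values; only `run_id` and `trace_id` come from the description above.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical export. In practice run_id would be whatever your tracker
# assigned, for example the ID of the MLflow run that produced the agent.
exported = """\
{"run_id": "run-a", "trace_id": "t-1", "tool": "search", "outcome": "allow"}
{"run_id": "run-a", "trace_id": "t-2", "tool": "issue_refund", "outcome": "deny"}
{"run_id": "run-b", "trace_id": "t-3", "tool": "search", "outcome": "allow"}
"""


def decisions_by_run(jsonl: str) -> Dict[str, List[Dict[str, Any]]]:
    """Group decision records by the run that produced them."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for line in jsonl.splitlines():
        if line.strip():
            record = json.loads(line)
            grouped[record["run_id"]].append(record)
    return dict(grouped)


by_run = decisions_by_run(exported)
denied_in_run_a = [d["trace_id"] for d in by_run["run-a"] if d["outcome"] == "deny"]
```

Starting from a tracked run, `by_run` gives the production decisions for it, and each `trace_id` points back into your tracing stack.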

Wrap one tool. Review one incident. Decide if you want the rest.

Choose the tool from the last postmortem. One governed tool path is enough to expose the real control boundary.