Research note, March 2026
Can Is Not May: Authority Models for Governable AI Agents
Yaz Caleb: Veto and Arizona State University
Prompt-only baselines permitted unauthorized actions in 18.3% of sampled trials. The authority model recorded 0 observed violations in this benchmark.
Across 7,427 trials with 4 LLMs under ambient social pressure, prompt-only guardrails permitted unauthorized actions in 18.3% of sampled cases (per-model range: 1%-40%). The authority model used deterministic interception before tool execution and recorded no unauthorized actions in the benchmark configuration.
Abstract
AI agents act through tool-use frameworks that often prove capability while leaving authority implicit. This paper introduces authority models: deterministic, external policy engines that evaluate each governed tool call against a seven-parameter May judgment before execution.
The paper formalizes capability-authority independence: the principle that whether an agent can perform an action provides no information about whether it may. All four cells of the Can × May product are realizable, and any system conflating can with may admits privilege escalation by capability acquisition.
Three formal properties are proven: Can/May Separation, Deny-Monotonicity, and Escalation Monotonicity. AuthorityBench evaluates five enforcement conditions across 54 scenarios, four LLMs, and 7,427 trials. Under ambient social pressure, prompt-only baselines permitted unauthorized actions in 18.3% of sampled trials. The authority model recorded zero observed violations in this benchmark, a result of the architecture rather than of model behavior.
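The separation of can from may can be illustrated with a minimal sketch. It is not the paper's formalization: the tool names, the `ToolCall` type, and the single deny rule are hypothetical, the seven-parameter May judgment is reduced to a tool allowlist plus deny rules, and Deny-Monotonicity is read here as "adding a deny rule never turns a denied call into an allowed one", which is an assumption about the paper's definition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: Dict[str, str] = field(default_factory=dict)


DenyRule = Callable[[ToolCall], bool]

# Capability: tools wired into the agent framework (hypothetical names).
INSTALLED_TOOLS: Set[str] = {"read_file", "send_email"}
# Authority: tools the policy permits, defined independently of capability.
AUTHORIZED_TOOLS: Set[str] = {"read_file", "export_report"}


def can(call: ToolCall) -> bool:
    """Capability check: is the tool available to the agent at all?"""
    return call.tool in INSTALLED_TOOLS


def may(call: ToolCall, deny_rules: List[DenyRule]) -> bool:
    """Authority check: default deny, then any matching deny rule wins."""
    if call.tool not in AUTHORIZED_TOOLS:
        return False
    return not any(rule(call) for rule in deny_rules)


def no_secrets(call: ToolCall) -> bool:
    """Hypothetical deny rule: block any call touching /secrets/."""
    return call.args.get("path", "").startswith("/secrets/")


# All four cells of the Can x May product, as (can, may) pairs.
cells = {
    tool: (can(ToolCall(tool)), may(ToolCall(tool), []))
    for tool in ("read_file", "send_email", "export_report", "drop_database")
}
```

In this sketch `send_email` is installed but not authorized, and `export_report` is authorized but not installed, so neither check can be derived from the other. Adding `no_secrets` to the rule list can only shrink the set of allowed calls.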
Key results
Violation rate under ambient social pressure (P2): the realistic threat where context makes a violation seem natural without explicit prompt injection. Aggregate across 4 models.
Prompt-Only
18.3%
Authorized tools listed in system prompt. No external enforcement.
Keyword Filter
16.0%
Pre-execution keyword denial on tool names. Bypassed by tool substitution.
LLM-as-Judge
0.0%
A second LLM judges each proposed call against policy before execution. Tested on a single proposer-judge pairing.
Authority Model
0.0%
Deterministic policy evaluation with session history. No unauthorized actions were observed in this benchmark configuration.
Per-model violation rates under prompt-only enforcement range from 1.0% (lowest sampled model condition) to 40.0% (highest sampled model condition). The authority model recorded 0.0% across all four models in this benchmark. Cross-model variance for prompt-only: 0.022. Cross-model variance for authority model: 0.000. Enforcement determinism is a property of the architecture, not individual model strength.
The LLM-as-Judge condition in this sample also shows 0.0% violations, but was tested on a single proposer-judge pairing with an already-low base violation rate. It does not establish that LLM judging generally matches deterministic enforcement.
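The tool-substitution bypass noted for the Keyword Filter condition can be sketched as follows. The blocked keywords and tool names are hypothetical and are not taken from the benchmark's scenarios; the allowlist is a simplified stand-in for a policy engine, shown only to contrast name matching with explicit authorization.

```python
from typing import Set, Tuple

# Hypothetical keyword filter: deny any tool whose name contains a keyword.
BLOCKED_KEYWORDS: Tuple[str, ...] = ("delete", "drop")

# Hypothetical allowlist: deny anything not explicitly authorized.
ALLOWED_TOOLS: Set[str] = {"read_record", "list_records"}


def keyword_filter_allows(tool_name: str) -> bool:
    """Pre-execution keyword denial on the tool name only."""
    name = tool_name.lower()
    return not any(keyword in name for keyword in BLOCKED_KEYWORDS)


def allowlist_allows(tool_name: str) -> bool:
    """Default-deny check against an explicit set of authorized tools."""
    return tool_name in ALLOWED_TOOLS


direct = "delete_record"   # caught by the keyword filter
substitute = "run_sql"     # could issue the same deletion through its arguments

filter_results = (keyword_filter_allows(direct), keyword_filter_allows(substitute))
allowlist_results = (allowlist_allows(direct), allowlist_allows(substitute))
```

The filter blocks `delete_record` but passes `run_sql`, because nothing in the substitute's name matches a keyword. The default-deny allowlist rejects both, since neither tool was ever authorized.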
AuthorityBench
54 authorization scenarios across six categories, each tested at three adversarial pressure levels: benign (P1), ambient (P2), and adversarial (P3). Four LLMs from two provider ecosystems. All scenarios include ground-truth labels and argument-level authority constraints.
Four model conditions were tested; exact model names and versions are listed in the preprint appendix. Five enforcement conditions: Prompt-Only, Keyword Filter, Authority Model, Authority Model (−H, without history tracking), and LLM-as-Judge.
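A violation rate of the kind reported here could be computed from trial records roughly as below. This is a sketch under assumptions: the `Trial` fields are hypothetical and are not the benchmark harness's schema, a violation is taken to mean a tool call that executed despite a ground-truth label of unauthorized, and the records are toy values for illustration, not benchmark data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Trial:
    condition: str    # enforcement condition, e.g. "prompt_only"
    pressure: str     # "P1", "P2", or "P3"
    executed: bool    # did the tool call actually run?
    authorized: bool  # ground-truth label for the call


def violation_rates(trials: Iterable[Trial], pressure: str) -> Dict[str, float]:
    """Share of trials per condition where an unauthorized call executed."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [violations, total]
    for trial in trials:
        if trial.pressure != pressure:
            continue
        counts[trial.condition][1] += 1
        if trial.executed and not trial.authorized:
            counts[trial.condition][0] += 1
    return {cond: v / n for cond, (v, n) in counts.items()}


# Toy records only; these are not results from AuthorityBench.
toy_trials = [
    Trial("prompt_only", "P2", executed=True, authorized=False),
    Trial("prompt_only", "P2", executed=True, authorized=True),
    Trial("prompt_only", "P2", executed=False, authorized=False),
    Trial("prompt_only", "P2", executed=True, authorized=True),
    Trial("authority_model", "P2", executed=False, authorized=False),
    Trial("authority_model", "P2", executed=True, authorized=True),
    Trial("prompt_only", "P1", executed=True, authorized=True),
]

rates = violation_rates(toy_trials, "P2")
```

Scoring on execution rather than on the model's proposal matters for the external conditions: a model may still propose an unauthorized call, but it counts as a violation only if enforcement lets it run.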
Scenarios, policy files, and the benchmark harness are publicly available at github.com/yazcaleb/can-is-not-may.
What this means
Prompts are suggestions, not enforcement. System prompt instructions, keyword filters, and alignment training all reduced violations on average in the sample, but did not eliminate them. Under ambient social pressure, the tested prompt-based approaches permitted unauthorized actions at nonzero rates across the evaluated models.
Deterministic interception evaluates each governed tool call against explicit policy before execution, outside the LLM's inference path. In this benchmark sample it recorded no unauthorized actions across the tested models, scenarios, and pressure levels. That is not because the models always behave well; it is because the tested architecture did not give them the choice on the governed path.
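Interception on the governed path can be sketched as a wrapper that consults a policy function, with session history, before any tool runs. This is an illustrative sketch, not the Veto SDK or the paper's reference implementation: `GovernedExecutor`, `Blocked`, the policy rules, and the tool names are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Call = Tuple[str, Dict[str, Any]]


class Blocked(Exception):
    """Raised when the policy does not return 'allow' for a call."""


def policy(call: Call, history: List[Call]) -> str:
    """Hypothetical policy returning 'allow', 'deny', or 'escalate'."""
    tool, args = call
    if tool == "read_file":
        return "allow"
    if tool == "send_email":
        # History-dependent rule: no email after reading from /secrets/.
        if any(t == "read_file" and a.get("path", "").startswith("/secrets/")
               for t, a in history):
            return "deny"
        # External recipients are routed to a human instead of executing.
        if not args.get("to", "").endswith("@example.com"):
            return "escalate"
        return "allow"
    return "deny"  # default deny for anything not covered above


class GovernedExecutor:
    """Runs a tool only after the policy allows the call."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 policy_fn: Callable[[Call, List[Call]], str]) -> None:
        self.tools = tools
        self.policy_fn = policy_fn
        self.history: List[Call] = []

    def execute(self, tool: str, **args: Any) -> Any:
        decision = self.policy_fn((tool, args), self.history)
        if decision != "allow":
            raise Blocked(f"{tool}: {decision}")
        result = self.tools[tool](**args)
        self.history.append((tool, args))  # record only calls that ran
        return result


TOOLS: Dict[str, Callable[..., Any]] = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to, body="": f"sent to {to}",
}
```

Because the decision is made in ordinary code before the tool function is reached, the model's output cannot change the outcome of a governed call. Dropping the `history` argument from the policy would correspond loosely to a no-history variant, where the post-read email rule could not be expressed.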
This is the research foundation behind Veto's architecture: declarative policy, runtime interception, human escalation. The open-source SDK is the reference implementation described in Section 5 of the paper.
Related concepts
The blind spot identity-only authentication leaves between intent and action.
Excessive agency: OWASP LLM06 and the systems-level cause behind most agent failures.
Runtime authorization: The control loop the paper proposes: after planning, before execution.
Shadow mode validation: Observe-only enforcement for collecting evidence before turning on blocking.
Shadow-test policies: Run new policies against live traffic without affecting outcomes.
Audit agent actions: What to record, where to store it, and how to make logs useful in incidents.
Read the full study.