EU AI Act Evidence for AI Agents
EU AI Act timeline: Article 50 begins applying in August 2026 under the AI Omnibus political agreement, while high-risk timing splits across 2027 and 2028.
Control points
- Article 14 human oversight maps naturally to pre-action approval gates for high-impact agent tool calls.
- Article 12 logging maps to decision records: tool, arguments, policy, outcome, reviewer, and timestamp.
- Veto evidence connects AI governance claims to concrete runtime controls rather than policy documents alone.
The EU AI Act entered into force on August 1, 2024. The prohibited practices took effect in February 2025. The general-purpose AI (GPAI) rules apply from August 2025. A May 7, 2026 Parliament-Council political agreement on the AI Omnibus would set high-risk dates at December 2, 2027 for specified high-risk areas and August 2, 2028 for product-integrated systems, pending formal adoption. Article 50 transparency still has an August 2, 2026 application date, while the AI Omnibus political agreement adds a three-month transition for certain generated-content marking obligations to December 2, 2026. The practical point: build evidence before customers, auditors, or regulators ask for it.
Reading the regulation is not the hard part; implementing it is. This guide covers the specifics: the Articles you need to map, the technical controls that satisfy them, the YAML policies that encode those controls, and the audit records that support compliance review. If you are deploying AI agents in the EU or serving EU customers, treat it as your engineering blueprint.
Timeline: What Applies When
The EU AI Act rolls out in phases. Each phase brings new obligations:
Phase 1: February 2, 2025 (already in effect)
- Prohibited AI practices banned (Article 5)
- Covers social scoring, most real-time biometric identification use, and emotion recognition in workplaces and schools
- AI literacy obligations for providers and deployers (Article 4)
Phase 2: August 2, 2025 (already in effect)
- General-purpose AI model obligations (Articles 51-56)
- GPAI transparency duties
- Systemic-risk obligations for qualifying models
- Penalty provisions begin applying, including the prohibited-practices tier
Phase 3: August 2, 2026
- Article 50 transparency rules begin applying, subject to AI Omnibus transition details
- Some generated-content marking obligations move to December 2, 2026 under the AI Omnibus political agreement
- Other non-delayed provisions apply
Phase 4: December 2, 2027
- High-risk rules for specified areas under the Council-Parliament AI Omnibus political agreement, pending formal adoption
- Covered areas include biometrics, critical infrastructure, education, employment, migration, asylum, and border control
- Relevant controls include Articles 9, 10, 11, 12, 13, 14, 15, and 26
Phase 5: August 2, 2028
- High-risk rules for systems integrated into products covered by EU harmonization legislation
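The phased rollout can be kept as data so that dashboards and planning checks answer "what applies today?" consistently. The following is a minimal sketch, not an official calendar: the `PHASES` list and both function names are illustrative, and the 2027 and 2028 entries reflect the AI Omnibus political agreement, which is still pending formal adoption.

```python
from datetime import date
from typing import List, Optional, Tuple

# Phase start dates as listed in the timeline. The 2027 and 2028 dates come
# from the AI Omnibus political agreement and are pending formal adoption.
PHASES: List[Tuple[date, str]] = [
    (date(2025, 2, 2), "Prohibited AI practices (Article 5)"),
    (date(2025, 8, 2), "General-purpose AI model obligations (Articles 51-56)"),
    (date(2026, 8, 2), "Article 50 transparency rules"),
    (date(2027, 12, 2), "High-risk rules for specified areas"),
    (date(2028, 8, 2), "High-risk rules for product-integrated systems"),
]


def phases_in_effect(on: date) -> List[str]:
    """Return the label of every phase whose start date is on or before `on`."""
    return [label for start, label in PHASES if start <= on]


def next_phase(on: date) -> Optional[Tuple[date, str]]:
    """Return the earliest phase starting after `on`, or None if all have started."""
    upcoming = [(start, label) for start, label in PHASES if start > on]
    return min(upcoming) if upcoming else None
```

If the formal adoption changes a date, only the `PHASES` table needs to change.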
Annex III: Is Your AI Agent High-Risk?
Not every AI system is high-risk. The EU AI Act defines high-risk categories in Annex III. AI agents may fall into high-risk classification when their intended use maps to these domains:
- Critical infrastructure: Agents managing energy grids, water supply, transportation, or digital infrastructure.
- Education and training: Agents that determine access to education, evaluate learning outcomes, or assess qualifications.
- Employment and workers management: Agents that screen CVs, make hiring decisions, evaluate performance, or allocate tasks.
- Essential services: Agents involved in credit scoring, insurance risk assessment, or public benefit determination.
- Law enforcement and justice: Agents used in criminal profiling, evidence evaluation, or judicial decision support.
- Migration and border control: Agents that process visa applications, assess asylum claims, or perform identity verification.
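- Healthcare: Agents that assist in diagnosis, treatment planning, triage, or medical device operation. Annex III names emergency patient triage under essential services; AI that forms part of a medical device is usually classified as high-risk through Article 6(1) and Annex I rather than Annex III.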
If your AI agent's intended use maps to an Annex III high-risk category, the AI Omnibus timeline sets the high-risk obligations to apply from December 2, 2027. If the agent is integrated into a regulated product, the date is August 2, 2028. Even if your agent is not high-risk, the Article 4 AI literacy obligations still apply to providers and deployers covered by the Act, and implementing the controls below anyway is a sound risk management measure.
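The classification step can be recorded in code so that each agent in an inventory carries its applicable date. This is a sketch under stated assumptions: the area labels paraphrase the list above and are not official Annex III identifiers, `high_risk_deadline` is a hypothetical helper, and returning the earlier date when both routes apply is a conservative assumption rather than a legal conclusion.

```python
from datetime import date
from typing import Optional

# Labels paraphrase the categories listed above; they are names for this
# sketch only, not official Annex III identifiers.
ANNEX_III_AREAS = {
    "critical_infrastructure",
    "education",
    "employment",
    "essential_services",
    "law_enforcement_justice",
    "migration_border_control",
    "healthcare",
}

# Dates from the AI Omnibus political agreement, pending formal adoption.
SPECIFIED_AREAS_DATE = date(2027, 12, 2)
PRODUCT_INTEGRATED_DATE = date(2028, 8, 2)


def high_risk_deadline(
    intended_use_area: Optional[str], product_integrated: bool = False
) -> Optional[date]:
    """Return the date high-risk obligations would apply, or None if not high-risk.

    Assumption: when both routes apply, plan for the earlier date.
    """
    candidates = []
    if intended_use_area in ANNEX_III_AREAS:
        candidates.append(SPECIFIED_AREAS_DATE)
    if product_integrated:
        candidates.append(PRODUCT_INTEGRATED_DATE)
    return min(candidates) if candidates else None
```

A `None` result means only that the sketch found no high-risk route; the classification itself still needs legal review.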
Article 9: Risk Management System
Article 9 requires a "risk management system" that is "established, implemented, documented and maintained" throughout the AI system's lifecycle. This is not a one-time assessment. It is a continuous process that must identify risks, evaluate their likelihood and severity, and implement mitigation measures that are tested and validated.
For AI agents, the primary risks are unauthorized tool actions, data leakage, and uncontrolled autonomous behavior. Veto's policy engine directly implements the mitigation layer: policies define what the agent is allowed to do, shadow mode lets you test policies against real traffic before enforcement, and the decision records expose risk metrics in real time.
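To make the mitigation layer concrete, here is a minimal sketch of first-match policy evaluation with deny-by-default and a shadow-mode flag. It is not Veto's engine: `Rule`, `Decision`, and `evaluate` are hypothetical names, and regex matching on argument values is an assumption chosen to mirror the internal-channel rule pair in this section's healthcare policy.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    tool: str
    action: str  # "allow", "deny", or "require_approval"
    arg_patterns: Dict[str, str] = field(default_factory=dict)  # argument -> regex


@dataclass
class Decision:
    action: str      # what the policy decided
    enforced: bool   # False in shadow mode: decision is recorded, not applied
    rule_index: int  # index of the matched rule, or -1 for the default action


def evaluate(
    rules: List[Rule],
    tool: str,
    arguments: Dict[str, object],
    default_action: str = "deny",
    shadow_mode: bool = False,
) -> Decision:
    """Return the first rule whose tool and argument patterns all match."""
    for index, rule in enumerate(rules):
        if rule.tool != tool:
            continue
        if all(
            re.search(pattern, str(arguments.get(name, "")))
            for name, pattern in rule.arg_patterns.items()
        ):
            return Decision(rule.action, not shadow_mode, index)
    # Nothing matched: fall back to the default (deny unless configured otherwise).
    return Decision(default_action, not shadow_mode, -1)


RULES = [
    Rule("send_communication", "allow", {"channel": "^internal_"}),
    Rule("send_communication", "deny", {"channel": ".*"}),
]
```

Rule order matters in this sketch: the narrow allow rule must come before the catch-all deny, and any tool without a rule falls through to the default.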
name: healthcare-agent-risk-controls
description: "Article 9 risk management for healthcare triage agent"
risk_classification: high
annex_iii_category: healthcare
last_risk_assessment: "2026-03-15"
next_risk_assessment: "2026-06-15"
rules:
  # Risk: agent makes diagnosis without clinical validation
  - tool: suggest_diagnosis
    action: require_approval
    approval:
      channel: workspace
      reviewer_pool:
        - role: licensed_clinician
      timeout: 1800s
      escalation: deny
    risk_level: high
    mitigation: "Human clinician must validate all diagnostic suggestions"
  # Risk: agent accesses patient records beyond scope
  - tool: query_patient_records
    conditions:
      - match:
          context.treating_clinician: "true"
          arguments.patient_id: "context.assigned_patients"
        action: allow
      - match:
          arguments.patient_id: ".*"
        action: deny
        reason: "Access limited to assigned patients"
    risk_level: medium
    mitigation: "Strict patient-scope enforcement via policy"
  # Risk: agent sends clinical information externally
  - tool: send_communication
    conditions:
      - match:
          arguments.channel: "^internal_"
        action: allow
      - match:
          arguments.channel: ".*"
        action: deny
        reason: "External clinical communications prohibited"
    risk_level: high
    mitigation: "All clinical communications restricted to internal channels"
default_action: deny
shadow_mode:
  enabled: true
  duration: "30days"
  compare_with: "previous_policy_version"
Article 12: Record-Keeping
Article 12 requires "automatic recording of events (logs) over the lifetime of the system" that enable tracing of the system's operation. Article 26(6) requires deployers to retain these logs for "a period of at least six months." The logs must be sufficient to monitor the system's operation, identify risks, and facilitate post-market surveillance.
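The six-month floor is easy to check mechanically before any log cleanup job runs. The sketch below assumes six months can be approximated as 183 days, which is a simplification and not a legal reading; `MIN_RETENTION`, `earliest_deletion`, and `may_delete` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "six months" approximated as 183 days for this sketch.
MIN_RETENTION = timedelta(days=183)


def earliest_deletion(recorded_at: datetime, configured: timedelta) -> datetime:
    """Return the first moment a log record may be deleted.

    Raises ValueError if the configured retention is below the minimum.
    """
    if configured < MIN_RETENTION:
        raise ValueError("configured retention is shorter than the six-month minimum")
    return recorded_at + configured


def may_delete(recorded_at: datetime, configured: timedelta, now: datetime) -> bool:
    """True only once the configured retention period has fully elapsed."""
    return now >= earliest_deletion(recorded_at, configured)
```

Sector-specific rules may require longer periods, so the configured value should come from your own retention policy rather than from the minimum.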
Veto decision records can create Article 12 evidence when the governed path captures the required context. Every protect() call produces a structured record with full decision context. Here is the format:
{
  "record_id": "aud_eu_4a5b6c7d8e9f",
  "timestamp": "2026-04-04T11:30:22.109Z",
  "event_type": "tool_call_decision",
  "system_identification": {
    "system_name": "healthcare-triage-agent-v2.3",
    "system_version": "2.3.1",
    "risk_classification": "high",
    "annex_iii_category": "healthcare",
    "conformity_assessment_id": "CA-2026-0892"
  },
  "operation_context": {
    "agent_id": "triage-agent-prod",
    "session_id": "sess_eu_abc123",
    "initiated_by": {
      "user_id": "nurse_447",
      "role": "triage_nurse",
      "facility": "clinic_berlin_01"
    }
  },
  "tool_call": {
    "tool": "suggest_diagnosis",
    "arguments": {
      "symptoms": ["persistent_cough", "fever", "fatigue"],
      "duration_days": 14,
      "patient_age_range": "45-55"
    }
  },
  "policy_evaluation": {
    "policy_name": "healthcare-agent-risk-controls",
    "policy_version": "1.4",
    "rule_matched": "rule_1_diagnosis_approval",
    "decision": "require_approval",
    "reason": "Diagnostic suggestions require clinician validation"
  },
  "human_oversight": {
    "approval_requested": true,
    "reviewer": {
      "user_id": "dr_823",
      "role": "licensed_clinician",
      "credential": "DE-MED-2019-4472"
    },
    "reviewed_at": "2026-04-04T11:32:45.882Z",
    "decision": "approved_with_modification",
    "modification": "Added differential diagnosis note",
    "review_duration_seconds": 143
  },
  "eu_ai_act_metadata": {
    "article_12_evidence_mapped": true,
    "article_14_oversight_provided": true,
    "retention_minimum": "6months",
    "retention_configured": "3years"
  }
}
Article 13: Transparency
Article 13 requires that high-risk AI systems "be designed and developed in such a way as to ensure that their operation is sufficiently transparent to enable deployers to interpret a system's output and use it appropriately." For AI agents, this means each governed decision must be explainable: why did the agent take this action, what policy governed it, and what data informed the decision.
Veto's decision records can include a policy_evaluation block that shows which rule matched, what conditions were evaluated, and the human-readable reason for the decision. This is not a post-hoc explanation generated by another model. It is a deterministic record of the actual policy evaluation that produced the decision.
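Because the explanation is a deterministic record, a reviewer-facing summary can be rendered directly from the stored fields with no model in the loop. The sketch below assumes a record shaped like the sample decision record in the Article 12 section; `explain` is a hypothetical helper, not part of any SDK.

```python
def explain(record: dict) -> str:
    """Render a one-line, human-readable explanation from a decision record.

    Assumes the record carries `tool_call` and `policy_evaluation` blocks
    shaped like the sample record in the Article 12 section.
    """
    tool = record["tool_call"]["tool"]
    evaluation = record["policy_evaluation"]
    return (
        f"Tool '{tool}' -> {evaluation['decision']} under policy "
        f"{evaluation['policy_name']} v{evaluation['policy_version']} "
        f"(rule {evaluation['rule_matched']}): {evaluation['reason']}"
    )


sample = {
    "tool_call": {"tool": "suggest_diagnosis"},
    "policy_evaluation": {
        "policy_name": "healthcare-agent-risk-controls",
        "policy_version": "1.4",
        "rule_matched": "rule_1_diagnosis_approval",
        "decision": "require_approval",
        "reason": "Diagnostic suggestions require clinician validation",
    },
}
print(explain(sample))
```

The same input always yields the same sentence, which is what distinguishes this from a generated explanation.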
Article 14: Human Oversight
Article 14 is the most operationally demanding requirement. It mandates that high-risk AI systems include "human oversight measures" that enable natural persons to "effectively oversee" the system, "remain aware of the possible tendency of automatically relying on the output" (automation bias), and "be able to decide, in any particular situation, not to use the high-risk AI system or to otherwise disregard, override or reverse" its output.
This can be mapped to Veto's human review workflows:
name: article-14-human-oversight
description: "Article 14 compliance: human oversight for high-risk decisions"
rules:
  # All diagnostic outputs require clinician review
  - tool: suggest_diagnosis
    action: require_approval
    approval:
      channel: workspace
      reviewer_pool:
        - role: licensed_clinician
      timeout: 1800s
      escalation: escalate_to_senior
      context_shown:
        - tool_name
        - arguments
        - model_reasoning
        - confidence_score
        - similar_past_decisions
  # Treatment plan suggestions: tiered approval
  - tool: suggest_treatment
    action: require_approval
    approval:
      tiers:
        - level: 1
          reviewers:
            - role: treating_physician
          timeout: 3600s
        - level: 2
          reviewers:
            - role: department_head
          timeout: 7200s
      context_shown:
        - full_patient_context
        - contraindication_check
        - evidence_sources
  # Emergency override: allow but flag for immediate review
  - tool: emergency_alert
    action: allow
    post_action:
      review_required: true
      channel: pagerduty
      reviewer_pool:
        - role: on_call_physician
      review_sla: 15minutes
oversight_workspace:
  enabled: true
  metrics:
    - approval_rate_by_tool
    - average_review_time
    - override_rate
    - automation_bias_indicators
The automation_bias_indicators metric is specifically designed for Article 14 compliance: it tracks how often reviewers approve agent suggestions without modification, flagging patterns that suggest reviewers are rubber-stamping rather than substantively evaluating the agent's output.
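One way to approximate such an indicator is to compute, per reviewer, the share of reviews that ended in an unmodified approval. This is an illustrative sketch only, not how Veto computes the metric: the function names, the 0.95 threshold, and the minimum sample size are all assumptions, and the `decision` values follow the sample record's `approved_with_modification` convention.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def unmodified_approval_rates(reviews: Iterable[dict]) -> Dict[str, float]:
    """Share of each reviewer's reviews that were approved with no modification."""
    totals: Dict[str, int] = defaultdict(int)
    unmodified: Dict[str, int] = defaultdict(int)
    for review in reviews:
        totals[review["reviewer"]] += 1
        if review["decision"] == "approved":
            unmodified[review["reviewer"]] += 1
    return {reviewer: unmodified[reviewer] / totals[reviewer] for reviewer in totals}


def flag_reviewers(
    reviews: List[dict], threshold: float = 0.95, min_reviews: int = 20
) -> List[str]:
    """Reviewers with enough reviews whose unmodified-approval rate meets the threshold."""
    counts: Dict[str, int] = defaultdict(int)
    for review in reviews:
        counts[review["reviewer"]] += 1
    rates = unmodified_approval_rates(reviews)
    return sorted(
        reviewer
        for reviewer, rate in rates.items()
        if counts[reviewer] >= min_reviews and rate >= threshold
    )
```

A high rate is a prompt for a conversation, not proof of rubber-stamping; a reviewer handling only routine cases may legitimately approve nearly everything.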
Article 26: Deployer Obligations
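Article 26 is often overlooked because it targets deployers, not providers. If you use an AI agent built by someone else under your own authority, you are a deployer. If you build an agent on top of a foundation model and put it into service under your own name, you may carry provider obligations as well. Deployer obligations include: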
- Implement provider instructions: Follow the provider's instructions for use, including any limitations on the system's intended purpose.
- Assign human oversight: Ensure human oversight is carried out by individuals who have the "necessary competence, training and authority" to fulfill their oversight role.
- Monitor operation: Monitor the AI system's operation on the basis of the provider's instructions and report any serious incidents to the provider and relevant authorities.
- Retain logs: Keep logs generated by the AI system for at least six months, unless otherwise provided in sector-specific legislation.
- Data protection impact assessment: Where GDPR Article 35 applies, carry out a DPIA before putting the system into use, drawing on the information the provider supplies under Article 13.
Penalty Structure
The EU AI Act sets penalty ceilings by infringement category:
┌────────────────────────────────────┬──────────────────────────────────┐
│ Infringement category              │ Penalty tier                     │
├────────────────────────────────────┼──────────────────────────────────┤
│ Prohibited AI practices            │ Highest AI Act ceiling           │
│ (Article 5)                        │                                  │
├────────────────────────────────────┼──────────────────────────────────┤
│ High-risk system non-compliance    │ Major infringement ceiling       │
│ (Articles 6-27, incl. 9, 12-14)    │                                  │
├────────────────────────────────────┼──────────────────────────────────┤
│ False information to authorities   │ Information-duty ceiling         │
├────────────────────────────────────┼──────────────────────────────────┤
│ SME or startup reduced fines       │ Lower of: percentage or fixed    │
│                                    │ amount (proportionality)         │
└────────────────────────────────────┴──────────────────────────────────┘
These are legal ceilings, not pricing math. Actual exposure depends on role, facts, turnover, national implementation, and enforcement discretion. The durable engineering problem is evidence: which agent action ran, which rule evaluated it, what context was used, who approved it, and what decision record captures the result.
Requirements to Technical Controls
┌──────────────┬───────────────────────────────┬───────────────────────────────┐
│ Article      │ Technical Control             │ Veto Feature                  │
├──────────────┼───────────────────────────────┼───────────────────────────────┤
│ Art. 9       │ Risk assessment + mitigation  │ Policy engine with risk       │
│ Risk Mgmt    │ Continuous monitoring         │ levels per rule. Shadow mode  │
│              │ Testing and validation        │ for pre-deployment testing.   │
│              │                               │ Decision workspace for        │
│              │                               │ real-time risk metrics.       │
├──────────────┼───────────────────────────────┼───────────────────────────────┤
│ Art. 12      │ Event logging                 │ Configured protect() calls    │
│ Logging      │ Traceable operation records   │ can be logged with decision   │
│              │ 6-month minimum retention     │ context. Structured JSON      │
│              │                               │ format. Configurable          │
│              │                               │ retention (default: 1 year).  │
├──────────────┼───────────────────────────────┼───────────────────────────────┤
│ Art. 13      │ Interpretable outputs         │ policy_evaluation block in    │
│ Transparency │ Explainable decisions         │ every log: rule matched,      │
│              │ User-facing documentation     │ conditions evaluated, reason. │
│              │                               │ Deterministic, not generated. │
├──────────────┼───────────────────────────────┼───────────────────────────────┤
│ Art. 14      │ Human review workflows        │ require_approval action with  │
│ Oversight    │ Override/interrupt capability │ tiered escalation. Override   │
│              │ Automation bias monitoring    │ and argument modification.    │
│              │                               │ Approval rate metrics.        │
├──────────────┼───────────────────────────────┼───────────────────────────────┤
│ Art. 26      │ Deployer monitoring           │ Decision view. Log            │
│ Deployer     │ Log retention (6mo+)          │ retention configuration.      │
│ Obligations  │ Incident reporting            │ Alert rules + webhook         │
│              │ DPIA before deployment        │ integrations for incidents.   │
└──────────────┴───────────────────────────────┴───────────────────────────────┘
Practical Implementation Order
With the high-risk timetable split across 2027 and 2028, use the runway to build evidence into the workflow itself. Implementation order by priority and effort:
- Classify your systems (Week 1). Determine whether your AI agents fall under Annex III high-risk categories. If they do, everything below is mandatory. If not, implement anyway as a defensible control.
- Create decision records (Week 2). Add protect() to every tool call path. This starts creating Article 12 evidence and gives you the data foundation for everything else.
- Define policies (Weeks 3-4). Write YAML policies that encode your risk management decisions. Use deny-by-default and explicitly allow permitted actions. Deploy in shadow mode first.
- Add human oversight (Weeks 5-6). Configure approval workflows for high-risk tool calls. Train your reviewers. Set up the reviewer workspace and notification channels.
- Document and test (Weeks 7-8). Write your conformity documentation. Run adversarial tests against your policies. Generate sample audit reports. Validate retention configuration.
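The second step, wrapping every tool call path so that each attempt leaves a decision record, can be sketched generically. This is not Veto's protect() API: `governed_call`, the `decide` callback, and the in-memory `log` list are hypothetical stand-ins, and the record fields borrow names from the sample record in the Article 12 section.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Tuple


def governed_call(
    tool_name: str,
    tool_fn: Callable[..., Any],
    arguments: Dict[str, Any],
    decide: Callable[[str, Dict[str, Any]], Tuple[str, str]],
    log: List[str],
) -> Any:
    """Run tool_fn only if decide() returns "allow"; always write a record first."""
    decision, reason = decide(tool_name, arguments)
    record = {
        "record_id": f"rec_{uuid.uuid4().hex[:12]}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "tool_call_decision",
        "tool_call": {"tool": tool_name, "arguments": arguments},
        "policy_evaluation": {"decision": decision, "reason": reason},
    }
    # Log before executing so the record exists even if the tool raises.
    log.append(json.dumps(record))
    if decision != "allow":
        return None  # denied or awaiting approval: the tool is not executed
    return tool_fn(**arguments)


def example_decide(tool: str, arguments: Dict[str, Any]) -> Tuple[str, str]:
    """Toy policy for the sketch: allow one read-only tool, deny everything else."""
    if tool == "lookup":
        return "allow", "read-only tool"
    return "deny", "not explicitly permitted"
```

In a real deployment the log would be an append-only store with retention controls, and a require_approval decision would park the call until a reviewer responds instead of returning None.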
First governed call
The EU AI Act is a major AI governance regime whose extraterritorial reach affects any company serving EU customers, regardless of where the company is headquartered. The technical controls it requires, including risk management, logging, transparency, and human oversight, are not optional features for covered systems. Treat them as operating controls, not last-minute documentation.
The full EU AI Act evidence guide covers the relevant articles in detail, and our healthcare agent use case shows an Annex III high-risk implementation pattern.
FAQ
How does EU AI Act Article 14 apply to AI agents?
Article 14 requires meaningful human oversight for high-risk AI systems. For AI agents, that means humans must be able to review, approve, deny, or interrupt high-impact tool calls before they execute, especially for financial, employment, healthcare, or safety-sensitive actions.
How does Article 12 logging map to agent authorization?
Article 12 expects traceable logs. Runtime authorization produces the right evidence: the attempted tool call, arguments, matched policy, outcome, timestamp, and approver details. That record is stronger than a generic application access log because it explains why an action was allowed or denied.
What Veto evidence helps with EU AI Act compliance?
Veto provides policy files, approval records, and decision records that show risk controls operating at runtime. These artifacts support oversight, auditability, incident review, and change management for AI agents that can affect users or regulated processes.