EU AI Act Article 15 Isn't a Checklist. It's a Control Plane.
What 'appropriate level of accuracy, robustness and cybersecurity' actually means once you stop reading the slide deck and start writing the evidence.
Your high-risk AI system passes validation on Tuesday. A model update ships Thursday. By Friday, accuracy has drifted 4 percentage points on a subpopulation that represents 12% of your EU user base. Nobody notices for six weeks, because the compliance team signed off on a static test report and moved on.
Article 15 of the EU AI Act doesn't care that you tested once. It requires that high-risk AI systems "achieve an appropriate level of accuracy, robustness and cybersecurity" throughout their lifecycle. Three properties, each with continuous obligations, each generating distinct evidence requirements that no annual assessment will satisfy.
Most compliance teams treat this as three boxes: accuracy, check. Robustness, check. Cybersecurity, check. Then they file the conformity assessment and forget about it until the next audit cycle. That's not what the regulation says.
What Article 15 actually requires
Article 15 contains five paragraphs. Four of them create obligations for providers that persist from deployment through decommissioning. The remaining one, Article 15.2, is addressed to the Commission.
Article 15.1 establishes the baseline: high-risk AI systems "shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle."
The word "consistently" is doing heavy lifting. This isn't "demonstrate once." It's "prove continuously."
Article 15.2 tasks the Commission with encouraging the development of benchmarks and measurement methodologies for accuracy and robustness. It creates no direct duty for providers.
Article 15.3 requires that accuracy levels, and the accuracy metrics used to measure them, are declared in the accompanying instructions for use. You must define what accuracy means for your specific system, how you measure it, and what thresholds constitute compliance.
Article 15.4 demands that systems be as resilient as possible against errors, faults, and inconsistencies that may occur within the system or its operating environment, in particular through interaction with people or other systems. Robustness isn't just adversarial testing. It's operational resilience against the messy reality of production.
Article 15.4 also addresses redundancy and feedback loops. Robustness may be achieved through technical redundancy such as backup or fail-safe plans, and systems that continue to learn after deployment must be built to eliminate or reduce the risk of biased outputs influencing future inputs.
Article 15.5 covers cybersecurity specifically. Systems must be resilient against unauthorized third parties exploiting vulnerabilities to alter their use, outputs, or performance. This explicitly names data poisoning, model poisoning, adversarial examples (model evasion), and confidentiality attacks, alongside model flaws.
Why most implementations fail
The failure pattern is predictable. A compliance team reads Article 15, maps it to a risk management framework, creates a policy, performs a point-in-time assessment, and archives the results. They've satisfied the letter of exactly zero sub-articles.
Accuracy isn't a number. It's a monitoring system. Article 15.3 requires declared accuracy metrics in your instructions for use: specific metrics and thresholds, committed to in writing to the deployers who rely on them. When accuracy degrades, you detect, report, and remediate. A model card written at deployment time doesn't satisfy this.
Robustness isn't a pentest report. Article 15.4 covers errors, faults, and inconsistencies in the operating environment: integration failures, data quality degradation, edge cases from distribution shift. You can't test for all of these once. You need monitoring that catches them as they occur.
Cybersecurity for AI differs from traditional AppSec. Article 15.5 names four attack vectors: data poisoning, model poisoning, adversarial examples, and confidentiality attacks. Your standard vulnerability management program doesn't cover these. They require AI-specific threat models and AI-specific incident response.
The control plane model
Article 15 makes more sense as a control plane: a continuous system that observes, measures, and responds to the operational state of your AI systems.
A control plane has four functions:
- Define acceptable operating parameters (metrics, thresholds, tolerances)
- Observe the system's actual state against those parameters
- Detect deviations that constitute non-compliance
- Respond with documented remediation actions
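The four functions above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not a reference implementation: the system name, metric names, and thresholds are hypothetical, and in practice the observed values would come from your ML monitoring tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    """Define: one declared operating parameter for one system."""
    system: str
    metric: str
    minimum: float  # lowest acceptable value; hypothetical threshold


def detect(parameters, observations):
    """Observe + Detect: return (parameter, observed) pairs out of tolerance.

    A missing measurement counts as a deviation, because an unobserved
    metric cannot be shown to be within its declared range.
    """
    deviations = []
    for p in parameters:
        observed: Optional[float] = observations.get((p.system, p.metric))
        if observed is None or observed < p.minimum:
            deviations.append((p, observed))
    return deviations


def respond(parameter, observed):
    """Respond: build a remediation record that can be filed as evidence."""
    shortfall = None if observed is None else round(parameter.minimum - observed, 4)
    return {
        "system": parameter.system,
        "metric": parameter.metric,
        "declared_minimum": parameter.minimum,
        "observed": observed,
        "shortfall": shortfall,
        "status": "remediation_open",
    }


# Hypothetical declared parameters and one day's observations.
declared = [
    Parameter("credit-scoring-v3", "accuracy_overall", 0.92),
    Parameter("credit-scoring-v3", "accuracy_subgroup_a", 0.90),
]
observed_today = {
    ("credit-scoring-v3", "accuracy_overall"): 0.94,
    ("credit-scoring-v3", "accuracy_subgroup_a"): 0.86,
}

records = [respond(p, value) for p, value in detect(declared, observed_today)]
for record in records:
    print(record)
```

The overall metric passes while the subgroup metric opens a remediation record, which is the kind of gap a single aggregate accuracy number hides.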
Here's how each sub-article maps to control plane functions:
| Sub-Article | Requirement | Control Plane Function | Evidence Artifact |
|---|---|---|---|
| 15.1 | Consistent accuracy, robustness, cybersecurity throughout lifecycle | Continuous observation against declared baselines | Monitoring dashboards with historical trend data, lifecycle assessment records |
| 15.3 | Declared accuracy metrics and levels in instructions for use | Define operating parameters with measurable thresholds | Accuracy metric definitions, threshold documentation, measurement methodology |
| 15.4 | Resilience against errors, faults, inconsistencies in operating environment | Detect operational anomalies, integration failures, data quality degradation | Robustness test results, fault injection reports, integration monitoring logs |
| 15.4 | Redundancy, fail-safe plans, and mitigation of feedback loops | Respond with circuit breakers and fallback mechanisms | Feedback loop analysis, bias monitoring, fail-safe trigger logs |
| 15.5 | Resilience against data poisoning, model poisoning, adversarial examples, confidentiality attacks | Observe for attack indicators, detect exploitation attempts | AI-specific threat model, adversarial testing results, data pipeline integrity monitoring, access logs |
Each cell in the "Evidence Artifact" column represents something that must exist, stay current, and remain traceable to the specific system it governs. A single static compliance assessment produces none of this continuously.
The evidence burden nobody quantifies
Let's count the distinct evidence types an auditor or market surveillance authority will ask for.
Accuracy (15.1 + 15.3): Defined metrics with justification, baseline measurements at deployment, ongoing measurements at defined intervals, drift detection records with remediation timelines, subpopulation breakdowns, and documentation of how metrics were communicated to deployers.
Robustness (15.4): Fault injection results, feedback loop analysis, operating environment specifications, incident records where failures occurred, and redundancy design documentation.
Cybersecurity (15.5): AI-specific threat model, training data pipeline integrity controls, model registry with provenance, adversarial robustness testing, model extraction resistance assessment, and AI-specific incident response procedures.
That's 17 distinct evidence types that must stay current for each high-risk AI system. Multiply by the number of high-risk systems you operate. A spreadsheet and quarterly review cadence won't produce this.
How Kyudo operationalizes Article 15
Kyudo's architecture treats Article 15 as what it is: a continuous control obligation that generates evidence at the speed of change, not the speed of audit cycles.
Controls Hub and the STRM Engine. The Controls Hub maps Article 15 requirements to specific controls using the STRM Engine (Set Theory Relationship Mapping). STRM maps 1,470+ controls across 80+ frameworks via the SCF meta-framework. Article 15's sub-requirements map to controls in the SCF's AAT (Artificial Intelligence & Autonomous Technology) domain, but also cross-map to ISO 42001 clauses, NIST AI RMF functions, and ISO 27001 Annex A controls for cybersecurity. One control implementation satisfies multiple framework obligations simultaneously.
CMCAE for continuous scoring. The Continuous Multi-Framework Control Assessment Engine doesn't just check whether a control exists. It scores maturity: is the control documented, implemented, operating, monitored, or optimized? For Article 15, an accuracy monitoring control at Level 2 (Implemented) means you built it. Level 3 (Operating) means it's running and producing data. Level 4 (Monitored) means deviations trigger alerts. The difference between Level 2 and Level 4 is the difference between a static assessment and a control plane.
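To make the maturity ladder concrete, here is a minimal sketch of how such levels could be modeled. It is an illustration, not CMCAE's actual scoring logic. The text only numbers levels 2 to 4, so placing Documented at 1 and Optimized at 5 is an assumption, and the control names are hypothetical.

```python
from enum import IntEnum


class Maturity(IntEnum):
    DOCUMENTED = 1   # assumed numbering; not stated in the text
    IMPLEMENTED = 2  # you built it
    OPERATING = 3    # it is running and producing data
    MONITORED = 4    # deviations trigger alerts
    OPTIMIZED = 5    # assumed numbering; not stated in the text


def is_control_plane(level: Maturity) -> bool:
    """True once deviations trigger alerts, i.e. Monitored or above."""
    return level >= Maturity.MONITORED


def gaps(controls: dict[str, Maturity]) -> dict[str, int]:
    """Number of levels each control must still climb to reach Monitored."""
    return {
        name: int(Maturity.MONITORED - level)
        for name, level in controls.items()
        if level < Maturity.MONITORED
    }


# Hypothetical controls for one high-risk system.
controls = {
    "accuracy-monitoring": Maturity.IMPLEMENTED,
    "drift-alerting": Maturity.MONITORED,
}
print(gaps(controls))
```

Using an ordered enum keeps the comparison "is this at least Monitored?" explicit, which is the distinction the paragraph above draws between a static assessment and a control plane.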
Evidence Hub for artifact lifecycle. Every evidence artifact collected by the Evidence Hub carries a hash (proving integrity), a collection timestamp (enabling freshness scoring), and a derivation path (showing where it came from and how it was produced). For Article 15.3's declared accuracy metrics, this means drift detection logs are collected automatically from your ML monitoring tools, hashed, and linked to the specific control they satisfy. When an auditor asks "show me your accuracy monitoring for system X over the past 90 days," the evidence exists, it's verifiable, and it's already linked to the control.
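The three properties named above (hash, collection timestamp, derivation path) can be illustrated with a short sketch. This is not the Evidence Hub's implementation. The field names, the control identifier, and the derivation steps are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceArtifact:
    control_id: str               # hypothetical control identifier
    payload: bytes                # raw bytes exactly as pulled from the source
    sha256: str                   # integrity hash computed at collection time
    collected_at: datetime        # basis for freshness scoring
    derivation: tuple[str, ...]   # ordered steps from source system to artifact


def collect(control_id, payload, derivation, now=None):
    """Hash the payload at the moment of collection, not at upload time."""
    return EvidenceArtifact(
        control_id=control_id,
        payload=payload,
        sha256=hashlib.sha256(payload).hexdigest(),
        collected_at=now or datetime.now(timezone.utc),
        derivation=tuple(derivation),
    )


def is_intact(artifact):
    """Recompute the hash and compare it with the one stored at collection."""
    return hashlib.sha256(artifact.payload).hexdigest() == artifact.sha256


def age_days(artifact, now=None):
    """Age of the artifact in days, usable as a freshness input."""
    now = now or datetime.now(timezone.utc)
    return (now - artifact.collected_at).total_seconds() / 86400


# Hypothetical drift log entry pulled from a monitoring tool.
log = json.dumps(
    {"system": "system-x", "metric": "accuracy", "value": 0.91}, sort_keys=True
).encode()
artifact = collect(
    "accuracy-monitoring", log, ["ml-monitoring-tool", "export-api", "evidence-store"]
)

# Simulate someone editing the exported value after collection.
tampered = replace(artifact, payload=log.replace(b"0.91", b"0.95"))
print(is_intact(artifact), is_intact(tampered))  # prints: True False
```

Because the hash is fixed at collection, any later edit to the payload is detectable, which is what separates a pulled artifact from a manually uploaded spreadsheet.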
Tensei Copilot for advisory assessment. Tensei operates in Kyudo's Layer 2 (advisory, AI-enabled) and produces gap analyses with confidence scores and citations. Ask it "what's our Article 15.5 readiness for System X?" and it returns a scored assessment citing specific controls, their evidence state, their maturity level, and gaps. Every claim traces back to a node in the compliance graph. Below 0.7 confidence, the output gets flagged for human review. Above 0.7, it propagates to dashboards.
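The routing rule described above is simple enough to sketch. This is an illustration of the stated threshold, not Tensei's code. The function and label names are hypothetical, and sending a score of exactly 0.7 to review is an assumption, since the text only covers scores below and above it.

```python
REVIEW_THRESHOLD = 0.7  # threshold stated in the text


def route(confidence: float) -> str:
    """Decide where an advisory output goes based on its confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    # Assumption: a score of exactly 0.7 takes the conservative path.
    return "dashboard" if confidence > REVIEW_THRESHOLD else "human_review"


for score in (0.42, 0.7, 0.93):
    print(score, route(score))
```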
Two-Layer Trust Architecture. Layer 1 (deterministic, no AI) handles control scoring, evidence freshness, and framework mapping. Layer 2 (advisory, AI with confidence scores) handles gap analysis and remediation suggestions. You always know which layer produced an output.
Sovereign deployment. The entire system, including the compliance graph that connects your AI systems to their controls and evidence, deploys inside your Azure tenant. Your Article 15 evidence doesn't leave your environment.
The counter-argument: "we can do this with existing tools"
The reasonable objection: "We already have ML monitoring, AppSec, and a GRC platform. Why do we need another system?"
Steelman: ML monitoring tools (Evidently, Fiddler, Arthur, WhyLabs) handle drift detection. AppSec covers vulnerability scanning. GRC platforms store controls and evidence. Stitch them together with a reporting layer and you've satisfied Article 15 in theory.
In practice, three problems emerge:
The mapping problem. Which drift detection alert satisfies which sub-article? When your ML monitoring fires an accuracy degradation alert, someone needs to connect that alert to the metrics you declared under Article 15.3, determine whether it crosses the declared threshold, update the control's evidence state, and document the remediation. That connection doesn't exist natively in any ML monitoring tool. It's manual analyst work per alert, per system, per framework.
The evidence integrity problem. An accuracy metric exported from your monitoring tool into a spreadsheet, then uploaded to your GRC platform, then reviewed quarterly, has broken the evidence chain. Was the export tampered with? Is it the same data the monitoring tool showed? When was it actually collected vs. when was it uploaded? Native integrations that pull evidence directly from source systems and hash it at collection time are different from manual upload workflows.
The multi-framework problem. Article 15.5's cybersecurity requirements overlap with ISO 27001 A.8, NIST CSF PR.DS, and SOC 2 CC6. If you're managing each framework in isolation, you're proving the same control multiple times with slightly different evidence formats. STRM's set-theory mapping means one control implementation, evidenced once, satisfies all mapped obligations simultaneously. Without that mapping, your evidence burden grows linearly with every framework you add.
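The set-theory idea can be illustrated with plain Python sets. This is a sketch of the general technique, not the STRM Engine, and the control names and the specific control-to-requirement mappings are hypothetical examples rather than authoritative crosswalks.

```python
# Hypothetical map: control -> set of (framework, requirement) it is mapped to.
CONTROL_MAP = {
    "model-artifact-access-control": {
        ("EU AI Act", "Art. 15.5"),
        ("ISO 27001", "A.8"),
        ("SOC 2", "CC6"),
    },
    "training-data-integrity-checks": {
        ("EU AI Act", "Art. 15.5"),
        ("NIST CSF", "PR.DS"),
    },
}


def covered(control_map):
    """Union of every obligation satisfied by at least one control."""
    return set().union(*control_map.values())


def uncovered(required, control_map):
    """Set difference: obligations in scope that no control is mapped to."""
    return set(required) - covered(control_map)


def evidence_packages(control_map, shared):
    """Evidence packages to maintain.

    shared=True: each control is evidenced once for all mapped obligations.
    shared=False: each control is evidenced separately per obligation.
    """
    if shared:
        return len(control_map)
    return sum(len(obligations) for obligations in control_map.values())


# Hypothetical scope: everything mapped above plus one unmapped obligation.
required = covered(CONTROL_MAP) | {("NIST AI RMF", "MEASURE")}

print(evidence_packages(CONTROL_MAP, shared=False))  # 5
print(evidence_packages(CONTROL_MAP, shared=True))   # 2
print(uncovered(required, CONTROL_MAP))
```

With the mapping, the two controls are evidenced twice in total. Without it, the same two controls need five evidence packages, and that number grows with every framework added.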
The stitched-together approach works until you have 10+ high-risk AI systems across 3+ frameworks. Then the manual mapping and evidence management overhead exceeds what analyst time can sustain.
What to do Monday morning
Article 15 enforcement is coming. Here's a concrete starting point:
- Inventory your high-risk AI systems. If you haven't done the AI inventory exercise, Article 15 compliance is impossible. You can't build a control plane for systems you haven't identified.
- Define accuracy metrics per system. Not "our model is accurate." Specific metrics, specific thresholds, specific measurement methodology, specific measurement frequency. Write it down. This becomes your Article 15.3 commitment.
- Map your AI-specific threat model. Take each of the four attack vectors in Article 15.5 (data poisoning, model poisoning, adversarial examples, confidentiality attacks) and assess your exposure per system. Which are relevant? What controls exist today? What gaps remain?
- Audit your evidence lifecycle. For each control you claim satisfies an Article 15 requirement, ask: how old is the evidence? Can I prove it wasn't modified? Can I trace it to the source system? If any answer is "no," you don't have evidence. You have a document.
- Decide on continuous vs. periodic. Article 15.1 says "throughout their lifecycle." That means continuous. If your current assessment cadence is annual or quarterly, you have an architectural decision to make about how to close that gap.
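The three questions in the evidence lifecycle step can be turned into a quick triage script. This is a minimal sketch: the record fields, control names, and the 90-day freshness window are all assumptions to adapt to your own evidence inventory.

```python
from datetime import datetime, timedelta, timezone


def audit(record, now, max_age):
    """Return the audit questions a record fails. An empty list means evidence."""
    failures = []
    if now - record["collected_at"] > max_age:
        failures.append("stale")                # how old is the evidence?
    if not record.get("sha256"):
        failures.append("no integrity proof")   # can I prove it wasn't modified?
    if not record.get("source_system"):
        failures.append("no traceable source")  # can I trace it to the source?
    return failures


now = datetime(2025, 6, 1, tzinfo=timezone.utc)   # fixed date for repeatability
max_age = timedelta(days=90)                      # hypothetical freshness window

# Hypothetical inventory entries.
inventory = [
    {
        "control": "accuracy-monitoring",
        "collected_at": datetime(2025, 5, 20, tzinfo=timezone.utc),
        "sha256": "placeholder-digest",
        "source_system": "ml-monitoring-tool",
    },
    {
        "control": "threat-model-review",
        "collected_at": datetime(2024, 11, 1, tzinfo=timezone.utc),
        "sha256": None,
        "source_system": None,
    },
]

for record in inventory:
    failures = audit(record, now, max_age)
    label = "evidence" if not failures else "document"
    print(record["control"], label, failures)
```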
Article 15 isn't three checkboxes. It's an operating requirement that generates evidence every day your AI system runs. Treat it accordingly.
Ready to map Article 15 to your AI systems? Book a demo and see how Controls Hub maps EU AI Act requirements to operational controls with continuous evidence collection.
