EU AI Act Article 9: What It Actually Requires from Your AI Agents
I've spent the last two months reading the EU AI Act — the full consolidated text on EUR-Lex, the recitals, the early guidance docs from the AI Office. Not skimming. Reading. Highlighting. Going back.
Article 9 is the one that keeps me up. It's the provision that defines what a "risk management system" means for high-risk AI systems, and it becomes enforceable on August 2, 2026. Four months from now.
Most of the commentary I've seen treats this like a procurement checkbox — "do you have a risk management framework? Great, next slide." That's not what the text says. The text says something much harder, and I think most teams deploying AI agents haven't internalized it yet.
What Article 9 actually says
The core of Article 9 is one sentence that sounds bureaucratic until you unpack it:
The risk management system shall be understood as a continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system, requiring regular systematic review and updating.
"Continuous iterative process" is doing a lot of work there. This isn't "write a risk assessment before launch and file it." This is: every time your system changes — new model, new tool, new data source, new deployment context — the risk assessment has to be revisited. Documented. Updated.
For a traditional ML model that gets retrained quarterly, that's manageable. For an AI agent that picks up a new MCP server on Tuesday and starts calling three APIs it's never touched before? That's a different engineering problem entirely.
The seven requirements, unpacked
I'm going to walk through what Article 9 demands, but I'm going to skip the structure most compliance blogs use (requirement → legal text → vague advice). Instead: what each one concretely means if you're an engineering team shipping agents.
1. A living risk management system
Not a document. A system. The regulation uses the word "system" deliberately: Recital 65 describes a continuous, iterative process that is regularly reviewed and updated, which in practice means established processes with feedback loops, not a static PDF.
What this looks like in practice: you need a pipeline gate. When an agent's capabilities change — someone adds a tool, changes a system prompt, connects a new data source — something has to fire that says "risk profile changed, re-evaluate before this goes live." If you're doing this manually via Jira tickets, you'll miss things. I know because we missed things.
Early on we tried tracking agent capability changes through a shared spreadsheet. An engineer added an MCP server to a staging agent on a Friday, it got promoted to prod on Monday, and nobody re-ran the risk check because the spreadsheet was three tabs deep in a Notion doc nobody opened. That was the week I started building automated capability-change detection into our pipeline.
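To make the pipeline-gate idea concrete, here is a minimal Python sketch of capability-change detection. It is an illustration under assumptions, not the pipeline described above: `AgentManifest`, `capability_fingerprint`, and `gate_deploy` are hypothetical names, and the idea is simply to hash an agent's capability surface and refuse promotion unless that exact hash was recorded at the last risk review.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentManifest:
    """Hypothetical description of what an agent can do at deploy time."""
    name: str
    tools: tuple = ()
    data_sources: tuple = ()
    system_prompt: str = ""


def capability_fingerprint(manifest: AgentManifest) -> str:
    """Hash the capability surface so any change yields a new fingerprint."""
    canonical = json.dumps(
        {
            "tools": sorted(manifest.tools),
            "data_sources": sorted(manifest.data_sources),
            "system_prompt": manifest.system_prompt,
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def gate_deploy(manifest: AgentManifest, reviewed: dict) -> bool:
    """Allow promotion only if this exact capability set was risk-reviewed.

    `reviewed` maps agent name -> fingerprint recorded at the last review.
    """
    return reviewed.get(manifest.name) == capability_fingerprint(manifest)


staging = AgentManifest("support-bot", tools=("search_kb",), system_prompt="Be helpful.")
reviewed = {"support-bot": capability_fingerprint(staging)}

# Someone adds an MCP server on Friday: the fingerprint no longer matches.
changed = AgentManifest(
    "support-bot", tools=("search_kb", "mcp:crm"), system_prompt="Be helpful."
)

print(gate_deploy(staging, reviewed))   # unchanged agent passes the gate
print(gate_deploy(changed, reviewed))   # changed agent is held for re-review
```

The point of hashing rather than diffing by hand is that the check needs no human to remember to run it: a CI step can call something like `gate_deploy` and fail the promotion automatically.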
2. Identify known and foreseeable risks
Identification and analysis of the known and reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights.
"Reasonably foreseeable" is the phrase that will matter in enforcement. A regulator isn't going to accept "we didn't think about prompt injection" in 2026. The OWASP Top 10 for LLM Applications exists. Tool poisoning is published research. Unauthorized data access via agent tool chains is documented.
The standard here isn't "did you think of everything" — it's "did you think of the things a competent team in your position should have thought of." If the risk is in a public taxonomy and you didn't catalog it, that's a gap.
For engineering teams, this means maintaining a risk registry that's scoped to agent capabilities, not just model properties. Every tool an agent can call, every data source it can access, every action it can take — each one carries risks that need to be enumerated. And when you add a tool, the registry needs updating. Not next quarter. Before deployment.
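A capability-scoped registry can be checked mechanically. The sketch below is a hedged example in Python: the `RiskEntry` shape, the risk IDs, and the tool names are all invented for illustration. It shows one possible pre-deployment check that lists every capability with no risk entry attached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEntry:
    """Hypothetical registry row: one risk attached to one capability."""
    risk_id: str
    capability: str   # tool, data source, or action the risk attaches to
    description: str


def unregistered_capabilities(capabilities, registry):
    """Return, sorted, the capabilities that have no risk entry at all."""
    covered = {entry.capability for entry in registry}
    return sorted(set(capabilities) - covered)


registry = [
    RiskEntry("R-001", "read_file", "Agent reads credentials or secrets from disk"),
    RiskEntry("R-002", "http_request", "Agent sends data to an untrusted endpoint"),
    RiskEntry("R-003", "http_request", "Prompt injection via fetched content"),
]

# A new tool was added but nobody cataloged its risks yet.
agent_capabilities = ["read_file", "http_request", "sql_query"]
missing = unregistered_capabilities(agent_capabilities, registry)
print(missing)
```

A non-empty result is the signal to block deployment until the registry is updated.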
3. Evaluate intended use and misuse
This one tripped me up on first read. The regulation explicitly requires you to evaluate risks under two conditions: the intended use case, and "reasonably foreseeable misuse."
For agents, misuse isn't hypothetical. It's a developer connecting an agent to a production database because the staging one was slow. It's someone using a customer-support agent to query internal HR data because it technically has the access. It's shadow AI — agents deployed without IT knowing they exist.
You can't evaluate misuse risks for agents you don't know about. This is where the compliance problem and the security problem become the same problem. If your organization doesn't have visibility into what agents are running, what tools they're connected to, and what data they're touching, you're structurally unable to satisfy this requirement.
4. Targeted risk management measures
The key word here is "targeted." Generic controls don't satisfy Article 9. The regulation says your measures must be "designed to address the risks identified" — meaning there has to be a traceable line from each risk in your registry to a specific control that mitigates it.
What this looks like for an agent with database access: not "we use encryption at rest," but "every query the agent generates is evaluated against an access control policy scoped to the requesting user's role, and queries touching PII columns are logged with the user identity, the agent identity, and the authorization context that permitted the access."
That's specific. That's auditable. That's what the regulation is asking for.
I'll be honest: we didn't get this right on the first pass either. Our early compliance rules were too coarse — "flag any database access" — which generated so many alerts that the security team started ignoring them. The second pass was about making rules precise enough to be useful: flag database access to PII columns by agents without explicit PII authorization when the requesting user's role doesn't include data-controller scope. Three conditions instead of one. Alert volume dropped 90%, signal went up.
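The three-condition rule is easy to express as a pure function. This is a sketch, not our production rule: the PII column list, the `pii:read` scope, and the `DbAccess` structure are assumptions made up for the example.

```python
from dataclasses import dataclass

# Assumed for illustration: which columns count as PII, and the scope/role names.
PII_COLUMNS = frozenset({"email", "ssn", "date_of_birth"})
PII_SCOPE = "pii:read"
CONTROLLER_ROLE = "data-controller"


@dataclass(frozen=True)
class DbAccess:
    """Hypothetical context attached to one agent-generated query."""
    columns: frozenset
    agent_scopes: frozenset
    user_roles: frozenset


def should_flag(access: DbAccess) -> bool:
    """Flag only when all three conditions hold at once."""
    touches_pii = bool(access.columns & PII_COLUMNS)
    agent_lacks_pii_auth = PII_SCOPE not in access.agent_scopes
    user_lacks_controller = CONTROLLER_ROLE not in access.user_roles
    return touches_pii and agent_lacks_pii_auth and user_lacks_controller


risky = DbAccess(frozenset({"email", "order_id"}), frozenset(), frozenset({"support"}))
no_pii = DbAccess(frozenset({"order_id"}), frozenset(), frozenset({"support"}))
authorized_agent = DbAccess(frozenset({"email"}), frozenset({"pii:read"}), frozenset({"support"}))
controller_user = DbAccess(frozenset({"email"}), frozenset(), frozenset({"data-controller"}))

for case in (risky, no_pii, authorized_agent, controller_user):
    print(should_flag(case))
```

Only the first case should raise an alert. The coarse "flag any database access" rule would have fired on all four.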
5. Test your measures
Testing in order to identify the most appropriate and targeted risk management measures.
This is the requirement that most surprised me. The regulation doesn't just want you to have controls — it wants evidence that you tested them. That your controls actually fire when the scenario they're designed to prevent occurs.
For AI agents, this means adversarial testing. Can the agent be prompt-injected into bypassing a compliance rule? If an agent attempts data exfiltration through a tool chain, does your monitoring catch it? If someone feeds a poisoned tool description to an agent, does the control trigger?
These tests need to be documented. With results. With timestamps.
The engineering implication: your compliance rules need to be testable in isolation. Given input X (an agent action with specific context), does rule Y fire? If your compliance evaluation is a black box — or worse, relies on an LLM to make judgment calls — you can't satisfy this. The same input needs to produce the same output every time, or your test results are meaningless.
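Here is one way "testable in isolation" could look in Python. Everything in it is hypothetical (the rule, the action fields, the host names). The sketch shows a deterministic rule as a pure function, plus a tiny harness that runs each case twice and records a timestamped result you could keep as evidence.

```python
from datetime import datetime, timezone

ALLOWED_HOSTS = {"api.internal.example"}  # assumed allowlist


def blocks_external_upload(action: dict) -> bool:
    """Hypothetical rule: fire when file contents go to a host off the allowlist."""
    return (
        action.get("tool") == "http_request"
        and action.get("payload_source") == "file"
        and action.get("host") not in ALLOWED_HOSTS
    )


def run_rule_tests(rule, cases):
    """Evaluate each case twice and record a timestamped result per case."""
    results = []
    for name, action, expected in cases:
        first, second = rule(action), rule(action)
        results.append(
            {
                "case": name,
                "expected": expected,
                "actual": first,
                "deterministic": first == second,
                "passed": first == expected and first == second,
                "tested_at": datetime.now(timezone.utc).isoformat(),
            }
        )
    return results


cases = [
    ("exfiltration attempt",
     {"tool": "http_request", "payload_source": "file", "host": "evil.example"}, True),
    ("internal call",
     {"tool": "http_request", "payload_source": "file", "host": "api.internal.example"}, False),
    ("unrelated tool", {"tool": "search_kb"}, False),
]

report = run_rule_tests(blocks_external_upload, cases)
for row in report:
    print(row["case"], row["passed"], row["tested_at"])
```

Running each case twice is a cheap determinism check. It would catch a rule that depends on randomness or on a model call, which is exactly the kind of rule that cannot produce meaningful test evidence.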
6. Compound risk — the part everyone misses
This is my candidate for the most under-discussed requirement in Article 9:
The risk management measures shall give due consideration to the combined effects and possible interactions among the risks.
An agent that can read files and an agent that can make HTTP requests are two different risk profiles. An agent that can do both has a third risk profile that's worse than either — read credentials from disk, exfiltrate them over HTTP. The compound is not the sum.
Most risk frameworks I've seen evaluate capabilities individually. Tool A: medium risk. Tool B: medium risk. But A + B in sequence? That might be critical. And the regulation explicitly says you have to consider these interactions.
This is where I think most teams will get caught. Evaluating individual actions is straightforward. Evaluating action sequences for compound risk requires modeling the state an agent accumulates across steps — what it's read, what it now knows, what it can do with that knowledge. That's a graph problem, not a checklist problem.
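A full graph model is beyond a blog post, but the core idea, that risk depends on what the agent has already done, fits in a few lines. The Python sketch below is deliberately simplistic and every name in it is an assumption: it tracks a single piece of accumulated state (has the agent read anything sensitive?) and escalates any later egress step.

```python
# Assumed classifications, for illustration only.
SENSITIVE_SOURCES = {"read_file", "sql_query"}
EGRESS_SINKS = {"http_request", "send_email"}
INDIVIDUAL_RISK = {"read_file": "medium", "http_request": "medium", "search_kb": "low"}


def assess_sequence(steps):
    """Rate each step in order, escalating egress that follows a sensitive read."""
    holds_sensitive_data = False
    ratings = []
    for tool in steps:
        if tool in EGRESS_SINKS and holds_sensitive_data:
            # Same tool, different state: the compound risk is not the sum.
            ratings.append((tool, "critical"))
        else:
            ratings.append((tool, INDIVIDUAL_RISK.get(tool, "unknown")))
        if tool in SENSITIVE_SOURCES:
            holds_sensitive_data = True
    return ratings


print(assess_sequence(["read_file", "http_request"]))
print(assess_sequence(["http_request", "read_file"]))
```

The same two tools get different ratings depending on order. A per-tool checklist cannot express that, which is why the sequence has to be modeled. A real system would track far more state than one boolean.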
7. Communicate residual risk
After all controls are in place, whatever risk remains must be judged acceptable, documented, and communicated to deployers. This isn't aspirational transparency. It's a legal obligation under Article 9(5), read together with the information duties toward deployers in Article 13.
The humans operating your agents need to know: "These are the risks we've identified, these are the controls in place, and here's what's left over." In writing. Accessible. Updated when the risk profile changes.
For engineering teams: this means your audit trail needs to surface not just what was blocked, but what was allowed and why. If a rule evaluated an action as low-risk and permitted it, that decision — and the reasoning behind it — needs to be in the log. When a regulator asks "how did you determine this was acceptable residual risk," you need a timestamped answer, not a retroactive explanation.
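As a rough sketch of such an audit entry in Python: the field names, the rule ID, and the `audit_record` helper are all hypothetical. The one design point it illustrates is that "allow" decisions are written with the same structure, reason, and timestamp as "block" decisions.

```python
import json
from datetime import datetime, timezone

VALID_DECISIONS = {"allow", "block"}


def audit_record(agent_id, action, decision, rule_id, reason, now=None):
    """Serialize one rule evaluation, allowed or blocked, as a JSON line."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "timestamp": timestamp,
            "agent_id": agent_id,
            "action": action,
            "decision": decision,
            "rule_id": rule_id,
            "reason": reason,
        },
        sort_keys=True,
    )


# An allowed action is logged with the reasoning that permitted it.
line = audit_record(
    agent_id="support-bot",
    action={"tool": "sql_query", "columns": ["order_id"]},
    decision="allow",
    rule_id="pii-access-v2",
    reason="no PII columns touched",
    now=datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc),
)
print(line)
```

Writing the reason at decision time is what separates a timestamped answer from a retroactive explanation.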
The thing nobody's talking about
Here's my actual take, the thing I haven't seen in any other Article 9 breakdown: the regulation assumes your AI system is a relatively stable artifact. It was written with traditional ML in mind — a model gets trained, validated, deployed, and occasionally retrained. The risk management process maps to that lifecycle cleanly.
AI agents break this assumption. An agent's risk profile can change between requests. A user connects a new tool, the agent gets access to a new data source, the system prompt gets modified — and suddenly the risk assessment you ran yesterday is stale. The regulation says "continuous iterative process," but I don't think the drafters were imagining a system whose capabilities can change at runtime.
This isn't a loophole. Regulators will still enforce Article 9 against agent deployments. But it means the compliance infrastructure you build needs to operate at a fundamentally different cadence than what the regulation's authors probably had in mind. Not quarterly reviews. Not monthly. Every action, evaluated against the current capability set, in real-time.
That's hard. I don't think anyone has this fully solved, including us. But I think the teams that recognize this gap early will be the ones that are actually compliant in August, versus the ones holding a PDF they generated in July.
Where to start if you're behind
I'll be direct: if you haven't started on Article 9 compliance and you deploy high-risk AI agents, you're behind. But "behind" isn't "doomed." Here's the minimum path.
If any of this is useful or if you want to compare notes on how your team is approaching Article 9, I'm hamza in the Quint Discord. I've been wrong about regulatory interpretation before and I'll be wrong again — happy to be corrected.
Hamza is co-founder and CTO at Quint. He's currently deep in the EU AI Act text and wishes the font size in EUR-Lex PDFs was larger.