Model Cards as Audit-Grade Evidence

Your data science team ships a credit scoring model to production on Friday. Somewhere in the team's Git repo, there's a model card, a markdown file created six months ago during initial development. It lists the training data, the intended use, the performance metrics from the validation set. Nobody has updated it since the model was retrained in February. Nobody signed it. Nobody in compliance knows it exists.

On Monday, your AI governance lead gets a questionnaire from a client asking for documentation of AI systems used in decisions affecting their customers. She spends three days tracking down which models are in production, where the documentation lives, and whether any of it is current.

This is the state of model card practice at most organizations in 2026. The documentation exists. It satisfies nobody.

What model cards are supposed to do

Mitchell et al. introduced model cards in 2019 as a standardized format for documenting trained ML models: intended use, training data, evaluation metrics, ethical considerations, limitations. The original paper positioned them as transparency artifacts.

Under current regulatory requirements, they have a second function most organizations haven't internalized: they're evidence artifacts. They demonstrate that you documented your AI system's characteristics as required by applicable regulations.

The difference between a transparency document and an evidence artifact isn't the content. It's the lifecycle management. Evidence artifacts need provenance, versioning, integrity, freshness tracking, and explicit linkage to the governance control they satisfy.

Why now: three regulatory triggers

EU AI Act Article 11 (Technical Documentation)

Article 11 requires providers of high-risk AI systems to maintain technical documentation that demonstrates compliance with the regulation's requirements. The documentation must be "drawn up before that system is placed on the market or put into service and shall be kept up to date."

"Kept up to date" is the operative phrase. A model card written during development and never updated after retraining, fine-tuning, or performance drift doesn't satisfy this requirement. Article 11 demands documentation that reflects the current state of the system, not its state at initial deployment.

EU AI Act Article 13 (Transparency)

Article 13 requires that high-risk AI systems be designed and developed so that their operation is "sufficiently transparent to enable deployers to interpret a system's output and use it appropriately." The transparency requirement extends to documentation: deployers must receive information about the system's capabilities, limitations, and conditions under which it performs as intended.

Model cards are the natural vehicle for this transparency requirement. But a stale model card, one that hasn't been updated since the model was retrained on new data, doesn't provide accurate transparency. It provides historical transparency. The deployer reads one thing. The model does another.

NIST AI RMF (MAP and GOVERN Functions)

The NIST AI Risk Management Framework is becoming the de facto reference for AI governance programs in the US. It is voluntary, but its MAP function calls for documenting AI system context, including intended purposes and expected users, and its GOVERN function calls for defined roles and responsibilities. Model cards support both, but only if they're maintained within the governance program, not in a Git repo the compliance team has never seen.

The five conventions that make model cards audit-grade

1. Versioned and tied to model versions

Most model cards carry a version like "v1" or "Last updated: March 2025." This is insufficient for audit purposes. An evidence artifact needs a version that corresponds to a specific state of the system it documents.

Convention	Bad Practice	Audit-Grade Practice
Version identifier	"v1", "latest"	Model version hash + card revision (e.g., `model-abc123-card-r3`)
Trigger for update	"When someone remembers"	Every model retrain, fine-tune, or architecture change
Version history	None, or Git commit log	Explicit changelog in the card with date, author, and reason for change
Linkage	Filename convention	Typed relationship: card version X documents model version Y

When the model is retrained, the card gets a new revision. When the card is updated for reasons other than model changes (correcting documentation, adding context), it gets a new revision with a different trigger reason. The audit trail shows which card version was current at any point in time.
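The versioning convention can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the class names, field names, and the `model-<hash>-card-r<n>` identifier format (borrowed from the example in the table) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CardRevision:
    """One changelog entry: who changed the card, when, and why."""
    revision: int
    model_version: str   # hypothetical: a short hash from the model registry
    changed_on: date
    author: str
    reason: str          # e.g. "model retrain" or "documentation correction"

    @property
    def identifier(self) -> str:
        # Ties the card revision to the model version it documents.
        return f"model-{self.model_version}-card-r{self.revision}"


@dataclass
class ModelCardHistory:
    revisions: list[CardRevision] = field(default_factory=list)

    def add(self, model_version: str, changed_on: date,
            author: str, reason: str) -> CardRevision:
        rev = CardRevision(len(self.revisions) + 1, model_version,
                           changed_on, author, reason)
        self.revisions.append(rev)
        return rev

    def current_on(self, day: date) -> CardRevision | None:
        """Return the revision that was in force on the given day, if any."""
        in_force = [r for r in self.revisions if r.changed_on <= day]
        return max(in_force, key=lambda r: (r.changed_on, r.revision),
                   default=None)
```

With a history like this, "which card version was current on 1 December?" is a lookup with `current_on(...)` rather than an archaeology exercise in the commit log.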

2. Signed by the responsible person

A model card without a signature is a document. A model card with a signature from the designated responsible person under your AI management system is an accountability artifact.

Under the EU AI Act, a provider's quality management system must include an accountability framework that sets out the responsibilities of management and other staff (Article 17). Under ISO 42001 (AI Management System), organizations assign specific roles with defined authorities. The model card should carry the signature (digital, timestamped) of the person accountable for the AI system it documents.

This isn't bureaucratic overhead. It answers the question an auditor will ask: "Who reviewed and approved this documentation as accurate?" If the answer is "nobody, it was written by a data scientist during development," you have documentation without governance.
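A timestamped sign-off bound to the exact card content might look like the sketch below. Everything here is illustrative: the function names and record fields are assumptions, and HMAC with a shared key stands in for a real digital signature only because it ships with the Python standard library. A production setup would more likely use an asymmetric signature tied to the signer's identity.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def sign_card(card_text: str, signer: str, role: str,
              key: bytes, signed_at: datetime) -> dict:
    """Build an approval record bound to the exact card content."""
    record = {
        "card_sha256": hashlib.sha256(card_text.encode("utf-8")).hexdigest(),
        "signer": signer,
        "role": role,
        "signed_at": signed_at.isoformat(),
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    # HMAC is a stand-in for a real digital signature (assumption).
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_card(card_text: str, record: dict, key: bytes) -> bool:
    """True only if the card text and the approval record are both unaltered."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    digest = hashlib.sha256(card_text.encode("utf-8")).hexdigest()
    if unsigned.get("card_sha256") != digest:
        return False  # the card changed after it was approved
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))
```

The point of hashing the card into the record is that an edit made after approval invalidates the sign-off, which is what lets the record answer the auditor's question about who approved this specific text.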

3. Connected to a control owner in the governance platform

Here's where most organizations fail. The model card exists in one system (a Git repo, a wiki, a shared drive). The governance controls exist in another system (the GRC platform). There's no typed relationship between them.

The control says: "AI systems shall be documented in accordance with applicable transparency requirements." The evidence for that control should be the model card. But if the model card lives in a different system with no programmatic link, the compliance team has to manually verify: does this card exist? Is it current? Does it cover this system?

That manual verification breaks at scale. An organization with 15 models in production and 3 relevant frameworks has 45+ documentation requirements to track. Without a link from the control to the evidence artifact, freshness tracking is impossible.
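To make the scale concrete, here is a rough sketch of the 15 x 3 case. The model names, the framework labels, and the `evidence_links` mapping are hypothetical placeholders for whatever your model registry and GRC platform actually expose.

```python
from itertools import product

# Hypothetical inventory: 15 production models, 3 applicable frameworks.
models = [f"model-{i:02d}" for i in range(1, 16)]
frameworks = ["EU AI Act", "ISO 42001", "NIST AI RMF"]

# One documentation requirement per (model, framework) pair.
requirements = set(product(models, frameworks))

# Hypothetical control-to-evidence links recorded in the governance platform.
evidence_links = {
    ("model-01", "EU AI Act"): "model-abc123-card-r3",
}


def unlinked(requirements, evidence_links):
    """Requirements with no model card attached as evidence."""
    return sorted(req for req in requirements if req not in evidence_links)


print(len(requirements), "requirements,",
      len(unlinked(requirements, evidence_links)), "without linked evidence")
```

Once the link is data rather than tribal knowledge, the gap list is a set difference instead of a three-day search.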

4. Freshness-scored

A model card that hasn't been revised since the model was retrained 4 months ago is stale. It documents a previous version of the system. If the retraining changed the training data, the performance characteristics, or the system's behavior in edge cases, the card is actively misleading.

Freshness scoring makes staleness visible before the auditor finds it:

Freshness State	Definition	Governance Implication
Fresh	Card updated within 30 days of last model change	Full evidence credit
Aging	Card still not updated 31-90 days after model change	Warning: evidence degrading
Stale	Card still not updated more than 90 days after model change	Zero evidence credit; control gap
Unknown	No link between card revision and model version	Cannot assess; treated as stale

The "Unknown" state is the most common in practice. Organizations have model cards and they have model registries, but no link between card revision timestamps and model deployment timestamps. Without that link, nobody can tell whether the documentation is current.
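The four states reduce to a short function. This is a sketch under stated assumptions: the function and enum names are invented here, a card revised on or after the last model change counts as fresh, and day 90 is treated as still aging.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Freshness(Enum):
    FRESH = "fresh"
    AGING = "aging"
    STALE = "stale"
    UNKNOWN = "unknown"


def freshness(card_updated: Optional[date],
              model_changed: Optional[date],
              today: date) -> Freshness:
    """Classify a model card against the last change to the model it documents."""
    if card_updated is None or model_changed is None:
        # No link between card revision and model version.
        return Freshness.UNKNOWN
    if card_updated >= model_changed:
        # Assumption: a card revised since the last change documents the current model.
        return Freshness.FRESH
    days_outdated = (today - model_changed).days
    if days_outdated <= 30:
        return Freshness.FRESH   # still inside the 30-day update window
    if days_outdated <= 90:
        return Freshness.AGING
    return Freshness.STALE
```

Note that the function cannot return anything better than `UNKNOWN` without both timestamps, which is exactly the situation of an organization whose cards and registry are not linked.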

5. Stored in the Compliance Graph with lineage

The model card needs to exist as a node in your governance data model, not as a file in a folder. As a node, it has typed relationships:

Documents → specific AI system (by system ID and version)
Satisfies → specific controls (Article 11 documentation, Article 13 transparency, ISO 42001 clauses)
Signed by → responsible person (with timestamp and role)
Current for → model version (with version hash)
Supersedes → prior card version (with reason for change)

These relationships make the model card queryable within the governance system. "Show me all AI systems where the model card is stale" becomes a graph traversal, not a manual spreadsheet audit. "Show me all model cards that don't have a responsible person signature" becomes a query, not a review meeting.
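A toy in-memory version shows why typed relationships turn these questions into queries. This is not how any particular governance platform stores its graph; the class, the relation names (`documents`, `signed_by`), and the node attributes are assumptions made for illustration.

```python
from collections import defaultdict


class ComplianceGraph:
    """Tiny typed-edge graph: (source, relation) -> set of targets."""

    def __init__(self):
        self.edges = defaultdict(set)
        self.attrs = {}

    def add_node(self, node_id, **attrs):
        self.attrs[node_id] = attrs

    def relate(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def targets(self, source, relation):
        return self.edges.get((source, relation), set())

    def cards(self):
        return [n for n, a in self.attrs.items() if a.get("kind") == "model_card"]

    def unsigned_cards(self):
        """Model cards with no 'signed_by' relationship."""
        return sorted(c for c in self.cards() if not self.targets(c, "signed_by"))

    def systems_with_stale_cards(self):
        """AI systems documented by a card whose freshness is 'stale'."""
        stale = set()
        for card in self.cards():
            if self.attrs[card].get("freshness") == "stale":
                stale |= self.targets(card, "documents")
        return sorted(stale)


g = ComplianceGraph()
g.add_node("card-r3", kind="model_card", freshness="fresh")
g.add_node("card-r1", kind="model_card", freshness="stale")
g.relate("card-r3", "documents", "credit-scoring")
g.relate("card-r3", "signed_by", "j.doe")
g.relate("card-r1", "documents", "churn-model")

print(g.systems_with_stale_cards())  # systems whose documentation is stale
print(g.unsigned_cards())            # cards nobody has signed off
```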

How Kyudo does this

The AI Governance module in Kyudo treats model cards as first-class evidence artifacts within the Compliance Graph. Each card is a node with typed relationships to AI systems, controls, frameworks, and responsible persons.

Evidence Hub integration. Model cards can be ingested directly from Git repositories, internal wikis, or uploaded manually. On ingestion, the artifact is hashed (SHA-256), timestamped, and linked to the AI system record and the relevant controls. Freshness scoring begins immediately based on the model's last deployment or retraining event.

CMCAE assessment. The Continuous Multi-Framework Control Assessment Engine includes AI-specific controls drawn from the EU AI Act, ISO 42001, and the NIST AI RMF. Model card freshness is a factor in control maturity scoring. A control with a stale model card cannot score above Level 2 regardless of what other evidence exists.
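The ceiling rule is simple enough to state in code. This sketch illustrates the rule as described above and is not Kyudo's implementation; the function name is invented, and treating "unknown" like "stale" follows the freshness table rather than any documented product behavior.

```python
STALE_CARD_CAP = 2  # from the text: a stale card caps the control at Level 2


def capped_maturity(raw_level: int, card_freshness: str) -> int:
    """Apply the stale-card ceiling to a maturity level computed elsewhere."""
    # Assumption: "unknown" freshness is treated the same as "stale".
    if card_freshness in {"stale", "unknown"}:
        return min(raw_level, STALE_CARD_CAP)
    return raw_level
```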

STRM Engine mapping. The Set Theory Relationship Mapping engine maps model card requirements across frameworks. A single model card satisfies documentation requirements under Article 11, transparency requirements under Article 13, and MAP function requirements under NIST AI RMF simultaneously. You maintain one artifact; the Controls Hub tracks its coverage across all applicable frameworks.

Tensei Copilot alerting. When a model retraining event is detected (through integration with ML platform APIs or manual logging), Tensei flags the associated model card as requiring update. The responsible person receives a task. If the card isn't updated within the defined SLA, the control maturity score degrades and surfaces in the next risk review.
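The retraining-to-task flow can be approximated in a few lines. Again, this is a hypothetical sketch rather than the product's API: the class, the function, the `owners` lookup, and the 30-day default SLA (chosen to match the "Fresh" window) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CardUpdateTask:
    model_id: str
    assignee: str
    opened_on: date
    due_on: date
    completed_on: date | None = None

    def is_overdue(self, today: date) -> bool:
        """Open tasks past their due date should degrade the control score."""
        return self.completed_on is None and today > self.due_on


def on_retraining_event(model_id: str, retrained_on: date,
                        owners: dict[str, str],
                        sla_days: int = 30) -> CardUpdateTask:
    """Open a card-update task for the responsible person of the retrained model."""
    return CardUpdateTask(
        model_id=model_id,
        assignee=owners[model_id],
        opened_on=retrained_on,
        due_on=retrained_on + timedelta(days=sla_days),
    )
```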

The counter-argument: "Our models aren't high-risk under the AI Act"

Maybe. The EU AI Act's high-risk classification covers AI systems used in credit scoring, employment, education, law enforcement, and critical infrastructure, among others. If none of your models touch these domains, the mandatory documentation requirements of Articles 11 and 13 may not apply directly.

But three responses:

First, "not legally required" and "not needed" are different. Model documentation has operational value regardless of classification. When a model drifts, when a team member leaves, when a client asks questions, current documentation saves time and reduces risk.

Second, the classification may change. The EU AI Act empowers the European Commission to amend the Annex III list of high-risk use cases through delegated acts. Organizations that build documentation discipline early avoid the scramble when requirements expand.

Third, voluntary frameworks are becoming procurement requirements. ISO 42001 certification is appearing in RFPs. "Our models aren't high-risk" doesn't answer "can you document your AI systems for our vendor assessment?"

Monday morning checklist

1. Inventory your model cards. How many ML models are in production? How many have a current model card? "Current" means updated within 30 days of the last model change. If those two numbers don't match, you have documentation gaps that are also evidence gaps.

2. Check the version linkage. Pick a model card at random. Can you determine which model version it documents? Can you verify that no retraining has occurred since the card was last updated? If not, freshness is unknown, which is functionally equivalent to stale.

3. Find the responsible person. For each model card, identify who signed off on its accuracy. If nobody did, you have documentation without accountability. Under ISO 42001 and the EU AI Act, someone must be responsible.

4. Trace the control linkage. Open your GRC platform. Find the control that requires AI system documentation. What evidence is linked to that control? If the answer is "nothing, because the model cards are in a Git repo," you have a compliance gap that only becomes visible during audit.

5. Assess one card for audit-readiness. Take your most important model's card. Apply the five conventions: versioned and tied to model version? Signed? Connected to a control owner? Freshness-scored? Stored with lineage to the AI system inventory? Score it honestly. Most organizations pass one or two out of five.
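For step 5, a scorecard as plain as the following is enough to start. The field names simply restate the five conventions; the class itself is an illustrative assumption, not a standard instrument.

```python
from dataclasses import dataclass, fields


@dataclass
class AuditReadiness:
    """One boolean per convention, answered for a single model card."""
    versioned_to_model: bool
    signed: bool
    linked_to_control_owner: bool
    freshness_scored: bool
    stored_with_lineage: bool

    def score(self) -> int:
        """Number of conventions met, out of five."""
        return sum(getattr(self, f.name) for f in fields(self))

    def gaps(self) -> list[str]:
        """Names of the conventions the card does not yet meet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


card = AuditReadiness(
    versioned_to_model=True,
    signed=False,
    linked_to_control_owner=False,
    freshness_scored=True,
    stored_with_lineage=False,
)
print(f"{card.score()}/5 conventions met; gaps: {card.gaps()}")
```

The gap list doubles as the remediation backlog for that card.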

Kyudo's AI Governance module stores model cards as evidence artifacts in the Compliance Graph, with typed lineage to AI systems, version tracking, responsible person signatures, and automated freshness scoring. Documentation becomes governance, not just developer notes.

Try the AI risk assessment to see how your current AI documentation maps against EU AI Act and ISO 42001 requirements.