Every high-risk AI system that uses trained models must be built on training, validation, and testing datasets that meet strict quality and governance standards. Article 10 sets out eight specific practices — and a sensitive-data rule for bias detection.
What's at stake
Data governance is a Section 2 requirement for high-risk AI systems, and providers must ensure compliance with it under Art 16(a). Non-compliance breaches the provider obligations in Article 16 and is subject to fines under Art 99(4)(a): up to €15 million or 3% of total worldwide annual turnover, whichever is higher (for SMEs, whichever is lower). Poor dataset governance is also the root cause of most biased-output enforcement risk.
Need to audit your dataset governance against Article 10?
Regumatrix generates a detailed Article 10 gap analysis for your specific AI system — covering all eight governance practices, your bias detection process, and your special-category data handling — and delivers it as a cited compliance report.
Article 10 applies to providers of high-risk AI systems — whether they are building from scratch, fine-tuning a foundation model, or integrating a third-party model into a high-risk use case. It must be satisfied before the system is placed on the market or put into service, and kept up to date if the dataset changes materially.
Art 10(1) — training-based systems
High-risk AI systems that make use of techniques involving the training of AI models must be developed on the basis of training, validation, and testing datasets that meet the quality criteria in paragraphs 2–5 (and, if adopted, proposed Art 4a).
Art 10(6) — non-ML / rule-based systems
For high-risk AI systems that do not use training techniques (e.g. expert systems, rule-based logic), paragraphs 2–5 of Article 10 apply only to the testing datasets. Training and validation dataset obligations are not triggered.
Training, validation, and testing datasets must be subject to governance and management practices appropriate for the intended purpose. Article 10(2) lists eight specific areas those practices must cover:
(a) Design choices
Document the relevant design choices that shaped dataset selection and construction.
(b) Collection origin
Record data collection processes, the origin of data, and — for personal data — the original purpose of collection.
(c) Preparation operations
Document all data-preparation steps: annotation, labelling, cleaning, updating, enrichment, and aggregation.
(d) Assumptions
State the assumptions made, particularly about what the data is supposed to measure and represent.
(e) Availability & suitability
Assess the availability, quantity, and suitability of the datasets needed for the intended purpose.
(f) Bias examination
Examine for possible biases affecting health/safety, fundamental rights, or prohibited discrimination — especially where data outputs influence future inputs (feedback loops).
(g) Bias mitigation
Implement appropriate measures to detect, prevent, and mitigate biases identified under (f).
(h) Data gaps
Identify relevant data gaps or shortcomings that could prevent regulatory compliance, and explain how they will be addressed.
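The Act prescribes what these eight practices must cover, not any format for recording them. As one illustration only, the eight areas could be captured in a structured governance record per dataset; every field name below is hypothetical, not a schema from the Act or any harmonised standard:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class DatasetGovernanceRecord:
    """Illustrative record covering the eight Art 10(2) areas (a)-(h).

    Field names are hypothetical -- the AI Act mandates the substance
    of the practices, not a particular data structure.
    """
    design_choices: str                # (a) design choices shaping the dataset
    collection_origin: str             # (b) collection process, data origin,
                                       #     original purpose for personal data
    preparation_operations: list[str] = field(default_factory=list)  # (c)
    assumptions: str = ""              # (d) what the data measures/represents
    suitability_assessment: str = ""   # (e) availability, quantity, suitability
    bias_examination: str = ""         # (f) biases found, incl. feedback loops
    bias_mitigation: str = ""          # (g) measures for biases found under (f)
    data_gaps: str = ""                # (h) gaps/shortcomings and remediation

# Hypothetical example for a credit-scoring training set
record = DatasetGovernanceRecord(
    design_choices="Stratified sample of 2019-2024 loan applications",
    collection_origin="Internal CRM export; personal data originally "
                      "collected for contract performance",
    preparation_operations=["deduplication", "labelling", "anonymisation"],
)
print(sorted(asdict(record)))  # eight documented areas per dataset
```

A record like this maps one-to-one onto the Annex IV technical documentation, which must describe the same dataset specifications.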
Datasets must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose (Art 10(3)).
The quality characteristics may be met at the level of individual datasets or at the level of a combination of datasets.
Datasets must account — to the extent required by the intended purpose — for the characteristics specific to the setting in which the system will be used:
Geographical
A system deployed in rural Eastern Europe needs data from that context, not only urban Western European data.
Contextual
A clinical decision-support tool must reflect the healthcare context (primary care vs. specialist, etc.).
Behavioural
Patterns of user behaviour — how people interact with the system in practice — must be represented.
Functional
The operational function of the system (screening vs. final decision vs. advisory) shapes the data requirements.
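One way to operationalise the Art 10(4) check is to compare the dataset's composition against the expected deployment population and flag underrepresented strata. A minimal sketch, assuming purely illustrative geographic strata and a tolerance threshold the Act does not set:

```python
from collections import Counter

def coverage_gaps(samples, deployment_share, min_ratio=0.5):
    """Flag strata whose share in the dataset falls below
    min_ratio * their expected share in the deployment setting.

    samples:          iterable of stratum labels, one per record
    deployment_share: {stratum: expected share in deployment}
    min_ratio:        hypothetical tolerance -- Art 10(4) sets no number
    """
    counts = Counter(samples)
    total = sum(counts.values())
    gaps = {}
    for stratum, expected in deployment_share.items():
        observed = counts.get(stratum, 0) / total
        if observed < min_ratio * expected:
            gaps[stratum] = (observed, expected)
    return gaps

# A system deployed across urban Western and rural Eastern regions,
# trained almost entirely on urban Western data
data = ["urban_west"] * 900 + ["rural_east"] * 100
expected = {"urban_west": 0.6, "rural_east": 0.4}
print(coverage_gaps(data, expected))
# rural_east observed at 10% against an expected 40% -> flagged
```

The same function applies unchanged to contextual, behavioural, or functional strata; only the labels and expected shares differ.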
Under current Article 10(5), providers of high-risk AI systems may exceptionally process special categories of personal data (health, biometric, racial or ethnic origin, etc.) for bias detection and correction. All six of the following conditions must be met simultaneously:
(a) Necessity
Bias detection and correction cannot be effectively achieved by processing other data, including synthetic or anonymised data.
(b) Technical safeguards
The special-category data is subject to technical limitations on re-use and to state-of-the-art security and privacy-preserving measures, including pseudonymisation.
(c) Access control
Measures ensure the data is secured and protected, with strict, documented access controls so that only authorised persons under confidentiality obligations can access it.
(d) No onward transfer
The data is not transmitted, transferred, or otherwise accessed by other parties.
(e) Deletion
The data is deleted once the bias has been corrected or the end of its retention period is reached, whichever comes first.
(f) Records
The GDPR records of processing activities document why processing special-category data was strictly necessary and why the objective could not be achieved with other data.
What changes: Art 1 point 5 of the proposal inserts a new standalone Art 4a into the AI Act, and Art 1 point 7 amends Art 10 accordingly.
Why it matters: Under current Art 10(5), only providers of high-risk AI systems have the special-category data right for bias detection. New Art 4a extends this right to providers and deployers of any AI system or model, including GPAI models, limited-risk applications, and non-high-risk systems. The six substantive conditions are preserved, adapted to the broader scope.
Art 10(2)(f) and (g) — the bias examination and bias mitigation obligations — remain in Article 10 and are unaffected by this change. Providers of high-risk systems must still document and address biases in their datasets.
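Article 10 does not prescribe a metric for the (f) examination. One common starting point, shown here with hypothetical screening decisions, is comparing positive-outcome rates across groups (the demographic parity difference):

```python
def selection_rates(outcomes, groups):
    """Positive-outcome rate per group.

    outcomes: 0/1 decisions; groups: parallel list of group labels.
    """
    totals, positives = {}, {}
    for y, g in zip(outcomes, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + y
    return {g: positives[g] / totals[g] for g in totals}

def parity_difference(outcomes, groups):
    """Max minus min selection rate across groups; 0.0 means parity."""
    rates = selection_rates(outcomes, groups).values()
    return max(rates) - min(rates)

# Hypothetical screening decisions for two groups A and B
y = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
g = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(parity_difference(y, g))  # A selected at 4/5, B at 1/5
```

A large difference does not by itself establish prohibited discrimination, but under Art 10(2)(f) and (g) it is a finding that must be documented and, where appropriate, mitigated.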
What changes: COM(2025) 837 inserts a new GDPR Art 9(2)(k) which would permit processing special categories of personal data in the context of the development and operation of an AI system or AI model. This covers health data, biometric data, racial or ethnic origin data, and the other Art 9(1) GDPR categories.
The safeguards (new Art 9(5)): Controllers must:
Critical: Art 9(2)(k) does not replace Article 10 obligations
If enacted, Art 9(2)(k) provides a GDPR lawful basis for processing sensitive training data — it removes the GDPR barrier. But it does not touch the AI Act. A provider relying on Art 9(2)(k) must still meet all of Article 10's data quality and governance requirements, including the bias examination (10(2)(f)) and bias mitigation (10(2)(g)) obligations. GDPR compliance and AI Act compliance are parallel, not interchangeable.
Common Article 10 compliance failures
Regumatrix analyses your data governance process against Article 10 and produces a written gap analysis — not a checklist, but a cited legal assessment. Try it on your system.
Risk Management System
Art 9 — iterative risk identification and mitigation for high-risk AI. Data quality findings feed directly into the risk management process.
Technical Documentation
Annex IV — documentation requirements include dataset specifications, data governance records, and bias testing results.
AI Provider Obligations
Art 16 — the full checklist of provider obligations, including Art 16(a): compliance with the Section 2 requirements, which include Art 10.
Quality Management System
Art 17 — the QMS must include data management systems and procedures for all data operations feeding high-risk AI development.
Post-Market Monitoring
Art 72 — ongoing data collection from deployed systems; data quality findings from deployment can trigger dataset updates.
Fundamental Rights Impact Assessment
Art 27 — deployers of certain high-risk systems must assess fundamental rights impacts; bias in training data is a core risk to assess.
Is your dataset documentation Article 10-ready?
Regumatrix maps your training and testing data practices to every Article 10 requirement and flags the gaps — with citations, not vague advice.
Get your Article 10 analysis