Regumatrix — AI compliance powered by Regulation (EU) 2024/1689

This tool is informational only and does not constitute legal advice.

Grounded in Regulation (EU) 2024/1689 · verified 4 Apr 2026
GDPR + AI Act Intersection · 837 MAJOR changes proposed · Critical for AI builders

Training AI on Personal Data: GDPR Rules & COM(2025) 837 Update

Two questions AI builders ask constantly: "Can we use our customer data to train our AI?" and "What about health or biometric data?" Today, the AI Act's Article 10(5) gives you a narrow special-category exception — for bias detection only. COM(2025) 837 proposes a much broader new GDPR lawful basis. Here is what each one means, what it does not cover, and the loophole that does not exist.

Why this matters right now

  • ▸GDPR fines for unlawful processing of special category data: up to €20 million / 4% of global turnover under GDPR Article 83(5)
  • ▸AI Act fines for non-compliant data governance under Art 10: up to €15 million / 3% of global turnover under Art 99(4)
  • ▸The two regimes apply simultaneously — satisfying one does not satisfy the other

Does your AI system process personal data during training or operation?

Regumatrix analyses your system against Article 10 and the full AI Act — and identifies which data governance obligations apply, your risk tier, and your fine exposure under Article 99 in about 30 seconds.


Part 1: Current Law — AI Act Article 10

Article 10 sets the data governance baseline for high-risk AI systems that use training techniques. Your training, validation and testing datasets must meet quality criteria, be subject to data governance practices, and be relevant, sufficiently representative, and — to the best extent possible — free of errors.

The Article 10(2) governance practices

For all training/validation/testing data, your governance must cover:

  • ▸Relevant design choices and data collection processes (including original purpose of personal data collection)
  • ▸Data preparation operations: annotation, labelling, cleaning, updating, enrichment, aggregation
  • ▸Assumptions made about what the data measures and represents
  • ▸Assessment of availability, quantity and suitability of datasets
  • ▸Examination for possible biases affecting health, safety, fundamental rights, or prohibited discrimination
  • ▸Measures to detect, prevent and mitigate biases identified
  • ▸Identification of data gaps or shortcomings and how they will be addressed

Article 10(5) — the narrow special-category exception

Current law provides one specific situation where providers of high-risk AI systems may process special categories of personal data (health data, biometric data, racial/ethnic origin, etc.) without needing a separate GDPR Art 9 basis: bias detection and correction.

All six of the following conditions must be met simultaneously:

  1. Bias detection cannot be effectively fulfilled by processing other data (including synthetic or anonymised data)
  2. Technical limitations on re-use in place, plus state-of-the-art security and privacy measures including pseudonymisation
  3. Data is secured, protected, with strict access controls — documented, with only authorised persons having access under confidentiality obligations
  4. The data must not be transmitted, transferred, or accessed by third parties
  5. Data must be deleted once bias has been corrected or the retention period has expired, whichever comes first
  6. Records of processing must document why the processing was strictly necessary and why the objective could not be achieved by other means
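The cumulative nature of these conditions can be sketched as a simple gate. This is a minimal illustration only: the class and field names are ours, not the Regulation's, and each boolean stands in for a documented legal assessment.

```python
from dataclasses import dataclass

@dataclass
class Art10_5Assessment:
    # Hypothetical flags mirroring the six cumulative conditions of
    # AI Act Article 10(5); field names are illustrative, not statutory.
    other_data_insufficient: bool        # (1) synthetic/anonymised data won't work
    reuse_limited_and_pseudonymised: bool  # (2) technical re-use limits + security
    access_controlled_and_documented: bool # (3) strict, documented access controls
    no_third_party_access: bool          # (4) no transmission/transfer to third parties
    deletion_scheduled: bool             # (5) deleted after bias correction/retention
    necessity_recorded: bool             # (6) records explain strict necessity

    def exception_available(self) -> bool:
        # All six must hold simultaneously; failing any one
        # removes the special-category exception entirely.
        return all(vars(self).values())

assessment = Art10_5Assessment(True, True, True, True, True, False)
print(assessment.exception_available())  # False: condition 6 not documented
```

The point the sketch makes is the `all(...)`: there is no partial credit, so five out of six conditions is legally the same as zero.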

Important: this is in addition to, not instead of, the GDPR requirements. Both must be satisfied independently.

Part 2: What COM(2025) 837 Proposes

COM(2025) 837 makes three changes that directly affect AI training data. All are proposals, not yet enacted law.

PROPOSAL — not yet enacted law · COM(2025) 837 — Digital Omnibus II

Change 1: New GDPR Art 9(2)(k) — broad AI training lawful basis

837 inserts a new GDPR Article 9(2)(k): processing of special categories of personal data is permitted "in the context of the development and operation of an AI system as defined in the AI Act or an AI model, subject to the conditions referred to in paragraph 5."

This is far broader than the current Art 10(5) exception — it covers general AI development and operation, not only bias detection. If adopted, this becomes a standalone lawful basis for processing health data, biometric data, racial/ethnic origin, and other special categories during AI training.

New Art 9(5) conditions: You must implement appropriate organisational and technical measures to avoid collecting special categories. Where you identify such data in training/testing/validation datasets despite those measures, you must remove it. If removal requires disproportionate effort, you must instead effectively protect the data from being used in outputs and from disclosure to third parties.
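The proposed Art 9(5) fallback logic is a three-step decision flow, which can be sketched as follows. The function and its labels are our illustration of the proposal's structure, not wording from the text.

```python
def handle_special_category_data(found: bool, removal_disproportionate: bool) -> str:
    """Illustrative decision flow for the proposed GDPR Art 9(5) conditions
    under COM(2025) 837; the return labels are ours, not the proposal's."""
    if not found:
        return "continue"   # preventive measures held: nothing further to act on
    if not removal_disproportionate:
        return "remove"     # default duty: delete the identified special-category data
    # Fallback where removal would require disproportionate effort:
    return "protect"        # shield the data from outputs and third-party disclosure
```

Note the ordering: "protect" is not an alternative you may choose freely; it is reached only after removal has been assessed as disproportionate.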

Change 2: GDPR Art 5(1)(b) purpose limitation clarification

837 clarifies that further processing for archiving in the public interest, scientific or historical research, or statistical purposes is compatible with the initial collection purpose — independently of the conditions in Art 6(4). This removes a common compliance obstacle for AI research and model development that builds on existing datasets.

Change 3: Contextual personal data (GDPR Art 4(1))

837 amends the definition of personal data. Data is not personal data for an entity that cannot identify the natural person to whom it relates — taking into account means reasonably likely to be used by that specific entity. A potential subsequent recipient's ability to re-identify does not make the data personal for the original controller.

For AI training: if a dataset arrives genuinely anonymised from your perspective (you have no reasonable means to re-identify), it is not personal data for you under GDPR — even if the original data holder could re-identify it. This significantly reduces compliance complexity for organisations using third-party AI training datasets.

837 is a legislative proposal — not in force. These provisions apply only if adopted by the European Parliament and Council.

The loophole that does not exist

If 837 is enacted, some will conclude: "We now have a GDPR lawful basis to process biometric data — so we can build systems that output biometric categorisations." This conclusion is wrong.

The tension: Article 5(1)(g) of the EU AI Act prohibits AI systems that categorise natural persons based on biometric data into categories revealing or inferring racial/ethnic origin, political opinions, trade union membership, religious beliefs, sex life or sexual orientation. This is an absolute prohibition with a €35M/7% penalty.

The resolution: GDPR Art 9(2)(k) governs whether you can process biometric data as input to your system during training. AI Act Art 5(1)(g) governs what your system produces as output. Both laws apply simultaneously and independently:

  • You may have a GDPR lawful basis to process facial images during training
  • The AI Act still prohibits the system from outputting racial/ethnic category predictions from that processing
  • Having a GDPR basis does not exempt you from Art 5(1)(g)

Practical checklist: current law

Identify all personal data in your training sets

Map every dataset used for training, validation and testing. For each, identify whether it contains personal data and whether any falls into GDPR special categories (Art 9(1): health, biometric, racial/ethnic origin, political opinions, etc.).
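The mapping step above can be sketched as a simple record per dataset. This is an illustrative structure under our own naming, with an abbreviated (not exhaustive) rendering of the Art 9(1) categories.

```python
from dataclasses import dataclass, field

# Abbreviated GDPR Art 9(1) special categories, in our own shorthand labels.
SPECIAL_CATEGORIES = {
    "health", "biometric", "racial_ethnic_origin", "political_opinions",
    "religious_beliefs", "trade_union", "genetic", "sex_life_orientation",
}

@dataclass
class TrainingDataset:
    name: str
    role: str                        # "training" | "validation" | "testing"
    contains_personal_data: bool
    categories: set = field(default_factory=set)

    def special_categories(self) -> set:
        # Flags which data categories trigger GDPR Art 9 on top of the
        # Art 6 lawful basis required for all personal data.
        return self.categories & SPECIAL_CATEGORIES

faces = TrainingDataset("face_corpus_v2", "training", True, {"biometric", "age"})
print(faces.special_categories())  # {'biometric'} — age alone is not special category
```

A map like this makes the next two checklist steps mechanical: every dataset with personal data needs an Art 6 basis, and every non-empty `special_categories()` result needs Art 10(5) or a separate Art 9(2) basis.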

Establish a GDPR Art 6 lawful basis for all personal data

Special category data or not, you need an Art 6 basis for all personal data in training datasets. The most common for commercial AI: legitimate interest (Art 6(1)(f)) or contract (Art 6(1)(b)).

Special category data? Apply Art 10(5) or a separate GDPR Art 9(2) basis

If your training data contains health, biometric or other special category data and it is for bias detection, apply Art 10(5) with all six conditions documented. If for other purposes, you need an existing GDPR Art 9(2) basis (e.g., explicit consent, public interest).

Document the AI Act Art 10(2) governance practices for your datasets

For every training/validation/testing dataset: document design choices, data origin, preparation operations, assumptions, availability assessment, bias examination, bias mitigation measures, and data gap identification.

Check Art 5(1)(g) — what does your system output?

If your system processes biometric data as input, confirm it does not output biometric categorisations that fall under the Art 5(1)(g) prohibition. If it does, you are operating a prohibited AI system regardless of GDPR compliance.

Common grey-area signals — check your situation

  • ⚠Your training data contains age, health or demographic information and you have not mapped whether it constitutes special category data under GDPR Art 9
  • ⚠You are relying on Art 10(5) for bias detection but have not documented all six required conditions
  • ⚠You are planning that once 837 is enacted, Art 9(2)(k) will cover your existing processing — but have not audited whether the Art 9(5) conditions can be met
  • ⚠Your AI model uses facial images for training and you have not checked whether its outputs constitute biometric categorisations under Art 5(1)(g)
  • ⚠You use anonymised datasets from a third party and have not analysed whether they constitute personal data for your organisation under the contextual data clarification

Frequently Asked Questions

Can I currently train a high-risk AI system on health, biometric or other special category data under the AI Act?

Yes, but only under strict conditions and only for bias detection and correction. Article 10(5) of the AI Act allows providers of high-risk AI systems to process special categories of personal data for bias detection — but only where six cumulative conditions are all met: (a) bias detection cannot be fulfilled by other data such as synthetic or anonymised data; (b) technical limitations on re-use are in place, plus state-of-the-art security and pseudonymisation; (c) strict access controls and documentation; (d) the data must not be transmitted or accessed by third parties; (e) the data must be deleted once bias has been corrected or the retention period expires; (f) records of processing must document why this was strictly necessary. All of a–f must be satisfied simultaneously.

What does COM(2025) 837 change about training AI on personal data?

Significantly. If adopted, 837 inserts new GDPR Article 9(2)(k), which creates a new lawful basis for processing special categories of personal data in the context of the development and operation of an AI system or AI model. This is broader than the current Article 10(5) — it covers general development and operation, not just bias detection. The conditions are set out in new Article 9(5): implement measures to avoid collecting special categories; if found, remove them; if removal is disproportionate, protect them from being used in outputs and from disclosure to third parties.

Does the new GDPR Art 9(2)(k) create a loophole in the AI Act's biometric output prohibitions?

No. This is a critical distinction. 837 Art 9(2)(k) is a GDPR lawful basis — it governs whether you are allowed to process special category personal data as input data during AI development and operation. Article 5(1)(g) of the EU AI Act is a separate prohibition — it bans outputting biometric categorisations regardless of your GDPR basis. The two laws operate independently: GDPR controls what you can process; the AI Act controls what your system can output. Having a valid GDPR lawful basis does not exempt you from the AI Act prohibition on what your system produces.

What does the 837 contextual personal data change mean for AI training datasets?

837 amends GDPR Article 4(1) to clarify that data is not personal data for an entity that cannot identify the natural person — even if another entity could. A potential subsequent recipient's ability to re-identify does not make the data personal for the original controller. For AI training, this means: if you receive a dataset that is anonymised from your perspective (you have no means to re-identify), it is not personal data under GDPR for you — even if the original collector or a third party could re-identify. This reduces compliance burden for organisations using genuinely anonymised training datasets.

Does the GDPR Art 9(2)(k) proposal remove the requirement for a standard Article 6 lawful basis?

No. Article 9(2)(k) is a condition that allows processing special category data — it lifts only the Article 9(1) prohibition. A standard Article 6 lawful basis (legitimate interest, contract, consent, legal obligation, vital interests, or public task) is still required for the underlying personal data processing. Both must be satisfied independently: Article 6 for all personal data, and Article 9(2)(k) for special category data specifically.

Related Compliance Guides

Data Governance Requirements (Article 10)

Full Article 10 obligations: quality criteria, governance practices, the bias framework.

COM 837 Overview — All GDPR Changes

Complete plain-English summary of all COM(2025) 837 changes to GDPR and data law.

EU AI Act vs GDPR: Key Differences

How the two regimes interact, where they overlap, and where they diverge.

Art 5 Prohibited AI Practices

Art 5(1)(g) prohibition on biometric categorisation outputs — and the €35M/7% penalty.

Biometric AI Systems

The high-risk classification and prohibition rules for AI using biometric data.

Automated Decision-Making: GDPR + AI Act

837's Art 22 clarification alongside AI Act Art 14 human oversight obligations.

Know exactly which data rules apply to your AI system

Regumatrix analyses your AI system against Article 10, the full AI Act obligation chain, and flags any GDPR intersections — returning your risk tier, the exact obligations that apply, your fine exposure under Article 99, and an 8-section cited compliance report. About 30 seconds. No credit card required.
