Two questions AI builders ask constantly: "Can we use our customer data to train our AI?" and "What about health or biometric data?" Today, the AI Act's Article 10(5) gives you a narrow special-category exception — for bias detection only. COM(2025) 837 proposes a much broader new GDPR lawful basis. Here is what each one means, what it does not cover, and the loophole that does not exist.
Does your AI system process personal data during training or operation?
Regumatrix analyses your system against Article 10 and the full AI Act — and identifies which data governance obligations apply, your risk tier, and your fine exposure under Article 99 in about 30 seconds.
Check my AI system — 3 free analyses

Article 10 sets the data governance baseline for high-risk AI systems developed by training models with data. Your training, validation and testing datasets must meet quality criteria, be subject to appropriate data governance practices, and be relevant, sufficiently representative and, to the best extent possible, free of errors and complete in view of the intended purpose.
For all training, validation and testing data, your governance practices must cover:
(a) the relevant design choices;
(b) data collection processes and the origin of the data (and, for personal data, its original purpose);
(c) data preparation operations such as annotation, labelling, cleaning, enrichment and aggregation;
(d) the formulation of assumptions about what the data measures and represents;
(e) an assessment of the availability, quantity and suitability of the datasets;
(f) examination for possible biases;
(g) measures to detect, prevent and mitigate those biases; and
(h) identification of data gaps or shortcomings and how they can be addressed.
Current law provides one specific situation where providers of high-risk AI systems may process special categories of personal data (health data, biometric data, racial/ethnic origin, etc.) without needing a separate GDPR Art 9 basis: bias detection and correction.
All six of the following conditions must be met simultaneously:
(a) bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data;
(b) the special category data is subject to technical limitations on re-use and to state-of-the-art security and privacy-preserving measures, including pseudonymisation;
(c) the data is secured, protected and subject to strict access controls, with access documented;
(d) the data is not transmitted, transferred or otherwise accessed by other parties;
(e) the data is deleted once the bias has been corrected or the retention period has expired, whichever comes first; and
(f) the records of processing activities document why processing special category data was strictly necessary.
Important: this is in addition to, not instead of, the GDPR requirements. Both must be satisfied independently.
COM(2025) 837 makes three changes that directly affect AI training data. All are proposals, not yet enacted law.
837 inserts a new GDPR Article 9(2)(k): processing of special categories of personal data is permitted "in the context of the development and operation of an AI system as defined in the AI Act or an AI model, subject to the conditions referred to in paragraph 5."
This is far broader than the current Art 10(5) exception — it covers general AI development and operation, not only bias detection. If adopted, this becomes a standalone lawful basis for processing health data, biometric data, racial/ethnic origin, and other special categories during AI training.
New Art 9(5) conditions: You must implement appropriate organisational and technical measures to avoid collecting special categories. Where you identify such data in training/testing/validation datasets despite those measures, you must remove it. If removal requires disproportionate effort, you must instead effectively protect the data from being used in outputs and from disclosure to third parties.
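In engineering terms, the remove-or-protect step is a screening gate in the data pipeline. Below is a minimal Python sketch assuming structured records with known field names; the SPECIAL_CATEGORY_FIELDS set, the field names and the masking approach are illustrative assumptions, not anything the proposal prescribes.

```python
# A sketch of the proposed Art 9(5) sequence: avoid collecting special
# category data, remove it where identified, and, where removal would take
# disproportionate effort, keep it but shield it from outputs and disclosure.
from dataclasses import dataclass, field

# Illustrative assumption: these record fields hold Art 9(1) special categories.
SPECIAL_CATEGORY_FIELDS = {"health_status", "ethnicity", "biometric_template"}


@dataclass
class ScreeningResult:
    cleaned_record: dict
    removed_fields: list[str] = field(default_factory=list)
    protected_fields: list[str] = field(default_factory=list)  # masked, not removed


def screen_record(record: dict, removal_feasible: bool = True) -> ScreeningResult:
    """Apply the remove-or-protect step to one structured training record."""
    result = ScreeningResult(cleaned_record={})
    for key, value in record.items():
        if key not in SPECIAL_CATEGORY_FIELDS:
            result.cleaned_record[key] = value
        elif removal_feasible:
            result.removed_fields.append(key)        # identified: remove
        else:
            result.cleaned_record[key] = "<masked>"  # disproportionate effort:
            result.protected_fields.append(key)      # protect from outputs/disclosure
    return result


print(screen_record({"age": 41, "ethnicity": "example", "purchases": 7}))
```

Note that in practice the hard part is detection: special categories buried in free text or images will not announce themselves as named fields.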
837 clarifies that further processing for archiving in the public interest, scientific or historical research, or statistical purposes is compatible with the initial collection purpose — independently of the conditions in Art 6(4). This removes a common compliance obstacle for AI research and model development that builds on existing datasets.
837 amends the definition of personal data. Data is not personal data for an entity that cannot identify the natural person to whom it relates — taking into account means reasonably likely to be used by that specific entity. A potential subsequent recipient's ability to re-identify does not make the data personal for the original controller.
For AI training: if a dataset arrives genuinely anonymised from your perspective (you have no reasonable means to re-identify), it is not personal data for you under GDPR — even if the original data holder could re-identify it. This significantly reduces compliance complexity for organisations using third-party AI training datasets.
837 is a legislative proposal — not in force. These provisions apply only if adopted by the European Parliament and Council.
If 837 is enacted, some will conclude: "We now have a GDPR lawful basis to process biometric data, so we can build systems that output biometric categorisations." This conclusion is wrong.
The tension: Article 5(1)(g) of the EU AI Act prohibits AI systems that categorise natural persons based on biometric data to deduce or infer their racial or ethnic origin, political opinions, trade union membership, religious or philosophical beliefs, sex life or sexual orientation. This is an absolute prohibition, carrying a penalty of up to €35M or 7% of worldwide annual turnover.
The resolution: GDPR Art 9(2)(k) governs whether you can process biometric data as input to your system during training. AI Act Art 5(1)(g) governs what your system produces as output. Both laws apply simultaneously and independently: a valid Art 9(2)(k) basis lets you lawfully train on biometric data, but it never permits your system to emit the prohibited categorisations; conversely, a compliant output does not spare you from needing a GDPR basis for the input data.
Identify all personal data in your training sets
Map every dataset used for training, validation and testing. For each, identify whether it contains personal data and whether any falls into GDPR special categories (Art 9(1): health, biometric, racial/ethnic origin, political opinions, etc.).
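As a sketch of what this mapping can look like in practice, here is a hypothetical inventory record in Python. The field names and example values are assumptions for illustration; the art6_basis field anticipates the next step.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    split: str                          # "training" | "validation" | "testing"
    source: str                         # where the data came from
    contains_personal_data: bool
    special_categories: list[str] = field(default_factory=list)  # Art 9(1) types
    art6_basis: str | None = None       # documented at the next step


inventory = [
    DatasetEntry(
        name="support_tickets_2024",
        split="training",
        source="internal CRM export",
        contains_personal_data=True,
        art6_basis="legitimate interest (Art 6(1)(f))",
    ),
]

# Entries containing special category data need an Art 9 route (see below).
needs_art9_route = [d.name for d in inventory if d.special_categories]
```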
Establish a GDPR Art 6 lawful basis for all personal data
Special category data or not, you need an Art 6 basis for all personal data in training datasets. The most common for commercial AI: legitimate interest (Art 6(1)(f)) or contract (Art 6(1)(b)).
Special category data? Apply Art 10(5) or a separate GDPR Art 9(2) basis
If your training data contains health, biometric or other special category data and it is for bias detection, apply Art 10(5) with all six conditions documented. If for other purposes, you need an existing GDPR Art 9(2) basis (e.g., explicit consent, public interest).
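Because the six conditions are cumulative, it can help to record them as an explicit, machine-checkable checklist per dataset. A minimal sketch, with illustrative shorthand names for conditions (a) to (f):

```python
from dataclasses import dataclass, fields


@dataclass
class Art10_5Checklist:
    no_alternative_data: bool                # (a) synthetic/anonymised data insufficient
    reuse_limits_and_pseudonymisation: bool  # (b)
    access_controls_documented: bool         # (c)
    no_third_party_access: bool              # (d)
    deletion_scheduled: bool                 # (e)
    necessity_recorded: bool                 # (f)

    def all_met(self) -> bool:
        # Cumulative: a single unmet condition defeats the exception.
        return all(getattr(self, f.name) for f in fields(self))
```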
Document the AI Act Art 10(2) governance practices for your datasets
For every training/validation/testing dataset: document design choices, data origin, preparation operations, assumptions, availability assessment, bias examination, bias mitigation measures, and data gap identification.
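One lightweight way to capture this is a per-dataset "card" whose keys mirror the Art 10(2) items. The placeholder values below stand in for your actual documentation:

```python
# Hypothetical dataset card; one per training/validation/testing dataset.
dataset_card = {
    "design_choices": "why this dataset; sampling strategy",
    "data_origin_and_collection": "source, collection process, original purpose",
    "preparation_operations": "labelling, cleaning, enrichment, aggregation",
    "assumptions": "what the data is assumed to measure and represent",
    "availability_assessment": "quantity and suitability for the intended purpose",
    "bias_examination": "which biases were examined, and how",
    "bias_mitigation_measures": "how biases are detected, prevented, mitigated",
    "data_gaps": "identified gaps or shortcomings and how they are addressed",
}
```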
Check Art 5(1)(g) — what does your system output?
If your system processes biometric data as input, confirm it does not output biometric categorisations that fall under the Art 5(1)(g) prohibition. If it does, you are operating a prohibited AI system regardless of GDPR compliance.
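One cheap guardrail is a startup check that the system's output taxonomy contains none of the prohibited categories. A sketch, with illustrative label names:

```python
# Categories that Art 5(1)(g) prohibits inferring from biometric data.
PROHIBITED_INFERRED_CATEGORIES = {
    "race", "ethnic_origin", "political_opinions", "trade_union_membership",
    "religious_or_philosophical_beliefs", "sex_life", "sexual_orientation",
}


def assert_output_schema_permitted(output_labels: set[str],
                                   uses_biometric_input: bool) -> None:
    """Fail fast if a biometric system would emit a prohibited categorisation."""
    overlap = output_labels & PROHIBITED_INFERRED_CATEGORIES
    if uses_biometric_input and overlap:
        raise ValueError(f"Art 5(1)(g): prohibited output categories {overlap}")
```

A schema check only catches explicit labels; whether the model implicitly infers a prohibited attribute still needs substantive assessment.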
Can I process health or biometric data when training a high-risk AI system?
Yes, but only under strict conditions and only for bias detection and correction. Article 10(5) of the AI Act allows providers of high-risk AI systems to process special categories of personal data for bias detection — but only where six cumulative conditions are all met: (a) bias detection cannot be fulfilled by other data such as synthetic or anonymised data; (b) technical limitations on re-use are in place, plus state-of-the-art security and pseudonymisation; (c) strict access controls and documentation; (d) the data must not be transmitted or accessed by third parties; (e) the data must be deleted once bias has been corrected or the retention period expires; (f) records of processing must document why this was strictly necessary. All of a–f must be satisfied simultaneously.
How does COM(2025) 837 change special category processing for AI?
Significantly. If adopted, 837 inserts new GDPR Article 9(2)(k), which creates a new lawful basis for processing special categories of personal data in the context of the development and operation of an AI system or AI model. This is broader than the current Article 10(5) — it covers general development and operation, not just bias detection. The conditions are set out in new Article 9(5): implement measures to avoid collecting special categories; if found, remove them; if removal is disproportionate, protect them from being used in outputs and from disclosure to third parties.
Does a GDPR lawful basis for biometric data mean my system can output biometric categorisations?
No. This is a critical distinction. 837 Art 9(2)(k) is a GDPR lawful basis — it governs whether you are allowed to process special category personal data as input data during AI development and operation. Article 5(1)(g) of the EU AI Act is a separate prohibition — it bans outputting biometric categorisations regardless of your GDPR basis. The two laws operate independently: GDPR controls what you can process; the AI Act controls what your system can output. Having a valid GDPR lawful basis does not exempt you from the AI Act prohibition on what your system produces.
When does a training dataset stop being personal data under 837?
837 amends GDPR Article 4(1) to clarify that data is not personal data for an entity that cannot identify the natural person — even if another entity could. A potential subsequent recipient's ability to re-identify does not make the data personal for the original controller. For AI training, this means: if you receive a dataset that is anonymised from your perspective (you have no means to re-identify), it is not personal data under GDPR for you — even if the original collector or a third party could re-identify. This reduces compliance burden for organisations using genuinely anonymised training datasets.
Does Article 9(2)(k) replace the need for an Article 6 lawful basis?
No. Article 9(2)(k) is a condition that allows processing special category data — it lifts only the Article 9(1) prohibition. A standard Article 6 lawful basis (legitimate interest, contract, consent, legal obligation, vital interests, or public task) is still required for the underlying personal data processing. Both must be satisfied independently: Article 6 for all personal data, and Article 9(2)(k) for special category data specifically.
Data Governance Requirements (Article 10)
Full Article 10 obligations: quality criteria, governance practices, the bias framework.
COM 837 Overview — All GDPR Changes
Complete plain-English summary of all COM(2025) 837 changes to GDPR and data law.
EU AI Act vs GDPR: Key Differences
How the two regimes interact, where they overlap, and where they diverge.
Art 5 Prohibited AI Practices
Art 5(1)(g) prohibition on biometric categorisation outputs — and the €35M/7% penalty.
Biometric AI Systems
The high-risk classification and prohibition rules for AI using biometric data.
Automated Decision-Making: GDPR + AI Act
837's Art 22 clarification alongside AI Act Art 14 human oversight obligations.
Regumatrix analyses your AI system against Article 10, the full AI Act obligation chain, and flags any GDPR intersections — returning your risk tier, the exact obligations that apply, your fine exposure under Article 99, and an 8-section cited compliance report. About 30 seconds. No credit card required.
Start free analysis