A regulatory perspective on synthetic data

Can synthetic data be considered fully anonymised data under the EU General Data Protection Regulation (“GDPR”)?

5 min read

In tandem with the rise of AI, synthetic data has made quite an impact in the insights industry in recent times. Readers may have read all about its capabilities and risks from industry experts (Synthetic data and its transformative power in the future – part 1 & part 2), but what is the regulatory perspective on synthetic data? Read on to find out in this two-part article!

The following article is an excerpt from ESOMAR’s Global Research Software 2024. This report provides the most comprehensive overview of the state of the insights industry and the evolving research software sector worldwide, using data collected by national research associations, leading companies, independent analysts, and ESOMAR representatives.

Data privacy: Where does anonymisation start?

Under the GDPR, personal data is any information relating to an identified or identifiable natural person. Even when data is synthesised, if it can be traced back to an individual, it falls within the GDPR’s scope.

Pseudonymised data is personal data that can no longer be attributed to a specific data subject without additional information (the ‘key’), provided that such additional information is kept separately and subject to technical and organisational measures to ensure that the personal data are not re-identified. In contrast, anonymised data is non-identifiable data, because the information that could be used to re-identify the individuals no longer exists.
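
To make the distinction concrete, here is a minimal Python sketch of pseudonymisation as the GDPR describes it: direct identifiers are replaced with tokens, and the token-to-identity mapping (the ‘key’) is kept separately under its own safeguards. The record layout and function name are hypothetical illustrations, not anything prescribed by the regulation.

```python
import secrets

def pseudonymise(records, id_field="email"):
    """Replace a direct identifier with a random token.

    Returns the pseudonymised records together with the token-to-identity
    mapping (the 'key'), which must be stored separately and protected by
    its own technical and organisational measures.
    """
    key = {}  # token -> original identifier; keep apart from the dataset
    out = []
    for record in records:
        token = secrets.token_hex(8)
        key[token] = record[id_field]
        out.append({**record, id_field: token})
    return out, key

# Illustrative data only. While the key exists anywhere, the dataset is
# pseudonymised personal data under the GDPR, not anonymised data.
records = [{"email": "ana@example.com", "age": 34, "city": "Madrid"}]
pseudo, key = pseudonymise(records)
```

Destroying that key, along with any other reasonable means of re-identification, is what moves data towards anonymisation; merely setting it aside does not.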

The GDPR applies to identifiable personal data, including pseudonymised data. Anonymous data, however, falls outside its scope, which makes this classification crucial for how data may be treated and for the rights of data subjects.

To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used to identify that person, directly or indirectly (Recital 26 GDPR). Relevant objective factors include the costs of and the amount of time required for identification by the processor or any third party, assessed against the technology available at the time of the processing and foreseeable technological developments.

The WP29[1] Opinion 05/2014 on Anonymisation Techniques identifies three specific re-identification risks:

  1. Singling out, which refers to the ability to locate an individual’s record within a dataset;

  2. Linkability, which involves linking two records about the same individual or group of individuals; and

  3. Inference, which entails confidently guessing or estimating values using other information.

In essence, robust anonymisation techniques that address these three risks would be deemed satisfactory. But, as is often the case, the optimal solution should be decided on a case-by-case basis.
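
A toy example may make the first of these risks concrete. The Python sketch below, a hypothetical illustration rather than anything prescribed by WP29, flags records that can be singled out because their combination of quasi-identifiers is unique in the dataset; linkability and inference would require analogous checks against auxiliary data.

```python
from collections import Counter

def singled_out(records, quasi_identifiers):
    """Flag records whose combination of quasi-identifiers is unique in
    the dataset, so the individual's record can be located (risk 1)."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return [r for r in records
            if combos[tuple(r[q] for q in quasi_identifiers)] == 1]

# Illustrative records: age and postcode together single out one person,
# even though neither field names anyone directly.
records = [
    {"age": 34, "postcode": "28001", "answer": "yes"},
    {"age": 34, "postcode": "28001", "answer": "no"},
    {"age": 71, "postcode": "10115", "answer": "yes"},  # unique combination
]
print(singled_out(records, ["age", "postcode"]))  # -> the third record
```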

With the increasing use of AI technologies and computing power, there’s an ongoing debate about the appropriate use of synthetic data. Many agree that its main value lies in training AI models, but privacy concerns arise if the data can be re-identified.

The debate centres on whether a relative-risk approach is feasible: on this view, synthetic data is treated as unlikely to be identifiable where the processor accesses only data without the ‘key’ needed to re-identify it, while the controller retains the pseudonymised data. Some data protection authorities, however, require complete anonymisation by both data controllers and data processors before recognising data as fully anonymised: as long as the key to re-identify the pseudonymised data materially exists, the risk of re-identification exists too.

Another pressing concern is the potential re-identification of anonymised data with advancing technologies, including AI. This ongoing debate underscores the evolving data protection landscape and the challenge of balancing privacy with utility.

Although it is unclear whether synthetic data can be considered fully anonymised data under the GDPR, the algorithms and models trained to produce synthetic data from real data are privacy-enhancing technologies. These tools can help maximise the value of data while reducing the risks inherent in its use. Substituting real data with synthetic counterparts allows organisations to mitigate the risk of data breaches and unauthorised access to personal information. This is particularly crucial in healthcare, where patient confidentiality is paramount. In the research field, whether pseudonymised or anonymised, synthetic data can facilitate the sharing and analysis of data (e.g., between the data controller[2] and data processor[3]) while complying with stringent privacy regulations.
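
By way of illustration only, here is a naive Python sketch of the substitution described above: it samples each column independently from the real data’s empirical distribution. Production synthetic-data generators model joint distributions with far more care; all names and values here are hypothetical.

```python
import random

def naive_synthesise(records, n):
    """Generate n synthetic records by sampling each field independently
    from its empirical distribution in the real data.

    Deliberately simplistic: independent sampling breaks cross-field
    correlations and can still reproduce rare values, so this illustrates
    the idea of substituting real records, not a privacy guarantee.
    """
    fields = list(records[0])
    columns = {f: [r[f] for r in records] for f in fields}
    return [{f: random.choice(columns[f]) for f in fields}
            for _ in range(n)]

# Hypothetical patient-style data. The synthetic rows mimic the marginal
# distributions; a row may still coincide with a real record by chance,
# which is why re-identification risk must still be assessed.
real = [
    {"age": 54, "diagnosis": "A", "region": "north"},
    {"age": 61, "diagnosis": "B", "region": "south"},
    {"age": 47, "diagnosis": "A", "region": "north"},
]
synthetic = naive_synthesise(real, 100)
```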

The full article about synthetic data and its capabilities is now available in ESOMAR’s Global Research Software 2024. Download your copy now!

Also, for an extra EUR 90, readers can now download the report on the core market research sector: ESOMAR’s Global Market Research 2024.

[1] The Article 29 Working Party (Art. 29 WP) was established under Article 29 of Directive 95/46/EC as an independent European advisory body on data protection and privacy. Upon the entry into force of Regulation (EU) 2016/679 (the GDPR) in 2018, it was replaced by the European Data Protection Board (EDPB).

[2] Under the GDPR, ‘controller’ means the natural or legal person, public authority, agency, or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law.

[3] Under the GDPR, ‘processor’ means a natural or legal person, public authority, agency, or other body that processes personal data on behalf of the controller.

Claudio Gennaro
Senior Advocacy Programmes Coordinator at ESOMAR