Synthetic data and its transformative power in the future – part 1

17 March

The rise of AI has significantly impacted the insights industry through synthetic data, though the technology has its critics. This first part of a two-part series looks at what synthetic data is and its potential in the insights industry.

5 min read

The following article is an excerpt from ESOMAR’s Global Research Software 2024. This report provides the most comprehensive overview of the state of the insights industry and the evolving research software sector worldwide, using data collected by national research associations, leading companies, independent analysts, and ESOMAR representatives.

“The landscape of market research could alter significantly.”

Synthetic data is generated artificially, typically by algorithms, rather than produced by real-world events. It can be used to validate mathematical models and to train machine learning models. This is welcome because real-world data can be difficult to obtain and costly to store and manage. Real-world data can also raise privacy and bias concerns and be subject to regulatory restrictions. In such cases, GenAI-powered synthetic data can offer an artificial alternative. However, this relatively new technology has its critics too.

1. What it is

According to Marty Resnick, Research VP for Technology Innovation at Gartner, “Synthetic data is a class of data that is artificially generated — that is, not obtained from direct observations of the real world.” He explains that it can be generated using different methods, such as: statistically rigorous sampling from real data, semantic approaches, or generative adversarial networks — and by creating simulation scenarios where models and processes interact to create completely new datasets of events.
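To make the first of these methods more concrete, here is a minimal Python sketch of sampling from real data. It is an illustration under stated assumptions, not a description of how Gartner or any vendor does it: the records in `REAL_RECORDS` are invented, the function names `fit_marginals` and `sample_synthetic` are hypothetical, and each column is modelled independently, which is a simplification that ignores relationships between columns.

```python
import random
import statistics
from collections import Counter

# Hypothetical "real" survey records, invented purely for illustration.
REAL_RECORDS = [
    {"age": 34, "region": "north", "satisfaction": 4},
    {"age": 51, "region": "south", "satisfaction": 3},
    {"age": 27, "region": "north", "satisfaction": 5},
    {"age": 45, "region": "west", "satisfaction": 2},
    {"age": 38, "region": "south", "satisfaction": 4},
    {"age": 62, "region": "north", "satisfaction": 3},
]


def fit_marginals(records):
    """Summarise each column: mean/stdev/range for numbers, counts for categories."""
    model = {}
    for column in records[0]:
        values = [r[column] for r in records]
        if all(isinstance(v, (int, float)) for v in values):
            model[column] = (
                "numeric",
                statistics.mean(values),
                statistics.stdev(values),
                min(values),
                max(values),
            )
        else:
            counts = Counter(values)
            model[column] = ("categorical", list(counts.keys()), list(counts.values()))
    return model


def sample_synthetic(model, n, seed=0):
    """Draw n artificial records from the fitted per-column summaries."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in model.items():
            if spec[0] == "numeric":
                _, mean, stdev, low, high = spec
                value = rng.gauss(mean, stdev)
                # Clamp to the observed range and round to a whole number.
                row[column] = round(min(max(value, low), high))
            else:
                _, categories, weights = spec
                row[column] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


synthetic = sample_synthetic(fit_marginals(REAL_RECORDS), n=5)
for row in synthetic:
    print(row)
```

None of the generated rows is a copy of a real respondent by construction, but because the columns are sampled independently, patterns such as "older respondents are less satisfied" would be lost. The generative approaches Resnick mentions aim to preserve that kind of structure.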

Amplifying voices

Annelies Verhaeghe, Managing Partner and Chief Platform Officer at the Human8 consultancy, believes that “with every new technology revolution it is wise to reflect on how it can help to automate, augment or even transform what we are doing already today.” Synthetic data is appealing, she adds, because it can potentially check all three boxes:

  • Automate: “As we are only running things on machines, we can get feedback from synthetic audiences at an unforeseen speed and potentially even a fraction of the cost.”

  • Augment: “There are some big challenges we’ve had as an industry that we can potentially take forward. Think of problems with fraud or Personally Identifiable Information (PII), or the challenge of long questionnaires.”

  • Transform: “The big question is if it can also do something new that was not possible before.”

Verhaeghe sees several interesting movements. A common one is doing research with niche audiences, where synthetic data allows one to enlarge a data set: “In essence, this enables you to amplify the voice of certain targets in your data that are hard to engage in traditional ways.” It can also increase consumer-driven decision-making in companies, she observes, because some of the cost and time barriers are taken down. “You can use it in cases where you have not done research before, for instance, to test more stimuli than you would normally do.” 

Key benefits

Wim Kees Janssen is the CEO and founder of Syntho, a scale-up that is disrupting the data industry with AI-generated synthetic data. He believes that the technology offers the following key benefits: 

  • Accelerated data access: “Synthetic data allows for faster access to data, reducing the often time-consuming process of acquiring real-world data sets. Organizations can increase progress from concept to execution, accelerating the pace of innovation.”

  • Unlock data and insights: “Synthetic data unlocks data that was too time-consuming or costly to access. This empowers analysts and data scientists to produce actionable intelligence.”

  • Facilitating collaborations: “By providing a secure platform for data sharing, organizations can use synthetic data to build innovation.”

  • Building trust through privacy preservation: “Synthetic data minimizes the utilization of personal information. By preserving privacy while still enabling innovative processes, businesses can navigate regulatory landscapes with confidence.” 

2. What it can do

Verhaeghe has seen various valuable synthetic data use cases, such as smart avatars. The greater value of this lies in using AI models as activation tools, bringing large groups of internal stakeholders closer to the people who are important to a brand: “For example, we have experimented with creating chatbots that are trained based on your own primary data to bring a persona or segment to life in a workshop. Instead of reading a long report, stakeholders can quickly chat with one of the personas to immerse in their world and get answers to their most burning questions.”

Gartner’s Resnick also sees many use cases, from software testing to creating digital twins and analysing scenarios with anonymized data: “Marketing could use it to create digital twins of customers, while manufacturing could leverage the industrial metaverse with synthetic data used to create and test scenarios with digital twins of machines and processes.”

Privacy-by-design

In Janssen’s experience, the technology can accelerate testing processes, whilst maintaining data quality: “Using synthetic data enables agile development by closely resembling production data, whilst adhering to privacy regulations, ensuring a secure and efficient testing environment. The use of rule-based synthetic data also allows users to have the full flexibility to create new scenarios that are not covered by production, but may occur in the future.” 
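As a rough illustration of what rule-based synthetic test data can look like, the Python sketch below builds records from explicit rules rather than from production data. It is an assumption-laden example, not Syntho's product or method: the field names, value ranges, the `RULES` table and the `generate` function are all hypothetical.

```python
import random

# Hypothetical rules for an order record; fields and ranges are illustrative only.
# Each rule receives a random generator and the row index and returns one value.
RULES = {
    "order_id": lambda rng, i: f"ORD-{i:05d}",
    "quantity": lambda rng, i: rng.randint(1, 10),
    "country": lambda rng, i: rng.choice(["NL", "DE", "FR"]),
    "unit_price": lambda rng, i: round(rng.uniform(5.0, 200.0), 2),
}


def generate(rules, n, overrides=None, seed=0):
    """Build n records from the rules; overrides replace rules for a scenario."""
    rng = random.Random(seed)
    active_rules = {**rules, **(overrides or {})}
    return [
        {field: rule(rng, i) for field, rule in active_rules.items()}
        for i in range(n)
    ]


# Records that resemble typical day-to-day orders.
typical = generate(RULES, n=3)

# A scenario not yet seen in production: bulk orders from a new market.
future = generate(
    RULES,
    n=3,
    overrides={
        "quantity": lambda rng, i: rng.randint(1000, 5000),
        "country": lambda rng, i: "BR",
    },
)

for row in typical + future:
    print(row)
```

Because every value comes from a rule, no personal information enters the test environment, and a new scenario only requires overriding the relevant rules.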

Another valuable use case he sees is AI model training and validation. Organisations face significant challenges in getting access to relevant data, especially when privacy concerns are involved: “Traditional anonymization methods often fall short, leaving data unusable for analytics while maintaining privacy risks. AI-generated synthetic data addresses these challenges by creating artificial data that closely mimics real-world patterns and characteristics. It offers a privacy-by-design approach, ensuring data utility without compromising confidentiality.”

The full article, which also covers the risks of synthetic data and what the future may hold, is now available in ESOMAR’s Global Research Software 2024. Download your copy now!

Also, for an extra EUR 90, readers can now download the report on the core market research sector—ESOMAR’s Global Market Research 2024.