Synthetic data and its transformative power in the future – part 2
Synthetic data is influencing the insights industry amid AI's rise, but it has its critics. This second part explores its risks and future.

Article series
Global Research Software
- Is the growth slowdown of research software a sign of trouble?
- Can Europe’s research software catch up to its potential?
- Research software in the US: Who is pushing the envelope forward?
- Asia Pacific, the fastest-growing insights industry in the world
- The evolving insights landscape in the Rest of the Americas
- Africa and Middle East: Unpacking growth amidst challenges
- Synthetic data and its transformative power in the future – part 1
- Synthetic data and its transformative power in the future – part 2
In tandem with the rise of AI, synthetic data has made quite an impact on the insights industry in recent times, but its use is not without critics. Part 1 of this article discussed synthetic data and its capabilities. In this second part of the two-part series, you will read about the risks of synthetic data and its future in the insights industry.
The following article is an excerpt from ESOMAR’s Global Research Software 2024. This report provides the most comprehensive overview of the state of the insights industry and the evolving research software sector worldwide, using data collected by national research associations, leading companies, independent analysts, and ESOMAR representatives.
“The landscape of market research could alter significantly.”
Artificially generated rather than produced by real-world events, synthetic data is typically created using algorithms. It can be used to validate mathematical models and to train machine learning models. This is most welcome, as collecting real-world data can be complex, and storing and managing it is costly. Furthermore, real-world data can bring privacy and bias concerns as well as regulatory restrictions. In such cases, GenAI-powered synthetic data can offer an artificial alternative. However, the relatively new technology has its critics too.
What are the risks?
Whilst synthetic data offers a plethora of possibilities, it also carries certain risks. Matt Hay is a consumer trends expert and the founder and CEO of the technology platform Bulbshare. In his Research World article last year, he talked about “unforeseeable dangers”. Now that the picture is coming into greater focus, he deems synthetic data not inherently dangerous but warns that misuse or misinterpretation can lead to significant consequences, such as reinforcing biases or spreading misinformation: “As with any data set, there’s always a need to interrogate the source and examine its integrity. Researchers must apply that same level of curiosity and discernment when it comes to synthetic data. Without that discipline, it poses grave consequences for the integrity of our sector at large which must be avoided at all costs.”
Asleep at the wheel
Similarly critical was a 2022 essay by Mikkel Krenchel and Maria Cury, partners at ReD Associates, a social science-based consulting firm. Whilst they confirm synthetic data’s many benefits and predict a growing role in the toolbox of thoughtful researchers, they also see many pitfalls. Krenchel points out: “The one that worries us most is that it may lead researchers and executives to fall asleep at the wheel. As a society, we are already struggling with data literacy and transparency. With the growth of synthetic data, it might get a whole lot worse.” Krenchel adds that synthetic data could become a dangerous default, even for people who know about its pitfalls, simply because it is cheap, convenient, and can feel authoritative.
Rigorous assessment
According to Hay, synthetic data poses various risks, including reinforcing biases and facilitating the spread of misinformation: “Whether intentionally manipulated to arrive at misleading results, or just badly prompted to create skewed or discriminatory outcomes, the consequences of misuse are very real.” He urges the implementation of rigorous validation processes, transparency in data generation methods, and ongoing evaluation of the impact of one’s synthetic data on decision-making processes: “Alongside rigorous assessment, it's important to also ensure that any source data you use is diverse, representative, and reliable. And, once the information is generated, always sense-check it against real-world research to confirm its validity.”
Echo chamber effect
Compared to when they published their essay, Krenchel and Cury are a little less worried today that AI models built purely by synthetic data will cause all kinds of harm. However, they are more worried that synthetic data will be misused and misunderstood in other contexts. Krenchel says: “We described a possible echo chamber effect, whereby AI feeds the AI and the models that develop and control key aspects of our world increasingly respond to an internal logic divorced from the reality we inhabit. Since then, researchers have shown that this echo-chamber effect is actually so strong that AI models built purely from synthetic data over time become virtually useless, in what they describe as model collapse.” This is good news, he says, and much better than if models built purely on synthetic data only had modest biases: “Because it is pushing AI developers to be thoughtful about their data sets and combats the tendency for synthetic data to leave us asleep at the wheel.”
Human eyes
Synthetic data is only as smart as the quality of the input it receives, underlines Verhaeghe. She, therefore, offers the following advice:
- Be aware of potential biases that are inherently present in these data sets in terms of diversity, country, or timeframe: “There is a strong Anglo-American bias and ChatGPT relies on data up until late 2021.”
- In augmented synthetic data, one contextualizes the AI system with deep data collected in primary research: “In-the-moment and emotional primary data are vital to complement synthetic data and guide business decisions. Moreover, synthetic data cannot be prompted for things we are unaware of. To unveil the unknown, primary research remains crucial.”
- Question the involved risk: “Which decisions can you afford to be wrong? Synthetic data can be great for low-risk, incremental decisions, but is it worthwhile basing more important decisions on it?”
- Be critical when using black box synthetic solutions: “Proven concepts, such as representativeness, bias, sample sizes, and the art of asking good, relevant questions to AI systems, all remain crucial. AI needs human eyes!”
Synthesis
If reliance on synthetic data becomes widespread, Hay fears it may lead to a decline in the quality and diversity of research, ultimately limiting innovation and progress in the research industry: “The landscape of market research could alter significantly, with businesses opting to prioritise a solution that is low cost and free of GDPR constraints, rather than choosing human empathy first and foremost — which will always provide richer, more authentic results.”
However, he believes that smart brands will recognise that synthetic data and organic consumer insight work can have a complementary relationship, rather than a mutually exclusive one: “While synthetic data might offer benefits for cost savings, speed, scalability, and security, it may not grasp the nuance of real-world scenarios that expert human researchers relish. It truly highlights the importance of maintaining a balance between leveraging synthetic data and conducting authentic research — with human insights. Synthesising both might be the route to more profound insights that radically drive competitive advantage for those companies that harness its power.”
What will the future bring?
Gartner estimates that by 2030, synthetic data will completely overshadow real data in AI models. This could be considered a threat to “traditional” suppliers of real data. Resnick, however, prefers to point out the opportunity to pivot to new markets. In the firm’s latest Tapestry research, organizations are encouraged to look sharply at trends and forces: “Executive leaders must assess the footprints of their organizations, which represent their existing presence within a market or industry, while also offering new opportunities to develop footholds, their plans to scale and expand into new markets, across these four worlds beyond just digital transformation.”
Fundamental change
Resnick feels that synthetic data, or any technology, shouldn’t be looked at by itself: “It is the impacts and accelerators/inhibitors that are going to come across what we call Tapestry — technology, politics, economics, social, trust, regulatory, and environmental areas — that will truly determine the future world and the place of synthetic data in that world.” Cury agrees with Hay that the research industry is facing some fundamental changes but that does not mean there is no role for original research in the future. “If anything, the need for high quality data that can be used as the basis for synthetic data sets, AI models, and more, will only go up. But the practice of research will have to significantly change to keep up.”
The full article about synthetic data and its capabilities is now available in ESOMAR’s Global Research Software 2024. Download your copy now!
Also, for an extra EUR 90, readers can now download the report on the core market research sector—ESOMAR’s Global Market Research 2024.