Why We Don’t Talk About ‘Synthetic Data’—And Why You Shouldn’t Either

17 June 2025

AI is changing how we generate insights, but not all AI-driven methods are the same. Relying solely on 'synthetic data' can be misleading and unhelpful, as it should not be treated as a catch-all term for all AI outputs.

Business Trends Artificial Intelligence

A few weeks ago, a thoughtful paper from the UK’s Market Research Society reflected the ongoing debate around the application of AI-generated “Synthetic Data” for insight. The conclusion of that analysis, that synthetic data used alone as a direct replacement for consumer research can be unreliable, strongly aligns with the perspective we’ve been developing over the last couple of years.

AI is transforming the way we generate insights, but not all AI-driven approaches are created equal. ‘Synthetic data’ alone often isn’t enough. And, from our perspective, using ‘synthetic data’ as a catch-all term for any and all AI-powered outputs is unhelpful and potentially misleading.

The sheer volume of articles, discussions and AI-enhanced solutions in the market demonstrates that market research is being reshaped by AI. And when used well, AI-generated data is already showing how it can address long-standing industry challenges: delivering faster, real-time insights; overcoming limitations in access panel data quality; enabling secure testing of confidential materials; and cutting costs by reducing reliance on expensive participant-driven research.

But in lumping all AI-driven approaches under the umbrella term ‘synthetic data’, we’re doing a disservice to the real potential of AI-empowered insight and fuelling unnecessary scepticism.

Instead, we should look to AI-driven simulations rooted in high-quality, proprietary client data and enriched with curated contextual datasets, so that every AI-powered insight is anchored in and validated by human truth.

The Good, the Bad, and the Deeply Misleading

To be clear, this is as much a semantic question about the blanket use of the term ‘synthetic’ as anything else. At its core, “synthetic data” is, of course, any AI-generated data designed to mimic real-world data.

But the term ‘synthetic’ has pejorative associations. And there are distinctions; not all synthetic data is created equal.

At one end, there are ‘fully synthetic’ datasets, created primarily by algorithms with little direct connection to real people. Then there is ‘partially synthetic’ data: real data with some gaps filled in by AI, to improve accuracy or overcome dataset limitations.

And there is ‘augmented’ data, which uses real-world inputs as a foundation for AI models and simulations to expand or extrapolate from.

These methods can vary greatly in their reliability.

Fully or partially synthetic approaches – and indeed poorly designed augmented approaches – face challenges that can, understandably, create trust issues amongst researchers if not addressed:

A lack of human anchoring. Synthetic data insights risk being disconnected from how real people actually think, feel, and behave.
Over-simplification. AI can generalise too much, lacking context and smoothing out the nuances and niche trends that represent the messy reality of human behaviour.
Bias reinforcement. If the data the AI is trained on is limited, flawed or unrepresentative, the output will be too. There is also a risk of exaggerating or overfitting existing patterns. Garbage in, garbage out.
Black box syndrome. Many synthetic data solutions lack transparency. If researchers don’t know how insights were generated, how can they trust them?
Closed loop. AI-generated data feeds back into itself over time. Without fresh, human inputs, it stagnates, distorts, and declines in accuracy (a point we’ve discussed previously).

All of this makes it difficult for researchers to trace the insights back to real consumers and customers. That can be fine when fast, directional (quick and dirty) guidance is needed, but not for making decisions of consequence.

A different (and better) approach

That’s why it can be helpful to draw distinctions between different types of AI-generated data.

Focusing on carefully created augmented approaches, such as AI simulations that make transparent use of high-quality data from real people in their training layers, helps the generated insight to be credible, actionable, and reflective of the real world.

Ultimately, building credibility is about ensuring the AI outputs are anchored in human truth and can be traced back to real people.

AI + Human Truth = Real Insights. This is not about throwing AI at the internet and hoping for the best. It means training the AI on proprietary, high-quality datasets, so the insights are grounded in real behaviour, not statistical guesswork.
Context Matters. Consumer behaviour is deeply tied to cultural, economic, and psychological factors. That’s why integrating carefully curated contextual datasets, and layering in social trends and cultural artefacts, creates insights beyond what AI alone can achieve.
Full transparency. Help researchers know what goes into their insights. Don’t hide behind black-box solutions; be open about methods and assumptions.
Human. AI insights don’t run on autopilot, so overlay human expertise at every step. Set a benchmark for quality using real-world testing, validation and human empathy to extract meaning.
Commercial. Use proprietary, high-quality data sources to create genuine competitive advantage, not just access to the same insight as everyone else.

Combining the power of AI with deep human psychology, cultural analysis, and real-world context: this is where the truly unexpected and real insight shows up.

Don’t settle for Artificial Insights

The best AI-powered approaches are built on a combination of real-world data, cultural intelligence, and human truth - verifiable, actionable, and reflective of the real world.

So, let’s stop talking only about ‘synthetic data.’ AI has the power to transform insight more broadly—if we use it the right way. After all, why have synthetic, when you can have silk?

Richard Preedy

As a senior insight specialist with nearly two decades of experience, I blend robust research with innovative techniques and cutting-edge technology to generate culturally resonant, future-facing insights for brands. As Executive Director at Verve, I lead our AI for Insight offer. With over 25 years in insight and data, I've developed solutions combining Cultural and Human understanding with AI's power, driving value-adding opportunities for clients and revolutionising insights discovery and utilisation.
