Charting AI’s Future in Research - Insights from the Panel Revolution

18 November 2024

Looking back to the rise of online panels may be illustrative in charting the path of synthetic respondents

Online Communities Synthetic Data

Reminiscing with an ex-colleague, I realised that something has changed in the way market researchers are perceived. When we started our careers and would tell others that we worked in the industry, a common response was: “So you’re the person with a clipboard who stops me in the street!” Over the years this has become less and less common, to the point that now I don’t hear it at all. I must admit, I don’t miss it, but its disappearance reflects a profound change in the way we conduct research.

Once upon a time, quantitative research was conducted through a range of different approaches. One of these was the intercept interview: stopping people who were going about their day and asking them survey questions. Other approaches were also common, such as mail surveys sent to random addresses and telephone interviews calling random numbers. These have now been almost entirely replaced by what was once a truly revolutionary innovation: the online panel.

While they seem prosaic these days, online panels were transformative. The ability to target segments of the population through huge communities of pre-opted-in consumers massively reduced the time and cost involved in conducting fieldwork.

The rise of online panels may hold insight into the fate of another innovation that is shaking up the research landscape: synthetic responses. Synthetic response firms promise to change research by making the process more efficient, just as panel companies did in the 90s and 00s.

Synthetic responses, it is claimed, will completely remove the need for fieldwork. Instead of sourcing, incentivising, and interviewing real respondents, this tech will use AI-driven large language models to replicate responses, creating robust samples of any population you desire. Even groups that are hard and expensive to source (heads of procurement for large hospitals, Fortune 500 CEOs, etc.) could be replicated at huge sample sizes.

Our ability to evaluate where synthetic data might be headed is hampered by deep-seated vested interests among almost all who comment on the issue. Synthetic data firms benefit from tech-utopian credulity to drive investment and growth. All other market researchers (such as myself) sell their trade by extracting truths from real human beings, and so would benefit from a Luddite denial of this technology’s validity.

Casting our minds back to when panels were in their infancy may give us a glimpse of the industry’s future direction. With the advent of the internet, online panels quickly became the main method of conducting quantitative research – almost completely replacing conventional methods (intercepts, mail surveys, and telephone interviews).

The core advantages of panels are that they allow research to be conducted far more quickly and cheaply than before. In many ways, however, sampling via panel is far inferior to the methods it replaced. Panel members are a self-selecting group, so they don’t necessarily represent a wider population with varying interests and motivations. Panels are less accessible (e.g. to people without the technology or inclination to sign up) and so reach a smaller and less representative group. They have also given rise to small groups of “professional respondents” who game the system, providing false responses in order to qualify for surveys, or even creating bots to do this for them.

Collectively, we as an industry decided that these trade-offs were worth it for the cost and time savings panels provide, and the role of the researcher has changed accordingly. Part of the art of being a researcher in 2024 is designing questionnaires which counteract issues with panels: building in questions to identify poor quality responses and bots, and working around issues of representativeness.

And if we look back and think the compromise wasn’t worth it? Well, it’s too late. The genie is out of the bottle – panels now dictate the cost and duration of research, such that applying more rigorous sampling methods is almost never a competitive approach. Research integrity was traded for efficiency.

The parallels with synthetic data are apparent: its adherents promise to revolutionise the industry by cutting time and costs and, like panels, it has some undeniable drawbacks in terms of research integrity.

The obvious and fundamental flaw is that it is impossible to know how closely a dataset produced by synthetic responses matches a dataset of human responses. Enthusiasm for this burgeoning sub-sector is tempered by understandable scepticism: automating research processes is one thing, but removing humans from the research process entirely is, intuitively, difficult to accept.

The firms themselves tell us they achieve over 90% similarity with traditional research results (a confusing claim, because two samples taken from a population in the traditional way will rarely have 95% correlation with each other), but many remain sceptical.
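
To make the baseline question concrete, here is a minimal Python sketch of how one might estimate how closely two human samples from the same population agree with each other, before judging any synthetic-versus-human figure. Everything in it is an assumption made for illustration: the function names are hypothetical, the "true" answer rates are invented, and Pearson correlation across per-question results is only one possible reading of "similarity", since the firms' own metrics are not public.

```python
import random


def sample_proportions(true_props, n, rng):
    """Simulate n respondents answering yes/no items; return the share of 'yes' per item."""
    return [sum(rng.random() < p for _ in range(n)) / n for p in true_props]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / (var_x * var_y) ** 0.5


def human_baseline(true_props, n, trials, seed=0):
    """Draw pairs of independent samples of size n and correlate their per-item results."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        sample_a = sample_proportions(true_props, n, rng)
        sample_b = sample_proportions(true_props, n, rng)
        scores.append(pearson(sample_a, sample_b))
    return scores


if __name__ == "__main__":
    # Hypothetical "true" agreement rates for ten yes/no questions (invented for illustration).
    true_props = [0.42, 0.45, 0.48, 0.50, 0.52, 0.55, 0.47, 0.51, 0.44, 0.53]
    for n in (50, 200, 1000):
        scores = human_baseline(true_props, n, trials=200)
        # Print the sample size, the lowest and the average human-vs-human correlation.
        print(n, round(min(scores), 3), round(sum(scores) / len(scores), 3))
```

The figure such a simulation produces depends heavily on the sample size, on how much the questions differ from one another, and on the metric chosen, which is why a headline similarity percentage is hard to interpret without knowing how it was calculated.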

Leave aside the issue of “hallucinations” (the tendency of AI-generated responses to just make something up), and the other problems that large language models present when mimicking human responses. Choosing to use synthetic responses in place of human responses is a leap of faith.

Few are qualified to assess the veracity of the technology – to do so one would need an advanced understanding of statistics, machine learning, data engineering, and programming. Even those with such expertise would be facing a black box. Synthetic data firms will guard their proprietary models closely; “trust us, it works” is the mantra.

Panels changed our industry, arguably for the worse, but in that case we could rationally assess the trade-offs that were being made in exchange for cost and speed. The same cannot be said for synthetic responses. But if research buyers are happy to accept synthetic responses as valid, then the historic rise of online panels may foreshadow how synthetic data will reshape our industry.

Caspar Swanston

Quantitative Director at Lovebrands

Caspar is an experienced researcher with a commitment to rigour, whose unrelenting curiosity is driven by a profound fascination with human behaviour. As head of the quantitative division at Lovebrands, a global consultancy with a holistic offering, he is dedicated to uncovering insight, building strategy, and driving innovation for clients.
