Learning to deal with AI that is smarter than us

28 April

AI has already surpassed the “just a flawed chatbot” stage and is rapidly improving. With new benchmarks like GDPval showing it can perform professional-level work, the real question now is how quickly industries adapt to this shift.

6 min read

AI is already smarter than you think, and it is getting more capable every week. If your mental model is still "a chatbot that gets things wrong," it is time for an update. The question of whether AI can do professional work as well as the experts who currently do it is no longer abstract. There is now a benchmark, GDPval, that tests exactly that, and the results should change how researchers, insight teams, and clients think about the next two years.

What GDPval measures

GDPval, developed by OpenAI, is a benchmark designed to push past trivia, exam questions, and toy coding puzzles. Instead, it gives AI models the same sort of work assignments that paid professionals do. It then asks independent expert judges to pick the better output, without knowing which was produced by a human and which by a machine.

The tasks are not artificial tests such as logic problems. They are professional deliverables: legal memos, financial analyses, medical care plans, engineering reports, and customer service responses. The full set covers 1,320 tasks across 44 occupations, drawn from the nine sectors that make up the largest share of US economic output. The name itself, GDPval, is a clue: it is a measure of practical economic usefulness, not academic ability.

On the latest leaderboard, the most recent frontier model scored close to 85 per cent. That means in roughly 85 of every 100 head-to-head comparisons, judges either preferred the AI output or rated it equal to the work of an experienced professional. Excluding the draws, the AI's outright win rate still sits in the high 60s. For practical purposes, on any single task, a senior professional reviewing the AI's work blind would find it acceptable or impressive most of the time.

The jagged edge

This is where it would be tempting to either celebrate or panic. Both reactions miss the point because AI capability is jagged, not smooth. The same system that can draft a defensible legal memo can also fail at things a child would get right. The classic example is asking how many Rs are in "strawberry" and getting the wrong answer. There are dozens of others, from arithmetic slips to confidently invented citations to subtle category errors a junior researcher would never make.

So, we have an unusual situation. AI is, on average, as good as or better than experts on a wide range of real tasks, while still being capable of mistakes no expert would make. Useful, but not safely autonomous. The implication for our industry is not "AI is overhyped" or "humans are obsolete." It is something more demanding: we need to learn to work with a collaborator whose strengths and weaknesses do not match our intuitions. This is the jagged edge.

Four principles for responding

If we accept that picture, four basic principles follow.

1. People, not machines, own the work. Do not send an analysis to a client unless you can sign it off as correct. The accountability must sit with a named person who has read the output, understood it, and is prepared to defend it. AI changes how the work is produced, not who is responsible for it. This is the line that separates augmentation from negligence.

2. Build in checks for the things AI cannot do. Because the failures are jagged, a generic review is not enough. We need processes that specifically look for the kinds of errors machines make: fabricated sources, plausible-sounding numbers that do not reconcile, missing context, and ethical issues that a human would notice instinctively. That means structured QA steps, second-pair-of-eyes rules, and a culture in which it is fine, even expected, to say "the AI got this bit wrong."

3. Use AI to amplify you, not to replace your judgement. The professionals who will benefit most are those who treat AI as a force multiplier on tasks they already understand. Use it to draft, to summarise, to challenge, to surface alternatives, to accelerate the boring middle of every project. Then apply your trained judgement to what it produces. Your expertise remains a key strength; AI just lets you apply it to more projects.

4. Press for protection at the system level. Individual diligence is necessary but not sufficient. Trade associations, regulators, and governments need to act to protect people from deception and abuse: synthetic content passed off as real respondents, AI-generated advice that masquerades as professional, and opaque automated decisions in sensitive domains. Esomar's professional standards work, the ICC/Esomar Code, and similar instruments in adjacent industries are the right vehicles, but they only matter if practitioners actively support and shape them.

Where this leaves us

The honest reading of GDPval is that AI will not replace professionals tomorrow. The benchmark tests isolated one-shot tasks. It does not capture client relationships, accountability, or the judgement built over a career. But on any given task, the AI's work would now be acceptable or impressive to an expert most of the time. That is a meaningful signal, and the trajectory is steep.

The researchers and insight professionals who thrive over the next few years will not be the ones who pretend this is not happening, nor the ones who outsource their thinking to a model. They will be the ones who own the work, design around the jagged edge, use AI to do more and better, and help build the guardrails that keep the wider system honest.

Ray Poynter
Chair of the Professional Standards Committee at Esomar