Can AI Forecast B2B Adoption? Closer Than You’d Think.

In B2B software, forecasting adoption is one of the hardest jobs in product strategy. We compared the results of a human conjoint with five synthetic conjoint results. One of the tests came within a percentage point of real buyers’ choices.

8 min read
8 min read

In B2B software, where buying cycles run 9 to 12 months and most prospects are locked into multi-year contracts with incumbents, forecasting adoption is one of the hardest jobs in product strategy. We compared the results of a human conjoint with five synthetic conjoint results. One of the synthetic approaches we developed came within a percentage point of real buyers’ choices.

B2B purchases are rarely one person’s call. A typical decision-maker must win over security, finance, the end-users, and an executive sponsor. That multi-stakeholder reality, layered on top of 9-to-12-month buying cycles and multi-year contracts with incumbent vendors, is exactly what makes B2B adoption uniquely hard to forecast, and exactly what AI-simulated buyer research has had the hardest time capturing.

The classical way to answer “who will buy what, at what price” is a conjoint study: a survey that walks real buyers through a series of product trade-offs (this bundle vs. that one vs. neither) and infers which features and prices drive decisions. A B2B conjoint typically runs months and tens of thousands of dollars, requires an expert to design the trade-offs correctly, and depends on a pool of hard-to-reach qualified buyers (security leaders, CFOs, heads of platform engineering) large enough to power the model. Smaller and fast-growing B2B companies, the ones with the most pressing pricing and packaging decisions, are precisely the ones least likely to be able to afford it. That is why so many teams are turning to LLMs as a substitute and why making them work has been the hard part.

So far, researchers have not been able to generate synthetic conjoint results that are decision grade quality. Studies from Harvard, Columbia, Stanford, and elsewhere have documented why: AI overestimates willingness-to-pay by as much as three times on some attributes, gets the wrong sign on others, and silently fills in unstated assumptions. Change the price and the model invisibly changes its assumed quality, biasing preference estimates. These biases persist across LLM generations. 

When you ask an off-the-shelf AI to play a B2B software buyer and either pick a product or decline, it almost never declines. In our test, the AI’s adoption rate was 98%, against 53% for real B2B buyers (with adoption varying widely across segments, as we show below). The 47% real opt-out is well above the 33% one would expect from random choice across three options; buyers were actively declining, not picking at random. Researchers Gui and Toubia at Columbia (2024) call this “sycophantic acceptance,” the same failure mode every prior study runs into. By default, AI buyers say yes to almost everything: they have no boss to satisfy, no compliance team to convince, no committee to align.

. We designed a conjoint study with software options. The choices of 160 real B2B buyers were compared with five different synthetic choices. Each choice offered the same three options: pick option A, pick option B, or pick neither (keep the current solution). Options A and B were each described by nine varying features. We compared the synthetic choice results against the human benchmark on two practical metrics: how closely its predicted adoption rate matched the real one, and how strongly its ranking of which features matter most correlated with humans.

We tested four synthetic conjoint approaches

The simplest approach uses a bare LLM with only a demographic-style persona, the standard practice today. It confirmed the failure mode: the AI’s predicted adoption rate was 98%, against the real 53%. Not good enough to inform real product decisions.

A second approach replaced demographic stubs (“Mid-market SaaS buyer, 200 employees”) with personas built from more than 100 actual recorded buyer conversations: a mix of structured customer interviews and sales calls, encoded into a structured profile of each buyer’s situation, jobs, and constraints. This alone cut the adoption-rate gap from 44 percentage points to 19. But it revealed something important. Real-conversation grounding mostly teaches the model that real buyers often say no. It barely improved the AI’s ability to identify which product attributes matter most. Grounding is necessary. It is not enough.

In our third approach we built a panel identical to step two, but now with two additions to every persona: signals about how anchored each buyer was to their current vendor, and explicit decline conditions written in the persona’s own voice, grounded in what their source recording revealed. Each persona now carried its own switching threshold. This “feature-priority panel” reduced the adoption-rate gap to 3 percentage points and moved the AI’s ranking of which product features matter most from a weak match with human judgments to a strong match. It is the strongest setup for figuring out what to build next.

Fourth, we added a brief category overview that each AI persona reads first, exactly the kind of introduction a real survey respondent reads before answering. We call the resulting setup the “adoption-forecast panel.” It pushed the AI’s predicted adoption rate to within 1.2 percentage points of the real 53%, within the noise of any human survey, and improved the AI’s per-buyer predictions. The trade-off: most of step three’s feature-ranking gain slipped back to step two levels. The two set-ups, feature-priority and adoption-forecast, are best at different things.

Lastly, we added more detail per persona isn’t always better. Layering in finer-grained rejection criteria pushed the AI’s predicted adoption rate down to 38%, against the real 53%. The gap widened from 1.2 percentage points back to 16. Piling detail onto a persona without softening its rejection logic makes the panel worse, not better.  This effect has also been found in non-conjoint synthetic studies.

The five approaches at a glance


Which set-up for which question

The most useful framing from this research is not “which AI buyer panel is best?” but “which AI buyer panel is best for this question?”

If you need to know which features deserve top-of-roadmap investment, use the feature-priority panel. Every grounded AI buyer panel in the study agreed on the same top three things B2B buyers care about: price, customer support, and security, in that order. The off-the-shelf AI got that ordering wrong. The caveat is that this top three finding holds best for features buyers can evaluate before purchase: price, security certifications, integration count, deployment model. For experience features, those whose value only becomes clear with use (like onboarding ease, support responsiveness, or community depth), the AI got the direction wrong on three of nine attributes, predicting buyers liked something they disliked. That gap reflects a real limitation of training on text rather than lived experience, and no amount of prompt engineering closed it. Treat AI rankings on experience features as directional at best and validate them with humans before treating them as roadmap inputs.

If you need to know what share of buyers will adopt at a given price, use the adoption-forecast panel. The 1.2-percentage-point gap is tight enough to use for take-rate forecasting, packaging decisions, and price-point exploration before fielding any human research.

If you are thinking of replacing all your human research with synthetic, no single panel can be a good enough substitute. The AI buyer panel’s per-buyer prediction is meaningfully above chance but still less than half what a state-of-the-art statistical model achieves on real human data. Detailed pricing models and individual-level adoption forecasts still need human respondents. What an AI buyer panel does instead is compress the front end of the research process, letting you screen concepts, prune attributes, and pre-register hypotheses before fielding, so three exploratory human studies become one focused one.

How it is used

Our client has applied the validated AI buyer panel to its own product strategy. The team uses it to model new packaging concepts, test price sensitivity, and pre-screen feature trade-offs before committing additional human research. This tightens the front end of every product strategy decision.

Once calibrated against a single matched human study, the same persona bank can be re-run against any new design at any new price point at the cost of compute alone, and a small (about 20-respondent) human anchor recovers near-full accuracy on substantially new designs. In a category where every fielded study takes months and costs tens of thousands of dollars, that adds up quickly. A recent packaging study, for example, would have cost well over $60K and 10 weeks to field with human respondents; the AI buyer panel version was completed within a day.

The bottom line

An AI buyer panel is not a replacement for talking to customers. It is a way to do more with the conversations you have already had. You can turn a hundred recorded buyer calls into a panel that can be re-run on any design, at any price point, overnight, and that gets the order of buyer-segment adoption right.

The ingredients that make it work are simple in principle: real recorded conversations as source data, explicit decline conditions written into each persona, and a brief category overview at the start. The intuitive move that backfires, piling more detail onto personas without softening their rejection logic, is worth knowing before you build.

Pick the panel that matches the decision your team is trying to make. Not the one that produces the most impressive-sounding number.