Why data cleaning is failing: A call for pre-survey fraud prevention

4 June
Author: Ryan Rothe

Fraud in market research is a major threat, with nearly one-third of respondents flagged as fraudulent in a recent study. Evolving tactics like device spoofing and AI-generated text are outpacing traditional defenses.

5 min read

We’re seeing more fraud, not less, and it’s evolving faster than our defenses.

Fraud in market research is no longer a fringe issue. It’s a systemic threat that touches nearly every part of online sampling. In a recent research-on-research study, we simulated a typical survey and found that nearly one-third of respondents were clearly fraudulent or flagged for quality concerns. Much of that data passed standard industry checks without raising alarms.

This isn’t just a matter of scale. It’s about how fraud is changing. Fraudsters now use tools like device spoofing, VPNs, emulators, and even AI-generated text to mimic legitimate behavior. Many of the industry’s go-to defenses, such as straightlining flags, red herrings, or simple logic traps, can no longer keep up.
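For contrast, here is roughly what those traditional defenses look like in code. This is a minimal sketch with invented column names, a toy dataset, and an assumed trap answer; it is not any specific vendor’s implementation.

```python
import pandas as pd

# Toy respondent data (column names and values are invented for illustration).
# grid_q1..grid_q5 are ratings from one matrix question; trap_q is a red-herring
# item whose only acceptable answer is assumed to be 3.
responses = pd.DataFrame({
    "grid_q1": [4, 2, 5, 3],
    "grid_q2": [4, 3, 5, 1],
    "grid_q3": [4, 2, 5, 4],
    "grid_q4": [4, 5, 5, 2],
    "grid_q5": [4, 1, 5, 3],
    "trap_q":  [3, 3, 1, 3],
})

grid_cols = [c for c in responses.columns if c.startswith("grid_q")]

# Straightlining flag: identical answers across every item in the grid.
responses["straightlined"] = responses[grid_cols].nunique(axis=1) == 1

# Red-herring flag: failed the trap question.
responses["failed_trap"] = responses["trap_q"] != 3

print(responses[["straightlined", "failed_trap"]])
```

Checks like these only ever see the submitted answers. A bot that randomizes its grid ratings and recognizes the trap item passes both, which is exactly the gap described below.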

Post-survey cleaning can’t catch what it can’t see

Cleaning has long been treated as a safety net for bad data. But it’s a reactive solution. It can only evaluate what’s already made it through the door, and when fraud is designed to blend in, there’s often nothing visibly wrong.

In the same study, we reviewed open-ended responses that appeared thoughtful and convincing. Yet every single one came from a known fraudulent source. Based on content alone, these responses would have passed; their metadata told another story.

This is the most dangerous part of today’s threat landscape. Fraud looks better than ever. And once flawed data enters a dataset, it’s difficult to remove without risking your timelines or rewriting your results.

Most quality issues are already baked in before fieldwork begins

Fraud is not just slipping through the cracks. It’s reaching the survey environment long before we try to stop it. The assumption that fraud can be fixed after the fact is still widely held, but it doesn’t match today’s reality. By the time a fraudulent respondent is flagged during cleaning, incentives may already be paid, quotas filled, and conclusions drawn.

What’s needed is a proactive shift: stop fraud before it ever enters a survey. That starts by analyzing how a response is generated, not just what it says. Metadata like browser data, network behavior, device characteristics, and traffic source information can all provide warning signs. Paradata (data about how someone interacts with a survey) can help surface issues long before the first question is even answered.
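Concretely, a pre-survey check of this kind might score each incoming session against a handful of metadata signals before the first question even loads. The sketch below is a minimal illustration under that assumption; the signal names, weights, and scoring logic are all hypothetical, not a description of any particular platform.

```python
# Hypothetical entry-point signals, each captured before the survey starts.
SIGNAL_WEIGHTS = {
    "ip_timezone_mismatch": 0.3,   # browser timezone disagrees with IP geolocation
    "datacenter_ip": 0.4,          # IP belongs to a hosting provider or known VPN range
    "headless_browser": 0.5,       # automation markers such as a webdriver flag
    "duplicate_fingerprint": 0.4,  # device fingerprint already seen on this project
}

def entry_risk_score(signals: dict[str, bool]) -> float:
    """Sum the weights of every signal that fired, capped at 1.0."""
    score = sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))
    return min(score, 1.0)

# Example: a session arriving over a datacenter IP with a reused fingerprint.
print(entry_risk_score({"datacenter_ip": True, "duplicate_fingerprint": True}))  # 0.8
```

Anything scoring above a project-specific threshold can be diverted before it ever counts toward a quota.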

Focus on signals, not symptoms

Checking responses for logic or strange wording is no longer enough. As fraudsters get better at writing human-like answers, we have to look elsewhere for detection. What matters is not just what someone says, but how they behave.

That’s where behavioral data comes in: information like response cadence, device and network configuration, and whether a virtual machine or emulator is in use. These signals reveal patterns that don’t show up in content alone. They help catch both known fraud and new tactics that haven’t yet been widely identified.
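Here is a minimal sketch of one such signal, response cadence, assuming per-question timings are captured as paradata. The timings and thresholds below are invented for illustration; real systems combine many signals rather than relying on any one.

```python
import statistics

def cadence_flags(per_question_seconds: list[float]) -> dict[str, bool]:
    """Flag suspicious timing patterns in per-question response times."""
    mean = statistics.mean(per_question_seconds)
    stdev = statistics.stdev(per_question_seconds)
    return {
        "too_fast": mean < 2.0,              # assumed floor: ~2s average per question
        "too_uniform": stdev / mean < 0.15,  # near-zero variation suggests scripting
    }

human = [6.2, 14.8, 4.1, 22.5, 9.3]   # natural variation across questions
bot   = [3.1, 3.0, 3.2, 3.1, 3.0]     # metronome-like cadence

print(cadence_flags(human))  # {'too_fast': False, 'too_uniform': False}
print(cadence_flags(bot))    # {'too_fast': False, 'too_uniform': True}
```

Note what the bot example shows: it isn’t fast enough to trip a speeder check, yet its uniformity gives it away. That is the kind of pattern content review never surfaces.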

In other words, these systems look for patterns, not just red flags.

The industry needs smarter systems and shared accountability

Stopping fraud isn’t just about building better tech. It requires a mindset shift across the industry. Too often, researchers lean on post-survey cleaning to solve issues that should have been prevented earlier. At that stage, the fraudster has been paid, the data has been used, and the trust in your findings may already be eroded.

A smarter approach means real-time checks at the top of the funnel. That includes evaluating behavioral risk factors, such as inconsistent timestamps, unusual browser activity, or mismatched device details, and stopping fraud before it reaches the respondent pool.
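As a sketch of what such a gate might look like, the function below maps an aggregate risk score (for instance, one produced from signals like those sketched earlier) to an admit, review, or block decision before the session reaches the survey. The thresholds are placeholders; in practice they would be tuned per project and traffic source.

```python
from enum import Enum

class Decision(Enum):
    ADMIT = "admit"
    REVIEW = "review"   # route to extra verification instead of the survey
    BLOCK = "block"     # reject before entering the respondent pool

def gate(risk_score: float) -> Decision:
    """Map a risk score in [0, 1] to a top-of-funnel decision (thresholds assumed)."""
    if risk_score >= 0.7:
        return Decision.BLOCK
    if risk_score >= 0.4:
        return Decision.REVIEW
    return Decision.ADMIT

print(gate(0.8))   # Decision.BLOCK
print(gate(0.5))   # Decision.REVIEW
print(gate(0.1))   # Decision.ADMIT
```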

But tools alone won’t fix the problem. Researchers and suppliers have to be willing to hold each other accountable. That means asking the hard questions when problems arise, instead of falling back on easy excuses like “the screener was too long” or “the incidence rate was low.” It means taking the time to understand where issues originate and working collaboratively to address them.

Fraud can’t be solved by outsourcing the problem or accepting it as a cost of doing business. It takes vigilance, transparency, and alignment on what quality should look like.

Cleaning isn’t a cure-all, and it never was

Cleaning still has a role, but it was never built to catch fraud that looks clean. If we want to rebuild trust in online research, prevention needs to be built into the process from the very beginning, not treated as an optional final step.

That starts with accepting that some fraud will appear deceptively valid. And it ends with changing how we evaluate data: from the first click, not the final review.

Ryan Rothe
Chief Revenue Officer at Rep Data