From data lakes to data dams: Improving the impact of market research - Part one
Part one of a two-part series on how to best utilize market research data that is stored in an organization’s data lake

Setting the stage: the power of dams
In the 19th century, dams played an important role in society, helping workers transport large, heavy logs from one location to another. In the absence of roads and trucks, logs were floated down rivers and across reservoirs, creating a greater supply of materials for the construction of houses and buildings.
The idea of creating a barrier to store up water had other benefits too, such as turning the turbines that powered logging mills. The processed logs themselves weren’t the desired outcome; what mattered was what could be built with them - from houses to offices, fishing boats to musical instruments. Today, there’s a new type of dam that delivers a similar kind of future benefit to an organization, and it happens to share a lot in common with data lakes.
In the modern commercial world, client-side market research teams have access to data lakes, and these can contain incredible value, as we’ll see below. The alternative, which we’ll also explore below, is building a market research data dam, which collects and stores valuable research data in a way that can deliver real benefit to organizations.
Letting your research speak: Elevating insights from the data lake
Data lakes have become central to how modern organizations store and manage information. These flexible repositories can house vast amounts of raw, unstructured, and structured data, enabling a more comprehensive, integrated view of business operations, customers, and the marketplace. As global data creation surges toward a projected 181 zettabytes by 2025, companies are increasingly turning to data lakes to accommodate the scale and variety of inputs that fuel decision-making across departments.
Fueled by this growth, the global data lake market reached $16.6 billion in 2023 and is expected to exceed $90 billion by 2032. Much of this growth is driven by cloud-based deployments, now accounting for nearly 60% of the market, and by the expanding role of Artificial Intelligence (AI) in transforming how data is used. AI and machine learning technologies are especially well suited to work with the raw formats stored in data lakes: uncovering patterns, predicting trends, and enabling automation across everything from customer engagement to operational planning.
Yet despite this potential, many organizations still struggle to extract value from all their data, especially the survey results, brand tracking, and marketing feedback that often sit relatively unused within these systems. That’s where the right tools make a difference: technologies purpose-built to integrate with data lakes, surface marketing-relevant insights from market research data, and help teams move this type of complex data from storage to strategic action. When set up effectively, a data lake isn’t just a repository. Like a well-engineered dam, it can control the flow, build momentum, and direct the power of data toward insight-driven growth, while also providing a secure location that protects the integrity of the data itself.
It also supports time-sensitive decision-making by helping ensure that only relevant, recent datasets are fueling analysis. Without this safeguard, organizations risk drawing conclusions based on trends that no longer reflect consumer realities, or worse, combining insights from entirely different eras into the same output. Purpose-built platforms help mitigate this by preserving metadata, enforcing recency checks, and allowing insights teams to maintain confidence in the timeliness and relevance of the data being used.
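As a rough sketch of what such a recency check might look like in practice (the catalog entries, field names, and one-year threshold below are all hypothetical, not any particular platform’s API):

```python
from datetime import datetime, timedelta

# Hypothetical catalog entries: each dataset in the lake carries metadata
# recording when its fieldwork was completed.
datasets = [
    {"name": "brand_tracker_wave_12", "fieldwork_end": datetime(2024, 11, 4)},
    {"name": "brand_tracker_wave_3",  "fieldwork_end": datetime(2021, 2, 15)},
    {"name": "ad_test_q3",            "fieldwork_end": datetime(2024, 8, 20)},
]

def recent_only(datasets, max_age_days=365, today=None):
    """Keep only datasets whose fieldwork ended within the allowed window."""
    today = today or datetime.now()
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in datasets if d["fieldwork_end"] >= cutoff]

# Downstream analyses draw only from this filtered list, avoiding the
# "insights from entirely different eras" problem described above.
for d in recent_only(datasets):
    print(d["name"])
```

In practice, purpose-built platforms handle this automatically by preserving fieldwork dates and other metadata alongside the data itself, rather than leaving the check to ad hoc scripts.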
Building a smarter stack: Why survey data demands more than BI
Inside an enterprise data lake, data is in constant motion, flowing in from a variety of sources and being accessed for analysis and reporting. Survey data, behavioral signals, CRM inputs, social listening feeds, sensor logs, sales data, and more all converge in this centralized environment. The flexibility of the data lake allows organizations to accommodate a diverse array of inputs without enforcing rigid schemas at the point of ingestion.
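To make the “no rigid schema at ingestion” idea concrete, here is a minimal schema-on-read sketch in Python (the record shapes and field names are invented for illustration): raw records from different sources are stored exactly as they arrive, and structure is imposed only when an analysis reads them.

```python
import json

# Records from different sources arrive with different shapes; the lake
# stores them verbatim rather than forcing one schema at ingestion.
raw_records = [
    '{"source": "survey", "respondent": "r101", "nps": 9}',
    '{"source": "crm", "account_id": "A-77", "segment": "enterprise"}',
    '{"source": "social", "handle": "@user", "sentiment": 0.62}',
]

def read_with_schema(raw_records, source, fields):
    """Schema-on-read: project only the records and fields an analysis needs."""
    rows = []
    for line in raw_records:
        rec = json.loads(line)
        if rec.get("source") == source:
            rows.append({f: rec.get(f) for f in fields})
    return rows

# Only at read time do we decide which fields matter for this analysis.
print(read_with_schema(raw_records, "survey", ["respondent", "nps"]))
```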
But simply storing this information isn’t enough. To create value, you need tools that can intelligently prepare, analyze, and activate this data. Companies rely on an expanding array of technologies, from cloud infrastructure and data-wrangling tools to advanced analytics and visualization platforms, to transform the data into usable insights. Business Intelligence (BI) solutions such as Power BI and Tableau are widely used for financial and operational reporting and offer strong capabilities for dashboards and high-level metrics.
However, these tools often fall short when applied to market research data. Surveys typically include open-ended text, multi-response formats, nested variables, sampling weights, and metadata that provide critical context. General-purpose BI tools frequently flatten or omit these elements, thereby stripping the data of its richness and limiting the depth of insight that can be achieved. As a result, insights professionals often resort to manual coding, external statistical packages, or custom scripting to complete basic analytical tasks. These workarounds slow down processes, introduce risk, and add technical debt to the system.
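To see what gets lost, consider sampling weights, one of the elements general-purpose BI tools commonly flatten away. The following is a small sketch with invented respondent data, showing how a weighted result can differ from the unweighted average a flattened view would report:

```python
# Hypothetical survey responses: a 0-10 purchase-intent score plus the
# sampling weight that corrects for over/under-represented groups.
responses = [
    {"respondent": "r1", "intent": 9, "weight": 0.5},   # over-sampled group
    {"respondent": "r2", "intent": 3, "weight": 2.0},   # under-sampled group
    {"respondent": "r3", "intent": 8, "weight": 0.5},
    {"respondent": "r4", "intent": 4, "weight": 2.0},
]

# What a BI tool that drops the weights would report:
unweighted = sum(r["intent"] for r in responses) / len(responses)

# What the survey design actually calls for:
weighted = (sum(r["intent"] * r["weight"] for r in responses)
            / sum(r["weight"] for r in responses))

print(f"unweighted mean: {unweighted:.2f}")  # 6.00
print(f"weighted mean:   {weighted:.2f}")    # 4.50
```

The gap between 6.00 and 4.50 is exactly the kind of distortion that goes unnoticed when weights are stripped out at ingestion or ignored in reporting.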
In short, trying to analyze complex survey data with general-purpose BI tools is like forcing an industrial-scale flow of water through a recreational dam. The infrastructure simply wasn’t built for that level of pressure or precision. Without the right structure in place, the value seeps away, or bottlenecks form that slow progress.
These limitations become even more apparent as organizations seek to leverage AI to enhance insights generation. According to a recent report from the Market Research Institute International, 62 percent of research teams report that most or some of their members are using AI, up significantly from 39 percent the year prior. AI is increasingly being applied to automate tasks such as text summarization, trend detection, and smart reporting. However, AI systems are only as effective as the data infrastructure they operate within. Without a platform that preserves the structure and complexity of research data, even the most advanced models can produce results that are misleading or incomplete.
Read the rest of this piece in the second article installment, which will be published next week!
John Bird
Executive Vice President at Infotools

John Bird currently serves as an Executive Vice President for Infotools (www.infotools.com). His experience spans B2B and B2C work, and he has conducted research programs in over 70 countries. He is focused on fueling curiosity and moving clients from three-ring binders and “death by PowerPoint” to Infotools Harmoni, a SaaS data design, investigation and reporting platform.