I rubbed my hands together as another huge dataset splashed into my inbox. "Fire up the old stats engine!" I declaimed, and my poor colleagues exchanged glances, bracing themselves for the usual random data-mining exercise.
With 40 columns and 20,000 rows of data, we had plenty of scope to go correlation hunting. Forty columns give 780 possible pairwise relationships (40 × 39 / 2), and testing each one at a 95% confidence level would, on average, flag around 39 of them as significant even if none of them were real. And once we started looking at subgroups, there could be hundreds. Easily enough to drown any meaningful insights in a mass of speculative relationships driven almost entirely by pure noise.
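To make that arithmetic concrete, here is a minimal sketch that builds a dataset of pure noise with the same shape (20,000 rows, 40 independent columns) and counts how many pairwise Pearson correlations come out "significant" at the 5% level. The function names are hypothetical, it assumes numpy is available, and it uses a normal approximation for the p-values (reasonable at this sample size) rather than an exact t-test. It is an illustration of the multiple-comparisons problem, not the analysis we actually ran.

```python
import math

import numpy as np


def expected_false_positives(n_columns: int, alpha: float) -> float:
    """Expected number of 'significant' pairwise tests when no real effect exists."""
    n_pairs = math.comb(n_columns, 2)  # 40 columns -> 780 unordered pairs
    return n_pairs * alpha


def simulate_false_positives(n_rows: int, n_columns: int, alpha: float, seed: int = 0) -> int:
    """Count pairwise Pearson correlations with p < alpha in data that is pure noise."""
    rng = np.random.default_rng(seed)
    # Independent standard-normal columns: by construction, no real relationships.
    data = rng.standard_normal((n_rows, n_columns))
    r = np.corrcoef(data, rowvar=False)  # n_columns x n_columns correlation matrix
    i, j = np.triu_indices(n_columns, k=1)  # each unordered pair exactly once
    r_pairs = r[i, j]
    # t statistic for the null hypothesis rho = 0.
    t = r_pairs * np.sqrt((n_rows - 2) / (1 - r_pairs**2))
    # With ~20,000 rows the t distribution is essentially standard normal,
    # so the two-sided p-value is P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p = np.array([math.erfc(abs(x) / math.sqrt(2)) for x in t])
    return int(np.sum(p < alpha))


if __name__ == "__main__":
    print(expected_false_positives(40, 0.05))
    print(simulate_false_positives(20_000, 40, 0.05, seed=1))
```

`expected_false_positives(40, 0.05)` returns 39.0. The simulated count changes with the seed, but it tends to hover around that figure, every one of them a "finding" with nothing behind it.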
But then I got a call from my friend David Hume. You remember him: Scottish philosopher, icon of rationality, died in 1776. Very interested in a little thing called cause and effect. David rang me up on a WhatsApp voice call (roaming charges to the 18th century are dreadful) and reminded me of his famous motto: correlation is not causation. You should have trademarked that, Dave. Anyway, I sighed. Held to account, yet again, by the reasoned voice of science.
My colleague Siyanda and I stared mournfully at this vast dataset. Where in a static dump of data can we find causal relationships? Traditionally you'd have to do an experiment: change variable x and see whether y changes consistently. But we already had the data. No chance to go back and re-engineer.
We talked about it. Our usual practice for uncovering insights like this is to ask the respondents: tell us a story. Stories have causality: something happened, and because of that, something else happened. By hearing people's stories, we can understand their causal model of the world. In this dataset, we had indeed asked a "storyhearing" question: Tell us about your life one year from now.
Unfortunately, the stories we got back in this column were very limited. Respondents had already spent a bit of time answering open-ended questions, and perhaps were too fatigued to open up fully about the next year of their lives.
Siyanda had started to explore the idea of qualitatively understanding the people within each of the ten cultures in the study. From their words, we should be able to tell a story about each culture's archetypal person, as represented in the data.
Then we had an idea: could we assemble a story from the different answers to the questions in the dataset? The columns include: Where did your life begin? Describe yourself. What are your long-term goals? And the core elements of a classic story structure include a protagonist, a starting point, an inciting incident, a motivation, a series of events and an ending. On another WhatsApp call, with me in the back of a car and Siyanda somewhere between Chicago and Botswana, we realised that this dataset, entirely by accident, tells a complete life story for every respondent!