Data that’s really, really big
No one goes around saying, “Gee, look at how big that data is!” Well, maybe some people, but they’re weird. At present, there is no unified definition of ‘big data’. Various stakeholders have diverse or self-serving definitions. A major software company defines big data as “a process” in which we “apply computing power to massive and highly complex datasets”. This definition implies a need for computer software – gosh, how convenient.
A more thoughtful approach was recently taken by two British researchers. For data to be considered “big”, they argue, it must have two of the following three characteristics:
- Size: big data is massively large in volume.
- Complexity: big data has highly complex multi-dimensional data structures.
- Technology: big data requires advanced analysis and reporting tools.
OK, not bad. Note that this definition sets no relative or absolute thresholds (e.g., that a dataset must be bigger than ‘x’), yet we know intuitively that the data housed at Amazon, Walmart, or Google probably meets these requirements. I like this approach because it does not reflexively imply a large number of subjects. Big data could be miles deep, yet just one respondent wide (think of continuous measures of emotion or cognition). Biometric data is an obvious fit.
Reg Baker (Market Strategies International) has said that ‘big data’ is a term that describes “datasets so large and complex that they cannot be processed or analyzed with conventional software systems”. Rather than size, complexity, or technology, he focused on sources:
- Transactions (sales, web traffic/search)
- Social media (likes, endorsements, comments, reviews)
- Internet of Things (IoT) (sensors or servo-mechanisms)
Perhaps so. Thinking only of sources, I would add biometric/observational data to this list. These data are inherently different: they are narrow yet still complex. Observational data might include experiences, ethnography, weather/celestial data, or any source that involves continuously measured activity. Biometric data includes all manner of physiological (sensor) measurement that is then analyzed using highly sophisticated software. In biometric research, the number of observations (subjects) is often fewer than 30, yet the number of data elements captured per subject is enormous.
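To make the “narrow but deep” point concrete, here is a minimal back-of-the-envelope sketch in Python. The panel size, sensor channels, sampling rate, and session length are hypothetical figures chosen only for illustration, not drawn from any particular study.

```python
# Hypothetical biometric study: few subjects, enormous data per subject.
n_subjects = 24                                      # assumed panel size (< 30)
sensors = ["heart_rate", "gsr", "eye_x", "eye_y"]    # assumed sensor channels
sample_rate_hz = 60                                  # assumed sampling rate per channel
session_minutes = 30                                 # assumed session length

# Samples captured for one subject across all channels for one session.
samples_per_subject = len(sensors) * sample_rate_hz * session_minutes * 60
total_samples = samples_per_subject * n_subjects

print(f"Data points per subject: {samples_per_subject:,}")   # 432,000
print(f"Data points in the study: {total_samples:,}")        # 10,368,000
```

Two dozen respondents would be a tiny sample by survey standards, yet under these (assumed) settings the study still generates millions of data points.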
So, when is data “big”?
A lay person would say that ‘big data’ simply implies massiveness. That is not wrong, but it makes the term somewhat of a misnomer: we need to think of big data in a three-dimensional way. Big data requires “massiveness” in three areas:
- Data elements (i.e., variables)
- Observation units (i.e., subjects)
- Longitudinal units (i.e., time)
Big data typically has a longitudinal aspect (i.e., data collected continuously or over multiple time periods) with frequent updates (e.g., repeat purchases). Additionally, the tools needed to analyze big data (e.g., neural networks, SEM, time series) are significantly different from those used for less complex datasets (e.g., spreadsheets). Much like the word “fast”, the word “big” will evolve, too.
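As a rough illustration of that three-dimensional framing, the sketch below builds a toy dataset indexed by subject, week, and variable. The names and sizes are invented for the example, and pandas is simply one convenient way to hold such a structure; “big” just means growth along any of the three axes.

```python
import numpy as np
import pandas as pd

# Toy sizes and names, all invented for illustration.
subjects = ["S001", "S002", "S003"]                        # observation units
weeks = ["W01", "W02", "W03", "W04"]                       # longitudinal units
variables = ["purchases", "site_visits", "dwell_seconds"]  # data elements

# One row per (subject, week), one column per variable.
index = pd.MultiIndex.from_product([subjects, weeks], names=["subject", "week"])
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.poisson(5, size=(len(index), len(variables))),
                    index=index, columns=variables)

print(data.head())  # frequent updates would simply append new weekly slices
```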
Better, cheaper, faster – or just bigger?
In the last 5-10 years we have seen a shift away from reliance on survey research data and analysis, towards a greater belief that ‘big data’ will tell us everything we need to know about trends, customers, and brands. This is reflected in the following data and analysis trends:
- From data/analytics that are scarce and costly, to ubiquitous and cheap. When data is everywhere, and basically free, we assume that there must be more we can do with it.
- From testing specific hypotheses, to relationships “discovered” by data mining. This is the million-monkeys-at-typewriters hypothesis: mine enough variables and some relationships will look significant by chance alone (a sketch after this list illustrates the point).
- From seeking feedback directly, to presuming needs from data patterns. This implies more weight on correlations and modeling than conversation.
- From a foundation of sampling theory and statistical testing, to a presumption of normality (and a belief that every difference is meaningful simply because the data is ‘big’).
- From data gathered by design, to data “found” in other processes (so, for example, GPS data in a transaction record).
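To see why mined “relationships” deserve skepticism (the million-monkeys point in the second bullet above), here is a minimal sketch assuming NumPy and SciPy are available: it generates pure noise, mines every pairwise correlation, and still surfaces plenty of “significant” pairs at the usual 0.05 threshold. The variable count and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_rows, n_vars = 200, 50                 # arbitrary sizes for the illustration
X = rng.normal(size=(n_rows, n_vars))    # pure noise: no real relationships exist

# Mine every pair of variables for a "significant" correlation.
false_finds = 0
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        r, p = stats.pearsonr(X[:, i], X[:, j])
        if p < 0.05:
            false_finds += 1

n_pairs = n_vars * (n_vars - 1) // 2
print(f"'Significant' correlations found in pure noise: {false_finds} of {n_pairs} pairs")
# Roughly 5% of the 1,225 pairs (about 60) clear the threshold by chance alone.
```

Chance alone produces dozens of “discoveries” here, which is exactly the kind of finding that looks exciting on a dashboard and means nothing for the business.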
These trends are not “wrong” per se; rather, they represent a shift away from critical thinking. ‘Big data’ is shrouded in hype and over-promise. Marketing management’s dreams of never-ending insights from big data are just that: dreams. Dashboards, visualizations, and marketing mix models are alluring representations of ‘big data’ – some are beautiful and artistic. Yet, isn’t the goal to use ‘big data’ to drive profitable decision-making?
‘Big data’ and survey research – BFFs, like, forever
Survey research must share the bed with ‘big data’, though they will continue to fight over the sheets. Big data can free the survey researcher from spending time collecting merely descriptive data (for which human memory is notoriously foggy) that might otherwise reside in transactional databases. This lets survey research do what it does best: gather opinions and reactions to stimuli.
Over time we will find out that ‘big data’ does a wonderful job of recording behavior, but does less well at predicting it. In the near term, there will be a redeployment of resources away from primary and survey research. Some companies will rely on big data too heavily, and in avoiding direct discussions with customers, will suffer.
Opportunity awaits companies that actively listen, rather than relying on a purely modeled approach. Bridging the gap between self-reported, biometric, and observed behaviors is likely to become the next really “big” thing. Happy Halloween!!