Sometimes it is better to sit still and
wait until the dust settles. Slowly but surely, the dust is settling around the
Big Data buzz and what I am looking at does not impress me.
There are a lot of sloppy definitions of Big Data around. I have read incredible white papers, books and articles which all boil down to: “We don’t know what it is, but it is coming! Be prepared (to open your wallet).” In this article, I make an attempt to put certain aspects of Big Data into a Business Intelligence perspective and come to a definition that is more formal and truly distinguishes Big Data from just being a bigger database. I have a few challenges for you that will hopefully elicit some response and debate. These are the main challenges, intentionally served as bullet points as they will make bullet holes in some people’s assumptions.
- Three V’s are not enough to describe the phenomenon
- There is a word missing in the phenomenon’s naming
- It has always been around: nihil novi sub sole
- What’s really Big about Big Data
Amid all the vaguely defined concepts, vendor-pushed white papers and publications, three “Vs” are supposed to define Big Data completely.
Gartner, always ready to define a new market segment as a target for analyses and studies (and as a new revenue source), introduced these three “Vs” somewhere around 2008.
“Variety”, “Velocity” and “Volume” are supposed to describe the essence of Big Data. Let’s see if this holds any water. “Velocity”, the speed with which data must be processed, and “Volume” are relative and subject to technological advances. When I did analysis for a large telecom organisation in the mid-nineties, the main challenge was to respond fast and adequately to potential churn signals hidden in the massive amounts of call detail records (CDRs). Those CDRs were the Big Data of the nineties.
Today, with the ever increasing volumes of social, mobile and sensor data, velocity and volume have gone up, but so has the speed of massively parallel processing, storage and retrieval. Speed and volume also pose an identical challenge for every player in the competition: if you need to be on par with your competitors, take out your credit card, book some space on AWS or Azure and Bob’s your uncle. The entire Big Data infrastructure is designed to function on cheap hardware.
Two “Vs” that matter: Variety and Volatility
"Variety"
One of the real differentiators is “Variety”: the increased variety of data types is what really defines Big Data.
This “V” you won’t read much about in Big Data hype articles, yet it is a serious challenge. From the early EBCDIC and ASCII files to UTF-8, XML, BLOBs, audio, video, sensor and actuator data, making sense of all these data types can be a pain. The variety within a single data type may also add to the complexity. Think of the various separator marks in web logs. And it gets worse when analysing unstructured data like natural language, where culture and context kick in: events, subcultures, trends, relationships, etc. Even a simple mood analysis of US tweets about the death of Bin Laden came out negative, because a simplistic count of the word “death” put the tweets on the minus side without taking the context into account.
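To make the context problem tangible, here is a minimal sketch in Python (with hypothetical word lists, not the lists used in that analysis) of how a bare keyword count turns a celebratory tweet into a negative one:

```python
# Naive keyword-count mood scoring; the word lists are hypothetical.
NEGATIVE_WORDS = {"death", "dead", "killed"}
POSITIVE_WORDS = {"happy", "relief", "celebrate"}

def naive_mood_score(tweet: str) -> int:
    """Count positive minus negative keywords; context is ignored entirely."""
    words = tweet.lower().replace(",", " ").split()
    return (sum(w in POSITIVE_WORDS for w in words)
            - sum(w in NEGATIVE_WORDS for w in words))

# Most readers would call this tweet celebratory or relieved, yet the bare
# count scores it -2 because "death" and "dead" each land on the minus side.
tweet = "the death of bin laden confirmed, he is dead, what a day for the US"
print(naive_mood_score(tweet))  # -2
```

Getting the sign right requires at least some notion of the event and the speaker’s stance, which is exactly where variety in language, culture and context makes the analysis hard.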
"Volatility"
This is about the variations in sentiment, value and predictive power that are so typical of an important segment of Big Data. A few examples to make this clear.
If I give a “like” to Molson and Stella beers, will this still be valid next week, or next month?
If I measure consumer behaviour in its broadest sense (information requests, complaints, e-mails to customer service, reviews on websites, purchases, returns and payment behaviour) and I try to link this to behaviour on social media, will it correlate with the previous measures? Think of predicting election results on the basis of opinion polls: don’t we often say one thing while acting totally differently?
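As a back-of-the-envelope illustration of volatility (all numbers and the 30-day half-life are made up for the example), the sketch below decays the predictive weight of a “like” over time and checks how well weekly social-media sentiment tracks actual purchases:

```python
import math

# Hypothetical assumption: a "like" loses half its predictive value every 30 days.
HALF_LIFE_DAYS = 30.0

def like_weight(age_days: float) -> float:
    """Exponentially decayed weight of a 'like' signal."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

print(round(like_weight(0), 2), round(like_weight(7), 2), round(like_weight(60), 2))
# 1.0 0.85 0.25

def pearson(x, y):
    """Plain Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical weekly series: stated sentiment on social media vs. actual purchases.
weekly_sentiment = [0.62, 0.71, 0.40, 0.55, 0.30, 0.66]
weekly_purchases = [120, 118, 131, 115, 129, 117]

# Roughly -0.84 here: in this made-up example people say one thing and buy another.
print(round(pearson(weekly_sentiment, weekly_purchases), 2))
```

Whether such correlations hold, and for how long, is exactly the volatility question.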
This is the real challenge of Big Data, and the rewards are great. The organisation that can extract moods, attitudes and intentions from unstructured data will have better forecasts and respond faster to changes in the market.
This way, Big Data realises a dream from the 1960s: establishing a relationship between communication input and its impact on sales results. In other words, the DAGMAR [1] model from Russell Colley can materialise through the analysis of unstructured data from a broad range of sources.
[1] DAGMAR: the abbreviation of Defining Advertising Goals for Measured Advertising Results.