Friday, 2 May 2014

Questions to Ask Ralph Kimball on 10 June at 't Spant in Bussum (Netherlands)

Dear Ralph,

I know you’re a busy man, so I won’t take up too much of your time with this post. I look forward to meeting you on June 10 at 't Spant in Bussum for an in-depth session on Big Data and your views on the phenomenon.
In one of your keynotes you will present your vision of how Big Data drives the business and IT to adapt and evolve. Let me first of all congratulate you on the title of your keynote. It proves that a world-class BI and data warehouse veteran is still on top of things, which we can’t say for some other gurus of your generation, but let’s not dwell on that.
I have been studying the Big Data phenomenon from my narrow perspective of business analysis and BI architecture, and here are some of the questions I hope we can tackle during your keynote session:

1. Do you consider Big Data something you can fit entirely into star schemas? I have known since The Data Webhouse Toolkit days that semi-structured data like web logs can find a place in a multidimensional model, but some of what Big Data produces is, to my knowledge, not fit for persistent storage. Yet I believe that a derived form of persistent storage may be a good idea. Let me give you an example. Imagine we can measure the consumer mood for a certain brand on a daily basis by scanning social media postings. Instead of creating a junk-like dimension, we could build a star schema with at least the following dimensions: mood, social media source, time, location and brand, around a central fact table holding the mood score on a seven-point Likert scale. The real challenge will lie in correctly converting the text strings into the proper Likert score using advanced text analytics. Remember the wrong interpretation of the Osama Bin Laden tweets in early May 2011? The program interpreted “death” as a negative mood when the entire US was cheering the expedient demise of the terrorist.
Figure 1: An example of derived Big Data in a multidimensional schema

2. How will you address the volatility issue? Big Data’s most convincing feature is not volume, velocity or variety, which have always been relative to the state of the art. No, volatility is what really characterizes Big Data. I refer to my article here, where I point out that Big Behavioural Data is the true challenge for analytics, as emotions and intentions can be very volatile and the Law of Large Numbers may not always apply.
3. Do you see a case for data provisioning strategies to address the above two issues? By data provisioning I mean a transparent layer between the business questions and the BI architecture, where ETL provides answers to routine or planned business questions and virtual data marts produce answers to ad hoc and unplanned business questions. If so, what are the major pitfalls of virtualization for Big Data analytics?
4. Do you see the need for new methodologies and new modeling methods or does the present toolbox suffice?
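
To make the star schema from question 1 concrete, here is a minimal sketch in Python with SQLite. All table and column names, as well as the sample rows, are my own assumptions for illustration; only the five dimensions and the seven-point Likert measure come from the description above. The text analytics step that turns postings into scores is deliberately left out.

```python
import sqlite3

# Hypothetical names throughout: only the list of dimensions and the
# seven-point Likert measure are taken from the text.
DDL = """
CREATE TABLE dim_mood     (mood_key     INTEGER PRIMARY KEY, mood_label    TEXT NOT NULL);
CREATE TABLE dim_source   (source_key   INTEGER PRIMARY KEY, source_name   TEXT NOT NULL);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, calendar_date TEXT NOT NULL);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, country       TEXT NOT NULL);
CREATE TABLE dim_brand    (brand_key    INTEGER PRIMARY KEY, brand_name    TEXT NOT NULL);

CREATE TABLE fact_brand_mood (
    mood_key     INTEGER NOT NULL REFERENCES dim_mood(mood_key),
    source_key   INTEGER NOT NULL REFERENCES dim_source(source_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    brand_key    INTEGER NOT NULL REFERENCES dim_brand(brand_key),
    -- Derived measure: the mood score on a seven-point Likert scale.
    mood_score   INTEGER NOT NULL CHECK (mood_score BETWEEN 1 AND 7)
);
"""


def build_schema(conn):
    """Create the five dimension tables and the central fact table."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)


def load_sample(conn):
    """Insert a few made-up rows so the schema can be queried."""
    conn.executemany("INSERT INTO dim_mood VALUES (?, ?)",
                     [(4, "neutral"), (6, "positive")])
    conn.execute("INSERT INTO dim_source VALUES (1, 'ExampleNetwork')")
    conn.execute("INSERT INTO dim_date VALUES (20140502, '2014-05-02')")
    conn.execute("INSERT INTO dim_location VALUES (1, 'Netherlands')")
    conn.execute("INSERT INTO dim_brand VALUES (1, 'ExampleBrand')")
    # Two scored postings for the same brand on the same day.
    conn.executemany(
        "INSERT INTO fact_brand_mood VALUES (?, ?, ?, ?, ?, ?)",
        [(6, 1, 20140502, 1, 1, 6), (4, 1, 20140502, 1, 1, 4)],
    )
    conn.commit()


def average_mood(conn, brand_name):
    """Return (calendar_date, average mood score) rows for one brand."""
    query = """
        SELECT d.calendar_date, AVG(f.mood_score)
        FROM fact_brand_mood AS f
        JOIN dim_date  AS d ON d.date_key  = f.date_key
        JOIN dim_brand AS b ON b.brand_key = f.brand_key
        WHERE b.brand_name = ?
        GROUP BY d.calendar_date
        ORDER BY d.calendar_date
    """
    return conn.execute(query, (brand_name,)).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_schema(connection)
    load_sample(connection)
    print(average_mood(connection, "ExampleBrand"))
```

The fact table stores only the derived score, not the raw postings, which is the "derived form of persistent storage" I have in mind. The CHECK constraint rejects any score outside the seven-point scale, so a misbehaving scoring program fails loudly at load time rather than polluting the daily averages.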

It’s been a while since we met and I really look forward to meeting you in Bussum, whether you answer these questions or not. 


Kind regards,

Bert Brijs
