Sunday, 19 October 2014

Defining Business Analysis for Big Data


Introduction to an enhanced methodology in business analysis

Automating the Value Chain


In the beginning of the Information Era, there was business analysis for application development. Waterfall methods, Rapid Application Development, Agile methods... all were based on delivering a functioning piece of information technology that supports a well-defined business process. There are clear signs of an evolution in the application development area.
Core operations like manufacturing and logistics were the first to automate human tasks, and the IT department was called the “EDP department”. Some readers will need to look up that abbreviation; I can spare them the time: Electronic Data Processing, a name that indicated clearly that the main challenge was managing the data from these primary processes.
Information evolves from supporting business processes to enabling (new) business processes

This schema gives a few hints on the progress made in automating business processes. The core operations came first: finance, logistics and manufacturing, which evolved into Enterprise Resource Planning (ERP). Later, sales, marketing and after-sales service evolved into customer relationship management, which was in turn extended into Enterprise Relationship Management (ERM), incorporating employee relationship management and partner relationship management. Finally, ERP and ERM merged into massive systems claiming to be the source of all data. The increase in productivity and processing power of the infrastructure enabled an information layer that binds all these business processes and interacts with the outside world via standardized protocols (EDI, web services based on SOAP or REST).
The common denominator of these developments: crisp business analysis, enabling accurate system designs, was needed to meet the business needs.

The "Information is the New Oil Era"

Already in the mid-nineties, Mike Saylor, the visionary founder and CEO of MicroStrategy, stated that information is the new oil. Twenty years later, Peter Sondergaard of Gartner repeated his dictum and added “and analytics is the combustion engine”. A whole new discipline, announced as early as the 1950s, emerged: Business Intelligence (BI): connecting all available relevant data sources to come up with meaningful information and insights that dramatically improve corporate performance.
The metaphor remains powerful in its simplicity: drill for information in the data and fuel your organization’s growth with better decision making.
Yet the consequences of this new discipline for the business analysis practice went unnoticed by most business analysts, project managers and project sponsors. The majority were still using the methods from the application development era. I admit that in the late nineties I also used waterfall concepts in project management and approached the products of a BI development track as an application where requirements gathering would do the trick. But it soon became clear to me that asking for requirements from a person who has only an embryonic idea of what he wants is not the optimal way: in 90 % of cases the client changes his requirements after seeing the results of the initial ones. That’s when I started collecting data and empirical evidence on which business analysis approach leads to success. So when I published my book “Business Analysis for Business Intelligence” in October 2012, I was convinced everybody would agree this new approach is what we need to develop successful BI projects. The International Institute of Business Analysis’s (IIBA) Body of Knowledge has since paid more attention to BI, but the mainstream community is still unaware of the consequences for its practice. And now I want to discuss a new layer of paradigms, methods, tricks and tips on top of this one? Why risk leaving even more readers and customers behind? I guess I need to take Luther’s pose at the Diet of Worms in 1521: “Here I stand, I can do no other.” So call me a heretic, see if I care.
Luther at the Diet of Worms in 1521

The new, enhanced approach to business analysis for business intelligence in a nutshell deals with bridging three gaps. The first gap is the one between the strategy process and the information needed to develop, monitor and adjust the intended strategic options.
The second gap is about the mismatch between the needed and the available information and the third gap is about the available information and the way data are registered, stored and maintained in the organization.
Now, with the advent of Big Data, new challenges impose themselves on our business analysis practice.

Business Analysis for Big Data: the New Challenges

But before I discuss a few challenges, let me refer to my definition of Big Data as described in the article “What is Really Big About Big Data”. In short: volume, variety and velocity are relative to technological developments. In the eighties, 20 megabytes was Big Data; today, 100 terabytes isn’t a shocker. Variety has always been around, and velocity is likewise relative to processing, I/O and storage speeds, which have evolved. No, the real discriminating factor is volatility: answering the pressing question of which data you need to consider persistent, both on a semantic and on a physical storage level. The clue is partly to be found in the practice of data mining itself: a model evolves dynamically over time, due to new data with better added value and/or because of a decay in value of the existing data collection strategy.
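To make the volatility point concrete, here is a minimal sketch (my own illustration, not a SAM feature) of one way to spot that decay: monitor a rolling mean of the model's per-period accuracy and flag the moment it drops below a chosen threshold, signalling that the existing data collection strategy has lost value. The window size and threshold are illustrative choices.

```python
# Hedged sketch: flag model decay by watching a rolling mean of
# per-period accuracy scores. Window and threshold are illustrative.
def detect_decay(accuracies, window=5, threshold=0.75):
    """Return the index where the rolling mean accuracy first dips
    below the threshold, or None if the model never decays."""
    for end in range(window, len(accuracies) + 1):
        rolling_mean = sum(accuracies[end - window:end]) / window
        if rolling_mean < threshold:
            return end - 1  # last period of the offending window
    return None

# A model that scores 0.9 for five periods, then collapses to 0.5:
history = [0.9] * 5 + [0.5] * 5
print(detect_decay(history))  # → 6
```

In practice the trigger would start a retraining or a revision of which data sources are kept persistent.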
Ninety percent of “classic” Business Intelligence is about “What we know we need to know”. With the advent of Big Data, the shift towards “What we don’t know we need to know” will increase. I can imagine that in the long run the majority of value creation will come from this part.
From “What we know we need to know” to
“What we don’t know we need to know”
is the major challenge in Business Analysis for Big Data
Another challenge is managing scalability. Your business analysis may come up with a nice case for tapping certain data streams that deliver promising results within a small scope, but if the investment can’t be depreciated on a broader base, you are stopped in your tracks. That’s why the innovation adage “Fail early and fail cheap” should lead all your analytical endeavors in the Big Data sphere. Some of you may say, “If you expect to fail, why invest in this Big Data thing?”. The simple answer is: “Because you can’t afford not to invest and miss out on opportunities.” Like any groundbreaking technology at the beginning of its life cycle, the gambling factor is large but the winnings are also high. As the technology matures, both the winning chances and the prize money diminish. Failing early and cheap is more difficult than it sounds. This is where a good analytical strategy, defined in a business analysis process, can mitigate the risks of failing in an expensive way.
Business Analysis for Big Data is about finding scalable analytical solutions, early and cheap.
So make sure you can work in an agile way as I have described in my article on BA4BI and deliver value in two to three weeks of development. Big Data needs small increments.
Data sources pose the next challenge. Since they are mostly delivered by external providers, you don’t control the format or the terms and conditions of use; in short, it is hard if not impossible to come up with an SLA between you and the data provider. The next data-related challenge is getting your priorities right: is user-generated content like reviews on Yelp or posts on Disqus more relevant than blog posts or tweets? What about the other side of the Big Data coin, like Open Data sources, process data or IoT data? And to finish it off: nothing is easier than copying, duplicating or reproducing data, which can be a source of bias.
Data generates data and may degenerate the analytics
Some activist groups get an unrealistic level of attention and most social media use algorithms to publish selected posts to their audience. This filtering causes spikes in occurrences and this in turn may compromise the analytics. And of course, the opposite is also true: finding the dark number, i.e. things people talk about without being prominent on the Web may need massive amounts of data and longitudinal studies before you notice a pattern in the data. Like a fire brigade, you need to find the peat-moor fire before the firestorm starts.
The architectural challenge is also one to take into account. Because of the massive amount of data and a volatility that cannot always be foreseen, the architectural challenges are bigger than in “regular” Business Intelligence.
Data volatility drives architectural decisions
There are quite a few processing decisions to make, and their architectural consequences greatly impact the budget and the strategic responsiveness of the organization. In a follow-up article I will go into more detail, but for now this picture of a simplified Big Data processing scheme gives you a clue.

Big Data Architecture Options


Enabling Business Analysis for Big Data

We are at the beginning of a new analytical technology cycle and therefore, classical innovation management advice is to be heeded.
You need to have a business sponsor with sufficient clout, supporting the evangelization efforts  and experiments with the new technologies and data sources.
Allow for failures but make sure they are not fatal: “fail fast and cheap”. Reward the people who stick out their necks and commit themselves to new use cases. Make sure these use cases connect with the business needs; if they don’t, forward them to your local university. They might like to do fundamental research.
If the experiments show some value and can be considered as a proof of concept, your organization can learn and develop further in this direction.
The next phase is about integration:

  • integrate Big Data analytics in the BI portfolio
  • integrate Big Data analytics in the BI architecture
  • integrate Big Data analytical competences in your BI team
  • integrate it with the strategy process
  • integrate it in the organizational culture
  • deal with ethical and privacy issues
  • link the Big Data analytical practice with existing performance management systems.


And on a personal note, please, please be aware that the business analysis effort for Big Data analytics is not business as usual.

What is the Added Value of Business Analysis for Big Data?

This is a pertinent question formulated by one of the reviewers of this article. “It depends” is the best possible answer.
The Efficiency Mode
It depends on the basic strategic drive of the organization. If management is in efficiency mode, they will skip the analysis part and start experimenting as quickly as possible. On the upside, this can save time and deliver spontaneous insights. The downside is that this non-directed trial-and-error approach can provoke undesirable side effects: what if the trials aren’t “deep” and “wide” enough and the experiment is killed too early? By “deep” I mean the sample size and the time frame of the captured data; by “wide”, the number of attributes and the number of investigated links with corporate performance measures.
The Strategy Management Mode
If management is actively devising new strategies, looking for opportunities and new ways of doing business rather than only looking for cost cutting then Business Analysis for Big Data can deliver true value.
It will help you detect leading indicators of potential changes in market trends, consumer behavior, production deficiencies, lags and gaps in communication and advertising positioning, fraud and crime prevention, etc.
Today, the Big Data era is like Spain in 1492, when Columbus set out to find an alternative route to the Indies. He got far beyond the known borders of the world and didn’t quite reach the Indies, but he certainly changed many paradigms and assumptions about the then known world. And isn’t that the essence of what leaders do?

Monday, 26 May 2014

Elections’ Epilogue: What Have We Learned?

First the good news: a MAD of 1.41 Gets the Bronze Medal of All Polls!

The results from the Flemish Parliament elections with all votes counted are:

Party                                 Result (source: Het Nieuwsblad)   SAM’s forecast
Christian democrats (CD&V)            20,48 %                           18,70 %
Green (Groen)                         8,7 %                             8,75 %
Flemish nationalists (N-VA)           31,88 %                           30,32 %
Liberal democrats (Open VLD)          14,15 %                           13,70 %
Social democrats (SP-A)               13,99 %                           13,27 %
Nationalist & Anti-Islam party (VB)   5,92 %                            9,80 %

Table 1. Results for the Flemish Parliament compared to our forecast

And below is the comparative table of all polls, compared to this result, with the Mean Absolute Deviation (MAD), which expresses the level of variability in the forecasts. A MAD of zero means a perfect prediction. In this case, with the highest score at almost 32 % and the lowest at almost 6 % over only six observations, anything under 1.5 is quite alright.
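The MAD calculation itself is simple; with the counted results and SAM’s forecast from Table 1, it can be reproduced in a few lines:

```python
# Mean Absolute Deviation of SAM's forecast against the counted results (Table 1).
actual   = [20.48, 8.70, 31.88, 14.15, 13.99, 5.92]   # official results, %
forecast = [18.70, 8.75, 30.32, 13.70, 13.27, 9.80]   # SAM's forecast, %

mad = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
print(round(mad, 2))  # → 1.41
```

Note that the VB row (5,92 % versus 9,80 %) alone contributes almost half of the total deviation.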

Table 2. Comparison of all opinion polls for the Flemish Parliament and our prediction based on Twitter analytics by SAM.

Compared to 16 other opinion polls published by various national media, our little SAM (Social Analytics and Monitoring) did quite alright on a shoestring budget: in only 5.7 man-days we came up with a result competing with mega-concerns in market research.
The Mean Absolute Deviation covers up one serious flaw in our forecast: the giant shift of voters from VB (the nationalist Anti-Islam party) to N-VA (the Flemish nationalist party). This led to an underestimation of the N-VA result and an overestimation of the VB result. Although the model estimated the correct direction of the shift, it underestimated its proportion.
Had we used more data, we might have caught that shift and ended even higher!

Conclusion

Social Media Analytics is a step beyond the social media reporting most tools offer nowadays. With our little SAM, built on the Data2Action platform, we have sufficiently proven that forecasting on the basis of a correct judgment of sentiment, on even a single source like Twitter, can produce relevant results in marketing, sales, operations and finance. Compared to politics, these disciplines deliver far more predictable data, as they can combine external sources like social media with customer, production, logistics and financial data. And the social media actors and opinion leaders certainly produce less bias in these areas than is the case in political statements. All this can be done on a continuous basis, supporting day-to-day management in communication, supply chain, sales, etc.
If you want to know more about Data2Action, the platform that made this possible, drop me a line: contact@linguafrancaconsulting.eu 

Get ready for fact based decision making 
on all levels of your organisation





Saturday, 24 May 2014

The Last Mile in the Belgian Elections (VII)

The Flemish Parliament’s Predictions

Scope management is important if you are on a tight budget and your sole objective is to prove that social media analytics is a journey into the future. That is why we concentrated on Flanders, the northern part of Belgium. (Yet the outcome of the elections for the Flemish parliament will determine events on the Belgian level: if the N-VA wins significantly, they can impose some of their radical methods to get Belgium out of the economic slump, which is not much appreciated in the French-speaking south.) In commercial terms, this last week of analytics would have cost the client 5.7 man-days of work. Compare this to the cost of an opinion poll and there is a valid add-on available for opinion polls, as the Twitter analytics can be done on a continuous basis.

 A poll is a photograph of the situation while social media analytics show the movie.

From Share-of-Voice to Predictions


It’s been a busy week. Interpreting tweets is not a simple task, as we illustrated in the previous blog posts. And today the challenge gets even bigger: to predict the election outcome in the northern, Dutch-speaking part of Belgium on the basis of topic-related sentiment analysis is like base-jumping knowing that not one but six guys have packed your parachute. These six guys are totally biased. Here are their names, in alphabetical order, in case you might think I am biased:


Dutch name                                        Name used in this blog post
CD&V (Christen Democratisch en Vlaams)            Christian democrats
Groen                                             Green (the ecologist party)
N-VA (Nieuw-Vlaamse Alliantie)                    Flemish nationalists
Open VLD (Open Vlaamse Liberalen en Democraten)   Liberal democrats
SP-A (Socialistische Partij Anders)               Social democrats
VB (Vlaams Belang)                                Nationalist & Anti-Islam party

Table 1. Translation of the original Dutch party names

From the opinion polls, the consensus is that the Flemish nationalists can obtain a result over 30 %, although the latest poll showed a downward trend break, and that the Nationalist Anti-Islam party will lose further ground and become smaller than the Green party. In our analysis we didn’t include the extreme left-wing party PVDA, for the simple reason that they were almost non-existent on Twitter and the confusion with the Dutch social democrats created a tedious filtering job, which is fine if you get a budget for it. Since that was not the case, we skipped them, as well as any other exotic outsider. Together with the blank and invalid votes, they may account for a noticeable percentage, which will show up at the end of the math exercise. But the objective of this blog post is to examine the possibilities of approximating the market shares with the share of voice on Twitter, to detect the mechanics of possible anomalies and to report on the user experience, as we explained at the outset of this Last Mile series of posts.

If we take the rough data of the share-of-voice on over 43.000 tweets, we see some remarkable deviations from the consensus.

Party                            Share of voice on Twitter
Christian democrats              21,3 %
Green (the ecologist party)      8,8 %
Flemish nationalists             27,9 %
Liberal democrats                13,6 %
Social democrats                 12,8 %
Nationalist & Anti-Islam party   11,3 %
Void, blank, mini parties        4,3 %

Table 2. Percentage share of voice on Twitter per Flemish party

It is common practice nowadays to combine the results of multiple models instead of using just one. This holds not only in statistics; Nobel Prize winner Kahneman has shown it clearly in his work on judgment. In this case we combine this model with other, independent models to come to a final one, and we use the opinion polls to derive the covariance matrix.
Table 3. The covariance matrix with the shifts in market shares 
This allows us to see, for instance, at which party’s expense one party’s share grows. In the case of the Flemish nationalists, growth comes at the cost of the Liberal democrats and the Nationalist and Anti-Islam party, while they win fewer voters from the Christian and social democrats. The behaviour of Green and the Nationalist and Anti-Islam party across the opinion polls was very volatile, which partly explains the spurious correlations with other parties.


Graph 1 Overview of all opinion poll results: the evolution of the market shares in different opinion polls over time.

Comparing the different opinion polls, from different research organisations, on different samples is simply not possible. But if you combine all numbers in a mathematical model you can smooth a large part of these differences and create a central tendency.
To combine the different models, we use a derivation of the Black-Litterman model used in finance. We violate some of its assumptions, such as general market equilibrium, which we replace by a totally different concept: opinion polls. However, the elegance of this approach allows us to take into account opinions, the confidence in those opinions and complex interdependencies between the parties. The mathematical gain is worth the sacrifice of the theoretical underpinning.
This is based on a variant of the Black-Litterman model:

μ = Π + τΣPᵀ (Ω + τPΣPᵀ)⁻¹ (Q − PΠ)
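For readers who want to see the mechanics, here is a small numerical sketch of that Black-Litterman update with made-up numbers: three hypothetical parties and a single Twitter-based “view”. The real model and its inputs are of course richer; everything below is illustrative.

```python
import numpy as np

# Prior: consensus shares from the opinion polls, with their covariance.
Pi    = np.array([0.30, 0.15, 0.10])    # prior "market shares"
Sigma = np.diag([0.004, 0.002, 0.001])  # covariance of shares across polls
tau   = 0.05                            # scaling of prior uncertainty

# One "view" from Twitter share-of-voice: party 1 sits at 28 %.
P     = np.array([[1.0, 0.0, 0.0]])     # which parties the view touches
Q     = np.array([0.28])                # the view itself
Omega = np.array([[0.002]])             # (un)certainty of the view

# mu = Pi + tau*Sigma*P' (Omega + tau*P*Sigma*P')^-1 (Q - P*Pi)
mu = Pi + tau * Sigma @ P.T @ np.linalg.inv(Omega + tau * P @ Sigma @ P.T) @ (Q - P @ Pi)
print(mu.round(3))  # party 1 is pulled from 0.30 towards the 0.28 view
```

The posterior moves the first party’s share towards the Twitter view in proportion to the confidence expressed in Ω, while parties untouched by the view keep their prior value.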


And the Final Results Are…


Party                                 Central tendency of all opinion polls   Data2Action’s prediction
Christian democrats (CD&V)            18 %                                    18,7 %
Green (the ecologist party)           8,7 %                                   8,8 %
Flemish nationalists (N-VA)           31 %                                    30,3 %
Liberal democrats (Open VLD)          14 %                                    13,7 %
Social democrats (SP-A)               13,3 %                                  13,3 %
Nationalist & Anti-Islam party (VB)   9,4 %                                   9,8 %
Other (blank, mini parties, …)        5,6 %                                   5,4 %
Total                                 100 %                                   100 %

Table 4. Prediction of the results of the votes for the Flemish Parliament 

Now let’s cross our fingers and hope we produced some relevant results.

In the Epilogue, next week, we will evaluate the entire process. Stay tuned! Update: navigate to the evaluation.






Friday, 23 May 2014

The Last Mile in the Belgian Elections (VI)

Are Twitter People Nice People?


The answer is: “It depends”. In this article I draw up a taxonomy of tweets from the last week of the Belgian elections. Based on over 35.000 tweets, we can be pretty sure this is a representative sample. You can consider this article an introduction to tomorrow’s headline: the last election poll, based on Twitter analytics.

A picture says more than a thousand tweets

The taxonomy of the Twitter community

So here it is. The majority of tweets are negative. When you encounter positive tweets, they are either from somebody who wants to market something (in the case of the elections, him- or herself or a candidate he or she supports) or from somebody forwarding a link with a positive comment.
There is a correlation between the level of negativity about a subject and the political party related to the subject. From a political point of view, the polarisation between the Walloon socialist party and the Flemish nationalist party is clearly visible on Twitter.
Even today, at the funeral of a well-respected politician of the older generation, former Belgian prime minister Jean-Luc Dehaene, the majority of tweets were negative. Tweets linking him to the financial scandal of the Christian democrat trade union in Dexia were six times more numerous than the pious “RIP JLD” variants.
So how do you derive popularity and even arrive at some predictive value from a bunch of negative tweets?  That, my dear blog readers, will be examined tomorrow in the final article. 





Thursday, 22 May 2014

The Last Mile in the Belgian Elections (V)

Why Sentiment Measures Alone Are Not Enough


In the process of developing Social Analytics and Monitoring, we learnt something most interesting about sentiment analysis. Before we created Data2Action  as a platform for data mining and developed SAM (Social Analytics and Monitoring) we studied many approaches.
Many of these were just producing numbers to express sentiment versus a brand, a person, a concept or a company, to name a few.
Isolated Sentiment Analysis is Meaningless
This can be too superficial to produce meaningful analytic results, so we recreated social constructs that match concepts. Analysing the sentiment of a construct element in the context of a topic is not a trivial task. But at least it approaches human judgement, and it can be trained to increase precision and relevance.
Today, I am not going to amaze you with Big Numbers but I’ll show you some examples of how we approach sentiment analysis with SAM.
Let’s take a few tweets about the N-VA party and examine how they are scored:
The ultimate horror for companies and a torpedo for our welfare state: an anti N-VA coalition with the ecologist party
Another point where N-VA does not represent the Flemish people
From a one-dimensional point of view, both tweets are negative about N-VA, but the first is in fact meant as a positive, pro-N-VA statement.
Let us look at this, more complex tweet:
Vande Lanotte opens up the coalition for the Green Party, wrong move as the voters already consider N-VA strong enough.
The first part of the sentence, “Vande Lanotte opens up the coalition for the Green Party”, can be considered positive for Vande Lanotte and his socialist party SP-A. But the second part is negative. This shows the importance of parsing the sentence correctly and attributing scores as a function of viewpoints.
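A toy illustration of the idea (my own simplification, not SAM’s actual parser): score each clause separately, per viewpoint, and only then aggregate. The clause split and the polarity values below are hypothetical.

```python
# Each clause carries its own sentiment per viewpoint (party). The
# tweet-level score per party is the sum over clauses.
tweet = [
    ("Vande Lanotte opens up the coalition for the Green Party",
     {"SP-A": +1, "Groen": +1}),
    ("wrong move as the voters already consider N-VA strong enough",
     {"SP-A": -1, "N-VA": +1}),
]

def score_by_viewpoint(clauses):
    totals = {}
    for _clause, sentiments in clauses:
        for party, polarity in sentiments.items():
            totals[party] = totals.get(party, 0) + polarity
    return totals

print(score_by_viewpoint(tweet))
```

For SP-A the positive and negative clauses cancel out, which is exactly the nuance a single tweet-level sentiment number would miss.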



Wednesday, 21 May 2014

The Last Mile in the Belgian Elections (IV)

How Topic Detection Leads to Share-of-voice Analysis


It was a full day of events on Twitter. Time to make an inventory of the principal topics and the buzz created on the social network in the Dutch speaking community in the north of Belgium.
First, the figures: 10.605 tweets were analysed, of which 5.754 referred to an external link (i.e. a news site or another external website such as a blog, a photo album, etc.)
As the Flemish nationalist party leader Mr. Dewever from N-VA (the New Flemish Alliance in English) launched his appeal to the French speaking community today, we focused on the tweets about, to and from this party.
A mere 282 tweets were deemed relevant for topic analysis. And here’s the first striking number: of these 282 tweets, only 16 contained a reactive response.
Tweets that provoked a reactive response are almost nonexistent

Some 49 topics grouped several media sources and publications of all sorts. We will discuss three of them to illustrate how the relationship between topic, retweets, Klout score and added content makes some tweets more relevant than others. These are the three topics:

  • Dewever addresses the French speaking community via Twitter
  • Christian Democrat De Clerck falsely accuses N-VA of using fascist symbols in an advertisement
  • You Tube movie from N-VA is ridiculed by the broad community 

Dewever addresses the French speaking community via Twitter

This topic is divided into one moderately positive headline and two neutral ones. The positive one, Bart Dewever to the French-speaking community: “Give N-VA a Chance”, generates a total Klout score of 188, of which the Flemish TV station VRT takes the biggest chunk with a score of 158.
The neutral headline “Dewever puts the struggle between N-VA and the French-speaking socialist party at the centre of the discussion” generates a Klout score of only 98.
The other neutral headline, “N-VA president Bart Dewever addresses the French-speaking community directly”, delivers a higher score of 140, partly because one of N-VA’s members of parliament promoted the link to the news medium.
All in all, with a total Klout score of 426, this topic does not cause great ripples, especially not compared to a mere anecdote, which is the second topic.

Christian Democrat De Clerck falsely accuses N-VA of using fascist symbols in an advertisement

On the left, the swastika hoax commented on by the Christian democrat; on the right, the original ad showing a labyrinth

Felix De Clerck, son of the former Christian democrat minister of Justice Stefaan De Clerck, reacted to a hoax and was chastised for doing so. With a Klout score of 967, this caused a bigger stir, although its political relevance is a lot smaller than Dewever’s speech. Emotions can play a role, even in simple and neutral retweets.


You Tube movie from N-VA is ridiculed by the broad community


Another day’s high was reached with an amateurish YouTube movie: a parody of a famous Flemish detective series meant to highlight the major issues of the campaign. This product of the candidates in West Flanders, including the Flemish minister of Interior Affairs Geert Bourgeois, generated a total Klout score of 778 from tweets and retweets with negative or sarcastic comments.
Yet an adjacent topic, about a cameraman from Bruges who is surprised by minister Bourgeois’ enthusiasm, generates a moderately positive Klout score of 123.

Three topics out of 49 generate 20.6 % of total klout scores!

This illustrates perfectly how the Twitter community selects and reinforces topics that carry emotional value: the YouTube movie and the De Clerck hoax together generated a share of voice of almost 17 % of the tweets.

Forgive me for reducing the geographic scope to Flanders, the political scope to just one party and the tweets to only three topics; this blog does not pretend to present the full enchilada. I hope today’s contribution has demonstrated that topics, and the way they are perceived and handled, can vary greatly in impact and cannot be entirely reduced to numbers. In other words, the human interpreter will deliver added value for quite a long time.

Tuesday, 20 May 2014

The Last Mile in the Belgian Elections (III)

Awesome Numbers... Big Data Volumes

Wow, the first results are awesome. Well, er, the first calculations at least are amazing.

  • 8.500 tweets measured per 15 seconds means about 1.5 billion tweets per month if you extrapolate in a very rudimentary way...
  • At 2 KB per tweet, that is about 2.8 terabytes of input data per month by the same reasoning. Quite impressive for a small country like Belgium, where Twitter adoption is not on par with the northern countries...
  • If you use 55 kilobytes for a model vector of 1.000 features, you generate 77 terabytes of information per month
  • And 55 KB is a small vector: a normal feature vector of one million features generates 77 petabytes of information per month.

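The back-of-the-envelope arithmetic behind those bullets can be reproduced in a few lines; carrying the untruncated monthly count (≈1.47 billion tweets rather than the rounded-down 1.4 billion) gives slightly higher volume figures than the ones listed above.

```python
# Rudimentary extrapolation, as in the post: constant tweet rate, decimal units.
tweets_per_month = 8_500 * (60 // 15) * 60 * 24 * 30  # 8,500 tweets every 15 s
raw_tb    = tweets_per_month * 2 / 1e9                # 2 KB per tweet, in TB
vector_tb = tweets_per_month * 55 / 1e9               # 55 KB model vector, in TB
vector_pb = tweets_per_month * 55_000 / 1e12          # 1,000x larger vector, in PB

print(f"{tweets_per_month:,} tweets ≈ {raw_tb:.1f} TB raw, "
      f"{vector_tb:.0f} TB of small vectors, {vector_pb:.0f} PB of large ones")
```

The point of the exercise stands either way: the feature vectors, not the raw tweets, dominate the storage bill.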
And wading through this sea of data you expect us to come up with results that matter?
Yes.
We did it.

Male versus female tweets in Belgian Elections
Gender analysis of tweets in the Belgian elections n = 4977 tweets

Today we checked the gender differences

The Belgian male Twitter species is clearly more interested in politics than the female variant: only 22 % of the last 24 hours’ tweets were of female signature; the remaining 78 % were of male origin.
This is not because Belgian women are less present on Twitter: 48 % of the sources are female against 52 % male.
Analysing the first training results for irony and sarcasm also shows a male bias: the majority of the sarcastic tweets, 95 out of 115, were male. Only 50 were detected by the data mining algorithms, so we still have some training to do.
More news tomorrow!

Monday, 19 May 2014

The Last Mile in the Belgian Elections (II)

Getting Started


I promised to report on my activities in social analytics. For this report I will try to wear the shoes of a novice user and report, without holding anything back, on this emerging discipline. I explicitly use the word “emerging” as it has all the hallmarks of one: technology enthusiasts will have no problem overlooking the quirks that prevent an easy, end-to-end “next-next-next” solution. Because there is no user-friendly wizard to guide you through selecting the sources, setting up the target, creating the filters and optimising the analytics for cost, sample size, relevance and validity checks, I will have to go through the entire process in an iterative and sometimes trial-and-error way.
This is how massive amounts of data enter the file system
Over the weekend and today I have been mostly busy doing just that. Tweet intakes ranged from taking in 8.500 Belgian tweets in 15 seconds and doing the filtering locally on our in-memory database, to pushing all filters to the source system and getting 115 tweets in an hour. But finally we got to an optimum query result, and the Belgian model can be trained. The first training we will set up is detecting sarcasm and irony. With properly developed and tested algorithms, we hope for 70 % accuracy in finding tweets that express exactly the opposite sentiment of what the words say. Tweets like “well done, a**hole” are easy to detect, but it’s the ones without the explicit mention of that important part of the human digestive system that are a little harder.
The cleaned output is ready for the presentation layer
The conclusion of this weekend and today: don’t start social analytics the way you would any other data mining or statistical project, because taming social media data is an order of magnitude harder than crunching the numbers in stats.

Let’s all cross our fingers and hope we can come up with some relevant results tomorrow.

woensdag 14 mei 2014

The Last Mile in the Belgian Elections

Sentiment Analysis, a Predictor of the Outcome?


Data2Action is an agile data mining platform consisting of efficiently integrated components for rapid application development. One deliverable of Data2Action is SAM, for Social Analytics and Monitoring.
In the coming days, I will publish the daily results from sentiment analysis on Twitter with regards to the programmes, the major candidates and interest groups.

Data2Action and social analytics

Stay tuned for the first report on Monday 19th May

Questions we will answer include:

  • Which media produce the most negative or positive tweets about which party, which major candidate?
  • Who are the major influencers on Twitter?
  • What are the tweets with the highest impact?
The major networks’ coverage will stimulate lots of tweets this weekend, so we will present the analysis next Monday.

zaterdag 3 mei 2014

What has Immanuel Kant got to do with it??

Making a Success of New BI Tool Introduction


In the previous post I indicated the five major reasons why BI consultants fail to introduce a new BI tool into an organisation. As promised, I have not just raised questions; I am ready to provide you with some answers.
Some of my colleagues in Business Intelligence commented on the LinkedIn discussion forum. I will quote their comments and integrate them in this post.
It is all about embedding the tool in a larger setting, larger than the competences of one BI specialist.
Some people won’t like to read this. The reason is simple: positioning the BI tool in a very broad, organisation-wide vision goes beyond the competences of a technical project lead. The approach requires teamwork and input from business analysts, strategic consultants and change managers. It requires more time and budget, and both are scarce resources in an organisation.
But if you look at the wasted time and money in remedial efforts to get the new BI tool on the road, you can consider the extra effort and resources as an insurance premium. Because you can only make a first impression once. 

These are the seven steps to successful introduction I will address in the article on my book site BA4BI:

* Get a deep insight in the organisation’s DNA
* Understand its strategy
* Understand its information needs
* Assess the information modelling acceptance in the organisation
* Translate the previous steps into the tool’s requirements
* Introduce the tool 
* Develop the decision making culture with the new tool

vrijdag 2 mei 2014

Questions to Ask Ralph Kimball the 10th June in 't Spant in Bussum (Neth.)

Dear Ralph,

I know you’re a busy man so I won’t take too much of your time to read this post. I look forward to meeting you June 10 in 't Spant in Bussum for an in depth session on Big Data and your views on the phenomenon.
In one of your keynotes you will address your vision on how Big Data drives Business and IT to adapt and evolve. Let me first of all congratulate you on the title of your keynote. It proves that a world-class BI and data warehouse veteran is still on top of things, which we can’t say of some other gurus of your generation, but let’s not dwell on that.
I have been studying the Big Data Phenomenon from my narrow perspective: business analysis and BI architecture and here are some of the questions I hope we can tackle during your keynote session:

1. Do you consider Big Data as something you can fit entirely in star schemas? I know since The Data Webhouse Toolkit days that semi-structured data like web logs can find a place in a multidimensional model, but some Big Data output is, to my knowledge, not fit for persistent storage. Yet I believe that a derived form of persistent storage may be a good idea. Let me give you an example. Imagine we can measure the consumer mood for a certain brand on a daily basis by scanning social media postings. Instead of creating a junk-like dimension, we could build a star schema with, at a minimum, the following dimensions: a mood dimension, a social media source dimension, time, location and brand, and a central fact table with the mood score on a seven-point Likert scale. The real challenge will lie in correctly structuring the text strings into the proper Likert score using advanced text analytics. Remember the misinterpretation of the Osama Bin Laden tweets in early May 2011? The program interpreted “death” as a negative mood while the entire US was cheering the expedient demise of the terrorist.
Figure 1: An example of derived Big Data in a multidimensional schema

2. How will you address the volatility issue? Because Big Data’s most convincing feature is not volume, velocity or variety, which have always been relative to the state of the art. No, volatility is what really characterizes Big Data, and I refer to my article here where I point out that Big Behavioural Data is the true challenge for analytics, as emotions and intentions can be very volatile and the Law of Large Numbers may not always apply.
3. Do you see a case for data provisioning strategies to address the above two issues? With data provisioning, I mean a transparent layer between the business questions and the BI architecture where ETL provides answers to routine or planned business questions and virtual data marts produce answers to ad hoc and unplanned business questions. If so, what are the major pitfalls of virtualization for Big Data Analytics?
4. Do you see the need for new methodologies and new modeling methods or does the present toolbox suffice?
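The derived-storage idea in question 1 can be sketched in a few lines: a continuous sentiment score is binned into the seven-point Likert value before it lands in the central fact table. The score range, the equal-width cut-offs and the surrogate keys below are my own illustrative assumptions, not a prescription.

```python
# Sketch of loading a derived mood fact. A continuous sentiment score,
# assumed here to lie in [-1.0, 1.0], is binned into a 7-point Likert
# scale before insertion into the fact table. Cut-offs are illustrative.

def likert7(score: float) -> int:
    """Map a sentiment score in [-1, 1] to a Likert value 1..7
    using seven equal-width bins (bin width = 2/7)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score out of range")
    return min(7, int((score + 1.0) / (2.0 / 7.0)) + 1)

# A fact row referencing the dimensions named in question 1
# (surrogate key values are made up for the example).
fact_row = {
    "mood_key": likert7(0.62),
    "source_key": 3,        # e.g. Twitter
    "date_key": 20140502,
    "location_key": 14,
    "brand_key": 7,
}
print(fact_row["mood_key"])
```

The hard part, as noted above, is not this mapping but producing a trustworthy score from raw text in the first place.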

It’s been a while since we met and I really look forward to meeting you in Bussum, whether you answer these questions or not. 


Kind regards,

Bert Brijs

dinsdag 15 april 2014

Why Business Intelligence Consultants Fail at Introducing New BI Tools

Many studies, surveys and other pieces of empirical evidence show that between 25 and 45% of all Business Intelligence (BI) software is abandoned and ends up as shelfware. This is less the fault of aggressive software vendors than of BI consultants who fail to look past at least five blinders. And whether the BI consultant(s) involved were internal staff members or outsourced doesn’t affect the outcome.
There are many causes of failed BI tool introductions, ranging from “no executive sponsorship” via “an underdeveloped business case” to “not managing the scope properly”, but none of these is the root of the problem. The BI project leader and the BI business analyst are responsible for managing these risks.
Let’s have a look at the five blinders that are the root cause of poorly managed tool introductions:

  • Focus on the technical aspects
  • Not enough problem owners identified
  • Managing expectations poorly
  • Too much power in the hands of the BI consultant
  • Pouring old wine in new bags

Focus on the technical aspects

A large publishing company asked us to compare two market-leading BI tools. There were two technology factions in the IT department, and we were asked to act as the objective referee between the two opposing camps. After days and days of intensive study, testing and two proofs of concept, the differences were negligible and we advised the customer to let the total cost of ownership decide which would be the tool of choice. But the two factions kept their hawkish stance on the tool of their choice.
We then tried to convince them to quit the technical discussions and take the business questions and the end-user experience into account. The reaction of both parties pointed at the real problem: “Managers all ask the same questions, so this approach is irrelevant.” In their vision the business questions were generic but the tool was unique. Never was the distance between IT and the business users wider. The lack of mutual understanding between business and IT is still prevalent in many organizations. Only age-old mail-order companies and pure-play ecommerce enterprises have passed or skipped this development stage.

Not enough problem owners identified

If the BI consultants are unable to identify sufficient problem owners, the advice should be to postpone the tool introduction and, if that is not possible, at least invest heavily in detecting and even creating problem owners. In too many cases finance and administration (F&A) departments are the sole problem owners in finance BI projects. As if HRM, marketing and operations have nothing to do with revenue, variable and fixed costs…
During one of our BI audits in a third-party logistics company, we found that F&A had knowledge of only 27% of the total information needs. Yet the other 73% were, to a greater or lesser degree, related to F&A information requirements… Needless to say, this approach produces a weak foundation for a durable BI architecture.
Conclusion: the entire organization needs to own the information management problems and delegate these to the project steering committee.

Managing expectations poorly

During an audit of a customer relationship management (CRM) system at a large PC distributor, we noticed it was impossible to create a product profile per customer. There was an order history per customer, but these records only contained article number, price and quantity. Had this multi-million-euro organisation foreseen an extra field for category management, this information would have carried some meaning. Because who can tell whether a 1 GB hard disk in 1998 was sold as a consumer or a professional product?
The lesson to be learned is: before you accept any of the information requirements, you need to validate these against the available source data or else you may create expectations you can never meet.
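That validation step can be sketched mechanically: list the fields each information requirement needs and diff them against what the source system actually provides. The field and requirement names below are illustrative assumptions modelled on the PC distributor anecdote, not a real schema.

```python
# Sketch: check each information requirement against the fields the
# source system actually delivers, before promising it to the business.
# Field and requirement names are illustrative, not a real schema.

source_fields = {"customer_id", "article_number", "price", "quantity"}

requirements = {
    "revenue per customer": {"customer_id", "price", "quantity"},
    "product profile per customer": {"customer_id", "article_number",
                                     "product_category"},  # not in source!
}

def unmet(requirements, source_fields):
    """Return, per requirement, the fields the source cannot deliver."""
    return {name: needed - source_fields
            for name, needed in requirements.items()
            if needed - source_fields}

print(unmet(requirements, source_fields))
```

Anything this check flags is an expectation you should renegotiate before the project starts, not after go-live.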

Too much power in the hands of the BI consultant

We can all live with the fact that there are no objective consultants who can deliver totally value-free advice based on scientific evidence. But one type of advice should always raise your suspicion: that of “consulting-intensive” BI tools. Any specialist knows that there are many degrees between “download, install and DIY” on one side and “welcome to the consultants who will never leave” on the other. So make sure you choose a tool that can stay on board for at least five years. BI technology may produce a real or hyped breakthrough at every Gartner Summit, but organisations need to be able to follow, adopt and adapt to the new technology to make maximum use of it.
I sometimes wonder if there is not a correlation between the size of the consulting organization and the size of the BI solution they bring to the table.
My advice: go visit a comparable organisation and see what their experience is with the BI tool and the consulting organization. It will set the PowerPoints of your BI consultants in perspective.

Pouring old wine in new bags

To paraphrase Bill Clinton: “It’s the user, stupid!” Contrary to ERP systems, where negative feedback is the norm and users are forced to work with the system, BI is a world of user motivation and positive feedback. If you avoid using the ERP system, then essential documents and information like purchase orders, inventory status or invoices won’t be produced correctly, and somebody higher up in the chain of command will have an urgent conversation with you. But if you avoid using the BI system because it is too difficult or it doesn’t provide answers to your questions, then probably no one in the organisation will know, even if they use the monitoring tools. These tools will only tell them whether you opened the cube or the report; they have nothing to say about the influence of the presented facts on your decision making process.
The possibilities to explore and exploit data are only limited by the availability of the data and your analytical capabilities. “Availability” should be translated into “usable, verifiable, quality-checked, well defined and traceable data which can support fact based decision making”, otherwise it is just… lots of data. Meaningless data can be very demotivating for end users. Think about it when you are setting up report bursting for example. The simple interaction “user effort – new insights” is what motivates users to come up with better decisions, smarter information requirements for new iterations. This simple interaction lifts the entire organization to a higher maturity level in BI.
It is also a plea for agile BI (read my agile BI manifesto here), because too many projects fail to deliver functionality within the time perspective of the user. If the user is not on board with the new system rapidly, he will be reluctant to trade in his spreadsheet and his calculator for something new that does not meet his expectations.

Conclusion: introducing a new BI tool needs careful organisation wide change management. Anyone who thinks he can do with less will end up with nothing.
In the next post I will suggest a few remedies to increase the success factor of a new BI tool introduction. Stay tuned!

woensdag 5 februari 2014

Some Thoughts on Predictive and Customer Analytics @BA4All

Last Thursday, three excellent sessions on predictive and customer analytics led me to a few conclusions


Technology allows very old dreams of marketers to come true:
  • To know and understand the response logistics from product design or customization via the attention – interest – desire – action chain to the sales counter,
  • To decompose aggregate marketing information to the lowest grain: the individual consumer,
  • To get deep understanding of switching behavior while the consumer is shopping,
  • To monitor online the impact of all marketing variables
The more technology evolves and delivers value, the more human relationship and context management become a necessity. These are two sides of the same coin in business intelligence: user acceptance and user integration of data-driven decision making will be the bottleneck in any marketing technology push.
And finally, all this new stuff still needs to pass the test of a well balanced business case. The balance between the added value of information in hard currency and a softer evaluation of improving the strategic position with new technology needs careful study.
Go to the website for the full article.

woensdag 8 januari 2014

Real-time BI, who would have thought that twenty years ago?

(Abstract)
Business Intelligence has come a long way. From static, rigid decision support systems to agile BI architectures and volatile Big Data stores... It has been a journey filled with success and failure, with never-ending discussions about which architecture would provide a robust, future-proof delivery platform.
With social analytics as a new addition to real time BI, another discussion is opened: 
where and how is (near-) real time BI worth the investment?
And what aspects need to be considered to produce a balanced investment appraisal?