Wednesday 21 May 2014

The Last Mile in the Belgian Elections (IV)

How Topic Detection Leads to Share-of-voice Analysis


It was a full day of events on Twitter. Time to take stock of the principal topics and the buzz they created in the Dutch-speaking community in the north of Belgium.
First, the figures: 10,605 tweets were analysed, of which 5,754 referred to an external link (i.e. a news site or another external website such as a blog, a photo album, etc.).
As the Flemish nationalist party leader Mr. Dewever from N-VA (the New Flemish Alliance) launched his appeal to the French-speaking community today, we focused on the tweets about, to and from this party.
A mere 282 tweets were deemed relevant for topic analysis. And here is the first striking number: of these 282 tweets, only 16 provoked a reactive response.
Tweets that provoked a reactive response are almost nonexistent

Some 49 topics grouped media sources and publications of all sorts. We will discuss three of them to illustrate how the relationship between topic, retweets, Klout score and added content makes some tweets more relevant than others. These are the three topics:

  • Dewever addresses the French speaking community via Twitter
  • Christian Democrat De Clerck falsely accuses N-VA of using fascist symbols in an advertisement
  • YouTube video from N-VA is ridiculed by the broad community

Dewever addresses the French speaking community via Twitter

This topic divides into one moderately positive headline and two neutral ones. The positive one: Bart Dewever to the French-speaking community: “Give N-VA a Chance”.
This headline generates a total Klout score of 188, of which the Flemish TV station VRT takes the biggest chunk with a score of 158.
The neutral headline “Dewever puts the struggle between N-VA and the French-speaking socialist party at the centre of the discussion” generates only 98.
The other neutral headline, “N-VA President Bart Dewever addresses the French-speaking community directly”, delivers a higher score of 140, partly because one of N-VA’s members of Parliament promoted the link to the news medium.
All in all, with a total Klout score of 426, this topic does not cause great ripples, especially not when you compare it to a mere anecdote, which is the second topic.

Christian Democrat De Clerck falsely accuses N-VA of using fascist symbols in an advertisement

On the left, the swastika hoax commented on by the Christian democrat; on the right, the original ad showing a labyrinth

Felix De Clerck, son of the former Christian democrat minister of Justice Stefaan De Clerck, reacted to a hoax and was chastised for it. With a Klout score of 967, this caused a bigger stir, although its political relevance is far smaller than Dewever’s speech… Emotions can play a role even in simple and neutral retweets.


YouTube video from N-VA is ridiculed by the broad community


Another day’s high was reached with an amateurish and unprofessional YouTube video that parodied a famous Flemish detective series to highlight the major issues of the campaign. This product of the candidates in West Flanders, including the Flemish minister of Interior Affairs Geert Bourgeois, generated a total Klout score of 778 from tweets and retweets with negative or sarcastic comments.
Yet an adjacent topic, about a cameraman from Bruges who is surprised by minister Bourgeois’ enthusiasm, generates a moderately positive Klout score of 123.

Three topics out of 49 generate 20.6% of the total Klout score!

This illustrates perfectly how the Twitter community selects and reinforces topics that carry emotional value: the YouTube video and the De Clerck hoax generated a share of voice of nearly 17% of the tweets.
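The arithmetic behind these percentages can be sketched in a few lines. This is only an illustration using the Klout totals quoted in this post (426, 967 and 778); back-deriving the overall total from the 20.6% figure is my own assumption about how that number was computed.

```python
# Share-of-voice sketch from the three topics discussed in this post.
topic_scores = {
    "Dewever addresses the French-speaking community": 426,
    "De Clerck swastika hoax": 967,
    "Ridiculed N-VA YouTube video": 778,
}

top3_total = sum(topic_scores.values())   # 2171
overall_total = top3_total / 0.206        # back-derived from the 20.6% figure

for topic, score in topic_scores.items():
    print(f"{topic}: {score / top3_total:.0%} of the top-3 score")
print(f"Top 3: {top3_total} of roughly {overall_total:.0f} total Klout score")
```

Note how the two emotionally charged topics (the hoax and the video) dwarf the politically substantial one in this breakdown.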

Forgive me for reducing the geographical scope to Flanders, the political scope to just one party and the tweets to only three: this blog does not pretend to present the full enchilada. I hope today’s contribution has demonstrated that topics, and the way they are perceived and handled, can vary greatly in impact and cannot be entirely reduced to numbers. In other words, the human interpreter will deliver added value for quite some time to come.

Tuesday 20 May 2014

The Last Mile in the Belgian Elections (III)

Awesome Numbers... Big Data Volumes

Wow, the first results are awesome. Well, er, the first calculations at least are amazing.

  • 8,500 tweets measured per 15 seconds means roughly 1.5 billion tweets per month if you extrapolate in a very rudimentary way...
  • At 2 KB per tweet, that is about 2.8 terabytes of input data per month by the same reasoning. Quite impressive for a small country like Belgium, where Twitter adoption is not on par with the northern countries.
  • If you use 55 kilobytes for a model vector of 1,000 features, you generate 77 terabytes of information per month.
  • And 55 KB is a small vector. A normal feature vector of one million features generates 72 petabytes of information per month.
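The extrapolation above can be reproduced with some back-of-envelope arithmetic. The assumptions (a constant tweet rate, 2 KB per raw tweet, 55 KB per 1,000-feature vector) are taken from the bullets; the raw arithmetic lands slightly below the rounded figures in the post, consistent with the "very rudimentary" extrapolation described.

```python
# Back-of-envelope version of the volume extrapolation above.
TWEETS_PER_15S = 8_500
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

tweets_per_month = TWEETS_PER_15S * SECONDS_PER_MONTH / 15
raw_tb = tweets_per_month * 2 / 1024**3       # 2 KB per tweet -> terabytes
vector_tb = tweets_per_month * 55 / 1024**3   # 55 KB vector per tweet -> terabytes

print(f"{tweets_per_month / 1e9:.2f} billion tweets per month")
print(f"{raw_tb:.1f} TB of raw input per month")
print(f"{vector_tb:.0f} TB of model vectors per month")
```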

And wading through this sea of data you expect us to come up with results that matter?
Yes.
We did it.

Male versus female tweets in Belgian Elections
Gender analysis of tweets in the Belgian elections (n = 4,977 tweets)

Today we checked the gender differences

The Belgian male Twitter species is clearly more interested in politics than the female variant: only 22% of the last 24 hours’ tweets were of female signature; the remaining 78% were of male origin.
This is not because Belgian women are less present on Twitter: 48% of sources are female against 52% male.
Analysing the first training results for irony and sarcasm also shows a male bias: the majority of the sarcastic tweets, 95 out of 115, were male. Only 50 of the 115 were detected by the data mining algorithms, so we still have some training to do.
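Expressed as a quick calculation using only the counts quoted above, the detector is still well short of the 70% target mentioned in the previous post:

```python
# Counts from the post: 115 tweets labelled sarcastic, of which
# 95 were male and only 50 were caught by the algorithms.
labelled_sarcastic = 115
male_sarcastic = 95
detected = 50

male_share = male_sarcastic / labelled_sarcastic
recall = detected / labelled_sarcastic

print(f"male share of sarcastic tweets: {male_share:.0%}")  # 83%
print(f"detector recall: {recall:.0%}")                     # 43%
```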
More news tomorrow!

Monday 19 May 2014

The Last Mile in the Belgian Elections (II)

Getting Started


I promised to report on my activities in social analytics. For this report I will try to wear the shoes of a novice user and report, without holding anything back, on this emerging discipline. I explicitly use the word “emerging” because the field shows all the hallmarks of one: technology enthusiasts will have no problem overlooking the quirks that prevent an easy end-to-end “next-next-next” solution. Because there is no user-friendly wizard to guide you through selecting the sources, setting up the target, creating the filters and optimising the analytics for cost, sample size, relevance and validity checks, I will have to go through the entire process in an iterative and sometimes trial-and-error way.
This is how massive amounts of data enter the file system
Over the weekend and today I have been mostly busy doing just that. Tweet intakes ranged from taking in 8,500 Belgian tweets in 15 seconds and doing the filtering locally on our in-memory database, to pushing all filters to the source system and getting 115 tweets in an hour. But finally we arrived at an optimum query result, and the Belgian model can be trained. The first training we will set up is detecting sarcasm and irony. With properly developed and tested algorithms we hope for 70% accuracy in finding tweets that express exactly the opposite sentiment of what the words say. Tweets like “well done, a**hole” are easy to detect, but it’s the ones without an explicit reference to the important part of the human digestive system that are a little harder.
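The two intake strategies described above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the field names, keywords and query syntax are all hypothetical.

```python
# Hypothetical keyword set standing in for the real election filters.
ELECTION_TERMS = {"verkiezingen", "n-va", "dewever", "stemtest"}

def is_relevant(tweet: dict) -> bool:
    """Local filter: keep tweets mentioning any election term."""
    text = tweet["text"].lower()
    return any(term in text for term in ELECTION_TERMS)

def filter_locally(stream):
    """Strategy 1: broad intake (8,500 tweets / 15 s), filtered in memory."""
    return [t for t in stream if is_relevant(t)]

def source_query(terms):
    """Strategy 2: push the filter down into the source query, so far
    fewer tweets (115 per hour) ever reach the local system."""
    return " OR ".join(sorted(terms))

tweets = [{"text": "Dewever spreekt de Franstaligen toe"},
          {"text": "Mooi weer vandaag"}]
print(filter_locally(tweets))   # only the first tweet survives
print(source_query(ELECTION_TERMS))
```

The optimum mentioned above presumably lies somewhere between the two extremes: push enough filtering to the source to keep volumes manageable, while keeping enough local control to tune relevance iteratively.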
The cleaned output is ready for the presentation layer
Conclusion of this weekend and today: don’t approach social analytics like any other data mining or statistical project, because taming social media data is an order of magnitude harder than crunching the numbers in stats.

Let’s all cross our fingers and hope we can come up with some relevant results tomorrow.

Wednesday 14 May 2014

The Last Mile in the Belgian Elections

Sentiment Analysis, a Predictor of the Outcome?


Data2Action is an agile data mining platform consisting of efficiently integrated components for rapid application development. One deliverable of Data2Action is SAM, for Social Analytics and Monitoring.
In the coming days I will publish the daily results of sentiment analysis on Twitter with regard to the programmes, the major candidates and interest groups.

Data2Action and social analytics

Stay tuned for the first report on Monday 19th May

Questions we will address include:

  • Which media produce the most negative or positive tweets about which party, which major candidate?
  • Who are the major influencers on Twitter?
  • What are the tweets with the highest impact?
The major networks will trigger lots of tweets this weekend, so we will present the analysis next Monday.

Saturday 3 May 2014

What has Immanuel Kant got to do with it?

Making a Success of New BI Tool Introduction


In the previous post I indicated the five major reasons why BI consultants fail to introduce a new BI tool in an organisation. As promised, I have not just raised questions but am ready to provide some answers.
Some of my colleagues in Business Intelligence commented on the LinkedIn discussion forum. I will quote their comments and integrate them in this post.
It is all about embedding the tool in a larger setting, larger than the competences of one BI specialist.
Some people won’t like reading this. The reason is simple: positioning the BI tool within a very broad, organisation-wide vision goes beyond the competences of a technical project lead. The approach requires teamwork and the input of business analysts, strategic consultants and change managers. It requires more time and budget, and both are scarce resources in any organisation.
But if you look at the time and money wasted on remedial efforts to get the new BI tool on the road, you can consider the extra effort and resources an insurance premium. Because you can only make a first impression once.

These are the seven steps to successful introduction I will address in the article on my book site BA4BI:

  • Get a deep insight into the organisation’s DNA
  • Understand its strategy
  • Understand its information needs
  • Assess the acceptance of information modelling in the organisation
  • Translate the previous steps into the tool’s requirements
  • Introduce the tool
  • Develop a decision-making culture with the new tool

Friday 2 May 2014

Questions to Ask Ralph Kimball on 10 June in 't Spant in Bussum (Neth.)

Dear Ralph,

I know you’re a busy man, so I won’t take too much of your time with this post. I look forward to meeting you on 10 June in 't Spant in Bussum for an in-depth session on Big Data and your views on the phenomenon.
In one of your keynotes you will present your vision of how Big Data drives Business and IT to adapt and evolve. Let me first of all congratulate you on the title of your keynote. It proves that a world-class BI and data warehouse veteran is still on top of things, which we can’t say of some other gurus of your generation, but let’s not dwell on that.
I have been studying the Big Data phenomenon from my narrow perspective of business analysis and BI architecture, and here are some of the questions I hope we can tackle during your keynote session:

1. Do you consider Big Data something you can fit entirely into star schemas? I have known since The Data Webhouse Toolkit days that semi-structured data like web logs can find a place in a multidimensional model, but some Big Data output is, to my knowledge, not fit for persistent storage. Yet I believe that a derived form of persistent storage may be a good idea. Let me give you an example. Imagine we can measure the consumer mood for a certain brand on a daily basis by scanning social media postings. Instead of creating a junk-like dimension, we could build a star schema with the following dimensions: a mood dimension, a social media source dimension, plus time, location and brand dimensions to name the minimum, and a central fact table with the mood score on a seven-point Likert scale. The real challenge will lie in correctly converting the text strings into the proper Likert score using advanced text analytics. Remember the misinterpretation of the Osama Bin Laden tweets in early May 2011? The program interpreted “death” as a negative mood while the entire US was cheering the expedient demise of the terrorist.
Figure 1: An example of derived Big Data in a multidimensional schema
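The derived mood star schema sketched in question 1 can be prototyped in a few lines. This is only a sketch: the brand and source values are invented, the column names are my own, and pandas DataFrames stand in for the dimensional database.

```python
import pandas as pd

# Dimension tables of the hypothetical mood star schema.
dim_source = pd.DataFrame({
    "source_key": [1, 2],
    "source": ["Twitter", "Facebook"],
})
dim_brand = pd.DataFrame({
    "brand_key": [1, 2],
    "brand": ["BrandA", "BrandB"],
})

# Central fact table: one mood score (1-7 Likert scale, derived
# from text analytics) per day, source and brand.
fact_mood = pd.DataFrame({
    "date": pd.to_datetime(["2014-05-20", "2014-05-20", "2014-05-21"]),
    "source_key": [1, 1, 2],
    "brand_key": [1, 2, 1],
    "likert_score": [6, 3, 2],
})

# Resolve the star: join the facts to their dimensions and aggregate.
report = (fact_mood
          .merge(dim_source, on="source_key")
          .merge(dim_brand, on="brand_key")
          .groupby("brand")["likert_score"].mean())
print(report)
```

The point of the derived form is visible even in this toy: the volatile raw postings are discarded, and only the structured, aggregatable Likert scores persist.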

2. How will you address the volatility issue? Because Big Data’s most distinguishing feature is not volume, velocity or variety, which have always been relative to the state of the art. No, volatility is what really characterises Big Data, and I refer to my article here where I point out that Big Behavioural Data is the true challenge for analytics, as emotions and intentions can be very volatile and the Law of Large Numbers may not always apply.
3. Do you see a case for data provisioning strategies to address the above two issues? By data provisioning I mean a transparent layer between the business questions and the BI architecture, where ETL provides answers to routine or planned business questions and virtual data marts answer ad hoc and unplanned business questions. If so, what are the major pitfalls of virtualization for Big Data Analytics?
4. Do you see the need for new methodologies and new modeling methods or does the present toolbox suffice?

It’s been a while since we met and I really look forward to meeting you in Bussum, whether you answer these questions or not. 


Kind regards,

Bert Brijs

Tuesday 15 April 2014

Why Business Intelligence Consultants Fail at Introducing New BI Tools

Many studies, surveys and other empirical evidence show that between 25 and 45% of all Business Intelligence (BI) software is deprecated and ends up as shelfware. This is less the fault of aggressive software vendors than of BI consultants who fail to look past at least five blinders. And whether the BI consultants involved were internal staff or outsourced doesn’t affect the outcome.
There are many causes of failed BI tool introductions, ranging from “no executive sponsorship” via “an underdeveloped business case” to “not managing the scope properly”, but none of these is at the root of the problem. The BI project leader and the BI business analyst are responsible for managing these risks.
Let’s have a look at the five blinders that are at the root of poorly managed tool introductions:

  • Focus on the technical aspects
  • Not enough problem owners identified
  • Managing expectations poorly
  • Too much power in the hands of the BI consultant
  • Pouring old wine in new bags

Focus on the technical aspects

A large publishing company asked us to compare two market-leading BI tools. There were two technology factions in the IT department, and we were asked to act as the objective referee between the two opposing camps. After days and days of intensive study, testing and two proofs of concept, the differences were negligible and we advised the customer to let the total cost of ownership decide which would be the tool of choice. But the two factions kept their hawkish stance on the tool of their choice.
We then tried to convince them to quit the technical discussions and take the business questions and the end-user experience into account. The reaction of both parties pointed at the real problem: “Managers all ask the same questions, so this approach is irrelevant.” In their vision the business questions were generic but the tool was unique. Never was the distance between IT and the business users wider. This lack of mutual understanding between business and IT is still prevalent in many organisations. Only age-old mail-order companies and pure-play e-commerce enterprises have passed or skipped this development stage.

Not enough problem owners identified

If the BI consultants are unable to identify sufficient problem owners, the advice should be to postpone the tool introduction and, if that is not possible, at least to invest heavily in detecting and even creating problem owners. In too many cases the finance and administration (F&A) department is the sole problem owner in a finance BI project. As if HRM, marketing and operations have nothing to do with revenue, variable and fixed costs…
During one of our BI audits at a third-party logistics company we found that F&A had knowledge of only 27% of the total information needs. Yet the other 73% were, to a greater or lesser degree, related to F&A information requirements… Needless to say, this approach produces a weak foundation for a durable BI architecture.
Conclusion: the entire organisation needs to own the information management problems and delegate these to the project steering committee.

Managing expectations poorly

During an audit of a customer relationship management (CRM) system at a large PC distributor, we noticed it was impossible to create a product profile per customer. There was an order history per customer, but these records only contained article number, price and quantity. If this multi-million-euro organisation had foreseen an extra field for category management, this information would have carried some meaning. Because who can tell whether a 1 GB hard disk in 1998 was sold as a consumer or a professional product?
The lesson to be learned: before you accept any information requirement, validate it against the available source data, or else you may create expectations you can never meet.

Too much power in the hands of the BI consultant

We can all live with the fact that there are no objective consultants who can deliver totally value-free advice based on scientific evidence. But one type of advice should always raise your suspicion: that of “consulting-intensive” BI tools. Any specialist knows there are many degrees between “download, install and DIY” on one side and “welcome to the consultants who will never leave” on the other. So make sure you choose a tool that can stay on board for at least five years. BI technology may produce a real or hyped breakthrough at every Gartner Summit, but organisations need to be able to follow, adopt and adapt to the new technology to make maximum use of it.
I sometimes wonder whether there is a correlation between the size of the consulting organisation and the size of the BI solution they bring to the table.
My advice: visit a comparable organisation and see what their experience is with the BI tool and the consulting organisation. It will put the PowerPoints of your BI consultants in perspective.

Pouring old wine in new bags

To paraphrase Bill Clinton: “It’s the user, stupid!” Unlike ERP systems, where negative feedback is the norm and users are forced to work with the system, BI is a world of user motivation and positive feedback. If you avoid using the ERP system, essential documents and information like purchase orders, inventory status or invoices won’t be produced correctly, and somebody higher up the chain of command will have an urgent conversation with you. But if you avoid using the BI system because it is too difficult or doesn’t answer your questions, probably no one in the organisation will know, even if they use the monitoring tools. These tools will only tell them whether you opened the cube or the report; they say nothing about the influence of the presented facts on your decision-making process.
The possibilities to explore and exploit data are limited only by the availability of the data and your analytical capabilities. “Availability” should be translated as “usable, verifiable, quality-checked, well-defined and traceable data that can support fact-based decision making”; otherwise it is just… lots of data. Meaningless data can be very demotivating for end users. Think about that when you are setting up report bursting, for example. The simple interaction between user effort and new insights is what motivates users to come up with better decisions and smarter information requirements for new iterations. This simple interaction lifts the entire organisation to a higher BI maturity level.
It is also a plea for agile BI (read my agile BI manifesto here), because too many projects fail to deliver functionality within the time perspective of the user. If users are not on board with the new system quickly, they will be reluctant to trade in their spreadsheet and calculator for something new that does not meet their expectations.

Conclusion: introducing a new BI tool needs careful, organisation-wide change management. Anyone who thinks they can do with less will end up with nothing.
In the next post I will suggest a few remedies to increase the success factor of a new BI tool introduction. Stay tuned!