Friday 23 May 2014

The Last Mile in the Belgian Elections (VI)

Are Twitter People Nice People?


The answer is: “It depends”. In this article I build a taxonomy of the tweets from the last week of the Belgian election campaign. With over 35,000 tweets, we can be fairly confident that the sample is representative. Consider this article an introduction to tomorrow's headline: the last election poll, based on Twitter analytics.

A picture says more than a thousand tweets

The taxonomy of the Twitter community

So here it is. The majority of tweets are negative. When you encounter positive tweets, they come either from somebody who wants to market something (in the case of the elections, themselves or a candidate they support) or from somebody who is forwarding a link with a positive comment.
There is a correlation between the level of negativity about a subject and the political party related to the subject. From a political point of view, the polarisation between the Walloon socialist party and the Flemish nationalist party is clearly visible on Twitter.
Even today, on the day of the funeral of Jean-Luc Dehaene, the former Belgian prime minister and a well-respected politician of the older generation, the majority of tweets were negative. Tweets linking him to the financial scandal surrounding the Christian democrat trade union and Dexia were six times as numerous as the pious "RIP JLD" variants.
So how do you derive popularity, and even arrive at some predictive value, from a bunch of negative tweets? That, my dear blog readers, will be examined tomorrow in the final article.





Thursday 22 May 2014

The Last Mile in the Belgian Elections (V)

Why Sentiment Measures Alone Are Not Enough


While developing Social Analytics and Monitoring, we learnt something very interesting about sentiment analysis. Before we created Data2Action as a platform for data mining and developed SAM (Social Analytics and Monitoring), we studied many approaches.
Many of these just produced numbers expressing sentiment towards a brand, a person, a concept or a company, to name a few.
Isolated Sentiment Analysis is Meaningless
Such numbers can be too superficial to produce meaningful analytic results, so we recreated social constructs that match concepts. Analysing the sentiment of a construct element in the context of a topic is not a trivial task. But at least it approaches human judgement, and it can be trained to increase precision and relevance.
Today, I am not going to amaze you with Big Numbers but I’ll show you some examples of how we approach sentiment analysis with SAM.
Let’s take a few tweets about the N-VA party and examine how they are scored:
“The ultimate horror for companies and a torpedo for our welfare state: an anti N-VA coalition with the ecologist party”
“Another point where N-VA does not represent the Flemish people”
From a one-dimensional point of view, both tweets are negative for N-VA but the first is in fact meant as a positive, pro N-VA statement.
Let us look at this more complex tweet:
“Vande Lanotte opens up the coalition for the Green Party, wrong move as the voters already consider N-VA strong enough.”
The first part of the sentence, “Vande Lanotte opens up the coalition for the Green Party”, can be considered positive for Vande Lanotte and his socialist party sp.a. But the second part is negative. This shows the importance of parsing the sentence correctly and attributing scores as a function of viewpoints.
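To make the idea of scoring as a function of viewpoint concrete, here is a minimal sketch. It is not how SAM works internally; the `Clause` type, the `STANCE` table and the `score_for` function are hypothetical names invented for this illustration, and the polarities are assigned by hand rather than by a parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One parsed clause: what it talks about and how it judges it."""
    target: str    # entity or construct the clause is about
    polarity: int  # -1 negative, 0 neutral, +1 positive


# Hypothetical stance table: how a target relates to a viewpoint.
# +1 means the target is on the viewpoint's side, -1 means it opposes it.
STANCE = {
    ("N-VA", "N-VA"): 1,
    ("anti N-VA coalition", "N-VA"): -1,
}


def score_for(viewpoint: str, clauses: list[Clause]) -> int:
    """Sum clause polarities, flipped when the target opposes the viewpoint."""
    total = 0
    for clause in clauses:
        # Targets with no known relation to the viewpoint contribute nothing.
        stance = STANCE.get((clause.target, viewpoint), 0)
        total += stance * clause.polarity
    return total


# Hand-labelled versions of the first two example tweets (assumed labels).
tweet_1 = [Clause("anti N-VA coalition", -1)]  # "the ultimate horror ..."
tweet_2 = [Clause("N-VA", -1)]                 # "does not represent ..."

print(score_for("N-VA", tweet_1))  # negative about an opponent: pro N-VA
print(score_for("N-VA", tweet_2))  # negative about N-VA itself
```

Both tweets carry a negative polarity, yet from the N-VA viewpoint the first scores positive and the second negative, which is exactly the distinction a one-dimensional sentiment number misses.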



Wednesday 21 May 2014

The Last Mile in the Belgian Elections (IV)

How Topic Detection Leads to Share-of-voice Analysis


It was a full day of events on Twitter. Time to make an inventory of the principal topics and the buzz created on the social network in the Dutch-speaking community in the north of Belgium.
First, the figures: 10,605 tweets were analysed, of which 5,754 referred to an external link (i.e. a news site or another external web site like a blog, a photo album, etc.).
As the Flemish nationalist party leader Mr. Dewever from N-VA (the New Flemish Alliance in English) launched his appeal to the French-speaking community today, we focused on the tweets about, to and from this party.
A mere 282 tweets were deemed relevant for topic analysis. And here’s the first striking number: of these 282 tweets, only 16 contained a reactive response.
Tweets that provoked a reactive response are almost nonexistent

Some 49 topics each grouped several media sources and publications of all sorts. We will discuss three of them to illustrate how the relationship between topic, retweets, Klout score and added content makes some tweets more relevant than others. These are the three topics:

  • Dewever addresses the French speaking community via Twitter
  • Christian Democrat De Clerck falsely accuses N-VA of using fascist symbols in an advertisement
  • YouTube movie from N-VA is ridiculed by the broad community

Dewever addresses the French speaking community via Twitter

This topic is divided into one moderately positive headline and two neutral ones. The positive one: Bart Dewever to the French Speaking Community: “Give N-VA a Chance”
This headline generates a total Klout score of 188, of which the Flemish TV station VRT takes the biggest chunk with a Klout score of 158.
This neutral headline generates a Klout score of only 98: “Dewever puts the struggle between N-VA and the French speaking socialist party at the centre of the discussion”
The other neutral headline, “N-VA President Bart Dewever addresses the French speaking community directly”, delivers a higher Klout score of 140, partly because one of N-VA’s members of Parliament promoted the link to the news medium.
All in all, with a total Klout score of 426, this topic does not cause great ripples, especially if you compare it to a mere anecdote, which is the second topic.

Christian Democrat De Clerck falsely accuses N-VA of using fascist symbols in an advertisement

On the left, the swastika hoax with the Christian democrat's comment; on the right, the original ad showing a labyrinth

Felix De Clerck, son of the former Christian democrat minister of Justice Stefaan De Clerck, reacted to a hoax and was chastised for doing so. With a Klout score of 967, this caused a bigger stir, although its political relevance is a lot smaller than that of Dewever’s speech. Emotions can play a role even in simple and neutral retweets.


YouTube movie from N-VA is ridiculed by the broad community


Another high of the day was reached with an amateurish YouTube movie that parodied a famous Flemish detective series to highlight the major issues of the campaign. This product from the candidates in West Flanders, including the Flemish minister of Interior Affairs, Geert Bourgeois, generated a total Klout score of 778 from tweets and retweets with negative or sarcastic comments.
Yet an adjacent topic, about a cameraman from Bruges who is surprised by minister Bourgeois’ enthusiasm, generates a moderately positive Klout score of 123.

Three topics out of 49 generate 20.6% of the total Klout score!

This illustrates perfectly how the Twitter community selects and reinforces topics that carry an emotional value: the YouTube movie and the hoax from De Clerck together generated a share of voice of almost 17% of the total Klout score.
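The share-of-voice arithmetic can be retraced with a short sketch. The per-headline scores are the ones quoted in this post; the overall total across all 49 topics is not given here, so `ASSUMED_TOTAL_KLOUT` is an assumption back-calculated from the 20.6% figure, not a measured value. The topic labels and function names are also just illustrative.

```python
# Klout scores per headline, grouped by topic, as quoted in the post.
headline_scores = {
    "Dewever addresses the French speaking community": [188, 98, 140],
    "De Clerck hoax": [967],
    "N-VA YouTube movie": [778],
}

# ASSUMPTION: total Klout score over all 49 topics. The post does not
# state it; this value is inferred from "three topics generate 20.6%".
ASSUMED_TOTAL_KLOUT = 10_539


def topic_totals(scores: dict[str, list[int]]) -> dict[str, int]:
    """Add up the headline scores inside each topic."""
    return {topic: sum(values) for topic, values in scores.items()}


def share_of_voice(part: int, total: int) -> float:
    """Share of the total score, as a percentage."""
    return 100 * part / total


totals = topic_totals(headline_scores)
three_topics = sum(totals.values())
emotional = totals["De Clerck hoax"] + totals["N-VA YouTube movie"]

for topic, total in totals.items():
    print(f"{topic}: {total}")
print(f"three topics: {share_of_voice(three_topics, ASSUMED_TOTAL_KLOUT):.1f}%")
print(f"hoax + movie: {share_of_voice(emotional, ASSUMED_TOTAL_KLOUT):.1f}%")
```

With that assumed total, the two emotionally charged topics alone land between 16% and 17%, in line with the "almost 17%" above.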

Forgive me for reducing the scope to Flanders, the political scope to just one party and the topics to only three: this blog does not intend to present the whole enchilada. I hope today’s contribution has demonstrated that topics, and the way they are perceived and handled, can vary greatly in impact and cannot be entirely reduced to numbers. In other words, the human interpreter will deliver added value for quite a long time.

Tuesday 20 May 2014

The Last Mile in the Belgian Elections (III)

Awesome Numbers... Big Data Volumes

Wow, the first results are awesome. Well, er, the first calculations at least are amazing.

  • 8,500 tweets measured per 15 seconds means 1.5 billion tweets per month if you extrapolate in a very rudimentary way...
  • 2 KB per tweet = 2.8 terabytes of input data per month if you follow the same reasoning. Nevertheless, it is quite impressive for a small country like Belgium, where Twitter adoption is not on par with the northern countries.
  • If you use 55 kilobytes for a model vector of 1,000 features, you generate 77 terabytes of information per month.
  • 55 KB is a small vector. A normal feature vector of one million features generates 72 petabytes of information per month.
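The rudimentary extrapolation in the list above can be written out as a few lines of arithmetic. This is only a sketch under stated assumptions: a 30-day month, decimal units (1 TB = 10^9 KB), and vector size growing linearly with the number of features. The function and constant names are made up for the illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # ASSUMPTION: a 30-day month
KB_PER_TB = 10**9                      # ASSUMPTION: decimal units
KB_PER_PB = 10**12


def tweets_per_month(tweets_in_window: int, window_seconds: int) -> int:
    """Scale a short measurement window up to a whole month."""
    return tweets_in_window * SECONDS_PER_MONTH // window_seconds


def monthly_kb(tweets: int, kb_per_item: int) -> int:
    """Total kilobytes when every tweet costs kb_per_item kilobytes."""
    return tweets * kb_per_item


monthly_tweets = tweets_per_month(8500, 15)

raw_tb = monthly_kb(monthly_tweets, 2) / KB_PER_TB             # 2 KB per tweet
small_vec_tb = monthly_kb(monthly_tweets, 55) / KB_PER_TB      # 1,000 features
# ASSUMPTION: one million features cost 1,000 times the 55 KB vector.
large_vec_pb = monthly_kb(monthly_tweets, 55 * 1000) / KB_PER_PB

print(f"tweets per month: {monthly_tweets:,}")
print(f"raw input: {raw_tb:.2f} TB")
print(f"1,000-feature vectors: {small_vec_tb:.1f} TB")
print(f"1,000,000-feature vectors: {large_vec_pb:.1f} PB")
```

Under these assumptions the tweet count comes to 1,468,800,000, which rounds to the 1.5 billion in the list. The storage figures land in the same order of magnitude as the ones quoted; the exact values shift with how the tweet count is rounded and with the choice of decimal or binary units.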

And wading through this sea of data you expect us to come up with results that matter?
Yes.
We did it.

Male versus female tweets in Belgian Elections
Gender analysis of tweets in the Belgian elections n = 4977 tweets

Today we checked the gender differences

The Belgian male Twitter species is clearly more interested in politics than the female variant: only 22% of the tweets from the last 24 hours were of female signature; the remaining 78% were of male origin.
This is not because Belgian women are less present on Twitter: overall, 48% of tweets come from female sources against 52% from male sources.
Analysing the first training results for irony and sarcasm also shows a male bias. The majority of the sarcastic tweets were male: 95 out of 115. Only 50 were detected by the data mining algorithms, so we still have some training to do.
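To put the sarcasm numbers in perspective, here is the small calculation behind them. It assumes that the 115 sarcastic tweets are the complete, manually identified set and that the 50 detected tweets all belong to it; the post does not say so explicitly, and the variable names are illustrative.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


sarcastic_total = 115  # sarcastic tweets (ASSUMED: identified by hand)
sarcastic_male = 95    # of which written by male accounts
detected = 50          # found by the algorithms (ASSUMED: all true hits)

male_share = percentage(sarcastic_male, sarcastic_total)
detection_rate = percentage(detected, sarcastic_total)

print(f"male share of sarcastic tweets: {male_share:.1f}%")
print(f"detection rate (recall): {detection_rate:.1f}%")
```

Under these assumptions fewer than half of the sarcastic tweets are caught so far, which is why more training is needed.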
More news tomorrow!

Monday 19 May 2014

The Last Mile in the Belgian Elections (II)

Getting Started


I promised to report on my activities in social analytics. For this report, I will try to put myself in the shoes of a novice user and report on this emerging discipline without holding anything back. I explicitly use the word “emerging” as it has all the hallmarks of one: technology enthusiasts will have no problem overlooking the quirks that prevent an easy end-to-end “next-next-next” solution. Because there is no user-friendly wizard that can guide you from selecting the sources, setting up the target and creating the filters to optimising the analytics for cost, sample size, relevance and validity checks, I will have to go through the entire process in an iterative and sometimes trial-and-error way.
This is how massive amounts of data enter the file system
Over the weekend and today I have been mostly busy doing just that. Tweet intakes ranged from taking in 8,500 Belgian tweets in 15 seconds and filtering them locally in our in-memory database, to pushing all filters to the source system and getting 115 tweets in an hour. But finally, we arrived at an optimum query result and the Belgian model can be trained. The first training we will set up is detecting sarcasm and irony. With properly developed and tested algorithms we hope for 70% accuracy in finding tweets that express exactly the opposite sentiment of what the words say. Tweets like “well done, a**hole” are easy to detect, but the ones without the reference to that important part of the human digestive system are a little harder.
The cleaned output is ready for the presentation layer
Conclusion of this weekend and today: don’t start social analytics like any other data mining or statistical project, because taming social media data is an order of magnitude harder than crunching the numbers in stats.

Let’s all cross our fingers and hope we can come up with some relevant results tomorrow.

Wednesday 14 May 2014

The Last Mile in the Belgian Elections

Sentiment Analysis, a Predictor of the Outcome?


Data2Action is an agile data mining platform consisting of efficiently integrated components for rapid application development. One deliverable of Data2Action is SAM, for Social Analytics and Monitoring.
In the coming days, I will publish the daily results from sentiment analysis on Twitter with regard to the programmes, the major candidates and interest groups.

Data2Action and social analytics

Stay tuned for the first report on Monday 19th May

We will look at questions like:

  • Which media produce the most negative or positive tweets about which party, which major candidate?
  • Who are the major influencers on Twitter?
  • What are the tweets with the highest impact?
The major networks will stimulate lots of tweets this weekend so we will present the analysis next Monday.

Saturday 3 May 2014

What has Immanuel Kant got to do with it??

Making a Success of New BI Tool Introduction


In the previous post I indicated the five major reasons why BI consultants fail to introduce a new BI tool in the organisation. As promised, I have not just raised questions; I am also ready to provide you with some answers.
Some of my colleagues in Business Intelligence commented on the LinkedIn discussion forum. I will quote their comments and integrate them in this post.
It is all about embedding the tool in a larger setting, larger than the competences of one BI specialist.
Some people won’t like to read this. The reason is simple: positioning the BI tool in a very broad, organisation-wide vision goes beyond the competences of a technical project lead. The approach requires teamwork and input from business analysts, strategic consultants and change managers. It requires more time and budget, and both are scarce resources in an organisation.
But if you look at the time and money wasted in remedial efforts to get the new BI tool on the road, you can consider the extra effort and resources as an insurance premium, because you can only make a first impression once.

These are the seven steps to successful introduction I will address in the article on my book site BA4BI:

* Get a deep insight into the organisation’s DNA
* Understand its strategy
* Understand its information needs
* Assess the information modelling acceptance in the organisation
* Translate the above into the tool’s requirements
* Introduce the tool
* Develop the decision making culture with the new tool