The Flemish Parliament’s Predictions
Scope management is important if you are on a tight budget
and your sole objective is to prove that social media analytics is a journey
into the future. That is why we concentrated on Flanders, the northern part of
Belgium. (Even so, the outcome of the elections for the Flemish parliament will
shape events at the Belgian level: if the N-VA wins significantly, it can
impose some of its radical measures to get Belgium out of the economic slump,
measures that are not much appreciated in the French-speaking south.) In commercial terms, this last week of
analytics would have cost the client 5.7 man-days of work. Compare this to the
cost of an opinion poll, and it is clear that Twitter analytics offers a valid add-on to opinion polls,
because it can be done on a continuous basis. A poll is a photograph
of the situation, while social media analytics show the movie.
From Share-of-Voice to Predictions
It’s been a busy week. Interpreting tweets is not a simple
task, as we illustrated in the previous blog posts, and today the challenge
gets even bigger. Predicting the election outcome in the northern, Dutch-speaking
part of Belgium on the basis of topic-related sentiment analysis
is like base-jumping knowing that not one, but six guys have packed your
parachute. And these six guys are totally biased. Here are their names, in
alphabetical order, in case you think I am biased:
| Dutch name | Name used in this blog post |
|---|---|
| CD&V (Christen Democratisch en Vlaams) | Christian democrats |
| Groen | Green (the ecologist party) |
| N-VA (Nieuw-Vlaamse Alliantie) | Flemish nationalists |
| O-VLD (Open Vlaamse Liberalen en Democraten) | Liberal democrats |
| SP-A (Socialistische Partij Anders) | Social democrats |
| VB (Vlaams Belang) | Nationalist & Anti-Islam party |

Table 1. Translation of the original Dutch party names
The consensus from the opinion polls is that the Flemish
nationalists can obtain more than 30 %, although the latest poll showed a
downward break in the trend. The Nationalist & Anti-Islam party is expected to lose further and
become smaller than the Green party. We did not include the
extreme-left party PVDA in our analysis, for the simple reason that it was almost
non-existent on Twitter, and the confusion with the Dutch social democrats (the PvdA in the Netherlands)
would have required a tedious filtering job, which is fine if you have a budget for it. Since
that was not the case, we skipped the PVDA as well as any other exotic
outsider. Together with the blank and invalid votes, these parties may account for a
significant percentage, which will show up at the end of the calculations. But
the objective of this blog post is to examine whether the share of voice on Twitter
can approximate the market shares, to detect the
mechanics behind possible anomalies and to report on the user experience, as we
explained at the outset of this Last Mile series of posts.
If we take the raw share-of-voice data from more than
43.000 tweets, we see some remarkable deviations from the consensus.
| Party | Share of voice on Twitter |
|---|---|
| Christian democrats | 21,3 % |
| Green (the ecologist party) | 8,8 % |
| Flemish nationalists | 27,9 % |
| Liberal democrats | 13,6 % |
| Social democrats | 12,8 % |
| Nationalist & Anti-Islam party | 11,3 % |
| Void, blank, mini parties | 4,3 % |
Table 2. Percentage share of voice on Twitter per Flemish party
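The post does not describe how tweets were assigned to parties. As a minimal sketch, assuming that share of voice means each party's fraction of all party mentions, a keyword filter could produce percentages like those in Table 2. The keyword lists, `PARTY_KEYWORDS`, `PATTERNS` and `share_of_voice` are all hypothetical, and the sketch covers only the six parties, not the void and blank votes.

```python
import re
from collections import Counter

# Hypothetical keyword lists per party; the filters actually used for this
# analysis are not described in the post.
PARTY_KEYWORDS = {
    "Christian democrats": ["cd&v"],
    "Green": ["groen"],
    "Flemish nationalists": ["n-va", "nva"],
    "Liberal democrats": ["open vld", "openvld"],
    "Social democrats": ["sp.a"],
    "Nationalist & Anti-Islam party": ["vlaams belang"],
}


def _compile(words):
    """Case-insensitive pattern that matches any keyword as a whole token."""
    alternatives = "|".join(re.escape(w) for w in words)
    # (?<!\w) and (?!\w): the keyword may not sit inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


PATTERNS = {party: _compile(words) for party, words in PARTY_KEYWORDS.items()}


def share_of_voice(tweets):
    """Percentage of all party mentions that goes to each party.

    A tweet that mentions several parties counts once for each of them.
    """
    counts = Counter()
    for tweet in tweets:
        for party, pattern in PATTERNS.items():
            if pattern.search(tweet):
                counts[party] += 1
    total = sum(counts.values())
    if total == 0:
        return {party: 0.0 for party in PATTERNS}
    return {party: 100.0 * counts[party] / total for party in PATTERNS}


# Toy tweets, not taken from the 43.000-tweet data set.
example = [
    "Stem N-VA!",
    "Debat tussen CD&V en Groen",
    "Niets over politiek",
    "#nva wint",
]
print(share_of_voice(example))
```

The lookarounds stop a keyword from matching inside a longer word, which is the cheap part of the filtering job; genuinely ambiguous names, such as the PVDA/PvdA clash mentioned above, still need hand-made rules.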
It is common practice nowadays to combine the results of multiple models
instead of relying on just one. This is better not only in statistics: Nobel Prize
winner Kahneman has shown it clearly in his work as well. Here we combine the
Twitter share-of-voice model with other, independent models to arrive at a final one.
We use the opinion polls to derive the covariance matrix.
Table 3. The covariance matrix with the shifts in market shares
This
allows us to answer questions such as: if one party’s share grows, at which party’s
expense does it grow? The Flemish nationalists grow mainly at the expense
of the Liberal democrats and the Nationalist & Anti-Islam party, while they win
fewer followers from the Christian democrats and the Social democrats. Green
and the Nationalist & Anti-Islam party were very volatile across the opinion polls,
which partly explains their spurious correlations with other
parties.
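One plausible reading of Table 3 is the covariance of the change in each party's share from one poll to the next. The sketch below computes that with NumPy on toy numbers; the `polls` array and the `shift_covariance` helper are hypothetical and do not reproduce the actual Flemish polls. A negative entry means that when one party gains, the other tends to lose.

```python
import numpy as np

# Toy poll results (rows = successive polls, columns = parties A, B, C),
# each row summing to 100 %. These are NOT the actual Flemish polls.
polls = np.array([
    [30.0, 40.0, 30.0],
    [32.0, 38.0, 30.0],
    [31.0, 38.0, 31.0],
    [34.0, 36.0, 30.0],
])


def shift_covariance(polls):
    """Covariance matrix of the share changes between consecutive polls."""
    shifts = np.diff(polls, axis=0)      # one row per poll-to-poll change
    return np.cov(shifts, rowvar=False)  # columns are treated as the parties


cov = shift_covariance(polls)
print(np.round(cov, 2))
```

Because the shares in every poll add up to 100 %, each row of this matrix sums to zero, so the matrix is singular. That is not a problem for the Black-Litterman form used in this post, which only inverts the view-uncertainty term plus the scaled covariance.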
Graph 1. Overview of all opinion poll results: the evolution of the market shares in the different opinion polls over time.
Directly comparing opinion polls from different
research organisations, based on different samples, is simply not possible. But if you
combine all the numbers in a mathematical model, you can smooth out a large part of
these differences and obtain a central tendency.
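The post does not say how the central tendency of the polls was computed. One simple possibility, assumed here, is a weighted average in which each poll counts in proportion to its sample size. `central_tendency` and the example numbers are hypothetical.

```python
import numpy as np


def central_tendency(shares, sample_sizes):
    """Sample-size-weighted mean share of each party across several polls.

    shares: (n_polls, n_parties) percentages, each row summing to 100.
    sample_sizes: respondents per poll, used as (hypothetical) weights.
    """
    shares = np.asarray(shares, dtype=float)
    weights = np.asarray(sample_sizes, dtype=float)
    weights = weights / weights.sum()
    return weights @ shares  # weighted average of the poll rows


# Two toy polls of 1000 and 3000 respondents (not real data).
tendency = central_tendency([[30.0, 40.0, 30.0], [34.0, 36.0, 30.0]], [1000, 3000])
print(tendency)
```

Weighting by sample size simply trusts larger polls more. Recency weights or corrections for each research organisation's systematic bias would be natural refinements, but the post does not say whether any were used.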
To combine the different models, we use an adaptation of
the Black-Litterman model used in finance. We violate some of its assumptions,
such as general market equilibrium, which we replace with an entirely different
concept: the opinion polls. However, the elegance of this approach allows us to
take into account opinions, the confidence in those opinions and complex
interdependencies between the parties. The mathematical gain is worth
sacrificing the theoretical underpinning.
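This is based on a variant of the Black-Litterman model:

$$\mu = \Pi + \tau \Sigma P^{\top} \left( \Omega + \tau P \Sigma P^{\top} \right)^{-1} \left( p - P \Pi \right)$$

where Π is the prior (here replaced by the opinion polls), Σ the covariance matrix, p the vector of views, P the matrix linking each view to the parties, Ω the uncertainty of the views and τ a scaling constant.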
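The posterior mean μ of the Black-Litterman formula can be computed in a few lines of NumPy. This sketch assumes one absolute view per party (P is the identity matrix), a diagonal Ω and τ = 0.05, and treats the Twitter share of voice as the views p; all of these choices and all numbers are assumptions. The covariance matrix of Table 3 is not reproduced here, so the output will not match Table 4. `black_litterman_mean` is a hypothetical helper name.

```python
import numpy as np


def black_litterman_mean(prior, sigma, P, views, omega, tau=0.05):
    """Posterior mean mu = Pi + tau*Sigma*P^T (Omega + tau*P*Sigma*P^T)^-1 (p - P*Pi)."""
    prior = np.asarray(prior, dtype=float)
    views = np.asarray(views, dtype=float)
    P = np.asarray(P, dtype=float)
    scaled_sigma = tau * np.asarray(sigma, dtype=float)
    middle = np.asarray(omega, dtype=float) + P @ scaled_sigma @ P.T
    # Solve the linear system instead of forming the explicit inverse.
    correction = np.linalg.solve(middle, views - P @ prior)
    return prior + scaled_sigma @ P.T @ correction


# Toy inputs (hypothetical, not the values behind Table 4):
prior = np.array([33.0, 37.0, 30.0])   # central tendency of the polls, in %
views = np.array([36.0, 35.0, 29.0])   # e.g. Twitter share of voice, in %
sigma = np.array([[4.0, -3.0, -1.0],   # toy shift covariance; rows sum to 0
                  [-3.0, 4.0, -1.0],
                  [-1.0, -1.0, 2.0]])
P = np.eye(3)                          # one absolute view per party
omega = 2.0 * np.eye(3)                # uncertainty of each view
mu = black_litterman_mean(prior, sigma, P, views, omega)
print(np.round(mu, 2))
```

With P equal to the identity and a covariance matrix whose rows sum to zero, the correction term always sums to zero, so the posterior shares keep the same total as the polls. A smaller Ω (more confidence in the Twitter view) pulls μ towards p; a larger Ω keeps it close to Π.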
And the Final Results Are…
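| Party | Central tendency of all opinion polls | Data2Action’s prediction |
|---|---|---|
| Christian democrats | 18 % | 18,7 % |
| Green (the ecologist party) | 8,7 % | |
| Flemish nationalists | 31 % | 30,3 % |
| Liberal democrats | 14 % | 13,7 % |
| Social democrats | 13,3 % | 13,3 % |
| Nationalist & Anti-Islam party | 9,4 % | 9,8 % |
| Other (blank, mini parties,…) | 5,6 % | 5,4 % |
| Total | 100 % | 100 % |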
Table 4. Prediction of the results of the votes for the Flemish Parliament
Now let’s cross our fingers and hope we produced some
relevant results.