Getting Started
I promised to report on my activities in
social analytics. For this report, I will try to put myself in the shoes of a novice
user and report, without holding anything back, on this emerging discipline. I deliberately
use the word “emerging” because it has all the hallmarks of it: technology enthusiasts
will have no problem overlooking the quirks that prevent an easy end-to-end “next-next-next”
solution. Because there is no user-friendly wizard that guides you through
selecting the sources, setting up the target, creating the filters and
optimising the analytics for cost, sample size, relevance and validity checks,
I will have to go through the entire process in an iterative and sometimes trial-and-error
way.
This is how massive amounts of data enter the file system
Over the weekend and today I have been
mostly busy doing just that. Tweet intake ranged from pulling in 8,500 Belgian tweets
in 15 seconds and filtering them locally in our in-memory database, to
pushing all filters to the source system and getting 115 tweets in an hour. But
finally we arrived at an optimal query result, and the Belgian model can now be
trained. The first training we will set up is for detecting sarcasm and irony. With
properly developed and tested algorithms we hope for 70% accuracy in
finding tweets that express exactly the opposite sentiment of what the words
say. Tweets like “well done, a**hole” are easy to detect, but the ones
without the reference to that important part of the human digestive system
are a little harder.
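To make the “easy” case concrete, here is a minimal Python sketch. It is not the algorithm we are training, just a naive rule I made up for illustration: flag a tweet when a praise phrase and an insult appear together. The word lists and the function name looks_sarcastic are hypothetical; a real model would learn its cues from labelled tweets.

```python
import re

# Hypothetical, hand-picked cues for illustration only.
PRAISE_CUES = ("well done", "great job", "nice work", "brilliant")
INSULT_PATTERN = re.compile(r"\b(a\*\*hole|idiot|moron)\b", re.IGNORECASE)


def looks_sarcastic(tweet: str) -> bool:
    """Naive rule: praise and an insult in the same tweet suggest sarcasm."""
    text = tweet.lower()
    has_praise = any(cue in text for cue in PRAISE_CUES)
    has_insult = INSULT_PATTERN.search(text) is not None
    return has_praise and has_insult


if __name__ == "__main__":
    for tweet in ("Well done, a**hole", "Well done, team!", "Oh great, another Monday"):
        print(tweet, "->", looks_sarcastic(tweet))
```

The last example is the hard case: it is arguably sarcastic, but without an explicit insult a rule like this has nothing to hold on to, and that is where a trained model has to do the work.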
The cleaned output is ready for the presentation layer
Conclusion of this weekend and today: don’t
start social analytics like any other data mining or statistical project, because taming
social media data is an order of magnitude harder than crunching the numbers
in stats.
Let’s all cross our fingers and hope we can
come up with some relevant results tomorrow.