
Friday, 1 March 2019

About Ends and Means, or Beginning and Ends...


It has been a while since I published anything on this blog. But after having been confronted with organisations that, from an analytics point of view, live in the pre-industrial era, I need to get a few things off my chest.
In these organisations (and they aren’t the smallest ones) ends and means are mixed up, and ends are positioned as the beginning of Business Intelligence. Let me explain the situation.

Ends are the beginning



sea ice
A metaphor for a critical look at reporting requirements: watching heavy drift ice
and wondering whether it comes from a land-based glacier or from an iceberg...

Business users formulate their requirements in terms of reports. That’s OK, as long as someone, an analyst, an architect or even a data modeller, understands that this is not the end of the matter; on the contrary.
Yet too many information silos have been created where this rule is ignored. If an organisation treats report requirements as the start of a BI project, it skips the questions and steps needed to produce a meaningful analytics landscape that can stand the test of time, with at least the following consequences:

  • New information silos emerge with an end-to-end infrastructure to answer a few specific business questions leaving opportunities for a richer information centre unexplored.
  • The cost per report becomes prohibitive. Unless you think € 60.000 to create one (1) report is a cinch…
  • Since the same data elements run the risk of being used in various database schemas, the extract and load processes pay a daily price in terms of performance and processing cost.

Ends and means are mixed up


A report is the result of an analytical process, combining data for activities like variance analysis, trend analysis, optimisation exercises, etc. As such it is a means to support decision making; so rather than accepting the report requirements at face value, some reverse engineering is advised:

What are the decisions to be made for the various managerial levels, based on these report requirements?

You may wonder why this obvious question needs to be asked, but be advised: some reports are the equivalent of a news report. The requestor might just want to know what is happening without ever drawing any conclusions, let alone linking any consequences to the data presented.

What are the control points needed by the controller to verify aspects of the operations and their link to financial results?

Asking this question almost always leads to extending the scope of the requirements. Controllers like to match data from various sources to make sure the financial reports reflect the actual situation.
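A control point can be as simple as an automated reconciliation between two sources. The sketch below (order identifiers and amounts are invented for illustration) flags exactly the kind of difference a controller would want explained before trusting a report:

```python
# Reconcile an operational source with the financial source:
# every order should eventually appear as an invoice.
operational_orders = [("ORD-1", 120.0), ("ORD-2", 80.0), ("ORD-3", 45.5)]
financial_invoices = [("ORD-1", 120.0), ("ORD-2", 80.0)]  # ORD-3 not yet invoiced

ops_total = sum(amount for _, amount in operational_orders)
fin_total = sum(amount for _, amount in financial_invoices)

# Orders with no matching invoice are the control-point exceptions.
unmatched = {oid for oid, _ in operational_orders} - {oid for oid, _ in financial_invoices}

print(ops_total - fin_total, unmatched)  # 45.5 {'ORD-3'}
```

In practice the same check runs as a query between source systems, but the principle is identical: the delta and its explanation belong in the requirements from day one.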

What are the future options, potential requirements and/or possibilities of the required data, enhanced with the data available in the sources?

This exercise is needed to discover analytical opportunities which may not be taken up at the moment for reasons such as insufficient historical data or a lack of the analytical skills needed to come up with meaningful results. But that must not stop the design from taking the data into scope from the start. Adding the data at a later stage will come at a far greater cost than the cost of the scope extension.

What is the basic information infrastructure to facilitate the above? I.e. what is the target model?

A star schema is the ideal communication platform between business and tech people.
Whatever modelling language you use, whatever technology you use (virtualisation, in-memory analytics, appliances, etc.), in the end the front-end tool will build a star schema. So take the time to build a logical star schema data model that can be understood by both technical people and business managers.
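To make the idea concrete, here is a minimal sketch of such a star: one fact table whose keys point at the surrounding dimensions, queried the way a front-end tool would. All table names, columns and sample values are invented for illustration.

```python
import sqlite3

# A minimal sales star schema: one fact table surrounded by dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    revenue      REAL
);
""")
conn.execute("INSERT INTO dim_date VALUES (20190301, '2019-03-01', 2019, 3)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme', 'EU')")
conn.execute("INSERT INTO fact_sales VALUES (20190301, 1, 1, 10, 250.0)")

# A business question phrased against the star: revenue per category and region.
rows = conn.execute("""
    SELECT p.category, c.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p  ON f.product_key  = p.product_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY p.category, c.region
""").fetchall()
print(rows)  # [('Hardware', 'EU', 250.0)]
```

The value of the exercise is exactly that both sides can read it: a business manager sees "revenue by product category and customer region", a developer sees joins on surrogate keys.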

What is the latency and the history needed per decision making horizon?

The latency question deals with a multitude of aspects and can take you to places you weren’t expecting when you were briefed about report requirements: (near) real-time analytics, in-database analytics, triple-store extensions to the data warehouse, complex event processing mixing textual information with numerical measures… As a project manager I’d advise you to handle this with care, as the scope may become unmanageable. But as an analyst I’d advise you to be aware of the potentially new horizons to explore.
The history question is more straightforward and deals with the scope of the initial load. The slower the business cycle, the more history you need to load to come up with useful data sets for time series analysis.

What data do we present via which interface to support these various decision types?

This question begs a separate article but for now, a few examples should make things clear.
  • Static reports for external stakeholders who require information for legal purposes,
  • Reports using prompts and filters for team leaders who need to explore the data within predetermined boundaries,
  • OLAP cubes for managers who want to explore the data in detail and get new insights,
  • A dashboard for C- level executives who want the right cockpit information to run the business,
  • Data exploration results from data mining efforts to produce valid, new and potentially useful insights in running the business.

If all these questions are answered adequately, we can start the data requirements collection as well as the source to target mappings.



Three causes, hard to eradicate


If your organisation shows one or more of these three causes, you have a massive change management challenge ahead that will take more than a few project initiation documents to remedy. If you don’t get full support from top management, you’d better choose: accept this situation and become an Analytics Sisyphus, or look for another job.

Project based funding

Government agencies may use the excuse that there is no other way than moving from tender to tender, but the French proverb “les excuses sont faites pour s’en servir” [1] applies. A solid data and information architecture, linked to the required capabilities and serving the strategic objectives of a government agency, can provide direction to these various projects.
A top performing European retailer had a data warehouse with 1.500 tables, among them eight (8!) different time dimensions. The reason? Simple: every BU manager had sovereign rule over his information budget and “did it his way”, to quote Frank Sinatra.

Hierarchical organisations

I already mentioned the study by Prof. Karin Moser, which introduces three preconditions for knowledge co-operation: reciprocity, a long-term perspective for the employees and the organisation, and breaking down hierarchical barriers. [2]
On the same pages I quote the authors Leliveld & Vink and Davos & Newstrom who support the idea that knowledge exchange based on reciprocity can only take place in organisational forms that present the whole picture to their employees and that keep the distance between co-workers and the company’s vision, objectives, customers etc. as small as possible.
Hierarchical organisations are more about power plays and job protection than about knowledge sharing, so the idea of one shared data platform from which everyone in the organisation extracts his own analyses and insights is an absolute horror scenario to them.

Process based support

Less visible but just as impactful: if IT systems are designed primarily for process support instead of also attending to the other side of the coin, decision support, then you have a serious structural problem. Unlocking value from the data may be a lengthy and costly process. Maybe you will find some inspiration in a previous article on this blog: Design from the Data.
In short: processes are variable and need to be flexible; what lasts is the data. Information objects like a customer, an invoice, an order, a shipment or a region are far more persistent than the processes that create or consume instances of these objects.




 [1]    Excuses are made to be used.
 [2]    Business Analysis for Business Intelligence, pp. 35-38, CRC Press, a Taylor & Francis company, October 2012.






Tuesday, 29 March 2016

Data Governance in Business Intelligence, a Sense of Urgency is Needed

The Wikipedia article on data governance gives a good definition and an overview of the related topics. But although you may find a few hints on how data governance impacts the business intelligence and analytics practice, the article is living proof that linking data governance to BI and analytics is not really on the agenda of many organisations.

Sure, DAMA and the like reserve space in their Body of Knowledge for governance, but it remains on the operational level, and data governance for analytics is considered a derived result of data governance for online transaction processing (OLTP). I submit to you that it should be the other way around. Data governance should start from a clear vision of what data, with which degree of consistency, accuracy and general quality, is needed to support the quality of the decision making process. In a second iteration this vision should be translated into a governance process for the source data in the OLTP systems. Once this vision is in place, the lineage from source to target becomes transparent, trustworthy and managed for changes. The derived result is then compliance with data protection, data security and auditability requirements imposed by legislation like Sarbanes-Oxley or the imminent EU directives on data privacy.

Two observations to make my point

Depending on the source, between 30 and 80 percent of all Business Intelligence projects fail. The reasons for this failure are manifold: setting expectations too high may be a cause, but the root cause that emerges after thorough research is distrust of the data itself, or of the way data are presented in context and defined for the decision maker’s use. Take the simple example of the object “Customer”. If marketing and finance do not use the same perspective on this object, conflicts are not far away. If finance considers anyone who has received an invoice in the past ten years a customer, marketing may have an issue with that if 90 % of all customers renew their subscription or reorder books within 18 months. Only clear data governance rules, supported by a data architecture that facilitates both views on the object “Customer”, will avoid conflicts.
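The two perspectives can coexist in one governed architecture, for instance as two documented views over the same invoice data. A minimal sketch follows; the ten-year and 18-month windows come from the example above, while the table names, dates and the fixed "as of" date are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (customer_id INTEGER, invoice_date TEXT);
INSERT INTO invoice VALUES (1, '2015-06-01'), (2, '2010-03-15'), (1, '2016-01-10');

-- Finance view: a customer is anyone invoiced in the past ten years.
CREATE VIEW v_customer_finance AS
    SELECT DISTINCT customer_id FROM invoice
    WHERE invoice_date >= date('2016-02-01', '-10 years');

-- Marketing view: a customer is anyone invoiced in the past 18 months.
CREATE VIEW v_customer_marketing AS
    SELECT DISTINCT customer_id FROM invoice
    WHERE invoice_date >= date('2016-02-01', '-18 months');
""")
finance   = {r[0] for r in conn.execute("SELECT * FROM v_customer_finance")}
marketing = {r[0] for r in conn.execute("SELECT * FROM v_customer_marketing")}
print(finance, marketing)  # {1, 2} {1}
```

Both departments query "customers", each through its own governed definition, and the governance rules document why the two counts legitimately differ.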
Another observation: only 15-25 % of decision making is based on BI deliverables. On the plus side it may mean that the remaining 75 % of decision making is focused on managing uncertainty or nonsystematic risk, which can be fine. But often it is rather the opposite: the organisation lacks scenario-based decision making to deal with uncertainty and uses “gut feeling” and “experience” to take decisions that could have been fact-based, if the facts were made available in a trusted setting.

Let’s spread awareness of data governance in BI


Many thanks in advance!

Monday, 26 May 2014

Elections’ Epilogue: What Have We Learned?

First the Good News: a MAD of 1.41 Gets the Bronze Medal of All Polls!

The results from the Flemish Parliament elections with all votes counted are:

Party                           Result (source: Het Nieuwsblad)    SAM’s forecast
(party not named)               20,48 %                            18,70 %
Green (Groen)                   8,7 %                              8,75 %
Flemish nationalists (N-VA)     31,88 %                            30,32 %
Liberal democrats (open VLD)    14,15 %                            13,70 %
(party not named)               13,99 %                            13,27 %
Nationalists (VB)               5,92 %                             9,80 %

Table 1. Results Flemish Parliament compared to our forecast

And below is the comparative table of all polls set against this result, with the Mean Absolute Deviation (MAD) expressing the level of variability in the forecasts. A MAD of zero means a perfect prediction. In this case, with the highest score at almost 32 %, the lowest at almost 6 % and only six observations, anything under 1.5 is quite alright.
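For the record, the MAD is simply the average of the absolute forecast errors. With the six party percentages from Table 1 it works out as follows:

```python
# Mean Absolute Deviation of SAM's forecast against the official results
# (the six party percentages from Table 1, in the same order).
actual   = [20.48, 8.70, 31.88, 14.15, 13.99, 5.92]
forecast = [18.70, 8.75, 30.32, 13.70, 13.27, 9.80]

# Average of the absolute deviations per party.
mad = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
print(round(mad, 2))  # 1.41
```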

Table 2. Comparison of all opinion polls for the Flemish Parliament and our prediction based on Twitter analytics by SAM.

Compared to 16 other opinion polls published by various national media, our little SAM (Social Analytics and Monitoring) did quite alright on a shoestring budget: in only 5.7 man-days we came up with a result competing with mega concerns in market research.
The Mean Absolute Deviation covers up one serious flaw in our forecast: the giant shift of voters from VB (the nationalist anti-Islam party) to N-VA (the Flemish nationalist party). This led to an underestimation of the N-VA result and an overestimation of the VB result. Although the model estimated the correct direction of the shift, it underestimated its proportion.
Had we used more data, we might have caught that shift and ended up even higher!

Conclusion

Social Media Analytics is a step beyond the social media reporting most tools limit themselves to nowadays. With our little SAM, built on the Data2Action platform, we have sufficiently proven that forecasting on the basis of a correct judgment of sentiment, even on just one source like Twitter, can produce relevant results in marketing, sales, operations and finance. Compared to politics, these disciplines deliver far more predictable data, as they can combine external sources like social media with customer, production, logistics and financial data. And the social media actors and opinion leaders certainly produce less bias in these areas than is the case in political statements. All this can be done on a continuous basis, supporting day-to-day management in communication, supply chain, sales, etc.
If you want to know more about Data2Action, the platform that made this possible, drop me a line: contact@linguafrancaconsulting.eu 

Get ready for fact based decision making 
on all levels of your organisation





Saturday, 1 June 2013

Book Review: Taming the Big Data Tidal Wave


By Bill Franks
The Wiley and SAS Business Series 2012

The Big Data Definition misses a “V”



Whenever I see a sponsored book, the little bird on my shoulder called Paranoia whispers “Don’t waste your time on airport literature”. But this time, I was rewarded for my stamina. As soon as I got through the first pages, stuffed with hype and “do or die” messages, the author started to bring nuanced information about Big Data.

I appreciate expressions of caution and reserve towards Big Data: most Big Data doesn’t matter (p.17) and The complexity of the rules and the magnitude of the data being removed or kept at each stage will vary by data source and by business problem. The load processes and filters that are put on top of big data are absolutely critical (p. 21).

They prove Franks knows his onions. Peeling away further in the first chapter, his ideas on the need for some form of standardisation are spot on.

But I still miss a clear and concise definition of what really distinguishes Big Data, as the Gartner definition Franks applies (Velocity, Volume and Variety) misses the “V” of “Volatility”. A statistician like Franks should have reflected on this aspect, because “Variety” and “Volatility” are the true defining aspects of Big Data.

Moving on to chapter two, where Franks positions web data as the original Big Data.

It’s about qualitative contexts, not just lots of text strings


It is true that web analytics can provide leading indicators for transactions further down the sales pipeline, but relying on web logs alone, without the context, may deliver a lot of noise in your analysis. Here again, Franks is getting too excited to be credible, for two reasons: in the case of non-registered customers you are analysing the behaviour of a PC, and even when you know the PC user, you are missing loads of qualitative information to interpret the clicks. Studies with eye cameras analysing promotions and advertising have shown that you can optimise layout and graphics using eye movements combined with qualitative interviews, but there is no direct link between “eyeballs and sales”. Companies like Metrix Lab, who work with carefully selected customer panels, also provide clickstream and qualitative analytics, but to my knowledge using these results as a leading indicator for sales remains very tricky. Captions like Read your customers’ minds (p. 37) are nice for Hello magazine but a bit over the top.

I get Big Data analytical suggestions from a well-known online book store suggesting I buy a Bert doll from Sesame Street because my first name… is… you guessed? Imagine the effort and money spent to come up with this nonsense.

The airline example (p. 38-39) Franks uses is a little more complicated than shown in the book: ex post analysis may be able to explain the trade-offs between price and value the customer has made, but this ignores the auction mechanisms airlines use whenever somebody is looking and booking. Only control groups visiting the booking site with fixed prices, compared to the dynamic pricing group, may provide reliable information.

Simple tests of price, product, promotion etc. are ideal with this Big Data approach. But don’t expect explanations from web logs. The chapter finishes with some realistic promises in attrition and response management as well as segmentation and assessing advertising results. But it is the note at the end that explains a lot: The content of this chapter is based on a conference talk… (p. 51)


Chapter three suggests the added value of various Big Data sources. Telematics, text, time and location, smart grid, RFID, sensor, telemetry and social network data are all known examples but they are discussed in a moderate tone this time. The only surprise I got was the application of RFID data in casino chips. But then it has been a while since I visited COMDEX in Vegas.

Moving on to the second part, about technologies, processes and methods. It starts with a high-level, didactic, “for Dummies” kind of overview of data warehouses, massively parallel processing systems, SQL and UDFs, PMML, cloud computing, grid computing and MapReduce.

In chapter 5, the analytic sandbox is positioned as a major driver of analytic value, and rightly so. Franks addresses some architectural issues with the question of external versus internal sandboxes, but he is a bit unclear about when to use one or the other, as he simply states the advantages and disadvantages of both choices, adding the hybrid version as simply the sum of the external and internal sandbox (p. 125-130).

Why and when to choose one of the options isn’t mentioned. Think of fast exploration of small data sets in an external system versus testing and modifying a model with larger data sets in an internal system, for example.

When discussing the use of enterprise reusable datasets, the author does tackle the “When?” question. It seems this section has somewhat of a SAS flavour. I have witnessed enough “puppy dog” approaches by SAS sales teams to recognise a phrase like: There is no reason why business intelligence and reporting environments, as well as their users, can’t leverage the EADS (Enterprise Analytic Data Set, author’s note) structures as well (p. 145). This is where the EADS becomes a substitute for the existing (or to-be) data warehouse environment and SAS takes over the entire BI landscape. Thanks but no thanks: I prefer a best-of-breed approach to ETL, database technology and publication of analytical results over the camel’s nose. A sandbox should be a project-based environment, not a persistent BI infrastructure. You can’t have your cake and eat it.


The sixth chapter discusses the evolution of analytic tools and methods, and here Franks is way out of line as far as I am concerned. Many of the commonly used analytical tools and modelling approaches have been in use for many years. Some, such as linear regression or decision trees, are effective and relevant but relatively simplistic to implement (p. 154)? I am afraid I am lost here. Does Franks mean that only complex implementations produce value in Big Data? Or does he mean that the old formulas are no longer appropriate? Newsflash for all statisticians, nerds and number crunching geeks: better a simple model that is understood and used by the people who execute the strategy than a complex model (running the risk of overfitting and modelling white noise) that is not understood by the people who produce and consume strategic inputs and outputs. Double-blind tests between classical regression techniques and fancy new algorithms have often shown only slight or even negative added predictive value, because models can only survive if the business user adds context, deep knowledge and wisdom to the model.
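The overfitting point can be illustrated with a deterministic toy example (the numbers are invented, not from the book): a plain least-squares line versus a polynomial that passes exactly through every noisy training point. Both describe the training data, but only the simple model generalises beyond it.

```python
# Hand-picked toy data: the true relationship is y = 2x, with small fixed
# "noise" added so the example stays deterministic.
xs = [0, 1, 2, 3, 4]
ys = [0.3, 1.8, 4.4, 5.7, 8.2]

def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def interpolate(xs, ys):
    """Lagrange polynomial through every point: the overfit extreme."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

line, poly = fit_line(xs, ys), interpolate(xs, ys)
# Both fit the training range, but at the held-out point x = 6 (true y = 12)
# the simple line stays close while the interpolant, having modelled the
# noise, drifts far away.
print(round(line(6), 2), round(poly(6), 2))
```

The interpolating polynomial is the extreme case of "modelling white noise": it reproduces every wiggle of the sample and therefore says nothing trustworthy outside it.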

I remember a shoot-out in a proof of concept between the two major data mining tools (guess who was on that shortlist!) and the existing Excel 2007 forecasting implementation. I regret to say to the data mining tool vendors that Excel won. Luckily, a few pages further on, the author himself admits: Sometimes “good enough” really is! (p. 157)

The third part, about the people and approaches, starts off on the wrong foot: A reporting environment, as we will define it here, is also often called a business intelligence (BI) environment.

Maybe Franks keeps some reserve by using “is also often called”, but he nevertheless reverses a few things, which I am glad to restore to their former glory. Business Intelligence is a comprehensive discipline. It entails the architecture of the information delivery system, the data management, the delivery processes and their products: reporting, OLAP cubes, monitoring, statistics and analytics.

But he does make a point when he states that massive amounts of reports… amount to frustrated IT providers and frustrated report users. Franks’ plea for relevant reports (p. 182) does not, however, address the root cause.

That root cause is, in my humble opinion, that many organisations still use an end-to-end approach to reporting: building point solutions from data source to target BI report. That way, duplication and missed information opportunities pile up, because these organisations lack an architectural vision.


On page 183, Bill Franks makes a somewhat academic comparison between reporting and analysis which raises questions (and eyebrows).

Here is the comparison, with just one of the many comments I can make per item (as you are pressed for time):

  • Reporting provides data; analysis provides answers. So there are no data in analyses?
  • Reporting provides what is asked for; analysis provides what is needed. A report can answer both open and closed questions: deviations from the norm as answers to these questions, and trend comparisons of various KPIs leading to new analysis.
  • Reporting is typically standardised; analysis is typically customised. OK, but don’t underestimate the number of reports with ten or more prompts: reports or analytics? I don’t care.
  • Reporting does not involve a person; analysis involves a person. True for automated scoring in OLTP applications, but I prefer some added human intelligence, as the ultimate goal of BI is improved decision making.
  • Reporting is fairly inflexible; analysis is extremely flexible. OK Bill, this one’s on me. You’re absolutely right!


The book then reflects on what makes a great analytic professional and how to enable analytic innovation. What makes a great analytic professional and a great team? In a nutshell, it is very simple: the person who has the competence, commitment and creativity to produce new insights. He accepts imperfect base data, is business-savvy and connects the analysis to the granularity of the decision. He also knows how to communicate analytic results. So far so good. As for the analytic team discussion, I see a few discussion points, especially the suggested dichotomy between IT and analytics (pp. 245-247). It appears that the IT teams want to control and domesticate the creativity of the analytics team, but that is a bit biased. In my experience, analysts who can explain not only what they are doing and how they work but also what the value is for the organisation can create buy-in from IT.

Finally, Franks discusses the analytics culture. This is again a call to action for innovation and the introduction of Big Data analytics. The author sums up the barriers to innovation, which I assume are known to his audience.

Conclusion


Although not completely detached from commercial interests (the book is sold for a song, which says something about the intentions of the writer and the sponsors), Bill Franks gives a good C-level explanation of what Big Data is all about. It provides food for thought for executives who want to position the various aspects of Big Data in their organisation. Sure, it follows the AIDA structure of a sales call, but Bill Franks does it with a clear pen, style and elegance.

This book has a reason to exist. Make sure you get a free copy from your SAS Institute or Teradata account manager.

Thursday, 23 May 2013

What is Really “Big” about Big Data


Sometimes it is better to sit still and wait until the dust settles. Slowly but surely, the dust is settling around the Big Data buzz, and what I am looking at does not impress me.
There are a lot of sloppy definitions of Big Data around. I have read incredible white papers, books and articles which all boil down to: “We don’t know what it is, but it is coming! Be prepared (to open your wallet).” In this article, I attempt to put certain aspects of Big Data in a Business Intelligence perspective and arrive at a definition that is more formal and truly distinguishes Big Data from just being a bigger database. I have a few challenges for you that will hopefully elicit some response and debate. These are the main challenges, intentionally served as bullet points, as they will make bullet holes in some people’s assumptions:

  • Three V’s are not enough to describe the phenomenon
  • There is a word missing in the phenomenon’s naming
  • It has always been around: nihil novi sub sole
  • What’s really Big about Big Data


Between all the vaguely defined concepts, vendor-pushed white papers and publications, three “Vs” are supposed to define Big Data completely.
Gartner, always ready to define a new market segment as a target for analyses and studies as a new revenue source, introduced these three “Vs” around 2008.

 “Variety”, “Velocity” and “Volume” are supposed to describe the essence of Big Data. Let’s see if this holds any water. “Velocity”, the speed with which data should be processed, and “Volume” are relative and subject to technological advances. When I did analysis for a large telecom organisation in the mid-nineties, the main challenge was to respond fast and adequately to potential churn signals hidden in the massive amounts of call data records (CDRs). Those CDRs were the Big Data of the nineties.
Today, with the ever-increasing amounts of social, mobile and sensor data, velocity and volumes have increased, but so have the speed of massively parallel processing and the capabilities of storage and retrieval. Speed and volume are also an identical challenge for every player in the competition: if you need to be on par with your competitors, take out your credit card, book some capacity on AWS or Azure, and Bob’s your uncle. The entire Big Data infrastructure is designed to run on cheap hardware.

Two “Vs” that matter: Variety and Volatility


"Variety"

One of the real differentiators is “Variety”: it is the increased variety of data types that really defines what Big Data is all about.
This is a “V” whose challenges you won’t read much about in Big Data hype articles, yet they are serious. From the early EBCDIC and ASCII files to UTF-8, XML, BLOBs, audio, video, sensor and actuator data, making sense of all these data types can be a pain. The variety within a data type may also add to the complexity; think of the various separator marks in weblogs. And it gets worse when analysing unstructured data like natural language, where culture and context kick in: events, subcultures, trends, relationships, etc. Even a simple mood analysis in the US of tweets about the death of Bin Laden was interpreted as negative, because a simplistic count of the word “death” was used to score the tweets on the minus side, without taking the context into the equation.
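That failure mode is easy to reproduce. A context-free keyword counter of the kind described (the word list and sample tweet are invented for illustration) will score a celebratory tweet as negative:

```python
# A naive keyword-count sentiment scorer: every occurrence of a "negative"
# word pushes the score down, and context is ignored entirely.
NEGATIVE = {"death", "dead", "killed"}

def naive_sentiment(text: str) -> int:
    words = text.lower().replace(",", " ").replace("!", " ").split()
    return -sum(1 for w in words if w in NEGATIVE)

# Many US tweets celebrated the news, yet the keyword count calls them negative.
tweet = "Great news, the death of Bin Laden!"
print(naive_sentiment(tweet))  # -1
```

A serious approach would at least model negation, the target of the sentiment and the cultural context; the point here is only how cheaply the naive version goes wrong.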

“Volatility”

This is about the variations in sentiment, value and predictive power that are so typical of an important segment of Big Data. A few examples to make it clear.
If I give a “like” to Molson and Stella beers, will this still be valid next week, next month, …?
If I measure consumer behaviour in its broadest sense (information requests, complaints, e-mails to the customer service, reviews on websites, purchases, returns and payment behaviour) and I try to link this to behaviour on social media, will it correlate with the previous measures? Think of predicting election results on the basis of opinion polls: don’t we often say one thing while we act totally differently?
This is the real challenge of Big Data, and the rewards are great. The organisation that can exploit data about moods, attitudes and intentions from unstructured data will have better forecasts and respond faster to changes in the market.
This way, Big Data realises a dream from the 1960s: establishing a relationship between communication input and its impact on sales results. In other words, the DAGMAR [1] model of Russell Colley can materialise by analysing unstructured data from a broad range of sources.





[1] DAGMAR: the abbreviation of Defining Advertising Goals for Measuring Advertising Results.