
Wednesday 16 June 2021

Managing a Data Lake Project Part II: A Compelling Business Case for a Governed Lake

 

In Part I, A Data Lake and its Capabilities, we already hinted at a business case; in this post we make it a little more explicit.




A recap from Part I: the data lake capabilities

The business case for a data lake has many aspects; some of them present sufficient rationale on their own, but that depends of course on the actual situation and context of your organisation. Therefore, I mention eleven rationales below, but feel free to add yours in the comments.

 We are mixing on-premise data with Cloud-based systems, which creates new silos

The Cloud providers deliver software to make switching your on-premise applications and databases to Cloud versions easy. But there are cases where this can't be done in one fell swoop:

  • Some applications require refactoring before moving them to the Cloud;
  • Some are under such strict information security constraints that even the best Cloud security can't be relied on. I know of a retailer who keeps his excellent logistics system in something close to a bunker!
  • Sometimes the budget or the available skills are insufficient to support a 100 % Cloud environment, etc…

This already provides a very compelling business case for a governed data lake: a catalogue that manages lineage and meaning will make the transition smoother and safer.

Master data is a real pain in siloed data storage, as is governance...

A governed data lake can improve master data processes by involving the end users in intuitively evaluating what's in the data store. By using both predefined data quality rules and machine learning to detect anomalies and implicit relationships in the data, and by defining the golden record for objects like CUSTOMER, PRODUCT, REGION,… the data lake can unlock data even in technical and physical silos.
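To make this a little more tangible, here is a minimal Python sketch of the two mechanisms mentioned above: predefined data quality rules applied while profiling records, and a naive golden record merge for duplicate CUSTOMER entries. The field names, rules and merge strategy are invented for illustration; in a governed lake this work would be delegated to the catalogue and MDM tooling.

```python
# Illustrative sketch only: hypothetical field names and rules, not a product's API.
import re

DQ_RULES = {
    "email": lambda v: bool(re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v or "")),
    "country": lambda v: v in {"BE", "NL", "FR", "DE"},
}

def profile(records):
    """Apply predefined data quality rules and report the failure rate per field."""
    failures = {field: 0 for field in DQ_RULES}
    for rec in records:
        for field, rule in DQ_RULES.items():
            if not rule(rec.get(field)):
                failures[field] += 1
    return {field: count / len(records) for field, count in failures.items()}

def golden_record(duplicates):
    """Naive golden record: per attribute, keep the most frequent non-empty value."""
    merged = {}
    for field in {k for rec in duplicates for k in rec}:
        values = [rec[field] for rec in duplicates if rec.get(field)]
        merged[field] = max(set(values), key=values.count) if values else None
    return merged

customers = [
    {"email": "an@example.com", "country": "BE"},
    {"email": "an@example.com", "country": "be"},   # fails the country rule
]
print(profile(customers))        # {'email': 0.0, 'country': 0.5}
print(golden_record(customers))
```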

We now deal with new data processing and storage technologies other than ETL and relational databases: NoSQL, Hadoop, Spark or Kafka, to name a few

NoSQL has many advantages for certain purposes but from a governance point of view it is a nightmare: any data format, any level of nesting and any undocumented business process can be captured by a NoSQL database.
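As a small illustration of why a catalogue matters here, the following hypothetical Python sketch infers the field paths that actually occur in a set of schemaless documents, so that undocumented or drifting structures become visible. The document shapes are made up for the example.

```python
# Hypothetical illustration: list the dotted field paths that occur in a schemaless
# document store, so a catalogue can expose undocumented, nested structures.
def paths(doc, prefix=""):
    """Yield every dotted field path in a (possibly nested) document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from paths(value, path + ".")
        else:
            yield path

docs = [
    {"order": {"id": 1, "lines": 2}, "customer": "ACME"},
    {"order": {"id": 2, "discount": {"pct": 5}}, "note": "rush"},  # silently added fields
]
observed = {p for d in docs for p in paths(d)}
print(sorted(observed))
# ['customer', 'note', 'order.discount.pct', 'order.id', 'order.lines']
```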

Streaming (often unstructured) data is unfit for a classical ETL process, which supports structured data analysis, so we need to combine the flexibility of a data lake ingestion process with the governance capabilities of a data catalogue, or else we will end up with a data swamp.
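One way to keep that flexibility governable, sketched hypothetically below in Python, is to wrap every streamed record with catalogue metadata (source, ingest time and a crude schema fingerprint) at ingestion time. The wrapper and its field names are assumptions for illustration, not a specific product's API.

```python
# Illustrative only: a hypothetical ingest wrapper that attaches catalogue metadata
# to every streamed record before it lands in the lake.
import hashlib, json, time

def catalogued(record, source):
    fingerprint = hashlib.sha256(
        json.dumps(sorted(record.keys())).encode()
    ).hexdigest()[:12]                      # crude fingerprint of the field set
    return {
        "payload": record,
        "_source": source,
        "_ingested_at": time.time(),
        "_schema_id": fingerprint,          # lets the catalogue group records by shape
    }

event = {"sensor": "S-17", "temp": 21.4}
print(catalogued(event, source="iot-gateway"))
```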

We don't have the time, nor the resources to analyse up front what data are useful for analysis and what data are not

There is a shortage of experienced data scientists. Initiatives like applications to support data citizens may soften the pain here and there, but let's face it: most organisations lack the capabilities for continuous sandboxing to discover what data, in what form, can be made meaningful. It is easier to accept all data into the data lake indiscriminately and let the catalogue do some of the heavy lifting.

We need to scale horizontally to cope with massive, unpredictable bursts of data

Larger online retailers, event organisations, government e-services and other public-facing organisations can use the data lake as a buffer for ingesting massive amounts of data and sort out its value at a later stage.

We need to make a rapid and intuitive connection between business concepts and data that contribute, alter, define or challenge these concepts

This has been my mission for about three decades: to bridge the gap between business and IT, and as far as "classical" architectures go, this craft was humanly possible. But in the world of NoSQL, Hadoop and graph databases it would be an immense task if not supported by a data catalogue.

Consequently, we need to facilitate self-service data wrangling, data integration and data analysis for the business users

A governed data lake ensures trust in the data and trust in what the business can and can't do with it. This can speed up data literacy in the organisation by an order of magnitude.

We need to get better insight into the value and the impact of the data we create, collect and store

Reuse of well-catalogued data will enable this: end users will contribute to the evaluation of data, and automated meta-analysis of how data is used in analytics will reinforce the use of the best data available in the lake. Data lifecycle management becomes possible in a diverse data environment.

We need to avoid fines like those stipulated in the EU's GDPR, which can amount to 4% of annual turnover!

Data privacy regulations require functionality to support "security by design", which a governed data lake delivers. Data pseudonymisation, data obfuscation or anonymisation come in handy when these functions are linked to security roles and user rights.
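A minimal sketch of that idea, assuming a very simple role model: personal data is pseudonymised with a keyed hash and only roles that are explicitly entitled see the clear value. The roles, key handling and field are invented for illustration.

```python
# Minimal sketch, assuming a simple role model; not a full privacy implementation.
import hmac, hashlib

SECRET_KEY = b"replace-with-a-managed-secret"
ENTITLED_ROLES = {"dpo", "fraud_analyst"}        # roles allowed to see clear values

def pseudonymise(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def read_field(value: str, role: str) -> str:
    return value if role in ENTITLED_ROLES else pseudonymise(value)

print(read_field("jan.peeters@example.com", role="marketing_analyst"))  # pseudonym
print(read_field("jan.peeters@example.com", role="dpo"))                # clear value
```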

We need a clear lineage of the crucial data to comply with stringent laws for publicly listed companies

Sarbanes-Oxley and Basel III are examples of legislation that require accountability at all levels and in all business processes. Data lineage is compulsory in these legal contexts.
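As an illustrative sketch only, the snippet below records which inputs and which transformation produced every derived dataset, so a lineage question such as "which raw sources feed the capital ratios?" can be answered. The dataset and job names are hypothetical.

```python
# Illustrative sketch of the idea only: record, for every derived dataset, which
# inputs and transformation produced it, so lineage questions can be answered.
from dataclasses import dataclass

@dataclass
class LineageEvent:
    output: str
    inputs: list
    transformation: str

LINEAGE = []

def register(output, inputs, transformation):
    LINEAGE.append(LineageEvent(output, inputs, transformation))

def upstream(dataset, events):
    """Walk the recorded events back to the raw sources of a dataset."""
    sources = set()
    for e in events:
        if e.output == dataset:
            for i in e.inputs:
                sources |= upstream(i, events) or {i}
    return sources

register("finance.capital_ratios", ["raw.exposures", "staging.positions"], "basel_iii_job")
register("staging.positions", ["raw.trades"], "position_builder")
print(upstream("finance.capital_ratios", LINEAGE))   # {'raw.exposures', 'raw.trades'}
```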

But more than all of the above IT-based arguments, there is one compelling business case for C-level management: speeding up the decision cycle time and the organisation's agility in the market.

Whether this is a profit-generating market or a non-profit market where the outcomes are beneficial to society, speeding up decisions by tightening the integration between concepts and data is the main benefit of a governed data lake.

Anyone who has followed the many COVID-19 kerfuffles, the poor response times and the uneven quality of the responses to the pandemic sees the compelling business case:

  • Rapid meta-analysis of peer reviewed research papers;
  • Social media reporting on local outbreaks and incidents;
  • Second use opportunities from drug repurposing studies;
  • Screening and analysing data from testing, vaccinations, diagnoses, death reports,…

I am sure medical professionals can come up with more rationales for a data lake, but you get the gist of it.

So, why is there a need for a special project management approach to a data lake introduction? That is the theme of Part III.  But first, let me have your comments on this blogpost.






Friday 24 July 2015

The Future of Information Systems: Design from the Data

This third post in a series of three on BI programme management looks at a new way of designing systems for both transaction and decision support, to improve the organisation's effectiveness further. I will examine the concept of BI architecture and give hints on how BI programme management can evolve towards an ideal architecture that merges transaction and decision support systems in a powerful ensemble, ready for the new economic challenges.
I propose an "Idealtyp", knowing that no existing organisation can achieve this in less than a decade, for reasons like sunk cost fallacies, the dialectics of progress and plain resistance to change.

But new organisations and innovators who can make the change will notice that the rewards of this approach are immense. They will combine architectural rigour with business agility and improve their competitive power by an order of magnitude.

Why a BI Architecture is Necessary


I am a fan of Max Weber's definition of "Idealtyp"[i], which has direct links with architecture in information technology. BI architecture is an abstraction of reality and, as such, an instrument to better understand a complex organisation of hardware, network topologies, software, data objects, business processes, key people and organisational units. All these components interact in what appears to outsiders to be a chaotic way. An architectural framework brings order to the chaos and provides meaning to all the contributors to the system.
Architecture is used as a benchmark, a "to be" situation against which the present state can be measured. It is a crisper and more manageable concept than CMM-like models, which sometimes express maturity in rather esoteric terms. For a quick scan a maturity model will do, but for in-depth management of the above-mentioned BI assets an architectural framework is better suited to BI environments.


CMM Level: Initial
BI symptoms: A serious case of "spreadsheetitis": every decision maker has his or her own set of spreadsheet files for support in the battles with the other owners of spreadsheets. Everyday tugs of war over who has the correct figures.
Principal risks: Your project may never take off because of political infighting, and if it does, there will be a pressing need for change management of the highest quality and huge efforts will have to be invested in adoption tracks.

CMM Level: Repeatable
BI symptoms: The organisation uses some form of project management, in most cases inherited from, or even a carbon copy of, systems or application development.
Principal risks: The project management method may be totally inadequate for a BI project, leading to expensive rework and potential project failure if everybody sticks to his or her position.

CMM Level: Defined
BI symptoms: The organisation has a standard procedure for the production of certified reports. These can connect with one or more source systems in a standardised way: direct connection to the source tables, import of flat files, or some form of a data warehouse.
Principal risks: Resistance to change. This depends on the way the organisation has implemented the data warehouse concept and how reversible the previous efforts are in a migration scenario.

CMM Level: Managed
BI symptoms: The development processes are standardised and monitored using key performance indicators and a PDCA cycle.
Principal risks: The iterative and explorative approach of BI project management may frighten the waterfall and RAD fans in the organisation. Make sure you communicate well about the specifics of a BI development track.

CMM Level: Optimising
BI symptoms: The development processes only need fine-tuning.
Principal risks: Analysis paralysis and infighting over details may hamper the project's progress.

Table 2 Example of the BI version of the Capability Maturity Model as described in Business Analysis for Business Intelligence on page 202. In the book, it is positioned as a tool to help the BA identify broad project management issues.

Why this "Idealtyp" is not Easy to Achieve


Proposing an ideal BI architecture is one thing, achieving it, another. I will only mention three serious roadblocks on the path towards this ideal BI architecture that unifies transaction systems and decision support systems: the sunk cost fallacy, the dialectics of progress and resistance to change.

The sunk cost fallacy is a powerful driver in maintaining the status quo; organisations suffering from this irrational behaviour believe they have invested so much effort, money, hardware, training, user acceptance and other irretrievable costs that they should continue to throw good money after bad. And sometimes the problem is compounded when the costs were spent on technology from market leaders.
No one ever got fired for buying… (fill in any market leader’s name)

No matter what industry you look at, market leaders fulfil their basic marketing promise: provide stability, predictable behaviour and a very high degree of CYA (google it) to the buyer. But that doesn't mean the purchase decision is the best possible decision for future use. Market leaders in IT are also very keen on "providing" vendor lock-in, preventing the client from adapting to changing requirements.
As a footnote: today, buyers look more at the market cap or the private equity behind the Big Data technology providers than at their actual technical performance and their fit with the organisation's requirements. Yes, people keep making the same mistakes over and over…

At the other end of the spectrum are the dialectics of progress: this law was formulated by the Dutch historian Jan Romein, who noticed that gas lights were still used in London when other European capitals already used electricity. This law suggests, and I quote the Wikipedia article, that making progress in a particular area often creates circumstances in which stimuli are lacking to strive for further progress. This results in the individual or group that started out ahead eventually being overtaken by others. In the terminology of the law, the head start, initially an advantage, subsequently becomes a handicap.
An explanation for why the phenomenon occurs is that when a society dedicates itself to certain standards, and those standards change, it is harder for them to adapt. Conversely, a society that has not committed itself yet will not have this problem. Thus, a society that at one point has a head start over other societies, may, at a later time, be stuck with obsolete technology or ideas that get in the way of further progress. One consequence of this is that what is considered to be the state of the art in a certain field can be seen as "jumping" from place to place, as each leader soon becomes a victim of the handicap. 
(From:  https://en.wikipedia.org/wiki/Law_of_the_handicap_of_a_head_start)

As always, resistance to change plays its role. New tools and new architectures require training new skills and adopting new ways of working, and if one group of humans has trouble adapting to new technologies it is… the tech people. I can produce COBOL programmers who will explain to you that COBOL is good enough for object-oriented programming, or IMS specialists who see nothing new in the Big Data phenomenon…


What is BI Architecture?

Here's architecture explained in an image. Imagine Christopher Wren had had modern building technologies at his disposal. Then either the cathedral, built from the architecture "as is", would have looked completely different, with higher arches, bigger windows and so on, or the architecture itself would have evolved as modern technology influenced Wren's vision of buildings.
This is exactly what happens in BI architecture and BI programme management.

Figure 5 On the left: architecture; on the right: a realisation of that architecture, as illustrated by Wren's Saint Paul's Cathedral


Architecture descriptions are formal descriptions of an information system, organised in a way that supports reasoning about the structural and behavioural properties of the system and its evolution. These descriptions:
  • define the components or building blocks that make up the overall information system;
  • provide a plan from which products can be procured and subsystems developed, that will work together to implement the overall system;
  • thus enable you to manage your overall IT investment in a way that meets the needs of your business.
Architecture is also the interaction between structure, which is requirements based, and principles applicable to any component of the structure.
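As a toy illustration of "supporting reasoning about structural properties", the sketch below represents building blocks and their dependencies as data, so a simple impact analysis becomes a query. The component names are hypothetical.

```python
# Toy illustration only: architecture building blocks as data, so you can reason
# about structure, e.g. which components are hit when one of them changes.
from dataclasses import dataclass

@dataclass
class BuildingBlock:
    name: str
    depends_on: tuple = ()

BLOCKS = {
    b.name: b for b in [
        BuildingBlock("crm"),
        BuildingBlock("web_shop"),
        BuildingBlock("data_lake", depends_on=("crm", "web_shop")),
        BuildingBlock("sales_dashboard", depends_on=("data_lake",)),
    ]
}

def impacted_by(changed, blocks):
    """Return every block that directly or indirectly depends on the changed one."""
    hit = {name for name, b in blocks.items() if changed in b.depends_on}
    for name in list(hit):
        hit |= impacted_by(name, blocks)
    return hit

print(impacted_by("crm", BLOCKS))   # {'data_lake', 'sales_dashboard'}
```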

What is the Function of BI Architecture? 

BI architecture should reflect how the BI requirements are realised by services, processes and software applications in day-to-day operations. Therefore, the quality of the architecture is largely determined by the ability to capture and analyse the relevant goals and requirements, the extent to which they can be realised by the architecture, and the ease with which goals and requirements can be changed.

Figure 6 The Open Group Architecture Framework puts requirements management at the centre of lifecycle management. The connection with business analysis for business intelligence is obvious.


Reality Check: the Two Worlds of Doing and Thinking

Now that we have established a common view of BI architecture and programme management, it is time to address the murky reality of everyday practice.
Although Frederick Taylor's and Henri Fayol's ideas of separating doing from thinking have been proven inadequate for modern organisations, our information systems still reflect these early 20th-century paradigms. There are the transaction systems, where the scope is simply: execute one step after another in one business process and make sure you comply with the requirements of the system. This is the world of doing and not thinking. Separated from the world of doing is the world of thinking and not doing: the decision support systems. The business looks at reports, cubes and analytical results extracted from transaction and external data and then makes decisions which the doers can execute.
What if the new economy were changing all this at a rapid pace? What if doing and thinking came together in one flow? That's exactly what the Internet is creating, and I am afraid the majority of organisations are simply not ready for this (r)evolution. Already in 1999, Bill Gates and Collins Hemingway[ii] wrote about empowering people in the digital age when they gave us the following business lessons:
  • The more line workers understand the inner workings of production systems, the more intelligently they can run those systems.
  • Real-time data on production systems enables you to schedule maintenance before something breaks.
  • Tying compensation to improved quality will work only with real-time feedback of quality problems.
  • Task workers will go away. Their jobs will be automated or combined into bigger tasks requiring knowledge work.
  • Look into how portable devices and wireless networks can extend your information systems into the factory, warehouse and other areas.

I am afraid this advice still awaits implementation in many organisations. The good news is that contemporary technologies can support the integration of doing and thinking. But it will require new architectures and new organisational and technological skills to reap maximum benefit from the technology.

The major and most relevant BI programme management decision criterion will be the answer to the question: "Which quality data yield the highest return in terms of competitive advantage?"


Bringing IT Together: Design from the Data


What if we considered business processes as something that can change within 24 hours if the customer or the supplier wants it? Or if competitive pressure forces us to change the process? What if information systems had no problem supporting changing business processes, because the true cornerstone, the one that survives any business process, is data? This could be a real game changer for industries that still consider data a product of a business process instead of the objective of that process.
The schema below describes a generic architecture integrating transaction and decision support systems in one architectural vision. Let’s read it from left to right.
Any organisation has a number of business drivers, for example as described by Michael Porter's generic strategies: be the cost leader, differentiate from the competition or focus on a niche. Parallel with the business drivers are decision-making motives such as "I want complete customer and product insight" and, finally, the less concrete but very present knowledge discovery driver, which keeps organisations on the lookout for unpredictable changes in the competitive environment. These three drivers define a number of business objects, both static and dynamic. These entities can be endogenous to the organisation (like customer, channel, product, etc.) or external, like weather data, currency data, etc. These business objects need to be translated into data objects suitable for transaction and decision support.

Figure 7 This is the (condensed) target architecture of an integrated "Big Data Warehouse": combining batch and stream processing, using low latency for operational intelligence and aggregated data for tactical and strategic decision making. Built from the ground up using data instead of business processes as the analytic cornerstone.
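For readers who prefer code to diagrams, here is a heavily condensed, hypothetical sketch of the idea behind Figure 7: the same business events feed a low-latency streaming path for operational intelligence and a batch path that recomputes richer aggregates for tactical and strategic decisions. The event fields and metrics are invented for illustration.

```python
# Condensed, hypothetical sketch of a combined stream + batch design.
from collections import defaultdict

events = [
    {"customer": "C1", "amount": 120.0},
    {"customer": "C2", "amount": 35.0},
    {"customer": "C1", "amount": 60.0},
]

# Streaming path: update a running total per customer as each event arrives,
# so operational intelligence can react immediately (in reality: a stream consumer).
running_totals = defaultdict(float)
for event in events:
    running_totals[event["customer"]] += event["amount"]

# Batch path: periodically recompute richer aggregates over the full history in the
# lake, e.g. average order value per customer for tactical and strategic reporting.
orders = defaultdict(list)
for event in events:
    orders[event["customer"]].append(event["amount"])
avg_order_value = {c: sum(a) / len(a) for c, a in orders.items()}

print(dict(running_totals))   # low-latency view, updated event by event
print(avg_order_value)        # batch view, recomputed on a schedule
```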

Conclusion: an integrated view on transactions and decision making will improve BI programme management, supported by this architectural vision. The major and most relevant BI programme management decision criterion will be the answer to the question: "Which quality data yield the highest return in terms of competitive advantage?" And thus: which project (whether on the transaction or the decision support side) needs the highest priority in the allocation of resources?



[i] According to the excellent website http://plato.stanford.edu/entries/weber/  this is the best description of Max Weber’s definition:
“The methodology of “ideal type” (Idealtypus) is another testimony to such a broadly ethical intention of Weber. According to Weber's definition, “an ideal type is formed by the one-sided accentuation of one or more points of view” according to which “concrete individual phenomena … are arranged into a unified analytical construct” (Gedankenbild); in its purely fictional nature, it is a methodological “utopia [that] cannot be found empirically anywhere in reality”. Keenly aware of its fictional nature, the ideal type never seeks to claim its validity in terms of a reproduction of or a correspondence with reality. Its validity can be ascertained only in terms of adequacy, which is too conveniently ignored by the proponents of positivism. This does not mean, however, that objectivity, limited as it is, can be gained by “weighing the various evaluations against one another and making a ‘statesman-like’ compromise among them”, which is often proposed as a solution by those sharing Weber's kind of methodological perspectivism. Such a practice, which Weber calls “syncretism,” is not only impossible but also unethical, for it avoids “the practical duty to stand up for our own ideals”.”

What is less known is that Weber also used the concept in decision-making theory when he analysed the outcome of the Battle of Königgrätz, where Von Moltke defeated the Austrian-Bavarian coalition formed against Prussia and its allies in 1866, an important phase in the unification of Germany.


[ii] "Business at the Speed of Thought", Bill Gates and Collins Hemingway, Penguin Books, London, 1999, pp. 293-294.