Friday 1 March 2019

About Ends and Means, or Beginning and Ends...

It has been a while since I published anything on this blog. But after being confronted with organisations that, from an analytics point of view, still live in the pre-industrial era, I need to get a few things off my chest.
In these organisations (and they aren’t the smallest ones) ends and means are mixed up, and ends are positioned as the beginning of Business Intelligence. Let me explain the situation.

Ends are the beginning

A critical look at reporting requirements is like watching heavy drift ice and wondering whether it comes from a land-based glacier or from an iceberg...

Business users formulate their requirements in terms of reports. That’s OK, as long as someone (an analyst, an architect or even a data modeller) understands that this is not the end of the matter; on the contrary, it is where the work begins.
Yet too many information silos have been created by ignoring this rule. An organisation that treats report requirements as the start of a BI project skips the questions and steps needed to produce a meaningful analytics landscape that can stand the test of time, with at least the following consequences:

  • New information silos emerge, each with an end-to-end infrastructure built to answer a few specific business questions, leaving the opportunities for a richer information centre unexplored.
  • The cost per report becomes prohibitive. Unless you think €60,000 for one (1) report is a bargain…
  • Since the same data elements risk being used in various database schemas, the extract and load processes pay a daily price in performance and processing cost.
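The third point can be checked mechanically. A minimal sketch, assuming each silo's schema is available as a set of column names (the schema and column names below are hypothetical): every data element that appears in more than one schema is extracted and loaded more than once, and is a candidate for a shared, conformed structure.

```python
from collections import defaultdict


def shared_elements(schemas: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each data element found in more than one schema to the schemas that hold it."""
    seen: dict[str, list[str]] = defaultdict(list)
    for schema, elements in schemas.items():
        for element in elements:
            seen[element].append(schema)
    # Keep only duplicated elements; sort schema names for a stable result.
    return {element: sorted(where) for element, where in seen.items() if len(where) > 1}


# Hypothetical report-driven silos, each with its own copy of some elements.
silos = {
    "sales_mart": {"customer_id", "order_date", "revenue"},
    "finance_mart": {"customer_id", "invoice_date", "revenue"},
    "marketing_mart": {"customer_id", "campaign_id"},
}
print(shared_elements(silos))
```

Here `customer_id` is loaded three times and `revenue` twice: each copy adds to the daily load window and can drift from the others.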

Ends and means are mixed up

A report is the result of an analytical process that combines data for activities like variance analysis, trend analysis, optimisation exercises, etc. As such, it is a means to support decision making. So rather than accepting the report requirements as they are, some reverse engineering is advised:

What are the decisions to be made for the various managerial levels, based on these report requirements?

You may wonder why this obvious question needs to be asked, but be advised: some reports are the equivalent of a news report. The requestor may just want to know what is happening without ever drawing any conclusions, let alone linking any consequences to the data presented.

What are the control points needed by the controller to verify aspects of the operations and their link to financial results?

Asking this question almost always leads to extending the scope of the requirements. Controllers like to match data from various sources to make sure the financial reports reflect the actual situation.

What future options, potential requirements and/or possibilities arise when the required reports are enhanced with the data available in the sources?

This exercise is needed to discover analytical opportunities that may not be taken up at the moment for a number of reasons, like insufficient historical data or a lack of analytical skills to come up with meaningful results… But that must not stop the design from taking the data into scope from the start. Adding the data at a later stage will cost far more than extending the scope now.

What is the basic information infrastructure to facilitate the above? I.e. what is the target model?

A Star schema is the ideal communication platform between business and tech people.
Whatever modelling language and whatever technology you use (virtualisation, in-memory analytics, appliances, etc.), in the end the front-end tool will build a star schema. So take the time to build a logical star schema model that can be understood by both technical people and business managers.
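To make the idea concrete, here is a minimal star schema built with Python's standard `sqlite3` module: one fact table surrounded by two dimensions. The table and column names are illustrative assumptions, not a prescribed model. The point is that a business question such as "revenue per customer segment per month" reads directly off the model.

```python
import sqlite3

# Illustrative star schema: fact_sales in the centre, dimensions around it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    revenue      REAL
);
INSERT INTO dim_date VALUES (20190101, 2019, 1), (20190201, 2019, 2);
INSERT INTO dim_customer VALUES (1, 'B2B'), (2, 'B2C');
INSERT INTO fact_sales VALUES (20190101, 1, 100.0), (20190101, 2, 40.0), (20190201, 1, 60.0);
""")

# Join the fact to its dimensions and group by the business attributes.
rows = conn.execute("""
    SELECT d.year, d.month, c.segment, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d     ON f.date_key = d.date_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY d.year, d.month, c.segment
    ORDER BY d.year, d.month, c.segment
""").fetchall()
for row in rows:
    print(row)
```

A business manager can read the three table names and follow the query, which is what makes the star schema a useful communication platform.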

What is the latency and the history needed per decision making horizon?

The latency question deals with a multitude of aspects and can take you to places you weren’t expecting when you were briefed about the report requirements. As a project manager I’d advise you to handle it with care, as the scope may become unmanageable: think of (near) real-time analytics, in-database analytics, triple store extensions to the data warehouse, complex event processing mixing textual information with numerical measures… But as an analyst I’d advise you to be aware of the potential new horizons to explore.
The history question is more straightforward and deals with the scope of the initial load. The slower the business cycle, the more history you need to load to come up with useful data sets for time series analysis.
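A small worked example of "the slower the business cycle, the more history you need". This sketch assumes the initial load should cover a fixed number of complete business cycles per decision horizon; the default of three cycles and the two horizons below are hypothetical values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DecisionHorizon:
    name: str
    cycle_months: int  # length of one business cycle for this type of decision
    latency: str       # how fresh the data must be, e.g. "daily"


def months_of_history(horizon: DecisionHorizon, cycles_needed: int = 3) -> int:
    """Months of history for the initial load, so that time series analysis
    sees `cycles_needed` complete business cycles. Three cycles is an
    assumed default, not a rule from the text."""
    return horizon.cycle_months * cycles_needed


# Hypothetical decision horizons with very different business cycles.
horizons = [
    DecisionHorizon("replenishment", cycle_months=1, latency="near real-time"),
    DecisionHorizon("seasonal assortment", cycle_months=12, latency="weekly"),
]
for h in horizons:
    print(f"{h.name}: latency {h.latency}, load {months_of_history(h)} months of history")
```

The monthly replenishment decision needs only a quarter of history, while the yearly assortment decision needs three years, even though it tolerates far more latency.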

What data do we present via which interface to support these various decision types?

This question deserves a separate article, but for now a few examples should make things clear:
  • Static reports for external stakeholders who require information for legal purposes,
  • Reports using prompts and filters for team leaders who need to explore the data within predetermined boundaries,
  • OLAP cubes for managers who want to explore the data in detail and gain new insights,
  • A dashboard for C-level executives who want the right cockpit information to run the business,
  • Data exploration results from data mining efforts that produce valid, new and potentially useful insights for running the business.

If all these questions are answered adequately, we can start collecting the data requirements and drawing up the source-to-target mappings.
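One way to keep source-to-target mappings honest is to record them as data and check them against the logical model. The sketch below assumes a simple record per mapping (source field, target column, transformation rule in plain language); the structure, system names and column names are hypothetical. It reports target columns that no source feeds yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str          # "system.table.column" (hypothetical naming)
    target: str          # "target_table.column" in the logical star schema
    transformation: str  # plain-language rule, reviewed with the business


def unmapped_targets(target_columns: set[str], mappings: list[Mapping]) -> set[str]:
    """Target columns in the logical model that no mapping feeds yet."""
    return target_columns - {m.target for m in mappings}


model = {"dim_customer.segment", "dim_customer.country", "fact_sales.revenue"}
mappings = [
    Mapping("crm.account.segment_code", "dim_customer.segment", "look up code in segment reference table"),
    Mapping("erp.invoice_line.net_amount", "fact_sales.revenue", "sum per invoice line, in EUR"),
]
print(unmapped_targets(model, mappings))  # {'dim_customer.country'}
```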

Three causes, hard to eradicate

If your organisation shows one or more of these three causes, you have a massive change management challenge ahead that will take more than a few project initiation documents to remedy. If you don’t get full support from top management, you’d better choose between accepting the situation and becoming an Analytics Sisyphus, or looking for another job.

Project based funding

Government agencies may use the excuse that there is no other way than moving from tender to tender; here the French proverb “les excuses sont faites pour s’en servir” [1] applies. A solid data and information architecture, linked to the required capabilities and serving the strategic objectives of a government agency, can provide direction to these various projects.
A top-performing European retailer had a data warehouse with 1,500 tables, including eight (8!) different time dimensions. The reason? Simple: every BU manager had sovereign rule over his information budget and “did it his way”, to quote Frank Sinatra.
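The usual remedy for eight time dimensions is one conformed date dimension that every subject area shares. A minimal sketch of generating one with Python's `datetime` module follows; the attribute set is an example, not a standard. Note the edge case it exposes: 31 December 2018 belongs to calendar year 2018 but to ISO week 1 of 2019, exactly the kind of definition that silently diverges between BU-specific time dimensions.

```python
from datetime import date, timedelta


def date_dimension(start: date, end: date) -> list[dict]:
    """One row per calendar day from start to end (inclusive).
    The attribute set below is illustrative, not prescribed."""
    rows = []
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),
            "date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "iso_year": iso_year,        # may differ from the calendar year
            "iso_week": iso_week,
            "weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


rows = date_dimension(date(2018, 12, 30), date(2019, 1, 2))
for r in rows:
    print(r["date_key"], r["year"], r["quarter"], r["iso_year"], r["iso_week"])
```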

Hierarchical organisations

I already mentioned the study by Prof. Karin Moser, which introduces three preconditions for knowledge co-operation: reciprocity, a long-term perspective for both the employees and the organisation, and breaking down hierarchical barriers. [2]
On the same pages I quote the authors Leliveld & Vink and Davis & Newstrom, who support the idea that knowledge exchange based on reciprocity can only take place in organisational forms that present the whole picture to their employees and that keep the distance between co-workers and the company’s vision, objectives, customers etc. as small as possible.
Hierarchical organisations are more about power plays and job protection than about knowledge sharing, so the idea of one shared data platform from which everyone in the organisation extracts their own analyses and insights is an absolute horror scenario.

Process based support

Less visible but just as impactful: if IT systems are designed primarily for process support and neglect the other side of the coin, decision support, you have a serious structural problem. Unlocking value from the data may then be a lengthy and costly process. Maybe you will find some inspiration in a previous article on this blog: Design from the Data.
In short: processes vary and need to be flexible; what lasts is the data. Information objects like a customer, an invoice, an order, a shipment, a region etc. are far more persistent than the processes that create or consume instances of these objects.

[1] Excuses are made to be used.
[2] Business Analysis for Business Intelligence, pp. 35-38, CRC Press, a Taylor & Francis Company, October 2012.

Saturday 29 December 2018

Roadmap to a successful data lake

A few years ago, a couple of eCommerce organisations asked my opinion on the viability of a data lake in their enterprise architecture for analytical purposes. After careful study the verdict was 50-50. One organisation had no immediate advantage in investing in a data lake: it would become just another data silo or even a data junkyard, full of hard-to-exploit data and with no idea of the added value it would bring.
The other, a €1 bn-plus company, had every reason in the world to start exploring the possibilities of a repository for semi-structured and unstructured data. But it would take them at least two years to set up a profitable infrastructure. Technology was not the problem: low-cost processing and storage were available, as was the software, mainly open source. They did not even have trouble attracting the right technical profiles, as their job offers topped everyone else’s in the market. No, the real problem was integrating and exploiting the new data streams in a sensible and managed way. As I am about to embark on a new mission to rethink an analytical infrastructure with the data lake in scope, I can share a few lessons from the past and think ahead about what’s coming.

Start from the data and work your way up to the business case

Analyse the Velocity, Variability and Volume of the data to meet the analytical requirements

Is the data stable and predictable? Then that is probably an indication that your organisation is not yet ready for this investment. But if at least one of these three Vs shows a rapid growth rate, you had better start planning and designing your data lake.
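A rough way to put numbers on "stable and predictable" is to compute the average growth per period of each V from a few periodic measurements. In this sketch the metrics, the quarterly figures and the 5 % threshold are all hypothetical planning assumptions; the idea is only that one fast-growing V is enough to trigger the planning questions below.

```python
def growth_rate(values: list[float]) -> float:
    """Average compound growth per period between first and last observation.
    Assumes at least two observations and a positive first value."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1


def lake_signal(vs: dict[str, list[float]], threshold: float = 0.05) -> list[str]:
    """Names of the Vs whose average growth per period exceeds `threshold`
    (a hypothetical planning threshold)."""
    return [name for name, values in vs.items() if growth_rate(values) > threshold]


# Hypothetical quarterly measurements of the three Vs.
measurements = {
    "volume_tb": [10, 10.2, 10.4, 10.6],                 # about 2 % per quarter
    "velocity_events_per_s": [200, 400, 800, 1600],      # doubling every quarter
    "variability_new_sources": [2, 2, 2, 2],             # flat
}
print(lake_signal(measurements))  # ['velocity_events_per_s']
```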


  • How much time do we need to close the skills gap and manage a Hadoop environment professionally?
  • What is a realistic timeframe to connect, understand and manage the new semi-structured and unstructured data sources?
  • Do we put every piece of data in the lake and write off our investments in the classical BI infrastructure, or do we choose a hybrid approach where only new data types fill the lake?
      o In case of a hybrid approach, do we need to join data across the two environments?
      o In case of a total replacement of the data warehouse, do we have the proper front-end tools to let business users exploit the data, or do they have to rely on data scientists and data engineers, potentially creating a bottleneck in the process?
  • How will we process the data? Do we simply dump it and leave it to the data scientists to make sense of it, or do we plan ahead for some form of modelling on the Hadoop platform, creating column families that are flexible enough to cope with new attributes and that make broader access possible?
  • Do we have a metadata strategy that can handle the growth, especially from a user-oriented perspective?
  • Security and governance are far more complex in a data lake than in a data warehouse. What’s our take on this issue?
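The column family question can be illustrated with a toy version of the wide-column data model used by HBase-style stores: row key, then column family, then qualifier, then value. This is plain Python, not a Hadoop API, and the family and attribute names are hypothetical. Families are declared up front; qualifiers inside a family are not, so a broadly defined family absorbs new attributes without a schema change.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy wide-column table: row key -> column family -> qualifier -> value."""

    def __init__(self, families: set[str]):
        self.families = set(families)  # fixed at design time
        self.rows: dict[str, dict[str, dict[str, object]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, qualifier: str, value: object) -> None:
        # Only the family must be known in advance; any qualifier is accepted.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get_family(self, row_key: str, family: str) -> dict[str, object]:
        # .get() does not create entries in the defaultdicts.
        return dict(self.rows.get(row_key, {}).get(family, {}))


# Broad, hypothetical families: "profile" and "interactions".
t = ColumnFamilyTable({"profile", "interactions"})
t.put("cust-42", "profile", "name", "Acme NV")
t.put("cust-42", "interactions", "last_chat", "2018-12-01")
t.put("cust-42", "interactions", "linkedin_reaction", "like")  # new attribute, no schema change
print(t.get_family("cust-42", "interactions"))
```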

Check the evolution of your business requirements

There is no point in investing in a data lake when the business ambitions are at a basic level and things like a balanced scorecard are only just popping up in the CEO’s PowerPoints.
Some requirements are very clear about their data needs, but others aren’t. It may take a considerable amount of analysis to surface the data requirements for semi-structured and unstructured data.
And with legislation like the GDPR, some data may be valuable but also very hard to get, as consumers are increasingly aware of their position in the data game. That’s why very fine-grained opt-ins are adding complexity to customer data management.
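A minimal sketch of why fine-grained opt-ins add complexity: every analytical use now has to filter customers by the specific purpose they consented to. The purpose names and the rule "no consent record means no use" are assumptions for illustration; this is not a GDPR compliance implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Consent:
    customer_id: str
    purposes: set[str] = field(default_factory=set)  # purposes the customer opted in to


def usable_for(purpose: str, customers: list[str], consents: dict[str, Consent]) -> list[str]:
    """Customers whose data may be used for `purpose`.
    Assumed rule: no consent record means no use."""
    return [c for c in customers if c in consents and purpose in consents[c].purposes]


# Hypothetical opt-ins at purpose level.
consents = {
    "c1": Consent("c1", {"newsletter", "profiling"}),
    "c2": Consent("c2", {"newsletter"}),
}
print(usable_for("profiling", ["c1", "c2", "c3"], consents))  # ['c1']
```

The same customer base yields a different analytical population per purpose, which the data lake's metadata has to track.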

Develop a few winning use cases

“A leader is someone who has followers” is quite applicable in this situation. You are, after all, challenging the status quo, and if there’s one thing I’ve learned in 30 years in analytics and ICT in general, it is that a craftsman is very loyal to his tools. Managing change in the technical department will not be a walk in the park. It may require adding an entirely new team to the department, or at least bringing in some temporary professionals to do the dirtiest part of the job and hand the Hadoop cluster over to the team in maintenance mode.

To enable all this, you need a few winning use cases that appeal to the thought leaders in the organisation. Make sure you pick sponsors with clout and the budget to turn PowerPoints into working solutions.

There certainly will be use cases for marketing, finance and operations. Look for the maximum leverage and get funded. And by the way, don’t bother the HR department unless you are working for the armed forces. They always come last in commercial organisations…

Thursday 19 April 2018

How to make progress in a political organisation

Why Business Analysis and politics don’t mix.

After thirty years of practice in all sorts and flavours of organisations, there’s one type that stands out as a tough conundrum for any business analyst and, by extension, for enterprise architects and project managers: the political organisation, so eloquently described by Henry Mintzberg.
The problem these organisations pose is the same for a business analyst, a project manager or an enterprise architect: setting priorities to determine the first iteration of the development cycle. This lack of priority ranking may lead to scope creep, projects that never deliver the product, a user community that is not on board, etc.

Forces in a political organisation

Wouldn't we all like to work in Tom Davenport’s Analytical Organisation?

In the section “Decisions, Teams, and Groups at Work: Classification of Decision-Making Environments” of my book, I use a simple matrix to describe decision-making contexts for BI projects. But, believe me, you can use it for any project type.

You don’t need much time to determine whether you’re in a political organisation. Look for committees that make the ultimate decisions, a lack of accountable individuals, slow decision-making processes and a track record of projects that failed to deliver the intended product. Government bodies are political by definition, of course, but you will also find political organisations in the private sector.

How to recognise a political organisation before you’re even at the reception desk?

Maybe this table can help:

Political organisations, by definition, don’t have shared goals. Every alderman, every state secretary, every manager wants to score their own goals without letting the team take any credit for them, because re-election or promotion matters… And members of political organisations always disagree on the cause-and-effect chains, which shows clearly in analytical projects.

Setting priorities in a political organisation

You can imagine that this is the toughest conundrum to solve: if you can’t prioritise “because everything is important”, you can’t even start an analysis track. Unless you simply want to sell billable hours… and prepare for a debriefing where the buck gets passed and everyone dodges responsibility.
But if you’re a hired gun, that may be exactly why you’ve been hired: to take the blame for the organisation’s inability to take responsibility and make choices, even if they go against some members of the team. (I use “team” for want of a better word in a political organisation.)
In this post, I am giving you a few tips and tricks to force the “team” to come up with priorities.

But first some context. The organisation is looking for a new way to analyse structured and unstructured data; therefore it needs a modern data architecture. Your job as a business analyst (and by extension as project manager and enterprise architect) is to know what the strategic priorities of the organisation are. These need to match the available data and information needs. You need to check the feasibility and then choose the first iteration to deliver analytical results. A best practice is to check the organisation’s strategy and its initiatives to improve the organisation’s position in the case of a commercial entity, or the level of societal utility in the case of a governmental or not-for-profit organisation.
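In a less political setting, the three checks above (strategic priority, match with available data, feasibility) are often combined in a simple weighted scoring of candidate first iterations. The sketch below is one such scoring under stated assumptions: the criteria weights, the 1-5 scores and the candidate names are all hypothetical. Its weak spot is also the political organisation's weak spot: someone has to agree on the weights.

```python
def rank_iterations(candidates: dict[str, dict[str, int]],
                    weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidate first iterations by the weighted sum of their 1-5 scores."""
    scored = {
        name: sum(weights[criterion] * score for criterion, score in scores.items())
        for name, scores in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights and scores, to be agreed with the stakeholders.
weights = {"strategic_priority": 0.5, "data_availability": 0.3, "feasibility": 0.2}
candidates = {
    "customer churn": {"strategic_priority": 4, "data_availability": 3, "feasibility": 4},
    "permit processing times": {"strategic_priority": 3, "data_availability": 5, "feasibility": 5},
}
for name, score in rank_iterations(candidates, weights):
    print(f"{name}: {score:.1f}")
```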
Imagine the first intake with the project sponsor, the product owner and any other stakeholder who has been identified in the project structure.

Here’s the dialogue:

Business Analyst: At the kick off of this analysis track, I’d like to determine with you the first iteration: where we start analysing, designing and building the first deliverables.
The “team”: (silence)
Business Analyst: Do you have a project portfolio and do you use program management to prioritise the management actions? Do you have a mission and vision statement for this project?
The “team”: We thought you could formulate the vision and the mission for the project. And no we don’t have a project portfolio. We do have an Excel sheet with a list of all the projects and their status.
Business Analyst: Could we infer from the status what the priorities are?
The “team”: No.
Business Analyst: What if we look at the budget per management project? Maybe the size says something about the priority. Or what if we look at rejected project proposals and the reasons for rejecting them? Maybe that says something about the criteria.
The “team”: Not necessarily. First of all, all management project requests are answered positively and funds are allocated to these projects. Some projects may have big budgets but that doesn’t indicate anything about their importance.
Business Analyst: What about the number of full time equivalents allocated to each project?
The “team”: A high number may indicate something about the complexity or the scope but that doesn’t tell you what priority the project has.
Business Analyst: I think this one may help us out: have you indicated the origins of leakages and losses in your business processes and could those numbers give us a hint of what’s important to the management team?
The “team”: Leaks and losses are handled by the management team and as such are equally important.
Business Analyst: Does the amount of data, the connection with business processes and the variety in the data give us a clue where we should start the project?
The “team”: That’s why we are hiring you as a Business Analyst.

Now it gets tricky and you have to make the call. As The Clash sang: “Should I stay or should I go?”

Here are a few of the killer questions and remarks that will lead you to the exit:

  • What projects will get or got the most press coverage?
  • What if you had to choose, right now?
  • Do you expect me to deliver a successful end result if you don’t know what you want?


More on decision-making contexts in the book “Business Analysis for Business Intelligence”, pp. 203-213.

Is there a way out? Maybe.

The only escape route I can think of is to start with a stakeholder analysis. Try identifying the primary stakeholders and mapping them on a RACI matrix. If that works, you can develop your first iteration with some confidence, knowing that danger is always on the road ahead.

Example of a stakeholder analysis that turns out well: the CEO’s desk is where the buck stops.
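A RACI matrix also makes the political symptoms visible. Under the common RACI convention, each activity has exactly one Accountable and at least one Responsible. The sketch below checks those two rules; the activities and stakeholders are hypothetical. A committee decision shows up as zero, or several, accountable parties.

```python
def raci_issues(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Check a RACI matrix (activity -> stakeholder -> 'R'/'A'/'C'/'I') against the
    common convention: exactly one Accountable, at least one Responsible."""
    issues = []
    for activity, roles in matrix.items():
        accountable = [s for s, r in roles.items() if r == "A"]
        responsible = [s for s, r in roles.items() if r == "R"]
        if len(accountable) != 1:
            issues.append(f"{activity}: {len(accountable)} accountable")
        if not responsible:
            issues.append(f"{activity}: nobody responsible")
    return issues


# Hypothetical stakeholder mapping.
matrix = {
    "set iteration priorities": {"CEO": "A", "product owner": "R", "CFO": "C"},
    "approve data access": {"CIO": "C", "DPO": "C"},
}
print(raci_issues(matrix))
```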

If a stakeholder analysis is inconclusive, there must be someone outside the official decision-making unit (DMU) who is the primary influencer. Now you’ll have to get out of your comfort zone as an analyst and start thinking like an account manager.

I was lucky enough to be trained in the Miller Heiman Strategic Selling method as well as the Holden Power Base Selling method. It sharpened my skills for identifying and influencing these hidden decision makers. So here’s my advice: check out these two books. They will increase your efficiency in political organisations by an order of magnitude.
Target account selling; Fox hunting

The New Strategic Selling is an update of the original and is worth reading for any novice in business analysis and project management.
This is Jim Holden’s original book. Of course, as things go in this business, there were many to follow up on his success. Start here anyway.

Sunday 24 December 2017

Getting practical: How Analytics Can Drive the Information Architecture Development

Does the theory presented in the previous article work in practice? That is the theme of this post, in which I present an anonymised case from a project I did for a customer.
But before I proceed, a quick reminder from my book “Business Analysis for Business Intelligence”.
What every organisation needs to know boils down to four C’s. It is information about the customer, the cost, the competition and the competences of the organisation, the latter also represented by a higher level of abstraction: the capabilities.
The illustration below shows how these four C’s are the foundation of a balanced scorecard. But a balanced scorecard measures only the intended (or planned) strategy, not the emergent strategies. This 4 C framework therefore has a much broader scope and includes decision support for emergent strategies.

To develop a shared knowledge of the customer, this organisation needed to embed a business rule in the data, namely that contacts are associated with an account. This was because the organisation is an exclusively business-to-business marketing machine selling to large corporations. A contact without this association was registered and kept in a staging area, waiting to be associated with an account. In other words: only contacts related to an organisation were of use to the business, at least in the present context.
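The business rule itself is simple; the problem is where it lives. A minimal sketch of the rule as a separate, parameterised function (the `Contact` structure, names and account identifiers are hypothetical): with `require_account=True` it applies today's B2B rule and sends unlinked contacts to staging; switching the flag models the optional link a B2C business would need, without touching the rest of the flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    name: str
    account_id: Optional[str] = None  # None = not yet linked to an account


def register(contacts: list[Contact],
             require_account: bool = True) -> tuple[list[Contact], list[Contact]]:
    """Split contacts into (registered, staging).
    require_account=True applies today's B2B rule; False models an optional link."""
    if not require_account:
        return list(contacts), []
    registered = [c for c in contacts if c.account_id is not None]
    staging = [c for c in contacts if c.account_id is None]
    return registered, staging


contacts = [Contact("J. Peeters", "ACC-001"), Contact("A. Janssens")]
registered, staging = register(contacts)
print([c.name for c in registered], [c.name for c in staging])
```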

Today, this rule is cast in stone in a monolithic CRM application, but the CIO wishes to migrate to a service factory in the near future. That way, should the business rule change or the company move into a B2C market, the CRM processes would be easier to adapt to the new situation. A transition plan for all customer data needs to be developed.
Lingua Franca used the following phased approach:
  1. Mapping the customer data in a data portfolio
  2. Study the ASIS
  3. Link capabilities to analytics
  4. Map the capabilities on the data portfolio
  5. Define the information landscape
  6. Make the mapping analytics - transactional data
  7. Define the services
  8. Decide on the physical architecture

Mapping the customer data in a data portfolio 

A lot of customer data is of strategic value and a lot isn’t. That led us to use a modified version of McFarlan’s portfolio approach to information systems, which can just as well be applied to data.
Variant on: McFarlan, F. W. (1981). “Portfolio approach to information systems.” Harvard Business Review (September–October 1981): 142-150.

The analytics version of this schema translates the four quadrants into workable definitions:
Strategic Data: critical to future strategy development. Both forming and executing strategy are supported by the data, as are emergent strategies, where data might be captured outside the existing process support systems. The reason is clear: process support or transaction support systems are designed and tuned for the intended strategy.
Turnaround Data: critical to future business success. Since today’s operations do not support these data, new operations will be needed. These data are often not even in scope of the emergent strategy processes. They may be hidden in a competitor’s research, in technological breakthroughs, in new government regulations or in consumer outcries against abuse, to name a few sources.
Support Data: valuable but not critical to success.
Factory Data: critical to existing business operations: the classical reports, dashboards and scorecards.
In this case, the association between account and contact was considered factory data, as it describes the way the company is doing business today.
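The four definitions can be read as a two-by-two grid. This sketch assumes the two axes of McFarlan's original grid (impact on current operations, impact on future strategy) carry over to data unchanged; the boolean inputs are a simplification of what is in practice a judgement call.

```python
def portfolio_quadrant(critical_today: bool, critical_for_future: bool) -> str:
    """Place a data element in the McFarlan-style data portfolio.
    Assumption: McFarlan's two axes apply to data as they do to systems."""
    if critical_today and critical_for_future:
        return "strategic"
    if critical_for_future:
        return "turnaround"
    if critical_today:
        return "factory"
    return "support"


# The account-contact association today: critical to operations, not to future strategy.
print(portfolio_quadrant(critical_today=True, critical_for_future=False))  # factory
```

If the B2C scenario makes the association critical for the future as well, the same element moves to the strategic quadrant.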

As the ArchiMate model in the illustration below shows, there is a cascading flow of business drivers and stakeholders that influence the business goals, which in turn impact the requirements that are realised by business processes. These are supported by legacy systems and new software packages or bespoke applications. The result of this approach is a dispersed view of the data that are used and produced in these applications. What if data, rather than processes, were at the base of the requirements? Would this change the organisation’s agility? Would it enhance responsiveness to external influences? That was the exercise we were preparing for.


Study the ASIS

Today, the business process of account and contact registrations is as follows:

The present CRM monolith supports this process, but future developments, like the takeover of a more consumer-oriented business, may change the business model and the business process drastically. Thus, the self-service registration process should make the link between contact and account optional, and the validation process should only deal with harmonising data: making sure the geographical information is correct and the contact data are uniform as far as (internal) naming conventions and (external) reference data are concerned. It is already a great step forward that the company uses a master data management system to separate data management from process management. This enables a smoother transition to the new information architecture development method.

Link capabilities to analytics

Therefore an extensive inventory of all potentially needed business capabilities was undertaken and linked to the relevant business questions supporting these capabilities.
In this example we present a few of these present and future business questions:
What is the proportion of contacts from our B2B customers that may be interested in our consumer business?
Which accounts may experience a potential threat from our new consumer business unit?
Which contacts from the B2C may become interested in our B2B offerings?
Which products from the B2C unit may prove sellable via the B2B channels?
By listing all the relevant present and future business questions, it becomes clear that the account validation process as defined today may need to change, and that what is considered factory data today may get an “upgrade” to strategic or turnaround data to deal with the challenges.

Map the capabilities on the data portfolio

In this diagram, the entire data landscape of the account-contact association is charted and managed via five methods:
  • Operational business metadata describe the context in which data is created, updated and deleted, as well as the context in which it is used. A minimum deliverable consists of instructions and training for the people who perform the CRUD operations.
  • Process metadata relate the business process (present and future) to the business context, to provide the process stakeholders with information and motivation: the what, why, when and who of the process and the data captured.
  • Business Intelligence metadata describe the decision support possibilities in the present and future clients: dashboards, reports, cubes, data sets for further examination, etc.
  • Process alignment describes what is often a mutual adjustment between a monolithic application and the business process it supports. Some market leaders in OLTP software present their process flows as best practice, as if all businesses should converge in their way of doing business…
  • ETL architecture documents the lineage from source to target, the transformations and quality measures, as well as the technical aspects of the process, i.e. parallel or sequential loading, dropping and rebuilding indexes, hashing, etc.

Define the information landscape

Even in this simple customer-account relationship, some thinking is needed about a holistic view of the essential elements defining the relationship. By “essential” I mean the minimum attributes and levels of detail that need to be shared outside the CRM context to be used in other business functions like HR, operations, finance, etc.
Here are a few of the considerations to be made:
How long is a customer considered as such? If the average buying frequency of your product is twice a year, for how many years do you keep the relationship active if no order has come in for three years? How do we compare account performance in case of mergers? Does an account always need a DUNS number? Or a VAT registration? What about informal groups regularly making group purchases? Discussing these and many other issues lays the foundation for a data governance process.
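The first question can be turned into an explicit, governable rule. In this sketch a customer stays active while the time since the last order is within a "grace factor" times the expected interval between orders; the grace factor of 3 is a hypothetical parameter that the data governance process would have to agree on.

```python
from datetime import date


def is_active(last_order: date, today: date, orders_per_year: float,
              grace_factor: float = 3.0) -> bool:
    """Active while the gap since the last order is at most `grace_factor`
    times the expected interval between orders (hypothetical governance rule)."""
    expected_gap_days = 365.25 / orders_per_year
    return (today - last_order).days <= grace_factor * expected_gap_days


# Twice a year: expected gap of about 183 days, so the cut-off is about 548 days.
print(is_active(date(2018, 1, 1), date(2019, 1, 1), orders_per_year=2))  # True
print(is_active(date(2016, 1, 1), date(2019, 1, 1), orders_per_year=2))  # False: three years without orders
```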

Make the mapping analytics – transactional data

This phase is crucial for the quality of your decision support system and is very much like the business analysis process for analytics. Start with high-level concepts and descend to the lowest grain of attributes and transaction records, as well as external sources like social media, open data and market research data. For instance, “customer loyalty” is expressed as “a constantly high share of wallet over an average historic period of three years and a projected future loyalty period of another three years”.
Can you imagine the data needed to make this definition work? 
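A first step towards that data is to make the definition executable. The sketch below is one possible reading of it: "constantly high" is interpreted as a share of wallet at or above a threshold in each of the last three years and in each of the next three projected years. The 60 % threshold, the yearly granularity and the input figures are hypothetical.

```python
def share_of_wallet(our_revenue: float, total_spend: float) -> float:
    """Our share of what the customer spends in our product category."""
    return our_revenue / total_spend


def is_loyal(history: list[tuple[float, float]], projected: list[float],
             threshold: float = 0.6, years: int = 3) -> bool:
    """history: (our revenue, customer's total category spend) per year, oldest first.
    projected: forecast share of wallet for the coming years.
    Assumed reading: share >= threshold in each of the last and next `years` years."""
    if len(history) < years or len(projected) < years:
        return False
    past = [share_of_wallet(ours, total) for ours, total in history[-years:]]
    return all(s >= threshold for s in past + list(projected[:years]))


history = [(50, 100), (70, 100), (65, 100), (80, 100)]  # the oldest year falls outside the window
print(is_loyal(history, projected=[0.7, 0.65, 0.6]))  # True
```

Even this toy version needs the customer's total category spend, which is rarely in the ERP or CRM: one reason why external data sources end up in scope.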
The exercise at this customer’s site produced 87 different data types coming from the ERP and CRM systems, as well as external data like Net Promoter Scores, contact centre chat data, e-mails and responses to LinkedIn posts. It sparked new customer interaction procedures (new sales and order processing methods as well as new after-sales initiatives) that the organisation would never have come up with if it hadn’t done this exercise.

Define the services

To move from the monolithic approach to a more microservice-oriented architecture, we needed to decompose the monolith into distinct reusable services and data components. This approach forces strict quality management for the data in scope, as errors or poor quality will be reflected on an enterprise scale. On the other hand, this “do it right the first time” principle avoids duplicated work and drastically improves the quality of decision making.
The schema below needs some explanation. The intake service triggers the validation service, which checks the contact and account data against reference data and Chamber of Commerce data and, when finished, triggers the registration service, which in turn triggers the master data update service. MDM contact is now a superclass of this contact and will be used enterprise-wide. Four services now ensure reusability, not just for the CRM application but for all other use cases in the organisation. And data quality improves drastically, as the “do it right the first time” principle is easier to fund for enterprise-wide data.
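The chain of four services can be sketched as four functions, each triggering the next. This is a simplification under stated assumptions: in-process function calls stand in for separately deployed services (which would communicate over a network or message bus), and the register, country codes and identifier format are hypothetical. The optional `company_id` reflects the optional contact-account link discussed earlier.

```python
# Hypothetical stand-ins for the reference data mentioned in the text.
CHAMBER_OF_COMMERCE = {"0123.456.789": "Acme NV"}  # company register: id -> name
COUNTRY_CODES = {"BE", "NL", "FR"}                  # country reference data
MASTER_DATA: dict[str, dict] = {}                   # stands in for the MDM store


def intake(raw: dict) -> dict:
    """Intake service: trim the self-service input, then trigger validation."""
    contact = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    return validate(contact)


def validate(contact: dict) -> dict:
    """Validation service: harmonise against reference data, then trigger registration."""
    contact["country"] = contact["country"].upper()
    if contact["country"] not in COUNTRY_CODES:
        raise ValueError(f"unknown country code: {contact['country']}")
    company_id = contact.get("company_id")  # optional: the account link is no longer mandatory
    if company_id is not None and company_id not in CHAMBER_OF_COMMERCE:
        raise ValueError(f"company not found in register: {company_id}")
    return register_contact(contact)


def register_contact(contact: dict) -> dict:
    """Registration service: assign an identifier, then trigger the master data update."""
    contact["contact_id"] = f"C{len(MASTER_DATA) + 1:05d}"
    return update_master_data(contact)


def update_master_data(contact: dict) -> dict:
    """Master data update service: publish the contact for enterprise-wide use."""
    MASTER_DATA[contact["contact_id"]] = contact
    return contact


result = intake({"name": " J. Peeters ", "country": "be", "company_id": "0123.456.789"})
print(result["contact_id"], result["country"], result["name"])  # C00001 BE J. Peeters
```

Because validation sits in one place, a contact that fails it never reaches master data, for CRM or any other consumer.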

Decide on the physical architecture

The classical approach using at least two environments is becoming obsolete for organisations that want to stay ahead of the competition. The separation between transaction processing and analytical processing will go out the window in the next few years. Not only because of the costly maintenance of Extract, Transform and Load (ETL) processes between the transaction systems and the data warehouse, but first and foremost because of the lack of integration with unstructured data stored in the Hadoop Distributed File System (HDFS) or streaming data captured in Resilient Distributed Datasets (RDDs).
The organisation needs a significant leap forward and is now examining the Vector in Hadoop solution, a database that combines the classic SQL environment with NoSQL. The reasons are supported by objective facts: it is a rapidly scalable, fully ACID-compliant SQL database based on HDFS. It supports inserts and updates using a patented technology developed at the University of Amsterdam: Positional Delta Trees (PDT). More on this in their published paper. The short version of PDT: the write store is separated from the read store; updates are kept in the write store and merged with the read store at run time, using the row position to place each insert or update correctly. The result? Online updates without impacting read performance. Since the database can also use Spark’s parallel processing capability, accessing Spark RDDs from the SQL side and making queries possible that were previously impossible to consider, this system combines the very best of three worlds: ACID-based transaction support, complex event processing and HDFS support for unstructured data analytics, with a flexible approach to changing data influx, provided you do your homework and define the column families in the broadest possible sense to fit your analytical needs.
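The core idea behind positional updates, keeping changes apart from an immutable read-optimised store and applying them by row position during the scan, can be shown in a few lines. This is a deliberately simplified illustration, not the actual PDT: the real structure is a tree that also maintains positional offsets so that later updates can address rows by their current position, which this sketch does not attempt. All names are hypothetical.

```python
def merge_scan(read_store: list, inserts: dict[int, list], deletes: set[int],
               modifies: dict[int, object]) -> list:
    """Current table image = scan of the immutable read store with positional deltas applied.
    Positions refer to read-store rows; inserts[p] holds new rows placed before
    read-store row p (p == len(read_store) appends at the end)."""
    result = []
    for pos, row in enumerate(read_store):
        result.extend(inserts.get(pos, []))   # rows inserted before this position
        if pos in deletes:
            continue                          # row deleted since the last load
        result.append(modifies.get(pos, row)) # modified value, else the original
    result.extend(inserts.get(len(read_store), []))
    return result


read_store = ["a", "b", "c"]  # never rewritten by updates
print(merge_scan(read_store, inserts={1: ["x"], 3: ["z"]}, deletes={2}, modifies={0: "A"}))
# ['A', 'x', 'b', 'z']
```

The read store is only scanned, never rewritten, which is why updates need not slow down readers.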
Data loading, if that is the purpose, can be achieved at a rate of around 4 TB per hour, comprising four billion 120-column tuples per hour on a 10-node Hadoop cluster, or around 500 billion column values per hour in total! (Many caveats apply, but it is still a remarkable performance.)
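The quoted figures can be cross-checked with a little arithmetic. This assumes decimal units (1 TB = 10^12 bytes) and an even spread of the load over the ten nodes; the per-tuple and per-node figures below are derived here, not quoted from the vendor:

```latex
\begin{aligned}
4\times10^{9}\,\tfrac{\text{tuples}}{\text{h}} \times 120\,\tfrac{\text{columns}}{\text{tuple}}
  &= 4.8\times10^{11}\,\tfrac{\text{column values}}{\text{h}} \approx 500\ \text{billion per hour} \\
\frac{4\times10^{12}\ \text{bytes}}{4\times10^{9}\ \text{tuples}}
  &= 1000\,\tfrac{\text{bytes}}{\text{tuple}} \approx 8.3\,\tfrac{\text{bytes}}{\text{column value}} \\
\frac{4\times10^{9}\ \text{tuples/h}}{3600\ \text{s/h}}
  &\approx 1.1\times10^{6}\,\tfrac{\text{tuples}}{\text{s}}\ \text{(whole cluster)} \\
\frac{4\ \text{TB/h}}{10\ \text{nodes}}
  &= 400\,\tfrac{\text{GB}}{\text{h}}\ \text{per node} \approx 111\,\tfrac{\text{MB}}{\text{s}}\ \text{per node}
\end{aligned}
```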
The advantage of this architecture will be exploited to the full if the data architecture connects transaction data, which are by definition microscopic and consistent, to analytical concepts, which are macroscopic, flexible and fuzzy. So here, finally, is my sales pitch: do your business analysis for analytics properly, because the cost of preparing a well-thought-through system is a fraction of the licence, hardware and maintenance costs.

Epilogue: an initial approach

A first attempt to map the various data ingestion flows to Vector H, and the consumers of the data, is illustrated below. This has a few consequences, which we will discuss in the next few paragraphs.

A more in-depth example of Vector H’s power

One aspect worth highlighting lies along the Spark line: the ease of running combined queries that join data held in Hadoop with managed structured data, in such a way that standard BI tools simply query the database as they would a standard SQL database. In other words, once the external table referencing the Hadoop data has been defined, the user does not need to run ETL or ELT separately from the actual BI query for ad-hoc queries. It is hard to overstate the simplification this brings. In its simplest form, it is as if the data really were inside the Vector database. The advantage is that current solutions, including off-the-peg turnkey applications, can access this data.

This example shows the declaration made by the DBA. Once this is done, the end users’ business layer will simply see ‘tweets’ as a table that can be joined to actual tables:

CREATE EXTERNAL TABLE tweets
(username VARCHAR(20),
tweet VARCHAR(100),
timestamp VARCHAR(50))
USING SPARK
WITH REFERENCE='hdfs://blue/tmp/twitter.avro',
FORMAT='com.databricks.spark.avro'

This query selects only the tweets made by customers; tweets from non-customers are ignored:
SELECT tw.username, cust.firstname, cust.lastname, tw.tweet
FROM tweets tw,
     Customers cust
WHERE tw.username = cust.username
Similar queries can track tweets from non-customers.
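A minimal sketch of such a query, reusing the tweets external table and the Customers table from the query above: the left outer join keeps every tweet, and the IS NULL filter retains only those whose username matches no customer.

```sql
-- Tweets from users who are not (yet) customers: an anti-join
SELECT tw.username, tw.tweet
FROM tweets tw
LEFT JOIN Customers cust
       ON tw.username = cust.username
WHERE cust.username IS NULL;
```

A NOT EXISTS subquery on Customers expresses the same anti-join; which form a given optimiser handles better is product-specific.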
Where possible, restrictions are pushed down to Spark (at the MapReduce and Scala level) in Hadoop to be answered there. The data never needs to be stored in the database. Of course, some data may need to be added to the structured data. I already applied this in a customer analysis project where I illustrated how the results from Big Data analytics can be transformed into dimensions in the “classical” data warehouse.
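As a sketch of that idea (not the design used in that project), an aggregate derived from the external tweets table can be materialised as a customer attribute that the BI layer can slice by. The table dim_customer_social, its columns and the threshold of 10 tweets are hypothetical:

```sql
-- Hypothetical dimension-style table holding one Big Data result per customer
CREATE TABLE dim_customer_social (
    username    VARCHAR(20) NOT NULL PRIMARY KEY,
    tweet_count INTEGER     NOT NULL,
    is_active   CHAR(1)     NOT NULL   -- 'Y' if the customer tweeted at least 10 times (arbitrary cut-off)
);

-- Only this small result is stored; the raw tweets stay in HDFS
INSERT INTO dim_customer_social (username, tweet_count, is_active)
SELECT tw.username,
       COUNT(*),
       CASE WHEN COUNT(*) >= 10 THEN 'Y' ELSE 'N' END
FROM tweets tw
JOIN Customers cust ON tw.username = cust.username
GROUP BY tw.username;
```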

To conclude: will hybrid architectures make data modelling obsolete?

I can’t yet generalise this to all hybrid databases, but at least for Vector H we know there is a serious chance. It uses a partition clause that distributes data on the hash of one or more columns; these columns should have at least ten times as many unique, evenly distributed values as the number of partitions you are using.
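A sketch of such a declaration, assuming Actian Vector's WITH PARTITION = (HASH ON ... n PARTITIONS) clause; verify the exact syntax against the documentation for your version. The tables, columns and the partition count of 12 are hypothetical. The second query checks the ten-times rule of thumb against the source data before the key is chosen; it counts distinct values only and does not check how evenly they are spread.

```sql
-- Hypothetical fact table, hash-partitioned on customer_id into 12 partitions
CREATE TABLE sales_fact (
    customer_id INTEGER       NOT NULL,
    sale_date   DATE          NOT NULL,
    amount      DECIMAL(12,2) NOT NULL
)
WITH PARTITION = (HASH ON customer_id 12 PARTITIONS);

-- Rule of thumb from the text: at least 10 x 12 = 120 distinct key values
SELECT COUNT(DISTINCT customer_id) AS distinct_keys,
       CASE WHEN COUNT(DISTINCT customer_id) >= 10 * 12
            THEN 'OK' ELSE 'too few keys' END AS verdict
FROM staging_sales;
```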

Vector H is therefore the most model-agnostic data store I know. You simply create a schema, load data and run queries. There is no need for indexing or any form of normalisation with this technology.
Whereas the need for 3NF, Data Vault or star schemas may become less important, governing these massive amounts of less organised data may become the principal issue to focus on. And metadata management may become the elephant in the room.

Tuesday 25 July 2017

What if Analytics Drove the Information Architecture Development?


Information architecture helps people to understand their work field, their relationships with the real world as well as with the information systems which are supposed to reflect the real world.
Information architecture deals with objects, their relationships, hierarchies and categories, and with how to store them in and retrieve them from applications, files, websites, social media and other sources I may have forgotten to mention…
With the massive expansion of sensor data, rebranded as “the Internet of Things”, social media and linked open data, these semi-structured and unstructured data are adding complexity to the information architecture.
On the other hand, hypercompetitive environments force agility upon larger corporations, as the next garage start-up may overthrow their business model and their dominance in an incredibly short time span. This agility translates into flexible applications with point-and-click business process reengineering.
So how does all this affect information architecture development? The next paragraphs submit an approach to your judgement.

Analytics, the classical chain of events

In many large organisations, the process can be described in eight separate stages:
  • A business question is formulated, e.g. who are my most loyal customers from the past that may be vulnerable to competitive offers?
  • The data analyst starts looking for data that can contribute to an answer by breaking the business question into related questions, e.g. which customers have given proof of price sensitivity? Which customers have shown a downward trend in their Net Promoter Score? Which customers are reducing their purchases of consumables? Etc.
  • Gathering the data is the next step: in transaction systems, market research data, social media, e-mails,…
  • Manipulating the data: from simple cleaning and conforming operations to very complex pipeline processing of text and web URLs to make the data useful for analysis.
  • Visualising the data: even before the actual analysis, histograms, heat maps, bubble charts and the like may provide intuitive insights and suggest approaches for further analysis.
  • Analysing the data: with the possibilities now offered to analyse text, the old dichotomy between quantitative and qualitative research has become obsolete. Modern analytics is about a hop, skip and jump between the two extremes: quantitative approaches will tell you the proportion of clients that may look for greener pastures, whereas qualitative analytics will probe for reasons and root causes.
  • Interpreting the data may follow more intuitive paths, where extra information is added and opinions are collected using the Delphi technique or other qualitative approaches to add useful meaning and actionable insights to the analysis, e.g. developing a customer scoring model that is broadly used and understood in the organisation.
  • The hardest part is the last phase: integrating the data and the analytics into the decision-making process. To conclude with our example: developing scripts and scenarios for the call centre agents that pop up whenever a client with a potential defection risk calls the company.
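To make the second stage concrete, one of the related questions, which customers are reducing their purchases of consumables, might translate into a query like the one below. The purchases table, its columns and the two years compared are hypothetical:

```sql
-- Customers whose consumables spend fell from one year to the next
SELECT customer_id,
       SUM(CASE WHEN purchase_year = 2016 THEN amount ELSE 0 END) AS spend_2016,
       SUM(CASE WHEN purchase_year = 2017 THEN amount ELSE 0 END) AS spend_2017
FROM purchases
WHERE product_category = 'consumables'
  AND purchase_year IN (2016, 2017)
GROUP BY customer_id
HAVING SUM(CASE WHEN purchase_year = 2017 THEN amount ELSE 0 END)
     < SUM(CASE WHEN purchase_year = 2016 THEN amount ELSE 0 END);
```

Each related question yields a result set like this one; the interpretation stage then combines them, for instance into the customer scoring model.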

Architecture development, the classical chain of events

TOGAF's Architecture Development Method

TOGAF’s Architecture Development Method (ADM) also follows a structured path, as the illustration shows. For detailed information on the TOGAF ADM, we refer to the Open Group website: http://www.opengroup.org/subjectareas/enterprise/togaf
At the heart of Enterprise Architecture development is the management of requirements. These requirements are predominantly based on process support.
User stories like “As a call centre agent, I want to see the entire customer history when a call comes in, in order to serve the customer better” are process support requirements. The data are defined within the context of the process. In this comprehensive case, some level of enterprise-class data is attained, but what about more microscopic user stories like “As a dunning clerk, I want to see the accounts receivable per customer sorted by days overdue”? In this case, no context about why the customer is overdue is in scope. Maybe the delivery was late or incorrect, maybe the customer has a complaint filed with customer service, or maybe the invoice was sent late and arrived during the client’s holiday closure…
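The difference between the two kinds of requirement shows up directly in the queries. The first query below is all the dunning clerk's user story asks for; the second adds one piece of context, open complaints, from another domain. The receivables and complaints tables are hypothetical, and the date subtraction follows PostgreSQL, where subtracting two DATE values yields a number of days:

```sql
-- Narrow, process-bound requirement: overdue invoices sorted by days overdue
SELECT r.customer_id, r.invoice_id, r.amount,
       CURRENT_DATE - r.due_date AS days_overdue
FROM receivables r
WHERE r.due_date < CURRENT_DATE
ORDER BY days_overdue DESC;

-- Same list with context from customer service: number of open complaints
SELECT r.customer_id, r.invoice_id, r.amount,
       CURRENT_DATE - r.due_date AS days_overdue,
       COUNT(c.complaint_id)     AS open_complaints
FROM receivables r
LEFT JOIN complaints c
       ON c.customer_id = r.customer_id
      AND c.status = 'open'
WHERE r.due_date < CURRENT_DATE
GROUP BY r.customer_id, r.invoice_id, r.amount, r.due_date
ORDER BY days_overdue DESC;
```

The second query is only possible if complaints and receivables share a consistent customer identifier, which is exactly the enterprise-class data the narrow user story never asks for.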

Yes, we have a shifting paradigm!

I know, in this business the notion of a paradigm is an overrated concept, abused for pouring old wine into new bottles. But in Thomas Kuhn’s strict definition of the term, I think we do stand a chance of dealing with a paradigm shift in information architecture development.
A must-read for anyone in information technology

I see critical anomalies:
  • inconsistent decision making, depending on the flavour of the day and the profile of the decision maker, often based on inconsistent information extracted from inconsistent data;
  • with time to market shrinking to ever smaller timeframes, the old process-based architecture development method may prove ineffective against the challenges of new entrants and substitute products and services;
  • although every pundit is touting information as the new oil, few companies are using it as the basis of information architecture development;
  • the old top-down view leads to underperforming data retrieval, which is no longer sustainable in a digital competitive environment where time to market often equals the time it takes to tailor data to your needs, e.g. recommenders in e-business, cross-selling in retail, risk assessment in insurance,…

There’s external pressure from the GDPR

By now every organisation doing business with or in the EU will be aware of 25 May 2018, the date when the General Data Protection Regulation (GDPR) comes into effect. It requires:
  • valid and explicit consent for the use of any data that can identify a person,
  • data protection by default (anonymisation, pseudonymisation and security measures for data),
  • communication of data breaches to the authorities and
  • records of processing activities.
Data management activities needed for compliance with the new legislation

This requires organisations to manage their data on individuals far better, and in a more centralised way, than they did in the past. Data requirements on persons will be at the heart of the information architecture development cycles, as dealing with them at a lower level in the architecture framework is a sure recipe for disaster.
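As an illustration of managing consent centrally (a sketch, not a compliance recipe), consent can be recorded per person and purpose with its own history, so that both the current consent and the record of when it was given or withdrawn remain available. The table, its columns and the 'marketing' purpose are hypothetical:

```sql
-- Hypothetical central consent register: one row per grant, never overwritten
CREATE TABLE consent (
    person_id    INTEGER     NOT NULL,
    purpose      VARCHAR(50) NOT NULL,
    given_at     TIMESTAMP   NOT NULL,
    withdrawn_at TIMESTAMP,              -- NULL while consent is still valid
    PRIMARY KEY (person_id, purpose, given_at)
);

-- Persons who currently consent to marketing use of their data
SELECT DISTINCT person_id
FROM consent
WHERE purpose = 'marketing'
  AND withdrawn_at IS NULL;
```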

Technology also contributes to this new approach

At least three technology evolutions enable the data centric approach to information architecture development: microservices, master data management tools and hybrid databases.
Microservices enable rapid scaling and reengineering of processes. The use of consistent data throughout the microservices architecture is a prerequisite.
Master data management tools are maturing as each relevant player expands from its original competence (data governance, data quality or master data management) into the other two. You can observe data governance tools adding data quality and master data management functionality, as well as data quality tools developing master data management and governance services and… you know where this is going.
Last but not least, hybrid databases will enable better storage and retrieval options as they support both transactional and analytical operations on structured and unstructured data.

In conclusion: modern information architecture needs flexible and fluid process management support using consistent data to facilitate consistent decision making, both by humans and machines.

In the next post, I will use a case to illustrate this approach. In the meantime, I look forward to your remarks and inputs for a thorough discussion.