zaterdag 18 november 2023

Best Practices in Defining a Data Warehouse Architecture

This blogpost is part of a series of which the following posts have been published:

The opening statement

What is a data mesh?

Coherent business concepts keep the data relevant

In any data mesh architecture, the data warehouse is and will be a critical component for many reasons. First and foremost: some analytics need industrialised solutions, automating the entire flow from raw data tot finished reports. Structured data will always contribute to the analytical environment and will need a relational model to provide the foundation for analyses. In my experience, the most flexible and sustainable model is the process based star schema architecture from Ralph Kimball. In  one of my previous posts I have made the case for this approach.

 And in the context of a data lake project I positioned the Kimball approach as the best in class

The process diagram below tells the story of requirements gathering, ingesting all sorts of data in the lake and making the distinction between structured and unstructured data. Identifying the common dimensions and facts is crucial to make the concept work. Either you provide an increment to an existing data mart bus or you introduce a new process metrics fact table with foreign keys from existing and new dimensions.


Best practices in DWH
Managing structured and unstructured data in a data mesh environment

Making the case for the data warehouse as an endpoint of unstructured analysis

A lot of advanced analytics can be facilitated by the data lake. Think of text analytics, social media analytics and image processing. The outcomes of these analyses may find their way to the data warehouse. For example: polarity analysis in social media. Imagine a bank or a telecom provider capturing the social media comments on its performance. As we all know from customer feedback analysis, only the emotions two or three sigma away from the mean make it to social media. The client is either very satisfied or very dissatisfied and wants the world to know. Taking snapshots of the client’s mood and relating it to his financial or communication behaviour may yield interesting information. Already today, some banks are capturing their client’s mood to determine the optimum conditions to present their services. Aggregating these data may even provide macro-economic data correlating with the business cycle.

Have a look at the diagram below and imagine the business questions it can answer for you.

A high level star schema integrating social messages and their polarity with sales metrics

Think of time series: is there a some form of a leading indicator of sales in the polarity of this customer’s social messages?

If one of our products is the subject of a social media post, has this any (positive or negative) effect on sales of that particular product?

What social media sources have the greatest impact on our brand equity?

I am sure you will add your dimensions and business questions to the model. And by doing so you are realising one of the main traits of a data mesh: delivering data as a product.

I hope I have made my point clear: even in the most sophisticated data lakehouse supporting a data mesh architecture, the data warehouse is not going away.

In the next blog article we will focus on governing the data ingestion process.

Stay tuned!


Geen opmerkingen:

Een reactie posten