This blog post is part of a series of which the following posts have been published:
Coherent business concepts keep the data relevant
In any data mesh architecture, the data warehouse is, and will remain, a critical component for many reasons. First and foremost: some analytics need industrialised solutions that automate the entire flow from raw data to finished reports. Structured data will always contribute to the analytical environment and will need a relational model as the foundation for analyses. In my experience, the most flexible and sustainable model is Ralph Kimball's process-based star schema architecture. In one of my previous posts I made the case for this approach.
And in the context of a data lake project, I positioned the Kimball approach as best in class.
The process diagram below tells the story: gathering requirements, ingesting all sorts of data into the lake and distinguishing between structured and unstructured data. Identifying the common dimensions and facts is crucial to make the concept work. Either you provide an increment to an existing data mart bus, or you introduce a new process metrics fact table with foreign keys to existing and new dimensions.
Managing structured and unstructured data in a data mesh environment
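To make the two options a little more concrete, here is a minimal sketch of a data mart bus matrix in Python. It assumes the bus can be modelled as a mapping from each business process (fact table) to the set of conformed dimensions it uses; the class, fact table and dimension names are all hypothetical. Registering a new process reports which existing dimensions it reuses and which new ones it adds to the bus.

```python
from dataclasses import dataclass, field


@dataclass
class BusMatrix:
    """Hypothetical data mart bus: fact table name -> conformed dimensions it uses."""

    processes: dict[str, set[str]] = field(default_factory=dict)

    @property
    def dimensions(self) -> set[str]:
        # Union of all dimensions already present on the bus.
        return set().union(*self.processes.values())

    def add_process(self, fact: str, dims: set[str]) -> tuple[set[str], set[str]]:
        """Register a process metrics fact table.

        Returns (reused, new): the existing conformed dimensions the fact
        table gets foreign keys to, and the dimensions it adds to the bus.
        """
        existing = self.dimensions
        reused = dims & existing
        new = dims - existing
        self.processes[fact] = set(dims)
        return reused, new


bus = BusMatrix()
bus.add_process("fact_sales", {"date", "customer", "product", "store"})
reused, new = bus.add_process(
    "fact_social_message", {"date", "customer", "product", "social_source"}
)
print(sorted(reused))  # ['customer', 'date', 'product']
print(sorted(new))     # ['social_source']
```

In this reading, a process whose `new` set is empty fits entirely on the existing bus, while a non-empty `new` set means the fact table also introduces dimensions of its own.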
Making the case for the data warehouse as an endpoint of unstructured analysis
The data lake can support a lot of advanced analytics. Think of text analytics, social media analytics and image processing. The outcomes of these analyses may find their way into the data warehouse. Take polarity analysis of social media as an example. Imagine a bank or a telecom provider capturing social media comments on its performance. As we all know from customer feedback analysis, only the emotions two or three sigma away from the mean make it to social media: the client is either very satisfied or very dissatisfied and wants the world to know. Taking snapshots of a client's mood and relating them to the client's financial or communication behaviour may yield interesting information. Some banks already capture their clients' moods to determine the optimum conditions for presenting their services. Aggregating these data may even yield macro-economic indicators that correlate with the business cycle.
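Taking snapshots of a client's mood maps naturally onto a periodic snapshot fact table. The sketch below assumes message-level polarity scores in the range [-1, 1] have already been produced in the data lake; the record layout, client ids and values are made up. It rolls messages up to one row per client per month and then aggregates those rows into a single monthly mood figure. The unweighted mean of client averages is just one possible choice for the macro-level figure.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical message-level output of a polarity model in the data lake:
# (client_id, message_date, polarity in [-1, 1]).
messages = [
    ("c1", date(2023, 1, 5), 0.8),
    ("c1", date(2023, 1, 20), 0.6),
    ("c2", date(2023, 1, 11), -0.9),
    ("c1", date(2023, 2, 2), -0.4),
    ("c2", date(2023, 2, 14), -0.7),
]


def monthly_snapshot(rows):
    """Periodic snapshot: (client, year, month) -> (average polarity, message count)."""
    buckets = defaultdict(list)
    for client, day, polarity in rows:
        buckets[(client, day.year, day.month)].append(polarity)
    return {key: (mean(vals), len(vals)) for key, vals in sorted(buckets.items())}


def mood_index(snapshot):
    """Macro-level figure: unweighted mean of the client averages per month."""
    per_month = defaultdict(list)
    for (_client, year, month), (avg, _count) in snapshot.items():
        per_month[(year, month)].append(avg)
    return {ym: round(mean(vals), 3) for ym, vals in sorted(per_month.items())}


snapshot = monthly_snapshot(messages)
for key, (avg, count) in snapshot.items():
    print(key, round(avg, 2), count)
print(mood_index(snapshot))
```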
Have a look at the diagram below and
imagine the business questions it can answer for you.
A high-level star schema integrating social messages and their polarity with sales metrics
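A heavily simplified version of such a star schema can be sketched with Python's built-in `sqlite3` module. The tables, columns and sample rows are hypothetical; the point is that the social message facts and the sales facts share conformed date and product dimensions. The query drills across the two fact tables: each one is aggregated to product level separately before the results are joined, which avoids the double counting you would get from joining two fact tables row by row.

```python
import sqlite3

# Hypothetical, minimal star schema: two fact tables sharing conformed
# date and product dimensions.
DDL = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_source  (source_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date,
    product_key INTEGER REFERENCES dim_product,
    amount      REAL
);
CREATE TABLE fact_social_message (
    date_key    INTEGER REFERENCES dim_date,
    product_key INTEGER REFERENCES dim_product,
    source_key  INTEGER REFERENCES dim_source,
    polarity    REAL
);
"""

# Drill across: aggregate each fact table to product grain, then join.
DRILL_ACROSS = """
WITH s AS (
    SELECT product_key, SUM(amount) AS sales
    FROM fact_sales GROUP BY product_key
), m AS (
    SELECT product_key, AVG(polarity) AS avg_polarity, COUNT(*) AS messages
    FROM fact_social_message GROUP BY product_key
)
SELECT p.name, s.sales, m.avg_polarity, m.messages
FROM dim_product AS p
JOIN s ON s.product_key = p.product_key
LEFT JOIN m ON m.product_key = p.product_key
ORDER BY p.name
"""


def product_sentiment_vs_sales(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(DRILL_ACROSS).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, "2023-03-01"), (2, "2023-03-02")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "savings account"), (2, "mortgage")])
conn.executemany("INSERT INTO dim_source VALUES (?, ?)",
                 [(1, "forum"), (2, "microblog")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 150.0), (1, 2, 300.0), (2, 2, 250.0)])
conn.executemany("INSERT INTO fact_social_message VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 0.75), (2, 1, 2, 0.25), (1, 2, 2, -0.5)])

for row in product_sentiment_vs_sales(conn):
    print(row)
# ('mortgage', 550.0, -0.5, 1)
# ('savings account', 250.0, 0.5, 2)
```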
Think of time series: is there some form of leading indicator of sales in the polarity of a customer's social messages?
If one of our products is the subject of a social media post, does this have any (positive or negative) effect on sales of that particular product?
What social media sources have the greatest
impact on our brand equity?
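The first question can be explored, as a first approximation, by correlating polarity in one month with sales a number of months later. This is a minimal sketch in plain Python, assuming both series have already been aggregated to the same monthly grain and aligned; the figures are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(polarity, sales, lag):
    """Correlate polarity in month t with sales in month t + lag."""
    if len(polarity) != len(sales):
        raise ValueError("series must be aligned to the same months")
    if not 0 <= lag <= len(polarity) - 2:
        raise ValueError("lag must leave at least two overlapping months")
    return pearson(polarity[: len(polarity) - lag], sales[lag:])


# Made-up monthly series. Sales from the second month onwards were constructed
# as 100 + 100 * (previous month's polarity), so lag 1 correlates perfectly.
polarity = [0.1, 0.4, -0.2, 0.3, 0.5, -0.1]
sales = [100, 110, 140, 80, 130, 150]

for lag in range(3):
    print(f"lag {lag}: r = {lagged_correlation(polarity, sales, lag):+.2f}")
```

A high lagged correlation on a handful of data points is only a hint, not evidence of a leading indicator. A serious analysis would also account for trend and seasonality, for instance with a Granger causality test.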
I am sure you will add your own dimensions and business questions to the model. By doing so, you are realising one of the main traits of a data mesh: delivering data as a product.
I hope I have made my point clear: even in
the most sophisticated data lakehouse supporting a data mesh architecture, the
data warehouse is not going away.
In the next blog article we will focus on
governing the data ingestion process.
Stay tuned!