vrijdag 2 februari 2024

Geospatial Data Warehouse and Spatially Enabled Data Warehouse: Turf Wars or Symbiosis?

Over the course of more than 25 years, I have been involved in numerous discussions between the GIS buffs and my tribe, the data warehouse team over where geospatial information should reside. 

Narnia map
A geospatial map of Narnia. What attributes should reside in the geospatial system?

Since almost all measures have a location aspect, the spatial data warehouse was promoted as the single source of truth, able to visualize data in an unparalleled way whereas the opponents stated that all you need is to define a good location dimension and the data warehouse could do without the expensive software and the scarce resources in the geospatial domain. 

I will spare you the avalanche of technical arguments back and forth between the two, leading to tugs of war between the teams and I propose an approach from the business user’s point of view.

The essential question is: “What information am I looking for?” Is it about one or more measures that need to be put in context using a majority of dimensions outside the geospatial domain, even if it includes DimLocation or is it exclusively related to questions “What happened or happens in this particular location, i.e. at this point, line or polygon?” , “What are the measures within a radius of point (x,y) on the map?” or “What is the intersect between location A and location B as far as measure Z is concerned?”.

It is clear that in the first case, the performance and cost of a classical data warehouse with a location dimension will prove to be the better choice. But if location is the point of entry to a query, then the spatial data warehouse is the smartest tool in the shed. 

Symbiosis is the way forward

There are many reasons why the two environments make sense. For executive and managerial information based on structured  data, the data warehouse has proven to be the platform of choice and will continue to do so. For location based analysis, the geospatial data warehouse outperforms the latter. At the same time it is much closer to operational analytics and it can even be a part of operational applications like CRM, SCM or any other OLTP system.

To enable symbiosis, the location dimension needs some connection to the geospatial system. Some plead for a simple snapshot of a shapefile, some want a full duplication of all geospatial data and their timestamp. The latter may lead to an avalanche of data as any little correction of the shape on the GIS system will send new time stamped data. This can’t be a workable situation. Either the snapshot ignores updates but takes in the original GIS object ID to secure a trace or it overwrites any location data and keeps the last version as an active one. Because the only objective here is to provide a path to analysts who need a deeper geospatial analysis of one or more measurements registered in the data warehouse. 

Let's open the debate

I am sure I have missed a few points here and there. Let me know your position on this issue via the comments or a personal message via the contact form on our website: https://www.linguafrancaconsulting.eu/ 

This is one of the topics of our course "ICT Focus Areas for Board Members & Management"