How to Overcome the Challenges of Using a Data Vault

What Are the Challenges?

From flexibility to scalability and efficiency, using a data vault as your data modeling approach has many benefits. But there are also challenges that you need to be aware of. In this blog post, I'm going to walk you through the limitations and how you can overcome them.

The approach a data vault takes when modeling data (something I will go into detail on further down) results in a significantly larger number of data objects than other approaches. These objects include things like tables and columns, and there are so many more of them because a data vault separates information types.

As a consequence, the up-front modeling effort can be larger before the benefits mentioned above pay off. It also means that the modeling process can involve a larger number of manual or mechanical tasks to establish the flexible and detailed data model with all its components.

How Can These Limitations Be Addressed?

To avoid time-consuming manual tasks during the modeling process, architects can automate parts of the model, making it more efficient to create, update and maintain long-term.

How can they do that?

Within the data vault approach, there are certain layers of data. These start with the source systems where data originates, followed by a staging area where data arrives from the source systems, still modeled according to the original structure. Next comes the core data warehouse, which contains two layers: the raw vault, which allows tracing back to the original source system data, and the business vault, a semantic layer where business rules are implemented. Finally, there are data marts, which are structured based on the requirements of the business. For example, there could be a finance data mart or a marketing data mart, each holding the relevant data for analysis purposes.

Out of these layers, the staging area and the raw vault are best suited to automation.
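
One way to picture this kind of automation is to generate the repetitive table definitions from a small piece of metadata rather than writing them by hand. The sketch below is a minimal illustration in Python, not a description of any particular tool: the HubSpec class, the hub_ddl function, the hub_ prefix, and the column names load_ts and record_source are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical metadata describing one business entity."""
    entity: str        # e.g. "customer"
    business_key: str  # column that holds the business key in staging


def hub_ddl(spec: HubSpec) -> str:
    """Render a CREATE TABLE statement for one raw vault table."""
    return (
        f"CREATE TABLE hub_{spec.entity} (\n"
        f"    {spec.entity}_sk TEXT PRIMARY KEY,\n"
        f"    {spec.business_key} TEXT NOT NULL UNIQUE,\n"
        f"    load_ts TEXT NOT NULL,\n"
        f"    record_source TEXT NOT NULL\n"
        f");"
    )


# Adding a new entity means adding one line of metadata, not a new script.
specs = [HubSpec("customer", "customer_no"), HubSpec("product", "sku")]
ddl_statements = [hub_ddl(spec) for spec in specs]
```

Because every table of the same type follows the same pattern, the same idea can be extended to staging tables and load statements.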

What Are the Characteristics of Data Vault Modeling?

The data vault modeling technique provides a high degree of flexibility by separating the business keys, which uniquely identify each business entity and do not change often, from their attributes. This results, as mentioned earlier, in many more data objects in the model, but it also provides a data model that can be highly responsive to changes, such as the integration of new data sources and business rules.
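
To make the separation concrete, here is a small sketch of how a flat source record could be split into its stable business key and its changeable attributes. The split_record function and the field names are hypothetical and only serve as an illustration.

```python
def split_record(record: dict, business_key_field: str) -> tuple:
    """Separate the business key from the descriptive attributes."""
    business_key = record[business_key_field]
    attributes = {
        name: value
        for name, value in record.items()
        if name != business_key_field
    }
    return business_key, attributes


source_row = {"customer_no": "C-001", "name": "Ada", "city": "Berlin"}
business_key, attributes = split_record(source_row, "customer_no")
# business_key is "C-001"; attributes holds only name and city.
```

If a new source later adds a field, only the attribute side changes, and the business key stays as it was.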

The basic structure of the model comes from the business keys and the relationships between them. Their stable nature provides the key ingredient for a robust data model, but also means the keys need to be chosen carefully, as they form the very basis from which everything else is derived.

Hubs

The tables which contain the business keys are called hubs in the data vault approach. In addition to storing the keys, hubs also contain surrogate keys and metadata for each business key. Finally, the source of each business key can also be found in the hub, so that information can be traced back to its origins.
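
The following is a minimal sketch of a hub and an insert-only load into it. It assumes that the surrogate key is a SHA-256 hash of the normalized business key, which is one common choice but not something this article prescribes. HubRow, surrogate_key, load_hub, and the field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HubRow:
    surrogate_key: str
    business_key: str
    load_ts: str        # metadata: when the key was first seen
    record_source: str  # metadata: where the key was first seen


def surrogate_key(business_key: str) -> str:
    """Assumption: derive the surrogate key by hashing the business key."""
    normalized = business_key.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_keys, load_ts: str, record_source: str) -> int:
    """Insert only business keys the hub has not seen; return the count."""
    inserted = 0
    for bk in business_keys:
        sk = surrogate_key(bk)
        if sk not in hub:
            hub[sk] = HubRow(sk, bk.strip(), load_ts, record_source)
            inserted += 1
    return inserted


hub_customer = {}
first = load_hub(hub_customer, ["C-001", "C-002"], "2024-01-01T00:00:00", "crm")
second = load_hub(hub_customer, ["C-002", "C-003"], "2024-01-02T00:00:00", "webshop")
```

In this sketch the second load adds only C-003, and the row for C-002 keeps the source that delivered it first, which is what allows the key to be traced back to its origin.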

Links

Link tables are many-to-many join tables that connect different business keys. A link table contains the surrogate keys of the hubs it connects, its own surrogate key, and metadata about where the association originated.
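
As an illustration, the sketch below builds two hubs and one link in an in-memory SQLite database and reads the associations back through the link. All table names, column names, and key values are made up for this example, and the short keys such as c1 stand in for real surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_sk   TEXT PRIMARY KEY,
    customer_no   TEXT NOT NULL UNIQUE,
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_sk    TEXT PRIMARY KEY,
    sku           TEXT NOT NULL UNIQUE,
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- The link holds its own key, the keys of both hubs, and metadata.
CREATE TABLE link_purchase (
    purchase_sk   TEXT PRIMARY KEY,
    customer_sk   TEXT NOT NULL REFERENCES hub_customer (customer_sk),
    product_sk    TEXT NOT NULL REFERENCES hub_product (product_sk),
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL,
    UNIQUE (customer_sk, product_sk)
);
""")

ts, src = "2024-01-01T00:00:00", "webshop"
conn.executemany("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
                 [("c1", "C-001", ts, src), ("c2", "C-002", ts, src)])
conn.executemany("INSERT INTO hub_product VALUES (?, ?, ?, ?)",
                 [("p1", "SKU-1", ts, src), ("p2", "SKU-2", ts, src)])
conn.executemany("INSERT INTO link_purchase VALUES (?, ?, ?, ?, ?)",
                 [("l1", "c1", "p1", ts, src),
                  ("l2", "c1", "p2", ts, src),
                  ("l3", "c2", "p1", ts, src)])

# Resolve the many-to-many association back to business keys.
purchases = conn.execute("""
    SELECT c.customer_no, p.sku
    FROM link_purchase AS l
    JOIN hub_customer AS c ON c.customer_sk = l.customer_sk
    JOIN hub_product  AS p ON p.product_sk  = l.product_sk
    ORDER BY c.customer_no, p.sku
""").fetchall()
```

One customer can appear with several products and one product with several customers, without either hub needing to change.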

Satellites

With the hubs and links in place, the structure of the data vault model is set up. It does not, however, contain any attributes yet. This is where satellites come in. Satellite tables hold the attributes, along with metadata that connects them to their parent hub or link tables. They also contain metadata about the origins of the attributes, as well as temporal attributes. Thanks to satellites, data architects can ensure that history is recorded at any interval, while also providing an audit trail and traceability to the source system.
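
A minimal sketch of how a satellite can record history: rows are only ever appended, and a new row is written only when the attributes differ from the latest stored version. The names SatelliteRow, load_satellite, and as_of are hypothetical, and the sketch assumes ISO-formatted timestamps so that string comparison matches chronological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SatelliteRow:
    parent_sk: str      # surrogate key of the parent hub or link
    load_ts: str        # temporal attribute: when this version arrived
    record_source: str  # origin of the attributes
    attributes: tuple   # sorted (name, value) pairs


def load_satellite(sat: list, parent_sk: str, attributes: dict,
                   load_ts: str, record_source: str) -> bool:
    """Append a new version only if the attributes have changed."""
    snapshot = tuple(sorted(attributes.items()))
    history = [row for row in sat if row.parent_sk == parent_sk]
    if history and max(history, key=lambda r: r.load_ts).attributes == snapshot:
        return False
    sat.append(SatelliteRow(parent_sk, load_ts, record_source, snapshot))
    return True


def as_of(sat: list, parent_sk: str, ts: str):
    """Return the attributes that were valid at the given timestamp."""
    versions = [r for r in sat if r.parent_sk == parent_sk and r.load_ts <= ts]
    if not versions:
        return None
    return dict(max(versions, key=lambda r: r.load_ts).attributes)


sat_customer = []
load_satellite(sat_customer, "c1", {"city": "Berlin"}, "2024-01-01", "crm")
load_satellite(sat_customer, "c1", {"city": "Berlin"}, "2024-01-02", "crm")
load_satellite(sat_customer, "c1", {"city": "Hamburg"}, "2024-01-03", "crm")
```

Because old versions are never overwritten, as_of can answer what a record looked like on any earlier date, and each version still carries its source.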
