Monday, January 19, 2015

Data Vault DWH Modelling

Back in the 1990s, data warehouse (DW) designers began to run into problems caused by growing volumes of information. These problems have scaled up together with the volume of stored data, which grows exponentially. The introduction of applications such as CRM, SCM and ERP led to a further increase in the volume of data in warehouses.

Today, difficulties with scalability, flexibility and the level of detail at which data is represented are a daily headache for DWH developers. Practice shows that traditional data models such as 3NF and the popular Star schema are difficult to modify, maintain and query, not to mention the difficulties of backup and restore, because they were not originally designed for data warehousing.

The answer to these problems was the Data Vault model, designed by Dan Linstedt. Data Vault is the next step in the evolution of data modeling techniques, because it was created specifically for the enterprise data warehouse.

A Data Vault is a set of interconnected normalized tables oriented toward storing detailed information, with the ability to trace the origin of the data, and supporting one or more functional areas of the business. It is a hybrid approach that combines the best of the 3NF and Star schemas. The Data Vault design is flexible, scalable and consistent, and it adapts easily to the changing needs of a company.

Data Vault Components
The Data Vault model uses only three types of tables: Hub, Link and Satellite, which keeps the DWH design simple and elegant. Hubs represent the functional areas of the subject area. Links provide transactional relationships between Hub tables. Satellites provide descriptive details for the primary key of a Hub table. Each table type is designed to give the warehouse maximum flexibility and scalability while retaining most of the traditional data modeling techniques.

Hub Tables
Tables of this type contain a defined set of business keys. A business key is a unique identifier that the business uses in its daily operations. Examples of business keys include an invoice number, employee number, customer number or part number. If the business loses a key, it loses all the information about the corresponding object.
Other attributes of Hub tables:

  • surrogate key (Surrogate Key) - an optional component, possibly a smart key or a sequence number;
  • load timestamp (Load Date Time Stamp) - the date and time when the key was first loaded into the warehouse;
  • record source (Record Source) - the name of the source system, used to trace the data back to its origin.
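
The Hub attributes listed above can be sketched as a table definition. The following is only a minimal illustration using Python's built-in sqlite3 module; the table name hub_customer, the column names and the helper functions are assumptions made for this example, not part of the Data Vault specification.

```python
import sqlite3
from datetime import datetime, timezone


def create_hub(conn):
    """Create a hypothetical Hub table for the 'customer' business key."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS hub_customer (
            customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT, -- optional surrogate key
            customer_no   TEXT NOT NULL UNIQUE,              -- business key
            load_dts      TEXT NOT NULL,                     -- Load Date Time Stamp
            record_source TEXT NOT NULL                      -- Record Source
        )
    """)


def load_hub_key(conn, customer_no, record_source):
    """Insert the business key if it is new and return its surrogate key.

    An existing key is left untouched, so load_dts and record_source
    keep describing the first time the key was loaded.
    """
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer (customer_no, load_dts, record_source) "
        "VALUES (?, ?, ?)",
        (customer_no, datetime.now(timezone.utc).isoformat(), record_source),
    )
    row = conn.execute(
        "SELECT customer_sk FROM hub_customer WHERE customer_no = ?",
        (customer_no,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_hub(conn)
first = load_hub_key(conn, "C-1001", "CRM")
again = load_hub_key(conn, "C-1001", "ERP")  # same key arriving from another system
```

In this sketch the second load does not create a new row: both calls return the same surrogate key, and the Hub still records "CRM" as the system where the key first appeared.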

Once the business objects have been identified by their keys (for example, customers and accounts), you can begin to build the relationships between them.

Link Tables
A table of this type represents a relationship or transaction between two or more business components (two or more business keys), as a many-to-many relationship like those in a 3NF model.
A Link contains the following attributes:

  • surrogate key (Surrogate Key) - an optional component, possibly a smart key or a sequence number. Used only if more than two Hub tables are related through the Link;
  • Hub keys - carried over into the Link, where they form a composite key and represent the interaction and relationship between the Hub tables;
  • load timestamp (Load Date Time Stamp) - the date and time when the key was loaded into the warehouse;
  • record source (Record Source) - the name of the source system, used to trace the data back to its origin.
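
As an illustration of the Link attributes above, here is a minimal sketch in Python with sqlite3. The two Hubs are abbreviated to their keys, and all table names, column names and sample values (hub_customer, hub_account, link_customer_account, "C-1001" and so on) are assumptions made for this example.

```python
import sqlite3
from datetime import datetime, timezone

LINK_SCHEMA = """
CREATE TABLE hub_customer (customer_sk INTEGER PRIMARY KEY, customer_no TEXT NOT NULL UNIQUE);
CREATE TABLE hub_account  (account_sk  INTEGER PRIMARY KEY, account_no  TEXT NOT NULL UNIQUE);
CREATE TABLE link_customer_account (
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer (customer_sk),
    account_sk    INTEGER NOT NULL REFERENCES hub_account (account_sk),
    load_dts      TEXT NOT NULL,             -- Load Date Time Stamp
    record_source TEXT NOT NULL,             -- Record Source
    PRIMARY KEY (customer_sk, account_sk)    -- composite key built from the Hub keys
);
"""


def build_link_demo():
    """Create two abbreviated Hubs and one Link, then load a few relationships."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(LINK_SCHEMA)
    conn.executemany("INSERT INTO hub_customer VALUES (?, ?)", [(1, "C-1001"), (2, "C-1002")])
    conn.executemany("INSERT INTO hub_account VALUES (?, ?)", [(10, "A-1"), (11, "A-2")])
    now = datetime.now(timezone.utc).isoformat()
    pairs = [(1, 10), (1, 11), (2, 10)]  # many-to-many: A-1 is shared by two customers
    conn.executemany(
        "INSERT OR IGNORE INTO link_customer_account VALUES (?, ?, ?, ?)",
        [(c, a, now, "CRM") for c, a in pairs],
    )
    return conn


def accounts_of(conn, customer_no):
    """Return the account numbers linked to a customer business key."""
    rows = conn.execute(
        """SELECT a.account_no
           FROM link_customer_account AS l
           JOIN hub_customer AS c ON c.customer_sk = l.customer_sk
           JOIN hub_account  AS a ON a.account_sk  = l.account_sk
           WHERE c.customer_no = ?
           ORDER BY a.account_no""",
        (customer_no,),
    ).fetchall()
    return [r[0] for r in rows]
```

Because only two Hubs take part, this sketch uses the composite key alone and leaves out the optional surrogate key.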

With just a few Hub and Link tables, the model already begins to describe business processes. Note that this modeling technique is designed for data warehouses, not for OLTP systems.

Satellite Tables
A Satellite table contains descriptive information for a Hub key. This information changes over time, so the structure of a Satellite table must be able to store new and changed data as well as historical data.
The table has the following mandatory attributes:
  • Satellite primary key - the primary key of the Hub or Link table (carried over into the Satellite from the Hub or Link);
  • Satellite primary key (Load Date Time Stamp) - the date and time when the data was loaded into the warehouse (a new row is always inserted);
  • Satellite primary key (optional) - a surrogate key;
  • record source (Record Source) - the name of the source system, used to trace the data back to its origin.
By its nature, a Satellite table most closely resembles a Type 2 slowly changing dimension (SCD Type 2) as defined by Ralph Kimball. It stores changes at a detailed level, and its function is to provide context describing instances of a Hub or Link. Satellite design should be based on the mathematical principles of reducing data redundancy and accounting for the rate of change of the data.
Thus, Satellite tables describe the business key at the most detailed level available. This provides the basis for building the context that describes the business.
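
The insert-only history kept by a Satellite can be sketched as follows, using Python with sqlite3. The table sat_customer, its descriptive columns name and city, and the rule of skipping a load when nothing has changed are assumptions made for this example; the text above only requires that changes arrive as new rows.

```python
import sqlite3


def create_satellite(conn):
    """Create a hypothetical Satellite describing a customer Hub key."""
    conn.execute("""
        CREATE TABLE sat_customer (
            customer_sk   INTEGER NOT NULL,  -- key carried over from the Hub
            load_dts      TEXT NOT NULL,     -- Load Date Time Stamp (ISO 8601 text)
            name          TEXT,              -- descriptive attribute
            city          TEXT,              -- descriptive attribute
            record_source TEXT NOT NULL,     -- Record Source
            PRIMARY KEY (customer_sk, load_dts)
        )
    """)


def current_row(conn, customer_sk):
    """Return the most recent (name, city) for a key, or None if there is none."""
    return conn.execute(
        "SELECT name, city FROM sat_customer "
        "WHERE customer_sk = ? ORDER BY load_dts DESC LIMIT 1",
        (customer_sk,),
    ).fetchone()


def load_satellite(conn, customer_sk, load_dts, name, city, record_source):
    """Insert a new row when the description changed; never update in place.

    Returns True if a row was inserted, False if the data was unchanged.
    """
    if current_row(conn, customer_sk) == (name, city):
        return False
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
        (customer_sk, load_dts, name, city, record_source),
    )
    return True


conn = sqlite3.connect(":memory:")
create_satellite(conn)
load_satellite(conn, 1, "2015-01-01T00:00:00", "Ann", "Paris", "CRM")
load_satellite(conn, 1, "2015-01-02T00:00:00", "Ann", "Paris", "CRM")   # unchanged, skipped
load_satellite(conn, 1, "2015-01-03T00:00:00", "Ann", "Berlin", "CRM")  # change, new row
```

After these three loads the Satellite holds two rows for the key: the original description and the changed one, so both the current and the historical state remain available.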
From these simple components you can build both a simple warehouse consisting of a single Hub and Satellite pair and a huge corporate DWH containing hundreds of Hubs.

A Data Warehouse Built on Data Vault
Large volumes of data cause problems with queries, especially with Star schemas and to a lesser extent with 3NF. With large volumes of information, query performance breaks down on conformed dimensions and conformed fact tables. It is often necessary to partition the data and to keep altering the warehouse structure to give business users additional granularity. Reloading ever-changing Star schemas is difficult, not to mention attempting it with a large volume of data, such as more than 1 TB.
The Data Vault model is rooted in the mathematical foundations of the normalized data model and therefore does not have these disadvantages. Reducing data redundancy and accounting for the rate of change of the data sets improves performance and makes the warehouse easier to manage. The Data Vault architecture is not tied to any one platform.
The scalability of the Data Vault model can be demonstrated by the following example.
Suppose a company that sells computers has a DWH consisting of a "Computers" Hub table, an "Accounts" Hub table and a Link table for the relationship between them. The company then decides to start selling cars. The Data Vault data model makes it possible to add a new "Cars" Hub and create a new "Cars-Accounts" Link. As a result, no data is lost: all the information is accumulated, retained over a long period, maintained, and reflects the changes in the business. This is just one of many possible ways to handle this situation.
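
The computers-and-cars example can be sketched in Python with sqlite3 to show that the extension only adds tables. All table and column names here are assumptions made for this example, and the load timestamp and record source columns are left out for brevity.

```python
import sqlite3

BASE_SCHEMA = """
CREATE TABLE hub_computer (computer_sk INTEGER PRIMARY KEY, serial_no TEXT NOT NULL UNIQUE);
CREATE TABLE hub_account  (account_sk  INTEGER PRIMARY KEY, account_no TEXT NOT NULL UNIQUE);
CREATE TABLE link_computer_account (
    computer_sk INTEGER NOT NULL REFERENCES hub_computer (computer_sk),
    account_sk  INTEGER NOT NULL REFERENCES hub_account (account_sk),
    PRIMARY KEY (computer_sk, account_sk)
);
"""

# The new line of business: only CREATE statements, no ALTER of existing tables.
EXTENSION = """
CREATE TABLE hub_car (car_sk INTEGER PRIMARY KEY, vin TEXT NOT NULL UNIQUE);
CREATE TABLE link_car_account (
    car_sk     INTEGER NOT NULL REFERENCES hub_car (car_sk),
    account_sk INTEGER NOT NULL REFERENCES hub_account (account_sk),
    PRIMARY KEY (car_sk, account_sk)
);
"""


def table_definitions(conn):
    """Map each table name to the SQL text that defines it."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)


def extend_demo():
    """Load the computer model, add the car model, and return both snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BASE_SCHEMA)
    conn.execute("INSERT INTO hub_computer VALUES (1, 'PC-001')")
    conn.execute("INSERT INTO hub_account VALUES (10, 'A-1')")
    conn.execute("INSERT INTO link_computer_account VALUES (1, 10)")
    before = table_definitions(conn)
    conn.executescript(EXTENSION)
    after = table_definitions(conn)
    return conn, before, after
```

Comparing the two snapshots returned by extend_demo shows that the definitions of the three original tables are identical before and after, while hub_car and link_car_account appear as new entries that reuse the existing hub_account.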
Thus, the scope and structure of a warehouse built on Data Vault can always be changed or extended with minimal effort.
The architectural flexibility inherent in the Data Vault model makes it possible to build a data warehouse iteratively, without significant changes to the structure already created. Practical application of the Data Vault model in various business areas confirms this.