Friday, September 19, 2014

The Bus Architecture


The data warehouse bus architecture was developed by Ralph Kimball and is described extensively in his books The Data Warehouse Toolkit and The Data Warehouse Lifecycle Toolkit. Both books are published by Wiley Publishing and cover the complete lifecycle of modeling, building, and maintaining data warehouses. The term bus refers to the fact that the different data marts in the data warehouse are interlinked by conformed dimensions, that is, dimensions shared by more than one data mart.
A simple example can explain this. Suppose you have dimension tables for customers, suppliers, and products, and you want to analyze data about sales and purchase transactions. For purchase transactions, the customer is still unknown, so it’s not very useful to include the customer dimension in the purchase star schema. For sales transactions the situation is slightly different: you need information about both the customer who purchased a product and the supplier the product was purchased from. The resulting diagram for this small example data warehouse is shown in the schema below: two star schemas, one for purchases and one for sales, that share the supplier and product dimensions.
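As a rough illustration of the example above, the following sketch builds the two star schemas in an in-memory SQLite database and reports purchased versus sold quantities per product through the shared product dimension. The table names, column names, and sample rows are all assumptions made for this sketch; they do not come from the book.

```python
import sqlite3

# In-memory database; all table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);

-- Purchase star schema: the customer is not known yet, so no customer key.
CREATE TABLE fact_purchases (
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER
);

-- Sales star schema: reuses the same supplier and product dimensions.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER
);

INSERT INTO dim_customer VALUES (1, 'Alice');
INSERT INTO dim_supplier VALUES (1, 'Acme');
INSERT INTO dim_product  VALUES (1, 'DVD A'), (2, 'DVD B');
INSERT INTO fact_purchases VALUES (1, 1, 10), (1, 2, 5);
INSERT INTO fact_sales VALUES (1, 1, 1, 3), (1, 1, 1, 4), (1, 1, 2, 1);
""")

# Because both fact tables point at the same dim_product rows, their
# totals can be lined up per product without any key translation.
rows = conn.execute("""
    SELECT p.product_name,
           (SELECT COALESCE(SUM(quantity), 0) FROM fact_purchases
             WHERE product_key = p.product_key) AS purchased,
           (SELECT COALESCE(SUM(quantity), 0) FROM fact_sales
             WHERE product_key = p.product_key) AS sold
    FROM dim_product AS p
    ORDER BY p.product_key
""").fetchall()

for product_name, purchased, sold in rows:
    print(product_name, purchased, sold)
```

Each fact table is aggregated separately before the results are combined, which avoids double counting that a direct join between the two fact tables would cause.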
It’s best to start with a high-level bus architecture matrix before beginning the design of the individual data marts. Figure 7-4 shows an example matrix, where all identified business facts are placed in the rows and all identified dimensions in the columns. The “bus” is formed by the main business process, or the natural flow of events within an organization. In our case, that would be ordering from suppliers, storing and moving inventory, receiving customer orders, shipping DVDs, and handling returns. Within such a main business process it’s easy to check off all relationships between dimensions and facts. This makes the design process easier to manage, and the matrix can also be used to discuss the completeness of the data warehouse with the business users.
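A bus matrix is simple enough to keep as a small data structure. The sketch below is one possible representation, not the matrix from Figure 7-4: the process names paraphrase the business process listed above, and which dimensions are checked off for each process is an assumption chosen only for illustration.

```python
# Columns of the matrix: the dimensions named in the example.
DIMENSIONS = ["customer", "supplier", "product"]

# Rows of the matrix: business facts mapped to the dimensions they use.
# The check marks below are illustrative assumptions, not the book's figure.
BUS_MATRIX = {
    "purchase orders": {"supplier", "product"},
    "inventory": {"product"},
    "customer orders": {"customer", "supplier", "product"},
    "shipments": {"customer", "product"},
    "returns": {"customer", "product"},
}


def render_matrix(matrix, dimensions):
    """Return the matrix as plain text, one row per business fact."""
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(dimensions)]
    for process, used in matrix.items():
        cells = [("X" if d in used else "").center(len(d)) for d in dimensions]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


def shared_dimensions(matrix):
    """Return dimensions used by more than one fact, sorted by name.

    These are the ones that must be conformed across data marts.
    """
    counts = {}
    for used in matrix.values():
        for dimension in used:
            counts[dimension] = counts.get(dimension, 0) + 1
    return sorted(d for d, n in counts.items() if n > 1)


print(render_matrix(BUS_MATRIX, DIMENSIONS))
print("Shared:", ", ".join(shared_dimensions(BUS_MATRIX)))
```

Keeping the matrix in a form like this makes it easy to print for a review session with business users and to spot which dimensions need a single, agreed definition.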
Using the bus architecture with conformed dimensions is what enables a collection of data marts to be treated as a true Enterprise Data Warehouse. Each dimension table is designed and maintained in only one location, and a single process exists to load and update its data. This contrasts sharply with a collection of independent data marts, where each data mart is designed, built, and maintained as a point solution. In that case, each data mart contains its own dimensions, and each dimension has no relation to similar dimensions in other data marts. Working this way, you might end up having to maintain five or more different product and customer dimensions. We strongly oppose this type of “architecture”!
The authors’ advice is to always start by developing and agreeing upon the high-level bus matrix, which identifies all the entities of interest for the data warehouse. Only after completing this step can the detailed design of the individual dimension and fact tables begin.