Wednesday, August 27, 2008

Data Warehouses can be defined as subject-oriented, integrated, time-variant, non-volatile collections of data used to support analytical decision making. The data in the Warehouse comes from the operational environment and external sources. Data Warehouses are physically separated from operational systems, even though the operational systems feed the Warehouse with source data.

Subject Orientation

Data Warehouses are designed around the major subject areas of the enterprise; the operational environment is designed around applications and functions. This difference in orientation (data vs. process) is evident in the content of the database. Data Warehouses do not contain information that will not be used for informational or analytical processing; operational databases contain detailed data that is needed to satisfy processing requirements but which has no relevance to management or analysis.

Integration and Transformation

The data within the Data Warehouse is integrated. This means that there is consistency among naming conventions, measurements of variables, encoding structures, physical attributes, and other salient data characteristics. An example of this integration is the treatment of codes such as gender codes. Within a single corporation, various applications may represent gender codes in different ways: male vs. female, m vs. f, and 1 vs. 0, etc. In the Data Warehouse, gender is always represented in a consistent way, regardless of the many ways by which it may be encoded and stored in the source data. As the data is moved to the Warehouse, it is transformed into a consistent representation as required.

Time Variance

All data in Data Warehouse is accurate as of some moment in time, providing an historical perspective. This differs from the operational environment in which data is intended to be accurate as of the moment of access. The data in the Data Warehouse is, in effect, a series of snapshots. Once the data is loaded into the enterprise data store and data marts, it cannot be updated. It is refreshed on a periodic basis, as determined by the business need. The operational data store, if included in the Warehouse architecture, may be updated.

Non-Volatility

Data in the Warehouse is static, not dynamic. The only operations that occur in Data Warehouse applications are the initial loading of data, access of data, and refresh of data. For these reasons, the physical design of a Data Warehouse optimizes the access of data, rather than focusing on the requirements of data update and delete processing.

0 comments: