Data warehousing
Definitions
Bill Inmon:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in
support of management's decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example,
"sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and
source B may have different ways of identifying a product, but in a data warehouse, there will be only a
single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts with a
transaction system, where often only the most recent data is kept. For example, a transaction system
may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses
associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
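As a rough illustration of the integrated, time-variant and non-volatile properties above, the following Python sketch keeps a customer's address history in an append-only store and resolves two source-specific product identifiers to one warehouse key. All names here (PRODUCT_KEY, AddressHistory, the sample identifiers) are hypothetical and chosen only for this example; a real warehouse would hold this in tables rather than in memory.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping: (source system, local product id) -> single warehouse key.
PRODUCT_KEY = {
    ("source_a", "SKU-0042"): 1,
    ("source_b", "P42"): 1,
}


def warehouse_product_key(source, local_id):
    """Integrated: different source identifiers resolve to one warehouse key."""
    return PRODUCT_KEY[(source, local_id)]


@dataclass(frozen=True)  # frozen: a stored row cannot be modified afterwards
class AddressRow:
    customer_id: int
    address: str
    valid_from: date


class AddressHistory:
    """Non-volatile: rows are only appended, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def record(self, customer_id, address, valid_from):
        self._rows.append(AddressRow(customer_id, address, valid_from))

    def all_addresses(self, customer_id):
        """Time-variant: every address ever recorded for the customer."""
        return [r.address for r in self._rows if r.customer_id == customer_id]

    def address_as_of(self, customer_id, day):
        """Return the address that was valid on the given day, or None."""
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.valid_from <= day
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).address
```

A transaction system would typically overwrite the old address; here, recording a new address adds a row, so a query such as address_as_of can still answer what the address was on an earlier date.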
Ralph Kimball provided a more concise definition of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
Data Warehouse Architecture
In general, all data warehouse systems have the following layers:
• Data Source Layer
• Data Extraction Layer
• Staging Area
• ETL Layer
• Data Storage Layer
• Data Logic Layer
• Data Presentation Layer
• Metadata Layer
• System Operations Layer
Data Source Layer
This layer represents the different data sources that feed data into the data warehouse. A data
source can be in any format: a plain text file, a relational database, another type of database, an
Excel file, and so on can all act as data sources.
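To make the idea of heterogeneous sources concrete, here is a minimal Python sketch that reads rows from a plain text (CSV) source and from a relational source into the same list-of-dictionaries shape. The sample data, the table name sales, and the helper names are assumptions made for this illustration; an in-memory SQLite database stands in for a real relational system.

```python
import csv
import io
import sqlite3


def rows_from_csv(text):
    """Read a plain text (CSV) source into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_sqlite(conn, query):
    """Read a relational source into a list of dicts keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, record)) for record in cursor.fetchall()]


# Hypothetical sample data standing in for two real source systems.
csv_text = "product_id,amount\nP42,19.99\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('SKU-0042', 5.0)")

# Both sources now share one row shape, ready to be handed to later layers.
extracted = rows_from_csv(csv_text) + rows_from_sqlite(
    conn, "SELECT product_id, amount FROM sales"
)
```

Note that the CSV reader yields every value as a string, while SQLite returns typed values; reconciling such differences is left to the later layers.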
Many different types of data can be a data source: