Chapter 1:
1. 4V {Volume; Variety; Velocity}
2. OLTP {Traditional database applications; Automation of repetitive processes; Highly structured access to a few records; Short transactions with frequent updates (an order entry at eBay; reservations; bank transactions); Data integrity is crucial, so isolation and locking are essential; Typically on mainframes or in client/server (C/S) environments}
3. OLAP {Directed at knowledge workers; Analysis of past performance; Making better future decisions based on OLAP; OLAP is an element of decision support systems; Read access to large volumes of data; Data aggregates; Analytical queries}
4. OLTP vs OLAP (each entry reads OLTP @ OLAP) {
   User (System designer, system administrator, data entry clerk @ Decision maker, executives)
   Function (Daily operations, on-line transaction processing @ Decision support, on-line analytical processing)
   DB design (Application oriented @ Subject oriented)
   Data (Up-to-date, atomic, relational, isolated @ Historical, summarized, multidimensional, integrated)
   Usage (Repetitive, routine @ Ad hoc queries)
   Access (Read/write, simple transactions @ Read mostly, complex queries)
   System requirements (Transaction throughput, data consistency @ Query throughput, data accuracy)
}
5. Why OLTP and OLAP should not be mixed {
   Different performance requirements [Transaction processing (OLTP) # (Fast response time is important (< 1 second) @ Data must be up-to-date and consistent at all times) Data analysis (OLAP) # (Queries can consume lots of resources @ Can saturate CPUs and disk bandwidth @ Operating on a static “snapshot” of the data is usually OK) OLAP can crowd out OLTP transactions # (Transactions become slow → unhappy users)]
   Different data modeling requirements [Transaction processing (OLTP) # (Normalized schema for consistency @ Complex data models, many tables @ Limited number of standardized queries and updates) Data analysis (OLAP) # (Simplicity of the data model is important @ Denormalized schemas are common)]
   Analysis requires data from many sources [An OLTP system targets one specific process; OLAP integrates data from different processes # (Combine sales, inventory, and purchasing data @ Analyze experiments conducted by different labs) OLAP often makes use of historical data # (Identify long-term patterns @ Notice changes in behavior over time) Terminology and schemas vary across data sources # (Integrating data from disparate sources is a major challenge)]
}
6. DW {A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions}
7. DW vs Data Marts {A data warehouse is enterprise-wide (Requires extensive business modeling @ May take years to design and build @ Expensive) Data marts contain specific subsets of the data (Limited focus @ Faster to build and roll out @ Integrating data marts is difficult, and data inconsistency between them may be very serious)}
8. Characteristics of a DW {Subject oriented (Data are organized by how users refer to them) Integrated (Inconsistencies in nomenclature and conflicting information are removed) Non-volatile (Read-only data; data do not change over time) Time series (Data are time series, not current status)}
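A minimal sketch of the OLTP/OLAP contrast and the reasons for keeping them apart (items 4 and 5), assuming Python's built-in `sqlite3` module and hypothetical tables (`customer`, `product`, `orders`, `order_item`, `sales_fact`). The OLTP database uses a normalized schema and short atomic transactions; a simple ETL step copies a denormalized, time-stamped snapshot into a separate warehouse database; the analytical query reads only from that copy. Two in-memory SQLite databases stand in for what would normally be separate systems.

```python
import sqlite3

# OLTP side (hypothetical schema): normalized tables, short atomic transactions.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(id),
                     order_date TEXT);
CREATE TABLE order_item (order_id INTEGER REFERENCES orders(id),
                         product_id INTEGER REFERENCES product(id),
                         qty INTEGER CHECK (qty > 0));
""")
with oltp:
    oltp.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Alice", "East"), (2, "Bob", "West")])
    oltp.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Widget", 2.5), (2, "Gadget", 10.0)])


def place_order(conn, order_id, customer_id, order_date, items):
    """One short OLTP transaction: the order and all its items commit together."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, customer_id, order_date))
        conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                         [(order_id, pid, qty) for pid, qty in items])


place_order(oltp, 100, 1, "2024-01-15", [(1, 4), (2, 1)])
place_order(oltp, 101, 2, "2024-01-20", [(2, 3)])
place_order(oltp, 102, 1, "2024-02-03", [(1, 2)])
try:
    # The second item violates CHECK (qty > 0), so order 103 is rolled back as a whole.
    place_order(oltp, 103, 2, "2024-02-10", [(1, 1), (2, -5)])
except sqlite3.IntegrityError:
    pass

# Warehouse side: a separate database holding a denormalized, time-stamped copy.
dw = sqlite3.connect(":memory:")
dw.execute("""CREATE TABLE sales_fact (
    load_date TEXT, order_date TEXT, region TEXT, product TEXT,
    qty INTEGER, revenue REAL)""")


def load_snapshot(src, dst, load_date):
    """ETL step: join the normalized tables once and append the rows,
    stamped with load_date. Rows are only inserted, never updated."""
    rows = src.execute("""
        SELECT o.order_date, c.region, p.name, i.qty, i.qty * p.price
        FROM orders o
        JOIN customer c ON c.id = o.customer_id
        JOIN order_item i ON i.order_id = o.id
        JOIN product p ON p.id = i.product_id""").fetchall()
    with dst:
        dst.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                        [(load_date, *row) for row in rows])
    return len(rows)


n_loaded = load_snapshot(oltp, dw, "2024-02-29")

# OLAP side: a read-only aggregate query that runs on the warehouse copy only.
report = dw.execute("""
    SELECT region, substr(order_date, 1, 7) AS month, SUM(revenue)
    FROM sales_fact
    WHERE load_date = ?
    GROUP BY region, substr(order_date, 1, 7)
    ORDER BY region, month""", ("2024-02-29",)).fetchall()
for row in report:
    print(row)
```

Running the aggregation on `dw` rather than `oltp` is the point of item 5: the heavy scan never competes with order-entry transactions. The warehouse rows are append-only and stamped with `load_date`, which mirrors the non-volatile and time-series characteristics in item 8.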
Chapter 2:
1. Data in the data warehouse {
   Current detail data (From the operational databases @ From business partners through electronic data interchange (EDI))
   Old detail data (Results of the operational databases during previous business cycles @ Historical data @ Same data categories as the current detail data, but with time stamps @ Historical data makes many data warehouses tremendously large)
   Derived data (Summarized data results from multidimensional analysis performed on the detail data @ Aggregations (Aggregations are often materialized for performance reasons @ Multiple aggregations are maintained over the same detail data @ Given the huge volume of detail data, aggregations may themselves be very large tables))
   Reconciled data (Integrated operational detail data @ Summarized data @ Has undergone format conversions @ Conversion is done through a conversion function @ If no conversion function exists, explicit reconciliation tables must be maintained)
}
2. Metadata (data about data) {Description and location of the DW systems and the OLTP systems @ Names, descriptions and definitions of detail data @ Names, descriptions and definitions of end-user views @ Authoritative data sources @ Reconciliation information @ History of data refreshes @ Security authorizations @ Information on aggregations and on DW performance vs. usage patterns}
3. Backroom Metadata (The backroom is where the data staging process takes place; backroom metadata defines the existing applications that feed the DW) {Source System Metadata (Source specifications @ Source descriptive info @ Process info) Data Staging Metadata (Data acquisition info @ Dimension table management @ Transformation and aggregation) DBMS Metadata}
4. Front Room Metadata {Metadata is extended to the horizon (Business names and descriptions for tables and groupings @ Canned queries and report definitions @ Tool settings @ Network security user privilege profiles @ Individual user profiles, linked to Human Resources to track promotions, transfers, and resignations)}
5. Metadata Repository {The components of the DW interact with each other via the metadata repository @ The metadata is managed in the metadata repository @ Metadata repository management software can be used to map the source data to the target database, generate code for data transformations, integrate and transform the data, and control moving data into the warehouse}
6. Challenges of Metadata {New data types @ Inconsistent data formats @ Missing or invalid data @ Different levels of aggregation @ Semantic inconsistency @ Unknown or questionable data quality and timeliness}
7. Metadata Trends {Be able to handle a new data source @ Be able to handle the new enriched data content as easily as the simple data