The Data Warehouse ETL Toolkit: Practical Techniques for

Points/C-coins required: 9 | 2013-12-27 06:50:15 | 5.32 MB | PDF

Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold more than 150,000 copies. Delivers real-world solutions for the most time- and labor-intensive portion of data warehousing: data staging, or the extract, transform, load (ETL) process. Delineates best practices for extracting data from scattered sources, removing redundant and inaccurate data, transforming the remaining data into correctly formatted data structures, and then loading the end product into the data warehouse. Offers proven time-saving ETL techniques, comprehensive guidance on building dimensional structures, and crucial advice on ensuring data ...
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

Ralph Kimball
Joe Caserta

Wiley Publishing, Inc.

Published by Wiley Publishing, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com

Copyright © 2004 by Wiley Publishing, Inc. All rights reserved.

Published simultaneously in Canada

ISBN: 0-764-57923-1

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, e-mail: brandreview@wiley.com.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom.
The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data
Kimball, Ralph.
The data warehouse ETL toolkit : practical techniques for extracting, cleaning, conforming, and delivering data / Ralph Kimball, Joe Caserta.
p. cm.
Includes index.
eISBN 0-7645-7923-1
1. Data warehousing. 2. Database design. I. Caserta, Joe, 1965- II. Title.
QA76.9.D37K53 2004
005.74-dc22
2004016909

Trademarks: Wiley, the Wiley Publishing logo, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates. All other trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.

Credits
Vice President and Executive Group Publisher: Richard Swadley
Vice President and Publisher: Joseph B. Wikert
Executive Editorial Director: Mary Bednarek
Executive Editor: Robert Elliott
Editorial Manager: Kathryn A. Malm
Development Editor: Adaobi Obi Tulton
Production Editor: Pamela Hanley
Media Development Specialist: Travis Silvers
Text Design & Composition: TechBooks Composition Services

Contents

Acknowledgments
About the Authors
Introduction

Part I Requirements, Realities, and Architecture

Chapter 1 Surrounding the Requirements
  Requirements
  Business Needs
  Compliance Requirements
  Data Profiling
  Security Requirements
  Data Integration
  Data Latency
  Archiving and Lineage
  End User Delivery Interfaces
  Available Skills
  Legacy Licenses
  Architecture
  ETL Tool versus Hand Coding (Buy a Tool Suite or Roll Your Own?)
  The Back Room - Preparing the Data
  The Front Room - Data Access
  The Mission of the Data Warehouse
  What the Data Warehouse Is
  What the Data Warehouse Is Not
  Industry Terms Not Used Consistently
  Resolving Architectural Conflict: A Hybrid Approach
  How the Data Warehouse Is Changing
  The Mission of the ETL Team

Chapter 2 ETL Data Structures
  To Stage or Not to Stage
  Designing the Staging Area
  Data Structures in the ETL System
  Flat Files
  XML Data Sets
  Relational Tables
  Independent DBMS Working Tables
  Third Normal Form Entity/Relation Models
  Nonrelational Data Sources
  Dimensional Data Models: The Handoff from the Back Room to the Front Room
  Fact Tables
  Dimension Tables
  Atomic and Aggregate Fact Tables
  Surrogate Key Mapping Tables
  Planning and Design Standards
  Impact Analysis
  Metadata Capture
  Naming Conventions
  Auditing Data Transformation Steps
  Summary

Part II Data Flow

Chapter 3 Extracting
  Part 1: The Logical Data Map
  Designing Logical Before Physical
  Inside the Logical Data Map
  Components of the Logical Data Map
  Using Tools for the Logical Data Map
  Building the Logical Data Map
  Data Discovery Phase
  Data Content Analysis
  Collecting Business Rules in the ETL Process
  Integrating Heterogeneous Data Sources
  Part 2: The Challenge of Extracting from Disparate Platforms
  Connecting to Diverse Sources through ODBC
  Mainframe Sources
  Working with COBOL Copybooks
  EBCDIC Character Set
  Converting EBCDIC to ASCII
  Transferring Data between Platforms
  Handling Mainframe Numeric Data
  Using PICtures
  Unpacking Packed Decimals
  Working with Redefined Fields
  Multiple OCCURS
  Managing Multiple Mainframe Record Type Files
  Handling Mainframe Variable Record Lengths
  Flat Files
  Processing Fixed Length Flat Files
  Processing Delimited Flat Files
  XML Sources
  Character Sets
  XML Meta Data
  Web Log Sources
  W3C Common and Extended Formats
  Name Value Pairs in Web Logs
  ERP System Sources
  Part 3: Extracting Changed Data
  Detecting Changes
  Extraction Tips
  Detecting Deleted or Overwritten Fact Records at the Source
  Summary

Chapter 4 Cleaning and Conforming
  Defining Data Quality
  Assumptions
  Part 1: Design Objectives
  Understand Your Key Constituencies
  Competing Factors
  Balancing Conflicting Priorities
  Formulate a Policy
  Part 2: Cleaning Deliverables
  Data Profiling Deliverable
  Cleaning Deliverable #1: Error Event Table
  Cleaning Deliverable #2: Audit Dimension
  Audit Dimension Fine Points
  Part 3: Screens and Their Measurements
  Anomaly Detection Phase
  Types of Enforcement
  Column Property Enforcement
  Structure Enforcement
  Data and Value Rule Enforcement
  Measurements Driving Screen Design
  Overall Process Flow
  The Show Must Go On - Usually
  Screens
