The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data
Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold more than 150,000 copies. Delivers real-world solutions for the most time- and labor-intensive portion of data warehousing: data staging, or the extract, transform, load (ETL) process. Delineates best practices for extracting data from scattered sources, removing redundant and inaccurate data, transforming the remaining data into correctly formatted data structures, and then loading the end product into the data warehouse. Offers proven time-saving ETL techniques, comprehensive guidance on building dimensional structures, and crucial advice on ensuring data ...
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

Ralph Kimball
Joe Caserta

Wiley Publishing, Inc.

Published by Wiley Publishing, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com

Copyright © 2004 by Wiley Publishing, Inc. All rights reserved.

Published simultaneously in Canada

ISBN: 0-7645-7923-1

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, e-mail: email@example.com.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data

Kimball, Ralph.
The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data / Ralph Kimball, Joe Caserta.
p. cm.
Includes index.
ISBN 0-7645-7923-1
1. Data warehousing. 2. Database design. I. Caserta, Joe, 1965- II. Title.
QA76.9.D37K53 2004
005.74-dc22
2004016909

Trademarks: Wiley, the Wiley Publishing logo, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates. All other trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.

Credits

Vice President and Executive Group Publisher: Richard Swadley
Vice President and Publisher: Joseph B. Wikert
Executive Editorial Director: Mary Bednarek
Executive Editor: Robert Elliott
Editorial Manager: Kathryn A. Malm
Development Editor: Adaobi Obi Tulton
Production Editor: Pamela Hanley
Media Development Specialist: Travis Silvers
Text Design & Composition: TechBooks Composition Services

Contents

Acknowledgments
About the Authors
Introduction

Part I Requirements, Realities, and Architecture

Chapter 1 Surrounding the Requirements
  Requirements
  Business Needs
  Compliance Requirements
  Data Profiling
  Security Requirements
  Data Integration
  Data Latency
  Archiving and Lineage
  End User Delivery Interfaces
  Available Skills
  Legacy Licenses
  Architecture
  ETL Tool versus Hand Coding (Buy a Tool Suite or Roll Your Own?)
  The Back Room - Preparing the Data
  The Front Room - Data Access
  The Mission of the Data Warehouse
  What the Data Warehouse Is
  What the Data Wa...
  Industry Terms Not Used Consistently
  Resolving Architectural Conflict: A Hybrid Approach
  How the Data Warehouse Is Changing
  The Mission of the ETL Team

Chapter 2 ETL Data Structures
  To Stage or Not to Stage
  Designing the Staging Area
  Data Structures in the ETL System
  Flat Files
  XML Data Sets
  Relational Tables
  Independent DBMS Working Tables
  Third Normal Form Entity/Relation Models
  Nonrelational Data Sources
  Dimensional Data Models: The Handoff from the Back Room to the Front Room
  Fact Tables
  Dimension Tables
  Atomic and Aggregate Fact Tables
  Surrogate Key Mapping Tables
  Planning and Design Standards
  Impact Analysis
  Metadata Capture
  Naming Conventions
  Auditing Data Transformation Steps
  Summary

Part II Data Flow

Chapter 3 Extracting
  Part 1: The Logical Data Map
  Designing Logical Before Physical
  Inside the Logical Data Map
  Components of the Logical Data Map
  Using Tools for the Logical Data Map
  Building the Logical Data Map
  Data Discovery Phase
  Data Content Analysis
  Collecting Business Rules in the ETL Process
  Integrating Heterogeneous Data Sources
  Part 2: The Challenge of Extracting from Disparate Platforms
  Connecting to Diverse Sources through ODBC
  Mainframe Sources
  Working with COBOL Copybooks
  EBCDIC Character Set
  Converting EBCDIC to ASCII
  Transferring Data between Platforms
  Handling Mainframe Numeric Data
  Using PICtures
  Unpacking Packed Decimals
  Working with Redefined Fields
  Multiple OCCURS
  Managing Multiple Mainframe Record Type Files
  Handling Mainframe Variable Record Lengths
  Flat Files
  Processing Fixed Length Flat Files
  Processing Delimited Flat Files
  XML Sources
  Character Sets
  XML Meta Data
  Web Log Sources
  W3C Common and Extended Formats
  Name Value Pairs in Web Logs
  ERP System Sources
  Part 3: Extracting Changed Data
  Detecting Changes
  Extraction Tips
  Detecting Deleted or Overwritten Fact Records at the Source
  Summary

Chapter 4 Cleaning and Conforming
  Defining Data Quality
  Assumptions
  Part 1: Design Objectives
  Understand Your Key Constituencies
  Competing Factors
  Balancing Conflicting Priorities
  Formulate a Policy
  Part 2: Cleaning Deliverables
  Data Profiling Deliverable
  Cleaning Deliverable #1: Error Event Table
  Cleaning Deliverable #2: Audit Dimension
  Audit Dimension Fine Points
  Part 3: Screens and Their Measurements
  Anomaly Detection Phase
  Types of Enforcement
  Column Property Enforcement
  Structure Enforcement
  Data and Value Rule Enforcement
  Measurements Driving Screen Design
  Overall Process Flow
  The Show Must Go On - Usually
  Screens
The Data Warehouse ETL Toolkit 2018-08-03
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning 2011-04-06
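- The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delive 701 2008-11-20 Copyright notice: This is an original work. Reposting is permitted, provided that the original source, author information, and this notice are indicated with a hyperlink; otherwise, legal liability will be pursued. http://blog.csdn.net/topmvp - topmvp *Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold m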
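Data Warehouse Toolkit Collection, 3 in 1 (Data Warehouse Toolkit 3 in 1) 2009-05-24
This resource includes three data warehousing books by Ralph Kimball: ... "The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data". Together these are what we call the Data Warehouse Toolkit series.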
Data Visualization Toolkit is your hands-on, practical, and holistic guide to the art of visualizing data. You’ll learn how to use Rails, jQuery, D3, Leaflet, PostgreSQL, and PostGIS together, ...
Practical Web Scraping for Data Science - 2018 2018-04-19
Practical Web Scraping for Data Science: Best Practices and Examples with Python By: Seppe vanden Broucke, Bart Baesens ISBN-10: 1484235819 ISBN-13: 9781484235812 Edition: 1st...
*Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for ...
Modern libraries and data handling techniques mean you can collect, clean, process, store, visualize, and present web application data while enjoying the efficiency of a single-language pipeline and ...
Clojure and Elixir becoming part of important enterprise applications, functional data structures have gained an important place in the developer toolkit. Immutability is a cornerstone of functional ...
Machine Learning and Security: Protecting Systems with Data and Algorithms pdf 2018-06-25
Machine learning and security specialists Clarence Chio and David Freeman provide a framework for discussing the marriage of these two fields, as well as a toolkit of machine-learning algorithms that ...
Getting Started with C++ Audio Programming for Game Development 2015-09-15
this book gives a clear introduction to the concepts and practical application of audio programming using the FMOD library and toolkit. Overview Add audio to your game using FMOD and wrap it in ...
Includes practical real-world examples of techniques for implementation, such as building a text classification system to categorize news articles, analyzing app or game reviews using topic modeling ...
"Data Mining: Practical Machine Learning Techniques" (2nd edition of the original) [high-definition ebook, not scanned] 2010-06-23
Offering a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques, inside you'll find: + Algorithmic methods at the heart of successful data ...
It is practical to only retain the high signal-to-noise ratio (SNR) regions of the waveform, therefore there is also a need for a speech activity detector (SAD) in the front-end. After dropping the ...
Practical Mod Perl 2008-04-05
Practical Mod Perl Copyright Preface What You Need to Know Who This Book Is For How This Book Is Organized Reference Sections Filesystem Conventions Apache ...
Enterprise Development with Flex 2010-05-12
Part of the popular "Adobe Developer Library" co-published by O'Reilly and Adobe, "Enterprise Development with Flex" goes well beyond Flex tutorials and product documentation to suggest best practices...
Packed with examples, it will teach you text processing techniques and give you the skills to work with the most popular Python libraries for transforming text from one form to another. The book ...
Beginning Silverlight 5 in C#, 4th Edition 2012-07-11
Understand the fundamental concepts and techniques that lie at the heart of every successful Silverlight application and how to apply them to your own projects Explore the new features and coding ...
Deep Learning, Vol. 2: From Basics to Practice 2019-01-06
Deep learning is fast becoming part of the intellectual toolkit used by scientists, artists, executives, doctors, musicians, and anyone else who wants to discover the information hiding in their data,...
Python Text Processing with NLTK 2.0 Cookbook.pdf 2019-08-18
Toolkit (NLTK) suite of libraries has rapidly emerged as one of the most efficient tools for Natural Language Processing. You want to employ nothing less than the best techniques in Natural Language...