Table of Contents
Cover
Title Page
Copyright
Dedication
About the Authors
Credits
Acknowledgments
Introduction
The Origins of Kettle
About This Book
How This Book Is Organized
Prerequisites
On the Website
Further Resources
Part I: Getting Started
Chapter 1: ETL Primer
OLTP versus Data Warehousing
What Is ETL?
ETL, ELT, and EII
Data Integration Challenges
ETL Tool Requirements
Summary
Chapter 2: Kettle Concepts
Design Principles
The Building Blocks of Kettle Design
Parameters and Variables
Visual Programming
Summary
Chapter 3: Installation and Configuration
Kettle Software Overview
Installation
Configuration
Summary
Chapter 4: An Example ETL Solution—Sakila
Sakila
Prerequisites and Some Basic Spoon Skills
The Sample ETL Solution
Summary
Part II: ETL
Chapter 5: ETL Subsystems
Introduction to the 34 Subsystems
Summary
Chapter 6: Data Extraction
Kettle Data Extraction Overview
Working with ERP and CRM Systems
Data Profiling
CDC: Change Data Capture
Delivering Data
Summary
Chapter 7: Cleansing and Conforming
Data Cleansing
Error Handling
Auditing Data and Process Quality
Deduplicating Data
Scripting
Summary
Chapter 8: Handling Dimension Tables
Managing Keys
Loading Dimension Tables
Slowly Changing Dimensions
More Dimensions
Summary
Chapter 9: Loading Fact Tables
Loading in Bulk
Dimension Lookups
Fact Table Handling
Summary
Chapter 10: Working with OLAP Data
OLAP Benefits and Challenges
Working with Mondrian
Working with XML/A Servers
Working with Palo
Summary
Part III: Management and Deployment
Chapter 11: ETL Development Lifecycle
Solution Design
Agile Development
Testing and Debugging
Documenting the Solution
Summary
Chapter 12: Scheduling and Monitoring
Scheduling
Monitoring
Summary
Chapter 13: Versioning and Migration
Version Control Systems
Kettle Metadata
Managing Repositories
Version Migration System
Summary
Chapter 14: Lineage and Auditing
Batch-Level Lineage Extraction
Lineage
Logging and Operational Metadata
Summary
Part IV: Performance and Scalability
Chapter 15: Performance Tuning
Transformation Performance: Finding the Weakest Link
Improving Transformation Performance
Improving Job Performance
Summary
Chapter 16: Parallelization, Clustering, and Partitioning
Multi-Threading
Using Carte as a Slave Server
Clustering Transformations
Partitioning
Summary
Chapter 17: Dynamic Clustering in the Cloud
Dynamic Clustering
Cloud Computing
EC2
Summary
Chapter 18: Real-Time Data Integration
Introduction to Real-Time ETL
Transformation Streaming
Summary
Part V: Advanced Topics
Chapter 19: Data Vault Management
Introduction to Data Vault Modeling
Do You Need a Data Vault?
Data Vault Building Blocks
Transforming Sakila to the Data Vault Model
Loading the Data Vault: A Sample ETL Solution
Updating a Data Mart from a Data Vault
Summary
Chapter 20: Handling Complex Data Formats