Jan Kunigk, Ian Buss, Paul Wilkinson & Lars George
Architecting Modern Data Platforms
A GUIDE TO ENTERPRISE HADOOP AT SCALE
Beijing Boston Farnham Sebastopol Tokyo
978-1-491-96927-4
[LSI]
Architecting Modern Data Platforms
by Jan Kunigk, Ian Buss, Paul Wilkinson, and Lars George
Copyright © 2019 Jan Kunigk, Lars George, Ian Buss, and Paul Wilkinson. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our
corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editors: Nicole Tache and Michele Cronin
Production Editors: Nicholas Adams and Kristen Brown
Copyeditor: Shannon Wright
Proofreader: Rachel Head
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
December 2018: First Edition
Revision History for the First Edition
2018-12-05: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491969274 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Architecting Modern Data Platforms,
the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
Table of Contents
Foreword xiii
Preface xvii
1. Big Data Technology Primer 1
A Tour of the Landscape 3
Core Components 5
Computational Frameworks 10
Analytical SQL Engines 14
Storage Engines 18
Ingestion 24
Orchestration 25
Summary 26
Part I. Infrastructure
2. Clusters 31
Reasons for Multiple Clusters 31
Multiple Clusters for Resiliency 31
Multiple Clusters for Software Development 32
Multiple Clusters for Workload Isolation 33
Multiple Clusters for Legal Separation 34
Multiple Clusters and Independent Storage and Compute 35
Multitenancy 35
Requirements for Multitenancy 36
Sizing Clusters 37
Sizing by Storage 38
Sizing by Ingest Rate 40
Sizing by Workload 41
Cluster Growth 41
The Drivers of Cluster Growth 42
Implementing Cluster Growth 42
Data Replication 43
Replication for Software Development 43
Replication and Workload Isolation 43
Summary 44
3. Compute and Storage 45
Computer Architecture for Hadoop 46
Commodity Servers 46
Server CPUs and RAM 48
Nonuniform Memory Access 50
CPU Specifications 54
RAM 55
Commoditized Storage Meets the Enterprise 55
Modularity of Compute and Storage 57
Everything Is Java 57
Replication or Erasure Coding? 57
Alternatives 58
Hadoop and the Linux Storage Stack 58
User Space 58
Important System Calls 61
The Linux Page Cache 62
Short-Circuit and Zero-Copy Reads 65
Filesystems 69
Erasure Coding Versus Replication 71
Discussion 76
Guidance 79
Low-Level Storage 81
Storage Controllers 81
Disk Layer 84
Server Form Factors 91
Form Factor Comparison 94
Guidance 95
Workload Profiles 96
Cluster Configurations and Node Types 97
Master Nodes 98
Worker Nodes 99
Utility Nodes 100