Required points/C coins: 9 · 2013-12-04 12:06:34 · 7.66 MB PDF

Hadoop: The Definitive Guide, English edition (the original, not a photo-reproduced reprint).
SECOND EDITION
Hadoop: The Definitive Guide
Tom White
Foreword by Doug Cutting
O'Reilly · Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo

Hadoop: The Definitive Guide, Second Edition
by Tom White

Copyright © 2011 Tom White. All rights reserved.
Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Mike Loukides
Indexer: Jay Book Services
Production Editor: Adam Zaremba
Cover Designer: Karen Montgomery
Proofreader: Diane Il Grande
Interior Designer: David Futato
Illustrator: Robert Ro

Printing History:
June 2009: First Edition.
October 2010: Second Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Hadoop: The Definitive Guide, the image of an African elephant, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-38973-4
SB 1285179414

For Eliane, Emilia, and Lottie

Table of Contents

Foreword
Preface
1. Meet Hadoop
  Data
  Data Storage and Analysis
  Comparison with Other Systems
    RDBMS
    Grid Computing
    Volunteer Computing
  A Brief History of Hadoop
  Apache Hadoop and the Hadoop Ecosystem
2. MapReduce
  A Weather Dataset
    Data Format
  Analyzing the Data with Unix Tools
  Analyzing the Data with Hadoop
    Map and Reduce
    Java MapReduce
  Scaling Out
    Data Flow
    Combiner Functions
    Running a Distributed MapReduce Job
  Hadoop Streaming
    Ruby
    Python
  Hadoop Pipes
    Compiling and Running
3. The Hadoop Distributed Filesystem
  The Design of HDFS
  HDFS Concepts
    Blocks
    Namenodes and Datanodes
  The Command-Line Interface
    Basic Filesystem Operations
  Hadoop Filesystems
    Interfaces
  The Java Interface
    Reading Data from a Hadoop URL
    Reading Data Using the FileSystem API
    Writing Data
    Directories
    Querying the Filesystem
    Deleting Data
  Data Flow
    Anatomy of a File Read
    Anatomy of a File Write
    Coherency Model
  Parallel Copying with distcp
    Keeping an HDFS Cluster Balanced
  Hadoop Archives
    Using Hadoop Archives
    Limitations
4. Hadoop I/O
  Data Integrity
    Data Integrity in HDFS
    LocalFileSystem
    ChecksumFileSystem
  Compression
    Compression and Input Splits
    Using Compression in MapReduce
  Serialization
    The Writable Interface
    Writable Classes
    Implementing a Custom Writable
    Serialization Frameworks
    Avro
  File-Based Data Structures
    SequenceFile
    MapFile
5. Developing a MapReduce Application
  The Configuration API
    Combining Resources
    Variable Expansion
  Configuring the Development Environment
    Managing Configuration
    GenericOptionsParser, Tool, and ToolRunner
  Writing a Unit Test
    Mapper
    Reducer
  Running Locally on Test Data
    Running a Job in a Local Job Runner
    Testing the Driver
  Running on a Cluster
    Packaging
    Launching a Job
    The MapReduce Web UI
    Retrieving the Results
    Debugging a Job
    Using a Remote Debugger
  Tuning a Job
    Profiling Tasks
  MapReduce Workflows
    Decomposing a Problem into MapReduce Jobs
    Running Dependent Jobs
6. How MapReduce Works
  Anatomy of a MapReduce Job Run
    Job Submission
    Job Initialization
    Task Assignment
    Task Execution
    Progress and Status Updates
    Job Completion
  Failures
    Task Failure
    Tasktracker Failure
    Jobtracker Failure
  Job Scheduling
    The Fair Scheduler
    The Capacity Scheduler
  Shuffle and Sort
    The Map Side
    The Reduce Side
    Configuration Tuning
  Task Execution
    Speculative Execution
    Task JVM Reuse
    Skipping Bad Records
    The Task Execution Environment
7. MapReduce Types and Formats
  MapReduce Types
    The Default MapReduce Job
  Input Formats
    Input Splits and Records
    Text Input
    Binary Input
    Multiple Inputs
    Database Input (and Output)
  Output Formats
    Text Output
    Binary Output
    Multiple Outputs
    Lazy Output
    Database Output
8. MapReduce Features
  Counters
    Built-in Counters
    User-Defined Java Counters
    User-Defined Streaming Counters
  Sorting
    Preparation
    Partial Sort
    Total Sort
    Secondary Sort
  Joins
    Map-Side Joins
    Reduce-Side Joins
  Side Data Distribution
    Using the Job Configuration
    Distributed Cache
  MapReduce Library Classes
