The Hadoop Distributed File System:
Architecture and Design
by Dhruba Borthakur
Table of contents
1 Introduction
2 Assumptions and Goals
2.1 Hardware Failure
2.2 Streaming Data Access
2.3 Large Data Sets
2.4 Simple Coherency Model
2.5 “Moving Computation is Cheaper than Moving Data”
2.6 Portability Across Heterogeneous Hardware and Software Platforms
3 NameNode and DataNodes
4 The File System Namespace
5 Data Replication
5.1 Replica Placement: The First Baby Steps
5.2 Replica Selection
5.3 Safemode
6 The Persistence of File System Metadata
7 The Communication Protocols
8 Robustness
8.1 Data Disk Failure, Heartbeats and Re-Replication
8.2 Cluster Rebalancing
8.3 Data Integrity
Copyright © 2007 The Apache Software Foundation. All rights reserved.