Table of Contents
1. Who is this book for? ...................................................................................................... 1
1.1. About "Hadoop Illuminated" ................................................................................... 1
2. About the Authors ............................................................................................................ 2
3. Big Data ....................................................................................................................... 5
3.1. What is Big Data? ................................................................................................ 5
3.2. Human Generated Data and Machine Generated Data .................................................. 5
3.3. Where does Big Data come from? ........................................................................... 5
3.4. Examples of Big Data in the Real World ................................................................... 6
3.5. Challenges of Big Data ......................................................................................... 7
3.6. Taming Big Data .................................................................................................. 8
4. Hadoop and Big Data ...................................................................................................... 9
4.1. How Hadoop solves the Big Data problem ................................................................ 9
4.2. Business Case for Hadoop .................................................................................... 10
5. Hadoop for Executives ................................................................................................... 12
6. Hadoop for Developers .................................................................................................. 14
7. Soft Introduction to Hadoop ............................................................................................ 16
7.1. Hadoop = HDFS + MapReduce ............................................................................. 16
7.2. Why Hadoop? .................................................................................................... 16
7.3. Meet the Hadoop Zoo .......................................................................................... 18
7.4. Hadoop alternatives ............................................................................................. 21
7.5. Alternatives for distributed massive computations ..................................................... 22
7.6. Arguments for Hadoop ........................................................................................ 23
8. Hadoop Distributed File System (HDFS) -- Introduction ....................................................... 24
8.1. HDFS Concepts .................................................................................................. 24
8.2. HDFS Architecture ............................................................................................. 27
9. Introduction To MapReduce ............................................................................................ 30
9.1. How I failed at designing distributed processing ....................................................... 30
9.2. How MapReduce does it ...................................................................................... 31
9.3. How MapReduce really does it ............................................................................. 31
9.4. Understanding Mappers and Reducers .................................................................... 32
9.5. Who invented this? ............................................................................................. 34
9.6. The benefits of MapReduce programming ............................................................... 34
10. Hadoop Use Cases and Case Studies ............................................................................... 35
10.1. Politics ............................................................................................................ 35
10.2. Data Storage .................................................................................................... 35
10.3. Financial Services ............................................................................................. 35
10.4. Health Care ...................................................................................................... 36
10.5. Human Sciences ............................................................................................... 37
10.6. Telecoms ......................................................................................................... 37
10.7. Travel ............................................................................................................. 38
10.8. Energy ............................................................................................................ 38
10.9. Logistics .......................................................................................................... 39
10.10. Retail ............................................................................................................ 40
10.11. Software / Software as a Service (SaaS) / Platforms / Cloud .................................. 40
10.12. Imaging / Videos ............................................................................................. 41
10.13. Online Publishing, Personalized Content .............................................................. 42
11. Hadoop Distributions ................................................................................................... 44
11.1. The Case for Distributions .................................................................................. 44
11.2. Overview of Hadoop Distributions ....................................................................... 44
11.3. Hadoop in the Cloud ......................................................................................... 45
12. Big Data Ecosystem ..................................................................................................... 47