Introducing YARN .................................................................................................................37
The structure of a YARN job .................................................................................................37
How MapReduce works in YARN .........................................................................................39
Scheduler implementations ..................................................................................................39
What is a resource? .............................................................................................................42
Monitoring jobs in YARN ......................................................................................................42
Summary ..............................................................................................................................44
Chapter 5 Hadoop Streaming ................................................................................................45
Introducing Hadoop Streaming .............................................................................................45
Streaming jobs in Hadoop ....................................................................................................45
The streaming interface ........................................................................................................46
Streaming word count in Python ...........................................................................................47
Streaming word count in .NET ..............................................................................................51
Interoperability with Hadoop Streaming ................................................................................55
Managing dependencies in Hadoop Streaming ....................................................................55
Performance impact of Hadoop Streaming ...........................................................................56
Summary ..............................................................................................................................57
Chapter 6 Inside the Cluster ..................................................................................................58
Introduction ..........................................................................................................................58
Sizing Hadoop—minimal cluster ...........................................................................................58
Sizing Hadoop—the small cluster .........................................................................................59
Sizing Hadoop—large clusters .............................................................................................61
Reliability in HDFS ...............................................................................................................61
Reliability in YARN ...............................................................................................................63
Cluster configuration ............................................................................................................63
core-site.xml .........................................................................................................................64