CDH4 Installation Guide
Cloudera, Inc.
220 Portage Avenue
Palo Alto, CA 94306
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Important Notice
© 2010-2013 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service names or
slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior
written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other
trademarks, registered trademarks, product names and company names or logos mentioned in this
document are the property of their respective owners. Reference to any products, services, processes or
other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute
or imply endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights
under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Cloudera, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The information in this document is subject to change without notice. Cloudera shall not be liable for
any damages resulting from technical errors or omissions which may be present in this document, or
from use of this document.
Version: CDH4.2
Date: February 27, 2013
Contents
ABOUT THIS GUIDE
WHAT'S NEW IN CDH4
BEFORE YOU INSTALL CDH4 ON A CLUSTER
CDH4 INSTALLATION
  CDH4 AND MAPREDUCE
    MapReduce 2.0 (YARN)
  WAYS TO INSTALL CDH4
    How Packaging Affects CDH4 Deployment
  BEFORE YOU BEGIN INSTALLING CDH4 MANUALLY
  INSTALLING CDH4
    Step 1: Add or Build the CDH4 Repository or Download the "1-click Install" package.
    Step 1a: Optionally Add a Repository Key
    Step 2: Install CDH4 with MRv1
    Step 3: Install CDH4 with YARN
    Step 4: Deploy CDH and Install Components
  INSTALLING CDH4 COMPONENTS
  VIEWING THE APACHE HADOOP DOCUMENTATION
INSTALLING AN EARLIER CDH4 RELEASE
  DOWNLOADING AND INSTALLING AN EARLIER RELEASE
    On Red Hat-compatible systems
    On SLES systems
    On Ubuntu and Debian systems
UPGRADING FROM CDH3 TO CDH4
  CDH4 AND MAPREDUCE
    MapReduce 2.0 (YARN)
    High Availability
  BEFORE YOU BEGIN
    Plan Downtime
    Considerations for Secure Clusters
  UPGRADING TO CDH4
    Step 1: Back Up Configuration Data and Uninstall Components
    Step 2: Back up the HDFS Metadata
    Step 3: Copy the Hadoop Configuration to the Correct Location and Update Alternatives
    Step 4: Uninstall CDH3 Hadoop
    Step 5: Download CDH4
    Step 6a: Install CDH4 with MRv1
    Step 6b: Install CDH4 with YARN
    Step 7: Copy the CDH4 Logging File
    Step 7a: (Secure Clusters Only) Set Variables for Secure DataNodes
    Step 8: Upgrade the HDFS Metadata
    Step 9: Create the HDFS /tmp Directory
    Step 10: Start MapReduce (MRv1) or YARN
    Step 11: Set the Sticky Bit
    Step 12: Re-Install CDH4 Components
    Step 13: Apply Configuration File Changes
    Step 14: Finalize the HDFS Metadata Upgrade
MIGRATING DATA BETWEEN A CDH3 AND CDH4 CLUSTER
  REQUIREMENTS
  USING DISTCP TO MIGRATE DATA BETWEEN TWO CLUSTERS
    The DistCp Command
  POST-MIGRATION VERIFICATION
UPGRADING FROM AN EARLIER CDH4 RELEASE
  BEFORE YOU BEGIN
  UPGRADING TO THE LATEST VERSION OF CDH4
    Step 1: Prepare the cluster for the upgrade.
    Step 2: Download the CDH4 package on each of the hosts in your cluster.
    Step 3: Upgrade the packages on the appropriate hosts.
    Step 4: Upgrade the HDFS Metadata (Beta 1 or earlier)
    Step 5: Start HDFS (Beta 2 or later)
    Step 5a: Verify that /tmp Exists and Has the Right Permissions
    Step 6: Start MapReduce (MRv1) or YARN
    Step 7: Set the Sticky Bit
    Step 8: Upgrade Components to CDH4
    Step 9: Apply Configuration File Changes
    Step 10: Finalize the HDFS Metadata Upgrade (Beta 1 or earlier)
CONFIGURING PORTS FOR CDH4
  PORTS USED BY COMPONENTS OF CDH4
  PORTS USED BY THIRD PARTIES
DEPLOYING CDH4 IN PSEUDO-DISTRIBUTED MODE
DEPLOYING CDH4 ON A CLUSTER
  CONFIGURING NETWORK NAMES
  DEPLOYING HDFS ON A CLUSTER
    Copying the Hadoop Configuration
    Customizing Configuration Files
    Configuring Local Storage Directories
    Configuring DataNodes to Tolerate Local Storage Directory Failure
    Formatting the NameNode
    Configuring a Remote NameNode Storage Directory
    Configuring the Secondary NameNode
    Enabling Trash
    Enabling WebHDFS
  DEPLOYING MAPREDUCE V1 (MRV1) ON A CLUSTER
    Step 1: Configuring Properties for MRv1 Clusters
    Step 2: Configure Local Storage Directories for Use by MRv1 Daemons
    Step 3: Configure a Health Check Script for DataNode Processes
    Step 4: Configure JobTracker Recovery
    Enabling JobTracker Recovery
    Step 5: Deploy your Custom Configuration to your Entire Cluster
    Step 6: Start HDFS on Every Node in the Cluster
    Step 7: Create the HDFS /tmp Directory
    Step 8: Create MapReduce /var directories
    Step 9: Verify the HDFS File Structure
    Step 10: Create and Configure the mapred.system.dir Directory in HDFS
    Step 11: Start MapReduce
    Step 12: Create a Home Directory for each MapReduce User