Cloudera Administration
Important Notice
© 2010-2016 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service
names or slogans contained in this document are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part,
without the prior written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and
company names or logos mentioned in this document are the property of their
respective owners. Reference to any products, services, processes or other
information, by trade name, trademark, manufacturer, supplier or otherwise does
not constitute or imply endorsement, sponsorship or recommendation thereof by
us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced,
stored in or introduced into a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopying, recording, or otherwise), or for any
purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Cloudera, the furnishing
of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property. For information about patents covering
Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera
shall not be liable for any damages resulting from technical errors or omissions
which may be present in this document, or from use of this document.
Cloudera, Inc.
1001 Page Mill Road, Bldg 3
Palo Alto, CA 94304
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Release Information
Version: 5.7.x
Date: April 6, 2016
Table of Contents
About Cloudera Administration
Managing CDH and Managed Services
  Managing CDH and Managed Services Using Cloudera Manager
    Configuration Overview
    Managing Clusters
    Managing Services
    Managing Roles
    Managing Hosts
    Maintenance Mode
  Managing CDH Using the Command Line
    Starting CDH Services Using the Command Line
    Stopping CDH Services Using the Command Line
    Migrating Data between a CDH 4 and CDH 5 Cluster
  Managing Individual Services
    Managing Flume
    Managing HBase
    Managing HDFS
    Managing Hive
    Managing Hue
    Managing Impala
    Managing Key-Value Store Indexer
    Managing Oozie
    Managing Solr
    Managing Spark
    Managing the Sqoop 1 Client
    Managing Sqoop 2
    Managing YARN (MRv2) and MapReduce (MRv1)
    Managing ZooKeeper
    Configuring Services to Use the GPL Extras Parcel
Performance Management
  Optimizing Performance in CDH
  Choosing a Data Compression Format
  Tuning Hive
  Tuning Hive on Spark
    YARN Configuration
    Spark Configuration
    Hive Configuration
  Tuning the Solr Server
    Tuning to Complete During Setup
    General Tuning
    Other Resources
  Tuning Spark Applications
  Tuning YARN
    Overview
    Cluster Configuration
    YARN Configuration
    MapReduce Configuration
    Step 7: MapReduce Configuration
    Step 7A: MapReduce Sanity Checking
    Configuring Your Cluster In Cloudera Manager
Resource Management
  Cloudera Manager Resource Management Features
  Static Service Pools
    Linux Control Groups (cgroups)
  Dynamic Resource Pools
    Managing Dynamic Resource Pools
    YARN Pool Status and Configuration Options
    Defining Resource Allocations with Configuration Sets
    Configuring Configuration Set Schedules
    Assigning Applications and Queries to Resource Pools
  YARN (MRv2) and MapReduce (MRv1) Schedulers
    Configuring the Fair Scheduler
    Enabling and Disabling Fair Scheduler Preemption
  Resource Management for Impala
    Controlling Resource Estimation Behavior
    Checking Resource Estimates and Actual Usage
    How Resource Limits Are Enforced
    Enabling Resource Management for Impala
    Limitations of Resource Management for Impala
    Admission Control and Query Queuing
    Managing Impala Admission Control
  Cluster Utilization Reports
    Configuring the Cluster Utilization Report
    Using the Cluster Utilization Report to Manage Resources
High Availability
  HDFS High Availability
    Introduction to HDFS High Availability
    Configuring Hardware for HDFS HA
    Enabling HDFS HA
    Disabling and Redeploying HDFS HA
    Configuring Other CDH Components to Use HDFS HA
    Administering an HDFS High Availability Cluster
    Changing a Nameservice Name for Highly Available HDFS Using Cloudera Manager
  MapReduce (MRv1) and YARN (MRv2) High Availability
    YARN (MRv2) ResourceManager High Availability
    Work Preserving Recovery for YARN Components
    MapReduce (MRv1) JobTracker High Availability
  Cloudera Navigator Key Trustee Server High Availability
    Configuring Key Trustee Server High Availability Using Cloudera Manager
    Configuring Key Trustee Server High Availability Using the Command Line
    Recovering a Key Trustee Server
  Key Trustee KMS High Availability
  High Availability for Other CDH Components
    HBase High Availability
    Hue High Availability
    Hive Metastore High Availability
    Configuring Oozie for High Availability
    Search High Availability
  Configuring Cloudera Manager for High Availability With a Load Balancer
    Introduction to Cloudera Manager Deployment Architecture
    Prerequisites for Setting up Cloudera Manager High Availability
    Cloudera Manager Failover Protection
    High-Level Steps to Configure Cloudera Manager High Availability
    Database High Availability Configuration
    TLS and Kerberos Configuration for Cloudera Manager High Availability
Backup and Disaster Recovery
  Port Requirements for Backup and Disaster Recovery
  Data Replication
    Designating a Replication Source
    HDFS Replication
    Hive Replication
    Impala Metadata Replication
    Using Snapshots with Replication
    Enabling Replication Between Clusters in Different Kerberos Realms