hive_performance_tuning-HDP3.1.0.pdf (CSDN Library resource)

Uploaded 2021-07-19 22:32:02, 805 KB, PDF

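Hive performance tuning is an important part of improving data warehouse performance in the Hadoop ecosystem. This document covers Apache Hive on Hortonworks Data Platform (HDP) 3.1.0 and describes in detail how to tune a Hive data warehouse to improve the relevance and performance of business intelligence and other applications. As workloads and database volume grow, tuning Hive and the background components that support Hive processing becomes increasingly important.

The document first introduces LLAP (Low-Latency Analytical Processing), a key technology for improving interactive query performance. With LLAP, Hive can meet low-latency, variably gauged benchmarks and respond in 15 seconds or less. To reach this level of performance, the document describes several preparation steps and configuration items, including enabling YARN preemption, setting up LLAP, configuring HiveServer Interactive for high availability, and configuring multiple HiveServer instances.

On monitoring, the document explains how to monitor Apache Hive and LLAP resource usage, which is essential for maintaining system performance and finding bottlenecks promptly. It emphasizes using the ORC (Optimized Row Columnar) file format to make the most of storage resources, and using advanced ORC properties and partitioning to improve performance further.

The document also discusses how to optimize the Hive data warehouse processing components through Tez execution engine properties. As the Hive execution engine, Tez supports more complex query plans and improves efficiency through parallel execution, which makes it especially suitable for processing large data sets.

For interactive queries, the document shows how to set up and run an interactive query, including how to use the HiveServer Interactive UI and how to connect a JDBC client to LLAP. Tuning the YARN queue configuration lets the batch processing queue and a custom LLAP queue share resources more sensibly, which in turn improves query performance.

Among the key components, the document highlights the query result cache and the metastore cache as important contributors to Hive data warehouse performance. Caching reduces repeated reads from HDFS and can noticeably shorten query response times.

To make full use of the cost-based optimizer (CBO), the document explains how to set up the CBO and how to generate and view Apache Hive statistics. By analyzing statistics, the CBO can produce better execution plans for queries and so improve query efficiency. The document also covers optimization and planning properties, which directly affect Hive query performance. Hive offers many optimization properties, and configuring them appropriately lets Hive process queries more intelligently, for example by avoiding unnecessary data scans and optimizing the order in which joins are executed.

Overall, the document is a comprehensive guide to Hive performance tuning, covering hardware resource allocation, query execution plan optimization, monitoring, and troubleshooting. It is aimed at developers and data engineers who want to improve Hive data warehouse performance. Understanding and applying this material can significantly improve the efficiency of a Hive data warehouse on large data sets and help meet enterprise requirements for high-performance interactive data analysis.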

Data Access 3

Apache Hive Performance Tuning

Date of Publish: 2018-08-23

http://docs.hortonworks.com

Contents

Optimizing an Apache Hive data warehouse
LLAP ports
Preparations for tuning performance
Setting up LLAP
    Enable YARN preemption
    Enable interactive query
    Set up multiple HiveServer Interactives for high availability
    Configure an llap queue
    Add a Hive proxy
    Configure other LLAP properties
    Configure the HiveServer heap size
    Save LLAP settings and restart services
    Run an interactive query
Use HiveServer Interactive UI
Connect a JDBC client to LLAP
Configuring YARN queues for Hive
    Configure a queue for batch processing
    Configure a custom LLAP queue
Set up multiple HiveServer instances
Key components of Hive warehouse processing
Query result cache and metastore cache
Tez execution engine properties
Monitoring Apache Hive performance
    Monitoring LLAP resources

Optimizing an Apache Hive data warehouse

You can tune your data warehouse infrastructure, components, and client connection parameters to improve the performance and relevance of business intelligence and other applications. Tuning Hive and the background components that support Hive processing is particularly important as your workload and database volume increase.

Increasingly, enterprises want to run SQL workloads that return faster results than batch processing can provide. These enterprises often want data analytics applications to support interactive queries. Hive low-latency analytical processing (LLAP) can improve the performance of interactive queries. A Hive interactive query that runs on the Hortonworks Data Platform (HDP) meets low-latency, variably gauged benchmarks, with Hive LLAP responding in 15 seconds or less. LLAP enables application development and IT infrastructure to run queries that return real-time or near-real-time results.

You can further enhance LLAP performance with real-time data by integrating the enterprise data warehouse (EDW) with the Druid business intelligence engine.

When you query large-scale EDW data sets, you have to meet service-level agreement (SLA) benchmarks or other performance expectations. Because how you tune your query processing environment depends on factors such as system resources, depth of data analysis, and query latency requirements, you must become familiar with Hive warehouse processing, prepare for tuning, and configure LLAP using parameters that meet your performance needs.

LLAP ports

You use port 10500 to make the JDBC connection through Beeline to query Hive through the HiveServer Interactive host. The LLAP daemon uses several other ports.

List of port properties

• HiveServer Interactive (LLAP) port (10500)

• hive.server2.thrift.http.port (10501)

• hive.llap.daemon.rpc.port (0)

• hive.llap.daemon.web.port (15002)

• hive.llap.daemon.yarn.shuffle.port (15551)

• hive.llap.management.rpc.port (15004)
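As a rough illustration of how the ports above are used, the following Python sketch builds a JDBC URL for HiveServer Interactive on port 10500 and the matching Beeline command line. The function names, the dictionary name, the host hsi.example.com, and the user alice are hypothetical placeholders, not values from this guide. The port numbers are copied from the list above, and the sketch assumes the standard jdbc:hive2 URL form with no authentication or transport options.

```python
from typing import List

# Port values copied from the list above; the dictionary name is illustrative.
LLAP_PORTS = {
    "HiveServer Interactive (LLAP)": 10500,
    "hive.server2.thrift.http.port": 10501,
    "hive.llap.daemon.rpc.port": 0,
    "hive.llap.daemon.web.port": 15002,
    "hive.llap.daemon.yarn.shuffle.port": 15551,
    "hive.llap.management.rpc.port": 15004,
}


def llap_jdbc_url(host: str, database: str = "default", port: int = 10500) -> str:
    """Build a basic jdbc:hive2 URL for the HiveServer Interactive host.

    Assumption: a plain binary-transport connection with no extra
    session or security parameters appended to the URL.
    """
    if not host:
        raise ValueError("host must be a non-empty string")
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host: str, user: str) -> List[str]:
    """Return the Beeline argument list (-u URL, -n user name)."""
    return ["beeline", "-u", llap_jdbc_url(host), "-n", user]


if __name__ == "__main__":
    # hsi.example.com and alice are placeholders for a real host and user.
    print(" ".join(beeline_command("hsi.example.com", "alice")))
```

Keeping the port in one place makes it easier to adjust if your cluster overrides the default HiveServer Interactive port.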

Preparations for tuning performance

Before you tune Apache Hive, you should follow best practices. These guidelines include how you configure the cluster, store data, and write queries.

Best practices

• Set up your cluster to use Apache Tez or the Hive on Tez execution engine.

In HDP 3.x, the MapReduce execution engine is replaced by Tez.

• Disable user impersonation by setting Run as end user to false in Ambari, which is equivalent to setting hive.server2.enable.doAs to false in hive-site.xml.

LLAP caches data for multiple queries and this capability does not support user impersonation.

• Add the Ranger security service to your cluster and dependent services.

• Set up LLAP to run interactive queries.

• Store data using the ORC File format.
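Two of the practices above map directly to hive-site.xml properties, so they can be checked mechanically. The sketch below is one possible way to do that in Python, assuming the usual Hadoop configuration layout of property elements with name and value children. The helper names, the EXPECTED table, and the sample XML are illustrative and not part of this guide. hive.server2.enable.doAs comes from the text above, and hive.execution.engine is the standard Hive property for selecting Tez.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Expected values derived from the best practices above (illustrative table).
EXPECTED = {
    "hive.execution.engine": "tez",
    "hive.server2.enable.doAs": "false",
}


def read_properties(hive_site_xml: str) -> Dict[str, str]:
    """Parse name/value pairs from hive-site.xml content."""
    root = ET.fromstring(hive_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_best_practices(hive_site_xml: str) -> List[str]:
    """Return one message per property that is missing or differs."""
    props = read_properties(hive_site_xml)
    problems = []
    for name, expected in EXPECTED.items():
        actual = props.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual.lower() != expected:
            problems.append(f"{name} is {actual} (expected {expected})")
    return problems


# Hypothetical sample configuration with impersonation still enabled.
SAMPLE = """<configuration>
  <property><name>hive.execution.engine</name><value>tez</value></property>
  <property><name>hive.server2.enable.doAs</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    for message in check_best_practices(SAMPLE):
        print(message)
```

A property that is absent from the file is reported as not set, even though the effective default on a given cluster may already be acceptable, so treat the output as a prompt to verify in Ambari rather than as a definitive result.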
