Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS, HBase, or the Amazon Simple Storage Service (S3). In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (the Impala query UI in Hue) as Apache Hive. This provides a familiar and unified platform for real-time or batch-oriented queries.

Impala is an addition to the tools available for querying big data. It does not replace batch processing frameworks built on MapReduce, such as Hive; those frameworks remain best suited to long-running batch jobs, such as Extract, Transform, and Load (ETL) workloads.

Note: Impala was accepted into the Apache incubator on December 2, 2015. Where the documentation formerly referred to "Cloudera Impala", the official name is now "Apache Impala (incubating)".

### Impala-2.8: Fast Interactive SQL Queries for Big Data

#### Introduction to Apache Impala (Incubating)

Apache Impala is a high-performance, distributed SQL query engine that enables fast, interactive SQL queries on data stored in Apache Hadoop's HDFS, HBase, or Amazon S3. It offers a familiar and unified platform for real-time or batch-oriented queries by using the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (the Impala query UI in Hue) as Apache Hive.

#### Key Benefits of Impala

- **Interactive Queries**: Impala is optimized for interactive use cases, allowing analysts and developers to query large datasets quickly.
- **Unified Platform**: By using the same metadata, SQL syntax, ODBC driver, and UI as Hive, it simplifies development and deployment.
- **Integration**: It integrates with the existing Hadoop ecosystem, including HDFS, HBase, and other components.

#### How Impala Works with Apache Hadoop

Impala complements the Hadoop ecosystem by providing a fast SQL interface that does not rely on the MapReduce framework. Instead, it uses a low-latency, massively parallel processing (MPP) architecture to execute queries directly on the data stored in HDFS, HBase, or S3.

#### Primary Features of Impala

- **High Performance**: Impala can execute queries significantly faster than traditional batch-oriented systems such as Hive.
- **Interactive Query Processing**: It supports ad-hoc queries, making it well suited to exploratory data analysis.
- **Scalability**: Impala scales out to handle large volumes of data across many nodes.

#### Impala Concepts and Architecture

- **Components of the Impala Server**:
  - **Impala Daemon (impalad)**: Runs on each node, accepts client connections, and plans, coordinates, and executes queries.
  - **Impala Statestore (statestored)**: Tracks the health and membership of the Impala daemons and broadcasts that state across the cluster.
  - **Impala Catalog Service (catalogd)**: Relays metadata changes made through Impala SQL statements to all nodes in the cluster.

#### Developing Impala Applications

- **Overview of the Impala SQL Dialect**: Impala supports a large subset of SQL, including standard constructs such as SELECT, FROM, WHERE, JOIN, and GROUP BY, along with functions and features specific to Impala (see the query sketch after this list).
- **Overview of Impala Programming Interfaces**: Impala can be accessed through JDBC and ODBC drivers, the impala-shell command-line interface, and the Hue web UI, enabling integration with a wide range of applications and tools.
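To make the dialect concrete, here is a minimal sketch of an interactive Impala query. The table and column names (`orders`, `customers`, and so on) are hypothetical and are assumed to already exist in the metastore.

```sql
-- Hypothetical tables: orders(customer_id, total, order_date) and customers(id, region).
-- A typical ad-hoc aggregation with a join, run from impala-shell or the Hue query UI.
SELECT c.region,
       COUNT(*)     AS order_count,
       SUM(o.total) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date >= '2016-01-01'
GROUP BY c.region
ORDER BY revenue DESC
LIMIT 10;
```

Because Impala executes the plan with its own MPP engine rather than MapReduce, a query like this typically returns in interactive time on data already stored in HDFS, HBase, or S3.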
#### How Impala Fits into the Hadoop Ecosystem

Impala coexists with other Hadoop components, such as Hive and Pig, offering a complementary solution for interactive analytics. While Hive is better suited to long-running batch jobs, Impala excels at delivering fast results for ad-hoc queries.

#### How Impala Works with Hive

- **Shared Metadata**: Impala and Hive share the same metadata through the Hive Metastore, ensuring consistency between the two systems.
- **SQL Syntax**: Impala uses the Hive SQL dialect, making it easy for users familiar with Hive to transition to Impala.

#### Overview of Impala Metadata and the Metastore

- **Metadata Management**: Impala relies on the Hive Metastore for metadata management; the metastore stores information about the structure and location of the data.
- **Schema Evolution**: The metastore supports schema evolution, enabling changes to table schemas without disrupting data access.

#### How Impala Uses HDFS and HBase

- **HDFS**: Impala reads and writes data files directly in HDFS, without intermediate processing stages.
- **HBase**: For HBase tables, Impala uses the HBase client API to access the data directly, so HBase data can be queried through SQL.

#### Planning for Impala Deployment

- **Requirements**:
  - **Supported Operating Systems**: Linux distributions such as CentOS, Red Hat Enterprise Linux, and Ubuntu.
  - **Hive Metastore and Related Configuration**: Impala requires a properly configured Hive Metastore to manage metadata.
  - **Java Dependencies**: A Java runtime environment is required.
  - **Networking Configuration**: The network must allow connectivity between the Impala daemons and the related services.
  - **Hardware Requirements**: Sufficient CPU, memory, and disk space for the size of the data and the expected query workload.
  - **User Account Requirements**: User accounts with appropriate permissions to access the data and run the Impala services.
- **Cluster Sizing Guidelines**:
  - Consider the amount of data, the number of concurrent users, and the expected query complexity when sizing the cluster.
  - Run the Impala daemons on the DataNodes so that queries benefit from data locality, and budget memory and CPU so that Impala does not contend with other Hadoop components.

#### Guidelines for Designing Impala Schemas

- **Columnar Storage**: Use a columnar storage format such as Parquet for improved query performance.
- **Partitioning**: Partition tables to improve query efficiency by reducing the amount of data scanned, as shown in the sketch below.
- **Schema Simplification**: Keep schemas simple and denormalized where possible to reduce the overhead of joins.
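The following is a minimal sketch of these guidelines in Impala SQL; the table names and partition columns (`sales`, `staging_sales`, `sale_year`, `sale_month`) are hypothetical.

```sql
-- Hypothetical partitioned table stored in the Parquet columnar format.
CREATE TABLE sales (
  item_id  BIGINT,
  quantity INT,
  price    DECIMAL(10,2)
)
PARTITIONED BY (sale_year SMALLINT, sale_month TINYINT)
STORED AS PARQUET;

-- Load one partition from a hypothetical staging table.
INSERT INTO sales PARTITION (sale_year = 2016, sale_month = 11)
SELECT item_id, quantity, price
FROM staging_sales
WHERE year = 2016 AND month = 11;

-- Filtering on the partition columns lets Impala scan only the matching
-- partitions (partition pruning) instead of the whole table.
SELECT sale_month, SUM(price * quantity) AS revenue
FROM sales
WHERE sale_year = 2016
GROUP BY sale_month
ORDER BY sale_month;
```

Parquet's columnar layout means a query that touches only a few columns reads only those columns, and partition pruning keeps scans limited to the partitions named in the WHERE clause.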
#### Installing Impala

- **What is Included in an Impala Installation**: The installation typically includes the Impala daemon, the statestore, the catalog service, and the necessary libraries.
- **Managing Impala**: Post-installation configuration, including setting up ODBC and JDBC connections, is important for performance and security.

#### Configuring Impala to Work with ODBC and JDBC

- **ODBC Configuration**: Requires installing and configuring the ODBC driver and defining a DSN (Data Source Name).
- **JDBC Configuration**:
  - **Configuring the JDBC Port**: Specify the port used for JDBC connections (21050 by default).
  - **Choosing the JDBC Driver**: Select a compatible JDBC driver version.
  - **Enabling JDBC Support on Client Systems**: Install the JDBC driver on each client system.
  - **Establishing JDBC Connections**: Configure the connection URL and credentials.

#### Upgrading Impala

- **Upgrade Process**: Follow the upgrade path recommended in the Impala documentation to ensure compatibility and minimize downtime.

#### Starting Impala

- **Starting Impala from the Command Line**: Start the statestore, catalog service, and Impala daemons, typically through the packaged service scripts (for example, `impala-state-store`, `impala-catalog`, and `impala-server`) or by running the daemon executables directly.
- **Modifying Impala Startup Options**: Customize the startup options for the different daemons (impalad, statestored, and catalogd).

#### Checking the Values of Impala Configuration Options

- **Common Startup Options**:
  - **For impalad**: Options such as the hostname, ports, and log directories.
  - **For statestored**: Options for the daemon that tracks cluster membership and coordination.
  - **For catalogd**: Options for the daemon that provides the metadata service.
- The values in effect for each daemon can be reviewed through its built-in web interface.

#### Impala Tutorials

- **Tutorials for Getting Started**:
  - **Explore a New Impala Instance**: Connect to Impala and run basic queries.
  - **Load CSV Data from Local Files**: Import data from CSV files into Impala.
  - **Point an Impala Table at Existing Data Files**: Link Impala tables to data already stored in HDFS.
  - **Describe the Impala Table**: View the schema of a table.
  - **Query the Impala Table**: Execute more complex queries and analyze the results.

#### Advanced Tutorials

- **Attaching an External Partitioned Table to an HDFS Directory Structure**: Create an external table and map its partitions to existing directories in HDFS.
- **Switching Back and Forth Between Impala and Hive**: Move between Impala and Hive for different tasks against the same tables (a combined sketch of both topics follows the conclusion below).

#### Conclusion

Apache Impala is a powerful tool for performing fast, interactive SQL queries on big data stored in Hadoop. Its integration with Hive and other Hadoop components makes it a valuable addition to an organization's data processing infrastructure. By understanding its architecture, features, and deployment guidelines, organizations can use Impala to gain insights from their data more efficiently and effectively.
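As a closing illustration of the two advanced tutorial topics, here is a minimal sketch; the HDFS path `/data/web_logs`, the table `web_logs`, and its columns are all hypothetical.

```sql
-- Hypothetical external, partitioned table over data files that already exist
-- in HDFS; Impala does not move or rewrite the underlying files.
CREATE EXTERNAL TABLE web_logs (
  ts      TIMESTAMP,
  user_id STRING,
  url     STRING
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/web_logs';

-- Attach one existing HDFS directory per partition value.
ALTER TABLE web_logs ADD PARTITION (log_date = '2016-11-01')
  LOCATION '/data/web_logs/2016-11-01';

-- When switching back from Hive: tables created or altered in Hive become
-- visible to Impala after INVALIDATE METADATA, and new data files added to
-- an existing table are picked up with REFRESH.
INVALIDATE METADATA web_logs;
REFRESH web_logs;

SELECT log_date, COUNT(*) AS hits
FROM web_logs
GROUP BY log_date
ORDER BY log_date;
```

Because both engines share the metastore, the same table can then be queried from Hive as well.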