# Elasticsearch Hadoop [![Build Status](https://travis-ci.org/elastic/elasticsearch-hadoop.svg?branch=master)](https://travis-ci.org/elastic/elasticsearch-hadoop)
Elasticsearch real-time search and analytics natively integrated with Hadoop.
Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), [Apache Pig](#apache-pig), [Apache Spark](#apache-spark) and [Apache Storm](#apache-storm).
See [project page](http://www.elastic.co/products/hadoop/) and [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for detailed information.
## Requirements
Elasticsearch (__1.x__ or higher (2.x _highly_ recommended)) cluster accessible through [REST][]. That's it!
Significant effort has been invested to create a small, self-contained jar that can be downloaded and put to use without any dependencies. Simply make it available to your job classpath and you're set.
For the requirements of a specific library, see the dedicated [chapter](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/requirements.html).
ES-Hadoop 6.x and higher are compatible with Elasticsearch __1.X__, __2.X__, __5.X__, and __6.X__
ES-Hadoop 5.x and higher are compatible with Elasticsearch __1.X__, __2.X__ and __5.X__
ES-Hadoop 2.2.x and higher are compatible with Elasticsearch __1.X__ and __2.X__
ES-Hadoop 2.0.x and 2.1.x are compatible with Elasticsearch __1.X__ *only*
## Installation
### Stable Release (currently `8.5.2`)
Available through any Maven-compatible tool:
```xml
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop</artifactId>
<version>8.5.2</version>
</dependency>
```
or as a stand-alone [ZIP](http://www.elastic.co/downloads/hadoop).
### Development Snapshot
Grab the latest nightly build from the [repository](http://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/) again through Maven:
```xml
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop</artifactId>
<version>8.5.3-SNAPSHOT</version>
</dependency>
```
```xml
<repositories>
<repository>
<id>sonatype-oss</id>
<url>http://oss.sonatype.org/content/repositories/snapshots</url>
<snapshots><enabled>true</enabled></snapshots>
</repository>
</repositories>
```
or [build](#building-the-source) the project yourself.
We do build and test the code on _each_ commit.
### Supported Hadoop Versions
Running against Hadoop 1.x is deprecated in 5.5 and will no longer be tested against in 6.0.
ES-Hadoop is developed for and tested against Hadoop 2.x and YARN.
More information in this [section](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html).
## Feedback / Q&A
We're interested in your feedback! You can find us on the User [mailing list](https://groups.google.com/forum/?fromgroups#!forum/elasticsearch); please append `[Hadoop]` to the post subject so that it is easy to filter. For more details, see the [community](http://www.elastic.co/community) page.
## Online Documentation
The latest reference documentation is available online on the project [home page](http://www.elastic.co/guide/en/elasticsearch/hadoop/index.html). The rest of this README contains _basic_ usage instructions at a glance.
## Usage
### Configuration Properties
All configuration properties start with the `es` prefix. Note that the `es.internal` namespace is reserved for the library's internal use and should _not_ be used by the user at any point.
The properties are read mainly from the Hadoop configuration but the user can specify (some of) them directly depending on the library used.
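As a rough illustration of the naming rules above, the sketch below checks whether a property name is one a user may set. `EsPropertyNames` and the key `es.internal.example` are hypothetical names made up for this example; they are not part of ES-Hadoop, and the library performs its own handling of settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of ES-Hadoop: flags property names that
// fall outside the naming conventions described in this README.
final class EsPropertyNames {
    static final String PREFIX = "es.";
    static final String RESERVED = "es.internal";

    /** True if the key uses the `es.` prefix and avoids the reserved namespace. */
    static boolean isUserProperty(String key) {
        return key.startsWith(PREFIX)
                && !key.equals(RESERVED)
                && !key.startsWith(RESERVED + ".");
    }

    /** Returns the keys that a user should not be setting, in iteration order. */
    static List<String> rejected(Map<String, String> settings) {
        List<String> bad = new ArrayList<>();
        for (String key : settings.keySet()) {
            if (!isUserProperty(key)) {
                bad.add(key);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(isUserProperty("es.resource"));         // regular user property
        System.out.println(isUserProperty("es.internal.example")); // reserved namespace (made-up key)
    }
}
```

A check like this could run before the settings are copied into the Hadoop configuration, so that a misspelled or reserved key is caught early.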
### Required
```
es.resource=<ES resource location, relative to the host/port set through es.nodes and es.port>
```
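The sketch below shows one way to read an `es.resource` value, assuming it follows an `index/type` pattern as the `radio/artists` value used in the examples of this README suggests. `EsResource` is a hypothetical class written for illustration; ES-Hadoop parses the property itself and may accept forms this sketch does not cover.

```java
// Hypothetical helper, NOT part of ES-Hadoop: splits an es.resource value
// into its index and (optional) type, assuming an "index/type" layout.
final class EsResource {
    final String index;
    final String type; // null when the resource names only an index

    private EsResource(String index, String type) {
        this.index = index;
        this.type = type;
    }

    static EsResource parse(String resource) {
        if (resource == null || resource.trim().isEmpty()) {
            throw new IllegalArgumentException("es.resource is required");
        }
        String trimmed = resource.trim();
        int slash = trimmed.indexOf('/');
        if (slash < 0) {
            return new EsResource(trimmed, null);
        }
        String index = trimmed.substring(0, slash);
        String type = trimmed.substring(slash + 1);
        if (index.isEmpty() || type.isEmpty()) {
            throw new IllegalArgumentException("malformed es.resource: " + resource);
        }
        return new EsResource(index, type);
    }

    public static void main(String[] args) {
        EsResource r = parse("radio/artists");
        System.out.println("index=" + r.index + ", type=" + r.type);
    }
}
```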
### Essential
```
es.query=<uri or query dsl query> # defaults to {"query":{"match_all":{}}}
es.nodes=<ES host address> # defaults to localhost
es.port=<ES REST port> # defaults to 9200
```
The full list is available [here](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html).
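To make the split between required and defaulted properties concrete, here is a minimal sketch that fills in the defaults listed above and insists on `es.resource`. `EsDefaults` is a hypothetical helper for illustration only; ES-Hadoop applies its own defaults internally, so user code does not need to do this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of ES-Hadoop: mirrors the defaults listed
// in this README so the effective settings can be inspected.
final class EsDefaults {
    static Map<String, String> resolve(Map<String, String> user) {
        String resource = user.get("es.resource");
        if (resource == null || resource.trim().isEmpty()) {
            throw new IllegalArgumentException("es.resource must be set");
        }
        Map<String, String> resolved = new LinkedHashMap<>();
        // Defaults as documented in the "Essential" block.
        resolved.put("es.nodes", "localhost");
        resolved.put("es.port", "9200");
        resolved.put("es.query", "{\"query\":{\"match_all\":{}}}");
        // User-supplied values override the defaults.
        resolved.putAll(user);
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("es.resource", "radio/artists");
        user.put("es.query", "?q=me*");
        resolve(user).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```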
## [Map/Reduce][]
For basic, low-level or performance-sensitive environments, ES-Hadoop provides dedicated `InputFormat` and `OutputFormat` implementations that read data from and write data to Elasticsearch. To use them, add the `es-hadoop` jar to your job classpath,
either by bundling the library with your job (it's ~300kB and has no dependencies), by using the [DistributedCache][], or by provisioning the cluster manually.
See the [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for more information.
Note that es-hadoop supports both the so-called 'old' and the 'new' API through its `EsInputFormat` and `EsOutputFormat` classes.
### 'Old' (`org.apache.hadoop.mapred`) API
### Reading
To read data from ES, configure the `EsInputFormat` on your job configuration along with the relevant [properties](#configuration-properties):
```java
JobConf conf = new JobConf();
conf.setInputFormat(EsInputFormat.class);
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*"); // replace this with the relevant query
...
JobClient.runJob(conf);
```
### Writing
The same configuration template can be used for writing, but using `EsOutputFormat`:
```java
JobConf conf = new JobConf();
conf.setOutputFormat(EsOutputFormat.class);
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
...
JobClient.runJob(conf);
```
### 'New' (`org.apache.hadoop.mapreduce`) API
### Reading
```java
Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*"); // replace this with the relevant query
Job job = new Job(conf);
job.setInputFormatClass(EsInputFormat.class);
...
job.waitForCompletion(true);
```
### Writing
```java
Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
Job job = new Job(conf);
job.setOutputFormatClass(EsOutputFormat.class);
...
job.waitForCompletion(true);
```
## [Apache Hive][]
ES-Hadoop provides a Hive storage handler for Elasticsearch, meaning one can define an [external table][] on top of ES.
Add `es-hadoop-<version>.jar` to `hive.aux.jars.path` or register it manually in your Hive script (recommended):
```
ADD JAR /path_to_jar/es-hadoop-<version>.jar;
```
### Reading
To read data from ES, define a table backed by the desired index:
```SQL
CREATE EXTERNAL TABLE artists (
id BIGINT,
name STRING,
links STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists', 'es.query' = '?q=me*');
```
The fields defined in the table are mapped to the JSON when communicating with Elasticsearch. Notice the use of `TBLPROPERTIES` to define the location (`es.resource`) and the query (`es.query`) used for reading from this table.
Once defined, the table can be used just like any other:
```SQL
SELECT * FROM artists;
```
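To picture how a row of the `artists` table lines up with a JSON document, the sketch below builds the document shape by hand. It is only an illustration: `ArtistRowJson` and the sample values are hypothetical, the escaping is deliberately minimal, and the actual conversion is performed by the storage handler rather than by user code.

```java
// Hypothetical illustration, NOT part of ES-Hadoop: shows the JSON shape
// that corresponds to the columns (id, name, links STRUCT<url, picture>).
final class ArtistRowJson {
    /** Quotes a string, escaping only backslashes and double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** The STRUCT column becomes a nested JSON object. */
    static String toJson(long id, String name, String url, String picture) {
        return "{\"id\":" + id
                + ",\"name\":" + quote(name)
                + ",\"links\":{\"url\":" + quote(url)
                + ",\"picture\":" + quote(picture) + "}}";
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        System.out.println(toJson(1L, "Megadeth",
                "http://example.com/artist", "http://example.com/artist.png"));
    }
}
```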
### Writing
To write data, a similar definition is used; only `es.resource` is needed, since `es.query` applies to reads:
```SQL
CREATE EXTERNAL TABLE artists (
id BIGINT,
name STRING,
links STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists');
```
Any data passed to the table is then passed down to Elasticsearch; for example, considering a table `source` (aliased as `s`) mapped to a TSV/CSV file, one can index it to Elasticsearch like this:
```SQL
INSERT OVERWRITE TABLE artists
SELECT NULL, s.name, named_struct('url', s.url, 'picture', s.picture) FROM source s;
```
As one can note, currently the reading and writing are treated separately but we're working on unifying the two and automatically translating [HiveQL][] to Elasticsearch queries.
## [Apache Pig][]
ES-Hadoop provides both read and write functions for Pig so you can access Elasticsearch from Pig scripts.
Register ES-Hadoop jar into your script or add it to your Pig cla