elasticsearch-hadoop-8.5.3.zip resource - CSDN Library

21 files in total

jar: 18

txt: 2

md: 1


elasticsearch

hadoop

151 views, uploaded 2023-01-24 21:35:49, 14.78MB ZIP


elasticsearch-hadoop-8.5.3.zip (21 files)

elasticsearch-hadoop-8.5.3

LICENSE.txt 11KB

NOTICE.txt 69KB

dist

elasticsearch-spark-20_2.11-8.5.3-sources.jar 513KB

elasticsearch-hadoop-hive-8.5.3-sources.jar 459KB

elasticsearch-hadoop-pig-8.5.3-javadoc.jar 369KB

elasticsearch-hadoop-mr-8.5.3-javadoc.jar 329KB

elasticsearch-storm-8.5.3.jar 1.79MB

elasticsearch-hadoop-8.5.3.jar 2.17MB

elasticsearch-hadoop-pig-8.5.3.jar 1.79MB

elasticsearch-storm-8.5.3-sources.jar 452KB

elasticsearch-hadoop-8.5.3-sources.jar 571KB

elasticsearch-spark-20_2.11-8.5.3.jar 2.08MB

elasticsearch-hadoop-hive-8.5.3-javadoc.jar 375KB

elasticsearch-hadoop-mr-8.5.3-sources.jar 436KB

elasticsearch-hadoop-pig-8.5.3-sources.jar 451KB

elasticsearch-hadoop-hive-8.5.3.jar 1.8MB

elasticsearch-hadoop-mr-8.5.3.jar 1.76MB

elasticsearch-hadoop-8.5.3-javadoc.jar 487KB

elasticsearch-spark-20_2.11-8.5.3-javadoc.jar 348KB

elasticsearch-storm-8.5.3-javadoc.jar 384KB

README.md 14KB

# Elasticsearch Hadoop [![Build Status](https://travis-ci.org/elastic/elasticsearch-hadoop.svg?branch=master)](https://travis-ci.org/elastic/elasticsearch-hadoop)

Elasticsearch real-time search and analytics natively integrated with Hadoop.
Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), [Apache Pig](#apache-pig), [Apache Spark](#apache-spark) and [Apache Storm](#apache-storm).

See [project page](http://www.elastic.co/products/hadoop/) and [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for detailed information.

## Requirements

Elasticsearch (__1.x__ or higher (2.x _highly_ recommended)) cluster accessible through [REST][]. That's it!
Significant effort has been invested to create a small, dependency-free, self-contained jar that can be downloaded and put to use without any dependencies. Simply make it available to your job classpath and you're set.
For the requirements of a specific library, see the dedicated [chapter](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/requirements.html).

- ES-Hadoop 6.x and higher are compatible with Elasticsearch __1.X__, __2.X__, __5.X__, and __6.X__
- ES-Hadoop 5.x and higher are compatible with Elasticsearch __1.X__, __2.X__ and __5.X__
- ES-Hadoop 2.2.x and higher are compatible with Elasticsearch __1.X__ and __2.X__
- ES-Hadoop 2.0.x and 2.1.x are compatible with Elasticsearch __1.X__ *only*

## Installation

### Stable Release (currently `8.5.2`)

Available through any Maven-compatible tool:

```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>8.5.2</version>
</dependency>
```

or as a stand-alone [ZIP](http://www.elastic.co/downloads/hadoop).

### Development Snapshot

Grab the latest nightly build from the [repository](http://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/), again through Maven:

```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>8.5.3-SNAPSHOT</version>
</dependency>
```

```xml
<repositories>
  <repository>
    <id>sonatype-oss</id>
    <url>http://oss.sonatype.org/content/repositories/snapshots</url>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>
```

or [build](#building-the-source) the project yourself.

We do build and test the code on _each_ commit.

### Supported Hadoop Versions

Running against Hadoop 1.x is deprecated in 5.5 and will no longer be tested against in 6.0.
ES-Hadoop is developed for and tested against Hadoop 2.x and YARN.
More information in this [section](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html).

## Feedback / Q&A

We're interested in your feedback! You can find us on the User [mailing list](https://groups.google.com/forum/?fromgroups#!forum/elasticsearch) - please append `[Hadoop]` to the post subject to filter it out. For more details, see the [community](http://www.elastic.co/community) page.

## Online Documentation

The latest reference documentation is available online on the project [home page](http://www.elastic.co/guide/en/elasticsearch/hadoop/index.html). Below, the README contains _basic_ usage instructions at a glance.

## Usage

### Configuration Properties

All configuration properties start with the `es` prefix. Note that the `es.internal` namespace is reserved for the library's internal use and should _not_ be used by the user at any point.
The properties are read mainly from the Hadoop configuration, but the user can specify (some of) them directly depending on the library used.

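As a rough illustration of these rules, together with the required property and the defaults listed in the next two subsections, the following sketch resolves a set of user-supplied settings using only the Java standard library. The class `EsSettingsSketch` and its method `withDefaults` are hypothetical names invented for this example; they are not part of ES-Hadoop, which performs its own configuration handling internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is NOT part of ES-Hadoop.
class EsSettingsSketch {

    // Returns a copy of the user settings with the README's documented defaults filled in.
    static Map<String, String> withDefaults(Map<String, String> user) {
        Map<String, String> resolved = new LinkedHashMap<>();
        // Defaults as stated in the README's "Essential" list.
        resolved.put("es.query", "{\"query\":{\"match_all\":{}}}");
        resolved.put("es.nodes", "localhost");
        resolved.put("es.port", "9200");

        for (Map.Entry<String, String> entry : user.entrySet()) {
            String key = entry.getKey();
            // The es.internal namespace is reserved for the library itself.
            if (key.startsWith("es.internal")) {
                throw new IllegalArgumentException("reserved namespace: " + key);
            }
            // User-supplied values override the defaults.
            resolved.put(key, entry.getValue());
        }

        // es.resource is the only property the README marks as required.
        if (!resolved.containsKey("es.resource")) {
            throw new IllegalArgumentException("es.resource is required");
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("es.resource", "radio/artists");
        user.put("es.query", "?q=me*");
        withDefaults(user).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real job the resolved entries would be copied onto the Hadoop `Configuration` or `JobConf` with `conf.set(key, value)`, as the Map/Reduce examples in this README do.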
### Required

```
es.resource=<ES resource location, relative to the host/port specified above>
```

### Essential

```
es.query=<uri or query dsl query>              # defaults to {"query":{"match_all":{}}}
es.nodes=<ES host address>                     # defaults to localhost
es.port=<ES REST port>                         # defaults to 9200
```

The full list is available [here](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html).

## [Map/Reduce][]

For basic, low-level or performance-sensitive environments, ES-Hadoop provides dedicated `InputFormat` and `OutputFormat` that read and write data to Elasticsearch. To use them, add the `es-hadoop` jar to your job classpath, either by bundling the library along (it's ~300kB and has no dependencies), by using the [DistributedCache][], or by provisioning the cluster manually.
See the [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for more information.

Note that es-hadoop supports both the so-called 'old' and the 'new' API through its `EsInputFormat` and `EsOutputFormat` classes.

### 'Old' (`org.apache.hadoop.mapred`) API

### Reading

To read data from ES, configure the `EsInputFormat` on your job configuration along with the relevant [properties](#configuration-properties):

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.mr.EsInputFormat;

JobConf conf = new JobConf();
conf.setInputFormat(EsInputFormat.class);
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*");             // replace this with the relevant query
...
JobClient.runJob(conf);
```

### Writing

The same configuration template can be used for writing, but with `EsOutputFormat`:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

JobConf conf = new JobConf();
conf.setOutputFormat(EsOutputFormat.class);
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
...
JobClient.runJob(conf);
```

### 'New' (`org.apache.hadoop.mapreduce`) API

### Reading

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsInputFormat;

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*");             // replace this with the relevant query
Job job = Job.getInstance(conf);            // replaces the deprecated `new Job(conf)`
job.setInputFormatClass(EsInputFormat.class);
...
job.waitForCompletion(true);
```

### Writing

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
Job job = Job.getInstance(conf);          // replaces the deprecated `new Job(conf)`
job.setOutputFormatClass(EsOutputFormat.class);
...
job.waitForCompletion(true);
```

## [Apache Hive][]

ES-Hadoop provides a Hive storage handler for Elasticsearch, meaning one can define an [external table][] on top of ES.

Add `es-hadoop-<version>.jar` to `hive.aux.jars.path` or register it manually in your Hive script (recommended):

```
ADD JAR /path_to_jar/es-hadoop-<version>.jar;
```

### Reading

To read data from ES, define a table backed by the desired index:

```SQL
CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists', 'es.query' = '?q=me*');
```

The fields defined in the table are mapped to the JSON when communicating with Elasticsearch. Notice the use of `TBLPROPERTIES` to define the location, that is the query used for reading from this table.

Once defined, the table can be used just like any other:

```SQL
SELECT * FROM artists;
```

### Writing

To write data, a similar definition is used but with a different `es.resource`:

```SQL
CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists');
```

Any data passed to the table is then passed down to Elasticsearch; for example, considering a table `source` (aliased as `s`), mapped to a TSV/CSV file, one can index it to Elasticsearch like this:

```SQL
INSERT OVERWRITE TABLE artists
    SELECT NULL, s.name, named_struct('url', s.url, 'picture', s.picture) FROM source s;
```

As one can note, currently the reading and writing are treated separately, but we're working on unifying the two and automatically translating [HiveQL][] to Elasticsearch queries.

## [Apache Pig][]

ES-Hadoop provides both read and write functions for Pig so you can access Elasticsearch from Pig scripts.

Register ES-Hadoop jar into your script or add it to your Pig cla
