Table of Contents
Elasticsearch for Hadoop
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Setting Up Environment
Setting up Hadoop for Elasticsearch
Setting up Java
Setting up a dedicated user
Installing SSH and setting up the certificate
Downloading Hadoop
Setting up environment variables
Configuring Hadoop
Configuring core-site.xml
Configuring hdfs-site.xml
Configuring yarn-site.xml
Configuring mapred-site.xml
Formatting the distributed filesystem
Starting Hadoop daemons
Setting up Elasticsearch
Downloading Elasticsearch
Configuring Elasticsearch
Installing Elasticsearch's Head plugin
Installing the Marvel plugin
Running and testing
Running the WordCount example
Getting the examples and building the job JAR file
Importing the test file to HDFS
Running our first job
Exploring data in Head and Marvel
Viewing data in Head
Using the Marvel dashboard
Exploring the data in Sense
Summary
2. Getting Started with ES-Hadoop
Understanding the WordCount program
Understanding the mapper
Understanding the reducer
Understanding the driver
Using the old API – org.apache.hadoop.mapred
Going real – network monitoring data
Getting and understanding the data
Knowing the problems
Solution approaches
Approach 1 – Preaggregate the results
Approach 2 – Aggregate the results at query time
Writing the NetworkLogsMapper job
Writing the mapper class
Writing the driver
Building the job
Getting the data into HDFS
Running the job
Viewing the Top N results
Getting data from Elasticsearch to HDFS
Understanding the Twitter dataset
Trying it yourself
Creating the MapReduce job to import data from Elasticsearch to HDFS
Writing the Tweets2Hdfs mapper
Running the example
Testing the job execution output
Summary