Hadoop Map-Reduce Tutorial

Table of contents

1 Purpose
2 Pre-requisites
3 Overview
4 Inputs and Outputs
5 Example: WordCount v1.0
  5.1 Source Code
  5.2 Usage
  5.3 Walk-through
6 Map-Reduce - User Interfaces
  6.1 Payload
  6.2 Job Configuration
  6.3 Task Execution & Environment
  6.4 Job Submission and Monitoring
  6.5 Job Input
  6.6 Job Output
  6.7 Other Useful Features
7 Example: WordCount v2.0
  7.1 Source Code
  7.2 Sample Runs
  7.3 Highlights

1. Purpose

This document comprehensively describes all user-facing facets of the Hadoop Map-Reduce framework and serves as a tutorial.

2. Pre-requisites

Ensure that Hadoop is installed, configured, and running. More details:

• Hadoop Quickstart for first-time users.
• Hadoop Cluster Setup for large, distributed clusters.

3. Overview

Hadoop Map-Reduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A Map-Reduce job usually splits the input data-set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.
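
The map, sort, and reduce flow described above can be illustrated with a small in-memory sketch. This is not Hadoop code and uses no Hadoop API; the class `MiniMapReduce` and its methods are hypothetical names chosen for illustration, and a `TreeMap` stands in for the framework's sort-and-group step under the assumption that everything fits in one process.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of map -> sort -> reduce (word count).
class MiniMapReduce {

    // Map step: emit a (word, 1) pair for every whitespace-separated token in one chunk.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : chunk.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all values that were grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Drives the job: map each chunk independently, sort and group by key, then reduce.
    static Map<String, Integer> run(List<String> chunks) {
        // The TreeMap keeps keys sorted, standing in for the framework's sort of map outputs.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String chunk : chunks) {
            for (Map.Entry<String, Integer> pair : map(chunk)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        // Prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(run(chunks));
    }
}
```

In the real framework each chunk would be handled by a separate map task, possibly on a different node; the loop over `chunks` here only mimics that independence.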

Typically the compute nodes and the storage nodes are the same, that is, the Map-Reduce framework and the Distributed FileSystem are running on the same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where data is already present, resulting in very high aggregate bandwidth across the cluster.
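
The idea of scheduling a task where its data already lives can be sketched as a simple preference rule. The following is a hypothetical illustration, not the JobTracker's actual scheduling logic: `LocalitySketch`, `pickNode`, and the node names are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical locality-aware placement: prefer an idle node that already stores the data.
class LocalitySketch {

    // Returns an idle node holding the block if one exists, otherwise the first idle node,
    // or an empty Optional when no node is idle.
    static Optional<String> pickNode(Set<String> nodesHoldingBlock, List<String> idleNodes) {
        for (String node : idleNodes) {
            if (nodesHoldingBlock.contains(node)) {
                return Optional.of(node); // data-local: no network transfer of the input
            }
        }
        // Fallback: the task must read its input over the network.
        return idleNodes.isEmpty() ? Optional.empty() : Optional.of(idleNodes.get(0));
    }

    public static void main(String[] args) {
        Set<String> replicas = new HashSet<>(Arrays.asList("node2", "node3"));
        System.out.println(pickNode(replicas, Arrays.asList("node1", "node3")).get()); // node3
        System.out.println(pickNode(replicas, Arrays.asList("node1", "node4")).get()); // node1
    }
}
```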

The Map-Reduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them and re-executing the failed tasks. The slaves execute the tasks as directed by the master.
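
The master's "monitor and re-execute failed tasks" responsibility can be pictured as a queue of pending tasks that are re-queued on failure. This is a minimal sketch under stated assumptions, not the JobTracker implementation: `RetrySketch`, `runWithRetries`, and the `maxAttempts` limit are hypothetical, and the `worker` predicate stands in for a slave executing one attempt of a task.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.BiPredicate;

// Hypothetical master loop: run every task, re-queueing any task whose attempt fails.
class RetrySketch {

    // worker.test(taskId, attemptNumber) returns true when that attempt succeeds.
    // Returns the number of attempts each task needed; throws if a task keeps failing.
    static int[] runWithRetries(int taskCount, int maxAttempts,
                                BiPredicate<Integer, Integer> worker) {
        int[] attempts = new int[taskCount];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int task = 0; task < taskCount; task++) {
            pending.add(task);
        }
        while (!pending.isEmpty()) {
            int task = pending.poll();
            attempts[task]++;
            boolean succeeded = worker.test(task, attempts[task]);
            if (!succeeded) {
                if (attempts[task] >= maxAttempts) {
                    throw new IllegalStateException(
                            "task " + task + " failed " + maxAttempts + " times");
                }
                pending.add(task); // re-execute the failed task later
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Task 1 fails on its first attempt only; every other attempt succeeds.
        int[] attempts = runWithRetries(3, 4, (task, attempt) -> !(task == 1 && attempt == 1));
        System.out.println(Arrays.toString(attempts)); // [1, 2, 1]
    }
}
```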

Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of appropriate interfaces and/or abstract-classes. These, and other job parameters, comprise the job configuration. The Hadoop job client then submits the job (jar/executable etc.) and configuration to the JobTracker, which then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and monitoring them, and providing status and diagnostic information to the job-client.

Although the Hadoop framework is implemented in Java™, Map-Reduce applications need
