<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into hdfs-site.xml and change them -->
<!-- there. If hdfs-site.xml does not already exist, create it. -->
<configuration>
<property>
<name>hadoop.hdfs.configuration.version</name>
<value>1</value>
<description>version of this configuration file</description>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value></value>
<description>
RPC address that handles all client requests. In the case of HA/Federation, where multiple namenodes exist,
the name service id is appended to the property name, e.g. dfs.namenode.rpc-address.ns1 or
dfs.namenode.rpc-address.EXAMPLENAMESERVICE.
The value of this property takes the form nn-host1:rpc-port. The NameNode's default RPC port is 8020.
</description>
</property>
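<!--
Example only, not a default: a sketch of how the per-nameservice form
described above might be set in hdfs-site.xml. The name service id "ns1"
follows the description; the host name nn-host1.example.com is hypothetical.
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>nn-host1.example.com:8020</value>
</property>
-->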
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value></value>
<description>
The actual address the RPC server will bind to. If this optional address is
set, it overrides only the hostname portion of dfs.namenode.rpc-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node listen on all interfaces by
setting it to 0.0.0.0.
</description>
</property>
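<!--
Example only, not a default: a sketch of an hdfs-site.xml entry that makes
the NameNode RPC server listen on all interfaces, as described above. The
port is still taken from dfs.namenode.rpc-address.
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
-->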
<property>
<name>dfs.namenode.servicerpc-address</name>
<value></value>
<description>
RPC address for HDFS services communication. BackupNode, Datanodes and all other services should
connect to this address if it is configured. In the case of HA/Federation, where multiple namenodes exist,
the name service id is appended to the property name, e.g. dfs.namenode.servicerpc-address.ns1 or
dfs.namenode.servicerpc-address.EXAMPLENAMESERVICE.
The value of this property takes the form nn-host1:rpc-port.
If this property is unset, the value of dfs.namenode.rpc-address is used as the default.
</description>
</property>
<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value></value>
<description>
The actual address the service RPC server will bind to. If this optional address is
set, it overrides only the hostname portion of dfs.namenode.servicerpc-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node listen on all interfaces by
setting it to 0.0.0.0.
</description>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-address</name>
<value></value>
<description>
NameNode RPC lifeline address. This is an optional separate RPC address
that can be used to isolate health checks and liveness to protect against
resource exhaustion in the main RPC handler pool. In the case of
HA/Federation where multiple NameNodes exist, the name service ID is added
to the name e.g. dfs.namenode.lifeline.rpc-address.ns1. The value of this
property will take the form of nn-host1:rpc-port. If this property is not
defined, then the NameNode will not start a lifeline RPC server. By
default, the property is not defined.
</description>
</property>
<property>
<name>dfs.namenode.lifeline.rpc-bind-host</name>
<value></value>
<description>
The actual address the lifeline RPC server will bind to. If this optional
address is set, it overrides only the hostname portion of
dfs.namenode.lifeline.rpc-address. It can also be specified per name node
or name service for HA/Federation. This is useful for making the name node
listen on all interfaces by setting it to 0.0.0.0.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>0.0.0.0:9868</value>
<description>
The secondary namenode http server address and port.
</description>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>0.0.0.0:9869</value>
<description>
The secondary namenode HTTPS server address and port.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:9866</value>
<description>
The datanode server address and port for data transfer.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:9864</value>
<description>
The datanode http server address and port.
</description>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:9867</value>
<description>
The datanode ipc server address and port.
</description>
</property>
<property>
<name>dfs.datanode.http.internal-proxy.port</name>
<value>0</value>
<description>
The datanode's internal web proxy port.
By default, a random available port is selected at runtime.
</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>10</value>
<description>The number of server threads for the datanode.</description>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>0.0.0.0:9870</value>
<description>
The address and base port on which the NameNode web UI listens.
</description>
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value></value>
<description>
The actual address the HTTP server will bind to. If this optional address
is set, it overrides only the hostname portion of dfs.namenode.http-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node HTTP server listen on all
interfaces by setting it to 0.0.0.0.
</description>
</property>
<property>
<name>dfs.namenode.heartbeat.recheck-interval</name>
<value>300000</value>
<description>
The interval at which the NameNode checks for expired datanodes.
This value, together with dfs.heartbeat.interval, is also used to calculate
the interval for deciding whether a datanode is stale.
The unit of this setting is milliseconds.
</description>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTP_ONLY</value>
<description>Determines whether HTTPS (SSL) is supported on HDFS.
This configures the HTTP endpoint for HDFS daemons.
The following values are supported:
- HTTP_ONLY : Service is provided only on http
- HTTPS_ONLY : Service is provided only on https
- HTTP_AND_HTTPS : Service is provided both on http and https
</description>
</property>
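<!--
Example only, not a default: a sketch of an hdfs-site.xml entry that serves
the HDFS web endpoints over HTTPS only, using one of the values listed above.
It assumes the keystore settings (see dfs.https.server.keystore.resource)
are configured separately.
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
-->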
<property>
<name>dfs.client.https.need-auth</name>
<value>false</value>
<description>Whether SSL client certificate authentication is required
</description>
</property>
<property>
<name>dfs.client.cached.conn.retry</name>
<value>3</value>
<description>The number of times the HDFS client will pull a socket from the
cache. Once this number is exceeded, the client will try to create a new
socket.
</description>
</property>
<property>
<name>dfs.https.server.keystore.resource</name>
<value>ssl-server.xml</value>
<description>Resource file from which ssl server keystore
information will be extracted
</description>
</property>
<property>
<name>dfs
Hadoop's Default Configuration Files
Updated 2023-09-07
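In distributed computing, Hadoop is a key framework that provides a powerful and flexible solution for big-data processing. Its core components include HDFS (Hadoop Distributed File System) and MapReduce, each of which has its own default configuration file; these files are an important part of the Hadoop runtime. The archive you mentioned contains four main default configuration files: `core-default.xml`, `hdfs-default.xml`, `mapred-default.xml`, and `yarn-default.xml`. Each file and its configuration properties are described in turn below.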
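`core-default.xml` is Hadoop's core configuration file. It defines Hadoop's basic behavior, such as I/O settings, serialization parameters, and file system properties. Important properties include `fs.defaultFS`, the cluster's default file system, which usually points to HDFS; `io.file.buffer.size`, which controls the buffer size for read and write operations; and `fs.trash.interval`, which sets how many minutes deleted files are kept in the trash before they are permanently removed (0 disables the trash).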
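As a rough illustration, the three properties named above could be overridden in `core-site.xml` along the following lines. The host name `namenode.example.com` and all values are assumptions chosen for the example, not recommendations.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default file system. The host name is hypothetical; 8020 is the NameNode's default RPC port. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <!-- Read/write buffer size in bytes (131072 bytes = 128 KB, an illustrative value). -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <!-- Trash retention in minutes (1440 minutes = 1 day, an illustrative value). -->
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
```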
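Next, `hdfs-default.xml` covers HDFS configuration. HDFS is Hadoop's distributed file system, and its configuration directly affects how data is stored and accessed. For example, `dfs.replication` sets the number of replicas kept of each data block, which improves fault tolerance and availability; `dfs.blocksize` defines the default block size, the basic unit in which HDFS stores files; and `dfs.namenode.name.dir` specifies the directory where the NameNode stores its metadata, a critical storage location for HDFS.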
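A minimal sketch of the matching `hdfs-site.xml` overrides is shown here. The directory path is hypothetical and the values are only illustrative; a real cluster would choose them according to its own hardware and workload.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Number of replicas kept for each block (illustrative value). -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Block size in bytes (268435456 bytes = 256 MB, an illustrative value). -->
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <!-- Where the NameNode keeps its metadata. The path is hypothetical. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfs/name</value>
  </property>
</configuration>
```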
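`mapred-default.xml` relates to MapReduce, Hadoop's parallel computing model. This file contains configuration for job execution, task scheduling, and resource management. For example, `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb` set the memory for Map and Reduce tasks, respectively; `mapreduce.map.cpu.vcores` and `mapreduce.reduce.cpu.vcores` define the number of virtual CPU cores a task can use; and `mapreduce.jobtracker.address` is the address of the JobTracker, which coordinated the execution of MapReduce jobs in Hadoop 1.x (MRv1).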
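For illustration only, the task memory and CPU properties above might be set in `mapred-site.xml` as sketched here. All values are assumptions for the example and would need to fit within what the cluster's nodes can actually provide.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Memory requested for each Map task, in MB (illustrative value). -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <!-- Memory requested for each Reduce task, in MB (illustrative value). -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <!-- Virtual CPU cores requested for each Map task. -->
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
  </property>
  <!-- Virtual CPU cores requested for each Reduce task. -->
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
  </property>
</configuration>
```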
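`yarn-default.xml` belongs to YARN (Yet Another Resource Negotiator), the resource management system introduced in Hadoop 2.x to replace the original JobTracker. YARN's main tasks are resource allocation and job scheduling. Properties such as `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores` define the memory and the number of CPU cores that each NodeManager can allocate, while `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.maximum-allocation-mb` set the minimum and maximum memory that can be allocated per container request.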
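The following `yarn-site.xml` fragment sketches how these four properties could fit together. It assumes a hypothetical worker node with 16 GB of memory and 8 cores set aside for containers; none of the values come from the original files.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Memory the NodeManager may hand out to containers, in MB (assumed 16 GB node). -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <!-- Virtual cores the NodeManager may hand out to containers (assumed 8-core node). -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <!-- Smallest memory allocation the scheduler grants per container request, in MB. -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Largest memory allocation the scheduler grants per container request, in MB. -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
```

In this sketch the maximum allocation is kept below the NodeManager's total, so that a single container request can always fit on one node.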
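Understanding these default configuration files is essential for tuning Hadoop cluster performance, keeping the cluster stable, and troubleshooting problems. Developers and administrators can adjust these settings to fit specific workloads and resource requirements. Knowing what each setting means also gives a deeper understanding of how Hadoop works. In practice, the defaults are overridden in the corresponding `*-site.xml` files (`core-site.xml`, `hdfs-site.xml`, `mapred-site.xml`, and `yarn-site.xml`) to customize the cluster configuration.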