---
layout: global
title: HDFS charts
---
# HDFS charts
Helm charts for launching HDFS daemons in a K8s cluster. The main entry-point
chart is `hdfs-k8s`, an uber-chart that specifies other charts as
dependency subcharts. This means you can launch all HDFS components using
`hdfs-k8s`.
Note that the HDFS charts are currently in pre-alpha quality. They are also
being heavily revised and are subject to change.
HDFS on K8s supports the following features:
- namenode high availability (HA): HDFS namenode daemons maintain the file
system metadata: which directories contain which files, and where the file
data is located. A namenode crash causes a service outage. HDFS can run two
namenodes in an active/standby setup. HDFS on K8s supports HA.
- K8s persistent volumes (PV) for metadata: Losing the namenode metadata can
mean losing the entire file system. HDFS on K8s can store the metadata in
remote K8s persistent volumes, so the metadata remains intact even if both
namenode daemons are lost or restarted.
- K8s HostPath volumes for file data: HDFS datanode daemons store the actual
file data, which should also survive datanode crashes or restarts. HDFS on
K8s stores the file data on the local disks of the K8s cluster nodes using
K8s HostPath volumes. (We plan to switch to a better mechanism, K8s
persistent local volumes.)
- Kerberos: Vanilla HDFS is not secure: intruders can easily write custom
client code, put a fake user name in requests, and steal data. Production
HDFS clusters often secure themselves using Kerberos. HDFS on K8s supports
Kerberos.
Here is the list of all charts.
- hdfs-k8s: main uber-chart. Launches other charts.
- hdfs-namenode-k8s: a statefulset and other K8s components for launching HDFS
namenode daemons, which maintain the file system metadata. The chart supports
namenode high availability (HA).
- hdfs-datanode-k8s: a daemonset and other K8s components for launching HDFS
datanode daemons, which are responsible for storing file data.
- hdfs-config-k8s: a configmap containing Hadoop config files for HDFS.
- zookeeper: This chart is NOT in this repo, but hdfs-k8s pulls the zookeeper
chart from the incubator remote repo
(https://kubernetes-charts-incubator.storage.googleapis.com/)
as a dependency and launches zookeeper daemons. Zookeeper ensures that
only one namenode is active in the HA setup, while the other namenode
stays in standby. By default, we launch three zookeeper servers.
- hdfs-journalnode-k8s: a statefulset and other K8s components for launching
an HDFS journalnode quorum, which ensures that the file system metadata is
properly shared between the two namenode daemons in the HA setup.
By default, we launch three journalnode servers.
- hdfs-client-k8s: a pod that is configured to run Hadoop client commands
for accessing HDFS.
- hdfs-krb5-k8s: a size-1 statefulset and other K8s components for launching
a Kerberos server, which can be used to secure HDFS. Disabled by default.
- hdfs-simple-namenode-k8s: Disabled by default. A simpler namenode setup that
launches only a single namenode, i.e. it does not support HA. It does not
support Kerberos or persistent volumes either. Since it does not support HA,
it needs neither zookeeper nor journalnodes. You may prefer this if you want
the simplest possible setup.
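Which optional components get launched is controlled by values of the
`hdfs-k8s` uber-chart. The fragment below is only an illustration: the tag
names (`kerberos`, `simple`) are hypothetical and should be checked against
`charts/hdfs-k8s/requirements.yaml` and `values.yaml` before use.

```yaml
# Hypothetical values override for the hdfs-k8s uber-chart.
# Check charts/hdfs-k8s/requirements.yaml for the actual condition tags.
tags:
  kerberos: true   # also launch the hdfs-krb5-k8s Kerberos server
  simple: false    # keep the HA setup; true would use hdfs-simple-namenode-k8s
```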
# Prerequisite
Requires Kubernetes 1.6+, as the `namenode` and `datanodes` use
`ClusterFirstWithHostNet`, which was introduced in Kubernetes 1.6.
# Usage
## Basic
The HDFS daemons can be launched using the main `hdfs-k8s` chart. First, build
the main chart using:
```
$ helm repo add incubator \
https://kubernetes-charts-incubator.storage.googleapis.com/
$ helm dependency build charts/hdfs-k8s
```
Zookeeper, journalnodes and namenodes need persistent volumes for storing
metadata. By default, the helm charts do not set a storage class name for
dynamically provisioned volumes, nor do they use persistent volume selectors
for static persistent volumes.
This means dynamic volumes will rely on the provisioner of the default storage
class. And if your cluster has statically provisioned volumes, the chart will
match existing volumes based solely on the size requirements. To override this
default behavior, you can specify storage classes for dynamic volumes, or
volume selectors for static volumes. See below for how to set these options.
- namenodes: Each of the two namenodes needs at least a 100 GB volume, i.e.
you need two 100 GB volumes. This can be overridden by the
`hdfs-namenode-k8s.persistence.size` option.
You can also override the storage class or the selector using
`hdfs-namenode-k8s.persistence.storageClass`, or
`hdfs-namenode-k8s.persistence.selector` respectively. For details, see the
values.yaml file inside `hdfs-namenode-k8s` chart dir.
- zookeeper: You need three volumes of at least 5 GB each, i.e. each of the
three zookeeper servers needs at least 5 GB in its volume. The size can be
overridden by the `zookeeper.persistence.size` option. You can also override
the storage class using `zookeeper.persistence.storageClass`.
- journalnodes: Each of the three journalnodes will need at least 20 GB in
the volume. The size can be overridden by the
`hdfs-journalnode-k8s.persistence.size` option.
You can also override the storage class or the selector using
`hdfs-journalnode-k8s.persistence.storageClass`, or
`hdfs-journalnode-k8s.persistence.selector` respectively. For details, see the
values.yaml file inside `hdfs-journalnode-k8s` chart dir.
- kerberos: The single Kerberos server will need at least 20 GB in the volume.
The size can be overridden by the `hdfs-krb5-k8s.persistence.size` option.
You can also override the storage class or the selector using
`hdfs-krb5-k8s.persistence.storageClass`, or
`hdfs-krb5-k8s.persistence.selector` respectively. For details, see the
values.yaml file inside `hdfs-krb5-k8s` chart dir.
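The persistence options above can be collected in a single values file and
passed to `helm install`. A minimal sketch; the storage class name and selector
labels below are placeholders, not defaults shipped with the charts:

```yaml
# my-values.yaml -- placeholder class/label names; adjust for your cluster
hdfs-namenode-k8s:
  persistence:
    size: 100Gi
    storageClass: my-fast-ssd       # dynamic provisioning (hypothetical class)
zookeeper:
  persistence:
    size: 5Gi
hdfs-journalnode-k8s:
  persistence:
    size: 20Gi
    selector:                       # static provisioning: match pre-created PVs
      matchLabels:
        usage: my-hdfs-journalnode  # hypothetical label on your PVs
```

Pass it when launching the main chart, e.g.
`helm install -n my-hdfs charts/hdfs-k8s -f my-values.yaml`.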
Then launch the main chart. Specify the chart release name, say "my-hdfs",
which will be the prefix of the K8s resource names for the HDFS components.
```
$ helm install -n my-hdfs charts/hdfs-k8s
```
Wait for all daemons to be ready. Note that some daemons may restart themselves
a few times before they become ready.
```
$ kubectl get pod -l release=my-hdfs
NAME READY STATUS RESTARTS AGE
my-hdfs-client-c749d9f8f-d5pvk 1/1 Running 0 2m
my-hdfs-datanode-o7jia 1/1 Running 3 2m
my-hdfs-datanode-p5kch 1/1 Running 3 2m
my-hdfs-datanode-r3kjo 1/1 Running 3 2m
my-hdfs-journalnode-0 1/1 Running 0 2m
my-hdfs-journalnode-1 1/1 Running 0 2m
my-hdfs-journalnode-2 1/1 Running 0 1m
my-hdfs-namenode-0 1/1 Running 3 2m
my-hdfs-namenode-1 1/1 Running 3 2m
my-hdfs-zookeeper-0 1/1 Running 0 2m
my-hdfs-zookeeper-1 1/1 Running 0 2m
my-hdfs-zookeeper-2 1/1 Running 0 2m
```
Namenodes and datanodes currently use the K8s `hostNetwork` so they can
see each other's physical IPs. Without `hostNetwork`,
overlay K8s network providers such as weave-net may mask the physical IPs,
which would later confuse data locality inside the namenodes.
Finally, test with the client pod:
```
$ _CLIENT=$(kubectl get pods -l app=hdfs-client,release=my-hdfs -o name | \
cut -d/ -f 2)
$ kubectl exec $_CLIENT -- hdfs dfsadmin -report
$ kubectl exec $_CLIENT -- hdfs haadmin -getServiceState nn0
$ kubectl exec $_CLIENT -- hdfs haadmin -getServiceState nn1
```
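Beyond the admin commands above, a quick read/write smoke test can also be run
from the same client pod. The path `/tmp/smoke` is an arbitrary example;
`hadoop fs -put -` reads the file content from stdin:

```
$ kubectl exec $_CLIENT -- hadoop fs -mkdir -p /tmp/smoke
$ kubectl exec $_CLIENT -- sh -c "echo hello | hadoop fs -put - /tmp/smoke/hello.txt"
$ kubectl exec $_CLIENT -- hadoop fs -cat /tmp/smoke/hello.txt
$ kubectl exec $_CLIENT -- hadoop fs -rm -r /tmp/smoke
```

If the `cat` step prints the line you wrote, the namenodes and datanodes are
wired up correctly end to end.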