---
layout: global
title: HDFS charts
---
# HDFS charts
Helm charts for launching HDFS daemons in a K8s cluster. The main entry-point
chart is `hdfs-k8s`, an uber-chart that specifies other charts as
dependency subcharts. This means you can launch all HDFS components using
`hdfs-k8s`.
Note that the HDFS charts are currently in pre-alpha quality. They are also
being heavily revised and are subject to change.
HDFS on K8s supports the following features:
- namenode high availability (HA): HDFS namenode daemons maintain the file
system metadata: which directories contain which files, and where the file
data is located. A namenode crash causes a service outage. HDFS can run two
namenodes in an active/standby setup. HDFS on K8s supports HA.
- K8s persistent volumes (PV) for metadata: Losing the namenode metadata can
mean losing the entire file system. HDFS on K8s can store the metadata in
remote K8s persistent volumes, so the metadata remains intact even if both
namenode daemons are lost or restarted.
- K8s HostPath volumes for file data: HDFS datanode daemons store the actual
file data, which should also survive datanode crashes or restarts. HDFS on
K8s stores the file data on the local disks of the K8s cluster nodes using
K8s HostPath volumes. (We plan to switch to a better mechanism, K8s
persistent local volumes.)
- Kerberos: Vanilla HDFS is not secure: intruders can easily write custom
client code, put a fake user name in requests, and steal data. Production
HDFS clusters often secure themselves using Kerberos. HDFS on K8s supports
Kerberos.
Here is the list of all charts.
- hdfs-k8s: main uber-chart. Launches other charts.
- hdfs-namenode-k8s: a statefulset and other K8s components for launching HDFS
namenode daemons, which maintain the file system metadata. The chart supports
namenode high availability (HA).
- hdfs-datanode-k8s: a daemonset and other K8s components for launching HDFS
datanode daemons, which are responsible for storing file data.
- hdfs-config-k8s: a configmap containing Hadoop config files for HDFS.
- zookeeper: This chart is NOT in this repo, but hdfs-k8s pulls the zookeeper
chart from the incubator remote repo
(https://kubernetes-charts-incubator.storage.googleapis.com/)
as a dependency and launches zookeeper daemons. Zookeeper ensures that
only one namenode is active in the HA setup, while the other namenode
stays in standby. By default, we launch three zookeeper servers.
- hdfs-journalnode-k8s: a statefulset and other K8s components for launching
an HDFS journalnode quorum, which ensures that the file system metadata is
properly shared between the two namenode daemons in the HA setup.
By default, we launch three journalnode servers.
- hdfs-client-k8s: a pod that is configured to run Hadoop client commands
for accessing HDFS.
- hdfs-krb5-k8s: a size-1 statefulset and other K8s components for launching
a Kerberos server, which can be used to secure HDFS. Disabled by default.
- hdfs-simple-namenode-k8s: Disabled by default. A simpler namenode setup that
launches only a single namenode, i.e. it does not support HA. It does not
support Kerberos or persistent volumes either. Since it does not support HA,
it needs neither zookeeper nor journalnodes. You may prefer this if you want
the simplest possible setup.
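Which optional components get launched is controlled by values of the
`hdfs-k8s` uber-chart. The fragment below is only an illustration: the tag
names (`kerberos`, `simple`) are hypothetical and should be checked against
`charts/hdfs-k8s/requirements.yaml` and `values.yaml` before use.

```yaml
# Hypothetical values override for the hdfs-k8s uber-chart.
# Check charts/hdfs-k8s/requirements.yaml for the actual condition tags.
tags:
  kerberos: true   # also launch the hdfs-krb5-k8s Kerberos server
  simple: false    # keep the HA setup; true would use hdfs-simple-namenode-k8s
```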
# Prerequisite
Requires Kubernetes 1.6+, as the `namenode` and `datanodes` use
`ClusterFirstWithHostNet`, which was introduced in Kubernetes 1.6.
# Usage
## Basic
The HDFS daemons can be launched using the main `hdfs-k8s` chart. First, build
the main chart using:
```
$ helm repo add incubator \
https://kubernetes-charts-incubator.storage.googleapis.com/
$ helm dependency build charts/hdfs-k8s
```
Zookeeper, journalnodes and namenodes need persistent volumes for storing
metadata. By default, the helm charts do not set a storage class name for
dynamically provisioned volumes, nor do they use persistent volume selectors
for static persistent volumes.
This means dynamic volumes will rely on the provisioner of the default storage
class. And if your cluster has statically provisioned volumes, the chart will
match existing volumes based solely on the size requirements. To override this
default behavior, you can specify storage classes for dynamic volumes, or
volume selectors for static volumes. See below for how to set these options.
- namenodes: Each of the two namenodes needs at least a 100 GB volume, i.e.
you need two 100 GB volumes. This can be overridden by the
`hdfs-namenode-k8s.persistence.size` option.
You can also override the storage class or the selector using
`hdfs-namenode-k8s.persistence.storageClass`, or
`hdfs-namenode-k8s.persistence.selector` respectively. For details, see the
values.yaml file inside `hdfs-namenode-k8s` chart dir.
- zookeeper: You need three volumes of at least 5 GB each, i.e. each of the
three zookeeper servers needs at least 5 GB in its volume. The size can be
overridden by the `zookeeper.persistence.size` option. You can also override
the storage class using `zookeeper.persistence.storageClass`.
- journalnodes: Each of the three journalnodes will need at least 20 GB in
the volume. The size can be overridden by the
`hdfs-journalnode-k8s.persistence.size` option.
You can also override the storage class or the selector using
`hdfs-journalnode-k8s.persistence.storageClass`, or
`hdfs-journalnode-k8s.persistence.selector` respectively. For details, see the
values.yaml file inside `hdfs-journalnode-k8s` chart dir.
- kerberos: The single Kerberos server will need at least 20 GB in the volume.
The size can be overridden by the `hdfs-krb5-k8s.persistence.size` option.
You can also override the storage class or the selector using
`hdfs-krb5-k8s.persistence.storageClass`, or
`hdfs-krb5-k8s.persistence.selector` respectively. For details, see the
values.yaml file inside `hdfs-krb5-k8s` chart dir.
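The persistence options above can be collected in a single values file and
passed to `helm install`. A minimal sketch; the storage class name and selector
labels below are placeholders, not defaults shipped with the charts:

```yaml
# my-values.yaml -- placeholder class/label names; adjust for your cluster
hdfs-namenode-k8s:
  persistence:
    size: 100Gi
    storageClass: my-fast-ssd       # dynamic provisioning (hypothetical class)
zookeeper:
  persistence:
    size: 5Gi
hdfs-journalnode-k8s:
  persistence:
    size: 20Gi
    selector:                       # static provisioning: match pre-created PVs
      matchLabels:
        usage: my-hdfs-journalnode  # hypothetical label on your PVs
```

Pass it when launching the main chart, e.g.
`helm install -n my-hdfs charts/hdfs-k8s -f my-values.yaml`.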
Then launch the main chart. Specify the chart release name, say "my-hdfs",
which will be the prefix of the K8s resource names for the HDFS components.
```
$ helm install -n my-hdfs charts/hdfs-k8s
```
Wait for all daemons to be ready. Note that some daemons may restart themselves
a few times before they become ready.
```
$ kubectl get pod -l release=my-hdfs
NAME READY STATUS RESTARTS AGE
my-hdfs-client-c749d9f8f-d5pvk 1/1 Running 0 2m
my-hdfs-datanode-o7jia 1/1 Running 3 2m
my-hdfs-datanode-p5kch 1/1 Running 3 2m
my-hdfs-datanode-r3kjo 1/1 Running 3 2m
my-hdfs-journalnode-0 1/1 Running 0 2m
my-hdfs-journalnode-1 1/1 Running 0 2m
my-hdfs-journalnode-2 1/1 Running 0 1m
my-hdfs-namenode-0 1/1 Running 3 2m
my-hdfs-namenode-1 1/1 Running 3 2m
my-hdfs-zookeeper-0 1/1 Running 0 2m
my-hdfs-zookeeper-1 1/1 Running 0 2m
my-hdfs-zookeeper-2 1/1 Running 0 2m
```
Namenodes and datanodes currently use the K8s `hostNetwork` so they can
see each other's physical IPs. Without `hostNetwork`,
overlay K8s network providers such as weave-net may mask the physical IPs,
which would later confuse data locality inside the namenodes.
Finally, test with the client pod:
```
$ _CLIENT=$(kubectl get pods -l app=hdfs-client,release=my-hdfs -o name | \
cut -d/ -f 2)
$ kubectl exec $_CLIENT -- hdfs dfsadmin -report
$ kubectl exec $_CLIENT -- hdfs haadmin -getServiceState nn0
$ kubectl exec $_CLIENT -- hdfs haadmin -getServiceState nn1
```
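Beyond the admin commands above, a quick read/write smoke test can also be run
from the same client pod. The path `/tmp/smoke` is an arbitrary example;
`hadoop fs -put -` reads the file content from stdin:

```
$ kubectl exec $_CLIENT -- hadoop fs -mkdir -p /tmp/smoke
$ kubectl exec $_CLIENT -- sh -c "echo hello | hadoop fs -put - /tmp/smoke/hello.txt"
$ kubectl exec $_CLIENT -- hadoop fs -cat /tmp/smoke/hello.txt
$ kubectl exec $_CLIENT -- hadoop fs -rm -r /tmp/smoke
```

If the `cat` step prints the line you wrote, the namenodes and datanodes are
wired up correctly end to end.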