HDFS High Availability
Eli Collins, Todd Lipcon, Aaron T Myers
Motivation
Large users often mandate that their IT systems be highly available, or use Hadoop-based
platforms as part of a service with SLAs that require high availability. While high
availability needs to be addressed across the stack, it makes sense for the work to start with
HDFS because most components in a Hadoop-based system depend on HDFS, and
therefore their own availability may be limited by HDFS availability.
Use Cases
The point of high availability is to increase the proportion of time the platform is functioning for
users. We can split the use cases according to the times when the system is not functioning:
1. Planned downtime, e.g., due to software upgrades and configuration changes. Upgrades and
configuration changes are likely more common than the failures that currently cause downtime, and
are therefore a bigger source of downtime. Planned downtime is more or less acceptable to
different users; for example, some users may have regular maintenance windows while others
need to keep a service up 24 x 7. If an administrator needs to take the system offline in order to
perform maintenance, what steps need to be performed, and how long do they take?
2. Unplanned downtime, e.g., due to unexpected hardware failures. If the system stops
functioning, what steps need to be performed to bring it back online, and how long do they take?
If users have a process in place to deal with planned downtime (e.g., a regular service window),
then unplanned downtime is likely their primary concern.
3. Poor quality of service (QoS). Even when the cluster is functioning, poor QoS may result in
a lack of availability. A cluster that does not scale may not be available, e.g., if a job can use a
disproportionate amount of resources, block other jobs, etc.
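
To make the "proportion of time" framing concrete, the following is a minimal sketch of how planned
and unplanned downtime combine into a single availability figure. The class and field names are
hypothetical and the downtime numbers are purely illustrative; none of them come from this
document or from HDFS.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


@dataclass
class DowntimeBudget:
    """Hypothetical container for downtime observed over one period."""
    planned_hours: float    # e.g., upgrades and configuration changes
    unplanned_hours: float  # e.g., unexpected hardware failures

    def availability(self, period_hours: float = HOURS_PER_YEAR) -> float:
        """Return the fraction of the period during which the system was up."""
        down = self.planned_hours + self.unplanned_hours
        if down < 0 or down > period_hours:
            raise ValueError("downtime must lie between 0 and the period length")
        return (period_hours - down) / period_hours


# Illustrative numbers only: four 2-hour maintenance windows plus one 3-hour outage.
year = DowntimeBudget(planned_hours=8.0, unplanned_hours=3.0)
print(f"availability: {year.availability():.5f}")
```

Keeping the two kinds of downtime in separate fields mirrors the split above: a user with
regular maintenance windows might track only the unplanned portion against an SLA.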
We make the following assumptions:
1. Because more users can tolerate planned downtime (e.g., they have regular maintenance
windows), unplanned downtime is the higher priority. Scalability and resource management are
out of the scope of this document.
2. Intermediate HDFS releases may rely on an HA NFS filer since this investment can be
amortized over multiple clusters, and is complementary to existing HDFS systems (e.g., users
often already buy HA filers to store the image and edits log). There is value in supporting both
options, as some users may already be comfortable operating filers and want to avoid the
operational complexity of a new storage option.
3. Because most components in the platform store data in HDFS, they depend on it for their own
availability. HDFS is therefore the natural place to start when addressing platform availability.
This writeup focuses on improving HDFS availability with the intent of increasing overall platform
availability. For example, MapReduce and HBase may need to be modified to benefit from
improvements in HDFS availability (for example, by continuing to function during Namenode
fail-over). Components that depend on these, e.g., Pig and Hive, will benefit transitively.
Requirements / Assumptions
Both manual and automatic fail-over should be supported. Manual hot fail-over and automatic
hot fail-over are the most important use cases. Warm standby should be supported but is less
important than hot fail-over.
An active-passive configuration with two dedicated servers is sufficient for the near term. Future
releases should not require that dedicated hosts be specified up-front (assuming any host is
capable of running the Namenode).
It is acceptable to require an HA NFS filer. Future releases/updates should not, i.e., no additional
hardware aside from the servers and switches should be required for high availability.
An admin should be able to fail-back after fail-over.
The standby should not be required to share a switch with the master, i.e., you can run the
standby cross-rack.
Failure types should be handled according to current recommended hardware configurations (e.g.,
it’s OK to require that the primary and standby use ECC memory, redundant power, etc.).
It is important to handle soft failures, e.g., components are frequently flaky rather than fail-stop.
Adding a dependency on Linux HA projects (e.g., Heartbeat) is acceptable, if necessary.
Operators (not using Enterprise) will perform and monitor fail-over tasks via the command-line
tools and Web UIs.
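
Several of the requirements above (manual and automatic fail-over, an active-passive pair of
dedicated servers, fail-back, and tolerance of flaky components) can be illustrated with a small
model. This is only a sketch under stated assumptions, not the design proposed here and not an
HDFS API: the `HAPair` class, its method names, the host names, and the threshold of three
consecutive failed health checks are all hypothetical. It also ignores how the standby keeps its
state in sync with the active node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HAPair:
    """Hypothetical model of two dedicated hosts in an active-passive setup."""
    hosts: Tuple[str, str]
    active_index: int = 0
    failure_threshold: int = 3  # assumed value, chosen only for illustration
    consecutive_failures: int = field(default=0, init=False)
    history: List[str] = field(default_factory=list, init=False)

    @property
    def active(self) -> str:
        return self.hosts[self.active_index]

    @property
    def standby(self) -> str:
        return self.hosts[1 - self.active_index]

    def fail_over(self, reason: str) -> None:
        """Swap the active and standby roles; fail-back is a second fail-over."""
        old_active = self.active
        self.active_index = 1 - self.active_index
        self.consecutive_failures = 0
        self.history.append(f"{reason}: {old_active} -> {self.active}")

    def report_health_check(self, ok: bool) -> None:
        """Automatic path: one flaky check is tolerated, a run of failures is not."""
        if ok:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.fail_over("automatic")


pair = HAPair(hosts=("nn1", "nn2"))
# A single failed check followed by a success does not trigger fail-over;
# three failures in a row do.
for ok in (False, True, False, False, False):
    pair.report_health_check(ok)
pair.fail_over("manual")  # an administrator fails back to the original host
print(pair.history)
```

Counting consecutive failures rather than reacting to the first one is one simple way to treat
components that are flaky rather than fail-stop; a real deployment would need to choose the
threshold and the check itself with care.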
Goals
The following goals apply to HA generally:
HA configuration and fail-over management steps need to be simple to prevent unavailability
and data loss due to configuration/operational mistakes.
HA should use consistent mechanisms and techniques across components in a Hadoop-based