HDFS High Availability
Eli Collins, Todd Lipcon, Aaron T Myers
Motivation
Large users often mandate that their IT systems be highly available, or use Hadoop-based platforms as part of a service with SLAs that require high availability. While high availability needs to be addressed across the stack, it makes sense for the work to start with HDFS because most components in a Hadoop-based system depend on HDFS, and their own availability may therefore be limited by HDFS availability.
Use Cases
The point of high availability is to increase the proportion of time the platform is functioning for users. We can split the use cases according to the times when the system is not functioning:
1. Planned downtime, e.g. due to software upgrades and configuration changes. Upgrades and configuration changes are likely more common than the failures that currently cause downtime, and are therefore a bigger source of downtime. Planned downtime is more or less acceptable to different users; for example, some users may have regular maintenance windows while others need to keep a service up 24x7. If an administrator needs to take the system offline in order to perform maintenance, what steps need to be performed, and how long do they take?
2. Unplanned downtime, e.g. due to unexpected hardware failures. If the system stops functioning, what steps need to be performed to bring it back online, and how long do they take? If users have a process in place to deal with planned downtime (e.g. a regular service window), then unplanned downtime is likely their primary concern.
3. Poor quality of service (QoS). Even when the cluster is functioning, poor QoS may result in a lack of availability. A cluster that does not scale may effectively be unavailable, e.g. if one job can use a disproportionate amount of resources, block other jobs, etc.
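To make the availability goal concrete, the "proportion of time the platform is functioning" can be expressed in terms of mean time between failures (MTBF) and mean time to repair (MTTR). The sketch below uses illustrative figures that are assumptions for the example, not measurements from this document:

```python
# Availability as the proportion of time the system is functioning:
#   availability = MTBF / (MTBF + MTTR)
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_hours_per_year(avail: float) -> float:
    # Expected unavailable hours over a (non-leap) year.
    return (1.0 - avail) * 365 * 24

# Hypothetical example: a Namenode that fails once a month (~730 h MTBF)
# and takes 30 minutes to restart and replay the edits log.
a = availability(730, 0.5)
print(f"{a:.5f}")                          # 0.99932
print(f"{downtime_hours_per_year(a):.1f}")  # 6.0
```

The example illustrates why reducing MTTR (fast fail-over) matters as much as reducing failure frequency: halving the recovery time halves the expected yearly downtime.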
We make the following assumptions:
1. Because more users can tolerate planned downtime (e.g. they will have regular maintenance windows), unplanned downtime is the higher priority. Scalability and resource management are out of the scope of this document.
2. Intermediate HDFS releases may rely on an HA NFS filer, since this investment can be amortized over multiple clusters and is complementary to existing HDFS systems (e.g. users often already buy HA filers to store the image and edits log). There is value in supporting both options, as some users may already be comfortable operating filers and want to avoid the operational complexity of a new storage option.
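As an illustration of the filer-based option, the HA implementation that later shipped in Hadoop (HDFS-1623) lets the active and standby Namenodes share an edits directory on an NFS mount via `dfs.namenode.shared.edits.dir`. A minimal `hdfs-site.xml` fragment might look like the following; the nameservice ID, Namenode IDs, and mount path are hypothetical:

```xml
<!-- Hypothetical HA pair "mycluster" with Namenodes nn1/nn2; the shared
     edits directory lives on an HA NFS filer mounted at /mnt/filer. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer/hdfs/ha-edits</value>
</property>
```

The active Namenode writes edits to the shared directory and the standby tails them, which is what makes a hot standby possible without new storage infrastructure beyond the filer.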
3. Because most components in the platform store data in HDFS, they depend on it for their own availability. HDFS is therefore the natural place to start when addressing platform availability. This writeup focuses on improving HDFS availability with the intent of increasing overall platform availability; note that MapReduce and HBase may need to be modified to benefit from improvements in HDFS availability (for example by continuing to function during Namenode fail-over). Components dependent on these, e.g. Pig and Hive, will benefit transitively.
Requirements / Assumptions
Both manual and automatic fail-over should be supported. Manual hot fail-over and automatic
hot fail-over are the most important use cases. Warm standby should be supported but is less
important than hot fail-over.
An active-passive configuration with two dedicated servers is sufficient for the near term. Future
releases should not require dedicated hosts be specified up-front (assuming any host is capable
of running the Namenode).
It is acceptable to require an HA NFS filer. Future releases/updates should not, i.e. no additional hardware aside from the servers and switches should be required for high availability.
An admin should be able to fail-back after fail-over.
The standby should not be required to share a switch with the master, i.e. you can run the standby cross-rack.
Failure types should be handled according to current recommended hardware configurations (e.g. it's OK to require that the primary and standby use ECC memory, redundant power, etc.).
It is important to handle soft failures; components are frequently flaky rather than fail-stop.
Adding a dependency on Linux HA projects (e.g. Heartbeat) is acceptable, if necessary.
Operators (not using Enterprise) will perform and monitor fail-over tasks via the command-line
tools and Web UIs.
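For reference, the command-line workflow for manual fail-over in the HA implementation that eventually shipped in Hadoop looks roughly like this; these commands come from later Hadoop releases rather than this design document, and `nn1`/`nn2` are hypothetical Namenode IDs:

```shell
# Check which Namenode is currently active (hypothetical IDs nn1/nn2).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Manually fail over from nn1 to nn2; the tool fences the old active
# Namenode before transitioning the standby to active.
hdfs haadmin -failover nn1 nn2
```

Keeping the operator workflow to a couple of commands is exactly the kind of simplicity the goals below call for.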
Goals
The following goals apply to HA generally:
HA configuration and fail-over management steps need to be simple, to prevent unavailability and data loss due to configuration/operational mistakes.
HA should use consistent mechanisms and techniques across components in a Hadoop-based platform.