implementation, and operator error. We had to engineer the software and design operational procedures to robustly handle this wider set of failure modes.
• A real system is rarely specified precisely. Even worse, the specification may change during the implementation phase. Consequently, an implementation should be malleable. Finally, a system might “fail” due to a misunderstanding that occurred during its specification phase.
This paper discusses a selection of the algorithmic and engineering challenges we encountered in moving Paxos from theory to practice. This exercise took more R&D effort than a straightforward translation of pseudo-code to C++ might suggest.
The rest of this paper is organized as follows. The next two sections expand on the motivation for this project and describe the general environment in which our system was built. We then provide a quick refresher on Paxos. We divide our experiences into three categories and discuss each in turn: algorithmic gaps in the literature, software engineering challenges, and unexpected failures. We conclude with measurements of our system, and some broader observations on the state of the art in our field.
2 Background
Chubby [1] is a fault-tolerant system at Google that provides a distributed locking mechanism and stores small files. Typically there is one Chubby instance, or “cell”, per data center. Several Google systems – such as the Google Filesystem (GFS) [4] and Bigtable [2] – use Chubby for distributed coordination and to store a small amount of metadata.
Chubby achieves fault-tolerance through replication. A typical Chubby cell consists of five replicas, running the same code, each running on a dedicated machine. Every Chubby object (e.g., a Chubby lock, or file) is stored as an entry in a database. It is this database that is replicated. At any one time, one of these replicas is considered to be the “master”.
Chubby clients (such as GFS and Bigtable) contact a Chubby cell for service. The master replica serves all Chubby requests. If a Chubby client contacts a replica that is not the master, the replica replies with the master’s network address. The Chubby client may then contact the master. If the master fails, a new master is automatically elected, which will then continue to serve traffic based on the contents of its local copy of the replicated database. Thus, the replicated database ensures continuity of Chubby state across master failover.
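To make the redirect behavior concrete, the following is a minimal sketch, in C++, of how a client might locate the master before issuing a request. The types and names (ReplicaStub, Call, SendToMaster) are hypothetical and are not part of the actual Chubby client library; a real client would also cache the master’s address and handle timeouts and retries.

    #include <optional>
    #include <string>
    #include <vector>

    // Hypothetical interface to a single Chubby replica; the names are
    // illustrative, not those of the real client library. A concrete
    // subclass would wrap an actual RPC channel.
    class ReplicaStub {
     public:
      struct Reply {
        bool is_master = false;   // did this replica serve the request itself?
        std::string master_addr;  // filled in when is_master is false
        std::string payload;      // result when is_master is true
      };
      virtual ~ReplicaStub() = default;
      virtual Reply Call(const std::string& request) = 0;
      virtual std::string Address() const = 0;
    };

    // A client contacts some replica; if that replica is not the master,
    // the client follows the returned address and retries there.
    std::optional<std::string> SendToMaster(
        const std::vector<ReplicaStub*>& replicas, const std::string& request) {
      for (ReplicaStub* r : replicas) {
        ReplicaStub::Reply reply = r->Call(request);
        if (reply.is_master) return reply.payload;
        for (ReplicaStub* candidate : replicas) {  // follow the redirect
          if (candidate->Address() == reply.master_addr) {
            ReplicaStub::Reply again = candidate->Call(request);
            if (again.is_master) return again.payload;
          }
        }
      }
      return std::nullopt;  // no master reachable; caller retries after failover
    }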
The first version of Chubby was based on a commercial, third-party, fault-tolerant database; we will refer to this database as “3DB” for the rest of this paper. This database had a history of bugs related to replication. In fact, as far as we know, the replication mechanism was not based on a proven replication algorithm and we do not know if it is correct. Given the history of problems associated with that product and the importance of Chubby, we eventually decided to replace 3DB with our own solution based on the Paxos algorithm.
3 Architecture outline
Figure 1 illustrates the architecture of a single Chubby replica. A fault-tolerant replicated log based on the Paxos algorithm sits at the bottom of the protocol stack. Each replica maintains a local copy of the log. The Paxos algorithm is run repeatedly as required to ensure that all replicas have identical sequences of entries in their local logs. Replicas communicate with each other through a Paxos-specific protocol.
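As a rough illustration of this layer’s role, the replicated log can be thought of as exposing an interface along the following lines. This is a sketch only; the names (ReplicatedLog, Submit, SetApplyCallback) are ours rather than those of the actual implementation.

    #include <cstdint>
    #include <functional>
    #include <string>

    // Sketch of a replicated-log interface in the spirit of the layer
    // described above; names are illustrative only.
    class ReplicatedLog {
     public:
      virtual ~ReplicatedLog() = default;

      // Submit a value (e.g., a serialized database operation). The Paxos
      // algorithm is run as needed so that all replicas eventually agree on
      // the same value for the same log position.
      virtual void Submit(const std::string& value) = 0;

      // The upper layer registers a callback that is invoked on every
      // replica, in log order, once an entry has been chosen.
      virtual void SetApplyCallback(
          std::function<void(uint64_t index, const std::string& value)> cb) = 0;
    };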
The next layer is a fault-tolerant replicated database, which includes a local copy of the database at each replica. The database consists of a local snapshot and a replay-log of database operations. New database operations are submitted to the replicated log. When a database operation appears at a replica, it is applied to that replica’s local database copy.
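The following sketch, with invented names and a toy in-memory map standing in for the snapshot plus replay-log, illustrates this flow under our assumptions: writes are submitted to the log rather than applied directly, and the log layer later calls back into the database on every replica, in the agreed order.

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>
    #include <utility>

    // Illustrative only: a toy key-value "database" layered on a replicated
    // log. `submit_to_log` hands a serialized operation to the Paxos layer,
    // which later invokes Apply() on every replica in the agreed log order.
    class ReplicatedDatabase {
     public:
      explicit ReplicatedDatabase(
          std::function<void(const std::string&)> submit_to_log)
          : submit_to_log_(std::move(submit_to_log)) {}

      // Writes are not applied to the local copy directly; they are submitted
      // to the replicated log so that every replica sees them in the same order.
      void Put(const std::string& key, const std::string& value) {
        submit_to_log_(key + "=" + value);  // toy encoding of one operation
      }

      // Invoked by the log layer on every replica for each chosen entry.
      void Apply(uint64_t /*log_index*/, const std::string& op) {
        auto pos = op.find('=');
        if (pos != std::string::npos)
          local_copy_[op.substr(0, pos)] = op.substr(pos + 1);
      }

     private:
      std::function<void(const std::string&)> submit_to_log_;
      std::map<std::string, std::string> local_copy_;  // replica's local state
    };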
Finally, Chubby uses the fault-tolerant database to store its state. Chubby clients communicate with a single Chubby replica through a Chubby-specific protocol.