The_Totem_Single-Ring_Ordering_and_Membership_Protocol.pdf

协议文档

分布式

4星 · 超过85%的资源需积分: 22 193 浏览量 2018-10-12 11:54:35 上传评论收藏 376KB PDF 举报

资源推荐

资源详情

资源评论

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/2603291

The Totem Single-Ring Ordering and Membership Protocol

ArticleinACM Transactions on Computer Systems · March 1999

DOI: 10.1145/210223.210224·Source: CiteSeer

CITATIONS

247

READS

1,064

5 authors, including:

Some of the authors of this publication are also working on these related projects:

AmeriFlux Management Project View project

Louise Moser

University of California, Santa Barbara

284 PUBLICATIONS6,364 CITATIONS

SEE PROFILE

Peter Melliar-Smith

University of California, San Francisco

251 PUBLICATIONS5,656 CITATIONS

SEE PROFILE

Deborah A. Agarwal

Lawrence Berkeley National Laboratory

107 PUBLICATIONS2,045 CITATIONS

SEE PROFILE

All content following this page was uploaded by Deborah A. Agarwal on 26 October 2014.

The user has requested enhancement of the downloaded file.

The Totem Single-Ring Ordering

and Memb ership Protocol

Y. AMIR, L. E. MOSER, P. M. MELLIAR-SMITH, D. A. AGARWAL, P. CIARFELLA

University of California, Santa Barbara

Fault-tolerant distributed systems are becoming more imp ortant but, in existing

systems, maintaining the consistency of replicated data is quite exp ensive. The

Totem single-ring proto col supports consistent concurrent operations by placing a

total order on broadcast messages. This total order is derived from the sequence

numb er in a token that circulates around a logical ring imp osed on a set of pro cessors

in a broadcast domain. The proto col handles reconguration of the system when

processors fail and restart or the network partitions and remerges. Extended virtual

synchrony ensures that pro cessors deliver messages and conguration changes to

the application in a consistent total order system-wide. An eective ow control

mechanism enables the Totem single-ring proto col to achieve message ordering rates

signicantly higher than the b est prior total ordering protocols.

Categories and Sub ject Descriptors: C.2.1 [

Computer-Communications Net-

works

]: Network Architecture and Design|

network communications

; C.2.2 [

Com-

puter Communication Networks

]: Network Protocols|

protocol architecture

;

C.2.4 [

Computer-Communication Networks

]: Distributed Systems|

network

operating systems

; C.2.5 [

Computer-Communication Networks

]: Local

Networks|

rings

; D.4.4 [

Operating Systems

]: Communications Management|

network communication

; D.4.5 [

Operating Systems

]: Reliability|

fault-tolerance

;

D.4.7 [

Operating Systems

]: Organization and Design|

distributed systems

General Terms: Protocols, Performance, Reliability

Additional Key Words and Phrases: Flow control, membership, reliable delivery,

token passing, total ordering, virtual synchrony

Earlier versions of the Totem single-ring protocol appeared in the Proceedings of the IEE Interna-

tional Conference on Information Engineering, Singap ore (December 1991) and in the Pro ceedings

of the IEEE 13th International Conference on Distributed Computing Systems, Pittsburgh, PA

(May 1993).

This research was supported by NSF Grant No. NCR-9016361, ARPA Contract No. N00174-93-

K-0097, and Ro ckwell CMC/State of California MICRO Grant No. 92-101.

Authors' Addresses: Y. Amir, Computer Science Department, The Hebrew University of

Jerusalem, Israel; L. E. Moser and P. M. Melliar-Smith, Department of Electrical and Com-

puter Engineering, University of California, Santa Barbara, CA 93106; D. A. Agarwal, Lawrence

Berkeley National Laboratory, Berkeley, CA 94720; P. Ciarfella, Cascade Communications Cor-

poration, 5 Carlisle Road, Westford, MA 01886.

Permission to copy without fee all or part of this material is granted provided that the copies are

not made or distributed for direct commercial advantage, the ACM copyright notice and the title

of the publication and its date app ear, and notice is given that copying is by p ermission of the

Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or

specic p ermission.



Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, P. Ciarfella

1. INTRODUCTION

Fault-tolerant distributed systems are b ecoming more imp ortant due to the increas-

ing demand for more reliable operation and improved p erformance. Maintaining

the consistency of replicated data and co ordinating the activities of coop erating

processors present substantial problems, which are made more dicult by concur-

rency, asynchrony, fault-tolerance, and real-time p erformance requirements. Exist-

ing fault-tolerant distributed systems that address these problems are dicult to

program, and exp ensive in the numb er of messages broadcast and/or computations

required. Recent protocols for fault-tolerant distributed systems [2; 5; 9; 12; 17]

employ the idea of placing a partial or total order on broadcast messages to simplify

the application programs and to reduce the communication and computation costs.

The Totem single-ring proto col supp orts high-performance fault-tolerant dis-

tributed systems that must continue to operate despite network partitioning and

remerging, and pro cessor failure and restart. Totem provides totally ordered mes-

sage delivery with low overhead, high throughput, and low latency using a logical

token-passing ring imp osed on a broadcast domain. The key to its high perfor-

mance is an eective ow control mechanism. Totem also provides rapid detection

of network partitioning and pro cessor failure together with reconguration and

membership services. Its novel mechanisms prevent delivery of messages in dier-

ent orders in dierent comp onents of a partitioned network, and provide accurate

information about which processors have delivered which messages. Earlier versions

of the Totem single-ring proto col are describ ed in [3; 11].

Programming the application is considerably simplied if messages are delivered

in total order rather than only in causal order, or if messages are delivered in causal

order rather than only in FIFO order. In prior systems, delivery of messages in

total order has b een more exp ensive than delivery of messages in causal order, and

delivery of messages in causal order has b een more expensive than FIFO delivery.

The Totem single-ring proto col can, however, deliver totally ordered messages with

high throughput at no greater cost than causally ordered messages or, indeed,

than reliable point-to-point FIFO messages. A total order on messages simplies

the application programming by reducing the risk of inconsistency when replicated

data are updated, and by resolving the contention for shared resources within the

system, such as the claiming of locks.

In Totem, messages are delivered in agreed order, which guarantees that pro ces-

sors deliver messages in a consistent total order and that, when a pro cessor delivers

a message, it has already delivered all prior messages originated within its current

conguration. Totem also provides delivery in safe order, which guarantees addi-

tionally that, when a processor delivers a message, it has determined that every

processor in the current conguration has received and will deliver the message un-

less that processor fails. Delivery of a message in agreed or safe order is requested

by the originator of the message.

Delivery in a consistent total order is not easy to achieve in distributed systems

that are sub ject to pro cessor failure and network partitioning. A failing pro cessor,

or a group of pro cessors that have become isolated, may deliver messages in an

order that is dierent from the order determined by other processors. As long as

those pro cessors remain failed or isolated, these inconsistencies are not apparent,

The Totem Single-Ring Ordering and Memb ership Proto col



Application

messages

Broadcast

Best-Effort Broadcast Domain

Membership

Join messages

Commit token

Recovery

Configuration Change

messages

Token loss

Configuration changes

Install

Foreign messages

Total Ordering

Flow Control

Reliable Delivery

Fig. 1: The Totem single-ring proto col hierarchy.

but as so on as a pro cessor is repaired and readmitted to the system, or as so on

as the comp onents of a partitioned system are remerged, the inconsistencies in the

message order may b ecome manifest and recovery may be dicult. The Totem

protocol cannot guarantee that every pro cessor is able to deliver every message but

it do es guarantee that, if two pro cessors deliver a message, they deliver the message

in the same total order.

The application programs may also need to know ab out conguration changes.

Dierent pro cessors may learn of a conguration change at dierent times, but

they must form consistent views of the conguration change and of the messages

that precede or follow the conguration change. Birman [5] devised the concept of

virtual synchrony, which ensures that processors deliver messages consistently in

the event of pro cessor fail-stop faults. We have generalized this concept to extended

virtual synchrony [14], which applies to systems in which the network can partition

and remerge, and in which pro cessors can fail and restart with stable storage intact.

The Totem single-ring proto col is designed to operate over a single broadcast

domain, such as an Ethernet. It uses the Unix UDP service, which provides a b est-

eort multicast service over such media. Other media that provide a b est-eort

multicast service, such as ATM or the Internet MBone, can be used to construct

the broadcast domain needed by Totem.

The software architecture of the Totem single-ring proto col is shown in Figure 1.

The arrows on the left represent the passage of messages through the proto col hi-

erarchy, while the arrows on the right represent Conguration Change messages

and conguration installation. Using a logical token-passing ring imposed on the

physical broadcast domain, the single-ring proto col provides reliable totally ordered

messsage delivery and eective ow control. On detection of token loss, or on receiv-

ing a message from a pro cessor not on its ring, a pro cessor invokes the membership

protocol to form a new ring using Join messages and a Commit token transmit-

ted over the broadcast domain. The membership proto col activates the recovery



Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, P. Ciarfella

protocol with the prop osed conguration change. The recovery proto col uses the

single-ring ordering proto col to recover missing messages. The processor then in-

stalls the new ring, by delivering Conguration Change messages to the application.

2. RELATED WORK

Our work on the Totem proto col is based on our combined exp erience with two

systems: the Trans and Total reliable ordered broadcast and membership proto cols

[12; 16] and the Transis group communication system [1; 2].

The Trans proto col uses p ositive and negative acknowledgments piggybacked

onto broadcast messages and exploits the transitivity of p ositive acknowledgments

to reduce the numb er of acknowledgments required. The Total proto col, layered on

top of the Trans protocol, is a fault-tolerant total ordering proto col that continues

to order messages provided that a resiliency constraint is met. The memb ership

protocol, layered on top of the Total protocol, ensures that each change in the

membership occurs at the same logical time in each pro cessor, corresp onding to

a p osition in the total order. The Totem proto col was developed to address the

computational overhead of Trans and Total, and is intended for lo cal-area networks

with fast and highly reliable communication.

The Transis group communication system provides reliable ordered group mul-

ticast and membership services. Transis initially based its ordering protocol on

the Trans proto col but, more recently, has also included the Totem proto col for

message ordering. Unlike other prior proto cols, the Transis membership protocol

supports remerging of a partitioned network, and maintains a consensus view of the

membership of each component, rather than a global consensus view of the entire

system. The Totem membership proto col uses the idea, rst proposed for Transis,

that the membership can be reduced in size to ensure termination.

In [7] Chang and Maxemchuk describ ed a reliable broadcast and ordering pro-

tocol that uses a token-based sequencer strategy. Unlike Totem, which requires

that a processor must hold the token to broadcast a message, their protocol allows

processors to broadcast messages at any time. The processor holding the token is

responsible for broadcasting an acknowledgment message that includes a sequence

numb er for each message acknowledged. A processor that has not received a mes-

sage sends a negative acknowledgment to request retransmission by the processor

that acknowledged the message. While the latency is go o d at low loads, it increases

at high loads and in the presence of a failed pro cessor.

More closely related to Totem is the TPM protocol of Ra jagopalan and McKin-

ley [18], which also uses a token to control broadcasting and sequencing of messages.

The TPM protocol provides the safe delivery but not the agreed delivery that Totem

provides. In the absence of processor failure and network partitioning, TPM re-

quires on average two and one-half token rotations for safe delivery, whereas Totem

requires two token rotations. In the event of network partitioning, only the comp o-

nent containing a ma jority of the pro cessors continues to op erate; pro cessors in the

other comp onents block. In contrast, Totem handles network partitioning and re-

merging by allowing each comp onent of a partitioned system to continue op erating,

not just the component that contains a ma jority of the pro cessors.

Birman's Isis system [5], and the more recent Horus system [19], have fo cused

on pro cess groups and the application program interface. Isis provides BCAST

剩余29页未读，继续阅读

评论收藏

内容反馈

do_er

2019-10-09

学习一下先

青阳Jayan

粉丝: 7
资源: 1

The_Totem_Single-Ring_Ordering_and_Membership_Protocol.pdf

The-Totem-Single-Ring-Ordering-and-Membership-Protocol.pdf

Introduce_Totem_Protocol.pdf

totem-1.0源代码

Totem拓扑数据

totem-nautilus-3.26.2-1.el7.x86_64.rpm

nrf5_SDK_for_Mesh_v0.10.1-Alpha_src

totem-pl-parser-3.26.1-2.el8.x86_64.rpm

totem-pl-parser-3.26.1-1.el7.x86_64.rpm

totem-pl-parser-devel-3.26.1-1.el7.x86_64.rpm

totem-pl-parser-3.26.1-2.el8.aarch64.rpm

totem-nautilus-3.26.2-1.el8.x86_64.rpm

totem-devel-3.26.2-1.el7.x86_64.rpm

Totem_Pole_Bridgeless_PFC.slx

totem-3.26.2-1.el8.x86_64.rpm

totem-3.26.2-1.el7.x86_64.rpm

ONE_Cycle_CONTROL__BL_TOTEM_PFC_V1.psimsch

Totem-2020年中国营销趋势报告（英文）-2020.2-148页.pdf

totem-3.26.2-1.el8.ppc64le.rpm

totem_pfc_single.slx.autosave

totem-pl-parser-devel-3.26.1-1.el7.i686.rpm

天翼云应知+从业者+解决方案+高级解决方案

华为云数据中心解决方案技术白皮书.pdf

openstack的qcow2格式的可用镜像

云数据中心网络与SDN：技术架构与实现.pdf

OpenStack cinder对接Ceph实验手册

迁移上云架构设计及解决方案

企业混合云架构设计及解决方案.pdf

中国移动边缘计算技术白皮书

小型私有云搭建方案.pptx

最新资源