没有合适的资源?快使用搜索试试~ 我知道了~
资源推荐
资源详情
资源评论
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/2603291
The Totem Single-Ring Ordering and Membership Protocol
ArticleinACM Transactions on Computer Systems · March 1999
DOI: 10.1145/210223.210224·Source: CiteSeer
CITATIONS
247
READS
1,064
5 authors, including:
Some of the authors of this publication are also working on these related projects:
AmeriFlux Management Project View project
Louise Moser
University of California, Santa Barbara
284 PUBLICATIONS6,364 CITATIONS
SEE PROFILE
Peter Melliar-Smith
University of California, San Francisco
251 PUBLICATIONS5,656 CITATIONS
SEE PROFILE
Deborah A. Agarwal
Lawrence Berkeley National Laboratory
107 PUBLICATIONS2,045 CITATIONS
SEE PROFILE
All content following this page was uploaded by Deborah A. Agarwal on 26 October 2014.
The user has requested enhancement of the downloaded file.
The Totem Single-Ring Ordering
and Memb ership Protocol
Y. AMIR, L. E. MOSER, P. M. MELLIAR-SMITH, D. A. AGARWAL, P. CIARFELLA
University of California, Santa Barbara
Fault-tolerant distributed systems are becoming more imp ortant but, in existing
systems, maintaining the consistency of replicated data is quite exp ensive. The
Totem single-ring proto col supports consistent concurrent operations by placing a
total order on broadcast messages. This total order is derived from the sequence
numb er in a token that circulates around a logical ring imp osed on a set of pro cessors
in a broadcast domain. The proto col handles reconguration of the system when
processors fail and restart or the network partitions and remerges. Extended virtual
synchrony ensures that pro cessors deliver messages and conguration changes to
the application in a consistent total order system-wide. An eective ow control
mechanism enables the Totem single-ring proto col to achieve message ordering rates
signicantly higher than the b est prior total ordering protocols.
Categories and Sub ject Descriptors: C.2.1 [
Computer-Communications Net-
works
]: Network Architecture and Design|
network communications
; C.2.2 [
Com-
puter Communication Networks
]: Network Protocols|
protocol architecture
;
C.2.4 [
Computer-Communication Networks
]: Distributed Systems|
network
operating systems
; C.2.5 [
Computer-Communication Networks
]: Local
Networks|
rings
; D.4.4 [
Operating Systems
]: Communications Management|
network communication
; D.4.5 [
Operating Systems
]: Reliability|
fault-tolerance
;
D.4.7 [
Operating Systems
]: Organization and Design|
distributed systems
General Terms: Protocols, Performance, Reliability
Additional Key Words and Phrases: Flow control, membership, reliable delivery,
token passing, total ordering, virtual synchrony
Earlier versions of the Totem single-ring protocol appeared in the Proceedings of the IEE Interna-
tional Conference on Information Engineering, Singap ore (December 1991) and in the Pro ceedings
of the IEEE 13th International Conference on Distributed Computing Systems, Pittsburgh, PA
(May 1993).
This research was supported by NSF Grant No. NCR-9016361, ARPA Contract No. N00174-93-
K-0097, and Ro ckwell CMC/State of California MICRO Grant No. 92-101.
Authors' Addresses: Y. Amir, Computer Science Department, The Hebrew University of
Jerusalem, Israel; L. E. Moser and P. M. Melliar-Smith, Department of Electrical and Com-
puter Engineering, University of California, Santa Barbara, CA 93106; D. A. Agarwal, Lawrence
Berkeley National Laboratory, Berkeley, CA 94720; P. Ciarfella, Cascade Communications Cor-
poration, 5 Carlisle Road, Westford, MA 01886.
Permission to copy without fee all or part of this material is granted provided that the copies are
not made or distributed for direct commercial advantage, the ACM copyright notice and the title
of the publication and its date app ear, and notice is given that copying is by p ermission of the
Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or
specic p ermission.
2
Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, P. Ciarfella
1. INTRODUCTION
Fault-tolerant distributed systems are b ecoming more imp ortant due to the increas-
ing demand for more reliable operation and improved p erformance. Maintaining
the consistency of replicated data and co ordinating the activities of coop erating
processors present substantial problems, which are made more dicult by concur-
rency, asynchrony, fault-tolerance, and real-time p erformance requirements. Exist-
ing fault-tolerant distributed systems that address these problems are dicult to
program, and exp ensive in the numb er of messages broadcast and/or computations
required. Recent protocols for fault-tolerant distributed systems [2; 5; 9; 12; 17]
employ the idea of placing a partial or total order on broadcast messages to simplify
the application programs and to reduce the communication and computation costs.
The Totem single-ring proto col supp orts high-performance fault-tolerant dis-
tributed systems that must continue to operate despite network partitioning and
remerging, and pro cessor failure and restart. Totem provides totally ordered mes-
sage delivery with low overhead, high throughput, and low latency using a logical
token-passing ring imp osed on a broadcast domain. The key to its high perfor-
mance is an eective ow control mechanism. Totem also provides rapid detection
of network partitioning and pro cessor failure together with reconguration and
membership services. Its novel mechanisms prevent delivery of messages in dier-
ent orders in dierent comp onents of a partitioned network, and provide accurate
information about which processors have delivered which messages. Earlier versions
of the Totem single-ring proto col are describ ed in [3; 11].
Programming the application is considerably simplied if messages are delivered
in total order rather than only in causal order, or if messages are delivered in causal
order rather than only in FIFO order. In prior systems, delivery of messages in
total order has b een more exp ensive than delivery of messages in causal order, and
delivery of messages in causal order has b een more expensive than FIFO delivery.
The Totem single-ring proto col can, however, deliver totally ordered messages with
high throughput at no greater cost than causally ordered messages or, indeed,
than reliable point-to-point FIFO messages. A total order on messages simplies
the application programming by reducing the risk of inconsistency when replicated
data are updated, and by resolving the contention for shared resources within the
system, such as the claiming of locks.
In Totem, messages are delivered in agreed order, which guarantees that pro ces-
sors deliver messages in a consistent total order and that, when a pro cessor delivers
a message, it has already delivered all prior messages originated within its current
conguration. Totem also provides delivery in safe order, which guarantees addi-
tionally that, when a processor delivers a message, it has determined that every
processor in the current conguration has received and will deliver the message un-
less that processor fails. Delivery of a message in agreed or safe order is requested
by the originator of the message.
Delivery in a consistent total order is not easy to achieve in distributed systems
that are sub ject to pro cessor failure and network partitioning. A failing pro cessor,
or a group of pro cessors that have become isolated, may deliver messages in an
order that is dierent from the order determined by other processors. As long as
those pro cessors remain failed or isolated, these inconsistencies are not apparent,
The Totem Single-Ring Ordering and Memb ership Proto col
3
Application
messages
Broadcast
Best-Effort Broadcast Domain
Membership
Join messages
Commit token
Recovery
Configuration Change
messages
Token loss
Configuration changes
Install
Foreign messages
Total Ordering
Flow Control
Reliable Delivery
Fig. 1: The Totem single-ring proto col hierarchy.
but as so on as a pro cessor is repaired and readmitted to the system, or as so on
as the comp onents of a partitioned system are remerged, the inconsistencies in the
message order may b ecome manifest and recovery may be dicult. The Totem
protocol cannot guarantee that every pro cessor is able to deliver every message but
it do es guarantee that, if two pro cessors deliver a message, they deliver the message
in the same total order.
The application programs may also need to know ab out conguration changes.
Dierent pro cessors may learn of a conguration change at dierent times, but
they must form consistent views of the conguration change and of the messages
that precede or follow the conguration change. Birman [5] devised the concept of
virtual synchrony, which ensures that processors deliver messages consistently in
the event of pro cessor fail-stop faults. We have generalized this concept to extended
virtual synchrony [14], which applies to systems in which the network can partition
and remerge, and in which pro cessors can fail and restart with stable storage intact.
The Totem single-ring proto col is designed to operate over a single broadcast
domain, such as an Ethernet. It uses the Unix UDP service, which provides a b est-
eort multicast service over such media. Other media that provide a b est-eort
multicast service, such as ATM or the Internet MBone, can be used to construct
the broadcast domain needed by Totem.
The software architecture of the Totem single-ring proto col is shown in Figure 1.
The arrows on the left represent the passage of messages through the proto col hi-
erarchy, while the arrows on the right represent Conguration Change messages
and conguration installation. Using a logical token-passing ring imposed on the
physical broadcast domain, the single-ring proto col provides reliable totally ordered
messsage delivery and eective ow control. On detection of token loss, or on receiv-
ing a message from a pro cessor not on its ring, a pro cessor invokes the membership
protocol to form a new ring using Join messages and a Commit token transmit-
ted over the broadcast domain. The membership proto col activates the recovery
4
Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, P. Ciarfella
protocol with the prop osed conguration change. The recovery proto col uses the
single-ring ordering proto col to recover missing messages. The processor then in-
stalls the new ring, by delivering Conguration Change messages to the application.
2. RELATED WORK
Our work on the Totem proto col is based on our combined exp erience with two
systems: the Trans and Total reliable ordered broadcast and membership proto cols
[12; 16] and the Transis group communication system [1; 2].
The Trans proto col uses p ositive and negative acknowledgments piggybacked
onto broadcast messages and exploits the transitivity of p ositive acknowledgments
to reduce the numb er of acknowledgments required. The Total proto col, layered on
top of the Trans protocol, is a fault-tolerant total ordering proto col that continues
to order messages provided that a resiliency constraint is met. The memb ership
protocol, layered on top of the Total protocol, ensures that each change in the
membership occurs at the same logical time in each pro cessor, corresp onding to
a p osition in the total order. The Totem proto col was developed to address the
computational overhead of Trans and Total, and is intended for lo cal-area networks
with fast and highly reliable communication.
The Transis group communication system provides reliable ordered group mul-
ticast and membership services. Transis initially based its ordering protocol on
the Trans proto col but, more recently, has also included the Totem proto col for
message ordering. Unlike other prior proto cols, the Transis membership protocol
supports remerging of a partitioned network, and maintains a consensus view of the
membership of each component, rather than a global consensus view of the entire
system. The Totem membership proto col uses the idea, rst proposed for Transis,
that the membership can be reduced in size to ensure termination.
In [7] Chang and Maxemchuk describ ed a reliable broadcast and ordering pro-
tocol that uses a token-based sequencer strategy. Unlike Totem, which requires
that a processor must hold the token to broadcast a message, their protocol allows
processors to broadcast messages at any time. The processor holding the token is
responsible for broadcasting an acknowledgment message that includes a sequence
numb er for each message acknowledged. A processor that has not received a mes-
sage sends a negative acknowledgment to request retransmission by the processor
that acknowledged the message. While the latency is go o d at low loads, it increases
at high loads and in the presence of a failed pro cessor.
More closely related to Totem is the TPM protocol of Ra jagopalan and McKin-
ley [18], which also uses a token to control broadcasting and sequencing of messages.
The TPM protocol provides the safe delivery but not the agreed delivery that Totem
provides. In the absence of processor failure and network partitioning, TPM re-
quires on average two and one-half token rotations for safe delivery, whereas Totem
requires two token rotations. In the event of network partitioning, only the comp o-
nent containing a ma jority of the pro cessors continues to op erate; pro cessors in the
other comp onents block. In contrast, Totem handles network partitioning and re-
merging by allowing each comp onent of a partitioned system to continue op erating,
not just the component that contains a ma jority of the pro cessors.
Birman's Isis system [5], and the more recent Horus system [19], have fo cused
on pro cess groups and the application program interface. Isis provides BCAST
剩余29页未读,继续阅读
资源评论
- do_er2019-10-09学习一下先
青阳Jayan
- 粉丝: 7
- 资源: 1
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功