Consensus in the Presence of Partial Synchrony
CYNTHIA DWORK AND NANCY LYNCH
.Massachusetts Institute of Technology, Cambridge, Massachusetts
AND
LARRY STOCKMEYER
IBM Almaden Research Center, San Jose, California
Abstract. The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies
between the cases of a synchronous system and an asynchronous system. In a synchronous system,
there is a known fixed upper bound A on the time required for a message to be sent from one processor
to another and a known fixed upper bound % on the relative speeds of different processors. In an
asynchronous system no fixed upper bounds A and @ exist. In one version of partial synchrony, fixed
bounds A and Cp exist, but they are not known a priori. The problem is to design protocols that work
correctly in the partially synchronous system regardless of the actual values of the bounds A and Cp. In
another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting
at some unknown time T, and protocols must be designed to work correctly regardless of when time T
occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various
fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the
number of faults tolerated are also given. Our consensus protocols for partially synchronous processors
use new protocols for fault-tolerant “distributed clocks” that allow partially synchronous processors to
reach some approximately common notion of time.
Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems-
distributed applications; distributed databases; network operating systems; C.4 [Computer Systems
Organization]: Performance of Systems-reliability, availability, and serviceability;
H.2.4 [Database
Management]:
Systems-distributed systems
General Terms: Algorithms, Performance, Reliability, Theory, Verification
Additional Key Words and Phrases: Agreement problem, Byzantine Generals problem, commit problem,
consensus problem, distributed clock, distributed computing, fault tolerance, partially synchronous
system
A preliminary version of this paper appears in Proceedings of the 3rd ACM Symposium on Principles
of Distributed Computing (Vancouver, B.C., Canada, Aug. 27-29). ACM, New York, 1984, pp. 103-
118.
The work of C. Dwork was supported by a Bantrell postdoctoral Fellowship. The work of N. Lynch was
supported in part by the Defense Advance Research Projects Agency under contract N00014-83-K-
0125, the National Science Foundation under grants DCR 83-02391 and MCS 83-06854, the Office of
Army Research under Contract DAAG29-84-K-0058, and the Office of Naval Research under contract
NOOO14-85-K-0168.
Authors’ addresses: C. Dwork and L. Stockmeyer, Department K53/802, IBM Almaden Research
Center, 650 Harry Road, San Jose, CA 95 120; N. Lynch, Laboratory for Computer Science, Massachu-
setts Institute of Technology, 545 Technology Square, Cambridge, MA 02 139.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
0 1988 ACM 0004-541 l/88/0400-0288 $01.50
Journal of the Association for Computing Machinery, Vol. 35, No. 2, April 1988,
pp.
288-323.
评论0