Designing Distributed Systems Using
Approximate Synchrony in Data Center Networks
Dan R. K. Ports Jialin Li Vincent Liu Naveen Kr. Sharma Arvind Krishnamurthy
University of Washington
{drkp,lijl,vincent,naveenks,arvind}@cs.washington.edu
Abstract
Distributed systems are traditionally designed indepen-
dently from the underlying network, making worst-case
assumptions (e.g., complete asynchrony) about its behav-
ior. However, many of today’s distributed applications are
deployed in data centers, where the network is more re-
liable, predictable, and extensible. In these environments,
it is possible to co-design distributed systems with their
network layer, and doing so can offer substantial benefits.
This paper explores network-level mechanisms for pro-
viding Mostly-Ordered Multicast (MOM): a best-effort
ordering property for concurrent multicast operations. Us-
ing this primitive, we design Speculative Paxos, a state
machine replication protocol that relies on the network to
order requests in the normal case. This approach leads to
substantial performance benefits: under realistic data cen-
ter conditions, Speculative Paxos can provide 40% lower
latency and 2.6× higher throughput than the standard
Paxos protocol. It offers lower latency than a latency-
optimized protocol (Fast Paxos) with the same throughput
as a throughput-optimized protocol (batching).
1 Introduction
Most distributed systems are designed independently from
the underlying network. For example, distributed algo-
rithms are typically designed assuming an asynchronous
network, where messages may be arbitrarily delayed,
dropped, or reordered in transit. In order to avoid making
assumptions about the network, designers are in effect
making worst-case assumptions about it.
Such an approach is well-suited for the Internet, where
little is known about the network: one cannot predict what
paths messages might take or what might happen to them
along the way. However, many of today’s applications are
distributed systems that are deployed in data centers. Data
center networks have a number of desirable properties
that distinguish them from the Internet:
• Data center networks are more predictable. They are
designed using structured topologies [8,15,33], which
makes it easier to understand packet routes and ex-
pected latencies.
• Data center networks are more reliable. Congestion
losses can be made unlikely using features such as
Quality of Service and Data Center Bridging [18].
• Data center networks are more extensible. They are
part of a single administrative domain. Combined with
new flexibility provided by modern technologies like
software-defined networking, this makes it possible to
deploy new types of in-network processing or routing.
These differences have the potential to change the way
distributed systems are designed. It is now possible to
co-design distributed systems and the network they use,
building systems that rely on stronger guarantees avail-
able in the network and deploying new network-level
primitives that benefit higher layers.
In this paper, we explore the benefits of co-design in
the context of state machine replication, a performance-critical
component at the heart of many important data center
services. Our approach is to treat the data center as an
approximation of a synchronous network, in contrast to
the asynchronous model of the Internet. We introduce two
new mechanisms: a network-level primitive called
Mostly-Ordered Multicast, and the Speculative Paxos
replication protocol, which leverages approximate synchrony
to provide higher performance in data centers.
The first half of our approach is to engineer the network
to provide stronger ordering guarantees. We introduce a
Mostly-Ordered Multicast primitive (MOM), which pro-
vides a best-effort guarantee that all receivers will receive
messages from different senders in a consistent order.
We develop simple but effective techniques for provid-
ing Mostly-Ordered Multicast that leverage the structured
topology of a data center network and the forwarding
flexibility provided by software-defined networking.
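
To make the ordering property concrete, the following minimal Python sketch checks whether a set of receivers saw concurrent multicasts in a consistent order. It assumes one plausible reading of "consistent order": any two messages that both reach two receivers appear in the same relative order at each. The function name `ordering_violations` and the log format are hypothetical, chosen only for illustration; the sketch also assumes no message is delivered twice to the same receiver.

```python
from itertools import combinations


def ordering_violations(deliveries):
    """Return message pairs that two receivers delivered in opposite orders.

    deliveries: dict mapping a receiver name to the list of message ids
    it delivered, in delivery order (hypothetical format, no duplicates).
    """
    violations = set()
    for (_, log1), (_, log2) in combinations(deliveries.items(), 2):
        pos2 = {msg: i for i, msg in enumerate(log2)}
        # Messages seen by both receivers, kept in the first receiver's order.
        common = [msg for msg in log1 if msg in pos2]
        for a, b in combinations(common, 2):
            # a precedes b at the first receiver; it is a violation if
            # b precedes a at the second receiver.
            if pos2[a] > pos2[b]:
                violations.add(tuple(sorted((a, b))))
    return violations


if __name__ == "__main__":
    logs = {
        "r1": ["a", "b", "c"],
        "r2": ["a", "c", "b"],  # b and c arrive swapped here
        "r3": ["a", "b"],       # a dropped message is not a reordering
    }
    print(ordering_violations(logs))
```

Under this reading, a best-effort primitive is one for which the returned set is empty in the common case but is not guaranteed to be empty. Dropped messages (as at `r3`) do not count as reorderings in this sketch.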
Building on this MOM primitive is Speculative Paxos, a
new protocol for state machine replication designed for an
environment where reordering is rare. In the normal case,
Speculative Paxos relies on MOM’s ordering guarantees
to efficiently sequence requests, allowing it to execute and
commit client operations with the minimum possible la-
tency (two message delays) and with significantly higher
throughput than Paxos. However, Speculative Paxos re-
mains correct even in the uncommon case where messages
are delivered out of order: it falls back on a reconciliation
protocol that ensures it remains safe and live with the
same guarantees as Paxos.
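
The idea of executing speculatively in arrival order and falling back when replicas diverge can be illustrated with a toy Python sketch. It is not the Speculative Paxos protocol: the names (`Replica`, `logs_match`, `toy_reconcile`), the digest-based comparison, and the "adopt the first replica's log" fallback are all assumptions made for illustration, and the sketch ignores failures, quorums, and clients entirely.

```python
import hashlib


def apply_op(state, op):
    """Apply one operation to an integer state; order of operations matters."""
    kind, value = op
    if kind == "add":
        return state + value
    if kind == "mul":
        return state * value
    raise ValueError(f"unknown operation: {kind}")


class Replica:
    INITIAL_STATE = 1

    def __init__(self):
        self.log = []
        self.state = self.INITIAL_STATE

    def speculative_execute(self, op):
        # Execute immediately, in whatever order the network delivered it.
        self.log.append(op)
        self.state = apply_op(self.state, op)

    def digest(self):
        # Summary of the log, used to compare replicas cheaply.
        return hashlib.sha256(repr(self.log).encode()).hexdigest()

    def replace_log(self, log):
        # Roll back to the initial state and re-execute the given log.
        self.log = []
        self.state = self.INITIAL_STATE
        for op in log:
            self.speculative_execute(op)


def logs_match(replicas):
    """True if every replica executed the same operations in the same order."""
    return len({r.digest() for r in replicas}) == 1


def toy_reconcile(replicas):
    """Toy fallback (an assumption): everyone adopts the first replica's log."""
    chosen = list(replicas[0].log)
    for r in replicas:
        r.replace_log(chosen)


if __name__ == "__main__":
    a, b = Replica(), Replica()
    a.speculative_execute(("add", 2)); a.speculative_execute(("mul", 3))
    b.speculative_execute(("mul", 3)); b.speculative_execute(("add", 2))
    if not logs_match([a, b]):  # the reordering made the replicas diverge
        toy_reconcile([a, b])
    print(a.state, b.state)
```

When the network delivers requests in the same order everywhere, `logs_match` holds and no extra coordination is needed in this toy model; the fallback runs only after a reordering.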
Our experiments demonstrate the effectiveness of this
approach. We find: