没有合适的资源?快使用搜索试试~ 我知道了~
Aardvark协议(BFT)1
试读
16页
需积分: 0 0 下载量 103 浏览量
更新于2022-08-03
收藏 429KB PDF 举报
《Aardvark协议(BFT)1》
在分布式计算领域,拜占庭容错(Byzantine Fault Tolerance,BFT)是确保系统在存在恶意或错误行为的节点时仍能正常运行的关键技术。Aardvark协议是针对这一问题提出的一种新型BFT协议,旨在提高对拜占庭故障的容忍度,同时保持高效的性能。
近年来,虽然已经开发出了一些快速的BFT状态机复制协议,如PBFT(Practical Byzantine Fault Tolerance)、Q/U、HQ和Zyzzyva等,但这些协议在面对单个故障的客户端或服务器时表现得相当脆弱。它们的性能可能会大幅度下降,甚至导致长时间的服务不可用。这种现象与BFT系统的初衷相悖,因为真正的拜占庭容错系统应该能够在出现故障时依然提供可靠服务。
本论文首先揭示了现有BFT协议的脆弱性,指出它们在面对拜占庭故障时的不足。然后,它提出了构建BFT服务的一系列原则,以确保即使在发生拜占庭故障时,系统仍能保持实用性。基于这些原则,研究者设计并实现了一种新的BFT协议——Aardvark。
Aardvark协议的设计目标是在保持高效性能的同时,增强对拜占庭故障的容忍度。在测试中,Aardvark协议的峰值性能可以达到最优秀的现有协议的96%,并且在最多f个服务器和任意数量的客户端出现故障时,仍然能提供相当一部分性能。实验结果显示,对于各种注入的故障,Aardvark协议在吞吐量上表现出色,范围在每秒11706到38667个请求之间。
Aardvark协议的核心创新在于其对拜占庭故障的处理策略和优化机制。它采用了更精细的故障检测和恢复机制,以及更为灵活的共识算法,以减少单点故障对整个系统的影响。此外,Aardvark还通过改进通信和验证流程,降低了处理故障时的延迟,提高了系统的可用性。
Aardvark协议是对传统BFT协议的一次重要改进,它不仅提升了系统的容错能力,而且在性能上也取得了显著的提升。这为构建更安全、更可靠的分布式系统提供了新的解决方案,尤其是在那些对容错性和性能有高要求的应用场景中,Aardvark协议具有很大的应用潜力。
USENIX Association NSDI ’09: 6th USENIX Symposium on Networked Systems Design and Implementation 153
Making Byzantine Fault Tolerant Systems
Tolerate Byzantine Faults
Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin
The University of Texas at Austin
Mirco Marchetti
The University of Modena and Reggio Emilia
Abstract
This paper argues for a new approach to building Byzan-
tine fault tolerant replication systems. We observe that
although recently developed BFT state machine replica-
tion protocols are quite fast, they don’t tolerate Byzantine
faults very well: a single faulty client or server is capa-
ble of rendering PBFT, Q/U, HQ, and Zyzzyva virtually
unusable. In this paper, we (1) demonstrate that exist-
ing protocols are dangerously fragile, (2) define a set of
principles for constructing BFT services that remain use-
ful even when Byzantine faults occur, and (3) apply these
principles to construct a new protocol, Aardvark. Aard-
vark can achieve peak performance within 40% of that of
the best existing protocol in our tests and provide a sig-
nificant fraction of that performance when up to f servers
and any number of clients are faulty. We observe useful
throughputs between 11706 and 38667 requests per sec-
ond for a broad range of injected faults.
1 Introduction
This paper is motivated by a simple observation: al-
though recently developed BFT state machine replica-
tion protocols have driven the costs of BFT replication
to remarkably low levels [1, 8, 12, 18], the reality is that
they don’t tolerate Byzantine faults very well. In fact, a
single faulty client or server can render these systems ef-
fectively unusable by inflicting multiple orders of mag-
nitude reductions in throughput and even long periods
of complete unavailability. Performance degradations of
such degree are at odds with what one would expect from
a system that calls itself Byzantine fault tolerant—after
all, if a single fault can render a system unavailable, can
that system truly be said to tolerate failures?
To illustrate the the problem, Table 1 shows the mea-
sured performance of a variety of systems both in the
absence of failures and when a single faulty client sub-
mits a carefully crafted series of requests. As we show
later, a wide range of other behaviors—faulty primaries,
recovering replicas, etc.—can have a similar impact. We
believe that these collapses are byproducts of a single-
minded focus on designing BFT protocols with ever
more impressive best-case performance. While this fo-
cus is understandable—after years in which BFT repli-
cation was dismissed as too expensive to be practical,
it was important to demonstrate that high-performance
BFT is not an oxymoron—it has led to protocols whose
complexity undermines robustness in two ways: (1) the
protocols’ design includes fragile optimizations that al-
low a faulty client or server to knock the system off of
the optimized execution path to an expensive alternative
path and (2) the protocols’ implementation often fails to
handle properly all of the intricate corner cases, so that
the implementations are even more vulnerable than the
protocols appear on paper.
The primary contribution of this paper is to advocate a
new approach, robust BFT (RBFT), to building BFT sys-
tems. Our goal is to change the way BFT systems are de-
signed and implemented by shifting the focus from con-
structing high-strung systems that maximize best case
performance to constructing systems that offer accept-
able and predictable performance under the broadest pos-
sible set of circumstances—including when faults occur.
System Peak Throughput Faulty Client
PBFT [8] 61710 0
Q/U [1] 23850 0
†
HQ [12] 7629 N/A
‡
Zyzzyva [18] 65999 0
Aardvark 38667 38667
Table 1: Observed peak throughput of BFT systems in a
fault-free case and when a single faulty client submits a
carefully crafted series of requests. We detail our mea-
surements in Section 7.2.
†
The result reported for Q/U is
for correct clients issuing conflicting requests.
‡
The HQ
prototype demonstrates fault-free performance and does
not implement many of the error-handling steps required
to handle inconsistent MACs.
154 NSDI ’09: 6th USENIX Symposium on Networked Systems Design and Implementation USENIX Association
RBFT explicitly considers performance during both
gracious intervals—when the network is synchronous,
replicas are timely and fault-free, and clients correct—
and uncivil execution intervals in which network links
and correct servers are timely, but up to f =
n−1
3
servers and any number of clients are faulty. The last
row of Table 1 shows the performance of Aardvark, an
RBFT state machine replication protocol whose design
and implementation are guided by this new philosophy.
In some ways, Aardvark is very similar to traditional
BFT protocols: clients send requests to a primary who
relays requests to the replicas who agree (explicitly or
implicitly) on the sequence of requests and the corre-
sponding results—not unlike PBFT [8], High through-
put BFT [19], Q/U [1], HQ [12], Zyzzyva [18], ZZ [32],
Scrooge [28], etc.
In other ways, Aardvark is very different and chal-
lenges conventional wisdom. Aardvark utilizes signa-
tures for authentication, even though, as Castro correctly
observes, “eliminating signatures and using MACs in-
stead eliminates the main performance bottleneck in pre-
vious systems” [7]. Aardvark performs regular view
changes, even though view changes temporarily prevent
the system from doing useful work. Aardvark utilizes
point to point communication, even though renouncing
IP-multicast gives up throughput deliberately.
We reach these counter-intuitive choices by following
a simple and systematic approach: without ever compro-
mising safety, we deliberately refocus both the design
of the system and the engineering choices involved in
its implementation on the stress that failures can impose
on performance. In applying this strategy for RBFT to
construct Aardvark, we choose an extreme position in-
spired by maxi-min strategies in game theory [26]: we
reject any optimization for gracious executions that can
decrease performance during uncivil executions.
Surprisingly, these counter-intuitive choices impose
only a modest cost on its peak performance. As Table 1
illustrates, Aardvark sustains peak throughput of 38667
requests/second, which is within 40% of the best perfor-
mance we measure on the same hardware for four stat-
of-the-art protocols. At the same time, Aardvark’s fault
tolerance is dramatically improved. For a broad range
of client, primary, and server misbehaviors we prove that
Aardvark’s performance remains within a constant fac-
tor of its best case performance. Testing of the prototype
shows that these changes significantly improve robust-
ness under a range of injected faults.
Once again, however, the main contribution of this pa-
per is neither the Aardvark protocol nor implementation.
It is instead a new approach that can—and we believe
should—be applied to the design of other BFT protocols.
In particular, we (1) demonstrate that existing protocols
and their implementations are fragile, (2) argue that BFT
protocols should be designed and implemented with a fo-
cus on robustness, and (3) use Aardvark to demonstrate
that the RBFT approach is viable: we gain qualitatively
better performance during uncivil intervals at only mod-
est cost to performance during gracious intervals.
In Section 2 we describe our system model and the
guarantees appropriate for high assurance systems. In
Section 3 we elaborate on the need to rethink Byzan-
tine fault tolerance and identify a set of design principles
for RBFT systems. In Section 4 we present a system-
atic methodology for designing RBFT systems and an
overview of Aardvark. In Section 5 we describe in detail
the important components of the Aardvark protocol. In
Section 6 we present an analysis of Aardvark’s expected
performance. In Section 7 we present our experimental
evaluation. In Section 8 we discuss related work.
2 System model
We assume the Byzantine failure model where faulty
nodes (servers or clients) can behave arbitrarily [21] and
a strong adversary can coordinate faulty nodes to com-
promise the replicated service. We do, however, assume
the adversary cannot break cryptographic techniques like
collision-resistant hashing, message authentication codes
(MACs), encryption, and signatures. We denote a mes-
sage X signed by principal p’s public key as X
σ
p
. We
denote a message X with a MAC appropriate for princi-
pals p and r as X
µ
r,p
. We denote a message containing
a MAC authenticator—an array of MACs appropriate for
verification by every replica—as X
µ
r
Our model puts no restriction on clients, except that
their number be finite: in particular, any number of
clients can be arbitrarily faulty. However, the system’s
safety and liveness properties are guaranteed only if at
most f =
n−1
3
servers are faulty.
Finally, we assume an asynchronous network where
synchronous intervals, during which messages are deliv-
ered with a bounded delay, occur infinitely often.
Definition 1 (Synchronous interval). During a syn-
chronous interval any message sent between correct pro-
cesses is delivered within a bounded delay T if the sender
retransmits according to some schedule until it is deliv-
ered.
3 Recasting the problem
The foundation of modern BFT state machine replication
rests on an impossibility result and on two principles that
assist us in dealing with it. The impossibility result, of
course, is FLP [13], which states that no solution to con-
sensus can be both safe and live in an asynchronous sys-
tems if nodes can fail. The two principles, first applied
by Lamport to his Paxos protocol [20], are at the core
of Castro and Liskov’s seminal work on PBFT [7]. The
first states that synchrony must not be needed for safety:
USENIX Association NSDI ’09: 6th USENIX Symposium on Networked Systems Design and Implementation 155
as long as a threshold of faulty servers is not exceeded,
the replicated service must always produce linearizable
executions, independent of whether the network loses,
reorders, or arbitrarily delays messages. The second rec-
ognizes, given FLP, that synchrony must play a role in
liveness: clients are guaranteed to receive replies to their
requests only during intervals in which messages sent to
correct nodes are received within some fixed (but poten-
tially unknown) time interval from when they are sent.
Within these boundaries, the engineering of BFT pro-
tocols has embraced Lampson’s well-known recommen-
dation: “Handle normal and worst case separately as a
rule because the requirements for the two are quite dif-
ferent. The normal case must be fast. The worst case
must make some progress” [22]. Ever since PBFT, the
design of BFT systems has then followed a predictable
pattern: first, characterize what defines the normal (com-
mon) case; then, pull out all the stops to make the system
perform well for that case. While different systems don’t
completely agree on what defines the common case [16],
on one point they are unanimous: the common case in-
cludes only gracious executions, defined as follows:
Definition 2 (Gracious execution). An execution is gra-
cious iff (a) the execution is synchronous with some
implementation-dependent short bound on message de-
lay and (b) all clients and servers behave correctly.
The results of this approach continue to be spectac-
ular. Since Zyzzyva last year reported a throughput of
over 85,000 null requests per second [18], several new
protocols have further improved on that mark [16, 28].
Despite these impressive results, we argue that a sin-
gle minded focus on aggressively tuning BFT systems
for the best case of gracious execution, a practice that
we have engaged in with relish [18], is increasingly mis-
guided, dangerous, and even futile.
It is misguided, because it encourages the design and
implementation of systems that fail to deliver on their ba-
sic promise: to tolerate Byzantine faults. While provid-
ing impressive throughput during gracious executions,
today’s high-performance BFT systems are content to
guaranteeing weak liveness guarantees (e.g. “eventual
progress”) in the presence of Byzantine failures. Unfor-
tunately, as we previewed in Figure 1 and show in detail
in Section 7.2, these guarantees are weak indeed. Al-
though current BFT systems can survive Byzantine faults
without compromising safety, we contend that a system
that can be made completely unavailable by a simple
Byzantine failure can hardly be said to tolerate Byzan-
tine faults.
It is dangerous, because it encourages fragile opti-
mizations. Fragile optimizations are harmful in two
ways. First, as we will see in Section 7.2, they make it
easier for a faulty client or server to knock the system off
its hard-won optimized execution path and enter an alter-
native, much more expensive one. Second, they weigh
down the system with subtle corner cases, increasing the
likelihood of buggy or incomplete implementations.
It is (increasingly) futile, because the race to optimize
common case performance has reached a point of dimin-
ishing return where many services’ peak demands are al-
ready far under the best-case throughput offered by ex-
isting BFT replication protocols. For such systems, good
enough is good enough, and further improvements in best
case agreement throughput will have little effect on end-
to-end system performance.
In our view, a BFT system fulfills its obligations
when it provides acceptable and dependable performance
across the broadest possible set of executions, including
executions with Byzantine clients and servers. In par-
ticular, the temptation of fragile optimizations should be
resisted: a BFT system should be designed around an
execution path that has three properties: (1) it provides
acceptable performance, (2) it is easy to implement, and
(3) it is robust against Byzantine attempts to push the sys-
tem away from it. Optimizations for the common case
should be accepted only as long as they don’t endanger
these properties.
FLP tells us that we cannot guarantee liveness in an
asynchronous environment. This is no excuse to cling to
gracious executions only. In particular, there is no theo-
retical reason why BFT systems should not be expected
to perform well in what we call uncivil executions:
Definition 3 (Uncivil execution). An execution is
uncivil iff (a) the execution is synchronous with some
implementation-dependent short bound on message de-
lay, (b) up to f servers and an arbitrary number of clients
are Byzantine, and (c) all remaining clients and servers
are correct.
Hence, we propose to build RBFT systems that pro-
vide adequate performance during uncivil executions.
Although we recognize that this approach is likely to re-
duce the best case performance, we believe that for a
BFT system a limited reduction in peak throughput is
preferable to the devastating loss of availability that we
report in Figure 1 and Section 7.2.
Increased robustness may come at effectively no ad-
ditional cost as long as a service’s peak demand is be-
low the throughput achievable through RBFT design:
as a data point, our Aardvark prototype reaches a peak
throughput of 38667 req/s.
Similarly, when systems have other bottlenecks, Am-
dahl’s law limits the impact of changing the performance
of agreement. For example, we report in Section 7 that
PBFT can execute almost 62,000 null requests per sec-
ond, suggesting that agreement consumes 16.1µs per re-
quest. If, rather than a null service, we replicate a service
156 NSDI ’09: 6th USENIX Symposium on Networked Systems Design and Implementation USENIX Association
for which executing an average request consumes 100µs
of processing time, then peak throughput with PBFT set-
tles to about 8613 requests per second. For the same ser-
vice, a protocol with twice the agreement overhead of
PBFT (i.e., 32.2µs per request), would still achieve peak
throughput of about 7564 requests/second: in this hy-
pothetical example, doubling agreement overhead would
reduce peak end-to-end throughput by about 12%.
4 Aardvark: RBFT in action
Aardvark is a new BFT system designed and imple-
mented to be robust to failures. The Aardvark pro-
tocol consists of 3 stages: client request transmission,
replica agreement, and primary view change. This is the
same basic structure of PBFT [8] and its direct descen-
dants [4, 18, 19, 33, 32], but revisited with the goal of
achieving an execution path that satisfies the properties
outlined in the previous section: acceptable performance,
ease of implementation, and robustness against Byzan-
tine disruptions. To avoid the pitfalls of fragile opti-
mizations, we focus at each stage of the protocol on how
faulty nodes, by varying both the nature and the rate of
their actions and omissions, can limit the ability of cor-
rect nodes to perform in a timely fashion what the proto-
col requires of them. This systematic methodology leads
us to the three main design differences between Aardvark
and previous BFT systems: (1) signed client requests, (2)
resource isolation, and (3) regular view changes.
Signed client requests. Aardvark clients use digital
signatures to authenticate their requests. Digital signa-
tures provide non-repudiation and ensure that all correct
replicas make identical decisions about the validity of
each client request, eliminating a number of expensive
and tricky corner cases found in existing protocols that
make use of weaker (though faster) message authentica-
tion code (MAC) authenticators [7] to authenticate client
requests. The difficulty with utilizing MAC authentica-
tors is that they do not provide the non-repudiation prop-
erty of digital signatures—one node validating a MAC
authenticator does not guarantee that any other nodes
will validate that same authenticator [2].
As we mentioned in the Introduction, digital signa-
tures are generally seen as too expensive to use. Aard-
vark uses them only for client requests where it is pos-
sible to push the expensive act of generating the signa-
ture onto the client while leaving the servers with the
less expensive verification operation. Primary-to-replica,
replica-to-replica, and replica-to-client communication
rely on MAC authenticators. The quorum-driven nature
of server-initiated communication ensures that a single
faulty replica is unable to force the system into undesir-
able execution paths.
Because of the additional costs associated with verify-
ing signatures in place of MACs, Aardvark must guard
Replica
Replica
Replica
Replica
Clients
Figure 1: Physical network in Aardvark.
against new denial-of-service attacks where the system
receives a large numbers of requests with signatures that
need to be verified. Our implementation limits the num-
ber of signature verifications a client can inflict on the
system by (1) utilizing a hybrid MAC-signature construct
to put a hard limit on the number of faulty signature veri-
fications a client can inflict on the system and (2) forcing
a client to complete one request before issuing the next.
Resource isolation. The Aardvark prototype imple-
mentation explicitly isolates network and computational
resources.
As illustrated by Fig. 1, Aardvark uses separate net-
work interface controllers (NICs) and wires to connect
each pair of replicas. This step prevents a faulty server
from interfering with the timely delivery of messages
from good servers, as happened when a single broken
NIC shut down the immigration system at the Los An-
geles International Airport [9]. It also allows a node to
defend itself against brute force denial of service attacks
by disabling the offending NIC. However, using phys-
ically separate NICs for communication between each
pair of servers incurs a performance hit, as Aardvark can
no longer use hardware multicast to optimize all-to-all
communication.
As Figure 2 shows, Aardvark uses separate work
queues for processing messages from clients and indi-
vidual replicas. Employing a separate queue for client
requests prevents client traffic from drowning out the
replica-to-replica communications required for the sys-
tem to make progress. Similarly, employing a sepa-
rate queue for each replica allows Aardvark to sched-
ule message processing fairly, ensuring that a replica is
able to efficiently gather the quorums it needs to make
progress. Aardvark can also easily leverage separate
hardware threads to process incoming client and replica
requests. Taking advantage of hardware parallelism al-
lows Aardvark to reclaim part of the costs paid to verify
signatures on client requests.
剩余15页未读,继续阅读
资源推荐
资源评论
5星 · 资源好评率100%
193 浏览量
2019-08-15 上传
136 浏览量
2021-02-13 上传
2021-04-30 上传
2021-03-23 上传
2021-02-04 上传
102 浏览量
124 浏览量
2019-10-24 上传
2021-02-06 上传
187 浏览量
2021-05-23 上传
2022-08-04 上传
2021-02-25 上传
197 浏览量
2021-04-17 上传
123 浏览量
2019-09-22 上传
2024-04-27 上传
2019-07-31 上传
2019-09-22 上传
5星 · 资源好评率100%
2022-10-11 上传
5星 · 资源好评率100%
197 浏览量
资源评论
马李灵珊
- 粉丝: 41
- 资源: 297
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功