2 Front. Comput. Sci., 2021, 15(2): 152605
entry from its current term has been committed. This restriction
may block read requests, to avoid the risk of returning stale
data [8], and may lead to the livelock anomaly [17]. To avoid
these problems, once the new leader is elected, it needs to com-
mit an entry from its term. Specifically, each leader needs to
append a blank no-op entry—which we call the special mark
log entry in this paper—into the log at the start of its term.
Unfortunately, a leader change can leave the log of a replica
inconsistent with that of another node in Raft. This is be-
cause a replica can persist a log entry regardless of whether
the corresponding write is committed. We ana-
lyze the log inconsistency anomaly in Section 3.1. Therefore, to
guarantee the correctness and consistency of the system, when
a replica recovers as a follower, it needs to repair its local log
first. In this paper, we are mainly concerned with how a re-
covering follower repairs its log. The conventional follower log
repair methods (the details are presented in Section 3.2) usually
require multiple network round trips or transmit more data,
which increases follower recovery time.
In this work, we present an accurate and efficient log repair
(AELR) algorithm for follower recovery, which requires only
one network round trip to fetch the minimal set of log entries
from the leader when a follower is recovering. This algorithm lever-
ages the special mark log entries to accurately find the extrane-
ous log entries inconsistent with the leader’s, which enables the
recovering follower to repair its log accurately and efficiently.
Since we make use of the properties of Raft replication, our fol-
lower log repair method can apply to a database system adopt-
ing the Raft-like protocol. The following is the list of our main
contributions.
• We introduce the notion of the special mark log entry,
which serves as the delimiter at the start of a term. We
then propose the leader’s takeover execution, which uti-
lizes the leader’s own special entry and enables other
replicas to confirm whether that entry is committed.
• We introduce the AELR algorithm, which utilizes the spe-
cial mark log entries to repair the recovering follower’s
log accurately and efficiently. Then we explain why this
mechanism works and analyze it together with other log
repair approaches.
• We have implemented the AELR algorithm in the open
source database system OceanBase. The performance
analysis demonstrates the effectiveness of our method in
terms of recovery time.
This paper is organized as follows. First, we review the Raft
replication in Section 2. In Section 3, we analyze the log incon-
sistency anomaly and summarize the follower log repair meth-
ods. We introduce the special mark log entry and how the leader
uses it to take over in Section 4. Section 5 describes our accurate
and efficient log repair (AELR) algorithm for follower recov-
ery. In Section 6, we introduce the implementation of AELR
in a real database system. Section 7 presents the performance
evaluation. The related works are described in Section 8. We
conclude the paper in Section 9.
2 Preliminaries
In this section, we introduce the log replication model adopt-
ing the strong leadership and log coherency features, which is
based on Raft but differs somewhat from the original. We then
formalize the properties of this replication model.
• Strong leader: The leader replica is responsible for all the
write requests and it is the only one that can generate log
entries.
• Log coherency: There are no holes in the persisted log
(i.e., the log in the non-volatile storage) of each replica.
The hole in a log means that the LSNs of log entries are
not consecutive in the log.
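The log coherency property can be expressed as a simple invariant check. The following sketch is illustrative only; the function name is ours.

```python
def has_hole(lsns):
    """Log coherency check: return True if the LSNs of the persisted
    log entries are not consecutive, i.e., the log contains a hole."""
    return any(b != a + 1 for a, b in zip(lsns, lsns[1:]))
```

Under log coherency, every replica's persisted log must satisfy `has_hole(lsns) == False`.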
Since the datasets of a database can entirely reside within the
main memory, each replica is a main-memory database system
in this work. Although checkpointing and snapshots are other
important aspects of recovery techniques in the database litera-
ture, this work mainly focuses on log repair in Raft-based
systems. For a traditional main-memory database system,
the recovery can be divided into two phases: checkpoint in-
stalling and log replaying. Since a log may contain invalid log
entries, the recovery of a replica using the Raft protocol can be
divided into three phases: log repairing, checkpoint installing
and log replaying. It should be noted that the last two phases
are equal to the traditional two-phase recovery. When recover-
ing as a follower, the replica first repairs its local log. Then it
installs the latest checkpoint and replays the local log entries
from the checkpoint. Obviously, we do not need to modify
the original recovery phases; we only add the log repair
phase. Therefore, the method proposed in this work can be
extended to recovery settings with checkpoints or snapshots.
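The three-phase recovery described above can be sketched as follows. This is a hypothetical outline, assuming illustrative method names that are not OceanBase's actual API.

```python
class RecoveringReplica:
    """Sketch of three-phase follower recovery: the log repair phase
    (added by this work) runs before the two traditional phases."""

    def __init__(self):
        self.phases = []  # records the order in which phases ran

    def repair_log(self):
        # Phase 1 (new): repair the local log so it is consistent
        # with the leader's log.
        self.phases.append("log repairing")

    def install_checkpoint(self):
        # Phase 2: install the latest checkpoint.
        self.phases.append("checkpoint installing")

    def replay_log(self):
        # Phase 3: replay local log entries from the checkpoint.
        self.phases.append("log replaying")

    def recover_as_follower(self):
        self.repair_log()
        self.install_checkpoint()
        self.replay_log()
```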
2.1 The overview of Raft replication
To provide highly available services, the replicated database
systems usually are deployed on a cluster of collaborative com-
modity machines, where each one is a replica node used as a
state machine and mapped to one of the three roles: Leader,
Follower, or Candidate. Traditionally, systems adopting the
Raft protocol have two main phases: leader election and log repli-
cation, whose executions can be overlapped. For ease of de-
scription, the total number of state machines is N.
In the Raft-based system, the lifecycle of the system is di-
vided into consecutive “terms” of arbitrary length, each of
which is numbered with a monotonically increasing integer term_id.
Specifically, if a replica node is elected as the new leader, the
system enters a new leader term whose term_id is greater than
previous terms’.
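The monotonic increase of term_id on election can be captured in a one-line sketch; the function name is illustrative.

```python
def elect_new_leader(known_term_ids):
    """On a new election, the system enters a term whose term_id is
    strictly greater than every previous term's (monotonic increase)."""
    return max(known_term_ids, default=0) + 1
```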
During normal processing of log replication, only the leader
of a term can accept the write requests from clients. Figure 1
shows the model of log replication in the Raft replicated sys-
tem. For clarity, we divide the replication processing into two
steps:
Step 1: When receiving a write from a client, the leader gen-
erates a log entry e and sends the entry e to all replicas. When
a replica receives the entry e, it persists e into its local storage
and then returns an acknowledgment. If the leader gets ac-
knowledgments from a majority of replicas, it enters the
second step.
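Step 1 can be sketched as follows. This is a simplified model under our own assumptions: the hypothetical `Replica.persist` always succeeds and returns an acknowledgment, and the majority test is taken over the N state machines.

```python
class Replica:
    def __init__(self):
        self.storage = []

    def persist(self, entry):
        self.storage.append(entry)  # durable write to local storage
        return True                 # acknowledgment back to the leader


def replicate_step_one(replicas, entry, n):
    """Step 1 sketch: the leader sends entry e to all replicas, counts
    the acknowledgments, and proceeds to Step 2 only if a majority of
    the N state machines have persisted e."""
    acks = sum(1 for r in replicas if r.persist(entry))
    return 2 * acks > n
```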
Step 2: Since the entry e is persisted on a majority of repli-
cas, the leader can confirm that the entry e is committed. Thus,