An Efficient Fault Tolerance Framework for Distributed In-memory Caching Systems

Shuaibing Zhao, Lu Shen, Yusen Li, Rebecca J. Stones, Gang Wang*, Xiaoguang Liu*

Nankai-Baidu Joint Lab, College of Computer Science,

Nankai University, Tianjin, China

Email:{zhaoshb,shenlu,liyusen,rebecca.stones82,wgzwp,liuxg}@nbjl.nankai.edu.cn

Abstract—With the development of the information age, many large database applications have introduced distributed in-memory object caching systems, of which Memcached is one of the most typical. However, Memcached does not have fault-tolerant capabilities. To make Memcached fault tolerant, Cocytus introduced Reed-Solomon (RS) codes and distributed protocols into Memcached. Cocytus saves significant memory compared to primary-backup replication when tolerating the same number of failures. However, the relatively complex finite-field calculations used by RS codes and the high network transmission cost during data reconstruction are becoming new bottlenecks.

This paper introduces RDP codes into distributed Memcached to optimize the calculation performance in Cocytus. In addition, this paper adopts the RDOR scheme and Collective Reconstruction Read to speed up data reconstruction. Compared with Cocytus, which uses RS codes for fault tolerance, the new distributed Memcached with 4 data nodes and 2 check nodes reduces reconstruction overhead by up to 31%.

Index Terms—Memcached, erasure code, optimal recovery, parallel recovery

I. INTRODUCTION

Relational database management systems struggle to keep pace with modern increases in performance requirements [1] due to limitations of their storage structures [2]. To address this issue, distributed in-memory caching systems such as Memcached [3] and Redis [4] have been proposed. They keep the most frequently accessed data in memory so that user requests are processed without disk operations, thereby improving overall performance. Distributed in-memory caching systems are consequently widely used in large Internet companies such as Facebook [5] and Twitter [6].

In distributed in-memory caching systems, when server nodes crash (or are temporarily unavailable), it is desirable to have a mechanism to recover the data on the failed nodes from the surviving nodes. Otherwise, the lost data must be reloaded from disk, and the long loading time hurts system performance. A traditional approach to fault tolerance is primary-backup replication (PBR) [7]. In this approach, each primary node has some backup nodes that store data replicas. When a primary node crashes, one of the backup nodes acts as the new primary node. Although this approach provides continuous service in the presence of node failures, the data redundancy is high.

Erasure codes [8] (such as the well-known Reed-Solomon codes [9]) have been proven very efficient in providing higher levels of fault tolerance at lower cost, and are widely used in today's distributed systems [10]–[12]. In erasure-coded distributed systems, server nodes are classified into data nodes (where the raw data is stored) and check nodes (where the parity check data is stored). The parity check data is computed from the raw data. Data nodes and check nodes are organized into coding groups. The raw data and parity check data are both divided into equal-sized data units. If some data units (either raw data units or parity check data units) in a coding group are lost due to node failures, the lost data units can be recovered using other data units belonging to the same coding group.
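The idea of recovering a lost unit from the rest of its coding group can be illustrated with the simplest possible erasure code, a single XOR parity unit. This is a minimal sketch only: the unit contents and the helper name `xor_units` are hypothetical, and a single parity unit tolerates just one failure, unlike the RS and RDP codes discussed in this paper.

```python
from functools import reduce

def xor_units(units):
    """Byte-wise XOR of equal-sized data units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))

# Hypothetical coding group: three raw data units of equal size,
# each held by a different data node.
data_units = [b"key1=aaa", b"key2=bbb", b"key3=ccc"]

# The parity check unit is computed from the raw data and held by a check node.
parity_unit = xor_units(data_units)

# Simulate losing the node that holds data_units[1]:
# XOR-ing every surviving unit of the group rebuilds the lost unit.
survivors = [data_units[0], data_units[2], parity_unit]
recovered = xor_units(survivors)
```

Here `recovered` equals the lost unit because every byte of the surviving data units cancels out of the parity under XOR.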

Based on Reed-Solomon codes (RS codes) [9], [13], Zhang et al. [14] proposed Cocytus, which applied erasure codes to distributed in-memory caching systems for the first time. An RS coding system with k data nodes and n − k check nodes (normally denoted by RS(n, k)) can handle at most n − k node failures [13]. However, to recover one data unit in an RS(n, k) coding system, k data units must be fetched from different nodes to perform the finite-field computation. The overheads in both computation and data transmission are high, which limits the throughput of the system.
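The trade-off between PBR and RS(n, k) can be made concrete with a little arithmetic. The sketch below counts data units per coding group; the function names are hypothetical, and it assumes that PBR needs one full replica per tolerated failure and that recovery under PBR copies a unit from a single replica.

```python
def rs_costs(n, k):
    """Costs for one RS(n, k) coding group, counted in data units."""
    return {
        "tolerated_failures": n - k,
        "stored_units": n,                 # k raw units + (n - k) parity units
        "reads_to_recover_one_unit": k,    # k units fetched from different nodes
    }

def pbr_costs(k, failures):
    """Costs for primary-backup replication of k raw units (assumed model)."""
    return {
        "tolerated_failures": failures,
        "stored_units": k * (failures + 1),  # each primary plus its backups
        "reads_to_recover_one_unit": 1,      # copy the unit from one replica
    }

# The 4 data node + 2 check node configuration evaluated in this paper.
rs = rs_costs(n=6, k=4)
pbr = pbr_costs(k=4, failures=rs["tolerated_failures"])
```

Under these assumptions, RS(6, 4) stores 6 units where PBR stores 12 for the same two tolerated failures, but it must fetch 4 units to rebuild a single lost one. That recovery cost is the bottleneck this paper targets.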

In this paper, we propose an efficient fault-tolerance framework for distributed in-memory caching systems that utilizes:

1) Row-Diagonal Parity (RDP) codes [15]. RDP codes use only XOR operations to compute parity check units and to perform recovery, and XOR operations are faster than the finite-field operations in RS codes.

2) The Row-Diagonal Optimal Recovery (RDOR) scheme [16]. RDOR requires less data for single-failure recovery than the naive recovery schemes for RDP codes and RS codes.

3) The Collective Reconstruction Read (CRR) decoding scheme [17], which carries out the recovery process in a distributed and parallel manner.
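The first two ingredients can be sketched in a few lines. The code below builds one RDP stripe with prime P = 5 (4 data columns, one row-parity column, one diagonal-parity column) using only XOR, then rebuilds a failed data column by mixing row-parity and diagonal-parity recovery. All function names are hypothetical, the stripe layout follows the standard RDP construction rather than this paper's implementation, and the brute-force search in `best_plan` is only an illustration of why a hybrid plan reads less data; it is not the RDOR algorithm of [16].

```python
from itertools import combinations

P = 5                                # prime; P - 1 = 4 data columns
ROWS = P - 1                         # rows per stripe
ROW_PARITY, DIAG_PARITY = P - 1, P   # column indices of the two check columns

def xor(blocks):
    """Byte-wise XOR of equal-sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def diagonal_cells(d):
    """Cells (row, col) on diagonal d across the data and row-parity columns."""
    return [(r, c) for r in range(ROWS) for c in range(P) if (r + c) % P == d]

def encode(data):
    """data[r][c] for c in 0..P-2 -> stripe with both parity columns filled in."""
    stripe = [list(row) + [xor(row), None] for row in data]
    for d in range(ROWS):            # diagonal P - 1 is never stored
        stripe[d][DIAG_PARITY] = xor([stripe[r][c] for r, c in diagonal_cells(d)])
    return stripe

def read_set(failed, r, use_row):
    """Cells that must be read to rebuild the lost cell (r, failed)."""
    if use_row:
        return {(r, c) for c in range(P) if c != failed}
    d = (r + failed) % P
    cells = {cell for cell in diagonal_cells(d) if cell[1] != failed}
    return cells | {(d, DIAG_PARITY)}

def recover(stripe, failed, row_rows):
    """Rebuild column `failed`: rows in row_rows via row parity, the rest via diagonals."""
    rebuilt, reads = [], set()
    for r in range(ROWS):
        cells = read_set(failed, r, r in row_rows)
        reads |= cells
        rebuilt.append(xor([stripe[i][j] for i, j in cells]))
    return rebuilt, len(reads)

def best_plan(failed):
    """Brute-force the row/diagonal split that reads the fewest distinct cells."""
    # A cell on the unstored diagonal can only be rebuilt through its row.
    must_row = {r for r in range(ROWS) if (r + failed) % P == P - 1}
    best = None
    for size in range(ROWS + 1):
        for rows in combinations(range(ROWS), size):
            if must_row <= set(rows):
                reads = set().union(*(read_set(failed, r, r in rows) for r in range(ROWS)))
                if best is None or len(reads) < best[1]:
                    best = (set(rows), len(reads))
    return best

# Hypothetical 4-byte blocks, one per (row, data column).
data = [[bytes([16 * r + c] * 4) for c in range(P - 1)] for r in range(ROWS)]
stripe = encode(data)
plan_rows, plan_reads = best_plan(failed=0)
column, reads = recover(stripe, 0, plan_rows)
```

For this P = 5 stripe, rebuilding a data column through row parity alone reads 16 cells, while the best hybrid plan reads 12, because cells shared between a row and a diagonal are fetched only once. `recover` expects a plan that routes any cell on the unstored diagonal through its row, as `best_plan` does.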

We design encoding and decoding protocols for the proposed fault-tolerance framework. We implement the framework as a modification of Cocytus [14], which is itself built on the Memcached [18] caching system.

The structure of this paper is as follows. Section 2 reviews related work. Section 3 introduces RDP codes. Section 4 describes the overall design of our Memcached system, which includes data updating and data recovery. Section
