没有合适的资源?快使用搜索试试~ 我知道了~
icpads-efficient-fault+赵帅兵+186973095701
需积分: 0 0 下载量 94 浏览量
2022-08-04
14:43:49
上传
评论
收藏 485KB PDF 举报
温馨提示
试读
8页
icpads-efficient-fault+赵帅兵+186973095701
资源详情
资源评论
资源推荐
An Efficient Fault Tolerance Framework for
Distributed In-memory Caching Systems
Shuaibing Zhao, Lu Shen, Yusen Li, Rebecca J. Stones, Gang Wang*, Xiaoguang Liu*
Nankai-Baidu Joint Lab, College of Computer Science,
Nankai University, Tianjin, China
Email:{zhaoshb,shenlu,liyusen,rebecca.stones82,wgzwp,liuxg}@nbjl.nankai.edu.cn
Abstract—With the development of the information age,
many large database applications have introduced distributed
in-memory object caching systems, of which Memcached is
one of the most typical. However, Memcached does not have
fault-tolerant capabilities. In order to make Memcached enable
fault tolerance, Cocytus introduced Reed-Solomon codes and
distributed protocols into Memcached. Cocytus saves significant
memory compared to primary-backup replication when tolerat-
ing the same number of failures. However, the relatively complex
finite-field calculations used by RS codes and the high network
transmission cost during data reconstruction are becoming new
bottlenecks.
This paper introduces RDP codes into distributed Memcached
to optimize the calculation performance in Cocytus. In addition,
this paper adopts RDOR scheme and Collective Reconstruction
Read to speed up the data reconstruction. Compared with
Cocytus, which uses RS codes for fault tolerance, the new
distributed Memcached with 4 data nodes and 2 check parity
nodes reduces reconstruction overhead by up to 31%.
Index Terms—Memcached, erasure code, optimal recovery,
parallel recovery
I. INTRODUCTION
Relational database management systems struggle to keep
pace with modern increases in performance requirements [1]
due to limitations of their storage structures [2]. In order
to address this issue, some distributed in-memory caching
systems have been proposed, such as Memcached [3] and
Redis [4]. They put the most frequently accessed data in
memory so that user requests are processed without disk op-
erations, thereby improving overall performance. Distributed
in-memory caching systems are consequently widely used in
large Internet companies such as Facebook [5] and Twitter [6].
In distributed in-memory caching systems, when server
nodes are crashed(or are temporarily unavailable), it is desir-
able to have a mechanism to recover the data on erased nodes
via the non-erased nodes. Otherwise, if the lost data is reloaded
from the disk, the long loading time hurts the system perfor-
mance. A traditional approach for fault tolerance is through
primary-backup replication (PBR) [7]. In this approach, each
primary node has some backup nodes to store the data replicas
for fault tolerance. When a primary node crashes, one of the
backup nodes acts as the new primary node. Although this
approach provides continuous services in the presence of node
failures, the data redundancy is high.
Erasure codes [8] (such as the well-known Reed-Solomon
codes [9]) have been proven very efficient in providing higher
levels of fault-tolerance with less cost, and are widely used
in today’s distributed systems [10]–[12]. In erasure-coded
distributed systems, server nodes are classified into data nodes
(where the raw data is stored) and check nodes (where the
parity check data is stored). The parity check data is computed
using the raw data. Data nodes and check nodes are organized
into coding groups. The raw data and parity check data are
both divided into equal-sized data units. If some data units
(either raw data units or parity check data units) in a coding
group are lost due to node failures, the lost data units can
be recovered using other data units belonging to a common
coding group.
Based on Reed-Solomon codes (RS codes) [9], [13], Zhang
et al. [14] proposed Cocytus, which applied erasure codes
to distributed in-memory caching systems for the first time.
For a RS coding system with k data nodes and n − k check
nodes (normally denoted by RS (n, k)), at most n − k node
failures can be handled [13]. However, in order to recover
one data unit in a RS (n, k) coding system, k units of data
from different nodes need to be fetched to perform the finite-
field computation. The overheads in both computation and
data transmission are huge, which inhibit the throughput of
the system.
In this paper, we propose a efficient fault-tolerance frame-
work for a distributed in-memory caching systems that utilizes:
1) Row-Diagonal Parity (RDP) codes [15]. RDP codes only
use XOR operations for computing parity check units
and during recovery, which are faster than finite-field
operations in RS codes.
2) Row-Diagonal Optimal Recovery (RDOR) scheme [16].
The amount of data required for single failure recovery
using RDOR is less than that using naive recovery
schemes for RDP codes and RS codes.
3) Collective Reconstruction Read (CRR) decoding
scheme [17] which carries out the recovery process in
a distributed and parallel manner.
We design encoding and decoding protocols for the pro-
posed fault-tolerance framework. We implement this as a
modification of Cocytus [14], which is implemented on the
Memcached [18] caching system.
The structure of this paper is as follows. Section 2 reviews
the related work of this paper. Section 3 introduces RDP codes.
Section 4 describes the overall design of our Memcached sys-
tem, which includes data updating and data recovery. Section
王佛伟
- 粉丝: 13
- 资源: 320
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论0