Overhead-free In-place Recovery and Repair
Schemes of XOR-based Regenerating Codes
Ximing Fu
∗
, Zhiqing Xiao
†
, and Shenghao Yang
‡
∗
Department of Computer Science and Technology, Tsinghua University
†
Department of Electronic Engineering, Tsinghua University
‡
Institute of Network Coding, The Chinese University of Hong Kong
Abstract—In this paper, refined recovery and repair schemes
are proposed for a storage system using the XOR-based MBR
regenerating storage code proposed by Hou et al. Our schemes
have zero transmission overhead for both recovery and repair,
i.e., the total number of transmitted bits for repair/recovery is
exactly equal to the total number of bits repaired/recovered.
Further, our schemes use mainly XOR operations and have lower
complexity than that of the previous schemes. Moreover, our
schemes require only a small amount of auxiliary space, which
qualifies our schemes as in-place.
I. INTRODUCTION
Regenerating codes were proposed for distributed storage
systems in [1]–[3]. In a typical setting, a file of M bits
is encoded and stored at n storage nodes. The file can be
recovered by accessing any k storage nodes. When a storage
node fails, a new node can be regenerated by accessing any
d surviving nodes such that the new set of n storage nodes
preserves the above recovery and regenerating properties. A
regenerating code with the above parameters is also called an
[n, k, d] code. Suppose that each storage node stores α bits and
the regenerating bandwith is γ bits, i.e., the total number of bits
communicated from the d nodes during regenerating. Dimakis
et al. [3] characterized a fundamental tradeoff between the
storage per node and the regenerating bandwidth.
Two extremal points in the optimal storage-bandwith trade-
off curve are of particular interest, i.e., the minimum-storage
regenerating (MSR) point and minimum-bandwidth regenerat-
ing (MBR) point. The regenerating codes attaining the MSR
point store α =
M
k
bits in each node. Since each k nodes
can be used to recover the original file, such regenerating
codes have zero recovery overhead, i.e., the total recovery
bandwidth is equal to the file size. The regenerating codes
attaining the MBR point have the minimum repair bandwidth
γ =
M
k
2d
2d−k+1
and α = γ. Accessing kα = M
2d
2d−k+1
bits
from any k nodes is sufficient to recover the original file. But
kα is strictly larger than M when k > 1 so that an MBR code
may not have zero recovery overhead. We focus on MBR codes
in this paper.
Rashmi, Shah and Kumar [4] provided product-matrix
constructions of MBR codes for all valid values of [n, k, d]
This work was supported in part by the National Basic Research Program
of China (973 Program) under Grant 2013CB834205 and the National Natural
Science Foundation of China (NSFC) under Grant 61133013 and 61471215.
This work was partially funded by a grant from the University Grants
Committee of the Hong Kong Special Administrative Region (Project No.
AoE/E-02/08).
and MSR codes for d ≥ 2k − 2. The product-matrix MBR
codes, however, require matrix operations over finite fields
for encoding and recovery, which leads to high complexity
for practical systems. To resolve this complexity issue, Hou
et al. [5]–[7] proposed BASIC codes using an exclusive-or
(XOR) version of the product-matrix constructions. For BASIC
codes, encoding and recovery mainly use binary XOR and shift
operations, so the computational complexities are significantly
reduced.
In a BASIC MBR code, a file of M bits are divided into
B = kd−
k
2
sequences, each of which consists of L = M/B
bits. After encoding, each storage node stores d encoded
packets. In the recovery algorithm introduced in [5], kd packets
are retrieved from k storage nodes and the B sequences
are recovered by solving d linear systems of dimension k.
Due to the shift operations, the packets encoded may be
of different length but all have at least L bits. This packet
overhead (the number of bits in a packet minus L) would
affect the recovery bandwidth. The recovery bandwidth, the
extra decoding storage and the computational complexity in
the recovery algorithm of [5] are given in Table I. In the worst
case, about 2M bits are transmitted to recover the M bits.
In this paper, we propose a more efficient recovery scheme
for BASIC codes, which works for all valid values of [n, k, d].
Our recovery scheme has two stages: the retrieving stage
and the decoding stage. In a BASIC MBR code, each node
stores more than M/k bits. In the retrieving stage of our
scheme, exactly M bits are retrieved from any k storage
nodes. Therefore, our scheme achieves the optimal recovery
bandwidth exactly. In other words, our scheme implies that it
is possible to achieve zero recovery overhead for MBR codes.
In the decoding stage of our scheme, the M bits retrieved in the
first stage are used to recover the original file. Our algorithm
is similar to the ZigZag decoding of Sung and Gong [8]
designed for a storage code based on XOR and shift operations.
We optimize their algorithm for BASIC MBR codes to gain
lower computational and storage complexities. Specifically, the
number of XOR operations used in our recovery scheme is
O
dk
2
L
. After retrieving the data from the storage nodes, our
decoding algorithm overwrites the data during execution, and
only consumes O (k log L) extra storage space for auxiliary
variables. After the algorithm executing, the M bits in the
memory storing the retrieved data become exactly the desired
file. Therefore, our decoding algorithm is in-place.
The packet overhead also affects the repair bandwidth. For
the repair scheme of BASIC codes in [4], [5], the total number