2008) (which allows users to check data integrity and
provides access). Current research on distributed storage
focuses on how to store, transfer, and access large amounts
of data efficiently and on improving system robustness:
Tysowski and Hasan (2011) proposed a re-encryption
scheme for key management in the process of retrieving
data, and Parakh and Kak (2011) designed a secret sharing
scheme based on a multi-copy mechanism. In this paper, we
focus on the privacy of users' data.
The principle of a distributed storage system is shown in
Fig. 1. The process is as follows: a message M is divided
into k equal-length blocks, which form the vector
$M = (M_1, M_2, \ldots, M_k)$. We apply an $(n, k)$ erasure
code to add redundancy to M and obtain the codeword vector
$C = (C_1, C_2, \ldots, C_n)$. The n symbols $C_i$ are
stored in n storage servers. To retrieve the message M, we
randomly choose k of the $C_i$ from the n storage servers and
decode them with the $(n, k)$ erasure code. Note that all of
the $M_i$ must be collected and then encoded to form the
$C_i$, so the $(n, k)$ erasure code is the center of this
storage system. Since the Internet is a public environment
that anyone can freely access, preventing malicious attackers
from stealing the data in the distributed storage system is
important and necessary.
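The encode and retrieve steps above can be sketched as follows. This is a minimal illustration, not the construction used in this paper: it assumes a Reed-Solomon-style code over the prime field GF(257), and the names `P`, `encode`, and `decode` are hypothetical. The k message symbols are used as polynomial coefficients, the n codeword symbols are evaluations of that polynomial, and any k of them recover the message by Lagrange interpolation (Python 3.8+ is assumed for the modular inverse `pow(x, -1, P)`).

```python
# Illustrative (n, k) erasure code over the prime field GF(P).
# P, encode, and decode are hypothetical names chosen for this sketch.
P = 257  # prime larger than any byte value, so each symbol fits in the field


def encode(message, n):
    """Use the k message symbols as polynomial coefficients and evaluate
    the polynomial at x = 1..n, giving n codeword symbols (x, y)."""
    return [(x, sum(m * pow(x, j, P) for j, m in enumerate(message)) % P)
            for x in range(1, n + 1)]


def decode(symbols, k):
    """Recover the k message symbols from any k codeword symbols by
    Lagrange interpolation of the polynomial of degree < k."""
    symbols = symbols[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(symbols):
        basis = [1]   # coefficients (lowest degree first) of the basis numerator
        denom = 1
        for j, (xj, _) in enumerate(symbols):
            if j == i:
                continue
            # Multiply the basis polynomial by (x - xj).
            basis = [(a - xj * b) % P
                     for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs


message = [72, 101, 108]                 # k = 3 message symbols
codeword = encode(message, 5)            # n = 5 symbols, one per server
recovered = decode([codeword[4], codeword[1], codeword[3]], 3)
```

In this sketch `encode` needs every message symbol at once, which mirrors the remark that the encoder is the center of the system.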
To store data stably for a long time (Rhea et al. 2001)
in a distributed cloud storage system, we often adopt
redundancy in the form of replication or erasure coding.
Castro and Liskov (1999) used state machine replication
to construct a fault-tolerant file storage system, but this
mechanism does not take confidentiality into account and
increases the storage cost. In the erasure code mechanism,
the message is divided into blocks and stored in a
centralized way. However, this is unsuitable for distributed
cloud storage servers that handle these blocks independently.
A decentralized erasure code (Dimakis et al. 2006) can
encode each block without any center, and it also has
advantages such as high scalability and availability. A
message M is divided into n blocks to form the vector
$\vec{M_i}$, and each storage server uses the decentralized
$(n, k)$ erasure code to obtain the codewords $\vec{C_i}$,
which are stored in the n servers. To retrieve the original
M, we randomly select k of the $\vec{C_i}$ from the storage
servers and run the decoding process. Note that each server
can apply the decentralized code independently, so there is
no center in this storage system. However, decentralized
erasure codes can hardly provide confidentiality mechanisms,
and neither can codes such as Tornado codes (Luby et al.
2001).
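The idea that each server encodes independently can be sketched as follows. This is an illustration under stated assumptions, not the exact construction of Dimakis et al.: it uses a dense random linear code over GF(2^61 - 1), it assumes k message blocks and n servers, and the names `P`, `server_encode`, and `decode` are hypothetical. Each server draws its own coefficients and stores them with one linear combination of the blocks. Any k stored symbols whose coefficient vectors are linearly independent recover the message by Gauss-Jordan elimination.

```python
import random

# Hypothetical field size for this sketch: a Mersenne prime, chosen large
# so that randomly drawn coefficient matrices are almost never singular.
P = 2**61 - 1


def server_encode(blocks, rng):
    """One storage server independently draws random coefficients and
    stores them together with the linear combination of the blocks."""
    coeffs = [rng.randrange(1, P) for _ in blocks]
    symbol = sum(c * b for c, b in zip(coeffs, blocks)) % P
    return coeffs, symbol


def decode(stored, k):
    """Solve the k x k linear system given by k stored (coeffs, symbol)
    pairs using Gauss-Jordan elimination over GF(P)."""
    rows = [list(c) + [s] for c, s in stored[:k]]
    for col in range(k):
        pivot = next((r for r in range(col, k) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system; query another server")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P
                           for a, b in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]


rng = random.Random(7)
blocks = [11, 22, 33]                                  # k = 3 blocks
stored = [server_encode(blocks, rng) for _ in range(6)]  # n = 6 servers
recovered = decode(rng.sample(stored, 3), 3)
```

No server needs to coordinate with any other in `server_encode`, which is the property the paragraph above describes. The stored symbols are still plain linear combinations of the message, so the sketch offers no confidentiality.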
A simple solution is to encrypt the file with the user's
secret key, but it is risky for the user to hold the only
key. Once the key is lost or compromised, the user cannot
read the data. At the same time, relying on one key may cause
management problems. Because the key is more easily stolen
when it is kept only by the user, the risk of data leakage
increases. Sandhu et al. (2002) proposed a split-key RSA
encryption method, but it does not use a threshold approach.
Subbiah and Blough (2005) designed a storage structure called
GridSharing, which can tolerate intrusion by combining secret
sharing and replication technology, but the message must
first be divided based on the secret sharing schemes.
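The threshold approach mentioned above can be illustrated with Shamir's secret sharing, which is a standard technique and not the method of Sandhu et al. or of GridSharing. In this minimal sketch the key is split into n shares so that any t of them recover it, which removes the single point of failure of one user-held key. The names `P`, `split_key`, and `recover_key` are hypothetical, and the key is assumed to be an integer smaller than P.

```python
import secrets

# Hypothetical field for this sketch: the Mersenne prime 2^127 - 1,
# which must be larger than the key being shared.
P = 2**127 - 1


def split_key(key, t, n):
    """Split key into n shares (x, y); any t shares recover it."""
    # Random polynomial of degree t - 1 whose constant term is the key.
    coeffs = [key] + [secrets.randbelow(P) for _ in range(t - 1)]

    def f(x):
        return sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P

    return [(x, f(x)) for x in range(1, n + 1)]


def recover_key(shares):
    """Recover the key by Lagrange interpolation at x = 0."""
    key = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        key = (key + yi * num * pow(den, -1, P)) % P
    return key


key = 0x0123456789ABCDEF0123456789ABCDEF % P
shares = split_key(key, t=3, n=5)
restored = recover_key([shares[4], shares[0], shares[2]])
```

Losing up to n - t shares does not lose the key.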
The cloud offers many advantages that relieve users of the
burden of data management, such as easy access, cheap
storage space, and convenient resource sharing. Because
their own storage space is limited, users hope to enjoy the
large, convenient storage service offered by the cloud.
Users generally upload data to the cloud storage servers and
then delete the local copies, so they lose complete control
over the data itself.
After users have entrusted their data to cloud storage, their
main concern is that the cloud service provider (CPS) may
maliciously delete or tamper with the data. Driven by its own
interests, a CPS has many motivations to neglect its
responsibility to protect user data. For example, to save
storage space and operating expenses, a CPS may delete data
that users rarely access; when a machine fault causes data
loss, a CPS may hide the incident; and user data may be
deleted accidentally while being transferred to new storage
servers. Security is largely concerned with enforcing good
behavior and stopping inappropriate behavior (Baldwin and
Shiu 2005). The cloud is a set of distributed servers. This
off-site location model is insecure and risky because it can
be subjected to malicious activities (Fernandes et al. 2013).
Since the servers cannot be trusted completely, we need to
verify the integrity of the data stored on them.
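The need to verify data held by an untrusted server can be illustrated with a minimal sketch. It assumes, for illustration only, that the user keeps a secret key and one short HMAC-SHA256 tag per block before uploading, and later checks any block returned by the server against its tag. This is not the scheme of any work cited here, and the names `tag_blocks` and `verify_block` are hypothetical.

```python
import hashlib
import hmac


def tag_blocks(key, blocks):
    """Compute one keyed tag per block. The block index is bound into
    the tag so a server cannot answer with a different valid block."""
    return [hmac.new(key, i.to_bytes(8, "big") + block,
                     hashlib.sha256).digest()
            for i, block in enumerate(blocks)]


def verify_block(key, index, block, tag):
    """Return True if the block returned by the server matches its tag."""
    expected = hmac.new(key, index.to_bytes(8, "big") + block,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


key = b"k" * 32                      # illustrative key; use a random one
blocks = [b"alpha", b"beta", b"gamma"]
tags = tag_blocks(key, blocks)       # kept by the user after upload

ok = verify_block(key, 1, blocks[1], tags[1])        # untouched block
tampered = verify_block(key, 1, b"bet4", tags[1])    # modified block
```

Only the key and the tags stay with the user, so the local copies of the data can still be deleted after upload.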
A direct way is to download all of the user's data and then
check it, but this obviously costs a lot of bandwidth and
resources. Remote data checking (RDC) (Deswarte et al. 2004)
is a very effective method. Kiani et al. (2013)
proposed a distributed cloud context analysis scheme
without privacy preservation. Ateniese et al. (2007) put
Fig. 1 A distributed storage system
858 C. Yao et al.