greatly limits the scalability of network stores. (The growth in storage capacity has far outstripped
the growth in storage access times and bandwidth [44].) Furthermore, the I/O incurred to establish
data possession competes with the on-demand bandwidth needed to store and retrieve data. We conclude
that clients need to be able to verify that a server has retained file data without retrieving the data
from the server and without having the server access the entire file.
Previous solutions do not meet these requirements for proving data possession. Some schemes
[20] provide a weaker guarantee by enforcing storage complexity: the server has to store an amount
of data at least as large as the client's data, but not necessarily the same exact data. Moreover, all
previous techniques require the server to access the entire file, which is not feasible when dealing
with large amounts of data.
We define a model for provable data possession (PDP) that provides probabilistic proof that a
third party stores a file. The model is unique in that it allows the server to access small portions of
the file in generating the proof; all other techniques must access the entire file. Within this model,
we give the first provably-secure scheme for remote data checking. The client stores a small, O(1)
amount of metadata to verify the server's proof. The scheme also uses O(1) bandwidth.¹ The
challenge and the response are each slightly more than 1 Kilobit. We also present a more efficient
version of this scheme that proves data possession using a single modular exponentiation at the
server, although it provides a weaker guarantee.
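The probabilistic guarantee can be made concrete with a short calculation. Assuming the client challenges c distinct blocks chosen uniformly at random, and the server has lost or corrupted t of the file's n blocks, the probability that the challenge hits at least one bad block is P = 1 - prod_{i=0}^{c-1} (n - t - i)/(n - i) >= 1 - (1 - t/n)^c. The bound depends only on the corrupted fraction t/n and on c, not on the file size, which is why sampling a small, fixed number of blocks can suffice. The sketch below evaluates this formula; the function names `detection_probability` and `blocks_needed` are hypothetical helpers, not part of any scheme defined here.

```python
def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that c distinct blocks, sampled uniformly without replacement
    from n blocks, include at least one of the t missing or corrupted blocks."""
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    miss = 1.0  # probability that every sampled block so far is intact
    for i in range(c):
        intact_left = n - t - i
        if intact_left <= 0:
            return 1.0  # more samples than intact blocks: a bad block is always hit
        miss *= intact_left / (n - i)
    return 1.0 - miss


def blocks_needed(n: int, t: int, target: float) -> int:
    """Smallest challenge size c with detection_probability(n, t, c) >= target."""
    if not (0 < t <= n and 0 < target <= 1):
        raise ValueError("require 0 < t <= n and 0 < target <= 1")
    miss = 1.0
    for c in range(n + 1):
        if 1.0 - miss >= target:
            return c
        # Extend the challenge by one block, updating the miss probability.
        intact_left = n - t - c
        miss = 0.0 if intact_left <= 0 else miss * (intact_left / (n - c))
    return n  # not reached: with c = n - t + 1 samples detection is certain
```

For example, if 1% of the blocks are corrupted, a challenge of 460 blocks detects the damage with probability above 99% whether the file has 10,000 blocks or 1,000,000 blocks.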
Both schemes use homomorphic verifiable tags. Because of the homomorphic property, tags
computed for multiple file blocks can be combined into a single value. The client pre-computes
tags for each block of a file and then stores the file and its tags with a server. At a later time,
the client can verify that the server possesses the file by generating a random challenge against a
randomly selected set of file blocks. Using the queried blocks and their corresponding tags, the
server generates a proof of possession. The client is thus convinced of data possession without
actually having to retrieve file blocks.
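A minimal, deliberately insecure sketch of this tag, challenge, prove, and verify flow follows. It assumes RSA-style tags of the form T_i = (h(i) * g^{m_i})^d mod N, so that raising a product of tags T_i^{a_i} to the public exponent e yields prod h(i)^{a_i} * g^{sum a_i m_i}, which the client can recompute without the blocks. The toy primes, the hash `index_hash`, the base g = 4, and the coefficient range are hypothetical choices for illustration, and the server returns the combined block sum M in the clear, which a real scheme would avoid. It should not be read as the exact construction used by either scheme.

```python
import hashlib
import math
import random

# Toy RSA parameters: far too small for real use (illustrative assumption).
P, Q = 104729, 1299709            # both prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)
G = 4                              # hypothetical public base, a unit mod N


def index_hash(key: str, i: int) -> int:
    """Hypothetical stand-in for a hash of block index i into the units mod N."""
    ctr = 0
    while True:
        digest = hashlib.sha256(f"{key}:{i}:{ctr}".encode()).digest()
        h = int.from_bytes(digest, "big") % N
        if math.gcd(h, N) == 1:  # keep h invertible mod N
            return h
        ctr += 1


def make_tags(key: str, blocks: list[int]) -> list[int]:
    """Client, before upload: T_i = (h(i) * g^{m_i})^d mod N for every block."""
    return [pow(index_hash(key, i) * pow(G, m, N) % N, D, N)
            for i, m in enumerate(blocks)]


def challenge(n_blocks: int, c: int, rng: random.Random) -> list[tuple[int, int]]:
    """Client: pick c distinct block indices and a random coefficient for each."""
    return [(i, rng.randrange(1, 2**16)) for i in rng.sample(range(n_blocks), c)]


def prove(blocks: list[int], tags: list[int], chal: list[tuple[int, int]]) -> tuple[int, int]:
    """Server: fold the challenged tags into one value T and form M = sum a_i * m_i.

    Only the challenged blocks are read, not the whole file.
    """
    T, M = 1, 0
    for i, a in chal:
        T = T * pow(tags[i], a, N) % N
        M += a * blocks[i]
    return T, M


def verify(key: str, chal: list[tuple[int, int]], T: int, M: int) -> bool:
    """Client: accept iff T^e == prod h(i)^{a_i} * g^M (mod N); needs no file blocks."""
    expected = pow(G, M, N)
    for i, a in chal:
        expected = expected * pow(index_hash(key, i), a, N) % N
    return pow(T, E, N) == expected


blocks = [int.from_bytes(f"block-{i}".encode(), "big") for i in range(20)]
tags = make_tags("file-42", blocks)
chal = challenge(len(blocks), 5, random.Random(7))
T, M = prove(blocks, tags, chal)
print(verify("file-42", chal, T, M))      # True
print(verify("file-42", chal, T, M + 1))  # False: M no longer matches the tags
```

The single combined tag T is what keeps the response size constant: however many blocks are challenged, the server returns one group element plus one aggregated value.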
The efficient PDP scheme is the fundamental construct underlying an archival introspection
system that we are developing for the long-term preservation of Astronomy data. We are taking
possession of multi-terabyte Astronomy databases at a University library in order to preserve the
information long after the research projects and instruments used to collect the data are gone. The
database will be replicated at multiple sites. Sites include resource-sharing partners that exchange
storage capacity to achieve reliability and scale. As such, the system is subject to freeloading, in
which partners attempt to use storage resources and contribute none of their own [20]. The location
and physical implementation of these replicas are managed independently by each partner and will
evolve over time. Partners may even outsource storage to third-party storage server providers [23].
Efficient PDP schemes will ensure that the computational requirements of remote data checking
do not unduly burden the remote storage sites.
We implemented our more efficient scheme (E-PDP) and two other remote data checking protocols
and evaluated their performance. Experiments show that probabilistic possession guarantees
make it practical to verify possession of large data sets. With sampling, E-PDP verifies a 64 MB
file in about 0.4 seconds, compared to 1.8 seconds without sampling. Further, I/O bounds the
performance of E-PDP: it generates proofs as quickly as the disk produces data. Finally, E-PDP is
185 times faster than the previous secure protocol on 768 KB files.
¹ Storage overhead and network overhead are constant in the size of the file, but depend on the chosen security
parameter.