没有合适的资源?快使用搜索试试~ 我知道了~
建立一个加密的,分布式的和可搜索的键值存储
需积分: 0 1 下载量 83 浏览量
2021-03-08
04:32:44
上传
评论
收藏 1.12MB PDF 举报
温馨提示
试读
12页
建立一个加密的,分布式的和可搜索的键值存储
资源详情
资源评论
资源推荐
Building an Encrypted, Distributed, and Searchable
Key-value Store
Xingliang Yuan
†‡
, Xinyu Wang
†‡
, Cong Wang
†‡
, Chen Qian
§
, Jianxiong Lin
†
†
Department of Computer Science, City University of Hong Kong, China
‡
City University of Hong Kong Shenzhen Research Institute, China
§
Department of Computer Science, University of Kentucky, Unite States
{xl.y, j.lin}@my.cityu.edu.hk, {xinywang, congwang}@cityu.edu.hk, qian@cs.uky.edu
ABSTRACT
Modern distributed key-value stores are offering superior
performance, incremental scalability, and fine availability
for data-intensive computing and cloud-based applications.
Among those distributed data stores, the designs that en-
sure the confidentiality of sensitive data, however, have not
been fully explored yet. In this paper, we focus on designing
and implementing an encrypted, distributed, and searchable
key-value store. It achieves strong protection on data pri-
vacy while preserving all the above prominent features of
plaintext systems. We first design a secure data partition
algorithm that distributes encrypted data evenly across a
cluster of nodes. Based on this algorithm, we propose a se-
cure transformation layer that supports multiple data mod-
els in a privacy-preserving way, and implement two basic
APIs for the proposed encrypted key-value store. To en-
able secure search queries for secondary attributes of data,
we leverage searchable symmetric encryption to design the
encrypted secondary indexes which consider security, effi-
ciency, and data locality simultaneously, and further enable
secure query processing in parallel. For completeness, we
present formal security analysis to demonstrate the strong
security strength of the proposed designs. We implement
the system prototype and deploy it to a cluster at Microsoft
Azure. Comprehensive performance evaluation is conducted
in terms of Put/Get throughput, Put/Get latency under
different workloads, system scaling cost, and secure query
performance. The comparison with Redis shows that our
prototype can function in a practical manner.
Keywords
Key-value Store; Searchable Encryption
1. INTRODUCTION
In order to manage the persistently growing amount of
data, distributed key-value (KV) stores have become the
backbone of many public cloud services [11, 16, 33]. Their
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full cita-
tion on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-
publish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
ASIA CCS ’16, May 30-June 03, 2016, Xi’an, China
c
2016 ACM. ISBN 978-1-4503-4233-9/16/05. . . $15.00
DOI: http://dx.doi.org/10.1145/2897845.2897852
well-understood advantages include high performance, linear
scalability, continuous availability, and even great potentials
of high-level support on rich queries and multiple data mod-
els, as seen in a number of proposals and implementations of
recent KV stores [12, 14,17, 27]. However, with the growing
data breaches, privacy concerns in data outsourcing become
even more pressing than before. Recent works from both
cryptographic perspective, e.g., [5, 10, 15, 19] and database
perspective, e.g., [29, 32, 37], provided solutions with trade-
offs among security, efficiency, and query functionality. Yet,
most of them focus on the setting of a centralized or log-
ically centralized server. They do not specifically consider
the features and the requirements in modern KV stores.
As known, KV stores are probably the simplest data stor-
age system, which stores pairs of primary keys and data
values, and allows to access data values when a primary
key is given. To benefit a variety of data-driven applica-
tions, modern KV stores provide higher-level features. As
one promising feature, multiple richer data models such as
column-oriented, document and graph data are supported
on top of one united KV store. This feature eases the oper-
ational complexity for applications that require more than
one format of data [13, 14, 27]. For the other popular fea-
ture, many KV stores allow for the data access not just from
primary keys, but also from other attributes of data via sec-
ondary indexes, so as to enable more efficient data access and
rich queries [12,16,33]. Although promising, building an en-
crypted, distributed KV stores still face gaps and encounter
challenges, especially for preserving the above salient fea-
tures in a privacy-preserving manner.
Limitations of prior work. One straightforward ap-
proach is to directly store encrypted data along with the
(possibly randomized) data identifier/label [30]. However,
it only allows limited encrypted data retrieval by the iden-
tifier/label, preventing from all possible queries via other
secondary attributes of data. Besides, this approach does
not consider the support of multiple data models.
Another seemingly plausible approach is to treat KV
stores as a blackbox dictionary, and to build an encrypted
secondary index, like the one proposed by Cash et al. in [5].
But this direct combination, though slightly enhancing the
first approach by limiting the query support to the encrypted
index design, would inevitably suffer from secure queries
with long latency. Because they treat the distributed KV
store as a blackbox dictionary, the data locality in the en-
crypted query processing can hardly be preserved. In other
words, the node where the index is accessed could be dif-
ferent from the node where the matched data are stored.
547
Data$Applica*ons$
Secure$Trans.$Layer$
$
Encrypted$K9V$Store$
Secure$Na*ve$API$
Figure 1: The proposed system framework
This approach will introduce extra inter-node communi-
cation overhead, undesirably increasing the query latency.
And the issue will be further exacerbated in more complex
query processing such as aggregation and range queries.
Design goals. In this work, we aim to build an encrypted
and searchable KV store. For data confidentiality, all the
data should remain in the encrypted form as long as they
leave from the client of the data application. Meanwhile, it
should also embrace a bunch of prominent features of non-
encrypted distributed KV stores, such as low latency, load
balancing, horizontal scalability, and fine availability. In ad-
dition, multiple data models should be supported without
loss of data confidentiality. And developers can not only
commit simple retrieval and update requests on a single en-
crypted data value but also execute secure queries via sec-
ondary attributes of data.
Design challenges. Achieving all the design goals simul-
taneously is a non-trivial task. As discussed, straightfor-
wardly using existing KV implementations for the security
counterpart is unsatisfying. Therefore, we propose to build
and evaluate our system by carefully designing the security
sub-components of our system whenever necessary, and cope
with the following challenges. The first is to securely realize
the data partition algorithm across distributed nodes for en-
crypted data, while achieving high performance lookup, load
balancing, and linear scalability. Each encrypted data value
should only be located by a specified token generated from
an authorized client. Designing a customised secure data
partition algorithm allows us to pave the way for securely
preserving data locality and improving query efficiency.
The second challenge is to design an overlay that supports
multiple data models on top of the encrypted KV store. It
should be compatible with the secure data partition algo-
rithm while hiding auxiliary information of ciphertext, e.g.,
the structural information in column-oriented stores, which
is shown to be potentially vulnerable in terms of inference
attacks [26]. The third challenge is to design a framework for
encrypted and distributed indexes that enable secure queries
on given secondary attributes over designated nodes. The
index framework should address the data locality to avoid
inter-node interactions, and provide us a platform to easily
incorporate all the latest secure rich query designs into th
encrypted and distributed KV store.
Contributions. Our system design considerations are illus-
trated below to tackle the above challenges. We will firstly
use secure pseudo-random functions and standard symmet-
ric encryption to build an encrypted KV store, shown in Fig-
ure 1. Essentially, each KV pair contains a pseudo-random
label and an encrypted value: 1) it is simple yet secure, and
inherently compatible with the off-the-shelf data partition
algorithm (i.e., consistent hashing), achieving load balanc-
ing and incremental scalability; 2) it is a stepping stone such
that many other data models can be flexibly built on top.
We then introduce a secure transformation layer between
the encrypted KV store and data applications, depicted in
Figure 1. It formulates an extensible abstraction, which
maps the data formated from different data models to simple
KV pairs in a privacy-preserving fashion, i.e., both data val-
ues and their inherent relationships are strongly protected.
In this paper, we choose the column-oriented data model
as our first instantiation, supported by wide column stores,
e.g., Cassandra [23] and HBase [16]. Afterwards, we also
detail the possible adaptation on graph data and document
data. As a result, it separates data management and data
manipulation from the storage back end, and accrues the
benefits of the encrypted KV store.
In order to enable secure queries based on secondary at-
tributes of data, we propose a framework for encrypted local
indexes. This framework takes into consideration of dis-
tributed processing, KV store benefits, and the flexibility
of instantiating various encrypted secondary indexes in the
very beginning. By integrating our customised secure data
partition algorithm, we can always ensure that the encrypted
secondary index co-locates along with its own data on the
same node. Namely, secure queries are conducted over dis-
tributed nodes in parallel without extra connections, inter-
actions, or data movement. In this paper, we make our
first attempt to support secure search queries that retrieve
encrypted data with matched attributes. The proposed in-
dex construction leverages searchable symmetric encryption
(SSE), a security framework for private keyword search, and
follows one of the latest SSE constructions with sublinear
time, asymptotically optimal space complexity, and prov-
able security [5].
In brief, our contributions are listed as follows:
• We present an encrypted, distributed, and searchable
key-value store that ensures strong data protection.
We propose a secure data partition algorithm and de-
velop two basic APIs for retrieval and update on a
single encrypted data value, i.e., Put/Get. When the
system scales out, the affected encrypted data can be
relocated without loss of confidentiality.
• We introduce a secure transformation layer that sup-
ports secure data modeling. It maps the encrypted
data formated from different data models to encrypted
KV pairs while hiding the structural information of
data. This design is fully compatible with the pro-
posed secure data partition algorithm.
• We propose a framework of encrypted local indexes
that considers both data locality and incremental scal-
ability, where each local index resides the data on the
same node. Our design enables secure search queries
over encrypted and distributed data in parallel. In ad-
dition, we conduct formal security analysis to demon-
strate the strong security strength of this design.
• We develop the system prototype, and deploy it to
Microsoft Azure. The experiments show comprehen-
sive evaluation on Put/Get latency, throughput, in-
dex query performance, and system scaling costs. The
comparisons with Redis on plaintext data show that
548
our system functions in a practical manner with little
security overhead introduced.
Organization. Section 2 introduces the system architec-
ture and the threat models. Section 3 elaborates on the
system design. The security analysis is presented in Sec-
tion 4. The implementation and the performance evalua-
tion are given in Section 5. We discuss the related works in
Section 6, and conclude in Section 7.
2. SYSTEM MODEL
2.1 System Architecture
Figure 2 illustrates the architecture of our proposed pri-
vate KY store. It consists of three entities: the client of a
data application, the dispatcher, and a number of clustered
storage nodes, where the latter two entities are deployed
at the public cloud. We envision that the data application
would like to outsource a huge amount of data to a cloud-
powered data store while ensuring the data confidentiality.In
general, the dispatcher and the nodes are off-premises com-
modity servers or virtual machines. They are programmed
to execute specific algorithms and operations.
Our system operates in a distributed framework that guar-
antees data privacy while preserving high performance and
incremental scalability. It distributes the encrypted data to
all the nodes evenly, which inherently handles heavy work-
loads without revealing the underlying values. The dis-
patcher deals with the secure Put/Get requests generated
from the client for update and retrieval on a single encrypted
data value. It routes the requests via a standard yet se-
cure data partition protocol, i.e., following the algorithm
of consistent hashing but over the encrypted domain. It is
like a logically centralized hub, forwarding all the requests
to the target nodes. To increase the throughput and avoid
single-point failure, it can be physically distributed and syn-
chronized between multiple nodes just like in HBase [16], or
cached to the client and updated periodically just like in Dy-
namo [11]. After the request routing, the nodes respond the
requests and send the encrypted values back to the client.
For the data in the simple KV model, the client will directly
generate secure Put/Get requests. While for the data in
other models, the secure transformation layer at the client
will transform the requests on encrypted structured/semi-
structured data into the secure KV Put/Get requests.
In order to query the encrypted data through secondary
attributes, the encrypted secondary index integrates the se-
cure data partition algorithm so that each node can index
its local data and process a given secure query in parallel.
The proposed secure search query generates tokens for each
of the nodes to provision them the search ability over dis-
tributed encrypted data. In addition, the general framework
of encrypted local indexes can readily support known secure
queries like counting, range, and aggregation. To handle
heavy workloads, new nodes are incrementally added with
moderate impact in our system. The affected encrypted data
can be directly relocated via the secure data partition algo-
rithm by the corresponding nodes, while the affected indexes
are moved in a client-assisted manner, i.e., being rebuilt at
the client for security considerations.
2.2 Threat Assumption
In this paper, we are targeting the threats from semi-
honest adversaries. They are interested in the data, the
1
5!
2!
4!
3!
6!
Client'
Dispatcher'
Node'
Secure Token
Figure 2: System architecture
query attributes and the query results. In Figure 2, the
nodes are allocated in the off-premises public cloud. To
meet the requirements of performance and availability, they
have to be always online, and thus are vulnerable to security
breaches and unauthorized disclosures [21, 34].
On the one hand, outside unauthorized adversaries, who
compromise some or all of the nodes, gain the privilege to
access the hard disks and the physical memory. They at-
tempt to learn the sensitive information about the data. On
the other hand, cloud service providers may also be a threat.
They could faithfully maintain the nodes, and execute the
operations as required, but intentionally learn the data, or
unintentionally mishandle the data to third parties. We as-
sume that the client is in the trusted domain, and does not
collude with those adversaries. Besides, the communication
channels are assumed to be authenticated and encrypted
against eavesdropping. Our security goal is to protect data
confidentiality throughout their life-cycles. User authenti-
cation and data integrity are orthogonal to our focus.
2.3 Cryptographic Primitives
A symmetric encryption scheme SE(KGen, Enc, Dec)
contains three algorithms: The key generation algorithm
KGen takes a security parameter k to return a secret key
K. The encryption algorithm Enc takes a key K and a
value V ∈ {0, 1}
∗
to return a ciphertext V
∗
∈ {0, 1}
∗
; The
decryption algorithm Dec takes K and V
∗
to return V . De-
fine a family of pseudo-random functions F : K × X → R,
if for all probabilistic polynomial-time distinguishers Y ,
|P r[Y
F (k,·)
= 1|k ← K] − P r[Y
g
= 1|g ← {Func : X →
R}]| < negl(k), where negl(k) is a negligible function in k.
3. THE PROPOSED SYSTEM
This section elaborates on our system design. We first in-
troduce the secure data partition algorithm which can dis-
tribute the encrypted data evenly across the nodes without
knowing the underlying value. Then we use the column-
oriented data model as the instantiation to present how our
system maps the data to encrypted KV pairs. Other data
models like graph and document data are also supported.
Subsequently, we implement two APIs for secure and fast ac-
cess of the encrypted KV store. To enable the secure query
on secondary attributes of data, we propose the construction
of encrypted local indexes. We then show how the system
scales out when new nodes are added.
3.1 Encrypted Key-value Store
We first introduce the construction of the proposed en-
crypted and distributed KV store, where each KV pair con-
549
剩余11页未读,继续阅读
weixin_38635323
- 粉丝: 9
- 资源: 955
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 海信智能电视刷机数据 LED42K330X3D(0000) 生产用软件数据 务必确认机编一致 强制刷机 整机USB升级程序
- shujudaochuceshi
- learn-ruby.zip
- test111111111111111111
- face-detect.ipynb
- 以下是一些关于ACM(国际大学生程序设计竞赛)、NOI(全国青少年信息学奥林匹克竞赛)以及CSP(全国青少年信息学奥林匹克竞赛提
- 是一些电子设计竞赛(电赛)经验分享,包括备赛策略、项目管理、团队合作和比赛期间的注意事项
- 全能运行库修复工具DirectX Repair v4.1.0.30770
- las格式点云数据使用详解(附VS编译好的LAStools工具)
- KRPano插件一键解密大师1.4.0 (解压密码1234)
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论0