Information Leakage in Encrypted Deduplication via
Frequency Analysis: Attacks and Defenses
JINGWEI LI, University of Electronic Science and Technology of China; State Key Laboratory of Information
Security, Institute of Information Engineering, Chinese Academy of Sciences, China
PATRICK P. C. LEE, The Chinese University of Hong Kong, China
CHUFENG TAN, University of Electronic Science and Technology of China, China
CHUAN QIN, The Chinese University of Hong Kong, China
XIAOSONG ZHANG, University of Electronic Science and Technology of China, China
Encrypted deduplication combines encryption and deduplication to simultaneously achieve both data security
and storage efficiency. State-of-the-art encrypted deduplication systems mainly build on deterministic encryption
to preserve deduplication effectiveness. However, such deterministic encryption reveals the underlying
frequency distribution of the original plaintext chunks. This allows an adversary to launch frequency analysis
against the ciphertext chunks and infer the content of the original plaintext chunks. In this paper, we study
how frequency analysis affects information leakage in encrypted deduplication, from both attack and defense
perspectives. Specifically, we target backup workloads, and propose a new inference attack that exploits
chunk locality to increase the coverage of inferred chunks. We further combine the new inference attack with
the knowledge of chunk sizes and show its attack effectiveness against variable-size chunks. We conduct
trace-driven evaluation on both real-world and synthetic datasets and show that our proposed attacks infer a
significant fraction of plaintext chunks under backup workloads. To defend against frequency analysis, we
present two defense approaches, namely MinHash encryption and scrambling. Our trace-driven evaluation
shows that our combined MinHash encryption and scrambling scheme effectively mitigates the severity of the
inference attacks, while maintaining high storage efficiency and incurring limited metadata access overhead.
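To make the leakage concrete, the following is a minimal sketch of classical frequency analysis against deterministically encrypted chunks. It is an illustration only, not the locality-based attack proposed in this paper. The function names, the hash-based stand-in for deterministic encryption, and the toy backups are all assumptions introduced for this example.

```python
import hashlib
from collections import Counter


def deterministic_encrypt(chunk: bytes) -> bytes:
    """Toy stand-in (an assumption, not a real cipher): identical plaintext
    chunks always map to the same output, which is the property that
    deduplication relies on and that frequency analysis exploits."""
    return hashlib.sha256(b"toy-deterministic:" + chunk).digest()


def rank_by_frequency(chunks):
    """Return the unique chunks sorted by descending occurrence count.
    Ties are broken by chunk value so the ranking is reproducible."""
    counts = Counter(chunks)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [chunk for chunk, _ in ordered]


def frequency_analysis(ciphertexts, auxiliary_plaintexts):
    """Pair the i-th most frequent ciphertext chunk with the i-th most
    frequent chunk of the auxiliary plaintext data (e.g., a prior backup)."""
    ranked_cipher = rank_by_frequency(ciphertexts)
    ranked_plain = rank_by_frequency(auxiliary_plaintexts)
    return dict(zip(ranked_cipher, ranked_plain))


# Hypothetical data: a prior backup known to the adversary, and the
# latest backup, which the adversary observes only in encrypted form.
auxiliary = [b"A"] * 5 + [b"B"] * 3 + [b"C"] * 1
target = [b"A"] * 6 + [b"B"] * 2 + [b"D"] * 1
ciphertexts = [deterministic_encrypt(c) for c in target]

inferred = frequency_analysis(ciphertexts, auxiliary)

# Count how many unique target chunks were inferred correctly.
correct = sum(
    1 for p in set(target) if inferred.get(deterministic_encrypt(p)) == p
)
print(f"correctly inferred {correct} of {len(set(target))} unique chunks")
```

In this toy setting the two most frequent chunks keep their ranks across backups and are inferred correctly, while the new chunk is mismatched. Rank-based pairing of this kind is sensitive to any change in the frequency ranking between the auxiliary data and the target.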
CCS Concepts: • Information systems → Cloud based storage; Deduplication; • Security and privacy → Cryptanalysis and other attacks.
Additional Key Words and Phrases: Frequency analysis, encrypted deduplication, cloud storage
An earlier conference version of this paper appeared in [42]. In this extended version, we propose new attack and defense
schemes, include a new dataset in our evaluation, and add new prototype experiments.
This work was supported in part by grants from the National Key R&D Program of China (Grant number 2017YFB0802300),
the National Natural Science Foundation of China (Grant numbers 61602092 and 61972073), the Open Research Project of the State
Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences (Grant number
2019-MS-05), and the Research Grants Council of Hong Kong (CRF C7036-15G). Corresponding author: Patrick P. C. Lee.
Authors’ addresses: Jingwei Li, lijw1987@gmail.com, University of Electronic Science and Technology of China; State
Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Chengdu,
Sichuan, China; Patrick P. C. Lee, pclee@cse.cuhk.edu.hk, The Chinese University of Hong Kong, Hong Kong, China;
Chufeng Tan, chufengtan97@gmail.com, University of Electronic Science and Technology of China, Chengdu, Sichuan,
China; Chuan Qin, chintran27@gmail.com, The Chinese University of Hong Kong, Hong Kong, China; Xiaosong Zhang,
johnsonzxs@uestc.edu.cn, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
© 2019 Association for Computing Machinery.
1553-3077/2019/1-ART1 $15.00
https://doi.org/0000001.0000001
ACM Trans. Storage, Vol. 1, No. 1, Article 1. Publication date: January 2019.