Using FPGA to Accelerate Deduplication on High-performance SSD
Zhengguo Chen (a), Nong Xiao (b), Fang Liu (c), Yuxuan Xing (d), Zhen Sun (e)
State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China
(a) zgchen.nudt@gmail.com, (b) nongxiao@nudt.edu.cn, (c) fangliu@nudt.edu.cn, (d) xinghuan1990@gmail.com, (e) sunzhen@nudt.edu.cn
Keywords: SSD, Deduplication, Software-based, Hardware-based, Performance, Endurance
Abstract. Applying data deduplication in solid state disks (SSDs) can reduce the number of write operations and the amount of garbage collection, thereby improving write performance and prolonging lifetime. As the write performance of SSDs increases significantly, whether deduplication on an SSD could itself become the SSD's performance bottleneck is a question worthy of attention. To this end, this paper first implements deduplication in software and shows experimentally that software-based deduplication degrades the SSD's read and write performance. It then proposes and implements, in detail, a hardware-based deduplication scheme that uses an FPGA to accelerate deduplication, and achieves the expected results. Finally, we conclude that hardware-based deduplication not only preserves the read and write performance of the SSD but also saves storage capacity and improves endurance.
Introduction
In recent years, with advances in industrial technology, flash memory has evolved rapidly, with ever-increasing storage density and falling prices. Thanks to their outstanding features, such as high performance, low power consumption, small size, silent operation and vibration resistance, flash-based SSDs (solid state disks) are promising as a new generation of storage media. However, in some data processing environments, SSDs, despite their many excellent characteristics, are not used as reliable storage because their error rate increases with aging and their lifespan is limited [1].
In 2007, deduplication was proposed for storage systems. It has been reported that more than 80% of the data stored in current data centers is redundant, and that this redundancy adds nearly 10 times the processing cost of the non-redundant data [2]. This creates an urgent need for data deduplication, which eliminates duplicate data and improves storage space utilization.
Deduplication is an effective method of data compression. First, it partitions a large data object into small chunks and computes a fingerprint for each chunk. Then it determines whether each chunk is redundant by matching its fingerprint against the fingerprints of previously stored chunks, so that in the end only a single copy of each distinct chunk is stored.
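The three steps above (chunking, fingerprinting and fingerprint matching) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes fixed-size 4 KiB chunks, SHA-1 fingerprints from Python's standard hashlib module and an in-memory dictionary as the fingerprint index; the names deduplicate and restore are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (hypothetical choice)


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (store, recipe): store maps fingerprint -> chunk bytes,
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}   # fingerprint index: fingerprint -> the single stored copy
    recipe = []  # ordered fingerprints describing the original object
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha1(chunk).hexdigest()  # chunk fingerprint
        if fp not in store:                   # unseen content: store it once
            store[fp] = chunk
        recipe.append(fp)                     # a duplicate only adds a reference
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original object from the chunk store and its recipe."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    data = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096
    store, recipe = deduplicate(data)
    print(len(recipe), "chunks,", len(store), "stored")  # prints: 5 chunks, 2 stored
    assert restore(store, recipe) == data
```

Treating equal fingerprints as equal content relies on SHA-1 collisions being negligible in practice; a stricter design could byte-compare the chunks whenever fingerprints match.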
In practice [3,4,5], deduplication is widely used in storage systems to improve storage space utilization and I/O performance. Recently, efforts have also been made to apply deduplication inside SSDs, with the aim of extending their lifespan and improving their reliability and I/O performance by reducing the amount of data written [6].
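To illustrate how deduplication inside an SSD reduces the data actually written to flash, the following is a hypothetical sketch of a page-level FTL with inline deduplication, again in Python. It assumes one logical page per LBA, an append-only flash array, SHA-1 fingerprints and per-page reference counts; the class DedupFTL and its fields are illustrative names, and garbage collection is not modeled. A write whose content already exists only updates the LBA-to-physical mapping instead of programming a new flash page.

```python
import hashlib


class DedupFTL:
    """Toy page-level FTL with inline deduplication (illustrative only)."""

    def __init__(self):
        self.l2p = {}        # LBA -> physical page number (PPN)
        self.fp_to_ppn = {}  # fingerprint -> PPN holding that content
        self.ppn_to_fp = {}  # reverse map, used when a page loses its last reference
        self.refcount = {}   # PPN -> number of LBAs mapped to it
        self.flash = []      # programmed pages; list index is the PPN
        self.pages_programmed = 0

    def _unmap(self, lba):
        """Drop the LBA's current mapping; a page with no references becomes invalid."""
        ppn = self.l2p.pop(lba, None)
        if ppn is None:
            return
        self.refcount[ppn] -= 1
        if self.refcount[ppn] == 0:  # invalid page, left for (unmodeled) GC
            del self.refcount[ppn]
            del self.fp_to_ppn[self.ppn_to_fp.pop(ppn)]

    def write(self, lba, page: bytes):
        fp = hashlib.sha1(page).digest()
        ppn = self.fp_to_ppn.get(fp)
        if ppn is None:              # unique content: program a new flash page
            ppn = len(self.flash)
            self.flash.append(page)
            self.pages_programmed += 1
            self.fp_to_ppn[fp] = ppn
            self.ppn_to_fp[ppn] = fp
            self.refcount[ppn] = 0
        self.refcount[ppn] += 1      # take the new reference first...
        self._unmap(lba)             # ...then release the old mapping
        self.l2p[lba] = ppn          # duplicate content: only the mapping changes

    def read(self, lba) -> bytes:
        return self.flash[self.l2p[lba]]


if __name__ == "__main__":
    ftl = DedupFTL()
    for lba, page in [(0, b"a" * 4096), (1, b"a" * 4096), (2, b"b" * 4096)]:
        ftl.write(lba, page)
    print(ftl.pages_programmed)  # prints 2: the duplicate page was not programmed again
```

Taking the new reference before releasing the old one means that rewriting an LBA with identical content never frees and then reprograms the same page.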
In this paper, we study deduplication on a recent Samsung SSD, the 840 EVO [7], using a software method and a hardware method separately, to find out whether deduplication is suitable for high-performance SSDs. The software method implements deduplication on the host, on top of the SSD device, while the hardware method uses an FPGA to accelerate deduplication in the FTL (flash translation layer). Our two main motivations are as follows:
• To what extent software-based deduplication affects the read and write performance of an SSD.
• Whether hardware-based deduplication can preserve the read and write performance of an SSD by employing acceleration techniques such as two-way fingerprint calculation and group hashing.
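A rough way to see why deduplication may become a bottleneck on a fast SSD: to keep up with a sustained write bandwidth B using chunks of size C, the deduplication engine must fingerprint and look up F = B/C chunks per second, which leaves a time budget of 1/F per chunk. With hypothetical values B = 400 MiB/s and C = 4 KiB (illustrative only, not measurements of the 840 EVO):

```latex
F = \frac{B}{C} = \frac{400 \times 1024\ \text{KiB/s}}{4\ \text{KiB}} = 102\,400\ \text{chunks/s},
\qquad t_{\text{chunk}} = \frac{1}{F} \approx 9.77\ \mu\text{s}
```

Any per-chunk software path (hashing plus index lookup) slower than this budget throttles the write stream, which is the kind of pressure that motivates moving these steps into hardware.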
Advanced Materials Research Vol. 1042 (2014) pp 212-217
Online available since 2014/Oct/08 at www.scientific.net
© (2014) Trans Tech Publications, Switzerland
doi:10.4028/www.scientific.net/AMR.1042.212
All rights reserved. No part of contents of this paper may be reproduced or transmitted in any form or by any means without the written permission of TTP, www.ttp.net.