2nd INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND SOFTWARE ENGINEERING (ICACSE-19)
A Study of Data De-duplication Methods
Dinesh Mishra¹ and Dr. Sanjeev Patwa²
Abstract—Data is one of the most important assets of any organization, whether for productive use or for profit. The rapid growth of data
in varied forms is a serious problem to handle and process. Data is being generated at an ever higher rate and should be stored in databases
without redundancy. Deduplication is an approach that removes duplicated data from databases and supports data backup. Numerous
deduplication algorithms exist that detect and eliminate superfluous data and store a unique copy of the
data contents. In this paper, we first survey the background and key features of data de-duplication, and then classify the research in
data de-duplication according to the key strategy of the de-duplication process. The summary and discussion of the state of the art
in de-duplication help identify and understand the most important design considerations for data de-duplication systems. Finally, we
outline the open problems and future research directions for de-duplication-based storage systems.
Keywords: Data de-duplication; data reduction; Level of de-duplication; de-duplication approaches; storage systems.
I. INTRODUCTION
Deduplication is becoming increasingly important because
it can effectively reduce the storage space used on cloud
servers. The exponential growth of data volumes makes it
necessary to explore techniques such as data
deduplication to keep data manageable and reduce
archive and backup costs. With the rapid growth of cloud
data volume, deduplication technology has become
important to cloud storage: it can eliminate redundant
copies of user-uploaded data to save storage space and
the management cost of the cloud storage server [1].
The use of the cloud by companies and individuals for
storing, backing up, and sharing data
has increased enormously over the past few years. Data
deduplication is a commonly used method to reduce
storage requirements in data centers and enterprise
servers. It operates by identifying and removing duplicate
blocks of data over long ranges. For example, consider a
corporate logo used in many slide decks of that
corporation. An enterprise storage server that uses
deduplication can store only the first occurrence of the
logo and replace subsequent occurrences with pointers to
the earlier stored one [25]. De-duplication is a data
compression technique for reducing redundant data [5].
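The logo example can be sketched in a few lines. This is only an illustrative toy, assuming whole-object granularity and SHA-256 content hashes used as pointers; the class `DedupStore` and its methods are hypothetical names, not an API from the cited work.

```python
import hashlib


class DedupStore:
    """Toy object store that keeps one physical copy per distinct content."""

    def __init__(self):
        self.blobs = {}     # fingerprint -> bytes (the single stored copy)
        self.pointers = {}  # logical name -> fingerprint (the "pointer")

    def put(self, name, data):
        """Store data under name; return True if new physical bytes were written."""
        fp = hashlib.sha256(data).hexdigest()
        is_new = fp not in self.blobs
        if is_new:
            self.blobs[fp] = data  # first occurrence: store the content
        self.pointers[name] = fp   # every occurrence: record a pointer only
        return is_new

    def get(self, name):
        """Follow the pointer for name back to the stored content."""
        return self.blobs[self.pointers[name]]

    def physical_bytes(self):
        """Total bytes actually stored, after de-duplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
logo = b"\x89PNG" + b"logo-pixels" * 1000  # stand-in for the corporate logo
for deck in ("q1.pptx", "q2.pptx", "q3.pptx"):
    store.put(deck + "/logo.png", logo)
```

After the three `put` calls the store holds three pointers but a single physical copy of the logo, so `store.physical_bytes()` equals `len(logo)`.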
Today, on average, 13% of an IT budget is
spent on storage capacity. According to IDC’s Digital
Universe study, data will grow even more quickly [3].
¹ Ph.D Scholar, Dept. of Comp. Sc. & Engg., School of Engg. & Tech, MODY University, Lakshmangarh, Rajasthan, India
² Asstt. Prof., Dept. of Comp. Sc. & Engg., School of Engg. & Tech, MODY University, Lakshmangarh, Rajasthan, India
E-mail: ¹ dmishra1475@gmail.com, ² sanjeevpatwa.cet@modyuniversity.ac.in
This growth creates problems such as degraded
performance and higher operational costs. The
concept of de-duplication was introduced to overcome these problems
and keep systems manageable.
Data de-duplication refers to the elimination of
redundant data by physically storing only the data that is
unique. This technique effectively reduces storage
capacity requirements and is applicable whenever
multiple copies of the same data set need to be stored.
De-duplication reduces the required data storage capacity,
since only a single copy of the data is stored. Related research
in the area of data de-duplication includes [17] [20].
In general, data de-duplication increases the speed of
services and reduces costs. It improves the efficiency of
disk-based backups:
• De-duplication reduces storage cost, as it
reduces the amount of physical capacity
required for a backup job.
• Because de-duplication curtails the amount of disk
needed to support a backup job, it also
reduces the power, space, and cooling
requirements of the disks.
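The capacity reduction described above is commonly quantified with a de-duplication ratio. As a sketch, with the symbols $S_{\text{logical}}$ (bytes presented to the system before de-duplication) and $S_{\text{physical}}$ (bytes actually stored) introduced here for illustration and not taken from the paper:

```latex
\[
  R = \frac{S_{\text{logical}}}{S_{\text{physical}}},
  \qquad
  \text{space savings} = 1 - \frac{S_{\text{physical}}}{S_{\text{logical}}} = 1 - \frac{1}{R}.
\]
```

As a hypothetical example, if ten identical 100 GB full backups are stored as a single 100 GB copy, then $R = 10$ and the space savings are $1 - 1/10 = 90\%$.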
II. DE-DUPLICATION PROCESS
The de-duplication process has four main stages:
Chunking, Fingerprinting, Indexing, and Writing [25].
Figure 1: De-duplication process (Chunking → Fingerprinting → Indexing → Writing)
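The four stages can be sketched end to end as follows. This is a minimal illustration rather than the method of [25]: the fixed 4 KiB chunk size, the use of SHA-256 as the fingerprint, the in-memory dict index, and all function names (`chunk`, `fingerprint`, `deduplicate`, `restore`) are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the paper does not prescribe one


def chunk(data, size=CHUNK_SIZE):
    """Chunking: split the byte stream into fixed-size pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece):
    """Fingerprinting: derive a content hash that identifies the chunk."""
    return hashlib.sha256(piece).hexdigest()


def deduplicate(data, index, container):
    """Run the four stages on one stream and return its recipe.

    index     -- dict mapping fingerprint -> position in container
    container -- list standing in for the on-disk chunk store
    """
    recipe = []
    for piece in chunk(data):
        fp = fingerprint(piece)
        if fp not in index:          # Indexing: look up the fingerprint
            index[fp] = len(container)
            container.append(piece)  # Writing: store only unseen chunks
        recipe.append(fp)            # the recipe references every chunk in order
    return recipe


def restore(recipe, index, container):
    """Rebuild the original stream from its recipe of fingerprints."""
    return b"".join(container[index[fp]] for fp in recipe)


index, container = {}, []
backup = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # four 4 KiB chunks, two distinct
recipe = deduplicate(backup, index, container)
```

Here a 16 KiB stream yields four chunks, but only two are written; the recipe of fingerprints is enough for `restore` to rebuild the original stream.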
Electronic copy available at: https://ssrn.com/abstract=3351012