INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND SOFTWARE ENGINEERING (ICACSE-19)

A Study of Data De-duplication Methods

Dinesh Mishra and Dr. Sanjeev Patwa

Abstract: Data is one of the most important assets of any organization, whether for productive use or for profit. The rapid growth of data in many varieties is a serious problem to handle and process. Data is generated at an ever higher rate and must be stored in databases without duplication. Deduplication is an approach that eliminates duplicated data from databases and supports data backup. Numerous deduplication algorithms are available; they detect and eliminate redundant data and store a single unique copy of the data contents. In this paper, we first survey the background and key features of data de-duplication, and then classify the research in data de-duplication according to the key strategy of the de-duplication process. This summary and classification of the state of the art in de-duplication helps identify and understand the most important design considerations for data de-duplication systems. Finally, we outline the open problems and future research directions for de-duplication-based storage systems.

Keywords: Data de-duplication; data reduction; Level of de-duplication; de-duplication approaches; storage systems.



I. INTRODUCTION

Deduplication is becoming increasingly important because it can effectively reduce the storage space used on cloud servers. The exponential growth of data volumes makes it necessary to explore techniques such as data deduplication to keep data manageable and to reduce archive and backup costs. With the rapid growth of cloud data volume, deduplication technology has become important to cloud storage: it can eliminate redundant copies of user-uploaded data to save storage space and reduce the management cost of cloud storage servers [1].

The use of the cloud by companies and individuals for storing, backing up, and sharing data has increased enormously over the past few years. Data deduplication is a commonly used method to reduce storage requirements in data centers and enterprise servers. It operates by identifying and removing duplicate blocks of data over long ranges. For example, consider a corporate logo used in many slide decks of that corporation. An enterprise storage server using deduplication can store only the first occurrence of the logo and replace subsequent occurrences with pointers to the stored copy [25]. De-duplication is a data compression technique for redundant data reduction [5]. Today, an average of 13% of IT budgets is spent on storage capacity, and data is set to grow even more quickly, according to IDC’s Digital Universe study [3].
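The logo example can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the paper: the class name `DedupStore`, the object names, and the choice of SHA-256 as the content hash are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy object store that keeps one physical copy per unique content.

    Hypothetical class for illustration; SHA-256 is an assumed hash choice.
    """

    def __init__(self):
        self.blobs = {}     # digest -> bytes (the single stored copy)
        self.pointers = {}  # object name -> digest (the "pointer")

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True if new physical data was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data   # first occurrence: store the bytes
        self.pointers[name] = digest    # every occurrence: store only a pointer
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer back to the single stored copy."""
        return self.blobs[self.pointers[name]]


store = DedupStore()
logo = b"corporate-logo-bytes"                # stand-in for the image content
first = store.put("deck1/logo.png", logo)     # writes the logo once
second = store.put("deck2/logo.png", logo)    # adds a pointer only
```

After both calls the store holds two names but a single physical copy, which is the saving the logo example describes.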

Ph.D. Scholar, Dept. of Comp. Sc. & Engg., School of Engg. & Tech., MODY University, Lakshmangarh, Rajasthan, India

Asst. Prof., Dept. of Comp. Sc. & Engg., School of Engg. & Tech., MODY University, Lakshmangarh, Rajasthan, India

E-mail: dmishra1475@gmail.com, sanjeevpatwa.cet@modyuniversity.ac.in

These impacts create problems such as degraded performance and higher operational costs. To overcome these problems and keep systems manageable, the concept of de-duplication was introduced.

Data de-duplication refers to the elimination of redundant data by physically storing only the data that is unique. This technique effectively reduces storage capacity requirements and is applicable whenever multiple copies of the same data set need to be stored. De-duplication reduces the required storage capacity because only a single copy of the data is stored. Research carried out in the area of data de-duplication includes [17] and [20].

In general, data de-duplication increases the speed of services and reduces costs. It also improves the efficiency of disk-based backups.

• De-duplication reduces storage cost because it reduces the amount of physical capacity required for a backup job.

• Because de-duplication reduces the amount of disk needed to support a backup job, it also reduces the power, space, and cooling requirements of the disks.

II. DE-DUPLICATION PROCESS

The de-duplication process has four main stages: chunking, fingerprinting, indexing, and writing [25].

Figure 1: De-duplication process (Chunking → Fingerprinting → Indexing → Writing)
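The four stages can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not specify: fixed-size chunks of a deliberately tiny size, SHA-256 fingerprints, an in-memory dictionary as the index, and a list standing in for the storage device. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # assumed, unrealistically small chunk size for illustration


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Chunking: split the input stream into fixed-size pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece: bytes) -> str:
    """Fingerprinting: derive a content hash that identifies one chunk."""
    return hashlib.sha256(piece).hexdigest()


def deduplicate(data: bytes, index: dict, storage: list):
    """Run all four stages; return a recipe of storage positions."""
    recipe = []
    for piece in chunk(data):
        fp = fingerprint(piece)
        if fp not in index:               # Indexing: look the fingerprint up
            storage.append(piece)         # Writing: store only unseen chunks
            index[fp] = len(storage) - 1
        recipe.append(index[fp])          # duplicates become references
    return recipe


def restore(recipe, storage) -> bytes:
    """Rebuild the original data from the recipe and the unique chunks."""
    return b"".join(storage[pos] for pos in recipe)


index, storage = {}, []
data = b"ABCDABCDXYZ1ABCD"                # four chunks, three of them identical
recipe = deduplicate(data, index, storage)
```

Here the input splits into four chunks of which only two are unique, so `storage` holds two chunks while `recipe` records four references, and `restore(recipe, storage)` returns the original bytes.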

Electronic copy available at: https://ssrn.com/abstract=3351012
