
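The paper "On the Optimal Provider Selection for Repair in Distributed Storage System with Network Coding" is a research paper on distributed storage systems, network coding, provider selection, and data routing optimization.

In a distributed storage system (DSS), reliability is provided by distributing redundant data across storage servers on the Internet. Network coding (NC) has been widely studied for DSS because it can improve reliability while keeping repair time low. When an unavailable storage server must be replaced, a new server, called the newcomer, first takes its place. Several storage servers, called providers, are then selected from the surviving servers and send their coded data over the Internet to the newcomer so that the lost data can be regenerated. In a large-scale DSS, provider selection and data routing during the regeneration phase therefore have a large impact on the regeneration time.

The paper studies how to select the optimal providers and route data in a DSS with NC so as to minimize the regeneration time. It first defines the problem for a DSS with NC. For the case where the providers are given, it models the problem as a mathematical program. Building on that program, it formulates the joint optimal provider selection and data routing problem as an integer linear program and develops an efficient near-optimal algorithm based on linear programming relaxation (BLP). Extensive simulations show the effectiveness of the proposed algorithm.

The authors (Chengjin Jia, Jin Wang, and others) are affiliated with several institutions in China and the United States, so the work is a multi-institution, international collaboration. The keywords are network coding, distributed storage system, provider selection, routing, linear programming, and LP relaxation, which name the core concepts and theoretical tools of the paper.

The introduction notes that the rapid growth of big data and the resulting information explosion place higher demands on the reliability and efficiency of storage systems. When building and maintaining large-scale DSSs, reducing data repair time while guaranteeing reliability is therefore a key research problem, and network coding is a promising technique for addressing it.

The main topics covered are:

1. Reliability in distributed storage systems: how spreading redundant data across servers provides reliability.
2. Network coding in storage systems: how NC improves the reliability of a DSS and reduces repair time.
3. The role of provider selection and data routing in the regeneration phase: how suitable providers are chosen during repair and how their data is routed to the newcomer to regenerate the lost data.
4. Modeling the optimization problem: how provider selection and data routing are cast as a mathematical program, and how LP relaxation is used to derive an approximation algorithm.
5. Evaluating the approximation algorithm: simulations showing that the algorithm is effective at achieving near-optimal regeneration time.

Together, these points outline the core problems and methods the paper addresses in distributed storage and network coding.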


On the Optimal Provider Selection for Repair in Distributed Storage System with Network Coding

Chengjin Jia, Jin Wang, Yanqin Zhu, Xin Wang, Kejie Lu, Xiumin Wang, and Zhengqing Wen

Department of Computer Science and Technology, Soochow University, Suzhou 215006, China

wjin1985@suda.edu.cn

School of Computer Science, Fudan University, Shanghai 200433, China

College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200444, China

Department of Electrical and Computer Engineering, University of Puerto Rico at Mayagüez, Mayagüez 00681-9000, USA

School of Computer and Information, Hefei University of Technology, Hefei 230000, China

Abstract. In large-scale distributed storage systems (DSS), reliability is provided by redundancy spread over storage servers across the Internet. Network coding (NC) has been widely studied in DSS because it can improve reliability while keeping repair time low. To maintain reliability, an unavailable storage server should first be replaced by a new server, called the newcomer. Then, multiple storage servers, called providers, are selected from the surviving servers and send their coded data through the Internet to the newcomer to regenerate the lost data. Therefore, in a large-scale DSS, provider selection and data routing during the regeneration phase have a great impact on the regeneration time. In this paper, we investigate the problem of optimal provider selection and data routing for minimizing the regeneration time in a DSS with NC. Specifically, we first define the problem in the DSS with NC. For the case where the providers are given, we model the problem as a mathematical program. Based on this mathematical program, we then formulate the optimal provider selection and data routing problem as an integer linear programming problem and develop an efficient near-optimal algorithm based on linear programming relaxation (BLP). Finally, extensive simulation experiments have been conducted, and the results show the effectiveness of the proposed algorithm.
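To make the optimization concrete, here is a small sketch under stated assumptions; it is not the authors' mathematical program or their BLP algorithm. It assumes (hypothetically) a fluid model in which links are undirected with capacities in Mbps, each provider's data may be split over several paths, and every provider must deliver β Mb. For a fixed provider set all flows share one sink (the newcomer), so a single max-flow from a hypothetical super source (`SRC`) tells whether every provider can stream at a common rate; bisection finds the largest such rate, and the regeneration time is β divided by it. Provider selection is then an exhaustive search over all d-subsets, an exponential baseline of the kind an efficient method is needed to avoid. All function and node names are illustrative.

```python
from collections import deque
from itertools import combinations

SRC = "__super_source__"  # hypothetical super-source node name


def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity map (floats)."""
    res = {u: dict(vs) for u, vs in cap.items()}  # residual capacities
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow and update reverse edges
            u = parent[v]
            res[u][v] -= bottleneck
            res.setdefault(v, {})
            res[v][u] = res[v].get(u, 0.0) + bottleneck
            v = u
        total += bottleneck


def feasible(links, providers, newcomer, rate):
    """Can every provider stream `rate` to the newcomer at the same time?"""
    cap = {}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap[u][v] = cap[u].get(v, 0.0) + c

    for (u, v), c in links.items():  # undirected link: capacity c each way
        add(u, v, c)
        add(v, u, c)
    for p in providers:  # super source feeds each provider at `rate`
        add(SRC, p, rate)
    return max_flow(cap, SRC, newcomer) >= len(providers) * rate - 1e-9


def regeneration_time(links, providers, newcomer, beta, iters=60):
    """beta divided by the largest common per-provider rate (bisection)."""
    lo, hi = 0.0, sum(links.values())
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(links, providers, newcomer, mid):
            lo = mid
        else:
            hi = mid
    return beta / lo if lo > 0 else float("inf")


def select_providers(links, candidates, newcomer, d, beta):
    """Exhaustive search over all d-subsets (exponential; baseline only)."""
    best = None
    for subset in combinations(candidates, d):
        t = regeneration_time(links, subset, newcomer, beta)
        if best is None or t < best[0]:
            best = (t, subset)
    return best
```

For instance, on a hypothetical topology where F1 and F4 reach the newcomer N over disjoint 100 Mbps paths while the other candidates either share a link or have narrower links, `select_providers(links, ["F1", "F2", "F3", "F4"], "N", 2, 75)` picks `("F1", "F4")` with a regeneration time of about 0.75 s.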

Keywords: Network coding · Distributed storage system · Provider selection · Routing · Linear programming · LP relaxation

1 Introduction

With the rapid growth of big data, the information explosion has driven a rapid expansion of data storage. About 5 exabytes of independent information were created in 2015, and data center traffic is projected to reach 8.6 zettabytes by 2018 [1]. Consequently, large-scale DSSs, e.g., Google File System [2] and Azure [3], are widely used to achieve high reliability by storing data redundantly over multiple unreliable storage servers.

© Springer International Publishing Switzerland 2015

G. Wang et al. (Eds.): ICA3PP 2015, Part IV, LNCS 9531, pp. 506–520, 2015.

DOI: 10.1007/978-3-319-27140-8

Reliability, in the sense that users can access their data anywhere and at any time, is one of the basic requirements for these DSSs. The traditional methods for providing reliability in DSSs include replication and Reed-Solomon codes [4]. NC was proposed in 2000 to increase network throughput, balance network load, and so on [5]. It has been shown that distributed storage applications can benefit considerably from NC [6]. NC preserves the maximum distance separable (MDS) property of erasure codes: the original file is divided into k packets, which are then encoded into n coded packets [7], and users can recover the original file from any k of the n coded packets. As a result, NC in DSS has attracted increasing research attention.
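The "any k of n" property can be illustrated with a small linear code over a prime field. This is a minimal sketch, assuming symbols are integers modulo the prime 257 and using a Vandermonde coefficient matrix, which guarantees that any k coded packets are decodable. Random linear network coding would instead draw the coefficients at random and satisfy this property only with high probability, so the Vandermonde choice is a deterministic stand-in, not the coding scheme of [7]. The names `encode`, `decode`, and `P` are illustrative.

```python
import itertools

P = 257  # prime field size (assumption: each symbol is an integer mod 257)


def encode(packets, n):
    """Encode k equal-length packets into n coded packets over GF(P).

    Row i of the coefficient matrix is (1, x, x^2, ..., x^(k-1)) with
    x = i + 1. Distinct x values make it a Vandermonde matrix, so any
    k rows are linearly independent (the MDS property).
    """
    k = len(packets)
    if n > P - 1:
        raise ValueError("need n distinct nonzero evaluation points")
    coded = []
    for i in range(n):
        coeffs = [pow(i + 1, j, P) for j in range(k)]
        payload = [sum(c * pkt[s] for c, pkt in zip(coeffs, packets)) % P
                   for s in range(len(packets[0]))]
        coded.append((coeffs, payload))
    return coded


def decode(subset):
    """Recover the k original packets from any k coded packets by
    Gauss-Jordan elimination over GF(P)."""
    k = len(subset)
    rows = [list(coeffs) + list(payload) for coeffs, payload in subset]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # inverse via Fermat's little theorem
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]


# Usage: k = 2 original packets, n = 4 coded packets; any 2 suffice.
original = [[10, 20, 30], [40, 50, 60]]
coded = encode(original, n=4)
for subset in itertools.combinations(coded, 2):
    assert decode(list(subset)) == original
print("recovered from every 2-of-4 subset")
```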

Although NC can improve storage reliability, data in distributed storage systems is still prone to loss, e.g., through server outages, intrusions by hackers, or disk damage. To keep the same level of reliability when a server fails or leaves the system, a new server has to join the system and access existing servers to regenerate the lost data, which consumes repair bandwidth and incurs regeneration time. Based on the ideas of NC, functional minimum storage regenerating (FMSR) codes have been proposed to reduce the repair bandwidth, and thereby the regeneration time, in DSS [8,9].

Although FMSR codes can significantly reduce repair bandwidth, they cannot ensure that the regeneration time is minimized. To reduce the regeneration time, Li et al. proposed tree-structured data regeneration in heterogeneous networks [10,11]. Most current studies regenerate the lost data from multiple surviving servers under the assumption that the bandwidth of the path between each server and the newcomer is given. However, each link in the physical network may be shared by multiple paths, so its bandwidth must be divided among those paths. Therefore, in practice, the assumed bandwidth of the routing path from each selected server, i.e., provider, to the newcomer may not be achievable.
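The effect of link sharing can be shown with a toy calculation. This sketch assumes fixed single-path routing and that concurrent flows split link capacity max-min fairly, computed by progressive filling; both are illustrative assumptions rather than the paper's model, and the topology (providers P1 and P2, router R, newcomer N) is hypothetical, not the one in Fig. 1(a). Each provider's path has a nominal bottleneck of 100 Mbps, yet both share the last link into the newcomer, so each actually gets 50 Mbps and the time to deliver β doubles.

```python
def nominal_rate(path, capacity):
    """Bandwidth a path would get if it had every link to itself."""
    return min(capacity[link] for link in path)


def max_min_fair(paths, capacity, eps=1e-9):
    """Progressive filling: raise all unfrozen flows' rates equally until
    some link saturates, freeze the flows crossing it, and repeat."""
    rate = {f: 0.0 for f in paths}
    remaining = dict(capacity)
    active = set(paths)
    while active:
        users = {}  # number of still-active flows on each link
        for f in active:
            for link in paths[f]:
                users[link] = users.get(link, 0) + 1
        step = min(remaining[link] / n for link, n in users.items())
        for f in active:
            rate[f] += step
        for link, n in users.items():
            remaining[link] -= step * n
        full = {link for link in users if remaining[link] <= eps}
        active = {f for f in active if not full.intersection(paths[f])}
    return rate


# Hypothetical topology: providers P1 and P2 reach newcomer N via router R.
capacity = {"P1-R": 100.0, "P2-R": 100.0, "R-N": 100.0}
paths = {"P1": ["P1-R", "R-N"], "P2": ["P2-R", "R-N"]}
beta = 75.0  # Mb per provider, as in the paper's example

fair = max_min_fair(paths, capacity)
for p in paths:
    # transfer time assuming the nominal rate vs. the shared (fair) rate
    print(p, beta / nominal_rate(paths[p], capacity), beta / fair[p])
# prints:
# P1 0.75 1.5
# P2 0.75 1.5
```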

Next, we introduce an example that shows how selecting a given number of servers as providers, and the routing paths from those providers to the newcomer, affect the regeneration of the lost data. Figure 1(a) gives the original network topology, which includes routers denoted $R_i$ and storage servers denoted $F_i$. In this example, each server $F_i$ stores different coded packets of the same file. When a server becomes unavailable, to keep the same level of reliability, a new server should be installed to replace it and acquire data packets from multiple available storage servers to regenerate the lost data; in this example, we therefore denote the newcomer by the same label as the server it replaces. We assume that the number of providers, denoted $d$ in the rest of the paper, is 3, and that the size of the file is $M = 300$ Mb. With the minimum-storage regenerating code [12,13], each server stores $\alpha = M/k = 150$ Mb of data (so $k = 2$), and the newcomer needs to download $\beta = \alpha/(d - k + 1) = 75$ Mb of data from each provider. The bandwidths of the links range from 30 Mbps to 100 Mbps. As shown in Fig. 1(a), the maximum transmission rate from each storage server to

