

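In network performance optimization, traditional approaches to optimizing distributed applications rely on assumptions about the network topology, or directly use several measurements of pair-wise network performance as the basis for optimization. On IaaS (Infrastructure-as-a-Service) clouds, however, the network topology is invisible to users, and directly used measurements may not represent long-term performance. With virtualization now widespread, how to revisit network performance aware optimization on IaaS clouds has become a new research question. This paper proposes a new approach whose core idea is to separate the constant component from the dynamic network performance and to minimize the difference between the two with a mathematical method, Robust Principal Component Analysis (RPCA). The authors use the constant component to guide network performance aware optimizations, and demonstrate the approach on MPI collective communications, generic topology mapping, and two real-world applications, N-body and conjugate gradient (CG). Experiments on Amazon EC2 and simulations show significant performance improvement from guiding the optimizations this way.

Cloud computing, and IaaS clouds in particular, provides a convenient computing infrastructure for many distributed applications. Many scientific and data-intensive applications have been deployed on Amazon EC2, Windows Azure and Google Compute Engine, including life sciences, physics, big data processing and others listed in the Amazon case studies. Compared with traditional cluster and grid computing environments, cloud computing offers on-demand virtual machines for computation and storage, which users can buy from public cloud providers with a credit card.

Network performance aware optimization is an effective way to optimize distributed applications in traditional network environments. On IaaS clouds, however, virtualization hides the network topology from users, and directly using a few measurements of inter-node network performance can miss the long-term behavior. The authors therefore propose a framework that identifies and separates the stable component from dynamically changing network performance and uses it as the basis for optimization.

RPCA is the key mathematical method behind this separation. It extracts the main features of the network performance data while removing noise, which gives a more stable and reliable view of network performance to feed into the optimizations. The experimental results show significant gains for MPI collective communication, generic topology mapping, N-body simulation and conjugate gradient.

The work points to a new way of optimizing network performance on IaaS clouds and offers both theoretical and practical grounding for tuning distributed applications in the cloud. As cloud computing spreads further, methods tailored to the characteristics of the cloud will matter more. The work helps improve the experience of cloud users and gives cloud providers a useful reference for delivering high-quality service.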


Finding Constant From Change: Revisiting Network Performance Aware Optimizations on IaaS Clouds

Yifan Gong

NEWRI

Interdisciplinary Graduate School

Nanyang Technological University, Singapore

Bingsheng He

School of Computer Engineering

Nanyang Technological University, Singapore

Dan Li

Tsinghua University, China

Abstract—Network performance aware optimizations have long been an effective approach to optimizing distributed applications on traditional network environments. However, the assumptions of network topology or direct use of several measurements of pair-wise network performance for optimizations are no longer valid on IaaS clouds. Virtualization hides network topology from users, and direct use of network performance measurements may not represent long-term performance.

To enable existing network performance aware optimizations on IaaS clouds, we propose to decouple the constant component from dynamic network performance while minimizing the difference by a mathematical method called RPCA (Robust Principal Component Analysis). We use the constant component to guide network performance aware optimizations and demonstrate the efficiency of our approach by adopting network aware optimizations for collective communications of MPI and generic topology mapping as well as two real-world applications, N-body and conjugate gradient (CG). Our experiments on Amazon EC2 and simulations demonstrate significant performance improvement on guiding the optimizations.

Keywords—Cloud Computing, Network Performance Aware Optimization, RPCA

I. INTRODUCTION

Infrastructure-as-a-service (IaaS) clouds have emerged as a popular computing infrastructure for many distributed applications. For example, many scientific and data-intensive applications have been deployed in Amazon EC2, Windows Azure and Google Compute Engine, including life sciences [26], [29], physics [31], [32], [28], [40], big data processing [17], [8] and others listed in Amazon case studies [1]. Compared with the traditional cluster and grid computing environments, cloud computing offers on-demand virtual machines in the pay-as-you-go manner. Anyone with a credit card can buy computational resources and storage from public cloud providers. Due to the pay-as-you-go nature, performance optimizations are important not only for improving productivity but also for reducing the total cost of ownership. Network performance is often a key issue for the overall performance of distributed applications. Although there have been many research studies on designing novel network bandwidth allocations (e.g., [43], [2], [33]) or data center networks [16] for IaaS clouds, little attention has been paid to how applications can adapt their optimizations to IaaS clouds. Therefore, this paper revisits the network performance aware optimizations on IaaS clouds.

Network performance aware optimizations have long been an effective approach to optimizing distributed applications on traditional network environments (e.g., local clusters and grids [39], [24], [21], [38], [3]). Those optimizations assume a priori knowledge of the network topology or directly use several measurements of network performance. Essentially, those assumptions rely on estimating or measuring the all-link network performance in a cluster [19], [3]. Given the all-link performance, communication links are carefully selected to minimize the network transfer time of the application. For example, one could select the best performing links for constructing the communication tree in an MPI collective operation [3].
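To make the idea of link selection concrete, here is a minimal sketch of building a broadcast tree from an all-link performance matrix. It is an illustration only and not the algorithm of [3]: it assumes a symmetric latency matrix and uses a greedy, Prim-style rule that attaches each node through its cheapest link to the tree. The function name `build_broadcast_tree` and the sample latencies are hypothetical.

```python
import numpy as np

def build_broadcast_tree(latency, root=0):
    """Greedy (Prim-style) tree: repeatedly attach the unreached node that
    has the cheapest link from any node already in the tree.
    Returns a dict mapping each node to its parent (the root maps to None)."""
    n = latency.shape[0]
    parent = {root: None}
    while len(parent) < n:
        best = None  # (cost, node in tree, node outside tree)
        for u in parent:
            for v in range(n):
                if v in parent:
                    continue
                if best is None or latency[u, v] < best[0]:
                    best = (latency[u, v], u, v)
        _, u, v = best
        parent[v] = u
    return parent

# Hypothetical 4-node latency matrix in ms; entry [i, j] is the cost of link i -> j.
latency = np.array([
    [0.0, 1.0, 5.0, 9.0],
    [1.0, 0.0, 1.5, 6.0],
    [5.0, 1.5, 0.0, 2.0],
    [9.0, 6.0, 2.0, 0.0],
])
tree = build_broadcast_tree(latency, root=0)
print(tree)
```

A tree built this way minimizes the total cost of the selected links, which is not the same as minimizing broadcast completion time; it only shows how a performance matrix can drive the choice of links.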

When revisiting the network performance aware optimizations on IaaS clouds, we start by studying the network performance of a virtual cluster (a set of virtual machines). Data centers consisting of tens of thousands of commodity servers are the underlying infrastructure for IaaS clouds. Previous studies have examined the impact of virtualization [41] and network interference [4] in IaaS clouds. Machine pairs can have very different network performance, as shown in previous studies [14], [2]. That means link selection continues to be important in virtual clusters, and network performance aware optimizations are still important for improving application performance, especially for communication-intensive applications. A natural question is whether and how we can apply existing network performance aware optimizations on virtual clusters of IaaS clouds.

Unfortunately, we find that the assumptions of existing network performance aware optimizations are no longer valid on IaaS clouds. The topology information is unavailable or inaccurate in virtual clusters. Virtualization hides the network hardware and topology from users, without exposing the actual configurations of the underlying hardware. Moreover, due to cloud system dynamics such as virtual machine consolidation [37], flexible resource management [25] and dynamic network flow scheduling [4], static topology information is no longer sufficient for representing the network performance. Some recent studies [9], [10] make optimization decisions based on only a few ad-hoc measurements of the end-to-end performance. However, such direct use of measurements is inherently affected by network dynamics and does not accurately reflect the long-term performance.

To enable existing network performance aware optimizations on IaaS clouds, we propose to decouple the constant component from the dynamic network performance while minimizing the difference between the network performance and the constant component. In our work, we treat the constant component as the component in the network performance that lasts for a long period until we observe some significant changes in the network performance. The difference can also be considered as error, since we use the constant component to guide network performance aware optimizations. It is a non-trivial task to find the constant component from dynamic network performance.

SC14, November 16-21, 2014, New Orleans, LA, USA
978-1-4799-5500-8/14/$31.00 © 2014 IEEE

Interestingly, this problem can be cast into a common problem in computer vision, named RPCA (Robust Principal Component Analysis) [6]. RPCA solves the following problem: for a data matrix, it identifies a low-rank component and a sparse component with minimized norm, subject to the constraint that the sum of the two components equals the data matrix. Many important applications have data that can naturally be modeled as a low-rank plus a sparse component [6]. Specifically, we develop a novel approach based on RPCA with special design and optimizations for practical use on IaaS clouds, and leverage the theoretical properties of RPCA to find the constant component from the dynamic network performance. We model each row of the data matrix as one snapshot of all-link performance for the cloud at a certain point in time, and apply RPCA to that data matrix to obtain the constant component and the error as the low-rank and sparse components, respectively.
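The data layout described above can be sketched as follows, assuming each snapshot is stored as an N x N matrix of pair-wise measurements that is flattened into one row. The helper name `snapshots_to_matrix`, the bandwidth values and the noise level are hypothetical and only illustrate the shape of the input to RPCA.

```python
import numpy as np

def snapshots_to_matrix(snapshots):
    """Stack T snapshots, each an N x N all-link matrix, into a T x (N*N)
    data matrix with one flattened snapshot per row."""
    return np.stack([np.asarray(s, dtype=float).ravel() for s in snapshots])

# Three hypothetical snapshots of a 3-VM virtual cluster (bandwidth in MB/s).
rng = np.random.default_rng(0)
base = np.array([[0.0, 100.0, 40.0],
                 [100.0, 0.0, 60.0],
                 [40.0, 60.0, 0.0]])
snapshots = [base + rng.normal(0.0, 1.0, base.shape) for _ in range(3)]

A = snapshots_to_matrix(snapshots)
print(A.shape)  # one row per snapshot, one column per link
```

With this layout, a link whose performance barely changes over time produces a nearly constant column, which is why a low-rank component is a natural model for the constant part.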

This seemingly simple design of decoupling the constant component from network performance enables existing or new network performance aware optimizations in virtual clusters. Based on the constant component, conventional network performance optimizations become valid, i.e., we can select the best performing links with the minimized errors. On the other hand, with the error component, we are able to determine the effectiveness of network performance aware optimizations in virtual clusters, e.g., if the error is too large, the network of the IaaS cloud is too dynamic and network performance aware optimizations are useless.
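One way to turn the error component into a go/no-go signal is a relative error ratio. The sketch below assumes the ratio of entrywise L1 norms of the error matrix E and the data matrix A, together with a hypothetical threshold of 0.2. Neither the metric nor the threshold is taken from this excerpt; both are placeholders for whatever criterion is actually used.

```python
import numpy as np

def relative_error(A, E):
    """Share of the measured performance accounted for by the error
    component, using entrywise L1 norms (an assumed metric)."""
    return np.abs(E).sum() / np.abs(A).sum()

def optimization_worthwhile(A, E, threshold=0.2):
    """True if the network looks stable enough for the constant component
    to be a useful guide (threshold is a hypothetical value)."""
    return relative_error(A, E) < threshold

# Hypothetical decomposition: 4 snapshots of 3 links, one transient dip.
D = np.tile([100.0, 40.0, 60.0], (4, 1))  # constant component
E = np.zeros_like(D)
E[1, 2] = -30.0                            # error component
A = D + E

ratio = relative_error(A, E)
print(ratio, optimization_worthwhile(A, E))
```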

We conduct our experiments with two complementary approaches: one is with the calibration on Amazon EC2 and the other is with a simulator based on ns-2. The first experiment is to assess our approach in the public cloud, and the latter one is for full control of the network traffic on a large-scale cluster. We assess the impact of network performance aware optimizations on two kinds of basic applications including collective communications of MPI (Message Passing Interface) [39] and the generic topology mapping strategy [19] as well as two real-world applications, N-body and conjugate gradient (CG).

Our experiments show that our RPCA-based approach can determine the degree of network dynamics for virtual clusters in the cloud. We find that the current network of Amazon EC2 is relatively stable, and network performance aware optimizations are still important on Amazon EC2. Moreover, our RPCA-based approach effectively guides the network performance aware optimizations. On Amazon EC2, the proposed approach significantly improves the performance, reducing the average elapsed time of broadcast and scatter of MPI and topology mapping by 20–40% and 8–20% over the baseline approach and the approach based on direct use of the network measurements. For N-body and CG, the average improvement can reach 25% and 31% over the baseline, respectively. In the simulation on ns-2, we compare with the topology-aware algorithm [21], [38] and find that our approach obtains 25–40% performance improvement.

The rest of the paper is organized as follows. We introduce the preliminaries and related work on cloud networks, RPCA and two examples of network performance aware optimizations in Section II. We present the problem definition in Section III, and the RPCA-based approach in Section IV. In Section V, we show our experimental results. Finally, we conclude this paper in Section VI.

II. PRELIMINARY AND RELATED WORK

In this section, we briefly introduce the preliminaries and the related work that are closely related to our study.

A. Cloud Network

Previous studies (e.g., [2], [41], [8]) have shown significant variability between network performance of different machines in data centers. The network performance variability negatively impacts application performance, and also makes traditional network performance aware optimizations (e.g., [19], [3]) infeasible. To avoid re-inventing all those network performance aware optimizations, this paper develops a new approach to capture the long-term network performance in the cloud, and to make existing and new optimizations applicable to the cloud.

Researchers have developed mechanisms on network bandwidth allocation in order to obtain predictable performance for the user. ElasticSwitch [33], SecondNet [15], Oktopus [2] and TIVC [43] aim at reserving network bandwidth between each pair of VMs to offer guaranteed network bandwidth allocations. Those studies are mainly from the cloud provider’s perspective, whereas this paper optimizes the network performance for virtual clusters created by users, mainly from users’ perspective. Therefore, we do not have the information on the underlying hardware or topology, or the runtime dynamics about network transfers and virtualization details.

Network topology inference techniques have been investigated in the traditional environments [23], [36] and cloud environment [12]. We refer readers to a survey [7] for more details on classic techniques for network topology discovery and inference. The information given by basic diagnostic tools like traceroute is incomplete in the virtualized cloud.

B. Robust Principal Component Analysis

PCA is arguably the most widely used statistical tool for data analysis and dimensionality reduction. However, the accuracy of PCA is sensitive to noise and gross errors in the input data. Robust Principal Component Analysis (RPCA) [6] was proposed to improve the robustness of PCA under noisy or erroneous measurements. The basic idea is to recover a low-rank matrix from a series of corrupted measurements and to minimize the noise component that is assumed to be sparse but unknown. Suppose $A$ is a data matrix, $D$ is a low-rank matrix and $E$ is a sparse matrix. RPCA is to solve the following optimization problem, where $\|E\|_0$ is the zero norm of $E$ (the number of its non-zero entries).

$$
\begin{aligned}
\text{minimize} \quad & \operatorname{rank}(D) + \lambda \|E\|_0 \\
\text{subject to} \quad & A = D + E
\end{aligned}
$$
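The rank and zero-norm objective above is combinatorial, so in practice it is usually replaced by the convex relaxation known as Principal Component Pursuit: minimize $\|D\|_* + \lambda \|E\|_1$ subject to $A = D + E$, where $\|D\|_*$ is the nuclear norm. The sketch below solves that relaxation with a basic augmented Lagrangian iteration. It is an illustration under assumptions, not necessarily the solver used in this paper: the function names, the default choices $\lambda = 1/\sqrt{\max(m, n)}$ and $\mu = mn / (4\|A\|_1)$, and the synthetic data are all chosen here for demonstration.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: proximal operator of the entrywise L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(A, lam=None, mu=None, tol=1e-7, max_iter=2000):
    """Split A into low-rank D and sparse E by Principal Component Pursuit."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(A).sum()) if mu is None else mu
    D = np.zeros_like(A)
    E = np.zeros_like(A)
    Y = np.zeros_like(A)  # Lagrange multipliers for the constraint A = D + E
    for _ in range(max_iter):
        D = svd_threshold(A - E + Y / mu, 1.0 / mu)
        E = shrink(A - D + Y / mu, lam / mu)
        R = A - D - E
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(A):
            break
    return D, E

# Synthetic example: 30 snapshots of 20 links with a constant baseline
# (rank-1 matrix) plus 30 sparse transient deviations.
rng = np.random.default_rng(1)
base = rng.uniform(50.0, 100.0, size=20)
D_true = np.tile(base, (30, 1))
E_true = np.zeros_like(D_true)
idx = rng.choice(D_true.size, size=30, replace=False)
E_true.flat[idx] = rng.uniform(-40.0, 40.0, size=30)
A = D_true + E_true

D, E = rpca(A)
print(np.linalg.norm(D - D_true) / np.linalg.norm(D_true))
```

Averaging the rows of `D` would then give one estimate of per-link constant performance, while `E` localizes the transient deviations.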

RPCA has been widely used in computer vision. It can be used to solve many important applications like video
