based form of batching and interrupt moderation; iii)
introduce an extremely simple but very effective paravirtualized extension for the e1000 devices (or other NICs), providing the same performance as virtio and the like with almost no extra complexity; iv) adapt the hypervisor to our high speed VALE [20] backend, and v) characterize the behaviour of device polling under virtualization.
Some of the mechanisms we propose help immensely, especially within packet processing machines (software routers, IDS, monitors...). In particular, the fact that we provide solutions that apply only to the guest, only to the host, or to both makes them applicable even in the presence of constraints (e.g., legacy guest software that cannot be modified, or proprietary VMMs).
In our experiments with QEMU-KVM and e1000 we reached a VM-to-VM rate of almost 5 Mpps with short packets, and 25 Gbit/s with 1500-byte frames, and even higher speeds between a VM and the host. These large speed improvements have been achieved with a very small amount of code, and our approach can be easily applied to other OSes and virtualization platforms. We are pushing the relevant changes to QEMU, FreeBSD and Linux.
In the rest of this paper, Section 2 introduces the necessary background and terminology on virtualization and discusses related work. Section 3 describes in detail the four components of our proposal, whereas Section 4 presents experimental results and also discusses the limitations of our work.
2. BACKGROUND AND RELATED WORK
In our (rather standard) virtualization model (Figure 1), Virtual Machines (VMs) run on a Host which manages hardware resources with the help of a component on the Host called hypervisor or Virtual Machine Monitor (VMM, for brevity). Each VM has a number of Virtual CPUs (VCPUs, typically implemented as threads in the host), and also runs additional IO threads to emulate and access peripherals. The VMM (typically implemented partly in the kernel and partly in user space) controls the execution of the VCPUs, and communicates with the I/O threads.
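As a rough illustration of this decomposition, the sketch below shows a VMM-like process spawning one host thread per VCPU plus an I/O thread; all names (vm_state, vcpu_main, io_main, start_vm) are hypothetical and do not correspond to the actual QEMU symbols.

/*
 * Minimal sketch of the thread structure described above: one host
 * thread per VCPU, plus an I/O thread for asynchronous device work.
 */
#include <pthread.h>

struct vm_state {
    int nvcpus;
    /* ... emulated device and backend state ... */
};

static void *vcpu_main(void *arg)       /* one instance per VCPU */
{
    struct vm_state *vm = arg;
    (void)vm;
    for (;;) {
        /* enter guest mode and run guest code; return to host mode
         * on I/O accesses or interrupts (see Section 2.1) */
    }
}

static void *io_main(void *arg)         /* asynchronous device emulation */
{
    struct vm_state *vm = arg;
    (void)vm;
    for (;;) {
        /* wait for work from the VCPU threads or from the backend,
         * e.g. packets arriving on a tap device or switch port */
    }
}

void start_vm(struct vm_state *vm)
{
    pthread_t tid;
    for (int i = 0; i < vm->nvcpus; i++)
        pthread_create(&tid, NULL, vcpu_main, vm);
    pthread_create(&tid, NULL, io_main, vm);
}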
The way virtual CPUs are emulated depends on the features of the emulated CPU and of the host. The x86 architecture does not lend itself to the trap-and-emulate implementation of virtualization [1], so historical VMMs (VMware, QEMU) relied for the most part on binary translation for "safe" instructions, and calls to emulation code for others. A recent paper [5], long but very instructive, shows how the x86 architecture was virtualized without CPU support. The evolution of these techniques is documented in [1]. A slowdown of 2..10 times can be expected for typical code sequences,
slightly lower if kernel support is available to intercept memory accesses to invalid locations.

Figure 1: In our virtualized execution environment a virtual machine uses one VCPU thread per CPU, and one or more IO threads to support asynchronous operation. The hypervisor (VMM) has one component that runs in userspace (QEMU) and one kernel module (kvm). The virtual switch also runs within the kernel.
Modern CPUs provide hardware support for virtualization (Intel VT-x, AMD-V) [13, 3], so that most of the code for the guest OS runs directly on the host CPU operating in "VM" mode. In practice, the kernel side of a VMM enters VM mode through a system call (typically an ioctl(.. VMSTART ..)), which starts executing the guest code within the VCPU thread, and returns to host mode as described below.
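As a concrete, stripped-down illustration, the loop below shows how a VCPU thread might drive this system call on Linux/KVM, where the ioctl is actually called KVM_RUN. Setup (opening /dev/kvm, KVM_CREATE_VM, KVM_CREATE_VCPU) and error handling are omitted; mmap_size would come from ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0).

/*
 * Sketch of a VCPU thread's main loop using the Linux KVM API.
 */
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

void vcpu_loop(int vcpu_fd, size_t mmap_size)
{
    /* shared area where KVM reports the reason for each VM exit */
    struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu_fd, 0);

    for (;;) {
        ioctl(vcpu_fd, KVM_RUN, 0);   /* run guest code until a VM exit */

        switch (run->exit_reason) {   /* why did we return to host mode? */
        case KVM_EXIT_IO:             /* guest executed an IO instruction */
        case KVM_EXIT_MMIO:           /* guest touched an emulated MMIO region */
            /* hand the access to the device frontend (Section 2.1) */
            break;
        default:
            break;
        }
    }
}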
2.1 Device emulation
Emulation of I/O devices [25] generally interprets accesses to I/O registers and replicates the behaviour of the corresponding hardware. The VMM component reproducing the emulated device is called frontend. Data from/to the frontend are in turn passed to a component called backend which communicates with a physical device of the same type: a network interface or switch port, a disk device, USB port, etc.
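The sketch below illustrates this frontend/backend split in the style of an e1000 network frontend; the types and names are hypothetical (the real QEMU code is organized differently), and only the TX-tail register path is shown.

/*
 * Illustrative frontend/backend split: the frontend interprets guest
 * accesses to the emulated NIC registers, the backend moves the
 * resulting packets to a tap device or virtual switch port.
 */
struct net_backend {
    int (*send)(struct net_backend *be, const void *buf, int len);
    int (*recv)(struct net_backend *be, void *buf, int len);
};

struct nic_frontend {
    struct net_backend *be;    /* where packets ultimately go */
    unsigned int tdh, tdt;     /* e1000-style TX head/tail registers */
    /* ... rest of the emulated register file and ring state ... */
};

#define TDT_REG 0x3818         /* e1000 TX ring tail register offset */

/* Invoked from the VCPU thread on a VM exit caused by an MMIO write. */
void nic_mmio_write(struct nic_frontend *fe, unsigned long reg,
                    unsigned long val)
{
    if (reg == TDT_REG) {
        fe->tdt = (unsigned int)val;
        /* walk the TX descriptors the guest just made available and
         * push each packet to the backend:
         *   fe->be->send(fe->be, pkt_buf, pkt_len);
         */
    }
}

In QEMU the backend can be, for instance, a tap device or, as in our proposal, a port of the VALE virtual switch.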
Access to peripherals from the guest OS, in the form of IO or MMIO instructions, causes a context switch ("VM exit") that returns the CPU to "host" mode. VM exits often occur also when delivering interrupts to a VM. On modern hardware, the cost of a VM exit/VM enter pair and IO emulation is 3..10 µs, compared to the 100-200 ns for IO instructions on bare metal.
The detour into host mode is used by the VCPU thread to interact with the frontend to emulate the actions that the real peripheral would perform on that