RAMCloudpaper资源-CSDN文库

需积分: 10 107 浏览量 2015-02-24 14:20:57 上传评论收藏 728KB PDF 举报

### RAMCloud存储系统：高性能键值存储解决方案 #### 概述 RAMCloud是一种创新的存储系统，它通过在服务器的动态随机存取内存（DRAM）中存储数据来提供低延迟的大规模数据集访问。为了支持大规模容量（1PB或更多），RAMCloud聚合了数千台服务器的内存资源，形成一个统一的键值存储。该系统确保了基于DRAM的数据持久性，通过在辅助存储上保持备份副本，并采用了一种统一的日志结构化机制来管理DRAM和辅助存储，从而实现高性能和高效的内存使用。 #### 通信与性能 RAMCloud采用了基于轮询的通信方式，绕过了内核直接与网络接口卡（NIC）进行通信。这种方式下，客户端应用可以在不到5微秒的时间内从任何RAMCloud存储服务器读取小对象；对于小对象的持久写入操作则大约需要15微秒。值得注意的是，RAMCloud在线状态下不保留多个数据副本，而是通过快速恢复机制（1到2秒内）来确保高可用性。这一恢复机制充分利用了整个集群的并发资源，因此其性能随集群规模增长而提升。 #### 技术细节 RAMCloud的设计涵盖了操作系统中的组织与设计、存储管理（主内存与辅助存储）、分布式内存以及可靠性方面的问题。特别是在设计、实验、性能和可靠性方面进行了深入的研究。该系统的关键词包括数据中心、大规模系统、低延迟和存储系统等，这反映了其在现代大规模分布式环境中的应用前景。 #### 引言部分分析 RAMCloud的引言部分提到了DRAM及其前身——磁芯存储器，在存储系统中的重要作用。例如，20世纪70年代早期版本的UNIX就利用内存中的缓冲区缓存来提高文件系统的性能。过去15年间，随着大型Web应用对大规模数据集处理需求的增加，DRAM在存储系统中的应用显著加速。这些应用处理的数据量之大、访问频率之高，传统磁盘存储无法满足其性能要求。 #### 核心特点 - **低延迟访问**：通过将所有数据存储在DRAM中，RAMCloud能够提供极低延迟的数据访问服务。 - **大规模容量支持**：通过聚合数千台服务器的内存资源，支持PB级别的存储容量。 - **高效的数据管理**：采用日志结构化的管理机制，优化了DRAM和辅助存储的使用效率。 - **数据持久性和高可用性**：通过在辅助存储上保存数据副本确保数据的持久性，并通过快速恢复机制提高系统可用性。 - **高性能通信机制**：通过绕过内核直接与网络接口卡通信的方式，显著提升了数据读写的性能。 #### 结论 RAMCloud作为一种高性能键值存储解决方案，通过一系列创新技术实现了对大规模数据集的低延迟访问和支持。它的设计思路不仅解决了当前大规模数据处理中的性能瓶颈问题，还为未来的数据中心提供了重要的参考模型。随着硬件技术的发展和应用需求的增长，RAMCloud有望成为未来高性能存储领域的关键技术之一。

资源推荐

资源详情

资源评论

The RAMCloud Storage System

JOHN OUSTERHOUT, ARJUN GOPALAN, ASHISH GUPTA, ANKITA KEJRIWAL,

COLLIN LEE, BEHNAM MONTAZERI, DIEGO ONGARO, SEO JIN PARK, HENRY QIN,

MENDEL ROSENBLUM, STEPHEN RUMBLE, and RYAN STUTSMAN, Stanford University

RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low

latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1 PB or more), it

aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures

the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-

structured mechanism to manage both DRAM and secondary storage, which results in high performance and

efﬁcient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel

to communicate directly with NICs; with this approach, client applications can read small objects from any

RAMCloud storage server in less than 5 µs; durable writes of small objects take about 15 µs. RAMCloud does

not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very

quickly (1–2 seconds). RAMCloud’s crash recovery mechanism harnesses the resources of the entire cluster

working concurrently, so that its performance scales with cluster size.

Categories and Subject Descriptors: D.4.7 [Operating Systems]: Organization and Design—Distributed;

D.4.2 [Operating Systems]: StorageManagement—Main memory; Secondary storage; Distributed memo-

ries; D.4.5 [Operating Systems]: Reliability—Fault-tolerance

General Terms: Design, Experimentation, Performance, Reliability

Additional Key Words and Phrases: Datacenters, large-scale systems, low latency, storage systems

1. INTRODUCTION

DRAM and its predecessor, core memory, have played an important role in storage

systems since the earliest days of operating systems. For example, early versions of

UNIX in the 1970s used a cache of buffers in memory to improve ﬁle system per-

formance [Ritchie and Thompson 1974]. Over the last 15 years the use of DRAM in

storage systems has accelerated, driven by the needs of large-scale Web applications.

These applications manipulate very large datasets with an intensity that cannot be

satisﬁed by disk and ﬂash alone. As a result, applications are keeping more and more

of their long-term data in DRAM. By 2005 all of the major Web search engines kept

their search indexes entirely in DRAM, and large-scale caching systems such as mem-

cached [mem 2011] have become widely used for applications such as Facebook, Twit-

ter, Wikipedia, and YouTube.

Although DRAM’s role is increasing, it is still difﬁcult for application developers to

capture the full performance potential of DRAM-based storage. In many cases DRAM

is used as a cache for some other storage system such as a database; this approach

forces developers to manage consistency between the cache and the backing store, and

its performance is limited by cache misses and backing store overheads. In other cases,

DRAM is managed in an application-speciﬁc fashion, which provides high performance

This work was supported by the Gigascale Systems Research Center and the Multiscale Systems Center

(two of six research centers funded under the Focus Center Research Program, a Semiconductor Research

Corporation program), by C-FAR (one of six centers of STARnet, a Semiconductor Research Corporation pro-

gram, sponsored by MARCO and DARPA), by the National Science Foundation under grant No. 096385, and

by Stanford Experimental Data Center Laboratory afﬁliates Cisco, Emulex, Facebook, Google, Inventec,

Mellanox, NEC, NetApp, Samsung, SAP, and VMware. Stephen Rumble was supported by a Natural Sci-

ences and Engineering Research Council of Canada Postgraduate Scholarship. Diego Ongaro was supported

by The Junglee Corporation Stanford Graduate Fellowship

Author’s addresses: TBD.

This document is currently under submission for publication. It can be cited as “Stanford Technical Report,

October 2014.”

ACM Transactions on Computer Systems, Vol. ??, No. ??, Article 1, Publication date: March ??.

1:2 J. Ousterhout et al.

but at a high complexity cost for developers. A few recent systems such as Redis [red

2014] and Cassandra [cas 2014] have begun to provide general-purpose facilities for

accessing data in DRAM, but their performance does not approach the full potential of

DRAM-based storage.

This paper describes RAMCloud, a general-purpose storage system that keeps all

data in DRAM at all times. RAMCloud combines three overall attributes: low latency,

large scale, and durability. When used with leading-edge networking, RAMCloud of-

fers exceptionally low latency for remote access. In our 80-node development cluster, a

client can read any 100-byte object in less than 5 µs, and durable writes take about 15

µs. In a large datacenter with 100,000 nodes, we expect small reads to complete in less

than 10 µs, which is 50-1000x faster than existing storage systems.

RAMCloud’s second attribute is large scale. In order to support future Web appli-

cations, we designed RAMCloud to allow clusters to grow to at least 10,000 servers.

RAMCloud aggregates all of their memories into a single coherent key-value store.

This allows storage capacities of 1 PB or more.

The third attribute of RAMCloud is durability. Although RAMCloud keeps all data in

DRAM, it also maintains backup copies of data on secondary storage to ensure a high

level of durability and availability. This frees application developers from the need to

manage a separate durable storage system, or to maintain consistency between in-

memory and durable storage.

We hope that low-latency storage systems such as RAMCloud will stimulate the de-

velopment of a new class of applications that manipulate large-scale datasets more

intensively than is currently possible. Section 2 motivates RAMCloud by showing how

the high latency of current storage systems limits large-scale applications, and it spec-

ulates about new applications that might be enabled by RAMCloud.

Sections 3–9 present the RAMCloud architecture from three different angles that

address the issues of latency, scale, and durability:

Storage management. RAMCloud uses a uniﬁed log-structured approach for man-

aging data both in memory and on secondary storage. This allows backup copies to

be made efﬁciently, so that RAMCloud can provide the durability of replicated disk

and the low latency of DRAM. The log-structured approach also simpliﬁes crash re-

covery and utilizes DRAM twice as efﬁciently as traditional storage allocators such

as malloc. RAMCloud uses a unique two-level approach to log cleaning, which max-

imizes DRAM space utilization while minimizing I/O bandwidth requirements for

secondary storage.

Latency. RAMCloud avoids the overheads associated with kernel calls and inter-

rupts by communicating directly with the NIC to send and receive packets, and

by using a polling approach to wait for incoming packets. Our greatest challenge

in achieving low latency has been ﬁnding a suitable threading architecture; our

current implementation pays a signiﬁcant latency penalty in order to provide an

acceptable level of ﬂexibility.

Crash recovery. RAMCloud takes advantage of the system’s scale to recover quickly

from server crashes. It does this by scattering backup data across the entire cluster

and using hundreds of servers working concurrently to recover data from backups

after crashes. Crash recovery is fast enough (typically 1-2 seconds) to provide a high

degree of availability without keeping redundant copies of data online in DRAM.

We have implemented all of the features described in this paper in a working sys-

tem, which we hope is of high enough quality to be used for real applications. The

RAMCloud source code is freely available. This paper corresponds to RAMCloud 1.0 as

of September 2014. Table I summarizes a few key performance measurements; these

are discussed in more detail in the rest of the paper.

ACM Transactions on Computer Systems, Vol. ??, No. ??, Article 1, Publication date: March ??.

The RAMCloud Storage System 1:3

Table I. Selected performance metrics for RAMCloud.

Read latency (100-byte objects, one client, unloaded server) 4.7 µs

Read bandwidth (1 MB objects, one client, unloaded server) 2.7 GB/sec

Write latency (100-byte objects, one client, unloaded server) 15.0 µs

Write bandwidth (1 MB objects, one client, unloaded server) 430 MB/sec

Read throughput (100-byte objects, many clients, single server) 900 Kobjects/sec

Multi-read throughput (100-byte objects, many clients, one server) 6 Mobjects/s

Multi-write throughput (100-byte objects, many clients, one server) 450 Kobjects/s

Crash recovery throughput (per server, unloaded) 800 MB/s or

2.3 Mobjects/s

Crash recovery time (40 GB data, 80 servers) 1.9 s

A few themes appear repeatedly in our presentation of RAMCloud. The ﬁrst theme is

the use of randomization. In order for RAMCloud to be scalable, it must avoid central-

ized functionality wherever possible, and we have found randomization to be a pow-

erful tool for creating simple yet effective distributed algorithms. The second theme

is that we have attempted throughout the system to minimize the number of distinct

error cases that must be handled, in order to reduce the complexity of fault tolerance.

Section 6 will discuss how this often means handling errors at a very high level or a

very low level. Third, the design of the system has been inﬂuenced in several ways

by scaling in underlying technologies such as memory capacity and network speed.

The impact of technology is particulary severe when technologies evolve at different

rates. Section 2 discusses how uneven scaling motivated the creation of RAMCloud,

and Section 10 describes how it also limits the system.

2. WHY DOES LOW LATENCY MATTER?

There are several motivations for RAMCloud [Ousterhout et al. 2011], but the most

important one is to enable a new class of applications by creating a storage system

with dramatically lower latency than existing systems. Figure 1 illustrates why stor-

age latency is an important issue for large-scale Web applications. Before the rise of

the Web, applications were typically run by loading the application code and all of its

data into the memory of a single machine (see Figure 1(a)). This allows the application

to access its data at main memory speeds (typically 50-100 ns); as a result, applications

using this approach can perform intensive data manipulation while still providing in-

teractive response to users. However, this approach limits application throughput to

the capacity of a single machine.

The Web has led to the creation of new applications that support 1M–1B users; these

applications cannot possibly use the single-machine approach of Figure 1(a). Instead,

Web applications run on hundreds or thousands of servers in a datacenter, as shown

in Figure 1(b). The servers are typically divided into two groups: one group services

incoming HTTP requests from browsers, while the other group stores the application’s

data. Web applications typically use a stateless approach where the application servers

do not retain data between browser requests: each request fetches the data that it

needs from storage servers and discards that data once a response has been returned

to the browser. The latency for each fetch varies from a few hundred microseconds

to 10ms or more, depending on the network speed and whether the data is stored in

memory, ﬂash, or disk on the storage server.

Unfortunately, the environment for Web applications has not scaled uniformly com-

pared to the single-machine environment. Total CPU power available to a Web appli-

cation has improved by a factor of 1000x or more in comparison to single-server ap-

plications, and total storage capacity has also improved by a factor of 1000x or more,

ACM Transactions on Computer Systems, Vol. ??, No. ??, Article 1, Publication date: March ??.

1:4 J. Ousterhout et al.

App.

Logic

Storage Servers

Application Servers

Datacenter

0.2-10ms latency

(a) (b)

Fig. 1. In a traditional application (a) the application’s data structures reside in memory on the same

machine containing the application logic and user interface code; the latency for an application to access its

data is determined by the last-level cache miss time (50-100 ns). In a scalable Web application (b) the data is

stored on separate servers from the application logic and user interface code; the latency for an application

to access data over the network ranges from 200-300 µs (if data is cached in the storage server’s DRAM) to

10 ms or more (if data is on disk).

but the latency for an application to access its own data has degraded by 3-5 orders

of magnitude. In addition, throughput has not scaled: if an application makes small

random read requests, the total throughput of a few thousand storage servers in the

conﬁguration of Figure 1(b) is not much more than that of a single server in the conﬁg-

uration of Figure 1(a)! As a result, Web applications can serve large user communities,

and they can store large amounts of data, but they cannot use very much data when

processing a given browser request.

When we began the RAMCloud project in 2009, Facebook used a server structure

similar to that in Figure 1(b) and it was experiencing the problems associated with

high latency [Johnson and Rothschild 2009]. Facebook used MySQL database servers

as the primary repository for its user data. However, these servers could not meet

the needs of the application servers in terms of either latency or throughput, so they

had been supplemented with memcached servers that cached recent query results in

DRAM. By 2009, Facebook had approximately 4000 MySQL servers and 2000 mem-

cached servers. The latency for memcached requests was around 300 µs, and the over-

all hit rate for data in memcached was about 96.5%.

Even so, the high latency of data access limited the functionality of Facebook appli-

cations and created complexity for developers. In order to provide acceptable response

times for users, a Facebook application server could only make 100-150 sequential re-

quests for data (either memcached or MySQL) while servicing a given browser request.

Unfortunately, this limited the functionality that could be provided to users. To get

past this limitation, Facebook applications made concurrent requests whenever possi-

ble. In addition, Facebook created materialized views that aggregated larger amounts

of data in each memcached object, in the hopes of retrieving more useful data with

each request. However, these optimizations added considerable complexity to applica-

tion development. For example, the materialized views introduced consistency prob-

lems: it was difﬁcult to identify all of the memcached objects to invalidate when data

was changed in a MySQL server. Even with these optimizations, applications were still

limited in the amount of data they can access.

There do exist scalable frameworks that can manipulate large amounts of data, such

as MapReduce [Dean and Ghemawat 2008] and Spark [Zaharia et al. 2012]. However,

these frameworks require data to be accessed in large sequential blocks in order to hide

latency. As a result, these frameworks are typically used for batch jobs that run for

ACM Transactions on Computer Systems, Vol. ??, No. ??, Article 1, Publication date: March ??.

The RAMCloud Storage System 1:5

minutes or hours; they are not suitable for online use in large-scale Web applications

or for applications that require random access.

Our goal for RAMCloud is to achieve the lowest possible latency for small random

accesses in large-scale applications; today this is around 5 µs for small clusters and

10 µs in a large datacenter. This represents an improvement of 50-1000x over typical

storage systems used by Web applications today.

We hypothesize that low latencies will simplify the development of data-intensive

applications like Facebook and enable a new class of applications that manipulate

large data sets even more intensively. The new applications cannot exist today, since

no existing storage system could meet their needs, so we can only speculate about

their nature. We believe they will have two overall characteristics: (a) they will access

large amounts of data in an irregular fashion (applications such as graph processing

or large-scale machine learning could be candidates), and (b) they will operate at in-

teractive timescales (tens to hundreds of milliseconds).

One possible application area for a system such as RAMCloud is collaboration at

large scale. As a baseline, Facebook offers collaboration at small scale. It creates a

“region of consciousness” for each user of a few dozen up to a few hundred friends:

each user ﬁnds out instantly about status changes for any of his or her friends. In

the future, applications may enable collaboration at a much larger scale. For example,

consider the morning commute in a major metropolitan area in the year 2025. All of

the cars will be self-driving, moving at high speed in tightly-packed caravans. In a

single metropolitan area there may be a million or more cars on the road at once; each

car’s behavior will be affected by thousands of cars in its vicinity, and the region of

consciousness for one car could include 50,000 or more other cars over the duration

of a commute. A transportation system like this is likely to be controlled by a large-

scale datacenter application, and the application is likely to need a storage system

with extraordinarily low latency to disseminate large amounts of information in an

irregular fashion among agents for the various cars.

3. RAMCLOUD ARCHITECTURE

In order to foster a new breed of data-intensive applications, RAMCloud implements

a new class of storage that provides uniform low-latency access to very large datasets,

and it ensures data durability so that developers do not have to manage a separate

backing store. This section describes the overall architecture of the RAMCloud sys-

tem, including the key-value data model offered to applications and the server-based

organization of the system.

3.1. Data model

RAMCloud’s data model is a key-value store, with a few extensions. We chose this

data model because it is general-purpose enough to support a variety of applications,

yet simple enough to yield a low latency implementation. We tried to avoid features

that limit the system’s scalability. For example, if RAMCloud were to assign a unique

sequential key to each new object in a table, it would require all insertions for the table

to pass through a single server; this feature is not scalable because the overall write

throughput for the table could not be increased by adding servers. Thus, RAMCloud

does not assign unique sequential keys.

Data in RAMCloud is divided into tables, each of which is identiﬁed by a unique

textual name and a unique 64-bit identiﬁer. A table contains any number of objects,

each of which contains the following information:

— A variable-length key, up to 64 KB, which must be unique within its table. We ini-

tially used ﬁxed-length 64-bit values for keys, but found that most applications need

ACM Transactions on Computer Systems, Vol. ??, No. ??, Article 1, Publication date: March ??.

剩余53页未读，继续阅读

评论收藏

内容反馈

cc_wx

粉丝: 0
资源: 3

RAMCloud paper

最新资源

RAMCloud paper

paper

paper_experiments

paper-and-dustbin

RAMCloud源码下载

ramcloud 读测试结果

浅议内存云（RAMCloud）的未来发展

RamCloud:修改后的RamCloud源代码镜像-Source code modification

ramcloud-sqlite3:Sqlite3 VFS 驱动程序使用 RAMCloud 作为后备存储

paper_review

scientific-paper

YOLO paper

jscsi paper

Cotan_paper

介绍RDMA的文章

后Hadoop时代的大数据架构

IEEE paper

paper_learning

paper pass

hci paper

paper_msg

面向内存云的数据块索引方法

raft中文版

Raft论文中文翻译版

面向内存云的协调器选举策略

最新资源