side effects, as we believe this information will be useful to others.
While our experiments were done in the context of RoF, we expect
similar systems to require similar configuration, and we hope that our
methodology and results will help reduce tuning time for systems beyond
the scope of this specific study.
In summary, this paper makes the following contributions:
• We present our RocksDB benchmark results and analysis for
several workloads on two major public clouds, EC2 and GCE;
• We describe our tuning process, the parameters that had the
largest positive effect on performance, and their optimal settings;
and
• We describe negative results: tuning efforts that did not pan out,
reduced performance, or had non-intuitive side effects.
2. BACKGROUND
This section provides a brief overview of Redis, Redis on Flash, and
RocksDB. It describes the high-level architecture of these systems
and provides the background needed to understand the details
discussed in the rest of the paper.
2.1 Redis
Redis (Remote Dictionary Server) [8] is a popular open-source in-
memory key-value store that provides an advanced key-value abstraction.
Redis is single-threaded: it handles a command from just one
client at a time in the process's main thread. Unlike traditional KV
systems, where values are simple data types (usually strings), a Redis
key can be associated with a complex data type such as a hash, list,
set, or sorted set. Furthermore, Redis supports complex atomic
operations on these data types (e.g., enqueuing to and dequeuing from
a list, inserting a new value with a given score into a sorted set, etc.).
The Redis abstraction and its high ingestion speed have proven particu-
larly useful for many latency-sensitive tasks. Consequently, Redis
has gained widespread adoption and is used by a growing number
of companies in production settings [9].
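For illustration, the following minimal sketch uses the hiredis C client (key names such as jobs and leaderboard are hypothetical); each command executes atomically in the Redis main thread:

  #include <hiredis/hiredis.h>

  int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);  /* default Redis port */
    if (c == NULL || c->err) return 1;

    /* List used as a queue: enqueue on one side, dequeue on the other. */
    redisReply *r = (redisReply *)redisCommand(c, "LPUSH jobs job-1");
    freeReplyObject(r);
    r = (redisReply *)redisCommand(c, "RPOP jobs");
    freeReplyObject(r);

    /* Sorted set: insert a member with a given score. */
    r = (redisReply *)redisCommand(c, "ZADD leaderboard 42 player-1");
    freeReplyObject(r);

    redisFree(c);
    return 0;
  }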
Redis supports high availability and persistence. High availability
is achieved by replicating data from the master nodes to the
slave nodes and keeping them in sync. When a master process fails, its
corresponding slave process is ready to take over through a pro-
cess called failover. Persistence can be configured using either of
two options: (1) a point-in-time snapshot file
called RDB (Redis Database), or (2) a change log file called
AOF (Append Only File). Note that all three of these mechanisms (AOF
rewrite, RDB snapshot, and replication) rely on fork() to acquire
a point-in-time snapshot of the process memory and serialize it,
while the main process keeps serving client commands.
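As a small, hedged illustration (again using hiredis; the snapshot thresholds are arbitrary), both persistence options, as well as the fork-based RDB snapshot and AOF rewrite, can be enabled and triggered at runtime; the equivalent directives can also be placed in redis.conf:

  #include <hiredis/hiredis.h>

  int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    /* Option 1: RDB snapshots (here: snapshot if >= 1 key changed within 900s). */
    freeReplyObject(redisCommand(c, "CONFIG SET save %s", "900 1"));
    freeReplyObject(redisCommand(c, "BGSAVE"));           /* forks and writes the RDB file */

    /* Option 2: AOF change log, fsync'ed once per second. */
    freeReplyObject(redisCommand(c, "CONFIG SET appendonly yes"));
    freeReplyObject(redisCommand(c, "CONFIG SET appendfsync everysec"));
    freeReplyObject(redisCommand(c, "BGREWRITEAOF"));     /* forks to compact the AOF */

    redisFree(c);
    return 0;
  }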
2.2 Redis on Flash
In-memory databases like Redis store their data in DRAM. This
makes them fast but expensive, because (1) DRAM capacity per
node is limited, and (2) DRAM price per GB is relatively high. Re-
dis on Flash (RoF) [6, 7] is a commercial extension to Redis that
uses SSDs as a RAM extension to dramatically increase the effec-
tive dataset capacity of a single server. RoF is fully compatible
with open-source Redis and implements the entire Redis com-
mand set and feature set. RoF uses the same mechanisms as Redis to
provide high availability and persistence, rather than relying on the
non-volatility of flash.
RoF keeps hot values in RAM and evicts cold values to the flash
drives. It utilizes RocksDB as its storage engine: all drives are
managed by RocksDB, and accesses to values on the drives are
done via the RocksDB interface. When a client requests a cold value,
the request is temporarily blocked while a designated RoF I/O thread
submits the I/O request to RocksDB. During this time, the main
Redis thread continues to serve incoming requests from other clients.
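RoF's implementation is proprietary, so the following is only a minimal sketch of the pattern just described, with hypothetical names (ColdRead, IoWorker): a dedicated I/O thread performs the blocking RocksDB read while the main thread stays free to serve other clients.

  #include <condition_variable>
  #include <functional>
  #include <mutex>
  #include <queue>
  #include <string>
  #include <thread>
  #include "rocksdb/db.h"

  struct ColdRead {
    std::string key;
    std::function<void(std::string)> on_done;  // resumes the blocked client
  };

  class IoWorker {
   public:
    explicit IoWorker(rocksdb::DB* db) : db_(db), worker_([this] { Run(); }) {}
    ~IoWorker() {
      { std::lock_guard<std::mutex> l(mu_); stop_ = true; }
      cv_.notify_one();
      worker_.join();
    }
    // Called from the main thread: enqueue the read and return immediately.
    void Submit(ColdRead req) {
      { std::lock_guard<std::mutex> l(mu_); queue_.push(std::move(req)); }
      cv_.notify_one();
    }
   private:
    void Run() {
      for (;;) {
        ColdRead req;
        {
          std::unique_lock<std::mutex> l(mu_);
          cv_.wait(l, [this] { return stop_ || !queue_.empty(); });
          if (queue_.empty()) return;  // stop requested and no pending reads
          req = std::move(queue_.front());
          queue_.pop();
        }
        std::string value;
        rocksdb::Status s = db_->Get(rocksdb::ReadOptions(), req.key, &value);  // flash I/O off the main thread
        req.on_done(s.ok() ? std::move(value) : std::string());                 // hand the result back
      }
    }
    rocksdb::DB* db_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<ColdRead> queue_;
    bool stop_ = false;
    std::thread worker_;
  };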
2.3 RocksDB
RocksDB [10] is an open-source key-value store implemented in
C++. It supports operations such as get, put, delete, and scan
on key-value pairs. RocksDB can ingest massive amounts of data. It
uses SST (sorted static table) files to store the data on NVMe SSDs,
SATA SSDs, or spinning disks while aiming to minimize latency.
RocksDB uses Bloom filters to determine whether a key may be present
in an SST file. It avoids the cost of random writes by accumulating data
in memtables in RAM and then flushing them to disk in bulk.
RocksDB files are immutable: once created, they are never over-
written. Records are not updated or deleted in place; instead, new files
are created. This generates redundant data on disk and requires
regular database compaction. Compaction removes duplicate
keys and processes key deletions to free up space, as shown in
Figure 1.
Figure 1: Illustration of the compaction process
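As a hedged sketch of this interface (the database path and keys are illustrative, not RoF's), a minimal RocksDB program configures a Bloom filter and exercises put, get, delete, and scan:

  #include <cassert>
  #include <string>
  #include "rocksdb/db.h"
  #include "rocksdb/filter_policy.h"
  #include "rocksdb/options.h"
  #include "rocksdb/table.h"

  int main() {
    rocksdb::Options options;
    options.create_if_missing = true;

    // Bloom filters on SST blocks let point lookups skip files that
    // cannot contain the key (10 bits per key is a common setting).
    rocksdb::BlockBasedTableOptions table_options;
    table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
    options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));

    rocksdb::DB* db = nullptr;
    rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rof_example", &db);  // illustrative path
    assert(s.ok());

    // Writes land in the in-memory memtable first; flushes create SST files.
    db->Put(rocksdb::WriteOptions(), "user:42", "hot-value");

    std::string value;
    db->Get(rocksdb::ReadOptions(), "user:42", &value);  // point lookup
    db->Delete(rocksdb::WriteOptions(), "user:42");      // logical delete (tombstone)

    // Scan: iterate over all key-values in sorted order.
    rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
    for (it->SeekToFirst(); it->Valid(); it->Next()) {
      // it->key(), it->value()
    }
    delete it;
    delete db;
    return 0;
  }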
2.3.1 RocksDB Architecture
RocksDB organizes data into sorted runs called levels. Each
level has a target size, and the target sizes grow by a fixed
size multiplier (10x by default). Therefore, if the target size of
level 1 is 1GB, the target sizes of levels 2, 3, and 4 will be 10GB, 100GB,
and 1000GB, respectively. A key can appear in multiple levels, but its most up-
to-date value is located higher in the level hierarchy, as older versions
are pushed down during compaction.
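These targets are controlled by two RocksDB options; the sketch below uses real option names, but the values merely mirror the 1GB/10x example above and are not a recommendation:

  #include "rocksdb/options.h"

  rocksdb::Options LevelSizingOptions() {
    rocksdb::Options options;
    // Target size of level 1; deeper levels grow by the multiplier below,
    // so L2 = 10GB, L3 = 100GB, and L4 = 1000GB in this illustrative setting.
    options.max_bytes_for_level_base = 1ULL << 30;  // 1GB
    options.max_bytes_for_level_multiplier = 10;    // default multiplier is 10
    return options;
  }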
RocksDB initially stores new writes in RAM using memtables.
When a memtable fills up, it is converted into an immutable
memtable and inserted into the flush pipeline, at which point a new
memtable is allocated for subsequent writes. Level 0 is an exact
copy of the memtables. When level 0 fills up, the data is compacted,
i.e., pushed down to the deeper levels. The compaction process ap-
plies to all levels and merges files from level N into level
N+1, as shown in Figure 1.
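The write path described above maps onto a handful of RocksDB options; the following sketch uses real option names with purely illustrative values (not the tuned settings discussed later in the paper):

  #include "rocksdb/options.h"

  rocksdb::Options WritePathOptions() {
    rocksdb::Options options;
    options.write_buffer_size = 64 << 20;            // memtable size before it becomes immutable (64MB)
    options.max_write_buffer_number = 4;             // memtables held in RAM (active + immutable)
    options.level0_file_num_compaction_trigger = 4;  // number of L0 files that triggers compaction to L1
    options.max_background_jobs = 8;                 // threads shared by flushes and compactions
    return options;
  }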
2.3.2 Amplification factors
We measured the impact of the optimizations by monitoring the
throughput and duration of the experiments under various work-