
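In-memory multi-version concurrency control (MVCC) is the most popular transaction management scheme in modern database management systems (DBMSs). The basic idea of MVCC is that the DBMS maintains multiple physical versions of each logical object so that operations on the same object can proceed in parallel. MVCC is used in almost every major relational DBMS released in the last decade. It can increase parallelism during transaction processing without sacrificing serializability. Scaling MVCC in a multi-core, in-memory setting is not simple, however: when a large number of threads run at the same time, the synchronization overhead can cancel out the benefits of multi-versioning. To understand how MVCC performs when processing transactions on modern hardware, the researchers conducted an extensive study focused on four key design decisions: concurrency control protocol, version storage, garbage collection, and index management. They implemented state-of-the-art variants of each in an in-memory DBMS and evaluated them using OLTP workloads. The analysis identifies the fundamental bottlenecks of each design choice.

The concurrency control protocol is the DBMS component that ensures data consistency. It defines the rules and behavior that apply when multiple transactions access data at the same time. MVCC uses its multi-version nature to avoid read-write conflicts, allowing reads and writes to run in parallel almost without interference, at the cost of higher storage and management overhead.

Version storage refers to how the DBMS stores the multiple versions of a data object. An effective version storage strategy reduces space consumption and improves data access efficiency. MVCC DBMSs typically version at tuple granularity because it provides a good balance between parallelism and the overhead of version tracking.

Garbage collection is the process by which the DBMS automatically manages memory: it reclaims the space of versions that are no longer in use to avoid memory leaks and excessive consumption. In an MVCC environment, an appropriate garbage collection strategy is critical to sustaining system performance.

Index management is responsible for maintaining the index structures in the database. Indexes speed up data retrieval, but in a multi-version environment it is a challenge to update and maintain them efficiently so that they reflect the multiple versions of the data. The design of index management must balance the performance of read and write operations against the cost of updating the index structures.

Implementing and evaluating these design decisions is essential for understanding how MVCC scales on current hardware. Through empirical evaluation, the study not only measures how these techniques behave under different workloads but also identifies the limitations and bottlenecks they may exhibit in practice. This makes it a valuable reference for DBMS developers, who can use the results to optimize their systems for highly concurrent transaction processing.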


An Empirical Evaluation of In-Memory Multi-Version Concurrency Control

Yingjun Wu, National University of Singapore, yingjun@comp.nus.edu.sg
Joy Arulraj, Carnegie Mellon University, jarulraj@cs.cmu.edu
Jiexi Lin, Carnegie Mellon University, jiexil@cs.cmu.edu
Ran Xian, Carnegie Mellon University, rxian@cs.cmu.edu
Andrew Pavlo, Carnegie Mellon University, pavlo@cs.cmu.edu

ABSTRACT

Multi-version concurrency control (MVCC) is currently the most popular transaction management scheme in modern database management systems (DBMSs). Although MVCC was discovered in the late 1970s, it is used in almost every major relational DBMS released in the last decade. Maintaining multiple versions of data potentially increases parallelism without sacrificing serializability when processing transactions. But scaling MVCC in a multi-core and in-memory setting is non-trivial: when there are a large number of threads running in parallel, the synchronization overhead can outweigh the benefits of multi-versioning.

To understand how MVCC performs when processing transactions in modern hardware settings, we conduct an extensive study of the scheme’s four key design decisions: concurrency control protocol, version storage, garbage collection, and index management. We implemented state-of-the-art variants of all of these in an in-memory DBMS and evaluated them using OLTP workloads. Our analysis identifies the fundamental bottlenecks of each design choice.

1. INTRODUCTION

Computer architecture advancements have led to the rise of multi-core, in-memory DBMSs that employ efficient transaction management mechanisms to maximize parallelism without sacrificing serializability. The most popular scheme used in DBMSs developed in the last decade is multi-version concurrency control (MVCC). The basic idea of MVCC is that the DBMS maintains multiple physical versions of each logical object in the database to allow operations on the same object to proceed in parallel. These objects can be at any granularity, but almost every MVCC DBMS uses tuples because this provides a good balance between parallelism and the overhead of version tracking. Multi-versioning allows read-only transactions to access older versions of tuples without preventing read-write transactions from simultaneously generating newer versions. Contrast this with a single-version system, where transactions always overwrite a tuple with new information whenever they update it.

What is interesting about this trend of recent DBMSs using MVCC is that the scheme is not new. The first mention of it appeared in a 1979 dissertation [38] and the first implementation started in 1981 [22] for the InterBase DBMS (now open-sourced as Firebird). MVCC is also used in some of the most widely deployed disk-oriented DBMSs today, including Oracle (since 1984 [4]), Postgres (since 1985 [41]), and MySQL’s InnoDB engine (since 2001). But while there are plenty of contemporaries to these older systems that use a single-version scheme (e.g., IBM DB2, Sybase), almost every new transactional DBMS eschews this approach in favor of MVCC [37]. This includes both commercial (e.g., Microsoft Hekaton [16], SAP HANA [40], MemSQL [1], NuoDB [3]) and academic (e.g., HYRISE [21], HyPer [36]) systems.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org.

Proceedings of the VLDB Endowment, Vol. 10, No. 7

Despite all these newer systems using MVCC, there is no one “standard” implementation. There are several design choices that have different trade-offs and performance behaviors. Until now, there has not been a comprehensive evaluation of MVCC in a modern DBMS operating environment. The last extensive study was in the 1980s [13], but it used simulated workloads running in a disk-oriented DBMS with a single CPU core. The design choices of legacy disk-oriented DBMSs are inappropriate for in-memory DBMSs running on a machine with a large number of CPU cores. As such, this previous work does not reflect recent trends in latch-free [27] and serializable [20] concurrency control, as well as in-memory storage [36] and hybrid workloads [40].

In this paper, we perform such a study for key transaction management design decisions in MVCC DBMSs: (1) concurrency control protocol, (2) version storage, (3) garbage collection, and (4) index management. For each of these topics, we describe the state-of-the-art implementations for in-memory DBMSs and discuss their trade-offs. We also highlight the issues that prevent them from scaling to support larger thread counts and more complex workloads. As part of this investigation, we implemented all of the approaches in the Peloton [5] in-memory MVCC DBMS. This provides us with a uniform platform to compare implementations that is not encumbered by other architecture facets. We deployed Peloton on a machine with 40 cores and evaluated it using two OLTP benchmarks. Our analysis identifies the scenarios that stress the implementations and discusses ways to mitigate them (if at all possible).

2. BACKGROUND

We first provide an overview of the high-level concepts of MVCC. We then discuss the meta-data that the DBMS uses to track transactions and maintain versioning information.

2.1 MVCC Overview

A transaction management scheme permits end-users to access a database in a multi-programmed fashion while preserving the illusion that each of them is executing alone on a dedicated system [9]. It ensures the atomicity and isolation guarantees of the DBMS.

Table 1: MVCC Implementations – A summary of the design decisions made for the commercial and research MVCC DBMSs. The year attribute for each system (except for Oracle) is when it was first released or announced. For Oracle, it is the first year the system included MVCC. With the exception of Oracle, MySQL, and Postgres, all of the systems assume that the primary storage location of the database is in memory.

System             Year  Protocol   Version Storage    Garbage Collection  Index Management
Oracle [4]         1984  MV2PL      Delta              Tuple-level (VAC)   Logical Pointers (TupleId)
Postgres [6]       1985  MV2PL/SSI  Append-only (O2N)  Tuple-level (VAC)   Physical Pointers
MySQL-InnoDB [2]   2001  MV2PL      Delta              Tuple-level (VAC)   Logical Pointers (PKey)
HYRISE [21]        2010  MVOCC      Append-only (N2O)  –                   Physical Pointers
Hekaton [16]       2011  MVOCC      Append-only (O2N)  Tuple-level (COOP)  Physical Pointers
MemSQL [1]         2012  MVOCC      Append-only (N2O)  Tuple-level (VAC)   Physical Pointers
SAP HANA [28]      2012  MV2PL      Time-travel        Hybrid              Logical Pointers (TupleId)
NuoDB [3]          2013  MV2PL      Append-only (N2O)  Tuple-level (VAC)   Logical Pointers (PKey)
HyPer [36]         2015  MVOCC      Delta              Transaction-level   Logical Pointers (TupleId)

There are several advantages of a multi-version system that are relevant to modern database applications. Foremost is that it can potentially allow for greater concurrency than a single-version system. For example, an MVCC DBMS allows a transaction to read an older version of an object at the same time that another transaction updates that same object. This is important because it lets the DBMS execute read-only queries on the database at the same time that read-write transactions continue to update it. If the DBMS never removes old versions, then the system can also support “time-travel” operations that allow an application to query a consistent snapshot of the database as it existed at some point of time in the past [8].
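To make the idea of reading an older version concrete, the following is a minimal single-threaded sketch, not the implementation of any system discussed here. The names `Version`, `VersionedObject`, `read`, and `write` are hypothetical, and the half-open lifetime `[begin_ts, end_ts)` is an assumption chosen for the illustration. It ignores locking and commit handling entirely and only shows how keeping old versions lets a reader with an earlier timestamp see a consistent older value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

INF = float("inf")  # marks a version that has not been superseded yet


@dataclass
class Version:
    value: Any
    begin_ts: float  # first timestamp at which this version is visible
    end_ts: float    # first timestamp at which it is no longer visible


class VersionedObject:
    """One logical object backed by a list of physical versions (oldest first)."""

    def __init__(self, value: Any, ts: int) -> None:
        self.versions: List[Version] = [Version(value, ts, INF)]

    def write(self, value: Any, ts: int) -> None:
        # Close the lifetime of the current version, then append the new one.
        # The old version is kept, so earlier snapshots remain readable.
        self.versions[-1].end_ts = ts
        self.versions.append(Version(value, ts, INF))

    def read(self, ts: int) -> Optional[Any]:
        # Return the value of the version whose lifetime contains ts.
        for version in self.versions:
            if version.begin_ts <= ts < version.end_ts:
                return version.value
        return None  # the object did not exist at that time


account = VersionedObject(100, ts=1)
account.write(80, ts=5)
account.write(50, ts=9)
```

In this sketch a reader with timestamp 3 still obtains the value written at timestamp 1 even after two later updates, which is the "time-travel" behavior described above; a garbage collector would eventually have to remove versions that no reader can reach.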

The above benefits have made MVCC the most popular choice for new DBMSs implemented in recent years. Table 1 provides a summary of the MVCC implementations from the last three decades. But there are different ways to implement multi-versioning in a DBMS, each of which creates additional computation and storage overhead. These design decisions are also highly dependent on each other. Thus, it is non-trivial to discern which ones are better than others and why. This is especially true for in-memory DBMSs where disk is no longer the main bottleneck.

In the following sections, we discuss the implementation issues and performance trade-offs of these design decisions. We then perform a comprehensive evaluation of them in Sect. 7. We note that we only consider serializable transaction execution in this paper. Although logging and recovery is another important aspect of a DBMS’s architecture, we exclude it from our study because there is nothing about it that is different from a single-version system and in-memory DBMS logging is already covered elsewhere [33, 49].

2.2 DBMS Meta-Data

Regardless of its implementation, there is common meta-data that an MVCC DBMS maintains for transactions and database tuples.

Transactions: The DBMS assigns a transaction T a unique, monotonically increasing timestamp as its identifier (T_id) when it first enters the system. The concurrency control protocols use this identifier to mark the tuple versions that a transaction accesses. Some protocols also use it for the serialization order of transactions.
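As an illustration of unique, monotonically increasing identifiers, here is a minimal sketch of a timestamp allocator. The class name `TimestampAllocator` and the method `next_id` are hypothetical, and the mutex is an assumption made for simplicity; an in-memory DBMS would more likely use an atomic fetch-and-add on a shared counter. Identifiers start at 1 in this sketch so that 0 stays free to mean "not write-locked" in a version header.

```python
import itertools
import threading


class TimestampAllocator:
    """Hands out unique, strictly increasing transaction identifiers."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        # Assumption: a mutex stands in for an atomic fetch-and-add.
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            return next(self._counter)


allocator = TimestampAllocator()
t1_id = allocator.next_id()  # identifier of the first transaction
t2_id = allocator.next_id()  # a later transaction gets a larger identifier
```

Because every call returns a larger value than the previous one, the identifiers can double as a serialization order for protocols that need one.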

Tuples: As shown in Fig. 1, each physical version contains four meta-data fields in its header that the DBMS uses to coordinate the execution of concurrent transactions (some of the concurrency control protocols discussed in the next section include additional fields). The txn-id field serves as the version’s write lock. Every tuple has this field set to zero when the tuple is not write-locked. Most DBMSs use a 64-bit txn-id so that they can use a single compare-and-swap (CaS) instruction to atomically update the value. If a transaction T with identifier T_id wants to update a tuple A, then the DBMS checks whether A’s txn-id field is zero. If it is, then the DBMS sets the value of txn-id to T_id using a CaS instruction [27, 44]. Any transaction that attempts to update A is aborted if this txn-id field is neither zero nor equal to its T_id. The next two meta-data fields are the begin-ts and end-ts timestamps that represent the lifetime of the tuple version. Both fields are initially set to zero. The DBMS sets a tuple’s begin-ts to INF when the transaction deletes it. The last meta-data field is the pointer that stores the address of the neighboring (previous or next) version (if any).

Figure 1: Tuple Format – The basic layout of a physical version of a tuple: a header (txn-id, begin-ts, end-ts, pointer) followed by the content (columns).
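The write-lock rule on the txn-id field can be sketched as follows. This is an illustration under stated assumptions, not code from Peloton or any other system: `AtomicInt`, `TupleVersion`, `try_write_lock`, and `release_write_lock` are hypothetical names, and because Python has no hardware compare-and-swap, `AtomicInt` emulates one with a mutex.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


class AtomicInt:
    """Emulates a 64-bit word with compare-and-swap (a mutex stands in for the CPU instruction)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        # Store `new` only if the current value equals `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


@dataclass
class TupleVersion:
    columns: dict                                          # tuple content
    txn_id: AtomicInt = field(default_factory=AtomicInt)   # 0 means not write-locked
    begin_ts: float = 0                                    # start of the version's lifetime
    end_ts: float = 0                                      # end of the version's lifetime
    pointer: Optional["TupleVersion"] = None               # neighboring version, if any


def try_write_lock(version: TupleVersion, t_id: int) -> bool:
    """Return True if transaction t_id holds the write lock, False if it must abort."""
    if version.txn_id.load() == t_id:
        return True  # the transaction already owns the lock
    return version.txn_id.compare_and_swap(0, t_id)


def release_write_lock(version: TupleVersion, t_id: int) -> None:
    # Only the owner can reset the field to zero.
    version.txn_id.compare_and_swap(t_id, 0)
```

A transaction that receives `False` from `try_write_lock` corresponds to the abort case in the text: the field is neither zero nor equal to its own identifier.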

3. CONCURRENCY CONTROL PROTOCOL

Every DBMS includes a concurrency control protocol that coordinates the execution of concurrent transactions [11]. This protocol determines (1) whether to allow a transaction to access or modify a particular tuple version in the database at runtime, and (2) whether to allow a transaction to commit its modifications. Although the fundamentals of these protocols have remained unchanged since the 1980s, their performance characteristics have changed drastically in a multi-core and main-memory setting due to the absence of disk operations [42]. As such, there are newer high-performance variants that remove locks/latches and centralized data structures, and are optimized for byte-addressable storage.

In this section, we describe the four core concurrency control protocols for MVCC DBMSs. We only consider protocols that use tuple-level locking as this is sufficient to ensure serializable execution. We omit range queries because multi-versioning does not bring any benefits to phantom prevention [17]. Existing approaches to provide serializable transaction processing use either (1) additional latches in the index [35, 44] or (2) extra validation steps when transactions commit [27].

3.1 Timestamp Ordering (MVTO)

The MVTO algorithm from 1979 is considered to be the original multi-version concurrency control protocol [38, 39]. The crux of this approach is to use the transactions’ identifiers (T_id) to pre-compute their serialization order. In addition to the fields described in Sect. 2.2, the version headers also contain the identifier of the last transaction that read it (read-ts). The DBMS aborts a transaction that attempts to read or update a version whose write lock is held by another transaction.

When transaction T invokes a read operation on logical tuple A, the DBMS searches for a physical version where T_id is in between the range of the begin-ts and end-ts fields. As shown in Fig. 2a, T is allowed to read version A_x if its write lock is not held by another active transaction (i.e., the value of txn-id is zero or equal to T_id) because MVTO never allows a transaction to read uncommitted versions. Upon reading A_x, the DBMS sets A_x’s read-ts field to T_id if its current value is less than T_id. Otherwise, the transaction reads an older version without updating this field.
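The read rule described above can be sketched in a few lines. This is a single-threaded illustration under assumptions, not the paper's implementation: the names `Version`, `Abort`, and `mvto_read` are hypothetical, the lifetime is treated as the half-open range `[begin_ts, end_ts)`, and a real system would update read-ts atomically rather than with a plain assignment.

```python
from dataclasses import dataclass
from typing import Any, List

INF = float("inf")


class Abort(Exception):
    """Raised when the reading transaction has to abort."""


@dataclass
class Version:
    value: Any
    begin_ts: float
    end_ts: float
    txn_id: int = 0   # 0 means not write-locked
    read_ts: int = 0  # identifier of the latest transaction that read this version


def mvto_read(versions: List[Version], t_id: int) -> Any:
    """Read the version visible to transaction t_id, following the MVTO read rule."""
    for version in versions:
        # Visibility check: t_id must fall within the version's lifetime.
        if version.begin_ts <= t_id < version.end_ts:
            # A version write-locked by another transaction may be uncommitted.
            if version.txn_id not in (0, t_id):
                raise Abort(f"version is write-locked by transaction {version.txn_id}")
            # Record the reader only if it is newer than the last recorded reader.
            if version.read_ts < t_id:
                version.read_ts = t_id
            return version.value
    raise Abort("no visible version")


chain = [Version("v1", begin_ts=1, end_ts=10), Version("v2", begin_ts=10, end_ts=INF)]
value_seen_by_t5 = mvto_read(chain, t_id=5)
```

Here transaction 5 reads the first version and raises its read-ts to 5; a later read by transaction 3 would return the same version and leave read-ts unchanged.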
