Table 1: MVCC Implementations
– A summary of the design decisions made for the commercial and research MVCC DBMSs. The year attribute for each
system (except for Oracle) is when it was first released or announced. For Oracle, it is the first year the system included MVCC. With the exception of Oracle,
MySQL, and Postgres, all of the systems assume that the primary storage location of the database is in memory.
Year Protocol Version Storage Garbage Collection Index Management
Oracle [4] 1984 MV2PL Delta Tuple-level (VAC) Logical Pointers (TupleId)
Postgres [6] 1985 MV2PL/SSI Append-only (O2N) Tuple-level (VAC) Physical Pointers
MySQL-InnoDB [2] 2001 MV2PL Delta Tuple-level (VAC) Logical Pointers (PKey)
HYRISE [21] 2010 MVOCC Append-only (N2O) – Physical Pointers
Hekaton [16] 2011 MVOCC Append-only (O2N) Tuple-level (COOP) Physical Pointers
MemSQL [1] 2012 MVOCC Append-only (N2O) Tuple-level (VAC) Physical Pointers
SAP HANA [28] 2012 MV2PL Time-travel Hybrid Logical Pointers (TupleId)
NuoDB [3] 2013 MV2PL Append-only (N2O) Tuple-level (VAC) Logical Pointers (PKey)
HyPer [36] 2015 MVOCC Delta Transaction-level Logical Pointers (TupleId)
sion that each of them is executing alone on a dedicated system [9].
It ensures the atomicity and isolation guarantees of the DBMS.
There are several advantages of a multi-version system that are
relevant to modern database applications. Foremost is that it can
potentially allow for greater concurrency than a single-version sys-
tem. For example, a MVCC DBMS allows a transaction to read an
older version of an object at the same time that another transaction
updates that same object. This is important in that execute read-only
queries on the database at the same time that read-write transactions
continue to update it. If the DBMS never removes old versions,
then the system can also support “time-travel” operations that allow
an application to query a consistent snapshot of the database as it
existed at some point of time in the past [8].
The above benefits have made MVCC the most popular choice
for new DBMS implemented in recent years. Table 1 provides a
summary of the MVCC implementations from the last three decades.
But there are different ways to implement multi-versioning in a
DBMS that each creates additional computation and storage over-
head. These design decisions are also highly dependent on each
other. Thus, it is non-trivial to discern which ones are better than
others and why. This is especially true for in-memory DBMSs
where disk is no longer the main bottleneck.
In the following sections, we discuss the implementation issues
and performance trade-offs of these design decisions. We then
perform a comprehensive evaluation of them in Sect. 7. We note
that we only consider serializable transaction execution in this paper.
Although logging and recovery is another important aspect of a
DBMS’s architecture, we exclude it from our study because there is
nothing about it that is different from a single-version system and
in-memory DBMS logging is already covered elsewhere [33, 49].
2.2 DBMS Meta-Data
Regardless of its implementation, there is common meta-data that
a MVCC DBMS maintains for transactions and database tuples.
Transactions:
The DBMS assigns a transaction T a unique,
monotonically increasing timestamp as its identifier (T
id
) when
they first enter the system. The concurrency control protocols use
this identifier to mark the tuple versions that a transaction accesses.
Some protocols also use it for the serialization order of transactions.
Tuples:
As shown in Fig. 1, each physical version contains four
meta-data fields in its header that the DBMS uses to coordinate
the execution of concurrent transactions (some of the concurrency
control protocols discussed in the next section include additional
fields). The
txn-id
field serves as the version’s write lock. Every
tuple has this field set to zero when the tuple is not write-locked.
Most DBMSs use a 64-bit
txn-id
so that it can use a single compare-
and-swap (CaS) instruction to atomically update the value. If a
transaction T with identifier T
id
wants to update a tuple
A
, then the
DBMS checks whether
A
’s
txn-id
field is zero. If it is, then DBMS
will set the value of
txn-id
to T
id
using a CaS instruction [27, 44].
begin-ts columns
Content
Header
txn-id end-ts
…
pointer
Figure 1: Tuple Format
– The basic layout of a physical version of a tuple.
Any transaction that attempts to update
A
is aborted if this
txn-id
field is neither zero or not equal to its T
id
. The next two meta-data
fields are the
begin-ts
and
end-ts
timestamps that represent the
lifetime of the tuple version. Both fields are initially set to zero. The
DBMS sets a tuple’s
begin-ts
to
INF
when the transaction deletes
it. The last meta-data field is the
pointer
that stores the address of
the neighboring (previous or next) version (if any).
3. CONCURRENCY CONTROL PROTOCOL
Every DBMS includes a concurrency control protocol that coor-
dinates the execution of concurrent transactions [11]. This protocol
determines (1) whether to allow a transaction to access or modify a
particular tuple version in the database at runtime, and (2) whether to
allow a transaction to commit its modifications. Although the funda-
mentals of these protocols remain unchanged since the 1980s, their
performance characteristics have changed drastically in a multi-core
and main-memory setting due to the absence of disk operations [42].
As such, there are newer high-performance variants that remove
locks/latches and centralized data structures, and are optimized for
byte-addressable storage.
In this section, we describe the four core concurrency control
protocols for MVCC DBMSs. We only consider protocols that use
tuple-level locking as this is sufficient to ensure serializable exe-
cution. We omit range queries because multi-versioning does not
bring any benefits to phantom prevention [17]. Existing approaches
to provide serializable transaction processing use either (1) addi-
tional latches in the index [35, 44] or (2) extra validation steps when
transactions commit [27].
3.1 Timestamp Ordering (MVTO)
The MVTO algorithm from 1979 is considered to be the original
multi-version concurrency control protocol [38, 39]. The crux of
this approach is to use the transactions’ identifiers (T
id
) to pre-
compute their serialization order. In addition to the fields described
in Sect. 2.2, the version headers also contain the identifier of the last
transaction that read it (
read-ts
). The DBMS aborts a transaction
that attempts to read or update a version whose write lock is held by
another transaction.
When transaction T invokes a read operation on logical tuple
A
,
the DBMS searches for a physical version where T
id
is in between
the range of the
begin-ts
and
end-ts
fields. As shown in Fig. 2a,
T is allowed to read version
A
x
if its write lock is not held by another
active transaction (i.e., value of
txn-id
is zero or equal to T
id
)
because MVTO never allows a transaction to read uncommitted
versions. Upon reading
A
x
, the DBMS sets
A
x
’s
read-ts
field to T
id
if its current value is less than T
id
. Otherwise, the transaction reads
an older version without updating this field.
782