Readings in Database Systems
Fifth Edition (2015)
edited by Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
http://www.redbook.io/
Contents
Preface
Background Introduced by Michael Stonebraker
Traditional RDBMS Systems Introduced by Michael Stonebraker
Techniques Everyone Should Know Introduced by Peter Bailis
New DBMS Architectures Introduced by Michael Stonebraker
Large-Scale Dataflow Engines Introduced by Peter Bailis
Weak Isolation and Distribution Introduced by Peter Bailis
Query Optimization Introduced by Joe Hellerstein
Interactive Analytics Introduced by Joe Hellerstein
Languages Introduced by Joe Hellerstein
Web Data Introduced by Peter Bailis
A Biased Take on a Moving Target: Complex Analytics by Michael Stonebraker
A Biased Take on a Moving Target: Data Integration by Michael Stonebraker
List of All Readings
References
Readings in Database Systems, 5th Edition (2015)
Preface
In the ten years since the previous edition of Readings in Database Systems, the field of data management has exploded. Database and data-intensive systems today operate over unprecedented volumes of data, fueled in large part by the rise of “Big Data” and massive decreases in the cost of storage and computation. Cloud computing and microarchitectural trends have made distribution and parallelism nearly ubiquitous concerns. Data is collected from an increasing variety of heterogeneous formats and sources in increasing volume, and utilized for an ever-increasing range of tasks. As a result, commodity database systems have evolved considerably along several dimensions, from the use of new storage media and processor designs, up through query processing architectures, programming interfaces, and emerging application requirements in both transaction processing and analytics. It is an exciting time, with considerable churn in the marketplace and many new ideas from research.
In this time of rapid change, our update to the traditional “Red Book” is intended to provide both a grounding in the core concepts of the field and a commentary on selected trends. Some new technologies bear striking resemblance to predecessors of decades past, and we think it’s useful for our readers to be familiar with the primary sources. At the same time, technology trends are necessitating a re-evaluation of almost all dimensions of database systems, and many classic designs are in need of revision. Our goal in this collection is to surface important long-term lessons and foundational designs, and to highlight the new ideas we believe are most novel and relevant.
Accordingly, we have chosen a mix of classic, traditional papers from the early database literature as well as papers that have been most influential in recent developments, including transaction processing, query processing, advanced analytics, Web data, and language design. Along with each chapter, we have included a short commentary introducing the papers and describing why we selected each. Each commentary is authored by one of the editors, but all editors provided input; we hope the commentaries do not lack for opinion.
When selecting readings, we sought topics and papers that met a core set of criteria. First, each selection represents a major trend in data management, as evidenced by both research interest and market demand. Second, each selection is canonical or near-canonical; we sought the most representative paper for each topic. Third, each selection is a primary source. There are good surveys on many of the topics in this collection, which we reference in commentaries. However, reading primary sources provides historical context, gives the reader exposure to the thinking that shaped influential solutions, and helps ensure that our readers are well-grounded in the field. Finally, this collection represents our current tastes about what is “most important”; we expect our readers to view this collection with a critical eye.
One major departure from previous editions of the Red Book is the way we have treated the final two sections on Analytics and Data Integration. It’s clear in both research and the marketplace that these are two of the biggest problems in data management today. They are also quickly evolving topics in both research and practice. Given this state of flux, we found that we had a hard time agreeing on “canonical” readings for these topics. Under the circumstances, we decided to omit official readings and instead offer commentary. This obviously results in a highly biased view of what’s happening in the field. So we do not recommend these sections as the kind of “required reading” that the Red Book has traditionally tried to offer. Instead, we are treating these as optional end-matter: “Biased Views on Moving Targets”. Readers are cautioned to take these two sections with a grain of salt (even larger than the one used for the rest of the book).
We are releasing this edition of the Red Book free of charge, with a permissive license on our text that allows unlimited non-commercial re-distribution, in multiple formats. Rather than secure rights to the recommended papers, we have simply provided links to Google Scholar searches that should help the reader locate the relevant papers. We expect this electronic format to allow more frequent editions of the “book.” We plan to evolve the collection as appropriate.
A final note: this collection has been alive since 1988, and we expect it to have a long future life. Accordingly, we have added a modicum of “young blood” to the gray beard editors. As appropriate, the editors of this collection may further evolve over time.
Peter Bailis
Joseph M. Hellerstein
Michael Stonebraker
Chapter 1: Background
Introduced by Michael Stonebraker
Selected Readings:
Joseph M. Hellerstein and Michael Stonebraker. What Goes Around Comes Around. Readings in Database Systems, 4th Edition (2005).
Joseph M. Hellerstein, Michael Stonebraker, James Hamilton. Architecture of a Database System. Foundations and Trends in Databases, 1, 2 (2007).
I am amazed that these two papers were written a mere decade ago! My amazement about the anatomy paper is that the details have changed a lot just a few years later. My amazement about the data model paper is that nobody ever seems to learn anything from history. Let’s talk about the data model paper first.
A decade ago, the buzz was all XML. Vendors were intent on adding XML to their relational engines. Industry analysts (and more than a few researchers) were touting XML as “the next big thing”. A decade later it is a niche product, and the field has moved on. In my opinion (as predicted in the paper), it succumbed to a combination of:
• excessive complexity (which nobody could understand)
• complex extensions of relational engines, which did not seem to perform all that well, and
• no compelling use case where it was wildly accepted
It is a bit ironic that a prediction was made in the paper that X would win the Turing Award by successfully simplifying XML. That prediction turned out to be totally wrong! The net-net was that relational won and XML lost.
Of course, that has not stopped “newbies” from reinventing the wheel. Now it is JSON, which can be viewed in one of three ways:
• A general-purpose hierarchical data format. Anybody who thinks this is a good idea should read the section of the data model paper on IMS.
• A representation for sparse data. Consider attributes about an employee, and suppose we wish to record hobbies data. For each hobby, the data we record will be different, and hobbies are fundamentally sparse. This is straightforward to model in a relational DBMS, but it leads to very wide, very sparse tables. This is disastrous for disk-based row stores but works fine in column stores. In the former case, JSON is a reasonable encoding format for the “hobbies” column, and several RDBMSs have recently added support for a JSON data type.
• As a mechanism for “schema on read”. In effect, the schema is very wide and very sparse, and essentially all users will want some projection of this schema. When reading from a wide, sparse schema, a user can say what he wants to see at run time. Conceptually, this is nothing but a projection operation. Hence, “schema on read” is just a relational operation on JSON-encoded data.
In summary, JSON is a reasonable choice for sparse data. In this context, I expect it to have a fair amount of “legs”. On the other hand, it is a disaster in the making as a general hierarchical data format. I fully expect RDBMSs to subsume JSON as merely a data type (among many) in their systems. In other words, it is a reasonable way to encode sparse relational data.
No doubt the next version of the Red Book will trash some new hierarchical format invented by people who stand on the toes of their predecessors, not on their shoulders.
The other data model generating a lot of buzz in the last decade is Map-Reduce, which was purpose-built by Google to support their web crawl data base. A few years later, Google stopped using Map-Reduce for that application, moving instead to Big Table. Now, the rest of the world is seeing what Google figured out earlier: Map-Reduce is not an architecture with any broad-scale applicability. Instead the Map-Reduce market has mor-