Designing Data-Intensive Applications
Martin Kleppmann
Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo
Editors: Mike Loukides and Ann Spencer
Copyright © 2014 Martin Kleppmann
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered
trademarks of O’Reilly Media, Inc. !!FILL THIS IN!! and related trade dress are trademarks of
O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc.
was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and
authors assume no responsibility for errors or omissions, or for damages resulting from the use
of the information contained herein.
O’Reilly Media
1005 Gravenstein Highway North
Sebastopol, CA 95472
2014-09-10T13:31:02-07:00
Print ISBN: 978-1-4493-7332-0 1-4493-7332-1
Ebook ISBN: 978-1-4919-0309-4 1-4919-0309-0
Computing is pop culture. […] Pop culture holds a disdain for history. Pop culture is all about
identity and feeling like you’re participating. It has nothing to do with cooperation, the past or
the future—it’s living in the present. I think the same is true of most people who write code for
money. They have no idea where [their culture came from]…
And the Internet was done so well that most people think of it as a natural resource like the
Pacific Ocean, rather than something that was man-made. When was the last time a technology
with a scale like that was so error-free?
Alan Kay, Dr Dobb’s Journal, July 2012
Table of Contents
About the Author
About this Book
Part I. The Big Picture
  Chapter 1. Reliable, Scalable and Maintainable Applications
    Thinking About Data Systems
    Reliability
    Scalability
    Maintainability
    Summary
  Chapter 2. The Battle of the Data Models
    Rivals of the Relational Model
    Query Languages for Data
    Graph-like Data Models
    Summary
  Chapter 3. Storage and Retrieval
    Data Structures that Power Your Database
    Transaction Processing or Analytics?
    Column-oriented storage
    Summary
Part II. Systems of Record
  Chapter 4. Replication
    Shared-Nothing Architectures
    Leaders and Followers
    Problems With Replication Lag
    Beyond leader-based replication
    Summary