Big Data Glossary
Pete Warden
Beijing
•
Cambridge
•
Farnham
•
Köln
•
Sebastopol
•
Tokyo
Big Data Glossary
by Pete Warden
Copyright © 2011 Pete Warden. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions
are also available for most titles (http://my.safaribooksonline.com). For more information, contact our
corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
Editor: Mike Loukides
Production Editor: Teresa Elsey
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of
O’Reilly Media, Inc. Big Data Glossary, the image of an elephant seal, and related trade dress are trade-
marks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information con-
tained herein.
ISBN: 978-1-449-31459-0
[LSI]
1315581712
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1. Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Document-Oriented 1
Key/Value Stores 2
Horizontal or Vertical Scaling 2
MapReduce 3
Sharding 3
2. NoSQL Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
MongoDB 6
CouchDB 6
Cassandra 7
Redis 7
BigTable 8
HBase 9
Hypertable 9
Voldemort 9
Riak 10
ZooKeeper 10
3. MapReduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Hadoop 11
Hive 12
Pig 13
Cascading 13
Cascalog 13
mrjob 13
Caffeine 14
S4 14
MapR 14
iii