HBase: The Definitive Guide
Lars George
Dedication
For my wife Katja, my daughter Laura and son Leon. I love you!
Preface
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
General Information
1. Introduction
The Dawn of Big Data
The Problem with Relational Database Systems
Non-relational Database Systems, Not-only SQL or NoSQL?
Building Blocks
HBase - The Hadoop Database
2. Installation
Quick Start Guide
Requirements
File Systems for HBase
Installation Choices
Run Modes
Configuration
Deployment
Operating a Cluster
3. Client API: The Basics
General Notes
CRUD Operations
Batch Operations
Row Locks
Scans
Miscellaneous Features
4. Client API: Advanced Features
Filters
Counters
Coprocessors
HTablePool
Connection Handling
5. Client API: Administrative Features
Schema Definition
HBaseAdmin
6. Available Clients
Introduction to REST, Thrift and Avro
Interactive Clients
Batch Clients
Shell
Web Based UI
7. MapReduce Integration
Framework
8. Architecture
Seek vs. Transfer
Storage
Write-Ahead Log
Read Path
Region Lookups
The Region Life Cycle
ZooKeeper
Replication
9. Advanced Usage
Key Design
Advanced Schemas
Secondary Indexes
Search Integration
Transactions
Bloom Filters
Versioning
10. Cluster Monitoring
Introduction
The Metrics Framework
Ganglia
JMX
Nagios
11. Performance Tuning
Garbage Collection Tuning
Memstore-Local Allocation Buffer
Compression
Optimize Splits and Compactions
Load Balancing
Merging Regions
Client API - Best Practices
Configuration
Load Tests
12. Cluster Administration
Operational Tasks
Data Tasks
Additional Tasks
Change Logging Levels
Troubleshooting
A. HBase Configuration Properties
B. Roadmap
HBase 0.92.0
HBase 0.94.0
C. Upgrade from Previous Releases
Upgrading To HBase 0.90.x
Upgrading to HBase 0.92.0
D. Distributions
Cloudera's Distribution including Apache Hadoop
E. Hush SQL Schema
F. HBase vs. BigTable
©2011, O'Reilly Media, Inc.
Preface
There may be many reasons that brought you here. It could be because you heard
all about Hadoop and what it can do to crunch petabytes of data in a reasonable
amount of time. While reading up on Hadoop you found that for random access to the
accumulated data there is something called HBase. Or it was the hype that is prevalent
these days around a new kind of data storage architecture, one that strives to solve
large-scale data problems where traditional solutions may be either too involved or
cost-prohibitive. A common term used in this area is NoSQL.
No matter how you arrived here, I presume you want to learn, as I did
not too long ago, how you can use HBase in your company or organization to
store a virtually endless amount of data. You may have a background in relational
database theory, or you want to start fresh and this "column-oriented thing" is
something that seems to fit the bill. You have also heard that HBase can scale without
much effort, and that alone is reason enough to look at it since you are building the
next web-scale system.
I was at that point in late 2007, facing the task of storing millions of documents in a
system that needed to be fault tolerant and scalable while still being maintainable by
just me. I have decent skills in managing a MySQL database system and was using it
to store data that would ultimately be served to our website users. This database
was running on a single server, with another as a backup. The issue was that it would
not be able to hold the amount of data I needed to store for this new project. I
could either invest in serious RDBMS scalability skills or find something else instead.
Obviously I went the latter route, and since my mantra always was (and still is) "How
does someone like Google do it?", I came across Hadoop. After a few attempts at
using Hadoop directly I was faced with implementing a random access layer on top of
it. But that problem had been solved already: in 2006 Google had published a paper
called BigTable [1], and the Hadoop developers had an open-source implementation
of it called HBase (the Hadoop Database). That was the answer to all my problems.
Or so it seemed...
What follows is a blur to me. Looking back, I realize that I wish this
customer project were starting today. HBase is now mature, nearing a 1.0 release, and is
used by many high-profile companies, such as Facebook, Adobe, Twitter, and
StumbleUpon. Mine was one of the very first clusters in production (and is still in use
today!), and my use case triggered a few very interesting issues (let me refrain from
saying more).
But that was to be expected when betting on a 0.1x version of a community project. And I
had the opportunity over the years to contribute back and stay close to the
development team, so that eventually I was humbled by being asked to become a
full-time committer as well.
I learned a lot over the last few years from my fellow HBase developers and am still
learning more every day. My belief is that we are nowhere near the peak of this
technology and that it will evolve further over the years to come. Let me pay my respects
to the entire HBase community with this book, which strives to cover not just the
internal workings of HBase or how to get it going, but more specifically how to apply it
to your use case.
In fact, I strongly assume that this is why you are here right now. You want to learn
how HBase can solve your problem. Let me help you figure this out.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program
elements such as variable or function names, databases, data types,
environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values
determined by context.
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.
Using Code Examples
This book is here to help you get your job done. In general, you may use the code in
this book in your programs and documentation. You do not need to contact us for
permission unless you’re reproducing a significant portion of the code. For example,
writing a program that uses several chunks of code from this book does not require
permission. Selling or distributing a CD-ROM of examples from O’Reilly books does
require permission. Answering a question by citing this book and quoting example
code does not require permission. Incorporating a significant amount of example
code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the
title, author, publisher, and ISBN. For example: “HBase: The Definitive Guide by Lars
George. Copyright 2010 O’Reilly Media, Inc., 978-1-449-39610-7.”
If you feel your use of code examples falls outside fair use or the permission given
above, feel free to contact us at <permissions@oreilly.com>.
Safari® Books Online
Safari Books Online is an on-demand digital library that lets you easily
search over 7,500 technology and creative reference books and videos to
find the answers you need quickly.
With a subscription, you can read any page and watch any video from our library
online. Read books on your cell phone and mobile devices. Access new titles before
they are available for print, and get exclusive access to manuscripts in development
and post feedback for the authors. Copy and paste code samples, organize your
favorites, download chapters, bookmark key sections, create notes, print out pages,