Bigtable: A Distributed Storage System
for Structured Data
FAY CHANG, JEFFREY DEAN, SANJAY GHEMAWAT, WILSON C. HSIEH,
DEBORAH A. WALLACH, MIKE BURROWS, TUSHAR CHANDRA,
ANDREW FIKES, and ROBERT E. GRUBER
Google, Inc.
Bigtable is a distributed storage system for managing structured data that is designed to scale
to a very large size: petabytes of data across thousands of commodity servers. Many projects at
Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These
applications place very different demands on Bigtable, both in terms of data size (from URLs to
web pages to satellite imagery) and latency requirements (from backend bulk processing to
real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible,
high-performance solution for all of these Google products. In this article, we describe the simple
data model provided by Bigtable, which gives clients dynamic control over data layout and format,
and we describe the design and implementation of Bigtable.
Categories and Subject Descriptors: C.2.4 [Computer Communication Networks]: Distributed
Systems—distributed databases
General Terms: Design
Additional Key Words and Phrases: Large-Scale Distributed Storage
ACM Reference Format:
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T.,
Fikes, A., and Gruber, R. E. 2008. Bigtable: A distributed storage system for structured data. ACM
Trans. Comput. Syst. 26, 2, Article 4 (June 2008), 26 pages. DOI = 10.1145/1365815.1365816.
http://doi.acm.org/10.1145/1365815.1365816.
This article was originally published as an award paper in the Proceedings of the 7th Symposium
on Operating Systems Design and Implementation [Chang et al. 2006]. It is being republished here
with minor modifications and clarifications.
Authors’ address: Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043; email: {fay,
jeff, sanjay, wilsonh, kerr, m3b, tushar, fikes, gruber}@google.com.
Permission to make digital or hard copies of part or all of this work for personal or classroom
use is granted without fee provided that copies are not made or distributed for profit or direct
commercial advantage and that copies show this notice on the first page or initial screen of a
display along with the full citation. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credits is permitted. To copy otherwise, to republish, to
post on servers, to redistribute to lists, or to use any component of this work in other works requires
prior specific permission and/or a fee. Permissions may be requested from the Publications Dept.,
ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or
permission@acm.org.
© 2008 ACM 0734-2071/2008/06-ART4 $5.00 DOI: 10.1145/1365815.1365816.
http://doi.acm.org/10.1145/1365815.1365816.
ACM Transactions on Computer Systems, Vol. 26, No. 2, Article 4, Pub. date: June 2008.