There may be many reasons that brought you here. Perhaps you heard all about Hadoop and what it can do to crunch petabytes of data in a reasonable amount of time, and while reading up on Hadoop you found that for random access to the accumulated data there is something called HBase. Or perhaps it was the hype that is prevalent these days around a new kind of data storage architecture, one that strives to solve large-scale data problems where traditional solutions may be either too involved or cost-prohibitive. A common term used in this area is NoSQL.

No matter how you arrived here, I presume you want to know and learn, like me not too long ago, how you can use HBase in your company or organization to store a virtually endless amount of data. You may have a background in relational database theory, or you may want to start fresh, and this "column-oriented thing" seems to fit the bill. You have also heard that HBase can scale without much effort, and that alone is reason enough to look at it, since you are building the next web-scale system.

I was at that point in late 2007, facing the task of storing millions of documents in a system that needed to be fault-tolerant and scalable while still being maintainable by just me. I had decent skills in managing a MySQL database system and was using it to store data that would ultimately be served to our website users. This database was running on a single server, with another as a backup. The issue was that it would not be able to hold the amount of data I needed to store for this new project. I would have to either invest in serious RDBMS scalability skills or find something else. Obviously I went the latter route, and since my mantra always was (and still is) "How does someone like Google do it?", I came across Hadoop.
After a few attempts at using Hadoop directly, I was faced with implementing a random access layer on top of it. But that problem had already been solved: in 2006 Google had published a paper called BigTable [1], and the Hadoop developers had an open-source implementation of it called HBase (the Hadoop Database). That was the answer to all my problems. Or so it seemed...

What follows is a blur to me. Looking back, I realize that I wish this customer project could have started today. HBase is now mature, nearing a 1.0 release, and is used by many high-profile companies, such as Facebook, Adobe, Twitter, and StumbleUpon. Mine was one of the very first clusters in production (and is still in use today!), and my use case triggered a few very interesting issues (let me refrain from saying more). But that was to be expected when betting on a 0.1x version of a community project. Over the years I had the opportunity to contribute back and stay close to the development team, so that eventually I was humbled by being asked to become a full-time committer as well.

I have learned a lot over the last few years from my fellow HBase developers and am still learning more every day. My belief is that we are nowhere near the peak of this technology and that it will evolve further over the years to come. Let me pay my respect to the entire HBase community with this book, which strives to cover not just the internal workings of HBase or how to get it going, but more specifically how to apply it to your use case. In fact, I strongly assume that this is why you are here right now. You want to learn how HBase can solve your problem. Let me help you try to figure this out.
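"HBase: The Definitive Guide" is an in-depth book on distributed big data storage systems, focused on HBase, the open-source, Hadoop-based non-relational database. It is suited to readers who know traditional relational databases or who want to learn a newer data storage architecture, especially those facing large-scale data processing challenges.

HBase plays an important role in the big data field: it provides random access to massive amounts of data, compensating for Hadoop's weakness in real-time queries. As a member of the NoSQL family, HBase is designed to address the scalability and cost problems of traditional solutions. In 2006 Google published the BigTable paper, which inspired the Hadoop developers to create HBase (the Hadoop Database), giving the Hadoop ecosystem a strongly consistent distributed data store.

The book describes the internal mechanisms of HBase in detail, including core concepts such as tables, rows, column families, and timestamps, as well as the data model, the distributed architecture, and the read and write paths. The author also emphasizes how to apply HBase to concrete business scenarios, helping readers understand how to use it to solve real problems.

The author shares his experience as an early adopter of HBase, including the problems he ran into and how he dealt with them, and describes how HBase matured over time and came to be widely used by well-known companies such as Facebook, Adobe, Twitter, and StumbleUpon. He also became a full-time committer on the HBase development team, which reflects his deep understanding of and practical experience with HBase.

The book also includes notes on using the code examples, points readers to the electronic edition on Safari Books Online, and provides a guide to building the examples and running Hush, an HBase-based URL shortening service. This hands-on material helps readers get started quickly and become familiar with day-to-day HBase operations.

Through "HBase: The Definitive Guide", readers can learn the fundamentals of HBase and also gain a deeper understanding of how it is applied for high availability, scalability, and performance tuning. As the big data era continues to develop, HBase is becoming increasingly important as a data storage tool, and this book is a valuable resource for mastering it.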