Hadoop: The Definitive Guide, Fourth Edition (English)

Hadoop: The Definitive Guide
Tom White

For Eliane, Emilia, and Lottie

Foreword

Doug Cutting, April 2009
Shed in the Yard, California

Hadoop got its start in Nutch. A few of us were attempting to build an open source web search engine and having trouble managing computations running on even a handful of computers. Once Google published its GFS and MapReduce papers, the route became clear. They'd devised systems to solve precisely the problems we were having with Nutch. So we started, two of us, half-time, to try to re-create these systems as a part of Nutch.

We managed to get Nutch limping along on 20 machines, but it soon became clear that to handle the web's massive scale, we'd need to run it on thousands of machines and, moreover, that the job was bigger than two half-time developers could handle.

Around that time, Yahoo! got interested, and quickly put together a team that I joined. We split off the distributed computing part of Nutch, naming it Hadoop. With the help of Yahoo!, Hadoop soon grew into a technology that could truly scale to the web.

In 2006, Tom White started contributing to Hadoop. I already knew Tom through an excellent article he'd written about Nutch, so I knew he could present complex ideas in clear prose. I soon learned that he could also develop software that was as pleasant to read as his prose.

From the beginning, Tom's contributions to Hadoop showed his concern for users and for the project. Unlike most open source contributors, Tom is not primarily interested in tweaking the system to better meet his own needs, but rather in making it easier for anyone to use.

Initially, Tom specialized in making Hadoop run well on Amazon's EC2 and S3 services. Then he moved on to tackle a wide variety of problems, including improving the MapReduce APIs, enhancing the website, and devising an object serialization framework.

In all cases, Tom presented his ideas precisely. In short order, Tom earned the role of Hadoop committer and soon thereafter became a member of the Hadoop Project Management Committee.

Tom is now a respected senior member of the Hadoop developer community. Though he's an expert in many technical corners of the project, his specialty is making Hadoop easier to use and understand.

Given this, I was very pleased when I learned that Tom intended to write a book about Hadoop. Who could be better qualified? Now you have the opportunity to learn about Hadoop from a master, not only of the technology but also of common sense and plain talk.

Preface

Martin Gardner, the mathematics and science writer, once said in an interview:

    Beyond calculus, I am lost. That was the secret of my column's success. It took me so long to understand what I was writing about that I knew how to write in a way most readers would understand.[1]

In many ways, this is how I feel about Hadoop. Its inner workings are complex, resting as they do on a mixture of distributed systems theory, practical engineering, and common sense. And to the uninitiated, Hadoop can appear alien. But it doesn't need to be like this. Stripped to its core, the tools that Hadoop provides for working with big data are simple.
If there's a common theme, it is about raising the level of abstraction: to create building blocks for programmers who have lots of data to store and analyze, and who don't have the time, the skill, or the inclination to become distributed systems experts to build the infrastructure to handle it.

With such a simple and generally applicable feature set, it seemed obvious to me when I started using it that Hadoop deserved to be widely used. However, at the time (in early 2006), setting up, configuring, and writing programs to use Hadoop was an art. Things have certainly improved since then: there is more documentation, there are more examples, and there are thriving mailing lists to go to when you have questions. And yet the biggest hurdle for newcomers is understanding what this technology is capable of, where it excels, and how to use it. That is why I wrote this book.

The Apache Hadoop community has come a long way. Since the publication of the first edition of this book, the Hadoop project has blossomed. "Big data" has become a household term.[2] In this time, the software has made great leaps in adoption, performance, reliability, scalability, and manageability. The number of things being built and run on the Hadoop platform has grown enormously. In fact, it's difficult for one person to keep track. To gain even wider adoption, I believe we need to make Hadoop even easier to use. This will involve writing more tools; integrating with even more systems; and writing new, improved APIs. I'm looking forward to being a part of this, and I hope this book will encourage and enable others to do so, too.

Administrative Notes

During discussion of a particular Java class in the text, I often omit its package name to reduce clutter. If you need to know which package a class is in, you can easily look it up in the Java API documentation for Hadoop (linked to from the Apache Hadoop home page) or the relevant project. Or, if you're using an integrated development environment (IDE), its auto-complete mechanism can help find what you're looking for.

Similarly, although it deviates from usual style guidelines, program listings that import multiple classes from the same package may use the asterisk wildcard character to save space (for example, import org.apache.hadoop.io.*); a short illustrative listing of this style appears at the end of this excerpt.

The sample programs in this book are available for download from the book's website. You will also find instructions there for obtaining the datasets that are used in examples throughout the book, as well as further notes for running the programs in the book, and links to updates, additional resources, and my blog.

What's New in the Fourth Edition?

The fourth edition covers Hadoop 2 exclusively. The Hadoop 2 release series is the current active release series and contains the most stable versions of Hadoop.

There are new chapters covering YARN (Chapter 4), Parquet (Chapter 13), Flume (Chapter 14), Crunch (Chapter 18), and Spark (Chapter 19). There's also a new section to help readers navigate different pathways through the book (What's in This Book?).

This edition includes two new case studies (Chapters 22 and 23): one on how Hadoop is used in healthcare systems, and another on using Hadoop technologies for genomics data processing. Case studies from the previous editions can now be found online.

Many corrections, updates, and improvements have been made to existing chapters to bring them up to date with the latest releases of Hadoop and its related projects.
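As a small illustration of the wildcard-import convention mentioned in the administrative notes above, the following sketch is not from the book: the class name WildcardImportExample, the main method, and the values in it are invented for demonstration, and it assumes the Hadoop client library (which provides org.apache.hadoop.io) is on the classpath. It simply shows a single asterisk import bringing in that package's classes, here IntWritable and Text, so they can be used by their short names.

    // Illustrative sketch only, not a listing from the book. Assumes the Hadoop
    // client library (org.apache.hadoop.io) is available on the classpath.
    import org.apache.hadoop.io.*;   // wildcard import: IntWritable, Text, and friends

    public class WildcardImportExample {
        public static void main(String[] args) {
            IntWritable count = new IntWritable(42);   // Writable wrapper for an int
            Text word = new Text("hadoop");            // Writable wrapper for a UTF-8 string
            System.out.println(word + " -> " + count.get());
        }
    }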
