Contents

Foreword
Acknowledgments
About the Author

1. Introduction: Why Look Beyond Hadoop Map-Reduce?
- Hadoop Suitability
- Big Data Analytics: Evolution of Machine Learning Realizations
- Closing Remarks
- References

2. What Is the Berkeley Data Analytics Stack (BDAS)?
- Motivation for BDAS
- BDAS Design and Architecture
- Spark: Paradigm for Efficient Data Processing on a Cluster
- Shark: SQL Interface over a Distributed System
- Mesos: Cluster Scheduling and Management System
- Closing Remarks
- References

3. Realizing Machine Learning Algorithms with Spark
- Basics of Machine Learning
- Logistic Regression: An Overview
- Logistic Regression Algorithm in Spark
- Support Vector Machine (SVM)
- PMML Support in Spark
- Machine Learning on Spark with MLbase
- References

4. Realizing Machine Learning Algorithms in Real Time
- Introduction to Storm
- Design Patterns in Storm
- Implementing Logistic Regression Algorithm in Storm
- Implementing Support Vector Machine Algorithm in Storm
- Naive Bayes PMML Support in Storm
- Real-Time Analytic Applications
- Spark Streaming
- References

5. Graph Processing Paradigms
- Pregel: Graph-Processing Framework Based on BSP
- Open Source Pregel Implementations
- GraphLab
- References

6. Conclusions: Big Data Analytics Beyond Hadoop Map-Reduce
- Overview of Hadoop YARN
- Other Frameworks over YARN
- What Does the Future Hold for Big Data Analytics?
- References

A. Code Sketches
- Code for Naive Bayes PMML Scoring in Spark
- Code for Linear Regression PMML Support in Spark
- Page Rank in GraphLab
- SGD in GraphLab