Contents
Chapter 1 Introduction to Spark and Its Computing Model
1 What is Spark
2 Introduction to Spark
3 History of Spark
4 The BDAS Ecosystem
5 Differences Between Spark and Hadoop
6 Use Cases for Spark
7 Spark Success Stories
Chapter 2 Setting Up the Spark Development Environment
1 Spark Run Modes
2 Setting Up the Spark Environment
2.1 Installing Scala
2.2 Single-Node Spark Configuration
2.3 Spark-Standalone Cluster Configuration
2.4 Spark-on-Yarn Mode Configuration
2.5 Spark-on-Mesos Mode Configuration
2.6 Hive-on-Spark Configuration
Chapter 3 The Spark Computing Model
1 RDD Programming
1.1 Resilient Distributed Datasets (RDDs)
1.2 Creating RDD Objects
2 RDD Operations
2.1 Passing Functions to Spark
2.2 Understanding Closures
2.3 The Pair RDD Model
2.4 Common Spark Transformations
2.5 Common Spark Actions
2.6 RDD Persistence
2.7 Caveats
2.7 Tuning the Level of Parallelism
2.8 Partitioning
3 Example: PageRank
Chapter 4 Advanced Spark Programming
1 Shared Variables
1.1 Accumulators
1.2 Broadcast Variables
2 Working on a Per-Partition Basis
3 Piping to External Programs
4 Numeric RDD Operations
5 The Spark Shuffle Mechanism