

- CASIA-WebFace dataset (Baidu Cloud): the WebFace dataset, shared through a Baidu Cloud link. The compressed archive is 4.1 GB in total.
- Complete big data tutorial series (tag: big data; size: 69 B): a full course from big data basics to mastery, covering Python basics, Java basics, MySQL, Oracle, the SSM framework, Linux, Hadoop, HBase, ZooKeeper, Flume, Scala, and Spark.
- Nationwide air quality CSV dataset, 2014-2018 (tag: dataset; size: 27 MB): CSV files of nationwide air quality data for 2014-2018, with the fields time, city, AQI, PM2.5, PM10, SO2, NO2, CO, O3, and primary_pollutant. About 550,000 records in total.
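The air quality entry above names the dataset's columns, so a reader could load such a file with Python's standard `csv` module. The following is only a minimal sketch: the header row, the date format, the city names, and every value in `SAMPLE_CSV` are made-up placeholders rather than data from the actual download, and `mean_aqi_by_city` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Column names as given in the listing's description of the dataset.
FIELDS = ["time", "city", "AQI", "PM2.5", "PM10", "SO2", "NO2", "CO", "O3",
          "primary_pollutant"]

# Placeholder rows for illustration only; these are NOT real measurements,
# and the real file's header and date format may differ.
SAMPLE_CSV = """time,city,AQI,PM2.5,PM10,SO2,NO2,CO,O3,primary_pollutant
2014-01-01,CityA,100,70,120,30,50,1.2,40,PM2.5
2014-01-02,CityA,50,30,60,10,25,0.8,60,PM10
2014-01-01,CityB,80,55,90,20,40,1.0,50,PM2.5
"""


def mean_aqi_by_city(csv_file):
    """Return {city: mean AQI}, skipping rows whose AQI is not numeric."""
    reader = csv.DictReader(csv_file)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")

    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        try:
            aqi = float(row["AQI"])
        except (TypeError, ValueError):
            continue  # tolerate blank or malformed AQI cells
        totals[row["city"]] += aqi
        counts[row["city"]] += 1
    return {city: totals[city] / counts[city] for city in totals}


result = mean_aqi_by_city(io.StringIO(SAMPLE_CSV))
print(result)
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced with `open(path, newline="", encoding="utf-8")`, where the path and encoding are again assumptions to check against the download.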
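- Complete Spark operation example code: Java-based example code covering the commonly used Spark operators, plus demo programs for ML, MLlib, Spark Streaming, and Spark SQL. Detailed notes are included. Two JAR files were removed because the package was too large; both ship with the Spark distribution and can be imported into the project after downloading it. The project has not been converted to Maven. Suitable for readers who already have some background or working experience. Author's GitHub: https://github.com/huangyueranbbc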
- scala-2.11.8.msi installer (tag: scala; size: 109 MB): Scala is a multi-paradigm programming language similar to Java, designed to be a scalable language that integrates features of object-oriented and functional programming. This file is the installer.
- Spark interview 2000 questions (issues 1-6, plus 60 extra questions).
- spark-2.2.1-bin-hadoop2.7.tgz (tag: spark; size: 192 MB): Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing. Open-sourced by the UC Berkeley AMP Lab, it is a general parallel framework in the style of Hadoop MapReduce and shares MapReduce's advantages. Unlike MapReduce, a job's intermediate output can be kept in memory, so it does not have to be written to and read back from HDFS. This makes Spark a better fit for iterative algorithms such as those used in data mining and machine learning.
- Scala-升级版.docx: a quick-start guide to Scala, aimed at readers learning Scala in order to learn Spark. Word document.
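- 《大数据Spark企业级实战版》 (Big Data Spark, Enterprise Practice Edition), part 2 (tag: big data; size: 52 MB): required reading in the big data field. This file is the second volume of the split archive.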
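- 《大数据Spark企业实战版》 (Big Data Spark, Enterprise Practice Edition), part 1 (tag: big data; size: 52 MB): required reading in the big data field. This file is the first volume of the split archive.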
- spark-1.6.0-bin-hadoop2.6.tgz (tag: spark; size: 276 MB): the installation package used to set up a Spark environment on Linux.
- Curated Spark source code, personally collected (tag: spark; size: 23 MB): contains a large amount of Spark source code, including ETL, Kafka, and HBase integration.
- Spark面试2000题系列第5期参考答案 (1).pdf: reference answers for issue 5 of the Spark interview 2000-question series.
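- 尚硅谷 (Atguigu) Spark: the latest Spark video course from 尚硅谷, taking learners from beginner to advanced level. Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing.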
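- Huawei Cloud platform full setup and installation manual: Huawei Cloud installation manuals, including the software installation manual and the quick installation document. Very detailed official documentation.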
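- Overall solution proposal for a big data platform: a proposal document for an unnamed big data platform, offered as a way for big data practitioners to build a complete picture of such a platform.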
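- 深入理解Spark+核心思想与源码分析.pdf (complete edition): 《深入理解SPARK:核心思想与源码分析》 (Understanding Spark in Depth: Core Ideas and Source Code Analysis) uses many diagrams and examples to analyze the design ideas, source implementation, and usage techniques behind Spark's architecture, deployment modes, and working modules. The book analyzes the Spark 1.2.0 source code comprehensively and in depth, aiming to give principled guidance for optimizing, customizing, and extending Spark. It is recommended by experts at Alibaba Group and written by a senior Alibaba Java developer and big data expert.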
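- Spark user guide (authoritative edition): Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing. Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark perform better on certain workloads. In particular, Spark uses in-memory distributed datasets, which allow it to optimize iterative workloads in addition to providing interactive queries.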
- spark-assembly-1.6.1-hadoop2.6.0.jar (tag: spark-assemb; size: 179 MB): the spark-assembly-1.6.1-hadoop2.6.0.jar file, available for download.
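- Python crawler for stock comments, Baidu AI semantic analysis, and MATLAB data processing: the relationship between stock price movements and comments (tag: stock comments; size: 2 MB): a Python crawler collects stock comments, Baidu AI performs semantic analysis on them, MATLAB processes the data, and Excel is used for charting, in order to study how stock price movements relate to the comments.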
- 图解Spark核心技术与案例实战 (Illustrated Spark: Core Technologies and Case Studies), PDF e-book, text version. Table of contents:

  Part I: Fundamentals
  Chapter 1: Overview of Spark and Its Ecosystem
    1.1 Introduction to Spark
      1.1.1 What is Spark
      1.1.2 Spark compared with MapReduce
      1.1.3 Spark's evolution roadmap
    1.2 The Spark ecosystem
      1.2.1 Spark Core
      1.2.2 Spark Streaming
      1.2.3 Spark SQL
      1.2.4 BlinkDB
      1.2.5 MLBase/MLlib
      1.2.6 GraphX
      1.2.7 SparkR
      1.2.8 Alluxio
    1.3 Summary
  Chapter 2: Setting Up a Hands-On Spark Environment
    2.1 Setting up the base environment
      2.1.1 Building the cluster template machine
      2.1.2 Configuring the cluster environment
    2.2 Compiling the Spark source code
      2.2.1 Configuring the Spark build environment
      2.2.2 Building Spark with Maven
      2.2.3 Building Spark with SBT
      2.2.4 Generating the Spark deployment package
    2.3 Setting up a running Spark cluster
      2.3.1 Modifying the configuration files
      2.3.2 Starting Spark
      2.3.3 Verifying startup
      2.3.4 A first example
    2.4 Setting up a Spark development environment
      2.4.1 Deploying IDEA on CentOS
      2.4.2 Developing programs with IDEA
      2.4.3 Reading the source code with IDEA
    2.5 Summary

  Part II: Core
  Chapter 3: The Spark Programming Model
    3.1 RDD overview
      3.1.1 Background
      3.1.2 Introduction to RDDs
      3.1.3 Types of RDD
    3.2 RDD implementation
      3.2.1 Job scheduling
      3.2.2 Interpreter integration
      3.2.3 Memory management
      3.2.4 Checkpoint support
      3.2.5 Multi-user management
    3.3 Programming interface
      3.3.1 RDD partitions (Partitions)
      3.3.2 RDD preferred locations (PreferredLocations)
      3.3.3 RDD dependencies (Dependencies)
      3.3.4 RDD partition computation (Iterator)
      3.3.5 RDD partitioning function (Partitioner)
    3.4 Creation operations
      3.4.1 Creating from parallelized collections
      3.4.2 Creating from external storage
    3.5 Transformation operations
      3.5.1 Basic transformations
      3.5.2 Key-value transformations
    3.6 Control operations
    3.7 Action operations
      3.7.1 Collection and scalar actions
      3.7.2 Storage actions
    3.8 Summary
  Chapter 4: Spark Core Principles
    4.1 Message communication principles
      4.1.1 Spark message communication architecture
      4.1.2 Spark startup message communication
      4.1.3 Spark runtime message communication
    4.2 Job execution principles
      4.2.1 Overview
      4.2.2 Submitting a job
      4.2.3 Dividing scheduling stages
      4.2.4 Submitting scheduling stages
      4.2.5 Submitting tasks
      4.2.6 Executing tasks
      4.2.7 Retrieving execution results
    4.3 Scheduling algorithms
      4.3.1 Between applications
      4.3.2 Between jobs and scheduling stages
      4.3.3 Between tasks
    4.4 Fault tolerance and HA
      4.4.1 Executor failure
      4.4.2 Worker failure
      4.4.3 Master failure
    4.5 Monitoring and management
      4.5.1 UI monitoring
      4.5.2 Metrics
      4.5.3 REST
    4.6 Worked examples
      4.6.1 Computing annual precipitation
      4.6.2 HA configuration
    4.7 Summary
  Chapter 5: Spark Storage Principles
    5.1 Storage analysis
      5.1.1 Overall architecture
      5.1.2 Storage levels
      5.1.3 RDD storage calls
      5.1.4 The data read path
      5.1.5 The data write path
    5.2 Shuffle analysis
      5.2.1 Introduction to Shuffle
      5.2.2 Shuffle write operations
      5.2.3 Shuffle read operations
    5.3 Serialization and compression
      5.3.1 Serialization
      5.3.2 Compression
    5.4 Shared variables
      5.4.1 Broadcast variables
      5.4.2 Accumulators
    5.5 Worked examples
    5.6 Summary
  Chapter 6: Spark Runtime Architecture
    6.1 Overview of the runtime architecture
      6.1.1 General introduction
      6.1.2 Introduction to the important classes
    6.2 Local run mode
      6.2.1 Introduction to the run mode
      6.2.2 Implementation principles
    6.3 Pseudo-distributed (Local-Cluster) run mode
      6.3.1 Introduction to the run mode
      6.3.2 Implementation principles
    6.4 Standalone run mode
      6.4.1 Introduction to the run mode
      6.4.2 Implementation principles
    6.5 YARN run modes
      6.5.1 The YARN framework
      6.5.2 Introduction to YARN-Client mode
      6.5.3 Implementation principles of YARN-Client mode
      6.5.4 Introduction to YARN-Cluster mode
      6.5.5 Implementation principles of YARN-Cluster mode
      6.5.6 YARN-Client compared with YARN-Cluster
    6.6 Mesos run modes
      6.6.1 Introduction to Mesos
      6.6.2 Introduction to coarse-grained mode
      6.6.3 Implementation principles of coarse-grained mode
      6.6.4 Introduction to fine-grained mode
      6.6.5 Implementation principles of fine-grained mode
      6.6.6 Mesos coarse-grained compared with fine-grained
    6.7 Worked examples
      6.7.1 Standalone mode example
      6.7.2 YARN-Client example
      6.7.3 YARN-Cluster example
    6.8 Summary

  Part III: Components
  Chapter 7: Spark SQL
    7.1 Introduction to Spark SQL
      7.1.1 History of Spark SQL
      7.1.2 Introduction to DataFrame/Dataset
    7.2 How Spark SQL works
      7.2.1 General SQL execution principles
      7.2.2 Spark SQL runtime architecture
      7.2.3 Analysis of how SQLContext works
      7.2.4 Introduction to HiveContext
    7.3 Using Hive-Console
      7.3.1 Compiling Hive-Console
      7.3.2 Viewing execution plans
      7.3.3 Applying Hive-Console
    7.4 Using SQLConsole
      7.4.1 Starting HDFS and Spark Shell
      7.4.2 Interacting with RDDs
      7.4.3 Reading JSON data
      7.4.4 Reading Parquet data
      7.4.5 Caching demo
      7.4.6 DSL demo
    7.5 Using the Spark SQL CLI
      7.5.1 Configuring and starting the Spark SQL CLI
      7.5.2 The Spark SQL CLI in practice
    7.6 Using Thrift Server
      7.6.1 Configuring and starting Thrift Server
      7.6.2 Basic operations
      7.6.3 Transaction data example
      7.6.4 Developing an example with IDEA
    7.7 Worked examples
      7.7.1 Sales data classification example
      7.7.2 Online store sales statistics
    7.8 Summary
  Chapter 8: Spark Streaming
    8.1 Introduction to Spark Streaming
      8.1.1 Terminology
      8.1.2 Characteristics of Spark Streaming
    8.2 The Spark Streaming programming model
      8.2.1 DStream input sources
      8.2.2 DStream operations
    8.3 Spark Streaming runtime architecture
      8.3.1 Runtime architecture
      8.3.2 Message communication
      8.3.3 Receiver distribution
      8.3.4 Fault tolerance
    8.4 How Spark Streaming works
      8.4.1 Starting the stream processing engine
      8.4.2 Receiving and storing stream data
      8.4.3 Data processing
    8.5 Worked examples
      8.5.1 Stream data simulator
      8.5.2 Sales data statistics example
      8.5.3 Spark Streaming + Kafka example
    8.6 Summary
  Chapter 9: Spark MLlib
    9.1 Introduction to Spark MLlib
      9.1.1 About Spark MLlib
      9.1.2 Spark MLlib data types
      9.1.3 Spark MLlib basic statistics
      9.1.4 Predictive Model Markup Language
    9.2 Linear models
      9.2.1 Mathematical formulation
      9.2.2 Linear regression
      9.2.3 Linear support vector machines
      9.2.4 Logistic regression
      9.2.5 Linear least squares, Lasso, and ridge regression
      9.2.6 Streaming linear regression
    9.3 Decision trees
    9.4 Ensembles of decision models
      9.4.1 Random forests
      9.4.2 Gradient-boosted decision trees
    9.5 Naive Bayes
    9.6 Collaborative filtering
    9.7 Clustering
      9.7.1 K-means
      9.7.2 Gaussian mixture
      9.7.3 Power iteration clustering
      9.7.4 LDA
      9.7.5 Bisecting K-means
      9.7.6 Streaming K-means
    9.8 Dimensionality reduction
      9.8.1 Singular value decomposition
      9.8.2 Principal component analysis
    9.9 Feature extraction and transformation
      9.9.1 Term frequency-inverse document frequency
      9.9.2 Word vectorization tools
      9.9.3 Standardization
      9.9.4 Normalization
    ...
................. . . . .... . .. .. ... . ........ . ........ . ........ . ......... . ...................... ... ..... . . ... ....3 9 0 频繁模式挖掘.. .... ... . …...... .……. . .. .. …. . . ... ........ .. ........... . ............... . . . ..... . ... . .. . .... . ..... . .3 91 9.10.1 9.10.2 9.10.3 频繁模式增长. . .. .. .. .. .. ... . . .... .. . .. ... . ... . .... . ... . .... .. .. ..... . .. .. .. . .... . ..... . ............................ . .....3 91 关联规则挖掘.. .. .. .. .. .. .. .. . . . . ... . . ..... . . . . .. . .. ... ... . ............. . ..... . .. . . . ... . ........................ . ..... ....” l PrefixSpan ............………................. ......... ......... .......... ...... ..... ......... ...................... ....... 3 91 实例演示... .. . . ... ……. .. ... … …….. ... .… … ….... ..… …. .. . ...… ….... ..…. ... .. .…… . . ......………….392 9.11 .l 9 .11.2 K-means 聚类算法实例....... . ....... . ......... . .. . ….. . ... . ..…… … ... .. ...…... . .... .. ..... .. ...… .. ........3 92 手机短信分类实例................................................................................................. . ...3 9 6 小结........................... . ................. . ...... .... ........ .. ......... ...... .. . . .. . ................... . ..............40 1 Spark GraphX .........................................…………...............….........……………..... .. ..…….... 402 10.1 10.2 10.3 10.4 GraphX 介绍..... ... ...... . ....... ... .... . . ... .... ..... . . . . .. .. . .. ....... ......... ...... . .. . .. ...... ................... . .402 10.1.1 10.1.2 10.1.3 图计算.. . ... . ..… . . . . . . . ... .. .. . ... . ........ . ... . .... . .. . . . . . . . . .... . ........ .... .. .. .. .... .. ... .. ..... . ... . ..................40 GraphX 介绍. .. .. ... . … . ... . .. . .... . .... . ... . ............. .. . .. ............. . .................. . .. . ...... . ................40 发展历程.........…........................... . ............. . ... . 
.................................................. . .... ....40 2 3 4 GraphX 实现分析..... . ........ . ....................................... . ...................... . .. . ............... .. . ..40 5 10.2.1 10.2.2 10.2.3 10.2.4 GraphX 图数据模型........ .. .. .. . . . .. ... .. . …… . . ... ..…. .. .. .. .. ... .. ...... . ..... .. ... .. .. .. ... .. ...... .. ....... .40 6 GraphX 图数据存储...................................................................................... .... ...... ...40 8 GraphX 图切分策咯.. .. . ... .. .. .. . ... .. .... .. . . ...... .. .. .. . ... . ... . .... .. ..... . ..... .. .... .. .. ... .. .. .... . ........ . ..41 0 GraphX 图才桑作... . .. .. ... . .. .. ...... . .... .. ..... . ............ . .. .. . ..... . . .. . . ..... .. . . .. .... ... . ....... . ............... .412 实例演示........... . ...... .. ....... . .... . . . ................. . . . ............. .. ............. . ..............................418 10.3.1 10.3.2 图例演示. ........... . ....... . ...... . .... . ... .. ..................... .. ... . ......... . ..... ...... .. ... .. .. .. ............... ... ..418 社区发现演示. .. . ... .. .. .. .. . ... .. .. . ..... . ...... ... .. . .. . ...... .. . ... ... . .... . ...... . . .... .. ... .. .. .. .. .. ................4 2 5 小结.. . . . ..… ... .... … …… . ........……. . ....… . ......... . ....…··… .. . .. ....... . . ……… … … . .. .... .…… . .. ...429 第11 章 第12 章 目录| SparkR ... ..............………................................................................................................... 430 11.1 11 .2 11 .3 11.4 11.5 概述.............. . .... ... .. . . . .............................................. . ..................... . ............... . ..... . ....4 3 0 11 .1.1 11 .1.2 R 语言介绍..... .. .. .. .. .. .... .. .. .. ... .. ... . .. . ... .. .. .. .. .. ... . .. . ... ... . ... . .. ... . ........... . . .. ...... .. .. . .. .. .. .. .. . ..4 3 0 SparkR 介绍... .... . . ... ... ... .. .. .. .. ... .. . ... .. ...... . 
................... . ........ . ... . . . .... . ... . .. . ....... . ... . ...... . .础l SparkR 与DataFrame ...................................................................................... .... ..... 4 3 2 11.2. l 11 .2.2 DataFrames 介绍............................ .. ..... .. ....... . ....... .. .. .. . .. ...... .. . . . . .. . ... . .. .. ....... . . ... .. . ... . ..43 2 与DataFrame 的相关操作........ ......... ... ....... . .. .. .. .. .. .. .. .. .. . .. .. .. .. ... ...... . .. .. ........ ... ... . . ... . .434 编译安装SparkR.................................................................................................. .... 435 11 .3 .1 11 .3.2 11 .3 .3 11 .3.4 编译安装R 语言..... . ......................................................................................... . ...... .. .4 3 安装SparkR 运行环境.............. .. ...... .. .. .. .. . .... . ....... . .. .. .. .. .. .. . ... .. .. ... . . ....... .. .. .. . .. ..........43 安装SparkR ...... .................... ......... .... .... ... ..... ............. ....... .... ........ .... ........ .... .... ......... 43 启动并验证安装...... . .... . ....... . .... .. .. . ................ .. . .. ... . .. .. ... . ...... .. ....... . .. .. . ... . .. .. .. .. . ... . ... . .43 5 7 8 9 实例演示...... ...... ....... ..... . . ..... ..... ..... . . . . . ... . .. . ........ . ...... ... ..... ... ..... ... ... . ................ .. ......440 小结.......................... .. .. . .... . ... ... ....... . ......... .. ..... .... ..... .... .... ..... ... ... ..... .. .... ... .... ... .... ...444 Alluxio .................….........…...........................………………………….. ... ....…..... ..... ................ 44 5 12.1 12.2 12.3 12.4 12.5 Alluxio 简介........................ . .......... .. ........ ... .. .. . ........ .............. . ......... . ..................... . .44 5 12.1.1 12.1.2 12. 1.3 Alluxio 介绍... . ......... .. ... . ... .. .. . . . . .. ... .. .. ... . .. .. ... . ... . ... .. .. . ... .. . ... .. . .... . .. . ... ..... 
... . ... . .. ... . .. .....44 5 Alluxio 系统架构......... . .... . ... . ........ . ... .. .. . ... . ............ . ... . .. .. ....... . ....... . ....... . ... . .. .. .. .. .. ... . .44 6 HDFS 与Alluxio..……... ......….... ........ ..... ........ .... ........ .... ............................................ 450 Alluxio 编译部署... .... . .. … . . . . ...... .. ....... . ................................... . ..... . .............. ..... .... ...4 5 1 12.2.1 12.2.2 12.2.3 编译Alluxio ........................... ............. .... ........ ........ .... .... ..... ... .... ................................ 451 单机部署Alluxio ................................................ ...... ..... ..... ... ..... ........... ................... .. 453 集群模式部署Alluxio ..…………………………...... .................................... ..... ... .... .... ...... 4 5 5 Alluxio 命令行使用.. .............. . ......................... . ................................ . ....... . ...... . .... ..4 5 7 12.3.1 12.3.2 接口说明.... . ..... .. ...... . ..... . . ............ . .. .. .......... . ........ .. ... . ... . .............. . ... . ....... .. .. .. . .. .. .. . .....45 7 接口操作示例. ... .. .. .…………………………………………………………………………………..... .459 实例演示............................. . . . .......... . ..... . ................... . .............................................46 2 12.4.1 12.4.2 12.4.3 启动环境... .. . . .. ... ...... .. .. . . ... . .. ... ...... . ... . ... . ... . ... . ... .. .. . ........ . ... . ... . ... . ... . ... . .. .. .. .. . .. .. . ... ..... . .4 6 2 Alluxio 上运行Spark.............. .................................. ............................ ............... .... ... 46 2 Alluxio 上运行MapReduce .. ..... .... .... .... ..... ... ......... ..... ....... .... .... ................................ 46 5 小结…………………………………………………………………………………………………………·“6
4 0浏览
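Free - Design and Implementation of a Spark-Based Big Data Analytics Platform. The data analysis is implemented in Scala: Spark SQL processes the data, the results are stored in MySQL, and data visualization presents them.
5 3923 views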
Free for members - Meituan Big Data Platform Architecture in Practice.
5 851 views
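Free for members - Introduction to Spark (Complete Edition). Tag: Spark. Size: 32MB. Complete PDF edition of Introduction to Spark, covering the ecosystem, deployment and installation, the programming model, the runtime architecture, Streaming, SQL, MLlib, GraphX, and Tachyon installation and deployment.
3 651 views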
Free for members - Spark_GraphX: Large-Scale Graph Computation and Graph Mining. Tag: graphx. Size: 39MB.
5 593 views
Free for members - Spark Introductory Hands-On Series. Tag: Spark. Size: 32MB. An introductory hands-on Spark series suited to beginners. The documents cover ten topics and are of high quality. They are reposted here to thank the author, to help more people get started, and to spread the author's work:
  1. Introduction to Spark and Its Ecosystem.pdf
  2. Building and Deploying Spark (Part 1): Setting Up the Base Environment.pdf
  2. Building and Deploying Spark (Part 2): Building and Installing Hadoop.pdf
  2. Building and Deploying Spark (Part 3): Building and Installing Spark.pdf
  3. Spark Programming Model (Part 1): Concepts and Spark Shell in Practice.pdf
  3. Spark Programming Model (Part 2): IDEA Setup and Practice.pdf
  4. Spark Runtime Architecture.pdf
  5. Hive (Part 1): Introduction to Hive and Deployment.pdf
  5. Hive (Part 2): Hive in Practice.pdf
  6. SparkSQL (Part 1): Introduction to SparkSQL.pdf
  6. SparkSQL (Part 2): Execution Plans in Depth and Tuning.pdf
  6. SparkSQL (Part 3): Spark in Practical Applications.pdf
  7. SparkStreaming (Part 1): How SparkStreaming Works.pdf
  7. SparkStreaming (Part 2): SparkStreaming in Practice.pdf
  8. SparkMLlib (Part 1): Machine Learning and an Introduction to SparkMLlib.pdf
  8. SparkMLlib (Part 2): SparkMLlib in Practice.pdf
  9. SparkGraphX: Introduction and Examples.pdf
  10. The Tachyon Distributed In-Memory File System: Introduction, Installation, and Deployment.pdf
4 776 views
Free for members - A Good Project for Learning Scala. Tag: spark. Size: 19MB. A Scala project compiled by the uploader. It is helpful for beginners and covers a range of Scala techniques.
4 877 views
Free for members - scala-2.12.6.msi. Tag: scala. Size: 124MB. Downloaded from the official site, https://www.scala-lang.org/download/, so it is safe to use.
5 213 views
Free for members - Longguo Academy (龙果学院): The Road to Becoming a Big Data Architect, Aiming for a 400,000 Annual Salary.
1 230 views
Free for members - hadoop-common-2.7.3-bin-master. Tag: hadoop. Size: 439KB. When developing for Hadoop on Windows, set the HADOOP_HOME environment variable, for example to D:\Program Files\hadoop-common-2.7.3-bin-master. This configuration is needed if either of the following errors appears: 1. org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V 2. org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0
4 492 views
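For the hadoop-common-2.7.3-bin-master entry above, here is a minimal Python sketch of a pre-flight check for the HADOOP_HOME setup it describes. The function name `check_hadoop_home` is hypothetical, and the assumption that the package provides `bin/winutils.exe` and `bin/hadoop.dll` is not stated in the listing, so adjust the file list to match the actual archive.

```python
import os
from pathlib import Path

# Assumption (not from the listing): these native helpers live under
# %HADOOP_HOME%\bin in hadoop-common-*-bin packages for Windows.
EXPECTED_FILES = ("winutils.exe", "hadoop.dll")


def check_hadoop_home(env=None):
    """Return a list of problems with the HADOOP_HOME setup (empty if none)."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]

    bin_dir = Path(home) / "bin"
    problems = []
    for name in EXPECTED_FILES:
        if not (bin_dir / name).is_file():
            problems.append(f"missing {bin_dir / name}")
    return problems


if __name__ == "__main__":
    issues = check_hadoop_home()
    print("HADOOP_HOME looks usable" if not issues else "\n".join(issues))
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment, for example `check_hadoop_home({"HADOOP_HOME": r"D:\Program Files\hadoop-common-2.7.3-bin-master"})`.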
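Free for members - Big Data Project Source Code: Movie Recommendation System (Movie_recommend-master). Tag: spark. Size: 60MB. Source code for a big data project, the Movie_recommend-master movie recommendation system, including both real-time and offline recommendation.
3 4142 views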
Free for members - Amazon Fine Food Reviews Dataset (500,000 Reviews). Tag: spark. Size: 353MB. A dataset of 500,000 Amazon food reviews (Amazon Fine Food Reviews), found with some effort for an earlier review-classification project and shared here. Follow for more resources. GitHub: https://github.com/huangyueranbbc
3 2139 views
Free for members - Spark-The Definitive Guide Big Data Processing Made Simple. Clean true PDF. Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and scale up to big data processing or incredibly large scale.
5 189 views
Free for members - netcat-0.7.1. Tag: netcat. Size: 390KB. netcat is known as the 'Swiss Army knife' of network security, and few people will be unfamiliar with it. It is a simple but useful tool that reads and writes data across network connections using TCP or UDP. It is designed as a reliable back-end tool that other programs and scripts can drive directly and easily.
5 2239 views
Free for members - spark2.1.0-bin-hadoop2.7. Tag: TGZ. Size: 187MB. spark-2.1.0-bin-hadoop2.7.tgz, the installation file for Linux.
5 695 views
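Free for members - Movie Recommendation System: Real-Time and Offline Recommendation Built on Spark, Hadoop, Kafka, MongoDB, Angular, and Other Big Data Frameworks. Tag: spark. Size: 223KB. Real-time and offline recommendation built on Spark, Hadoop, Kafka, MongoDB, Flume, Elasticsearch, Angular, and other big data frameworks. The file contains the source code along with tutorial videos, so that complete beginners can build the system quickly and come away with a presentable project to support a move into big data or a job search.
1 3005 views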
Free for members - Learning Apache Spark 2. Tag: Apache. Size: 16MB. Learning Apache Spark 2 by Muhammad Asif Abbasi. English | 6 Jun. 2017 | ASIN: B01M7RO7US | 356 Pages | AZW3 | 16.22 MB
Key Features: An exclusive guide that covers how to get up and running with fast data processing using Apache Spark. Explore and exploit various possibilities with Apache Spark using real-world use cases in this book. Want to perform efficient data processing in real time? This book will be your one-stop solution.
Book Description: The Spark juggernaut keeps on rolling and gains more momentum each day. The core challenge is understanding the key capabilities in Spark (Spark SQL, Spark Streaming, Spark ML, Spark R, Graph X, etc.). Having understood the key capabilities, it is important to understand how Spark can be used, in terms of being installed as a standalone framework or as part of an existing Hadoop installation and configured with YARN and Mesos. The next part of the journey after installation is using key components, APIs, clustering, machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component is key, how widely it is being used, its stability, and pertinent use cases. Once we understand the individual components, we will take a couple of real-life advanced analytics examples: building a recommendation system and predicting customer churn. The objective of these real-life examples is to give the reader confidence in using Spark for real-world problems.
What you will learn: Get an overview of big data analytics and its importance for organizations and data professionals. Delve into Spark to see how it is different from existing processing platforms. Understand the intricacies of various file formats, and how to process them with Apache Spark. Realize how to deploy Spark with YARN, Mesos, or a standalone cluster manager. Learn the concepts of Spark SQL, SchemaRDD, caching, Spark UDFs, and working with Hive and Parquet file formats. Understand the architecture of Spark MLlib while discussing some of the off-the-shelf algorithms that come with Spark. Introduce yourself to SparkR and walk through the details of data munging, including selecting, aggregating, and grouping data using RStudio. Walk through the importance of graph computation and the graph processing systems available in the market. Check the real-world example of Spark by building a recommendation engine with Spark using collaborative filtering. Use a telco data set to predict customer churn using regression.
About the Author: Asif Abbasi has worked in the industry for over 15 years, in a variety of roles from engineering solutions to selling solutions and everything in between. Asif is currently working with SAS, a market leader in analytic solutions, as a Principal Business Solutions Manager for the Global Technologies Practice. Based out of London, Asif has vast experience in consulting for major organizations and industries across the globe, and running proof-of-concepts across various industries including but not limited to telecommunications, manufacturing, retail, finance, services, utilities, and government. Asif has presented at various conferences and delivered workshops on topics such as Big Data, Hadoop, Teradata, and Analytics using Aster on Teradata and Hadoop. Asif is an Oracle Certified Java EE 5 Enterprise Architect, Teradata Certified Master, PMP, and Hortonworks Hadoop Certified Developer and Administrator. Asif also holds a Master's degree in Computer Science and Business Administration.
4 132 views
Free for members - Agile Data Science 2.0. Tag: Spark. Size: 6MB. Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark by Russell Jurney. English | 7 Jun. 2017 | ASIN: B072MKL34K | 352 Pages | AZW3 | 5.91 MB
Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they're to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools. Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.
Build value from your data in a series of agile sprints, using the data-value pyramid. Extract features for statistical models from a single dataset. Visualize data with charts, and expose different aspects through interactive reports. Use historical data to predict the future via classification and regression. Translate predictions into actions. Get feedback from users after each sprint to keep your project on track.
5 153 views
Free for members - scala-intellij-bin-2018.1.8.zip. Tag: scala. Size: 51MB. The Scala plugin for IDEA. After unzipping, place the scala folder in the plugins folder under the IDEA installation directory. IDEA supports Scala programming only once this plugin is installed.
4 302 views
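Free for members - Link Prediction相似性计算方法示例.rar (Link Prediction similarity calculation examples). Tag: Link. Size: 566KB. Based on the worked similarity-index examples in Lü Linyuan and Zhou Tao's 《链路预测》 (Link Prediction) and on the code provided in the book's appendix, this resource implements the CN, Jaccard, and RA indices in both Python and MATLAB. It is tested on the simple five-node unweighted, undirected network given in the book, and the results agree with the authors' calculations. Contents: (1) the original text of the similarity-index example from the book; (2) the network's adjacency list as a .txt file; (3) MATLAB code; (4) Python code; (5) the network figure generated with Python.
5 1477 views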
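The three indices named in the Link Prediction entry above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the resource: the five-node edge list is made up rather than taken from the book, and the function names are hypothetical. CN counts the neighbors two nodes share, Jaccard divides that count by the size of the union of their neighbor sets, and RA sums 1/degree over the shared neighbors.

```python
def build_adjacency(edges):
    """Build neighbor sets for an unweighted, undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN index: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    """Jaccard index: shared neighbors divided by the neighbor union."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def resource_allocation(adj, x, y):
    """RA index: sum of 1/degree(z) over each shared neighbor z."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


# Hypothetical five-node network (not the one used in the book).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5)]
ADJ = build_adjacency(EDGES)

if __name__ == "__main__":
    for x, y in [(1, 4), (2, 3), (4, 5)]:
        print(x, y,
              common_neighbors(ADJ, x, y),
              round(jaccard(ADJ, x, y), 3),
              round(resource_allocation(ADJ, x, y), 3))
```

Storing neighbors as sets keeps each index a one-line set operation. RA penalizes shared neighbors of high degree, which CN and Jaccard do not.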
Free for members - scala2.10.6. Tag: scala2.10.6. Size: 58MB. scala2.10.6, shared because downloading from the official site is very slow.
0 227 views
Free for members - spark官方文档中文版.pdf (Chinese edition of the official Spark documentation).
5 946 views
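Free for members - Collection of Three Hands-On Big Data Projects (video walkthroughs + source code + documentation + software materials). 1. Overview of big data platform fundamentals. 2. The Lvmama (驴妈妈) big data platform project. 3. A large-scale offline e-commerce data analytics platform for a group-buying site. 4. Video walkthroughs, source code, documentation, and software materials.
5 1295 views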
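Free for members - Programming in Scala (《Scala编程》), 3rd Edition in English, with 3rd Edition Source Code. Tag: scala. Size: 13MB. The resource contains the English third edition of Programming in Scala and the third edition's source code. The third edition is currently the latest and supports Scala 2.11 and later. The book is co-written by the creator of the Scala language. It covers the language's syntax features comprehensively, and the authors explain why each feature was designed as it was, what considerations lay behind it, and how developers should use it. Studying the book therefore teaches more than the Scala language itself: it broadens the reader's perspective and sharpens the way they think through problems.
5 205 views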
Free for members - Spark MLlib Collaborative Filtering Recommendation Algorithm (ALS): Complete Python Example Program. Tag: spark. Size: 866KB. A complete example program for the Spark MLlib collaborative filtering recommendation algorithm ALS, run in Spark yarn-client mode. Training data is included.
0 4016 views
Free for members - Frank Kane's Taming Big Data with Apache Spark and Python (code included). Tag: Apache. Size: 6MB. Frank Kane's Taming Big Data with Apache Spark and Python. English | 2017 | ISBN-10: 1787287947 | 296 pages | AZW3/PDF/EPUB (conv) | 6.12 Mb
Key Features: Understand how Spark can be distributed across computing clusters. Develop and run Spark jobs efficiently using Python. A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you Big Data processing with Spark.
Book Description: Frank Kane's Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you'll soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python. Apache Spark has emerged as the next big thing in the Big Data domain, quickly rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis, making it an essential tool in many modern businesses. Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.
What you will learn: Find out how you can identify Big Data problems as Spark problems. Install and run Apache Spark on your computer or on a cluster. Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets. Implement machine learning on Spark using the MLlib library. Process continuous streams of data in real time using the Spark streaming module. Perform complex network analysis using Spark's GraphX library. Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster.
About the Author: My name is Frank Kane. I spent nine years at Amazon and IMDb, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we had Apache Spark back then, when I spent years trying to solve these problems there. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.
Table of Contents: Getting Started with Spark; Spark Basics and Simple Examples; Advanced Examples of Spark Programs; Running Spark on a Cluster; SparkSQL, Dataframes and Datasets; Other Spark Technologies and Libraries; Where to Go From Here? - Learning More About Spark and Data Science.
1 204 views
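Free for members - Spark_in_Action_Maning (original English edition). The original Manning e-book, which the uploader bought for 40 US dollars and is sharing here. The explanations are thorough yet accessible, but some English reading ability is required.
5 0 views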
Free for members - Graduation project (thesis): IoT-based smart home design. Keywords: graduation project, RFID, M2M, smart home, GPRS, remote control.
0 3251 views
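Free for members - RStudio Chinese study manual. A clear, concise introduction to the essential RStudio techniques, translated from English into Chinese; indispensable for beginners learning R.
5 5019 views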
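Free for members - Spark Big Data Analysis in Action (《Spark大数据分析实战》; high-definition complete edition with bookmarks) (spark; size: 23 MB). The book has 11 chapters. Chapters 1-3 introduce Spark's basic concepts, programming model, and development and deployment methods. Chapters 4-11 explain in detail the applications and core algorithms behind a hot-news analysis system, cloud-platform log data analysis, a sentiment analysis system, a search-engine link analysis system, and similar systems.
5 0 views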
Free for members - The World of Spark SQL, from Getting Started to Mastery to Project Practice (complete log-analysis set): log files (log files; size: 5 MB). Log files for the imooc (慕课网) course of the same name.
5 526 views
Free for members - Spark-based big data in practice (online movie recommendation) (SPARK; size: 91 KB). Implements movie recommendation using mainstream big data technologies; code included.
0 3492 views
Free for members - Nationwide air quality CSV dataset, 2014-2018 (.csv) (air quality; size: 27 MB). Contains the fields time, city, AQI, PM2.5, PM10, SO2, NO2, CO, O3, and primary_pollutant, with 550,000 records in total.
0 3591 views
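Free for members - 5. Hive (Part 2): Hive in Practice.pdf. Files: 1. Introduction to Spark and Its Ecosystem.pdf; 2. Building and Deploying Spark (Part 1): Setting Up the Base Environment.pdf; 2. Building and Deploying Spark (Part 2): Building and Installing Hadoop.pdf; 2. Building and Deploying Spark (Part 3): Building and Installing Spark.pdf; 3. The Spark Programming Model (Part 1): Concepts and SparkShell in Practice.pdf; 3. The Spark Programming Model (Part 2): IDEA Setup and Practice.pdf; 4. Spark Runtime Architecture.pdf; 5. Hive (Part 1): Introduction to Hive and Deployment.pdf; 5. Hive (Part 2): Hive in Practice.pdf; 6. SparkSQL (Part 1): Introduction to SparkSQL.pdf; 6. SparkSQL (Part 2): A Closer Look at Execution Plans and Tuning.pdf; 6. SparkSQL (Part 3): Spark in Practical Applications.pdf; 7. SparkStreaming (Part 1): How SparkStreaming Works.pdf; 7. SparkStreaming (Part 2): SparkStreaming in Practice.pdf; 8. SparkMLlib (Part 1): Machine Learning and an Introduction to SparkMLlib.pdf; 8. SparkMLlib (Part 2): SparkMLlib in Practice.pdf; 9. SparkGraphX Introduction and Examples.pdf; 10. The Tachyon Distributed In-Memory File System: Introduction, Installation, and Deployment.pdf
5 0 views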
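Free for members - Design and implementation of a Spark-based recommender system. Recommender systems are an important part of data mining and can filter massive amounts of information quickly, comprehensively, and accurately. However, recommendation algorithms implemented in the traditional single-host mode take too long to compute and can no longer meet today's commercial demand for fast, reliable technology. The Spark distributed computing framework, with its RDD (Resilient Distributed Dataset) abstraction and in-memory computation model, is better suited to big data mining. Recommendation algorithms involve many rounds of iterative computation, so using Spark can greatly improve a recommender system's computational efficiency. This thesis uses the Spark platform to design a product recommendation system based on the item-based collaborative filtering (Item-CF) algorithm and tests it on the MovieLens dataset. The experimental results show that the system improves recommendation precision and reduces computation time, providing a reference for further research on recommendation algorithms for big data platforms.
5 4933 views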
Free for members - Spark Streaming program code for a real-time alerting system that monitors log output, built on Flume + Kafka + Spark Streaming (Spark; size: 7 MB). Blog post: https://blog.csdn.net/linge1995/article/details/81326146
4 2000 views
Free for members - Big data test dataset (test data; size: 17 MB). Real user-generated data open-sourced by Taobao, including item ID, user ID, item category ID, timestamp, and user behavior: pv, cut (add to cart), and so on.
2 1884 views
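Free for members - MIIT: Guide to the Classification and Grading of Industrial Data (Trial). [Overview] The Ministry of Industry and Information Technology recently issued the Guide to the Classification and Grading of Industrial Data (Trial). The Guide applies to industry and information technology regulators, industrial enterprises, platform enterprises, and others carrying out industrial data classification and grading. The industrial data it covers is data generated and used throughout the full life cycle of products and services in the industrial sector, including but not limited to data that industrial enterprises generate and use in R&D and design, manufacturing, business management, and operations and maintenance services, as well as data that industrial internet platform enterprises generate and use in device connection, platform operation, and industrial app usage.
0 1721 views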
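Free for members - My Spark project code and data; please download to view (big data project; size: 51 MB). This package contains a Spark project I wrote on precision ad targeting, implemented in Scala. It includes the code with comments and the log files needed to run it, so everything required is there.
3 811 views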
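Free for members - The Design and Implementation of Apache Spark (PDF, Chinese edition). This document discusses the design and implementation of Apache Spark, focusing on its design philosophy, runtime principles, implementation architecture, and performance tuning, and also touching on how it differs from Hadoop MapReduce in design and implementation. I prefer not to call this document "source code analysis", because its main purpose is not to interpret the implementation code but to understand, as logically as possible and from the perspective of design and implementation principles, the whole process of a job from creation to completion, and through that the system as a whole. There are many ways to discuss a system's design and implementation. This document takes a problem-driven approach: it introduces a problem at the outset and then works deeper, question by question. Starting from a typical job example, it gradually discusses the system support needed while a job is generated and executed, and then selectively examines the design principles and implementation of some functional modules. This approach may give a clearer main thread than discussing the system module by module from the start. The document is aimed at geeks who want an in-depth understanding of Spark's design and implementation mechanisms and of distributed big data processing frameworks. Because the Spark community is very active and releases quickly, the document will try to keep in sync: its version number matches the Spark version, with one extra trailing digit indicating the document's own revision. Owing to limits in technical skill, experimental conditions, and experience, it currently covers only the core functionality of the Spark core standalone version, not every feature. Everyone is warmly invited to join in and help enrich and improve the document. It has been a long time since I wrote such a complete document; the last time was three years ago while taking Ng's ML course, when I was full of enthusiasm. Writing this took 20+ days, from the summer vacation until now, with most of the time spent debugging, drawing figures, and working out how to write it. I hope it helps both readers and myself. Contents: the document first discusses how a job is generated, then how it is executed, and finally the related system features. Specifically: Overview (general introduction); Job logical plan (the job's logical execution graph, i.e. the data dependency graph); Job physical plan (the job's physical execution graph); Shuffle details (the shuffle process); Architecture (how the system modules coordinate to complete a job); Cache and Checkpoint (the cache and checkpoint features); Broadcast (the broadcast feature); Job Scheduling
3 1139 views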
Free for members