Data Algorithms: Recipes for Scaling Up with Hadoop and Spark
by Mahmoud Parsian
Publisher: O'Reilly Media, Inc.
Release Date: February 15, 2015
ISBN: 9781491906187
Book Description
Learn the algorithms and tools you need to build MapReduce applications with Hadoop for processing gigabyte, terabyte, or
petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of the big
data team at Illumina, takes you step by step through the design of machine-learning algorithms, such as Naive Bayes and
Markov Chain, and shows you how to apply them to clinical and biological datasets, using MapReduce design patterns.
Table of Contents
1. Preface
2. 0.1 Introduction
3. 0.2 Relationship of Spark and Hadoop
4. 0.3 What is MapReduce?
5. 0.4 Why use MapReduce?
6. 0.5 What Is in This Book?
7. 0.6 What Is the Focus of This Book?
8. 0.7 What are Core Concepts of MapReduce/Hadoop?
9. 0.8 Is MapReduce for Everything?
10. 0.9 What is not MapReduce
11. 0.10 Who Is This Book For?
12. 0.11 What Software Is Used in This Book?
13. 0.12 Using Code Examples
14. 0.13 Where NOT to use MapReduce?
15. 0.14 Chapters in This Book?
16. 0.15 Online Resources
17. 0.16 Comments and Questions for This Book?
18. 1 Secondary Sort: Introduction
19. 1.1 What is a Secondary Sort Problem?
20. 1.2 Solutions to Secondary Sort Problem
21. 1.2.1 Sort Order of Intermediate Keys
22. 1.3 Data Flow Using Plug-in Classes
23. 1.4 MapReduce/Hadoop Solution
24. 1.4.1 Input
25. 1.4.2 Expected Output
26. 1.4.3 map() function
27. 1.4.4 reduce() function
28. 1.4.5 Hadoop Implementation
29. 1.4.6 Sample Run of Hadoop Implementation
30. 1.4.7 Sample Run
31. 1.5 What If Sorting Ascending or Descending
32. 1.6 Spark Solution To Secondary Sorting
33. 1.6.1 Time-Series as Input
34. 1.6.2 Expected Output
35. 1.6.3 Option-1: Secondary Sorting in Memory
36. 1.6.4 Spark Sample Run
37. 1.6.5 Option-2: Secondary Sorting using Framework
38. 2 Secondary Sorting: Detailed Example
39. 2.1 Introduction
40. 2.2 Secondary Sorting Technique
41. 2.3 Complete Example of Secondary Sorting
42. 2.3.1 Problem Statement
43. 2.3.2 Input Format
44. 2.3.3 Output Format
45. 2.3.4 Composite Key
46. 2.3.5 Sample Run
47. 2.4 Secondary Sort using New Hadoop API
48. 3 Top 10 List
49. 3.1 Introduction
50. 3.2 Top-N Formalized
51. 3.3 MapReduce Solution
52. 3.4 Implementation in Hadoop
53. 3.4.1 Input
54. 3.4.2 Sample Run 1: find top 10 list
55. 3.4.3 Output
56. 3.4.4 Sample Run 2: find top 5 list
57. 3.5 Bottom 10
58. 3.6 Spark Implementation: Unique Keys
59. 3.6.1 Introduction
60. 3.6.2 What is an RDD?
61. 3.6.3 Spark's Function Classes
62. 3.6.4 Spark Solution for Top-10 Pattern
63. 3.6.5 Complete Spark Solution for Top-10 Pattern
64. 3.6.6 Input
65. 3.6.7 Sample Run: find top-10 list
66. 3.7 What If for Top-N
67. 3.7.1 Shared Data Structures Definition and Usage
68. 3.8 What If for Bottom-N
69. 3.9 Spark Implementation: Non-Unique Keys
70. 3.9.1 Complete Spark Solution for Top-10 Pattern
71. 4 Left Outer Join in MapReduce
72. 4.1 Introduction
73. 4.2 Implementation of Left Outer Join in MapReduce
74. 4.2.1 MapReduce Phase-1
75. 4.2.2 MapReduce Phase-2: Counting Unique Locations
76. 4.2.3 Implementation Classes in Hadoop
77. 4.3 Sample Run
78. 4.3.1 Input for Phase-1
79. 4.3.2 Run Phase-1
80. 4.3.3 View Output of Phase-1 (Input of Phase-2)
81. 4.3.4 Run Phase-2
82. 4.3.5 View Output of Phase-2
83. 4.4 Spark Implementation
84. 4.4.1 Spark Program
85. 4.4.2 STEP-0: Import Required Classes
86. 4.4.3 STEP-1: Read Input Parameters
87. 4.4.4 STEP-2: Create JavaSparkContext Object
88. 4.4.5 STEP-3: Create a JavaPairRDD for Users
89. 4.4.6 STEP-4: Create a JavaPairRDD for Transactions
90. 4.4.7 STEP-5: Create a union of RDD's created by STEP-3 and STEP-4
91. 4.4.8 STEP-6: Create a JavaPairRDD(userID, List(T2)) by calling groupBy()
92. 4.4.9 STEP-7: Create a productLocationsRDD as JavaPairRDD(String,String)
93. 4.4.10 STEP-8: Find all locations for a product
94. 4.4.11 STEP-9: Finalize output by changing "value"
95. 4.4.12 STEP-10: Print the final result RDD
96. 4.4.13 Running Spark Solution
97. 4.5 Running Spark on YARN
98. 4.5.1 Script to Run Spark on YARN
99. 4.5.2 Running Script
100. 4.5.3 Checking Expected Output
101. 4.6 Left Outer Join by Spark's leftOuterJoin()
102. 4.6.1 High-Level Steps
103. 4.6.2 STEP-0: import required classes and interfaces
104. 4.6.3 STEP-1: read input parameters
105. 4.6.4 STEP-2: create Spark's context object
106. 4.6.5 STEP-3: create RDD for user's data
107. 4.6.6 STEP-4: Create usersRDD: The "right" Table
108. 4.6.7 STEP-5: create transactionRDD for transaction's data
109. 4.6.8 STEP-6: Create transactionsRDD: The Left Table
110. 4.6.9 STEP-7: use Spark's built-in JavaPairRDD.leftOuterJoin() method
111. 4.6.10 STEP-8: create (product, location) pairs
112. 4.6.11 STEP-9: group (K=product, V=location) pairs by K
113. 4.6.12 STEP-10: create final output (K=product, V=Set(location))
114. 4.6.13 Sample Run by YARN
115. 5 Order Inversion Pattern
116. 5.1 Introduction
117. 5.2 Example of Order Inversion Pattern
118. 5.3 MapReduce for Order Inversion Pattern
119. 5.3.1 Custom Partitioner
120. 5.3.2 Relative Frequency Mapper
121. 5.3.3 Relative Frequency Reducer
122. 5.3.4 Implementation Classes in Hadoop
123. 5.4 Sample Run
124. 5.4.1 Input
125. 5.4.2 Running MapReduce Job
126. 5.4.3 Generated Output
127. 6 Moving Average
128. 6.1 Introduction
129. 6.1.1 Example-1: Time Series Data
130. 6.1.2 Example-2: Time Series Data
131. 6.2 Formal Definition
132. 6.3 Moving Average by POJO
133. 6.3.1 First Solution: using Queue
134. 6.3.2 Second Solution: using Array
135. 6.3.3 Testing of Moving Average
136. 6.3.4 Sample Run
137. 6.4 MapReduce Solution
138. 6.4.1 Input
139. 6.4.2 Output
140. 6.4.3 MapReduce Solution: Option-1: sort in RAM
141. 6.4.4 Hadoop Implementation: sort in RAM
142. 6.4.5 Sample Run
143. 6.4.6 MapReduce Solution: Option-2: Sort by MR Framework
144. 6.5 Sample Run
145. 7 Market Basket Analysis
146. 7.1 What is Market Basket Analysis?
147. 7.2 MapReduce/Hadoop Solution
148. 7.3 What are the Application areas for MBA?
149. 7.4 Market Basket Analysis using MapReduce
150. 7.4.1 Mapper Formal
151. 7.4.2 Reducer
152. 7.5 MapReduce/Hadoop Implementation Classes
153. 7.5.1 Find Sorted Combinations
154. 7.5.2 Market Basket Analysis Driver: MBADriver
155. 7.5.3 Market Basket Analysis Mapper: MBAMapper
156. 7.5.4 Sample Run
157. 7.6 Spark/Hadoop Solution
158. 7.6.1 MapReduce Algorithm
159. 7.6.2 Input
160. 7.6.3 Spark Implementation
161. 7.6.4 Creating Item Sets From Transactions
162. 8 Common Friends
163. 8.1 Introduction
164. 8.2 Input
165. 8.3 Common Friends Algorithm
166. 8.4 MapReduce Algorithm
167. 8.4.1 MapReduce Algorithm in Action
168. 8.5 Solution 1: Hadoop Implementation using Text
169. 8.5.1 Sample Run for Solution 1
170. 8.6 Solution 2: Hadoop Implementation using ArrayListOfLongsWritable
171. 8.6.1 Sample Run for Solution 2
172. 8.7 Spark Solution
173. 8.7.1 STEP-0: Import Required Classes
174. 8.7.2 STEP-1: Check Input Parameters
175. 8.7.3 STEP-2: Create a JavaSparkContext Object
176. 8.7.4 STEP-3: Read Input
177. 8.7.5 STEP-4: Apply a Mapper
178. 8.7.6 STEP-5: Apply a Reducer
179. 8.7.7 STEP-6: Find Common Friends
180. 8.8 Sample Run of a Spark Program
181. 8.8.1 HDFS Input
182. 8.8.2 Script to Run Spark Program
183. 8.8.3 Log of Sample Run
184. 9 Recommendation Engines using MapReduce
185. 9.1 Customers Who Bought This Item Also Bought
186. 9.1.1 Input
187. 9.1.2 Expected Output
188. 9.1.3 MapReduce Solution
189. 9.2 Frequently Bought Together
190. 9.2.1 Input
191. 9.2.2 MapReduce Solution
192. 9.3 Recommend People Connection
193. 9.3.1 Input
194. 9.3.2 Output
195. 9.3.3 MapReduce Solution
196. 9.4 Spark Implementation
197. 9.4.1 STEP-0: Import Required Classes
198. 9.4.2 STEP-1: Handle Input Parameters
199. 9.4.3 STEP-2: Create Spark's Context Object
200. 9.4.4 STEP-3: Read HDFS Input File
201. 9.4.5 STEP-4: Implement map() Function
202. 9.4.6 STEP-5: Implement reduce() Function
203. 9.4.7 STEP-6: Generate Final Output
204. 9.4.8 Convenient Methods
205. 9.4.9 HDFS Input
206. 9.4.10 Script to Run Spark Program
207. 9.4.11 Program Run Log
208. 10 Content-Based Recommendation: Movies
209. 10.1 Input
210. 10.2 MapReduce PHASE-1
211. 10.3 MapReduce PHASE-2 and PHASE-3
212. 10.4 MapReduce-Phase-2 Mapper
213. 10.5 MapReduce-Phase-2 Reducer
214. 10.6 MapReduce-Phase-3 Mapper
215. 10.7 MapReduce-Phase-3 Reducer
216. 10.8 More Similarity Measures
217. 10.9 Movie Recommendation in Spark
218. 10.9.1 High-Level Solution in Spark
219. 10.9.2 High-Level Solution: All Steps
220. 10.9.3 STEP-0: Import Required Classes
221. 10.9.4 STEP-1: Handle Input Parameters
222. 10.9.5 STEP-2: Create a Spark's Context Object
223. 10.9.6 STEP-3: Read Input File and Create RDD
224. 10.9.7 STEP-4: Find Who Has Rated Movies
225. 10.9.8 STEP-5: Group moviesRDD by Movie