Data.Algorithms.Recipes.for.Scaling.Up.with.Hadoop.and.Spark.1 resource - CSDN Library

Rating: 5 stars (higher than 95% of resources) · Points required: 13 · 41 views · Uploaded 2015-01-29 11:21:08 · 30.49 MB PDF


Data Algorithms: Recipes for Scaling Up with Hadoop and Spark

Data Algorithms

by Mahmoud Parsian

Publisher: O'Reilly Media, Inc.

Release Date: February 15, 2015

ISBN: 9781491906187

Book Description

Learn the algorithms and tools you need to build MapReduce applications with Hadoop for processing gigabyte-, terabyte-, or petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of the big data team at Illumina, takes you step by step through the design of machine-learning algorithms such as Naive Bayes and Markov Chain, and shows you how to apply them to clinical and biological datasets using MapReduce design patterns.

Table of Contents

1. Preface

58. 3.6 Spark Implementation: Unique Keys

59. 3.6.1 Introduction

60. 3.6.2 What is an RDD?

61. 3.6.3 Spark's Function Classes

62. 3.6.4 Spark Solution for Top-10 Pattern

63. 3.6.5 Complete Spark Solution for Top-10 Pattern

64. 3.6.6 Input

65. 3.6.7 Sample Run: Find Top-10 List

66. 3.7 What If for Top-N

67. 3.7.1 Shared Data Structures Definition and Usage

68. 3.8 What If for Bottom-N

69. 3.9 Spark Implementation: Non-Unique Keys

70. 3.9.1 Complete Spark Solution for Top-10 Pattern

71. 4 Left Outer Join in MapReduce

72. 4.1 Introduction

73. 4.2 Implementation of Left Outer Join in MapReduce

74. 4.2.1 MapReduce Phase-1

75. 4.2.2 MapReduce Phase-2: Counting Unique Locations ...

76. 4.2.3 Implementation Classes in Hadoop

77. 4.3 Sample Run

78. 4.3.1 Input for Phase-1

79. 4.3.2 Run Phase-1

80. 4.3.3 View Output of Phase-1 (Input of Phase-2)

81. 4.3.4 Run Phase-2

82. 4.3.5 View Output of Phase-2

83. 4.4 Spark Implementation

84. 4.4.1 Spark Program

85. 4.4.2 STEP-0: Import Required Classes

86. 4.4.3 STEP-1: Read Input Parameters

87. 4.4.4 STEP-2: Create JavaSparkContext Object

88. 4.4.5 STEP-3: Create a JavaPairRDD for Users

89. 4.4.6 STEP-4: Create a JavaPairRDD for Transactions

90. 4.4.7 STEP-5: Create a union of RDDs created by STEP-3 and STEP-4

91. 4.4.8 STEP-6: Create a JavaPairRDD(userID, List(T2)) by calling groupBy()

92. 4.4.9 STEP-7: Create a productLocationsRDD as JavaPairRDD(String, String)

93. 4.4.10 STEP-8: Find all locations for a product

94. 4.4.11 STEP-9: Finalize output by changing "value"

95. 4.4.12 STEP-10: Print the final result RDD

96. 4.4.13 Running Spark Solution

97. 4.5 Running Spark on YARN

98. 4.5.1 Script to Run Spark on YARN

99. 4.5.2 Running Script

100. 4.5.3 Checking Expected Output

101. 4.6 Left Outer Join by Spark's leftOuterJoin()

102. 4.6.1 High-Level Steps

103. 4.6.2 STEP-0: Import Required Classes and Interfaces

104. 4.6.3 STEP-1: Read Input Parameters

105. 4.6.4 STEP-2: Create Spark's Context Object

106. 4.6.5 STEP-3: Create RDD for Users' Data

107. 4.6.6 STEP-4: Create usersRDD: The "Right" Table

108. 4.6.7 STEP-5: Create transactionRDD for Transactions' Data

109. 4.6.8 STEP-6: Create transactionsRDD: The "Left" Table

110. 4.6.9 STEP-7: Use Spark's Built-in JavaPairRDD.leftOuterJoin() Method

111. 4.6.10 STEP-8: Create (product, location) Pairs

112. 4.6.11 STEP-9: Group (K=product, V=location) Pairs by K

113. 4.6.12 STEP-10: Create Final Output (K=product, V=Set(location))

170. 8.6 Solution 2: Hadoop Implementation using ArrayListOfLongsWritable

171. 8.6.1 Sample Run for Solution 2

172. 8.7 Spark Solution

173. 8.7.1 STEP-0: Import Required Classes

174. 8.7.2 STEP-1: Check Input Parameters

175. 8.7.3 STEP-2: Create a JavaSparkContext Object

176. 8.7.4 STEP-3: Read Input

177. 8.7.5 STEP-4: Apply a Mapper

178. 8.7.6 STEP-5: Apply a Reducer

179. 8.7.7 STEP-6: Find Common Friends

180. 8.8 Sample Run of a Spark Program

181. 8.8.1 HDFS Input

182. 8.8.2 Script to Run Spark Program

183. 8.8.3 Log of Sample Run

184. 9 Recommendation Engines using MapReduce

185. 9.1 Customers Who Bought This Item Also Bought

186. 9.1.1 Input

187. 9.1.2 Expected Output

188. 9.1.3 MapReduce Solution

189. 9.2 Frequently Bought Together

190. 9.2.1 Input

191. 9.2.2 MapReduce Solution

192. 9.3 Recommend People Connection

193. 9.3.1 Input

194. 9.3.2 Output

195. 9.3.3 MapReduce Solution

196. 9.4 Spark Implementation

197. 9.4.1 STEP-0: Import Required Classes

198. 9.4.2 STEP-1: Handle Input Parameters

199. 9.4.3 STEP-2: Create Spark's Context Object

200. 9.4.4 STEP-3: Read HDFS Input File

201. 9.4.5 STEP-4: Implement map() Function

202. 9.4.6 STEP-5: Implement reduce() Function

203. 9.4.7 STEP-6: Generate Final Output

204. 9.4.8 Convenient Methods

205. 9.4.9 HDFS Input

206. 9.4.10 Script to Run Spark Program

207. 9.4.11 Program Run Log

208. 10 Content-Based Recommendation: Movies

209. 10.1 Input

210. 10.2 MapReduce PHASE-1

211. 10.3 MapReduce PHASE-2 and PHASE-3

212. 10.4 MapReduce-Phase-2 Mapper

213. 10.5 MapReduce-Phase-2 Reducer

214. 10.6 MapReduce-Phase-3 Mapper

215. 10.7 MapReduce-Phase-3 Reducer

216. 10.8 More Similarity Measures

217. 10.9 Movie Recommendation in Spark

218. 10.9.1 High-Level Solution in Spark

219. 10.9.2 High-Level Solution: All Steps

220. 10.9.3 STEP-0: Import Required Classes

221. 10.9.4 STEP-1: Handle Input Parameters

222. 10.9.5 STEP-2: Create Spark's Context Object

223. 10.9.6 STEP-3: Read Input File and Create RDD

224. 10.9.7 STEP-4: Find Who Has Rated Movies

225. 10.9.8 STEP-5: Group moviesRDD by Movie

(Preview ends here; 685 more pages in the full PDF.)

Comments

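yogapig123456

2015-04-01

Great book for learning Hadoop and Spark. You must read it if you want to learn big data. Thx.

泪-_-很累

2015-03-04

Just what I needed, thanks. The resource is complete and clear.

k329621268

2015-10-26

Quite good! It's a text version, not a scanned one!

tangzhenyu2022

2015-03-30

The book is complete and the content is rich.

slz_3333

2015-09-21

Entirely in English, with clear text! Good quality! Thanks to the contributor!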