Implementation Principles of Common Spark Operators
Contents
1. take(num: Int)
2. first()
3. sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length)
4. count()
5. countApprox(timeout: Long, confidence: Double = 0.95)
6. countApproxDistinct(relativeSD: Double = 0.05)
7. collect()
8. toLocalIterator
9. takeOrdered(num: Int)
10. aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U)
11. fold(zeroValue: T)
12. treeAggregate
13. reduce(f: (T, T) => T)
14. max()
15. min()
16. treeReduce(f: (T, T) => T)
17. map[U: ClassTag](f: T => U)
18. mapPartitions[U: ClassTag](f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false)
19. mapPartitionsWithIndex[U: ClassTag](f: (Int, Iterator[T]) => Iterator[U], preservesPartitioning: Boolean = false)
20. flatMap[U: ClassTag](f: T => TraversableOnce[U])
21. filter(f: T => Boolean)
22. combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null)
23. distinct()
24. groupByKey(partitioner: Partitioner)
25. aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int)
26. coalesce(numPartitions: Int, shuffle: Boolean = false)
27. repartition(numPartitions: Int)
28. sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.random.nextLong)
29. takeSample(withReplacement: Boolean, num: Int, seed: Long = Utils.random.nextLong)
30. randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong)
31. union(other: RDD[T])
32. ++(other: RDD[T])
33. intersection(other: RDD[T])
34. glom
35. cartesian[U: ClassTag]
36. zip[U: ClassTag](other: RDD[U])
37. zipPartitions
38. zipWithIndex()
39. zipWithUniqueId
40. foreach(f: T => Unit)
41. foreachPartition(f: Iterator[T] => Unit)
42. subtract(other: RDD[T], p: Partitioner)
43. keyBy[K](f: T => K)
1. take(num: Int)
Returns the first num records of the RDD.
def take(num: Int): Array[T] = withScope {
  if (num == 0) {
    new Array[T](0)
  } else {
    val buf = new ArrayBuffer[T]
    val totalParts = this.partitions.length
    var partsScanned = 0
    while (buf.size < num && partsScanned < totalParts) {
      // The number of partitions to try in this iteration. It is ok for this number to be
      // greater than totalParts because we actually cap it at totalParts in runJob.
      var numPartsToTry = 1
      if (partsScanned > 0) {
        // If we didn't find any rows after the previous iteration, quadruple and retry.
        // Otherwise, interpolate the number of partitions we need to try, but overestimate
        // it by 50%. We also cap the estimation in the end.
        if (buf.size == 0) {
          // Nothing collected so far: quadruple the scan range.
          numPartsToTry = partsScanned * 4
        } else {
          // Some records collected but not yet num of them: grow the range to
          // Math.max((1.5 * num * partsScanned / buf.size).toInt - partsScanned, 1),
          // capped at 4x the number of partitions already scanned.
          // the left side of max is >= 1 whenever partsScanned >= 2
          numPartsToTry = Math.max((1.5 * num * partsScanned / buf.size).toInt - partsScanned, 1)
          numPartsToTry = Math.min(numPartsToTry, partsScanned * 4)
        }
      }
      val left = num - buf.size
      val p = partsScanned until math.min(partsScanned + numPartsToTry, totalParts)
      val res = sc.runJob(this, (it: Iterator[T]) => it.take(left).toArray, p, allowLocal = true)
      res.foreach(buf ++= _.take(num - buf.size))
      partsScanned += numPartsToTry
    }
    buf.toArray
  }
}
First, note how sc.runJob is invoked here:
/**
 * Run a job on a given set of partitions of an RDD, but take a function of type
 * `Iterator[T] => U` instead of `(TaskContext, Iterator[T]) => U`.
 */
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    func: Iterator[T] => U,
    partitions: Seq[Int],
    allowLocal: Boolean): Array[U] = {
  val cleanedFunc = clean(func)
  runJob(rdd, (ctx: TaskContext, it: Iterator[T]) => cleanedFunc(it), partitions, allowLocal)
}
Here `partitions: Seq[Int]` denotes the set of partitions to compute: it may name a single partition or several.
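As an illustration, here is a minimal sketch of driving runJob on selected partitions, assuming a local SparkContext named sc and the Spark 1.x signature quoted above (the allowLocal parameter was removed in later Spark versions):

import org.apache.spark.{SparkConf, SparkContext}

object RunJobOnSomePartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("runJob-demo"))
    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    // Compute only partitions 0 and 2; partitions 1 and 3 are never touched.
    val sums = sc.runJob(rdd, (it: Iterator[Int]) => it.sum, Seq(0, 2), allowLocal = true)
    println(sums.mkString(", "))  // one partial sum per requested partition
    sc.stop()
  }
}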
Now consider the first iteration of the loop: partsScanned is 0 and numPartsToTry is 1, so only the first partition is computed. If that single computation already yields the required num records, the loop terminates; otherwise the next iteration widens the range of partitions to compute, quite possibly scanning several partitions at once.
[Figure: iterative partition scanning in take]
Because take avoids computing the full RDD, its execution time is relatively short.
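To make the growth policy concrete, here is a minimal sketch in plain Scala (no Spark required) that replays the numPartsToTry estimation from the loop above; the partition sizes are made-up example numbers:

object TakeScanSimulation {
  def main(args: Array[String]): Unit = {
    val num = 100                                  // records requested, an arbitrary example value
    val partSizes = Array(0, 0, 5, 5, 40, 40, 40)  // hypothetical record count per partition
    var collected = 0
    var partsScanned = 0
    while (collected < num && partsScanned < partSizes.length) {
      var numPartsToTry = 1
      if (partsScanned > 0) {
        if (collected == 0) {
          numPartsToTry = partsScanned * 4         // nothing found yet: quadruple
        } else {
          // interpolate, overestimate by 50%, cap at 4x the partitions already scanned
          numPartsToTry = Math.max((1.5 * num * partsScanned / collected).toInt - partsScanned, 1)
          numPartsToTry = Math.min(numPartsToTry, partsScanned * 4)
        }
      }
      val upTo = math.min(partsScanned + numPartsToTry, partSizes.length)
      collected = math.min(collected + partSizes.slice(partsScanned, upTo).sum, num)
      println(s"scanned partitions [$partsScanned, $upTo), collected so far: $collected")
      partsScanned += numPartsToTry
    }
  }
}

Running it shows the range growing from 1 partition, to 4 (nothing found), to an interpolated estimate once some records have been collected.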
2. first()
Returns the first element of the RDD.
/**
 * Return the first element in this RDD.
 */
def first(): T = withScope {
  take(1) match {
    case Array(t) => t
    case _ => throw new UnsupportedOperationException("empty collection")
  }
}
So first() is simply implemented via take; see the analysis of take above for the details of its flow.
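A minimal hedged usage sketch, assuming a SparkContext named sc is already running:

val rdd = sc.parallelize(Seq(3, 1, 2))
println(rdd.first())  // 3: the head of the first non-empty partition, not the minimum
// On an empty RDD, take(1) yields an empty array, so first() throws
// UnsupportedOperationException("empty collection").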
3. sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length)
def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length)
    : RDD[(K, V)] = self.withScope
{
  val part = new RangePartitioner(numPartitions, self, ascending)
  new ShuffledRDD[K, V, V](self, part)
    .setKeyOrdering(if (ascending) ordering else ordering.reverse)
}
sortByKey simply builds a ShuffledRDD from the parent RDD, using the range partitioner RangePartitioner as its partitioning function. Execution proceeds as follows: each partition of the parent RDD is split by the RangePartitioner into buckets destined for the output partitions, and each partition of the ShuffledRDD then pulls the bucket that corresponds to it. What really matters in sortByKey is how RangePartitioner works: how it keeps the record counts of the ShuffledRDD's partitions roughly equal, i.e. how each partition's boundary is chosen (a usage sketch follows the quoted source below):
class RangePartitioner[K : Ordering : ClassTag, V](
    @transient partitions: Int,
    @transient rdd: RDD[_ <: Product2[K, V]],
    private var ascending: Boolean = true)
  extends Partitioner {

  // We allow partitions = 0, which happens when sorting an empty RDD under the default settings.
  require(partitions >= 0, s"Number of partitions cannot be negative but found $partitions.")

  private var ordering = implicitly[Ordering[K]]

  // An array of upper bounds for the first (partitions - 1) partitions
  private var rangeBounds: Array[K] = {
    if (partitions <= 1) {
      Array.empty
    } else {
      // This is the sample size we need to have roughly balanced output partitions, capped at 1M.
      val sampleSize = math.min(20.0 * partitions, 1e6)
      // Assume the input partitions are roughly balanced and over-sample a little bit.
      val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.size).toInt
      // numItems is the total number of elements in the rdd; sketched has type
      // Array[(Int, Int, Array[K])] and records, per parent partition: its index, its
      // element count, and the sample drawn from it.
      val (numItems, sketched) = RangePartitioner.sketch(rdd.map(_._1), sampleSizePerPartition)
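The quoted source breaks off at this point. To see the effect of the range partitioning from the outside, here is a minimal hedged usage sketch (assuming a local SparkContext; the object and app names are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object SortByKeyDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("sortByKey-demo"))
    val pairs = sc.parallelize(Seq(("d", 4), ("a", 1), ("c", 3), ("b", 2)), numSlices = 2)
    val sorted = pairs.sortByKey(ascending = true, numPartitions = 2)
    // With a RangePartitioner, partition 0 holds the smaller key range and partition 1
    // the larger one, so concatenating the partitions in order yields a globally sorted result.
    sorted.glom().collect().zipWithIndex.foreach { case (part, i) =>
      println(s"partition $i: ${part.mkString(", ")}")
    }
    sc.stop()
  }
}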