CCTC 2016, Databricks, Wenchen Fan (范文臣): Dataset in Spark


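This document is from CCTC 2016, the China Cloud Computing Technology Conference. It contains the keynote "Dataset in Spark" given by Wenchen Fan (范文臣), Apache Spark committer and software engineer at Databricks.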
Why you love Spark?

Why you love Spark?
Efficient: general execution graphs, in-memory storage
[Figure: bar chart, "Logistic regression in Hadoop and Spark"; series: Hadoop, Spark; values shown: 110 and 0.9]

Why you love Spark?
Ease of use: collection (RDD) based API

    text_file = spark.textFile("hdfs://...")
    text_file.flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a+b)

Word count in Spark's Python API

Background: What is in an RDD?
    Dependencies
    Partitions (with optional locality info)
    Compute function: Partition => Iterator[T] (opaque computation)
    Iterator[T] (opaque data)

RDD API is not expressive enough

    pdata.map(lambda x: (x.dept, [x.age, 1])) \
        .reduceByKey(lambda x, y: [x[0] + y[0], x[1] + y[1]]) \
        .map(lambda x: [x[0], x[1][0] / x[1][1]]) \
        .collect()

    SELECT dept, AVG(age) FROM pdata GROUP BY dept

Structure
    By definition, structure will limit what can be expressed.
    In practice, we can accommodate the vast majority of computations.
    Limiting the space of what can be expressed enables optimizations.
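The contrast on the "RDD API is not expressive enough" slide can be tried without a cluster. The following is a minimal sketch in plain Python, not Spark code: `reduce_by_key` is a hypothetical local stand-in for `reduceByKey`, the `Person` records are invented sample data (the slides do not show the contents of `pdata`), and SQLite is used only as a convenient engine for the SQL query. It shows both formulations computing the same per-department average, while only the SQL form states the intent (an average grouped by `dept`) in a way an engine can inspect.

```python
import sqlite3
from collections import namedtuple

# Hypothetical sample data; the slides do not show what pdata contains.
Person = namedtuple("Person", ["dept", "age"])
pdata = [Person("eng", 30), Person("eng", 40), Person("ops", 50)]


def reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey: merge values per key with func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


def average_age_functional(people):
    """Mirror the slide's map / reduceByKey / map pipeline on a local list."""
    # Each record becomes (dept, [age, 1]) so sums and counts travel together.
    pairs = [(p.dept, [p.age, 1]) for p in people]
    sums = reduce_by_key(pairs, lambda x, y: [x[0] + y[0], x[1] + y[1]])
    return sorted((dept, total / count) for dept, (total, count) in sums)


def average_age_declarative(people):
    """Run the slide's SQL query, with in-memory SQLite standing in for the engine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pdata (dept TEXT, age INTEGER)")
    conn.executemany("INSERT INTO pdata VALUES (?, ?)", people)
    # ORDER BY is added here only to make the output order deterministic.
    rows = conn.execute(
        "SELECT dept, AVG(age) FROM pdata GROUP BY dept ORDER BY dept"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(average_age_functional(pdata))
    print(average_age_declarative(pdata))
```

In the functional version the lambdas are opaque to whatever runs them: nothing in the pipeline says that the goal is an average. The SQL version restricts what can be expressed, and that restriction is what the "Structure" slide credits with enabling optimizations.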

Reader comment (hitlx, 2016-05-20): Very clear material, though the content feels overly detailed.