CCTC 2016, Cheng Hao (程浩) of Intel: Spinach, An Ad-hoc Query Engine on Top of Spark SQL

2016-05-18 17:05:10, 997 KB, PDF

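This document is from CCTC 2016, the China Cloud Computing Technology Conference. It contains the keynote "Spinach: An Ad-hoc Query Engine on Top of Spark SQL", delivered by Cheng Hao, R&D manager of the Spark Core team at Intel Asia-Pacific Research and Development Ltd.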
Spinach
CCTC 2016, China Cloud Computing Technology Conference (Cloud Computing Technology Conference 2016)

About me
  • Active Spark contributor in Apache open source
  • Engineering Manager from BDT of Intel APAC
  • Leading the IA optimization for Spark at Intel

Motive
How to accelerate SQL queries with Spark SQL?
  • Tungsten: (off-heap) data-oriented memory management, cache-aware computation, code generation
  • Tungsten: whole-stage code generation, vectorization
  • What else?

Why Spinach Can Accelerate the Ad-hoc Query?
[Architecture diagram, top to bottom]
  • Computing engine
  • Data Source API: Hive Table, Redis, Cassandra, Parquet, JSON, ORC, HBase (each behind a connector)
  • Cache layer: Alluxio, Redis, Cassandra, ...
  • Storage layer: Amazon Web Services S3, Alibaba Cloud OSS, ...

Why Spinach Can Accelerate the Ad-hoc Query?
[Same diagram, with Spinach as the cache layer between the Data Source API and the storage layer]
  • No additional third-party service required
  • Fine-grained data cached
  • Customized indices supported
  • Data cached in off-heap memory (no GC overhead)

How to use Spinach?
Getting Started
1. Start the Spark SQL shell and load the spinach package
     $SPARK_HOME/bin/spark-sql --jars spinach-01.jar
2. Create a Spinach-backed Data Source table
     spark-sql> CREATE TABLE src(a INT, b STRING, value INT) USING org.apache.spark.sql.execution.datasources.spinach;
3. Add index support to the Data Source table
     spark-sql> CREATE INDEX idx_1 ON src(a);
4. Run ad-hoc queries; the indices are enabled automatically
     spark-sql> INSERT INTO TABLE src SELECT key_1, key_2, value FROM XXX;
     spark-sql> SELECT MAX(value) FROM src WHERE a > 100 AND a <= 120 AND b = 'spinach';
       (automatically triggers index idx_1)
     spark-sql> CREATE INDEX idx_2 ON src(a, b);
     spark-sql> SELECT MAX(value) FROM src WHERE a >= 100 AND b = 'spinach';
       (automatically triggers index idx_2; TBD)
     spark-sql> DROP INDEX idx_2;
     spark-sql> SELECT MAX(value) FROM src WHERE a >= 100;
       (triggers index idx_1, but too many records are returned, so the index is bypassed automatically and the query falls back to a full table scan)

Spinach Implementation
  • DDL statement extension (index management)
      Create/Add Index (parser, logical node, physical execution)
      Drop Index (parser, logical node, physical execution)
  • Data source extension
      Implements the HadoopFsRelation interface (supports partitions and file status caching)
      Abbreviation (spn) for the Spinach data source in the DataFrame API: save(...) / load(/path/to...)
  • Enabling the extensions
      SpinachContext (SQLContext)
      Make SQLContext configurable in the thriftserver, the spark-sql shell, and spark-shell
      Spark executor heartbeat extensions (discussed later)

Spinach Data Format
  • Data files (N files): each file holds row groups, and each row group holds fibers. N is the number of data files.
  • Index files (N × M files): M is the number of indices.
  • Spinach meta (1 file)
  • Fibers are the minimum unit for caching, loading, and eviction. There are index fibers and data fibers (columnar).
[Layout diagram labels]
  • Spinach meta file: file meta, data schema, index meta, data file statistics/entries, file entries, indices entries, version info
  • Index file: index meta data, index entries, index node (fiber) #1 ... index node (fiber) #N
  • Data file: file meta data, row group meta, data type, row group #1 ... row group #N, column (fiber) #1 ... column (fiber) #N
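The Getting Started steps note that a range query may trigger an index, find that too many records would be returned, and then bypass the index in favor of a full table scan. The slides do not describe how that decision is made, so the following is only a minimal Python sketch of the general idea, not Spinach's implementation. The names `build_index`, `range_query`, and the cutoff `BYPASS_FRACTION` are hypothetical, and the 50% threshold is an arbitrary assumption chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical cutoff: the slides do not say when the index is bypassed.
BYPASS_FRACTION = 0.5


def build_index(rows, column):
    """Build a sorted single-column index as (sorted keys, row positions)."""
    pairs = sorted((row[column], pos) for pos, row in enumerate(rows))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def range_query(rows, index, column, low, high):
    """Return (rows with low < row[column] <= high, access path used)."""
    keys, positions = index
    start = bisect_right(keys, low)   # first key strictly greater than low
    end = bisect_right(keys, high)    # one past the last key <= high
    matched = end - start
    if rows and matched / len(rows) > BYPASS_FRACTION:
        # Too many matching records: ignore the index and scan every row.
        hits = [row for row in rows if low < row[column] <= high]
        return hits, "full_scan"
    # Selective predicate: fetch only the rows the index points at.
    hits = [rows[pos] for pos in sorted(positions[start:end])]
    return hits, "index"


# Toy table shaped like src(a INT, b STRING, value INT).
rows = [{"a": i, "b": "spinach", "value": i * 2} for i in range(1000)]
idx_a = build_index(rows, "a")

# WHERE a > 100 AND a <= 120 matches 20 of 1000 rows, so the index is used.
selective, path_selective = range_query(rows, idx_a, "a", 100, 120)

# WHERE a >= 100 (a > 99 for integers) matches 900 of 1000 rows, so the
# sketch falls back to a full scan.
broad, path_broad = range_query(rows, idx_a, "a", 99, float("inf"))
```

Both access paths return the same rows for a given predicate; only the cost differs. A real engine would base the decision on statistics or on the number of index entries actually read, which this sketch approximates with a simple match-count fraction.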
