Row-level Runtime Filters in Spark
Authors: Abhishek Somani, Yunxiao Ma, Yingyi Bu
JIRA: SPARK-32268, PR: https://github.com/apache/spark/pull/35789, Target version: Spark 3.3
Date: March-05-2022
1. Background and Motivation
2. Design
2.1 Bloom Filter Creation as an Aggregate Function
2.2 Bloom Filter Application as a Scalar Subquery Filter
2.3 Rewrite Trigger Condition
2.4 Bloom Filter Sizing
2.5 Pros. and Cons. of this approach
3. Performance Evaluation Numbers
3.1 With Semi-Join
3.2 With Bloom Filter
This document proposes row-level runtime filters in Spark to reduce intermediate data volume for
operators such as shuffle, join, and aggregate, and hence improve performance. We propose two
mechanisms to do this: semi-join filters and Bloom filters. Both mechanisms are proposed to
co-exist side by side behind feature configs.
1. Background and Motivation
Row-level runtime filters can reduce intermediate data and expensive computations during query
execution. In particular, shuffle operations are expensive because the data is first written to disk by
executors and then fetched by other executors (across nodes) over the network. This combined
network and disk activity is costly, and minimizing the data to be shuffled can yield large
performance gains.
In this document, we propose reducing shuffle data by pushing down a runtime filter from the side
of a join that has a selective filter to the other side. We propose to do this in two
scenarios:
1. When the join itself is a shuffle join: Row-level runtime filtering will help in reducing the
   shuffle data for the join;
2. When the join is a broadcast join AND there is likely a shuffle below the join on the probe
   side: The runtime filter can be pushed down through the shuffle, reducing the data to be
   shuffled.
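To make the idea concrete, the following is a minimal, self-contained sketch in plain Python (not Spark code) of how a Bloom filter built from the selectively filtered side of a join could prune rows on the other side before they are shuffled. The class `BloomFilter`, the helper `join_with_runtime_filter`, the toy tables, and the filter size (8192 bits, 3 hash functions) are all assumptions made for illustration; they are not taken from the proposal or from Spark's implementation.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def join_with_runtime_filter(build_rows, probe_rows, build_predicate):
    """Hypothetical helper: prune the probe side before a simulated shuffle."""
    # Creation side: apply the selective filter, then insert the join keys.
    selected = [(k, v) for k, v in build_rows if build_predicate(v)]
    bloom = BloomFilter(num_bits=8 * 1024, num_hashes=3)  # illustrative sizing only
    for k, _ in selected:
        bloom.add(k)

    # Application side: rows whose key cannot match are dropped before the shuffle.
    to_shuffle = [(k, v) for k, v in probe_rows if bloom.might_contain(k)]

    # The join still runs afterwards, so false positives never reach the result.
    lookup = {}
    for k, v in selected:
        lookup.setdefault(k, []).append(v)
    joined = [(k, pv, bv) for k, pv in to_shuffle for bv in lookup.get(k, [])]
    return to_shuffle, joined


# Toy data: 1,000 dimension rows (10 of them pass the filter) and 10,000 fact rows.
dim = [(i, "US" if i % 100 == 0 else "other") for i in range(1000)]
fact = [(i % 1000, f"order-{i}") for i in range(10000)]

to_shuffle, joined = join_with_runtime_filter(dim, fact, lambda v: v == "US")
print(len(fact), len(to_shuffle), len(joined))
```

In this toy setup only the fact rows whose key might be in the filtered dimension set survive to the shuffle step, while the join result is unchanged. A semi-join filter would play the same role with an exact key set instead of a probabilistic one.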