没有合适的资源?快使用搜索试试~ 我知道了~
提升C语义用于数据流优化_Lifting C Semantics for Dataflow Optimization.pdf
1.该资源内容由用户上传,如若侵权请联系客服进行举报
2.虚拟产品一经售出概不退款(资源遇到问题,请及时私信上传者)
2.虚拟产品一经售出概不退款(资源遇到问题,请及时私信上传者)
版权申诉
0 下载量 150 浏览量
2022-01-04
23:29:43
上传
评论
收藏 1.32MB PDF 举报
温馨提示
![preview](https://dl-preview.csdnimg.cn/73880990/0001-d7654f4858fbcbce4faa88dd4863985b_thumbnail.jpeg)
![preview-icon](https://csdnimg.cn/release/downloadcmsfe/public/img/scale.ab9e0183.png)
试读
12页
提升C语义用于数据流优化_Lifting C Semantics for Dataflow Optimization.pdf
资源推荐
资源详情
资源评论
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![exe](https://img-home.csdnimg.cn/images/20210720083343.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![](https://csdnimg.cn/release/download_crawler_static/73880990/bg1.jpg)
Lifting C Semantics for Dataow Optimization
Alexandru Calotoiu
acalotoiu@inf.ethz.ch
ETH Zurich, Switzerland
Tal Ben-Nun
talbn@inf.ethz.ch
ETH Zurich, Switzerland
Grzegorz Kwasniewski
gkwasnie@inf.ethz.ch
ETH Zurich, Switzerland
Johannes de Fine Licht
definelj@inf.ethz.ch
ETH Zurich, Switzerland
Timo Schneider
timo.schneider@inf.ethz.ch
ETH Zurich, Switzerland
Philipp Schaad
philipp.schaad@inf.ethz.ch
ETH Zurich, Switzerland
Torsten Hoeer
torsten.hoefler@inf.ethz.ch
ETH Zurich, Switzerland
Abstract
C is the lingua franca of programming and almost any device
can be programmed using C. However, programming mod-
ern heterogeneous architectures such as multi-core CPUs
and GPUs requires explicitly expressing parallelism as well
as device-specic properties such as memory hierarchies.
The resulting code is often hard to understand, debug, and
modify for dierent architectures. We propose to lift C pro-
grams to a parametric dataow representation that lends
itself to static data-centric analysis and enables automatic
high-performance code generation. We separate writing code
from optimizing for dierent hardware: simple, portable C
source code is used to generate ecient specialized versions
with a click of a button. Our approach can identify parallelism
when no other compiler can, and outperforms a bespoke par-
allelized version of a scientic proxy application by up to
21%.
Keywords:
parallelism, dataow analysis, automatic paral-
lelization
1 Introduction
Many performance critical applications are written in C, as
its machine model is usually closest to hardware and allows
for bare-metal tuning to achieve highest performance. Ac-
cording to the TIOBE index [
45
] in 2020, C was the most
popular language in Internet searches. High-performance
computing centers state that 25% of their users primarily use
C [
17
]. Since Kernighan’s and Ritchie’s original inception of
the C language, systems have changed dramatically. Most
architectures need specialized instructions, compiler direc-
tives, or libraries to be used eciently. This usually leads
to C programs where more lines of code are implementing
optimizations tailored to the architecture than solving the
actual problem.
Targeted optimization is tightly coupled to hardware ar-
chitectures. A code written for GPUs using CUDA, a code
written to exploit shared memory using OpenMP, and a code
written for large supercomputers using the message passing
interface (MPI) can be nominally written in C, but will vary
widely even if they solve the same problem. The only aspect
they are likely to have in common is the sequential algo-
rithm each variant is based on. We argue that specializing
the programs to an architecture treats the symptoms, but
cannot eliminate the root cause: precisely because C was not
designed for performance portability, optimizing C programs
is both challenging and time consuming.
A powerful alternative to specialization is using tools
provided by modern compilers such as polyhedral analy-
sis [
10
,
22
] to optimize and parallelize sequential C code, with
results rivaling and even surpassing hand-tuned versions of
the code. However, these are limited to static control parts
(SCOPs) within functions [
22
]. SCOPs impose constraints
on what type of source code can be analyzed: indirect array
accesses such as
𝑥 [𝑐𝑜𝑙𝑢𝑚𝑛_𝑖𝑛𝑑𝑒𝑥 [𝑗]]
are typically not per-
mitted. The limitation is apparent in the following example
(sparse matrix vector multiplication), as no optimization is
possible due to the data-dependent indirect array accesses.
for (i = 0; i < N; i ++)
for (j = row _p tr [ i ]; j < row_ pt r [ i + 1]; j ++)
y[ i ] += A [ c ol_ id x [ j ]] * x[j ];
In search of a more general solution, we observe that data
movement is the most expensive part of most program exe-
cutions when considering both energy and time [
26
]. Data-
centric programming and leveraging dataow graphs is al-
ready widely performed in compiler analysis [
30
,
31
,
49
], and
recently emerging in graph analytics [
48
], high performance
computing [
6
,
42
], and machine learning [
3
]. Data-centric
models are both productive and portable, as parallelism is in-
herently expressed as data-independent sections, regardless
of the target hardware.
Our goal is to generate optimized, parallel code for dif-
ferent platforms by minimizing data movement. To achieve
it, we extract the data movement semantics from most C pro-
grams into a parametric dataow representation, where data
movement can be better analyzed and transformed. While
one cannot statically analyze the dataow of all C programs,
as can be shown by the Halting problem or Rice’s theorem,
we observe that high performance C codes, a subset of C
1
arXiv:2112.11879v2 [cs.PL] 30 Dec 2021
![](https://csdnimg.cn/release/download_crawler_static/73880990/bg2.jpg)
Calotoiu, et al.
C program
int
int
int
C-to-DaCe
translation
(§2)
Dataflow
coarsening
(§3)
Data-centric program
view (SDFG)
Dataflow
optimization
(§4)
Optimized C program
Code
generation
[8]
int
int
int
Figure 1. Optimizing C programs by lifting dataow.
programs without undened behavior, recursion or function
pointers, can be lifted.
We keep track of memory accesses using symbolic analy-
sis of access patterns and leverage the dataow across the
entirety of a program. To showcase the opportunities pro-
vided by the data-centric approach we show how we can
automatically expose data parallelism by identifying and
optimizing updates to shared memory locations. We evalu-
ate the eectiveness of our parallelization by providing an
automatic method of deriving work/depth models for code
we have parallelized. There is no need to annotate the code
to recover parametrically-parallel sections, as we derive the
required information directly from dataow. The overall pro-
cess is fully automatic and is summarized in Figure 1. Figure 2
shows a more detailed view of how the sparse matrix vector
multiplication code is translated, transformed and optimized
and will be discussed in detail in Sections 2, 3, and 4.
As we shall show, from the raw C codes, we are able to
not only generate codes that perform equivalently or better
than specialized tools such as polyhedral compilers; but also
operate on LULESH [
27
], a scientic computing application,
nding parallelization opportunities that no state-of-the-art
tool detects, and even outperform the tuned parallel version
provided by the application authors.
Contributions.
•
We statically lift the semantics of dataow from C into
a data-centric intermediate representation.
1
•
We use symbolic analysis of data access patterns across
entire programs to expose optimizations and paral-
lelism in unmodied C programs.
•
We statically detect the update of a memory location
as a distinct data access pattern to expose additional
parallelism opportunities.
•
We introduce an automatic, static work-depth analysis
to objectively measure the degree to which we have
exposed parallelism in sequential C code.
•
On the LULESH [
27
] high-performance scientic ap-
plication, we automatically generate a parallel version
that outperforms all other compilers and autoparal-
lelizing tools and even surpasses the developers’ own
OpenMP parallelization by up to 21%.
1
The code is available under hps://github.com/spcl/c2dace.
2 From C to Data-Centric Programming
The data-centric programming paradigm revolves around
memory, its movement, and its manipulation through com-
putations. Rather than prioritizing control-ow constructs
(e.g., sequential statements, loops), the core component of
data-centric models is dataow. Execution order is thus rst
a byproduct of data dependencies, and secondly a result of
explicit control-ow. There are three governing principles to
the paradigm: separation of data containers from computa-
tion, explicit data movement expressed as a rst-class com-
ponent, and providing control dependencies for cases where
dataow is not implied (e.g., data-dependent branches).
This is a crucial dierence to control-centric C programs,
where dataow is implicit. In order to perform this para-
digm shift we must execute a workow to lift dataow from
C programs. Throughout this workow, we must maintain
semantic equivalence in every step of the translation. We sep-
arate the workow: rst, we perform AST transformations
to simplify the translation to the dataow representation.
Then, we parse the C code into a ne-grained dataow rep-
resentation. Then we repeatedly coarsen that dataow, after
which we can perform optimizing transformation passes. Fi-
nally, we can generate optimized C source code for dierent
architectures.
In this work, we focus on the Stateful Dataow Multigraph
(SDFG) IR [
8
] as the data-centric representation. An SDFG
is a directed graph, representing a state machine, where
each node (
state
) is in itself a parametric directed acyclic
multigraph. In the outer graph, edges contain state transi-
tion conditions and assignments. Each state is in turn an
acyclic dataow multigraph, with edges representing data
movement and nodes representing data containers, compu-
tations, and parametric parallelism scopes. The components
are summarized in Figure 2 and full operational semantics
can be found in Ben-Nun et al. [8].
Using the DaCe framework, SDFGs were shown to accel-
erate a wide range of application classes in dense/sparse
linear algebra and graph algorithms [
8
], deep learning Trans-
former architectures [26], numerical weather prediction on
FPGAs [
16
], and extreme-scale quantum transport simula-
tions on the world’s largest supercomputer [51].
We provide a high level overview mapping major C syntax
elements [
1
] to equivalent SDFG elements in Table 1, and
introduce both relevant SDFG components in more detail as
well as discuss the more challenging aspects of C to SDFG
translation below.
2
![](https://csdnimg.cn/release/download_crawler_static/73880990/bg3.jpg)
Liing C Semantics for Dataflow Optimization
C Language SDFG Equivalent
Declarations and Types (§ 2.1)
Primitive data type Scalar data container
Array Array data container
Pointer
Access node to existing data con-
tainer, or new data container if point-
ing to newly allocated memory.
Expressions and Assignments (§ 2.2)
Operators (Unary, Binary,...)
Tasklet with incoming and outgoing
memlets for read/written operands
Array expression Memlet
Statements (§ 2.3)
Compound (blocks) State
Branching (if,...)
Branch conditions on state transi-
tion edges
Iteration (
for, while, ...)
State for compound statement, with
states and transitions for loop logic
Function ow (
break,
continue, return)
State transitions
goto State transition
Functions (§ 2.4)
Function calls (with source)
Nested SDFG for content, memlets
reduce shape of inputs and outputs
External/Library calls Tasklet with library state
Recursion Unsupported
Function pointers No equivalent, unsupported
Parallelism (§ 2.6)
— Parametric map scope
Table 1.
Mapping of major C syntax [
1
] elements to SDFG
representation.
2.1 Declarations and Types
We need to capture all instances where data is dened, read,
and written. The rst step is to capture all instances where
data is dened, whether statically or at runtime. The equiva-
lent to declarations in C is the creation of data containers in
SDFGs.
Data containers
are accessed using access nodes in SD-
FGs, and represent arrays, both one- and multi-dimensional.
Scalars are thus specialized data containers, with just one
instance of a primitive data type.
Some examples of data containers are shown below:
C B
B[0:50, 0:50]C[0:10]
…
…
double **A, B[50][50];
float *C = (float *)malloc(sizeof(float) * 10);
C code Corresponding SDFG
Here,
C
will be registered as a one-dimensional single preci-
sion oating point array of 10 elements in the SDFG, and
B
as a two-dimensional double precision oating point array
of 50 elements times 50 elements. We ensure no aliasing is
possible in our representation by not creating a separate
data container for pointers such as
A
. Containers will be
created only if
A
is assigned to newly allocated memory. If
A
is assigned to an existing data container,
A
will simply be
replaced with an access to that container.
SDFGs rely on symbolic math to perform useful analyses
and transformations. A
symbol
is dened as a scalar that
will not be modied within any state. Symbols can only be
set between states, in an inter-state edge. We can thus use
symbolic expressions in memory osets and integer sets, and
dierentiate them from runtime-computed scalars.
To analyze dynamically allocated memory such as
malloc
and variable-length arrays, we automatically create symbols
out of integer scalar values, as we detail in Section 3.1.
2.2 Expressions and Assignments
Assignments are some of the most common constructs en-
countered in C. An assignment contains both data (read and
written) and computation (as part of expressions), and we
discuss their SDFG equivalents below.
for (int i = 0; i < N ; i++)
for (int j = row_ptr[i]; j < row_ptr[i+1]; j++)
y[i] += A[col_idx[j]] * x[j];
for (int i = 0; i < N ; i++)
for (int j = row_ptr[i]; j < row_ptr[i+1]; j++)
{
int idx=col_idx[j];
y[i] += A[idx] * x[j];
}
j=row_ptr[i]
A
x
y
y = A*x
A[idx]
x[j]
y[i] (CR : Sum)
j>=row_ptr[i+1]
i=0
i<N
i++
j<row_ptr[i+1]
idx=col_idx[j]
j++
[i=0:N] (omp parallel for)
[j=row_ptr[i]:row_ptr[i+1]
(omp parallel for)
Data: Array containers
A[idx]
y[i] (CR: Sum)
Memlet: Data movement unit, with parallel write
conflict resolution (CR) options
States: Control dependencies
Map: Parametric parallelism scope
y = A * x
Tasklet: Fine-grained computation
A x
i=0
i<N
Interstate edges: symbolic conditions and
assignments dependencies
SDFG components
idx=col_idx[j]
Update
detection
Symbolic
access
Index
extraction
AST transformation (§2)
i>=N
A
x
y
y = A*x
A[idx]
x[j]
y[i] (CR : Sum)
C-to-DaCe
translation
(§2)
Dataflow
coarsening
(§3)
Dataflow
optimization
(§4)
Figure 2. From C to Data-Centric Programming.
3
剩余11页未读,继续阅读
资源评论
![avatar-default](https://csdnimg.cn/release/downloadcmsfe/public/img/lazyLogo2.1882d7f4.png)
![avatar](https://profile-avatar.csdnimg.cn/ace77722cc904668be9c7ee0feb247ba_dwf1354046363.jpg!1)
![avatar-vip](https://csdnimg.cn/release/downloadcmsfe/public/img/user-vip.1c89f3c5.png)
易小侠
- 粉丝: 6508
- 资源: 9万+
上传资源 快速赚钱
我的内容管理 展开
我的资源 快来上传第一个资源
我的收益
登录查看自己的收益我的积分 登录查看自己的积分
我的C币 登录后查看C币余额
我的收藏
我的下载
下载帮助
![voice](https://csdnimg.cn/release/downloadcmsfe/public/img/voice.245cc511.png)
![center-task](https://csdnimg.cn/release/downloadcmsfe/public/img/center-task.c2eda91a.png)
最新资源
- 以下是一些适用于英语六级作文的万能句型模板,涵盖了引言、正文和结论部分的各类表达方式.docx
- MATLAB中的非线性规划
- 进行C语言面试资格确认是招聘过程中一个重要的步骤,目的是确保候选人具备足够的C语言编程能力和知识.docx
- Java 轻量级的集群负载均衡设计
- 纹身师个人网站模板.jpg
- 在C语言中,连接两个字符串(即将一个字符串附加到另一个字符串的末尾)通常可以使用标准库中的 `strcat` 函数.docx
- 数据库管理工具:dbeaver-ce-23.1.1-stable.x86-64.rpm
- 以下是几个具体竞赛题目的详细解答,包括建模思路、方法和步骤 .docx
- 一份关于全国大学生建模大赛的相关教程!!
- 以下是关于计算机网络和现代通信组网的详细教程、案例和相关项目的推荐.docx
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
![feedback](https://img-home.csdnimg.cn/images/20220527035711.png)
![feedback](https://img-home.csdnimg.cn/images/20220527035711.png)
![feedback-tip](https://img-home.csdnimg.cn/images/20220527035111.png)
安全验证
文档复制为VIP权益,开通VIP直接复制
![dialog-icon](https://csdnimg.cn/release/downloadcmsfe/public/img/green-success.6a4acb44.png)