[查看中文](./README_CN.md)
GraphEngine (GE) is a sub-module of MindSpore connecting the front end and devices, designed by the researchers and engineers at Huawei Technologies Co., Ltd. GE is implemented in C++. It takes the front-end graph as input and carries out a series of graph operations to adapt the graph into a form that can run efficiently on devices. GE is specifically designed for efficient operation on Ascend chips and is called automatically, without any exposure to users. GE mainly consists of two parts: GE API and GE Core. The architecture of GE is illustrated as follows:
![GE_schema](docs/GE_Architecture.png)
- GE API
GE API is the interface between GE Core and the front end. It controls the initialization and finalization of GE Core and sessions, and provides the interfaces for adding and running graphs.
- GE Core
GE Core acts as the core module of GE and is responsible for graph processing operations. It consists of six stages: graph preparation, graph partition, graph optimization, graph compilation, graph loading, and graph execution. These six stages are performed in series and together complete the complicated graph processing operations.
- Graph preparation & Whole graph optimization
The shapes of all feature maps and variables in the graph are inferred in this stage for later memory allocation. Some aggregations of operators, such as AllReduce, are performed as well.
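As a toy illustration of this stage, the sketch below propagates static shapes through a small graph so that buffer sizes are known before execution. The graph representation, rule names, and `run_shape_inference` helper are hypothetical, not GE's actual IR or API:

```python
# Toy shape-inference pass: each node's InferShape-style rule derives the
# output shape from its input shapes, so memory can be sized up front.
# Illustrative sketch only; not GE's real IR.

def infer_matmul(a, b):
    # (m, k) x (k, n) -> (m, n)
    assert a[1] == b[0], "inner dimensions must match"
    return (a[0], b[1])

def infer_add(a, b):
    # element-wise add requires identical shapes
    assert a == b, "shapes must match for element-wise add"
    return a

def run_shape_inference(graph, input_shapes):
    """graph: list of (name, op, input_names) in topological order.
    Returns a {tensor_name: shape} map covering every node output."""
    rules = {"MatMul": infer_matmul, "Add": infer_add}
    shapes = dict(input_shapes)
    for name, op, inputs in graph:
        shapes[name] = rules[op](*(shapes[i] for i in inputs))
    return shapes

graph = [
    ("y", "MatMul", ["x", "w"]),
    ("z", "Add", ["y", "b"]),
]
shapes = run_shape_inference(graph, {"x": (32, 784), "w": (784, 10), "b": (32, 10)})
print(shapes["z"])  # (32, 10)
```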
- Graph partition
Ascend chips are heterogeneous, including CPUs and vector calculation units (AICORE). Each operator in the graph is assigned to an operating core according to cost and operator support; these two kinds of core correspond to two different abstract engines in software. The whole graph is split into several sub-graphs based on the engine assigned in the previous step, and certain operators are added to the sub-graphs as marks for graph edges. Such a partition enables efficient optimization and compilation in the following stages.
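A minimal sketch of the partition idea: given a topologically ordered list of operators with their assigned engines, split it into sub-graphs of consecutive operators that share an engine. The operator and engine names here are illustrative placeholders, and GE's real partitioner works on the graph IR rather than a flat list:

```python
# Toy graph partition: group consecutive ops assigned to the same engine
# into one sub-graph; each boundary between groups is a cut edge that GE
# would mark with dedicated edge operators. Illustrative only.

def partition(ops):
    """ops: list of (op_name, engine) in topological order.
    Returns a list of (engine, [op_names]) sub-graphs."""
    subgraphs = []
    for name, engine in ops:
        if not subgraphs or subgraphs[-1][0] != engine:
            subgraphs.append((engine, []))   # open a new sub-graph
        subgraphs[-1][1].append(name)
    return subgraphs

ops = [
    ("Conv2D", "AICORE"),
    ("Relu", "AICORE"),
    ("TopK", "CPU"),      # unsupported on AICORE in this toy cost model
    ("MatMul", "AICORE"),
]
for engine, names in partition(ops):
    print(engine, names)
```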
- Subgraph optimization
Different optimizer interfaces are called depending on the engine each sub-graph belongs to. To fully utilize the calculation ability of the CUBE module in AICORE, a novel data layout format for faster hardware fetch is applied, and the transition from the normal 4D layout to this special format is performed in this stage. This guarantees less data handling between RAM and the CUBE units. Certain combinations of operators are fused into a single larger operator to further reduce computation cost; this fusion is carried out in this stage as well.
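As a rough illustration of such a layout transition, the sketch below converts a 4D NCHW tensor into a 5D blocked layout in which channels are padded to a multiple of a block size C0 and tiled into (C1, C0) groups, so that hardware can fetch a fixed-width channel block contiguously. The block size of 16 and the exact axis order are assumptions for illustration, not a specification of GE's internal format:

```python
import numpy as np

# Toy NCHW -> blocked 5-D layout transition: pad channels to a multiple of
# C0, then tile them into C1 blocks of width C0 and move the block-channel
# axis innermost for contiguous hardware fetches. Illustrative sketch only.
C0 = 16

def nchw_to_blocked(x):
    n, c, h, w = x.shape
    c1 = (c + C0 - 1) // C0                      # number of channel blocks
    padded = np.zeros((n, c1 * C0, h, w), dtype=x.dtype)
    padded[:, :c, :, :] = x                      # zero-pad extra channels
    # (N, C1*C0, H, W) -> (N, C1, C0, H, W) -> (N, C1, H, W, C0)
    return padded.reshape(n, c1, C0, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
y = nchw_to_blocked(x)
print(y.shape)  # (2, 1, 4, 4, 16)
```

Because only 3 of the 16 block channels are real data, the remaining 13 are zero padding; the inverse transition simply drops them.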
- Graph compilation & Graph loading
GraphEngine uses real-time operator compilation technology: the operator executable program is generated at run time according to the network structure. Meanwhile, memory allocation is completed with a memory-reuse strategy during the resource allocation stage. According to the graph information, queue, event, and stream resources are allocated, and each operator is compiled into a task bound to a certain stream. Tasks on the same stream are performed in series, while tasks on different streams can be executed in parallel. In the graph loading stage, the operators of the graph are assigned to different engines according to the engine information, and the graph is loaded onto the devices for running.
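The stream semantics above (serial within a stream, parallel across streams) can be sketched with a toy scheduler. Task names and durations are made up for illustration, and real cross-stream dependencies would be synchronized with events, which this sketch omits:

```python
# Toy stream scheduler: tasks bound to the same stream start only after the
# previous task on that stream ends; independent streams overlap freely.
# Illustrative only; not GE's runtime API.

def schedule(tasks):
    """tasks: list of (name, stream_id, duration) in issue order.
    Returns {name: (start, end)} assuming no cross-stream dependencies."""
    stream_clock = {}                      # per-stream "current time"
    timeline = {}
    for name, stream, duration in tasks:
        start = stream_clock.get(stream, 0)
        end = start + duration
        stream_clock[stream] = end         # next task on this stream waits
        timeline[name] = (start, end)
    return timeline

tasks = [
    ("conv_task", 0, 5),
    ("allreduce_task", 1, 7),  # overlaps with stream 0
    ("relu_task", 0, 2),       # waits for conv_task on the same stream
]
timeline = schedule(tasks)
print(timeline["relu_task"])  # (5, 7)
```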
- Graph execution
The graph is executed efficiently on the devices in this stage, and the corresponding outputs are returned to the host. For efficiency, a sink mode is provided in which the graph is executed several times on the device with only the last output returned. Such a mode effectively reduces data transfer between devices and the host.
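The benefit of sink mode can be sketched by counting device-to-host transfers: per-step execution copies the output back every iteration, while sink mode keeps the loop on the device and copies once at the end. The function names and transfer counting here are a hypothetical model, not GE's API:

```python
# Toy comparison of per-step execution vs. sink mode, counting
# device -> host copies. Illustrative model only.

def run_per_step(step_fn, x, iterations):
    transfers = 0
    for _ in range(iterations):
        x = step_fn(x)
        transfers += 1          # output copied back every step
    return x, transfers

def run_sink_mode(step_fn, x, iterations):
    for _ in range(iterations):
        x = step_fn(x)          # loop stays on the device
    return x, 1                 # single copy of the last output

step = lambda v: v + 1
out, copies = run_sink_mode(step, 0, 100)
print(out, copies)  # 100 1
```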
During training or evaluation, the aforementioned graph processing operations are carried out automatically. All in all, GE is a linking module between the MindSpore front end and Ascend chips, aiming to adapt the graph designed by users to a more efficient form that can be directly executed on Ascend chips.
- [Installation](#installation)
- [Community](#community)
- [Contributing](#contributing)
- [Release Notes](#release-notes)
- [License](#license)
# Installation
## Installing GraphEngine
GE is automatically compiled and installed once you finish installing MindSpore. Three dynamic link libraries correspond to GE.
## Installing Using the Source Code
You may also build GraphEngine from source.
To build GraphEngine, please make sure that you have access to an [Ascend 910](https://e.huawei.com/se/products/cloud-computing-dc/atlas/ascend-910) environment as the compiling environment, and that the following software requirements are fulfilled:
> - GCC >= 7.3.0
> - CMake >= 3.14.0
> - Autoconf >= 2.64
> - Libtool >= 2.4.6
> - Automake >= 1.15.1
The output of building GraphEngine is a set of shared libraries which can be linked with MindSpore; they are not meant to be used independently.
1. Download GraphEngine source code.
GraphEngine source code is available on [Gitee](https://gitee.com/mindspore/graphengine):
```shell
git clone https://gitee.com/mindspore/graphengine.git
cd graphengine
```
2. Run the following command in the root directory of the source code to compile GraphEngine:
To build with default options, simply:
```shell
bash build.sh
```
> - Before running the preceding command, ensure that the relevant paths have been added to the environment variable PATH.
> - In the build.sh script, the git clone command will be executed to obtain code from Gitee.com. Ensure that the network settings of Git are correct.
> - In the build.sh script, the default number of compilation threads is 8. If the build machine has limited resources, compilation errors may occur. You can pass `-j{number of threads}` to the build command to reduce the number of threads, for example `bash build.sh -j4`.
3. Access the output directory of the source code, obtain the generated GraphEngine libraries which can be linked with MindSpore for further installation/testing.
For more information on other options of building GraphEngine:
```shell
bash build.sh -h
```
If you wish to clean all outputs from last build and try again:
```shell
rm -rf build/ output/
bash build.sh
```
## Community
- [MindSpore Slack](https://join.slack.com/t/mindspore/shared_invite/enQtOTcwMTIxMDI3NjM0LTNkMWM2MzI5NjIyZWU5ZWQ5M2EwMTQ5MWNiYzMxOGM4OWFhZjI4M2E5OGI2YTg3ODU1ODE2Njg1MThiNWI3YmQ) - Ask questions and find answers.
## Contributing
Contributions are welcome. See our [Contributor Wiki](https://gitee.com/mindspore/mindspore/blob/master/CONTRIBUTING.md) for more details.
## Release Notes
For the release notes, see [RELEASE](RELEASE.md).
## License
[Apache License 2.0](LICENSE)