# benchdnn
**benchdnn** is a standalone correctness and performance benchmark for the
[Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)](/intel/mkl-dnn) library.
The purpose of the benchmark is extended and robust correctness verification of
the primitives provided by MKL-DNN. So far **benchdnn** supports convolutions
and inner products of different data types. It also implicitly tests reorders.
## License
**benchdnn** is licensed under
[Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
## Usage (main driver)
**benchdnn** itself is a driver for different implementation-specific
harnesses. So far it has harnesses for Intel MKL-DNN convolution, inner product,
reorder, batch normalization, and RNN, plus a harness for testing itself.
The usage:
```
$ ./benchdnn [--HARNESS] [--mode=MODE] [-vN|--verbose=N] HARNESS-OPTS
```
where:
- `HARNESS` is either `conv` [default], `ip`, `reorder`, `bnorm`, `rnn` or `self`
- `MODE` -- string that contains flags for benchmark mode. Use `C` or `c` for correctness (used by default), and `P` or `p` for performance
- `N` -- verbose level (integer from 0 [default] to ...)
- `HARNESS-OPTS` are passed to the chosen harness
Returns `0` on success (all tests passed) and non-zero if any error occurred.
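For example, a performance-mode run of the convolution harness with moderate
verbosity might look as follows (an illustrative invocation built from the
options above; `inputs/conv_resnet_50` is one of the batch files shipped with
the benchmark):
```
$ ./benchdnn --conv --mode=P --verbose=1 --batch=inputs/conv_resnet_50
```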
## Usage (convolution harness)
The usage:
```
[harness-knobs] [conv-desc] ...
```
where *harness-knobs* are:
- `--cfg={f32, u8s8u8s32, ...}` configuration (see below), default `f32`
- `--dir={FWD_D (forward data), FWD_B (forward data + bias), BWD_D (backward data), BWD_W (backward weights), BWD_WB (backward weights + bias)}` direction, default `FWD_B`
- `--alg={DIRECT, WINO}` convolution algorithm, default DIRECT
- `--merge={NONE, RELU}` merged primitive, default NONE (nothing merged)
- `--attr="attr_str"` convolution attributes (see in the section below), default `""` (no attributes set)
- `--mb=N` override minibatch that is specified in convolution description, default `0` (use mb specified in conv desc)
- `--match=regex` check only convolutions that match with regex, default is `".*"`. Notice: Windows may only interpret string arguments surrounded by double quotation marks.
- `--skip-impl="str1[:str2]..."` skip implementation (see mkldnn_query_impl_info_str), default `""`
- `--allow-unimpl=true|false` do not treat unimplemented configuration as an error, default `false`
- `--perf-template=template-str` set template for performance report (see section *Performance measurements*)
- `--reset` reset all previously set parameters to their default values
- `-vN|--verbose=N` verbose level, default `0`
- `--batch=file` use options from the given file (see the `inputs` subdirectory)
and *conv-desc* is convolution description. The canonical form is:
```
gXmbXicXihXiwXocXohXowXkhXkwXshXswXphXpwXdhXdwXnS
```
Here X is a number and S is a string (n stands for name). Some of the parameters
may be omitted if there is a default value (e.g. if g is not specified
**benchdnn** uses 1) or if they can be computed automatically (e.g. the output
shape can be derived from the input shape and kernel). Also, if either width or
height is not specified, it is assumed that height == width. The special symbol
`_` is ignored and hence may be used as a delimiter. See `str2desc()` in
conv/conv_aux.cpp for more details and implicit rules :^)
The attribute string *attr_str* is defined as (new lines for readability):
```
[irmode={nearest,down};]
[oscale={none,common,per_oc}[:scale];]
[post_ops='[{relu,sum[:sum_scale]};]...';]
```
Here `irmode` defines the rounding mode for integer output (default is nearest).
Next, `oscale` stands for output_scales. The first parameter is the policy that
is defined below. The second optional parameter is a scale that specifies
either the one common output scale (for `none` and `common` policies) or a
starting point for `per_oc` policy, which uses many scales. The default scale
is 1.0. Known policies are:
- `none` (default) means no output scales set (i.e. scale = 1.)
- `common` corresponds to `mask=0` with common scale factor
- `per_oc` corresponds to `mask=1<<1` (i.e. output channels) with different scale factors
Next, `post_ops` stands for post operation sequence. Currently supported post
ops are:
- `relu` with no parameters (i.e. corresponding scale is 1., alg = eltwise_relu, alpha = beta = 0.)
- `sum` with optional parameter scale (default 1.)
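Putting it together, an attribute string for an int8 convolution with
per-channel output scales and a fused sum followed by ReLU could look like this
(an illustrative value, not taken from a shipped batch file):
```
--attr="irmode=down;oscale=per_oc:2.25;post_ops='sum;relu'"
```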
### convolution configurations (aka precision specification)
The `--cfg` option specifies which data types the convolution is run with.
It also defines how the data is filled for testing. For integer types,
saturation is implied.
Finally, the configuration defines the threshold for computation errors
(ideally we want to keep it at 0, and so far that seems to work).
The table below shows cases supported by Intel MKL-DNN and corresponding
configurations for **benchdnn**:
|src type | wei type | dst type | acc type | cfg | notes
|:--- |:--- |:--- |:--- |:--- |:---
| f32 | f32 | f32 | f32 | f32 | inference optimized for sse4.2+, training avx2+
| s16 | s16 | s32 | s32 | s16s16s32s32 | optimized for processors with support of 4vnni, forward pass only (aka FWD_D, FWD_B)
| s32 | s16 | s16 | s32 | s32s16s16s32 | optimized for processors with support of 4vnni, backward wrt data only (aka BWD_D)
| s16 | s32 | s16 | s32 | s16s32s16s32 | optimized for processors with support of 4vnni, backward wrt weights (aka BWD_W, BWD_WB)
| u8 | s8 | f32 | s32 | u8s8f32s32 | optimized for processors with support of avx512vl, forward pass only (aka FWD_D, FWD_B)
| u8 | s8 | s32 | s32 | u8s8s32s32 | same notes as for u8s8f32s32
| u8 | s8 | s8 | s32 | u8s8s8s32 | same notes as for u8s8f32s32
| u8 | s8 | u8 | s32 | u8s8u8s32 | same notes as for u8s8f32s32
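For example, an int8 inference sweep over a shipped batch file could be launched
as below (an illustrative command; `--allow-unimpl=true` is useful here because
the int8 kernels are only implemented on some ISAs):
```
$ ./benchdnn --conv \
    --cfg=u8s8u8s32 --dir=FWD_B --allow-unimpl=true --batch=inputs/conv_resnet_50
```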
## Performance measurements
**benchdnn** supports custom performance reports. A template is passed via the
command line and consists of terminal and nonterminal symbols. Nonterminal
symbols are printed as is. The terminal symbols are described below.
There is also a notion of modifiers (marked with @) that change the meaning of
terminal symbols, e.g. the sign '-' means minimum (in terms of time). See the
table of modifiers below.
> **caution:** threads have to be pinned in order to get consistent frequency
| abbreviation | description
|:------------ |:-----------
| %d | problem descriptor
| %D | expanded problem descriptor (conv parameters in csv format)
| %n | problem name
| %z | direction
| %@F | effective cpu frequency computed as clocks[@] / time[@]
| %O | number of ops required (padding is not taken into account)
| %@t | time in ms
| %@c | time in clocks
| %@p | ops per second
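As a rough sketch (an approximation rather than a quote of the harness sources),
the op count reported via `%O` for a forward convolution counts two operations
per multiply-accumulate and ignores padding, i.e. roughly
`2 * mb * oc * oh * ow * (ic / g) * kh * kw`.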
| modifier | description
|:-------- |:-----------
| | default
| - | min (time) -- default
| 0 | avg (time)
| + | max (time)
| |
| K | Kilo (1e3)
| M | Mega (1e6)
| G | Giga (1e9)
The definition of expanded problem descriptor is:
`g,mb,ic,ih,iw,oc,oh,ow,kh,kw,sh,sw,ph,pw`.
The default template, defined in conv/bench_conv.cpp, is
`perf,%n,%d,%GO,%GF,%-t,%-Gp,%0t,%0Gp`. It produces the following output
in CSV format:
```
string: perf
convolution name
full conv-desc
number of giga ops calculated
effective cpu frequency in GHz (aka clocks[min] / time[min])
minimum time spent in ms
best gigaops (since it corresponds to minimum time)
average time spent in ms
average gigaops (since it corresponds to average time)
```
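A custom template follows the same rules. For instance, to report only the
problem descriptor, the minimum time, and the corresponding gigaops in
performance mode (an illustrative command line):
```
$ ./benchdnn --conv --mode=p \
    --perf-template="%d,%-t,%-Gp" --batch=inputs/conv_alexnet
```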
## Examples
Run the set of f32 forward convolutions from the inputs/conv_all file with bias and the default minibatch:
```
$ ./benchdnn --conv \
--cfg=f32 --dir=FWD_B --batch=inputs/conv_all
```
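A single convolution can also be specified directly on the command line instead
of a batch file, using the descriptor syntax described above (a hypothetical
problem, shown only to illustrate the syntax):
```
$ ./benchdnn --conv --dir=BWD_W \
    'mb8_ic16ih7_oc16_kh3sh1ph1_n"toy_conv_3x3"'
```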