# benchdnn
**benchdnn** is a standalone correctness and performance benchmark for the
[Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)](/intel/mkl-dnn) library.
The purpose of the benchmark is extended and robust correctness verification of
the primitives provided by Intel MKL-DNN. So far **benchdnn** supports convolutions
and inner products of different data types. It also implicitly tests reorders.
## License
**benchdnn** is licensed under
[Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
## Usage (main driver)
**benchdnn** itself is a driver for different implementation-specific
harnesses. So far it has harnesses for Intel MKL-DNN convolution, inner product,
reorder, batch normalization, and RNN primitives, as well as a harness for testing itself.
The usage:
```
$ ./benchdnn [--HARNESS] [--mode=MODE] [-vN|--verbose=N] HARNESS-OPTS
```
where:
- `HARNESS` is either `conv` [default], `ip`, `reorder`, `bnorm`, `rnn` or `self`
- `MODE` -- string of benchmark mode flags: `C` or `c` for correctness (default), `P` or `p` for performance
- `N` -- verbose level (integer from 0 [default] to ...)
- `HARNESS-OPTS` are passed to the chosen harness
Returns `0` on success (all tests passed) and non-zero if any error occurred.
## Usage (convolution harness)
The usage:
```
[harness-knobs] [conv-desc] ...
```
where *harness-knobs* are:
- `--cfg={f32, u8s8u8s32, ...}` configuration (see below), default `f32`
- `--dir={FWD_D (forward data), FWD_B (forward data + bias), BWD_D (backward data), BWD_W (backward weights), BWD_WB (backward weights + bias)}` direction, default `FWD_B`
- `--alg={DIRECT, WINO}` convolution algorithm, default DIRECT
- `--merge={NONE, RELU}` merged primitive, default NONE (nothing merged)
- `--attr="attr_str"` convolution attributes (see in the section below), default `""` (no attributes set)
- `--mb=N` override minibatch that is specified in convolution description, default `0` (use mb specified in conv desc)
- `--match=regex` check only convolutions that match the regex, default `".*"`. Note: Windows may only interpret string arguments surrounded by double quotation marks.
- `--skip-impl="str1[:str2]..."` skip implementation (see mkldnn_query_impl_info_str), default `""`
- `--allow-unimpl=true|false` do not treat unimplemented configuration as an error, default `false`
- `--perf-template=template-str` set template for performance report (see section *Performance measurements*)
- `--reset` reset all previously set parameters to their defaults
- `-vN|--verbose=N` verbose level, default `0`
- `--batch=file` use options from the given file (see the `inputs` subdirectory)
and *conv-desc* is convolution description. The canonical form is:
```
gXmbXicXihXiwXocXohXowXkhXkwXshXswXphXpwXdhXdwXnS
```
Here X is a number and S is a string (`n` stands for name). Some of the parameters
may be omitted if there is a default (e.g., if `g` is not specified,
**benchdnn** uses 1) or if they can be computed automatically (e.g., the output
shape can be derived from the input shape and the kernel). Also, if either width
or height is not specified, then height == width is assumed. The special symbol `_` is
ignored, hence it may be used as a delimiter. See `str2desc()` in conv/conv_aux.cpp
for more details and implicit rules :^)
The attribute string *attr_str* is defined as (new lines for readability):
```
[irmode={nearest,down};]
[oscale={none,common,per_oc}[:scale];]
[post_ops='[{relu,sum[:sum_scale]};]...';]
```
Here `irmode` defines the rounding mode for integer output (default is nearest).
Next, `oscale` stands for output_scales. The first parameter is the policy,
defined below. The second optional parameter is a scale that specifies
either the one common output scale (for the `none` and `common` policies) or a
starting point for the `per_oc` policy, which uses many scales. The default scale
is 1.0. Known policies are:
- `none` (default) means no output scales set (i.e. scale = 1.)
- `common` corresponds to `mask=0` with common scale factor
- `per_oc` corresponds to `mask=1<<1` (i.e. output channels) with different scale factors
Next, `post_ops` stands for post operation sequence. Currently supported post
ops are:
- `relu` with no parameters (i.e. corresponding scale is 1., alg = eltwise_relu, alpha = beta = 0.)
- `sum` with optional parameter scale (default 1.)
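The combined effect of `oscale` and the `post_ops` chain on a single output value can be sketched like this. This is a simplified scalar model for illustration only (the library applies these per tensor element), and the function name is hypothetical:

```python
def apply_attr(acc, prev_dst, oscale=1.0, post_ops=()):
    """Model of output scaling plus a post-op chain on one accumulator value.

    acc: the raw convolution accumulator.
    prev_dst: the existing destination value (used by 'sum').
    post_ops: a sequence of ('relu',) or ('sum', scale) tuples,
    applied in order, mirroring e.g. post_ops='sum:2;relu'.
    """
    val = acc * oscale
    for op in post_ops:
        if op[0] == 'relu':       # alpha = beta = 0, scale = 1
            val = max(val, 0.0)
        elif op[0] == 'sum':      # dst += scale * previous dst
            val += op[1] * prev_dst
    return val

# e.g. attr="oscale=common:0.5;post_ops='sum:2;relu'"
print(apply_attr(8.0, 3.0, oscale=0.5, post_ops=[('sum', 2.0), ('relu',)]))  # -> 10.0
```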
### convolution configurations (aka precision specification)
The `--cfg` option specifies which data types the convolution uses.
It also defines all the magic of data filling inside. For integer types,
saturation is implicitly implied.
Finally, the configuration defines the threshold for computation errors (ideally we
want to keep it at 0, and that seems to work so far).
The table below shows cases supported by Intel MKL-DNN and corresponding
configurations for **benchdnn**:
|src type | wei type | dst type | acc type | cfg | notes
|:--- |:--- |:--- |:--- |:--- |:---
| f32 | f32 | f32 | f32 | f32 | inference optimized for sse4.2+, training avx2+
| s16 | s16 | s32 | s32 | s16s16s32s32 | optimized for processors with support of 4vnni, forward pass only (aka FWD_D, FWD_B)
| s32 | s16 | s16 | s32 | s32s16s16s32 | optimized for processors with support of 4vnni, backward wrt data only (aka BWD_D)
| s16 | s32 | s16 | s32 | s16s32s16s32 | optimized for processors with support of 4vnni, backward wrt weights (aka BWD_W, BWD_WB)
| u8 | s8 | f32 | s32 | u8s8f32s32 | optimized for processors with support of avx512vl, forward pass only (aka FWD_D, FWD_B)
| u8 | s8 | s32 | s32 | u8s8s32s32 | same notes as for u8s8f32s32
| u8 | s8 | s8 | s32 | u8s8s8s32 | same notes as for u8s8f32s32
| u8 | s8 | u8 | s32 | u8s8u8s32 | same notes as for u8s8f32s32
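The saturation implied for the integer configurations above amounts to clamping each value to the representable range of the destination type. A sketch of the concept (not the library's code):

```python
def saturate(value, dtype):
    """Clamp a value to the representable range of an integer dst type."""
    ranges = {'u8': (0, 255), 's8': (-128, 127),
              's16': (-32768, 32767), 's32': (-2**31, 2**31 - 1)}
    lo, hi = ranges[dtype]
    return max(lo, min(hi, int(round(value))))

print(saturate(300, 'u8'))   # -> 255
print(saturate(-5, 'u8'))    # -> 0
```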
## Performance measurements
**benchdnn** supports a custom performance report. The template is passed via the
command line and consists of terminal and nonterminal symbols. Nonterminal
symbols are printed as-is. The terminal symbols are described below.
There is also a notion of modifiers (marked as @) that change the meaning of
terminal symbols; e.g., the sign '-' means minimum (in terms of time). See the
table of modifiers below.
> **Caution:** threads have to be pinned in order to get consistent frequency readings.
| abbreviation | description
|:------------ |:-----------
| %d | problem descriptor
| %D | expanded problem descriptor (conv parameters in csv format)
| %n | problem name
| %z | direction
| %@F | effective cpu frequency computed as clocks[@] / time[@]
| %O | number of ops required (padding is not taken into account)
| %@t | time in ms
| %@c | time in clocks
| %@p | ops per second
| modifier | description
|:-------- |:-----------
| | default
| - | min (time) -- default
| 0 | avg (time)
| + | max (time)
| |
| K | Kilo (1e3)
| M | Mega (1e6)
| G | Giga (1e9)
The definition of expanded problem descriptor is:
`g,mb,ic,ih,iw,oc,oh,ow,kh,kw,sh,sw,ph,pw`.
The default template is defined in conv/bench_conv.cpp as
`perf,%n,%d,%GO,%GF,%-t,%-Gp,%0t,%0Gp`, which produces output in the
following CSV format:
```
string: perf
convolution name
full conv-desc
number of giga ops calculated
effective cpu frequency in GHz (i.e. clocks[min] / time[min])
minimum time spent in ms
best gigaops (since it corresponds to minimum time)
average time spent in ms
average gigaops (since it corresponds to average time)
```
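The derived quantities in that report (effective frequency and gigaops) follow directly from the measured time, clock count, and op count. A sketch of the arithmetic, with a hypothetical helper name:

```python
def perf_metrics(ops, time_ms, clocks):
    """Effective CPU frequency in GHz and throughput in Gops/s,
    mirroring the %@F and %@Gp template entries."""
    time_s = time_ms / 1e3
    freq_ghz = clocks / time_s / 1e9   # clocks per second, scaled to GHz
    gops = ops / time_s / 1e9          # ops per second, scaled to Gops
    return freq_ghz, gops

# e.g. 4.2e9 ops and 2.5e7 clocks measured over 10 ms
print(perf_metrics(4.2e9, 10.0, 2.5e7))  # -> (2.5, 420.0)
```

Since both metrics divide by the same measured time, the minimum-time run (`-` modifier) yields both the peak frequency estimate and the best gigaops figure.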
## Examples
Run the set of f32 forward convolutions from the inputs/conv_all file with bias and the default minibatch:
```
$ ./benchdnn --conv \
--cfg=f32 --dir=FWD_B --batch=inputs/conv_all
```