# benchdnn
**benchdnn** is a standalone correctness and performance benchmark for the
[Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)](/intel/mkl-dnn) library.
The purpose of the benchmark is extended and robust correctness verification of
the primitives provided by Intel MKL-DNN. So far **benchdnn** supports convolutions
and inner products of different data types. It also implicitly tests reorders.
## License
**benchdnn** is licensed under
[Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
## Usage (main driver)
**benchdnn** itself is a driver for different implementation-specific
harnesses. So far it has harnesses for Intel MKL-DNN convolution, inner product,
reorder, batch normalization, and RNN primitives, as well as a harness for testing itself.
The usage:
```
$ ./benchdnn [--HARNESS] [--mode=MODE] [-vN|--verbose=N] HARNESS-OPTS
```
where:
- `HARNESS` is either `conv` [default], `ip`, `reorder`, `bnorm`, `rnn` or `self`
- `MODE` -- string of benchmark mode flags: `C` or `c` for correctness (default), `P` or `p` for performance
- `N` -- verbose level (integer from 0 [default] to ...)
- `HARNESS-OPTS` are passed to the chosen harness
Returns `0` on success (all tests passed) and non-zero if any error occurred.
## Usage (convolution harness)
The usage:
```
[harness-knobs] [conv-desc] ...
```
where *harness-knobs* are:
- `--cfg={f32, u8s8u8s32, ...}` configuration (see below), default `f32`
- `--dir={FWD_D (forward data), FWD_B (forward data + bias), BWD_D (backward data), BWD_W (backward weights), BWD_WB (backward weights + bias)}` direction, default `FWD_B`
- `--alg={DIRECT, WINO}` convolution algorithm, default DIRECT
- `--merge={NONE, RELU}` merged primitive, default NONE (nothing merged)
- `--attr="attr_str"` convolution attributes (see in the section below), default `""` (no attributes set)
- `--mb=N` override minibatch that is specified in convolution description, default `0` (use mb specified in conv desc)
- `--match=regex` check only convolutions that match the regex, default `".*"`. Note: Windows may only interpret string arguments surrounded by double quotation marks.
- `--skip-impl="str1[:str2]..."` skip implementation (see mkldnn_query_impl_info_str), default `""`
- `--allow-unimpl=true|false` do not treat unimplemented configuration as an error, default `false`
- `--perf-template=template-str` set template for performance report (see section *Performance measurements*)
- `--reset` reset all previously set parameters to their defaults
- `-vN|--verbose=N` verbose level, default `0`
- `--batch=file` use options from the given file (see the `inputs` subdirectory)
and *conv-desc* is convolution description. The canonical form is:
```
gXmbXicXihXiwXocXohXowXkhXkwXshXswXphXpwXdhXdwXnS
```
Here X is a number and S is a string (`n` stands for name). Some of the parameters
may be omitted if there is a default (e.g., if `g` is not specified,
**benchdnn** uses 1) or if they can be computed automatically (e.g., the output
shape can be derived from the input shape and the kernel). Also, if either width
or height is not specified, then height == width is assumed. The special symbol `_` is
ignored, hence it may be used as a delimiter. See `str2desc()` in conv/conv_aux.cpp
for more details and implicit rules :^)
The attribute string *attr_str* is defined as (new lines for readability):
```
[irmode={nearest,down};]
[oscale={none,common,per_oc}[:scale];]
[post_ops='[{relu,sum[:sum_scale]};]...';]
```
Here `irmode` defines the rounding mode for integer output (default is nearest).
Next, `oscale` stands for output_scales. The first parameter is the policy,
defined below. The second optional parameter is a scale that specifies
either the one common output scale (for the `none` and `common` policies) or a
starting point for the `per_oc` policy, which uses many scales. The default scale
is 1.0. Known policies are:
- `none` (default) means no output scales set (i.e. scale = 1.)
- `common` corresponds to `mask=0` with common scale factor
- `per_oc` corresponds to `mask=1<<1` (i.e. output channels) with different scale factors
Next, `post_ops` stands for post operation sequence. Currently supported post
ops are:
- `relu` with no parameters (i.e. corresponding scale is 1., alg = eltwise_relu, alpha = beta = 0.)
- `sum` with optional parameter scale (default 1.)
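The combined effect of `oscale` and the `post_ops` chain on a single output value can be sketched like this. This is a simplified scalar model for illustration only (the library applies these per tensor element), and the function name is hypothetical:

```python
def apply_attr(acc, prev_dst, oscale=1.0, post_ops=()):
    """Model of output scaling plus a post-op chain on one accumulator value.

    acc: the raw convolution accumulator.
    prev_dst: the existing destination value (used by 'sum').
    post_ops: a sequence of ('relu',) or ('sum', scale) tuples,
    applied in order, mirroring e.g. post_ops='sum:2;relu'.
    """
    val = acc * oscale
    for op in post_ops:
        if op[0] == 'relu':       # alpha = beta = 0, scale = 1
            val = max(val, 0.0)
        elif op[0] == 'sum':      # dst += scale * previous dst
            val += op[1] * prev_dst
    return val

# e.g. attr="oscale=common:0.5;post_ops='sum:2;relu'"
print(apply_attr(8.0, 3.0, oscale=0.5, post_ops=[('sum', 2.0), ('relu',)]))  # -> 10.0
```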
### convolution configurations (aka precision specification)
The `--cfg` option specifies which data types the convolution uses.
It also defines all the magic of data filling inside. For integer types,
saturation is implicitly implied.
Finally, the configuration defines the threshold for computation errors (ideally we
want to keep it at 0, and that seems to work so far).
The table below shows cases supported by Intel MKL-DNN and corresponding
configurations for **benchdnn**:
|src type | wei type | dst type | acc type | cfg | notes
|:--- |:--- |:--- |:--- |:--- |:---
| f32 | f32 | f32 | f32 | f32 | inference optimized for sse4.2+, training avx2+
| s16 | s16 | s32 | s32 | s16s16s32s32 | optimized for processors with support of 4vnni, forward pass only (aka FWD_D, FWD_B)
| s32 | s16 | s16 | s32 | s32s16s16s32 | optimized for processors with support of 4vnni, backward wrt data only (aka BWD_D)
| s16 | s32 | s16 | s32 | s16s32s16s32 | optimized for processors with support of 4vnni, backward wrt weights (aka BWD_W, BWD_WB)
| u8 | s8 | f32 | s32 | u8s8f32s32 | optimized for processors with support of avx512vl, forward pass only (aka FWD_D, FWD_B)
| u8 | s8 | s32 | s32 | u8s8s32s32 | same notes as for u8s8f32s32
| u8 | s8 | s8 | s32 | u8s8s8s32 | same notes as for u8s8f32s32
| u8 | s8 | u8 | s32 | u8s8u8s32 | same notes as for u8s8f32s32
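The saturation implied for the integer configurations above amounts to clamping each value to the representable range of the destination type. A sketch of the concept (not the library's code):

```python
def saturate(value, dtype):
    """Clamp a value to the representable range of an integer dst type."""
    ranges = {'u8': (0, 255), 's8': (-128, 127),
              's16': (-32768, 32767), 's32': (-2**31, 2**31 - 1)}
    lo, hi = ranges[dtype]
    return max(lo, min(hi, int(round(value))))

print(saturate(300, 'u8'))   # -> 255
print(saturate(-5, 'u8'))    # -> 0
```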
## Performance measurements
**benchdnn** supports a custom performance report. The template is passed via the
command line and consists of terminal and nonterminal symbols. Nonterminal
symbols are printed as-is. The terminal symbols are described below.
There is also a notion of modifiers (marked as @) that change the meaning of
terminal symbols; e.g., the sign '-' means minimum (in terms of time). See the
table of modifiers below.
> **Caution:** threads have to be pinned in order to get consistent frequency readings.
| abbreviation | description
|:------------ |:-----------
| %d | problem descriptor
| %D | expanded problem descriptor (conv parameters in csv format)
| %n | problem name
| %z | direction
| %@F | effective cpu frequency computed as clocks[@] / time[@]
| %O | number of ops required (padding is not taken into account)
| %@t | time in ms
| %@c | time in clocks
| %@p | ops per second
| modifier | description
|:-------- |:-----------
| | default
| - | min (time) -- default
| 0 | avg (time)
| + | max (time)
| |
| K | Kilo (1e3)
| M | Mega (1e6)
| G | Giga (1e9)
The definition of expanded problem descriptor is:
`g,mb,ic,ih,iw,oc,oh,ow,kh,kw,sh,sw,ph,pw`.
The default template is defined in conv/bench_conv.cpp as
`perf,%n,%d,%GO,%GF,%-t,%-Gp,%0t,%0Gp`, which produces output in the
following CSV format:
```
string: perf
convolution name
full conv-desc
number of giga ops calculated
effective cpu frequency in GHz (i.e. clocks[min] / time[min])
minimum time spent in ms
best gigaops (since it corresponds to minimum time)
average time spent in ms
average gigaops (since it corresponds to average time)
```
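The derived quantities in that report (effective frequency and gigaops) follow directly from the measured time, clock count, and op count. A sketch of the arithmetic, with a hypothetical helper name:

```python
def perf_metrics(ops, time_ms, clocks):
    """Effective CPU frequency in GHz and throughput in Gops/s,
    mirroring the %@F and %@Gp template entries."""
    time_s = time_ms / 1e3
    freq_ghz = clocks / time_s / 1e9   # clocks per second, scaled to GHz
    gops = ops / time_s / 1e9          # ops per second, scaled to Gops
    return freq_ghz, gops

# e.g. 4.2e9 ops and 2.5e7 clocks measured over 10 ms
print(perf_metrics(4.2e9, 10.0, 2.5e7))  # -> (2.5, 420.0)
```

Since both metrics divide by the same measured time, the minimum-time run (`-` modifier) yields both the peak frequency estimate and the best gigaops figure.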
## Examples
Run the set of f32 forward convolutions from the inputs/conv_all file with bias and the default minibatch:
```
$ ./benchdnn --conv \
--cfg=f32 --dir=FWD_B --batch=inputs/conv_all
```