benchncnn can be used to test neural network inference performance.
Only the network definition files (ncnn param) are required.
The large model binary files (ncnn bin) are not loaded; the weights are generated randomly for speed testing.
More model networks may be added later.
---
Build
```
# assume you have already built the ncnn library successfully
# uncomment the following line in <ncnn-root-dir>/CMakeLists.txt with your favorite editor
# add_subdirectory(benchmark)
$ cd <ncnn-root-dir>/<your-build-dir>
$ make -j4
# you can find benchncnn binary in <ncnn-root-dir>/<your-build-dir>/benchmark
```
Usage
```
# copy all param files to the current directory
$ ./benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
```
Run benchncnn on an Android device
```
# to run on an Android device, upload to the /data/local/tmp/ folder
$ adb push benchncnn /data/local/tmp/
$ adb push <ncnn-root-dir>/benchmark/*.param /data/local/tmp/
$ adb shell
# executed in the Android adb shell
$ cd /data/local/tmp/
$ ./benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
```
Parameter
|param|options|default|
|---|---|---|
|loop count|1~N|4|
|num threads|1~N|max_cpu_count|
|powersave|0=all cores, 1=little cores only, 2=big cores only|0|
|gpu device|-1=cpu-only, 0=gpu0, 1=gpu1 ...|-1|
|cooling down|0=disable, 1=enable|1|
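
The arguments appear to be positional, so setting a later one means supplying every earlier one as well. A minimal sketch of a wrapper that fills in the defaults from the table is shown here; `bench_cmd` and the upper-case environment variables are hypothetical names, not part of ncnn, and `nproc` is only a stand-in for `max_cpu_count`.

```shell
#!/bin/sh
# Hypothetical helper (not part of ncnn): print a full benchncnn command line,
# using the defaults from the parameter table for any variable left unset.
bench_cmd() {
    loop_count="${LOOP_COUNT:-4}"
    # Assumption: nproc approximates max_cpu_count; fall back to 1 if it is missing.
    num_threads="${NUM_THREADS:-$(nproc 2>/dev/null || echo 1)}"
    powersave="${POWERSAVE:-0}"        # 0=all cores, 1=little cores only, 2=big cores only
    gpu_device="${GPU_DEVICE:--1}"     # -1=cpu-only, 0=gpu0, 1=gpu1 ...
    cooling_down="${COOLING_DOWN:-1}"  # 0=disable, 1=enable
    echo "./benchncnn $loop_count $num_threads $powersave $gpu_device $cooling_down"
}
```

For example, `LOOP_COUNT=8 NUM_THREADS=1 POWERSAVE=2 GPU_DEVICE=0 bench_cmd` prints `./benchncnn 8 1 2 0 1`, which can then be pasted into the adb shell.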
---
Typical output (executed in the Android adb shell)
Qualcomm SM8150-AC Snapdragon 855+ (Kryo 485 2.96 GHz + 2.42 GHz x 3 + 1.80 GHz x 4 + Adreno 640)
```
OnePlus7T:/data/local/tmp $ ./benchncnn 8 4 2 -1 1
[0 Adreno (TM) 640] queueC=0[3] queueG=0[3] queueT=0[3]
[0 Adreno (TM) 640] buglssc=0 bugsbn1=0 buglbia=0 bugihfa=1
[0 Adreno (TM) 640] fp16p=1 fp16s=0 fp16a=1 int8s=0 int8a=0
loop_count = 8
num_threads = 4
powersave = 2
gpu_device = -1
cooling_down = 1
squeezenet min = 8.84 max = 8.89 avg = 8.87
squeezenet_int8 min = 11.86 max = 11.98 avg = 11.89
mobilenet min = 11.36 max = 11.46 avg = 11.40
mobilenet_int8 min = 26.63 max = 26.76 avg = 26.70
mobilenet_v2 min = 9.67 max = 9.79 avg = 9.72
mobilenet_v3 min = 9.14 max = 9.40 avg = 9.22
shufflenet min = 6.69 max = 6.89 avg = 6.79
shufflenet_v2 min = 5.16 max = 5.41 avg = 5.25
mnasnet min = 8.62 max = 8.73 avg = 8.69
proxylessnasnet min = 10.16 max = 10.26 avg = 10.22
efficientnet_b0 min = 16.94 max = 17.10 avg = 17.02
regnety_400m min = 16.77 max = 16.99 avg = 16.90
blazeface min = 1.88 max = 2.36 avg = 2.04
googlenet min = 27.83 max = 28.06 avg = 27.95
googlenet_int8 min = 38.19 max = 38.38 avg = 38.29
resnet18 min = 29.89 max = 29.98 avg = 29.92
resnet18_int8 min = 36.57 max = 36.71 avg = 36.62
alexnet min = 30.67 max = 30.91 avg = 30.81
vgg16 min = 159.45 max = 164.00 avg = 162.05
vgg16_int8 min = 249.24 max = 250.14 avg = 249.64
resnet50 min = 64.06 max = 64.82 avg = 64.24
resnet50_int8 min = 77.52 max = 77.85 avg = 77.62
squeezenet_ssd min = 28.52 max = 28.84 avg = 28.64
squeezenet_ssd_int8 min = 36.10 max = 36.31 avg = 36.21
mobilenet_ssd min = 24.05 max = 24.29 avg = 24.19
mobilenet_ssd_int8 min = 39.57 max = 40.00 avg = 39.70
mobilenet_yolo min = 54.10 max = 55.55 avg = 54.86
mobilenetv2_yolov3 min = 30.92 max = 31.09 avg = 30.98
OnePlus7T:/data/local/tmp $ ./benchncnn 8 1 2 -1 1
[0 Adreno (TM) 640] queueC=0[3] queueG=0[3] queueT=0[3]
[0 Adreno (TM) 640] buglssc=0 bugsbn1=0 buglbia=0 bugihfa=1
[0 Adreno (TM) 640] fp16p=1 fp16s=0 fp16a=1 int8s=0 int8a=0
loop_count = 8
num_threads = 1
powersave = 2
gpu_device = -1
cooling_down = 1
squeezenet min = 18.12 max = 18.30 avg = 18.22
squeezenet_int8 min = 27.24 max = 27.37 avg = 27.30
mobilenet min = 29.91 max = 30.11 avg = 29.98
mobilenet_int8 min = 63.81 max = 64.10 avg = 63.96
mobilenet_v2 min = 20.77 max = 20.99 avg = 20.86
mobilenet_v3 min = 18.65 max = 18.78 avg = 18.72
shufflenet min = 11.64 max = 11.77 avg = 11.70
shufflenet_v2 min = 10.08 max = 10.16 avg = 10.12
mnasnet min = 19.25 max = 19.49 avg = 19.36
proxylessnasnet min = 24.15 max = 24.36 avg = 24.27
efficientnet_b0 min = 42.89 max = 43.14 avg = 43.00
regnety_400m min = 26.08 max = 26.23 avg = 26.15
blazeface min = 3.74 max = 3.96 avg = 3.83
googlenet min = 63.38 max = 63.54 avg = 63.45
googlenet_int8 min = 90.35 max = 90.65 avg = 90.48
resnet18 min = 56.61 max = 57.02 avg = 56.75
resnet18_int8 min = 89.95 max = 90.08 avg = 90.02
alexnet min = 70.55 max = 70.69 avg = 70.62
vgg16 min = 306.45 max = 306.91 avg = 306.62
vgg16_int8 min = 526.03 max = 526.50 avg = 526.28
resnet50 min = 145.12 max = 145.78 avg = 145.38
resnet50_int8 min = 195.47 max = 196.43 avg = 195.93
squeezenet_ssd min = 45.31 max = 45.65 avg = 45.52
squeezenet_ssd_int8 min = 71.72 max = 71.96 avg = 71.89
mobilenet_ssd min = 61.36 max = 61.68 avg = 61.45
mobilenet_ssd_int8 min = 99.53 max = 99.81 avg = 99.70
mobilenet_yolo min = 134.94 max = 135.08 avg = 135.02
mobilenetv2_yolov3 min = 71.09 max = 71.24 avg = 71.16
OnePlus7T:/data/local/tmp $ ./benchncnn 8 1 2 0 1
[0 Adreno (TM) 640] queueC=0[3] queueG=0[3] queueT=0[3]
[0 Adreno (TM) 640] buglssc=0 bugsbn1=0 buglbia=0 bugihfa=1
[0 Adreno (TM) 640] fp16p=1 fp16s=0 fp16a=1 int8s=0 int8a=0
loop_count = 8
num_threads = 1
powersave = 2
gpu_device = 0
cooling_down = 1
squeezenet min = 9.27 max = 9.56 avg = 9.43
mobilenet min = 13.04 max = 13.42 avg = 13.23
mobilenet_v2 min = 10.92 max = 11.33 avg = 11.06
mobilenet_v3 min = 12.28 max = 12.78 avg = 12.45
shufflenet min = 8.26 max = 8.47 avg = 8.38
shufflenet_v2 min = 9.03 max = 9.28 avg = 9.14
mnasnet min = 11.40 max = 11.76 avg = 11.60
proxylessnasnet min = 12.40 max = 12.92 avg = 12.55
efficientnet_b0 min = 23.04 max = 23.29 avg = 23.15
regnety_400m min = 15.85 max = 16.38 avg = 16.16
blazeface min = 2.80 max = 3.80 avg = 3.24
googlenet min = 29.84 max = 30.14 avg = 29.97
resnet18 min = 25.12 max = 25.50 avg = 25.31
alexnet min = 30.62 max = 31.66 avg = 31.23
vgg16 min = 159.00 max = 183.80 avg = 170.15
resnet50 min = 59.69 max = 60.17 avg = 59.98
squeezenet_ssd min = 39.39 max = 40.21 avg = 39.97
mobilenet_ssd min = 27.95 max = 28.15 avg = 28.05
mobilenet_yolo min = 53.29 max = 54.21 avg = 53.98
mobilenetv2_yolov3 min = 28.68 max = 28.92 avg = 28.79
```
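
To compare runs such as the ones above, the per-model averages can be pulled out of saved output. The following is a sketch using awk, not an ncnn tool: `extract_avg` and `speedup` are hypothetical helper names, and the field positions assume the `min = ... max = ... avg = ...` line layout shown in the output.

```shell
#!/bin/sh
# extract_avg: read benchncnn output on stdin and print "<model> <avg>"
# for each result line; header lines such as "loop_count = 8" are skipped.
extract_avg() {
    awk '$2 == "min" && $5 == "max" && $8 == "avg" { print $1, $10 }'
}

# speedup: given two saved outputs (baseline first), print
# "<model> <baseline avg / other avg>" for models present in both files.
speedup() {
    awk '$2 == "min" && $5 == "max" && $8 == "avg" {
        if (FNR == NR) base[$1] = $10
        else if ($1 in base) printf "%s %.2f\n", $1, base[$1] / $10
    }' "$1" "$2"
}

# Sample lines copied from the 4-thread CPU run above.
extract_avg <<'EOF'
loop_count = 8
num_threads = 4
squeezenet min = 8.84 max = 8.89 avg = 8.87
mobilenet min = 11.36 max = 11.46 avg = 11.40
EOF
# prints:
# squeezenet 8.87
# mobilenet 11.40
```

With the 1-thread and 4-thread runs saved to two files, `speedup one_thread.txt four_threads.txt` would report roughly 2.05 for squeezenet (18.22 / 8.87). Models missing from the baseline file, or absent from the second file (for example the int8 networks in the GPU run), are simply left out.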
Qualcomm MSM6150 Snapdragon 675 (Kryo 460 2.0 GHz x 2 + Kryo 460 1.7 GHz x 6 + Adreno 612)
```
violet:/data/local/tmp/ncnn $ ./benchncnn 8 2 0
loop_count = 8
num_threads = 2
powersave = 0
gpu_device = -1
squeezenet min = 23.29 max = 24.65 avg = 23.95
squeezenet_int8 min = 23.24 max = 61.55 avg = 31.20
mobilenet min = 31.60 max = 32.10 avg = 31.80
mobilenet_int8 min = 30.35 max = 32.03 avg = 30.95
mobilenet_v2 min = 25.92 max = 26.45 avg = 26.08
```