benchncnn can be used to test neural network inference performance.
Only the network definition files (ncnn param) are required.
The large model binary files (ncnn bin) are not loaded; random weights are generated instead for the speed test.
More network models may be added later.
---
Build
```shell
# assumes you have already built the ncnn library successfully
# uncomment the following line in <ncnn-root-dir>/CMakeLists.txt with your favorite editor
# add_subdirectory(benchmark)
cd <ncnn-root-dir>/<your-build-dir>
make -j4
# you can find benchncnn binary in <ncnn-root-dir>/<your-build-dir>/benchmark
```
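If you prefer not to edit CMakeLists.txt by hand, the benchmark can usually be enabled at configure time instead. The `NCNN_BUILD_BENCHMARK` option is an assumption here; verify it exists in your ncnn version's top-level CMakeLists.txt before relying on it.

```shell
# alternative: enable the benchmark at configure time instead of
# uncommenting add_subdirectory(benchmark) by hand
# (NCNN_BUILD_BENCHMARK is assumed; check your checkout's CMakeLists.txt)
cd <ncnn-root-dir>/<your-build-dir>
cmake -DNCNN_BUILD_BENCHMARK=ON ..
make -j4
```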
Usage
```shell
# copy all param files to the current directory
./benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
```
run benchncnn on android device
```shell
# for running on android device, upload to /data/local/tmp/ folder
adb push benchncnn /data/local/tmp/
adb push <ncnn-root-dir>/benchmark/*.param /data/local/tmp/
adb shell
# executed in android adb shell
cd /data/local/tmp/
./benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
```
Parameter
|param|options|default|
|---|---|---|
|loop count|1~N|4|
|num threads|1~N|max_cpu_count|
|powersave|0=all cores, 1=little cores only, 2=big cores only|0|
|gpu device|-1=cpu-only, 0=gpu0, 1=gpu1 ...|-1|
|cooling down|0=disable, 1=enable|1|
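As a concrete example, the five positional arguments can be spelled out with named variables so each one is labeled (the values below are arbitrary illustrations, not recommended settings):

```shell
LOOPS=8        # loop count: 8 iterations per model
THREADS=4      # num threads: 4 worker threads
POWERSAVE=2    # powersave: 2 = big cores only
GPU=-1         # gpu device: -1 = cpu-only
COOLDOWN=1     # cooling down: 1 = enabled
CMD="./benchncnn $LOOPS $THREADS $POWERSAVE $GPU $COOLDOWN"
echo "$CMD"    # prints: ./benchncnn 8 4 2 -1 1
```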
Tip: disable the Android UI server and set the CPU and GPU to maximum frequency
```shell
# stop the android ui server; it can be restarted later via adb shell start
adb root
adb shell stop
# executed in android adb shell
# set cpu performance mode
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
# set gpu performance mode (e.g. RK3399)
echo "performance" > /sys/class/misc/mali0/device/devfreq/ff9a0000.gpu/governor
# set gpu performance mode (e.g. Android Adreno)
echo 1 > /sys/class/kgsl/kgsl-3d0/force_clk_on
echo 10000000 > /sys/class/kgsl/kgsl-3d0/idle_timer
echo "performance" > /sys/class/kgsl/kgsl-3d0/devfreq/governor
echo <max freq> > /sys/class/kgsl/kgsl-3d0/gpuclk
```
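The per-core echo lines above can be generalized with a glob so the same snippet works on devices with any core count. This is a sketch: the function takes the sysfs root as a parameter purely so it can be tried against a fake directory tree without root; on a real device you would pass /sys and run as root.

```shell
set_performance_governor() {
    # $1 = sysfs root (normally /sys)
    root="$1"
    for gov in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        # if cpufreq is absent the glob stays unexpanded, so guard each path
        if [ -e "$gov" ]; then
            echo performance > "$gov"
        fi
    done
}

# on a real device (as root): set_performance_governor /sys
```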
---
Typical output (executed in android adb shell)
### NVIDIA Jetson AGX Orin (Cortex-A78AE 2.2 GHz x 12 + Ampere GPU, 64 Tensor Cores)
```
i@orin:~/projects/ncnn/benchmark$ ./benchncnn 64 1 0 -1 0
loop_count = 64
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
squeezenet min = 11.66 max = 11.80 avg = 11.74
squeezenet_int8 min = 12.24 max = 12.39 avg = 12.31
mobilenet min = 19.56 max = 19.73 avg = 19.65
mobilenet_int8 min = 16.06 max = 16.25 avg = 16.14
mobilenet_v2 min = 13.20 max = 13.41 avg = 13.29
mobilenet_v3 min = 11.39 max = 11.57 avg = 11.48
shufflenet min = 8.07 max = 8.18 avg = 8.11
shufflenet_v2 min = 8.41 max = 8.51 avg = 8.45
mnasnet min = 12.74 max = 12.91 avg = 12.79
proxylessnasnet min = 15.18 max = 15.32 avg = 15.25
efficientnet_b0 min = 26.86 max = 26.96 avg = 26.90
efficientnetv2_b0 min = 35.99 max = 36.15 avg = 36.07
regnety_400m min = 16.81 max = 16.98 avg = 16.87
blazeface min = 4.25 max = 4.37 avg = 4.29
googlenet min = 48.73 max = 48.98 avg = 48.87
googlenet_int8 min = 47.39 max = 47.60 avg = 47.49
resnet18 min = 30.93 max = 31.24 avg = 31.08
resnet18_int8 min = 55.44 max = 55.70 avg = 55.56
alexnet min = 44.19 max = 44.43 avg = 44.33
vgg16 min = 173.94 max = 174.97 avg = 174.46
vgg16_int8 min = 475.10 max = 479.37 avg = 477.33
resnet50 min = 89.50 max = 90.11 avg = 89.80
resnet50_int8 min = 106.77 max = 107.14 avg = 106.96
squeezenet_ssd min = 37.78 max = 38.35 avg = 37.93
squeezenet_ssd_int8 min = 50.48 max = 50.88 avg = 50.74
mobilenet_ssd min = 45.62 max = 46.12 avg = 45.74
mobilenet_ssd_int8 min = 37.77 max = 38.00 avg = 37.88
mobilenet_yolo min = 90.23 max = 90.49 avg = 90.35
mobilenetv2_yolov3 min = 47.27 max = 47.48 avg = 47.33
yolov4-tiny min = 60.41 max = 60.75 avg = 60.57
nanodet_m min = 19.26 max = 19.43 avg = 19.35
yolo-fastest-1.1 min = 8.16 max = 8.31 avg = 8.20
yolo-fastestv2 min = 8.26 max = 8.39 avg = 8.32
i@orin:~/projects/ncnn/benchmark$ ./benchncnn 64 2 0 -1 0
loop_count = 64
num_threads = 2
powersave = 0
gpu_device = -1
cooling_down = 0
squeezenet min = 6.83 max = 6.98 avg = 6.90
squeezenet_int8 min = 7.39 max = 7.50 avg = 7.45
mobilenet min = 10.40 max = 10.50 avg = 10.45
mobilenet_int8 min = 8.92 max = 9.09 avg = 8.99
mobilenet_v2 min = 7.67 max = 7.80 avg = 7.74
mobilenet_v3 min = 6.86 max = 7.01 avg = 6.93
shufflenet min = 6.34 max = 6.44 avg = 6.39
shufflenet_v2 min = 5.71 max = 5.83 avg = 5.76
mnasnet min = 7.47 max = 7.58 avg = 7.53
proxylessnasnet min = 8.73 max = 8.83 avg = 8.78
efficientnet_b0 min = 14.93 max = 15.13 avg = 15.03
efficientnetv2_b0 min = 20.17 max = 20.70 avg = 20.29
regnety_400m min = 12.50 max = 12.62 avg = 12.57
blazeface min = 2.95 max = 3.06 avg = 3.00
googlenet min = 26.25 max = 26.53 avg = 26.37
googlenet_int8 min = 26.54 max = 26.79 avg = 26.66
resnet18 min = 16.69 max = 16.90 avg = 16.80
resnet18_int8 min = 29.70 max = 29.93 avg = 29.81
alexnet min = 22.96 max = 23.12 avg = 23.03
vgg16 min = 88.39 max = 89.16 avg = 88.79
vgg16_int8 min = 245.86 max = 247.55 avg = 246.62
resnet50 min = 46.55 max = 46.86 avg = 46.70
resnet50_int8 min = 56.28 max = 56.63 avg = 56.43
squeezenet_ssd min = 23.65 max = 24.29 avg = 23.81
squeezenet_ssd_int8 min = 30.86 max = 31.27 avg = 30.99
mobilenet_ssd min = 25.17 max = 25.31 avg = 25.24
mobilenet_ssd_int8 min = 21.77 max = 21.97 avg = 21.84
mobilenet_yolo min = 48.03 max = 48.33 avg = 48.14
mobilenetv2_yolov3 min = 26.58 max = 26.81 avg = 26.66
yolov4-tiny min = 35.31 max = 35.53 avg = 35.41
nanodet_m min = 12.93 max = 13.08 avg = 13.01
yolo-fastest-1.1 min = 6.00 max = 6.10 avg = 6.04
yolo-fastestv2 min = 6.46 max = 6.61 avg = 6.52
i@orin:~/projects/ncnn/benchmark$ ./benchncnn 64 4 0 -1 0
loop_count = 64
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
squeezenet min = 4.54 max = 4.84 avg = 4.61
squeezenet_int8 min = 4.96 max = 5.41 avg = 5.05
mobilenet min = 5.96 max = 6.23 avg = 6.04
mobilenet_int8 min = 5.21 max = 5.50 avg = 5.30
mobilenet_v2 min = 5.05 max = 5.26 avg = 5.15
mobilenet_v3 min = 4.83 max = 5.14 avg = 4.90
shufflenet min = 5.11 max = 5.34 avg = 5.18
shufflenet_v2 min = 4.13 max = 4.44 avg = 4.18
mnasnet min = 4.93 max = 5.27 avg = 5.01
proxylessnasnet min = 5.64 max = 5.89 avg = 5.72
```