benchncnn can be used to test neural network inference performance.
Only the network definition files (ncnn param) are required.
The large model binary files (ncnn bin) are not loaded; random weights are generated instead, since only speed is measured.
More model networks may be added later.
---
## Build
```shell
# assume you have already built the ncnn library successfully
# uncomment the following line in <ncnn-root-dir>/CMakeLists.txt with your favorite editor
# add_subdirectory(benchmark)
cd <ncnn-root-dir>/<your-build-dir>
make -j4
# you can find benchncnn binary in <ncnn-root-dir>/<your-build-dir>/benchmark
```
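Alternatively, recent ncnn versions expose a CMake option for this instead of requiring a manual edit; the sketch below assumes the option is named `NCNN_BUILD_BENCHMARK` (check your version's CMakeLists.txt) and shows a typical out-of-source build:

```shell
# configure and build ncnn together with benchncnn in a separate build directory
# NCNN_BUILD_BENCHMARK is assumed from recent ncnn versions; the name may differ in yours
cd <ncnn-root-dir>
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_BUILD_BENCHMARK=ON ..
make -j4
# the binary ends up in <ncnn-root-dir>/build/benchmark/benchncnn
```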
## Usage
```shell
# copy all param files to the current directory
./benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
```
### Run benchncnn on an Android device
```shell
# for running on android device, upload to /data/local/tmp/ folder
adb push benchncnn /data/local/tmp/
adb push <ncnn-root-dir>/benchmark/*.param /data/local/tmp/
adb shell
# executed in android adb shell
cd /data/local/tmp/
./benchncnn [loop count] [num threads] [powersave] [gpu device] [cooling down]
```
## Parameters
|param|options|default|
|---|---|---|
|loop count|1~N|4|
|num threads|1~N|max_cpu_count|
|powersave|0=all cores, 1=little cores only, 2=big cores only|0|
|gpu device|-1=cpu-only, 0=gpu0, 1=gpu1 ...|-1|
|cooling down|0=disable, 1=enable|1|
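Running with no arguments uses the defaults from the table, i.e. the equivalent of `./benchncnn 4 <max_cpu_count> 0 -1 1`. A couple of explicit invocations (the values are chosen only for illustration):

```shell
# 8 timing loops, 4 threads, big cores only, CPU-only inference, cooling down enabled
./benchncnn 8 4 2 -1 1

# same loop/thread settings, but run on the first GPU device instead of the CPU
./benchncnn 8 4 0 0 1
```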
Tips: disable the Android UI server and set the CPU and GPU to their maximum frequency
```shell
# stop the android ui server; it can be restarted later with: adb shell start
adb root
adb shell stop
# executed in android adb shell
# set cpu performance mode
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
# set gpu performance mode (e.g. RK3399)
echo "performance" > /sys/class/misc/mali0/device/devfreq/ff9a0000.gpu/governor
# set gpu performance mode (e.g. Android Adreno)
echo 1 > /sys/class/kgsl/kgsl-3d0/force_clk_on
echo 10000000 > /sys/class/kgsl/kgsl-3d0/idle_timer
echo "performance" > /sys/class/kgsl/kgsl-3d0/devfreq/governor
echo <max freq> > /sys/class/kgsl/kgsl-3d0/gpuclk
```
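The per-core echo commands above can be generalized with a loop. The sketch below wraps it in a function; the function name and the optional sysfs-path argument are our additions (the argument exists so the loop can be exercised outside a rooted device):

```shell
# switch every cpu core's cpufreq governor to "performance"
# the optional first argument overrides the sysfs root (useful for dry runs)
set_perf_governor() {
    sysfs="${1:-/sys/devices/system/cpu}"
    for gov in "$sysfs"/cpu[0-9]*/cpufreq/scaling_governor; do
        # skip cores whose governor file is absent or not writable (needs root)
        [ -w "$gov" ] && echo "performance" > "$gov"
    done
}

# on the device, in a rooted adb shell:
# set_perf_governor
```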
---
## Typical output (executed in Android adb shell)
### NVIDIA Jetson AGX Orin (Cortex-A78AE 2.2 GHz x 12 + Ampere GPU with 64 Tensor Cores)
```
i@orin:~/projects/ncnn/benchmark$ ./benchncnn 64 1 0 -1 0
loop_count = 64
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
squeezenet min = 11.66 max = 11.80 avg = 11.74
squeezenet_int8 min = 12.24 max = 12.39 avg = 12.31
mobilenet min = 19.56 max = 19.73 avg = 19.65
mobilenet_int8 min = 16.06 max = 16.25 avg = 16.14
mobilenet_v2 min = 13.20 max = 13.41 avg = 13.29
mobilenet_v3 min = 11.39 max = 11.57 avg = 11.48
shufflenet min = 8.07 max = 8.18 avg = 8.11
shufflenet_v2 min = 8.41 max = 8.51 avg = 8.45
mnasnet min = 12.74 max = 12.91 avg = 12.79
proxylessnasnet min = 15.18 max = 15.32 avg = 15.25
efficientnet_b0 min = 26.86 max = 26.96 avg = 26.90
efficientnetv2_b0 min = 35.99 max = 36.15 avg = 36.07
regnety_400m min = 16.81 max = 16.98 avg = 16.87
blazeface min = 4.25 max = 4.37 avg = 4.29
googlenet min = 48.73 max = 48.98 avg = 48.87
googlenet_int8 min = 47.39 max = 47.60 avg = 47.49
resnet18 min = 30.93 max = 31.24 avg = 31.08
resnet18_int8 min = 55.44 max = 55.70 avg = 55.56
alexnet min = 44.19 max = 44.43 avg = 44.33
vgg16 min = 173.94 max = 174.97 avg = 174.46
vgg16_int8 min = 475.10 max = 479.37 avg = 477.33
resnet50 min = 89.50 max = 90.11 avg = 89.80
resnet50_int8 min = 106.77 max = 107.14 avg = 106.96
squeezenet_ssd min = 37.78 max = 38.35 avg = 37.93
squeezenet_ssd_int8 min = 50.48 max = 50.88 avg = 50.74
mobilenet_ssd min = 45.62 max = 46.12 avg = 45.74
mobilenet_ssd_int8 min = 37.77 max = 38.00 avg = 37.88
mobilenet_yolo min = 90.23 max = 90.49 avg = 90.35
mobilenetv2_yolov3 min = 47.27 max = 47.48 avg = 47.33
yolov4-tiny min = 60.41 max = 60.75 avg = 60.57
nanodet_m min = 19.26 max = 19.43 avg = 19.35
yolo-fastest-1.1 min = 8.16 max = 8.31 avg = 8.20
yolo-fastestv2 min = 8.26 max = 8.39 avg = 8.32
i@orin:~/projects/ncnn/benchmark$ ./benchncnn 64 2 0 -1 0
loop_count = 64
num_threads = 2
powersave = 0
gpu_device = -1
cooling_down = 0
squeezenet min = 6.83 max = 6.98 avg = 6.90
squeezenet_int8 min = 7.39 max = 7.50 avg = 7.45
mobilenet min = 10.40 max = 10.50 avg = 10.45
mobilenet_int8 min = 8.92 max = 9.09 avg = 8.99
mobilenet_v2 min = 7.67 max = 7.80 avg = 7.74
mobilenet_v3 min = 6.86 max = 7.01 avg = 6.93
shufflenet min = 6.34 max = 6.44 avg = 6.39
shufflenet_v2 min = 5.71 max = 5.83 avg = 5.76
mnasnet min = 7.47 max = 7.58 avg = 7.53
proxylessnasnet min = 8.73 max = 8.83 avg = 8.78
efficientnet_b0 min = 14.93 max = 15.13 avg = 15.03
efficientnetv2_b0 min = 20.17 max = 20.70 avg = 20.29
regnety_400m min = 12.50 max = 12.62 avg = 12.57
blazeface min = 2.95 max = 3.06 avg = 3.00
googlenet min = 26.25 max = 26.53 avg = 26.37
googlenet_int8 min = 26.54 max = 26.79 avg = 26.66
resnet18 min = 16.69 max = 16.90 avg = 16.80
resnet18_int8 min = 29.70 max = 29.93 avg = 29.81
alexnet min = 22.96 max = 23.12 avg = 23.03
vgg16 min = 88.39 max = 89.16 avg = 88.79
vgg16_int8 min = 245.86 max = 247.55 avg = 246.62
resnet50 min = 46.55 max = 46.86 avg = 46.70
resnet50_int8 min = 56.28 max = 56.63 avg = 56.43
squeezenet_ssd min = 23.65 max = 24.29 avg = 23.81
squeezenet_ssd_int8 min = 30.86 max = 31.27 avg = 30.99
mobilenet_ssd min = 25.17 max = 25.31 avg = 25.24
mobilenet_ssd_int8 min = 21.77 max = 21.97 avg = 21.84
mobilenet_yolo min = 48.03 max = 48.33 avg = 48.14
mobilenetv2_yolov3 min = 26.58 max = 26.81 avg = 26.66
yolov4-tiny min = 35.31 max = 35.53 avg = 35.41
nanodet_m min = 12.93 max = 13.08 avg = 13.01
yolo-fastest-1.1 min = 6.00 max = 6.10 avg = 6.04
yolo-fastestv2 min = 6.46 max = 6.61 avg = 6.52
i@orin:~/projects/ncnn/benchmark$ ./benchncnn 64 4 0 -1 0
loop_count = 64
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
squeezenet min = 4.54 max = 4.84 avg = 4.61
squeezenet_int8 min = 4.96 max = 5.41 avg = 5.05
mobilenet min = 5.96 max = 6.23 avg = 6.04
mobilenet_int8 min = 5.21 max = 5.50 avg = 5.30
mobilenet_v2 min = 5.05 max = 5.26 avg = 5.15
mobilenet_v3 min = 4.83 max = 5.14 avg = 4.90
shufflenet min = 5.11 max = 5.34 avg = 5.18
shufflenet_v2 min = 4.13 max = 4.44 avg = 4.18
mnasnet min = 4.93 max = 5.27 avg = 5.01
proxylessnasnet min = 5.64 max = 5.89 avg = 5.72
```