# TPU-Kernel Samples
## 0. Introduction
GEMM, TopK, and several CV operators are provided to demonstrate how to program the TPU and call the kernels from the host.
These samples can also be used to test TPU performance on these operators. The directory layout is:
``` shell
├── README.md # the current file
├── CMakeLists.txt # host and device build scripts
├── device # codes on TPU device
├── host # codes on host to call TPU kernel and test
├── include # common definitions both for device and host
└── test # some test scripts
```
## 1. Environment
Follow ../README.md to initialize the cross toolchains and install the libsophon packages.
## 2. Compilation
### 2.1 PCIE Mode
We create a build directory and test the samples there.
``` shell
# switch to pcie mode
use_pcie
mkdir build && cd build
# Release Version Build
cmake ..
# Debug Version Build
# cmake .. -DCMAKE_BUILD_TYPE=Debug
# Compile custom operators in the device directory and link with libbm1684x.a
# Generate libbm1684x_kernel_module.so and host program
make -j
```
### 2.2 CModel Mode
We create a build directory and test the samples there.
``` shell
# switch to cmodel mode
use_cmodel
mkdir build_cmodel && cd build_cmodel
# Debug Version Build
cmake ..
# "load" the generated firmware
set_cmodel_firmware libfirmware_cmodel.so
# Build all the host applications to call tpu kernel
make -j
```
### 2.3 SOC Mode
Note: SOC_SDK is required. It can be set up by following the SOC Mode section of the LIBSOPHON usage manual.
We create a build directory and test the samples there.
``` shell
# switch to soc mode
use_soc
# tell where the soc sdk locates
export SOC_SDK=path_to_soc_sdk
mkdir build_soc && cd build_soc
# Release Version Build
cmake ..
# Debug Version Build
# cmake .. -DCMAKE_BUILD_TYPE=Debug
# Build all the host applications to call tpu kernel
make -j
# Collect all the useful files into 'install' dir
make install
```
Run on the SoC device:
1. Copy the whole 'install' directory to the SoC device.
2. Log in to the SoC device and enter the 'install' directory.
``` shell
cd bin
# run the applications
# single test
./tpu_crop
# batch test
python3 ../batch_test_crop.py
```
### 2.4 loongarch64 PCIe Mode
Note: LIBSOPHON_DIR is required. It must point to the loongarch64 libsophon directory.
We create a build directory and test the samples there.
``` shell
# switch to loongarch64 pcie mode
use_pcie_loongarch64
mkdir build && cd build
# Release Version Build
cmake .. -DLIBSOPHON_DIR=path_to_loongarch64_libsophon
# Debug Version Build
# cmake .. -DCMAKE_BUILD_TYPE=Debug -DLIBSOPHON_DIR=path_to_loongarch64_libsophon
# Build all the host applications to call tpu kernel
make -j
# Collect all the useful files into 'install' dir
make install
```
Run on the loongarch64 PCIe device:
1. Copy the whole 'install' directory to the loongarch64 device.
2. Log in to the loongarch64 device and enter the 'install' directory.
``` shell
cd bin
# run the applications
# single test
./tpu_crop
# batch test
python3 ../batch_test_crop.py
```
## 3. Test
After compilation, we get the following host applications in the current directory:
* tpu_gemm
* tpu_database_topk
* tpu_database_group_topk
* tpu_multi_crop_resize
* tpu_rgb2yuv
* tpu_yuv2rgb_formula
* tpu_yuv2rgb_lookup_table
* tpu_warp_affine
* tpu_warp_affine_bilinear
* tpu_crop
* tpu_rpn
* tpu_hanming_distance
Note: all the tests work on device_id=0
### Batch Test Common Usage
All the batch_test_xxx.py scripts can be controlled by the environment variables `CASE_START` and `CASE_COUNT`. For example, with topk:
``` shell
# run CASE {5, 6}. Note case index starts from 0
CASE_START=5 CASE_COUNT=2 python3 ../test/batch_test_topk.py
# run all cases
python3 ../test/batch_test_topk.py
```
When a batch test finishes without failure, xxx.csv is saved in the current directory. It contains the run parameters and timing information.
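The case selection described above can be sketched in a few lines of Python. This is only an illustration of the `CASE_START`/`CASE_COUNT` semantics, not the code in `test/batch_test.py`; the function name `select_cases` and the fallback to "run everything from the start index" when `CASE_COUNT` is unset are assumptions.

```python
import os


def select_cases(cases, env=None):
    """Return the cases chosen by CASE_START / CASE_COUNT (hypothetical helper)."""
    env = os.environ if env is None else env
    # Case indices start from 0, as noted in the usage example.
    start = int(env.get("CASE_START", 0))
    # Assumption: without CASE_COUNT, run every case from `start` to the end.
    count = int(env.get("CASE_COUNT", max(0, len(cases) - start)))
    return cases[start:start + count]


if __name__ == "__main__":
    all_cases = [f"case{i}" for i in range(10)]
    # Equivalent to: CASE_START=5 CASE_COUNT=2 python3 ...
    print(select_cases(all_cases, {"CASE_START": "5", "CASE_COUNT": "2"}))
```

With ten cases, the call above selects cases 5 and 6, matching the `CASE {5, 6}` comment in the shell example.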
### GEMM
General Matrix Multiplication (GEMM) is a typical operation for the TPU.
`tpu_gemm` shows how to multiply an MxK matrix by a KxN matrix
and provides many cases for testing GEMM performance.
The related files include:
* include/tpu_api_protocol.h
* host/tpu_gemm.cpp
* device/tpu_device_gemm.c
* test/batch_test_gemm.py
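For orientation, the MxK by KxN product that `tpu_gemm` computes can be written as a plain reference implementation. The sketch below is a generic textbook GEMM in pure Python, useful as a mental model of what a result comparison checks; it is not taken from `host/tpu_gemm.cpp`, and the name `gemm_reference` is hypothetical.

```python
def gemm_reference(left, right):
    """Multiply an MxK matrix by a KxN matrix given as nested lists."""
    m, k = len(left), len(left[0])
    if len(right) != k:
        raise ValueError("inner dimensions do not match")
    n = len(right[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Y[i][j] = sum over p of L[i][p] * R[p][j]
            out[i][j] = sum(left[i][p] * right[p][j] for p in range(k))
    return out


if __name__ == "__main__":
    print(gemm_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```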
The following is the usage of `tpu_gemm`.
``` shell
# print the usage
./tpu_gemm -h
# output as follows:
# --L_ROW(-m) xxx : Left matrix row, there is no limit on the maximum if cpu side can malloc enough memory, default 10
# --L_COL(-k) xxx : Left matrix columns, there is no limit on the maximum if cpu side can malloc enough memory, default 10
# --R_COL(-n) xxx : Right matrix columns, there is no limit on the maximum if cpu side can malloc enough memory, default 20
# --idtype(-i) xxx : input data_type, 5:FP32, 3:FP16, 1:INT8, 0:UINT8, 7:INT16, 6:UINT16, 9:INT32, 8:UINT32, 11:BFP16 default: FP32
# --odtype(-o) xxx : output data_type, default: FP32
# --seed(-s) xxx : set test seed
# --compare(-c) xxx : need compare result, default=1
# Note: Regarding the combination of idtype and odtype, the following combinations are supported (expressed as idtype/odtype):
# FP32/FP32, FP16/FP16, FP16/FP32, BFP16/FP32, BFP16/BFP16, INT8/FP32, INT8/INT32,
# INT8/INT16, INT8/INT8, UINT8/FP32, UINT8/UINT32, UINT8/UINT16, UINT8/UINT8,
# UINT16/UINT32, UINT16/INT32, INT16/UINT32, INT16/INT32
```
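When scripting runs of `tpu_gemm`, it can help to validate the idtype/odtype pair before launching the binary. The sketch below copies the type codes and the supported combinations from the help text above. The helper `gemm_dtype_args` is hypothetical, and passing a numeric code to `-o` is an assumption, since the help text only lists the codes for `-i`.

```python
# Numeric codes as printed by `./tpu_gemm -h`.
DTYPE_CODES = {
    "FP32": 5, "FP16": 3, "INT8": 1, "UINT8": 0, "INT16": 7,
    "UINT16": 6, "INT32": 9, "UINT32": 8, "BFP16": 11,
}

# Supported (idtype, odtype) pairs as listed in the help text.
SUPPORTED = {
    ("FP32", "FP32"), ("FP16", "FP16"), ("FP16", "FP32"),
    ("BFP16", "FP32"), ("BFP16", "BFP16"),
    ("INT8", "FP32"), ("INT8", "INT32"), ("INT8", "INT16"), ("INT8", "INT8"),
    ("UINT8", "FP32"), ("UINT8", "UINT32"), ("UINT8", "UINT16"), ("UINT8", "UINT8"),
    ("UINT16", "UINT32"), ("UINT16", "INT32"),
    ("INT16", "UINT32"), ("INT16", "INT32"),
}


def gemm_dtype_args(idtype, odtype):
    """Return command-line arguments for a supported dtype pair (hypothetical helper)."""
    if (idtype, odtype) not in SUPPORTED:
        raise ValueError(f"unsupported combination: {idtype}/{odtype}")
    # Assumption: -o accepts the same numeric codes as -i.
    return ["-i", str(DTYPE_CODES[idtype]), "-o", str(DTYPE_CODES[odtype])]


if __name__ == "__main__":
    print(["./tpu_gemm"] + gemm_dtype_args("INT8", "INT32"))
```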
``` shell
./tpu_gemm
# output as follows:
# L_row=10, L_col=10, R_col=20, L=F32, R=F32, Y=F32, time=13(us) --> success
# batch test without compare
python3 ../test/batch_test_gemm.py 0
# batch test with compare
python3 ../test/batch_test_gemm.py 1
```
### Full Database TopK
The Full Database TopK is a user-defined operation that runs on the TPU.
It selects data from the databases according to their attributes and gets the top k from the selected data.
Related files:
* include/tpu_api_protocol.h
* host/tpu_database_topk.cpp
* device/tpu_device_attr_filter.c
* device/tpu_device_topk.c
* test/batch_test_topk.py
Full Database TopK has four user-settable parameters: total_people_num, db_num, db_sel_num and k.
* total_people_num is the total data volume across all databases;
* db_num is the number of databases the data is distributed over;
* db_sel_num is the number of databases to select from all databases;
* k is the number of top values to get from all the selected data.
Users can customize total_people_num, db_num, db_sel_num and k as in the examples below.
`tpu_database_topk` usage:
``` shell
./tpu_database_topk [total_people_num] [db_num] [db_sel_num] [k]
```
``` shell
# Default settings:
# total number of people is 1000000,
# total number of databases is 64,
# select 64 databases,
# get the top 10 items
./tpu_database_topk
# Custom settings:
# total number of people is 1000000,
# total number of databases is 64,
# select 5 databases,
# get the top 1 item
./tpu_database_topk 1000000 64 5 1
# output as follows:
# TopK total_people_num=1000000, db_num=64, db_sel_num=5, k=1, avg_time=0.8205(ms)
# --> Topk value: [ 99.9954 ]
# --> Topk index: [ 999954 ]
# batch test
python3 ../test/batch_test_topk.py
```
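The semantics of Full Database TopK can be sketched on the host side as a filter followed by a single global top-k. The code below is a minimal model, not the TPU kernel. It assumes each record is a `(db_id, score)` pair, that selection is by database ID, and that "top" means largest score (which the sample output above suggests); the function name is hypothetical.

```python
import heapq


def full_database_topk(records, selected_dbs, k):
    """Return (values, indices) of the k largest scores among the selected databases."""
    selected = set(selected_dbs)
    # Keep (score, original index) for records whose database is selected.
    candidates = [
        (score, idx)
        for idx, (db_id, score) in enumerate(records)
        if db_id in selected
    ]
    top = heapq.nlargest(k, candidates)
    return [score for score, _ in top], [idx for _, idx in top]


if __name__ == "__main__":
    records = [(0, 1.0), (1, 9.0), (2, 5.0), (1, 7.0), (0, 8.0)]
    values, indices = full_database_topk(records, selected_dbs={0, 1}, k=2)
    print(values, indices)
```

In this toy input the record in database 2 is ignored, and the two largest remaining scores are returned together with their positions in the original data.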
### Grouped Database TopK
The Grouped Database TopK is a user-defined operation that runs on the TPU.
It selects data from the databases according to their attributes and gets the top k values in each selected database.
Related files:
* include/tpu_api_protocol.h
* host/tpu_database_group_topk.cpp
* device/tpu_device_attr_filter.c
* device/tpu_device_db_seperate.c
* device/tpu_device_topk.c
* test/batch_test_group_topk.py
Grouped Database TopK has four user-settable parameters: total_people_num, db_num, db_sel_num and k.
* total_people_num is the total data volume across all databases;
* db_num is the number of databases the data is distributed over;
* db_sel_num is the number of databases to select from all databases;
* k is the number of top values to get in each selected database.
The difference from Full Database TopK is:
* Full Database TopK gets k values from all the selected data;
* Grouped Database TopK needs to separate the databases first and then gets the top k values in each selected database.
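The grouped variant can be modeled on the host as "separate by database, then take the top k per database". As with any such sketch, this is not the TPU kernel. It assumes each record is a `(db_id, score)` pair and that "top" means largest score; the function name is hypothetical.

```python
import heapq


def grouped_database_topk(records, selected_dbs, k):
    """Return {db_id: [(score, index), ...]} with the k largest scores per selected database."""
    # Separate step: one bucket per selected database.
    per_db = {db_id: [] for db_id in sorted(set(selected_dbs))}
    for idx, (db_id, score) in enumerate(records):
        if db_id in per_db:
            per_db[db_id].append((score, idx))
    # Top-k step: applied independently inside each bucket.
    return {db_id: heapq.nlargest(k, items) for db_id, items in per_db.items()}


if __name__ == "__main__":
    records = [(0, 1.0), (1, 9.0), (2, 5.0), (1, 7.0), (0, 8.0)]
    print(grouped_database_topk(records, selected_dbs={0, 1}, k=1))
```

For this input, each selected database contributes its own best record, whereas a single global top-k over the same selection would return only the overall best record.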