# YOLOv9 QAT for TensorRT
This repository contains an implementation of YOLOv9 with Quantization-Aware Training (QAT), designed for deployment on platforms that use TensorRT for hardware-accelerated inference. <br>
This implementation aims to provide an efficient, low-latency version of YOLOv9 for real-time detection applications.<br>
If you do not intend to deploy your model using TensorRT, it is recommended not to proceed with this implementation.
- The files in this repository represent a patch that adds QAT functionality to the original [YOLOv9 repository](https://github.com/WongKinYiu/yolov9/).
- This patch is intended to be applied to the main YOLOv9 repository to incorporate the ability to train with QAT.
- The implementation is optimized to work efficiently with TensorRT, an inference library that leverages hardware acceleration to enhance inference performance.
- Users interested in implementing object detection using YOLOv9 with QAT on TensorRT platforms can benefit from this repository as it provides a ready-to-use solution.
We use [TensorRT's PyTorch quantization tool](https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization) to fine-tune YOLOv9 with QAT starting from pre-trained weights, then export the model to ONNX and deploy it with TensorRT. Accuracy and performance results can be found in the tables below.
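For orientation, the sketch below shows the rough shape of this workflow with the pytorch-quantization tool. The loader name and file paths are illustrative assumptions; the repository's `qat.py` wraps these steps with YOLOv9-specific quantization rules.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Patch torch.nn layers (Conv2d, Linear, ...) with quantized counterparts;
# this must run before the model is constructed so that fake-quant
# (Q/DQ) nodes are inserted automatically.
quant_modules.initialize()

model = load_pretrained_yolov9("yolov9-c.pt")  # hypothetical loader

# ... calibrate on a few batches, then fine-tune for a few epochs ...

# Export fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear pairs,
# which TensorRT consumes to build an INT8 engine.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 640, 640).cuda()
torch.onnx.export(model, dummy, "yolov9-c-qat.onnx", opset_version=13)
```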
For those who are not familiar with QAT, I highly recommend watching this video:<br> [Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training](https://www.youtube.com/watch?v=0VdNflU08yA)
**Important**<br>
Currently, quantization is only available for object detection models. However, since quantization primarily affects the backbone of the YOLOv9 model, and the backbone is shared across all YOLOv9 variants, quantization is effectively prepared for all YOLOv9-based models, whether they are used for detection or segmentation. Quantization support for segmentation models has not yet been released, as it requires developing evaluation criteria and validating quantization of the model's final layers. <br>
We still have plenty of Q/DQ node placements to improve, and we rely on the community's contributions to enhance this project, benefiting us all. Let's collaborate and make it even better!
## Release Highlights
- This release includes an upgrade from TensorRT 8 to TensorRT 10, ensuring compatibility with the CUDA versions supported by the latest NVIDIA Ada Lovelace GPUs.
- Inference has been upgraded to use `enqueueV3` instead of `enqueueV2` (see the Python sketch after this list).<br>
- To maintain legacy support for TensorRT 8, a [dedicated branch](https://github.com/levipereira/yolov9-qat/tree/TensorRT-8) has been created, but it is now **outdated**. <br>
- We've added a new option `val_trt.sh --generate-graph` which enables [Graph Rendering](#generate-tensort-profiling-and-svg-image) functionality. This feature facilitates the creation of graphical representations of the engine plan in SVG image format.
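As a rough illustration of what the `enqueueV3`-style API changes, the sketch below uses the TensorRT Python bindings (`execute_async_v3`), where I/O is resolved by tensor name rather than binding index. The engine path is an assumption; the repository's evaluation script is the authoritative implementation.

```python
import numpy as np
import tensorrt as trt
from cuda import cudart  # NVIDIA cuda-python bindings

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov9-c-qat.engine", "rb") as f:  # assumed engine path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

_, stream = cudart.cudaStreamCreate()
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    dtype = np.dtype(trt.nptype(engine.get_tensor_dtype(name)))
    size = trt.volume(context.get_tensor_shape(name)) * dtype.itemsize
    _, ptr = cudart.cudaMalloc(size)
    # execute_async_v3 resolves I/O by tensor name instead of binding index.
    context.set_tensor_address(name, ptr)

context.execute_async_v3(stream)  # replaces execute_async_v2(bindings, ...)
cudart.cudaStreamSynchronize(stream)
```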
# Performance / Accuracy
[Full Report](#benchmark)
## Accuracy Report
**YOLOv9-C**
### Evaluation Results
#### Activation SiLU
| Eval Model | AP | AP50 | Precision | Recall |
|------------|--------|--------|-----------|--------|
| **Origin (Pytorch)** | 0.529 | 0.699 | 0.743 | 0.634 |
| **INT8 (Pytorch)** | 0.529 | 0.702 | 0.742 | 0.630 |
| **INT8 (TensorRT)** | 0.529 | 0.696 | 0.739 | 0.635 |
#### Activation ReLU
| Eval Model | AP | AP50 | Precision | Recall |
|------------|--------|--------|-----------|--------|
| **Origin (Pytorch)** | 0.519 | 0.690 | 0.719 | 0.629 |
| **INT8 (Pytorch)** | 0.518 | 0.690 | 0.726 | 0.625 |
| **INT8 (TensorRT)** | 0.517 | 0.685 | 0.723 | 0.626 |
### Evaluation Comparison
#### Activation SiLU
| Eval Model | AP | AP50 | Precision | Recall |
|----------------------|------|------|-----------|--------|
| **INT8 (TensorRT)** vs **Origin (Pytorch)** | 0.000 | -0.003 | -0.004 | +0.001 |
#### Activation ReLU
| Eval Model | AP | AP50 | Precision | Recall |
|----------------------|------|------|-----------|--------|
| **INT8 (TensorRT)** vs **Origin (Pytorch)** | -0.002 | -0.005 | +0.004 | -0.003 |
## Latency/Throughput Report - TensorRT
![image](https://github.com/levipereira/yolov9-qat/assets/22964932/61a46206-9784-4c75-bcd4-6534eba51223)
## Device
| **GPU** | |
|---------------------------|------------------------------|
| Device | **NVIDIA GeForce RTX 4090** |
| Compute Capability | 8.9 |
| SMs | 128 |
| Device Global Memory | 24207 MiB |
| Application Compute Clock Rate | 2.58 GHz |
| Application Memory Clock Rate | 10.501 GHz |
### Latency/Throughput
Throughput (qps) counts batches per second; Total Inferences (IPS) = qps × batch size.
| Model Name | Batch Size | Latency (99%) | Throughput (qps) | Total Inferences (IPS) |
|-----------------|------------|----------------|------------------|------------------------|
| **FP16 (SiLU)** | 1 | 1.25 ms | 803 | 803 |
| | 4 | 3.37 ms | 300 | 1200 |
| | 8 | 6.60 ms | 153 | 1224 |
| | 12 | 10.00 ms | 99 | 1188 |
| | | | | |
| **INT8 (SiLU)** | 1 | 0.97 ms | 1030 | 1030 |
| | 4 | 2.06 ms | 486 | 1944 |
| | 8 | 3.69 ms | 271 | 2168 |
| | 12 | 5.36 ms | 189 | 2268 |
| | | | | |
| **INT8 (ReLU)** | 1 | 0.87 ms | 1150 | 1150 |
| | 4 | 1.78 ms | 562 | 2248 |
| | 8 | 3.06 ms | 327 | 2616 |
| | 12 | 4.63 ms | 217 | 2604 |
## Latency/Throughput Comparison (INT8 vs FP16)
| Model Name | Batch Size | Latency (99%) Change | Throughput (qps) Change | Total Inferences (IPS) Change |
|---|---|---|---|---|
| **INT8 (SiLU)** vs **FP16** | 1 | -20.8% | +28.4% | +28.4% |
| | 4 | -37.1% | +62.0% | +62.0% |
| | 8 | -41.1% | +77.0% | +77.0% |
| | 12 | -46.9% | +90.9% | +90.9% |
## QAT Training (Finetune)
In this section, we outline the steps to perform Quantization-Aware Training (QAT) via fine-tuning. <br> **Please note that fine-tuning is the only supported quantization mode.** <br> The model should first be trained with the original implementation's `train.py`; after training and reparameterizing the model, proceed with quantization.
### Steps:
1. **Train the Model Using [Training Session](https://github.com/WongKinYiu/yolov9/tree/main?tab=readme-ov-file#training):**
   - Utilize the original implementation's `train.py` to train your YOLOv9 model with your dataset and desired configurations.
- Follow the training instructions provided in the original YOLOv9 repository to ensure proper training.
2. **Reparameterize the Model [reparameterization.py](https://github.com/sunmooncode/yolov9/blob/main/tools/reparameterization.py):**
   - After completing training, reparameterize the trained model to prepare it for quantization. This step is crucial for ensuring that the model's weights are in a state suitable for quantization.
3. **Quantize the Model:**
   - Run this repository's `qat.py` on the reparameterized weights to insert Q/DQ nodes, calibrate, and fine-tune; a calibration sketch is shown after this list.
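For reference, below is a minimal sketch of the calibration pass that typically precedes QAT fine-tuning, following NVIDIA's pytorch-quantization examples. The data-loader format and batch count are assumptions; the repository's `qat.py` implements the authoritative version.

```python
import torch
from pytorch_quantization import calib
from pytorch_quantization import nn as quant_nn

def calibrate(model, data_loader, num_batches=4):
    """Feed a few batches through the model to collect activation statistics."""
    # Switch every TensorQuantizer from fake-quant mode to calibration mode.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.disable_quant()
            module.enable_calib()
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):  # assumed (image, label) batches
            model(images.cuda())
            if i + 1 >= num_batches:
                break
    # Restore fake-quant mode and load the computed amax (scale) values.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.enable_quant()
            module.disable_calib()
            if isinstance(module._calibrator, calib.HistogramCalibrator):
                module.load_calib_amax("percentile", percentile=99.99)
            elif module._calibrator is not None:
                module.load_calib_amax()
```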