# YOLOv9 QAT for TensorRT
This repository contains an implementation of YOLOv9 with Quantization-Aware Training (QAT), designed for deployment on platforms that use TensorRT for hardware-accelerated inference. <br>
This implementation aims to provide an efficient, low-latency version of YOLOv9 for real-time detection applications.<br>
If you do not intend to deploy your model using TensorRT, it is recommended not to proceed with this implementation.
- The files in this repository represent a patch that adds QAT functionality to the original [YOLOv9 repository](https://github.com/WongKinYiu/yolov9/).
- This patch is intended to be applied to the main YOLOv9 repository to incorporate the ability to train with QAT.
- The implementation is optimized to work efficiently with TensorRT, an inference library that leverages hardware acceleration to enhance inference performance.
- Users interested in implementing object detection using YOLOv9 with QAT on TensorRT platforms can benefit from this repository as it provides a ready-to-use solution.
We use [TensorRT's PyTorch quantization tool](https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization) to fine-tune YOLOv9 with QAT starting from pre-trained weights, then export the model to ONNX and deploy it with TensorRT. Accuracy and performance results can be found in the tables below.
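For orientation, the sketch below shows the rough shape of this workflow with the pytorch-quantization tool. The loader name and file paths are illustrative assumptions; the repository's `qat.py` wraps these steps with YOLOv9-specific quantization rules.

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Patch torch.nn layers (Conv2d, Linear, ...) with quantized counterparts;
# this must run before the model is constructed so that fake-quant
# (Q/DQ) nodes are inserted automatically.
quant_modules.initialize()

model = load_pretrained_yolov9("yolov9-c.pt")  # hypothetical loader

# ... calibrate on a few batches, then fine-tune for a few epochs ...

# Export fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear pairs,
# which TensorRT consumes to build an INT8 engine.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 640, 640).cuda()
torch.onnx.export(model, dummy, "yolov9-c-qat.onnx", opset_version=13)
```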
For those who are not familiar with QAT, I highly recommend watching this video:<br> [Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training](https://www.youtube.com/watch?v=0VdNflU08yA)
**Important**<br>
Currently, quantization is only available for object detection models. However, since quantization primarily affects the backbone of the YOLOv9 model, and the backbone is shared across all YOLOv9 variants, quantization is effectively prepared for all YOLOv9-based models, whether they are used for detection or segmentation. Quantization support for segmentation models has not yet been released, as it requires developing evaluation criteria and validating quantization of the model's final layers. <br>
We still have plenty of Q/DQ node placements to improve, and we rely on the community's contributions to enhance this project, benefiting us all. Let's collaborate and make it even better!
## Release Highlights
- This release includes an upgrade from TensorRT 8 to TensorRT 10, ensuring compatibility with the CUDA versions supported by the latest NVIDIA Ada Lovelace GPUs.
- Inference has been upgraded to use `enqueueV3` instead of `enqueueV2` (see the Python sketch after this list).<br>
- To maintain legacy support for TensorRT 8, a [dedicated branch](https://github.com/levipereira/yolov9-qat/tree/TensorRT-8) has been created, but it is now **outdated**. <br>
- We've added a new option `val_trt.sh --generate-graph` which enables [Graph Rendering](#generate-tensort-profiling-and-svg-image) functionality. This feature facilitates the creation of graphical representations of the engine plan in SVG image format.
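As a rough illustration of what the `enqueueV3`-style API changes, the sketch below uses the TensorRT Python bindings (`execute_async_v3`), where I/O is resolved by tensor name rather than binding index. The engine path is an assumption; the repository's evaluation script is the authoritative implementation.

```python
import numpy as np
import tensorrt as trt
from cuda import cudart  # NVIDIA cuda-python bindings

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov9-c-qat.engine", "rb") as f:  # assumed engine path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

_, stream = cudart.cudaStreamCreate()
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    dtype = np.dtype(trt.nptype(engine.get_tensor_dtype(name)))
    size = trt.volume(context.get_tensor_shape(name)) * dtype.itemsize
    _, ptr = cudart.cudaMalloc(size)
    # execute_async_v3 resolves I/O by tensor name instead of binding index.
    context.set_tensor_address(name, ptr)

context.execute_async_v3(stream)  # replaces execute_async_v2(bindings, ...)
cudart.cudaStreamSynchronize(stream)
```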
# Performance / Accuracy
[Full Report](#benchmark)
## Accuracy Report
**YOLOv9-C**
### Evaluation Results
#### Activation SiLU
| Eval Model | AP | AP50 | Precision | Recall |
|------------|--------|--------|-----------|--------|
| **Origin (Pytorch)** | 0.529 | 0.699 | 0.743 | 0.634 |
| **INT8 (Pytorch)** | 0.529 | 0.702 | 0.742 | 0.630 |
| **INT8 (TensorRT)** | 0.529 | 0.696 | 0.739 | 0.635 |
#### Activation ReLU
| Eval Model | AP | AP50 | Precision | Recall |
|------------|--------|--------|-----------|--------|
| **Origin (Pytorch)** | 0.519 | 0.690 | 0.719 | 0.629 |
| **INT8 (Pytorch)** | 0.518 | 0.690 | 0.726 | 0.625 |
| **INT8 (TensorRT)** | 0.517 | 0.685 | 0.723 | 0.626 |
### Evaluation Comparison
#### Activation SiLU
| Eval Model | AP | AP50 | Precision | Recall |
|----------------------|------|------|-----------|--------|
| **INT8 (TensorRT)** vs **Origin (Pytorch)** | 0.000 | -0.003 | -0.004 | +0.001 |
#### Activation ReLU
| Eval Model | AP | AP50 | Precision | Recall |
|----------------------|------|------|-----------|--------|
| **INT8 (TensorRT)** vs **Origin (Pytorch)** | -0.002 | -0.005 | +0.004 | -0.003 |
## Latency/Throughput Report - TensorRT
![image](https://github.com/levipereira/yolov9-qat/assets/22964932/61a46206-9784-4c75-bcd4-6534eba51223)
## Device
| **GPU** | |
|---------------------------|------------------------------|
| Device | **NVIDIA GeForce RTX 4090** |
| Compute Capability | 8.9 |
| SMs | 128 |
| Device Global Memory | 24207 MiB |
| Application Compute Clock Rate | 2.58 GHz |
| Application Memory Clock Rate | 10.501 GHz |
### Latency/Throughput
Throughput (qps) counts batches per second; Total Inferences (IPS) = qps × batch size.
| Model Name | Batch Size | Latency (99%) | Throughput (qps) | Total Inferences (IPS) |
|-----------------|------------|----------------|------------------|------------------------|
| **FP16 (SiLU)** | 1 | 1.25 ms | 803 | 803 |
| | 4 | 3.37 ms | 300 | 1200 |
| | 8 | 6.60 ms | 153 | 1224 |
| | 12 | 10.00 ms | 99 | 1188 |
| | | | | |
| **INT8 (SiLU)** | 1 | 0.97 ms | 1030 | 1030 |
| | 4 | 2.06 ms | 486 | 1944 |
| | 8 | 3.69 ms | 271 | 2168 |
| | 12 | 5.36 ms | 189 | 2268 |
| | | | | |
| **INT8 (ReLU)** | 1 | 0.87 ms | 1150 | 1150 |
| | 4 | 1.78 ms | 562 | 2248 |
| | 8 | 3.06 ms | 327 | 2616 |
| | 12 | 4.63 ms | 217 | 2604 |
## Latency/Throughput Comparison (INT8 vs FP16)
| Model Name | Batch Size | Latency (99%) Change | Throughput (qps) Change | Total Inferences (IPS) Change |
|---|---|---|---|---|
| **INT8 (SiLU)** vs **FP16** | 1 | -20.8% | +28.4% | +28.4% |
| | 4 | -37.1% | +62.0% | +62.0% |
| | 8 | -41.1% | +77.0% | +77.0% |
| | 12 | -46.9% | +90.9% | +90.9% |
## QAT Training (Finetune)
In this section, we outline the steps to perform Quantization-Aware Training (QAT) via fine-tuning. <br> **Please note that fine-tuning is the only supported quantization mode.** <br> The model should first be trained with the original implementation's `train.py`; after training and reparameterizing the model, proceed with quantization.
### Steps:
1. **Train the Model Using [Training Session](https://github.com/WongKinYiu/yolov9/tree/main?tab=readme-ov-file#training):**
   - Utilize the original implementation's `train.py` to train your YOLOv9 model with your dataset and desired configurations.
- Follow the training instructions provided in the original YOLOv9 repository to ensure proper training.
2. **Reparameterize the Model [reparameterization.py](https://github.com/sunmooncode/yolov9/blob/main/tools/reparameterization.py):**
   - After completing training, reparameterize the trained model to prepare it for quantization. This step is crucial for ensuring that the model's weights are in a state suitable for quantization.
3. **Quantize the Model:**
   - Run this repository's `qat.py` on the reparameterized weights to insert Q/DQ nodes, calibrate, and fine-tune; a calibration sketch is shown after this list.
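For reference, below is a minimal sketch of the calibration pass that typically precedes QAT fine-tuning, following NVIDIA's pytorch-quantization examples. The data-loader format and batch count are assumptions; the repository's `qat.py` implements the authoritative version.

```python
import torch
from pytorch_quantization import calib
from pytorch_quantization import nn as quant_nn

def calibrate(model, data_loader, num_batches=4):
    """Feed a few batches through the model to collect activation statistics."""
    # Switch every TensorQuantizer from fake-quant mode to calibration mode.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.disable_quant()
            module.enable_calib()
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):  # assumed (image, label) batches
            model(images.cuda())
            if i + 1 >= num_batches:
                break
    # Restore fake-quant mode and load the computed amax (scale) values.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.enable_quant()
            module.disable_calib()
            if isinstance(module._calibrator, calib.HistogramCalibrator):
                module.load_calib_amax("percentile", percentile=99.99)
            elif module._calibrator is not None:
                module.load_calib_amax()
```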