TensorFlow/TensorRT Models on Jetson
====================================
<p align="center">
<img src="data/landing_graphic.jpg" alt="landing graphic" height="300px"/>
</p>
This repository contains scripts and documentation to use TensorFlow image classification and object detection models on NVIDIA Jetson. The models are sourced from the [TensorFlow models repository](https://github.com/tensorflow/models)
and optimized using TensorRT.
* [Setup](#setup)
* [Image Classification](#ic)
* [Models](#ic_models)
* [Download pretrained model](#ic_download)
* [Build TensorRT / Jetson compatible graph](#ic_build)
* [Optimize with TensorRT](#ic_trt)
* [Jupyter Notebook Sample](#ic_notebook)
* [Train for custom task](#ic_train)
* [Object Detection](#od)
* [Models](#od_models)
* [Download pretrained model](#od_download)
* [Build TensorRT / Jetson compatible graph](#od_build)
* [Optimize with TensorRT](#od_trt)
* [Jupyter Notebook Sample](#od_notebook)
* [Train for custom task](#od_train)
<a name="setup"></a>
Setup
-----
1. Flash your Jetson TX2 with JetPack 3.2 (including TensorRT).
2. Install miscellaneous dependencies on the Jetson:
```
sudo apt-get install python-pip python-matplotlib python-pil
```
3. Install TensorFlow 1.7+ (with TensorRT support). Download the [pre-built pip wheel](https://devtalk.nvidia.com/default/topic/1031300/jetson-tx2/tensorflow-1-8-wheel-with-jetpack-3-2-/) and install it using pip:
```
pip install tensorflow-1.8.0-cp27-cp27mu-linux_aarch64.whl --user
```
or, if you're using Python 3:
```
pip3 install tensorflow-1.8.0-cp35-cp35m-linux_aarch64.whl --user
```
4. Clone this repository:
```
git clone --recursive https://github.com/NVIDIA-Jetson/tf_trt_models.git
cd tf_trt_models
```
5. Run the installation script:
```
./install.sh
```
or, to specify the Python interpreter explicitly:
```
./install.sh python3
```
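To sanity-check the setup, you can verify that TensorFlow imports and that its TensorRT integration module is available. This is a minimal environment check, not part of the install script:
```python
# Verify that TensorFlow and the TF-TRT integration are importable.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # raises ImportError if TensorRT support is missing

print(tf.__version__)  # expect 1.7 or newer
```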
<a name="ic"></a>
Image Classification
--------------------
<img src="data/classification_graphic.jpg" alt="classification" height="300px"/>
<a name="ic_models"></a>
### Models
| Model | Input Size | TF-TRT TX2 | TF TX2 |
|:------|:----------:|-----------:|-------:|
| inception_v1 | 224x224 | 7.36ms | 22.9ms |
| inception_v2 | 224x224 | 9.08ms | 31.8ms |
| inception_v3 | 299x299 | 20.7ms | 74.3ms |
| inception_v4 | 299x299 | 38.5ms | 129ms |
| inception_resnet_v2 | 299x299 | | 158ms |
| resnet_v1_50 | 224x224 | 12.5ms | 55.1ms |
| resnet_v1_101 | 224x224 | 20.6ms | 91.0ms |
| resnet_v1_152 | 224x224 | 28.9ms | 124ms |
| resnet_v2_50 | 299x299 | 26.5ms | 73.4ms |
| resnet_v2_101 | 299x299 | 46.9ms | |
| resnet_v2_152 | 299x299 | 69.0ms | |
| mobilenet_v1_0p25_128 | 128x128 | 3.72ms | 7.99ms |
| mobilenet_v1_0p5_160 | 160x160 | 4.47ms | 8.69ms |
| mobilenet_v1_1p0_224 | 224x224 | 11.1ms | 17.3ms |
**TF** - Original TensorFlow graph (FP32)

**TF-TRT** - TensorRT optimized graph (FP16)
The above benchmark timings were gathered after placing the Jetson TX2 in MAX-N
mode. To do this, run the following commands in a terminal:
```
sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh
```
<a name="ic_download"></a>
### Download pretrained model
As a convenience, we provide a script to download pretrained models sourced from the
TensorFlow models repository.
```python
from tf_trt_models.classification import download_classification_checkpoint
checkpoint_path = download_classification_checkpoint('inception_v2')
```
To manually download the pretrained models, follow the links [here](https://github.com/tensorflow/models/tree/master/research/slim#Pretrained).
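The returned `checkpoint_path` points to the downloaded checkpoint file on disk; it can be passed directly to `build_classification_graph` in the next step.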
<a name="ic_build"></a>
### Build TensorRT / Jetson compatible graph
```python
from tf_trt_models.classification import build_classification_graph

frozen_graph, input_names, output_names = build_classification_graph(
    model='inception_v2',
    checkpoint=checkpoint_path,
    num_classes=1001  # slim ImageNet checkpoints include a background class
)
```
<a name="ic_trt"></a>
### Optimize with TensorRT
```python
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,  # 32 MB of TensorRT workspace
    precision_mode='FP16',
    minimum_segment_size=50  # only convert subgraphs with at least 50 nodes
)
```
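The returned `trt_graph` is an ordinary TensorFlow `GraphDef`, so it can be serialized to disk and imported into a session like any frozen graph. The sketch below is illustrative rather than part of the library API: the output file name and the dummy input are assumptions, while `input_names` and `output_names` come from the build step above.
```python
import numpy as np
import tensorflow as tf

# Serialize the optimized graph so it can be reloaded later without re-optimizing.
with open('inception_v2_trt.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())

# Import the optimized graph and run a dummy 224x224 RGB image through it.
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(trt_graph, name='')

with tf.Session(graph=graph) as sess:
    image = np.random.random_sample((1, 224, 224, 3)).astype(np.float32)
    scores = sess.run(
        graph.get_tensor_by_name(output_names[0] + ':0'),
        feed_dict={graph.get_tensor_by_name(input_names[0] + ':0'): image}
    )
    print(scores.shape)  # e.g. (1, 1001) for the inception_v2 example above
```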
<a name="ic_notebook"></a>
### Jupyter Notebook Sample
For a comprehensive example of performing the above steps and running inference on a real
image, see the [Jupyter notebook sample](examples/classification/classification.ipynb).
<a name="ic_train"></a>
### Train for custom task
Follow the documentation from the [TensorFlow models repository](https://github.com/tensorflow/models/tree/master/research/slim).
Once you have obtained a checkpoint, proceed with building the graph and optimizing
with TensorRT as shown above.
<a name="od"></a>
Object Detection
----------------
<img src="data/detection_graphic.jpg" alt="detection" height="300px"/>
<a name="od_models"></a>
### Models
| Model | Input Size | TF-TRT TX2 | TF TX2 |
|:------|:----------:|-----------:|-------:|
| ssd_mobilenet_v1_coco | 300x300 | 50.5ms | 72.9ms |
| ssd_inception_v2_coco | 300x300 | 54.4ms | 132ms |
**TF** - Original TensorFlow graph (FP32)

**TF-TRT** - TensorRT optimized graph (FP16)
The above benchmark timings were gathered after placing the Jetson TX2 in MAX-N
mode. To do this, run the following commands in a terminal:
```
sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh
```
<a name="od_download"></a>
### Download pretrained model
As a convenience, we provide a script to download pretrained model weights and config files sourced from the
TensorFlow models repository.
```python
from tf_trt_models.detection import download_detection_model
config_path, checkpoint_path = download_detection_model('ssd_inception_v2_coco')
```
To manually download the pretrained models, follow the links [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md).
> **Important:** Some of the object detection configuration files set a very low non-maximum suppression score threshold (e.g., 1e-8).
> This can cause an unnecessarily large CPU post-processing load. Depending on your application, it may be advisable to raise
> this value to something larger (such as 0.3) for better performance; we did this for the benchmark timings above. To do so, modify the configuration
> file directly before calling `build_detection_graph`, as sketched below. The parameter can be found, for example, on this [line](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config#L130).
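As a concrete illustration of the note above, the threshold can be raised by rewriting the config file on disk before building the graph. This is a minimal sketch, assuming the config uses the standard `score_threshold` field of the TensorFlow object detection pipeline format; the value 0.3 is just an example:
```python
import re

# Raise the NMS score threshold in the pipeline config (e.g., from 1e-8 to 0.3).
with open(config_path) as f:
    config_text = f.read()

config_text = re.sub(r'score_threshold:\s*\S+', 'score_threshold: 0.3', config_text)

with open(config_path, 'w') as f:
    f.write(config_text)
```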
<a name="od_build"></a>
### Build TensorRT / Jetson compatible graph
```python
from tf_trt_models.detection import build_detection_graph

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path
)
```
<a name="od_trt"></a>
### Optimize with TensorRT
```python
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,  # 32 MB of TensorRT workspace
    precision_mode='FP16',
    minimum_segment_size=50  # only convert subgraphs with at least 50 nodes
)
```
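As with classification, the optimized detection graph can be imported and executed directly. This minimal sketch assumes a 300x300 SSD model as above and a uint8 input, which is the usual convention for the TensorFlow object detection API; check your graph's input placeholder dtype before feeding real images.
```python
import numpy as np
import tensorflow as tf

# Import the optimized detection graph into a fresh session.
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(trt_graph, name='')

with tf.Session(graph=graph) as sess:
    # Dummy 300x300 RGB image; real inputs should be resized to the model's input size.
    image = np.random.randint(0, 256, size=(1, 300, 300, 3), dtype=np.uint8)
    outputs = sess.run(
        [graph.get_tensor_by_name(name + ':0') for name in output_names],
        feed_dict={graph.get_tensor_by_name(input_names[0] + ':0'): image}
    )
```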
<a name="od_notebook"></a>
### Jupyter Notebook Sample
For a comprehensive example of performing the above steps and running inference on a real
image, see the [Jupyter notebook sample](examples/detection/detection.ipynb).
<a name="od_train"></a>
### Train for custom task
Follow the documentation from the [TensorFlow models repository](https://github.com/tensorflow/models/tree/master/research/object_detection).
Once you have obtained a checkpoint, proceed with building the graph and optimizing
with TensorRT as shown above. Please note that not all models have been tested, so you
should use an object detection config file during training that resembles the
ssd_mobilenet_v1_coco or ssd_inception_v2_coco models. Some config parameters may be
modified, such as the number of classes or the input image size, but performance may vary.