# Deploying the TFT model on Triton Inference Server
This folder contains instructions for deployment to run inference
on Triton Inference Server as well as a detailed performance analysis.
The purpose of this document is to help you with achieving
the best inference performance.
## Table of contents
- [Solution overview](#solution-overview)
- [Introduction](#introduction)
- [Deployment process](#deployment-process)
- [Setup](#setup)
- [Quick Start Guide](#quick-start-guide)
- [Performance](#performance)
- [Offline scenario](#offline-scenario)
- [Offline: NVIDIA A30, NVIDIA TensorRT with FP16, Dataset: electricity](#offline-nvidia-a30-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Offline: NVIDIA A30, NVIDIA TensorRT with FP16, Dataset: traffic](#offline-nvidia-a30-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Offline: NVIDIA A30, PyTorch with FP16, Dataset: electricity](#offline-nvidia-a30-pytorch-with-fp16-dataset-electricity)
- [Offline: NVIDIA A30, PyTorch with FP16, Dataset: traffic](#offline-nvidia-a30-pytorch-with-fp16-dataset-traffic)
- [Offline: NVIDIA DGX-1 (1x V100 32GB), NVIDIA TensorRT with FP16, Dataset: electricity](#offline-nvidia-dgx-1-1x-v100-32gb-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Offline: NVIDIA DGX-1 (1x V100 32GB), NVIDIA TensorRT with FP16, Dataset: traffic](#offline-nvidia-dgx-1-1x-v100-32gb-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Offline: NVIDIA DGX-1 (1x V100 32GB), PyTorch with FP16, Dataset: electricity](#offline-nvidia-dgx-1-1x-v100-32gb-pytorch-with-fp16-dataset-electricity)
- [Offline: NVIDIA DGX-1 (1x V100 32GB), PyTorch with FP16, Dataset: traffic](#offline-nvidia-dgx-1-1x-v100-32gb-pytorch-with-fp16-dataset-traffic)
- [Offline: NVIDIA DGX A100 (1x A100 80GB), NVIDIA TensorRT with FP16, Dataset: electricity](#offline-nvidia-dgx-a100-1x-a100-80gb-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Offline: NVIDIA DGX A100 (1x A100 80GB), NVIDIA TensorRT with FP16, Dataset: traffic](#offline-nvidia-dgx-a100-1x-a100-80gb-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Offline: NVIDIA DGX A100 (1x A100 80GB), PyTorch with FP16, Dataset: electricity](#offline-nvidia-dgx-a100-1x-a100-80gb-pytorch-with-fp16-dataset-electricity)
- [Offline: NVIDIA DGX A100 (1x A100 80GB), PyTorch with FP16, Dataset: traffic](#offline-nvidia-dgx-a100-1x-a100-80gb-pytorch-with-fp16-dataset-traffic)
- [Offline: NVIDIA T4, NVIDIA TensorRT with FP16, Dataset: electricity](#offline-nvidia-t4-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Offline: NVIDIA T4, NVIDIA TensorRT with FP16, Dataset: traffic](#offline-nvidia-t4-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Offline: NVIDIA T4, PyTorch with FP16, Dataset: electricity](#offline-nvidia-t4-pytorch-with-fp16-dataset-electricity)
- [Offline: NVIDIA T4, PyTorch with FP16, Dataset: traffic](#offline-nvidia-t4-pytorch-with-fp16-dataset-traffic)
- [Online scenario](#online-scenario)
- [Online: NVIDIA A30, NVIDIA TensorRT with FP16, Dataset: electricity](#online-nvidia-a30-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Online: NVIDIA A30, NVIDIA TensorRT with FP16, Dataset: traffic](#online-nvidia-a30-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Online: NVIDIA A30, PyTorch with FP16, Dataset: electricity](#online-nvidia-a30-pytorch-with-fp16-dataset-electricity)
- [Online: NVIDIA A30, PyTorch with FP16, Dataset: traffic](#online-nvidia-a30-pytorch-with-fp16-dataset-traffic)
- [Online: NVIDIA DGX-1 (1x V100 32GB), NVIDIA TensorRT with FP16, Dataset: electricity](#online-nvidia-dgx-1-1x-v100-32gb-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Online: NVIDIA DGX-1 (1x V100 32GB), NVIDIA TensorRT with FP16, Dataset: traffic](#online-nvidia-dgx-1-1x-v100-32gb-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Online: NVIDIA DGX-1 (1x V100 32GB), PyTorch with FP16, Dataset: electricity](#online-nvidia-dgx-1-1x-v100-32gb-pytorch-with-fp16-dataset-electricity)
- [Online: NVIDIA DGX-1 (1x V100 32GB), PyTorch with FP16, Dataset: traffic](#online-nvidia-dgx-1-1x-v100-32gb-pytorch-with-fp16-dataset-traffic)
- [Online: NVIDIA DGX A100 (1x A100 80GB), NVIDIA TensorRT with FP16, Dataset: electricity](#online-nvidia-dgx-a100-1x-a100-80gb-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Online: NVIDIA DGX A100 (1x A100 80GB), NVIDIA TensorRT with FP16, Dataset: traffic](#online-nvidia-dgx-a100-1x-a100-80gb-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Online: NVIDIA DGX A100 (1x A100 80GB), PyTorch with FP16, Dataset: electricity](#online-nvidia-dgx-a100-1x-a100-80gb-pytorch-with-fp16-dataset-electricity)
- [Online: NVIDIA DGX A100 (1x A100 80GB), PyTorch with FP16, Dataset: traffic](#online-nvidia-dgx-a100-1x-a100-80gb-pytorch-with-fp16-dataset-traffic)
- [Online: NVIDIA T4, NVIDIA TensorRT with FP16, Dataset: electricity](#online-nvidia-t4-nvidia-tensorrt-with-fp16-dataset-electricity)
- [Online: NVIDIA T4, NVIDIA TensorRT with FP16, Dataset: traffic](#online-nvidia-t4-nvidia-tensorrt-with-fp16-dataset-traffic)
- [Online: NVIDIA T4, PyTorch with FP16, Dataset: electricity](#online-nvidia-t4-pytorch-with-fp16-dataset-electricity)
- [Online: NVIDIA T4, PyTorch with FP16, Dataset: traffic](#online-nvidia-t4-pytorch-with-fp16-dataset-traffic)
- [Advanced](#advanced)
- [Step by step deployment process](#step-by-step-deployment-process)
- [Latency explanation](#latency-explanation)
- [Release notes](#release-notes)
- [Changelog](#changelog)
- [Known issues](#known-issues)
## Solution overview
### Introduction
The [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server)
provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs.
The server provides an inference service via an HTTP or gRPC endpoint,
allowing remote clients to request inferencing for any number of GPU
or CPU models being managed by the server.
This README provides step-by-step deployment instructions for models generated
during training (as described in the [model README](../readme.md)).
Additionally, this README provides the corresponding deployment scripts that
ensure optimal GPU utilization during inferencing on Triton Inference Server.
### Deployment process
The deployment process consists of two steps:
1. Conversion.
The purpose of conversion is to find the best performing model
format supported by Triton Inference Server.
Triton Inference Server uses a number of runtime backends such as
[TensorRT](https://developer.nvidia.com/tensorrt),
[LibTorch](https://github.com/triton-inference-server/pytorch_backend) and
[ONNX Runtime](https://github.com/triton-inference-server/onnxruntime_backend)
to support various model types. Refer to the
[Triton documentation](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
for a list of available backends.
2. Configuration.
Model configuration on Triton Inference Server, which generates
necessary [configuration files](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md).
After deployment Triton inference server is used for evaluation of converted model in two steps:
1. Accuracy tests.
Produce results which are tested against given accuracy thresholds.
2. Performance tests.
Produce latency and throughput results for offline (static batching)
and online (dynamic batching) scenarios.
All steps are executed by provided runner script. Refer to [Quick Start Guide](#quick-start-guide)
## Setup
Ensure you have the following components:
* [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
* [PyTorch NGC container 21.12](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
* [Triton In
没有合适的资源?快使用搜索试试~ 我知道了~
温馨提示
【项目资源】:汇聚了云计算、区块链、网络安全、前端设计、后端架构、UI/UX设计、游戏开发、移动应用开发、虚拟现实(VR)、增强现实(AR)、3D建模与渲染、云计算服务、网络安全工具等各类技术项目的素材和模板。包括AWS、Azure、Docker、Kubernetes、React、Vue、Angular、Node.js、Django、Flask、Unity、Unreal Engine、Blender、Sketch、Figma、Wireshark、Nmap等项目的素材和模板。【项目质量】:所有素材和模板都经过精心筛选和整理,确保满足专业标准。在发布前,我们已经对功能进行了全面测试,确保其稳定性和可用性。【适用人群】:适合对技术充满热情的初学者、希望提升专业技能的中级开发者、以及寻求创新解决方案的高级工程师。无论是个人项目、团队合作、课程设计还是商业应用,都能在这里找到合适的资源。【附加价值】:这些项目资源不仅具有很高的学习价值,而且能够直接应用于实际项目中,提高开发效率。对于有志于深入研究或拓展新领域的人来说,它们提供了丰富的灵感和基础框架,帮助你快速构建出令人惊艳的作品。
资源推荐
资源详情
资源评论
收起资源包目录
人工智能+深度学习+NVIDIA Tensor内核深度学习示例+源码 (2000个子文件)
RepeatPlugin.h 6KB
AddPosEncPlugin.h 5KB
dot_based_interact_ampere.h 2KB
dot_based_interact_volta.h 2KB
input.json 3KB
embedding_sizes.json 344B
bert-large-cased.json 314B
bert-large-uncased.json 314B
bert-base-cased.json 313B
bert-base-uncased.json 313B
README.md 304KB
README.md 152KB
README.md 119KB
tensorflow_inference.md 100KB
README.md 94KB
README.md 73KB
README.md 73KB
README.md 72KB
README.md 71KB
README.md 70KB
README.md 66KB
README.md 63KB
README.md 63KB
README.md 63KB
README.md 60KB
README.md 59KB
README.md 58KB
README.md 58KB
README.md 57KB
README.md 57KB
README.md 57KB
README.md 54KB
README.md 53KB
README.md 52KB
README.md 50KB
README.md 50KB
README.md 48KB
README.md 48KB
README.md 48KB
README.md 48KB
README.md 48KB
README.md 48KB
README.md 48KB
README.md 46KB
README.md 46KB
README.md 46KB
merlin_hps_inference.md 45KB
README.md 44KB
README.md 44KB
README.md 43KB
README.md 43KB
README.md 43KB
README.md 42KB
README.md 42KB
README.md 41KB
README.md 41KB
README.md 40KB
README.md 40KB
README.md 38KB
README.md 38KB
README.md 38KB
README.md 37KB
README.md 37KB
README.md 37KB
README.md 36KB
README.md 36KB
README.md 36KB
README.md 35KB
README.md 35KB
README.md 34KB
README.md 33KB
README.md 33KB
README.md 33KB
README.md 31KB
README.md 31KB
README.md 31KB
README.md 31KB
README.md 30KB
README.md 30KB
README.md 29KB
README.md 28KB
README.md 28KB
README.md 28KB
README.md 27KB
README.md 26KB
README.md 26KB
README.md 23KB
DLRM.md 20KB
README.md 19KB
README.md 17KB
multidataset.md 15KB
README.md 15KB
running_pets.md 13KB
DCNv2.md 13KB
README.md 13KB
README.md 13KB
README.md 12KB
detection_model_zoo.md 12KB
README.md 12KB
LICENSE.md 11KB
共 2000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 20
资源评论
周yyyyyyyyyy
- 粉丝: 157
- 资源: 105
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 《1+X移动互联网应用开发初级》试卷答案3
- 《1+X移动互联网应用开发初级》试卷答案2
- 《1+X移动互联网应用开发初级》试卷答案
- PLC机械手课程设计样本PLC机械手课程设计样本.doc
- 格雷码,外差 基于c++版本相位编码与解码 GrayCoding 类 为相移+格雷码的编码与解码程序 MultiFrequency 类 为三频外差的编码与解码程序 Main为运行代码的主程序,包含
- python 代码实现了一个目标检测应用程序,使用YOLOv8模型对视频中的目标进行检测 它从指定的视频文件中读取帧,使用模型进行检测,并在窗口中显示带有检测结果的帧,直到用户按下q键退出
- 基于语音识别的智能垃圾分类系统源代码(完整前后端+mysql+说明文档+LW).zip
- 基于网易新闻+评论的舆情热点分析平台源代码(完整前后端+mysql+说明文档+LW).zip
- MATLAB实现BiLSTM(双向长短期记忆神经网络)数据异常检测(含完整的程序,GUI设计和代码详解)
- 653152225001783外卖管理系统.apk
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功