<!--
# Copyright 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->
[![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
# Python Backend
The Triton backend for Python. The goal of the Python backend is to let you
serve models written in Python with Triton Inference Server without having to
write any C++ code.
## User Documentation
- [Python Backend](#python-backend)
- [User Documentation](#user-documentation)
- [Quick Start](#quick-start)
- [Building from Source](#building-from-source)
- [Usage](#usage)
- [`auto_complete_config`](#auto_complete_config)
- [`initialize`](#initialize)
- [`execute`](#execute)
- [Default Mode](#default-mode)
- [Decoupled mode](#decoupled-mode)
- [Use Cases](#use-cases)
- [Known Issues](#known-issues)
- [`finalize`](#finalize)
- [Model Config File](#model-config-file)
- [Inference Request Parameters](#inference-request-parameters)
- [Managing Python Runtime and Libraries](#managing-python-runtime-and-libraries)
- [Building Custom Python Backend Stub](#building-custom-python-backend-stub)
- [Creating Custom Execution Environments](#creating-custom-execution-environments)
- [Important Notes](#important-notes)
- [Error Handling](#error-handling)
- [Managing Shared Memory](#managing-shared-memory)
- [Multiple Model Instance Support](#multiple-model-instance-support)
- [Running Multiple Instances of Triton Server](#running-multiple-instances-of-triton-server)
- [Business Logic Scripting](#business-logic-scripting)
- [Using BLS with Stateful Models](#using-bls-with-stateful-models)
- [Limitation](#limitation)
- [Interoperability and GPU Support](#interoperability-and-gpu-support)
- [`pb_utils.Tensor.to_dlpack() -> PyCapsule`](#pb_utilstensorto_dlpack---pycapsule)
- [`pb_utils.Tensor.from_dlpack() -> Tensor`](#pb_utilstensorfrom_dlpack---tensor)
- [`pb_utils.Tensor.is_cpu() -> bool`](#pb_utilstensoris_cpu---bool)
- [Input Tensor Device Placement](#input-tensor-device-placement)
- [Frameworks](#frameworks)
- [PyTorch](#pytorch)
- [TensorFlow](#tensorflow)
- [Examples](#examples)
- [AddSub in NumPy](#addsub-in-numpy)
- [AddSubNet in PyTorch](#addsubnet-in-pytorch)
- [AddSub in JAX](#addsub-in-jax)
- [Business Logic Scripting](#business-logic-scripting-1)
- [Preprocessing](#preprocessing)
- [Decoupled Models](#decoupled-models)
- [Running with Inferentia](#running-with-inferentia)
- [Logging](#logging)
- [Reporting problems, asking questions](#reporting-problems-asking-questions)
## Quick Start
1. Run the Triton Inference Server container.
```
docker run --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
```
Replace \<xx.yy\> with the Triton version (e.g. 21.05).
2. Inside the container, clone the Python backend repository.
```
git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>
```
3. Install the example model.
```
cd python_backend
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
```
4. Start the Triton server.
```
tritonserver --model-repository `pwd`/models
```
5. On the host machine, start the client container.
```
docker run -ti --net host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk /bin/bash
```
6. In the client container, clone the Python backend repository.
```
git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>
```
7. Run the example client.
```
python3 python_backend/examples/add_sub/client.py
```
## Building from Source
1. Requirements
* cmake >= 3.17
* numpy
* rapidjson-dev
* libarchive-dev
* zlib1g-dev
```
pip3 install numpy
```
On Ubuntu or Debian you can use the command below to install `rapidjson`, `libarchive`, and `zlib`:
```
sudo apt-get install rapidjson-dev libarchive-dev zlib1g-dev
```
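If the cmake shipped with your distribution is older than 3.17, one possible
workaround (an option for your environment, not a step the build itself
requires) is to install a newer CMake from PyPI:
```
pip3 install "cmake>=3.17"
```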
2. Build Python backend. Replace \<GIT\_BRANCH\_NAME\> with the GitHub branch
that you want to compile. For release branches it should be r\<xx.yy\> (e.g.
r21.06).
```
mkdir build
cd build
cmake -DTRITON_ENABLE_GPU=ON -DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME> -DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME> -DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME> -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
make install
```
The following required Triton repositories will be pulled and used in
the build. If the CMake variables below are not specified, the "main" branch
of those repositories will be used. \<GIT\_BRANCH\_NAME\> should be the same
as the Python backend repository branch that you are trying to compile.
* triton-inference-server/backend: `-DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME>`
* triton-inference-server/common: `-DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME>`
* triton-inference-server/core: `-DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME>`
Set `-DCMAKE_INSTALL_PREFIX` to the location where the Triton Server is
installed. In the released containers, this location is `/opt/tritonserver`.
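For example, when compiling the r23.04 release branch inside the matching
release container (an illustrative choice of branch), the configure step
would look like:
```
cmake -DTRITON_ENABLE_GPU=ON -DTRITON_BACKEND_REPO_TAG=r23.04 \
      -DTRITON_COMMON_REPO_TAG=r23.04 -DTRITON_CORE_REPO_TAG=r23.04 \
      -DCMAKE_INSTALL_PREFIX:PATH=/opt/tritonserver ..
```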
3. Copy the example model and configuration
```
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
```
4. Start the Triton Server
```
/opt/tritonserver/bin/tritonserver --model-repository=`pwd`/models
```
5. Use the client app to perform inference
```
python3 examples/add_sub/client.py
```
## Usage
In order to use the Python backend, you need to create a Python file with a
structure similar to the one below:
```python
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        """`auto_complete_config` is called only once when loading the model,
        assuming the server was not started with
        `--disable-auto-complete-config`. Implementing this function is
        optional; if it is not implemented, nothing happens. This function can
        be used to set the `max_batch_size`, `input`, and `output` properties
        of the model using `set_max_batch_size`, `add_input`, and
        `add_output`. These properties allow Triton to load the model with a
        minimal model configuration in the absence of a configuration file.
        """
        return auto_complete_model_config

    def execute(self, requests):
        """`execute` must be implemented in every Python model. It receives a
        list of `pb_utils.InferenceRequest` and must return a list of
        `pb_utils.InferenceResponse` of the same length.
        """
        responses = []
        for request in requests:
            # Build one (possibly empty) InferenceResponse per request.
            responses.append(pb_utils.InferenceResponse(output_tensors=[]))
        return responses
```
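As a concrete illustration, the pass-through `auto_complete_config` in the
class above could be fleshed out as follows. This is a sketch: the tensor
names, types, and shapes follow the add_sub example, and the batching choice
(`max_batch_size = 0`, i.e. no batching) is an assumption.
```python
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Declare the model as non-batching (an assumption for this sketch).
        auto_complete_model_config.set_max_batch_size(0)

        # Register the inputs and outputs used by the add_sub example so
        # Triton can load the model without a full config.pbtxt.
        for name in ("INPUT0", "INPUT1"):
            auto_complete_model_config.add_input(
                {"name": name, "data_type": "TYPE_FP32", "dims": [4]})
        for name in ("OUTPUT0", "OUTPUT1"):
            auto_complete_model_config.add_output(
                {"name": name, "data_type": "TYPE_FP32", "dims": [4]})

        return auto_complete_model_config
```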