# Performing Inference In INT8 Using Custom Calibration
**Table Of Contents**
- [Description](#description)
- [How does this sample work?](#how-does-this-sample-work)
* [Defining the network](#defining-the-network)
* [Setup the calibrator](#setup-the-calibrator)
* [Calibration data](#calibration-data)
* [Calibrator interface](#calibrator-interface)
* [Calibration file](#calibration-file)
* [Configuring the builder](#configuring-the-builder)
* [Building the engine](#building-the-engine)
* [Running the engine](#running-the-engine)
* [Verifying the output](#verifying-the-output)
* [TensorRT API layers and ops](#tensorrt-api-layers-and-ops)
- [Batch files for calibration](#batch-files-for-calibration)
* [Generating batch files for Caffe users](#generating-batch-files-for-caffe-users)
* [Generating batch files for non-Caffe users](#generating-batch-files-for-non-caffe-users)
- [Running the sample](#running-the-sample)
* [Sample `--help` options](#sample---help-options)
- [Additional resources](#additional-resources)
- [License](#license)
- [Changelog](#changelog)
- [Known issues](#known-issues)
## Description
This sample, sampleINT8, performs INT8 calibration and inference.
Specifically, this sample demonstrates how to perform inference in 8-bit integer (INT8). INT8 inference is available only on GPUs with compute capability 6.1 or 7.x. After the network is calibrated for execution in INT8, output of the calibration is cached to avoid repeating the process. You can then reproduce your own experiments with any deep learning framework in order to validate your results on ImageNet networks.
## How does this sample work?
INT8 engines are built from 32-bit network definitions, similarly to 32-bit and 16-bit engines, but with more configuration steps. In particular, the builder and network must be configured to use INT8, which requires per-tensor dynamic ranges. The INT8 calibrator determines how best to represent weights and activations as 8-bit integers and sets the per-tensor dynamic ranges accordingly. Alternatively, you can set custom per-tensor dynamic ranges; this is covered in sampleINT8API.
This sample is accompanied by the [MNIST training set](https://github.com/BVLC/caffe/blob/master/data/mnist/get_mnist.sh) located in the TensorRT-5.1.0.4/data/mnist/batches directory. The packaged MNIST model that is shipped with this sample is based on [lenet.prototxt](https://github.com/BVLC/caffe/edit/master/examples/mnist/lenet.prototxt). For more information, see the [MNIST BVLC Caffe example](https://github.com/BVLC/caffe/tree/master/examples/mnist). This sample can also be used with other image classification models, for example, [deploy.prototxt](https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/deploy.prototxt).
The packaged data set file that is shipped with this sample is based on the [MNIST data set](https://github.com/BVLC/caffe/tree/master/data/mnist). Generating batch files from this data set is described in [Batch files for calibration](#batch-files-for-calibration).
Specifically, this sample performs the following steps:
- [Defines the network](#defining-the-network)
- [Sets up the calibrator](#setup-the-calibrator)
- [Configures the builder](#configuring-the-builder)
- [Builds the engine](#building-the-engine)
- [Runs the engine](#running-the-engine)
- [Verifies the output](#verifying-the-output)
### Defining the network
Defining a network for INT8 execution is exactly the same as for any other precision. Weights should be imported as FP32 values, and the builder will calibrate the network to find appropriate quantization factors to reduce the network to INT8 precision. This sample imports the network using the NvCaffeParser:
```
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kFLOAT);
```
### Setup the calibrator
Calibration is an additional step required when building networks for INT8. The application must provide TensorRT with sample input, in other words, calibration data. TensorRT will then perform inference in FP32 and gather statistics about intermediate activation layers that it will use to build the reduced precision INT8 engine.
#### Calibration data
Calibration must be performed using images representative of those which will be used at runtime. Since the sample is based around Caffe, any image preprocessing that caffe would perform prior to running the network (such as scaling, cropping, or mean subtraction) will be done in Caffe and captured as a set of files. The sample uses a utility class (BatchStream) to read these files and create appropriate input for calibration. Generation of these files is discussed in [Batch files for calibration](#batch-files-for-calibration).
You can create a calibration data stream (`calibrationStream`), for example:
`BatchStream calibrationStream(CAL_BATCH_SIZE, NB_CAL_BATCHES);`
The BatchStream class provides helper methods used to retrieve batch data. The calibrator uses a batch stream object to retrieve batch data while calibrating. In general, the BatchStream class should implement `getBatch()` and `getBatchSize()`, which can be invoked by `IInt8Calibrator::getBatch()` and `IInt8Calibrator::getBatchSize()`. You can also write your own custom BatchStream class to serve calibration data. For more information, see `BatchStream.h`.
**Note:** The calibration data must be representative of the input provided to TensorRT at runtime; for example, for image classification networks, it should not consist of images from just a small subset of categories. For ImageNet networks, around 500 calibration images is adequate.
#### Calibrator interface
The application must implement the `IInt8Calibrator` interface to provide calibration data and helper methods for reading/writing the calibration table file.
We can create a calibrator object (`calibrator`), for example:
`std::unique_ptr<IInt8Calibrator> calibrator;`
TensorRT provides three implementations of `IInt8Calibrator`:
1. `IInt8EntropyCalibrator`
2. `IInt8EntropyCalibrator2`
3. `IInt8LegacyCalibrator`
See `NvInfer.h` for more information on the `IInt8Calibrator` interface variants.
This sample uses `IInt8EntropyCalibrator2` by default. We can set the calibrator interface to use `IInt8EntropyCalibrator2` as shown:
```
calibrator.reset(new Int8EntropyCalibrator2(calibrationStream, FIRST_CAL_BATCH, gNetworkName, INPUT_BLOB_NAME));
```
where `calibrationStream` is a BatchStream object. The calibrator object should be configured to use the calibration batch stream.
In order to perform calibration, the interface must implement `getBatchSize()` and `getBatch()` to retrieve data from the BatchStream object.
The builder calls the `getBatchSize()` method once, at the start of calibration, to obtain the batch size for the calibration set. The method `getBatch()` is then called repeatedly to obtain batches from the application, until the method returns false. Every calibration batch must include exactly the number of images specified as the batch size.
```
bool getBatch(void* bindings[], const char* names[], int nbBindings) override
{
    if (!mStream.next())
        return false;
    CHECK(cudaMemcpy(mDeviceInput, mStream.getBatch(), mInputCount * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[0], INPUT_BLOB_NAME));
    bindings[0] = mDeviceInput;
    return true;
}
```
For each input tensor, a pointer to input data in GPU memory must be written into the bindings array. The names array contains the names of the input tensors. The position for each tensor in the bindings array matches the position of its name in the names array. Both arrays have size `nbBindings`.
Since the calibration step is time consuming, you can choose to provide an implementation for `writeCalibrationCache()` that writes the calibration table to an appropriate location so it can be reused on later runs.
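The cache itself is opaque data that TensorRT hands to the application as a byte buffer. A sketch of the file-backed pattern such an implementation typically uses (the standalone function names here are illustrative, not part of the TensorRT API):

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Persist the calibration table bytes TensorRT passes to
// writeCalibrationCache().
void writeCache(const std::string& path, const void* data, size_t length)
{
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(data), length);
}

// Load a previously written table; readCalibrationCache() would return
// this buffer's data and size so the builder can skip calibration.
std::vector<char> readCache(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

If `readCalibrationCache()` returns a valid buffer, the builder skips the calibration pass entirely and builds the INT8 engine from the cached dynamic ranges.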