# Performing Inference In INT8 Using Custom Calibration
**Table Of Contents**
- [Description](#description)
- [How does this sample work?](#how-does-this-sample-work)
* [Defining the network](#defining-the-network)
* [Setup the calibrator](#setup-the-calibrator)
* [Calibration data](#calibration-data)
* [Calibrator interface](#calibrator-interface)
* [Calibration file](#calibration-file)
* [Configuring the builder](#configuring-the-builder)
* [Building the engine](#building-the-engine)
* [Running the engine](#running-the-engine)
* [Verifying the output](#verifying-the-output)
* [TensorRT API layers and ops](#tensorrt-api-layers-and-ops)
- [Batch files for calibration](#batch-files-for-calibration)
* [Generating batch files for Caffe users](#generating-batch-files-for-caffe-users)
* [Generating batch files for non-Caffe users](#generating-batch-files-for-non-caffe-users)
- [Running the sample](#running-the-sample)
* [Sample `--help` options](#sample---help-options)
- [Additional resources](#additional-resources)
- [License](#license)
- [Changelog](#changelog)
- [Known issues](#known-issues)
## Description
This sample, sampleINT8, performs INT8 calibration and inference.
Specifically, this sample demonstrates how to perform inference in 8-bit integer (INT8). INT8 inference is available only on GPUs with compute capability 6.1 or 7.x. After the network is calibrated for execution in INT8, output of the calibration is cached to avoid repeating the process. You can then reproduce your own experiments with any deep learning framework in order to validate your results on ImageNet networks.
## How does this sample work?
INT8 engines are built from 32-bit network definitions, similarly to 32-bit and 16-bit engines, but with more configuration steps. In particular, the builder and network must be configured to use INT8, which requires per-tensor dynamic ranges. The INT8 calibrator determines how best to represent weights and activations as 8-bit integers and sets the per-tensor dynamic ranges accordingly. Alternatively, you can set custom per-tensor dynamic ranges; this is covered in sampleINT8API.
This sample is accompanied by the [MNIST training set](https://github.com/BVLC/caffe/blob/master/data/mnist/get_mnist.sh) located in the TensorRT-5.1.0.4/data/mnist/batches directory. The packaged MNIST model that is shipped with this sample is based on [lenet.prototxt](https://github.com/BVLC/caffe/edit/master/examples/mnist/lenet.prototxt). For more information, see the [MNIST BVLC Caffe example](https://github.com/BVLC/caffe/tree/master/examples/mnist). This sample can also be used with other image classification models, for example, [deploy.prototxt](https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/deploy.prototxt).
The packaged data set file that is shipped with this sample is based on the [MNIST data set](https://github.com/BVLC/caffe/tree/master/data/mnist). Generating batch files from this data set is described in [Batch files for calibration](#batch-files-for-calibration).
Specifically, this sample performs the following steps:
- [Defines the network](#defining-the-network)
- [Sets up the calibrator](#setup-the-calibrator)
- [Configures the builder](#configuring-the-builder)
- [Builds the engine](#building-the-engine)
- [Runs the engine](#running-the-engine)
- [Verifies the output](#verifying-the-output)
### Defining the network
Defining a network for INT8 execution is exactly the same as for any other precision. Weights should be imported as FP32 values, and the builder will calibrate the network to find appropriate quantization factors to reduce the network to INT8 precision. This sample imports the network using the NvCaffeParser:
```
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kFLOAT);
```
### Setup the calibrator
Calibration is an additional step required when building networks for INT8. The application must provide TensorRT with sample input, in other words, calibration data. TensorRT will then perform inference in FP32 and gather statistics about intermediate activation layers that it will use to build the reduced precision INT8 engine.
#### Calibration data
Calibration must be performed using images representative of those which will be used at runtime. Since the sample is based around Caffe, any image preprocessing that caffe would perform prior to running the network (such as scaling, cropping, or mean subtraction) will be done in Caffe and captured as a set of files. The sample uses a utility class (BatchStream) to read these files and create appropriate input for calibration. Generation of these files is discussed in [Batch files for calibration](#batch-files-for-calibration).
You can create a calibration data stream (`calibrationStream`), for example:
`BatchStream calibrationStream(CAL_BATCH_SIZE, NB_CAL_BATCHES);`
The BatchStream class provides helper methods used to retrieve batch data. The calibrator uses a batch stream object to retrieve batch data while calibrating. In general, the BatchStream class should implement `getBatch()` and `getBatchSize()`, which can be invoked by `IInt8Calibrator::getBatch()` and `IInt8Calibrator::getBatchSize()`. You can also write your own custom BatchStream class to serve calibration data. For more information, see `BatchStream.h`.
**Note:** The calibration data must be representative of the input provided to TensorRT at runtime; for example, for image classification networks, it should not consist of images from just a small subset of categories. For ImageNet networks, around 500 calibration images is adequate.
#### Calibrator interface
The application must implement the `IInt8Calibrator` interface to provide calibration data and helper methods for reading/writing the calibration table file.
We can create a calibrator object (`calibrator`), for example:
`std::unique_ptr<IInt8Calibrator> calibrator;`
TensorRT provides three implementations of `IInt8Calibrator`:
1. `IInt8EntropyCalibrator`
2. `IInt8EntropyCalibrator2`
3. `IInt8LegacyCalibrator`
See `NvInfer.h` for more information on the `IInt8Calibrator` interface variants.
This sample uses `IInt8EntropyCalibrator2` by default. We can set the calibrator interface to use `IInt8EntropyCalibrator2` as shown:
```
calibrator.reset(new Int8EntropyCalibrator2(calibrationStream, FIRST_CAL_BATCH, gNetworkName, INPUT_BLOB_NAME));
```
where `calibrationStream` is a BatchStream object. The calibrator object should be configured to use the calibration batch stream.
In order to perform calibration, the interface must implement `getBatchSize()` and `getBatch()` to retrieve data from the BatchStream object.
The builder calls the `getBatchSize()` method once, at the start of calibration, to obtain the batch size for the calibration set. The method `getBatch()` is then called repeatedly to obtain batches from the application, until the method returns false. Every calibration batch must include exactly the number of images specified as the batch size.
```
bool getBatch(void* bindings[], const char* names[], int nbBindings) override
{
    if (!mStream.next())
        return false;
    CHECK(cudaMemcpy(mDeviceInput, mStream.getBatch(), mInputCount * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[0], INPUT_BLOB_NAME));
    bindings[0] = mDeviceInput;
    return true;
}
```
For each input tensor, a pointer to input data in GPU memory must be written into the bindings array. The names array contains the names of the input tensors. The position for each tensor in the bindings array matches the position of its name in the names array. Both arrays have size `nbBindings`.
Since the calibration step is time consuming, you can choose to provide an implementation for `writeCalibrationCache()` that writes the calibration table to an appropriate location so it can be reused on later runs.
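The cache itself is opaque data that TensorRT hands to the application as a byte buffer. A sketch of the file-backed pattern such an implementation typically uses (the standalone function names here are illustrative, not part of the TensorRT API):

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Persist the calibration table bytes TensorRT passes to
// writeCalibrationCache().
void writeCache(const std::string& path, const void* data, size_t length)
{
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(data), length);
}

// Load a previously written table; readCalibrationCache() would return
// this buffer's data and size so the builder can skip calibration.
std::vector<char> readCache(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

If `readCalibrationCache()` returns a valid buffer, the builder skips the calibration pass entirely and builds the INT8 engine from the cached dynamic ranges.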