CUDNN LIBRARY
DU-06702-001_v5.1 | May 2016
User Guide
www.nvidia.com
Chapter 1.
INTRODUCTION
NVIDIA® cuDNN is a GPU-accelerated library of primitives for deep neural networks.
It provides highly tuned implementations of routines arising frequently in DNN
applications:
‣ Convolution forward and backward, including cross-correlation
‣ Pooling forward and backward
‣ Softmax forward and backward
‣ Neuron activations forward and backward:
  ‣ Rectified linear (ReLU)
  ‣ Sigmoid
  ‣ Hyperbolic tangent (TANH)
‣ Tensor transformation functions
‣ LRN, LCN and batch normalization forward and backward
cuDNN's convolution routines aim for performance competitive with the fastest GEMM
(matrix multiply) based implementations of such routines while using significantly less
memory.
cuDNN features customizable data layouts, supporting flexible dimension ordering,
striding, and subregions for the 4D tensors used as inputs and outputs to all of its
routines. This flexibility allows easy integration into any neural network implementation
and avoids the input/output transposition steps sometimes necessary with GEMM-based
convolutions.
cuDNN offers a context-based API that allows for easy multithreading and (optional)
interoperability with CUDA streams.
Chapter 2.
GENERAL DESCRIPTION
2.1. Programming Model
The cuDNN Library exposes a Host API but assumes that for operations using the GPU,
the necessary data is directly accessible from the device.
An application using cuDNN must initialize a handle to the library context by calling
cudnnCreate(). This handle is explicitly passed to every subsequent library function
that operates on GPU data. Once the application finishes using cuDNN, it can release
the resources associated with the library handle using cudnnDestroy(). This
approach allows the user to explicitly control the library's functioning when using
multiple host threads, GPUs and CUDA Streams. For example, an application can use
cudaSetDevice() to associate different devices with different host threads and in each
of those host threads, use a unique cuDNN handle which directs library calls to the
device associated with it. cuDNN library calls made with different handles will thus
automatically run on different devices. The device associated with a particular cuDNN
context is assumed to remain unchanged between the corresponding cudnnCreate()
and cudnnDestroy() calls. In order for the cuDNN library to use a different device
within the same host thread, the application must set the new device to be used by
calling cudaSetDevice() and then create another cuDNN context, which will be
associated with the new device, by calling cudnnCreate().
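The lifecycle above — create a handle, pass it explicitly to every call, destroy it when done — can be sketched in plain C. Since compiling against cuDNN requires a GPU toolchain, this stand-in uses hypothetical names (dnn_ctx, ctxCreate, ctxDeviceOf, ctxDestroy) to model the pattern of cudnnCreate(), per-handle device association, and cudnnDestroy(); it illustrates the API shape only, and is not cuDNN itself.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a cuDNN-style library context. The device
 * field models the device bound (via cudaSetDevice() in the real API)
 * at the time the handle is created. */
typedef struct dnn_ctx { int device; } dnn_ctx;

/* Models cudnnCreate(): allocates a context and returns a status code. */
static int ctxCreate(dnn_ctx **handle, int device) {
    *handle = (dnn_ctx *)malloc(sizeof **handle);
    if (!*handle) return -1;
    (*handle)->device = device;
    return 0;
}

/* Every "library call" takes the handle explicitly, so each host thread
 * can hold its own handle directing work to its own device. */
static int ctxDeviceOf(const dnn_ctx *handle) { return handle->device; }

/* Models cudnnDestroy(): releases the resources tied to the handle. */
static void ctxDestroy(dnn_ctx *handle) { free(handle); }
```

A host thread would typically create one such handle per device it drives, pass it to every subsequent call, and destroy it once finished, mirroring the cudnnCreate()/cudnnDestroy() bracketing described above.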
2.2. Notation
As of cuDNN v4 we have adopted a mathematically inspired notation for layer inputs
and outputs, using x, y, dx, dy, b, w for common layer parameters. This was done to
improve readability and to make the meaning of each parameter easier to understand.
All layers now follow a uniform convention: during inference,
y = layerFunction(x, otherParams)
and during backpropagation,
(dx, dOtherParams) = layerFunctionGradient(x, y, dy, otherParams)
For convolution the notation is
y = x*w+b
where w is the matrix of filter weights, x is the previous layer's data (during
inference), y is the next layer's data, b is the bias and * is the convolution operator.
In backpropagation routines the parameters keep their meanings. dx, dy, dw, db
always refer to the gradient of the final network error function with respect to a given
parameter. So dy in all backpropagation routines always refers to the error gradient
backpropagated through the network computation graph so far. Similarly, other
parameters in more specialized layers, such as dMeans or dBnBias, refer to
gradients of the loss function with respect to those parameters.
w is used in the API for both the width of the x tensor and the convolution filter
matrix. To resolve this ambiguity we use w and filter notation interchangeably for
the convolution filter weight matrix. The meaning is clear from the context, since the
layer width is always referenced near its height.
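As a concrete instance of the y = x*w + b notation above, the following minimal 1-D cross-correlation in plain C computes y from x, w and b with unit stride. The valid-only output range and the function name are illustrative assumptions, not tied to any cuDNN routine.

```c
#include <stddef.h>

/* y[i] = sum_k x[i+k] * w[k] + b   (valid cross-correlation, stride 1).
 * x has nx elements, w has nw elements, y receives nx - nw + 1 elements. */
static void conv1d(const float *x, size_t nx,
                   const float *w, size_t nw,
                   float b, float *y) {
    for (size_t i = 0; i + nw <= nx; ++i) {
        float acc = b;                 /* start from the bias b */
        for (size_t k = 0; k < nw; ++k)
            acc += x[i + k] * w[k];    /* accumulate x*w */
        y[i] = acc;
    }
}
```

In backpropagation, dy holding the error gradient for y would be correlated against x and w to produce dw and dx, following the uniform convention described above.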
2.3. Tensor Descriptor
The cuDNN library describes data holding images, videos and any other data with
contents through a generic n-D tensor defined with the following parameters:
‣ a number of dimensions dim from 3 to 8
‣ a data type (32-bit floating point, 64-bit floating point, 16-bit floating point, ...)
‣ dim integers defining the size of each dimension
‣ dim integers defining the stride of each dimension (i.e. the number of elements to add to reach the next element in the same dimension)
The first two dimensions define respectively the batch number n and the number of
feature maps c. This tensor definition allows, for example, some dimensions to overlap
each other within the same tensor by making the stride of one dimension smaller than
the product of the size and the stride of the next dimension. In cuDNN, unless specified
otherwise, all routines support tensors with overlapping dimensions for forward-pass
input tensors; however, the dimensions of output tensors cannot overlap. Even though
this tensor format supports negative strides (which can be useful for data mirroring),
cuDNN routines do not support tensors with negative strides unless specified otherwise.
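For a fully-packed (non-overlapping) tensor, the strides follow directly from the dimension sizes: each stride is the product of the sizes of all faster-varying dimensions. The sketch below, in plain C with no cuDNN descriptor calls, computes such strides; overlap as described above would correspond to choosing a stride smaller than this packed value.

```c
/* Fully-packed strides for an n-D tensor: the stride of dimension i is the
 * product of the sizes of dimensions i+1 .. n-1, and the innermost stride
 * is 1. Element (i0, i1, ...) then lives at offset sum(i_k * strides[k]).
 * An overlapping layout would have some strides[i] < dims[i+1]*strides[i+1]. */
static void packed_strides(const int *dims, int n, int *strides) {
    int s = 1;
    for (int i = n - 1; i >= 0; --i) {
        strides[i] = s;   /* stride of dimension i */
        s *= dims[i];     /* fold this dimension's size into the next stride */
    }
}
```

For a 4-D tensor with dimensions n=2, c=3, h=4, w=5 this yields strides 60, 20, 5, 1.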
2.3.1. WXYZ Tensor Descriptor
Tensor descriptor formats are identified using acronyms, with each letter referencing a
corresponding dimension. In this document, the usage of this terminology implies:
‣ all the strides are strictly positive
‣ the dimensions referenced by the letters are sorted in decreasing order of their respective strides
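The two conditions above can be checked mechanically. The helper below is an illustrative plain-C sketch (a hypothetical function, not a cuDNN API): it verifies that a stride array is strictly positive and strictly decreasing, as the strides of a packed NCHW-style descriptor would be.

```c
/* Returns 1 if the stride array satisfies the descriptor-acronym rules:
 * every stride strictly positive, and strides strictly decreasing so that
 * the letters of the acronym are ordered from slowest- to fastest-varying. */
static int strides_match_acronym(const int *strides, int n) {
    for (int i = 0; i < n; ++i)
        if (strides[i] <= 0) return 0;          /* rule 1: strictly positive */
    for (int i = 0; i + 1 < n; ++i)
        if (strides[i] <= strides[i + 1]) return 0; /* rule 2: decreasing order */
    return 1;
}
```

For a packed 2x3x4x5 NCHW tensor the strides 60, 20, 5, 1 pass this check, while a stride array such as 60, 20, 5, 5 would not.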