[Free] cuDNN test samples for cuDNN 8.9.1 and CUDA 12.1

Points required: 0 | 110 views | Uploaded 2023-05-31 12:58:43 | 1.73 MB | GZ

62 files in total, including 13 .txt files, 12 .h files, and 8 .bin files.

cudnn_samples_v8.tar.gz (62 files)

cudnn_samples_v8/
  mnistCUDNN/
    fp16_dev.h 588B
    mnistCUDNN 225KB
    mnistCUDNN.cpp 36KB
    fp16_emu.o 13KB
    Makefile 7KB
    mnistCUDNN.o 250KB
    data/
      conv2.bin 98KB
      conv2.bias.bin 200B
      three_28x28.pgm 836B
      conv1.bin 2KB
      one_28x28.pgm 836B
      five_28x28.pgm 836B
      ip2.bin 20KB
      ip1.bias.bin 2KB
      conv1.bias.bin 80B
      ip1.bin 1.53MB
      ip2.bias.bin 40B
    fp16_emu.cpp 5KB
    fp16_dev.cu 1KB
    error_util.h 7KB
    gemv.h 3KB
    fp16_emu.h 5KB
    fp16_dev.o 72KB
    readme.txt 3KB
  multiHeadAttention/
    multiHeadAttention.cpp 50KB
    Makefile 6KB
    README.txt 3KB
    attn_ref.py 8KB
    fp16_emu.h 5KB
    multiHeadAttention.h 9KB
    run_ref.sh 784B
  RNN_v8.0/
    compare.py 5KB
    RNN_example.cu 30KB
    RNN_example.h 11KB
    golden_3.txt 505B
    Makefile 6KB
    fp16_emu.cpp 5KB
    golden_1.txt 408B
    golden_2.txt 408B
    golden_4.txt 408B
    fp16_emu.h 5KB
    readme.txt 4KB
  conv_sample/
    fp16_dev.h 588B
    Makefile 6KB
    fp16_emu.cpp 6KB
    fp16_dev.cu 1KB
    run_conv_sample.sh 2KB
    error_util.h 7KB
    fp16_emu.h 6KB
    conv_sample.cpp 67KB
    readme.txt 4KB
  RNN/
    compare.py 5KB
    RNN_example.cu 39KB
    golden_3.txt 505B
    Makefile 6KB
    fp16_emu.cpp 5KB
    golden_1.txt 408B
    golden_2.txt 408B
    golden_4.txt 408B
    fp16_emu.h 5KB
    readme.txt 4KB
  samples_common.mk 1KB

This example demonstrates how to use the cuDNN library calls cudnnConvolutionForward, cudnnConvolutionBackwardData, and cudnnConvolutionBackwardFilter, with the option of enabling Tensor Cores on Volta GPUs through cudnnSetConvolutionMathType.

1. Make sure CUDA and cuDNN are installed in the same directory.

2. Run make from the sample's directory, specifying the CUDA installation path:
       make CUDA_PATH=<cuda installation path>

3. Use the following arguments to run the sample with different convolution parameters:
       -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
       -c512 -h28 -w28 -k128 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
       -c512 -h28 -w28 -k1024 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
       -c512 -h28 -w28 -k256 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
       -c256 -h14 -w14 -k256 -r3 -s3 -pad_h1 -pad_w1 -u1 -v1
       -c256 -h14 -w14 -k1024 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
       -c1024 -h14 -w14 -k256 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
       -c1024 -h14 -w14 -k2048 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
       -c1024 -h14 -w14 -k512 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2
       -c512 -h7 -w7 -k512 -r3 -s3 -pad_h1 -pad_w1 -u1 -v1
       -c512 -h7 -w7 -k2048 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1
       -c2048 -h7 -w7 -k512 -r1 -s1 -pad_h0 -pad_w0 -u1 -v1

4. Use the following arguments to run the int8x4 and int8x32 benchmarks:
       -mathType1 -filterFormat2 -n1 -c512 -h100 -w100 -k64 -r8 -s8 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -n1 -c4096 -h64 -w64 -k64 -r4 -s4 -pad_h1 -pad_w1 -u1 -v1 -b
       -mathType1 -filterFormat2 -n1 -c512 -h100 -w100 -k64 -r8 -s8 -pad_h1 -pad_w1 -u1 -v1 -b
       -mathType1 -filterFormat2 -n1 -c512 -h128 -w128 -k64 -r13 -s13 -pad_h1 -pad_w1 -u1 -v1 -b

5. Use the following additional arguments to run the layer with a different setup:
       -mathType1     : enable Tensor Cores.
       -dataType0     : data is represented as FLOAT.
       -dataType1     : data is represented as HALF.
       -dataType2     : data is represented as INT8x4.
       -dataType3     : data is represented as INT8x32.
       -dgrad         : run cudnnConvolutionBackwardData() instead of cudnnConvolutionForward().
       -wgrad         : run cudnnConvolutionBackwardFilter() instead of cudnnConvolutionForward().
       -n<int>        : mini-batch size (use -b with a large n).
       -b             : benchmark mode; bypasses the CPU correctness check.
       -filterFormat0 : use tensor format CUDNN_TENSOR_NCHW (default).
       -filterFormat1 : use tensor format CUDNN_TENSOR_NHWC.
       -filterFormat2 : use tensor format CUDNN_TENSOR_NCHW_VECT_C. Using this format switches to int8x4 and int8x32 testing.

6. Note that changing the -filterFormat flag automatically switches to data types that are valid for that format. CUDNN_TENSOR_NCHW and CUDNN_TENSOR_NHWC support single- and half-precision tests, while CUDNN_TENSOR_NCHW_VECT_C supports int8x4 and int8x32 tests.

7. The -fold flag is useful for strided cases. The FFT algorithm is chosen for demonstration purposes, but folding can be applied to other algorithms as well.

8. Use the following arguments to run INT8x4 and INT8x32 convolution with reordered filter matrices:
       -mathType1 -filterFormat2 -dataType3 -n5 -c32 -h16 -w16 -k32 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -dataType3 -n5 -c64 -h16 -w16 -k32 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -dataType3 -n5 -c128 -h16 -w16 -k32 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -dataType3 -n5 -c32 -h16 -w16 -k64 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -dataType3 -n5 -c64 -h32 -w32 -k64 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -dataType3 -n5 -c128 -h16 -w16 -k64 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b
       -mathType1 -filterFormat2 -dataType3 -n5 -c128 -h16 -w16 -k128 -r5 -s5 -pad_h0 -pad_w0 -u1 -v1 -b

9. Use the following arguments to transform NCHW data to NC/32HW32 format. The dimensions of the input NCHW tensor are given by the -n, -c, -h, and -w flags:
       -n1 -c3 -h2 -w2 -transformFromNCHW
       -n1 -c18 -h2 -w2 -transformFromNCHW
       -n1 -c30 -h2 -w2 -transformFromNCHW

10. Use the following arguments to transform NC/32HW32 data to NCHW format. The dimensions of the output NCHW tensor are given by the -n, -c, -h, and -w flags:
       -n1 -c3 -h2 -w2 -transformToNCHW
       -n1 -c18 -h2 -w2 -transformToNCHW
       -n1 -c30 -h2 -w2 -transformToNCHW
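To see which output shapes the argument sets in item 3 produce, the sketch below applies the output-size formula that cuDNN documents for cudnnGetConvolution2dForwardOutputDim. It assumes the usual cuDNN naming, in which -r/-s are the filter height/width, -pad_h/-pad_w the zero padding, and -u/-v the vertical/horizontal strides, with a dilation of 1. convOutDim and convOutShape are hypothetical helpers written for illustration, not functions from the sample or the library.

```cpp
#include <cassert>

// Hypothetical helper (not part of cuDNN or conv_sample): size of one spatial
// output dimension of a 2-D convolution, following the formula documented for
// cudnnGetConvolution2dForwardOutputDim:
//   out = 1 + (in + 2 * pad - ((filter - 1) * dilation + 1)) / stride
// Assumed flag mapping: -h/-w -> in, -r/-s -> filter,
// -pad_h/-pad_w -> pad, -u/-v -> stride; dilation defaults to 1.
int convOutDim(int in, int filter, int pad, int stride, int dilation = 1) {
    assert(stride > 0 && dilation > 0 && pad >= 0);
    const int effectiveFilter = (filter - 1) * dilation + 1;  // extent of the dilated filter
    assert(in + 2 * pad >= effectiveFilter);                  // filter must fit in the padded input
    // The numerator is non-negative here, so integer division acts as floor.
    return 1 + (in + 2 * pad - effectiveFilter) / stride;
}

// Output tensor shape in NCHW order; the number of output channels is k.
struct Shape4 { int n, c, h, w; };

// Hypothetical helper: full output shape for one set of sample arguments.
Shape4 convOutShape(int n, int k, int h, int w, int r, int s,
                    int padH, int padW, int u, int v) {
    return {n, k, convOutDim(h, r, padH, u), convOutDim(w, s, padW, v)};
}
```

For example, -c512 -h28 -w28 -k1024 -r1 -s1 -pad_h0 -pad_w0 -u2 -v2 gives 1 + (28 - 1) / 2 = 14, so each image produces a 1024 x 14 x 14 output, while the 3x3 cases with -pad_h1 -pad_w1 -u1 -v1 keep the input height and width unchanged.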
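Items 9 and 10 convert between plain NCHW and the NC/32HW32 layout that CUDNN_TENSOR_NCHW_VECT_C uses for int8x32 data. The host-side sketch below shows one way to express the index mapping, under our own assumptions: the channel axis is split into ceil(C/32) groups, the 32 channels of each group are stored innermost, and channels beyond C are filled with zeros. nchwToVectC32, vectC32ToNchw, and the other helpers are hypothetical names; the sample's -transformFromNCHW and -transformToNCHW paths may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helpers (not part of cuDNN or conv_sample).
constexpr int kVec = 32;  // channels per innermost vector in NC/32HW32 (int8x32)

// Offset of element (n, c, h, w) in a dense NCHW tensor.
inline std::size_t nchwOffset(int n, int c, int h, int w, int C, int H, int W) {
    return ((static_cast<std::size_t>(n) * C + c) * H + h) * W + w;
}

// Offset of element (n, c, h, w) in NC/32HW32, i.e. dims [N][ceil(C/32)][H][W][32].
inline std::size_t vectC32Offset(int n, int c, int h, int w, int groups, int H, int W) {
    const int g = c / kVec;     // which group of 32 channels
    const int lane = c % kVec;  // position inside the innermost 32-wide vector
    return (((static_cast<std::size_t>(n) * groups + g) * H + h) * W + w) * kVec + lane;
}

inline int channelGroups(int C) { return (C + kVec - 1) / kVec; }  // ceil(C / 32)

// NCHW -> NC/32HW32. The output is value-initialised, so padding channels stay 0.
std::vector<std::int8_t> nchwToVectC32(const std::vector<std::int8_t>& src,
                                       int N, int C, int H, int W) {
    assert(src.size() == static_cast<std::size_t>(N) * C * H * W);
    const int groups = channelGroups(C);
    std::vector<std::int8_t> dst(static_cast<std::size_t>(N) * groups * H * W * kVec);
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    dst[vectC32Offset(n, c, h, w, groups, H, W)] =
                        src[nchwOffset(n, c, h, w, C, H, W)];
    return dst;
}

// NC/32HW32 -> NCHW. Padding channels (c >= C) are simply not copied back.
std::vector<std::int8_t> vectC32ToNchw(const std::vector<std::int8_t>& src,
                                       int N, int C, int H, int W) {
    const int groups = channelGroups(C);
    assert(src.size() == static_cast<std::size_t>(N) * groups * H * W * kVec);
    std::vector<std::int8_t> dst(static_cast<std::size_t>(N) * C * H * W);
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    dst[nchwOffset(n, c, h, w, C, H, W)] =
                        src[vectC32Offset(n, c, h, w, groups, H, W)];
    return dst;
}
```

With -n1 -c3 -h2 -w2, the 12 input values occupy 12 of the 1 x 1 x 2 x 2 x 32 = 128 slots of the NC/32HW32 buffer under this sketch, and the remaining 116 are zero padding; converting back recovers the original 12 values.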