Contents
List of Figures
Chapter 1 Introduction
1.1 CUDA: A Scalable Parallel Programming Model
1.2 The GPU: A Highly Parallel, Multithreaded, Many-Core Processor
1.3 Document Structure
Chapter 2 Programming Model
2.1 Thread Hierarchy
2.2 Memory Hierarchy
2.3 Host and Device
2.4 Software Stack
2.5 Compute Capability
Chapter 3 GPU Implementation
3.1 A Set of SIMT Multiprocessors with On-Chip Shared Memory
3.2 Multiple Devices
3.3 Mode Switches
Chapter 4 Application Programming Interface
4.1 An Extension to the C Programming Language
4.2 Language Extensions
4.2.1 Function Type Qualifiers
4.2.1.1 __device__
4.2.1.2 __global__
4.2.1.3 __host__
4.2.1.4 Restrictions
4.2.2 Variable Type Qualifiers
4.2.2.1 __device__
4.2.2.2 __constant__
4.2.2.3 __shared__
4.2.2.4 Restrictions
4.2.3 Execution Configuration
4.2.4 Built-in Variables
4.2.4.1 gridDim
4.2.4.2 blockIdx
4.2.4.3 blockDim
4.2.4.4 threadIdx
4.2.4.5 warpSize
4.2.4.6 Restrictions
4.2.5 Compilation with NVCC
4.2.5.1 __noinline__
4.2.5.2 #pragma unroll
4.3 Common Runtime Component
4.3.1 Built-in Vector Types
4.3.1.1 char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4, short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4, int1, uint1, int2, uint2, int3, uint3, int4, uint4, long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4, float1, float2, float3, float4, double2
4.3.1.2 dim3 Type
4.3.2 Mathematical Functions
4.3.3 Timing Function
4.3.4 Texture Type
4.3.4.1 Texture Reference Declaration
4.3.4.2 Runtime Texture Reference Attributes