cuda_programming_guide
Uploaded 2014-03-02 17:55:29. PDF, 6.43 MB, 136 pages.
NVIDIA_CUDA_Programming_Guide_1.1_chs
NVIDIA CUDA Programming Guide
NVIDIA Technical Documentation (Full Translation)
Jan. 2008
Version 1.1
GPU Technical Documentation Series
NVIDIA CUDA Programming Guide
Chapter 1 Introduction to CUDA
1.1 The Graphics Processor Unit as a Data-Parallel Computing Device
1.2 CUDA: A New Architecture for Computing on the GPU
Chapter 2 Programming Model
2.1 A Highly Multithreaded Coprocessor
2.2 Thread Batching
2.2.1 Thread Block
2.2.2 Grid of Thread Blocks
2.3 Memory Model
Chapter 3 Hardware Implementation
3.1 A Set of SIMD Multiprocessors with On-Chip Shared Memory
3.2 Execution Model
3.3 Compute Capability
3.4 Multiple Devices
3.5 Mode Switches
Chapter 4 Application Programming Interface (API)
4.1 An Extension to the C Programming Language
4.2 Language Extensions
4.2.1 Function Type Qualifiers
4.2.1.1 __device__
4.2.1.2 __global__
4.2.1.3 __host__
4.2.1.4 Restrictions
4.2.2 Variable Type Qualifiers
4.2.2.1 __device__
4.2.2.2 __constant__
4.2.2.3 __shared__
4.2.2.4 Restrictions
4.2.3 Execution Configuration
4.2.4 Built-in Variables
4.2.4.1 gridDim
4.2.4.2 blockIdx
4.2.4.3 blockDim
4.2.4.4 threadIdx
4.2.4.5 Restrictions
4.2.5 Compilation with NVCC
4.2.5.1 __noinline__
4.2.5.2 #pragma unroll
4.3 Common Runtime Component
4.3.1 Built-in Vector Types
4.3.1.1 char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4, short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4, int1, uint1, int2, uint2, int3, uint3, int4, uint4, long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4, float1, float2, float3, float4
4.3.1.2 dim3 Type
4.3.2 Mathematical Functions
4.3.3 Time Function
4.3.4 Texture Type
4.3.4.1 Texture Reference Declaration
4.3.4.2 Runtime Texture Reference Attributes
4.3.4.3 Texturing from Linear Memory versus CUDA Arrays
4.4 Device Runtime Component
4.4.1 Mathematical Functions
4.4.2 Synchronization Function
4.4.3 Type Conversion Functions
4.4.4 Type Casting Functions
4.4.5 Texture Functions
4.4.5.1 Texturing from Device Memory
4.4.5.2 Texturing from CUDA Arrays
4.4.6 Atomic Functions
4.5 Host Runtime Component
4.5.1 Common Concepts
4.5.1.1 Device
4.5.1.2 Memory
4.5.1.3 OpenGL Interoperability
4.5.1.4 Direct3D Interoperability
4.5.1.5 Asynchronous Concurrent Execution
4.5.2 Runtime API
4.5.2.1 Initialization
4.5.2.2 Device Management
4.5.2.3 Memory Management
4.5.2.4 Stream Management
4.5.2.5 Event Management
4.5.2.6 Texture Reference Management
4.5.2.7 OpenGL Interoperability
4.5.2.8 Direct3D Interoperability
4.5.2.9 Debugging Using the Device Emulation Mode
4.5.3 Driver API
4.5.3.1 Initialization
4.5.3.2 Device Management
4.5.3.3 Context Management
4.5.3.4 Module Management
4.5.3.5 Execution Control
4.5.3.6 Memory Management
4.5.3.7 Stream Management
4.5.3.8 Event Management
4.5.3.9 Texture Reference Management
4.5.3.10 OpenGL Interoperability
4.5.3.11 Direct3D Interoperability
Chapter 5 Performance Guidelines
5.1 Instruction Performance
5.1.1 Instruction Throughput
5.1.1.1 Arithmetic Instructions
5.1.1.2 Control Flow Instructions
5.1.1.3 Memory Instructions
5.1.1.4 Synchronization Instruction
5.1.2 Memory Bandwidth
5.1.2.1 Global Memory
5.1.2.2 Constant Memory
5.1.2.3 Texture Memory
5.1.2.4 Shared Memory
5.1.2.5 Registers
5.2 Number of Threads per Block
5.3 Data Transfer between Host and Device
5.4 Texture Fetch versus Global or Constant Memory Read
5.5 Overall Performance Optimization Strategies
Chapter 6 Example of Matrix Multiplication
6.1 Overview
6.2 Source Code Listing
6.3 Source Code Walkthrough
6.3.1 Mul()
6.3.2 Muld()
Appendix A Technical Specifications
A.1 General Specifications
A.2 Floating-Point Standard
Appendix B Mathematical Functions
B.1 Common Runtime Component
B.2 Device Runtime Component
Appendix C Atomic Functions
C.1 Arithmetic Functions
C.1.1 atomicAdd()
C.1.2 atomicSub()
C.1.3 atomicExch()
C.1.4 atomicMin()
C.1.5 atomicMax()
C.1.6 atomicInc()
C.1.7 atomicDec()
C.1.8 atomicCAS()
C.2 Bitwise Functions
C.2.1 atomicAnd()
C.2.2 atomicOr()
C.2.3 atomicXor()
Appendix D Runtime API Reference
D.1 Device Management
D.1.1 cudaGetDeviceCount()
D.1.2 cudaSetDevice()
D.1.3 cudaGetDevice()
D.1.4 cudaGetDeviceProperties()
D.1.5 cudaChooseDevice()
Uploader: royweiluo