Version 3.1.1
7/21/2010
NVIDIA CUDA™
NVIDIA CUDA C Programming Guide
ii CUDA C Programming Guide Version 3.1.1
Changes from Version 3.1
- Removed from Sections 3.1.6 and 5.2.3 the paragraph about loading 32-bit device code from 64-bit host code, as this capability will no longer be supported in the next toolkit release.
- In Section 3.2.6.3, removed the reference to the canMapHostMemory property and mentioned that all devices of compute capability greater than 1.0 now support mapped page-locked host memory.
- Mentioned in Section 3.2.7.1 that host ↔ device memory copies of a memory block of 64 KB or less are asynchronous.
- Fixed the maximum size of a 3D texture reference for devices of compute capability 2.0 (2048 instead of 4096) in Section G.1.
- Updated the paragraph about __fdividef(x,y) in Section C.2.1 to clarify behavior depending on compute capability and compilation flag.
Table of Contents
Chapter 1. Introduction ................................................................................... 1
1.1 From Graphics Processing to General-Purpose Parallel Computing ................... 1
1.2 CUDA™: a General-Purpose Parallel Computing Architecture .......................... 3
1.3 A Scalable Programming Model .................................................................... 4
1.4 Document’s Structure ................................................................................. 6
Chapter 2. Programming Model ....................................................................... 7
2.1 Kernels ...................................................................................................... 7
2.2 Thread Hierarchy ........................................................................................ 8
2.3 Memory Hierarchy .................................................................................... 10
2.4 Heterogeneous Programming .................................................................... 11
2.5 Compute Capability ................................................................................... 14
Chapter 3. Programming Interface ................................................................ 15
3.1 Compilation with NVCC ............................................................................. 15
3.1.1 Compilation Workflow ......................................................................... 16
3.1.2 Binary Compatibility ........................................................................... 16
3.1.3 PTX Compatibility ............................................................................... 16
3.1.4 Application Compatibility ..................................................................... 17
3.1.5 C/C++ Compatibility .......................................................................... 18
3.1.6 64-Bit Compatibility ............................................................................ 18
3.2 CUDA C ................................................................................................... 18
3.2.1 Device Memory .................................................................................. 19
3.2.2 Shared Memory ................................................................................. 21
3.2.3 Multiple Devices ................................................................................. 28
3.2.4 Texture Memory ................................................................................ 29
3.2.4.1 Texture Reference Declaration ...................................................... 30
3.2.4.2 Runtime Texture Reference Attributes ........................................... 30
3.2.4.3 Texture Binding ........................................................................... 31
3.2.5 Surface Memory ................................................................................. 34
3.2.6 Page-Locked Host Memory .................................................................. 35
3.2.6.1 Portable Memory ......................................................................... 36
3.2.6.2 Write-Combining Memory ............................................................. 36
3.2.6.3 Mapped Memory .......................................................................... 36
3.2.7 Asynchronous Concurrent Execution .................................................... 37
3.2.7.1 Concurrent Execution between Host and Device ............................. 37
3.2.7.2 Overlap of Data Transfer and Kernel Execution .............................. 37
3.2.7.3 Concurrent Kernel Execution ........................................................ 37
3.2.7.4 Concurrent Data Transfers ........................................................... 38
3.2.7.5 Stream ....................................................................................... 38
3.2.7.6 Event ......................................................................................... 39
3.2.7.7 Synchronous Calls ....................................................................... 40
3.2.8 Graphics Interoperability ..................................................................... 40
3.2.8.1 OpenGL Interoperability ............................................................... 40
3.2.8.2 Direct3D Interoperability .............................................................. 43
3.2.9 Error Handling ................................................................................... 49
3.3 Driver API ................................................................................................ 50
3.3.1 Context ............................................................................................. 52
3.3.2 Module .............................................................................................. 53
3.3.3 Kernel Execution ................................................................................ 53
3.3.4 Device Memory .................................................................................. 55
3.3.5 Shared Memory ................................................................................. 58
3.3.6 Multiple Devices ................................................................................. 59
3.3.7 Texture Memory ................................................................................ 60
3.3.8 Surface Memory ................................................................................. 61
3.3.9 Page-Locked Host Memory .................................................................. 63
3.3.10 Asynchronous Concurrent Execution .................................................... 63
3.3.10.1 Stream ....................................................................................... 63
3.3.10.2 Event Management ...................................................................... 64
3.3.10.3 Synchronous Calls ....................................................................... 65
3.3.11 Graphics Interoperability ..................................................................... 65
3.3.11.1 OpenGL Interoperability ............................................................... 66
3.3.11.2 Direct3D Interoperability .............................................................. 68
3.3.12 Error Handling ................................................................................... 74
3.4 Interoperability between Runtime and Driver APIs ....................................... 75
3.5 Versioning and Compatibility ..................................................................... 75
3.6 Compute Modes ....................................................................................... 76
3.7 Mode Switches ......................................................................................... 77
Chapter 4. Hardware Implementation ........................................................... 79
4.1 SIMT Architecture ..................................................................................... 79
4.2 Hardware Multithreading ........................................................................... 80
4.3 Multiple Devices ....................................................................................... 81
Chapter 5. Performance Guidelines ............................................................... 83
5.1 Overall Performance Optimization Strategies ............................................... 83
5.2 Maximize Utilization .................................................................................. 83
5.2.1 Application Level ................................................................................ 83
5.2.2 Device Level ...................................................................................... 84
5.2.3 Multiprocessor Level ........................................................................... 84
5.3 Maximize Memory Throughput ................................................................... 86
5.3.1 Data Transfer between Host and Device .............................................. 87
5.3.2 Device Memory Accesses .................................................................... 87
5.3.2.1 Global Memory ............................................................................ 88
5.3.2.2 Local Memory .............................................................................. 89
5.3.2.3 Shared Memory ........................................................................... 90
5.3.2.4 Constant Memory ........................................................................ 90
5.3.2.5 Texture Memory .......................................................................... 91
5.4 Maximize Instruction Throughput ............................................................... 91
5.4.1 Arithmetic Instructions ....................................................................... 92
5.4.2 Control Flow Instructions .................................................................... 94
5.4.3 Synchronization Instruction ................................................................. 95
Appendix A. CUDA-Enabled GPUs .................................................................. 97
Appendix B. C Language Extensions ............................................................ 101
B.1 Function Type Qualifiers .......................................................................... 101
B.1.1 __device__ ...................................................................................... 101
B.1.2 __global__ ...................................................................................... 101
B.1.3 __host__ ......................................................................................... 101