PARALLEL THREAD EXECUTION ISA
v5.0 | June 2017
Application Guide
www.nvidia.com
TABLE OF CONTENTS
Chapter 1. Introduction
1.1. Scalable Data-Parallel Computing using GPUs
1.2. Goals of PTX
1.3. PTX ISA Version 5.0
1.4. Document Structure
Chapter 2. Programming Model
2.1. A Highly Multithreaded Coprocessor
2.2. Thread Hierarchy
2.2.1. Cooperative Thread Arrays
2.2.2. Grid of Cooperative Thread Arrays
2.3. Memory Hierarchy
Chapter 3. PTX Machine Model
3.1. A Set of SIMT Multiprocessors with On-chip Shared Memory
Chapter 4. Syntax
4.1. Source Format
4.2. Comments
4.3. Statements
4.3.1. Directive Statements
4.3.2. Instruction Statements
4.4. Identifiers
4.5. Constants
4.6. Integer Constants
4.6.1. Floating-Point Constants
4.6.2. Predicate Constants
4.6.3. Constant Expressions
4.6.4. Integer Constant Expression Evaluation
4.6.5. Summary of Constant Expression Evaluation Rules
Chapter 5. State Spaces, Types, and Variables
5.1. State Spaces
5.1.1. Register State Space
5.1.2. Special Register State Space
5.1.3. Constant State Space
5.1.3.1. Banked Constant State Space (deprecated)
5.1.4. Global State Space
5.1.5. Local State Space
5.1.6. Parameter State Space
5.1.6.1. Kernel Function Parameters
5.1.6.2. Kernel Function Parameter Attributes
5.1.6.3. Kernel Parameter Attribute: .ptr
5.1.6.4. Device Function Parameters
5.1.7. Shared State Space
5.1.8. Texture State Space (deprecated)
5.2. Types
5.2.1. Fundamental Types
5.2.2. Restricted Use of Sub-Word Sizes
5.3. Texture Sampler and Surface Types
5.3.1. Texture and Surface Properties
5.3.2. Sampler Properties
5.3.3. Channel Data Type and Channel Order Fields
5.4. Variables
5.4.1. Variable Declarations
5.4.2. Vectors
5.4.3. Array Declarations
5.4.4. Initializers
5.4.5. Alignment
5.4.6. Parameterized Variable Names
5.4.7. Variable Attributes
5.4.8. Variable Attribute Directive: .attribute
Chapter 6. Instruction Operands
6.1. Operand Type Information
6.2. Source Operands
6.3. Destination Operands
6.4. Using Addresses, Arrays, and Vectors
6.4.1. Addresses as Operands
6.4.2. Arrays as Operands
6.4.3. Vectors as Operands
6.4.4. Labels and Function Names as Operands
6.5. Type Conversion
6.5.1. Scalar Conversions
6.5.2. Rounding Modifiers
6.6. Operand Costs
Chapter 7. Abstracting the ABI
7.1. Function Declarations and Definitions
7.1.1. Changes from PTX ISA Version 1.x
7.2. Variadic Functions
7.3. Alloca
Chapter 8. Instruction Set
8.1. Format and Semantics of Instruction Descriptions
8.2. PTX Instructions
8.3. Predicated Execution
8.3.1. Comparisons
8.3.1.1. Integer and Bit-Size Comparisons
8.3.1.2. Floating Point Comparisons
8.3.2. Manipulating Predicates
8.4. Type Information for Instructions and Operands
8.4.1. Operand Size Exceeding Instruction-Type Size
8.5. Divergence of Threads in Control Constructs
8.6. Semantics
8.6.1. Machine-Specific Semantics of 16-bit Code
8.7. Instructions
8.7.1. Integer Arithmetic Instructions
8.7.1.1. Integer Arithmetic Instructions: add
8.7.1.2. Integer Arithmetic Instructions: sub
8.7.1.3. Integer Arithmetic Instructions: mul
8.7.1.4. Integer Arithmetic Instructions: mad
8.7.1.5. Integer Arithmetic Instructions: mul24
8.7.1.6. Integer Arithmetic Instructions: mad24
8.7.1.7. Integer Arithmetic Instructions: sad
8.7.1.8. Integer Arithmetic Instructions: div
8.7.1.9. Integer Arithmetic Instructions: rem
8.7.1.10. Integer Arithmetic Instructions: abs
8.7.1.11. Integer Arithmetic Instructions: neg
8.7.1.12. Integer Arithmetic Instructions: min
8.7.1.13. Integer Arithmetic Instructions: max
8.7.1.14. Integer Arithmetic Instructions: popc
8.7.1.15. Integer Arithmetic Instructions: clz
8.7.1.16. Integer Arithmetic Instructions: bfind
8.7.1.17. Integer Arithmetic Instructions: brev
8.7.1.18. Integer Arithmetic Instructions: bfe
8.7.1.19. Integer Arithmetic Instructions: bfi
8.7.1.20. Integer Arithmetic Instructions: dp4a
8.7.1.21. Integer Arithmetic Instructions: dp2a
8.7.2. Extended-Precision Integer Arithmetic Instructions
8.7.2.1. Extended-Precision Arithmetic Instructions: add.cc
8.7.2.2. Extended-Precision Arithmetic Instructions: addc
8.7.2.3. Extended-Precision Arithmetic Instructions: sub.cc
8.7.2.4. Extended-Precision Arithmetic Instructions: subc
8.7.2.5. Extended-Precision Arithmetic Instructions: mad.cc
8.7.2.6. Extended-Precision Arithmetic Instructions: madc
8.7.3. Floating-Point Instructions
8.7.3.1. Floating Point Instructions: testp
8.7.3.2. Floating Point Instructions: copysign
8.7.3.3. Floating Point Instructions: add
8.7.3.4. Floating Point Instructions: sub
8.7.3.5. Floating Point Instructions: mul
8.7.3.6. Floating Point Instructions: fma
8.7.3.7. Floating Point Instructions: mad
8.7.3.8. Floating Point Instructions: div
8.7.3.9. Floating Point Instructions: abs
8.7.3.10. Floating Point Instructions: neg
8.7.3.11. Floating Point Instructions: min
8.7.3.12. Floating Point Instructions: max
8.7.3.13. Floating Point Instructions: rcp
8.7.3.14. Floating Point Instructions: rcp.approx.ftz.f64
8.7.3.15. Floating Point Instructions: sqrt
8.7.3.16. Floating Point Instructions: rsqrt
8.7.3.17. Floating Point Instructions: rsqrt.approx.ftz.f64
8.7.3.18. Floating Point Instructions: sin
8.7.3.19. Floating Point Instructions: cos
8.7.3.20. Floating Point Instructions: lg2
8.7.3.21. Floating Point Instructions: ex2
8.7.4. Half Precision Floating-Point Instructions
8.7.4.1. Half Precision Floating Point Instructions: add
8.7.4.2. Half Precision Floating Point Instructions: sub
8.7.4.3. Half Precision Floating Point Instructions: mul
8.7.4.4. Half Precision Floating Point Instructions: fma
8.7.5. Comparison and Selection Instructions
8.7.5.1. Comparison and Selection Instructions: set
8.7.5.2. Comparison and Selection Instructions: setp
8.7.5.3. Comparison and Selection Instructions: selp
8.7.5.4. Comparison and Selection Instructions: slct
8.7.6. Half Precision Comparison Instructions
8.7.6.1. Half Precision Comparison Instructions: set
8.7.6.2. Half Precision Comparison Instructions: setp
8.7.7. Logic and Shift Instructions
8.7.7.1. Logic and Shift Instructions: and
8.7.7.2. Logic and Shift Instructions: or
8.7.7.3. Logic and Shift Instructions: xor
8.7.7.4. Logic and Shift Instructions: not
8.7.7.5. Logic and Shift Instructions: cnot
8.7.7.6. Logic and Shift Instructions: lop3
8.7.7.7. Logic and Shift Instructions: shf
8.7.7.8. Logic and Shift Instructions: shl
8.7.7.9. Logic and Shift Instructions: shr
8.7.8. Data Movement and Conversion Instructions
8.7.8.1. Cache Operators
8.7.8.2. Data Movement and Conversion Instructions: mov
8.7.8.3. Data Movement and Conversion Instructions: mov
8.7.8.4. Data Movement and Conversion Instructions: shfl