CUDA C++ Programming Guide PG-02829-001_v11.2 | v
3.4. Compute Modes.................................................................................................................... 102
3.5. Mode Switches...................................................................................................................... 102
3.6. Tesla Compute Cluster Mode for Windows......................................................................... 103
Chapter 4. Hardware Implementation............................................................................. 104
4.1. SIMT Architecture................................................................................................................. 104
4.2. Hardware Multithreading...................................................................................................... 106
Chapter 5. Performance Guidelines................................................................................. 107
5.1. Overall Performance Optimization Strategies..................................................................... 107
5.2. Maximize Utilization.............................................................................................................. 107
5.2.1. Application Level............................................................................................................. 107
5.2.2. Device Level.................................................................................................................... 108
5.2.3. Multiprocessor Level...................................................................................................... 108
5.2.3.1. Occupancy Calculator.............................................................................................. 110
5.3. Maximize Memory Throughput............................................................................................. 111
5.3.1. Data Transfer between Host and Device...................................................................... 112
5.3.2. Device Memory Accesses............................................................................................... 113
5.4. Maximize Instruction Throughput........................................................................................ 117
5.4.1. Arithmetic Instructions................................................................................................... 117
5.4.2. Control Flow Instructions.............................................................................................. 122
5.4.3. Synchronization Instruction........................................................................................... 123
Appendix A. CUDA-Enabled GPUs.................................................................................... 124
Appendix B. C++ Language Extensions............................................................................ 125
B.1. Function Execution Space Specifiers.................................................................................. 125
B.1.1. __global__....................................................................................................................... 125
B.1.2. __device__...................................................................................................................... 125
B.1.3. __host__.......................................................................................................................... 125
B.1.4. Undefined behavior........................................................................................................ 126
B.1.5. __noinline__ and __forceinline__.................................................................................. 126
B.2. Variable Memory Space Specifiers...................................................................................... 127
B.2.1. __device__...................................................................................................................... 127
B.2.2. __constant__.................................................................................................................. 127
B.2.3. __shared__..................................................................................................................... 127
B.2.4. __managed__................................................................................................................. 128
B.2.5. __restrict__..................................................................................................................... 128
B.3. Built-in Vector Types............................................................................................................ 130
B.3.1. char, short, int, long, longlong, float, double............................................................... 130
B.3.2. dim3................................................................................................................................ 131