CUDA Programming Model Overview
© NVIDIA Corporation 2006
CUDA Programming Model
- Parallel portions of an application are executed on the device as kernels
  - One kernel is executed at a time
  - Many threads execute each kernel
- Differences between CUDA and CPU threads
  - CUDA threads are extremely lightweight
    - Very little creation overhead
    - Near-instant switching between threads
  - CUDA uses thousands of threads to achieve efficiency
    - Multi-core CPUs can use only a few
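As a minimal sketch of the kernel model, assuming a CUDA-capable device and the CUDA runtime API: the host launches one kernel, and each of many lightweight threads handles exactly one array element. The names vecAdd and addOnDevice and the block size of 256 are illustrative choices, not taken from this overview, and error handling is reduced to a single launch check.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Kernel: each CUDA thread computes exactly one output element.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {  // guard: the last block may be only partially used
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the device, launches one
// thread per element, and copies the result back. Assumes a.size() == b.size() > 0.
std::vector<float> addOnDevice(const std::vector<float>& a,
                               const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                                 // assumed block size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    assert(cudaGetLastError() == cudaSuccess);                 // launch check only

    std::vector<float> c(n);
    // This blocking copy waits for the kernel to finish before reading dC.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling addOnDevice on two 1,000-element vectors launches 4 blocks of 256 threads (1,024 threads), of which the last 24 fail the bounds check and do nothing.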
Programming Model
- A kernel is executed as a grid of thread blocks
- A thread block is a batch of threads that can cooperate with each other by:
  - Sharing data through shared memory
  - Synchronizing their execution
- Threads from different blocks cannot cooperate
[Figure: The host launches Kernel 1 and Kernel 2 on the device. Kernel 1 runs as Grid 1, a 3-by-2 grid of blocks, Block (0, 0) through Block (2, 1); Kernel 2 runs as Grid 2. Inset: Block (1, 1) expanded into a 5-by-3 array of threads, Thread (0, 0) through Thread (4, 2).]
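The cooperation rules above can be illustrated with a per-block sum, sketched here under the assumption of a fixed block size of 256 (blockSum and sumOnDevice are hypothetical names). Threads within a block share partial results through __shared__ memory and synchronize with __syncthreads(); because blocks cannot cooperate, each block writes its own partial sum and the host adds those up.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 256;  // assumed block size; must be a power of two here

// Each block sums up to kBlockSize elements cooperatively and writes one partial sum.
__global__ void blockSum(const int* in, int* partial, int n) {
    __shared__ int tile[kBlockSize];    // visible only to the threads of this block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0;    // pad out-of-range slots with 0
    __syncthreads();                    // wait until the whole tile is loaded

    // Tree reduction inside the block: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();                // reached by every thread of the block
    }
    if (tid == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical host helper: blocks cannot synchronize with each other inside
// the kernel, so the host combines the per-block partial sums. Assumes data is non-empty.
long long sumOnDevice(const std::vector<int>& data) {
    int n = static_cast<int>(data.size());
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    int *dIn, *dPartial;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dPartial, blocks * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlockSize>>>(dIn, dPartial, n);
    assert(cudaGetLastError() == cudaSuccess);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    long long total = 0;
    for (int p : partial) total += p;
    return total;
}
```

The barrier sits outside the if so that every thread in the block reaches it; a __syncthreads() executed by only some threads of a block may hang or produce unintended results.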
G80 Device
- Processors execute computing threads
- Thread Execution Manager issues threads
- 128 Thread Processors
- Parallel Data Cache accelerates processing
[Figure: G80 block diagram. The host feeds the Input Assembler and the Thread Execution Manager, which issues threads to eight groups of Thread Processors, each paired with Parallel Data Cache; all groups load from and store to Global Memory.]
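The hardware in the diagram can be inspected from a program. As a hedged sketch using the runtime call cudaGetDeviceProperties: the thread processors are reported in groups as multiprocessors (multiProcessorCount), and the per-block share of the on-chip memory that the diagram calls Parallel Data Cache corresponds to what CUDA reports as shared memory (sharedMemPerBlock). The helper name describeDevice0 is hypothetical, and it assumes device 0 exists.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a few properties of device 0 and
// returns its multiprocessor count.
int describeDevice0() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    assert(err == cudaSuccess);
    (void)err;  // avoid an unused-variable warning when NDEBUG is defined

    std::printf("%s: %d multiprocessors, %zu bytes of shared memory per block, "
                "up to %d threads per block\n",
                prop.name, prop.multiProcessorCount,
                prop.sharedMemPerBlock, prop.maxThreadsPerBlock);
    return prop.multiProcessorCount;
}
```

In the 128-processor G80 configuration, the thread processors were organized as 16 multiprocessors of 8 processors each, so a full G80 part would be expected to report a multiProcessorCount of 16; newer GPUs report different values.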
Programming Model
- Threads and blocks have IDs
  - So each thread can decide what data to work on
  - Block ID: 1D or 2D
  - Thread ID: 1D, 2D, or 3D
- Simplifies memory addressing when processing multidimensional data
  - Image processing
  - Solving PDEs on volumes
[Figure: Grid 1 on the device, a 3-by-2 grid of blocks, Block (0, 0) through Block (2, 1). Inset: Block (1, 1) expanded into a 5-by-3 array of threads, Thread (0, 0) through Thread (4, 2).]
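The 2D IDs make image addressing direct. The sketch below, with the hypothetical names invert and invertOnDevice, assumes an 8-bit grayscale image stored row by row; a 2D grid of 16 x 16 thread blocks gives every thread an (x, y) pixel coordinate, which maps to the linear address y * width + x.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Inverts an 8-bit grayscale image of width x height pixels stored row-major.
__global__ void invert(unsigned char* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column from 2D block and thread IDs
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height) {                  // edge blocks may overhang the image
        int idx = y * width + x;                    // row-major address
        img[idx] = 255 - img[idx];
    }
}

// Hypothetical host helper: copies the image to the device, inverts it, copies it back.
// Assumes width > 0, height > 0 and img.size() == width * height.
std::vector<unsigned char> invertOnDevice(const std::vector<unsigned char>& img,
                                          int width, int height) {
    size_t bytes = static_cast<size_t>(width) * height;
    unsigned char* dImg;
    cudaMalloc(&dImg, bytes);
    cudaMemcpy(dImg, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                             // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,      // enough blocks to cover
              (height + block.y - 1) / block.y);    // every pixel
    invert<<<grid, block>>>(dImg, width, height);
    assert(cudaGetLastError() == cudaSuccess);

    std::vector<unsigned char> out(bytes);
    cudaMemcpy(out.data(), dImg, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dImg);
    return out;
}
```

For a 37 x 21 image the grid is 3 x 2 blocks, the same shape as Grid 1 in the figure; threads that fall outside the image simply skip the write.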