Last modified on: <07-29-2009>
The CUDA
Compiler Driver
NVCC
nvcc.pdf v2.3 1
July 2009
Introduction
Overview
CUDA programming model
The CUDA Toolkit targets a class of applications whose control part runs as a
process on a general purpose computer (Linux, Windows), and which use one or
more NVIDIA GPUs as coprocessors for accelerating SIMD parallel jobs. Such
jobs are 'self-contained', in the sense that they can be executed and completed by a
batch of GPU threads entirely without intervention by the 'host' process, thereby
gaining optimal benefit from the parallel graphics hardware.
Dispatching GPU jobs by the host process is supported by the CUDA Toolkit in
the form of remote procedure calling. The GPU code is implemented as a collection
of functions in a language that is essentially 'C', but with some annotations for
distinguishing them from the host code, plus annotations for distinguishing the
different types of data memory that exist on the GPU. Such functions may have
parameters, and they can be 'called' using a syntax that is very similar to regular C
function calling, but slightly extended to be able to specify the matrix of GPU
threads that must execute the 'called' function. During its lifetime, the host process
may dispatch many parallel GPU tasks. See Figure 1.
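For illustration, a minimal device function and its extended call might look as follows (the function name, parameters, and launch dimensions here are hypothetical, not taken from the toolkit):

```cuda
__global__ void scale (float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

/* Launch a matrix of 4 thread blocks of 256 GPU threads each: */
scale<<<4, 256>>>(devData, 2.0f, n);
```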
CUDA sources
Hence, source files for CUDA applications consist of a mixture of conventional
C++ 'host' code and GPU 'device' functions. The CUDA compilation
trajectory separates the device functions from the host code, compiles the device
functions using proprietary NVIDIA compilers/assemblers, compiles the host code
using a general purpose C/C++ compiler that is available on the host platform, and
afterwards embeds the compiled GPU functions as load images in the host object
file. In the linking stage, specific CUDA runtime libraries are added for supporting
remote SIMD procedure calling and for providing explicit GPU manipulation such
as allocation of GPU memory buffers and host-GPU data transfer.
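In practice, this entire trajectory is driven per source file by a single nvcc invocation. A minimal sketch, assuming a CUDA source file named x.cu:

```shell
# Split, compile, and merge: device functions go through the NVIDIA
# compilers, host code through the native C/C++ compiler, and the
# compiled GPU load images are embedded in the resulting object file.
nvcc -c x.cu

# Link, with nvcc adding the required CUDA runtime libraries:
nvcc -o app x.o
```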
Purpose of nvcc
This compilation trajectory involves several splitting, compilation, preprocessing,
and merging steps for each CUDA source file, and several of these steps are subtly
different for different modes of CUDA compilation (such as compilation for device
emulation, or the generation of device code repositories). It is the purpose of the
CUDA compiler driver nvcc to hide the intricate details of CUDA compilation from
developers. Additionally, instead of being a specific CUDA compilation driver,
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of
conventional compiler options, such as for defining macros and include/library
paths, and for steering the compilation process. All non-CUDA compilation steps
are forwarded to a general purpose C compiler that is supported by nvcc. On
Windows platforms, where this compiler is an instance of the Microsoft Visual Studio
compiler, nvcc translates its options into the appropriate 'cl' command syntax. This
extended behavior plus 'cl' option translation is intended to support portable
application build and make scripts across Linux and Windows platforms.
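As an illustration of this portability (the macro, include path, and library names below are hypothetical), a single command line such as the following could appear unchanged in makefiles on both platforms; on Windows, nvcc translates options such as -D, -I, -L, and -l into the corresponding 'cl' syntax:

```shell
nvcc -DNDEBUG -Iinclude -Llib -lmylib -o app app.cu
```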
Supported host compilers
Nvcc will use the following compilers for host code compilation:
On Linux platforms: The GNU compiler, gcc
On Windows platforms: The Microsoft Visual Studio compiler, cl
On both platforms, the compiler found on the current execution search path will be
used, unless nvcc option --compiler-bindir is specified (see page 13).
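For example (the installation path shown is hypothetical), a non-default host compiler installation could be selected as follows:

```shell
nvcc -c x.cu --compiler-bindir /usr/local/gcc-3.4/bin
```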
Supported build environments
Nvcc can be used in the following build environments:
Linux Any shell
Windows DOS shell
Windows CygWin shells, use nvcc's drive prefix options (see page 14).
Windows MinGW shells, use nvcc's drive prefix options (see page 14).
Although a variety of POSIX-style shells is supported on Windows, nvcc will still
assume the Microsoft Visual Studio compiler for host compilation. Use of gcc is not
supported on Windows.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define ACOS_TESTS      (5)
#define ACOS_THREAD_CNT (128)
#define ACOS_CTA_CNT    (96)

struct acosParams {
    float *arg;
    float *res;
    int n;
};

__global__ void acos_main (struct acosParams parms)
{
    int i;
    int totalThreads = gridDim.x * blockDim.x;
    int ctaStart     = blockDim.x * blockIdx.x;
    /* Grid-stride loop: each thread handles every totalThreads-th element */
    for (i = ctaStart + threadIdx.x; i < parms.n; i += totalThreads) {
        parms.res[i] = acosf(parms.arg[i]);
    }
}

int main (int argc, char *argv[])
{
    float *acosRes = 0;
    float *acosArg = 0;
    float *arg = 0;
    float *res = 0;
    struct acosParams funcParams;
    int errors = 0;
    int i;

    /* Allocate device and host buffers, and fill the input array */
    cudaMalloc ((void **)&acosArg, ACOS_TESTS * sizeof(float));
    cudaMalloc ((void **)&acosRes, ACOS_TESTS * sizeof(float));
    arg = (float *) malloc (ACOS_TESTS * sizeof(arg[0]));
    res = (float *) malloc (ACOS_TESTS * sizeof(res[0]));
    for (i = 0; i < ACOS_TESTS; i++) {
        arg[i] = (float) i / (float) ACOS_TESTS;
    }

    cudaMemcpy (acosArg, arg, ACOS_TESTS * sizeof(arg[0]),
                cudaMemcpyHostToDevice);

    funcParams.res = acosRes;
    funcParams.arg = acosArg;
    funcParams.n   = ACOS_TESTS;

    acos_main<<<ACOS_CTA_CNT,ACOS_THREAD_CNT>>>(funcParams);

    cudaMemcpy (res, acosRes, ACOS_TESTS * sizeof(res[0]),
                cudaMemcpyDeviceToHost);

    /* Compare the device results against the host library function */
    for (i = 0; i < ACOS_TESTS; i++) {
        if (fabsf (res[i] - acosf (arg[i])) > 1e-6f) {
            errors++;
        }
    }
    printf ("%d errors\n", errors);

    cudaFree (acosArg);
    cudaFree (acosRes);
    free (arg);
    free (res);
    return 0;
}
Figure 1: Example of CUDA source file
Compilation Phases
Nvcc identification macro
Nvcc predefines the macro __CUDACC__. This macro can be used in sources to
test whether they are currently being compiled by nvcc.
Nvcc phases
A compilation phase is a logical translation step that can be selected by
command line options to nvcc. A single compilation phase can still be broken up by
nvcc into smaller steps, but these smaller steps are 'just' implementations of the
phase: they depend on seemingly arbitrary capabilities of the internal tools that nvcc
uses, and all of these internals may change with a new release of the CUDA Toolkit.
Hence, only compilation phases are stable across releases, and although nvcc
provides options to display the compilation steps that it executes, these are for
debugging purposes only and must not be copied into build scripts.
Nvcc phases are selected by a combination of command line options and input file
name suffixes, and the execution of these phases may be modified by other command
line options. In phase selection, the input file suffix defines the phase input, while
the command line option defines the required output of the phase.
The following paragraphs will list the recognized file name suffixes and the
supported compilation phases. A full explanation of the nvcc command line options
can be found in the next chapter.
Supported input file suffixes
The following table defines how nvcc interprets its input files:

.cu              CUDA source file, containing host code and device functions
.cup             Preprocessed CUDA source file, containing host code and device functions
.c               'C' source file
.cc, .cxx, .cpp  C++ source file
.gpu             GPU intermediate file
.ptx             PTX intermediate assembly file
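As a sketch of how suffixes and options combine to select a phase, the following invocations (using a hypothetical source file x.cu) each stop the trajectory at a different point; the individual options are explained in the next chapter:

```shell
nvcc -cuda x.cu    # translate the .cu file to an intermediate host source file
nvcc -ptx x.cu     # generate the x.ptx intermediate assembly file
nvcc -c x.cu       # compile through to an object file
```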