Last modified on: <07-29-2009>
The CUDA
Compiler Driver
NVCC
nvcc.pdf v2.3 1
July 2009
Introduction
Overview
CUDA programming model
The CUDA Toolkit targets a class of applications whose control part runs as a
process on a general purpose computer (Linux, Windows), and which use one or
more NVIDIA GPUs as coprocessors for accelerating SIMD parallel jobs. Such
jobs are 'self-contained', in the sense that they can be executed and completed by a
batch of GPU threads entirely without intervention by the 'host' process, thereby
gaining optimal benefit from the parallel graphics hardware.
Dispatching GPU jobs by the host process is supported by the CUDA Toolkit in
the form of remote procedure calling. The GPU code is implemented as a collection
of functions in a language that is essentially 'C', but with some annotations for
distinguishing them from the host code, plus annotations for distinguishing the
different types of data memory that exist on the GPU. Such functions may have
parameters, and they can be 'called' using a syntax that is very similar to regular C
function calling, but slightly extended to be able to specify the matrix of GPU
threads that must execute the 'called' function. During its lifetime, the host process
may dispatch many parallel GPU tasks. See Figure 1.
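For illustration, a minimal device function and its extended call might look as follows (the function name, parameters, and launch dimensions here are hypothetical, not taken from the toolkit):

```cuda
__global__ void scale (float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

/* Launch a matrix of 4 thread blocks of 256 GPU threads each: */
scale<<<4, 256>>>(devData, 2.0f, n);
```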
CUDA sources
Hence, source files for CUDA applications consist of a mixture of conventional
C++ 'host' code and GPU 'device' functions. The CUDA compilation
trajectory separates the device functions from the host code, compiles the device
functions using proprietary NVIDIA compilers/assemblers, compiles the host code
using a general purpose C/C++ compiler that is available on the host platform, and
afterwards embeds the compiled GPU functions as load images in the host object
file. In the linking stage, specific CUDA runtime libraries are added for supporting
remote SIMD procedure calling and for providing explicit GPU manipulation such
as allocation of GPU memory buffers and host-GPU data transfer.
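In practice, this entire trajectory is driven per source file by a single nvcc invocation. A minimal sketch, assuming a CUDA source file named x.cu:

```shell
# Split, compile, and merge: device functions go through the NVIDIA
# compilers, host code through the native C/C++ compiler, and the
# compiled GPU load images are embedded in the resulting object file.
nvcc -c x.cu

# Link, with nvcc adding the required CUDA runtime libraries:
nvcc -o app x.o
```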
Purpose of nvcc
This compilation trajectory involves several splitting, compilation, preprocessing,
and merging steps for each CUDA source file, and several of these steps are subtly
different for different modes of CUDA compilation (such as compilation for device
emulation, or the generation of device code repositories). It is the purpose of the
CUDA compiler driver nvcc to hide the intricate details of CUDA compilation from
developers. Additionally, instead of being a specific CUDA compilation driver,
nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of
conventional compiler options, such as for defining macros and include/library
paths, and for steering the compilation process. All non-CUDA compilation steps
are forwarded to a general purpose C compiler that is supported by nvcc. On
Windows platforms, where this compiler is an instance of the Microsoft Visual Studio
compiler, nvcc translates its options into the appropriate 'cl' command syntax. This
extended behavior plus 'cl' option translation is intended to support portable
application build and make scripts across Linux and Windows platforms.
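As an illustration of this portability (the macro, include path, and library names below are hypothetical), a single command line such as the following could appear unchanged in makefiles on both platforms; on Windows, nvcc translates options such as -D, -I, -L, and -l into the corresponding 'cl' syntax:

```shell
nvcc -DNDEBUG -Iinclude -Llib -lmylib -o app app.cu
```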
Supported host compilers
Nvcc will use the following compilers for host code compilation:
On Linux platforms: The GNU compiler, gcc
On Windows platforms: The Microsoft Visual Studio compiler, cl
On both platforms, the compiler found on the current execution search path will be
used, unless nvcc option --compiler-bindir is specified (see page 13).
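For example (the installation path shown is hypothetical), a non-default host compiler installation could be selected as follows:

```shell
nvcc -c x.cu --compiler-bindir /usr/local/gcc-3.4/bin
```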
Supported build environments
Nvcc can be used in the following build environments:
Linux Any shell
Windows DOS shell
Windows CygWin shells, use nvcc's drive prefix options (see page 14).
Windows MinGW shells, use nvcc's drive prefix options (see page 14).
Although a variety of POSIX-style shells is supported on Windows, nvcc will still
assume the Microsoft Visual Studio compiler for host compilation. Use of gcc is not
supported on Windows.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define ACOS_TESTS      (5)
#define ACOS_THREAD_CNT (128)
#define ACOS_CTA_CNT    (96)

struct acosParams {
    float *arg;
    float *res;
    int n;
};

__global__ void acos_main (struct acosParams parms)
{
    int i;
    int totalThreads = gridDim.x * blockDim.x;
    int ctaStart     = blockDim.x * blockIdx.x;
    /* Grid-stride loop: each thread handles every totalThreads-th element */
    for (i = ctaStart + threadIdx.x; i < parms.n; i += totalThreads) {
        parms.res[i] = acosf(parms.arg[i]);
    }
}

int main (int argc, char *argv[])
{
    float *acosRes = 0;
    float *acosArg = 0;
    float *arg = 0;
    float *res = 0;
    struct acosParams funcParams;
    int errors = 0;
    int i;

    /* Allocate device and host buffers, and fill the input array */
    cudaMalloc ((void **)&acosArg, ACOS_TESTS * sizeof(float));
    cudaMalloc ((void **)&acosRes, ACOS_TESTS * sizeof(float));
    arg = (float *) malloc (ACOS_TESTS * sizeof(arg[0]));
    res = (float *) malloc (ACOS_TESTS * sizeof(res[0]));
    for (i = 0; i < ACOS_TESTS; i++) {
        arg[i] = (float) i / (float) ACOS_TESTS;
    }

    cudaMemcpy (acosArg, arg, ACOS_TESTS * sizeof(arg[0]),
                cudaMemcpyHostToDevice);

    funcParams.res = acosRes;
    funcParams.arg = acosArg;
    funcParams.n   = ACOS_TESTS;

    acos_main<<<ACOS_CTA_CNT,ACOS_THREAD_CNT>>>(funcParams);

    cudaMemcpy (res, acosRes, ACOS_TESTS * sizeof(res[0]),
                cudaMemcpyDeviceToHost);

    /* Compare the device results against the host library function */
    for (i = 0; i < ACOS_TESTS; i++) {
        if (fabsf (res[i] - acosf (arg[i])) > 1e-6f) {
            errors++;
        }
    }
    printf ("%d errors\n", errors);

    cudaFree (acosArg);
    cudaFree (acosRes);
    free (arg);
    free (res);
    return 0;
}
Figure 1: Example of CUDA source file
Compilation Phases
Nvcc identification macro
Nvcc predefines the macro __CUDACC__. This macro can be used in sources to
test whether they are currently being compiled by nvcc.
Nvcc phases
A compilation phase is a logical translation step that can be selected by
command line options to nvcc. A single compilation phase can still be broken up by
nvcc into smaller steps, but these smaller steps are 'just' implementations of the
phase: they depend on seemingly arbitrary capabilities of the internal tools that nvcc
uses, and all of these internals may change with a new release of the CUDA Toolkit.
Hence, only compilation phases are stable across releases, and although nvcc
provides options to display the compilation steps that it executes, these are for
debugging purposes only and must not be copied into build scripts.
Nvcc phases are selected by a combination of command line options and input file
name suffixes, and the execution of these phases may be modified by other command
line options. In phase selection, the input file suffix defines the phase input, while
the command line option defines the required output of the phase.
The following paragraphs will list the recognized file name suffixes and the
supported compilation phases. A full explanation of the nvcc command line options
can be found in the next chapter.
Supported input file suffixes
The following table defines how nvcc interprets its input files:

.cu              CUDA source file, containing host code and device functions
.cup             Preprocessed CUDA source file, containing host code and device functions
.c               'C' source file
.cc, .cxx, .cpp  C++ source file
.gpu             GPU intermediate file
.ptx             PTX intermediate assembly file
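As a sketch of how suffixes and options combine to select a phase, the following invocations (using a hypothetical source file x.cu) each stop the trajectory at a different point; the individual options are explained in the next chapter:

```shell
nvcc -cuda x.cu    # translate the .cu file to an intermediate host source file
nvcc -ptx x.cu     # generate the x.ptx intermediate assembly file
nvcc -c x.cu       # compile through to an object file
```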