Version 2.0.1 – May 2004
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,
“MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE
MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA
Corporation assumes no responsibility for the consequences of use of such
information or for any infringement of patents or other rights of third parties that
may result from its use. No license is granted by implication or otherwise under
any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this
publication are subject to change without notice. This publication supersedes and
replaces all information previously supplied. NVIDIA Corporation products are not
authorized for use as critical components in life support devices or systems without
express written approval of NVIDIA Corporation.
Trademarks
NVIDIA, the NVIDIA logo, GeForce, and NVIDIA Quadro are registered trademarks
of NVIDIA Corporation. Other company and product names may be trademarks of
the respective companies with which they are associated.
Copyright
© 2004 by NVIDIA Corporation. All rights reserved.
NVIDIA GPU Programming Guide
Table of Contents
Chapter 1. About This Document
1.1. Introduction
1.2. Sending Feedback
Chapter 2. How to Optimize Your Application
2.1. Making Accurate Measurements
2.2. Finding the Bottleneck
2.2.1. Understanding Bottlenecks
2.2.2. Basic Tests
2.2.3. Using NVPerfHUD
2.3. Bottleneck: CPU
2.4. Bottleneck: GPU
Chapter 3. General GPU Performance Tips
3.1. List of Tips
3.2. Batching
3.2.1. Use Fewer Batches
3.3. Vertex Shader
3.3.1. Use Indexed Primitive Calls
3.4. Shaders
3.4.1. Choose the Lowest Pixel Shader Version That Works
3.4.2. Compile Pixel Shaders Using the ps_2_a Profile
3.4.3. Choose the Lowest Data Precision That Works
3.4.4. Save Computations by Using Algebra
3.4.5. Don’t Pack Vector Values into Scalar Components of Multiple Interpolants
3.4.6. Don’t Write Overly Generic Library Functions
3.4.7. Don’t Compute the Length of Normalized Vectors
3.4.8. Fold Uniform Constant Expressions
3.4.9. Don’t Use Uniform Parameters for Constants That Won’t Change Over the Life of a Pixel Shader
3.4.10. Balance the Vertex and Pixel Shaders
3.4.11. Push Linearizable Calculations to the Vertex Shader If You’re Bound by the Pixel Shader
3.4.12. Use the mul() Standard Library Function
3.4.13. Use D3DTADDRESS_CLAMP (or GL_CLAMP_TO_EDGE) Instead of saturate() for Dependent Texture Coordinates
3.4.14. Use Lower-Numbered Interpolants First
3.5. Texturing
3.5.1. Use Mipmapping
3.5.2. Use Trilinear and Anisotropic Filtering Prudently
3.5.3. Replace Complex Functions with Texture Lookups
3.6. Performance
3.6.1. Double-Speed Z-Only and Stencil Rendering
3.6.2. Early-Z Optimization
3.6.3. Lay Down Depth First
3.6.4. Allocating Memory
3.7. Antialiasing
Chapter 4. GeForce 6 Series Programming Tips
4.1. Shader Model 3.0 Support
4.1.1. Pixel Shader 3.0
4.1.2. Vertex Shader 3.0
4.1.3. Dynamic Branching
4.1.4. Easier Code Maintenance
4.1.5. Instancing
4.1.6. Summary
4.2. sRGB Encoding
4.3. Separate Alpha Blending
4.4. Supported Texture Formats
4.5. Floating-Point Textures
4.5.1. Limitations
4.6. Multiple Render Targets (MRTs)
4.7. Identifying GPUs
4.8. Vertex Texturing
4.9. Hardware Shadow Maps
Chapter 5. GeForce FX Programming Tips
5.1. Vertex Shaders
5.2. Pixel Shader Length
5.3. DirectX-Specific Pixel Shaders
5.4. OpenGL-Specific Pixel Shaders
5.5. Using 16-Bit Floating-Point
5.6. Supported Texture Formats
5.7. Using ps_2_x and ps_2_a in DirectX
5.8. Using Floating-Point Render Targets
5.9. Changed Driver Behavior of Shadow Maps in DirectX
5.10. Newer Chips and Architectures
5.11. Summary
Chapter 6. Performance Tools Overview
6.1. NVPerfHUD
6.2. NVShaderPerf
6.3. FX Composer
6.4. Questions and Feedback
Chapter 7. GPU Codename and Product Name List