How GPUs Work: Analysis Material (GPU工作原理分析资料.pdf)


In the early 1990s, ubiquitous interactive 3D graphics was still the stuff of science fiction. By the end of the decade, nearly every new computer contained a graphics processing unit (GPU) dedicated to providing a high-performance, visually rich, interactive 3D experience. This dramatic shift was th
HOW THINGS WORK

...as well as a short default program that uses these units to perform vertex transformation and lighting.

GeForce 3 also introduced limited reconfigurability into pixel processing, exposing the texturing hardware's functionality as a set of register combiners that could achieve novel visual effects such as the "soap-bubble" look demonstrated in Figure 1. Subsequent GPUs introduced increased flexibility, adding support for longer programs, more registers, and control-flow primitives such as branches, loops, and subroutines.

Figure 1. Programmable shading. The introduction of programmable shading in 2001 led to several visual effects not previously possible, such as this simulation of refractive chromatic dispersion for a "soap bubble" effect.

The ATI Radeon 9700 (July 2002) and NVIDIA GeForce FX (January 2003) replaced the often awkward register combiners with fully programmable pixel shaders. NVIDIA's latest chip, the GeForce 8800 (November 2006), adds programmability to the primitive assembly stage, allowing developers to control how they construct triangles from transformed vertices. As Figure 2 shows, modern GPUs achieve stunning visual realism.

Figure 2. Unprecedented visual realism. Modern GPUs can use programmable shading to achieve near-cinematic realism, as this interactive demonstration shows, featuring actress Adrianne Curry on an NVIDIA GeForce 8800 GTX.

Increases in precision have accompanied increases in programmability. The traditional graphics pipeline provided only 8-bit integers per color channel, allowing values ranging from 0 to 255. The ATI Radeon 9700 increased the representable range of color to 24-bit floating point, and NVIDIA's GeForce FX followed with both 16-bit and 32-bit floating point. Both vendors have announced plans to support 64-bit double-precision floating point in upcoming chips.

To keep up with the relentless demand for graphics performance, GPUs have aggressively embraced parallel design. GPUs have long used four-wide vector registers much like Intel's Streaming SIMD Extensions (SSE) instruction sets now provide on Intel CPUs. The number of such four-wide processors executing in parallel has increased as well, from only four on GeForce FX to 16 on GeForce 6800 (April 2004) to 24 on GeForce 7800 (May 2005). The GeForce 8800 actually includes 128 scalar shader processors that also run on a special shader clock at 2.5 times the clock rate (relative to pixel output) of former chips, so the computational performance might be considered equivalent to 128 × 2.5/4 = 80 four-wide pixel shaders.

UNIFIED SHADERS

The latest step in the evolution from hardwired pipeline to flexible computational fabric is the introduction of unified shaders. Unified shaders were first realized in the ATI Xenos chip for the Xbox 360 game console, and NVIDIA introduced them to PCs with the GeForce 8800 chip.

Instead of separate custom processors for vertex shaders, geometry shaders, and pixel shaders, a unified shader architecture provides one large grid of data-parallel floating-point processors general enough to run all these shader workloads. As Figure 3 shows, vertices, triangles, and pixels recirculate through the grid rather than flowing through a pipeline with stages of fixed width.

Figure 3. Graphics pipeline evolution. The NVIDIA GeForce 8800 GPU replaces the traditional graphics pipeline with a unified shader architecture in which vertices, triangles, and pixels recirculate through a set of programmable processors. The flexibility and computational power of these processors invites their use for general-purpose computing tasks. (Diagram labels: 3D geometric primitives; GPU; programmable unified processors running vertex, geometry, pixel, and compute programs; rasterization; hidden surface removal; GPU memory; final image.)

This configuration leads to better overall utilization because demand for the various shaders varies greatly between applications, and indeed even within a single frame of one application. For example, a videogame might begin an image by using large triangles to draw the sky and distant terrain. This quickly saturates the pixel shaders in a traditional pipeline, while leaving the vertex shaders mostly idle. One millisecond later, the game might use highly detailed geometry to draw intricate characters and objects. This behavior will swamp the vertex shaders and leave the pixel shaders mostly idle.

These dramatic oscillations in resource demands in a single image present a load-balancing nightmare for the game designer and can also vary unpredictably as the players' viewpoint and actions change. A unified shader architecture, on the other hand, can allocate a varying percentage of its pool of processors to each shader type.

For this example, a GeForce 8800 might use 90 percent of its 128 processors as pixel shaders and 10 percent as vertex shaders while drawing the sky, then reverse that ratio when drawing a distant character's geometry. The net result is a flexible parallel architecture that improves GPU utilization and provides much greater flexibility for game designers.

GPGPU

The highly parallel workload of real-time computer graphics demands extremely high arithmetic throughput and streaming memory bandwidth but tolerates considerable latency in an individual computation since final images are only displayed every 16 milliseconds. These workload characteristics have shaped the underlying GPU architecture: Whereas CPUs are optimized for low latency, GPUs are optimized for high throughput.

The raw computational horsepower of GPUs is staggering: A single GeForce 8800 chip achieves a sustained 330 billion floating-point operations per second (Gflops) on simple benchmarks (http://graphics.stanford.edu/projects/gpubench). The ever-increasing power, programmability, and precision of GPUs have motivated a great deal of research on general-purpose computation on graphics hardware, or GPGPU for short. GPGPU researchers and developers use the GPU as a computational coprocessor rather than as an image-synthesis device.

The GPU's specialized architecture isn't well suited to every algorithm. Many applications are inherently serial and are characterized by incoherent and unpredictable memory access. Nonetheless, many important problems require significant computational resources, mapping well to the GPU's many-core arithmetic intensity, or they require streaming through large quantities of data, mapping well to the GPU's streaming memory subsystem.

Porting a judiciously chosen algorithm to the GPU often produces speedups of five to 20 times over mature, optimized CPU codes running on state-of-the-art CPUs, and speedups of more than 100 times have been reported for some algorithms that map especially well.

Notable GPGPU success stories include Stanford University's Folding@home project, which uses spare cycles that users around the world donate to study protein folding (http://folding.stanford.edu). A new GPU-accelerated Folding@home client contributed 28,000 Gflops in the month after its October 2006 release, more than 18 percent of the total Gflops that CPU clients contributed running on Microsoft Windows since October 2000.

In another GPGPU success story, researchers at the University of North Carolina and Microsoft used GPU-based code to win the 2006 Indy PennySort category of the TeraSort competition, a sorting benchmark testing price/performance for database operations (http://gamma.cs.unc.edu/GPUTERASORT). Closer to home for the GPU business, the HavokFX product uses GPGPU techniques to accelerate tenfold the physics calculations used to add realistic behavior to objects in computer games (www.havok.com).

Modern GPUs could be seen as the first generation of commodity data-parallel processors. Their tremendous computational capacity and rapid growth curve, far outstripping traditional CPUs, highlight the advantages of domain-specialized data-parallel computing.

We can expect increased programmability and generality from future GPU architectures, but not without limit; neither vendors nor users want to sacrifice the specialized architecture that made GPUs successful in the first place. Today, GPU developers need new high-level programming models for massively multithreaded parallel computation, a problem soon to impact multicore CPU vendors as well.

Can GPU vendors, graphics developers, and the GPGPU research community build on their success with commodity parallel computing to transcend their computer graphics roots and develop the computational idioms, techniques, and frameworks for the desktop parallel computing environment of the future?

David Luebke is a research scientist at NVIDIA Research. Contact him at dluebke@nvidia.com.

Greg Humphreys is a faculty member in the Computer Science Department at the University of Virginia. Contact him at humper@cs.virginia.edu.

Computer welcomes your submissions to this bimonthly column. For additional information, or to suggest topics that you would like to see explained, contact column editor Alf Weaver at weaver@cs.virginia.edu.

Computer Welcomes Your Contribution

Computer, the flagship publication of the IEEE Computer Society, publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications. Articles selected for publication in Computer are edited to enhance readability for the nearly 100,000 computing professionals who receive this monthly magazine. Readers depend on Computer to provide current, unbiased, thoroughly researched information on the newest directions in computing technology. To submit a manuscript for peer review, see Computer's author guidelines: www.computer.org/computer/author.htm
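As a worked illustration of two pieces of arithmetic in the article, the short Python sketch below recomputes the GeForce 8800 estimate of 128 × 2.5/4 = 80 four-wide pixel shaders and the 90/10 split of a 128-processor unified pool. The function names four_wide_equivalent and unified_split are hypothetical, and rounding the split to whole processors is an assumption made here for illustration; the article does not describe how real hardware schedules its processors.

```python
def four_wide_equivalent(scalar_processors, clock_ratio, width=4):
    """Express scalar processors running at a faster clock as an
    equivalent number of `width`-wide processors at the base clock.

    Hypothetical helper: it only mirrors the article's back-of-envelope
    estimate, not any vendor's performance model.
    """
    return scalar_processors * clock_ratio / width


def unified_split(total_processors, pixel_fraction):
    """Split a unified pool into (pixel, vertex) processor counts.

    Assumption: fractions are rounded to whole processors and the
    remainder goes to vertex work.
    """
    pixel = round(total_processors * pixel_fraction)
    return pixel, total_processors - pixel


if __name__ == "__main__":
    # 128 scalar processors at 2.5x the clock, compared with four-wide units.
    print(four_wide_equivalent(128, 2.5))   # 80.0

    # Drawing the sky: about 90 percent pixel work, 10 percent vertex work.
    print(unified_split(128, 0.9))          # (115, 13)

    # Drawing detailed geometry: the ratio reverses.
    print(unified_split(128, 0.1))          # (13, 115)
```

The same pool of 128 processors serves both cases, which is the point of the unified design: a fixed pipeline would have to pick one split in hardware and leave part of the chip idle whenever the workload swings the other way.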

2019-07-23