GPU Working Principles: Analysis Material (GPU工作原理分析资料.pdf)


In the early 1990s, ubiquitous interactive 3D graphics was still the stuff of science fiction. By the end of the decade, nearly every new computer contained a graphics processing unit (GPU) dedicated to providing a high-performance, visually rich, interactive 3D experience. This dramatic shift was th
HOW THINGS WORK

...well as a short default program that uses these units to perform vertex transformation and lighting.

GeForce 3 also introduced limited reconfigurability into pixel processing, exposing the texturing hardware's functionality as a set of register combiners that could achieve novel visual effects such as the "soap-bubble" look demonstrated in Figure 1. Subsequent GPUs introduced increased flexibility, adding support for longer programs, more registers, and control-flow primitives such as branches, loops, and subroutines.

Figure 1. Programmable shading. The introduction of programmable shading in 2001 led to several visual effects not previously possible, such as this simulation of refractive chromatic dispersion for a "soap bubble" effect.

The ATI Radeon 9700 (July 2002) and NVIDIA GeForce FX (January 2003) replaced the often awkward register combiners with fully programmable pixel shaders. NVIDIA's latest chip, the GeForce 8800 (November 2006), adds programmability to the primitive assembly stage, allowing developers to control how they construct triangles from transformed vertices. As Figure 2 shows, modern GPUs achieve stunning visual realism.

Figure 2. Unprecedented visual realism. Modern GPUs can use programmable shading to achieve near-cinematic realism, as this interactive demonstration shows, featuring actress Adrianne Curry on an NVIDIA GeForce 8800 GTX.

Increases in precision have accompanied increases in programmability. The traditional graphics pipeline provided only 8-bit integers per color channel, allowing values ranging from 0 to 255. The ATI Radeon 9700 increased the representable range of color to 24-bit floating point, and NVIDIA's GeForce FX followed with both 16-bit and 32-bit floating point. Both vendors have announced plans to support 64-bit double-precision floating point in upcoming chips.

To keep up with the relentless demand for graphics performance, GPUs have aggressively embraced parallel design. GPUs have long used four-wide vector registers much like Intel's Streaming SIMD Extensions (SSE) instruction sets now provide on Intel CPUs. The number of such four-wide processors executing in parallel has increased as well, from only four on GeForce FX to 16 on GeForce 6800 (April 2004) to 24 on GeForce 7800 (May 2005). The GeForce 8800 actually includes 128 scalar shader processors that also run on a special shader clock at 2.5 times the clock rate (relative to pixel output) of former chips, so the computational performance might be considered equivalent to 128 × 2.5 / 4 = 80 four-wide pixel shaders.

UNIFIED SHADERS

The latest step in the evolution from hardwired pipeline to flexible computational fabric is the introduction of unified shaders. Unified shaders were first realized in the ATI Xenos chip for the Xbox 360 game console, and NVIDIA introduced them to PCs with the GeForce 8800 chip.

Instead of separate custom processors for vertex shaders, geometry shaders, and pixel shaders, a unified shader architecture provides one large grid of data-parallel floating-point processors general enough to run all these shader workloads. As Figure 3 shows, vertices, triangles, and pixels recirculate through the grid rather than flowing through a pipeline with stages of fixed width.

Figure 3. Graphics pipeline evolution. The NVIDIA GeForce 8800 GPU replaces the traditional graphics pipeline with a unified shader architecture in which vertices, triangles, and pixels recirculate through a set of programmable processors. The flexibility and computational power of these processors invites their use for general-purpose computing tasks.

This configuration leads to better overall utilization because demand for the various shaders varies greatly between applications, and indeed even within a single frame of one application. For example, a videogame might begin an image by using large triangles to draw the sky and distant terrain. This quickly saturates the pixel shaders in a traditional pipeline, while leaving the vertex shaders mostly idle. One millisecond later, the game might use highly detailed geometry to draw intricate characters and objects. This behavior will swamp the vertex shaders and leave the pixel shaders mostly idle.

These dramatic oscillations in resource demands in a single image present a load-balancing nightmare for the game designer and can also vary unpredictably as the players' viewpoint and actions change. A unified shader architecture, on the other hand, can allocate a varying percentage of its pool of processors to each shader type.

For this example, a GeForce 8800 might use 90 percent of its 128 processors as pixel shaders and 10 percent as vertex shaders while drawing the sky, then reverse that ratio when drawing a distant character's geometry. The net result is a flexible parallel architecture that improves GPU utilization and provides much greater flexibility for game designers.

GPGPU

The highly parallel workload of real-time computer graphics demands extremely high arithmetic throughput and streaming memory bandwidth but tolerates considerable latency in an individual computation since final images are only displayed every 16 milliseconds. These workload characteristics have shaped the underlying GPU architecture: Whereas CPUs are optimized for low latency, GPUs are optimized for high throughput.

The raw computational horsepower of GPUs is staggering: A single GeForce 8800 chip achieves a sustained 330 billion floating-point operations per second (Gflops) on simple benchmarks (http://graphics.stanford.edu/projects/gpubench). The ever-increasing power, programmability, and precision of GPUs have motivated a great deal of research on general-purpose computation on graphics hardware (GPGPU for short). GPGPU researchers and developers use the GPU as a computational coprocessor rather than as an image-synthesis device.

The GPU's specialized architecture isn't well suited to every algorithm. Many applications are inherently serial and are characterized by incoherent and unpredictable memory access. Nonetheless, many important problems require significant computational resources, mapping well to the GPU's many-core arithmetic intensity, or they require streaming through large quantities of data, mapping well to the GPU's streaming memory subsystem.

Porting a judiciously chosen algorithm to the GPU often produces speedups of five to 20 times over mature, optimized CPU codes running on state-of-the-art CPUs, and speedups of more than 100 times have been reported for some algorithms that map especially well.

Notable GPGPU success stories include Stanford University's Folding@home project, which uses spare cycles that users around the world donate to study protein folding (http://folding.stanford.edu). A new GPU-accelerated Folding@home client contributed 28,000 Gflops in the month after its October 2006 release, more than 18 percent of the total Gflops that CPU clients contributed running on Microsoft Windows since October 2000.

In another GPGPU success story, researchers at the University of North Carolina and Microsoft used GPU-based code to win the 2006 Indy PennySort category of the TeraSort competition, a sorting benchmark testing price/performance for database operations (http://gamma.cs.unc.edu/GPUTERASORT). Closer to home for the GPU business, the HavokFX product uses GPGPU techniques to accelerate tenfold the physics calculations used to add realistic behavior to objects in computer games (www.havok.com).

Modern GPUs could be seen as the first generation of commodity data-parallel processors. Their tremendous computational capacity and rapid growth curve, far outstripping traditional CPUs, highlight the advantages of domain-specialized data-parallel computing.

We can expect increased programmability and generality from future GPU architectures, but not without limit; neither vendors nor users want to sacrifice the specialized architecture that made GPUs successful in the first place. Today, GPU developers need new high-level programming models for massively multithreaded parallel computation, a problem soon to impact multicore CPU vendors as well.

Can GPU vendors, graphics developers, and the GPGPU research community build on their success with commodity parallel computing to transcend their computer graphics roots and develop the computational idioms, techniques, and frameworks for the desktop parallel computing environment of the future?

David Luebke is a research scientist at NVIDIA Research. Contact him at dluebke@nvidia.com.

Greg Humphreys is a faculty member in the Computer Science Department at the University of Virginia. Contact him at humper@cs.virginia.edu.

Computer welcomes your submissions to this bimonthly column. For additional information, or to suggest topics that you would like to see explained, contact column editor Alf Weaver at weaver@cs.virginia.edu.

Computer Welcomes Your Contribution

Computer, the flagship publication of the IEEE Computer Society, publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications. Articles selected for publication in Computer are edited to enhance readability for the nearly 100,000 computing professionals who receive this monthly magazine. Readers depend on Computer to provide current, unbiased, thoroughly researched information on the newest directions in computing technology. To submit a manuscript for peer review, see Computer's author guidelines: www.computer.org/computer/author.htm
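Two figures from the article lend themselves to a quick numerical check: the estimate that 128 scalar processors at 2.5 times the clock are worth 80 four-wide units, and the claim that a unified pool is better utilized than fixed stages during the 90/10 "sky" phase. The Python sketch below is only an illustration under simplifying assumptions that are not in the article: work is treated as perfectly divisible, scheduling overhead is ignored, and the even 64/64 fixed split is a hypothetical baseline (the article gives no fixed split for a 128-processor chip). The function names are likewise made up for this example.

```python
def four_wide_equivalent(scalar_units: int, clock_ratio: float, width: int = 4) -> float:
    """Express scalar processors on a faster clock as four-wide units at the base clock."""
    return scalar_units * clock_ratio / width


def allocate(total_units: int, pixel_share: float) -> tuple[int, int]:
    """Split a unified pool into (pixel_units, vertex_units) for a given pixel share."""
    pixel_units = round(total_units * pixel_share)
    return pixel_units, total_units - pixel_units


def utilization(pixel_units: int, vertex_units: int, pixel_share: float) -> float:
    """Fraction of processor-time spent busy while one unit of work drains.

    Idealized model (an assumption of this sketch): work divides perfectly
    within each shader type, and the phase ends when the slower group finishes.
    """
    vertex_share = 1.0 - pixel_share
    phase_time = max(pixel_share / pixel_units, vertex_share / vertex_units)
    return 1.0 / ((pixel_units + vertex_units) * phase_time)


if __name__ == "__main__":
    # The article's estimate: 128 * 2.5 / 4
    print(four_wide_equivalent(128, 2.5))  # 80.0

    sky_pixel_share = 0.9  # "sky" phase from the article: 90 percent pixel work

    # Hypothetical fixed pipeline with an even 64/64 split
    print(utilization(64, 64, sky_pixel_share))

    # Unified pool re-divided to match the demand
    print(utilization(*allocate(128, sky_pixel_share), sky_pixel_share))
```

In this toy model the fixed 64/64 split keeps only a little over half of the processors busy during the sky phase, while the re-divided pool stays almost fully busy. Reversing the share to 0.1 gives the mirror-image result for the detailed-geometry phase.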
