Programming Massively Parallel Processors. DOI: http://dx.doi.org/10.1016/B978-0-12-811986-0.00015-7
Copyright © 2017 David B. Kirk/NVIDIA Corporation and Wen-mei W. Hwu. Published by Elsevier Inc. All rights reserved.
CHAPTER 15
Application case study: molecular visualization and analysis
John Stone
CHAPTER OUTLINE
15.1 Background
15.2 A Simple Kernel Implementation
15.3 Thread Granularity Adjustment
15.4 Memory Coalescing
15.5 Summary
15.6 Exercises
References
The previous case study used a statistical estimation application to illustrate
several techniques: selecting an appropriate level of a loop nest for parallel
execution, transforming the loops to reduce memory access interference, using
constant memory to magnify the memory bandwidth available for read-only data,
using registers to reduce the consumption of memory bandwidth, and using special
hardware functional units to accelerate trigonometric functions. In this case
study, we use a molecular dynamics application based on regular grid data
structures to illustrate additional practical techniques that achieve global
memory access coalescing and improved computation throughput. As in the previous
case study, we present a series of implementations of an electrostatic potential
map calculation kernel, with each version improving upon the previous one by
adopting one or more practical techniques. Some of these techniques are shared
with the previous case study, but some are new: systematic reuse of
computational results, thread granularity coarsening, and fast boundary
condition checking. This case study shows that the effective use of these
practical techniques can significantly improve the execution throughput of the
application.