
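University of Science and Technology of China

Master's Thesis

Research on Performance Optimization on CPU-GPU Heterogeneous Platforms and a Multicore Parallel Programming Model

Name: Chen Bo (陈波)

Degree applied for: Master

Major: Computer Application Technology

Supervisor: Xu Yun (徐云)

2011-04-18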

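Abstract

As the computing power and programmability of graphics processing units (GPUs) continue to improve, general-purpose computing on GPUs (GPGPU) has gradually become a research hotspot. GPGPU computing usually adopts a heterogeneous CPU-GPU mode. Although this heterogeneous mode can deliver good performance gains, program development and performance optimization are far more complex than on homogeneous systems.

Computing on a CPU-GPU heterogeneous system runs into many performance bottlenecks, for example load balancing, synchronization and latency, data locality, and task partitioning. These factors are critical to improving program performance. In addition, although the CUDA programming model has greatly reduced the difficulty of programming CPU-GPU heterogeneous platforms, the barrier to entry is still relatively high for most developers of serial programs. Moreover, when the underlying hardware platform changes, software developers must learn a new programming model and rewrite existing programs for the new platform, which undoubtedly adds to the programmer's burden. Designing an easy-to-use, platform-independent multicore parallel programming model is therefore of great significance.

This thesis mainly carries out the following research work:

(1) We analyze the key factors that affect program performance on CPU-GPU heterogeneous platforms, comprehensively summarize the existing optimization methods, and design our own optimization methods and strategies, such as using atomic functions to synchronize different thread blocks. Each optimization method is verified experimentally and analyzed theoretically. Our method of synchronizing different thread blocks with atomic functions is 4 to 5 times faster than the existing approach of relaunching the kernel function.

(2) To further verify the effect of the optimization methods, and to give a complete account of the program development workflow on CPU-GPU heterogeneous platforms (algorithm design, implementation, performance optimization), we take the local sequence alignment of DNA or proteins in bioinformatics as an example, and design and implement a column-parallel Smith-Waterman algorithm on a CPU-GPU heterogeneous platform. After applying a combination of optimization methods, the parallel program achieves an average speedup of 37x.

(3) After an in-depth analysis of the OpenMM parallel programming framework, we design a library-based, platform-independent multicore parallel programming model. To verify the feasibility and ease of use of this model, we implement a prototype system for scientific computing. A well-designed API hierarchy hides the details of the underlying hardware from users, who only need to select, at compile time, the dynamic link library that matches the underlying hardware platform in order to turn an originally serial program into an efficient parallel one.

Keywords: GPU, GPGPU, CUDA, heterogeneous computing platform, performance optimization, parallel programming model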
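
Contribution (2) above concerns Smith-Waterman local sequence alignment. As a point of reference, the following is a minimal serial sketch in Python of the scoring recurrence, filled one column at a time. It is not the thesis's parallel GPU implementation, which the abstract does not detail. The function name `smith_waterman_score` and the scoring parameters (match +2, mismatch -1, linear gap penalty -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between query and target.

    The scoring matrix H is filled column by column (one column per
    target character); only the previous column is kept in memory.
    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    rows = len(query)
    prev_col = [0] * (rows + 1)  # column j-1 of H, with H[0][j-1] = 0
    best = 0
    for t in target:
        cur_col = [0] * (rows + 1)  # column j of H, with H[0][j] = 0
        for i in range(1, rows + 1):
            sub = match if query[i - 1] == t else mismatch
            cur_col[i] = max(
                0,                      # local alignment may restart anywhere
                prev_col[i - 1] + sub,  # diagonal: align query[i-1] with t
                prev_col[i] + gap,      # from previous column: skip t
                cur_col[i - 1] + gap,   # from same column: skip query[i-1]
            )
            best = max(best, cur_col[i])
        prev_col = cur_col
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

In this sketch each cell depends on the cell directly above it in the same column (`cur_col[i - 1]`), so the cells of one column cannot all be computed independently as written. Any column-parallel scheme has to deal with that dependency; how the thesis does so is not stated in the abstract.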

ABSTRACT

With the continuous increase in the computing power and programmability of graphics processing units (GPUs), general-purpose computing on GPUs (GPGPU) is gradually becoming a research hotspot. GPGPU computing usually adopts a heterogeneous CPU-GPU mode. Although such a heterogeneous system can achieve good performance gains, developing and optimizing programs for it is much more complex than for a homogeneous system.

Computing on a CPU-GPU heterogeneous system encounters many performance bottlenecks, such as load balancing, synchronization and latency, data locality, and task partitioning. These factors are essential to improving program performance. In addition, although the CUDA programming model has greatly reduced the difficulty of programming CPU-GPU heterogeneous systems, the barrier to entry is still high for most developers of serial programs. Moreover, when the underlying hardware changes, software developers have to learn a new programming model and rewrite their programs for the new hardware platform, which increases the burden on programmers. It is therefore of great significance to design a simple, platform-independent multicore parallel programming model.

The main research work of this thesis is as follows:

(1) We analyzed the key factors that affect the performance of CUDA programs, comprehensively summarized the existing optimization methods, and proposed our own optimization methods and strategies, such as using atomic functions to synchronize different thread blocks. We verified each optimization method experimentally and analyzed it theoretically. Our method of using atomic functions to synchronize different thread blocks is 4 to 5 times faster than the existing method of relaunching the kernel function.

(2) To further validate the effectiveness of the various optimization methods, and to describe the development process (algorithm design, programming, performance optimization) on a CPU-GPU heterogeneous platform, we used a CPU-GPU heterogeneous computing platform to solve the bioinformatics problem of DNA or protein sequence alignment; namely, we designed and implemented a new column-based parallel Smith-Waterman algorithm based on
