| INTRODUCTION TO OPENCL | OCTOBER 23, 2013 | PUBLIC2
OPENCL PERFORMANCE CONSIDERATION ON GPUS
CPU + dGPU with OpenCL has obvious bottlenecks
‒ CPU/GPU data movement is a side effect
‒ dGPU has limited memory size
‒ CPU + dGPU cooperation carries noticeable overhead under the OpenCL runtime
Try to narrow the side effects down as much as possible
‒ CPU/GPU data movement over PCIe or another bus is the introduced overhead
‒ Double buffering or an APU platform is the ideal way to reduce that overhead
Ideas for tuning overall system performance deserve attention
‒ Double buffering for dGPU
‒ APU platform for eliminating CPU/GPU data movement
‒ HSA makes CPU/GPU cooperation more seamless
AGENDA
OpenCL system performance
‒ CPU/GPU data movement
‒ OpenCL runtime overhead
APU architecture and OpenCL optimization
HSA and OpenCL optimization
CPU/GPU DATA MOVEMENT
On a typical CPU + dGPU platform, a single buffer serializes data movement and computing:
[Figure: timeline of Data in → Compute → Data out, repeated for each batch]
CPU <-> GPU data movement takes additional time; this is the introduced side effect
The side effect is even worse when:
‒ Data movement time is significantly larger than kernel time
‒ Or data movement time is even larger than the CPU computing time
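The serialized Data in → Compute → Data out timeline can be put into a small timing model. The sketch below is not OpenCL code; it is a plain Python calculation, and the function names and per-batch timings (in milliseconds) are hypothetical values chosen only to illustrate the case where data movement dominates kernel time.

```python
def single_buffer_time(n_batches, t_in, t_kernel, t_out):
    """Total time when copy-in, kernel, and copy-out run strictly one after another."""
    return n_batches * (t_in + t_kernel + t_out)


def transfer_fraction(t_in, t_kernel, t_out):
    """Share of each batch spent moving data rather than computing."""
    return (t_in + t_out) / (t_in + t_kernel + t_out)


# Hypothetical timings in milliseconds (illustrative only, not measured).
total = single_buffer_time(n_batches=4, t_in=3.0, t_kernel=2.0, t_out=3.0)
share = transfer_fraction(t_in=3.0, t_kernel=2.0, t_out=3.0)
print(f"total = {total} ms, transfer share = {share:.0%}")
```

With these assumed numbers the GPU kernel accounts for only a quarter of the wall-clock time, which is the situation the bullets above describe as the worst case.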
OPENCL APPLICATION OPTIMIZATION
DOUBLE BUFFERING
A very useful, common technique
‒ One buffer is used for computing while the other buffer is being filled with data
‒ This overlaps compute time with CPU/GPU data movement time
‒ Especially useful on CPU + dGPU platforms
With the AMD OpenCL implementation, DMA is asynchronous
‒ Use two command queues, one for buffer enqueue operations and another for kernel operations
‒ Use events to synchronize
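The two-queue, event-synchronized scheme can be sketched as a scheduling model. This is a pure Python simulation rather than OpenCL host code: `double_buffer_schedule` and its parameters are hypothetical names, and the timings are made-up values. In a real host program the "fill" step would roughly correspond to `clEnqueueWriteBuffer` on the transfer queue, and the "kernel" step to `clEnqueueNDRangeKernel` on the kernel queue with the fill's event in its wait list.

```python
def double_buffer_schedule(n_batches, t_fill, t_kernel, n_buffers=2):
    """Simulate two in-order queues synchronized by completion events.

    Returns (fill_done, kernel_done): completion time of each batch's
    buffer fill and of each batch's kernel.
    """
    fill_done, kernel_done = [], []
    for i in range(n_batches):
        # Transfer queue is in-order: fill i starts after fill i-1.
        start = fill_done[i - 1] if i > 0 else 0.0
        # Buffer (i % n_buffers) is free only once the kernel that
        # last used it has finished (event from the kernel queue).
        if i >= n_buffers:
            start = max(start, kernel_done[i - n_buffers])
        fill_done.append(start + t_fill)

        # Kernel queue is in-order and waits on the fill event of batch i.
        k_start = fill_done[i]
        if i > 0:
            k_start = max(k_start, kernel_done[i - 1])
        kernel_done.append(k_start + t_kernel)
    return fill_done, kernel_done


# Hypothetical timings in milliseconds (illustrative only).
_, done_double = double_buffer_schedule(4, t_fill=3.0, t_kernel=2.0)
_, done_single = double_buffer_schedule(4, t_fill=3.0, t_kernel=2.0, n_buffers=1)
print("double buffering:", done_double[-1], "ms")
print("single buffer:   ", done_single[-1], "ms")
```

Under these assumptions the overlapped schedule is bounded by the slower of the two stages, while setting `n_buffers=1` collapses back to the fully serialized case, since each fill must wait for the previous kernel to release the only buffer.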