DeepComp: towards a balanced system design for high performance computer systems


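Cluster computing is today the dominant architecture for high performance computer systems (HPCs), with a share of more than 80% in recent TOP500 lists. Clusters became mainstream thanks to a series of technical advances: progress in central processing unit (CPU) chips, operating system optimization, improved interconnection networks, and high Linpack and application efficiencies achieved through balanced system design, application algorithm optimization, and runtime optimization. Balanced system design is critical for achieving high efficiency in large scale cluster systems. The paper discusses practical experience with balanced system design on the DeepComp high performance computer systems and presents methodologies for designing large scale balanced cluster systems. In particular, it gives a method for balancing the CPU and the memory hierarchy, and proposes two approaches for balancing computing nodes and I/O systems: the maximum bandwidth criterion, and the maximum number of computing nodes that can concurrently access the I/O system. Experience with Lenovo high performance cluster systems shows that these methods are effective. The paper also discusses Lenovo's strategies for the balanced design of peta and 10 peta scale high productivity computing systems (HPCSs).

From an architecture point of view, a system is called balanced if there is no bottleneck in any key data channel and every device can obtain the data it needs from its data suppliers in time. Reaching this state requires considering several factors, including but not limited to:

1. CPU performance and energy efficiency: progress in CPU chips is one of the core factors in modern high performance computing systems, and CPU performance directly affects the processing capability of the whole system.
2. Operating system optimization: the way the operating system manages and allocates resources is essential to efficient operation. It must make full use of the hardware and provide the necessary services, such as task scheduling and memory management.
3. Interconnection network design: communication among the nodes of a high performance cluster usually relies on an efficient interconnection network. The design has to deliver both high bandwidth and low latency.
4. High Linpack and application efficiency: Linpack is the standard benchmark for measuring floating-point performance in high performance computing. High Linpack efficiency means that the system uses all available computing resources effectively during large scale parallel computation. Application efficiency is reflected in the execution speed and resource utilization of specific computing tasks.
5. Application algorithm optimization: optimizing algorithms for specific application scenarios raises computing efficiency, reduces unnecessary computation and data transfer, and further improves system performance.
6. Runtime optimization: the runtime system is the software layer that manages and schedules computing tasks. Effective task scheduling, load balancing, and resource management can significantly improve computing efficiency.

When designing a high performance computing system, the key is to balance the CPU and the memory hierarchy so that data flows efficiently among all components. The design needs to pay particular attention to the following:

- Avoiding bottlenecks in the data channels: every hardware component, such as CPUs, memory, storage devices, and network interfaces, needs enough throughput to serve the requests of the other components.
- Making full use of the data suppliers: every device should be able to obtain data from its data suppliers in time for its own needs. This involves data caching, prefetching, and exploiting locality.
- Providing efficient access to the I/O system: I/O is often a bottleneck in high performance computing. An efficient I/O system is therefore needed to support concurrent data access by a large number of computing nodes with high bandwidth and low latency.

Lenovo's strategies and experience with high performance computing systems indicate that systematically balancing hardware design, the operating system, the network architecture, and software optimization is an effective way to meet the design goals of a high performance cluster system. These balanced design principles and techniques are essential for both peta scale and 10 peta scale high productivity computing systems.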


RESEARCH ARTICLE

DeepComp: towards a balanced system design for high performance computer systems

Mingfa ZHU (✉) 1,2, Limin XIAO 1,2, Li RUAN, Qinfen HAO

1 State Key Laboratory of Software Development Environment, Beijing 100191, China

2 School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Abstract Today, cluster-based computing is the mainstream architecture for high end computer systems. Balanced system design is critical for large scale cluster systems to achieve high efficiency. This paper addresses the practice of balanced system design on DeepComp high end computer systems. Methodologies for designing balanced large scale cluster systems are given. A method for balancing the central processing unit (CPU) and the memory hierarchy is addressed. For balancing computing nodes and I/O systems, two approaches are given: the maximum bandwidth criterion and the maximum number of computing nodes that can concurrently access the I/O systems. Experience with Lenovo high end cluster systems shows that the above methods are effective. Lenovo strategies toward a balanced system design for both peta and 10 peta scale high productivity computing systems (HPCSs) are also discussed.

Keywords high performance computer systems (HPCs), high productivity computing systems (HPCSs), cluster, balanced system design

1 Introduction

Since the middle of this decade, the cluster has been the mainstream architecture for high performance computer systems (HPCs), or high end computer systems, and has held a share greater than 80% in recent world TOP500 lists. A number of factors helped this happen, including technical progress in central processing unit (CPU) chips, operating systems, and interconnection networks, and high Linpack and application efficiencies, which are achieved by a balanced system design, application algorithm optimization, and runtime optimization.

From an architecture point of view, a system is said to be a balanced system if there is no bottleneck in any key data channel and all devices are able to obtain data from data suppliers in time for their own purposes. In a cluster system, the main data channels are between the CPUs and the main memory banks, within computing nodes, among computing nodes, and between computing nodes and I/O systems, or storage (RAID disk). In other words, a cluster system is a balanced system if the following are true: the data bandwidth of the memory hierarchy meets the needs of all CPUs in any and all computing nodes, the power of the communication system matches the computing power of all computing nodes, and the bandwidth of the I/O system meets the needs of all computing nodes.
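The three balance conditions above can be read as supply/demand ratios, one per data channel. The following is a minimal sketch of that reading, not the method used for DeepComp: the `ClusterSpec` fields, the function names, and all numbers are hypothetical, and it assumes each channel's demand can be summarized by a single bandwidth figure in GB/s.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterSpec:
    """Hypothetical cluster figures (assumptions); all bandwidths in GB/s."""
    nodes: int
    cpus_per_node: int
    mem_bw_per_node: float      # bandwidth the memory hierarchy can supply per node
    mem_demand_per_cpu: float   # bandwidth each CPU needs from memory
    net_bw_per_node: float      # interconnect bandwidth available per node
    net_demand_per_node: float  # communication bandwidth each node needs
    io_bw_total: float          # aggregate bandwidth of the I/O system
    io_demand_per_node: float   # I/O bandwidth each node needs


def balance_report(spec: ClusterSpec) -> Dict[str, float]:
    """Return supply/demand per channel; a ratio below 1.0 marks a bottleneck."""
    return {
        "memory": spec.mem_bw_per_node
        / (spec.cpus_per_node * spec.mem_demand_per_cpu),
        "network": spec.net_bw_per_node / spec.net_demand_per_node,
        "io": spec.io_bw_total / (spec.nodes * spec.io_demand_per_node),
    }


def bottlenecks(spec: ClusterSpec) -> List[str]:
    """Names of the channels whose supply falls short of demand."""
    return [name for name, ratio in balance_report(spec).items() if ratio < 1.0]


def is_balanced(spec: ClusterSpec) -> bool:
    """Balanced, in this simplified reading, means no channel is a bottleneck."""
    return not bottlenecks(spec)


if __name__ == "__main__":
    # Illustrative numbers only; they do not describe any real system.
    spec = ClusterSpec(
        nodes=100, cpus_per_node=2,
        mem_bw_per_node=20.0, mem_demand_per_cpu=8.0,
        net_bw_per_node=2.0, net_demand_per_node=1.0,
        io_bw_total=10.0, io_demand_per_node=0.5,
    )
    print(balance_report(spec))
    print("balanced:", is_balanced(spec), "bottlenecks:", bottlenecks(spec))
```

With these illustrative numbers, memory and network supply exceed demand while the shared I/O system does not, so the sketch reports I/O as the only bottleneck. Real workloads rarely reduce to one demand figure per channel, so the ratios are best treated as a first-order screening aid.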

Today, Moore’s Law still holds for CPU speed. Although Moore’s Law also holds for main memory capacity, memory access speed increases much more slowly (about 10% each year), and the gap between CPU speed and memory speed keeps widening. In large scale cluster systems, there are serious bottlenecks between the large number of computing nodes and the I/O systems, and bottlenecks also exist in communication systems, which include interconnection network hardware and message passing software packages. Therefore, a serious issue in large scale cluster design is system balance, which includes balance between CPU computing power and memory data supply power, balance between node computing power

Received August 17, 2010; accepted September 18, 2010

E-mail: zhumf@buaa.edu.cn

Front. Comput. Sci. China 2010, 4(4): 475–479

DOI 10.1007/s11704-010-0150-z
