High Radix On-Chip Networks at Incremental Reconfiguration Costs
Animesh Jain, Ritesh Parikh and Valeria Bertacco
Department of Computer Science and Engineering, University of Michigan
{anijain, parikh, valeria}@umich.edu
ABSTRACT
Networks-on-chip (NoCs) have become increasingly widespread in re-
cent years due to the extensive integration of many components in mod-
ern multicore processors and SoC designs. One of the fundamental
tradeoffs in NoC design is the radix of its constituent routers. While
high radix routers enable a richly connected and low diameter network,
low radix routers provide simple and low power designs. Today, NoC
designs take significant silicon area and may consume up to 30% of the
entire chip power budget; thus, naïvely deploying an expensive high-
radix network is no longer possible.
In this work, we present HiROIC¹ (High Radix On-chip Networks at
Incremental re-Configuration Cost), which provides high-radix-like
performance at a cost similar to that of a low-radix network. HiROIC
leverages the irregularity in runtime communication patterns to provide
short, low-latency paths between frequently communicating nodes, while
infrequently communicating pairs take longer paths. To this end, HiROIC
proposes a flexible topology reconfiguration infrastructure in which the
abundantly available links between routers (in accordance with a
high-radix topology) are decoupled from the scarcely available router
ports (similar to a low-radix topology). The link-to-port binding is done
at runtime, based on traffic patterns, using low-overhead multiplexers.
HiROIC employs a statistics collection and decision-making framework
to orchestrate the topology modifications that maximize performance.
While our solution may require some additional and/or longer links, we
observe that links are not the timing bottleneck in contemporary router
pipelines, and links that are not coupled to ports can be power-gated
to save power. HiROIC ensures a globally connected and deadlock-free
network at all times. Our experiments on a 64-node CMP, running
multi-programmed workloads, show that HiROIC reduces average network
latency by 21% over an area- and power-comparable mesh NoC.
1. INTRODUCTION
As a result of increasing integration of components into CMP and
SoC architectures, networks-on-chip (NoCs) have become the dom-
inant choice for on-chip interconnects, due to the highly concurrent
communication paths and better scalability they provide. Moreover,
to keep up with the communication demands of the cores/IPs on-chip,
NoCs are increasingly incorporating bulky and power-hungry resources,
required to meet target latency and bandwidth goals.
One such design decision is the radix of the routers in the topology,
that is, the number of I/O ports that a router provides to connect links
to adjacent routers. High-radix routers enable low-diameter topologies
and allow the processing nodes to be connected closely, with packets
traversing just a few routers to reach their destination. On the
downside, router components such as the crossbar and allocators grow
quadratically in area with the radix of the router. In addition,
high-radix routers lead to increased signal propagation latencies and
slower operating frequencies. A popular alternative is to deploy
low-radix routers, as in mesh topologies. Typically, routers up to a
radix of five (e.g., mesh) are considered low-radix, while larger ones
are considered high-radix. Low-radix routers can be clocked at a
substantially higher rate than their high-radix counterparts. For
example, according to the models provided in [17], a radix-7 router has
a 4.1% higher cycle time compared to a radix-5 router. Unfortunately,
for many-core architectures, low-radix topologies could lead to large
network diameters and prohibitively high hop counts. Using low-radix
routers particularly hurts performance when applications do not have
sufficient memory-level parallelism (MLP) to hide the higher latency.
The radix of the router is therefore an important design decision that
directly affects latency, area and power targets.
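To make the tradeoff above concrete, the following minimal sketch computes the diameter and average hop count of a k x k mesh and the relative crossbar area of two radices. It is an illustration only: the function names are hypothetical, a hop is counted as one link traversal under minimal (Manhattan) routing, and the area model assumes purely quadratic growth with the radix, ignoring all other router components.

```python
from itertools import product


def mesh_hop_stats(k):
    """Return (diameter, average hop count) of a k x k mesh.

    Assumption: minimal routing, so the hop count between two nodes is
    their Manhattan distance; pairs with identical source and destination
    are excluded from the average.
    """
    nodes = list(product(range(k), repeat=2))
    dists = [abs(ax - bx) + abs(ay - by)
             for (ax, ay) in nodes
             for (bx, by) in nodes
             if (ax, ay) != (bx, by)]
    return max(dists), sum(dists) / len(dists)


def crossbar_area_ratio(radix_a, radix_b):
    """Relative crossbar area, assuming area grows with the square of the radix."""
    return (radix_a / radix_b) ** 2


diameter, avg_hops = mesh_hop_stats(8)
print(diameter, round(avg_hops, 2))          # 8x8 mesh: diameter and mean hops
print(round(crossbar_area_ratio(7, 5), 2))   # radix-7 vs radix-5 crossbar area
```

Under these assumptions, a 64-node mesh has a diameter of 14 hops and an average of about 5.33 hops, while moving from radix 5 to radix 7 roughly doubles the crossbar area.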
With HiROIC we want to provide the best of both classes of topologies:
low and high radix. HiROIC provides an effective network diameter on par
with high-radix topologies, while only utilizing resources comparable to
low-radix routers. HiROIC exploits the non-uniformity of communication
patterns to provide short, low-latency paths only between heavily
communicating nodes, while it forces the low-traffic source-destination
pairs to use longer paths. Therefore HiROIC provides, on average, a
small hop count for packets traversing the network, similar to
high-radix topologies. Naturally, the greater the disparity in
communication load between source-destination pairs, the greater is
HiROIC's effectiveness. With the increasing integration of
application-specific components, the location and quantity of heavily
used routing paths is likely to be highly unbalanced both across and
within applications. We therefore envision great potential for the
deployment of HiROIC in upcoming CMP and SoC designs.
¹ ROIC is a popular economics acronym for 'Return on Invested Capital.'
HiROIC is synonymous with high-ROIC, since we provide a high performance
return for a given area/power budget dedicated to NoCs.
HiROIC uses routing and topology reconfiguration to optimize for
high-volume source-destination pairs. At the heart of HiROIC is the
concept of link-port decoupling. Unlike traditional routers, HiROIC's
routers do not statically bind their ports to links. Rather, this
binding is applied at runtime, depending on the communication demands
of the application. HiROIC deploys links abundantly, in accordance with
a high-radix topology, to potentially provide short paths between any
source-destination pair. However, HiROIC's routers still maintain the
internal port count of low-radix networks. HiROIC deploys an additional
layer of glue logic to bind ports to links at runtime. These bindings
are used for one epoch of execution, after which HiROIC evaluates
whether the current topology is suitable for upcoming traffic patterns.
If not, binding decisions are re-evaluated to globally optimize for the
new communication patterns. In essence, our infrastructure's ability
to realize many irregular or regular topologies is leveraged to adapt to
the application's demands at runtime. While HiROIC's wiring overhead
is greater than that of conventional topologies like meshes, due to
longer and additional wires, we observe that wires are never the timing
bottleneck in conventional router pipelines [10]. In addition, unused
wires can be power-gated once the port-to-link binding decisions are
finalized.
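The idea of binding a limited number of ports to an abundant set of links can be illustrated with a small greedy sketch. This is not the decision-making framework of the paper, which this section does not describe; it is a simplified stand-in that assumes a per-router port budget, treats traffic as undirected, and deliberately ignores the connectivity and deadlock-freedom guarantees that HiROIC provides. The function name `bind_links` and all parameters are hypothetical.

```python
def bind_links(candidate_links, traffic, ports_per_router):
    """Greedily choose which wired links to bind to ports for one epoch.

    candidate_links: list of (u, v) router pairs that are physically wired.
    traffic: dict mapping (src, dst) to the traffic volume observed in the
             previous epoch; missing pairs count as 0.
    ports_per_router: number of non-local ports each router can bind.
    Returns the set of links selected for binding.

    Assumption: this sketch does NOT check global connectivity or
    deadlock freedom.
    """
    def volume(link):
        u, v = link
        # Sum both directions, since a link serves traffic either way.
        return traffic.get((u, v), 0) + traffic.get((v, u), 0)

    free_ports = {}
    bound = set()
    # Visit the busiest candidate links first.
    for u, v in sorted(candidate_links, key=volume, reverse=True):
        if (free_ports.setdefault(u, ports_per_router) > 0
                and free_ports.setdefault(v, ports_per_router) > 0):
            free_ports[u] -= 1
            free_ports[v] -= 1
            bound.add((u, v))
    return bound


links = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
observed = {(0, 3): 50, (3, 0): 10, (1, 2): 40, (0, 1): 5}
print(sorted(bind_links(links, observed, ports_per_router=1)))
```

In this toy setup with one port per router, only the two busiest links are bound; the remaining candidate links correspond to the unused wires that the text suggests could be power-gated until the next epoch.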
Note that, in typical NoCs, routers have one local port (sometimes
more) connecting to the processing node(s). Since the connection to
the processing node is essential, HiROIC uses a fixed port-link binding
for local ports. In the rest of this paper we exclude the local port(s)
when reporting the radix of the router.
[Figure 1: two line plots. Y axis: network traffic (in %). X axis: cumulative source-destination pairs (in %), sorted from high to low traffic share. Left panel: X from 0 to 100, Y from 0 to 100. Right panel: X from 0 to 12, Y from 0 to 75, annotated "> 60%". A "Region of interest" is marked in both panels.]
Figure 1: Fraction of traffic load shared by the most exercised source-destination pairs.
The top 10% of source-destination pairs transfer more than 60% of the total network
traffic; therefore, HiROIC targets this pool for topology optimization.
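A curve of the kind shown in Figure 1 can be derived from per-pair traffic counts as sketched below. This is a minimal illustration under stated assumptions: the input is a dictionary of traffic volumes keyed by (source, destination), total traffic is nonzero, and the function names are hypothetical rather than taken from the authors' tooling.

```python
def cumulative_traffic_share(pair_traffic):
    """Return [(pct_of_pairs, pct_of_traffic), ...], busiest pairs first.

    pair_traffic: dict mapping (src, dst) to a traffic volume.
    Assumption: at least one pair has nonzero traffic.
    """
    volumes = sorted(pair_traffic.values(), reverse=True)
    total = sum(volumes)
    curve = []
    running = 0
    for rank, vol in enumerate(volumes, start=1):
        running += vol
        curve.append((100.0 * rank / len(volumes), 100.0 * running / total))
    return curve


def share_of_top(pair_traffic, top_pct):
    """Percentage of total traffic carried by the top `top_pct` percent of pairs."""
    volumes = sorted(pair_traffic.values(), reverse=True)
    top_n = max(1, round(len(volumes) * top_pct / 100.0))
    return 100.0 * sum(volumes[:top_n]) / sum(volumes)


# Toy data: ten pairs, one of which dominates the traffic.
toy = {("n0", "n%d" % i): v
       for i, v in enumerate([70, 6, 5, 4, 3, 3, 3, 2, 2, 2], start=1)}
print(share_of_top(toy, 10))
```

With real traces, `share_of_top(trace, 10)` would give the quantity that the caption reports for the top 10% of source-destination pairs.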
HiROIC's effectiveness depends on a large disparity in traffic between
heavily used source-destination pairs and all other pairs. To verify
that this disparity exists, we conducted a study whose findings are
plotted in Figure 1. The plot shows the contribution of traffic flowing
between each source-destination pair. Our testbed consisted of an 8x8
mesh CMP running a multiprogrammed mix of applications from the SPEC
CPU2006 suite. Source-destination pairs are sorted by decreasing traffic
activity during the execution, and the plot on the left indicates what
fraction of network traffic (Y axis) was carried by a given fraction of
sorted source-destination pairs. The plot on the right is an enlargement
of the contribution by the top 12% of source-destination pairs: less
than 10% of the source-destination pairs share as much as 60% of the
traffic load on average. Beyond the tenth percentile of utilization,
this disparity is no longer obvious. Thus, HiROIC's goal is to identify
and leverage the 10% most used source-destination pairs to provide short
and high-bandwidth paths between them. This, in turn, minimizes the effective