Microprocessors and Microsystems 43 (2016) 26–46
Contents lists available at ScienceDirect
Microprocessors and Microsystems
journal homepage: www.elsevier.com/locate/micpro
A design methodology and various performance and fabrication metrics evaluation of 3D Network-on-Chip with multiplexed Through-Silicon Vias
Mostafa Said (a, ∗), Ahmed Shalaby (b), Farhad Mehdipour (c), Morteza Biglari-Abhari (d), Mohamed El-Sayed (b)
(a) Department of Electrical Engineering, Faculty of Engineering, Assiut University, Assiut, Egypt
(b) Department of Electronics and Communications, Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
(c) E-JUST Center, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
(d) Department of Electrical & Computer Engineering, University of Auckland, Auckland, New Zealand
Article info
Article history: Received 30 June 2015; Revised 13 November 2015; Accepted 13 January 2016; Available online 3 February 2016
Keywords: 3D Network-on-Chip; Traffic patterns; Through-Silicon Vias (TSVs); Fabrication yield and cost
Abstract
The use of short Through-Silicon Vias (TSVs) in 3D integration technology significantly reduces routing area, power consumption, and delay. However, 3D integration technology still faces several challenges, chiefly low yield, which is a direct result of the extra fabrication steps required for TSVs. Reducing the TSV count therefore has a considerable effect on improving yield and hence reducing cost. A TSV multiplexing technique called TSVBOX was introduced in Said et al. (2013) to reduce the TSV count without affecting the direct benefits of TSVs. The TSVBOX introduces some delay to the multiplexed signals, but the effect of this delay has not yet been addressed. In this paper, we analyze the TSVBOX timing requirements and propose a design methodology for TSVBOX-based 3D Network-on-Chip (NoC). Performance and power comparisons are then conducted to investigate the direct effects of TSV multiplexing on these two metrics. After that, the basic fabrication metrics are compared to investigate the effect of the proposed design methodology on yield and cost. We show that the TSVBOX greatly improves the fabrication metrics at minimal degradation in performance and power consumption, especially for Hotspot-like traffic patterns.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
Conventional 2D integration has many limitations for today's large systems. For example, long wires increase power consumption and routing area, and they make it very difficult to distribute the clock signal with minimum delay across a large system. Such problems leave 2D integration unable to keep following Moore's law [2].
On the other hand, 3D integration is an emerging technology that can mitigate the main limitations of conventional 2D integration. However, several challenges still need further research before this promising technology becomes mature and reliable. One of these challenges is reliability in terms of yield and cost. 3D-ICs show very low yield due to the extra fabrication steps for bonding dies or wafers to each other to create the 3D stack. These extra steps may result in faulty TSVs, for example misaligned or partially filled ones [3]. What makes the situation worse is that the probability of having faulty TSVs increases as the total number of TSVs increases. Hence, finding a technique that reduces the TSV count without affecting the benefits gained by 3D integration is very important. In [1,4,5] the TSV count has been reduced by multiplexing, serialization, or virtualization, respectively. The TSV multiplexing technique introduced in [1] multiplexes every two 3D signals¹ into one signal and passes this signal through one TSV instead of two as in conventional 3D-ICs, so the number of TSVs is reduced by almost 50%. Owing to this significant reduction in TSV count, the yield analysis in [1] revealed a very high improvement over conventional 3D-ICs.
∗ Corresponding author. Tel.: +20 1096112888. E-mail addresses: mossaied2@gmail.com, mostafa.saied@ejust.edu.eg (M. Said), ahmed.shalaby@ejust.edu.eg (A. Shalaby), farhad@ejust.kyushu-u.ac.jp (F. Mehdipour), m.abhari@auckland.ac.nz (M. Biglari-Abhari), m.ragab@ejust.edu.eg (M. El-Sayed).
¹ A 3D signal is a signal that traverses from one layer to another in the 3D stack.
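The dependence of yield on TSV count described in the Introduction can be illustrated with a deliberately simple model. The sketch below assumes that every TSV fails independently with the same probability; this independence assumption, the function name `tsv_stack_yield`, and the numbers used are illustrative only and are not taken from the paper, whose own yield and cost analysis is given in Section 8.

```python
def tsv_stack_yield(num_tsvs: int, p_fail: float) -> float:
    """Probability that all TSVs are fault-free.

    Assumption (not from the paper): TSV faults are independent and
    every TSV has the same failure probability p_fail.
    """
    if num_tsvs < 0:
        raise ValueError("num_tsvs must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    return (1.0 - p_fail) ** num_tsvs


# Hypothetical numbers: 1000 TSVs versus 500 TSVs after 2-to-1 multiplexing.
p = 1e-4
full = tsv_stack_yield(1000, p)
halved = tsv_stack_yield(500, p)
print(f"TSV yield with 1000 TSVs: {full:.4f}")
print(f"TSV yield with  500 TSVs: {halved:.4f}")
```

Under this model, halving the TSV count replaces the yield Y by its square root, which is always an improvement when Y is below one.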
http://dx.doi.org/10.1016/j.micpro.2016.01.011
0141-9331/© 2016 Elsevier B.V. All rights reserved.
Table 1
Differences and extra contributions of this extended work in comparison to the work of Said et al. [8].

| Facets of comparison | Work of Said et al. [8] | This paper |
|---|---|---|
| 3D NoC adopted | 3 × 3 × 2 | 4 × 4 × 4 |
| Switching technique | Store-and-forward | Wormhole |
| Virtual channels (VCs) | Single buffer per port | 2 VCs per port |
| Technology adopted | 180 nm | Finer 65 nm technology |
| TSV capacitance | Fixed at 15 fF | Considered as a parameter, from 15 fF to 500 fF, to reflect different TSV technologies |
| TSVBOX multiplexing ratio used | Fixed at 2 × 1 | Considered as a general parameter $N_{MUX} \times 1$, $N_{MUX} \geq 2$ |
| Yield and cost analysis | – | Included in Section 8 |
| Analytical analysis of the effect of different traffic patterns on the performance of TSVBOX | – | Included in Section 7.4 |
| Power consumption evaluation | – | Included in Section 9.5 |
| Scalability analysis | – | Included in Section 7 |
The investigation in [1] covered area, yield, cost, and power consumption analysis. In [6,7], other physical effects of reducing the TSV count are studied. In [6], the impact of reducing the TSV count on maximum temperature is investigated; as expected, the maximum temperature increases as the TSV count decreases. TSVs are usually fabricated from materials with low thermal resistivity, such as copper or tungsten [2], so reducing the TSV count increases the total thermal resistance of the 3D stack. In [7], the residual thermal stress created during the bonding process in fabrication is studied in order to determine the Keep-Out-Zone area overhead around TSVs accurately and thereby estimate the yield of the 3D stack.
The TSVBOX uses an extra selection signal ( S ) to control the multiplexer (MUX) and the demultiplexer (DeMUX). This S signal introduces some delay to one of the multiplexed signals, in addition to the delay caused by the parasitics of the TSVBOX itself. Such delay may affect the functional validity of a system implemented using TSVBOX. Although [1,6,7] address most of the issues related to TSV multiplexing and show the advantages and limitations of this technique, the timing requirements and a design methodology based on TSV multiplexing have not been studied yet. None of these related works shows a system-level implementation of a circuit using TSVBOX, so its functionality and applicability at the system level have not been proven yet.
Due to its scalability and novelty as a multicore communication architecture for future multiprocessor SoCs, 3D NoC is selected as our target system architecture for applying the TSVBOX technique. In this paper, the timing requirements for TSVBOX-based 3D NoC are investigated so that the delay introduced by the TSVBOX is mitigated and both performance degradation and incorrect operation are avoided. A design methodology for TSVBOX-based 3D NoCs is also introduced. For the sake of comparison, two versions of the 3D NoC are introduced: one based on conventional 3D integration without TSV multiplexing, and one that is TSVBOX-based. Finally, the main aspects of the NoC architecture in terms of performance and power are investigated under different simulation scenarios.
The contributions of this paper can be summarized in the following points:
• Introducing a low-level circuit model for the TSVBOX, showing the possible RC parasitics of its different components, in order to estimate the TSVBOX delay and power consumption.
• Studying the timing requirements of the TSVBOX, and determining the selection signal properties and its relation to the main clock signal in the case of 3D NoCs.
• Introducing a complete design methodology with the detailed steps required to design a 3D NoC involving TSVBOX.
• Investigating the most important aspects of the 3D NoC, namely performance and power consumption, and showing the cases in which the TSVBOX does not affect these metrics.
• Proposing analytical models to compare the basic fabrication metrics, yield and cost, of the conventional and TSVBOX-based 3D NoCs.
Finally, this work substantially extends the work of Said et al. [8]. All differences between this work and the work of Said et al. [8] are listed in Table 1. According to our estimation, the extra work amounts to more than 60% of the work of Said et al. [8].
The rest of this paper is organized as follows: Section 2 presents the details of the TSVBOX technique. Section 3 explores the architecture and design of the target 3D NoC and the 3D router used. Sections 4 and 5 highlight the TSVBOX parasitic model and various design aspects. Section 6 introduces the TSVBOX timing requirements analysis and the TSVBOX-based 3D NoC design methodology. The scalability analysis is investigated in Section 7, while the fabrication yield and cost are analyzed in Section 8. Simulation results are presented and discussed in Section 9. Finally, Section 10 concludes the paper.
2. TSVBOX
Fig. 1a shows the TSVBOX structure. As shown in Fig. 1b, the two inputs of the TSVBOX multiplexer (MUX) are the two signals ($V_1$, $V_2$) that are to be multiplexed through a single TSV. The S signal (Fig. 1c) is the signal that controls the MUX and DeMUX, and its clock period $T_S$ is at least double the delay of the TSVBOX, $T_S \geq 2\,T_{d\text{-}TSVBOX}$, where $T_{d\text{-}TSVBOX}$ is the delay from the input point $V_1$ ($V_2$) to the output point $V'_1$ ($V'_2$). Assuming that $V_2$ is selected during the first half cycle, the TSVBOX circuit will hold the charge of $V_2$ during the second half cycle, therefore

$$V'_2(t) = V_2\left(t - \frac{T_S}{2}\right) \tag{1}$$

During the second half cycle, similar behavior is repeated for $V_1$ but with a further time shift of $T_S/2$, due to waiting for the selection process of the first half cycle, so at the end we have

$$V'_1(t) = V_1(t - T_S) \tag{2}$$
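The time shifts of Eqs. (1) and (2) can be sketched behaviorally as follows. This is a minimal illustration rather than the circuit model used in the paper: the helper names `min_select_period`, `tsvbox_outputs`, and `step`, and the 50 ps TSVBOX delay, are hypothetical, and the signals are ideal functions of time with no RC effects.

```python
from typing import Callable, Tuple

Signal = Callable[[float], float]


def min_select_period(t_d_tsvbox: float) -> float:
    """Smallest selection period allowed by T_S >= 2 * T_d-TSVBOX."""
    return 2.0 * t_d_tsvbox


def tsvbox_outputs(v1: Signal, v2: Signal, t_s: float) -> Tuple[Signal, Signal]:
    """Return the delayed outputs of V1 and V2 as functions of time."""

    def v1_out(t: float) -> float:
        return v1(t - t_s)          # Eq. (2): shift by a full period T_S

    def v2_out(t: float) -> float:
        return v2(t - t_s / 2.0)    # Eq. (1): shift by half a period

    return v1_out, v2_out


def step(t: float) -> float:
    """Ideal unit step used as a test input (hypothetical stimulus)."""
    return 1.0 if t >= 0.0 else 0.0


t_s = min_select_period(50e-12)     # hypothetical 50 ps TSVBOX delay
v1_out, v2_out = tsvbox_outputs(step, step, t_s)
for t_ps in (10, 60, 120):
    t = t_ps * 1e-12
    print(f"t = {t_ps:3d} ps  V1 out = {v1_out(t)}  V2 out = {v2_out(t)}")
```

With both inputs stepping at t = 0, the V2 output follows after half a selection period and the V1 output after a full period, which is the ordering the two equations describe.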
Fig. 2 shows all the TSVBOX signals and how $V_1$ and $V_2$ are affected by the TSVBOX delays. For more details about the TSVBOX functionality, refer to Ref. [1]. In [1] it is assumed that the delay incurred by the TSVBOX satisfies $T_S \ll T_{CLK}$, so this delay can be neglected and the multiplexed voltages are read correctly. In a real application this situation is feasible, for example, when neither of the 3D paths of $V_1$ and $V_2$ is part of the critical path and their delay is less than the critical path delay by at least $T_S$. However, the general situation in which $T_S$ is comparable to the clock period is not addressed in [1]. The TSVBOX must therefore fulfill some timing requirements for the sake of correct operation. In this paper, these conditions and requirements are studied for the 3D NoC architecture.
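The condition under which the TSVBOX delay can be neglected, namely that a multiplexed 3D path is shorter than the critical path by at least $T_S$, amounts to a simple slack check. The sketch below is only an illustration of that inequality; the function name `tsvbox_delay_is_hidden` and all delay values are hypothetical.

```python
def tsvbox_delay_is_hidden(path_delay: float,
                           critical_path_delay: float,
                           t_s: float) -> bool:
    """True when the 3D path has at least T_S of slack to the critical path."""
    return path_delay + t_s <= critical_path_delay


# Hypothetical delays in seconds.
critical = 2.0e-9
t_s = 0.2e-9
print(tsvbox_delay_is_hidden(1.5e-9, critical, t_s))  # enough slack
print(tsvbox_delay_is_hidden(1.9e-9, critical, t_s))  # slack smaller than T_S
```

When the check fails, the path falls into the general case that the timing analysis of this paper targets.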
Fig. 1. (a) TSVBOX schematic, (b) TSVBOX circuit implementation, (c) selection signal S.
3. The target 3D NoC architecture
Fig. 3 shows the target 3D NoC architecture, which relies on a 4 × 4 × 4 mesh topology. An 80-core NoC was presented in [9], so our 64-router target NoC matches the trend in the NoC domain. For simplicity, each router has a core concentration of one [10], which means that only one processor core is connected to the local port of the router. To achieve strong fairness between the internal requests of the router, separable allocation is adopted with Round-Robin arbitration [11,12]. Each injected packet consists of five flits, and each flit is 64 bits wide. The head flit contains the routing information, while the others carry the data. We choose wormhole switching and XYZ deterministic routing as the 3D NoC switching technique and routing algorithm [13], respectively. Each input port in the router contains two virtual channels, each one flit deep, while the local virtual channel buffers are assumed to have infinite size in order to isolate the traffic injection process from the NoC. Therefore, the delay between generating a flit and its entry into the source buffer is accounted for, as well as the delay in the source queue buffers [14]. According to [15,16], the total system area can be assumed to be 400 mm², so with four layers the area of each layer of the target 3D NoC can be assumed to be 10 × 10 mm². Hence, with four routers along each side, the length of the horizontal interconnect wires between two neighboring routers is 2.5 mm [16]. For the vertical interconnects, we treat the TSV capacitance as a parameter in our simulations. A change in TSV capacitance reflects a change in TSV length and in the resistivity of the substrate bulk used, which in turn reflects different 3D integration technologies [17–20].
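As an illustration of XYZ deterministic routing in a 4 × 4 × 4 mesh, the sketch below lists the routers a packet visits when it corrects the X coordinate first, then Y, then Z. It is a minimal model written for this explanation, not the router implementation of the paper; the function name `xyz_route` and the coordinate convention (X, Y in-plane, Z across layers) are assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def xyz_route(src: Coord, dst: Coord, size: Coord = (4, 4, 4)) -> List[Coord]:
    """Routers visited from src to dst, resolving X, then Y, then Z."""
    for c, n in zip(src + dst, size + size):
        if not 0 <= c < n:
            raise ValueError("coordinate outside the mesh")
    path = [src]
    cur = list(src)
    for axis in range(3):                      # 0 = X, 1 = Y, 2 = Z
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step
            path.append((cur[0], cur[1], cur[2]))
    return path


route = xyz_route((0, 0, 0), (2, 1, 3))
print(route)
vertical_hops = sum(1 for a, b in zip(route, route[1:]) if a[2] != b[2])
print("vertical (TSV) hops:", vertical_hops)
```

Only the hops that change the Z coordinate cross TSVs, so they are the ones affected by TSV multiplexing.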
Fig. 2. Various TSVBOX signals: $S(t)$, $V_1(t)$, $V_2(t)$, $V'_1(t)$, and $V'_2(t)$.
Fig. 3. 4 × 4 × 4 3D NoC architecture.
The data bus width $N_{BW}$ is assumed to be equal to the flit size, as shown in Fig. 5a and b. For the conventional 3D NoC, the whole 3D data bus width is $N_{BW} + 2$, where the extra two bits are required for the handshaking communication protocol, which needs request ( REQ ) and acknowledgement ( ACK ) signals [21]. For the TSVBOX case, the data bits of the packet are multiplexed and hence $\frac{N_{BW}}{2} + 2$ TSVs are required, including the REQ and ACK signals. However, each vertical bus needs two extra TSVs to transfer the S signal and its inverted version $\bar{S}$. Therefore, the vertical connection bus width is $\frac{N_{BW}}{2} + 4$ for the TSVBOX-based 3D NoC. Any two neighboring routers are connected by two opposite unidirectional channels. Thus the vertical port contains $2(N_{BW} + 2)$ TSVs for the conventional 3D NoC, and $2\left(\frac{N_{BW}}{2} + 4\right)$ for the TSVBOX case.
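The TSV counts per vertical port can be checked with a few lines of arithmetic. The sketch below simply evaluates the two expressions for a 64-bit flit; the function name `tsvs_per_vertical_port` is hypothetical, and only the 2-to-1 multiplexing ratio described in this section is covered.

```python
def tsvs_per_vertical_port(n_bw: int, multiplexed: bool) -> int:
    """TSVs in one vertical port (two opposite unidirectional channels)."""
    if multiplexed:
        if n_bw % 2:
            raise ValueError("2-to-1 multiplexing needs an even bus width")
        # N_BW/2 data TSVs + REQ + ACK + S + inverted S
        per_channel = n_bw // 2 + 4
    else:
        # N_BW data TSVs + REQ + ACK
        per_channel = n_bw + 2
    return 2 * per_channel


N_BW = 64  # flit size in bits
conventional = tsvs_per_vertical_port(N_BW, multiplexed=False)  # 132
tsvbox = tsvs_per_vertical_port(N_BW, multiplexed=True)         # 72
saving = 1.0 - tsvbox / conventional
print(conventional, tsvbox, f"{saving:.1%}")
```

Because the REQ, ACK, S, and inverted S lines are not multiplexed, the saving per port for a 64-bit bus is somewhat below the 50% obtained for the data lines alone.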
Since SPICE models for such a system would be too complicated and time consuming in both design and simulation, SystemC-A is used for our 3D NoC implementation. SystemC, supported by the Open SystemC Initiative (OSCI)², is an open source language available to meet the ever-increasing needs of system-level design and SoC technologies. Using SystemC-A, high-level and low-level implementations can be combined in one system. For processor cores, routers, and intra-layer interconnects, we use a behavioral system implementation, while for inter-layer interconnects (the vertical connections), we rely on a low-level circuit implementation in order to estimate delays and power consumption.
² http://www.systemc-ams.org/
Fig. 4. (a) The 3D signal path and circuit models of (b) TSV, (c) global wiring, (d) NMOS, and (e) PMOS.
4. Modeling
As shown in Fig. 4a, the 3D signal is assumed to pass through an input inverter driver, a global wiring segment in the first layer, a TSV, a global wiring segment in the second layer, and an output inverter driver. The output inverter driver is always assumed to be a 1x-inverter (minimum size inverter). For the TSV and wiring circuit models, the models introduced in [2,22] are used, which are shown in Fig. 4b and c, respectively.
The TSVBOX is composed of MUX and DeMUX circuits with a TSV in between. Each of the MUX and the DeMUX is composed of two transmission gates, and each transmission gate is composed of two transistors. Therefore, to model the TSVBOX, a transistor circuit model that captures the transistors' RC parasitics is required. Referring to the work in [23], the transistor parasitics can be modeled as shown in Fig. 4d and e. The parasitics of this model are: ON resistance of NMOS ($R_{onN}$), ON resistance of PMOS ($R_{onP}$), NMOS source/drain-bulk capacitance ($C_{sbN}/C_{dbN}$), PMOS source/drain-bulk capacitance ($C_{sbP}/C_{dbP}$), NMOS gate capacitance ($C_{gN}$), and PMOS gate capacitance ($C_{gP}$). According to Weste and Harris [23], $C_{sbN} = C_{dbN} = C_N$ for NMOS and $C_{sbP} = C_{dbP} = C_P$ for PMOS. Also, for equivalent NMOS and PMOS sizes, the NMOS and PMOS gate capacitances are equal, so $C_{gP} = C_{gN} = C_g$.
4.1. Conventional 3D-IC 3D signal path modeling
The conventional 3D-IC 3D signal path is shown in Fig. 6, where the signal is assumed to pass through an inverter driver (represented by its ON resistance $R_{dr\text{-}Conv}$), a global wiring segment in the first layer, a TSV, a global wiring segment in the second layer, and a load capacitance, which is assumed to be the input gate capacitance of a 1x-inverter driver in the second layer.
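A first-order feel for how the TSV capacitance influences the delay of such a path can be obtained with an Elmore delay estimate. The sketch below is an assumption-laden illustration: it lumps each element into one series resistance followed by one shunt capacitance, which is not necessarily the TSV and wiring model of [2,22], and every resistance and capacitance value except the 15 fF and 500 fF TSV capacitance bounds is hypothetical.

```python
from typing import Sequence, Tuple


def elmore_delay(stages: Sequence[Tuple[float, float]]) -> float:
    """Elmore delay of an RC ladder.

    Each stage is (series resistance in ohms, shunt capacitance in farads),
    ordered from the driver towards the load.
    """
    delay = 0.0
    r_upstream = 0.0
    for r, c in stages:
        r_upstream += r            # total resistance from source to this node
        delay += r_upstream * c    # each capacitor sees all upstream resistance
    return delay


def conventional_path(c_tsv: float) -> float:
    """Driver -> wire -> TSV -> wire -> 1x-inverter load (lumped, hypothetical values)."""
    r_dr, r_wire, c_wire = 1.0e3, 100.0, 200e-15
    r_tsv, c_load = 0.1, 1e-15
    stages = [(r_dr, 0.0), (r_wire, c_wire), (r_tsv, c_tsv),
              (r_wire, c_wire), (0.0, c_load)]
    return elmore_delay(stages)


for c_tsv in (15e-15, 500e-15):    # TSV capacitance range considered in the paper
    print(f"C_TSV = {c_tsv * 1e15:5.0f} fF  delay ~ {conventional_path(c_tsv) * 1e12:.1f} ps")
```

The same helper could be reused for the TSVBOX path by adding stages for the transmission-gate resistances and capacitances, again under the lumped-element assumption.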
4.2. TSVBOX 3D signal path modeling
Fig. 7 shows the TSVBOX circuit model. It is similar to the circuit of the conventional 3D signal path; the difference is that the equivalent RC parasitic circuits of the transistors in the MUX and DeMUX are included. Since there are no transistor models in SystemC-A, the transistors of the transmission gates are modeled using perfect switches. The S signal controls the upper transmission gates of the MUX and the DeMUX, and its inverted version $\bar{S}$ controls the lower ones. Therefore the lower and upper transmission gates switch ON or OFF exclusively, as required in the original TSVBOX design. The S signal path shown in Fig. 7 is similar to the conventional 3D signal path. However, since the S signal drives the gates of the transmission gates, each S path has four gate capacitances, $4C_g$, in its load: $2C_g$ from the MUX and $2C_g$ from the DeMUX.
5. Design parameters and parasitics
In this section, parasitic values and technology parameters are introduced and the design considerations are detailed.
5.1. Technology parasitics and parameters
In this study, 65 nm is selected as our target technology. The technology parasitics and parameters comprise the NMOS and PMOS parasitics shown in Fig. 4 and also their threshold voltages.
Fig. 5. Full duplex transmission for (a) conventional and (b) TSVBOX-based 3D NoC.
Fig. 6. Conventional 3D NoC 3D signal path.