626 IEEE TRANSACTIONS ON COMPONENTS, HYBRIDS, AND MANUFACTURING TECHNOLOGY, VOL. 16, NO. 7, NOVEMBER 1993
The ELSA Wafer Scale Integration Project
Peter Ivey
Abstract: In this paper we outline some of the technology, successful and unsuccessful, of part of a large European project in wafer scale integration (WSI). The work described is an attempt to build a 64 by 64 array processor on a 4-in wafer. Such a processor would have a computing power in excess of 10 billion operations per second. A test chip and a demonstration system, which achieves such a processing power, are also outlined.
I. INTRODUCTION
The intention of this paper is to provide an overview of the ELSA project (ELSA: the European Large SIMD Array), which was part of the European Community funded Esprit Project 824, Wafer Scale Integration. The aim is to show the successes and failures of the project, which was a major undertaking involving 8 companies, subcontractors, and research institutes in 4 European Community countries, and to allow readers to glimpse the breadth and depth of the investigation. Many of the results have applications in other packaging and large area device projects and are, therefore, of wider interest than to just the wafer scale integration (WSI) community.
Our work can be seen, in some sense, as steps on the road to ultra-large chips and wafer-sized circuits. Whatever the geometry of the underlying silicon, there is no doubt that more can be put onto a "large" chip or wafer than can be put on a conventional device. Achieving large chips economically requires reconfiguration and defect tolerance, and our work has been predominantly a study of techniques and architectures for achieving these goals.
The main objective of the ELSA project was to provide the opportunity to develop and evaluate WSI technology. The final outcome of the project was to be a wafer level SIMD processor, together with the necessary software support tools and an evaluation board. WSI was considered relevant to a number of applications. The wafer has many potential applications, such as in telecommunications, image processing, pattern matching, artificial intelligence, robotics, military, and aerospace.
The range of hardware technologies that was developed and evaluated as part of the whole program was very wide [1], [2], including wafer packaging, defect tolerance, over-wafer copper tracking, e-beam programmable transistors, soft reconfiguration, etc. The aim of this paper is to show how some of these technologies were exploited in the ELSA project.
Manuscript received March 27, 1993; revised July 22, 1993. This work was supported by SGS-Thomson, BT Laboratories, and the European Community. This paper was presented at the International Conference on Wafer Scale Integration, San Francisco, CA, January 20-22, 1993.
The author is with the Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield S1 3JD, U.K.
IEEE Log Number 9212045.
The project consisted of many components. In the first phase we analyzed the architectures that were suitable for WSI [3] and concluded that generality and regularity were essential if a successful "product" was to be produced. In parallel with this, the basic technologies were developed and demonstrated on silicon [4], [5]. The design of the processor was commenced and a prototype chip was developed to allow the basic architecture to be tested [3], [6]. In the second phase of the project the wafer scale device (ELSA) was designed, building on the successful outcome of the demonstrator chip.
A second task involved the development of the software tools necessary to allow applications to be programmed on the array. These tools were designed to be of sufficiently high performance to enable practical use of the final system. This task involved very little risk since the technology of compiler design is well understood and the type of system required is relatively simple compared to a compiler for a general purpose computer or microprocessor.
In a third task, a prototype system was fabricated in order that development of the compiler and simulator could take place. This prototype was then developed further to incorporate the reconfigured wafer. It was intended that at the final stage of the project a number of highly parallel algorithms would be demonstrated on the wafer system.
In this paper we will concentrate on the hardware aspects of the ELSA project, leaving the reconfiguration and software to be covered in [7]. Firstly, in Section II, we will outline the architecture of the wafer, concentrating on the basic processing element and the reconfiguration hardware, while in Section III we will describe the ELSA hardware implementation and packaging. In Section IV we will outline the main results of the project and the lessons learned. In addition we will discuss the ELSA demonstrator and its expected performance by reference to an emulator system which was designed using demonstrator chips. In Section V we will conclude the paper and summarize the overall achievements of the project.
II. ELSA
Large arrays of simple processors have a long history. As early as 1962, an array of 32 by 32 SIMD processors was proposed (but never built) by Slotnick et al. [8]. Later, in 1966, Illiac IV was developed at the University of Illinois [9] and a small (8 by 8) version was built. In 1975, International Computers Ltd. developed the DAP [10], which is a 64 by 64 array of simple SIMD processors, and this was one of the first commercial applications of array computers. In 1983, Goodyear Aerospace produced the Massively Parallel Processor (MPP) for NASA [11] and most recently (in 1984) Martin Marietta and NCR introduced the Geometric Arithmetic Parallel Processor (GAPP) which contains a 6 by 12 array of 1-b SIMD processors on a single chip.
0148-6411/93$03.00 © 1993 IEEE
Fig. 1. The ELSA hierarchy (wafer level, reticle level, chip level).
ELSA is also based on a SIMD architecture; it builds on the successes of these earlier designs and takes advantage of the lessons learned by their developers. Some new features have been introduced which increase the performance by about an order of magnitude over the earlier devices. It is now some years since the ELSA processing element (PE) was specified, and it is undoubtedly true that architectural improvements have occurred in the interim (e.g., [12]) and that, were we to commence ELSA today, the detailed architecture would be different. However, this does not invalidate this work since our main intention was to develop WSI technology.
A. ELSA Architecture
ELSA is structured in a three level hierarchy [1] (Fig. 1). The basic PE's are arranged to create a "chip" level component of 7 × 12 PE's from which, by configuration, a final chip array size of 6 × 12 PE's is constructed. These chips are then arranged to form a reticle level component which contains 4 chips. At this level, reconfiguration switches allow defective chips to be bypassed. The reticle is stepped and repeated across the wafer, as in the step and repeat process used in conventional IC fabrication, except that connections are made between the edges of adjacent reticles. A wafer level metal mask allows the placing of bonding pads around the wafer periphery and connection of these pads to the reticle edge connectors.
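The two lower levels of this hierarchy can be illustrated with a minimal sketch. It assumes, hypothetically, that chip-level configuration works by bypassing whole lines of 12 PE's along the dimension of 7 (called "rows" here); the text states only that a 7 × 12 component is configured down to 6 × 12, not how. The function names and the defect-map representation are likewise illustrative and are not taken from the project.

```python
from typing import List, Optional, Set, Tuple

CHIP_ROWS, CHIP_COLS = 7, 12   # physical PE's per chip (from the text)
TARGET_ROWS = 6                # configured array is 6 x 12 (from the text)
CHIPS_PER_RETICLE = 4          # chips per reticle (from the text)

Defects = Set[Tuple[int, int]]  # (row, col) positions of faulty PE's


def configure_chip(defects: Defects) -> Optional[List[int]]:
    """Return the 6 physical rows to use, or None if the chip is unusable.

    ASSUMPTION: a row containing any faulty PE is bypassed as a whole,
    so a chip survives only if at most one row contains a defect.
    """
    bad_rows = {row for row, _ in defects}
    good_rows = [r for r in range(CHIP_ROWS) if r not in bad_rows]
    if len(good_rows) < TARGET_ROWS:
        return None
    return good_rows[:TARGET_ROWS]


def configure_reticle(chips: List[Defects]) -> List[Optional[List[int]]]:
    """Configure each chip; a None entry marks a chip to be bypassed."""
    return [configure_chip(d) for d in chips]


# Hypothetical defect maps for the 4 chips of one reticle.
reticle = configure_reticle([
    set(),                  # defect free: rows 0..5 used, row 6 spare
    {(3, 5)},               # one bad row: repaired with the spare
    {(0, 1), (4, 2)},       # two bad rows: chip must be bypassed
    {(2, 0), (2, 11)},      # two defects in the same row: repairable
])
usable_pes = sum(TARGET_ROWS * CHIP_COLS for rows in reticle if rows is not None)
```

In this example three of the four chips survive, so the reticle contributes 3 × 72 = 216 working PE's, and the reticle-level switches would route around the third chip.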
The hierarchical structure of the ELSA wafer is shown in Fig. 1. Large numbers of bit serial PE's are configured in a two-dimensional array and will operate at a clock speed of up to 20 MHz. For 8-b integer data, addition can be carried out at about 5 billion operations per second. The wafer employs a 1.2-μm double metal CMOS process.
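The relation between array size, clock rate, and throughput can be checked with a short calculation. The 64 by 64 array size is taken from the abstract and the 20-MHz clock from the text above; the figure of 16 clock cycles per 8-b addition is an assumption, chosen only because it reproduces a rate of about 5 billion additions per second, and is not stated in the text.

```python
def array_throughput(n_pes: int, clock_hz: float, cycles_per_op: int) -> float:
    """Operations per second when every PE completes one operation
    every `cycles_per_op` clock cycles (SIMD: all PE's work in step)."""
    return n_pes * clock_hz / cycles_per_op


N_PES = 64 * 64      # full array size, from the abstract
CLOCK_HZ = 20e6      # maximum clock speed, from the text

# One single-bit instruction per PE per clock cycle.
peak_bit_ops = array_throughput(N_PES, CLOCK_HZ, 1)

# ASSUMPTION: 16 cycles per 8-b addition (hypothetical figure).
ASSUMED_CYCLES_PER_ADD = 16
adds_per_second = array_throughput(N_PES, CLOCK_HZ, ASSUMED_CYCLES_PER_ADD)

print(f"peak single-bit operations/s: {peak_bit_ops:.3e}")
print(f"8-b additions/s (assumed 16 cycles each): {adds_per_second:.3e}")
```

Under these assumptions the peak single-bit rate is about 82 billion operations per second, and the 8-b addition rate is about 5.1 billion per second.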
The high computing power provided by ELSA will enable many very computationally intensive image processing applications to be carried out in real time. The main features of the development are: a PE architecture that is optimized for performance, executing one single bit instruction per clock cycle; high speed of operation with the use of the RAM in read-modify-write mode; and employment of a single reticle technology requiring minimal extension to a standard VLSI process line.
Fig. 2. The ELSA PE architecture.
1) SIMD Cell Architecture: The ELSA SIMD architecture requires global transmission of commands so that each PE is operating on the same instruction at the same time. However, because conditional instructions are rarely executed, the instructions can be pipelined to an arbitrary depth. The PE that we will describe is a single bit processor (as in the MPP) and of low complexity. As ever, complexity of the PE is traded off against the number of processors on the wafer, and the problem of making this trade-off is not one of what to include but of what to leave out.
2) PE Architecture: The circuit diagram of the ELSA processing element is shown in Fig. 2. Each PE contains six multiplexers, five latches, an adder/subtractor, and two independently addressable 64-b RAM's (designated R1 and R2). The multiplexers control the selection of input data to the adder/subtractor, the flag register, FG, the RAM's, and the communication register, CM. The data source selected in each multiplexer is controlled by command bits.
The two 64-b static RAM's are independently addressable and an additional facility is a crossover selector on the RAM outputs which enables the fixed relationships between RAM outputs and multiplexers to be transposed.
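Bit-serial operation of such a PE can be sketched as follows. This is a simplified behavioral model under stated assumptions, not the ELSA circuit: it keeps only the two 64-b RAM's and the adder, adds a hypothetical carry latch named `carry`, and omits the multiplexers, the flag and communication registers, and the crossover selector. The method names and the least-significant-bit-first storage order are also assumptions.

```python
from typing import List

RAM_BITS = 64  # each PE has two 64-b RAM's, R1 and R2 (from the text)


class PE:
    """Simplified single-bit processing element (illustrative only)."""

    def __init__(self) -> None:
        self.r1: List[int] = [0] * RAM_BITS
        self.r2: List[int] = [0] * RAM_BITS
        self.carry = 0  # hypothetical carry latch

    @staticmethod
    def store(ram: List[int], base: int, value: int, width: int) -> None:
        """Write `value` into `ram`, least significant bit first."""
        for i in range(width):
            ram[base + i] = (value >> i) & 1

    @staticmethod
    def load(ram: List[int], base: int, width: int) -> int:
        """Read a `width`-bit unsigned integer back out of `ram`."""
        return sum(ram[base + i] << i for i in range(width))

    def add_step(self, addr1: int, addr2: int) -> None:
        """One single-bit instruction: R2[addr2] <- R1[addr1] + R2[addr2] + carry.

        R2 is read and rewritten in the same step, mimicking the
        read-modify-write RAM access mentioned in the text.
        """
        total = self.r1[addr1] + self.r2[addr2] + self.carry
        self.r2[addr2] = total & 1
        self.carry = total >> 1


def simd_add(pes: List[PE], base1: int, base2: int, width: int) -> None:
    """Broadcast the same bit-serial add to every PE (SIMD operation)."""
    for pe in pes:
        pe.carry = 0
    for i in range(width):      # one broadcast instruction per bit
        for pe in pes:          # all PE's execute it in lock step
            pe.add_step(base1 + i, base2 + i)


# Two PE's adding different 8-b operands with the same instruction stream.
pes = [PE(), PE()]
for pe, (a, b) in zip(pes, [(100, 57), (200, 100)]):
    PE.store(pe.r1, 0, a, 8)
    PE.store(pe.r2, 0, b, 8)
simd_add(pes, 0, 0, 8)
results = [PE.load(pe.r2, 0, 8) for pe in pes]
```

The second PE overflows its 8-b result, so the two sums read back as 157 and 44 (300 modulo 256); the final carry remains in the latch of each PE.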
3) PE Instruction: Instructions for the PE have three fields: a RAM source/destination field, a multiplexer field, and an ALU operation field. The RAM field is divided into subfields which control the sourcing of data from RAM's A