A Design Flow for Application-Specific Networks on Chip with Guaranteed
Performance to Accelerate SOC Design and Verification
Kees Goossens, John Dielissen, Om Prakash Gangwal, Santiago González Pestana,
Andrei Rădulescu, and Edwin Rijpkema
Philips Research Laboratories, Eindhoven, The Netherlands
Abstract
Systems on chip (SOC) are composed of intellectual property
blocks (IP) and interconnect. While mature tooling exists to design
the former, tooling for interconnect design is still a research area.
In this paper we describe an operational design flow that gener-
ates and configures application-specific network on chip (NOC) in-
stances, given application communication requirements. The NOC
can be simulated in SystemC and RTL VHDL. An independent per-
formance verification tool verifies analytically that the NOC in-
stance (hardware) and its configuration (software) together meet
the application performance requirements. The Æthereal NOC’s
guaranteed performance is essential to replace time-consuming
simulation by fast analytical performance validation. As a result,
application-specific NOCs that are guaranteed to meet the appli-
cation’s communication requirements are generated and verified
in minutes, reducing the number of design iterations. A realistic
MPEG SOC example substantiates our claims.
1 Introduction
A SOC is naturally composed of computation and storage ele-
ments (intellectual property blocks or IP) that are interconnected
by communication elements (busses, networks on chip or NOC). In
this paper, we focus on NOC interconnects because of their modu-
larity, scalability, and other advantages for large SOCs.
Mature tooling exists to design individual IP, such as RTL syn-
thesis and processor synthesis. Moreover, extensive IP re-use (of
memories, processors, and application-specific blocks) is com-
mon practice. In contrast, an interconnect is specific to a SOC
because the communication requirements depend on the composi-
tion of IP, which is application specific. Its design costs cannot be
amortised over multiple SOCs, because it cannot be re-used whole.
Tools for NOC synthesis are therefore essential for fast and effi-
cient SOC design. These tools depend on the modularity of NOCs;
i.e. (application-specific) NOCs are composed of two re-usable
parametrised components: routers and network interfaces (NI).
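Here is a minimal Python sketch of what it means to compose an instance from re-usable, parametrised routers and NIs. The class names (`Router`, `NetworkInterface`, `NocInstance`), the `arity` and `num_ports` parameters, and the consistency check are hypothetical illustrations. They are not the flow's actual XML schema or tooling.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Router:
    name: str
    arity: int  # number of router ports: a parameter of the re-usable router

@dataclass(frozen=True)
class NetworkInterface:
    name: str
    num_ports: int  # IP-facing ports of this NI (hypothetical parameter)

@dataclass
class NocInstance:
    """An application-specific NOC built from parametrised components."""
    routers: dict = field(default_factory=dict)
    nis: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (a, b) pairs of component names

    def add(self, component):
        table = self.routers if isinstance(component, Router) else self.nis
        table[component.name] = component

    def connect(self, a, b):
        self.links.append((a, b))

    def check(self):
        """List problems: links to unknown components, routers over their arity."""
        known = self.routers.keys() | self.nis.keys()
        problems = [f"unknown endpoint in {link}" for link in self.links
                    if not set(link) <= known]
        for r in self.routers.values():
            used = sum(link.count(r.name) for link in self.links)
            if used > r.arity:
                problems.append(f"{r.name} uses {used} ports, arity {r.arity}")
        return problems

noc = NocInstance()
for comp in (Router("R0", arity=3), NetworkInterface("NI0", 2), NetworkInterface("NI1", 2)):
    noc.add(comp)
noc.connect("NI0", "R0")
noc.connect("NI1", "R0")
print(noc.check())  # []
```

The same two component types are re-used across instances, and only their parameters and interconnection change from one SOC to the next.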
In this paper we describe our design flow to dimension and gen-
erate application-specific NOC instances, given the communication
requirements of the application. The NOC hardware (router and
NI topology) and the mapping of IP ports to NI ports are described in
XML, which is translated to synthesisable RTL VHDL and to SystemC.
Minimum buffer sizes can also be computed. Every NOC
instance is programmable, and its configuration (software) is gen-
erated in XML format for SystemC and VHDL simulation, and in C
format for embedded processors in the SOC. VHDL simulation is
bit- and cycle-accurate, and SystemC transaction-level simulation
is flit-accurate. If IP are not yet available for simulation, traffic
generators are used that mimic their communication behaviour, as
specified in the application communication requirements. A pow-
erful new element in our design flow is performance verification,
described below.
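A traffic generator can be sketched as a function that turns a communication requirement into a stream of timed write requests. The sketch assumes that a requirement is expressed as a number of bytes per period, written in fixed-size bursts spread evenly over the period. The `Requirement` fields and the endpoint names are hypothetical and do not reflect the flow's actual requirement format.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

@dataclass(frozen=True)
class Requirement:
    """Hypothetical per-connection requirement: bytes per period, in bursts."""
    src: str
    dst: str
    bytes_per_period: int
    period_cycles: int
    burst_bytes: int

def traffic_generator(req: Requirement, periods: int) -> Iterator[Tuple[int, int]]:
    """Yield (cycle, nbytes) write requests that mimic `req` for `periods` periods.

    Each period is split into full bursts plus, if needed, one smaller burst
    for the remainder; bursts are spaced evenly within the period.
    """
    full, rest = divmod(req.bytes_per_period, req.burst_bytes)
    sizes = [req.burst_bytes] * full + ([rest] if rest else [])
    if not sizes:
        return  # nothing to send
    step = req.period_cycles // len(sizes)
    for k in range(periods):
        start = k * req.period_cycles
        for i, nbytes in enumerate(sizes):
            yield start + i * step, nbytes

req = Requirement("mpeg_dec", "mem", bytes_per_period=256, period_cycles=100, burst_bytes=64)
print(list(traffic_generator(req, 1)))  # [(0, 64), (25, 64), (50, 64), (75, 64)]
```

Because the generator reproduces only the specified traffic, it can stand in for an IP block in simulation until the real IP is available.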
Impact of guaranteed NOC services on design flows
One of the major challenges in SOC design is ensuring that
the SOC fulfills the (real-time) application requirements under all
circumstances, such as video throughput and latency for set-top
boxes, or packet loss and throughput for network processors. As-
suming the IPs have the right performance (operations per second,
storage capacity, etc.), we must generate a NOC with the right per-
formance. We will show that using a NOC with guaranteed services
(such as minimum throughput, maximum latency and jitter, etc.)
as opposed to a best-effort NOC has important benefits for a design
flow. In particular, this results in a fundamental difference in how
performance is validated.
With a best-effort NOC, any method can be used to generate the NOC.
The NOC performance must then be validated by simulating the
complete SOC (i.e. NOC and IPs), because the
behaviours of IPs and NOC may be interdependent and influence
each other. Simulation of a single trace is relatively slow, and the
number of traces is huge. Therefore, given that not all possible
traces can be simulated, no 100% guarantee can be given that per-
formance requirements are met. Moreover, the performance ob-
served in the simulated traces and the worst-case performance of a
system may differ substantially, which means that adding a “safety
margin” (e.g. sizing a buffer to twice the maximum observed during
simulation) is not safe (see Section 3.6 for an example).
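A toy model reproduces the gap between observed and worst-case behaviour. This hypothetical example does not come from the paper's MPEG case. One shared FIFO is fed by eight periodic sources with random phase offsets and drained at one word per cycle. Random traces rarely align the sources, so in typical runs the peak occupancy seen in simulation stays far below the worst case, in which all sources write in the same cycle. Even doubling the observed peak then undersizes the buffer.

```python
import random

def max_occupancy(phases, period, cycles):
    """Peak occupancy of a FIFO fed by periodic sources and drained 1 word/cycle.

    Source i writes one word in every cycle t with t % period == phases[i].
    """
    occupancy = peak = 0
    for t in range(cycles):
        occupancy += sum(1 for p in phases if t % period == p)
        peak = max(peak, occupancy)
        occupancy = max(occupancy - 1, 0)  # drain one word per cycle
    return peak

N, P = 8, 1000
rng = random.Random(1)
# "Simulation": 20 traces with random phase offsets.
observed = max(
    max_occupancy([rng.randrange(P) for _ in range(N)], P, 2 * P)
    for _ in range(20)
)
# Analysis: the worst case is all sources writing in the same cycle.
worst = max_occupancy([0] * N, P, 2 * P)
print(f"observed={observed} margin=2x -> {2 * observed}, worst case={worst}")
```

In this model the worst case equals the number of sources. Analysis finds that number directly, whereas simulation finds it only if some trace happens to align every phase.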
Only analysis can cover all cases. However, the distributed
arbitration in NOCs often leads to statistical performance mod-
els [1,9], which offer no guarantee that performance requirements
are always met.¹
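The figures in footnote 1 follow from simple arithmetic. The sketch below checks them under two assumptions: a 1920x1080 high-definition frame, and one DMA-programming message per frame at 100 Hz. Exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

miss_rate = Fraction(1, 1000)  # 99.9% of packets meet their service
pixels_per_frame = 1920 * 1080  # assumes a 1080p high-definition frame
late_pixels_per_frame = pixels_per_frame * miss_rate

frame_rate_hz = 100  # one DMA-programming message per frame (assumption)
late_messages_per_second = frame_rate_hz * miss_rate
seconds_between_late_messages = 1 / late_messages_per_second

print(float(late_pixels_per_frame), seconds_between_late_messages)  # 2073.6 10
```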
NOCs with guaranteed services make provisions
in their architectures to offer connections with guaranteed per-
formance, such as absence of data loss, minimum throughput,
and maximum latency. This enables analytical reasoning about
NOC performance. Examples are Æthereal [6, 8], Nostrum [14],
aSOC [13], which use time-division multiple access (TDMA) schemes,
and [5,12], which use (virtual) circuit-switching schemes.
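TDMA makes analytical reasoning possible because guarantees can be read directly off a slot table. The sketch below derives two of them: the share of link bandwidth reserved for a connection, and the worst-case number of slots that data waits before the connection's next reserved slot. It uses a simplified single-link model with a hypothetical table and hypothetical connection names. It is not Æthereal's actual slot-allocation or latency analysis.

```python
from fractions import Fraction

def guaranteed_share(slot_table, conn):
    """Fraction of link bandwidth reserved for `conn`: its slots / table size."""
    return Fraction(slot_table.count(conn), len(slot_table))

def worst_case_wait(slot_table, conn):
    """Worst-case slots that data for `conn` waits for one of its reserved slots.

    The table repeats cyclically. Data that becomes ready just after a
    reserved slot waits for the next one, so the bound is the largest
    cyclic gap between consecutive reserved slots, minus one.
    """
    n = len(slot_table)
    owned = [i for i, c in enumerate(slot_table) if c == conn]
    if not owned:
        raise ValueError(f"no slots reserved for {conn!r}")
    gaps = [(owned[(j + 1) % len(owned)] - owned[j]) % n or n
            for j in range(len(owned))]
    return max(gaps) - 1

table = ["A", "B", "A", None, "C", "A", None, None]  # None = free slot
print(guaranteed_share(table, "A"), worst_case_wait(table, "A"))  # 3/8 2

# Give the free slots to a new connection D: A's guarantees do not change.
table_d = [c if c is not None else "D" for c in table]
print(guaranteed_share(table_d, "A"), worst_case_wait(table_d, "A"))  # 3/8 2
```

Both numbers depend only on the slot table, not on how much traffic other connections actually send. This independence is the decoupling that allows IPs to be designed and validated separately.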
NOC communication guarantees have several positive effects
on the design flow. First, all IPs and the NOC are decoupled [19],
meaning that the communication behaviour of one IP cannot affect
that of other IPs. As a result, they can be designed and validated
independently (compositionality). This contrasts with best-effort NOCs,
where all IPs and the NOC have to be simulated together. Second, the
NOC performance model can be used to generate an application-
specific NOC that meets the communication requirements under all
circumstances (correct by construction). Third, the performance of
¹ With 99.9% of packets meeting their required service [1], about 2000
pixels of every high-definition video frame arrive too late. Delayed control
traffic (e.g. programming a DMA engine for every frame at 100 Hz) can have a
much larger impact, and would occur every 10 seconds.