Robust Chip-Level Clock Tree Synthesis for SOC Designs
Anand Rajaram
Department of ECE
University of Texas at Austin, Texas
anandr@mail.utexas.edu
David Z. Pan
Department of ECE
University of Texas at Austin, Texas
dpan@ece.utexas.edu
ABSTRACT
A key problem that arises in System-on-a-Chip (SOC) designs
of today is the Chip-level Clock Tree Synthesis (CCTS). CCTS
is done by merging all the clock trees belonging to different
IPs per chip specifications. A primary requirement of CCTS
is to balance the sub-clock-trees belonging to different IPs such
that the entire tree has a small skew across all process cor-
ners. This helps in timing closure across all the design cor-
ners. Another important requirement of CCTS is to reduce
clock divergence between IPs that have critical timing paths
between them, thereby reducing maximum possible clock skew
in the critical paths and thus improves yield. In this work,
we propose effective CCTS algorithms to simultaneously re-
duce multi-corner skew and clock divergence. To the best of
our knowledge, this is the first work that attempts to solve this
practically important problem. Experimental results on several
testcases indicate that our methods achieve 10%-31%(20% on
average) clock divergence reduction and between 16-64ps skew
reduction (1.6%-6.4% of cycle time for a 1GHz clock) with less
than 0.5% increase in buffer area/wirelength compared to ex-
isting CTS algorithms.
Categories and Subject Descriptors
B.7.2 [Hardware]: Integrated Circuits
General Terms
Algorithms
Keywords
Clock Network, Chip-level CTS, Physical Design
1. INTRODUCTION
A System-on-a-Chip (SOC) design can be defined as “an IC,
designed by stitching together multiple stand-alone VLSI de-
signs to provide full functionality for an application” [1]. In
today’s 65nm/45nm VLSI technologies, SOC designs have be-
come increasingly common and the trend is expected to con-
tinue in the future [2]. Most SOC physical design closure is
done in a hierarchical fashion [1]. In such a methodology, differ-
ent logical and physical partitions of the chip are timing closed
independently [1–4] followed by a chip-level timing closure step.
This chip-level timing closure includes CCTS in which a chip-
level clock tree is synthesized to drive all the block-level clock
trees. The primary objective of CCTS is that the full clock tree,
which includes the chip-level and all the block-level clock trees,
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.
should be balanced and have less skew across all design corners.
Satisfying this requirement is relatively easy when considering
only the nominal delay corner. However, timing closure in most
practical chips involve verifying timing across several corners
that represent several global variation effects. This implies that
the clock trees should have small skews across all the design
corners. This is a very challenging task primarily because of
the possible difference in the way the delays of the different
sub-clock-trees scale, either because of difference in the clock
structures or the relative significance of cell and interconnect
delays. Another objective of CCTS is to minimize the clock
divergence for the IPs with critical path between them. This
helps to minimize skew variation between the critical timing
paths between the IPs and thus improves the overall yield. In
this work, we propose effective algorithms with the objective
of addressing the above two objectives.
2. MOTIVATION
2.1 Significance of Clock Divergence Reduction
The significance of reducing clock divergence between registers
in timing-critical paths is well known. For a given overall delay,
the lesser the divergent delay between the such register-pairs,
the lesser is the value of maximum skew (and skew variation)
that can be seen between them. The same principle is also
applicable at the chip-level where different sub-blocks interact
with each other instead of register pairs.
2.2 Impact of Sub-block Clock Pin Location
Unlike hard IPs, the clock pins of the soft-IPs can be changed
specific to a given chip and floorplan. This flexibility can be
used towards clock divergence reduction between critical IPs.
Figure 1 shows a simple example where the clock pin assign-
ment might make a difference in clock divergence reducing.
B
A
B
C
A
B
C
Critical Paths
A
Critical Paths
Divergence point
between A,B
Figure 1: Pin location in Case B will result in reduced
clock divergence between A and B.
2.3 Multi-corner skew reduction problem
Consider Figure 2 where only two sub-blocks are present. The
squares in the sub-blocks represent clock sinks. The left-side
block has bigger buffers with longer interconnects and the right-
side block has smaller buffers with shorter interconnect. Let us
assume that both sub-clock-trees have identical delays in the
nominal corner. However, their delays across other corners will
be different, mainly because of the difference in the intercon-
nect lengths and buffer sizes. To balance these two sub-clock-
720
40.4