either regarding it as secondary objective [21][11][18] or even
ignoring it at all [9]. We demonstrate latter that for a se-
ries of realistic benchmarks, INTEGRA’s clustering degen-
erates into a random clustering of MBFFs and induces huge
degradation of signal wirelength, which might cancel out the
power benefit of MBFFs.
In this paper we propose an analytical model for the clus-
tering score. The basic idea is to first propose an exact for-
mulation of the number of clusters based on the Dirac delta
function, and smooth it using the Gaussian function to make
it applicable in a nonlinear programming (NLP) framework
with another objective in wirelength. With a well-designed
NLP solver, the acceleration techniques such as customized
fast Gauss transformation, and a further discrete refinement,
our clustering flow delivers high-quality MBFF clustering re-
sults with short wirelength in sub-quadratic time. Taken IN-
TEGRA as baseline, our method shows a comparable power
reduction, reducing clock power by about 20%. Further-
more, our method optimizes signal wirelength by about 25%.
What’s more, the proposed method is promising to be in-
tegrated in an in-placement MBFF clustering solver, and
be applied in other problems which require formulating the
clustering score in the objective function.
The rest of this paper is organized as follows. Section 2
reviews the effectiveness and limitations of previous work.
Section 3 introduces our optimization flow, including ana-
lytical clustering model and discrete refinement. Section 4
elaborates the details of NLP solver and the acceleration
technique. Section 5 evaluates our method, and shows ex-
perimental data and comparisons. Finally, Section 6 con-
cludes this work.
2. PRELIMINARY AND MOTIVATION
This section will focus on two concepts, the timing violation-
free distance (TVFD) and the average distance between neigh-
boring flip-flops (AFFD). The concept of TVFD will be de-
scribed in Section 2.1, and the concept of AFFD and our
motivations will be discussed in Section 2.2.
2.1 Post-placement MBFF Problem Formula-
tion and Previous Solutions
Given the following inputs, the post-placement MBFF
clustering problem will replace a few single-bit FFs with
MBFF such that the power is minimized and the timing
constraint and placement density constraint are satisfied.
• The number of N placed flip-flops (FFs), where each
FF
i
with coordinate (x
i
, y
i
) can either be single-bit or
multi-bit.
• The upstream pin PI
i
and downstream pin PO
i
of
FF
i
, and corresponding timing slacks T
s
(PI
i
, FF
i
) and
T
s
(PO
i
, FF
i
) between the pins and FF on the up-
stream and downstream timing paths, respectively.
• The MBFF library with information of power and area.
• The locations of placement blockages.
Timing constraint is proposed to prevent MBFF cluster-
ing from violating cycle delay. The basic idea is shown in
Figure 2. With timing analysis, we could obtain timing slack
between input (output) pins and FF. By transforming the
timing slack into equivalent metal wirelength using Elmore
Figure 2: Concept of timing-violation-free distance
(TVFD) and timing-violation-free region (TVFR)
delay model [4], we can find the timing-violation-free dis-
tance (TVFD) between FF and input (output) pins. Taken
the input and output TVFDs into account, each FF can
find a feasible region for movement without violating timing,
which we label as timing-violation-free region (TVFR) in
Figure 2. Consequently, clustering a few single-bit FFs with
overlapped TVFRs can optimize power and area without vi-
olating timing constraint. Specifically, using the library in
Table 1, the basic objective of MBFF clustering problem
should merge FFs with overlapped TVFRs into 4-bit groups
as many as possible.
Previous works certify the timing constraint with well-
designed structures: intersection graph [21][11][18] or in-
terval graph [9] and search the MBFF candidates on these
graphs.
The MBFF clustering based on intersection graph has the
following defects: 1) searching maximum clique is computa-
tional expensive with known best performance of O(N
3
). 2)
The window-based optimization technique for problem size
reduction limits the clustering ratio of MBFF due to the
difficulty of manipulating the boundaries.
Unlike window-based technique, INTEGRA considers all
FFs simultaneously, with a well designed structure: interval
graph. Searching on the interval graph is more efficient, with
sub-quadratic complexity. Moreover, the optimization with
global information delivers better clustering ratio than the
optimization with local information only. Thus, INTEGRA
outperforms the previous work in terms of power reduction
and runtime complexity.
However, the efficient runtime of INTEGRA partially ben-
efits from the simple strategy of choosing MBFF candidates
without measuring the impact to signal wirelength. Taking
an extreme case as example, where the intervals of feasible
region for each FF is as large as the placement region, INTE-
GRA’s solution degenerates to a random solution. Although
the wirelength degradation on widely used benchmarks C1-
C6 [11] is acceptable (around 3%), we analyze in the follow-
ing subsection that for a series of realistic designs, wirelength
degradation is quite huge. Such observation motivates us to
propose a new solution with great power reduction, better
signal wirelength and efficient time complexity.
2.2 Previous Solutions are Inappropriate for
Realistic Designs
We demonstrate in this part that 1) the widely used bench-
marks C1-C6 [11] can only represent a single kind of circuit;
2) realistic designs from IWLS[2] are quite different from
C1-C6 in terms of TVFR. Specifically, for the realistic de-