0% found this document useful (0 votes)
10 views15 pages

document

Uploaded by

vijay
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views15 pages

document

Uploaded by

vijay
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO.

3, MARCH 2010 367

DeFer: Deferred Decision Making Enabled


Fixed-Outline Floorplanning Algorithm
Jackey Z. Yan and Chris Chu

Abstract—In this paper, we present DeFer—a fast, high-quality, hierarchical framework is preferred by modern application-
scalable, and nonstochastic fixed-outline floorplanning algorithm. specific integrated circuit designs. Nevertheless, fixed-outline
DeFer generates a nonslicing floorplan by compacting a slicing
floorplanning has been shown to be much more difficult, com-
floorplan. To find a good slicing floorplan, instead of searching
through numerous slicing trees by simulated annealing as in pared with classical outline-free floorplanning, even without
traditional approaches, DeFer considers only one single slicing considering wirelength optimization [3].
tree. However, we generalize the notion of slicing tree based
on the principle of deferred decision making (DDM). When
two subfloorplans are combined at each node of the generalized A. Previous Work
slicing tree, DeFer does not specify their orientations, the left– Simulated annealing has been the most popular method of
right/top–bottom order between them, and the slice line direction. exploring good solutions on the fixed-outline floorplanning
DeFer even does not specify the slicing tree structure for small
subfloorplan. In other words, we are deferring the decisions on problem. Using sequence pair representation, Adya et al. [4]
these factors, which are specified arbitrarily at an early step in modified the objective function, and proposed a few new
traditional approaches. Because of DDM, one slicing tree actually moves based on slack computation to guide a better local
corresponds to a large number of slicing floorplan solutions, all of search. To improve the floorplanning scalability and initially
which are efficiently maintained in one single shape curve. With optimize the interconnections, in [5] the original circuit is
the final shape curve, it is straightforward to choose a good
floorplan fitting into the fixed outline. Several techniques are first cut into multiple partitions by a min-cut partitioner.
also proposed to further optimize the wirelength. For both fixed- Simultaneously, the chip region is split into small bins. After
outline and classical floorplanning problems, experimental results that, the annealing-based floorplanner [4] performs fixed-
show that DeFer achieves the best success rate, the best wirelength, outline floorplanning on each partition within its associated
the best runtime, and the best area on average compared with bin. In [6], Chen et al. adopted the B*-tree [7] representation
all other state-of-the-art floorplanners.
to describe the geometric relationships among modules, and
Index Terms—Deferred decision making, fixed outline, floor- performed a novel three-stage cooling schedule to speed up
planning, layout optimization. the annealing process. In [8] a multilevel partitioning step
is performed beforehand on the original circuit. Different
I. Introduction from [5], the annealing-based fixed-outline floorplanner is
performed iteratively at each level of the multilevel framework.
F LOORPLANNING has become a very crucial step in
modern very large scale integration (VLSI) designs. As
the start of physical design flow, floorplanning not only deter-
By enumerating the positions in sequence pairs, Chen et al. [9]
applied insertion after remove (IAR) to accelerate the simu-
lated annealing. As a result, both the runtime and success rate1
mines the top-level spatial structure of a chip, but also initially
are enhanced dramatically. Recently, using Ordered Quadtree
optimizes the interconnections. Thus, a good floorplan solution
representation, He et al. [10] adopted quadratic equations to
among circuit modules definitely has a positive impact on the
solve the fixed-outline floorplanning problem.
placement, routing, and even manufacturing. In the nanometer
All of the above techniques are based on simulated anneal-
scale era, the ever-increasing complexity of integrated circuits
ing. Generally, the authors tried various approaches to improve
(ICs) promotes the prevalence of hierarchical design. However,
the algorithm efficiency. But one common drawback is that
as pointed out by Kahng [1], classical outline-free floorplan-
these techniques do not have a good scalability. They become
ning [2] cannot satisfy such requirements of modern designs.
quite slow when the size of circuits grows large, e.g., 100
In contrast with this, fixed-outline floorplanning enabling the
modules. Additionally, the annealing-based techniques always
Manuscript received January 31, 2009; revised May 31, 2009 and October have a hard time handling circuits with soft modules, because
7, 2009. Current version published February 24, 2010. This work was partially they need to search a large solution space, which can be time-
supported by International Business Machines, Faculty Award, and National
Science Foundation under Grant CCF-0540998. This paper was recommended consuming.
by Associate Editor L. Scheffer. Some researchers have adopted nonstochastic methods.
The authors are with the Department of Electrical and Computer Engi- In [11], a slicing tree is first built up by recursively partitioning
neering, Iowa State University, Ames, IA 50010 USA (e-mail: zijunyan@
iastate.edu; cnchu@iastate.edu). the original circuit until each leaf node contains at most
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. 1 Success rate is defined as the ratio of the number of runs resulting a layout
Digital Object Identifier 10.1109/TCAD.2010.2041850 within fixed die, to the total number of runs.
0278-0070/$26.00 
c 2010 IEEE

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
368 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

two modules. Then the authors rely on various heuristics to nodes, both orders between them, and both horizontal and
determine the geometry relationships among the modules and vertical slice lines. Note that the work in [17] and [18]
output a final floorplan solution. Sassone et al. [12] proposed only generalized the orientation for individual module
an algorithm containing two phases. First the modules are and the slice line direction, respectively. In order to carry
grouped together only based on connectivity. Second the mod- out the combination of generalized slicing trees, we also
ules are packed physically by a row-oriented block packing extend original shape curve operation to curve Flipping
(ROB) technique which organizes the modules by rows based and curve Merging.3
on their dimensions. But this technique cannot handle soft • Enumerative Packing: To defer the decision on the slicing
modules. In [13], Zhan et al. applied a quadratic analytical tree structure within small subfloorplan, we develop the
approach similar to those used for placement problems. To enumerative packing (EP) technique. It enumerates all
generate a nonoverlapping floorplan, the quadratic approach possible slicing structures, and builds up one shape curve
relies on a legalization process. However, this legalization capturing all slicing layouts among the modules of small
is very difficult for circuits with big hard macros. Cong subfloorplan. The naive enumeration is very expensive
et al. [14] presented an area-driven look-ahead floorplanner in terms of CPU time and memory usage. But using the
in a hierarchical framework. Two main techniques are used technique of dynamic programming, EP can be efficiently
in their algorithm: the ROB and zero-dead space (ZDS). To applied to up to 10 modules.
handle both hard and soft modules, ROB is extended from • Block Swapping and Mirroring: To make the decision
[12]. ZDS is used to pack soft modules. But, ROB may on the subfloorplan order (left–right/top–bottom), we
generate a layout with large whitespace when the module sizes adopt three techniques: Rough Swapping, Detailed Swap-
in a subfloorplan are quite different from each other, e.g., a ping [11], and Mirroring. The motivation is to greedily
design with big hard macros. optimize the wirelength. As far as we know, we are
the first proposing the Rough Swapping technique and
B. Our Contributions showing that without Rough Swapping Detailed Swapping
This paper presents a fast, high-quality, scalable, and non- may degrade the wirelength.
stochastic fixed-outline floorplanner called DeFer.2 It can Additionally, we adopt the following three methods to
efficiently handle both hard and soft modules. enhance the robustness and quality of DeFer.
DeFer generates a final nonslicing floorplan by compacting • Terminal Propagation (TP): DeFer accounts for fixed pins
a slicing floorplan. It has been proved in [16] that any by using TP [19] during partitioning process.
nonslicing floorplan can be generated by compacting a slicing • Whitespace-Aware Pruning (WAP): A pruning method is
floorplan. In traditional annealing-based approaches, obtaining proposed to systematically control the number of points
a good slicing floorplan usually takes a long time, because the on each shape curve.
algorithms have to search many slicing trees. By comparison, • High-Level EP: Based on EP, we propose the high-level
DeFer considers only one single slicing tree generated by EP technique to further improve the packing quality.
recursive partitioning. However, to guarantee that a large By switching the strategy of selecting the points on the final
solution space is explored, we generalize the notion of slicing shape curve, we extend DeFer to handle other floorplanning
tree [2] based on the principle of deferred decision making problems, e.g., classical outline-free floorplanning,
(DDM). When two subfloorplans are combined at each node For fixed-outline floorplanning, experimental results on Gi-
of the generalized slicing tree, DeFer does not specify their gaScale Systems Research Center (GSRC) Hard-Block, GSRC
orientations, the left–right/top–bottom order between them, Soft-Block, hybrid blocks (HB) (containing both hard and soft
and the slice line direction. For small subfloorplan, DeFer modules), and HB+ (a hard version of HB) benchmarks show
even does not specify its slicing tree structure, i.e., the skeletal that DeFer achieves the best success rate, the best wirelength,
structure (not including tree nodes) in the slicing tree. In other and the best runtime on average, compared with all other state-
words, we are deferring the decisions on these four factors of-the-art floorplanners. The runtime difference between small
correspondingly: 1) subfloorplan orientation; 2) subfloorplan and large circuits shows DeFer’s good scalability. For classical
order; 3) slice line direction; and 4) slicing tree structure. outline-free floorplanning, using a linear combination of area
Because of DDM, one slicing tree actually represents a large and wirelength as the objective, DeFer achieves 12% better
number of slicing floorplan solutions. In DeFer, all of these so- cost value than Parquet 4.5 with 76× faster runtime.
lutions are efficiently maintained in a single shape curve [17]. The rest of this paper is organized as follows. Section II
With the final shape curve, it is straightforward to choose a describes the algorithm flow. Section III introduces the Gen-
good slicing floorplan fitting into the fixed outline. To realize eralized Slicing Tree. Section IV describes the Whitespace-
the DDM idea, we propose the following techniques. Aware Pruning. Section V describes the Enumerative Pack-
• Generalized Slicing Tree: To defer the decisions on these ing technique. Section VI illustrates the Block Swapping
three factors: 1) subfloorplan orientation; 2) subfloorplan and Mirroring. Section VII introduces the extension of
order; and 3) slice line direction, we generalize the DeFer on other floorplanning problems. Section VIII ad-
original slicing tree. In the generalized slicing tree, one dresses the implementation details. Experimental results are
tree node can represent both orientations of its two child
3 In this paper, all slicing trees and shape curve operation stand for the
2A preliminary version of DeFer was presented in [15]. generalized version by default.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 369

Fig. 1. Pseudocode on algorithm flow of DeFer. Fig. 3. Final shape curve with fixed outline and candidate points.

the hierarchical framework in Step 1, DeFer traverses


from bottom-up constructing a shape curve for every
tree node. The final shape curve at the root will main-
tain all explored slicing floorplan layouts of the whole
circuit.
3) Back-Tracing Step: Once the final shape curve is avail-
able, it is fairly straightforward to select the points fitting
into the fixed outline (see Fig. 3). For each of the
points we select, a back-tracing4 process is applied. As
every point in the parent curve is generated by adding
two points from two child curves, basically the back-
Fig. 2. High-level slicing tree. tracing is to trace the selected point on each shape curve
from top-down. During this process, DeFer makes the
decisions on every subfloorplan orientation, slice line
presented in Section IX. Finally, this paper ends with a
direction, and slicing tree structure of each subcircuit.
conclusion.
4) Swapping Step: The fourth step is to make decisions on
the subfloorplan order (left–right/top–bottom), by greed-
II. Algorithm Flow of DeFer ily swapping every two child subfloorplans. Basically we
perform three wirelength refinement processes through
Essentially, DeFer has six steps as shown in Fig. 1. The the hierarchical framework. First, Rough Swapping is
details of each step are as follows. applied from top-down, followed by Detailed Swapping.
1) Partitioning Step: As the number of modules in one de- Finally, we apply Mirroring.
sign becomes large, exploring all slicing layout solutions 5) Compacting Step: After fixing the slicing floorplan, this
among them is very expensive. Thus, the purpose of this step is to compact all modules to the center of the fixed
step is to divide the original circuit into several small outline. The compaction puts modules closer to each
subcircuits, and initially minimize the interconnections other, such that the wirelength is further reduced. If the
among them. hMetis [20], the state-of-the-art hypergraph slicing floorplan is outside of the fixed outline, DeFer
partitioner, is called to perform a recursive bisectioning compacts them to the lower-left corner rather than the
on the circuit, until every partition contains less than or center, so that potentially there is a higher chance to find
equal to maxN modules (maxN = 10 by default). TP a valid layout within the fixed outline.
is used in this step. Theoretically TP can be applied at 6) Shifting Step: In Step 5, some modules may be over-
any cut. But as using TP degrades the packing quality compacted. So we greedily shift such modules to-
(see Section III-C), we apply it only at the first cut ward the optimal positions [21] regarding wirelength
on the original circuit. During partitioning, a high-level minimization. At the end, DeFer outputs the final
slicing tree structure is built up where each leaf node floorplan.
is a subcircuit, and each tree node is a subpartition
(see Fig. 2). Due to the generalized notion of slicing From the algorithm flow, we can see that by initially
tree, the whole high-level slicing tree not only sets deferring the decisions in Steps 1 and 2, DeFer explores a
up a hierarchical framework, but also represents many large collection of slicing layouts, all of which are efficiently
possible packing solutions among the subcircuits. maintained in one final shape curve at the top; by finally mak-
2) Combining Step: In this step, we first defer the decision ing the decisions in Steps 3 and 4, DeFer chooses good slicing
on the slicing tree structure of each subcircuit, by ap- layouts fitting into the fixed outline. The main techniques are
plying the Enumerative Packing technique to explore all discussed in detail in Sections III–VII.
slicing packing layouts within the subcircuit. After that,
an associated shape curve representing these possible 4 Back-tracing is different from back-tracking [5] which traverses from
layouts for each subcircuit is produced. Then, based on bottom-up to determine legal solutions.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
370 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

Fig. 4. Generalized slicing tree and sixteen different layouts.


Fig. 5. Extended shape curve operation. (a) Addition. (b) Flipping.
(c) Merging.
III. Generalized Slicing Tree
In this section, we introduce the generalized slicing tree, from set W to set H, for a given height only one point
which enables the deferred decisions on these three factors: can be kept. We choose the point with a smaller width
1) subfloorplan orientation; 2) subfloorplan order; and 3) slice out of Ch and Cv , e.g., point k in Fig. 5(c), because such
line direction. point corresponds to smaller floorplan area.
As a result, we have derived three steps to actualize the
A. Notion of Generalized Slicing Tree operator “⊕” in the slicing tree combination. Now given two
In an ordinary slicing tree, the parent tree node of two child child curves corresponding to two child subfloorplans in the
subfloorplans A and B is labeled “H”/“V” to specify that A slicing tree, these three steps are applied to combine the two
and B are separated by a horizontal/vertical slice line, and the curves into one parent curve, in which the entire slicing layouts
order between the two child nodes in the slicing tree specifies between the two child subfloorplans are captured.
the top–bottom/left–right order of A and B in the layout. For
example, if in the ordinary slicing tree the left child is A, the C. Decision of Slice Line Direction for Terminal Propagation
right child is B, and the parent node is labeled “V,” then in Because all cut line directions in the high-level slicing tree
the corresponding layout A is on the left of B. If we want to are undetermined, we cannot apply TP during partitioning. In
switch to other layouts between A and B, then the slicing tree order to enable TP, we pre-decide the cut line direction based
has to be changed as well. on the aspect ratio5 τp of the subpartition region. That is, if
Now we generalize the ordinary slicing tree, such that one τp > 1, the subpartition will be cut “horizontally;” otherwise,
generalized slicing tree represents multiple slicing layouts. it will be cut “vertically.” In principle, we can use such strategy
Here, we introduce a new operator—“⊕” to incorporate both on all cut lines in the high-level slicing tree. However, by doing
“H” and “V” slice line directions. Moreover, we do not this we restrict the combine direction in the generalized slicing
differentiate the “top–bottom” or “left–right” order between tree, which degrades the packing quality. To make a trade-off,
the two child subfloorplans any more, which means even we only apply TP at the root, i.e., the first cut on the original
though we put A at the left child, it can be switched to the circuit.
right later on. We even do not specify the orientation for each
subfloorplan. As a result, the decisions on slice line direction,
subfloorplan order, and subfloorplan orientation are deferred. IV. Whitespace-Aware Pruning
Now each parent node in the slicing tree represents all sixteen In this section, we present the WAP technique, which
slicing layouts between two child subfloorplans (see Fig. 4). systematically prunes the points on the shape curve with
whitespace awareness.
B. Extended Shape Curve Operation
A. Motivation on WAP
To actualize the slicing tree combination we use the shape
curve operation. The shape of each subfloorplan is captured In Fig. 6, two subfloorplans A and B are combined into
by its associated shape curve. In order to derive a compatible subfloorplan C. Shape curves Ca , Cb , and Cc contain various
operation for the new operator “⊕,” we develop three steps to floorplan solutions of A, B, and C, respectively. Because Cb
combine two child curves A and B into one parent curve C. has a gap between points P2 and P3 , during the combining
process point P1 cannot find any point from Cb with the
1) Addition: Firstly, we add two curves A and B horizon-
matched height, and is forced to combined with P2 . Due to the
tally to get curve Ch , on which each point corresponds
height difference between P1 and P2 , the resulted point P4 on
to a horizontal combination of two subfloorplan layouts
curve Cc represents a layout with extra whitesapce. The bigger
from A and B, respectively [see Fig. 5(a)].
the gap is, the more the whitespace is generated.
2) Flipping: Next, we flip curve Ch symmetrically based
It is only an ideal situation that each point always had a
on the W = H line to derive curve Cv . The purpose
matched point on another curve. Therefore, in the hierarchical
of doing this is to generate the curve that contains the
framework during the curve combining process, the whitespace
corresponding vertical combination cases from the two
will be generated and accumulated to the top level. For a fixed-
subfloorplan layouts [see Fig. 5(b)].
outline floorplanning problem, we have a budget/maximum
3) Merging: The final step is to merge Ch and Cv into the
parent curve C. Since the curve function is a bijection 5 In this paper, aspect ratio is defined as the ratio of height to width.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 371

Fig. 7. Calculation of Wpi and Wo .

Fig. 6. Generation of whitespace during curve combination.


above pruning, we have hip+1 ≤ (1 + βi ) · hip . So approximately
hip+2 ≥ (1+βi )hip . Thus, the relationship between the first point
whitespace amount Wb . In order to avoid exceeding Wb , the and point ki is
whitespace generated in the curve combination needs to be  
minimized. One direct way to achieve this is to increase the ki −1 ln(h i
k /h i
1 )
number of points, such that the sizes of gaps among the points hiki ≥ (1 + βi ) 2 hi1 ⇒ ki ≤ 2 · i
+ 1. (3)
ln(1 + βi )
are minimized. However, the more points we keep, the slower
the algorithm runs. This rises the question WAP is trying to Because of the Flipping [see Fig. 5(b)], each shape curve is
solve: How can we minimize the number of points on the shape symmetrical based on W = H line. So in the implementation
curve, while guaranteeing that the total whitespace would not we only keep the lower half curve. In this case, the last point
exceed Wb ? ki is actually very close6 to W = H line, so we have

wiki ≈ hiki ⇒ hiki ≈ Ai (4)
B. Problem Formulation of WAP
WAP is to prune the points on the shape curve, while making where Ai is the area of subpartition i. It equals to the sum
sure that the gaps among the points are small enough, such of total module area in subpartition i and the accumulated
that we can guarantee the total whitespace would not exceed whitesapce from the subcircuits at lower level. In (3), hi1 is
the budget Wb . WAP is formulated as follows: actually the minimum height of the outlines on shape curve
i. Suppose subpartition i contains Vi modules. The width and

M
i i
Minimize ki height of module m are xm and ym
i=1
M 
N (1) hi1 = max(min(x1i , y1i ), . . . , min(xVi i , yVi i )). (5)
subject to W pi + Wcj + Wo ≤ Wb .
In the following part, we explain the calculation of other
i=1 j=1
terms in (1).
In (1), suppose there are M subpartitions and N subcircuits in • Calculation of Wpi : Suppose two child subpartitions S1
i
the high-level slicing tree (see Fig. 2). Before pruning, there i
and S2 are combined into parent subpartition Si , where
are ki points on shape curve i of subpartition i. During the the area of S1i , S2i and Si are Ai1 , Ai2 and Ai . The
combine process of generating shape curve i, the introduced pruning parameter of Si is βi . As shown in Fig. 7(a),
whitespace in subpartition i is Wpi . The whitespace inside the whitespace produced in the combining process is
subcircuit j is Wcj . At the root, the whitespace between the
Ai2 · βi
floorplan outline and the fixed outline is Wo . Wpi = Ai · . (6)
To do pruning, we calculate a pruning parameter βi for Ai1 + Ai2 + Ai2 · βi
shape curve i. In subpartition i, let the corresponding width Since the partitioner tries to balance the area of S1i and
and height of point p (1 ≤ p ≤ ki ) be wip and hip . On each S2i , we can assume Ai1 ≈ Ai2 . Typically βi  2, so Ai1 +
shape curve, the points are sorted based on the ascending order Ai2 + Ai2 · βi ≈ Ai . Thus
of the height. Hp is defined for point p as follows: βi
Wpi = Ai1 · βi = Ai2 · βi = Ai · . (7)
Hp = βi · hip . (2) 2
• Calculation of Wcj : Before pruning, the shape curves of
Within the distance of Hp above point p, only the point that
subcircuits have already been generated by EP. We choose
is the closest to hip + Hp is kept, and other points are pruned
the minimum whitespace among all layouts of subcircuit
away. The intuition is that the gap within Hp is small enough N
j as the value of Wcj , so that j=1 Wcj ≥ Wb can be
to guarantee that no large whitespace will be generated. Such
prevented.
pruning method is applied only on every pair of child curves
• Calculation of Wo : At the root, there is extra whitespace
of subpartitions in the high-level slicing tree, before they are
Wo between the floorplan outline and the fixed outline.
combined into a parent curve. We do not prune any point on
DeFer picks at most δ points (δ = 21 by default) for back-
the shape curves of subcircuits.
tracing step. So we assume there are δ points enclosed
Now we rewrite (1) into a form related with βi , such that
by solving WAP we can get the value of βi . Based on the 6 If ki represents a outline of a square, it is on W = H line.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
372 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

into the fixed outline, and the first and last points P1 , Pd
out of δ are on the right and top boundary of the fixed
outline [see Fig. 7(b)]. For various points/layouts, Wo is
different. We use the one of P1 to approximate Wo . As
in pruning we always keep the point that is the closest to
(1 + βi )hip , here we can assume h1p+1 = (1 + β1 )h1p . So we
have
Wo = A1 · ((1 + β1 )δ−1 − 1). (8)

From (3), (4), (7), and (8), (1) can be rewritten as



M
ln( Ai /hi1 )
Minimize Fig. 8. List of different slicing tree structures.
i=1
ln(1 + βi )


M
βi 
N
Eventually, the pruning parameters βi returned by WAP are
subject to Ai · + Wcj + Wo ≤ Wb (9)
2 j=1 used to systematically prune the points on the shape curve of
i=1
each subpartition i. Best of all, we do not need to worry about
Wo = A1 · ((1 + β1 )δ−1 − 1) the over-pruning and degradation of the packing quality.
βi ≥ 0 i = 1, . . . , M.
V. Enumerative Packing
C. Solving WAP In order to defer the decision on the slicing tree structure,
To solve WAP (9), we relax the constraint related with Wb we propose the EP technique that can efficiently enumerate all
by Lagrangian relaxation. Let possible slicing layouts among a set of modules, and finally
 λ be the nonnegative Lagrange
multiplier, and W  = Wb − N j=1 Wcj − Wo
keep all of them into one shape curve.

 ln( Ai /h )
M i M
βi
Lλ (βi ) = 1
+λ·( Ai · − W ) A. Naive Approach of Enumeration
ln(1 + βi ) 2
i=1 i=1 In this section, we plot out a naive way to enumerate
LRS : Minimize Lλ (βi ) all slicing packing solutions among n modules. We first
subject to βi ≥ 0 i = 1, . . . , M. enumerate all slicing tree structures and then enumerate all
permutations of the modules. Let L(n) be the number of
LRS is the Lagrangian relaxation subproblem associated with different slicing tree structures for n modules. So we have
λ. Let the function Q(λ) be the optimal value of LRS. The n

Lagrangian dual problem (LDP ) is defined as 


2

L(n) = L(n − i) · L(i). (10)


LDP : Maximize Q(λ) i=1
subject to λ ≥ 0.
All slicing tree structures for 3–6 modules are listed in Fig. 8.
As WAP is a convex problem, if λ is the optimal solution Note that we are using the generalized slicing tree which
of LDP, then the optimal solution of LRS also optimizes WAP. does not differentiate the left–right order between two child
We differentiate Lλ (βi ) based on βi and λ, respectively subtrees. As we can see the number of different slicing tree
  √ structures is actually very limited.
∂L 1 ln( A1 /h11 )
= λA1 + (δ − 1) · ((1 + β1 )δ−2 ) − . To completely explore all slicing packing solutions among n
∂β1 2 (1+β1 ) · ln2 (1 + β1 ) modules, for each slicing tree structure, different permutations
of the modules should also be considered. For example in
√ Fig. 8, in tree T4a four modules A, B, C, and D can be
∂L λAi ln( Ai /hi1 )
= − , i = 2, . . . , M. mapped to leaves “1–2–3–4” by the order “A–B–C–D” or
∂βi 2 (1 + βi ) · ln2 (1 + βi )
“A–C–B–D.” Obviously these two orders derive two different
layouts. However, again because the generalized slicing tree
∂L 
M
βi does not differentiate the left–right order between two child
= Ai · − W . subtrees which share the same parent node, for example,
∂λ i=1
2
orders “A–C–B–D” and “B–A–C–D” are exactly the same
To find the “saddle point” between LRS and LDP, we first in T4a . After pruning such redundancy, we have 4!2 = 12
set an arbitrary λ. Once λ is fixed, ∂β ∂L
i
(1 ≤ i ≤ M) is a nonredundant permutations for mapping four modules to the
univariate function that can be solved by Bisection Method four leaves in T4a . Therefore, for each slicing tree structure of
to get βi . Then βi is used to get the value of function ∂L∂λ
. If n modules, we first enumerate all nonredundant permutations,
∂L
∂λ
= 0, we adjust λ accordingly based on Bisection Method for each one of which a shape curve is produced. And then
and do another iteration of the above calculation, until ∂L
∂λ
= 0. we merge these curves into one curve associated with each

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 373

TABLE I
Comparison on # of ‘‘⊕’’ Operation

n # of ⊕ # of ⊕
by Naive Approach With DP
2 1 1
3 6 6
4 45 25
5 400 90
6 4155 301
7 49 686 966 Fig. 9. Illustration of high-level EP.
8 674 877 3025
9 10 295 316 9330
10 174 729 015 28 501
However, if the set contains too many modules, two prob-
lems appear in EP: 1) the memory to store results from subsets
slicing tree structure. Finally, these curves from all slicing tree can be expensive; and 2) since the interconnections among the
structures are merged into one curve that captures all possible modules are not considered, the wirelength may be increased.
slicing layouts among these n modules. To show the amount Due to these two concerns, in the first step of DeFer, we
of computations in this process, we list the number of “⊕” apply hMetis to recursively cut the original circuit into multiple
operations for different numbers of modules in the second smaller subcircuits. This process not only helps us to cut
column of Table I. down the number of modules in each subcircuit, but initially
optimizes the wirelength as well. Later on as applying EP
B. Enumeration by Dynamic Programming on each subcircuit, the wirelength would not become a big
concern, because this is only a locally packing exploration
Table I shows that the naive approach can be very expensive among a small number of modules. In other words, in the
in both runtime and memory usage. Alternatively, we notice spirit of DDM, instead of deferring the decision on the slicing
that the shape curve for a set of modules (M) can be defined tree structure among all modules in the original circuit, first we
recursively as follows: fix the high-level slicing tree structure among the subcircuits
S(M) = MERGE (S(A) ⊕ S(B)). (11) by partitioning, and then defer the decision on the slicing tree
A⊂B,B= M−A structure among the modules within each subcircuit.
S(M) is the shape curve capturing all slicing layouts among
D. High-Level EP
modules in M, MERGE() is similar to the Merging in Fig. 5(c),
but operates on shape curves from different sets. In the modern system-on-a-chip design, the usage of intel-
Based on (11), we can use dynamical programming (DP) to lectual property becomes more and more popular. As a result,
implement the shape curve generation. First of all, we generate a circuit usually contains numbers of big hard macros. Due
the shape curve representing the outline(s) of each module. to the big size differences from other small modules, they
For hard modules, there are two points7 in each curve. For may produce some large whitespace. For example in Fig. 9(a),
soft modules, only several points from each original curve are after partitioning, the original circuit has been cut into four
evenly sampled.8 And then starting from the smallest subset of subcircuits A, B, C, and D. A contains a big hard macro.
modules, we proceed to build up the shape curves for the larger Respecting the slicing tree structure of T4b , you may find that
subsets step by step, until the shape curve S(M) is generated. no matter how hard EP explores various packing layouts within
Since in this process the previously generated curves can be A or B, there is always a large whitespace, such as Q, in the
reused for building up the curves of larger subsets of modules, parent subfloorplan. This is because the high-level slicing tree
many redundant computations are eliminated. After applying structure among subcircuits has been fixed by partitioning, so
DP, the resulted numbers of “⊕” operations are listed in the that some small subcircuit is forced to combine with some big
third column of Table I. subcircuit. Thus, to solve this problem, we need to explore
other slicing tree structures among the subcircuits.
To do so, we apply EP on a set of subfloorplans, instead
C. Impact of EP on Packing
of a set of modules. As the input of EP is actually a set of
To control the quality of packing in EP, we can adjust shape curves, and shape curves can represent the shape of
the number of modules in the set. Consequently the impact both subfloorplans and modules, it is capable of using EP to
on packing is: The more modules a set contains, the more explore the layouts among subfloorplans. In Fig. 9(b), EP is
different slicing tree structures we explore, the more slicing applied on the four shape curves coming from subfloorplans
layout possibilities we have, and thus the better quality of A, B, C, and D, respectively. So all slicing tree structures (T4a
packing we will gain at the top level. and T4b ) and permutations among these subfloorplans can be
7 One completely explored. Eventually one tightly-packed layout can
point if the hard module is a square.
8 The number of sampled points on the whole curve is determined by be chosen during back-tracing step [see Fig. 9(c)].
Ai
A0 ρ + 4, where Ai is the area of soft block i, A0 is the total block area, Before we describe the criteria of triggering high-level EP,
and ρ is a constant (ρ = 10 000 by default). some concepts are introduced here as follows.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
374 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

Fig. 11. Swapping and Mirroring.

Fig. 10. One exception of identifying hTree.

• Big gap: Based on the definition of Hp in Section IV,


if hip+1 − hip > ω · Hp (ω is “Gap Ratio,” ω = 5 by
default), then we say there is a “big gap” between points
p and p + 1. Intuitively, if there is a big gap, most likely
it would cause serious packing problem at upper level.
• hNode: In the high-level slicing tree, the tree node or leaf Fig. 12. Motivation on Rough Swapping.
node that contains big gap(s).
• hTree: A subtree of the high-level slicing tree, where the
right subfloorplans, inside of which the relative positions
high-level EP is applied. For example, T4b is a hTree [see
among the modules are unchanged. In Mirroring, instead
Fig. 9(a)].
of simply swapping two subfloorplans, we first figure out
• hRoot: The root node of hTree.
the symmetrical axis of the outline at their parent floorplan,
High-level EP is to solve the packing problem caused by big and then attempt to mirror them based on this axis. When
gaps, so we need to identify the hTree that contains big gap. calculating the HPWL, in Rough Swapping we treat all internal
First we search for the big gap through the high-level slicing modules to be at the center of their subfloorplan outline. In
tree. If any shape curve has a big gap, then the corresponding Detailed Swapping, we use the actual center coordinates of
node becomes an hNode. After identifying all hNodes, each each module in calculating the HPWL.
hNode becomes an hRoot, and the subtree whose root node Rough Swapping is an essential step before Detailed Swap-
is hRoot becomes an hTree. But there is one exception: as ping. Without it, the results produced by Detailed Swapping
shown in Fig. 10, if one hTree T2 is a subtree of another could degrade the wirelength. For example in Fig. 12, when
hTree T1 , then T2 will not become an hTree. Eventually, each we try to swap two subfloorplans A and B, two types of nets
hTree contains at least one big gap, which implies critical need to be considered: internal nets neti between A and B, and
packing problems. Thus, for every hTree we use high-level external nets neto between the modules inside A or B and other
EP to further explore the various packing layouts among the outside modules or fixed pads. Let C and D be two modules
subfloorplans, i.e., leaves of hTree. If an hTree has more than inside A and B, respectively. C and D are highly connected
10 leaves, we will combine them from bottom-up until the by netcd . After back-tracing step, the coordinates of C and D
number of leaves becomes 10. are still unknown. If we randomly specify the positions of C
As mentioned in Section V-C, EP only solves the packing and D as shown in Fig. 12(a), then we may swap A and B to
issue, which may degrade the wirelength. Therefore, to make gain better wirelength. Alternatively, if C and D are specified
a trade-off we apply high-level EP only if there is no point in the positions in Fig. 12(b), then we may not swap them.
enclosed into the fixed outline after combining step. If that As we can see, the randomly specified module position may
is the case, then we will use the above criteria to trigger the mislead us to make the wrong decision. To avoid such “noise”
high-level EP, and reconstruct the final shape curve. generated by neti in the swapping process, the best thing to do
is to assume C, D and all modules inside subfloorplans A and
VI. Block Swapping and Mirroring B are at the centers of A and B, such that the right decision
can be made based on neto .
After back-tracing step, the decision on subfloorplan order Essentially, we first apply Rough Swapping from top-down,
(left–right/top–bottom) has not been made yet. Using such followed by Detailed Swapping. Finally, Mirroring is used.
property, this section focuses on optimizing the wirelength. Note that the order between Detailed Swapping and Mirroring
In slicing structures switching the order (left–right/top– can be changed, and both of them can be applied from either
bottom) of two child subfloorplans would not change the top-down or bottom-up.
dimension of their parent floorplan outline, but it may actu-
ally improve the wirelength. Basically, we adopt three tech-
niques here: 1) Rough Swapping; 2) Detailed Swapping; and
3) Mirroring. Each of them is trying to switch the positions VII. Extension of DeFer
of two subfloorplans to improve the half-perimeter wirelength This section presents the different strategies of selecting the
(HPWL). Fig. 11 illustrates the differences between Swapping points from the final shape curve, such that DeFer is capable
and Mirroring. In Swapping, we try to switch the left and of handling floorplanning problems with various objectives.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 375

Fig. 13. Compact invalid points into fixed outline.


Fig. 14. Two strategies of identifying hRoot.

1) Fixed-Outline Floorplanning: Given the final shape


curve, it is very straightforward to select the valid points
enclosed into the fixed outline. Let P be the number
of such valid points. As for each selected point the
swapping process is applied to optimize the HPWL, to
make a trade-off between runtime and solution quality
DeFer chooses at most δ points (δ = 21 by default) for
the back-tracing. So we have three cases.
a) P > δ: Based on the geometric observation
between aspect ratio and HPWL in [9], DeFer
chooses δ points where the outline aspect ratio is
closed to 1. Fig. 15. Tuned parameters at each run.
b) 0 < P ≤ δ: All P points are chosen.
c) P = 0: DeFer still chooses at most δ points near
the upper-right corner of the fixed outline (see decreasing the gap ratio ω. Also, we can use different strategies
Fig. 13), in that we attempt to compact them into to identify hRoot (see Fig. 14).
the fixed outline in compacting step. 1) Each hNode becomes an hRoot.
2) Min-Area Floorplanning: For min-area floorplanning, 2) Each hNode’s grandparent tree node becomes an hRoot.
DeFer just needs to go through each points on the final Strategy 1 is the one we mentioned in Section V-D. Appar-
shape curve and find out the one with the minimum area. ently, if we adopt strategy 1, more hTrees will be generated,
Because the area minimization is the only objective here, and thus the high-level EP is used more often, which leads
we can even skip swapping step and shifting step to gain better packing. However, this takes longer runtime.
fast runtime. This problem considers to be very easy for Another way to improve the packing quality is to balance
DeFer. both the area and number of modules, rather than only the
3) Min-Area and Wirelength Floorplanning: This problem area in each partition at partitioning step. Thus, we have two
uses a linear combination of area and wirelength as methods to set the weight for the module.
the cost function. Compared with the strategy of fixed-
outline floorplanning, the only difference is that we just 1) Wgt = Am .
need to choose the δ points with the minimum area, 2) Wgt = Am + 0.6 · Ap .
rather than within the fixed outline. Here, Wgt and Am are the weight and area for module m,
As shown above, DeFer is very easy to be switched to Ap is the average module area in partition p. In experiments,
handle other floorplanning problems. Because once the final we observe that method 2, which considers both the area
shape curve is available, DeFer has provided a large amount and number of modules, generates better packing results, yet
of floorplan candidates. Given any objective function, e.g., sacrifices the wirelength.
that used in simulated annealing, we just need to evaluate the Essentially, DeFer starts with the defaulted parameters for
candidates, and pick the one that gives the minimum cost. the first run. If failing to pack all modules into the fixed
outline, it will internally enhance the packing strength and
try another run. By default DeFer will try at most eight runs.
The tuned parameters for each run is listed in Fig. 15. For
VIII. Implementation Details
Runs 3–5, because they share the same partition result with
Sometimes DeFer cannot pack all modules into the fixed Run 2, DeFer skips the partitioning step in those runs.
outline. This may occur because hMetis generates a hard-to- Even though DeFer internally executes multiple runs, it still
pack partition result, or the packing strength is not strong achieves the best runtime compared with all other floorplan-
enough. To enhance the robustness of DeFer, we adaptively ners. There are two reasons: 1) DeFer is so fast. Even it runs
tune some parameters and try another run. multiple times, it is still much faster than other floorplanners;
One effective way to improve the packing quality of DeFer and 2) DeFer has better packing quality. For most circuits,
is to enhance the packing strength in the high-level EP, e.g., by DeFer can satisfy the fixed-outline constraint within Run 1.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
376 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

Fig. 16. Circuit n300 layouts generated by DeFer. (a) n300 hard block γ = 10%. (b) n300 hard block γ = 10%. (c) n300 hard block γ = 10%. (d) n300
soft block γ = 1%.

TABLE II
Comparison on GSRC Hard-Block Benchmarks [22] (γ = 10%)

Circuit n100 n200 n300 Normallized


Aspect Ratio 1 2 3 1 2 3 1 2 3
Parquet 4.5 42% 43% 33% 26% 19% 17% 16% 16% 14% 0.25
FSA 100% 0% 0% 100% 0% 0% 0% 0% 0% 0.22
IMF 100% 100% 100% 100% 100% 100% 100% 100% 100% 1.00
Suc% IARFP 99% 100% 99% 100% 99% 63% 100% 100% 46% 0.90
PATOMA 0% 0% 0% 0% 100% 0% 100% 100% 100% 0.44
Capo 10.5 17% 17% 15% 0% 0% 2% 0% 1% 0% 0.06
DeFer 100% 100% 100% 100% 100% 100% 100% 100% 100% 1
Parquet 4.5 248 652 269 191 289 963 467 627 506 946 544 621 686 588 725 833 781 556 1.27
FSA 243 823 – – 414 777 – – – – – 1.14
IMF 250 680 251 418 257 935 438 467 454 231 482 651 584 578 617 510 666 245 1.14
HPWL IARFP 220 269 230 553 247 283 386 537 409 208 433 631 535 850 567 496 600 438 1.03
PATOMA – – – – 483 110 – 653 711 697 740 680 671 1.25
Capo 10.5 227 046 241 789 261 334 – – 444 079 – 566 998 – 1.05
DeFer 208 650 229 603 248 567 372 546 402 155 431 552 498 909 538 515 577 209 1
Parquet 4.5 10.85 10.58 10.27 44.43 44.47 41.96 95.02 87.03 86.31 181.49
FSA 39.78 – – 202.13 – – – – – 557.74
IMF 7.65 10.82 9.29 41.21 43.59 38.71 74.74 71.48 71.72 157.91
Time (s) IARFP 4.44 4.50 4.52 16.51 15.48 14.22 29.30 29.48 30.03 64.33
PATOMA – – – – 0.25 – 0.36 0.34 0.48 1.15
Capo 10.5 122.64 125.18 160.07 – – 3054 – 8661 – 222.39
DeFer 0.13 0.11 0.11 0.25 0.23 0.22 0.35 0.33 0.33 1
#Valid Point/#Total Point 3/617 4/621 3/621 3/670 2/672 2/672 6/869 5/869 4/869

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 377

TABLE III
Comparison on GSRC Soft-Block Benchmarks [22] (γ = 1%)

Circuit n100 n200 n300 Normallized


Aspect Ratio 1 2 3 1 2 3 1 2 3
Parquet 4.5 0% 0% 0% 0% 0% 0% 0% 0% 0% 0
Suc% Capo 10.5 0% 0% 0% 0% 0% 0% 0% 0% 0% 0
PATOMA 100% 100% 100% 100% 100% 100% 100% 100% 100% 1.00
DeFer 100% 100% 100% 100% 100% 100% 100% 100% 100% 1
Parquet 4.5 – – – – – – – – – —
HPWL Capo 10.5 – – – – – – – – – —
PATOMA 215 455 213 561 230 759 383 330 367 565 404 574 524 774 486 351 518 204 1.01
DeFer 196 457 217 686 235 702 354 885 380 470 410 464 476 508 514 764 551 610 1
Parquet 4.5 – – – – – – – – – —
Time (s) Capo 10.5 – – – – – – – – – —
PATOMA 0.39 0.40 0.38 0.92 0.93 0.83 1.28 1.28 1.37 3.50
DeFer 0.09 0.09 0.09 0.18 0.19 0.19 0.78 0.96 0.97 1
#Valid Point/#Total Point 28/20 392 30/20 469 30/20 469 16/25 513 18/25 493 17/25 493 9/30 613 10/30 598 10/30 603

IX. Experimental Results 2) GSRC Soft-Block Benchmarks: These circuits con-


In this section, we present the experimental results. All tain 100, 200, and 300 soft modules. DeFer compares with
experiments were performed on a Linux machine with Intel Parquet 4.5, Capo 10.5, and PATOMA, as only these floorplan-
Core Duo9 1.86 GHz CPU and 2 GB memory. The wirelength ners can handle soft modules. We add “-soft” to Parquet 4.5
is measured by HPWL. We compare DeFer with all the best command line. The maximum whitespace percentage γ = 1%,
publicly available state-of-the-art floorplanners, of which the which is almost zero whitespace requirements. As we can see
binaries are the latest version. For the hMetis 1.5 parameters in from Table III, after 100 runs both Parquet 4.5 and Capo 10.5
DeFer, NRuns = 1, UBfactor = 10, and others are defaulted. cannot pack all modules within the fixed outline. PATOMA and
DeFer reach 100% success rate on every test case. Compared
A. Experiments on Fixed-Outline Floorplanning with PATOMA, DeFer generates 1% better wirelength with
In this section, we compare DeFer with other fixed-outline 4× faster runtime. Fig. 16(d) is the final layout generated by
floorplanners. On GSRC and HB benchmarks, for each circuit DeFer on circuit n300 with τ = 1, which shows almost 0%
we choose three different fixed-outline aspect ratios: τ = whitespace is reached.
1, 2, 3. All input/output (I/O) pads are scaled to the according 3) HB Benchmarks: We compare DeFer with PATOMA
boundary. On HB+ benchmarks, we use the defaulted fixed and Capo 10.5 on HB benchmarks. These circuits are gen-
outlines and I/O pad locations. By default every floorplanner erated from the IBM/ISPD98 suite containing both hard and
runs 100 times for each test case, and the results are averaged soft modules ranging from 500 to 2000, some of which are
over all successful runs. As PATOMA has internally fixed the big hard macros. Detailed statistics are listed in the second
hMetis seed, and produces the same result no matter how many column of Table IV. To get better runtime, wirelength and
times it runs, we run it only once. For other floorplanners, success rate, we run Capo 10.5 in “-SCAMPI” [23] mode.
the initial seed is the same as the index of each run. Parquet However, Capo 10.5 still takes a long time to finish one run for
4.5 runs in wirelength minimization mode. The parameters for each test case, so we only run it once with the defaulted seed.
other floorplanners are defaulted. For each type of benchmarks, To show its slowness, we also list the reported runtime for the
we finally normalize all results to DeFer’s results. unsuccessful runs. From Table IV, we can see that DeFer does
1) GSRC Hard-Block Benchmarks: These circuits contain not achieve 100% success rate for only one test case, and the
100, 200, and 300 hard modules. DeFer compares with six success rate is 2.33× and 8.33× higher than PATOMA and
floorplanners: Parquet 4.5 [4], FSA [6], IMF [8], IARFP [9], Capo 10.5. Capo 10.5 crashes on four test cases, and takes
PATOMA [14], and Capo 10.5 [5]. The maximum whitespace more than two days to finish one test case. Compared with
percentage γ = 10%. The results are summarized in Table II. PATOMA, DeFer is 28% better on average in HPWL, and 3×
For every test case DeFer reaches 100% success rate. DeFer faster. Compared with Capo 10.5, DeFer generates as much as
generates 27%, 14%, 14%, 3%, 25%, and 5% better HPWL 72% better HPWL with even 790× faster runtime. We also run
in 181×, 558×, 158×, 64×, 15%, and 222× faster runtime Parquet 4.5 on these circuits. However, it is so slow that even
than Parquet 4.5, FSA, IMF, IARFP, PATOMA, and Capo 10.5, running one test case once takes thousands of seconds. So for
respectively. DeFer consistently achieves the best HPWL and each test case, we only run it once instead of 100 times, but
best runtime on all 9 test cases, except for only one case (n100, none of the results fits into the fixed outline. Fig. 17(a)–(c)
τ = 3) DeFer generates 0.5% worse HPWL than IARFP. But shows the layouts generated by PATOMA, Capo 10.5, and
for that one DeFer is 41× faster than IARFP with 100% DeFer on circuit ibm03 with τ = 2.
success rate. Fig. 16(a)–(c) shows the layouts produced by 4) HB+ Benchmarks: DeFer compares with PATOMA and
DeFer on circuit n300 with τ = 1, 2, 3. Capo 10.5 on HB+ benchmarks. These circuits are generated
9 In the experiments, only one core was used. from HB benchmarks, while the biggest hard macro is inflated

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
378 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

TABLE IV
Comparison on HB Benchmarks [24] (γ = 10%)

Circuit #Soft./#Hard. Aspect PATOMA [14] Capo 10.5 [5] DeFer #Valid Point
/#Net. Ratio Suc% WL (e+06) Time (s) Suc% WL (e+06) Time (s) Suc% WL (e+06) Time (s) /#Total Point
665 1 100% 2.84 7.04 0% – 183 100% 2.66 1.44 16/1571
ibm01 /246 2 0% – – 0% – 977 100% 2.70 1.28 11/1482
/4236 3 100% 5.60 1.66 0% – 696 100% 2.82 1.30 12/1490
1200 1 0% – – 0% – 456 85% 6.55 14.48 6/2348
ibm02 /271 2 0% – – – – > 2 days 100% 6.21 3.33 7/1161
/7652 3 0% – – 0% – 3726 100% 6.29 3.52 10/1144
999 1 100% 12.59 5.42 100% 10.70 566 100% 8.77 3.60 59/2684
ibm03 /290 2 100% 12.94 5.58 100% 12.01 1874 100% 8.89 3.49 40/2503
/7956 3 0% – – 0% – 2028 100% 8.99 3.59 44/2630
1289 1 0% – – 0% – 2752 100% 8.94 3.04 4/1492
ibm04 /295 2 0% – – 100% 17.77 5253 100% 8.96 3.12 9/1514
/10 055 3 0% – – 100% 16.32 2262 100% 9.64 6.31 12/2685
564 1 100% 12.27 14.21 0% – 458 100% 12.61 3.55 46/3369
ibm05 /0 2 100% 12.60 13.68 0% – 358 100% 12.73 3.52 46/3371
/7887 3 100% 13.19 13.85 0% – 411 100% 13.45 3.53 46/3371
571 1 0% – – 0% – 235 100% 7.87 3.66 53/2187
ibm06 /178 2 0% – – 0% – 592 100% 7.76 3.66 41/2235
/7211 3 0% – — 0% – 2831 100% 8.91 3.60 36/2196
829 1 0% – – 0% – 1094 100% 13.81 3.87 12/1527
ibm07 /291 2 100% 24.64 7.85 0% – 1270 100% 13.91 4.48 22/1625
/11 109 3 100% 24.34 8.68 0% – 2274 100% 14.32 4.26 18/1590
968 1 0% – – 0% – 2527 100% 13.95 5.44 15/1333
ibm08 /301 2 0% – – 0% – 1110 100% 14.16 5.40 17/1290
/11 536 3 0% – – 0% – 1958 100% 14.43 5.55 19/1309
860 1 0% – – 0% – 2273 100% 12.85 2.60 3/1495
ibm09 /253 2 0% – – 0% – 2670 100% 12.57 3.77 17/1486
/11 008 3 0% – – 100% 34.48 6652 100% 12.98 3.54 14/1486
809 1 100% 48.47 21.71 0% – 2353 100% 33.25 11.63 9/2576
ibm10 /786 2 0% – – Crashed Crashed Crashed 100% 34.23 18.00 14/2897
/16 334 3 0% – – 100% 53.64 2014 100% 36.59 16.52 9/2725
1124 1 100% 20.87 33.87 0% – 8070 100% 21.99 4.84 12/2218
ibm11 /373 2 0% – – 0% – 4732 100% 22.13 4.96 8/2207
/16 985 3 0% – – 0% – 2245 100% 22.83 4.67 7/2174
582 1 0% – – 0% – 3085 100% 29.72 10.95 20/2909
ibm12 /651 2 0% – – 0% – 864 100% 31.53 7.71 18/3011
/11 873 3 0% – – 0% – 19 952 100% 32.16 4.59 8/1957
530 1 0% – – 0% – 3401 100% 25.92 6.03 12/2553
ibm13 /424 2 100% 43.81 9.84 0% – 3662 100% 25.46 3.79 10/2048
/14 202 3 0% – – 0% – 3201 100% 26.47 3.83 8/2095
1021 1 100% 71.87 23.59 0% – 4253 100% 50.83 9.69 30/2976
ibm14 /614 2 100% 55.99 35.65 0% – 10 373 100% 51.67 9.70 34/2971
/26 675 3 100% 61.65 35.12 0% – 4976 100% 53.71 9.70 36/2971
1019 1 0% – – 0% – 3634 100% 64.18 9.71 25/1651
ibm15 /393 2 0% – – 0% – 6827 100% 63.17 9.13 19/1580
/28 270 3 0% – – 0% – 2902 100% 66.06 9.46 20/1623
633 1 0% – – Crashed Crashed Crashed 100% 56.88 16.79 18/3823
ibm16 /458 2 100% 88.33 16.55 0% – 8928 100% 58.55 14.55 24/4833
/21 013 3 100% 98.77 22.94 0% – 11 675 100% 59.91 12.84 18/4093
682 1 100% 102.45 41.75 Crashed Crashed Crashed 100% 95.92 10.43 32/3253
ibm17 /760 2 100% 96.46 46.63 0% – 2250 100% 95.48 10.41 27/3252
/30 556 3 100% 98.18 42.45 Crashed Crashed Crashed 100% 100.82 10.42 29/3252
658 1 100% 50.28 38.24 0% – 1083 100% 49.12 7.93 42/3106
ibm18 /285 2 100% 49.74 39.15 0% – 4630 100% 49.29 7.97 41/3128
/21 191 3 100% 52.26 36.97 0% – 5262 100% 51.39 7.97 41/3128
Normalized 0.43 1.28 3.28 0.12 1.72 789.79 1 1 1

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 379

Fig. 17. Circuit ibm03 layouts generated by PATOMA, Capo 10.5, and DeFer (γ = 10% and τ = 2). (a) By PATOMA. (b) By Capo 10.5. (c) By DeFer.

TABLE V
Comparison on HB+ Benchmarks [23]

Circuit White- Aspect PATOMA [14] Capo 10.5 [5] DeFer #Valid Point
space γ Ratio Suc% WL (e+06) Time (s) Suc% WL (e+06) Time (s) Suc% WL (e+06) Time (s) /#Total Point
ibm01 26% 1 100% 4.67 4.44 – – > 2 days 100% 3.09 1.84 120/10 860
ibm02 25% 1 0% – – 100% 7.86 124 100% 6.17 15.28 45/3380
ibm03 30% 1 0% – – 100% 12.75 343 100% 9.19 4.01 102/5020
ibm04 25% 1 0% – – 100% 12.03 147 100% 10.26 14.15 63/5170
ibm06 25% 1 0% – – 100% 10.09 155 100% 8.78 5.01 84/3560
ibm07 25% 1 100% 16.38 23.41 100% 16.41 99 100% 15.48 4.55 12/3780
ibm08 26% 1 0% – – 100% 18.29 284 100% 18.73 19.25 106/5070
ibm09 25% 1 100% 16.62 25.45 100% 17.85 100 100% 16.66 4.22 12/3070
ibm10 20% 1 0% – – 100% 81.27 1685 100% 45.12 6.32 27/6880
ibm11 25% 1 100% 25.86 38.72 100% 28.26 149 100% 26.99 7.07 19/4150
ibm12 26% 1 0% – – 100% 52.46 126 100% 50.17 5.54 69/6880
ibm13 25% 1 100% 36.74 29.08 100% 40.22 299 100% 35.51 5.85 15/3860
ibm14 25% 1 100% 68.30 51.79 100% 73.89 410 100% 64.50 12.01 36/7870
ibm15 25% 1 0% – – 100% 92.79 474 100% 84.29 14.66 182/9900
ibm16 25% 1 100% 95.97 47.14 100% 153.02 595 100% 98.66 8.08 10/5770
ibm17 25% 1 100% 142.41 65.06 100% 146.03 440 100% 144.56 14.70 41/9540
ibm18 25% 1 100% 73.76 47.71 100% 75.92 224 100% 71.86 11.30 44/9160
Normalized 0.53 1.07 4.76 1.00 1.19 46.66 1 1 1

TABLE VI
Comparison on Linear Combination of HPWL and Area

Circuit Parquet 4.5 [4] DeFer


Area Whitespace% HPWL Area + HPWL Time (s) Area Whitespace% HPWL Area + HPWL Time (s)
n100 194 425 8.31% 235 070 429 495 13.66 191 164 6.50% 209 785 400 949 0.33
n200 191 191 8.82% 438 584 629 775 54.84 187 734 6.85% 374 676 562 410 0.74
n300 298 540 9.29% 628 422 926 962 108.70 291 385 6.67% 503 311 794 696 0.96
Normalized 1.02 1.32 1.18 1.12 76.24 1 1 1 1 1

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
380 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 3, MARCH 2010

TABLE VII
Estimation on Contributions of Main Techniques and Runtime Breakdown in DeFer

Algorithm Step Partitioning Combining Back-tracing Swapping Compacting Shifting


Main Technique Min-Cut TP EP Combination – Rough Detailed Mirroring Compaction Shifting
Wirelength Improvement Major Minor – – – Major Minor Minor Minor Minor
Packing Improvement Minor – Major Minor – – – – Minor —
GSRC Hard 29% 63% 0% 8% 0% 0%
Runtime GSRC Soft 35% 37% 0% 28% 0% 0%
Breakdown HB 52% 4% 0% 44% 0% 0%
HB+ 46% 3% 0% 51% 0% 0%

by 100% and the area of remaining soft macros are reduced X. Conclusion
to preserve the total cell area. As a result, the circuits become As the earliest stage of VLSI physical design, floorplanning
even harder to handle. Due to the same reason, we set has numerous impacts on the final performance of ICs. In
Capo 10.5 to “-SCAMPI” mode, and run it only once. The this paper, we have proposed a fast, high-quality, scalable and
results are shown in Table V. DeFer achieves the 100% success nonstochastic fixed-outline floorplanner DeFer.
rate on all circuits, which is 1.89× better than PATOMA. Based on the principle of Deferred Decision Making, DeFer
Capo 10.5 also achieves 100% success rate, expect for one outperforms all other state-of-the-art floorplanners in every
circuit it takes more than two days to finish. In terms of aspect. It is hard to accurately calculate how much each
the HPWL comparison, DeFer is 7% and 19% better than technique in DeFer contributes to the overall significant im-
PATOMA and Capo 10.5. DeFer is also 5× and 47× faster provement. But we do have a rough estimation in Table VII,
than PATOMA and Capo 10.5. in which we also show the runtime breakdown of DeFer for
Both HB and HB+ benchmarks are considered to be very each set of benchmarks. Note that, the DDM idea is the soul
hard to handle, because these circuits not only contain both of DeFer. Without it, those techniques cannot be integrated in
hard and soft modules, but also big hard macros. As far as we such a nice manner and produce promising results.
know, only the above floorplanners can handle these circuits. Such a high-quality and efficient floorplanner is expected
Obviously, DeFer reaches the best result. We also monitor to handle the increasing complexity of modern designs. The
the memory usage of DeFer on such large-scale circuits, and source code of DeFer and all benchmarks are publicly avail-
observe that the peak memory usage in DeFer is only 53 MB. able at [25]. In the future, we will integrate DeFer into
5) Analysis of Points in DeFer: In Tables II–V, for each placement tools to handle large-scale mixed-size designs.
test case we list the number of valid points (#VP) within
the fixed outline and the total number of points (#FP) in the
final shape curve. Both #VP and #FP are averaged over all
successful runs. We have three observations as follows. Acknowledgment
• As the circuit size grows, #FP increases. The authors would like to thank Dr. I. Markov, S. Chen,
• For the same circuit with various τ, ideally #FP should T.-C. Chen, and the University of California, Los Angeles
be the same. But they are actually different in some test Computer-Aided Design group, for their help with Capo 10.5,
cases. It is because high-level EP reconstructed the final IARFP, IMF, and PATOMA, respectively. They are also grate-
shape curve for some hard-to-pack instances. As you can ful to Dr. T.-C. Wang and the anonymous reviewers for their
see high-level EP can significantly increase #FP, e.g., helpful suggestions and comments on this paper.
ibm12 in Table IV, which means it improves packing
quite effectively.
• Sometimes while other algorithms cannot satisfy the
fixed-outline constraint, #VP of DeFer is more than 100, References
e.g., ibm15 in Table V. This shows DeFer’s superior [1] A. B. Kahng, “Classical floorplanning harmful?,” in Proc. Int. Symp.
packing ability. Phys. Design, 2000, pp. 207–213.
[2] R. H. J. M. Otten, “Efficient floorplan optimization,” in Proc. Int. Conf.
B. Experiments on Classical Outline-Free Floorplanning Comput. Design, 1983, pp. 499–502.
[3] S. N. Adya and I. L. Markov, “Fixed-outline floorplanning through better
For the classical outline-free floorplanning problem, as far local search,” in Proc. Int. Conf. Comput. Design, 2001, pp. 328–334.
as we know, only Parquet 4.5 can handle GSRC benchmarks, [4] S. N. Adya and I. L. Markov, “Fixed-outline floorplanning: Enabling
hierarchical design,” IEEE Trans. Very Large Scale Integrat. Syst.,
so we compare it with DeFer on GSRC Hard-Block bench- vol. 11, no. 6, pp. 1120–1135, Dec. 2003.
marks. The results are averaged over 100 runs. The objective [5] J. A. Roy, S. N. Adya, D. A. Papa, and I. L. Markov, “Min-cut
function is a linear combination of the HPWL and area, which floorplacement,” IEEE Trans. Comput.-Aided Design Integrat. Circuits
Syst., vol. 25, no. 7, pp. 1313–1326, Jul. 2006.
are equally weighted. We add “-minWL” to the Parquet 4.5 [6] T.-C. Chen and Y.-W. Chang, “Modern floorplanning based on B*-
command line. As shown in Table VI, DeFer produces 32% trees and fast simulated annealing,” IEEE Trans. Comput.-Aided Design
less whitespace than Parquet 4.5, with 18% less wirelength. Integrat. Circuits Syst., vol. 25, no. 4, pp. 637–650, Apr. 2006.
[7] Y. C. Wang, Y. W. Chang, G. M. Wu, and S. W. Wu, “B*-Tree: A
Overall, DeFer is 12% better in the total cost, and 76× faster new representation for nonslicing floorplans,” in Proc. Design Automat.
than Parquet 4.5. Conf., 2000, pp. 458–463.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.
YAN AND CHU: DEFER: DEFERRED DECISION MAKING ENABLED FIXED-OUTLINE FLOORPLANNING ALGORITHM 381

[8] T.-C. Chen, Y.-W. Chang, and S.-C. Lin, “A new multilevel framework [24] HB Floorplan Benchmarks [Online]. Available: http://cadlab.cs.ucla.edu/
for large-scale interconnect-driven floorplanning,” IEEE Trans. Comput.- cpmo/HBsuite.html
Aided Design Integrat. Circuits Syst., vol. 27, no. 2, pp. 286–294, Feb. [25] DeFer Source Code [Online]. Available: http://www.public.iastate.edu/∼
2008. zijunyan/
[9] S. Chen and T. Yoshimura, “Fixed-outline floorplanning: Enumerating
block positions and a new objective function for calculating area costs,”
Jackey Z. Yan received the B.S. degree in automa-
IEEE Trans. Comput.-Aided Design Integrat. Circuits Syst., vol. 27, no.
tion from the Huazhong University of Science and
5, pp. 858–871, May 2008.
Technology, Wuhan, China, in 2006. He is currently
[10] O. He, S. Dong, J. Bian, S. Goto, and C.-K. Cheng, “A novel fixed-
pursing the Ph.D. degree in computer engineering
outline floorplanner with zero deadspace for hierarchical design,” in
at the Department of Electrical and Computer Engi-
Proc. Int. Conf. Comput. Aided Design, 2008, pp. 16–23.
neering, Iowa State University, Ames.
[11] A. Ranjan, K. Bazargan, S. Ogrenci, and M. Sarrafzadeh, “Fast floor-
His research interests include very large scale
planning for effective prediction and construction,” IEEE Trans. Very
integration physical designs, specifically in algo-
Large Scale Integrat. Syst., vol. 9, no. 2, pp. 341–352, Apr. 2001.
rithms for floorplanning and placement, and physical
[12] P. G. Sassone and S. K. Lim, “A novel geometric algorithm for fast wire-
synthesis integrated system-on-a-chip designs.
optimized floorplanning,” in Proc. Int. Conf. Comput. Aided Design,
Mr. Yan’s work on fixed-outline floorplanning was
2003, pp. 74–80.
nominated for the Best Paper Award at the Design Automation Conference
[13] Y. Zhan, Y. Feng, and S. S. Sapatnekar, “A fixed-die floorplanning
in 2008. He received the Ultra-Excellent Student Award from Renesas
algorithm using an analytical approach,” in Proc. Asia South Pacific-
Technology Corp., Tokyo, Japan, in 2005.
Design Automat. Conf., 2006, pp. 771–776.
[14] J. Cong, M. Romesis, and J. R. Shinnerl, “Fast floorplanning by look-
ahead enabled recursive bipartitioning,” IEEE Trans. Comput.-Aided
Design Integrat. Circuits Syst., vol. 25, no. 9, pp. 1719–1732, Sep. 2006. Chris Chu received the B.S. degree from the Uni-
[15] J. Z. Yan and C. Chu, “DeFer: Deferred decision making enabled fixed- versity of Hong Kong, Hong Kong, in 1993, and
outline floorplanner,” in Proc. Design Automat. Conf., 2008, pp. 161– the M.S. and Ph.D. degrees from the University of
166. Texas, Austin, in 1994 and 1999, respectively, all in
[16] M. Lai and D. F. Wong, “Slicing tree is a complete floorplan represen- computer science.
tation,” in Proc. Design, Automat. Test Eur., 2001, pp. 228–232. Since 1999, Chu has been a Faculty with Iowa
[17] L. Stockmeyer, “Optimal orientations of cells in slicing floorplan de- State University, Ames. He is currently an Associate
signs,” Informat. Control, vol. 57, pp. 91–101, May–Jun. 1983. Professor with the Department of Electrical and
[18] G. Zimmerman, “A new area and shape function estimation technique Computer Engineering, Iowa State University. His
for very large scale integration layouts,” in Proc. Design Automat. Conf., research interests include computer-aided design of
1988, pp. 60–65. very large scale integration physical designs, and
[19] A. E. Dunlop and B. W. Kernighan, “A procedure for placement of design and analysis of algorithms.
standard-cell very large scale integration circuits,” IEEE Trans. Comput.- Dr. Chu received the IEEE Transactions on Computer-Aided Design
Aided Design Integrat. Circuits Syst., vol. 4, no. 1, pp. 92–98, Jan. 1985. of Integrated Circuits and Systems Best Paper Award in 1999 for
[20] G. Karypis and V. Kumar, “Multilevel k-way hypergraph partitioning,” his work in performance-driven interconnect optimization. He received the
in Proc. Design Automat. Conf., 1999, pp. 343–348. International Symposium on Physical Design (ISPD) Best Paper Award in
[21] S. Goto, “An efficient algorithm for the 2-D placement problem in 2004 for his work in efficient placement algorithm. He received the Bert Kay
electrical circuit layout,” IEEE Trans. Circuits Syst., vol. 28, no. 1, pp. Best Dissertation Award in 1998–1999 from the Department of Computer
12–18, Jan. 1981. Sciences, University of Texas. He has served on the technical program
[22] GSRC Floorplan Benchmarks [Online]. Available: http://vlsicad.eecs. committees of several major conferences including the Design Automation
umich.edu/BK/GSRCbench/ Conference, the International Conference on Computer-Aided Design, the
[23] A. Ng, I. L. Markov, R. Aggarwai, and V. Ramachandran, “Solving hard ISPD, the International Symposium on Circuits and Systems, the Design,
instances of floorplacement,” in Proc. Int. Symp. Phys. Design, 2006, Automation and Test in Europe, the Asia and South Pacific Design Automation
pp. 170–177. Conference, and the system level interconnect prediction.

Authorized licensed use limited to: Iowa State University. Downloaded on April 29,2010 at 19:31:34 UTC from IEEE Xplore. Restrictions apply.

You might also like