Module 2a

Module 2

Floor Planning, Placement and Routing


Module 2
• Floor planning and placement: Goals and objectives, Measurement
of delay in Floor planning, Floor planning tools, Channel definition,
I/O and Power planning and Clock planning.
• Placement: Goals and Objectives, Min-cut Placement algorithm,
Iterative Placement Improvement, Time driven placement methods,
Physical Design Flow.
• Routing: Global Routing: Goals and objectives, Global Routing
Methods, Global routing between blocks, Back annotation.
(Text Book 1)
Floorplanning
• Floorplanning is the process of placing blocks/macros in the chip area.
• The input to this step is the netlist, which describes the design, its
various blocks, and the interconnections between the blocks.
• The netlist is the logical description of the ASIC design. Floorplan is
the physical description of the ASIC design. In floorplanning we are
mapping logic description of the design with physical description.
• At the start of floorplanning we have a netlist describing circuit
blocks, the logic cells within the blocks, and their connections.
• For example, Figure 1 shows the Viterbi decoder example as a
collection of standard cells with no room set aside yet for routing.
• We can think of the standard cells as a hod of bricks to be made into a
wall. What we have to do now is set aside spaces (we call these
spaces the channels) for interconnect, the mortar, and arrange the
cells.
The starting point for the floorplanning and placement steps for the Viterbi decoder (containing only standard cells).
This is the initial display of the floorplanning and placement tool. The small boxes that look like bricks are the
outlines of the standard cells. The largest standard cells, at the bottom of the display (labeled dfctnb) are 188 D flip-
flops. The '+' symbols represent the drawing origins of the standard cells—for the D flip-flops they are shifted to the
left and below the logic cell bottom left-hand corner. The large box surrounding all the logic cells represents the
estimated chip size. (This is a screen shot from Cadence Cell Ensemble.)
The Viterbi Decoder (from Figure 16.1 ) after floorplanning and placement. There are 18 rows of standard cells
separated by 17 horizontal channels (labeled 2–18). The channels are routed as numbered. In this example, the
I/O pads are omitted to show the cell placement more clearly.
• Figure 3 shows that both interconnect delay and gate delay decrease
as we scale down feature sizes—but at different rates. This is because
interconnect capacitance tends to a limit of about 2 pF/cm for a
minimum-width wire while gate delay continues to decrease.
Floorplanning allows us to predict this interconnect delay by
estimating interconnect length.
Goals and Objectives
• The goals of floorplanning are to:
• arrange the blocks on a chip
• decide the location of the I/O pads
• decide the location and number of the power pads,
• decide the type of power distribution, and
• decide the location and type of clock distribution.
• The objectives of floorplanning are to minimize the
chip area and minimize delay
Measurement of Delay in Floorplanning
• Throughout the ASIC design process we need to predict the performance of
the final layout.
• In floorplanning we wish to predict the interconnect delay before we
complete any routing.
• To predict delay we need to know the parasitics associated with
interconnect: the interconnect capacitance (wiring capacitance or routing
capacitance) as well as the interconnect resistance.
• At the floorplanning stage we know only the fanout (FO) of a net (the
number of gates driven by a net) and the size of the block that the net
belongs to.
• We cannot predict the resistance of the various pieces of the interconnect
path since we do not yet know the shape of the interconnect for a net.
• However, we can estimate the total length of the interconnect and
thus estimate the total capacitance.
• We estimate interconnect length by collecting statistics from
previously routed chips and analyzing the results.
• From these statistics we create tables that predict the interconnect
capacitance as a function of net fanout and block size.
• A floorplanning tool can then use these predicted-capacitance tables
(also known as interconnect-load tables or wire-load tables).
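A wire-load table lookup can be sketched as follows. This is only an illustration: the table values, the block-size and fanout breakpoints, and the function name predicted_capacitance are all made up here, whereas a real table is built from statistics on previously routed chips.

```python
from bisect import bisect_left

# Hypothetical wire-load table: predicted net capacitance in pF, indexed by
# block size (row) and net fanout (column). All numbers are invented for
# illustration only.
BLOCK_SIZES_MM = [1.0, 2.0, 4.0]      # block edge length in mm
FANOUTS = [1, 2, 4, 8]
CAP_PF = [
    [0.02, 0.04, 0.07, 0.12],         # blocks up to 1 mm
    [0.03, 0.06, 0.11, 0.20],         # blocks up to 2 mm
    [0.05, 0.10, 0.18, 0.33],         # blocks up to 4 mm
]

def predicted_capacitance(block_size_mm, fanout):
    """Look up the predicted capacitance, rounding both inputs up to the
    next table entry and clamping at the last row and column."""
    row = min(bisect_left(BLOCK_SIZES_MM, block_size_mm), len(BLOCK_SIZES_MM) - 1)
    col = min(bisect_left(FANOUTS, fanout), len(FANOUTS) - 1)
    return CAP_PF[row][col]

# A net with fanout 3 in a 1.5 mm block uses the (2 mm, FO = 4) entry.
print(predicted_capacitance(1.5, 3))
```

Rounding up to the next entry is a pessimistic choice made for this sketch; a tool could equally interpolate between entries.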
Floorplanning Tools

Floorplanning a cell-based ASIC. (a) Initial floorplan generated by the floorplanning tool. Two of the blocks are
flexible (A and C) and contain rows of standard cells (unplaced). A pop-up window shows the status of block A.
(b) An estimated placement for flexible blocks A and C. The connector positions are known and a rat’s nest
display shows the heavy congestion below block B. (c) Moving blocks to improve the floorplan. (d) The updated
display shows the reduced congestion after the changes.
Congestion analysis. (a) The initial floorplan with a 2:1.5 die aspect ratio. (b) Altering the floorplan to give a 1:1 chip
aspect ratio. (c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we know the
terminal positions in the channels. Shading indicates the ratio of channel density to the channel capacity. Dark areas
show regions that cannot be routed because the channel congestion exceeds the estimated capacity. (d) Resizing
flexible blocks A and C alleviates congestion.
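The shading in a congestion map is the ratio of channel density to channel capacity. A minimal sketch of that calculation, assuming hypothetical channel names and made-up density and capacity figures:

```python
def congestion_map(density, capacity):
    """Return the density/capacity ratio for every channel."""
    return {ch: density[ch] / capacity[ch] for ch in density}

def unroutable(ratios):
    """Channels whose estimated density exceeds their capacity (ratio > 1)."""
    return sorted(ch for ch, ratio in ratios.items() if ratio > 1.0)

# Hypothetical figures: interconnects required per channel versus tracks available.
density = {"ch1": 12, "ch2": 30, "ch3": 18}
capacity = {"ch1": 20, "ch2": 24, "ch3": 18}

ratios = congestion_map(density, capacity)
print(ratios)
print(unroutable(ratios))   # only ch2 exceeds its capacity
```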
Channel Definition
• During the floorplanning step we assign the areas between blocks
that are to be used for interconnect. This process is known as channel
definition or channel allocation
Defining the channel routing order for a slicing floorplan using a slicing tree. (a) Make a cut all the way across the
chip between circuit blocks. Continue slicing until each piece contains just one circuit block. Each cut divides a piece
into two without cutting through a circuit block. (b) A sequence of cuts: 1, 2, 3, and 4 that successively slices the chip
until only circuit blocks are left. (c) The slicing tree corresponding to the sequence of cuts gives the order in which to
route the channels: 4, 3, 2, and finally 1.
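One way to read the routing order off a slicing tree is a post-order traversal: the channels inside each piece are routed before the channel that separates the pieces. The sketch below assumes a hypothetical tree with five blocks named A to E (the block names and tree shape are not taken from the figure).

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Block:
    name: str

@dataclass
class Cut:
    number: int        # cut label; also names the channel the cut creates
    left: "Node"
    right: "Node"

Node = Union[Block, Cut]

def routing_order(node):
    """Post-order traversal: children first, then the cut that joins them."""
    if isinstance(node, Block):
        return []
    return routing_order(node.left) + routing_order(node.right) + [node.number]

# Hypothetical slicing tree: cut 1 splits off A, cut 2 splits off B, and so on.
tree = Cut(1, Block("A"),
           Cut(2, Block("B"),
               Cut(3, Block("C"),
                   Cut(4, Block("D"), Block("E")))))

print(routing_order(tree))   # [4, 3, 2, 1]
```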
Figure shows a floorplan that is not a slicing structure. We cannot cut the chip all the way across with a knife
without chopping a circuit block in two. This means we cannot route any of the channels in this floorplan
without routing all of the other channels first. We say there is a cyclic constraint in this floorplan. There are
two solutions to this problem. One solution is to move the blocks until we obtain a slicing floorplan. The other
solution is to allow the use of L-shaped, rather than rectangular, channels (or a switch box, an area with fixed
connectors on all sides). We need an area-based router rather than a channel router to route L-shaped
regions or switch boxes.
Channel definition and ordering.
(a) We can eliminate the cyclic constraint by merging the blocks A
and C.
(b) A slicing structure.
I/O and Power Planning
• Every chip communicates with the outside world.
• Signals flow onto and off the chip and we need to supply power. We
need to consider the I/O and power constraints early in the
floorplanning process.
• A silicon chip or die (plural die, dies, or dice) is mounted on a chip
carrier inside a chip package.
• Connections are made by bonding the chip pads to fingers on a metal
lead frame that is part of the package.
• The metal lead-frame fingers connect to the package pins.
FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of pads determines
the die size. (b) A core-limited die: The core logic determines the die size. (c) Using both pad-limited
pads and core-limited pads for a square die.
Bonding pads
(a) This chip uses both pad-
limited and core-limited pads.
(b) A hybrid corner pad.
(c) A chip with stagger-bonded
pads.
(d) An area-bump bonded chip
(or flip-chip). The chip is turned
upside down and solder bumps
connect the pads to the lead
frame
Gate-array I/O pads. (a) Cell-based ASICs may contain pad cells of
different sizes and widths. (b) A corner of a gate-array base. (c) A gate-
array base with different I/O cell and pad pitches.
Power Distribution
• Power distribution.
• (a) Power distributed using m1 for VSS and m2 for VDD. This helps
minimize the number of vias and layer crossings needed but causes
problems in the routing channels.
• (b) In this floorplan m1 is run parallel to the longest side of all
channels, the channel spine. This can make automatic routing easier
but may increase the number of vias and layer crossings.
• (c) An expanded view of part of a channel (interconnect is shown as
lines). If power runs on different layers along the spine of a channel,
this forces signals to change layers.
• (d) A closeup of VDD and VSS buses as they cross. Changing layers
requires a large number of via contacts to reduce resistance
Clock planning
• Clock distribution.
• (a) A clock spine for a gate array.
• (b) A clock spine for a cell-based ASIC (typical chips have thousands of
clock nets).
• (c) A clock spine is usually driven from one or more clock-driver cells.
Delay in the driver cell is a function of the number of stages and the
ratio of output to input capacitance for each stage (taper).
• (d) Clock latency and clock skew. We would like to minimize both
latency and skew.
• A clock tree.
• (a) Minimum delay is achieved when the taper of successive stages is
about 3.
• (b) Using a fanout of three at successive nodes.
• (c) A clock tree for a cell-based ASIC. We have to balance the clock
arrival times at all of the leaf nodes to minimize clock skew
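The "taper of about 3" figure can be checked numerically under a simple model. The sketch below assumes that each stage's delay is proportional to its taper and ignores the parasitic delay of the stage itself; the capacitance ratio of 1000 and the function name chain_delay are illustrative choices, not values from the text.

```python
import math

def chain_delay(taper, cap_ratio):
    """Relative delay of a driver chain: (number of stages) x (delay per stage).

    Assumed model: the number of stages is ln(cap_ratio) / ln(taper) and
    each stage contributes a delay proportional to its taper.
    """
    stages = math.log(cap_ratio) / math.log(taper)
    return stages * taper

cap_ratio = 1000.0                               # hypothetical C_out / C_in
tapers = [1.5 + 0.01 * i for i in range(500)]    # sweep 1.5 to 6.49
best = min(tapers, key=lambda t: chain_delay(t, cap_ratio))

print(round(best, 2))                            # close to e, about 2.72
print(chain_delay(3.0, cap_ratio) / chain_delay(best, cap_ratio))
```

Under this model the optimum is at a taper of e, and a taper of 3 costs well under 1% extra delay, which is consistent with the rule of thumb. Including the parasitic delay of each stage would move the optimum somewhat higher.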
Placement
• After completing a floorplan we can begin placement of the logic cells
within the flexible blocks.
• Placement is much more suited to automation than floorplanning.
Thus we shall need measurement techniques and algorithms.
• After we complete floorplanning and placement, we can predict both
intrablock and interblock capacitances.
• This allows us to return to logic synthesis with more accurate
estimates of the capacitive loads that each logic cell must drive.
Goals and Objectives
• The goal of a placement tool is to arrange all the logic cells within the
flexible blocks on a chip.
• Ideally, the objectives of the placement step are to
• Guarantee the router can complete the routing step
• Minimize all the critical net delays
• Make the chip as dense as possible
• Minimize power dissipation
• Minimize cross talk between signals
Min-cut placement algorithm
• The min-cut placement method uses successive application of
partitioning [Breuer, 1977].
• The following steps are shown in Figure
• 1. Cut the placement area into two pieces.
• 2. Swap the logic cells to minimize the cut cost.
• 3. Repeat the process from step 1, cutting smaller pieces until all the
logic cells are placed.
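The three steps above can be sketched as a recursive bisection. This is a minimal illustration, not the published algorithm: the greedy pair-swap used to reduce the cut cost, the alternation of vertical and horizontal cuts, and the function names are all assumptions, and connections to cells outside the current piece are ignored.

```python
from itertools import product

def cut_cost(nets, left, right):
    """Count nets that have at least one cell on each side of the cut."""
    left, right = set(left), set(right)
    return sum(1 for net in nets if left & set(net) and right & set(net))

def bipartition(cells, nets):
    """Split the cells in half, then greedily swap pairs while the cut cost drops."""
    half = len(cells) // 2
    left, right = list(cells[:half]), list(cells[half:])
    improved = True
    while improved:
        improved = False
        best = cut_cost(nets, left, right)
        for i, j in product(range(len(left)), range(len(right))):
            left[i], right[j] = right[j], left[i]
            cost = cut_cost(nets, left, right)
            if cost < best:
                best, improved = cost, True
            else:
                left[i], right[j] = right[j], left[i]   # undo the swap
    return left, right

def min_cut_place(cells, nets, x0, x1, y0, y1, vertical=True):
    """Recursively cut the area; each cell ends at the centre of its final bin."""
    if len(cells) <= 1:
        return {c: ((x0 + x1) / 2, (y0 + y1) / 2) for c in cells}
    left, right = bipartition(cells, nets)
    if vertical:
        xm = (x0 + x1) / 2
        placed = min_cut_place(left, nets, x0, xm, y0, y1, False)
        placed.update(min_cut_place(right, nets, xm, x1, y0, y1, False))
    else:
        ym = (y0 + y1) / 2
        placed = min_cut_place(left, nets, x0, x1, y0, ym, True)
        placed.update(min_cut_place(right, nets, x0, x1, ym, y1, True))
    return placed

# Hypothetical netlist: a connects to c, b connects to d.
cells = ["a", "b", "c", "d"]
nets = [("a", "c"), ("b", "d")]
pos = min_cut_place(cells, nets, 0, 2, 0, 2)
print(pos)   # connected cells end up on the same side of the first cut
```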
Iterative Placement Algorithm
• An iterative placement improvement algorithm takes an existing placement
and tries to improve it by moving the logic cells.
• There are two parts to the algorithm:
• The selection criterion, which decides which logic cells to try moving.
• The measurement criterion, which decides whether to move the selected
cells.
• There are several iterative or interchange methods
• Pairwise interchange
• Force directed interchange
• Force directed relaxation
• Force directed pairwise relaxation
• The pairwise-interchange algorithm proceeds as follows:
• 1. Select the source logic cell at random.
• 2. Try all the other logic cells in turn as the destination logic cell.
• 3. Use any of the measurement methods we have discussed to
decide on whether to accept the interchange.
• 4. The process repeats from step 1, selecting each logic cell in turn as
a source logic cell.
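The four steps of pairwise interchange can be sketched as below. The measurement used here is total half-perimeter wire length, chosen as one possible measurement method; the cell names, coordinates, nets, and function names are hypothetical.

```python
import random

def half_perimeter(net, pos):
    """Half the perimeter of the bounding box enclosing a net's cells."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_length(nets, pos):
    return sum(half_perimeter(net, pos) for net in nets)

def pairwise_interchange(pos, nets, seed=0):
    """One pass: each cell in turn (in random order) is the source, and
    every other cell is tried as the destination of a swap."""
    pos = dict(pos)                     # work on a copy
    rng = random.Random(seed)
    sources = list(pos)
    rng.shuffle(sources)
    for src in sources:
        for dst in pos:
            if dst == src:
                continue
            before = total_length(nets, pos)
            pos[src], pos[dst] = pos[dst], pos[src]
            if total_length(nets, pos) >= before:
                pos[src], pos[dst] = pos[dst], pos[src]   # reject: swap back
    return pos

# Hypothetical starting placement: four cells in a row.
start = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
nets = [("a", "d"), ("b", "c")]
result = pairwise_interchange(start, nets)
print(total_length(nets, start), total_length(nets, result))
```

Because a swap is kept only when it shortens the total length, a pass can never make the placement worse, but it can stop at a local minimum.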

Time Driven placement methods
• Minimizing delay is becoming more and more important as a placement
objective.
• There are two main approaches: net based and path based. We know that
we can use net weights in our algorithms.
• The problem is to calculate the weights. One method finds the n most
critical paths (using a timing-analysis engine, possibly in the synthesis tool).
• The net weights might then be the number of times each net appears in
this list.
• The problem with this approach is that as soon as we fix (for example) the
first 100 critical nets, suddenly another 200 become critical. This is rather
like trying to put worms in a can—as soon as we open the lid to put one in,
two more pop out.
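The net-weighting scheme described above (weight equals the number of appearances in the list of the n most critical paths) can be sketched as follows. The path list is hypothetical; in practice it would come from a timing-analysis engine.

```python
from collections import Counter

def net_weights(critical_paths):
    """Weight of a net = number of critical paths on which it appears."""
    weights = Counter()
    for path in critical_paths:
        weights.update(set(path))   # count each net once per path
    return weights

# Hypothetical list of the three most critical paths, each a list of net names.
paths = [
    ["n1", "n4", "n7"],
    ["n2", "n4", "n7"],
    ["n3", "n4"],
]

weights = net_weights(paths)
print(weights.most_common())   # n4 has weight 3, n7 has weight 2, the rest 1
```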
Physical Design Flow
• 1. Design entry. The input is a logical description with no physical information.
• 2. Synthesis. The initial synthesis contains little or no information on any
interconnect loading. The output of the synthesis tool (typically an EDIF netlist) is
the input to the floorplanner.
• 3. Initial floorplan. From the initial floorplan interblock capacitances are input to
the synthesis tool as load constraints and intrablock capacitances are input as wire-
load tables.
• 4. Synthesis with load constraints. At this point the synthesis tool is able to
resynthesize the logic based on estimates of the interconnect capacitance each gate
is driving. The synthesis tool produces a forward annotation file to constrain path
delays in the placement step.
• 5. Timing-driven placement. After placement using constraints from the
synthesis tool, the location of every logic cell on the chip is fixed and accurate
estimates of interconnect delay can be passed back to the synthesis tool.
• 6. Synthesis with in-place optimization (IPO). The synthesis tool changes the
drive strength of gates based on the accurate interconnect delay estimates from the
floorplanner without altering the netlist structure.
• 7. Detailed placement. The placement information is ready to be input to the
routing step.
Routing
• Once the designer has floorplanned a chip and the logic cells within
the flexible blocks have been placed, it is time to make the
connections by routing the chip.
• This is still a hard problem that is made easier by dividing it into
smaller problems.
• Routing is usually split into global routing followed by detailed routing.
• Figure 1 shows the core of the Viterbi decoder after the placement
step. This implementation consists entirely of standard cells (18
rows). The I/O pads are not included in this example—we can route
the I/O pads after we route the core (though this is not always a good
idea).
• Figure 2 shows the Viterbi decoder chip after global and detailed
routing. The routing runs in the channels between the rows of logic
cells, but the individual interconnections are too small to see.
FIGURE 1 The core of the Viterbi decoder chip after placement (a screen shot from Cadence Cell Ensemble).
This is the same placement but without the channel labels. You can see the rows of standard cells; the widest
cells are the D flip-flops.
FIGURE 2 The core of the Viterbi decoder chip after the completion of global and detailed routing (a
screen shot from Cadence Cell Ensemble). This chip uses two-level metal. Although you cannot see
the difference, m1 runs in the horizontal direction and m2 in the vertical direction
Global Routing
• A global router does not make any connections; it just plans them.
We typically global route the whole chip (or large pieces if it is a large
chip) before detail routing the whole chip (or the pieces).
• There are two types of areas to global route: inside the flexible blocks
and between blocks (the Viterbi decoder, although a cell-based ASIC,
only involved the global routing of one large flexible block).
Goals and Objectives
• The input to the global router is a floorplan that includes the locations
of all the fixed and flexible blocks; the placement information for
flexible blocks; and the locations of all the logic cells.
• The goal of global routing is to provide complete instructions to the
detailed router on where to route every net.
• The objectives of global routing are one or more of the following:
• Minimize the total interconnect length.
• Maximize the probability that the detailed router can complete the routing.
• Minimize the critical path delay.
Global Routing Methods
• Global routing cannot use the interconnect-length approximations,
such as the half-perimeter measure, that were used in placement.
• What is needed is the actual path, not an approximation to the path
length.
• One approach to global routing takes each net in turn and calculates
the shortest path using tree-on-graph algorithms, with the added
restriction of using the available channels. This process is known as
sequential routing.
• As a sequential routing algorithm proceeds, some channels will
become more congested since they hold more interconnects than
others.
• In FPGAs and channeled gate arrays, the channels have a fixed
channel capacity and can hold only a certain number of interconnects.
• There are two ways that a global router handles this problem:
order-independent routing and order-dependent routing.
• Under order-independent routing, a global router proceeds by routing
each net, ignoring how crowded the channels are.
• Whether a particular net is processed first or last does not matter; the
channel assignment will be the same.
• In order-independent routing, after all the interconnects are assigned
to channels, the global router returns to the channels that are the
most crowded and reassigns some interconnects to other, less
crowded channels.
• Under order-dependent routing, the global router considers the number
of interconnects already placed in each channel as it proceeds, so the
order in which the nets are processed affects the result.
• Hierarchical routing handles all nets at a particular level at once.
• Rather than handling all of the nets on the chip at the same time, the
global routing problem is made more tractable by dividing chip area
into levels of hierarchy
• The area of the chip is recursively divided into smaller and
smaller regions, until the routing problem within a region
can be handled by a simple optimal router. Then the
adjacent regions are successively pasted together to
obtain the routing of the entire chip.
• By considering only one level of hierarchy at a time, the size of the
problem is reduced at each level.
• There are two ways to traverse the levels of hierarchy: starting at the
whole chip, or highest level, and proceeding down to the logic cells is
the top-down approach.
• The bottom up approach starts at the lowest level and globally routes
the smallest area first.
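The order-independent scheme described earlier in this section (assign every net while ignoring congestion, then return to the most crowded channels and reassign) can be sketched as below. This is a strong simplification: each net is assumed to occupy a single channel chosen from a hypothetical preference list, and the net names, channel names, and capacities are made up.

```python
def global_route(candidates, capacity):
    """candidates: {net: [channels, most preferred first]}
    capacity:   {channel: maximum number of interconnects}"""
    # Phase 1: every net takes its preferred channel, ignoring congestion,
    # so the result does not depend on the order in which nets are visited.
    assign = {net: chans[0] for net, chans in candidates.items()}
    load = {ch: 0 for ch in capacity}
    for ch in assign.values():
        load[ch] += 1

    # Phase 2: revisit the channels, most overfull first, and move nets
    # to an alternative channel that still has spare capacity.
    by_overflow = sorted(capacity, key=lambda c: load[c] - capacity[c], reverse=True)
    for ch in by_overflow:
        for net in sorted(n for n, c in assign.items() if c == ch):
            if load[ch] <= capacity[ch]:
                break
            for alt in candidates[net][1:]:
                if load[alt] < capacity[alt]:
                    assign[net] = alt
                    load[ch] -= 1
                    load[alt] += 1
                    break
    return assign, load

# Hypothetical example: three nets prefer channel c1, which holds only two.
candidates = {
    "n1": ["c1", "c2"],
    "n2": ["c1", "c2"],
    "n3": ["c1"],
    "n4": ["c2", "c1"],
}
capacity = {"c1": 2, "c2": 2}

assign, load = global_route(candidates, capacity)
print(assign)
print(load)
```

An order-dependent router would instead check the channel loads while making the first assignment, so the result would change with the order of the nets.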
Back annotation
• Once global routing is complete it is possible to accurately predict
what the length of each interconnect in every net will be after
detailed routing.
• The global router can give us not just an estimate of the total net
length but the resistance and capacitance of each path in each net.
• This RC information is used to calculate net delays. We can back-
annotate this net delay information to the synthesis tool for in-place
optimization or to a timing verifier to make sure there are no timing
surprises.
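One common way to turn per-segment RC values into a net delay is the Elmore delay. The text does not say which delay model is used, so the model here is an assumption, as are the segment values and the function name. Each segment is treated as a resistance followed by a capacitance at its far end.

```python
def elmore_delay(segments, load_cap):
    """Elmore delay of one path from driver to sink.

    segments: list of (resistance in ohms, capacitance in pF), driver first.
    load_cap: input capacitance of the driven gate in pF.
    Returns the delay in ps (ohm x pF = ps).
    """
    delay = 0.0
    downstream = load_cap + sum(c for _, c in segments)
    for r, c in segments:
        delay += r * downstream   # this resistance charges everything after it
        downstream -= c           # later resistances do not see this capacitance
    return delay

# Hypothetical path: two routed segments driving a 0.05 pF gate input.
segments = [(100.0, 0.10), (200.0, 0.20)]
delay_ps = elmore_delay(segments, load_cap=0.05)
print(delay_ps)   # 100 * 0.35 + 200 * 0.25, about 85 ps
```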
