Cts
Table of Contents
1.0 Introduction
2.0 Complex Clocking Clock Tree Synthesis
2.1 Clock Source Constraints during CTS
2.2 Non-Stop Pins Methodology
2.3 Results Comparison using Non-Stop Pins Methodology
3.0 Non-UPF Multiple Power Domains Clock Tree Synthesis
3.1 Multi-iteration Compilation CTS Flow
3.1.1 Component Definition in Multi-iteration Compilation CTS Flow
3.1.2 Process in Multi-iteration Compilation CTS flow
3.2 Voltage Area Awareness Multiple Power Domain Clock Tree Synthesis
4.0 High Fanout Net Synthesis Using ICC CTS Technology
4.1 Single Level Top-Mode CTS Flow
5.0 Conclusions and Recommendations
6.0 Acknowledgements
7.0 References
Table of Figures
Figure 1 - IC Compiler Design Flow
Figure 2 - Complex Clock structural with multiplexer and generated clock
Figure 3 - IC Compiler Design Flow with Different Clock Sources
Figure 4 - Complex Clock structural with cascaded multiplexer and generated clock
Figure 5 - Defining all components in Multi-iteration CTS flow
Figure 6 - Multi-iteration Compilation CTS flow
Figure 7 - Well crossing buffer insertion flow
Figure 8 - High-Fan-Out Net that driving load which crossing power gated domain
Figure 9a - compile_clock_tree -high_fanout_net buffer tree path
Figure 9b - Single level top mode CTS buffer tree path
Figure 10 - The flow chart of Single level top mode CTS flow
Table 1 - Results comparison of Non-Stop Pins Methodology
Table 2 - Results comparison on Well-crossing buffer auto insertion flow
Table 3 - DRC violations comparison on Well-crossing buffer auto insertion flow
Table 4 - Results comparison between with and without single level top mode CTS flow
1.0 Introduction
In the Synopsys IC Compiler tool suite, physical design implementation follows a standard flow:
placement and optimization, clock tree synthesis (CTS), then routing and postroute optimization.
IC Compiler provides three core commands for these stages: place_opt, clock_opt and route_opt,
respectively.
In today's SoC designs, however, clocking architecture complexity has increased tremendously.
Clock multiplexers, generated clocks and multiple clocks per register have become more common
with each generation of VLSI design. A complex clocking architecture makes it difficult to tune
the design constraints so that a common clock definition serves both placement optimization
(the place_opt stage) and clock tree synthesis (the clock_opt stage).
Now that UPF has become an IEEE standard, more and more designs will migrate to the UPF flow.
For legacy designs that still use a non-UPF flow, designing a chip with multiple power domains
remains a great challenge.
This paper describes the method used to handle a complex clocking architecture during the CTS
phase. We also present a non-UPF clock tree synthesis methodology used on a design with
multiple power well domains. We discuss in detail how to handle a clock net that crosses
different levels of the well hierarchy and how to balance a clock tree between two different
well types. Finally, we present a strategy for high fanout net synthesis on an irregular
floorplan using IC Compiler CTS technology.
[Figure: IC Compiler design flow (Figures 1 and 3). Inputs: mapped design, timing constraints, logical and physical libraries. Output: completed design.]
When the design has completed place_opt and moves to the clock_opt phase, a problem appears:
clkB is not defined, so ICC does not run clock tree compilation on clkB. To enable clock tree
synthesis on all the clocks in the design, every clock must be defined. Defining every clock
from the start, however, increases run time during the placement optimization stage. We
therefore propose defining a different set of clock sources during the clock_opt phase. This
creates a new set of clock definitions that is optimized for CTS purposes only. Ideally, the
ICC tool would allow the user to define a separate clock constraint for each clock root during
CTS when the design has a complex clock architecture. This would avoid the hazard of removing
all the clocks from the design and re-defining them during the CTS stage. The current solution
is to remove all the existing clocks in the design and redefine all the clock sources, which
enables ICC CTS to run correctly on all the clock definitions. As shown in Figure 3, a complete
clock source definition is created during the clock_opt step of the IC Compiler recommended
flow to ensure that all the clock trees are synthesized properly.
The following example illustrates what was done for the design with the complex clock
structure shown in Figure 2.
During place_opt, the following clock constraints were defined.
create_clock -name clkA -period 1000 [get_ports clkA]
create_generated_clock -name genclkA -source [get_ports clkA] -divide_by 2 -add \
-master_clock [get_clocks {clkA}] [get_pins instgenclk/Q]
During clock_opt (the clock tree synthesis phase), the following clock constraints are defined.
remove_clock -all
create_clock -name clkA -period 1000 [get_ports clkA]
create_clock -name clkB -period 5000 [get_ports clkB]
create_generated_clock -name genclkA -source [get_ports clkA] -divide_by 2 -add \
-master_clock [get_clocks {clkA}] [get_pins instgenclk/Q]
create_generated_clock -name genclkB -source [get_ports clkB] -divide_by 2 -add \
-master_clock [get_clocks {clkB}] [get_pins instgenclk/Q]
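One way to keep the two constraint sets from drifting apart is to hold them in a single script and select by stage. The following is a minimal sketch only: the procedure name apply_clock_constraints and its stage argument are hypothetical, while the clock names, periods, ports and pins are those of the example above.

```tcl
# Hypothetical helper: apply the clock constraints for one flow stage.
# Clock names, periods, ports and pins are taken from the example above.
proc apply_clock_constraints {stage} {
    # Remove existing clocks first so the two constraint sets never mix.
    # Note: this is the "remove and redefine" hazard discussed in the text.
    remove_clock -all

    # Common to both stages: clkA and the generated clock derived from it.
    create_clock -name clkA -period 1000 [get_ports clkA]
    create_generated_clock -name genclkA -source [get_ports clkA] -divide_by 2 -add \
        -master_clock [get_clocks {clkA}] [get_pins instgenclk/Q]

    if {$stage eq "clock_opt"} {
        # CTS needs every clock root defined, including the second MUX input.
        create_clock -name clkB -period 5000 [get_ports clkB]
        create_generated_clock -name genclkB -source [get_ports clkB] -divide_by 2 -add \
            -master_clock [get_clocks {clkB}] [get_pins instgenclk/Q]
    }
}
```

Under these assumptions, the script would call apply_clock_constraints place_opt before placement optimization and apply_clock_constraints clock_opt before clock tree synthesis.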
The example above shows a simplified case. As a design becomes more complicated, the number of
clock sources and generated clocks that must be created increases. Figure 4 shows another
design with a very complicated clocking architecture: it has cascaded generated clocks and two
cascaded multiplexers. To make sure the CTS tool recognizes all the clocks that need a clock
tree, we must define multiple clock sources for each generated clock. The clock definition for
clock tree synthesis is as follows:
create_clock -name clkA -period 1000 [get_ports clkA]
create_clock -name clkB -period 5000 [get_ports clkB]
SNUG Singapore 2009
Figure 4: Complex Clock structural with cascaded multiplexer and generated clock
2.2 Non-Stop Pins Methodology
Because generated clock creation can become complicated for a complex clock architecture, we
propose a non-stop pin approach to overcome this problem. The IC Compiler CTS flow provides
clock tree exception attributes that the user can set on specified pins in the design. To make
the tool build the tree through the generated clock flops instead of stopping at the generated
clock pin, we specify a non-stop attribute on the generated clock pin. This effectively
eliminates all generated clock creation from the clock constraints.
For the design in Figure 4, the clock constraints are as follows:
create_clock -name clkA -period 1000 [get_ports clkA]
create_clock -name clkB -period 5000 [get_ports clkB]
set_clock_tree_exceptions -non_stop_pins [get_pins instgenclkC/CK]
set_clock_tree_exceptions -non_stop_pins [get_pins instgenclkD/CK]
This takes care of the clock sources from both clkA and clkB, because the generated clock flops
are now traversed during clock tree compilation. It also reduces the number of clock
constraints that must be defined as the complexity of the clocking architecture increases.
Again, this approach is design dependent and applies only to designs in which the generated
clock comes from both input pins of the MUX.
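When many generated clock flops need the exception, the two set_clock_tree_exceptions commands above generalize to a loop. This is a minimal sketch, assuming the instance names are collected by hand in a Tcl list; the variable name gen_clk_flops is hypothetical, and CK is the clock pin name used in the example above.

```tcl
# Hypothetical list of generated clock flop instances (names from the example above).
set gen_clk_flops {instgenclkC instgenclkD}

foreach inst $gen_clk_flops {
    # Let CTS trace through the flop's clock pin instead of stopping at it.
    set_clock_tree_exceptions -non_stop_pins [get_pins ${inst}/CK]
}
```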
2.3 Results Comparison using Non-Stop Pins Methodology
We ran the clock tree compilation on the design with the very complex clocking architecture as
illustrated in figure 1 and figure 4.
Table 1: Results comparison of Non-Stop Pins Methodology
Metric                 Create All Clock Sources   Using Non-Stop Pins
Skew                   1x                         0.88x less
Total Clock buffers    Same                       Same
Total DRC violations   1x                         0.95x less
Longest Path           Same                       Same
Runtime                1x                         0.84x faster
Table 1 shows the detailed experiment results. The results show that the non-stop pins
approach simplified the clock tree compilation flow and improved the CTS QoR. The overall
clock skew is better with the non-stop pins approach. The total number of clock buffers and
the total path insertion delay are comparable in both cases. The total DRC violations and the
tool run time also improved with the non-stop pins approach.
3.1.1 Component Definition in Multi-iteration Compilation CTS Flow
In the Multi-iteration Compilation CTS flow, defining all clock components and their
functionality is crucial to ensure proper execution without affecting the quality of the final
product. Figure 5 shows the technique used to define the main CTS components in designs with
multiple power wells.
[Figure 5: Defining all components in Multi-iteration CTS flow. Legend: Isolation Gate: works as clock root for WELL 2 and float pin for WELL 1. Main power well type: WELL 1. Other power well type: WELL 2. Always On well type: WELL 1. Diagram labels: Clock Root in WELL 1 (Always On), WELL 1 LOADS, Isolation Gate, WELL 2 LOADS.]
The Multi-iteration Compilation CTS flow runs CTS on each power well, starting with the power
well domain at the bottom of the power hierarchy and ending at the always-on power well
domain. The power hierarchy is determined by the design operating conditions, with the
always-on well highest in the hierarchy.
The flow begins by defining the power well hierarchy in the design, followed by identifying
each clock's root point and float pin point. The clock tree of the lower well is compiled
first, and the clock tree of the power well at the top of the power hierarchy is compiled
next. After a clock tree compilation completes, the insertion delays from the CTS root are
captured as float pin delays. The float pin delay ensures that all subsequent power well CTS
executions take this insertion delay into account: ICC includes the float pin delay in the
following round of CTS, which balances the clock tree insertion delay. The process in the
Multi-iteration Compilation CTS flow is shown in Figure 6.
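For a two-well design, the sequence described above might be scripted roughly as follows. This is an illustration under stated assumptions, not the authors' script: the pin names iso_gate/Z and iso_gate/A, the clock names, the period and the delay value are all hypothetical, the insertion delay is assumed to be copied by hand from the first pass's clock tree report, and the float pin option names should be checked against the IC Compiler release in use.

```tcl
# Pass 1: build the clock tree for the lower well (WELL 2).
# Assumption: a CTS-only clock is defined on the isolation gate output,
# which acts as the clock root for WELL 2.
create_clock -name well2_clk -period 1000 [get_pins iso_gate/Z]  ;# hypothetical
compile_clock_tree -clock_trees well2_clk
report_clock_tree -summary

# Assumption: WELL 2 insertion delay, read manually from the report above.
set well2_delay 0.45   ;# placeholder value, in library time units

# Drop the CTS-only clock so it does not affect the next pass.
remove_clock well2_clk

# Pass 2: build the always-on (WELL 1) tree.
# The isolation gate input becomes a float pin carrying the WELL 2 delay,
# so the WELL 1 sinks are balanced against the already-built WELL 2 tree.
set_clock_tree_exceptions -float_pins [get_pins iso_gate/A] \
    -float_pin_max_delay_rise $well2_delay \
    -float_pin_max_delay_fall $well2_delay \
    -float_pin_min_delay_rise $well2_delay \
    -float_pin_min_delay_fall $well2_delay
compile_clock_tree -clock_trees clkA   ;# hypothetical top-level clock name
```

A design with more than two wells would repeat the first pass once per lower well before the final always-on pass.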
[Figure 6: Multi-iteration Compilation CTS flow. Flow chart steps: Input MW database, Define power well hierarchy, Identify Root point, ...]
wells, according to design scenarios. After buffer insertion, no net in the design directly
drives loads in multiple power wells. This enables ICC to complete CTS for the design without
unnecessary skew, max capacitance and max transition issues. Figure 7 illustrates one of the
well-crossing buffer insertion scenarios.
[Figure 7: Well crossing buffer insertion flow. Diagram labels: Clock Root (WELL 1), WELL 1 LOAD, multiple wells loads, Isolation Gate, WELL 2 LOADS, Well crossing buffer.]
Figure 8: High-Fan-Out Net that driving load which crossing power gated domain
A single level top-mode CTS flow has been introduced to resolve this issue. The single level
top-mode CTS flow is a wrapper that uses the ICC compile_clock_tree command together with the
top-mode option and stop pin settings to achieve good skew and the shortest insertion delay
for clock tree synthesis. Figure 8 shows a high-fanout net driving loads that cross a
power-gated domain. Figure 9a shows the compile_clock_tree -high_fanout_net buffer tree path,
while Figure 9b shows the single level top-mode CTS buffer tree path. In Figure 9a, the first
buffer inserted by the compile_clock_tree -high_fanout_net command is placed far away from the
clock root, and more than 30 levels of buffers are padded close to that first buffer. This
causes routing congestion, IR drop from insufficient power where the buffers are packed
together, and maxcap and maxtran violations.
The following steps describe the methodology of the single level top-mode CTS flow for a
partition; the flow chart for the methodology is shown in Figure 10.
1. Clock constraints: set the clock tree options to top mode.
2. Clock definition: define the clock at the driver of the high-fanout net.
3. Clock exceptions: set stop pins on the first level of the high-fanout sinks.
4. Clock reference list: define the clock buffers that are needed.
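The four steps above might be scripted roughly as follows. This is a sketch, not the authors' wrapper: the driver pin hfn_drv/Z, the sink pin pattern, the clock name and period, and the buffer reference names are hypothetical, and step 1 is left as a comment because the text does not give the exact top-mode option.

```tcl
# Step 1 (clock constraints): set the clock tree options to top mode.
# The exact option is not stated in the text, so it is left as a placeholder.

# Step 2 (clock definition): define a clock at the driver of the high-fanout net.
set hfn_driver [get_pins hfn_drv/Z]                  ;# hypothetical driver pin
create_clock -name hfn_clk -period 1000 $hfn_driver  ;# hypothetical name and period

# Step 3 (clock exceptions): stop at the first level of the high-fanout sinks.
set first_level_sinks [get_pins hfn_sink_*/A]        ;# hypothetical pin pattern
set_clock_tree_exceptions -stop_pins $first_level_sinks

# Step 4 (clock reference list): restrict CTS to the buffers that are needed.
set_clock_tree_references -references {BUFX4 BUFX8}  ;# hypothetical library cells

# Build the tree for the clock defined in step 2.
compile_clock_tree -clock_trees hfn_clk
```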
Figure 10: The flow chart of Single level top mode CTS flow
Table 4 compares the results with and without the single level top-mode CTS flow. Without the
single level top-mode CTS flow, the buffer count is lower, but the run produced extremely high
maxcap and maxtran violations and a longer insertion delay. It also required extra buffer
insertion after the run to fix those maxcap and maxtran violations, so the buffer count
reported without the single level top-mode CTS flow may not reflect the final count.
Table 4: Results comparison between with and without single level top mode CTS flow
             Without single level top-mode CTS flow      With single level top-mode CTS flow
Clock name   Number of    Local   DRC violation          Number of    Local   DRC violation
             CT buffer    Skew    magnitude              CT buffer    Skew    magnitude
Clock A      1x           1x      1x                     2.25x        0.34x   0.10x
Clock B      1x           1x      1x                     1.02x        0.61x   0.25x
6.0 Acknowledgements
The author would like to acknowledge the Structural and Physical Design Team at Intel Penang Design
Center for their support and feedback on the multiple power well domain ICC CTS clock tree design and
on the development of the tool and methodology to support the complex clocking architecture and
floorplanning design. Thanks also to Singapore Synopsys AE for debugging and solving the tool issues.
Lastly, I would like to thank Mr. Lim, Han Wooi and Mr. Suresh Kumar, Perabala for their outstanding
help and valuable guidance.
7.0 References
[1] IC Compiler User Guide: Implementation Version B-2008.9-SP4, March 2009