Advanced Clock Tree Design Implementation

Using IC-Compiler CTS Tool

Teng, Siong Kiong

Chye, Chuan Ning
Lim, Mui Liang
Yeap, Cheong Siak

Penang Design Center (PDC), Intel Corporation

This paper describes a flow and methodology for synthesizing complex clock trees, together with
approaches for clock tree synthesis across multiple power domains in a non-UPF flow and for
high-fanout net synthesis. With complex SoC design integration and low-power initiatives,
clocking architectures that combine generated clocks, multiplexers, and multiple clock sources
pose a great challenge to IC designers implementing the clock distribution network. We discuss
in detail how to handle complex clock constraints and how to balance skew on clocks that cross
different levels of the well hierarchy. We also present a non-UPF, voltage-aware clock tree
synthesis flow for multiple power domains that uses the ICC CTS tool effectively to synthesize
the entire clock tree, and we discuss high-fanout net synthesis on an irregular floorplan.

1.0 Introduction
In the Synopsys IC Compiler tool suite, physical design implementation follows a standard flow:
placement and optimization, clock tree synthesis (CTS), and routing with postroute optimization.
IC Compiler provides three core commands for these stages: place_opt, clock_opt, and route_opt,
respectively.
In today's SoC designs, however, clocking architecture complexity has increased tremendously.
Clock multiplexers, generated clocks, and multiple clocks per register have become more common
with each generation of VLSI design. This complexity makes it challenging for VLSI designers to
tune the design constraints so that a common clock definition serves both placement optimization
(the place_opt stage) and clock tree synthesis (the clock_opt stage).
Now that UPF has become an IEEE standard, more and more designs will migrate to the UPF flow.
For legacy designs that still use a non-UPF flow, designing a chip with multiple power domains
remains a great challenge.
This paper describes the methods used to handle a complex clocking architecture during the CTS
phase. We also present a non-UPF clock tree synthesis methodology used on a design with multiple
power well domains, and we discuss in detail how to handle a clock net that crosses different
levels of the well hierarchy and how to balance a clock tree between two different well types.
Finally, we present a strategy for high-fanout net synthesis on an irregular floorplan using
IC Compiler CTS technology.

2.0 Complex Clocking Clock Tree Synthesis

Figure 1 shows the IC Compiler recommended design flow. There are three major phases of design
optimization, each handled by its own IC Compiler command: place_opt, clock_opt, and route_opt.
In a design with a complex clocking architecture, it is undesirable for the placement engine in
place_opt to see every clock source, for runtime reasons. When more clocks are created on a path
with multiple clock sources, the place_opt engine has to work through every clock domain, which
increases runtime tremendously. During CTS, however, the tool needs all the clock source
constraints in order to perform proper clock tree synthesis and skew balancing on all clock
domains.
In this section, we describe how the design constraints and clock source definitions affect
IC Compiler behavior during the clock tree synthesis phase. We also discuss in detail the
trade-off between throughput time and CTS result quality for the different techniques used to
overcome this problem.

Inputs: Mapped design; Logical and physical libraries
Design planning and power planning
Placement and optimization
Clock tree synthesis and optimization
Routing and postroute optimization
Chip finishing and design for

Figure 1: IC Compiler Design Flow

2.1 Clock Source Constraints during CTS
Figure 2 shows one of the complex clock structures in the design, with generated clocks, a
clock multiplexer, and multiple clocks per register. For placement optimization during
place_opt, the clock definition can be simplified by defining only clkA and letting clkA
propagate to the generated clock. The generated clock can then be defined with clkA as its only
master clock source. This assumes that clkA has a higher frequency than clkB: when all clkA
paths meet the setup requirement, all clkB domain paths are met automatically, because clkB runs
at a lower speed than clkA. The clkB clock source definition can therefore be omitted to
simplify the constraints and improve ICC runtime during the placement steps.

Figure 2: Complex clock structure with multiplexer and generated clock
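The placement-stage simplification rests on simple setup arithmetic, which the following minimal sketch makes explicit. It assumes single-cycle paths launched and captured by the same clock and ignores hold checks, clock uncertainty, and paths between clkA and clkB. The function name `setup_slack` and the path delay of 900 are illustrative assumptions, not values from the design; the two periods are the ones used in the constraints of this section.

```python
def setup_slack(period: float, path_delay: float, setup_time: float = 0.0) -> float:
    """Setup slack of a single-cycle path; zero or positive means timing is met."""
    return period - path_delay - setup_time


# Clock periods as used in the clock constraints of this section.
CLKA_PERIOD = 1000
CLKB_PERIOD = 5000

# Hypothetical register-to-register path delay, in the same time unit.
path_delay = 900

slack_clka = setup_slack(CLKA_PERIOD, path_delay)
slack_clkb = setup_slack(CLKB_PERIOD, path_delay)

# A path that meets setup under the faster clkA also meets it under the
# slower clkB, because the longer period only adds slack.
print(f"clkA slack: {slack_clka}, clkB slack: {slack_clkb}")
```

Under these assumptions the clkB slack always exceeds the clkA slack by the difference between the two periods, which is why constraining clkA alone is sufficient for setup during placement.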


Inputs: Mapped design; Logical and physical libraries
Design planning and power planning
(Simplify clock definition for placement)
Placement and optimization
(All clock sources definition for CTS)
Clock tree synthesis and optimization
(Simplify clock definition for placement)
Routing and postroute optimization
Chip finishing and design for

Figure 3: IC Compiler Design Flow with Different Clock Sources

When the design completes place_opt and moves to the clock_opt phase, a problem appears: clkB
is not defined, so ICC does not run clock tree compilation on clkB. To enable clock tree
synthesis on all the clocks in the design, every clock must be defined, but doing so increases
runtime during the placement optimization stage. We therefore propose using a different set of
clock source definitions during the clock_opt phase, optimized for CTS purposes only. The ideal
solution would be for ICC to let the user define separate clock constraints for each clock root
during CTS when the design has a complex clock architecture. This would avoid the hazard of
removing all the clocks from the design and redefining them during the CTS stage. The current
solution is to remove all the existing clocks in the design and redefine all the clock sources,
which enables ICC CTS to run correctly on every clock definition. As shown in Figure 3, which
extends the IC Compiler recommended flow, a complete clock source definition is created at the
clock_opt step to ensure that all the clock trees are synthesized properly.
The following examples illustrate what was done in the design with the complex clock structure
shown in Figure 2.
During place_opt, the following clock constraints were defined:
create_clock -name clkA -period 1000 [get_ports clkA]
create_generated_clock -name genclkA -source [get_ports clkA] -divide_by 2 -add \
    -master_clock [get_clocks {clkA}] [get_pins instgenclk/Q]
During clock_opt, the clock tree synthesis phase, the following clock constraints are defined:
remove_clock -all
create_clock -name clkA -period 1000 [get_ports clkA]
create_clock -name clkB -period 5000 [get_ports clkB]
create_generated_clock -name genclkA -source [get_ports clkA] -divide_by 2 -add \
    -master_clock [get_clocks {clkA}] [get_pins instgenclk/Q]
create_generated_clock -name genclkB -source [get_ports clkB] -divide_by 2 -add \
    -master_clock [get_clocks {clkB}] [get_pins instgenclk/Q]
The example above is a simplified case. As the design becomes more complicated, the number of
clock sources and generated clocks that must be created increases. Figure 4 shows another
design with a very complicated clocking architecture, containing cascaded generated clocks and
two cascaded multiplexers. To make sure the CTS tool recognizes all the clocks that need to be
treed, multiple clock sources must be defined for each generated clock. The clock definitions
for clock tree synthesis are as follows:
create_clock -name clkA -period 1000 [get_ports clkA]
create_clock -name clkB -period 5000 [get_ports clkB]
create_generated_clock -name clkA_div2_D -source [get_ports clkA] -divide_by 2 -add \
    -master_clock [get_clocks {clkA}] [get_pins instgenclkD/Q]
create_generated_clock -name clkB_div2_D -source [get_ports clkB] -divide_by 2 -add \
    -master_clock [get_clocks {clkB}] [get_pins instgenclkD/Q]
create_generated_clock -name clkB_div2_C -source [get_ports clkB] -divide_by 2 -add \
    -master_clock [get_clocks {clkB}] [get_pins instgenclkC/Q]
create_generated_clock -name clkB_div4_D -source [get_pins instgenclkC/Q] -divide_by 2 -add \
    -master_clock [get_clocks {clkB_div2_C}] [get_pins instgenclkD/Q]

Figure 4: Complex clock structure with cascaded multiplexers and generated clocks
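To illustrate why the number of generated clock definitions grows with the clock structure, the sketch below enumerates them for a Figure-4-like arrangement and prints the corresponding constraints. The `DIVIDERS` table, the `SUFFIX` naming map, and the helper names are hypothetical; the table is our reading of which clocks can reach each divider flop, inferred from the constraints above, and every divider is assumed to divide by 2 with no loops in the structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GenClock:
    name: str    # generated clock name
    root: str    # root clock port the clock descends from
    factor: int  # total division relative to the root clock
    source: str  # Tcl object expression passed to -source
    master: str  # clock name passed to -master_clock
    inst: str    # divider flop whose Q pin carries the clock


def expand(inst: str, dividers: Dict[str, List[str]], suffix: Dict[str, str]) -> List[GenClock]:
    """List every generated clock on the Q pin of one divide-by-2 flop."""
    clocks: List[GenClock] = []
    for src in dividers[inst]:
        kind, name = src.split(":")
        if kind == "port":
            clocks.append(GenClock(f"{name}_div2_{suffix[inst]}", name, 2,
                                   f"[get_ports {name}]", name, inst))
        else:  # "div": the clock pin is fed by another divider's Q pin
            for up in expand(name, dividers, suffix):
                factor = up.factor * 2
                clocks.append(GenClock(f"{up.root}_div{factor}_{suffix[inst]}", up.root,
                                       factor, f"[get_pins {name}/Q]", up.name, inst))
    return clocks


def to_sdc(clock: GenClock) -> str:
    """Format one generated clock in the style used in this section."""
    return (f"create_generated_clock -name {clock.name} -source {clock.source} "
            f"-divide_by 2 -add \\\n"
            f"    -master_clock [get_clocks {{{clock.master}}}] [get_pins {clock.inst}/Q]")


# Hypothetical description: clocks that can reach each divider's clock pin.
DIVIDERS = {
    "instgenclkC": ["port:clkB"],
    "instgenclkD": ["port:clkA", "port:clkB", "div:instgenclkC"],
}
SUFFIX = {"instgenclkC": "C", "instgenclkD": "D"}

generated = [c for inst in DIVIDERS for c in expand(inst, DIVIDERS, SUFFIX)]
for clock in generated:
    print(to_sdc(clock))
```

Each additional multiplexer input or cascaded divider adds entries to this enumeration, which is the constraint growth that motivates the approach in the next section.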
2.2 Non-Stop Pins Methodology
Because generated clock creation can become complicated for a complex clock architecture, we
propose a non-stop pin approach to overcome this problem. The IC Compiler CTS flow provides
clock tree exception attributes that the user can set on specified pins in the design. To make
the tool tree through the generated clock flops instead of stopping at the clock pin of each
generated clock flop, we set the non-stop attribute on that pin. This effectively eliminates
all generated clock creation from the clock constraints.
For the design in Figure 4, the clock constraints are as follows:
create_clock -name clkA -period 1000 [get_ports clkA]
create_clock -name clkB -period 5000 [get_ports clkB]
set_clock_tree_exceptions -non_stop_pins [get_pins instgenclkC/CK]
set_clock_tree_exceptions -non_stop_pins [get_pins instgenclkD/CK]

This takes care of both clock sources, clkA and clkB, because the generated clock flops are now
treed through during clock tree compilation. It also reduces the number of clock constraints
that must be defined as the complexity of the clocking architecture increases. Again, this
approach is design dependent and applies only to designs in which the generated clock is
derived from both input pins of the MUX.
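As a rough illustration of how the constraint set scales under the non-stop pin approach, the sketch below emits one `create_clock` per root clock and one `set_clock_tree_exceptions -non_stop_pins` per divider flop. The helper name `non_stop_pin_constraints` and the default clock pin name `CK` (taken from the example above) are assumptions for illustration.

```python
from typing import Dict, List


def non_stop_pin_constraints(root_clocks: Dict[str, int],
                             divider_flops: List[str],
                             clock_pin: str = "CK") -> List[str]:
    """Emit the CTS clock constraints for the non-stop pin approach."""
    lines = [f"create_clock -name {name} -period {period} [get_ports {name}]"
             for name, period in root_clocks.items()]
    # One exception per divider flop, however many root clocks reach it.
    lines += [f"set_clock_tree_exceptions -non_stop_pins [get_pins {flop}/{clock_pin}]"
              for flop in divider_flops]
    return lines


constraints = non_stop_pin_constraints({"clkA": 1000, "clkB": 5000},
                                       ["instgenclkC", "instgenclkD"])
print("\n".join(constraints))
```

The number of lines is the number of root clocks plus the number of divider flops, independent of how many multiplexer paths lead to each flop.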
2.3 Results Comparison using Non-Stop Pins Methodology
We ran clock tree compilation on the design with the very complex clocking architecture
illustrated in Figure 1 and Figure 4.
Table 1: Results comparison of Non-Stop Pins Methodology
Create All Clock Sources
Using Non-Stop Pins
0.88x less
Total Clock buffers
Total DRC violations
0.95x less
Longest Path
0.84x faster
Table 1 shows the detailed experiment results. The results show that the non-stop pins approach
simplified the clock tree compilation flow and improved the CTS QOR. The overall clock skew is
better with the non-stop pins approach, while the total number of clock buffers and the total
path insertion delay are comparable in both cases. The total DRC violations and the tool
runtime also improved with the non-stop pins approach.

3.0 Non-UPF Multiple Power Domains Clock Tree Synthesis

The non-UPF multiple power domain clock tree synthesis flow was developed to enable CTS in
designs with multiple power well domains. It was the standard CTS solution for such designs
before the introduction of voltage awareness technology and the UPF standard. The flow prevents
direct cross-well signals in designs with multiple power domains. To ensure that the clock
still operates correctly within the design, the flow also prevents the clock signal from
crossing the well domain during CTS, while maintaining the skew target and quality of results
across the various well types of a design. The flow adapts easily to any CTS tool, and its
methodology was developed in two phases. The first was the multi-iteration compilation CTS
flow, introduced when CTS tools still supported only single-well designs. It was followed by
non-UPF multiple power domain CTS based on ICC's voltage area awareness technology.
3.1 Multi-iteration Compilation CTS Flow
The multi-iteration compilation CTS flow is the first-generation solution for designs with
multiple power well domains, dating from when EDA tools commonly supported only single power
well domain designs. It is a semi-automated flow in which the definition of the design's
structural components is critical to success.


Component Definition in Multi-iteration Compilation CTS Flow

In the multi-iteration compilation CTS flow, defining all clock components and their
functionality is crucial to ensure proper execution without affecting the quality of the final
product. Figure 5 shows the technique used to define the main CTS components in designs with
multiple power wells.
Figure 5 legend: Isolation Gate; Main power well type; Other power well type; Always On well type.
Figure 5 annotation: Works as clock root for WELL 2 and float pin for WELL 1.

Figure 5: Defining all components in Multi-iteration CTS flow.


Process in Multi-iteration Compilation CTS flow

The multi-iteration compilation CTS flow runs CTS on each power well in turn, starting with the
power well domain at the bottom of the power hierarchy and ending with the always-on power well
domain. The power hierarchy is determined by the design operating conditions, with the
always-on well highest in the hierarchy.
The flow begins by defining the power well hierarchy in the design, followed by identifying
each clock root point and its float pin point. The clock tree of the lowest well is compiled
first, followed by the clock tree of the next well up the power hierarchy. After each clock
tree compilation completes, the insertion delay from the CTS root is recorded as a float pin
delay. The float pin delay ensures that all subsequent power well CTS runs take this insertion
delay into account: ICC includes the float pin delay in the following round of CTS, which
balances the clock tree insertion delay. The process is shown in Figure 6.

Figure 6 flow steps, in order:
1. Input MW database
2. Define power well hierarchy
3. Identify root point
4. Select the lowest well in the power hierarchy that is not yet CTSed
5. Compile clock tree
6. Report synthesized clock tree
7. Define the CTSed root as a float pin
8. Save design database
9. Check whether every well has been CTSed; if not, return to step 4
10. Complete multi-iteration compilation CTS flow

Figure 6: Multi-iteration Compilation CTS flow
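The role of the float pin delay in this loop can be sketched with a small numerical model. This is not the ICC algorithm: it assumes each well's CTS simply pads every endpoint up to the largest latency, it uses one pin name for both the root of the lower well and the float pin seen by the upper well, and all pin names and delay values are hypothetical.

```python
from typing import Dict, List, Tuple

# (well name, root pin, {endpoint pin: delay from the well's root})
Well = Tuple[str, str, Dict[str, int]]


def balance(latency: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Return the insertion delay target and the delay to add per endpoint."""
    target = max(latency.values())
    return target, {pin: target - value for pin, value in latency.items()}


def multi_iteration_cts(wells: List[Well]) -> Dict[str, Tuple[int, Dict[str, int]]]:
    """Process wells from the bottom of the power hierarchy to the always-on well."""
    float_pin_delay: Dict[str, int] = {}
    report: Dict[str, Tuple[int, Dict[str, int]]] = {}
    for name, root, endpoints in wells:
        # An endpoint that is the root of an already compiled well carries
        # that well's insertion delay as its float pin delay.
        latency = {pin: delay + float_pin_delay.get(pin, 0)
                   for pin, delay in endpoints.items()}
        insertion_delay, padding = balance(latency)
        report[name] = (insertion_delay, padding)
        float_pin_delay[root] = insertion_delay
    return report


wells = [
    ("WELL2", "iso_gate", {"ff_a/CK": 120, "ff_b/CK": 150}),
    ("WELL1", "clk_port", {"ff_c/CK": 80, "ff_d/CK": 200, "iso_gate": 60}),
]
report = multi_iteration_cts(wells)
for well, (insertion_delay, padding) in report.items():
    print(well, insertion_delay, padding)
```

In this example the WELL 2 tree has an insertion delay of 150, so the isolation gate is seen from WELL 1 with a total latency of 60 + 150 = 210, and the WELL 1 sinks are padded up to that value rather than to 200.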

3.2 Voltage Area Awareness Multiple Power Domain Clock Tree Synthesis
With voltage area awareness technology, ICC has proven effective at inserting buffers correctly
during multiple power domain CTS in a non-UPF flow. ICC performs CTS properly when a clock net
drives loads in a single well. However, in designs with multiple power wells it is very common
for a clock net to drive loads in multiple wells, with isolation. The setback is that when ICC
encounters loads in multiple power wells on a particular clock tree net, it stops CTS with a
warning. In other words, if a CTS net drives loads in more than one power well, ICC skips CTS
for that net. The skipped CTS causes large skew, max capacitance, and max transition issues,
which require a huge amount of effort to fix manually.
The well-crossing buffer auto insertion flow was developed to manage the various scenarios of
multiple power domain loads. It eliminates multiple-well loads where cross-well signals exist
before CTS execution. The flow checks all clock paths. When loads in multiple power wells are
detected, a well-crossing buffer is inserted at the boundary between the power wells, according
to the design scenario. After buffer insertion, no net in the design directly drives loads in
multiple power wells. This enables ICC to complete CTS for the design without unnecessary skew,
max capacitance, and max transition issues. Figure 7 illustrates one of the well-crossing
buffer insertion scenarios.







Figure 7 labels: Multiple wells; Well crossing

Figure 7: Well crossing buffer insertion flow
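The check-and-insert step can be sketched on a toy netlist as follows. This is a simplified model under stated assumptions, not the project's script: the netlist is a plain dictionary, the buffer naming scheme (`<net>_xwell_<well>`) is hypothetical, and the buffer input is treated as a load in the driver's well, whereas the actual placement at the well boundary depends on the design scenario.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def insert_well_crossing_buffers(
    nets: Dict[str, List[str]],
    driver_well: Dict[str, str],
    well_of: Dict[str, str],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Split every net whose loads sit in more than one well."""
    new_nets: Dict[str, List[str]] = {}
    new_well_of = dict(well_of)
    for net, loads in nets.items():
        by_well: Dict[str, List[str]] = defaultdict(list)
        for pin in loads:
            by_well[well_of[pin]].append(pin)
        if len(by_well) <= 1:
            new_nets[net] = list(loads)  # single-well loads need no buffer
            continue
        home = driver_well[net]
        kept = list(by_well.get(home, []))
        for well, pins in by_well.items():
            if well == home:
                continue
            buf = f"{net}_xwell_{well}"  # hypothetical buffer instance name
            kept.append(f"{buf}/A")
            new_well_of[f"{buf}/A"] = home
            new_nets[f"{buf}_out"] = pins  # the buffer now drives the foreign-well loads
        new_nets[net] = kept
    return new_nets, new_well_of


nets = {"clk": ["u1/CK", "u2/CK", "u3/CK"]}
driver_well = {"clk": "WELL1"}
well_of = {"u1/CK": "WELL1", "u2/CK": "WELL2", "u3/CK": "WELL2"}

new_nets, new_well_of = insert_well_crossing_buffers(nets, driver_well, well_of)
for net, loads in new_nets.items():
    print(net, loads)
```

After the transformation every net in the model drives loads in exactly one well, which is the condition under which CTS can proceed on the net.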

Table 2: Results comparison on Well-crossing buffer auto insertion flow
Columns: Clock name; Longest insertion delay without well-crossing buffer auto insertion flow; Longest insertion delay with well-crossing buffer auto insertion flow
Table 3: DRC violations comparison on Well-crossing buffer auto insertion flow
Columns: Total Maxcap; Total Maxtran
Rows: Without well-crossing buffer auto insertion flow; With well-crossing buffer auto insertion flow
Table 2 and Table 3 compare the results with and without the well-crossing buffer insertion
flow. As both tables show, without well-crossing buffer insertion the clock skew is very high
because the net crossing the well is not buffered by the tool. This also results in very high
maxcap and maxtran violation counts along the clock paths. With the proposed well-crossing
buffer insertion flow, we successfully enabled cross-well CTS using IC Compiler and obtained
good clock skew and clock tree QOR.

4.0 High Fanout Net Synthesis Using ICC CTS Technology

4.1 Single Level Top-Mode CTS Flow
In ICC 2007, the compile_clock_tree -high_fanout_net command creates problems when treeing a
high-fanout net on a design with an irregular floorplan shape. For example, when the driver and
receivers are placed across a power-gated domain, macro block, or multi-well domain, the tool
pads many buffers beside the power-gated domain, macro block, or multi-well domain even though
there are better areas in which to insert the buffer tree. This is because the algorithm lacks
the voltage-aware ability to find alternative areas for the buffer tree.

Figure 8: High-fanout net driving loads that cross a power-gated domain

Figure 9a: compile_clock_tree -high_fanout_net buffer tree path

Figure 9b: Single level top-mode CTS buffer tree path


A single level top-mode CTS flow was introduced to resolve this issue. The flow is a wrapper
that uses the ICC compile_clock_tree command together with the top-mode option and stop pin
settings to achieve good skew and the shortest insertion delay for clock tree synthesis.
Figure 8 shows a high-fanout net that drives loads across a power-gated domain. Figure 9a shows
the buffer tree path built by compile_clock_tree -high_fanout_net, while Figure 9b shows the
buffer tree path built by the single level top-mode CTS flow. In Figure 9a, the first buffer
inserted by compile_clock_tree -high_fanout_net is placed far away from the clock root, and
more than 30 levels of buffers are padded near that first buffer. This causes routing
congestion, IR drop due to insufficient power where buffers are packed together, maxcap
violations, and maxtran violations.
The following steps describe the methodology of the single level top-mode CTS flow for a
partition; the flow chart for the methodology is shown in Figure 10.
1. Clock constraints: set the clock tree options to top mode.
2. Clock definition: define the clock at the driver of the high-fanout net.
3. Clock exceptions: set stop pins on the first level of the high-fanout sinks.
4. Clock reference list: define the clock buffers that are needed.

Figure 10: The flow chart of Single level top mode CTS flow
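As a hedged illustration of steps 2 to 4, the sketch below assembles a constraint script for one high-fanout net. The mapping of steps to commands is an assumption rather than the exact wrapper used in the project: the helper name, the clock name `hfn_clk`, the period, the pin names, and the buffer cell names are all hypothetical, and the top-mode setting of step 1 is left as a comment because its option name is not given in the text.

```python
from typing import List


def single_level_cts_script(driver_pin: str,
                            first_level_sinks: List[str],
                            buffer_cells: List[str],
                            clock_name: str = "hfn_clk",
                            period: int = 1000) -> List[str]:
    """Build the CTS constraint lines for one high-fanout net."""
    sinks = " ".join(first_level_sinks)
    cells = " ".join(buffer_cells)
    return [
        "# Step 1: set the clock tree options to top mode (option not shown here)",
        # Step 2: define a clock at the driver of the high-fanout net.
        f"create_clock -name {clock_name} -period {period} [get_pins {driver_pin}]",
        # Step 3: stop the tree at the first level of sinks.
        f"set_clock_tree_exceptions -stop_pins [get_pins {{{sinks}}}]",
        # Step 4: restrict CTS to the chosen clock buffers.
        f"set_clock_tree_references -references {{{cells}}}",
        f"compile_clock_tree -clock_trees {clock_name}",
    ]


script = single_level_cts_script("u_drv/Z", ["u1/A", "u2/A"], ["BUFX4", "BUFX8"])
print("\n".join(script))
```

Defining the clock at the driver and stopping at the first level of sinks confines the synthesized tree to the single high-fanout net, which is what makes the flow single level.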
Table 4 compares the results with and without the single level top-mode CTS flow. Without the
flow, the buffer count is lower, but the result has extremely high maxcap and maxtran violation
counts and a longer insertion delay. Furthermore, extra buffers must be inserted after the run
to fix those maxcap and maxtran violations, so the buffer count reported without the single
level top-mode CTS flow may not reflect the final count.

Table 4: Results comparison between with and without single level top mode CTS flow
Without single level top-mode CTS
With single level top-mode CTS flow
Clock name
Number of
Local Skew
Number of
CT buffer
CT buffer
Clock A
Clock B

5.0 Conclusions and Recommendations

In this paper, we have presented several ICC CTS work flows, the issues encountered, and the techniques
used to achieve better clock tree QOR in an advanced clocking architecture. The non-stop pins ICC clock
tree synthesis flow, developed and used in our project, reduced the complexity of clock source and
constraint definition during the CTS phase. The non-UPF multiple well CTS methodology reduced the
complexity of our multiple power domain clock tree synthesis and improved our CTS design. The
high-fanout net synthesis work flow also reduced the buffer count and congestion problems on the
irregular floorplan design.
The techniques discussed above show that a flow built on the Synopsys ICC CTS tool can effectively handle
designs with highly complex clocking and floorplanning, as well as designs with multiple power well
domains. Last but not least, these innovations greatly reduced the design convergence time spent on
clock distribution.

6.0 Acknowledgements
The authors would like to acknowledge the Structural and Physical Design Team at the Intel Penang Design
Center for their support and feedback on the multiple power well domain ICC CTS clock tree design and on
the development of the tools and methodology to support designs with complex clocking architecture and
floorplanning. Thanks also go to the Singapore Synopsys AE team for debugging and solving the tool issues.
Lastly, we would like to thank Mr. Lim, Han Wooi and Mr. Suresh Kumar, Perabala for their outstanding
help and valuable guidance.

7.0 References
[1] IC Compiler User Guide: Implementation Version B-2008.9-SP4, March 2009

