CTS FlowGuide
Application Note
Table of Contents
Purpose
Audience
Overview
1. Introduction
2. Generating the clock tree specification file
3. How to choose buffers for CTS
4. Understanding of Specification file
5. Creating Macro Models to handle hard macros
6. Create dynamic macro models to handle clock dividers
7. Synthesizing the clock tree
8. Routing the clock tree
9. Optimization of clock tree
10. Tracing and Analysis of clock tree
12. CTS with Multi-Mode Multi-Corner (MMMC) Flow
13. Guidelines and Issues
    1. Guidelines for Avoiding the Hold Violation
    2. Issue on tracing the Bi-directional ports
Purpose
This document explains clock tree synthesis (CTS) and describes the complete CTS flow.
Audience
This document is intended for users and designers performing CTS with the Encounter Digital Implementation (EDI) System, versions 9.1 and 10.1.
Overview
This document covers clock tree synthesis concepts and the CTS flow.
1. Introduction
Clock tree synthesis is performed to meet the clock timing constraints, such as clock skew, latency (insertion delay), and transition time.
General issues caused by improper CTS:
- Routing congestion
- Sudden rise in standard-cell density
- Timing closure issues
A good CTS yields:
- A reasonable change in density
- A well-controlled CTS structure, which in turn yields the best insertion delay, skew, and clock transitions
- Early timing closure
- Lower susceptibility to crosstalk
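Skew and insertion delay are both derived from the clock arrival (latency) at each sink. The following is a hypothetical sketch, not an EDI report. It assumes the per-sink latencies are already known; the pin names, values and targets are invented for illustration. Insertion delay is taken as the largest sink latency, and global skew as the spread between the latest and the earliest sink.

```python
from typing import Dict

def clock_tree_metrics(latency_ns: Dict[str, float]) -> Dict[str, float]:
    """Insertion delay and global skew from per-sink clock latencies (ns)."""
    if not latency_ns:
        raise ValueError("need at least one sink")
    latest = max(latency_ns.values())
    earliest = min(latency_ns.values())
    return {"insertion_delay": latest, "min_latency": earliest, "skew": latest - earliest}

def meets_targets(metrics: Dict[str, float], max_skew: float, max_delay: float) -> bool:
    """True if both skew and insertion delay are within the (hypothetical) targets."""
    return metrics["skew"] <= max_skew and metrics["insertion_delay"] <= max_delay

# Hypothetical sink latencies measured from the clock root
sinks = {"ff_a/CK": 0.42, "ff_b/CK": 0.47, "ff_c/CK": 0.39}
m = clock_tree_metrics(sinks)
print(f"insertion delay = {m['insertion_delay']:.2f} ns, skew = {m['skew']:.2f} ns")
print(meets_targets(m, max_skew=0.10, max_delay=0.50))
```

A tree can meet its skew target and still fail its insertion-delay target, or the reverse. That is why the two are checked separately.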
set_clock_uncertainty
create_generated_clock
If a create_clock command specifies multiple ports, the clocks are defined as a clock group (clkGroup).
In addition, other useful (design-dependent) constraints, shown below, can be applied to a clock root pin. To mark a pin as a leaf pin:
LeafPin
+ <pinname1>
+ <pinname2>
CTS treats these pins as sinks, stops tracing beyond them, and balances clock skew to them.
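The effect of LeafPin on tracing can be sketched as a breadth-first walk from the clock root that stops at marked pins. This is an illustrative model only. It assumes a hypothetical pin-level fanout map in which cell arcs are collapsed; for example, `div/CK` fans out directly to the flops that the divider clocks. This is not how EDI stores the netlist.

```python
from collections import deque
from typing import Dict, List, Set

def trace_sinks(fanout: Dict[str, List[str]], root: str, leaf_pins: Set[str]) -> List[str]:
    """Breadth-first trace from the clock root.

    A pin with no fanout is a natural sink; a pin listed in leaf_pins is
    treated as a sink and tracing stops there, as LeafPin does.
    """
    sinks: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        pin = queue.popleft()
        children = fanout.get(pin, [])
        if pin != root and (pin in leaf_pins or not children):
            sinks.append(pin)
            continue  # do not trace past a sink
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sinks

# Hypothetical clock network: a divider flop clocks two more flops
fanout = {"clk": ["ff1/CK", "div/CK"], "div/CK": ["ff2/CK", "ff3/CK"]}
print(trace_sinks(fanout, "clk", set()))
print(trace_sinks(fanout, "clk", {"div/CK"}))
```

With `div/CK` marked as a leaf pin, the flops behind the divider drop out of the balanced sink set, and `div/CK` itself is balanced instead.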
CTS does not insert buffers on the nets between the start pin and the end pin, and treats those nets as having the DontTouchNet attribute.
CTS does not add new ports to the specified logical modules at their given hierarchy levels.
During clock tree synthesis, the order of the clocks defined in the clock specification file matters, and synthesis depends on it. The clock defined first in the specification file is built first, regardless of its frequency. The same applies to clock routing: the clock defined first is routed first, and so on. It is therefore recommended to define the fastest clock first in the specification file, so that it is not stopped by reconvergence and gets the most space for routing.
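Choosing the definition order can be done as a small planning step before the specification file is edited. The following is a sketch under the assumption that each clock's period is known; the clock names and periods are hypothetical. The order still has to be written into the specification file by hand.

```python
from typing import Dict, List

def build_order(periods_ns: Dict[str, float]) -> List[str]:
    """Clock names sorted fastest (shortest period) first.

    sorted() is stable, so clocks with equal periods keep the order in
    which they were originally defined.
    """
    return sorted(periods_ns, key=lambda name: periods_ns[name])

# Hypothetical clocks and periods
clocks = {"clk_slow": 20.0, "clk_core": 2.5, "clk_io": 10.0}
print(build_order(clocks))
```

Relying on a stable sort means ties do not reshuffle clocks that were already in a deliberate order.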
Here, a dynamic macro model can be used to balance the skew between the two flops. When the clock pin of Flop B is specified as the reference pin and the clock pin of Flop A as the target pin, the clock pin of Flop A is balanced with the clock pin of Flop B. The DynamicMacroModel statement minimizes the skew between these two flops to avoid timing violations on the data path. Without the dynamic macro model, the clock pin of Flop A would be balanced with the group of flops rather than with the clock pin of Flop B, because of the ThroughPin defined on Flop B.
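The arithmetic behind reference/target balancing can be sketched as follows. This is a hypothetical model, not how CTS implements DynamicMacroModel. It assumes the current clock latencies of the two pins are known; the pin names and numbers are invented. It computes how much delay would have to be added in front of the target pin so that the target's arrival matches the reference pin.

```python
from typing import Dict, Tuple

def balance_to_reference(latency_ns: Dict[str, float], reference: str, target: str) -> Tuple[float, float]:
    """Return (delay to add at the target pin, residual skew after adding it)."""
    extra = latency_ns[reference] - latency_ns[target]
    if extra < 0:
        # The target already arrives later; adding delay cannot fix that.
        raise ValueError("target is later than reference; it needs less delay, not more")
    balanced_target = latency_ns[target] + extra
    return extra, abs(latency_ns[reference] - balanced_target)

# Hypothetical latencies: Flop B is the reference, Flop A is the target
lat = {"flopB/CK": 0.80, "flopA/CK": 0.55}
extra, residual = balance_to_reference(lat, "flopB/CK", "flopA/CK")
print(f"add {extra:.2f} ns at flopA/CK, residual skew {residual:.2f} ns")
```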
If you want to use a non-default rule or shielding for a particular clock, define the RouteTypeName along with the rules in the constraint file, and then reference it with RouteType in that clock's definition. Routability issues may require changing the properties of a particular clock net, even if some properties were already set during CTS. In that case, use the setAttribute command with the -net option and the -preferred_extra_space or -non_default_rule option to attach attributes to the desired nets.

Clock nets can also be routed with routing guides. The routeClockNetWithGuide command makes CTS build a brand-new routing guide. This guide is based on a Steiner estimation of the clock trees in the loaded design. The flow for using the routing guide is:
restoreDesign
specifyClockTree -clkfile xxx.cts
routeClockNetWithGuide

A specific clock can also be routed with a specific rule through attribute settings. Clock routing features include specified widths, shielding, and extra spacing.
Specified Width: Non-default rules can be used to route a clock net with a wider wire. First, define the rule in the LEF using the NONDEFAULTRULE syntax. Then use setAttribute to assign the rule to the clock nets:
setAttribute -net @clock -non_default_rule wide_wire_rule1
Shielding: Use the -shield_net attribute to specify the net(s) used for shielding:
setAttribute -net @clock -shield_net {VDD VSS}
Extra Spacing: Specify extra spacing with the -preferred_extra_space attribute. The value ranges from 0 to 3 routing tracks. NanoRoute tries to achieve the specified spacing, but reduces it where necessary to avoid violations.
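The routing guide is described as a Steiner estimation of each clock tree. The exact estimator EDI uses is not stated here. The sketch below substitutes a common, simple estimate: the rectilinear minimum spanning tree (RMST) over the sink pin locations, built with Prim's algorithm and Manhattan distances. The RMST is at most 1.5 times the length of the rectilinear Steiner minimal tree. The pin coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def manhattan(a: Point, b: Point) -> float:
    """Rectilinear (Manhattan) distance between two pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def rmst_length(pins: List[Point]) -> float:
    """Total wirelength of a rectilinear minimum spanning tree (Prim's algorithm)."""
    n = len(pins)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest connection of each pin to the tree so far
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], manhattan(pins[u], pins[v]))
    return total

# "Plus" arrangement: the RMST is 30, while a Steiner tree through (5, 5) needs only 20
plus = [(0.0, 5.0), (10.0, 5.0), (5.0, 0.0), (5.0, 10.0)]
print(rmst_length(plus))
```

The "plus" case is the textbook worst case for this estimate. A router that is allowed to add Steiner points can do noticeably better than the spanning tree.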
An example flow that specifies clock routing attributes and then routes the clock nets:
setAttribute -net @clock -non_default_rule wide_wire_rule1
setAttribute -net @clock -shield_net {VSS}
selectNet clock
setNanoRouteMode -routeSelectedNetOnly true
routeDesign
The ckECO command performs resizing and buffer insertion, or dummy buffer insertion, to improve skew. In addition, ckECO might move gating cells when it runs refinePlace. ckECO also supports local skew optimization (with the localSkew parameter). Local skew optimization considers the skew between adjacent flip-flops that have a data path connection (from the Q pin of one flip-flop to the D pin of another). The following options control the behavior of ckECO:
-preRoute: Use when there is no NanoRoute license, or when the flow is to build the clock tree, optimize it, and then route the clock nets with another router.
-clkRouteOnly: Use immediately after the clock tree is routed.
-postRoute: Use after all signal nets are routed.
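The difference between global and local skew can be shown with a small model. This is a sketch with hypothetical flop names and latencies. Local skew is taken here as the largest absolute latency difference over launch/capture pairs connected by a data path; a real timing check would also care about the sign, which decides whether setup or hold is affected.

```python
from typing import Dict, List, Tuple

def global_skew(latency_ns: Dict[str, float]) -> float:
    """Spread between the latest and earliest sink in the whole tree."""
    return max(latency_ns.values()) - min(latency_ns.values())

def local_skew(latency_ns: Dict[str, float], data_paths: List[Tuple[str, str]]) -> float:
    """Worst clock-arrival difference over (launch, capture) flop pairs that
    share a data path (Q of the launching flop to D of the capturing flop)."""
    return max(abs(latency_ns[launch] - latency_ns[capture]) for launch, capture in data_paths)

# Hypothetical latencies: two clusters far apart in arrival time
latency = {"ff1": 0.30, "ff2": 0.34, "ff3": 0.60, "ff4": 0.63}
paths = [("ff1", "ff2"), ("ff3", "ff4")]
print(f"global skew {global_skew(latency):.2f} ns, local skew {local_skew(latency, paths):.2f} ns")
```

In this example, the global skew is large, but no data path crosses between the two clusters. This is the situation in which optimizing local skew spends less effort than forcing global balance.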
When using the ckSynthesis command, you can use the forceReconvergent option. Use this option if the physical partition has muxed clocks and CTS is expected to build a clock tree for every clock root of the muxed clock. The option allows CTS to handle (trace through) the muxed clocks and generate a balanced tree starting from all the clock root branches of the muxed clock. CTS supports crossover clocks, but the subtree after the crossover pin must have the same conditions defined in both tree specifications. For example, if a pin in the subtree is marked with ExcludedPin, LeafPin or PreservePin in one specification, it must be marked the same way in the other.
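The crossover rule can be checked mechanically before synthesis. The following is a hypothetical consistency check, not an EDI command. It assumes each tree specification has already been reduced to a mapping from keyword (ExcludedPin, LeafPin, PreservePin) to the set of pins that keyword marks. The pin names are invented.

```python
from typing import Dict, List, Set

def crossover_mismatches(spec_a: Dict[str, Set[str]],
                         spec_b: Dict[str, Set[str]],
                         subtree_pins: Set[str]) -> List[str]:
    """Report pins in the shared (post-crossover) subtree that a keyword
    marks in only one of the two tree specifications."""
    problems: List[str] = []
    for keyword in sorted(set(spec_a) | set(spec_b)):
        in_a = spec_a.get(keyword, set()) & subtree_pins
        in_b = spec_b.get(keyword, set()) & subtree_pins
        for pin in sorted(in_a ^ in_b):  # symmetric difference: marked in one spec only
            where = "first" if pin in in_a else "second"
            problems.append(f"{keyword} {pin} only in {where} tree spec")
    return problems

spec_a = {"LeafPin": {"u_mux/ff7/CK"}, "ExcludedPin": {"u_mux/ff9/CK"}}
spec_b = {"ExcludedPin": {"u_mux/ff9/CK"}}
shared = {"u_mux/ff7/CK", "u_mux/ff9/CK"}
print(crossover_mismatches(spec_a, spec_b, shared))
```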
11.
Cloning distributes the clock gating components and their gated loads, depending on the parameters you specify. It can be used to optimize the number and placement of clock gating cells based on the placement of the design. Clock gating cells are typically inserted into the netlist during synthesis, which may not be placement aware. Optimizing the clock gating cells after placement can improve the placement and the design performance. Cloning does not fix design rule violations on the data path. If cloning a large number of cells creates a high-fanout net for the enable signal, you need to run IPO to fix it. Clock cloning *can* add a clock gating cell to the netlist (ckSynthesis does not add gating cells). The command used to do this is ckCloneGate. Decloning merges identical clock gating components that have the same inputs, depending on the parameters you specify. This step can be run before ckCloneGate to give ckCloneGate a better starting point. To achieve the highest level of decloning, use the options -ignoreDontTouch and -ignorePreplaced. To check what ckDecloneGate will do before committing the changes, run "ckDecloneGate -check file filename" to output a report on the changes it proposes.
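Decloning can be pictured as grouping clock gates by cell type and input nets, then keeping one gate per group. This is a simplified model, not ckDecloneGate's algorithm. It assumes that gates with the same library cell, the same clock input net and the same enable net are interchangeable. The cell name `ICG_X1` and the gate and net names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class ClockGate(NamedTuple):
    cell: str          # library cell (hypothetical name below)
    clock_net: str     # clock input net
    enable_net: str    # enable input net
    loads: List[str]   # gated sinks

def declone(gates: Dict[str, ClockGate]) -> Dict[str, List[str]]:
    """Merge gates with the same cell type and the same input nets.
    Returns surviving gate name -> combined load list."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for name in sorted(gates):
        g = gates[name]
        groups[(g.cell, g.clock_net, g.enable_net)].append(name)
    merged: Dict[str, List[str]] = {}
    for names in groups.values():
        keeper = names[0]  # alphabetically first, because names were added in sorted order
        merged[keeper] = [load for n in names for load in gates[n].loads]
    return merged

gates = {
    "cg1": ClockGate("ICG_X1", "clk", "en_a", ["ff1", "ff2"]),
    "cg2": ClockGate("ICG_X1", "clk", "en_a", ["ff3"]),
    "cg3": ClockGate("ICG_X1", "clk", "en_b", ["ff4"]),
}
print(declone(gates))
```

Only `cg1` and `cg2` share both inputs, so they merge. `cg3` has a different enable and survives unchanged.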
Example script for clock gate cloning:
## Read the CTS spec file and run clock-gate-aware placement
specifyClockTree -file ctsConstraintsFile
setPlaceMode -clkGateAware true
placeDesign
# Declone the clock gates
ckDecloneGate -ignoreDontTouch -ignorePreplaced
# Clone the clock gates
ckCloneGate
# Perform preCTS optimization
optDesign -preCTS
# Clone the clock gates in timing-driven mode
ckCloneGate -timingDriven
timeDesign -preCTS
# Run preCTS optimization again
optDesign -preCTS
# Synthesize the clock tree
clockDesign
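The idea behind cloning, which is splitting one gate's loads among several copies placed near them, can be sketched crudely. This is a hypothetical model, not ckCloneGate's algorithm. It orders loads by placement (x, then y) and cuts them into chunks of at most `max_fanout`, one chunk per gate. The `<gate>_cloneN` naming and the coordinates are invented.

```python
from typing import Dict, List, Tuple

def clone_by_location(gate: str, loads: Dict[str, Tuple[float, float]], max_fanout: int) -> Dict[str, List[str]]:
    """Split a gate's loads into placement-ordered chunks of at most
    max_fanout; the first chunk stays on the original gate, and each further
    chunk goes to a hypothetical clone named <gate>_cloneN."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    ordered = sorted(loads, key=lambda name: loads[name])  # sort by (x, y)
    clones: Dict[str, List[str]] = {}
    for start in range(0, len(ordered), max_fanout):
        index = start // max_fanout
        name = gate if index == 0 else f"{gate}_clone{index}"
        clones[name] = ordered[start:start + max_fanout]
    return clones

loads = {"ff1": (0.0, 0.0), "ff2": (100.0, 0.0), "ff3": (5.0, 0.0), "ff4": (105.0, 0.0)}
print(clone_by_location("cg1", loads, max_fanout=2))
```

Each clone adds one more load on the enable net. This is why cloning many gates can create the high-fanout enable net that needs IPO afterwards.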
The step-by-step flow, which runs createClockTreeSpec, specifyClockTree and ckSynthesis separately for each view, generates the CTS specification files and synthesizes the clock trees in separate steps. This is the most common approach, because the automatically generated clock tree specification files sometimes need to be modified. If no editing is required, you can use the following flow instead.
Define Mode and Analysis View
set_analysis_view -setup {view1 view2} -hold {view3 view4}
setCTSMode -specMultiMode true
clockDesign
createClockTreeSpec view1.spec
specifyClockTree view1.spec
ckSynthesis
saveClockNets view1.DontTouchNets
cleanupSpecifyClockTree
createClockTreeSpec view2.spec
specifyClockTree view2.spec + view1.DontTouchNets
ckSynthesis
cleanupSpecifyClockTree
timeDesign
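In the step-by-step flow, the nets built for view1 are saved and passed to the view2 specification, so view2's synthesis leaves them alone. This is a hypothetical model of that bookkeeping only. Views are processed in order, and any net already built for an earlier view is frozen, which is the role the saved DontTouchNets file plays. The view, clock and net names are invented.

```python
from typing import Dict, List, Set

def synthesize_views(view_clocks: Dict[str, Dict[str, Set[str]]]) -> Dict[str, List[str]]:
    """view -> (clock -> nets in that clock's tree), processed in insertion
    order. Returns the nets newly built for each view; nets built for an
    earlier view are treated as dont-touch and skipped."""
    frozen: Set[str] = set()
    built: Dict[str, List[str]] = {}
    for view, clocks in view_clocks.items():
        new_nets: Set[str] = set()
        for nets in clocks.values():
            new_nets |= nets - frozen
        built[view] = sorted(new_nets)
        frozen |= new_nets  # saved for the next view, like view1.DontTouchNets
    return built

views = {
    "view1": {"clk": {"clk_n1", "clk_n2"}},
    "view2": {"clk": {"clk_n1", "clk_n2"}, "test_clk": {"tclk_n1", "clk_n2"}},
}
print(synthesize_views(views))
```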
13. Guidelines and Issues
1. Guidelines for Avoiding the Hold Violation
- Split the larger clock domains into smaller, more manageable domains, and build a tree for each separately. Because many timing paths cross between these domains, define all the downstream trees in a clkGroup so that they are balanced against each other.
- Remove all through points on the divider registers that are the source points of generated clocks. This can help reduce the insertion delay by half.
- Investigate the depth of muxing in the clock tree, and whether any HVT (slow but low-leakage) library cells are used, since both affect the insertion delay. One solution is to use mixed-VT libraries.
- Switch off set_dont_touch and set_dont_use on the clock gating cells so that CTS can upsize them, for example:
  set_dont_use [get_lib_cell <clock gating libcell>] false
- One way to approach clock tree constraints is to set maxDelay to a large value. This reduces the effort the tool spends on insertion delay, so that it focuses on skew, slew and minimizing added cells.
- Cell padding reserves more space around flip-flops. This helps with both clock tree buffer insertion and the addition of decoupling capacitor cells in key areas.
- If scan reordering is used, further reordering can be carried out after the clock trees are inserted, to reduce hold violations caused by clock tree insertion. This should help post-CTS hold timing, but may have little effect on post-CTS routing congestion. Add the following settings before the placement stage:
  setPlaceMode -ignoreScan true
  setScanReorderMode -skipMode skipNone
  After CTS, and after setting the clocks to propagated:
  setScanReorderMode -clkAware
  scanReorder
- After the clock trees with lower cell counts have been created, ckECO can be used on the trees with high buffer counts, which may further improve skew:
  ckECO -clk <clk root name> -postRoute -useSpecFileCellsOnly
  During optimization, ckECO performs resizing and may use any cell that matches the footprint of the existing cell, regardless of whether that cell is in the buffer list. As a result, even the cells specified in the spec file may be swapped. To restrict resizing, specify setDontUse cellName true on the cells it should not use.
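Why setDontUse is needed can be seen in a small model of footprint-based resizing. This is a sketch of the behavior described above, not ckECO's implementation. It assumes that a resize candidate is any other cell with the same footprint that is not marked dont-use, and that the spec-file buffer list plays no part in the choice. All cell and footprint names are hypothetical.

```python
from typing import Dict, List, Set

def resize_candidates(current: str, footprint: Dict[str, str], dont_use: Set[str]) -> List[str]:
    """Cells that footprint-based resizing could swap `current` to:
    same footprint, not the cell itself, and not marked dont-use."""
    fp = footprint[current]
    return sorted(c for c, f in footprint.items()
                  if f == fp and c != current and c not in dont_use)

# Hypothetical library: only the CLKBUF cells are in the spec-file buffer list
footprint = {"CLKBUFX2": "buf", "CLKBUFX4": "buf", "BUFX4": "buf", "BUFX8": "buf", "INVX2": "inv"}
print(resize_candidates("CLKBUFX2", footprint, set()))
print(resize_candidates("CLKBUFX2", footprint, {"BUFX4", "BUFX8"}))
```

Without the dont-use settings, the regular buffers remain valid swap targets even though they are not in the buffer list.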
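2. Issue on tracing the Bi-directional ports
[Figure, 1st scenario (Black Box, Clock Gate, INOUT clock root): CTS assumes the INOUT clock root is an input pin and could not find any ...]
[Figure, 2nd scenario (Clock, Clock Gate, To FF, INOUT clock root): CTS traces toward the right side, but the user wants the pin treated as the output pin of the gate and traced toward the left side.]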
There is no issue when CTS traces the INOUT pin as an input. When the pin must be treated as an output, as in the scenarios above, use the following setting:
setCTSMode -traceCellInOutPinAsOutPin true