Timing Closure Document
Introduction
Every design has to run at a certain speed based on the design requirement.
There are generally three types of speed requirement in an FPGA design:
Timing requirement: how fast or slow a design should run. This is defined
through the target clock period (or clock frequency) and a few other
constraints.
Throughput: the average rate of valid output delivered per clock cycle.
Latency: the amount of time between when the input arrives and when the
valid output is available, usually measured in clock cycles.
Throughput and latency are usually related to the design architecture and
application, and they need to be traded off against each other based on the
system requirement. For example, high throughput usually means more
pipelining, which increases the latency; low latency usually requires longer
combinatorial paths with fewer pipeline stages, which can reduce the
throughput and clock speed.
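The trade-off can be made concrete with a small calculation. The following
Python sketch compares a hypothetical unpipelined datapath with the same logic
split into two pipeline stages; the Datapath class and all delay values are
illustrative assumptions, not device data.

```python
from dataclasses import dataclass


@dataclass
class Datapath:
    """A hypothetical datapath that delivers one valid result per clock cycle."""
    name: str
    stages: int            # register stages = latency in clock cycles
    stage_delay_ns: float  # worst register-to-register delay of any stage

    @property
    def fmax_mhz(self) -> float:
        # The slowest stage sets the minimum clock period. With one result
        # per cycle, throughput (results per microsecond) equals this value.
        return 1000.0 / self.stage_delay_ns

    @property
    def latency_ns(self) -> float:
        # Latency in cycles multiplied by the clock period.
        return self.stages * self.stage_delay_ns


# Assumed numbers: 8.0 ns of logic plus an assumed 0.5 ns register overhead
# in one stage, or the same logic split into two 4.0 ns stages that each pay
# the 0.5 ns overhead.
flat = Datapath("unpipelined", stages=1, stage_delay_ns=8.5)
piped = Datapath("2-stage pipeline", stages=2, stage_delay_ns=4.5)

for d in (flat, piped):
    print(f"{d.name}: Fmax {d.fmax_mhz:.1f} MHz, "
          f"latency {d.stages} cycle(s) = {d.latency_ns:.1f} ns")
```

Under these assumed numbers, pipelining nearly doubles Fmax (and therefore
throughput), while latency grows from 8.5 ns to 9.0 ns because each added
register stage costs overhead.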
Most often, FPGA designers deal with the timing requirement to make sure
that the design runs at the required clock speed. For a high-speed design,
closing timing can take considerable effort, using various techniques
(including the trade-off between throughput and latency, appropriate timing
constraint adjustments, etc.) and running multiple processing iterations
of Synthesis, MAP and PAR.
This document focuses on the timing requirement; it explains the
timing-driven FPGA implementation processes and shows how to tackle timing
issues when timing closure becomes problematic.
Clock Period/Frequency
Usually the maximum delay (or the most critical path) between any two
sequential elements (e.g. registers) in a clock domain determines that clock's
maximum frequency. In order to ensure that a design can run at the required
speed, the clock period or frequency should be defined as a constraint for the
timing-driven process so that the implementation process considers the
requirement and ensures that the maximum delay is no larger than the clock
period defined.
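As a rough illustration of the relationship between clock frequency, period
and the worst path delay, the following Python sketch converts between MHz and
ns and checks a set of path delays against a 100 MHz FREQUENCY preference. The
function names and delay values are hypothetical, and each delay is assumed to
already include clock-to-output, logic, routing and setup time.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def fmax_mhz(worst_path_delay_ns: float) -> float:
    """Highest clock frequency that the worst register-to-register path allows."""
    return 1000.0 / worst_path_delay_ns


def meets_period(path_delays_ns, freq_mhz: float) -> bool:
    """True if every register-to-register delay fits within one clock period."""
    return max(path_delays_ns) <= period_ns(freq_mhz)


# Hypothetical path delays (ns) for one clock domain constrained to 100 MHz.
delays = [3.2, 7.9, 9.4]
print(period_ns(100.0))                  # 10.0
print(round(fmax_mhz(max(delays)), 1))   # 106.4
print(meets_period(delays, 100.0))       # True
```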
If a design includes multiple clock domains, each clock should be
appropriately constrained.
Figure 1 illustrates the following timing preference:
FREQUENCY PORT CLK <Frequency> MHZ
It shows an ideal clock and its period and frequency definitions. In this
diagram, the circuit will operate correctly if the data leaving FF_S (which is
launched by the first clock edge) arrives at FF_D prior to the second clock
edge. The Period (or Frequency) defines how far apart these two clock edges
are in time.
Figure 1: Period and Frequency of an Ideal Clock
If the relationship is not known, then the data path will not be constrained.
Synthesis tools such as Synplify Pro must be told of these relationships. If
Synplify Pro constraints are used, this is done by defining the two clocks with
the same clock group name using define_clock constraints and, at the same
time, by defining the clock skew using a define_clock_delay constraint.
The FPGA implementation tools such as MAP, PAR and TRACE usually can
determine the timing relationship between two clocks (e.g. they both come
from the same PLL). However, if both clocks come from external pins, the
user must specify their relationship. This is done using the CLKSKEWDIFF
preference.
Figure 2 illustrates the following timing preference:
CLKSKEWDIFF CLKPORT "CLK2" CLKPORT "CLK1" <clkskewdiff_value> NS;
CLKSKEWDIFF is used to relate two otherwise unrelated clocks, for example,
two top-level clocks. TRACE will not analyze cross-domain paths between
unrelated clocks. You can establish a relationship between two unrelated
clocks by specifying the amount of clock skew between these clocks using the
CLKSKEWDIFF preference, as illustrated in Figure 2.
Figure 2: Using CLKSKEWDIFF
Input/Output Timing
FPGA IO timing basically looks at one part of the register-to-register timing
analysis of the simple Period/Frequency case in Figure 1. The goal is to be
able to analyze register-to-register paths that cross between two devices, but
focus the analysis on the FPGA device and model the other device within the
FPGA board timing environment.
Input case: when the FPGA is receiving data from a source device (as
input).
In this case, the timing data of when the other device (and board)
guarantees to provide data to the FPGA pins is provided to the analysis as
a constraint.
capture clock edge arrives at its FPGA pin; this is the input setup constraint
that the FPGA internal timing must work within to meet the internal register
setup time of 0.50 ns.
Note
The input setup constraint value depends on the clock period value. Thus, if the clock
period constraint changes, the input setup constraint should also change.
Input delay is the time between when the previous capture clock edge arrived
at its FPGA pin and when the data arrives at its FPGA input pin. Input delay is
a positive value if the data arrives after this clock edge.
Note
The input delay value does not depend on the clock period value. Thus, if the clock
period constraint changes, the input delay constraint does not change.
Input setup and input delay are two different ways of looking at the same thing
(if you know one, you know the other).
Input setup + Input delay = clock period.
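The relationship above can be sketched in a few lines of Python. The helper
names and the 3.5 ns input delay are hypothetical; the point is that the input
delay is fixed by the board, so the input setup value must be recomputed
whenever the clock period changes.

```python
def input_setup_from_delay(period_ns: float, input_delay_ns: float) -> float:
    """Input setup value derived from the input delay form."""
    return period_ns - input_delay_ns


def input_delay_from_setup(period_ns: float, input_setup_ns: float) -> float:
    """Input delay value derived from the input setup form."""
    return period_ns - input_setup_ns


# Hypothetical: 10 ns clock, data arrives 3.5 ns after the previous capture edge.
print(input_setup_from_delay(10.0, 3.5))  # 6.5
# Tightening the period to 8 ns leaves the board-determined input delay at
# 3.5 ns, but the input setup window shrinks.
print(input_setup_from_delay(8.0, 3.5))   # 4.5
```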
Both input setup and input delay forms are specified using the INPUT_SETUP
preference.
The HOLD time (in the INPUT_SETUP preference) represents how long the
data remains valid (constant) at the FPGA input pin after the clock edge used
for input setup arrives at its FPGA pin. It is used to test for board-level hold
time violations.
Figure 4: Input Case
The following shows how Input setup and Input delay forms are specified in
the preference language. See the diagram in Figure 6 for reference:
Input_setup (form)
INPUT_SETUP PORT INPUT <INPUT_SETUP_value> HOLD <HOLD_value> CLKPORT CLK
Input_delay (form)
INPUT_SETUP PORT INPUT INPUT_DELAY <INPUT_DELAY_value> HOLD <HOLD_value> CLKPORT CLK
Figure 6 shows a no-skew clock with its period/frequency, and an input with its
input setup, input delay and hold time. It also shows how the sum of input
setup time and input delay is the clock period, so that when one value is
known, the other can be easily calculated by subtracting the known value from
the clock period.
Figure 6: Input Setup/Input Delay
Clock to Output
Clock_to_Out is the time difference between when the launch clock edge
arrives at the FPGA input pin and when the resulting data signal departs the
FPGA (pin). The clock to out timing constraint value is the time by which the
FPGA must provide the data to meet the board timing and downstream device
requirements. It is a function of the clock speed, destination device timing (its
input setup requirement) and board timing. The FPGA meets this timing
through the choice of the internal clock and data paths used.
The detailed timing example below shows the components of the two options
to define the output IO timing constraint: clock to out and output delay.
The external environment is given: a clock period of 20 ns, board clock skew
of 1 ns, board trace delay of 6 ns, and the destination device's input setup
requirement of 5.5 ns. This leaves at most 20 - 1 - 6 - 5.5 = 7.5 ns for the
FPGA's clock to out timing.
Note
The clock to out value depends on the clock period value; if the clock period constraint
changes, the clock to out constraint should also change.
Output delay is the portion of the clock period used by the environment
outside of the FPGA. It includes the time for the signal to travel from the
FPGA to the destination device (board trace), the input setup time required by
the destination device, and any time lost to board clock skew between the
launch and capture clocks.
Note
The output delay value does not depend on the clock period value; if the clock period
constraint changes, the output delay constraint does not change.
Clock to output and output delay are two different ways of looking at the same
thing (if you know one, you know the other).
Clock to output + Output delay = clock period.
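Using the worked example given earlier in this section (20 ns period, 1 ns
board clock skew, 6 ns board trace, 5.5 ns destination input setup), the
following Python sketch derives both forms of the output constraint. The
helper names are hypothetical, and the board clock skew is treated as time
lost, as in that example.

```python
def output_delay_ns(board_skew_ns, board_trace_ns, dest_setup_ns):
    """Portion of the clock period consumed outside the FPGA."""
    return board_skew_ns + board_trace_ns + dest_setup_ns


def clock_to_out_ns(period_ns, out_delay_ns):
    """Budget left for the FPGA's clock to out."""
    return period_ns - out_delay_ns


# Values from the worked example in the text.
od = output_delay_ns(board_skew_ns=1.0, board_trace_ns=6.0, dest_setup_ns=5.5)
print(od)                          # 12.5
print(clock_to_out_ns(20.0, od))   # 7.5
```

Changing the period to 25 ns would leave the output delay at 12.5 ns but raise
the clock to out budget to 12.5 ns, which is why only the clock to out form has
to be updated when the clock period constraint changes.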
Both clock to out and output delay forms are specified using the
CLOCK_TO_OUT preference.
The MIN time (in the CLOCK_TO_OUT preference) represents the smallest
time for clock to out that will not result in a board level hold time violation.
Figure 7: Output Case
The following shows how clock to out and output delay forms are specified in
the preference language. See the diagram in Figure 9 for reference:
Clock to out (form)
CLOCK_TO_OUT PORT OUTPUT <clock_to_out value> MIN <HOLD_value> CLKPORT CLK
Output delay (form)
CLOCK_TO_OUT PORT OUTPUT OUTPUT_DELAY <output_delay value> MIN <HOLD_value> CLKPORT CLK
Maximum Delay
Every net has a delay. The Maximum Delay constraint defines the maximum
total time required for a net, bus or path, from a start point to an end point.
Figure 10: Maximum Delay
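A minimal sketch of the idea in Python: a path's delay is the sum of its cell
and net delays, and the path meets a maximum-delay limit when that sum is no
larger than the limit. The segment delays and the 5 ns limit are hypothetical
values chosen for illustration.

```python
def path_delay_ns(segment_delays_ns):
    """Total start-to-end delay of a path: the sum of its cell and net delays."""
    return sum(segment_delays_ns)


def max_delay_slack_ns(segment_delays_ns, max_delay_ns):
    """Slack against a maximum-delay limit; >= 0 means the limit is met."""
    return max_delay_ns - path_delay_ns(segment_delays_ns)


# Hypothetical path: net, LUT, net, LUT, net delays in ns, limited to 5 ns.
segments = [0.8, 0.4, 1.1, 0.4, 0.9]
print(round(max_delay_slack_ns(segments, 5.0), 3))  # 1.4
```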
Exceptions
It can be necessary to specify exceptions to the standard timing analysis.
These timing exceptions / modifications are considered timing requirements
and captured in the timing constraints.
MULTICYCLE
Generally, in a synchronous design, a receiving register captures data using
the next active clock edge after the edge that launched the data from the
launching register. If both registers are clocked using the same clock signal,
then this is one clock cycle. There are cases where the designer intends the
time from a launching register to a receiving register to be different from
this general case. The MULTICYCLE preference allows the designer to specify a
timing requirement that is different from what the general/default case would
use.
A MULTICYCLE constraint is a relaxation of the clock period / frequency
analysis, and therefore only applies to paths covered by a clock period or
frequency constraint. The launching register and the receiving register can be
clocked by the same clock or different clocks. If driven by different clocks,
these clocks must be related (if they are unrelated, the period / frequency will
not be analyzed for the paths that cross between the clock domains).
The diagram in Figure 11 illustrates the following timing preference:
MULTICYCLE FROM <Source Register> CLKNET CLK CLKEN_NET CE TO <Destination Register> CLKNET CLK 2 X
It illustrates a portion of a clock domain where a clock enable (CE) is running
at half the speed of the clock, which slows the clock domain's effective
frequency and allows the data path more time to reach the capture register.
If the clock is constrained to a period of P, the MULTICYCLE constraint can
then be used to constrain the data path D to 2 x P.
Figure 11: Clock Domain
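The effect of a MULTICYCLE multiplier on setup slack can be sketched as
follows. This Python example ignores clock skew and register setup time, and
the period and data path delay values are hypothetical.

```python
def setup_slack_ns(period_ns, data_delay_ns, multiplier=1):
    """Setup slack when the requirement is multiplier x period.

    multiplier=1 is the default single-cycle analysis; a MULTICYCLE
    preference ending in "2 X" relaxes the requirement to 2 x period.
    Clock skew and register setup time are ignored in this sketch.
    """
    return multiplier * period_ns - data_delay_ns


P = 10.0  # hypothetical clock period (ns)
D = 14.0  # hypothetical data path delay (ns)
print(setup_slack_ns(P, D))     # -4.0 (fails as a single-cycle path)
print(setup_slack_ns(P, D, 2))  # 6.0 (meets with a 2 X multicycle)
```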
False Paths/Block
There can be paths in the design that are by default analyzed for timing, but
their timing has no impact on the operation of the circuit. A simple example is
an input that is tied to a constant VCC or GND on the board. During operation,
there is no value transferred from the start of the path to the end, so its timing
is not relevant.
Users can specify these paths to the tools using false path (synthesis
constraint) and BLOCK (FPGA preference) to keep the flow from working on
areas that have no impact.
Identifying and addressing timing issues at an early stage such as MAP,
rather than a later stage such as PAR, saves a lot of time. Later processes in
the flow usually take longer to run, so doing analysis and debug earlier in the
flow provides a faster loop to make changes and see results. At MAP, you can
easily see that a path has far too many levels of logic for the required target
FMax, and avoid running PAR just to find this out.
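A rough MAP-stage sanity check can be sketched as a budget calculation:
subtract the register overheads from the target period and divide by an
assumed per-level delay. In the Python sketch below, the function name and
the clock-to-output, setup, LUT and routing-estimate delays are hypothetical
placeholders rather than values from any speed grade file.

```python
import math


def max_logic_levels(target_fmax_mhz, clk_to_q_ns, t_setup_ns,
                     lut_delay_ns, route_est_ns):
    """Rough upper bound on LUT levels per register-to-register path.

    All delay arguments are hypothetical placeholders; real values come from
    the device speed data and the MAP TRACE route estimation.
    """
    period_ns = 1000.0 / target_fmax_mhz
    budget_ns = period_ns - clk_to_q_ns - t_setup_ns
    return math.floor(budget_ns / (lut_delay_ns + route_est_ns))


print(max_logic_levels(200.0, 0.5, 0.2, 0.4, 0.6))  # 4
print(max_logic_levels(100.0, 0.5, 0.2, 0.4, 0.6))  # 9
```

Under these assumed delays, a path that needs more than four levels of logic at
200 MHz is a candidate for RTL changes (for example, added pipelining) before
any PAR run.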
Synplify Pro
To run Synplify Pro so that it uses timing constraints (timing driven mode), you
need to properly set up the active Strategy and define timing constraints.
Strategy Settings for Timing-Driven Mode Synthesis
Synplify Pro will use timing constraints if the active Strategy has the Area
setting set to False, as illustrated in Figure 15. To accomplish this, you can
use the predefined strategy called Timing, or you can make this setting in your
own custom Strategy settings.
Figure 15: Strategy - Changing Area Strategy for Synplify Pro
Timing Constraints
You can use two types of constraint to define your timing requirements:
Synplify SDC
Synopsys SDC
The two types of constraint cannot be mixed: they cannot be used together
in the same constraint file (.sdc) or in separate constraint files. You need to
select one type to drive the Synplify Pro process in timing-driven mode.
Remember that Synplify Pro must be in timing-driven mode in order to have
your timing constraints applied to the synthesis process; otherwise, your
timing constraints might be ignored.
For detailed information about using timing constraints through Synplify SDC
and Synopsys SDC, refer to the Synopsys FPGA Synthesis User Guide and
Synopsys FPGA Reference Manual.
The summary at the top, as shown in the following example. Ensure that
the appropriate SDC file was used and that the required frequency and
other timing constraints defined in the SDC file were included.
Top view:            demo
Requested Frequency: 25.0 MHz
Wire load mode:      top
Paths requested:     3
Constraint File(s):  C:\projects\demo\demo.sdc
Performance Summary
*******************
Worst slack in design: 14.892

                 Requested   Estimated   Requested   Estimated              Clock      Clock
Starting Clock   Frequency   Frequency   Period      Period      Slack      Type       Group
---------------------------------------------------------------------------------------------
clk              25.0 MHz    97.9 MHz    40.000      10.217      14.892     inferred   clkgroup
Interface information, which shows input setup and clock to output timing
information and slacks.
*******************

Clocks           |    rise to rise    |    fall to fall    |    rise to fall    |    fall to rise
----------------------------------------------------------------------------------------------------
Starting  Ending | constraint  slack  | constraint  slack  | constraint  slack  | constraint  slack
----------------------------------------------------------------------------------------------------
clk       clk    | 40.000      30.187 | 40.000      37.483 | 20.000      14.892 | 20.000      16.099
For complete information about the Synplify Pro report, refer to the Synopsys
FPGA Synthesis User Guide and Synopsys FPGA Reference Manual.
Remember that the Synplify Pro timing report is generated from the synthesis
result, which does not have any placement and routing information. To get a
highly accurate timing analysis result, run PAR TRACE, as explained in the
section PAR TRACE on page 32.
LSE
To run LSE so that it uses timing constraints (timing-driven mode), you need
to properly set up the active strategy and define timing constraints.
To accomplish this, you can use the predefined strategy called Timing, or you
can make this setting in your own custom strategy settings.
Specify Clock Frequency Timing Constraint Setting
You might also need to change the target frequency to the required value for
your design, as shown in Figure 19. The default value for this is 200 MHz.
Other Timing Related Strategies
Other strategy settings for LSE may improve your design's performance.
Depending on the actual design and preliminary synthesis result, you can
change these strategy settings, but remember that all the following suggested
settings come at the expense of increased area:
Use one-hot state machine encoding style if your design includes state
machines.
Turn off Resource sharing. Resource sharing, when enabled, allows LSE
to reduce area by sharing certain resources. Turning this off might
improve the performance.
Timing Constraints
LSE supports Synopsys Design Constraints for timing-driven logic synthesis.
LSE supports the following Synopsys Design Constraints:
create_clock
set_input_delay
set_output_delay
set_max_delay
set_multicycle_path
set_false_path
Your constraints must be written in an LSE Design Constraint file (.ldc) that is
included and set as the active synthesis constraint file in your implementation.
Remember that you must enable timing-driven mode for LSE in order to have
your timing constraints (.ldc) applied to the synthesis process; otherwise, your
timing constraints will be ignored.
Understanding TRACE
TRACE is the static timing analysis tool in Diamond.
Static timing analysis (STA) is a fast and powerful verification technique for
validating design performance. It is one of the most important steps in the
design flow, and it should be considered as important as the functional
verification performed with a logic simulator. TRACE verifies circuit timing by
totaling the propagation delays along paths between clocked or combinational
elements in a circuit. TRACE determines and reports timing data, such as the
critical path, setup time and hold time, and the maximum frequency.
You can run TRACE on mapped designs or on completely placed and routed
designs.
TRACE enables you to do the following:
TRACE performs two types of timing analysis: Setup and Hold. Setup time
analysis ensures that the data arrives at the receiving registers before the
next capturing clock edge. Hold time analysis ensures that the data does not
arrive at the receiving registers too early and thus get captured by the
clock edge prior to the intended capture edge. The examples from Table 1
explain this in detail.
Table 1: Setup and Hold Timing Analysis
(Constraints covered: FREQUENCY/PERIOD, INPUT_SETUP, HOLD)
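The two checks can be written out with the usual textbook static timing
equations. The following Python sketch is a simplified model under stated
assumptions (a single clock, clock edges offset only by fixed insertion delays,
and hypothetical delay values); the RegPath class is illustrative and does not
reproduce TRACE's internal model, which also applies different performance
grades and conditions to each check.

```python
from dataclasses import dataclass


@dataclass
class RegPath:
    """Register-to-register path in one clock domain (all times in ns)."""
    launch_clk: float   # clock arrival time at the launching register
    capture_clk: float  # clock arrival time at the capturing register
    clk_to_q: float     # launching register clock-to-output delay
    data_max: float     # slowest logic + routing delay (used for setup)
    data_min: float     # fastest logic + routing delay (used for hold)
    t_setup: float      # capturing register setup requirement
    t_hold: float       # capturing register hold requirement


def setup_slack(p: RegPath, period: float) -> float:
    # Data launched at edge 0 must settle before the NEXT edge (one period
    # later) reaches the capture register, minus that register's setup time.
    arrival = p.launch_clk + p.clk_to_q + p.data_max
    required = period + p.capture_clk - p.t_setup
    return required - arrival


def hold_slack(p: RegPath) -> float:
    # Data launched at edge 0 must not reach the capture register before
    # edge 0's hold window there closes; otherwise the new data is captured
    # one edge too early, overwriting the previous value.
    arrival = p.launch_clk + p.clk_to_q + p.data_min
    required = p.capture_clk + p.t_hold
    return arrival - required


# Hypothetical path: the capture clock arrives 0.3 ns after the launch clock.
path = RegPath(launch_clk=0.2, capture_clk=0.5, clk_to_q=0.4,
               data_max=6.0, data_min=0.1, t_setup=0.3, t_hold=0.6)
print(round(setup_slack(path, period=8.0), 3))  # 1.6
print(round(hold_slack(path), 3))               # -0.4 (hold violation)
```

With these assumed values, the late capture clock gives the setup check extra
margin but pushes the hold check into violation, which is why setup and hold
analysis use opposite worst-case assumptions for data and clock delays.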
TRACE uses different performance grades and conditions when doing Setup
time vs. Hold time analysis. Table 2 has more details. Although Place and
Route (PAR) runs as a single process, there are two distinct steps: (1) Meet
setup, (2) Meet hold (done via the Hold Time Correction sub-step). The table
also summarizes the behaviors of these two steps.
Table 2: Performance Grades and PAR Behavior
                                   Setup timing analysis              Hold timing analysis
Default performance grade used     Performance grade of the target    -m
(can be changed by the end user)   device; for example, 6
Worst case conditions used         Slow/max data                      Fast/min data
(from data in speed grade file)    Fast/min clock                     Slow/max clock
The performance grade -m represents the fastest possible PVT corner. The
voltage used for this option is 5% above the nominal value, and the
temperature used is -40 °C.
For register-to-register timing analysis, the default grades used represent the
worst case for the setup analysis and the hold time analysis.
For FPGA I/O timing analysis, meaning INPUT_SETUP and
CLOCK_TO_OUT, it is possible that the default grades used will not represent
the worst case. The worst case depends on your design and the final placed
and routed design. If PAR TRACE reports no timing errors, you should still run
I/O timing analysis to sweep across speed grades faster than your target
speed grade to ensure that I/O timing is satisfied. Refer to the section I/O
Timing Analysis on page 33 for the details.
TRACE can be run on a post-MAP netlist, before place and route, where
routing delay is an estimate, or after place and route (see Understanding the
PAR and PAR TRACE Reports on page 32).
You can specify a few options in the MAP TRACE Strategy settings to control
the MAP TRACE process.
Figure 22: MAP TRACE Strategy Settings
These settings can help you quickly identify any timing issues that might exist
in your design:
Check Unconstrained Connections: Setting this to True will list the paths
that are not covered by any timing preference.
Note
The Check Unconstrained Connections option will be discontinued after the next
two Diamond releases.
Check Unconstrained Paths: Setting this to True will report the paths that
are not constrained and show the start point and end point of each path.
TRACE will suggest some timing preferences to constrain the given paths.
The unconstrained paths are shown only in the setup timing check report
to avoid duplicating the same paths in the hold timing check report.
Based on the design and the required performance, only necessary paths
should be constrained so that PAR focuses only on the optimization of the
important paths. However, the Unconstrained Paths section of the TRACE
report is very useful for identifying whether any missing timing constraints
are really important to the design. This option does not require you to add
more preferences in an attempt to constrain all paths. Instead, it serves as
a reminder that a necessary preference might be missing, which could
impact the desired performance of the design.
Note
The Check Unconstrained Paths option cannot be used with the -allprefpath
command-line option.
Report Style: Set this option to Error Timing Report so that TRACE only
reports paths and nets that have timing errors. This allows you to identify
any timing issue quickly.
You can also specify a few PAR TRACE options through the Strategy settings
to control the PAR TRACE process, with the following differences:
Speed for Hold Analysis: You can select the speed grade for the hold
analysis. By default, this value is set to m, or minimum, which
represents virtual silicon that is faster than the fastest available speed
grade of the device. If the analysis reveals no hold time violation
using the value m, then there will be no hold time violation for any speed
grade, including the one you selected for your project.
In some cases, hold time violations are reported with m even though there
would be no violation for the speed grade you selected for your project. If
being able to migrate to a faster speed grade is not a concern, you can
set this value to the actual speed grade selected for your project.
Speed for Setup Analysis: You can select the speed grade for the setup
analysis. By default, this value is set to default, which is the speed grade
you selected for your project.
The MAP TRACE and PAR TRACE reports have the same format. Table 3
shows a summary of their differences.
Table 3: TRACE Report Differences
                  MAP TRACE                            PAR TRACE
File extension    .tw1                                 .twr
Routing timing    Estimated                            Actual
Netlist used      Mapped design                        Placed and routed design
Used for          Quickly identifying timing issues    Detailed timing analysis
The TRACE report can be viewed in the Diamond Report View, as shown in
Figure 24. You can also view the TRACE report files in the implementation
directory using a text editor. Both reports use the same naming convention for
the prefix and use a different file extension (see table above). The naming
convention for the prefix is <prj_name>_<impl_name>, where <prj_name> is
your project name and <impl_name> is the implementation name.
Based on the type of analysis (setup, hold, etc.) set through the Analysis
Options in the MAP TRACE (or PAR TRACE) strategy settings, the report
contents may differ. In the example shown above, the results of the setup
time and hold time analysis can be examined. You can quickly jump to a
few areas to see if the result meets your timing requirements (preferences), or
to find more information about your design, as follows:
Timing summary at the top: This section summarizes the total number of
timing errors and timing scores for both setup time analysis and hold time
analysis.
Separate Setup and Hold analysis sections follow. They have the
same format.
In this example, there are two clock domains: clk1 and clk2. Both of these
clock domains are covered by their own FREQUENCY preferences. In
addition, there are cross-domain paths between these two clocks, and they
are covered by their own MULTICYCLE preferences.
Remember that MAP TRACE runs on the mapped result, which does not
have any placement and routing information; instead, MAP TRACE uses the
Route Estimation Algorithm defined through the MAP TRACE strategy
settings to estimate routing delays. To get an accurate timing analysis result,
run PAR TRACE, as explained in PAR TRACE on page 32.
Constraints
There are two types of constraints. Timing constraints are timing goals that
the design is to meet. Placement constraints directly affect the physical layout
of the netlist when it is put into the device. An example of a placement
constraint is assigning a design's top-level port to a specific device pin.
Constraints passed to MAP can come from two different sources:
Entered in the HDL and passed to MAP inside the NGD file
Use the Spreadsheet View (timing preferences are held in the Timing
Preferences sheet),
You can use the following preference commands to define your timing
constraints in LPF:
FREQUENCY/PERIOD
INPUT_SETUP
CLOCK_TO_OUT
MULTICYCLE
MAXDELAY
CLKSKEWDIFF
BLOCK
OFFSET
Timing Driven Mapping: Turning on this option instructs the MAP process
to calculate the slack time for all constrained paths and optimize the
critical paths based on the slack distributions.
Timing Driven Node Replication: Turning on this option instructs the MAP
process to replicate a LUT4 that fans out to multiple flip-flops. When the
LUT belongs to a timing path, MAP adds a LUT for each flip-flop, so that
each LUT/FF pair can be packed in the same slice.
Timing Driven Packing: Turning on this option instructs the MAP process
to do timing-driven packing of LUT/FF, FF/LUT, and LUT/LUT in the same
slice.
MAP TRACE
The content of the MAP TRACE report, and how it is generated, is described
in the section The MAP TRACE and PAR TRACE Reports on page 22. It can
be generated right after the MAP step, without running PAR, to quickly see
whether there are gross issues with the defined timing constraints or with
the design itself.
Understanding PAR
PAR performs the following tasks:
It takes a mapped physical design (.ncd file) and a preference file (.prf) as
input files. The .ncd file and .prf file are the outputs of the MAP process.
See Preferences and Processes.
It uses its timing driven engine to place and route the design with the goal
of meeting the placement constraints and the timing preferences defined
in the input .prf file. As explained in the second table in the section
Understanding TRACE, PAR first works to make the setup timing score
zero. If auto hold time correction is enabled in PAR, PAR then works to
correct hold time violations. Auto Hold-Time Correction is enabled
through the PAR strategy settings, as explained in PAR (Place & Route
Design) Settings in Strategy for Timing Closure. For releases prior to
Diamond 2.0, this must be enabled by the user (the default setting was
disabled).
Placement
The PAR process places the mapped physical design (.ncd file) in two stages:
constructive placement and optimizing placement. PAR writes the physical
design after each of these stages is complete.
During constructive placement, PAR places components into sites based on
factors such as the following:
Cost tables that assign random weighted values to each of the relevant
factors. There are 100 possible cost tables, and they can be set through
PAR strategy settings.
Routing
Routing is also done in two stages: iterative routing and delay reduction
routing (also called cleanup). PAR writes the physical design (.ncd file) only
after iterations where the routing score (accumulated setup timing slacks) has
improved.
During iterative routing, the router attempts to converge on a solution that
routes the design to completion or minimizes the number of unrouted nets.
During delay-reduction routing, the router takes the results of iterative routing
and reroutes some connections to minimize the signal delays within the
device. Two types of delay-reduction routing are performed:
PAR can run in two basic modes. The mode is set via the Auto Hold-Time
Correction setting in the PAR (Place & Route Design) section of the active
Strategy being used.
Meet setup and hold. Auto Hold-Time Correction = On. This is the
recommended mode. PAR will work to meet both setup and hold time so
that there are no violations. This is the default mode for Diamond release
2.0 and later; in earlier releases, users must turn this mode on.
Meet setup (and report on hold). Auto Hold-Time Correction = Off. This is
not the recommended mode. PAR will work to meet setup only. If there are
hold time violations, PAR will not attempt to correct them. This is the
default mode for Diamond releases prior to 2.0. This mode may be useful
early in the design closure process, when the focus is on meeting setup
time and the user wants to save runtime.
In either mode, the TRACE report includes setup analysis, hold analysis, or
both, as chosen in the Analysis Options of the TRACE report Strategy
settings (both by default). Any violations will be reported. You should
examine the PAR TRACE report for setup and hold-time analysis results.
Clock Skew Minimization: If any clock signal is not assigned to the
global clock tree, enabling this option allows PAR to balance routing to
reduce clock skew.
Disable Timing Driven: By default, this option is off, which means that
PAR runs timing-driven placement and routing based on your timing
constraints. You might want to disable timing-driven PAR on those
occasions where you want a quick PAR run to get a rough idea
of the difficulty of placing and routing your design.
Path Based Placement: Turning on this option allows PAR to do path-based
placement, which usually yields better performance.
Timing Constraints
PAR (as with MAP) takes constraints as input. These constraints are passed
to PAR from MAP in a file referred to as the Physical Preference File (PRF,
file extension .prf). The PRF is not a user-created file. The user puts
constraints into the LPF file, MAP generates the PRF from the LPF, and
then PAR runs against the PRF. User edits made directly to the PRF will be
lost if/when MAP is run. Therefore, the PRF should not be edited. The PRF
is not accessible from the Diamond GUI.
PAR TRACE
The content of the PAR TRACE report, and how it is generated, is described
in the section The MAP TRACE and PAR TRACE Reports. It can be
generated right after the PAR step; it holds the final timing for the design.
In the Cost Table Summary, the Timing Score reported is the sum of all the
negative slacks related to setup timing requirements. Therefore, if the number
reported is 0, it means that the timing-driven PAR process finished
successfully without finding any setup timing issues. The hold timing score is
reported by PAR only if Auto Hold-Time Correction is ON in the active
Strategy. It can be found later in the PAR report, for example:
Hold time optimization iteration 0:
There are 6 hold time violations, the optimization is running
...
End of iteration 0
17 successful; 0 unrouted; real time: 24 secs
Hold time optimization iteration 1:
There are 4 hold time violations, the optimization is running
...
End of iteration 1
17 successful; 0 unrouted; real time: 24 secs
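The Timing Score definition given for the Cost Table Summary can be expressed
directly. This Python sketch assumes a list of setup slacks per constrained
connection in picoseconds; the function name and slack values are
hypothetical.

```python
def timing_score(setup_slacks):
    """Sum of the magnitudes of all negative setup slacks.

    A score of 0 means no setup requirement is violated. Units follow the
    input; the slacks below are assumed to be in picoseconds.
    """
    return sum(-s for s in setup_slacks if s < 0)


slacks_ps = [120, -35, 0, -200, 15]  # hypothetical setup slacks (ps)
print(timing_score(slacks_ps))       # 235
print(timing_score([10, 0, 5]))      # 0
```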
The PAR TRACE report includes timing scores for both setup and hold if both
setup and hold are chosen in the Analysis Options (i.e. Standard Setup and
Hold Analysis). This report includes the detailed timing analysis of the design
against the constraints. It reports what constraints have been considered,
whether they have been met, and the failing paths wherever a constraint has
not been met. See the section The MAP TRACE and PAR TRACE Reports
on page 22 for more info on the format of the TRACE report.
......
I/O Timing Report (All units are in ns)
Worst Case Results across All Performance Grades (M, 9, 8, 7, 6, 6L, 7L, 8L):
// Input Setup and Hold Times
Port    Clock   Edge   Setup   Performance_Grade   Hold    Performance_Grade
-----------------------------------------------------------------------------
data1   clk1    R      0.469   6                   1.180   6
data2   clk2    R      0.596   6                   1.087   M
rst     clk1    R      0.458   M                   0.437   6
rst     clk2    R      0.514   6                   0.245   6
......
In this example, from the summary section, the worst-case hold time minimum
requirement of the port data1 is 1.180 ns using performance grade 6
instead of the performance grade -m.
As explained previously, PAR TRACE actually uses the speed grade -m for
the hold timing analysis. In this case, the hold time minimum requirement is
1.145 ns, as shown in the detail section for performance grade M, which is
less than 1.180 ns.
When you run PAR TRACE, if the hold time requirement of data1 written in
the LPF is less than 1.180 ns but greater than 1.145 ns, the grade M analysis
passes, but the I/O Timing Report reveals an I/O timing problem in your
design. If the hold time requirement written in the LPF is greater than
1.180 ns, then your design is fine.
The same situation applies to the setup timing analysis as well.
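The comparison above can be sketched as a small check that tests an LPF hold value against the minimum hold requirement of each performance grade. Only the two data1 values quoted in the text are filled in: grade M (1.145 ns) and grade 6 (1.180 ns). The other grades are omitted rather than guessed. The dictionary and function names are hypothetical. The sketch treats a value equal to a grade's requirement as passing, which is an assumption for the boundary case.

```python
# Minimal sketch: which performance grades does an LPF hold value fail?
# Only grades M and 6 for data1 are known from the example above.

DATA1_HOLD_REQ_NS = {"M": 1.145, "6": 1.180}


def failing_grades(lpf_hold_ns, req_by_grade):
    """Grades whose minimum hold requirement exceeds the LPF hold value."""
    return sorted(g for g, req in req_by_grade.items() if lpf_hold_ns < req)


def hold_verdict(lpf_hold_ns, req_by_grade):
    failing = failing_grades(lpf_hold_ns, req_by_grade)
    if not failing:
        return "ok"
    if "M" not in failing:
        # PAR TRACE (grade M) passes, but another grade does not.
        return "hidden I/O problem in grade(s) " + ", ".join(failing)
    return "fails in PAR TRACE"


if __name__ == "__main__":
    for value in (1.20, 1.16, 1.10):
        print(value, hold_verdict(value, DATA1_HOLD_REQ_NS))
```

The middle case (1.16 ns) is the one the text warns about: the grade M analysis in PAR TRACE reports no problem, yet performance grade 6 fails.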
2. Along with the FPGA-friendly code, use appropriate and sufficient
timing constraints (preferences) to drive synthesis, MAP and PAR. A good
set of FPGA timing requirements is crucial for meeting timing goals.
3. Run an initial design process including synthesis, MAP, MAP TRACE,
PAR and PAR TRACE. If you have a high performance requirement,
select timing-driven placement and specify a low placement effort level for
this first PAR process through PAR strategy settings.
Rule of Thumb: When MAP TRACE reports a timing issue, it is usually
an RTL issue, and you should trace the issue back to your HDL code.
You can save time by using the MAP TRACE report to fix these issues
instead of trying to resolve them by needlessly running PAR and PAR
TRACE.
4. Examine the MAP report, MAP TRACE report, PAR report, PAR TRACE
report, and PAD report, and analyze the timing information.
5. If necessary, modify timing constraints and preferences. If applicable,
assign primary and secondary clocks, tune I/O timing with PLLs, and
group components along critical paths.
6. Run a second processing iteration. For PAR, change its strategy settings
to use timing-driven placement, and then experiment with increased
placement effort and multiple routing passes.
7. Analyze timing again, identifying high-fanout nets, critical path nets,
and long delay paths.
8. If necessary, do some floorplanning to direct the physical layout of the
circuit. For designs that do not meet performance goals, use groups and
regions to place components closer together and shorten routing
distances. Use iterative floorplanning, repeating steps 6 through 8 until
performance goals are achieved.
mode, examine the timing report, and proceed to the next process if the
estimated performance meets your requirement.
2. Provide sufficient and appropriate timing constraints.
To ensure that timing-driven synthesis works correctly, you must provide
appropriate and sufficient timing constraints in the synthesis constraints
file. The essential timing constraint is the clock period or frequency. If you
do not provide clock requirements, by default 200MHz will be used for
timing-driven synthesis. This can be seen and modified through the
synthesis strategy, as explained in "Timing-Driven Synthesis and
Constraints" on page 15. If you have multiple clocks, make sure that all of
them are constrained with the appropriate values. Other timing
constraints, such as setup time, clock to output, etc., should be provided if
available.
3. Interpret the synthesis timing report.
Since Synplify Pro does not have place-and-route information, its timing
report is usually optimistic and inaccurate. Use its timing report only
as a reference. Usually you can expect the actual speed to be one third
to one half lower than reported.
On the other hand, LSE is more conservative, and the reported maximum
frequency (Fmax) value is usually within 10% of the actual value from the
placed and routed result.
4. Over-constrain or not
There are some common practices suggesting that you should overconstrain the synthesis process in order to get a result with a better
performance. This is not always the case, since over-constraining a
design can unnecessarily increase the resource usage, and this might not
be what you expect. A decision must be made to balance the performance
and size of your design.
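The rules of thumb in step 3 can be turned into a rough estimate of the post-PAR Fmax range. This is a minimal sketch with a hypothetical function name. It encodes only the heuristics stated above: Synplify Pro's figure is reduced by one third to one half, and LSE's reported Fmax is within 10% of the actual value. It is not a prediction tool.

```python
# Rough sketch of the step 3 rules of thumb; heuristics, not guarantees.


def expected_fmax_range_mhz(reported_mhz, tool):
    """Return (low, high) bounds, in MHz, for the placed-and-routed Fmax."""
    if tool == "synplify":
        # Reduce the reported speed by one third to one half.
        return reported_mhz / 2, reported_mhz * 2 / 3
    if tool == "lse":
        # Reported Fmax within 10% of the actual value:
        # |reported - actual| <= 0.1 * actual
        return reported_mhz / 1.1, reported_mhz / 0.9
    raise ValueError("tool must be 'synplify' or 'lse'")


if __name__ == "__main__":
    for tool in ("synplify", "lse"):
        low, high = expected_fmax_range_mhz(300.0, tool)
        print(f"{tool}: {low:.1f} to {high:.1f} MHz")
```

For the same 300 MHz synthesis estimate, the Synplify Pro range (150 to 200 MHz) sits well below the LSE range (about 273 to 333 MHz). This reflects the text's point that LSE is the more conservative of the two.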
When you use Synplify, the default setting is off, which is recommended.
You should let MAP infer GSR for the best result.
When you use the Lattice Synthesis Engine (LSE), the default setting is
auto, which allows LSE to decide. This is also the recommended setting.
Use synthesis attributes and directives in the RTL code to control each
individual port, or apply globally to all top-level I/Os. This works for both
Synplify Pro and LSE.
For example, in Verilog:
output [15:0] q; // synthesis syn_useioff = 1
or
module test (a, b, clk, rst, d) /* synthesis syn_useioff = 1 */;
in VHDL:
attribute syn_useioff : boolean;
attribute syn_useioff of data_in : signal is true; -- data_in is an I/O port
If you use Synplify Pro, you can use synthesis constraints in the active
Synplify Design Constraints file. You can control each individual port, or
apply globally to all I/Os:
define_attribute {z[3:0]} syn_useioff {1}
define_global_attribute syn_useioff {1}
If you use LSE, you can set the synthesis strategy option Use IO
Registers to true. This applies the option globally to all I/Os.
After turning on the use of the I/O register, ensure that the timing can still meet
setup time and the Fmax requirements. Using the I/O register helps I/O
timing, but it could potentially affect internal Fmax and cause an I/O hold time
issue. There are some good cases where register duplication is used to help
both I/O and Fmax; for example, the case of a counter with output going off
chip, as illustrated in Figure 27.
Figure 27: Counter with Output Going Off Chip
Note that not all FPGA devices support I/O registers. Refer to the hardware
datasheet of your target device.
To use this feature, you have the following options. Note that these
elements compensate for input hold time requirements by adding a specific
amount of delay to the data input path. At the same time, they increase
the setup time requirement by the same amount of delay.
in VHDL:
attribute syn_useioff : boolean;
attribute syn_useioff of data_in : signal is true; -- data_in is an I/O port
attribute FIXEDDELAY : boolean;
attribute FIXEDDELAY of data_in : signal is true;
The delay value added depends on the device used. For example, for
ECP3 speed grade -9 or -8, this value is 1.3ns. For other devices, refer to
their datasheets.
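The trade-off described above (a fixed input delay lowers the hold requirement and raises the setup requirement by the same amount) can be sketched in a few lines. The 1.3 ns value is the ECP3 -9/-8 figure quoted in the text. The starting setup and hold numbers reuse the data1 row of the earlier I/O Timing Report purely as illustrative inputs, and the helper name is hypothetical.

```python
# Sketch: effect of a fixed data-path delay on a pin's setup/hold requirements.
# Assumption: the delay shifts both requirements by exactly delay_ns.

FIXEDDELAY_ECP3_NS = 1.3  # ECP3 speed grade -9 or -8, per the text


def with_input_delay(setup_ns, hold_ns, delay_ns):
    """Return (setup, hold) requirements after inserting delay_ns."""
    return setup_ns + delay_ns, hold_ns - delay_ns


if __name__ == "__main__":
    setup, hold = with_input_delay(0.469, 1.180, FIXEDDELAY_ECP3_NS)
    print(f"setup {setup:.3f} ns, hold {hold:.3f} ns")
```

A negative resulting hold requirement means the external device no longer needs to hold the data after the clock edge. The cost is a setup requirement that is 1.3 ns tighter, which is why the text says to recheck setup timing and Fmax.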
If you use ECP3 devices, you can instantiate a DELAYC element in the
HDL to add a fixed delay. The amount of delay added is the same as
using FIXEDDELAY. For example:
input b0;
wire bx, b_temp;
DELAYC myDelay1(.Z(b_temp), .A(b0));
IFS1P3IX b0_reg(.Q(bx), .SP(1'b1), .CD(rst), .SCLK(clk),
.D(b_temp));
The amount of delay added is defined by the value of DEL[3:0]. This
allows you to choose a delay from one of the 16 values. For ECP3
devices, the value increment is 35ps.
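The 16 DEL[3:0] settings and the 35 ps increment can be sketched as a lookup that picks the setting closest to a target delay. This sketch assumes that setting k adds k * 35 ps (so 0 adds no extra delay). That mapping is our assumption, so confirm it against the device datasheet. Both helper names are hypothetical.

```python
# Sketch, assuming DEL[3:0] = k adds k * 35 ps on ECP3 (k = 0 adds nothing).
ECP3_DEL_STEP_PS = 35


def del_code_to_ps(code):
    """Delay in ps for a 4-bit DEL[3:0] setting (hypothetical helper)."""
    if not 0 <= code <= 15:
        raise ValueError("DEL[3:0] must be in the range 0..15")
    return code * ECP3_DEL_STEP_PS


def closest_del_code(target_ps):
    """Choose the DEL[3:0] setting whose delay is nearest to target_ps."""
    return min(range(16), key=lambda k: abs(del_code_to_ps(k) - target_ps))


if __name__ == "__main__":
    print(del_code_to_ps(15))       # 525
    print(closest_del_code(100))    # 3 (105 ps)
```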
If you use XO2 devices, you can instantiate either a DELAYE element
(all sides) in the HDL to add a user-specified amount of delay, or a
DELAYD element (bottom side) in the HDL to add a dynamic delay. For
example:
component DELAYE
generic(DEL_MODE: in String;
DEL_VALUE: in String);
port (A: in std_logic;
Z : out std_logic);
end component;
......
inst1: DELAYE
generic map ( DEL_MODE=> "SCLK_ZEROHOLD",
DEL_VALUE=> "DELAY31")
port map (A => IN1,
Z => insig);
Note that this attribute in the HDL code will override the global maximum
fanout control. To use the attribute in your code, in Verilog:
input [31:0] data_in /* synthesis syn_maxfan = 1000 */;
in VHDL:
attribute syn_maxfan : integer;
attribute syn_maxfan of data_in : signal is 1000;
in VHDL:
signal q_int : std_logic_vector( 3 downto 0);
Attribute syn_useenables : boolean;
attribute syn_useenables of q_int : signal is false;
process( clk)
begin
if (clk'event and clk = '1') then
if (enable = '1') then
q_int <= d;
end if;
end if;
end process;
syn_tsu<n>: timing setup delay required for input pins relative to the
clock
Resource Sharing
Resource sharing usually increases the number of logic levels, thus
introducing additional delays to a path. Synthesis tools usually do a good job
of resource sharing if the path is not critical, but this is not always the case.
You should examine the critical paths to make sure that resource sharing
does not cause any timing issues.
Check the logic depth in the report and determine if HDL design
changes are required. A typical design change example is pipelining,
or registering, the data path. This technique might be the only way to
achieve high internal frequencies if the design's logic levels are too
deep.
2. Perform placement and routing early in the design phase, using a
preliminary preference file, to gather information about the design.
3. Tune up your preference file to include all I/O and internal timing paths, as
appropriate. Establish the pinout in the preference file. Check the
preference coverage through the TRACE report and ensure that your
design is fully covered by the timing requirement.
4. Push PAR, when necessary, by running multiple routing iterations and
multiple placement iterations.
5. Revise the preference file as appropriate; use MULTICYCLE opportunities
when possible.
6. Floorplan your design if necessary.
No clock preference
This option gives a list of all of the signals that are not covered under timing
analysis. In some designs, many of these signals are common ground nets
that do not need to be constrained. You should understand this and
It is always good practice to constrain your design with the actual timing
requirements. But sometimes you might want to experiment with over-constraining
your design to determine the best constraint settings for achieving the desired
results. In this case, instead of purposely over-constraining your design, use
the PAR_ADJ option when you define your clock period or frequency. The PAR_ADJ
keyword allows you to tighten the requirements for PAR while preserving the
requirements reported by TRACE.
Note
The HDL code and timing constraint examples shown throughout the case
studies are solely for explaining timing constraint concepts and are not to
be considered recommended HDL coding practice.
In this example, there are two external, unrelated clocks: clk1 and clk2.
This can be examined in the Clock Domains Analysis section of the MAP
TRACE or PAR TRACE report:
The text in red illustrates the calculated FREQUENCY preferences for both
clocks. These preferences are used to drive the MAP and PAR process, and
they are also used for the MAP TRACE and PAR TRACE static timing
analysis. The Report Type highlighted in blue clearly states that the
preferences are generated automatically by TRACE.
The warning messages in blue report the actual maximum speed of your design
for each clock domain, based on the calculated FREQUENCY preferences.
Also note in this particular example that, from the HDL code, we know
that clk1 and clk2 are unrelated. This is further confirmed in the Clock
Domains Analysis section. TRACE will not analyze cross-domain paths driven
by unrelated clocks, because it cannot determine the relationship between
them. This might leave your design under-constrained. To relate two clocks,
use the CLKSKEWDIFF preference. See "Case Study 6 - CLKSKEWDIFF" on page 60.
From the TRACE report, it can also be seen that since there are no user-defined
timing constraints in the LPF or the HDL, the two default BLOCK preferences
are used, whether or not they are present in the default LPF file.
What is Learned from Case Study 1
From this case study, the following points are learned:
On the other hand, you can use this case as an experiment to
estimate how fast or slow your design can run:
If there are no user-defined timing preferences in the LPF or the HDL, the
two default BLOCK preferences will be used, whether or not they are
present in the active LPF file.
From this report, we can see that clock clk1 is now constrained but clock
clk2 is not. This reveals an important engine behavior: if your
design has more than one clock, and only some of them are
constrained, the engine will not automatically calculate and generate
FREQUENCY preferences for the clocks that you did not constrain.
This fact can also be observed in the Preference Summary section of the
MAP TRACE and PAR TRACE reports:
Both the MAP TRACE report and the PAR TRACE report clearly show that
only one clock FREQUENCY preference is defined. Note that the MAP
TRACE report does not have the Report Type that was shown in "Case Study 1 -
No user-defined timing constraint" on page 47. This user-defined preference
is the only FREQUENCY requirement driving the engine, and there is no
automatically generated FREQUENCY preference for clock clk2, so clk2
is now unconstrained.
Upon further examination of either the MAP TRACE report or the PAR TRACE
report, the low preference coverage (17.6%) also points to the
problem:
Timing summary (Setup):
---------------
Timing errors: 0  Score: 0
Cumulative negative slack: 0
Constraints cover 1 paths, 1 nets, and 3 connections (17.6% coverage)
------------------------------------------------------------
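The coverage percentage appears to be the number of covered connections divided by the total number of connections in the design. The figures in these case studies (3, 6, and 12 covered connections, reported as 17.6%, 35.3%, and 70.6%) are all consistent with a total of 17 connections. That total is our inference and is not printed in the excerpts. This minimal sketch, with a hypothetical function name, shows the assumed formula.

```python
# Sketch assuming coverage = covered connections / total connections.
# TOTAL_CONNECTIONS = 17 is inferred from the reported percentages,
# not read from the TRACE report itself.

TOTAL_CONNECTIONS = 17


def coverage_percent(covered, total=TOTAL_CONNECTIONS):
    """Preference coverage, rounded to one decimal place as in TRACE."""
    return round(100.0 * covered / total, 1)


if __name__ == "__main__":
    for covered in (3, 6, 12):
        print(covered, coverage_percent(covered))
```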
There is one more fact you should be aware of: if there is a FREQUENCY
preference defined in your LPF, then including or excluding the two default
BLOCK preferences makes a difference. Suppose that we now have the following
LPF preferences:
#BLOCK RESETPATHS ;
#BLOCK ASYNCPATHS ;
FREQUENCY PORT "clk1" 300.000000 MHz ;
Note that the two BLOCK preferences are commented out and will not be in
effect. Now look at the Preference Summary section in the PAR TRACE
report:
Preference Summary
FREQUENCY PORT "clk1" 300.000000 MHz (0 errors)
4 items scored, 0 timing errors detected.
Report: 457.666MHz is the maximum frequency for this
preference.
------------------------------------------------------------
It is clear that the two BLOCK preferences are not shown in the summary.
Interestingly, you might also notice that the maximum frequency
(457.666MHz) is different now from the previous one (765.697MHz) where
the two BLOCK preferences were used. Because BLOCK ASYNCPATHS is
not present in the latter case, TRACE will analyze input-to-register paths that
are covered by a FREQUENCY or a PERIOD preference but not covered by
an INPUT_SETUP preference, and the input-to-register paths timing
requirements will be calculated automatically and used to drive the engine.
The calculated value usually equals one clock cycle as defined by the
FREQUENCY preference. In most cases, this value under-constrains the
engine. We will look into this in "Case Study 4 - INPUT_SETUP"
on page 56.
What is Learned from Case Study 2
From this case study, the following points are learned:
You should have all clocks in your design appropriately constrained, either
through your HDL or through FREQUENCY preferences defined in the
LPF. Otherwise, your design is under-constrained, and you might miss
many timing problems in your design.
The TRACE reports are helpful for finding any unconstrained clocks:
The Clock Domains Analysis section should list the total number of
clocks identified in your design.
The Preference Summary lists all clocks that have been constrained.
Since PAR TRACE is generated after PAR, which usually takes more
runtime, you should carefully examine the MAP TRACE report and correct
as many issues as possible before running PAR.
Now let us examine the reports from MAP TRACE, PAR, and PAR TRACE.
Clock Domains Analysis now reports that there are two clocks and that both of
them were constrained. This is also confirmed in the Preference Summary of
the MAP TRACE report and the PAR TRACE report. Furthermore, the
preference coverage has doubled compared with that of
"Case Study 2 - Insufficient FREQUENCY preference" on page 51, as shown
below:
Timing summary (Setup):
---------------
Timing errors: 0  Score: 0
Cumulative negative slack: 0
Constraints cover 2 paths, 2 nets, and 6 connections (35.3% coverage)
------------------------------------------------------------
However, the 35.3% coverage is still poor, and apparently the design is still
under-constrained and the constraints need to be improved. We will cover
that in later case studies.
Similar to "Case Study 2 - Insufficient FREQUENCY preference" on page 51,
if we remove the default BLOCK preferences from the active LPF, we have the
following:
#BLOCK RESETPATHS ;
#BLOCK ASYNCPATHS ;
FREQUENCY PORT "clk1" 300.000000 MHz ;
FREQUENCY PORT "clk2" 350.000000 MHz ;
The input-to-register paths covered by both clk1 and clk2 will be analyzed
by TRACE using the automatically calculated timing requirement, meaning
one clock cycle. In this example, the input setup timing requirement for all
input-to-register paths in clock domain clk1 is 3.333ns, based on the clk1
300MHz FREQUENCY preference. The input setup timing requirement for all
input-to-register paths in clock domain clk2 is 2.857ns, based on the clk2
350MHz FREQUENCY preference. This increases the preference
coverage from 35.3% to 70.6%, as reported in the Timing Summary section:
Timing summary (Setup):
---------------
Timing errors: 0  Score: 0
Cumulative negative slack: 0
Constraints cover 8 paths, 2 nets, and 12 connections (70.6% coverage)
------------------------------------------------------------
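The automatically calculated input setup requirement described above is one clock period, that is, 1000 / f ns for a frequency f in MHz. This short sketch, with a hypothetical helper name, reproduces the 3.333 ns and 2.857 ns values for clk1 and clk2. It also shows the 5 ns period that corresponds to the 200MHz default used for timing-driven synthesis.

```python
# Sketch: with BLOCK ASYNCPATHS absent, TRACE uses one clock period as the
# default input setup requirement for input-to-register paths.


def period_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    for clk, mhz in (("clk1", 300.0), ("clk2", 350.0)):
        print(f"{clk}: {period_ns(mhz):.3f} ns")   # 3.333 ns, 2.857 ns
```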
Multi-cycle paths
False paths
Cross-domain paths that are between related clocks will be covered, though.
Related clocks are those clocks whose relationships the engine is able to
Not all internally generated clocks can be explicitly related. One example of
this is a gated clock.
If your design includes cross-domain paths that are between unrelated clocks,
you should establish the relationship between the clocks; otherwise, your
design will be under-constrained. This is covered in "Case Study 6 -
CLKSKEWDIFF" on page 60.
What is Learned from Case Study 3
From this case study, the following points are learned:
================================================================================
Preference: INPUT_SETUP PORT "data2" 1.500000 ns CLKPORT "clk2" CLK_OFFSET 1.500000 X
;
1 item scored, 0 timing errors detected.
--------------------------------------------------------------------------------
Passed: The following path meets requirements by 5.197ns
Logical Details:  Cell type  Pin type  Cell/ASIC name  (clock net +/-)
   Source:        Port       Pad       data2
   Destination:   FF         Data in   reg21_0io       (to clk2_c +)
Max Data Path Delay:  0.508ns (100.0% logic, 0.0% route), 1 logic levels.
Min Clock Path Delay: 1.213ns (37.7% logic, 62.3% route), 1 logic levels.
IOL_L27A attributes: FINE=FDEL0
Constraint Details:
0.508ns delay data2 to data2_MGIOL less
5.785ns offset data2 to clk2 (totaling -5.277ns) meets
1.213ns delay clk2 to data2_MGIOL less
1.293ns DI_SET requirement (totaling -0.080ns) by 5.197ns
Physical Path Details:
......
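The Constraint Details of this INPUT_SETUP report follow a simple pattern. The data side (data path delay less the offset) is compared with the clock side (clock path delay less the DI_SET requirement), and the difference is the slack. This minimal sketch reproduces the 5.197 ns margin. The function and argument names are hypothetical, and the 5.785 ns offset is taken as reported rather than derived from the CLK_OFFSET setting.

```python
# Sketch of the INPUT_SETUP Constraint Details arithmetic (all values in ns).


def input_setup_slack(data_delay, offset, clock_delay, setup_req):
    """Slack following the layout of the TRACE Constraint Details."""
    data_total = data_delay - offset          # "totaling -5.277ns"
    clock_total = clock_delay - setup_req     # "totaling -0.080ns"
    return clock_total - data_total


if __name__ == "__main__":
    print(round(input_setup_slack(0.508, 5.785, 1.213, 1.293), 3))  # 5.197
```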
Now we have 58.8% coverage, compared with 47.1% in the previous case
study.
When defining the CLOCK_TO_OUT preference, you can use a clock output,
if your design has one, as the reference clock. For example:
output paths will not be covered, and this can cause your design to be
under-constrained.
For detailed information on the CLOCK_TO_OUT preference and available
options, refer to the Constraints Reference Guide in the Diamond online Help.
Case Study 6 - CLKSKEWDIFF
Sufficient FREQUENCY, INPUT_SETUP and CLOCK_TO_OUT preferences
should cover most of the paths in a simple design, especially those designs
that only have one clock domain or that have multiple clock domains with no
cross-domain paths.
More often, your design will have many paths crossing multiple clock
domains, where the clock domains are of one or both of the following two
types:
Clock domains that are related: for example, when you use a clock
divider, PLLs, or certain types of derived clocks, the engine is able to
determine the relationship between clock domains.
Clock domains that are unrelated: for example, when your design has
multiple top-level clock inputs.
The example used in the previous case studies shows a typical design that
includes unrelated clock domains with paths crossing them. Since the engine
is unable to determine the relationship between these clock domains, the paths
across them will not be analyzed; therefore, your design is under-constrained.
This can be observed in the Clock Domains Analysis section
in the TRACE reports:
This preference informs the engine that clk1 arrives at the clock input later
than clk2 by 0.5ns.
Now that the relationship between clk1 and clk2 is established, the engine
will cover the paths crossing these two clock domains, as shown in the Clock
Domains Analysis section of the TRACE reports:
Transfers: 1
Transfers: 1
CLKSKEWDISABLE
When calculating the slacks of paths, including paths within the same
clock domain and cross-domain paths between related clocks, the engine
also takes clock skews into account, as shown in the following
TRACE report:
================================================================================
Preference: FREQUENCY PORT "clk1" 500.000000 MHz ;
1 item scored, 0 timing errors detected.
--------------------------------------------------------------------------------
Passed: The following path meets requirements by 0.701ns
Logical Details:  Cell type  Pin type  Cell/ASIC name  (clock net +/-)
   Source:        FF         Q         reg11_0io  (from clk1_c +)
   Destination:   FF         Data in   reg12      (to clk1_c +)
Delay: 1.047ns (19.2% logic, 80.8% route), 1 logic levels.
Constraint Details:
   1.047ns physical path delay data1_MGIOL to SLICE_0 meets
   2.000ns delay constraint less
   0.099ns skew and
   0.153ns M_SET requirement (totaling 1.748ns) by 0.701ns
Physical Path Details:
   Data path data1_MGIOL to SLICE_0:
   Name       Fanout  Delay (ns)  Site                          Resource
   C2OUT_DEL  --      0.201       IOL_L26A.CLK to IOL_L26A.INB  data1_MGIOL (from clk1_c)
   ROUTE      1       0.846       IOL_L26A.INB to R27C2C.M0     reg11 (to clk1_c)
                      --------
                      1.047       (19.2% logic, 80.8% route), 1 logic levels.
Clock Skew Details:
   Source Clock Path clk1 to data1_MGIOL:
   Name       Fanout  Delay (ns)  Site                          Resource
   ROUTE      2       1.183       K3.PADDI to IOL_L26A.CLK      clk1_c
                      --------
                      1.183       (0.0% logic, 100.0% route), 0 logic levels.
   Destination Clock Path clk1 to SLICE_0:
   Name       Fanout  Delay (ns)  Site                          Resource
   ROUTE      2       1.084       K3.PADDI to R27C2C.CLK        clk1_c
                      --------
                      1.084       (0.0% logic, 100.0% route), 0 logic levels.
Report:
If two clocks are related, this preference excludes clock skews from the slack
calculation when scoring cross-domain paths from the clk1_c domain to the
clk2_c domain.
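The Constraint Details in the register-to-register report above reduce to simple arithmetic. The requirement, less the clock skew and the M_SET setup requirement, gives the "totaling" budget, and the data path delay is then subtracted from it. The skew is the source clock path delay minus the destination clock path delay (1.183 ns - 1.084 ns = 0.099 ns). This minimal sketch, with hypothetical function names, reproduces the 0.701 ns slack. The same arithmetic also matches the failing cross-domain reports in Case Studies 7 and 8.

```python
# Sketch of the TRACE register-to-register slack arithmetic (values in ns).


def clock_skew(src_clock_ns, dst_clock_ns):
    """Skew term: source clock path delay minus destination clock path delay."""
    return src_clock_ns - dst_clock_ns


def reg_to_reg_slack(path_delay_ns, constraint_ns, skew_ns, setup_ns):
    """Positive slack meets the requirement; negative slack exceeds it."""
    budget = constraint_ns - skew_ns - setup_ns   # the "totaling" value
    return budget - path_delay_ns


if __name__ == "__main__":
    skew = clock_skew(1.183, 1.084)                                  # 0.099
    print(round(reg_to_reg_slack(1.047, 2.000, skew, 0.153), 3))     # 0.701
    print(round(reg_to_reg_slack(0.956, 1.000, 0.500, 0.153), 3))    # -0.609
    print(round(reg_to_reg_slack(1.116, 0.066, 0.497, 0.153), 3))    # -1.7
```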
Where two clocks are unrelated, this preference also establishes the
relationship from the source domain clk1 to the destination domain clk2.
This is similar to the CLKSKEWDIFF preference, which also establishes the
relationship between two clock domains. The difference is that
CLKSKEWDIFF establishes the relationship in both domain-to-domain
directions, while CLKSKEWDISABLE only establishes the relationship in one
direction. So to build the cross-domain relationship from the clk2 domain to
the clk1 domain, another CLKSKEWDISABLE preference can be used:
CLKSKEWDISABLE CLKNET "clk2_c" CLKNET "clk1_c";
Note
Clock skews for data paths that have the same source and destination clock
nets cannot be disabled by using this preference.
What is Learned from Case Study 6
From this case study, the following points are learned:
If your design has multiple clock domains, you should carefully examine
your design and the TRACE reports to see if there are cross-domain paths
and to ensure that cross-domain paths are covered by the engine.
When your design has multiple top-level clocks, they are usually
unrelated. The relationship between unrelated clocks must be established
by using the CLKSKEWDIFF preference. Otherwise, cross-domain paths
will not be analyzed and your design might be under-constrained.
Clock skews are usually taken into account by the engine if, and
only if, the clocks are related.
If the launching register and the receiving register of a path use the same
clock, then the default timing constraint is one clock cycle. See "Multi-Cycle
Within the Same Clock Domain".
If the launching register and the receiving register of a path use two
different clocks and these two clocks are related, then the default timing
constraint is the worst-case edge pair, which depends on the periods of the
two clocks. See "Multi-Cycle Across Clock Domains".
Figure 32: Single-cycle vs. Multi-cycle Relationship Within the Same Clock Domain
The calculation formula of the timing requirement for the receiving register is
as follows:
<default delay calculated> + (n - 1) * <multiplier factor applied clock period>
Here n is the multiplier factor. The default delay, or the default timing
requirement, is the default register-to-register timing requirement. Since both
of the registers are clocked by the same clock, the default delay calculated
will be one clock cycle.
In this example, since both FF_S and FF_D are clocked by CLK, different
options will make no difference. The timing requirement from FF_S to
FF_D is calculated by the following:
5ns + (2 - 1) * 5ns = 10ns
This is two clock cycles. So the timing requirement for the path from FF_S to
FF_D is two clock cycles (which is 10ns based on the 200MHz FREQUENCY
preference) instead of one clock cycle (which is 5ns) that would otherwise be
used as the default timing constraint by TRACE and PAR.
Multi-Cycle Across Clock Domains
The example used in the previous case studies has two clock domains and
two cross-domain paths, as seen in Figure 33.
Cross-domain paths include those from the clk1 domain to the clk2
domain, i.e., reg12 to reg23, and those from the clk2 domain to the clk1
domain, i.e., reg22 to reg13.
In this example, the clocks clk1 and clk2 are unrelated. By default, the
engine does not cover paths that are transferred between unrelated clock
domains. In "Case Study 6 - CLKSKEWDIFF" on page 60, we use the
CLKSKEWDIFF preference and establish the relationship between two clock
domains; therefore, the paths between them will be analyzed.
Assume that the clock period of the launching register FF_S is PL, and that
the clock period of the receiving register FF_D is PR. When analyzing paths
crossing these two clock domains, the engine uses the following approach to
calculate and apply the default timing requirement:
1. Align the first active edges of both clocks at time tp0 = 0, so that at
time tp0 the time difference between the two active clock edges is td0 = 0ns.
The active edges of the two clocks are then aligned again at every time
tpN = N * LCM(PL, PR), where N is any integer and LCM is the least common
multiple. For example, if PL is equal to 2 and PR is equal to 3, then
LCM(PL, PR) is 6.
2. Between the time tp0 = 0 and tp1 = 1 * LCM(PL, PR) = LCM(PL, PR),
find two integers m >= 0 and n >= 1 that meet the following criteria:
a. tp0 <= m * PL < n * PR <= tp1, that is, 0 <= m * PL < n * PR <= LCM(PL, PR)
b. the value of t = (n * PR) - (m * PL) is the smallest possible; this
smallest value is tmin.
3. The value of tmin is the default timing requirement from the launching
register FF_S to the receiving register FF_D.
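The edge-pair search in steps 1 through 3 can be sketched directly. For every capture edge n * PR in (0, LCM(PL, PR)], the latest launch edge strictly before it is m * PL, and tmin is the smallest gap found. Exact fractions are used so that periods such as 1000/303 ns do not suffer floating-point error when computing the LCM. Periods are assumed to be exactly 1000 / f ns for integer frequencies in MHz. All function names are hypothetical. The sketch reproduces the 1ns requirement of the 2ns/3ns example and the 3.333ns and 0.066ns requirements discussed in Case Study 8.

```python
import math
from fractions import Fraction


def period_ns(mhz):
    """Exact clock period in ns for an integer frequency in MHz."""
    return Fraction(1000, mhz)


def lcm_fraction(a, b):
    """LCM of two positive Fractions p1/q1 and p2/q2: lcm(p1, p2) / gcd(q1, q2)."""
    p = a.numerator * b.numerator // math.gcd(a.numerator, b.numerator)
    return Fraction(p, math.gcd(a.denominator, b.denominator))


def default_requirement(pl, pr):
    """Smallest t = n*PR - m*PL with 0 <= m*PL < n*PR <= LCM(PL, PR)."""
    span = lcm_fraction(pl, pr)
    t_min = None
    n = 1
    while n * pr <= span:                 # every capture edge in (0, LCM]
        capture = n * pr
        m = math.ceil(capture / pl) - 1   # latest launch edge strictly before it
        t = capture - m * pl
        if t_min is None or t < t_min:
            t_min = t
        n += 1
    return t_min


if __name__ == "__main__":
    print(default_requirement(Fraction(2), Fraction(3)))  # 1 (clk1 -> clk2)
    print(default_requirement(Fraction(3), Fraction(2)))  # 1 (clk2 -> clk1)
    # 300 MHz -> 150 MHz versus 303 MHz -> 150 MHz
    print(round(float(default_requirement(period_ns(300), period_ns(150))), 3))  # 3.333
    print(round(float(default_requirement(period_ns(303), period_ns(150))), 3))  # 0.066
```

The last two lines show why a small change in one clock frequency can collapse the default cross-domain requirement. Once the periods stop dividing evenly, some launch edge lands just before a capture edge.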
For the cross-domain paths that are from clk1 domain to clk2 domain, i.e.,
reg12 to reg23:
PL = 2ns, PR = 3ns
This can also be found in the TRACE report, as 1ns was reported as the delay
constraint.
Error: The following path exceeds requirements by 0.609ns (weighted slack = 1.827ns)
Logical Details:  Cell type  Pin type  Cell/ASIC name  (clock net +/-)
   Source:        FF         Q         reg12  (from clk1_c +)
   Destination:   FF         Data in   reg23  (to clk2_c +)
Delay: 0.956ns (31.7% logic, 68.3% route), 1 logic levels.
Constraint Details:
0.956ns physical path delay SLICE_0 to SLICE_1 exceeds
(delay constraint based on source clock period of 2.000ns and destination
clock period of 3.003ns)
1.000ns delay constraint less
0.500ns skew and
0.153ns M_SET requirement (totaling 0.347ns) by 0.609ns
Physical Path Details:
......
Similarly, for the cross-domain paths that are from the clk2 domain to the
clk1 domain, i.e., reg22 to reg13:
PL = 3ns, PR = 2ns
By default, 1ns is also the timing requirement for these cross-domain paths,
as illustrated in Figure 35.
Figure 35: Timing Requirement for Cross-Domain Paths
For cross-domain paths, using the default calculated minimum delay between
two active clock edges as the timing requirement might not reflect the actual
design behavior and, in most cases, over-constrains your design and the
engine. This usually has two side effects:
The timing-driven PAR engine will spend a lot of runtime trying to meet the
unrealistic requirements, while the true critical paths might receive too little attention.
Then the engine will use the following formula to calculate the timing
requirement:
<default delay calculated> + (n - 1) * <multiplier factor applied clock period>
In this example, for the path from clk1 domain to clk2 domain, the timing
requirement is as follows:
1ns + (2 - 1) * 3ns = 4ns
Similarly, for the path from clk2 domain to clk1 domain, the timing
requirement is as follows:
1ns + (2 - 1) * 2ns = 3ns
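The MULTICYCLE formula can be written as a one-line function (the name is hypothetical). The applied period is that of whichever clock the multiplier factor applies to. In the examples above, this is the 5ns period in the same-domain case and the receiving clock's period (3ns or 2ns) in the cross-domain cases. The last line shows that n = 1 leaves the default requirement unchanged.

```python
# <default delay calculated> + (n - 1) * <multiplier factor applied clock period>


def multicycle_requirement(default_ns, n, applied_period_ns):
    """Timing requirement in ns for a MULTICYCLE multiplier factor n."""
    return default_ns + (n - 1) * applied_period_ns


if __name__ == "__main__":
    print(multicycle_requirement(5, 2, 5))  # 10 (FF_S -> FF_D, same 200MHz clock)
    print(multicycle_requirement(1, 2, 3))  # 4  (clk1 domain -> clk2 domain)
    print(multicycle_requirement(1, 2, 2))  # 3  (clk2 domain -> clk1 domain)
    print(multicycle_requirement(1, 1, 3))  # 1  (n = 1: no relaxation)
```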
In addition to using the multiplier factor, you can use an absolute delay value,
in nanoseconds, when defining MULTICYCLE preferences. For example:
MULTICYCLE FROM CLKNET "clk1_c" TO CLKNET "clk2_c" 3.000000 ns ;
MULTICYCLE FROM CLKNET "clk2_c" TO CLKNET "clk1_c" 2.000000 ns ;
Defining MULTICYCLE using clock names applies the constraint to all paths
covered by those clocks. To specify MULTICYCLE for a specific path, you can
use the format of the following example:
MULTICYCLE FROM CELL "reg12" TO CELL "reg23" 3.000000 ns ;
MULTICYCLE FROM CELL "reg22" TO CELL "reg13" 2.000000 ns ;
For detailed information about MULTICYCLE, its syntax and usage, refer to
the Constraints Reference Guide in the Diamond online Help.
In order to relax the engine, the multiplier factor n must be greater than
1. If it is equal to 1, which is the default, the engine will behave as if the
MULTICYCLE preference has not been defined. If it is less than 1, the
engine will be even more over-constrained, which is not what you want.
Diamond does not issue a warning when the multiplier factor is less than
or equal to 1.
Case Study 8 - Clock Over-Constrained
Over-constraining clocks in your design might work for some designs, but you
should not use it as a cure-all practice, because it can introduce side effects.
For those designs that only have a single clock domain, or that have multiple
unrelated clock domains, the side effects introduced are not very obvious. But
for designs that have multiple clock domains and where cross-domain paths
do exist, you should be especially careful. In this case, if cross-domain paths
are not appropriately constrained, you can actually drive the engine
incorrectly. The engine will then spend a huge amount of time trying to meet
the unrealistic timing requirements and eventually fail with a large number of
timing errors.
For example, as explained in "Case Study 7 - Timing Exception 1 -
MULTICYCLE" on page 63, for the following FREQUENCY preferences:
FREQUENCY PORT "clk1" 300.000000 MHz ;
FREQUENCY PORT "clk2" 150.000000 MHz ;
The default delay requirement will be 3.333ns for all cross-domain paths from
the clk1 domain to the clk2 domain.
Now, instead of defining 300MHz for clk1, we over-constrain it by 3MHz (1%)
with the following preferences (clk1 could similarly be over-constrained by
a few percent):
FREQUENCY NET "clk1" 303.000000 MHz;
FREQUENCY NET "clk2" 150.000000 MHz;
The default delay requirement from the clk1 domain to the clk2 domain
becomes 0.066ns, which is the worst-case (smallest) edge-to-edge separation
between the two clocks. This result can be observed in the TRACE report:
================================================================================
Preference: FREQUENCY PORT "clk2" 150.000000 MHz ;
            2 items scored, 1 timing errors detected.
--------------------------------------------------------------------------------
Error:  The following path exceeds requirements by 1.700ns (weighted slack = 171.700ns)
 Logical Details:  Cell type  Pin type    Cell/ASIC name  (clock net +/-)
   Source:         FF         Q           reg12  (from clk1_c +)
   Destination:    FF         Data in     reg23  (to clk2_c +)
   Delay:          1.116ns  (27.2% logic, 72.8% route), 1 logic levels.
 Constraint Details:
      1.116ns physical path delay SLICE_0 to SLICE_1 exceeds
      (delay constraint based on source clock period of 3.300ns and destination clock period of 6.666ns)
      0.066ns delay constraint less
      0.497ns skew and
      0.153ns M_SET requirement (totaling -0.584ns) by 1.700ns
......
This preference instructs the PAR engine to use 303MHz as the clk1
FREQUENCY requirement. At the same time, TRACE still uses 300MHz for
static timing analysis.
Because the TRACE reports still use the defined FREQUENCY or PERIOD for
static timing analysis, you might not notice that anything is wrong. But
timing-driven PAR might be working from completely different numbers and
spend a huge amount of time trying to meet timing.
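The jump from 3.333ns to 0.066ns follows from the clock-edge arithmetic alone. The following Python sketch finds the smallest gap between a clk1 launch edge and the next clk2 capture edge. It assumes ideal clocks specified in whole MHz with rising edges aligned at time 0 (no phase shift, skew, or jitter); the function name is hypothetical, and this is not presented as the way Diamond computes the requirement internally.

```python
from fractions import Fraction
from math import gcd


def edge_to_edge_requirement_ns(f_launch_mhz: int, f_capture_mhz: int) -> Fraction:
    """Smallest positive gap from a launch edge to the next capture edge, in ns.

    Assumes ideal clocks in whole MHz with rising edges aligned at t = 0.
    Exact fractions avoid floating-point rounding in the edge comparisons.
    """
    t_launch = Fraction(1000, f_launch_mhz)    # launch clock period, ns
    t_capture = Fraction(1000, f_capture_mhz)  # capture clock period, ns
    # The combined edge pattern repeats every 1000 / gcd(f1, f2) ns,
    # which contains f1 / gcd(f1, f2) launch edges.
    n_launch_edges = f_launch_mhz // gcd(f_launch_mhz, f_capture_mhz)
    worst = t_capture
    for k in range(n_launch_edges):
        launch = k * t_launch
        # First capture edge strictly after this launch edge
        next_capture = (launch // t_capture + 1) * t_capture
        worst = min(worst, next_capture - launch)
    return worst


for f1 in (300, 303):
    req = edge_to_edge_requirement_ns(f1, 150)
    print(f"clk1 = {f1} MHz: {float(req):.3f} ns")
# clk1 = 300 MHz: 3.333 ns
# clk1 = 303 MHz: 0.066 ns
```

Because the closest edge pair is 1000 * gcd(f1, f2) / (f1 * f2) ns, moving clk1 from 300MHz to 303MHz drops the common divisor with 150MHz from 150 to 3. A 1% over-constraint therefore tightens the cross-domain requirement by roughly a factor of 50, which matches the 0.066ns shown in the TRACE report above.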
If not well constrained, this condition can mask the violations of real timing
paths and make the performance results overly pessimistic.
False paths are treated as unconstrained by TRACE and timing-driven PAR. If
you can accurately describe false paths, design performance will usually
improve: because PAR treats a false path as unconstrained, its timing
objectives are relaxed and it optimizes the true critical paths instead.
Similarly, TRACE ignores unconstrained paths and reports the true critical
paths instead.
You should use the BLOCK preference on those identified false paths. Refer
to the BLOCK PATH section for details.
Case Study 10: Use PLL FREQUENCY Settings
Most designs use PLLs to generate clocks driving the FPGA circuit. The
following example shows a design that uses a PLL. This design is similar to
the one used in all of the previous case studies, but instead of using two
unrelated top-level clocks, it uses a PLL.
module example(clk1, data1, data2, rst, cout, pll_lock);
input clk1, data1, data2, rst;
output cout, pll_lock;
reg reg11, reg12, reg13;
The PLL has the following frequency settings: 100MHz clk1 input, 300MHz
clkop output, and 150MHz clkok output.
If there is no FREQUENCY preference defined in your LPF file, the
FREQUENCY values from the PLL will be used to drive the engine. This can
be observed in the TRACE report:
Preference Summary
FREQUENCY NET "clk1_c" 100.000000 MHz (0 errors)
            0 items scored, 0 timing errors detected.
FREQUENCY NET "clkop" 300.000000 MHz (0 errors)
            5 items scored, 0 timing errors detected.
Report: 375.094MHz is the maximum frequency for this preference.
FREQUENCY NET "i_my_pll/CLKOS" 300.000000 MHz (0 errors)
            0 items scored, 0 timing errors detected.
Here the clkok domain runs at 150MHz and the clkop domain runs at 300MHz,
as defined by the PLL frequency settings.
This example also has cross-domain paths. In addition, because both clocks
are outputs of the same PLL, they are by definition related. Because the
engine analyzes cross-domain paths between related clocks, MULTICYCLE
preferences are needed to relax the engine. See Case Study 7 (Timing
Exception 1: MULTICYCLE) on page 63.
One important fact to remember when using a PLL: after the PLL clocks are
routed, there will be skew between the different clock outputs. The skew is
calculated automatically by the engine, as shown in the PAR TRACE report:
The clock skew details section shows how the clock skew was calculated.
Here you can also see the phase adjustment (shown in blue) that was set
when you generated the PLL in IPExpress.
Overwrite PLL FREQUENCY Settings
There are cases where you might want to overwrite PLL FREQUENCY
settings. For example:
when you use a different FREQUENCY to drive the engine and static
timing analysis
when you apply other options such as PAR adjustment using PAR_ADJ,
specify hold margin using HOLD_MARGIN, or specify peak-to-peak jitter
value for the incoming clock using CLOCK_JITTER
The TRACE reports now show that the new FREQUENCY preferences
defined in the LPF are used:
Preference Summary
FREQUENCY NET "clk1_c" 100.000000 MHz (0 errors)
            0 items scored, 0 timing errors detected.
FREQUENCY NET "clkop" 330.000000 MHz (0 errors)
            4 items scored, 0 timing errors detected.
Report: 375.094MHz is the maximum frequency for this preference.
FREQUENCY NET "i_my_pll/CLKOS" 300.000000 MHz (0 errors)
            0 items scored, 0 timing errors detected.
FREQUENCY NET "clkok" 165.000000 MHz (0 errors)
            4 items scored, 0 timing errors detected.
Report: 375.094MHz is the maximum frequency for this preference.
WARNING - trce: The Preference FREQUENCY NET at signal clkop (CLKOP) or signal i_my_pll/CLKOS (CLKOS) do not match their divider settings for i_my_pll/PLLInst_0
The report includes a warning that the PLL settings have been overwritten.
Make sure that the overwriting is indeed intended.
One important fact when overwriting the PLL settings: the PLL has already
been configured and generated in IPExpress, and its function is fixed.
The overwriting values defined in the LPF file affect only the engine (which
uses them to drive the other parts of your design) and the TRACE report.
TRACE calculates slacks and other timing values, such as clock skews, based
on the preferences defined. To actually change the PLL definition, you must
reconfigure and regenerate the PLL in IPExpress.
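The WARNING in the report flags LPF FREQUENCY values that disagree with what the PLL divider settings produce. The following Python sketch performs a similar comparison. It assumes the ratios implied by this example (100MHz input, clkop = 3 x input, clkok = clkop / 2); the function name and its ratio parameters are hypothetical and are not IPExpress or TRACE parameters.

```python
def pll_frequency_mismatches(f_in_mhz, clkop_mult, clkok_div, lpf_mhz, tol_mhz=0.001):
    """Return the PLL outputs whose LPF FREQUENCY differs from the PLL settings.

    Hypothetical model: clkop = f_in * clkop_mult, clkok = clkop / clkok_div.
    lpf_mhz maps an output name to the FREQUENCY defined in the LPF file.
    """
    clkop = f_in_mhz * clkop_mult
    expected = {"clkop": clkop, "clkok": clkop / clkok_div}
    return sorted(
        name
        for name, f in lpf_mhz.items()
        if name in expected and abs(f - expected[name]) > tol_mhz
    )


print(pll_frequency_mismatches(100, 3, 2, {"clkop": 300, "clkok": 150}))  # []
print(pll_frequency_mismatches(100, 3, 2, {"clkop": 330, "clkok": 165}))  # ['clkok', 'clkop']
```

As the text explains, such a check only concerns the constraints: the PLL hardware keeps running at the frequencies it was generated with in IPExpress.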
Designs with PLLs usually have FREQUENCY defined through the PLLs.
If FREQUENCY preferences are not defined in your LPF file, the PLL
settings will take effect.
Overwriting PLL settings will not affect the PLL itself. To reconfigure the
PLL using a different FREQUENCY, use IPExpress.
Because paths crossing clock domains are no longer analyzed, the engine
becomes under-constrained, and many cross-domain timing issues go
uncovered. You can see this under-constraining in the constraint coverage
percentage, which drops dramatically.
Instead of blocking all the related cross-domain paths, you can also
selectively block certain paths using the BLOCK PATH preference. For
example, the following preference will block the paths from the clk1 domain
to the clk2 domain:
BLOCK PATH from CLKNET "clk1_c" to CLKNET "clk2_c";
You can also block a specific path using the BLOCK PATH preference, as
explained in the section BLOCK PATH.
If you block only some of the cross-domain paths, you might have a higher
constraint coverage percentage than if you use the BLOCK
INTERCLOCKDOMAIN PATHS preference, but potential timing issues can
still exist in your design.
What Is Learned from Case Study 11
From this case study, the following points are learned:
If your design has multiple clock domains, you need to pay additional
attention to the possibility that cross-domain paths exist between related
clocks.
Rule number 2: Preferences that are more specific take precedence over
less specific ones. This means that individual net or path preferences
supersede group (bus) preferences, and group preferences supersede
global preferences.
For example, if you have the following two INPUT_SETUP preferences in
your LPF,
INPUT_SETUP ALLPORTS INPUT_DELAY 3.000000 ns CLKNET "clk1_c";
INPUT_SETUP PORT "data1" 4.000000 ns CLKNET "clk1_c";
The input data1 will have a 4ns setup time requirement, because the
port-specific preference overrides the ALLPORTS preference. In the following
example,
CLOCK_TO_OUT "cout" 1.0 ns CLKPORT "clk" CLKOUT PORT "clkout";
CLOCK_TO_OUT "cout" 2.0 ns CLKPORT "clk" FROM "c2" CLKOUT PORT "clkout";
If your design has multiple clocks, you should examine your design
and determine whether any cross-domain paths exist in your design.
The basics:
INPUT_SETUP
CLOCK_TO_OUT
MAXDELAY
CLKSKEWDIFF
MULTICYCLE
CLKSKEWDISABLE
MAXSKEW
BLOCK
Finally, and importantly, you should check the preference coverage section of
your TRACE report to ensure a high percentage of coverage. To find the paths
that are neither covered nor analyzed by TRACE and PAR, turn on the Check
Unconstrained Paths option in the TRACE strategy settings and examine the
results in the TRACE reports.
Other Considerations
Hold-Time Analysis
If you enable the Hold Analysis through the TRACE strategy settings, which is
the default setting, TRACE will produce a hold-time check based on your
timing preferences.
By default, TRACE analyzes designs for setup time violations using the worst
case operating conditions for the target performance grade. In contrast to
setup time analysis, hold time analysis uses best case operating conditions.
This approach of analyzing at both corners of the operating conditions
establishes a well-defined range in which the device will operate successfully.
As explained in Timing-Driven PAR Process on page 30, PAR only tries to
correct setup time violations when auto-hold time correction is not enabled
through the PAR strategy settings. You should always examine the hold time
analysis result in the TRACE report to ensure that there are no hold time
violations in your final placed and routed design.
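The two checks can be summarized with the standard static timing relations: setup uses maximum (worst-case) delays and hold uses minimum (best-case) delays. The following Python sketch is a generic illustration, not Diamond's internal formula. The function names are hypothetical, and the skew sign convention (capture clock arrival minus launch clock arrival) is an assumption chosen so that the numbers from the Case Study 8 report reproduce its 1.700ns violation.

```python
def setup_slack_ns(requirement_ns, data_delay_max_ns, t_setup_ns, skew_ns):
    """Setup slack using worst-case (maximum) delays.

    skew_ns = capture clock arrival - launch clock arrival. A negative value
    (capture clock earlier than launch clock) reduces the time available.
    """
    return (requirement_ns + skew_ns - t_setup_ns) - data_delay_max_ns


def hold_slack_ns(data_delay_min_ns, t_hold_ns, skew_ns):
    """Hold slack using best-case (minimum) delays; a late capture clock hurts hold."""
    return data_delay_min_ns - (t_hold_ns + skew_ns)


# Case Study 8 numbers: 0.066 ns requirement, 1.116 ns path delay,
# 0.153 ns M_SET requirement, 0.497 ns of skew working against the path.
print(round(setup_slack_ns(0.066, 1.116, 0.153, -0.497), 3))  # -1.7
```

A negative setup slack means PAR must find a faster path; a negative hold slack, which PAR does not fix unless auto-hold time correction is enabled, means the data arrives too early at the capture register.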
To get an accurate 90-degree phase shift, use two primary clock nets: one
for the feedback path and one for the shifted clock. This limits the
uncertainty to the insertion delay of the sysCLOCK PLL (pad to input). The
remaining uncertainty can then be compensated for with FDEL settings in
250-picosecond increments.
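The arithmetic behind this note is simple: a 90-degree shift is one quarter of the clock period, and the FDEL correction is quantized to 250ps steps. The following Python sketch shows both. The function names are hypothetical, the 600ps uncertainty is an assumed example value, and the sketch ignores the direction in which the delay must be applied.

```python
def quarter_period_ns(f_mhz: float) -> float:
    """Time equivalent of a 90-degree phase shift: one quarter of the period."""
    return 1000.0 / f_mhz / 4


def fdel_steps(uncertainty_ps: float, step_ps: float = 250.0) -> int:
    """Nearest whole number of FDEL increments for a given uncertainty."""
    return round(uncertainty_ps / step_ps)


shift = quarter_period_ns(100.0)      # 2.5 ns at 100 MHz
steps = fdel_steps(600.0)             # assumed 600 ps uncertainty: 2 steps
residual_ps = 600.0 - steps * 250.0   # what the 250 ps grid cannot remove
print(shift, steps, residual_ps)      # 2.5 2 100.0
```

The residual shows why limiting the uncertainty to the PLL insertion delay matters: whatever remains after the nearest 250ps step stays in the phase error.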
Controlling PAR
Extensive benchmark experiments have been performed to determine the
optimum per-device default settings for all PAR options. At times, you can
obtain improved timing results on a design-by-design basis by trying different
variations of the PAR options. This section describes the techniques that you
can use to improve timing results from TRACE on placed and routed designs.
Running Multiple Routing Passes
You can obtain improved timing results by increasing the number of routing
passes during the routing phase of PAR. By default, the number of routing
passes is 6, but you can change this number through PAR strategy settings,
as shown in Figure 37.
Figure 37: Setting the Number of Routing Passes
The router routes the design for the defined number of routing iterations or
until all the timing preferences are met, whichever comes first. For example,
if the timing score reaches zero on the second routing iteration, PAR stops
after that iteration.
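The stopping rule just described can be sketched as a simple loop. This Python sketch models only the "pass limit or zero score, whichever comes first" rule; the function name is hypothetical and the per-pass timing scores are made-up inputs, not output from the real router.

```python
def route(scores_per_pass, max_passes=6):
    """Return (passes run, final timing score) for hypothetical per-pass scores.

    Stops after max_passes, or as soon as a pass reaches a timing score of 0,
    whichever comes first. max_passes defaults to 6, the default routing passes.
    """
    passes, score = 0, None
    for score in scores_per_pass[:max_passes]:
        passes += 1
        if score == 0:
            break
    return passes, score


print(route([1520, 0, 0, 0]))                         # (2, 0)
print(route([900, 640, 410, 300, 250, 240, 230, 0]))  # (6, 240)
```

In the second call the design would have met timing on the eighth pass, so raising the number of routing passes in the PAR strategy settings would have paid off.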
You can view the PAR report in the Diamond Report window. The report
contains execution information about the PAR run. For example:
0 connections routed; 26590 unrouted.
Starting router resource preassignment
Completed router resource preassignment. Real time: 11 mins 31 secs
The PAR report also shows the steps taken as the program converges on a
placement and routing solution. In this routing convergence example, the
number in parentheses is the timing score; timing was met after three
routing iterations, as shown by the (0) timing score.
Using Multiple Placement Iterations (Cost Tables)
You can specify multiple placement iterations through the PAR strategy
settings, as shown in Figure 38.
Figure 38: Setting the Number of Placement Iterations
By default, the number of iterations is set to 1, and the placement start point is
set to iteration 1 (cost table 1). You can increase the number of placement
iterations and set a different start point. After one PAR iteration is completed,
PAR loops back through the PAR flow until the number of iterations has
reached the number defined. PAR keeps track of the timing and routing
performance for every iteration, and the best result will be used as the final
result. If Placement Iterations is set to 0, PAR runs through iterations
indefinitely until a 0 timing score is reached. A design that is known to
have large timing violations might never reach a 0 timing score, so you must
intervene and stop the flow at some point.
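The iteration behavior can be sketched as follows. In this Python sketch, run_iteration is a hypothetical stand-in for one complete PAR pass with a given cost table, the scores are made up, and safety_cap stands in for you stopping an open-ended (Placement Iterations = 0) run by hand.

```python
def multi_placement(run_iteration, iterations, start=1, safety_cap=100):
    """Return (cost table, timing score) of the best placement iteration.

    iterations > 0: run exactly that many cost tables and keep the best.
    iterations == 0: keep going until a 0 timing score (or the safety cap).
    """
    limit = safety_cap if iterations == 0 else iterations
    best = None
    for table in range(start, start + limit):
        score = run_iteration(table)
        if best is None or score < best[1]:
            best = (table, score)
        if iterations == 0 and score == 0:
            break
    return best


# Hypothetical timing scores per cost table
scores = {1: 2846, 2: 730, 3: 0, 4: 95}
print(multi_placement(scores.get, 2))   # (2, 730)
print(multi_placement(scores.get, 4))   # (3, 0)
print(multi_placement(scores.get, 0))   # (3, 0)
```

Keeping the best result rather than the last one reflects the text: PAR tracks every iteration and uses the best as the final result.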
The following is a PAR report example:
Cost Table Summary
Level/      Number    Timing   Run    NCD
Cost [ncd]  Unrouted  Score    Time   Status
----------  --------  -------  -----  --------
5_1 *       0         0        26     Complete
5_2         0         2846     42     Complete
* : Design saved.
In this example:
The 5_ under the Level/Cost column means that the placement effort
level was set to 5. The placement effort level can range from 1 (lowest) to
5 (highest).
It is often good practice to save more than one result from a multi-PAR run
and run PAR TRACE on each result. Because the timing score is a composite of
all timing constraints, the result with the lowest score might not be the best
one for your application unless that score is 0.
In general, multiple placement iterations can help placement, but they can
also use many CPU cycles. Multiple placement iterations should be used
carefully because of system limitations and the uncertainty of results. It is
better to fix the root cause of timing problems in the design stage.
RTL Coding
This is the most crucial and most effective area for achieving timing closure.
Instead of coding your designs blindly and relying on the push-button flow,
the best approach is to code them specifically for Lattice's product
architecture. This can involve using or instantiating the embedded blocks,
pipelining, retiming, etc. To do this effectively, you should understand both
the hardware and the software.
Using Software
When using software tools, you should understand how to best use their
features and functions. Refer to the previous sections and other related
documents for details. Misunderstanding or misusing the options and
switches of software tools can also lead to timing problems. For example, is
timing-driven synthesis really good for this design? (See Synthesis General
Considerations on page 36.) What is the right trade-off between the area and
speed synthesis modes? Are the timing constraints for MAP and PAR accurate
and complete? Is the design under-constrained or over-constrained? Are
the MAP and PAR options used correctly and appropriately?
Area Balance
Although parallelism usually means higher speed, this is not always true. The
resources of the FPGA chosen for your design are limited. Increasing the
degree of parallel processing usually makes the same design faster, but it
also increases resource usage and leads to a larger implementation with more
signals and connections. This can introduce long routes or high routing
resource usage and turn certain non-critical paths into critical ones. A
balance needs to be found.
Resource Utilization
Before you actually analyze the timing problem, you should make sure that
your design does not over utilize your chosen device.
Over-utilized designs usually cause timing closure issues, because it is nearly
impossible for Diamond to honor all of the timing constraints when the
required resources exceed the amount available. Experience shows that a
design with slice utilization above 85% should be considered over-utilized.
In addition, when the number of clock domains exceeds the number of primary
clocks available, or when block RAM or DSP utilization is 80% or more, the
design should be considered over-utilized.
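These rules of thumb are easy to apply mechanically once you have the numbers from the MAP report. The following Python sketch encodes only the thresholds stated above; the function name is hypothetical, the utilization figures in the example calls are made up, and the number of primary clocks depends on your device.

```python
def over_utilization_reasons(slice_pct, ebr_pct, dsp_pct, clock_domains, primary_clocks):
    """Return the rule-of-thumb thresholds a design violates (empty list if none)."""
    reasons = []
    if slice_pct > 85:
        reasons.append(f"slice utilization {slice_pct}% > 85%")
    if ebr_pct >= 80:
        reasons.append(f"block RAM utilization {ebr_pct}% >= 80%")
    if dsp_pct >= 80:
        reasons.append(f"DSP utilization {dsp_pct}% >= 80%")
    if clock_domains > primary_clocks:
        reasons.append(f"{clock_domains} clock domains > {primary_clocks} primary clocks")
    return reasons


print(over_utilization_reasons(72, 45, 10, 6, 8))  # []
for reason in over_utilization_reasons(88, 82, 10, 9, 8):
    print(reason)
# slice utilization 88% > 85%
# block RAM utilization 82% >= 80%
# 9 clock domains > 8 primary clocks
```

An empty list does not guarantee timing closure; it only rules out the most common capacity-related cause before you analyze the timing reports.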
To determine the device utilization, look at the details in the Synthesis Report
file (if you use Synplify, the file extension is .srr) and the MAP Report file (with
the file extension .mrp). These reports can be viewed through either the GUI
or a text editor.
First, check the synthesis report file to see whether the appropriate
resources have been allocated or inferred. If anything is missing, examine
why.
The next step is to check the resource utilization in the MAP Report File.
Make sure that the resource usage does not exceed the recommended
values.
If the resource utilization exceeds the recommended values, you should
recode your HDL or migrate to a larger device. See the document Congested
Design.
needed. These items tend to cause timing issues. By checking these items,
you can fix timing issues before you spend time analyzing the timing reports,
and you can get closer to achieving your timing.
I/O timing: to use or not to use the I/O registers (see Using I/O Register to
Improve I/O Timing on page 38 and Adding Delays to Input Registers
on page 39)
The MAP TRACE report can be viewed in the Diamond Report View, or you
can open the report file in any text editor. The MAP TRACE report file is in
your implementation directory and has the file extension .tw1.
Rule of thumb: a timing issue reported by MAP TRACE is usually an RTL
issue, so you should correlate the issue with your HDL code.
You should also check your preference coverage and correctness, as
illustrated in Figure 41:
Figure 41: Checking Preferences
You need to pay attention to every preference listed in the report, especially
those with the most timing failures. Examining the timing failures gives
clues as to where coding improvements can be made in the RTL or which
constraints need to be adjusted. Typical coding improvements are
registering/pipelining, retiming, or simply recoding to avoid long paths.
Typical constraint adjustments are constraint relaxation (if over-constrained)
or constraint coverage improvement (if under-constrained), such as adding
false paths and multicycle paths, including MULTICYCLE preferences for paths
that cross related clock domains.
Unconstrained Paths
Unconstrained paths can be found in the MAP TRACE report for both setup
time and hold time analysis. For example:
You can review the list in order to improve your preference coverage or to
identify any paths that should be constrained.
Note that you need to turn on Check Unconstrained Paths through the
TRACE strategy settings (see MAP TRACE on page 29 and PAR TRACE
on page 32). By default, this is turned off.
Logic Levels
Logic levels can be examined in the synthesis report, MAP TRACE report and
PAR TRACE report. You should check to see whether paths exceed the
timing constraints and whether logic levels are too high. For example:
This example shows that the path has 20 logic levels and exceeds the
requirement by 1.430ns. The Physical Path Details section that follows gives
more information on the failed path.
When this situation happens, you have the following options:
Source:       I_TOP_PCIDMA/U_3/sgdmac_inst/engine/dst_bus_r8_0  (from CK66_keep)
Destination:  FF  Data in  I_0/I_RX_HDLC_CT_1/CT2/M_DWD_CNT_9  (to CK66_keep)
Delay:        15.245ns
Timing   Run       NCD
Score    Time      Status
-------  --------  --------
458      6:01:32   Complete
If you have a small timing score (less than a few hundred), you should try a
multi-PAR process with some PAR strategy changes (see PAR (Place & Route
Design) Settings in Strategy for Timing Closure on page 31 and Controlling
PAR on page 83). A different strategy and multi-PAR cost table might yield
better results than the initial single-seed PAR run. If a timing score of zero
is achieved after a multi-PAR run, you can move to the next step of the
process.
If you still have timing issues, you should analyze the PAR TRACE report,
debug the timing problems, and use appropriate approaches to fix the issues.
Multi-PAR
When using multi-PAR, you should make sure that multiple PAR results are
saved for later timing analysis. You can modify this through the PAR strategy
settings (see section Controlling PAR on page 83). Saving multiple PAR
results has the following benefits:
You can run PAR TRACE on multiple PAR results, for example, the top 5
results with the lowest timing scores. This lets you examine timing issues
from several angles. Failing paths usually show up in more than one result,
which helps you identify problematic areas in your design.
You can select the best of the PAR results, or the one whose timing issues
can be resolved most quickly.
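Finding the failing paths that recur across results can be automated once you have extracted them from each PAR TRACE report. In the following Python sketch, the function name is hypothetical, the input format (result name, timing score, list of failing path names) is an assumption, and the result and path names are made up.

```python
from collections import Counter


def recurring_failures(results, top_n=5, min_hits=2):
    """Failing paths shared by the top_n results with the lowest timing scores.

    results: list of (ncd_name, timing_score, failing_paths) tuples.
    Returns (path, hit count) pairs, most frequent first, ties by name.
    """
    best = sorted(results, key=lambda r: r[1])[:top_n]
    hits = Counter(path for _, _, paths in best for path in set(paths))
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(path, n) for path, n in ranked if n >= min_hits]


runs = [  # hypothetical results and path names
    ("5_1", 458, ["pathA", "pathB"]),
    ("5_2", 2846, ["pathA"]),
    ("5_3", 910, ["pathA", "pathC"]),
    ("5_4", 0, []),
    ("5_5", 1200, ["pathB", "pathC", "pathA"]),
    ("5_6", 5000, ["pathD"]),
]
print(recurring_failures(runs))
# [('pathA', 4), ('pathB', 2), ('pathC', 2)]
```

A path such as pathA that fails in almost every saved result points to a problem in the RTL or the constraints rather than to an unlucky placement.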
The following 4 signals are selected to use the primary clock routing resources:
clk_pll_c (driver: PLL_soft_wb_inst/PLL_inst0/PLLInst_0, clk load #: 400)
......
The following 6 signals are selected to use the secondary clock routing resources:
clk_c (driver: OSCH_inst, clk load #: 393)
......
WARNING - par: Signal "clk_c" is selected to use Secondary clock resources; however
its driver comp "clk" is located at "N3", which is not a dedicated pin for connecting
to Secondary clock resources. General routing has to be used to route this signal,
and it may suffer from excessive delay or skew.
Name      Fanout  Delay (ns)  Site                         Resource
REG_DEL   ---     0.285       R89C27C.CLK to R89C27C.Q0    U_core/SLICE_29896 (from tel_clk_155)
ROUTE     3       4.914       R89C27C.Q0 to R91C45D.D1     U_core/rx_tu_mode_2
CTOF_DEL  ---     0.180       R91C45D.D1 to R91C45D.F1     U_core/ddwr_stm8/SLICE_36645
ROUTE     1       0.312       R91C45D.F1 to R91C45D.D0     U_core/payload_we17_tz_tz
CTOF_DEL  ---     0.180       R91C45D.D0 to R91C45D.F0     U_core/SLICE_36645
ROUTE     5       0.741       R91C45D.F0 to R89C48D.B0     U_core/payload_we17
CTOF_DEL  ---     0.180       R89C48D.B0 to R89C48D.F0     U_core/SLICE_38198
ROUTE     4       0.861       R89C48D.F0 to R82C50A.C0     U_core/nxt_bcnt_0_sqmuxa_1
CTOF_DEL  ---     0.180       R82C50A.C0 to R82C50A.F0     U_core/SLICE_38332
ROUTE     50      1.986       R82C50A.F0 to R75C72B.CE     U_core//payload_we_1_sqmuxa_1 (to tel_clk_155)
                  --------
                  9.819
This path starts at the register rx_tu_mode, which has a fanout of 3, and has
a 4.914ns routing delay that contributes more than 50% of the total delay. To
address this issue, add the following synthesis directive to the HDL to limit
the number of loads:
reg rx_tu_mode /* synthesis syn_maxfan = 1 */;
In your actual designs, the synthesis tool generates the appropriate number
of register copies depending on the maximum number of loads you specify.
Experience shows that if the maximum number of loads is set too small, it
can cause an unintended effect: the load on the input source of this register
increases, because that source must now drive every copy. You should check
whether such an effect occurs when you use this modification.
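A back-of-the-envelope estimate shows both sides of this trade-off. The following Python sketch assumes each copy may drive at most max_fanout loads and that every copy is fed by the same source signal; the function name is hypothetical, and the synthesis tool's actual replication may differ.

```python
import math


def replicas_needed(fanout: int, max_fanout: int) -> int:
    """Rough number of register copies so no copy drives more than max_fanout loads."""
    return math.ceil(fanout / max_fanout)


n = replicas_needed(3, 1)   # rx_tu_mode: fanout 3, syn_maxfan = 1
print(n)                    # 3
# Every copy needs the same input, so the source of rx_tu_mode now
# drives n register inputs instead of one: its load grows by n - 1.
print(n - 1)                # 2
print(replicas_needed(50, 16))  # 4
```

For a net with a fanout of 50, a moderate limit such as 16 yields only a few copies, while a limit of 1 would add 49 extra loads to the source.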
Name        Fanout  Delay (ns)  Site                           Resource
REG_DEL     ---     0.243       R51C141C.CLK to R51C141C.Q0    SLICE_15 (from CLK_c)
ROUTE       12      0.760       R51C141C.Q0 to R51C143A.A0     tmp1_0
CTOF_DEL    ---     0.147       R51C143A.A0 to R51C143A.F0     B_1_CR7_ram_0/SLICE_9
ROUTE       1       2.725       R75C143A.F0 to R75C142C.B1     B_1_4
C1TOFCO_DE  ---     0.277       R75C142C.B1 to R75C142C.FCO    SLICE_3
ROUTE       1       0.000       R75C142C.FCO to R75C143A.FCI   Y_1_cry_4
FCITOF1_DE  ---     0.177       R75C143A.FCI to R75C143A.F1    SLICE_4
ROUTE       1       0.811       R75C143A.F1 to IOL_R52A.OPOSA  Y_1_6 (to CLK_c)
                    --------
                    5.140       (17.9% logic, 83.1% route), 4 logic levels.
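// syn_srlstyle = registers keeps these delay chains in flip-flops instead of
// letting synthesis map them into shift-register (RAM-based) resources.
reg [7:0] A_d1, A_d2, A_d3, A_d4 /* synthesis syn_srlstyle = registers */;
reg [7:0] B_d1, B_d2, B_d3, B_d4 /* synthesis syn_srlstyle = registers */;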
always @(posedge CLK)
begin
A_d1 <= A;
B_d1 <= B;
A_d2 <= A_d1;
B_d2 <= B_d1;
A_d3 <= A_d2;
B_d3 <= B_d2;
A_d4 <= A_d3;
B_d4 <= B_d3;
end
REGMODE_A = "REG";
REGMODE_B = "REG";
Name      Fanout  Delay (ns)  Site                              Resource
C2Q_DEL   ---     2.484       EBR_R49C2.CLKA to EBR_R49C2.DOA3  U_core/mem_mem_0_8 (from sys_clk_125)
ROUTE     1       1.730       EBR_R49C2.DOA3 to R35C15C.C1      U_core/mp_fifo_dout_1_35
CTOF_DEL  ---     0.180       R35C15C.C1 to R35C15C.F1          U_core/SLICE_13772
ROUTE     3       0.715       R35C15C.F1 to R33C27B.D1          U_core/mp_fifo_dout_2_35
CTOF_DEL  ---     0.180       R33C27B.D1 to R33C27B.F1          U_core/SLICE_38961
ROUTE     6       0.661       R33C27B.F1 to R31C33D.D1          U_core/mp_eop_out_1
CTOF_DEL  ---     0.180       R31C33D.D1 to R31C33D.F1          U_core/SLICE_35423
ROUTE     1       0.370       R31C33D.F1 to R31C33D.B0          U_core/un1_abnormal_empty20_1_m2
CTOF_DEL  ---     0.180       R31C33D.B0 to R31C33D.F0          U_core/SLICE_35423
ROUTE     32      1.310       R31C33D.F0 to R26C43B.CE          U_core/un1_abnormal_empty20_1_m4 (to sys_clk_125)
                  --------
                  7.990       (40.1% logic, 59.9% route), 5 logic levels.
Name        Fanout  Delay (ns)  Site                               Resource
REG_DEL     ---     0.243       R47C42C.CLK to R47C42C.Q0          SLICE_100 (from CLK_c)
ROUTE       1       1.550       R47C42C.Q0 to *18_R34C51.B14       B_1_32
BYPASS_DEL  ---     0.220       *18_R34C51.B14 to *_R34C51.ROB14   Y_wire_1_pt
ROUTE       1       0.000       *_R34C51.ROB14 to *54_R34C54.A32   Y_wire_1_pt_ROB14
PD_DEL      ---     4.870       *54_R34C54.A32 to *54_R34C54.R40   Y_wire_1_40_0
ROUTE       1       1.283       *54_R34C54.R40 to R23C59C.M0       Y_wire_40 (to CLK_c)
                    --------
                    8.166       (24.1% logic, 75.9% route), 3 logic levels.
There is a large delay (4.870ns) through the DSP block. Because the actual
function in the HDL is a simple multiplication and a DSP block is
unnecessary, use the following directive to prevent this specific logic from
using the DSP block:
wire [40:0] Y_wire /* synthesis syn_multstyle = "logic" */;
assign Y_wire = A_d4 * B_d4;
After the modification, you need to run MAP again. This will ensure that
unrelated logic is not packed together by MAP.
When MAP finishes successfully, it propagates UGROUP constraints to the
generated PRF file that is used to drive PAR. If grouping is no longer
desired, and you want PAR to place the elements of the group freely instead
of trying to put them all in one slice or in closely placed slices, you can
manually edit the PRF file to remove the group.
In this example, the UGROUP added is MODULE_A. In the generated PRF
file, you should see a few lines similar to the following example:
PGROUP "MODULE_A"
COMP "U_core/module_A/SLICE_0"
COMP "U_core/module_A/SLICE_1"
......
COMP "U_core/module_A/SLICE_1000"
PGROUP "PGROUP_X"
Remove everything from the line PGROUP "MODULE_A" through the last line of
the group. In this example, the last line is the one containing SLICE_1000.
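If you edit PRF files often, the same edit can be scripted. The following Python sketch drops a named PGROUP header and the COMP lines that follow it. It assumes the layout shown in the example (the PGROUP line followed only by its COMP lines, with the next non-COMP line ending the group); the function name is hypothetical, and you should keep a backup of the original PRF file.

```python
def remove_pgroup(prf_lines, group_name):
    """Return prf_lines without the named PGROUP header and its COMP lines."""
    header = f'PGROUP "{group_name}"'
    out, skipping = [], False
    for line in prf_lines:
        stripped = line.strip()
        if stripped.startswith(header):
            skipping = True          # start of the group to drop
            continue
        if skipping and stripped.startswith("COMP "):
            continue                 # member of the dropped group
        skipping = False             # any other line ends the group
        out.append(line)
    return out


prf = [
    'PGROUP "MODULE_A"',
    '  COMP "U_core/module_A/SLICE_0"',
    '  COMP "U_core/module_A/SLICE_1"',
    '  COMP "U_core/module_A/SLICE_1000"',
    'PGROUP "PGROUP_X"',
]
print(remove_pgroup(prf, "MODULE_A"))  # ['PGROUP "PGROUP_X"']
```

To apply it to a real file, read the PRF lines, filter them, and write the result to a copy that you then use to drive PAR.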
Fixing Clock Enable (CE)
The enable pin on a PFU register usually has larger delays than the data pins.
Look at the following example from part of the PAR TRACE report:
The routing delay in this example contributes 91.1% of the total. The
CE_SET requirement shown in the report is a clue that this is a clock enable
delay issue. To fix it, set the syn_useenables synthesis directive to 0, so that
synthesis implements the enable logic in the data path instead of using the
register's CE pin. For example:
reg Myreg /* synthesis syn_useenables=0 */;
Final PAR
With all or most of the critical timing issues identified and addressed, you
should have a final PAR run with increased PAR effort using the following
options. These options can be modified through the PAR strategy settings.
See Controlling PAR on page 83.
Placement iteration: 10 to 30
Placement effort: 5
Routing passes: 10
You should carefully examine the results to see if every iteration provides
significant improvement. If this is not the case, you might have reached a
point where a serious design review needs to be performed.
Clock Boosting
Clock boosting is the deliberate introduction of clock skew on a target flip-flop
to increase the setup margin. The automated clock-boosting tool attempts to
meet setup constraints by introducing delays to as many target registers as
needed to meet timing. In effect, it borrows register hold margins to meet
register setup timing. Clock boosting is accomplished through the following
features:
A 4-tap delay cell structure in front of the clock port of every flip-flop in
the device (including I/O flip-flops)
The ability to borrow clock cycle time from one easily met path and
give this time to a difficult-to-meet path
For certain device families, every flip-flop in the device has programmable
delay elements before its clock input for this purpose. For the ECP3 device
family, clock boosting is achieved by rerouting the clock through the switch
matrix to gain some delay on the destination clock. This introduces skew only
on the destination registers, not on the clock network.
Clock boosting is typically most useful in designs that miss timing on only a
few paths for one or two preferences. If the design misses timing by more
than a few nanoseconds on any given path, clock boosting cannot schedule
skew in a way that recovers enough time to meet the critical preference.
Clock boosting run times can be shortened by using a preference file that
contains only the failing preferences.
The example illustrated in Figure 44 shows two register-to-register transfers
that both need to meet the 10-ns period constraint. By using the DEL2 delay
cell to delay the clock input of flip-flop FF_2, the first register transfer meets
its period constraint with a new minimum period of approximately 9.7 ns, and
the second register transfer meets its period constraint with approximately
8.3 ns. The D1, D2, and D3 delays shown vary depending on the performance
grade and FPGA device family.
Figure 44: Clock Boosting
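The transfer arithmetic behind this example can be sketched in Python: delaying the clock of the middle flip-flop adds time to the transfer into it and removes the same amount from the transfer out of it. The path delays and the four tap values below are hypothetical numbers chosen for illustration, not the D1, D2, and D3 values from Figure 44. The function names are also hypothetical, and the model ignores hold checks, which real clock boosting must also respect.

```python
def boosted_periods(d_in_ns, d_out_ns, clock_delay_ns):
    """Minimum periods for the transfers into and out of a clock-delayed flip-flop.

    Ideal model: the incoming transfer gains clock_delay_ns, the outgoing
    transfer loses it; hold checks and all other skew are ignored.
    """
    return d_in_ns - clock_delay_ns, d_out_ns + clock_delay_ns


def best_tap(d_in_ns, d_out_ns, taps_ns, period_ns):
    """Tap whose worse transfer is shortest, or None if no tap meets period_ns."""
    tap = min(taps_ns, key=lambda t: max(boosted_periods(d_in_ns, d_out_ns, t)))
    worst = max(boosted_periods(d_in_ns, d_out_ns, tap))
    return (tap, worst) if worst <= period_ns else None


taps = (0.0, 0.3, 0.6, 0.9)             # hypothetical 4-tap delay values, ns
print(best_tap(10.4, 7.9, taps, 10.0))   # tap 0.9 ns, worse transfer about 9.5 ns
print(best_tap(14.0, 9.0, taps, 10.0))   # None: misses by too much to boost
```

The second call illustrates the limitation described earlier: when a path misses timing by several nanoseconds, no available tap delay can move enough time onto it.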
Some circuits show much improvement, but others show no gain. Clock
boosting results are very design-dependent.
Clock boosting uses minimum delay values that have not yet been
validated at the system level.