Vlsi Concepts
Static timing analysis (STA) is a vast domain involving many sub-fields. It involves
computing the limits of delay of elements in the circuit without actually simulating it. In
this post, we have tried to list down all the posts that an STA engineer cannot do
without. Please add your feedback in comments to make reading it a more meaningful
experience.
Setup and hold interview questions
Clock gating concepts
Setup time and hold time - static timing analysis - Definition and detailed
discussion on setup time and hold time, setup hold violations and how to fix them
Metastability - This post discusses the basics of metastability and how to avoid it.
Problem: clock gating checks at a complex gate - An exercise to analyze the
requirements of clock gating checks at a complex gate
Lockup latch - The basics of lockup latch, both from timing and DFT perspective
have been discussed in this post.
Lockup latches vs. lockup registers - Provides an insight into the situations where
lockup latches and lockup registers can be useful.
Clock latency - Read this if you wish to get acquainted with the terminology
related to clock latency
Data checks - Non-sequential setup and hold checks have been discussed, very
useful for beginners
Modeling skew requirements with the help of data checks - Explains with an
example how data checks can be used to maintain skew requirements
What is static timing analysis - Defines static timing analysis and its scope
Setup checks and hold checks for reg-to-reg paths - Discusses the default setup
and hold checks for timing paths starting from and ending at registers
Setup checks and hold checks for register-to-latch paths - Discusses the default
setup and hold checks for timing paths starting from registers and ending at latches
Setup checks and hold checks for latch-to-reg timing paths - Discusses the
default setup and hold checks for timing paths starting from latches and ending at
registers
All about clock signals - Discusses the basics of clock signals
Synchronizers - Different types of synchronizers have been discussed in detail
Timing corners - dimensions in timing signoff - Highlights the importance of
signing off in different corner-case scenarios
Ensuring glitch-free propagation of clock - Discusses the hazards that can
occur if there is a glitch in the clock
Clock switching and clock gating checks - The basics of clock gating check, and
how to apply these is discussed
Clock gating checks at a mux - How clock gating checks should be applied on a
mux is discussed in detail
False paths - what are they - This post discusses the basics of false paths and
how to treat them
Multicycle paths handling in STA - Basics of multicycle paths and how they are
treated in STA
All about multicycle paths in VLSI - Architecture specific description and handling
of multicycle paths, a must read
Propagation delay - Defines propagation delay and related terms
Is it possible for a logic gate to have negative delay - Thought provoking post on
whether a logic gate can have negative delay
Worst slew propagation - Discusses the basics of worst slew propagation
On-chip variations - Describes on-chip variations and the methods undertaken to
deal with these
Temperature inversion - Discusses the concept of temperature inversion and
conductivity trends with temperature
Can a net have negative delay - Describes how a net cannot have a negative
delay
Timing arcs - Discusses the basics of timing arcs, positive and negative
unateness, cell arcs and net arcs etc.
Time borrowing in latches - Discusses the basics of the concept of time
borrowing
Interesting problem - latches in series - Describes why it is essential to have
alternate positive and negative latches for sequential operation
Virtual clock - Explains the concept of virtual clock
Minimum pulse width - Discusses minimum pulse width checks
Basics of latch timing - Definition of latch, setup time and hold timing of a latch,
latch timing arcs are discussed
Definition of Setup time: Setup time is defined as the minimum amount of time before the
clock's active edge that the data must be stable for it to be latched correctly. In other words, each
flip-flop (or any sequential element, in general) needs some time for the data to remain stable
before the clock edge arrives, such that it can reliably capture the data. This duration is known
as setup time.
The data that was launched at the previous clock edge should be stable at the
input at least setup time before the clock edge. So, adherence to setup time
ensures that the data launched at previous edge is captured properly at the
current edge. In other words, we can also say that setup time adherence ensures
that the system moves to next state smoothly.
Definition of Hold time: Hold time is defined as the minimum amount of time after the clock's
active edge during which data must be stable. Similar to setup time, each sequential element
needs some time for data to remain stable after clock edge arrives to reliably capture data.
This duration is known as hold time.
The data that was launched at the current edge should not travel to the capturing flop
before hold time has passed after the clock edge. Adherence to hold time ensures that the
data launched at current clock edge does not get captured at the same edge. In other words,
hold time adherence ensures that system does not deviate from the current state and go
into an invalid state.
As shown in the figure 1 below, the data at the input of the flip-flop can change anywhere
except within the setup-hold window.
Figure 1: Setup-hold window
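This window rule can be sketched as a small check (times in nanoseconds, all values hypothetical):

```python
def violates_window(transition_time, clock_edge, setup_time, hold_time):
    """Return True if a data transition falls inside the setup-hold
    window around a clock edge (all times in nanoseconds)."""
    window_start = clock_edge - setup_time   # data must be stable from here...
    window_end = clock_edge + hold_time      # ...until here
    return window_start < transition_time < window_end

# Clock edge at 10 ns, setup = 0.2 ns, hold = 0.1 ns:
print(violates_window(9.9, 10.0, 0.2, 0.1))   # transition inside the window -> True
print(violates_window(9.5, 10.0, 0.2, 0.1))   # well before the window -> False
```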
A D-type latch
Cause/origin of setup time and hold time: Setup time and hold time are said to be the
backbone of timing analysis. Rightly so, for the chip to function properly, setup and hold
timing constraints need to be met properly for each and every flip-flop in the design. If
even a single flop exists that does not meet setup and hold requirements for timing paths
starting from/ending at it, the design will fail and meta-stability will occur. It is very
important to understand the origin of setup time and hold time as whole design
functionality is ensured by these. Let us discuss the origin of setup time and hold time taking
an example of D-flip-flop as in VLSI designs, D-type flip-flops are almost always used. A D-
type flip-flop is realized using two D-type latches; one of them is positive level-sensitive, the
other is negative level-sensitive. A D-type latch, in turn, is realized using transmission gates
and inverters. Figure below shows a positive-level sensitive D-type latch. Just inverting the
transmission gates’ clock, we get negative-level sensitive D-type latch.
A complete D flip-flop using the above structure of D-type latch is shown in figure below:
A D-type flip-flop
Now, let us get into the details of above figure. For data to be latched by ‘latch 1’ at the
falling edge of the clock, it must be present at ‘Node F’ at that time. Since, data has to
travel ‘NodeA’ -> ‘Node B’ -> ‘Node C’ -> ‘Node D’ -> ‘Node E’ -> ‘Node F’ to reach ‘Node
F’, it should arrive at flip-flop’s input (Node A) at some earlier time. This time for data to
reach from ‘Node A’ to ‘Node F’ is termed as data setup time (assuming CLK and CLK'
are present instantaneously. If that is not the case, it will be accounted for accordingly).
Similarly, it is necessary to ensure a stable value at the input to ensure a stable value at
‘Node C’. In other words, hold time can be termed as delay taken by data from ‘Node A’ to
‘Node C’.
Setup and hold checks in a design: Basically, setup and hold timing checks ensure that a data
launched from one flop is captured at another properly. Considering the way digital designs of
today are designed (finite state machines), the next state is derived from its previous state. So,
data launched at one edge should be captured at next active clock edge. Also, the data launched
from one flop should not be captured at next flop at the same edge. These conditions are ensured
by setup and hold checks. Setup check ensures that the data is stable before the setup
requirement of next active clock edge at the next flop so that next state is reached. Similarly,
hold check ensures that data is stable until the hold requirement for the next flop for same clock
edge has been met so that present state is not corrupted.
Shown above is a flop-to-flop timing path. For simplicity, we have assumed that both the flops
are rise edge triggered. The setup and hold timing relations for the data at input of second flop
can be explained using the waveforms below:
Figure showing setup and hold checks being applied for the timing path shown above
As shown, data launched from launching flop is allowed to arrive at the input of the second flop
only after a delay greater than its hold requirement so that it is properly captured. Similarly, it
must not have a delay greater than (clock period – setup requirement of second flop). In other
words, mathematically speaking, setup check equation is given as below (assuming zero skew
between launch and capture clocks):
Tck->q + Tprop + Tsetup < Tperiod
Similarly, hold check equation is given as:
Tck->q + Tprop > Thold
If we take into account skews between the two clocks, the above equations are modified
accordingly. If Tskew is the skew between launch and capture flops (equal to latency of clock at
capture flop minus latency of clock at launch flop, so that skew is positive if capture flop has
larger latency and vice-versa), the above equations are modified as below:
Tck->q + Tprop + Tsetup < Tperiod + Tskew
Tck->q + Tprop > Thold + Tskew
Setup checks and hold checks for reg-to-reg paths explains different cases covering
setup and hold checks for flop-to-flop paths.
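The skew-adjusted setup and hold checks can be evaluated directly, as in the sketch below; all delay values are illustrative assumptions, not from any real library:

```python
def setup_ok(tckq, tprop, tsetup, tperiod, tskew=0.0):
    # Setup check: Tck->q + Tprop + Tsetup < Tperiod + Tskew
    return tckq + tprop + tsetup < tperiod + tskew

def hold_ok(tckq, tprop, thold, tskew=0.0):
    # Hold check: Tck->q + Tprop > Thold + Tskew
    return tckq + tprop > thold + tskew

# Illustrative path: 0.1 ns clk->q, 2.5 ns combinational delay, 0.15 ns setup,
# 0.05 ns hold, 3.0 ns period, capture clock arriving 0.2 ns after launch clock:
print(setup_ok(0.1, 2.5, 0.15, 3.0, tskew=0.2))  # 2.75 < 3.2 -> True
print(hold_ok(0.1, 2.5, 0.05, tskew=0.2))        # 2.6 > 0.25 -> True
```

Note that a positive skew relaxes the setup check but tightens the hold check, exactly as the two equations suggest.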
What if setup and/or hold violations occur in a design: As said earlier, setup and hold timings
are to be met in order to ensure that data launched from one flop is captured properly at the next
flop at next clock edge so as to transfer the state-machine of the design to the next state. If the
setup check is violated, the data will not be captured at the next clock edge properly. Similarly, if
hold check is violated, data intended to be captured at the next edge will get captured at the same
edge. Setup hold violations can also lead to data changing within setup/hold window of the
capturing flip-flop. It may lead to metastability failure in the design (as explained in our post
'metastability'). So, it is necessary to have setup and hold requirements met for all the flip-flops
in the design and there should not be any setup/hold violation.
What if you fabricate a design without taking care of setup/hold violations: If you fabricate a
design having setup violations, you can still use it by lowering the frequency as the equation
involves the variable clock frequency. On the other hand, a design with hold violation cannot be
run properly. So, if you fabricate a design with an accidental hold violation, you will have to
simply throw away the chip (unless the hold path is half cycle as explained here). A design with
half cycle hold violations only can still be used at lower frequencies.
Tackling setup time violation: As given above, the equation for setup timing check is given as:
Tck->q + Tprop + Tsetup - Tskew < Tperiod
The parameter that represents if there is a setup time violation is setup slack. The setup slack can
be defined as the difference between the L.H.S and R.H.S. In other words, it is the margin that is
available such that the timing path meets setup check. The setup slack equation can be given as:
Setup slack = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)
If setup slack is positive, it means there is still some margin available in the timing path. On the
other hand, a negative slack means that the paths violates setup timing check by the amount of
setup slack. To get the path met, either data delay should be decreased or clock period should be
increased.
Mitigating setup violation: Thus, we can meet the setup requirement, if violating, by
1. Decreasing clk->q delay of launching flop
2. Decreasing the propagation delay of the combinational cloud
3. Reducing the setup time requirement of capturing flop
4. Increasing the skew between capture and launch clocks
5. Increasing the clock period
Tackling hold time violation: Similarly, the equation for hold timing check is as below:
Tck->q + Tprop > Thold + Tskew
The parameter that represents if there is a hold timing violation is hold slack. The hold slack is
defined as the amount by which L.H.S is greater than R.H.S. In other words, it is the margin by
which timing path meets the hold timing check. The equation for hold slack is given as:
Hold slack = Tck->q + Tprop - Thold - Tskew
If hold slack is positive, it means there is still some margin available before it will start violating
for hold. A negative hold slack means the path is violating hold timing check by the amount
represented by hold slack. To get the path met, either data path delay should be increased, or
clock skew/hold requirement of capturing flop should be decreased.
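Plugging illustrative numbers into the two slack equations (all values assumed for the example):

```python
def setup_slack(tperiod, tckq, tprop, tsetup, tskew=0.0):
    # Setup slack = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)
    return tperiod - (tckq + tprop + tsetup - tskew)

def hold_slack(tckq, tprop, thold, tskew=0.0):
    # Hold slack = Tck->q + Tprop - Thold - Tskew
    return tckq + tprop - thold - tskew

# A path that misses setup by 0.25 ns but easily meets hold:
print(setup_slack(3.0, 0.1, 3.0, 0.15))  # ~ -0.25 (negative -> setup violation)
print(hold_slack(0.1, 3.0, 0.05))        # ~ 3.05 (positive -> hold met)
# Increasing the clock period to 3.5 ns recovers the setup slack:
print(setup_slack(3.5, 0.1, 3.0, 0.15))  # ~ 0.25 (now met)
```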
Metastability
What is metastability: Metastability is a phenomenon of unstable equilibrium in digital
electronics in which the sequential element is not able to resolve the state of the input
signal; hence, the output goes into unresolved state for an unbounded interval of time.
Almost always, this happens when data transitions very close to active edge of the clock,
hence, violating setup and hold requirements. Since, data makes transition close to
active edge of clock, the flop is not able to capture the data completely. The flop starts to
capture the data and output also starts to transition. But, before output has changed its
state, the input is cut-off from the output as clock edge has arrived. The output is, then,
left hanging between state ‘0’ and state ‘1’. Theoretically, the output may remain in this
state for an indefinite period of time. But, given the time to settle down, the output will
eventually settle to either its previous state or the new state. Thus, the effect of signal
present at input of flop may not travel to the output of the flop partly or completely. In
other words, we can say that when a flip-flop enters metastable state, one cannot predict
its output voltage level after it exits the metastability state nor when the output will settle
to some stable voltage level. The metastability failure is said to have occurred if the
output has not resolved itself by the time it must be available for use. Also, since the
output remains in between '0' and '1', both the PMOS and the NMOS transistors are partially
on. Hence, a path exists from VDD to GND, causing a high current to flow for as long as the
output is hanging in between.
Metastability example: Consider a CMOS inverter circuit as shown below. The current vs
voltage (we can also say power vs voltage as VDD is constant) characteristics for this
circuit are also shown. It can be observed that output current is 0 for both input voltage
levels; i.e. ‘0’ and ‘1’. As the voltage level is increased from ‘logic 0’, the current
increases. It attains its maximum value at ‘Vin’ somewhere near VDD/2. It again starts
decreasing as ‘Vin’ is increased further and again becomes 0 when ‘Vin’ is at ‘logic 1’.
Thus, there is a local maximum of power consumption for the CMOS inverter. At this point,
the device is in unstable equilibrium. As for the CMOS inverter, for other CMOS devices too,
there lies a local maximum at some value of input voltage. We all know that for a flip-flop,
the output stage is a combinational gate (mostly an inverter). So, we can say that the
output of the flip-flop is prone to metastability, provided the right input level is present.
We have just seen that different elements are prone to metastability to different
extents. There is a measure to determine the extent to which an element is prone to
metastability failure. This is given by an interval known as ‘Mean Time Between Failures’
(MTBF) and is a measure of how prone an element is to failure. It gives the average time
interval between two successive failures. The failure rate is given as the reciprocal of
MTBF.
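MTBF is commonly estimated with the exponential synchronizer model sketched below; the resolution constant tau and window constant T0 are device-specific, and all values used here are purely illustrative assumptions:

```python
import math

def mtbf(t_resolve, tau, t0, f_clk, f_data):
    """Classic synchronizer MTBF estimate:
    MTBF = e^(t_resolve / tau) / (T0 * f_clk * f_data)
    t_resolve: time available for the output to resolve (s)
    tau:       metastability resolution time constant (s)
    t0:        metastability aperture constant (s)
    f_clk:     clock frequency (Hz); f_data: data toggle rate (Hz)."""
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)

# Illustrative 100 MHz clock, 10 MHz data, tau = 50 ps, T0 = 10 ps,
# with half a clock period (5 ns) available to resolve:
print(mtbf(5e-9, 50e-12, 10e-12, 100e6, 10e6))  # astronomically large -> safe
# Failure rate is simply the reciprocal of MTBF:
print(1.0 / mtbf(5e-9, 50e-12, 10e-12, 100e6, 10e6))
```

The exponential dependence on resolution time is why adding one extra synchronizer stage improves MTBF so dramatically.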
Figure:Problem figure
Solution: As we know, clock gating checks can be of AND type or OR type. We can
find the type of clock gating check formed between a data signal and a clock signal by
considering all other signals as constant. Since all four data signals control Clk in one
way or the other, the following clock gating checks are formed:
i) Clock gating check between Data1 and Clk: There is an AND type check between
Data1 and Clk.
ii) Clock gating check between Data2 and Clk: Same as in case 1.
iii) Clock gating check between Data3 and Clk: There is AND type check between
Data3 and Clk.
iv) Clock gating check between Data4 and CLK: As in 1 and 2, there is AND type
check between Data4 and Clk.
Where to use a lock-up latch: As mentioned above, a lock-up latch is used where there is high
probability of hold failure in scan-shift modes. So, possible scenarios where lockup latches are to be
inserted are:
Scan chains from different clock domains: In this case, since the two domains do not interact
functionally, both the clock skew and the uncommon clock path will be large.
Flops within same domain, but at remote places: Flops within a scan chain which are at remote
places are likely to have more uncommon clock path.
In both the above-mentioned cases, there is a great chance that the skew between the launch and
capture clocks will be high. Either the launch clock or the capture clock may have the greater
latency. If the capture clock has greater latency than the launch clock, the hold check will be as
shown in the timing diagram in figure 3. If the skew is large, it will be a tough task to meet the
hold timing without lockup latches.
Figure 2: A path crossing from domain 1 to domain 2 (scope for a lock-up latch insertion)
Figure 3: Timing diagram showing setup and hold checks for path crossing from domain 1 to domain 2
Positive or negative level latch? It depends on the path in which you are inserting the lock-up
latch. Since lock-up latches are inserted for hold timing, they are not needed where the path starts
at a positive edge-triggered flop and ends at a negative edge-
triggered flop. It is to be noted that you will never find scan
paths originating at positive edge-triggered flop and ending at negative edge-triggered flop due to DFT
specific reasons. Similarly, these are not needed where path starts at a negative edge-triggered flop and
ends at a positive edge-triggered flop. For the remaining two kinds of flop-to-flop paths, lockup latches are
required. The polarity of the lockup latch needs to be such that it remains open during the inactive
phase of the clock. Hence,
For flops triggering on positive edge of the clock, you need to have latch transparent when
clock is low (negative level-sensitive lockup latch)
For flops triggering on negative edge of the clock, you need to have latch transparent when
clock is high (positive level-sensitive lockup latch)
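The polarity rule above can be captured as a one-line mapping; this is just a sketch of the rule, not any tool's API:

```python
def lockup_latch_polarity(flop_edge):
    """Return the level sensitivity of the lockup latch for a given flop
    clock edge, per the rule above: the latch must remain transparent
    during the inactive phase of the clock."""
    if flop_edge == "posedge":
        return "negative-level-sensitive"   # open while the clock is low
    if flop_edge == "negedge":
        return "positive-level-sensitive"   # open while the clock is high
    raise ValueError("flop_edge must be 'posedge' or 'negedge'")

print(lockup_latch_polarity("posedge"))  # negative-level-sensitive
```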
Who inserts a lock-up latch: These days, tools exist that automatically add lockup latches where a scan
chain is crossing domains. However, for cases where a lockup latch is to be inserted in an intra-domain
scan chain (i.e. for flops having an uncommon clock path), it has to be inserted during physical
implementation itself, as physical information is not available during scan chain implementation
(scan chain implementation is carried out at the synthesis stage itself).
Which clock should be connected to lock-up latch: There are two possible ways in which we can
connect the clock pin of the lockup latch inserted. It can either have same clock as launching flop or
capturing flop. Connecting the clock pin of lockup latch to clock of capturing flop will not solve the
problem as discussed below.
Lock-up latch and capturing flop having the same clock (Will not solve the problem): In this
case, the setup and hold checks will be as shown in figure 5. As is apparent from the waveforms, the
hold check between domain1 flop and lockup latch is still the same as it was between domain 1 flop and
domain 2 flop before. So, this is not the correct way to insert lockup latch.
Lock-up latch and launching flop having the same clock: As shown in figure 7, connecting the
lockup latch to launch flop’s clock causes the skew to reduce between the domain1 flop and lockup
latch. This hold check can be easily met as both skew and uncommon clock path is low. The hold check
between lockup latch and domain2 flop is already relaxed as it is half cycle check. So, we can say that
the correct way to insert a lockup latch is to insert it closer to launching flop and connect the launch
domain clock to its clock pin.
Why don’t we add buffers: If the clock skew is large at places, it will take a number of buffers to meet
the hold requirement. In a typical scenario, the number of buffers will become so large that it will become a
concern for power and area. Also, since the skew/uncommon clock path is large, the variation due to OCV
will be high. So, it is recommended to keep a bigger margin for hold while signing off for timing. A lock-
up latch provides an area- and power-efficient solution for what a number of buffers together would not be
able to achieve.
Clock latency
Definition of clock latency (clock insertion delay): In sequential designs, each timing
path is triggered by a clock signal that originates from a source. The flops being
triggered by the clock signal are known as sinks for the clock. In general, clock latency
(or clock insertion delay) is defined as the amount of time taken by the clock
signal in traveling from its source to the sinks. Clock latency comprises two
components - clock source latency and clock network latency.
Source latency of clock (Source insertion delay): Source latency is defined as
the time taken by the clock signal in traversing from clock source (may be PLL,
oscillator or some other source) to the clock definition point. It is also known as source
insertion delay. It can be used to model off-chip clock latency when clock source is not
part of the chip itself.
Network latency of clock (Network insertion delay): Network latency is
defined as the time taken by the clock signal in traversing from clock definition point to
the sinks of the clock. Thus, each sink of the clock has a different network latency. If we
talk about the clock, it will have:
o Maximum network latency: Maximum of all the network latencies
o Minimum network latency: Minimum of all the network latencies
o Average network latency: Average of all the network latencies
Total clock latency is given as the sum of source latency and network latency. In other
words, total clock latency at a point is given as follows:
Total clock latency = source latency + network latency
It is generally stated that for a robust clock tree, ‘sum of source latency and network
latency for all sinks of a clock should be equal’. If that is the case, the clock tree is
said to be balanced as this means that all the registers are getting clock at the same
time; i.e., clock skew is zero.
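Putting the latency and skew definitions together in a small sketch (the latency values are made up for illustration):

```python
def total_latency(source_latency, network_latency):
    # Total clock latency at a sink = source latency + network latency
    return source_latency + network_latency

source = 1.2  # ns, common to all sinks of the clock
network = {"FF1": 0.8, "FF2": 0.8, "FF3": 1.1}  # ns, per-sink network latency

totals = {sink: total_latency(source, lat) for sink, lat in network.items()}
print(totals)
# Skew between two sinks = difference of their total latencies:
print(totals["FF3"] - totals["FF1"])                 # ~0.3 ns -> tree not balanced
print(max(network.values()), min(network.values()))  # max / min network latency
```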
Figure 1 : Figure showing source latency and network latency components of clock latency
Figure 1 above shows the two components of clock latency, i.e. source latency and
network latency. Each flop (sink, in general) has its own latency since the path traced by
clock from source to it may be different. The above case may be found in block level
constraints in case of hierarchical designs wherein clock source is sitting outside the
block and clock signal enters the block through a block port. It may also represent a
case of a chip in which the clock source is sitting outside; e.g. some external device is
controlling the chip. In that case, clock source will be sitting inside that device.
How to specify clock latency: In EDA tools, we can model clock latency using the SDC
command ‘set_clock_latency’ to imitate the behavior that will exist after the clock tree has
been built. Using this command, we can specify both the source latency for a clock as well as
the network latency. After clock tree has been built, the latency for the sinks is
calculated by the tool itself from the delays of various elements. However, in case the
clock source is sitting outside, it still needs to be modeled by source latency even after
the clock tree synthesis. To specify clock latency for a clock signal named ‘CLK’, we may
use the SDC command set_clock_latency as follows (the delay values here are illustrative):
set_clock_latency 0.8 [get_clocks CLK]
set_clock_latency 1.2 -source [get_clocks CLK]
The first command specifies the network latency, whereas the second command specifies
the source latency for CLK.
Also read:
Clock - the incharge of synchronous designs
Timing corners - dimensions in timing signoff
Ensuring glitch-free propagation of clock
Clock switching and clock gating checks
Can hold be frequency dependent
Many a time, two or more signals at an analog-digital interface or at the chip interface
have some timing requirement with respect to each other. These requirements are
generally in the form of minimum skew and maximum skew. Data checks come into the
picture between two arbitrary data signals, neither of which is a clock. One of these is called
the constrained pin, which is like the data pin of a flop. The other is called the related pin,
which is like the clock pin of a flop. The figure below shows two data signals at a boundary
(possibly of an analog hard macro) having a minimum skew requirement between them.
Figure 1 : Two signals arriving at a boundary having skew requirement
Data-to-data checks are zero cycle checks: An important difference between normal
setup check (between a clock signal and a data signal) and data-to-data checks is that
data-to-data checks are zero cycle checks while normal setup check is single cycle
check. When we say that data checks are zero-cycle checks, we mean that these are
between two data signals that have launched at the same clock edge with respect to
each other.
As shown in the figure (i) below, traditional setup check is between a data signal
launched at one clock edge and a clock. Since, the data is launched at one clock edge
and is checked with respect to one edge later, it is a single cycle check. On the other
hand, as shown in figure (ii), data-to-data check is between two signals both of which
are launched on the same clock edge. Hence, we can say data-to-data checks are zero
cycle checks.
Figure 2 : (i) Normal setup check between a data signal and a clock signal, (ii) Data-to-data
setup check between two data signals
What command in EDA tools is used to model data-to-data checks: Data checks are
modeled using the SDC command set_data_check, for example (‘x’ and ‘y’ are placeholder
values):
set_data_check -from A -to B -setup x
set_data_check -from A -to B -hold y
Here, A is the related pin and B is the constrained pin. The first command constrains B
to toggle at least ‘x’ time before ‘A’. The second command constrains ‘B’ to toggle at least ‘y’
time after ‘A’.
Data setup time and data hold time: Similar to setup time and hold time, we can
model data-to-data checks through .lib also. There are constructs that can be used for this
purpose: non_seq_setup_rising, non_seq_setup_falling, non_seq_hold_rising and
non_seq_hold_falling. These constructs specify setup and hold data-to-data checks with
respect to the reference signal. ‘rise_constraint’ and ‘fall_constraint’ can be used inside these
to model the setup and hold checks for the rising and falling edges of the constrained signal.
Figure 3 below shows an example of modeling a data setup check through a liberty file.
Data checks can also be used to avoid a race condition (where the order of arrival of two
signals can affect the output and the intention is to get one of the probable outputs by
constraining one signal to come before the other).
How data checks are useful: As we have seen above, data checks provide a
convenient way to constrain two or more data signals with respect to each other.
Had these checks not been there, it would have been a manual effort to check the skew
between the arriving signals and to maintain it. Also, it would not have been
possible to get the optimization done through the implementation tool, as these
requirements would not have been visible to it.
As we discussed in data setup and data hold checks, data setup check of 200 ps means
that constrained data should come at least 200 ps before the reference data. Similarly,
data hold check of 200 ps constrains the constrained data to come at least 200 ps after
the reference data. The same is shown pictorially in figure 1(a) and 1(b).
Figure 1(a): Data setup check of 200 ps constrains the constrained signal to toggle at-least 200 ps before reference
signal toggles.
Figure 1(b): Data hold check of 200 ps constrains the constrained signal to toggle at-least 200 ps after the reference
signal has toggled.
Now, suppose you apply a data setup check of -200 ps instead of 200 ps. This would
mean that the constrained signal can toggle up to 200 ps after the reference signal.
Similarly, a data hold check of -200 ps would mean that the constrained signal can
toggle from 200 ps before the reference signal onwards. If we apply both the checks together, it
would infer that constrained signal can toggle in a window that ranges from 200 ps
before the toggling of reference signal to 200 ps after the toggling of reference signal.
This is pictorially shown in figures 2(a) and 2(b).
Figure 2(a): Negative data setup and hold checks of 200 ps
If we combine the two checks, it implies that the constrained data can toggle anywhere
from 200 ps before to 200 ps after the reference signal. In other words, we have
constrained the constrained signal to toggle in a window of ±200 ps around the reference
signal.
Coming to the given problem, if there are a number of signals required to toggle within a
window of 200 ps, we can consider one of these to act as reference signal and other
signals as constrained signals. The other signals can then be constrained in both setup
and hold with respect to reference signal such that all of these lie within +-100 ps of the
reference signal. The same is shown in figure 3 below:
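The ±100 ps scheme just described can be sketched as a quick arrival-time check (all times hypothetical):

```python
def within_window(reference_arrival, arrivals, half_window=0.1):
    """Check that every constrained signal toggles within +/- half_window
    (ns) of the reference signal, as the combined data setup and data
    hold checks of -100 ps each would require."""
    return all(abs(t - reference_arrival) <= half_window for t in arrivals)

# Reference signal toggles at 5.00 ns; others must land in [4.90, 5.10] ns:
print(within_window(5.00, [4.95, 5.05, 5.08]))  # True: all inside the window
print(within_window(5.00, [4.95, 5.15]))        # False: 5.15 ns is outside
```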
One important characteristic of static timing analysis that must be discussed is that
static timing analysis checks the static delay requirements of the circuit without applying
any vectors, hence, the delays calculated are the maximum and minimum bounds of the
delays that will occur in real application scenarios with vectors applied. This enables the
static timing analysis to be fast and inclusive of all the boundary conditions. Dynamic
timing analysis, on the contrary, applies input vectors and so is very slow; however, it is
necessary to certify the functionality of the design. Thus, static timing analysis guarantees
the timing of the design, whereas dynamic timing analysis guarantees functionality for real
application-specific input vectors.
I hope you’ve found this post useful. Let me know what you think in the comments. I’d
love to hear from you all.
In present day designs, most of the paths (more than 95%) start from and end at flip-flops
(exceptions are there like paths starting from and/or ending at latches). There can be flops which
are positive edge triggered or negative edge triggered. Thus, depending upon the type of
launching flip-flop and capturing flip-flop, there can be 4 cases as discussed below:
1) Setup and hold checks for paths launching from positive edge-triggered flip-flop and
being captured at positive edge-triggered flip-flop (rise-to-rise checks): Figure 1 shows a
path being launched from a positive edge-triggered flop and being captured on a positive edge-
triggered flop. In this case, setup check is on the next rising edge and hold check is on the same
edge corresponding to the clock edge on which launching flop is launching the data.
Figure 1 : Timing path from positive edge flop to positive edge flop (rise to rise path)
Figure 2 below shows the setup and hold checks for positive edge-triggered register to positive
edge-triggered register in the form of waveform. As is shown, setup check occurs at the next
rising edge and hold check occurs at the same edge corresponding to the launch clock edge. For
this case, the setup timing equation can be given as:
Tck->q + Tprop + Tsetup < Tperiod + Tskew (for setup check)
Similarly, the hold timing equation can be given as:
Tck->q + Tprop(min) > Thold + Tskew (for hold check)
Where
Tck->q : Clock-to-output delay of launch register
Tprop : Maximum delay of the combinational path between launch and capture register
Tprop(min) : Minimum delay of the combinational path between launch and capture register
Tsetup : Setup time requirement of capturing register
Thold : Hold time requirement of capturing register
Tskew : Skew between the two registers (clock arrival at capture register - clock arrival at launch register)
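As a quick numeric illustration of the setup inequality above (all delay values here are hypothetical, chosen only to show the arithmetic):

```python
# Hypothetical numbers (ns) to illustrate the rise-to-rise setup check:
# Tck->q + Tprop + Tsetup < Tperiod + Tskew
t_ck_q = 0.10      # clock-to-output delay of launch register
t_prop = 0.65      # max combinational delay between the registers
t_setup = 0.05     # setup requirement of capture register
t_period = 1.00    # clock period
t_skew = 0.02      # capture clock arrival minus launch clock arrival

# Slack is the right-hand side minus the left-hand side of the inequality.
setup_slack = (t_period + t_skew) - (t_ck_q + t_prop + t_setup)
print(f"setup slack = {setup_slack:.2f} ns")  # positive slack means the check passes
```

A positive slack means the data arrives before the setup requirement; a negative slack is a setup violation.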
Also, we show below the data valid and invalid windows.
2) Setup and hold checks for paths launching from positive edge-triggered flip-flop and
being captured at negative edge-triggered flip-flop: In this case, both setup and hold checks are half-cycle checks; setup being checked on the next falling edge at the capture flop and hold on the previous falling edge of the clock at the capture flop (data is launched at the rising edge). Thus, with respect to case 1 above, the setup check has become tighter and the hold check has been relaxed.
Figure 4: Timing path from positive edge flop to negative edge flop (Rise-to-fall path)
Figure 5 below shows the setup and hold checks in the form of waveforms. As is shown, setup
check occurs at the next falling edge and hold check occurs at the previous falling edge
corresponding to the launch clock edge. The equation for setup check can be written, in this case,
as:
Tck->q + Tprop + Tsetup < (Tperiod/2) + Tskew (for setup check)
Also, we show below the data valid and invalid windows. From this figure, we can see that the data valid window is spread evenly on both sides of the launch clock edge.
3) Setup and hold checks for paths launching from negative edge-triggered flip-flop
and being captured at positive edge-triggered flip-flop (fall-to-rise paths): This case is
similar to case 2; i.e., both setup and hold checks are half-cycle checks. Data is launched on
the negative edge of the clock, setup is checked on the next rising edge and hold on the previous
rising edge of the clock.
Figure 7: Timing path from negative edge flop to positive edge flop (fall-to-rise path)
Figure 8 below shows the setup and hold checks in the form of waveforms. As is shown, setup
check occurs at the next rising edge and hold check occurs at the previous rising edge
corresponding to the launch clock edge. The equation for setup check can be written, in this case, as:
Tck->q + Tprop + Tsetup < (Tperiod/2) + Tskew (for setup check)
Also, we show below the data valid and invalid windows. From this figure, we can see that, in this case too, the data valid window spreads evenly on both sides of the launch clock edge.
Figure 9: Figure showing data valid window for fall-to-rise path
4) Setup and hold checks for paths launching from negative edge-triggered flip-flop
and being captured at negative edge-triggered flip-flop (fall-to-fall paths): The interpretation
of this case is similar to case 1. Both launch and capture of data happen at negative edge of the
clock. Figure 10 shows a path being launched from a negative edge-triggered flop and being
captured on a negative edge-triggered flop. In this case, setup check is on the next falling edge
and hold check is on the same edge, corresponding to the clock edge on which the launching flop launches the data.
Figure 10: Path from negative edge flop to negative edge flop (fall to fall path)
Figure below shows the setup and hold checks in the form of waveforms. As is shown, setup
check occurs at the next falling edge and hold check occurs at the same edge corresponding to
the launch clock edge.
The equation for setup check can be given as:
Tck->q + Tprop + Tsetup < Tperiod + Tskew (for setup check)
Also, we show below the data valid and invalid windows.
In the post (Setup and hold – basics of timing analysis), we introduced setup and hold timing
requirements and also discussed why these requirements need to be applied. In this
post, we discuss how these checks are applied for different cases of paths starting from and
ending at different types of sequential elements.
Present day designs are focused mainly on the paths between flip-flops, as flip-flops are the
dominant sequential elements in them. But there are also some level-sensitive elements involved in
data transfer in current-day designs. So, we need to have knowledge of setup and hold checks
for flop-to-latch and latch-to-flop paths too. In this post, we will be discussing the former case.
In total, there can be 4 cases involved in flop-to-latch paths, as discussed below:
1) Paths launching from positive edge-triggered flip-flop and being captured at positive
level-sensitive latch: Figure 1 shows a path being launched from a positive edge-triggered flop
and being captured on a positive level-sensitive latch. In this case, setup check is on the same
rising edge (without time borrow) and next falling edge (with time borrow) and hold check on
the previous falling edge with respect to the edge at which data is launched by the launching
flop.
Figure 1: Timing path from positive edge flop to positive level latch
Figure below shows the waveforms for setup and hold checks in case of paths starting from a
positive edge-triggered flip-flop and ending at a positive level-sensitive latch. As can be figured
out, setup and hold check equations can be described accordingly.
2) Paths launching from positive edge-triggered flip-flop and being captured at negative
level-sensitive latch: Figure 3 shows a path starting from a positive edge-triggered flip-flop and
being captured at a negative level-sensitive latch. In this case, setup check is on the next falling
edge (without time borrow) and on the next rising edge (with time borrow). Hold check is on the
same edge with respect to the edge at which data is launched (zero-cycle hold check).
Figure below shows the waveforms for setup and hold checks in case of paths starting from a
positive edge triggered flip-flop and ending at a negative level-sensitive latch. As can be figured
out, setup and hold check equations can be described as:
3) Paths launching from negative edge-triggered flip-flop and being captured at positive
level-sensitive latch: Figure 5 shows a path starting from a negative edge-triggered flip-flop and
being captured at a positive level-sensitive latch. In this case, setup check is on the next rising
edge (without time borrow) and the next falling edge (with time borrow). Hold check is on the
same edge with respect to the edge at which data is launched (zero-cycle hold check).
Figure below shows the waveforms for setup and hold checks in case of paths starting from a
negative edge triggered flip-flop and ending at a positive level-sensitive latch. As can be figured
out, setup and hold check equations can be described as:
4) Paths launching from negative edge-triggered flip-flop and being captured at
negative level-sensitive latch: Figure 7 shows a path starting from a negative edge-triggered flip-
flop and being captured at a negative level-sensitive latch. In this case, setup check is on the
same edge (without time borrow) and on the next rising edge (with time borrow). Hold check is
on the previous rising edge with respect to the edge at which data is launched.
Figure below shows the waveforms for setup and hold checks in case of paths starting
from a negative edge triggered flip-flop and ending at a negative level-sensitive latch.
As can be figured out, setup and hold check equations can be described as:
Figure 2: Setup and hold check waveform for positive latch to positive register timing path
2. Positive level-sensitive latch to negative edge-triggered register: Figure 3 below
shows a timing path from a positive level-sensitive latch to a negative edge-triggered
register. In this case, the setup check will be a half-cycle check, with a half-cycle hold check.
Time borrowed by the previous stage will be subtracted from the time available to the present stage.
Figure 3: A timing path from positive level-sensitive latch to negative edge-triggered register
Timing waveforms corresponding to setup check and hold check for the timing path starting
from a positive level-sensitive latch and ending at a negative edge-triggered register are
shown in figure 4 below:
Figure 4: Setup and hold check waveform for timing path from positive latch to negative register
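The time borrowing arithmetic mentioned above (time borrowed by one stage being subtracted from the next) can be sketched numerically; all the numbers below are hypothetical:

```python
# Sketch of time borrowing at a level-sensitive latch (hypothetical numbers, ns).
latch_open = 5.0      # time at which the capture latch becomes transparent
latch_close = 10.0    # time at which the latch closes (end of borrow window)
data_arrival = 6.2    # arrival time of data at the latch

# If data arrives after the latch opens, the excess is "borrowed"
# from the next stage; it can never exceed the transparency window.
time_borrowed = max(0.0, data_arrival - latch_open)
assert time_borrowed <= latch_close - latch_open, "arrival beyond close edge: setup violation"

# The next stage starts timing from the moment data actually arrived,
# so its available time shrinks by the borrowed amount.
print(f"time borrowed = {time_borrowed:.1f} ns")
```

Here 1.2 ns is borrowed from the next stage, so the next stage has 1.2 ns less time to meet its own setup check.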
Figure 5: Timing path from negative level-sensitive latch to positive edge-triggered register
Timing waveforms for path from negative level-sensitive latch to positive edge-triggered
flop are shown in figure 6 below:
Figure 6: Waveform for setup check and hold check corresponding to timing path from negative latch to positive
flop
Figure 8: Timing waveform for path from negative latch to negative flip-flop
Clock signals occupy a very important place throughout the chip design stages. Since
state transitions happen on clock transitions, all the analyses, including
verification, static timing analysis and gate-level simulations, revolve around these clock
signals. If static timing analysis can be considered as a body, then clock is its
blood. Also, during physical implementation of the design, special care has to be given
to the placement and routing of clock elements, otherwise the design is likely to fail.
Clock elements are responsible for almost half the dynamic power consumption in the
chip. Some designs have a single clock source, while some designs have multiple clock sources.
The clock signal is
distributed in the design in the form of a tree; the leaves of the tree being analogous to the
sequential devices being triggered by the clock signal and the root being analogous to
the clock source. That is why the distribution of clock in the design is termed as a clock
tree. Normally (except when intentional skew is introduced to cater to
some timing-critical paths), the clock tree is designed in such a way that it is balanced.
By a balanced clock tree, we mean that the clock signal reaches each and every element
of the design at almost the same time. Clock tree synthesis (placing and routing the clock
tree elements) is an important step in the implementation process. Special cells and
routing techniques are used for this purpose.
Clock domains: By clock domain, we mean ‘the set of flops being driven by the clock
signal’. For instance, the flops driven by system clock constitute system domain.
Similarly, there may be other domains. There may be multiple clock domains in a
design; some of these may be interacting with each other. For interacting clock
domains, there must be some phase relationship between the clock signals, otherwise
metastability failures are bound to happen.
Clock definition: SDC provides commands (create_clock and create_generated_clock) to
specify a signal as a clock. We have to pass the period of the clock, the clock definition
point, its reference clock (if it is a generated clock, as discussed below), duty cycle and so on.
Master and generated clocks: EDA tools have the concept of master and generated
clocks. A generated clock is the one that is derived from another clock, known as its
master clock. The generated clock may be of the same frequency as, or a different frequency than, its master clock (for example, a divided or multiplied version of it).
Some terminology related to clock: There are different terms related to clock signals,
described below:
Leading and trailing clock edge: When clock signal transitions from ‘0’ to ‘1’,
the clock edge is termed as leading edge. Similarly, when clock signal transitions from
‘1’ to ‘0’, the clock edge is termed as trailing edge.
Launch and capture edge: Launch edge is that edge of the clock at which data
is launched by a flop. Similarly, capture edge is that edge of the clock at which data is
captured by a flop.
Clock skew: Clock skew is defined as the difference in arrival times of clock
signals at different leaf pins. Considering a set of flops, skew is the difference in the
minimum and maximum arrival times of the clock signal. Global skew is the clock skew
for the whole design. On the contrary, considering only a portion of the design, the skew
is termed as local skew.
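The skew definitions above can be sketched with a small computation over hypothetical leaf-pin arrival times (instance names and values are made up for illustration):

```python
# Hypothetical clock arrival times (ns) at the leaf pins of four flops.
arrival = {"ff1": 1.20, "ff2": 1.35, "ff3": 1.28, "ff4": 1.22}

def skew(arrivals):
    """Skew over a set of leaf pins: max arrival minus min arrival."""
    vals = list(arrivals)
    return max(vals) - min(vals)

# Global skew considers every flop in the design...
global_skew = skew(arrival.values())
# ...while local skew considers only a portion of it, say ff1 and ff3.
local_skew = skew([arrival["ff1"], arrival["ff3"]])

print(f"global skew = {global_skew:.2f} ns, local skew = {local_skew:.2f} ns")
```

As expected, the local skew over a subset of flops can never exceed the global skew.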
Synchronizers
Modern VLSI designs have very complex architectures and multiple clock sources.
Multiple clock domains interact within the chip. Also, there are interfaces that connect
the chip to the outside world. If these different clock domains are not properly
synchronized, metastability events are bound to happen and may result in chip failure.
Synchronizers are used for signals crossing clock domain boundaries and are a must where
two clock domains interact. Designing synchronizers such that no failures due to
metastability happen is an art. For systems with only one clock domain, synchronizers
are not required; however, asynchronous signals cause catastrophic metastability failures
when introduced into a clock domain. So, the first thing that arises in one's mind is to
find ways to avoid metastability failures. But the fact is that metastability cannot be
avoided; we have to learn to tolerate it. A synchronizer is a circuit that converts an
asynchronous signal/a signal from a different clock domain into the recipient clock domain
so that it can be captured without introducing any metastability failure into the downstream
logic. A well-designed synchronizer improves the mean time between failures (MTBF) by a
huge factor. Thus, a good synchronizer must be reliable with high MTBF, should have
low latency from source to destination domain and should have low area/power impact.
When a signal is passed from one clock domain to another clock domain, the circuit that
receives the signal needs to synchronize it. Whatever metastability effects are to be
caused due to this have to be absorbed in the synchronizer circuit only. Thus, the purpose
of a synchronizer is to shield the downstream logic from the metastable state of the first flip-flop.
Flip-flop based synchronizer (Two flip-flop synchronizer): This is the most simple
and most common synchronization scheme and consists of two or more flip-flops in
chain working on the destination clock domain. This approach allows for an entire clock
period for the first flop to resolve metastability. Let us consider the simplest case of a
flip-flop synchronizer with 2 flops as shown in figure. Here, Q2 goes high 1 or 2 cycles
after the input is asserted, depending upon when the input changes with respect to the
destination clock edge.
As said earlier, the two flop synchronizer converts a signal from source clock domain to
destination clock domain. The input to the first stage is asynchronous to the destination
clock. So, the output of the first stage (Q1) might go metastable from time to time. However,
as long as metastability is resolved before the next clock edge, the output of the second stage
(Q2) should have valid logic levels (as shown in figure 3). Thus, the asynchronous input does
not disturb the downstream logic. It is, however, possible that metastability at the first stage
persists until the next clock edge and is sampled by the second stage. In that case, output of
the second stage will also go metastable. If the probability of this event is high, then you
need to consider having a three stage synchronizer.
The two flops should be placed as close as possible to allow the metastability at the
first stage output maximum time to get resolved. Some ASIC libraries also have
built-in synchronizer cells; these have better MTBF but use very large flops and,
hence, consume more power.
The two flop synchronizer is the most basic design; all other synchronizers are based
upon it.
The source domain signal is expected to remain stable for a minimum of two destination
clock cycles so that the first stage is guaranteed to sample it by the second clock edge. In
some cases, it is not even possible to predict the destination domain frequency. In such
cases, a handshaking mechanism may be used.
The two flop synchronizer must be used to synchronize a single bit of data only.
Using multiple two flop synchronizers to synchronize multi-bit data may lead to
catastrophic results, as some bits might pass through in the first cycle and others in the
second cycle. Thus, the destination domain FSM may go into some undesired state.
Another practice that is forbidden is to synchronize the same bit by two different
synchronizers. This may lead to one of these becoming 0 and the other becoming 1,
resulting in an inconsistent state.
Two stages in a flop synchronizer are not enough for very high speed clocks,
as the MTBF becomes significantly low (e.g., in processors, where clocks run in excess of 1
GHz). In such cases, adding one extra stage helps.
MTBF decreases almost linearly with the number of synchronizers in the system.
Thus, if your system uses 1000 synchronizers, each of these must be designed with
at least 1000 times more MTBF than the overall reliability target.
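This scaling can be sketched with the commonly used synchronizer MTBF model, MTBF = e^(t_met/tau) / (T0 * f_clk * f_data); all parameter values below are hypothetical, chosen only to make the arithmetic concrete:

```python
import math

def mtbf(t_met, tau, t0, f_clk, f_data):
    """Classic synchronizer MTBF model (all parameter values here are hypothetical):
    t_met : time available for metastability resolution (s)
    tau   : flop resolution time constant (s)
    t0    : metastability window constant (s)
    f_clk : destination clock frequency (Hz)
    f_data: data toggle rate (Hz)"""
    return math.exp(t_met / tau) / (t0 * f_clk * f_data)

single = mtbf(t_met=0.8e-9, tau=20e-12, t0=100e-12, f_clk=500e6, f_data=50e6)

# With N synchronizers, the system failure rate is roughly N times higher,
# so the system-level MTBF is roughly the single-synchronizer MTBF divided by N.
n_syncs = 1000
system = single / n_syncs
print(f"single MTBF ~ {single:.3e} s, system MTBF with {n_syncs} syncs ~ {system:.3e} s")
```

The exponential dependence on t_met/tau is why adding one extra synchronizer stage (one more clock period of resolution time) improves MTBF so dramatically.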
The two flop synchronizer is suitable only when there is one bit of data to transfer between
the two clock domains. The solution for multi-bit transfer is to implement handshaking based
synchronization, where the sender places the data on the bus and asserts the 'REQ' signal.
When it goes high, the receiver knows data is stable on the bus and it is safe to sample the
data. After sampling, the receiver asserts the 'ACK' signal. This signal is synchronized to
the source domain and informs the sender that the data has been sampled successfully and it
may send new data. Handshaking based synchronizers
offer good, reliable communication but reduce the data transmission bandwidth, as it takes
several clock cycles to complete one transfer. Handshaking allows two circuits to
effectively communicate with each other when the response time of one or both circuits is
unpredictable.
Handshaking protocol based synchronization technique:
1.) Sender places stable data on the bus and asserts the REQ signal.
2.) Receiver, on seeing the synchronized REQ high, samples the data and asserts the ACK signal.
3.) Sender, on seeing the synchronized ACK, deasserts REQ.
4.) Receiver deasserts ACK to inform the sender that it is ready to accept another transfer.
As discussed, the two flop synchronizer is hazardous if used to synchronize data which is
more than 1 bit in width. In such situations, we may use a mux-based synchronization scheme.
In this, the source domain sends an enable signal indicating that the data has been changed.
This enable is synchronized to the destination domain using a two flop synchronizer. The
synchronized signal acts as an enable indicating that the data on the data bus from the
source is stable and the destination domain may latch the data. As shown in the figure, a two
flop synchronizer is used only for the enable signal, not for the data bus itself.
Two clock FIFO synchronizer: FIFO synchronizers are the most common fast
synchronizers used in the VLSI industry. There is a ‘cyclic buffer’ (dual port RAM) that
is written into by the data coming from the source domain and read by the destination
domain. There are two pointers maintained; one corresponding to write, the other
corresponding to read. These pointers are used by the two domains to conclude whether the
FIFO is empty or full. For doing this, the two pointers (from different clock domains) must be
compared. Thus, the write pointer has to be synchronized into the read clock domain and
vice-versa. Thus, it is not the data, but the pointers, that are synchronized. FIFO based
synchronization is used when there is a need for speed matching or data width matching. In
case of speed matching, the faster port of the FIFO normally handles burst transfers while
the slower port handles constant-rate transfers. In FIFO based synchronization, the average
rates into and out of the FIFO are the same in spite of different access speeds and types.
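The text above notes that it is the pointers, not the data, that are synchronized; in practice this is safe because the pointers are typically Gray-coded, so that only one bit changes per increment and a two flop synchronizer can never sample an inconsistent multi-bit value. A minimal sketch of the conversions (illustrative only):

```python
def bin_to_gray(b: int) -> int:
    """Convert a binary pointer value to its Gray-code equivalent."""
    return b ^ (b >> 1)

def gray_to_bin(g: int) -> int:
    """Recover the binary value from a Gray code by cumulative XOR."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Successive pointer values differ in exactly one bit once Gray-coded.
for v in range(15):
    diff = bin_to_gray(v) ^ bin_to_gray(v + 1)
    assert bin(diff).count("1") == 1      # exactly one bit changes per increment
    assert gray_to_bin(bin_to_gray(v)) == v
print("Gray-coded pointer increments change one bit at a time")
```

Because only one bit is in flight at any clock edge, the worst the synchronizer can do is report a slightly stale pointer, which makes the FIFO look conservatively fuller or emptier, never corrupt.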
Integrated circuits are designed to work for a range of temperatures and voltages, and
not just for a single temperature and voltage. These have to work under different
environmental conditions and different electrical setup and user environments. For
instance, the temperature in the internals of an automobile may reach as high as 150
degrees while operating. Also, automobiles may have to work in colder regions where
temperatures may reach -40 degrees during winters. So, a chip designed for
automobiles has to be designed so as to be able to work in temperatures ranging from -
40 to 150 degree Celsius. On the other hand, consumer electronics may have to work in
the range of -20 to +40 degrees only. Thus, depending upon the application, the chip
has to be robust enough to handle varying surrounding temperatures. Not just
surrounding temperatures, the voltage supplied by the voltage source may vary. The
battery may have an output voltage range. Also, the voltage regulator sitting inside or
outside the chip may have some inaccuracy range defined. Let us say, a SoC has a
nominal operating voltage of 1.2V with 10% variation. Thus, it can operate at any
voltage from 1.08 V to 1.32 V. The integrated circuits have to be tolerant enough to
handle these variations. And not just these variations: the process by which the integrated
circuits are manufactured also has variations, owing to its microscopic nature. For example, while
performing etching, the depth of etching may vary from wafer to wafer, and from die to
die. Not just that, there may be intra chip process variations too. For instance, an AND
gate may be placed inside an area of the chip where the signal density is very high. It
will behave differently from an isolated AND gate. Depending upon these, the behavior
(delay, static and dynamic power consumption etc) of cells on chip vary. These
variations are together referred as PVT (Process Voltage Temperature) variations. The
behavior of the devices also varies according to the PVT variations. The library (liberty)
models of the cells are characterized for cell delays, transitions, static and dynamic
power corresponding to different PVT combinations. Not just for cells, for nets too, these
variations are possible. The net parameters (resistance, capacitance and inductance)
may also vary. These parameters contribute to cell delay as well, in the form of load. In
addition, nets introduce delays of their own. Hence, one may get nets with high or low delay.
So, these variations also have to be taken into account for robust integrated circuit manufacture.
This variation in net characteristics can be modeled as RC variation, as it accounts
for changes in resistance and capacitance (ignoring inductance) of the net.
Figure 1: A racing car. (Taken from en.wikipedia.com)
With proper techniques, the patterns of the variations for both the cell and net
parameters (delay, power, resistance and capacitance) are characterized and their
minima and maxima are recorded. Each such minimum and maximum can be termed as a
corner. Let us term each minimum/maximum in cell characteristics a 'PVT corner' and each
minimum/maximum in net characteristics an 'extraction corner'. Each combination of PVT and
extraction corners is referred to as a 'timing corner', as it represents a point where timing
will be extreme.
There is an assumption that if the setup and hold conditions are met for the design at
these corners, these will be met at intermediate points and it will be safe to run under all
conditions. This is true in most of the cases, not always. There is always a trade-off
between number of signed-off corners and the sign-off quality.
For bigger technologies, say 250 nm, only two corners used to be sufficient, one that
showed maximum cell delay and the other that showed least cell delay. Net variations
could be ignored for such technologies. In all, there used to be 2 PVT and 1 extraction
corner. As we go down technology nodes, net variations start coming into picture. Also,
cell characteristics do not show a linear behavior. Therefore, there is increased number
of PVT as well as extraction corners for lower technology nodes. For 28 nm, say, there
can be 8 PVT corners as well 8 extraction corners. The number of corners differs from
foundry to foundry. The chip has to be signed off in each and every corner to ensure it
works everywhere. However, we may choose to sign off in fewer corners, applying some extra
uncertainty as margin in lieu of not signing off at the remaining timing corners. The timing
analyst needs to decide what is appropriate depending upon the resources and schedule.
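The corner counting described above (e.g., 8 PVT corners and 8 extraction corners at 28 nm) can be sketched as a simple cross product; the corner names below are hypothetical:

```python
from itertools import product

# Hypothetical PVT corners: process (ss/ff), voltage and temperature extremes.
pvt_corners = ["ss_0.9V_125C", "ss_0.9V_m40C", "ss_1.1V_125C", "ss_1.1V_m40C",
               "ff_0.9V_125C", "ff_0.9V_m40C", "ff_1.1V_125C", "ff_1.1V_m40C"]
# Hypothetical extraction corners (net RC extremes).
rc_corners = ["cworst", "cbest", "rcworst", "rcbest",
              "cworst_T", "cbest_T", "rcworst_T", "rcbest_T"]

# Every (PVT, extraction) combination is one timing corner to sign off.
timing_corners = [f"{pvt}__{rc}" for pvt, rc in product(pvt_corners, rc_corners)]
print(len(timing_corners))  # 8 PVT x 8 extraction = 64 timing corners
```

This is why the sign-off effort grows so quickly at lower nodes: each added PVT or extraction corner multiplies, rather than adds to, the number of timing corners.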
1) Buffers/inverters: Since, there is only one input for a buffer/inverter, the glitch may
occur on the output of these gates only through coupling with other signals in the
vicinity. If we ensure that the buffer/inverter has good drive strength and that the load
and transition at its output are under a certain limit, we can be certain that the glitch will
not occur.
2) Combinational gates: Here, the clock is at one input of a multi-input gate (say, an AND
gate) and an enable is at another input. Two cases are possible:
i) The other input is static: By static, we mean the other input will not change on the fly.
In other words, whenever the enable changes, the clock will be off. So, the enable will
not cause the waveform at the output of the gate to change. This case is similar to a
buffer/inverter, as the other input will not cause the shape of the output pulse to change.
ii) The other input is toggling: In this case, the enable might cause the waveform at the
output of the gate to change. To ensure that no glitch is caused by this, certain
requirements related to skew between data and clock have to be met, which will be
discussed later in the text. These requirements are termed as clock gating checks.
3) Sequential gates: There may also be sequential gates in the clock path, say a flop, a
latch or an integrated clock gating cell, with the clock at its clock input and the enable for
the clock coming at its data input. The output of these cells will be a clock pulse.
For these also, two cases are possible, as in case 2. In other words, if the enable
changes when the clock is off, the enable is said to be static. In that case, the output either
has the clock or does not have the clock. On the other hand, if the enable is toggling while the
clock is present at the clock input, we can ensure a glitch-free output by meeting the setup and
hold checks for the enable signal with respect to the clock input.
As discussed above, to ensure a glitch free propagation of clock at the output of the
combinational gates, we have to ensure some timing requirements between the enable
signal and clock. These timing requirements ensure that there is no functionally
unwanted pulse in clock path. If we ensure these timing requirements are met, there will
be no functional glitch in clock path. However, glitches due to crosstalk between signals
can still occur. There are other techniques to prevent glitches due to crosstalk. The
functional glitches in clock path can be prevented by ensuring the above discussed
timing requirements. In STA, these requirements are enforced on designs through
timing checks known as clock gating checks. By ensuring these checks are applied
and taken care of properly, an STA engineer can sign off for functional glitches. In later
posts, we will be dealing with these checks in more detail.
Definition of clock gating check: A clock gating check is a constraint, either applied or
inferred automatically by tool, that ensures that the clock will propagate without any
glitch through the gate.
Types of clock gating checks: Fundamentally, all clock gating checks can be
categorized into two types:
AND type clock gating check: Let us say we have a 2-input AND gate in which one of
the inputs has a clock and the other input has a data which will toggle while the clock is
still on.
Since the clock is free-running, we have to ensure that the change of state of the enable
signal does not cause the output of the AND gate to toggle. This is only possible if the
enable input toggles when the clock is in the '0' state. As is shown in figure 3 below, if 'EN'
toggles when 'CLK_IN' is high, the clock pulse gets clipped. In other words, we do not
get the full duty cycle of the clock. Thus, this is a functional/architectural miss causing a
glitch in the clock path. As is evident in figure 4, if 'EN' changes while 'CLK_IN' is low, there is
no change in the clock duty cycle. Hence, this is the right way to gate a clock signal with an
enable signal; i.e., make the enable toggle only when the clock is low.
Figure 3: Clock being clipped when ‘EN’ changes when ‘CLK_IN’ is high
Figure 4: Clock waveform not being altered when ‘EN’ changes when ‘CLK_IN’ is low
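A tiny sampled-waveform sketch of this clipping behavior (purely illustrative; the waveforms are hypothetical sample vectors, not a real timing simulation):

```python
# Sampled waveforms: two clock pulses, each two samples wide.
clk = [0, 1, 1, 0, 0, 1, 1, 0]

def gate(clk, en):
    """AND-gate clock gating: output is clk AND en at each sample."""
    return [c & e for c, e in zip(clk, en)]

# 'EN' deasserted at t=2, while CLK_IN is high: the first pulse gets clipped.
en_bad = [1, 1, 0, 0, 0, 0, 0, 0]
print(gate(clk, en_bad))   # [0, 1, 0, 0, 0, 0, 0, 0] -> truncated pulse (glitch)

# 'EN' deasserted at t=4, while CLK_IN is low: the first pulse passes through whole.
en_good = [1, 1, 1, 1, 0, 0, 0, 0]
print(gate(clk, en_good))  # [0, 1, 1, 0, 0, 0, 0, 0] -> full first pulse, clean gating
```

In the first case the output pulse is narrower than the clock pulse, exactly the clipped waveform of figure 3; in the second, every pulse that appears at the output is full width.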
Theoretically, ‘EN’ can launch from either positive edge-triggered or negative edge-
triggered flops. In case ‘EN’ is launched by a positive edge-triggered flop, the setup and
hold checks will be as shown in figure 5. As shown, setup check in this case is on the
next positive edge and hold check is on the next negative edge. This hold check is a
non-zero-cycle check, so it depends on absolute delays; however, the ratio of
maximum and minimum delays of cells across extreme operating conditions may be as high
as 3. So, architecturally, this situation cannot guarantee the clock to pass
under all conditions.
Figure 5: Clock gating setup and hold checks on AND gate when 'EN' launches from a positive edge-triggered flip-
flop
On the contrary, if ‘EN’ launches from a negative edge-triggered flip-flop, the setup check
is formed with respect to the next rising edge and the hold check is on the same falling
edge (zero-cycle) as that of the launch edge. The same is shown in figure 6. Since, in
this case, the hold check is zero-cycle, both the checks are possible to meet for all operating
conditions; hence, this solution will guarantee the clock to pass under all operating
conditions, provided the setup check is met for the worst-case condition. The inactive clock
state, as evident, in this case, is '0'.
Figure 6: Clock gating setup and hold checks on AND gate when ‘EN’ launches from negative edge-triggered flip-
flop
OR type clock gating check: Similarly, since the off-state of OR gate is 1, the enable
for an OR type clock gating check can change only when the clock is at ‘1’ state. That
is, we have to ensure that the change of state of enable signal does not cause the
output of the OR gate to toggle. Figure 8 below shows that if ‘EN’ toggles when ‘CLK_IN’ is
high, there is no change in duty cycle. However, if ‘EN’ toggles when ‘CLK_IN’ is low
(figure 9), the clock pulse gets clipped. Thus, ‘EN’ must be allowed to toggle only when
‘CLK_IN’ is high.
Figure 8: Clock waveform not being altered when 'EN' changes when 'CLK_IN' is high
Figure 9: Clock being clipped when 'EN' changes when 'CLK_IN' is low
As in case of AND gate, here also, ‘EN’ can launch from either positive or negative edge
flops. In case ‘EN’ launches from negative edge-triggered flop, the setup and hold
checks will be as shown in figure 10. The setup check is on the next negative edge
and the hold check is on the next positive edge. As discussed earlier, this cannot guarantee
glitch-free propagation of the clock.
Figure 10: Clock gating setup and hold checks on OR gate when ‘EN’ launches from negative edge-triggered flip-
flop
If ‘EN’ launches from a positive edge-triggered flip-flop, setup check is with respect to
next falling edge and hold check is on the same rising edge as that of the launch edge.
The same is shown in figure 11. Since, the hold check is 0 cycle, both setup and hold
checks are guaranteed to be met under all operating conditions provided the path has
been optimized to meet setup check for worst case condition. The inactive clock state,
evidently, in this case, is '1'.
Figure 11: Clock gating setup and hold checks on OR gate when 'EN' launches from a positive edge-
triggered flip-flop
We have, thus far, discussed two fundamental types of clock gating checks. There may
be complex combinational cells other than 2-input AND or OR gates. However, for these
cells, too, the checks we have to meet between the clock and enable pins will be of the
above two types only. If the enable can change during low phase of the clock only, it is
said to be AND type clock gating check and vice-versa.
SDC command for application of clock gating checks: In STA, clock gating checks
can be applied with the help of SDC command set_clock_gating_check.
Figure 1: MUX with Data as select dynamically selecting the clock signal to propagate to output
This scenario is shown in figure 1 above. This situation normally arises when ‘Data’ acts as
clock select and dynamically selects which of the two clocks will propagate to the output. The
function of the MUX is given as:
CLK_OUT = Data.CLK1 + Data’.CLK2
The internal structure (in terms of basic gates) is as shown below in figure 2.
2. Between CLK2 and Data: This scenario also follows scenario 1. The type of clock gating
check formed will be determined by the inactive state of the clock. Thus, the type of clock
gating check to be applied, in this case, depends upon the inactive state of the other clock.
If it is '0', an AND-type check will be formed. On the other hand, if it is '1', an OR-type
check will be formed.
Case 2: Clock signal is at select line. This situation is most common in case of Mux-based
configurable clock dividers wherein output clock waveform is a function of the two data values.
Figure 3: Combination of Data1 and Data2 determines if CLK or CLK' will propagate to the output
In this case too, there will be two kinds of clock gating checks formed:
i) Between CLK and Data1: Here, both CLK and Data1 are input to a 2-input AND gate, hence,
there will be AND type check between CLK and Data1. The following SDC command will
serve the purpose:
set_clock_gating_check -high 0.1 [get_pins MUX/Data1]
The above command will constrain an AND-type clock gating check of 100 ps on Data1 pin.
ii) Between CLK and Data2: As is evident from figure 4, there will be AND type check between
CLK’ and Data2. This means Data2 can change only when CLK’ is low. In other words, Data2
can change only when CLK is high. This means there is OR type check between CLK and
Data2. The following command will do the job:
set_clock_gating_check -low 0.1 [get_pins MUX/Data2]
The above command will constrain an OR-type clock gating check of 100 ps on Data2 pin.
Thus, we have discussed how there are clock gating checks formed between different
signals of a MUX.
Similarly, for other types of synchronizers as well, you can specify false paths.
False paths for static signals arising due to merging of modes: Suppose you have a structure
as shown in figure 1 below. You have two modes, and the path to multiplexer output is different
depending upon the mode. However, in order to cover timing for both the modes, you have to
keep the “Mode select bit” unconstrained. This results in paths being formed through the
multiplexer select as well. You can set a false path through the select of the multiplexer, as it
will be static in both the modes, provided there are no special timing requirements related to
mode transition on this signal. Specifically speaking, for the scenario shown in figure 1,
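The corresponding constraint would be along the following lines (a sketch; the instance and pin name MUX/sel are hypothetical):
set_false_path -through [get_pins MUX/sel]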
Architectural false paths: There are some timing paths that are never possible to occur. Let us
illustrate with the help of a hypothetical, but very simplistic example that will help understand
the scenario. Suppose we have a scenario in which the select signals of two 2:1 multiplexers are
tied to same signal. Thus, there cannot be a scenario where data through in0 pin of MUX0 can
traverse through in1 pin of MUX1. Hence, it is a false path by design architecture. Figure 3
below depicts the scenario.
Specifying false path: The SDC command to specify a timing path as false path is "set_false_path".
We can apply false path in following cases:
From register to register paths
o set_false_path -from regA -to regB
Paths being launched from one clock and being captured at another
o set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
Through a signal
o set_false_path -through [get_pins AND1/B]
Multicycle paths handling in STA
In the post Multicycle paths - the architectural perspective, we discussed the
architectural aspects of multicycle paths. In this post, we will discuss how multicycle
paths are handled in backend optimization and timing analysis:
How multi-cycle paths are handled in STA: By default, in STA, all the timing paths
are considered to have default setup and hold timings; i.e., all the timing paths should
be covered in either half cycle or single cycle depending upon the nature of path
(see setup-hold checks part 1 and setup-hold checks part 2 for reference). However, it
is possible to convey the information to STA engine regarding a path being multi-cycle.
There is an SDC command "set_multicycle_path" for the same. Let us elaborate it with
the help of an example:
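The command referred to (shown as an image in the original post) would be of the following form, applying a setup multicycle of 3; the start and end points regA/CLK and regB/D are hypothetical:
set_multicycle_path 3 -setup -from [get_pins regA/CLK] -to [get_pins regB/D]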
The above command will shift both the setup and hold checks forward by two cycles. That is,
the setup check will now become a 3-cycle check and the hold check a 2-cycle check, as shown
in blue in figure 4. This is because, by default, the STA engine considers the hold check one
active edge prior to the setup check, which, in this case, is after 3 cycles.
Figure 4: Setup and hold checks before and after applying multicycle for setup-only
However, this is not the desired scenario in most of the cases. As we discussed earlier,
multi-cycle paths are achieved by either gating the clock path or data path for required
number of cycles. So, the required hold check in most cases is a 0 cycle check. This is done
through the same command with the switch "-hold", telling the STA engine to pull the hold
check back to a zero cycle check.
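The hold exception (also shown as an image in the original post) would take the following form; the start and end points are again hypothetical:
set_multicycle_path 2 -hold -from [get_pins regA/CLK] -to [get_pins regB/D]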
The above command will bring back the hold check 2 cycles back to zero cycle. This is
as shown in figure 5 in blue.
Figure 5: Setup and hold checks after applying multi-cycle exceptions for both setup and hold
Setting a multi-cycle path for setup affects the hold check by same number of
cycles as setup check in the same direction. However, applying a multi-cycle path
for hold check does not affect setup check.
So, in the above example, both the statements combined will give the desired setup and
hold checks. Please note that there might be cases where only a setup or only a hold
multicycle is sufficient; that is a need of the design and depends on how the FSM has
been modeled.
What if both clock periods are not equal: In the above example, for simplicity, we
assumed that the launch and capture clock periods are equal. However, this may not
always be true. As discussed in multicycle path - the architectural perspective, it makes
more sense to have multi-cycle paths where there is a difference in clock periods. The
setup and hold checks for multicycle paths are not as simple in this case as they were
when we considered both the clocks to be of the same frequency. Let us consider a case where
launch clock period is twice the capture clock period as shown in figure 6 below.
Figure 6: Default setup and hold checks for case where capture clock period is half that of launch clock
Now, the question is: when defining a multi-cycle path, which clock's period will be added to the
setup check, the launch clock's or the capture clock's? The answer depends upon the
architecture and FSM of the design. Once you know it, the same can be modelled in the timing
constraints. There is a switch in the SDC command to specify which of the clock periods is to be added.
"set_multicycle_path -start" means that the path is a multi-cycle for that many cycles of
launch clock. Similarly, "set_multicycle_path -end" means that the path is a multicycle
for that many cycles of capture clock. Let the above given path be a multicycle of 2. Let
us see below how it changes with -start and -end options.
Figure 8: Setup and hold checks with -start option provided with set_multicycle_path
Figure 9: Setup and hold checks with -end option provided with set_multicycle_path
Why is it important to apply multi-cycle paths: To achieve optimum area, power and
timing, all the timing paths must be timed at the desired frequencies. Optimization
engine will know about a path being multicycle only when it is told through SDC
commands in timing constraints. If we don't specify a multicycle path as multicycle, the
optimization engine will consider it a single-cycle path and will try to use cells of larger
drive strength to meet timing. This will result in more area and power; hence, more cost.
So, all multicycle paths must be correctly specified as multicycle paths during timing
optimization and timing analysis.
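The edge arithmetic discussed above can be sketched in a few lines. The function below is a simplified model assuming ideal clocks aligned at t=0, with no skew or uncertainty:

```python
def setup_check_window(launch_period, capture_period, mcp=1, reference="end"):
    """Time available to the data path for the setup check, measured
    from a launch edge at t=0. Assumes ideal clocks aligned at t=0.

    The default single-cycle check ends at the first capture edge after
    launch; a multicycle of N widens the window by (N - 1) cycles of the
    reference clock: the capture clock for -end, the launch clock for -start.
    """
    period = capture_period if reference == "end" else launch_period
    return capture_period + (mcp - 1) * period

# Same-frequency clocks: a multicycle of 3 turns the 1-cycle check into 3 cycles.
print(setup_check_window(10, 10))         # 10
print(setup_check_window(10, 10, mcp=3))  # 30

# Capture clock twice as fast as launch clock (figure 6): -end adds capture
# periods, -start adds launch periods.
print(setup_check_window(20, 10, mcp=2, reference="end"))    # 20
print(setup_check_window(20, 10, mcp=2, reference="start"))  # 30
```

Note how, for the half-period capture clock of figure 6, -end widens the window by one capture period (10 ns) while -start widens it by one launch period (20 ns).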
Multicycle paths : The architectural perspective
Why multi-cycle paths are introduced in designs: A typical System on Chip consists
of many components working in tandem. Each of these works on different frequencies
depending upon performance and other requirements. Ideally, the designer would want
the maximum throughput possible from each component in the design, while paying proper
respect to power, timing and area constraints. The designer may think to introduce
multi-cycle paths in the design in one of the following scenarios:
1) Very large data-path limiting the frequency of entire component: Let us take a
hypothetical case in which one of the components is to be designed to work at 500
MHz; however, one of the data-paths is too large to work at this frequency. Let us say
the minimum delay the data-path under consideration can achieve is 3 ns. Thus, if we treat all
the paths as single cycle, the component cannot work at more than 333 MHz; however, if
we ignore this path, the rest of the design can attain 500 MHz without much difficulty.
Thus, we can sacrifice this one path so that the rest of the component works at 500
MHz. In that case, we can make that particular path a multi-cycle path so that it
works at 250 MHz, sacrificing the performance for that one path only.
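The arithmetic in this example is simple enough to check. The sketch below assumes ideal clocks and ignores setup time and skew:

```python
def max_single_cycle_freq_mhz(path_delay_ns):
    """Highest clock frequency at which a path meets a single-cycle check."""
    return 1000.0 / path_delay_ns

def path_budget_ns(clock_mhz, cycles):
    """Time available to a path that is allowed 'cycles' clock cycles."""
    return 1000.0 / clock_mhz * cycles

# The 3 ns path limits a single-cycle design to ~333 MHz ...
print(round(max_single_cycle_freq_mhz(3.0)))  # 333
# ... but declared as a 2-cycle path at 500 MHz it gets a 4 ns budget,
# i.e. that one path effectively updates at 500/2 = 250 MHz.
print(path_budget_ns(500, 2))  # 4.0
```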
2) Paths starting from slow clock and ending at fast clock: For simplicity, let us
suppose there is a data-path involving one start-point and one end point with the start-
point receiving clock that is half in frequency to that of the end point. Now, the start-
point can only send data at half the rate at which the end point can receive it. Therefore,
there is no gain in running the end-point at double the clock frequency. Also, since the
data is launched only once every two cycles, we can modify the architecture such that the
data is received after a gap of one cycle. In other words, instead of a single cycle data-
path, we can afford a two cycle data-path in such a case. This will actually save power
as the data-path now has two cycles to traverse to the endpoint. So, less drive strength
cells with less area and power can be used. Also, if the multi-cycle has been
implemented through clock enable (discussed later), clock power will also be saved.
Now let us extend this discussion to the case wherein the launch clock is half in
frequency to the capture clock. Let us say, Enable changes once every two cycles.
Here, the intention is to make the data-path a multi-cycle of 2 relative to faster clock
(the capture clock here). As is evident from the figure below, it is important that the Enable
signal takes the proper waveform, as shown on the right hand side of figure 2. In this
case, the setup check will be two cycles of the capture clock and the hold check will be a 0 cycle check.
Figure 2: Introducing multi-cycle path where launch clock is half in frequency to capture clock
2) Through gating in clock path: Similarly, we can make the capturing flop capture
data once every few cycles by clipping the clock. In other words, send only those pulses
of clock to the capturing flip-flop at which you want the data to be captured. This can be
done similar to data-path masking as discussed in point 1 with the only difference being
that the enable will be masking the clock signal going to the capturing flop. This kind of
gating is more advantageous in terms of power saving: since the capturing flip-flop
does not receive the clock signal during the gated cycles, clock power is saved as well.
Figure 3: Introducing multi cycle paths through gating the clock path
Figure 3 above shows how multicycle paths can be achieved with the help of clock
gating. The enable signal, in this case, launches from a negative edge-triggered register
due to architectural reasons. With the enable waveform as shown in figure
3, flop will get clock pulse once in every four cycles. Thus, we can have a multicycle
path of 4 cycles from launch to capture. The setup check and hold check, in this case, is
also shown in figure 3. The setup check will be a 4 cycle check, whereas hold check will
be a zero cycle check.
Propagation Delay
What is propagation delay: Propagation delay of a logic gate is defined as the time it
takes for the effect of change in input to be visible at the output. In other words,
propagation delay is the time required for the input to be propagated to the output.
Normally, it is defined as the time difference between the transitioning input reaching
50% of its final value and the output reaching 50% of its final value as a result of that
input change. Here, 50% is defined as the logic threshold at which the output (or, in
general, any signal) is assumed to switch state.
Propagation delay example: Let us consider a 2-input AND gate as shown in figure 1,
with input ‘I2’ making transition from logic ‘0’ to logic ‘1’ and 'I1' being stable at logic
value '1'. In effect, it will cause the output ‘O’ also to make a transition. The output will
not show the effect immediately, but after certain time interval. The timing diagram for
the transitions are also shown. The propagation delay, in this case, will be the time
interval between I2 reaching 50% while rising to 'O' reaching 50% mark while rising as a
result of 'I2' making a transition. The propagation delay is labeled as “TP” in figure 2.
Figure 2: Propagation delay
On what factors propagation delay depends: The propagation delay of a logic gate is
not a constant value, but is dependent upon two factors:
1. Transition time of the input causing the transition at the output: The
larger the transition time at the input, the larger the propagation delay of the cell. For
smaller propagation delays, the signals should switch faster.
2. The output load seen by the logic gate: The greater the capacitive
load sitting at the output of the cell, the greater the effort (time taken) to
charge it, and hence the greater the propagation delay.
How propagation delay of logic gates is calculated: In physical design tools,
there are the following sources for the calculation of propagation delay:
Liberty file: The liberty file contains a lookup table for each input-to-output path
(also called a cell arc) of a logic gate as part of its .lib model. The table contains values for
different input transition times and output loads corresponding to cell delay. Depending
upon the input transition and output load that is present in the design for the logic gate
under consideration, physical design tools interpolate between these values and
calculate the cell delay.
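The interpolation step can be sketched as below. This is a simplified model of what timing tools do (real tools also extrapolate outside the table bounds and handle non-linear table templates):

```python
import bisect

def interpolate_delay(slews, loads, table, slew, load):
    """Bilinear interpolation into a liberty-style 2-D delay table.

    'slews' and 'loads' are the ascending index vectors of the table;
    table[i][j] is the characterized delay for slews[i] and loads[j].
    """
    # Pick the lower-left corner of the bracketing cell (clamped to edges).
    i = max(0, min(bisect.bisect_right(slews, slew) - 1, len(slews) - 2))
    j = max(0, min(bisect.bisect_right(loads, load) - 1, len(loads) - 2))
    tx = (slew - slews[i]) / (slews[i + 1] - slews[i])
    ty = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (table[i][j] * (1 - tx) * (1 - ty)
            + table[i][j + 1] * (1 - tx) * ty
            + table[i + 1][j] * tx * (1 - ty)
            + table[i + 1][j + 1] * tx * ty)

# A toy 2x2 table: delays (ns) for two input slews and two output loads.
slews, loads = [0.1, 0.5], [1.0, 5.0]
table = [[10.0, 20.0],
         [30.0, 40.0]]
print(interpolate_delay(slews, loads, table, 0.3, 3.0))  # 25.0 (cell midpoint)
```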
SDF file: SDF (Standard Delay Format) is the extracted delay information of a
design. The current delay information, as calculated, can be dumped into SDF file. It
can, then, be read back. In case SDF is read, delays are not calculated and SDF delays
are given precedence.
Output transition time: The output transition time is also governed by the same two
factors as propagation delay. In other words, larger transition time and load increase the
transition time of the signal at the output of the logic gate. So, for better transition times,
both of these should be less.
Under the conditions mentioned above, the output can transition faster
than the input signal, which can result in negative propagation delay. An example
negative delay scenario is shown in the figure below. The output signal starts to change
only after the input signal; however, the faster transition of the output signal causes it to
attain 50% level before input signal, thus, resulting in negative propagation delay. In
other words, negative delay is a relative concept.
Figure 1: Input and output transitions showing negative propagation delay
However, even though the timing path is through the A pin, the resultant slew at the
output, SLEW_OUT, will be calculated from the worst of the input slews (worst slew
propagation). One may feel this is over-pessimism inserted by the timing analysis tool.
Path-based timing analysis does not show the worst slew propagation phenomenon, as it
calculates the output slew for each timing path rather than keeping one slew per node.
Similarly, for performing timing analysis for hold violations, the best of the slews at
inputs is propagated to the output as mentioned before also.
On-chip variations – the STA takeaway
Static timing analysis of a design is performed to estimate its working frequency after
the design has been fabricated. Nominal delays of the logic gates as per
characterization are calculated and some pessimism is applied above that to see if
there will be any setup and/or hold violation at the target frequency. However, all the
transistors manufactured are not alike. Also, not all the transistors receive the same
voltage and are at same temperature. The characterized delay is just the delay of
which there is maximum probability. The delay variation of a typical sample of
transistors on silicon follows the curve as shown in figure 1. As is shown, most of the
transistors have nominal characteristics. Typically, timing signoff is carried out with
some margin. By doing this, the designer tries to ensure that a larger number of
transistors is covered. There is a direct relationship between the margin and yield:
the greater the margin taken, the larger the yield. However, after a certain point, there is not
much increase in yield from increasing margins further. At that point, the margin adds more
cost to the design than it saves through increased yield. Therefore, margins should be
chosen so as to give maximum profit.
We have discussed above how variations in characteristics of transistors are taken care
of in STA. These variations in transistors’ characteristics as fabricated on silicon are
known as OCV (On-Chip Variations). The reason for OCV, as discussed above also, is
that all transistors on-chip are not alike in geometry, in their surroundings, and position
with respect to power supply. The variations are mainly caused by three factors:
Process variations: The process of fabrication includes diffusion, drawing out of
metal wires, gate drawing etc. The diffusion density is not uniform throughout the wafer.
Also, the width of a metal wire is not constant. Let us say the width is 1 µm ± 20 nm. So,
the metal delays are bound to lie within a range rather than at a single value. Similarly,
the diffusion regions of all transistors will not have exactly the same diffusion concentrations.
So, all transistors are expected to have somewhat different characteristics.
Voltage variation: Power is distributed to all transistors on the chip with the help
of a power grid. The power grid has its own resistance and capacitance. So, there is
voltage drop along the power grid. Transistors situated close to the power source (or
those having less resistive paths from the power source) receive a larger voltage as
compared to other transistors. That is why delay variation is seen across transistors.
Temperature variation: Similarly, all the transistors on the same chip cannot
have same temperature. So, there are variations in characteristics due to variation in
temperatures across the chip.
How to take care of OCV: To tackle OCV, the STA for the design is closed with some
margins. There are various margining methodologies available. One of these is applying
a flat margin over the whole design. However, this is over-pessimistic, since some cells may
be more prone to variations than others. Another approach is applying cell-based
margins derived from silicon data on which cells are more prone to variations. There also
exist methodologies based on different theories, e.g. location-based margins and
statistically calculated margins. As advances happen in STA, more accurate and
faster margining methods continue to emerge.
At high applied voltage levels, there is an abundance of free charge carriers as a result of
the energy supplied by the potential difference created. In this state, there is no
significant change in carrier concentration with an increase in temperature; so, the mobility
factor dominates, thereby decreasing the conductivity with temperature. In other words,
at high levels of applied voltage, the conductivity of semiconductors decreases with
temperature.
Similarly, in the absence of any applied voltage, or with little voltage applied, the
semiconductor behaves similar to an insulator, with very few carriers, those
resulting only from thermal energy. So, the increase in carrier concentration is the
dominating factor, and we can say that at low applied voltages, the conductivity of
semiconductors increases with temperature.
In other words, the only condition for negative delay is to have an improvement in slew. As
we know, a net has only passive parasitics in the form of parasitic resistances and
capacitances. Passive elements can only degrade the transition, as they cannot provide
energy (assuming no crosstalk); rather, they can only dissipate it. In other words, it is not
possible for a net to have negative propagation delay.
However, we can have negative delay for a net, if there is crosstalk, as crosstalk can
improve the transition on a net. In other words, in the presence of crosstalk, we can
have 50% level at output reached before 50% level at input; hence, negative
propagation delay of a net.
Timing arcs
What is a timing arc: A timing arc defines the propagation of signals through logic
gates/nets and defines a timing relationship between two related pins. Timing arc is
one of the components of a timing path. Static timing analysis works on the concept
of timing paths. Each path starts from either primary input or a register and ends at a
primary output or a register. In-between, the path traverses through what are known
as timing arcs. We can define a timing arc as an indivisible path/constraint from
one pin to another that tells the EDA tool to consider the path/relationship between
the pins. For instance, gates such as AND, NAND, NOT and full adder cells have arcs from
each input pin to each output pin. Also, sequential cells such as flops and latches have
arcs from clock pin to output pins and data pins. Net connections can also be identified
as timing arcs as is discussed later.
Cell arcs and net arcs: Timing arcs can be categorized into two categories based upon
the type of element they are associated with – cell arcs and net arcs.
Cell arcs: These are between an input pin and an output pin of a cell. In other
words, the source pin is an input pin of a cell and the sink pin is another pin of the same
cell (an output pin in case of delay arcs, and an input pin in case of timing check arcs). In the figure shown
above, arcs (IN1 -> OUT) and (IN2 -> OUT) are cell arcs. Cell arcs are further divided
into sequential and combinational arcs as discussed below.
Net arcs: These arcs are between driver pin of a net and load pin of a net. In
other words, source pin is an output pin of one cell and sink pin is an input pin of
another cell. In the figure shown above, arc (OUT -> IN2) is a net arc. Net arcs are
always delay timing arcs.
Sequential and combinational arcs: As discussed above, cell arcs can be sequential
or combinational. Sequential arcs are between the clock pin of a sequential cell and
either input or output pin. Setup and hold arcs are between input data pin and clock pin
and are termed as timing check arcs as they constrain a form of timing relationship
between a set of signals. Sequential delay arc is between clock pin and output pin of
sequential elements. An example of sequential delay arc is clk to q delay arc in a flip-
flop. On the other hand, combinational arcs are between an input data and output data
pin of a combinational cell or block.
Information contained in timing arc: A delay timing arc provides following information:
1. A delay arc tells whether the path can be traversed through pin1 to pin2. If
the path can be traversed, we say that an arc exists between pin1 and pin2. On
the other hand, a timing check arc tells the relationship that is allowed between a
set of signals.
2. Under what condition the path will be traversed, known as ‘sdf condition’
3. Maximum and minimum times it can take from the source pin to the
destination pin of the arc to traverse in the path
4. Timing sense of the arc as explained below
Timing sense of an arc: Timing sense of an arc is defined as the sense of traversal
from source pin of the timing arc to the sink pin of the timing arc. Timing sense is also
called as "unateness" of timing arc. Timing sense can be ‘positive unate’, ‘negative
unate’ and ‘non-unate’.
Positive unate timing arc: An arc is said to be positive unate
if a rise transition at the source pin causes a rise transition (if at all) at the sink pin, and vice-
versa. Cells such as AND and OR gates have positive unate arcs. All net arcs are
positive unate arcs.
Negative unate timing arc: An arc is said to be negative unate
if a rise transition at the source pin causes a fall transition at the sink pin, and vice-versa.
NAND, NOR and inverter cells have negative unate arcs.
Non unate timing arcs: If there is no such relationship between the source and
sink pins of a timing arc, the arc is said to be non-unate. XOR and XNOR gates have
non-unate timing arcs.
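A minimal sketch of how a tool uses the timing sense to decide which output edge follows an input edge (the attribute names positive_unate, negative_unate and non_unate are the ones used in liberty files):

```python
def sink_edges(source_edge, timing_sense):
    """Possible edge(s) at the sink of a timing arc, given the edge at its
    source ('rise' or 'fall') and the arc's timing sense."""
    if timing_sense == "positive_unate":
        return {source_edge}
    if timing_sense == "negative_unate":
        return {"rise": {"fall"}, "fall": {"rise"}}[source_edge]
    return {"rise", "fall"}  # non-unate: either edge may result

print(sink_edges("rise", "positive_unate"))  # {'rise'}, e.g. AND, OR, net arcs
print(sink_edges("rise", "negative_unate"))  # {'fall'}, e.g. NAND, NOR, inverter
print(sink_edges("rise", "non_unate"))       # both edges, e.g. XOR, XNOR
```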
From what source timing arcs are picked: For cell arcs, the existence of a timing arc
is picked from liberty files. The cell has a function defined that identifies if the arc is
there from its input (say ‘x’) to its output (say ‘y’). In most cases, the value (delay,
unateness, sdf condition etc.) of the arc is also picked from liberty; but in case you have
read an SDF, the delay is picked from the SDF (Standard Delay Format) file (the other
properties are still picked from liberty). On the other hand, for net arcs, the existence of
the arc is picked from the connectivity information (netlist). The net arc delays are calculated
based on the parasitic values given in the SPEF (Standard Parasitics Exchange Format) file,
or from SDF (as in the case above).
Importance of timing arcs: Timing arcs have a very important role in VLSI design
industry. The whole optimization process, right from the gate-level netlist till final signoff,
revolves around timing arcs. The presence of correct timing arcs in the liberty file is very
essential for a high quality signoff; otherwise, there may not be correlation between
simulation and silicon.
Let us consider an example wherein a negative latch is placed between two positive
edge-triggered registers for simplicity and ease of understanding. The schematic
diagram for the same is shown in figure 1 below:
Figure 2 below shows the clock waveform for all the three elements involved. We have
labeled the clock edges for convenience. As is shown, latB is transparent during low
phase of the clock. RegA and RegC (positive edge-triggered registers) can
capture/launch data only at positive edge of clock; i.e., at Edge1, Edge3 or Edge5. LatB,
on other hand, can capture and launch data at any instant of time between Edge2 and
Edge3 or Edge4 and Edge5.
Figure 2: Clock waveforms
The time instant at which data is launched from LatB depends upon the time at which
data launched from RegA has become stable at the input of latB. If the data launched at
Edge1 from RegA gets stable before Edge2, it will get captured at Edge2
itself. However, even if the data is not stable by Edge2, it will still get captured: the latch
remains transparent, and the data is captured as soon as it becomes stable. The latest
instant of time at which this can happen is the latch closing edge (Edge3 here). One point to
be noted is that at whatever point data launches from LatB, it has to get captured at RegC at
Edge3. Whatever extra time the latch takes to capture the data is subtracted from the time
available to the next path. The worst case setup check at LatB is at Edge2; however, the latch
can borrow time as needed.
The maximum time borrowed, ideally, can be upto Edge3. Figure 3 below shows the
setup and hold checks with and without time borrow for this case:
Figure 3: Setup check with and without time borrow
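The time-borrow arithmetic described above can be sketched as follows; this is a simplified model ignoring clock skew and uncertainty, with all times in ns:

```python
def latch_time_borrow(arrival, open_edge, close_edge, latch_setup=0.0):
    """Time borrowed by a level-sensitive latch: data arriving after the
    opening edge borrows into the transparency window, up to the closing
    edge minus the latch setup time."""
    max_borrow = close_edge - latch_setup - open_edge
    borrow = arrival - open_edge
    if borrow > max_borrow:
        raise ValueError("setup violation even with maximum time borrow")
    return max(0.0, borrow)

# 10 ns clock; LatB transparent from Edge2 (5 ns) to Edge3 (10 ns).
print(latch_time_borrow(arrival=4.0, open_edge=5.0, close_edge=10.0))  # 0.0
print(latch_time_borrow(arrival=7.5, open_edge=5.0, close_edge=10.0))  # 2.5
```

The 2.5 ns borrowed in the second case is exactly the time subtracted from the budget of the following LatB-to-RegC path.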
As we know, setup check between latches of same polarity (both positive or negative) is
zero cycle with half cycle of time borrow allowed as shown in figure 2 below for negative
level-sensitive latches:
Figure 2: Setup check between two negative level-sensitive latches
So, if there are a number of same polarity latches, all will form zero cycle setup check
with the next latch; resulting in overall zero cycle phase shift.
As is shown in figure 3, all the latches in series are borrowing time, without allowing any
actual phase shift to happen. If we have a design with all latches, there cannot be a next
state calculation if all the latches are either positive level-sensitive or negative level-
sensitive. In other words, for a state-machine implementation, there should not be latches
of the same polarity in series.
Stating more clearly, a virtual clock is a clock that has been defined, but has not been
associated with any pin/port. A virtual clock is used as a reference to constrain the
interface pins, by relating the arrivals at input/output ports to it with the help of
commands such as set_input_delay and set_output_delay.
How to define a virtual clock: The most simple sdc command syntax to define a virtual
clock is as follows:
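In SDC, a clock created without any source object is virtual; a minimal definition matching this description is:
create_clock -name VCLK -period 10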
The above SDC command will define a virtual clock “VCLK” with period 10 ns.
Purpose of defining a virtual clock: The advantage of defining a virtual clock is that
we can specify desired latency for virtual clock. As mentioned above, virtual clock is
used to time interface paths. Figure 1 shows a scenario where it helps to define a virtual
clock. Reg-A is a flop inside the block that is sending data through PORT to a flop
sitting outside the block. Now, within the block, the path to PORT can be timed by
specifying an output delay for this port with respect to a clock synchronous to clock_in. We
could specify the delay with respect to clock_in itself, but there lies the difficulty of specifying
the clock latency: if we specify the latency for clock_in, it will be applied to Reg-A also.
Applying output delay with respect to a real clock causes input ports to get relaxed and
output ports to get tightened after clock tree has been built.
Figure 1: Figure to illustrate virtual clock
The solution to the problem is to define a virtual clock and apply the output delay with
respect to it, making the source latency of the virtual clock equal to the network latency of
the real clock. Can you think of any other method that can serve the purpose of a virtual clock?
Minimum pulse width requirement: To understand minimum pulse width requirement, let us
first define pulse width. Formally, pulse width can be defined as:
"If talking in terms of high signal level (high minimum pulse width), it is the time interval
between clock signal crossing half the VDD level during rising edge of clock signal and
clock signal crossing half the VDD level during falling edge of clock signal. If talking in
terms of low signal level (low minimum pulse width), it is the time interval between clock
signal crossing half the VDD level during falling edge of the clock signal and clock signal
crossing half the VDD level during rising edge of the clock signal."
If the clock being fed to a sequential object has less pulse width than the minimum required,
either of the following is the probable output:
The flop can capture the correct data, and the FSM will function correctly
The flop can completely miss the clock pulse and not capture any new data; the
FSM will then lead to an invalid state
The flop can go metastable
All these scenarios are possible; so, it is required to ensure that every sequential
element always gets a clock pulse greater than the minimum pulse width required. To ensure this,
there are ways to communicate to timing analysis tool the minimum pulse width requirement for
each and every sequential element. The check to ensure minimum pulse width is known as
"minimum pulse width check". There are following ways to ensure minimum pulse width
through minimum pulse width check:
Through liberty file: By default, all the registers in a design should have a minimum
pulse width defined through the liberty file, as this is the format used to convey standard cell
requirements to the STA tool. By convention, minimum pulse width should be defined for clock
and reset pins. Minimum pulse width is constrained in the liberty file using the following syntax:
timing_type : min_pulse_width;
Through SDC command: We can also define minimum pulse width requirement
through SDC command. The SDC command for the same is "set_min_pulse_width". For
example, following set of commands will constrain the minimum pulse width of clock clk to be
5 ns high and 4 ns low:
set_min_pulse_width -high 5 [get_clocks clk]
set_min_pulse_width -low 4 [get_clocks clk]
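The check itself is simple arithmetic; a sketch of it below, assuming an ideal clock described by its period and duty cycle:

```python
def min_pulse_width_ok(period, duty_cycle, min_high, min_low):
    """Check a clock against its minimum pulse width requirements.
    'duty_cycle' is the fraction of the period for which the clock is high."""
    high = period * duty_cycle
    low = period - high
    return high >= min_high and low >= min_low

# A 10 ns, 50% duty-cycle clock against the constraints above (5 ns high, 4 ns low):
print(min_pulse_width_ok(10.0, 0.5, 5.0, 4.0))  # True
# A 40% duty cycle leaves only 4 ns high, violating the 5 ns high requirement:
print(min_pulse_width_ok(10.0, 0.4, 5.0, 4.0))  # False
```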
Positive level-sensitive latch: A positive level-sensitive latch follows the input data
signal when enable is '1' and holds its output value when enable is '0'. Figure 1
below shows the symbol and the timing waveforms for a latch. As can be seen,
whenever enable is '1', out follows the data input; and when enable is '0', out remains
the same.
Figure 1(a): Positive level-sensitive latch; Figure 1(b): Timing waveform for a positive level-sensitive latch
Negative level-sensitive latch: A negative level-sensitive latch follows the input data
when enable is '0' and holds its output when enable is '1'.
Figure 2(a): Negative level-sensitive latch; Figure 2(b): Timing waveform for a negative level-sensitive latch
Latch timing arcs: Data can propagate to the output of the latch in two ways as
discussed below:
Out changes with Data: This happens when enable is in its asserted state (for
example, '1' for a positive level-sensitive latch). When this happens, Out follows Data, as
there is a direct path between Data and Out when Enable is '1'. This scenario is depicted in
figures 1(b) and 2(b) above, wherein Out is shown toggling when Data toggles. The latch
is, thus, said to have a timing arc from Data to Out.
Out changes with Enable: This happens when Data at input changes when
Enable is in its de-asserted state. When this happens, latch waits for Enable to be
asserted, then, follows the value of Data. As figure 3 shows, Data had become stable a
lot earlier, but out toggled only when enable became asserted. So, in latches, there
exists a timing arc from Enable to Out.
Figure 3: When data changes while enable is in its de-asserted state, the output waits for the enable to assert; only then is the effect of the input propagated to the output
Relation between Data and Enable: If Data toggles very close to the closing
edge of Enable, there might be ambiguity as to whether its effect will be propagated to the
output or not (as discussed later in this post). To make things more deterministic, we
impose the condition that Data should not toggle when Enable is getting de-
asserted. This relationship can be modelled as setup and hold arcs. So, there are setup
and hold timing arcs between data and enable pins of a latch. These will be discussed
below in detail.
Setup time and hold time for a latch: The most commonly used latch circuit is that
built using inverters and transmission gates. Figure 4 shows the transmission gate
implementation of a positive level-sensitive latch. The Enable has been shown as CLK
as usually is the case in sequential state machines. This circuit has two phases, as is
expected for a latch:
When CLK = '1', the transmission gate at the input turns ON and there is a direct
path between Data and Out
When CLK = '0', the transmission gate in the loopback path turns ON, and Out holds its
value
Now, when CLK transitions from '1' to '0', it is important that Data does not toggle. The
time before the clock falling edge that Data should remain stable is known as latch
setup time. Similarly, the time after the clock falling edge that Data should remain stable
is called latch hold time.
Let us go into the details of what latch setup and hold time should be for a transmission
gate latch. If we want the data to be propagated properly to the output, then Data should
be stable for at least some time before the closing of the input transmission gate. This time
is such that the data gets into the memory of the latch; i.e., before the input transmission gate closes,
Data should traverse both the inverters of the loop. So, the setup time of the latch comprises
the delay of the input transmission gate and the two inverters. Figure 5 below shows the
setup time for the latch.
Figure 5: Setup time for latch
Similarly, if we do not want the data to propagate to the output, it must not cross the input
transmission gate, so that it does not disturb the present state of the latch. This serves
as the hold time for the latch. Assuming (CLK)' takes one inverter delay to generate, the input
transmission gate will close only after one inverter delay. So, the hold time for Data is
one inverter delay minus the transmission gate delay. Please refer to figure 6 below for an
illustration. (CLK)' is formed from CLK after a delay equivalent to one inverter
delay; only then does the input transmission gate switch off. If we want the data not to
propagate to Out, we have to ensure that it does not cross the input transmission gate. So,
Data should not be present at the transmission gate's input until time (T(inv) - T(tg))
after the CLK edge. In other words, it has to be held stable for this much time after the clock
edge. This is the hold time for the latch.
Setup time ensures that the data propagates to the output at the coming clock
edge
Hold time ensures that the data does not propagate to the output at the
present/previous clock edge
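The two quantities above can be sketched numerically. This is a rough model of the transmission-gate latch discussed above; the delay values (in ps) are illustrative assumptions, not library data:

```python
# Rough model of setup/hold for the transmission-gate latch above.
# Delay values (in ps) are illustrative assumptions, not library data.

def latch_setup(t_tg, t_inv):
    """Setup: data must cross the input transmission gate and both
    loop inverters before the gate closes: Ttg + 2 * Tinv."""
    return t_tg + 2 * t_inv

def latch_hold(t_tg, t_inv):
    """Hold: (CLK)' lags CLK by one inverter delay, so the input gate
    closes Tinv after the edge; hold works out to Tinv - Ttg."""
    return t_inv - t_tg

print(latch_setup(50, 100))  # 250 ps
print(latch_hold(50, 100))   # 50 ps
```

Note that a faster transmission gate shrinks the hold requirement but lengthens nothing else in this simple picture; real libraries characterize both numbers against clock and data transition times.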
Setup checks and hold checks for latches: As discussed above, the decision for the
data to be latched or not to be latched is made at the closing edge. So, the setup and
hold checks are with respect to the latch closing edge only. However, since latches are
transparent during half of the clock period, we can assume that the capturing edge is
flexible and stretches all over the active level of the latch. This property enables a very
beautiful concept known as "time borrowing" for latches.
Setup time
Definition of setup time: Setup time is defined as the minimum amount of time for which
data should be stable before the arrival of the clock's active edge so that it can be latched
properly. In other words, each flip-flop (or any sequential element, in general) needs data
to be stable for some time before the arrival of the clock edge such that it can reliably
capture the data. This amount of time is known as setup time.
We can also link setup time with state transitions. We know that the data to be captured
at the current clock edge was launched at previous clock edge by some other flip-flop.
The data launched at previous clock edge must be stable at least setup time before the
current clock edge. So, adherence to setup time ensures that the data launched at
previous edge is captured at the current clock edge reliably. In other words, setup time
ensures that the design transitions to next state smoothly.
Figure 1 shows that data is allowed to toggle prior to the yellow dotted line. This yellow
dotted line corresponds to setup time; the time difference between this line and the active
clock edge is termed the setup time. Data cannot toggle after this yellow dotted line for
a duration known as the setup-hold window. The occurrence of such an event is termed a
setup time violation. The consequence of a setup time violation can be the capture of wrong
data (setup check violation) or the sequential element going into a metastable state (setup
time violation).
Figure 2: A positive level-sensitive D-latch
Latch setup time: Figure 2 shows a positive level-sensitive latch. If data toggles
at the latch input close to the negative edge (while the latch is closing), there will be
uncertainty as to whether the data will be captured reliably. For data to be captured reliably,
it has to be available at the input of the loop transmission gate upon the arrival of the closing
clock edge. To be present at NodeD at the closing edge, it must be at the latch input
some time prior to the clock edge. This time taken in reaching from the latch input to NodeD is
termed the setup time of this latch.
Positive setup time: When setup time point is before the arrival of clock edge, setup
time is said to be positive. Figure 1 below shows positive setup time.
Zero setup time: When setup time point is at the same instant as clock's active edge,
setup time is said to be zero. Figure 2 shows a situation wherein setup time is zero.
Negative setup time: When setup time point occurs after clock edge, setup time is said
to be negative. Figure 3 shows timing waveform for negative setup time.
Figure 3: Negative setup time
What causes different values of setup time: We have discussed above the theoretical
aspects of positive, zero and negative setup time. Let us go a bit deeper into the details.
Figure 4 shows a positive level-sensitive D-latch. As we know from the definition of
setup time, setup time depends upon the relative arrival times of data and clock at the input
transmission gate (we have to ensure data has reached up to NodeD by the time clock
reaches the input transmission gate). Depending upon the relative arrival times of data and
clock, setup time can be positive, zero or negative.
Let us say the internal data path of the latch (the input transmission gate plus the two
inverters) takes 2 ns to traverse; this figure is consistent with the numbers that follow.
Now, if data takes 1 ns more than clock to reach the input transmission gate from the
reference point, then data has to reach its reference point at least 3 ns before clock
reaches its reference point. In this case, setup time will be 3 ns.
Similarly, if data takes 1 ns less than clock to reach input transmission gate, setup time
will be 1 ns. And if data takes 2 ns less than clock to reach input transmission gate,
setup time will be zero.
Now, if the difference between the delays of data and clock from their respective
reference points to the input transmission gate increases further, the setup time will become
negative. For example, if data takes 3 ns less than clock to reach the input transmission
gate, setup time will be -1 ns.
This is how setup time depends upon relative delays of data and clock within the
sequential element. And it completely makes sense to have negative setup time.
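To make the dependence concrete, here is a small sketch of the arithmetic above. The 2 ns internal traversal time is an assumption chosen to match the example numbers:

```python
# Sketch of how setup time follows from relative data/clock arrivals.
# The internal figure (2 ns) is an assumption matching the example.
INTERNAL_SETUP_NS = 2.0  # assumed transmission gate + two inverter delays

def setup_time(t_data, t_clk):
    """Setup time at the reference point: internal requirement plus the
    (data - clock) arrival difference up to the transmission gate."""
    return INTERNAL_SETUP_NS + (t_data - t_clk)

print(setup_time(1.0, 0.0))  # 3.0 ns: data slower than clock, positive
print(setup_time(0.0, 2.0))  # 0.0 ns: zero setup time
print(setup_time(0.0, 3.0))  # -1.0 ns: negative setup time
```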
Hold time
Definition of hold time: Hold time is defined as the minimum amount of time for which
data should be stable after the arrival of the clock's active edge so that it can be latched
properly. In other words, each flip-flop (or any sequential element, in general) needs data
to be stable for some time after the arrival of the clock edge such that it can reliably
capture the data. This amount of time is known as hold time.
We can also link hold time with state transitions. We know that the data to be captured
at the current clock edge was launched at previous clock edge by some other flip-flop.
And the data launched at the current clock edge must be captured at the next edge.
Adherence to hold time ensures that the data launched at current edge is not captured
at the current clock edge. And the data launched at previous edge is captured and not
disturbed by the one launched at current edge. In other words, hold time ensures that
the current state of the design is not disturbed.
Figure 1 shows that data is allowed to toggle after the yellow dotted line. This yellow
dotted line corresponds to hold time; the time difference between the active clock edge
and this yellow dotted line is the hold time. Data cannot toggle before this yellow dotted line
for a duration known as the setup-hold window. The occurrence of such an event is termed a
hold violation. The consequence of such a violation can be the capture of wrong data
(better termed a hold check violation) or the sequential element going into a metastable
state (hold time violation).
Figure 2: A positive level-sensitive D-latch
Latch hold time: Figure 2 shows a positive level-sensitive latch. If data toggles
at the latch input close to the negative edge (while the latch is closing), there will be
uncertainty as to whether the data will be captured reliably. For data to be captured reliably,
the next data must not reach NodeC when the closing edge of the clock arrives at the input
transmission gate. For this to happen, data must not travel NodeA -> NodeB -> NodeC
before the clock edge arrives. Data must change only after this time interval.
As we know from the definition of hold time, hold time is a point on time axis which
restrains data from changing before it. Data can change only after hold time has
elapsed. Now, there is no constraint on the occurrence of hold time point with respect to
clock edge. It can either be after, before or at the same instant of time as that of clock
active edge.
Positive hold time: When the hold time point is after the arrival of the clock's active edge,
hold time is said to be positive. Figure 1 below shows positive hold time.
Zero hold time: When hold time point is at the same time instant as that of clock active
edge, we say that hold time of the sequential element is zero. Figure 2 below shows
timing waveform for zero hold time.
Figure 2: Zero hold time
Negative hold time: Similarly, when the hold time point comes earlier on the time scale
than the clock's active edge, we say that the hold time of the sequential element is negative.
Figure 3 shows the timing waveform for negative hold time.
We have discussed above theoretical aspects of positive, zero and negative hold time.
Let us go a bit deeper into the details. Figure 4 shows a positive level-sensitive D-latch.
As we know (from definition of hold time), hold time depends upon the relative arrival
times of clock and data at the input transmission gate (We have to ensure data does not
reach NodeC). Depending upon the times of arrival of clock and data, hold time can be
positive or negative.
Figure 4: Positive level-sensitive D-latch
Let us say the delay of an inverter is 1 ns. Then, we can afford the data to reach the
transmission gate input even 0.9 ns before the arrival of the clock at the transmission gate.
Data would then reach NodeC (-0.9 + 1 =) 0.1 ns after the arrival of the clock edge, if
allowed; but since the clock closes the transmission gate, data will not reach NodeC. So, in
this case, hold time is -1 ns. If the delay from NodeB to NodeC were something else, the
hold time would also have been different.
Now, if we say that the clock arrives at the transmission gate 1 ns earlier than data, then,
by the above logic, the hold time of this latch will be -2 ns.
Similarly, if the clock arrives at the transmission gate 0.5 ns after data, hold time will be -0.5 ns.
And if the clock arrives at the transmission gate 1 ns after data, hold time will be zero.
If the arrival of the clock is delayed further, hold time will be greater than zero. For
example, if the clock arrives 2 ns after data, hold time will be +1 ns.
Hold time of the circuit also depends upon the reference point. For example,
consider a multi-level black box as shown in figure 5. If we look at black box 0, its hold
time is -1 ns. At the level of black box 1, wherein clock travels 2 ns and data travels 0.5 ns
to reach black box 0, hold time is (-1 + 2 - 0.5 =) 0.5 ns. Similarly, at the level of black
box 2, hold time is 1 ns. This is how hold time depends upon the relative arrival times of
clock and data, and it completely makes sense to have a negative hold time.
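The black-box arithmetic can be written down as a one-liner. The box-2 delays below are assumptions chosen for illustration; the text gives only the resulting 1 ns:

```python
# Hold time seen at an outer reference point, per the black-box example.
def hold_at_level(inner_hold, clk_delay, data_delay):
    """Extra clock-path delay increases apparent hold time,
    extra data-path delay decreases it."""
    return inner_hold + clk_delay - data_delay

# Black box 0: -1 ns. Box 1: clock travels 2 ns, data 0.5 ns to box 0.
h1 = hold_at_level(-1.0, 2.0, 0.5)
print(h1)  # 0.5 ns
# Box 2: assumed 1 ns extra clock delay and 0.5 ns extra data delay,
# chosen here to reproduce the 1 ns figure quoted above.
print(hold_at_level(h1, 1.0, 0.5))  # 1.0 ns
```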
STA problem: Hold time manipulation
Given a black box with a hold time of 2 ns, how will you convert it to one having a hold
time of 1 ns?
As we learnt in our post negative hold check, we can control the hold time of a black box
just by controlling the relative arrival times of data and clock at a certain reference point for
a given sequential element. So, to make this transition, we need to insert 1 ns of delay
in the data path as shown in figure 2 below:
We can arrive at the above conclusion with the help of the following equation. As we know,
for a hold check to pass, the hold slack needs to be non-negative:
Hold slack = Tck->q + Tprop - (Thold + Tskew) >= 0
The above equation is for a single cycle path from register to register. However, the
result is valid for any kind of timing path.
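The manipulation can be checked with the same relationship as in the black-box discussion; a rough sketch, with delays in ns:

```python
# Apparent hold time of a wrapped black box: delay inserted in the
# clock path raises it, delay inserted in the data path lowers it.
def apparent_hold(box_hold, clk_insert, data_insert):
    return box_hold + clk_insert - data_insert

# 2 ns black box with 1 ns of delay inserted in the data path:
print(apparent_hold(2.0, 0.0, 1.0))  # 1.0 ns
```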
What is meant by setup check: A setup check ensures that the design transitions to the
next state as desired through the state machine design. Mostly, the setup check is at the
next active clock edge relative to the edge at which data is launched. Let us call this
the default setup check. This is, of course, in correspondence with the state machine
requirement to transfer to the next state and the possibility of meeting both setup and hold
checks together in view of delay variations across timing corners. Figure 1 below
shows the setup check for a timing path from a positive edge-triggered register to a
negative edge-triggered register. It shows that the data launched by flop1 on the positive
edge will be captured by flop2 on the forthcoming negative edge and will update the
state of flop2. To do so, it has to be stable at the input of flop2 at least a setup time
before the negative edge.
Figure 1: Default setup check for a timing path from positive edge-triggered to negative edge-triggered flop
What is meant by hold check: A hold check ensures that the design does not move to
the next state before its stipulated time; i.e., the design retains its present state.
Unless there are some architectural care-abouts in the state machine design, the hold
check should be one active edge prior to the one at which setup is checked. The hold
check corresponding to the default setup check is termed the default hold check.
Figure 2 below shows the default hold check corresponding to the default setup check
of figure 1. It shows that the data launched on the positive edge by flop1 should be
captured by the next negative edge and not the previous negative edge.
Figure 2: Default hold check for a timing path from positive edge-triggered
Default setup and hold check categories: As discussed above, for each kind of timing
path, there is a default setup check and a default hold check that will be inferred unless
there is an intended non-default check. We can split the setup and hold checks into
following categories for our convenience. Each of the following is a link, which you can
visit to know about the default setup and hold checks for each category:
Non-default setup and hold checks: These are formed when the state machine
behavior differs from the default intended one. Sometimes, a state machine can be
designed such that the setup and hold checks are non-default. For this to happen, of
course, you have to first analyze delay variations across timing corners and ensure that
the setup timing equation and hold timing equation are satisfied for all timing corner
scenarios. The non-default setup and hold checks can be modeled with the help of
multicycle path timing constraints. You may wish to go through our posts Multicycle
paths - the architectural perspective and Multicycle paths handling in STA to understand
some of the concepts related to non-default setup and hold checks.
Figure 1: Setup and hold checks for positive edge-triggered to negative edge-triggered flip-flop
Most of the cases in today's designs are of this type only; the exceptions to the zero-cycle
hold check are not too many. There are hold checks for the previous edge also; however,
these are very relaxed as compared to the zero-cycle hold check and hence are not
mentioned. Also, hold checks on the next edge are impossible to meet considering cross-
corner delay variations. So, seldom do we hear that a hold check is frequency dependent.
Let us talk of different scenarios of frequency dependent hold checks:
Figure 2: Setup and hold checks for timing path from positive edge-triggered flip-flop to negative edge-
triggered flip-flop
Similarly, for timing paths launching from a negative edge-triggered flip-flop and being
captured at a positive edge-triggered flip-flop, the clock period comes into the picture.
However, this check is very relaxed most of the time. It is evident from the above equation
that for the hold slack to be negative, the skew between launch and capture clocks should be
greater than half a clock cycle, which is a very rare scenario. Even at 2 GHz
frequency (Tclk = 500 ps), the skew has to be greater than 250 ps, which is still very rare.
Coming to latches, the hold check from a positive level-sensitive latch to a negative edge-
triggered flip-flop is half cycle. Similarly, the hold check from a negative level-sensitive latch
to a positive edge-triggered flip-flop is half cycle. Hence, the hold check in both of these
cases is frequency dependent.
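A sketch of the half-cycle hold arithmetic described above. For simplicity, the clk->q, propagation and hold terms default to zero here, so only the half period and the skew remain; all numbers are illustrative, in ps:

```python
# Hold slack for a hold check performed half a cycle after launch:
# the half period sits on the data side of the inequality, so the slack
# goes negative only when skew exceeds roughly Tclk/2.
def half_cycle_hold_slack(t_clk, t_skew, t_ckq=0.0, t_prop=0.0, t_hold=0.0):
    return t_ckq + t_prop + t_clk / 2 - (t_hold + t_skew)

# Even at 2 GHz (Tclk = 500 ps), a 100 ps skew leaves ample margin:
print(half_cycle_hold_slack(500, 100))  # 150.0 ps
```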
2. Clock gating hold checks: When data launched from a negative edge-triggered flip-
flop gates a clock at an OR gate, hold is checked on the positive edge following the edge at
which data is launched, as shown in figure 3. This check is frequency dependent.
Figure 3: Clock gating hold check between data launched from a negative edge-triggered flip-flop and
the clock at an OR gate
Similarly, data launched from a positive edge-triggered flip-flop and gating the clock at an
AND gate forms a half-cycle hold check. However, this kind of check is not possible to meet
under normal scenarios considering cross-corner variations.
Setup timing critical paths: Those paths for which meeting setup timing is difficult can
be termed setup timing critical paths. For these paths, the setup slack value is very
close to zero and, for the most part of the design cycle, remains below zero.
Hold timing critical paths: As is quite obvious, those paths for which meeting hold
timing is difficult are hold critical paths. These paths may require many buffers to satisfy
the hold slack equation.
Sometimes, we may encounter some timing paths which are violating in both setup and
hold. There is not enough setup slack to make them hold timing clean and vice-versa.
The good practice in timing analysis is to identify all such paths as early as possible in
design cycle. Let us discuss the scenarios that make timing paths both setup and hold
timing critical.
Inherent frequency limit and delay variations: Let us say, we want our chip to remain
functional within following PVTs:
Process : Best-case to Worst-case
Voltage : 1.2 V with 10% voltage variation allowed (1.08 V to 1.32 V)
Temperature : -20 degrees to +150 degrees
The delay of a standard cell changes with PVT and OCV. Let us only talk about PVT
variations. Let us say cell delay changes by 2 times from the worst-case scenario (worst
process, lowest voltage, worst temperature) to the best-case scenario (best process,
highest voltage, best temperature), and that setup and hold checks also scale by the
same amount. Remember that the equations for setup and hold need to be satisfied
across all the PVTs, which essentially means setup needs to be ensured for the WCS
scenario and hold timing needs to be ensured for the BCS scenario. This will put a limit
on the maximum frequency that the path can be timed at. If we try to go beyond that
frequency, we will not be able to ensure that both setup and hold slacks remain positive.
Let us illustrate with the help of an example of a timing path from a positive edge-
triggered flip-flop to positive edge-triggered flip-flop with a frequency target of 1.4 GHz
(clock time period = 714 ps). Let us say, we have the Best-case and Worst-case
scenarios as shown in figure 1 and 2.
Figure 1 shows that the best-case clk->q delay for launch flop is 100 ps, best-case
combinational delay is 80 ps and best-case hold time is 200 ps. Applying our hold timing
equation for this case,
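Plugging the quoted best-case numbers into the hold timing equation (clock skew is not stated in the text, so zero skew is assumed here):

```python
# Hold slack = Tck->q + Tprop - (Thold + Tskew), values in ps.
def hold_slack(t_ckq, t_prop, t_hold, t_skew=0.0):
    return t_ckq + t_prop - (t_hold + t_skew)

# Best-case numbers from figure 1: clk->q 100 ps, comb 80 ps, hold 200 ps.
print(hold_slack(t_ckq=100, t_prop=80, t_hold=200))  # -20 ps: violated
```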
Thus, for the same timing path, both setup and hold slacks come out to be
negative. Given all these conditions, we cannot meet both setup and hold for this path.
One of the solutions could be to use cells with less delay variability. Or we
can limit the operating conditions to a tighter range, for instance, 1.15 V to 1.25 V instead.
This will improve both setup and hold slack values. If neither is an option, the only
way left to satisfy timing is to add delay elements to bring the hold slack to zero and
reduce the frequency, as the inherent variations of the cells will not allow the path to operate
beyond a certain frequency. Let us check at what maximum frequency our timing path
will work.
For a setup slack of 0 ps, operating clock frequency will be maximum; i.e.,
What if setup and/or hold violations occur in a design: As said earlier, setup and
hold timings are to be met in order to ensure that data launched from one flop is
captured properly at another, in accordance with the state machine designed. In other
words, no timing violations means that the data launched by one flip-flop at one clock
edge is captured by another flip-flop at the desired clock edge. If a setup check
is violated, data will not be captured properly at the next clock edge. Similarly, if a hold
check is violated, data intended to be captured at the next edge will get captured at the
same edge. Moreover, setup/hold violations can lead to data toggling within the
setup/hold window, which can lead to metastability of the capturing flip-flop (as explained
in our post metastability). So, it is very important to have setup and hold requirements
met for all the registers in the design, and there should not be any setup/hold violations.
Setup violations: As we know, setup checks are applied for timing paths to get the
state machine to move to the next state. The timing equation for a setup check from
positive edge-triggered flip-flop to positive edge-triggered flip-flop is given as below:
Tck->q + Tprop + Tsetup - Tskew < Tperiod
For a timing path to meet setup requirements, this equation needs to be satisfied. The
difference between left and right sides is represented by a parameter known as setup
slack.
Setup slack is the margin by which a timing path meets setup check requirement. It is
given as the difference in R.H.S. and L.H.S. of setup timing equation. The equation for
setup slack is given as:
Setup slack = Tperiod - Tck->q - Tprop - Tsetup + Tskew
If setup slack is positive, it means the timing path meets setup requirement. On the
other hand, a negative setup slack means setup violating timing path. If, by chance, a
fabricated design is found to have a setup violation, you can still run the design at less
frequency than specified and get the desired functionality as setup equation includes
clock period as a variable.
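The slack equation above can be evaluated directly. The numbers here are illustrative, with the 714 ps period taken from the 1.4 GHz example earlier in this post:

```python
# Setup slack = Tperiod - Tck->q - Tprop - Tsetup + Tskew, values in ps.
def setup_slack(t_period, t_ckq, t_prop, t_setup, t_skew=0.0):
    return t_period - t_ckq - t_prop - t_setup + t_skew

# Illustrative (assumed) numbers against a 714 ps period:
print(setup_slack(714, 100, 500, 50))  # 64 ps: setup met
print(setup_slack(714, 100, 600, 50))  # -36 ps: setup violation
```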
If we analyze setup equation more closely, it involves four parameters:
1. Data path delay: The more the total delay of the data path (flip-flop delay +
combinational delay + setup time), the less is the setup slack
2. Clock skew: The more the clock skew (difference between arrival times of
clock at capture and launch flip-flops), the more is the setup slack
3. Setup time requirement of capturing flip-flop: The less the setup time
requirement, the more will be the setup slack
4. Clock period: The more the clock period, the more is the setup slack. However,
if you are targeting a specific clock period, increasing it is not an option. :-)
How to tackle setup violations: The ultimate goal of timing analysis is to have every
timing path follow the setup equation and get a positive setup slack number for every timing
path in the design. If a timing path violates setup timing (assuming we are targeting
a certain clock frequency), we can try one or more of the following to bring the setup
slack back to a positive value:
Decreasing data path delay
Choosing a flip-flop with less setup time requirement
Increasing clock skew
How to fix setup violations discusses various ways to tackle setup violations.
Hold violations: As we know, hold checks are applied to ensure that the state machine
remains in its present state until desired. The hold timing equation for a timing path from
a positive edge-triggered flip-flop to another positive edge-triggered flip-flop is governed
by the following equation:
Tck->q + Tprop > Thold + Tskew
Similar to setup slack, the presence and magnitude of a hold violation is governed by a
parameter called hold slack. The hold slack is defined as the amount by which the L.H.S. is
greater than the R.H.S. of the hold timing equation. In other words, it is the margin by which
the timing path meets the hold timing check. The equation for hold slack is given as:
Hold slack = Tck->q + Tprop - (Thold + Tskew)
If hold slack is positive, it means there is still some margin available before it will start violating
for hold. A negative hold slack means the path is violating hold timing check by the amount
represented by hold slack. To get the path met, either data path delay should be increased, or
clock skew/hold requirement of capturing flop should be decreased.
1. Increase the drive strength of data-path logic gates: A cell with better drive
strength can charge the load capacitance quickly, resulting in less propagation delay.
Also, the output transition should improve, resulting in better delay of the succeeding stages.
We can view a logic gate as a certain ON-resistance, that will charge/discharge a load
capacitor to toggle the output state. This will form an RC circuit with a certain RC time
constant. A better drive-strength gate will have a lesser resistance, effectively lowering
the RC time constant; hence, providing less delay. This is illustrated in figure 1 below. If
an AND gate of drive strength 'X' has a pull down resistance equivalent to 'R', the one
with drive strength '2X' will have R/2 resistance. Thus, a bigger AND gate with better
drive strength will have less delay.
This strategy gives the best results only if the load of the cell is dominated by the
external load capacitance. Generally, the drive strength of a cell is proportional to the cell
size. Thus, increasing the cell size halves its internal resistance but doubles the internal
node capacitance. Thus, as shown in figure 2, the zero-load-capacitance delay of a cell
ideally remains the same upon doubling the size of the cell.
Thus, upon doubling the drive strength of the cell (assuming D to be the original delay),
the delay can be anything between D/2 and D depending upon the ratio of intrinsic and
external load capacitance.
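The D/2-to-D claim follows from the simple RC picture above. This sketch uses made-up unit values; only the ratios matter:

```python
# First-order cell delay ~ R * (Cint + Cload). Doubling drive strength
# halves R but doubles Cint, so delay lands between D/2 and D.
def cell_delay(r_on, c_int, c_load):
    return r_on * (c_int + c_load)

# External load dominates: doubling drive nearly halves the delay.
print(cell_delay(1.0, 1.0, 9.0))  # 10.0
print(cell_delay(0.5, 2.0, 9.0))  # 5.5
# Zero external load: doubling drive leaves the delay unchanged (D).
print(cell_delay(1.0, 1.0, 0.0))  # 1.0
print(cell_delay(0.5, 2.0, 0.0))  # 1.0
```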
Moreover, the input pin capacitance is a by-product of the size of the cell. Thus,
increasing the size of the cell results in increased load for the driver cell of its input pins.
So, in some cases (very high drive strength cell with less load driven by a low drive
strength cell), increasing the drive strength can result in increase in magnitude of setup
violation.
Keeping aside timing, power dissipation (both leakage as well as dynamic power) are a
function of cell drive strength. Also, area is a function of cell drive strength. So,
increasing the drive strength to fix a setup violation results in both area and power
increase (although very small in comparison to whole design).
2. Use data-path cells with lower threshold voltages: If you have multiple
flavors of threshold voltages in your design, the cell with the lower threshold voltage will
certainly have less delay. So, this must be the first step to resolve setup violations.
3. Improve the setup time of the capturing flip-flop: As we know, the setup time of a flip-
flop is a function of the transitions at its data pin and clock pin. The better the transition at
the data pin, the less is the setup time; and a worse clock transition also results in less setup
time. A flip-flop with higher drive strength and/or lower threshold voltage is more likely to
have a smaller setup time requirement. However, increasing the drive strength of the flip-flop
might cause the transitions at its clock pin and data pin to get worse due to higher pin loads;
this also plays a role in deciding the setup time.
4. Restructuring of the data-path: Based upon the placement of data-path logic cells,
you can decide either to combine simple logic gates into a complex gate, or to split a multi-
stage cell into simpler logic gates. A multi-stage gate is optimized in terms of area,
power and timing. For example, a 2:1 mux will have less logic delay than one AND gate
and one OR gate combined, for the same output load capacitance. But, if the path needs to
traverse a distance, then two stages of logic can help, since a single complex gate would
then need a repeater that introduces additional delay.
Let us elaborate this with the help of an example wherein a data-path traverses a 3-
input AND gate from FF1 to FF2 situated around 400 micron apart. Let us assume one
logic cell can drive 200 micron and each logic cell has only one drive strength available
for simplicity. The choice is between two 2-input AND gates and 1 3-input AND gate. In
this case, 3-input AND gate should give less delay (may be 200 ps for two 2-input AND
vs 150 ps for one 3-input AND) as it has been optimized for less area, timing and power
as compared to two 2-input AND gates.
Now, consider another case where the FF1 and FF2 are at a distance of 600 micron. In
this case, if we use two 2-input AND gates, we can place them spaced apart 200 micron
and hence, can cover the distance. But, if we use one 3-input AND gate, we will need to
add a repeater, which will have its own delay. In this case, using two 2-input AND gates
should give better results in terms of overall data-path delay.
5. Routing topologies: Sometimes, when there are a lot of nets at a certain place in
the design, the routing tool can detour the nets to make the place less congested.
Thus, two logic cells might be placed very close, and still the delay can seem to be high for
both the cells: for the driver cell due to high net capacitance, and for the load cell due to poor
transition at the input. Also, net delay can be a significant component in such scenarios.
Below figure shows one such example of two AND gates situated a certain distance
apart. Ideally, there could be a straight net route between the two gates. But, due to
very high net density in the region, router tool chose to route the way as shown on the
right to help ease the congestion (this is an exaggerated scenario to help understand
better).
So, always give proper importance to net routing topology, at least for setup timing
critical nets.
6. Add repeaters: Every logic cell has a limit up to which it can drive a load capacitance;
after that, its delay starts increasing rapidly. Since net capacitance is a function of net
length, we should keep a limit on the length of net driven by a gate. Also, the net delay itself
is proportional to the square of the net length. Moreover, the transitions may be very bad in
such cases. So, it is wise to add repeater buffers after a certain distance, in order to
ensure that the signal is transferred reliably, and in time.
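Since wire delay grows roughly with the square of length, splitting a long net into buffered segments can lower the total delay. A toy model with assumed coefficients (buffer delay and wire constant are illustrative, not technology data):

```python
# Toy repeater-insertion model: (n-1) repeaters of delay t_buf plus
# n wire segments, each contributing k * (segment length)^2.
def path_delay(length, n_segments, t_buf=50.0, k=100.0):
    seg = length / n_segments
    return (n_segments - 1) * t_buf + n_segments * k * seg * seg

# For a 2-unit-long net, buffered versions beat the unbuffered one:
for n in (1, 2, 3, 4):
    print(n, path_delay(2.0, n))
```

Past a point, adding more repeaters stops helping: the fixed buffer delays outweigh the quadratic wire savings, which is why tools search for an optimal segment length.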
7. Play with clock skew: Positive skew helps improve the setup slack. So, to fix setup
violation, we may either choose to increase the clock latency of capturing flip-flop, or
decrease the clock latency of launching flip-flop. However, in doing so, we need to be
careful regarding setup and hold slack of other timing paths that are being formed
from/to these flip-flops.
8. Increase clock period: As a last resort, you may choose to time your design at
reduced frequency. But, if you are targeting a particular performance, you need a
minimum frequency. In that case, this option is not for you.
9. Improve the clk->q delay of launching flip-flop: A flip-flop with less clk->q delay
will help meeting a violating setup timing path. This can be achieved by:
Improving transition at flip-flops clock pin
Choosing a flip-flop of high drive strength. However, if by doing so, clock
transition degrades, delay can actually increase
Replacing the flip-flop with a flip-flop of same drive strength, but lower Vt
In this post, we learnt how to approach a setup violating timing path. Have you ever
used a method that is not listed above? Please share your experience in comments. We
will be happy to hear from you.
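The arithmetic behind tip 7 can be sketched quickly. Below is a minimal Python sketch (all delay values are hypothetical) showing how positive clock skew relaxes the setup slack of a reg-to-reg path:

```python
def setup_slack(t_period, t_ck_q, t_prop, t_setup, t_skew):
    """Setup slack for a full-cycle reg-to-reg path (all times in ns).

    t_skew = capture clock latency - launch clock latency.
    A positive skew gives the data extra time, improving the slack.
    """
    return t_period - (t_ck_q + t_prop + t_setup - t_skew)

# Hypothetical path: 10 ns period, 2 ns clk->q, 8 ns logic, 1 ns setup
print(setup_slack(10, 2, 8, 1, 0))    # -1 ns: setup violation
print(setup_slack(10, 2, 8, 1, 1.5))  # +0.5 ns: positive skew fixes it
```

As the sketch shows, adding skew to fix one path shifts the launch/capture relationship, which is why the slacks of other paths through the same flip-flops must be re-checked.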
How to fix hold violations
In the post setup and hold time violations, we learnt about the setup time violations and
hold time violations. In this post, we will learn the approaches to tackle hold time
violations. Following strategies can be useful in reducing the magnitude of hold violation
and bringing the hold slack towards a positive value:
1. Insert delay elements: This is the simplest thing we can do to reduce the
magnitude of a hold time violation. Inserting delay elements in the data-path increases
the data-path delay. Thus, the hold violating path's delay can be increased, and hence
the slack can be made positive, by inserting buffers in the hold violating data-path.
2. Reduce the drive strength of data-path logic gates: Replacing a cell with a similar
cell of less drive strength will certainly add delay to the data-path. However, there is a slight
chance of a decrease in data-path delay if the cell load is dominated by intrinsic
capacitance, as we discussed earlier.
3. Use data-path cells with higher threshold voltages: If you have multiple flavors of
threshold voltages in your design, the cells with higher threshold voltage will certainly
have higher delays. So, this must be the first option you must be looking for to resolve
hold violations.
4. Improve hold time of capturing flip-flop: Using a capturing flip-flop with higher
drive strength and/or lower threshold voltage will give a lower hold time requirement.
Also, degrading the transition at flip-flop's clock pin reduces its hold time requirement.
5. Play with clock skew: A positive skew degrades hold timing and a negative skew
aids hold timing. So, if a data-path is violating, we can either decrease the latency of
capturing flip-flop or increase the clock latency of launching flip-flop. However, in doing
so, we need to keep in mind the setup and hold slacks of other timing paths starting
and/or ending at these flip-flops.
6. Increase the clk->q delay of launching flip-flop: A launching flip-flop with more clk-
>q delay will help ease the hold timing of the data-path. For this, either we can decrease
the drive strength of the flip-flop or move it to higher threshold voltage.
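As a quick numeric illustration of these strategies (values hypothetical), hold slack improves as data-path delay grows or skew shrinks:

```python
def hold_slack(t_ck_q, t_prop, t_hold, t_skew):
    """Hold slack for a reg-to-reg path (all times in ns).

    Positive skew (capture clock arriving late) hurts hold timing.
    """
    return (t_ck_q + t_prop) - (t_hold + t_skew)

# Violating path: 2 ns clk->q, 1 ns logic, 2 ns hold requirement, 2 ns skew
print(hold_slack(2, 1, 2, 2))  # -1 ns: hold violation
# Strategy 1: insert delay cells worth 2 ns in the data-path
print(hold_slack(2, 3, 2, 2))  # +1 ns: violation fixed
```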
How multi-cycle paths are handled in STA: By default, in STA, all the timing paths are
considered to have default setup and hold timings; i.e., all the timing paths should be covered in
either half cycle or single cycle depending upon the nature of path (see setup-hold checks part
1 and setup-hold checks part 2 for reference). However, it is possible to convey the information
to STA engine regarding a path being multi-cycle. There is an SDC command
"set_multicycle_path" for the same. Let us elaborate it with the help of an example:
Let us assume a multi-cycle timing path (remember, it has to be ensured by architecture) wherein
both launch and capture flops are positive edge-triggered as shown in figure 3. The default setup
and hold checks for this path will be as shown in red in figure 4. We can tell STA engine to time
this path in 3 cycles instead of default one cycle with the help of set_multicycle_path SDC
command:
The above command will shift both setup and hold checks forward by two cycles. That is, setup
check will now become 3 cycle check and hold will be 2 cycle check as shown in blue in figure
4. This is because, by default, STA engine considers hold check one active edge prior to setup
check, which, in this case, is after 3 cycles.
Figure 4: Setup and hold checks before and after applying multicycle for setup-only
However, this is not the desired scenario in most of the cases. As we discussed earlier, multi-
cycle paths are achieved by either gating the clock path or data path for required number of
cycles. So, the required hold check in most cases is 0 cycle. This is done through same command
with switch "-hold" telling the STA engine to pull hold back to zero cycle check.
The above command will bring back the hold check 2 cycles back to zero cycle. This is as shown
in figure 5 in blue.
Figure 5: Setup and hold checks after applying multi-cycle exceptions for both setup and hold
Setting a multi-cycle path for setup affects the hold check by same number of cycles as
setup check in the same direction. However, applying a multi-cycle path for hold check
does not affect setup check.
So, in the above example, both the statements combined will give the desired setup and hold
checks. Please note that there might be a case where only setup or hold multi-cycle is sufficient,
but that is the need of the design and depends on how FSM has been modeled.
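The edge arithmetic described above can be captured in a few lines. This is an illustrative Python sketch (not tool code) computing, for equal launch and capture clock periods, where the setup and hold checks land after set_multicycle_path commands:

```python
def mcp_check_edges(t_period, setup_mcp=1, hold_mcp=0):
    """Return (setup_edge, hold_edge) check times for a reg-to-reg path.

    By default the setup check is at 1 cycle and the hold check one
    active edge before it.  A setup MCP of N moves the setup check to
    N cycles and drags the hold check along to N-1 cycles; a hold MCP
    of M then pulls the hold check back by M cycles.
    """
    setup_edge = setup_mcp * t_period
    hold_edge = (setup_mcp - 1 - hold_mcp) * t_period
    return setup_edge, hold_edge

# Default single-cycle behaviour, 10 ns clock
print(mcp_check_edges(10))        # (10, 0)
# set_multicycle_path 3 -setup alone: hold becomes a 2-cycle check
print(mcp_check_edges(10, 3))     # (30, 20)
# adding set_multicycle_path 2 -hold restores a 0-cycle hold check
print(mcp_check_edges(10, 3, 2))  # (30, 0)
```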
What if both clock periods are not equal: In the above example, for simplicity, we assumed
that launch and capture clock periods are equal. However, this may not be true always. As
discussed in multicycle path - the architectural perspective, it makes more sense to have multi-
cycle paths where there is a difference in clock periods. The setup and hold checks for multicycle
paths is not as simple in this case as it was when we considered both the clocks to be of same
frequency. Let us consider a case where launch clock period is twice the capture clock period as
shown in figure 6 below.
Figure 6: Default setup and hold checks for case where capture clock period is half that of launch clock
Now, the question is: while defining a multi-cycle path, which clock period will be added to the
setup check, launch or capture? The answer depends upon the architecture and FSM of the design.
Once you know it, the same can be modelled in timing constraints. There is a switch in the SDC
command to provide for which of the clock periods is to be added. "set_multicycle_path -start"
means that the path is a multi-cycle for that many cycles of launch clock. Similarly,
"set_multicycle_path -end" means that the path is a multicycle for that many cycles of capture
clock. Let the above given path be a multicycle of 2. Let us see below how it changes with -start
and -end options.
1. set_multicycle_path -start: This will cause a cycle of the launch clock to be added to the setup
check. As expected, on applying a hold multicycle path of 1, the hold check will return back to a 0 cycle
check. Figure 7 below shows the effect of the below two commands on setup and hold checks. As is
shown, the setup check gets relaxed by one launch clock cycle.
set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -start
set_multicycle_path 1 -hold -from ff1/Q -to ff2/D -start
2. set_multicycle_path -end: This will cause a cycle of capture clock to be added in setup
check. As expected, on applying a hold multicycle path of 1, the hold will return back to 0 cycle
check. Figure 8 below shows the effect of below two commands on setup and hold checks. As is
shown, setup gets relaxed by one cycle of capture clock.
set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -end
set_multicycle_path 1 -hold -from ff1/Q -to ff2/D -end
Figure 8: Setup and hold checks with -end option provided with set_multicycle_path
Why is it important to apply multi-cycle paths: To achieve optimum area, power and
timing, all the timing paths must be timed at the desired frequencies. Optimization
engine will know about a path being multicycle only when it is told through SDC
commands in timing constraints. If we don't specify a multicycle path as multicycle,
optimization engine will consider it as a single cycle path and will try to use bigger drive
strength cells to meet timing. This will result in more area and power; hence, more cost.
So, all multicycle paths must be correctly specified as multicycle paths during timing
optimization and timing analysis.
Figure 1(a): Positive level- Figure 1(b): Timing waveform for a positive level-
sensitive latch sensitive latch
Negative level-sensitive latch: A negative level-sensitive latch follows the input data
when enable is '0' and holds its output when enable is '1'.
Figure 2(a): Negative level- Figure 2(b): Timing waveform for a negative level-
sensitive latch sensitive latch
Latch timing arcs: Data can propagate to the output of the latch in two ways as
discussed below:
Out changes with Data: This happens when enable is in its asserted state (for
example, for a positive level latch). When this happens, Out follows Data as there is a
direct path between Data and Out when Enable is '1'. This scenario is depicted in
figures 1(b) and 2(b) above wherein out is shown toggling when Data toggles. The latch
is, thus, said to have a timing arc from Data to Out.
Out changes with Enable: This happens when Data at input changes when
Enable is in its de-asserted state. When this happens, latch waits for Enable to be
asserted, then, follows the value of Data. As figure 3 shows, Data had become stable a
lot earlier, but out toggled only when enable became asserted. So, in latches, there
exists a timing arc from Enable to Out.
Figure 3:When data changes during enable is in de-asserted state, output waits for the enable to assert. Only then,
the effect of input propagated to output
Relation between Data and Enable: If Data toggles very close to the closing
edge of Enable, then, there might be a confusion as if its effect will be propagated to
output or not (as discussed later in this post). To make things more deterministic, we
impose a certain condition that Data should not toggle when Enable is getting de-
asserted. This relationship can be modelled as setup and hold arcs. So, there are setup
and hold timing arcs between data and enable pins of a latch. These will be discussed
below in detail.
Setup time and hold time for a latch: The most commonly used latch circuit is that
built using inverters and transmission gates. Figure 4 shows the transmission gate
implementation of a positive level-sensitive latch. The Enable has been shown as CLK
as usually is the case in sequential state machines. This circuit has two phases, as is
expected for a latch:
When CLK = '1', Transmission gate at the input gets ON and there is a direct
path between Data and Out
When CLK = '0', transmission gate in the loopback path gets ON. Out holds its
value
Now, when CLK transitions from '1' to '0', it is important that Data does not toggle. The
time before the clock falling edge that Data should remain stable is known as latch
setup time. Similarly, the time after the clock falling edge that Data should remain stable
is called latch hold time.
Let us go into the details of what latch setup and hold time should be for transmission
gate latch. If we want the data to be propagated properly to the output, then Data should
be stable for atleast some time before closing of the input transmission gate. This time
is such that it goes into the memory of latch; i.e., before input transmission gate closes,
Data should traverse both the inverters of the loop. So, setup time of the latch involves
the delay of input transmission gate and the two inverters. Figure 5 below shows the
setup time for the latch.
Figure 5: Setup time for latch
Similarly, if we do not want the data to propagate to output, it must not cross input
transmission gate so that it does not disturb the present state of the latch. This serves
as the hold time for the latch. Assuming CLK' takes one inverter delay, input
transmission gate will close after one inverter delay only. So, the hold time for Data is
one inverter delay minus transmission gate delay. Please refer to figure 6 below for the
illustration of this. (CLK)' is formed from CLK after a delay equivalent to an inverter
delay. Only then, input transmission gate will switch off. If we want the data not to
propagate to Out, we have to ensure that it does not cross input transmission gate. So,
Data should not be present at the transmission gate's input at time (T(inv) - T(tg)). In
other words, it has to be held stable this much time after CLK edge. This is the hold time
for the latch.
Figure 6: Hold time for latch
Please note that there are other topologies also possible for latches such as dynamic
latches etc. The setup time and hold time calculations for such topologies will vary, but
the underlying principle will remain same, which is as follows:
Setup time ensures that the data propagates to the output at the coming clock
edge
Hold time ensures that the data does not propagate to the output at the
present/previous clock edge
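Under the transmission-gate model above, the setup and hold times follow directly from the element delays. A small sketch (the delay numbers are hypothetical):

```python
def latch_setup(t_tg, t_inv):
    """Setup time: data must cross the input transmission gate and
    both loop inverters before the gate closes."""
    return t_tg + 2 * t_inv

def latch_hold(t_tg, t_inv):
    """Hold time: the input gate actually closes one inverter delay
    after CLK falls, so data must be held T(inv) - T(tg) after the edge."""
    return t_inv - t_tg

# Hypothetical delays: 20 ps per transmission gate, 30 ps per inverter
print(latch_setup(20, 30))  # 80 ps
print(latch_hold(20, 30))   # 10 ps
```

Note that if the transmission gate were slower than the inverter, the computed hold time would be negative, i.e. data could safely change slightly before the clock edge.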
Setup checks and hold checks for latches: As discussed above, the decision for the
data to be latched or not to be latched is made at the closing edge. So, the setup and
hold checks are with respect to latch closing edge only. However, since, latches are
transparent during half of the clock period, we can assume as if the capturing edge is
flexible and stretches all over the active level of the latch. This property enables a very
beautiful concept known as "time borrowing" for latches.
Positive level-sensitive latch: A positive level-sensitive latch follows the input data
signal when enable is '1' and holds its output when enable is '0'. Figure 1
above shows the symbol and the timing waveforms for such a latch. As can be seen,
whenever enable is '1', out follows the data input; and when enable is '0', out remains
the same.
Similarly, for other types of synchronizers as well, you can specify false paths.
False paths for static signals arising due to merging of modes: Suppose you have a structure
as shown in figure 1 below. You have two modes, and the path to multiplexer output is different
depending upon the mode. However, in order to cover timing for both the modes, you have to
keep the “Mode select bit” unconstrained. This results in paths being formed through the multiplexer
select also. You can specify a false path through the select of the multiplexer using
"set_false_path", as this will be static in both the modes, if there are no special timing
requirements related to mode transition on this
signal. Specifically speaking, for the scenario shown in figure 1,
Architectural false paths: There are some timing paths that are never possible to occur. Let us
illustrate with the help of a hypothetical, but very simplistic example that will help understand
the scenario. Suppose we have a scenario in which the select signals of two 2:1 multiplexers are
tied to same signal. Thus, there cannot be a scenario where data through in0 pin of MUX0 can
traverse through in1 pin of MUX1. Hence, it is a false path by design architecture. Figure 3
below depicts the scenario.
Specifying false path: The SDC command to specify a timing path as false path is "set_false_path".
We can apply false path in following cases:
From register to register paths
o set_false_path -from regA -to regB
Paths being launched from one clock and being captured at another
o set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
Through a signal
o set_false_path -through [get_pins AND1/B]
STA problem: Maximum frequency of operation of a
timing path
Problem: Figure 1 below shows a timing path from a positive edge-triggered register to a
positive edge-triggered register. Can you figure out the maximum frequency of operation for this
path?
In this post, we talked about the frequency of operation of single cycle timing paths. Can
you figure out the maximum frequency of operation for half cycle timing paths? Also, is
there a relation between maximum operating frequency and hold timing? Can you think
about this situation?
Solution:
To check if a timing path violates setup and/or hold, we need to check if they satisfy
setup and hold equations. A violating timing path has a negative setup/hold slack value.
The above circuit has a positive clock skew of 1 ns (as capture flip-flop gets clock 1 ns
later than launch flip-flop).
Let us first check for setup violation. As we know, for a full cycle register-to-register
timing path, setup equation is given as:
Tck->q + Tprop + Tsetup - Tskew < Tperiod
Here,
Tck->q = 2 ns, Tprop (max value of combinational propagation delay) = 4 ns, Tsetup = 1 ns,
Tperiod = 10 ns, Tskew = 1 ns
Now, Tck->q + Tprop + Tsetup - Tskew = 2 + 4 + 1 - 1 = 6 ns < Tperiod
So, the above circuit does not have a setup violation. The setup slack, in this case, will
be given as:
SS = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)
SS = +4 ns
Since, setup slack comes out to be positive, this path does not have a setup violation.
Now, let us check if there is a hold violation for this timing path. The hold timing equation is
given as:
Tck->q + Tprop > Thold + Tskew
Here,
Tck->q = 2 ns, Tprop (min value of combinational propagation delay) = 2 ns, Thold = 2ns,
Tskew = 1 ns
Now, Tck->q + Tprop = 2 ns + 2 ns = 4 ns
And Thold + Tskew = 2 ns + 1 ns = 3 ns
Now, 4 ns > 3 ns, so this circuit does not have a hold violation. The hold slack, in this
case, will be given as:
HS = Tck->q + Tprop - (Thold + Tskew) = +1 ns
Since, hold slack comes out to be positive, this path does not have a hold violation.
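The same calculation, and the maximum frequency asked for in the problem, can be checked with a short script using the given numbers:

```python
def setup_slack(t_period, t_ck_q, t_prop_max, t_setup, t_skew):
    """Setup slack for a full-cycle reg-to-reg path (times in ns)."""
    return t_period - (t_ck_q + t_prop_max + t_setup - t_skew)

def hold_slack(t_ck_q, t_prop_min, t_hold, t_skew):
    """Hold slack for the same path (times in ns)."""
    return (t_ck_q + t_prop_min) - (t_hold + t_skew)

# Values from the problem (in ns)
print(setup_slack(10, 2, 4, 1, 1))  # +4 ns, no setup violation
print(hold_slack(2, 2, 2, 1))       # +1 ns, no hold violation

# Maximum frequency: the smallest period keeping setup slack non-negative
t_period_min = 2 + 4 + 1 - 1        # 6 ns
print(1000 / t_period_min)          # ~166.67 MHz
```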
As we discussed in data setup and data hold checks, data setup check of 200 ps means
that constrained data should come at least 200 ps before the reference data. Similarly,
data hold check of 200 ps constrains the constrained data to come at least 200 ps after
the reference data. The same is shown pictorially in figure 1(a) and 1(b).
Figure 1(a): Data setup check of 200 ps constrains the constrained signal to toggle at-least 200 ps before reference
signal toggles.
Figure 1(b): Data hold check of 200 ps constrains the constrained signal to toggle at-least 200 ps after the reference
signal has toggled.
Now, suppose you apply a data setup check of -200 ps instead of 200 ps. This would
mean that the constrained signal can toggle up to 200 ps after the reference signal.
Similarly, a data hold check of -200 ps would mean that the constrained signal can
toggle from 200 ps before the reference signal. If we apply both the checks together, it
would infer that constrained signal can toggle in a window that ranges from 200 ps
before the toggling of reference signal to 200 ps after the toggling of reference signal.
This is pictorially shown in figures 2(a) and 2(b).
Figure 2(a): Negative data setup and hold checks of 200 ps
If we combine the two checks, it implies that the constrained signal can toggle anywhere
from 200 ps before to 200 ps after the reference signal. In other words, we have
constrained the constrained signal to toggle in a window of +/- 200 ps around the
reference signal.
Coming to the given problem, if there are a number of signals required to toggle within a
window of 200 ps, we can consider one of these to act as reference signal and other
signals as constrained signals. The other signals can then be constrained in both setup
and hold with respect to reference signal such that all of these lie within +-100 ps of the
reference signal. The same is shown in figure 3 below:
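The window technique above reduces to a simple predicate. A hypothetical sketch checking whether a set of signal arrival times all fall within +/- 100 ps of a chosen reference signal:

```python
def within_window(arrivals, reference, window=100):
    """A data setup check of -window and a data hold check of -window
    together allow each constrained signal to toggle anywhere within
    +/- window of the reference signal's toggle time (times in ps)."""
    return all(abs(t - reference) <= window for t in arrivals)

# Arrival times in ps; the first signal is taken as the reference
print(within_window([960, 1000, 1120], reference=1000))  # False: 1120 is too late
print(within_window([960, 1000, 1040], reference=1000))  # True: all within 100 ps
```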
Many a time, two or more signals at an analog-digital interface or at the chip interface
have some timing requirement with respect to each other. These requirements are
generally in the form of minimum skew and maximum skew. Data checks come into the
picture between two arbitrary data signals, neither of which is a clock. One of these is
called the constrained pin, which is like the data pin of a flop. The other is called the
related pin, which is like the clock pin of a flop. The figure below shows two data signals
at a boundary (possibly an analog hard macro) having some minimum skew requirement between them.
The difference between a normal setup check (between a clock signal and a data signal) and
data-to-data checks is that data-to-data checks are zero cycle checks while a normal setup
check is a single cycle check. When we say that data checks are zero-cycle checks, we mean
that these are between two data signals that have launched at the same clock edge with
respect to each other.
As shown in the figure (i) below, traditional setup check is between a data signal
launched at one clock edge and a clock. Since, the data is launched at one clock edge
and is checked with respect to one edge later, it is a single cycle check. On the other
hand, as shown in figure (ii), data-to-data check is between two signals both of which
are launched on the same clock edge. Hence, we can say data-to-data checks are zero
cycle checks.
Figure 2: (i) Normal setup check between a data signal and a clock signal, (ii) Data-to-data
setup check between two data signals
What command in EDA tools is used to model data-to-data checks: Data checks
can be specified with the SDC command "set_data_check", for example:
set_data_check -from A -to B -setup x
set_data_check -from A -to B -hold y
Here, A is the related pin and B is the constrained pin. The first command constrains B
to toggle at least ‘x’ time before ‘A’. The second command constrains ‘B’ to toggle at least ‘y’
time after ‘A’.
Data setup time and data hold time: Similar to setup time and hold time, we can
define data setup time and data hold time with respect to the related pin.
Modeling data-to-data checks through liberty file: We can model data checks
through .lib also. There are constructs that can be used to model data-to-data checks.
These include non_seq_setup_rising, non_seq_setup_falling, non_seq_hold_rising and
non_seq_hold_falling. These constructs specify setup and hold data-to-data checks with
respect to the reference signal. ‘rise_constraint’ and ‘fall_constraint’ can be used inside these
to model the setup and hold checks for the rising and falling edge of the constrained signal.
Figure 3 below shows an example of modeling a data setup check through a liberty file.
Figure 3 : Modeling data-to-data checks through .lib using non_seq_setup_rising
In which cases data-to-data checks are applied: Data checks are normally applied
at analog-digital or chip interfaces, or in case of a potential race
condition (where the order of arrival of two signals can affect the output and the intention is
to get one of the probable outputs by constraining one signal to come before the other).
How data checks are useful: As we have seen above, data checks provide a
convenient measure to constrain two or more data signals with respect to each other.
Had these checks not been there, it would have been a manual effort to check the skew
between the two arriving signals and to maintain it. Also, it would not have been
possible to get the optimization done through the implementation tool, as these
requirements would not have been visible to it.
Types of clock gating checks: Fundamentally, all clock gating checks can be
categorized into two types:
AND type clock gating check: Let us say we have a 2-input AND gate in which one of
the inputs has a clock and the other input has a data which will toggle while the clock is
still on.
Since, the clock is free-running, we have to ensure that the change of state of enable
signal does not cause the output of the AND gate to toggle. This is only possible if the
enable input toggles when clock is at ‘0’ state. As is shown in figure 3 below, if ‘EN’
toggles when ‘CLK_IN’ is high, the clock pulse gets clipped. In other words, we do not
get full duty cycle of the clock. Thus, this is a functional architectural miss causing glitch
in clock path. As is evident in figure 4, if ‘EN’ changes during ‘CLK_IN’ are low, there is
no change in clock duty cycle. Hence, this is the right way to gate a clock signal with an
enable signal; i.e. make the enable toggle only when clock is low.
Figure 3: Clock being clipped when ‘EN’ changes when ‘CLK_IN’ is high
Figure 4: Clock waveform not being altered when ‘EN’ changes when ‘CLK_IN’ is low
Theoretically, ‘EN’ can launch from either positive edge-triggered or negative edge-
triggered flops. In case ‘EN’ is launched by a positive edge-triggered flop, the setup and
hold checks will be as shown in figure 5. As shown, setup check in this case is on the
next positive edge and hold check is on the next negative edge. However, the ratio of
maximum and minimum delays of cells across extreme operating conditions may be as high
as 3. So, architecturally, this arrangement cannot guarantee glitch-free propagation of the
clock under all conditions.
Figure 5: Clock gating setup and hold checks on AND gate when 'EN' launches from a positive edge-triggered flip-
flop
On the contrary, if ‘EN’ launches from a negative edge-triggered flip-flop, setup check
are formed with respect to the next rising edge and hold check is on the same falling
edge (zero-cycle) as that of the launch edge. The same is shown in figure 6. Since, in
this case, hold check is 0 cycle, both the checks are possible to be met for all operating
conditions; hence, this solution will guarantee the clock to pass under all operating
condition provided the setup check is met for worst case condition. The inactive clock
state, as evident, in this case, is '0'.
Figure 6: Clock gating setup and hold checks on AND gate when ‘EN’ launches from negative edge-triggered flip-
flop
OR type clock gating check: Similarly, since the off-state of OR gate is 1, the enable
for an OR type clock gating check can change only when the clock is at ‘1’ state. That
is, we have to ensure that the change of state of enable signal does not cause the
output of the OR gate to toggle. Figure 8 below shows if ‘EN’ toggles when ‘CLK_IN’ is
high, there is no change in duty cycle. However, if ‘EN’ toggles when ‘CLK_IN’ is low
(figure 9), the clock pulse gets clipped. Thus, ‘EN’ must be allowed to toggle only when
‘CLK_IN’ is high.
Figure 8: Clock waveform not being altered when 'EN' changes when 'CLK_IN' is high
Figure 9: Clock being clipped when 'EN' changes when 'CLK_IN' is low
As in case of AND gate, here also, ‘EN’ can launch from either positive or negative edge
flops. In case ‘EN’ launches from negative edge-triggered flop, the setup and hold
checks will be as shown in the figure 10. The setup check is on the next negative edge
and hold check is on the next positive edge. As discussed earlier, this arrangement cannot
guarantee glitch-free propagation of the clock.
Figure 10: Clock gating setup and hold checks on OR gate when ‘EN’ launches from negative edge-triggered flip-
flop
If ‘EN’ launches from a positive edge-triggered flip-flop, setup check is with respect to
next falling edge and hold check is on the same rising edge as that of the launch edge.
The same is shown in figure 11. Since, the hold check is 0 cycle, both setup and hold
checks are guaranteed to be met under all operating conditions provided the path has
been optimized to meet setup check for worst case condition. The inactive clock state,
evidently, in this case, is '1'.
Figure 11: Clock gating setup and hold checks on OR gate when 'EN' launches from a positive edge-
triggered flip-flop
We have, thus far, discussed two fundamental types of clock gating checks. There may
be complex combinational cells other than 2-input AND or OR gates. However, for these
cells, too, the checks we have to meet between the clock and enable pins will be of the
above two types only. If the enable can change during low phase of the clock only, it is
said to be AND type clock gating check and vice-versa.
SDC command for application of clock gating checks: In STA, clock gating checks
can be applied with the help of SDC command set_clock_gating_check.
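The two gating rules above can be demonstrated with a tiny waveform simulation. This is an illustrative Python sketch (not tool code): it ANDs a sampled clock with an enable and reports the widths of the resulting high pulses, so a clipped pulse shows up as a short run:

```python
def gate_and(clk, en):
    """AND-type clock gate: output is clk AND en, sample by sample."""
    return [c & e for c, e in zip(clk, en)]

def pulse_widths(wave):
    """Lengths of consecutive runs of 1s (high pulses) in the waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths

clk = [1, 1, 0, 0, 1, 1, 0, 0]      # 2-sample-wide high pulses

# EN toggles while CLK is low: full-width pulses pass, no glitch
en_good = [0, 0, 0, 1, 1, 1, 1, 1]
print(pulse_widths(gate_and(clk, en_good)))  # [2]

# EN rises while CLK is high: the first pulse is clipped to 1 sample
en_bad = [0, 1, 1, 1, 1, 1, 1, 1]
print(pulse_widths(gate_and(clk, en_bad)))   # [1, 2]
```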
Why multi-cycle paths are introduced in designs: A typical System on Chip consists
of many components working in tandem, each working at a different frequency
depending upon performance and other requirements. Ideally, the designer would want
the maximum possible throughput from each component while respecting power, timing
and area constraints. The designer may think of introducing
multi-cycle paths in the design in one of the following scenarios:
1) Very large data-path limiting the frequency of the entire component: Let us take a
hypothetical case in which one of the components is to be designed to work at 500
MHz; however, one of its data-paths is too long to work at this frequency. Let us say the
minimum delay this data-path can achieve is 3 ns. Thus, if we treat all
paths as single-cycle, the component cannot work at more than 333 MHz; however, if
we ignore this one path, the rest of the design can attain 500 MHz without much difficulty.
We can therefore sacrifice this path alone so that the rest of the component works at 500
MHz: make that particular path a multi-cycle path of 2, so that it effectively
works at 250 MHz, sacrificing performance for that one path only.
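The numbers in this example can be checked quickly (a sketch using the values quoted above):

```python
# Values from the text: a 3 ns data-path in a component targeted at 500 MHz.
clock_period_ns = 1e3 / 500        # 500 MHz -> 2 ns period
path_delay_ns = 3.0

# As a single-cycle path, this path limits the whole component to:
single_cycle_fmax_mhz = 1e3 / path_delay_ns   # ~333 MHz

# As a multi-cycle path of 2, the path gets two clock periods:
available_ns = 2 * clock_period_ns            # 4 ns >= 3 ns, so timing is met
# and that one path effectively updates at:
path_rate_mhz = 500 / 2                       # 250 MHz
```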
2) Paths starting from a slow clock and ending at a fast clock: For simplicity, let us
suppose there is a data-path with one start-point and one end-point, where the start-
point receives a clock that is half the frequency of the end point's clock. Now, the start-
point can only send data at half the rate at which the end point can receive it, so
there is no gain in running the end-point at double the clock frequency. Also, since the
data is launched only once every two cycles, we can modify the architecture such that the
data is received after a gap of one cycle. In other words, instead of a single-cycle data-
path, we can afford a two-cycle data-path in such a case. This will actually save power,
as the data-path now has two cycles to traverse to the endpoint, so cells of lower drive
strength, with less area and power, can be used. Also, if the multi-cycle has been
implemented through a clock enable (discussed later), clock power will also be saved.
Now let us extend this discussion to the case wherein the launch clock is half the
frequency of the capture clock. Let us say the enable changes once every two cycles.
Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock
(the capture clock here). As is evident from the figure below, it is important that the enable
signal takes the proper waveform, as shown on the right-hand side of figure 2. In this
case, the setup check will be two cycles of the capture clock and the hold check will be a zero-cycle check.
Figure 2: Introducing multi-cycle path where launch clock is half in frequency to capture
clock
2) Through gating in the clock path: Similarly, we can make the capturing flop capture
data once every few cycles by clipping the clock; in other words, by sending to the
capturing flip-flop only those clock pulses at which we want the data to be captured. This can be
done similarly to the data-path masking discussed in point 1, with the only difference being
that the enable now masks the clock signal going to the capturing flop. This kind of
gating is more advantageous in terms of power saving: since the capturing flip-flop
does not receive the clock signal in the masked cycles, we save clock power too.
Figure 3: Introducing multi cycle paths through gating the clock path
Figure 3 above shows how multicycle paths can be achieved with the help of clock
gating. The enable signal, in this case, launches from a negative edge-triggered register
due to architectural reasons. With the enable waveform as shown in figure 3, the flop
will get a clock pulse once in every four cycles. Thus, we can have a multicycle
path of 4 cycles from launch to capture. The setup and hold checks, in this case, are
also shown in figure 3. The setup check will be a 4-cycle check, whereas the hold check will
be a zero-cycle check.
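As a cycle-level toy illustration (the variable names and cycle count are my own, and edge-level behavior is abstracted away), an enable that is high for one cycle in every four gives the capturing flop one clock pulse, and hence one capture, per four cycles:

```python
# Sketch: gate the clock with an enable that is high 1 cycle in every 4.
N_CYCLES = 8
enable = [1 if (c % 4 == 0) else 0 for c in range(N_CYCLES)]

# The capturing flop sees a pulse only in cycles where enable is high,
# so it captures only in those cycles: a multicycle path of 4.
capture_cycles = [c for c in range(N_CYCLES) if enable[c]]
# -> captures at cycles 0 and 4
```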
Because the collector current of transistor Q1 is fed as the input base current to transistor
Q2, the collector current of Q2 is Ic2 = β2 × Ib2, and this collector current Ic2 is in turn fed
as the input base current Ib1 to transistor Q1. In this way the two transistors feed back into
each other, and the collector current of each goes on multiplying.
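As a rough numeric sketch of this regenerative feedback (the beta values and seed current are assumed purely for illustration), each trip around the Q1/Q2 loop multiplies the current by β1 × β2, so whenever that product exceeds 1 the current grows until limited externally, which is latchup:

```python
beta1, beta2 = 5.0, 3.0     # assumed current gains of the two parasitic BJTs
i = 1e-6                    # assumed small seed current, in amperes

currents = []
for _ in range(5):          # five trips around the feedback loop
    i *= beta1 * beta2      # each trip multiplies the current by beta1*beta2
    currents.append(i)

# loop gain = 15 > 1, so the current grows ~15x per trip
```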
Surrounding PMOS and NMOS transistors with an insulating oxide layer (trench).
This breaks parasitic SCR structure.
Latchup Protection Technology circuitry which shuts off the device when latchup
is detected.
A semiconductor chip undergoes synthesis, placement, clock tree synthesis and routing
before going for fabrication. All these processes take time; hence, a new chip typically
takes 9 months to 1 year (for a normal-sized chip) to be ready for fabrication. As a result
of cut-throat competition, all semiconductor companies stress cycle-time reduction to stay
ahead in the race. New ways are constantly being found to achieve this: new techniques
are developed and more advanced tools are used. Sometimes, the new chip to be produced
may be an incremental change over an existing product. In such a case, there may be no
need to go over the same cycle of complete synthesis, placement and routing. Instead,
everything may be carried out in an incremental manner so as to reduce engineering cost,
time and manufacturing cost.
It is a known fact that the fabrication process of a VLSI chip involves manufacture of a
number of masks, each mask corresponding to one layer. There are two kinds of layers
– base and metal. Base layers contain the information regarding the geometry and
kind of transistors, resistors, capacitors and other devices. Metal layers contain
information regarding metal interconnects used for connection of devices. For a
sub-micron technology, the mask costs may be greater than a million dollars.
Hence, to minimize cost, the tendency is to reuse as many masks as possible, so the
ECO is implemented with the minimal number of layer changes. Also, due to cycle-time
crunch, it is common practice to send the base layers for mask manufacture while the
metal layers are still being modified to eliminate any remaining DRC violations. This
saves around two weeks of cycle time, as the base-layer masks are developed while the
metal layers are still being finalized.
What conditions cause an Engineering Change Order: As mentioned above, ECOs
are needed when the process steps have to be executed in an incremental manner. This
may be because:
There may be some design bug that was caught very late in the design cycle and
needs to be fixed. It is very costly, in terms of both time and money, to re-run all the
process steps for each bug; hence, such changes are taken incrementally.
It is also common for design enhancements or functional bug fixes to be
implemented after the design has already been sent for fabrication. For instance, a
functional bug may be caught in silicon itself. To fix it, it is not practical to restart
the whole cycle.
The ECO process starts with implementing the change into the RTL. The netlist
synthesized from the modified RTL is then compared with the golden netlist. The logic
causing the difference is then implemented into the main netlist, which then undergoes
placement of the incremental logic, clock tree modifications and routing optimizations
based upon the requirements.
Kinds of ECO: The engineering change orders can be classified into two categories:
All layers ECO: In this, the design change is implemented using all layers, both
base and metal. It is needed whenever the change cannot be contained in a few layers,
e.g. when a hard macro cell is updated or when the change requires updating hundreds
of cells; it is almost impossible to contain such a large change to a few layers only. Since
new masks are needed for every layer, this kind of ECO costs more in cycle time and
engineering cost.
Metal-only ECO: In this, the design change is confined to the metal (interconnect)
layers, typically by wiring in spare cells already present in the design. The base-layer
masks are reused, so this kind of ECO provides an advantage in terms of cycle time and
mask cost.
Steps to carry out an ECO: Although some automated ways exist to carry out
functional ECOs, the most efficient and effective method is still manual implementation.
Generally, the following steps are employed to carry out Engineering Change Orders:
1. The RTL with the ECO implemented is synthesized and compared with the
golden netlist.
2. The delta is implemented into the golden netlist. The modified netlist is
then compared again with the synthesized netlist to ensure the logic has been
implemented correctly.
Spare Cells
We have discussed in our post titled 'Engineering Change Order' the importance of
having a uniform distribution of spare cells in the design. Nowadays, there is a trend
among VLSI corporations to implement metal-only functional and timing ECOs due
to their low cost. Let us discuss spare cells in a bit more detail here.
Figure showing spare cells in the design
Spare cells are put onto the chip during implementation keeping in view the possibility
of future modifications that may be carried out in the design without disturbing the
base layers. Carrying out design changes with minimal layer changes saves a lot of
fabrication cost, as each layer mask has a significant cost of its own. Let us start by
defining what a spare cell is. A spare cell can be thought of as a redundant cell that is
not currently used in the design. It may come into use later, but for now it sits without
doing any job and does not contribute to the functionality of the device. We can compare
a spare cell to a spare wheel carried in a car, to be used in case one of the wheels gets
punctured; the spare wheel then replaces the main wheel. Similarly, a spare cell can be
used to replace an existing cell if the situation demands (e.g. to meet timing). However,
unlike spare wheels, spare cells may also be wired into the design without replacing any
existing cell, as the need arises.
Kinds of spare cells: There are many variants of spare cells in the design. Designs are
full of spare inverters, buffers, nand, nor and specially designed configurable spare
cells. However, based on the origin of spare cells, these can be divided into two broad
categories:
Those used deliberately as spare cells in the design: As discussed earlier,
most designs today have spare cells sprinkled uniformly. These cells have their inputs
tied to either ‘0’ or ‘1’ so that they contribute the minimum possible static and dynamic
power.
Those converted into spare cells due to design changes: There may be a
case where a cell now identified as a spare was a functional cell in the past; due
to some design change, it might have been replaced by another cell. Cells with
floating outputs can likewise be used as spare cells. We can even use in-circuit
buffers as spare cells, provided removing the buffer does not introduce any setup/hold
violation in the design.
Advantages of using spare cells in the design: Introduction of spare cells into the
design offers several advantages such as:
Reusability: A design change can be carried out using metal layers only. So, the
base layers can be re-used for fabrication of new chips.
Design flexibility: As there are spare cells, small changes can be taken into the
design without much difficulty. Hence, the presence of spare cells provides flexibility to
the design.
Cycle time reduction: Nowadays, there is a trend to tape out the base layers to the
foundry first, so that the base-layer masks are prepared in parallel while the
timing violations/design changes are still being fixed in the metal layers. Hence, there is
a cycle-time reduction of one to two weeks.
Disadvantages of using spare cells: In addition to their many advantages, spare
cells have some disadvantages too. These are:
Contribution to static power: Each spare cell has its own static power dissipation;
hence, more spare cells contribute more to total power. In general, this amount is
insignificant in comparison to total power, but spare cells should still be added keeping
their power contribution in consideration.
Area: Spare cells occupy area on the chip, so more spare cells mean higher
cell density.
Thus, we have discussed spare cells here. Spare cells are used in almost every design
manufactured today, and it is important to make an intelligent selection of the spare
cells to be sprinkled in the design. Many technical papers have been published on their
importance and on structures of configurable spare cells that can be used as any logic
gate. In general, a collection of nand/nor/inverters/buffers is sprinkled more or less
uniformly. Modules where more ECOs are expected (like a new architecture being used
for the first time) should be sprinkled with more spare cells. On the contrary, those with
stable architectures are usually sprinkled with fewer spare cells, as the probability of
an ECO in their vicinity is very low.
transfer of an integral number of electrons, current can only take one of numerous discrete values, and not
just any value. Let us take an illustration. The charge on an electron, represented as ‘e’, is 1.6E-19 C (or
0.00000000000000000016 C). It is the smallest charge ever discovered, and it is well known
that charge can exist only in multiples of ‘e’. Thus, electric charge is a digital quantity with the smallest
unit ‘e’. When we say that the value of charge at a point is +1 C, we actually mean that the charge is caused
by the transfer of 6250000000000000000 electrons. Since the smallest unit of charge is
0.00000000000000000016 C, there cannot exist any charge of value
1.00000000000000000015 C, since that would make the number of electrons a fraction. Since the
magnitude of 1 C is very large compared to the charge of one electron, charge appears to us as continuous
and not discrete. For us, there is no difference between 1.00000000000000000015 and 1, as the devices we
use don’t measure with that much precision. Hence, we perceive these quantities as analog. Similar is the
case with other physical quantities.
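The electron counting above can be checked with exact rational arithmetic (a sketch; `Fraction` is used to avoid floating-point rounding at these magnitudes):

```python
from fractions import Fraction

e = Fraction(16, 10**20)          # 1.6E-19 C, the electronic charge

# Number of electronic charges in 1 C:
electrons_per_coulomb = Fraction(1) / e
# -> exactly 6250000000000000000, matching the figure in the text

# A hypothetical charge of 1.00000000000000000015 C would require a
# fractional number of electrons, so such a charge cannot exist:
q = Fraction(100000000000000000015, 10**20)
assert (q / e).denominator != 1   # not a whole number of electrons
```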
Many laws formed by great scientists postulate the quantization of basic physical quantities. For
example, quantum theory states that the angular momentum of an electron in the orbit of an atom is
quantized: it can take only specified values given as multiples of h/2π. Thus, the smallest angular
momentum an electron can have is h/2π, and the angular momentum can increase only in steps of h/2π.
If we take h/2π as one unit, then we can say that the angular momentum of an electron is a digital
quantity. Similarly, light is known to consist of photons; according to Planck’s quantum theory, light
intensity is an integral multiple of the intensity of a single photon. Thus, light is also inherently a digital
quantity. And, as stated above, charge is quantized as well.
But there are some physical quantities whose quantization is yet to be established. Mass is one of them,
though it is believed that the quantization of mass will be established soon.
Thus, we have seen that most known physical quantities are digital at the microscopic level. Since we
encounter them at the macroscopic level, with billions and billions of basic units, their increments seem
continuous to us: the smallest incremental unit is negligible in comparison to the actual measure of the
quantity, and we perceive them as analog in nature.
Thus, we can conclude that most of the quantities in this world are digital in their blood. Once the
quantization of mass is established, we can conclude with surety that digital lies in the soul of this world.
This ‘digital’ is similar to our definition of digital systems; the difference is just that it occurs at so minute
a scale that we cannot perceive it on our own.
One important characteristic of static timing analysis that must be discussed is that it
checks the timing requirements of the circuit without applying any input vectors; the
delays calculated are therefore the maximum and minimum bounds of the delays that will
occur in real application scenarios with vectors applied. This enables static timing
analysis to be fast and inclusive of all the boundary conditions. Dynamic timing analysis,
on the contrary, applies input vectors, so it is very slow; it is, however, necessary for
certifying the functionality of the design. Thus, static timing analysis guarantees the
timing of the design, whereas dynamic timing analysis verifies functionality for real
application-specific input vectors.
However, a net can have negative delay if there is crosstalk, as crosstalk can
improve the transition on the net. In other words, in the presence of crosstalk, the
output can reach its 50% level before the input reaches its 50% level; hence, the
propagation delay of the net is negative.
What are scan chains: Scan chains are the elements in scan-based designs that are
used to shift test data in and out. A scan chain is formed by a number of flops
connected back to back, with the output of one flop connected to the scan input of the
next. The input of the first flop is connected to an input pin of the chip (called scan-in)
from which scan data is fed, and the output of the last flop is connected to an output pin
of the chip (called scan-out) which is used to take the shifted data out. The figure below
shows a scan chain.
A scan chain
Purpose of scan chains: As said above, scan chains are inserted into designs to shift
the test data into the chip and out of the chip. This is done in order to make every point
in the chip controllable and observable as discussed below.
How a normal flop is transformed into a scan flop: The flops in the design have to be
modified in order to be put in scan chains. To do so, the normal data input (D) of the flip-
flop is multiplexed with the scan input; a signal called scan-enable controls which input
propagates to the output.
If scan-enable = 0, data at the D pin of the flop will propagate to Q at the next active edge.
If scan-enable = 1, data present at the scan-in input will propagate to Q at the next active
edge.
Scan terminology: Before we talk further, it will be useful to know some signals used in
scan chains which are as follows:
Scan-in: Input to the flop/scan-chain that is used to provide scan data into it
Scan-out: Output from flop/scan-chain that provides the scanned data to the
next flop/output
Scan-enable: Input to the flop that controls whether scan_in data or functional
data will propagate to output
Purpose of testing using scan: Scan testing is carried out for various reasons, the two
most prominent of which are:
To test stuck-at faults in manufactured devices
To test the paths in the manufactured devices for delay; i.e. to test whether each
path is working at functional frequency or not
How a scan chain functions: The fundamental goal of scan chains is to make each
node in the circuit controllable and observable through a limited number of patterns by
providing a bypass path to each flip-flop. Basically, testing follows these steps:
1. Assert scan_enable (make it high) so as to enable (SI -> Q) path for each
flop
2. Keep shifting in the scan data until the intended values at intended nodes
are reached
3. De-assert scan_enable (for one pulse of clock in case of stuck-at testing
and two or more cycles in case of transition testing) to enable D->Q path so that
the combinational cloud output can be captured at the next clock edge.
4. Again assert scan_enable and shift out the data through scan_out
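The four steps above can be sketched with a toy model. The flop count, the shifted pattern, and the inverting combinational "cloud" are my own assumptions for illustration, not from the post or any DFT tool:

```python
def scan_cycle(chain, scan_enable, scan_in, comb_outputs):
    """One clock edge of a scan chain: shift when scan_enable=1,
    capture the combinational outputs when scan_enable=0."""
    if scan_enable:
        return [scan_in] + chain[:-1]      # SI -> Q path for every flop
    return comb_outputs                    # D -> Q path (capture)

chain = [0, 0, 0]                          # a 3-flop scan chain

# 1-2. assert scan_enable and shift in the pattern 1, 0, 1
for bit in (1, 0, 1):
    chain = scan_cycle(chain, 1, bit, None)
# chain is now [1, 0, 1]

# 3. de-assert scan_enable for one capture pulse; here the assumed
#    combinational cloud simply inverts each flop's D input
chain = scan_cycle(chain, 0, 0, [1 - b for b in chain])
# chain is now [0, 1, 0]

# 4. re-assert scan_enable and shift the captured response out
out = []
for bit in (0, 0, 0):
    out.append(chain[-1])                  # observed at the scan-out pin
    chain = scan_cycle(chain, 1, bit, None)
```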
The PDF (How does scan work) provides a very good explanation of how scan chains
function.
How chain length is decided: By chain length, we mean the number of flip-flops in a
single scan chain. The larger the chain length, the more cycles are required to shift the
data in and out. On the other hand, for the same total number of flops, a smaller chain
length means more chains, and since each scan chain needs its own scan_in and
scan_out port, more input/output ports are consumed.
Suppose there are 10000 flops in the design and 6 ports are available for scan
input/output. This means we can make (6/2 =) 3 chains. If we make scan chains of 9000,
100 and 900 flops, it will be inefficient, as 9000 cycles will be required to shift the data in
and out. Instead, we should distribute the flops among the scan chains almost equally:
with chain lengths of 3300, 3400 and 3300, the number of cycles required is only 3400.
Keeping an almost equal number of flops in each scan chain is referred to as chain
balancing.
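The chain-balancing arithmetic above can be checked in a few lines (the chain lengths are those quoted in the text):

```python
import math

def shift_cycles(chain_lengths):
    """Shift-in/shift-out cycles are set by the longest chain."""
    return max(chain_lengths)

unbalanced = [9000, 100, 900]          # sums to 10000 flops
balanced = [3300, 3400, 3300]          # sums to 10000 flops as well

# Balancing cuts the shift cycles from 9000 down to 3400.
worst = shift_cycles(unbalanced)       # 9000
best_example = shift_cycles(balanced)  # 3400

# In general, N flops over C chains need at least ceil(N / C) cycles:
lower_bound = math.ceil(10000 / 3)     # 3334
```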
This is because all the NMOS transistors share a common substrate, and a substrate
can only be biased to one voltage. Although this introduces body effect, making the
transistors slower and deviating from the ideal MOS current equation, there is no other way.
One could achieve a different body voltage for each NMOS transistor by putting every
transistor in its own well, but that would mean a tremendous area penalty, as wells have
minimum-size and minimum-separation requirements that are huge in comparison to
transistor sizes. This is why the body is connected to ground for all NMOS transistors.