VLSI Concepts


STA

Static timing analysis (STA) is a vast domain with many sub-fields. It involves
computing bounds on the delays of circuit elements without actually simulating the
circuit. In this post, we have tried to list all the posts that an STA engineer cannot do
without. Please add your feedback in the comments to make reading it a more meaningful
experience.
 Setup and hold interview questions
 Clock gating concepts
 Setup time and hold time - static timing analysis - Definition and detailed
discussion on setup time and hold time, setup hold violations and how to fix them
 Metastability - This post discusses the basics of metastability and how to avoid it.
 Problem: clock gating checks at a complex gate - An exercise to analyze the
requirements of clock gating checks at a complex gate
 Lockup latch - The basics of lockup latch, both from timing and DFT perspective
have been discussed in this post.
 Lockup latches vs. lockup registers - Provides an insight into the situations where
lockup latches and lockup registers can be useful.
 Clock latency - Read this if you wish to get acquainted with the terminology
related to clock latency
 Data checks - Non-sequential setup and hold checks have been discussed, very
useful for beginners
 Modeling skew requirements with the help of data checks - Explains with an
example how data checks can be used to maintain skew requirements
 What is static timing analysis - Defines static timing analysis and its scope
 Setup checks and hold checks for reg-to-reg paths - Discusses the default setup
and hold checks for timing paths starting from and ending at registers
 Setup checks and hold checks for register-to-latch paths - Discusses the default
setup and hold checks for timing paths starting from registers and ending at latches
 Setup checks and hold checks for latch-to-reg timing paths - Discusses the
default setup and hold checks for timing paths starting from latches and ending at
registers
 All about clock signals - Discusses the basics of clock signals
 Synchronizers - Different types of synchronizers have been discussed in detail
 Timing corners - dimensions in timing signoff - Highlights the importance of
signing off in different corner-case scenarios
 Ensuring glitch-free propagation of clock - Discusses about the hazards that can
occur, if there is a glitch in clock
 Clock switching and clock gating checks - The basics of clock gating check, and
how to apply these is discussed
 Clock gating checks at a mux - How clock gating checks should be applied on a
mux is discussed in detail
 False paths - what are they - This post discusses the basics of false paths and
how to treat them
 Multicycle paths handling in STA - Basics of multicycle paths and how they are
treated in STA
 All about multicycle paths in VLSI - Architecture specific description and handling
of multicycle paths, a must read
 Propagation delay - Defines propagation delay and related terms
 Is it possible for a logic gate to have negative delay - Thought provoking post on
whether a logic gate can have negative delay
 Worst slew propagation - Discusses the basics of worst slew propagation
 On-chip variations - Describes on-chip variations and the methods undertaken to
deal with these
 Temperature inversion - Discusses the concept of temperature inversion and
conductivity trends with temperature
 Can a net have negative delay - Describes how a net cannot have a negative
delay
 Timing arcs - Discusses the basics of timing arcs, positive and negative
unateness, cell arcs and net arcs etc.
 Time borrowing in latches - Discusses the basics of the concept of time
borrowing
 Interesting problem - latches in series - Describes why it is essential to have
alternate positive and negative latches for sequential operation
 Virtual clock - Explains the concept of virtual clock
 Minimum pulse width - Discusses about minimum pulse width checks
 Basics of latch timing - Definition of latch, setup time and hold timing of a latch,
latch timing arcs are discussed

Setup time and hold time basics


In digital designs, every flip-flop places some restrictions on the data with respect
to the clock, in the form of windows in which the data may or may not change. There is
always a region around the clock edge in which the input data should not change at the input
of the flip-flop. If the data changes within this window, we cannot guarantee the output:
it may correspond to the previous input, to the new input, or the flop may go metastable
(as explained in our post 'metastability'). This window is bounded by two limits, one
pertaining to the setup time of the flop, the other to its hold time, defined as below.

Definition of Setup time: Setup time is defined as the minimum amount of time before the
clock's active edge that the data must be stable for it to be latched correctly. In other words, each
flip-flop (or any sequential element, in general) needs some time for the data to remain stable
before the clock edge arrives, such that it can reliably capture the data. This duration is known
as setup time.
The data that was launched at the previous clock edge should be stable at the
input at least a setup time before the clock edge. So, adherence to setup time
ensures that the data launched at the previous edge is captured properly at the
current edge. In other words, we can also say that setup time adherence ensures
that the system moves to the next state smoothly.
Definition of Hold time: Hold time is defined as the minimum amount of time after the clock's
active edge during which the data must be stable. Similar to setup time, each sequential element
needs some time for the data to remain stable after the clock edge arrives in order to reliably
capture it. This duration is known as hold time.
The data that is launched at the current edge should not travel to the capturing flop
before a hold time has passed after the clock edge. Adherence to hold time ensures that the
data launched at the current clock edge does not get captured at the same edge. In other words,
hold time adherence ensures that the system does not deviate from the current state and go
into an invalid state.
As shown in figure 1 below, the data at the input of the flip-flop can change anywhere
except within the setup-hold window.
Figure 1: Setup-hold window

A D-type latch

Cause/origin of setup time and hold time: Setup time and hold time are said to be the
backbone of timing analysis. Rightly so, for the chip to function properly, setup and hold
timing constraints need to be met for each and every flip-flop in the design. If
even a single flop exists that does not meet setup and hold requirements for timing paths
starting from/ending at it, the design may fail or go metastable. It is very
important to understand the origin of setup time and hold time, as the whole design's
functionality depends on them. Let us discuss the origin of setup time and hold time taking
the example of a D-type flip-flop, since D-type flip-flops are almost always used in VLSI
designs. A D-type flip-flop is realized using two D-type latches; one of them is positive
level-sensitive, the other is negative level-sensitive. A D-type latch, in turn, is realized
using transmission gates and inverters. The figure below shows a positive level-sensitive
D-type latch. Just by inverting the transmission gates' clock, we get a negative
level-sensitive D-type latch.

A complete D flip-flop using the above structure of D-type latch is shown in figure below:

A D-type flip-flop

Now, let us get into the details of the above figure. For data to be latched by ‘latch 1’ at the
falling edge of the clock, it must be present at ‘Node F’ at that time. Since data has to
travel ‘Node A’ -> ‘Node B’ -> ‘Node C’ -> ‘Node D’ -> ‘Node E’ -> ‘Node F’ to reach ‘Node
F’, it should arrive at the flip-flop’s input (Node A) some time earlier. This time for data to
reach from ‘Node A’ to ‘Node F’ is termed the data setup time (assuming CLK and CLK'
are available instantaneously; if that is not the case, it is accounted for accordingly).
Similarly, after the clock edge, the input must stay stable until its effect has safely
propagated to ‘Node C’. In other words, hold time can be termed as the delay taken by data
from ‘Node A’ to ‘Node C’.
Setup and hold checks in a design: Basically, setup and hold timing checks ensure that data
launched from one flop is captured properly at another. Considering the way digital designs of
today are built (as finite state machines), the next state is derived from the previous state. So,
data launched at one edge should be captured at the next active clock edge. Also, the data launched
from one flop should not be captured at the next flop at the same edge. These conditions are ensured
by setup and hold checks. The setup check ensures that the data is stable before the setup
requirement of the next active clock edge at the next flop, so that the next state is reached. Similarly,
the hold check ensures that the data is stable until the hold requirement of the next flop for the same
clock edge has been met, so that the present state is not corrupted.

A sample path in a design

Shown above is a flop-to-flop timing path. For simplicity, we have assumed that both the flops
are rise edge triggered. The setup and hold timing relations for the data at input of second flop
can be explained using the waveforms below:
Figure showing setup and hold checks being applied for the timing path shown above

As shown, data launched from launching flop is allowed to arrive at the input of the second flop
only after a delay greater than its hold requirement so that it is properly captured. Similarly, it
must not have a delay greater than (clock period – setup requirement of second flop). In other
words, mathematically speaking, setup check equation is given as below (assuming zero skew
between launch and capture clocks):
Tck->q + Tprop + Tsetup < Tperiod
Similarly, hold check equation is given as:
Tck->q + Tprop > Thold

If we take into account skews between the two clocks, the above equations are modified
accordingly. If Tskew is the skew between launch and capture flops, (equal to latency of clock at
capture flop minus latency of clock at launch flop so that skew is positive if capture flop has
larger latency and vice-versa), above equations are modified as below:

Tck->q + Tprop + Tsetup - Tskew < Tperiod


Tck->q + Tprop > Thold + Tskew
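As a sketch, the two skew-adjusted inequalities above can be evaluated directly. The Python snippet below uses purely illustrative delay values (in nanoseconds), not numbers from any real cell library:

```python
# Skew-adjusted setup and hold checks from the equations above.
# All times are in nanoseconds; the values below are purely illustrative.

def setup_met(t_ck2q, t_prop, t_setup, t_skew, t_period):
    """Setup check: Tck->q + Tprop + Tsetup - Tskew < Tperiod."""
    return t_ck2q + t_prop + t_setup - t_skew < t_period

def hold_met(t_ck2q, t_prop, t_hold, t_skew):
    """Hold check: Tck->q + Tprop > Thold + Tskew."""
    return t_ck2q + t_prop > t_hold + t_skew

# A 100 MHz clock (10 ns period) with 0.3 ns positive skew at the capture flop:
print(setup_met(t_ck2q=0.2, t_prop=7.0, t_setup=0.1, t_skew=0.3, t_period=10.0))  # True
print(hold_met(t_ck2q=0.2, t_prop=0.5, t_hold=0.05, t_skew=0.3))                  # True
```

Note that positive skew (a later capture clock) relaxes the setup check but tightens the hold check, which the two functions make explicit.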

Setup checks and hold checks for reg-to-reg paths explains different cases covering
setup and hold checks for flop-to-flop paths.
What if setup and/or hold violations occur in a design: As said earlier, setup and hold timings
are to be met in order to ensure that data launched from one flop is captured properly at the next
flop at next clock edge so as to transfer the state-machine of the design to the next state. If the
setup check is violated, the data will not be captured at the next clock edge properly. Similarly, if
hold check is violated, data intended to be captured at the next edge will get captured at the same
edge. Setup hold violations can also lead to data changing within setup/hold window of the
capturing flip-flop. It may lead to metastability failure in the design (as explained in our post
'metastability'). So, it is necessary to have setup and hold requirements met for all the flip-flops
in the design and there should not be any setup/hold violation.
What if you fabricate a design without taking care of setup/hold violations: If you fabricate a
design having setup violations, you can still use it by lowering the frequency as the equation
involves the variable clock frequency. On the other hand, a design with hold violation cannot be
run properly. So, if you fabricate a design with an accidental hold violation, you will have to
simply throw away the chip (unless the hold path is half cycle as explained here). A design with
half cycle hold violations only can still be used at lower frequencies.

Tackling setup time violation: As given above, the equation for setup timing check is given as:
Tck->q + Tprop + Tsetup - Tskew < Tperiod

The parameter that represents if there is a setup time violation is setup slack. The setup slack can
be defined as the difference between the L.H.S and R.H.S. In other words, it is the margin that is
available such that the timing path meets setup check. The setup slack equation can be given as:
Setup slack = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)
If setup slack is positive, it means there is still some margin available in the timing path. On the
other hand, a negative slack means that the path violates the setup timing check by the amount of
the setup slack. To get the path met, either the data delay should be decreased or the clock period
should be increased.

Mitigating setup violation: Thus, we can meet the setup requirement, if violating, by
1. Decreasing clk->q delay of launching flop
2. Decreasing the propagation delay of the combinational cloud
3. Reducing the setup time requirement of capturing flop
4. Increasing the skew between capture and launch clocks
5. Increasing the clock period
Tackling hold time violation: Similarly, the equation for hold timing check is as below:
Tck->q + Tprop > Thold + Tskew
The parameter that represents if there is a hold timing violation is hold slack. The hold slack is
defined as the amount by which L.H.S is greater than R.H.S. In other words, it is the margin by
which timing path meets the hold timing check. The equation for hold slack is given as:
Hold slack = Tck->q + Tprop - Thold - Tskew
If hold slack is positive, it means there is still some margin available before the path starts
violating hold. A negative hold slack means the path violates the hold timing check by the amount
represented by the hold slack. To get the path met, either the data path delay should be increased, or
the clock skew/hold requirement of the capturing flop should be decreased.
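Putting the two slack equations together, here is a minimal Python sketch; the delay values are illustrative assumptions only:

```python
def setup_slack(t_period, t_ck2q, t_prop, t_setup, t_skew=0.0):
    """Setup slack = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)."""
    return t_period - (t_ck2q + t_prop + t_setup - t_skew)

def hold_slack(t_ck2q, t_prop, t_hold, t_skew=0.0):
    """Hold slack = Tck->q + Tprop - Thold - Tskew."""
    return t_ck2q + t_prop - t_hold - t_skew

# A path with 8.5 ns of combinational delay in a 10 ns clock period (zero skew):
print(round(setup_slack(t_period=10.0, t_ck2q=0.2, t_prop=8.5, t_setup=0.1), 3))  # 1.2
print(round(hold_slack(t_ck2q=0.2, t_prop=8.5, t_hold=0.05), 3))                  # 8.65
```

A negative return value from either function corresponds to a violated check, by exactly the amount of the slack.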

Mitigating hold violation: We can meet the hold requirement by:


1. Increasing the clk->q delay of the launching flop
2. Decreasing the hold requirement of the capturing flop
3. Decreasing the clock skew between the capturing and launching flip-flops

Metastability
What is metastability: Metastability is a phenomenon of unstable equilibrium in digital
electronics in which a sequential element is not able to resolve the state of its input
signal; hence, the output goes into an unresolved state for an unbounded interval of time.
Almost always, this happens when data transitions very close to the active edge of the clock,
thereby violating setup and hold requirements. Since the data transitions close to the
active edge of the clock, the flop is not able to capture the data completely. The flop starts to
capture the data and the output also starts to transition. But before the output has changed its
state, the input is cut off from the output as the clock edge has arrived. The output is then
left hanging between state ‘0’ and state ‘1’. Theoretically, the output may remain in this
state for an indefinite period of time. But, given time to settle down, the output will
eventually settle to either its previous state or the new state. Thus, the effect of the signal
present at the input of the flop may not travel to the output of the flop, partly or completely. In
other words, when a flip-flop enters a metastable state, one cannot predict
its output voltage level after it exits metastability, nor when the output will settle
to some stable voltage level. A metastability failure is said to have occurred if the
output has not resolved itself by the time it must be available for use. Also, since the
output remains in between ‘0’ and ‘1’, neither the PMOS nor the NMOS transistor is
switched off. Hence, a direct path exists from VDD to GND, causing a high current to
flow as long as the output hangs in between.

Metastability example: Consider a CMOS inverter circuit as shown below. The current vs.
voltage (we can also say power vs. voltage, as VDD is constant) characteristics for this
circuit are also shown. It can be observed that the current is 0 for both input voltage
levels, i.e. ‘0’ and ‘1’. As the input voltage is increased from ‘logic 0’, the current
increases. It attains its maximum value at a ‘Vin’ somewhere near VDD/2. It then starts
decreasing as ‘Vin’ is increased further and again becomes 0 when ‘Vin’ is at ‘logic 1’.
Thus, there is a local maximum of power consumption for the CMOS inverter. At this point,
the device is in unstable equilibrium. As for the CMOS inverter, for other CMOS devices too,
there lies a local maximum at some value of input voltage. We know that for a flip-flop,
the output stage is a combinational gate (mostly an inverter). So, we can say that the
output of the flip-flop is prone to metastability given the right input level.

Figure 1: Power characteristics of CMOS inverter


As we now know that a CMOS combinational gate has a point on its ‘voltage
characteristic’ curve that is quasi-stable, let us look at a CMOS latch from the same
perspective. The CMOS latch has a transmission gate followed by two inverters
connected in feedback loop. The voltage characteristic curves for the two inverters are
shown. The metastable point, here, lies where the two curves intersect as this point is
the resulting peak point of the ‘Superposition curve’ resulting from the two individual
curves. A latch goes into metastable state very frequently, especially if the input is
changing fast. But, this metastability is resolved quickly as the output tends to go to one
of its stable states. As a flop is generally made by connecting two latches in master-slave
configuration, the flops are also prone to be metastable. The difference here is just that
the probability of a flip-flop being metastable is a lot less than latches as ‘flops are edge
sensitive’ as compared to latches which are level sensitive.

Figure 2: Transfer curves of two inverters in a D-latch

We have just come to know that different elements are prone to metastability to different
extents. There is a measure of the extent to which an element is prone to
metastability failure. This is given by an interval known as ‘Mean Time Between Failures’
(MTBF), which gives the average time interval between two successive failures. The
failure rate is given as the reciprocal of MTBF.
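As a rough sketch, one widely used first-order model for synchronizer MTBF is MTBF = e^(t_r/τ) / (T0 · f_clk · f_data), where t_r is the time allowed for the output to resolve, τ and T0 are device-dependent constants, and f_clk and f_data are the clock and data toggle rates. The constants below are purely illustrative assumptions, not values from any real process:

```python
import math

def mtbf(t_resolve, tau, t0, f_clk, f_data):
    """First-order synchronizer MTBF model:
    MTBF = exp(t_resolve / tau) / (T0 * f_clk * f_data).
    All parameter values used here are illustrative assumptions."""
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)

# Illustrative numbers: tau = 50 ps, T0 = 100 ps, 100 MHz clock, 10 MHz data rate.
one_cycle = mtbf(t_resolve=10e-9, tau=50e-12, t0=100e-12, f_clk=100e6, f_data=10e6)
# Giving the output one extra clock cycle to resolve (e.g. a double-flop
# synchronizer) increases MTBF exponentially:
two_cycle = mtbf(t_resolve=20e-9, tau=50e-12, t0=100e-12, f_clk=100e6, f_data=10e6)
print(two_cycle > one_cycle)  # True
```

The exponential dependence on resolution time is why adding a second synchronizer stage improves MTBF so dramatically.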

Quiz : Clock gating check at a complex gate


Problem: Consider a complex gate with internal structure as shown in figure below. One
of the inputs gets clock while all others get data signals. What all (and what type of)
clock gating checks exist?

Figure:Problem figure

Solution: As we know, clock gating checks can be of AND type or OR type. We can
find the type of clock gating check formed between a data signal and a clock signal by
considering all other signals as constant. Since all four data signals control Clk in one
way or the other, the following clock gating checks are formed:

i) Clock gating check between Data1 and Clk: As is evident, the inverted
Clk and Data1 meet at OR gate ‘6’. Hence, there is an OR-type check between the inverted
Clk and Data1. In other words, Data1 can change only when the inverted Clk is
high, i.e., when Clk is low. With respect to Clk itself, this is therefore an AND-type
check formed at gate 6.

ii) Clock gating check between Data2 and Clk: Same as in case 1.

iii) Clock gating check between Data3 and Clk: There is AND type check between
Data3 and Clk.
iv) Clock gating check between Data4 and CLK: As in 1 and 2, there is AND type
check between Data4 and Clk.

Lockup latch – principle, application and timing
What are lock-up latches: The lock-up latch is an important element in scan-based designs, especially for
hold timing closure of shift modes. Lock-up latches are necessary to avoid skew problems during the shift
phase of scan-based testing. A lock-up latch is nothing more than a transparent latch used intelligently
in places where clock skew is very large and meeting hold timing is a challenge due to a large
uncommon clock path. That is why lockup latches are used to connect two flops in a scan chain having
excessive clock skew/uncommon clock paths, as the probability of hold failure is high in such cases. For
instance, the launching and capturing flops may belong to two different domains (as shown in the figure
below). Functionally, they might not be interacting. Hence, the clocks of these two domains will not be
balanced and will have a large uncommon path. But in scan-shift mode, these flops interact, shifting data in
and out. Had there been no lockup latches, it would have been very difficult for the STA engineer to close
timing in a scan chain crossing domains. Also, the probability of chip failure would have been high, as the
large uncommon path between the clocks of the two flops leads to large on-chip variations. That is
why lockup latches can be referred to as the soul mate of scan-based designs.
Figure 1 : Lockup latches - the soul mate of scan-based designs

Where to use a lock-up latch: As mentioned above, a lock-up latch is used where there is high
probability of hold failure in scan-shift modes. So, possible scenarios where lockup latches are to be
inserted are:
 Scan chains from different clock domains: In this case, since, the two domains do not interact
functionally, so both the clock skew and uncommon clock path will be large.
 Flops within same domain, but at remote places: Flops within a scan chain which are at remote
places are likely to have more uncommon clock path.
In both the above-mentioned cases, there is a great chance that the skew between the launch and
capture clocks will be high. Either the launch clock or the capture clock may have the greater
latency. If the capture clock has greater latency than the launch clock, then the hold check will be as shown
in the timing diagram in figure 3. If the skew is large, it will be a tough task to meet the hold
timing without lockup latches.

Figure 2: A path crossing from domain 1 to domain 2 (scope for a lock-up latch insertion)

Figure 3: Timing diagram showing setup and hold checks for path crossing from domain 1 to domain 2

Positive or negative level latch? It depends on the path in which you are inserting the lock-up latch. Since
lock-up latches are inserted for hold timing, they are not needed where the path starts at a positive edge-
triggered flop and ends at a negative edge-triggered flop. It is to be noted that you will never find scan
paths originating at a positive edge-triggered flop and ending at a negative edge-triggered flop, due to DFT-
specific reasons. Similarly, they are not needed where the path starts at a negative edge-triggered flop and
ends at a positive edge-triggered flop. For the remaining two kinds of flop-to-flop paths, lockup latches are
required. The polarity of the lockup latch needs to be such that it remains open during the inactive
phase of the clock. Hence,

 For flops triggering on positive edge of the clock, you need to have latch transparent when
clock is low (negative level-sensitive lockup latch)
 For flops triggering on negative edge of the clock, you need to have latch transparent when
clock is high (positive level-sensitive lockup latch)

Who inserts a lock-up latch: These days, tools exist that automatically add lockup latches where a scan
chain crosses domains. However, for cases where a lockup latch is to be inserted in an intra-domain
scan chain (i.e. for flops having an uncommon path), it has to be inserted during physical implementation
itself, as physical information is not available during scan chain implementation (scan chain
implementation is carried out at the synthesis stage itself).

Which clock should be connected to lock-up latch: There are two possible ways in which we can
connect the clock pin of the lockup latch inserted. It can either have same clock as launching flop or
capturing flop. Connecting the clock pin of lockup latch to clock of capturing flop will not solve the
problem as discussed below.
 Lock-up latch and capturing flop having the same clock (Will not solve the problem): In this
case, the setup and hold checks will be as shown in figure 5. As is apparent from the waveforms, the
hold check between domain1 flop and lockup latch is still the same as it was between domain 1 flop and
domain 2 flop before. So, this is not the correct way to insert lockup latch.

Figure 4: Lock-up latch clock pin connected to clock of capturing flop


Figure 5: Timing diagrams for figure 4

 Lock-up latch and launching flop having the same clock: As shown in figure 7, connecting the
lockup latch to the launch flop’s clock reduces the skew between the domain 1 flop and the lockup
latch. This hold check can be easily met, as both the skew and the uncommon clock path are small. The
hold check between the lockup latch and the domain 2 flop is already relaxed, as it is a half-cycle check.
So, we can say that the correct way to insert a lockup latch is to insert it close to the launching flop
and connect the launch domain clock to its clock pin.

Figure 6: Lock-up latch clock pin connected to clock of launch flop


Figure 7: Waveforms for figure 6

Why don’t we add buffers: If the clock skew is large, it can take a number of buffers to meet the
hold requirement. In a typical scenario, the number of buffers would become so large that it would become a
concern for power and area. Also, since the skew/uncommon clock path is large, the variation due to OCV
will be high. So, it is recommended to keep a bigger hold margin while signing off for timing. A lock-up
latch provides an area- and power-efficient solution, achieving what a number of buffers together would
not be able to achieve.

Advantages of inserting lockup latches:


 Inserting lock-up latches helps in easier hold timing closure for scan-shift mode
 Robust method of hold timing closure where uncommon path is high between launch and
capture flops
 Power efficient and area efficient
 It improves yield as it enables the device to handle more variations.
Lockup registers: Instead of latches, registers can also be used as lockup elements;
however, they have their own advantages and disadvantages. Please refer to Lockup
latches vs. lockup registers: what to choose for a comparative study of using lockup
latches vs. lockup registers.

Lockup latches vs. lockup registers: what to choose
Both lockup latches and lockup registers are used to make a scan chain robust to hold
failures. Which one is used depends upon one's priorities and the situation.
However, lockup latches seem more prevalent in today's designs. This might be
due to the following reasons:
1. Area: As we know, a latch occupies only half the area as a register. So,
using lockup latches instead of lockup registers gives us area and power
advantage; i.e., less overhead.
2. Timing: Lockup elements – timing perspective has given an analysis of
how timing critical lockup elements (lockup latches and lockup registers) paths
can be. According to it, using a negative lockup latch, you don’t have to meet
timing at functional (at-speed) frequency. However, in all other cases, you need
to meet timing. This might also be a reason people prefer lockup latches.
Lockup latches, on the one hand, relax the hold check on only one side. So, you can afford to have skew
only on one side, either on launch or on capture. Lockup registers, on the other hand,
let you have skew on both sides. So, lockup latches are preferable where you can
afford to tap the clock either from the launch flop or from the capture flop. On the other
hand, lockup flops can be used by tapping the clock from any point, as long as you meet
setup and hold timing.

Clock latency

Definition of clock latency (clock insertion delay): In sequential designs, each timing
path is triggered by a clock signal that originates from a source. The flops being
triggered by the clock signal are known as the sinks of the clock. In general, clock latency
(or clock insertion delay) is defined as the amount of time taken by the clock
signal in traveling from its source to the sinks. Clock latency comprises two
components - clock source latency and clock network latency.
 Source latency of clock (Source insertion delay): Source latency is defined as
the time taken by the clock signal in traversing from clock source (may be PLL,
oscillator or some other source) to the clock definition point. It is also known as source
insertion delay. It can be used to model off-chip clock latency when clock source is not
part of the chip itself.
 Network latency of clock (Network insertion delay): Network latency is
defined as the time taken by the clock signal in traversing from clock definition point to
the sinks of the clock. Thus, each sink of the clock has a different network latency. If we
talk about the clock, it will have:
o Maximum network latency: Maximum of all the network latencies
o Minimum network latency: Minimum of all the network latencies
o Average network latency: Average of all the network latencies
Total clock latency is given as the sum of source latency and network latency. In other
words, total clock latency at a point is given as follows:

Clock latency = Source latency + Network latency

It is generally stated that for a robust clock tree, ‘sum of source latency and network
latency for all sinks of a clock should be equal’. If that is the case, the clock tree is
said to be balanced as this means that all the registers are getting clock at the same
time; i.e., clock skew is zero.
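The latency bookkeeping above can be sketched in a few lines of Python; the latency values and flop names below are purely illustrative assumptions:

```python
# Total clock latency per sink = source latency + network latency.
# All values in nanoseconds; names and numbers are illustrative only.
source_latency = 1.2
network_latency = {"FF1": 0.80, "FF2": 0.85, "FF3": 0.80}

total = {sink: source_latency + net for sink, net in network_latency.items()}

# Skew for a launch/capture pair = capture-sink latency - launch-sink latency;
# positive when the capture flop sees the clock later.
skew_ff1_to_ff2 = total["FF2"] - total["FF1"]
print(round(skew_ff1_to_ff2, 3))  # 0.05
```

A perfectly balanced tree would make every value in `total` equal, giving zero skew for every pair.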
Figure 1 : Figure showing source latency and network latency components of clock latency

Figure 1 above shows the two components of clock latency, i.e. source latency and
network latency. Each flop (sink, in general) has its own latency since the path traced by
clock from source to it may be different. The above case may be found in block level
constraints in case of hierarchical designs wherein clock source is sitting outside the
block and clock signal enters the block through a block port. It may also represent a
case of a chip in which the clock source is sitting outside; e.g. some external device is
controlling the chip. In that case, clock source will be sitting inside that device.

How to specify clock latency: In EDA tools, we can model clock latency using the SDC
command ‘set_clock_latency’ to imitate the behavior expected after the clock tree is
built. Using this command, we can specify both the source latency for a clock as well as
the network latency. After the clock tree has been built, the latency at the sinks is
calculated by the tool itself from the delays of the various elements. However, in case the
clock source sits outside the chip, it still needs to be modeled by source latency even after
clock tree synthesis. To specify clock latency for a clock signal named ‘CLK’, we may
use the SDC command set_clock_latency:

set_clock_latency <value> CLK


set_clock_latency <value> CLK –source

The first command specifies the network latency, whereas the second command specifies
the source latency for CLK.
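For instance, a clock entering through a port with an off-chip source might be constrained as below; the clock name, port, and latency values are hypothetical, used only for illustration:

```tcl
# Hypothetical constraints: CLK is a 10 ns clock entering through a port.
create_clock -name CLK -period 10 [get_ports CLK]

# 1.2 ns from the external source to the clock definition point (source latency);
# 0.8 ns estimated from the definition point to the sinks (network latency,
# typically used as an estimate before clock tree synthesis).
set_clock_latency -source 1.2 [get_clocks CLK]
set_clock_latency 0.8 [get_clocks CLK]
```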

Also read:
 Clock - the incharge of synchronous designs
 Timing corners - dimensions in timing signoff
 Ensuring glitch-free propagation of clock
 Clock switching and clock gating checks
 Can hold be frequency dependent

Data checks : data setup and data hold in VLSI

Many a time, two or more signals at an analog-digital interface or at the chip interface
have some timing requirement with respect to each other. These requirements are
generally in the form of minimum skew and maximum skew. Data checks come to the
rescue in such situations. Theoretically speaking, data-to-data checks are applied
between two arbitrary data signals, neither of which is a clock. One of these is called the
constrained pin, which is like the data pin of a flop. The other is called the related pin,
which is like the clock pin of a flop. The figure below shows two data signals at a boundary
(possibly an analog hard macro) having some minimum skew requirement between them.
Figure 1 : Two signals arriving at a boundary having skew requirement

Data-to-data checks are zero cycle checks: An important difference between a normal setup check (between a clock signal and a data signal) and a data-to-data check is that data-to-data checks are zero-cycle checks while a normal setup check is a single-cycle check. When we say that data checks are zero-cycle checks, we mean that they are between two data signals that have launched at the same clock edge with respect to each other.

As shown in figure (i) below, a traditional setup check is between a data signal launched at one clock edge and a clock. Since the data is launched at one clock edge and is checked with respect to one edge later, it is a single-cycle check. On the other hand, as shown in figure (ii), a data-to-data check is between two signals both of which are launched on the same clock edge. Hence, we can say data-to-data checks are zero-cycle checks.

Figure 2 : (i) Normal setup check between a data signal and a clock signal, (ii) Data-to-data
setup check between two data signals

What command in EDA tools is used to model data-to-data checks: Data checks are modeled in EDA tools using the ‘set_data_check’ command.

set_data_check -from A -to B -setup <x>

set_data_check -from A -to B -hold <y>

Here, A is the related pin and B is the constrained pin. The first command constrains B to toggle at least ‘x’ time before ‘A’. The second command constrains ‘B’ to toggle at least ‘y’ time after ‘A’.
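As a rough illustration (not tool output; the arrival times and margins below are made up), the pass/fail condition a timing tool evaluates for these two checks can be sketched in Python. As in ordinary setup/hold analysis, the setup check uses the constrained signal's late arrival against the related signal's early arrival, and the hold check uses the opposite corners:

```python
def data_setup_slack(t_related_early, t_constrained_late, setup):
    # Constrained signal's latest arrival must settle at least `setup`
    # before the related signal's earliest arrival. Positive slack = met.
    return (t_related_early - setup) - t_constrained_late

def data_hold_slack(t_related_late, t_constrained_early, hold):
    # Constrained signal's earliest (next) arrival must come at least
    # `hold` after the related signal's latest arrival. Positive = met.
    return t_constrained_early - (t_related_late + hold)

# Hypothetical arrival times (ns): related pin A, constrained pin B
print(round(data_setup_slack(5.0, 4.7, 0.2), 2))  # 0.1 -> setup check met
print(round(data_hold_slack(5.1, 5.5, 0.3), 2))   # 0.1 -> hold check met
```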

Data setup time and data hold time: Similar to setup time and hold time, we can define data setup time and data hold time as follows:

 Definition of data setup time: Data setup time can be defined as the minimum
time before the toggling of the reference signal for which the constrained signal should
become stable. In the example above, <x> is the data setup time.
 Definition of data hold time: Data hold time can be defined as the minimum
time after the toggling of the reference signal for which the constrained signal should
remain stable. In the example above, <y> is the data hold time.
Modeling data-to-data checks through liberty file: We can model data checks through .lib also. There are constructs that can be used to model data-to-data checks. These are non_seq_setup_rising, non_seq_setup_falling, non_seq_hold_rising and non_seq_hold_falling. These constructs specify setup and hold data-to-data checks with respect to the rising or falling edge of the reference signal respectively. E.g. ‘non_seq_setup_falling’ represents a data setup check with respect to the falling edge of the reference signal. ‘rise_constraint’ and ‘fall_constraint’ can be used inside these to model the setup and hold checks for the rising and falling edges of the constrained signal. Figure 3 below shows an example of modeling a data setup check through a liberty file.

Figure 3 : Modeling data-to-data checks through .lib using non_seq_setup_rising


In which cases data-to-data checks are applied: Data checks are normally applied where there is a specific requirement of skew (either minimum or maximum) or a race condition (where the order of arrival of two signals can affect the output and the intention is to get one of the probable outputs by constraining one signal to come before the other) between two or more signals. These may be required:

 At the digital-analog interface within a chip, where analog signals at the analog
block boundary are required in a specific order
 At the chip boundary, where some asynchronous interface signals may have
skew requirements

How data checks are useful: As we have seen above, data checks provide a convenient measure to constrain two or more data signals with respect to each other. Had these checks not been there, it would have been a manual effort to check the skew between the two arriving signals and to maintain it. Also, it would not have been possible to get the optimization done through the implementation tool, as these signals would not have been constrained for it.

Quiz: Modeling skew requirements with data-to-data setup and hold checks
Problem: Suppose there are 'N' signals which are to be skew matched within a
window of 200 ps with respect to each other. Model this requirement with the help
of data setup and hold checks.

As we discussed in data setup and data hold checks, data setup check of 200 ps means
that constrained data should come at least 200 ps before the reference data. Similarly,
data hold check of 200 ps constrains the constrained data to come at least 200 ps after
the reference data. The same is shown pictorially in figure 1(a) and 1(b).
Figure 1(a): Data setup check of 200 ps constrains the constrained signal to toggle at least 200 ps before the reference
signal toggles.

Figure 1(b): Data hold check of 200 ps constrains the constrained signal to toggle at least 200 ps after the reference
signal has toggled.

Now, suppose you apply a data setup check of -200 ps instead of 200 ps. This would
mean that the constrained signal can toggle up to 200 ps after the reference signal.
Similarly, a data hold check of -200 ps would mean that the constrained signal can
toggle from 200 ps before the reference signal onwards. If we apply both the checks
together, it would infer that the constrained signal can toggle in a window that ranges
from 200 ps before the toggling of the reference signal to 200 ps after the toggling of
the reference signal. This is shown pictorially in figures 2(a) and 2(b).
Figure 2(a): Negative data setup and hold checks of 200 ps
If we combine the two checks, it implies that the constrained data can toggle up to 200
ps after and from 200 ps before the reference signal. In other words, we have
constrained the constrained signal to toggle in a window of +-200 ps around the
reference signal.

Coming to the given problem, if there are a number of signals required to toggle within a
window of 200 ps, we can consider one of these to act as reference signal and other
signals as constrained signals. The other signals can then be constrained in both setup
and hold with respect to reference signal such that all of these lie within +-100 ps of the
reference signal. The same is shown in figure 3 below:

Figure 3: Data checks to maintain skew between N signals
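The scheme above can be sketched as a small constraint generator (plain Python; the signal names are made up, and the use of negative check values follows the discussion above). It emits one setup and one hold check per constrained signal, keeping every signal within +-100 ps of a chosen reference so that any two signals lie within a 200 ps window:

```python
def skew_window_checks(signals, reference, half_window_ps=100):
    # A setup check of -half_window_ps lets the constrained signal toggle
    # up to half_window_ps AFTER the reference; a hold check of
    # -half_window_ps lets it toggle from half_window_ps BEFORE it onwards.
    cmds = []
    for sig in signals:
        if sig == reference:
            continue
        cmds.append(f"set_data_check -from {reference} -to {sig} -setup -{half_window_ps}")
        cmds.append(f"set_data_check -from {reference} -to {sig} -hold -{half_window_ps}")
    return cmds

for cmd in skew_window_checks(["SIG0", "SIG1", "SIG2"], reference="SIG0"):
    print(cmd)
```

Any two constrained signals can then differ by at most 2 x 100 ps = 200 ps, satisfying the stated window.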


What is Static Timing Analysis?
Static timing analysis (STA) is an analysis method of computing the delay bounds of a
complete circuit without actually simulating the full circuit. In STA, static delays such as
gate delay and net delays are considered in each path. These delays are, then,
compared against the required bounds on the delay values and/or the relationship
between the delays of different gates. In STA, the circuit to be analyzed is broken down
into timing paths consisting of gates, registers and nets connecting these. Normally,
timing paths start from and end at registers or chip boundary. Based on origin and
termination of data, timing paths can be categorized into four categories:
1.) Input to register paths: These paths start at chip boundary from input ports and
end at registers
2.) Register to register paths: These paths start at register output pin and terminate at
register input pin
3.) Register to output paths: These paths start at a register and end at chip boundary
output ports
4.) Input to output paths: These paths start from chip boundary at input port and end
at chip boundary at output port
Timing paths from each start-point to end-point are constrained to have maximum and
minimum delays. For example, for register to register paths, each path can take a
maximum of one clock cycle (minus input/output delay in case of input/output to register
paths). The minimum delay of a path is governed by the hold timing requirement of the
endpoint. Thus, the maximum delay taken by a timing path governs the maximum
frequency of operation.
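For instance (a minimal numeric sketch with made-up delay values), the maximum operating frequency implied by the slowest register-to-register path can be computed as:

```python
# Hypothetical worst reg-to-reg path (values in ns)
t_clk_to_q = 0.15   # clock-to-output delay of launch register
t_comb_max = 2.55   # maximum combinational path delay
t_setup = 0.10      # setup time of capture register

# Minimum clock period the path supports (ignoring clock skew)
t_period_min = t_clk_to_q + t_comb_max + t_setup
f_max_mhz = 1000.0 / t_period_min
print(round(f_max_mhz, 1))  # 357.1 (MHz)
```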
As stated before, static timing analysis does timing analysis without actually simulating
the circuit. The delays of cells are picked from the respective technology libraries. The
delays are available in the libraries in tabulated form on the basis of input transition and
output load, having been characterized by simulating the cells over a range of
boundary conditions. Net delays are calculated based upon R and C models.

One important characteristic of static timing analysis that must be discussed is that it
checks the static delay requirements of the circuit without applying any vectors; hence,
the delays calculated are the maximum and minimum bounds of the delays that will
occur in real application scenarios with vectors applied. This enables static timing
analysis to be fast and inclusive of all the boundary conditions. Dynamic timing
analysis, on the contrary, applies input vectors, so it is very slow; it is, however,
necessary to certify the functionality of the design. Thus, static timing analysis
guarantees the timing of the design whereas dynamic timing analysis guarantees
functionality for real application-specific input vectors.

I hope you’ve found this post useful. Let me know what you think in the comments. I’d
love to hear from you all.

1. Setup checks and hold checks for reg-to-reg paths
In the post (Setup time and hold time – static timing analysis), we introduced setup and hold
timing requirements and also discussed why these requirements exist. In this post, we will be
discussing how these checks are applied for different cases for paths starting from and ending at
flip-flops.

In present day designs, most of the paths (more than 95%) start from and end at flip-flops
(there are exceptions, like paths starting from and/or ending at latches). There can be flops
which are positive edge-triggered or negative edge-triggered. Thus, depending upon the type of
launching flip-flop and capturing flip-flop, there can be 4 cases, as discussed below:

1) Setup and hold checks for paths launching from positive edge-triggered flip-flop and
being captured at positive edge-triggered flip-flop (rise-to-rise checks): Figure 1 shows a
path being launched from a positive edge-triggered flop and being captured on a positive edge-
triggered flop. In this case, setup check is on the next rising edge and hold check is on the same
edge corresponding to the clock edge on which launching flop is launching the data.
Figure 1 : Timing path from positive edge flop to positive edge flop (rise to rise path)

Figure 2 below shows the setup and hold checks for positive edge-triggered register to positive
edge-triggered register in the form of waveform. As is shown, setup check occurs at the next
rising edge and hold check occurs at the same edge corresponding to the launch clock edge. For
this case setup timing equation can be given as:
Tck->q + Tprop + Tsetup < Tperiod + Tskew (for setup check)

And the equation for hold timing can be given as:


Tck->q + Tprop > Thold + Tskew (for hold check)

Where
Tck->q : Clock-to-output delay of launch register
Tprop : Maximum delay of the combinational path between launch and capture register
Thold : Hold time requirement of capturing register
Tskew : skew between the two registers (Clock arrival at capture register - Clock arrival at
launch register)

Figure 2 : Setup and hold check for rise-to-rise path

Also, we show below the data valid and invalid windows. From this figure,

Data valid window = Clock period – Setup window – Hold window


Start of data valid window = Tlaunch + Thold
End of data valid window = Tlaunch + Tperiod – Tsetup
In other words, data at the input of capture register can toggle any time between ( Tlaunch + Thold)
and (Tlaunch + Tperiod – Tsetup).

Figure 3: Figure showing data valid window for rise-to-rise path
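Putting the two check equations and the window formulas together in a small numeric sketch (made-up values, in ns, with zero skew for simplicity):

```python
# Hypothetical rise-to-rise path (values in ns)
t_ck_q, t_prop_max, t_prop_min = 0.2, 1.5, 0.4
t_setup, t_hold, t_skew, t_period = 0.1, 0.05, 0.0, 2.0

# Setup: Tck->q + Tprop + Tsetup < Tperiod + Tskew (use max path delay)
setup_slack = (t_period + t_skew) - (t_ck_q + t_prop_max + t_setup)

# Hold: Tck->q + Tprop > Thold + Tskew (use min path delay)
hold_slack = (t_ck_q + t_prop_min) - (t_hold + t_skew)

# Data valid window at the capture input (times relative to launch edge)
window_start = t_hold                 # Tlaunch + Thold
window_end = t_period - t_setup       # Tlaunch + Tperiod - Tsetup
print(round(setup_slack, 2), round(hold_slack, 2), round(window_end - window_start, 2))
# 0.2 0.55 1.85  -> both checks met; valid window = period - setup - hold
```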

2) Setup and hold checks for paths launching from positive edge-triggered flip-flop and
being captured at negative edge-triggered flip-flop (rise-to-fall paths): In this case, both setup
and hold checks are half cycle checks; setup is checked on the next falling edge at the capture
flop and hold on the previous falling edge of the clock at the capture flop (data is launched at
the rising edge). Thus, with respect to case 1 above, the setup check has become tighter and the
hold check has relaxed.
Figure 4: Timing path from positive edge flop to negative edge flop (Rise-to-fall path)

Figure 5 below shows the setup and hold checks in the form of waveforms. As is shown, setup
check occurs at the next falling edge and hold check occurs at the previous falling edge
corresponding to the launch clock edge. The equation for setup check can be written, in this case,
as:
Tck->q + Tprop + Tsetup < (Tperiod/2) + Tskew (for setup check)

And the equation for hold check can be written as:


Tck->q + Tprop + (Tperiod/2) > Thold + Tskew (for hold check)

Figure 5: Setup and hold checks for rise-to-fall paths

Also, we show below the data valid and invalid windows. From this figure,

Data valid window = Clock period – Setup window – Hold window


Start of data valid window = Tlaunch – (Tperiod/2)+ Thold
End of data valid window = Tlaunch + (Tperiod/2) – Tsetup

As we can see, the data valid window is spread evenly on both sides of launch clock edge.

Figure 6: Figure showing data valid window for rise-to-fall path

3) Setup and hold checks for paths launching from negative edge-triggered flip-flop
and being captured at positive edge-triggered flip-flop (fall-to-rise paths): This case is
similar to case 2; i.e. both setup and hold checks are half cycle checks. Data is launched on the
negative edge of the clock, setup is checked on the next rising edge and hold on the previous
rising edge of the clock.

Figure 7: Timing path from negative edge flop to positive edge flop (fall-to-rise path)
Figure 8 below shows the setup and hold checks in the form of waveforms. As is shown, setup
check occurs at the next rising edge and hold check occurs at the previous rising edge
corresponding to the launch clock edge. The equation for setup check can be written, in this case, as:

Tck->q + Tprop + Tsetup < (Tperiod/2) + Tskew (for setup check)

And the equation for hold check can be written as:


Tck->q + Tprop + (Tperiod/2) > Thold + Tskew (for hold check)

Figure 8: Setup and hold checks for fall to rise paths

Also, we show below the data valid and invalid windows. From this figure,

Data valid window = Clock period – Setup window – Hold window


Start of data valid window = Tlaunch – (Tperiod/2)+ Thold
End of data valid window = Tlaunch + (Tperiod/2) – Tsetup

In this case too, data valid window spreads evenly on both the sides of launch clock edge.
Figure 9: Figure showing data valid window for fall-to-rise path

4) Setup and hold checks for paths launching from negative edge-triggered flip-flop
and being captured at negative edge-triggered flip-flop (fall-to-fall paths): The interpretation
of this case is similar to case 1. Both launch and capture of data happen at negative edge of the
clock. Figure 10 shows a path being launched from a negative edge-triggered flop and being
captured on a negative edge-triggered flop. In this case, setup check is on the next falling edge
and hold check is on the same edge corresponding to the clock edge on which launching flop is
launching the data.

Figure 10: Path from negative edge flop to negative edge flop (fall to fall path)

Figure below shows the setup and hold checks in the form of waveforms. As is shown, setup
check occurs at the next falling edge and hold check occurs at the same edge corresponding to
the launch clock edge.
The equation for setup check can be given as:
Tck->q + Tprop + Tsetup < Tperiod + Tskew (for setup check)

And the equation for hold check can be given as:

Tck->q + Tprop > Thold + Tskew (for hold check)

Figure 11: Setup and hold check for fall-to-fall path

Also, we show below the data valid and invalid windows. From this figure,

Data valid window = Clock period – Setup window – Hold window


Start of data valid window = Tlaunch + Thold
End of data valid window = Tlaunch + Tperiod – Tsetup
Figure 12: Figure showing data valid window for fall-to-fall path

2. Setup check and hold check for register-to-latch timing paths

In the post (Setup and hold – basics of timing analysis), we introduced setup and hold timing requirements and also discussed why these requirements need to be applied. In this post, we will be discussing how these checks are applied for different cases of paths starting from flops and ending at latches and vice-versa.

Present day designs are focused mainly on the paths between flip-flops, as the elements dominating them are flip-flops. But there are also some level-sensitive elements involved in data transfer in current-day designs. So, we need to have knowledge of setup and hold checks for flop-to-latch and latch-to-flop paths too. In this post, we will be discussing the former case. In total, there can be 4 cases involved in flop-to-latch paths, as discussed below:

1) Paths launching from positive edge-triggered flip-flop and being captured at positive level-sensitive latch: Figure 1 shows a path being launched from a positive edge-triggered flop and being captured at a positive level-sensitive latch. In this case, setup check is on the same rising edge (without time borrow) or the next falling edge (with time borrow), and hold check is on the previous falling edge with respect to the edge at which data is launched by the launching flop.

Figure 1: Timing path from positive edge flop to positive level latch

Figure below shows the waveforms for setup and hold checks in case of paths starting from a positive edge-triggered flip-flop and ending at a positive level-sensitive latch. As can be figured out, the setup and hold check equations can be described as:

Tck->q + Tprop + Tsetup < Tskew + Tperiod/2 (for setup check)

Tck->q + Tprop > Thold + Tskew - (Tperiod/2) (for hold check)
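A small numeric sketch (made-up values, zero skew) of how time borrowing plays out in this case, assuming the latch is transparent while the clock is high and closes at the falling edge:

```python
# Positive edge flop -> positive level-sensitive latch (values in ns)
# Launch at the rising edge (t = 0); latch closes at t = Tperiod/2.
t_period = 2.0
t_ck_q, t_prop, t_setup = 0.2, 0.5, 0.1

arrival = t_ck_q + t_prop        # arrival at the latch, after the launch edge
# Without borrowing, data would be checked against the launch edge itself
# (the latch is already transparent); any later arrival is borrowed time.
borrow = max(0.0, arrival)
limit = t_period / 2 - t_setup   # latest allowed arrival: closing edge - setup
print(round(borrow, 2), borrow <= limit)  # 0.7 True -> 0.7 ns borrowed, check met
```

The borrowed 0.7 ns would then be subtracted from the time available to the next (latch-to-flop) stage.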


2) Paths launching from positive edge-triggered flip-flop and being captured at negative level-sensitive latch: Figure 3 shows a path starting from a positive edge-triggered flip-flop and being captured at a negative level-sensitive latch. In this case, setup check is on the next falling edge (without time borrow) or on the next positive edge (with time borrow). Hold check is on the same edge with respect to the edge at which data is launched (zero cycle hold check).

Figure 3: Path from positive edge flop to negative level latch

Figure below shows the waveforms for setup and hold checks in case of paths starting from a positive edge-triggered flip-flop and ending at a negative level-sensitive latch. As can be figured out, the setup and hold check equations can be described as:

Tck->q + Tprop + Tsetup < (Tperiod) + Tskew (for setup check)

Tck->q + Tprop > Thold + Tskew (for hold check)


3) Paths launching from negative edge-triggered flip-flop and being captured at positive level-sensitive latch: Figure 5 shows a path starting from a negative edge-triggered flip-flop and being captured at a positive level-sensitive latch. In this case, setup check is on the next rising edge (without time borrow) or the next falling edge (with time borrow). Hold check is on the next rising edge with respect to the edge at which data is launched.

Figure 5: Path from negative edge flop to positive level latch

Figure below shows the waveforms for setup and hold checks in case of paths starting from a negative edge-triggered flip-flop and ending at a positive level-sensitive latch. As can be figured out, the setup and hold check equations can be described as:

Tck->q + Tprop + Tsetup < (Tperiod) + Tskew (for setup check)

Tck->q + Tprop > Thold + Tskew (for hold check)


4) Paths launching from negative edge-triggered flip-flop and being captured at negative level-sensitive latch: Figure 7 shows a path starting from a negative edge-triggered flip-flop and being captured at a negative level-sensitive latch. In this case, setup check is on the same edge (without time borrow) or on the next rising edge (with time borrow). Hold check is on the previous rising edge with respect to the edge at which data is launched.

Figure 7: Path from negative edge flop to negative level latch

Figure below shows the waveforms for setup and hold checks in case of paths starting from a negative edge-triggered flip-flop and ending at a negative level-sensitive latch. As can be figured out, the setup and hold check equations can be described as:

Tck->q + Tprop + Tsetup < Tperiod/2 + Tskew (for setup check)

Tck->q + Tprop > Thold + Tskew - (Tperiod/2) (for hold check)


3. Setup checks and hold checks for latch-to-reg timing paths
There can be 4 cases of latch-to-register timing paths as discussed below:
1. Positive level-sensitive latch to positive edge-triggered register: Figure 1 below shows a
timing path being launched from a positive level-sensitive latch and being captured at a positive
edge-triggered register. In this case, setup check will be full cycle with zero-cycle hold check.
Time borrowed by previous stage will be subtracted from the present stage.

Figure 1: Positive level-sensitive latch to positive edge-triggered register timing path


Timing waveforms corresponding to setup check and hold check for a timing path from
positive level-sensitive latch to positive edge-triggered register is as shown in figure 2
below.

Figure 2: Setup and hold check waveform for positive latch to positive register timing path
2. Positive level-sensitive latch to negative edge-triggered register: Figure 3 below
shows a timing path from a positive level-sensitive latch to negative edge-triggered
register. In this case, setup check will be half cycle with half cycle hold check. Time
borrowed by previous stage will be subtracted from the present stage.

Figure 3: A timing path from positive level-sensitive latch to negative edge-triggered register
Timing waveforms corresponding to setup check and hold check for timing path starting
from positive level-sensitive latch and ending at negative edge-triggered register is
shown in figure 4 below:

Figure 4: Setup and hold check waveform for timing path from positive latch to negative register

3. Negative level-sensitive latch to positive edge-triggered register: Figure 5 below
shows a timing path from a negative level-sensitive latch to positive edge-triggered
register. Setup check, in this case, as in case 2, is half cycle with half cycle hold check.
Time borrowed by previous stage will be subtracted from the present stage.

Figure 5: Timing path from negative level-sensitive latch to positive edge-triggered register
Timing waveforms for path from negative level-sensitive latch to positive edge-triggered
flop are shown in figure 6 below:
Figure 6: Waveform for setup check and hold check corresponding to timing path from negative latch to positive
flop

4. Negative level-sensitive latch to negative edge-triggered register: Figure 7 below shows a
timing path from a negative level-sensitive latch to a negative edge-triggered register. In this
case, setup check will be single cycle with zero cycle hold check. Time borrowed by previous
stage will be subtracted from present stage.

Figure 7: Timing path from negative latch to negative flop


Figure 8 below shows the setup check and hold check waveform from negative level-
sensitive latch to negative edge-triggered flop.

Figure 8: Timing waveform for path from negative latch to negative flip-flop

All about clock signals


Today’s designs are dominated by digital devices. These are all synchronous state
machines consisting of flip-flops. The transition from one state to the next is synchronous
and is governed by a signal known as the clock. That is why we have aptly termed the
clock ‘the in-charge of synchronous designs’.
Definition of clock signal: We can define a clock signal as the one which synchronizes
the state transitions by keeping all the registers/state elements in synchronization. In
common terminology, a clock signal is a signal that is used to trigger sequential devices
(flip-flops in general). By this, we mean that ‘on the active state/edge of clock, data at
input of flip-flops propagates to the output’. This propagation is termed as state
transition. As shown in figure 1, the ‘2-bit’ ring counter transitions from state ‘01’ to ‘10’
on active clock edge.

Figure 1: Figure showing state transition on active edge of clock

Clock signals occupy a very important place throughout the chip design stages. Since the state transitions happen on clock transitions, all the analyses, including verification, static timing analysis and gate level simulations, revolve around these clock signals. If static timing analysis can be considered a body, then clock is its blood. Also, during physical implementation of the design, special care has to be given to the placement and routing of clock elements, otherwise the design is likely to fail. Clock elements are responsible for almost half the dynamic power consumption in the design. That is why the clock has to be given prime importance.


Clock tree: A clock signal originates from a clock source. There may be designs with a single clock source, while some designs have multiple clock sources. The clock signal is distributed in the design in the form of a tree; the leaves of the tree are analogous to the sequential devices being triggered by the clock signal, and the root is analogous to the clock source. That is why the distribution of clock in the design is termed a clock tree. Normally (except for sometimes when intentional skew is introduced to cater to some timing critical paths), the clock tree is designed in such a way that it is balanced. By a balanced clock tree, we mean that the clock signal reaches each and every element of the design at almost the same time. Clock tree synthesis (placing and routing clock tree elements) is an important step in the implementation process. Special cells and routing techniques are used to ensure a robust clock tree.

Clock domains: By clock domain, we mean ‘the set of flops being driven by the clock signal’. For instance, the flops driven by the system clock constitute the system domain. Similarly, there may be other domains. There may be multiple clock domains in a design; some of these may be interacting with each other. For interacting clock domains, there must be some phase relationship between the clock signals, otherwise there is a chance of failure due to metastability. If a phase relationship is not possible to achieve, there should be clock domain synchronizers to reduce the probability of metastability failure.

Specifying a signal as clock: In EDA tools, the ‘create_clock’ command is used to specify a signal as a clock. We have to pass the period of the clock, the clock definition point, its reference clock (if it is a generated clock, as discussed below), duty cycle, waveform etc. as arguments to the command.

Master and generated clocks: EDA tools have the concept of master and generated clocks. A generated clock is one that is derived from another clock, known as its master clock. The generated clock may be of the same frequency or a different frequency than its master clock. In general, a generated clock is defined so as to distinguish it from its master in terms of frequency, phase or domain relationship.

Some terminology related to clock: There are different terms related to clock signals, described below:

 Leading and trailing clock edge: When clock signal transitions from ‘0’ to ‘1’,
the clock edge is termed as leading edge. Similarly, when clock signal transitions from
‘1’ to ‘0’, the clock edge is termed as trailing edge.

 Launch and capture edge: Launch edge is the edge of the clock at which data
is launched by a flop. Similarly, capture edge is the edge of the clock at which data is
captured by a flop.
 Clock skew: Clock skew is defined as the difference in arrival times of clock
signals at different leaf pins. Considering a set of flops, skew is the difference in the
minimum and maximum arrival times of the clock signal. Global skew is the clock skew
for the whole design. On the contrary, considering only a portion of the design, the skew
is termed as local skew.
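This definition can be sketched numerically (plain Python; the flop names and arrival times are made up):

```python
# Clock arrival times at the leaf (flop clock) pins, in ns
arrivals = {"ff_a": 1.02, "ff_b": 1.10, "ff_c": 0.95, "ff_d": 1.25}

def skew(pins):
    # Skew over a set of pins: max arrival minus min arrival
    times = [arrivals[p] for p in pins]
    return max(times) - min(times)

global_skew = skew(arrivals)         # over all leaf pins in the design
local_skew = skew(["ff_a", "ff_b"])  # over one local region of interest
print(round(global_skew, 2), round(local_skew, 2))  # 0.3 0.08
```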
Synchronizers

Modern VLSI designs have very complex architectures and multiple clock sources. Multiple clock domains interact within the chip. Also, there are interfaces that connect the chip to the outside world. If these different clock domains are not properly synchronized, metastability events are bound to happen and may result in chip failure. Synchronizers come to the rescue to avoid the fatal effects of metastability arising due to signals crossing clock domain boundaries, and are a must wherever two clock domains interact. Understanding metastability and correctly designing synchronizers to prevent metastability failures is an art. For systems with only one clock domain, synchronizers are required only when reading an asynchronous signal.

Synchronizers, a key to tolerate metastability: As mentioned earlier, asynchronous signals cause catastrophic metastability failures when introduced into a clock domain. So, the first thing that arises in one’s mind is to find ways to avoid metastability failures. But the fact is that metastability cannot be avoided; we have to learn to tolerate it. This is where synchronizers come to the rescue. A synchronizer is a digital circuit that converts an asynchronous signal (or a signal from a different clock domain) into the recipient clock domain so that it can be captured without introducing any metastability failure. However, the introduction of synchronizers does not totally guarantee prevention of metastability. It only reduces the chances of metastability by a huge factor. Thus, a good synchronizer must be reliable with high MTBF, should have low latency from source to destination domain and should have low area/power impact.

When a signal is passed from one clock domain to another, the circuit that receives the signal needs to synchronize it. Whatever metastability effects are caused by this have to be absorbed in the synchronizer circuit only. Thus, the purpose of a synchronizer is to shield the downstream logic from the metastable state of the first flip-flop in the new clock domain.

Flip-flop based synchronizer (two flip-flop synchronizer): This is the simplest and most common synchronization scheme and consists of two or more flip-flops in a chain working on the destination clock domain. This approach allows an entire clock period for the first flop to resolve metastability. Let us consider the simplest case of a flip-flop synchronizer with 2 flops, as shown in the figure. Here, Q2 goes high 1 or 2 cycles later than the input.

A two flip-flop synchronizer

As said earlier, the two flop synchronizer converts a signal from the source clock domain to the destination clock domain. The input to the first stage is asynchronous to the destination clock. So, the output of the first stage (Q1) might go metastable from time to time. However, as long as metastability is resolved before the next clock edge, the output of the second stage (Q2) should have valid logic levels (as shown in figure 3). Thus, the asynchronous signal is synchronized with a maximum latency of 2 clock cycles. Theoretically, it is still possible for the output of the first stage to be unresolved before it is to be sampled by the second stage. In that case, the output of the second stage will also go metastable. If the probability of this event is high, then you need to consider having a three stage synchronizer.

Waveforms in a two flop synchronizer

A good two flip-flop synchronizer design should have following characteristics:

 The two flops should be placed as close as possible to allow the metastability at
the first stage output maximum time to get resolved. Some ASIC libraries have
built-in synchronizer cells; these have better MTBF but use very large flops and,
hence, consume more power.
 The two flop synchronizer is the most basic design, upon which all other
synchronizers are based.
 The source domain signal is expected to remain stable for a minimum of two destination clock cycles so that the first stage is guaranteed to sample it by the second clock edge. In some cases, it is not even possible to predict the destination domain frequency. In such cases, a handshaking mechanism may be used.
 The two flop synchronizer must be used to synchronize single-bit data only. Using multiple two-flop synchronizers to synchronize multi-bit data may lead to catastrophic results, as some bits might pass through in the first cycle and others in the second. Thus, the destination domain FSM may go into an undesired state.
 Another forbidden practice is synchronizing the same bit through two different synchronizers. One of them may resolve to 0 and the other to 1, leading to an inconsistent state.
 Two stages are not enough for very high speed clocks, as the MTBF becomes significantly low (e.g., in processors, where clocks run in excess of 1 GHz). In such cases, adding one extra stage helps.
 MTBF decreases almost linearly with the number of synchronizers in the system.
Thus, if your system uses 1000 synchronizers, each of these must be designed with
at least 1000 times more MTBF than the actual reliability target.
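The MTBF behavior described in the last two bullets can be sketched with the standard single-stage MTBF formula, MTBF = e^(t_r/tau) / (T_w * f_clk * f_data). All constants below are made-up illustrative values, not real library data.

```python
import math

def mtbf(resolve_time, tau=50e-12, t_w=100e-12, f_clk=500e6, f_data=10e6):
    """Single-stage MTBF in seconds; tau and t_w are hypothetical
    metastability parameters of the flop, f_clk/f_data the clock and
    data toggle rates."""
    return math.exp(resolve_time / tau) / (t_w * f_clk * f_data)

# An extra synchronizer stage grants one more clock period of resolution
# time, improving MTBF exponentially.
period = 1.0 / 500e6
assert mtbf(2 * period) > mtbf(period)

# Failure rates of independent synchronizers add up, so with 1000 of them
# the system MTBF is roughly the per-synchronizer MTBF divided by 1000 --
# hence the 1000x design-margin rule above.
system_mtbf = mtbf(period) / 1000
```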

Handshaking based synchronizers: As discussed earlier, two flop synchronizer works

only when there is one bit of data transfer between the two clock domains and the result

of using multiple two-flop synchronizers to synchronize multi-bit data is catastrophic.

The solution for this is handshaking based synchronization, where the transfer of data is controlled by a handshaking protocol: the source domain places the data on the bus and asserts the ‘REQ’ signal. When REQ goes high, the receiver knows the data is stable on the bus and it is safe to sample it. After sampling, the receiver asserts the ‘ACK’ signal. This signal

is synchronized to the source domain and informs the sender that data has been

sampled successfully and it may send a new data. Handshaking based synchronizers

offer reliable communication but reduce data transmission bandwidth, as it takes

many cycles to exchange handshaking signals. Handshaking allows digital circuits to

effectively communicate with each other when response time of one or both circuits is

unpredictable.
Handshaking protocol based synchronization technique

How handshaking takes place:

1.) Sender places data, then asserts REQ signal

2.) Receiver latches data and asserts ACK

3.) Sender deasserts REQ

4.) Receiver deasserts ACK to inform the sender that it is ready to accept another

data. Sender may start the handshaking protocol again.


Sequence of events in a handshaking protocol based synchronizer
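The four steps above can be sketched as a simple event sequence. This is a hypothetical Python model that ignores the synchronizer latency on REQ and ACK for clarity.

```python
def four_phase_transfer(data_word, bus):
    """Model one four-phase REQ/ACK transfer over a shared 'bus' dict,
    logging the handshake events in protocol order."""
    events = []
    bus["data"] = data_word
    events.append("sender: data placed on bus, REQ asserted")      # step 1
    received = bus["data"]               # receiver samples stable data
    events.append("receiver: data latched, ACK asserted")          # step 2
    events.append("sender: REQ deasserted")                        # step 3
    events.append("receiver: ACK deasserted, ready for next word") # step 4
    return received, events

word, log = four_phase_transfer(0xA5, {})
# word == 0xA5; four events, one per protocol step
```

In hardware, each REQ/ACK crossing itself goes through a two-flop synchronizer, so one transfer costs several cycles in both domains, which is the bandwidth penalty mentioned above.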

Mux based synchronizers: As mentioned above, two flop synchronizers are

hazardous if used to synchronize data which is more than 1-bit in width. In such

situations, we may use mux-based synchronization scheme. In this, the source domain

sends an enable signal indicating that the data has changed. This enable is

synchronized to the destination domain using the two flop synchronizer. This

synchronized signal acts as an enable signal indicating that the data on data bus from

source is stable and destination domain may latch the data. As shown in the figure, two

flop synchronizer acts as a sub-unit of mux based synchronization scheme.

A mux-based synchronization scheme
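A behavioral sketch of this scheme follows (hypothetical Python): only the 1-bit enable passes through the two-flop synchronizer, while the multi-bit data bus is held stable by the source until the synchronized enable arrives.

```python
# Toy model of a mux-based (enable) synchronizer: the enable bit crosses
# through two flops; the destination register loads the bus only once the
# synchronized enable is seen high.

def mux_sync(data_bus, enable_stream):
    q1 = q2 = 0
    captured = None
    for en in enable_stream:
        q1, q2 = en, q1              # two-flop synchronizer on the enable
        if q2:                       # synchronized enable selects the bus
            captured = data_bus      # destination register loads the data
    return captured

# Source holds 0x3C stable while the enable pulse crosses domains.
print(mux_sync(0x3C, [0, 1, 1, 0, 0]))  # captures 0x3C
```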

Two clock FIFO synchronizer: FIFO synchronizers are the most common fast

synchronizers used in the VLSI industry. There is a ‘cyclic buffer’ (dual port RAM) that

is written into by the data coming from the source domain and read by the destination
domain. Two pointers are maintained: one corresponding to write, the other to read. These pointers are used by the two domains to conclude whether the FIFO is

empty or full. For doing this, the two pointers (from different clock domains) must be

compared. Thus, write pointer has to be synchronized to receive clock and vice-versa.

Thus, it is not data, but pointers that are synchronized. FIFO based synchronization is

used when there is need for speed matching or data width matching. In case of speed

matching, the faster port of the FIFO normally handles burst transfers while the slower port

handles constant rate transfers. In FIFO based synchronization, the average data rates into and out of the FIFO are the same in spite of different access speeds and types.

A FIFO based synchronization scheme
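In practice, the write and read pointers are synchronized as Gray codes, so that only one bit changes per increment and the two-flop synchronizers can never sample an inconsistent multi-bit value. A small sketch of that detail (the Gray-code trick is standard FIFO practice, though not spelled out in the text above):

```python
# Gray-code pointer sketch for a two-clock FIFO.

def bin_to_gray(b):
    return b ^ (b >> 1)

def gray_to_bin(g):
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Successive Gray codes differ in exactly one bit, so a synchronizer can
# only ever be off by one count, never inconsistent.
for i in range(15):
    diff = bin_to_gray(i) ^ bin_to_gray(i + 1)
    assert diff & (diff - 1) == 0      # exactly one bit set

def fifo_empty(wr_gray, rd_gray):
    """FIFO is empty when the synchronized pointers compare equal."""
    return wr_gray == rd_gray
```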

Timing Corners – dimensions in timing signoff

Integrated circuits are designed to work over a range of temperatures and voltages, and not just at a single temperature and voltage. They have to work under different environmental conditions and different electrical and user environments. For instance, the temperature in the internals of an automobile may reach as high as 150 degrees Celsius while operating. Automobiles may also have to work in colder regions where temperatures reach -40 degrees during winters. So, a chip designed for automobiles has to work over temperatures ranging from -40 to 150 degrees Celsius. On the other hand, consumer electronics may have to work only in the range of -20 to +40 degrees. Thus, depending upon the application, the chip has to be robust enough to handle varying surrounding temperatures.

Not just surrounding temperatures: the voltage supplied by the voltage source may also vary. The battery may have an output voltage range, and the voltage regulator sitting inside or outside the chip may have a defined inaccuracy range. Let us say an SoC has a nominal operating voltage of 1.2 V with 10% variation; it can then operate at any voltage from 1.08 V to 1.32 V. The integrated circuit has to be tolerant enough to handle these variations.

In addition, the process by which integrated circuits are manufactured has variations owing to its microscopic nature. For example, while performing etching, the depth of etching may vary from wafer to wafer and from die to die. There may be intra-chip process variations too. For instance, an AND gate placed in an area of the chip where the signal density is very high will behave differently from an isolated AND gate. Depending upon these, the behavior (delay, static and dynamic power consumption etc.) of cells on the chip varies. These variations are together referred to as PVT (Process, Voltage, Temperature) variations. The library (liberty) models of the cells are characterized for cell delays, transitions, and static and dynamic power corresponding to different PVT combinations.

These variations affect not just cells but nets too. The net parameters (resistance, capacitance and inductance) may also vary. These parameters contribute to cell delay, and nets introduce delay of their own as well. Hence, one may get nets with higher or lower delay. So, these variations also have to be taken into account for robust integrated circuit manufacture. This variation in net characteristics can be modeled as RC variation, as it accounts for changes in resistance and capacitance (ignoring inductance) of the net.
Figure 1: A racing car. (Taken from en.wikipedia.com)

With proper techniques, the patterns of variation of both the cell and net parameters (delay, power, resistance and capacitance) are characterized and their minima and maxima are recorded. Each minimum and maximum can be termed a corner. Let us term each minimum/maximum in cell characteristics a ‘PVT corner’ and in net characteristics an ‘extraction corner’. Each combination of PVT and extraction corners is referred to as a ‘timing corner’, as it represents a point where timing will be extreme. There is an assumption that if the setup and hold conditions are met for the design at these corners, they will be met at intermediate points and the design will be safe to run under all conditions. This is true in most cases, but not always. There is always a trade-off between the number of signed-off corners and the sign-off quality.

For bigger technology nodes, say 250 nm, only two corners used to be sufficient: one that showed maximum cell delay and the other that showed minimum cell delay. Net variations could be ignored for such technologies. In all, there used to be 2 PVT corners and 1 extraction corner. As we go down the technology nodes, net variations start coming into the picture. Also, cell characteristics do not show a linear behavior. Therefore, the number of PVT as well as extraction corners increases for lower technology nodes. For 28 nm, say, there can be 8 PVT corners as well as 8 extraction corners. The number of corners differs from foundry to foundry. The chip has to be signed off in each and every corner to ensure it works in all of them. However, we may choose to sign off in fewer corners by applying some extra uncertainty as margin in lieu of not signing off at those timing corners. The timing analyst needs to decide what is appropriate depending upon the resources and schedule.
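The corner arithmetic above is just a cross-product: for the 28 nm example, 8 PVT corners combined with 8 extraction corners give 64 timing corners to sign off. The corner names below are hypothetical placeholders, as actual names vary by foundry.

```python
import itertools

# Hypothetical corner names; real libraries/extraction decks name these
# differently from foundry to foundry.
pvt_corners = ["ss_125C", "ss_m40C", "ff_125C", "ff_m40C",
               "ss_vmin", "ss_vmax", "ff_vmin", "ff_vmax"]
extraction_corners = ["cworst", "cbest", "rcworst", "rcbest",
                      "cworst_T", "cbest_T", "rcworst_T", "rcbest_T"]

# Every (PVT, extraction) pair is one timing corner to be signed off.
timing_corners = list(itertools.product(pvt_corners, extraction_corners))
print(len(timing_corners))   # 8 PVT x 8 extraction = 64 timing corners
```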

Need for clock gating checks - need for glitchless clock propagation
One of the most important things in designs is to ensure glitch free propagation of
clocks. Even a single glitch in clock path can cause the chip to be metastable and even
fail. A glitch is any unwanted clock pulse that may cause the sequential cells to
consider it as an actual clock pulse. Thus, a glitch can put your device in an
unwanted state that is functionally never possible. That is why there should never be a glitch in the clock path. Every effort should be made by designers to minimize its probability.
The figure below shows a flip-flop receiving a data signal and a clock signal; if there is
some glitch (unwanted change of state) in clock, it will take it as a real clock edge and
latch the data to its output. However, if the pulse is too small, the data may not
propagate properly to output and the flop may go metastable.

Figure showing functional glitch in clock path

There may be following kind of cells present in clock path:

1) Buffers/inverters: Since, there is only one input for a buffer/inverter, the glitch may
occur on the output of these gates only through coupling with other signals in the
vicinity. If we ensure that the buffer/inverter has good drive strength and that the load
and transition at its output are under a certain limit, we can be certain that the glitch will
not occur.

2) Combinational gates: There can be combinational gates other than


buffers/inverters in clock path, say, a 2-input AND gate having an enable signal that tells
if the clock is to be propagated or not. Each combinational gate might have one or more
clocks and data/enable pins. Let us say, we have a two input AND gate with one input
as clock and the other acting as an enable to the clock. We can have following cases
possible:

i) The other input is static: By static, we mean the other input will not change on the fly.
In other words, whenever the enable will change, the clock will be off. So, enable will
not cause the waveform at the output of the gate to change. This case is similar to a
buffer/inverter as the other input will not cause the shape of the output pulse to change.
ii) The other input is toggling: In this case, the enable might cause the waveform at the output of the gate to change. To ensure that there is no glitch caused by this, certain requirements related to the skew between data and clock have to be met, which will be discussed later in the text. These requirements are termed clock gating checks.

3) Sequential gates: There may also be sequential gates in the clock path, say, a flop, a
latch or an integrated clock gating cell with the clock at its clock input and the enable for
the clock will be coming at its data input. The output of these cells will be a clock pulse.
For these also, two cases are possible as in case 2. In other words, if the enable
changes when clock is off, the enable is said to be static. In that case, the output either
has the clock or does not. On the other hand, if the enable is toggling while the clock is present at the input, it is sufficient to meet the setup and hold checks for the enable signal with respect to the clock input.

As discussed above, to ensure a glitch free propagation of clock at the output of the
combinational gates, we have to ensure some timing requirements between the enable
signal and clock. These timing requirements ensure that there is no functionally
unwanted pulse in clock path. If we ensure these timing requirements are met, there will
be no functional glitch in clock path. However, glitches due to crosstalk between signals
can still occur. There are other techniques to prevent glitches due to crosstalk. The
functional glitches in clock path can be prevented by ensuring the above discussed
timing requirements. In STA, these requirements are enforced on designs through
timing checks known as clock gating checks. By ensuring these checks are applied
and taken care of properly, an STA engineer can sign-off for functional glitches. In later
posts, we will be dealing with these checks in more details.

Clock gating checks


Today’s designs have many functional as well as test modes. A number of clocks
propagate to different parts of design in different modes. And a number of control
signals are there which control these clocks; these signals switch parts of the design on and off. Let us say we have a simple design as shown in the figure below. Pin
‘SEL’ selects between two clocks. Also, ‘EN’ selects if clock will be propagating to the
sub-design or not. Similarly, there are signals that decide what, when, where and how
for propagation of clocks. Some of these controlling signals may be static while some of
these might be dynamic. Even so, these signals should not alter the waveform of the clock; i.e., they should not cause any glitch in the clock path. There are both architectural and timing aspects to be taken care of while designing for signals toggling in clock paths. This scenario is widely known as ‘clock gating’. The
timing checks that need to be modeled in timing constraints are known as ‘clock gating
checks’.
Figure 1: A simplest clocking structure

Definition of clock gating check: A clock gating check is a constraint, either applied or
inferred automatically by tool, that ensures that the clock will propagate without any
glitch through the gate.

Types of clock gating checks: Fundamentally, all clock gating checks can be
categorized into two types:

AND type clock gating check: Let us say we have a 2-input AND gate in which one of
the inputs has a clock and the other input has a data which will toggle while the clock is
still on.

Figure 2: AND type clock gating check; EN signal


controlling CLK_IN through AND gate

Since, the clock is free-running, we have to ensure that the change of state of enable
signal does not cause the output of the AND gate to toggle. This is only possible if the
enable input toggles when clock is at ‘0’ state. As is shown in figure 3 below, if ‘EN’
toggles when ‘CLK_IN’ is high, the clock pulse gets clipped. In other words, we do not
get full duty cycle of the clock. Thus, this is a functional architectural miss causing glitch
in clock path. As is evident in figure 4, if ‘EN’ changes during ‘CLK_IN’ are low, there is
no change in clock duty cycle. Hence, this is the right way to gate a clock signal with an
enable signal; i.e. make the enable toggle only when clock is low.

Figure 3: Clock being clipped when ‘EN’ changes when ‘CLK_IN’ is high

Figure 4: Clock waveform not being altered when ‘EN’ changes when ‘CLK_IN’ is low
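The clipping in figures 3 and 4 can be reproduced with a toy sampled-waveform model. This is a hypothetical Python sketch; the time step and waveforms are illustrative only.

```python
# AND-type clock gating on sampled waveforms: EN falling during the low
# phase of CLK_IN is safe, while falling during the high phase clips the
# clock pulse (a glitch).

def gated_wave(clk, en):
    return [c & e for c, e in zip(clk, en)]

clk = [1, 1, 0, 0, 1, 1, 0, 0]           # two full clock pulses

en_safe = [1, 1, 1, 0, 0, 0, 0, 0]       # EN falls while CLK_IN is low
en_glitch = [1, 0, 0, 0, 0, 0, 0, 0]     # EN falls while CLK_IN is high

def pulse_widths(wave):
    """Widths (in samples) of each high pulse in the waveform."""
    widths, w = [], 0
    for s in wave + [0]:
        if s:
            w += 1
        elif w:
            widths.append(w)
            w = 0
    return widths

print(pulse_widths(gated_wave(clk, en_safe)))    # [2] -> full pulse
print(pulse_widths(gated_wave(clk, en_glitch)))  # [1] -> clipped pulse
```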

Theoretically, ‘EN’ can launch from either positive edge-triggered or negative edge-
triggered flops. In case ‘EN’ is launched by a positive edge-triggered flop, the setup and
hold checks will be as shown in figure 5. As shown, setup check in this case is on the
next positive edge and hold check is on the next negative edge. However, the ratio of maximum to minimum delays of cells across extreme operating conditions may be as high as 3. So, architecturally, this situation cannot guarantee glitch-free clock propagation under all conditions.
Figure 5: Clock gating setup and hold checks on AND gate when 'EN' launches from a positive edge-triggered flip-
flop

On the contrary, if ‘EN’ launches from a negative edge-triggered flip-flop, the setup check is formed with respect to the next rising edge and the hold check is on the same falling edge (zero-cycle) as that of the launch edge. The same is shown in figure 6. Since, in
this case, hold check is 0 cycle, both the checks are possible to be met for all operating
conditions; hence, this solution will guarantee the clock to pass under all operating
condition provided the setup check is met for worst case condition. The inactive clock
state, as evident, in this case, is '0'.

Figure 6: Clock gating setup and hold checks on AND gate when ‘EN’ launches from negative edge-triggered flip-
flop

Figure 7: An OR gate controlling a clock signal 'CLK_IN'

OR type clock gating check: Similarly, since the off-state of OR gate is 1, the enable
for an OR type clock gating check can change only when the clock is at ‘1’ state. That
is, we have to ensure that the change of state of enable signal does not cause the
output of the OR gate to toggle. Figure 9 below shows that if ‘EN’ toggles when ‘CLK_IN’ is high, there is no change in duty cycle. However, if ‘EN’ toggles when ‘CLK_IN’ is low (figure 8), the clock pulse gets clipped. Thus, ‘EN’ must be allowed to toggle only when ‘CLK_IN’ is high.

Figure 8: Clock being clipped when 'EN' changes when 'CLK_IN' is low

Figure 9: Clock waveform not being altered when 'EN' changes when 'CLK_IN' is high

As in case of AND gate, here also, ‘EN’ can launch from either positive or negative edge
flops. In case ‘EN’ launches from negative edge-triggered flop, the setup and hold
checks will be as shown in the figure 10. The setup check is on the next negative edge
and hold check is on the next positive edge. As discussed earlier, this cannot guarantee glitchless propagation of the clock.

Figure 10: Clock gating setup and hold checks on OR gate when ‘EN’ launches from negative edge-triggered flip-
flop

If ‘EN’ launches from a positive edge-triggered flip-flop, setup check is with respect to
next falling edge and hold check is on the same rising edge as that of the launch edge.
The same is shown in figure 11. Since, the hold check is 0 cycle, both setup and hold
checks are guaranteed to be met under all operating conditions provided the path has
been optimized to meet setup check for worst case condition. The inactive clock state,
evidently, in this case, is '1'.

Figure 11: Clock gating setup and hold checks on OR gate when 'EN' launches from a positive edge-
triggered flip-flop

We have, thus far, discussed two fundamental types of clock gating checks. There may
be complex combinational cells other than 2-input AND or OR gates. However, for these
cells, too, the checks we have to meet between the clock and enable pins will be of the
above two types only. If the enable can change during low phase of the clock only, it is
said to be AND type clock gating check and vice-versa.

SDC command for application of clock gating checks: In STA, clock gating checks
can be applied with the help of SDC command set_clock_gating_check.

Clock gating checks at a multiplexer (MUX)


In the post 'clock switching and clock gating checks', we discussed why clock gating checks are
needed. Also, we discussed the two basic types of clock gating checks. Let us go one step
further. The most common types of combinational cells with dynamic clock switching
encountered in today’s designs are multiplexers. We will be discussing the clock gating checks at
a multiplexer. For simplicity, let us say, we have a 2-input multiplexer with 1 select pin. There
can be two cases:
Case 1: Data signal at the select pin of MUX used to select between two clocks

Figure 1: MUX with Data as select dynamically selecting the clock signal to propagate to output

This scenario is shown in figure 1 above. This situation normally arises when ‘Data’ acts as
clock select and dynamically selects which of the two clocks will propagate to the output. The
function of the MUX is given as:
CLK_OUT = Data.CLK1 + Data’.CLK2

The internal structure (in terms of basic gates) is as shown below in figure 2.

Figure 2: Internal structure of mux in figure 1

There will be two clock gating checks formed:


1. Between CLK1 and Data: There are two cases to be considered for this scenario:
o When CLK2 is at state '0': In this scenario, if the data toggles when CLK1 is '0',
it will pass without any glitches. On the other hand, there will be a glitch if data toggles when
CLK1 is '1'. Thus, the mux acts as AND gate and there will be AND-type clock gating check.
o When CLK2 is '1': In this scenario, if data toggles when CLK1 is '1', it will pass
without any glitches; and will produce a glitch if toggled when CLK1 is '0'. In other words,
MUX acts as an OR gate; hence, OR-type clock gating check will be formed in this case.

2. Between CLK2 and Data: This scenario is analogous to scenario '1', and the type of clock gating check formed will be determined by the inactive state of the other clock (CLK1).

Thus, the type of clock gating check to be applied, in this case, depends upon the inactive state of the other clock. If it is '0', an AND-type check will be formed. On the other hand, if it is '1', an OR-type check will be formed.
Case 2: Clock signal is at select line. This situation is most common in case of Mux-based
configurable clock dividers wherein output clock waveform is a function of the two data values.

Figure 3: Combination of Data1 and Data2 determines if CLK or CLK' will propagate to the output

In this case too, there will be two kinds of clock gating checks formed:
i) Between CLK and Data1: Here, both CLK and Data1 are input to a 2-input AND gate, hence,
there will be AND type check between CLK and Data1. The following SDC command will
serve the purpose:
set_clock_gating_check -high 0.1 [get_pins MUX/Data1]
The above command will constrain an AND-type clock gating check of 100 ps on Data1 pin.

ii) Between CLK and Data2: As is evident from figure 4, there will be AND type check between
CLK’ and Data2. This means Data2 can change only when CLK’ is low. In other words, Data2
can change only when CLK is high. This means there is OR type check between CLK and
Data2. The following command will do the job:
set_clock_gating_check -low 0.1 [get_pins MUX/Data2]
The above command will constrain an OR-type clock gating check of 100 ps on Data2 pin.

Thus, we have discussed how there are clock gating checks formed between different
signals of a MUX.
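The conclusion of Case 1 reduces to a one-line rule, sketched below as a hypothetical helper (this is illustrative Python, not an SDC construct):

```python
def mux_gating_check_type(other_clock_inactive_state):
    """For a mux selecting between two clocks, the gating check type on a
    (clock, select) pair is set by the other clock's inactive state:
    off at '0' -> the mux degenerates to an AND gate (AND-type check);
    off at '1' -> it degenerates to an OR gate (OR-type check)."""
    return "AND" if other_clock_inactive_state == 0 else "OR"

print(mux_gating_check_type(0))  # AND-type check
print(mux_gating_check_type(1))  # OR-type check
```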

False paths basics and examples


False path is a very common term used in STA. It refers to a timing path that is not required to be optimized for timing, as its data is never required to be captured within a limited time in the normal working of the chip. In the normal scenario, a signal launched from a flip-flop has to get captured at another flip-flop in only one clock cycle. However, there are certain scenarios where it does not matter at what time the signal originating from the launching flop arrives at the capturing flop. The timing path resulting from such scenarios is labeled a false path and is not optimized for timing by the optimization tool.
Definition of false path: A timing path which can get captured even after a very large interval of time has passed, and still produce the required output, is termed a false path. A false path, thus, does not need to be timed and can be ignored during timing analysis.
Common false path scenarios: Below, we list some examples where false paths can be applied:
Synchronized signals: Let us say we have a two flop synchronizer placed between a sending
and receiving flop (The sending and receiving flops may be working on different clocks or same
clock). In this scenario, it is not required to meet timing from launching flop to first stage of
synchronizer. Figure 1 below shows a two-flop synchronizer. We can consider the signal coming
to flop1 as false, since, even if the signal causes flop1 to be metastable, it will get resolved
before next clock edge arrives with the success rate governed by MTBF of the synchronizer. This
kind of false path is also known as Clock domain crossing (CDC).
Figure 1: A two flop synchronizer
However, this does not mean that wherever you see a chain of two flops, there is a false path to
first flop. The two flops may be there for pipelining the logic. So, only once it is confirmed that there is a synchronizer should you specify the path as false.

Similarly, for other types of synchronizers as well, you can specify false paths.

False paths for static signals arising due to merging of modes: Suppose you have a structure
as shown in figure 1 below. You have two modes, and the path to multiplexer output is different
depending upon the mode. However, in order to cover timing for both the modes, you have to
keep the “Mode select bit” unconstrained. This results in paths being formed through the multiplexer select also. You can apply a false path through the select of the multiplexer, as it will be static in
both the modes, if there are no special timing requirements related to mode transition on this
signal. Specifically speaking, for the scenario shown in figure 1,

Mode 1: set_case_analysis 0 MUX/SEL


Mode 2: set_case_analysis 1 MUX/SEL
Mode with Mode1 and Mode2 merged together : set_false_path -through MUX/SEL
Figure 2: Mode selection signal selecting between mode1 and mode2 paths

Architectural false paths: There are some timing paths that are never possible to occur. Let us
illustrate with the help of a hypothetical, but very simplistic example that will help understand
the scenario. Suppose we have a scenario in which the select signals of two 2:1 multiplexers are
tied to same signal. Thus, there cannot be a scenario where data through in0 pin of MUX0 can
traverse through in1 pin of MUX1. Hence, it is a false path by design architecture. Figure 3
below depicts the scenario.

Figure 3: A hypothetical example showing architectural false path

Specifying false path: The SDC command to specify a timing path as false path is "set_false_path".
We can apply false path in following cases:
 From register to register paths
o set_false_path -from regA -to regB
 Paths being launched from one clock and being captured at another
o set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
 Through a signal
o set_false_path -through [get_pins AND1/B]
Multicycle paths handling in STA
In the post Multicycle paths - the architectural perspective, we discussed the architectural aspects of multicycle paths. In this post, we will discuss how multicycle paths are handled in backend optimization and timing analysis:

How multi-cycle paths are handled in STA: By default, in STA, all the timing paths
are considered to have default setup and hold timings; i.e., all the timing paths should
be covered in either half cycle or single cycle depending upon the nature of path
(see setup-hold checks part 1 and setup-hold checks part 2 for reference). However, it
is possible to convey the information to STA engine regarding a path being multi-cycle.
There is an SDC command "set_multicycle_path" for the same. Let us elaborate it with
the help of an example:

Figure 3: Path from ff1/Q to ff2/D is multicycle path

Let us assume a multi-cycle timing path (remember, it has to be ensured by


architecture) wherein both launch and capture flops are positive edge-triggered as
shown in figure 3. The default setup and hold checks for this path will be as shown in
red in figure 4. We can tell STA engine to time this path in 3 cycles instead of default
one cycle with the help of set_multicycle_path SDC command:

set_multicycle_path 3 -setup -from ff1/Q -to ff2/D

Above command will shift both setup and hold checks forward by two cycles. That is,
setup check will now become 3 cycle check and hold will be 2 cycle check as shown in
blue in figure 4. This is because, by default, STA engine considers hold check one
active edge prior to setup check, which, in this case, is after 3 cycles.
Figure 4: Setup and hold checks before and after applying multicyle for setup-only

However, this is not the desired scenario in most of the cases. As we discussed earlier,
multi-cycle paths are achieved by either gating the clock path or data path for required
number of cycles. So, the required hold check in most cases is 0 cycle. This is done
through same command with switch "-hold" telling the STA engine to pull hold back to
zero cycle check.

set_multicycle_path -hold 2 -from ff1/Q -to ff2/D

The above command will bring the hold check back by 2 cycles to a zero cycle check. This is as shown in blue in figure 5.

Figure 5: Setup and hold checks after applying multi-cycle exceptions for both setup and hold

We need to keep in mind the following statement:

Setting a multi-cycle path for setup affects the hold check by same number of
cycles as setup check in the same direction. However, applying a multi-cycle path
for hold check does not affect setup check.

So, in the above example, both the statements combined will give the desired setup and
hold checks. Please note that there might be a case where only setup or hold multi-
cycle is sufficient, but that is the need of the design and depends on how FSM has been
modeled.
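For same-frequency launch and capture clocks, the statement above boils down to simple edge arithmetic. The following is a hypothetical Python sketch, with edge times in the same units as the clock period:

```python
# Edge arithmetic for a same-frequency multicycle path: a setup multicycle
# of N moves the setup capture edge to N cycles and drags the default hold
# edge to N-1 cycles; a hold multicycle of N-1 then pulls the hold edge
# back to the launch edge (zero-cycle check).

def check_edges(period, setup_mcp=1, hold_mcp=0):
    """Return (setup_check_edge, hold_check_edge) relative to launch."""
    setup_edge = setup_mcp * period
    hold_edge = (setup_mcp - 1 - hold_mcp) * period
    return setup_edge, hold_edge

print(check_edges(10))           # (10, 0): default single-cycle checks
print(check_edges(10, 3))        # (30, 20): hold dragged along with setup
print(check_edges(10, 3, 2))     # (30, 0): hold pulled back to zero-cycle
```

This mirrors the pair of SDC commands in the example: setup multicycle 3 plus hold multicycle 2 gives a 3-cycle setup check and a 0-cycle hold check.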

What if both clock periods are not equal: In the above example, for simplicity, we
assumed that launch and capture clock periods are equal. However, this may not be
true always. As discussed in multicycle path - the architectural perspective, it makes
more sense to have multi-cycle paths where there is a difference in clock periods. The
setup and hold checks for multicycle paths is not as simple in this case as it was when
we considered both the clocks to be of same frequency. Let us consider a case where
launch clock period is twice the capture clock period as shown in figure 6 below.

Figure 6: Default setup and hold checks for case where capture clock period is half that of launch clock

Now, the question is: when defining a multi-cycle path, which clock period will be added to the setup check, launch or capture? The answer depends upon the architecture and FSM of
the design. Once you know it, the same can be modelled in timing constraints. There is
a switch in the SDC command to provide for which of the clock periods is to be added.
"set_multicycle_path -start" means that the path is a multi-cycle for that many cycles of
launch clock. Similarly, "set_multicycle_path -end" means that the path is a multicycle
for that many cycles of capture clock. Let the above given path be a multicycle of 2. Let
us see below how it changes with -start and -end options.

1. set_multicycle_path -start: This will cause a cycle of launch clock to be added


in setup check. As expected, on applying a hold multicycle path of 1, the hold will return
back to 0 cycle check. Figure 8 below shows the effect of the below two commands on
setup and hold checks. As is shown, setup check gets relaxed by one launch clock
cycle.
set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -start
set_multicycle_path 1 -hold -from ff1/Q -to ff2/D -start

Figure 8: Setup and hold checks with -start option provided with set_multicycle_path

2. set_multicycle_path -end: This will cause a cycle of capture clock to be added in


setup check. As expected, on applying a hold multicycle path of 1, the hold will return
back to 0 cycle check. Figure 9 below shows the effect of the below two commands on
setup and hold checks. As is shown, setup gets relaxed by one cycle of capture clock.
set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -end
set_multicycle_path 1 -hold -from ff1/Q -to ff2/D -end

Figure 8: Setup and hold checks with -end option provided with set_multicycle_path

Why it is important to apply multi-cycle paths: To achieve optimum area, power and
timing, all the timing paths must be timed at the desired frequencies. The optimization
engine will know that a path is multicycle only when it is told so through SDC
commands in the timing constraints. If we don't specify a multicycle path as multicycle,
the optimization engine will consider it a single-cycle path and will try to use bigger
drive strength cells to meet timing. This will result in more area and power; hence,
more cost. So, all multicycle paths must be correctly specified as multicycle paths
during timing optimization and timing analysis.
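The difference between -start and -end can be sketched numerically. The helper below is a toy model, not how any STA tool works internally: it assumes the launch edge at time 0, the default setup capture at the first capture edge after it, and a setup multiplier of N pushing that edge out by (N - 1) periods of whichever clock -start or -end selects.

```python
# Simplified model of set_multicycle_path -setup with -start/-end.
# Assumptions (illustrative, not tool-accurate): launch edge at t = 0,
# default capture at the first capture edge strictly after launch.

def setup_capture_edge(t_launch, t_capture, n, use_start):
    """Return the setup-check capture time for a setup multiplier n.

    t_launch  -- launch clock period
    t_capture -- capture clock period
    n         -- multiplier given to set_multicycle_path -setup
    use_start -- True for -start (relax in launch periods),
                 False for -end (relax in capture periods)
    """
    default_edge = t_capture          # first capture edge after launch at 0
    relax_period = t_launch if use_start else t_capture
    return default_edge + (n - 1) * relax_period

# Launch clock 10 ns, capture clock 5 ns (capture period half of launch).
print(setup_capture_edge(10, 5, 2, use_start=True))   # -> 15 (5 + one launch period)
print(setup_capture_edge(10, 5, 2, use_start=False))  # -> 10 (5 + one capture period)
```

With equal clock periods the two options coincide, which is why the distinction only matters in the different-period case discussed above.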
Multicycle paths: The architectural perspective

Definition of multicycle paths: By definition, a multi-cycle path is one in which data
launched from one flop is allowed (through architecture definition) to take more than one
clock cycle to reach the destination flop. This is architecturally ensured by gating
either the data or the clock from reaching the destination flop. There can be many such
scenarios inside a System on Chip where we can apply multi-cycle paths, as discussed
later. In this post, we discuss the architectural aspects of multicycle paths. For timing
aspects such as application and analysis, please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists
of many components working in tandem, each running at a different frequency depending
upon performance and other requirements. Ideally, the designer wants the maximum
possible throughput from each component while respecting power, timing and area
constraints. The designer may think of introducing multi-cycle paths in the design in
one of the following scenarios:

1) Very large data-path limiting the frequency of the entire component: Let us take a
hypothetical case in which one of the components is to be designed to work at 500
MHz; however, one of its data-paths is too large to work at this frequency. Let us say
the minimum delay this data-path can achieve is 3 ns. Thus, if we assume all the
paths to be single cycle, the component cannot work at more than 333 MHz; however, if
we ignore this path, the rest of the design can attain 500 MHz without much difficulty.
We can therefore sacrifice this one path so that the rest of the component works at 500
MHz: make that particular path a multi-cycle path of 2, so that it effectively works at
250 MHz, sacrificing performance for that one path only.

2) Paths starting from a slow clock and ending at a fast clock: For simplicity, let us
suppose there is a data-path with one start-point and one end-point, where the start-
point receives a clock half the frequency of the end point's. Now, the start-point can
only send data at half the rate at which the end point can receive it. Therefore, there
is no gain in running the end-point at double the clock frequency for this path. Also,
since the data is launched only once every two cycles, we can modify the architecture
such that the data is received after a gap of one cycle. In other words, instead of a
single-cycle data-path, we can afford a two-cycle data-path in such a case. This
actually saves power, as the data now has two cycles to traverse to the endpoint, so
cells of lower drive strength, with less area and power, can be used. Also, if the
multi-cycle has been implemented through a clock enable (discussed later), clock power
will also be saved.

Implementation of multi-cycle paths in architecture: Let us discuss some of the
ways of introducing multi-cycle paths in the design:

1) Through gating in the data-path: Refer to figure 1 below, wherein the ‘Enable’ signal
gates the data-path towards the capturing flip-flop. Now, by controlling the waveform at
the enable signal, we can make the path multi-cycle. As is shown in the waveform, if the
enable signal toggles once every three cycles, the data at the end-point toggles after
three cycles. Hence, the data launched at edge ‘1’ can arrive at the capturing flop only
at edge ‘4’. Thus, we have a multi-cycle path of 3, giving a total of 3 cycles for the
data to traverse to the capture flop. The setup check, in this case, is a 3-cycle check
and the hold check is a zero-cycle check.

Figure 1: Introducing multi cycle path in design by gating the data-path
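The data-path gating above can be mimicked with a small cycle-by-cycle sketch (signal names and values are illustrative): a capture flop that updates only when an enable firing once every three cycles is high.

```python
# Sketch of data-path gating: the capture flop keeps its old value unless
# Enable is high, so launched data is observed only once every 3 cycles.

def run(data_per_cycle, period=3):
    q = 0
    observed = []
    for cycle, d in enumerate(data_per_cycle):
        enable = (cycle % period == period - 1)  # high once every `period` cycles
        if enable:
            q = d            # capture the launched data
        observed.append(q)   # value seen at the flop output each cycle
    return observed

print(run([1, 2, 3, 4, 5, 6]))  # -> [0, 0, 3, 3, 3, 6]
```

The output value changes only every third cycle, which is exactly what permits the 3-cycle setup check described above.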

Now let us extend this discussion to the case wherein the launch clock is half the
frequency of the capture clock. Let us say Enable changes once every two cycles.
Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster
clock (the capture clock here). As is evident from the figure below, it is important
that the Enable signal take the proper waveform, as shown on the right-hand side of
figure 2. In this case, the setup check will be two cycles of the capture clock and the
hold check will be a zero-cycle check.
Figure 2: Introducing multi-cycle path where launch clock is half in frequency to capture clock

2) Through gating in the clock path: Similarly, we can make the capturing flop capture
data only once every few cycles by clipping the clock; in other words, by sending to the
capturing flip-flop only those clock pulses at which we want the data to be captured.
This can be done similarly to the data-path masking discussed in point 1, with the only
difference being that the enable now masks the clock signal going to the capturing flop.
This kind of gating is more advantageous in terms of power: since the capturing flip-flop
does not receive the clock signal in the idle cycles, clock power is saved as well.

Figure 3: Introducing multi cycle paths through gating the clock path

Figure 3 above shows how multicycle paths can be achieved with the help of clock
gating. The enable signal, in this case, launches from a negative edge-triggered
register for architectural reasons. With the enable waveform as shown in figure 3, the
flop gets a clock pulse once every four cycles. Thus, we have a multicycle path of 4
cycles from launch to capture. The setup and hold checks for this case are also shown
in figure 3: the setup check is a 4-cycle check, whereas the hold check is a zero-cycle
check.

Pipelining v/s introducing multi-cycle paths: Making a long data-path reach its
destination in two cycles can alternatively be implemented by pipelining the logic. In
most cases this is a much simpler approach than making the path multi-cycle. Pipelining
means splitting the data-path into two halves and putting a flop between them,
essentially making the data-path two cycles long. This approach also eases timing, at
the cost of one extra cycle of latency in the data-path; looking at the component level,
however, it lets us run the whole component at a higher frequency. But in some
situations it is not economical to insert pipeline flops, as there may be no suitable
points available. In such a scenario, we have to go with the approach of making the
path multi-cycle.

Propagation Delay
What is propagation delay: The propagation delay of a logic gate is defined as the time
it takes for the effect of a change at the input to become visible at the output. In
other words, propagation delay is the time required for the input change to propagate to
the output. Normally, it is measured from the time when the transitioning input reaches
50% of its final value to the time when the output reaches 50% of its final value in
response. Here, 50% is defined as the logic threshold at which a signal is assumed to
switch state.

Figure 1: 2-input AND gate

Propagation delay example: Let us consider a 2-input AND gate as shown in figure 1,
with input ‘I2’ making a transition from logic ‘0’ to logic ‘1’ and ‘I1’ stable at logic
value ‘1’. In effect, this will cause the output ‘O’ to make a transition too. The
output will not show the effect immediately, but after a certain time interval. The
timing diagram for the transitions is also shown. The propagation delay, in this case,
is the time interval between ‘I2’ reaching its 50% mark while rising and ‘O’ reaching
its 50% mark while rising as a result of the ‘I2’ transition. The propagation delay is
labeled as “TP” in figure 2.
Figure 2: Propagation delay

On what factors propagation delay depends: The propagation delay of a logic gate is
not a constant value, but depends upon two factors:

1. Transition time of the input causing the transition at the output: The larger
the transition time at the input, the larger the propagation delay of the cell. For
smaller propagation delays, the signals should switch faster.
2. The output load seen by the logic gate: The greater the capacitive load
sitting at the output of the cell, the more effort (time) is taken to charge it.
Hence, the greater the propagation delay.
How the propagation delay of logic gates is calculated: In physical design tools,
there can be the following sources for calculating propagation delay:

 Liberty file: The liberty (.lib) file contains a lookup table for each input-to-output
path (also called a cell arc) of a logic gate. The table contains cell delay values for
different input transition times and output loads. Depending upon the input transition
and output load present in the design for the logic gate under consideration, physical
design tools interpolate between these values to calculate the cell delay.
 SDF file: SDF (Standard Delay Format) is the extracted delay information of a
design. The delay information, once calculated, can be dumped into an SDF file and
later read back. When an SDF file is read, delays are not re-calculated; the SDF delays
take precedence.
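The liberty table lookup described above amounts to bilinear interpolation between the four table points surrounding the actual (input transition, output load) pair. A minimal sketch, with invented table values:

```python
# Sketch of liberty-style delay lookup: bilinear interpolation of cell delay
# over input transition (rows) and output load (columns). Values are invented,
# and the query point is assumed to lie within the table range.

def interpolate_delay(trans_axis, load_axis, table, trans, load):
    # find the lower bounding index on each axis
    i = max(k for k, t in enumerate(trans_axis[:-1]) if t <= trans)
    j = max(k for k, c in enumerate(load_axis[:-1]) if c <= load)
    tx = (trans - trans_axis[i]) / (trans_axis[i + 1] - trans_axis[i])
    ty = (load - load_axis[j]) / (load_axis[j + 1] - load_axis[j])
    # blend the four surrounding table entries
    return (table[i][j]         * (1 - tx) * (1 - ty) +
            table[i + 1][j]     * tx       * (1 - ty) +
            table[i][j + 1]     * (1 - tx) * ty +
            table[i + 1][j + 1] * tx       * ty)

trans_axis = [0.1, 0.5]          # input transition (ns)
load_axis  = [0.01, 0.05]        # output load (pF)
table = [[0.20, 0.40],           # delay (ns) at each (transition, load) point
         [0.35, 0.60]]

print(interpolate_delay(trans_axis, load_axis, table, 0.3, 0.03))  # -> 0.3875
```

Real tables are larger (e.g. 7x7 points), but the interpolation idea is the same.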
Output transition time: The output transition time is governed by the same two
factors as propagation delay. In other words, a larger input transition time and a
larger output load both increase the transition time of the signal at the output of the
logic gate. So, for better output transitions, both of these should be small.

Negative gate delay - is it possible?


As discussed in our post ‘propagation delay’, the time difference from the input
reaching 50% of its final value to the output doing the same is termed propagation
delay. It seems a bit absurd to have a negative value of propagation delay, as it
suggests the effect happening before the cause: common sense says the output should
change only after the input. However, under certain special conditions, a negative
delay is possible. Most such cases involve one or more of the following:
i) A high drive strength transistor
ii) A slow transition at the input
iii) A small load at the output

Under the above-mentioned conditions, the output is expected to transition faster
than the input signal, which can result in a negative propagation delay. An example
negative delay scenario is shown in the figure below. The output signal starts to change
only after the input signal; however, the faster transition of the output causes it to
attain its 50% level before the input does, resulting in a negative propagation delay.
In other words, negative delay is a relative concept.
Figure 1: Input and output transitions showing negative propagation delay
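The 50%-crossing argument can be checked numerically. The sketch below models both signals as linear ramps (start time plus transition time; purely illustrative numbers) and computes the propagation delay as the difference of their 50% crossing times:

```python
# Propagation delay measured between 50% crossings of two linear ramps.
# A fast output ramp starting slightly after a slow input ramp can still
# cross 50% earlier, giving a negative delay.

def cross_50(start, transition):
    """Time at which a linear 0->1 ramp crosses its 50% level."""
    return start + 0.5 * transition

def prop_delay(in_start, in_trans, out_start, out_trans):
    return cross_50(out_start, out_trans) - cross_50(in_start, in_trans)

# Input: slow 2 ns ramp starting at t=0; output: fast 0.2 ns ramp from t=0.4 ns.
# The output starts to move after the input (cause before effect), yet its
# 50% crossing comes 0.5 ns earlier.
print(prop_delay(0.0, 2.0, 0.4, 0.2))  # -> -0.5 (ns): negative delay
```

Swapping in a slow output ramp (large load or weak drive) makes the delay positive again, consistent with the three conditions listed above.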

Worst Slew Propagation


Worst slew propagation is a phenomenon in static timing analysis whereby the worst of
the slews at the input pins of a gate is propagated to its output. As we know, the
output slew of a logic cell is a function of its input slew and output load. For a
multi-input logic gate, the output slew should, in principle, be different for timing
paths through its different input pins. However, this is not the case, because to
maintain a single timing graph, each node in the design can have only one slew. So, to
cover the worst scenario for setup timing, the maximum slew at the output pin is taken
to be the one caused by the input pin with the worst slew: the output slew is calculated
on the basis of the worst input slew even if the timing path being analyzed is not
through that input pin. Similarly, for hold timing analysis, the best of the slews
across all the input pins is propagated; we can refer to this as best slew propagation.
Let us illustrate with the help of a 2-input AND gate. As shown in figure below, let the
slews at the input pins be denoted as SLEW_A and SLEW_B and that at the output pin
as SLEW_OUT. Now, as we know:

SLEW_OUT = func (SLEW_A) if A toggles, leading to OUT toggling
SLEW_OUT = func (SLEW_B) if B toggles, leading to OUT toggling

However, even if the timing path under analysis is through pin A, the resultant slew
at the output, SLEW_OUT, will be calculated as:

SLEW_OUT = func (SLEW_A) if func(SLEW_A) > func(SLEW_B)
         = func (SLEW_B) if func(SLEW_B) > func(SLEW_A)

Figure 1: Figure showing worst slew propagation

One may see this as over-pessimism inserted by the timing analysis tool. Path-based
timing analysis does not exhibit the worst slew propagation phenomenon, as it calculates
the output slew for each timing path rather than keeping one slew per node.

Similarly, as mentioned before, for hold timing analysis the best of the slews at the
inputs is propagated to the output.
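The difference between graph-based (one slew per node) and path-based slew handling can be sketched as below; func here is an invented monotonic slew transfer function, not any real characterization model:

```python
# Sketch: graph-based analysis keeps one slew per node (the worst for setup),
# while path-based analysis uses the slew of the actual path being timed.

def out_slew(in_slew, load=0.02):
    # invented monotonic slew transfer function of a gate
    return 0.8 * in_slew + 10.0 * load

SLEW_A, SLEW_B = 0.10, 0.30   # slews (ns) at the two input pins

graph_based = max(out_slew(SLEW_A), out_slew(SLEW_B))  # one slew per node
path_through_A = out_slew(SLEW_A)                      # path-based, via pin A

print(graph_based)      # -> 0.44: worst-case slew, used even for the path via A
print(path_through_A)   # -> 0.28: the pessimism removed by path-based analysis
```

For hold analysis the `max` would become a `min`, mirroring the best slew propagation mentioned above.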

On-chip variations – the STA takeaway
Static timing analysis of a design is performed to estimate the frequency at which it
will work once it has been fabricated. Nominal delays of the logic gates, as per
characterization, are calculated, and some pessimism is applied on top to check whether
there will be any setup and/or hold violation at the target frequency. However, not all
manufactured transistors are alike; nor do all transistors receive the same voltage or
sit at the same temperature. The characterized delay is just the delay with the maximum
probability. The delay variation of a typical sample of transistors on silicon follows
the curve shown in figure 1: most of the transistors have nominal characteristics.
Typically, timing signoff is carried out with some margin; by doing this, the designer
tries to ensure that a larger number of transistors is covered. There is a direct
relationship between margin and yield: the greater the margin taken, the larger the
yield. However, beyond a certain point there is not much increase in yield from
increasing margins, and the margin then costs the designer more than it saves through
increased yield. Therefore, margins should be chosen so as to give maximum profit.

Number of transistors v/s delay for a typical sample of silicon transistors

We have discussed above how variations in the characteristics of transistors are taken
care of in STA. These variations in transistors’ characteristics as fabricated on
silicon are known as OCV (On-Chip Variations). The reason for OCV, as discussed above,
is that transistors on-chip are not all alike in geometry, in their surroundings, or in
their position with respect to the power supply. The variations are mainly caused by
three factors:
 Process variations: The process of fabrication includes diffusion, drawing out of
metal wires, gate drawing etc. The diffusion density is not uniform throughout the
wafer. Also, the width of a metal wire is not constant; let us say the width is
1 um +/- 20 nm. The metal delays are then bound to lie within a range rather than at a
single value. Similarly, the diffusion regions of different transistors will not have
exactly the same diffusion concentrations, so all transistors are expected to have
somewhat different characteristics.
 Voltage variation: Power is distributed to all transistors on the chip through a
power grid. The power grid has its own resistance and capacitance, so there is a
voltage drop along it. Transistors situated close to the power source (or those having
less resistive paths from it) receive a larger voltage than other transistors. That is
why delay variation is seen across transistors.
 Temperature variation: Similarly, the transistors across a chip cannot all be at the
same temperature, so there are variations in characteristics due to temperature
variation across the chip.

How to take care of OCV: To tackle OCV, STA for the design is closed with some
margins. Various margining methodologies are available. One of these is applying a flat
margin over the whole design; however, this is overly pessimistic, since some cells are
more prone to variations than others. Another approach is applying cell-based margins
derived from silicon data about which cells are more prone to variations. There also
exist methodologies based on other approaches, e.g. location-based margins and
statistically calculated margins. As STA advances, more accurate and faster
methodologies keep coming into existence.

Temperature inversion – concept and phenomenon

To understand the phenomenon of temperature inversion, let us first understand the
concepts governing the conductivity of semiconductor devices with respect to changes
in temperature.

Phenomena governing semiconductor conductivity vs. temperature: In all, there are two
phenomena that govern the conductivity of any device:
 Carrier concentration: Electrons and holes are the charge carriers in a
semiconductor; the larger the number of carriers, the greater the conductivity of the
material. A rise in temperature causes a greater number of bonds to break, due to a
higher number of collisions among vibrating molecules, resulting in more carriers at
higher temperature. This factor therefore tends to increase conductivity with
increasing temperature.
 Mobility of the carriers: Mobility is another measure of conductivity. The greater
the mobility of the carriers, the greater the speed with which they move and the more
they contribute to the overall current; hence, the greater the conductivity of the
material. With an increase in temperature, lattice vibrations increase, reducing the
mobility of free carriers. So, this factor tends to decrease conductivity as
temperature increases.
Summing up, the trend of conductivity with temperature depends upon which of the
above two factors dominates. Based upon conductivity, materials can be divided into
three types - conductors, insulators and semiconductors. Let us explore how the
conductivity of each is determined by the two factors above:

Conductivity of conductors (metals): Metals have an abundance of loosely attached,
nearly free electrons (commonly called the electron sea), the carriers of electric
current. The increase in carrier concentration with temperature is negligible, so the
mobility factor dominates: the conductivity of conductors decreases with increasing
temperature.

Conductivity of insulators (non-metals): Insulators have almost negligible free
carriers; their electrons are tightly bound to atoms by bonds, and conductivity is
negligible due to the limited number of carriers. However, the number of free carriers
increases exponentially with temperature. This increase in carrier concentration
outpaces the decrease in mobility, making insulators gain conductivity with rising
temperature. So, the conductivity of insulators increases with temperature.

Conductivity trend in semiconductors: Semiconductors have conductivity in-between that
of metals and insulators. They can be viewed as insulating materials in which electrons
are loosely bound to atoms: only a small energy is needed to break these bonds and
supply free carriers. That energy can come from a potential difference applied across
the semiconductor, or from temperature itself, in the form of thermal energy. So,
either of the two factors can dominate, depending upon the voltage applied across the
semiconductor, and whether conductivity increases or decreases depends upon which
factor dominates. For CMOS transistors, the carrier-concentration effect shows up as a
threshold voltage that decreases with rising temperature.

At high applied voltage levels, there is an abundance of free charge carriers as a
result of the energy supplied by the potential difference. In this state there is no
significant change in carrier concentration with an increase in temperature, so the
mobility factor dominates, decreasing the conductivity with temperature. In other
words, at high applied voltages, the conductivity of semiconductors decreases with
temperature.

Similarly, in the absence of any applied voltage, or with little voltage applied, the
semiconductor behaves like an insulator, with very few carriers - only those resulting
from thermal energy. In that case, the increase in carrier concentration is the
dominating factor. So, we can say that at low applied voltages, the conductivity of
semiconductors increases with temperature.

The concept of temperature inversion: With reference to the discussion above, at older
technology nodes the voltage levels used to be high, so the delay of CMOS logic
circuits traditionally increased with temperature, and the most timing-critical corner
used to be worst process, minimum voltage and maximum temperature. However, with the
scaling down of technology, voltage levels have also scaled down. Due to this, at
sub-nanometer technology nodes both factors come into play, and at the lower range of
operating voltages the carrier-concentration factor dominates. In other words, at lower
technology nodes the most setup-timing-critical corner has become worst process,
minimum voltage and minimum temperature. This shift in the setup-critical corner, in
VLSI jargon, is termed temperature inversion.

Can a net have negative propagation delay?


As we discussed in “Negative gate delay - is it possible”, a logic cell can have
negative propagation delay. The only condition we mentioned was that the transition at
the output pin should improve so drastically that the 50% level at the output is
reached before the 50% level of the input waveform.

In other words, the only condition for negative delay is an improvement in slew. As we
know, a net has only passive parasitics, in the form of parasitic resistances and
capacitances. Passive elements can only degrade a transition, as they cannot provide
energy (assuming no crosstalk); they can only dissipate it. In other words, it is not
possible for a net by itself to have negative propagation delay.

However, a net can have negative delay in the presence of crosstalk, as crosstalk can
improve the transition on a net. In that case, the 50% level at the output can be
reached before the 50% level at the input; hence, a negative propagation delay for the
net.

Timing arcs
What is a timing arc: A timing arc defines the propagation of signals through logic
gates and nets, and defines a timing relationship between two related pins. A timing
arc is one of the components of a timing path. Static timing analysis works on the
concept of timing paths: each path starts from either a primary input or a register and
ends at a primary output or a register, traversing in-between through what are known as
timing arcs. We can define a timing arc as an indivisible path/constraint from one pin
to another that tells the EDA tool to consider the path/relationship between the pins.
For instance, AND, NAND and NOT gates and full adder cells have arcs from each input
pin to each output pin. Also, sequential cells such as flops and latches have arcs from
the clock pin to the output and data pins. Net connections can also be identified as
timing arcs, as discussed later.

Figure 1 : Figure showing cell and net arcs


Terminology: The common terminology related to timing arcs is as follows:
 Source pin: The pin from which a timing arc originates (pins IN1 and IN2 for the
cell arcs, pin OUT for the net arc in figure 1). For setup/hold timing checks, this is
the constraining pin (for example, the clock pin is the source pin for a setup check).
 Sink pin: The pin at which a timing arc ends (pin OUT for the cell arcs, pin
AND2/IN2 for the net arc in figure 1). For setup/hold timing checks, this is the
constrained pin (for example, the data pin is the sink pin for a setup check).

Cell arcs and net arcs: Timing arcs can be put into two categories based upon the type
of element they are associated with - cell arcs and net arcs.
 Cell arcs: These are between an input pin and an output pin of a cell. In other
words, the source pin is an input pin of a cell and the sink pin is a pin of the same
cell (an output pin in the case of delay arcs, and an input pin in the case of timing
check arcs). In the figure shown above, arcs (IN1 -> OUT) and (IN2 -> OUT) are cell
arcs. Cell arcs are further divided into sequential and combinational arcs, as
discussed below.
 Net arcs: These arcs are between the driver pin of a net and a load pin of the net.
In other words, the source pin is an output pin of one cell and the sink pin is an
input pin of another cell. In the figure shown above, arc (OUT -> IN2) is a net arc.
Net arcs are always delay arcs.
Sequential and combinational arcs: As discussed above, cell arcs can be sequential or
combinational. Sequential arcs are between the clock pin of a sequential cell and
either an input or an output pin. Setup and hold arcs are between the input data pin
and the clock pin; they are termed timing check arcs, as they constrain the timing
relationship between a set of signals. A sequential delay arc is between the clock pin
and an output pin of a sequential element; an example is the clk-to-q delay arc of a
flip-flop. On the other hand, combinational arcs are between an input data pin and an
output data pin of a combinational cell or block.

Information contained in a timing arc: A timing arc provides the following information:
1. A delay arc tells whether the path from pin1 to pin2 can be traversed; if it
can, we say that an arc exists between pin1 and pin2. A timing check arc, on the
other hand, tells the relationship that is allowed between a set of signals.
2. The condition under which the path will be traversed, known as the ‘sdf
condition’
3. The maximum and minimum times the arc can take from its source pin to its
destination pin
4. The timing sense of the arc, as explained below
Timing sense of an arc: The timing sense of an arc is the sense of traversal from the
source pin of the timing arc to its sink pin. Timing sense is also called the
"unateness" of a timing arc. It can be ‘positive unate’, ‘negative unate’ or
‘non-unate’.
 Positive unate timing arc: An arc is positive unate if a rise transition at the
source pin causes a rise transition (if any) at the sink pin, and vice-versa. Cells
such as AND and OR gates have positive unate arcs. All net arcs are positive unate
arcs.
 Negative unate timing arc: An arc is negative unate if a rise transition at the
source pin causes a fall transition at the sink pin, and vice-versa. NAND, NOR and
inverter cells have negative unate arcs.
 Non-unate timing arcs: If there is no such fixed relationship between the source and
sink pins of a timing arc, the arc is said to be non-unate. XOR and XNOR gates have
non-unate timing arcs.
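The unateness of an input-to-output arc can be derived from the cell's logic function by toggling that input across all combinations of the other inputs. A small sketch (gate functions written inline for illustration):

```python
# Classify the unateness of the arc from one input pin to the output of a
# gate by toggling that input across all combinations of the other inputs.
from itertools import product

def unateness(func, n_inputs, pin):
    rises = falls = False
    for others in product([0, 1], repeat=n_inputs - 1):
        bits = list(others)
        lo = func(*bits[:pin], 0, *bits[pin:])  # pin driven to 0
        hi = func(*bits[:pin], 1, *bits[pin:])  # pin driven to 1
        if hi > lo: rises = True    # rise at input -> rise at output
        if hi < lo: falls = True    # rise at input -> fall at output
    if rises and falls: return "non-unate"
    if rises:           return "positive unate"
    if falls:           return "negative unate"
    return "no arc"     # output never responds to this pin

print(unateness(lambda a, b: a & b, 2, 0))        # AND  -> positive unate
print(unateness(lambda a, b: 1 - (a & b), 2, 0))  # NAND -> negative unate
print(unateness(lambda a, b: a ^ b, 2, 0))        # XOR  -> non-unate
```

This mirrors how unateness follows from the `function` attribute of a cell in the liberty file, though real liberty files state the `timing_sense` explicitly per arc.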
From what source timing arcs are picked: For cell arcs, the existence of a timing arc
is picked from the liberty file. The cell has a function defined that identifies
whether an arc exists from an input (say ‘x’) to an output (say ‘y’). In most cases,
the attributes of the arc (delay, unateness, sdf condition etc.) are also picked from
liberty; but if an SDF file has been read, the delay is picked from the SDF (Standard
Delay Format) file, with the other properties still picked from liberty. On the other
hand, for net arcs, the existence of the arc is picked from the connectivity
information (the netlist). Net arc delays are calculated from the parasitic values
given in the SPEF (Standard Parasitic Exchange Format) file, or taken from SDF as
above.

Importance of timing arcs: Timing arcs have a very important role in the VLSI design
industry. The whole optimization process, right from the gate-level netlist to final
signoff, revolves around timing arcs. The presence of correct timing arcs in the
liberty file is essential for a high-quality signoff; otherwise, there may be no
correlation between simulation and silicon.

Time borrowing in latches


What is time borrowing: Latches are transparent while the clock is asserted at the
required level. In sequential designs, using latches can enhance the performance of the
design, thanks to the time borrowing property of latches, which we can define as
follows:
Time borrowing is the property of a latch by virtue of which a path ending at a latch
can borrow time from the next path in the pipeline, such that the overall time of the
two paths remains the same. The time borrowed by the latch from the next stage in the
pipeline is then subtracted from that next path's available time.
The time borrowing property of latches arises from the fact that latches are level
sensitive; hence, they can capture data over a range of times rather than at a single
instant - the entire duration over which they are transparent. When they capture data
while transparent, the same point in time launches the data for the next stage (of
course, with the combinational delay from the data pin of the latch to its output pin).

Let us consider an example wherein a negative latch is placed between two positive
edge-triggered registers for simplicity and ease of understanding. The schematic
diagram for the same is shown in figure 1 below:

Figure 1: Negative level-sensitive latch between two positive edge-triggered registers

Figure 2 below shows the clock waveform for all three elements involved, with the clock
edges labeled for convenience. As is shown, LatB is transparent during the low phase of
the clock. RegA and RegC (positive edge-triggered registers) can capture/launch data
only at a positive edge of the clock; i.e., at Edge1, Edge3 or Edge5. LatB, on the
other hand, can capture and launch data at any instant of time between Edge2 and Edge3,
or between Edge4 and Edge5.
Figure 2: Clock waveforms

The time instant at which data is launched from LatB depends upon the time at which the
data launched from RegA becomes stable at the input of LatB. If the data launched at
Edge1 from RegA gets stable before Edge2, it is captured at Edge2 itself. If the data
is not stable by Edge2, it still gets captured: as soon as the data becomes stable, it
is captured. The latest instant at which this can happen is the latch closing edge
(Edge3 here). One point to be noted is that at whatever point the data launches from
LatB, it has to be captured at RegC at Edge3; the extra time the latch takes to capture
the data is subtracted from the next path's budget. The worst case setup check at LatB
is at Edge2; however, the latch can borrow time as needed. The maximum time borrowed,
ideally, can be up to Edge3. Figure 3 below shows the setup and hold checks with and
without time borrowing for this case:
Figure 3: Setup check with and without time borrow
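The borrowing arithmetic can be sketched as follows (numbers invented for illustration): with a 10 ns clock, the latch opening edge (Edge2) falls at 5 ns and the closing edge (Edge3) at 10 ns; any data arrival beyond the opening edge is borrowed from the next stage, up to the closing edge.

```python
# Sketch of latch time borrowing: data arriving after the latch opening edge
# borrows the excess from the next stage, up to the latch closing edge.

def borrow(data_arrival, open_edge, close_edge):
    """Return (borrowed_time, violation) for a transparent latch."""
    if data_arrival <= open_edge:
        return 0.0, False                       # no borrowing needed
    if data_arrival <= close_edge:
        return data_arrival - open_edge, False  # borrowed from the next stage
    return close_edge - open_edge, True         # beyond max borrow: violation

# Clock period 10 ns; latch transparent from 5 ns (Edge2) to 10 ns (Edge3).
print(borrow(4.0, 5.0, 10.0))   # -> (0.0, False)
print(borrow(7.5, 5.0, 10.0))   # -> (2.5, False)  2.5 ns borrowed
print(borrow(11.0, 5.0, 10.0))  # -> (5.0, True)   setup violation
```

Whatever is borrowed here is the amount subtracted from the LatB-to-RegC path's available time, as described above.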

The above example consisted of a negative level-sensitive latch. Similarly, a positive
level-sensitive latch will also borrow time from the next stage; just the polarities
will be different.

Interesting problem – Latches in series


Problem: 100 latches (either all positive or all negative) are placed in series
(figure 1). How many cycles of latency will this introduce?

Figure 1 : 100 negative level-sensitive latches in series

As we know, setup check between latches of same polarity (both positive or negative) is
zero cycle with half cycle of time borrow allowed as shown in figure 2 below for negative
level-sensitive latches:
Figure 2: Setup check between two negative level-sensitive latches

So, if there are a number of same polarity latches, all will form zero cycle setup check
with the next latch; resulting in overall zero cycle phase shift.

As is shown in figure 3, all the latches in series are borrowing time, but without any
actual phase shift taking place. If we have a design with all latches, there cannot be a
next-state calculation if all the latches are either positive level-sensitive or negative
level-sensitive. In other words, for a state-machine implementation, there should not be
latches of the same polarity in series.

Figure 3 : Timing for 100 latches in series
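The zero-cycle conclusion can be checked with a small sketch (illustrative, assuming ideal clocks and near-zero latch delays): every same-polarity latch in the chain captures data within the same transparency window, so the arrival time never crosses a clock edge and no cycle of latency accumulates.

```python
# Hedged sketch: 100 negative level-sensitive latches sharing one clock
# (10 ns period, transparent during the low phase 5-10 ns). Latch delays
# are assumed negligible for illustration.

def capture(arrival, open_edge, close_edge):
    """Capture instant at a transparent latch: the open edge if data is
    early, the arrival itself while transparent (time borrowing)."""
    if arrival <= open_edge:
        return open_edge
    assert arrival <= close_edge, "latch closed: setup violation"
    return arrival

arrival = 6.0                     # data arrives while the latches are transparent
for _ in range(100):
    arrival = capture(arrival, open_edge=5.0, close_edge=10.0)

print(arrival)  # 6.0 -> data ripples through all 100 latches in the same half-cycle
```

Since the arrival never advances past the common closing edge, the chain introduces zero cycles of latency, matching the discussion above.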


Virtual clock - purpose and timing
What is a virtual clock: By definition, a virtual clock is a clock without any source.
Stated more clearly, a virtual clock is a clock that has been defined, but has not been
associated with any pin/port. A virtual clock is used as a reference to constrain the
interface pins by relating the arrivals at input/output ports to it with the help of input
and output delays.

How to define a virtual clock: The simplest SDC command syntax to define a virtual
clock is as follows:

create_clock -name VCLK -period 10

The above SDC command will define a virtual clock "VCLK" with a period of 10 ns.

Purpose of defining a virtual clock: The advantage of defining a virtual clock is that
we can specify the desired latency for it. As mentioned above, a virtual clock is used to
time interface paths. Figure 1 shows a scenario where it helps to define a virtual clock.
Reg-A is a flop inside the block that is sending data outside the block through PORT.
Since it is a synchronous signal, we can assume it to be captured by a flop (Reg-B)
sitting outside the block. Now, within the block, the path to PORT can be timed by
specifying an output delay for this port with respect to a clock synchronous to clock_in.
We could specify the delay with respect to clock_in itself, but then there lies the
difficulty of specifying the clock latency: if we specify a latency for clock_in, it will be
applied to Reg-A also. Applying the output delay with respect to a real clock thus causes
input ports to get relaxed and output ports to get tightened once the clock tree has been
built.
Figure 1: Figure to illustrate virtual clock

The solution to the problem is to define a virtual clock and apply the output delay with
respect to it. Making the source latency of the virtual clock equal to the network latency
of the real clock will solve the problem.

Can you think of any other method that can serve the purpose of a virtual clock?
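As a numeric sketch of why this works (all delay values below are illustrative assumptions, not from the post): when the virtual clock's source latency is set equal to the real clock's network latency, the latency terms cancel and the timing budget inside the block is independent of the clock tree.

```python
# Hedged sketch: setup budget for an output port constrained against a
# virtual clock. For a setup check at the port, data must arrive no later
# than: capture_edge + capture_clock_latency - output_delay.

period = 10.0
real_network_latency = 1.2   # latency Reg-A's clock acquires after CTS (assumed)
vclk_source_latency = 1.2    # set equal, so the virtual capture clock tracks it
output_delay = 3.0           # external delay + Reg-B setup, w.r.t. VCLK (assumed)

required = period + vclk_source_latency - output_delay  # latest arrival at PORT
launch_clock_arrival = real_network_latency             # Reg-A's launch edge
budget_inside_block = required - launch_clock_arrival   # Tck->q + path to PORT

print(round(budget_inside_block, 9))  # 7.0 -> the latencies cancel, leaving period - output_delay
```

Whatever network latency CTS finally delivers, keeping the virtual clock's source latency matched to it leaves the internal budget at period minus output delay, which is the intent of the constraint.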

Minimum pulse width


All sequential elements need a minimum clock pulse (either high or low) to ensure that the data
is captured correctly. In other words, the clock pulse fed to a flop or latch (or any other
sequential element) must be wide enough that it does not interfere with the correct functionality
of the element, meaning the internal operations of the cell.

Minimum pulse width requirement: To understand minimum pulse width requirement, let us
first define pulse width. Formally, pulse width can be defined as:
"In terms of the high signal level (high pulse width), it is the time interval between
the clock signal crossing half the VDD level on the rising edge and crossing half the
VDD level on the subsequent falling edge. In terms of the low signal level (low pulse
width), it is the time interval between the clock signal crossing half the VDD level on
the falling edge and crossing half the VDD level on the subsequent rising edge."
If the clock fed to a sequential element has a pulse width less than the minimum required,
one of the following outcomes is possible:
 The flop captures the correct data and the FSM functions correctly
 The flop completely misses the clock pulse and does not capture any new data; the
FSM may then land in an invalid state
 The flop may go metastable
All these scenarios are possible; so, it is required to ensure that every sequential
element always gets a clock pulse wider than the minimum pulse width required. To ensure this,
there are ways to communicate the minimum pulse width requirement for each and every
sequential element to the timing analysis tool. The check that ensures minimum pulse width is
known as the "minimum pulse width check". There are the following ways to constrain minimum
pulse width through the minimum pulse width check:

 Through the liberty file: By default, all the registers in a design should have a minimum
pulse width defined through the liberty file, as this is the format used to convey standard cell
requirements to the STA tool. By convention, minimum pulse width should be defined for clock
and reset pins. Minimum pulse width is constrained in the liberty file using the following syntax:
timing_type : min_pulse_width;
 Through SDC command: We can also define minimum pulse width requirement
through SDC command. The SDC command for the same is "set_min_pulse_width". For
example, following set of commands will constrain the minimum pulse width of clock clk to be
5 ns high and 4 ns low:
set_min_pulse_width -high 5 [get_clocks clk]
set_min_pulse_width -low 4 [get_clocks clk]
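The effect of these constraints can be mimicked with a toy checker (a hedged sketch; the waveform encoding and the function below are illustrative, not an STA tool's API). The requirements mirror the SDC example above: 5 ns high, 4 ns low.

```python
# Hedged sketch of a minimum pulse width check. A waveform is encoded as a
# list of (time, level-after-transition) pairs; illustrative only.

def pulse_width_violations(transitions, min_high, min_low):
    """Return a list of (start_time, level, width) for pulses narrower
    than the applicable minimum pulse width requirement."""
    violations = []
    for (t0, lvl0), (t1, _) in zip(transitions, transitions[1:]):
        width = t1 - t0                       # duration the level lvl0 is held
        limit = min_high if lvl0 == 1 else min_low
        if width < limit:
            violations.append((t0, lvl0, width))
    return violations

# 10 ns clock, but one high pulse squeezed to 3 ns (e.g. by a glitchy gater):
wave = [(0, 1), (5, 0), (10, 1), (13, 0), (20, 1)]
print(pulse_width_violations(wave, min_high=5, min_low=4))  # [(10, 1, 3)]
```

A real tool performs this check per sequential clock/reset pin, after propagating the actual clock waveform (including duty-cycle distortion) to each pin.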

Basics of latch timing


A latch is a digital logic circuit that can sample a 1-bit digital value and hold it depending
upon the state of an enable signal. Based upon the state of enable, latches are
categorized into positive level-sensitive and negative level-sensitive latches.

Positive level-sensitive latch: A positive level-sensitive latch follows the input data
signal when enable is '1' and holds its output value when enable is '0'. Figure 1
below shows the symbol and the timing waveforms for such a latch. As can be seen,
whenever enable is '1', Out follows the data input; and when enable is '0', Out remains
the same.

Figure 1(a): Positive level-sensitive latch; Figure 1(b): Timing waveform for a positive level-sensitive latch
Negative level-sensitive latch: A negative level-sensitive latch follows the input data
when enable is '0' and holds its output value when enable is '1'.

Figure 2(a): Negative level-sensitive latch; Figure 2(b): Timing waveform for a negative level-sensitive latch

Latch timing arcs: Data can propagate to the output of the latch in two ways as
discussed below:
 Out changes with Data: This happens when Enable is in its asserted state (for
example, Enable = '1' for a positive level-sensitive latch). When this happens, Out
follows Data, as there is a direct path between Data and Out when Enable is '1'. This
scenario is depicted in figures 1(b) and 2(b) above, wherein Out is shown toggling when
Data toggles. The latch is, thus, said to have a timing arc from Data to Out.
 Out changes with Enable: This happens when Data at input changes when
Enable is in its de-asserted state. When this happens, latch waits for Enable to be
asserted, then, follows the value of Data. As figure 3 shows, Data had become stable a
lot earlier, but out toggled only when enable became asserted. So, in latches, there
exists a timing arc from Enable to Out.

Figure 3: When data changes while enable is in its de-asserted state, the output waits for enable to assert. Only then
is the effect of the input propagated to the output

 Relation between Data and Enable: If Data toggles very close to the closing
edge of Enable, there might be ambiguity as to whether its effect will be propagated to
the output or not (as discussed later in this post). To make things more deterministic, we
impose the condition that Data should not toggle while Enable is getting de-asserted.
This relationship is modelled as setup and hold arcs. So, there are setup and hold
timing arcs between the data and enable pins of a latch. These will be discussed
below in detail.
Setup time and hold time for a latch: The most commonly used latch circuit is that
built using inverters and transmission gates. Figure 4 shows the transmission gate
implementation of a positive level-sensitive latch. The Enable has been shown as CLK
as usually is the case in sequential state machines. This circuit has two phases, as is
expected for a latch:

 When CLK = '1', Transmission gate at the input gets ON and there is a direct
path between Data and Out
 When CLK = '0', transmission gate in the loopback path gets ON. Out holds its
value

Figure 4: Positive level-sensitive latch using transmission gates

Now, when CLK transitions from '1' to '0', it is important that Data does not toggle. The
time before the clock falling edge that Data should remain stable is known as latch
setup time. Similarly, the time after the clock falling edge that Data should remain stable
is called latch hold time.

Let us go into the details of what the latch setup and hold times should be for the
transmission gate latch. If we want the data to be propagated properly to the output,
then Data should be stable for at least some time before the closing of the input
transmission gate. This time is such that the value goes into the memory of the latch;
i.e., before the input transmission gate closes, Data should traverse both the inverters
of the loop. So, the setup time of the latch comprises the delay of the input transmission
gate and the two inverters. Figure 5 below shows the setup time for the latch.
Figure 5: Setup time for latch

Similarly, if we do not want the data to propagate to the output, it must not cross the
input transmission gate, so that it does not disturb the present state of the latch. This
serves as the hold time for the latch. Assuming (CLK)' takes one inverter delay to
generate, the input transmission gate will close only after one inverter delay. So, the
hold time for Data is one inverter delay minus the transmission gate delay. Please refer
to figure 6 below for the illustration of this. (CLK)' is formed from CLK after a delay
equivalent to an inverter delay; only then does the input transmission gate switch off. If
we want the data not to propagate to Out, we have to ensure that it does not cross the
input transmission gate. So, Data should not be present at the transmission gate's
input before time (T(inv) - T(tg)) after the clock edge. In other words, it has to be held
stable for this much time after the CLK edge. This is the hold time for the latch.

Figure 6: Hold time for latch


Please note that there are other topologies also possible for latches such as dynamic
latches etc. The setup time and hold time calculations for such topologies will vary, but
the underlying principle will remain same, which is as follows:

 Setup time ensures that the data propagates to the output at the coming clock
edge
 Hold time ensures that the data does not propagate to the output at the
present/previous clock edge
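As a toy illustration of the transmission-gate analysis above, the two quantities can be written directly in terms of the transmission-gate delay T(tg) and inverter delay T(inv). The formulas follow the text (setup: input gate plus the two loop inverters; hold: one inverter delay minus the gate delay); the delay values are assumptions for illustration.

```python
# Hedged sketch of the transmission-gate latch timing discussed above.

def latch_setup(T_tg, T_inv):
    """Setup time: data must cross the input transmission gate and the two
    loop inverters before the gate closes."""
    return T_tg + 2 * T_inv

def latch_hold(T_tg, T_inv):
    """Hold time: the gate closes one (clock-)inverter delay after the CLK
    edge, minus the time data itself needs to cross the gate. Can go
    negative if the gate is slower than the inverter."""
    return T_inv - T_tg

print(round(latch_setup(0.3, 1.0), 9))  # 2.3 ns with assumed delays
print(round(latch_hold(0.3, 1.0), 9))   # 0.7 ns with assumed delays
```

The same two expressions also show why different circuit topologies (dynamic latches, etc.) yield different numbers: the delays through the sampling and keeper paths change, but the two principles stated above do not.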
Setup checks and hold checks for latches: As discussed above, the decision for the
data to be latched or not to be latched is made at the closing edge. So, the setup and
hold checks are with respect to latch closing edge only. However, since, latches are
transparent during half of the clock period, we can assume as if the capturing edge is
flexible and stretches all over the active level of the latch. This property enables a very
beautiful concept known as "time borrowing" for latches.

Setup time
Definition of setup time: Setup time is defined as the minimum amount of time for which
data must be stable before the arrival of the clock's active edge so that it can be latched
properly. In other words, each flip-flop (or any sequential element, in general) needs
data to be stable for some time before the arrival of the clock edge so that it can reliably
capture the data. This amount of time is known as setup time.

We can also link setup time with state transitions. We know that the data to be captured
at the current clock edge was launched at previous clock edge by some other flip-flop.
The data launched at previous clock edge must be stable at least setup time before the
current clock edge. So, adherence to setup time ensures that the data launched at
previous edge is captured at the current clock edge reliably. In other words, setup time
ensures that the design transitions to next state smoothly.

Figure 1: Setup time

Figure 1 shows that data is allowed to toggle prior to the yellow dotted line. This yellow
dotted line corresponds to setup time: the time difference between this line and the active
clock edge is termed the setup time. Data cannot toggle after this yellow dotted line for
a duration known as the setup-hold window. Occurrence of such an event is termed a
setup violation. The consequence of such a violation can be capture of wrong data
(better termed a setup check violation) or the sequential element going into a metastable
state (setup time violation).
Figure 2: A positive level-sensitive D-latch
Latch setup time: Figure 2 shows a positive level-sensitive latch. If there is a toggling
of data at the latch input close to the negative edge (while the latch is closing), there will
be an uncertainty as to whether the data will be captured reliably or not. For data to be
captured reliably, it has to be available at the input of the loop transmission gate at the
arrival of the closing clock edge. To be present at NodeD at the closing edge, it must be
there at the latch input some time prior to the clock edge. This time taken in reaching
from the latch input to NodeD is termed the setup time for this latch.

Flip-flop setup time: Figure 3 below shows a master-slave negative edge-triggered D


flip-flop using transmission gate latches. This is the most popular configuration of a flip-
flop used in today's designs. Let us get into the details of setup time for this flip-flop. For
this flip-flop to capture data reliably, the data must be present at nodeF at the arrival of
negative edge of clock. So, data must travel NodeA -> NodeB -> NodeC -> NodeD ->
NodeE -> NodeF before clock edge arrives. To reach NodeF at the closing edge of
latch1, data should be present at NodeA at some earlier time. This time taken by data
to reach NodeF is the setup time for flip-flop under consideration (assuming CLK and
CLK' are present instantaneously. If that is not the case, it will be accounted for accordingly).
We can also say that the setup time of flip-flop is, in a way, setup time of master latch.

Figure 3: D-flip flop


Positive, negative and zero setup time
As we know from the definition of setup time, setup time is a point on time axis which
restrains data from changing after it. Data can change only before occurrence of setup
timing point. Theoretically, there is no constraint on occurrence of setup time point with
respect to clock active edge. It can either be before, after or at the same time as that of
clock edge. Depending upon the relative occurrence of setup time point and clock active
edge, setup time is said to be positive, zero or negative.

Positive setup time: When setup time point is before the arrival of clock edge, setup
time is said to be positive. Figure 1 below shows positive setup time.

Figure 1: Positive setup time

Zero setup time: When setup time point is at the same instant as clock's active edge,
setup time is said to be zero. Figure 2 shows a situation wherein setup time is zero.

Figure 2: Zero setup time

Negative setup time: When setup time point occurs after clock edge, setup time is said
to be negative. Figure 3 shows timing waveform for negative setup time.
Figure 3: Negative setup time

What causes different values of setup time: We have discussed above theoretical
aspects of positive, zero and negative setup time. Let us go a bit deeper into the details.
Figure 4 shows a positive level-sensitive D-latch. As we know from the definition of
setup time, setup time depends upon the relative arrival times of data and clock at the
input transmission gate (we have to ensure data has reached up to NodeD by the time
the clock reaches the input transmission gate). Depending upon the relative arrival times
of data and clock, setup time can be positive, zero or negative.

Figure 4: Positive level-sensitive latch


Let us assume the delay of an inverter is 1 ns. Then, to ensure that the data has
reached NodeD when clock edge arrives at input transmission gate, data has to be
available at the input transmission gate at least 2 ns before. So, if both data and clock
reach the reference point at the same time, the latch has a setup time of 2 ns.

Now, if data takes 1 ns more than clock to reach input transmission gate from the
reference point, then, data has to reach reference point at least 3 ns before clock
reference point. In this case, setup time will be 3 ns.

Similarly, if data takes 1 ns less than clock to reach input transmission gate, setup time
will be 1 ns. And if data takes 2 ns less than clock to reach input transmission gate,
setup time will be zero.
Now, if there is a further difference between the delays of data and clock from their
respective reference points to the input transmission gate, the setup time will become
negative. For example, if data takes 3 ns less than clock to reach the input transmission
gate, setup time will be -1 ns.

This is how setup time depends upon relative delays of data and clock within the
sequential element. And it completely makes sense to have negative setup time.
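The arithmetic above can be condensed into one line (a sketch using the same assumed 1 ns inverter delay, so the latch's internal requirement is 2 ns): the setup time seen at the reference point is the internal requirement plus the extra delay data takes over clock to reach the input transmission gate.

```python
# Hedged sketch of positive / zero / negative setup time.

def setup_at_reference(internal_setup, data_delay, clock_delay):
    """Setup time seen at the reference point, given the delays data and
    clock each take from that point to the input transmission gate."""
    return internal_setup + (data_delay - clock_delay)

internal = 2.0   # ns: two inverter delays, as assumed in the text
print(setup_at_reference(internal, 0.0, 0.0))  # 2.0  (positive)
print(setup_at_reference(internal, 1.0, 0.0))  # 3.0  (data 1 ns slower)
print(setup_at_reference(internal, 0.0, 2.0))  # 0.0  (zero setup time)
print(setup_at_reference(internal, 0.0, 3.0))  # -1.0 (negative setup time)
```

Each printed value reproduces one of the cases walked through in the text, including the negative setup time when the clock path is 3 ns slower than the data path.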

Hold time
Definition of hold time: Hold time is defined as the minimum amount of time for which
data must be stable after the arrival of the clock's active edge so that it can be latched
properly. In other words, each flip-flop (or any sequential element, in general) needs
data to be stable for some time after the arrival of the clock edge so that it can reliably
capture the data. This amount of time is known as hold time.

We can also link hold time with state transitions. We know that the data to be captured
at the current clock edge was launched at previous clock edge by some other flip-flop.
And the data launched at the current clock edge must be captured at the next edge.
Adherence to hold time ensures that the data launched at current edge is not captured
at the current clock edge. And the data launched at previous edge is captured and not
disturbed by the one launched at current edge. In other words, hold time ensures that
the current state of the design is not disturbed.

Figure 1 : Hold time

Figure 1 shows that data is allowed to toggle after the yellow dotted line. This yellow
dotted line corresponds to hold time. The time difference between the active clock edge
and this yellow dotted line is hold time. Data cannot toggle before this yellow dotted line
for a duration known as setup-hold window. Occurrence of such an event is termed as
hold violation. The consequence of such a violation can be capture of wrong data
(better termed as hold check violation) or the sequential element going into meta-stable
state (hold time violation).
Figure 2: A positive level-sensitive D-latch
Latch hold time: Figure 2 shows a positive level-sensitive latch. If there is a toggling of
data at the latch input close to the negative edge (while the latch is closing), there will be
an uncertainty as to whether the data will be captured reliably or not. For the present
data to be captured reliably, the next data must not reach NodeC when the closing edge
of the clock arrives at the input transmission gate. For this to happen, the new data must
not have travelled NodeA -> NodeB -> NodeC before the clock edge arrives. Data must
change only after this time interval.

Flip-flop hold time: Figure 3 below shows a master-slave negative edge-triggered D


flip-flop using transmission gate latches. This is the most popular configuration of a flip-flop
used in today's designs. Let us get into the details of hold time for this flip-flop. For
this flip-flop to capture data reliably, the new data must not be present at NodeD at the
arrival of the negative edge of the clock. So, data must not travel NodeA -> NodeB ->
NodeC -> NodeD before the clock edge arrives. For data not to reach NodeD when the
clock edge arrives, it must toggle only after some interval with respect to the clock edge.
This interval corresponds to the hold time of the flip-flop. We can also say that the hold
time of the flip-flop is, in a way, the hold time of the master latch.

Figure 3: D-flip flop


Hope this helped you in understanding the basics of hold time. You can suggest any
improvement you think below in comments.

Positive, negative and zero hold time

As we know from the definition of hold time, hold time is a point on time axis which
restrains data from changing before it. Data can change only after hold time has
elapsed. Now, there is no constraint on the occurrence of hold time point with respect to
clock edge. It can either be after, before or at the same instant of time as that of clock
active edge.

Positive hold time: When the hold time point is after the arrival of the clock active edge,
hold time is said to be positive. Figure 1 below shows positive hold time.

Figure 1: Positive hold time

Zero hold time: When hold time point is at the same time instant as that of clock active
edge, we say that hold time of the sequential element is zero. Figure 2 below shows
timing waveform for zero hold time.
Figure 2: Zero hold time

Negative hold time: Similarly, when the hold time point comes earlier on the time scale
than the clock active edge, we say that the hold time of the sequential element is
negative. Figure 3 shows the timing waveform for negative hold time.

Figure 3: Negative hold time

We have discussed above theoretical aspects of positive, zero and negative hold time.
Let us go a bit deeper into the details. Figure 4 shows a positive level-sensitive D-latch.
As we know (from definition of hold time), hold time depends upon the relative arrival
times of clock and data at the input transmission gate (We have to ensure data does not
reach NodeC). Depending upon the times of arrival of clock and data, hold time can be
positive or negative.
Figure 4: Positive level-sensitive D-latch

Let us say the delay of an inverter is 1 ns. Then, we can afford the data to reach the
transmission gate input even 0.9 ns before the arrival of the clock at the transmission
gate. This would let data reach NodeC (-0.9 + 1 =) 0.1 ns after the arrival of the clock
edge, if it were allowed to. But since the clock closes the transmission gate, the data will
not reach NodeC. So, in this case, hold time is -1 ns. If the delay from NodeB to NodeC
were something else, the hold time would also have been different.

Now, if we say that clock arrives at transmission gate 1 ns earlier than data, then, by
above logic, hold time of this latch will be -2 ns.

Similarly, if clock arrives at transmission gate 0.5 ns after data, hold time will be -0.5 ns.

And if clock arrive at transmission gate 1 ns after data, hold time will be zero.

If the arrival time of the clock is made later still, the hold time will be greater than zero.
For example, if the arrival time of the clock is 2 ns after data, the hold time will be +1 ns.

Hold time of the circuit is also dependent upon the reference point. For example,
consider a multi-level black box as shown in figure 5. If we look at black box 0, its hold
time is -1 ns. At level of black box 1, wherein clock travels 2 ns and data travels 0.5 ns
to reach black box 0, hold time is (-1 + 2 - 0.5 = ) 0.5 ns. Similarly, at the level of black
box 2, hold time is 1 ns. This is how, hold time depends upon the relative arrival times of
clock and data. And it completely makes sense to have a negative hold time.
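The black-box arithmetic above can be sketched in one line (the formula is taken from the example itself: going one hierarchy level out, the hold time grows by the clock delay and shrinks by the data delay at that level; the numbers are the ones used in the text):

```python
# Hedged sketch of hold time vs. reference point (black-box hierarchy).

def hold_at_outer(inner_hold, clock_delay, data_delay):
    """Hold time one hierarchy level out: extra clock delay pushes the
    internal clock edge (and hence the hold point) later; extra data delay
    lets the boundary data change earlier."""
    return inner_hold + clock_delay - data_delay

h0 = -1.0                                          # hold time of black box 0
h1 = hold_at_outer(h0, clock_delay=2.0, data_delay=0.5)
print(h1)   # 0.5 -> hold time at the level of black box 1, as in the text
```

Applying the same formula once more with the (unstated) box-2 delays would take the 0.5 ns up to the 1 ns quoted for black box 2.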
STA problem : Hold time manipulation
Given a black box with a hold time of 2 ns. How will you convert it to one having a hold
time of 1 ns?

As we learnt in our post on negative hold check, we can control the hold time of a black
box just by controlling the relative arrival times of data and clock at a chosen reference
point. So, to make this transition, we need to insert 1 ns of delay in the data path (or,
equivalently, remove 1 ns of delay from the clock path), as shown in figure 2 below:

We can arrive at this conclusion with the help of the following equations. As we know,
for a hold check, the following equation must hold true:
Hold slack = Tck->q + Tprop - (Thold + Tskew)
The above equation is for a single-cycle path from register to register. However, the
result is valid for any kind of timing path.

Let the model with hold time 2 ns be denoted by subscript (init):

Hold slack(init) = Tck->q(init) + Tprop(init) - (Thold(init) + Tskew(init))
Hold slack(init) = Tck->q(init) + Tprop(init) - (2 + Tskew(init)) --- (i)
And the equivalent model with hold time 1 ns by subscript (fin):
Hold slack(fin) = Tck->q(fin) + Tprop(fin) - (1 + Tskew(fin)) --- (ii)
Since both describe the same physical path, the hold slack and the launch register
timings remain the same, so the R.H.S. of the two equations can be equated:

Tck->q(init) + Tprop(init) - (2 + Tskew(init)) = Tck->q(fin) + Tprop(fin) - (1 + Tskew(fin))

Now,
Tck->q(init) = Tck->q(fin), as the launch register has not changed. The equation reduces to
Tprop(init) - Tprop(fin) - Tskew(init) + Tskew(fin) = 1
Now, Tskew = Tck(cap) - Tck(lau). Since the launch is not changed, we can write the above
equation as
Tprop(init) - Tprop(fin) - Tck(cap)(init) + Tck(cap)(fin) = 1
Rearranging,
Tck(cap)(fin) - Tck(cap)(init) - {Tprop(fin) - Tprop(init)} = 1
Or
Change in clock arrival - change in data arrival = 1

In other words, the 2 ns-hold box behaves like a 1 ns-hold box whose clock arrives 1 ns
later. So, to present a hold time of 1 ns at the boundary of the 2 ns-hold box, we either
remove 1 ns of delay from its clock path, or insert 1 ns of delay in its data path. As
simple as that. :-)
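The boundary formula from the negative hold check post (outer hold = inner hold + clock delay - data delay) lets us verify the answer numerically (a hedged sketch; the wrapper delays are illustrative):

```python
# Hedged sketch: effective hold time presented at a wrapper boundary.

def effective_hold(inner_hold, clock_delay, data_delay):
    """Hold time seen at the wrapper boundary of a box, given the delays
    placed on its clock and data pins inside the wrapper."""
    return inner_hold + clock_delay - data_delay

# A 2 ns-hold box with 1 ns of extra data-path delay presents 1 ns of hold:
print(effective_hold(2.0, clock_delay=0.0, data_delay=1.0))   # 1.0
# Equivalently, removing 1 ns from the clock path achieves the same:
print(effective_hold(2.0, clock_delay=-1.0, data_delay=0.0))  # 1.0
```

Either change makes the wrapped box indistinguishable, for hold analysis, from a native 1 ns-hold element.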

Setup time vs hold time


In digital designs, each and every sequential element has some restrictions related to
the data with respect to clock in the form of windows in which data can change or not.
There is always a region around the active edge of the clock in which data is not
allowed to change at the input of the sequential element. This is because, if the data
changes at the input within this window, we cannot guarantee the output. If this
happens, there can be one of the three possibilities:
 Current output data can be the result of current input data
 Current output data can be the result of previous input data
 The output can go metastable (as explained in metastability)
This region around the clock edge is marked by two boundary lines, one pertaining to
setup time, and the other to hold time. The region between these two lines is generally
termed the setup-hold window. Figure 1 below shows the setup-hold window.
Figure 1: Figure showing setup/hold window of a sequential element
There are certain points of difference between setup time and hold time that we need to
keep in mind:
 Setup time signifies the point in time before which data needs to be stable,
whereas hold time is the point of time after which the data needs to be stable
 Adherence to setup time ensures that the data launched at previous active clock
edge by another flip-flop gets captured at the current clock edge. On the other hand,
adherence to hold time ensures that the data launched at the current edge does not get
captured on the same edge.
 Above point also means that setup time adherence ensures that the design goes
to next state smoothly, whereas hold time adherence means the current state is not
disturbed.
Hope this post helped you in understanding the basic difference in setup time and hold
time.

Setup and hold checks


Setup and hold checks ensure that the finite state machine works in the way as
designed. In essence, whole of the timing analysis, be it static or dynamic, revolves
around setup and hold checks only. In this post, we will be touching upon setup and
hold checks.

What is meant by setup check: Setup check ensures that the design transitions to the
next state as desired through the state machine design. Mostly, the setup check is at the
next active clock edge relative to the edge at which data is launched. Let us call this
the default setup check. This corresponds to the state machine requirement to transfer
to the next state, and to the possibility of meeting both setup and hold checks together
in view of delay variations across timing corners. Figure 1 below shows the setup check
for a timing path from a positive edge-triggered register to a negative edge-triggered
register. The data launched by flop1 on the positive edge will be captured by flop2 on
the forthcoming negative edge and will update the state of flop2. To do so, it has to be
stable at the input of flop2 at least a setup time before the negative edge.
Figure 1: Default setup check for a timing path from positive edge-triggered to negative edge-triggered flop

What is meant by hold check: Hold check ensures that the design does not move to
the next state before its stipulated time; i.e., the design retains its present state.
The hold check is normally one active edge prior to the edge at which setup is checked,
unless there are some architectural care-abouts in the state machine design. The hold
check corresponding to the default setup check is termed the default hold check.
Figure 2 below shows the default hold check corresponding to the default setup check
of figure 1. It shows that the data launched on positive edge by flop 1 should be
captured by next negative edge and not the previous negative edge.

Figure 2: Default hold check for a timing path from positive edge-triggered to negative edge-triggered flop

Default setup and hold check categories: As discussed above, for each kind of timing
path, there is a default setup check and a default hold check that will be inferred unless
there is an intended non-default check. We can split the setup and hold checks into
following categories for our convenience. Each of the following is a link, which you can
visit to know about the default setup and hold checks for each category:

 Setup checks and hold checks for register to register paths


 Setup and hold checks for register to latch timing paths
 Setup and hold checks for latch to register timing paths
 Clock gating setup and hold checks
 Setup and hold checks for data-to-data checks

Non-default setup and hold checks: These are formed when the state machine
behavior is different than the default intended one. Sometimes, a state machine can be
designed causing the setup and hold checks to be non-default. For this to happen, of
course, you have to first analyze delay variations across timing corners and ensure that
the setup timing equation and hold timing equation are satisfied for all timing corner
scenarios. The non-default setup and hold checks can be modeled with the help of
multi-cycle path timing constraints. You may wish to go through our posts Multicycle
paths - the architectural perspective and Multicycle paths handling in STA to understand
some of the concepts related to non-default setup and hold checks.

Can hold check be frequency dependent?


We often hear people argue that the hold check is frequency independent. However, this
is only partially true: the statement holds only for zero-cycle hold checks. By zero-cycle
hold checks, we mean that the hold check is performed on the same edge at which the
data is launched. This is true for timing paths between same-polarity registers, e.g.
between positive edge-triggered flops. Figure 1 below shows timing checks for a data-
path launched from a positive edge-triggered flip-flop and captured at a positive edge-
triggered flip-flop. The hold timing, in this case, is checked at the same edge at which
data is launched. Changing the clock frequency will not cause hold check to change.

Figure 1: Setup and hold checks for a positive edge-triggered to positive edge-triggered flip-flop path

Most of the cases in today’s designs are of this type only. The exceptions to zero-cycle
hold checks are not too many. Hold checks exist for the previous edge also; however,
these are very relaxed as compared to the zero-cycle hold check, and hence are not
mentioned. Also, hold checks on the next edge are impossible to meet considering
cross-corner delay variations. So, seldom do we hear that the hold check is frequency
dependent. Let us talk of different scenarios of frequency-dependent hold checks:

1. From positive edge-triggered flip-flop to negative edge-triggered flip-flop and
vice-versa: Figure 2 below shows the setup and hold checks for a timing path from a
positive edge-triggered flip-flop to a negative edge-triggered flip-flop. A change in
frequency will change the distance between the two adjacent edges; hence, the hold
check will change. The hold timing equation for the case below is given as:

Tdata + Tclk/2 > Tskew + Thold
or
Tslack = Tclk/2 - Thold - Tskew + Tdata
Thus, the clock period comes into the picture in the calculation of the hold timing slack.

Figure 2: Setup and hold checks for timing path from positive edge-triggered flip-flop to negative edge-
triggered flip-flop

Similarly, for timing paths launched from a negative edge-triggered flip-flop and
captured at a positive edge-triggered flip-flop, the clock period comes into the picture.
However, this check is very relaxed most of the time. It is evident from the above
equation that for the hold slack to be negative, the skew between launch and capture
clocks should be greater than half a clock cycle, which is a very rare scenario. Even at
2 GHz (Tclk = 500 ps), the skew has to be greater than 250 ps, which is still very rare.
Coming to latches, the hold check from a positive level-sensitive latch to a negative
edge-triggered flip-flop is half cycle. Similarly, the hold check from a negative level-
sensitive latch to a positive edge-triggered flip-flop is half cycle. Hence, the hold check
in both of these cases is frequency dependent.
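The frequency dependence can be seen numerically. Below is a small Python sketch of the slack equation above; all delay numbers are illustrative, not taken from any real library:

```python
# Hold slack for a path from a positive edge-triggered flop to a negative
# edge-triggered flop: Tslack = Tclk/2 - Thold - Tskew + Tdata.
# All values are illustrative picoseconds.

def hold_slack_pos_to_neg(t_clk, t_data, t_hold, t_skew):
    # Half the clock period appears in the slack, so this check is
    # frequency dependent.
    return t_clk / 2 - t_hold - t_skew + t_data

# The same path at two clock frequencies:
print(hold_slack_pos_to_neg(t_clk=1000, t_data=80, t_hold=50, t_skew=20))  # 510.0 ps at 1 GHz
print(hold_slack_pos_to_neg(t_clk=500, t_data=80, t_hold=50, t_skew=20))   # 260.0 ps at 2 GHz
```

Halving the clock period eats half a cycle out of the hold slack, which is exactly why such checks, unlike zero-cycle hold checks, depend on frequency.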

2. Clock gating hold checks: When data launched from a negative edge-triggered flip-
flop gates a clock at an OR gate, hold is checked on the positive edge next to the edge at
which data is launched, as shown in figure 3. This check is frequency dependent.
Figure 3: Clock gating hold check between data launched from a negative edge-triggered flip-flop and
clock at an OR gate

Similarly, data launched from a positive edge-triggered flip-flop gating the clock at an
AND gate forms a half-cycle hold check. However, this kind of check is not possible to
meet under normal scenarios considering cross-corner variations.

3. Non-default hold checks: Sometimes, due to architectural requirements (e.g.
multi-cycle paths for hold), the hold check is non-zero cycle even for positive edge-
triggered to positive edge-triggered paths, as shown in figure 4 below.

Figure 4: Non-default hold check with multi-cycle path of 1 cycle specified

What makes timing paths both setup critical and hold critical
Those timing paths which are very hard to meet are called timing critical paths. They
can be divided into setup critical and hold critical timing paths.

Setup timing critical paths: Those paths for which meeting setup timing is difficult are
termed setup critical timing paths. For these paths, the setup slack value is very close to
zero and, for most of the design cycle, remains below zero.
Hold timing critical paths: As is quite obvious, those paths for which meeting hold
timing is difficult are hold critical paths. These paths may require many buffers to satisfy
the hold slack equation.

Sometimes, we may encounter timing paths which violate both setup and hold. There is
not enough setup slack to make them hold clean and vice-versa. A good practice in
timing analysis is to identify all such paths as early as possible in the design cycle. Let
us discuss the scenarios that make timing paths both setup and hold critical.

Inherent frequency limit and delay variations: Let us say we want our chip to remain
functional within the following PVTs:
Process : best-case to worst-case
Voltage : 1.2 V with 10% voltage variation allowed (1.08 V to 1.32 V)
Temperature : -20 degrees to +150 degrees
The delay of a standard cell changes with PVTs and OCVs. Let us only talk about PVT
variations, and let us say cell delay changes by 2 times from the worst-case scenario
(worst process, lowest voltage, worst temperature) to the best-case scenario (best
process, highest voltage, best temperature). Let us say setup and hold checks also scale
by the same amount. Remember that the equations for setup and hold need to be satisfied
across all the PVTs, which essentially means setup needs to be ensured for the WCS
scenario and hold timing for the BCS scenario. This provides a limit to the maximum
frequency the path can be timed at. If we try to go above that frequency, we will not be
able to ensure that both setup and hold slacks remain positive.

Let us illustrate with the help of an example: a timing path from a positive edge-
triggered flip-flop to a positive edge-triggered flip-flop with a frequency target of 1.4 GHz
(clock time period = 714 ps). Let us say we have the best-case and worst-case
scenarios as shown in figures 1 and 2.

Figure 1 shows that the best-case clk->q delay of the launch flop is 100 ps, the best-case
combinational delay is 80 ps and the best-case hold time is 200 ps. Applying our hold
timing equation to this case,

Hold slack = Tck->q + Tprop - Thold
Hold slack = 100 + 80 - 200
Hold slack = -20 ps
So, in this case, our hold slack comes out to be negative, and we need to apply
techniques to improve it while ensuring that our setup slack stays sufficiently positive.
Let us look at the worst-case scenario to know about our setup slack. If we assume that
everything scales by 2 times, the worst-case numbers for clk->q delay, combinational
delay and setup/hold time come out to be 200 ps, 160 ps and 400 ps respectively.

Applying the setup timing equation for this scenario,

Setup slack = Tperiod - (Tck->q + Tprop + Tsetup)
Setup slack = 714 - 200 - 160 - 400 = -46 ps

Thus, for the same timing path, both setup and hold slacks come out to be negative.
Given all these conditions, we cannot meet both setup and hold for this path. One
solution could be to use cells with less delay variability. Or we can limit the operating
conditions to a tighter range, for instance 1.15 V to 1.25 V instead; this will improve
both setup and hold slack values. If neither is an option, the only option left is to add
delay elements to bring the hold slack to zero and reduce the frequency, as the inherent
variations of the cells will not allow the path to operate beyond a certain frequency. Let
us check the maximum frequency at which our timing path will work.

First, we need to ensure hold timing is met. Thus,

Hold slack >= 0
This translates to combinational delay (Cb) >= 100 ps, or Cb = 100 ps for a hold slack of
0 ps. In other words, the worst-case combinational delay is 200 ps (2 times scaling).

For a setup slack of 0 ps, operating clock frequency will be maximum; i.e.,

Tperiod(min) = Tck->q + Tprop + Tsetup
Tperiod(min) = 200 + 200 + 400 = 800 ps
The minimum time period that the path can operate at is 800 ps, corresponding to a
maximum frequency of 1.25 GHz.
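The arithmetic above can be reproduced in a few lines of Python (the numbers are the ones used in this example; 2x scaling between best case and worst case is assumed throughout):

```python
# Best-case (BCS) numbers from the example, in ps; worst-case (WCS)
# numbers are 2x these.
scale = 2.0
ckq_bcs, comb_bcs, hold_bcs = 100.0, 80.0, 200.0

# Hold must be met at BCS: ckq + comb >= hold, so pad the data path.
pad = hold_bcs - (ckq_bcs + comb_bcs)   # 20 ps of delay elements
comb_padded = comb_bcs + pad            # 100 ps total combinational delay

# Setup must be met at WCS with the padded data path.
ckq_wcs = ckq_bcs * scale               # 200 ps
comb_wcs = comb_padded * scale          # 200 ps
setup_wcs = hold_bcs * scale            # 400 ps (setup/hold scale together here)

t_period_min = ckq_wcs + comb_wcs + setup_wcs
print(t_period_min)        # 800.0 ps
print(1e6 / t_period_min)  # 1250.0 MHz, i.e. 1.25 GHz maximum
```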
In this post, we have discussed how PVT variations in delay can cause a timing path to
be both setup and hold critical, and how this limits the frequency of operation. Although
the discussion was limited to PVT variations, OCV variations will add to them; the
underlying equations remain the same. Also, we did not take an important parameter into
consideration: clock skew. Can you think of how clock skew between the two flip-flops
contributes to the maximum achievable clock frequency? Or is it unrelated to clock skew?

Setup and hold violations


What is meant by setup and/or hold violations: The ultimate aim of timing analysis is
to get the design to work at the required frequency and with reliability. For this to happen,
it must be ensured that all the state transitions happen smoothly; i.e., the setup and hold
requirements of all the timing paths in the design are met. If there are failing setup and/or
hold paths, the design is said to have violations.

What if setup and/or hold violations occur in a design: As said earlier, setup and
hold timings are to be met in order to ensure that data launched from one flop is
captured properly at another, in accordance with the state machine designed. In other
words, no timing violations means that the data launched by one flip-flop at one clock
edge is captured by another flip-flop at the desired clock edge. If a setup check is
violated, data will not be captured properly at the next clock edge. Similarly, if a hold
check is violated, data intended to be captured at the next edge gets captured at the
same edge. Moreover, setup/hold violations can lead to data getting captured within the
setup/hold window, which can lead to metastability of the capturing flip-flop (as explained
in our post metastability). So, it is very important to have setup and hold requirements
met for all the registers in the design, and there should not be any setup/hold violations.

Setup violations: As we know, setup checks are applied for timing paths to get the
state machine to move to the next state. The timing equation for a setup check from
positive edge-triggered flip-flop to positive edge-triggered flip-flop is given as below:
Tck->q + Tprop + Tsetup - Tskew < Tperiod
For a timing path to meet setup requirements, this equation needs to be satisfied. The
difference between the right and left sides is represented by a parameter known as setup
slack.

Setup slack is the margin by which a timing path meets the setup check requirement. It is
given as the difference between the R.H.S. and the L.H.S. of the setup timing equation:
Setup slack = Tperiod - Tck->q - Tprop - Tsetup + Tskew
If the setup slack is positive, the timing path meets the setup requirement. On the other
hand, a negative setup slack means a setup violating timing path. If, by chance, a
fabricated design is found to have a setup violation, you can still run the design at a lower
frequency than specified and get the desired functionality, as the setup equation includes
the clock period as a variable.
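As a quick sketch, the setup slack equation can be written as a small Python function (the delay numbers below are illustrative):

```python
def setup_slack(t_period, t_ckq, t_prop, t_setup, t_skew=0.0):
    # Setup slack = Tperiod - Tck->q - Tprop - Tsetup + Tskew (all in ps)
    return t_period - t_ckq - t_prop - t_setup + t_skew

# A path that just misses timing at 1 GHz (1000 ps period)...
print(setup_slack(t_period=1000, t_ckq=150, t_prop=800, t_setup=60))  # -10 ps: violation
# ...passes if the chip is run at 900 MHz instead (~1111 ps period).
print(setup_slack(t_period=1111, t_ckq=150, t_prop=800, t_setup=60))  # 101 ps: met
```

Because Tperiod appears in the slack, slowing the clock always helps setup; this is the knob mentioned above for a fabricated design with a setup violation.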
If we analyze the setup equation more closely, it involves four parameters:
1. Data path delay: The more the total delay of the data path (flip-flop delay +
combinational delay + setup time), the less the setup slack
2. Clock skew: The more the clock skew (difference between arrival times of the
clock at capture and launch flip-flops), the more the setup slack
3. Setup time requirement of capturing flip-flop: The less the setup time
requirement, the more the setup slack
4. Clock period: The more the clock period, the more the setup slack. However,
if you are targeting a specific clock period, relaxing it is not an option. :-)
How to tackle setup violations: The ultimate goal of timing analysis is to have every
timing path follow the setup equation and get a positive setup slack number for every
timing path in the design. If a timing path violates setup timing (assuming we are targeting
a certain clock frequency), we can try one or more of the following to bring the setup
slack back to a positive value:
 Decreasing data path delay
 Choosing a flip-flop with less setup time requirement
 Increasing clock skew
How to fix setup violations discusses various ways to tackle setup violations.

Hold violations: As we know, hold checks are applied to ensure that the state machine
remains in its present state until desired. The hold check for a timing path from a
positive edge-triggered flip-flop to another positive edge-triggered flip-flop is governed
by the following equation:
Tck->q + Tprop > Thold + Tskew
Similar to setup slack, the presence and magnitude of a hold violation is governed by a
parameter called hold slack. The hold slack is defined as the amount by which the L.H.S. is
greater than the R.H.S. In other words, it is the margin by which the timing path meets the hold
timing check. The equation for hold slack is given as:
Hold slack = Tck->q + Tprop - Thold - Tskew
If the hold slack is positive, there is still some margin available before the path starts violating
hold. A negative hold slack means the path is violating the hold timing check by the amount
represented by the hold slack. To get the path met, either the data path delay should be increased,
or the clock skew/hold requirement of the capturing flop should be decreased.
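Following the inequality above, the hold slack can be sketched the same way in Python (illustrative numbers):

```python
def hold_slack(t_ckq, t_prop, t_hold, t_skew=0.0):
    # Hold slack = Tck->q + Tprop - Thold - Tskew (all in ps).
    # Note there is no Tperiod term: slowing the clock cannot fix a
    # zero-cycle hold violation.
    return t_ckq + t_prop - t_hold - t_skew

print(hold_slack(t_ckq=100, t_prop=20, t_hold=80, t_skew=60))  # -20 ps: violation
# Adding a 30 ps buffer into the data path fixes it:
print(hold_slack(t_ckq=100, t_prop=50, t_hold=80, t_skew=60))  # 10 ps: met
```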

If we analyze the hold timing equation more closely, it involves three parameters:
1. Data path delay: More data path delay favours hold slack; hence, the more the
data path delay, the more the margin
2. Skew: A positive skew degrades hold slack
3. Hold requirement of capturing flip-flop: The less the hold requirement, the more
the hold slack
How to tackle hold violations: Similar to setup analysis, the ultimate aim of hold
analysis is to have every timing path follow the hold timing equation and get a positive
hold slack for each and every timing path in the design. If a timing path violates hold,
we can do any of the following:
 Increase the data path delay
 Decrease the clock skew
 Choose a flip-flop with a smaller hold requirement
How to fix hold violations discusses, in detail, various ways to fix hold violations.

How to fix setup violations


In the post setup and hold violations, we learnt about setup time violations and
hold time violations. In this post, we will learn approaches to tackle setup time
violations. The following strategies can be useful in reducing the magnitude of a setup
violation and bringing the slack back towards a positive value:

1. Increase the drive strength of data-path logic gates: A cell with better drive
strength can charge the load capacitance quickly, resulting in a smaller propagation delay.
Also, the output transition should improve, resulting in better delay of the succeeding
stages. We can view a logic gate as a certain ON-resistance that charges/discharges a load
capacitor to toggle the output state. This forms an RC circuit with a certain RC time
constant. A gate of better drive strength has a smaller resistance, effectively lowering
the RC time constant and hence providing less delay. This is illustrated in figure 1 below.
If an AND gate of drive strength 'X' has a pull-down resistance equivalent to 'R', the one
with drive strength '2X' will have R/2 resistance. Thus, a bigger AND gate with better
drive strength will have less delay.

This strategy gives best results only if the load of the cell is dominated by the external
load capacitance. Generally, the drive strength of a cell is proportional to the cell
size. Thus, increasing the cell size halves its internal resistance, but doubles the internal
node capacitance. Hence, as shown in figure 2, the zero-load delay of a cell ideally
remains the same upon doubling the size of the cell.
Thus, upon doubling the drive strength of the cell (assuming D to be the original delay),
the delay can be anything between D/2 and D depending upon the ratio of intrinsic and
external load capacitance.
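A first-order RC sketch of this argument in Python (normalized, illustrative values): doubling the drive halves the resistance but doubles the intrinsic capacitance, so the benefit depends on how much of the load is external:

```python
def gate_delay(r, c_int, c_load):
    # First-order model: delay ~ R * (intrinsic cap + external load cap)
    return r * (c_int + c_load)

r, c_int = 1.0, 1.0  # normalized 1X gate

# Zero external load: doubling drive strength does not change delay.
print(gate_delay(r, c_int, 0.0))          # 1.0
print(gate_delay(r / 2, 2 * c_int, 0.0))  # 1.0

# External load dominates: delay approaches half.
print(gate_delay(r, c_int, 10.0))          # 11.0
print(gate_delay(r / 2, 2 * c_int, 10.0))  # 6.0
```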

Moreover, the input pin capacitance is a by-product of the size of the cell. Thus,
increasing the size of the cell results in an increased load for the cell driving its input
pins. So, in some cases (a very high drive strength cell with little load, driven by a low
drive strength cell), increasing the drive strength can result in an increase in the
magnitude of the setup violation.

Timing aside, power dissipation (both leakage and dynamic) is a function of cell drive
strength, and so is area. So, increasing the drive strength to fix a setup violation results
in both area and power increase (although very small in comparison to the whole design).

2. Use data-path cells with lower threshold voltages: If you have multiple flavors of
threshold voltage in your design, a cell with a lower threshold voltage will certainly have
less delay. So, this should be the first step to resolve setup violations.

3. Improve the setup time of the capturing flip-flop: As we know, the setup time of a
flip-flop is a function of the transitions at its data pin and clock pin. The better the
transition at the data pin, the less the setup time; and a worse clock transition causes less
setup time. Also, a flip-flop with higher drive strength and/or lower threshold voltage is
more likely to have a smaller setup time requirement. However, increasing the drive
strength of the flip-flop might cause the transitions at its clock and data pins to degrade
due to higher pin loads; this also plays a role in deciding the setup time.

4. Restructuring of the data-path: Based upon the placement of data-path logic cells,
you can decide either to combine simple logic gates into a complex gate, or split a multi-
stage cell into simpler logic gates. A multi-stage gate is optimized in terms of area,
power and timing. For example, a 2:1 mux will have less logic delay than an AND gate
and an OR gate combined, for the same output load capacitance. But if you need to
traverse distance, then 2 stages of logic can help, as a buffer would otherwise introduce
additional delay.
Let us elaborate with the help of an example wherein a data-path traverses a 3-input
AND gate from FF1 to FF2, situated around 400 microns apart. For simplicity, let us
assume one logic cell can drive 200 microns and each logic cell has only one drive
strength available. The choice is between two 2-input AND gates and one 3-input AND
gate. In this case, the 3-input AND gate should give less delay (maybe 150 ps for one
3-input AND vs 200 ps for two 2-input ANDs) as it is optimized for area, timing and
power as compared to two 2-input AND gates.

Now, consider another case where FF1 and FF2 are at a distance of 600 microns. In
this case, if we use two 2-input AND gates, we can place them 200 microns apart and
hence cover the distance. But if we use one 3-input AND gate, we will need to add a
repeater, which has its own delay. In this case, using two 2-input AND gates should
give better results in terms of overall data-path delay.

5. Routing topologies: Sometimes, when there are a lot of nets at a certain place in
the design, the routing tool may detour nets to make the place less congested. Thus,
two logic cells might be placed very close, yet the delay can be high for both cells: for
the driver cell due to high net capacitance, and for the load cell due to poor transition
at its input. Also, net delay can be a significant component in such scenarios.
The figure below shows one such example of two AND gates situated a certain distance
apart. Ideally, there could be a straight net route between the two gates. But, due to
very high net density in the region, the router chose to route as shown on the right to
help ease the congestion (this is an exaggerated scenario to help understand better).
So, always give proper importance to net routing topology, at least for setup-critical
nets. A few tips you can try to improve the timing:

 Have as little detouring of the net as possible
 Vias increase the net resistance, so use as few vias as possible
 Higher metal layers have less resistance, so long nets can be routed in higher
layers for less net delay

6. Add repeaters: Every logic cell has a limit up to which it can drive a load capacitance;
after that, its delay starts increasing rapidly. Since net capacitance is a function of net
length, we should keep a limit on the length of net driven by a gate. Also, the delay of an
unbuffered net is proportional to the square of its length, and the transitions may be very
bad in such cases. So, it is wise to add repeater buffers after a certain distance to
ensure that the signal is transferred reliably, and in time.
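An Elmore-style sketch in Python of why repeaters help (the per-unit R and C values and buffer delay are made up): the delay of a distributed RC line grows with the square of its length, so splitting a long net into buffered segments wins even after paying the buffer delays:

```python
def wire_delay(length, r_per_um=0.1, c_per_um=0.2):
    # Distributed RC line: delay ~ R_total * C_total / 2, quadratic in length
    return 0.5 * (r_per_um * length) * (c_per_um * length)

def buffered_delay(length, n_segments, t_buf=20.0):
    # Split the net into equal segments with a repeater between each pair
    seg = length / n_segments
    return n_segments * wire_delay(seg) + (n_segments - 1) * t_buf

print(wire_delay(1000))         # 10000.0 -- one long unbuffered net
print(buffered_delay(1000, 4))  # 2560.0 -- 3 repeaters, roughly 4x faster
```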

7. Play with clock skew: Positive skew helps improve the setup slack. So, to fix a setup
violation, we may either increase the clock latency of the capturing flip-flop or decrease
the clock latency of the launching flip-flop. However, in doing so, we need to be careful
about the setup and hold slacks of the other timing paths formed from/to these flip-flops.

8. Increase the clock period: As a last resort, you may choose to time your design at a
reduced frequency. But if you are targeting a particular performance, you need a
minimum frequency; in that case, this option is not for you.

9. Improve the clk->q delay of the launching flip-flop: A flip-flop with less clk->q delay
will help a violating setup timing path. This can be achieved by:
 Improving the transition at the flip-flop's clock pin
 Choosing a flip-flop of higher drive strength. However, if by doing so the clock
transition degrades, the delay can actually increase
 Replacing the flip-flop with one of the same drive strength but lower Vt
In this post, we learnt how to approach a setup violating timing path. Have you ever
used a method that is not listed above? Please share your experience in comments. We
will be happy to hear from you.
How to fix hold violations
In the post setup and hold violations, we learnt about setup time violations and hold
time violations. In this post, we will learn approaches to tackle hold time violations.
The following strategies can be useful in reducing the magnitude of a hold violation
and bringing the hold slack towards a positive value:

1. Insert delay elements: This is the simplest thing we can do to reduce the magnitude
of a hold violation. The data path delay can be increased by inserting delay elements in
the data-path. Thus, the hold violating path's delay is increased and its slack can be
made positive by inserting buffers in the violating data-path.

2. Reduce the drive strength of data-path logic gates: Replacing a cell with a similar
cell of lower drive strength will certainly add delay to the data-path. However, there is a
slight chance of a decrease in data-path delay if the cell load is dominated by intrinsic
capacitance, as we discussed in How to fix setup violations.

3. Use data-path cells with higher threshold voltages: If you have multiple flavors of
threshold voltage in your design, cells with a higher threshold voltage will certainly
have higher delays. So, this should be the first option you look at to resolve hold
violations.

4. Improve the hold time of the capturing flip-flop: Using a capturing flip-flop with
higher drive strength and/or lower threshold voltage will give a lower hold time
requirement. Also, degrading the transition at the flip-flop's clock pin reduces its hold
time requirement.

5. Detoured routing: Detoured routing can be adopted as an alternative to the insertion
of delay elements, as it adds load to the driving cell as well as additional net delay,
thereby increasing the data-path delay.

6. Play with clock skew: A positive skew degrades hold timing and a negative skew
aids it. So, if a data-path is violating hold, we can either decrease the clock latency of
the capturing flip-flop or increase the clock latency of the launching flip-flop. However,
in doing so, we need to keep in mind the setup and hold slacks of the other timing paths
starting and/or ending at these flip-flops.

7. Increase the clk->q delay of the launching flip-flop: A launching flip-flop with more
clk->q delay will help ease the hold timing of the data-path. For this, we can either
decrease the drive strength of the flip-flop or move it to a higher threshold voltage.

Multicycle paths handling in STA


In the post Multicycle paths - the architectural perspective, we discussed the architectural
aspects of multicycle paths. In this post, we will discuss how multicycle paths are handled in
backend optimization and timing analysis:

How multi-cycle paths are handled in STA: By default, in STA, all timing paths are
considered to have default setup and hold checks; i.e., each timing path must be covered in
either half a cycle or a single cycle depending upon the nature of the path (see setup-hold checks
part 1 and setup-hold checks part 2 for reference). However, it is possible to convey to the STA
engine that a path is multi-cycle. There is an SDC command "set_multicycle_path" for the same.
Let us elaborate with the help of an example:

Figure 3: Path from ff1/Q to ff2/D is multicycle path

Let us assume a multi-cycle timing path (remember, it has to be ensured by architecture) wherein
both launch and capture flops are positive edge-triggered, as shown in figure 3. The default setup
and hold checks for this path will be as shown in red in figure 4. We can tell the STA engine to
time this path in 3 cycles instead of the default one cycle with the help of the set_multicycle_path
SDC command:

set_multicycle_path 3 -setup -from ff1/Q -to ff2/D

The above command will shift both setup and hold checks forward by two cycles. That is, the
setup check will now become a 3-cycle check and the hold check a 2-cycle check, as shown in
blue in figure 4. This is because, by default, the STA engine performs the hold check one active
edge prior to the setup check, which, in this case, is after 3 cycles.
Figure 4: Setup and hold checks before and after applying multicyle for setup-only

However, this is not the desired scenario in most cases. As we discussed earlier, multi-cycle
paths are achieved by gating either the clock path or the data path for the required number of
cycles. So, the required hold check in most cases is a 0-cycle check. This is done through the
same command with the switch "-hold", telling the STA engine to pull the hold check back to a
zero-cycle check.

set_multicycle_path -hold 2 -from ff1/Q -to ff2/D

The above command will bring the hold check 2 cycles back, to zero cycle. This is shown in
figure 5 in blue.

Figure 5: Setup and hold checks after applying multi-cycle exceptions for both setup and hold

We need to keep in mind the following statement:

Setting a multi-cycle path for setup moves the hold check by the same number of cycles as
the setup check, in the same direction. However, applying a multi-cycle path for hold does
not affect the setup check.

So, in the above example, both commands combined give the desired setup and hold checks.
Please note that there might be cases where only a setup or only a hold multi-cycle is sufficient,
but that is the need of the design and depends on how the FSM has been modeled.
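The edge-movement rule above can be captured in a tiny Python model (edges counted in clock cycles after the launch edge; equal launch and capture clocks assumed):

```python
def check_edges(setup_mcp=1, hold_mcp=0):
    # Default setup check is at the next active edge (1 cycle after launch).
    setup_edge = setup_mcp
    # Hold is checked one active edge before the setup check, then pulled
    # back by the hold multicycle value.
    hold_edge = setup_edge - 1 - hold_mcp
    return setup_edge, hold_edge

print(check_edges())                         # (1, 0): default checks
print(check_edges(setup_mcp=3))              # (3, 2): setup MCP drags hold along
print(check_edges(setup_mcp=3, hold_mcp=2))  # (3, 0): hold MCP restores 0-cycle hold
```

This mirrors the two SDC commands above: a setup multicycle of 3 alone leaves a 2-cycle hold check, and the hold multicycle of 2 brings it back to zero.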
What if both clock periods are not equal: In the above example, for simplicity, we assumed
that launch and capture clock periods are equal. However, this may not always be true. As
discussed in Multicycle paths - the architectural perspective, it makes more sense to have multi-
cycle paths where there is a difference in clock periods. The setup and hold checks for multicycle
paths are not as simple in this case as when both clocks have the same frequency. Let us consider
a case where the launch clock period is twice the capture clock period, as shown in figure 6
below.

Figure 6: Default setup and hold checks for case where capture clock period is half that of launch clock

Now the question is: when defining a multi-cycle path, which clock period will be added to the
setup check, launch or capture? The answer depends upon the architecture and FSM of the
design. Once you know it, the same can be modelled in the timing constraints. There is a switch
in the SDC command to specify which of the clock periods is to be added. "set_multicycle_path
-start" means that the path is a multi-cycle for that many cycles of the launch clock. Similarly,
"set_multicycle_path -end" means that the path is a multicycle for that many cycles of the
capture clock. Let the above path be a multicycle of 2. Let us see below how it changes with the
-start and -end options.

1. set_multicycle_path -start: This causes cycles of the launch clock to be added to the setup
check. As expected, on applying a hold multicycle path of 1, the hold check returns to a 0-cycle
check. Figure 7 below shows the effect of the two commands below on the setup and hold
checks. As shown, the setup check gets relaxed by one launch clock cycle.

set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -start
set_multicycle_path 1 -hold -from ff1/Q -to ff2/D -start

Figure 7: Setup and hold checks with -start option provided with set_multicycle_path

2. set_multicycle_path -end: This causes cycles of the capture clock to be added to the setup
check. As expected, on applying a hold multicycle path of 1, the hold check returns to a 0-cycle
check. Figure 8 below shows the effect of the two commands below on the setup and hold
checks. As shown, the setup check gets relaxed by one cycle of the capture clock.

set_multicycle_path 2 -setup -from ff1/Q -to ff2/D -end
set_multicycle_path 1 -hold -from ff1/Q -to ff2/D -end

Figure 8: Setup and hold checks with -end option provided with set_multicycle_path

Why is it important to apply multi-cycle paths: To achieve optimum area, power and
timing, all timing paths must be timed at the desired frequencies. The optimization
engine will know about a path being multicycle only when it is told so through SDC
commands in the timing constraints. If we don't specify a multicycle path as multicycle,
the optimization engine will consider it a single-cycle path and will try to use cells of
bigger drive strength to meet timing. This will result in more area and power; hence,
more cost. So, all multicycle paths must be correctly specified as such during timing
optimization and timing analysis.

Basics of latch timing


A latch is a digital logic circuit that can sample a 1-bit digital value and hold it depending
upon the state of an enable signal. Based upon the state of enable, latches are
categorized into positive level-sensitive and negative level-sensitive latches.
Positive level-sensitive latch: A positive level-sensitive latch follows the input data
signal when enable is '1' and holds its output when enable is '0'. Figure 1 below shows
the symbol and the timing waveforms for such a latch. As can be seen, whenever enable
is '1', Out follows the Data input; and when enable is '0', Out remains the same.

Figure 1(a): Positive level- Figure 1(b): Timing waveform for a positive level-
sensitive latch sensitive latch

Negative level-sensitive latch: A negative level-sensitive latch follows the input data
when enable is '0' and holds its output when enable is '1'.

Figure 2(a): Negative level- Figure 2(b): Timing waveform for a negative level-
sensitive latch sensitive latch

Latch timing arcs: Data can propagate to the output of the latch in two ways, as
discussed below:
 Out changes with Data: This happens when enable is in its asserted state (for
example, when Enable is '1' for a positive level-sensitive latch). When this happens, Out
follows Data as there is a direct path between Data and Out. This scenario is depicted in
figures 1(b) and 2(b) above, wherein Out is shown toggling when Data toggles. The latch
is thus said to have a timing arc from Data to Out.
 Out changes with Enable: This happens when Data at the input changes while
Enable is in its de-asserted state. When this happens, the latch waits for Enable to be
asserted and then follows the value of Data. As figure 3 shows, Data had become stable a
lot earlier, but Out toggled only when Enable became asserted. So, in latches, there
exists a timing arc from Enable to Out.
Figure 3: When data changes while enable is in its de-asserted state, the output waits for enable to assert.
Only then is the effect of the input propagated to the output

 Relation between Data and Enable: If Data toggles very close to the closing
edge of Enable, there might be ambiguity as to whether its effect will be propagated to
the output or not (as discussed later in this post). To make things more deterministic, we
impose a condition that Data should not toggle while Enable is getting de-asserted. This
relationship is modelled as setup and hold arcs. So, there are setup and hold timing arcs
between the data and enable pins of a latch. These are discussed below in detail.

Setup time and hold time for a latch: The most commonly used latch circuit is one
built using inverters and transmission gates. Figure 4 shows the transmission gate
implementation of a positive level-sensitive latch. The Enable has been shown as CLK,
as is usually the case in sequential state machines. This circuit has two phases, as
expected for a latch:

 When CLK = '1', the transmission gate at the input is ON and there is a direct
path between Data and Out
 When CLK = '0', the transmission gate in the loopback path is ON and Out holds
its value

Figure 4: Positive level-sensitive latch using transmission gates

Now, when CLK transitions from '1' to '0', it is important that Data does not toggle. The
time before the clock falling edge that Data should remain stable is known as latch
setup time. Similarly, the time after the clock falling edge that Data should remain stable
is called latch hold time.

Let us go into the details of what the latch setup and hold times should be for a
transmission gate latch. If we want the data to be propagated properly to the output,
Data should be stable for at least some time before the closing of the input transmission
gate. This time must be sufficient for the data to go into the memory of the latch; i.e.,
before the input transmission gate closes, Data should traverse both the inverters of the
loop. So, the setup time of the latch comprises the delay of the input transmission gate
and the two inverters. Figure 5 below shows the setup time for the latch.

Figure 5: Setup time for latch

Similarly, if we do not want the data to propagate to the output, it must not cross the
input transmission gate, so that it does not disturb the present state of the latch. This
serves as the hold time for the latch. Assuming (CLK)' takes one inverter delay to
generate, the input transmission gate will close only after one inverter delay. So, the
hold time for Data is one inverter delay minus the transmission gate delay. Please refer
to figure 6 below for an illustration. (CLK)' is formed from CLK after a delay equivalent
to one inverter delay; only then does the input transmission gate switch off. If we want
the data not to propagate to Out, we have to ensure that it does not cross the input
transmission gate. So, Data must be held stable for (T(inv) - T(tg)) after the CLK edge.
This is the hold time for the latch.
Figure 6: Hold time for latch
Please note that other latch topologies, such as dynamic latches, are also possible. The
setup time and hold time calculations for such topologies will vary, but the underlying
principle will remain the same, which is as follows:

 Setup time ensures that the data propagates to the output at the coming clock
edge
 Hold time ensures that the data does not propagate to the output at the
present/previous clock edge
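The two component-delay relations above can be sketched numerically. This is only an illustrative sketch: the transmission gate and inverter delays below are assumed values, not taken from any particular technology.

```python
# Sketch of latch setup/hold from component delays (delay values are assumed).
T_TG = 0.05   # transmission-gate delay in ns (assumed)
T_INV = 0.10  # inverter delay in ns (assumed)

def latch_setup_time(t_tg, t_inv):
    """Data must cross the input transmission gate and both loop
    inverters before the gate closes: T(tg) + 2*T(inv)."""
    return t_tg + 2 * t_inv

def latch_hold_time(t_tg, t_inv):
    """The input gate closes one inverter delay after the CLK edge, so
    Data must stay stable for T(inv) - T(tg) after the edge."""
    return t_inv - t_tg

print(round(latch_setup_time(T_TG, T_INV), 3))  # 0.25
print(round(latch_hold_time(T_TG, T_INV), 3))   # 0.05
```

With realistic numbers substituted, the same two functions give the setup and hold for any transmission-gate latch of this topology.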
Setup checks and hold checks for latches: As discussed above, the decision for the
data to be latched or not to be latched is made at the closing edge. So, the setup and
hold checks are with respect to the latch closing edge only. However, since latches are
transparent during half of the clock period, we can treat the capturing edge as flexible,
stretching across the entire active level of the latch. This property enables a very
beautiful concept known as "time borrowing" for latches.


False paths basics and examples


False path is a very common term used in STA. It refers to a timing path which is not required to
be optimized for timing, as it will never be required to be captured within a limited time when excited
in the normal working of the chip. In the normal scenario, a signal launched from a flip-flop
has to get captured at another flip-flop in only one clock cycle. However, there are certain
scenarios where it does not matter at what time the signal originating from the transmitting flop
arrives at the receiving flop. The timing path in such scenarios is labeled a false path
and is not optimized for timing by the optimization tool.
Definition of false path: A timing path which can get captured even after a very large interval
of time has passed, and still produce the required output, is termed a false path. A false
path, thus, does not need to be timed and can be ignored during timing analysis.
Common false path scenarios: Below, we list some examples where false paths can be
applied:
Synchronized signals: Let us say we have a two-flop synchronizer placed between a sending
and a receiving flop (the sending and receiving flops may be working on different clocks or the same
clock). In this scenario, it is not required to meet timing from the launching flop to the first stage of the
synchronizer. Figure 1 below shows a two-flop synchronizer. We can consider the signal coming
to flop1 as false since, even if the signal causes flop1 to become metastable, it will get resolved
before the next clock edge arrives, with the success rate governed by the MTBF of the synchronizer.
This kind of false path is also known as a clock domain crossing (CDC).
Figure 1: A two flop synchronizer
However, this does not mean that wherever you see a chain of two flops, there is a false path to the
first flop. The two flops may be there for pipelining the logic. So, only once it is confirmed that there
is a synchronizer should you specify the signal as false.

Similarly, for other types of synchronizers as well, you can specify false paths.

False paths for static signals arising due to merging of modes: Suppose you have a structure
as shown in figure 2 below. You have two modes, and the path to the multiplexer output is different
depending upon the mode. However, in order to cover timing for both the modes, you have to
keep the “Mode select bit” unconstrained. This results in paths being formed through the multiplexer
select also. You can specify "set_false_path" through the select of the multiplexer, as it will be static in
both the modes, provided there are no special timing requirements related to mode transition on this
signal. Specifically, for the scenario shown in figure 2:

Mode 1: set_case_analysis 0 MUX/SEL
Mode 2: set_case_analysis 1 MUX/SEL
Mode with Mode1 and Mode2 merged together: set_false_path -through MUX/SEL
Figure 2: Mode selection signal selecting between mode1 and mode2 paths

Architectural false paths: There are some timing paths that can never occur. Let us
illustrate with the help of a hypothetical but very simplistic example. Suppose we have a
scenario in which the select signals of two 2:1 multiplexers are tied to the same signal. Then,
there cannot be a scenario where data through the in0 pin of MUX0 traverses through the
in1 pin of MUX1. Hence, it is a false path by design architecture. Figure 3
below depicts the scenario.

Figure 3: A hypothetical example showing architectural false path

Specifying false path: The SDC command to specify a timing path as a false path is "set_false_path".
We can apply a false path in the following cases:
 Register to register paths
o set_false_path -from regA -to regB
 Paths launched from one clock and captured at another
o set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
 Through a signal
o set_false_path -through [get_pins AND1/B]
STA problem: Maximum frequency of operation of a
timing path
Problem: Figure 1 below shows a timing path from a positive edge-triggered register to a
positive edge-triggered register. Can you figure out the maximum frequency of operation for this
path?

Figure 1: A sample timing path


Solution:
The above timing path is a single cycle timing path. The maximum frequency is
governed by setup timing equation. In other words, maximum frequency of operation is
the maximum frequency (minimum time period of clock) that satisfies the following
condition:

Tck->q + Tprop + Tsetup - Tskew < Tperiod


Here,
Tck->q = 2 ns, Tprop = 4 ns, Tsetup = 1 ns, Tskew = 1 ns
Now,
Tperiod > 2 ns + 4 ns + 1 ns - 1 ns
Tperiod > 6 ns
So, the minimum time period of the clock required is 6 ns. And the maximum frequency
at which the above circuit can work is (1000/6) MHz = 166.67 MHz.

It should be noted that if we operate this timing path at the maximum frequency
calculated, the setup slack will be zero. :-)

In this post, we talked about the frequency of operation of single cycle timing paths. Can
you figure out the maximum frequency of operation for half cycle timing paths? Also, is
there any relation of maximum operating frequency to hold timing? Can you think about
this situation?
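The arithmetic above can be captured in a small Python sketch (times in ns; the values are the ones given in the problem):

```python
def min_period_ns(t_ck_q, t_prop, t_setup, t_skew):
    """Minimum clock period from the setup equation:
    Tck->q + Tprop + Tsetup - Tskew < Tperiod."""
    return t_ck_q + t_prop + t_setup - t_skew

def max_frequency_mhz(t_ck_q, t_prop, t_setup, t_skew):
    # A period of 1 ns corresponds to 1000 MHz
    return 1000.0 / min_period_ns(t_ck_q, t_prop, t_setup, t_skew)

print(min_period_ns(2, 4, 1, 1))                # 6
print(round(max_frequency_mhz(2, 4, 1, 1), 2))  # 166.67
```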

STA problem: Checking for setup/hold violations in a timing path
Problem: Figure 1 below shows a timing path from a positive edge-triggered register to a positive
edge-triggered register. Can you figure out if there is any setup and/or hold violation in the
following circuit?

Figure 1: A sample timing path

Solution:

To check if a timing path violates setup and/or hold, we need to check if they satisfy
setup and hold equations. A violating timing path has a negative setup/hold slack value.

The above circuit has a positive clock skew of 1 ns (as capture flip-flop gets clock 1 ns
later than launch flip-flop).

Let us first check for setup violation. As we know, for a full cycle register-to-register
timing path, setup equation is given as:
Tck->q + Tprop + Tsetup - Tskew < Tperiod
Here,
Tck->q = 2 ns, Tprop (max value of combinational propagation delay) = 4 ns, Tsetup = 1 ns,
Tperiod = 10 ns, Tskew = 1 ns
Now, Tck->q + Tprop + Tsetup - Tskew = 2 + 4 + 1 - 1 = 6 ns < Tperiod
So, the above circuit does not have a setup violation. The setup slack, in this case, will
be given as:
SS = Tperiod - (Tck->q + Tprop + Tsetup - Tskew)
SS = +4 ns
Since, setup slack comes out to be positive, this path does not have a setup violation.

Now, let us check if there is a hold violation for this timing path. The hold timing equation is
given as:
Tck->q + Tprop > Thold + Tskew
Here,
Tck->q = 2 ns, Tprop (min value of combinational propagation delay) = 2 ns, Thold = 2 ns,
Tskew = 1 ns
Now, Tck->q + Tprop = 2 ns + 2 ns = 4 ns
And Thold + Tskew = 2 ns + 1 ns = 3 ns
Now, 4 ns > 3 ns, so this circuit does not have a hold violation. The hold slack, in this
case, will be given as:
HS = Tck->q + Tprop - (Thold + Tskew) = +1 ns
Since, hold slack comes out to be positive, this path does not have a hold violation.
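The two slack computations worked out above can be written as a short sketch (times in ns, values from the problem):

```python
def setup_slack(t_period, t_ck_q, t_prop_max, t_setup, t_skew):
    """SS = Tperiod - (Tck->q + Tprop(max) + Tsetup - Tskew)."""
    return t_period - (t_ck_q + t_prop_max + t_setup - t_skew)

def hold_slack(t_ck_q, t_prop_min, t_hold, t_skew):
    """HS = Tck->q + Tprop(min) - (Thold + Tskew)."""
    return t_ck_q + t_prop_min - (t_hold + t_skew)

# Values from the problem: setup uses the max data-path delay,
# hold uses the min data-path delay.
print(setup_slack(10, 2, 4, 1, 1))  # 4 -> positive, no setup violation
print(hold_slack(2, 2, 2, 1))       # 1 -> positive, no hold violation
```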

Quiz: Modeling skew requirements with data-to-data setup and hold checks
Problem: Suppose there are 'N' signals which are to be skew matched within a
window of 200 ps with respect to each other. Model this requirement with the help
of data setup and hold checks.

As we discussed in data setup and data hold checks, data setup check of 200 ps means
that constrained data should come at least 200 ps before the reference data. Similarly,
data hold check of 200 ps constrains the constrained data to come at least 200 ps after
the reference data. The same is shown pictorially in figure 1(a) and 1(b).
Figure 1(a): Data setup check of 200 ps constrains the constrained signal to toggle at-least 200 ps before reference
signal toggles.

Figure 1(b): Data hold check of 200 ps constrains the constrained signal to toggle at-least 200 ps after the reference
signal has toggled.

Now, suppose you apply a data setup check of -200 ps instead of 200 ps. This would
mean that the constrained signal can toggle up to 200 ps after the reference signal.
Similarly, a data hold check of -200 ps would mean that the constrained signal can
toggle from 200 ps before the reference signal onwards. If we apply both the checks
together, it would infer that the constrained signal can toggle in a window that ranges
from 200 ps before the toggling of the reference signal to 200 ps after the toggling of the
reference signal. This is pictorially shown in figures 2(a) and 2(b).
Figure 2(a): Negative data setup and hold checks of 200 ps
If we combine the two checks, it implies that the constrained data can toggle up to 200
ps after and from 200 ps before the reference signal. In other words, we have
constrained the constrained signal to toggle in a window of +-200 ps around the
reference signal.

Coming to the given problem, if there are a number of signals required to toggle within a
window of 200 ps, we can consider one of them to act as the reference signal and the
other signals as constrained signals. The other signals can then be constrained in both
setup and hold with respect to the reference signal such that all of them lie within +-100 ps
of the reference signal, which keeps any two signals within 200 ps of each other. The
same is shown in figure 3 below:

Figure 3: Data checks to maintain skew between N signals
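The windowing argument can be checked with a small sketch: constraining every signal to within +-100 ps of one reference guarantees any pair is within 200 ps. The arrival times below are assumed, in ps:

```python
def within_half_window(t_ref, arrivals, half_window=100):
    """True if every constrained signal toggles within +/-half_window
    of the reference (negative data setup/hold of half_window each)."""
    return all(abs(t - t_ref) <= half_window for t in arrivals)

def max_pairwise_skew(times):
    # Worst skew between any two signals in the group
    return max(times) - min(times)

arrivals = [950, 1020, 1080]  # assumed arrival times of constrained signals, ps
print(within_half_window(1000, arrivals))           # True
print(max_pairwise_skew([1000] + arrivals) <= 200)  # True
```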


Data checks : data setup and data hold in VLSI

Many times, two or more signals at an analog-digital interface or at the chip interface
have some timing requirement with respect to each other. These requirements are
generally in the form of minimum skew and maximum skew. Data checks come to the
rescue in such situations. Theoretically speaking, data-to-data checks are applied
between two arbitrary data signals, neither of which is a clock. One of these is called the
constrained pin, which is like the data pin of a flop. The other is called the related pin,
which is like the clock pin of a flop. The figure below shows two data signals at a boundary
(possibly an analog hard macro) having some minimum skew requirement between them.

Figure 1 : Two signals arriving at a boundary having skew requirement


Data-to-data checks are zero cycle checks: An important difference between a normal
setup check (between a clock signal and a data signal) and a data-to-data check is that
data-to-data checks are zero cycle checks while a normal setup check is a single cycle
check. When we say that data checks are zero-cycle checks, we mean that these are
between two data signals that have been launched at the same clock edge with respect
to each other.

As shown in figure 2(i) below, the traditional setup check is between a data signal
launched at one clock edge and a clock. Since the data is launched at one clock edge
and is checked with respect to one edge later, it is a single cycle check. On the other
hand, as shown in figure 2(ii), a data-to-data check is between two signals, both of which
are launched on the same clock edge. Hence, we can say data-to-data checks are zero
cycle checks.

Figure 2 : (i) Normal setup check between a data signal and a clock signal, (ii) Data-to-data
setup check between two data signals

What command in EDA tools is used to model data-to-data checks: Data checks
are modeled in EDA tools using the ‘set_data_check’ command.

set_data_check -from A -to B -setup <x>
set_data_check -from A -to B -hold <y>

Here, A is the related pin and B is the constrained pin. The first command constrains B
to toggle at least ‘x’ time before A. The second command constrains B to toggle at least
‘y’ time after A.

Data setup time and data hold time: Similar to setup time and hold time, we can

define data setup time and data hold time as follows:


 Definition of data setup time: Data setup time can be defined as the minimum
time before the toggling of reference signal for which constrained signal should become
stable. In the example above, <x> is data setup time.
 Definition of data hold time: Data hold time can be defined as the minimum
time after the toggling of reference signal for which constrained signal should remain
stable. In the example above, <y> is data hold time.
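The two definitions can be sketched as a predicate: a constrained signal passes its data checks only if it toggles outside the window from <x> before to <y> after the reference toggle. The times below are illustrative, in ps:

```python
def meets_data_checks(t_ref, t_constrained, setup, hold):
    """Zero-cycle data check: the constrained signal must settle at least
    `setup` before the reference toggles, or toggle at least `hold` after it.
    Any toggle inside (t_ref - setup, t_ref + hold) violates a check."""
    return t_constrained <= t_ref - setup or t_constrained >= t_ref + hold

print(meets_data_checks(1000, 750, 200, 200))   # True: 250 ps before reference
print(meets_data_checks(1000, 1100, 200, 200))  # False: inside forbidden window
```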

Modeling data-to-data checks through liberty file: We can model data checks
through .lib also. The constructs that can be used to model data-to-data checks
are non_seq_setup_rising, non_seq_setup_falling, non_seq_hold_rising and
non_seq_hold_falling. These specify setup and hold data-to-data checks with
respect to the rising or falling edge of the reference signal respectively. E.g.
‘non_seq_setup_falling’ represents a data setup check with respect to the falling edge of
the reference signal. ‘rise_constraint’ and ‘fall_constraint’ can be used inside these to
model the setup and hold checks for the rising and falling edges of the constrained
signal. Figure 3 below shows an example of modeling a data setup check through a
liberty file.
Figure 3 : Modeling data-to-data checks through .lib using non_seq_setup_rising

In which cases data-to-data checks are applied: Data checks are normally applied
where there is a specific requirement of skew (either minimum or maximum) or a race
condition (where the order of arrival of two signals can affect the output and the intention
is to get one of the probable outputs by constraining one signal to come before the
other) between two or more signals. These may be required:

 At the digital-analog interface within a chip, where analog signals at the analog
block boundary are required in a specific order
 At the chip boundary, where some asynchronous interface signals may have skew
requirements

How data checks are useful: As we have seen above, data checks provide a
convenient measure to constrain two or more data signals with respect to each other.
Had these checks not been there, it would have required manual effort to check the
skew between the two arriving signals and to maintain it. Also, it would not have been
possible to get the optimization done through the implementation tool, as these paths
would not have been constrained for it.

Clock gating checks


Today’s designs have many functional as well as test modes. A number of clocks
propagate to different parts of the design in different modes. And a number of control
signals are there which control these clocks. These signals are responsible for switching
parts of the design on and off. Let us say we have a simple design as shown in the figure
below. Pin ‘SEL’ selects between two clocks. Also, ‘EN’ decides if the clock will propagate
to the sub-design or not. Similarly, there are signals that decide the what, when, where
and how of clock propagation. Some of these controlling signals may be static while some
of them might be dynamic. Even so, these signals should not play with the waveform
of the clock; i.e., they should not cause any glitch in the clock path. There are both
architectural as well as timing care-abouts to be taken care of while designing signals
that toggle in clock paths. This scenario is widely known as ‘clock gating’. The
timing checks that need to be modeled in timing constraints are known as ‘clock gating
checks’.

Figure 1: A simple clocking structure


Definition of clock gating check: A clock gating check is a constraint, either applied or
inferred automatically by tool, that ensures that the clock will propagate without any
glitch through the gate.

Types of clock gating checks: Fundamentally, all clock gating checks can be
categorized into two types:

AND type clock gating check: Let us say we have a 2-input AND gate in which one of
the inputs has a clock and the other input has a data signal which may toggle while the
clock is still running.

Figure 2: AND type clock gating check; EN signal controlling CLK_IN through AND gate

Since the clock is free-running, we have to ensure that the change of state of the enable
signal does not cause the output of the AND gate to toggle. This is only possible if the
enable input toggles when the clock is in the ‘0’ state. As shown in figure 3 below, if ‘EN’
toggles when ‘CLK_IN’ is high, the clock pulse gets clipped; in other words, we do not
get the full duty cycle of the clock. This is a functional/architectural miss causing a glitch
in the clock path. As is evident in figure 4, if ‘EN’ changes while ‘CLK_IN’ is low, there is
no change in the clock duty cycle. Hence, the right way to gate a clock signal with an
enable signal is to make the enable toggle only when the clock is low.

Figure 3: Clock being clipped when ‘EN’ changes when ‘CLK_IN’ is high
Figure 4: Clock waveform not being altered when ‘EN’ changes when ‘CLK_IN’ is low

Theoretically, ‘EN’ can launch from either a positive edge-triggered or a negative edge-
triggered flop. In case ‘EN’ is launched by a positive edge-triggered flop, the setup and
hold checks will be as shown in figure 5. As shown, the setup check in this case is on the
next positive edge and the hold check is on the next negative edge. However, the ratio of
maximum to minimum delays of cells across extreme operating conditions may be as high
as 3. So, architecturally, this arrangement cannot guarantee that the clock will pass
glitch-free under all conditions.

Figure 5: Clock gating setup and hold checks on AND gate when 'EN' launches from a positive edge-triggered flip-
flop

On the contrary, if ‘EN’ launches from a negative edge-triggered flip-flop, the setup check
is formed with respect to the next rising edge and the hold check is on the same falling
edge (zero-cycle) as the launch edge. The same is shown in figure 6. Since the hold
check in this case is zero-cycle, both checks can be met for all operating conditions;
hence, this arrangement guarantees that the clock will pass under all operating
conditions provided the setup check is met for the worst-case condition. The inactive
clock state, as evident, in this case is '0'.
Figure 6: Clock gating setup and hold checks on AND gate when ‘EN’ launches from negative edge-triggered flip-
flop

Figure 7: An OR gate controlling a clock signal 'CLK_IN'

OR type clock gating check: Since the off-state of an OR gate is ‘1’, the enable
for an OR type clock gating check can change only when the clock is in the ‘1’ state. That
is, we have to ensure that the change of state of the enable signal does not cause the
output of the OR gate to toggle. Figure 8 below shows that if ‘EN’ toggles when ‘CLK_IN’
is high, there is no change in duty cycle. However, if ‘EN’ toggles when ‘CLK_IN’ is low
(figure 9), the clock pulse gets clipped. Thus, ‘EN’ must be allowed to toggle only when
‘CLK_IN’ is high.

Figure 8: Clock waveform not being altered when 'EN' changes when 'CLK_IN' is high
Figure 9: Clock being clipped when 'EN' changes when 'CLK_IN' is low

As in the case of the AND gate, here also ‘EN’ can launch from either positive or negative
edge-triggered flops. In case ‘EN’ launches from a negative edge-triggered flop, the setup
and hold checks will be as shown in figure 10. The setup check is on the next negative
edge and the hold check is on the next positive edge. As discussed earlier, this cannot
guarantee glitch-less propagation of the clock.

Figure 10: Clock gating setup and hold checks on OR gate when ‘EN’ launches from negative edge-triggered flip-
flop

If ‘EN’ launches from a positive edge-triggered flip-flop, the setup check is with respect to
the next falling edge and the hold check is on the same rising edge as the launch edge.
The same is shown in figure 11. Since the hold check is zero-cycle, both setup and hold
checks are guaranteed to be met under all operating conditions provided the path has
been optimized to meet the setup check for the worst-case condition. The inactive clock
state, evidently, in this case, is '1'.
Figure 11: Clock gating setup and hold checks on OR gate when 'EN' launches from a positive edge-
triggered flip-flop

We have, thus far, discussed the two fundamental types of clock gating checks. There
may be complex combinational cells other than 2-input AND or OR gates. However, for
these cells too, the checks to be met between the clock and enable pins will be of the
above two types only. If the enable can change only during the low phase of the clock, it
is said to be an AND type clock gating check, and vice-versa.
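The rule distilled above can be stated as a tiny predicate: for an AND type check, the enable may toggle only while the clock is at its inactive state '0'; for an OR type check, only while it is at '1'. A minimal sketch:

```python
def gating_toggle_safe(gate_type, clk_level_at_toggle):
    """Glitch-free enable toggling: AND-gated clocks require EN to change
    while CLK is '0'; OR-gated clocks require EN to change while CLK is '1'
    (i.e. while the clock sits at the gate's controlling/off state)."""
    inactive_state = {"AND": 0, "OR": 1}
    return clk_level_at_toggle == inactive_state[gate_type]

print(gating_toggle_safe("AND", 0))  # True: EN moved while CLK was low
print(gating_toggle_safe("OR", 0))   # False: would clip the clock pulse
```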

SDC command for application of clock gating checks: In STA, clock gating checks
can be applied with the help of SDC command set_clock_gating_check.

Multicycle paths : The architectural perspective

Definition of multicycle paths: By definition, a multi-cycle path is one in which data
launched from one flop is allowed (through architecture definition) to take more than one
clock cycle to reach the destination flop, and this is architecturally ensured by gating
either the data or the clock from reaching the destination flop. There can be many such
scenarios inside a System on Chip where we can apply multi-cycle paths, as discussed
later. In this post, we discuss the architectural aspects of multicycle paths. For timing
aspects like application, analysis etc., please refer to Multicycle paths handling in STA.

Why multi-cycle paths are introduced in designs: A typical System on Chip consists
of many components working in tandem. Each of these works at a different frequency
depending upon performance and other requirements. Ideally, the designer would want
the maximum throughput possible from each component in the design while paying
proper respect to power, timing and area constraints. The designer may think to introduce
multi-cycle paths in the design in one of the following scenarios:

1) Very large data-path limiting the frequency of the entire component: Let us take a
hypothetical case in which one of the components is to be designed to work at 500
MHz; however, one of the data-paths is too large to work at this frequency. Let us say
the minimum delay the data-path under consideration can take is 3 ns. Thus, if we
assume all the paths as single cycle, the component cannot work at more than 333 MHz;
however, if we ignore this path, the rest of the design can attain 500 MHz without much
difficulty. Thus, we can sacrifice this one path so that the rest of the component works at
500 MHz. In that case, we can make that particular path a multi-cycle path of two cycles
so that it effectively works at 250 MHz, sacrificing the performance for that one path only.

2) Paths starting from a slow clock and ending at a fast clock: For simplicity, let us
suppose there is a data-path involving one start-point and one end-point, with the start-
point receiving a clock that is half the frequency of that of the end-point. Now, the start-
point can only send data at half the rate at which the end-point can receive it. Therefore,
there is no gain in running the end-point at double the clock frequency. Also, since the
data is launched only once every two cycles, we can modify the architecture such that the
data is received after a gap of one cycle. In other words, instead of a single cycle data-
path, we can afford a two cycle data-path in such a case. This will actually save power,
as the data-path now has two cycles to traverse to the endpoint. So, cells with less drive
strength, area and power can be used. Also, if the multi-cycle has been implemented
through a clock enable (discussed later), clock power will also be saved.

Implementation of multi-cycle paths in architecture: Let us discuss some of the
ways of introducing multi-cycle paths in the design:

1) Through gating in the data-path: Refer to figure 1 below, wherein the ‘Enable’ signal
gates the data-path towards the capturing flip-flop. Now, by controlling the waveform of
the enable signal, we can make the path multi-cycle. As shown in the waveform, if the
enable signal toggles once every three cycles, the data at the end-point toggles after
three cycles. Hence, the data launched at edge ‘1’ can arrive at the capturing flop only at
edge ‘4’. Thus, we can have a multi-cycle of 3 in this case, getting a total of 3 cycles for
the data to traverse to the capture flop. In this case, the setup check is of 3 cycles and
the hold check is 0 cycle.
Figure 1: Introducing multi cycle path in design by gating the data-path

Now let us extend this discussion to the case wherein the launch clock is half the
frequency of the capture clock. Let us say Enable changes once every two cycles.
Here, the intention is to make the data-path a multi-cycle of 2 relative to the faster clock
(the capture clock here). As is evident from the figure below, it is important that the
Enable signal follow the proper waveform, as shown on the right-hand side of figure 2.
In this case, the setup check will be two cycles of the capture clock and the hold check
will be a zero-cycle check.

Figure 2: Introducing multi-cycle path where launch clock is half in frequency to capture
clock

2) Through gating in the clock path: Similarly, we can make the capturing flop
capture data once every few cycles by clipping the clock. In other words, send to the
capturing flip-flop only those clock pulses at which you want the data to be captured.
This can be done similarly to the data-path masking discussed in point 1, with the only
difference being that the enable now masks the clock signal going to the capturing flop.
This kind of gating is more advantageous in terms of power: since the capturing flip-flop
does not receive a clock signal in the idle cycles, we save clock power too.
Figure 3: Introducing multi cycle paths through gating the clock path

Figure 3 above shows how multicycle paths can be achieved with the help of clock
gating. The enable signal, in this case, launches from a negative edge-triggered register
for architectural reasons (read here). With the enable waveform as shown in figure 3,
the flop receives a clock pulse once in every four cycles. Thus, we have a multicycle
path of 4 cycles from launch to capture. The setup and hold checks, in this case, are
also shown in figure 3. The setup check is a 4-cycle check, whereas the hold check is a
zero-cycle check.

Pipelining v/s introducing multi-cycle paths: Making a long data-path reach its
destination in two cycles can alternatively be implemented by pipelining the logic. In
most cases, this is a much simpler approach than making the path multi-cycle.
Pipelining means splitting the data-path into two halves and putting a flop between
them, essentially making the data-path take two cycles. This approach also eases the
timing, at the cost of one extra cycle of latency on the data-path. However, looking at
the whole component level, we can then afford to run the whole component at a higher
frequency. But in some situations it is not economical to insert pipeline flops, as there
may not be suitable points available. In such a scenario, we have to go with the
approach of making the path multi-cycle.

Latchup and its prevention in CMOS devices


What is latchup: Latchup refers to a short circuit formed between the power and
ground rails in an IC, leading to high current and damage to the IC. In CMOS devices,
latchup is the formation of a low-impedance path between the power rail and the
ground rail due to the interaction between parasitic pnp and npn transistors. The
structure formed by these resembles a Silicon Controlled Rectifier (SCR, usually known
as a thyristor, a PNPN device used in power electronics). These transistors form a
positive feedback loop that short-circuits the power and ground rails, which eventually
causes excessive current and can even permanently damage the device.
Figure 1: Latchup formation in CMOS
Latchup formation: Shown alongside is a CMOS structure consisting of an NMOS and
a PMOS device. Q1 and Q2 are parasitic transistor elements residing inside it. Q1 is a
double-emitter pnp transistor whose base is formed by the n-well of the PMOS, whose
two emitters are formed by the source and drain terminals of the PMOS, and whose
collector is formed by the p-type substrate of the NMOS. The reverse is true for Q2. The
two parasitic transistors form a positive feedback loop and are together equivalent to an
SCR (as stated earlier).

Analysis of latchup formation: Unless the SCR is triggered by an external
disturbance, the collector current of both transistors consists only of reverse leakage
current. But if the collector current of one of the BJTs is temporarily increased by a
disturbance, the resulting positive feedback loop causes the current perturbation to be
multiplied by β1*β2, as explained below. The disturbance may be a voltage spike on an
input or output pin leading to junction breakdown, or ionizing radiation.

Because the collector current of transistor Q1 is fed as the base current of transistor
Q2, the collector current of Q2 is Ic2 = β2 * Ib2, and this collector current Ic2 is in turn
fed back as the base current Ib1 of transistor Q1. In this way the two transistors feed
each other and the collector current of each keeps multiplying.

Net gain of the SCR device = β1 * β2

Total current in one loop = current perturbation * gain

If β1 * β2 >= 1, both transistors will conduct a high saturation current even after the
triggering perturbation is no longer present. This current eventually becomes so large
that it may damage the device.
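The regenerative condition can be illustrated numerically. This is a rough sketch only; the beta values and the perturbation current are made up.

```python
# Sketch: the regenerative condition for latchup. A small collector-current
# perturbation circulates around the parasitic pnp/npn loop; each trip
# multiplies it by beta1 * beta2. If that product is >= 1 the current
# grows and latchup sustains; if < 1 the perturbation dies out.

def latchup_sustains(beta1, beta2):
    return beta1 * beta2 >= 1.0

def current_after_loops(i_perturb_ma, beta1, beta2, loops):
    """Perturbation current (mA) after a number of trips around the loop."""
    return i_perturb_ma * (beta1 * beta2) ** loops

print(latchup_sustains(0.5, 1.5))              # False: 0.75 < 1, dies out
print(latchup_sustains(2.0, 3.0))              # True: gain of 6 per loop
print(current_after_loops(0.01, 2.0, 3.0, 4))  # ~12.96 mA after 4 trips
```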
Latchup prevention techniques: Simply put, latchup prevention/protection involves
ensuring the parasitic loop cannot sustain itself, e.g. by adding resistance in the path so
as to limit the current through the supply and keep β1 * β2 < 1. This can be done with
the help of the following techniques:

 Surrounding PMOS and NMOS transistors with an insulating oxide layer (trench).
This breaks parasitic SCR structure.
 Latchup Protection Technology circuitry which shuts off the device when latchup
is detected.

Engineering Change Order (ECO)

A semiconductor chip undergoes synthesis, placement, clock tree synthesis and routing
before going for fabrication. All these processes take time; hence, a new chip takes
considerable time (9 months to 1 year for a normal sized chip) before it can be sent for
fabrication. As a result of cut-throat competition, all semiconductor companies stress
cycle-time reduction to stay ahead of others in the race. New ways are constantly being
found to achieve this, new techniques are being developed and more advanced tools
are being used. Sometimes the new chip to be produced is an incremental change over
an existing product. In such a case, there may be no need to go through the complete
cycle of synthesis, placement and routing. Instead, everything may be carried out
incrementally so as to reduce engineering costs, time and manufacturing costs.

It is a known fact that the fabrication process of a VLSI chip involves the manufacture of
a number of masks, each mask corresponding to one layer. There are two kinds of
layers – base and metal. Base layers contain the information regarding the geometry
and kind of transistors, resistors, capacitors and other devices. Metal layers contain
information regarding the metal interconnects used to connect the devices. For a deep
sub-micron technology, the mask costs may exceed a million dollars. Hence, to
minimize cost, the tendency is to reuse as many masks as possible, so the aim is to
implement the ECO with a minimal number of layer changes. Also, due to cycle time
crunch, it is common practice to send the base layers for mask manufacture while the
metal layers are still being modified to fix any remaining DRC violations. This saves
around two weeks of cycle time, since the base layer masks are developed while the
metal layers are still being finalized.
What conditions cause an Engineering Change Order: As mentioned above, ECOs
are needed when the process steps have to be executed in an incremental manner.
This may be due to:

 Some functionality enhancement of the existing device. The functionality
enhancement may be too small to justify going through all the process steps again.

 There may be some design bug that needs to be fixed and was caught very late
in the design cycle. It is very costly, in terms of both time and money, to re-run all the
process steps for each bug. Hence, these changes need to be taken incrementally.

It is quite common for design enhancements/functional bug fixes to be implemented
after the design has already been sent for fabrication. For instance, a functional bug
may be caught in silicon itself. To fix the bug, it is not practical to restart the whole
cycle.

The ECO process starts with the changes in the specification being implemented in the
RTL. The netlist synthesized from the modified RTL is then compared with the golden
netlist already implemented. The logic causing the difference is then stitched into the
main netlist. The netlist then undergoes placement of the incremental logic, clock tree
modifications and routing optimizations based upon the requirements.

Kinds of ECO: The engineering change orders can be classified into two categories:

 All layers ECO: In this, the design change is implemented using all layers.
Compared with a complete re-implementation, this kind of ECO still provides an
advantage in cycle time and engineering costs. It is used whenever the change cannot
be carried out without touching all layers, e.g. when a hard macro cell is updated or the
change requires updating hundreds of cells. It is almost impossible to contain such a
large change to a few layers only.

 Metal-only ECO: As discussed above, owing to the cost incurred, it may not
always be practical to use all the layers (base + metal) to carry out an ECO. In that
case, to minimize cost, the ECO must be completed with changes in a minimal number
of metal layers only. These days, it is expected that every design will be re-opened for
ECOs, so an adequate number of spare cells are sprinkled all over the design during
implementation, to be used later on. These cells are spread uniformly over the design,
with their inputs tied off. Whenever the need for an ECO arises, the logic to be
implemented can be mapped onto the existing spare cells. Hence, there is no need to
change the base layers in such a case: only the connections need to be updated, which
can be done by changing the metal layers alone. Hence, the base layer mask cost is
saved.

Steps to carry out an ECO: ECOs are often implemented manually. There exist
automated ways to carry out functional ECOs, but in many cases the most efficient and
effective method is manual implementation. Generally, the following steps are employed
to carry out Engineering Change Orders:

1. The RTL with ECO implemented is synthesized and compared with the
golden netlist.

2. The delta is implemented into the golden netlist. The modified netlist is
then again compared with the synthesized netlist to ensure the logic has been
implemented correctly.

3. The logic is then placed incrementally. However, if it is a metal-only ECO,
spare cells in the proximity of the changed logic are identified instead.

4. The connections are, then, modified in metal layers.

Spare Cells

We discussed in our post titled 'Engineering Change Order' the importance of having a
uniform distribution of spare cells in the design. Nowadays, there is a trend among VLSI
corporations to implement metal-only functional and timing ECOs due to their low cost.
Let us discuss spare cells in a bit more detail here.
Figure showing spare cells in the design

Spare cells are put onto the chip during implementation keeping in view the possibility
of modifications being carried out in the design later without disturbing the base layers.
This is because carrying out design changes with minimal layer changes saves a lot of
fabrication cost, as each layer mask has a significant cost of its own. Let us start by
defining what a spare cell is. A spare cell can be thought of as a redundant cell that is
not currently used in the design. It may be put to use later on, but for now it sits without
doing any job and does not contribute to the functionality of the device. We can
compare a spare cell with the spare wheel carried in a car, to be used in case one of the
wheels gets punctured, in which case the spare wheel replaces the main wheel.
Similarly, a spare cell can be used to replace an existing cell if the situation demands
(e.g. to meet timing). However, unlike spare wheels, spare cells may also be added into
the design without replacing any existing cell, as the need arises.
Kinds of spare cells: There are many variants of spare cells in the design. Designs are
full of spare inverters, buffers, nand, nor and specially designed configurable spare
cells. However, based on the origin of spare cells, these can be divided into two broad
categories:
 Those used deliberately as spare cells in the design: As discussed earlier,
most designs today have spare cells sprinkled uniformly. These cells have their inputs
tied to either '0' or '1' so that they contribute the minimum to static and dynamic
power.

 Those converted into spare cells due to design changes: There may be a
case where a cell identified as spare now was a functional cell in the past but has been
replaced by another cell due to some design change. Also, some cells have floating
outputs; these can be used as spare cells. We can even reuse existing buffers as spare
cells if removing the buffer does not introduce any setup/hold violation in the design.
Advantages of using spare cells in the design: Introduction of spare cells into the
design offers several advantages such as:

 Reusability: A design change can be carried out using metal layers only. So, the
base layers can be re-used for fabrication of new chips.

 Cost reduction: A significant amount of money is saved, both in engineering and
manufacturing costs.

 Design flexibility: As there are spare cells, small changes can be taken into the
design without much difficulty. Hence, the presence of spare cells provides flexibility to
the design.

 Cycle time reduction: Nowadays, there is a trend to tape out base layers to the
foundry early so that their masks are prepared in parallel. In the meantime, the timing
violations/design changes are carried out in metal layers. This yields a cycle time
reduction of one to two weeks.
Disadvantages of using spare cells: In addition to many advantages, usage of spare
cells offers some disadvantages too. These are:

 Contribution to static power: Each spare cell has its own static power
dissipation; hence, a greater number of spare cells contributes more power. In general,
this amount of power is insignificant in comparison to the total power, but spare cells
should still be added keeping in mind their contribution to power.

 Area: Spare cells occupy area on the chip, so more spare cells mean higher cell
density.
Thus, we have discussed spare cells here. Spare cells are used in almost every design
manufactured today. It is important to make an intelligent selection of the spare cells to
be sprinkled in the design. Many technical papers have been published on their
importance and on structures of spare cells that can be configured into any logic gate.
In general, a collection of nand/nor/inverters/buffers is sprinkled more or less uniformly.
Modules where more ECOs are expected (like a new architecture being used for the
first time) should be sprinkled with more spare cells. On the contrary, those having
stable architectures are usually sprinkled with fewer spare cells, as the probability of an
ECO in the vicinity of these modules/macros is very low.

Our world – Digital or analog

There are two kinds of electronic systems that we encounter in our daily life – digital
and analog. Digital systems are ones in which the variables of interest can assume only
certain specified values, whereas in analog systems these variables can assume any of
infinitely many values. The superiority of digital devices over analog devices has long
been a topic of discussion, and digital devices have taken over analog ones in almost
every area we encounter today. Digital computers, digital watches, digital thermometers
and so on have replaced analog computers, analog watches and analog thermometers.
Digital devices have replaced analog ones due to their superior performance, better
noise immunity and reliability, in spite of being more costly.

Although most of the devices used today are digital, the world around us seems, in
general, to be analog. All the physical quantities around us – light, heat, current – are
apparently analog. The so-called digital devices have to interface with this analog real
world. For instance, a digital camera receives an analog signal (light) and converts it
into information in the form of pixels that collectively form a digital image. Similarly, a
music system converts the digital information stored on a music CD into pleasant
music, which is nothing but analog sound waves. All the digital devices that we know
have this characteristic in common. Simply speaking, there are devices known as
Analog to Digital Converters (ADC) and Digital to Analog Converters (DAC) that act as
interfaces between the real analog world and digital devices, converting the data
sensed by an analog sensor into digital information understood by the digital system,
and vice-versa. But is the analog world really analog? Is it true that analog variables
can take any value at all, or is there some limit of granularity for them too? Is this world
inherently digital or analog in nature? Is digital more fundamental than analog?
As we all know, there are many fundamental quantities in this universe, viz. mass,
length, time, charge, light etc. We have been encountering these ever since the world
began. Now the question arises – are all these quantities inherently analog or digital?
Finding the answer to this question will automatically answer our main question, i.e.
whether the basis of this world lies in analog or digital. It is often said that "the heart of
digital devices is analog" (see figure below). This is because, as visible on a
macroscopic scale, the current and voltage waveforms produced by a digital
circuit/system are not, in fact, digital: the transition from one logic state to another
cannot be abrupt, and there are small spikes in the voltage levels even when the
system is stable in one state. But seen at the microscopic level, in terms of current as a
transfer of electrons, since only an integral number of electrons can be transferred,
current can take only one of a set of discrete values, not just any value.

Let us take an illustration. The charge on an electron, represented as 'e', is 1.6 × 10⁻¹⁹ C
(or 0.00000000000000000016 C). It is the smallest charge ever discovered, and it is
well known that charge can exist only in multiples of 'e'. Thus, electric charge is a digital
quantity with the smallest unit 'e'. When we say that the charge at a point is +1 C, we
actually mean that the charge is caused by the transfer of 6250000000000000000
electrons. Since the smallest unit of charge is 0.00000000000000000016 C, there
cannot exist any charge of value 1.00000000000000000015 C, since that would make
the number of electrons a fraction. Since the magnitude of 1 C is very large compared
to the charge of one electron, charge appears to us as continuous rather than discrete.
For us, there is no difference between 1.00000000000000000015 and 1, as the devices
we use do not measure with that much precision. Hence, we perceive these quantities
as analog. Similar is the case with other physical quantities.
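The electron-counting argument above can be checked with exact arithmetic. This is a throwaway sketch using Python's Fraction type to avoid floating-point noise; the function names are made up.

```python
from fractions import Fraction

# Electron charge, 1.6e-19 C, kept as an exact fraction: 16 / 10**20.
E = Fraction(16, 10**20)

def electrons_in(charge_c):
    """Number of electrons making up a charge (exact rational)."""
    return Fraction(charge_c) / E

def is_realizable(charge_c):
    """A charge is physically realizable only if it is a whole
    number of electron charges."""
    return electrons_in(charge_c).denominator == 1

print(electrons_in(1))           # 6250000000000000000
print(is_realizable(1))          # True
print(is_realizable(1 + E / 2))  # False: would need half an electron
```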

Many laws have been formulated by scientists postulating the quantization of basic
physical quantities. For example, Bohr's model (built on Planck's quantum theory)
states that the angular momentum of an electron in the orbit of an atom is quantized:
the angular momentum can take only specified values, given as integral multiples of
h/2π. Thus, the smallest angular momentum an electron can have is h/2π, and the
angular momentum can increase only in steps of h/2π. If we take h/2π as one unit, then
we can say that the angular momentum of an electron is a digital quantity. Similarly,
light is known to consist of photons, and light energy comes in integral multiples of the
energy of a single photon. Thus, light is also inherently a digital quantity. And, as stated
above, charge too is quantized.

But there are some physical quantities for which quantization is yet to be established;
mass is one of them, although some believe that the quantization of mass will be
established as well.

Thus, we have seen that most known physical quantities are digital at the microscopic
level. Since we encounter them at the macroscopic level, spanning billions upon billions
of basic units, their increments seem continuous to us: the smallest incremental unit is
negligible in comparison to the actual measure of the quantity, and we perceive them
as analog in nature.

Thus, we can come to the conclusion that most of the quantities in this world are digital
at heart. Once the quantization of mass is established, we will be able to conclude with
more certainty that digital lies in the soul of this world. This "digital" is similar to our
definition of digital systems; the only difference is that it occurs at so minute a scale that
we cannot perceive it on our own.

What is Static Timing Analysis?


Static timing analysis (STA) is a method of computing the delay bounds of a complete
circuit without actually simulating it. In STA, static delays such as gate delays and net
delays are considered along each path. These delays are then compared against the
required bounds on the delay values and/or the relationships between the delays of
different gates. In STA, the circuit to be analyzed is broken down into timing paths
consisting of gates, registers and the nets connecting them. Normally, timing paths start
from and end at registers or the chip boundary. Based on the origin and termination of
data, timing paths can be categorized into four categories:
1.) Input to register paths: These paths start at chip boundary from input ports and
end at registers
2.) Register to register paths: These paths start at register output pin and terminate at
register input pin
3.) Register to output paths: These paths start at a register and end at chip boundary
output ports
4.) Input to output paths: These paths start from chip boundary at input port and end
at chip boundary at output port
The timing path from each start-point to each end-point is constrained to have a
maximum and a minimum delay. For example, for register-to-register paths, each path
can take a maximum of one clock cycle (minus the input/output delay in the case of
input/output-to-register paths). The minimum delay of a path is governed by the hold
timing requirement of the endpoint. Thus, the maximum delay taken by a timing path
governs the maximum frequency of operation.
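The core setup-check arithmetic behind this maximum-delay constraint can be sketched as follows. The numbers are illustrative, and clock latencies are defaulted to zero for brevity.

```python
# Sketch: setup slack for a reg-to-reg path, as STA would compute it.

def setup_slack(clock_period, clk_to_q, data_path_delay, setup_time,
                launch_latency=0.0, capture_latency=0.0):
    """Setup slack = required time minus arrival time.
    A non-negative result means the path meets setup."""
    arrival = launch_latency + clk_to_q + data_path_delay
    required = capture_latency + clock_period - setup_time
    return required - arrival

# 2 ns clock, 0.1 ns clk->Q, 1.6 ns of combinational delay, 0.05 ns setup:
print(round(setup_slack(2.0, 0.1, 1.6, 0.05), 3))   # 0.25 -> meets setup
```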
As stated before, static timing analysis performs timing analysis without actually
simulating the circuit. The delays of cells are picked from the respective technology
libraries, where they are tabulated on the basis of input transition and output load,
having been characterized by simulating the cells over a range of boundary conditions.
Net delays are calculated based upon R and C models.

One important characteristic of static timing analysis that must be discussed is that it
checks the static delay requirements of the circuit without applying any vectors; hence,
the delays calculated are the maximum and minimum bounds of the delays that will
occur in real application scenarios with vectors applied. This makes static timing
analysis fast and inclusive of all the boundary conditions. Dynamic timing analysis, on
the contrary, applies input vectors, and so is very slow, but it is necessary to certify the
functionality of the design. Thus, static timing analysis guarantees the timing of the
design, whereas dynamic timing analysis verifies functionality for real application-
specific input vectors.

Can a net have negative propagation delay?


As we discussed in "Is it possible for a logic gate to have negative propagation delay", a
logic cell can have negative propagation delay. The only condition we mentioned was
that the transition at the output pin should be improved so drastically that the 50% level
at the output is reached before the 50% level of the input waveform. In other words, the
only route to negative delay is an improvement in slew. As we know, a net has only
passive parasitics in the form of parasitic resistances and capacitances. Passive
elements can only degrade the transition, as they cannot provide energy (assuming no
crosstalk); they can only dissipate it. In other words, it is not possible for a net by itself
to have negative propagation delay.

However, we can have negative delay for a net, if there is crosstalk, as crosstalk can
improve the transition on a net. In other words, in the presence of crosstalk, we can
have 50% level at output reached before 50% level at input; hence, negative
propagation delay of a net.

Scan chains – the backbone of DFT

What are scan chains: Scan chains are the elements in scan-based designs that are
used to shift test data in and out. A scan chain is formed by a number of flops
connected back to back, with the output of one flop connected to the scan input of the
next. The input of the first flop is connected to an input pin of the chip (called scan-in)
from where scan data is fed. The output of the last flop is connected to an output pin of
the chip (called scan-out), which is used to take the shifted data out. The figure below
shows a scan chain.

A scan chain

Purpose of scan chains: As said above, scan chains are inserted into designs to shift
the test data into the chip and out of the chip. This is done in order to make every point
in the chip controllable and observable as discussed below.
How a normal flop is transformed into a scan flop: The flops in the design have to
be modified in order to be put in scan chains. To do so, the normal data input (D) of the
flip-flop is multiplexed with a scan input. A signal called scan-enable is used to control
which input propagates to the output.

Figure showing transition of a normal flop to scan flop

If scan-enable = 0, the data at the D pin of the flop propagates to Q at the next active
edge. If scan-enable = 1, the data present at the scan-in input propagates to Q at the
next active edge.

Scan terminology: Before we talk further, it will be useful to know some signals used in
scan chains which are as follows:
 Scan-in: Input to the flop/scan-chain that is used to provide scan data into it
 Scan-out: Output from flop/scan-chain that provides the scanned data to the
next flop/output
 Scan-enable: Input to the flop that controls whether scan_in data or functional
data will propagate to output

Purpose of testing using scan: Scan testing is carried out for various reasons, the two
most prominent being:
 To test for stuck-at faults in manufactured devices
 To test the paths in manufactured devices for delay, i.e. to test whether each
path works at functional frequency or not
How a scan chain functions: The fundamental goal of scan chains is to make each
node in the circuit controllable and observable through a limited number of patterns, by
providing a bypass path to each flip-flop. Basically, scan testing follows these steps:
1. Assert scan_enable (make it high) so as to enable the (SI -> Q) path for
each flop
2. Keep shifting in the scan data until the intended values at the intended
nodes are reached
3. De-assert scan_enable (for one pulse of the clock in case of stuck-at
testing, and two or more cycles in case of transition testing) to enable the D -> Q
path, so that the combinational cloud's output can be captured at the next clock
edge
4. Again assert scan_enable and shift out the data through scan_out
The PDF (How does scan work) provides a very good explanation of how scan chains
function.

How chain length is decided: By chain length, we mean the number of flip-flops in a
single scan chain. The larger the chain length, the more cycles are required to shift the
data in and out. However, with the total number of flops remaining the same, a smaller
chain length means more input/output ports are needed as scan_in and scan_out
ports, since:

Number of ports required = 2 x Number of scan chains

as each scan chain needs a scan_in and a scan_out port. Also,

Number of cycles required to run a pattern = Length of the longest scan chain in the design

Suppose there are 10000 flops in the design and 6 ports are available as input/output.
This means we can make (6/2 =) 3 chains. If we make scan chains of 9000, 100 and
900 flops, it will be inefficient, as 9000 cycles will be required to shift the data in and
out. We need to distribute the flops among the scan chains almost equally. If we make
the chain lengths 3300, 3400 and 3300, the number of cycles required is only 3400.

Keeping almost equal number of flops in each scan chain is referred to as chain
balancing.
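The chain-balancing arithmetic above can be sketched like this. It is a rough illustration only; real DFT tools also weigh clock domains, edge polarity and physical placement when stitching chains.

```python
# Sketch: balancing flops across scan chains. The pattern depth is set
# by the longest chain, so near-equal chains minimize test cycles.

def balanced_chains(n_flops, n_ports):
    """Split n_flops as evenly as possible over the chains the ports allow."""
    n_chains = n_ports // 2            # each chain uses a scan_in + scan_out
    base, extra = divmod(n_flops, n_chains)
    return [base + 1] * extra + [base] * (n_chains - extra)

chains = balanced_chains(10000, 6)
print(chains)        # [3334, 3333, 3333]
print(max(chains))   # 3334 cycles per pattern, vs 9000 for a 9000/900/100 split
```

An even split of 10000 flops over 3 chains gives a pattern depth of 3334, the minimum achievable, close to the 3400 quoted in the worked example above.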

Why is body connected to ground for all NMOS and not to VDD
To prevent latchup in CMOS, the body-source and body-drain diodes should not be
forward biased; i.e., the body terminal should be at the same or a lower voltage than the
source terminal (for an NMOS; for a PMOS, it should be at the same or a higher
voltage). This condition would be satisfied if we connected each NMOS body to its
respective source. But we see that all the body terminals are connected to a common
ground.

This is because all the NMOS transistors share a common substrate, and a substrate
can only be biased to one voltage. Although this introduces body effect, making
transistors slower and deviating from the ideal MOS current equation, there is no other
way in a common-substrate process.

One could achieve a different body voltage for each NMOS transistor by putting every
transistor in its own well, but that would mean a tremendous penalty in terms of area, as
wells have minimum size and separation requirements which are huge in comparison
to transistor sizes. This is why the body is connected to ground for all NMOS
transistors.

Similarly, the body of all PMOS transistors is connected to a common terminal, VDD.
