FUNDAMENTALS of CMOS VLSI 5th SEM ECE PDF
The transistor was invented by William B. Shockley, Walter Brattain and John Bardeen
of Bell Laboratories. In 1961, the first commercial ICs were introduced.
Levels of Integration:-
Moore’s Law:-
“The number of transistors embedded on the chip doubles every one and a half
years.” In the graph, the number of transistors is taken on the y-axis and the year on the
x-axis; the graph also shows the variation of the speed of the chip, in MHz, over the same
period.
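The doubling rule above can be turned into a simple projection. This is only an illustrative sketch: it assumes an exact, steady doubling period (1.5 years by default, as stated above), and the function name and starting values are hypothetical.

```python
def projected_transistor_count(n0: float, year0: float, year: float,
                               doubling_period: float = 1.5) -> float:
    """Transistor count in `year`, assuming `n0` transistors in `year0` and
    one doubling every `doubling_period` years (idealized Moore's law)."""
    return n0 * 2 ** ((year - year0) / doubling_period)


# Hypothetical example: a chip with 1,000 transistors, projected 3 years later.
print(projected_transistor_count(1000, 0, 3))  # 4000.0
```

Three years is two doubling periods, so the count grows by a factor of 4.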
The graph in Figure 2 compares the various technologies available for ICs.
From the graph we can conclude that GaAs technology offers better performance, but it is
still not widely used because of the difficulty of growing GaAs crystals. CMOS is a better
option than nMOS since it consumes less power. BiCMOS technology is
used in places where high drive capability is required, and the graph
confirms that BiCMOS consumes more power than CMOS.
Basic MOS Transistors:
Why the name MOS?
The name Metal Oxide Semiconductor (MOS) transistor comes from the structure of the
device, which consists of a layer of metal (the gate), a layer of oxide (SiO2) and a layer
of semiconductor. The cross section in Figure 3 makes this clear.
Figure 3: Cross section of a MOS structure
There are two types of MOSFETs: enhancement-mode and depletion-mode
transistors. Each type can be built as an nMOS or a pMOS transistor.
In an enhancement-mode transistor, the channel forms only after a suitable gate voltage
is applied (positive for nMOS, negative for pMOS). Both nMOS and pMOS enhancement
transistors are available.
In a depletion-mode transistor, the channel is already present because of an implant. It can be
removed by applying a suitable gate voltage (negative for nMOS). Both nMOS and pMOS
depletion-mode transistors are available.
N-MOS depletion mode transistor:-
This transistor is normally ON, even with Vgs = 0, because the channel is implanted
during fabrication. To make the channel cease to exist, a negative voltage must be
applied between gate and source.
NOTE: The mobility of electrons is 2.5 to 3 times that of holes. Hence a pMOS device
has more resistance than an nMOS device of the same dimensions.
Figure 8(a)(b)(c) Enhancement mode transistor with different Vds values
To establish the channel between the source and the drain, a minimum voltage (Vt)
must be applied between gate and source. This minimum voltage is called the “Threshold
Voltage”. The complete working of the enhancement-mode transistor can be explained with
the help of diagrams (a), (b) and (c).
a) Vgs > Vt
Vds = 0
Since Vgs > Vt and Vds = 0 the channel is formed but no current flows between
drain and source.
b) Vgs > Vt
Vds < Vgs - Vt
This region is called the non-saturation region or linear region, where the drain
current increases almost linearly with Vds. As Vds is increased, the drain end becomes more
reverse biased (hence a wider depletion region towards the drain end) and the channel starts
to narrow there. When Vds reaches Vgs - Vt, the channel just pinches off at the drain end;
this is called the pinch-off point.
c) Vgs > Vt
Vds > Vgs - Vt
This region is called the saturation region, where the drain current remains almost
constant. As the drain voltage is increased further beyond (Vgs - Vt), the pinch-off point
moves from the drain end towards the source end. Even if Vds is increased further, the
additional voltage is dropped across the depletion region, leading to an almost constant
current.
The typical threshold voltage for an enhancement-mode transistor is Vt = 0.2 * Vdd.
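Cases (a), (b) and (c) can be summarized as a small region classifier. This is a sketch for an enhancement-mode nMOS device with Vds ≥ 0; the function name is hypothetical, and treating Vgs ≤ Vt as cutoff and grouping the boundary Vds = Vgs - Vt with saturation are assumptions. The example uses the typical Vt = 0.2 * Vdd quoted above with an assumed Vdd of 5 V.

```python
def nmos_region(vgs: float, vds: float, vt: float) -> str:
    """Operating region of an enhancement-mode nMOS transistor (vds >= 0 assumed)."""
    if vgs <= vt:
        return "cutoff"       # no channel is formed
    if vds < vgs - vt:
        return "linear"       # non-saturation region; Vds = 0 gives no current
    return "saturation"       # channel pinched off; drain current almost constant


vdd = 5.0                     # assumed supply voltage
vt = 0.2 * vdd                # typical enhancement-mode threshold: 1.0 V
print(nmos_region(3.0, 0.5, vt))  # linear
print(nmos_region(3.0, 4.0, vt))  # saturation
print(nmos_region(0.5, 4.0, vt))  # cutoff
```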
Depletion mode Transistor action:-
The working of the depletion-mode transistor can be explained in the same manner as that of
the enhancement-mode transistor. The only difference is that the channel is established by the
implant even when Vgs = 0, and the channel can be cut off by applying a negative voltage
between the gate and source. The threshold voltage of a depletion-mode transistor is around
-0.8 * Vdd.
NMOS Fabrication:
Figure 9 NMOS Fabrication process steps
The process starts with the oxidation of the silicon substrate (Fig. 9(a)), in which a
relatively thick silicon dioxide layer, also called field oxide, is created on the surface
(Fig. 9(b)). Then, the field oxide is selectively etched to expose the silicon surface on
which the MOS transistor will be created (Fig. 9(c)). Following this step, the surface
is covered with a thin, high-quality oxide layer, which will eventually form the gate
oxide of the MOS transistor (Fig. 9(d)). On top of the thin oxide, a layer of
polysilicon (polycrystalline silicon) is deposited (Fig. 9(e)). Polysilicon is used both
as gate electrode material for MOS transistors and also as an interconnect medium in
silicon integrated circuits. Undoped polysilicon has relatively high resistivity. The
resistivity of polysilicon can be reduced, however, by doping it with impurity atoms.
After deposition, the polysilicon layer is patterned and etched to form the
interconnects and the MOS transistor gates (Fig. 9(f)). The thin gate oxide not
covered by polysilicon is also etched away, which exposes the bare silicon surface on
which the source and drain junctions are to be formed (Fig. 9(g)). The entire silicon
surface is then doped with a high concentration of impurities, either through diffusion
or ion implantation (in this case with donor atoms to produce n-type doping). Figure
9(h) shows that the doping penetrates the exposed areas on the silicon surface,
ultimately creating two n-type regions (source and drain junctions) in the p-type
substrate. The impurity doping also penetrates the polysilicon on the surface,
reducing its resistivity. Note that the polysilicon gate, which is patterned before
doping, actually defines the precise location of the channel region and, hence, the
location of the source and the drain regions. Since this procedure allows very precise
positioning of the two regions relative to the gate, it is also called the self-aligned
process. Once the source and drain regions are completed, the entire surface is again
covered with an insulating layer of silicon dioxide (Fig. 9 (i)). The insulating oxide
layer is then patterned in order to provide contact windows for the drain and source
junctions (Fig. 9 (j)). The surface is covered with evaporated aluminum which will
form the interconnects (Fig. 9 (k)). Finally, the metal layer is patterned and etched,
completing the interconnection of the MOS transistors on the surface (Fig. 9 (l)).
Usually, a second (and third) layer of metallic interconnect can also be added on top
of this structure by creating another insulating oxide layer, cutting contact (via) holes,
depositing, and patterning the metal.
P-WELL PROCESS:
The p-well process starts with an n-type substrate. The n-type substrate can be used
to implement the pMOS transistors, but to implement the nMOS transistors we
need to provide a p-well; in this way, both nMOS and pMOS transistors are placed
on the same n-type substrate.
Mask sequence.
Mask 1:
Mask 1 defines the areas in which the deep p-well diffusion takes place.
Mask 2:
It defines the thin oxide region (where the thick oxide is to be removed or
stripped and thin oxide grown)
Mask 3:
It’s used to pattern the polysilicon layer which is deposited after thin oxide.
Mask 4:
A p+ mask (ANDed with mask 2) to define areas where p-diffusion is to take
place.
Mask 5:
The negative (complement) of mask 4 (the p+ mask) is used. It defines where n-diffusion is
to take place.
Mask 6:
Contact cuts are defined using this mask.
Mask 7:
The metal layer pattern is defined by this mask.
Mask 8:
An overall passivation (overglass) is now applied and it also defines openings
for accessing pads.
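The way masks 4 and 5 are derived can be illustrated by treating each mask as a grid of booleans (True = open area). This is a toy sketch: the one-row patterns are hypothetical, and ANDing the negative of mask 4 with mask 2 for the n-diffusion areas is an assumption (the text only says mask 5 is the negative of mask 4), reflecting that diffusion reaches the silicon only where the thick oxide was stripped.

```python
# Each mask is modelled as a small grid of booleans (True = open area).
Grid = list[list[bool]]


def mask_and(a: Grid, b: Grid) -> Grid:
    """Logical AND of two masks of the same size."""
    return [[x and y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mask_not(a: Grid) -> Grid:
    """Negative (complement) of a mask."""
    return [[not x for x in row] for row in a]


# Hypothetical one-row patterns, for illustration only.
thin_oxide = [[True, True, False, True, True]]   # mask 2: thin-oxide regions
p_plus = [[True, True, False, False, False]]     # mask 4: p+ mask

p_diffusion = mask_and(p_plus, thin_oxide)             # mask 4 ANDed with mask 2
n_diffusion = mask_and(mask_not(p_plus), thin_oxide)   # negative of mask 4, ANDed with mask 2 (assumed)
print(p_diffusion)  # [[True, True, False, False, False]]
print(n_diffusion)  # [[False, False, False, True, True]]
```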
N-WELL PROCESS:
In the following figures, some of the important process steps involved in the
fabrication of a CMOS inverter will be shown by a top view of the lithographic masks
and a cross-sectional view of the relevant areas.
The n-well CMOS process starts with a moderately doped (with impurity
concentration typically less than 10^15 cm^-3) p-type silicon substrate. Then, an initial
oxide layer is grown on the entire surface. The first lithographic mask defines the n-well
region. Donor atoms, usually phosphorus, are implanted through this window in the
oxide. Once the n-well is created, the active areas of the nMOS and pMOS transistors can
be defined. Figures 12.1 through 12.6 illustrate the significant milestones that occur
during the fabrication process of a CMOS inverter.
Figure-12.1: Following the creation of the n-well region, a thick field oxide is grown in
the areas surrounding the transistor active regions, and a thin gate oxide is grown on top
of the active regions. The thickness and the quality of the gate oxide are two of the most
critical fabrication parameters, since they strongly affect the operational characteristics of
the MOS transistor, as well as its long-term reliability.
Figure-12.2: The polysilicon layer is deposited using chemical vapor deposition (CVD)
and patterned by dry (plasma) etching. The created polysilicon lines will function as the
gate electrodes of the nMOS and the pMOS transistors and their interconnects. Also, the
polysilicon gates act as self-aligned masks for the source and drain implantations that
follow this step.
Figure-12.3: Using a set of two masks, the n+ and p+ regions are implanted into the
substrate and into the n-well, respectively. Also, the ohmic contacts to the substrate and
to the n-well are implanted in this process step.
Figure-12.4: An insulating silicon dioxide layer is deposited over the entire wafer using
CVD. Then, the contacts are defined and etched away to expose the silicon or polysilicon
contact windows. These contact windows are necessary to complete the circuit
interconnections using the metal layer, which is patterned in the next step.
Figure-12.5: Metal (aluminum) is deposited over the entire chip surface using metal
evaporation, and the metal lines are patterned through etching. Since the wafer surface is
non-planar, the quality and the integrity of the metal lines created in this step are very
critical and are ultimately essential for circuit reliability.
Figure-12.6: The composite layout and the resulting cross-sectional view of the chip,
showing one nMOS and one pMOS transistor (built in the n-well), the polysilicon and metal
interconnections. The final step is to deposit the passivation layer (for protection) over
the chip, except for wire-bonding pad areas.
Twin-tub process:
Here we use both the p-well and the n-well approaches. The starting point is an n-type
material, in which both n-well and p-well regions are created. To create both wells, we
first grow an epitaxial layer and then create both wells on the same substrate.
NOTE: The twin-tub process is one of the solutions to the latch-up problem.
Bi-CMOS technology: - (Bipolar CMOS)
The driving capability of MOS transistors is limited because of their limited current
sourcing and sinking capabilities. To drive large capacitive loads, we can
use BiCMOS technology.
This technology combines bipolar and CMOS transistors in a single integrated
circuit. By retaining the benefits of both bipolar and CMOS, BiCMOS is able to achieve VLSI
circuits with speed-power-density performance previously unattainable with either
technology individually.
The diagram given below shows the cross section of the BiCMOS process which
uses an npn transistor.
The figure below shows the layout view of the BiCMOS process.
The graph below shows the relative cost vs. gate delay.
In this topic we will see how masks are prepared using e-beam
technology. The following are the steps in the production of e-beam masks.
• The starting material is chromium-coated glass plates, which are coated with an
e-beam-sensitive resist.
• The e-beam machine is loaded with the mask description data.
• The plates are loaded into the e-beam machine, where they are exposed with the patterns
specified by the mask description data.
• After exposure to the e-beam, the plates are placed in a developer to bring out the
patterns.
• This is followed by a bake cycle, which removes resist residue.
• The chrome is then etched, and the plate is stripped of the remaining e-beam resist.
Two types of scanning, raster scanning and vector scanning, are used to map the
pattern onto the mask.
In raster scanning, the e-beam scans all possible locations, and a bit map is used to turn the
beam on and off depending on whether the particular location being scanned is to be
exposed or not.
In vector scanning, the beam is directed only to those locations which are to be exposed.
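The difference between the two scanning methods can be sketched by counting how many locations the beam visits for the same pattern. This is a toy model: the mask is a small grid, the pattern is given as a set of exposed (row, column) locations, and the function names are hypothetical.

```python
Location = tuple[int, int]


def raster_scan(pattern: set[Location], rows: int, cols: int) -> tuple[set[Location], int]:
    """Visit every location; the bit map decides whether the beam is on."""
    exposed: set[Location] = set()
    visited = 0
    for r in range(rows):
        for c in range(cols):
            visited += 1                # the beam passes over every location
            if (r, c) in pattern:       # bit = 1: beam turned on here
                exposed.add((r, c))
    return exposed, visited


def vector_scan(pattern: set[Location]) -> tuple[set[Location], int]:
    """Direct the beam only to the locations that must be exposed."""
    exposed: set[Location] = set()
    visited = 0
    for loc in sorted(pattern):
        visited += 1
        exposed.add(loc)
    return exposed, visited


# Hypothetical 3 x 3 mask with one vertical line of exposed locations.
line = {(0, 1), (1, 1), (2, 1)}
print(raster_scan(line, 3, 3)[1])  # 9 locations visited
print(vector_scan(line)[1])        # 3 locations visited
```

Both methods expose the same locations; vector scanning saves time when only a small fraction of the mask area is to be exposed.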
MOS TRANSISTOR THEORY
Introduction
A MOS transistor is a majority-carrier device, in which the current in a
conducting channel between the source and the drain is modulated by a voltage applied to
the gate.
Symbols
Relationship between Vgs and Ids, for a fixed Vds:
Devices that are normally cut off with zero gate bias are classified as "enhancement-mode"
devices.
Devices that conduct with zero gate bias are called "depletion-mode" devices.
Enhancement-mode devices are more popular in practical use.
The three MOS operating regions are: the cutoff or subthreshold region, the linear region and
the saturation region.
The following equation describes all these three regions:
where β is the MOS transistor gain factor, given by β = (µε/tox)(W/L), and
‘µ’ is the mobility of the charge carriers,
‘ε’ is the permittivity of the oxide layer,
‘tox’ is the thickness of the oxide layer,
‘W’ is the width of the transistor (shown in the diagram),
‘L’ is the channel length of the transistor (shown in the diagram).
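The drain-current equation referred to above is not reproduced in this copy. As an illustration only, the sketch below assumes the standard long-channel (Shockley) square-law model, which uses the same β: Ids = 0 in cutoff, Ids = β[(Vgs - Vt)Vds - Vds²/2] in the linear region, and Ids = (β/2)(Vgs - Vt)² in saturation. The function names and example values are hypothetical.

```python
def beta(mu: float, eps_ox: float, t_ox: float, w: float, l: float) -> float:
    """MOS transistor gain factor: beta = (mu * eps / t_ox) * (W / L)."""
    return (mu * eps_ox / t_ox) * (w / l)


def drain_current(vgs: float, vds: float, vt: float, b: float) -> float:
    """Long-channel square-law drain current of an nMOS device (vds >= 0).

    Assumed model; subthreshold current and second-order effects are ignored."""
    vov = vgs - vt                               # gate overdrive Vgs - Vt
    if vov <= 0:
        return 0.0                               # cutoff
    if vds < vov:
        return b * (vov * vds - vds ** 2 / 2)    # linear (non-saturation) region
    return b / 2 * vov ** 2                      # saturation region


b = 1e-3  # hypothetical beta in A/V^2
print(drain_current(3.0, 1.0, 1.0, b))  # linear region, about 1.5 mA
print(drain_current(3.0, 4.0, 1.0, b))  # saturation region, about 2 mA
```

At Vds = Vgs - Vt both expressions give (β/2)(Vgs - Vt)², so the modelled curve is continuous at the pinch-off point.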
Second Order Effects:
The following are the second-order effects of the MOSFET.
$$V_t = V_{t(0)} + \gamma\left[\sqrt{V_{sb} + 2\Phi_F} - \sqrt{2\Phi_F}\right]$$
where
Vt is the threshold voltage,
Vt(0) is the threshold voltage without body effect
γ is the body coefficient factor
ΦF is the fermi potential
Vsb is the potential difference between the source and the substrate.
If Vsb is zero, then Vt = Vt(0); that is, the threshold voltage is not changed by the body
effect. Therefore, where possible, the source is shorted to the substrate so that Vsb
is zero.
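The body-effect equation above can be evaluated directly to see how the threshold voltage rises with Vsb. The parameter values in this sketch (Vt(0) = 0.7 V, γ = 0.4 V^0.5, ΦF = 0.3 V) and the function name are hypothetical.

```python
import math


def threshold_voltage(vt0: float, gamma: float, vsb: float, phi_f: float) -> float:
    """Threshold voltage including the body effect:
    Vt = Vt(0) + gamma * (sqrt(Vsb + 2*PhiF) - sqrt(2*PhiF))."""
    return vt0 + gamma * (math.sqrt(vsb + 2 * phi_f) - math.sqrt(2 * phi_f))


# Hypothetical parameters: Vt(0) = 0.7 V, gamma = 0.4 V^0.5, PhiF = 0.3 V.
print(threshold_voltage(0.7, 0.4, 0.0, 0.3))            # Vsb = 0: equals Vt(0)
print(round(threshold_voltage(0.7, 0.4, 1.0, 0.3), 3))  # Vsb = 1 V: Vt rises
```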
Subthreshold region:
Even for Vgs < Vt, some drain current flows; this is called the subthreshold current,
and the region is called the subthreshold region.
Mobility:
Mobility is defined as the ease with which the charge carriers drift in
the substrate material. Mobility decreases with increasing doping concentration
and increasing temperature. Mobility is the ratio of the average carrier drift velocity
to the electric field, and it is represented by the symbol µ.
Drain punchthrough:
When the drain is at a high voltage, the depletion region around the drain
may extend to the source, causing current to flow even if the gate voltage is zero.
This is known as the punchthrough condition.
MOS Models
A MOS model includes the ideal equations, the second-order effects and the
additional curve-fitting parameters. Many semiconductor vendors expend a lot of
effort to model the devices they manufacture (standard: Level 3 SPICE). Main
SPICE DC parameters in levels 1, 2 and 3 for a 1 µm n-well CMOS process.
CMOS INVERTER CHARACTERISTICS
A CMOS inverter contains a PMOS and an NMOS transistor connected at the drain
and gate terminals, a supply voltage VDD at the PMOS source terminal, and ground
connected at the NMOS source terminal, where VIN is connected to the gate terminals and
VOUT is connected to the drain terminals (as given in the diagram). It is important to notice
that the CMOS inverter does not contain any resistors, which makes it more power efficient than a
regular resistor-MOSFET inverter. As the voltage at the input of the CMOS device varies
between 0 and VDD, the states of the NMOS and PMOS vary accordingly. If we model
each transistor as a simple switch activated by VIN, the inverter's operation can be seen
very easily:
The table given explains when each transistor turns on and off. When
VIN is low, the NMOS is "off" while the PMOS is "on", charging VOUT to
logic high. When VIN is high, the NMOS is "on" and the PMOS is "off", pulling the
voltage at VOUT down to logic low.
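The switch model described above can be written out as a small truth-table generator. This sketch treats each transistor as an ideal switch with logic-level inputs only; the function name is hypothetical.

```python
def inverter_switch_model(vin: int) -> tuple[bool, bool, int]:
    """Ideal-switch model of the CMOS inverter for a logic input of 0 or 1.

    Returns (nmos_on, pmos_on, vout)."""
    nmos_on = vin == 1           # nMOS conducts when the input is high
    pmos_on = vin == 0           # pMOS conducts when the input is low
    # Exactly one switch is closed: the pMOS connects VOUT to VDD,
    # the nMOS connects VOUT to ground.
    vout = 1 if pmos_on else 0
    return nmos_on, pmos_on, vout


for vin in (0, 1):
    nmos_on, pmos_on, vout = inverter_switch_model(vin)
    print(f"VIN={vin}  NMOS={'on' if nmos_on else 'off'}  "
          f"PMOS={'on' if pmos_on else 'off'}  VOUT={vout}")
```

Because one switch is always open, there is no direct path from VDD to ground in either steady state.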
Inverter DC Characteristics:
The actual characteristic is also given here for reference. It shows the status of both
the NMOS and PMOS transistors in all the regions of the characteristic.
Graphical Derivation of Inverter DC Characteristics:
The actual characteristic is drawn by plotting the output voltage for
different values of the input voltage. It can also be drawn starting from
the V-I characteristics of the PMOS and NMOS transistors.
Figure 7d: CMOS inverter DC characteristics
Figure 7d shows five regions, namely regions A, B, C, D and E. It also shows a dotted
curve, which is the current drawn by the inverter.
Region A:
The output in this region is high because the P device is ON and the N device is OFF:
the NMOS is in the cutoff region while the PMOS is on, so the output is at logic high.
The analysis of the inverter in region B is given below:
Region B:
The equivalent circuit of the inverter in region B is given below.
In this region the PMOS is in the linear region and the NMOS is in the saturation region.
The expression for the voltage Vo can be written as
Region C:
The equivalent circuit of the CMOS inverter in region C is given here.
Both the n and p transistors are in the saturation region, so we can equate the two currents
to obtain the expression for the midpoint voltage, or switching point voltage, of the
inverter. The corresponding equations are as follows:
Region D: the equivalent circuit for region D is given in the figure below.
Applying the same analysis as for regions B and C, we can obtain the
expression for the output voltage.
Region E:
The output in this region is zero because the P device is OFF and the N device is ON.
Figure 11: Effect of βn/βp ratio change on the DC characteristics of CMOS inverter
The characteristic shifts left if the ratio βn/βp is greater than 1 (say 10), and it
shifts right if βn/βp is less than 1 (say 0.1). This is decided by the
switching point equation of region C, which is repeated here for reference:
$$V_m = V_{sp} = \frac{V_{DD} + V_{tp} + V_{tn}\sqrt{\beta_n/\beta_p}}{1 + \sqrt{\beta_n/\beta_p}}$$
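The region C switching-point equation can be evaluated numerically to see the shift described above. This sketch assumes Vtp is entered as a negative number and uses hypothetical values Vdd = 5 V, Vtn = 1 V and Vtp = -1 V; the function name is also hypothetical.

```python
import math


def switching_point(vdd: float, vtn: float, vtp: float,
                    beta_n: float, beta_p: float) -> float:
    """Inverter switching point Vm from the region C equation (vtp < 0)."""
    r = math.sqrt(beta_n / beta_p)
    return (vdd + vtp + vtn * r) / (1 + r)


# Vm moves left (lower) as beta_n / beta_p increases.
for ratio in (0.1, 1.0, 10.0):   # beta_n / beta_p
    print(ratio, round(switching_point(5.0, 1.0, -1.0, ratio, 1.0), 3))
```

With βn = βp and Vtn = -Vtp, the switching point sits exactly at VDD/2.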
Noise Margin:
Noise margin is a parameter related to the input-output characteristics. It determines the
allowable noise voltage on the input such that the output is not affected.
We specify it in terms of two quantities:
LOW noise margin
HIGH noise margin
LOW noise margin: defined as the difference in magnitude between the maximum
LOW output voltage of the driving gate and the maximum input LOW voltage recognized
by the driven gate.
NML = |VILmax – VOLmax|
HIGH noise margin: defined as the difference in magnitude between the minimum HIGH output
voltage of the driving gate and the minimum input HIGH voltage recognized by the receiving
gate.
NMH = |VOHmin – VIHmin|
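The two definitions above translate directly into code. The voltage levels in this sketch are hypothetical values for a 5 V logic family, and the function name is an assumption.

```python
def noise_margins(vol_max: float, voh_min: float,
                  vil_max: float, vih_min: float) -> tuple[float, float]:
    """Return (NML, NMH) from the definitions above."""
    nml = abs(vil_max - vol_max)   # LOW noise margin
    nmh = abs(voh_min - vih_min)   # HIGH noise margin
    return nml, nmh


# Hypothetical levels for a 5 V logic family.
print(noise_margins(vol_max=0.5, voh_min=4.5, vil_max=1.5, vih_min=3.5))  # (1.0, 1.0)
```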
The figure shows how the noise margins are found from the input and output levels.
The noise margins of a CMOS inverter can be found in the same way; the following figure
gives the idea of calculating them.
Pseudo-NMOS inverter:
This circuit uses a p device as the load, which is kept permanently on by
connecting its gate terminal to ground.
Transmission gates:
It is a parallel combination of a pMOS and an nMOS transistor whose gates are driven by
complementary inputs. We will come back to TGs after looking into the various issues of
pass transistors.
Pass transistors:
We have n and p pass transistors.
The disadvantage of pass transistors is that they cannot transfer both
logic levels properly. The following table explains this in detail.
If Vdd (5 volts) is transferred using an nMOS, the output is (Vdd - Vtn): POOR 1,
or weak logic 1.
If Gnd (0 volts) is transferred using an nMOS, the output is Gnd: GOOD 0, or
strong logic 0.
If Vdd (5 volts) is transferred using a pMOS, the output is Vdd: GOOD 1, or
strong logic 1.
If Gnd (0 volts) is transferred using a pMOS, the output is |Vtp|: POOR 0, or
weak logic 0.
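The four cases above can be reproduced with a simple level model. This sketch assumes the nMOS gate is held at VDD and the pMOS gate at 0 V, the input is either 0 or VDD, Vtp is a negative number, and the body effect is ignored; the function name and the 1 V thresholds are hypothetical.

```python
def pass_transistor_output(device: str, vin: float, vdd: float,
                           vtn: float, vtp: float) -> float:
    """Ideal level passed by a single pass transistor whose gate is fully on.

    Assumptions: the nMOS gate is at VDD, the pMOS gate is at 0 V, vin is
    either 0 or VDD, vtp is negative, and the body effect is ignored."""
    if device == "nmos":
        return min(vin, vdd - vtn)     # a 1 is degraded to VDD - Vtn
    if device == "pmos":
        return max(vin, abs(vtp))      # a 0 is degraded to |Vtp|
    raise ValueError("device must be 'nmos' or 'pmos'")


for device in ("nmos", "pmos"):
    for vin in (0.0, 5.0):
        print(device, vin, pass_transistor_output(device, vin, 5.0, 1.0, -1.0))
```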
Transmission gates (TGs):
It is a parallel combination of a pMOS and an nMOS transistor whose gates are driven by
complementary inputs. The weak 0 and weak 1 problems can be overcome by using a
TG instead of a single pass transistor.
The working of the transmission gate can be explained better with the following equation.
Another important advantage of TGs is reduced resistance, because the two
transistors are in parallel, as shown in the graph. The graph shows the
resistance of the n and p pass transistors and the resistance of the TG, which is lower than
that of the other two.
Figure 19: Graph of resistance vs. input for pass transistors and TG
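The resistance advantage of the TG comes from its two channels being in parallel, R_TG = Rn·Rp/(Rn + Rp), which is always smaller than either resistance alone. The sketch below computes this at a single operating point; the resistance values and function name are hypothetical, and in practice both Rn and Rp vary with the input voltage, as the graph shows.

```python
def parallel_resistance(r_n: float, r_p: float) -> float:
    """Effective on-resistance of a TG: the nMOS and pMOS channels in parallel."""
    return r_n * r_p / (r_n + r_p)


# Hypothetical channel resistances (ohms) at one input voltage.
r_n, r_p = 10e3, 25e3
r_tg = parallel_resistance(r_n, r_p)
print(round(r_tg))   # 7143 ohms, lower than both r_n and r_p
```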
Tristate Inverter:
By cascading a transmission gate with an inverter the tristate inverter circuit can
be obtained. The working can be explained with the help of the circuit.
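The behaviour of the tristate inverter can be summarized in a small model. This sketch assumes an active-high enable and represents the high-impedance state Z by `None`; it models only the logic behaviour, not the circuit, and the function name is hypothetical.

```python
from typing import Optional


def tristate_inverter(vin: int, enable: int) -> Optional[int]:
    """Behavioural model of a tristate inverter (active-high enable assumed).

    Returns the inverted input when enabled, or None to stand for the
    high-impedance state Z when the transmission gate is off."""
    if enable:
        return 1 - vin      # TG conducts: the circuit acts as an inverter
    return None             # TG off: output floats (Z)


for enable in (1, 0):
    for vin in (0, 1):
        out = tristate_inverter(vin, enable)
        print(f"EN={enable} VIN={vin} OUT={'Z' if out is None else out}")
```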
Unit 2
Circuit Design Process
In this chapter we will study how to translate a schematic into stick diagrams or
layouts.
MOS circuits are formed on four basic layers:
N-diffusion
P-diffusion
Polysilicon
Metal
These layers are isolated from one another by thick or thin silicon dioxide insulating
layers.
The thin-oxide mask region includes n-diffusion, p-diffusion and the transistor channel.
Stick diagrams:
Stick diagrams may be used to convey layer information through the use of a color code.
For example: n-diffusion -- green
polysilicon -- red
metal -- blue
implant -- yellow
contact areas -- black
Figure 1 shows how an n-transistor is formed: a transistor is formed when a green line (n+
diffusion) crosses a red line (poly) completely. The figure also shows how a depletion-mode
transistor is represented in stick format.
Figure 2 shows the same rule for an n-transistor (a green n+ diffusion line crossing a red
poly line completely), and it also shows when a p-transistor is formed: a transistor is
formed when a yellow line (p+ diffusion) crosses a red line (poly) completely.
Figure 3: BiCMOS encodings
With CMOS there are two types of diffusion: n-type is drawn in green and p-type in
brown.
These are on the same layer in the chip and must not meet. In fact, the method of
fabrication requires that they be kept relatively far apart.
Modern CMOS processes usually support more than one layer of metal. Two are
common and three or more are often available.
Actually, these conventions for colors are not universal; in particular, industrial (rather
than academic) systems tend to use red for diffusion and green for polysilicon. Moreover,
a shortage of colored pens normally means that both types of diffusion in CMOS are
colored green and the polarity indicated by drawing a circle round p-type transistors or
simply inferred from the context. Colorings for multiple layers of metal are even less
standard.
Figure 4 shows the schematic, stick diagram and corresponding layout of an nMOS
depletion-load inverter.
Figure 5 shows the schematic, stick diagram and corresponding layout of a CMOS inverter.
Figure 6: nMOS depletion load NAND and NOR stick diagram
Figure 6 shows the stick diagrams for nMOS NOR and NAND.
Figure 7 shows the stick diagram of an nMOS implementation of the function f = [(xy) + z]'.
Figure 8 shows the stick diagrams of CMOS NOR and NAND, where we can see that the p-diffusion
line never touches the n-diffusion directly; they are always joined using a blue
metal line.
Figure 9 shows the stick diagram of a dynamic shift register in CMOS style. Here the
output of each TG is connected as the input to an inverter, and the same chain continues
depending on the number of bits.
Design Rules:
Design rules include width rules and spacing rules. Mead and Conway developed
a set of simplified scalable λ -based design rules, which are valid for a range of
fabrication technologies. In these rules, the minimum feature size of a technology is
characterized as 2 λ . All width and spacing rules are specified in terms of the parameter
λ . Suppose we have design rules that call for a minimum width of 2 λ , and a minimum
spacing of 3 λ. If we select a 2 µm technology (i.e., λ = 1 µm), the above rules are
translated to a minimum width of 2 µm and a minimum spacing of 3 µm. On the other
hand, if a 1 µm technology (i.e., λ = 0.5 µm) is selected, then the same width and spacing
rules are now specified as 1 µm and 1.5 µm, respectively.
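The λ-to-micron conversion in the example above is a single scaling step. This sketch assumes, as the text does, that the minimum feature size equals 2λ; the function name is hypothetical.

```python
def rule_in_microns(rule_in_lambda: float, feature_size_um: float) -> float:
    """Convert a lambda-based rule to microns; the minimum feature size is 2 lambda."""
    lam = feature_size_um / 2.0
    return rule_in_lambda * lam


for feature in (2.0, 1.0):   # the 2 um and 1 um technologies from the text
    width = rule_in_microns(2, feature)     # minimum width rule: 2 lambda
    spacing = rule_in_microns(3, feature)   # minimum spacing rule: 3 lambda
    print(f"{feature} um technology: width {width} um, spacing {spacing} um")
```

This reproduces the 2 µm / 3 µm and 1 µm / 1.5 µm figures quoted above, which is why the same λ-based layout can be reused across technologies.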
Figure 10: Design rules for the diffusion layers and metal layers
Figure 10 shows the design rules for n-diffusion, p-diffusion, poly, metal1 and metal2.
The n and p diffusion lines have a minimum width of 2λ and a minimum spacing of
3λ. The rules for the other layers are shown in the same way.
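The width and spacing rules above are what a design rule check (DRC) verifies. Real DRC tools work on 2-D polygons; this is only a 1-D sketch along one axis, assuming non-overlapping shapes given as (start, end) pairs in units of λ, and the function name is hypothetical. It applies the diffusion rules from the text (width 2λ, spacing 3λ).

```python
def check_layer(segments: list[tuple[float, float]],
                min_width: float, min_spacing: float) -> list[str]:
    """1-D design rule check for one layer; segments are (start, end) in lambda."""
    violations = []
    ordered = sorted(segments)
    for start, end in ordered:
        if end - start < min_width:
            violations.append(f"width {end - start} < {min_width} at {start}")
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        gap = start2 - end1
        if gap < min_spacing:
            violations.append(f"spacing {gap} < {min_spacing} between {end1} and {start2}")
    return violations


# Diffusion rules from the text: minimum width 2 lambda, minimum spacing 3 lambda.
print(check_layer([(0, 2), (5, 7)], 2, 3))   # [] : no violations
print(check_layer([(0, 1), (3, 5)], 2, 3))   # one width and one spacing violation
```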
Figure 11: Design rules for transistors and gate over hang distance
The figure shows the design rules for the transistor, and it also shows that the poly should
extend a minimum of 2λ beyond the diffusion boundaries (the gate overhang distance).
What is a Via?
A via is used to connect higher-level metal layers to metal1. The cross section and
layout view given in figure 13 explain the via in a better way.
Figure 12: Cross section showing the contact cut and via
The figure shows the design rules for contact cuts and vias. The design rule for a contact is
a minimum of 2λ × 2λ, and the same applies to a via.
Buried contact: The contact cut is made down through each layer to be joined, as shown in
figure 14.
Butting contact: The layers are butted together in such a way that the two contact cuts
become contiguous. The butting contact can be better understood from figure 15.
Orbit 2µm CMOS process:
In this process all the spacings between layers and all the dimensions are given in
micrometers. The 2µm here represents the feature size. The design rules we have seen
will not be expressed in lambda; instead they will have the actual dimensions in
micrometers.
In one way, lambda-based design rules are better than micrometer-based design
rules: lambda-based rules are independent of feature size.
Figure 17 shows the design rules for the BiCMOS process using the Orbit 2µm process.
The following is an example stick diagram and layout for a two-way selector with enable (2:1 MUX).
Unit 3
Different design tricks need to be used to avoid creating unintended structures. For example,
a combination of metal1 and metal2 can be used to avoid shorts. Usually metal2 is used
for the global VDD and VSS lines and metal1 for local connections.
The diagram shown here is the stick diagram for the CMOS inverter. It consists of a
pMOS and an nMOS connected to produce the inverted output. When the input is low, the pMOS
(yellow) is on and pulls the output up to VDD; hence it is called the pull-up device. When
VIN = 1, the nMOS (green) is on and pulls VOUT down to VSS; hence the nMOS is the pull-down
device. The red lines are the polysilicon lines connecting the gates, and the blue lines are
the metal lines for VDD (top) and VSS (bottom). The layout of the CMOS inverter is shown below.
The layout also gives the minimum dimensions of the different layers along with the logical
connections. The main advantage of a layout is that it can be simulated and checked for
errors, which cannot be done with stick diagrams alone.
Figure 3: Layout of an inverter
The layout shown above is that of a CMOS inverter. It consists of p-diffusion (yellow)
forming the pMOS where the diffusion crosses the polysilicon (red), shown hatched, and
n-diffusion (green) forming the nMOS (hatched area). The different layers drawn are
checked for their dimensions using the design rule check (DRC) of the tool used for
drawing. Only after the DRC is passed can the design proceed further. The design then
undergoes layout versus schematic (LVS) checks, and finally the parasitics can be
extracted.
Figure 5: Stick diagram of nand gate
Figure 8: Layout of nor gate
TRANSMISSION GATE
Figure 11: TG with nMOS switches
CMOS STANDARD CELL DESIGN
Geometric regularity is very important to maintain common electrical
characteristics between the cells in the library. The common physical constraint is to
fix the cell height and vary the width according to the required function. Wp and Wn
are fixed considering power dissipation, propagation delay, area and noise immunity.
The best approach is to set a required objective function and then choose Wn and Wp to
meet that objective.
Usually in CMOS, Wn is made equal to Wp. In the process of designing these gates,
techniques may be employed to automatically generate gates of a common size.
Later, optimization can be carried out to achieve a specific feature. Gate array layouts
and sea-of-gates layouts are constructed using the above techniques. Gate arrays
may be customized by having routing channels between arrays of gates. The gate
array and the sea of gates have some special layout considerations. Gate arrays
use a fixed image of the underlying layers, i.e., the diffusion and poly are fixed and the
metal is programmable. The wiring layers are discretionary and provide the personalization
of the array. The rows of transistors are fixed and routing channels are provided
between them. Hence the design issues involve the size of the transistors, the
connectivity of the poly and the number of routing channels required.
In the sea-of-gates style, continuous rows of n and p diffusion run across the master
chip and are arranged without regard to routing channels. The routing is then
done across unused transistors, saving space.
3. Vertical polysilicon for each gate input
4. Order the polysilicon gate signals for maximal connection between transistors
5. The connectivity requires placing the nMOS close to VSS and the pMOS close to VDD
6. Connections to complete the logic must be made using poly, metal and even metal2
The design must always proceed towards optimization. Here optimization is at the
transistor level rather than the gate level. Since the density of transistors is large, we
can obtain a smaller and faster layout by designing logic blocks of 1000 transistors
instead of considering a single gate at a time and then putting the gates together. Density
can also be improved by optimizing the other factors in the
layout.
The factors are
1. Efficient use of routing space. Routes can be placed over the cells or even in multiple
layers.
2. Source and drain connections must be merged better.
3. White (blank) spaces must be minimized.
4. The devices must be of optimum sizes.
5. Transparent routing can be provided for cell-to-cell interconnection; this reduces
global wiring problems.
LAYOUT OPTIMIZATION FOR PERFORMANCE
1.Vary the size of the transistor according to its position in series. The transistor
closest to the output is the smallest. The transistor nearest to the VSS line is the
largest. This helps in increasing the performance by 30 %. A three input nand gate
with the varying size is shown next.
2. Less optimized gates can occur even in the case of parallel-connected
transistors. This is usually seen in parallel inverters, NOR and NAND gates. When drains are
connected in parallel, we must try to reduce the number of drains in parallel, i.e.,
wherever possible we must try to connect drains in series, at least at the output. This
arrangement reduces the capacitance at the output, enabling good voltage levels.
One example is as shown next.
N1 and N2 supply current to the base of the NPN2 transistor when the output is
high, so NPN2 can pull the output down at higher speed. When the output is low, N3
clamps the base current of NPN2, and P1 and P2 supply the base current to NPN1.
PSEUDO NMOS LOGIC
In this logic structure the pull-up network is replaced by a single pull-up pMOS whose
gate is permanently grounded. The pMOS is therefore always on, and an n-input gate now
needs only n+1 transistors. The structure is equivalent to the depletion-mode nMOS
logic that preceded CMOS technology, hence the name pseudo-nMOS. The two sections of
the gate are now called the load and the driver.
The ratio βn/βp (βdriver/βload) has to be chosen so that sufficient gain is achieved to
give consistent pull-up and pull-down levels. This involves ratioed transistor sizes to
obtain correct operation. However, if minimum-size drivers are used, the gain of the
load has to be reduced to obtain an adequate noise margin.
The main features of the design are highlighted next:
1.Each input of a CMOS gate presents two unit gate loads (one nMOS and one pMOS), but
in pseudo-nMOS logic it presents only one.
2.Since the number of transistors per input is reduced, the area is reduced drastically.
The disadvantage is that, since the pMOS is always on, static power is dissipated
whenever the nMOS driver network is on. Hence, in order to use pseudo-nMOS logic, a
trade-off between size and load on one side and power dissipation on the other has to be made.
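To make the ratio trade-off concrete, here is a minimal sketch of the pseudo-nMOS low output level VOL and its static power. It assumes a simple square-law model with Vtn = |Vtp| = 1 V and VDD = 5 V, the nMOS driver in its linear region and the grounded-gate pMOS load in saturation; these voltages and the βp value in the usage line are illustrative assumptions, not figures from the text.

```python
import math


def pseudo_nmos_vol(beta_ratio, vdd=5.0, vt=1.0):
    """Low output level VOL of a pseudo-nMOS inverter (square-law sketch).

    beta_ratio = beta_n / beta_p (driver / load). Assumes Vtn = |Vtp| = vt,
    the nMOS driver in its linear region and the pMOS load saturated.
    """
    if beta_ratio <= 1.0:
        raise ValueError("the driver must be stronger than the load")
    # Load current:   I_p = (beta_p / 2) * (vdd - vt)**2
    # Driver current: I_n = beta_n * ((vdd - vt) * V - V**2 / 2)
    # Setting I_n = I_p and taking the smaller root of the quadratic in V:
    return (vdd - vt) * (1.0 - math.sqrt(1.0 - 1.0 / beta_ratio))


def pseudo_nmos_static_power(beta_p, vdd=5.0, vt=1.0):
    """Static power while the output is low: VDD times the saturated load current."""
    i_load = 0.5 * beta_p * (vdd - vt) ** 2
    return vdd * i_load


for ratio in (4.0, 8.0):
    print(f"beta_n/beta_p = {ratio:.0f}: VOL = {pseudo_nmos_vol(ratio):.3f} V")
# Hypothetical beta_p = 100 uA/V^2: power burned whenever the output is low.
print(f"static power = {pseudo_nmos_static_power(1e-4) * 1e3:.1f} mW")
```

A larger βn/βp pulls VOL closer to 0 V at the cost of a bigger driver, while the static power depends only on the load. This is consistent with the text's remark that the load gain is reduced when minimum-size drivers are used. The sketch is only valid while VOL stays below Vt, since the load then remains saturated.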
GANGED LOGIC
The inputs are connected separately, but the outputs are connected to a common
terminal. The logic function depends on the pull-up to pull-down ratio: if the pMOS
devices are able to overcome the nMOS devices, the gate behaves as a NAND, otherwise as a NOR.
DYNAMIC CMOS LOGIC
1.Inputs may change only during the precharge phase and must be stable during the
evaluate phase. If this condition is not met, charge redistribution corrupts the output
node.
2.Simple dynamic gates cannot be cascaded directly. During the evaluate phase the first
gate conditionally discharges, but this takes a finite delay. Until it does, its
still-precharged high output drives the second gate, which may discharge erroneously;
since a dynamic node cannot recover its charge before the next precharge, the second
gate's output is corrupted.
CLOCKED CMOS LOGIC (C2MOS)
Hence in one clock cycle the cascaded logic makes only one transition from 1 to 0 and
each buffer makes only one transition from 0 to 1. In effect the cascaded logic falls
like a line of dominoes, hence the name domino logic. The advantage is that any number
of logic blocks can be cascaded, provided the sequence can be evaluated in a single clock
cycle. A single clock can be used to precharge and evaluate all the logic in a block. The
limitation is that each stage must be buffered and only non-inverting structures are
possible.
A further refinement of domino logic is also possible. The cascaded logic can consist of
alternate p and n blocks, which avoids the domino buffer. When clk = 0, i.e. during the
precharge stage, the first stage (n logic) is precharged high, the second stage (p
logic) is precharged low, and the third stage is again precharged high. Since the
output of the second stage is low, the n transistors of the third stage that it drives
are off. Hence domino connections can be made.
The advantages are that we can use smaller gates, achieve higher speed and get smooth
operation. Care must be taken to ensure that the design is correct.
NP DOMINO LOGIC (ZIPPER CMOS)
STATIC CVSL
DYNAMIC CVSL
DYNAMIC SSDL CVSL
Switches and switch logic can be formed from simple n or p transistors and from the
complementary switch, i.e. the transmission gate. The more complex transmission gate
came into the picture because of the undesirable threshold effects of simple pass
transistors. The transmission gate gives good, non-degraded logic levels, but this comes
at the cost of larger area and the complementary signals required to drive its gates.
Figure 24: Some properties of pass transistor
CMOS Technology Logic Circuit Structures
Many different logic circuits utilizing CMOS technology have been invented and used
in various applications. These can be divided into three types or families of circuits:
1.Complementary Logic
Standard CMOS
Clocked CMOS (C2MOS)
BICMOS (CMOS logic with Bipolar driver)
2.Ratio Circuit Logic
Pseudo-NMOS
Saturated NMOS Load
Saturated PMOS Load
Depletion NMOS Load (E/D)
Source Follower Pull-up Logic (SFPL)
3.Dynamic Logic:
CMOS Domino Logic
NP Domino Logic (also called Zipper CMOS)
NORA Logic
Cascode Voltage Switch Logic (CVSL)
Sample-Set Differential Logic (SSDL)
Pass-Transistor Logic
The large number of implementations shown so far may lead to confusion as to what
to use where. Here are some guidelines:
1.Complementary CMOS
The best option because of its low DC power dissipation, noise immunity and speed. The
design of this logic is highly automated. Avoid it for large fan-outs, as it leads to
excessive levels of logic.
2.BiCMOS
It can be used in high-speed applications with large fan-out. The economics must be
justified.
3.Pseudo-nMOS
Mostly useful in large fan-in NOR gates such as ROMs, PLAs and CLA adders. The DC
power can be reduced to 0 in power-down situations.
4.Clocked CMOS
Useful in processes that are susceptible to hot electrons.
5.CMOS domino logic
Used mostly in high-speed, low-power applications. Care must be taken with charge
redistribution. The precharge time reduces the speed advantage.
6.CVSL
This is basically useful in fast cascaded logic. The size, design complexity and
reduced noise immunity make the design less popular.
Hybrid designs are also being tried in order to combine the advantages of each
of them.
UNIT 4
BASIC CIRCUIT DESIGN CONCEPTS
INTRODUCTION
We have already seen that MOS structures are formed by the superimposition of a
number of conducting, insulating and transistor-forming layers. Each of these layers
has its own characteristics, such as capacitance and resistance. These fundamental
components are required to estimate the performance of the system. The layers also
have inductance, which is important for I/O behaviour but is usually neglected for
on-chip devices.
The issues of prominence are:
1.Resistance, capacitance and inductance calculations.
2.Delay estimations
3.Determination of conductor size for power and clock distribution
4.Power consumption
5.Charge sharing
6.Design margin
7.Reliability
8.Effects and extent of scaling
RESISTANCE ESTIMATION
The concept of sheet resistance is used to describe the resistive behavior of the
layers that go into the formation of the MOS device. Let us consider a uniform slab of
conducting material with the following characteristics:
Resistivity- ρ
Width - W
Thickness - t
Length between faces – L as shown next
We know that the resistance is given by R_AB = ρL/A Ω. The area of the slab
considered above is A = Wt, therefore R_AB = ρL/(Wt) Ω. If the slab is a square then
L = W, therefore R_AB = ρ/t, which is called the sheet resistance and is represented
by Rs. The unit of sheet resistance is ohm per square. Note that Rs is independent of
the area of the slab. Hence a 1 µm per side square has the same resistance as a 1 cm
per side square of the same material.
The resistance of the different materials that go into making the MOS device
depends on the resistivity and the thickness of the material. For a diffusion layer the
depth defines the thickness and the impurity concentration defines the resistivity. The
values below are for a 5 µm technology; 5 µm technology means the minimum line width is
5 µm and λ = 2.5 µm. The diffusion mentioned is n diffusion; p diffusion values are
about 2.5 times those of n. The table of standard sheet resistance values follows.
Layer                      Rs (Ω per square)
Metal                      0.03
Silicide                   2 to 4
Polysilicon                15 to 100
n-transistor channel       10^4
p-transistor channel       2.5 x 10^4
The n transistor above is formed by a 2λ wide poly line crossing n diffusion. The L/W
ratio is 1, hence the channel is a square and its resistance is R = 1 square × Rs =
1 × 10^4 Ω. If the L/W ratio is 4 then R = 4 × 10^4 Ω. For a p transistor with
L/W = 1, R is 2.5 × 10^4 Ω.
Consider an nMOS inverter with pull-up to pull-down ratio = 4. When the pull-down nMOS
is on, both devices are on simultaneously, hence there is an on resistance
Ron = 40 kΩ + 10 kΩ = 50 kΩ between VDD and VSS. It is this resistance that leads to the
static power consumption which is the disadvantage of nMOS depletion-mode inverters.
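The channel-resistance rule R = Z × Rs and the resulting static dissipation can be sketched as below. The sketch treats both channels as linear resistors, which is only a rough approximation; the 5 V supply is an assumption not stated in this passage, and the 2λ and 8λ dimensions are just one way of getting the L/W ratios quoted above.

```python
RS_N = 1e4           # n-channel sheet resistance, ohm per square (5 um figure from the text)
RS_P = 2.5 * RS_N    # p-channel is about 2.5 times the n-channel value


def channel_resistance(length, width, rs=RS_N):
    """R = Z * Rs, where Z = L / W is the number of squares in the channel."""
    return (length / width) * rs


def nmos_inverter_static(z_pu, z_pd, vdd=5.0, rs=RS_N):
    """On resistance and static power of an nMOS inverter with its output low.

    Treats both channels as linear resistors of Z * Rs in series (rough sketch).
    """
    r_on = (z_pu + z_pd) * rs
    return r_on, vdd ** 2 / r_on


print(channel_resistance(2, 2))          # square n channel: 10 kohm
print(channel_resistance(8, 2))          # L/W = 4: 40 kohm
print(channel_resistance(2, 2, RS_P))    # square p channel: 25 kohm
r_on, p = nmos_inverter_static(4, 1)
print(f"Ron = {r_on / 1e3:.0f} kohm, static power = {p * 1e3:.2f} mW")
```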
CONTACT AND VIA RESISTANCE
The contacts and vias also have resistances that depend on the contacted materials
and the area of contact. As contact sizes are reduced by scaling, the associated
resistance increases. The resistance is kept low by making good ohmic contacts, also
called lossless contacts. Currently the values of resistance vary from 0.25 Ω to
a few tens of ohms.
SILICIDES
The connecting lines that run from one circuit to another have to be optimized, and for
this reason their width is reduced considerably. With the reduction in width the
resistance of a line increases, increasing the RC delay component. With polysilicon the
sheet resistance values vary from 15 to 100 Ω per square, which limits the extent of
the scaling-down process. Polysilicon is therefore being replaced with silicide, which
is obtained by depositing metal on polysilicon and then sintering it. Silicides give a
sheet resistance of 2 to 4 Ω per square. The reduced sheet resistance makes silicides a
very attractive replacement for polysilicon; however, the extra processing steps offset
this advantage.
A Problem
A particular layer of a MOS circuit has a resistivity ρ of 1 Ω·cm. The section is
55 µm long, 5 µm wide and 1 µm thick. Calculate the resistance and also find Rs.
R = Rs × L/W, Rs = ρ/t
Rs = 1 × 10^-2 Ω·m / 1 × 10^-6 m = 10^4 Ω per square
R = 10^4 × (55 × 10^-6)/(5 × 10^-6) = 110 kΩ
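The same calculation as a short sketch. Working directly in Ω·cm and cm avoids the 10^-2 and 10^-6 conversion above; the function names are illustrative, not from the text.

```python
UM_IN_CM = 1e-4  # 1 micrometre expressed in centimetres


def sheet_resistance(resistivity_ohm_cm, thickness_cm):
    """Rs = rho / t, in ohm per square."""
    return resistivity_ohm_cm / thickness_cm


def layer_resistance(rs, length, width):
    """R = Rs * L / W; length and width only need to share the same unit."""
    return rs * length / width


rs = sheet_resistance(1.0, 1 * UM_IN_CM)     # rho = 1 ohm-cm, t = 1 um
r = layer_resistance(rs, 55, 5)              # 55 um long, 5 um wide
print(f"Rs = {rs:.0f} ohm/sq, R = {r / 1e3:.0f} kohm")
```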
CAPACITANCE ESTIMATION
Parasitic capacitances are associated with the MOS device because of the different
layers that go into its formation. Interconnection capacitance is also formed by the
metal, diffusion and polysilicon lines (often called runners), in addition to the
transistor and conductor resistance. All these capacitances define the switching speed
of the MOS device.
Understanding the sources of parasitics and their variation becomes an essential part
of the design, especially when system performance is measured in terms of speed. The
various capacitances associated with the CMOS device are:
1.Gate capacitance - due to other inputs connected to the output of the device
2.Diffusion capacitance - drain regions connected to the output
3.Routing capacitance - due to connections between the output and other inputs
The fabrication process shows that the conducting layers are separated from the
substrate and from other layers by an insulating layer, forming parallel-plate
capacitors. Since silicon dioxide is the insulator, knowing its thickness we can
calculate the capacitance
$$C = \frac{\varepsilon_0 \varepsilon_{ins} A}{D} \ \text{farad}$$
where
ε0 = permittivity of free space = 8.854 × 10^-14 F/cm
εins = relative permittivity of SiO2 = 4.0
D = thickness of the dioxide in cm
A = area of the plate in cm²
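A minimal sketch of the parallel-plate formula, keeping everything in centimetres as the definitions above do. The overlap area and oxide thickness in the usage lines are illustrative values.

```python
EPS0_F_PER_CM = 8.854e-14   # permittivity of free space, F/cm
EPS_SIO2 = 4.0              # relative permittivity of SiO2 as used in the text
UM_IN_CM = 1e-4


def plate_capacitance(area_cm2, thickness_cm, eps_r=EPS_SIO2):
    """Parallel-plate capacitance C = eps0 * eps_ins * A / D, in farads."""
    return EPS0_F_PER_CM * eps_r * area_cm2 / thickness_cm


# Illustrative values: a 7.5 um x 5 um overlap through 0.5 um of oxide.
area = (7.5 * UM_IN_CM) * (5 * UM_IN_CM)
c = plate_capacitance(area, 0.5 * UM_IN_CM)
print(f"C = {c * 1e15:.2f} fF")
```

Note how small the result is: a few femtofarads (thousandths of a picofarad) for a minimum-width crossing.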
The gate-to-channel capacitance formed by the SiO2 separation is the most significant
of the three types mentioned. It is directly connected to the input and the output.
The other capacitances, such as those of metal and poly, can be evaluated against the
substrate. The gate capacitance is therefore standardized so that designers can move
conveniently from one technology to another.
The standard unit is denoted by □Cg. It represents the gate-to-channel capacitance of a
transistor with W = L = minimum feature size. Here is a figure showing the different
capacitances that add up to give the total gate capacitance:
Cgd, Cgs = gate-to-channel capacitance lumped at the drain and source
Csb, Cdb = source and drain diffusion capacitance to substrate
Cgb = gate-to-bulk capacitance
Total gate capacitance Cg = Cgd + Cgs + Cgb
Since the standard gate capacitance has been defined, the other capacitances, such as
those of polysilicon, metal and diffusion, can be expressed in the same standard units,
so that the total capacitance can be obtained by simply adding all the values. To
express a capacitance in standard units, the following steps must be followed:
1. Calculate the area under consideration relative to that of the standard gate,
i.e. 4λ² (the standard gate area in absolute units varies with the technology).
2. Multiply the relative area by the relative capacitance value
tabulated below.
3. This gives the capacitance in the standard unit of
capacitance □Cg.
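Table 1: Relative capacitance values in □Cg per 4λ² of area (5 µm technology)
Layer                        Relative value
Gate to channel              1
Diffusion                    0.25
Polysilicon to substrate     0.1
M1 to substrate              0.075
M2 to substrate              0.05
M2 to M1                     0.1
M2 to poly                   0.075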
For a 5 µm technology the area of the minimum-sized transistor is 5 µm × 5 µm = 25 µm².
Since λ = 2.5 µm, the area of the minimum-sized transistor in lambda units is
2λ × 2λ = 4λ². Therefore for 2 µm, 1.2 µm or any other technology the area of a
minimum-sized transistor in lambda units is 4λ². Let us solve a few problems to
understand this better.
The figure above shows the dimensions and the interaction of the different layers for
evaluating the total capacitance.
Three capacitances are to be evaluated: metal Cm, polysilicon Cp and gate capacitance Cg.
Metal capacitance Cm
Area of metal = 100 × 3 = 300λ²
Relative area = 300λ²/4λ² = 75
Cm = 75 × relative capacitance = 75 × 0.075 = 5.625 □Cg
Polysilicon capacitance Cp
Area of poly = (4 × 4 + 1 × 2 + 2 × 2) = 22λ²
Relative area = 22λ²/4λ² = 5.5
Cp = 5.5 × relative capacitance = 5.5 × 0.1 = 0.55 □Cg
Gate capacitance Cg = 1 □Cg because it is a minimum-size gate
Ct = Cm + Cp + Cg = 5.625 + 0.55 + 1 = 7.175 ≈ 7.2 □Cg
The input capacitance is made of three components: metal capacitance Cm, poly
capacitance Cp and gate capacitance Cg, i.e. Cin = Cm + Cg + Cp
Relative area of metal = (50 × 3) × 2/4 = 300/4 = 75
Cm = 75 × 0.075 = 5.625 □Cg
Relative area of poly = (4 × 4 + 2 × 1 + 2 × 2)/4 = 22/4 = 5.5
Cp = 5.5 × 0.1 = 0.55 □Cg
Cg = 1 □Cg
Cin = 7.175 □Cg
Cout = Cd + Cperi. Assuming Cperi to be negligible,
Cout = Cd.
Relative area of diffusion = 51 × 2/4 = 102/4 = 25.5
Cd = 25.5 × 0.25 = 6.375 □Cg.
The relative values are for the 5 µm technology.
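The three-step procedure (relative area, times relative value, giving □Cg) is easy to mechanize. The sketch below reproduces the worked figures; the relative values are the 5 µm ones used above (0.1 is the poly value the examples use), and the dictionary keys are illustrative names.

```python
STD_GATE_AREA = 4.0   # lambda^2, area of the minimum-size gate (2 lambda x 2 lambda)

# Relative capacitance per standard-gate area, 5 um technology (values used above).
RELATIVE_CAP = {
    "gate": 1.0,
    "diffusion": 0.25,
    "poly": 0.1,
    "metal1": 0.075,
}


def cap_square_cg(area_lambda2, layer):
    """Capacitance of a region in units of the standard gate capacitance (square Cg)."""
    return area_lambda2 / STD_GATE_AREA * RELATIVE_CAP[layer]


cm = cap_square_cg(100 * 3, "metal1")               # 5.625
cp = cap_square_cg(4 * 4 + 1 * 2 + 2 * 2, "poly")   # 0.55
cg = cap_square_cg(2 * 2, "gate")                   # 1.0
cd = cap_square_cg(51 * 2, "diffusion")             # 6.375
print(f"Ct = {cm + cp + cg:.3f} square-Cg, Cd = {cd:.3f} square-Cg")
```

Because every layer is normalized to the same 4λ² reference, the individual contributions can simply be added.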
DELAY
The concept of sheet resistance and standard unit capacitance can be used to calculate
delay. If one feature-size poly gate (1 □Cg) is charged through one feature-size (one
square) diffusion channel, the delay is the time constant 1τ = Rs (n or p channel) ×
1 □Cg seconds. This can be evaluated for any technology. The value of □Cg varies between
technologies because of the variation in the minimum feature size.
5 µm, using n diffusion: τ = 10^4 Ω × 0.01 pF = 0.1 ns; safe delay 0.3 ns
2 µm: τ = 2 × 10^4 Ω × 0.0032 pF = 0.064 ns; safe delay 0.2 ns
1.2 µm: τ = 2 × 10^4 Ω × 0.0023 pF = 0.046 ns; safe delay 0.1 ns
These safe figures are essential in order to anticipate the output at the right time.
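The time-constant arithmetic τ = Rs × □Cg for the three technologies, as a small sketch. The 2 × 10^4 Ω per square n-channel values for 2 µm and 1.2 µm are an assumption inferred from the quoted products (0.064 ns and 0.046 ns), not stated elsewhere in this section.

```python
# Per-technology figures: square Cg in pF and n-channel Rs in ohm per square.
# The 2 um and 1.2 um Rs values are inferred from the quoted products (assumption).
TECH = {
    5.0: {"square_cg_pf": 0.01, "rs_n": 1e4},
    2.0: {"square_cg_pf": 0.0032, "rs_n": 2e4},
    1.2: {"square_cg_pf": 0.0023, "rs_n": 2e4},
}


def tau_ns(feature_um):
    """Time constant tau = Rs x square Cg, returned in nanoseconds."""
    t = TECH[feature_um]
    return t["rs_n"] * t["square_cg_pf"] * 1e-12 * 1e9


for f in TECH:
    print(f"{f} um: tau = {tau_ns(f):.3f} ns")
```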
INVERTER DELAYS
We have seen that an inverter has pull-up and pull-down resistance values, especially
in nMOS inverters. Hence the delay of an inverter depends on whether its output is being
pulled up or pulled down. If we consider two cascaded inverters, the total delay remains
the same irrespective of the direction of the transition. nMOS and CMOS inverter delays
are shown next.
NMOS INVERTER
Let us consider the input to be high, so the first inverter pulls its output down. The
pull-down transistor is a minimum-size nMOS, hence the delay is 1τ. The second inverter
pulls its output up through a pull-up whose resistance is 4 times larger, hence its
delay is 4τ. The total delay is 1τ + 4τ = 5τ. Hence for nMOS the pair delay can be
generalized as T = (1 + Zpu/Zpd)τ.
CMOS INVERTER
The above current charges the load capacitance and has a constant value, so the circuit
can be modeled as shown in the figure above. The output is the voltage across the
capacitance, given by
$$V_{out} = \frac{I_{dsp}\, t}{C_L}$$
Substituting $I_{dsp} = \frac{\beta_p}{2}(V_{gs} - |V_{tp}|)^2$ we have
$$V_{out} = \frac{\beta_p (V_{gs} - |V_{tp}|)^2\, t}{2 C_L}, \qquad t = \frac{2 C_L V_{out}}{\beta_p (V_{gs} - |V_{tp}|)^2}$$
Let t = τr and Vout = VDD; then
$$\tau_r = \frac{2 V_{DD} C_L}{\beta_p (V_{gs} - |V_{tp}|)^2}$$
If we take |Vtp| = 0.2 VDD and Vgs = VDD, then
$$\tau_r = \frac{2 C_L}{0.64\, \beta_p V_{DD}} \approx \frac{3 C_L}{\beta_p V_{DD}}$$
On a similar basis the fall time can be written as τf ≈ 3CL/(βn VDD), whose model
is shown next.
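A quick numerical check of the constant-current estimate and of the "≈ 3" simplification. The load, β and supply values are illustrative assumptions; only the ratio matters. The sketch also shows that, for equal |Vt|, τr/τf = βn/βp.

```python
def rise_time(c_load, beta_p, vdd, vtp_abs):
    """tau_r = 2 VDD CL / (beta_p (VDD - |Vtp|)^2), the constant-current estimate."""
    return 2.0 * vdd * c_load / (beta_p * (vdd - vtp_abs) ** 2)


def fall_time(c_load, beta_n, vdd, vtn):
    """Same estimate for the nMOS pull-down."""
    return 2.0 * vdd * c_load / (beta_n * (vdd - vtn) ** 2)


vdd, cl, beta = 5.0, 1e-12, 1e-4       # illustrative values only
exact = rise_time(cl, beta, vdd, 0.2 * vdd)
approx = 3.0 * cl / (beta * vdd)
print(f"exact/approx = {exact / approx:.3f}")   # 3.125 / 3, about 1.04
# With beta_n = 2 * beta_p and equal thresholds, the fall is twice as fast.
print(rise_time(cl, beta, vdd, 1.0) / fall_time(cl, 2 * beta, vdd, 1.0))
```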
We see that the width increases by a factor of f at each stage towards the last stage.
f and N are interdependent: if f is large, the number of stages N reduces but the delay
per stage increases. Therefore the choice must be optimized: fix N and find the optimum
value of f. For nMOS inverters, if the input transitions from 0 to 1 the stage delay is
fτ, and if it transitions from 1 to 0 the delay is 4fτ. The delay of an nMOS pair is
therefore 5fτ; for a CMOS pair it is 7fτ.
Optimum value of f
Let y = CL/□Cg = f^N, so the choices of N and f are interdependent.
To find the value of f that minimizes the delay, take logarithms of y:
ln(y) = N ln(f), i.e. N = ln(y)/ln(f). If the delay per pair is 5fτ for nMOS, then for
an even number of stages the total delay is (N/2) × 5fτ = 2.5Nfτ. For CMOS the total
delay is (N/2) × 7fτ = 3.5Nfτ.
Hence delay ∝ Nfτ = (ln(y)/ln(f)) fτ. The delay is minimized by choosing f equal to
e, the base of natural logarithms, which means each stage is about 2.7 times wider
than its predecessor. If f = e then N = ln(y), and the total delay is:
1.For N even
td = 2.5Neτ for nMOS, td = 3.5Neτ for CMOS
2.For N odd
Input transition from 0 to 1:
td = [2.5(N-1) + 1]eτ for nMOS, td = [3.5(N-1) + 2]eτ for CMOS
Input transition from 1 to 0:
td = [2.5(N-1) + 4]eτ for nMOS, td = [3.5(N-1) + 5]eτ for CMOS
For example:
For N = 5, which is odd, the delay for vin = 1 is td = [2.5(5-1) + 1]eτ = 11eτ,
i.e. 1 + 4 + 1 + 4 + 1 = 11eτ
For vin = 0, td = [2.5(5-1) + 4]eτ = 14eτ,
i.e. 4 + 1 + 4 + 1 + 4 = 14eτ
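The staging rules can be checked with a short sketch that adds the per-stage delays one inverter at a time. It assumes the per-stage factors quoted above (nMOS: 1 down, 4 up; CMOS: 2 down, 5 up, in units of fτ) and rounds N = ln(y) to a whole number of stages, which is a simplification.

```python
import math

# Per-stage delay factors (in units of f * tau): (pull-down, pull-up).
STAGE_FACTORS = {"nmos": (1, 4), "cmos": (2, 5)}


def relative_delay(f):
    """For a fixed load ratio y, total delay is proportional to f / ln(f)."""
    return f / math.log(f)


def stages_for_load(y):
    """With f = e, N = ln(y); rounded to the nearest whole number of stages."""
    return max(1, round(math.log(y)))


def chain_delay(n, input_rising, style="nmos", f=math.e):
    """Delay of n cascaded inverters, each f times wider, in units of tau."""
    pull_down, pull_up = STAGE_FACTORS[style]
    total = 0
    output_falls = input_rising      # a rising input makes the first stage pull down
    for _ in range(n):
        total += pull_down if output_falls else pull_up
        output_falls = not output_falls
    return total * f


print(relative_delay(2), relative_delay(math.e), relative_delay(4))
print(f"{chain_delay(5, True) / math.e:.1f} {chain_delay(5, False) / math.e:.1f}")  # 11.0 14.0
```

Stepping through the stages avoids having to remember separate even and odd formulas; the closed forms above fall out of the alternating sum.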
SUPER BUFFER
The asymmetry of the inverters used to solve delay problems is clearly undesirable and
itself leads to further delay problems; super buffers are a better solution. There are
inverting and non-inverting variants of the super buffer. Such arrangements, when used
in 5 µm technology, were capable of driving a 2 pF capacitance with a 2 ns rise time.
The figure shown next is the inverting variant.
Figure 35
The collector resistance Rc is another parameter that contributes to delay. The graph
below shows that for small load capacitance the delay is manageable, but for large
capacitance the delay increases drastically as Rc increases.
Figure 36
By taking certain care during fabrication, reasonably good bipolar devices can be
produced with large hfe, gm and β and small Rc. Therefore bipolar devices used in
buffers and logic circuits give designers a lot of scope and freedom, without having
to make any changes to the CMOS circuit.
PROPAGATION DELAY
This is the delay introduced when logic signals have to pass through a chain of pass
transistors. The transistors present an RC product delay, and this increases
drastically as the number of pass transistors in series increases. As seen from the
figure, the response at node V2 is given by
$$C\frac{dV_2}{dt} = \frac{(V_1 - V_2) - (V_2 - V_3)}{R}$$
For a long network this becomes
$$RC\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$
i.e. delay ∝ x².
Figure 38
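Lumping all the R and C, we have Rtotal = n r Rs and Ctotal = n c □Cg, where r and c are
the resistance and capacitance of each stage relative to Rs and □Cg, and hence delay =
n² r c τ. The delay increases with the square of the number of stages, so restrict the
number of pass transistors in series to a maximum of 4 and introduce buffers between longer chains.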
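The n² law and the effect of inserting buffers can be sketched as below, using the lumped estimate from the text with r = c = 1. The buffer delay of 2τ is a hypothetical placeholder chosen only for illustration.

```python
def pass_chain_delay(n, r=1.0, c=1.0):
    """Lumped estimate used in the text: delay = n^2 * r * c, in units of tau."""
    return n * n * r * c


def buffered_chain_delay(n, segment=4, buffer_delay=2.0, r=1.0, c=1.0):
    """Break an n-switch chain into segments of at most `segment` switches,
    with one buffer (delay `buffer_delay`, in tau) between segments.

    buffer_delay is a hypothetical placeholder, not a figure from the text.
    """
    if n <= 0:
        return 0.0
    full, rest = divmod(n, segment)
    delay = full * pass_chain_delay(segment, r, c)
    if rest:
        delay += pass_chain_delay(rest, r, c)
    segments = full + (1 if rest else 0)
    return delay + (segments - 1) * buffer_delay


for n in (4, 8, 16):
    print(n, pass_chain_delay(n), buffered_chain_delay(n))
```

Because the buffered cost grows roughly linearly in n while the unbuffered cost grows as n², buffering wins quickly once the chain exceeds a few stages.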
3.Peripheral capacitance
The wiring and peripheral capacitances together can add up to as much as the
gate-to-source capacitance, hence the design must take steps to reduce them. Much of the
wiring capacitance comes from fringing-field effects. Fringing capacitance arises from
the fine parallel metal lines running across the chip, such as those for power
connections. The capacitance depends on the length l, the thickness t and the distance d
between the wire and the substrate. Accurate prediction is required for performance
estimation; hence Cw = Carea + Cff.
Interlayer capacitance is seen where different layers cross each other; it is often
neglected in simple calculations. Such capacitance can be easily estimated for regular
structures and helps in modeling the circuit better.
Peripheral capacitance is seen at device junctions. The n source and drain regions
form junctions with the p-well (substrate), and p diffusions form junctions with the
adjacent n-wells, leading to these sidewall (peripheral) capacitances.
These capacitances become significant when devices are shrunk and hence must
be considered. The total diffusion capacitance is then Ctotal = Carea + Cperi.
In order to reduce sidewall effects, designers use isolation regions
of the opposite impurity type.
CHOICE OF LAYERS
1.VDD and VSS lines must be distributed on metal, except in a few special cases.
2.Long lengths of poly must be avoided because poly has a large Rs; it is not suitable
for routing VDD or VSS lines.
3.Since the resistances of the transistors are much larger than those of the wiring,
voltage-divider effects due to wiring are not significant.
Capacitance must be calculated accurately for fast signal lines, particularly those
using high-Rs material. Diffusion areas must be handled carefully because they have a
larger capacitance to substrate.
With all the above inputs it is better to model wires as small capacitors, which gives
electrical guidelines for communication circuits.
PROBLEMS
1.A particular section of the layout includes a 3λ wide metal path which crosses a 2λ
polysilicon path at right angles. Assuming that the layers are separated by 0.5 µm thick
SiO2, find the capacitance between the two.
Capacitance = ε0 εins A/D
Let the technology be 5 µm, i.e. λ = 2.5 µm.
Area = 7.5 µm × 5 µm = 37.5 µm² = 37.5 × 10^-8 cm²
C = 4 × 8.854 × 10^-14 × 37.5 × 10^-8 / (0.5 × 10^-4) ≈ 2.66 × 10^-15 F = 0.00266 pF
The value of C in standard units is:
Relative area = 6λ²/4λ² = 1.5
C = 1.5 × 0.075 = 0.1125 □Cg
2nd part of the problem
The polysilicon then crosses a 4λ wide diffusion layer; find the gate-to-channel capacitance.
Area = 2λ × 4λ = 8λ². Relative area = 8λ²/4λ² = 2
Relative capacitance (gate to channel, 5 µm) = 1
Total gate capacitance = 2 □Cg
The gate-to-channel capacitance is therefore much larger than the metal-to-poly capacitance.
2. Two nMOS inverters are cascaded to drive a load capacitance of 16 □Cg as
shown in the figure. Calculate the pair delay. What are the ratios of each inverter? If
stray and wiring capacitance is to be considered, each inverter will have an
additional capacitance of 4 □Cg at its output. Find the delay.
Figure 40
Inverter 1: Lpu = 16λ, Wpu = 2λ, Zpu = 8
Lpd = 2λ, Wpd = 2λ, Zpd = 1
Ratio of inverter 1 = 8:1
Inverter 2: Lpu = 2λ, Wpu = 2λ, Zpu = 1
Lpd = 2λ, Wpd = 8λ, Zpd = 1/4
Ratio of inverter 2 = 1/(1/4) = 4:1
Delay without strays
1τ = Rs × 1 □Cg
Let the input transition from 1 to 0:
Delay 1 = 8Rs × □Cg = 8τ; Delay 2 = 4Rs × (□Cg + 16 □Cg) = 68τ; Total delay = 76τ
Delay with strays
Delay 1 = 8Rs × (□Cg + 4 □Cg) = 40τ; Delay 2 = 4Rs × (□Cg + 4 □Cg + 16 □Cg) = 84τ
Total delay = 40τ + 84τ = 124τ
If τ = 0.1 ns for 5 µm, the delays are 7.6 ns and 12.4 ns.
SCALING OF MOS DEVICES
VLSI technology is evolving towards ever smaller feature sizes and line widths. This
process is called scaling down. The reduction in size has generally led to better
device performance. There are certain limits on scaling, so it becomes important to
study the effect of scaling on the parameters that affect performance.
The parameters are as stated below:
1.Minimum feature size
2.Number of gates on one chip
3.Power dissipation
4.Maximum operational frequency
5.Die size
6.Production cost
These are also called figures of merit.
Many of these factors can be improved by shrinking the sizes of transistors,
interconnects and the separation between devices, and also by adjusting the voltage and
doping levels. Therefore it becomes essential for designers to implement scaling
and understand its effects on performance.
There are three types of scaling models used
1.Constant electric field scaling model
2.Constant voltage scaling model
3.Combined voltage and field model
The three models make use of two scaling factors, 1/β and 1/α. 1/β is chosen as the
scaling factor for VDD and the gate oxide thickness D. 1/α is chosen as the scaling
factor for all linear dimensions such as length and width. The figure next shows the
dimensions and their scaling factors.
The following are some simple derivations for scaling down the device parameters
1.Gate area Ag
Ag = L × W. Since L and W are scaled down by 1/α, Ag is scaled down by 1/α².
2.Gate capacitance per unit area Co
Co = ε0 εins/D. The permittivity of SiO2 cannot be scaled and D is scaled by 1/β, hence
Co is scaled by 1/(1/β) = β.
3.Gate capacitance Cg
Cg = Co × A = Co × L × W. Therefore Cg is scaled by β × 1/α × 1/α = β/α².
4.Parasitic capacitance Cx
Cx = Ax/d, where Ax is the area of the depletion region around the drain or source and
d is the depletion width. Ax is scaled down by 1/α² and d is scaled by 1/α. Hence Cx is
scaled by (1/α²)/(1/α) = 1/α.
5.Carrier density in the channel Qon
Qon = Co × Vgs
Co is scaled by β and Vgs is scaled by 1/β, hence Qon is scaled by β × 1/β = 1.
6.Channel resistance Ron
Ron = (L/W) × 1/(Qon µ), where µ is the mobility of the charge carriers. Ron is scaled by
(1/α)/(1/α) × 1 = 1.
7.Gate delay Td
Td is proportional to Ron and Cg, so
Td is scaled by 1 × β/α² = β/α².
8.Maximum operating frequency fo
fo = 1/Td, therefore it is scaled by 1/(β/α²) = α²/β.
9.Saturation current Idss
Idss = Co µ W (Vgs - Vt)²/(2L). Co is scaled by β and the
voltages by 1/β, so Idss is scaled by β/β² = 1/β.
10.Current density J
J = Idss/A, hence J is scaled by (1/β)/(1/α²) = α²/β.
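The scaling rules above can be collected into one function of α and β; the two usage lines then evaluate the constant-field (β = α) and constant-voltage (β = 1) special cases. The factor of 2 is purely illustrative.

```python
def scaling_factors(alpha, beta):
    """Multiplying factors for device parameters when linear dimensions scale by
    1/alpha and VDD and oxide thickness D scale by 1/beta (per the derivation above)."""
    return {
        "Ag": 1 / alpha**2,        # gate area
        "Co": beta,                # gate capacitance per unit area
        "Cg": beta / alpha**2,     # gate capacitance
        "Cx": 1 / alpha,           # parasitic capacitance
        "Qon": 1.0,                # charge per unit area in the channel
        "Ron": 1.0,                # channel resistance
        "Td": beta / alpha**2,     # gate delay
        "fo": alpha**2 / beta,     # maximum operating frequency
        "Idss": 1 / beta,          # saturation current
        "J": alpha**2 / beta,      # current density
    }


constant_field = scaling_factors(alpha=2.0, beta=2.0)    # beta = alpha
constant_voltage = scaling_factors(alpha=2.0, beta=1.0)  # beta = 1
for name in ("Td", "fo", "J"):
    print(name, constant_field[name], constant_voltage[name])
```

Constant-voltage scaling gives the larger speed gain, but its current density grows as α². This is one reason electromigration appears among the scaling challenges later in the notes.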
Scaling of MOS Circuits
CONTENTS
1. What is scaling?
2. Why scaling?
5. Scaling models
8. Limitations of scaling
9. Observations
10. Summary
1.What is Scaling?
Scaling is the proportional adjustment of the dimensions of an electronic device while
maintaining its electrical properties; it results in a device either larger or
smaller than the unscaled device. Which way do we scale the devices for VLSI:
BIG and SLOW, or SMALL and FAST? What do we gain?
2.Why Scaling?
Scale the devices and wires down; make the chips 'fatter' (more functionality,
intelligence and memory) and faster; make more chips per wafer (increased yield); make
the end user happy by giving more for less; and therefore make MORE MONEY!
Figures of merit (FoM) that scaling improves include, among others:
o Power dissipation
o Die size
o Production cost
Many of the FoMs can be improved by shrinking the dimensions of transistors and
interconnections, shrinking the separation between features (transistors and wires),
and adjusting doping levels and supply voltages.
Reduce energy per transition by 65% (50% power savings at a 43% increase in frequency).
Figure 1 to Figure 5 illustrate technology scaling in terms of minimum feature size,
transistor count, propagation delay, power dissipation and density, and technology
generations.
Figure-1: Technology Scaling (1), minimum feature size (micron) vs. year (1960 to 2010)
Figure-3: Technology Scaling (3), propagation delay
(a) Power dissipation (W) vs. year (MPU, DSP). (b) Power density vs. scaling factor κ (year normalized by 4 µm design rule).
Technology Generations
Figure-5:Technology generation
Table 1: ITRS
5.Scaling Models
Full Scaling (Constant Electric Field)
Ideal model: dimensions and voltage scale together by the same scale factor.
Constant Voltage Scaling
Most common model until recently: only the dimensions scale, voltages remain constant.
General Scaling
Most realistic for today's situation: voltages and dimensions scale with different factors.
Why is the scaling factor for the gate oxide thickness different from that of the other
linear horizontal and vertical dimensions? Consider the cross section of the device in
Figure 6; the various parameters are derived as follows.
Figure-6:Technology generation
• Gate area Ag
Ag = L × W
where L is the channel length and W is the channel width, both scaled by 1/α.
Thus Ag is scaled by 1/α².
• Gate capacitance per unit area Cox
Cox = εox/D
where εox = εins ε0 is the permittivity of the gate oxide (thin-ox) and D is the gate
oxide thickness, scaled by 1/β. Thus Cox is scaled by 1/(1/β) = β.
• Gate capacitance Cg
Cg = Co × L × W
Thus Cg is scaled by β × (1/α) × (1/α) = β/α².
• Parasitic capacitance Cx
Cx is proportional to Ax/d,
where d is the depletion width around the source or drain, scaled by 1/α, and Ax is the
area of the depletion region around the source or drain, scaled by 1/α².
Thus Cx is scaled by (1/α²)/(1/α) = 1/α.
• Carrier density in the channel Qon
Qon = Co × Vgs
where Qon is the average charge per unit area in the 'on' state.
Co is scaled by β and Vgs is scaled by 1/β, so Qon is scaled by 1.
• Channel resistance Ron
$$R_{on} = \frac{L}{W} \cdot \frac{1}{Q_{on}\,\mu}$$
Thus Ron is scaled by 1.
• Gate delay Td
Td is proportional to Ron × Cg, so Td is scaled by 1 × β/α² = β/α².
• Maximum operating frequency fo
$$f_o = \frac{W}{L} \cdot \frac{\mu C_o V_{DD}}{C_g}$$
so fo is scaled by (β × 1/β)/(β/α²) = α²/β.
• Saturation current Idss
$$I_{dss} = \frac{C_o \mu}{2} \cdot \frac{W}{L} \, (V_{gs} - V_t)^2$$
Both Vgs and Vt are scaled by 1/β. Therefore Idss is scaled by β × (1/β)² = 1/β.
• Current density J
J = Idss/A, where A is the cross-sectional area of the channel in the 'on' state, which
is scaled by 1/α². So J is scaled by (1/β)/(1/α²) = α²/β.
• Switching energy per gate Eg
$$E_g = \frac{1}{2} C_g V_{DD}^2$$
So Eg is scaled by (β/α²) × (1/β²) = 1/(α²β).
• Power dissipation per gate Pg
Pg = Pgs + Pgd
Pg comprises two components: a static component Pgs and a dynamic component Pgd.
The static component is given by
$$P_{gs} = \frac{V_{DD}^2}{R_{on}}$$
Since VDD scales by 1/β and Ron scales by 1, Pgs scales by 1/β².
Since Eg scales by 1/(α²β) and fo by α²/β, Pgd = Eg × fo also scales by 1/β². Therefore
Pg scales by 1/β².
Parameter  Description                General (α, β)   Constant field (β = α)   Constant voltage (β = 1)
fo         Max. operating frequency   α²/β             α                        α²
PT         Power-speed product        1/(α²β)          1/α³                     1/α²
7.Implications of Scaling
Improved Performance
Improved Cost
Interconnect Woes
Power Woes
Productivity Challenges
Physical Limits
7.1 Cost Improvement
– Moore's Law is still going strong, as illustrated in Figure 7.
Figure-7: Technology generation
7.2 Interconnect Woes
• Scaled transistors are steadily improving in delay, but scaled wires are holding
constant or getting worse.
• SIA made a gloomy forecast in 1997
– Delay would reach minimum at 250 – 180 nm, then get
worse because of wires
• But…
• For short wires, such as those inside a logic gate, the wire RC delay is negligible.
• However, the long wires present a considerable challenge.
Figure 8 illustrates delay vs. technology generation (nm) for different materials.
Figure-8:Technology generation
7.3 Reachable Radius
• We can’t send a signal across a large fast chip in one cycle anymore
Figure-9: Technology generation (chip size and scaling of reachable radius)
Figure-10:Technology generation
Moore(03)
Figure-11:Technology generation
7.6 Productivity
• Transistor count is increasing faster than designer productivity (gates / week)
Dynamic power
Fabrication costs
Electro-migration
Interconnect delay
8. Limitations of Scaling
Effects of scaling down which eventually become severe enough to prevent further
miniaturization:
o Substrate doping
o Depletion width
o Limits of miniaturization
o Limits of interconnect and contact resistance
o q electron charge
Figure 12, Figure 13 and Figure 14 show substrate concentration vs. depletion width,
electric field and transit time, respectively.
Figure 15 shows interconnect length vs. propagation delay, and Figure 16 shows
oxide thickness vs. thermal noise.
Figure-12:Technology generation
Figure-13:Technology generation
$$v_{drift} = \mu E, \qquad t = \frac{L}{v_{drift}} = \frac{L}{\mu E}$$
Figure-14:Technology generation
Figure-15:Technology generation
$$E_{max} = \frac{2(V_a + V_b)}{d}$$
Figure-16:Technology generation
– Module wires will get worse, but only slowly
– You don't need to rethink the wires in your adder or memory,
or even in your super-scalar processor core
• Scaling does let you design more modules
• Continued scaling of uniprocessor performance is getting hard
- Machines using global resources run into wire limitations
- Machines will have to become more explicitly parallel
CMOS SUBSYSTEM DESIGN
CONTENTS
1. System
2. VLSI design flow
4. Architectural issues
6. Circuit Families
Restoring Logic: CMOS and its variants - NMOS and Bi CMOS
Other circuit variants
NMOS gates with depletion (zero -threshold) pull up
Bi-CMOS gates
1.What is a System?
A system is a set of interacting or interdependent entities forming an integrated whole.
Common characteristics of a system are:
o Systems have structure, defined by their parts and their composition.
o Systems have behavior, which involves inputs, processing and outputs (of material,
information or energy).
o Systems have interconnectivity: the various parts of a system have functional as well
as structural relationships with each other.
• Design flow starts from the algorithm that describes the behavior
of the target chip.
• Verification of the design plays a very important role at every step of the process.
Two approaches to the design flow, as shown in Figure 2, are:
Top-down
Bottom-up
Figure 2. Typical VLSI design flow
Figure 3. Typical ASIC/Custom design flow
3 Structured Design Approach
Figure 4-Structured Design Approach –Hierarchy
• Regularity: design of array structures consisting of identical cells, such as a parallel
multiplication array.
• Exists at all levels of abstraction:
transistor level: uniformly sized transistors
logic level: identical gate structures
• e.g. 2:1 MUX, D flip-flop, inverters and tristate buffers
• Library: well-defined and well-characterized basic building blocks.
• Modularity: enables parallelization and allows plug-and-play
• Locality: the internals of each module are unimportant to exterior modules, and internal
details remain at the local level.
4 Architectural issues
5. MOSFET as a Switch
• We can view MOS transistors as electrically controlled switches
• Voltage at gate controls path from source to drain
5.2 Series and Parallel Connection of Switches
State of the connection between a and b for gate inputs g1 g2 = 00, 01, 10, 11:
(a) two nMOS in series:     OFF  OFF  OFF  ON
(b) two pMOS in series:     ON   OFF  OFF  OFF
(c) two nMOS in parallel:   OFF  ON   ON   ON
(d) two pMOS in parallel:   ON   ON   ON   OFF
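The four cases can be reproduced with a small switch-level model. This is a minimal sketch that assumes ideal switches (an nMOS conducts when its gate is 1, a pMOS when its gate is 0) and ignores voltage levels; the names `nmos_on`, `series`, `parallel` and `NETWORKS` are hypothetical helpers chosen for illustration.

```python
from itertools import product

def nmos_on(g: int) -> bool:
    """An ideal nMOS switch conducts when its gate is 1."""
    return g == 1

def pmos_on(g: int) -> bool:
    """An ideal pMOS switch conducts when its gate is 0."""
    return g == 0

def series(*switches: bool) -> bool:
    """Series connection: a to b conducts only if every switch is ON."""
    return all(switches)

def parallel(*switches: bool) -> bool:
    """Parallel connection: a to b conducts if any switch is ON."""
    return any(switches)

NETWORKS = {
    "(a) nMOS series": lambda g1, g2: series(nmos_on(g1), nmos_on(g2)),
    "(b) pMOS series": lambda g1, g2: series(pmos_on(g1), pmos_on(g2)),
    "(c) nMOS parallel": lambda g1, g2: parallel(nmos_on(g1), nmos_on(g2)),
    "(d) pMOS parallel": lambda g1, g2: parallel(pmos_on(g1), pmos_on(g2)),
}

def table(network) -> list:
    """ON/OFF for g1 g2 = 00, 01, 10, 11 (the order used above)."""
    return ["ON" if network(g1, g2) else "OFF" for g1, g2 in product((0, 1), repeat=2)]

for name, network in NETWORKS.items():
    print(name, table(network))
```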
6. Circuit Families: Restoring logic
CMOS INVERTER
A | Y
0 | 1
1 | 0
• A = 1: the pMOS (to VDD) is OFF and the nMOS (to GND) is ON, so Y = 0.
• A = 0: the pMOS is ON and the nMOS is OFF, so Y = 1.
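6.1 NAND Gate Design
A  B | Y
0  0 | 1
0  1 | 1
1  0 | 1
1  1 | 0
The two pMOS transistors are in parallel between VDD and Y, and the two nMOS transistors are in series between Y and GND:
• A = 0, B = 0: both pMOS are ON and both nMOS are OFF, so Y = 1.
• A = 0, B = 1: the pMOS driven by A is ON and the nMOS driven by A is OFF, so Y = 1.
• A = 1, B = 0: the pMOS driven by B is ON and the nMOS driven by B is OFF, so Y = 1.
• A = 1, B = 1: both pMOS are OFF and both nMOS are ON, so Y = 0.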
6.3 CMOS Properties
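Output state of a CMOS gate:
                 Pull-up OFF   Pull-up ON
Pull-down OFF    Z             1
Pull-down ON     0             X (crowbar)
[Figure: a CMOS gate as a pMOS pull-up network and an nMOS pull-down network, both driven by the inputs and joined at the output.]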
• The pull-up network is the complement of the pull-down network
• Parallel -> series, series -> parallel (nMOS transistors in parallel correspond to pMOS
transistors in series, and vice versa)
• Output signal strength is independent of the input: the logic level is restored
• Restoring logic: the output signal is either VOH (output high) or VOL (output
low).
• Ratioless logic: the output signal levels are independent of the ratio of pMOS to
nMOS device sizes.
• Significant current flows only during the transition from one state to another, hence
power is conserved.
• Rise and fall transition times are of the same order.
• Very high levels of integration.
• High performance.
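The four-state table above and the complement rule can be checked with a short switch-level sketch. It assumes ideal networks and resolves the output from which network conducts; `bad_gate` is a hypothetical mis-designed gate whose pull-up is not the dual of its pull-down, included to show how a floating (Z) output appears.

```python
from itertools import product

def resolve(pull_up_on: bool, pull_down_on: bool) -> str:
    """Output state of a gate given which of its two networks conducts."""
    if pull_up_on and pull_down_on:
        return "X"   # crowbar: VDD shorted to GND through both networks
    if pull_up_on:
        return "1"
    if pull_down_on:
        return "0"
    return "Z"       # neither network conducts: the output floats

def nand2(a: int, b: int) -> str:
    pull_down = a == 1 and b == 1   # two nMOS in series
    pull_up = a == 0 or b == 0      # two pMOS in parallel: the dual network
    return resolve(pull_up, pull_down)

def bad_gate(a: int, b: int) -> str:
    # Hypothetical gate: the pull-up is NOT the complement of the pull-down.
    pull_down = a == 1 and b == 1   # nMOS in series
    pull_up = a == 0 and b == 0     # pMOS in series
    return resolve(pull_up, pull_down)

print([nand2(a, b) for a, b in product((0, 1), repeat=2)])     # ['1', '1', '1', '0']
print([bad_gate(a, b) for a, b in product((0, 1), repeat=2)])  # ['1', 'Z', 'Z', '0']
```

Because the NAND pull-up is the exact dual of its pull-down, exactly one network conducts for every input, so the gate never produces Z or X.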
6.5 Complex gates AOI
[Figure (a)-(f): construction of complex AOI gates from series and parallel transistor networks, e.g. $Y = \overline{(A + B + C) \cdot D}$, with transistor widths annotated for several gates.]
6.6 Circuit Families : Restoring logic CMOS Inverter- Stick diagram
• In the nMOS inverter, the depletion-mode device is called the pull-up and the
enhancement-mode device the pull-down.
• Obtain the transfer characteristics.
• As Vin exceeds the pull-down threshold voltage, current begins to flow and Vout
decreases; a further increase in Vin causes the pull-down transistor to come out of
saturation and become resistive.
• The pull-up transistor is initially resistive as the pull-down is turned on.
• The point at which Vout = Vin is denoted Vinv.
• Vinv can be shifted by varying the ratio of pull-up to pull-down impedances,
Zp.u. / Zp.d.
• Z is the ratio of channel length to width (L/W) for each transistor.
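How the ratio Zp.u. / Zp.d. moves Vinv can be estimated with the usual square-law derivation: equating the saturation currents of the pull-down, (Vinv - Vt)^2 / Zp.d., and of the depletion pull-up with Vgs = 0, Vtd^2 / Zp.u., gives Vinv = Vt - Vtd / sqrt(Zp.u. / Zp.d.). The sketch below assumes both devices are saturated at Vin = Vout, equal mobility and oxide capacitance, no body effect, and illustrative thresholds Vt = 0.2 VDD and Vtd = -0.6 VDD.

```python
import math

def v_inv(vt: float, vtd: float, z_ratio: float) -> float:
    """Inverter threshold of an nMOS inverter with a depletion-mode pull-up.

    vt      : enhancement pull-down threshold (in units of VDD)
    vtd     : depletion pull-up threshold (negative, in units of VDD)
    z_ratio : Zp.u. / Zp.d., where Z = L / W for each transistor
    Assumes the square-law model with both devices saturated at Vin = Vout.
    """
    return vt - vtd / math.sqrt(z_ratio)

# Illustrative (assumed) thresholds: Vt = 0.2 VDD, Vtd = -0.6 VDD.
for ratio in (1, 4, 8):
    print(f"Zpu/Zpd = {ratio}: Vinv = {v_inv(0.2, -0.6, ratio):.3f} VDD")
```

With these assumed thresholds a 4:1 ratio places Vinv at VDD/2; a larger ratio moves Vinv down toward Vt.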
6.8 Restoring logic CMOS Variants: BiCMOS Inverter, stick diagram
6.9 Circuit Families : Restoring logic CMOS NAND gate
6.11 Restoring logic CMOS Variants: BiCMOS NAND gate
• For an nMOS NAND gate, the ratio between the pull-up Z and the sum of the Z of all
pull-downs must be 4:1.
• nMOS NAND gate area requirements are considerably greater than those of the
corresponding nMOS inverter.
• nMOS NAND gate delay is approximately the number of inputs times the inverter delay.
• Hence nMOS NAND gates are used very rarely.
• The CMOS NAND gate has no such restrictions.
• The BiCMOS gate is more complex but can drive a larger fan-out.
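The 4:1 rule can be turned into a quick sizing calculation. Because the pull-down transistors of an nMOS NAND gate are in series, their Z values add, so the pull-up must satisfy Zp.u. = 4 x sum(Zp.d.). This sketch assumes every pull-down is a minimum device with Z = 1; `nand_pullup_z` is a hypothetical helper name.

```python
def nand_pullup_z(pulldown_z: list, ratio: float = 4.0) -> float:
    """Z = L/W required for the depletion pull-up of an nMOS NAND gate.

    The series pull-downs add up, so Zpu = ratio * sum(Zpd).
    """
    return ratio * sum(pulldown_z)

# Assumed: every pull-down is a minimum device with Z = L/W = 1.
for n_inputs in (1, 2, 3, 4):
    print(n_inputs, "inputs -> Zpu =", nand_pullup_z([1.0] * n_inputs))
```

The pull-up Z, and with it the pull-up resistance, grows linearly with the number of inputs, which is consistent with the area penalty and the delay of roughly n times the inverter delay noted above.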
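7.1 Switch logic: Pass Transistor
• An nMOS pass transistor passes a strong 0 but a degraded 1: with its gate at VDD, the
output rises only to VDD - Vtn.
• A pMOS pass transistor passes a strong 1 but a degraded 0: with its gate at VSS, the
output falls only to Vs = |Vtp|.
• If the degraded output of one nMOS pass transistor drives the gate of another, the
second output rises only to VDD - 2Vtn.
Transmission gate (an nMOS transistor with gate g in parallel with a pMOS transistor with
gate gb, the complement of g):
• g = 0, gb = 1: the gate is OFF and a is disconnected from b.
• g = 1, gb = 0: the gate is ON; an input of 0 gives a strong 0 and an input of 1 gives a
strong 1 at the output.
[Figure: alternative transmission-gate symbols with terminals a, b and controls g, gb.]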
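The threshold drops above follow from a simple level model: a pass transistor stops conducting once its gate-source voltage falls to the threshold. The sketch uses assumed, illustrative values VDD = 5 V and Vtn = |Vtp| = 1 V, ignores the body effect, and models the transmission gate under the assumption VDD > Vtn + |Vtp|; the function names are hypothetical.

```python
from typing import Optional

def nmos_pass(v_in: float, v_gate: float, vtn: float) -> float:
    """nMOS pass transistor: the output follows v_in but cannot rise above v_gate - Vtn."""
    return min(v_in, v_gate - vtn)

def pmos_pass(v_in: float, v_gate: float, vtp_abs: float) -> float:
    """pMOS pass transistor: the output follows v_in but cannot fall below v_gate + |Vtp|."""
    return max(v_in, v_gate + vtp_abs)

def tgate(v_in: float, g: int, vdd: float, vtn: float, vtp_abs: float) -> Optional[float]:
    """Transmission gate: nMOS (gate g) in parallel with pMOS (gate gb = not g).

    Returns None when OFF (output floating). Assumes VDD > Vtn + |Vtp|, so for any
    0 <= v_in <= VDD at least one of the parallel devices passes v_in without a drop."""
    if g == 0:
        return None
    n_level = nmos_pass(v_in, vdd, vtn)       # nMOS gate at VDD
    p_level = pmos_pass(v_in, 0.0, vtp_abs)   # pMOS gate at VSS = 0
    return n_level if n_level == v_in else p_level

VDD, VTN, VTP = 5.0, 1.0, 1.0   # assumed illustrative values
print(nmos_pass(VDD, VDD, VTN))                         # degraded 1: VDD - Vtn
print(pmos_pass(0.0, 0.0, VTP))                         # degraded 0: |Vtp|
print(nmos_pass(VDD, nmos_pass(VDD, VDD, VTN), VTN))    # gate driven by a pass output: VDD - 2Vtn
print(tgate(VDD, 1, VDD, VTN, VTP), tgate(0.0, 1, VDD, VTN, VTP))  # strong 1, strong 0
```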
8 Structured Design: Tristate
• A tristate buffer produces Z when not enabled
EN  A | Y
0   0 | Z
0   1 | Z
1   0 | 0
1   1 | 1
[Figure: tristate buffer symbol with data input A, enable EN and output Y.]
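8.3 Structured Design: Tristate Inverter
[Figure: tristate inverter with input A, enable EN and its complement, and output Y, with its equivalent circuits for EN = 0 and EN = 1.]
EN = 0: Y = 'Z' (the output floats)
EN = 1: $Y = \overline{A}$ (the gate acts as an inverter)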
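A behavioral sketch of the two tristate elements, plus a shared bus with one driver enabled at a time, shows why the Z state is useful. `None` stands for the high-impedance state and `resolve_bus` is a hypothetical helper that reports "Z" when nothing drives the bus and "X" when enabled drivers disagree.

```python
from typing import Optional, Union

Z = None  # high-impedance (floating) output

def tristate_buffer(en: int, a: int) -> Optional[int]:
    """Y = A when enabled, Z otherwise."""
    return a if en else Z

def tristate_inverter(en: int, a: int) -> Optional[int]:
    """Y = NOT A when enabled, Z otherwise."""
    return (1 - a) if en else Z

def resolve_bus(*drivers: Optional[int]) -> Union[int, str]:
    """Value on a shared bus: 'Z' if nobody drives it, 'X' if drivers conflict."""
    active = {d for d in drivers if d is not None}
    if not active:
        return "Z"
    if len(active) > 1:
        return "X"
    return active.pop()

# Two modules share one bus line; only module 1 is enabled.
print(resolve_bus(tristate_buffer(1, 1), tristate_buffer(0, 0)))    # 1
print(resolve_bus(tristate_inverter(1, 1), tristate_buffer(0, 1)))  # 0
```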
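8.4 Structured Design: Multiplexers
2:1 multiplexer (Y = D0 when S = 0, Y = D1 when S = 1):
S  D1  D0 | Y
0  X   0  | 0
0  X   1  | 1
1  0   X  | 0
1  1   X  | 1
[Figure: 2:1 mux symbol with data inputs D0 (selected when S = 0) and D1 (selected when S = 1), select S and output Y.]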
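8.6 Structured Design: Mux Design Using Transmission Gates
• A nonrestoring mux uses two transmission gates
– Only 4 transistors
[Figure: transmission-gate 2:1 mux with inputs D0 and D1, select S and output Y.]
Inverting Mux
• Inverting multiplexer
– Use compound AOI22
– Or pair of tristate inverters
• Noninverting multiplexer adds an inverter
[Figure: inverting 2:1 mux built from an AOI22 gate and from a pair of tristate inverters, with inputs D0 and D1, select S and output Y.]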
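8.7 Design: 4:1 Multiplexer
• A 4:1 mux chooses one of 4 inputs using two selects
– Two levels of 2:1 muxes
– Or four tristates
[Figure: 4:1 mux built from two levels of 2:1 muxes (first level selected by S0, second by S1) and from four tristates enabled by the decoded combinations of S1 and S0, with inputs D0 to D3 and output Y.]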
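The multiplexer variants of 8.4 to 8.7 can be checked behaviorally. This sketch models the 2:1 mux as Y = S'.D0 + S.D1, the inverting mux as a single AOI22 gate (assuming S' is available), and the 4:1 mux as two levels of 2:1 muxes with S0 at the first level and S1 at the second; the function names are hypothetical.

```python
from itertools import product

def mux2(s: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: Y = S'.D0 + S.D1."""
    return (d0 & (1 - s)) | (d1 & s)

def aoi22(a: int, b: int, c: int, d: int) -> int:
    """AND-OR-INVERT: Y = NOT(A.B + C.D)."""
    return 1 - ((a & b) | (c & d))

def inverting_mux2(s: int, d0: int, d1: int) -> int:
    """Inverting 2:1 mux from one AOI22 gate (S' assumed to be available)."""
    return aoi22(d0, 1 - s, d1, s)

def mux4(s1: int, s0: int, d: list) -> int:
    """4:1 mux from two levels of 2:1 muxes: S0 picks within each pair, S1 between pairs."""
    return mux2(s1, mux2(s0, d[0], d[1]), mux2(s0, d[2], d[3]))

# Exhaustive checks against the truth tables
for s, d0, d1 in product((0, 1), repeat=3):
    assert mux2(s, d0, d1) == (d1 if s else d0)
    assert inverting_mux2(s, d0, d1) == 1 - mux2(s, d0, d1)
for s1, s0 in product((0, 1), repeat=2):
    data = [0, 0, 0, 0]
    data[2 * s1 + s0] = 1          # only the selected input is 1
    assert mux4(s1, s0, data) == 1
print("all multiplexer checks passed")
```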
[Figure: latch and flop symbols with input D, clock CLK and output Q.]
– A latch is level-sensitive
– A register is edge-triggered
– A flip-flop is a bistable element
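Latch operation: the latch behaves as a multiplexer selected by CLK, with D on input 1 and the fed-back Q on input 0.
• CLK = 1: the latch is transparent and Q = D.
• CLK = 0: the latch is opaque and Q holds its previous value.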
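Structured Design: Latch Design
[Figure: latch built from a transmission gate controlled by φ and its complement, with input D, internal node X and output Q.]
• Adding an inverting buffer:
– Restoring
– No backdriving
– Fixes either output noise sensitivity or diffusion input
– Inverted output
Flip-flop design: a flop is built as a pair of back-to-back latches; the first (master) latch, with output QM, is clocked by the complement of CLK, and the second (slave) latch by CLK.
[Figure: flop symbol with input D, output Q and clock CLK, and its implementation as two latches in series with intermediate node QM.]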
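9.4 D Flip-flop Operation
• CLK = 0: the master latch is transparent, so QM follows D; the slave latch is opaque, so Q holds its value.
• CLK = 1: the master latch is opaque, so QM holds the value of D sampled at the rising edge; the slave latch is transparent, so Q = QM.
Hence Q changes only on the rising edge of CLK.
[Figure: two flip-flops in series clocked by CLK1 and CLK2, with input D and outputs Q1 and Q2.]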
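The difference between a level-sensitive latch and an edge-triggered flop can be seen by driving both with the same waveform. In this behavioral sketch each `(clk, d)` pair is treated as one steady interval of the inputs, and the flop is two latches: the master clocked by the complement of CLK, the slave by CLK.

```python
class DLatch:
    """Level-sensitive D latch: transparent while clk = 1, holds while clk = 0."""
    def __init__(self):
        self.q = 0

    def eval(self, clk: int, d: int) -> int:
        if clk:
            self.q = d      # transparent: Q follows D
        return self.q       # opaque: Q keeps its previous value

class DFlipFlop:
    """Positive-edge-triggered flop built from two back-to-back latches."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def eval(self, clk: int, d: int) -> int:
        qm = self.master.eval(1 - clk, d)   # master is transparent while clk = 0
        return self.slave.eval(clk, qm)     # slave is transparent while clk = 1

# Each tuple is (CLK, D) held for one interval.
seq = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1), (0, 1), (1, 1)]
latch, flop = DLatch(), DFlipFlop()
latch_q = [latch.eval(c, d) for c, d in seq]
flop_q = [flop.eval(c, d) for c, d in seq]
print("latch Q:", latch_q)   # follows D whenever CLK = 1
print("flop  Q:", flop_q)    # changes only when CLK goes 0 -> 1
```

Changes of D while CLK = 1 (the third and sixth intervals) pass straight through the latch but are ignored by the flop.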
UNIT 6: Subsystem Design Processes Illustration
Some Solutions
Problems 1 and 3 are greatly reduced if two aspects of standard practice are
accepted.
1. a) Top-down design approach with adequate CAD tools to do the job
b) Partitioning the system sensibly
c) Aiming for simple interconnections
d) High regularity within subsystems
e) Generate and then verify each section of the design
2. Devote a significant portion of the total chip area to test and diagnostic facilities
3. Select architectures that allow design objectives and high regularity in realization
Illustration of design processes
1. Structured design begins with the concept of hierarchy.
The datapath is shown in figure 6.2. The structure comprises a unit which processes data
applied at one port and presents its output at a second port.
The heart of the ALU is a 4-bit adder circuit. A 4-bit adder must take the sum of two
4-bit numbers, and it is assumed that all 4-bit quantities are presented in parallel form
and that the shifter circuit is designed to accept and shift a 4-bit parallel sum from the
ALU. The sum is to be stored in parallel at the output of the adder, from where it is fed
through the shifter and back to the register array. Therefore, a single 4-bit data bus is
needed from the adder to the shifter, and another 4-bit bus is required from the shifter
output back to the register array. The two 4-bit numbers to be added are fed to the adder
on two 4-bit buses. The adder also requires a clock signal, during which the inputs are
applied and the sum is generated. The shifter is unclocked but must be connected to four
shift control lines.
An ALU must be able to add and subtract two binary numbers and perform logical
operations such as AND, OR and Equality (Ex-OR). Subtraction can be performed by
taking the 2's complement of the number to be subtracted and then adding. It is
desirable to keep the architecture as simple as possible, and so to let the adder perform
the logical operations as well. Let us examine this possibility.
Next, consider the carry output of each element, $C_k = A_k B_k + H_k C_{k-1}$, where
$H_k = A_k \oplus B_k$ is the half-sum. First, with $C_{k-1}$ held at logical 0:
$C_k = A_k B_k + H_k \cdot 0 = A_k B_k$ (an AND operation)
Now, with $C_{k-1}$ held at logical 1:
$C_k = A_k B_k + H_k \cdot 1 = A_k + B_k$ (an OR operation)
The adder element implementing both the arithmetic and logical functions can be
implemented as shown in figure 6.12.
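The claim that one adder element also yields logic functions can be checked exhaustively. This sketch models the element with H = A xor B, S = H xor C(k-1) and C(k) = A.B + H.C(k-1); the observation that the sum output gives XOR when C(k-1) = 0 follows directly from these equations.

```python
from itertools import product

def adder_element(a: int, b: int, c_in: int) -> tuple:
    """One-bit adder element: H = A xor B, S = H xor C(k-1), C(k) = A.B + H.C(k-1)."""
    h = a ^ b
    s = h ^ c_in
    c_out = (a & b) | (h & c_in)
    return s, c_out

for a, b in product((0, 1), repeat=2):
    s0, c0 = adder_element(a, b, 0)
    _, c1 = adder_element(a, b, 1)
    assert c0 == (a & b)   # C(k-1) = 0: carry output is A AND B
    assert c1 == (a | b)   # C(k-1) = 1: carry output is A OR B
    assert s0 == (a ^ b)   # C(k-1) = 0: sum output is A XOR B (the Equality/Ex-OR function)
print("carry gives AND / OR, sum gives XOR")
```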
Generation and propagation:
If $a_k = b_k = 1$, a carry is generated at position k, whatever the input carry. If we
are able to localize a chain of bits $a_k a_{k+1} \dots a_{k+p}$ and $b_k b_{k+1} \dots b_{k+p}$ for which
$a_i \neq b_i$ for every i in [k, k+p], then the output carry bit of this chain is equal to the
input carry bit of the chain.
These remarks constitute the principle of generation and propagation used to
speed up the addition of two numbers:
$p_k = a_k \oplus b_k$
$g_k = a_k b_k$
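The generate and propagate signals give the carry recurrence c(k+1) = g(k) + p(k).c(k), the same form as the ALU carry equation with H(k) = p(k). This sketch computes the carries with that recurrence, checks the result against integer addition, and shows that an all-propagate chain simply passes its input carry through.

```python
def carries(a: int, b: int, n: int, c0: int = 0) -> list:
    """Carries c(0)..c(n) of an n-bit addition using c(k+1) = g(k) + p(k).c(k)."""
    c = [c0]
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        g = ak & bk     # generate: a carry is produced regardless of c(k)
        p = ak ^ bk     # propagate: c(k) is passed on unchanged
        c.append(g | (p & c[k]))
    return c

def add_gp(a: int, b: int, n: int, c0: int = 0) -> tuple:
    """Sum bits s(k) = p(k) xor c(k); returns (sum, carry_out)."""
    c = carries(a, b, n, c0)
    s = sum((((a >> k) ^ (b >> k) ^ c[k]) & 1) << k for k in range(n))
    return s, c[n]

for a in range(16):
    for b in range(16):
        s, cout = add_gp(a, b, 4)
        assert s + (cout << 4) == a + b

# a_i != b_i everywhere: the chain's output carry equals its input carry.
print(carries(0b0101, 0b1010, 4, 0), carries(0b0101, 0b1010, 4, 1))
```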
In the schematic of Figure 6.12, the carry passes through a complete transmission
gate. If the carry path is precharged to VDD, the transmission gate is reduced to a
simple NMOS transistor. In the same way, the PMOS transistors of the carry generation
are removed. The result is a Manchester cell.
The Manchester cell is very fast, but a long chain of such cascaded cells would be
slow. This is due to the distributed RC effect and the body effect, which make the
propagation time grow with the square of the number of cells. In practice, an inverter is
added every four cells, as in Figure 6.14.
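The quadratic growth, and why a buffer every few cells helps, can be illustrated with the Elmore delay of an RC ladder: n identical sections give R.C.n(n+1)/2. This is a simplified model that ignores the body effect; the per-cell R and C (normalized to 1) and the buffer delay `t_buf` (2 RC units) are assumed illustrative values.

```python
import math

def elmore_chain(n: int, r: float = 1.0, c: float = 1.0) -> float:
    """Elmore delay of n identical RC sections in series: R*C*n(n+1)/2."""
    return r * c * n * (n + 1) / 2

def buffered_chain(n: int, k: int, t_buf: float, r: float = 1.0, c: float = 1.0) -> float:
    """Chain split into groups of k cells, with a buffer (inverter) between groups."""
    groups = math.ceil(n / k)
    last = n - (groups - 1) * k             # cells in the final, possibly shorter, group
    return (groups - 1) * (elmore_chain(k, r, c) + t_buf) + elmore_chain(last, r, c)

for n in (4, 8, 16, 32):
    print(n, "cells:", elmore_chain(n), "unbuffered,", buffered_chain(n, 4, t_buf=2.0), "buffered every 4")
```

In this model the unbuffered delay grows as n squared while the buffered delay grows roughly linearly in n.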
The operands of addition are the addend and the augend. The addend is added to
the augend to form the sum. In most computers, the augmented operand (the augend) is
replaced by the sum, whereas the addend is unchanged. High-speed adders are needed not
only for addition but also for subtraction, multiplication and division, so the speed of a
digital processor depends heavily on the speed of its adders. Adders add vectors of bits,
and the principal problem is to speed up the carry signal. A traditional, non-optimized
four-bit adder can be made by connecting generic one-bit adder cells one to the other:
this is the ripple carry adder. In this case, the sum at each stage needs to wait for the
incoming carry signal before it can be computed. The carry propagation can be sped up
in two ways. The first, and most obvious, way is to use a faster logic circuit technology.
The second way is to generate carries by means of forecasting logic that does not rely on
the carry signal being rippled from stage to stage of the adder.
Depending on the position at which a carry signal has been generated, the
propagation time can vary. In the best case, when there is no carry generation, the
addition time only accounts for the time to propagate the carry signal. Figure
6.15 is an example illustrating a carry signal generated twice, with the input carry
equal to 0. In this case three simultaneous carry propagations occur. The longest is the
second, which takes 7 cell delays (it starts at the 4th position and ends at the 11th
position). So the addition time of these two numbers with this 16-bit Ripple Carry Adder
is $7k + k'$, where k is the cell delay and k' is the time needed to compute the 11th sum bit
using the 11th carry-in.
With a Ripple Carry Adder, if the input bits Ai and Bi are different for all positions
i, then the carry signal is propagated at all positions (thus never generated), and the
addition is completed only when the carry signal has propagated through the whole adder.
In this case, the Ripple Carry Adder is as slow as it is large. In fact, Ripple Carry Adders
are fast only for some configurations of the input words, where carry signals are
generated at some positions.
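The data-dependent delay of the Ripple Carry Adder can be measured by tracking when each carry becomes valid. This sketch assumes a simple timing model: a generating or killing cell (a_i = b_i) fixes its carry-out one cell delay after its inputs, while a propagating cell (a_i != b_i) must wait for its carry-in and adds one cell delay. The input words below are hypothetical examples chosen to reproduce the 7-cell chain (position 4 to position 11) and the all-propagate worst case.

```python
def carry_arrival_times(a: int, b: int, n: int) -> list:
    """t[i] = cell delays until the carry into position i is valid (assumed timing model)."""
    t = [0]                                   # the input carry c0 is valid at time 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        # propagate: wait for the carry-in, then one cell delay;
        # generate or kill: carry-out is known one cell delay after the inputs
        t.append(t[i] + 1 if ai != bi else 1)
    return t

def ripple_add(a: int, b: int, n: int, c0: int = 0) -> tuple:
    """Bit-by-bit ripple carry addition; returns (sum, carry_out)."""
    s, c = 0, c0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c

# Carry generated at position 4, propagated through positions 5..10.
print(carry_arrival_times(0x07F0, 0x0010, 16)[11])    # carry-in of position 11
# Worst case: a_i != b_i at every position of a 16-bit adder.
print(max(carry_arrival_times(0x5555, 0xAAAA, 16)))
```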
Carry Skip Adders take advantage of both the generation and the propagation of the
carry signal. They are divided into blocks, where a special circuit quickly detects whether
all the bits to be added are different (Pi = 1 throughout the block). The signal produced by
this circuit is called the block propagation signal. If the carry is propagated at all
positions in the block, then the carry signal entering the block can bypass it directly and
be transmitted through a multiplexer to the next block. As soon as the carry signal is
transmitted to a block, it starts to propagate through the block, as if it had been generated
at the beginning of the block. Figure 6.16 shows the structure of a 24-bit Carry Skip
Adder, divided into 4 blocks.
There is therefore a trade-off between the speed and the size of the blocks. In
this part we analyze the division of the adder into blocks of equal size. Let us denote by
k1 the time needed by the carry signal to propagate through an adder cell, and by k2 the
time it needs to skip over one block. Suppose the N-bit Carry Skip Adder is divided into
M blocks, and each block contains P adder cells. The actual addition time of a Ripple
Carry Adder depends on the configuration of the input words. The completion time may
be small, but it may also reach the worst case, when all adder cells propagate the carry
signal. In the same way, we must evaluate the worst carry propagation time for the
Carry Skip Adder. The worst case of carry propagation is depicted in Figure 6.17.
The configuration of the input words is such that a carry signal is generated at the
beginning of the first block. Then this carry signal is propagated by all the succeeding
adder cells but the last which generates another carry signal. In the first and the last block
the block propagation signal is equal to 0, so the entering carry signal is not transmitted
to the next block. Consequently, in the first block, the last adder cells must wait for the
carry signal, which comes from the first cell of the first block. When going out of the first
$N = M \cdot P$
The time T needed by the carry signal to propagate through P adder cells is
$T = k_1 P$
The time T' needed by the carry signal to skip through M adder blocks is
$T' = k_2 M$
The problem to solve is to minimize the worst case delay, which is:
$T_{worst} = 2 k_1 P + k_2 (M - 2)$
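The block-size trade-off can be explored numerically. This sketch assumes the worst case described above costs a ripple through the first block, a skip over the M - 2 middle blocks and a ripple through the last block, i.e. 2.k1.P + k2.(M - 2) with P = N/M, and tries every equal division of N. Treating M as continuous and setting the derivative to zero gives M_opt = sqrt(2.k1.N / k2), printed for comparison; k1 = k2 = 1 are assumed values.

```python
import math

def skip_worst_delay(n: int, m: int, k1: float, k2: float) -> float:
    """Assumed worst-case carry delay of an n-bit carry-skip adder with m equal blocks."""
    p = n // m
    return 2 * k1 * p + k2 * (m - 2)

def best_block_count(n: int, k1: float, k2: float) -> int:
    """Try every division of n into m >= 2 equal blocks and keep the fastest."""
    candidates = [m for m in range(2, n + 1) if n % m == 0]
    return min(candidates, key=lambda m: skip_worst_delay(n, m, k1, k2))

n, k1, k2 = 24, 1.0, 1.0          # 24-bit adder as in Figure 6.16; k1, k2 assumed
m = best_block_count(n, k1, k2)
print("M =", m, "P =", n // m, "worst delay =", skip_worst_delay(n, m, k1, k2))
print("continuous optimum M =", round(math.sqrt(2 * k1 * n / k2), 2))
```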
This type of adder (the Carry Select Adder) is not as fast as the Carry Look Ahead
(CLA) adder presented in a later section. However, despite the larger amount of hardware
it needs, it has an interesting design concept. The Carry Select principle requires two
identical parallel adders that are partitioned into four-bit groups. Each group consists of
the same design as that shown in Figure 6.18. The group generates a group carry. In the
carry select adder, two sums are generated simultaneously: one sum assumes that the
carry in is equal to one, and the other assumes that the carry in is equal to zero. The
predicted group carry is then used to select one of the two sums.
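The carry select principle can be sketched behaviorally: each 4-bit group computes both possible sums in advance, and the incoming group carry only drives a selection. The group adder is modelled with integer addition, and the adder length is assumed to be a multiple of the group size; `group_add` and `carry_select_add` are hypothetical names.

```python
def group_add(a: int, b: int, cin: int, width: int) -> tuple:
    """Model of one width-bit group adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a: int, b: int, n: int, group: int = 4, cin: int = 0) -> tuple:
    """n-bit carry select addition; n is assumed to be a multiple of the group size."""
    assert n % group == 0
    mask = (1 << group) - 1
    result, carry = 0, cin
    for lo in range(0, n, group):
        ga, gb = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = group_add(ga, gb, 0, group)           # computed assuming carry-in = 0
        sum1, c1 = group_add(ga, gb, 1, group)           # computed assuming carry-in = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)   # the real carry just selects
        result |= s << lo
    return result, carry

print(carry_select_add(0xFF, 0x01, 8))    # sum 0x00 with carry out 1
```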
It can be seen that the group-carry logic increases rapidly when more high-order
groups are added to the total adder length. This complexity can be decreased, with a
subsequent increase in the delay, by partitioning a long adder into sections, with four
groups per section, similar to the CLA adder.
Usually, the size and complexity of a big adder using this equation are not
affordable. That is why the equation is used in a modular way, by making groups of
carries (usually four bits). Such a unit then generates a group carry, which gives the right
predicted information to the next block, giving the sum units time to perform their
calculation.
Figure-6.19: The Carry Generation unit performing the Carry group computation
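A 4-bit carry look-ahead group can be sketched by expanding c(k+1) = g(k) + p(k).c(k) so that every carry is a two-level AND-OR function of the g, p signals and c0. The group generate G and group propagate P, which a next level of look-ahead would use, are also computed. This is a behavioral sketch, not a transistor-level design.

```python
def cla4(g: list, p: list, c0: int) -> tuple:
    """Carry look-ahead for one 4-bit group (index 0 = least significant bit)."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    # Group generate / propagate for the next level of look-ahead
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return [c0, c1, c2, c3, c4], G, P

def cla_add4(a: int, b: int, c0: int = 0) -> tuple:
    """4-bit addition with all carries computed in parallel; returns (sum, cout, G, P)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    c, G, P = cla4(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4], G, P

print(cla_add4(0b1011, 0b0110))   # 11 + 6 = 17: sum 0b0001, carry out 1
```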
The same design is available with fewer transistors in a dynamic logic design. The
sizing is still an important issue, but the number of transistors is reduced (Figure 6.21).
Introduction
Serial-Parallel Multiplier
The structure of Figure 6.23 is suited only for positive operands. If the operands are
negative and coded in 2’s complement:
We see that subtraction cells must be used. In order to use only adder cells, the
negative terms may be rewritten as:
because:
Booth Algorithm
This algorithm is a powerful direct algorithm for signed-number multiplication. It
generates a 2n-bit product and treats both positive and negative numbers uniformly. The
idea is to reduce the number of additions to perform. The Booth algorithm needs only n/2
additions in the best case, whereas the modified Booth algorithm always needs n/2 additions.
$2^{i+k} - 2^{i} = 2^{i+k-1} + 2^{i+k-2} + \dots + 2^{i+1} + 2^{i}$
In fact, the modified Booth algorithm converts a signed number from the standard
2's-complement radix into a number system where the digits are in the set {-1, 0, 1};
pairs of these digits combine into the radix-4 multiples 0, ±X and ±2X of Table 1. In
this number system, any number may be written in several forms, so the system is called
redundant.
The coding table for the modified Booth algorithm is given in Table 1. The
algorithm scans strings composed of three digits. Depending on the value of the string, a
certain operation is performed.

Table 1: Modified Booth coding (X = multiplicand)
Bit weight:  2^1     2^0   2^-1
             Yi+1    Yi    Yi-1   Operation                                           Multiple
             0       0     0      add zero (no string)                                +0
             0       0     1      add the multiplicand (end of string)                +X
             0       1     0      add the multiplicand (a string)                     +X
             0       1     1      add twice the multiplicand (end of string)          +2X
             1       0     0      subtract twice the multiplicand (beg. of string)    -2X
             1       0     1      subtract the multiplicand (-2X and +X)              -X
             1       1     0      subtract the multiplicand (beg. of string)          -X
             1       1     1      subtract zero (center of string)                    -0
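Table 1 can be reproduced by one formula: for each triple (Yi+1, Yi, Yi-1), scanned at i = 0, 2, 4, ... with Y-1 = 0, the digit is -2.Yi+1 + Yi + Yi-1, which is always in {-2, -1, 0, 1, 2}. This sketch recodes an n-bit two's-complement multiplier (n assumed even) and forms the product with one partial product per digit, that is n/2 additions.

```python
def booth_radix4_digits(y: int, n: int) -> list:
    """Radix-4 (modified Booth) digits of an n-bit two's-complement multiplier, n even."""
    bits = [(y >> i) & 1 for i in range(n)]      # works for negative Python ints too
    digits = []
    for i in range(0, n, 2):
        y_prev = bits[i - 1] if i > 0 else 0     # Y(-1) = 0
        digits.append(-2 * bits[i + 1] + bits[i] + y_prev)
    return digits

def booth_multiply(x: int, y: int, n: int) -> int:
    """Signed product X * Y using one partial product per radix-4 digit."""
    product = 0
    for j, d in enumerate(booth_radix4_digits(y, n)):
        product += (d * x) << (2 * j)            # d*X is one of 0, +-X, +-2X
    return product

print(booth_radix4_digits(6, 4), booth_multiply(-7, 6, 4))
```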
Wallace Trees
For this purpose, Wallace trees were introduced. The addition time grows like the
logarithm of the number of bits. The simplest Wallace tree is the adder cell. More generally,
an n-input Wallace tree is an operator with n inputs and about log2(n) outputs (precisely
⌈log2(n+1)⌉), such that the value of the output word is equal to the number of “1”s in the
input word. The input bits and the least significant bit of the output have the same weight
(Figure 6.27). An important property of Wallace trees is that they may be constructed
using adder cells. Furthermore, the number of levels of adder cells, and hence the delay,
grows like the logarithm log2(n) of the number n of input bits. Consequently, Wallace
trees are useful whenever a large number of operands are to be added, as in multipliers.
In a Braun or Baugh-Wooley multiplier with a Ripple Carry Adder, the completion time of
the multiplication is proportional to twice the number n of bits. If the collection of the
partial products is made through Wallace trees, the time for getting the result in carry-save
notation should be proportional to log2(n).
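The logarithmic depth can be seen by reducing many operands with layers of adder cells. In this sketch each `carry_save_add` stands for a row of full adders working on all bit positions in parallel (a 3:2 reduction); the tree applies it to whole operands, assumed non-negative, until two remain for a final carry-propagate adder, and reports the number of layers.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple:
    """One row of full adders (3:2 reduction): three operands in, sum word and carry word out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def wallace_reduce(operands: list) -> tuple:
    """Reduce non-negative operands to two with layers of full adders.

    Returns the two remaining operands and the number of adder layers used."""
    ops = list(operands)
    layers = 0
    while len(ops) > 2:
        nxt = []
        full = len(ops) - len(ops) % 3
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])        # 0, 1 or 2 leftover operands pass to the next layer
        ops = nxt
        layers += 1
    return ops, layers

# Partial products of 13 x 11 (4-bit unsigned operands)
x, y = 13, 11
partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(4)]
pair, layers = wallace_reduce(partial_products)
print(sum(pair), layers)
for n in (3, 9, 16, 64):
    print(n, "operands ->", wallace_reduce(list(range(1, n + 1)))[1], "layers")
```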
Power dissipation
• static dissipation is very small
• dynamic power is significant
• dissipation can be reduced by alternate geometry
Volatility
• data storage time is limited to 1 ms or less
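Circuit diagram
[Figure: three-transistor dynamic memory cell with transistors T1, T2 and T3, a bus line, WR and RD control lines, VDD and GND.]
Circuit diagram
[Figure: cell with WR, φ1 and RD, φ1 controls, clock φ2 and output O/P.]
Figure 7.3: nMOS pseudo-static memory Cell
[Figure: cell with WR, φ1 and RD, φ1 controls, clock φ2 and output O/P.]
Figure 7.4: CMOS pseudo-static memory Cell
Circuit diagram
[Figure: memory cell connected to the bit lines bit and bit_b and to the word line.]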
Definition:
Design for testability (DFT) refers to those design techniques that make test generation
and test application cost-effective.
Some terminologies:
Input / output (I/O) pads
• Protect the circuitry on the chip from damage
• Care must be taken in handling all MOS circuits
• Provide the necessary buffering between the on-chip and off-chip environments
• Provide for the connections of the power supply
• Pads must always be placed around the periphery
The minimum set of pads includes:
• VDD connection pad
• GND(VSS) connection pad
• Input pad
• Output pad
• Bidirectional I/O pad
The designer must be aware of:
• the nature of the circuitry
• the ratio/size of the inverters/buffers to which output lines are connected
• how input lines pass through the pad circuit (pass transistor/transmission gate)
System delays
Buses:
• a convenient concept for distributing data and control through a system
• bidirectional buses are convenient
• convenient in the design of datapaths
• problem: the capacitive load present
A fault model is a model of how a physical or parametric fault manifests itself in
circuit operation. Fault tests are derived based on these models.
Physical faults are caused by the following:
Defects in the silicon substrate
Photolithographic defects
Mask contamination and scratches
Process variations and abnormalities
Oxide defects
Physical faults cause electrical and logical faults.
Logical faults are:
Single/multiple stuck-at (most used)
CMOS stuck-open
CMOS stuck-on
AND / OR bridging faults
Electrical faults are due to shorts, opens, transistors stuck on or stuck open, excessive
steady-state currents, and resistive shorts and opens.
The first idea to test an N-input circuit would be to apply an N-bit counter to the
inputs (controllability), generate all the $2^N$ combinations, and observe the outputs
for checking (observability). This is called "exhaustive testing", and it is very efficient,
but only for circuits with few inputs. As the number of inputs increases, this technique
becomes very time-consuming.
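Exhaustive testing against a stuck-at fault model fits in a few lines: apply all 2^N vectors and keep those where the faulty circuit's output differs from the fault-free one. The circuit y = (a AND b) OR c with internal net n1 = a AND b is a hypothetical example, and a fault is modelled by forcing one net to a constant.

```python
from itertools import product

def circuit(a: int, b: int, c: int, stuck=None) -> int:
    """Hypothetical circuit y = (a AND b) OR c with internal net n1 = a AND b.

    stuck is None (fault-free) or a (net_name, value) pair forcing that net."""
    def net(name: str, value: int) -> int:
        return stuck[1] if stuck is not None and stuck[0] == name else value
    a, b, c = net("a", a), net("b", b), net("c", c)
    n1 = net("n1", a & b)
    return net("y", n1 | c)

def exhaustive_tests(stuck) -> list:
    """Apply all 2**N input vectors; keep those whose output differs from the fault-free circuit."""
    return [v for v in product((0, 1), repeat=3) if circuit(*v) != circuit(*v, stuck=stuck)]

print(exhaustive_tests(("n1", 0)))   # vectors detecting n1 stuck-at-0
print(exhaustive_tests(("n1", 1)))   # vectors detecting n1 stuck-at-1
print([2 ** n for n in (8, 16, 32)])  # the vector count grows exponentially with N
```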
Most of the time, in exhaustive testing, many patterns do not occur during the
application of the circuit. So instead of spending a huge amount of time searching for
faults everywhere, the possible faults are first enumerated and a set of appropriate vectors
is then generated. This is called "single-path sensitization" and it is based on "fault-
oriented testing".
• Manifestation: gate inputs at the site of the fault are specified so as to generate the
opposite of the faulty value (0 for SA1, 1 for SA0).
• Propagation: inputs of the other gates are determined so as to propagate the fault
signal along the specified path to the primary output of the circuit. This is done by
setting these inputs to "1" for AND/NAND gates and "0" for OR/NOR gates.
• Consistency (or justification): this final step finds the primary input pattern that
will realize all the necessary input values. This is done by tracing backward from the
gate inputs to the primary inputs of the logic in order to obtain the test patterns.
Example 1 - SA1 of line 1 (L1): the aim is to find the vector(s) able to detect this fault.
These three steps have led to four possible vectors detecting L1 = SA1.
Example 2 - SA1 of line 8 (L8): the same combinational logic, with one internal line
SA1.
• Manifestation: L8 = 0
• Propagation: through the AND-gate, L5 = L1 = 1, then L10 = 0; through the
NOR-gate, we want L11 = 0, so as not to mask L10 = 0.
• Consistency: from the AND-gate, L8 = 0 leads to L7 = 0. From the NOT-gate,
L11 = 0 means L9 = L7 = 1. L7 cannot be set to 1 and 0 at the same time. This
incompatibility cannot be resolved in this case, and the fault "L8 SA1" remains
undetectable.
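A fault can be undetectable when it sits on redundant logic, just as "L8 SA1" is in Example 2. In the hypothetical circuit below, y = a AND (a OR b), which simplifies to y = a. Manifesting n1 stuck-at-1 requires n1 = 0, so a = b = 0, while propagating it through the AND-gate requires a = 1: the same kind of conflict. The exhaustive search confirms that no vector detects the fault.

```python
from itertools import product

def redundant_circuit(a: int, b: int, stuck=None) -> int:
    """Hypothetical circuit y = a AND n1 with n1 = a OR b (logically, y = a)."""
    def net(name: str, value: int) -> int:
        return stuck[1] if stuck is not None and stuck[0] == name else value
    a, b = net("a", a), net("b", b)
    n1 = net("n1", a | b)
    return net("y", a & n1)

def detecting_vectors(stuck) -> list:
    """All input vectors for which the faulty output differs from the fault-free one."""
    return [v for v in product((0, 1), repeat=2)
            if redundant_circuit(*v) != redundant_circuit(*v, stuck=stuck)]

print(detecting_vectors(("n1", 0)))  # detectable: needs a = 1
print(detecting_vectors(("n1", 1)))  # empty list: n1 stuck-at-1 cannot be detected
```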
D-Algorithm:
Practical guidelines for testability should aim to facilitate test processes in three
main ways:
All "design for test" methods ensure that a design has enough observability and
controllability to provide for a complete and efficient testing. When a node has difficult
access from primary inputs or outputs (pads of the circuit), a very efficient method is to
add internal pads giving access to this kind of node in order, for instance, to control
block B2 and observe block B1 with a probe.
It is easy to observe block B1 by adding a pad just on its output, without breaking
the link between the two blocks. The control of the block B2 means to set a 0 or a 1 to its
input, and also to be transparent to the link B1-B2. The logic functions used for this
purpose are a NOR-gate, which is transparent when its other input is 0, and a NAND-gate,
which is transparent when its other input is 1. In this way the control of B2 is possible
across these two gates.
In this case the major penalties are extra devices and propagation delays due to
multiplexers. Demultiplexers are also used to improve observability. Using multiplexers
and demultiplexers allows internal access of blocks separately from each other, which is
the basis of techniques based on partitioning or bypassing blocks to observe or control
separately other blocks.
Based on the same principle of partitioning, the counters are sequential elements
that need a large number of vectors to be fully tested. The partitioning of a long counter
corresponds to its division into sub-counters.
The full test of a 16-bit counter requires the application of $2^{16} + 1 = 65537$ clock
pulses. If this counter is divided into two 8-bit counters, then each counter can be tested
separately, and the total test time is reduced by a factor of about 128 ($2^7$). This is also useful if there are
subsequent requirements to set the counter to a particular count for tests associated with
other parts of the circuit: pre-loading facilities.
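The saving from partitioning can be checked with simple arithmetic. This sketch assumes the two 8-bit halves are tested one after the other, which gives the factor of about 128 quoted above; testing the halves in parallel would save roughly twice as much.

```python
def full_test_pulses(bits: int) -> int:
    """Clock pulses to step a counter through every state and back: 2**bits + 1."""
    return 2 ** bits + 1

whole = full_test_pulses(16)            # one 16-bit counter
split = 2 * full_test_pulses(8)         # two 8-bit counters tested one after the other
print(whole, split, round(whole / split, 1))
```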
One of the most important problems in sequential logic testing occurs at the time
of power-on, where the first state is random if there were no initialization. In this case it
is impossible to start a test sequence correctly, because of memory effects of the
sequential elements.
The solution is to provide flip-flops or latches with a set or reset input, and then to
use them so that the test sequence would start with a known state.
Ideally, all memory elements should be able to be set to a known state, but in
practice this could be very area-consuming; also, it is not always necessary to
initialize all the sequential logic. For example, a serial-in serial-out counter could have its
first flip-flop provided with an initialization; then, after a few clock pulses, the counter is
in a known state.
Tester override is sometimes necessary, and requires the addition of gates before a
Set or a Reset input so that the tester can override the initialization state of the logic.
Automatic test pattern generators (ATPGs) work in the logic domain; they view
delay-dependent logic as redundant combinational logic. In this case the ATPG will see an
AND of a signal with its complement, and will therefore always compute a 0 on the
output of the AND-gate (instead of a pulse). Adding an OR-gate after the AND-gate
output allows the ATPG to substitute a clock signal directly.
When a clock signal is gated with any data signal, for example a load signal
coming from a tester, a skew or any other hazard on that signal can cause an error on the
output of logic.
This is also due to asynchronous logic. Clock signals should be distributed
in the circuit in accordance with a synchronous logic structure.
This is another timing situation to avoid, in which the tester could not be
synchronized if one clock or more are dependent on asynchronous delays (across D-input
of flip-flops, for example).
Self-resetting logic is more closely related to asynchronous logic, since a reset input
is independent of the clock signal.
Before the delayed reset, the tester reads the set value and continues the normal
operation. If a reset has occurred before tester observation, then the read value is
erroneous. The solution to this problem is to allow the tester to override by adding an
OR-gate, for example, with an inhibition input coming from the tester. By this way the
right response is given to the tester at the right time.
The tester can then disconnect any module from the buses by putting its output
into a high-impedance state. Test patterns can then be applied to each module separately.
Testing analog circuits requires a completely different strategy from that used for digital
circuits. Also, the sharp edges of digital signals can cause cross-talk problems in the analog
lines, if they are close to each other.
If it is necessary to route digital signals near analog lines, then the digital lines
should be properly balanced and shielded. Also, in the cases of circuits like Analog-
Digital converters, it is better to bring out analog signals for observation before
conversion. For Digital-Analog converters, digital signals are to be brought out also for
observation before conversion.
The set of design for testability guidelines presented above is a set of ad hoc
methods to design random logic with respect to testability requirements. The scan design
techniques are a set of structured approaches to design (for testability) the sequential
circuits.
The major difficulty in testing sequential circuits is determining the internal state
of the circuit. Scan design techniques are directed at improving the controllability and
observability of the internal states of a sequential circuit. By this the problem of testing a
sequential circuit is reduced to that of testing a combinational circuit, since the internal
states of the circuit are under control.
Scan Path
The goal of the scan path technique is to reconfigure a sequential circuit, for the
purpose of testing, into a combinational circuit. Since a sequential circuit is based on a
combinational circuit and some storage elements, the technique of scan path consists of
connecting together all the storage elements to form a long serial shift register. Thus the
internal state of the circuit can be observed and controlled by shifting (scanning) out the
contents of the storage elements. The shift register is then called a scan path.
The storage elements can be D, J-K, or R-S flip-flops, but simple latches cannot be
used in a scan path. However, the structure of the storage elements is slightly different
from classical ones. Generally, the selection of the input source is achieved using a
multiplexer on the data input, controlled by an external mode signal. This multiplexer is
integrated into the D flip-flop, which in our case is then called an MD flip-flop
(multiplexed flip-flop).
As analyzed from figure 8.13, in the normal mode, the storage elements are
connected to the combinational circuit, in the loops of the global sequential circuit, which
is considered then as a finite state machine.
In the test mode, the loops are broken and the storage elements are connected
together as a serial shift register (scan path), receiving the same clock signal. The input of
the scan path is called scan-in and the output scan-out. Several scan paths can be
implemented in the same complex circuit if necessary, giving several scan-in inputs and
scan-out outputs.
Before applying test patterns, the shift register itself has to be verified by shifting
in all ones i.e. 111...11, or zeros i.e. 000...00, and comparing.
1. Set test mode signal, flip-flops accept data from input scan-in
2. Verify the scan path by shifting in and out test data
3. Set the shift register to an initial state
4. Apply a test pattern to the primary inputs of the circuit
5. Set normal mode; the circuit settles and the primary outputs of the circuit can be
monitored
6. Activate the circuit clock for one cycle
7. Return to test mode
8. Scan out the contents of the registers, simultaneously scan in the next pattern
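The procedure above can be walked through on a small model. In this sketch the storage elements are MD flip-flops linked into one scan path (q[0] fed from scan-in, q[-1] driving scan-out), and the combinational block is a hypothetical 3-bit counter with enable whose primary output flags the terminal count. For simplicity, zeros are scanned in while the result is scanned out, instead of the next pattern.

```python
from typing import Callable, List, Tuple

Bits = List[int]

class ScanChain:
    """Storage elements as MD (multiplexed D) flip-flops linked into one scan path."""
    def __init__(self, n: int):
        self.q: Bits = [0] * n

    def clock(self, test_mode: bool, d: Bits, scan_in: int = 0) -> int:
        """One clock edge; returns the scan-out value seen before the edge."""
        scan_out = self.q[-1]
        if test_mode:                    # mux selects the previous flop in the chain
            self.q = [scan_in] + self.q[:-1]
        else:                            # mux selects the functional D inputs
            self.q = list(d)
        return scan_out

    def shift_in(self, pattern: Bits) -> None:
        for bit in reversed(pattern):    # the last bit enters first and travels furthest
            self.clock(True, [], bit)

    def shift_out(self) -> Bits:
        out = [self.clock(True, [], 0) for _ in range(len(self.q))]
        return out[::-1]                 # restore q[0] .. q[-1] order

def counter_logic(inputs: Bits, state: Bits) -> Tuple[Bits, Bits]:
    """Hypothetical block: 3-bit counter (LSB first) with enable; output = terminal count."""
    value = sum(bit << i for i, bit in enumerate(state))
    nxt = (value + inputs[0]) % 8
    return [(nxt >> i) & 1 for i in range(3)], [int(value == 7)]

def scan_test(chain: ScanChain, logic: Callable, pattern: Bits, primary_inputs: Bits):
    chain.shift_in(pattern)                                 # steps 1-3: test mode, load state
    next_state, outputs = logic(primary_inputs, chain.q)    # steps 4-5: apply inputs, observe
    chain.clock(False, next_state)                          # step 6: one functional clock
    return chain.shift_out(), outputs                       # steps 7-8: scan out the result

chain = ScanChain(3)
chain.shift_in([1, 1, 1])
print("scan path check:", chain.shift_out())                # step 2: shift all ones through
print(scan_test(chain, counter_logic, [1, 1, 0], [1]))      # state 3 -> captured state 4
```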
The scan path aspect is due to the use of shift register latches (SRL) as storage
elements. In the test mode they are connected as a long serial shift register. Each SRL
has a specific design similar to a master-slave FF. It is driven by two non-overlapping
clocks which can be controlled readily from the primary inputs to the circuit.
Input D1 is the normal data input to the SRL; clocks CK1 and CK2 control the normal
operation of the SRL, while clocks CK3 and CK2 control scan path movements through
the SRL. The SRL output is derived at L2 in both modes of operation, the mode
depending on which clocks are activated.
Advantages:
• Circuit operation is independent of dynamic characteristics of the logic elements
• ATP generation is simplified
• Eliminates hazards and races
• Simplifies test generation and fault simulation
Boundary Scan Test (BST) is a technique involving scan path and self-testing
techniques to resolve the problem of testing boards carrying VLSI integrated circuits
and/or surface mounted devices (SMD).
Printed circuit boards (PCB) are becoming so dense and complex, especially
with SMD circuits, that most test equipment cannot guarantee good fault coverage.
BST (figure 8.15) consists of placing a scan path (shift register) cell adjacent to each
component pin and interconnecting the cells to form a chain around the border of
the circuit. The BST circuits contained on one board are then connected together to form
a single path through the board.
The boundary scan path is provided with serial input and output pads and appropriate
clock pads which make it possible to:
Procedure:
Set test inputs to all test points
Apply the master reset signal to initialize all memory elements
Set scan-in address & data, then apply the scan clock
Repeat the above step until all internal test inputs are scanned
Clock once for normal operation
Check states of the output points
Read the scan-out states of all memory elements by applying the address
Built-in-self test
Objectives:
1. To reduce test pattern generation cost
2. To reduce volume of test data
3. To reduce test time
Built-in Self Test, or BIST, is the technique of designing additional hardware and
software features into integrated circuits to allow them to perform self-testing, i.e., testing
of their own operation (functionally, parametrically, or both) using their own circuits,
thereby reducing dependence on an external automated test equipment (ATE).
Signature analysis performs polynomial division, that is, division of the output data
stream of the device under test (DUT). This data is represented as a polynomial P(x),
which is divided (modulo-2 arithmetic) by a characteristic polynomial C(x); the
signature R(x) is the remainder of this division:
P(x) = Q(x)·C(x) + R(x), i.e. R(x) = P(x) mod C(x)
This is summarized in figure 8.16.
Figure 8.16: TPG (digital tester) → DUT → compaction (signature analysis)
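The division can be checked with a short sketch. Polynomials over GF(2) are stored as Python integers (bit i is the coefficient of x^i), so subtraction is XOR. The characteristic polynomial C(x) = x^3 + x + 1 and the bit streams are arbitrary examples, and the function names are hypothetical. The second function models a serial signature register that receives one DUT output bit per clock and ends up holding the same remainder R(x).

```python
def gf2_mod(p: int, c: int) -> int:
    """Remainder R(x) of P(x) / C(x) by long division over GF(2)."""
    deg_c = c.bit_length() - 1
    while p.bit_length() - 1 >= deg_c:
        p ^= c << (p.bit_length() - 1 - deg_c)  # subtract a shifted copy of C(x)
    return p


def serial_signature(bits, c: int) -> int:
    """Serial signature register: DUT output bits enter one per clock, highest-degree
    coefficient of P(x) first; the register is left holding R(x)."""
    deg_c = c.bit_length() - 1
    r = 0
    for b in bits:
        r = (r << 1) | b        # shift the next DUT output bit in
        if (r >> deg_c) & 1:    # degree reached deg C(x): XOR feedback subtracts C(x)
            r ^= c
    return r


C = 0b1011                       # C(x) = x^3 + x + 1 (example choice)
good = [1, 1, 0, 1, 0, 1]        # P(x) = x^5 + x^4 + x^2 + 1
faulty = [1, 1, 0, 1, 0, 0]      # same stream with the last bit in error
print(bin(gf2_mod(0b110101, C)))         # 0b100, i.e. R(x) = x^2
print(bin(serial_signature(good, C)))    # 0b100
print(bin(serial_signature(faulty, C)))  # 0b101, so the fault is detected
```

A single-bit error always changes the signature here, but two streams whose difference is a multiple of C(x) produce the same signature (aliasing), so signature analysis is a compaction method, not a lossless comparison.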
An LFSR is a shift register that, when clocked, advances the signal through the
register from one bit to the next most-significant bit. Some of the outputs are combined in
an exclusive-OR configuration to form a feedback mechanism: a linear feedback shift
register is formed by exclusive-ORing (Figure 8.16) the outputs of two or more of the
flip-flops and feeding the result back into the input of one of the flip-flops.
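The behaviour just described can be sketched in a few lines. The choice of a 3-bit register with feedback from Q1 and Q2 into Q0 is an assumption made for the example (it happens to give the maximal period of 2^3 - 1 = 7 states); the function name lfsr_states is hypothetical.

```python
from typing import List, Sequence, Tuple


def lfsr_states(seed: Sequence[int], taps: Sequence[int],
                n: int) -> List[Tuple[int, ...]]:
    """Return n successive states (Q0, Q1, ...) of a shift register whose Q0
    input is the XOR of the tapped flip-flop outputs."""
    state = list(seed)
    states = []
    for _ in range(n):
        states.append(tuple(state))
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]  # Q0 -> Q1 -> Q2, feedback enters Q0
    return states


for s in lfsr_states([1, 0, 0], taps=(1, 2), n=8):
    print(s)
# (1, 0, 0) (0, 1, 0) (1, 0, 1) (1, 1, 0) (1, 1, 1) (0, 1, 1) (0, 0, 1) (1, 0, 0)
```

The register cycles through all seven non-zero states before repeating. The all-zero state is excluded because, with XOR feedback, a register holding all zeros stays there forever, so an LFSR must be seeded with a non-zero value.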
Figure (BILBO register): inputs i0, i1, i2; D0; clock; outputs Q0, Q1, Q2
When both the A and B inputs are 0, the D inputs are ignored (due to the AND gate
connected to A), but the flip-flops are connected as a shift register via the NOR and XOR
gates. The input to the first flip-flop is then selected by the multiplexer controlled by the
S input. If S is 1, the multiplexer passes the value of the external shift-in input SIN to the
first flip-flop, so that the BILBO register works as a normal shift register. This allows the
register contents to be initialized over a single signal wire, e.g. from an external test
controller.
If all of the A, B and S inputs are 0, the flip-flops are again configured as a shift
register, but the input bit to the first flip-flop is computed by the XOR gates in the
LFSR feedback path. The register then works as a standard LFSR pseudorandom pattern
generator, useful for driving the logic connected to the Q outputs. Note that the start
value of the LFSR sequence can be set by first shifting it in via the SIN input.
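The two A = B = 0 modes can be illustrated together: first shift a start value in through SIN with S = 1, then switch S to 0 so the same flip-flops run as an LFSR. The sketch models only this behaviour (the other A/B combinations are not modelled), and the class name BilboShiftModes, the 3-bit width and the feedback taps Q1, Q2 are assumptions for the example.

```python
from typing import List, Sequence, Tuple


class BilboShiftModes:
    """Hypothetical model of only the A = B = 0 behaviour of a BILBO register."""

    def __init__(self, n: int, taps: Sequence[int]) -> None:
        self.q: List[int] = [0] * n
        self.taps = tuple(taps)      # assumed feedback taps of the XOR network

    def clock(self, s: int, sin: int = 0) -> Tuple[int, ...]:
        if s == 1:
            first = sin              # multiplexer selects the external SIN wire
        else:
            first = 0                # multiplexer selects the LFSR feedback
            for t in self.taps:
                first ^= self.q[t]
        self.q = [first] + self.q[:-1]   # shift Q0 -> Q1 -> Q2
        return tuple(self.q)

    def load(self, seed: Sequence[int]) -> None:
        """S = 1: shift a start value in through SIN, last flip-flop's bit first."""
        for bit in reversed(seed):
            self.clock(s=1, sin=bit)


reg = BilboShiftModes(3, taps=(1, 2))
reg.load([1, 0, 0])                            # Q0, Q1, Q2 = 1, 0, 0
patterns = [reg.clock(s=0) for _ in range(7)]  # S = 0: pseudorandom patterns
print(patterns)
```

Seeding through SIN means the start value of the pattern sequence is set without touching the parallel D inputs, which is exactly what the single-wire initialization described above provides.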
Because a BILBO register can act both as a pattern generator for the block it drives and
as a signature analyzer for the block that drives it, a whole circuit can be made
self-testable with very low overhead and only minimal performance degradation (two
extra gates before the D inputs of the flip-flops).