Design and Estimation of Delay


ABSTRACT

Parallel-prefix adders (also known as carry tree adders) are known to have the best performance in
VLSI designs. However, this performance advantage does not translate directly into FPGA implementations
due to constraints on logic block configurations and routing overhead. This paper investigates three types of
carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and spanning tree adder) and compares them to the
simple Ripple Carry Adder (RCA) and Carry Skip Adder (CSA). These designs of varied bit-widths were
implemented on a Xilinx Spartan 3E FPGA and delay measurements were made with a high-performance
logic analyzer. Due to the presence of a fast carry-chain, the RCA designs exhibit better delay performance
up to 128 bits. The carry-tree adders are expected to gain a speed advantage over the RCA only as bit
widths increase beyond 128.

SOFTWARE LANGUAGES:

HDL LANGUAGE : VHDL

SIMULATION TOOL : XILINX ISE SIMULATOR

SYNTHESIS TOOL : XILINX 9.1i

FPGA : SPARTAN 3E

CHAPTER-1

1. INTRODUCTION

In Very Large Scale Integration (VLSI) designs, parallel prefix adders (PPAs) have better delay
performance. This paper investigates four types of PPAs: the Kogge-Stone Adder (KSA), Spanning Tree
Adder (STA), Brent-Kung Adder (BKA) and Sparse Kogge-Stone Adder (SKA). Additionally, the Ripple
Carry Adder (RCA), Carry Lookahead Adder (CLA) and Carry Skip Adder (CSA) are also investigated.
These adders are implemented in the Verilog Hardware Description Language (HDL) using the Xilinx
Integrated Software Environment (ISE) 13.2 Design Suite. The designs are implemented on a Xilinx Virtex 5
Field Programmable Gate Array (FPGA), delays are measured using an Agilent 1692A logic analyzer, and
finally the delay, power and area of all these adders are investigated and compared.
Binary addition is the basic arithmetic operation in digital circuits, and it is essential in most digital
systems, including Arithmetic and Logic Units (ALUs), microprocessors and Digital Signal Processors
(DSPs). Research continues on improving adder delay performance. In many practical applications, such as
mobile and telecommunication systems, FPGAs offer better speed and power performance than
microprocessor- and DSP-based solutions. Power is also an important aspect of the growing trend toward
mobile electronics, which makes large-scale use of DSP functions. Because of their programmability, the
structure of their configurable logic blocks (CLBs) and their programmable interconnects, FPGAs allow
parallel prefix adders to be implemented efficiently. In this paper, the above-mentioned PPAs, the RCA and
the CSA are implemented and characterized on a Xilinx Virtex 5 FPGA. Finally, delay, power and area for
the designed adders are presented and compared.

CHAPTER-2

2. LITERATURE SURVEY

One of the most universal digital circuits for almost any application is the adder. It is the fundamental
building block of Arithmetic Logic Units (ALUs) in general-purpose and special-purpose digital signal
microprocessors. Currently, in the CMOS domain, the design space of adder structures has been nearly
exhausted, with only minimal improvements shown over previous designs. In contrast, emerging digital
circuit technologies such as superconducting Rapid Single Flux Quantum (RSFQ) logic open a way for
researchers to explore new design methodologies for extremely fast, energy-efficient adders. In RSFQ logic,
most adder designs demonstrated to date are bit-serial or digit-serial architectures which operate on a single
bit or a small group of bits sequentially at a very high processing rate [1]–[6]. Such designs allow for simple
clocking and compact structures. However, the latency of serial adders scales as O(n), where n is the number
of bits per operand, which leads to long latencies for 32-/64-bit operations in general-purpose processors. In
the past, parallel architectures in RSFQ have been limited to small data widths or relatively long-latency
ripple-carry adders [7]–[9]. One study evaluated 32-/64-bit parallel Kogge-Stone RSFQ adders using co-flow
clocking [10].

In the effort to realize scalable, high-performance, fully parallel designs, a new technique of
asynchronous hybrid wave-pipelining for RSFQ circuits has been developed at Stony Brook University
(SBU) [11], [12]. Later, as a result of the collaboration between the SBU and HYPRES designers, an 8-bit
wave-pipelined ALU was successfully designed, fabricated, and demonstrated correct operation at a rate of
20 GHz [13], [14]. In this paper, we present the design of the first 16-bit asynchronous parallel adder
implemented in RSFQ logic. It builds upon the proven hybrid wave-pipelining techniques to provide 16-bit-
wide processing and synchronization. It incorporates an energy-efficient, low-complexity sparse-tree
structure with a very high processing rate. The work is based on a design study for a scalable 32-bit wave-
pipelined sparse-tree adder conducted at SBU.

CHAPTER-3

ADDERS

• In electronics, an adder or summer is a digital circuit that performs addition of numbers. In many
computers and other kinds of processors, adders are used not only in the arithmetic logic unit(s), but also in
other parts of the processor, where they are used to calculate addresses, table indices, and similar values.

Although adders can be constructed for many numerical representations, such as binary-coded
decimal or excess-3, the most common adders operate on binary numbers. In cases where two's
complement or ones' complement is being used to represent negative numbers, it is trivial to modify an
adder into an adder–subtractor. Other signed number representations require a more complex adder.

• Half adder
The half adder adds two one-bit binary numbers A and B. It has two outputs, S and C (the value
theoretically carried on to the next addition); the final sum is 2C + S. The simplest half-adder design
incorporates an XOR gate for S and an AND gate for C. With the addition of an OR gate to combine their
carry outputs, two half adders can be combined to make a full adder.

A full adder adds binary numbers and accounts for values carried in as well as out. A one-bit full
adder adds three one-bit numbers, often written as A, B, and Cin; A and B are the operands, and Cin is a bit
carried in from the next less significant stage. The full adder is usually a component in a cascade of adders,
which add 8-, 16-, 32-bit, etc. binary numbers.

A full adder can be implemented in many different ways, such as with a custom transistor-level circuit or
composed of other gates. One example implementation is with

S = A xor B xor Cin
Cout = (A and B) or (Cin and (A xor B))

In this implementation, the final OR gate before the carry-out output may be replaced by an XOR gate
without altering the resulting logic. Using only two types of gates is convenient if the circuit is being
implemented using simple IC chips which contain only one gate type per chip. In this light, Cout can be

implemented as Cout = (A and B) xor (Cin and (A xor B)).

A full adder can be constructed from two half adders by connecting A and B to the input of one half adder,
connecting the sum from that to an input of the second half adder, connecting Cin to the other input, and
ORing the two carry outputs. Equivalently, S could be made the three-bit XOR of A, B, and Cin, and Cout
could be made the three-bit majority function of A, B, and Cin.
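The construction just described maps directly into a few lines of Verilog. The following is a minimal sketch
(module and port names are illustrative, not taken from the paper):

// Half adder: S = A xor B, C = A and B
module half_adder (
    input  a, b,
    output s, c
);
    assign s = a ^ b;   // sum without carry
    assign c = a & b;   // carry out
endmodule

// Full adder built from two half adders plus an OR gate
module full_adder (
    input  a, b, cin,
    output sum, cout
);
    wire s1, c1, c2;
    half_adder ha1 (.a(a),  .b(b),   .s(s1),  .c(c1));
    half_adder ha2 (.a(s1), .b(cin), .s(sum), .c(c2));
    assign cout = c1 | c2;  // at most one of c1, c2 can ever be 1
endmodule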

• Ripple carry adder


It is possible to create a logical circuit using multiple full adders to add N-bit numbers. Each full adder
inputs a Cin, which is the Cout of the previous adder. This kind of adder is a ripple carry adder, since each
carry bit "ripples" to the next full adder. Note that the first (and only the first) full adder may be replaced by
a half adder.

The layout of a ripple carry adder is simple, which allows for fast design time; however, the ripple carry
adder is relatively slow, since each full adder must wait for the carry bit to be calculated from the previous
full adder. The gate delay can easily be calculated by inspection of the full adder circuit. Each full adder
requires three levels of logic. In a 32-bit ripple carry adder, there are 32 full adders, so the critical path
(worst case) delay is 3 (from input to carry in the first adder) + 31 * 2 (for carry propagation in the later
adders) = 65 gate delays. A design with alternating carry polarities and optimized AND-OR-Invert gates
can be about twice as fast.
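A structural N-bit ripple carry adder simply chains such full adders; in Verilog this can be written with a
generate loop (a sketch that assumes the full_adder module above):

module ripple_carry_adder #(parameter N = 32) (
    input  [N-1:0] a, b,
    input          cin,
    output [N-1:0] sum,
    output         cout
);
    wire [N:0] c;        // carry chain: c[0] is cin, c[N] is cout
    assign c[0] = cin;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : stage
            full_adder fa (.a(a[i]), .b(b[i]), .cin(c[i]),
                           .sum(sum[i]), .cout(c[i+1]));
        end
    endgenerate

    assign cout = c[N];
endmodule

The critical path runs through the whole carry chain c[0] ... c[N], which is exactly the rippling delay
analyzed above.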

• Carry-lookahead adders
To reduce the computation time, engineers devised faster ways to add two binary numbers by using carry-
lookahead adders. They work by creating two signals (P and G) for each bit position, based on whether a
carry is propagated through from a less significant bit position (at least one input is a '1'), generated in that
bit position (both inputs are '1'), or killed in that bit position (both inputs are '0'). In most cases, P is simply
the sum output of a half adder and G is the carry output of the same adder. After P and G are
generated the carries for every bit position are created. Some advanced carry-lookahead architectures are the
Manchester carry chain, Brent–Kung adder, and the Kogge–Stone adder.

Some other multi-bit adder architectures break the adder into blocks. It is possible to vary the length of
these blocks based on the propagation delay of the circuits to optimize computation time. These block-based
adders include the carry-bypass adder, which determines P and G values for each block rather than for each
bit, and the carry-select adder, which pre-generates the sum and carry values for both possible carry inputs
to the block.

A carry-lookahead adder (CLA) is a type of adder used in digital logic. A carry-lookahead adder improves
speed by reducing the amount of time required to determine carry bits. It can be contrasted with the simpler,
but usually slower, ripple carry adder for which the carry bit is calculated alongside the sum bit, and each
bit must wait until the previous carry has been calculated to begin calculating its own result and carry bits
(see adder for detail on ripple carry adders). The carry-lookahead adder calculates one or more carry bits
before the sum, which reduces the wait time to calculate the result of the larger value bits. The Kogge-Stone
adder and Brent-Kung adder are examples of this type of adder.

Charles Babbage recognized the performance penalty imposed by ripple carry and developed mechanisms
for anticipating carriage in his computing engines. Gerald Rosenberger of IBM filed for a patent on a
modern binary carry-lookahead adder in 1957.

A ripple-carry adder works in the same way as pencil-and-paper methods of addition. Starting at the
rightmost (least significant) digit position, the two corresponding digits are added and a result obtained. It is
also possible that there may be a carry out of this digit position (for example, in pencil-and-paper methods,
"9+5=4, carry 1"). Accordingly, all digit positions other than the rightmost need to take into account the
possibility of having to add an extra 1, from a carry that has come in from the next position to the right.

This means that no digit position can have an absolutely final value until it has been established whether or
not a carry is coming in from the right. Moreover, if the sum without a carry is 9 (in pencil-and-paper
methods) or 1 (in binary arithmetic), it is not even possible to tell whether or not a given digit position is
going to pass on a carry to the position on its left. At worst, when a whole sequence of sums comes
to ...99999999... (in decimal) or ...11111111... (in binary), nothing can be deduced at all until the value of the
carry coming in from the right is known, and that carry is then propagated to the left, one step at a time, as
each digit position evaluates "9+1=0, carry 1" or "1+1=0, carry 1". It is the "rippling" of the carry from right
to left that gives a ripple-carry adder its name, and its slowness. When adding 32-bit integers, for instance,
allowance has to be made for the possibility that a carry could have to ripple through every one of the 32
one-bit adders.

Carry lookahead depends on two things:

• Calculating, for each digit position, whether that position is going to propagate a carry if one comes
in from the right.

• Combining these calculated values to be able to deduce quickly whether, for each group of digits,
that group is going to propagate a carry that comes in from the right.

Suppose that groups of 4 digits are chosen. Then the sequence of events goes something like this:

• All 1-bit adders calculate their results. Simultaneously, the lookahead units perform their
calculations.

• Suppose that a carry arises in a particular group. Within at most 3 gate delays, that carry will emerge
at the left-hand end of the group and start propagating through the group to its left.

• If that carry is going to propagate all the way through the next group, the lookahead unit will already
have deduced this. Accordingly, before the carry emerges from the next group the lookahead unit is
immediately (within 1 gate delay) able to tell the next group to the left that it is going to receive a
carry - and, at the same time, to tell the next lookahead unit to the left that a carry is on its way.

The net effect is that the carries start by propagating slowly through each 4-bit group, just as in a ripple-carry
system, but then move 4 times as fast, leaping from one lookahead carry unit to the next. Finally, within each
group that receives a carry, the carry propagates slowly within the digits in that group.
The more bits in a group, the more complex the lookahead carry logic becomes, and the more time is spent
on the "slow roads" in each group rather than on the "fast road" between the groups (provided by the
lookahead carry logic). On the other hand, the fewer bits there are in a group, the more groups have to be
traversed to get from one end of a number to the other, and the less acceleration is obtained as a result.

Deciding the group size to be governed by lookahead carry logic requires a detailed analysis of gate and
propagation delays for the particular technology being used.

It is possible to have more than one level of lookahead carry logic, and this is in fact usually done. Each
lookahead carry unit already produces a signal saying "if a carry comes in from the right, I will propagate it
to the left", and those signals can be combined so that each group of (let us say) four lookahead carry units
becomes part of a "supergroup" governing a total of 16 bits of the numbers being added. The "supergroup"
lookahead carry logic will be able to say whether a carry entering the supergroup will be propagated all the
way through it, and using this information, it is able to propagate carries from right to left 16 times as fast as
a naive ripple carry. With this kind of two-level implementation, a carry may first propagate through the
"slow road" of individual adders, then, on reaching the left-hand end of its group, propagate through the "fast
road" of 4-bit lookahead carry logic, then, on reaching the left-hand end of its supergroup, propagate through
the "superfast road" of 16-bit lookahead carry logic.

Again, the group sizes to be chosen depend on the exact details of how fast signals propagate within logic
gates and from one logic gate to another.

For very large numbers (hundreds or even thousands of bits) look ahead carry logic does not become any
more complex, because more layers of super groups and super super groups can be added as necessary. The
increase in the number of gates is also moderate: if all the group sizes are 4, one would end up with one third
as many lookahead carry units as there are adders. However, the "slow roads" on the way to the faster levels
begin to impose a drag on the whole system (for instance, a 256-bit adder could have up to 24 gate delays in
its carry processing), and the mere physical transmission of signals from one end of a long number to the
other begins to be a problem. At these sizes carry-save adders are preferable, since they spend no time on
carry propagation at all.

• Operation:

Carry lookahead logic uses the concepts of generating and propagating carries. Although in the context of a
carry lookahead adder, it is most natural to think of generating and propagating in the context of binary
addition, the concepts can be used more generally than this. In the descriptions below, the word digit can be
replaced by bit when referring to binary addition.

The addition of two 1-digit inputs A and B is said to generate if the addition will always carry, regardless of
whether there is an input carry (equivalently, regardless of whether any less significant digits in the sum
carry). For example, in the decimal addition 52 + 67, the addition of the tens digits 5 and 6 generates
because the result carries to the hundreds digit regardless of whether the ones digit carries (in the example,
the ones digit does not carry (2+7=9)).

In the case of binary addition, A + B generates if and only if both A and B are 1. If we write G(A, B) to
represent the binary predicate that is true if and only if A + B generates, we have:

G(A, B) = A and B

The addition of two 1-digit inputs A and B is said to propagate if the addition will carry whenever there is an
input carry (equivalently, when the next less significant digit in the sum carries). For example, in the decimal
addition 37 + 62, the addition of the tens digits 3 and 6 propagates because the result would carry to the
hundreds digit if the ones were to carry (which, in this example, it does not). Note that propagate and
generate are defined with respect to a single digit of addition and do not depend on any other digits in the
sum.

In the case of binary addition, A + B propagates if and only if at least one of A or B is 1. If we write P(A, B)
to represent the binary predicate that is true if and only if A + B propagates, we have:

P(A, B) = A or B

Sometimes a slightly different definition of propagate is used. By this definition A + B is said to propagate if
the addition will carry whenever there is an input carry, but will not carry if there is no input carry. It turns
out that, because of the way the generate and propagate bits are used by the carry lookahead logic, it doesn't
matter which definition is used. In the case of binary addition, this definition is expressed by:

P'(A, B) = A xor B

For binary arithmetic, OR is faster than XOR and takes fewer transistors to implement. However, for a
multiple-level carry lookahead adder, it is simpler to use P(A, B) = A xor B.

Given these concepts of generate and propagate, when will a digit of addition carry? It will carry precisely
when either the addition generates or the next less significant bit carries and the addition propagates. Written
in Boolean algebra, with Ci the carry bit of digit i, and Pi and Gi the propagate and generate bits of digit i
respectively:

Ci+1 = Gi or (Pi and Ci)

• Implementation details

For each bit in a binary sequence to be added, the Carry Look Ahead Logic will determine whether that bit
pair will generate a carry or propagate a carry. This allows the circuit to "pre-process" the two numbers
being added to determine the carry ahead of time. Then, when the actual addition is performed, there is no
delay from waiting for the ripple carry effect (or time it takes for the carry from the first Full Adder to be
passed down to the last Full Adder). A simple 4-bit generalized carry lookahead circuit can be formed by
combining carry lookahead logic with the 4-bit ripple carry adder described above, with some slight
adjustments:

For the example provided, the logic for the generate (g) and propagate (p) values is given below. Note that
the numeric index identifies the bit position, starting from 0 at the least significant bit:

gi = ai and bi
pi = ai xor bi

Substituting c1 into c2, then c2 into c3, then c3 into c4 yields the expanded equations:

c1 = g0 or (p0 and c0)
c2 = g1 or (p1 and g0) or (p1 and p0 and c0)
c3 = g2 or (p2 and g1) or (p2 and p1 and g0) or (p2 and p1 and p0 and c0)
c4 = g3 or (p3 and g2) or (p3 and p2 and g1) or (p3 and p2 and p1 and g0) or (p3 and p2 and p1 and p0 and c0)

To determine whether a bit pair will generate a carry, the following logic works:

gi = ai and bi

To determine whether a bit pair will propagate a carry, either of the following logic statements works:

pi = ai or bi
pi = ai xor bi

The reason why this works is based on evaluation of ci+1 = gi or (pi and ci). The only difference in the
truth tables between (ai or bi) and (ai xor bi) is when both ai and bi are 1. However, if both ai and bi are 1,
then the gi term is 1 (since its equation is gi = ai and bi), and the pi term becomes irrelevant. The XOR is
used normally within a basic full adder circuit; the OR is an alternate option (for a carry lookahead only)
which is far simpler in transistor-count terms.

The carry lookahead 4-bit adder can also be used in a higher-level circuit by having each CLA logic
circuit produce a propagate and generate signal to a higher-level CLA logic circuit. The group propagate
(PG) and group generate (GG) for a 4-bit CLA are:

PG = p3 and p2 and p1 and p0
GG = g3 or (p3 and g2) or (p3 and p2 and g1) or (p3 and p2 and p1 and g0)

Putting four 4-bit CLAs together yields four group propagates and four group generates. A Lookahead
Carry Unit (LCU) takes these 8 values and uses logic identical to the carry equations above to calculate the
carries between the CLAs. The LCU then generates the carry input for each of the 4 CLAs and a fifth carry
equal to c16.

The calculation of the gate delay of a 16-bit adder (using 4 CLAs and 1 LCU) is not as straightforward as
for the ripple carry adder. Starting at time zero:

• calculation of pi and gi is done at time 1

• calculation of the carries ci within each CLA is done at time 3

• calculation of the group propagates PG is done at time 2

• calculation of the group generates GG is done at time 3

• calculation of the carry inputs for the CLAs from the LCU is done at

• time 0 for the first CLA

• time 5 for the second CLA

• time 5 for the third & fourth CLA

• calculation of the sum bits si is done at

• time 4 for the first CLA

• time 8 for the second CLA

• time 8 for the third & fourth CLA

• calculation of the final carry bit (c16) is done at time 5

The maximum time is 8 gate delays (for the sum bits si). A standard 16-bit ripple carry adder would take 31
gate delays.

• Manchester carry chain

The Manchester carry chain is a variation of the carry-lookahead adder that uses shared logic to lower the
transistor count. As can be seen above in the implementation section, the logic for generating each carry
contains all of the logic used to generate the previous carries. A Manchester carry chain generates the
intermediate carries by tapping off nodes in the gate that calculates the most significant carry value. Not all
logic families have these internal nodes, however; CMOS is a major example. Dynamic logic can support
shared logic, as can transmission gate logic. One of the major downsides of the Manchester carry chain is
that the capacitive load of all of these outputs, together with the resistance of the transistors, causes the
propagation delay to increase much more quickly than in a regular carry lookahead adder. A Manchester-
carry-chain section generally won't exceed 4 bits.

• Carry-save adder
A carry-save adder is a type of digital adder, used in computer microarchitecture to compute the sum of
three or more n-bit numbers in binary. It differs from other digital adders in that it outputs two numbers of
the same dimensions as the inputs, one which is a sequence of partial sum bits and another which is a
sequence of carry bits.

Consider the sum:


  12345678
+ 87654322

=100000000.

Using the arithmetic we learned as children, we go from right to left, "8+2=0, carry 1", "7+2+1=0, carry 1",
"6+3+1=0, carry 1", and so on to the end of the sum. Although we know the last digit of the result at once,
we cannot know the first digit until we have gone through every digit in the calculation, passing the carry
from each digit to the one on its left. Thus adding two n-digit numbers has to take a time proportional to n,
even if the machinery we are using would otherwise be capable of performing many calculations
simultaneously.

In electronic terms, using binary bits, this means that even if we have n one-bit adders at our disposal, we
still have to allow a time proportional to n to allow a possible carry to propagate from one end of the number
to the other. Until we have done this,

• We do not know the result of the addition.

• We do not know whether the result of the addition is larger or smaller than a given number (for
instance, we do not know whether it is positive or negative).

A carry-lookahead adder can reduce the delay. In principle the delay can be reduced so that it is
proportional to log n, but for large numbers this is no longer the case, because even when carry lookahead
is implemented, the distances that signals have to travel on the chip increase in proportion to n, and
propagation delays increase at the same rate. Once we get to the 512-bit to 2048-bit number sizes that are
required in public-key cryptography, carry lookahead is not of much help.

• The basic concept

Here is an example of a binary sum:


  10111010101011011111000000001101

+ 11011110101011011011111011101111.

Carry-save arithmetic works by abandoning the binary notation while still working to base 2. It computes the
sum digit by digit, as
  10111010101011011111000000001101
+ 11011110101011011011111011101111

= 21122120202022022122111011102212.

The notation is unconventional but the result is still unambiguous. Moreover, given n adders (here, n=32 full
adders), the result can be calculated in a single tick of the clock, since each digit result does not depend on
any of the others.

If the adder is required to add two numbers and produce a result, carry-save addition is useless, since the
result still has to be converted back into binary and this still means that carries have to propagate from right
to left. But in large-integer arithmetic, addition is a very rare operation, and adders are mostly used to
accumulate partial sums in a multiplication.

• Carry-save accumulators

Supposing that we have two bits of storage per digit, we can use a redundant binary representation, storing
the values 0, 1, 2, or 3 in each digit position. It is therefore obvious that one more binary number can be
added to our carry-save result without overflowing our storage capacity: but then what?

The key to success is that at the moment of each partial addition we add three bits:

• 0 or 1, from the number we are adding.

• 0 if the digit in our store is 0 or 2, or 1 if it is 1 or 3.

• 0 if the digit to its right is 0 or 1, or 1 if it is 2 or 3.

To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to
the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous
calculation and not the current one. In each clock cycle, carries only have to move one step along, and not n
steps as in conventional addition.

Because signals don't have to move as far, the clock can tick much faster.

There is still a need to convert the result to binary at the end of a calculation, which effectively just means
letting the carries travel all the way through the number just as in a conventional adder. But if we have done
512 additions in the process of performing a 512-bit multiplication, the cost of that final conversion is
effectively split across those 512 additions, so each addition bears 1/512 of the cost of that final
"conventional" addition.

• Drawbacks

At each stage of a carry-save addition,

• We know the result of the addition at once.

• We still do not know whether the result of the addition is larger or smaller than a given number (for
instance, we do not know whether it is positive or negative).

This latter point is a drawback when using carry-save adders to implement modular multiplication
(multiplication followed by division, keeping the remainder only). If we cannot know whether the
intermediate result is greater or less than the modulus, how can we know whether to subtract the modulus or
not?

Montgomery multiplication, which depends on the rightmost digit of the result, is one solution; though rather
like carry-save addition itself, it carries a fixed overhead, so that a sequence of Montgomery multiplications
saves time but a single one does not. Fortunately, exponentiation, which is effectively a sequence of
multiplications, is the most common operation in public-key cryptography.

• Technical details

The carry-save unit consists of n full adders, each of which computes a single sum and carry bit based solely
on the corresponding bits of the three input numbers. Given the three n-bit numbers a, b, and c, it produces
a partial sum ps and a shift-carry sc:

psi = ai xor bi xor ci
sci = (ai and bi) or (ai and ci) or (bi and ci)

The entire sum can then be computed by:

• Shifting the carry sequence sc left by one place.

• Appending a 0 to the front (most significant bit) of the partial sum sequence ps.

• Using a ripple carry adder to add these two together and produce the resulting (n + 1)-bit value.
When adding together three or more numbers, using a carry-save adder followed by a ripple carry adder is
faster than using two ripple carry adders. This is because a ripple carry adder cannot compute a sum bit
without waiting for the previous carry bit to be produced, and thus has a delay equal to that of n full adders.
A carry-save adder, however, produces all of its output values in parallel, and thus has the same delay as a
single full-adder. Thus the total computation time (in units of full-adder delay time) for a carry-save adder
plus a ripple carry adder is n + 1, whereas for two ripple carry adders it would be 2n.
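These steps can be sketched in Verilog as follows (the module name is illustrative; the final conversion here
simply uses the + operator, where a real design might instantiate an explicit ripple carry adder):

module carry_save_adder #(parameter N = 32) (
    input  [N-1:0] a, b, c,
    output [N+1:0] total
);
    wire [N-1:0] ps;  // partial sum bits:  ps[i] = a[i] xor b[i] xor c[i]
    wire [N-1:0] sc;  // shift-carry bits:  sc[i] = majority(a[i], b[i], c[i])

    assign ps = a ^ b ^ c;
    assign sc = (a & b) | (a & c) | (b & c);

    // Final conversion: shift the carries left one place and add
    assign total = {2'b00, ps} + {1'b0, sc, 1'b0};
endmodule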

3.1 Carry Lookahead Adder


The lookahead carry algorithm speeds up addition because the carry for the next stages is calculated in
advance, based on the input signals. In the CLA, the carry propagation time is reduced to O(log2(Wd)) by
using a tree-like circuit to compute the carry rapidly. The CLA exploits the fact that the carry generated by
a bit position depends on the three inputs to that position. If 'X' and 'Y' are the two inputs, then if X = Y = 1,
a carry is generated independently of the carry from the previous bit position, and if X = Y = 0, no carry is
generated. Similarly, if X ≠ Y, a carry is generated if and only if the previous bit position generates a carry.
If 'C' is the initial carry, and 'S' and 'Cout' are the output sum and carry respectively, the Boolean
expressions for calculating the next carry and the sum are:
Pi = Xi xor Yi -- Carry Propagation (1)
Gi = Xi and Yi -- Carry Generation (2)
Ci+1 = Gi or (Pi and Ci) -- Next Carry (3)
Si = Xi xor Yi xor Ci -- Sum Generation (4)
Thus, for the 4-bit adder, we can extend the carry as shown below:
C1 = G0 + P0 · C0 (5)
C2 = G1 + P1 · C1 = G1 + P1 · G0 + P1 · P0 · C0 (6)
C3 = G2 + P2 · G1 + P2 · P1 · G0 + P2 · P1 · P0 · C0 (7)
C4 = G3 + P3 · G2 + P3 · P2 · G1 + P3 · P2 · P1 · G0 + P3 · P2 · P1 · P0 · C0 (8)
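Equations (1)-(8) translate directly into gates. A 4-bit CLA following them might be written in Verilog as
below (a minimal sketch; the signal names mirror the equations):

module cla_4bit (
    input  [3:0] x, y,
    input        c0,
    output [3:0] s,
    output       c4
);
    wire [3:0] p = x ^ y;   // carry propagate, eq. (1)
    wire [3:0] g = x & y;   // carry generate,  eq. (2)
    wire c1, c2, c3;

    // Expanded carries, eqs. (5)-(8): each depends only on p, g and c0
    assign c1 = g[0] | (p[0] & c0);
    assign c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    assign c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                     | (p[2] & p[1] & p[0] & c0);
    assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                     | (p[3] & p[2] & p[1] & g[0])
                     | (p[3] & p[2] & p[1] & p[0] & c0);

    // Sum generation, eq. (4)
    assign s = p ^ {c3, c2, c1, c0};
endmodule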

3.2 Carry Save Adder


Basically, a carry-save adder is used to compute the sum of three or more n-bit binary numbers. A carry-
save stage is built from full adders. As shown in Figure 1, to compute the sum of three 32-bit binary
numbers we take 32 full adders at the first stage. The carry-save unit consists of 32 full adders, each of
which computes a single sum and carry bit based only on the corresponding bits of the three input numbers.
Let X, Y and Z be three 32-bit numbers; the unit produces a partial sum S and carry C, as shown in Table I:
Si = Xi xor Yi xor Zi (9)
Ci = (Xi and Yi) or (Xi and Zi) or (Yi and Zi) (10)
The final addition is then computed as:
1. Shifting the carry sequence C left by one place.
2. Placing a 0 at the front (MSB) of the partial sum sequence S.
3. Finally, a ripple carry adder is used to add these two together and compute the resulting sum.

TABLE I. CARRY SAVE ADDER COMPUTATION


X:        1 0 0 1 1
Y:        1 1 0 0 1
Z:    +   0 1 0 1 1
S:        0 0 0 0 1
C:  + 1 1 0 1 1       (shifted left one place)
Sum:  1 1 0 1 1 1

Figure 1: Computation flow of Carry Save Adder

DRAWBACKS OF RIPPLE CARRY AND CARRY LOOKAHEAD ADDER


In a ripple carry adder, the first sum bit must wait until the input carry is given, the second sum bit must
wait until the previous carry has propagated, and so on. Finally, the output sum must wait until all previous
carries have been generated, so the result is delayed.

In order to reduce the delay of the RCA, i.e. to propagate the carry in advance, we go for the carry
lookahead adder. Basically, this adder works on two operations called propagate and generate; the
propagate and generate equations are given by equations (1) and (2).

For the 4-bit CLA, the propagated carry equations are given by equations (5)-(8).

From equations (5)-(8) it is observed that the carry complexity increases with the adder bit width, so
designing a wider CLA becomes complex. In this way, for wider CLAs, the carry complexity grows with
the width of the adder, which forces bounded fan-in rather than unbounded fan-in when designing wide
adders. In order to compute the carries in advance without this delay and complexity, there is a concept
called the parallel prefix approach.

CHAPTER-4
DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS
(proposed)
A PPA pre-computes the generate and propagate signals using equations (1) and (2). These pre-computed
signals are then combined using the fundamental carry operator (fco), denoted by the symbol "o":

(g, p) o (g', p') = (g or (p and g'), p and p')

For example, the 4-bit CLA carry equation is given by

c4 = g3 or (p3 and g2) or (p3 and p2 and g1) or (p3 and p2 and p1 and g0) (8)

while the 4-bit PPA carry equation is given by

c4 = [(g3, p3) o (g2, p2)] o [(g1, p1) o (g0, p0)] (9)

From equations (8) and (9) it is observed that the carry lookahead adder takes 3 steps to generate the carry,
but the 4-bit PPA takes 2 steps to generate the carry.
The prominent parallel prefix tree adders invented so far are the Kogge-Stone, Brent-Kung, Han-Carlson,
and Sklansky adders. There exist various architectures for the carry-calculation part. The tradeoffs among
these architectures involve:
• The area of the adder
• Its depth
• The fan-out of the nodes
• The overall wiring network.
Out of these, it was found from the literature that the Kogge-Stone adder is the fastest compared to the
other adders. The Kogge-Stone adder implementation [7] is the most straightforward, and it also has one of
the shortest critical paths of all tree adders. The drawback of the Kogge-Stone implementation is the large
area consumed and the more complex routing (fan-out) of interconnects. Two signals are generated during
the various stages:
Propagate: controls whether a carry is propagated from lower bits to higher bits.
Generate: controls whether a carry is generated.

PARALLEL-PREFIX ADDER STRUCTURE


Parallel-prefix structures are common in high-performance adders because their delay is proportional to
the logarithm of the adder width.
A PPA basically consists of 3 stages:
• Pre-computation
• Prefix stage
• Final computation
The parallel-prefix structure is shown in Figure 2.

A. Pre-computation
In the pre-computation stage, the propagate and generate signals are computed for the given inputs using
equations (1) and (2).
B. Prefix stage
In the prefix stage, group generate/propagate signals are computed at each bit position. The black cell (BC)
generates the ordered pair of group generate and propagate signals, while the gray cell (GC) generates only
the left (group generate) signal:

G(i:k) = G(i:j) or (P(i:j) and G(j-1:k)) (10)
P(i:k) = P(i:j) and P(j-1:k) (11)

More practically, equations (10) and (11) can be expressed using the symbol "o" introduced by Brent and
Kung. Its function is exactly the same as that of a black cell, i.e.

(G, P)(i:k) = (G, P)(i:j) o (G, P)(j-1:k)

The "o" operation helps define the rules for building prefix structures.
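As a sketch, the black and gray cells reduce to a couple of gates each in Verilog (module names are
illustrative):

// Black cell: the "o" operator of equations (10) and (11)
// (gout, pout) = (gl, pl) o (gr, pr)
module black_cell (
    input  gl, pl,   // left (more significant) group generate/propagate
    input  gr, pr,   // right (less significant) group generate/propagate
    output gout, pout
);
    assign gout = gl | (pl & gr);
    assign pout = pl & pr;
endmodule

// Gray cell: identical generate logic, but the group propagate output is
// omitted (used where only the carries themselves are still required)
module gray_cell (
    input  gl, pl, gr,
    output gout
);
    assign gout = gl | (pl & gr);
endmodule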
C. Final computation
In the final computation, the sum and carry-out are the final outputs:

Si = Pi xor C(i-1)
Cout = G(n-1:-1)

where "-1" is the position of the carry input. The generate/propagate signals can be grouped in different
ways to get the same correct carries. Based on the different ways of grouping the generate/propagate
signals, different prefix architectures can be created. Figure 3 shows the definitions of the cells used in
prefix structures, including the BC and GC. For an analysis of various parallel prefix structures, see [2], [3]
& [4].
The 16-bit SKA uses black cells and gray cells as well as full adder blocks. This adder computes the
carries using the BCs and GCs and terminates with 4-bit RCAs; in total it uses 16 full adders. The 16-bit
SKA is shown in Figure 4. In this adder, the input bits (a, b) are first converted into propagate and generate
signals (p, g). The propagate and generate terms are then fed to the BCs and GCs, which propagate the
carries in advance, and the results are finally given to the full adder blocks.
Another PPA, known as the STA, is also tested [6]. Like the SKA, this adder terminates with an RCA. It
also uses BCs, GCs and full adder blocks like the SKA, but differs in the interconnections between them
[7]. The 16-bit STA is shown in Figure 5.

Enhancements to the original implementation include increasing the radix and sparsity of the adder. The
radix of the adder refers to how many results from previous level of computation are used to generate the
next one. Doing so increases the power and delay of each stage, but reduces the number of required stages.
The sparsity of the adder refers to how many carry bits are generated by the carry-tree. Generating every
carry bit is called sparsity-1, whereas generating every other is sparsity-2 and every fourth is sparsity-4. The
resulting carries are then used as the carry-in inputs for much shorter ripple carry adders or some other adder
design, which generates the final sum bits. Increasing sparsity reduces the total needed computation and can
reduce the amount of routing congestion.

The sparse Kogge-Stone adder consists of several smaller ripple carry adders (RCAs) on its lower half
and a carry tree on its upper half [10]. Thus, the sparse Kogge-Stone adder terminates with RCAs. The
number of carries generated is smaller in a sparse Kogge-Stone adder than in the regular Kogge-Stone
adder. The functionality of the GP block, the black cell and the gray cell remains exactly the same as in the
regular Kogge-Stone adder. Sparse and regular Kogge-Stone adders have essentially the same delay when
implemented on an FPGA, although the former utilizes far fewer resources.

The KSA is another prefix tree that uses the fewest logic levels. The 16-bit Kogge-Stone adder uses BCs
and GCs and does not use full adders: it uses 36 BCs and 15 GCs and operates entirely on generate and
propagate blocks, so its delay is lower than that of the SKA and STA above. The 16-bit KSA is shown in
Figure 6.
In the KSA there are no full adder blocks as in the SKA and STA [5] & [6]. Another carry tree, known as
the BKA, also uses BCs and GCs, but fewer than the KSA, so it takes less area to implement. The 16-bit
BKA uses 14 BCs and 11 GCs, whereas the Kogge-Stone uses 36 BCs and 15 GCs, so the BKA has a
simpler architecture and occupies less area than the KSA. The 16-bit BKA is shown in Figure 7.
The Kogge-Stone adder concept [3] was developed by Peter M. Kogge and Harold S. Stone, who published
it in 1973 in a seminal paper titled "A Parallel Algorithm for the Efficient Solution of a General Class of
Recurrence Equations". The Kogge-Stone adder has minimal logic depth and fan-out: the number of stages
is log N, the fan-out is 2 at each stage, but the wires are long.

The BKA occupies less area than the other three adders (SKA, KSA and STA), because it uses a limited
number of propagate and generate cells. It takes less area to implement than the KSA and has less wiring
congestion. The operation of the 16-bit Brent-Kung adder is given below [3]. This adder uses fewer BCs
and GCs than the Kogge-Stone adder and shows better delay performance, as observed on the Agilent
1692A logic analyzer.

APPLICATIONS & ADVANTAGES:
Applications:
• Digital Signal Processing
• Filters
• ALUs
• Image Signal Processing

Advantages:
• Very fast
• Effective computation
• Less time consumption
• Moderate power dissipation and power consumption

CHAPTER-5
INTRODUCTION TO XILINX

5.1 Migrating Projects from Previous ISE Software Releases:

When you open a project file from a previous release, the ISE® software prompts you to migrate

your project. If you click Backup and Migrate or Migrate Only, the software automatically converts your

project file to the current release. If you click Cancel, the software does not convert your project and,

instead, opens Project Navigator with no project loaded.

Note: After you convert your project, you cannot open it in previous versions of the ISE software, such as

the ISE 11 software. However, you can optionally create a backup of the original project as part of project

migration, as described below.

To Migrate a Project

• In the ISE 12 Project Navigator, select File > Open Project.

• In the Open Project dialog box, select the .xise file to migrate.

Note You may need to change the extension in the Files of type field to display .npl (ISE 5 and ISE

6 software) or .ise (ISE 7 through ISE 10 software) project files.

• In the dialog box that appears, select Backup and Migrate or Migrate Only.

• The ISE software automatically converts your project to an ISE 12 project.

Note If you chose to Backup and Migrate, a backup of the original project is created at

project_name_ise12migration.zip.

• Implement the design using the new version of the software.

Note Implementation status is not maintained after migration.

5.2 Properties:

For information on properties that have changed in the ISE 12 software, see ISE 11 to ISE 12

Properties Conversion.

5.3 IP Modules:

If your design includes IP modules that were created using CORE Generator™ software or Xilinx®

Platform Studio (XPS) and you need to modify these modules, you may be required to update the core.

However, if the core netlist is present and you do not need to modify the core, updates are not required and

the existing netlist is used during implementation.

5.4 Obsolete Source File Types:

The ISE 12 software supports all of the source types that were supported in the ISE 11 software.

If you are working with projects from previous releases, state diagram source files (.dia), ABEL

source files (.abl), and test bench waveform source files (.tbw) are no longer supported. For state diagram

and ABEL source files, the software finds an associated HDL file and adds it to the project, if possible. For

test bench waveform files, the software automatically converts the TBW file to an HDL test bench and adds

it to the project. To convert a TBW file after project migration, see Converting a TBW File to an HDL Test

Bench.

5.5 Using ISE Example Projects:

To help familiarize you with the ISE® software and with FPGA and CPLD designs, a set of example

designs is provided with Project Navigator. The examples show different design techniques and source

types, such as VHDL, Verilog, schematic, or EDIF, and include different constraints and IP.

To Open an Example

• Select File > Open Example.

• In the Open Example dialog box, select the Sample Project Name.

Note To help you choose an example project, the Project Description field describes each project.

In addition, you can scroll to the right to see additional fields, which provide details about the

project.

• In the Destination Directory field, enter a directory name or browse to the directory.

• Click OK.

The example project is extracted to the directory you specified in the Destination Directory field and

is automatically opened in Project Navigator. You can then run processes on the example project and save

any changes.

Note If you modified an example project and want to overwrite it with the original example project,

select File > Open Example, select the Sample Project Name, and specify the same Destination Directory

you originally used. In the dialog box that appears, select Overwrite the existing project and click OK.

5.6 Creating a Project:

Project Navigator allows you to manage your FPGA and CPLD designs using an ISE® project,

which contains all the source files and settings specific to your design. First, you must create a project and

then, add source files, and set process properties. After you create a project, you can run processes to

implement, constrain, and analyze your design. Project Navigator provides a wizard to help you create a

project as follows.

Note If you prefer, you can create a project using the New Project dialog box instead of the New

Project Wizard. To use the New Project dialog box, deselect the Use New Project wizard option in the

ISE General page of the Preferences dialog box.

To Create a Project

• Select File > New Project to launch the New Project Wizard.

• In the Create New Project page, set the name, location, and project type, and click Next.

• For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page, select the

input and constraint file for the project, and click Next.

• In the Project Settings page, set the device and project properties, and click Next.

• In the Project Summary page, review the information, and click Finish to create the project.

Project Navigator creates the project file (project_name.xise) in the directory you specified. After

you add source files to the project, the files appear in the Hierarchy pane of the Design panel.

5.7 Design panel:

Project Navigator manages your project based on the design properties (top-level module type,

device type, synthesis tool, and language) you selected when you created the project. It organizes all the

parts of your design and keeps track of the processes necessary to move the design from design entry

through implementation to programming the targeted Xilinx® device.

Note For information on changing design properties, see Changing Design Properties.

You can now perform any of the following:

• Create new source files for your project.

• Add existing source files to your project.

• Run processes on your source files.

• Modify process properties.

5.8 Creating a Copy of a Project:

You can create a copy of a project to experiment with different source options and implementations.

Depending on your needs, the design source files for the copied project and their location can vary as

follows:

• Design source files are left in their existing location, and the copied project points to these

files.

• Design source files, including generated files, are copied and placed in a specified directory.

• Design source files, excluding generated files, are copied and placed in a specified directory.

Copied projects are the same as other projects in both form and function. For example, you can do the

following with copied projects:

• Open the copied project using the File > Open Project menu command.

• View, modify, and implement the copied project.

• Use the Project Browser to view key summary data for the copied project and then, open the

copied project for further analysis and implementation, as described in Using the Project Browser.

Alternatively, you can create an archive of your project, which puts all of the project contents into a

ZIP file. Archived projects must be unzipped before being opened in Project Navigator. For information on

archiving, see Creating a Project Archive.

To Create a Copy of a Project

• Select File > Copy Project.

• In the Copy Project dialog box, enter the Name for the copy.

Note The name for the copy can be the same as the name for the project, as long as you specify a

different location.

• Enter a directory Location to store the copied project.

• Optionally, enter a Working directory.

By default, this is blank, and the working directory is the same as the project directory. However,

you can specify a working directory if you want to keep your ISE® project file (.xise extension)

separate from your working area.

• Optionally, enter a Description for the copy.

The description can be useful in identifying key traits of the project for reference later.

• In the Source options area, do the following:

Select one of the following options:

• Keep sources in their current locations - to leave the design source files in their existing location.

If you select this option, the copied project points to the files in their existing location. If you edit the

files in the copied project, the changes also appear in the original project, because the source files are shared

between the two projects.

• Copy sources to the new location - to make a copy of all the design source files and place them in

the specified Location directory.

If you select this option, the copied project points to the files in the specified directory. If you edit the

files in the copied project, the changes do not appear in the original project, because the source files are not

shared between the two projects.


Optionally, select Copy files from Macro Search Path directories to copy files from the directories

you specify in the Macro Search Path property in the Translate Properties dialog box. All files from the

specified directories are copied, not just the files used by the design.

Note: If you added a netlist source file directly to the project as described in Working with Netlist-

Based IP, the file is automatically copied as part of Copy Project because it is a project source file. Adding

netlist source files to the project is the preferred method for incorporating netlist modules into your design,

because the files are managed automatically by Project Navigator.

Optionally, click Copy Additional Files to copy files that were not included in the original project.

In the Copy Additional Files dialog box, use the Add Files and Remove Files buttons to update the list of

additional files to copy. Additional files are copied to the copied project location after all other files are

copied. To exclude generated files from the copy, such as implementation results and reports, select

Exclude generated files from the copy, as described below.

5.9 Exclude generated files from the copy:

When you select this option, the copied project opens in a state in which processes have not yet been

run.

• To automatically open the copy after creating it, select Open the copied project.

Note By default, this option is disabled. If you leave this option disabled, the original project

remains open after the copy is made.

Click OK.

5.10 Creating a Project Archive:


A project archive is a single, compressed ZIP file with a .zip extension. By default, it contains all

project files, source files, and generated files, including the following:

• User-added sources and associated files

• Remote sources

• Verilog `include files

• Files in the macro search path

• Generated files

• Non-project files

5.11 To Archive a Project:

• Select Project > Archive.

• In the Project Archive dialog box, specify a file name and directory for the ZIP file.

• Optionally, select Exclude generated files from the archive to exclude generated files and

non-project files from the archive.

• Click OK.

A ZIP file is created in the specified directory. To open the archived project, you must first unzip the

ZIP file, and then, you can open the project.

Note Sources that reside outside of the project directory are copied into a remote_sources subdirectory in the

project archive. When the archive is unzipped and opened, you must either specify the location of these files

in the remote_sources subdirectory for the unzipped project, or manually copy the sources into their original

location.

CHAPTER-6

INTRODUCTION TO VERILOG

In the semiconductor and electronic design industry, Verilog is a hardware description language (HDL)
used to model electronic systems. Verilog HDL, not to be confused with VHDL (a competing language), is
most commonly used in the design, verification, and implementation of digital logic chips at the register-
transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits.

• Overview

Hardware description languages such as Verilog differ from software programming languages because


they include ways of describing the propagation of time and signal dependencies (sensitivity). There are two
assignment operators, a blocking assignment (=), and a non-blocking (<=) assignment. The non-blocking
assignment allows designers to describe a state-machine update without needing to declare and use
temporary storage variables (in any general programming language we need to define some temporary
storage spaces for the operands to be operated on subsequently; those are temporary storage variables). Since
these concepts are part of Verilog's language semantics, designers could quickly write descriptions of large
circuits in a relatively compact and concise form. At the time of Verilog's introduction (1984), Verilog
represented a tremendous productivity improvement for circuit designers who were already using
graphical schematic capture software and specially-written software programs to document and simulate
electronic circuits.
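For instance, a two-register swap is a case where the non-blocking operator removes the need for a
temporary variable (a minimal sketch with illustrative names):

// With non-blocking (<=) assignments, both right-hand sides are sampled
// before either register is updated, so the registers genuinely swap.
// With blocking (=) assignments, q0 would be overwritten first and both
// registers would end up holding the old value of q1.
module swap (
    input      clk,
    output reg q0, q1
);
    always @(posedge clk) begin
        q0 <= q1;   // reads the old value of q1
        q1 <= q0;   // reads the old value of q0
    end
endmodule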

The designers of Verilog wanted a language with syntax similar to the C programming language,
which was already widely used in engineering software development. Verilog is case-sensitive, has a
basic pre-processor (though less sophisticated than that of ANSI C/C++), equivalent control
flow keywords (if/else, for, while, case, etc.), and compatible operator precedence. Syntactic differences
include variable declaration (Verilog requires bit-widths on net/reg types), demarcation of
procedural blocks (begin/end instead of curly braces {}), and many other minor differences.

A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and


communicate with other modules through a set of declared input, output, and bidirectional ports. Internally, a
module can contain any combination of the following: net/variable declarations (wire, reg, integer, etc.),
concurrent and sequential statement blocks, and instances of other modules (sub-hierarchies). Sequential
statements are placed inside a begin/end block and executed in sequential order within the block. But the
blocks themselves are executed concurrently, qualifying Verilog as a dataflow language.

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating, undefined") and
strengths (strong, weak, etc.). This system allows abstract modeling of shared signal lines, where multiple
sources drive a common net. When a wire has multiple drivers, the wire's (readable) value is resolved by a
function of the source drivers and their strengths.

A subset of statements in the Verilog language is synthesizable. Verilog modules that conform to a
synthesizable coding style, known as RTL (register-transfer level), can be physically realized by synthesis
software. Synthesis software algorithmically transforms the (abstract) Verilog source into a netlist, a
logically equivalent description consisting only of elementary logic primitives (AND, OR, NOT, flip-flops,
etc.) that are available in a specific FPGA or VLSI technology. Further manipulations to the netlist
ultimately lead to a circuit fabrication blueprint (such as a photo mask set for an ASIC or a bitstream file
for an FPGA).

• History
• Beginning
Verilog was the first modern hardware description language to be invented. It was created by Phil
Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design Systems
(renamed Gateway Design Automation in 1985) as a hardware modeling language. Gateway Design
Automation was purchased by Cadence Design Systems in 1990. Cadence now has full proprietary rights to
Gateway's Verilog and to Verilog-XL, the HDL simulator that would become the de-facto standard (of
Verilog logic simulators) for the next decade. Originally, Verilog was intended to describe and allow
simulation; only afterwards was support for synthesis added.

• Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language available
for open standardization. Cadence transferred Verilog into the public domain under the Open Verilog
International (OVI) (now known as Accellera) organization. Verilog was later submitted to IEEE and
became IEEE Standard 1364-1995, commonly referred to as Verilog-95.

In the same time frame Cadence initiated the creation of Verilog-A to put standards support behind
its analog simulator Spectre. Verilog-A was never intended to be a standalone language and is a subset
of Verilog-AMS which encompassed Verilog-95.

• Verilog 2001
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users had found
in the original Verilog standard. These extensions became IEEE Standard 1364-2001 known as Verilog-
2001.

Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's
complement) signed nets and variables. Previously, code authors had to perform signed operations using
awkward bit-level manipulations (for example, the carry-out bit of a simple 8-bit addition required an
explicit description of the Boolean algebra to determine its correct value). The same function under Verilog-
2001 can be more succinctly described by one of the built-in operators: +, -, /, *, >>>. A
generate/endgenerate construct (similar to VHDL's generate/endgenerate) allows Verilog-2001 to control
instance and statement instantiation through normal decision operators (case/if/else). Using
generate/endgenerate, Verilog-2001 can instantiate an array of instances, with control over the connectivity
of the individual instances. File I/O has been improved by several new system tasks. And finally, a few
syntax additions were introduced to improve code readability (e.g. always @*, named parameter override,
C-style function/task/module header declaration).
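As a sketch of the generate/endgenerate construct (a hypothetical parameterized ripple-carry adder built from generated bit slices; the module and parameter names are ours):

// Verilog-2001 generate loop instantiating N identical bit slices.
module rca #(parameter N = 8) (
  input  [N-1:0] a, b,
  input          cin,
  output [N-1:0] sum,
  output         cout
);
  wire [N:0] c;
  assign c[0] = cin;
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : bit_slice
      assign sum[i] = a[i] ^ b[i] ^ c[i];
      assign c[i+1] = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));
    end
  endgenerate
  assign cout = c[N];
endmodule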

Verilog-2001 is the dominant flavor of Verilog supported by the majority of commercial EDA software
packages.

• Verilog 2005
Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005) consists of minor
corrections, spec clarifications, and a few new language features (such as the uwire keyword).

A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and mixed signal
modeling with traditional Verilog.

• SystemVerilog

SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to aid design
verification and design modeling. As of 2009, the SystemVerilog and Verilog language standards were
merged into SystemVerilog 2009 (IEEE Standard 1800-2009).

The advent of hardware verification languages such as OpenVera and Verisity's e language encouraged the
development of Superlog by Co-Design Automation Inc., which was later purchased by Synopsys. The
foundations of Superlog and Vera were donated to Accellera, which later became the IEEE standard
P1800-2005: SystemVerilog.

In the late 1990s, the Verilog Hardware Description Language (HDL) became the most widely used
language for describing hardware for simulation and synthesis. However, the first two versions standardized
by the IEEE (1364-1995 and 1364-2001) had only simple constructs for creating tests. As design sizes
outgrew the verification capabilities of the language, commercial Hardware Verification Languages (HVL)
such as OpenVera and e were created. Companies that did not want to pay for these tools instead spent
hundreds of man-years creating their own custom tools. This productivity crisis (along with a similar one on
the design side) led to the creation of Accellera, a consortium of EDA companies and users who wanted to
create the next generation of Verilog. The donation of the OpenVera language formed the basis for the HVL
features of SystemVerilog. Accellera's goal was met in November 2005 with the adoption of the IEEE
standard P1800-2005 for SystemVerilog, IEEE (2005).
The most valuable benefit of SystemVerilog is that it allows the user to construct reliable, repeatable
verification environments, in a consistent syntax, that can be used across multiple projects.
Some of the typical features of an HVL that distinguish it from a Hardware Description Language such as
Verilog or VHDL are:
• Constrained-random stimulus generation
• Functional coverage
• Higher-level structures, especially Object Oriented Programming
• Multi-threading and interprocess communication
• Support for HDL types such as Verilog’s 4-state values
• Tight integration with event-simulator for control of the design
There are many other useful features, but these allow you to create test benches at a higher level of
abstraction than you are able to achieve with an HDL or a programming language such as C.
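As a brief SystemVerilog sketch (the class and variable names are hypothetical) of the first two of these features, constrained-random stimulus and functional coverage:

// Constrained-random stimulus plus a simple covergroup.
class AddOp;
  rand bit [7:0] a, b;
  constraint small_b { b < 16; }           // constrained randomization
  covergroup cg;
    coverpoint a { bins low = {[0:127]}; bins high = {[128:255]}; }
  endgroup
  function new();
    cg = new();
  endfunction
endclass

module tb;
  initial begin
    AddOp op = new();
    repeat (10) begin
      void'(op.randomize());               // generate a random, legal stimulus
      op.cg.sample();                      // record functional coverage
      $display("a=%0d b=%0d", op.a, op.b);
    end
  end
endmodule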
SystemVerilog provides the best framework to achieve coverage-driven verification (CDV). CDV
combines automatic test generation, self-checking testbenches, and coverage metrics to significantly reduce
the time spent verifying a design. The purpose of CDV is to:

• Eliminate the effort and time spent creating hundreds of tests.

• Ensure thorough verification using up-front goal setting.

• Receive early error notifications and deploy run-time checking and error analysis to simplify
debugging.

Examples

Ex1: A hello world program looks like this:


module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule
Ex2: A simple example of two flip-flops follows:

module toplevel(clock, reset);
input clock;
input reset;

reg flop1;
reg flop2;

always @(posedge reset or posedge clock)
  if (reset)
    begin
      flop1 <= 0;
      flop2 <= 1;
    end
  else
    begin
      flop1 <= flop2;
      flop2 <= flop1;
    end
endmodule
The "<=" operator in Verilog is another aspect of its being a hardware description language as
opposed to a normal procedural language. This is known as a "non-blocking" assignment. Its action doesn't
register until the next clock cycle. This means that the order of the assignments are irrelevant and will
produce the same result: flop1 and flop2 will swap values every clock.

The other assignment operator, "=", is referred to as a blocking assignment. When "=" assignment is
used, for the purposes of logic, the target variable is updated immediately. In the above example, had the
statements used the "=" blocking operator instead of "<=", flop1 and flop2 would not have been swapped.
Instead, as in traditional programming, the compiler would understand to simply set flop1 equal to flop2
(and subsequently ignore the redundant logic to set flop2 equal to flop1.)

Ex3: An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);
// TITLE 'Divide-by-20 Counter with enables'
// enable CEP is a clock enable only
// enable CET is a clock enable and
// enables the TC output
// a counter using the Verilog language

parameter size = 5;
parameter length = 20;

input rst;  // These inputs/outputs represent
input clk;  // connections to the module.
input cet;
input cep;

output [size-1:0] count;
output tc;

reg [size-1:0] count;  // Signals assigned
                       // within an always
                       // (or initial) block
                       // must be of type reg

wire tc;  // Other signals are of type wire

// The always statement below is a parallel
// execution statement that
// executes any time the signals
// rst or clk transition from low to high

always @(posedge clk or posedge rst)
  if (rst)  // This causes reset of the cntr
    count <= {size{1'b0}};
  else
    if (cet && cep)  // Enables both true
      begin
        if (count == length-1)
          count <= {size{1'b0}};
        else
          count <= count + 1'b1;
      end

// the value of tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));

endmodule
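Assuming surrounding signals rst, clk, count, and tc exist, the counter above could be instantiated with named parameter override (a Verilog-2001 feature mentioned earlier), for example as a divide-by-10 counter:

// Hypothetical instantiation with named parameter override.
Div20x #(.size(4), .length(10)) div10
  (.rst(rst), .clk(clk), .cet(1'b1), .cep(1'b1),
   .count(count), .tc(tc));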

Ex4: An example of delays:


...
reg a, b, c, d;
wire e;
...
always @(b or e)
  begin
    a = b & e;
    b = a | b;
    #5 c = b;
    d = #6 c ^ e;
  end

The always clause above illustrates the other way an always block is used: it executes whenever any
of the signals in its sensitivity list changes, i.e. when b or e changes. When one of these changes, a is
immediately assigned a new value, and, due to the blocking assignment, b is assigned a new value afterward
(taking into account the new value of a). After a delay of 5 time units, c is assigned the value of b, and the
value of c ^ e is tucked away in an invisible store. Then, after 6 more time units, d is assigned the value that
was tucked away.

Signals that are driven from within a process (an initial or always block) must be of type reg. Signals that are
driven from outside a process must be of type wire. The keyword reg does not necessarily imply a hardware
register.

Constants

The definition of constants in Verilog supports the addition of a width parameter. The basic syntax is:

<Width in bits>'<base letter><number>

Examples:

• 12'h123 - Hexadecimal 123 (using 12 bits)

• 20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)

• 4'b1010 - Binary 1010 (using 4 bits)

• 6'o77 - Octal 77 (using 6 bits)

Synthesizable Constructs

There are several statements in Verilog that have no analog in real hardware, e.g. $display.
Consequently, much of the language cannot be used to describe hardware. The examples presented here are
the classic subset of the language that has a direct mapping to real gates.

// Mux examples - Three ways to do the same thing.

// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;

// The second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
  begin
    case (sel)
      1'b0: out = b;
      1'b1: out = a;
    endcase
  end

// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;

The next interesting structure is a transparent latch; it will pass the input to the output when the gate
signal is set for "pass-through", and captures the input and stores it upon transition of the gate signal to
"hold". The output will remain stable regardless of the input signal while the gate is set to "hold". In the
example below the "pass-through" level of the gate would be when the value of the if clause is true, i.e. gate
= 1. This is read "if gate is true, the din is fed to latch_out continuously." Once the if clause is false, the last
value at latch_out will remain and is independent of the value of din.

Ex6: // Transparent latch example

reg out;
always @(gate or din)
  if (gate)
    out = din;  // Pass through state
// Note that the else isn't required here. The variable
// out will follow the value of din while gate is high.
// When gate goes low, out will remain constant.

The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it can be modeled as:

reg q;
always @(posedge clk)
  q <= d;

The significant thing to notice in the example is the use of the non-blocking assignment. A basic rule
of thumb is to use <= when there is a posedge or negedge statement within the always clause.

A variant of the D-flop is one with an asynchronous reset; there is a convention that the reset state
will be the first if clause within the statement.

reg q;
always @(posedge clk or posedge reset)
  if (reset)
    q <= 0;
  else
    q <= d;
The next variant is including both an asynchronous reset and asynchronous set condition; again the
convention comes into play, i.e. the reset term is followed by the set term.

reg q;
always @(posedge clk or posedge reset or posedge set)
  if (reset)
    q <= 0;
  else
    if (set)
      q <= 1;
    else
      q <= d;
Note: If this model is used to model a Set/Reset flip-flop, then simulation errors can result. Consider
the following test sequence of events: 1) reset goes high, 2) clk goes high, 3) set goes high, 4) clk goes high
again, 5) reset goes low, followed by 6) set going low. Assume no setup and hold violations.

In this example the always @ statement would first execute when the rising edge of reset occurs,
which would place q at a value of 0. The next time the always block executes would be at the rising edge of
clk, which again would keep q at a value of 0. The always block then executes when set goes high, which,
because reset is high, forces q to remain at 0. This condition may or may not be correct depending on the
actual flip-flop. However, this is not the main problem with this model. Notice that when reset goes low, set
is still high. In a real flip-flop this will cause the output to go to a 1. However, in this model it will not occur
because the always block is triggered by rising edges of set and reset - not levels. A different approach may
be necessary for set/reset flip-flops.

Note that there are no "initial" blocks mentioned in this description. There is a split between FPGA
and ASIC synthesis tools on this structure. FPGA tools allow initial blocks where reg values are established
instead of using a "reset" signal. ASIC synthesis tools don't support such a statement. The reason is that an
FPGA's initial state is something that is downloaded into the memory tables of the FPGA. An ASIC is an
actual hardware implementation.

Initial vs. Always:

There are two separate ways of declaring a Verilog process. These are the always and
the initial keywords. The always keyword indicates a free-running process. The initial keyword indicates a
process executes exactly once. Both constructs begin execution at simulator time 0, and both execute until
the end of the block. Once an always block has reached its end, it is rescheduled (again). It is a common
misconception to believe that an initial block will execute before an always block. In fact, it is better to think
of the initial-block as a special-case of the always-block, one which terminates after it completes for the
first time.

// Examples:
initial
  begin
    a = 1;  // Assign a value to reg a at time 0
    #1;     // Wait 1 time unit
    b = a;  // Assign the value of reg a to reg b
  end

always @(a or b)  // Any time a or b CHANGE, run the process
  begin
    if (a)
      c = b;
    else
      d = ~b;
  end  // Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a)  // Run whenever reg a has a low to high change
  a <= b;
These are the classic uses for these two keywords, but there are two significant additional uses. The
most common of these is an always keyword without the @(...) sensitivity list. It is possible to use always as
shown below:

always
  begin       // Always begins executing at time 0 and NEVER stops
    clk = 0;  // Set clk to 0
    #1;       // Wait for 1 time unit
    clk = 1;  // Set clk to 1
    #1;       // Wait 1 time unit
  end         // Keeps executing - so continue back at the top of the begin

The always keyword acts similarly to the "C" construct while(1) {..} in the sense that it will execute
forever.

The other interesting exception is the use of the initial keyword with the addition of
the forever keyword.
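For example, the clock generator above can be rewritten as a sketch using this combination; the behavior is the same:

initial
  forever     // repeat the following block forever
    begin
      clk = 0;  // Set clk to 0
      #1;       // Wait 1 time unit
      clk = 1;  // Set clk to 1
      #1;       // Wait 1 time unit
    end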

Race Condition
The order of execution isn't always guaranteed within Verilog. This can best be illustrated by a
classic example. Consider the code snippet below:

initial
  a = 0;
initial
  b = a;
initial
  begin
    #1;
    $display("Value a=%b Value of b=%b", a, b);
  end
What will be printed out for the values of a and b? Depending on the order of execution of the initial blocks,
it could be zero and zero, or alternately zero and x, the uninitialized value. The $display statement will
always execute after both assignment blocks have completed, due to the #1 delay.

Operators

Note: These operators are not shown in order of precedence.


Operator type    Operator symbols    Operation performed

Bitwise          ~                   NOT (1's complement)
                 &                   AND
                 |                   OR
                 ^                   XOR
                 ~^ or ^~            XNOR

Logical          !                   NOT
                 &&                  AND
                 ||                  OR

Reduction        &                   Reduction AND
                 ~&                  Reduction NAND
                 |                   Reduction OR
                 ~|                  Reduction NOR
                 ^                   Reduction XOR
                 ~^ or ^~            Reduction XNOR

Arithmetic       +                   Addition
                 -                   Subtraction
                 -                   2's complement (unary minus)
                 *                   Multiplication
                 /                   Division
                 **                  Exponentiation (*Verilog-2001)

Relational       >                   Greater than
                 <                   Less than
                 >=                  Greater than or equal to
                 <=                  Less than or equal to
                 ==                  Logical equality (bit-value 1'bX is removed from comparison)
                 !=                  Logical inequality (bit-value 1'bX is removed from comparison)
                 ===                 4-state logical equality (bit-value 1'bX is taken as literal)
                 !==                 4-state logical inequality (bit-value 1'bX is taken as literal)

Shift            >>                  Logical right shift
                 <<                  Logical left shift
                 >>>                 Arithmetic right shift (*Verilog-2001)
                 <<<                 Arithmetic left shift (*Verilog-2001)

Concatenation    { , }               Concatenation

Replication      {n{m}}              Replicate value m n times

Conditional      ? :                 Conditional
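To make the table concrete, here is a small, illustrative module (the names are ours) exercising a few of these operators:

// Demonstrating bitwise, reduction, concatenation, replication,
// shift, and conditional operators.
module op_demo;
  reg [3:0] a = 4'b1010;
  reg [3:0] b = 4'b0011;
  initial begin
    $display("%b", a & b);            // bitwise AND      -> 0010
    $display("%b", &a);               // reduction AND    -> 0
    $display("%b", {a, b});           // concatenation    -> 10100011
    $display("%b", {2{b}});           // replication      -> 00110011
    $display("%b", a >> 1);           // logical shift    -> 0101
    $display("%b", (a > b) ? a : b);  // conditional      -> 1010
  end
endmodule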

System Tasks:

System tasks are available to handle simple I/O, and various design measurement functions. All system tasks
are prefixed with $ to distinguish them from user tasks and functions. This section presents a short list of the
most often used tasks. It is by no means a comprehensive list.

• $display - Print to screen a line followed by an automatic newline.

• $write - Write to screen a line without the newline.

• $swrite - Print to variable a line without the newline.

• $sscanf - Read from variable a format-specified string. (*Verilog-2001)

• $fopen - Open a handle to a file (read or write)

• $fdisplay - Write to file a line followed by an automatic newline.

• $fwrite - Write to file a line without the newline.

• $fscanf - Read from file a format-specified string. (*Verilog-2001)

• $fclose - Close and release an open file handle.

• $readmemh - Read hex file content into a memory array.

• $readmemb - Read binary file content into a memory array.

• $monitor - Print out all the listed variables when any change value.

• $time - Value of current simulation time.

• $dumpfile - Declare the VCD (Value Change Dump) format output file name.

• $dumpvars - Turn on and dump the variables.

• $dumpports - Turn on and dump the variables in Extended-VCD format.

• $random - Return a random value.
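As an illustrative sketch (the file and signal names are hypothetical) showing several of these tasks working together in a small testbench:

// Typical use of $dumpfile/$dumpvars, $monitor, $random, and $finish.
module tb;
  reg [3:0] a;
  initial begin
    $dumpfile("wave.vcd");            // VCD output file
    $dumpvars(0, tb);                 // dump everything under tb
    $monitor("t=%0t a=%h", $time, a);
    a = 4'h0;
    #5 a = $random;                   // pseudo-random value (truncated to 4 bits)
    #5 $finish;
  end
endmodule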

CHAPTER-7
INTRODUCTION TO FPGA

As described in Architectural Overview, the Spartan™-3E FPGA architecture consists of five fundamental
functional elements:
• Input/Output Blocks (IOBs)
• Configurable Logic Block (CLB) and Slice Resources
• Block RAM
• Dedicated Multipliers
• Digital Clock Managers (DCMs)
The following sections provide detailed information on each of these functions. In addition, this section also
describes the following functions:
• Clocking Infrastructure
• Interconnect
• Configuration
• Powering Spartan-3E FPGAs
7.1 Input/Output Blocks (IOBs)
For additional information, refer to the “Using I/O Resources” chapter in UG331.
7.1.1 IOB Overview
The Input/Output Block (IOB) provides a programmable, unidirectional or bidirectional interface between a
package pin and the FPGA’s internal logic. The IOB is similar to that of the Spartan-3 family with the
following differences:
• Input-only blocks are added
• Programmable input delays are added to all blocks
• DDR flip-flops can be shared between adjacent IOBs
The unidirectional input-only block has a subset of the full IOB capabilities. Thus there are no connections
or logic for an output path. The following paragraphs assume that any reference to output functionality does
not apply to the input-only blocks. The number of input-only blocks varies with device size, but is never
more than 25% of the total IOB count.
Figure 7.1 is a simplified diagram of the IOB’s internal structure. There are three main signal paths
within the IOB: the output path, input path, and 3-state path. Each path has its own pair of storage elements
that can act as either registers or latches. For more information, see Storage Element Functions.
The three main signal paths are as follows:
• The input path carries data from the pad, which is bonded to a package pin, through an optional
programmable delay element directly to the I line. After the delay element, there are alternate routes through
a pair of storage elements to the IQ1 and IQ2 lines. The IOB outputs I, IQ1, and IQ2 lead to the FPGA’s
internal logic. The delay element can be set to ensure a hold time of zero (see Input Delay Functions).
• The output path, starting with the O1 and O2 lines, carries data from the FPGA’s internal logic through a
multiplexer and then a three-state driver to the IOB pad. In addition to this direct path, the multiplexer
provides the option to insert a pair of storage elements.
• The 3-state path determines when the output driver is high impedance. The T1 and T2 lines carry data from
the FPGA’s internal logic through a multiplexer to the output driver. In addition to this direct path, the
multiplexer provides the option to insert a pair of storage elements.

• All signal paths entering the IOB, including those associated with the storage elements, have an inverter
option. Any inverter placed on these paths is automatically absorbed into the IOB.

Fig 7.1: Simplified IOB Diagram

7.2 Configurable Logic Block (CLB) and Slice Resources
For additional information, refer to the “Using Configurable Logic Blocks (CLBs)” chapter in UG331.

7.2.1 CLB Overview


The Configurable Logic Blocks (CLBs) constitute the main logic resource for implementing
synchronous as well as combinatorial circuits. Each CLB contains four slices, and each slice contains two
Look-Up Tables (LUTs) to implement logic and two dedicated storage elements that can be used as flip-
flops or latches. The LUTs can be used as a 16x1 memory (RAM16) or as a 16-bit shift register (SRL16),
and additional multiplexers and carry logic simplify wide logic and arithmetic functions. Most general-
purpose logic in a design is automatically mapped to the slice resources in the CLBs. Each CLB is identical,
and the Spartan-3E family CLB structure is identical to that for the Spartan-3 family.
7.2.2 CLB Array
The CLBs are arranged in a regular array of rows and columns, as shown in Figure 7.2. Each density
varies by the number of rows and columns of CLBs (see Table 2).

Table 2: Spartan-3E CLB Resources

Fig 7.2: Spartan-3E CLB

7.3 Slices
Each CLB comprises four interconnected slices, as shown in Figure 7.4. These slices are grouped in
pairs. Each pair is organized as a column with an independent carry chain. The left pair supports both logic
and memory functions and its slices are called SLICEM. The right pair supports logic only and its slices are
called SLICEL. Therefore half the LUTs support both logic and memory (including both RAM16 and
SRL16 shift registers) while half support logic only, and the two types alternate throughout the array
columns. The SLICEL reduces the size of the CLB and lowers the cost of the device, and can also provide a
performance advantage over the SLICEM.

Fig 7.3: Simplified Diagram of the Left-Hand SLICEM

Fig 7.4: Arrangement of Slices within the CLB

7.3.1 Slice Location Designations


The Xilinx development software designates the location of a slice according to its X and Y
coordinates, starting in the bottom left corner, as shown in Figure 7.4. The letter ‘X’ followed by a number
identifies columns of slices, incrementing from the left side of the die to the right. The letter ‘Y’ followed by
a number identifies the position of each slice in a pair as well as indicating the CLB row, incrementing from
the bottom of the die. Figure 7.4 shows the CLB located in the lower left-hand corner of the die. The SLICEM
always has an even ‘X’ number and the SLICEL always has an odd ‘X’ number.

7.3.2 Slice Overview


A slice includes two LUT function generators and two storage elements, along with additional logic, as
shown in Figure 7.5. Both SLICEM and SLICEL have the following elements in common to provide logic,
arithmetic, and ROM functions:
• Two 4-input LUT function generators, F and G
• Two storage elements
• Two wide-function multiplexers, F5MUX and FiMUX
• Carry and arithmetic logic

Fig 7.5: Resources in a Slice
The SLICEM pair supports two additional functions:
• Two 16x1 distributed RAM blocks, RAM16
• Two 16-bit shift registers, SRL16
Each of these elements is described in more detail in the following sections.
7.3.3 Logic Cells
The combination of a LUT and a storage element is known as a “Logic Cell”. The additional features
in a slice, such as the wide multiplexers, carry logic, and arithmetic gates, add to the capacity of a slice,
implementing logic that would otherwise require additional LUTs. Benchmarks have shown that the overall
slice is equivalent to 2.25 simple logic cells. This calculation provides the equivalent logic cell count shown
in Table 2.
7.3.4 Slice Details
Figure 7.3 is a detailed diagram of the SLICEM. It represents a superset of the elements and
connections to be found in all slices. The dashed and gray lines (blue when viewed in color) indicate the
resources found only in the SLICEM and not in the SLICEL. Each slice has two halves, which are
differentiated as top and bottom to keep them distinct from the upper and lower slices in a CLB. The control
inputs for the clock (CLK), Clock Enable (CE), Slice Write Enable (SLICEWE1), and Reset/Set (RS) are
shared in common between the two halves. The LUTs located in the top and bottom portions of the slice are
referred to as "G" and "F", respectively, or the "G-LUT" and the "F-LUT". The storage elements in the top
and bottom portions of the slice are called FFY and FFX, respectively.
Each slice has two multiplexers with F5MUX in the bottom portion of the slice and FiMUX in the
top portion. Depending on the slice, the FiMUX takes on the name F6MUX, F7MUX, or F8MUX, according
to its position in the multiplexer chain. The lower SLICEL and SLICEM both have an F6MUX. The upper
SLICEM has an F7MUX, and the upper SLICEL has an F8MUX. The carry chain enters the bottom of the
slice as CIN and exits at the top as COUT. Five multiplexers control the chain: CYINIT, CY0F, and
CYMUXF in the bottom portion and CY0G and CYMUXG in the top portion. The dedicated arithmetic
logic includes the exclusive-OR gates XORF and XORG (bottom and top portions of the slice, respectively)
as well as the AND gates FAND and GAND (bottom and top portions, respectively). See Table for a
description of all the slice input and output signals.

7.4 Interconnect
For additional information, refer to the “Using Interconnect” chapter in UG331. Interconnect is the
programmable network of signal pathways between the inputs and outputs of functional elements within the
FPGA, such as IOBs, CLBs, DCMs, and block RAM.

7.4.1 Overview
Interconnect, also called routing, is segmented for optimal connectivity. Functionally, interconnect
resources are identical to those of the Spartan-3 architecture. There are four kinds of interconnect: long lines,
hex lines, double lines, and direct lines. The Xilinx Place and Route (PAR) software exploits the rich
interconnect array to deliver optimal system performance and the fastest compile times.

Fig 7.6: Four Types of Interconnect Tiles (CLBs, IOBs, DCMs, and Block RAM/Multiplier)

CHAPTER-8
INTRODUCTION TO SPARTAN 3E-KIT

8.1 Introduction
The Basys board is a circuit design and implementation platform that anyone can use to gain experience
building real digital circuits. Built around a Xilinx Spartan-3E Field Programmable Gate Array and a
Cypress EZUSB controller, the Basys board provides complete, ready-to-use hardware suitable for hosting
circuits ranging from basic logic devices to complex controllers. A large collection of on-board I/O devices
and all required FPGA support circuits are included, so countless designs can be created without the need for
any other components.
Four standard expansion connectors allow designs to grow beyond the Basys board using
breadboards, user-designed circuit boards, or Pmods (Pmods are inexpensive analog and digital I/O modules
that offer A/D & D/A conversion, motor drivers, sensor inputs, and many other features). Signals on the 6-
pin connectors are protected against ESD damage and short-circuits, ensuring a long operating life in any
environment. The Basys board works seamlessly with all versions of the Xilinx ISE tools, including the free
WebPack. It ships with a USB cable that provides power and a programming interface, so no other power
supplies or programming cables are required.

Figure 8.1 Basys programming circuit locations

8.2 Board power


The Basys board is typically powered from a USB cable, but a power jack and battery connector are
also provided so that external supplies can be used. To use USB power, set the power source switch (SW8)
to USB and attach the USB cable. To use an external wall-plug power supply, set SW8 to EXT and attach a
5VDC to 9VDC supply to the center-positive, 2.1/5.5mm power jack. To use battery power, set SW8 to EXT
and attach a 4V-9V battery pack to the 2-pin, 100-mil spaced battery connector (four AA cells in series make
a good 6+/- volt supply). Voltages higher than 9V on either power connector may cause permanent damage.
SW8 can also be used to turn off main power by setting it to the unused power input (e.g., if USB power is
used, setting SW8 to EXT will shut off board power without unplugging the USB cable).
Input power is routed through the power switch (SW8) to the four 6- pin expansion connectors and to a
National Semiconductor LP8345 voltage regulator. The LP8345 produces the main 3.3V supply for the
board, and it also drives secondary regulators to produce the 2.5V and 1.2V supply voltages required by the
FPGA. Total board current is dependent on FPGA configuration, clock frequency, and external connections.
In test circuits with roughly 20K gates routed, a 50MHz clock source, and all LEDs illuminated, about
100mA of current is drawn from the 1.2V supply, 50mA from the 2.5V supply, and 50mA from the 3.3V
supply. Required current will increase if larger circuits are configured in the FPGA, or if peripheral boards
are attached.
The Basys board uses a four layer PCB, with the inner layers dedicated to VCC and GND planes.
The FPGA and the other ICs on the board have large complements of ceramic bypass capacitors placed as
close as possible to each VCC pin, resulting in a very clean, low-noise power supply.

8.3 Configuration
After power-on, the FPGA on the Basys board must be configured before it can perform any useful
functions. During configuration, a “bit” file is transferred into memory cells within the FPGA to define the
logical functions and circuit interconnects. The free ISE/WebPack CAD software from Xilinx can be used to
create bit files from VHDL, Verilog, or schematic-based source files.
Digilent’s PC-based program called Adept can be used to configure the FPGA with any suitable bit file
stored on the computer. Adept uses the USB cable to transfer a selected bit file from the PC to the FPGA
(via the FPGA’s JTAG programming port). Adept can also program a bit file into an on-board non-volatile
ROM called “Platform Flash”. Once programmed, the Platform Flash can automatically transfer a stored bit
file to the FPGA at a subsequent power-on or reset event if the Mode Jumper is set to ROM. The FPGA will
remain configured until it is reset by a power-cycle event or by the FPGA reset button (BTNR) being
pressed. The Platform Flash ROM will retain a bit file until it is reprogrammed, regardless of power-cycle
events.
To program the Basys board, attach the USB cable to the board (if USB power will not be used,
attach a suitable power supply to the power jack or battery connector on the board, and set the power switch
to EXT). Start the Adept software, and wait for the FPGA and the Platform Flash ROM to be recognized.
Use the browse function to associate the desired .bit file with the FPGA, and/or the desired .mcs file with the
Platform Flash ROM. Right-click on the device to be programmed, and select the “program” function. The
configuration file will be sent to the FPGA or Platform Flash, and the software will indicate whether
programming was successful. The “configuration done” LED (LD_D) will also illuminate after the FPGA
has been successfully configured. For further information on using Adept, please see the Adept
documentation available at the Digilent website.

8.4 Oscillators
The Basys board includes a primary, user settable silicon oscillator that produces 25MHz, 50MHz, or
100MHz based on the position of the clock select jumper at JP4. A socket for a second oscillator is provided
at IC7 (the IC7 socket can accommodate any 3.3V CMOS oscillator in a half-size DIP package). The
primary and secondary oscillators are connected to global clock input pins at pin 54 and pin 53 respectively.
Both clock inputs can drive the clock synthesizer DLL on the Spartan-3E, allowing for a wide range of
internal frequencies, from 4 times the input frequency to any integer divisor of the input frequency.
The primary silicon oscillator is flexible and inexpensive, but it lacks the frequency stability of a crystal
oscillator. Some circuits that drive a VGA monitor may realize a slight improvement in image stability by
using a crystal oscillator installed in the IC7 socket.

8.5 User I/O


Four pushbuttons and eight slide switches are provided for circuit inputs. Pushbutton inputs are normally
low and driven high only when the pushbutton is pressed. Slide switches generate constant high or low
inputs depending on position. Pushbuttons and slide switches all have
series resistors for protection against short circuits (a short circuit would occur if an FPGA pin assigned to a
pushbutton or slide switch was inadvertently defined as an output). Eight LEDs and a four-digit
seven-segment LED display are provided for circuit outputs. LED anodes are driven from the FPGA via
current-limiting resistors, so they will illuminate when a logic ‘1’ is written to the corresponding FPGA pin.
A ninth LED is provided as a power-indicator LED, and a tenth LED (LD_D) illuminates any time the FPGA
has been successfully programmed.

8.6 Seven-segment display


Each of the four digits of the seven segment LED display is composed of seven LED segments
arranged in a “figure 8” pattern. Segment LEDs can be individually illuminated, so any one of 128 patterns
can be displayed on a digit by illuminating certain LED segments and leaving the others dark. Of these 128
possible patterns, the ten corresponding to the decimal digits are the most useful. The anodes of the seven
LEDs forming each digit are tied together into one common anode circuit node, but the LED cathodes
remain separate. The common anode signals are available as four “digit enable” input signals to the 4-digit
display. The cathodes of similar segments on all four displays are connected into seven circuit nodes labeled
CA through CG (so, for example, the four “D” cathodes from the four digits are grouped together into a
single circuit node called “CD”). These seven cathode signals are available as inputs to the 4-digit display.
This signal connection scheme creates a multiplexed display, where the cathode signals are common to all
digits but they can only illuminate the segments of the digit whose corresponding anode signal is asserted. A
scanning display controller circuit can be used to show a four-digit number on this display. This circuit
drives the anode signals and corresponding cathode patterns of each digit in a repeating, continuous
succession, at an update rate that is faster than the human eye response. Each digit is illuminated just one-
quarter of the time, but because the eye cannot perceive the darkening of a digit before it is illuminated
again, the digit appears continuously illuminated. If the update or “refresh” rate is slowed to a given point
(around 45 hertz), then most people will begin to see the display flicker.
For each of the four digits to appear bright and continuously illuminated, all four digits should be
driven once every 1 to 16 ms (for a refresh frequency of 1 kHz to 60 Hz). For example, in a 60 Hz refresh
scheme, the entire display would be refreshed once every 16ms, and each digit would be illuminated for ¼ of
the refresh cycle, or 4ms. The controller must assure that the correct cathode pattern is present when the
corresponding anode signal is driven. To illustrate the process, if AN1 is asserted while CB and CC are
asserted, then a “1” will be displayed in digit position 1. Then, if AN2 is asserted while CA, CB and CC are
asserted, then a “7” will be displayed in digit position 2. If AN1 and CB, CC are driven for 4 ms, and then
AN2 and CA, CB, CC are driven for 4 ms in an endless succession, the display will show “17” in the first two
digits.
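A sketch of such a scanning controller in Verilog follows (assumptions: a 50 MHz clock and active-high anode and cathode enables; on real boards these signals are often active-low, and all names here are ours):

// Multiplexes four hex digits onto a shared-cathode 7-segment display.
module seg7_scan (
  input             clk,       // assumed 50 MHz
  input      [15:0] digits,    // four 4-bit values to show
  output reg [3:0]  an,        // digit (anode) enables
  output reg [6:0]  seg        // cathodes, ordered {CG..CA}
);
  reg [17:0] prescale = 0;             // free-running divider
  always @(posedge clk) prescale <= prescale + 1;
  wire [1:0] sel = prescale[17:16];    // digit select, ~763 Hz at 50 MHz

  reg [3:0] d;
  always @* begin
    an = 4'b0001 << sel;               // enable one digit at a time
    d  = digits[sel*4 +: 4];           // pick that digit's 4-bit value
    case (d)                           // hex-to-7-segment decode
      4'h0: seg = 7'b0111111;  4'h1: seg = 7'b0000110;
      4'h2: seg = 7'b1011011;  4'h3: seg = 7'b1001111;
      4'h4: seg = 7'b1100110;  4'h5: seg = 7'b1101101;
      4'h6: seg = 7'b1111101;  4'h7: seg = 7'b0000111;
      4'h8: seg = 7'b1111111;  4'h9: seg = 7'b1101111;
      4'hA: seg = 7'b1110111;  4'hB: seg = 7'b1111100;
      4'hC: seg = 7'b0111001;  4'hD: seg = 7'b1011110;
      4'hE: seg = 7'b1111001;  default: seg = 7'b1110001;  // F
    endcase
  end
endmodule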

8.7 PS/2 PORT


The 6-pin mini-DIN connector can accommodate a PS/2 mouse or keyboard. Most PS/2 devices can
operate from a 3.3V supply, but some older devices may require a 5VDC supply. A jumper on the Basys
board (JP1) selects whether 3.3V or VU is supplied to the PS/2 connector. For 5V, set JP1 to VU and ensure
that Basys is powered with a 5VDC wall plug supply. For 3.3V, set the jumper to 3.3V. For 3.3V operation,
any board power supply (including USB) can be used. Both the mouse and keyboard use a two-wire serial
bus (clock and data) to communicate with a host device. Both use 11-bit words that include a start, stop and
odd parity bit, but the data packets are organized differently, and the keyboard interface allows bi-directional
data transfers.
The clock and data signals are only driven when data transfers occur, and otherwise they are held in the
“idle” state at logic ‘1’. The timings define signal requirements for mouse-to-host communications and bi-
directional keyboard communications. A PS/2 interface circuit can be implemented in the FPGA to create a
keyboard or mouse interface.
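As an illustrative fragment (assuming inputs ps2_clk and ps2_data that have already been synchronized and debounced, and with no parity checking), one 11-bit PS/2 word can be shifted in as follows:

// Receives start bit, 8 data bits (LSB first), parity bit, and stop bit.
reg  [3:0]  bitcnt = 0;
reg  [10:0] shift;
reg  [7:0]  data;
reg         ready = 0;
wire [10:0] word = {ps2_data, shift[10:1]};  // shift right, LSB first

always @(negedge ps2_clk) begin
  shift <= word;
  if (bitcnt == 10) begin      // the 11th bit has just arrived
    data   <= word[8:1];       // extract the 8 data bits
    ready  <= 1;
    bitcnt <= 0;
  end else begin
    ready  <= 0;
    bitcnt <= bitcnt + 1;
  end
end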

8.8 DUMPING PROCEDURE


8.8.1 Programming through JTAG
For programming the FPGA we need a JTAG cable, which is a 6-pin cable converted to a parallel-port
cable connected to the CPU; the FPGA is programmed through this cable, and this type of programming is
called “flash programming”.

Connecting the USB Cable: the kit includes a standard USB Type A/Type B cable, similar to the one shown
in Figure 8.2. The actual cable color might vary from the picture.
Fig 8.2: Standard USB Type A/Type B Cable
The wider and narrower Type A connector fits the USB connector at the back of the computer. After
installing the Xilinx software, connect the square Type B connector to the Spartan-3E Starter Kit board. The
USB connector is on the left side of the board, immediately next to the Ethernet
connector. When the board is powered on, the Windows operating system should recognize and install the
associated driver software. When the USB cable driver is successfully installed and the board is correctly
connected to the PC, a green LED lights up, indicating a good connection.
8.8.2 Programming via iMPACT
After successfully compiling an FPGA design using the Xilinx development software, the design can
be downloaded using the iMPACT programming software and the USB cable. To begin programming,
connect the USB cable to the starter kit board and apply power to the board. Then, double-click Configure
Device (iMPACT) from within Project Navigator, as shown in Figure 8.3.

Fig 8.3: Double-Click to Invoke iMPACT


If the board is connected properly, the iMPACT programming software automatically recognizes the three
devices in the JTAG programming file, as shown in Figure 8.4. If not already prompted, click the first device
in the chain, the Spartan-3E FPGA, to highlight it. Right-click the FPGA and select Assign New
Configuration File. Select the desired FPGA configuration file and click OK.

Fig 8.4: Right-Click to Assign a Configuration File to the Spartan-3E FPGA
If the original FPGA configuration file used the default Start-Up clock source, CCLK, iMPACT issues the
warning message shown in Figure 8.5. This message can be safely ignored. When downloading via JTAG,
the iMPACT software must change the Start-Up clock source to use the TCK JTAG clock source.

Figure 8.5: iMPACT Issues a Warning if the Start-Up Clock Was Not CCLK
To start programming the FPGA, right-click the FPGA and select Program. The iMPACT software reports
status during the programming process. Direct programming to the FPGA takes a few seconds to less than a
minute, depending on the speed of the PC’s USB port and the iMPACT settings.

Figure 8.6: Right-Click to Program the Spartan-3E FPGA

Figure 8.7: iMPACT Programming Succeeded, the FPGA’s DONE Pin is High

CHAPTER-9
RESULTS

Figure 9.1: Design Summary

Figure 9.2: Sparse Kogge-Stone Adder Simulation Result

Figure 9.3: Brent-Kung Adder Simulation Result

Figure 9.4: Spanning-Tree Adder Simulation Result

CONCLUSION

In this paper, different parallel prefix adders were designed and evaluated. These adders work in three
stages: pre-computation, prefix computation, and final computation, and this three-stage organization makes
the proposed adders very efficient. Based on the experimental results, the proposed architectures achieve
lower delays than the existing sparse tree adder, and among the proposed architectures the sparse
Kogge-Stone adder has the lowest delay and is the most efficient.
