CHDD part 1
Unit 1.1
Introduction to the Digital Half of the Module
The digital half of the module consists of three themes:
Part 3: Designing large scale digital systems that are easy to test
~2 hours of video lectures, examples session 3
With any ambitious manufacturing process, not all of the manufactured units will
work, so we need to test each unit before selling to a customer. However, with a very
complex system, it can be very difficult, costly and time consuming to decide whether
or not an individual unit definitely works under all possible input conditions. Modern
designs incorporate additional features to make each unit easy, quick and cheap to
test. We will look at the main approaches to design-for-test.
Suppose we want a piece of hardware that adds two 16-bit numbers together.
[Figure: an ADD block with two 16-bit inputs, a and b, and a 16-bit output, sum]
How would we approach this using traditional methods? We would have to exercise
some ingenuity to partition this large problem into a series of smaller problems that
are easier to solve. For example, we could partition into a sequence of sixteen 1-bit
adders:
[Figure: a column of 1-bit adders; stage i adds a_i and b_i to give sum_i, drawn for stages 0 to 3, and so on]
That’s the traditional design methodology. Let’s do some evaluation of this method.
Design effort
It’s quite a lot of work to get from the design brief (add 16-bit numbers) into the
design using logic gates. It would take about 10 minutes to complete and document
the process. But the chips used in mobile phones or computer games consoles contain
millions of logic gates. If we have to put this much effort into getting out a design that
contains only a few dozen gates, then we are in trouble. The designer productivity will
be too low; it will take too much time, and cost too much money to undertake a big
project. We need a method that gives higher productivity.
Maintainability
When you are in the midst of designing a piece of hardware, you probably have a
pretty good understanding of the design you produce. However, it’s difficult to work
with gate level designs produced by someone else, or even with designs that were
produced by you a few weeks previously. Just by glancing at the logic schematics
above, can you tell what it actually does? We need a method that gives greater clarity
as to what the designs do.
2 VHDL
VHDL is the VHSIC Hardware Description Language. (VHSIC stands for Very High
Speed Integrated Circuit). The main motivation for its creation was to provide
rigorous and unambiguous specification of modules. Gradually it developed to be
used for other purposes such as synthesis. The other important HDL is Verilog. This
is older than VHDL, and its original form was not very powerful. Over the years, it
has been enhanced and extended with extra features, so that it is now as powerful as
VHDL.
VHDL is based around the notion of being able to view the modules in a design at
different levels of abstraction. Crudely speaking, a high level of abstraction contains
little detail, and a low level of abstraction contains a lot of detail. Low-level design
requires a lot of effort, and we want to avoid this effort until we are sure that it won’t
be wasted. We need to resolve all high level issues before we commit to any low level
design.
3 Levels of Abstraction
Algorithmic
This describes the basic idea of what the design is supposed to do, without
reference to how this functionality is to be achieved. (Indeed, one of the
purposes of the design phase is to investigate different possible methods to
achieve the desired functionality). The initial specification of a design will
almost always be at the algorithmic level.
Register transfer level (RTL)
The design is conceived as a group of interconnected modules. For each
module we know three things:
What its interface to the other modules of the design is (how many inputs
and outputs, how many bits wide are they, etc.)
What the logical relationship is between the inputs and the outputs (usually
expressed as something like a Boolean logic equation)
What the timing is between the inputs and the outputs (i.e. what happens
on what clock cycle).
Gate level
The design is constructed from basic logic gates (see footnote 1).
There may also be a 4th level, the physical level. This might refer, for example, to the
processing of a piece of silicon that would be necessary to manufacture the required
configuration of logic gates as an application-specific integrated circuit (ASIC).
Alternatively, it might be the generation of the configuration bit stream for a field
programmable gate array (FPGA).
4 Synthesis
A synthesis CAD tool is one that automatically maps a description from one level of
the hierarchy to a lower level in the hierarchy. The main types are as follows:
Physical synthesis
This performs the mapping from gate-level to physical level. For an ASIC, a
gate level representation is automatically translated to a mask level design of
an integrated circuit. For a PLD or an FPGA, a gate-level design is mapped to
a configuration file that controls how the fuses in the device should be blown,
or how the CLBs should be configured. CAD tools do this task extremely
well.
1 Or whatever is the most fundamental design primitive for the hardware implementation we are
intending to use. This would be Configurable Logic Blocks (CLBs) for an FPGA or a fuse map for a
CPLD.
Logic synthesis
This maps an RTL description to a gate level description. CAD tools also
perform this stage extremely well.
Behavioural synthesis
This maps an algorithmic description to a register transfer or to a gate level
description. At present, automated synthesis tools cannot do this well. It is a
very active area of research, but for now, and the foreseeable future, this is a
task that must be done by humans.
Automatic synthesis leads to great productivity gains, because human designers can
confine their activities to high level design. It has often been said that the output of a
designer is limited to about 10-50 items per day, fully debugged and properly
documented. This rate is true whether an item corresponds to a logic gate, a functional
unit of an RTL design, or an equation representing the behaviour of an entire digital
filter for a signal processing system. 10-50 logic gates will only form a very small part
of a system whereas 10-50 RTL functional units may be enough to describe an entire
system. The higher the level at which the designer is working, the more the designer
can produce.
Synthesis also helps us avoid the risk of putting a lot of effort into targeting one
particular manufacturing technology, only to find that we need to re-target the design
to a new technology. If all the low-level synthesis was done by humans, the re-
targeting could take thousands of man hours. If it was done by a CAD tool, then we
simply re-synthesise for the new target hardware, which means leaving a computer
running for a few days.
5 Summary
Traditional design methods have many problems. Designer productivity is too low.
Decisions about the implementation have to be made early in the design process. If
the design is re-targeted from one technology to another (say a design originally
implemented on an ASIC is moved to an FPGA) the whole design process needs to be
repeated.
HDLs also allow a specification for a module to be simulated. So we can try out
different combinations of input and see what the outputs would do. This can be done
before any detailed gate level design is attempted.
[Figure: a block named nandgate with inputs a and b and output c]
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END;
Now that we have described the inputs and outputs, we need to say what the device
does, i.e. how its outputs respond to its inputs. This is done in an architecture:
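A minimal sketch of such an architecture for the nandgate entity above. The architecture name simple is an assumption chosen for illustration; the block is meant to be compiled together with the entity declaration.

```vhdl
-- Sketch: architecture for the nandgate entity declared above.
-- The architecture name "simple" is an assumption, not taken from the notes.
ARCHITECTURE simple OF nandgate IS
BEGIN
    c <= a NAND b;  -- c gets the value of a NANDed with b
END ARCHITECTURE simple;
```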
The symbol <= (which is meant to look like a left-pointing arrow) is pronounced
"gets". It means that the signal c gets the value of a NANDed together with the value
of b. Whenever a or b change their value, this statement causes the value of c to be
updated.
If we want to check that our description is functioning correctly, we can feed it into a
simulator, a program that predicts how the outputs would change in response to
changes in the input. Here is the sort of thing we get if we run this code through a
simulator
The horizontal axis is time, ranging from 0 to 100 ns. Traces are shown for the signals
a, b and c. Whenever a or b changes its value, c receives a new value. In order to carry
out the simulation, we need to tell the simulator what we want each of the inputs a and
b to do (in this case we have toggled each from 0 to 1 and then back to 0). The
simulator then works out what the output c would do in response. You can see that c
is carrying out the logic function a NAND b, so the design is correct.
VHDL uses the following logical operators: NOT, AND, OR, NAND, NOR, XOR
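For instance, the NAND gate could equally be described with NOT and AND. A minimal sketch for the same nandgate entity, with the architecture name alternative assumed for illustration:

```vhdl
-- Sketch: a second architecture for the nandgate entity.
-- The architecture name "alternative" is an assumption.
ARCHITECTURE alternative OF nandgate IS
BEGIN
    c <= NOT (a AND b);  -- AND the inputs, then invert the result
END ARCHITECTURE alternative;
```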
This achieves exactly the same function as the first description, but does it in a
different way.
VHDL marks the start and end of a block with keywords such as BEGIN, LOOP and END. So in VHDL the loop would look like
this
FOR i IN 1 TO N LOOP
    a(i) := i;             -- a and b are assumed to be integer array variables
    b(i) := a(i) * a(i);
END LOOP;
Note that indentation of the block is used to make it clearer where the block starts and
ends.
4 Semicolons
Like C or Java, VHDL uses the semicolon to indicate the end of a statement.
Statements that "open up" a block don't take semicolons. So in C these would be
wrong:
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END;
The keyword IS is "opening up" a block of statements, and therefore does not need a
semicolon. However, note that VHDL is a little inconsistent as to whether IS needs to
be followed by a BEGIN. In an ENTITY, the BEGIN is implied, and the END
statement is answering the IS. By contrast, in an ARCHITECTURE the word BEGIN
must also be there, and the END is answering the BEGIN.
5 Stylistic issues
5.1 Case
VHDL is not case sensitive. All three of these are identical in meaning, and you’ll see
all three styles in textbooks and design magazines:
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
entity NANDGATE is
port ( A, B: in std_logic; C: out std_logic);
end;
entity nandgate is
port ( a, b: in std_logic; c: out std_logic);
end;
It used to be considered good style to write all the keywords of VHDL in one case,
and all the names that we have chosen for our design in the other case. This makes it
easier to figure out what is going on in the design.
In lectures we will show all keywords in uppercase to make it clearer to you what is
part of the VHDL language, and what is just a name that I have chosen.
ENTITY nandgate IS
PORT (a,b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
5.3 Returns
Putting in a carriage return makes no difference to the function of your code. So the
following two are identical in function
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC;
c: OUT STD_LOGIC);
END;
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC);
END ENTITY nandgate;
When you run the compiler, the code will be checked, and if there is a mismatch
between what you say you are ENDing and what VHDL thinks you are ending, then
this will be flagged as an error.
5.5 Comments
Comments are introduced by two dashes:
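A small sketch of commented code, reusing the nandgate entity from earlier in these notes; the comment wording is purely illustrative.

```vhdl
LIBRARY ieee;                      -- make the IEEE library visible
USE ieee.std_logic_1164.ALL;       -- use everything in the std_logic_1164 package

ENTITY nandgate IS                 -- start of the entity called nandgate
    PORT ( a, b: IN STD_LOGIC;     -- a and b are inputs
           c: OUT STD_LOGIC );     -- c is an output
END ENTITY nandgate;               -- end of the entity called nandgate
```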
Everything after the two dashes up to the end of the line is a comment. The example
above isn’t great, because the comments are stating the obvious. But we haven’t done
enough of the language yet to show an example that would give rise to more sensible
comments.
LIBRARY IEEE;
The IEEE library contains many sub-libraries, which in turn contain many features.
The VHDL name of a sub-library is a package. In order to say which features of
which packages we wish to access, we use a statement that looks like this:
USE IEEE.XXXX.YYYY;
Where XXXX is the name of the required package, and YYYY is the name of the
specific feature that is to be used. Rather than listing each specific feature that we
want to use (which can be very tedious), often we will simply make all features within
a package visible by using the VHDL keyword ALL:
USE IEEE.XXXX.ALL;
This opens up all features in the XXXX package of the IEEE library so that they can
be used by our design.
6.2 Using STD_LOGIC
The standard logic definitions are held in a package called std_logic_1164 (see footnote 1). So here is
a full listing for the NAND, that opens up the library to access the features of
STD_LOGIC type.
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END ENTITY nandgate;
7 Summary
We’ve looked at the basic features of VHDL, and seen some simple examples. The
two key parts of a description are the ENTITY, i.e. a list of inputs and outputs, and an
ARCHITECTURE, i.e. a description of the logical relationship between the inputs and
outputs. We’ve also looked at the STD_LOGIC data type, which represents a wire
carrying values of 1 and 0. The definition of the STD_LOGIC data type is held in the
library IEEE and must be imported at the start of each entity in our code.
1 1164 is simply the number of the IEEE standards document that defined the Standard Logic type.
Unit 1.4
Handling signals that are more than 1 bit wide
1 STD_LOGIC_VECTORs
Most interesting designs have inputs that are more than just a single bit. For example,
let’s consider a device that has two 4-bit inputs a and b, and a 4-bit output c.
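[Figure: a device with 4-bit inputs a and b and a 4-bit output c, drawn both with 4-bit buses and expanded into individual bits a0 to a3, b0 to b3 and c0 to c3]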
STD_LOGIC_VECTOR(0 TO 3)
Now a contains four members a(0), a(1), a(2) and a(3). Each of these four members is
of type STD_LOGIC.
2 An example
Imagine that we wanted to represent a device like this:
[Figure: four two-input gates; for each bit position i from 0 to 3, inputs a_i and b_i produce output c_i]
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY orgate IS
PORT ( a, b: IN STD_LOGIC_VECTOR(0 TO 3);
c: OUT STD_LOGIC_VECTOR(0 TO 3));
END ENTITY orgate;
There are several ways that we could write the architecture. One way to describe it
would be like this, explicitly listing what happens for each bit:
ARCHITECTURE number1 OF orgate IS
BEGIN
c(0) <= a(0) OR b(0);
c(1) <= a(1) OR b(1);
c(2) <= a(2) OR b(2);
c(3) <= a(3) OR b(3);
END ARCHITECTURE number1;
Alternatively, we could just write this, which would be simpler and would mean
exactly the same thing
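A minimal sketch of this shorter form for the orgate entity above. The architecture name number2 is an assumption, chosen to follow on from number1.

```vhdl
-- Sketch: vector-wide version of the orgate architecture.
-- The architecture name "number2" is an assumption.
ARCHITECTURE number2 OF orgate IS
BEGIN
    c <= a OR b;  -- OR applied to each of the four bit positions
END ARCHITECTURE number2;
```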
VHDL knows that a, b and c are four bits wide, and will do the appropriate operation
for each of the bit positions.
3 STD_LOGIC_VECTOR values
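The value of a STD_LOGIC is indicated by a single character enclosed in single quotes.
So if a is a single bit, assignment looks like this:
a <= '1';
The value of a STD_LOGIC_VECTOR is indicated by a string of values enclosed in double
quotes. So if a is four bits wide, assignment looks like this:
a <= "1110";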
By default, VHDL expects the values to be binary, but sometimes it can be useful to
use Hex numbers. This can be done by placing the letter X before the
STD_LOGIC_VECTOR value:
a <= X"E";
a: STD_LOGIC_VECTOR(0 TO 3);
a <= "1110";
[Figure: reading "1110" from left to right, the characters are assigned to Element 0, Element 1, Element 2 and Element 3]
This feels normal and intuitive (indeed in the programming languages that you may
know, e.g. C or Java, this is the only way that you are allowed to do it). However, in
VHDL you also have the option to have arrays where the index counts downwards:
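a: STD_LOGIC_VECTOR(3 DOWNTO 0);
a <= "1110";
[Figure: reading "1110" from left to right, the characters are assigned to Element 3, Element 2, Element 1 and Element 0]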
In both cases, the number would be interpreted as 14 unsigned or –2 signed: the leftmost
bit is always interpreted as the msb and the rightmost is always interpreted as
the lsb.
In digital logic design, the normal numbering convention is that bit 0 is the least
significant bit (lsb). This is accomplished by having the index run downwards. So
unlike most programming languages, in VHDL it is normal for arrays to be numbered
downwards. You can use upward-numbering if you want, but this often leads to
confusion that creates awkward bugs in your code.
3.2 Aggregates
An aggregate is a group of values, separated by commas, that will be used for an array.
Here is an example:
ARCHITECTURE example OF aggregate IS
SIGNAL nibble1, nibble2: STD_LOGIC_VECTOR ( 0 TO 3 );
BEGIN
nibble1 <= ( '0','1','0','0');
nibble2 <= ( '0','0','1','0');
END ARCHITECTURE example;
The assignment for nibble1 sets its 0th value to ‘0’, its 1st value to ‘1’, the 2nd to ‘0’
and so on. This way of doing things is called positional assignment: the 0th value
listed goes in the 0th position, the first goes in the first position and so on. We could
instead use named association. So these statements have the same effect:
nibble1 <= ( '0','1','0','0');
nibble1 <= ( 1 => '1', 0 => '0', 3 => '0', 2 => '0');
With named association, we can just specify the values of some of the bit positions,
and use an OTHERS value to provide a value for everything not explicitly mentioned.
So these are all the same:
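A sketch of three equivalent assignments. The entity name others_sketch and the signal names n1, n2 and n3 are assumptions; separate signals are used so that each assignment has its own target.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical test entity with no ports, used only to hold the signals.
ENTITY others_sketch IS
END ENTITY others_sketch;

ARCHITECTURE example OF others_sketch IS
    SIGNAL n1, n2, n3: STD_LOGIC_VECTOR ( 0 TO 3 );
BEGIN
    n1 <= ( '0','1','0','0');                         -- positional association
    n2 <= ( 0 => '0', 1 => '1', 2 => '0', 3 => '0');  -- named, every position listed
    n3 <= ( 1 => '1', OTHERS => '0');                 -- named, the rest filled by OTHERS
END ARCHITECTURE example;
```

All three signals end up holding the same four values.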
The OTHERS notation can be used as a convenient trick when we want to set all the
values of an array to a particular value:
nibble1 <= ( OTHERS => '1');
3.3 Concatenation
Concatenation merges two vectors to produce a longer vector. For example
ARCHITECTURE example OF aggregate IS
SIGNAL byte: STD_LOGIC_VECTOR ( 0 TO 7 );
SIGNAL nibble1, nibble2: STD_LOGIC_VECTOR ( 0 TO 3 );
BEGIN
nibble1 <= ( '0','1','0','0');
nibble2 <= ( '0','0','1','0');
byte <= nibble1 & nibble2;
END ARCHITECTURE example;
3.4 Literals
STD_LOGIC is an enumerated data type that has values '0', '1', 'X', 'U' etc., each written as a
character literal. An array of such values can be written as a string literal, denoted by double quotes.
This is similar to the convention used in the C programming language: '1' is a
character; "1010" is an array of 4 characters. We can use this notation for
STD_LOGIC_VECTORS. So for example,
nibble1 <= ( '0','1','0','0');
could be written as
nibble1 <= "0100";
A value that is directly specified (as opposed to being calculated from other signals),
like “0100” in the code above is called a literal. Standard logic vector literals may be
specified in binary, octal or hexadecimal. By default, a string is interpreted as binary.
To make it explicit that we wish the string to be interpreted as a binary number, we
can place the letter B in front. For an octal string, we place the letter O in front, and
for hexadecimal, we place X in front. So if a is a 12-bit std_logic_vector, then these are
all equivalent:
a <= "010011001010";
a <= B"010011001010";
a <= O"2312";
a <= X"4CA";
Long strings of ‘1’s and ‘0’s can be confusing, so in order to improve legibility, we
can introduce underscores:
a <= B"0100_1100_1010";
The underscores are ignored by VHDL; their only function is to space the digits out to
make it easier for a human to read. Note that if you do use underscores in your values,
you must put the B in front to make it clear that this should be interpreted as a binary
value. This (without the B) would be an error:
a <= "0100_1100_1010"; -- Wrong!
4 Summary
We’ve looked at the STD_LOGIC_VECTOR datatype, which represents multi-bit signals.
It is effectively an array of STD_LOGIC values. We have also seen the two main
notations for assigning values to STD_LOGIC_VECTORs and how to indicate the number
base for a STD_LOGIC_VECTOR value.
Unit 1.5
Number Representation and Arithmetic
Most forms of data that are handled by digital systems (e.g. samples of audio data,
pixel values for image and video data, ASCII data for representing text) are some
form of number. In this section we will look at the background to two of the most
important VHDL numerical data types (SIGNED and UNSIGNED). Before we do that, we
will briefly recap binary data representation formats for numbers.
1 Denary
In everyday life, we use the denary (base 10) number system whose digits can take the
values 0,1,2,3,4,5,6,7,8,9. An n-digit denary number with digits $d_i$ is interpreted as
having the value
$$\sum_{i=0}^{n-1} d_i \times 10^i$$
So, for example, the number 365 has three digits: $d_2=3$, $d_1=6$ and $d_0=5$. Its value is
$$3 \times 10^2 + 6 \times 10^1 + 5 \times 10^0 = 300 + 60 + 5 = 365$$
2 Unsigned binary
For digital systems, we deal with binary numbers where digits can have value 0,1.
These values are particularly convenient for the construction of simple, cheap and fast
electronic circuits. A digit that can take only the values 0 or 1 is called a binary digit
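or bit. An n-bit binary number has the value
$$\sum_{i=0}^{n-1} d_i \times 2^i$$
So, for example, the denary number 5 equates to the 3-bit binary number 101, which
has digits $d_2=1$, $d_1=0$ and $d_0=1$. Its value is
$$1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$$
(see footnote 1 for the $2^0$ term), which is
$$1 \times 4 + 0 \times 2 + 1 \times 1 = 5$$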
The bit with the highest weighting is called the most significant bit (msb) and the bit
with the lowest weighting is the least significant bit (lsb). The msb is always the
leftmost bit and the lsb is the rightmost. For this 3-bit example, bit number 2 is the
msb and bit number 0 is the lsb.
The largest number that we can represent depends on the number of bits that we use.
For example, if we use 4 bits to represent a number, then there are 16 different values
that can be represented:
1 Remember that anything raised to the power of zero is 1, i.e. $x^0 = 1$ for all $x$.
Number Unsigned binary representation
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
10 1010
11 1011
12 1100
13 1101
14 1110
15 1111
Similarly, if we have a 4-bit binary up-counter and it reads 1111, then the next state in
its count sequence will be 0000. This provides us with an alternative method for the
representation of negative numbers. –1 is the number that is 1 less than zero. In other
words, it is a number which when added to 1 gives zero. But we have just seen that
1111 when added to 1 gives 0000. Similarly, -2 is the number that gives zero when we
add 2 to it. This is 1110. The number system that this generates is shown below
Thus we interpret the value of an n-bit 2s complement number as
$$-d_{n-1} \times 2^{n-1} + \sum_{i=0}^{n-2} d_i \times 2^i$$
This is exactly the same as an unsigned binary number, except that the msb is
negatively weighted. So, for example, the interpretation of the number 1010 is
$$-1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -8 + 2 = -6$$
Note that all negative numbers have an msb of 1 and all positive numbers have an
msb of 0. The msb of a 2s complement number is therefore often referred to as the sign bit.
But this is correct without any modification. An adder for unsigned binary works
without modification for 2s complement. This is why 2s complement is the normal
representation format for integer and fixed point numbers in digital systems (see footnote 1).
1 In fact there is one small difference between an unsigned binary adder and a 2s complement adder,
and this is related to the treatment of overflow conditions.
It has two inputs, a and b, both of which represent four-bit binary numbers. There is a
single one-bit output g, which represents the “greater than” condition. When a>b then
g=’1’; otherwise g=’0’.
This can’t be interpreted unless we know whether a and b are signed (2’s
complement) or unsigned numbers. Suppose a=1111 and b=0001. If the numbers are
unsigned then a is fifteen and b is one. So a>b and g=1. But if the numbers are signed
then a is minus one and b is plus one; a<b and g=’0’.
The way that VHDL handles this is through the NUMERIC_STD package. This introduces
two new data types, SIGNED and UNSIGNED. These are declared and used in the same
way that STD_LOGIC_VECTORs would be, but they have the additional property that
arithmetic operators +,-,>,< and conversion to integer are defined for them. By
contrast, if we tried to apply +,-,>,< to STD_LOGIC_VECTORs this would not be allowed
and would result in a compilation error.
ENTITY numbers IS
END ENTITY numbers;
The binary value 1010 when given to an unsigned signal x converts to an integer
value of +10. The same value when given to a signed signal y converts to an integer
value of -6. However, if we try to convert a STD_LOGIC_VECTOR z whose value is
1010, this results in a compilation error. This forces us always to be clear about the
number convention that we are using, and helps us to avoid subtle bugs creeping into
our code.
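The behaviour just described can be sketched as follows. The signal names x, y and z follow the text; the integer signals xi and yi and the architecture name sketch are assumptions for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY numbers IS
END ENTITY numbers;

ARCHITECTURE sketch OF numbers IS
    SIGNAL x: UNSIGNED(3 DOWNTO 0);
    SIGNAL y: SIGNED(3 DOWNTO 0);
    SIGNAL z: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL xi, yi: INTEGER;  -- hypothetical names for the converted values
BEGIN
    x <= "1010";
    y <= "1010";
    z <= "1010";
    xi <= TO_INTEGER(x);  -- unsigned interpretation: +10
    yi <= TO_INTEGER(y);  -- 2s complement interpretation: -6
    -- TO_INTEGER(z) would be rejected by the compiler, because z is
    -- neither SIGNED nor UNSIGNED.
END ARCHITECTURE sketch;
```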
If we really did want to convert z to an integer, then we would need to feed it into a
function that converts its value to UNSIGNED or SIGNED, and then feed it into the
TO_INTEGER conversion function.
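A minimal sketch of this two-step conversion, using the type conversions UNSIGNED(...) and SIGNED(...) from numeric_std. The entity name convert_sketch and the signal names zu and zs are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical test entity with no ports.
ENTITY convert_sketch IS
END ENTITY convert_sketch;

ARCHITECTURE sketch OF convert_sketch IS
    SIGNAL z: STD_LOGIC_VECTOR(3 DOWNTO 0) := "1010";
    SIGNAL zu, zs: INTEGER;
BEGIN
    zu <= TO_INTEGER(UNSIGNED(z));  -- z read as an unsigned number
    zs <= TO_INTEGER(SIGNED(z));    -- z read as a 2s complement number
END ARCHITECTURE sketch;
```

The designer has to state which interpretation is wanted; there is no default.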
So our comparator circuit would be described like this (assuming that we want the
signed interpretation of our binary numbers):
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
ENTITY comparator IS
PORT ( a: IN SIGNED(3 DOWNTO 0);
b: IN SIGNED(3 DOWNTO 0);
g: OUT STD_LOGIC);
END ENTITY comparator;
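A possible architecture for this comparator entity, sketched with a conditional signal assignment (WHEN ... ELSE). The architecture name sketch is an assumption; because a and b are SIGNED, the > operator from numeric_std compares them as 2s complement numbers.

```vhdl
-- Sketch: architecture for the comparator entity above.
ARCHITECTURE sketch OF comparator IS
BEGIN
    g <= '1' WHEN a > b ELSE '0';  -- signed comparison of a and b
END ARCHITECTURE sketch;
```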
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
ENTITY comparator2 IS
PORT ( a: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
b: IN STD_LOGIC_VECTOR (3 DOWNTO 0);
g: OUT STD_LOGIC);
END ENTITY comparator2;
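When the ports are STD_LOGIC_VECTORs, as in comparator2, the inputs have to be converted before they can be compared. A sketch that assumes the signed interpretation is still wanted; the architecture name sketch is also an assumption.

```vhdl
-- Sketch: architecture for the comparator2 entity above.
ARCHITECTURE sketch OF comparator2 IS
BEGIN
    -- Convert each input to SIGNED so that > is defined for it.
    g <= '1' WHEN SIGNED(a) > SIGNED(b) ELSE '0';
END ARCHITECTURE sketch;
```

Replacing SIGNED with UNSIGNED in both conversions would give the unsigned comparison instead.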
Opcode Operation
00 num1 + num2
01 num1 – num2
10 num1 OR num2
11 num1 AND num2
This is an arithmetic logic unit. It has two inputs, num1 and num2, each of which is
16 bits wide. The 16-bit output result is produced by some arithmetic or logical
operation on the two inputs. The operation that will be performed on num1 and num2
to produce result is shown in the table.
To design one of these using manual design methods is a non-trivial task. However, in
VHDL it’s easy. Here is the listing:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
ENTITY alu IS
PORT ( num1, num2: IN SIGNED(15 DOWNTO 0);
opcode: IN UNSIGNED(1 DOWNTO 0);
result: OUT SIGNED(15 DOWNTO 0) );
END ENTITY alu;
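One way the architecture for this entity might be written is with a selected signal assignment (WITH ... SELECT), which picks one expression according to the opcode table. This is a sketch under that assumption; the architecture name sketch is hypothetical.

```vhdl
-- Sketch: architecture for the alu entity above.
ARCHITECTURE sketch OF alu IS
BEGIN
    WITH opcode SELECT
        result <= num1 + num2   WHEN "00",     -- add
                  num1 - num2   WHEN "01",     -- subtract
                  num1 OR num2  WHEN "10",     -- bitwise OR
                  num1 AND num2 WHEN OTHERS;   -- "11": bitwise AND
END ARCHITECTURE sketch;
```

The OTHERS branch covers "11" and also any opcode containing values such as 'X' or 'U'.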
In order to test whether our design does what we expect, we can feed it into a
simulator tool. This is a program which allows us to apply inputs to the design, and to
see what outputs would be produced by the design. Here is an example simulation:
Once we have simulated the description thoroughly and are sure that it correctly gives
the behaviour we want, the code can then be fed into a synthesis tool, a computer
program which will automatically generate a gate-level design.
Unit 2.1
Dataflow and Structural VHDL
There are two fundamentally different ways that we can describe a design:
Behavioural descriptions tell us what the design should do but not how we would
make it
Structural descriptions tell us how we would make it but not what it would do
In this unit, we will look at how a behavioural design (a 4-bit adder) might be
transformed by a synthesis tool into a structural netlist of logic gates.
As we do this, we will learn more about how to write behavioural and structural
VHDL. There are several different approaches to writing behavioural VHDL. The
easiest to understand is dataflow, a type of description where we build up the required
behaviour as a set of arithmetic and logical operations on data items.
[Figure: a 4-bit adder built from four 1-bit full adders; stage i adds x_i and y_i to give sum_i, with a carry in at stage 0 and a carry out from stage 3]
A structural description tells us how we would connect together several simpler units
to make a more complicated unit. In this case our simple units are full adders, and the
complicated unit is the 4-bit adder.
If we are going to build our circuits out of basic logic gates, then the above description
isn't quite finished. Although we have broken down our complicated (4-bit) unit into
simpler (1-bit) units, we still haven't shown how to make the full adders out of gates.
By contrast, in the description below everything is resolved into the most basic
building blocks we have (in this case logic gates).
This design is a special case. It is a netlist, a description that consists solely of the
interconnection of basic building blocks that are available to implement the design. A
netlist contains sufficient detail that it is immediately obvious how to build the device.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
ENTITY adder IS
PORT ( x, y: IN SIGNED(3 DOWNTO 0);
sum: OUT SIGNED (3 DOWNTO 0) );
END ENTITY adder;
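A minimal behavioural architecture for this entity might simply use the + operator that numeric_std defines for SIGNED. The architecture name behavioural is an assumption.

```vhdl
-- Sketch: behavioural architecture for the adder entity above.
ARCHITECTURE behavioural OF adder IS
BEGIN
    sum <= x + y;  -- 4-bit result; any carry out of the top bit is discarded
END ARCHITECTURE behavioural;
```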
The above code has all the advantages of a behavioural description. The code is
concise; you can easily see at a glance what function it performs; it contains no
detailed decisions about logic gates. This code would then be given to a synthesis
tool, a computer program whose purpose is to design a circuit that fulfils the required
behaviour.
2.1 Implementing the adder function
Here is a circuit that the synthesis tool might create in order to fulfil our 4-bit adder
requirement:
[Figure: a 4-bit adder built from four 1-bit full adders; stage i adds x_i and y_i to give sum_i, with a carry in at stage 0 and a carry out from stage 3]
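The four-bit adder is built up from four 1-bit full adders, which have the following
behaviour:
x  y  carry in  sum  carry out
0  0  0         0    0
0  0  1         1    0
0  1  0         1    0
0  1  1         0    1
1  0  0         1    0
1  0  1         0    1
1  1  0         0    1
1  1  1         1    1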
There are many ways to implement this. One possible way is shown below.
[Figure: gate-level full adder with inputs x, y and carry in, and outputs sum and carry out]
In the remainder of this lecture, we will look in detail at the full adder circuit, and use
it to illustrate the features of dataflow VHDL. In the next lecture we will look at how
to connect together the full adders and create a structural 4-bit adder. We will also
look at how to feed the 4-bit adder circuit into a simulator.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY fulladd IS
PORT ( x, y, cin: IN STD_LOGIC;
sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;
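One direct way to describe the behaviour is to write out a product term for every input combination in which an output is '1'. The sketch below takes that approach; the architecture name truthtable is an assumption.

```vhdl
-- Sketch: full adder written directly from its truth table.
ARCHITECTURE truthtable OF fulladd IS
BEGIN
    -- sum is '1' when an odd number of inputs are '1'
    sum  <= (NOT x AND NOT y AND cin) OR
            (NOT x AND y AND NOT cin) OR
            (x AND NOT y AND NOT cin) OR
            (x AND y AND cin);
    -- cout is '1' when at least two inputs are '1'
    cout <= (NOT x AND y AND cin) OR
            (x AND NOT y AND cin) OR
            (x AND y AND NOT cin) OR
            (x AND y AND cin);
END ARCHITECTURE truthtable;
```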
This is easy and obvious, but also tedious. There are many neater ways to describe the
behaviour. We could take inspiration from the gate level design of the full adder, and
write this
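A sketch of such a description, using two XOR operations, three AND operations and an OR of the three AND terms. The architecture name simple is an assumption.

```vhdl
-- Sketch: full adder described with logical operators.
ARCHITECTURE simple OF fulladd IS
BEGIN
    sum  <= x XOR y XOR cin;
    cout <= (x AND y) OR (x AND cin) OR (y AND cin);
END ARCHITECTURE simple;
```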
This is much neater and nicer, but requires us to think a bit harder about how the
outputs relate to the inputs.
It’s important to realise that as far as a synthesis tool is concerned, both descriptions
are the same thing. They simply say how the outputs relate to the inputs. The second
architecture is not ordering the synthesis tool to use two XOR gates, 3 AND gates and
an OR gate. It’s simply a shorthand for saying how the output relates to the inputs.
The synthesis tool is free to do whatever it wants to find a circuit that has the same
input-output relation.
3.1 Local signals
Now let’s look at a slight modification of our description.
[Figure: the full adder circuit with inputs x, y and cin, outputs sum and cout, and internal nodes labelled n1, n2, n3 and n4]
We have given names to the internal nodes of the circuit (n1, n2, n3, n4). Once we
have given them names, we are free to use them in our description. So here is a
slightly different description
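A sketch of the description with local signals. The architecture name dataflow matches the name used when the full adder is instantiated later in these notes; which AND term drives which of n2, n3 and n4 is an assumption.

```vhdl
-- Sketch: full adder using named internal nodes.
ARCHITECTURE dataflow OF fulladd IS
    SIGNAL n1, n2, n3, n4: STD_LOGIC;  -- local signals, declared before BEGIN
BEGIN
    n1   <= x XOR y;
    sum  <= n1 XOR cin;
    n2   <= x AND y;
    n3   <= x AND cin;
    n4   <= y AND cin;
    cout <= n2 OR n3 OR n4;
END ARCHITECTURE dataflow;
```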
This is basically the same as the simple architecture of fulladd, but this time we have
used the local signals n1, n2, n3 and n4 as part of the description. In order to use the
names, we have to declare that they exist, that they are signals, and that they carry
logic values (e.g. ‘1’, ‘0’, ‘X’ and ‘U’) which means that they are of type
STD_LOGIC. The declaration of local signals takes place between the
ARCHITECTURE statement and the first BEGIN.
work.fulladd(dataflow)
The name is constructed from the library name followed by a point, then the entity
name, then the architecture name.
4.1 Placing library components into a design
We can now build up a 4-bit adder structurally as follows:
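[Figure: four full adders in a chain; stage i takes x_i, y_i and a carry input and produces sum_i; cin feeds stage 0, wires carry 1, carry 2 and carry 3 link successive stages, and stage 3 drives cout]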
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY adder IS
PORT ( x, y: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
cin: IN STD_LOGIC;
sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
cout: OUT STD_LOGIC);
END ENTITY adder;
ARCHITECTURE structural OF adder IS
SIGNAL carry: STD_LOGIC_VECTOR(4 DOWNTO 0);
BEGIN
g0: ENTITY work.fulladd(dataflow)
PORT MAP (x(0),y(0),cin,sum(0),carry(1));
g1: ENTITY work.fulladd(dataflow)
PORT MAP (x(1),y(1),carry(1),sum(1),carry(2));
g2: ENTITY work.fulladd(dataflow)
PORT MAP (x(2),y(2),carry(2),sum(2),carry(3));
g3: ENTITY work.fulladd(dataflow)
PORT MAP (x(3),y(3),carry(3),sum(3),cout);
END ARCHITECTURE structural;
The names appearing in the port maps are the names of the wires. Wherever the same
name occurs in the output list of one component and in the input list of another, that
means that there is a wire connecting these two components. So, for example,
carry(2) is a wire connecting the output of g1 to the input of g2.
1 Strictly speaking, g1 is a statement label, but you can think of it as just providing a name for the gate.
4.2 Positional association
How does VHDL know which of the wires I am connecting to c1 are inputs and
which are outputs? If we compare the instantiation
c0: entity work.fulladd(dataflow)
PORT MAP (x(0),y(0),cin,sum(0),carry(1));
ENTITY fulladd IS
PORT ( x, y, cin: IN STD_LOGIC; sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;
we see that the first three ports in the entity declaration are inputs and the last two
are outputs. So the first three signals in the instantiation, x(0), y(0) and cin, will be
attached to the inputs x, y and cin. Similarly, sum(0) will be connected to sum and
carry(1) will be connected to cout. This is called positional association.
Alternatively, each connection can name the port explicitly, in the form port => signal.
This is called named association. With named association, the order doesn’t matter, so
you could jumble up the order of the signals in the instantiation.
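As a minimal sketch of named association with the order jumbled up, consider the following. The wrapper entity named_demo and its port names (a, b, c_in, s, c_out) are hypothetical, and the dataflow body of fulladd is assumed to match the full adder described earlier.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Full adder entity, as declared in the text.
ENTITY fulladd IS
  PORT ( x, y, cin: IN STD_LOGIC; sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;

-- Assumed dataflow body for the full adder.
ARCHITECTURE dataflow OF fulladd IS
BEGIN
  sum  <= x XOR y XOR cin;
  cout <= (x AND y) OR (cin AND x) OR (y AND cin);
END ARCHITECTURE dataflow;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical wrapper, used only to show the port map.
ENTITY named_demo IS
  PORT ( a, b, c_in: IN STD_LOGIC; s, c_out: OUT STD_LOGIC);
END ENTITY named_demo;

ARCHITECTURE structural OF named_demo IS
BEGIN
  -- Each port is named, so the connections may be listed in any order.
  g0: ENTITY work.fulladd(dataflow)
    PORT MAP (cout => c_out, sum => s, cin => c_in, y => b, x => a);
END ARCHITECTURE structural;
```

The port name is always on the left of the arrow and the local signal on the right.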
5 Summary
A behavioural description says what a design should do. Dataflow is a type of
behavioural description that relates the outputs to inputs using logical or arithmetic
assignments.
A structural description says how we construct a design from the composition of
simpler units. A structural description can be
Hierarchic: made up from simpler units, which themselves then need to be
designed
Netlist: made up from fundamental building blocks (e.g. logic gates)
Unit 2.2
VHDL Simulation
A simulator is a software tool that takes a proposed design (which could be
behavioural or structural or a mixture of both) and predicts what outputs would result
from a given set of input transitions. Simulation is one of the main methods of verifying
that a proposed design has the required behaviour.
Let’s look again at the full adder circuit, and for the sake of clarity, we will now give
names (g1…g6) to the gates.
[Figure: full adder circuit with gates g1 to g6. Inputs x, y and cin; internal nodes n1, n2, n3 and n4; outputs sum and cout.]
Imagine that the signals x, y and cin are initially at zero. Looking through the circuit,
we can see that n1, n2, n3, n4, sum and cout will all be at zero.
Now imagine that x changes its value from 0 to 1. Let’s think through what happens
next:
x is the input to three gates: g1, g3 and g4. These gates are potentially affected by
the change, so we need to re-compute their outputs n1, n2, n3.
We also know that gates g2, g5 and g6, which don’t have x as an input, can’t be
affected by this change, so there is no point in re-computing their outputs.
The new value of n1 is 1 (i.e. it changed)
The new value of n2 is 0 (i.e. it is unchanged)
The new value of n3 is 0 (i.e. it is unchanged)
n1 just changed, which means that any gate that has n1 as an input (i.e. g2) needs
to have its output (sum) re-computed.
n2 and n3 didn’t change, so we don’t need to bother to examine any consequences
in gate g6, which has n2 and n3 as inputs.
The new value of sum is 1.
There are no more gates whose inputs have changed, so we can stop analysing the
circuit now.
ARCHITECTURE number3 OF fulladd IS
SIGNAL n1, n2, n3, n4: STD_LOGIC;
BEGIN
1 n1 <= x XOR y;
2 sum <= cin XOR n1;
3 n2 <= x AND y;
4 n3 <= cin AND x;
5 n4 <= y AND cin;
6 cout <= n2 OR n3 OR n4;
END ARCHITECTURE number3;
All statements 1-6 are scanned simultaneously, waiting for a signal on the right hand
side (RHS) to change. In the jargon of VHDL, a change to a signal is called an event.
A VHDL simulation proceeds by manipulating an event queue. If we assume that all
signals are initially at 0, then the event queue initially looks like this:
Time = 0
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  0    0    0    0    0    0    0    0    0
Next value:     1
Event time:     10
It has a list of the present value for each signal, any new value that has been scheduled
to take place in future, and the time at which the signal must assume this new value.
In this case the next event is that x will transition from ‘0’ to ‘1’ at time 10 ns.
Once the event queue is set up, the simulator proceeds by looking down the event
queue to find the time of the next pending event. It then jumps forward to the time of
the next event (10 ns), giving x its new value. An event has just occurred on signal x.
This triggers execution of all of the statements that have x on the RHS (statements 1,
3 and 4). Each of these computes a new value for the signal on its left hand side.
If the new value is different from the old value, it is placed on the event queue. The
queue now looks like this (δ denotes one delta delay).
Time = 10
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  1    0    0    0    0    0    0    0    0
Next value:                    1
Event time:                    10+δ
The simulator now looks down the event queue to find the next scheduled event.
There is only one item on the queue, the 0 to 1 transition on n1 at time 10+δ. The time
pointer is incremented to 10+δ and n1 takes its new value. Because an event has
occurred on n1, any statement with n1 on the RHS (statement 2) is triggered.
If the new value is different from the old value, it is placed on the event queue. The
queue now looks like this.
Time = 10+δ
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  1    0    0    1    0    0    0    0    0
Next value:                                        1
Event time:                                        10+2δ
The simulator now looks down the event queue to find the time of the next scheduled
event, i.e. 10+2δ. The time pointer is incremented to 10+2δ and sum takes its new
value.
Time = 10+2δ
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  1    0    0    1    0    0    0    1    0
Next value:
Event time:
There are no statements with sum on the RHS, so no further statements are triggered.
The simulator now looks down the queue to find the next scheduled event. There are
none, so simulation terminates.
3 Concurrent processing
Now we come to a very important point. Consider these two descriptions of the full
adder:
Now, when a or b changes, the value of c will be re-computed, but c will not get its
new value until 10 ns after the change in the input. VHDL knows about the following
units of time: fs, ps, ns, us, ms, sec, min and hr.
Imagine that the signals x, y and cin are initially at zero, so n1, n2, n3, n4, sum and
cout are also initially at zero. At time 10 ns x will go to one. The event queue initially
looks like this:
Time = 0
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  0    0    0    0    0    0    0    0    0
Next value:     1
Event time:     10
Once the event queue is set up, the simulator looks down the event queue to find the
time of the next scheduled event. It then jumps forward to the time of the next event
(10 ns), giving x its new value. An event has just occurred on signal x. This triggers
execution of all of the statements that have x on the RHS.
If the new value is different from the old value, it is placed on the event queue. The
queue now looks like this.
Time = 10
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  1    0    0    0    0    0    0    0    0
Next value:                    1
Event time:                    20
The simulator now looks down the event queue to find the next scheduled event.
There is only one item on the queue, the 0 to 1 transition on n1 at time 20. The time is
incremented to 20 and n1 takes its new value. Because an event has occurred on n1,
any statement with n1 on the RHS is triggered.
If the new value is different from the old value, it is placed on the event queue. The
queue now looks like this.
Time = 20
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  1    0    0    1    0    0    0    0    0
Next value:                                        1
Event time:                                        30
The simulator now looks down the event queue to find the time of the next scheduled
event, i.e. 30. The time pointer is incremented to 30 and sum takes its new value.
Time = 30
Signal name:    x    y    cin  n1   n2   n3   n4   sum  cout
Present value:  1    0    0    1    0    0    0    1    0
Next value:
Event time:
There are no statements with sum on the RHS, so no further statements are triggered.
The queue is now empty, so simulation terminates.
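The walk-through above is consistent with a dataflow description in which every assignment carries a 10 ns delay. The sketch below is one such description; the architecture name delayed and the choice of the same delay on every statement are assumptions made for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY fulladd IS
  PORT ( x, y, cin: IN STD_LOGIC; sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;

-- Hypothetical architecture: each statement schedules its new value
-- 10 ns after the event that triggered it.
ARCHITECTURE delayed OF fulladd IS
  SIGNAL n1, n2, n3, n4: STD_LOGIC;
BEGIN
  n1   <= x XOR y        AFTER 10 NS;
  sum  <= cin XOR n1     AFTER 10 NS;
  n2   <= x AND y        AFTER 10 NS;
  n3   <= cin AND x      AFTER 10 NS;
  n4   <= y AND cin      AFTER 10 NS;
  cout <= n2 OR n3 OR n4 AFTER 10 NS;
END ARCHITECTURE delayed;
```

With x rising at 10 ns, n1 is scheduled for 20 ns and sum for 30 ns, which matches the event times in the queues above.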
5 Simulation of structural VHDL
For the sake of clarity, let’s look again at the structural description, and highlight
which of the signals are inputs:
[Figure: the 4-bit structural adder again. Inputs x(0) to x(3), y(0) to y(3) and cin; outputs sum(0) to sum(3) and cout; internal wires carry(1), carry(2) and carry(3).]
6 Summary
Simulation advances through time updating signal values according to assignments.
Dataflow VHDL consists of a series of Boolean/arithmetic assignment statements.
These statements are concurrent: all are active at the same time; a statement is
triggered to re-evaluate its left-hand side value when any right-hand side value
changes. Structural VHDL consists of instantiations of library elements, which
operate concurrently. When an input to an instance changes, the new outputs are
evaluated.
Unit 2.3
VHDL Processes and Test Benches
In this lecture we will look at how to write blocks of VHDL that are interpreted
sequentially (as opposed to the concurrent behaviour that we have seen so far). This is
done by using a VHDL process. Sequential VHDL has many applications, but in this
lecture we will illustrate its use in setting up simulations that can be used to test out our
designs before we feed them to a synthesis tool.
PROCESS
BEGIN
Statement 1;
Statement 2;
Statement 3;
END PROCESS;
When the PROCESS executes, it runs each statement sequentially, i.e. Statement 1 first,
Statement 2 second and so on. When the process reaches the END PROCESS statement, it
wraps back round to the BEGIN and starts all over again. Because of this behaviour, the
process shown above would in fact be an infinite loop running in zero time, which will
never be useful. So we need to give additional information to a PROCESS to tell it when it
should run and when it should suspend its execution. One way to do this is by means of
a WAIT statement.
PROCESS
BEGIN
clock <= '1';
WAIT FOR 10 NS;
clock <= '0';
WAIT FOR 10 NS;
END PROCESS;
When simulation is carried out, the process starts running immediately at time 0. When
it gets to the WAIT statement, it is suspended. After 10 ns of simulation time has gone
by, the process resumes execution until it hits the next WAIT statement. After another 10
ns has elapsed, the process resumes, reaches the END PROCESS, wraps back round to the
BEGIN and continues execution. The resulting behaviour is as follows:
The process resumes every 10 ns, and the clock signal toggles between 0 and 1 each time it does so.
A process can be used anywhere that it would be legitimate to use a line of concurrent
code. If we use multiple processes within an architecture, then the processes operate
concurrently with one another. The processes also operate concurrently with any lines
of concurrent code within the architecture.
The sensitivity list is a list of signals. The way this works is as follows:
The process waits until a signal in its sensitivity list changes.
When a signal on the sensitivity list changes, the process starts executing. It runs each
of the statements in its body sequentially, i.e. one after the other, first statement 1,
then statement 2, and so on.
Suppose, for example, that we want to describe our full adder using a process:
[Figure: full adder block with inputs x, y and carry in, and outputs sum and carry out.]
This circuit will need to re-compute its outputs whenever an input changes. So the
sensitivity list should be x, y, cin. The process body will describe how to compute the
new outputs in response to an input change.
Whenever x, y or cin change, the process will execute (sequentially) and compute new
values for cout and sum. The process will then suspend until the next time x, y or cin
change.
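A minimal sketch of such a process is given here. The architecture name sequential and the particular Boolean expressions are assumptions; any correct full adder equations could be used in the body.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY fulladd IS
  PORT ( x, y, cin: IN STD_LOGIC; sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;

ARCHITECTURE sequential OF fulladd IS
BEGIN
  -- Sensitivity list: the process runs whenever x, y or cin changes.
  PROCESS (x, y, cin)
  BEGIN
    sum  <= x XOR y XOR cin;
    cout <= (x AND y) OR (cin AND x) OR (y AND cin);
  END PROCESS;  -- suspends here until the next change on x, y or cin
END ARCHITECTURE sequential;
```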
Sequential code “flows” from one line to the next, in blocks of code. This means that
we can build up complicated sequences of statements that build up a required behavior
over many lines. This is a very powerful way of doing things, and corresponds fairly
closely to the way that programming languages such as C work.
There are many constructs of a language that do make sense if there is a flow of code
from one line to the next, but don’t make any sense at all if each statement is
independent of its neighbors. So, in sequential VHDL there are many additional features
of the language that we can use that are not available in concurrent VHDL.
Notice that this assumes a sequential flow of control from one statement to the next. So
the IF block can only be used inside a process. Using an IF block outside of a process is
an error and the code will not compile.
In concurrent code, each line stands alone and is triggered into life by a change on its
RHS. So in order to achieve conditional assignment in a piece of concurrent code, we
need a version of the IF statement that bundles all the functionality into one (possibly
quite long) line of code. This is the WHEN statement.
Using a WHEN statement inside of a process is an error and the code will not compile7.
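As an illustration of a concurrent conditional assignment, here is a small sketch. The entity sel_when and its port names are hypothetical, chosen only to keep the example self-contained.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity used only for this illustration.
ENTITY sel_when IS
  PORT ( a, b, x, y: IN STD_LOGIC; z: OUT STD_LOGIC);
END ENTITY sel_when;

ARCHITECTURE concurrent OF sel_when IS
BEGIN
  -- One concurrent statement: triggered by a change on a, b, x or y.
  z <= a WHEN x = y ELSE b;
END ARCHITECTURE concurrent;
```

The whole condition and both alternatives sit in a single statement ending in one semicolon, which is what allows it to stand alone in concurrent code.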
6 Notice that ELSIF (one word) is not the same thing as ELSE IF (two words)
7 This annoyed users so much that a change was made in VHDL-2008 so that WHEN statements can be
used inside processes. However, most tools default to VHDL-93, so to make a WHEN statement work
inside a process you would need to change the compiler option to VHDL-2008.
4.2 Sequential and concurrent selection
In order to illustrate the selection operator, consider the example of a 4-input
multiplexer. The output y is connected to one of the four data inputs according to the
value applied at input address. If address is “00” then data(0) is selected through to the
output y; if address is “01” then data(1) is selected, and so on.
[Figure: 4-to-1 multiplexer (MUX) with data inputs data(0) to data(3), select inputs address(0) and address(1), and output y.]
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY mux4to1 IS
PORT ( address: IN STD_LOGIC_VECTOR(1 DOWNTO 0);
data: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
y: out STD_LOGIC);
END mux4to1;
The operation of selecting one of the data lines to the output y depending on the value of
address is accomplished in concurrent VHDL using a SELECT statement:
The OTHERS choice catches all other values for address that do not match any of the
values explicitly listed. It is clear from the fact that there is only one semicolon that
everything from WITH through OTHERS counts as one VHDL statement. This
statement will be executed whenever one of the address or data signals changes its
value.
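A sketch of a selected assignment for this multiplexer might look as follows. The architecture name concurrent and the choice of 'X' as the value for the OTHERS case are assumptions; another common choice is to fold OTHERS into the last data input.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY mux4to1 IS
  PORT ( address: IN STD_LOGIC_VECTOR(1 DOWNTO 0);
         data: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         y: OUT STD_LOGIC);
END ENTITY mux4to1;

ARCHITECTURE concurrent OF mux4to1 IS
BEGIN
  -- A single statement: note the commas and the one semicolon at the end.
  WITH address SELECT
    y <= data(0) WHEN "00",
         data(1) WHEN "01",
         data(2) WHEN "10",
         data(3) WHEN "11",
         'X'     WHEN OTHERS;  -- assumed value for 'U', 'X', etc. on address
END ARCHITECTURE concurrent;
```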
In sequential code, the function is achieved using a CASE statement:
The CASE block is spread across several statements. These are executed (sequentially)
whenever address or data (the signals in the sensitivity list of the process) change their
value. The execution of the process computes a new value for y.
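The sequential version can be sketched like this. The architecture name sequential and the 'X' assigned in the OTHERS branch are assumptions made for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY mux4to1 IS
  PORT ( address: IN STD_LOGIC_VECTOR(1 DOWNTO 0);
         data: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         y: OUT STD_LOGIC);
END ENTITY mux4to1;

ARCHITECTURE sequential OF mux4to1 IS
BEGIN
  -- Runs whenever address or data changes.
  PROCESS (address, data)
  BEGIN
    CASE address IS
      WHEN "00"   => y <= data(0);
      WHEN "01"   => y <= data(1);
      WHEN "10"   => y <= data(2);
      WHEN "11"   => y <= data(3);
      WHEN OTHERS => y <= 'X';  -- assumed value for non-binary addresses
    END CASE;
  END PROCESS;
END ARCHITECTURE sequential;
```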
Both the CASE and the SELECT statement must exhaustively list all possible values for
the address. This is facilitated by using the OTHERS choice, to indicate everything that
has not been explicitly listed. You may wonder why we couldn’t just write this:
CASE address IS
WHEN "11" => y <= data(3);
WHEN "10" => y <= data(2);
WHEN "01" => y <= data(1);
WHEN "00" => y <= data(0);
END CASE;
Wouldn’t this exhaustively list all possible cases? The answer is no, because address is
declared as a STD_LOGIC_VECTOR. Its bits can take values not only of ‘0’ or ‘1’ but also
‘U’, ‘X’ etc.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_signed.ALL;
ENTITY adder IS
PORT ( x, y: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY adder;
We can view this as the input to a synthesis tool. However, before we synthesize our
code we want to see if it is correct. We do this through simulation: give it some inputs
and see whether the outputs behave as expected. In order to do this we need to create a
VHDL test bench.
5.1 Test bench for our adder example
A test bench is not intended to be fed to a synthesis tool; it is simply a way of applying
some test inputs to our design and observing what the outputs do. Once we are sure that
our design behaves as expected, then we will feed the design (but not the test bench) to
the synthesis tool. Because a test bench will not be synthesized, we can be much more
carefree in the features of the VHDL language that we use.
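[Figure: the test bench enclosing the adder. Test bench signals in1 and in2 drive the adder inputs x and y; the adder output sum drives the test bench signal output.]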
The test bench is a simulation of the world around our design. It will include
representations of the signal generator that will apply test inputs to our design (in1 and
in2). It will also include representations of the logic analyzer that will capture the
outputs of our design (output) and check that they are OK. Here is the ENTITY
declaration for the test bench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY testbench IS
END ENTITY testbench;
The ENTITY declaration for the test bench may look slightly odd, since it contains no
PORT clause. This is because it has no inputs or outputs. In order to fully describe a
simulation in VHDL, it is necessary that the top level of our design has no inputs or
outputs. (If it did have inputs and outputs, then we would need to think about some
bigger system enclosing the design that was able to supply the required inputs and
outputs.) This pattern is normally recognized by design tools as an entity that must not
be synthesized, but can be simulated.
ARCHITECTURE tb OF testbench IS
SIGNAL in1, in2: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL output: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
The architecture declares local signals (in1, in2) that will be used to apply the test inputs to our design.
Similarly it declares a local signal (output) that will capture the simulated output of our
design. Then we place one copy of our design, which has been compiled to the library
as work.adder(behavioural) inside the test bench, and its inputs and outputs are wired
up to the local signals. Finally, there is a process that sets up a sequence of test inputs. If
we simulate this test bench, then this is the result
We can see that 2+5=7, 7+1=8 and 9+4=D (in hex) so the simulation gives us
reassurance that the design is working correctly.
The last line of the process says WAIT;. This tells the process to suspend forever. If we
didn’t have that WAIT statement, then the process would wrap round to the beginning
and start the sequence all over again, repeating forever.
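Putting these pieces together, a complete test bench could be sketched as below. The three input pairs are the ones discussed in the text (2+5, 7+1, 9+4); the 10 ns spacing, the instance label dut, and the use of ieee.numeric_std inside the adder body (rather than std_logic_signed) are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY adder IS
  PORT ( x, y: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY adder;

-- Assumed behavioural body for the design under test.
ARCHITECTURE behavioural OF adder IS
BEGIN
  sum <= STD_LOGIC_VECTOR(UNSIGNED(x) + UNSIGNED(y));
END ARCHITECTURE behavioural;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY testbench IS
END ENTITY testbench;

ARCHITECTURE tb OF testbench IS
  SIGNAL in1, in2: STD_LOGIC_VECTOR(3 DOWNTO 0);
  SIGNAL output:   STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  -- One copy of the design, wired to the local signals.
  dut: ENTITY work.adder(behavioural)
    PORT MAP (x => in1, y => in2, sum => output);

  -- Stimulus process: applies each pair of inputs in turn.
  PROCESS
  BEGIN
    in1 <= "0010"; in2 <= "0101";  -- 2 + 5
    WAIT FOR 10 NS;
    in1 <= "0111"; in2 <= "0001";  -- 7 + 1
    WAIT FOR 10 NS;
    in1 <= "1001"; in2 <= "0100";  -- 9 + 4
    WAIT FOR 10 NS;
    WAIT;                          -- suspend forever
  END PROCESS;
END ARCHITECTURE tb;
```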
Unit 2.4
Synthesized hardware
In this lecture we look at how the main constructs of VHDL are inferred as hardware.
We will go through the main operators of dataflow VHDL and how they transform to
hardware. In order to ensure that your VHDL can map to valid hardware, the
synthesizable subset of VHDL imposes some restrictions on what you can include in
your code.
1 Boolean operators
The Boolean operators have a hardware interpretation that is trivially obvious
VHDL expression   Gate
not a             NOT
a and b           AND
a nand b          NAND
a or b            OR
a nor b           NOR
a xor b           XOR
a xnor b          XNOR
However, some types of hardware do not support all of these gate types. If that is the
case, then a synthesis stage called technology mapping will be used to replace the
desired gates with an arrangement of equivalent functionality that uses only the
resources available in the hardware.
2 Comparison operators
Equality is tested by testing each of the bits of the number using an xnor gate, which
outputs a 1 if the inputs are equal. The xnor outputs are then anded together. Here is a 4-
bit example:
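[Figure: 4-bit equality detector. Each pair a(i), b(i), for i = 0 to 3, feeds an xnor gate; the four xnor outputs are anded together to give a=b.]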
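The same structure can be written directly in dataflow VHDL. This is only a sketch: the entity name equal4, the output name eq and the internal signal same are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical 4-bit equality detector.
ENTITY equal4 IS
  PORT ( a, b: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         eq: OUT STD_LOGIC );
END ENTITY equal4;

ARCHITECTURE gates OF equal4 IS
  SIGNAL same: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  -- One xnor per bit: same(i) is '1' when a(i) and b(i) agree.
  same <= a XNOR b;
  -- AND the four xnor outputs together.
  eq <= same(0) AND same(1) AND same(2) AND same(3);
END ARCHITECTURE gates;
```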
If we want to find the condition a<b?, then we use a subtractor to form a-b.
[Figure: subtractor (sub) with inputs a and b forming a-b; the msb of the result is the output a<b?.]
The msb of the result is the sign bit. If it is 1, then a-b is negative, which means a<b.
This is, of course, identical to the condition b>a.
To test for b<a (or a>b), we swap the inputs to the subtractor.
To test for a<=b, we use the fact that a<=b is the logical complement of a>b.
[Figure: subtractor with inputs b and a forming b-a; the msb of the result is complemented to give a<=b?.]
3 Selection operators
IF statements (and WHEN statements) are implemented by multiplexers. So in concurrent
VHDL the statement
z <= a WHEN x=y ELSE b;
would synthesize as
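[Figure: an equality detector comparing x and y drives the select input of a multiplexer, which passes either a or b to z.]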
Where the equality detector is as described above in section 2.2. The following process
would have the same effect:
PROCESS ( a, b, x, y )
BEGIN
IF x=y THEN
z <= a;
ELSE
z <= b;
END IF;
END PROCESS;
4 Latch inference
Suppose we write an incomplete IF statement (i.e. one that has no ELSE clause)
PROCESS ( a, b, x, y )
BEGIN
IF x=y Then
z <= a;
END IF;
END PROCESS;
If x=y then the outcome is clear, z will be set to the value of a. But what if x is not equal
to y? We haven’t specified a value for z to take. In this circumstance, the behaviour
of VHDL is that z will continue to hold whatever value it had previously. This will
synthesize as
[Figure: an equality detector comparing x and y drives the enable of a latch, which passes a to z.]
If this is the circuit you wanted then that’s fine, however this is often done by accident
and is one of the commonest coding errors in VHDL. If you don’t want a latch to be
inferred, you must assign a value to the output under every possible combination of
inputs.
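One common way to guarantee this is to give the output a default value at the top of the process, so that every path through the process assigns it. The sketch below assumes that b is the intended value when x and y differ; the entity name no_latch is hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity used only for this illustration.
ENTITY no_latch IS
  PORT ( a, b, x, y: IN STD_LOGIC; z: OUT STD_LOGIC);
END ENTITY no_latch;

ARCHITECTURE defaulted OF no_latch IS
BEGIN
  PROCESS (a, b, x, y)
  BEGIN
    z <= b;        -- default (assumed): z is assigned on every run
    IF x = y THEN
      z <= a;      -- overrides the default when x equals y
    END IF;
  END PROCESS;
END ARCHITECTURE defaulted;
```

Because z now receives a value for every combination of inputs, there is nothing for the hardware to remember, and the description is purely combinational.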
[Figure: a 4-bit adder forming c from a and b, built as a chain of full adders with carry in and carry out, followed by four variant 4-bit circuits built from the same chain in which some of the b inputs are tied to 0.]
The subtraction and negation circuits are based on a standard trick to negate a 2s
complement number: we complement each of the bits and then add 1. So, for example,
if we want to know what –6 is represented as a 4-bit binary number we do this:
Form the number +6 in 4-bits, i.e. 0110.
Complement each bit (i.e. replace each 0 by a 1 and each 1 by a 0) to get 1001.
Finally we add 1, which gives us 1010.
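The same two steps can be written as a single dataflow statement. This is a sketch only: the entity name negate4 is hypothetical, and ieee.numeric_std is assumed for the addition.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical 4-bit 2s complementer.
ENTITY negate4 IS
  PORT ( a: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         c: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY negate4;

ARCHITECTURE dataflow OF negate4 IS
BEGIN
  -- Complement each bit (NOT a), then add 1.
  c <= STD_LOGIC_VECTOR(UNSIGNED(NOT a) + 1);
END ARCHITECTURE dataflow;
```

For a = "0110" (+6), NOT a is "1001" and adding 1 gives "1010", the pattern worked out above for -6.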
6 Absolute value: c <= abs(a)
This is accomplished by transforming to
and then building the circuit according to the above rules, using a comparator, a
multiplexer and a 2s complementer.
7 Multiplication: c <= a * b;
This is accomplished using an array multiplier. To understand how this works, consider
an example of (unsigned) long multiplication 7 × 3 = 21:
      0111   Multiplicand = 7
    × 0011   Multiplier = 3
      0111   LSB of multiplier × multiplicand: partial product 0 = 7
+    01110   Bit 1 of multiplier × multiplicand: partial product 1 = 14
+   000000   Bit 2 of multiplier × multiplicand: partial product 2 = 0
+  0000000   MSB of multiplier × multiplicand: partial product 3 = 0
  00010101   Sum of all the partial products = 21
The 1-bit multiply is simply an AND gate. So, for example, partial product 0 is the LSB
of the multiplier ANDed with each bit of the multiplicand and shifted left by 0 bits.
Similarly, PP1 is bit 1 of the multiplier ANDed with the multiplicand and shifted left by
1 bit.
To add the four partial products requires a 4-input adder. This is more easily realised by
a cascade of 2-input adders. So we have one adder to add PP 0 and PP 1. This feeds its
output to another adder which adds this to PP 2. This in turn feeds another adder which
adds in PP 3.
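The partial-product scheme can be sketched in dataflow VHDL as follows. This describes the arithmetic rather than the cell-by-cell array; the entity name mult4, the mask signals m0 to m3 and the use of ieee.numeric_std for the additions are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical 4 x 4 unsigned multiplier.
ENTITY mult4 IS
  PORT ( a, b: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         c: OUT STD_LOGIC_VECTOR(7 DOWNTO 0) );
END ENTITY mult4;

ARCHITECTURE partial_products OF mult4 IS
  SIGNAL m0, m1, m2, m3: STD_LOGIC_VECTOR(3 DOWNTO 0);
  SIGNAL pp0, pp1, pp2, pp3: STD_LOGIC_VECTOR(7 DOWNTO 0);
BEGIN
  -- Copy each multiplier bit across four bits so it can be ANDed
  -- with every bit of the multiplicand a.
  m0 <= (OTHERS => b(0));
  m1 <= (OTHERS => b(1));
  m2 <= (OTHERS => b(2));
  m3 <= (OTHERS => b(3));

  -- Partial product i = (a AND bit i of b), shifted left by i bits.
  pp0 <= "0000" & (a AND m0);
  pp1 <= "000"  & (a AND m1) & "0";
  pp2 <= "00"   & (a AND m2) & "00";
  pp3 <= "0"    & (a AND m3) & "000";

  -- Cascade of 2-input additions: ((PP0 + PP1) + PP2) + PP3.
  c <= STD_LOGIC_VECTOR(((UNSIGNED(pp0) + UNSIGNED(pp1))
                         + UNSIGNED(pp2)) + UNSIGNED(pp3));
END ARCHITECTURE partial_products;
```

With a = "0111" and b = "0011", pp0 is "00000111", pp1 is "00001110" and the other two are zero, so c is "00010101", i.e. 21.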
[Figure: basic multiplier cell with inputs a and b, an input from the right, an output to the left and an output down.]
The basic cell consists of a 1-bit multiplier (realised as an AND gate), combined with a full adder.
These cells are then placed to build up the multiplier. A 4 x 4 multiplier has the
appearance8
8 This is an unsigned multiplier. To correctly handle negative operands (i.e. a signed multiplier) we would
need to make some small modifications.
Note that a multiplier that multiplies an m-bit number by an n-bit number requires n+m
bits to represent the answer.
The speed of this multiplier is limited by carry propagation. In the worst case a
propagating carry may have to pass through 8 stages (in general for an n-bit multiplier,
2n stages, i.e. twice as many as for an n-bit adder). This makes the combinational
multiplier slow.
8 Synthesis optimizations
In the first stage, the synthesis tool will convert the VHDL description to a netlist of
gates, represented as an EDIF file. Before it then proceeds to technology mapping and
physical synthesis, it will normally perform an optimization stage on the netlist to get
rid of un-needed gates. For example, consider the incrementer circuit of section 8.5.1.
Since the input b is known to be 0 for all bit positions, the circuit can be optimized,
giving a 60% saving in logic gates.
[Figure: one stage of the incrementer, with inputs a, 0 and carry in and outputs c and carry out, shown before and after optimization.]
Unit 2.5
Problems with VHDL Synthesis
Our lines of VHDL code represent chunks of hardware that receive inputs and drive
outputs. It is important to have a good grasp of how your VHDL code maps to hardware
in order to avoid a mistake which is commonly made by newcomers to VHDL. This is
the problem of having multiple different chunks of hardware all simultaneously driving
incompatible values onto the same output, causing the output to go to some
unpredictable garbage value.
If we do that, then obviously in our VHDL simulation the signal a will start with an
initial value of 0000. If we don’t give an initial value in the declaration
signal a: std_logic_vector(3 downto 0);
If we then perform an arithmetic operation on this UUUU value, say for example
c <= a+b;
2 Contention
Now suppose we have two lines of code that attempt to write to the same signal:
This error situation looks obvious. However, there are more subtle scenarios where this
same problem can crop up, and these can be harder to spot. We’ll think in detail about
these in the next two sections.
The driver for the signal result is statement 3, i.e. the statement that will compute a new
value for result when appropriate (i.e. whenever num1, num2 or button change value).
This code is correctly constructed and will do what we want. It will build hardware as
shown below:
Each line that has result on its left hand side will become a different lump of hardware.
Both try to drive the output result. Node result has multiple drivers.
The writer hoped that when button=’1’ then result would derive its value from
statement 3, and when button =’0’ then result would derive its value from statement 4.
The plan is that either one or the other assignment to result will be active, so result
should behave sensibly. However, this reasoning is wrong. result is deriving a value
from statement 3 all of the time. Statement 3 is a latch, with button as its enable signal.
When button=’1’, statement 3 drives the value num1+num2 onto result. When button
=’0’, statement 3 drives the previously memorized value onto result. Meanwhile
statement 4 is also driving a value onto result all of the time. When button =’1’ it drives
the value ‘0’; when button =’0’ it drives the previously memorized value. Whatever
values button takes in this scenario, the value of result is guaranteed to be garbage.
Statement 3 tries to drive it to one value and statement 4 simultaneously tries to drive it
to a different value.
The conclusion is that if result appears on the LHS of more than one concurrent
statement, result has multiple drivers and a contention situation exists. Simulating the
code would give result a value of all Xs. A synthesis tool would probably refuse to
compile the code.
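One way to avoid the contention in concurrent code is to fold both cases into a single conditional assignment, so that result has exactly one driver. The sketch below assumes that the intended behaviour is a sum when button is '1' and a difference otherwise; the entity name calc, the 4-bit width and the use of UNSIGNED ports from ieee.numeric_std are also assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical entity: names, widths and types are assumptions.
ENTITY calc IS
  PORT ( button: IN STD_LOGIC;
         num1, num2: IN UNSIGNED(3 DOWNTO 0);
         result: OUT UNSIGNED(3 DOWNTO 0) );
END ENTITY calc;

ARCHITECTURE single_driver OF calc IS
BEGIN
  -- One statement has result on its LHS, so result has one driver.
  result <= num1 + num2 WHEN button = '1' ELSE num1 - num2;
END ARCHITECTURE single_driver;
```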
Within the same process, a signal can appear on the LHS as often as you want, and it
still counts as only one driver. For example:
Each time button, num1 or num2 changes, the process will execute. The process runs
sequentially, calculating a new value for result. The new value of result is applied
after the process has completed. So if several different assignments to result occur
within a single run of the process, the later assignments overwrite the earlier ones,
and only the last value is applied when the process suspends. This is
not a contention situation: node result has only one driver.
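A sketch of a process that assigns to result more than once is shown here. The entity calc, its 4-bit UNSIGNED ports and the choice of a difference as the default value are assumptions made for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical entity: names, widths and types are assumptions.
ENTITY calc IS
  PORT ( button: IN STD_LOGIC;
         num1, num2: IN UNSIGNED(3 DOWNTO 0);
         result: OUT UNSIGNED(3 DOWNTO 0) );
END ENTITY calc;

ARCHITECTURE one_process OF calc IS
BEGIN
  PROCESS (button, num1, num2)
  BEGIN
    result <= num1 - num2;      -- first assignment in this run
    IF button = '1' THEN
      result <= num1 + num2;    -- later assignment overwrites the first
    END IF;
  END PROCESS;                  -- result is updated once, after the run
END ARCHITECTURE one_process;
```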
However, if the same signal appears on the LHS of two different processes, then it has
multiple drivers and contention exists. For example, consider the signal result in the
following code:
The signal result has two drivers and the result would be garbage.
IF condition THEN
do this;
ELSE
do that;
END IF;
An IF block must end with an END IF, and can have an optional ELSE clause. If we
have multiple conditions to check for, we get this (also correct):
IF button='1' THEN
result <= num1+num2;
ELSIF button='0' THEN
result <= num1-num2;
END IF;
The ELSIF clause is part of the same IF block. It will only be evaluated if the first
condition (button='1') is false.
A common mistake is to write the IF block with multiple conditions to check like this:
IF button='1' THEN
result <= num1+num2;
ELSE IF button='0' THEN
result <= num1-num2;
END IF;
The mistake is that ELSIF is not the same thing as ELSE IF. ELSIF is part of the existing IF
block, whereas ELSE IF starts a completely new IF block nested inside the other IF block.
We now have two IF blocks, so we need two END IF statements, and the code will not
compile if we just have one END IF.
6 Synthesizable VHDL
VHDL is a powerful language and enables us to express a vast range of possibilities in
our code. However some of these are meaningless in hardware and are useful only for
simulation. There is a standard subset of the VHDL language, defined in IEEE standard
1076.6, which states which features of the language can be used with confidence that all
synthesis tools will interpret them consistently and correctly. The reason why some
features are forbidden or ignored in VHDL-for-synthesis is usually because the features
have no reasonable counterpart in real hardware. Here are some examples:
The construct WAIT FOR is ignored by synthesis tools. The normal use of the WAIT
FOR construct is to set up the timing of test vectors in a simulation test bench.
The AFTER clause is ignored for a similar reason. It is not possible to synthesize a
gate that has an exact delay.
Initial values assigned to signals are normally ignored. Instead set and reset signals
must be used to initialize flip-flops. However, if your design targets an FPGA and
the signal will synthesize as a register, then the initial value will be used. This is due
to the peculiarities of the physical structure and programming method of FPGAs.
When a synthesis tool processes code that cannot be synthesized, there are two possible
outcomes:
For major problems, an error message is issued and synthesis is aborted
For minor problems, the offending code is deleted, and synthesis then continues
If your design depends critically on the use of one of the features of VHDL that
simulates correctly but is thrown away by the synthesis tool, then your synthesized
design will not work even though the simulations were fine.
Unit 3.1
Register Transfer Level VHDL (1)
In this lecture, we will look at how to describe sequential logic, logic whose operation is
synchronized to the edges of a clock signal.
[Figure: D-type flip-flop symbol with inputs D and Clk and output Q.]
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
ENTITY dff IS
PORT ( d, clk : IN STD_LOGIC;
q : OUT STD_LOGIC);
END ENTITY dff;
The behaviour of this device is as follows. When the clock is stable, Q simply holds its
value constant. When a rising clock edge occurs, the output Q takes on the value that D
has at the moment when the clock edge occurred. It then holds that value constant until
the next rising clock edge occurs, at which time it updates itself again.
There is a small delay between the occurrence of the clock edge and the appearance of
the new output. This is called the clock-to-q delay of the flip flop.
[Figure: timing diagram showing the clock-to-q delay Tclk-q after an edge of clk.]
Here is an architecture that describes the behaviour of the D-type flip-flop:
Whenever clk changes its value, the process will run. However, clk might have changed
due to a falling edge of the clock (which should not trigger an update to q) so we need
to insert an IF statement which causes q to be updated only on the rising edge of clk.
The rising_edge function is contained in the STD_LOGIC_1164 package, and returns
TRUE when clk has changed from 0 to 1 during the last delta.
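A minimal sketch of such an architecture is given below. The architecture name behavioural is an assumption; the structure (a process sensitive to clk, with the update guarded by rising_edge) follows the description above.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY dff IS
  PORT ( d, clk : IN STD_LOGIC;
         q : OUT STD_LOGIC);
END ENTITY dff;

ARCHITECTURE behavioural OF dff IS
BEGIN
  -- The process runs on every change of clk, rising or falling.
  PROCESS (clk)
  BEGIN
    -- Only a rising edge updates q; otherwise q holds its value.
    IF rising_edge(clk) THEN
      q <= d;
    END IF;
  END PROCESS;
END ARCHITECTURE behavioural;
```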
[Figure: D-type flip-flop with Clk and Reset inputs.]
The Reset signal may be synchronous, but is usually asynchronous. If the Reset is
synchronous, then it is ignored until the rising edge of the clock. When the rising edge
comes, if Reset=’1’ then q goes to ‘0’. If Reset=’0’, then the flip-flop exhibits normal
behaviour, i.e. q<=d. This would be described like this:
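A sketch of the synchronous case follows. The entity name dff_r (a flip-flop entity with an added reset port) and the architecture name synch_reset are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical flip-flop entity with a reset input.
ENTITY dff_r IS
  PORT ( d, clk, reset : IN STD_LOGIC;
         q : OUT STD_LOGIC);
END ENTITY dff_r;

ARCHITECTURE synch_reset OF dff_r IS
BEGIN
  -- reset is not in the sensitivity list: it is only looked at
  -- when a clock edge arrives.
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF reset = '1' THEN
        q <= '0';
      ELSE
        q <= d;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE synch_reset;
```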
If the Reset is asynchronous, then it takes immediate effect, no matter what the clock is
doing. This means that the flip-flop is always sensitive to its Reset input. This would be
described like this:
ARCHITECTURE asynch_reset OF dff IS
BEGIN
PROCESS (clk, reset)
BEGIN
IF ( reset='1' ) THEN
q <= '0';
ELSIF ( rising_edge(clk) ) THEN
q <= d;
END IF;
END PROCESS;
END ARCHITECTURE asynch_reset;
2 Registered logic
[Figure: 4-bit adder built from four full adders, with inputs x(0) to x(3) and y(0) to y(3), outputs sum(0) to sum(3), and carry out.]
When we have a simple set of inputs, this behaves sensibly. Suppose x=”0000” and
y=”0001”. Then after a short delay as the input values move through the gate delays in
the full adder, sum gets a new value of “0001”.
Now suppose we have the values x=”0001” and y=”0111”. After a brief delay sum
becomes “0110”. Then after a short while, fulladder 1 “notices” that fulladder 0 has
generated a carry: its sum output flips to ‘0’, and its carry flips to ‘1’. So sum becomes
“0100”. Then after another short while, fulladder 2 notices that fulladder 1 generated a
carry and sum becomes “0000”. Then after another while fulladder 3 notices the carry
that has just been generated by fulladder 2. Then sum becomes “1000”.
Here is a simulation of how the adder behaves with the two different sets of inputs:
When x=0, y=1, the sum output goes quickly to the correct output, with no
misbehaviour en route. However, when x=1 and y=7, we have a series of outputs which
are garbage, and the sum takes a long time to settle to the correct value.
This effect is called carry ripple. For our little four bit adder, the problem was awkward
enough. But realistic adders are more likely to be 16, 32 or even 64 bits in length, so the
carry may have to ripple down a very long path. This can cause a long delay period
during which the outputs of the adder are garbage.
[Figure: the 4-bit adder with carry in, followed by a 4-bit register that captures sum0..sum3 as sum_reg0..sum_reg3]
A group of D-type flip flops all controlled by the same clock signal is called a register.
This is what the output of the register looks like:
The registered sum will be updated at each rising edge of the clock signal. If the
clock is slow enough that sum has completely settled before the next clock edge arrives,
then the registered sum is a cleaned up version of sum. (But notice that if we ran the
clock too fast, the rising edge would come during the period in which sum is garbage,
and the registered output would therefore be wrong).
One very important point to notice here is that the output is acquiring its value one
clock cycle after the corresponding inputs. So the output goes to 1 the cycle after the
inputs were x=0, y=1. Similarly, the output goes to 8 one cycle after x=1, y=7. This is
often a source of confusion, and you should make sure you understand why this
happens.
2.3 VHDL description of the registered adder
The diagrams of the previous section show a structural description of the adder. Usually
humans don’t produce structural code. They write behavioural code, for example the
single concurrent assignment sum <= x + y, and leave it to the synthesis tool to produce
the structural gate-level description of the design. Now if we are looking for a
registered adder, that description isn’t good
enough. There is nothing to tell the synthesis tool that we want the addition to be
synchronized to a clock signal. An obvious way to describe the adder with registered
output is to instantiate 4 flip-flops at the output of the adder:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;
ENTITY reg_adder IS
PORT ( x, y: IN STD_LOGIC_VECTOR(3 downto 0);
clk: IN STD_LOGIC;
sum_reg: OUT STD_LOGIC_VECTOR(3 downto 0));
END ENTITY reg_adder;
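A sketch of what such a structural architecture might look like, written for the reg_adder entity above. The flip-flop entity dff_simple and the architecture name registered1 are hypothetical names introduced only for this illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical reset-free flip-flop used only in this sketch.
ENTITY dff_simple IS
  PORT ( d, clk: IN STD_LOGIC;
         q: OUT STD_LOGIC );
END ENTITY dff_simple;

ARCHITECTURE behav OF dff_simple IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF ( rising_edge(clk) ) THEN
      q <= d;
    END IF;
  END PROCESS;
END ARCHITECTURE behav;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;

ARCHITECTURE registered1 OF reg_adder IS
  SIGNAL sum: STD_LOGIC_VECTOR ( 3 DOWNTO 0 );
BEGIN
  sum <= x + y;   -- combinational adder
  -- One explicitly instantiated flip-flop per output bit.
  ff0: ENTITY work.dff_simple PORT MAP ( d => sum(0), clk => clk, q => sum_reg(0) );
  ff1: ENTITY work.dff_simple PORT MAP ( d => sum(1), clk => clk, q => sum_reg(1) );
  ff2: ENTITY work.dff_simple PORT MAP ( d => sum(2), clk => clk, q => sum_reg(2) );
  ff3: ENTITY work.dff_simple PORT MAP ( d => sum(3), clk => clk, q => sum_reg(3) );
END ARCHITECTURE registered1;
```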
However, this is pretty painful (even more so for a 32-bit adder). Instantiating lots of
small pieces of hardware is a structural way to do things, and we normally don’t want
humans to operate in this way. This is much better:
ARCHITECTURE registered2 OF reg_adder IS
SIGNAL sum: STD_LOGIC_VECTOR ( 3 DOWNTO 0 );
BEGIN
sum <= x + y;
PROCESS (clk)
BEGIN
IF ( rising_edge(clk ) ) THEN
sum_reg <= sum;
END IF;
END PROCESS;
END ARCHITECTURE registered2;
Now we have made it clear to the synthesis tool that we want a registered version of
sum to be created, synchronized to the rising edge of the clock. It is then up to the
synthesis tool to figure out how to build a circuit that achieves this behaviour.
2.4 Register transfer level (RTL) description
This description is even more concise:
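One way such a description might look, as a sketch for the reg_adder entity of section 2.3 (the architecture name rtl is an assumption). The addition is moved inside the clocked process, so the intermediate signal sum is no longer needed.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;

ARCHITECTURE rtl OF reg_adder IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF ( rising_edge(clk) ) THEN
      sum_reg <= x + y;   -- add and register in a single statement
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```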
If we want a reset signal, that can asynchronously reset the adder output to zero, this is
achieved in a similar fashion:
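A sketch of the asynchronous-reset version. The reg_adder entity shown earlier has no reset port, so this example assumes a hypothetical variant called reg_adder_rst with an extra reset input.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;

-- Hypothetical variant of reg_adder with an added reset input.
ENTITY reg_adder_rst IS
  PORT ( x, y: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         clk, reset: IN STD_LOGIC;
         sum_reg: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY reg_adder_rst;

ARCHITECTURE rtl OF reg_adder_rst IS
BEGIN
  PROCESS (clk, reset)              -- sensitive to reset as well as clk
  BEGIN
    IF ( reset = '1' ) THEN
      sum_reg <= (OTHERS => '0');   -- takes effect regardless of the clock
    ELSIF ( rising_edge(clk) ) THEN
      sum_reg <= x + y;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```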
This style of coding is called register transfer level coding. We are using dataflow
statements, but wrapping them up in processes triggered by the clock (and possibly
some reset or enable signals) in order to make it clear on what clock cycle the outputs
should assume their values. Notice that in RTL we are simply defining the behaviour we
want (in this example sum_reg <= x + y on a rising edge of the clock). We are leaving it
entirely up to the synthesis tool to infer what configuration of registers will be needed to
give us this behaviour.
3 Summary
In this lecture we have looked at how to use clocked processes to build register transfer
level (RTL) descriptions. These are behavioural descriptions that make it clear on which
clock edge the outputs must assume their value. These descriptions can then be
synthesised into sequential circuits using the appropriate configuration of registers by a
synthesis tool.
Unit 3.2
Register Transfer Level VHDL (2)
Register transfer level descriptions build up a description of a complete system inside a
clocked process. The timing of the movement of data is controlled by the way that we
write our VHDL inside the process. It is therefore important to have a good grasp of
how signals inside processes behave, and how they map to hardware. In this lecture we
look at one of the most important rules of RTL: signals that are the target of an
assignment inside a clocked process will synthesise to registers.
1 A chain of registers
Suppose we have a chain of 4-bit registers, called s1, s2, s3. The output of one stage in
the chain is the input of the next:
[Figure: chain of three registers s1, s2, s3 driven by a common Clk, with Input feeding s1]
Now suppose we feed a series of numbers 8, 3, 7, 4 on successive clock cycles into the
input of the chain. Initially we have this:
[Figure: the chain with the value 8 waiting at Input]
[Figure: the chain after the first clock edge, with 8 at the output of s1 and 3 waiting at Input]
The number 8 will be read into the first register at the clock edge. This takes a finite
amount of time, and by the time that this read-in has completed, the second stage is
no longer responding to its input. So the number 8 can proceed no further down the
chain in this clock cycle.
[Figure: the chain after the second clock edge, with 3 at the output of s1, 8 at the output of s2, and 7 waiting at Input]
And so on. On each clock cycle, the numbers shift one stage to the right. This device is
a shift register. The overall behaviour, shifting one stage per clock cycle, occurs because
reading a data item into a register has a finite delay. This is the clock-to-q delay (see
section 12.1), which is modelled in VHDL by the delay of an assignment to a signal.
If reading data into a register entailed no delay, then a single data item would shoot all
the way down the chain at the first clock edge.
You can see that this is the case for s1, s2 and s3 in the example above.
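A minimal sketch of the three-stage chain written as a clocked process. The entity name shift_chain and the port names data_in and data_out are assumptions. Each of s1, s2 and s3 is the target of an assignment inside the clocked process, so each one synthesises to a register.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity; the port names are assumptions.
ENTITY shift_chain IS
  PORT ( data_in: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         clk: IN STD_LOGIC;
         data_out: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY shift_chain;

ARCHITECTURE rtl OF shift_chain IS
  SIGNAL s1, s2, s3: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  PROCESS (clk)
  BEGIN
    IF ( rising_edge(clk) ) THEN
      s1 <= data_in;   -- first register
      s2 <= s1;        -- reads the value s1 held before this clock edge
      s3 <= s2;        -- reads the value s2 held before this clock edge
    END IF;
  END PROCESS;
  data_out <= s3;      -- make the end of the chain visible
END ARCHITECTURE rtl;
```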
4 Order of statements in RTL VHDL
Inside a VHDL process, the statements are executed sequentially. So the ordering of
statements below left feels natural (as stage s1 comes before stage s2 in the chain). The
ordering below right is also correct and would also have exactly the same effect in spite
of the fact that the assignments have been written in the reverse order to what feels
natural.
This is because as the process runs, s1, s2, s3 are not updating their values. They will
only update a time after exit from the process.
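The two orderings can be sketched as two architectures of the same entity. The entity name shift_order and its port names are assumptions. Both architectures describe the same three-stage shift register, because every right-hand side is read before any of the signals is updated.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity; the port names are assumptions.
ENTITY shift_order IS
  PORT ( data_in: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         clk: IN STD_LOGIC;
         data_out: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY shift_order;

ARCHITECTURE natural_order OF shift_order IS
  SIGNAL s1, s2, s3: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  PROCESS (clk)
  BEGIN
    IF ( rising_edge(clk) ) THEN
      s1 <= data_in;
      s2 <= s1;
      s3 <= s2;
    END IF;
  END PROCESS;
  data_out <= s3;
END ARCHITECTURE natural_order;

ARCHITECTURE reversed_order OF shift_order IS
  SIGNAL s1, s2, s3: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  PROCESS (clk)
  BEGIN
    IF ( rising_edge(clk) ) THEN
      s3 <= s2;        -- still sees the old s2
      s2 <= s1;        -- still sees the old s1
      s1 <= data_in;
    END IF;
  END PROCESS;
  data_out <= s3;
END ARCHITECTURE reversed_order;
```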
Unit 3.3
Controlling Register Inference using Signals and Variables
In this lecture, we will see how to control when registers are inserted into our circuits. A
VHDL signal that is the target of an assignment inside a clocked process will cause a register to be inferred.
Sometimes we would like to make an assignment without causing a register. This can be
accomplished using a VHDL variable.
1 Pipelines
In the last lecture, we looked at the shift register. Now suppose we put some blocks of
logic gates that perform some useful function between each register stage:
[Figure: pipeline in which blocks of logic gates alternate with register stages, all driven by a common Clk]
This arrangement is called a pipeline. As we introduce inputs into the pipeline, they
flow along the pipeline at a rate of one stage per clock cycle as follows:
[Figure: snapshots of the pipeline on three successive clock cycles]
And so on.
2 Speed of pipelined datapaths
A system in which data flows in at one end, and out at the other end, after undergoing
some useful processing, is called a datapath. Let’s consider a simple example of a
datapath, an adder tree. Suppose we have four numbers a, b, c, d to add together, but the
adders that we have available to us only accommodate two inputs. This can be solved
by an adder tree arrangement, shown below:
[Figure: adder tree in which a + b produces e, c + d produces f, and e + f is captured in a single output register as sum]
This takes four numbers a, b, c, d and adds them all together, catching them in an output
register.
Suppose we have a timing constraint that says that the circuit must operate from a 100
MHz clock, but the delay of each adder is 6 ns. If a set of inputs is applied at a, b, c, d
then the values at e, f will be garbage for the next 6 ns and the value at sum will be
garbage for the next 12 ns. Thus our sum output will not be valid within the 10 ns
timeframe in which we want to apply a clock edge to sample the output sum.
[Figure: pipelined adder tree, with registers on e and f as well as on sum, all driven by the same clock]
Each of the register stages has stable inputs after only 6 ns, so we can apply a
clock edge to all registers after 10 ns and meet our timing constraint.
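ARCHITECTURE pipelined OF adder IS
SIGNAL e, f: SIGNED ( 31 DOWNTO 0 );
BEGIN
PROCESS (clock)
BEGIN
IF ( rising_edge(clock) ) THEN
e <= a + b; -- first-stage register
f <= c + d; -- first-stage register
sum <= e + f; -- output register, uses the e and f stored on the previous edge
END IF;
END PROCESS;
END ARCHITECTURE pipelined;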
[Figure: the pipelined adder tree inferred from this code, with registers on e, f and sum]
If we want to have a register only at the sum output, then we could write our code like
this:
ARCHITECTURE no_pipeline OF adder IS
BEGIN
PROCESS (clock)
BEGIN
IF ( rising_edge(clock) ) THEN
sum <= a + b + c + d;
END IF;
END PROCESS;
END ARCHITECTURE no_pipeline;
[Figure: the adder tree inferred from this code, with a register only at the sum output]
4 VHDL variables
Sometimes we find that we want to refer to intermediate terms in our code, like
e and f, without causing registers to be inferred by the synthesis tool as occurred above.
This can be achieved by declaring the intermediate terms as variables rather than signals.
Variables are like signals, with the exception that assignments take effect immediately
(as opposed to signals, whose assignments take effect only after a delay). A variable can only exist
inside a process, and must be declared between the PROCESS and the corresponding
BEGIN statement. In order to remind us of the difference in behaviour, signals and
variables use a different assignment operator:
b <= c; -- b is a signal
b := c; -- b is a variable
A variable will assume its new value immediately whenever an assignment occurs (unlike
a signal which receives a new value from an assignment after the process has completed).
This means that using a variable to represent e and f will not cause registers to be inferred.
This is shown below:
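ARCHITECTURE rtl3 OF adder IS
BEGIN
PROCESS (clock)
VARIABLE e, f: SIGNED ( 31 DOWNTO 0 );
BEGIN
IF ( rising_edge(clock) ) THEN
e := a + b; -- takes effect immediately, no register
f := c + d; -- takes effect immediately, no register
sum <= e + f; -- the only register is on sum
END IF;
END PROCESS;
END ARCHITECTURE rtl3;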
[Figure: the adder tree inferred from this code, with a register only at the sum output]
5 Summary
Pipelining introduces registers into intermediate stages in our design so that the overall
combinational delay between register stages is reduced, thus enabling us to use a higher
clock frequency. We can control inference of register stages by the way that we write
our VHDL code. Inside a clocked process, signals that are the target of an assignment cause a register to be
inferred. Variables that are the target of an assignment do not cause a register.
Unit 3.4
Finite State Machine Design using VHDL
Now that we have a good grasp of RTL descriptions, we will put them to work by
looking at two of the most important sequential logic building blocks: finite state
machines (in this lecture) and counters (in the next lecture). We will develop examples
of these for specific problems, but in so doing we will generate adaptable templates that
can be easily adjusted for other situations.
A finite state machine is a machine that can generate sequences and/or respond to input
sequences. We will illustrate this through a simple FSM called seq_detect.
[Figure: block diagram of seq_detect with inputs x and clock and output z]
Its behaviour is that on each rising clock edge, a new value is read at input x. The output z
goes high when the input sequence ends in 101. This can be illustrated with an example
input and output sequence:
x=0011011001010100
z=0000010000010100
Each time we see the input sequence x=101 on successive clock cycles, the output z will
go to 1. Note that in the example above, the x=1 that completed the second occurrence
of 101 is also the first x=1 in a new occurrence.
The device will output z=1 when it reaches state S3. In the other states, it will output
z=0.
We complete the design by defining how the input x causes transitions between these
states. Once we have done this, we can create our VHDL description.
2 The state diagram
The state diagram shows how new values for x cause transitions between states:
[Figure: state diagram with states S0, S1, S2 and S3; transitions are labelled with the input x and each state with the output z]
At turn-on or reset, we start in state S0. If we then receive x=0, we stay in S0 because we
have not seen any part of the required sequence. If, on the other hand, we receive x=1,
we transition to S1 because we have seen the first item in the sequence.
When we are in S1, if we receive x=0 then we transition to S2 because we have now
seen the first two items of the required sequence. If x=1, then we stay in state S1
because this could be the first item in a new 101 sequence. Following this thought
process through, you can see how the rest of the diagram is constructed.
[Figure: block diagram of seq_detect with inputs x and clock and output z]
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY seq_detect IS
PORT ( x, clock: IN STD_LOGIC; z: OUT STD_LOGIC);
END ENTITY seq_detect ;
In order to define the architecture, we need some way to represent the states S0, S1, S2,
S3 in a way suitable for implementing as a digital memory signal. Four distinct states
can be represented as two bits. There are many different choices that we could make,
but we’ll make an arbitrary choice to represent our states as S0=00, S1=01, S2=10,
S3=11. We can now produce our code:
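A sketch of how the architecture might be written for the seq_detect entity above, using the encoding S0=00, S1=01, S2=10, S3=11. The transitions out of S0 and S1 follow the description given; the transitions out of S2 and S3 are filled in here by the same reasoning and should be checked against the state diagram. The architecture name rtl and the use of an initial value in place of a reset input are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ARCHITECTURE rtl OF seq_detect IS
  -- Two-bit state register; the initial value stands in for "turn-on" in S0.
  SIGNAL state: STD_LOGIC_VECTOR(1 DOWNTO 0) := "00";
BEGIN
  PROCESS (clock)
  BEGIN
    IF ( rising_edge(clock) ) THEN
      CASE state IS
        WHEN "00" =>                                    -- S0: nothing seen yet
          IF x = '1' THEN state <= "01"; ELSE state <= "00"; END IF;
        WHEN "01" =>                                    -- S1: seen 1
          IF x = '0' THEN state <= "10"; ELSE state <= "01"; END IF;
        WHEN "10" =>                                    -- S2: seen 10
          IF x = '1' THEN state <= "11"; ELSE state <= "00"; END IF;
        WHEN "11" =>                                    -- S3: seen 101
          IF x = '0' THEN state <= "10"; ELSE state <= "01"; END IF;
        WHEN OTHERS =>                                  -- non-binary values
          state <= "00";
      END CASE;
    END IF;
  END PROCESS;

  -- The output depends only on the current state.
  z <= '1' WHEN state = "11" ELSE '0';
END ARCHITECTURE rtl;
```

From S3, an input of 0 leads to S2 rather than S0 because the final 1 of one 101 followed by a 0 already forms the first two items of the next occurrence. This is what allows overlapping sequences to be detected.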
How to construct state diagrams and state table for synchronous finite state
machines
How to write VHDL descriptions of synchronous finite state machines
How to enforce particular state encodings on finite state machines, and how to leave
the synthesis tool free to choose its preferred encoding
How to deal with unused states, and ensure that they transition to appropriate states
Unit 3.5
Creating a Display Driver using VHDL
The boards that we will use in lab have an 8-digit 7-segment display that can display up
to 8 hexadecimal digits:
In this lecture, we will put our knowledge of RTL and finite state machines to work to
produce a driver for the display. To keep the example small, we will only drive four of
the digits, but the extension to all 8 digits is straightforward. The driver will consist of
an entity that has a 16-bit input called number and outputs control signals SEGMENTS
and DIGITS. The value of number will appear in hex on the display. So, for example, if
number=0001001000110100, then 1234 would appear on the display.
We take a 4-bit unsigned binary number as the input, and produce the required signals
to light up the appropriate LED segments on the display. These LED segments are
active low. They turn on when the corresponding input is low, and turn off when that
input is high. Bearing this in mind (and also bearing in mind that segment 7, the decimal
point, should always be off) we can produce the VHDL description of the driver:
WITH number SELECT
SEGMENTS <= "11000000" WHEN "0000", --0
"11111001" WHEN "0001", --1
"10100100" WHEN "0010", --2
"10110000" WHEN "0011", --3
"10011001" WHEN "0100", --4
"10010010" WHEN "0101", --5
"10000010" WHEN "0110", --6
"11111000" WHEN "0111", --7
"10000000" WHEN "1000", --8
"10010000" WHEN "1001", --9
"10001000" WHEN "1010", --A
"10000011" WHEN "1011", --b
"11000110" WHEN "1100", --C
"10100001" WHEN "1101", --d
"10000110" WHEN "1110", --E
"10001110" WHEN "1111", --F
"11111111" WHEN OTHERS; --Default
In order to do this, we create a state machine with 4 states. The output is the signal
called DIGITS. The 4 states output the appropriate code on DIGITS to light up one
of the digits of the display. The input that drives the transitions is the signal called
change. This goes high once every millisecond.
PROCESS (CLK100MHZ)
BEGIN
IF (rising_edge(CLK100MHZ) ) THEN
CASE state IS
WHEN s0 => IF change='1' THEN state <= s1;
ELSE state <= s0; END IF;
WHEN s1 => IF change ='1' THEN state <= s2;
ELSE state <= s1; END IF;
WHEN s2 => IF change ='1' THEN state <= s3;
ELSE state <= s2; END IF;
WHEN s3 => IF change ='1' THEN state <= s0;
ELSE state <= s3; END IF;
END CASE;
END IF;
END PROCESS;
process(state, number) -- number is read below, so it must be in the sensitivity list
begin
case state is
when s0 => DIGITS<="0111"; slice<=number(15 downto 12);
when s1 => DIGITS<="1011"; slice <=number(11 downto 8);
when s2 => DIGITS<="1101"; slice <=number(7 downto 4);
when others => DIGITS<="1110"; slice<= number(3 downto 0);
end case;
end process;
PROCESS(CLK100MHZ)
BEGIN
IF rising_edge(CLK100MHZ) THEN -- On each clock cycle …
count <= count + 1 ; -- Increment the counter
IF count >= period THEN -- If it’s time to change digit
count <= (others => '0') ; -- Reset the timer
change <= '1';
ELSE
change <='0';
END if;
END IF;
END PROCESS;
How to write VHDL descriptions of finite state machines and counter circuits
How to generate timer signals at regular intervals
Unit 3.6
Memory Design in VHDL
A memory is a piece of hardware that contains a series of storage locations for data.
Different types of memory permit different types of operations on these storage
locations; which location is the target of the current operation is determined by an
address input.
The index of the list runs from 0 to 15 denary (0 to F hex), which needs four unsigned
binary bits (i.e. one hex digit) to represent. A mark from 0 to 100
denary (0 to 64 hex) could be represented in 7 bits, but if we want to allow for the possibility
of negative marks (e.g. due to penalties) we will use 8 signed bits.
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.all;
ENTITY rom IS
PORT ( address: IN UNSIGNED (3 DOWNTO 0);
data : OUT SIGNED (7 DOWNTO 0) );
END ENTITY rom;
A memory is represented in VHDL by an array. The address input is converted to an
integer and then used as an index into the array to pick out a particular value. If, for
example, we give the binary address input value of 0110, data item number 6 (i.e. a value
of 3D) will be transferred to the output. The line of code that achieves this is:
data <= mem_data ( TO_INTEGER (address) );
This memory is a ROM (read only memory), as we can only read existing values from it;
we can’t write new values into it.
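A sketch of an architecture for the rom entity that wraps this line of code. Only the entries at addresses 2, 4 and 6 (43, 2B and 3D hex) are taken from the text; every other entry is a zero placeholder, and the names sketch and mem_array are assumptions. The element width is taken from the data port so that the array always matches the entity.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

ARCHITECTURE sketch OF rom IS
  -- Sixteen locations, each as wide as the data output port.
  TYPE mem_array IS ARRAY (0 TO 15) OF SIGNED(data'RANGE);
  CONSTANT mem_data: mem_array := (
    2      => TO_SIGNED(16#43#, data'LENGTH),
    4      => TO_SIGNED(16#2B#, data'LENGTH),
    6      => TO_SIGNED(16#3D#, data'LENGTH),
    OTHERS => TO_SIGNED(0, data'LENGTH) );   -- placeholder contents
BEGIN
  -- Combinational read: the address selects one element of the array.
  data <= mem_data ( TO_INTEGER (address) );
END ARCHITECTURE sketch;
```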
A memory that had the capability to write a new value from a data input into a memory
location would achieve that like this:
mem_data ( TO_INTEGER (address) ) <= data;
This is a single port memory, as we give it one address, and we receive back one data
item. That means that we can only read one item at a time. If we wanted to read two
different items, we would have to do two read operations one after the other; we can’t
read both all in one go.
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;
ENTITY rom_test IS
END ENTITY rom_test;
ARCHITECTURE tb OF rom_test IS
SIGNAL input_address: UNSIGNED(3 DOWNTO 0);
SIGNAL output_data: SIGNED(7 DOWNTO 0);
BEGIN
The loop index i is implicitly declared to be of integer type by the fact that 0 TO 15 is an
integer range. We do not need to explicitly declare i in our code.
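A sketch of a complete test bench built around such a loop, assuming the rom entity has been compiled into the work library with an 8-bit data output. The entity name rom_test_sketch and the 10 ns spacing between addresses are arbitrary choices made for illustration.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

ENTITY rom_test_sketch IS
END ENTITY rom_test_sketch;

ARCHITECTURE tb OF rom_test_sketch IS
  SIGNAL input_address: UNSIGNED(3 DOWNTO 0);
  SIGNAL output_data: SIGNED(7 DOWNTO 0);
BEGIN
  -- Device under test: the rom entity, assumed to be in library work.
  dut: ENTITY work.rom
    PORT MAP ( address => input_address, data => output_data );

  PROCESS
  BEGIN
    FOR i IN 0 TO 15 LOOP                   -- i is implicitly an integer
      input_address <= TO_UNSIGNED(i, 4);   -- convert to a 4-bit address
      WAIT FOR 10 NS;                       -- arbitrary spacing between reads
    END LOOP;
    WAIT;                                   -- suspend forever
  END PROCESS;
END ARCHITECTURE tb;
```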
It is common for a ROM to be synchronised to a clock. To achieve this, we would need
to add a clock signal to its ENTITY, and replace the body of the architecture with this:
PROCESS(clock)
BEGIN
IF RISING_EDGE(clock) THEN
data <= mem_data ( TO_INTEGER (address) );
END IF;
END PROCESS;
ARCHITECTURE tb OF rom_test IS
SIGNAL input_address: UNSIGNED(3 DOWNTO 0);
SIGNAL output_data: SIGNED(7 DOWNTO 0);
SIGNAL clock: std_logic;
BEGIN
The second process in the testbench loops over values from 0 to 15, advancing by one on
each clock cycle. The loop index is converted to a 4-bit unsigned number and then applied
as the address to the ROM. The final WAIT statement suspends the process forever, thus
preventing the process from wrapping back to the beginning and applying the test inputs
repeatedly.
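A sketch of a two-process test bench of this kind. It assumes a clocked version of the rom entity with an added input port named clock, compiled into the work library; the entity name rom_clk_test and the 10 ns clock period are assumptions.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

ENTITY rom_clk_test IS
END ENTITY rom_clk_test;

ARCHITECTURE tb OF rom_clk_test IS
  SIGNAL input_address: UNSIGNED(3 DOWNTO 0);
  SIGNAL output_data: SIGNED(7 DOWNTO 0);
  SIGNAL clock: STD_LOGIC;
BEGIN
  -- Device under test: assumed clocked rom with a port named clock.
  dut: ENTITY work.rom
    PORT MAP ( clock => clock, address => input_address, data => output_data );

  -- First process: free-running clock with an assumed 10 ns period.
  PROCESS
  BEGIN
    clock <= '0'; WAIT FOR 5 NS;
    clock <= '1'; WAIT FOR 5 NS;
  END PROCESS;

  -- Second process: apply addresses 0 to 15, one per clock cycle.
  PROCESS
  BEGIN
    FOR i IN 0 TO 15 LOOP
      input_address <= TO_UNSIGNED(i, 4);
      WAIT UNTIL rising_edge(clock);
    END LOOP;
    WAIT;   -- suspend forever so the inputs are not applied again
  END PROCESS;
END ARCHITECTURE tb;
```

The clock process never suspends permanently, so a simulation of this sketch has to be run for a fixed length of time rather than until the event queue empties.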
2 Multiport Memory
A multiport memory operates on multiple locations simultaneously. For example, a dual
port ROM would look like this:
In each read cycle, we would apply two addresses and the two indexed data items would
appear at the outputs data1 and data2. So, for example, if we apply the inputs
address1=0010 and address2=0100, we would see at the outputs data1=43 and
data2=2B.
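A sketch of a dual-port ROM along these lines. The entity name dual_rom_sketch is hypothetical, and only the entries at addresses 2 and 4 (43 and 2B hex) come from the text; the remaining contents are zero placeholders.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

-- Hypothetical dual-port ROM: two addresses in, two data items out.
ENTITY dual_rom_sketch IS
  PORT ( address1, address2: IN UNSIGNED (3 DOWNTO 0);
         data1, data2: OUT SIGNED (7 DOWNTO 0) );
END ENTITY dual_rom_sketch;

ARCHITECTURE sketch OF dual_rom_sketch IS
  TYPE mem_array IS ARRAY (0 TO 15) OF SIGNED (7 DOWNTO 0);
  CONSTANT mem_data: mem_array := (
    2      => X"43",
    4      => X"2B",
    OTHERS => X"00" );   -- placeholder contents
BEGIN
  -- Both ports index the same array independently.
  data1 <= mem_data ( TO_INTEGER (address1) );
  data2 <= mem_data ( TO_INTEGER (address2) );
END ARCHITECTURE sketch;
```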
This would cause the sum of the two data items to appear at the result output.
However, this is of limited use as we cannot recycle the result for further arithmetic
processing.
3 Register file
This becomes much more useful if we add a third port (address3, data3) which is
capable of writing data back into the memory array. This results in a type of memory
that is known as a register file.
Suppose we wanted to compute the average student mark. Our first step would be to add
all of the marks together, then we would divide by the total number. We can achieve the
addition of all of the marks in a sequence of steps like this:
Once we have read out the student mark held at location 0 (which is 48) we no longer
need to preserve the value at that location, so in subsequent cycles we use location 0 to
form a subtotal.
This is the basic idea of how a computer program works. In successive cycles, we apply
a series of binary numbers at address1, address2, address3 and opcode to instruct the
hardware as to the function it should perform and where it can find its operand data.
If write_enable=0, then the operation is a read operation. The data_in input is ignored,
and the data item appearing at the location indexed by the address is sent to the data_out
output.
If write_enable=1, then the operation is a write. The value at data_in overwrites the data
held at the location indexed by the address input.
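A sketch of a small single-port RAM with this read/write behaviour. The text does not say whether the memory is clocked; this example assumes a synchronous design with a clock input, 16 locations and 8-bit data, and the entity name ram_sketch is hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

-- Hypothetical synchronous single-port RAM.
ENTITY ram_sketch IS
  PORT ( clock, write_enable: IN STD_LOGIC;
         address: IN UNSIGNED (3 DOWNTO 0);
         data_in: IN SIGNED (7 DOWNTO 0);
         data_out: OUT SIGNED (7 DOWNTO 0) );
END ENTITY ram_sketch;

ARCHITECTURE sketch OF ram_sketch IS
  TYPE mem_array IS ARRAY (0 TO 15) OF SIGNED (7 DOWNTO 0);
  -- A signal rather than a constant, because its contents can change.
  SIGNAL mem_data: mem_array := (OTHERS => (OTHERS => '0'));
BEGIN
  PROCESS (clock)
  BEGIN
    IF RISING_EDGE(clock) THEN
      IF write_enable = '1' THEN
        -- Write: data_in overwrites the addressed location.
        mem_data ( TO_INTEGER (address) ) <= data_in;
      ELSE
        -- Read: data_in is ignored, the addressed item goes to data_out.
        data_out <= mem_data ( TO_INTEGER (address) );
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE sketch;
```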
The arrangement above shows how small RAMs are normally organised. However, when the
data items are many bits wide, the requirement for separate data_in and data_out ports
can become expensive. Large RAMs, for example the main memory in a computer
system, normally have a bidirectional data input/output which connects to a bidirectional
bus:
Unit 4.1
VHDL Simulation (2)
In unit 2.2, we saw a brief introduction to VHDL simulation. In this unit and the next,
we will look in more detail at how simulation works and some points that can be
confusing to new users of VHDL.
So far, we have looked at using simulation for functional verification (does my adder
actually add? does my shifter actually shift?) which is done by applying the appropriate
sequence of 1s and 0s at the inputs, and looking for the required set of 1s and 0s at the
outputs.
The first question is simply answered by substituting the synthesized circuit instead of
the initial VHDL into the test bench, then re-running the simulator. The second question
is answered by getting the synthesis tool to incorporate component delays into the
VHDL description of the synthesized circuit, and then re-running the simulator.
[Figure: blocks of combinational logic separated by registers, which are drawn as dashed lines]
The dashed lines indicate registers. Register transfers take data from left to right one
stage on each clock cycle, passing through blocks of combinational logic. All registers
must run from the same clock, and the shortest permissible clock period is established
by the block with the longest delay.
For this piece of VHDL code, whenever a or b change, the value of c will be re-
computed, but c will not get its new value until 5 ns after the change in the input.
Suppose the inputs a and b are driven initially to ‘0’ and then to ‘1’, a at time 20 ns
and b at time 40 ns.
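A self-contained sketch of this situation that can be run in a simulator. The entity and architecture names are assumptions; the 5 ns delay and the input transitions at 20 ns and 40 ns are those used in the walkthrough that follows.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY xor_delay_tb IS
END ENTITY xor_delay_tb;

ARCHITECTURE sim OF xor_delay_tb IS
  SIGNAL a, b, c: STD_LOGIC;
BEGIN
  -- Gate model: c follows a XOR b, 5 ns after any change on a or b.
  c <= a XOR b AFTER 5 NS;

  -- Stimulus: both inputs start at '0'; a rises at 20 ns, b rises at 40 ns.
  a <= '0', '1' AFTER 20 NS;
  b <= '0', '1' AFTER 40 NS;
END ARCHITECTURE sim;
```

Running this for about 50 ns should show c as 'U' until 5 ns, then '0', rising to '1' at 25 ns and returning to '0' at 45 ns.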
At time 20 ns a=’1’ and b=’0’, an input condition that causes c to become ‘1’. However,
there is a 5 ns gap between the cause and the effect, and the output c does not assume its
new value until time 25 ns.
If we look at the event queue, we can see how the simulator produced the simulation
results. At time 0, the signal c is initialized to ‘U’ and the assignment to c runs. This
computes a new value, but the new value will not be assigned until 5 ns later. The state
of the queue is therefore:
Time = 0
Signal Name:     a    b    c
Present value:   0    0    U
Next value:      1    1    0
Event time:      20   40   5
The simulator time pointer is advanced to the time of the earliest event on the queue, i.e.
5 ns and c receives its new value.
Time = 5
Signal Name:     a    b    c
Present value:   0    0    0
Next value:      1    1    -
Event time:      20   40   -
The signal c does not appear on the right hand side of any assignments, so the change on
c does not cause any further events to be triggered. The simulator time pointer is advanced
to the time of the next event on the queue, i.e. 20 ns and a receives its new value. The
change on a at time 20 ns causes the assignment on c to be executed and a new value of
1 is scheduled for c at time 25 ns.
Time = 20
Signal Name:     a    b    c
Present value:   1    0    0
Next value:      -    1    1
Event time:      -    40   25
The simulator time pointer is advanced to the time of the next event on the queue, i.e. 25
ns and c receives its new value.
Time = 25
Signal Name:     a    b    c
Present value:   1    0    1
Next value:      -    1    -
Event time:      -    40   -
At time 40 ns b receives its new value. The change on b causes the assignment on c to be
executed and this takes effect at time 45 ns.
Time = 40
Signal Name:     a    b    c
Present value:   1    1    1
Next value:      -    -    0
Event time:      -    -    45
Time = 45
Signal Name:     a    b    c
Present value:   1    1    0
Next value:      -    -    -
Event time:      -    -    -
3 Gate delay
To prepare ourselves for the next theme, we will need to take a closer look at how logic
gates operate and what gate delay means. As an example, let’s consider an inverter with
a gate delay of 5 ns:
Our digital signals, which take values of 0 or 1, are an abstraction of what is really
happening in the real world. x is actually a continuous voltage which is interpreted as a 1 if
it is above the logic threshold voltage (shown as a red dashed line) and 0 if the voltage
is below.
Suppose x goes through a 0 to 1 transition. The voltage at x will rise. When x moves
above the threshold, the output y will start to fall but it will take 5 ns to fall as low as
the threshold. During this 5 ns, the digital output y will continue to be in the voltage
range that is digitally interpreted as a 1.
3.1 Inertial delay
Now let’s imagine that x makes a 0 to 1 transition, and then after only 2 ns makes a 1 to
0 transition.
Initially the voltage at x will rise. When x moves above the threshold, the output y will
start to fall but it will take 5 ns to fall below its threshold. Before that can happen, x
starts to fall again which means that y will start to rise. Throughout this whole
period, the digital output y will continue to be in the voltage range that is digitally interpreted as
a 1. The input x went briefly to 1, but the output y did not go to 0.
Imagine that the transitions on the inputs to the XOR gate happen closer together, with
a rising at time 20 and b rising at time 22.
Between 20 and 22, the inputs have values that would make the output want to go to 1.
However, the XOR gate has a delay and cannot respond until 5 ns has elapsed. Before
this 5 ns passes, the inputs change again to a set of values that means that output should
be 0 (the value it already has).
A real device would respond to these inputs by staying constantly at zero. This is called
an inertial effect; the device delay of 5 ns is a measure of how much electrical inertia it
has, and input transients that are briefer than this length of time will have no effect on
the output. This is the default behaviour of VHDL, so if we simulate our gate, it behaves
as follows:
This is detected in VHDL by looking for collisions on the event queue.
Time = 0
Signal Name:     a    b    c
Present value:   0    0    U
Next value:      1    1    0
Event time:      20   22   5
Time = 5
Signal Name:     a    b    c
Present value:   0    0    0
Next value:      1    1    -
Event time:      20   22   -
Time = 20
Signal Name:     a    b    c
Present value:   1    0    0
Next value:      -    1    1
Event time:      -    22   25
Time = 22
Signal Name:     a    b    c
Present value:   1    1    0
Next value:      -    -    1, 0
Event time:      -    -    25, 27
Two events have collided on the event queue. By default, VHDL just throws away the
old event to make way for the new. So the event queue looks like this:
Time = 22
Signal Name:     a    b    c
Present value:   1    1    0
Next value:      -    -    0
Event time:      -    -    27
Note that this can only be used in simulation. It cannot be used in synthesis. This
technique can be useful when you want to assign the input values for a test bench, but
without using a process. So, for example, if we were testing a two-input device and
wanted the inputs to do this:
ARCHITECTURE tb1 OF tb IS
SIGNAL a, b: STD_LOGIC;
BEGIN
-- Set up the test input signals
a <= '0', '1' AFTER 20 NS, '0' AFTER 60 NS;
b <= '0', '1' AFTER 40 NS, '0' AFTER 80 NS;
-- Rest of the test bench goes here
END ARCHITECTURE tb1;
According to the normal rules of VHDL, the assignments will run at the beginning of
simulation (because all concurrent lines of code run at the start of simulation). They will
not run again subsequently (because lines only re-run when a RHS value changes, and
these lines have no changeable signals on their RHS). After the assignments have run
this will be the state of the event queue:
Time = 0
Signal Name:     a            b
Present value:   U            U
Next value:      0, 1, 0      0, 1, 0
Event time:      0, 20, 60    0, 40, 80
Using the comma operator, the multiple transactions co-exist on the queue, and are not
treated in an inertial manner. We could achieve the same behaviour for a and b by using
the following process:
ARCHITECTURE tb2 OF tb IS
SIGNAL a, b: STD_LOGIC;
BEGIN
PROCESS
BEGIN
-- Set up the test input signals
a <= '0';
b <= '0';
WAIT FOR 20 NS;
a <= '1'; -- time 20 ns
WAIT FOR 20 NS;
b <= '1'; -- time 40 ns
WAIT FOR 20 NS;
a <= '0'; -- time 60 ns
WAIT FOR 20 NS;
b <= '0'; -- time 80 ns
WAIT; -- suspend forever
END PROCESS;
-- Rest of the test bench goes here
END ARCHITECTURE tb2;
5 Summary
VHDL simulation proceeds by moving through time in response to events scheduled on
an event queue. As assignments run, they schedule new values for signals to receive in the
future. As signals receive new values, they trigger the execution of further
lines of code.
Unit 4.2
VHDL Simulation (3)
In this unit we will look at how to use concurrent code and how to use processes to give
correct results for combinational logic. In particular, we will see how to construct the
sensitivity list for a process, a matter that can be confusing for users who are new to
VHDL.
BEGIN
c <= a XOR b AFTER 5 NS;
d <= c AFTER 2 NS;
END ARCHITECTURE simple;
We assume that the simulation has been started by some other piece of code that has
assigned initial values of a, b, c, d to 0 and has scheduled a transition on a from 0 to 1 at
time 20.
Time = 0
Signal Name:     a    b    c    d
Present value:   0    0    0    0
Next value:      1    -    -    -
Event time:      20   -    -    -
We jump to the time of the next scheduled transaction, i.e. time 20, and let a take its
new value. This causes the assignment on c to run. That will cause c to transition to 1 at
a time 5 in future:
Time = 20
Signal Name:     a    b    c    d
Present value:   1    0    0    0
Next value:      -    -    1    -
Event time:      -    -    25   -
We jump to the time of the next scheduled transaction, i.e. time 25, and let c take its
new value. This causes the assignment on d to run. That will cause d to transition to 1 at
a time 2 in future:
Time = 25
Signal Name:     a    b    c    d
Present value:   1    0    1    0
Next value:      -    -    -    1
Event time:      -    -    -    27
Finally we jump to the time of the next scheduled transaction, i.e. time 27, and let d take
its new value. This completes the simulation:
Time = 27
Signal Name:     a    b    c    d
Present value:   1    0    1    1
Next value:      -    -    -    -
Event time:      -    -    -    -
The simulator extracts out the waveform from the tables above:
This simulation contained no surprises, but it gives us a foundation for our examination
of how processes simulate.
2 An incorrect process
Now we will try to describe the same logic using a process. The body of the process
computes new values for the outputs c and d. The process is triggered to run whenever a
signal on its sensitivity list changes. A common mistake in constructing the sensitivity
list is to assume that we only need to include the inputs a and b:
BEGIN
PROCESS(a, b) -- This is wrong
BEGIN
c <= a XOR b AFTER 5 NS;
d <= c AFTER 2 NS;
END PROCESS;
END ARCHITECTURE simple;
Once again, we assume that the simulation has been started by some other piece of code
that has initialised a, b, c and d to 0 and has scheduled a transition on a from
0 to 1 at time 20 ns. Let’s see how this simulates:
Time = 0
Signal Name:    a  b  c  d
Present value:  0  0  0  0
Next value:     1
Event time:     20
We jump to the time of the next scheduled transaction, i.e. time 20, and let a take its
new value. This causes the process to run and compute an assignment that will cause c to
transition to 1 5 ns in the future. The assignment on d will also run, but that will
assign the current value of c (i.e. zero), not the future value:
Time = 20
Signal Name:    a  b  c  d
Present value:  1  0  0  0
Next value:           1
Event time:           25
We jump to the time of the next scheduled transaction, i.e. time 25, and let c take its
new value. Because c is not on the sensitivity list, the process does not run again.
This has no further consequences, so the simulation has finished:
Time = 25
Signal Name:    a  b  c  d
Present value:  1  0  1  0
Next value:
Event time:
This is not equivalent to the situation that we simulated in section 1. The signal d did
not follow the value of c.
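The same behaviour can be reproduced in a simulator. The sketch below assumes a hypothetical testbench entity called incorrect_tb with all four signals initialised to '0'. Its assertions pass when d is still 0 long after c has changed, which is exactly the fault described above.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY incorrect_tb IS                      -- hypothetical testbench name
END ENTITY incorrect_tb;

ARCHITECTURE simple OF incorrect_tb IS
  SIGNAL a, b, c, d : STD_LOGIC := '0';     -- all signals start at 0
BEGIN
  PROCESS (a, b)                            -- c is missing from the list
  BEGIN
    c <= a XOR b AFTER 5 NS;
    d <= c AFTER 2 NS;
  END PROCESS;

  stimulus : PROCESS
  BEGIN
    a <= '1' AFTER 20 NS;                   -- transition on a at time 20
    WAIT FOR 40 NS;                         -- well after all activity
    ASSERT c = '1'
      REPORT "c should have followed a XOR b" SEVERITY ERROR;
    ASSERT d = '0'
      REPORT "d changed, which this process should not allow" SEVERITY ERROR;
    REPORT "d is still 0 although c is 1" SEVERITY NOTE;
    WAIT;                                   -- stop the stimulus process
  END PROCESS stimulus;
END ARCHITECTURE simple;
```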
3 A corrected process
We can correct this by including c in the sensitivity list:
BEGIN
PROCESS(a,b,c) -- This is right
BEGIN
c <= a XOR b AFTER 5 NS;
d <= c AFTER 2 NS;
END PROCESS;
END ARCHITECTURE simple;
Time = 0
Signal Name:    a  b  c  d
Present value:  0  0  0  0
Next value:     1
Event time:     20
We jump to the time of the next scheduled transaction, i.e. time 20, and let a take its
new value. This causes the process to run and compute that c should transition to 1
5 ns in the future. The assignment on d will also run, but that will assign the current
value of c (i.e. zero), not the future value:
Time = 20
Signal Name:    a  b  c  d
Present value:  1  0  0  0
Next value:           1
Event time:           25
We jump to the time of the next scheduled transaction, i.e. time 25, and let c take its
new value of 1. This causes the process to run again. It computes an assignment of 1 to c
(which is not a change, so we don’t update the queue) and an assignment of 1 to d, to
happen 2 ns in the future.
Time = 25
Signal Name:    a  b  c  d
Present value:  1  0  1  0
Next value:              1
Event time:              27
Finally we jump to the time of the next scheduled transaction, i.e. time 27, and let d take
its new value. This completes the simulation:
Time = 27
Signal Name:    a  b  c  d
Present value:  1  0  1  1
Next value:
Event time:
The final waveform matches that of section 1: a rises at 20 ns, c at 25 ns and d at 27 ns.
This is correct. Adding node c to the sensitivity list means that when c is assigned a new
value, node d updates.
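As a check, the corrected process can be run against the same stimulus. The sketch below assumes a hypothetical testbench entity called corrected_tb with all four signals initialised to '0'. Here the assertions expect d to follow c, 2 ns later.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY corrected_tb IS                      -- hypothetical testbench name
END ENTITY corrected_tb;

ARCHITECTURE simple OF corrected_tb IS
  SIGNAL a, b, c, d : STD_LOGIC := '0';     -- all signals start at 0
BEGIN
  PROCESS (a, b, c)                         -- c is now on the list
  BEGIN
    c <= a XOR b AFTER 5 NS;
    d <= c AFTER 2 NS;
  END PROCESS;

  stimulus : PROCESS
  BEGIN
    a <= '1' AFTER 20 NS;                   -- transition on a at time 20
    WAIT FOR 26 NS;                         -- just after c changes at 25
    ASSERT c = '1' AND d = '0'
      REPORT "unexpected values at 26 ns" SEVERITY ERROR;
    WAIT FOR 2 NS;                          -- just after d changes at 27
    ASSERT d = '1'
      REPORT "d should be 1 at 28 ns" SEVERITY ERROR;
    WAIT;                                   -- stop the stimulus process
  END PROCESS stimulus;
END ARCHITECTURE simple;
```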
Unit 4.3
Using Processes for Combinational Logic
In this unit, we continue our consideration of the features of VHDL that have tended to
cause students confusion during the assignments. Specifically, we will look at how to
use signals correctly when describing combinational logic with a process.
[Figure: full adder circuit with inputs x, y and cin, internal node n1, and outputs sum and cout]
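A sketch of a process-based description of this circuit with only the inputs on the sensitivity list. The entity name full_adder, its port list and the architecture name incorrect are assumptions made for illustration. The statement numbers in the comments are arranged so that statements 6 and 7 are the assignments to n1 and sum:

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY full_adder IS                        -- hypothetical entity name
  PORT (x, y, cin : IN  STD_LOGIC;
        sum, cout : OUT STD_LOGIC);
END ENTITY full_adder;

ARCHITECTURE incorrect OF full_adder IS                      --1
  SIGNAL n1: STD_LOGIC;                                      --2
BEGIN                                                        --3
  PROCESS (x, y, cin)                                        --4
  BEGIN                                                      --5
    n1 <= x XOR y;                                           --6
    sum <= cin XOR n1;                                       --7
    cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin );   --8
  END PROCESS;                                               --9
END ARCHITECTURE incorrect;                                  --10
```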
The outputs need to be re-calculated whenever one of the inputs changes, so we use x, y
and cin as the sensitivity list for the process. However, the above description is wrong.
It does not behave like the circuit in the diagram, and a synthesis tool would not produce
the desired circuit if it were given this code. The problem is that during the execution
of the process, all of the signals have their values frozen. The new values are applied one
delta after the process finishes running.
Imagine that initially x=‘0’, y=‘0’ and cin=‘0’. As a result, the internal node n1=‘0’. Then
x changes from ‘0’ to ‘1’. The process runs because a signal on its sensitivity list
has just changed. Statement 6 computes a new value of ‘1’ for n1, but this value will not
be applied until after the process has finished. In the meantime, statement 7 is executed
and uses the old value of n1, thus producing a result of ‘0’. The process then finishes; n1
gets its new value of ‘1’ and sum gets its value of ‘0’. By contrast, the circuit in the
diagram would produce a final value of ‘1’ for sum. The VHDL description is therefore
incorrect.
We can remedy the problem with the process by adding the internal node n1 to the
sensitivity list.
SIGNAL n1: STD_LOGIC; --2
BEGIN --3
PROCESS (x, y, cin, n1) --4
BEGIN --5
n1 <= x XOR y; --6
sum <= cin XOR n1; --7
cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin ); --8
END PROCESS; --9
END ARCHITECTURE corrected; --10
In this case, when x changes to ‘1’ the process runs because a signal on its sensitivity
list has changed. As before, n1 is assigned a value of ‘1’ and sum a value of ‘0’. Now,
because n1 has just changed and it appears on the sensitivity list, the process runs
again one delta later. This time statement 7 uses the updated value of n1 and sum gets a
value of ‘1’.
In general, the rule to describe combinational logic using a process is that all inputs and
all internal nodes that receive an assignment must appear on the sensitivity list.
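The rule can be checked with a small self-checking testbench. This is a sketch only: the entity name full_adder, its port list and the testbench name full_adder_tb are assumptions made for illustration. The architecture repeats the corrected process, and the stimulus steps through input changes that would expose a stale value of n1.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY full_adder IS                        -- hypothetical entity name
  PORT (x, y, cin : IN  STD_LOGIC;
        sum, cout : OUT STD_LOGIC);
END ENTITY full_adder;

ARCHITECTURE corrected OF full_adder IS
  SIGNAL n1 : STD_LOGIC;
BEGIN
  PROCESS (x, y, cin, n1)                   -- inputs plus internal node n1
  BEGIN
    n1   <= x XOR y;
    sum  <= cin XOR n1;
    cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin );
  END PROCESS;
END ARCHITECTURE corrected;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY full_adder_tb IS                     -- hypothetical testbench name
END ENTITY full_adder_tb;

ARCHITECTURE test OF full_adder_tb IS
  SIGNAL x, y, cin : STD_LOGIC := '0';
  SIGNAL sum, cout : STD_LOGIC;
BEGIN
  dut : ENTITY work.full_adder(corrected)
    PORT MAP (x => x, y => y, cin => cin, sum => sum, cout => cout);

  stimulus : PROCESS
  BEGIN
    WAIT FOR 10 NS;                         -- 0 + 0 + 0
    ASSERT sum = '0' AND cout = '0' REPORT "0+0+0 failed" SEVERITY ERROR;
    x <= '1';
    WAIT FOR 10 NS;                         -- 1 + 0 + 0
    ASSERT sum = '1' AND cout = '0' REPORT "1+0+0 failed" SEVERITY ERROR;
    y <= '1';
    WAIT FOR 10 NS;                         -- 1 + 1 + 0
    ASSERT sum = '0' AND cout = '1' REPORT "1+1+0 failed" SEVERITY ERROR;
    cin <= '1';
    WAIT FOR 10 NS;                         -- 1 + 1 + 1
    ASSERT sum = '1' AND cout = '1' REPORT "1+1+1 failed" SEVERITY ERROR;
    WAIT;                                   -- stop the stimulus process
  END PROCESS stimulus;
END ARCHITECTURE test;
```

With n1 removed from the sensitivity list, the second check (1+0+0) would be expected to fail, because sum would still hold the value computed from the old n1.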
An alternative is to declare n1 as a variable rather than a signal. A variable updates
immediately when statement 6 is executed, so its new value is used in statement 7.
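A minimal sketch of a description in which n1 is a variable, assuming the same hypothetical entity full_adder; the architecture name with_variable is also an assumption. The statement numbers are arranged so that statements 6 and 7 are the assignments to n1 and sum. Only the inputs appear on the sensitivity list, since a variable cannot be placed there:

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY full_adder IS                        -- hypothetical entity name
  PORT (x, y, cin : IN  STD_LOGIC;
        sum, cout : OUT STD_LOGIC);
END ENTITY full_adder;

ARCHITECTURE with_variable OF full_adder IS                  --1
BEGIN                                                        --2
  PROCESS (x, y, cin)                                        --3
    VARIABLE n1: STD_LOGIC;                                  --4
  BEGIN                                                      --5
    n1 := x XOR y;                                           --6
    sum <= cin XOR n1;                                       --7
    cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin );   --8
  END PROCESS;                                               --9
END ARCHITECTURE with_variable;                              --10
```

Note the := assignment in statement 6: it takes effect at once, whereas the signal assignments in statements 7 and 8 are still applied one delta after the process finishes.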
Index
Unit 1.4 Handling Signals that are more than 1 bit wide 1-12
1 STD_LOGIC_VECTORs 1-12
2 An example 1-12
3 STD_LOGIC_VECTOR values 1-13
3.1 Direction of numbering 1-13
3.2 Aggregates 1-14
3.3 Concatenation 1-15
3.4 Literals 1-15
4 Summary 1-15
Unit 2.1 Dataflow and Structural VHDL 1-23
1 Behavioural description versus structural description 1-23
2 Example of transforming a high level description to a netlist 1-24
2.1 Implementing the adder function 1-25
3 A dataflow description of the full adder 1-25
3.1 Local signals 1-27
4 Connecting entities together: structural VHDL 1-27
4.1 Placing library components into a design 1-28
4.2 Positional association 1-29
4.3 Named association 1-29
5 Summary 1-29
Unit 3.1 Register Transfer Level VHDL (1) 1-53
1 The D-type flip-flop 1-53
1.1 D-type flip-flop with reset 1-54
2 Registered logic 1-55
2.1 Example: Carry ripple in adders 1-55
2.2 The registered adder 1-56
2.3 VHDL description of the registered adder 1-57
2.4 Register transfer level (RTL) description 1-58
3 Summary 1-58
Unit 3.3 Controlling Register Inference using Signals and Variables 1-62
1 Pipelines 1-62
2 Speed of pipelined datapaths 1-63
3 Controlling pipelining with signals 1-63
4 VHDL variables 1-65
5 Summary 1-65
Unit 4.1 VHDL Simulation (2) 1-78
1 The purpose of simulation 1-78
1.1 Timing constraints 1-78
1.2 The VHDL simulation mechanism 1-79
2 Simulation with delays 1-79
3 Gate delay 1-81
3.1 Inertial delay 1-82
4 Simulation with inertial delay 1-82
4 Assigning multiple transactions for a signal 1-83