CHDD part 1

Computer Hardware and Digital Design
Unit 1.1
Introduction to the Digital Half of the Module
The digital half of the module consists of three themes:
Part 1: The VHDL language

~6 hours of video lectures, worked examples session 1
We start by looking at the processes used to design modern large scale digital
systems, e.g. microprocessor chips and the special purpose chips used in cameras,
phones and TVs. These are complex systems that may involve many millions of logic
gates. The design approaches that were learnt in first and second year (Boolean
algebra, Karnaugh maps, state transition diagrams and tables) are far too slow for the
creation of such large designs. Instead, it is common for humans to design using high
level languages (Hardware Description Languages) that specify the overall behaviour
required, and these are automatically translated to logic gates. We will learn the
VHDL language and put it into action in the labs.
Part 2: Computer hardware and application-specific integrated circuits

~6 hours of video lectures, worked examples session 2
We then look at how designs turn into silicon chips. We look at the various
components of a modern computer, and how the desire for ever faster performance
has driven the evolution of the hardware. We look in detail at the design of
microprocessors and the tricks and techniques they use to maximise performance.
Finally we look at situations where using a computer (which solves a problem one
step at a time in software) is not fast enough and we must build special-purpose
silicon chips that are dedicated to a particular problem, and solve many aspects of the
problem in parallel. We look at the most usual type of special purpose chip that can be
programmed using VHDL, the Field Programmable Gate Array (FPGA).
Part 3: Designing large scale digital systems that are easy to test
~2 hours of video lectures, examples session 3
With any ambitious manufacturing process, not all of the manufactured units will
work, so we need to test each unit before selling to a customer. However, with a very
complex system, it can be very difficult, costly and time consuming to decide whether
or not an individual unit definitely works under all possible input conditions. Modern
designs incorporate additional features to make each unit easy, quick and cheap to
test. We will look at the main approaches to design-for-test.
The digital part of the module consists of:

• 14 hours of lecture content (delivered as Panopto videos)
• 3 past exam sessions (delivered as Panopto videos)
• Weekly tutorial sessions to look at past exams and issues that cropped up in lab
• 5 x 2-hour lab sessions. The lab assignment is written up as a report and submitted in December
Unit 1.2
Why do we need Hardware Description Languages (HDLs)?
This session introduces modern techniques for hardware design. These are based
around the use of hardware description languages HDLs. The basic idea is that instead
of designing a new piece of hardware one logic gate at a time, we just write a
description (using an HDL) of what we want the whole design to do, and this is
automatically translated into a detailed hardware design in a process called synthesis.
1 What’s the problem with traditional methods?

Traditional methods of logic design are based on truth tables, state transition tables,
Boolean logic and Karnaugh maps. To illustrate some of the issues, we start off by
looking at a simple case study. We’ll also use this case study in later sessions to
illustrate some of the features of VHDL.
Suppose we want a piece of hardware that adds two 16-bit numbers together:
[Figure: a block labelled ADD with two 16-bit inputs, a and b, and a 16-bit output, sum]
How would we approach this using traditional methods? We would have to exercise
some ingenuity to partition this large problem into a series of smaller problems that
are easier to solve. For example, we could partition into a sequence of sixteen 1-bit
adders:
[Figure: a column of 1-bit adders: a0 + b0 gives sum0, a1 + b1 gives sum1, a2 + b2 gives sum2, a3 + b3 gives sum3, and so on]
The procedure would then go as follows:

• We form the truth table for each of these units.
• We turn this truth table into a Karnaugh map.
• We extract Boolean equations from the Karnaugh map.
• We turn these Boolean equations into AND, OR and NOT gates.
• Finally, we assemble the 16 units to make our complete circuit.
That’s the traditional design methodology. Let’s do some evaluation of this method.
Design effort
It’s quite a lot of work to get from the design brief (add 16-bit numbers) into the
design using logic gates. It would take about 10 minutes to complete and document
the process. But the chips used in mobile phones or computer games consoles contain
millions of logic gates. If we have to put this much effort into getting out a design that
contains only a few dozen gates, then we are in trouble. The designer productivity will
be too low; it will take too much time, and cost too much money to undertake a big
project. We need a method that gives higher productivity.
Maintainability
When you are in the midst of designing a piece of hardware, you probably have a
pretty good understanding of the design you produce. However, it’s difficult to work
with gate level designs produced by someone else, or even with designs that were
produced by you a few weeks previously. Just by glancing at the logic schematics
above, can you tell what the circuit actually does? We need a method that gives greater clarity
as to what the designs do.
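For comparison, here is a minimal sketch of how the same 16-bit adder might be described in an HDL. It uses VHDL features that are only introduced in later units, and the names add16 and behaviour, as well as the choice of the numeric_std package and the UNSIGNED type, are assumptions made for this illustration rather than the module's reference solution.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;   -- assumed here: provides UNSIGNED and "+"

-- Hypothetical entity name; the interface mirrors the ADD block above.
ENTITY add16 IS
    PORT ( a, b : IN  UNSIGNED(15 DOWNTO 0);
           sum  : OUT UNSIGNED(15 DOWNTO 0) );
END ENTITY add16;

ARCHITECTURE behaviour OF add16 IS
BEGIN
    -- One line states the intent; a synthesis tool produces the gates.
    sum <= a + b;   -- any carry out of bit 15 is discarded
END ARCHITECTURE behaviour;
```

The intent of the design can be read directly from the single assignment, which is the kind of clarity that a gate-level schematic lacks.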
Some implementation styles don’t use basic logic gates

Some types of hardware, e.g. FPGAs, don’t actually use basic logic gates such as AND
gates and OR gates. But we put a lot of effort into getting the designs down to an
interconnection of basic logic gates. This effort was simply wasted if the design was
to be implemented on an FPGA. (FPGA design tools have translation utilities that
translate the logic gates into something that the FPGA can use, but the fact still
remains that the effort expended in getting out the gate-level design was largely
wasted, since the FPGA didn’t actually use it.)
2 VHDL
VHDL is the VHSIC Hardware Description Language. (VHSIC stands for Very High
Speed Integrated Circuit). The main motivation for its creation was to provide
rigorous and unambiguous specification of modules. Gradually it developed to be
used for other purposes such as synthesis. The other important HDL is Verilog. This
is older than VHDL, and its original form was not very powerful. Over the years, it
has been enhanced and extended with extra features, so that it is now as powerful as
VHDL.
VHDL is based around the notion of being able to view the modules in a design at
different levels of abstraction. Crudely speaking, a high level of abstraction contains
little detail, and a low level of abstraction contains a lot of detail. Low-level design
requires a lot of effort, and we want to avoid this effort until we are sure that it won’t
be wasted. We need to resolve all high level issues before we commit to any low level
design.
3 Levels of Abstraction
• Algorithmic
  This describes the basic idea of what the design is supposed to do, without
  reference to how this functionality is to be achieved. (Indeed, one of the
  purposes of the design phase is to investigate different possible methods to
  achieve the desired functionality). The initial specification of a design will
  almost always be at the algorithmic level.
• Register transfer level (RTL)
  The design is conceived as a group of interconnected modules. For each
  module we know three things:
    • What its interface to the other modules of the design is (how many inputs
      and outputs, how many bits wide they are, etc.)
    • What the logical relationship is between the inputs and the outputs (usually
      expressed as something like a Boolean logic equation)
    • What the timing is between the inputs and the outputs (i.e. what happens
      on what clock cycle).
• Gate level
  The design is constructed from basic logic gates¹.
There may also be a 4th level, the physical level. This might refer, for example, to the
processing of a piece of silicon that would be necessary to manufacture the required
configuration of logic gates as an application-specific integrated circuit (ASIC).
Alternatively, it might be the generation of the configuration bit stream for a field
programmable gate array (FPGA).
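To make the register transfer level more concrete, here is a minimal sketch of an RTL description. The numbered comments point at the three things listed above that we know about an RTL module. The entity name and_reg and its port names are hypothetical, and the PROCESS construct and the rising_edge function have not been introduced at this point in the notes, so treat this as a preview rather than a definitive style.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY and_reg IS
    -- 1. Interface: a clock, two 1-bit inputs and one 1-bit output
    PORT ( clk  : IN  STD_LOGIC;
           a, b : IN  STD_LOGIC;
           q    : OUT STD_LOGIC );
END ENTITY and_reg;

ARCHITECTURE rtl OF and_reg IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN   -- 3. Timing: q changes on each rising clock edge
            q <= a AND b;          -- 2. Logic: q is the AND of the two inputs
        END IF;
    END PROCESS;
END ARCHITECTURE rtl;
```

Nothing here says which gates or flip-flops are used; that detail is left to logic synthesis.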
4 Synthesis
A synthesis CAD tool is one that automatically maps a description from one level of
the hierarchy to a lower level of the hierarchy. The main types are as follows:
• Physical synthesis
  This performs the mapping from gate level to physical level. For an ASIC, a
  gate-level representation is automatically translated to a mask-level design of
  an integrated circuit. For a PLD or an FPGA, a gate-level design is mapped to
  a configuration file that controls how the fuses in the device should be blown,
  or how the CLBs should be configured. CAD tools do this task extremely
  well.
• Logic synthesis
  This maps an RTL description to a gate-level description. CAD tools also
  perform this stage extremely well.
• Behavioural synthesis
  This maps an algorithmic description to a register transfer or to a gate-level
  description. At present, automated synthesis tools cannot do this well. It is a
  very active area of research, but for now, and the foreseeable future, this is a
  task that must be done by humans.
¹ Or whatever is the most fundamental design primitive for the hardware implementation we are
intending to use. This would be Configurable Logic Blocks (CLBs) for an FPGA or a fuse map for a
CPLD.
Automatic synthesis leads to great productivity gains, because human designers can
confine their activities to high level design. It has often been said that the output of a
designer is limited to about 10-50 items per day, fully debugged and properly
documented. This rate is true whether an item corresponds to a logic gate, a functional
unit of an RTL design, or an equation representing the behaviour of an entire digital
filter for a signal processing system. 10-50 logic gates will only form a very small part
of a system whereas 10-50 RTL functional units may be enough to describe an entire
system. The higher the level at which the designer is working, the more the designer
can produce.
Synthesis also helps us avoid the risk of putting a lot of effort into targeting one
particular manufacturing technology, only to find that we need to re-target the design
to a new technology. If all the low-level synthesis was done by humans, the re-
targeting could take thousands of man hours. If it was done by a CAD tool, then we
simply re-synthesise for the new target hardware, which means leaving a computer
running for a few days.
5 Summary
Traditional design methods have many problems. Designer productivity is too low.
Decisions about the implementation have to be made early in the design process. If
the design is re-targeted from one technology to another (say a design originally
implemented on an ASIC is moved to an FPGA) the whole design process needs to be
repeated.
HDLs aim to alleviate these problems. Specification is normally done at the
algorithmic level of abstraction. Design is then carried out at the RTL level, saying
what the modules of the design should do, and on which clock cycle the output is due.
This description is then automatically translated to an implementation in a process called
synthesis.
HDLs also allow a specification for a module to be simulated. So we can try out
different combinations of input and see what the outputs would do. This can be done
before any detailed gate level design is attempted.
You should now know...
The meaning of the following:

• Level of abstraction
• Algorithmic level
• Register transfer level (RTL)
• Logic synthesis
• Physical synthesis
Unit 1.3
Introduction to VHDL
In this session we will look at how to do simple designs in VHDL.
1 Entity and Architecture

We'll start off with an extremely simple example: we will describe a NAND gate. The
first thing that we have to do is to say what the device looks like to the outside world.
This basically means describing its port map, i.e. the signals that flow in and out of
the device.
[Figure: a block labelled nandgate with inputs a and b and output c]
To describe this in VHDL, we use an entity declaration.
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END;
Everything in uppercase is a VHDL keyword, i.e. part of the language. Everything in

lower case is a name that I have chosen for the parts of my design. The entity has to
be given a name (we've chosen nandgate, but you could have chosen any other name).
Each of the signals in the port map is declared as having a mode and a type. The mode
can be IN or OUT, and simply says whether the signal is an input or an output. The
type STD_LOGIC represents a single-bit signal that can take a value such as ‘0’, ‘1’, ‘X’ or ‘U’. (‘X’
means unknown. ‘U’ means uninitialized, i.e. a signal that has not yet been assigned
any valid logical value.) STD_LOGIC is the normal way to describe logic signals that
appear at the input or output of gates, or at wires in between them.
Now that we have described the inputs and outputs, we need to say what the device
does, i.e. how its outputs respond to its inputs. This is done in an architecture:
ARCHITECTURE simple OF nandgate IS

BEGIN
c <= a NAND b;
END;
The ARCHITECTURE statement says that we are producing a description of what

goes on inside nandgate. It is possible (and indeed quite common) for us to try out
many different designs for what goes on inside nandgate. We have to give each
different design a name, so that we can tell VHDL which version we want to use. I
have chosen the name simple for this particular design. After the ARCHITECTURE
statement comes the word BEGIN. This introduces the main body of the architecture,
which explains how the outputs relate to the inputs. At the end of the body comes the
END statement, which says that we have reached the end of the body.
How the outputs relate to the inputs is described by the statement

c <= a NAND b;
The symbol <= (which is meant to look like a left-pointing arrow) is pronounced
"gets". It means that the signal c gets the value of a NANDed together with the value
of b. Whenever a or b change their value, this statement causes the value of c to be
updated.
If we want to check that our description is functioning correctly, we can feed it into a
simulator, a program that predicts how the outputs would change in response to
changes in the input. Here is the sort of thing we get if we run this code through a
simulator
The horizontal axis is time, ranging from 0 to 100 ns. Traces are shown for the signals
a, b and c. Whenever a or b changes its value, c receives a new value. In order to carry
out the simulation, we need to tell the simulator what we want each of the inputs a and
b to do (in this case we have toggled each from 0 to 1 and then back to 0). The
simulator then works out what the output c would do in response. You can see that c
is carrying out the logic function a NAND b, so the design is correct.
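One common way to tell the simulator what the inputs should do is to write a second VHDL description, usually called a testbench, that drives the inputs and checks the output. The sketch below is one possible version, not the setup used in the labs. The names nandgate_tb, test, dut and stimulus are hypothetical, the 25 ns step is chosen only so that the run lasts 100 ns, and the LIBRARY and USE lines are explained later in this unit. The instantiation style assumes a tool that supports VHDL-93 or later.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY nandgate IS
    PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END ENTITY nandgate;

ARCHITECTURE simple OF nandgate IS
BEGIN
    c <= a NAND b;
END ARCHITECTURE simple;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- A testbench has no ports: it is the whole simulated "world".
ENTITY nandgate_tb IS
END ENTITY nandgate_tb;

ARCHITECTURE test OF nandgate_tb IS
    SIGNAL a, b, c : STD_LOGIC;
BEGIN
    -- Place one copy of the design under test and wire it to local signals.
    dut: ENTITY work.nandgate(simple) PORT MAP ( a => a, b => b, c => c );

    stimulus: PROCESS
    BEGIN
        a <= '0'; b <= '0'; WAIT FOR 25 ns;
        ASSERT c = '1' REPORT "0 NAND 0 should be 1" SEVERITY ERROR;
        a <= '1'; b <= '0'; WAIT FOR 25 ns;
        ASSERT c = '1' REPORT "1 NAND 0 should be 1" SEVERITY ERROR;
        a <= '1'; b <= '1'; WAIT FOR 25 ns;
        ASSERT c = '0' REPORT "1 NAND 1 should be 0" SEVERITY ERROR;
        a <= '0'; b <= '1'; WAIT FOR 25 ns;
        ASSERT c = '1' REPORT "0 NAND 1 should be 1" SEVERITY ERROR;
        WAIT;   -- all four input combinations tried: stop here
    END PROCESS stimulus;
END ARCHITECTURE test;
```

If any ASSERT condition is false, the simulator prints the corresponding message, so the check does not rely on inspecting the waveform by eye.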
VHDL uses the following logical operators: NOT, AND, OR, NAND, NOR, XOR (and, from VHDL-93 onwards, XNOR).
2 Specifying another architecture

We are allowed to give many different descriptions of the way that the inputs relate to
the output. So, for example, here is a second version of the architecture of the
nandgate.
ARCHITECTURE complicated OF nandgate IS

BEGIN
c <= NOT ( a AND b );
END;
This achieves exactly the same function as the first description, but does it in a
different way.
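When an entity has more than one architecture, the one to use can be named explicitly where the entity is instantiated. The following sketch places one copy of each architecture side by side and checks that they agree for every input combination. The names compare_tb, test, u1, u2, c1 and c2 are hypothetical, and the ENTITY work.nandgate(...) form assumes VHDL-93 or later.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY nandgate IS
    PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END ENTITY nandgate;

ARCHITECTURE simple OF nandgate IS
BEGIN
    c <= a NAND b;
END ARCHITECTURE simple;

ARCHITECTURE complicated OF nandgate IS
BEGIN
    c <= NOT ( a AND b );
END ARCHITECTURE complicated;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY compare_tb IS
END ENTITY compare_tb;

ARCHITECTURE test OF compare_tb IS
    SIGNAL a, b, c1, c2 : STD_LOGIC;
BEGIN
    -- The name in brackets selects which architecture each instance uses.
    u1: ENTITY work.nandgate(simple)      PORT MAP ( a => a, b => b, c => c1 );
    u2: ENTITY work.nandgate(complicated) PORT MAP ( a => a, b => b, c => c2 );

    PROCESS
    BEGIN
        a <= '0'; b <= '0'; WAIT FOR 10 ns;
        ASSERT c1 = c2 REPORT "architectures disagree for 00" SEVERITY ERROR;
        a <= '0'; b <= '1'; WAIT FOR 10 ns;
        ASSERT c1 = c2 REPORT "architectures disagree for 01" SEVERITY ERROR;
        a <= '1'; b <= '0'; WAIT FOR 10 ns;
        ASSERT c1 = c2 REPORT "architectures disagree for 10" SEVERITY ERROR;
        a <= '1'; b <= '1'; WAIT FOR 10 ns;
        ASSERT c1 = c2 REPORT "architectures disagree for 11" SEVERITY ERROR;
        WAIT;
    END PROCESS;
END ARCHITECTURE test;
```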
3 BEGIN and END statements

If you are used to C, C# or Java, you will know that sometimes you want a
group of statements to be considered as one block. C, C# and Java use curly braces {
and } to indicate the beginning and end of a block respectively. So, for example, a
loop in C might look like this:
for (i=1; i<=n; i++)

{
a[i]=i;
b[i]=a[i]*a[i];
}
The braces show that the two statements should be considered collectively as a block
that makes up the body of the loop.
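VHDL uses keywords rather than braces. Most blocks (for example the body of an
ARCHITECTURE) are bracketed by BEGIN and END, while the body of a loop is bracketed
by LOOP and END LOOP. So in VHDL the loop would look like this (assuming that a and b
are arrays of integer variables):
FOR i IN 1 TO n LOOP
    a(i) := i;
    b(i) := a(i) * a(i);
END LOOP;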
Note that indentation of the block is used to make it clearer where the block starts and
ends.
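For reference, here is a minimal sketch of a complete description in which such a loop can actually be simulated. A FOR loop has to sit inside a PROCESS (a construct not yet covered in these notes); the body of the PROCESS is opened by BEGIN, whereas the body of the loop is opened by LOOP and closed by END LOOP. The names loop_demo, sketch and int_array, and the value n = 4, are assumptions made for this illustration.

```vhdl
ENTITY loop_demo IS
END ENTITY loop_demo;

ARCHITECTURE sketch OF loop_demo IS
    CONSTANT n : POSITIVE := 4;                    -- hypothetical array size
    TYPE int_array IS ARRAY (1 TO n) OF INTEGER;   -- hypothetical array type
BEGIN
    PROCESS
        VARIABLE a, b : int_array;
    BEGIN                          -- opens the body of the PROCESS
        FOR i IN 1 TO n LOOP       -- LOOP opens the body of the loop
            a(i) := i;
            b(i) := a(i) * a(i);
        END LOOP;
        ASSERT b(n) = n * n REPORT "unexpected square" SEVERITY ERROR;
        WAIT;                      -- stop after a single pass
    END PROCESS;
END ARCHITECTURE sketch;
```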
4 Semicolons
Like C or Java, VHDL uses the semicolon to indicate the end of a statement.
Statements that "open up" a block don't take semicolons. So in C these would be
wrong:
for (i=1; i<=n; i++); /* WRONG: shouldn't be a semicolon here */

{; /* ALSO WRONG: don't want a semicolon here */
a[i]=i;
b[i]=a[i]*a[i];
}
Similarly in VHDL this would be wrong:
FOR i IN 1 TO n LOOP;   -- WRONG: shouldn't be a semicolon here
    a(i) := i;
    b(i) := a(i) * a(i);
END LOOP;
Let's have another look at our simple example:
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END;
ARCHITECTURE simple OF nandgate IS
BEGIN
c <= a NAND b;
END;
The keyword IS is "opening up" a block of statements, and therefore does not need a
semicolon. However, note that VHDL is a little inconsistent as to whether IS needs to
be followed by a BEGIN. In an ENTITY, the BEGIN is implied, and the END
statement is answering the IS. By contrast, in an ARCHITECTURE the word BEGIN
must also be there, and the END is answering the BEGIN.
5 Stylistic issues
5.1 Case
VHDL is not case sensitive. All three of these are identical in meaning, and you’ll see
all three styles in textbooks and design magazines:
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
entity NANDGATE is
port ( A, B: in std_logic; C: out std_logic);
end;
entity nandgate is
port ( a, b: in std_logic; c: out std_logic);
end;
It used to be considered good style to write all the keywords of VHDL in one case,
and all the names that we have chosen for our design in the other case. This makes it
easier to figure out what is going on in the design.
Nowadays it is normal to put everything in lowercase. Modern VHDL editors are

context sensitive, and can figure out which words are part of the VHDL language and
show them in a particular colour. So for the editors we use in labs, VHDL keywords
are automatically displayed in purple, and the names chosen by us for signals, entities
and architectures are shown in black.
In lectures we will show all keywords in uppercase to make it clearer to you what is
part of the VHDL language, and what is just a name that I have chosen.
5.2 Spaces and indents

You can put as many spaces as you like between words. So, for example, these are
both the same
ENTITY nandgate IS
PORT (a,b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
ENTITY   nandgate   IS
    PORT ( a, b :  IN  STD_LOGIC;   c :  OUT  STD_LOGIC );
END;
5.3 Returns
Putting in a carriage return makes no difference to the function of your code. So the
following two are identical in function
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC);
END;
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC;
c: OUT STD_LOGIC);
END;
You can use whichever you feel is clearest.
5.4 Annotating END statements

In a long description, it can be easy to lose track of how many BEGIN and END pairs
you have in the code. To help you keep track, you can put the name of what you think
you are ending after the END statement. So, for example, you can write
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END ENTITY nandgate;
ARCHITECTURE simple OF nandgate IS
BEGIN
c <= a NAND b;
END ARCHITECTURE simple;
When you run the compiler, the code will be checked, and if there is a mismatch
between what you say you are ENDing and what VHDL thinks you are ending, then
this will be flagged as an error.
Although the annotation of END statements is normally optional, it is considered to

be good style. It is a useful safety precaution, which can save you from bugs that are
difficult and time consuming to find.
5.5 Comments
Comments are introduced by two dashes:
-- Here is our simple first example
ENTITY nandgate IS -- The entity shows the port map
PORT ( a, b: IN STD_LOGIC; -- Inputs
c: OUT STD_LOGIC); -- Outputs
END;
ARCHITECTURE simple OF nandgate IS
BEGIN
c <= a NAND b; -- This is how the output gets its function
END;
Everything after the two dashes up to the end of the line is a comment. The example
above isn’t great, because the comments are stating the obvious. But we haven’t done
enough of the language yet to show an example that would give rise to more sensible
comments.
6 The IEEE library

The listings shown so far in this lecture have been incomplete, and if you try to use
them, then the compiler will give an error message something like “Cannot recognise
type STD_LOGIC”. A large number of features and extensions to the capabilities of
the VHDL language are bundled into a library called “IEEE”. The definitions used for
STD_LOGIC are held in this library. In order to use the features of this library, a
design must open the library and say which features of the library it wishes to access.
6.1 Opening libraries

The IEEE library is opened by this statement:
LIBRARY IEEE;
The IEEE library contains many sub-libraries, which in turn contain many features.
The VHDL name of a sub-library is a package. In order to say which features of
which packages we wish to access, we use a statement that looks like this:
USE IEEE.XXXX.YYYY;
Where XXXX is the name of the required package, and YYYY is the name of the
specific feature that is to be used. Rather than listing each specific feature that we
want to use (which can be very tedious), often we will simply make all features within
a package visible by using the VHDL keyword ALL:
USE IEEE.XXXX.ALL;
This opens up all features in the XXXX package of the IEEE library so that they can
be used by our design.
6.2 Using STD_LOGIC
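The standard logic definitions are held in a package called std_logic_1164¹. So here is
a full listing for the NAND gate, which opens up the library to access the features of
the STD_LOGIC type.
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY nandgate IS
PORT ( a, b: IN STD_LOGIC; c: OUT STD_LOGIC );
END ENTITY nandgate;
ARCHITECTURE simple OF nandgate IS
BEGIN
c <= a NAND b;
END ARCHITECTURE simple;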
7 Summary
We’ve looked at the basic features of VHDL, and seen some simple examples. The
two key parts of a description are the ENTITY, i.e. a list of inputs and outputs, and an
ARCHITECTURE, i.e. a description of the logical relationship between the inputs and
outputs. We’ve also looked at the STD_LOGIC data type, which represents a wire
carrying values of 1 and 0. The definition of the STD_LOGIC data type is held in the
library IEEE and must be imported at the start of each entity in our code.
You should now know...
The meaning of the following:

• Entity
• Port map
• Architecture
• Library and package
• Standard logic STD_LOGIC
• The STD_LOGIC values ‘0’, ‘1’, ‘X’ and ‘U’
¹ 1164 is simply the number of the IEEE standards document that defined the Standard Logic type.
Unit 1.4
Handling signals that are more than 1 bit wide
1 STD_LOGIC_VECTORs
Most interesting designs have inputs that are more than just a single bit. For example,
let's consider a device that has two 4-bit inputs a and b, and a 4-bit output c.
[Figure: a block with two 4-bit inputs, a and b, and a 4-bit output c. Expanded out, it consists of four 1-bit units: unit i has inputs a_i and b_i and output c_i, for i = 0 to 3.]
In VHDL, quantities such as a, b and c are called STD_LOGIC_VECTORs. If you are

familiar with arrays in computer programming, you can think of a
STD_LOGIC_VECTOR as being an array of STD_LOGIC signals. So the input a
would be declared as being
STD_LOGIC_VECTOR(0 TO 3)
Now a contains four members a(0), a(1), a(2) and a(3). Each of these four members is
of type STD_LOGIC.
2 An example
Imagine that we wanted to represent a device like this:
[Figure: four 1-bit units: unit i has inputs a_i and b_i and output c_i, for i = 0 to 3]
The entity declaration would look like this.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY orgate IS
PORT ( a, b: IN STD_LOGIC_VECTOR(0 TO 3);
c: OUT STD_LOGIC_VECTOR(0 TO 3));
END ENTITY orgate;
There are several ways that we could write the architecture. One way to describe it
would be like this, explicitly listing what happens for each bit:
ARCHITECTURE number1 OF orgate IS
BEGIN
c(0) <= a(0) OR b(0);
c(1) <= a(1) OR b(1);
c(2) <= a(2) OR b(2);
c(3) <= a(3) OR b(3);
END ARCHITECTURE number1;
Alternatively, we could just write this, which would be simpler and would mean
exactly the same thing

BEGIN
c <= a OR b;
VHDL knows that a, b and c are four bits wide, and will do the appropriate operation
for each of the bit positions.
Or, if we preferred, we could write this

BEGIN
c(0 TO 3) <= a(0 TO 3) OR b(0 TO 3);
This is effectively a loop, which tells VHDL to make four assignments, one for each
of the four bit positions 0, 1, 2 and 3.
3 STD_LOGIC_VECTOR values
The value of a STD_LOGIC_VECTOR is indicated by a string of values enclosed in double
quotes, whereas a single STD_LOGIC value is enclosed in single quotes. So if a is a
single bit, assignment looks like this:
a <= '1';
If a is 4 bits wide, assignment looks like this:
a <= "1110";
By default, VHDL expects the values to be binary, but sometimes it can be useful to
use Hex numbers. This can be done by placing the letter X before the
STD_LOGIC_VECTOR value:
a <= X"E";
3.1 Direction of numbering

In the examples given above, the elements were numbered from 0 to 3:
a: STD_LOGIC_VECTOR(0 TO 3);
a <= "1110";
Reading the literal from left to right, the bits are element 0, element 1, element 2 and element 3.
This feels normal and intuitive (indeed in the programming languages that you may
know, e.g. C or Java, this is the only way that you are allowed to do it). However, in
VHDL you also have the option to have arrays where the index counts downwards:
a: STD_LOGIC_VECTOR(3 DOWNTO 0);
a <= "1110";
Reading the literal from left to right, the bits are now element 3, element 2, element 1 and element 0.
In both cases, the number would be interpreted as 14 unsigned or –2 signed: the
leftmost bit is always interpreted as the msb and the rightmost is always interpreted as
the lsb.
In digital logic design, the normal numbering convention is that bit 0 is the least
significant bit (lsb). This is accomplished by having the index run downwards. So
unlike most programming languages, in VHDL it is normal for arrays to be numbered
downwards. You can use upward-numbering if you want, but this often leads to
confusion that creates awkward bugs in your code.
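The effect of the index direction can be checked in simulation. In this minimal sketch the same literal is assigned to an upward-numbered and a downward-numbered vector, and the assertions confirm which index ends up holding the leftmost bit. The names direction_demo, sketch, up and down are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY direction_demo IS
END ENTITY direction_demo;

ARCHITECTURE sketch OF direction_demo IS
    SIGNAL up   : STD_LOGIC_VECTOR(0 TO 3);
    SIGNAL down : STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
    up   <= "1110";
    down <= "1110";

    PROCESS
    BEGIN
        WAIT FOR 1 ns;   -- give the assignments above time to take effect
        -- Upward numbering: the leftmost bit is element 0.
        ASSERT up(0) = '1' AND up(3) = '0'
            REPORT "unexpected indexing for 0 TO 3" SEVERITY ERROR;
        -- Downward numbering: the leftmost bit is element 3, so bit 0 is the lsb.
        ASSERT down(3) = '1' AND down(0) = '0'
            REPORT "unexpected indexing for 3 DOWNTO 0" SEVERITY ERROR;
        WAIT;
    END PROCESS;
END ARCHITECTURE sketch;
```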
3.2 Aggregates
An aggregate is a group of values, separated by commas, that will be used for an array.
Here is an example:
ARCHITECTURE example OF aggregate IS
SIGNAL nibble1, nibble2: STD_LOGIC_VECTOR ( 0 TO 3 );
BEGIN
nibble1 <= ( '0','1','0','0');
nibble2 <= ( '0','0','1','0');
END ARCHITECTURE example;
The assignment for nibble1 sets its 0th value to ‘0’, its 1st value to ‘1’, the 2nd to ‘0’
and so on. This way of doing things is called positional association: the 0th value
listed goes in the 0th position, the first goes in the first position and so on. We could
instead use named association. So these statements have the same effect:
nibble1 <= ( '0','1','0','0');
nibble1 <= ( 1 => '1', 0 => '0', 3 => '0', 2 => '0');
With named association, we can just specify the values of some of the bit positions,
and use an OTHERS value to provide a value for everything not explicitly mentioned.
So these are all the same:
nibble1 <= ( '0','1','0','0');

nibble1 <= ( 1 => '1', 0 => '0', 3 => '0', 2 => '0');
nibble1 <= ( 1 => '1', OTHERS => '0');
The OTHERS notation can be used as a convenient trick when we want to set all the
values of an array to a particular value:
nibble1 <= ( OTHERS => '1');
This would set all of the elements of nibble1 to '1'.
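The equivalence of the three forms can be confirmed with a small self-checking sketch. The signal names p, q, r and all_ones, and the entity name aggregate_demo, are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY aggregate_demo IS
END ENTITY aggregate_demo;

ARCHITECTURE sketch OF aggregate_demo IS
    SIGNAL p, q, r, all_ones : STD_LOGIC_VECTOR(0 TO 3);
BEGIN
    p        <= ( '0','1','0','0');                          -- positional
    q        <= ( 1 => '1', 0 => '0', 3 => '0', 2 => '0');   -- named, any order
    r        <= ( 1 => '1', OTHERS => '0');                  -- named with OTHERS
    all_ones <= ( OTHERS => '1');

    PROCESS
    BEGIN
        WAIT FOR 1 ns;   -- let the assignments above take effect
        ASSERT p = "0100" AND q = "0100" AND r = "0100"
            REPORT "the three aggregate forms should agree" SEVERITY ERROR;
        ASSERT all_ones = "1111"
            REPORT "OTHERS should fill every element" SEVERITY ERROR;
        WAIT;
    END PROCESS;
END ARCHITECTURE sketch;
```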
3.3 Concatenation
Concatenation merges two vectors to produce a longer vector. For example
ARCHITECTURE example OF aggregate IS
SIGNAL byte: STD_LOGIC_VECTOR ( 0 TO 7 );
SIGNAL nibble1, nibble2: STD_LOGIC_VECTOR ( 0 TO 3 );
BEGIN
nibble1 <= ( '0','1','0','0');
nibble2 <= ( '0','0','1','0');
byte <= nibble1 & nibble2;
END ARCHITECTURE example;
would cause byte to assume the value ( '0','1','0','0','0','0','1','0')
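The same example can be made self-checking, as in the sketch below. It also shows, as an extra illustration not taken from the notes, that the & operator can join a single STD_LOGIC value to a vector. The names concat_demo, sketch and padded are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY concat_demo IS
END ENTITY concat_demo;

ARCHITECTURE sketch OF concat_demo IS
    SIGNAL byte             : STD_LOGIC_VECTOR(0 TO 7);
    SIGNAL nibble1, nibble2 : STD_LOGIC_VECTOR(0 TO 3);
    SIGNAL padded           : STD_LOGIC_VECTOR(0 TO 4);
BEGIN
    nibble1 <= ( '0','1','0','0');
    nibble2 <= ( '0','0','1','0');
    byte    <= nibble1 & nibble2;   -- nibble1 supplies the leftmost four bits
    padded  <= '0' & nibble1;       -- a single bit joined onto a vector

    PROCESS
    BEGIN
        WAIT FOR 1 ns;   -- let the assignments above take effect
        ASSERT byte = "01000010"
            REPORT "unexpected concatenation result" SEVERITY ERROR;
        ASSERT padded = "00100"
            REPORT "unexpected padded result" SEVERITY ERROR;
        WAIT;
    END PROCESS;
END ARCHITECTURE sketch;
```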
3.4 Literals
STD_LOGIC is a data type that has values '0', '1', 'X', 'U' etc. These values are
written as character literals, in single quotes. An array of such values can be written
as a string, which is denoted by double quotes.
This is similar to the convention used in the C programming language: '1' is a
character; "1010" is an array of 4 characters. We can use this notation for
STD_LOGIC_VECTORs. So for example,
nibble1 <= ( '0','1','0','0');
could be written as
nibble1 <= "0100";
A value that is directly specified (as opposed to being calculated from other signals),
like "0100" in the code above, is called a literal. Standard logic vector literals may be
specified in binary, octal or hexadecimal. By default, a string is interpreted as binary.
To make it explicit that we wish the string to be interpreted as a binary number, we
can place the letter B in front. For an octal string, we place the letter O in front, and
for hexadecimal, we place X in front. So if a is a 12-bit std_logic_vector, then these are
all equivalent:
a <= "010011001010";
a <= B"010011001010";
a <= O"2312";
a <= X"4CA";
Long strings of ‘1’s and ‘0’s can be confusing, so in order to improve legibility, we
can introduce underscores:
a <= B"0100_1100_1010";
The underscores are ignored by VHDL; their only function is to space the digits out to
make it easier for a human to read. Note that if you do use underscores in your values,
you must put the B in front to make it clear that this should be interpreted as a binary
value. This (without the B) would be an error:
a <= "0100_1100_1010"; -- Wrong!
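The equivalence of the different bases can be checked with a short sketch such as the one below. The entity name literal_demo is hypothetical, and the use of O and X literals with STD_LOGIC_VECTOR assumes a tool set to VHDL-93 or later.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY literal_demo IS
END ENTITY literal_demo;

ARCHITECTURE sketch OF literal_demo IS
    SIGNAL a : STD_LOGIC_VECTOR(11 DOWNTO 0);
BEGIN
    a <= B"0100_1100_1010";   -- underscores are only a reading aid

    PROCESS
    BEGIN
        WAIT FOR 1 ns;   -- let the assignment above take effect
        ASSERT a = "010011001010" REPORT "binary form differs"      SEVERITY ERROR;
        ASSERT a = O"2312"        REPORT "octal form differs"       SEVERITY ERROR;
        ASSERT a = X"4CA"         REPORT "hexadecimal form differs" SEVERITY ERROR;
        WAIT;
    END PROCESS;
END ARCHITECTURE sketch;
```

Each octal digit stands for three bits and each hexadecimal digit for four, which is why all three literals are 12 bits long.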
4 Summary
We’ve looked at the STD_LOGIC_VECTOR datatype, which represents multi-bit signals.
It is effectively an array of STD_LOGIC values. We have also seen the two main
notations for assigning values to STD_LOGIC_VECTORs and how to indicate the number
base for a STD_LOGIC_VECTOR value.
You should now know...
The meaning of the following:

• Standard logic vector (STD_LOGIC_VECTOR)
• Aggregates
• Named association
• Positional association
How to represent standard logic vector values
Unit 1.5
Number Representation and Arithmetic
Most forms of data that are handled by digital systems (e.g. samples of audio data,
pixel values for image and video data, ASCII data for representing text) are some
form of number. In this section we will look at the background to two of the most
important VHDL numerical data types (SIGNED and UNSIGNED). Before we do that, we
will briefly recap binary data representation formats for numbers.
1 Denary
In everyday life, we use the denary (base 10) number system whose digits can take the
values 0,1,2,3,4,5,6,7,8,9. An n-digit denary number with digits $d_i$ is interpreted as
having the value
$$\sum_{i=0}^{n-1} d_i \, 10^i$$
So, for example, the number 365 has three digits: $d_2 = 3$, $d_1 = 6$ and $d_0 = 5$. Its value is
$3 \times 100 + 6 \times 10 + 5 \times 1$, which is¹
$3 \times 10^2 + 6 \times 10^1 + 5 \times 10^0$
2 Unsigned binary
For digital systems, we deal with binary numbers where digits can have value 0,1.
These values are particularly convenient for the construction of simple, cheap and fast
electronic circuits. A digit that can take only the values 0 or 1 is called a binary digit
or bit. An n-bit binary number has the value
$$\sum_{i=0}^{n-1} d_i \, 2^i$$
So, for example, the denary number 5 equates to the 3-bit binary number 101, which
has digits $d_2 = 1$, $d_1 = 0$ and $d_0 = 1$. Its value is
$1 \times 4 + 0 \times 2 + 1 \times 1$, which is
$1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$
The bit with the highest weighting is called the most significant bit (msb) and the bit
with the lowest weighting is the least significant bit (lsb). The msb is always the
leftmost bit and the lsb is the rightmost. For this 3-bit example, bit number 2 is the
msb and bit number 0 is the lsb.
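The weighted sum can be written out directly in VHDL. The sketch below adds up the weight $2^i$ of every bit that is set and compares the result with the conversion provided by the numeric_std package. The names unsigned_value_demo, bits and value are hypothetical, and numeric_std is used ahead of its introduction in these notes.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;   -- provides UNSIGNED and to_integer

ENTITY unsigned_value_demo IS
END ENTITY unsigned_value_demo;

ARCHITECTURE sketch OF unsigned_value_demo IS
BEGIN
    PROCESS
        CONSTANT bits  : STD_LOGIC_VECTOR(3 DOWNTO 0) := "1101";
        VARIABLE value : NATURAL := 0;
    BEGIN
        -- Bit i carries a weight of 2 to the power i.
        FOR i IN bits'RANGE LOOP
            IF bits(i) = '1' THEN
                value := value + 2 ** i;
            END IF;
        END LOOP;
        ASSERT value = 13   -- 8 + 4 + 0 + 1
            REPORT "weighted sum should be 13" SEVERITY ERROR;
        ASSERT value = to_integer(UNSIGNED(bits))
            REPORT "library conversion disagrees" SEVERITY ERROR;
        WAIT;
    END PROCESS;
END ARCHITECTURE sketch;
```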
The largest number that we can represent depends on the number of bits that we use.
For example, if we use 4 bits to represent a number, then there are 16 different values
that can be represented:
¹ Remember that anything raised to the power of zero is 1, i.e. $x^0 = 1$ for all x.
Number Unsigned binary representation
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
10 1010
11 1011
12 1100
13 1101
14 1110
15 1111
This form of representation is called unsigned binary. This is the representation
format used by one of the main VHDL data types, UNSIGNED. It is quite
straightforward, but it cannot deal with negative numbers. For many purposes (e.g.
address generation inside a computer, pixel data for still image and video applications,
etc.) we would never expect to encounter a negative number, and unsigned binary is
perfectly satisfactory. However, for many other purposes we do need to be able to
deal with negative numbers and we therefore need to use a more sophisticated
approach to number representation.
3 Signed numbers: 2s complement

All cars have a device called an odometer that displays how many km (or miles) they
have travelled. On older cars the display was 5 digit, and therefore could represent all
numbers from 00,000 to 99,999. If the odometer display stood at 99,999 at the start of
your journey, then after 1 km it would roll over to say 00,000.
Similarly, if we have a 4-bit binary up-counter and it reads 1111, then the next state in
its count sequence will be 0000. This provides us with an alternative method for the
representation of negative numbers. –1 is the number that is 1 less than zero. In other
words, it is a number which when added to 1 gives zero. But we have just seen that
1111 when added to 1 gives 0000. Similarly, -2 is the number that gives zero when we
add 2 to it. This is 1110. The number system that this generates is shown below:
Number 2s complement representation

-8 1000
-7 1001
-6 1010
-5 1011
-4 1100
-3 1101
-2 1110
-1 1111
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
Thus we interpret the value of an n-bit 2s complement number as
$-d_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} d_i 2^i$
This is exactly the same as an unsigned binary number, except that the msb is
negatively weighted. So, for example, the interpretation of the number 1010 is
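Unsigned binary:
$1010 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0$
$= 1 \times 8 + 0 \times 4 + 1 \times 2 + 0 \times 1$
$= 8 + 2$
$= 10$
2s complement:
$1010 = 1 \times (-2^3) + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0$
$= 1 \times (-8) + 0 \times 4 + 1 \times 2 + 0 \times 1$
$= -8 + 2$
$= -6$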
Note that all negative numbers have an msb of 1 and all positive numbers (and zero)
have an msb of 0. The msb of a 2s complement number is therefore often referred to
as the sign bit.
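Applying the same interpretation to a few more 4-bit patterns shows where the limits of the table come from. This is a worked check using only the weighting rule stated above:

```latex
\begin{aligned}
1111 &= -1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = -8 + 7 = -1 \\
1000 &= -1 \times 2^3 + 0 = -8 \quad \text{(most negative value)} \\
0111 &= 0 + 2^2 + 2^1 + 2^0 = 7 \quad \text{(most positive value)}
\end{aligned}
```

In general an n-bit 2s complement number covers the range $-2^{n-1}$ to $2^{n-1} - 1$, so there is always one more negative value than positive values.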
4 Addition of 2s complement numbers

Suppose we have a circuit that adds numbers according to the normal laws of
unsigned binary arithmetic. So for example, we could use the circuit to add 12 and 2
to get the answer 14
1100 (denary 12)

+ 0010 (denary 2)
= 1110 (denary 14)
What modification would we need to make to the circuit to make it add 2s
complement numbers? If we consider the example 1100 + 0010, then this is the
interpretation:
1100 (denary -4)

+ 0010 (denary 2)
= 1110 (denary -2)
But this is correct without any modification. An adder for unsigned binary works
without modification for 2s complement. This is why 2s complement is the normal
representation format for integer and fixed point numbers in digital systems¹.
The VHDL data type SIGNED uses 2s complement representation.
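The claim that one adder serves both interpretations can be checked in a simulator. The sketch below is illustrative only: the entity name add_demo is hypothetical, and it relies on the SIGNED and UNSIGNED types of the NUMERIC_STD package and on a PROCESS with ASSERT statements, neither of which has been introduced at this point in the notes. It adds the same two bit patterns under both interpretations and checks that the result bits are identical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical self-checking demo; it has no ports.
ENTITY add_demo IS
END ENTITY add_demo;

ARCHITECTURE test OF add_demo IS
BEGIN
  check: PROCESS
    VARIABLE u: UNSIGNED(3 DOWNTO 0);
    VARIABLE s: SIGNED(3 DOWNTO 0);
  BEGIN
    -- The same bit patterns, added under each interpretation
    u := UNSIGNED'("1100") + UNSIGNED'("0010");
    s := SIGNED'("1100") + SIGNED'("0010");

    -- The result bits are the same in both cases ...
    ASSERT STD_LOGIC_VECTOR(u) = STD_LOGIC_VECTOR(s)
      REPORT "bit patterns differ" SEVERITY ERROR;

    -- ... only the interpretation of those bits changes
    ASSERT TO_INTEGER(u) = 14
      REPORT "unsigned result is not 14" SEVERITY ERROR;
    ASSERT TO_INTEGER(s) = -2
      REPORT "signed result is not -2" SEVERITY ERROR;

    WAIT;  -- suspend forever so the simulation can end
  END PROCESS check;
END ARCHITECTURE test;
```

If the simulation runs without printing an error report, both interpretations of 1100 + 0010 agree with the worked examples above.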
5 Using arithmetic: the NUMERIC_STD package

Now let’s look at a simple example to illustrate arithmetic on STD_LOGIC_VECTORs, a
comparator:
[Figure: comparator block with two 4-bit inputs, a and b, and a 1-bit output, g]
¹ In fact there is one small difference between an unsigned binary adder and a 2s complement adder,
and this is related to the treatment of overflow conditions.
It has two inputs, a and b, both of which represent four-bit binary numbers. There is a
single one-bit output g, which represents the “greater than” condition. When a>b then
g=’1’; otherwise g=’0’.
This can’t be interpreted unless we know whether a and b are signed (2’s
complement) or unsigned numbers. Suppose a=1111 and b=0001. If the numbers are
unsigned then a is fifteen and b is one. So a>b and g=1. But if the numbers are signed
then a is minus one and b is plus one; a<b and g=’0’.
The way that VHDL handles this is through the NUMERIC_STD package. This introduces
two new data types, SIGNED and UNSIGNED. These are declared and used in the same
way that STD_LOGIC_VECTORs would be, but they have the additional property that
arithmetic operators +,-,>,< and conversion to integer are defined for them. By
contrast, if we tried to apply + or - to STD_LOGIC_VECTORs this would not be allowed
and would result in a compilation error. (The comparisons > and < do compile for
STD_LOGIC_VECTORs, but they compare the vectors element by element, with no notion
of a sign bit, so they should not be relied on for arithmetic comparison.)
Here is an example piece of code:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;  -- needed for STD_LOGIC_VECTOR
USE ieee.numeric_std.ALL;     -- SIGNED, UNSIGNED and TO_INTEGER
ENTITY numbers IS
END ENTITY numbers;
ARCHITECTURE behavioural OF numbers IS
  SIGNAL x: UNSIGNED(3 DOWNTO 0);
  SIGNAL y: SIGNED(3 DOWNTO 0);
  SIGNAL z: STD_LOGIC_VECTOR(3 DOWNTO 0);
  SIGNAL x1, y1, z1: integer;
BEGIN
  x <= "1010";
  y <= "1010";
  z <= "1010";
  x1 <= TO_INTEGER(x); -- x1 gets the value +10
  y1 <= TO_INTEGER(y); -- y1 gets the value -6
  -- z1 <= TO_INTEGER(z); -- Wrong! Would not compile
END ARCHITECTURE behavioural;
The binary value 1010 when given to an unsigned signal x converts to an integer
value of +10. The same value when given to a signed signal y converts to an integer
value of -6. However, if we try to convert a STD_LOGIC_VECTOR z whose value is
1010, this results in a compilation error. This forces us always to be clear about the
number convention that we are using, and helps us to avoid subtle bugs creeping into
our code.
If we really did want to convert z to an integer, then we would first need to convert
its value to UNSIGNED or SIGNED, and then feed the result into the
TO_INTEGER conversion function:
z1 <= TO_INTEGER(z); -- Wrong! Would not compile
z1 <= TO_INTEGER( UNSIGNED(z) ); -- OK: z1 gets the value +10
z1 <= TO_INTEGER( SIGNED(z) ); -- OK: z1 gets the value -6
So our comparator circuit would be described like this (assuming that we want the
signed interpretation of our binary numbers):
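LIBRARY ieee;
USE ieee.std_logic_1164.ALL;  -- STD_LOGIC
USE ieee.numeric_std.ALL;     -- SIGNED and its > operator
ENTITY comparator IS
  PORT ( a: IN SIGNED(3 DOWNTO 0);
         b: IN SIGNED(3 DOWNTO 0);
         g: OUT STD_LOGIC);
END ENTITY comparator;
ARCHITECTURE behavioural OF comparator IS
BEGIN
  g <= '1' WHEN a>b ELSE '0';
END ARCHITECTURE behavioural;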
Alternatively, we could keep the input signals declared as STD_LOGIC_VECTORs, and
apply the sign interpretation internally within the architecture:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;  -- STD_LOGIC and STD_LOGIC_VECTOR
USE ieee.numeric_std.ALL;     -- SIGNED and its > operator
ENTITY comparator2 IS
  PORT ( a: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         b: IN STD_LOGIC_VECTOR (3 DOWNTO 0);
         g: OUT STD_LOGIC);
END ENTITY comparator2;
ARCHITECTURE behavioural OF comparator2 IS
BEGIN
  g <= '1' WHEN SIGNED(a)>SIGNED(b) ELSE '0';
END ARCHITECTURE behavioural;
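The a=1111, b=0001 case discussed at the start of this section can be checked directly. The following is a minimal sketch, not part of the original design: the entity name compare_demo is hypothetical, and the PROCESS and ASSERT constructs it uses are assumptions at this stage of the notes. It applies both type conversions to the same pair of vectors and checks that the comparison gives opposite answers.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical self-checking demo; it has no ports.
ENTITY compare_demo IS
END ENTITY compare_demo;

ARCHITECTURE test OF compare_demo IS
BEGIN
  check: PROCESS
    CONSTANT a: STD_LOGIC_VECTOR(3 DOWNTO 0) := "1111";
    CONSTANT b: STD_LOGIC_VECTOR(3 DOWNTO 0) := "0001";
  BEGIN
    -- Unsigned view: a is fifteen, b is one, so a > b
    ASSERT TO_INTEGER(UNSIGNED(a)) = 15
      REPORT "unsigned value of a is not 15" SEVERITY ERROR;
    ASSERT UNSIGNED(a) > UNSIGNED(b)
      REPORT "unsigned comparison failed" SEVERITY ERROR;

    -- Signed view: a is minus one, b is plus one, so a < b
    ASSERT TO_INTEGER(SIGNED(a)) = -1
      REPORT "signed value of a is not -1" SEVERITY ERROR;
    ASSERT NOT (SIGNED(a) > SIGNED(b))
      REPORT "signed comparison failed" SEVERITY ERROR;

    WAIT;  -- suspend forever so the simulation can end
  END PROCESS check;
END ARCHITECTURE test;
```

The same bits give g='1' under the unsigned interpretation and g='0' under the signed one, which is why the type conversion has to be written explicitly.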
6 A design example: the Arithmetic Logic Unit

All this may seem like a lot of hard work to produce designs that would have been
easier using old traditional manual design methods. But now let’s take on a more
complicated design that really illustrates the power of VHDL.
Opcode Operation
00 num1 + num2
01 num1 – num2
10 num1 OR num2
11 num1 AND num2
This is an arithmetic logic unit. It has two inputs, num1 and num2, each of which is
16 bits wide. The 16-bit output result is produced by some arithmetic or logical
operation on the two inputs. The operation that will be performed on num1 and num2
to produce result is shown in the table.
To design one of these using manual design methods is a non-trivial task. However, in
VHDL it’s easy. Here is the listing:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;  -- SIGNED, UNSIGNED and their +, -, OR, AND operators
ENTITY alu IS
  PORT ( num1, num2: IN SIGNED(15 DOWNTO 0);
         opcode: IN UNSIGNED(1 DOWNTO 0);
         result: OUT SIGNED(15 DOWNTO 0) );
END ENTITY alu;
ARCHITECTURE dataflow OF alu IS
BEGIN
  result <= num1 + num2 WHEN opcode="00"
       ELSE num1 - num2 WHEN opcode="01"
       ELSE num1 OR num2 WHEN opcode="10"
       ELSE num1 AND num2 WHEN opcode="11";
END ARCHITECTURE dataflow;
In order to test whether our design does what we expect, we can feed it into a
simulator tool. This is a program which allows us to apply inputs to the design, and to
see what outputs would be produced by the design. Here is an example simulation:
The horizontal axis is time, ranging from 0 to 80 ns.

num1 has been given the value “0000000000010001”, which is 0011 in Hex.
num2 has been given the value “0000000000000111”, which is 0007 in Hex.
As the opcode changes through the sequence “00”,”01”,”10”,”11” (or 0,1,2,3 in Hex),
the output result gets num1+num2, then num1-num2, then num1 OR num2 then
num1 AND num2.
Once we have simulated the description thoroughly and are sure that it correctly gives
the behaviour we want, the code can then be fed into a synthesis tool, a computer
program which will automatically generate a gate-level design.
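The simulation described above can also be written as a self-checking test. The sketch below is an assumption-laden illustration rather than the test used in the original notes: the name alu_tb is hypothetical, it assumes the alu entity has been compiled into the work library with the NUMERIC_STD package visible, and it uses PROCESS, WAIT FOR and ASSERT, which are only covered later. It applies num1 = 17 (0011 in Hex) and num2 = 7, steps the opcode every 20 ns, and checks each result.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical test bench; assumes work.alu(dataflow) is already compiled.
ENTITY alu_tb IS
END ENTITY alu_tb;

ARCHITECTURE test OF alu_tb IS
  SIGNAL num1, num2, result: SIGNED(15 DOWNTO 0);
  SIGNAL opcode: UNSIGNED(1 DOWNTO 0);
BEGIN
  dut: ENTITY work.alu(dataflow)
    PORT MAP (num1 => num1, num2 => num2,
              opcode => opcode, result => result);

  stimulus: PROCESS
  BEGIN
    num1 <= TO_SIGNED(17, 16);   -- 0011 in Hex
    num2 <= TO_SIGNED(7, 16);    -- 0007 in Hex

    opcode <= "00";              -- add
    WAIT FOR 20 NS;
    ASSERT result = 24 REPORT "add failed" SEVERITY ERROR;

    opcode <= "01";              -- subtract
    WAIT FOR 20 NS;
    ASSERT result = 10 REPORT "subtract failed" SEVERITY ERROR;

    opcode <= "10";              -- bitwise OR: 0011 OR 0007 = 0017 in Hex
    WAIT FOR 20 NS;
    ASSERT result = 23 REPORT "OR failed" SEVERITY ERROR;

    opcode <= "11";              -- bitwise AND: 0011 AND 0007 = 0001 in Hex
    WAIT FOR 20 NS;
    ASSERT result = 1 REPORT "AND failed" SEVERITY ERROR;

    WAIT;                        -- stop after 80 ns
  END PROCESS stimulus;
END ARCHITECTURE test;
```

Writing the expected values as assertions means that a broken design produces an error message, instead of relying on someone to spot a wrong number in the waveform.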
You should now understand the meaning and purpose of the following:

• The VHDL types SIGNED and UNSIGNED
• The package NUMERIC_STD
• That we should not apply the arithmetic operations +, -, >, <, etc. directly to a simple
STD_LOGIC_VECTOR; we must first provide a type conversion to SIGNED or
UNSIGNED.
Unit 2.1
Dataflow and Structural VHDL
There are two fundamentally different ways that we can describe a design:
• Behavioural descriptions tell us what the design should do but not how we would
make it
• Structural descriptions tell us how we would make it but not what it would do
In this unit, we will look at how a behavioural design (a 4-bit adder) might be
transformed by a synthesis tool into a structural netlist of logic gates.
As we do this, we will learn more about how to write behavioural and structural
VHDL. There are several different approaches to writing behavioural VHDL. The
easiest to understand is dataflow, a type of description where we build up the required
behaviour as a set of arithmetic and logical operations on data items.
1 Behavioural description versus structural description

Behavioural descriptions tell us what the design should do but not how we would
make it. So this is a behavioural description:
sum <= x+y;
By contrast, this is a structural description.
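[Figure: four 1-bit full adders (+) chained together. Stage i has inputs x(i) and y(i) and output sum(i), for i = 0 to 3; carry in enters stage 0, each stage passes its carry to the next, and carry out leaves stage 3.]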
A structural description tells us how we would connect together several simpler units
to make a more complicated unit. In this case our simple units are full adders, and the
complicated unit is the 4-bit adder.
If we are going to build our circuits out of basic logic gates, then the above description
isn't quite finished. Although we have broken down our complicated (4-bit) unit into
simpler (1-bit) units, we still haven't shown how to make the full adders out of gates.
By contrast, in the description below everything is resolved into the most basic
building blocks we have (in this case logic gates).
This design is a special case. It is a netlist, a description that consists solely of the
interconnection of basic building blocks that are available to implement the design. A
netlist contains sufficient detail that it is immediately obvious how to build the device.
2 Example of transforming a high level description to a netlist

Imagine that we want to create a 4-bit adder:
[Figure: adder block with two 4-bit inputs, x and y, and a 4-bit output, sum]
A human would normally describe this behaviourally:
LIBRARY ieee;
USE ieee.numeric_std.ALL;  -- SIGNED and its + operator
ENTITY adder IS
  PORT ( x, y: IN SIGNED(3 DOWNTO 0);
         sum: OUT SIGNED (3 DOWNTO 0) );
END ENTITY adder;
ARCHITECTURE behavioural OF adder IS
BEGIN
  sum <= x + y;
END ARCHITECTURE behavioural;
The above code has all the advantages of a behavioural description. The code is
concise; you can easily see at a glance what function it performs; it contains no
detailed decisions about logic gates. This code would then be given to a synthesis
tool, a computer program whose purpose is to design a circuit that fulfils the required
behaviour.
2.1 Implementing the adder function
Here is a circuit that the synthesis tool might create in order to fulfil our 4-bit adder
requirement:
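[Figure: four 1-bit full adders (+) chained together. Stage i has inputs x(i) and y(i) and output sum(i), for i = 0 to 3; carry in enters stage 0, each stage passes its carry to the next, and carry out leaves stage 3.]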
The four-bit adder is built up from four 1-bit full adders, which have the following
behaviour:
[Figure: full adder block (+) with inputs x, y and carry in, and outputs sum and carry out]
x   y   Carry in   Sum   Carry out
0   0   0          0     0
0   0   1          1     0
0   1   0          1     0
0   1   1          0     1
1   0   0          1     0
1   0   1          0     1
1   1   0          0     1
1   1   1          1     1
There are many ways to implement this. One possible way is shown below.
[Figure: gate-level full adder circuit with inputs x, y and carry in, and outputs sum and carry out]
In the remainder of this lecture, we will look in detail at the full adder circuit, and use
it to illustrate the features of dataflow VHDL. In the next lecture we will look at how
to connect together the full adders and create a structural 4-bit adder. We will also
look at how to feed the 4-bit adder circuit into a simulator.
3 A dataflow description of the full adder

Suppose we want to write a behavioural description of the full adder. In order to do
this, we must explain how we want the outputs of the design to relate to the inputs.
One way is simply to use the truth table to describe the device, like this:
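LIBRARY ieee;
USE ieee.std_logic_1164.ALL;  -- STD_LOGIC
ENTITY fulladd IS
  PORT ( x, y, cin: IN STD_LOGIC;
         sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;
ARCHITECTURE tedious_but_easy OF fulladd IS
BEGIN
  -- One branch per row of the truth table
  sum <= '0' WHEN x='0' AND y='0' AND cin='0'
    ELSE '1' WHEN x='0' AND y='0' AND cin='1'
    ELSE '1' WHEN x='0' AND y='1' AND cin='0'
    ELSE '0' WHEN x='0' AND y='1' AND cin='1'
    ELSE '1' WHEN x='1' AND y='0' AND cin='0'
    ELSE '0' WHEN x='1' AND y='0' AND cin='1'
    ELSE '0' WHEN x='1' AND y='1' AND cin='0'
    ELSE '1' WHEN x='1' AND y='1' AND cin='1';
  cout <= '0' WHEN x='0' AND y='0' AND cin='0'
    ELSE '0' WHEN x='0' AND y='0' AND cin='1'
    ELSE '0' WHEN x='0' AND y='1' AND cin='0'
    ELSE '1' WHEN x='0' AND y='1' AND cin='1'
    ELSE '0' WHEN x='1' AND y='0' AND cin='0'
    ELSE '1' WHEN x='1' AND y='0' AND cin='1'
    ELSE '1' WHEN x='1' AND y='1' AND cin='0'
    ELSE '1' WHEN x='1' AND y='1' AND cin='1';
END tedious_but_easy;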
This is easy and obvious, but also tedious. There are many neater ways to describe the
behaviour. We could take inspiration from the gate level design of the full adder, and
write this
ARCHITECTURE simple OF fulladd IS
BEGIN
  sum <= cin XOR x XOR y;
  cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin );
END ARCHITECTURE simple;
This is much neater and nicer, but requires us to think a bit harder about how the
outputs relate to the inputs.
It’s important to realise that as far as a synthesis tool is concerned, both descriptions
are the same thing. They simply say how the outputs relate to the inputs. The second
architecture is not ordering the synthesis tool to use two XOR gates, 3 AND gates and
an OR gate. It’s simply a shorthand for saying how the output relates to the inputs.
The synthesis tool is free to do whatever it wants to find a circuit that has the same
input-output relation.
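One way to convince yourself that the Boolean equations describe the same input-output relation as the truth table is to try all eight input combinations. The sketch below is illustrative only: the entity name fulladd_check is hypothetical, and it uses a PROCESS with loops and ASSERT statements, which have not yet been introduced. It compares the equations against the arithmetic meaning of a full adder, i.e. the number of inputs that are '1'.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical self-checking demo; it has no ports.
ENTITY fulladd_check IS
END ENTITY fulladd_check;

ARCHITECTURE test OF fulladd_check IS
BEGIN
  check: PROCESS
    TYPE level_table IS ARRAY (0 TO 1) OF STD_LOGIC;
    CONSTANT level: level_table := ('0', '1');
    VARIABLE x, y, cin, sum, cout: STD_LOGIC;
    VARIABLE ones: INTEGER;
  BEGIN
    FOR i IN 0 TO 1 LOOP
      FOR j IN 0 TO 1 LOOP
        FOR k IN 0 TO 1 LOOP
          x   := level(i);
          y   := level(j);
          cin := level(k);

          -- The equations from the "simple" architecture
          sum  := cin XOR x XOR y;
          cout := (x AND y) OR (cin AND x) OR (y AND cin);

          -- How many of the three inputs are '1'
          ones := i + j + k;

          -- sum is the low bit of the count, cout is the high bit
          ASSERT (sum = '1') = ((ones MOD 2) = 1)
            REPORT "sum does not match the truth table" SEVERITY ERROR;
          ASSERT (cout = '1') = (ones >= 2)
            REPORT "cout does not match the truth table" SEVERITY ERROR;
        END LOOP;
      END LOOP;
    END LOOP;
    WAIT;  -- suspend forever so the simulation can end
  END PROCESS check;
END ARCHITECTURE test;
```

If no error is reported, the equations agree with every row of the truth table, which is all that the synthesis tool cares about.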
3.1 Local signals
Now let’s look at a slight modification of our description.
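[Figure: the gate-level full adder circuit with inputs x, y and cin, outputs sum and cout, and its internal nodes labelled n1, n2, n3 and n4]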
We have given names to the internal nodes of the circuit (n1, n2, n3, n4). Once we
have given them names, we are free to use them in our description. So here is a
slightly different description
ARCHITECTURE number3 OF fulladd IS
  SIGNAL n1, n2, n3, n4: STD_LOGIC;
BEGIN
  n1 <= x XOR y;
  sum <= cin XOR n1;
  n2 <= x AND y;
  n3 <= cin AND x;
  n4 <= y AND cin;
  cout <= n2 OR n3 OR n4;
END ARCHITECTURE number3;
This is basically the same as the simple architecture of fulladd, but this time we have
used the local signals n1, n2, n3 and n4 as part of the description. In order to use the
names, we have to declare that they exist, that they are signals, and that they carry
logic values (e.g. ‘1’, ‘0’, ‘X’ and ‘U’) which means that they are of type
STD_LOGIC. The declaration of local signals takes place between the
ARCHITECTURE statement and the first BEGIN.
4 Connecting entities together: structural VHDL

Now let’s look at how to use simple designs as building blocks to make more
complex designs. We’ll do this by building a structural description of a 4-bit adder. A
structural description says how we connect together simple components to make a
more complicated component. We have already designed a full adder in an entity
fulladd and an architecture called dataflow. When we compiled this code, it was
placed into a library ready to be used by other designs. By default, the current
working library is called work. So our full adder is stored with the following name:
work.fulladd(dataflow)
The name is constructed from the library name followed by a point, then the entity
name, then the architecture name.
4.1 Placing library components into a design
We can now build up a 4-bit adder structurally as follows:
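[Figure: four 1-bit full adders (+) chained together. Stage i has inputs x(i) and y(i) and output sum(i), for i = 0 to 3; cin enters stage 0; the internal wires carry(1), carry(2) and carry(3) link each stage to the next; cout leaves stage 3.]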
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;  -- STD_LOGIC and STD_LOGIC_VECTOR
ENTITY adder IS
  PORT ( x, y: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         cin: IN STD_LOGIC;
         sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
         cout: OUT STD_LOGIC);
END ENTITY adder;
ARCHITECTURE structural OF adder IS
  SIGNAL carry: STD_LOGIC_VECTOR(4 DOWNTO 0);
BEGIN
  g0: ENTITY work.fulladd(dataflow)
    PORT MAP (x(0),y(0),cin,sum(0),carry(1));
  g1: ENTITY work.fulladd(dataflow)
    PORT MAP (x(1),y(1),carry(1),sum(1),carry(2));
  g2: ENTITY work.fulladd(dataflow)
    PORT MAP (x(2),y(2),carry(2),sum(2),carry(3));
  g3: ENTITY work.fulladd(dataflow)
    PORT MAP (x(3),y(3),carry(3),sum(3),cout);
END ARCHITECTURE structural;
Each of the components that we have used is defined by a statement providing
• a name for the component¹ (I’ve chosen g0 … g3, but I could have chosen any
name I like)
• the keyword ENTITY
• the full name of the gate that I want to use
• the keyword PORT MAP
• a list of the wires that I am connecting to the inputs and outputs of the gate
In the jargon of VHDL, each of these statements is called an instantiation. We have
created four instances of the fulladd component and connected them together
appropriately.
The names appearing in the port maps are the names of the wires. Wherever the same
name occurs in the output list of one component and in the input list of another, that
means that there is a wire connecting these two components. So, for example,
carry(2) is a wire connecting the output of g1 to the input of g2.
¹ Strictly speaking, g1 is a statement label, but you can think of it as just providing a name for the gate.
4.2 Positional association
How does VHDL know which of the wires I am connecting to g0 are inputs and
which are outputs? If we compare the instantiation
g0: ENTITY work.fulladd(dataflow)
  PORT MAP (x(0),y(0),cin,sum(0),carry(1));
with the definition of the full adder
ENTITY fulladd IS
PORT ( x, y, cin: IN STD_LOGIC; sum, cout: OUT STD_LOGIC);
END ENTITY fulladd;
We see that the first three signals in the port map are inputs and the last two are
outputs. So the first three signals in the instantiation x(0), y(0) and cin will be attached
to the inputs x, y and cin. Similarly, sum(0) will be connected to sum and carry(1) will
be connected to cout. This is called positional association.
4.3 Named association

If you prefer, you can explicitly tell VHDL how you want to connect up the wires in
your design to the inputs and outputs of the gate, like this:
g0: ENTITY work.fulladd(dataflow)
  PORT MAP (x=>x(0), y=>y(0), cin=>cin, sum=>sum(0), cout=>carry(1));
This is called named association. With named association, the order doesn’t matter, so
you could jumble up the order of the signals and write the instantiation like this:
g0: ENTITY work.fulladd(dataflow)
  PORT MAP (cout=>carry(1), x=>x(0), y=>y(0), cin=>cin,
            sum=>sum(0));
5 Summary
A behavioural description says what a design should do. Dataflow is a type of
behavioural description that relates the outputs to inputs using logical or arithmetic
assignments.
A structural description says how we construct a design from the composition of
simpler units. A structural description can be
• Hierarchic: made up from simpler units, which themselves then need to be
designed
• Netlist: made up from fundamental building blocks (e.g. logic gates)

• Dataflow VHDL
• Local signals
• The WORK library
• Instantiation
• Positional association and named association
Unit 2.2
VHDL Simulation
A simulator is a software tool that takes a proposed design (which could be
behavioural or structural or a mixture of both) and predicts what outputs would result
from a given set of input transitions. Simulation is one of the main methods of verifying
that a proposed design has the required behaviour.
1 How are statements processed?

Before we look at how a simulator works, let’s think about the thought process that a
human would go through to figure out how a circuit would respond to transitions in its
inputs.
Let’s look again at the full adder circuit, and for the sake of clarity, we will now give
names (g1…g6) to the gates.
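[Figure: the full adder circuit with named gates. g1 (XOR of x and y) drives n1; g2 (XOR of cin and n1) drives sum; g3, g4 and g5 (AND gates) drive n2, n3 and n4; g6 (OR of n2, n3 and n4) drives cout.]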
Imagine that the signals x, y and cin are initially at zero. Looking through the circuit,
we can see that n1, n2, n3, n4, sum and cout will all be at zero.
Now imagine that x changes its value from 0 to 1. Let’s think through what happens
next:
• x is the input to three gates: g1, g3 and g4. These gates are potentially affected by
the change, so we need to re-compute their outputs n1, n2, n3.
• We also know that gates g2, g5 and g6, which don’t have x as an input, can’t be
affected by this change, so there is no point in re-computing their outputs.
• The new value of n1 is 1 (i.e. it changed)
• The new value of n2 is 0 (i.e. it is unchanged)
• The new value of n3 is 0 (i.e. it is unchanged)
• n1 just changed, which means that any gate that has n1 as an input (i.e. g2) needs
to have its output (sum) re-computed.
• n2 and n3 didn’t change, so we don’t need to bother to examine any consequences
in gate g6, which has n2 and n3 as inputs.
• The new value of sum is 1.
• There are no more gates whose inputs have changed, so we can stop analysing the
circuit now.
2 Simulation of dataflow VHDL

The way that VHDL processes statements during simulation tries to capture the above
thought process.
BEGIN
1 n1 <= x XOR y;
2 sum <= cin XOR n1;
3 n2 <= x AND y;
4 n3 <= cin AND x;
5 n4 <= y AND cin;
6 cout <= n2 OR n3 OR n4;
All statements 1-6 are scanned simultaneously, waiting for a signal on the right hand
side (RHS) to change. In the jargon of VHDL, a change to a signal is called an event.
A VHDL simulation proceeds by manipulating an event queue. If we assume that all
signals are initially at 0, then the event queue initially looks like this:
Time = 0
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  0      0      0      0      0      0      0      0      0
Next value:     1
Event time:     10
It has a list of the present value for each signal, any new value that has been scheduled
to take place in future, and the time at which the signal must assume this new value.
In this case the next event is that x will transition from ‘0’ to ‘1’ at time 10 ns.
Once the event queue is set up, the simulator proceeds by looking down the event
queue to find the time of the next pending event. It then jumps forward to the time of
the next event (10 ns), giving x its new value. An event has just occurred on signal x.
This triggers execution of all of the statements that have x on the RHS:
1 n1 <= x XOR y; Gives new value of 1

3 n2 <= x AND y; Gives new value of 0
4 n3 <= cin AND x; Gives new value of 0
If the new value is different from the old value, it is placed on the event queue. The
queue now looks like this.
Time = 10
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  1      0      0      0      0      0      0      0      0
Next value:                          1
Event time:                          10+Δ
Δ is an infinitesimal time interval. In VHDL, a signal assignment can never happen
instantaneously. (If it could, this would be equivalent to gate g1 having a zero delay,
which is impossible in real hardware and would give rise to situations where the
simulation would not faithfully reproduce the behaviour of real hardware.)
The simulator now looks down the event queue to find the next scheduled event.
There is only one item on the queue, the 0 to 1 transition on n1 at time 10+Δ. The time
pointer is incremented to 10+Δ and n1 takes its new value. Because an event has
occurred on n1, any statement with n1 on the RHS is triggered:
2 sum <= cin XOR n1; Gives new value of 1
Time = 10+Δ
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  1      0      0      1      0      0      0      0      0
Next value:                                                      1
Event time:                                                      10+2Δ
The simulator now looks down the event queue to find the time of the next scheduled
event, i.e. 10+2Δ. The time pointer is incremented to 10+2Δ and sum takes its new
value.
Time = 10+2Δ
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  1      0      0      1      0      0      0      1      0
Next value:
Event time:
There are no statements with sum on the RHS, so no further statements are triggered.
The simulator now looks down the queue to find the next scheduled event. There are
none, so simulation terminates.
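The step-by-step behaviour of the event queue can be observed in a simulator. The sketch below is a hedged illustration, not part of the original notes: the entity name delta_demo is hypothetical, and it uses a PROCESS with WAIT statements, which are introduced later. It relies on the fact that WAIT FOR 0 NS suspends a process for exactly one infinitesimal step, so the stimulus process can look at the signals after each step. Because x is itself assigned by a process here, it changes one step after the assignment, and n1 and sum follow one and two steps after that.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical self-checking demo; it has no ports.
ENTITY delta_demo IS
END ENTITY delta_demo;

ARCHITECTURE test OF delta_demo IS
  -- All signals start at '0', as assumed in the walkthrough above
  SIGNAL x, y, cin: STD_LOGIC := '0';
  SIGNAL n1, n2, n3, n4, sum, cout: STD_LOGIC := '0';
BEGIN
  -- The six concurrent statements of the full adder, with no delays
  n1   <= x XOR y;
  sum  <= cin XOR n1;
  n2   <= x AND y;
  n3   <= cin AND x;
  n4   <= y AND cin;
  cout <= n2 OR n3 OR n4;

  stimulus: PROCESS
  BEGIN
    WAIT FOR 10 NS;
    x <= '1';        -- schedules the 0 to 1 transition on x

    WAIT FOR 0 NS;   -- one step later: x has changed, n1 has not
    ASSERT x = '1' AND n1 = '0'
      REPORT "unexpected state after x changed" SEVERITY ERROR;

    WAIT FOR 0 NS;   -- one more step: n1 has changed, sum has not
    ASSERT n1 = '1' AND sum = '0'
      REPORT "unexpected state after n1 changed" SEVERITY ERROR;

    WAIT FOR 0 NS;   -- one more step: sum has changed
    ASSERT sum = '1' AND cout = '0'
      REPORT "unexpected state after sum changed" SEVERITY ERROR;

    -- The infinitesimal steps do not advance the simulation clock
    ASSERT NOW = 10 NS
      REPORT "simulation time moved unexpectedly" SEVERITY ERROR;

    WAIT;            -- nothing left on the queue
  END PROCESS stimulus;
END ARCHITECTURE test;
```

The final assertion shows why the waveform viewer displays all three changes at 10 ns: the infinitesimal steps order the events but take no measurable time.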
3 Concurrent processing
Now we come to a very important point. Consider these two descriptions of the full
adder:

BEGIN
n1 <= x XOR y;
sum <= cin XOR n1;
n2 <= x AND y;
n3 <= cin AND x;
n4 <= y AND cin;

BEGIN
sum <= cin XOR n1;
n1 <= x XOR y;
n2 <= x AND y;
n3 <= cin AND x;
n4 <= y AND cin;
Although they are written in a different order, they do exactly the same thing. Unlike
programming languages such as C, which process lines in the order that they are
written, VHDL normally monitors all statements at the same time, and executes a
statement when one of its RHS values changes. This is called concurrent execution.
The style of VHDL that uses concurrent assignments through arithmetic or Boolean
operators is called dataflow VHDL.
4 Components with delays

Real components have delays. In the simulation of the full adder, we didn’t know
what the delay was, so the simulator used its default value of Δ. Δ is the smallest
interval of time that the simulator can deal with. You can think of Δ as meaning “a
moment” or “an instant”.
4.1 The AFTER keyword

Suppose we did some measurements on the gates that we have available to build our
system, and we find that the NAND gate has a delay of 10 ns. We could modify our
description of the gate to this:
ARCHITECTURE with_timing_info OF nandgate IS

BEGIN
c <= a NAND b AFTER 10 NS;
END ARCHITECTURE with_timing_info;
Now, when a or b change, the value of c will be re-computed, but c will not get its
new value until 10 ns after the change in the input. VHDL knows about the following
units of time:
Unit   Name          Meaning
PS     picosecond    $10^{-12}$ seconds
NS     nanosecond    $10^{-9}$ seconds
US     microsecond   $10^{-6}$ seconds
MS     millisecond   $10^{-3}$ seconds
S      second
4.2 The full adder example with component delays

So let’s re-write the description with some real component delays, and see how
simulation proceeds:
ARCHITECTURE delays OF fulladd IS

BEGIN
1 n1 <= x XOR y AFTER 10 NS;
2 sum <= cin XOR n1 AFTER 10 NS;
3 n2 <= x AND y AFTER 7 NS;
4 n3 <= cin AND x AFTER 7 NS;
5 n4 <= y AND cin AFTER 7 NS;
6 cout <= n2 OR n3 OR n4 AFTER 8 NS;
END ARCHITECTURE delays;
Imagine that the signals x, y and cin are initially at zero, so n1, n2, n3, n4, sum and
cout are also initially at zero. At time 10 ns x will go to one. The event queue initially
looks like this:
Time = 0
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  0      0      0      0      0      0      0      0      0
Next value:     1
Event time:     10
Once the event queue is set up, the simulator looks down the event queue to find the
time of the next scheduled event. It then jumps forward to the time of the next event
(10 ns), giving x its new value. An event has just occurred on signal x. This triggers
execution of all of the statements that have x on the RHS:
1 n1 <= x XOR y AFTER 10 NS; Gives new value of 1

3 n2 <= x AND y AFTER 7 NS; Gives new value of 0
4 n3 <= cin AND x AFTER 7 NS; Gives new value of 0
Time = 10
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  1      0      0      0      0      0      0      0      0
Next value:                          1
Event time:                          20
The simulator now looks down the event queue to find the next scheduled event.
There is only one item on the queue, the 0 to 1 transition on n1 at time 20. The time is
incremented to 20 and n1 takes its new value. Because an event has occurred on n1,
any statement with n1 on the RHS is triggered:
2 sum <= cin XOR n1 AFTER 10 NS; Gives new value of 1
Time = 20
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  1      0      0      1      0      0      0      0      0
Next value:                                                      1
Event time:                                                      30
The simulator now looks down the event queue to find the time of the next scheduled
event, i.e. 30. The time pointer is incremented to 30 and sum takes its new value.
Time = 30
Signal name:    x      y      cin    n1     n2     n3     n4     sum    cout
Present value:  1      0      0      1      0      0      0      1      0
Next value:
Event time:
There are no statements with sum on the RHS, so no further statements are triggered.
The queue is now empty, so simulation terminates.
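The timeline above (x at 10 ns, n1 at 20 ns, sum at 30 ns) can be checked with a small self-checking simulation. This is a sketch under stated assumptions: the entity name delay_demo is hypothetical, the PROCESS, WAIT FOR and ASSERT constructs are introduced later, and the sampling times of 19, 21 and 31 ns are chosen simply to fall just before and just after each expected event.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical self-checking demo; it has no ports.
ENTITY delay_demo IS
END ENTITY delay_demo;

ARCHITECTURE test OF delay_demo IS
  -- All signals start at '0', as assumed in the walkthrough above
  SIGNAL x, y, cin: STD_LOGIC := '0';
  SIGNAL n1, n2, n3, n4, sum, cout: STD_LOGIC := '0';
BEGIN
  -- The full adder statements with the component delays from the text
  n1   <= x XOR y AFTER 10 NS;
  sum  <= cin XOR n1 AFTER 10 NS;
  n2   <= x AND y AFTER 7 NS;
  n3   <= cin AND x AFTER 7 NS;
  n4   <= y AND cin AFTER 7 NS;
  cout <= n2 OR n3 OR n4 AFTER 8 NS;

  stimulus: PROCESS
  BEGIN
    WAIT FOR 10 NS;
    x <= '1';                       -- x goes to one at 10 ns

    WAIT FOR 9 NS;                  -- now at 19 ns
    ASSERT n1 = '0'
      REPORT "n1 changed before 20 ns" SEVERITY ERROR;

    WAIT FOR 2 NS;                  -- now at 21 ns
    ASSERT n1 = '1' AND sum = '0'
      REPORT "expected n1 high and sum low at 21 ns" SEVERITY ERROR;

    WAIT FOR 10 NS;                 -- now at 31 ns
    ASSERT sum = '1' AND cout = '0'
      REPORT "expected sum high and cout low at 31 ns" SEVERITY ERROR;

    WAIT;                           -- queue is empty, simulation ends
  END PROCESS stimulus;
END ARCHITECTURE test;
```

Sampling either side of each expected transition checks the delay itself as well as the final values.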
5 Simulation of structural VHDL
For the sake of clarity, let’s look again at the structural description, and highlight
which of the signals are inputs:
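[Figure: four 1-bit full adders (+) chained together. Stage i has inputs x(i) and y(i) and output sum(i), for i = 0 to 3; cin enters stage 0; the internal wires carry(1), carry(2) and carry(3) link each stage to the next; cout leaves stage 3.]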
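ARCHITECTURE structural OF adder IS
  SIGNAL carry: STD_LOGIC_VECTOR(4 DOWNTO 0);
BEGIN
  -- In each port map the first three signals are the inputs
  g0: ENTITY work.fulladd(dataflow)
    PORT MAP (x(0),y(0),cin, sum(0),carry(1));
  g1: ENTITY work.fulladd(dataflow)
    PORT MAP (x(1),y(1),carry(1),sum(1),carry(2));
  g2: ENTITY work.fulladd(dataflow)
    PORT MAP (x(2),y(2),carry(2),sum(2),carry(3));
  g3: ENTITY work.fulladd(dataflow)
    PORT MAP (x(3),y(3),carry(3),sum(3),cout);
END ARCHITECTURE structural;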
The way that this is handled by a VHDL simulator is as follows:
• All statements g0-g3 are scanned simultaneously, waiting for an event on an input
signal.
• If one of the inputs to a full adder (the first three signals in its port map) changes,
then the output for that full adder is recomputed.
6 Summary
Simulation advances through time updating signal values according to assignments.
Dataflow VHDL consists of a series of Boolean/arithmetic assignment statements.
These statements are concurrent: all are active at the same time; a statement is
triggered to re-evaluate its left-hand side value when any right-hand side value
changes. Structural VHDL consists of instantiations of library elements, which
operate concurrently. When an input to an instance changes, the new outputs are
evaluated.

• Event
• Event queue
• Concurrent execution
• The AFTER keyword
Unit 2.3
VHDL Processes and Test Benches
In this lecture we will look at how to write blocks of VHDL that are interpreted
sequentially (as opposed to the concurrent behaviour that we have seen so far). This is
done by using a VHDL process. Sequential VHDL has many applications, but in this
lecture we will illustrate its use in setting up simulations that can be used to test out our
designs before we feed them to a synthesis tool.
1 Sequential VHDL: PROCESSes

We have already seen that the default behaviour of VHDL is that all statements within
the body of an architecture are processed concurrently:
• all statements are active at the same time;
• statements will re-evaluate their left-hand side (their outputs) when a signal on their
right-hand side (their inputs) changes.
Sometimes it is useful to have blocks of VHDL where the lines of code are executed
sequentially. This is done by means of a VHDL PROCESS. A process looks like this
PROCESS
BEGIN
Statement 1;
Statement 2;
Statement 3;
END PROCESS;
When the PROCESS executes, it runs each statement sequentially, i.e. Statement 1 first,
Statement 2 second and so on. When the process reaches the END PROCESS statement, it
wraps back round to the BEGIN and starts all over again. Because of this behaviour, the
process shown above would in fact be an infinite loop running in zero time, which will
never be useful. So we need to give additional information to a PROCESS to tell it when it
should run and when it should suspend its execution. One way to do this is by means of
a WAIT statement.
2 The WAIT FOR statement

Here is an example of a WAIT statement in a process that is used to generate a clock
signal:
PROCESS
BEGIN
clock <= '1';
WAIT FOR 10 NS;
clock <= '0';
WAIT FOR 10 NS;
END PROCESS;
When simulation is carried out, the process starts running immediately at time 0. When
it gets to the WAIT statement, it is suspended. After 10 ns of simulation time has gone
by, the process resumes execution until it hits the next WAIT statement. After another 10
ns has elapsed, the process resumes, reaches the END PROCESS, wraps back round to the
BEGIN and continues execution. The resulting behaviour is as follows:
The process resumes every 10 ns, and the clock signal toggles between 0 and 1.
A process can be used anywhere that it would be legitimate to use a line of concurrent
code. If we use multiple processes within an architecture, then the processes operate
concurrently with one another. The processes also operate concurrently with any lines
of concurrent code within the architecture.
3 Processes with sensitivity lists

Another way to tell a process when to run is to use a sensitivity list, so the process looks
like this:
PROCESS ( sensitivity list )

BEGIN
Statement 1;
Statement 2;
END PROCESS;
The sensitivity list is a list of signals. The way this works is as follows:
• The process waits until a signal in its sensitivity list changes.
• When a signal on the sensitivity list changes, the process starts executing. It runs each
of the statements in its body sequentially, i.e. one after the other, first statement 1,
then statement 2, and so on.
Suppose, for example, that we want to describe our full adder using a process:
[Figure: gate-level full adder circuit with inputs x, y and carry in, and outputs sum and carry out]
This circuit will need to re-compute its outputs whenever an input changes. So the
sensitivity list should be x, y, cin. The process body will describe how to compute the
new outputs in response to an input change.
ARCHITECTURE using_process OF fulladd IS

BEGIN
PROCESS (x, y, cin)
BEGIN
cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin );
sum <= cin XOR x XOR y;
END PROCESS;
END ARCHITECTURE using_process;
Whenever x, y or cin change, the process will execute (sequentially) and compute new
values for cout and sum. The process will then suspend until the next time x, y or cin
change.
4 Sequential and concurrent VHDL

Inside a process, the lines of code are interpreted sequentially: the order in which they
are executed is the same as the order in which they are written. Outside of a process,
code is interpreted concurrently: statements run whenever a RHS value (or input value)
changes.
Sequential code “flows” from one line to the next, in blocks of code. This means that
we can write complicated sequences of statements that build up a required behaviour
over many lines. This is a very powerful way of doing things, and corresponds fairly
closely to the way that programming languages such as C work.
In concurrent VHDL, by contrast, each statement is completely independent of
neighbouring statements, so each statement must be self-contained. Whatever behaviour
we are trying to describe for a particular signal has to be bunched together into a single
statement of code. This can make description of certain types of behaviour difficult or
even impossible.
There are many constructs of a language that make sense if there is a flow of code
from one line to the next, but make no sense at all if each statement is
independent of its neighbours. So in sequential VHDL there are many additional features
of the language that we can use that are not available in concurrent VHDL.
Wrapping up a description in a process gives us the opportunity to write sequential
code, which makes many additional features of the language available to us. Used
wisely, this can be a good thing. However, it does give us the ability to write VHDL
that cannot be synthesized to hardware, or may synthesize very inefficiently. By
contrast, the dataflow approach forces us to write code that will synthesize nicely.
4.1 Sequential and concurrent conditionals

The syntax of the sequential IF block is shown below (see footnote 6):
IF condition_1 THEN
  sequence of statements;
ELSIF condition_2 THEN
  sequence of statements;
ELSE
  sequence of statements;
END IF;
Notice that this assumes a sequential flow of control from one statement to the next. So
the IF block can only be used inside a process. Using an IF block outside of a process is
an error and the code will not compile.
In concurrent code, each line stands alone and is triggered into life by a change on its
RHS. So in order to achieve conditional assignment in a piece of concurrent code, we
need a version of the IF statement that bundles all the functionality into one (possibly
quite long) line of code. This is the WHEN statement.
a <= value1 WHEN condition1
  ELSE value2 WHEN condition2
  ELSE value3;
Using a WHEN statement inside a process is an error and the code will not compile (see footnote 7).
Footnote 6: Notice that ELSIF (one word) is not the same thing as ELSE IF (two words).
Footnote 7: This annoyed users so much that a change was made in VHDL-2008 so that WHEN statements can be
used inside processes. However, most tools default to VHDL-93, so to make a WHEN statement work
inside a process you would need to set the compiler option to VHDL-2008.
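To make the correspondence between the two conditionals concrete, here is a minimal sketch that describes the same 2-way selection twice: once with a concurrent WHEN statement and once with an IF block inside a process. The entity sel2_sketch and all of its port names are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity with two outputs so both styles can sit side by side.
ENTITY sel2_sketch IS
  PORT ( a, b, pick_a  : IN  STD_LOGIC;
         z_conc, z_seq : OUT STD_LOGIC );
END ENTITY sel2_sketch;

ARCHITECTURE both_styles OF sel2_sketch IS
BEGIN
  -- Concurrent version: one self-contained WHEN statement
  z_conc <= a WHEN pick_a = '1' ELSE b;

  -- Sequential version: an IF block, legal only inside a process
  PROCESS (a, b, pick_a)
  BEGIN
    IF pick_a = '1' THEN
      z_seq <= a;
    ELSE
      z_seq <= b;
    END IF;
  END PROCESS;
END ARCHITECTURE both_styles;
```

In simulation the two outputs should always agree, since both descriptions are re-evaluated whenever a, b or pick_a changes.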
4.2 Sequential and concurrent selection
In order to illustrate the selection operator, consider the example of a 4-input
multiplexer. The output y is connected to one of the four data inputs according to the
value applied at the address input. If address is “00” then data(0) is selected through to the
output y; if address is “01” then data(1) is selected, and so on.
[Figure: 4-to-1 multiplexer (MUX) with data inputs data(0) to data(3), select inputs address(0) and address(1), and output y.]
The ENTITY declaration for this device is as follows:
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY mux4to1 IS
PORT ( address: IN STD_LOGIC_VECTOR(1 DOWNTO 0);
data: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
y: out STD_LOGIC);
END mux4to1;
The operation of selecting one of the data lines to the output y depending on the value of
address is accomplished in concurrent VHDL using a SELECT statement:
ARCHITECTURE concurrent OF mux4to1 IS

BEGIN
WITH address SELECT
y <= data(3) WHEN "11",
data(2) WHEN "10",
data(1) WHEN "01",
data(0) WHEN OTHERS;
END ARCHITECTURE concurrent;
The OTHERS choice catches all other values for address that do not match any of the
values explicitly listed. It is clear from the fact that there is only one semicolon that
everything from WITH through OTHERS counts as one VHDL statement. This
statement will be executed whenever one of the address or data signals changes its
value.
In sequential code, the function is achieved using a CASE statement:
ARCHITECTURE sequential OF mux4to1 IS

BEGIN
PROCESS(address,data)
BEGIN
CASE address IS
WHEN "11" => y <= data(3);
WHEN "10" => y <= data(2);
WHEN "01" => y <= data(1);
WHEN OTHERS => y <= data(0);
END CASE;
END PROCESS;
END ARCHITECTURE sequential;
The CASE block is spread across several statements. These are executed (sequentially)
whenever address or data (the signals in the sensitivity list of the process) change their
value. The execution of the process computes a new value for y.
Both the CASE and the SELECT statement must exhaustively list all possible values for
the address. This is facilitated by using the OTHERS choice, to indicate everything that
has not been explicitly listed. You may wonder why we couldn’t just write this:
CASE address IS
WHEN "11" => y <= data(3);
WHEN "10" => y <= data(2);
WHEN "01" => y <= data(1);
WHEN "00" => y <= data(0);
END CASE;
Wouldn’t this exhaustively list all possible cases? The answer is no, because address is
declared as a STD_LOGIC_VECTOR. Its bits can take values not only of ‘0’ or ‘1’ but also
‘U’, ‘X’ etc.
5 Setting up simulations in VHDL: test benches

In the last lecture we produced a behavioural description of an adder:
[Figure: 4-bit adder with 4-bit inputs x and y and 4-bit output sum.]
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_signed.ALL;
ENTITY adder IS
  PORT ( x, y: IN STD_LOGIC_VECTOR(3 DOWNTO 0);
         sum: OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY adder;
ARCHITECTURE behavioural OF adder IS
BEGIN
  sum <= x + y;
END ARCHITECTURE behavioural;
We can view this as the input to a synthesis tool. However, before we synthesize our
code we want to see if it is correct. We do this through simulation: give it some inputs
and see whether the outputs behave as expected. In order to do this we need to create a
VHDL test bench.
5.1 Test bench for our adder example
A test bench is not intended to be fed to a synthesis tool; it is simply a way of applying
some test inputs to our design and observing what the outputs do. Once we are sure that
our design behaves as expected, then we will feed the design (but not the test bench) to
the synthesis tool. Because a test bench will not be synthesized, we can be much more
carefree in the features of the VHDL language that we use.
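Here is the appearance of our test bench:
[Figure: the test bench encloses the adder. The test bench signals in1 and in2 drive the adder inputs x and y, and the adder output sum drives the test bench signal output.]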
The test bench is a simulation of the world around our design. It will include
representations of the signal generator that will apply test inputs to our design (in1 and
in2). It will also include representations of the logic analyzer that will capture the
outputs of our design (output) and check that they are OK. Here is the ENTITY
declaration for the test bench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY testbench IS
END ENTITY testbench;
The ENTITY declaration for the test bench may look slightly odd, since it contains no
PORT clause. This is because it has no inputs or outputs. In order to fully describe a
simulation in VHDL, it is necessary that the top level of our design has no inputs or
outputs. (If it did have inputs and outputs, then we would need to think about some
bigger system enclosing the design that was able to supply the required inputs and
outputs.) This pattern is normally recognized by design tools as an entity that must not
be synthesized, but can be simulated.
Here is the architecture:
ARCHITECTURE tb OF testbench IS
SIGNAL in1, in2: STD_LOGIC_VECTOR(3 DOWNTO 0);
SIGNAL output: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
-- Place a copy of the design in our test bench

g1: ENTITY work.adder(behavioural)
PORT MAP ( x=>in1, y=>in2, sum=>output );
-- Apply test inputs to the design

PROCESS
BEGIN
in1 <= X"2";
in2 <= X"5";
WAIT FOR 10 NS;
in1 <= X"7";
in2 <= X"1";
WAIT FOR 10 NS;
in1 <= X"9";
in2 <= X"4";
WAIT;
END PROCESS;
END ARCHITECTURE tb;
It declares local signals (in1, in2) that will be used to apply the test inputs to our design.
Similarly it declares a local signal (output) that will capture the simulated output of our
design. Then we place one copy of our design, which has been compiled to the library
as work.adder(behavioural) inside the test bench, and its inputs and outputs are wired
up to the local signals. Finally, there is a process that sets up a sequence of test inputs. If
we simulate this test bench, then this is the result
We can see that 2+5=7, 7+1=8 and 9+4=D (in hex) so the simulation gives us
reassurance that the design is working correctly.
The last line of the process says WAIT;. This tells the process to suspend forever. If we
didn’t have that WAIT statement, then the process would wrap round to the beginning
and start the sequence all over again, repeating forever.
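Rather than reading the sums off a waveform by eye, a test bench can check the outputs itself with ASSERT statements. The following is a minimal sketch of that idea, not the test bench used in the lectures. The adder is repeated so that the example compiles on its own; the name testbench_checked and the message strings are assumptions. It relies on the std_logic_signed package that these notes use for "+", which is vendor-supplied rather than part of the IEEE standard.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_signed.ALL;  -- vendor package used by these notes for "+"

ENTITY adder IS
  PORT ( x, y : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
         sum  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY adder;

ARCHITECTURE behavioural OF adder IS
BEGIN
  sum <= x + y;
END ARCHITECTURE behavioural;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical self-checking variant of the test bench
ENTITY testbench_checked IS
END ENTITY testbench_checked;

ARCHITECTURE tb OF testbench_checked IS
  SIGNAL in1, in2, output : STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  g1: ENTITY work.adder(behavioural)
    PORT MAP ( x => in1, y => in2, sum => output );

  PROCESS
  BEGIN
    in1 <= "0010"; in2 <= "0101";   -- 2 + 5
    WAIT FOR 10 NS;
    ASSERT output = "0111" REPORT "2+5 should give 7" SEVERITY ERROR;

    in1 <= "0111"; in2 <= "0001";   -- 7 + 1
    WAIT FOR 10 NS;
    ASSERT output = "1000" REPORT "7+1 should give 8" SEVERITY ERROR;

    in1 <= "1001"; in2 <= "0100";   -- 9 + 4
    WAIT FOR 10 NS;
    ASSERT output = "1101" REPORT "9+4 should give D" SEVERITY ERROR;

    WAIT;  -- suspend forever
  END PROCESS;
END ARCHITECTURE tb;
```

Each ASSERT is placed after a WAIT so that the adder output has had time to update before it is checked. A failing assertion prints its message in the simulator log; a silent run means every check passed.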
• How to write blocks of sequential code using a PROCESS.
• How to tell a process when to run and when to suspend.
• How to set up a VHDL simulation using a test bench.

• Process
• Test bench
• WAIT statement
• Sensitivity list
Unit 2.4
Synthesized hardware
In this lecture we look at how the main constructs of VHDL are inferred as hardware.
We will go through the main operators of dataflow VHDL and how they transform to
hardware. In order to ensure that your VHDL can map to valid hardware, the
synthesizable subset of VHDL imposes some restrictions on what you can include in
your code.
1 Boolean operators
The Boolean operators have a hardware interpretation that is trivially obvious:
VHDL expression    Gate
not a              NOT (inverter)
a and b            AND
a nand b           NAND
a or b             OR
a nor b            NOR
a xor b            XOR
a xnor b           XNOR
However, some types of hardware do not support all of these gate types. If that is the
case, then a synthesis stage called technology mapping will be used to replace the
desired gates with an arrangement of equivalent functionality that uses only the
resources available in the hardware.
2 Comparison operators
=   equals          /=  not equals
>   greater than    >=  greater than or equal
<   less than       <=  less than or equal
Equality is tested by comparing each pair of corresponding bits using an xnor gate, which
outputs a 1 if its inputs are equal. The xnor outputs are then anded together. Here is a
4-bit example:
[Figure: 4-bit equality comparator. Each pair a(i), b(i), for i = 0 to 3, feeds an xnor gate; the four xnor outputs feed an AND gate whose output is a=b.]
To test for inequality (a/=b), we simply place an inverter at the output.
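The equality and inequality circuits just described can be written out gate by gate in dataflow VHDL. This is an illustrative sketch only: in practice you would write a=b and leave the gates to the synthesis tool. The entity eq4_sketch and its signal names are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity: gate-level view of a 4-bit "a = b" and "a /= b"
ENTITY eq4_sketch IS
  PORT ( a, b    : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
         eq, neq : OUT STD_LOGIC );
END ENTITY eq4_sketch;

ARCHITECTURE gates OF eq4_sketch IS
  SIGNAL bit_eq : STD_LOGIC_VECTOR(3 DOWNTO 0);
  SIGNAL all_eq : STD_LOGIC;
BEGIN
  bit_eq <= a XNOR b;  -- one xnor gate per bit position
  all_eq <= bit_eq(3) AND bit_eq(2) AND bit_eq(1) AND bit_eq(0);
  eq     <= all_eq;
  neq    <= NOT all_eq;  -- inequality: an inverter on the output
END ARCHITECTURE gates;
```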
To test the condition a<b, we use a subtractor to form a-b.
[Figure: subtractor forming a-b; the msb of the result is the output a<b.]
The msb of the result is the sign bit. If it is 1, then a-b is negative, which means a<b.
This is, of course, identical to the condition b>a.
To test for b<a (or a>b), we swap the inputs to the subtractor.
To test for a<=b, we use the fact that a<=b is the logical complement of a>b.
[Figure: subtractor forming b-a; the msb of the result is inverted to give the output a<=b.]
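The subtractor-based comparisons can be sketched in VHDL as follows. This is an illustration under stated assumptions, not the circuit a particular synthesis tool will produce. It treats a and b as 2s complement numbers and uses the standard ieee.numeric_std package. The operands are sign-extended by one bit before subtracting, which is an addition of this sketch: it ensures that the difference cannot overflow, so the msb is always a valid sign bit. The entity lt4_sketch is hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical entity: a and b are interpreted as 4-bit 2s complement numbers
ENTITY lt4_sketch IS
  PORT ( a, b           : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
         a_lt_b, a_le_b : OUT STD_LOGIC );
END ENTITY lt4_sketch;

ARCHITECTURE subtractor OF lt4_sketch IS
  -- One extra bit so that the subtraction can never overflow (assumption of this sketch)
  SIGNAL a_minus_b, b_minus_a : SIGNED(4 DOWNTO 0);
BEGIN
  a_minus_b <= resize(signed(a), 5) - resize(signed(b), 5);
  b_minus_a <= resize(signed(b), 5) - resize(signed(a), 5);

  a_lt_b <= a_minus_b(4);      -- sign bit of a-b is 1 when a < b
  a_le_b <= NOT b_minus_a(4);  -- a <= b is the complement of a > b
END ARCHITECTURE subtractor;
```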
3 Selection operators
IF statements (and WHEN statements) are implemented by multiplexers. So in concurrent
VHDL the statement
z <= a WHEN x=y ELSE b;
would synthesize as
[Figure: an equality detector compares x and y; its output drives the select input of a 2-to-1 multiplexer that passes either a or b through to z.]
where the equality detector is as described in section 2 above. The following process
would have the same effect:
PROCESS ( a, b, x, y )
BEGIN
IF x=y THEN
z <= a;
ELSE
z <= b;
END IF;
END PROCESS;
4 Latch inference
Suppose we write an incomplete WHEN statement (i.e. one that has no ELSE clause)
z <= a WHEN x=y;
or equivalently an incomplete IF statement
PROCESS ( a, b, x, y )
BEGIN
IF x=y Then
z <= a;
END IF;
END PROCESS;
If x=y then the outcome is clear: z will be set to the value of a. But what if x is not equal
to y? We haven’t specified a value for z to take. In this circumstance, the behaviour
of VHDL is that z will continue to hold whatever value it had previously. This will
synthesize as
[Figure: an equality detector compares x and y; its output enables a latch that passes a through to z.]
If this is the circuit you wanted then that’s fine. However, this is often done by accident
and is one of the commonest coding errors in VHDL. If you don’t want a latch to be
inferred, you must assign a value to the output under every possible combination of
inputs.
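One common way to guarantee that the output is assigned under every combination of inputs is to give it a default value at the top of the process, and then override the default where needed. The sketch below illustrates this; the default of '0' is an assumption made for the example (your design decides what the right default is), and the entity no_latch_sketch is hypothetical. It relies on the fact that, within one run of a process, a later assignment to a signal replaces an earlier one.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity used only to illustrate the default-assignment idiom
ENTITY no_latch_sketch IS
  PORT ( a    : IN  STD_LOGIC;
         x, y : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
         z    : OUT STD_LOGIC );
END ENTITY no_latch_sketch;

ARCHITECTURE default_first OF no_latch_sketch IS
BEGIN
  PROCESS (a, x, y)
  BEGIN
    z <= '0';        -- default: z now has a value on every path (assumed default)
    IF x = y THEN
      z <= a;        -- overrides the default when the condition holds
    END IF;
  END PROCESS;
END ARCHITECTURE default_first;
```

Because z receives a value on every run of the process, there is nothing for the hardware to remember, so no latch is needed.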
5 Arithmetic operators: Addition-like operators

The basic adder is a carry ripple adder. This is slow due to the effect of carry ripple
down the carry chain, but it is simple and easy to build. This will be present in all
synthesis libraries for all target hardware.
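[Figure: 4-bit carry ripple adder built from four full adder cells. Cell i adds a(i), b(i) and its carry in to give c(i) and a carry out. The carry out of each cell feeds the carry in of the next cell, and the carry out of cell 3 is the carry out of the whole adder.]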
The standard operations are implemented as follows, each using the same chain of four adder cells:
Operation        VHDL        Adder inputs                   Carry in to cell 0
Add              c <= a+b    a and b                        0
Increment        c <= a+1    a and 0                        1
2s complement    c <= -a     a (each bit inverted) and 0    1
Subtract         c <= b-a    b and a (each bit inverted)    1
The subtraction and negation circuits are based on a standard trick to negate a 2s
complement number: we complement each of the bits and then add 1. So, for example,
if we want to know how -6 is represented as a 4-bit binary number we do this:
• Form the number +6 in 4 bits, i.e. 0110.
• Complement each bit (i.e. replace each 0 by a 1 and each 1 by a 0) to get 1001.
• Finally we add 1, which gives us 1010.
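The complement-and-add-1 steps above can be sketched directly in VHDL, together with a small test bench that checks the -6 example. This is a minimal illustration using the standard ieee.numeric_std package; the entity names negate4_sketch and negate4_tb are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical entity: negate a 4-bit 2s complement number
ENTITY negate4_sketch IS
  PORT ( a : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
         c : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
END ENTITY negate4_sketch;

ARCHITECTURE invert_add_one OF negate4_sketch IS
BEGIN
  -- Complement every bit, then add 1
  c <= std_logic_vector( unsigned(NOT a) + 1 );
END ARCHITECTURE invert_add_one;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY negate4_tb IS
END ENTITY negate4_tb;

ARCHITECTURE tb OF negate4_tb IS
  SIGNAL a, c : STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  dut: ENTITY work.negate4_sketch(invert_add_one)
    PORT MAP ( a => a, c => c );

  PROCESS
  BEGIN
    a <= "0110";  -- +6
    WAIT FOR 10 NS;
    ASSERT c = "1010" REPORT "-6 should be 1010" SEVERITY ERROR;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```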
6 Absolute value: c <= abs(a)
This is accomplished by transforming to
c <= a when a>=0 else -a;
and then building the circuit according to the above rules, using a comparator, a
multiplexer and a 2s complementer.
7 Multiplication: c <= a * b;
This is accomplished using an array multiplier. To understand how this works, consider
an example of (unsigned) long multiplication, 7 × 3 = 21:
      0111   Multiplicand = 7
×     0011   Multiplier = 3
      0111   LSB of multiplier × multiplicand: partial product 0 = 7
+    01110   Bit 1 of multiplier × multiplicand: partial product 1 = 14
+   000000   Bit 2 of multiplier × multiplicand: partial product 2 = 0
+  0000000   MSB of multiplier × multiplicand: partial product 3 = 0
  00010101   Sum of all the partial products = 21
The 1-bit multiply is simply an AND gate. So, for example, partial product 0 is the LSB
of the multiplier ANDed with each bit of the multiplicand and shifted left by 0 bits.
Similarly, PP 1 is bit 1 of the multiplier ANDed with the multiplicand and shifted left by
1 bit.
To add the four partial products requires a 4-input adder. This is more easily realised by
a cascade of 2-input adders. So we have one adder to add PP 0 and PP 1. This feeds its
output to another adder which adds this to PP 2. This in turn feeds another adder which
adds in PP 3.
The basic cell is therefore
[Figure: basic multiplier cell. Bits a and b are ANDed together and fed to a full adder (+), along with the input from above and the input from the right; the cell produces an output to the left and an output downwards.]
and consists of a 1-bit multiplier (realised as an AND gate), combined with a full adder.
These cells are then placed to build up the multiplier. A 4 x 4 multiplier has the
following appearance (see footnote 8):
[Figure: 4 x 4 array multiplier built from the basic cell.]
Footnote 8: This is an unsigned multiplier. To correctly handle negative operands (i.e. a signed multiplier) we would
need to make some small modifications.
Note that a multiplier that multiplies an m-bit number by an n-bit number requires n+m
bits to represent the answer.
The speed of this multiplier is limited by carry propagation. In the worst case a
propagating carry may have to pass through 8 stages (in general, for an n-bit multiplier,
2n stages, i.e. twice as many as for an n-bit adder). This makes the combinational
multiplier slow.
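The partial-product view of multiplication can also be written behaviourally. The sketch below is an illustration of the arithmetic, not the array structure itself: it forms each partial product as the multiplicand shifted left by i bits, gated by bit i of the multiplier, and accumulates them one at a time. It uses a VARIABLE and a FOR loop inside a process, together with the standard ieee.numeric_std package; the entity mult4_sketch is hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical entity: 4 x 4 unsigned multiply giving an 8-bit (n+m bit) product
ENTITY mult4_sketch IS
  PORT ( a, b : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);   -- a: multiplicand, b: multiplier
         p    : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) );
END ENTITY mult4_sketch;

ARCHITECTURE partial_products OF mult4_sketch IS
BEGIN
  PROCESS (a, b)
    VARIABLE acc : UNSIGNED(7 DOWNTO 0);  -- running sum of partial products
  BEGIN
    acc := (OTHERS => '0');
    FOR i IN 0 TO 3 LOOP
      -- Partial product i is the multiplicand shifted left by i bits
      -- if bit i of the multiplier is 1, and zero otherwise.
      IF b(i) = '1' THEN
        acc := acc + shift_left(resize(unsigned(a), 8), i);
      END IF;
    END LOOP;
    p <= std_logic_vector(acc);
  END PROCESS;
END ARCHITECTURE partial_products;
```

With a = "0111" and b = "0011" the loop adds 7 and 14, giving "00010101" (21), which matches the long multiplication above.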
8 Synthesis optimizations
In the first stage, the synthesis tool will convert the VHDL description to a netlist of
gates, represented as an EDIF file. Before it then proceeds to technology mapping and
physical synthesis, it will normally perform an optimization stage on the netlist to get
rid of un-needed gates. For example, consider the incrementer circuit of section 5.
Since the input b is known to be 0 for all bit positions, the circuit can be optimized,
giving a 60% saving in logic gates.
[Figure: an adder cell with inputs a, 0 and carry in, and outputs c and carry out, is built as a simpler cell that has only the inputs a and carry in.]

• The hardware realizations of the main VHDL constructs.
• How these are designed so as to be optimizable by the synthesis tools.
• That some types of hardware do not support all of the basic logic gates, and will require
technology mapping to alter the netlist into a form that can be physically realised.
Unit 2.5
Problems with VHDL Synthesis
Our lines of VHDL code represent chunks of hardware that receive inputs and drive
outputs. It is important to have a good grasp of how your VHDL code maps to hardware
in order to avoid a mistake which is commonly made by newcomers to VHDL. This is
the problem of having multiple different chunks of hardware all simultaneously driving
incompatible values onto the same output, causing the output to go to some
unpredictable garbage value.
1 The garbage values: ‘X’, ‘U’

In the original VHDL language, logic signals were represented by the built-in type bit.
For type bit, the only valid logic levels are ‘0’ and ‘1’. This turned out to be insufficient
to represent all of the situations that we may encounter in verifying and evaluating
systems, so an extended set of levels was introduced in the new standard
std_logic_1164. In total, this introduces 7 new logic values, but in this course we will
only be interested in two of these:
• ‘X’ is the unknown value. In reality this would be a ‘1’ or a ‘0’, but which it will be
is random and is not directly established by the logic levels of our design’s inputs.
• ‘U’ is the uninitialized value. This is a special case of X, whose cause is that a signal
has not been assigned a legitimate value. This corresponds to the behaviour of
flip-flops in real designs, which at switch-on of the circuit will randomly settle into an
unpredictable 0 or 1.
In VHDL, when a signal is declared we can give it an initial value, e.g.

signal a: std_logic_vector(3 downto 0):="0000";
If we do that, then obviously in our VHDL simulation the signal a will start with an
initial value of 0000. If we don’t give an initial value in the declaration
signal a: std_logic_vector(3 downto 0);
then the signal a will start with an initial value of UUUU.
If we then perform an arithmetic operation on this UUUU value, say for example
c <= a+b;
then c will get the value XXXX.
2 Contention
Now suppose we have two lines of code that attempt to write to the same signal:
signal a, b, c: std_logic_vector(3 downto 0);

a <= "0001";
b <= "0010";
c <= a+b;
c <= a-b;
The two different assignments for c are driving different values. The result won’t make
sense: c will get the value XXXX, a garbage value. In general, it is a rule that in
concurrent code each signal must appear on the left hand side of only one
assignment.
This error situation looks obvious. However, there are more subtle scenarios where this
same problem can crop up, and these can be harder to spot. We’ll think in detail about
these in the next two sections.
3 VHDL drivers in concurrent code

Consider a simple piece of concurrent VHDL code:
1: ARCHITECTURE dataflow OF example1 IS
2: BEGIN
3: result <= num1+num2 WHEN button='1'
   ELSE num1-num2 WHEN button='0';
4: END ARCHITECTURE dataflow;
The driver for the signal result is statement 3, i.e. the statement that will compute a new
value for result when appropriate (i.e. whenever num1, num2 or button change value).
This code is correctly constructed and will do what we want. It will build hardware as
shown below:
The following code illustrates a mistake often made by newcomers to HDL-based
design. The writer is trying to achieve the same functionality as the earlier example, but
has made a mistake.
1: ARCHITECTURE dataflow OF example1 IS
2: BEGIN
3: result <= num1+num2 WHEN button='1';
4: result <= num1-num2 WHEN button='0';
5: END ARCHITECTURE dataflow;
The resulting hardware is shown below.
Each line that has result on its left hand side will become a different lump of hardware.
Both try to drive the output result. Node result has multiple drivers.
The writer hoped that when button='1' then result would derive its value from
statement 3, and when button='0' then result would derive its value from statement 4.
The plan is that either one or the other assignment to result will be active, so result
should behave sensibly. However, this reasoning is wrong. result is deriving a value
from statement 3 all of the time. Statement 3 is a latch, with button as its enable signal.
When button='1', statement 3 drives the value num1+num2 onto result. When
button='0', statement 3 drives the previously memorized value onto result. Meanwhile
statement 4 is also driving a value onto result all of the time. When button='0' it drives
the value num1-num2; when button='1' it drives the previously memorized value. Whatever
value button takes in this scenario, the value of result ends up as garbage.
Statement 3 tries to drive it to one value and statement 4 simultaneously tries to drive it
to a different value.
The conclusion is that if result appears on the LHS of more than one concurrent
statement, result has multiple drivers and a contention situation exists. Simulating the
code would give result a value of all Xs. A synthesis tool would probably refuse to
compile the code.
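Contention is easy to reproduce in a simulator. The following minimal sketch gives a single STD_LOGIC signal two concurrent drivers with conflicting values and checks that the resolved value is 'X'. The entity name contention_tb is hypothetical, and the example is for simulation only.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical simulation-only example of a signal with two drivers
ENTITY contention_tb IS
END ENTITY contention_tb;

ARCHITECTURE tb OF contention_tb IS
  SIGNAL c : STD_LOGIC;
BEGIN
  c <= '0';  -- driver 1
  c <= '1';  -- driver 2: conflicts with driver 1

  PROCESS
  BEGIN
    WAIT FOR 10 NS;
    -- STD_LOGIC resolves a '0' driver against a '1' driver to 'X'
    ASSERT c = 'X' REPORT "expected contention to give X" SEVERITY ERROR;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

The simulator accepts this code because STD_LOGIC is a resolved type. The 'X' is the simulator's way of telling you that the two drivers disagree.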
4 Processes and drivers

Each process in a piece of VHDL counts as a single driver for any signal that appears on
the LHS in the process. Similarly, anything that appears on the RHS, or in the
sensitivity list or the argument of an IF or CASE clause, will be an input to the process.
For example:
PROCESS(button, num1, num2)
BEGIN
  IF button='1' THEN
    result <= num1+num2;
  ELSIF button='0' THEN
    result <= num1-num2;
  END IF;
END PROCESS;
This will construct the hardware
Within the same process, a signal can appear on the LHS as often as you want, and it
still counts as only one driver. For example:
PROCESS(button, num1, num2)
BEGIN
  IF button='1' THEN
    result <= num1+num2;
  END IF;
  IF button='0' THEN
    result <= num1-num2;
  END IF;
END PROCESS;
This would also construct the hardware
Each time button, num1 or num2 changes, the process will execute. The process runs
sequentially, calculating a new value for result. The new value of result is applied a delta delay (Δ)
after the process has completed. So if, within a single run of the process, several
different assignments to result occur, then the later assignments will overwrite the
earlier ones. After the process has completed, the final value is assigned Δ later. This is
not a contention situation: node result has only one driver.
However, if the same signal appears on the LHS of two different processes, then it has
multiple drivers and contention exists. For example, consider the signal result in the
following code:
PROCESS(button, num1, num2)

BEGIN
IF button='1' THEN
result <= num1+num2;
END IF;
END PROCESS;
PROCESS(button, num1, num2)

BEGIN
IF button='0' THEN
result <= num1-num2;
END IF;
END PROCESS;
The signal result has two drivers and the result would be garbage.
5 A common mistake: incorrect IF blocks

We can also describe this circuit using a process. Before we do that, here is a common
mistake to watch out for. Here is a correct IF block:
IF condition THEN
do this;
ELSE
do that;
END IF;
An IF block must end with an END IF, and can have an optional ELSE clause. If we
have multiple conditions to check for, we get this (also correct):
IF button='1' THEN
ELSIF button='0' THEN
END IF;
The ELSIF clause is part of the one IF block. It will only be evaluated if the first condition
(button='1') is false.
A common mistake is to write the IF block with multiple conditions to check like this:
IF button='1' THEN
ELSE IF button='0' THEN
END IF;
The mistake is that ELSIF is not the same thing as ELSE IF. ELSIF is part of the existing IF
block, whereas ELSE IF starts a completely new IF block nested inside the other IF block.
We now have two IF blocks, so we need two END IF statements, and the code will not
compile if we just have one END IF.
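The difference is easiest to see with both forms written out in full. In this minimal sketch the first IF block uses ELSIF and needs one END IF, while the second uses ELSE IF and therefore needs two. The entity else_if_sketch, its ports, and the choice to drive 'X' when button is neither '0' nor '1' are all assumptions made for the illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity: r1 and r2 are computed identically in two styles
ENTITY else_if_sketch IS
  PORT ( button, p, q : IN  STD_LOGIC;
         r1, r2       : OUT STD_LOGIC );
END ENTITY else_if_sketch;

ARCHITECTURE demo OF else_if_sketch IS
BEGIN
  PROCESS (button, p, q)
  BEGIN
    -- ELSIF: a single IF block, closed by a single END IF
    IF button = '1' THEN
      r1 <= p;
    ELSIF button = '0' THEN
      r1 <= q;
    ELSE
      r1 <= 'X';
    END IF;

    -- ELSE IF: a second IF block nested inside the ELSE branch,
    -- so two END IF statements are required
    IF button = '1' THEN
      r2 <= p;
    ELSE
      IF button = '0' THEN
        r2 <= q;
      ELSE
        r2 <= 'X';
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE demo;
```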
6 Synthesizable VHDL
VHDL is a powerful language and enables us to express a vast range of possibilities in
our code. However some of these are meaningless in hardware and are useful only for
simulation. There is a standard subset of the VHDL language, defined in IEEE standard
1076.6, which states which features of the language can be used with confidence that all
synthesis tools will interpret them consistently and correctly. The reason why some
features are forbidden or ignored in VHDL-for-synthesis is usually because the features
have no reasonable counterpart in real hardware. Here are some examples:
• The construct WAIT FOR is ignored by synthesis tools. The normal use of the WAIT
FOR construct is to set up the timing of test vectors in a simulation test bench.
• The AFTER clause is ignored for a similar reason. It is not possible to synthesize a
gate that has an exact delay.
• Initial values assigned to signals are normally ignored. Instead, set and reset signals
must be used to initialize flip-flops. However, if your design targets an FPGA and
the signal will synthesize as a register, then the initial value will be used. This is due
to the peculiarities of the physical structure and programming method of FPGAs.
When a synthesis tool processes code that cannot be synthesized, there are two possible
outcomes:
• For major problems, an error message is issued and synthesis is aborted.
• For minor problems, the offending code is deleted, and synthesis then continues.
If your design depends critically on the use of one of the features of VHDL that
simulates correctly but is thrown away by the synthesis tool, then your synthesized
design will not work even though the simulations were fine.

• The causes of the logic levels ‘U’ and ‘X’.
• How multiple drivers for a signal result in X values.
• In concurrent code, a signal should only appear on the left hand side of one
assignment statement.
• A signal can be the target of many different assignment statements in a single
process.
• A signal should not appear as the target of assignments from different processes.
• AFTER and WAIT FOR are only intended for creating simulation test benches; if
you feed them into a synthesis tool they will be ignored.
Unit 3.1
Register Transfer Level VHDL (1)
In this lecture, we will look at how to describe sequential logic, logic whose operation is
synchronized to the edges of a clock signal.
1 The D-type flip-flop

The vast majority of hardware designs use sequential logic, where system operation is
driven by a clock signal distributed throughout the system. Each hardware block in the
system will update its outputs at the rising edge (see footnote 9) of the clock signal, and the outputs will
then stay stable throughout the rest of the cycle until the next rising edge. The basic
device that is used to accomplish this is the D-type flip flop. Here is the ENTITY
definition for this device
[Figure: D-type flip-flop symbol with inputs D and Clk and output Q.]
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
ENTITY dff IS
  PORT ( d, clk : IN STD_LOGIC;
         q : OUT STD_LOGIC);
END ENTITY dff;
The behaviour of this device is as follows. When the clock is stable, Q simply holds its
value constant. When a rising clock edge occurs, the output Q takes on the value that D
has at the moment when the clock edge occurred. It then holds that value constant until
the next rising clock edge occurs, at which time it updates itself again.
There is a small delay between the occurrence of the clock edge and the appearance of
the new output. This is called the clock-to-q delay of the flip flop.
[Figure: timing diagram of clk and Q, with the clock-to-q delay Tclk-q marked after the rising clock edge.]
Here is an architecture that describes the behaviour of the D-type flip-flop:
ARCHITECTURE rtl OF dff IS

BEGIN
PROCESS (clk)
BEGIN
IF ( rising_edge(clk) ) THEN
q <= d;
END IF;
END PROCESS;
END ARCHITECTURE rtl;
Whenever clk changes its value, the process will run. However, clk might have changed
due to a falling edge of the clock (which should not trigger an update to q) so we need
to insert an IF statement which causes q to be updated only on the rising edge of clk.
The rising_edge function is contained in the STD_LOGIC_1164 package, and returns
TRUE when clk has changed from 0 to 1 during the last delta.
Footnote 9: or sometimes the falling edge.

The meaning of this description is as follows:
• The process runs at the precise instant of the clock edge.
• At the precise instant of the clock edge, the value of the D-input will be sampled.
• This value will not appear at the output until a short time Δ after the clock edge has
completed.
You can think of the Δ delay as a model of the clk-to-q delay of the flip-flop. It doesn’t
matter how short the delay is, but the delay must be non-zero in order for the device to
give correct flip-flop behaviour.
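The edge-triggered behaviour can be checked with a small self-checking test bench. The sketch below is one possible arrangement and not part of the original notes: the flip-flop is repeated so that the example compiles on its own, and the test bench name dff_tb, the 20 ns clock period and the check times are all assumptions. The clock never stops, so the simulation should be run for a fixed time (50 ns is enough).

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY dff IS
  PORT ( d, clk : IN  STD_LOGIC;
         q      : OUT STD_LOGIC );
END ENTITY dff;

ARCHITECTURE rtl OF dff IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      q <= d;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical test bench; run the simulation for about 50 ns
ENTITY dff_tb IS
END ENTITY dff_tb;

ARCHITECTURE tb OF dff_tb IS
  SIGNAL d, clk, q : STD_LOGIC;
BEGIN
  dut: ENTITY work.dff(rtl)
    PORT MAP ( d => d, clk => clk, q => q );

  -- Clock generator: rising edges at 10 ns, 30 ns, 50 ns, ...
  PROCESS
  BEGIN
    clk <= '0';
    WAIT FOR 10 NS;
    clk <= '1';
    WAIT FOR 10 NS;
  END PROCESS;

  -- Stimulus and checks
  PROCESS
  BEGIN
    d <= '1';
    WAIT FOR 5 NS;    -- t = 5 ns: no rising edge has occurred yet
    ASSERT q = 'U' REPORT "q should still be uninitialized" SEVERITY ERROR;
    WAIT FOR 10 NS;   -- t = 15 ns: the edge at 10 ns sampled d = '1'
    ASSERT q = '1' REPORT "q should have taken the value of d" SEVERITY ERROR;
    d <= '0';
    WAIT FOR 10 NS;   -- t = 25 ns: the falling edge at 20 ns must not update q
    ASSERT q = '1' REPORT "q should hold between rising edges" SEVERITY ERROR;
    WAIT FOR 10 NS;   -- t = 35 ns: the edge at 30 ns sampled d = '0'
    ASSERT q = '0' REPORT "q should have updated at the rising edge" SEVERITY ERROR;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

The check at 25 ns is the interesting one: d has already changed to '0', but q still holds '1' because no rising clock edge has occurred since the change.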
1.1 D-type flip-flop with reset

When an electronic system is switched on, the contents of all flip-flops will go
randomly to 1 or 0. At power-up the contents of all flip-flops are garbage. (It is this
condition that the ‘U’ value in VHDL is designed to emulate.) It is useful to be able to
put the flip-flops into a known state (usually ‘0’, but sometimes ‘1’). This is done by
means of a Reset signal.
[Figure: D-type flip-flop symbol with inputs D, Clk and Reset and output Q.]
The Reset signal may be synchronous, but is usually asynchronous. If the Reset is
synchronous, then it is ignored until the rising edge of the clock. When the rising edge
comes, if Reset=’1’ then q goes to ‘0’. If Reset=’0’, then the flip-flop exhibits normal
behaviour, i.e. q<=d. This would be described like this:
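ARCHITECTURE synch_reset OF dff IS
-- Assumes that reset has been added to the port list of dff as an input.
BEGIN
  PROCESS (clk)
  BEGIN
    IF ( rising_edge(clk) ) THEN
      IF ( reset='1' ) THEN
        q <= '0';
      ELSE
        q <= d;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE synch_reset;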
If the Reset is asynchronous, then it takes immediate effect, no matter what the clock is
doing. This means that the flip-flop is always sensitive to its Reset input. This would be
described like this:
ARCHITECTURE asynch_reset OF dff IS
BEGIN
PROCESS (clk, reset)
BEGIN
IF ( reset='1' ) THEN
q <= '0';
ELSIF ( rising_edge(clk) ) THEN
q <= d;
END IF;
END PROCESS;
END ARCHITECTURE asynch_reset;
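Both reset styles need a reset input, which the dff entity declared earlier does not have. The following sketch gathers a complete, compilable version under a hypothetical entity name, dff_reset, whose only difference from dff is the extra reset port. Take it as one possible arrangement rather than the definitive one.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical entity: dff with an added reset input
ENTITY dff_reset IS
  PORT ( d, clk, reset : IN  STD_LOGIC;
         q             : OUT STD_LOGIC );
END ENTITY dff_reset;

-- Synchronous reset: reset is only looked at on the rising clock edge
ARCHITECTURE synch_reset OF dff_reset IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF reset = '1' THEN
        q <= '0';
      ELSE
        q <= d;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE synch_reset;

-- Asynchronous reset: reset is in the sensitivity list and is tested first
ARCHITECTURE asynch_reset OF dff_reset IS
BEGIN
  PROCESS (clk, reset)
  BEGIN
    IF reset = '1' THEN
      q <= '0';
    ELSIF rising_edge(clk) THEN
      q <= d;
    END IF;
  END PROCESS;
END ARCHITECTURE asynch_reset;
```

The structural difference is small but important: in the synchronous version the reset test sits inside the rising_edge test, whereas in the asynchronous version it sits outside it and takes priority over the clock.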
2 Registered logic
2.1 Carry ripple in adders

Let’s re-visit our structural description of the adder:
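[Figure: 4-bit carry ripple adder built from four full adder cells. Cell i adds x(i), y(i) and its carry in to give sum(i) and a carry out, which feeds the carry in of the next cell; the carry out of cell 3 is the carry out of the adder.]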
When we have a simple set of inputs, this behaves sensibly. Suppose x="0000" and
y="0001". Then after a short delay as the input values move through the gate delays in
the full adder, sum gets a new value of "0001".
Now suppose we have the values x="0001" and y="0111". After a brief delay sum
becomes "0110". Then after a short while, fulladder 1 “notices” that fulladder 0 has
generated a carry: its sum output flips to ‘0’, and its carry flips to ‘1’. So sum becomes
"0100". Then after another short while, fulladder 2 notices that fulladder 1 generated a
carry and sum becomes "0000". Then after another while fulladder 3 notices the carry
that has just been generated by fulladder 2. Then sum becomes "1000".
Here is a simulation of how the adder behaves with the two different sets of inputs:
When x=0, y=1, the sum output goes quickly to the correct output, with no
misbehaviour en route. However, when x=1 and y=7, we have a series of outputs which
are garbage, and the sum takes a long time to settle to the correct value.
This effect is called carry ripple. For our little four bit adder, the problem was awkward
enough. But realistic adders are more likely to be 16, 32 or even 64 bits in length, so the
carry may have to ripple down a very long path. This can cause a long delay period
during which the outputs of the adder are garbage.
2.2 The registered adder

The normal way to fix this is to register the outputs:
[Figure: the 4-bit carry ripple adder as before, with each output sum(i) feeding one bit of a 4-bit register clocked by clk. The register outputs are sum_reg(0) to sum_reg(3).]
A group of D-type flip flops all controlled by the same clock signal is called a register.
This is what the output of the register looks like:
The registered sum will be updated at each rising edge of the clock signal. If the
clock is slow enough that sum has completely settled before the next clock edge arrives,
then the registered sum is a cleaned up version of sum. (But notice that if we ran the
clock too fast, the rising edge would come during the period in which sum is garbage,
and the registered output would therefore be wrong).
One very important point to notice here is that the output is acquiring its value one
clock cycle after the corresponding inputs. So the output goes to 1 the cycle after the
inputs were x=0, y=1. Similarly, the output goes to 8 one cycle after x=1, y=7. This is
often a source of confusion, and you should make sure you understand why this
happens.
2.3 VHDL description of the registered adder
The diagrams of the previous section show a structural description of the adder. Usually
humans don’t produce structural code. They write behavioural code like this:

BEGIN
sum <= x + y;
and leave it to the synthesis tool to produce the structural gate-level description of the
design. Now if we are looking for a registered adder, the above description isn’t good
enough. There is nothing to tell the synthesis tool that we want the addition to be
synchronized to a clock signal. An obvious way to describe the adder with registered
output is to instantiate 4 flip-flops at the output of the adder:
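LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;
ENTITY reg_adder IS
PORT ( x, y: IN STD_LOGIC_VECTOR(3 downto 0);
clk: IN STD_LOGIC;
sum_reg: OUT STD_LOGIC_VECTOR(3 downto 0));
END ENTITY reg_adder;
ARCHITECTURE registered OF reg_adder IS
SIGNAL sum: STD_LOGIC_VECTOR ( 3 DOWNTO 0 );
BEGIN
sum <= x + y;
-- One flip-flop for each bit of sum
g1: ENTITY work.dff(correct)
PORT MAP ( d=>sum(0), clk=>clk, q=>sum_reg(0) );
g2: ENTITY work.dff(correct)
PORT MAP ( d=>sum(1), clk=>clk, q=>sum_reg(1) );
g3: ENTITY work.dff(correct)
PORT MAP ( d=>sum(2), clk=>clk, q=>sum_reg(2) );
g4: ENTITY work.dff(correct)
PORT MAP ( d=>sum(3), clk=>clk, q=>sum_reg(3) );
END ARCHITECTURE registered;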
However, this is pretty painful (even more so for a 32-bit adder). Instantiating lots of
small pieces of hardware is a structural way to do things, and we normally don’t want
humans to operate in this way. This is much better:
ARCHITECTURE registered2 OF reg_adder IS
SIGNAL sum: STD_LOGIC_VECTOR ( 3 DOWNTO 0 );
BEGIN
sum <= x + y;
PROCESS (clk)
BEGIN
IF ( rising_edge(clk ) ) THEN
sum_reg <= sum;
END IF;
END PROCESS;
END ARCHITECTURE registered2;
Now we have made it clear to the synthesis tool that we want a registered version of
sum to be created, synchronized to the rising edge of the clock. It is then up to the
synthesis tool to figure out how to build a circuit that achieves this behaviour.
2.4 Register transfer level (RTL) description
This description is even more concise:
ARCHITECTURE rtl OF reg_adder IS
BEGIN
PROCESS (clk)
BEGIN
IF ( rising_edge(clk) ) THEN
sum_reg <= x + y;
END IF;
END PROCESS;
END ARCHITECTURE rtl;
If we want a reset signal that can asynchronously reset the adder output to zero, this is
achieved in a similar fashion (assuming that a reset input has been added to the PORT of
reg_adder):
ARCHITECTURE rtl2 OF reg_adder IS
BEGIN
PROCESS (clk, reset)
BEGIN
IF ( reset='1' ) THEN
sum_reg <= "0000";
ELSIF ( rising_edge(clk) ) THEN
sum_reg <= x + y;
END IF;
END PROCESS;
END ARCHITECTURE rtl2;
This style of coding is called register transfer level coding. We are using dataflow
statements, but wrapping them up in processes triggered by the clock (and possibly
some reset or enable signals) in order to make it clear on what clock cycle the outputs
should assume their values. Notice that in RTL we are simply defining the behaviour we
want (in this example sum_reg <= x + y on a rising edge of the clock). We are leaving it
entirely up to the synthesis tool to infer what configuration of registers will be needed to
give us this behaviour.
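The one-cycle latency of the registered adder can be checked with a small self-checking testbench. This is a sketch under stated assumptions: it uses a stand-alone copy of the adder called reg_adder_demo (a hypothetical name), declares the ports as UNSIGNED from ieee.numeric_std rather than using std_logic_unsigned, and assumes a 10 ns clock period.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical stand-alone copy of the registered adder with reset
ENTITY reg_adder_demo IS
  PORT ( x, y    : IN  UNSIGNED(3 DOWNTO 0);
         clk     : IN  STD_LOGIC;
         reset   : IN  STD_LOGIC;
         sum_reg : OUT UNSIGNED(3 DOWNTO 0) );
END ENTITY reg_adder_demo;

ARCHITECTURE rtl OF reg_adder_demo IS
BEGIN
  PROCESS (clk, reset)
  BEGIN
    IF reset = '1' THEN
      sum_reg <= "0000";            -- asynchronous reset
    ELSIF rising_edge(clk) THEN
      sum_reg <= x + y;             -- registered sum
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY reg_adder_demo_tb IS
END ENTITY reg_adder_demo_tb;

ARCHITECTURE tb OF reg_adder_demo_tb IS
  SIGNAL x, y    : UNSIGNED(3 DOWNTO 0) := "0000";
  SIGNAL sum_reg : UNSIGNED(3 DOWNTO 0);
  SIGNAL clk     : STD_LOGIC := '0';
  SIGNAL reset   : STD_LOGIC := '1';
  SIGNAL done    : BOOLEAN := FALSE;
BEGIN
  -- 10 ns clock (rising edges at 5 ns, 15 ns, ...); stops when done is TRUE
  clk <= NOT clk AFTER 5 ns WHEN NOT done ELSE '0';

  uut: ENTITY work.reg_adder_demo(rtl)
    PORT MAP ( x => x, y => y, clk => clk, reset => reset, sum_reg => sum_reg );

  PROCESS
  BEGIN
    WAIT FOR 2 ns;
    ASSERT sum_reg = 0 REPORT "reset should clear sum_reg" SEVERITY ERROR;
    reset <= '0';
    x <= TO_UNSIGNED(7, 4);  y <= TO_UNSIGNED(1, 4);
    WAIT FOR 1 ns;
    ASSERT sum_reg = 0 REPORT "output changed before the clock edge" SEVERITY ERROR;
    WAIT UNTIL rising_edge(clk);
    WAIT FOR 1 ns;
    ASSERT sum_reg = 8 REPORT "7 + 1 should appear after the edge" SEVERITY ERROR;
    x <= TO_UNSIGNED(0, 4);  y <= TO_UNSIGNED(1, 4);
    WAIT FOR 1 ns;
    ASSERT sum_reg = 8 REPORT "old sum should be held until the next edge" SEVERITY ERROR;
    WAIT UNTIL rising_edge(clk);
    WAIT FOR 1 ns;
    ASSERT sum_reg = 1 REPORT "0 + 1 should appear one cycle later" SEVERITY ERROR;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

Each new pair of inputs only shows up at sum_reg after the next rising edge, which is the behaviour described for the registered adder.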
3 Summary
In this lecture we have looked at how to use clocked processes to build register transfer
level (RTL) descriptions. These are behavioural descriptions that make it clear on which
clock edge the outputs must assume their value. These descriptions can then be
synthesised into sequential circuits using the appropriate configuration of registers by a
synthesis tool.

• How to construct behavioural descriptions of synchronous circuits
• How carry ripple causes the outputs of an adder to be briefly unreliable
• Why using registers solves the problem of transiently invalid outputs

Unit 3.2
Register Transfer Level VHDL (2)
Register transfer level descriptions build up a description of a complete system inside a
clocked process. The timing of the movement of data is controlled by the way that we
write our VHDL inside the process. It is therefore important to have a good grasp of
how signals inside processes behave, and how they map to hardware. In this lecture we
look at one of the most important rules of RTL: signals that are the target of an
assignment inside a clocked process will synthesise to registers.
1 A chain of registers
Suppose we have a chain of 4-bit registers, called s1, s2, s3. The output of one stage in
the chain is the input of the next:
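[Figure: three D-type registers s1, s2, s3 connected in a chain (Q of each stage feeds D of the next), all driven by Clk; Input feeds D of s1.]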
Now suppose we feed a series of numbers 8, 3, 7, 4 on successive clock cycles into the
input of the chain. Initially we have this:
[Figure: the register chain with 8 waiting at Input; none of the numbers has yet been captured by s1, s2 or s3.]
After the first clock edge we have this:
[Figure: the register chain with 3 at Input and 8 held in s1.]
The number 8 will be read into the first register at the clock edge. This takes a finite
amount of time, and by the time that this read-in has completed, the second stage is
no longer responding to its input. So the number 8 can proceed no further down the
chain in this clock cycle.
After the second clock edge we have this:
[Figure: the register chain with 7 at Input, 3 held in s1 and 8 held in s2.]
And so on. On each clock cycle, the numbers shift one stage to the right. This device is
a shift register. The overall behaviour, shifting one stage per clock cycle, occurs because
reading a data item into a register has a finite delay. This is the clock-to-q delay (see
section 12.1), which is modelled in VHDL by the delta (Δ) delay of an assignment to a signal.
If reading data into a register entailed no delay, then a single data item would shoot all
the way down the chain at the first clock edge.
2 Describing the shift register in VHDL

Here is the description of the shift register in VHDL:
ARCHITECTURE rtl OF dff IS
SIGNAL s1, s2, s3: STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
PROCESS (clk)
BEGIN
IF (rising_edge(clk)) THEN
s1 <= d_in;
s2 <= s1;
s3 <= s2;
END IF;
END PROCESS;
END ARCHITECTURE rtl;
The way that the process works is as follows:

• The process runs at the precise instant of the clock edge
• At the precise instant of the clock edge, the right-hand sides of the assignments will
be sampled
• These new values will not be assigned to the signals s1, s2, s3 until a short time Δ
after the process has completed.
This means that if we apply the value 8 at d_in, then after the first clock edge s1 gets this
value. After the second clock edge, s2 will get this value. After the third clock edge, s3
will get this value.
3 Register inference in RTL VHDL

The action of a synthesis tool in looking at your code and deciding what components to
build is called inference. One of the most fundamental rules of RTL VHDL (i.e. using a
clocked process) is this:
A signal that is the target of an assignment inside a clocked process will cause a register to be inferred.
You can see that this is the case for s1, s2 and s3 in the example above.
4 Order of statements in RTL VHDL
Inside a VHDL process, the statements are executed sequentially. So the first ordering of
statements below feels natural (as stage s1 comes before stage s2 in the chain). The
second ordering is also correct and would have exactly the same effect, in spite
of the fact that the assignments have been written in the reverse order to what feels
natural.
Natural order:
PROCESS (clk)
BEGIN
IF (rising_edge(clk)) THEN
s1 <= d_in;
s2 <= s1;
s3 <= s2;
END IF;
END PROCESS;
Reverse order:
PROCESS (clk)
BEGIN
IF (rising_edge(clk)) THEN
s3 <= s2;
s2 <= s1;
s1 <= d_in;
END IF;
END PROCESS;
This is because as the process runs, s1, s2, s3 are not updating their values. They will
only update a Δ time after exit from the process.
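The claim that the two orderings behave identically can be checked in simulation. The sketch below is an illustration under assumptions: the entity name shift3, the output port q (the value of s3), the architecture names and the 10 ns clock are all hypothetical and are not part of the module's own code.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY shift3 IS
  PORT ( clk  : IN  STD_LOGIC;
         d_in : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
         q    : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );  -- value held in s3
END ENTITY shift3;

ARCHITECTURE natural_order OF shift3 IS
  SIGNAL s1, s2, s3 : STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      s1 <= d_in;
      s2 <= s1;
      s3 <= s2;
    END IF;
  END PROCESS;
  q <= s3;
END ARCHITECTURE natural_order;

ARCHITECTURE reverse_order OF shift3 IS
  SIGNAL s1, s2, s3 : STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      s3 <= s2;
      s2 <= s1;
      s1 <= d_in;
    END IF;
  END PROCESS;
  q <= s3;
END ARCHITECTURE reverse_order;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY shift3_tb IS
END ENTITY shift3_tb;

ARCHITECTURE tb OF shift3_tb IS
  SIGNAL clk    : STD_LOGIC := '0';
  SIGNAL d_in   : STD_LOGIC_VECTOR(3 DOWNTO 0) := "1000";  -- first input is 8
  SIGNAL qa, qb : STD_LOGIC_VECTOR(3 DOWNTO 0);
  SIGNAL done   : BOOLEAN := FALSE;
BEGIN
  clk <= NOT clk AFTER 5 ns WHEN NOT done ELSE '0';

  ua: ENTITY work.shift3(natural_order) PORT MAP ( clk => clk, d_in => d_in, q => qa );
  ub: ENTITY work.shift3(reverse_order) PORT MAP ( clk => clk, d_in => d_in, q => qb );

  PROCESS
  BEGIN
    WAIT UNTIL rising_edge(clk);  WAIT FOR 1 ns;   -- 8 is now in s1
    d_in <= "0011";                                -- next input is 3
    WAIT UNTIL rising_edge(clk);  WAIT FOR 1 ns;   -- 8 in s2, 3 in s1
    d_in <= "0111";                                -- next input is 7
    WAIT UNTIL rising_edge(clk);  WAIT FOR 1 ns;   -- 8 in s3
    ASSERT qa = "1000" AND qb = "1000"
      REPORT "8 should reach s3 after three clock edges" SEVERITY ERROR;
    d_in <= "0100";                                -- next input is 4
    WAIT UNTIL rising_edge(clk);  WAIT FOR 1 ns;   -- 3 in s3
    ASSERT qa = "0011" AND qb = "0011"
      REPORT "3 should follow 8 one cycle later" SEVERITY ERROR;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

Both instances produce the same outputs on every cycle, because all three right-hand sides are sampled before any of the signals is updated.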

• That assigning a signal inside a clocked process causes a register to be inferred in the
synthesized hardware, and a clock cycle of delay in the behaviour
• How to construct a shift register using a series of signal assignments inside a clocked
process.
Unit 3.3
Controlling Register Inference using Signals and Variables
In this lecture, we will see how to control when registers are inserted into our circuits. A
VHDL signal that is the target of an assignment will cause a register to be inferred.
Sometimes we would like to make an assignment without causing a register. This can be
accomplished using a VHDL variable.
We will also look at pipelining, which is an important application of registers.

Pipelining is a technique that introduces additional registers at intermediate points in
our design in order to be able to operate the design at a high clock frequency.
1 Pipelines
In the last lecture, we looked at the shift register. Now suppose we put some blocks of
logic gates that perform some useful function between each register stage:
[Figure: a pipeline of three stages. Each stage is a block of logic gates followed by a D-type register; all registers are driven by Clk.]
This arrangement is called a pipeline. As we introduce inputs into the pipeline, they
flow along the pipeline at a rate of one stage per clock cycle as follows:
Before the first rising clock edge: the 1st batch of inputs is presented to the first block
of logic gates.
After the first rising clock edge: the results for the 1st batch of inputs are held in the
first register and feed the second block, while the 2nd batch of inputs is presented to
the first block.
After the 2nd rising clock edge: the results for the 1st batch of inputs have moved on to
the second register, the results for the 2nd batch are held in the first register, and the
3rd batch of inputs is presented to the first block.
And so on.
2 Speed of pipelined datapaths
A system in which data flows in at one end, and out at the other end, after undergoing
some useful processing, is called a datapath. Let’s consider a simple example of a
datapath, an adder tree. Suppose we have four numbers a, b, c, d to add together, but the
adders that we have available to us only accommodate two inputs. This can be solved
by an adder tree arrangement, shown below:
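[Figure: adder tree. One adder forms e = a + b and a second forms f = c + d; a third adder adds e and f, and its output is captured in a register, driven by clock, to give sum.]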
This takes four numbers a, b, c, d and adds them all together, catching them in an output
register.
Suppose we have a timing constraint that says that the circuit must operate from a 100
MHz clock, but the delay of each adder is 6 ns. If a set of inputs is applied at a, b, c, d
then the values at e, f will be garbage for the next 6 ns and the value at sum will be
garbage for the next 12 ns. Thus our sum output will not be valid within the 10 ns
timeframe in which we want to apply a clock edge to sample the output sum.
This can be resolved by inserting registers at e and f:
[Figure: the same adder tree with registers added at e and f, so that every adder output is captured in a register driven by clock.]
Each of the register stages has stable inputs after only 6 ns, so we can apply
a clock edge to all registers after 10 ns and meet our timing constraint.
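The arithmetic behind this argument can be written out as follows. This is a simplified sketch that assumes the register setup time and clock-to-q delay are negligible compared with the 6 ns adder delay; the symbol names are introduced here for illustration only.

```latex
\begin{align*}
T_{\text{clk}} &= \frac{1}{100\ \text{MHz}} = 10\ \text{ns} \\
t_{\text{unpipelined}} &= 2 \times 6\ \text{ns} = 12\ \text{ns} > T_{\text{clk}}
  \quad \text{(timing constraint missed)} \\
t_{\text{pipelined}} &= 6\ \text{ns} < T_{\text{clk}}
  \quad \text{(timing constraint met)}
\end{align*}
```

The price of the extra register stage is latency: the pipelined sum appears two clock edges after the inputs are applied rather than one.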
3 Controlling pipelining with signals

The way we write our VHDL code will have an important influence on the way that
registers are introduced. This is due to the rule that whenever a signal appears on the left
hand side of an assignment, a register will be inferred by synthesis. If we write the code
like this:
ARCHITECTURE pipelined OF adder IS
SIGNAL e, f: SIGNED ( 31 DOWNTO 0 );
BEGIN
PROCESS (clock)
BEGIN
IF ( rising_edge(clock) ) THEN
e <= a + b;
f <= c + d;
sum <= e + f;
END IF;
END PROCESS;
END ARCHITECTURE pipelined;
it will synthesize like this, with registers for e, f and sum.
[Figure: adder tree with registers at e, f and sum, all driven by clock.]
If we want to have a register only at the sum output, then we could write our code like
this:
ARCHITECTURE no_pipeline OF adder IS
BEGIN
PROCESS (clock)
BEGIN
IF ( rising_edge(clock) ) THEN
sum <= a + b + c + d;
END IF;
END PROCESS;
END ARCHITECTURE no_pipeline;
This would synthesize to this, with just a single register stage.
[Figure: adder tree with a single register at the sum output, driven by clock.]
4 VHDL variables
Sometimes we find that we want to refer to intermediate terms in our code, like
e and f, without causing registers to be inferred by the synthesis tool as occurred above.
This can be achieved by declaring the intermediate terms as variables rather than signals.
Variables are like signals, with the exception that assignments take effect immediately
(as opposed to signals, whose assignments take a Δ delay). A variable can only exist
inside a process, and must be declared between the PROCESS and the corresponding
BEGIN statement. In order to remind us of the difference in behaviour, signals and
variables use a different assignment operator:
b <= c; -- b is a signal
b := c; -- b is a variable
A variable will assume its new value immediately whenever an assignment occurs (unlike
a signal which receives a new value from an assignment after the process has completed).
This means that using variables to represent e and f will not cause registers to be inferred.
This is shown below:
ARCHITECTURE rtl3 OF adder IS
BEGIN
PROCESS (clock)
VARIABLE e, f: SIGNED ( 31 DOWNTO 0 );
BEGIN
IF ( rising_edge(clock) ) THEN
e := a + b;
f := c + d;
sum <= e + f;
END IF;
END PROCESS;
END ARCHITECTURE rtl3;
[Figure: adder tree with a single register at the sum output; e and f are not registered.]
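The difference between the signal and variable versions shows up in simulation as a difference in latency. The following sketch compares the two; the entity name adder_tree, the architecture names, the zero initial values given to e and f and the test values 1, 2, 3, 4 are assumptions made for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY adder_tree IS
  PORT ( a, b, c, d : IN  SIGNED(31 DOWNTO 0);
         clock      : IN  STD_LOGIC;
         sum        : OUT SIGNED(31 DOWNTO 0) );
END ENTITY adder_tree;

-- e and f are signals: registers are inferred for e, f and sum
ARCHITECTURE with_signals OF adder_tree IS
  SIGNAL e, f : SIGNED(31 DOWNTO 0) := (OTHERS => '0');
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      e   <= a + b;
      f   <= c + d;
      sum <= e + f;   -- uses the values of e and f from the previous cycle
    END IF;
  END PROCESS;
END ARCHITECTURE with_signals;

-- e and f are variables: only sum is registered
ARCHITECTURE with_variables OF adder_tree IS
BEGIN
  PROCESS (clock)
    VARIABLE e, f : SIGNED(31 DOWNTO 0);
  BEGIN
    IF rising_edge(clock) THEN
      e   := a + b;
      f   := c + d;
      sum <= e + f;   -- uses the values just computed
    END IF;
  END PROCESS;
END ARCHITECTURE with_variables;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY adder_tree_tb IS
END ENTITY adder_tree_tb;

ARCHITECTURE tb OF adder_tree_tb IS
  SIGNAL a : SIGNED(31 DOWNTO 0) := TO_SIGNED(1, 32);
  SIGNAL b : SIGNED(31 DOWNTO 0) := TO_SIGNED(2, 32);
  SIGNAL c : SIGNED(31 DOWNTO 0) := TO_SIGNED(3, 32);
  SIGNAL d : SIGNED(31 DOWNTO 0) := TO_SIGNED(4, 32);
  SIGNAL sum_sig, sum_var : SIGNED(31 DOWNTO 0);
  SIGNAL clock : STD_LOGIC := '0';
  SIGNAL done  : BOOLEAN := FALSE;
BEGIN
  clock <= NOT clock AFTER 5 ns WHEN NOT done ELSE '0';

  u_sig: ENTITY work.adder_tree(with_signals)
    PORT MAP ( a => a, b => b, c => c, d => d, clock => clock, sum => sum_sig );
  u_var: ENTITY work.adder_tree(with_variables)
    PORT MAP ( a => a, b => b, c => c, d => d, clock => clock, sum => sum_var );

  PROCESS
  BEGIN
    WAIT UNTIL rising_edge(clock);  WAIT FOR 1 ns;
    ASSERT sum_var = 10 REPORT "variable version: sum after one edge" SEVERITY ERROR;
    ASSERT sum_sig = 0  REPORT "signal version: sum not yet available" SEVERITY ERROR;
    WAIT UNTIL rising_edge(clock);  WAIT FOR 1 ns;
    ASSERT sum_sig = 10 REPORT "signal version: sum after two edges" SEVERITY ERROR;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

The variable version delivers the sum after one clock edge, while the signal version needs two, reflecting the extra register stage at e and f.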
5 Summary
Pipelining introduces registers at intermediate stages in our design so that the overall
combinational delay between register stages is reduced, thus enabling us to use a higher
clock frequency. We can control inference of register stages by the way that we write
our VHDL code. Signals that are the target of an assignment cause a register to be
inferred. Variables that are the target of an assignment do not cause a register to be inferred.

• How to use variables and signals to control the pipelining of a datapath
• Why pipelining a datapath increases the clock frequency that can be used
Unit 3.4
Finite State Machine Design using VHDL
Now that we have a good grasp of RTL descriptions, we will put them to work by
looking at two of the most important sequential logic building blocks: finite state
machines (in this lecture) and counters (in the next lecture). We will develop examples
of these for specific problems, but in so doing we will generate adaptable templates that
can be easily adjusted for other situations.
A finite state machine is a machine that can generate sequences and/or respond to input
sequences. We will illustrate this through a simple FSM called seq_detect.
[Figure: block diagram of seq_detect, with inputs x and clock and output z.]
Its behaviour is that on each rising clock edge, a new value is read at input x. The output z
goes high when the input sequence ends in 101. This can be illustrated with an example
input and output sequence:
x=0011011001010100
z=0000010000010100
Each time we see the input sequence x=101 on successive clock cycles, the output z will
go to 1. Note that in the example above, the x=1 that completed the second occurrence
of 101 is also the first x=1 in a new occurrence.
1 The idea of state

This device is obviously not combinational. If we have x=1, what would z be? If on the
previous clock cycle x had been 0 and the cycle before that it was 1, then z=1.
Otherwise z=0. So the present value of input x alone is not sufficient to determine the
output of the device. The present value of the input needs to be combined with
something else, something that summarises the relevant aspects of the device’s history.
This is given the name “state”. In this case, we can identify four states, which we will
label as S0, S1, S2 and S3.
• S0 - The most recent inputs have not given us any of the 101 sequence
• S1 - The last input was a 1, the first item of the sequence
• S2 - The last 2 inputs were 10, the first two items of the sequence
• S3 - The last 3 inputs were 101, the whole sequence
The device will output z=1 when it reaches state S3. In the other states, it will output
z=0.
We complete the design by defining how the input x causes transitions between these
states. Once we have done this, we can create our VHDL description.
2 The state diagram
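The state diagram shows how new values for x cause transitions between states:
[Figure: state diagram with states S0, S1, S2 and S3. Each arc is labelled with the input x that causes the transition, and each state is labelled with its output z (z=1 in S3, z=0 elsewhere).]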
At turn-on or reset, we start in state S0. If we then receive x=0, we stay in S0 because we
have not seen any part of the required sequence. If, on the other hand, we receive x=1,
we transition to S1 because we have seen the first item in the sequence.
When we are in S1, if we receive x=0 then we transition to S2 because we have now
seen the first two items of the required sequence. If x=1, then we stay in state S1
because this could be the first item in a new 101 sequence. Following this thought
process through, you can see how the rest of the diagram is constructed.
3 The state table

The state table conveys the same information as the state diagram, but in tabular form.
For each state that the device could be in, it lists what the next state will be for any
possible value of the input.
Present state    Next state
                 x=0    x=1
S0               S0     S1
S1               S2     S1
S2               S0     S3
S3               S2     S1
4 Describing this in VHDL

The entity is straightforward to define:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY seq_detect IS
PORT ( x, clock: IN STD_LOGIC; z: OUT STD_LOGIC);
END ENTITY seq_detect ;
In order to define the architecture, we need some way to represent the states S0, S1, S2,
S3 in a way suitable for implementing as a digital memory signal. Four distinct states
can be represented as two bits. There are many different choices that we could make,
but we’ll make an arbitrary choice to represent our states as S0=00, S1=01, S2=10,
S3=11. We can now produce our code:
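ARCHITECTURE rtl OF seq_detect IS
-- Define the states
SIGNAL state: STD_LOGIC_VECTOR(1 DOWNTO 0);
CONSTANT s0: STD_LOGIC_VECTOR(1 DOWNTO 0):="00";
CONSTANT s1: STD_LOGIC_VECTOR(1 DOWNTO 0):="01";
CONSTANT s2: STD_LOGIC_VECTOR(1 DOWNTO 0):="10";
CONSTANT s3: STD_LOGIC_VECTOR(1 DOWNTO 0):="11";
BEGIN
-- Only make state transitions on the rising edge of the clock
PROCESS (clock)
BEGIN
IF (rising_edge(clock) ) THEN
-- Define our state transitions, as per the state table
CASE state IS
WHEN s0 => IF x='0' THEN state <= s0;
ELSE state <= s1;
END IF;
WHEN s1 => IF x='0' THEN state <= s2;
ELSE state <= s1;
END IF;
WHEN s2 => IF x='0' THEN state <= s0;
ELSE state <= s3;
END IF;
WHEN s3 => IF x='0' THEN state <= s2;
ELSE state <= s1;
END IF;
-- Any other value (e.g. an uninitialised state in simulation) goes to s0
WHEN OTHERS => state <= s0;
END CASE;
END IF;
END PROCESS;
-- Output a 1 when in state s3
z <= '1' WHEN state = s3 ELSE '0';
END ARCHITECTURE rtl;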
There are four key parts of the code:

• Define the state encoding in the declarations section
• Use a clocked PROCESS to ensure that transitions happen synchronously
• Use a CASE statement to define the state transitions, as per the state table
• Define the output values for z produced by the states

• How to construct state diagrams and state tables for synchronous finite state
machines
• How to write VHDL descriptions of synchronous finite state machines
• How to enforce particular state encodings on finite state machines, and how to leave
the synthesis tool free to choose its preferred encoding
• How to deal with unused states, and ensure that they transition to appropriate states
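The example input and output sequences for the 101 detector can be used directly as a self-checking test. The sketch below is an assumption-laden illustration: it uses a stand-alone copy of the detector called seq_detect_enum (a hypothetical name) that declares the states as an enumerated type, leaving the synthesis tool free to choose the encoding, and it assumes a 10 ns clock.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY seq_detect_enum IS
  PORT ( x, clock : IN STD_LOGIC;  z : OUT STD_LOGIC );
END ENTITY seq_detect_enum;

ARCHITECTURE rtl OF seq_detect_enum IS
  TYPE state_type IS (s0, s1, s2, s3);   -- encoding left to the tool
  SIGNAL state : state_type := s0;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      CASE state IS                      -- transitions from the state table
        WHEN s0 => IF x = '1' THEN state <= s1; ELSE state <= s0; END IF;
        WHEN s1 => IF x = '1' THEN state <= s1; ELSE state <= s2; END IF;
        WHEN s2 => IF x = '1' THEN state <= s3; ELSE state <= s0; END IF;
        WHEN s3 => IF x = '1' THEN state <= s1; ELSE state <= s2; END IF;
      END CASE;
    END IF;
  END PROCESS;
  z <= '1' WHEN state = s3 ELSE '0';
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY seq_detect_tb IS
END ENTITY seq_detect_tb;

ARCHITECTURE tb OF seq_detect_tb IS
  -- Input sequence and expected output, taken from the example in the text
  CONSTANT x_seq : STD_LOGIC_VECTOR(0 TO 15) := "0011011001010100";
  CONSTANT z_seq : STD_LOGIC_VECTOR(0 TO 15) := "0000010000010100";
  SIGNAL x, z  : STD_LOGIC;
  SIGNAL clock : STD_LOGIC := '0';
  SIGNAL done  : BOOLEAN := FALSE;
BEGIN
  clock <= NOT clock AFTER 5 ns WHEN NOT done ELSE '0';

  uut: ENTITY work.seq_detect_enum(rtl)
    PORT MAP ( x => x, clock => clock, z => z );

  PROCESS
  BEGIN
    FOR i IN 0 TO 15 LOOP
      x <= x_seq(i);                 -- apply the next input bit
      WAIT UNTIL rising_edge(clock); -- the FSM samples it on this edge
      WAIT FOR 1 ns;
      ASSERT z = z_seq(i)
        REPORT "z does not match the expected sequence" SEVERITY ERROR;
    END LOOP;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

Here z is checked just after the clock edge that samples each input bit, since the output depends on the state that the machine has just entered.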
Unit 3.5
Creating a Display Driver using VHDL
The boards that we will use in lab have an 8-digit 7-segment display that can display up
to 8 hexadecimal digits:
In this lecture, we will put our knowledge of RTL and finite state machines to work to
produce a driver for the display. To keep the example small, we will only drive four of
the digits, but the extension to all 8 digits is straightforward. The driver will consist of
an entity that has a 16-bit input called number and outputs control signals SEGMENTS
and DIGITS. The value of number will appear in hex on the display. So, for example, if
number=0001001000110100, then 1234 would appear on the display.
1 The seven segment display driver

The first item that we need to design is the 7-segment display driver for an individual
digit.
[Figure: numbering of the segments of one digit. Example: when input = 0010, output = 0100100.]
We take a 4-bit unsigned binary number as the input, and produce the required signals
to light up the appropriate LED segments on the display. These LED segments are
active low. They turn on when the corresponding input is low, and turn off when that
input is high. Bearing this in mind (and also bearing in mind that segment 7, the decimal
point, should always be off) we can produce the VHDL description of the driver:
WITH number SELECT
SEGMENTS <= "11000000" WHEN "0000", --0
"11111001" WHEN "0001", --1
"10100100" WHEN "0010", --2
"10110000" WHEN "0011", --3
"10011001" WHEN "0100", --4
"10010010" when "0101", --5
"10000010" WHEN "0110", --6
"11111000" WHEN "0111", --7
"10000000" WHEN "1000", --8
"10010000" WHEN "1001", --9
"10001000" WHEN "1010", --A
"10000011" WHEN "1011", --b
"11000110" WHEN "1100", --C
"10100001" WHEN "1101", --d
"10000110" WHEN "1110", --E
"10001110" WHEN "1111", --F
"11111111" WHEN OTHERS; --Default
2 Driving multiple digits on the display

In the previous section we looked at how to build a driver for a single digit of a
7-segment display. However, the board display has multiple digits. In this example, we
will drive 4 of the digits. Rather than having many separate sets of 8 input wires, one
for each of the 4 digits, the display is multiplexed. There is only one 8-bit driver for the
segments (segments) and a separate 8-bit input (digit) that determines which digit is
being addressed. The digit input uses zero-hot encoding, so for the four digits that we
drive, 0111 means that digit 3 is being addressed, 1011 is digit 2, 1101 is digit 1 and 1110
is digit 0. The display unit will then step through the digits one at a time (by causing digit
to go cyclically through the sequence 0111, 1011, 1101, 1110) and output the required
values for that digit on segments. Although only one digit at a time is driven, the
persistence of the LEDs and the insensitivity of our eyes to short durations mean that the
display will look as if all digits are being driven at the same time. However, the
persistence of the LEDs causes a problem. If we use the system clock of the board, which
is 100 MHz, to carry out the multiplexing, this is too fast compared to the persistence of
the LEDs, and all LEDs will look half-on, half-off all of the time. The correct frequency
to multiplex is about 1 kHz (i.e. a period of 1 millisecond).
In order to do this, we create a state machine with 4 states. The output is the signal
called DIGITS. Each of the 4 states outputs the appropriate code on DIGITS to light up
one of the digits of the display. The input that drives the transitions is the signal called
change. This goes high once every millisecond.
The code for this finite state machine is shown below:

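-- Declarations
TYPE my_states IS (s0, s1, s2, s3 );
SIGNAL state: my_states:=s0;
-- Architecture body: step to the next digit whenever change is high
PROCESS (CLK100MHZ)
BEGIN
IF (rising_edge(CLK100MHZ) ) THEN
CASE state IS
WHEN s0 => IF change='1' THEN state <= s1;
ELSE state <= s0; END IF;
WHEN s1 => IF change ='1' THEN state <= s2;
ELSE state <= s1; END IF;
WHEN s2 => IF change ='1' THEN state <= s3;
ELSE state <= s2; END IF;
WHEN s3 => IF change ='1' THEN state <= s0;
ELSE state <= s3; END IF;
END CASE;
END IF;
END PROCESS;
-- Select the digit and the 4-bit slice of number that it displays
process(state, number)
begin
case state is
when s0 => DIGITS<="0111"; slice<=number(15 downto 12);
when s1 => DIGITS<="1011"; slice <=number(11 downto 8);
when s2 => DIGITS<="1101"; slice <=number(7 downto 4);
when others => DIGITS<="1110"; slice<= number(3 downto 0);
end case;
end process;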
3 Generating the correct timing for the signal “change”

The clock generated on the board and fed into the FPGA has a frequency of 100 MHz.
To make our signal change go high once every millisecond we create a 100 MHz
counter that will count to 100 thousand (binary 11000011010100000, or hex 186A0)
then reset itself to zero and start all over again:
SIGNAL count: UNSIGNED(19 DOWNTO 0):=(OTHERS=>'0') ;
SIGNAL change: STD_LOGIC;
CONSTANT period: UNSIGNED(19 DOWNTO 0):=X"186A0"; -- 100 thousand clock cycles
PROCESS(CLK100MHZ)
BEGIN
IF rising_edge(CLK100MHZ) THEN -- On each clock cycle ...
count <= count + 1 ; -- Increment the counter
IF count >= period - 1 THEN -- If it's time to change digit
count <= (others => '0') ; -- Reset the timer (overrides the increment above)
change <= '1';
ELSE
change <='0';
END if;
END IF;
END PROCESS;
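Simulating 100 thousand clock cycles just to see one pulse is slow, so it can help to make the period a generic and test with a small value. The sketch below is an illustration only: the entity name tick_gen, the generic period and the use of an integer counter are assumptions, not the lab code.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical tick generator: change is high for one clock cycle
-- in every 'period' clock cycles.
ENTITY tick_gen IS
  GENERIC ( period : POSITIVE := 100000 );
  PORT ( clk : IN STD_LOGIC;  change : OUT STD_LOGIC );
END ENTITY tick_gen;

ARCHITECTURE rtl OF tick_gen IS
  SIGNAL count : NATURAL RANGE 0 TO period - 1 := 0;
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF count = period - 1 THEN
        count  <= 0;            -- restart the timer
        change <= '1';
      ELSE
        count  <= count + 1;
        change <= '0';
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY tick_gen_tb IS
END ENTITY tick_gen_tb;

ARCHITECTURE tb OF tick_gen_tb IS
  SIGNAL clk    : STD_LOGIC := '0';
  SIGNAL change : STD_LOGIC;
  SIGNAL done   : BOOLEAN := FALSE;
BEGIN
  clk <= NOT clk AFTER 5 ns WHEN NOT done ELSE '0';

  uut: ENTITY work.tick_gen(rtl)
    GENERIC MAP ( period => 4 )     -- short period for simulation only
    PORT MAP ( clk => clk, change => change );

  PROCESS
  BEGIN
    FOR n IN 1 TO 12 LOOP           -- n counts rising clock edges
      WAIT UNTIL rising_edge(clk);
      WAIT FOR 1 ns;
      IF n MOD 4 = 0 THEN
        ASSERT change = '1' REPORT "expected a pulse on this cycle" SEVERITY ERROR;
      ELSE
        ASSERT change = '0' REPORT "unexpected pulse" SEVERITY ERROR;
      END IF;
    END LOOP;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

With the default generic value and a 100 MHz clock, the same entity would pulse once every millisecond.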
• How to write VHDL descriptions of finite state machines and counter circuits
• How to generate timer signals at regular intervals
Unit 3.6
Memory Design in VHDL
A memory is a piece of hardware that contains a series of storage locations for data.
Different types of memory permit different types of operations on these storage
locations; which location is the target of the current operation is determined by an
address input.
1 A simple memory circuit

As a specific example, imagine that we wanted to store the marks for 16 students in a
read-only memory (ROM). Each student has an ID number in the range 0 to 15. The
student marks are shown in the table below. In order to code this information, we need
to express this data in hexadecimal. Then we embed this data within a memory. In our
example we will use 16 bits (i.e. 4 hexadecimal digits) for our data words:
Marks list (denary)      Memory contents (hexadecimal)
Student   Mark           Student   Mark
0         72             0         48
1         49             1         31
2         67             2         43
3         53             3         35
4         43             4         2B
5         57             5         39
6         61             6         3D
7         37             7         25
8         48             8         30
9         55             9         37
10        79             A         4F
11        51             B         33
12        40             C         28
13        61             D         3D
14        58             E         3A
15        62             F         3E
The index of the list runs from 0 to 15 denary (0 to F hex), which needs four unsigned
binary bits (i.e. one hex digit) to represent. A mark from 0 to 100 denary (0 to 64 hex)
could be represented in 7 bits, but we want to allow for the possibility of negative marks
(e.g. due to penalties), so the data words are signed; as stated above, they are 16 bits
wide in this example.
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.all;
ENTITY rom IS
PORT ( address: IN UNSIGNED (3 DOWNTO 0);
data : OUT SIGNED (15 DOWNTO 0) );
END ENTITY rom;
ARCHITECTURE dataflow OF rom IS

TYPE rom_array IS ARRAY ( 0 TO 15 ) OF SIGNED ( 15 DOWNTO 0 );
SIGNAL rom_data: rom_array := ( X"0048", X"0031", X"0043", X"0035",
X"002B", X"0039", X"003D", X"0025",
X"0030", X"0037", X"004F", X"0033",
X"0028", X"003D", X"003A", X"003E" );
BEGIN
data <= rom_data ( TO_INTEGER (address) );
END ARCHITECTURE dataflow;
A memory is represented in VHDL by an array. The address input is converted to an
integer and then used as an index into the array to pick out a particular value. If, for
example, we give the binary address input value of 0110, data item number 6 (i.e. a value
of 3D) will be transferred to the output. The line of code that achieves this is:
data <= rom_data ( TO_INTEGER (address) );
This memory is a ROM (read only memory), as we can only read existing values from it;
we can’t write new values into it.
A memory that had the capability to write a new value from a data input into a memory
location would achieve that like this:
mem_data ( TO_INTEGER (address) ) <= data;
This is a single port memory, as we give it one address, and we receive back one data
item. That means that we can only read one item at a time. If we wanted to read two
different items, we would have to do two read operations one after the other; we can’t
read both all in one go.
1.1 Testbench for the ROM

We simulate the ROM by constructing a testbench that includes an instantiation of the
ROM and supplies a sequence of test inputs to the ROM. The following code will apply
an address every 10 ns, with the address ranging from 0 to 15 expressed as a 4-bit
unsigned number. The final WAIT statement causes the process to suspend forever. If we
did not have the final WAIT statement, then the process would reach the end process and
wrap back round to the beginning applying the sequence 0..15 repeatedly.
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;
ENTITY rom_test IS
END ENTITY rom_test;
ARCHITECTURE tb OF rom_test IS
SIGNAL input_address: UNSIGNED(3 DOWNTO 0);
SIGNAL output_data: SIGNED(15 DOWNTO 0);
BEGIN
uut: ENTITY work.rom(dataflow)
PORT MAP ( address=>input_address, data=>output_data);
PROCESS --Apply a new test input every 10 ns
BEGIN
FOR i IN 0 TO 15 LOOP
input_address <= TO_UNSIGNED(i,4);
WAIT FOR 10 NS;
END LOOP;
WAIT;
END PROCESS;
END ARCHITECTURE tb;
The loop index i is implicitly declared to be of integer type by the fact that 0 TO 15 is an
integer range. We do not need to explicitly declare i in our code.
1.2 Synchronising the ROM to a clock
It is common for a ROM to be synchronised to a clock. To achieve this, we would need
to add a clock signal to its ENTITY, and replace the body of the architecture with this:
PROCESS(clock)
BEGIN
IF RISING_EDGE(clock) THEN
data <= rom_data ( TO_INTEGER (address) );
END IF;
END PROCESS;
1.3 Testbench for the ROM with clock

To create the testbench for the ROM that is synchronised to a clock, we need to add to
our testbench a process that generates a clock. When simulation is carried out, the
process starts running immediately at time 0. When it gets to the WAIT statement, it is
suspended. After 5 ns of simulation time has gone by, the process resumes execution
until it hits the next WAIT statement. After another 5 ns has elapsed, the process
resumes, reaches the END PROCESS, wraps back round to the BEGIN and continues
execution. The process runs forever; every 5 ns the clock signal toggles between 0 and
1.
ARCHITECTURE tb OF rom_test IS
SIGNAL input_address: UNSIGNED(3 DOWNTO 0);
SIGNAL output_data: SIGNED(15 DOWNTO 0);
SIGNAL clock: std_logic;
BEGIN
PROCESS --Generate clock
BEGIN
clock <= '0';
WAIT FOR 5 ns;
clock <= '1';
WAIT FOR 5 ns;
END PROCESS;
uut: ENTITY work.rom(rtl)
PORT MAP ( address=>input_address,
clock=>clock, data=>output_data);
PROCESS --Apply new test input on each clock
BEGIN
FOR i IN 0 TO 15 LOOP
input_address <= TO_UNSIGNED(i,4);
WAIT UNTIL rising_edge(clock);
END LOOP;
WAIT;
END PROCESS;
END ARCHITECTURE tb;
The second process in the testbench loops over values from 0 to 15, advancing by one on
each clock cycle. The loop index is converted to a 4-bit unsigned number and then applied
as the address to the ROM. The final WAIT statement suspends the process forever, thus
preventing the process from wrapping back to the beginning and applying the test inputs
repeatedly.
2 Multiport Memory
A multiport memory operates on multiple locations simultaneously. For example, a dual
port ROM would look like this:
In each read cycle, we would apply two addresses and the two indexed data items would
appear at the outputs data1 and data2. So, for example, if we apply the inputs
address1=0010 and address2=0100, we would see at the outputs data1=43 and
data2=2B.
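A dual port ROM of this kind might be described as follows. This is a minimal sketch, assuming the same marks data and 16-bit signed data words as the single port ROM; the entity name dual_port_rom and the testbench are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY dual_port_rom IS
  PORT ( address1, address2 : IN  UNSIGNED(3 DOWNTO 0);
         data1, data2       : OUT SIGNED(15 DOWNTO 0) );
END ENTITY dual_port_rom;

ARCHITECTURE dataflow OF dual_port_rom IS
  TYPE rom_array IS ARRAY (0 TO 15) OF SIGNED(15 DOWNTO 0);
  CONSTANT rom_data : rom_array := ( X"0048", X"0031", X"0043", X"0035",
                                     X"002B", X"0039", X"003D", X"0025",
                                     X"0030", X"0037", X"004F", X"0033",
                                     X"0028", X"003D", X"003A", X"003E" );
BEGIN
  -- Two independent read ports onto the same array
  data1 <= rom_data(TO_INTEGER(address1));
  data2 <= rom_data(TO_INTEGER(address2));
END ARCHITECTURE dataflow;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY dual_port_rom_tb IS
END ENTITY dual_port_rom_tb;

ARCHITECTURE tb OF dual_port_rom_tb IS
  SIGNAL address1 : UNSIGNED(3 DOWNTO 0) := "0010";
  SIGNAL address2 : UNSIGNED(3 DOWNTO 0) := "0100";
  SIGNAL data1, data2 : SIGNED(15 DOWNTO 0);
BEGIN
  uut: ENTITY work.dual_port_rom(dataflow)
    PORT MAP ( address1 => address1, address2 => address2,
               data1 => data1, data2 => data2 );

  PROCESS
  BEGIN
    WAIT FOR 1 ns;
    ASSERT data1 = 67 REPORT "location 2 should hold 67 (hex 43)" SEVERITY ERROR;
    ASSERT data2 = 43 REPORT "location 4 should hold 43 (hex 2B)" SEVERITY ERROR;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

The only difference from the single port ROM is the second address input and the second indexed read of the same array.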
To illustrate why a multi-port memory might be useful, suppose we wanted to compute

the sum of two student marks. We could attach our memory’s outputs to an ALU. We
would then give the memory the two addresses of the marks that we want to add
together, and we would give the ALU the opcode for addition.
This would cause the sum of the two data items to appear at the result output.
However, this is of limited use as we cannot recycle the result for further arithmetic
processing.
3 Register file
This becomes much more useful if we add a third port (address3, data3) which is
capable of writing data back into the memory array. This results in a type of memory
that is known as a register file.
Suppose we wanted to compute the average student mark. Our first step would be to add
all of the marks together, then we would divide by the total number. We can achieve the
addition of all of the marks in a sequence of steps like this:
Step  Address1  Address2  Address3  Opcode  Result
1     0         1         0         Add     mark 0 + mark 1
2     0         2         0         Add     mark 0 + mark 1 + mark 2
3     0         3         0         Add     mark 0 + mark 1 + mark 2 + mark 3
And so on …
Once we have read out the student mark held at location 0 (which is 48) we no longer
need to preserve the value at that location, so in subsequent cycles we use location 0 to
form a subtotal.
This is the basic idea of how a computer program works. In successive cycles, we apply
a series of binary numbers at address1, address2, address3 and opcode to instruct the
hardware as to the function it should perform and where it can find its operand data.
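The first three steps of the table above can be tried out with a sketch of a register file. The details here are assumptions for illustration: the entity name register_file, 16-bit signed data words, a write to address3 on every rising clock edge, and a plain adder in the testbench standing in for the ALU with its opcode set to Add.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY register_file IS
  PORT ( clock                        : IN  STD_LOGIC;
         address1, address2, address3 : IN  UNSIGNED(3 DOWNTO 0);
         data3                        : IN  SIGNED(15 DOWNTO 0);   -- write port
         data1, data2                 : OUT SIGNED(15 DOWNTO 0) ); -- read ports
END ENTITY register_file;

ARCHITECTURE rtl OF register_file IS
  TYPE reg_array IS ARRAY (0 TO 15) OF SIGNED(15 DOWNTO 0);
  SIGNAL regs : reg_array := ( X"0048", X"0031", X"0043", X"0035",
                               X"002B", X"0039", X"003D", X"0025",
                               X"0030", X"0037", X"004F", X"0033",
                               X"0028", X"003D", X"003A", X"003E" );
BEGIN
  data1 <= regs(TO_INTEGER(address1));
  data2 <= regs(TO_INTEGER(address2));
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      regs(TO_INTEGER(address3)) <= data3;   -- write back on each clock edge
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY register_file_tb IS
END ENTITY register_file_tb;

ARCHITECTURE tb OF register_file_tb IS
  SIGNAL clock : STD_LOGIC := '0';
  SIGNAL address1, address3 : UNSIGNED(3 DOWNTO 0) := "0000";
  SIGNAL address2 : UNSIGNED(3 DOWNTO 0) := "0001";
  SIGNAL data1, data2, result : SIGNED(15 DOWNTO 0);
  SIGNAL done : BOOLEAN := FALSE;
BEGIN
  clock <= NOT clock AFTER 5 ns WHEN NOT done ELSE '0';

  uut: ENTITY work.register_file(rtl)
    PORT MAP ( clock => clock, address1 => address1, address2 => address2,
               address3 => address3, data3 => result,
               data1 => data1, data2 => data2 );

  -- Stands in for the ALU with opcode Add. A numeric_std metavalue warning
  -- at 0 ns is expected while data1 and data2 are still uninitialised.
  result <= data1 + data2;

  PROCESS
  BEGIN
    FOR i IN 1 TO 3 LOOP             -- steps 1 to 3 of the table
      address2 <= TO_UNSIGNED(i, 4);
      WAIT UNTIL rising_edge(clock);
      WAIT FOR 1 ns;
    END LOOP;
    ASSERT data1 = 241               -- 72 + 49 + 67 + 53
      REPORT "location 0 should hold the subtotal of marks 0 to 3" SEVERITY ERROR;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

After three clock edges location 0 holds the running subtotal, which is the pattern that the remaining steps would continue.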
4 Read and Write memory (RAM)

A memory that is capable of reading from a particular address or writing to it is shown
below.
If write_enable=0, then the operation is a read operation. The data_in input is ignored,
and the data item appearing at the location indexed by the address is sent to the data_out
output.
If write_enable=1, then the operation is a write. The value at data_in overwrites the data
held at the location indexed by the address input.
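A memory with this behaviour might be sketched as below. Several details are assumptions rather than part of the description above: the write is synchronised to a clock input, the read is combinational, the memory has 16 locations of 16 bits initialised to zero, and the names ram and ram_tb are hypothetical.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY ram IS
  PORT ( clock, write_enable : IN  STD_LOGIC;
         address             : IN  UNSIGNED(3 DOWNTO 0);
         data_in             : IN  SIGNED(15 DOWNTO 0);
         data_out            : OUT SIGNED(15 DOWNTO 0) );
END ENTITY ram;

ARCHITECTURE rtl OF ram IS
  TYPE mem_array IS ARRAY (0 TO 15) OF SIGNED(15 DOWNTO 0);
  SIGNAL mem_data : mem_array := (OTHERS => (OTHERS => '0'));
BEGIN
  -- Read: the addressed location always appears at data_out
  data_out <= mem_data(TO_INTEGER(address));
  -- Write: only when write_enable is high at a rising clock edge
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF write_enable = '1' THEN
        mem_data(TO_INTEGER(address)) <= data_in;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY ram_tb IS
END ENTITY ram_tb;

ARCHITECTURE tb OF ram_tb IS
  SIGNAL clock        : STD_LOGIC := '0';
  SIGNAL write_enable : STD_LOGIC := '1';
  SIGNAL address      : UNSIGNED(3 DOWNTO 0) := "0011";
  SIGNAL data_in      : SIGNED(15 DOWNTO 0) := TO_SIGNED(85, 16);
  SIGNAL data_out     : SIGNED(15 DOWNTO 0);
  SIGNAL done         : BOOLEAN := FALSE;
BEGIN
  clock <= NOT clock AFTER 5 ns WHEN NOT done ELSE '0';

  uut: ENTITY work.ram(rtl)
    PORT MAP ( clock => clock, write_enable => write_enable, address => address,
               data_in => data_in, data_out => data_out );

  PROCESS
  BEGIN
    WAIT UNTIL rising_edge(clock);  WAIT FOR 1 ns;   -- 85 written to location 3
    ASSERT data_out = 85 REPORT "write did not take effect" SEVERITY ERROR;
    write_enable <= '0';
    data_in <= TO_SIGNED(99, 16);                    -- should be ignored
    WAIT UNTIL rising_edge(clock);  WAIT FOR 1 ns;
    ASSERT data_out = 85 REPORT "data_in should be ignored on a read" SEVERITY ERROR;
    done <= TRUE;
    WAIT;
  END PROCESS;
END ARCHITECTURE tb;
```

The second check confirms that data_in has no effect while write_enable is low.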
The arrangement above shows how small RAMs are normally organised. However, when the
data items are many bits wide, the requirement for separate data_in and data_out ports
can become expensive. Large RAMs, for example the main memory in a computer
system, normally have a bidirectional data input/output which connects to a bidirectional
bus:
Unit 4.1
VHDL Simulation (2)
In unit 2.2, we saw a brief introduction to VHDL simulation. In this unit and the next,
we will look in more detail at how simulation works and some points that can be
confusing to new users of VHDL.
1 The purpose of simulation

The process of verifying a proposed design using simulation looks like this:
So far, we have looked at using simulation for functional verification (does my adder
actually add? does my shifter actually shift?) which is done by applying the appropriate
sequence of 1s and 0s at the inputs, and looking for the required set of 1s and 0s at the
outputs.
However, there are two additional questions that need to be answered:

1. Synthesis ignores some features of VHDL that simulators can use without problem.
Did the synthesis tool ignore some part of my original VHDL in a way that causes
the synthesized hardware to differ in its behaviour from the original simulation for
functional verification?
2. At what range of clock frequencies will my design operate correctly? (And there is
an associated question: which part of my design is causing the limitation on speed; if
we have a good answer to this question, then we can know how to modify the design
to make it comply with timing constraints)
The first question is simply answered by substituting the synthesized circuit instead of
the initial VHDL into the test bench, then re-running the simulator. The second question
is answered by getting the synthesis tool to incorporate component delays into the
VHDL description of the synthesized circuit, and then re-running the simulator.
1.1 Timing constraints

In general, the data path portions of the VHDL will synthesize to a circuit that has the
following general pattern:
[Figure: a data path in which blocks of combinational logic are separated by registers, drawn as dashed lines]
The dashed lines indicate registers. Register transfers take data from left to right one
stage on each clock cycle, passing through blocks of combinational logic. All registers
must run from the same clock, and the shortest permissible clock period is established
by the block with the longest delay.
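As a worked illustration of this rule, suppose (purely hypothetically) that a data path has three combinational blocks with delays of 4 ns, 7 ns and 5 ns. Writing t_i for the delay of block i, and ignoring register overheads such as setup time, the shortest clock period and the corresponding maximum clock frequency would be:

```latex
T_{\mathrm{clk,min}} = \max_i t_i = \max(4,\ 7,\ 5)\ \mathrm{ns} = 7\ \mathrm{ns},
\qquad
f_{\max} = \frac{1}{T_{\mathrm{clk,min}}} = \frac{1}{7\ \mathrm{ns}} \approx 143\ \mathrm{MHz}
```

The 7 ns block is the one that limits the speed, so that is the block we would need to modify to make the design run faster.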
1.2 The VHDL simulation mechanism

The way that a VHDL simulation proceeds is as follows. At the start of simulation, we
are at time 0:
• Signals that have an initialization value in the declaration receive that initialization
value
• All other signals are initialized to ‘U’, the uninitialized garbage value
• All concurrent statements are triggered to execute
• All processes are triggered to execute
• The execution of assignments to signals will cause new values to be scheduled onto
the queue. If no delay is explicitly stated, then a delta delay is used.
Thereafter, simulation proceeds as follows:

• The simulator finds the next scheduled event on the event queue
• The time is incremented to the time of the next scheduled event, and the signal
assignments due for that time are made
• The resulting change in signals may trigger execution of further lines of concurrent
code (if the signals are on the RHS of an assignment) or of a process (if the signals are
on the sensitivity list).
• The execution of assignments to signals will cause new values to be scheduled onto
the queue.
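The start-up rules listed above can be tried out with a small self-checking sketch. The entity name start_demo and the signal names p, q and r are hypothetical, chosen only for this illustration. The signal p has an initialization value, q and r start at ‘U’, and the assignment to q has no explicit delay, so it takes effect one delta after time 0.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY start_demo IS
END ENTITY start_demo;

ARCHITECTURE sim OF start_demo IS
  SIGNAL p : STD_LOGIC := '0';  -- has an initialization value
  SIGNAL q : STD_LOGIC;         -- no initialization value: starts at 'U'
  SIGNAL r : STD_LOGIC;         -- no initialization value: starts at 'U'
BEGIN
  p <= '1' AFTER 10 NS;         -- explicit delay
  q <= NOT p;                   -- no delay stated: a delta delay is used
  r <= q AFTER 3 NS;            -- explicit delay

  check : PROCESS
  BEGIN
    -- Time 0, before any scheduled event has been applied
    ASSERT (p = '0') AND (q = 'U') AND (r = 'U')
      REPORT "unexpected values at start of simulation" SEVERITY ERROR;
    WAIT FOR 0 NS;              -- resume one delta later, still at time 0
    ASSERT q = '1'
      REPORT "q should update one delta after time 0" SEVERITY ERROR;
    WAIT FOR 4 NS;              -- now at 4 ns
    ASSERT r = '1'
      REPORT "r should follow q 3 ns later" SEVERITY ERROR;
    WAIT FOR 10 NS;             -- now at 14 ns
    ASSERT (p = '1') AND (q = '0') AND (r = '0')
      REPORT "unexpected values after the change on p" SEVERITY ERROR;
    WAIT;                       -- stop the checking process
  END PROCESS check;
END ARCHITECTURE sim;
```

If the simulator behaves as described, none of the assertions should report an error.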
2 Simulation with delays

Real logic gates have a delay: when we change the inputs, the corresponding output
appears some time later. Here is an example: a 2-input exclusive OR gate with a 5 ns gate
delay.
ARCHITECTURE delayed OF myGate IS

BEGIN
c <= a XOR b AFTER 5 NS;
END ARCHITECTURE delayed;
For this piece of VHDL code, whenever a or b change, the value of c will be re-
computed, but c will not get its new value until 5 ns after the change in the input.
Suppose the inputs a and b are initially ‘0’, and that a is then driven to ‘1’ at time
20 ns and b at time 40 ns.
At time 20 ns a=’1’ and b=’0’, an input condition that causes c to become ‘1’. However,
there is a 5 ns gap between the cause and the effect, and the output c does not assume its
new value until time 25 ns.
If we look at the event queue, we can see how the simulator produced the simulation
results. At time 0, the signal c is initialized to ‘U’ and the assignment to c runs. This
computes a new value, but the new value will not be assigned until 5 ns later. The state
of the queue is therefore:
Time = 0
Signal Name:    a    b    c
Present value:  0    0    U
Next value:     1    1    0
Event time:     20   40   5
The simulator time pointer is advanced to the time of the earliest event on the queue, i.e.
5 ns and c receives its new value.
Time = 5
Signal Name:    a    b    c
Present value:  0    0    0
Next value:     1    1
Event time:     20   40
The signal c does not appear on the right hand side of any assignments, so the change on
c does not cause any further events to be triggered. The simulator time pointer is advanced
to the time of the next event on the queue, i.e. 20 ns and a receives its new value. The
change on a at time 20 ns causes the assignment on c to be executed and a new value of
1 is scheduled for c at time 25 ns.
Time = 20
Signal Name:    a    b    c
Present value:  1    0    0
Next value:          1    1
Event time:          40   25
The simulator time pointer is advanced to the time of the next event on the queue, i.e. 25
ns and c receives its new value.
Time = 25
Signal Name:    a    b    c
Present value:  1    0    1
Next value:          1
Event time:          40
At time 40 ns b receives its new value. The change on b causes the assignment on c to be
executed and this takes effect at time 45 ns.
Time = 40
Signal Name:    a    b    c
Present value:  1    1    1
Next value:               0
Event time:               45
Time = 45
Signal Name:    a    b    c
Present value:  1    1    0
Next value:
Event time:
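The sequence of events traced in the tables above can be reproduced with a small self-checking test bench. This is a sketch under stated assumptions: the port list of myGate and the test bench name myGate_tb are hypothetical, since the entity declaration is not shown here. The inputs a and b are initialized to ‘0’ and c is left uninitialized, as in the tables.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Assumed entity declaration for the XOR gate
ENTITY myGate IS
  PORT (a, b : IN STD_LOGIC; c : OUT STD_LOGIC);
END ENTITY myGate;

ARCHITECTURE delayed OF myGate IS
BEGIN
  c <= a XOR b AFTER 5 NS;
END ARCHITECTURE delayed;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY myGate_tb IS
END ENTITY myGate_tb;

ARCHITECTURE tb OF myGate_tb IS
  SIGNAL a, b : STD_LOGIC := '0';  -- inputs start at '0'
  SIGNAL c    : STD_LOGIC;         -- output starts at 'U'
BEGIN
  dut : ENTITY work.myGate(delayed)
    PORT MAP (a => a, b => b, c => c);

  a <= '1' AFTER 20 NS;
  b <= '1' AFTER 40 NS;

  check : PROCESS
  BEGIN
    WAIT FOR 24 NS;  -- at 24 ns: a has risen, but c has not yet responded
    ASSERT c = '0' REPORT "c changed too early" SEVERITY ERROR;
    WAIT FOR 2 NS;   -- at 26 ns: c took its new value at 25 ns
    ASSERT c = '1' REPORT "c should be 1 after 25 ns" SEVERITY ERROR;
    WAIT FOR 18 NS;  -- at 44 ns: b has risen, c has not yet responded
    ASSERT c = '1' REPORT "c fell too early" SEVERITY ERROR;
    WAIT FOR 2 NS;   -- at 46 ns: c took its new value at 45 ns
    ASSERT c = '0' REPORT "c should be 0 after 45 ns" SEVERITY ERROR;
    WAIT;
  END PROCESS check;
END ARCHITECTURE tb;
```

Each assertion samples c just before or just after one of the event times in the tables, so a simulator that follows the mechanism described should run this without reporting an error.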
3 Gate delay
To prepare ourselves for the next theme, we will need to take a closer look at how logic
gates operate and what gate delay means. As an example, let’s consider an inverter with
a gate delay of 5 ns:
y <= NOT x AFTER 5 NS;
Our digital signals, which take values of 0 or 1, are an abstraction of what is really
happening in the real world. x is actually a continuous voltage which is interpreted as a 1 if
it is above the logic threshold voltage (shown as a red dashed line) and 0 if the voltage
is below.
Suppose x goes through a 0 to 1 transition. The voltage at x will rise. When x moves
above the threshold, the output y will start to fall but it will take 5 ns to fall as low as
the threshold. During this 5 ns, the digital output y will continue to be in the voltage
range that is digitally interpreted as a 1.
3.1 Inertial delay
Now let’s imagine that x makes a 0 to 1 transition, and then after only 2 ns makes a 1 to
0 transition.
Initially the voltage at x will rise. When x moves above the threshold, the output y will
start to fall but it will take 5 ns to fall below its threshold. Before that can happen, x
starts to fall again which means that y will start to rise. Throughout this whole
period, the digital output y will continue to be in the voltage range that is digitally
interpreted as a 1. The input x went briefly to 1, but the output y did not go to 0.
This is an illustration of the inertial property of logic gates: under normal
circumstances the output will not respond to an input pulse that is shorter than the gate
delay.
4 Simulation with inertial delay

Now let’s return to our XOR gate example.
ARCHITECTURE delayed OF myGate IS

BEGIN
c <= a XOR b AFTER 5 NS;
END ARCHITECTURE delayed;
Imagine that the transitions on the inputs to the XOR gate happen closer together, with
a rising at time 20 and b rising at time 22.
Between 20 and 22, the inputs have values that would make the output want to go to 1.
However, the XOR gate has a delay and cannot respond until 5 ns has elapsed. Before
this 5 ns passes, the inputs change again to a set of values that means that the output should
be 0 (the value it already has).
A real device would respond to these inputs by staying constantly at zero. This is called
an inertial effect; the device delay of 5 ns is a measure of how much electrical inertia it
has, and input transients that are briefer than this length of time will have no effect on
the output. This is the default behaviour of VHDL, so if we simulate our gate, it behaves
as follows:
This is detected in VHDL by looking for collisions on the event queue.
Time = 0
Signal Name:    a    b    c
Present value:  0    0    U
Next value:     1    1    0
Event time:     20   22   5
Time = 5
Signal Name:    a    b    c
Present value:  0    0    0
Next value:     1    1
Event time:     20   22
Time = 20
Signal Name:    a    b    c
Present value:  1    0    0
Next value:          1    1
Event time:          22   25
Time = 22
Signal Name:    a    b    c
Present value:  1    1    0
Next value:               1, 0
Event time:               25, 27
Two events have collided on the event queue. By default, VHDL just throws away the
old event to make way for the new. So the event queue looks like this:
Time = 22
Signal Name:    a    b    c
Present value:  1    1    0
Next value:               0
Event time:               27
So c never goes high. This gives the inertial behaviour.
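The inertial behaviour can be checked with a short test bench in which the two input transitions are only 2 ns apart. This is a sketch: the test bench name inertial_tb and the process labels are hypothetical, and the gate is written as a single concurrent assignment rather than as a separate entity to keep the example short.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY inertial_tb IS
END ENTITY inertial_tb;

ARCHITECTURE tb OF inertial_tb IS
  SIGNAL a, b : STD_LOGIC := '0';  -- inputs start at '0'
  SIGNAL c    : STD_LOGIC;         -- output starts at 'U'
BEGIN
  -- XOR gate with a 5 ns (inertial) delay
  c <= a XOR b AFTER 5 NS;

  -- Input transitions that are closer together than the gate delay
  a <= '1' AFTER 20 NS;
  b <= '1' AFTER 22 NS;

  -- Runs at the start of simulation and on every change of c
  watch : PROCESS (c)
  BEGIN
    ASSERT c /= '1'
      REPORT "c went high: the short pulse was not suppressed"
      SEVERITY ERROR;
  END PROCESS watch;

  -- Confirm the final value once both scheduled events have passed
  final_check : PROCESS
  BEGIN
    WAIT FOR 30 NS;
    ASSERT c = '0' REPORT "c should end at 0" SEVERITY ERROR;
    WAIT;
  END PROCESS final_check;
END ARCHITECTURE tb;
```

With the default inertial delay, the event scheduled for 25 ns is discarded when the event for 27 ns is posted, so neither assertion should fire.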
5 Assigning multiple transactions for a signal

In one assignment statement, it is possible to post multiple assignments onto the event
queue. The different values must take place at different times, and they are separated by
commas. So we can write this:
a <= '0', '1' AFTER 20 NS, '0' AFTER 60 NS;
Note that this can only be used in simulation. It cannot be used in synthesis. This
technique can be useful when you want to assign the input values for a test bench, but
without using a process. So, for example, if we were testing a two-input device and
wanted the inputs to do this:
We can use the following code:
ARCHITECTURE tb1 OF tb IS
SIGNAL a, b: STD_LOGIC;
BEGIN
-- Set up the test input signals
a <= '0', '1' AFTER 20 NS, '0' AFTER 60 NS;
b <= '0', '1' AFTER 40 NS, '0' AFTER 80 NS;
-- Rest of the test bench goes here
END ARCHITECTURE tb1;
According to the normal rules of VHDL, the assignments will run at the beginning of
simulation (because all concurrent lines of code run at the start of simulation). They will
not run again subsequently (because lines only re-run when a RHS value changes, and
these lines have no changeable signals on their RHS). After the assignments have run
this will be the state of the event queue:
Time = 0
Signal Name:    a                   b
Present value:  U                   U
Next value:     0      1    0       0      1    0
Event time:     delta  20   60      delta  40   80
Using the comma operator, the multiple transactions co-exist on the queue, and are not
treated in an inertial manner. We could achieve the same behaviour for a and b by using
the following process:
ARCHITECTURE tb2 OF tb IS
SIGNAL a, b: STD_LOGIC;
BEGIN
PROCESS
BEGIN
-- Set up the test input signals
a <= '0';
b <= '0';
WAIT FOR 20 NS;
a <= '1';
WAIT FOR 20 NS;
b <= '1';
WAIT FOR 20 NS;
a <= '0';
WAIT FOR 20 NS;
b <= '0';
WAIT;
END PROCESS;
-- Rest of the test bench goes here
END ARCHITECTURE tb2;
6 Summary
VHDL simulation proceeds by moving through time in response to events scheduled on
an event queue. As assignments run, they schedule new events for signals to receive in
future. As signals receive new values, they will trigger the execution of further
lines of code.
You should now know

• How a simulator operates
• The effect of inertial delay
• How to post multiple transactions for a single signal
Unit 4.2
VHDL Simulation (3)
In this unit we will look at how to use concurrent code and how to use processes to give
correct results for combinational logic. In particular, we will see how to construct the
sensitivity list for a process, a matter that can be confusing for users who are new to
VHDL.
1 Concurrent code with multiple assignments

We will start with a simple example whose behaviour will be obvious:
BEGIN
d <= c AFTER 2 NS;
We assume that the simulation has been started by some other piece of code that has
assigned initial values of a, b, c, d to 0 and has scheduled a transition on a from 0 to 1 at
time 20.
Time = 0
Signal Name:    a    b    c    d
Present value:  0    0    0    0
Next value:     1
Event time:     20
We jump to the time of the next scheduled transaction, i.e. time 20, and let a take its
new value. This causes the assignment on c to run. That will cause c to transition to 1 at
a time 5 ns in the future:
Time = 20
Signal Name:    a    b    c    d
Present value:  1    0    0    0
Next value:               1
Event time:               25
We jump to the time of the next scheduled transaction, i.e. time 25, and let c take its
new value. This causes the assignment on d to run. That will cause d to transition to 1 at
a time 2 ns in the future:
Time = 25
Signal Name:    a    b    c    d
Present value:  1    0    1    0
Next value:                    1
Event time:                    27
Finally we jump to the time of the next scheduled transaction, i.e. time 27, and let d take
its new value. This completes the simulation:
Time = 27
Signal Name:    a    b    c    d
Present value:  1    0    1    1
Next value:
Event time:
The simulator extracts the waveform from the tables above:
This simulation contained no surprises, but it gives us a foundation for our examination
of how processes simulate.
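The trace above can be reproduced with the following self-checking sketch. Two things here are assumptions and are not taken from the text: the logic function used in the assignment to c (an OR gate is assumed, chosen only because it makes c go to 1 when a goes to 1 while b is 0), and the test bench name concurrent_tb. The 5 ns and 2 ns delays, the initial values and the transition on a at time 20 follow the description above.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY concurrent_tb IS
END ENTITY concurrent_tb;

ARCHITECTURE tb OF concurrent_tb IS
  SIGNAL a, b, c, d : STD_LOGIC := '0';  -- all signals start at 0
BEGIN
  -- Stimulus: a goes from 0 to 1 at time 20
  a <= '1' AFTER 20 NS;

  -- Concurrent code under test
  c <= a OR b AFTER 5 NS;  -- OR is an assumed function, see the note above
  d <= c AFTER 2 NS;

  check : PROCESS
  BEGIN
    WAIT FOR 26 NS;  -- at 26 ns: c changed at 25 ns, d not yet
    ASSERT (c = '1') AND (d = '0')
      REPORT "expected c=1 and d=0 at 26 ns" SEVERITY ERROR;
    WAIT FOR 2 NS;   -- at 28 ns: d changed at 27 ns
    ASSERT d = '1'
      REPORT "expected d=1 at 28 ns" SEVERITY ERROR;
    WAIT;
  END PROCESS check;
END ARCHITECTURE tb;
```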
2 An incorrect process
Now we will try to describe the same logic using a process. The body of the process
computes new values for the outputs c and d. The process is triggered to run whenever a
signal on its sensitivity list changes. A common mistake in constructing the sensitivity
list is to assume that we only need to include the inputs a,b:
BEGIN
PROCESS(a, b) -- This is wrong
BEGIN
d <= c AFTER 2 NS;
END PROCESS;
Once again, we assume that the simulation has been started by some other piece of code
that has assigned initial values of a, b, c, d to 0 and has scheduled a transition on a from
0 to 1 at time 20. Let’s see how this simulates:
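Time = 0
Signal Name:    a    b    c    d
Present value:  0    0    0    0
Next value:     1
Event time:     20
We jump to the time of the next scheduled transaction, i.e. time 20, and let a take its
new value. This causes the process to run and compute an assignment to cause c to
transition to 1 at a time 5 ns in the future. The assignment on d will also run, but that will
assign the current value of c (i.e. zero), not the future value:
Time = 20
Signal Name:    a    b    c    d
Present value:  1    0    0    0
Next value:               1
Event time:               25
We jump to the time of the next scheduled transaction, i.e. time 25, and let c take its
new value. This has no further consequences: we have finished.
Time = 25
Signal Name:    a    b    c    d
Present value:  1    0    1    0
Next value:
Event time:
The final waveform is as follows:
This is not equivalent to the situation that we simulated in section 1. The signal d did
not follow the value of c.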
3 A corrected process
We can correct this by including c in the sensitivity list:
BEGIN
PROCESS(a,b,c) -- This is right
BEGIN
d <= c AFTER 2 NS;
END PROCESS;
Time = 0
Signal Name:    a    b    c    d
Present value:  0    0    0    0
Next value:     1
Event time:     20
We jump to the time of the next scheduled transaction, i.e. time 20, and let a take its
new value. This causes the process to run and compute that c should transition to 1 at a
time 5 ns in the future. The assignment on d will also run, but that will assign the current value
of c (i.e. zero), not the future value:
Time = 20
Signal Name:    a    b    c    d
Present value:  1    0    0    0
Next value:               1
Event time:               25
We jump to the time of the next scheduled transaction, i.e. time 25, and let c take its
new value of 1. This causes the process to run again and compute an assignment for c to
get 1 (which is not a change, so we don’t update the queue) and an assignment of d to 1
to happen 2 ns in the future.
Time = 25
Signal Name:    a    b    c    d
Present value:  1    0    1    0
Next value:                    1
Event time:                    27
Finally we jump to the time of the next scheduled transaction, i.e. time 27, and let d take
its new value. This completes the simulation:
Time = 27
Signal Name:    a    b    c    d
Present value:  1    0    1    1
Next value:
Event time:
The final waveform is as follows:
This is correct. Adding node c to the sensitivity list means that when c is assigned a new
value, node d updates.
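The difference between the two sensitivity lists can be seen side by side in one test bench. As before, this is a sketch under assumptions: the OR function in the assignment to c is assumed, and the names sens_tb, c1, d1, c2 and d2 are hypothetical (the signals are duplicated so that each process drives its own pair).

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY sens_tb IS
END ENTITY sens_tb;

ARCHITECTURE tb OF sens_tb IS
  SIGNAL a, b   : STD_LOGIC := '0';
  SIGNAL c1, d1 : STD_LOGIC := '0';  -- driven by the incomplete process
  SIGNAL c2, d2 : STD_LOGIC := '0';  -- driven by the complete process
BEGIN
  -- Stimulus: a goes from 0 to 1 at time 20
  a <= '1' AFTER 20 NS;

  incomplete : PROCESS (a, b)        -- internal node c1 is missing
  BEGIN
    c1 <= a OR b AFTER 5 NS;         -- OR is an assumed function
    d1 <= c1 AFTER 2 NS;             -- uses the current value of c1
  END PROCESS incomplete;

  complete : PROCESS (a, b, c2)      -- internal node c2 is included
  BEGIN
    c2 <= a OR b AFTER 5 NS;         -- OR is an assumed function
    d2 <= c2 AFTER 2 NS;             -- re-evaluated when c2 changes
  END PROCESS complete;

  check : PROCESS
  BEGIN
    WAIT FOR 30 NS;
    ASSERT (c1 = '1') AND (c2 = '1')
      REPORT "both versions of c should have risen" SEVERITY ERROR;
    ASSERT d1 = '0'
      REPORT "d1 was expected to stay at 0" SEVERITY ERROR;
    ASSERT d2 = '1'
      REPORT "d2 was expected to follow c2" SEVERITY ERROR;
    WAIT;
  END PROCESS check;
END ARCHITECTURE tb;
```

At 30 ns both c1 and c2 are 1, but only d2 has followed, because only the second process is triggered again when its internal node changes.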
4 Constructing a process for combinational logic

To get correct behaviour for a piece of combinational logic, the sensitivity list should
contain:
• All inputs
• Any internal node that is used as the target of an assignment
Unit 4.3
Using Processes for Combinational Logic
In this unit, we continue our consideration of the features of VHDL that have tended to
cause students confusion during the assignments. Specifically, we will look at how to
use signals correctly when describing combinational logic with a process.
1 Rules for describing combinational logic

Descriptions of combinational logic also need to be treated with care if we are using
references to internal signals. For example, suppose we are trying to describe the
following system and we decide to make explicit reference to the internal node n1:
[Figure: full adder circuit with inputs x, y and cin, internal node n1, and outputs sum and cout]
The obvious way to describe this is as follows:

ARCHITECTURE wrong OF fulladd IS --1
SIGNAL n1: STD_LOGIC; --2
BEGIN --3
PROCESS (x, y, cin) --4
BEGIN --5
n1 <= x XOR y; --6
sum <= cin XOR n1; --7
cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin ); --8
END PROCESS; --9
END ARCHITECTURE wrong; --10
The output needs to be re-calculated whenever one of the inputs changes, so we use x, y
and cin as the sensitivity list for the process. However, the above description is wrong.
It does not behave like the circuit in the diagram and a synthesis tool would not produce
the desired circuit if it was fed with this code. The problem is that during the execution
of the process, all of the signals have their value frozen. The new values are applied one
delta after the process finishes running.
Imagine that initially x=’0’, y=’0’ and cin=’0’. As a result, the internal node n1=0. Then
x changes from ‘0’ to ‘1’. The process would run because a signal on its sensitivity list
has just changed. Statement 6 computes a new value of ‘1’ for n1, but this value will not
be applied until after the process has finished. In the meantime, statement 7 is executed
and uses the old value of n1, thus producing a result of 0. The process then finishes; n1
gets its new value of ‘1’ and sum gets its value of ‘0’. By contrast, the circuit in the
diagram would produce a final value of ‘1’ for sum. The VHDL description is therefore
incorrect.
We can remedy the problem with the process by adding the internal node n1 to the
sensitivity list.
ARCHITECTURE corrected OF fulladd IS --1
SIGNAL n1: STD_LOGIC; --2
BEGIN --3
PROCESS (x, y, cin, n1) --4
BEGIN --5
n1 <= x XOR y; --6
sum <= cin XOR n1; --7
cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin ); --8
END PROCESS; --9
END ARCHITECTURE corrected; --10
In this case, when x changes to 1 the process runs because a signal on its sensitivity list
has changed. As before, n1 is assigned to a value of 1 and sum to a value of 0. Now,
because n1 has just changed and it appears on the sensitivity list, the process immediately
runs again. This time statement 7 will use the updated value of n1 and sum will get a
value of 1.
In general, the rule to describe combinational logic using a process is that all inputs and
all internal nodes that receive an assignment must appear on the sensitivity list.
2 Example using variables

The problem of the previous example happened because n1 was an intermediate signal,
and VHDL assigns to it with a delta delay. We have seen that we can fix the problem by
putting the internal signal on the sensitivity list. Another way to solve the problem
would be to treat n1 as a variable, like this:
ARCHITECTURE corrected OF fulladd IS --1

BEGIN --3
PROCESS (x, y, cin) --4
VARIABLE n1: STD_LOGIC; --2
BEGIN --5
n1 := x XOR y; --6
sum <= cin XOR n1; --7
cout <= ( x AND y ) OR ( cin AND x ) OR ( y AND cin ); --8
END PROCESS; --9
END ARCHITECTURE corrected; --10
Now n1 updates immediately when statement 6 is executed, so its new value is used in
statement 7.
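One way to gain confidence in the variable-based description is to simulate it against all eight input combinations. The sketch below does this. The entity declaration for fulladd is assumed (it is not shown in this unit), and the architecture name with_variable and the test bench name fulladd_tb are hypothetical names used only here.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Assumed entity declaration for the full adder
ENTITY fulladd IS
  PORT (x, y, cin : IN STD_LOGIC; sum, cout : OUT STD_LOGIC);
END ENTITY fulladd;

ARCHITECTURE with_variable OF fulladd IS
BEGIN
  PROCESS (x, y, cin)
    VARIABLE n1 : STD_LOGIC;
  BEGIN
    n1   := x XOR y;        -- variable: updated immediately
    sum  <= cin XOR n1;     -- uses the new value of n1
    cout <= (x AND y) OR (cin AND x) OR (y AND cin);
  END PROCESS;
END ARCHITECTURE with_variable;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY fulladd_tb IS
END ENTITY fulladd_tb;

ARCHITECTURE tb OF fulladd_tb IS
  SIGNAL x, y, cin : STD_LOGIC := '0';
  SIGNAL sum, cout : STD_LOGIC;
BEGIN
  dut : ENTITY work.fulladd(with_variable)
    PORT MAP (x => x, y => y, cin => cin, sum => sum, cout => cout);

  stimulus : PROCESS
    VARIABLE inputs : UNSIGNED(2 DOWNTO 0);
    VARIABLE ones   : INTEGER;
    VARIABLE total  : UNSIGNED(1 DOWNTO 0);
  BEGIN
    FOR i IN 0 TO 7 LOOP
      inputs := to_unsigned(i, 3);
      x   <= inputs(2);
      y   <= inputs(1);
      cin <= inputs(0);
      WAIT FOR 10 NS;       -- let the outputs settle
      -- The expected result is the number of inputs that are '1'
      ones := 0;
      FOR k IN 0 TO 2 LOOP
        IF inputs(k) = '1' THEN
          ones := ones + 1;
        END IF;
      END LOOP;
      total := to_unsigned(ones, 2);
      ASSERT (cout = total(1)) AND (sum = total(0))
        REPORT "wrong result for input pattern " & INTEGER'IMAGE(i)
        SEVERITY ERROR;
    END LOOP;
    WAIT;
  END PROCESS stimulus;
END ARCHITECTURE tb;
```

The same test bench could be pointed at a signal-based architecture by changing the architecture name in the instantiation, which makes it a convenient way to compare the alternative descriptions.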
You should now know

• The difference between a signal and a variable
• That a process describing combinational logic must include all inputs and all
internal signals in the sensitivity list
Index
Unit 1.1 Introduction to the Digital Half of the Module 1-1
Unit 1.2 Why do we need HDLs? 1-2

1 What’s the problem with traditional methods? 1-2
2 VHDL 1-4
3 Levels of abstraction 1-4
4 Synthesis 1-5
5 Summary 1-5
Unit 1.3 Introduction to VHDL 1-6

1 Entity and architecture 1-6
2 Specifying another architecture 1-7
3 BEGIN and END statements 1-7
4 Semicolons 1-8
5 Stylistic issues 1-8
5.1 Case 1-8
5.2 Spaces and indents 1-9
5.3 Returns 1-9
5.4 Annotating END statements 1-9
5.5 Comments 1-10
6 The IEEE library 1-10
6.1 Opening libraries 1-10
6.2 Using STD_LOGIC 1-11
7 Summary 1-11
Unit 1.4 Handling Signals that are more than 1 bit wide 1-12
1 STD_LOGIC_VECTORs 1-12
2 An example 1-12
3 STD_LOGIC_VECTOR values 1-13
3.1 Direction of numbering 1-13
3.2 Aggregates 1-14
3.3 Concatenation 1-15
3.4 Literals 1-15
4 Summary 1-15
Unit 1.5 Number Representation and Arithmetic 1-17

1 Denary 1-17
2 Unsigned binary 1-17
3 Signed numbers: 2s complement 1-18
4 Addition of 2s complement numbers 1-19
5 Using arithmetic: the NUMERIC_STD package 1-19
6 A design example: the Arithmetic Logic Unit 1-21
Unit 2.1 Dataflow and Structural VHDL 1-23
1 Behavioural description versus structural description 1-23
2 Example of transforming a high level description to a netlist 1-24
2.1 Implementing the adder function 1-25
3 A dataflow description of the full adder 1-25
3.1 Local signals 1-27
4 Connecting entities together: structural VHDL 1-27
4.1 Placing library components into a design 1-28
4.2 Positional association 1-29
4.3 Named association 1-29
5 Summary 1-29
Unit 2.2 VHDL Simulation 1-30

1 How are statements processed? 1-30
2 Simulation of dataflow VHDL 1-30
3 Concurrent processing 1-32
4 Components with delays 1-32
4.1 The AFTER keyword 1-33
4.2 The full adder example with component delays 1-33
5 Simulation of structural VHDL 1-35
6 Summary 1-35
Unit 2.3 VHDL Processes and Test Benches 1-36

1 Sequential VHDL: PROCESSes 1-36
2 The WAIT statement 1-36
3 Processes with sensitivity lists 1-37
4 Sequential and concurrent VHDL 1-38
4.1 Sequential and concurrent conditionals 1-38
4.2 Sequential and concurrent selection 1-39
5 Setting up simulations in VHDL: test benches 1-40
5.1 Test bench for our adder example 1-41
Unit 2.4 Synthesized hardware 1-43

1 Boolean operators 1-43
2 Comparison operators 1-43
3 Selection operators 1-44
4 Latch inference 1-44
5 Arithmetic operators: Addition-like operators 1-45
6 Absolute value: c <= abs(a) 1-46
7 Multiplication: c <= a * b; 1-46
8 Synthesis optimizations 1-47
Unit 2.5 Problems with VHDL synthesis 1-48

1 The garbage values: ‘X’, ‘U’ 1-48
2 Contention 1-48
3 VHDL drivers in concurrent code 1-49
4 Processes and drivers 1-50
5 A common mistake: incorrect IF blocks 1-51
6 Synthesizable VHDL 1-52
Unit 3.1 Register Transfer Level VHDL (1) 1-53
1 The D-type flip-flop 1-53
1.1 D-type flip-flop with reset 1-54
2 Registered logic 1-55
2.1 Example: Carry ripple in adders 1-55
2.2 The registered adder 1-56
2.3 VHDL description of the registered adder 1-57
2.4 Register transfer level (RTL) description 1-58
3 Summary 1-58
Unit 3.2 Register Transfer Level VHDL (2) 1-59

1 A chain of registers 1-59
2 Describing the shift register in VHDL 1-60
3 Register inference in RTL VHDL 1-60
4 Order of statements in RTL VHDL 1-61
Unit 3.3 Controlling Register Inference using Signals and Variables 1-62
1 Pipelines 1-62
2 Speed of pipelined datapaths 1-63
3 Controlling pipelining with signals 1-63
4 VHDL variables 1-65
5 Summary 1-65
Unit 3.4 Designing Finite State Machines using VHDL 1-66

1 The idea of state 1-66
2 The state diagram 1-67
3 The state table 1-67
4 Describing this in VHDL 1-67
Unit 3.5 Creating a Display Driver using VHDL 1-69

1 The seven segment display driver 1-69
2 Driving multiple digits on the display 1-70
3 Generating the correct timing for the signal “change” 1-71
Unit 3.6 Designing memories using VHDL 1-72

1 A simple memory example: a ROM 1-72
1.1 A Testbench for the ROM 1-73
1.2 Synchronising the ROM to a clock 1-74
1.3 Testbench for the ROM with clock 1-74
2 Multiport memory 1-75
3 Register file 1-75
4 Read and write memory (RAM) 1-76
Unit 4.1 VHDL Simulation (2) 1-78
1 The purpose of simulation 1-78
1.1 Timing constraints 1-78
1.2 The VHDL simulation mechanism 1-79
2 Simulation with delays 1-79
3 Gate delay 1-81
3.1 Inertial delay 1-82
4 Simulation with inertial delay 1-82
5 Assigning multiple transactions for a signal 1-83
Unit 4.2 VHDL Simulation (3) 1-85

1 Concurrent code with multiple assignments 1-85
2 An incorrect process 1-86
3 A corrected process 1-87
4 Constructing a process sensitivity list for combinational logic 1-88
Unit 4.3 Using Processes for Combinational Logic 1-89

1 Rules for describing combinational logic 1-89
2 Example using variables 1-90