Common HDL Errors
1.1 Introduction
Visit the American National Transportation Safety Board website (www.ntsb.gov) and you will
discover that there are two standard ways to kill oneself in an aeroplane. One of those is known as
controlled flight into terrain: basically, flying oneself into the side of a hill. The second is the stall/spin
accident. Both types of accident are well understood and easily avoided, and great efforts are made to
educate pilots in their prevention. Nevertheless, they recur with depressing regularity. It seems that
the hard-won lessons of previous generations are not always learned.
Nobody, certainly not I, would attempt to compare the importance of writing HDL code without
making basic errors with the seriousness of avoiding aeroplane crashes. Nevertheless, there are a number
of standard errors that are repeated over and over again by legions of HDL design engineers. This is
in part, I believe, because there is no equivalent official drive to promulgate good practice within the
HDL design community. The purpose of this article, then, is to introduce a few of the very basic faults,
to describe the problems that they produce, and to suggest alternative, arguably superior,
methodologies. In many cases, these errors are made at the foundation level. Consequently,
everything that is built upon them thereafter is, shall we say, sub-optimal.
It should be possible to code a design by reference to the specification (in combination with any
documents referenced therein). However, the process is made considerably more straightforward if
the specification is written to be coding-friendly. Here are a few suggestions towards that end:
1. Define all external interfaces. Name all inputs, outputs and bi-directional signals. These names
should be used verbatim in the RTL.
2. Define the device architecture, naming each module. This architecture should appear within the
RTL with all names maintained.
3. Define all internal communication paths, in particular any internal buses. If possible, name all the
associated signals. Describe the purpose of these communication paths and what kind of
information flows along them.
4. Describe, preferably in words and pictures, the operation of each sub-module.
5. Name all memory-mapped registers using appropriate mnemonics. These names should appear
within the RTL.
A corollary of the previous point is that names are best chosen in terms of what signals do, rather than
where they come from, how many bits they have, and so on. Such information can easily be
determined from the code. What is nowhere near as obvious is what a signal actually does, or even why
it exists at all.
To further aid clarity, it is also, in my opinion, good practice to avoid abbreviating wherever possible.
Such techniques help to make the code reasonably self-documenting.
r - read only
r/w - read/write
r/c - read/clear
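The kind of listing under discussion might look something like this sketch in Verilog (the address value, register width and signal names are assumptions, reconstructed from the critique that follows):

```verilog
parameter c_SR = 8'h04;    // address of SR; the c_ prefix is needed only
                           // to avoid a clash with the register's own name
reg [7:0] SR;              // eight bits declared, only SR[4:0] used

always @(posedge clk)
  if( write && (address == c_SR) )
    SR[4:0] <= dataIn[4:0];    // chop no.1: skirt round the unused bits
  else if( interruptEvent )
    SR[0]   <= 1'b1;           // chop no.2: dig out the INT bit to set it
```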
Doesn't look too bad? But there are three problems with it as follows:
1. Register SR has been declared as an eight-bit quantity even though only five bits are used. Three
of these bits are necessarily unused. With luck, the synthesiser will realise this and eliminate the three
bits, but this is by no means certain.
2. It was necessary to declare the address of register SR as the constant c_SR to avoid a name clash.
No disaster but a little ungainly.
3. It was necessary to chop up the SR register twice: once when writing it, to avoid assigning the
unused bits, and once when setting the INT bit. This is particularly bad because it locks the code
to the particular memory map. For example, it would be awkward to move the INT bit to a
different location. In the case of large, fluid designs, the problem can become totally
unmanageable. Far better to declare the fields individually and then concatenate them
into the address map as follows (using VHDL this time):
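The elided listing might be sketched along these lines (the signal names, widths and read decode are assumptions; the point is that INT and its companion fields are declared individually and only concatenated at the read):

```vhdl
-- INT and status are stand-alone fields; SR now exists only as an address
-- constant, so there is no name clash to work around
registers: process( clk )
begin
  if clk'event and clk='1' then
    -- the address map is defined solely here, at the read...
    if read='1' and address=SR then
      dataOut <= "000" & status & INT;   -- concatenate into the map
      INT     <= '0';                    -- read/clear
    end if;
    -- ...while setting INT is independent of the map
    if interruptEvent='1' then
      INT <= '1';
    end if;
  end if;
end process registers;
```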
Note now that the address map is entirely defined by the sections dealing with reading and writing. All
other functionality, in this case setting of the INT bit, is independent of the address map. INT is
handled as a stand-alone field; there is no need to continually chop it out of the SR register. Moreover,
there are no unused bits declared that, we hope, the synthesiser will eliminate.
[Figure 2: design hierarchy showing a Processor Interface module and an Interrupt module, with the alternative INT bit locations INTr and INTb]
Consider the simple design hierarchy shown in Figure 2. The INT bit represents an interrupt. There
are two places it might reside. It could be sited within the processor interface (denoted r) or it could
be placed within the module in which the event that causes the interrupt will occur (shown with a b).
The examples shown in section 1.2.3 assume the latter. It can be seen how cleanly the INT bit is
handled, principally because all of its behaviour is encompassed in the single clockedLogic process.
Nevertheless, many designers choose the former location. But consider the difficulties:
1. When the particular event that sets the interrupt occurs, the detecting module must signal to the
processor interface to set the interrupt. This requires a dedicated wire. The processor will then set
the INT bit.
2. When the processor reads the SR register (in which the INT bit resides), INT gets cleared. The
INT bit must then be wired back to the detecting module so that it knows that the interrupt has
been cleared and can again start looking for the interrupt-causing event. It is essential that the
INT bit is cleared and the interrupt detection circuitry is re-armed on exactly the same clock edge.
If this is not the case, there will be an intermittent, and very difficult to find, bug: if an
interrupt-causing event happens close to the point at which the interrupt is read, there is a possibility
of the interrupt being missed. Placing the interrupt setting and clearing circuitry in the same
always block (Verilog) or process (VHDL), by locating the INT bit within the detecting module
(b), makes this task much easier.
The other obvious disadvantage of centralising memory-mapped registers is that the processor interface
needs a connection for every single one. Not only is this very untidy, it makes the addition or removal
of bits or fields unduly onerous. This is because it is necessary to edit (in the case of VHDL) two
entities and two architectures in order to implement the bit and connect it up. By contrast, an extra bit
can be added locally with a couple of lines of code.
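To make the point concrete, here is a sketch (the interruptEvent and readSR signal names are invented) of the INT bit held locally in the detecting module, with event setting and processor clearing arbitrated in a single process:

```vhdl
clockedLogic: process( clk, reset )
begin
  if reset='1' then
    INT <= '0';
  elsif clk'event and clk='1' then
    if interruptEvent='1' then  -- setting takes priority, so an event
      INT <= '1';               -- coincident with a read is never missed
    elsif readSR='1' then       -- processor read of SR clears INT and, on
      INT <= '0';               -- the same edge, re-arms the detection
    end if;
  end if;
end process clockedLogic;
```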
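The sort of listing being criticised might look like this sketch (the state names and the body of each arm are invented):

```vhdl
case state is
  when IDLE   => if enable='1' then
                   null;               -- idle-state work here
                 end if;
  when ACTIVE => if enable='1' then    -- the same sub-test,
                   null;               -- repeated in every arm
                 end if;
  when DONE   => if enable='1' then    -- ...and repeated again
                   null;
                 end if;
end case;
```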
Ostensibly OK. But take a closer look and you will notice two common mistakes. Firstly, there is a lot
of repetition. I call this inside-out code, because the repeated sub-test should really be the outer test.
Whenever an engineer finds himself repeating the same lines of code, he should wonder whether he can
enhance its conciseness by re-nesting the tests.
begin
  if enable='1' then
    case state is
      ...
    end case;
  end if;
In general, hierarchy should really only be introduced for one of two reasons:
1) It is unavoidable, for example when repeated functionality is required;
2) there is an obvious need for it, for example to separate distinct and independent functions.
[Figure: data-path controller]
The beauty of such an approach is its flexibility and robustness. Individual elements in the data-path
might have two or more modes of operation with totally different data throughputs as well as sporadic
data rates. Imagine say a module whose job it is to detect a particular frame alignment pattern within a
data-stream, discard it and pass on the resulting payload data (example used in section 1.2.8.1).
Changes may also be made quickly in the light of simulation or perhaps as a result of failing to meet
synthesis timing constraints. For example, it would be straightforward to add an additional pipeline
delay into one of the data-path elements. In the case of an external controller, an increase in the latency
of a particular stage would necessitate a commensurate additional state delay within the controller.
There is an exception to this rule: when it is intended to use replication and custom layout
techniques to produce an optimised physical data-path. Such techniques are quite rare, however. If the
intention is to implement the data-path using simple random logic (e.g. standard cell, gate array or
FPGA), it is quicker and more elegant to embed the data-path into the control logic.
1.2.8.1 Example
There follows a simple example data-path element (Figure 4) whose function is to detect a frame-
alignment pattern (Table 1) in an incoming data-stream, extract the payload (D0, D1, D2…Dn-1, Dn) and
status information (S) and pass them on to the subsequent data-path element. The module has its data-
path embedded within the state machine and the output data makes use of the data abstraction
techniques described in section 1.2.9.1.
[Figure 4: frameAligner symbol, showing the clk, reset and synchronised ports]
LIBRARY ieee;
USE ieee.std_logic_1164.all;
PACKAGE frame IS
type t_dataOutValid is ( IGNORE, DATA, STATUS );
END frame;
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
LIBRARY EXAMPLE;
USE EXAMPLE.frame.all;
ENTITY frameAligner IS
  PORT(
    clk          : IN     std_logic;
    reset        : IN     std_logic;
    dataIn       : IN     std_logic;
    dataInValid  : IN     std_logic;
    dataOut      : OUT    std_logic;
    dataOutValid : OUT    t_dataOutValid;
    synchronised : BUFFER std_logic
  );
END frameAligner;
ARCHITECTURE RTL OF frameAligner IS
BEGIN
  clockedLogic: PROCESS( clk, reset )
    variable match : boolean;
    variable count : integer range 0 to 15;
  BEGIN
    if reset='1' then
      synchronised <= '0';
      dataOutValid <= IGNORE;
      count        := 0;
    elsif clk'event and clk='1' then
      dataOutValid <= IGNORE;                -- Default
      if dataInValid='1' then
        match := true;                       -- Default
        case count is
          when 0      => if dataIn /= '1' then
                           match := false;
                         end if;
          when 1      => if dataIn /= '0' then
                           match := false;
                         end if;
          when 2      => if dataIn /= '1' then
                           match := false;
                         end if;
          when 3      => if dataIn /= '1' then
                           match := false;
                         end if;
          when 4      => if dataIn /= '0' then
                           match := false;
                         end if;
          when 8      => if dataIn /= '0' then
                           match := false;
                         end if;
          when 12     => if dataIn /= '0' then
                           match := false;
                         else
                           synchronised <= '1';
                         end if;
          when 5| 6| 7|
               9|10|11|
               13|14   => if synchronised='1' then
                           dataOut      <= dataIn;
                           dataOutValid <= DATA;
                         end if;
          when 15     => if synchronised='1' then    -- status bit (S)
                           dataOut      <= dataIn;
                           dataOutValid <= STATUS;
                         end if;
        end case;
        if match=false then
          synchronised <= '0';
          count        := 0;
        else
          count        := (count+1) mod 16;
        end if;
      end if;
    end if;
  END PROCESS clockedLogic;
END RTL;
Such an approach is in many ways analogous to the difference between interrupt routines and
background processing in real-time software. Robustness and simplicity are enhanced by doing as much
work as possible in the background. Only constrain timing where critically necessary.
The technique works particularly well when a device has to work in one of several, mutually exclusive,
modes of operation. The same physical wires may be used to transfer multiple different types of
information at numerous data rates. By this means, a single data-path can perform as many functions
as are required thereby avoiding the need for multiple data-paths with untidy multiplexing to switch
between them. Each element in the data-path would most likely consist of one or more state machines,
each capable of several programmable operating modes.
1.2.10.1 Artifacts
Apart from expediency, coding up a piece of logic in terms of its function makes the finished design far
more comprehensible to other engineers. A corollary is the need to avoid artifacts. What are artifacts?
Consider the following code segment:
always @day
  case( day )
    MONDAY,
    TUESDAY,
    WEDNESDAY,
    THURSDAY,
    FRIDAY:   alarmClock = 1'b1;
    SATURDAY,
    SUNDAY:   alarmClock = 1'b0;
  endcase
In some cases, this methodology leads to bloated code. Functions, tasks (Verilog) or procedures
(VHDL) can help to keep things succinct:
function isItTheWeekendYet;
  input [2:0] day;
  case( day )
    MONDAY,
    TUESDAY,
    WEDNESDAY,
    THURSDAY,
    FRIDAY:   isItTheWeekendYet = 1'b0;
    SATURDAY,
    SUNDAY:   isItTheWeekendYet = 1'b1;
  endcase
endfunction // isItTheWeekendYet

always @day
  alarmClock = !isItTheWeekendYet( day );
[Figure 6: Functionally equivalent but inherently testable alternative, showing the address, chip select and write strobe decode feeding a register's d/q along with the data]
Said data is read into the VHDL/Verilog simulation and applied to the device under test. The resultant
output data is written out to different files. These are then either checked manually (in simple cases)
or post-processed (possibly using the original 'C' model) to confirm correct operation. The basic concept
is shown in Figure 7.
This avoids the need for too much intelligence to be built into the test bench itself. Of course, it is
perfectly sensible, in the case of very complex functions, to test sub-modules hierarchically. For
example, in the above case, it would be sensible to check say the Viterbi decoder as a stand-alone
function.
[Figure 7: the original data file feeds both the RTL model and the software equivalent model, each producing a transformed data file for comparison]
Pretty much the simplest device to test is the humble inverter. Here is a test bench for an inverter
written according to these guidelines:
procedure checkInverter(
  signal a : out std_logic;
  signal z : in  std_logic
) is
begin
  a <= '0';
  wait for 5 ns;
  compare( z, '1', "check inverter" );  -- argument order of compare assumed
  a <= '1';
  wait for 5 ns;
  compare( z, '0', "check inverter" );
end checkInverter;
The procedure compare is defined in a package called testHarnessUtilities.vhd and is available from the
Design Abstraction web-site (http://www.designabstraction.com/). It may be freely used and
distributed so long as the copyright message is retained. A similar Verilog equivalent,
testHarnessUtilities.v, is also available. The simulation transcript produced by calling the procedure is as
follows:
# 5.000 ns <<< PASS >>> check inverter; expected 1 and indeed found it.
# 10.000 ns <<< PASS >>> check inverter; expected 0 and indeed found it.
It is impossible to check the output of a circuit at all times. In general, I tend to think of a test bench as
a series of gates through which the device under test must pass, rather like a canoe slalom course
(Figure 8). The path taken between gates is therefore irrelevant. The inverter test above, for example,
allows 5 ns before checking the output. It doesn't matter whether the propagation delay of the inverter
is 1 ns or 4 ns; it will still pass this test (though not if it takes 6 ns).
In Verilog, most of the utilities required to create a quality self-checking test bench are built into the
language. The include file testHarnessUtilities.v consists mainly of the function compare. By contrast,
it is much less straightforward to produce messages in the simulation transcript using VHDL. That is
why the package testHarnessUtilities.vhd is considerably more comprehensive.
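A file-driven control scheme of the kind criticised below might use a command file along these lines (the command syntax and register names are invented for illustration):

```text
WRITE CONTROL 35    -- configure the device
READ  STATUS  01    -- expect Tx Ready
WRITE TXDATA  55    -- transmit a byte
WAIT  100           -- hard-coded delay
READ  RXDATA  55    -- expect the looped-back data
```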
This file clearly causes the test-bench processor model to go through a sequence of read and write
operations to various registers within the device under test (perhaps it is a UART). In the case of
writes, the data to be written is supplied whereas for reads, the expected data is supplied. "Seems
reasonable", I hear you say. But there are several major disadvantages that, to my way of thinking,
make this methodology particularly painful:
1. It is necessary to write a parser to interpret the file, a non-trivial task that achieves little in itself.
The parser would require a routine to recognise the various commands in the file. Recognition of a
particular command would then trigger an associated task (Verilog) or procedure (VHDL). Why
not simply call such procedures straight from the test harness?
2. It is necessary to predict everything that the device under test will do and when it will do it because
there is no mechanism for intelligence within the test bench. It might be argued that this is a good
feature. It is certainly very inflexible. Consider polling say an Rx Ready flag until it rises. It
would most probably be necessary to run the simulation once to find out when it will happen and
then hard code the delay into the control file. Far better simply to build a bit of intelligence into
the test harness using a simple loop. Granted, if the flag never rises, the test bench might poll
forever but that would soon be noticed and fixed.
3. Every time a new instruction is required, the parser has to be enhanced to recognise it.
4. It is very difficult to coordinate the timing of the processor interface to what happens on other
interfaces to the device under test (for example the data-path inputs and outputs).
5. It is impossible to run individual threads simultaneously. The Verilog fork/join construct is
fantastic for this, though the same can be achieved using VHDL (with a little more difficulty).
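By way of contrast, the direct approach might be sketched as follows (the task names, register macros and flag bit are invented for illustration):

```verilog
// Sketch: the test harness calls tasks directly, so no parser is required
initial begin : testControl
  reg [7:0] status;
  write( `CONTROL, 8'h35 );      // initial configuration
  fork                           // two simultaneous threads...
    sendSerialData;              // ...stimulate the data-path
    begin : pollRxReady
      status = 0;
      while( !status[`RX_READY] )
        read( `STATUS, status ); // ...while polling the Rx Ready flag
    end
  join
end
```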
Some might argue that using files allows the development of multiple different tests. This is true…just.
But what distinguishes one test from another is very often the test data itself
(different data rates, say). The control side of things will usually be pretty similar, aside from the initial
configuration of the device. All in all, file-driven test-bench control achieves, with considerably less
finesse, what VHDL procedures or Verilog tasks can achieve directly.
The use of this particularly bad technique is virtually endemic across the industry. Why, I know not.
My suspicion is that many engineers are hell bent on using file I/O to prove how clever they are,
irrespective of its usefulness! Notice that I am not against file I/O per se. It's great for handling data
(section 1.3.1) but it's manifestly inadequate as a means of coordinating the operation of a test bench.
1.3.5 Always build tests upward from the lowest common denominator.
This point might seem obvious. Nevertheless, I have seen a surprising number of cases where this rule
is violated, to the considerable future inconvenience of the perpetrator. For example, a given design
might deal with packets of data comprising a header byte, justification bits, payload data and a CRC;
the natural approach is to provide a task or procedure for each element and to build packet-level tests
on top of them.
Similarly, in the case of designs that interface to a microprocessor (which tends to be most designs), it
is good practice to provide a test bench task (Verilog) or procedure (VHDL) to perform reads and
writes. These are then strung together in the test bench as necessary to stimulate the device under test,
as if the test bench were a piece of firmware. Seems obvious enough, but I've seen people define a task
to, say, write two bytes because that was what was needed at the time. Why not simply call the write
task twice?
Incidentally, I always provide the Read/Write task with three modes of operation as shown in Table 2.
Note that the very powerful READ_RETURN mode opens up a whole host of capabilities simply not
possible in a file driven test bench.
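Such a task might be sketched as follows (the mode names other than READ_RETURN, the bus signals and the handshake are all assumptions):

```verilog
task readWrite;
  input [1:0] mode;      // `WRITE, `READ_CHECK or `READ_RETURN (assumed)
  input [7:0] address;
  inout [7:0] data;      // data to write, expected data, or returned data
  begin
    addressBus = address;
    if( mode == `WRITE ) begin
      dataOutBus = data;
      @(posedge clk) writeStrobe = 1;
      @(posedge clk) writeStrobe = 0;
    end
    else begin
      @(posedge clk) readStrobe = 1;
      @(posedge clk) readStrobe = 0;
      if( mode == `READ_CHECK )
        compare( dataInBus, data, "register read" );
      else
        data = dataInBus;  // READ_RETURN: hand the actual value back
    end
  end
endtask // readWrite
```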
The other purpose in not changing inputs close to the active edge of the system clock is that it will
avoid set-up and hold violations when running gate level simulations. In the case of truly
asynchronous inputs (where double buffering is used to re-time the signal to the clock), this is
artificially optimistic. Set-up and hold violations would be expected in real life and should equally be
expected during gate-level simulation. The transcript should be examined to ensure that all such
warnings are only produced by registers where this is known to be the case and nowhere else.
By far the easiest way to produce cycle-based vectors is to use the original test bench. This means that
the test bench must be persuaded to run in a cycle-based mode. It is then a simple matter to write out a
print-on-change file (using, for example, $dumpfile followed by $dumpvars in Verilog) and translate it
into the vendor's required format using the translation tools that they should also supply.
It is a good idea to run the test bench in cycle-based mode only for the purpose of generating vectors.
In normal mode, input timings should be more representative of real life. When working in Verilog,
this is easily achieved using conditional compilation.
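In Verilog, the switch between the two modes might be sketched like this (the macro names and delay values are invented):

```verilog
`ifdef CYCLE_BASED
  `define INPUT_DELAY 0     // vector generation: inputs move on the edge
`else
  `define INPUT_DELAY 12    // normal mode: representative real-life timing
`endif

always @(posedge clk)
  dataIn <= #(`INPUT_DELAY) nextDataIn;  // inputs change INPUT_DELAY ns
                                         // after the active edge
```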
1.4 Conclusion
This article is necessarily subjective. Nevertheless, there are some observations here that might
provide some insight into the kind of problems that engineers typically cause themselves. Rules, as we
know, are for the obedience of fools and the guidance of wise men. The wise man, however,
acquaints himself with the received wisdom before flouting it. Feel free to ignore the contents of this
article. But I invite you to try some of these techniques in your next design. You might like them.
Glossary
ALU Arithmetic Logic Unit
ASIC Application Specific Integrated Circuit
CRC Cyclic Redundancy Check
FPGA Field Programmable Gate Array
RTL Register Transfer Level
Rx Receive
UART Universal Asynchronous Receiver Transmitter
Verilog Hardware description language standardised as IEEE 1364 (not an acronym)
VHDL Very High Speed Integrated Circuit Hardware Description Language