B. Tech Project


CHAPTER 1

AN OVERVIEW OF SERIAL COMMUNICATION


Serial communication is the process of sending and receiving data one bit at a
time, sequentially, over a communications channel or computer bus. In parallel
communication, by contrast, all the bits of each symbol are sent together. Serial
communication is used for all long-haul communications and most computer networks,
where the cost of cable and the difficulty of synchronization make parallel
communication impractical. Serial computer buses and network links are becoming
more common as improved technology enables them to transfer data at higher speeds.
There are two types of serial communication, full-duplex and half-duplex. A
full-duplex device can send and receive data at the same time, so full-duplex
communication needs two separate ports, one for serial-in data and one for
serial-out data. A half-duplex device, on the other hand, can either receive or
transmit at any given time, but not both. Half-duplex devices normally share the
same port for both serial in and serial out. Although the IOP designed in this
project has two dedicated ports, serial in and serial out, for receiving and
transmitting data, it is considered a half-duplex device because it has only one
control unit, which manages either the receive or the transmit traffic at any one time.
UART
A universal asynchronous receiver/transmitter (UART) is a piece of computer
hardware that is commonly used in the PC serial port to translate data between
parallel and serial interfaces. The UART takes bytes of data and transmits the
individual bits sequentially. At the receiving end, a UART reassembles the bits
into complete bytes.
Asynchronous transmission allows data to be transmitted without sending a
clock signal to the receiver. Instead, the sender and receiver must agree on timing
parameters in advance, and special bits are added to each word to synchronize the
sending and receiving units. A UART consists of two main blocks, the transmitter
and the receiver. The transmitter sends a byte of data out of the UART serially,
bit by bit, while the receiver takes in serial data bit by bit and converts it
into a byte of data.
The UART starts a transmission by placing a bit called the "Start Bit" at
the beginning of each word to be transmitted. The Start Bit informs the receiver
that a byte of data is about to be sent. After the Start Bit, the individual bits
of the byte are sent, Least Significant Bit (LSB) first. Each bit is transmitted
for exactly the same amount of time as all of the other bits. The receiver, in
turn, samples the logic value on the line approximately halfway through the period
assigned to each bit to determine whether it is logic 1 or logic 0.
When a byte of data has been sent, the transmitter may add a Parity Bit, which
the receiver can use to perform simple error checking. In this project, the parity
bit is not implemented. A Stop Bit is then sent by the transmitter to indicate that
the transmission of the word is complete. If another byte of data is to be
transmitted, the Start Bit for the new data can be sent as soon as the Stop Bit for
the previous word has been sent. Figure 2.1 below shows the typical UART data frame
format that is used by the IOP UART module in this project.
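
The framing just described can be sketched in Python. This is only an illustration
and not the project's hardware description: the function name uart_frame is
hypothetical, and the sketch assumes 8 data bits, no parity and one stop bit, with
the start bit at logic 0 and the stop bit at logic 1.

```python
def uart_frame(byte):
    """Return the bits of one UART frame for `byte`, in transmission order.

    Assumed format (hypothetical helper): start bit 0, eight data bits
    sent LSB first, no parity bit, stop bit 1.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]


frame = uart_frame(0x41)  # ASCII 'A' = 0b01000001
print(frame)              # 10 bits: start, 8 data bits, stop
```

With no parity bit, every byte occupies 10 bit periods on the line.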

The speed of a serial connection is measured in bits per second, normally
expressed as the "baud rate". The baud rate is the number of times the signal can
switch states in one second, so the duration of a bit depends on the baud rate.

Thus, if the line is operating at 9600 baud, it can switch states 9,600
times per second, and each bit has a duration of 1/9600 of a second, or about
104 microseconds. In this project the baud rate of the UART module in the IOP is
set to 9600. As shown in Figure 2.1, each character or byte requires 10 bits to be
transmitted. Thus, the IOP is able to transfer 960 bytes of data per second.
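
The arithmetic above can be checked with a few lines of Python. The constant names
are chosen here for illustration; the 10-bit frame is assumed to be one start bit,
eight data bits and one stop bit.

```python
BAUD_RATE = 9600      # bits per second, as used in this project
BITS_PER_FRAME = 10   # assumed: 1 start + 8 data + 1 stop, no parity

bit_time_us = 1_000_000 / BAUD_RATE              # duration of one bit in microseconds
bytes_per_second = BAUD_RATE // BITS_PER_FRAME   # one byte per 10-bit frame

print(f"bit time: {bit_time_us:.1f} us")         # about 104.2 us
print(f"throughput: {bytes_per_second} bytes/s")
```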

CHAPTER 2
INTRODUCTION TO UART
Serial communication is essential to computers: it allows them to
communicate with low-speed peripheral devices such as the keyboard, the mouse and
modems. The UART, or Universal Asynchronous Receiver/Transmitter, is
the most important component required for serial communication.
Communication between two (or more) devices is asynchronous if the
devices operate on independent clocks. There is then no guarantee that the
clocks of the communicating devices will keep exactly the same frequency and phase
over extended periods. To combat this problem, asynchronous communication requires
additional synchronisation bits to be added around the actual data in order to
maintain signal integrity. Fig 2.1 below illustrates the waveform of an asynchronous
serial data stream.

Fig 2.1. SERIAL DATA STREAM


In asynchronous communication, data is preceded by a start bit (Space) that
indicates to the receiver that a word (a chunk of data broken up into individual
bits) is about to begin. The end of a word is followed by a stop bit (Mark), which
tells the receiver that the word has come to an end and that it should begin looking
for the next start bit. Any bits it receives before the start bit should be ignored.
To help ensure data integrity, a parity bit is often added between the last bit of
data and the stop bit. The parity bit lets the receiver detect an error that
corrupts an odd number of bits in the word; it cannot detect errors that flip an
even number of bits.
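
As a minimal sketch of how a parity bit is derived and checked, the following Python
functions compute even or odd parity over a list of data bits. The function names
parity_bit and parity_ok are hypothetical and exist only for this illustration.

```python
def parity_bit(data_bits, mode="even"):
    """Return the parity bit for `data_bits` (hypothetical helper).

    Even parity makes the total number of 1s (data + parity) even;
    odd parity makes it odd.
    """
    ones = sum(data_bits)
    if mode == "even":
        return ones % 2
    if mode == "odd":
        return (ones + 1) % 2
    raise ValueError("mode must be 'even' or 'odd'")


def parity_ok(data_bits, received_parity, mode="even"):
    """Check a received word against its received parity bit."""
    return parity_bit(data_bits, mode) == received_parity


data = [1, 0, 0, 0, 0, 0, 1, 0]          # 0x41, LSB first
p = parity_bit(data, "even")             # two 1s, so the even-parity bit is 0

corrupted = data.copy()
corrupted[3] ^= 1                        # a single flipped bit
print(parity_ok(data, p), parity_ok(corrupted, p))
```

A single flipped bit changes the parity and is detected; two flipped bits cancel
out and pass the check unnoticed.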

2.1 UART ARCHITECTURE


The UART circuit gives a central processing unit (CPU) serial access to an
external peripheral. The interface between the CPU and the UART is usually byte
parallel and can be synchronous (i.e. a Register Map interface). The transmission
properties of the UART, such as parity check, number of symbol bits and number of
stop bits, can be programmed via a control register that is part of the UART
circuitry. The CPU configures the UART by writing the specific control bits via the
parallel interface. Fig 2.2 below illustrates the block diagram of a UART
circuit, including its interface.

Fig 2.2.SIMPLIFIED BLOCK DIAGRAM OF AN UART


The UART consists of the following four main function blocks:

CPU Bus Controller

Baud rate generator

Receiver

Transmitter

2.2 CPU BUS CONTROLLER


The CPU Bus Controller provides the parallel data I/O interface to the local
processor bus. It generates the control signals that enable the uP (or CPU) to
access the data, status and control registers of the UART circuit. Furthermore, it
generates the local control signals within the UART circuit according to the control
register content, and updates the status register content according to the local
status signal values generated by the other function blocks. It also accommodates
the transmitter and receiver buffers (Hold Register or FIFO) that are essential for
asynchronous transmission.
2.3 BAUD RATE GENERATOR
The Baud Rate Generator is a programmable transmit and receive bit timing
device. Given the programmed value, it generates a periodic pulse, which determines
the baud rate of the UART transmission. This pulse is used by the receiver and
transmitter circuit to generate a sampling pulse for sampling the received serial data
and to determine the bit width of the transmit data.
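
The behaviour of such a generator can be sketched as a simple clock divider. This
is an illustration under stated assumptions, not the design used in this project:
the class name BaudRateGenerator is hypothetical, the programmed value is taken to
be a divisor of the system clock, and the 1.8432 MHz clock in the usage lines is
only an example value.

```python
class BaudRateGenerator:
    """Clock-divider model (hypothetical): one baud tick every `divisor` clocks."""

    def __init__(self, clock_hz, baud_rate):
        # Assumed programming model: divisor = system clock / baud rate.
        self.divisor = round(clock_hz / baud_rate)
        self.count = 0

    def clock(self):
        """Advance one system clock cycle; return True on a baud tick."""
        self.count += 1
        if self.count == self.divisor:
            self.count = 0
            return True
        return False


gen = BaudRateGenerator(clock_hz=1_843_200, baud_rate=9600)  # example clock only
ticks = sum(gen.clock() for _ in range(10 * gen.divisor))
print(gen.divisor, ticks)  # divisor of 192 clocks, 10 ticks in 1920 clocks
```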
2.4 RECEIVER
The Receiver block detects the start bit of incoming serial data and samples
the data bits, bit by bit, according to the baud clock of the baud rate generator. It
completes the reception of a single symbol of 6, 7 or 8 bits with the detection of
the stop bit (the stop bit can be 1, 1.5 or 2 bits wide). A parity check of the
received symbol helps verify that the data has been received correctly. In the case
of an invalid stop bit or a parity check error, the status signal frame error or
parity error, respectively, is set. Finally, the receiver writes the received symbol
data onto the local data bus, which is connected to the CPU Bus controller, and sets
a signal to indicate _Receiver data Write_. This signal causes the CPU Bus
controller to inform the CPU, via an interrupt, of the data arrival.
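
The receive steps above can be sketched in Python as a function that decodes one
already-sampled frame and reports the two error flags. The function name
uart_receive is hypothetical, and the sketch assumes one sample per bit, 8 data
bits sent LSB first, one parity bit and one stop bit.

```python
def uart_receive(bits, data_bits=8, parity="even"):
    """Decode one sampled frame (hypothetical helper).

    `bits` is assumed to hold: start bit, data bits (LSB first),
    parity bit, stop bit. Returns (value, parity_error, frame_error).
    """
    if len(bits) != data_bits + 3 or bits[0] != 0:
        raise ValueError("expected a frame that begins with a start bit (0)")
    payload = bits[1:1 + data_bits]
    value = sum(bit << i for i, bit in enumerate(payload))
    ones = sum(payload) + bits[1 + data_bits]       # data bits plus parity bit
    expected_remainder = 0 if parity == "even" else 1
    parity_error = (ones % 2) != expected_remainder
    frame_error = bits[-1] != 1                      # stop bit must be a Mark (1)
    return value, parity_error, frame_error


good = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]   # 0x41 with even parity
print(uart_receive(good))                  # value 65, no errors

bad_stop = good[:-1] + [0]                 # stop bit sampled as 0
print(uart_receive(bad_stop))              # frame error flagged
```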
2.5 TRANSMITTER

The transmitter block is responsible for the serial transmission of the data
that the uP (CPU) writes to the TxD Hold register (or FIFO) in the CPU
Bus controller block. First the transmitter detects whether the UART transmitter
buffer (FIFO or TxD Hold Register) contains data for transmission. If it does, it
loads the data into the transmit register of the transmitter circuit via the local
data bus (which connects the CPU Bus controller and the transmitter) and sets a
signal to indicate _Transmit data Read_. This signal initiates a sequence in which
the CPU Bus controller informs the CPU, via an interrupt, that the data has been
taken, so that the CPU can load a new value. Synchronously with the baud clock
generated by the baud rate generator, the transmitter drives the start bit onto the
TxD signal line to mark the start of a frame and then sends the symbol data bit by
bit. It then sends a parity bit that represents the parity of the transmitted data
and completes the frame with the final stop bit. The procedure is repeated if the
transmitter buffer contains another symbol; otherwise the transmitter goes into
an idle mode, transmitting _Marks_.
2.6 UART DESIGN SPECIFICATION
As part of this laboratory, only the receiver, the baud rate generator and the
transmitter circuit shall be designed. The CPU bus control and register block is not
part of the scope of this laboratory.
The targeted UART circuit shall support the following features:

7 bit and 8 bit symbol word size.

Programmable parity check (EVEN and ODD parity)

Frame Error Indication

Parity Error Indication

Baud rates according to the table below.

The figure below illustrates the block diagram and the external interface of the
UART circuit that shall be designed as part of this laboratory:

Fig 2.3 UART SPECIFICATION

Fig 2.4.TIMING DIAGRAM OF THE PARALLEL I/O INTERFACE

CHAPTER 3
FIFO ARCHITECTURE
FIFO Types:
Every memory in which the data word that is written in first also comes out first
when the memory is read is a first-in first-out memory. Fig 3.1 illustrates the data
flow in a FIFO. There are three kinds of FIFO:

Shift register: FIFO with an invariable number of stored data words and,
thus, the necessary synchronism between the read and the write operations,
because a data word must be read every time one is written

Exclusive read/write FIFO: FIFO with a variable number of stored data
words and, because of the internal structure, the necessary synchronism
between the read and the write operations

Concurrent read/write FIFO: FIFO with a variable number of stored data
words and possible asynchronism between the read and the write operations

Fig 3.1.FIFO DATA FLOW


The shift register is not usually referred to as a FIFO, although it is first-in
first-out in nature. Consequently, this application report focuses exclusively on FIFOs
that handle variable-length data. Two electronic systems always are connected to the
input and output of a FIFO: one that writes and one that reads. If certain timing
conditions must be maintained between the writing and the reading systems, we speak
of exclusive read/write FIFOs because the two systems must be synchronized. But, if
there are no timing restrictions in how the systems are driven, meaning that the
writing system and the reading system can work out of synchronism, the FIFO is
called concurrent read/write.
3.1 PASSING MULTIPLE ASYNCHRONOUS SIGNALS

Attempting to synchronize multiple changing signals from one clock domain
into a new clock domain, while ensuring that all changing signals are synchronized
to the same clock cycle in the new clock domain, has been shown to be problematic [1].
FIFOs are used in designs to safely pass multi-bit data words from one clock domain
to another. Data words are placed into a FIFO buffer memory array by control signals
in one clock domain, and the data words are removed from another port of the same
FIFO buffer memory array by control signals from a second clock domain.
Conceptually, the task of designing a FIFO with these assumptions seems to be easy.
The difficulty associated with doing FIFO design is related to generating the FIFO
pointers and finding a reliable way to determine full and empty status on the FIFO.

3.2 SYNCHRONOUS FIFO POINTERS


For synchronous FIFO design (a FIFO where writes to, and reads from the
FIFO buffer are conducted in the same clock domain), one implementation counts the
number of writes to, and reads from the FIFO buffer to increment (on FIFO write but
no read), decrement (on FIFO read but no write) or hold (no writes and reads, or
simultaneous write and read operation) the current fill value of the FIFO buffer. The
FIFO is full when the FIFO counter reaches a predetermined full value and the FIFO
is empty when the FIFO counter is zero. Unfortunately, for asynchronous FIFO
design, the increment-decrement FIFO fill counter cannot be used, because two
different and asynchronous clocks would be required to control the counter. To
determine full and empty status for an asynchronous FIFO design, the write and read
pointers will have to be compared.
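
A minimal behavioural sketch of the fill-counter approach for a synchronous FIFO is
given below. The class name SyncFifo and its interface are hypothetical and chosen
for this illustration; one call to clock() stands for one edge of the single shared
clock, and the full and empty flags are evaluated before the edge.

```python
class SyncFifo:
    """Single-clock FIFO model with an increment/decrement fill counter."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wptr = 0
        self.rptr = 0
        self.count = 0            # current fill value

    @property
    def full(self):
        return self.count == self.depth

    @property
    def empty(self):
        return self.count == 0

    def clock(self, write=False, data=None, read=False):
        """One clock edge; returns the word read, or None."""
        do_write = write and not self.full
        do_read = read and not self.empty
        out = None
        if do_read:
            out = self.mem[self.rptr]
            self.rptr = (self.rptr + 1) % self.depth
        if do_write:
            self.mem[self.wptr] = data
            self.wptr = (self.wptr + 1) % self.depth
        # Increment on write only, decrement on read only, hold otherwise.
        self.count += int(do_write) - int(do_read)
        return out


fifo = SyncFifo(depth=4)
fifo.clock(write=True, data=1)
fifo.clock(write=True, data=2)
first = fifo.clock(write=True, data=3, read=True)  # simultaneous: count holds
print(first, fifo.count)
```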

3.3 ASYNCHRONOUS FIFO POINTERS

In order to understand FIFO design, one needs to understand how the FIFO
pointers work. The write pointer always points to the next word to be written;
therefore, on reset, both pointers are set to zero, which also happens to be the next
FIFO word location to be written. On a FIFO-write operation, the memory location
that is pointed to by the write pointer is written, and then the write pointer is
incremented to point to the next location to be written. Similarly, the read pointer
always points to the current FIFO word to be read. Again on reset, both pointers are
reset to zero, the FIFO is empty and the read pointer is pointing to invalid data
(because the FIFO is empty and the empty flag is asserted). As soon as the first data
word is written to the FIFO, the write pointer increments, the empty flag is cleared,
and the read pointer that is still addressing the contents of the first FIFO memory
word, immediately drives that first valid word onto the FIFO data output port, to be
read by the receiver logic. The fact that the read pointer is always pointing to the next
FIFO word to be read means that the receiver logic does not have to use two clock
periods to read the data word. If the receiver first had to increment the read pointer
before reading a FIFO data word, the receiver would clock once to output the data
word from the FIFO, and clock a second time to capture the data word into the
receiver. That would be needlessly inefficient.
The FIFO is empty when the read and write pointers are equal. This condition
happens when both pointers are reset to zero during a reset operation, or when the
read pointer catches up to the write pointer, having read the last word from the
FIFO. A FIFO is full when the pointers are again equal, that is, when the write
pointer has wrapped around and caught up to the read pointer. This is a problem.
The FIFO is either empty or full when the pointers are equal, but which? One design
technique used to distinguish between full and empty is to add an extra bit to each
pointer. When the write pointer increments past the final FIFO address, the write
pointer will increment the unused MSB while setting the rest of the bits back to
zero, as shown in Fig 3.2 (the FIFO has wrapped and toggled the pointer MSB). The
same is done with the read pointer. If the MSBs of the two pointers are different,
it means that the write pointer has wrapped one more time than the read pointer. If
the MSBs of the two pointers are the same, it means that both pointers have wrapped
the same number of times.

Fig 3.2.FIFO STATUS DIAGRAM

Using n-bit pointers, where (n-1) is the number of address bits required to
access the entire FIFO memory buffer, the FIFO is empty when both pointers,
including the MSBs, are equal. The FIFO is full when the MSBs differ and the
remaining (n-1) bits of both pointers are equal. The FIFO design in this project
uses n-bit pointers for a FIFO with 2^(n-1) writable locations to help handle full
and empty conditions.
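
The full and empty tests with an extra pointer bit can be sketched as follows. This
is an illustration using plain binary pointers compared directly, with no clock
domain crossing; the names ADDR_BITS, is_empty, is_full and increment are chosen
here and do not come from the project's design files.

```python
ADDR_BITS = 3                          # (n-1) address bits, so n = 4
DEPTH = 1 << ADDR_BITS                 # 2^(n-1) = 8 writable locations
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1  # n-bit pointer


def increment(ptr):
    """Advance an n-bit pointer, wrapping after 2^n counts."""
    return (ptr + 1) & PTR_MASK


def is_empty(wptr, rptr):
    """Empty: all n bits equal, including the MSBs."""
    return wptr == rptr


def is_full(wptr, rptr):
    """Full: MSBs differ and the lower (n-1) bits are equal."""
    return wptr == (rptr ^ DEPTH)


wptr = rptr = 0
for _ in range(DEPTH):                 # write eight words without reading
    wptr = increment(wptr)
print(is_full(wptr, rptr), is_empty(wptr, rptr))   # True False
```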

3.4 FIFO TESTING TROUBLES


Testing a FIFO design for subtle design problems is nearly impossible.
The problem is rooted in the fact that FIFO pointers in an RTL simulation behave
ideally, even though, if incorrectly implemented, they can cause catastrophic
failures if used in a real design. In an RTL simulation, if binary-count FIFO
pointers are included in the design, all of the FIFO pointer bits will change
simultaneously; there is no chance to observe synchronization and comparison
problems.
In a gate-level simulation with no backannotated delays, there is only a slight
chance of observing a problem if the gate transitions are different for rising and
falling edge signals, and even then, one would have to get lucky and have the correct
sequence of bits changing just prior to and just after a rising clock edge. For
higher speed designs, the delay differences between rising and falling edge signals
diminish, and the probability of detecting problems diminishes with them.
The chance of finding actual FIFO design problems is greatest in gate-level
simulations with backannotated delays, but even with this type of simulation finding
problems is difficult, and again the odds of observing the design problems decrease
as signal propagation delays diminish. Clearly the answer is to recognize that there
are potential FIFO design problems and to do the design correctly from the start.
The behavioral model that I sometimes use for testing a FIFO design is a FIFO
model that is simple to code, is accurate for behavioral testing purposes and would be
difficult to debug if it were used as an RTL synthesis model. This FIFO model is only
recommended for use in a FIFO testbench. The model accurately determines when
FIFO full and empty status bits should be set and can be used to determine the data
values that should have been stored into a working FIFO.

CHAPTER 4
THE FIFO BLOCK DIAGRAM

Fig 4.1.FIFO ARCHITECTURE DIAGRAM


4.1 HANDLING FULL & EMPTY CONDITIONS:
Exactly how FIFO full and FIFO empty are implemented is design-dependent.
The FIFO design in this paper assumes that the empty flag will be generated in the
read-clock domain to ensure that the empty flag is detected immediately when the
FIFO buffer is empty, that is, the instant that the read pointer catches up to the
write pointer (including the pointer MSBs). It likewise assumes that the full flag
will be generated in the write-clock domain to ensure that the full flag is
detected immediately when the FIFO buffer is full, that is, the instant that the
write pointer catches up to the read pointer (except for different pointer MSBs).

4.2 GENERATING EMPTY


As shown in Figure 1, the FIFO is empty when the read pointer and the
synchronized write pointer are equal. The empty comparison is simple to do.
Pointers that are one bit larger than needed to address the FIFO memory buffer are
used. If the extra bits of both pointers (the MSBs of the pointers) are equal, the
pointers have wrapped the same number of times, and if the rest of the read pointer
equals the synchronized write pointer, the FIFO is empty.
The write pointer must be synchronized into the read-clock domain through a pair of
synchronizer registers found in the sync_w2r module. In order to efficiently register
the rempty output, the synchronized write pointer is actually compared against
rnext.

4.3 GENERATING FULL


Since the full flag is generated in the write-clock domain by running a
comparison between the write and read pointers, one safe technique for doing FIFO
design requires that the read pointer be synchronized into the write clock domain
before doing pointer comparison. The full comparison is not as simple to do as the
empty comparison. Pointers that are one bit larger than needed to address the FIFO
memory buffer are still used for the comparison, but simply using counters with an
extra bit to do the comparison is not valid to determine the full condition.

4.4 PESSIMISTIC FULL & EMPTY

The FIFO described in this paper implements full-removal and empty-removal
using a pessimistic method. That is, full and empty are both asserted
exactly on time but removed late. Since the write clock is used to generate the
FIFO-full status, and since FIFO-full occurs when the write pointer catches up to
the synchronized read pointer, full-detection is accurate and immediate. Removal of
full status is pessimistic because the full comparison is done with a
synchronized read pointer. When the read pointer does increment, the FIFO is no
longer full, but the full-generation logic will not detect the change until two
rising wclk edges synchronize the updated rptr into the wclk domain. This is
generally not a problem, since it only means that the data-sending hardware is held
off, or informed that the FIFO is still full, for a couple of extra wclk edges. The
important detail is to ensure that the FIFO does not overflow. Signaling the
data-sender not to send more data for a couple of extra wclk edges merely gives time
for the FIFO to make room to receive more data.
Similarly, since the read clock is used to generate the FIFO-empty status, and
since FIFO-empty occurs when the read pointer catches up to the synchronized write
pointer, empty-detection is accurate and immediate. Removal of empty status is
pessimistic because the empty comparison is done with a synchronized write pointer.
When the write pointer does increment, the FIFO is no longer empty, but the
empty-generation logic will not detect the change until two rising rclk edges
synchronize the updated wptr into the rclk domain. This is generally not a problem,
since it only means that the data-receiving logic is held off, or informed that the
FIFO is still empty, for a couple of extra rclk edges. The important detail is to
ensure that the FIFO does not underflow. Signaling the data-receiver to stop
removing data from the FIFO for a couple of extra rclk edges merely gives time
for the FIFO to be filled with more data.
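
The two-edge delay in removing the full flag can be illustrated with a small Python
model of a two-register synchronizer. This is a sketch under simplifying
assumptions: the class name TwoFlopSync is hypothetical, the pointer is passed as a
plain integer, and metastability itself is not modelled.

```python
class TwoFlopSync:
    """Two-register synchronizer model (hypothetical, no metastability)."""

    def __init__(self, initial=0):
        self.stage1 = initial
        self.stage2 = initial

    def clock(self, value):
        """One rising edge of the destination clock; returns the synced value."""
        self.stage2, self.stage1 = self.stage1, value
        return self.stage2


DEPTH = 8                                  # 4-bit pointers for 8 locations


def full(wptr, synced_rptr):
    return wptr == (synced_rptr ^ DEPTH)   # MSBs differ, lower bits equal


wptr, rptr = 8, 0                          # the FIFO is full
sync_r2w = TwoFlopSync(initial=rptr)

rptr = 1                                   # a read frees one location
history = []
for _ in range(3):                         # three rising wclk edges
    history.append(full(wptr, sync_r2w.clock(rptr)))
print(history)                             # [True, False, False]
```

The full flag is still seen as asserted after the first wclk edge and is removed
only once the second edge has moved the new read pointer value through both
registers.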

4.5 ACCURATE SETTING OF FULL & EMPTY


Note that setting either the full flag or the empty flag might not be quite accurate
if both pointers are incrementing simultaneously. For example, if the write pointer
catches up to the synchronized read pointer, the full flag will be set. But if the
read pointer incremented at the same time as the write pointer, the full flag will
have been set early: the FIFO is not really full, because a read operation occurred
simultaneously with the write-to-full operation, but the read pointer had not yet
been synchronized into the write-clock domain. The setting of the full flag was
slightly too early and slightly pessimistic. This is not a design problem.

4.6 EXCLUSIVE READ/WRITE FIFOS


In exclusive read/write FIFOs, the writing of data is not independent of how
the data are read. There are timing relationships between the write clock and the read
clock. For instance, overlapping of the read and the write clocks could be prohibited.
To permit use of such FIFOs between two systems that work asynchronously to one
another, an external circuit is required for synchronization. But this synchronization
circuit usually considerably reduces the data rate.

4.7 CONCURRENT READ/WRITE FIFOS


In concurrent read/write FIFOs, there is no dependence between the writing
and reading of data. Simultaneous writing and reading are possible in overlapping
fashion or successively. This means that two systems with different frequencies can be
connected to the FIFO. The designer need not worry about synchronizing the two
systems because this is taken care of in the FIFO. Concurrent read/write FIFOs,
depending on the control signals for writing and reading, fall into two groups:
Synchronous FIFOs
Asynchronous FIFOs
4.8 ASYNCHRONOUS FIFOS
The control signals of an asynchronous FIFO correspond most closely to
human intuition and were, in the past, the only way of driving a FIFO. The block
diagram in Fig 4.2 shows the control lines of an asynchronous FIFO, and Fig 4.3
illustrates the typical timing on these lines in a read and write operation.

Fig 4.2.CONNECTIONS OF AN ASYNCHRONOUS FIFO

Fig 4.3.TIMING DIAGRAM OF ASYNCHRONOUS FIFO OF LENGTH 4:


The control lines WRITE CLOCK and FULL are used to write data. When a
data word is to be written into an asynchronous FIFO, it is first necessary to check
whether there is space available in the FIFO. This is done by querying the FULL
status line. If free space is indicated, the data word is applied to the data inputs and
written into the FIFO by a clock edge on the WRITE CLOCK input. In analogous
fashion, the control lines READ CLOCK and EMPTY are used to read data. In this
case, the EMPTY status output has to be queried before reading, because data can be
read out only if it is stored in the FIFO. Then, a clock edge is applied to the READ
CLOCK input, causing the first word in the data queue to appear on the data output.
The timing diagram in Fig 4.3 shows the resetting of the FIFO that is always
necessary at the beginning. Then, three data words are written in. The data words D1
through D3 appear one after the other on the INPUT DATA inputs and clock edges are
applied to WRITE CLOCK for transfer of the data. Once the first data word has been
written into the FIFO, the EMPTY signal changes from low level to high level.
Another two data words are written into the FIFO before the first read cycle. The
subsequent reading out of the first data word with the aid of a clock edge on READ
CLOCK does not alter the status signals. With the writing of another two data words,
the FIFO is full. This is indicated by the FULL signal. Finally, the four data words D2
through D5 remaining in the FIFO are read out. Thus, the FIFO is empty again, so the
EMPTY status line shows this by low level.
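
The sequence walked through above can be replayed on a small behavioural model.
This is a sketch only: the class AsyncFifoModel and its method names are
hypothetical, clock edges are reduced to method calls, EMPTY is modelled as an
active-low line as in the description, and the polarity of FULL is not modelled.

```python
from collections import deque


class AsyncFifoModel:
    """Behavioural model of a length-4 FIFO (hypothetical names)."""

    def __init__(self, depth=4):
        self.depth = depth
        self.words = deque()

    @property
    def empty_n(self):
        """Level of the active-low EMPTY line: 0 while the FIFO is empty."""
        return 1 if self.words else 0

    @property
    def is_full(self):
        return len(self.words) == self.depth

    def write_clock(self, word):
        """Clock edge on WRITE CLOCK; the caller must check FULL first."""
        if self.is_full:
            raise OverflowError("write attempted while FIFO is full")
        self.words.append(word)

    def read_clock(self):
        """Clock edge on READ CLOCK; the caller must check EMPTY first."""
        if not self.words:
            raise IndexError("read attempted while FIFO is empty")
        return self.words.popleft()


fifo = AsyncFifoModel(depth=4)             # state after reset
for word in ("D1", "D2", "D3"):
    fifo.write_clock(word)                 # EMPTY goes high after D1
first = fifo.read_clock()                  # D1 is read, status unchanged
fifo.write_clock("D4")
fifo.write_clock("D5")                     # FIFO is now full
rest = [fifo.read_clock() for _ in range(4)]
print(first, rest, fifo.empty_n)
```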
4.9 SYNCHRONOUS FIFOS
Synchronous FIFOs are controlled using methods of control proven in
processor systems. Every digital processor system works synchronized with a
system-wide clock signal. This system timing continues to run even if no actions are
being executed. Enable signals, also often called chip-select signals, start the
synchronous execution of write and read operations in the various devices, such as
memories and ports.
The block diagram in Fig 4.4 shows all the signal lines of a synchronous FIFO. It
requires a free-running clock from the writing system and another from the reading
system. Writing is controlled by the WRITE ENABLE input, synchronous with
WRITE CLOCK. Thanks to the free-running clock, the FULL status line can be
synchronized entirely with WRITE CLOCK. In an analogous manner, data words are read
out by a low level on the READ ENABLE input, synchronous with READ CLOCK. Here,
too, the free-running clock permits 100 percent synchronization of the EMPTY signal
with READ CLOCK.

Fig 4.4.SYNCHRONOUS FIFO


Thus, synchronous FIFOs are integrated easily into common processor
architectures, offering complete synchronism of the FULL and EMPTY status signals
with the particular free-running clock.
Fig 4.5 shows the typical waveform in a synchronous FIFO. WRITE CLOCK and
READ CLOCK are free running. The writing of new data into the FIFO is initiated
by a low level on the WRITE ENABLE line. The data are written into the FIFO
with the next rising edge of WRITE CLOCK. In analogous fashion, the READ
ENABLE line controls the reading out of data synchronous with READ CLOCK. All
status lines within the FIFO can be synchronized by the two free-running-clock
signals. The FULL line only changes its level synchronously with WRITE CLOCK,
even if the change is produced by the reading of a data word. Likewise, the EMPTY
signal is synchronized with READ CLOCK. A synchronous FIFO is the only
concurrent read/write FIFO in which the status signals are synchronized with the
driving logic.

Fig 4.5.TIMING DIAGRAM FOR A SYNCHRONOUS FIFO OF LENGTH 4

CHAPTER 5
IMPLEMENTATION OF A MULTI-CHANNEL UART
CONTROLLER BASED ON FIFO TECHNIQUE USING FPGA

5.1 DESIGN OF ASYNCHRONOUS FIFOS

An asynchronous FIFO refers to a FIFO design where data values are written
to a FIFO buffer from one clock domain and the data values are read from the same
FIFO buffer from another clock domain, where the two clock domains are
asynchronous to each other. FIFOs are commonly used as data buffers that absorb
differences in frequency or phase between asynchronous signals, and asynchronous
FIFOs are often used to quickly and safely pass data from one clock domain to
another asynchronous clock domain. In an asynchronous clock circuit, the periods and
phases of each clock domain are completely independent, so the probability of data
loss is never zero. This paper introduces a way of designing a FIFO based on
FPGAs with high write/read speed and high reliability. Generally, a FIFO consists
of a RAM array block, a status block, a write pointer (WR_ptr) and a read pointer
(RD_ptr); its structure is shown in Fig 5.1.
A RAM array with separate read and write ports is used to store data. The
write pointer points to the location that will be written next, and the read pointer
points to the location that is currently being read. A write operation increments
the write pointer and a read operation increments the read pointer. On reset, both
pointers are reset to zero and the FIFO is empty: the write pointer points to the
next FIFO location to be written and the read pointer points to invalid data. The
responsibility of the status block is to generate the _Empty_ and _Full_ signals of
the FIFO. If _Full_ is active, the FIFO cannot accommodate more data, and if
_Empty_ is active, the FIFO cannot provide more data to read out.
Data is written into the FIFO in the _wclk_ clock domain and read out of the FIFO
in the _rclk_ clock domain. The two clock domains are asynchronous.

Fig 5.1.ASYNCHRONOUS FIFO STRUCTURE DIAGRAM


In designing asynchronous FIFOs, two difficult problems cannot be
ignored. One is how to determine the FIFO's status from the write pointer and read
pointer. The other is how to design the circuit that synchronizes the asynchronous
clock domains so as to avoid metastability.
5.2 STATUS OF EMPTY AND FULL OF FIFO
Creating the empty and full signals is the most important part of designing a
FIFO. Under no circumstances should a read and a write be allowed to access the
same FIFO address at the same time. The empty and full signals therefore play a very
important role in the FIFO: they block further reads or writes, respectively. This
blocking is critical because the pointer positions are the only control over the FIFO,
and every write or read operation changes the pointers. Generally, in an ordinary FIFO,
the FIFO is empty when the read pointer equals the write pointer. In a circular FIFO,
however, equal pointers mean that the FIFO is either empty or full, because the full and
empty signals are decided not only by the pointers' values but also by the operation
that caused the pointers to become equal. If a reset or a read makes the pointers equal,
the FIFO is really empty. If a write makes the pointers equal, the FIFO is full [5].
To know exactly whether the FIFO is full or empty, we can set a direction flag that
keeps track of what caused the pointers to become equal. The flag tells the status
circuit the direction in which the FIFO is currently headed. Implementing the direction
flag is a little complex because thresholds for "going toward full" and "going toward
empty" have to be set. In this project, that method is replaced by another design
technique for distinguishing between full and empty: adding an extra bit to each pointer.
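The extra-bit technique can be illustrated with a small behavioural model. This is only a sketch in Python, not the project's HDL: the 4-bit address width and the names `is_empty`, `is_full`, `advance` and `fill_then_drain` are assumptions made for the example, and the pointers are compared as plain binary values in a single domain (in the real design each side sees a synchronized copy of the other pointer).

```python
# Behavioural sketch of full/empty detection using one extra pointer bit.
ADDR_BITS = 4                            # assumed: 16-entry FIFO
DEPTH = 1 << ADDR_BITS
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1    # pointers are ADDR_BITS + 1 bits wide


def is_empty(wptr: int, rptr: int) -> bool:
    """Empty: the pointers match in every bit, including the extra MSB."""
    return wptr == rptr


def is_full(wptr: int, rptr: int) -> bool:
    """Full: address bits match but the extra MSBs differ (writer is one lap ahead)."""
    return (wptr ^ rptr) == DEPTH


def advance(ptr: int) -> int:
    """Increment a pointer, wrapping after two laps of the address space."""
    return (ptr + 1) & PTR_MASK


def fill_then_drain():
    """Write DEPTH words, then read DEPTH words; report (full, empty) flags."""
    wptr = rptr = 0
    for _ in range(DEPTH):
        wptr = advance(wptr)
    full_after_writes = is_full(wptr, rptr)
    for _ in range(DEPTH):
        rptr = advance(rptr)
    return full_after_writes, is_empty(wptr, rptr)
```

With the extra bit, equal address bits no longer leave the empty and full cases ambiguous, so no direction flag is needed.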
5.3 SOLUTIONS OF METASTABILITY:
Metastability can cause unpredictable problems in a FIFO, so at the design
stage we should do our best to reduce its likelihood. If a system contains an
asynchronous element, metastability is unavoidable. There is no way to eliminate
metastability completely, so what we do instead is calculate a "probability" of error and
express it in terms of time, i.e. as the MTBF (Mean Time Between Failures). MTBF is a
statistical measure of failure probability, and arriving at it requires fairly complex
empirical and experimental data. In a D flip-flop, if the input signal
changes from 0 to 1 exactly at the sampling clock edge (time t = 0), the value of Q is
uncertain. This is metastability.
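As a rough illustration of how an MTBF figure is obtained, the sketch below evaluates the commonly used synchronizer model MTBF = exp(t_r / tau) / (T0 * f_clk * f_data). The model itself is standard, but it is not taken from this report, and every number in the example is an assumed placeholder; real values of tau and T0 come from device characterization data.

```python
import math


def mtbf_seconds(t_resolve, tau, t0, f_clk, f_data):
    """Mean time between synchronizer failures, in seconds.

    t_resolve: time allowed for the flip-flop to settle (s)
    tau, t0:   device-dependent metastability constants (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    toggle rate of the asynchronous input (Hz)
    """
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)


# Illustrative, assumed numbers only.
example = mtbf_seconds(t_resolve=5e-9, tau=0.2e-9, t0=1e-9, f_clk=100e6, f_data=1e6)
```

The exponential term shows why giving the synchronizer more settling time improves the MTBF far more than lowering either frequency does.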
The FIFO needs to sample the value of a counter with a clock that is
asynchronous to the counter's clock. It will therefore meet a situation where the
counter is changing from FFFF to 0000 and every single bit may go metastable. The
sampled value could then be any value between 0000 and FFFF, and the FIFO would not
work. The most important thing to ensure is that not all bits of the
counter change simultaneously. To minimize the probability of
such errors, we should make sure that precisely one bit changes every
time the counter increments. So we need a counter that counts in
Gray code. Gray code is named after Frank Gray, who originally patented the
code in 1953. Gray code differs from binary code in that each
value differs from the previous one in only one bit position. There are multiple ways
to design a Gray code counter; this project uses a simple and straightforward method
that needs only one set of flip-flops for the Gray code counter, as shown in Fig 5.2.
The counter converts the stored Gray code to binary, increments it, converts the result
back to Gray code and stores it. The Gray code counter assumes that the outputs of
the register bits are the Gray code value. The Gray code outputs are passed to the
Gray-to-binary converter, whose output is
passed to a binary adder to generate the next binary value, which in turn is passed to
the binary-to-Gray converter that generates the next Gray code value to be stored in
the register.
The first fact to remember about a Gray code is that the code distance between any
two adjacent words is just 1 (only one bit can change from one Gray count to the
next). The second fact to remember about a Gray code counter is that most useful
Gray code counters must have power-of-2 counts in the sequence.
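The convert, increment, convert-back loop described above can be sketched as a behavioural model. This is a Python illustration rather than the project's HDL; the 4-bit width and the function names are assumptions chosen for the example.

```python
WIDTH = 4  # assumed counter width; the sequence length is 2**WIDTH


def bin_to_gray(b: int) -> int:
    """Binary-to-Gray converter: each Gray bit is the XOR of two adjacent binary bits."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray-to-binary converter: XOR together all right-shifted copies of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def next_gray(g: int, width: int = WIDTH) -> int:
    """One counter step: Gray -> binary, add 1 (with wrap), binary -> Gray."""
    mask = (1 << width) - 1
    return bin_to_gray((gray_to_bin(g) + 1) & mask)


def gray_sequence(width: int = WIDTH):
    """Values held by the register over one full count cycle, starting from 0."""
    g, seq = 0, []
    for _ in range(1 << width):
        seq.append(g)
        g = next_gray(g, width)
    return seq
```

Because the sequence length is a power of two, the wrap from the last value back to zero also changes only one bit, which is the property the pointer synchronizers rely on.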

Fig 5.2. GRAY COUNTER ARCHITECTURE


5.4 HARDWARE STRUCTURE
The multi-channel controller contains several blocks: the UART
block, the Status Detectors, the asynchronous FIFO block and the Baud Rate Generator
block. Each block has a different function in the controller. The first part is the
UART circuit block, whose structure is shown in Fig 5.3. It consists of three parts:
the Receive Circuit, the Transmit Circuit and the Control/Status Registers. The
Transmit Circuit consists of a Transmit Buffer and a Shift Register. The Transmit
Buffer loads the data to be transmitted from the local CPU, and the Shift Register
accepts data from the Transmit Buffer
and sends it to the TXD pin one bit at a time. The Receive Circuit consists of a Receive
Shift Register and a Receive Buffer. The Receive Shift Register receives data from
RXD one bit at a time. The Receive Buffer loads the data from the remote MCU and holds
it ready for the local PC to read. The Control Register, a special function register, is
used to control the UART and to indicate its status. According to the value of each
bit, the UART selects the communication mode and knows
what to do to receive or transmit data. FIFOs are used to store data received from the
PC and hold it ready for the sub-MCUs. Data is written into and read out of the FIFOs
in different clock domains, which can be set according to the baud rates of the PC
and the MCUs. The controller can therefore be used to implement communication between
MCUs running at different baud rates.
The controller also has a Baud Rate Generator block that generates the different
baud rates needed to meet the requirements of different kinds of systems. This block
consists of timers (32/16-bit timers), frequency dividers and a baud rate setting
register.
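The value written to a baud rate setting register is typically a clock divisor. The sketch below shows one way to compute it, assuming 16x oversampling as used by 16550-class UARTs; the report does not state the oversampling ratio or the system clock, so both, along with the function names, are assumptions for illustration.

```python
def baud_divisor(f_clk_hz: float, baud: float, oversample: int = 16) -> int:
    """Nearest integer divisor so that f_clk / (oversample * divisor) ~= baud."""
    return max(1, round(f_clk_hz / (oversample * baud)))


def actual_baud(f_clk_hz: float, divisor: int, oversample: int = 16) -> float:
    """Baud rate actually produced by a given divisor."""
    return f_clk_hz / (oversample * divisor)


def baud_error_percent(f_clk_hz: float, baud: float, oversample: int = 16) -> float:
    """Relative error between the requested and the achievable baud rate."""
    divisor = baud_divisor(f_clk_hz, baud, oversample)
    return 100.0 * (actual_baud(f_clk_hz, divisor, oversample) - baud) / baud
```

With the classic 1.8432 MHz UART crystal, 9600 baud divides exactly (divisor 12); with an assumed 50 MHz FPGA clock, 115200 baud needs divisor 27 and carries an error of roughly half a percent.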

Fig 5.3. STRUCTURE OF UART BLOCK


Using the FIFO technique and the COM block mentioned before, we design a
multi-channel controller. It can be used to implement communication between MCUs
in a complex system, and also between a
high-speed device and a low-speed device. The structure of the controller is shown in
Fig 5.4. The controller is built in an EP1C6Q240 FPGA, an SRAM-based
device produced by Altera, in which small-scale
memories such as FIFOs can be implemented. When designing FIFOs within an FPGA, you
should consider both the FIFO capacity needed in practice and the capacity of the FPGA.

Fig 5.4. STRUCTURE OF THE CONTROLLER


CHAPTER 6
PROJECT ADVANTAGES

Software compatible with 16450, 16550 and 16750 UARTs

Configuration capability

Separate configurable BAUD clock line

Two modes of operation: UART mode and FIFO mode

In the FIFO mode transmitter and receiver are each buffered with 16 byte or
64 byte FIFO to reduce the number of interrupts presented to the CPU

Optional FIFO size extension to 128, 256 or 512 Bytes

Adds or deletes standard asynchronous communication bits (start, stop, and
parity) to or from the serial data

In UART mode receiver and transmitter are double buffered to eliminate a
need for precise synchronization between the CPU and serial data

Independently controlled transmit, receive, line status, and data set interrupts

False start bit detection

16 bit programmable baud generator

Fully programmable serial-interface characteristics:

    5-, 6-, 7-, or 8-bit characters

    Baud generation

Complete status reporting capabilities

Line break generation and detection.

Two DMA modes allow single and multi-transfer.

Technology independent HDL Source Code.

Full prioritized interrupt system controls.
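
To make the framing features in this list concrete (start, stop and parity bits around 5- to 8-bit characters), here is a minimal behavioural sketch in Python. The function name `uart_frame` and its parameters are hypothetical, chosen only for this example; it models the bit order on the line, not the project's HDL.

```python
def uart_frame(value: int, data_bits: int = 8, parity: str = "even", stop_bits: int = 1):
    """Return the line bits for one character, in transmission order."""
    if data_bits not in (5, 6, 7, 8):
        raise ValueError("data_bits must be 5, 6, 7 or 8")
    if parity not in ("even", "odd", "none"):
        raise ValueError("parity must be 'even', 'odd' or 'none'")

    data = [(value >> i) & 1 for i in range(data_bits)]   # LSB is sent first
    bits = [0] + data                                     # start bit is a 0
    if parity != "none":
        ones = sum(data) % 2
        # Even parity makes the total count of 1s even; odd parity makes it odd.
        bits.append(ones if parity == "even" else ones ^ 1)
    bits += [1] * stop_bits                               # stop bits are 1s
    return bits
```

For example, the character 0x55 with 8 data bits, even parity and one stop bit occupies 11 bit times on the line: one start bit, eight data bits, one parity bit and one stop bit.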


CHAPTER 7
APPLICATIONS

Serial Data communications applications

Modem interface

Modern complex control systems, such as robot motion control

Communication among more than two systems operating at different frequencies


CHAPTER 8
HARDWARE DESCRIPTION LANGUAGE (VHDL)
8.1 HDL (VHDL)
8.1.1 WHY (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

8.1.2 WHAT IS VHDL?


VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics
8.2 DESIGN UNITS:
Segments of VHDL code that can be compiled separately and stored in a library.

8.2.1 Entities:

A black box with interface definition.

Defines the inputs/outputs of a component (define pins)

A way to represent modularity in VHDL.

Similar to symbol in schematic.

Entity declaration describes entity.

E.g.:
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
end Comparator;
PORTS:


Provide channels of communication between the component and its environment.

Each port must have a name, direction and a type.

An entity may have NO port declaration

Port directions:

In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of port are allowed.

Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.

Inout: Bi-directional, assignments can be made and data can be read. Multiple
assignments are allowed.

Buffer: An out port with read capability. May have at most one assignment.
(Buffer ports are not recommended.)

8.2.2 Architectures:

Every entity has at least one architecture.

One entity can have several architectures.

Architectures can describe a design using:

    Behavior
    Structure
    Dataflow

Architectures can describe a design on many levels:

    Gate level
    RTL (Register Transfer Level)
    Behavioral level

Configuration declaration links architecture to entity.

E.g.:
architecture Comparator1 of Comparator is
begin
    EQ <= '1' when (A = B) else '0';
end Comparator1;
8.3 LEVELS OF ABSTRACTION:


VHDL supports many possible styles of design description, which differ primarily in
how closely they relate to the HW.
It is possible to describe a circuit in a number of ways.

Structural -> Dataflow -> Behavioral (toward a higher level of abstraction)

8.3.1 Structural VHDL Description:

Circuit is described in terms of its components.

From a low-level description (e.g., transistor-level description) to a high-level
description (e.g., block diagram).

For large circuits, a low-level description quickly becomes impractical.

8.3.2 Dataflow VHDL Description:

Circuit is described in terms of how data moves through the system.

In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level, while the placement
and operation of registers is specified quite precisely.

The behavior of the system over time is defined by registers.

There are no built-in registers in the VHDL language.

Either a lower-level description
or a behavioral description of sequential elements is needed.

The lower-level register descriptions must be created or obtained.

If there are no third-party models for registers, you must write the behavioral
description of the registers.

The behavioral description can be provided in the form of
subprograms (functions or procedures).

8.3.3 Behavioral VHDL Description:

Circuit is described in terms of its operation over time.

Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.

The concept of time may be expressed precisely using delays (e.g., A <= B
after 10 ns)

If no actual delay is used, order of sequential operations is defined.

In the lower levels of abstraction (e.g., RTL) synthesis tools ignore detailed
timing specifications.

The actual timing results depend on implementation technology and efficiency
of synthesis tool.

There are a few tools for behavioral synthesis.

CHAPTER 9
FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can be
configured by the customer or designer after manufacturing, hence the name
"field-programmable". FPGAs are programmed using a logic circuit diagram or a source
code in a hardware description language (HDL) to specify how the chip will work.
They can be used to implement any logical function that an application-specific
integrated circuit (ASIC) could perform, but the ability to update the functionality
after shipping offers advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks can be
configured to perform complex combinational functions, or merely simple logic gates
like AND and XOR. In most FPGAs, the logic blocks also include memory elements,
which may be simple flip-flops or more complete blocks of memory.
HISTORY : The FPGA industry sprouted from programmable read only memory
(PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the
option of being programmed in batches in a factory or in the field (field
programmable), however programmable logic was hard-wired between logic gates.


Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first
commercially viable field-programmable gate array, the XC2064, in 1985. The
XC2064 had programmable gates and programmable interconnects between gates, the
beginnings of a new technology and market. The XC2064 boasted a mere 64
configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than
20 years later, Freeman was inducted into the National Inventors Hall of Fame for his
invention.
Some of the industry's foundational concepts and technologies for programmable
logic arrays, gates, and logic blocks are founded in patents awarded to David W. Page
and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement 600,000
reprogrammable gates. Casselman was successful and the system was awarded a
patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market-share. By 1993, Actel was
serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication and the
volume of production. In the early 1990s, FPGAs were primarily used in
telecommunications and networking. By the end of the decade, FPGAs found their
way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device. Thompson's
algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip to decide the
configuration needed to accomplish a sound recognition task.
MODERN DEVELOPMENTS

A recent trend has been to take the coarse-grained architectural approach a step
further by combining the logic blocks and interconnects of traditional FPGAs with
embedded microprocessors and related peripherals to form a complete "system on a
programmable chip". This work mirrors the architecture by Ron Perlof and Hana
Potash of Burroughs Advanced Systems Group which combined a reconfigurable
CPU architecture on a single chip called the SB24. That work was done in 1982.
Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and
Virtex-4 devices, which include one or more PowerPC processors embedded within
the FPGA's logic fabric. The Atmel FPSLIC is another such device, which uses an
AVR processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed
at "run time", and this is leading to the idea of reconfigurable computing or
reconfigurable systems: CPUs that reconfigure themselves to suit the task at hand.
The Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable soft
processor, implemented on FPGAs. However, it does not support dynamic
reconfiguration at runtime, but instead adapts itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by
providing an array of processor cores and FPGA-like programmable cores on the
same chip.
APPLICATIONS :
Applications of FPGAs include digital signal processing, software-defined radio,
aerospace and defense systems, ASIC prototyping, medical imaging, computer vision,
speech recognition, cryptography, bioinformatics, computer hardware emulation,
radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar space,
that of glue logic for PCBs. As their size, capabilities, and speed increased, they began
to take over larger and larger functions to the state where some are now marketed as
full systems on chips (SoC). Particularly with the introduction of dedicated multipliers
into FPGA architectures in the late 1990s, applications that had traditionally been
the sole reserve of DSPs began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use of the
massive parallelism offered by their architecture. One such area is code breaking, in
particular brute-force attack, of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or Convolution are performed
on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for considerable
computational throughput even at low MHz clock rates. The flexibility of the FPGA
allows for even higher performance by trading off precision and range in the number
format for an increased number of parallel arithmetic units. This has driven a new
type of processing called reconfigurable computing, where time intensive tasks are
offloaded from software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by the
complexity of FPGA design compared to conventional software and the extremely
long turn-around times of current design tools, where a 4-8 hour wait is necessary after
even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications where the
volume of production is small. For these low-volume applications, the premium that
companies pay in hardware costs per unit for a programmable chip is more affordable
than the development resources spent on creating an ASIC for a low-volume
application. Today, new cost and performance dynamics have broadened the range of
viable applications.
ARCHITECTURE :
The most common FPGA architecture consists of an array of configurable logic
blocks (CLBs), I/O pads, and routing channels. Generally, all the routing channels
have the same width (number of wires). Multiple I/O pads may fit into the height of
one row or the width of one column in the array.


An application circuit must be mapped into an FPGA with adequate resources. While
the number of CLBs and I/Os required is easily determined from the design, the
number of routing tracks needed may vary considerably even among designs with the
same amount of logic. (For example, a crossbar switch requires much more routing
than a systolic array with the same gate count.) Since unused routing tracks increase
the cost (and decrease the performance) of the part without providing any benefit,
FPGA manufacturers try to provide just enough tracks so that most designs that will
fit in terms of LUTs and IOs can be routed. This is determined by estimates such as
those derived from Rent's rule or by experiments with existing designs.

The FPGA is an array or island-style FPGA. It consists of an array of logic blocks and routing
channels. Two I/O pads fit into the height of one row or the width of one column, as shown
below. All the routing channels have the same width (number of wires).

FPGA STRUCTURE

A classic FPGA logic block consists of a 4-input lookup table (LUT), and a flip-flop,
as shown below. In recent years, manufacturers have started moving to 6-input LUTs
in their high performance parts, claiming increased performance.


TYPICAL LOGIC BLOCK
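
The classic logic block described above (a 4-input LUT feeding a flip-flop) can be modelled in a few lines. The Python sketch below is illustrative only: the class and helper names are hypothetical, and the 16-bit configuration word simply stores the truth table, with input `a` as the least significant index bit by assumption.

```python
class LogicBlock:
    """One 4-input LUT followed by a D flip-flop (behavioural model)."""

    def __init__(self, init: int):
        self.init = init & 0xFFFF   # 16 configuration bits: the LUT truth table
        self.q = 0                  # flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        """Combinational output: look up the configured bit for these inputs."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.init >> index) & 1

    def clock(self, a: int, b: int, c: int, d: int) -> int:
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)
        return self.q


def truth_table(fn) -> int:
    """Build the 16-bit configuration word for any 4-input boolean function."""
    init = 0
    for index in range(16):
        bits = [(index >> i) & 1 for i in range(4)]
        if fn(*bits):
            init |= 1 << index
    return init
```

The same hardware implements a 4-input AND or a 4-input XOR depending only on the configuration word, which is what makes the block programmable.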

CHAPTER 10
SIMULATION AND VERIFICATION
To verify the design of the controller, a test bench was written and simulated in
ModelSim. Data received from the PC or another main MCU is stored in the FIFOs
within the FPGA until the controller receives a command ordering it to send the
data to the sub-controllers. The controller then sets the baud rate requested by the
command. As shown in figure 8, the controller receives data and stores
it in the different FIFOs, where it waits to be read.
When the sub-controllers are required to receive data at different baud rates, the
controller can set each channel to its required baud rate to transmit data. The
transmitting sequence is shown in figure 9. The controller sends data on all channels
at the same time but at different baud rates. When the sub-controllers are required to
receive data at the
same baud rate, the controller can also set all channels to the same baud rate to
transmit data, as shown in figure 9. All sub-MCUs can then receive data at the same time.
We designed four channels in Verilog HDL with exactly the same structure, using always
blocks to implement communication. So in theory, there is no time difference
between the sub-controllers when the controller transmits data to them at the
same time. In practice, there are hardware delays in the FPGA, and these delays may
prevent the sub-controllers from receiving data from the controller at precisely the
same time. Compared with the delays in an RS485
network, however, these delays can be ignored. So using this controller can greatly
improve the synchronization of the sub-controllers.
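
To give a feel for what "same start time, different baud rates" means on the line, the sketch below estimates how long each channel needs to send the same block of data. It is a back-of-the-envelope model in Python, not a result from the simulations: the 10-bit frame (1 start, 8 data, 1 stop) and the baud rates used in the checks are assumptions.

```python
def transmit_time_s(n_bytes: int, baud: float, bits_per_frame: int = 10) -> float:
    """Time to shift out n_bytes back to back at the given baud rate."""
    return n_bytes * bits_per_frame / baud


def channel_finish_times(n_bytes: int, bauds, bits_per_frame: int = 10):
    """Finish time per channel when all channels start transmitting together."""
    return {baud: transmit_time_s(n_bytes, baud, bits_per_frame) for baud in bauds}
```

All channels start together, but a channel running at a lower baud rate finishes later, which is why each channel needs its own FIFO to hold data until it has been shifted out.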

10.1 EMPTY FIFO:


SIMULATION

RTL SCHEMATIC


10.2 FULL FIFO:


SIMULATION


RTL SCHEMATIC

10.3 TOTAL FIFO:


SIMULATION

RTL SCHEMATIC


10.4 TRANSFER OF DATA WITH 3 DIFFERENT SPEEDS USING FIFO:


SIMULATION


10.5 TRANSFER OF DATA WITH SAME SPEED USING FIFO:


SIMULATION

SYNTHESIS REPORT
Macro Statistics
# RAMs                                 : 3
 16x7-bit dual-port RAM                : 3
# Adders/Subtractors                   : 7
 32-bit adder                          : 1
 5-bit adder                           : 6
# Counters                             : 14
 32-bit up counter                     : 8
 4-bit up counter                      : 6
# Registers                            : 89
 1-bit register                        : 62
 32-bit register                       : 1
 5-bit register                        : 18
 7-bit register                        : 8
# Latches                              : 1
 7-bit latch                           : 1
# Comparators                          : 6
 3-bit comparator equal                : 3
 5-bit comparator equal                : 3
# Multiplexers                         : 1
 32-bit 4-to-1 multiplexer             : 1
# Xors                                 : 57
 1-bit xor2                            : 54
 1-bit xor7                            : 3
=====================================================================
Advanced HDL Synthesis Report
Macro Statistics
# RAMs                                 : 3
 16x7-bit dual-port distributed RAM    : 3
# Adders/Subtractors                   : 7
 32-bit adder                          : 1
 5-bit adder                           : 6
# Counters                             : 14
 32-bit up counter                     : 8
 4-bit up counter                      : 6
# Registers                            : 240
 Flip-Flops                            : 240
# Latches                              : 1
 7-bit latch                           : 1
# Comparators                          : 6
 3-bit comparator equal                : 3
 5-bit comparator equal                : 3
# Multiplexers                         : 1
 32-bit 4-to-1 multiplexer             : 1
# Xors                                 : 57
 1-bit xor2                            : 54
 1-bit xor7                            : 3
=====================================================================
Final Register Report
Macro Statistics
# Registers                            : 44
 Flip-Flops                            : 44
=====================================================================

Device utilization summary:

Selected Device : 3s100etq144-4
 Number of Slices:               34 out of    960    3%
 Number of Slice Flip Flops:     44 out of   1920    2%
 Number of 4 input LUTs:         65 out of   1920    3%
 Number of IOs:                   8
 Number of bonded IOBs:           5 out of    108    4%
 Number of GCLKs:                 1 out of     24    4%

Minimum period: 6.536ns (Maximum Frequency: 152.994MHz)
Minimum input arrival time before clock: 4.630ns
Maximum output required time after clock: 4.283ns
Maximum combinational path delay: No path found

RTL SCHEMATIC


CHAPTER 11
CONCLUSION
This project introduces a method to design an asynchronous FIFO based on an
FPGA and uses the asynchronous FIFO technique to implement a multi-channel
UART controller within the FPGA. The controller can be used to implement
communication in a complex system whose sub-controllers run at different baud rates.
It can be used to reduce time delays between the sub-controllers of a complex control
system and so improve the synchronization of each sub-controller. The controller is
reconfigurable and scalable.
In serial communication between a computer and different devices, the UART
plays a major role in data transmission. Here the computer works at one
particular frequency and the devices connected to it work at different frequencies.
To pass data between these asynchronous domains, we use a UART
containing a FIFO.
The FIFO bridges the asynchronous clock domains, whereas the UART converts
serial data to parallel and parallel data to serial. We conclude that using a FIFO
in the UART is very advantageous compared with a normal UART.


CHAPTER 12
REFERENCES
1) FIFO Architecture, Functions, and Applications by Peter Forstner, Texas Instruments
2) Simulation and Synthesis Techniques for Asynchronous FIFO Design by Clifford E.
Cummings
3) Synchronous FIFO 5.0, Xilinx
4) http://en.wikipedia.org/wiki/First-in,_first-out
5) http://www.xilinx.com/support/documentation/ipmeminterfacestorelement_fifo_synchfifo.htm
6) http://semiconductors.globalspec.com/Industrial-Directory/synchronous_fifo
