0% found this document useful (0 votes)
42 views

FFTF With Verilog

Hardware implementation of an FFT
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views

FFTF With Verilog

Hardware implementation of an FFT
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/322447765

FFT Final Written Project

Working Paper · January 2018


DOI: 10.13140/RG.2.2.12473.67682

CITATIONS READS
0 6,440

1 author:

Brad Shelton
University of Utah
2 PUBLICATIONS 0 CITATIONS

SEE PROFILE

All content following this page was uploaded by Brad Shelton on 12 January 2018.

The user has requested enhancement of the downloaded file.


The Fast Fourier Transform – a Verilog
Hardware Implementation
Bradford D. Shelton, Student, Electrical and Computer Engineering, University of Utah.

 Euler’s Formula:
Abstract— The ability to digitally process analog waveforms has
made up a significant portion of electrical and electronics 𝑒 𝑗𝑥 = cos(𝑥) + 𝑗𝑠𝑖𝑛(𝑥). (3)
engineering for years. Over the years, many methods have been
developed to process signals. Yet despite the various techniques
available, the first step of synthesis almost always requires some Although Eq. (2) is in a form that is computable for hardware,
form of Fourier analysis. In hardware, the Fast Fourier it requires a massive amount of transistors. The Fourier series
Transform (FFT) uses the least amount of space to transform demonstrates the scale of calculations required to approximate
sampled time data. The FFT uses a method known as Decimation sine and cosine below:
in Time (DIT) to divide large multiple-word additions into smaller
single-word additions. This article will provide an analysis of the ∞ ∞
Verilog code, layout, routed circuits required for an 8-bit FFT. 𝑎0 𝜋𝑛𝑡 𝜋𝑛𝑡
𝑓(𝑡) ≈ ( ) + ∑ 𝑎𝑛 cos ( ) + ∑ 𝑏𝑛 sin ( ), (4)
Examination of these analyses will allow a conclusion to be made 2 𝐿 𝐿
as to which method(s) best implement the FFT in hardware. 𝑛 𝑛
𝐿
𝜋𝑛𝑥
𝑎𝑛 = ∫ 𝑓(𝑥) cos ( ) 𝑑𝑥, 𝑛 ≤ 1, (5)
Index Terms—Analog to Digital Converter, Butterfly Radix-2, −𝐿 𝐿
Complex Plane, Decimation in Time, Discrete Fourier Transform, 𝐿
𝜋𝑛𝑥
Fast Fourier Transform, Read-Only Memory, Shift Register, 𝑏𝑛 = ∫ 𝑓(𝑥) sin ( ) 𝑑𝑥, 𝑛 ≤ 1, (6)
Verilog, Very Large-Scale Integration. −𝐿 𝐿
1 𝐿
𝑎0 = ( ) ∫ 𝑓(𝑥) 𝑑𝑥, (7)
𝐿 −𝐿
I. INTRODUCTION
given that 𝐿 is half the period of 𝑓(𝑥) and −𝐿 ≤ 𝑥 ≤ 𝐿. Not
O VER the past few decades, the FFT has transformed the
industry of signal processing by evolving communication
over satellite networks, acoustic engineering, and digital filters.
only must the equation itself be computed, the coefficients have
to be calculated as well. To circumvent this problem, a much
Many voice-activated products, RF controllers, and audio simpler algorithm was developed i.e. the split-radix FFT. In
synthesizers rely on the ability to convert time data to the 1965, J. W. Cooley and John Tukey dramatically reduced the
frequency domain. The FFT is derived from the Discrete amount of calculations of the original Fourier Transform –
Fourier Transform (DFT). The DFT takes a sequence of digital known as the Cooley-Tukey FFT [2]. This method uses a
samples (bits) and converts them to the frequency domain for Decimation in Time (DIT) method that divides and calculates
further analysis. Shown below is the equation for the DFT: the even and odd-indexed bits separately. This DIT is done
recursively until there is only one even or odd index of the data
𝑁−1 added to another even or odd index [3]. The DIT equation is
𝑋𝑘 [𝑛] = ∑ 𝑥𝑛 𝑒 −𝑗2𝜋𝑘𝑛 (1) shown below:
𝑛
𝑁 𝑁
−1 𝑗2𝜋𝑘𝑛 −1 𝑗2𝜋𝑘𝑛
2 − 2 −
where 𝑋𝑘 [𝑛] is the resultant data from the transformed complex 𝑁

𝑗2𝜋𝑘𝑛 𝑁
𝑋𝑘 [𝑛] = ∑ 𝑥2𝑛 𝑒 2 +𝑒 𝑁 ∗ ∑ 𝑥2𝑛+1 𝑒 2 . (8)
input 𝑥𝑛 function, n represents discrete time, and 𝑘 is the
𝑛 𝑛
sampling frequency [1]. This equation is also represented in its
computable form as follows: This ‘divide and conquer’ algorithm reduces the number of
𝑁
𝑁−1 complex multiplications from (𝑁 − 1)2 to ( ) 𝑙𝑜𝑔2 𝑁 and the
2𝜋𝑘𝑛 2𝜋𝑘𝑛 2
𝑋𝑘 [𝑛] = ∑ 𝑥𝑛 [cos ( ) − 𝑗 sin ( )] (2) number of complex additions from (𝑁 2 − 𝑁) to 𝑁𝑙𝑜𝑔2 𝑁 – 𝑁
𝑁 𝑁
𝑛 is the number of bits. As seen above, this algorithm also adds a
multiplication to the odd index of the DFT equation: e^(-
where the DFT’s complex coefficients are more explicit using

B. D. Shelton wrote this research article while an undergraduate student with


the Electrical and Computer Engineering Department, University of Utah, Salt
Lake City, UT 84112 (email: brad.shelton@utah.edu)
j2𝜋kn)/N. This multiplier is referred to as a twiddle factor. reversed so proper addition takes place.
Twiddle factors are roots in the complex plane that must be
multiplied by the odd indices in order to retain imaginary values II. MATERIALS AND METHODS
in the frequency domain. For example, when using a 12-bit The overall scope of this project originally was to take 8-bit
DIT, the twiddle factors are found by taking the 12th root of the data (specifically a guitar signal) from an off-chip ADC,
complex plane as seen in Fig. 1: convert the ADC data to the frequency domain with the FFT,
compute the magnitude of the data, and output the varying
magnitude in real-time to a display which would be installed on
the front of a tube combo amplifier so one can see the
frequencies being strummed in real-time. After some research,
the scope of the project was extended to allow different types
of ADCs to input data, compute the FFT on any type of analog
signal, output the frequency data to be used for another external
circuit while still outputting the frequency magnitude data to a
display. The FFT also could be scaled if desired. A block
diagram of the original scope of this project is shown below in
Fig. 3:

Fig. 1. 12th root twiddle factors in complex plane [4].

These twiddle factors are usually represented as a 𝑊0 , 𝑊1 , 𝑊2 ,


etc. Because the scope of this project only studied the
performance of an 8-bit FFT, only the 𝑊81 , 𝑊82 , 𝑎𝑛𝑑 𝑊83
twiddle factors were necessary from the algorithm.
In order for this DIT equation to be implemented in
hardware, data must first be parallelized so that each of the eight
bits can be acquired simultaneously for addition. The simplest
to understand this is by way of the Butterfly Radix-2. The
Butterfly Radix-2 is an upgraded version of the Cooley-Tukey
DIT algorithm that uses the least amount of computation to Fig. 3. Project Block Diagram [7].
transform time data into frequency [5]. It is shown below in Fig.
2 and appears like a butterfly: A. Implementation in Hardware
To implement this 8-bit FFT, much research was done on
analog-to-digital converters (ADCs). Several ADCs were
available, but specifically an 8-bit parallel out ADC was of
interest. With 8-bits of data output in parallel, there would be
no need for a serial-in parallel-out (SIPO) shift register. This
would reduce any lag due to a slow internally generated clock
compared to a fast ADC clock. Thus, the ADC output could be
directly wired to the Butterfly Radix-2. However, to meet the
complexity standards for this project, both the parallel-in
parallel-out (PIPO) synchronizer and SIPO architectures would
be designed and made selectable with a 2-1 multiplexer. This
was ideal because it allowed the FFT chip to be used with both
types of ADCs. Also, for this chip there will be two main
outputs: freq_data and display_data. Once the Radix-2 has
transformed the ADC time data, it will be made available to
external circuitry with the freq_data output (general use FFT).
Fig. 2. 12th root twiddle factors in complex plane [6].
The freq_data output could then be passed into an external
digital filter circuit that performs equalization. It also could be
As seen above, the addition is divided until only single-word sent into a digital delay or reverb circuit or both coupled with a
indices are added together. Rising arrows refer to an addition digital-to-analog (DAC) converter and output to a loudspeaker.
while falling arrows subtract. In between stages of addition a For a frequency spectrum to be output to a display, the
‘W’ (twiddle factor) appears that needs to be multiplied by the magnitude of the freq_data will be further computed. Based on
odd indices. Also, the bit indices of the data must usually be the amplitude of this magnitude data, an 8-bit output from read-
only memory (ROM) will be assigned to display_data. This the SIPO was simulated separately from the PIPO and gave
ROM assignment will allow any analog signal’s magnitude to slower timing as expected. These results are documented in
be output to a basic LED display. For simplicity, the APPENDIX A.
magnitudes were computed by dividing the real and imaginary
portions of the frequency data by a set number of bits to IV. CONCLUSION
normalize the frequency data rather than use a complicated From the resultant data above, it is clear that pre-defining
square root function or lookup table (LUT) [see APPENDIX each stage of the Butterfly Radix-2 made writing Verilog code
A]. The display could have color-coded LEDs that follow the much easier; although it took quite a bit of time to do the math.
ROY-G-BIV visible light spectrum for lower or higher This also minimized the circuitry needed to compute the FFT.
frequencies. At times, there was interest to add an on-chip ADC or DAC, but
Much of the FFT computation was compressed by pre- the complexity of the shift registers, FFT, and ROM was
determining the outputs of each stage in the butterfly with hand sufficient enough for one person. The square root function was
calculations [see APPENDIX A]. This allowed the FFT Verilog unnecessarily difficult to understand. There were several online
code to be extremely minimal and thus, very scalable (8-bit tutorials of how to code this math operation, but it never gave
compared to 16 or 32-bit FFT). Verilog code for the FFT that the correct output, so it was decided to just normalize the
describes the behavior of the inputs and outputs can also be seen frequency data with a modulo operator or divide a certain
in APPENDIX A. number of bits from each magnitude.
B. Design and Fabrication From extensive research and analysis of the mathematical
equations, DIT Radix-2 techniques, and the parallel ADC input,
For design and fabrication of the FFT chip, the University of
this chip has been optimized to deliver the fastest computable
Utah’s license of Cadence Virtuoso, Synopsys, and Innovus
and scalable FFT for the scope of this class. The parallel ADC
were used to simulate and test FFT Verilog files. First, a library
input allowed the chip to do almost real-time computations and
(Library_1_Final.lib) was developed which included schematic
the math could be completely combinatorial.
diagrams, behavioral Verilog files, and layout for various logic
gates that could be connected into more interesting circuits.
V. REFERENCES
This library consisted of inverters, an asynchronous clear D
Flip-Flip, buffers, an and-or-invert and or-and-invert, filler
cells, nand and nor gates, a 2-1 mux, tie-high and tie-low cells, [1] Signals and Systems, 2nd ed. Simon Haykin, Barry Van Veen, 2017, pp.
a two input xor gate, and various drive strengths for each logic 128-156.
gate. The library also had capacitive parasitics, leakage power, [2] R. Shirbhate, T. Panse and C. Ralekar, "Design of parallel FFT
the rise and fall times of inputs and outputs, and area of layout architecture using Cooley Tukey algorithm", 2015 International
for each logic gate as well as other properties necessary for Conference on Communications and Signal Processing (ICCSP), pp. 574
simulation [see APPENDIX B]. Once this .lib file was - 578, 2015.
finalized, a behavioral Verilog file (fft.v) was created along [3] Signals and Systems, 2nd ed. Simon Haykin, Barry Van Veen, 2017, pp.
with a testbench to simulate the complex addition and 128-156.
multiplication [see APPENDIX A – ‘Testing’]. [4] Dr. Joel B. Harley, “Assistant Professor, Electrical & Computer
Once FFT.v was working as expected, Synopsys Design Engineering, University of Utah", MEB, 2017.
Compiler was used to synthesize FFT.v into a structural Verilog [5] Dr. Joel B. Harley, “Assistant Professor, Electrical & Computer
file (FFT_struct.v). Once generated, this structural Verilog file Engineering, University of Utah", MEB, 2017.
was imported into Innovus. Inside Innovus, the gnd! and vdd! [6] Dr. Joel B. Harley, “Assistant Professor, Electrical & Computer
nets were created with various metal layers. Then, the structural Engineering, University of Utah", MEB, 2017.
Verilog file was connected to the supply voltage circuitry using [7] Bradford D. Shelton, “Student, Electrical & Computer Engineering,
the place and route function. Once the FFT circuit was University of Utah”, Project Block Diagram.
successfully routed without any errors, pad rings were
generated to connect the core FFT circuitry to the outer input Bradford D. Shelton was born in
and output pads. This was done using the various drive strength Murray, Utah, on August 30, 1990.
gates of the .lib file for proper chip fabrication. This also He is expected to graduate from the
allowed the chip to be used with other external circuits. University of Utah’s Electrical and
Computer Engineering program
with a bachelor’s of science in
III. RESULTS
electrical engineering by the Spring
From simulation of FFT.v was coded to compute only a few of 2018. He will then pursue a
computations with two 1-bit inputs. It was clear that the code Master’s of Business from the
worked well and was extremely small. Iterations through larger University of Utah or Western Governor’s University and a
additions of the bits remained faithful to the expected outcome Sound Engineering Associates Degree from Salt Lake
of the DIT equation. For the PIPO data, a clock wasn’t Community College or Utah Valley University. Brad has
necessary, which let the timing run as fast as the gates always been influenced by entrepreneurship and music, which
themselves could add, subtract, and multiply. For carefulness, persuaded him to major in electrical engineering and business
4

since acoustical engineering was unavailable at the time. He engineering firms when necessary. Brad has had several
sings, plays guitar and piano, and writes and records many internships prior to running his own business. He worked as a
different genres of music in his recording studio. His main systems engineering Co-Op at L3 Technologies, an acoustic
interest in electrical engineering is guitar amplifier design and engineer for Constantine Technologies, LLC, a
the digital signal processing that is used in sound engine DAWs network/systems intern for the Center for High-Performance
such as Logic Pro or FL Studio. Computing, and an electrical engineering intern at Blackrock
Brad currently works for himself as the owner of WirelessiM Microsystems. In all of these internships he has gained and
LLC – an electrical engineering company specializing in analog refined his software/hardware design, test, and debugging skills
and digital audio design. He also does various contract work for at an intermediate level.

APPENDIX A
DOCUMENTATION
I. COMPLEXITY

As mentioned previously, since I originally only thought to output the magnitude of frequency data to a display using parallel

ADC input, the scope was limited and simple. However, by adding a multiplexer that selects a SIPO or PIPO input, the core can

be used with several different ADCs to implement a real-time display or slightly delayed output to the display using eight or 16-

bit shift registers. Also, by adding the second output freq_data, the output can now go to a DAC, which increases the complexity

further if a PISO or PIPO shift register were used on the output side as well. The square root function could be more complex.

For sake of time to turn this in I couldn’t finish that function, but if used with a lookup table, the data wouldn’t require too much

ROM space on the chip and would provide more accurate results. I intend to finish this function by fabrication time. Other than

that, the paper above already mentioned the complexities of the FFT.

II. DECISIONS DECISIONS

At first, I had wanted to make my own ADC on chip, but after hearing the extensive amount of time that would take for a group

of four, let alone one person, I decided to pass. This was a VERY GOOD IDEA. ADCs are inexpensive and come in many forms

from other manufacturers. The next thought I had was to connect to an off-chip ADC providing serial input data. This idea was

useful, but I didn’t like having to wait and batch my signals with our 10MHz clock to get the data. To be honest I’m not sure if I

would visibly notice a difference whether the clock was 10MHz or 100MHz with my output magnitude data, but I figured parallel

data would be insanely faster. This is when I researched ADCs and found that they have some with parallel outputs. Because this

bypassed my clocked shift register I worried whether or not this was a good idea. My flip-flop wasn’t ever used this way. I didn’t

think you would approve of that, so I added a batch shift register that would capture serial data from ADCs of that kind as well as

a synchronizer circuit, which I haven’t implemented yet, that synchronizes the parallel ADC input to my FFT core in case the ADC
5

clock is way too fast and shoves more bits than my core can compute at one time on the wire. I don’t expect this to be an issue

because all my math is combinatorial, but it’s possible I suppose.

One of the best decisions I made was to reduce the complexity of the additions by completely removing the twiddle factors. The

math doesn’t actually require them at the hardware level. I’ll demonstrate this below in Fig. 1:

Fig. 1. Butterfly Radix-2 (twiddle factors circled in red).

From the Radix-2 above you can see that after the first two stages of addition you need to multiply some of the additions by

twiddle factors. The four three twiddle factors used are:

𝑊81 = 0.707 − 𝑗0.707; 𝑊82 = −𝑗; 𝑊83 = −0.707 − 𝑗0.707 (1)

The entire addition of the butterfly is explicitly calculated below where the hardware Verilog code is on the left and the actual

real world theoretical sum is on the right. If the actual sum is blank, this means that the hardware sum is exactly the same as the

actual sum. Also, if a sum is highlighted in red, that means hardware didn’t have to compute it due to cancellation of terms or

substitution tricks:

HARDWARE SUMS ≡ ACTUAL SUMS

First Phase of Butterfly Addition


6

S0 <= adc[0] + adc[4]; //

S1 <= adc[0] - adc[4]; // S1 = adc[0] + (-1*adc[4])

S2 <= adc[2] + adc[6]; //

S3 <= adc[6] - adc[2]; // S3 = -j*(adc[2] + (-1*adc[6]))

S4 <= adc[1] + adc[5]; //

S5 <= adc[1] - adc[5]; // S5 = adc[1] + (-1*adc[5])

S6 <= adc[3] + adc[7]; //

S7 <= adc[7] - adc[3]; // S7 = -j*(adc[3] + (-1*adc[7]))

2nd Phase of Butterfly Addition

T0 <= S0 + S2; // T0 = S0 + S2 = adc[0] + adc[4] + adc[2] + adc[6]

T1 <= S1 - S3; // T1 = S1 + S3 = adc[0] + (-1*adc[4]) + (-j*(adc[2] + (-1*adc[6])))

T2 <= S0 - S2; // T2 = S0 + (-1*S2) = adc[0] + adc[4] + (adc[2] + adc[6])*-1

T3 <= S1 + S3; // T3 = S1 + (-1*S3) = adc[0] + (-1*adc[4]) + (-j*(adc[2] + (-1*adc[6])))*-1

T4 <= S4 + S6; // T4 = S4 + S6 = adc[1] + adc[5] + adc[3] + adc[7]

T5 <= S5 - S7; // T5 = (S5 + S7)*(0.707-j0.707) = adc[1] + (-1*adc[5]) + (-j*(adc[3] + (-1*adc[7]))*(0.707-j0.707)

T6 <= S6 - S4; // T6 = S4 + (-1*S6) = adc[1] + adc[5] + (adc[3] + adc[7])*-1

T7 <= S5 + S7; // T7 = -1*(S5 + S7)*(-0.707-j0.707) = adc[1] + (-1*adc[5]) + (-j*(adc[3] + (-1*adc[7]))*(-0.707-j0.707)*-1

3rd Phase of Butterfly Addition

U0 <= T0 + T4; // U0 = T0 + T4 = S0 + S2 + S4 + S6 = adc[0] + adc[4] + adc[2] + adc[6] + adc[1] + adc[5] + adc[3] + adc[7]

U1 <= T1 + T5 // U1 = T1 + T5 = S1 + S3 + (S5 + S7)*(0.707-j0.707) = adc[0] + (-1*adc[4]) + (-j*(adc[2] + (-1*adc[6]))) +

adc[1] + (-1*adc[5]) + (-j*(adc[3] + (-1*adc[7]))*(0.707-j0.707)

U2 <= T2 + T6 // U2 = T2 + T6 = S0 + (-1*S2) + S4 + (-1*S6) = adc[0] + adc[4] + (adc[2] + adc[6])*-1 + adc[1] + adc[5] +

(adc[3] + adc[7])*-1

U3 <= T3 + T7 // U3 = T3 + T7 = S1 + (-1*S3) + -1*(S5 + S7)*(-0.707-j0.707) = adc[0] + (-1*adc[4]) + (-j*(adc[2] + (-

1*adc[6])))*-1 + adc[1] + (-1*adc[5]) + (-j*(adc[3] + (-1*adc[7]))*(-0.707-j0.707)*-1

U4 <= T0 + T4; // U4 = T0 + (-1*T4) = S0 + S2 + (S4 + S6)*-1 = adc[0] + adc[4] + adc[2] + adc[6] + (adc[1] + adc[5] + adc[3]

+ adc[7])*-1

U5 <= T1 + T5; // U5 = T1 + (-1*T5) = S1 + S3 + ((S5 + S7)*(0.707-j0.707))*-1 = adc[0] + (-1*adc[4]) + (-j*(adc[2] + (-

1*adc[6]))) + (adc[1] + (-1*adc[5]) + (-j*(adc[3] + (-1*adc[7]))*(0.707-j0.707))*-1


7

U6 <= T2 + T6; // U6 = T2 + (-1*T6) = S0 + (-1*S2) + (S4 + (-1*S6))*-1 = adc[0] + adc[4] + (adc[2] + adc[6])*-1 + (adc[1] +

adc[5] + (adc[3] + adc[7])*-1)*-1

U7 <= T3 + T7; // U7 = T3 + (-1*T7) = S1 + (-1*S3) + (-1*(S5 + S7)*(-0.707-j0.707))*-1 = adc[0] + (-1*adc[4]) + (-j*(adc[2]

+ (-1*adc[6])))*-1 + (adc[1] + (-1*adc[5]) + (-j*(adc[3] + (-1*adc[7]))*(-0.707-j0.707)*-1)*-1.

You can see that there is a terrible amount of math involved even for eight bits. But, as shown below in Figs. 2-5, there is a great

deal of substituting and cancelling due to double-negatives and imaginary numbers getting squared:

Fig. 2. Butterfly Radix-2 calculations.


8

Fig. 3. Butterfly Radix-2 calculations continued.


9

Fig. 4. Butterfly Radix-2 calculations continued.


10

Fig. 5. Butterfly Radix-2 calculations continued.


11

From the figures above, you may have been able to notice that double negatives cancel, j^2 causes subtractions instead of additions,

and that the 0.707 portion of the twiddle factor is squared in several places and then doubled which leaves a coefficient of

approximately 1. Then you can combine portions of the sums that are real valued separate from the sums that have an imaginary

coefficient like so:

r0 <= T0 + T4;

i0 <= 0;

r1 <= S1 - S7;

i1 <= -S3 - S5;

r2 <= T2 - T6;

i2 <= 0;

r3 <= S1 - S7;

i3 <= S3 + S5;

r4 <= T0 - T4;

i4 <= 0;

r5 <= S1 + S7;

i5 <= -S3 + S5;

r6 <= T2 + T6;

i6 <= 0;

r7 <= S1 + S7;

i7 <= S3 - S5;

It is interesting to note that due to cancellation, some of the T sums aren’t even used and many of imaginary bit indices end of

summing to zero.

The next idea I had was to add another output freq_data so that my FFT was general purpose. This was probably the best decision

I made so the chip can be coupled with several other external circuits. Also, I decided to make the square root function a normalizing

function instead of the actual function. I don’t like this decision, but in order to see results from my simulations I had to do it. I

plan to fix this before sending my core to MOSIS. The next thing I decided was to make my Verilog scalable. This was the second

most important decision because I could get better precision and handle more data in my core at one time depending how big my

core got with each scaling of 2 (8, 16, 32-bits, etc.). Right now, with 8-bits my code circuit is extremely small. I can make it bigger

for sure due to how I pre-calculated the butterfly radix-2 stages. I also think it would be really cool to put a super simple reverb
12

loop on my input ADC data just to see what it sounds like. I also didn’t realize that I could have reg signed input or reg signed

output which made simulation terrible until I realized how to properly code that syntax. That was a lifesaver.

III. QUALITY AND COMPLETION

To this point – taking into account that it’s just me and I’ve come this far since we met two days ago – I have tested the 8-bit

PIPO FFT using NC-Verilog and gotten accurate results. This code forms a perfectly normal fft_struct.v file in Synopsys. The 8-

bit SIPO is also tested and is working, but my counter isn’t working for whatever reason. It looks exactly like the other groups’

FFT code. Since the SIPO is just a shift register, the radix-2 still works without a counter and just computes the FFT every clock

cycle instead of every eight cycles – pushing the MSB off every positive edge of the clock.

I’ve been able to open Innovus with a generated Verilog structural file and it places my nets and my butterfly addition. This is

how I knew my circuit was able to be scalable to a higher bit resolution. Unfortunately, I haven’t been able to test my ROM,

which is a big deal I know, but I don’t think it will be too difficult. Otherwise over the break all I need to do is get my pad rings

and drive strengths connected once I get my ROM synthesized properly and then we’re in the clear for fabrication.

IV. TESTING

To simulate my Verilog code, I started out really simple. I just wanted my two-input radix-2 to work. It simulated with flying

colors. Then I went on to the 8-bit FFT. The code and simulation is shown below with no problems:

module FFT (input clk, clr,


input signed [7:0] adc,
output reg signed [15:0] display_data,
output reg signed [7:0] S0, S1, S2, S3, S4, S5, S6, S7,
output reg signed [7:0] T0, T1, T2, T3, T4, T6,
output reg signed [15:0] r0, r1, r2, r3, r4, r5, r6, r7,
output reg signed [15:0] i0, i1, i2, i3, i4, i5, i6, i7,
output reg signed [15:0] freq_data,
output reg [15:0] mag_data);

always @ (*)
begin
S0 <= adc[0] + adc[4];
S1 <= adc[0] - adc[4];
S2 <= adc[2] + adc[6];
S3 <= adc[6] - adc[2];
S4 <= adc[1] + adc[5];
S5 <= adc[1] - adc[5];
S6 <= adc[3] + adc[7];
S7 <= adc[7] - adc[3];

T0 <= S0 + S2;
T2 <= S0 - S2;
T4 <= S4 + S6;
T6 <= S6 - S4;

r0 <= T0 + T4;
13

i0 <= 0;
r1 <= S1 - S7;
i1 <= -S3 - S5;
r2 <= T2 - T6;
i2 <= 0;
r3 <= S1 - S7;
i3 <= S3 + S5;
r4 <= T0 - T4;
i4 <= 0;
r5 <= S1 + S7;
i5 <= -S3 + S5;
r6 <= T2 + T6;
i6 <= 0;
r7 <= S1 + S7;
i7 <= S3 - S5;

freq_data_real <= {r7,r6,r5,r4,r3,r2,r1,r0};

freq_data_imag <= {i7,i6,i5,i4,i3,i2,i1,i0};

mag_data <= ((freq_data_real*freq_data_real) + (freq_data_imag*freq_data_imag));

display_data <= mag_data;

end

endmodule

********TESTBENCH**********

// Verilog stimulus file.


// Please do not create a module in this file.

// Default verilog stimulus.

initial
begin

adc[7:0] = 8'b00000000;

for(adc = 0; adc < 256; adc = adc + 1)


begin
#5;
End

for(adc = 0; adc > -128; adc = adc - 1)


begin
#5;
end

end
14

Fig. 2. Simulation of 8-bit Radix-2 PIPO without clock.

I also simulated the SIPO shift register and got functional results – which will be fixed shortly – using this Verilog test bench

code:

module FFT (input clk, clr,


input signed adc0,
output reg signed [15:0] display_data,
output reg signed [7:0] a0,
output reg signed [15:0] freq_data
output reg [15:0] mag_data,
);

reg [2:0] counter;

always @ (posedge clk)


begin
if(clr)
begin
counter <= 0;
end
else
begin
counter <= counter + 1;

a0[0] <= adc0; // SHIFT REGISTER


a0[1] <= a0[0];
a0[2] <= a0[1];
15

a0[3] <= a0[2];


a0[4] <= a0[3];
a0[5] <= a0[4];
a0[6] <= a0[5];
a0[7] <= a0[6];
end

always @ (a0)
begin
--DO ALL THE MATH --
end
endmodule

********TESTBENCH**********

// Verilog stimulus file.


// Please do not create a module in this file.

// Default verilog stimulus.

initial
begin

adc0 = 1'b0;
clk = 1'b0;
clr = 1'b1;
#2
clr = 1'b0;

end
always #2 clk=~clk;
always #4 adc0 = ~adc0;

Fig. 3. Simulation of 8-bit Radix-2 PIPO without clock.


16

The output display_data keeps a consistent value because my input (adc) is giving it the same value once all eight slots of the

batch shift register are filled. Next shown is the symbol schematic for my circuit:

Fig. 4. FFT Schematic Symbol.

Thus, you can see the eight parallel ADC inputs, the one serial input, and the two 15-bit outputs display_data and freq_data. I

will test this schematic in Spectre this weekend for analog verification. To verify my chip before it is fabricated, I will use the

Xilinx FPGA board with specified fake ADC values and light up the 8 LEDs on the board in correspondence with the magnitude

amplitudes. To actually test the chip when it is fabricated, I intend to buy an 8x8 bit LED display, with my guitar connected to an

ADC that is wired to the chip.


17

APPENDIX B
LIBRARY DOCUMENTATION
AOI2X1: Fig. 9. AOI2X2 Schematic.

Fig. 8. AOI2X1 Schematic.


Fig. 7. AOI2X1 Schematic.
AOI2X2:

Fig. 10. AOI2X2 Layout.


18

BUF4X: BUF8X:

Fig. 11. BUF4X Schematic. Fig. 13. BUF8X Schematic.

Fig. 12. BUF4X Layout. Fig. 14. BUF8X Layout.


19

DFFX1:

Fig. 15. DFFX1 Schematic.

Fig. 16. DFFX1 Layout. Fig. 18. INVX1 Layout.

INVX1: FILL1: FILL4: FILL8:

Fig. 17. INVX1 Schematic. Figs. 19-21. FILL1, FILL4, FILL8 Layout.
20

INVX4: INVX8:

Fig. 24. INVX8 Schematic.

Fig. 22. INVX4 Schematic.

Fig. 23. INVX4 Layout. Fig. 25. INVX8 Layout.


21

MUX2X1: NAND2X1:

Fig. 26. MUX2X1 Schematic.

Fig. 28. NAND2X1 Schematic.

Fig. 27. MUX2X1 Layout. Fig. 29. NAND2X1 Layout.


22

NANDX2: NAND2X4:

Fig. 32. NAND2X4 Schematic.

Fig. 30. NANDX2 Schematic.

Fig. 31. NANDX2 Layout. Fig. 33 NAND2X4 Layout.


23

NAND3X1: NOR2X1:

Fig. 34. NAND3X1 Schematic.

Fig. 36. NOR2X1 Schematic.

Fig. 35. NAND3X1 Layout. Fig. 37. NOR2X1 Layout.


24

NOR2X4: NOR3X1:

Fig. 40. NOR3X1 Schematic.

Fig. 38. NOR2X4 Schematic.

Fig. 39. NOR2X4 Layout. Fig. 41. NOR3X1 Layout.


25

NORX2: OAI2X1:

Fig. 44. OAI2X1 Schematic.


Fig. 42. NORX2 Schematic.

Fig. 43. NORX2 Layout. Fig. 45. OAI2X1 Layout.


26

OAI2X2: TIEHI:

Fig. 48. TIEHI Schematic.

Fig. 46. OAI2X2 Schematic.

Fig. 47. OAI2X2 Layout.


Fig. 49. TIEHI layout.
27

TIEHI4X: TIELO:

Fig. 50. TIEHI4X Schematic.

Fig. 52. TIELO Schematic.

Fig. 53. TIELO Layout.


Fig. 51. TIEHI4X Layout.
28

TIELO4X: XOR2X1:

Fig. 56. XOR2X1 Schematic.

Fig. 54. TIELO4X Schematic.

Fig. 55. TIELO4X Layout. Fig. 57. XOR2X1 Layout.


29

Enabled Inverter: Transmission Gate:

Fig. 58. Enabled Inverter Schematic.

Fig. 60. Transmission gate Schematic.

Fig. 59. Enabled Inverter Layout. Fig. 61. Transmission gate Layout.
30

Cell Descriptions: 0 0 1
0 1 1
 General 1 0 1
o The D Flip Flop (DFFX1) is rising edge triggered, 1 1 0
master slave flip flop with asynchronous active-low
clear o NAND2X4
Inputs Output
o Any cell that has X2, X4, X8, or 4X in its name has a A B F
0 0 1
drive strength multiple of the X1 version i.e. INV8 has
0 1 1
eight times the logical drive strength than that of the 1 0 1
INVX1. The areas also are generally double. 1 1 0

o To find behavioral Verilog files for each cell please o NAND2X8


refer to the “Verilog” folder in the zip file Inputs Output
Final_Project.zip A B F
0 0 1
0 1 1
 Boolean Tables 1 0 1
o INVX1 1 1 0
Input Output
In Out o NORX2
0 1 Inputs Output
1 0 A B F
0 0 1
o INVX4 0 1 0
Input Output 1 0 0
In Out 1 1 1
0 1
1 0 o NOR2X1
Inputs Output
o INVX8 A B F
Input Output 0 0 1
In Out 0 1 0
0 1 1 0 0
1 0 1 1 1

o BUF4X o NOR2X4
Input Output Inputs Output
In Out A B F
0 0 0 0 1
1 1 0 1 0
1 0 0
o BUF8X 1 1 1
Input output
In Out o NOR2X8
0 0 Inputs Output
1 1 A B F
0 0 1
o NANDX2 0 1 0
Inputs Output 1 0 0
A B F 1 1 1
0 0 1
0 1 1 o TIEHI
1 0 1 Output
1 1 0 F
1
o NAND2X1
Inputs Output o TIEHI4X
A B F Output
0 0 1 F
0 1 1 1
1 0 1
1 1 0 o TIELO
Output
o NAND2X1 F
Inputs Output 1
A B F
31

o TIELO4X o OAI2X2
Output Inputs Output
F A B C F
1 0 0 0 1
0 0 1 1
o NAND3X1 0 1 0 1
Inputs Output 0 1 1 0
A B C F 1 0 0 0
0 0 0 1 1 0 1 0
0 0 1 1 1 1 0 0
0 1 0 1 1 1 1 0
0 1 1 1
1 0 0 1 o MUX2X1
1 0 1 1 Inputs Output
1 1 0 1 S P0 P1 F
1 1 1 0 0 0 0 0
o NOR3X1 0 0 1 1
Inputs Output 1 1 0 1
A B C F 1 1 1 1
0 0 0 1
0 0 1 0 o XOR2X1
0 1 0 0 Inputs Output
0 1 1 1 A B F
1 0 0 0 0 0 0
1 0 1 1 0 1 1
1 1 0 1 1 0 1
1 1 1 1 1 1 0

o AOI2X1 o Enabled Inverter


Inputs Output Inputs Output
A B C F In Enable Out
0 0 0 0 0 0 High-Z
0 0 1 0 1 0 High-Z
0 1 0 0 0 1 1
0 1 1 1 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
o Transmission gate
Inputs Output
1 1 1 1
In CLK Out
X 0 High-Z
o AOI2X2 X 0 High-Z
Inputs Output 0 1 0
A B C F 1 1 1
0 0 0 0
0 0 1 0
0 1 0 0  Timing, Area, Parasitics:
0 1 1 1
1 0 0 1 Please refer to Lib6710_1_Final.lib for timing, capacitance,
1 0 1 1 area, and other characteristics
1 1 0 1
1 1 1 1

o OAI2X1
Inputs Output
A B C F
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 0
1 1 1 0

View publication stats

You might also like