Fast Performance Pipeline Re-Configurable FFT
Fast Performance Pipeline Re-Configurable FFT
Fast Performance Pipeline Re-Configurable FFT
3, May 2019
Abstract—This paper proposes fast performance A reconfigurable computational scheme has attracted
reconfigurable pipeline variable points FFT processor many researchers [2]. Hardware acts as a general
design. This design is proposed for variable N points whose hardware solution for implementing a variety of different
values can be 8, 16, 32, 64, 128, 256 and 512 sample points.
computation within or across application domains which
Hence, the proposed reconfigurable design can be used for
different points OFDM applications rather using different require performing FFT of length N. Implementing each
designs. The proposed implementation design uses Radix-22 FFT on dedicated IP is an overhead in silicon area of the
Common Factor Algorithm (CFA) algorithm and SDF chip. The novel approach is used to design variable
architecture. Radix-22 CFA algorithm reduces the number length FFT processor to utilize the OFDM transmission.
of twiddle factors compared to Radix-4 and Radix-2 and Some of the approaches to design FFT are memory-based,
utilizes Radix-2 butterfly structure whose complexity is very pipeline-based and general-purpose DSP. Memory based
low. SDF architecture uses less memory and utilizes is most area efficient but it needs many computation
multipliers fully compared to others. The frequency of
proposed design is varied with the FFT points N. The
cycles. Pipeline based architecture possesses regularity,
proposed design is synthesized successfully using XST of modularity, local connection and high throughput rate
Xilinx ISE 14.1 and simulated using ModelSim & MATLAB with a lower clock frequency.
tool to verify results. Synthesized results of the proposed All implementations of FFT can be designed with
design for 512 points are 155.9 MHz max frequency, slice different pipelined architectures - SDF (single path delay
used 1795, slices Flip Flops used 1028 and LUTs used 3335 feedback), MDC (multipath delay commutator), and SDC
which are approximately 37% higher, 35%, 19%, 27% (single path delay commutator) architectures. The single
lesser in ratio than convention FFTs. path delay feedback structure is more suitable than
Index Terms—FFT, Radix-22 CFA, complex multiplier, SDF, multipath delay and single path delay commutator
OFDM architecture as: -
1) The SDF architecture is very convenient to
implement the different length FFT.
I. INTRODUCTION 2) The number of the required registers in SDF
The Fast Fourier Transform (FFT) has been identified architecture is lesser than that in MDC and SDC
as a widely used algorithm in orthogonal frequency structures [3].
division multiplexing systems. Therefore, Fast Fourier The controller of proposed FFT design is easier than
transform is a fundamental building block used in DSP the other structures as overall architecture and all
systems, with applications ranging from OFDM based components are controlled by count clock signal only.
Digital Modems, to Ultrasound, CT Image and RADAR This proposed research presents the implementation of
algorithms as shown in Fig. 1. OFDM is a very flexible reconfigurable FFT for variable length N which can be
and efficient modulation technique to transmit data used for 8, 16, 32, 64, 128, 256 and 512 points instead
reliably on high transmission in the wireless and wire using the different processor for different point’s
systems such as ADSL, IEEE802.11a, WPAN, WLAN applications. The radix impacts FFT processor. A small
and DVB-T [1]. FFT Implementation technique is fast radix increases the count of twiddle factors and states
and efficient than other techniques. simple structure of butterfly. A higher radix reduces the
count of twiddle factors and states complex structure of
butterfly. In the proposed processor, Radix-22 uses lesser
twiddle factors compared to Radix-4, and butterfly
structure based on Radix-2 that is the simplest structure.
This paper is structured as follows. Section II describes
Fig. 1. Receiver OFDM transmission block diagram. the proposed reconfigurable algorithm for variable length
FFT. Section III describes the architecture of proposed
Manuscript received April 7, 2018; revised October 25, 2018; reconfigurable Radix-22 FFT processor for different
accepted October 25, 2018. points. Section IV describes the comparison and the
Corresponding author: Manish Bansal (manishbansal2008@
gmail.com). conclusion is given in the last section.
II. PROPOSED RECONFIGURABLE ALGORITHM FOR The common factor algorithm (CFA) form [4]:
VARIABLE LENGTH FFT ( N 4) 1 1 1
N N
The Radix-22 Common Factor Algorithm (CFA) and X k1 2k2 4k3 = x n1 n2 n3 WNnk
n3 0 n2 0 n1 0 2 4
butterfly structure are used to implement the variable
length FFT processor. Thus, the proposed processor is ( N / 4) 1 1 1
N N
designed in such a way that it works for 8, 16, 32, 64, 128, = x n1 n2 n3
512 points FFT calculation. The Proposed processor is n3 0 n2 0 n1 0 2 4
implemented in pipeline single path delay feedback N N
n1 n2 n3 k1 2 k2 4 k3
structure that is more convenient structure and reduces WN 2 4
(4)
the FFT processor's complexity. Each FFT stage consists
of butterfly structure. In this section, Radix-22 common The twiddle factor can be expressed as
factor algorithm, components, and the proposed FFT N N
reconfigurable processor are described for variable length. n1 n2 n3 k1 2 k2 4 k3
WNnk =WN 2 4
The definition of Discrete Fourier Transform (DFT) of (5)
= (1) ( j ) (1)n1k2 WNn3 ( k1 2 k2 )WNn3/k43
n1k1 n2 k1
size N is defined as [4]: -
BU PE BU PE
N 1
X k x n WNnk 0 k N (1) where n1, n2, and n3 are the index terms of the input sample
n 0 n and k1, k2, and k3 are the index terms of the output
where the twiddle WNnk is defined as
factor sample k. (1)n1k1 and (1)n1k2 are butterfly elements.
exp( j 2 nk / N ) and N represents the length of the
( j )n2 k1 and WNn3 ( k1 2 k2 ) are processing elements.
sequence to be transformed. n is time domain ordinal. k is
The detailed structure of the proposed pipeline Re-
frequency domain ordinal. The Radix-22 algorithm is
configurable FFT processor based on Radix-22 for
expressed using a common factor algorithm and 3-
variable length N is shown Fig. 2 that retains the Log2N
dimensional index mapping. The Radix-22 algorithm for
number of stages. Shift register (SR) stores values. For N
N points is presented as [5].
points, the 1st shift register stores N/2 values, the 2nd shift
Applying divide and conquer 3-D liner index mapping:
register stores N/4 values, the 3rd shift register stores N/8
N N values, so on and last shift register stores one value.
n n1 n2 n3 (2) Twiddle generator produces twiddle factors per
2 4
k k1 2k2 4k3 (3) WNn3 ( k1 2 k2 ) based on n3, k1 and k2 values. Swapper
( j )n2 k1 swaps signed real value and imaginary value
where N = 8, 16, 32, 64, 128, 256 and 512.
then invert the sign of swapped imaginary value.
n1 , n2 0,1 and n3 0,1,
, N 22 1 Butterfly structure adds, subtracts and bypasses two input
values. A complex multiplier multiplies twiddle factor
k1 , k2 0,1 and k3 0,1, , N 2 1
2
value with input complex value
Fig. 3 shows a signal flow graph (SFG) of the calculations are based on n1, n2, n3, k1, k2, k3 which are
proposed reconfigurable Radix-22 FFT processor. It is to calculated as per N-points. Similarly, next two stage
be noted that inputs are in normal order and outputs are in calculations are based on n1, n2, n3, k1, k2, k3 which is
permuted (bit-reversed) order of inputs. First two stage calculated as per N/4-points and so on. First stage
performs the operation between the nth and the (n+N/2)th Fig. 5 Trivial (–j) multiplier performs swapping of real
points with butterfly structure, next stage performs the imaginary values and sign inversion of real value. The
operation between the (n+N/2)th and the (n+N/4)th points signal ‘S’ controls this swapping and the 2nd compliment
and so on. This process runs continuously up to last stage invert real value sign. Whenever (–j) multiplication is
log2N which performs the operation between the 1st and
required as per swapper ( j )n2 k1 value, multiplexers
the 2nd point, the 3rd and the 4th point and so on.
Mux1 and Mux2 switch to ‘1’ then real value sign is
inverted and imaginary signed value is swapped to real
and inverted real signed value is swapped to imaginary.
In this way, instead of doing (–j) multiplication,
swapping & sign inversion is done. This approach
enhances the frequency and reduces the logic and
hardware.
can be found at each position for next stages, later PEs at be discussed. Similarly, PEs at each position for others
each position and each stage are shown in Fig. 8 and Fig. point’s FFT can be found using parameters values in
9 for 16 points FFT and 32 Points FFT respectively will Table II.
TABLE I: COMPONENTS USED IN THE PROPOSED RECONFIGURABLE FFT BASED ON VARIABLE LENGTH N
Number of Number
Number Number of Number of values Number of Number of
N twiddle of
of Stages Shift Register Shift Registers store twiddle ROM twiddle factors
generators Swappers
8 3 3 4+2+1 1 1 3 1
16 4 4 8+4+2+1 1 1 9 2
32 5 5 16+8+4+2+1 1 2 21+12 2
64 6 6 32+16+8+4+2+1 1 2 45+36 3
128 7 7 64+32+16+8+4+2+1 1 3 93+84+48 3
256 8 8 128+64+32+16+8+4+2+1 1 3 191+180+144 4
512 9 9 256+128+64+32+16+8+4+2+1 1 4 381+372+336+192 4
TABLE II: PROCESSING ELEMENTS PARAMETERS VALUES FOR THE PROPOSED RECONFIGURABLE FFT
After 1st After 2nd After 4th After 5th After 6th After 8th
PEs After 3rd After 7th
Stage Stage Stage Stage Stage Stage
N
WNn3 ( k12k2 )
Stage ( j )n2k1 WNn3 ( k12k2 ) WNn3 ( k12k2 )
Stage ( j )n2k1 WNn3 ( k12k2 )
( j )n2k1 ( j )n2k1
n1, n2 = 0, 1
k1, k2 = 0, 1
8 No PE No PE No PE No PE No PE No PE
n3, k3 = 0, 1
N =8
n1, n2 = 0, 1 n1, n2 = 0, 1
16 k1, k2 = 0, 1 k1, k2 = 0, 1
No PE No PE No PE No PE No PE
n3, k3 = 0, 1, .., 3 n3, k3 = 0
N =16 N =4
n1, n2 = 0, 1, k1, k2 = 0, 1 n1, n2 = 0, 1, k1, k2 = 0, 1
32 n3, k3 = 0, 1, .., 7 n3, k3 = 0, 1 No PE No PE No PE No PE
N =32 N =8
n1, n2 = 0, 1, n1, n2 = 0, 1 n1, n2 = 0, 1
k1, k2 = 0, 1 k1, k2 = 0, 1 k1, k2 = 0, 1
64 No PE No PE No PE
n3, k3 = 0, 1, .., 15 n3, k3 = 0, 1, .., 3 n 3, k3 = 0
N =64 N =16 N=4
n1, n2 = 0, 1, k1, k2 = 0, 1 n1, n2 = 0, 1, k1, k2 = 0, 1 n1, n2 = 0, 1, k1, k2 = 0, 1
128 n3, k3 = 0, 1,.., 31 n3, k3 = 0, 1, .., 7 n 3, k3 = 0, 1 No PE No PE
N =128 N =32 N=8
n1, n2 = 0, 1, k1, k2 = 0, 1 n1, n2 = 0, 1, k1, k2 = 0, 1 n1, n2 = 0, 1, k1, k2 = 0, 1 n1, n2 = 0, 1, k1, k2 = 0, 1
256 n3, k3 = 0, 1, .., 63 n3, k3 = 0, 1, .., 15 n 3, k3 = 0, 1, .., 3 n 3, k3 = 0 No PE
N =256 N=64 N=16 N=4
n1, n2 = 0, 1 n1, n2 = 0, 1 n1, n2 = 0, 1 n1, n2 = 0, 1
k1, k2 = 0, 1 k1, k2 = 0, 1 k1, k2 = 0, 1 k1, k2 = 0, 1
512
n3, k3 = 0, 1,..,127 n3, k3 = 0, 1, .., 31 n 3, k3 = 0, 1, .., 7 n 3, k3 = 0, 1
N =512 N=128 N=32 N=8
0
After 2nd stage
W512 [((N/4) *3)-3] number
of Twiddles WN values
1
W512 After 4th stage
Twiddle ROM
[((N/16) *3)-3]*4number
of Twiddles WN/4 values
… After 6th stage
W…511 [((N/64) *3)-3]*16number
512
… … of Twiddles WN/16 values
… …
.. ..
… After mth stage
[((N/4^(m/2)) *3)-3]*4^(m/2-1) number
…
of Twiddles WN/4^(m/2-1) values
..
Fig. 6. Twiddle values fetched from twiddle ROM.
The twiddle ROM has 512 twiddle factors values from 6 and Table III. These values are fetched from ROM
0
W 512 to W 511
512 in the proposed processor. Even stages based on symmetric W kN N / 2 W kN and periodicity
have the number of twiddle factors which are generated W kN N / 2 W kN properties. Some of the values are shown
by twiddle generators and the value of these twiddle in Table III. This method is reducing area as even stages
factors are fetched from twiddle ROM as shown in a Fig. use the same values of twiddle ROM for different FFT
points. The twiddle factor W 0N value wherever is reduce hardware and increase the performance of the
required is bypassed rather than multiplying in the proposed design.
proposed processor as W 0N =1. These all techniques
TABLE III: TECHNIQUE TO FIND TWIDDLE FACTORS FROM TWIDDLE ROM
PEs After 2nd Stage After 4th Stage After 6th Stage After 8th Stage
N W n3(k12k 2) W n3(k12k 2) W n3(k12k 2) W nN3(k12k 2)
N N N
W 512 W 8
128 2
W 512 W 8
64 1
8 No PEs as there are only 3 stages in 8 points FFT
W 512 W 8
192 3
64
W 512 W 16
2
16 W 512 W 16
32 1
No PEs as there are only 4 stages in 16 points FFT
96
W 512 W 16
3
W 512 W 32 W 512 W 8
32 2 128 2
W 512 W 32 W 512 W 8
224 14 64 1
32 No PEs as there are only 5 stages in 32 points FFT
W 512 W 32 W 512 W 8
112 7 192 3
W 512 W 64 W 512 W 16
240 30 64 2
W 512 W 64 W 512 W 16
120 15 32 1
64 No PEs as there are only 6 stages in 64 points FFT
W 512 W 16
360 45 96 3
W 512 W 64
W 512 W 128 W 512 W 32 W 512 W 8
240 60 32 2 128 2
flow continuous up to the 31st cycle. Stage 4 additions also performed at the clocks wherever it’s required based
outputs from 16th to 31st cycles are FFT results. on addresses as shown in Fig. 8 and Fig. 9. These
Swapping and twiddle multiplication operations are operations are controlled by counter signal.
Int. Conf. on Signal Processing and Integrated Networks, Feb. Manish Bansal received the B.E. degree in Electronics &
2017, pp. 607-612. Telecommunication and M.Tech. degree. His research interests include
[20] S. He and M. Torkelson, “A new approach to pipeline processor,” fast Fourier transform, signal processing, parallel computing, and VLSI.
in Proc. International Conference on Parallel Processing, April Dr. Sangeeta Nakhate received the Ph.D. Her research interests
1996, pp. 766-770. include fast Fourier transform, signal processing, VLSI and microwave
fields.