High Speed, Low Complexity, Folded, Polymorphic Wavelet Architecture Using Reconfigurable Hardware
High Speed, Low Complexity, Folded, Polymorphic Wavelet Architecture Using Reconfigurable Hardware
1
International Journal of Computer Applications (0975 – 8887)
Volume 2 – No.5, June 2010
are obtained by quadrature mirroring the low-pass filters. The initial 2.2. Reconfigurable hardware
step for calculation of rationalized 9/7 coefficients is the analysis of
When 9/7 filter coefficient are expressed in terms of 5/3 filter
filter bank structure [2].
coefficients number of adders are reduced. The hardware complexity
(z)=Z (-z) 1) is again reduced.
(z) = (-z) (2)
D(Z)= (Z)= (Z) (3) Table 3. 5/3 filter coefficients
B+3C+ A+3B +3C =0 (4a) I H(i) F(i)
3A+3B+C+3 +3A +B =0 (4b) 2 -1/8 0
3+A+ =0 (4c) 1 2/8 -1/2
The variables A,B,C in the simultaneous equations are solved to get
the rational 9/7 coefficients. The parameter (free factor) can be 0 6/8 1
varied between -1.5 and -2. The parameters A ,B and C depend By comparing table3 with table 2 and 1 9/7 filter coefficients can be
on value of . Setting = -1.6848 will give back the original “9/7” expressed in terms of 5/3 filter coefficients, which is given by the
filter pair. When =-2 , the coefficients of the “9/7” can be following equations.
expressed as sum of power of two (SPT) terms and the filter can be
implemented without any multipliers. Low9/7=low5/3(i)-w(0)/32+w(4)/64 (5)
High9/7=high5/3(i)/2-w(1)+w(3)/32 (6)
Table 1. Low pass coefficients The 5/3 filter is implemented such that this portion (5/3) is common
to 5/3 and 9/7 filter. Switches are used to select between 5/3 and 9/7
filter. In this implementation multipliers are removed by expressing
coefficients in SPT form. Another select line is used to select high
-3/2 [5,5-4,29,46,29,-4,-5,5]/96 pass and low pass filter. The reconfigurable folded architecture is
shown in figure 4. The input signal is an image and it is given serially
-5/3 [9,-6,-24,86,190,86,-24,-6,9]/320 to the architecture.
-3/2 [5,5-4,29,46,29,-4,-5,5]/96
-5/3 [9,-6,-24,86,190,86,-24,-6,9]/320
-8/5 [5,-4,-9,40,80,40,-9,-4,5]/144
-2 [1,0,-8,16,46,16,-8,0,1]/64
Fig. 2. Unfolded structure
2
International Journal of Computer Applications (0975 – 8887)
Volume 2 – No.5, June 2010
weight(delay) of computational units. Scheduled circuit requires less visited, the algorithm targets the next level and so on till all nodes
time to compute the output of DWT than the unscheduled circuit. are sorted . The edge within the DFG represent the data flow. In this
way, the architecture is scheduled and reduce the overall latency.
Many well known scheduling algorithms like weight based
scheduling, resource scheduling, priority scheduling etc. Among
these weight based scheduling is chosen for the work due to many
reasons. It is simple, when compared to other scheduling Algorithm for weight based scheduling
algorithms like priority scheduling algorithm which requires
assigning priority to the nodes of the circuit. Weight based Input :There are n task{1,2,...n} each task has a weight wi and W is
scheduling is sufficient for simple circuits such that reconfigurable the upper bound, a data flow graph is drawn with n nodes.
architecture considered in this work. Thus it eliminates the need the Output :Set of scheduled nodes S.
need for prioritizing the node, which is critical only for complicated Initialize the graph.
circuits. Resource based scheduling is oriented towards optimal use Let Ο denote the set of unscheduled nodes.
of resources and hence will increase the delay. Taking into account While Ο is empty
the delay constraint and simplicity, weight based scheduling do
algorithm is optimal and then it has been implemented in this work. Find the lightest node in the top level of DFG ,let it be x. then
If wx < W then
The delay of the various computational elements in the architecture Remove x from the Ο set and include it in the S set.
are computed and are assigned as weight of the nodes in the DFG. Repeat ,until Ο is empty
Then these nodes are sorted based on their weights, and are visited in
an increasing order. Once all the nodes in the current level have been
3
International Journal of Computer Applications (0975 – 8887)
Volume 2 – No.5, June 2010
4. SIMULATION RESULT Table 6. Comparison of unfolded vs folded hardware
The image input is given to the architecture as 8 bit binary values. LUT Power Adders
For 1D DWT first row filtering is done. Then the output of 1D DWT
is given as the input for the next stage. The same architecture is used Unfolded 186 49.55mW 12
for next stage filtering. In the 2D DWT the image input is fed column reconfigurable
wise to the hardware. hardware
Verilog simulation of the proposed architecture is carried out in Folded 172 40.17mW 9
Model sim SE 6.5. The Model sim results are verified with reconfigurable
MATLAB output to ensure the functionality of the circuit. The hardware
timing information is obtained from Quartus II. Using that delay
The table 6 is the comparison between folded and unfolded
information C code is developed for scheduling. The first step is to
architectures. The folded design requires less number of LUTs,
convert the behavioural description into the data flow graph (DFG).
adders and logic slices than unfolded architecture.
The DFG represents the reconfigurable architecture and the
arithmetic operations are represented by the nodes of the DFG. The 6. CONCLUSION
next step is to schedule the operations depending on the data
availability. Overall speed of the circuit is increased by scheduling. The proposed architecture is a very efficient for fast computation of
DWT. The speed of the architecture is improved by scheduling. This
5.EXPERIMENTAL RESULTS high speed circuit has only few adders and shifters thus reducing the
complexity to a great extent. The future work includes dynamic
The proposed architecture requires less number of adders and no
power reduction technique for reducing the power consumption of
multipliers than previous works given in table4.
switched modules.
Table 4. Comparison of hardware complexity of
proposed architecture with existing works 7. REFERENCES
[1] M. Martina and G. Masera, “Multiplierless, folded 9/7 5/3
Proposed [5] [7] [8] [6] [1] wavelet VLSI architecture,” IEEE Transactions on Circuits and
work Systems II, vol. 54, no. 9, pp. 770–774, Sept. 2007.
[2] D. Tay, “Rationalizing the coefficients of popular biorthogonal
wavelet filters,”IEEE Transactions on Circuits and Systems for
Adders 9 4 43 32 21 19 Video Technology,, vol. 10, no. 6,pp. 998–1005, Sept. 2000.
[3] A. Pande and J. Zambreno, “Design andanalysis of efficient
Multipliers 0 3 0 0 0 0 reconfigurable wavelet filters,” in IEEE International
Conference on Electro/ Information Technology
[4] K. Kotteri, S. Barua, A. Bell, and J. Carletta, “A comparison of
Table 5. Comparison of speed of scheduled vs unscheduled hardware implementations of the biorthogonal 9/7 DWT:
hardware convolution versus lifting,” IEEETransactions on Circuits and
Systems II, vol. 52, no. 5, pp. 256–260, May 2005.
Low High High Low [5] The improved lifting scheme and novel reconfigurable VLSI
architecture for the 5/3 and 9/7 wavelet filtersIEEE Int.
pass pass pass pass Conf.Acoustics, Speech, Signal Processing, vol. 5, May 2004,
5/3 5/3 9/7 9/7 pp. 730–732.
[6] M. Martina and G. Masera, “Low-complexity, efficient 9/7
wavelet filters implementation,”in Proceedings of the Intl.
Scheduled 10μs 13μs 13μs 16 μs Conference on Image Processing (ICIP), Sept. 2005.
Unscheduled 15μs 15μs 28μs 28μs [7] M. Alam, C. Rahman, W. Badawy, and G. Jullien, “Efficient
distributed arithmetic based dwt architecture for multimedia
applications,” in Proceedings of the Intl. Workshop on SoC for
Real Time Applications, 2003, pp. 333–336.
From the table 5 shows the comparison between scheduled and [8] K. A. Kotteri, A. E. Bell, and J. E. Carletta, “Design of
unscheduled architecture. The speed of 9/7 filter has been improved multiplierless.high-performance, wavelet filter banks with image
by scheduling. compression applications,”IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 51, pp. 483–494,Mar. 2004.