0% found this document useful (0 votes)
36 views4 pages

High Speed, Low Complexity, Folded, Polymorphic Wavelet Architecture Using Reconfigurable Hardware

The document discusses a high-speed, low-complexity polymorphic wavelet architecture for reconfigurable hardware. It uses a multiplier-free implementation of the 5/3 and 9/7 wavelet filters through Lagrange half-band filtering to approximate filter coefficients rationally. This allows expressing the 9/7 filter in terms of the 5/3 filter, reducing the number of adders. The architecture can dynamically switch between 5/3 and 9/7 filters for applications like JPEG 2000 image compression. A scheduling algorithm is also used to improve processing speed by reducing inter-task communication and data dependencies.
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views4 pages

High Speed, Low Complexity, Folded, Polymorphic Wavelet Architecture Using Reconfigurable Hardware

The document discusses a high-speed, low-complexity polymorphic wavelet architecture for reconfigurable hardware. It uses a multiplier-free implementation of the 5/3 and 9/7 wavelet filters through Lagrange half-band filtering to approximate filter coefficients rationally. This allows expressing the 9/7 filter in terms of the 5/3 filter, reducing the number of adders. The architecture can dynamically switch between 5/3 and 9/7 filters for applications like JPEG 2000 image compression. A scheduling algorithm is also used to improve processing speed by reducing inter-task communication and data dependencies.
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

International Journal of Computer Applications (0975 – 8887)

Volume 2 – No.5, June 2010

High Speed, Low complexity, Folded, Polymorphic


Wavelet Architecture using Reconfigurable Hardware
R.Lavanya Saranya B
VLSI signal processing research group VLSI signal processing research group
Amrita school of engineering ,Ettimadai Amrita school of engineering ,Ettimadai
Coimbatore Coimbatore

ABSTRACT computed using adders butterflies. In cascaded implementation[8]


The main aim of this paper is to design and implement a high speed, uses only adders and shifters for calculating DWT. Lifting scheme
low complexity and polymorphic architecture for reconfigurable [4] require long critical path. Flipping requires multipliers for DWT
folded wavelet filters. 5/3 wavelet results are incorporated into the computation.
9/7 data path which reduces the number of adders compared to other
solutions and also allows on the fly switching between the filters. Many DWT approaches uses multiplier free implementation
The proposed work is to improve the speed of this reconfigurable [7],[8],[6] but all these approaches need more hardware than the
architecture. This is accomplished by scheduling. A weight based proposed work. The improved lifting and reconfigured 5/3 and 9/7
scheduling algorithm has been used in this paper. This is an analysis wavelet filters [5], overcome the problem of traditional lifting scheme
method to improve inter task communication as well as data but it involves more number of multipliers which means more
dependencies among tasks which will reduce the overall hardware than multiplier free implementation. The direct filter bank
communication overhead and processing time. approach is used in [3] and [1]. By implementing multipliers in terms
of adders and shifters, the complexity of the architecture can be
I. INTRODUCTION reduced. Multipliers are replaced with adders and shifters by
The Discrete Wavelet Transform is used for image compression in representing coefficients in canonical sign digit form. With
many signal processing applications. DWT can be implemented using polymorphic wavelet architecture further reduction in hardware is
filter bank structures. Polymorphic DWT architecture enables the achieved by expressing 9/7 filter in terms of 5/3 filter, which reduce
dynamic control of allocated resources to get high performance the number of adders in the architecture.5/3 filter implementation is
output. Polymorphism is the ability of the architecture to adapt its common to both 5/3 and 9/7 implementation on fly switch is used to
hardware usage to meet the desired dynamic requirements. The select these filters.
resultant hardware has high throughput and high image
reconstruction quality, and low power consumption. This is mainly
used in real time image and multimedia processing application such
as video surveillance and telemedicine. Most implementations
approximate the irrational coefficients to rational values, leading to
image reconstruction quality tradeoffs. To overcome this Lagrange
Half Band Filter [2] which can be used for the computation of
rational coefficients. This method of calculation of rational filter Fig.1. Filter bank structure of DWT
coefficients are more accurate.
The 9/7 filter has irrational filter coefficients and requires more
Biorthogonal filters are used for implementing polymorphic DWT.
hardware. So quantizing the filter coefficients results in minimum
In biorthogonal filter bank all the filters are finite-impulse response
hardware but leads to poor performance. In this work approximation
(FIR) filters. Biorthogonal wavelets have symmetric filter
of filter coefficients are done using a Lagrange half band
coefficients and they have orthogonality property. Symmetric
filter(LHFB). This is an efficient method that preserves
filters have linear phase property which
biorthogonality and perfect reconstruction. LHBF convert irrational
prevents the phase distortion at the filter output. The main
values to fractional value which can be easily expressed in hardware.
difference between traditional filter bank design and wavelet filter
design is the regularity of the wavelet filter design. Regularity can be
achieved by preserving zeros at the aliasing frequency (z=-1). 2. THE POLYMORPHIC ARCHITECTURE
The main aim of this work is to achieve a high speed, accurate, low
The implementation of CDF-9/7 over fixed point arithmetic leads to complexity, reconfigurable hardware. To obtain a low complexity
quantization error and requires large amounts of hardware resources hardware multiplier free implementation is used. Reconfigurable
for perfect image reconstruction. Multiplier free implementation hardware reduces number of adders in the circuit. It is expressing 9/7
results in reduced complexity without reducing the accuracy. The two filter in terms of 5/3. The number of adders are further reduced by
most common standard filters for DWT are Le Gall’s 5/3 filter and folding the reconfigurable architecture.
the Daubechies 9/7 filter. Using these two filters poly-DWT filter
achieves high image compression and power efficient 2.1 Lagrange Half Band Filter
Lagrange filter is used for generating rational coefficients of 9/7
implementation. Reconfiguring 5/3 and 9/7 filters reduces the
filter. The rational values for 9/7 filter are computed by analyzing
hardware cost. A typical application where this reconfigurable
architecture finds use is the free JPEG 2000 software codec (Jasper). filter bank structure given in figure 1. The analysis and synthesis low
The intra frame coding is done using 9/7 filter and differential frame pass filters are denoted by F0 and H0 respectively. The analysis and
coding is done using 5/3 filter. In Distributed arithmetic [7] DWT is synthesis high-pass filters are denoted by F1 and H1 respectively and

1
International Journal of Computer Applications (0975 – 8887)
Volume 2 – No.5, June 2010
are obtained by quadrature mirroring the low-pass filters. The initial 2.2. Reconfigurable hardware
step for calculation of rationalized 9/7 coefficients is the analysis of
When 9/7 filter coefficient are expressed in terms of 5/3 filter
filter bank structure [2].
coefficients number of adders are reduced. The hardware complexity
(z)=Z (-z) 1) is again reduced.
(z) = (-z) (2)
D(Z)= (Z)= (Z) (3) Table 3. 5/3 filter coefficients
B+3C+ A+3B +3C =0 (4a) I H(i) F(i)
3A+3B+C+3 +3A +B =0 (4b) 2 -1/8 0
3+A+ =0 (4c) 1 2/8 -1/2
The variables A,B,C in the simultaneous equations are solved to get
the rational 9/7 coefficients. The parameter (free factor) can be 0 6/8 1
varied between -1.5 and -2. The parameters A ,B and C depend By comparing table3 with table 2 and 1 9/7 filter coefficients can be
on value of . Setting = -1.6848 will give back the original “9/7” expressed in terms of 5/3 filter coefficients, which is given by the
filter pair. When =-2 , the coefficients of the “9/7” can be following equations.
expressed as sum of power of two (SPT) terms and the filter can be
implemented without any multipliers. Low9/7=low5/3(i)-w(0)/32+w(4)/64 (5)
High9/7=high5/3(i)/2-w(1)+w(3)/32 (6)
Table 1. Low pass coefficients The 5/3 filter is implemented such that this portion (5/3) is common
to 5/3 and 9/7 filter. Switches are used to select between 5/3 and 9/7
filter. In this implementation multipliers are removed by expressing
coefficients in SPT form. Another select line is used to select high
-3/2 [5,5-4,29,46,29,-4,-5,5]/96 pass and low pass filter. The reconfigurable folded architecture is
shown in figure 4. The input signal is an image and it is given serially
-5/3 [9,-6,-24,86,190,86,-24,-6,9]/320 to the architecture.

-8/5 [5,-4,-9,40,80,40,-9,-4,5]/144 2.4. Folded design


The folding transformation used in DSP architectures allows
-2 [1,0,-8,16,46,16,-8,0,1]/64 multiple operations like additions to be time multiplexed to a single
functional unit . By executing multiple operations on a single
TABLE 2. LOW PASS COEFFICIENTS functional unit, the number of functional units in the implementation
is reduced. The fig. 3 is the folded structure of fig.2 .

-3/2 [5,5-4,29,46,29,-4,-5,5]/96

-5/3 [9,-6,-24,86,190,86,-24,-6,9]/320

-8/5 [5,-4,-9,40,80,40,-9,-4,5]/144

-2 [1,0,-8,16,46,16,-8,0,1]/64
Fig. 2. Unfolded structure

The main technique used is to allow some degree of freedom in


choosing the coefficients by freeing some zeros of the LHBF. Table
1 contain low pass filter coefficients and table 2 contain high pass
filter coefficients of 9/7 filter.

2.1. Canonical Signed Digit


The rational filter coefficients are expressed as Sum or difference
of Powers of Two(SPT) for a multiplier less implementation. This
multiplier free implementation reduces the complexity of the Fig. 3. Folded adder structure
hardware. The total number of SPT used must be kept minimum in
order to achieve a low complexity hardware without reducing the
compression performance . There is a subset of SPT representation
3. PROPOSED WORK
called canonical signed digit (CSD) form. In CSD no two adjacent 3.1. Scheduling
bits are nonzero. It is an optimal representation in which nonzero Scheduling is used to improve the speed of the circuit. In this
digits are required to represent a given number. The main advantage proposed work weight based scheduling algorithm is used. The
of canonical representation is that it uses two third of as many output of the 5/3 filter is given to the input of the 9/7 filter. So a
nonzero digits as a binary representation, thus it reduce the number long delay occur at the output. To improve the speed of the circuit,
of adders required in hardware. schedule the computational units(adders and shifters) in the circuit. It
should be scheduled based on the availability of inputs and

2
International Journal of Computer Applications (0975 – 8887)
Volume 2 – No.5, June 2010
weight(delay) of computational units. Scheduled circuit requires less visited, the algorithm targets the next level and so on till all nodes
time to compute the output of DWT than the unscheduled circuit. are sorted . The edge within the DFG represent the data flow. In this
way, the architecture is scheduled and reduce the overall latency.
Many well known scheduling algorithms like weight based
scheduling, resource scheduling, priority scheduling etc. Among
these weight based scheduling is chosen for the work due to many
reasons. It is simple, when compared to other scheduling Algorithm for weight based scheduling
algorithms like priority scheduling algorithm which requires
assigning priority to the nodes of the circuit. Weight based Input :There are n task{1,2,...n} each task has a weight wi and W is
scheduling is sufficient for simple circuits such that reconfigurable the upper bound, a data flow graph is drawn with n nodes.
architecture considered in this work. Thus it eliminates the need the Output :Set of scheduled nodes S.
need for prioritizing the node, which is critical only for complicated Initialize the graph.
circuits. Resource based scheduling is oriented towards optimal use Let Ο denote the set of unscheduled nodes.
of resources and hence will increase the delay. Taking into account While Ο is empty
the delay constraint and simplicity, weight based scheduling do
algorithm is optimal and then it has been implemented in this work. Find the lightest node in the top level of DFG ,let it be x. then
If wx < W then
The delay of the various computational elements in the architecture Remove x from the Ο set and include it in the S set.
are computed and are assigned as weight of the nodes in the DFG. Repeat ,until Ο is empty
Then these nodes are sorted based on their weights, and are visited in
an increasing order. Once all the nodes in the current level have been

Fig.4. Reconfigurable folded hardware

3
International Journal of Computer Applications (0975 – 8887)
Volume 2 – No.5, June 2010
4. SIMULATION RESULT Table 6. Comparison of unfolded vs folded hardware

The image input is given to the architecture as 8 bit binary values. LUT Power Adders
For 1D DWT first row filtering is done. Then the output of 1D DWT
is given as the input for the next stage. The same architecture is used Unfolded 186 49.55mW 12
for next stage filtering. In the 2D DWT the image input is fed column reconfigurable
wise to the hardware. hardware
Verilog simulation of the proposed architecture is carried out in Folded 172 40.17mW 9
Model sim SE 6.5. The Model sim results are verified with reconfigurable
MATLAB output to ensure the functionality of the circuit. The hardware
timing information is obtained from Quartus II. Using that delay
The table 6 is the comparison between folded and unfolded
information C code is developed for scheduling. The first step is to
architectures. The folded design requires less number of LUTs,
convert the behavioural description into the data flow graph (DFG).
adders and logic slices than unfolded architecture.
The DFG represents the reconfigurable architecture and the
arithmetic operations are represented by the nodes of the DFG. The 6. CONCLUSION
next step is to schedule the operations depending on the data
availability. Overall speed of the circuit is increased by scheduling. The proposed architecture is a very efficient for fast computation of
DWT. The speed of the architecture is improved by scheduling. This
5.EXPERIMENTAL RESULTS high speed circuit has only few adders and shifters thus reducing the
complexity to a great extent. The future work includes dynamic
The proposed architecture requires less number of adders and no
power reduction technique for reducing the power consumption of
multipliers than previous works given in table4.
switched modules.
Table 4. Comparison of hardware complexity of
proposed architecture with existing works 7. REFERENCES
[1] M. Martina and G. Masera, “Multiplierless, folded 9/7 5/3
Proposed [5] [7] [8] [6] [1] wavelet VLSI architecture,” IEEE Transactions on Circuits and
work Systems II, vol. 54, no. 9, pp. 770–774, Sept. 2007.
[2] D. Tay, “Rationalizing the coefficients of popular biorthogonal
wavelet filters,”IEEE Transactions on Circuits and Systems for
Adders 9 4 43 32 21 19 Video Technology,, vol. 10, no. 6,pp. 998–1005, Sept. 2000.
[3] A. Pande and J. Zambreno, “Design andanalysis of efficient
Multipliers 0 3 0 0 0 0 reconfigurable wavelet filters,” in IEEE International
Conference on Electro/ Information Technology
[4] K. Kotteri, S. Barua, A. Bell, and J. Carletta, “A comparison of
Table 5. Comparison of speed of scheduled vs unscheduled hardware implementations of the biorthogonal 9/7 DWT:
hardware convolution versus lifting,” IEEETransactions on Circuits and
Systems II, vol. 52, no. 5, pp. 256–260, May 2005.
Low High High Low [5] The improved lifting scheme and novel reconfigurable VLSI
architecture for the 5/3 and 9/7 wavelet filtersIEEE Int.
pass pass pass pass Conf.Acoustics, Speech, Signal Processing, vol. 5, May 2004,
5/3 5/3 9/7 9/7 pp. 730–732.
[6] M. Martina and G. Masera, “Low-complexity, efficient 9/7
wavelet filters implementation,”in Proceedings of the Intl.
Scheduled 10μs 13μs 13μs 16 μs Conference on Image Processing (ICIP), Sept. 2005.
Unscheduled 15μs 15μs 28μs 28μs [7] M. Alam, C. Rahman, W. Badawy, and G. Jullien, “Efficient
distributed arithmetic based dwt architecture for multimedia
applications,” in Proceedings of the Intl. Workshop on SoC for
Real Time Applications, 2003, pp. 333–336.
From the table 5 shows the comparison between scheduled and [8] K. A. Kotteri, A. E. Bell, and J. E. Carletta, “Design of
unscheduled architecture. The speed of 9/7 filter has been improved multiplierless.high-performance, wavelet filter banks with image
by scheduling. compression applications,”IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 51, pp. 483–494,Mar. 2004.

You might also like