My Project
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
CHAPTER 2
IMAGE COMPRESSION
2.1 OVERVIEW
Some of these lossless methods can easily be modified to be lossy. A lossy element fits naturally into 1D/2D run-length search, and logarithmic quantisation may also be inserted to provide better or more effective results.
Run length encoding is a very simple method for the compression of sequential data. It takes advantage of the fact that, in many data streams, consecutive single tokens are often identical. Run length encoding checks the stream for this fact and inserts a special token each time a chain of more than two equal input tokens is found. This special token advises the decoder to insert the following token n times into its output stream. Run length coding is easily implemented, either in software or in hardware; it is fast and easily verified, but its compression ability is very limited.
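To make the idea concrete, the sketch below encodes runs of three or more identical byte values with a (marker, count, value) triple; the marker value and token format are illustrative assumptions, not a prescribed scheme.

def rle_encode(data, marker=256):
    """Run-length encode a sequence of byte values (0-255).
    Runs of 3 or more equal tokens become the triple (marker, run_length, value);
    shorter runs are copied verbatim. The marker 256 lies outside the byte range,
    so it cannot collide with real data."""
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        run = j - i
        if run >= 3:
            out.extend([marker, run, data[i]])
        else:
            out.extend(data[i:j])
        i = j
    return out

def rle_decode(encoded, marker=256):
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == marker:
            out.extend([encoded[i + 2]] * encoded[i + 1])
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out

data = [7, 7, 7, 7, 2, 5, 5, 9, 9, 9]
assert rle_decode(rle_encode(data)) == data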
Area coding is an enhanced form of run length coding, reflecting the two
dimensional character of images. This is a significant advance over the other lossless
methods. For coding an image it does not make too much sense to interpret it as a
sequential stream, as it is in fact an array of sequences, building up a two dimensional
object. Therefore, as the two dimensions are independent and of the same importance, it is
obvious that a coding scheme aware of this has some advantages. The algorithms for area
coding try to find rectangular regions with the same characteristics. These regions are
coded in a descriptive form as an Element with two points and a certain structure. The
whole input image has to be described in this form to allow lossless decoding afterwards.
The achievable performance of this coding method is limited mostly by the very high complexity of the task of finding the largest areas with the same characteristics. Practical implementations use recursive algorithms that reduce the whole area to equal-sized sub-rectangles until each rectangle fulfils the criterion of having the same characteristic for every pixel.
This type of coding can be highly effective, but it has the drawback of being a nonlinear method, which cannot easily be implemented in hardware. Therefore, the performance in terms of compression time is not competitive, although the compression ratio is.
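A minimal sketch of the recursive idea, assuming a plain quadtree-style split into four equal sub-rectangles and taking "same characteristic" to mean that all pixels in a block are identical (practical area coders use more refined criteria):

def area_code(img, x, y, w, h):
    """Describe the rectangle (x, y, w, h) of img as a list of
    (x, y, w, h, value) entries, splitting recursively until every
    pixel inside a block is identical."""
    block = [row[x:x + w] for row in img[y:y + h]]
    first = block[0][0]
    if all(p == first for row in block for p in row):
        return [(x, y, w, h, first)]
    regions = []
    hw, hh = max(w // 2, 1), max(h // 2, 1)
    # split into (up to) four sub-rectangles and recurse on the non-empty ones
    for (sx, sy, sw, sh) in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                             (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]:
        if sw > 0 and sh > 0:
            regions += area_code(img, sx, sy, sw, sh)
    return regions

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(area_code(img, 0, 0, 4, 4))   # a few uniform rectangles instead of 16 pixels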
Each of these operations is in some part responsible for the compression. Image
modelling is aimed at the exploitation of statistical characteristics of the image (i.e. high
correlation, redundancy). Typical examples are transform coding methods, in which the data is represented in a different domain (for example, frequency in the case of the Fourier transform, the Discrete Cosine Transform, the Karhunen-Loève transform, and so on), where a reduced number of coefficients contains most of the original information.
In many cases this first phase does not result in any loss of information.
The aim of quantisation is to reduce the amount of data used to represent the information within the new domain. Quantisation is in most cases not a reversible operation; therefore, it belongs to the so-called 'lossy' methods.
Encoding is usually error free. It optimises the representation of the information (helping,
sometimes, to further reduce the bit rate), and may introduce some error detection codes.
A general transform coding scheme involves subdividing an N x N image into smaller n x n blocks and performing a unitary transform on each sub image. A unitary transform is a reversible linear transform whose kernel describes a set of complete, orthonormal discrete basis functions. The goal of the transform is to decorrelate the original signal, and this
decorrelation generally results in the signal energy being redistributed among only a
small set of transform coefficients. In this way, many coefficients may be discarded after
quantisation and prior to encoding. Also, visually lossless compression can often be
achieved by incorporating the HVS contrast sensitivity function in the quantisation of the
coefficients. A typical transform coding scheme therefore consists of the following steps:
• image subdivision
• image transformation
• coefficient quantisation
• Huffman encoding.
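The sketch below walks one 8x8 block through the first three of these steps, using a hand-built orthonormal DCT-II matrix and a single uniform quantisation step; real coders use perceptually weighted quantisation tables and a full entropy coder, so this is illustrative only.

import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: row k holds the k-th basis function."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

rng = np.random.default_rng(0)
block = rng.integers(100, 120, size=(8, 8)).astype(float)   # a low-contrast 8x8 sub-image

C = dct_matrix(8)
coeffs = C @ block @ C.T              # 2-D unitary transform (decorrelation)
q = 16.0
quantised = np.round(coeffs / q)      # coarse uniform quantisation (the lossy step)
print("non-zero coefficients:", np.count_nonzero(quantised), "of 64")

restored = C.T @ (quantised * q) @ C  # dequantise + inverse transform
print("max pixel error:", np.abs(restored - block).max())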
For a transform coding scheme, logical modelling is done in two steps: a segmentation step, in which the image is subdivided into bidimensional vectors (possibly of different sizes), and a transformation step, in which the chosen transform (e.g. KLT, DCT, Hadamard) is applied.
Quantisation can be performed in several ways. Most classical approaches use 'zonal coding', which consists of scalar quantisation of the coefficients belonging to a predefined area (with a fixed bit allocation), or 'threshold coding', which consists of retaining only the coefficients of each block whose absolute value exceeds a predefined threshold. Another possibility, which leads to higher compression factors, is to apply a vector quantisation scheme to the transformed coefficients.
This algorithm, developed by D. A. Huffman, is based on the fact that in an input stream certain tokens occur more often than others. Based on this knowledge, the algorithm builds a weighted binary tree according to the tokens' rates of occurrence. Each element of this tree is assigned a new code word, where the length of the code word is determined by its position in the tree. Therefore, the most frequent token, which sits closest to the root of the tree, is assigned the shortest code; each less common element is assigned a longer code word, and the least frequent element is assigned a code word which may be considerably longer than the input token.
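A compact sketch of the tree construction using a priority queue; the example string and the tie-breaking rule are arbitrary choices made for illustration. The shortest code words end up on the most frequent tokens, as described above.

import heapq
from collections import Counter

def huffman_code(tokens):
    """Build a Huffman code (token -> bit string) from token frequencies."""
    freq = Counter(tokens)
    # each heap entry: (weight, tie-breaker, {token: code-so-far})
    heap = [(w, i, {t: ""}) for i, (t, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: one distinct token
        return {next(iter(freq)): "0"}
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, i2, c2 = heapq.heappop(heap)
        # merging two subtrees prepends one bit to every code word inside them
        merged = {t: "0" + c for t, c in c1.items()}
        merged.update({t: "1" + c for t, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, i2, merged))
    return heap[0][2]

codes = huffman_code("this is an example of a huffman tree")
for token, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(repr(token), code)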
2.4.3 Vector quantisation
• subdivide the training set into N groups (called 'partitions' or 'Voronoi regions'),
which are associated with the N codebook letters, according to a minimum
distance criterion;
• the centroids of the Voronoi regions become the updated codebook vectors;
• compute the average distortion: if the percent reduction in the distortion (as compared with the previous iteration) is below a certain threshold, then STOP; otherwise, repeat from the first step.
Once the codebook has been designed, the coding process simply consists in the
application of the T operator to the vectors of the original image. In practice, each group
of n pixels will be coded as an address in the vector codebook, that is, as a number from 1
to N.
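A rough sketch of the codebook design loop described above, written as a plain k-means/LBG-style iteration; the initialisation, the squared-error distortion measure and the stopping threshold are assumptions made for illustration.

import numpy as np

def design_codebook(training, N=4, iters=20, tol=1e-3, seed=0):
    """Design an N-entry vector-quantisation codebook by alternating
    nearest-neighbour partitioning and centroid updates."""
    rng = np.random.default_rng(seed)
    codebook = training[rng.choice(len(training), N, replace=False)]
    prev = np.inf
    for _ in range(iters):
        # partition: assign each training vector to its nearest code vector
        d = np.linalg.norm(training[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update: centroids of the Voronoi regions become the new code vectors
        for i in range(N):
            members = training[labels == i]
            if len(members):
                codebook[i] = members.mean(axis=0)
        distortion = (d.min(axis=1) ** 2).mean()
        if prev - distortion < tol * prev:      # stop when improvement is small
            break
        prev = distortion
    return codebook, labels

# 2x2 pixel blocks (n = 4 samples each) drawn from two brightness levels
training = np.vstack([np.full((50, 4), 40.0), np.full((50, 4), 200.0)])
training += np.random.default_rng(1).normal(0, 5, training.shape)
codebook, labels = design_codebook(training, N=2)
print(np.round(codebook))           # each block is then coded as an index 0..N-1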
2.4.4 Segmentation and approximation methods
In recent years, standardisation efforts based on transform coding, such as JPEG, have been started. The performance of such a standard is quite good as long as the compression factor is kept below a given threshold (about 20:1). Above this threshold, artifacts become visible in the reconstruction and a tiling effect seriously degrades the decoded images, due to quantisation of the DCT coefficients.
On the other hand, there are two advantages: first, it is a standard, and second, dedicated
hardware implementations exist. For applications which require higher compression
factors with some minor loss of accuracy when compared with JPEG, different
techniques should be selected such as wavelets coding or spline interpolation, followed
by an efficient entropy encoder such as Huffman, arithmetic coding or vector
quantisation. Some of these coding schemes are suitable for progressive reconstruction
(Pyramidal Wavelet Coding, Two Source Decomposition, etc). This property can be
exploited by applications such as coding of images in a database, for previewing purposes
or for transmission on a limited bandwidth channel.
2.5 WAVELET COMPRESSION
The fundamental idea behind wavelets is to analyze the signal at different scales, i.e. to localize a given signal in both the space and scaling domains. A family of wavelets is generated from a single mother wavelet, which is stretched or compressed to change the size of the analysis window. In this way, big wavelets give an approximate image of the signal, while smaller and smaller wavelets zoom in on details. Wavelets therefore automatically adapt to both the high-frequency and the low-frequency components of a signal through different window sizes. Any small change in the wavelet representation produces a correspondingly small change in the original signal, which means that local mistakes do not influence the entire transform. The wavelet transform is thus well suited to nonstationary signals, such as very brief signals and signals with sharp transitions. Wavelets are functions generated from one single function ψ, called the mother wavelet, by scaling and translation; the wavelet decomposition expresses a signal as a weighted sum of these functions.
In image compression, the sampled data are discrete in time. It is required to have
discrete representation of time and frequency, which is called the discrete wavelet
transform (DWT).
The wavelet transform (WT) is used to analyze non-stationary signals, i.e., signals whose frequency content varies in time. Although the time and frequency resolution problems
are results of a physical phenomenon and exist regardless of the transform used, it is
possible to analyze any signal by using an alternative approach called the multi resolution
analysis (MRA). MRA analyzes the signal at different frequencies with different
resolutions. MRA is basically designed to give good time resolution and poor frequency resolution at high frequencies, and good frequency resolution and poor time resolution at low frequencies. This approach is especially useful when the signal considered has high frequency components for short durations and low frequency components for long durations. The continuous wavelet transform (CWT) of a signal f(t) is defined as

γ(s, τ) = ∫ f(t) ψ*s,τ(t) dt    (1)
where * denotes complex conjugation. From this equation it can be seen how a function f(t) is decomposed into a set of basis functions ψs,τ(t), called the wavelets. The variables s and τ, scale and translation, are the new dimensions after the wavelet transform. The inverse wavelet transform is

f(t) = ∫∫ γ(s, τ) ψs,τ(t) dτ ds    (2)
The wavelets are generated from a single basic wavelet ψ(t), the so-called mother wavelet, by scaling and translation:

ψs,τ(t) = (1/√s) ψ((t − τ)/s)    (3)
In (3), s is the scale factor, τ is the translation factor, and the factor s^(−1/2) is for energy
normalization across the different scales. It is important to note that in (1), (2) and (3) the
wavelet basis functions are not specified. This is a difference between the wavelet
transform and the Fourier transform, or other transforms. The theory of wavelet
transforms deals with the general properties of the wavelets and wavelet transforms only.
It defines a framework within which one can design wavelets to taste and wishes.
2.7 DISCRETE WAVELETS
The wavelet transform as described so far has three properties that make it difficult to use directly in the form of (1). The first is the redundancy of the CWT. In (1) the wavelet transform is calculated by continuously shifting a continuously scalable function over a signal and calculating the correlation between the two. These scaled functions are nowhere near an orthogonal basis, and the obtained wavelet coefficients are therefore highly redundant: the time-bandwidth product of the CWT is the square of that of the signal, and for most applications, which seek a signal description with as few components as possible, this is not efficient.
Even without the redundancy of the CWT, one still has an infinite number of wavelets in the wavelet transform, and one would like to see this number reduced to a more manageable count.
The third problem is that for most functions the wavelet transforms have no analytical solutions, and they can be calculated only numerically or by an optical analog computer. Fast algorithms are needed to exploit the power of the wavelet transform, and it is in fact the existence of these fast algorithms that has put wavelet transforms where they are today. To overcome these problems, discrete wavelets are introduced.
Discrete wavelets are not continuously scalable and translatable but can only be scaled and translated in discrete steps. This is achieved by modifying the wavelet representation (3) to create

ψj,k(t) = (1/√(s0^j)) ψ((t − k·τ0·s0^j)/s0^j)    (4)

In (4), j and k are integers and s0 > 1 is a fixed dilation step; the translation factor τ0 depends on the dilation step. The effect of discretizing the wavelet is that the time-scale space is now sampled at discrete intervals. The dilation step is usually chosen as s0 = 2, so that the sampling of the frequency axis corresponds to dyadic sampling. This is a very natural choice for computers, the human ear and music, for instance. For the translation factor the value τ0 = 1 is usually chosen, so that a dyadic sampling of the time axis is obtained as well.
When discrete wavelets are used to transform a continuous signal, the result is a series of wavelet coefficients, and the process is referred to as the wavelet series decomposition. An important question for such a decomposition scheme is of course that of reconstruction. It is all very well to sample the time-scale joint representation on a dyadic grid, but if it is not possible to reconstruct the signal, the decomposition is of little use. As it turns out, it is indeed possible to reconstruct a signal from its wavelet series decomposition. It has been proven that the necessary and sufficient condition for stable reconstruction is that the energy of the wavelet coefficients must lie between two positive bounds, i.e.

A·||f||² ≤ Σj,k |⟨f, ψj,k⟩|² ≤ B·||f||²    (5)
where ||f||² is the energy of f(t), A > 0, B < ∞, and A, B are independent of f(t). When (5) is satisfied, the family of basis functions ψj,k(t) with j, k ∈ Z is referred to as a frame with frame bounds A and B. When A = B the frame is tight and the discrete wavelets behave exactly like an orthonormal basis. When A ≠ B, exact reconstruction is still possible at the expense of a dual frame: in a dual-frame discrete wavelet transform the decomposition wavelet is different from the reconstruction wavelet.
The last step that has to be taken is making the discrete wavelets orthonormal. This can be done only with discrete wavelets. The discrete wavelets can be made orthogonal to their own dilations and translations by special choices of the mother wavelet, which means:
∫ ψj,k(t) ψ*m,n(t) dt = 1 if j = m and k = n, and 0 otherwise    (6)

An arbitrary signal can then be reconstructed by summing the orthogonal wavelet basis functions, weighted by the wavelet transform coefficients:

f(t) = Σj,k γ(j,k) ψj,k(t)    (7)

Equation (7) is the inverse wavelet transform for discrete wavelets, which had not been given yet. Orthogonality is not essential in the representation of signals: the wavelets need not be orthogonal, and in some applications the redundancy can help to reduce the sensitivity to noise. This is a disadvantage of discrete wavelets: the resulting wavelet transform is no longer shift invariant, which means that the wavelet transforms of a signal and of a time-shifted version of the same signal are not simply shifted versions of each other.
In many practical applications the signal of interest is sampled. In order to use the results achieved so far with a discrete signal, the wavelet transform itself has to be made discrete too. Remember that the discrete wavelets are not time-discrete; only the translation and scale steps are discrete. Simply implementing the wavelet filter bank as a digital filter bank intuitively seems to do the job, but intuition alone is not good enough.
It was stated earlier that the scaling function could be expressed in wavelets from minus infinity up to a certain scale j. If a wavelet spectrum is added to the scaling function spectrum, a new scaling function is obtained, with a spectrum twice as wide as the first. The effect of this addition is that the first scaling function can be expressed in terms of the second, because all the information needed to do so is contained in the second scaling function. This can be expressed formally in the so-called multiresolution formulation or two-scale relation:
φ(2^j t) = Σk hj+1(k) φ(2^(j+1) t − k)    (8)
The two-scale relation states that the scaling function at a certain scale can be
expressed in terms of translated scaling functions at the next smaller scale. Do not get
confused here: smaller scale means more detail. The first scaling function replaced a set
of wavelets and therefore one can also express the wavelets in this set in terms of
translated scaling functions at the next scale. More specifically, for the wavelet at level j this can be written as

ψ(2^j t) = Σk gj+1(k) φ(2^(j+1) t − k)    (9)

which is the two-scale relation between the scaling function and the wavelet.
Since a signal f(t) can be expressed in terms of dilated and translated wavelets up to a scale j−1, this leads to the result that f(t) can also be expressed in terms of dilated and translated scaling functions at scale j:

f(t) = Σk λj(k) φ(2^j t − k)    (10)

To be consistent in the notation, discrete scaling functions are considered from here on, since only discrete dilations and translations are allowed. If in this equation one steps up a scale to j−1, wavelets have to be added in order to keep the same level of detail. The signal f(t) can then be expressed as

f(t) = Σk λj−1(k) φ(2^(j−1) t − k) + Σk γj−1(k) ψ(2^(j−1) t − k)    (11)
If the scaling function φ j, k (t) and the wavelets ψ j, k (t) are orthonormal or a tight
frame, then the coefficients λ j-1(k) and γ j-1(k) are found by taking the inner products
λj−1(k) = ⟨f(t), φj−1,k(t)⟩ and γj−1(k) = ⟨f(t), ψj−1,k(t)⟩    (12)
If φj,k(t) and ψj,k(t) in these inner products are replaced by suitably scaled and translated versions of the two-scale relations (8) and (9), and the expressions are manipulated a bit, keeping in mind that the inner product can also be written as an integration, the result is

λj−1(k) = Σm h(m − 2k) λj(m)    (13)

γj−1(k) = Σm g(m − 2k) λj(m)    (14)
These two equations state that the wavelet- and scaling function coefficients on a
certain scale can be found by calculating a weighted sum of the scaling function
coefficients from the previous scale. Now recall from the section on the scaling function
that the scaling function coefficients came from a low pass filter and recall from the
section on sub band coding how it is iterated a filter bank by repeatedly splitting the low-
pass spectrum into a low-pass and a high-pass part. The filter bank iteration started with
the signal spectrum, so if the signal spectrum is the output of a low-pass filter at the
previous (imaginary) scale, then the sampled signal can be considered as the scaling function coefficients from the previous (imaginary) scale. In other words, the sampled signal f(k) is simply equal to λ(k) at the largest scale.
As known from signal processing theory, a discrete weighted sum is the same as a digital filter, and since the coefficients λj(k) come from the low-pass part of the split signal spectrum, the weighting factors h(k) in (13) must form a low-pass filter. Likewise, since the coefficients γj(k) come from the high-pass part of the split signal spectrum, the weighting factors g(k) in (14) must form a high-pass filter. This means that (13) and (14) together form one stage of an iterated digital filter bank; from now on the coefficients h(k) are referred to as the scaling filter and the coefficients g(k) as the wavelet filter. Implementing the wavelet transform as an iterated digital filter bank is therefore possible, and from now on one can speak of the discrete wavelet transform
or DWT. Our intuition turned out to be correct. Because of this, one is rewarded with a useful bonus property of (13) and (14): the sub sampling property. Taking one last look at these two equations, one sees that the scaling and wavelet filters have a step size of 2 in the variable k. The effect of this is that only every other λj(k) is used in the convolution, with the result that the total output data rate is equal to the input data rate. Although this is not a new idea and has always been exploited in sub band coding schemes, it is nice to obtain it here as part of the deal. The sub sampling property also answers the question, raised at the end of the section on the scaling function, of how to choose the width of the scaling function spectrum. Every time the filter bank is iterated, the number of samples for the next stage is halved, so that in the end one is left with just one sample (in the extreme case). It is clear that this is where the iteration definitely has to stop, and this determines the width of the scaling function spectrum. Normally the iteration stops at the point where the number of samples has become smaller than the length of the scaling filter or the wavelet filter, whichever is the longer; the length of the longest filter thus determines the width of the spectrum of the scaling function.
2.8 WAVELET FILTER
With the redundancy removed, there are still two hurdles to take before the wavelet transform is in a practical form: the infinite number of wavelets needed in the wavelet transform has to be reduced, and the problem of the difficult analytical solutions is set aside for the moment. Even with discrete wavelets, an infinite number of scalings and translations is still needed to calculate the wavelet transform. The easiest way to tackle this problem is simply not to use an infinite number of discrete wavelets. Of course this poses the question of the quality of the transform: is it possible to reduce the number of wavelets used to analyze a signal and still have a useful result?
The translations of the wavelets are naturally limited by the duration of the signal under investigation, so that an upper boundary for the wavelets is given. This leaves the question of dilation: how many scales are needed to analyze the signal? From Fourier theory it is known that compression in time is equivalent to stretching the spectrum and shifting it upwards: a time compression of the wavelet by a factor of 2 will stretch the frequency spectrum of the wavelet by a factor of 2 and also shift all frequency components up by a factor of 2. Using this insight, the finite spectrum of the signal can be covered with the spectra of dilated wavelets in the same way as the signal was covered in the time domain with translated wavelets. To get a good coverage of the signal spectrum, the stretched wavelet spectra should touch each other, as if they were standing hand in hand.
Fig: Touching wavelet spectra resulting from scaling of the mother wavelet in the time domain
Summarizing, if one wavelet can be seen as a band-pass filter, then a series of dilated wavelets can be seen as a band-pass filter bank. If one looks at the ratio between the center frequency of a wavelet spectrum and the width of this spectrum, it is the same for all wavelets. This ratio is normally referred to as the fidelity factor Q of a filter, and in the case of wavelets one therefore speaks of a constant-Q filter bank.
2.9 SCALING FUNCTION
The scaling function was introduced by Mallat. It is sometimes referred to as the averaging filter because of the low-pass nature of the scaling function spectrum. If the scaling function is considered as being just a signal with a low-pass spectrum, it can be decomposed into wavelet components and expressed as

φ(t) = Σj,k γ(j,k) ψj,k(t)    (15)
Since the scaling function φ(t) is selected in such a way that its spectrum neatly fits in the space left open by the wavelets, the expression (15) uses an infinite number of wavelets up to a certain scale j, as shown in figure 2.2. This means that if a signal is analyzed
using the combination of scaling function and wavelets, the scaling function by itself
takes care of the spectrum otherwise covered by all the wavelets up to scale j, while the
rest is done by the wavelets. In this way the number of wavelets is reduced from an infinite to a finite one: the scaling function replaces the infinite set of wavelets below scale j and thus sets a lower bound for the wavelets. Of course, when using a scaling function instead of wavelets, information is lost. That is to say, from a signal representation point of view no information is lost, since it will still be possible
to reconstruct the original signal, but from a wavelet analysis point of view possible
valuable scale information is discarded. The width of the scaling function spectrum is
therefore an important parameter in the wavelet transform design. The shorter its
spectrum, the more wavelet coefficients you will have and the more scale information. But, as always, there are practical limitations on the number of wavelet coefficients you can
handle. Later on, in the discrete wavelet transform this problem is more or less
automatically solved. The low-pass spectrum of the scaling function allows us to state a condition similar to the admissibility condition, namely ∫ φ(t) dt = 1, which shows that the 0th moment of the scaling function cannot vanish.
Summarizing once more, if one wavelet can be seen as a band-pass filter and a scaling function as a low-pass filter, then a series of dilated wavelets together with a scaling function can be seen as a filter bank.
2.10 SUB-BAND ANALYSIS
A time-scale representation of a digital signal is obtained using digital filtering techniques. Recall that the CWT is a correlation between a wavelet at different scales
and the signal with the scale (or the frequency) being used as a measure of similarity. The
continuous wavelet transform was computed by changing the scale of the analysis
window, shifting the window in time, multiplying by the signal, and integrating over all
times. In the discrete case, filters of different cutoff frequencies are used to analyze the
signal at different scales. The signal is passed through a series of high pass filters to
analyze the high frequencies, and it is passed through a series of low pass filters to analyze the low frequencies. The resolution of the signal, which is a measure of the amount of detail information in the signal, is changed by the filtering operations, and the scale is changed
by up sampling and down sampling (sub sampling) operations. Sub sampling a signal
corresponds to reducing the sampling rate, or removing some of the samples of the signal.
For example, sub sampling by two refers to dropping every other sample of the signal.
Sub sampling by a factor n reduces the number of samples in the signal n times.
Up sampling a signal corresponds to adding new samples to the signal. For example, up sampling by two refers to adding a new sample, usually a zero or an interpolated value, between every two samples of the signal. Up sampling by a factor of n increases the number of samples in the signal by a factor of n.
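A short illustration of the two operations (the zero-insertion variant of up sampling is assumed here):

import numpy as np

x = np.arange(8)                 # a short test sequence
down = x[::2]                    # sub sampling by two: keep every other sample
up = np.zeros(2 * len(down), dtype=x.dtype)
up[::2] = down                   # up sampling by two: insert a zero between samples
print(down)   # [0 2 4 6]
print(up)     # [0 0 2 0 4 0 6 0]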
Although it is not the only possible choice, DWT coefficients are usually sampled from the CWT on a dyadic grid, i.e., s0 = 2 and τ0 = 1, yielding s = 2^j and τ = k·2^j. Since the signal is a discrete time function, the terms function and sequence will be used interchangeably in the following discussion; the sequence will be denoted by x[n], where n is an integer.
The procedure starts with passing this signal (sequence) through a half band digital low pass filter with impulse response h[n]. Filtering a signal corresponds to the mathematical operation of convolution of the signal with the impulse response of the filter.
A half band low pass filter removes all frequencies that are above half of the highest frequency in the signal. For example, if a signal has a maximum frequency component of 1000 Hz, then half band low pass filtering removes all the frequencies above 500 Hz. For discrete-time signals the frequency is expressed in radians, and the highest frequency component that can exist in the signal is π radians, provided the signal is sampled at the Nyquist rate (which is twice the maximum frequency that exists in the signal); that is, the Nyquist rate corresponds to π rad/s in the discrete frequency domain. Hz is used here only to make the discussion easier to follow; it should always be remembered that the unit of frequency for discrete time signals is radians.
After passing the signal through a half band low pass filter, half of the samples can be eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π radians. Simply discarding every other sample sub samples the signal by two, and the signal will then have half the number of points. The scale of the signal is now doubled. Note that the low pass filtering removes the high
frequency information, but leaves the scale unchanged. Only the sub sampling process
changes the scale. Resolution, on the other hand, is related to the amount of information
in the signal, and therefore, it is affected by the filtering operations. Half band low pass
filtering removes half of the frequencies, which can be interpreted as losing half of the
information. Therefore, the resolution is halved after the filtering operation. Note,
however, the sub sampling operation after filtering does not affect the resolution, since
removing half of the spectral components from the signal makes half the number of
samples redundant anyway. Half the samples can be discarded without any loss of
information. In summary, the low pass filtering halves the resolution, but leaves the scale
unchanged. The signal is then sub sampled by 2, since half of the number of samples is redundant; this doubles the scale.
Having said that, consider now how the DWT is actually computed. The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into coarse approximation and detail information. DWT employs
two sets of functions, called scaling functions and wavelet functions, which are
associated with low pass and high pass filters, respectively. The decomposition of the
signal into different frequency bands is simply obtained by successive high pass and low
pass filtering of the time domain signal. The original signal x[n] is first passed through a
half band high pass filter g[n] and a low pass filter h[n]. After the filtering, half of the samples can be eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π. The signal can therefore be sub sampled by 2, simply by discarding every other sample. This constitutes one level of decomposition and can be expressed as yhigh[k] = Σn x[n]·g[2k − n] and ylow[k] = Σn x[n]·h[2k − n], where yhigh[k] and ylow[k] are the outputs of the high pass and low pass filters,
respectively, after sub sampling by 2. This decomposition halves the time resolution since
only half the number of samples now characterizes the entire signal. However, this
operation doubles the frequency resolution, since the frequency band of the signal now
spans only half the previous frequency band, effectively reducing the uncertainty in the
frequency by half. The above procedure, which is also known as the sub band coding, can
be repeated for further decomposition. At every level, the filtering and sub sampling result in half the number of samples (and hence half the time resolution) and half the frequency band spanned (and hence double the frequency resolution). Figure 2.3 illustrates this procedure, where x[n] is the original signal to be decomposed, and h[n] and g[n] are low pass and high pass filters, respectively. The bandwidth of the signal is halved at every level.
As an example of the sub band coding algorithm, suppose that the original signal x[n] has 512 sample points, spanning a frequency band of zero to π rad/s. At the first decomposition level, the signal is passed through the high pass and low pass filters, followed by sub sampling by 2. The output of the high pass filter has 256 points (hence half the time resolution), but it only spans the frequencies π/2 to π rad/s (hence double the frequency resolution). These 256 samples constitute the first level of DWT coefficients. The output of the low pass filter also has 256 samples, but it spans the other half of the frequency band, frequencies from 0 to π/2 rad/s. This signal is then passed through the same low pass and high pass filters for further decomposition. The output of the second low pass filter followed by sub sampling has 128 samples spanning a frequency band of 0 to π/4 rad/s, and the output of the second high pass filter followed by sub sampling has 128 samples spanning a frequency band of π/4 to π/2 rad/s. The
second high pass filtered signal constitutes the second level of DWT coefficients. This
signal has half the time resolution, but twice the frequency resolution of the first level
signal. In other words, time resolution has decreased by a factor of 4, and frequency
resolution has increased by a factor of 4 compared to the original signal. The low pass
filter output is then filtered once again for further decomposition. This process continues
until two samples are left. For this specific example there would be 8 levels of
decomposition, each having half the number of samples of the previous level. The DWT
of the original signal is then obtained by concatenating all coefficients starting from the
last level of decomposition (remaining two samples, in this case). The DWT will then have the same number of coefficients as the original signal.
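The decomposition just described can be sketched as an iterated two-channel filter bank; Haar filters are assumed below purely to keep the example short, and the boundary handling of a production DWT is ignored.

import numpy as np

def dwt_subband(x, levels):
    """Iterate the two-channel filter bank: at each level, low-pass and
    high-pass filter the approximation and keep every other output sample."""
    h = np.array([1.0, 1.0]) / np.sqrt(2)     # half-band low-pass (Haar, an assumed choice)
    g = np.array([1.0, -1.0]) / np.sqrt(2)    # half-band high-pass (Haar)
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        low = np.convolve(approx, h)[1::2]    # filter, then sub sample by 2
        high = np.convolve(approx, g)[1::2]
        details.append(high)                  # DWT coefficients of this level
        approx = low                          # feed the low-pass part to the next level
    return approx, details

x = np.sin(2 * np.pi * 5 * np.arange(512) / 512)     # a 512-sample test signal
approx, details = dwt_subband(x, levels=3)
print([len(d) for d in details], len(approx))        # [256, 128, 64] 64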
The frequencies that are most prominent in the original signal will appear as high amplitudes in the region of the DWT signal that includes those particular frequencies. The difference of this transform from the Fourier transform is that the time localization of these frequencies is not lost. However, the time localization has a resolution that depends on the level at which the frequencies appear. If the main information of the signal lies in the high frequencies, as happens most often, the time localization of these frequencies will be more precise, since they are characterized by a larger number of samples. If the main information lies only at very low frequencies, the time localization will not be very precise, since few samples are used to express the signal at these frequencies. This procedure in effect offers good time resolution at high frequencies and good frequency resolution at low frequencies.
Two of the three problems mentioned in the section above have now been resolved, but it is still not known how to calculate the wavelet transform. If the wavelet transform is regarded as a filter bank, then wavelet transforming a signal corresponds to passing the signal through this filter bank. The outputs of the different filter stages are
the wavelet and scaling function transform coefficients. Analyzing a signal by passing it
through a filter bank is not a new idea and has been around for many years under the
name sub band coding. It is used for instance in computer vision applications.
The filter bank needed in sub band coding can be built in several ways. One way
is to build many band pass filters to split the spectrum into frequency bands. The
advantage is that the width of every band can be chosen freely, in such a way that the
spectrum of the signal to analyze is covered in the places where it might be interesting.
The disadvantage is that every filter has to be designed separately, and this can be a time consuming process. Another way is to split the signal spectrum into two (equal) parts, a low-pass and a high-pass part. The high-pass part contains the smallest details of interest, and one could stop here. However, the low-pass part still contains some details, and therefore it can be split again, and again, until a satisfactory number of bands has been created. Usually the number of bands is limited by, for instance, the amount of data or computation power available. The process of splitting the spectrum is shown in figure 2.5. The advantage of this scheme is that only two filters have to be designed, whereas the disadvantage is that the signal spectrum coverage is fixed.
2.11 WAVELET PROPERTIES
Wavelet functions must satisfy the admissibility and regularity conditions, and these are the properties which gave wavelets their name. It can be shown that square integrable functions ψ(t) satisfying the admissibility condition

∫ |Ψ(ω)|² / |ω| dω < +∞    (17)

can be used to first analyze and then reconstruct a signal without loss of information. In (17), Ψ(ω) stands for the Fourier transform of ψ(t). The admissibility condition implies that the Fourier transform of ψ(t) vanishes at the zero frequency, i.e.

|Ψ(ω)|² |ω=0 = 0    (18)
This means that wavelets must have a band-pass like spectrum. This is a very important observation, which is used later on to build an efficient wavelet transform. A zero at the zero frequency also means that the average value of the wavelet in the time domain must be zero, so the wavelet must be oscillatory; in other words, it must be a wave. As can be seen from (1), the wavelet transform of a one-dimensional function is two-dimensional; the time-bandwidth product of the wavelet transform is the square of that of the input signal, and for most practical applications this is not a desirable property. Therefore one imposes some additional conditions on the wavelet functions in order to make the wavelet transform decrease quickly with decreasing scale s. These are the regularity conditions, and they state that the wavelet function should have some smoothness and concentration in both the time and frequency domains. Regularity is a quite complex concept, best explained using the notion of vanishing moments. Expanding the wavelet transform (1) into a Taylor series at t = 0 up to order n (taking τ = 0 for simplicity) gives:

γ(s, 0) = (1/√s) [ Σp=0..n f^(p)(0) ∫ (t^p / p!) ψ(t/s) dt + O(n+1) ]    (20)
Here f^(p) stands for the pth derivative of f and O(n+1) means the rest of the expansion. Now, if the moments of the wavelet are defined as

Mp = ∫ t^p ψ(t) dt    (21)

then the expansion can be rewritten as

γ(s, 0) = (1/√s) [ f(0) M0 s + (f^(1)(0)/1!) M1 s² + ... + (f^(n)(0)/n!) Mn s^(n+1) + O(s^(n+2)) ]    (22)

From the admissibility condition we already have that the 0th moment M0 = 0, so that the first term on the right-hand side of (22) is zero. If one now manages to make the other moments up to Mn zero as well, then the wavelet transform coefficients γ(s, τ) will decay as fast as s^(n+2) for a smooth signal f(t). If a wavelet has N vanishing moments, then the
approximation order of the wavelet transform is also N. The moments do not have to be
exactly zero, a small value is often good enough. In fact, experimental research suggests
that the number of vanishing moments required depends heavily on the application.
Summarizing, the admissibility condition gave us the wave, regularity and vanishing
moments gave us the fast decay or the let, and put together they give us the wavelet.
2.12 MEASURING FREQUENCY CONTENT BY WAVELET
TRANSFORM
The wavelet transform is capable of providing time and frequency information simultaneously. One is often interested in knowing which spectral components exist at a given instant of time, or in knowing the time intervals during which particular spectral components occur; in these cases a time-frequency representation such as the wavelet transform is very beneficial.
Wavelets (small waves) are functions defined over a finite interval and having an average
value of zero. The basic idea of the wavelet transform is to represent any arbitrary
function ƒ(t) as a superposition of a set of such wavelets or basis functions. These basis
functions are obtained from a single wave, by dilations or contractions (scaling) and
translations (shifts). The discrete wavelet transform of a finite length signal x(n) having N components can be expressed as an N x N matrix transform. Among the many possible wavelet transforms, the dyadic orthonormal wavelet transform is the most widely used.
The procedure is as follows. The wavelet transform uses two functions, the "wavelet" and the "scaling function", which between them split the frequency band of the signal in half: they act like a high pass filter and a low pass filter, respectively. Figure 2-6 shows a typical decomposition scheme.
The decomposition of the signal into different frequency bands is simply obtained by
successive high pass and low pass filtering of the time domain signal. This filter pair is
called the analysis filter pair. First, the low pass filter is applied for each row of data,
thereby getting the low frequency components of the row. But since the low pass filter is a half band filter, the output data contains frequencies only in the first half of the original frequency band, so it can be sub sampled by two, meaning that the output data now contains only half the original number of samples. Now, the high pass filter is applied to the same row of data, and the high pass output is similarly sub sampled by two.
This is a non-uniform band splitting method that decomposes the lower frequency part into narrower bands, while the high-pass output at each level is left without any further decomposition. This procedure is done for all rows. Next, the same filtering is done for each column of the row-filtered data, producing four sub bands usually labelled LL (low-low), LH (low-high), HL (high-low) and HH (high-high). The LL band can be decomposed once again in the same manner, thereby producing even more sub bands. This can be done up to any level, thereby producing a pyramidal decomposition.
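One level of this row/column decomposition can be sketched as follows; Haar filters and a simple convolve-then-discard sub sampling are assumptions made only so the example stays short.

import numpy as np

def analyze_2d(img):
    """One level of 2-D decomposition: filter and sub sample every row,
    then every column, giving the LL, LH, HL and HH sub bands."""
    h = np.array([1.0, 1.0]) / np.sqrt(2)          # low-pass (Haar, assumed)
    g = np.array([1.0, -1.0]) / np.sqrt(2)         # high-pass (Haar, assumed)

    def split(rows, filt):
        # apply the half-band filter along each row and keep every other sample
        return np.array([np.convolve(r, filt)[1::2] for r in rows])

    low = split(img, h)                            # row-wise low-pass
    high = split(img, g)                           # row-wise high-pass
    LL = split(low.T, h).T                         # column-wise passes
    LH = split(low.T, g).T
    HL = split(high.T, h).T
    HH = split(high.T, g).T
    return LL, LH, HL, HH

img = np.outer(np.arange(8, dtype=float), np.ones(8))
LL, LH, HL, HH = analyze_2d(img)
print(LL.shape, LH.shape, HL.shape, HH.shape)      # four 4x4 sub bands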
Fig: (a)-(e) Pyramidal decomposition of the 'Barbara' image
The LL band is decomposed thrice as shown in figure 2.7(d). The compression ratio that can be achieved depends on the number of iterations. The LL band at the highest level is the most important, and the other 'detail' bands are of lesser importance, with the degree of importance decreasing from the top of the pyramid downwards. Reconstruction of the image is performed with synthesis filters. The figure below shows the required synthesis or reconstruction filter bank, which reverses the process of the analysis or decomposition filter bank of the forward process. At each iteration, four scale j approximation and detail sub images are up sampled and convolved with two one-dimensional filters, one operating on the sub image's columns and the other on its rows. Addition of the results yields the scale j+1 approximation, and the process is repeated until the original image is reconstructed. The filters used in the synthesis filter bank are matched to the analysis filters so that the original image can be recovered.
Fig: The inverse wavelet transform
The direct and inverse Fourier transforms relate the time-domain and frequency-domain representations of a signal to each other. The direct Fourier transform (or simply the Fourier transform) calculates the frequency-domain representation F(ω) of a time-domain signal f(t), whereas the inverse Fourier transform of F(ω) recovers the time-domain representation from the frequency domain. The Fourier transform of f is therefore a function F{f(t)} of the new variable ω; this function, evaluated at ω, is F(ω).
Wavelet algorithms are recursive: the output of one step of the algorithm becomes the input for the next step. The initial input data set consists of 2^n elements, and each successive step operates on 2^(n-i) elements, where i = 1 ... n-1. For example, if the initial data set contains 128 elements, the wavelet transform will consist of seven steps on 128, 64, 32, 16, 8, 4, and 2 elements.
In the notation used here, step j+1 follows step j. If element i in step j is being updated, the notation is step j,i. The forward lifting scheme wavelet transform divides the data set being processed into an even half and an odd half. In the notation below, even_i is the index of the i-th element in the even half and odd_i is the i-th element in the odd half. Viewed as one continuous array (which is what is done in the software), the even element is a[i] and the corresponding odd element is a[i + (n/2)].
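A one-line illustration of that split, assuming a plain Python list rather than the array layout of any particular implementation:

def split_in_place_order(a):
    """Rearrange a so that the even-indexed samples occupy the first half
    and the odd-indexed samples the second half: after the split, the even
    element is a[i] and its odd partner is a[i + n//2]."""
    return a[0::2] + a[1::2]

a = [10, 11, 20, 21, 30, 31, 40, 41]
print(split_in_place_order(a))   # [10, 20, 30, 40, 11, 21, 31, 41]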
Another way to refer to the recursive steps is by their power of two. This notation is used
in Ripples in mathematics. Here stepj-1 follows stepj, since each wavelet step operates on
a decreasing power of two. This is a nice notation, since the references to the recursive
step in a summation also correspond to the power of two being calculated.
The wavelet Lifting Scheme is a method for decomposing wavelet transforms into a set
of stages. Lifting scheme algorithms have the advantage that they do not require
temporary arrays in the calculation steps, as is necessary for some versions of the wavelet
algorithm. Lossless compression is a compression technique that does not lose any data in
the compression process. Lossless compression "packs data" into a smaller file size by
using a kind of internal shorthand to signify redundant data. If an original file is 1.5MB
(megabytes), lossless compression can reduce it to about half that size, depending on the
type of file being compressed. This makes lossless compression convenient for
transferring files across the Internet, as smaller files transfer faster. Lossless compression
is also handy for storing files as they take up less room. The zip convention, used in
programs like WinZip, uses lossless compression. For this reason zip software is popular
for compressing program and data files. That's because when these files are
decompressed, all bytes must be present to ensure their integrity. If bytes are missing
from a program, it won't run. If bytes are missing from a data file, it will be incomplete
and garbled. GIF image files also use lossless compression. Lossless compression has
advantages and disadvantages.
The advantage is that the compressed file will decompress to an exact duplicate of the
original file, mirroring its quality. The disadvantage is that the compression ratio is not all
that high, precisely because no data is lost. To get a higher compression ratio -- to reduce
a file significantly beyond 50% -- you must use lossy compression. Lossy compression
will strip a file of some of its redundant data. Because of this data loss, only certain
applications are fit for lossy compression, like graphics, audio, and video. Lossy
compression necessarily reduces the quality of the file to arrive at the resulting highly
compressed size, but depending on the need, the loss may be acceptable and even
unnoticeable in some cases. JPEG uses lossy compression, which is why converting a
GIF file to JPEG will reduce it in size. It will also reduce the quality to some extent.
Lossless and lossy compression have become part of our everyday vocabulary, largely due to the popularity of MP3 music files. A standard sound file in WAV format,
converted to a MP3 file will lose much data as MP3 employs a lossy, high-compression
algorithm that tosses much of the data out.
This makes the resulting file much smaller so that several dozen MP3 files can fit, for
example, on a single compact disc, versus a handful of WAV files. However, the sound
quality of the MP3 file will be slightly lower than the original WAV, noticeably so to
some. As always, whether compressing video, graphics or audio, the ideal is to balance
the high quality of lossless compression against the convenience of lossy compression.
Choosing the right lossy convention is a matter of personal choice and good results
depend heavily on the quality of the original file.
CHAPTER 3
WAVELET-LIFTING SCHEME
The lifting scheme is a new method for constructing wavelets. The main difference with classical constructions is that it does not rely on the Fourier transform. In this way, lifting can be used to construct second generation wavelets, i.e., wavelets which are not necessarily translations and dilations of one function; the latter are referred to as first generation wavelets or classical wavelets. Since the lifting scheme does not depend on the Fourier transform, it can also be applied in settings where classical constructions are not available, for example on bounded domains or irregularly sampled data.
Figure: The lifting scheme. The input λj+1,k is split; the odd part is predicted from the even part, giving the wavelet coefficients γj,k; the even part is then updated using the γj,k, giving λj,k.

3.1.1 Split phase
The original data set {λ0,k} is first split into a smaller set {λ-1,k}, where negative indices are used because the convention is that the smaller the data set, the smaller the index. The information that is lost in this step should also be kept track of; in other words, one must know which extra information is needed to recover the original set {λ0,k} from the set {λ-1,k}. Coefficients {γ-1,k} are used to encode this difference, and they are referred to as wavelet
coefficients. Many different choices are possible and, depending on the statistical
behavior of the signal, one will be better than the other. Better means smaller wavelet
coefficients. The most naive trivial choice would be to say that the lost information is
simply contained in the odd coefficients, γ-1,k = λ0,2k+1 for k Є Z. This choice corresponds
to the Lazy wavelet. Indeed, not much is done except for sub sampling the signal in even
and odd samples. Obviously this will not decorrelate the signal. The wavelet coefficients
are only small when the odd samples are small, and there is no reason whatsoever why
this should be the case. No restriction should be imposed on how the data set should be
split, nor on the relative size of each of the subsets. Simply a procedure is needed to join
{λ-1,k} and {γ-1,k} again into the original data set {λ0,k}. The easiest possibility for the split is a simple, brutal cut of the data set into two disjoint parts, but a split between even and odd samples (the Lazy wavelet) is a better choice. The next step, the predict, will help to find a more elaborate scheme to recover the original samples {λ0,k} from the sub sampled coefficients {λ-1,k}.
3.1.2 Predict phase
A more compact representation of {λ0,k}should be obtained. Consider the case where
{λ-1,k} does not contain any information (e. g. that part of the signal is equal to zero).
Then a more compact representation is obtained since {λ0,k}can be replaced with the
smaller set {λ-1,k}.Indeed, the extra part needed to reassemble {λ0,k} does not contain any
information. The previous situation hardly ever occurs in practice. Therefore, a way is
needed to find the odd samples {γ-1,k}.The even samples {λ0,2k} can immediately be found
as λ0,2k = λ-1,k. On the other hand the odd samples {λ0,2k+1}are predicted based on the
correlation present in the original data. If prediction operator P can be found independent
of the data, so that
γ-1.k = P(λ-1,k)
then again original data set can be replaced with {λ-1,k}, now missing part can be
predicted to reassemble {λ0,k}. The construction of the prediction operator is typically
based on some model of the data which reacts its correlation structure. Obviously, the
prediction operator P cannot be dependent on the data, otherwise the information would
be hidden in P.
Again, in practice, it might not be possible to exactly predict {γ-1,k} based on {λ-1,k}.
However, P(λ-1,k) is likely to be close to {γ-1,k}. Thus, {γ-1,k} is replaced with the
difference between itself and its predicted value P(λ-1,k). If the prediction is reasonable,
this difference will contain much less information than the original {γ-1,k} set. This
abstract difference operator is represented with a − sign, and thus we get
γ-1,k := λ0,2k+1 - P(λ-1,k)
The wavelet subset now encodes how much the data deviates from the model on which P was built. If the signal is somehow correlated, the majority of the wavelet coefficients will be small. This also gives an insight into how to split the original data set. Indeed, in order to get
maximal data reduction from prediction, subsets {λ-1,k} and {γ-1,k}should be maximally
correlated. Cutting the signal into left and right parts might not be the best idea since
values on the far left and the far right are hardly correlated. Predicting the right half of the
signal based on the left is thus a tough job. A better method is to interlace the two sets, as
done in the previous step. Now , the original data set can be replaced with the smaller set
{λ-1,k} and the wavelet set {γ-1,k}. With a good prediction, the two subsets {λ-1,k,γ-1,k} yield
a more compact representation than the original set {λ0,k}.To find a good prediction
operator, again maximal correlation is assumed among neighboring samples. Hence odd
sample λ0,2k+1 can be predicted as the average of its two (even) neighbors: λ-1,k and λ-1,k+1.
The model used to build P is a function piecewise linear over intervals of length 2. If the
original signal complies with the model, all wavelet coefficients in {γ-1,k} are zero. In
other words, the wavelet coefficients measure to which extent the original signal fails to
be linear. Their expected value is small. In terms of frequency content, the wavelet
coefficients capture high frequencies present in the original signal. These wavelet
coefficients are used. They encode the detail needed to go from the {λ-1,k}coefficients to
the {λ0,k}. They capture the high frequencies present in the original signal, while the {λ-1,k} somehow capture the low frequencies. This scheme can now be iterated: split {λ-1,k} into two subsets {λ-2,k} and {γ-2,k} (by sub sampling) and then replace {γ-2,k} with the difference between {γ-2,k} and P(λ-2,k). After n steps, the original data set is replaced with the wavelet representation {λ-n,k, γ-n,k, ..., γ-1,k}. Given that the wavelet sets encode the difference with some predicted value based on a correlation model, this is likely to give a more compact representation.
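A minimal sketch of the split and predict stages for the linear predictor discussed above (the handling of the last odd sample is an assumed simplification; the text later uses boundary-adapted predictors instead):

def predict_step(signal):
    """Split into even/odd samples and replace each odd sample by its
    prediction error, using the average of its two even neighbours as the
    predictor (the rightmost odd sample only has a left neighbour here)."""
    lam = signal[0::2]                      # lambda_{-1,k}: the even samples
    gamma = []
    for k, odd in enumerate(signal[1::2]):
        right = lam[k + 1] if k + 1 < len(lam) else lam[k]
        prediction = 0.5 * (lam[k] + right)
        gamma.append(odd - prediction)      # wavelet coefficient gamma_{-1,k}
    return lam, gamma

lam, gamma = predict_step([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
print(gamma)    # a linear ramp fits the model exactly, except at the boundary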
Different prediction functions
The prediction does not necessarily have to be linear; it can be cubic or of any other higher order. This introduces the concept of interpolating subdivision, in which the original sampled function is extended to a function defined on the whole real line. Some value N is used to denote the order of the subdivision (interpolation) scheme.
For instance, to find a piecewise linear approximation, use N equal to 2. To find a cubic
approximation N should be equal to 4. It can be seen that N is important because it sets
the smoothness of the interpolating function used to find the wavelet coefficients (high
frequencies). This function is referred to as the dual wavelet, and N is referred to as the number of dual vanishing moments.
A polynomial of order N−1 is fitted through the N nearest known samples; the predicted value is then simply given by the evaluation of this polynomial at the new, refined location. The algorithm that best adapts to the interpolating subdivision scheme is Neville's algorithm. Notice also that nothing in the definition of this procedure requires
the original samples to be located at integers. This feature can be used to define scaling
functions over irregular subdivisions, which is not part of the scope of this paper. This
interpolation scheme makes it easy to accommodate interval boundaries for finite sequences. For the cubic construction described, at the left boundary of an interval 1 sample is taken on the left and 3 on the right. The cases are similar at the right boundary: as soon as the calculation of new γ values starts near the right boundary, there will be fewer λ coefficients on the right side and more on the left. If the γ coefficient is at the right boundary, there are no λ coefficients on the right; all of them will be on the left. A list of all the possible cases when N = 4 is the following:
Case 1: Near Left Boundary: More λ coefficients on the right side of the γ coefficient
than on the left side.
1λ on the left and 3 λ's on the right (remember that due to the splitting, always λ is in the
first position)
Case 2: Middle: Enough λ coefficients on either side of the γ coefficient.
2 λ's on the left and 2 λ's on the right
Case 3: Near Right Boundary: More λ coefficients on the left side of the γ coefficient than on the right side.
3 λ's on the left and 1 λ on the right
4 λ's on the left and 0 λ's on the right
Using the interpolation scheme and Neville's algorithm, a set of coefficients are generated
that will help to find the correct approximation using a function of order N -1. For
example, if N = 2, then two coefficients are needed for the two possible cases (one λ on
the left and one on the right, and 2 λ's on the left and none on the right).If N = 4, four
coefficients are needed for each one of the four cases as mentioned. These coefficients
will be called filter coefficients.
Due to symmetry, it is known that all the cases on the right side are the opposite to the
cases on the left side .For example, the coefficients used for the case 3 λ's on the left and
1 λ on the right are the same as the ones used in the case 1λ on the left and 3λ's on the
right", but in opposite order. Thus, a total of N/2+1 different cases (one for the middle
case and N=2 for the symmetric boundary cases. That is, when there are two λ's on either
side and when there are one λ on the left and three λ's on the right. They are referred as
cases (a) and (b).Since there is a unique cubic interpolation (it does not matter how
separated the samples are, always they have the same interpolating cubic function), the
set of coefficients that will help to predict any γ every time should be known.
The basic idea is the following: N is equal to 4; therefore, there are 4 coefficients for
every case. To calculate c1, set its value to 1 and the other three, c2, c3 and c4, to zero. Construct the polynomial that interpolates these values and evaluate it at the prediction points. For case (a), evaluate the function where two coefficients are to
the left and two to the right. For case (b) evaluate the function where there is one
coefficient on the left and three on the right. The procedure is the same with the other
coefficients. Tables list the filter coefficients needed for the interpolation with N = 2 and
4. One property of these filter coefficients is that every set of N coefficients for every
case adds up to 1.The prediction phase thus gets reduced to a lookup in the previous
tables in order to calculate the wavelet coefficients. For example, to predict a γ value using N = 4 with three λ coefficients on the left and one λ on the right, we would perform the following operation
γ-j,k = λ-j+1,k = (0.0625*λ-j,k-3 – 0.3125*λ-j,k-2 + 0.9375 * λ-j,k-1 + 0.3125*λ-j,k+1)
The prediction of other γ coefficients would be a similar process except the use of the
neighboring λ coefficients and the corresponding filter coefficients.
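The filter coefficients quoted above can be reproduced by evaluating the Lagrange interpolation weights at the γ position; the node positions (odd integers around 0) are an assumption consistent with an even/odd split of equally spaced samples.

import numpy as np

def prediction_weights(nodes, target=0.0):
    """Lagrange weights: fit a polynomial of degree len(nodes)-1 through unit
    impulses at the given node positions and evaluate it at the target point."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (target - xj) / (xi - xj)
        weights.append(w)
    return np.array(weights)

# case (b): one lambda on the left, three on the right of the gamma sample
print(prediction_weights([-1, 1, 3, 5]))      # [ 0.3125  0.9375 -0.3125  0.0625]
# boundary case: three lambdas on the left, one on the right
print(prediction_weights([-5, -3, -1, 1]))    # [ 0.0625 -0.3125  0.9375  0.3125]
# middle case (a): two lambdas on either side
print(prediction_weights([-3, -1, 1, 3]))     # [-0.0625  0.5625  0.5625 -0.0625]

Each set of weights sums to 1, as stated above, and the boundary set matches the coefficients used in the worked operation.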
Table 1: Filter coefficients for each case (N = 2 and N = 4)
The wavelet coefficients have been calculated. However, we are not very pleased with
the choice of the {λ-1,k}. The reason is the following. Suppose we are given 2^n + 1 original samples {λ0,k | 0 ≤ k ≤ 2^n}. This scheme could be applied (split and predict) n times, thus obtaining {γj,k | −n ≤ j ≤ −1, 0 ≤ k ≤ 2^(n+j)} and two (coarsest level) coefficients λ-n,0 and λ-n,1. These are the first (λ-n,0 = λ0,0) and the last (λ-n,1 = λ0,2^n) original samples. This introduces
considerable aliasing. Some global properties of the original data set should be
maintained in the smaller version{λ-j,k} . For example, in the case of an image, the smaller
images {λ-j,k}should have the same overall brightness, i.e., the same average pixel value.
Therefore the last values should be the average of all the pixel values in the original
image. Part of this problem can be solved by introducing a third stage: the update.
3.1.3 Update phase
Lift the λ-1,k with the help of the wavelet coefficients γ-1,k. Again neighboring wavelet
coefficients are used. The idea is to find a better λ-1,k so that a certain scalar quantity Q(),
e. g., the mean, is preserved, or
Q(λ-1,k)=Q(λ0,k)
This could be done by finding a new operator to extract λ-1,k directly from λ0,k, but this is not done for two reasons. First, it would create a scheme which is very hard to invert. Secondly, it is better to maximally reuse the work already done. Therefore, the proposed system uses the already computed wavelet set {γ-1,k} to update {λ-1,k} so that the latter preserves Q(). In other words, an operator U is constructed and {λ-1,k} is updated as
λ-1,k = λ-1,k +U(γ-1,k )
The aim is to find a scaling function, using the previously calculated wavelet coefficients, in order to maintain some properties among all the λ coefficients throughout all levels. One way is to set all λ0,k to zero except for λ0,0, which is set to one, and then run the interpolating subdivision ad infinitum. The resulting function is φ(x), the scaling function, which will help to create a
real wavelet that will maintain some desired properties from the original signal. This
function will have an order depending on some (even) value Ñ, which is not necessarily
equal to N. We will call Ñ the number of real vanishing moments. The higher the order of
this function, the less aliasing effect is seen in the resulting transform. The basic
properties to be preserved are the moments of the wavelet function, ψ at every level. One
of the things known from the properties of wavelets is that the integral of ψ along the real line must be equal to zero. This is also true for the higher moments. Thus,
Basically, one wants to preserve up to Ñ−1 moments of the λ's at every level, and use this information to see how much of every γ coefficient is needed to update every λ. These update values are named lifting coefficients. Before starting the algorithm that calculates the lifting coefficients, the moment information for every coefficient at the first level has to be initialized. The integral is set to one for all the coefficients because all the filter
coefficients for each λ add to one. The initial values for the higher order moments are
calculated using the coefficient indices as shown in the table below. Table 2: Initial
moments using index k
Once the moments are initialized, the following steps can be applied.
1. Check which λ's contributed to the prediction of every γ and see how much this
contribution was. (These values are given by the filter coefficients found in the prediction
stage.)
2. Update the moments for every λ at the current level with the following equation,
mj,k=mj,k + fj*ml,k
where j is the index relative to a λ coefficient, f(j) is its corresponding filter coefficient
(0 < j ≤ N), k is the moment being updated (0 < k ≤ Ñ), and l is the index relative to a
coefficient.
3. Knowing that all the moments must be zero at every level, a linear system can be constructed to find the lifting coefficients for every γ. The steps are:
(a) Put a one in a γ coefficient and zero in all the remaining γ's.
(b) Apply an inverse transform one step up to see how much this γ contributes to the λ's that it updates, and create a linear system of Ñ x Ñ variables.
(c) Solve the system and find the set of lifting coefficients for the γ coefficient with value
set to one.
This linear system is built and solved for every γ coefficient to find its corresponding set of lifting coefficients. After applying the previous steps, we have a set of Ñ lifting coefficients for every γ at every level. These values are used to apply the update operator, U, to the λ coefficients before iterating to the next level. To update the λ's, position at a γj,k coefficient and take its corresponding lifting coefficients, e.g. (a, b) for Ñ = 2. Identify the λ's which are affected by this γ, e.g. λ-j,k-1 and λ-j,k+1; λ-j,k-1 is then increased by a·γ-j,k and λ-j,k+1 by b·γ-j,k. Then, move to the next γ and do the same. An example of the split, predict and update
phases for a 1-D signal with length L = 8, N = 2 and Ñ= 2 follows. First, consider the
split and predict:
λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8
γ1 γ2 γ3 γ4
γ1 uses λ1 and λ3 for prediction. Similarly, γ2 uses λ3 and λ5, γ3 uses λ5 and λ7, and γ4 uses
λ5 and λ7. The second stage, the update, is performed. In this example, the following
lifting coefficients ((a,b) pairs) are obtained,
γ1 γ2 γ3 γ4
(2/5,1/5) (0,2/3) (4/15,1/5) (-2/15,2/5)
λ1 λ3 λ5 λ7
λ1 uses a from γ1 for updating. Similarly, λ3 uses b from γ1 and a from γ2; λ5 uses b from γ2, a from γ3, and a from γ4; and λ7 uses b from γ3 and b from γ4.
At the next level, the coefficients get organized as follows after the split and predict
stages.
λ1 λ3 λ5 λ7
γ1 γ2
γ1 uses λ1 and λ5, and γ2 uses λ1 and λ5 for prediction.
γ1 γ2
(1/2,0.214286) (-1/3,0.476190)
λ1 λ5
In the update stage, λ1 uses a from γ1 and a from γ2, and λ5 uses b from γ1 and b from γ2 for updating.
It is important to note that, for a longer signal, the lifting coefficients are (1/4, 1/4) for all coefficients unaffected by the boundaries. Using these values, the λ coefficients can be updated with the following equation,
λ-1,k = λ-1,k + ¼*γ-1,k-1 + ¼*γ-1,k
The three stages of lifting described by the equations above and depicted in the block diagram are combined and iterated to generate the 1-D fast lifted forward wavelet transform algorithm:
For j = -1 down to -n:
  {λj,k , γj,k} := Split(λj+1,k)
  γj,k -= P(λj,k)
  λj,k += U(γj,k)
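To make one level of this algorithm concrete, the Python sketch below performs the split, predict and update steps using the interior filter coefficients (1/2, 1/2) and lifting coefficients (1/4, 1/4) mentioned later in the text. It assumes an even-length signal, and boundary samples are simply mirrored, which is an assumption made here only for illustration; the text itself uses interpolating subdivision at the boundaries.

    def forward_lifting_step(s):
        # Split: even samples become the coarse lambdas, odd samples the gammas.
        lam = s[0::2]
        gam = s[1::2]
        # Predict: gamma -= P(lambda), using the (1/2, 1/2) neighbours.
        for k in range(len(gam)):
            right = lam[k + 1] if k + 1 < len(lam) else lam[k]   # mirrored boundary (assumption)
            gam[k] -= 0.5 * (lam[k] + right)
        # Update: lambda += U(gamma), using the (1/4, 1/4) lifting coefficients.
        for k in range(len(lam)):
            left = gam[k - 1] if k > 0 else gam[0]               # mirrored boundary (assumption)
            lam[k] += 0.25 * (left + gam[k])
        return lam, gam

Iterating this step on the returned coarse list (lam) produces the multi-level transform, exactly as in the loop above.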
Here one of the nice properties of lifting can be illustrated: once the forward transform is available, the inverse can be derived immediately by reversing the order of the operations and toggling + and -. This leads to the following algorithm for the inverse transform:
For j = -n up to -1:
  λj,k -= U(γj,k)
  γj,k += P(λj,k)
  λj+1,k := Join(λj,k , γj,k)
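A matching Python sketch of the inverse step simply reverses the order of the operations and toggles the signs, under the same even-length and mirrored-boundary assumptions as the forward sketch above.

    def inverse_lifting_step(lam, gam):
        lam, gam = list(lam), list(gam)
        # Undo the update: lambda -= U(gamma).
        for k in range(len(lam)):
            left = gam[k - 1] if k > 0 else gam[0]
            lam[k] -= 0.25 * (left + gam[k])
        # Undo the predict: gamma += P(lambda).
        for k in range(len(gam)):
            right = lam[k + 1] if k + 1 < len(lam) else lam[k]
            gam[k] += 0.5 * (lam[k] + right)
        # Join: interleave the even (lambda) and odd (gamma) samples again.
        s = [0.0] * (len(lam) + len(gam))
        s[0::2], s[1::2] = lam, gam
        return s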
To calculate the total number of iterations of this transform, three factors have to be considered: the signal length (L), the number of dual vanishing moments (N), and the number of real vanishing moments (Ñ). It can be proven that the total number of iterations is given by
n = [log2((L-1)/(Nmax-1))]
where Nmax = max(N, Ñ). It can be seen from Equation 6 that the exact size of the signal does not matter, i.e., signals do not necessarily have to have dyadic length. The interpolating subdivision guarantees correct treatment of the boundaries in every case.
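For example, the level count can be computed directly from this formula; the short Python sketch below is a literal transcription, taking the bracket as a floor (an assumption, but it agrees with the two levels obtained in the worked example above for L = 8).

    import math

    def num_levels(L, N, N_tilde):
        # Number of transform levels: floor(log2((L - 1) / (Nmax - 1))), Nmax = max(N, N~).
        n_max = max(N, N_tilde)
        return int(math.floor(math.log2((L - 1) / (n_max - 1))))

    # Example from the text: L = 8, N = 2, N~ = 2 gives floor(log2(7)) = 2 levels.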
An extension of the 1-D algorithm for 2-D signals is a simple repetitive scheme of the 1-
D transform through rows and columns, as the transform is separable. For better support
of frequencies, the application of the square 2-D method is proposed. The basic idea is to
apply the 1-D transform to all the rows first and, afterwards, to all the columns. This is
done at every level in order to create a square window transform that gives better
frequency support than a rectangular window transform. Different filter and lifting
coefficients are used for each dimension (X,Y ) if they are different. Using the filter
coefficients (1/2,1/2) and lifting coefficients (1/4,1/4), the wavelet transform presented
here is the (N = 2, Ñ= 2) biorthogonal wavelet transform of Cohen-Daubechies-
Feauveau. This simple example already shows how the lifting scheme can speed up the
implementation of the wavelet transform. Classically, the {λ-1,k} coefficients are found as the convolution of the {λ0,k} coefficients with the filter
H = {-1/8, 1/4, 3/4, 1/4, -1/8}
This step would take 6 operations per coefficient while lifting only needs 3.
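This equivalence between the classical convolution and the two lifting steps can be checked numerically for an interior coefficient; the sketch below uses an arbitrary test signal and ignores boundary handling.

    import numpy as np

    x = np.array([3.0, 7.0, 1.0, 4.0, 6.0, 2.0, 8.0, 5.0])   # arbitrary test signal

    # Classical filtering: convolve the neighbourhood of an interior even sample with H.
    H = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])
    k = 2                                                     # interior even sample x[2k]
    conv = np.dot(H, x[2*k - 2: 2*k + 3])

    # Lifting: predict the two neighbouring gammas, then update lambda at x[2k].
    g_left  = x[2*k - 1] - 0.5 * (x[2*k - 2] + x[2*k])
    g_right = x[2*k + 1] - 0.5 * (x[2*k] + x[2*k + 2])
    lift = x[2*k] + 0.25 * (g_left + g_right)

    assert abs(conv - lift) < 1e-12                           # both give the same coarse value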
CHAPTER 4
DESIGN APPROACH
4.1 DESIGN CONSIDERATION
The design unit implements the embedded zero-tree wavelet coding system for data compression. The coding system reads the multiresolution components of the image obtained from the transformation module and passes the data to the decoder unit to retrieve the image. Fig 4.9 below shows the block diagram of the JPEG 2000 coding system used for compression.
Fig 4.9: Block diagram of the JPEG 2000 coding system (source image → wavelet transformation → quantizer → encoder → compressed data → retrieved image).
To perform the forward DWT, the JPEG 2000 system uses a one-dimensional (1-D) subband decomposition of a 1-D set of samples into low-pass and high-pass samples. Low-pass samples represent a down-sampled, low-resolution version of the original set, while high-pass samples represent a down-sampled residual version of the original set. The transformation uses the convolution-based filtering mode. Quantization then approximates the values in the image data with a finite (preferably small) set of values. The input to a quantizer is the original data, and the output is always one among a finite number of levels. The quantizer is a function whose set of output values is discrete and usually finite.
During quantization, some of the grayscale and color information is discarded. Each transformed coefficient is divided by its corresponding element in a scaled quantization matrix, and the resulting numerical value is rounded. The default quantization matrices for luminance and chrominance are specified in the JPEG standard and were designed in accordance with a model of human perception. The scale factor of the quantization matrix directly affects the amount of image compression, and the lossy quality of JPEG compression arises as a direct result of this rounding.
A quantizer can be specified by its input partitions and output levels (also called reproduction points). If the input range is divided into levels of equal spacing, it is called a uniform quantizer. A uniform quantizer can be specified simply by its lower bound and step size, and it is also easier to implement than a non-uniform quantizer. For example, in Fig 4.11, if the input falls between n*r and (n+1)*r, the quantizer outputs the level n, where r is the step size.
In the same way that a quantizer partitions its input and outputs discrete levels, a dequantizer receives the output levels of a quantizer and converts them back into data values by translating each level into a 'reproduction point' in the actual range of the data. It is known from the literature that the optimum quantizer (encoder) and the optimum dequantizer (decoder) must satisfy the following conditions:
• Given the output levels or partitions of the encoder, the best decoder is one that puts the reproduction points x' on the centers of mass of the partitions. This is known as the centroid condition.
• Given the reproduction points of the decoder, the best encoder is one that puts the partition boundaries exactly in the middle of the reproduction points, i.e. each x is translated to its nearest reproduction point. This is known as the nearest neighbour condition.
The quantizer operates over the range R = max(I) - min(I) of the input image I, and 'b' indicates the number of bits output for every step. The quantization is carried out as a thresholding operation: for every image value fed to the quantizer, an equivalent binary sequence is passed as output to the encoder unit.
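A minimal Python sketch of such a b-bit uniform quantizer and its dequantizer follows. The reproduction points are placed at the mid-points of the partitions, a common simple choice (it satisfies the nearest neighbour condition, and the centroid condition only for uniformly distributed data); the function names and this mid-point choice are illustrative assumptions, not the exact design used in this project.

    import numpy as np

    def uniform_quantize(x, b):
        # Uniform quantizer: range R = max - min, step r = R / 2^b; each sample is
        # mapped to the index n of the interval [n*r, (n+1)*r).
        lo = float(np.min(x))
        r = (float(np.max(x)) - lo) / (2 ** b)                # (a constant input is not handled)
        idx = np.floor((np.asarray(x, dtype=float) - lo) / r).astype(int)
        return np.clip(idx, 0, 2 ** b - 1), lo, r             # the maximum falls in the top bin

    def uniform_dequantize(idx, lo, r):
        # Dequantizer: translate each level back to a reproduction point (the bin mid-point).
        return lo + (idx + 0.5) * r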
The quantized data is then compressed using an entropy coder to give additional compression. Entropy here refers to the amount of information present in the data, and an entropy coder encodes a given set of symbols with the minimum number of bits required to represent them. Various algorithms have been proposed for coding the image data, among which Huffman coding is found to be the most effective; it exploits the frequency of byte repetition. The general concept is to assign the most used bytes fewer bits and the least used bytes more bits: each byte is assigned a variable-length binary code, and the more often a byte occurs, the shorter its code, while the less often it occurs, the longer its code.
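The sketch below builds such a variable-length code in Python from the byte frequencies using a heap; it is a generic textbook Huffman construction, given here only to illustrate the idea, and not the specific code-book format used in this design.

    import heapq
    from collections import Counter

    def huffman_code(data):
        # Build a prefix code: the most frequent symbols receive the shortest codes.
        freq = Counter(data)
        heap = [[count, [sym, ""]] for sym, count in sorted(freq.items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)                 # two least frequent subtrees
            hi = heapq.heappop(heap)
            for pair in lo[1:]:
                pair[1] = "0" + pair[1]              # prepend a bit to every leaf below
            for pair in hi[1:]:
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return {sym: code for sym, code in heap[0][1:]}

    # Usage: codes = huffman_code(image_bytes); the encoded stream is the
    # concatenation of codes[b] for every byte b of the image data.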
The decoder unit decodes the encoded bit stream from the encoder module, performing the reverse of the encoding process. It shares the same code words used during encoding from the code book. The Huffman decoder block carries out decoding by reading the unique code bits passed in place of the data bits. The data bits are received in serial format and compared with the unique code words, and the equivalent data bits are passed out whenever a unique word matches. For decoding an 'm'-bit block of data, a minimum of 2m-1 iterations are required.
4.2.5 DEQUANTIZER
The dequantizer unit dequantizes the decoded data bits. The dequantization operation is carried out in the reverse manner to quantization, using the same step sizes from the quantization table. The reconstructed data do not exactly recover the original image, which makes this system a lossy compression system.
4.2.6 INVERSE TRANSFORMATION
The inverse transformation is carried out in a similar fashion to the forward transformation described earlier, with the operations applied in reverse order to reconstruct the image.
PROPOSED SYSTEM:
Fig: Block diagram of the proposed system (source image → pre-processor → wavelet lifting transformation → coding → compressed data → retrieved image).
Before processing, the image data is preprocessed to improve the rate of operation of the coding system. During preprocessing, tiling of the original image is carried out. The term "tiling" refers to the partition of the original (source) image into rectangular, non-overlapping blocks (tiles), which are compressed independently, as though they were entirely distinct images. All operations, including component mixing,
wavelet transform, quantization and entropy coding are performed independently on the
image tiles. The tile component as shown in Fig 4.2 is the basic unit of the original or
reconstructed image. Tiling reduces memory requirements, and since they are also
reconstructed independently, they can be used for decoding specific parts of the image
instead of the whole image. All tiles have exactly the same dimensions, except maybe
those at the boundary of the image. Arbitrary tile sizes are allowed, up to and including
the entire image (i.e., the whole image is regarded as one tile).
Fig 4.2: Tiling of the original image into tile components.
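A small Python/NumPy sketch of this tiling step and its inverse is given below; the tile size (64 x 64 here) and the handling of smaller boundary tiles are illustrative choices, not values fixed by the text.

    import numpy as np

    def tile_image(img, tile=64):
        # Partition the image into non-overlapping tiles, in raster order;
        # tiles at the right and bottom boundaries may be smaller.
        h, w = img.shape
        return [img[r:r + tile, c:c + tile]
                for r in range(0, h, tile)
                for c in range(0, w, tile)]

    def untile_image(tiles, shape, tile=64):
        # Reassemble the tiles in the same raster order used by tile_image.
        out = np.zeros(shape, dtype=tiles[0].dtype)
        i = 0
        for r in range(0, shape[0], tile):
            for c in range(0, shape[1], tile):
                t = tiles[i]
                out[r:r + t.shape[0], c:c + t.shape[1]] = t
                i += 1
        return out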
This unit transforms the input image from the time domain to the frequency domain and decomposes the original image into its fundamental components. The transformation is performed using the Daubechies wavelet transform. The wavelet transform is a very useful tool for this decomposition: a one-dimensional discrete wavelet transform (1-D DWT) decomposes an input sequence into two components (the average component and the detail component) by calculations with a low-pass filter and a high-pass filter, while a two-dimensional discrete wavelet transform (2-D DWT) decomposes an input image into four sub-bands: one average component (LL) and three detail components (HL, LH and HH).
The wavelet transform uses the filter banks shown in Fig 4.5 for the decomposition of the preprocessed original image into 3 detail coefficients and 1 approximate coefficient. The 2-D DWT is achieved by two ordered 1-D DWT operations (row and column): the rows are transformed first, followed by the columns.
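Since the transform is separable, one 2-D decomposition level can be written as a row pass followed by a column pass of any 1-D lifting step (for instance the forward_lifting_step sketched earlier). The sketch below assumes an even-sized array and simply concatenates the coarse and detail halves before splitting out the four sub-bands; the band labels follow the usual convention and are an assumption here.

    import numpy as np

    def dwt2_level(img, dwt1):
        # dwt1: a 1-D transform returning (coarse, detail) lists for one row or column.
        rows = np.array([np.concatenate(dwt1(list(r))) for r in img])       # transform rows
        cols = np.array([np.concatenate(dwt1(list(c))) for c in rows.T]).T  # then columns
        h, w = cols.shape
        ll, hl = cols[:h // 2, :w // 2], cols[:h // 2, w // 2:]             # average + details
        lh, hh = cols[h // 2:, :w // 2], cols[h // 2:, w // 2:]
        return ll, lh, hl, hh

    # The next level is obtained by applying dwt2_level to the LL band again.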
The filtering is carried out by convolving the input image with the filter coefficients passed. Each decomposition stage consists of a pair of high-pass and low-pass filters, which isolate the highly varying and slowly varying components of the given image. Fig 4.5 (a) shows the original image used for decomposition into fundamental subbands using the filter banks shown in Fig 4.4, and Fig 4.5 (d) shows the 3-level decomposition of the original image. The approximate coefficient obtained at each level is decomposed again to produce the detail and approximate coefficients of the next level.
It should be possible to represent the detail more efficiently. Note that if the original
signal is a constant, then all details are exactly zero.
4.5.3 Update:
One of the key properties of the coarser signals is that they have the same average value as the original signal, i.e., the quantity
2^-j * Σk sj,k
is independent of j. This results in the fact that the last coefficient, s0,0, is the DC component or overall average of the signal. The update stage ensures this by letting
sj-1,k = sj,2k + dj-1,k / 2
All this can be computed in-place: the even locations can be overwritten with the
averages and the odd ones with the details. An abstract implementation is given by:
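The abstract implementation referred to here is not reproduced in this copy, so the following Python sketch shows one plausible in-place form for this Haar-like case: the predictor is simply the even neighbour, and the update adds half of the detail back, so the even slots end up holding the averages and the odd slots the details.

    def haar_lifting_inplace(s):
        # In-place single-level transform: after the loop, s[2k] holds the pairwise
        # average (coarse value) and s[2k+1] the detail, for k = 0 .. len(s)//2 - 1.
        for k in range(0, len(s) - 1, 2):
            s[k + 1] -= s[k]            # predict: detail = odd - even
            s[k] += s[k + 1] / 2.0      # update: even becomes (even + odd) / 2
        return s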
These three stages are depicted in the forward lifting scheme shown above. From the forward lifting scheme we can immediately build the inverse scheme; again we have three stages:
4.5.6 Merge:
Now that we have the even and odd samples, we simply have to interleave them again to recover the original signal. This is the inverse Lazy wavelet:
sj = Merge(evenj-1, oddj-1)
Assuming that the even slots contain the averages and the odd ones contain the details, the implementation of the inverse transform is:
evenj-1 -= U(oddj-1);
oddj-1 += P(evenj-1);
sj := Merge(evenj-1, oddj-1)
The inverse transform is thus always found by reversing the order of the operations and
flipping the signs.
The image data transformed and decomposed on the encoding side is rearranged from the higher decomposition levels to the lower ones, with the highest decomposed level placed at the top. From the highest level of decomposition, the sub-bands are arranged as shown below.
Fig: Sub-band arrangement after two levels of decomposition (LL2, HL2, LH2, HH2 at the second level; HL1, LH1, HH1 at the first level).
The reconstruction of the decomposed image is carried out by iteratively applying the filter banks: each component is upsampled by 2, filtered, and the two filtered components are combined. Each such addition gives the approximate coefficient of the next higher level, and on complete reconstruction the final addition gives back the reconstructed image. Fig 4.7 shows the reconstruction of the decomposed components for a two-level scaling.
This unit re-forms the image from the obtained tiles by placing them in sequence as 8x8 blocks over the complete image dimensions. The tiles are aligned in the same order in which they were produced during tiling.
Fig: Reconstruction of the image from tiles (reconstructed tiles assembled into the reconstructed image).
Wavelet transform using lifting scheme FLOW CHART: the input matrix is separated into an even-value matrix and an odd-value matrix (split); the odd values are predicted from the even values using the prediction function (predict); the even values are then updated (update), giving the transformed matrix.
Inverse wavelet transform using lifting scheme FLOW CHART: the transformed matrix is first updated, the odd-value matrix is then recovered using the prediction function (predict), and finally the even-value and odd-value matrices are merged to give back the original matrix.
CHAPTER 5
RESULTS AND DISCUSSION
5.1 Case I: A 256 x 256 tif image
In the above screen, the input image is first read. In this case a tif image is taken as input.
The Read Input button is clicked to load the input image.
The input image is shown above; it is a 256 x 256 tif image.
After reading the image, the wavelet method button is pressed and the compressed image obtained after applying the wavelet transform is displayed.
The above image is the image retrieved after the lifting scheme method is applied to the original image.
The error rates of the wavelet compression method and the lifting scheme are compared in the above graph.
5.2 Case II: A 307 x 593 tif image
CHAPTER 6
CONCLUSION
This project implements a method to optimize the wavelet function for lossless data compression using the lifting scheme. The project realizes an optimal data compression system based on the lifting scheme, which compresses the image in a lossless manner while retaining accuracy. The lifting scheme is realized in a modular approach on the Matlab platform with three major blocks: split, predict and update. Observations were carried out for various types of images in different formats, and it was found that the lifting scheme provides considerably better accuracy than its counterpart, the wavelet transform with quantization, while retaining a comparable level of compression. From all the observations made, it is concluded that the proposed lifting scheme for lossless compression provides considerable accuracy while maintaining the compression level of the wavelet transform.
The proposed scheme can be further enhanced for higher compression rates and better accuracy using advanced methods such as genetic algorithms. The method could also be extended to analyse its performance for various bit-rate applications.
CHAPTER 7
REFERENCES
1. A. Alecu, A. Munteanu, P. Schelkens, J. Cornelis, and S. Dewitte, "Wavelet-based infinite scalable coding," IEE Electronics Letters, vol. 38, no. 22, pp. 1338-1340, 2002.
2. R. Ansari, N. Memon, and E. Ceran, "Near-lossless Image Compression Techniques," Journal of Electronic Imaging, vol. 7, no. 3, pp. 486-494, July 1998.
3. M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image Coding Using Wavelet Transform," IEEE Transactions on Image Processing, vol. 1, pp. 205-220, April 1992.
4. A. R. Calderbank, I. Daubechies, W. Sweldens, and B. L. Yeo, "Wavelet Transforms that Map Integers to Integers," Applied and Computational Harmonic Analysis, vol. 5, pp. 332-369, 1998.
5. I. Daubechies, "Orthonormal bases of compactly supported wavelets," Communications on Pure and Applied Mathematics, vol. 41, pp. 909-996, 1988.
6. I. Daubechies and W. Sweldens, "Factoring Wavelet Transforms into Lifting Steps," Journal of Fourier Analysis and Applications, vol. 4, no. 3, pp. 247-269, 1998.
7. A. Munteanu, J. Cornelis, G. V. d. Auwera, and P. Cristea, "Wavelet-based lossless compression scheme with progressive transmission capability," International Journal of Imaging Systems and Technology, vol. 10, no. 1, pp. 76-85, 1999.
8. A. Munteanu, J. Cornelis, and P. Cristea, "Wavelet-Based Lossless Compression of Coronary Angiographic Images," IEEE Transactions on Medical Imaging, vol. 18, no. 3, pp. 272-281, 1999.
9. W. Sweldens and P. Schröder, "Building Your Own Wavelets at Home," in Wavelets in Computer Graphics, ACM SIGGRAPH Course Notes, 1996.