CHAPTER 1
INTRODUCTION
Digital image processing technology began to be used in the 1960s and 1970s in
medical imaging, remote Earth-resources observation and astronomy. The invention of computerized
axial tomography (CAT) in the 1970s is one of the most important events in the application of
image processing. In medical diagnosis, CAT is a process in which a ring of detectors encircles an
object (the patient) and an X-ray source, concentric with the detector ring, rotates about the object.
The X-rays pass through the object and are collected at the opposite end by the corresponding
detectors in the ring. As the source rotates, this process is repeated.
X-rays were discovered in 1895 by Wilhelm Conrad Roentgen. These two inventions led to
some of the most important applications of image processing. From the 1960s until the present, the
field of image processing has grown vigorously.
In addition to applications in medicine and the space program, digital image processing
techniques are now used in a broad range of applications. Computer procedures are used to
enhance the contrast or code the intensity levels into color for easier interpretation of X-rays and
other images used in industry, medicine and the biological sciences. Geographers use
similar techniques to study pollution patterns from aerial and satellite imagery. Image
enhancement and restoration procedures are used to process degraded images of
unrecoverable objects or experimental results too expensive to duplicate. In archaeology,
image processing methods have successfully restored blurred pictures that were the only
available records of rare artifacts lost or damaged after being photographed. Similarly
successful applications of image processing concepts can be found in astronomy,
biology, medicine, law enforcement, defence and industry.
The second major application area of image processing is in solving problems of
machine perception. In this case, interest is in procedures for extracting from an image
information in a form suitable for computer processing.
Typical problems in machine perception that utilize image processing techniques are
automatic character recognition and industrial machine vision for product assembly and
inspection.
Image compression, the art and science of reducing the amount of data required to
represent an image, is one of the most useful and commercially successful technologies in the
field of image processing.
To better understand the need for compact image representation, consider the amount
of data required to represent a 2-hour standard-definition television movie using 720 x 480 x 24-
bit pixel arrays. A digital movie is a sequence of video frames in which each frame is a full-
colour still image. Because video players must display the frame sequence at a rate near 30
frames per second (fps), standard digital video must be accessed at
30 frames/sec x (720 x 480) pixels/frame x 3 bytes/pixel = 31,104,000 bytes/sec
and a 2-hour movie consists of
31,104,000 bytes/sec x (60 x 60) sec/hr x 2 hrs = 2.24 x 10^11 bytes, or 224 Gbytes of data.
To put a 2-hour movie on a single DVD (which holds roughly 8.5 Gbytes), each frame must be
compressed on average by a factor of 26.3. The compression must be even higher for high-definition
television, where image resolution reaches 1920 x 1080 x 24 bits/image.
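These figures can be reproduced with a few lines of Python. The following is a minimal sketch of the arithmetic only; the 8.5-Gbyte dual-layer DVD capacity is an assumed round figure used purely for illustration.

# Back-of-the-envelope storage calculation for 2 hours of SD video.
FPS = 30                      # frames per second
WIDTH, HEIGHT = 720, 480      # pixels per frame
BYTES_PER_PIXEL = 3           # 24-bit colour
SECONDS = 2 * 60 * 60         # 2-hour movie

bytes_per_second = FPS * WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = bytes_per_second * SECONDS
dvd_capacity = 8.5e9          # assumed dual-layer DVD capacity, in bytes

print(bytes_per_second)              # 31,104,000 bytes/sec
print(total_bytes)                   # about 2.24e11 bytes (224 Gbytes)
print(total_bytes / dvd_capacity)    # required compression factor, about 26.3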
CHAPTER 2
IMAGE PROCESSING
2.1.1 SCANNERS
A device that can read text or illustrations printed on paper and translate the
information into a form the computer can use. A scanner works by digitizing an image --
dividing it into a grid of boxes and representing each box with either a zero or a one,
depending on whether the box is filled in. (For color and gray scaling, the same principle
applies, but each box is then represented by up to 24 bits.) The resulting matrix of bits, called
a bit map, can then be stored in a file, displayed on a screen, and manipulated by programs.
Optical scanners do not distinguish text from illustrations; they represent all images as
bit maps. Therefore, you cannot directly edit text that has been scanned. To edit text read by
an optical scanner, you need an optical character recognition (OCR) system to translate the
image into ASCII characters. Most optical scanners sold today come with OCR packages.
Once a photograph is in digital format, you can apply a wide variety of special effects
to it with image enhancing software. You can then print the photo out on a normal printer or
send it to a developing studio which will print it out on photographic paper. Although the
resolution of digital photos is not nearly as high as photos produced from film, digital
photography is ideal when you need instant, low-resolution pictures. It's especially useful for
photos that will be displayed on the World Wide Web because Web graphics need to be low
resolution anyway so that they can be downloaded quickly.
CHAPTER 3
Sequence of image processing
Fig 3
The requirement for processing an image is that the image must be available in
digitized form, that is, as an array of finite-length binary words.
For digitization, the given image is sampled on a discrete grid and each sample or pixel is
quantized using a finite number of bits. The digitized image is then processed by a computer. To
display a digital image, it is first converted into an analog signal, which is scanned onto a display.
A band-limited image sampled uniformly on a rectangular grid with spacing dx, dy
can be recovered without error from the sample values f(m dx, n dy) provided the sampling
rate is greater than the Nyquist rate, that is
1/dx > 2*Bx and 1/dy > 2*By,
where Bx and By are the bandwidths of the image in the x and y directions.
1. Color quantization
Quantization reduces the number of colors used in an image; this is important for
displaying images on devices that support a limited number of colors and for efficiently
compressing certain kinds of images. Most bitmap editors and many operating systems have
built-in support for color quantization. Popular modern color quantization algorithms include
the nearest color algorithm (for fixed palettes), the median cut algorithm, and an algorithm
based on octrees.It is common to combine color quantization with dithering to create an
impression of a larger number of colors and eliminate banding artifacts.
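As a concrete illustration of the nearest colour algorithm with a fixed palette, the following minimal NumPy sketch maps every pixel to its closest palette entry; the eight-colour palette used here is arbitrary and chosen only for demonstration.

import numpy as np

def quantize_to_palette(image, palette):
    """Map every pixel of an RGB image to the nearest palette colour.
    image   : (H, W, 3) uint8 array
    palette : (K, 3) uint8 array of allowed colours
    """
    pixels = image.reshape(-1, 3).astype(np.int32)
    pal = palette.astype(np.int32)
    # Squared Euclidean distance from every pixel to every palette entry.
    dists = ((pixels[:, None, :] - pal[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)                 # index of the closest colour
    return palette[nearest].reshape(image.shape)

# Example: quantize a random image to the 8 corner colours of the RGB cube.
palette = np.array([[r, g, b] for r in (0, 255) for g in (0, 255) for b in (0, 255)],
                   dtype=np.uint8)
image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
quantized = quantize_to_palette(image, palette)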
The human eye is fairly good at seeing small differences in brightness over a
relatively large area, but not so good at distinguishing the exact strength of a high frequency
brightness variation. This fact allows one to get away with a greatly reduced amount of
information in the high frequency components. This is done by simply dividing each
component in the frequency domain by a constant for that component, and then rounding to
the nearest integer. This is the main lossy operation in the whole process. As a result of this,
it is typically the case that many of the higher frequency components are rounded to zero, and
many of the rest become small positive or negative numbers, which take many fewer bits to store.
i. LLOYD-MAX QUANTIZER
CHAPTER 4
PROCESSING TECHNIQUES
Image restoration is a process to restore an original image f from its observed but
degraded version Z. Since edges are important structures of the true image, they should be
preserved during image restoration. In the literature, a commonly used model for describing
the relationship between f and Z is
Z(x, y) = (h ⊗ f)(x, y) + ε(x, y), (x, y) ∈ Ω,   (1)
where h is a 2-D point spread function (psf) describing the spatial blur, ε(x, y) is pointwise
noise at (x, y), Ω is the design space, and h ⊗ f denotes the convolution between h and f. In
model (1), the spatial blur is assumed to be linear and location invariant, and the pointwise
noise is additive, which may not be true in certain applications.
The blurring can usually be modeled as an LSI system with a given PSF h[m,n] .
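A minimal sketch of this degradation model, assuming SciPy is available: the observed image Z is simulated as the 2-D convolution of the true image f with the PSF h, plus additive noise.

import numpy as np
from scipy.signal import convolve2d

def blur(image, psf, noise_sigma=0.0):
    """Simulate Z = h (convolved with) f + noise for a linear, shift-invariant blur."""
    blurred = convolve2d(image, psf, mode="same", boundary="symm")
    if noise_sigma > 0:
        blurred = blurred + np.random.normal(0.0, noise_sigma, image.shape)
    return blurred

# Example: a 5x5 uniform (box) PSF, normalised so it preserves overall brightness.
psf = np.ones((5, 5)) / 25.0
f = np.random.rand(128, 128)          # stand-in for the true image
Z = blur(f, psf, noise_sigma=0.01)    # degraded observation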
4.3 Examples
Figure 4.2
Once we apply the PSF to the original image, we receive our blurred image that is shown in
Figure 4.3:
Figure 4.3
Example 4.2 looks at the original image in its typical spatial form; however, it is often
useful to look at our images and PSF in the frequency domain. In Figure 4.4, we take another
look at the image blurring example above and see how the images and results would
appear in the frequency domain if we applied the Fourier transform.
Figure 4.4
CHAPTER 5
Image compression
5.1 DEFINITION
Image compression is concerned with minimizing the number of bits required to represent an
image; it is the application of data compression to digital images. In effect, the objective is to
reduce the redundancy of the image data in order to be able to store or transmit the data in an
efficient form. The best image quality at a given bit-rate (or compression rate) is the main goal of
image compression.
An algorithm for image compression and reconstruction is shown in the block diagram
given below in Fig 5.1. The first step removes information redundancy caused by the high
correlation of image data. The second step is coding of the transformed data using a code of fixed
or variable length. An advantage of variable-length codes is the possibility of coding more
frequent data using shorter code words, thereby increasing compression efficiency, while an
advantage of fixed-length coding is a standard codeword length that offers easy handling and
fast processing. Compressed data are decoded after transmission or archiving and
reconstructed. No non-redundant data may be lost in the data compression process; otherwise
error-free reconstruction is impossible.
Fig 5.1: Block diagram of image compression and reconstruction (transmission / archiving)
Image data compression methods consist of two parts. Image data properties must be
determined first, such as image entropy, various correlation functions, etc. The second part yields
an appropriate compression technique design with respect to the image properties.
He = − Σ_{k=0}^{G−1} p(k) log2[p(k)]   (1)
r = b − He   (2)
where G = 2^b is the number of gray levels and b is the smallest number of bits with which the image quantization levels can be represented.
This definition of image information redundancy can be evaluated only if a good
estimate of entropy is available, which is usually not the case because the necessary statistical
properties of the image are not known. Image entropy can, however, be estimated from a gray-
level histogram. Let h(k) be the frequency of gray level k in an image f, 0 ≤ k ≤ 2^b − 1, and let
the image size be M x N. The probability of occurrence of gray level k can then be estimated as
P(k) = h(k) / (MN)   (3)
He = − Σ_{k=0}^{2^b − 1} P(k) log2[P(k)]   (4)
The information redundancy estimate is r = b − He. The definition of the compression ratio is then
K = b / He   (5)
Theoretical limits of possible image compression can be found using these formulas. For
example, the entropy of satellite remote sensing data may be He ∈ [4, 5], where the image data are
quantized into 256 gray levels, or 8 bits per pixel. We can easily compute the information
redundancy as r ∈ [3, 4] bits. This implies that these data can be represented by an average
data volume of 4-5 bits per pixel with no loss of information, and the compression ratio would
be K ∈ [1.6, 2].
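Equations (1)-(5) translate directly into code. The following NumPy sketch estimates He, the redundancy r and the compression ratio K from a gray-level histogram, assuming an 8-bit image.

import numpy as np

def entropy_and_redundancy(image, b=8):
    """Estimate He (bits/pixel), redundancy r = b - He and ratio K = b / He."""
    hist = np.bincount(image.ravel(), minlength=2 ** b)   # h(k)
    p = hist / hist.sum()                                  # P(k) = h(k) / (M*N)
    nonzero = p > 0                                        # avoid log2(0)
    He = -np.sum(p[nonzero] * np.log2(p[nonzero]))
    return He, b - He, b / He

image = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
He, r, K = entropy_and_redundancy(image)
print(He, r, K)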
5.3.1 Scalability and region of interest coding
In region of interest coding, certain parts of the image are encoded with higher quality than
others. This can be combined with scalability (these parts are encoded first, others later).
Compressed data can contain information about the image which can be used to
categorize, search or browse images. Such information can include color and texture
statistics, small preview images and author/copyright information.
The quality of a compression method is often measured by the peak signal-to-noise ratio
(PSNR), which measures the amount of noise introduced through a lossy compression of the
image. However, the subjective judgement of the viewer is also regarded as an important,
perhaps the most important, measure.
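For reference, a minimal sketch of the PSNR calculation between an original image and its lossily compressed version, assuming 8-bit data.

import numpy as np

def psnr(original, compressed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher values mean less distortion."""
    mse = np.mean((original.astype(np.float64) - compressed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)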
Methods for lossless image compression include:
Run-length encoding – used as the default method in PCX and as one of the possible methods in
BMP, TGA and TIFF
DPCM and Predictive Coding
Entropy encoding
Adaptive dictionary algorithms such as LZW – used in GIF and TIFF
Deflation – used in PNG, MNG and TIFF
Chain codes
Most lossless compression programs do two things in sequence: the first step generates a
statistical model for the input data, and the second step uses this model to map input data to
bit sequences in such a way that "probable" (e.g. frequently encountered) data will produce
shorter output than "improbable" data.
The primary encoding algorithms used to produce bit sequences are Huffman coding (also
used by DEFLATE) and arithmetic coding. Arithmetic coding achieves compression rates
close to the best possible for a particular statistical model, which is given by the information
entropy, whereas Huffman compression is simpler and faster but produces poor results for
models that deal with symbol probabilities close to 1.
There are two primary ways of
constructing statistical models: in a static model, the data are analyzed and a model is
constructed, then this model is stored with the compressed data. This approach is simple and
modular, but has the disadvantage that the model itself can be expensive to store, and also
that it forces a single model to be used for all data being compressed, and so performs poorly
on files containing heterogeneous data. Adaptive models dynamically update the model as the
data are compressed. Both the encoder and decoder begin with a trivial model, yielding poor
compression of initial data, but as they learn more about the data performance improves.
Most popular types of compression used in practice now use adaptive coders. Lossless
compression methods may be categorized according to the type of data they are designed to
compress. While, in principle, any general-purpose lossless compression algorithm (general-
purpose meaning that they can compress any bit string) can be used on any type of data,
many are unable to achieve significant compression on data that are not of the form for which
they were designed to compress. Many of the lossless compression techniques used for text
also work reasonably well for indexed images.
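The Huffman construction mentioned above can be sketched in a few lines of Python using only the standard library. This is an illustrative static-model coder built from symbol frequencies, not the DEFLATE implementation itself.

import heapq
from collections import Counter

def huffman_code(data):
    """Return a {symbol: bitstring} prefix-code table built from symbol frequencies."""
    heap = [[count, i, {sym: ""}]
            for i, (sym, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                        # degenerate case: a single symbol
        return {next(iter(heap[0][2])): "0"}
    while len(heap) > 1:
        lo = heapq.heappop(heap)              # two least frequent subtrees
        hi = heapq.heappop(heap)
        lo[2] = {s: "0" + c for s, c in lo[2].items()}   # extend codes with 0/1
        hi[2] = {s: "1" + c for s, c in hi[2].items()}
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]

codes = huffman_code(b"abracadabra")
encoded = "".join(codes[s] for s in b"abracadabra")   # frequent symbols get short codes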
Text
Statistical modeling algorithms for text (or text-like binary data such as executables) include:
Multimedia
Techniques that take advantage of the specific characteristics of images such as the
common phenomenon of contiguous 2-D areas of similar tones. Every pixel but the first is
replaced by the difference to its left neighbor. This leads to small values having a much
higher probability than large values. This is often also applied to sound files and can
compress files which contain mostly low frequencies and low volumes. For images this step
can be repeated by taking the difference to the top pixel, and then in videos the difference to
the pixel in the next frame can be taken.A hierarchical version of this technique takes
neighboring pairs of data points, stores their difference and sum, and on a higher level with
lower resolution continues with the sums. This is called discrete wavelet transform.
JPEG2000 additionally uses data points from other pairs and multiplication factors to mix
then into the difference. These factors have to be integers so that the result is an integer under
all circumstances. So the values are increased, increasing file size, but hopefully the
distribution of values is more peaked. The adaptive encoding uses the probabilities from the
previous sample in sound encoding, from the left and upper pixel in image encoding, and
additionally from the previous frame in video encoding. In the wavelet transformation the
probabilities are also passed through the hierarchy.
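A minimal sketch of the left-neighbour differencing described above; the transform is lossless, and the decoder recovers every row exactly by cumulative summation.

import numpy as np

def delta_encode_rows(image):
    """Replace every pixel but the first in each row by the difference
    to its left neighbour (lossless and perfectly invertible)."""
    deltas = image.astype(np.int16)
    deltas[:, 1:] -= image.astype(np.int16)[:, :-1]
    return deltas

def delta_decode_rows(deltas):
    """Undo delta_encode_rows by cumulative summation along each row."""
    return np.cumsum(deltas, axis=1).astype(np.uint8)

image = np.random.randint(0, 256, (4, 8), dtype=np.uint8)
assert np.array_equal(delta_decode_rows(delta_encode_rows(image)), image)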
Some of the most common lossless compression algorithms are grouped by the type of data they
target: general purpose, audio, graphics, 3D graphics and video. Lossless video codecs include:
Animation codec
CorePNG
Dirac - Has a lossless mode. [2]
FFV1
JPEG 2000
Huffyuv
Lagarith
MSU Lossless Video Codec
SheerVideo
5.5.2 Limitations
Lossless data compression algorithms cannot guarantee compression for all input data sets.
In other words, for any (lossless) data compression algorithm, there will be an input data
set that does not get smaller when processed by the algorithm. This is easily proven with
elementary mathematics using a counting argument, as follows. Assume that each file is
represented as a string of bits of some arbitrary length.
Suppose that there is a compression algorithm that transforms every file into a distinct
file which is no longer than the original file, and that at least one file will be compressed
into something that is shorter than itself.
Let M be the least number such that there is a file F with length M bits that
compresses to something shorter. Let N be the length (in bits) of the compressed
version of F.
Because N < M, every file of length N keeps its size during compression. There are 2^N
such files. Together with F, this makes 2^N + 1 files which all compress into one of the
2^N files of length N.
But 2^N is smaller than 2^N + 1, so by the pigeonhole principle there must be some file
of length N which is simultaneously the output of the compression function on two
different inputs. That file cannot be decompressed reliably (which of the two originals
should that yield?), which contradicts the assumption that the algorithm was lossless.
We must therefore conclude that our original hypothesis (that the compression
function makes no file longer) is necessarily untrue.
Any lossless compression algorithm that makes some files shorter must necessarily make
some files longer, but it is not necessary that those files become very much longer. Most
practical compression algorithms provide an "escape" facility that can turn off the normal
coding for files that would become longer by being encoded. Then the only increase in size is
a few bits to tell the decoder that the normal coding has been turned off for the entire input.
For example, DEFLATE compressed files never need to grow by more than 5 bytes per
65,535 bytes of input.
In fact, if we consider files of length N, if all files were equally probable, then for any
lossless compression that reduces the size of some file, the expected length of a compressed
file (averaged over all possible files of length N) must necessarily be greater than N. So if we
know nothing about the properties of the data we are compressing, we might as well not
compress it at all. A lossless compression algorithm is useful only when we are more likely to
compress certain types of files than others; then the algorithm could be designed to compress
those types of data better.
Thus, the main lesson from the argument is not that one risks big losses, but merely that
one cannot always win. To choose an algorithm always means implicitly to select a subset of
all files that will become usefully shorter. This is the theoretical reason why we need to have
different compression algorithms for different kinds of files: there cannot be any algorithm
that is good for all kinds of data.
The "trick" that allows lossless compression algorithms, used on the type of data they
were designed for, to consistently compress such files to a shorter form is that the files the
algorithms are designed to act on all have some form of easily-modeled redundancy that the
algorithm is designed to remove, and thus belong to the subset of files that that algorithm can
make shorter, whereas other files would not get compressed or even get bigger. Algorithms
are generally quite specifically tuned to a particular type of file: for example, lossless audio
compression programs do not work well on text files, and vice versa.
On the other hand, it has also been proven that there is no algorithm to determine whether
a file is incompressible in the sense of Kolmogorov complexity; hence, given any particular
file, even if it appears random, it's possible that it may be significantly compressed, even
including the size of the decompressor. An example is the digits of the mathematical constant
pi, which appear random but can be generated by a very small program. However, even
though it cannot be determined whether a particular file is incompressible, a simple theorem
about incompressible strings shows that over 99% of files of any given length cannot be
compressed by more than one byte (including the size of the decompressor).
Other lossless coding schemes in common use include:
CCITT group 3 1D
CCITT group 3 2D
CCITT group 4 2D
Lempel-Ziv and Welch (LZW) algorithm
Methods for lossy compression include:
Reducing the color space to the most common colors in the image. The selected
colors are specified in the color palette in the header of the compressed image. Each
pixel just references the index of a color in the color palette. This method can be
combined with dithering to avoid posterization.
Chroma subsampling. This takes advantage of the fact that the eye perceives spatial
changes of brightness more sharply than those of color, by averaging or dropping
some of the chrominance information in the image.
Transform coding. This is the most commonly used method. A Fourier-related transform
such as the DCT or the wavelet transform is applied, followed by quantization and entropy
coding.
5.7 APPLICATIONS
For visual and audio data, some loss of quality can be tolerated without losing the
essential nature of the data. By taking advantage of the limitations of the human sensory
system, a great deal of space can be saved while producing an output which is nearly
indistinguishable from the original. These lossy data compression methods typically offer a
three-way tradeoff between compression speed, compressed data size and quality loss.
Lossless compression schemes are reversible so that the original data can be
reconstructed, while lossy schemes accept some loss of data in order to achieve higher
compression. However, lossless data compression algorithms will always fail to compress
some files; indeed, any compression algorithm will necessarily fail to compress any data
containing no discernible patterns. Attempts to compress data that has been compressed
already will therefore usually result in an expansion, as will attempts to compress all but the
most trivially encrypted data. In practice, lossy data compression will also come to a point
where compressing again does not work, although an extremely lossy algorithm, like for
example always removing the last byte of a file, will always compress a file up to the point
where it is empty.
For example, consider the string
25.888888888
which can be compressed losslessly as
25.[9]8
Interpreted as "twenty five point 9 eights", the original string is perfectly recreated, just
written in a smaller form. In a lossy system, using
26
instead, the exact original data is lost, at the benefit of a smaller file size.
Fig: Original image and the same image processed by a Canny edge detector
CHAPTER 6
JPEG Compression
One of the hottest topics in image compression technology today is JPEG. The
acronym JPEG stands for the Joint Photographic Experts Group, a standards committee that
had its origins within the International Organization for Standardization (ISO). In 1982, the ISO formed
the Photographic Experts Group (PEG) to research methods of transmitting video, still
images, and text over ISDN (Integrated Services Digital Network) lines. PEG's goal was to
produce a set of industry standards for the transmission of graphics and image data over
digital communications networks.
In 1987, the ISO and CCITT combined their two groups into a joint committee that
would research and produce a single standard of image data compression for both
organizations to use. This new committee was JPEG. Although the creators of JPEG might
have envisioned a multitude of commercial applications for JPEG technology, a consumer
public made hungry by the marketing promises of imaging and multimedia technology is
benefiting greatly as well. Most previously developed compression methods do a relatively
poor job of compressing continuous-tone image data; that is, images containing hundreds or
thousands of colors taken from real-world subjects. And very few file formats can support
24-bit raster images.
GIF, for example, can store only images with a maximum pixel depth of eight bits, for
a maximum of 256 colors. And its LZW compression algorithm does not work very well on
typical scanned image data. The low-level noise commonly found in such data defeats LZW's
ability to recognize repeated patterns. Both TIFF and BMP are capable of storing 24-bit data,
but in their pre-JPEG versions are capable of using only encoding schemes (LZW and RLE,
respectively) that do not compress this type of image data very well. JPEG provides a
compression method that is capable of compressing continuous-tone image data with a pixel
depth of 6 to 24 bits with reasonable speed and efficiency. And although JPEG itself does not
define a standard image file format, several have been invented or modified to fill the needs
of JPEG data storage.
Unlike all of the other compression methods described so far in this chapter, JPEG is
not a single algorithm. Instead, it may be thought of as a toolkit of image compression
methods that may be altered to fit the needs of the user. JPEG may be adjusted to produce
very small, compressed images that are of relatively poor quality in appearance but still
suitable for many applications. Conversely, JPEG is capable of producing very high-quality
compressed images that are still far smaller than the original uncompressed data. JPEG is also
different in that it is primarily a lossy method of compression. Most popular image format
compression schemes, such as RLE, LZW, or the CCITT standards, are lossless compression
methods. That is, they do not discard any data during the encoding process. An image
compressed using a lossless method is guaranteed to be identical to the original image when
uncompressed.
Lossy schemes, on the other hand, throw useless data away during encoding. This is, in
fact, how lossy schemes manage to obtain superior compression ratios over most lossless
schemes. JPEG was designed specifically to discard information that the human eye cannot
easily see. Slight changes in color are not perceived well by the human eye, while slight
changes in intensity (light and dark) are. Therefore JPEG's lossy encoding tends to be more
frugal with the gray-scale part of an image and to be more frivolous with the color.
The fact that JPEG is lossy and works only on a select type of image data might make
you ask, "Why bother to use it?" It depends upon your needs. JPEG is an excellent way to
store 24-bit photographic images, such as those used in imaging and multimedia applications.
JPEG 24-bit (16 million color) images are superior in appearance to 8-bit (256 color) images
on a VGA display and are at their most spectacular when using 24-bit display hardware
(which is now quite inexpensive).
The amount of compression achieved depends upon the content of the image data. A
typical photographic-quality image may be compressed from 20:1 to 25:1 without
experiencing any noticeable degradation in quality. Higher compression ratios will result in
image files that differ noticeably from the original image but still have an overall good image
quality. And achieving a 20:1 or better compression ratio in many cases not only saves disk
space, but also reduces transmission time across data networks and phone lines. An end user
can "tune" the quality of a JPEG encoder using a parameter sometimes called a quality
setting or a Q factor. Although different implementations have varying scales of Q factors, a
range of 1 to 100 is typical. A factor of 1 produces the smallest, worst quality images; a factor
of 100 produces the largest, best quality images. The optimal Q factor depends on the image
content and is therefore different for every image. The art of JPEG compression is finding the
lowest Q factor that produces an image that is visibly acceptable, and preferably as close to
the original as possible.
6.2 Steps to be followed to find the optimal compression for an
image using the JPEG library
As we have said, JPEG doesn't fit every compression need. Images containing large
areas of a single color do not compress very well. In fact, JPEG will introduce
"artifacts" into such images that are visible against a flat background, making them
noticeably worse in appearance than if a conventional lossless compression method had been used.
The JPEG specification defines a minimal subset of the standard called baseline
JPEG, which all JPEG-aware applications are required to support. This baseline uses an
encoding scheme based on the Discrete Cosine Transform (DCT) to achieve compression.
DCT is a generic name for a class of operations identified and published some years ago.
DCT-based algorithms have since made their way into various compression methods. DCT-
based encoding algorithms are always lossy by nature. DCT algorithms are capable of
achieving a high degree of compression with only minimal loss of data. This scheme is
effective only for compressing continuous-tone images in which the differences between
adjacent pixels are usually small. In practice, JPEG works well only on images with depths of
at least four or five bits per color channel. The baseline standard actually specifies eight bits
per input sample. Data of lesser bit depth can be handled by scaling it up to eight bits per
sample, but the results will be bad for low-bit-depth source data, because of the large jumps
between adjacent pixel values. For similar reasons, colormapped source data does not work
very well, especially if the image has been dithered.
The JPEG algorithm is capable of encoding images that use any type of color space.
JPEG itself encodes each component in a color model separately, and it is completely
independent of any color-space model, such as RGB, HSI, or CMY. The best compression
ratios result if a luminance/chrominance color space, such as YUV or YCbCr, is used. (See
Chapter 2 for a description of these color spaces.) Most of the visual information to which
human eyes are most sensitive is found in the high-frequency, gray-scale, luminance
component (Y) of the YCbCr color space. The other two chrominance components (Cb and
Cr) contain high-frequency color information to which the human eye is less sensitive. Most
of this information can therefore be discarded.
In comparison, the RGB, HSI, and CMY color models spread their useful visual image
information evenly across each of their three color components, making the selective
discarding of information very difficult. All three color components would need to be
encoded at the highest quality, resulting in a poorer compression ratio. Gray-scale images do
not have a color space as such and therefore do not require transforming.
The simplest way of exploiting the eye's lesser sensitivity to chrominance information is
simply to use fewer pixels for the chrominance channels. For example, in an image nominally
1000x1000 pixels, we might use a full 1000x1000 luminance pixels but only 500x500 pixels
for each chrominance component. In this representation, each chrominance pixel covers the
same area as a 2x2 block of luminance pixels. We store a total of six pixel values for each
2x2 block (four luminance values, one each for the two chrominance channels), rather than
the twelve values needed if each component is represented at full resolution. Remarkably,
this 50 percent reduction in data volume has almost no effect on the perceived quality of most
images. Equivalent savings are not possible with conventional color models such as RGB,
because in RGB each color channel carries some luminance information and so any loss of
resolution is quite visible.
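A minimal sketch of 2h2v (4:2:0) subsampling on a YCbCr image, assuming the image height and width are even: the luminance channel is kept at full resolution while each chrominance channel is averaged over 2x2 blocks.

import numpy as np

def subsample_420(ycbcr):
    """ycbcr: (H, W, 3) float array holding the Y, Cb and Cr channels (H, W even)."""
    Y = ycbcr[:, :, 0]                               # full-resolution luminance
    def average_2x2(channel):
        h, w = channel.shape
        blocks = channel.reshape(h // 2, 2, w // 2, 2)
        return blocks.mean(axis=(1, 3))              # one value per 2x2 block
    Cb = average_2x2(ycbcr[:, :, 1])
    Cr = average_2x2(ycbcr[:, :, 2])
    return Y, Cb, Cr                                  # 4 + 1 + 1 values per 2x2 block

ycbcr = np.random.rand(480, 640, 3)
Y, Cb, Cr = subsample_420(ycbcr)
print(Y.shape, Cb.shape, Cr.shape)                    # (480, 640) (240, 320) (240, 320)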
When the uncompressed data is supplied in a conventional format (equal resolution for all
channels), a JPEG compressor must reduce the resolution of the chrominance channels by
downsampling, or averaging together groups of pixels. The JPEG standard allows several
different choices for the sampling ratios, or relative sizes, of the downsampled channels. The
luminance channel is always left at full resolution (1:1 sampling). Typically both
chrominance channels are downsampled 2:1 horizontally and either 1:1 or 2:1 vertically,
meaning that a chrominance pixel covers the same area as either a 2x1 or a 2x2 block of
luminance pixels. JPEG refers to these downsampling processes as 2h1v and 2h2v sampling,
respectively.
Another notation commonly used is 4:2:2 sampling for 2h1v and 4:2:0 sampling for 2h2v;
this notation derives from television customs (color transformation and downsampling have
been in use since the beginning of color TV transmission). 2h1v sampling is fairly common
because it corresponds to National Television Standards Committee (NTSC) standard TV
practice, but it offers less compression than 2h2v sampling, with hardly any gain in perceived
quality.
The image data is divided up into 8x8 blocks of pixels. (From this point on, each color
component is processed independently, so a "pixel" means a single value, even in a color
image.) A DCT is applied to each 8x8 block. DCT converts the spatial image representation
into a frequency map: the low-order or "DC" term represents the average value in the block,
while successive higher-order ("AC") terms represent the strength of more and more rapid
changes across the width or height of the block. The highest AC term represents the strength
of a cosine wave alternating from maximum to minimum at adjacent pixels.
The DCT calculation is fairly complex; in fact, this is the most costly step in JPEG
compression. The point of doing it is that we have now separated out the high- and low-
frequency information present in the image. We can discard high-frequency data easily
without losing low-frequency information. The DCT step itself is lossless except for round
off errors.
The human eye is good at seeing small differences in brightness over a relatively large
area, but not so good at distinguishing the exact strength of a high frequency brightness
variation. This allows one to greatly reduce the amount of information in the high frequency
components. This is done by simply dividing each component in the frequency domain by a
constant for that component, and then rounding to the nearest integer. This is the main lossy
operation in the whole process. As a result of this, it is typically the case that many of the
higher frequency components are rounded to zero, and many of the rest become small
positive or negative numbers, which take many fewer bits to store.
What we have examined thus far is only the baseline specification for JPEG. A
number of extensions have been defined in Part 1 of the JPEG specification that provide
progressive image buildup, improved compression ratios using arithmetic encoding, and a
lossless compression scheme. These features are beyond the needs of most JPEG
implementations and have therefore been defined as "not required to be supported"
extensions to the JPEG standard.
Progressive image buildup is an extension for use in applications that need to receive
JPEG data streams and display them on the fly. A baseline JPEG image can be displayed only
after all of the image data has been received and decoded. But some applications require that
the image be displayed after only some of the data is received. Using a conventional
compression method, this means displaying the first few scan lines of the image as it is
decoded. In this case, even if the scan lines were interlaced, you would need at least 50
percent of the image data to get a good clue as to the content of the image. The progressive
buildup extension of JPEG offers a better solution.
Progressive buildup allows an image to be sent in layers rather than scan lines. But
instead of transmitting each bitplane or color channel in sequence (which wouldn't be very
useful), a succession of images built up from approximations of the original image are sent.
The first scan provides a low-accuracy representation of the entire image--in effect, a very
low-quality JPEG compressed image. Subsequent scans gradually refine the image by
increasing the effective quality factor. If the data is displayed on the fly, you would first see a
crude, but recognizable, rendering of the whole image. This would appear very quickly
because only a small amount of data would need to be transmitted to produce it. Each
subsequent scan would improve the displayed image's quality one block at a time.
A limitation of progressive JPEG is that each scan takes essentially a full JPEG
decompression cycle to display. Therefore, with typical data transmission rates, a very fast
JPEG decoder (probably specialized hardware) would be needed to make effective use of
progressive transmission. A related JPEG extension provides for hierarchical storage of the
same image at multiple resolutions. For example, an image might be stored at 250x250,
500x500, 1000x1000, and 2000x2000 pixels, so that the same image file could support
display on low-resolution screens, medium-resolution laser printers, and high-resolution
imagesetters. The higher-resolution images are stored as differences from the lower-
resolution ones, so they need less space than they would need if they were stored
independently. This is not the same as a progressive series, because each image is available in
its own right at the full desired quality.
The baseline JPEG standard defines Huffman compression as the final step in the
encoding process. A JPEG extension replaces the Huffman engine with a binary arithmetic
entropy encoder. The use of an arithmetic coder reduces the resulting size of the JPEG data
by a further 10 percent to 15 percent over the results that would be achieved by the Huffman
coder. With no change in resulting image quality, this gain could be of importance in
implementations where enormous quantities of JPEG images are archived.
Not all JPEG decoders support arithmetic decoding. Baseline JPEG decoders are
required to support only the Huffman algorithm.
The arithmetic algorithm is slower in both encoding and decoding than Huffman.
The arithmetic coder used by JPEG (called a Q-coder) is owned by IBM and AT&T.
(Mitsubishi also holds patents on arithmetic coding.) You must obtain a license from
the appropriate vendors if their Q-coders are to be used as the back end of your JPEG
implementation.
A question that commonly arises is "At what Q factor does JPEG become lossless?"
The answer is "never." Baseline JPEG is a lossy method of compression regardless of
adjustments you may make in the parameters. In fact, DCT-based encoders are always lossy,
because roundoff errors are inevitable in the color conversion and DCT steps. You can
suppress deliberate information loss in the downsampling and quantization steps, but you still
won't get an exact recreation of the original bits. Further, this minimum-loss setting is a very
inefficient way to use lossy JPEG.
The JPEG standard does offer a separate lossless mode. This mode has nothing in
common with the regular DCT-based algorithms, and it is currently implemented only in a
few commercial applications. JPEG lossless is a form of Predictive Lossless Coding using a
2D Differential Pulse Code Modulation (DPCM) scheme. The basic premise is that the value
of a pixel is combined with the values of up to three neighboring pixels to form a predictor
value. The predictor value is then subtracted from the original pixel value. When the entire
bitmap has been processed, the resulting predictors are compressed using either the Huffman
or the binary arithmetic entropy encoding methods described in the JPEG standard. Lossless
JPEG works on images with 2 to 16 bits per pixel, but performs best on images with 6 or
more bits per pixel. For such images, the typical compression ratio achieved is 2:1. For image
data with fewer bits per pixels, other compression schemes do perform better.
The following JPEG extensions are described in Part 3 of the JPEG specification.
Variable quantization allows the scaling of quantization values within the compressed
data stream. At the start of each 8x8 block is a quantizer scale factor used to scale the
quantization table values within an image component and to match these values with the AC
coefficients stored in the compressed data. Quantization values may then be located and
changed as needed. Variable quantization allows the characteristics of an image to be
changed to control the quality of the output based on a given model. The variable quantizer
can constantly adjust during decoding to provide optimal output. The amount of output data
can also be decreased or increased by raising or lowering the quantizer scale factor. The
maximum size of the resulting JPEG file or data stream may be imposed by constant adaptive
adjustments made by the variable quantizer.
The variable quantization extension also allows JPEG to store image data originally
encoded using a variable quantization scheme, such as MPEG. For MPEG data to be
accurately transcoded into another format, the other format must support variable
quantization to maintain a high compression ratio. This extension allows JPEG to support a
data stream originally derived from a variably quantized source, such as an MPEG I-frame.
Selective refinement is used to select a region of an image for further enhancement. This
enhancement improves the resolution and detail of a region of an image. JPEG supports three
types of selective refinement: hierarchical, progressive, and component. Each of these
refinement processes differs in its application, effectiveness, complexity, and amount of
memory required.
Tiling is used to divide a single image into two or more smaller subimages. Tiling allows
easier buffering of the image data in memory, quicker random access of the image data on
disk, and the storage of images larger than 64Kx64K samples in size. JPEG supports three
types of tiling: simple, pyramidal, and composite.
Simple tiling divides an image into two or more fixed-size tiles. All simple tiles are
coded from left to right and from top to bottom and are contiguous and non-
overlapping. All tiles must have the same number of samples and component
identifiers and must be encoded using the same processes. Tiles on the bottom and
right of the image may be smaller than the designated size if the image dimensions
are not a multiple of the tile size.
Pyramidal tiling also divides the image into tiles, but each tile is also tiled using
several different levels of resolution. The model of this process is the JPEG Tiled
Image Pyramid (JTIP), which is a model of how to create a multi-resolution
pyramidal JPEG image.
SPIFF is an officially sanctioned JPEG file format that is intended to replace the
defacto JFIF (JPEG File Interchange Format) format in use today. SPIFF includes all of the
features of JFIF and adds quite a bit more functionality. SPIFF is designed so that properly
written JFIF readers will read SPIFF-JPEG files as well.
Other JPEG extensions include the addition of a version marker segment that stores
the minimum level of functionality required to decode the JPEG data stream. Multiple
version markers may be included to mark areas of the data stream that have differing minimum functional requirements.
Next, each component (Y, Cb, Cr) of each 8×8 block is converted to a frequency-
domain representation, using a normalized, two-dimensional type-II discrete cosine
transform (DCT).
Before computing the DCT of the subimage, its gray values are shifted from a positive range
to one centered around zero. For an 8-bit image each pixel has 256 possible values: [0,255].
To center around zero it is necessary to subtract by half the number of possible values, or
128.
Subtracting 128 from each pixel value yields pixel values in the range [−128, 127].
The next step is to take the two-dimensional DCT, which is given by:
G(u,v) = (1/4) α(u) α(v) Σ_{x=0}^{7} Σ_{y=0}^{7} g(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]   (1)
where α(u) = 1/√2 for u = 0 and α(u) = 1 otherwise is a normalizing function, g(x,y) is the
level-shifted pixel value at position (x,y), and G(u,v) is the DCT coefficient at frequency (u,v).
If we perform this transformation on our matrix above, and then round to the nearest
integer, we get
Note the rather large value of the top-left corner. This is the DC coefficient. The
remaining 63 coefficients are called the AC coefficients. The advantage of the DCT is its
tendency to aggregate most of the signal in one corner of the result, as may be seen above.
The quantization step to follow accentuates this effect while simultaneously reducing the
overall size of the DCT coefficients, resulting in a signal that is easy to compress efficiently
in the entropy stage.
The DCT temporarily increases the bit-depth of the image, since the DCT coefficients of
an 8-bit/component image take up to 11 or more bits (depending on fidelity of the DCT
calculation) to store. This may force the codec to temporarily use 16-bit bins to hold these
coefficients, doubling the size of the image representation at this point; they are typically
reduced back to 8-bit values by the quantization step. The temporary increase in size at this
stage is not a performance concern for most JPEG implementations, because typically only a
very small part of the image is stored in full DCT form at any given time during the image
encoding or decoding process.
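A brief sketch of the transform in equation (1) for a single 8x8 block, assuming SciPy's DCT routines: the block is level-shifted by 128 and a normalized 2-D type-II DCT is applied (with norm="ortho" this is equivalent to equation (1)); element [0, 0] of the result is the DC coefficient and the remaining 63 entries are the AC coefficients.

import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """Normalised 2-D type-II DCT of an 8x8 block (rows, then columns)."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

block = np.random.randint(0, 256, (8, 8)).astype(np.float64)
shifted = block - 128.0              # centre the pixel values around zero
coeffs = dct2(shifted)
dc = coeffs[0, 0]                    # DC term: 8 times the mean of the shifted block
ac = coeffs.copy()
ac[0, 0] = 0.0                       # everything except [0, 0] is an AC term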
Quantization
B(j,k) = round( G(j,k) / Q(j,k) )   for j, k = 0, 1, ..., 7   (2)
where G is the matrix of unquantized DCT coefficients, Q is the quantization matrix above, and B is
the matrix of quantized DCT coefficients.
Using this quantization matrix with the DCT coefficient matrix from above results in:
For example, using −415 (the DC coefficient) and rounding to the nearest integer:
B(0,0) = round( −415 / Q(0,0) ) = −26   (3)
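A minimal sketch of the quantization step of equation (2); the quantization matrix used here simply grows with frequency and is purely illustrative, not the luminance table from the JPEG standard.

import numpy as np

# Illustrative quantization matrix: larger divisors for higher frequencies.
j, k = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
Q = 16 + 8 * (j + k)                 # Q[0, 0] = 16, growing towards Q[7, 7] = 128

def quantize(G, Q):
    """B(j, k) = round(G(j, k) / Q(j, k)) -- the main lossy step of JPEG."""
    return np.round(G / Q).astype(np.int32)

def dequantize(B, Q):
    """Approximate reconstruction of the DCT coefficients."""
    return (B * Q).astype(np.float64)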
Entropy coding
The zigzag sequence for the above quantized coefficients are shown below. (The
format shown is just for ease of understanding/viewing.)
−26
−3 0
−3 −2 −6
2 −4 1 −4
1 1 5 1 2
−1 1 −1 2 0 0
0 0 0 −1 −1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0
0 0 0
0 0
If the i-th block is represented by Bi and positions within each block are represented
by (p,q) where p = 0, 1, ..., 7 and q = 0, 1, ..., 7, then any coefficient in the DCT image can be
represented as Bi(p,q). Thus, in the above scheme, the order of encoding pixels (for the i-th
block) is Bi(0,0), Bi(0,1), Bi(1,0), Bi(2,0), Bi(1,1), Bi(0,2), Bi(0,3), Bi(1,2) and so on.
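The zigzag ordering can be generated programmatically by walking the anti-diagonals p + q = constant and alternating direction; the short sketch below reproduces the sequence Bi(0,0), Bi(0,1), Bi(1,0), Bi(2,0), Bi(1,1), Bi(0,2), ... given above.

def zigzag_order(n=8):
    """Return the (p, q) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                  # anti-diagonal index, p + q = s
        p_range = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            p_range = reversed(p_range)         # even diagonals run bottom-left to top-right
        order.extend((p, s - p) for p in p_range)
    return order

print(zigzag_order()[:9])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1)]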
DECODING
Decoding to display the image consists of doing all the above in reverse.
Taking the DCT coefficient matrix (after adding the difference of the DC coefficient back in)
and taking the entry-for-entry product with the quantization matrix from above results in
which closely resembles the original DCT coefficient matrix for the top-left portion. Taking
the inverse DCT (type-III DCT) results in an image with values (still shifted down by 128)
FIG 6.5
FIG 6.6
Notice the slight differences between the original (top) and decompressed image (bottom),
which is most readily seen in the bottom-left corner.
This is the uncompressed subimage and can be compared to the original subimage (also see
images to the right) by taking the difference (original − uncompressed) results in error values
(4)
The error is most noticeable in the bottom-left corner where the bottom-left pixel becomes
darker than the pixel to its immediate right.
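A minimal sketch of the decoding steps described above for one block, again assuming SciPy: multiply the quantized coefficients by the quantization matrix, apply the inverse (type-III) DCT, and undo the 128 level shift.

import numpy as np
from scipy.fftpack import idct

def decode_block(B, Q):
    """Reconstruct an 8x8 pixel block from quantized coefficients B and matrix Q."""
    G = B * Q                                            # dequantize
    spatial = idct(idct(G, axis=0, norm="ortho"), axis=1, norm="ortho")
    pixels = np.round(spatial) + 128                     # undo the level shift
    return np.clip(pixels, 0, 255).astype(np.uint8)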
The JPEG encoding does not fix the precision needed for the output compressed image.
On the contrary, the JPEG standard (as well as the derived MPEG standards) have very strict
precision requirements for the decoding, including all parts of the decoding process (variable
length decoding, inverse DCT, dequantization, renormalization of outputs); the output from
the reference algorithm must not exceed tight error tolerances.
These assertions are tested on a large set of randomized input images, to handle the worst
cases. This has a consequence on the implementation of decoders, and it is extremely
critical because some encoding processes (notably used for encoding sequences of images
like MPEG) need to be able to construct, on the encoder side, a reference decoded image. In
order to support 8-bit precision per pixel component output, dequantization and inverse DCT
transforms are typically implemented with at least 14-bit precision in optimized decoders.
JPEG compression artifacts blend well into photographs with detailed non-uniform
textures, allowing higher compression ratios. Notice how a higher compression ratio first
affects the high-frequency textures in the upper-left corner of the image, and how the
contrasting lines become more fuzzy. The very high compression ratio severely affects the
quality of the image, although the overall colors and image form are still recognizable.
However, the precision of colors suffers less (for a human eye) than the precision of contours
(based on luminance). This justifies the fact that images should first be transformed into a color
model separating the luminance from the chromatic information, before subsampling the
chromatic planes (which may also use lower quality quantization) in order to preserve the
precision of the luminance plane with more information bits.
For information, the uncompressed 24-bit RGB bitmap image below (73,242 pixels)
would require 219,726 bytes (excluding all other information headers). The filesizes
indicated below include the internal JPEG information headers and some meta-data. For full
quality images (Q=100), about 8.25 bits per color pixel is required. On grayscale images, a
minimum of 6.5 bits per pixel is enough (a comparable Q=100 quality color information
requires about 25% more encoded bits). The full quality image below (Q=100) is encoded at
9 bits per color pixel, the medium quality image (Q=25) uses 1 bit per color pixel. For most
applications, the quality factor should not go below 0.75 bit per pixel (Q=12.5), as
demonstrated by the low quality image. The image at lowest quality uses only 0.13 bit per
pixel, and displays very poor color; it could only be usable after subsampling to a much lower
display size.
Image quality              Size (bytes)   Compression ratio   Comment
Full quality (Q = 100)     83,261         2.6:1               Extremely minor artifacts
Average quality (Q = 50)   15,138         15:1                Initial signs of subimage artifacts
Medium quality (Q = 25)    9,553          23:1                Stronger artifacts; loss of high-resolution information
Low quality (Q = 10)       4,787          46:1                Severe high-frequency loss; artifacts on subimage boundaries ("macroblocking") are obvious
Lowest quality (Q = 1)     1,523          144:1               Extreme loss of color and detail; the leaves are nearly unrecognizable
The medium quality photo uses only 4.3% of the storage space but has little
noticeable loss of detail or visible artifacts. However, once a certain threshold of compression
is passed, compressed images show increasingly visible defects. See the article on rate
distortion theory for a mathematical explanation of this threshold effect.
From 2004 to 2008, new research has emerged on ways to further compress the data
contained in JPEG images without modifying the represented image. This has applications in
scenarios where the original image is only available in JPEG format, and its size needs to be
reduced for archival or transmission. Standard general-purpose compression tools cannot
significantly compress JPEG files. Typically, such schemes take advantage of improvements
to the naive scheme for coding DCT coefficients, which fails to take into account:
Some standard but rarely-used options already exist in JPEG to improve the efficiency of
coding DCT coefficients: the arithmetic coding option, and the progressive coding option
(which produces lower bitrates because values for each coefficient are coded independently,
and each coefficient has a significantly different distribution). Modern methods have
improved on these techniques by reordering coefficients to group coefficients of larger
magnitude together; using adjacent coefficients and blocks to predict new coefficient values;
dividing blocks or coefficients up among a small number of independently coded models
based on their statistics and adjacent values; and most recently, by decoding blocks,
predicting subsequent blocks in the spatial domain, and then encoding these to generate
predictions for DCT coefficients.
Typically, such methods can compress existing JPEG files between 15 and 25 percent,
and for JPEGs compressed at low-quality settings, can produce improvements of up to 65%.
CHAPTER 7
VIDEO COMPRESSION
7.2 MPEG
Moving Picture Experts Group (MPEG) was formed by the ISO to set standards for
audio and video compression and transmission.[1] It was established in 1988 and its first
meeting was in May 1988 in Ottawa, Canada.[2][3][4] As of late 2005, MPEG has grown to
include approximately 350 members per meeting from various industries, universities, and
research institutions.
FIG 8. 1
MPEG also standardizes the protocol and syntax under which it is possible to
combine or multiplex audio data with video data to produce a digital equivalent of a
television program. Many such programs can be multiplexed and MPEG defines the way in
which such multiplexes can be created and transported. The definitions include the metadata
used by decoders to demultiplex correctly. MPEG uses an asymmetric compression method.
Compression under MPEG is far more complicated than decompression, making MPEG a
good choice for applications that need to write data only once, but need to read it many times.
An example of such an application is an archiving system. Systems that require audio and
video data to be written many times, such as an editing system, are not good choices for
MPEG; they will run more slowly when using the MPEG compression scheme.
MPEG uses two types of compression methods to encode video data: interframe and
intraframe encoding. Interframe encoding is based upon both predictive coding and
interpolative coding techniques, as described below.
When capturing frames at a rapid rate (typically 30 frames/second for real time video)
there will be a lot of identical data contained in any two or more adjacent frames. If a motion
compression method is aware of this "temporal redundancy," as many audio and video
compression methods are, then it need not encode the entire frame of data, as is done via
intraframe encoding. Instead, only the differences (deltas) in information between the frames
is encoded. This results in greater compression ratios, with far less data needing to be
encoded. This type of interframe encoding is called predictive encoding.
To support both interframe and intraframe encoding, an MPEG data stream contains three
types of coded frames:
An I-frame contains a single frame of video data that does not rely on the information in
any other frame to be encoded or decoded. Each MPEG data stream starts with an I-frame.
A P-frame is constructed by predicting the difference between the current frame and closest
preceding I- or P-frame. A B-frame is constructed from the two closest I- or P-frames. The B-
frame must be positioned between these I- or P-frames.
A typical sequence of frames, in display order, looks like this:
IBBPBBPBBPBBIBBPBBPBBPBBI
In theory, the number of B-frames that may occur between any two I- and P-frames is
unlimited. In practice, however, there are typically twelve P- and B-frames occurring
between each I-frame. One I-frame will occur approximately every 0.4 seconds of video
runtime.Remember that the MPEG data is not decoded and displayed in the order that the
frames appear within the stream. Because B-frames rely on two reference frames for
prediction, both reference frames need to be decoded first from the bitstream, even though the
display order may have a B-frame in between the two reference frames.
In the previous example, the I-frame is decoded first. But, before the two B-frames
can be decoded, the P-frame must be decoded, and stored in memory with the I-frame. Only
then may the two B-frames be decoded from the information found in the decoded I- and P-
frames. Assume, in this example, that you are at the start of the MPEG data stream. The first
ten frames are stored and decoded in the sequence
IPBBPBBPBB (0312645978)
but are displayed in the sequence
IBBPBBPBBP (0123456789)
In practice, the sizes of the frames tend to be 150 Kbits for I-frames, around 50 Kbits
for P-frames, and 20 Kbits for B-frames. The video data rate is typically constrained to 1.15
Mbits/second, the standard for DATs and CD-ROMs.
The MPEG standard does not mandate the use of P- and B-frames. Many MPEG
encoders avoid the extra overhead of B- and P-frames by encoding only I-frames. Each video
frame is captured, compressed, and stored in its entirety, in a similar way to Motion JPEG. I-
frames are very similar to JPEG-encoded frames. In fact, the JPEG Committee has plans to
add MPEG I-frame methods to an enhanced version of JPEG, possibly to be known as JPEG-
II.
There are also some disadvantages to this scheme. The compression ratio of an I-
frame-only MPEG file will be lower than the same MPEG file using motion compensation. A
one-minute file consisting of 1800 frames would be approximately 2.5Mb in size. The same
file encoded using B- and P-frames would be considerably smaller, depending upon the
content of the video data. Also, this scheme of MPEG encoding might decompress more
slowly on applications that allocate an insufficient amount of buffer space to handle a
constant stream of I-frames.
7.2.1 Standards
The MPEG standards consist of different Parts. Each part covers a certain aspect of the
whole specification.[11] The standards also specify Profiles and Levels. Profiles are intended
to define a set of tools that are available, and Levels define the range of appropriate values for
the properties associated with them.[12] Some of the approved MPEG standards were revised
by later amendments and/or new editions. MPEG has standardized the following compression
formats and ancillary standards:
MPEG-1 (1993): Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s (ISO/IEC 11172). The first MPEG
compression standard for audio and video. It was basically designed to allow moving
pictures and sound to be encoded into the bitrate of a Compact Disc. It is used on
Video CD, SVCD and can be used for low-quality video on DVD Video. It was used
in digital satellite/cable TV services before MPEG-2 became widespread.
To meet the low bit-rate requirement, MPEG-1 downsamples the images and uses
picture rates of only 24-30 Hz, resulting in moderate quality.[13] It includes the
popular Layer 3 (MP3) audio compression format.
Some people are confused about the relationship between MPEG and JPEG. The
MPEG and JPEG (Joint Photographic Experts Group) committees of the ISO originally
started as the same group, but with two different purposes. JPEG focused exclusively on still-
image compression, while MPEG focused on the encoding/synchronization of audio and
video signals within a single data stream. Although MPEG employs a method of spatial data
compression similar to that used for JPEG, they are not the same standard nor were they
designed for the same purpose.
Another acronym you may hear is MJPEG (Motion JPEG). Several companies have
come out with an alternative to MPEG--a simpler solution (but not yet a standard) for how to
store motion video. This solution, called Motion JPEG, simply uses a digital video capture
device to sample a video signal, to capture frames, and to compress each frame in its entirety
using the JPEG compression method. A Motion JPEG data stream is then played back by
decompressing and displaying each individual frame. A standard audio compression method
is usually included in the Motion JPEG data stream.
On average, the temporal compression method used by MPEG provides a compression ratio
three times that of JPEG for the same perceived picture quality.
CHAPTER 8
Applications of compression
Applications of compression are found in:
Broadcast TV
Radar
Teleconferencing
Motion pictures
Satellite images
Weather maps
Geological surveys
CHAPTER 9
CONCLUSION
Using image processing techniques, we can sharpen images, adjust contrast to make a
graphic display more useful, reduce the amount of memory required for storing
image information, etc. Owing to such techniques, image processing is applied in
"recognition of images", as in factory floor quality assurance systems; "image
enhancement", as in satellite reconnaissance systems; "image synthesis", as in law
enforcement suspect identification systems; and "image construction", as in plastic surgery
design systems. Applications of compression are in broadcast TV, remote sensing
via satellite, military communication via aircraft, radar, teleconferencing, facsimile
transmission, etc.