
MODULE – 2

Text and Image Compression

Compression Principles
•Compression algorithms can be classified as either lossless (the amount of source information to be transmitted is reduced with no loss of information, so the original can be reproduced exactly) – e.g. transfer of a text file over a network – or

•lossy (the reproduced version is not an exact copy of the original but is perceived by the recipient as a true copy) – e.g. digitized images, audio and video streams.
Entropy Encoding – Run-length encoding (lossless)
•Run-length encoding is applicable when the source information comprises long substrings of the same character or binary digit

•The source string is transmitted as a set of codewords that indicate not only the character (or bit) but also the length of each substring

•Providing the destination knows the set of codewords being used, it simply interprets each codeword received and outputs the appropriate number of characters/bits
e.g. the output from the scanner in a fax machine 000000011111111110000011 will be represented as 0,7 1,10 0,5 1,2
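A minimal Python sketch of this scheme (the (bit, count) pair format mirrors the fax example above; the function names are illustrative only):

```python
def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Encode a binary string as (bit, run-length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                      # extend the current run
        runs.append((bits[i], j - i))
        i = j
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (bit, count) pair back into the original string."""
    return "".join(bit * count for bit, count in runs)

print(rle_encode("000000011111111110000011"))
# [('0', 7), ('1', 10), ('0', 5), ('1', 2)]  -- i.e. 0,7 1,10 0,5 1,2
```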

Entropy Encoding – Statistical encoding

•A set of ASCII codewords is often used for the transmission of strings of characters
•However, the symbols, and hence the codewords, in the source information do not occur with the same frequency: e.g. A may occur more frequently than P, which may occur more frequently than Q
•Statistical coding exploits this property by using a set of variable-length codewords, with the shortest codewords representing the most frequently occurring symbols

Differential encoding
•Uses smaller codewords to represent the difference signals. Can be lossy or lossless
•This type of coding is used where the amplitude of a signal covers a large range but the difference between
successive values is small
•Instead of using large codewords a set of smaller code words representing only the difference in amplitude is
used
•For example if the digitization of the analog signal requires 12 bits and the difference signal only requires 3
bits then there is a saving of 75% on transmission bandwidth
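A hedged sketch of the idea in Python, assuming the successive differences fit in a small signed field (the 3-bit width below matches the example; the helper names are illustrative):

```python
def diff_encode(samples: list[int], diff_bits: int = 3) -> list[int]:
    """Send the first sample in full, then only the successive differences,
    assuming each difference fits in a signed diff_bits-bit field."""
    lo, hi = -(1 << (diff_bits - 1)), (1 << (diff_bits - 1)) - 1
    encoded = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diff = cur - prev
        if not lo <= diff <= hi:
            raise ValueError("difference does not fit in %d bits" % diff_bits)
        encoded.append(diff)
    return encoded

def diff_decode(encoded: list[int]) -> list[int]:
    """Rebuild the samples by accumulating the differences."""
    samples = [encoded[0]]
    for diff in encoded[1:]:
        samples.append(samples[-1] + diff)
    return samples

# 12-bit samples that change slowly: only the first value needs 12 bits.
print(diff_encode([2048, 2050, 2049, 2047, 2044]))  # [2048, 2, -1, -2, -3]
```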
Transform encoding involves transforming the source information from one form into another, the other form lending itself more readily to the application of compression
Transform Encoding
•As we scan across a set of pixel locations the rate of change in magnitude will vary from zero if all the pixel
values remain the same to a low rate of change if say one half is different from the next half, through to a high
rate of change if each pixel changes magnitude from one location to the next
•The rate of change in magnitude as one traverses the matrix gives rise to a term known as the ‘spatial
frequency’
•Hence, by identifying and eliminating the higher-frequency components, the volume of information to be transmitted can be reduced with little perceptible loss of quality

Transform coding: DCT transform principles

Discrete Cosine Transformation is used to transform a two-dimensional matrix of pixel values into an equivalent matrix of spatial frequency components (coefficients)
•At this point any frequency components with amplitudes below the threshold values can be dropped (lossy)
Huffman Algorithm
Method of construction for an encoding tree
•Full Binary Tree Representation

•Each edge of the tree has a value
(0 for the edge to the left child, 1 for the edge to the right child)


•Data is at the leaves, not internal nodes

•Result: encoding tree

•“Variable-Length Encoding”

•1. Maintain a forest of trees

•2. Weight of tree = sum frequency of leaves

•3. Repeat the following until a single tree remains (N-1 times for N symbols):


–Select two smallest weight trees

–Form a new tree

Each character is assigned a variable-length code whose length is inversely related to that character's frequency

•must satisfy the prefix property (no codeword is a prefix of another) to be uniquely decodable

•two pass algorithm


–first pass accumulates the character frequencies and generates the codebook

–second pass does compression with the codebook

create codes by constructing a binary tree


1. consider all characters as free nodes
2. assign the two free nodes with the lowest frequencies to a parent node with a weight equal to the sum of their frequencies
3. remove the two free nodes and add the newly created parent node to the list of free nodes
4. repeat steps 2 and 3 until only one free node remains
Right of binary tree :1

•Left of Binary tree :0

•Prefix property (example of a violation)
–e: "01", b: "010"
–"01" is a prefix of "010", so the bits "010" could be decoded as b or as e followed by a 0 – the code would not be uniquely decodable

•same frequency: need consistency in choosing left or right

Example (64 symbols)
•RKKKKKKK
•KKKRRKKK
•KKRRRRGG
•KKBCCCRR
•GGGMCBRR
•BBBMYBBR
•GGGGGGGR
•GRRRRGRR
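A minimal sketch of the construction steps above applied to the 64-symbol example, assuming 8 symbols per row (my tally of the grid gives R=19, K=17, G=14, B=7, C=4, M=2, Y=1; left edges are labelled 0 and right edges 1 as stated):

```python
import heapq
from collections import Counter

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Repeatedly merge the two lowest-weight trees; the smaller tree goes
    left (prefix '0'), the other right (prefix '1')."""
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest weight
        w2, _, right = heapq.heappop(heap)   # next smallest
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        tiebreak += 1                        # keeps tuple comparisons unambiguous
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
    return heap[0][2]

rows = ["RKKKKKKK", "KKKRRKKK", "KKRRRRGG", "KKBCCCRR",
        "GGGMCBRR", "BBBMYBBR", "GGGGGGGR", "GRRRRGRR"]
codes = huffman_codes(Counter("".join(rows)))
print(codes)   # frequent symbols (R, K, G) get short codes, rare ones (M, Y) long codes
```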

Static Huffman Coding


Huffman (Code) Tree
Given: a number of symbols (or characters) and their relative probabilities, known a priori
The codes must hold the “prefix property”

Text Compression –Flow chart of a suitable decoding algorithm


Decoding of the received bitstream assumes the codewords have been derived as described above.

The algorithm assumes a table of codewords is available at the receiver and that this table also holds the corresponding ASCII codeword for each entry
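A minimal sketch of such a decoding loop (bit-by-bit lookup against the codeword table, which is assumed to be already known at the receiver; the example table is hypothetical):

```python
def huffman_decode(bitstream: str, codewords: dict[str, str]) -> str:
    """Accumulate received bits until they match a codeword, then emit the
    corresponding character and start again."""
    decode_table = {code: char for char, code in codewords.items()}
    decoded, current = [], ""
    for bit in bitstream:
        current += bit
        if current in decode_table:        # the prefix property makes this unambiguous
            decoded.append(decode_table[current])
            current = ""
    return "".join(decoded)

table = {"A": "0", "B": "10", "C": "11"}   # hypothetical prefix-free codewords
print(huffman_decode("010011", table))     # -> "ABAC"
```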
Text Compression – Lempel-Ziv coding
•The LZ algorithm uses strings of characters instead of single characters
•For example, for text transfer, a table containing all possible character strings is held in both the encoder and the decoder
•As each word appears, instead of sending its ASCII codes, the encoder sends only the index of the word in the table
•This index value is used by the decoder to reconstruct the text into its original form. This algorithm is also known as dictionary-based compression
The principle of the Lempel-Ziv-Welch (LZW) coding algorithm is for the encoder and decoder to build the contents of the dictionary dynamically as the text is being transferred

•Initially both the encoder and the decoder hold only the basic character set – e.g. ASCII. The remaining entries in the dictionary are built dynamically by the encoder and decoder

•For example, for a piece of text starting with the word THIS, the encoder initially sends the indices of the four characters T, H, I, S and then the space character, which is detected as a non-alphanumeric character

•It therefore transmits the character using its index as before but, in addition, interprets it as terminating the first word, which is then stored in the next free location in the dictionary

•A similar procedure is followed by both the encoder and the decoder for subsequent words

•In an application with an initial character set of 128 characters, the dictionary will start with 256 entries – and hence an 8-bit index – 128 for the characters and the remaining 128 free for words
A key issue in determining the level of compression that is achieved is the number of entries in the dictionary, since this determines the number of bits that are required for the index
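A hedged sketch of LZW encoding along these lines (character-level, dictionary grown dynamically; a fuller implementation would also widen the index as the dictionary fills and would include the matching decoder):

```python
def lzw_encode(text: str) -> list[int]:
    """Emit dictionary indices; both ends start with the basic character set
    and add each newly seen string to the next free dictionary slot."""
    dictionary = {chr(i): i for i in range(128)}   # initial ASCII entries
    next_index = 128                               # first free "word" entry
    current, output = "", []
    for ch in text:
        if current + ch in dictionary:
            current += ch                          # keep extending the match
        else:
            output.append(dictionary[current])     # send index of longest known string
            dictionary[current + ch] = next_index  # both ends learn the new entry
            next_index += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("THIS IS THE THESIS"))
```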
Although colour images comprising 24-bit pixels are supported, GIF reduces the number of possible colours that are present by choosing the 256 entries from the original set of 2^24 colours that most closely match those used in the original image

•Hence, instead of sending 24-bit colour values, only the 8-bit index of the table entry containing the closest match to the original is sent. This results in a 3:1 compression ratio
•The contents of the table are sent in addition to the screen size and aspect ratio information
•The image can also be transferred over the network using the interlaced mode
GIF also allows an image to be stored and subsequently transferred over the network in an interlaced mode;
useful over either low bit rate channels or the Internet which provides a variable transmission rate

The compressed image data is organized so that the decompressed image is built up in a progressive way as the data arrives
Since FAX machines are used with public carrier networks, the ITU-T has produced standards relating to them
•These are T2(Group1), T3 (Group2), T4 (Group3) (PSTN), and T6 (Group 4) (ISDN)
•Both (T4 and T6) achieve compression ratios of the order of 10:1
•The resulting codewords are grouped into a termination-codes table (white or black run-lengths from 0 to 63 pels in steps of 1) and a make-up codes table (run-lengths in multiples of 64 pels)
•Since two sets of codewords are used, the resulting scheme is known as modified Huffman coding.

MMR coding (2 dimensional coding)


•The modified-modified READ (relative element address designate) coding exploits the fact that most scanned lines differ from the previous line by only a few pels
•E.g. if a line contains a black run then the next line will normally contain the same run plus or minus up to 3 pels
•In MMR the run-lengths associated with a line are identified by comparing the line contents, known as the coding line (CL), with those of the immediately preceding line, known as the reference line (RL)
•The run-lengths associated with a coding line are classified into three groups – pass mode, vertical mode and horizontal mode – relative to the reference line
•Vertical mode is the case when the run-length in the reference line (b1b2) overlaps the next run-length in the coding line (a1a2) by a maximum of plus or minus 3 pels
•Horizontal mode is the case when the run-length in the reference line (b1b2) overlaps the run-length (a1a2) by more than plus or minus 3 pels
Image Compression –JPEG encoder schematic

The Joint Photographic Experts Group (JPEG) standard forms the basis of most video compression algorithms

Source image is made up of one or more 2-D matrices of values

•A 2-D matrix is required to store the required set of 8-bit grey-level values that represent the image
•For a colour image, if a CLUT is used then a single matrix of values is required
•If the image is represented in R, G, B format then three matrices are required
•If the Y, Cr, Cb format is used then the matrix size for the two chrominance components is smaller than the Y matrix (reduced representation)


•Once the image format is selected then the values in each matrix are compressed separately using the DCT
•In order to make the transformation more efficient, a second step known as block preparation is carried out before the DCT
•In block preparation each global matrix is divided into a set of smaller 8X8 submatrices (blocks) which are fed sequentially to the DCT
Once the source image format has been selected and prepared (four alternative forms of representation), the set of values in each matrix is compressed separately using the DCT

Each pixel value is quantized using 8 bits which produces a value in the range 0 to 255 for the R, G, B or Y values and a value in the range –128 to 127 for the two chrominance values Cb and Cr

•If the input matrix is P[x,y] and the transformed matrix is F[i,j] then the DCT for the 8X8 block is computed using the expression:
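For reference, the standard 8X8 forward DCT used by JPEG (the expression referred to above) is:

```latex
F[i,j] = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} P[x,y]\,
         \cos\frac{(2x+1)\,i\pi}{16}\, \cos\frac{(2y+1)\,j\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k = 0 \\ 1 & \text{for } k > 0 \end{cases}
```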

All 64 values in the input matrix P[x,y] contribute to each entry in the transformed matrix F[i,j]

•For i = j = 0 the two cosine terms are both 1 (cos 0 = 1) and hence the value in location F[0,0] of the transformed matrix is simply a function of the summation of all the values in the input matrix

•In effect this is the mean of all 64 values in the matrix and is known as the DC coefficient
•Since the values in all the other locations of the transformed matrix have a frequency coefficient associated with them they are known as AC coefficients

for j = 0 only the horizontal frequency coefficients are present

•for i = 0 only the vertical frequency coefficients are present
•for all the other locations both horizontal and vertical frequency coefficients are present

Using DCT there is very little loss of information during the DCT phase
•The losses are due to the use of fixed point arithmetic
•The main source of information loss occurs during the quantization and entropy encoding stages where the
compression takes place
•The human eye responds primarily to the DC coefficient and the lower-frequency coefficients (higher-frequency coefficients whose amplitude falls below a certain threshold will not be detected by the human eye)
•This property is exploited by dropping those spatial frequency coefficients from the transformed matrix (dropped coefficients cannot be retrieved during decoding)

In addition to classifying the spatial frequency components the quantization process aims to reduce the size of
the DC and AC coefficients so that less bandwidth is required for their transmission (by using a divisor)
•The sensitivity of the eye varies with spatial frequency and hence the amplitude threshold below which the eye
will detect a particular frequency also varies
•The threshold values vary for each of the 64 DCT coefficients and these are held in a 2-D matrix known as the
quantization table with the threshold value to be used with a particular DCT coefficient in the corresponding
position in the matrix

The choice of threshold value is a compromise between the level of compression that is required and the
resulting amount of information loss that is acceptable
•The JPEG standard includes two default quantization tables: one for the luminance coefficients and one for the chrominance coefficients. However, customized tables are allowed and can be sent with the compressed image

From the quantization table and the resulting DCT and quantized coefficients, a number of observations can be made:
-The computation of the quantized coefficients involves rounding the quotients to the nearest integer value
-The threshold values used increase in magnitude with increasing spatial frequency
-The DC coefficient in the transformed matrix is the largest
-Many of the higher-frequency coefficients are zero
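A minimal Python sketch of the forward and inverse quantization steps (the threshold table below is illustrative only, not the JPEG default luminance table):

```python
import numpy as np

def quantize(dct_block: np.ndarray, q_table: np.ndarray) -> np.ndarray:
    """Divide each DCT coefficient by its threshold value and round the
    quotient to the nearest integer - the step where information is lost."""
    return np.rint(dct_block / q_table).astype(int)

def dequantize(quantized: np.ndarray, q_table: np.ndarray) -> np.ndarray:
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return quantized * q_table

# Illustrative 8x8 table whose thresholds grow with spatial frequency,
# so many of the higher-frequency coefficients quantize to zero.
q_table = np.fromfunction(lambda i, j: 16.0 + 4.0 * (i + j), (8, 8))
```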
Entropy encoding consists of four stages
Vectoring – The entropy encoding operates on a one-dimensional string of values (vector). However, the output of the quantization stage is a 2-D matrix and hence this has to be represented in a 1-D form. This is known as vectoring
Differential encoding – In this stage only the difference in magnitude of the DC coefficient in a quantized block relative to the value in the preceding block is encoded. This reduces the number of bits required to encode the relatively large DC magnitudes
The difference values are then encoded in the form (SSS, value), where SSS indicates the number of bits needed to encode the value and value is the actual bits that represent it
e.g. if the sequence of DC coefficients in consecutive quantized blocks was 12, 13, 11, 11, 10, ... the difference values would be 12, 1, -2, 0, -1
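A sketch of this step in Python using the coefficients from the example above; SSS is computed as the number of bits needed for the magnitude (bit-level packing of the value field is omitted):

```python
def dc_differences(dc_coefficients: list[int]) -> list[tuple[int, int]]:
    """Encode each DC coefficient as (SSS, difference) relative to the
    DC coefficient of the preceding block (taken as 0 for the first block)."""
    previous, pairs = 0, []
    for dc in dc_coefficients:
        diff = dc - previous
        sss = abs(diff).bit_length()     # bits needed for the magnitude (0 for a zero diff)
        pairs.append((sss, diff))
        previous = dc
    return pairs

print(dc_differences([12, 13, 11, 11, 10]))
# [(4, 12), (1, 1), (2, -2), (0, 0), (2, -1)]  -- difference values 12, 1, -2, 0, -1
```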

The remaining 63 values in the vector are the AC coefficients


•Because of the large number of 0's among the AC coefficients, they are encoded as a string of pairs of values
•Each pair is made up of (skip, value) where skip is the number of zeros in the run and value is the next non-zero coefficient

•For example, if the 63 AC coefficients in the vector are 6, 7, 3, 3, 3, 2, 2, 2, 2 followed only by zeros, they will be encoded as:

(0,6) (0,7) (0,3) (0,3) (0,3) (0,2) (0,2) (0,2) (0,2) (0,0)
The final pair (0,0) indicates the end of the string for this block
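A sketch of the (skip, value) pairing, assuming the AC vector given above (a real JPEG coder also has a special code for runs of more than 15 zeros, which is omitted here):

```python
def ac_runlength(ac_coefficients: list[int]) -> list[tuple[int, int]]:
    """Encode the 63 AC coefficients as (skip, value) pairs, where skip is
    the number of zeros preceding the next non-zero coefficient."""
    pairs, skip = [], 0
    for value in ac_coefficients:
        if value == 0:
            skip += 1
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append((0, 0))                 # end-of-block: only zeros remain
    return pairs

ac = [6, 7, 3, 3, 3, 2, 2, 2, 2] + [0] * 54          # the 63 AC coefficients
print(ac_runlength(ac))
# [(0, 6), (0, 7), (0, 3), (0, 3), (0, 3), (0, 2), (0, 2), (0, 2), (0, 2), (0, 0)]
```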
Significant levels of compression can be obtained by replacing long strings of binary digits by a string of much
shorter codewords
•The length of each codeword is a function of its relative frequency of occurrence
•Normally, a table of codewords is used with the set of codewords precomputed using the Huffman coding
algorithm

In order for the remote computer to interpret all the different fields and tables that make up the bitstream it is
necessary to delimit each field and set of table values in a defined way
•The JPEG standard includes a definition of the structure of the total bitstream relating to a particular
image/picture. This is known as a frame
•The role of the frame builder is to encapsulate all the information relating to an encoded image/picture

At the top level the complete frame-plus-header is encapsulated between a start-of-frame and an end-of-frame
delimiter which allows the receiver to determine the start and end of all the information relating to a complete
image
•The frame header contains a number of fields
-the overall width and height of the image in pixels
-the number and type of components (CLUT, R/G/B, Y/Cb/Cr)
-the digitization format used (4:2:2, 4:2:0 etc.)

At the next level a frame consists of a number of components each of which is known as a scan
The level two header contains fields that include:
-the identity of the components
-the number of bits used to digitize each component
-the quantization table of values that have been used to encode each component
•Each scan comprises one or more segments, each of which can contain a group of (8X8) blocks preceded by a header
•This header contains the set of Huffman codewords for each block
Image Compression –JPEG encoder
The values are first centred around zero by subtracting 128 from each intensity/luminance value
Block preparation is necessary since computing the transformed value for each position in a matrix requires the
values in all the locations to be processed
In order to exploit the presence of the large number of zeros in the quantized matrix, a zig-zag scan of the matrix is used
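A sketch of the zig-zag ordering that produces the 1-D vector (DC coefficient first, then the AC coefficients in order of increasing spatial frequency):

```python
def zigzag_indices(n: int = 8) -> list[tuple[int, int]]:
    """Return (row, col) pairs of an n x n block in zig-zag order: traverse the
    anti-diagonals, alternating direction so the path snakes through the block."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                              # which anti-diagonal
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),   # direction alternates
    )

def zigzag_scan(block: list[list[int]]) -> list[int]:
    """Flatten a quantized 8x8 block into the 64-element vector."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]

print(zigzag_indices(4)[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```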
Image Compression –JPEG decoder

A JPEG decoder is made up of a number of stages which are simply the corresponding decoder sections of
those used in the encoder

•The frame decoder first identifies the encoded bitstream and its associated control information and tables
within the various headers
•It then loads the contents of each table into the related table and passes the control information to the image
builder
•Then the Huffman decoder carries out the decompression operation using preloaded or the default tables of
codewords

The two decompressed streams containing the DC and AC coefficients of each block are then passed to the
differential and run-length decoders
•The resulting matrix of values is then dequantized using either the default or the preloaded values in the
quantization table
•Each resulting block of 8X8 spatial frequency coefficients is passed in turn to the inverse DCT, which transforms it back into its spatial form
•The image builder then reconstructs the image from these blocks using the control information passed to it by
the frame decoder
Although complex, JPEG can give compression ratios of 20:1 while still retaining a good quality image
•This level (20:1) applies to images with relatively few colour transitions
•For more complicated images compression ratios of 10:1 are more common
•As with GIF images, it is possible to encode and rebuild the image in a progressive manner. This can be achieved in two different modes – progressive mode and hierarchical mode
•Progressive mode – first the DC and low-frequency coefficients of each block are sent, and then the higher-frequency coefficients
•Hierarchical mode – the total image is first sent using a low resolution – e.g. 320 X 240 – and then at a higher resolution – e.g. 640 X 480

MODULE -3
Audio and Video Compression

Both audio and most video signals are continuously varying analog signals

The compression algorithms associated with digitized audio and video are different from those used to compress text and images
Audio compression
Pulse code modulation (PCM)

Bandlimited signal

The bandwidth of the communication channels that are available often dictates rates that are less than these. This can be achieved in one of two ways:
Audio signal is sampled at a lower rate

A compression algorithm is used


4.2.1 Differential pulse code modulation
DPCM is a derivative of standard PCM and exploits the fact that, for most audio signals, the range of the differences in amplitude between successive samples of the audio waveform is less than the range of the actual sample amplitudes.

Figure 4.1
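A hedged sketch of a basic DPCM encoder/decoder pair along these lines (previous-sample predictor; the difference range is simply clamped rather than properly quantized, and the parameter choice is illustrative):

```python
def dpcm_encode(samples: list[int], diff_bits: int = 6) -> list[int]:
    """Transmit the difference between each sample and the locally decoded
    previous sample, so the encoder and decoder predictors stay in step."""
    lo, hi = -(1 << (diff_bits - 1)), (1 << (diff_bits - 1)) - 1
    predicted, encoded = 0, []
    for sample in samples:
        diff = max(lo, min(hi, sample - predicted))   # clamp to the difference range
        encoded.append(diff)
        predicted += diff                             # same reconstruction as the decoder
    return encoded

def dpcm_decode(differences: list[int]) -> list[int]:
    """Rebuild the (approximate) samples by accumulating the differences."""
    predicted, decoded = 0, []
    for diff in differences:
        predicted += diff
        decoded.append(predicted)
    return decoded
```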

4.2.2 Adaptive differential PCM


Additional savings in bandwidth – or improved quality – can be obtained by varying the number of bits used for the difference signal depending on its amplitude
A second ADPCM standard is G.722, which adds subband coding.
A third standard based on ADPCM is also available. This is defined in G.726. This also uses subband coding but with a speech bandwidth of 3.4 kHz

Even higher levels of compression – but at higher levels of complexity – can be obtained by also making the predictor coefficients adaptive. This is the principle of adaptive predictive coding

These perceptual features are then quantized and sent, and the destination uses them, together with a sound synthesizer, to regenerate a sound that is perceptually comparable with the source audio signal. This is the linear predictive coding (LPC) technique.
Three features which determine the perception of a signal by the ear are its:
Pitch
Period
Loudness
Basic features of an LPC encoder/decoder: Figure 4.4

Code-excited LPC
The synthesizers used in most LPC decoders are based on a very basic model of the vocal tract
In the CELP model the digitized segments are not treated independently for encoding purposes
All coders of this type have a delay associated with them, which is incurred while each block of digitized samples is analyzed by the encoder and the speech is reconstructed at the decoder
4.2.6 Perceptual coding
Perceptual encoders have been designed for the compression of general audio

It is known as perceptual coding since its role is to exploit a number of the limitations of the human ear.

Sensitivity of the ear


A strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.
The sensitivity of the ear varies with the frequency of the signal; the perception threshold of the ear – that is, its minimum level of sensitivity – as a function of frequency is shown in Figure 4.5(a)
The ear is most sensitive to signals in the range 2–5 kHz
Figure 4.5(b) shows how the sensitivity of the ear changes in the vicinity of a loud signal
The masking effect also varies with frequency, as shown in Figure 4.6
Critical bandwidth
Temporal masking:
When the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound (see Figure 4.7)
