
INTEGRAL UNIVERSITY

Data Compression

NAME: HARSHIT WALIA
BRANCH: B.C.A
SUBJECT: Data Compression
ENROLLMENT NO.: 2100104048
TEACHER NAME: Zohaib Hasan Khan
GROUP: 5
ASSIGNMENT 1

Q1. What are the two main types of data compression? Name the various algorithms used to implement lossless data compression. What are the disadvantages of data compression?

ANSWER :

There are mainly two types of data compression techniques -

1. Lossless Data Compression


2. Lossy Data Compression

Lossless data compression reduces file size without losing any of the original file's data or quality. Simply, we can say that in lossless data compression the file size is reduced, but the data it contains remains exactly the same.

Lossy data compression compresses larger files into much smaller ones by permanently removing some amount of data and quality from the original file. Because some of the original information is discarded, the compressed file takes less memory space, but the original can no longer be recovered exactly. This technique is generally useful when the quality of the data is not our first priority.

The algorithms used in lossless compression are:

 Run-Length Encoding

 Lempel-Ziv-Welch

 Huffman Coding

 Arithmetic Encoding
Run-length encoding (RLE) is a form of lossless data compression in which runs of data
(sequences in which the same data value occurs in many consecutive data elements) are stored
as a single data value and count, rather than as the original run.
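As an illustration, here is a minimal Python sketch of run-length encoding and decoding; the (value, count) pair representation and the function names are my own choices for this example, not part of any particular standard.

def rle_encode(data):
    """Collapse runs of repeated values into (value, count) pairs."""
    if not data:
        return []
    runs = []
    current, count = data[0], 1
    for value in data[1:]:
        if value == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = value, 1
    runs.append((current, count))
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return "".join(value * count for value, count in runs)

print(rle_encode("WWWWWBBBWWWW"))                 # [('W', 5), ('B', 3), ('W', 4)]
print(rle_decode([('W', 5), ('B', 3), ('W', 4)])) # WWWWWBBBWWWW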

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.
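A minimal sketch of an LZW encoder in Python follows; it initializes the dictionary with all single characters and emits integer codes. This is only an illustrative implementation under those assumptions, not a reference one.

def lzw_encode(text):
    """LZW: grow a dictionary of previously seen strings and emit their integer codes."""
    dictionary = {chr(i): i for i in range(256)}   # start with all single characters
    next_code = 256
    w, output = "", []
    for ch in text:
        wc = w + ch
        if wc in dictionary:
            w = wc                         # keep extending the current match
        else:
            output.append(dictionary[w])   # emit the code for the longest match
            dictionary[wc] = next_code     # add the new string to the dictionary
            next_code += 1
            w = ch
    if w:
        output.append(dictionary[w])
    return output

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))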

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding characters.
Arithmetic coding is a data compression technique that encodes data (the data string) by creating a
code string which represents a fractional value on the number line between 0 and 1. The coding
algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per
iteration or recursion.

DISADVANTAGES OF DATA COMPRESSION:

 Added complexity

 Effect of errors in transmission

 Slower processing for sophisticated methods (but simple methods can be faster than writing uncompressed data to disk)

 "Unknown" byte/pixel relationship

 Need to decompress all previous data

Q2: With the use of a block diagram, describe compression and reconstruction. Suppose storing an image made up of a square array of 256×256 pixels requires 65,536 bytes. The image is compressed and the compressed version requires 16,384 bytes. Evaluate the compression ratio.

ANSWER:

Compression passes the original data through an encoder, which exploits redundancy in the data (and, in lossy schemes, discards information judged less important) to produce a compact compressed representation. Reconstruction is the inverse process: the compressed data are passed through a decoder, which rebuilds the data from the compressed representation. For lossless compression the reconstructed data are identical to the original; for lossy compression they are only an approximation of it.

In block-diagram form: Original data → Encoder (compression) → Compressed data → Decoder (reconstruction) → Reconstructed data.
NUMERICAL:
The compression ratio for the above compression is given by:

Compression Ratio = Original Size / Compressed Size = 65536 / 16384 = 4:1

We can also express the compression by stating the reduction in the amount of data required as a percentage of the size of the original data:

Percentage reduction = (Original − Compressed) / Original × 100% = (65536 − 16384) / 65536 × 100% = 75%

In this particular example, the compression expressed in this manner is 75%; the compressed image needs only one quarter of the original storage.

Compression Ratio
A very logical way of measuring how well a compression algorithm compresses a given set of data is to look at the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression. This ratio is called the compression ratio.
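The two ways of reporting compression performance can be checked with a short Python calculation, a trivial sketch using the numbers from the example above:

original_bytes = 256 * 256        # 65,536 bytes for the 256x256 image, 1 byte per pixel
compressed_bytes = 16_384

ratio = original_bytes / compressed_bytes                              # 4.0, i.e. 4:1
saving = (original_bytes - compressed_bytes) / original_bytes * 100    # 75.0 percent
print(f"compression ratio {ratio:.0f}:1, space saving {saving:.0f}%")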

Q3: Differentiate lossless compression from lossy compression. Explain the Markov Model with a suitable diagram.

ANSWER :

Data Elimination
Lossy: by using lossy compression, you can get rid of bytes that are regarded as unnoticeable.
Lossless: even unnoticeable bytes are retained with lossless compression.

Restoration
Lossy: after lossy compression, a file cannot be restored to its original form.
Lossless: after lossless compression, a file can be restored to its original form.

Quality
Lossy: quality suffers as a result of lossy compression; it leads to some level of data loss.
Lossless: no quality degradation happens in lossless compression.

Size
Lossy: lossy compression reduces the size of a file to a large extent.
Lossless: lossless compression reduces the size, but less as compared to lossy compression.

Algorithms used
Lossy: transform coding, Discrete Cosine Transform, Discrete Wavelet Transform, fractal compression, etc.
Lossless: Run-Length Encoding, Lempel-Ziv-Welch, Huffman Coding, Arithmetic Encoding, etc.

Uses
Lossy: lossy compression is used to compress audio, video and images.
Lossless: lossless compression is used to compress files containing text, program code, and other such critical data.

Capacity
Lossy: the data holding capacity of the lossy compression approach is quite significant.
Lossless: lossless compression has low data holding capacity as compared to lossy compression.

A Markov model is a stochastic method for randomly changing systems that possess the Markov
property. This means that, at any given time, the next state is only dependent on the current state
and is independent of anything in the past.
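To make the Markov property concrete, here is a small Python sketch of a two-state Markov model; the states and transition probabilities are made up purely for illustration. In the corresponding diagram, each state is a node and each transition probability labels a directed edge between nodes.

import random

# Hypothetical two-state model: the next state depends only on the current state.
transitions = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.2},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}

def next_state(current):
    """Sample the next state using only the current state (Markov property)."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights, k=1)[0]

state, chain = "Sunny", ["Sunny"]
for _ in range(10):
    state = next_state(state)
    chain.append(state)
print(chain)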

Q4: What are prefix codes? Explain compression and reconstruction with the help of a block diagram.
ANSWER :

A prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that no whole code word in the system is a prefix of any other code word in the system. The property is trivially true for fixed-length codes, so it is only a point of consideration for variable-length codes.
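The prefix property is easy to test mechanically; the following Python sketch (the function name is my own choosing) checks a set of code words:

def is_prefix_code(codewords):
    """True if no code word is a prefix of any other code word."""
    ordered = sorted(codewords)
    # After sorting, any prefix relationship must occur between neighbours.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(is_prefix_code(["0", "10", "110", "111"]))   # True: a valid prefix code
print(is_prefix_code(["00", "01", "0", "1"]))      # False: "0" is a prefix of "00" and "01"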

As shown in the block diagram for Q2, compression passes the original data through an encoder that removes redundancy to produce a compact compressed representation, and reconstruction is the inverse process in which a decoder rebuilds the data from that representation: Original data → Encoder → Compressed data → Decoder → Reconstructed data. For a lossless scheme the reconstructed data are identical to the original; for a lossy scheme they are only an approximation.

Q5:    Discuss the various applications of Huffman coding?

Lossless Image Compression

A simple task for Huffman coding is to encode images in a lossless manner. This is useful for precise and critical images such as medical images and Very High Resolution (VHR) satellite images, where it is important for the data to remain consistent before and after compression.

Text Compression

One apparent flaw of Huffman coding is that it must know the distribution of the data before it can encode it. Adaptive Huffman coding is an alternative because it can build the Huffman coding tree and encode the data in just a single pass, but it is much more computationally demanding and slower than encoding with Huffman codes that are already known.

Audio Compression

Audio is another application area that benefits greatly from Huffman encoding when the scheme is required to be lossless. Audio samples are typically recorded as 16 bits/channel, so traditional Huffman coding would need 2¹⁶ letters in its alphabet to encode all the possible values, which is clearly unreasonable. A few advanced techniques, such as recursive indexing and representing multiple symbols with single probabilities, allow Huffman coding to operate on such large alphabets, but these techniques are not very common.

Q6: Given the eight symbols A, B, C, D, E, F, G, and H with probabilities 1/30, 1/30, 1/30, 2/30, 3/30, 5/30, 5/30, and 12/30:
i) Draw the Huffman tree for these symbols.
ii) Compute the average no. of bits/symbol.

ANSWER :

A – 10100 – 5 BITS
B – 10101 – 5 BITS
C – 1100 – 4 BITS
D – 1101 – 4 BITS
E – 1011 – 4 BITS
F – 100 – 3 BITS
G – 111 – 3 BITS
H – 0 – 1 BIT
AVERAGE HUFFMAN CODE SIZE = 5 * (1/30) + 5 * (1/30) + 4 * (1/30) + 4 * (2/30) + 4 * (3/30) + 3 * (5/30) + 3 * (5/30) + 1 * (12/30) = 76/30 ≈ 2.53 BITS/SYMBOL
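The result can be checked with a short Python sketch that builds a Huffman tree with a priority queue and reports the code lengths. Tie-breaking may give individual lengths that differ from the tree above, but every Huffman tree for these weights has the same average of 76/30 ≈ 2.53 bits/symbol.

import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return {symbol: code length} for a Huffman tree built over the weights."""
    tiebreak = count()                       # keeps heap comparisons away from the dicts
    heap = [(w, next(tiebreak), {s: 0}) for s, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)      # take the two smallest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

weights = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 3, "F": 5, "G": 5, "H": 12}  # in 1/30 units
lengths = huffman_code_lengths(weights)
average = sum(lengths[s] * w for s, w in weights.items()) / 30
print(lengths)
print(average)                               # 2.533... = 76/30 bits/symbol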
Q7: Differentiate between adaptive Huffman coding and Huffman coding. Explain the update procedure for Adaptive Huffman Coding.

ANSWER:

Adaptive Huffman coding

Adaptive Huffman Coding is also known as Dynamic Huffman Coding. The implementation
is done using Vitter Algorithm. 
Adaptive Huffman coding for a string over an alphabet:
Let m be the total number of letters in the alphabet, so m = 26 for English text.
For the Vitter algorithm, find parameters e and r such that m = 2^e + r with 0 ≤ r < 2^e; for m = 26 this gives e = 4 and r = 10.

Decoding
Steps for decoding:

Read the binary string.
If the leaf node encountered is NYT:
Read the next e bits.
If the value of the e bits is less than r, read one more bit; the required symbol is the one whose index equals the decimal value of the (e+1) bits plus 1.
If the value of the e bits is greater than or equal to r, the required symbol is the one whose index equals the decimal value of the e bits plus r + 1.
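A small Python sketch of the fixed code used for a symbol's first occurrence, under the convention above (m = 2^e + r, symbols indexed 1..m); the function names here are my own, chosen for illustration.

def vitter_parameters(m):
    """Return (e, r) such that m = 2**e + r and 0 <= r < 2**e."""
    e = m.bit_length() - 1
    return e, m - (1 << e)

def encode_first_occurrence(k, e, r):
    """Fixed code for the k-th symbol (1-indexed) when it is seen for the first time."""
    if k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")    # (e+1)-bit code
    return format(k - r - 1, f"0{e}b")        # e-bit code

def decode_first_occurrence(bits, e, r):
    """Read e bits; if their value is < r, read one more bit. Return the symbol index."""
    v = int(bits[:e], 2)
    if v < r:
        return int(bits[:e + 1], 2) + 1
    return v + r + 1

e, r = vitter_parameters(26)                   # e = 4, r = 10 for a 26-letter alphabet
print(e, r)
print(encode_first_occurrence(21, e, r))       # '1010'
print(decode_first_occurrence("1010", e, r))   # 21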

Huffman coding
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding characters.

I. The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is not the prefix of the code assigned to any other character.
II. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bitstream.
III. Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is the prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be "cccd" or "ccb" or "acd" or "ab".

Update procedure for Adaptive Huffman Coding

Each time a symbol is processed, the tree is updated so that it reflects the frequencies seen so far. If the symbol is new, the NYT (Not Yet Transmitted) node is split into a new NYT node and a leaf for that symbol; otherwise the weight of the symbol's existing leaf is incremented. Before a node's weight is incremented, it is swapped, if necessary, with the highest-numbered node of the same weight (other than its own parent) so that the sibling property is preserved, and the update is then repeated for the node's parent, continuing up to the root.
Q8.    What are the difficulties in Arithmetic Coding? Explain it.
ANSWER:

I. Arithmetic coding is the most efficient method to code symbols according to the probability of their occurrence.

II. The average code length corresponds exactly to the possible minimum given by information theory. Deviations caused by the bit-resolution of binary code trees do not exist.

III. In contrast to a binary Huffman code tree, arithmetic coding offers a clearly better compression rate.

IV. Its implementation is more complex, on the other hand. In arithmetic coding, a message is encoded as a real number in an interval from zero to one.

V. Arithmetic coding typically has a better compression ratio than Huffman coding, as it produces a single codeword for the whole message rather than a separate codeword for each symbol.

There are a few disadvantages of arithmetic coding.

1. One is that the whole codeword must be received to start decoding the symbols, and if there
is a corrupt bit in the codeword, the entire message could become corrupt.

2. Another is that there is a limit to the precision of the number which can be encoded, thus
limiting the number of symbols to encode within a codeword.

3. There also exist many patents on arithmetic coding, so the use of some of the algorithms may require royalty fees.
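A toy floating-point Python sketch of the interval-narrowing idea follows. It is illustrative only: real arithmetic coders use integer arithmetic with renormalization precisely because of the precision limit mentioned in point 2, and the alphabet and probabilities used here are invented for the example.

def build_intervals(probabilities):
    """Assign each symbol a sub-interval of [0, 1) proportional to its probability."""
    intervals, low = {}, 0.0
    for symbol, p in probabilities.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals

def arithmetic_encode(message, probabilities):
    """Narrow [low, high) for each symbol and return one number inside the final interval."""
    intervals = build_intervals(probabilities)
    low, high = 0.0, 1.0
    for symbol in message:
        s_low, s_high = intervals[symbol]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return (low + high) / 2

def arithmetic_decode(value, length, probabilities):
    """Recover `length` symbols by repeatedly locating `value` inside a sub-interval."""
    intervals = build_intervals(probabilities)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
code = arithmetic_encode("abac", probs)
print(code, arithmetic_decode(code, 4, probs))   # one real number in [0, 1), then "abac"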

Q9: Explain the difference between source symbols and code words.

ANSWER:

In data compression, a source symbol is a symbol produced by the source (for example a character of text, a pixel value or an audio sample), while a code word is the sequence of bits that the encoder assigns to represent a source symbol or a group of source symbols. The table below contrasts source code with code words (object code) more generally:

SOURCE CODE vs CODE WORD

01. Source code is generated by a human or programmer. Code word is generated by a compiler or other translator.
02. Source code is high-level code. Code word is low-level code.
03. Source code is written in plain text using some high-level programming language. Code word is the translated form of the source code; it is in binary format.
04. Source code is human understandable. Code word is not human understandable.
05. Source code is not directly understandable by the machine. Code word is machine understandable and executable.
06. Source code is written in a high-level language like C, C++, Java, Python, etc., or assembly language. Code word is written in machine language through a compiler, assembler or other translator.
07. Source code can be easily modified. Code word cannot be modified.
08. Source code contains comments for better understanding by the programmer. Code word does not contain comments, since the machine does not need them.
09. Source code contains fewer statements than object code. Object code contains more statements than source code.
10. Source code is less close to the machine. Object code is closer to the machine.
11. Performance of source code is lower than that of object code, as it is less close to the machine. Performance of object code is higher, as it is closer to the machine.
12. Source code is the input to a compiler or any other translator. Object code is the output of a compiler or any other translator.
13. Source code is not system specific. Object code is system specific.
14. Source code can be changed over time. To obtain modified object code, the source code needs to be recompiled or retranslated.
15. Language translators such as compilers, assemblers and interpreters are used to translate source code to object code. Object code is machine code, so it does not require any translation.
16. The source lines of code give readability and understandability to the user; use of fewer lines of code gives better performance by giving the same results in most cases. This is not the case with object code.

Q10: MPEG is designed to squeeze out as much redundancy as possible in order to achieve high levels of compression. This is done in two stages; list them and explain one of them.

ANSWER :

MPEG removes redundancy in two stages: (1) removal of spatial redundancy within each frame, using a JPEG-like, DCT-based coding of pixel blocks, and (2) removal of temporal redundancy between successive frames, using motion-compensated prediction. Both stages appear in the description of MPEG-2 compression that follows.

MPEG-2 Compression

MPEG-2 video compression is the de facto standard for entertainment video. MPEG-2 video
compression is popular for a number of reasons:

 It is an international standard [ISO/IEC IS 13818-2].


 MPEG-2 places no restrictions on the video encoder implementation. This allows
each encoder designer to introduce new techniques to improve compression
efficiency and picture quality. Since MPEG-2 video encoders were first introduced in
1993, compression efficiency has improved by 30 to 40%, despite predictions by
many that MPEG-2’s fundamental theoretical limitations would prevent this.
 MPEG-2 fully defines the video decoder's capability at particular levels and profiles. Many MPEG-2 chip-sets are available and will work with any Main Profile at Main Level (MP@ML)-compliant MPEG-2 bit-stream from any source. Nevertheless, quality can change significantly from one MPEG-2 video decoder to another, especially in error handling and video clip transitions.
 MPEG-2 video compression is part of a larger standard that includes support for
transport and timing functions.
Moreover, MPEG-2 is likely to remain as the dominant standard for entertainment video
because it has been so successful in establishing an inventory of standard decoders (both in
existing consumer electronics products and in the chip libraries of most large semiconductor
companies). Additional momentum comes from the quantity of real-time and stored content
already compressed into MPEG-2 format. Even succeeding work by the MPEG committees
has been abandoned (MPEG-3) or retargeted to solve different problems (MPEG-4 and
MPEG-7).

MPEG-2 is a lossy video compression method based on motion vector estimation, discrete cosine transforms, quantization, and Huffman encoding. (Lossy means that data is lost, or thrown away, during compression, so quality after decoding is less than the original picture.) Taking these techniques in order:

 Motion vector estimation is used to capture much of the change between video
frames, in the form of best approximations of each part of a frame as a translation
(generally due to motion) of a similar-sized piece of another video frame. Essentially,
there is a lot of temporal redundancy in video, which can be discarded. (The
term temporal redundancy is applied to information that is repeated from one frame
to another.)
 Discrete cosine transform (DCT) is used to convert spatial information into frequency
information. This allows the encoder to discard information, corresponding to higher
video frequencies, which are less visible to the human eye.
 Quantization is applied to the DCT coefficients of either original frames (in some
cases) or the DCT of the residual (after motion estimation) to restrict the set of
possible values transmitted by placing them into groups of values that are almost the
same.
 Huffman encoding uses short codes to describe common values and longer codes to
describe rarer values—this is a type of entropy coding.
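To illustrate the DCT and quantization steps described above, here is a small numpy sketch for a single 8×8 block. The flat quantization step of 16 and the random block are arbitrary choices for illustration; real MPEG-2 uses per-coefficient quantization matrices.

import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so the 2-D DCT of a block B is C @ B @ C.T."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float) - 128   # a made-up 8x8 pixel block
C = dct_matrix(8)

coeffs = C @ block @ C.T            # spatial samples -> frequency coefficients
quantized = np.round(coeffs / 16)   # coarse quantization discards small, mostly high-frequency terms
restored = C.T @ (quantized * 16) @ C + 128                # approximate reconstruction
print(np.abs(restored - (block + 128)).max())              # small but nonzero error: lossy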

The foregoing is a highly compressed summary of MPEG-2 video compression, with many details omitted. Fuller descriptions can be found in excellent references such as DTV: The Revolution in Electronic Imaging, by Jerry C. Whitaker; Digital Compression for Multimedia: Principles and Standards, by Jerry D. Gibson and others; Testing Digital Video, by Dragos Ruiu and others; and Modern Cable Television Technology: Video, Voice, and Data Communications, by Walter Ciciora and others.
