Assignment 1
Data Compression
Q1. What are the two main types of data compression? Name the various algorithms
used to implement lossless data compression. What are the disadvantages of data compression?
ANSWER:
Lossless data compression is used to compress files without losing any of the original file's
quality or data. Simply put, in lossless data compression the file size is reduced,
but the quality of the data remains the same.
Lossy data compression is used to compress larger files into smaller ones. In this compression
technique, a specific amount of data and quality is removed (lost) from the original
file. The compressed file takes less memory space than the original file due to this loss of data and
quality. This technique is generally useful when the quality of the data is not our first
priority.
Algorithms used to implement lossless data compression include:
Run-length encoding (RLE)
Lempel-Ziv-Welch (LZW)
Huffman coding
Arithmetic coding
Run-length encoding (RLE) is a form of lossless data compression in which runs of data
(sequences in which the same data value occurs in many consecutive data elements) are stored
as a single data value and count, rather than as the original run.
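As an illustration, a minimal run-length encoder and decoder might look like the following sketch (the function names here are made up for illustration, not taken from any standard library):

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of repeated characters into a (value, count) pair."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (value, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)

# "WWWWBBBW" becomes [('W', 4), ('B', 3), ('W', 1)]
assert rle_decode(rle_encode("WWWWBBBW")) == "WWWWBBBW"
```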
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-
length codes to input characters; the lengths of the assigned codes are based on the
frequencies of the corresponding characters.
Arithmetic coding is a data compression technique that encodes data (the data string) by creating a
code string which represents a fractional value on the number line between 0 and 1. The coding
algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per
iteration or recursion.
The disadvantages of data compression are added complication and slower processing for
sophisticated methods (although simple methods can be faster for writing to disk).
ANSWER:
The reconstruction process is the inverse process of the above process. First, the compressed
data are decoded. Second, the decoded data are reconstructed using the second-generation
wavelet method to obtain the reconstructed data of two-level fusion compression and
reconstruction.
In the general physical sense, compression is a decrease in the volume of any object or substance
resulting from applied stress; it may be undergone by solids, liquids, gases, and living systems.
NUMERICAL:
Compression Ratio
A very logical way of measuring how well a compression algorithm compresses a given set of data is
to look at the ratio of the number of bits required to represent the data before compression to the
number of bits required to represent the data after compression. This ratio is called the compression
ratio:

Compression ratio = (bits before compression) / (bits after compression)

We can also represent the compression ratio by expressing the reduction in the amount of data
required as a percentage of the size of the original data:

Reduction (%) = (1 − (size after compression) / (size before compression)) × 100

In this particular example, the compression ratio calculated in this manner would be 75%.
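A quick sketch of these two measures; the sizes below are illustrative values chosen to reproduce the 75% reduction above, not the actual data from the example:

```python
original_bits = 65536    # assumed size before compression (illustrative only)
compressed_bits = 16384  # assumed size after compression (illustrative only)

# Compression ratio: bits before compression / bits after compression
ratio = original_bits / compressed_bits                    # 4.0, i.e. 4:1

# Reduction expressed as a percentage of the original size
reduction = (1 - compressed_bits / original_bits) * 100    # 75.0
print(f"compression ratio = {ratio:.0f}:1, reduction = {reduction:.0f}%")
```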
Q3: Differentiate lossless compression from lossy compression. Explain the Markov model with a
suitable diagram.
ANSWER:
Data Elimination: By using lossy compression, you can get rid of bytes that are regarded as
unnoticeable; with lossless compression, even unnoticeable bytes are retained.
Restoration: After lossy compression, a file cannot be restored to its original form; after lossless
compression, a file can be restored to its original form.
Size: Lossy compression reduces the size of a file to a large extent; lossless compression reduces
the size, but less than lossy compression does.
Uses: Lossy compression is used to compress audio, video, and images; lossless compression is used
to compress files containing text, program code, and other such critical data.
Capacity: The data holding capacity of the lossy compression approach is quite significant; lossless
compression has low data holding capacity compared to lossy compression.
A Markov model is a stochastic method for randomly changing systems that possess the Markov
property. This means that, at any given time, the next state is only dependent on the current state
and is independent of anything in the past.
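A tiny sketch of this idea as a two-state Markov chain (the states and transition probabilities are invented for illustration):

```python
import random

# P(next state | current state); states and numbers are illustrative only.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current: str) -> str:
    """Sample the next state from the current state alone (the Markov property)."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights)[0]

state = "sunny"
for _ in range(5):
    state = next_state(state)  # no history earlier than `state` is consulted
```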
A prefix code is a type of code system distinguished by its possession of the "prefix property", which
requires that no whole code word in the system is a prefix of any other code word in the
system. This is trivially true for fixed-length codes, so it is only a point of consideration for
variable-length codes.
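The prefix property can be checked mechanically; a minimal sketch (the helper name is ours):

```python
def is_prefix_code(codewords: list[str]) -> bool:
    """Return True if no codeword is a prefix of any other codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

assert is_prefix_code(["0", "10", "110", "111"])   # a valid prefix code
assert not is_prefix_code(["00", "01", "0", "1"])  # "0" prefixes "00" and "01"
```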
Image Compression
A simple task for Huffman coding is to encode images in a lossless manner. This is useful for
precise and critical images such as medical images and Very High Resolution (VHR) satellite
images, where it is important for the data to remain consistent before and after compression.
Text Compression
There seems to be a large flaw with Huffman coding: it requires knowing the distribution of the
data before it can encode it. Adaptive Huffman coding is an alternative because it can build a
Huffman coding tree and encode the data in just a single pass, but it is much more
computationally demanding and slower than when the Huffman codes are known in advance.
Audio Compression
Audio is another application area that benefits greatly from Huffman encoding when the
scheme is required to be lossless. Audio samples are typically recorded as 16 bits/channel. So,
traditional Huffman coding would need to have 2¹⁶ letters in its alphabet to encode all the
possible values, which is clearly unreasonable. There are a few advanced techniques, such as
recursive indexing and representing multiple symbols with single probabilities, that can still
make encoding such large alphabets workable, but these are not very common techniques.
ANSWER:
A – 10100 – 5 BITS
B – 10101 – 5 BITS
C – 1100 – 4 BITS
D – 1101 – 4 BITS
E – 1011 – 4 BITS
F – 100 – 3 BITS
G – 111 – 3 BITS
H – 0 – 1 BIT
AVERAGE HUFFMAN CODE SIZE = 5 × (1/30) + 5 × (1/30) + 4 × (2/30) + 4 × (3/30) + 4 × (1/30)
+ 3 × (5/30) + 3 × (5/30) + 1 × (12/30) = 76/30 ≈ 2.53 BITS PER SYMBOL
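The same average can be verified in a few lines; the (length, frequency) pairs below are read off the table above, with E's frequency of 1 inferred from the 76/30 total:

```python
# (code length in bits, frequency) for symbols A..H; total frequency = 30
symbols = [(5, 1), (5, 1), (4, 2), (4, 3), (4, 1), (3, 5), (3, 5), (1, 12)]

total = sum(freq for _, freq in symbols)                   # 30
avg = sum(bits * freq for bits, freq in symbols) / total   # 76 / 30
print(f"average code size = {avg:.2f} bits/symbol")        # ~2.53
```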
Q7: Differentiate between adaptive Huffman coding and Huffman
coding. Explain the update procedure for adaptive Huffman coding.
ANSWER:
Adaptive Huffman coding is also known as dynamic Huffman coding. The implementation here
uses the Vitter algorithm.
Adaptive Huffman coding for a string containing alphabetic characters:
Let m be the total number of letters in the alphabet, so m = 26.
For the Vitter algorithm, find parameters e and r such that m = 2^e + r and 0 ≤ r < 2^e.
For m = 26, this gives e = 4 and r = 10 (since 26 = 2^4 + 10 and 0 ≤ 10 < 16).
Decoding
Steps for Decoding:
1. Read the binary string.
2. If the leaf node encountered is NYT, read the next e bits.
3. If the value of the e bits is less than r, read one more bit; the required symbol is obtained by
converting the resulting (e + 1) bits to decimal and adding 1.
4. If the value of the e bits is greater than or equal to r, the required symbol is obtained by
converting the e bits to decimal and adding r + 1.
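A small sketch of this fixed (NYT) code under the 1-based indexing convention just described; the function names are ours:

```python
def vitter_params(m: int) -> tuple[int, int]:
    """Find e and r with m = 2**e + r and 0 <= r < 2**e."""
    e = m.bit_length() - 1
    return e, m - 2 ** e

def encode_index(k: int, e: int, r: int) -> str:
    """Fixed binary code for the 1-based symbol index k (used for first occurrences)."""
    if k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")  # (e + 1)-bit code
    return format(k - r - 1, f"0{e}b")      # e-bit code

e, r = vitter_params(26)                   # e = 4, r = 10 for a 26-letter alphabet
assert encode_index(1, e, r) == "00000"    # 1st letter: 5 bits
assert encode_index(22, e, r) == "1011"    # 22nd letter: 4 bits
```

Decoding inverts this: if the first e bits read are less than r, one more bit is consumed and 1 is added; otherwise r + 1 is added, matching the steps above.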
Huffman coding
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-
length codes to input characters; the lengths of the assigned codes are based on the
frequencies of the corresponding characters.
I. The variable-length codes assigned to input characters are prefix codes, meaning the
codes (bit sequences) are assigned in such a way that the code assigned to one
character is not the prefix of the code assigned to any other character.
II. This is how Huffman Coding makes sure that there is no ambiguity when decoding
the generated bitstream.
III. Let us understand prefix codes with a counter-example. Let there be four
characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0
and 1. This coding leads to ambiguity because the code assigned to c is the prefix of
the codes assigned to a and b. If the compressed bit stream is 0001, the
decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
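A minimal sketch of Huffman code construction using a priority queue (the frequencies are illustrative, and the helper name is ours):

```python
import heapq

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build prefix codes by repeatedly merging the two least frequent subtrees."""
    # Heap entries: (total frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}         # left subtree gets bit 0
        merged.update({s: "1" + c for s, c in right.items()})  # right subtree gets bit 1
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# Frequent symbols receive short codes; rare symbols receive long ones.
codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})  # e.g. "a" gets a 1-bit code
```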
Q8. What are the difficulties in arithmetic coding? Explain them.
ANSWER:
I. Arithmetic coding is the most efficient method to code symbols according
to the probability of their occurrence.
II. The average code length corresponds exactly to the possible minimum given by information
theory. Deviations caused by the bit-resolution of binary code trees do not exist.
III. In contrast to a binary Huffman code tree, arithmetic coding offers a clearly better
compression ratio.
IV. Its implementation, on the other hand, is more complex. In arithmetic coding, a message is
encoded as a real number in the interval between zero and one.
V. Arithmetic coding typically has a better compression ratio than Huffman coding, as it
produces a single codeword for the entire message rather than several separate codewords.
The main difficulties with arithmetic coding are:
1. One is that the whole codeword must be received before decoding of the symbols can start, and
if there is a corrupt bit in the codeword, the entire message could become corrupt.
2. Another is that there is a limit to the precision of the number which can be encoded, thus
limiting the number of symbols that can be encoded within a codeword.
3. There also exist many patents on arithmetic coding, so the use of some of the algorithms
may call for royalty fees.
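The interval-narrowing idea behind arithmetic coding can be sketched as follows; floating point is used for clarity, whereas real coders use integer arithmetic precisely because of the precision limit in point 2 (the function name and the source model are invented for illustration):

```python
def arithmetic_interval(message: str, probs: dict[str, float]) -> tuple[float, float]:
    """Narrow [0, 1) once per symbol; any number in the final interval encodes the message."""
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        cum = 0.0
        for s, p in probs.items():  # locate sym's slice of the current interval
            if s == sym:
                low, high = low + cum * span, low + (cum + p) * span
                break
            cum += p
    return low, high

# Illustrative source model: P(a) = 0.6, P(b) = 0.4
low, high = arithmetic_interval("aab", {"a": 0.6, "b": 0.4})  # [0.216, 0.36)
# Any value in [low, high), e.g. (low + high) / 2, can serve as the code value.
```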
SOURCE CODE vs OBJECT CODE
Language: Source code is written in a high-level language like C, C++, Java, Python, etc., or in
assembly language, whereas object code is written in machine language produced by a compiler,
assembler, or other translator.
Performance: The performance of source code is less than that of object code, as it is less close
to the machine; the performance of object code is more than that of source code, as it is closer
to the machine.
ANSWER:
MPEG-2 Compression
MPEG-2 video compression is the de facto standard for entertainment video. MPEG-2 video
compression is popular for a number of reasons:
Motion vector estimation is used to capture much of the change between video
frames, in the form of best approximations of each part of a frame as a translation
(generally due to motion) of a similar-sized piece of another video frame. Essentially,
there is a lot of temporal redundancy in video, which can be discarded. (The
term temporal redundancy is applied to information that is repeated from one frame
to another.)
Discrete cosine transform (DCT) is used to convert spatial information into frequency
information. This allows the encoder to discard information, corresponding to higher
video frequencies, which are less visible to the human eye.
Quantization is applied to the DCT coefficients of either original frames (in some
cases) or the DCT of the residual (after motion estimation) to restrict the set of
possible values transmitted by placing them into groups of values that are almost the
same (a small sketch of these two steps follows this list).
Huffman encoding uses short codes to describe common values and longer codes to
describe rarer values—this is a type of entropy coding.
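A rough sketch of the DCT-plus-quantization steps on a single 8×8 block; the naive O(N⁴) transform and the quantization step size are purely illustrative, not how a real MPEG-2 encoder is implemented:

```python
import math

N = 8  # MPEG-2 operates on 8x8 blocks

def dct2(block):
    """Naive 2-D DCT-II of an N x N block (O(N^4); for illustration only)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N)
                for y in range(N)
            )
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step=16):
    """Map coefficients into coarse groups; this is where information is discarded."""
    return [[round(val / step) for val in row] for row in coeffs]

block = [[128] * N for _ in range(N)]  # a flat gray block (illustrative input)
levels = quantize(dct2(block))         # only the DC coefficient is nonzero here
```

Higher-frequency coefficients quantize to zero for smooth blocks, which is exactly the information the encoder discards.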
The foregoing is a highly compressed summary of MPEG-2 video compression (with many
details omitted). However, there are so many excellent descriptions of MPEG compression
(see DTV: The Revolution in Electronic Imaging, by Jerry C. Whitaker; Digital Compression
for Multimedia: Principles and Standards, by Jerry D. Gibson and others; Testing Digital
Video, by Dragos Ruiu and others; and Modern Cable Television Technology: Video, Voice,
and Data Communications, by Walter Ciciora and others) that more description is not
justified here. Instead, the following sections concentrate on the most interesting aspects of
MPEG: