Assignment 1
Data Compression
Q1. What are the two main types of data compression? Name the various algorithms
used to implement lossless data compression. What are the disadvantages of data compression?
ANSWER:
Lossless data compression is used to compress files without losing any of the original file's
quality or data. Simply put, in lossless data compression the file size is reduced,
but the quality of the data remains the same.
Lossy data compression is used to compress larger files into smaller ones. In this compression
technique, a specific amount of data and quality is removed (lost) from the original
file. The compressed file takes less memory space than the original file due to this loss of data and
quality. This technique is generally useful when the quality of the data is not our first
priority.
Algorithms used to implement lossless data compression include:
Run-length encoding (RLE)
Lempel-Ziv-Welch (LZW)
Huffman coding
Arithmetic coding
Run-length encoding (RLE) is a form of lossless data compression in which runs of data
(sequences in which the same data value occurs in many consecutive data elements) are stored
as a single data value and count, rather than as the original run.
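As an illustration, a minimal run-length encoder and decoder might look like the following sketch (the function names here are made up for illustration, not taken from any standard library):

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of repeated characters into a (value, count) pair."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (value, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)

# "WWWWBBBW" becomes [('W', 4), ('B', 3), ('W', 1)]
assert rle_decode(rle_encode("WWWWBBBW")) == "WWWWBBBW"
```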
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-
length codes to input characters; the lengths of the assigned codes are based on the
frequencies of the corresponding characters.
Arithmetic coding is a data compression technique that encodes data (the data string) by creating a
code string which represents a fractional value on the number line between 0 and 1. The coding
algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per
iteration or recursion.
The disadvantages of data compression are added complication and slower processing for
sophisticated methods (although simple methods can be faster for writing to disk).
ANSWER:
The reconstruction process is the inverse process of the above process. First, the compressed
data are decoded. Second, the decoded data are reconstructed using the second-generation
wavelet method to obtain the reconstructed data of two-level fusion compression and
reconstruction.
In the general physical sense, compression is a decrease in the volume of any object or substance
resulting from applied stress; it may be undergone by solids, liquids, gases, and living systems.
NUMERICAL:
Compression Ratio
A very logical way of measuring how well a compression algorithm compresses a given set of data is
to look at the ratio of the number of bits required to represent the data before compression to the
number of bits required to represent the data after compression. This ratio is called the compression
ratio:

Compression ratio = (bits before compression) / (bits after compression)

We can also represent the compression ratio by expressing the reduction in the amount of data
required as a percentage of the size of the original data:

Reduction (%) = (1 − (size after compression) / (size before compression)) × 100

In this particular example, the compression ratio calculated in this manner would be 75%.
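A quick sketch of these two measures; the sizes below are illustrative values chosen to reproduce the 75% reduction above, not the actual data from the example:

```python
original_bits = 65536    # assumed size before compression (illustrative only)
compressed_bits = 16384  # assumed size after compression (illustrative only)

# Compression ratio: bits before compression / bits after compression
ratio = original_bits / compressed_bits                    # 4.0, i.e. 4:1

# Reduction expressed as a percentage of the original size
reduction = (1 - compressed_bits / original_bits) * 100    # 75.0
print(f"compression ratio = {ratio:.0f}:1, reduction = {reduction:.0f}%")
```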
Q3: Differentiate lossless compression from lossy compression. Explain the Markov model with a
suitable diagram.
ANSWER:
Data Elimination: By using lossy compression, you can get rid of bytes that are regarded as
unnoticeable; with lossless compression, even unnoticeable bytes are retained.
Restoration: After lossy compression, a file cannot be restored to its original form; after lossless
compression, a file can be restored to its original form.
Size: Lossy compression reduces the size of a file to a large extent; lossless compression reduces
the size, but less than lossy compression does.
Uses: Lossy compression is used to compress audio, video, and images; lossless compression is used
to compress files containing text, program code, and other such critical data.
Capacity: The data holding capacity of the lossy compression approach is quite significant; lossless
compression has low data holding capacity compared to lossy compression.
A Markov model is a stochastic method for randomly changing systems that possess the Markov
property. This means that, at any given time, the next state is only dependent on the current state
and is independent of anything in the past.
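A tiny sketch of this idea as a two-state Markov chain (the states and transition probabilities are invented for illustration):

```python
import random

# P(next state | current state); states and numbers are illustrative only.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current: str) -> str:
    """Sample the next state from the current state alone (the Markov property)."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights)[0]

state = "sunny"
for _ in range(5):
    state = next_state(state)  # no history earlier than `state` is consulted
```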
A prefix code is a type of code system distinguished by its possession of the "prefix property", which
requires that no whole code word in the system is a prefix of any other code word in the
system. This is trivially true for fixed-length codes, so it is only a point of consideration for
variable-length codes.
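The prefix property can be checked mechanically; a minimal sketch (the helper name is ours):

```python
def is_prefix_code(codewords: list[str]) -> bool:
    """Return True if no codeword is a prefix of any other codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

assert is_prefix_code(["0", "10", "110", "111"])   # a valid prefix code
assert not is_prefix_code(["00", "01", "0", "1"])  # "0" prefixes "00" and "01"
```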
Image Compression
A simple task for Huffman coding is to encode images in a lossless manner. This is useful for
precise and critical images such as medical images and Very High Resolution (VHR) satellite
images, where it is important for the data to remain consistent before and after compression.
Text Compression
There seems to be a large flaw with Huffman coding: it requires knowing the distribution of the
data before it can encode it. Adaptive Huffman coding is an alternative because it can build a
Huffman coding tree and encode the data in just a single pass, but it is much more
computationally demanding and slower than when the Huffman codes are known in advance.
Audio Compression
Audio is another application area that benefits greatly from Huffman encoding when the
scheme is required to be lossless. Audio samples are typically recorded as 16 bits/channel. So,
traditional Huffman coding would need to have 2¹⁶ letters in its alphabet to encode all the
possible values, which is clearly unreasonable. There are a few advanced techniques, such as
recursive indexing and representing multiple symbols with single probabilities, that can still
make encoding such large alphabets workable, but these are not very common techniques.
ANSWER:
A – 10100 – 5 BITS
B – 10101 – 5 BITS
C – 1100 – 4 BITS
D – 1101 – 4 BITS
E – 1011 – 4 BITS
F – 100 – 3 BITS
G – 111 – 3 BITS
H – 0 – 1 BIT
AVERAGE HUFFMAN CODE SIZE = 5 × (1/30) + 5 × (1/30) + 4 × (2/30) + 4 × (3/30) + 4 × (1/30)
+ 3 × (5/30) + 3 × (5/30) + 1 × (12/30) = 76/30 ≈ 2.53 BITS PER SYMBOL
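The same average can be verified in a few lines; the (length, frequency) pairs below are read off the table above, with E's frequency of 1 inferred from the 76/30 total:

```python
# (code length in bits, frequency) for symbols A..H; total frequency = 30
symbols = [(5, 1), (5, 1), (4, 2), (4, 3), (4, 1), (3, 5), (3, 5), (1, 12)]

total = sum(freq for _, freq in symbols)                   # 30
avg = sum(bits * freq for bits, freq in symbols) / total   # 76 / 30
print(f"average code size = {avg:.2f} bits/symbol")        # ~2.53
```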
Q7: Differentiate between adaptive Huffman coding and Huffman
coding. Explain the update procedure for adaptive Huffman coding.
ANSWER:
Adaptive Huffman coding is also known as dynamic Huffman coding. The implementation here
uses the Vitter algorithm.
Adaptive Huffman coding for a string containing alphabetic characters:
Let m be the total number of letters in the alphabet, so m = 26.
For the Vitter algorithm, find parameters e and r such that m = 2^e + r and 0 ≤ r < 2^e.
For m = 26, this gives e = 4 and r = 10 (since 26 = 2^4 + 10 and 0 ≤ 10 < 16).
Decoding
Steps for Decoding:
1. Read the binary string.
2. If the leaf node encountered is NYT, read the next e bits.
3. If the value of the e bits is less than r, read one more bit; the required symbol is obtained by
converting the resulting (e + 1) bits to decimal and adding 1.
4. If the value of the e bits is greater than or equal to r, the required symbol is obtained by
converting the e bits to decimal and adding r + 1.
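A small sketch of this fixed (NYT) code under the 1-based indexing convention just described; the function names are ours:

```python
def vitter_params(m: int) -> tuple[int, int]:
    """Find e and r with m = 2**e + r and 0 <= r < 2**e."""
    e = m.bit_length() - 1
    return e, m - 2 ** e

def encode_index(k: int, e: int, r: int) -> str:
    """Fixed binary code for the 1-based symbol index k (used for first occurrences)."""
    if k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")  # (e + 1)-bit code
    return format(k - r - 1, f"0{e}b")      # e-bit code

e, r = vitter_params(26)                   # e = 4, r = 10 for a 26-letter alphabet
assert encode_index(1, e, r) == "00000"    # 1st letter: 5 bits
assert encode_index(22, e, r) == "1011"    # 22nd letter: 4 bits
```

Decoding inverts this: if the first e bits read are less than r, one more bit is consumed and 1 is added; otherwise r + 1 is added, matching the steps above.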
Huffman coding
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-
length codes to input characters; the lengths of the assigned codes are based on the
frequencies of the corresponding characters.
I. The variable-length codes assigned to input characters are prefix codes, meaning the
codes (bit sequences) are assigned in such a way that the code assigned to one
character is not the prefix of the code assigned to any other character.
II. This is how Huffman Coding makes sure that there is no ambiguity when decoding
the generated bitstream.
III. Let us understand prefix codes with a counter-example. Let there be four
characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0
and 1. This coding leads to ambiguity because the code assigned to c is the prefix of
the codes assigned to a and b. If the compressed bit stream is 0001, the
decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
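A minimal sketch of Huffman code construction using a priority queue (the frequencies are illustrative, and the helper name is ours):

```python
import heapq

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build prefix codes by repeatedly merging the two least frequent subtrees."""
    # Heap entries: (total frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}         # left subtree gets bit 0
        merged.update({s: "1" + c for s, c in right.items()})  # right subtree gets bit 1
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# Frequent symbols receive short codes; rare symbols receive long ones.
codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})  # e.g. "a" gets a 1-bit code
```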
Q8. What are the difficulties in arithmetic coding? Explain them.
ANSWER:
I. Arithmetic coding is the most efficient method to code symbols according
to the probability of their occurrence.
II. The average code length corresponds exactly to the possible minimum given by information
theory. Deviations caused by the bit-resolution of binary code trees do not exist.
III. In contrast to a binary Huffman code tree, arithmetic coding offers a clearly better
compression ratio.
IV. Its implementation, on the other hand, is more complex. In arithmetic coding, a message is
encoded as a real number in the interval between zero and one.
V. Arithmetic coding typically has a better compression ratio than Huffman coding, as it
produces a single codeword for the entire message rather than several separate codewords.
The main difficulties with arithmetic coding are:
1. One is that the whole codeword must be received before decoding of the symbols can start, and
if there is a corrupt bit in the codeword, the entire message could become corrupt.
2. Another is that there is a limit to the precision of the number which can be encoded, thus
limiting the number of symbols that can be encoded within a codeword.
3. There also exist many patents on arithmetic coding, so the use of some of the algorithms
may call for royalty fees.
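The interval-narrowing idea behind arithmetic coding can be sketched as follows; floating point is used for clarity, whereas real coders use integer arithmetic precisely because of the precision limit in point 2 (the function name and the source model are invented for illustration):

```python
def arithmetic_interval(message: str, probs: dict[str, float]) -> tuple[float, float]:
    """Narrow [0, 1) once per symbol; any number in the final interval encodes the message."""
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        cum = 0.0
        for s, p in probs.items():  # locate sym's slice of the current interval
            if s == sym:
                low, high = low + cum * span, low + (cum + p) * span
                break
            cum += p
    return low, high

# Illustrative source model: P(a) = 0.6, P(b) = 0.4
low, high = arithmetic_interval("aab", {"a": 0.6, "b": 0.4})  # [0.216, 0.36)
# Any value in [low, high), e.g. (low + high) / 2, can serve as the code value.
```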
SOURCE CODE vs OBJECT CODE
Language: Source code is written in a high-level language like C, C++, Java, Python, etc., or in
assembly language, whereas object code is written in machine language produced by a compiler,
assembler, or other translator.
Performance: The performance of source code is less than that of object code, as it is less close
to the machine; the performance of object code is more than that of source code, as it is closer
to the machine.
ANSWER:
MPEG-2 Compression
MPEG-2 video compression is the de facto standard for entertainment video. MPEG-2 video
compression is popular for a number of reasons:
Motion vector estimation is used to capture much of the change between video
frames, in the form of best approximations of each part of a frame as a translation
(generally due to motion) of a similar-sized piece of another video frame. Essentially,
there is a lot of temporal redundancy in video, which can be discarded. (The
term temporal redundancy is applied to information that is repeated from one frame
to another.)
Discrete cosine transform (DCT) is used to convert spatial information into frequency
information. This allows the encoder to discard information, corresponding to higher
video frequencies, which are less visible to the human eye.
Quantization is applied to the DCT coefficients of either original frames (in some
cases) or the DCT of the residual (after motion estimation) to restrict the set of
possible values transmitted by placing them into groups of values that are almost the
same (a small sketch of these two steps follows this list).
Huffman encoding uses short codes to describe common values and longer codes to
describe rarer values—this is a type of entropy coding.
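A rough sketch of the DCT-plus-quantization steps on a single 8×8 block; the naive O(N⁴) transform and the quantization step size are purely illustrative, not how a real MPEG-2 encoder is implemented:

```python
import math

N = 8  # MPEG-2 operates on 8x8 blocks

def dct2(block):
    """Naive 2-D DCT-II of an N x N block (O(N^4); for illustration only)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N)
                for y in range(N)
            )
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step=16):
    """Map coefficients into coarse groups; this is where information is discarded."""
    return [[round(val / step) for val in row] for row in coeffs]

block = [[128] * N for _ in range(N)]  # a flat gray block (illustrative input)
levels = quantize(dct2(block))         # only the DC coefficient is nonzero here
```

Higher-frequency coefficients quantize to zero for smooth blocks, which is exactly the information the encoder discards.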
The foregoing is a highly compressed summary of MPEG-2 video compression (with many
details omitted). However, there are so many excellent descriptions of MPEG compression
(see DTV: The Revolution in Electronic Imaging, by Jerry C. Whitaker; Digital Compression
for Multimedia: Principles and Standards, by Jerry D. Gibson and others; Testing Digital
Video, by Dragos Ruiu and others; and Modern Cable Television Technology: Video, Voice,
and Data Communications, by Walter Ciciora and others) that more description is not
justified here. Instead, the following sections concentrate on the most interesting aspects of
MPEG: