Huffman Coding
Huffman Coding
• Suppose we have to encode a text that comprises symbols from some n-symbol
alphabet by assigning to each of the text’s symbols some sequence of bits called
the codeword.
• For example, we can use a fixed-length encoding that assigns to
each symbol a bit string of the same length m (m ≥ log2 n). This is exactly what
the standard ASCII code does.
• Variable-length encoding, which assigns codewords of different lengths to
different symbols, introduces a problem that fixed-length encoding does not have.
Namely, how can we tell how many bits of an encoded text represent the first (or,
more generally, the ith) symbol? To avoid this complication, we can limit ourselves
to the so-called prefix-free (or simply prefix) codes.
Compression Technique
Message BCCABBDDAECCBBAEDDCC
LENGTH= 20
ASCII Code binary form
A 65 01000001
B 66 01000010
C 67
D 68
E 69
Total no. of bits for the message= 8x20 = 160 bits
Huffman Coding
• Huffman’s algorithm
Step 1 Initialize n one-node trees and label them with the symbols of the
alphabet given. Record the frequency of each symbol in its tree’s root
to indicate the tree’s weight. (More generally, the weight of a tree will
be equal to the sum of the frequencies in the tree’s leaves.)
Step 2 Repeat the following operation until a single tree is obtained. Find
two trees with the smallest weight (ties can be broken arbitrarily, but
see Problem 2 in this section’s exercises). Make them the left and right
subtree of a new tree and record the sum of their weights in the root
of the new tree as its weight.
A tree constructed by the above algorithm is called a Huffman tree. It
defines—in the manner described above—a Huffman code.
Fixed size Codes
Total no. of bits for the message= 3x20 = 60 bits
A 3 3/20 000
B 5 5/20 001
C 6 6/20 010
D 4 4/20 011
E 2 2/20 100
So for the table,
5x8 bits characters
5x3 bits codes
A 3 3/20 =.15
B 5 5/20=.25
C 6 6/20=.3
D 4 4/20=.2
E 2 2/20=.1
HUFFMAN CODING
Huffman Coding
Variable size Codes
Total no. of bits for the message= 3x20 = 60 bits
A 3 001
B 5 10
C 6 11
D 4 01
E 2 000
Huffman Coding