Huffman Coding 1



Huffman proposed an algorithm that assigns instantaneous codes to the source symbols si, with probabilities P(si).

Binary Huffman Coding Algorithm


For the design of binary Huffman codes, the algorithm is as follows:
1. Re-order the source symbols in decreasing order of symbol probability.
2. Successively reduce the source S to S1, then S2, and so on, by combining the two least probable symbols of Sj into a single combined symbol to form Sj+1, and re-ordering the new set of symbol probabilities in the same way. Keep track of the position of the combined symbol. Terminate the source reduction when a two-symbol source is produced.

3. Assign a compact code to the final reduced source. For a two-symbol source the trivial code is {0, 1}.
4. Backtrack to the original source S, assigning a compact code to each reduced source Sj along the way. The compact code assigned to S is the binary Huffman code.
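The steps above translate directly into a short program. The following is a minimal Python sketch of the binary Huffman construction (it is not taken from the slides); the five-symbol distribution is hypothetical, and a heap stands in for the explicit re-ordering of the probability table, with the 0/1 prefixes added while merging playing the role of the backtracking step.

import heapq

def huffman_code(probs):
    """Build a binary Huffman code for a {symbol: probability} mapping.

    Each heap entry is (group probability, tie-breaker, {symbol: partial codeword}).
    Popping and merging the two smallest entries is one source-reduction step;
    prepending '0'/'1' during the merge performs the backtracking code assignment.
    """
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable group
        p1, _, code1 = heapq.heappop(heap)   # next least probable group
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical five-symbol source, used only for illustration
probs = {"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}
code = huffman_code(probs)
print(code)
print(sum(probs[s] * len(code[s]) for s in probs))   # average length, about 2.2 bits/symbol

For this particular assumed distribution the resulting average length is 2.2 bits/symbol, which happens to match the figure quoted in Example 1 below; the slide's actual probability table is not reproduced in the text.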

Example 1

Consider a 5-symbol source with the following probability assignments. First construct the Huffman coding table using the algorithm.


Binary Huffman coding table


Construct binary Huffman coding tree



The binary Huffman code


Average length of the code is L = 2.2 bits/symbol, and the efficiency of the Huffman code is η = H(S)/L.
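Continuing with the hypothetical five-symbol distribution used in the sketch above (the slide's actual probability table is not in the text), the average length and efficiency can be checked numerically: with assumed probabilities {0.4, 0.2, 0.2, 0.1, 0.1} and Huffman lengths {2, 2, 2, 3, 3}, L = 2.2 bits/symbol and η = H(S)/L ≈ 96.5%.

from math import log2

# Assumed probabilities and Huffman codeword lengths (for illustration only)
probs   = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = [2, 2, 2, 3, 3]

L   = sum(p * l for p, l in zip(probs, lengths))   # average codeword length
H   = -sum(p * log2(p) for p in probs)             # source entropy, bits/symbol
eta = H / L                                        # code efficiency

print(f"L = {L:.2f} bits/symbol, H = {H:.3f} bits/symbol, efficiency = {eta:.1%}")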

Example 1 (Alternative method)

Different binary Huffman coding table


Binary Huffman coding tree




Average length of the code for the alternative method is also L = 2.2 bits/symbol, so the efficiency of the Huffman code is the same as before.

Some Comments on Huffman Codes

Equivalence of source coding and 20 questions:


“20 questions game”
• In the game one person chooses something (really anything)
and gives the category it belongs to (such as person, place or
thing). Then, the other player is allowed to ask up to 20
yes/no questions to try to determine what it is.
Suppose that we wish to find the most efficient series of yes–
no questions to determine an object from a class of objects.
Assuming that we know the probability distribution on the
objects, can we find the most efficient sequence of questions?


A sequence of questions is equivalent to a code for the object. Any question depends only on the answers to the questions before it.

Since the sequence of answers uniquely determines the object, each object has a different sequence of answers, and if we represent the yes–no answers by 0’s and 1’s, we have a binary code for the set of objects.


The average length of this code is the average number of questions for the questioning scheme.

The first question in this scheme becomes: Is the first bit equal to 1 in the object’s codeword?


Since the Huffman code is the best source code for a random variable, the optimal series of questions is that determined by the Huffman code. In Example 5.6.1 the optimal first question is: Is X equal to 2 or 3? The answer to this determines the first bit of the Huffman code.

Assuming that the answer to the first question is “yes,” the next question should be “Is X equal to 3?”, which determines the second bit.


However, we need not wait for the answer to the first question to ask the second. We can ask as our second question “Is X equal to 1 or 3?”, determining the second bit of the Huffman code independent of the first.


The expected number of questions E(Q) in this optimal scheme satisfies H(X) ≤ E(Q) < H(X) + 1.
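To make the equivalence concrete, here is a small sketch using a made-up four-symbol prefix-free codebook (the codewords are hypothetical, not from the slides). Each yes/no question asks whether bit i of the object's codeword equals 1, so the number of questions needed to identify an object equals its codeword length, and the expected number of questions equals the average codeword length, which for a Huffman code lies between H(X) and H(X) + 1.

# Hypothetical prefix-free codebook, e.g. as produced by a Huffman construction
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}

def identify(secret):
    """Identify `secret` by asking bit-by-bit questions about its codeword."""
    candidates = set(codebook)
    questions = 0
    i = 0
    while len(candidates) > 1:
        answer = codebook[secret][i] == "1"   # "Is bit i of the codeword equal to 1?"
        questions += 1
        candidates = {s for s in candidates
                      if len(codebook[s]) > i and (codebook[s][i] == "1") == answer}
        i += 1
    return candidates.pop(), questions

for sym in codebook:
    print(sym, identify(sym), len(codebook[sym]))   # questions asked == codeword length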

Huffman coding for weighted codewords:

Huffman’s algorithm for minimizing Σ pi li can be applied to any set of numbers pi ≥ 0, regardless of whether they sum to 1. In this case, the Huffman code minimizes the sum of weighted code lengths Σ wi li rather than the average code length.


We perform the weighted minimization using the same algorithm. In this case the code minimizes the weighted sum of the codeword lengths; for the example considered, the minimum weighted sum is 36.
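The weight table behind the value 36 is not reproduced in the text, so the weights in this sketch are made up; they only show how the same merge procedure applies to numbers that do not sum to 1, with the quantity being minimized now the weighted sum of codeword lengths rather than an average.

import heapq

def weighted_codeword_lengths(weights):
    """Huffman-style merging applied to arbitrary nonnegative weights.

    Returns the codeword length assigned to each weight; the construction
    minimizes sum(w * l) even when the weights do not sum to 1.
    """
    heap = [(w, i, [i]) for i, w in enumerate(weights)]   # (weight, tie-break, member indices)
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    counter = len(weights)
    while len(heap) > 1:
        w0, _, m0 = heapq.heappop(heap)
        w1, _, m1 = heapq.heappop(heap)
        for i in m0 + m1:                 # every member of a merged group gets one more bit
            lengths[i] += 1
        heapq.heappush(heap, (w0 + w1, counter, m0 + m1))
        counter += 1
    return lengths

weights = [5, 4, 3, 2, 1]                 # hypothetical weights, not the slide's table
lengths = weighted_codeword_lengths(weights)
print(lengths, sum(w * l for w, l in zip(weights, lengths)))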

Huffman codes and Shannon codes:

The Shannon code assigns codeword lengths l(x) = ⌈log2 1/p(x)⌉, which can be much worse than the optimal code length for some particular symbol.

For example, consider two symbols with probabilities 0.9999 and 0.0001. The Shannon code lengths for the two symbols are 1 bit and 14 bits, respectively.

The optimal codeword length is obviously 1 bit for both symbols.
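A quick check of those two Shannon code lengths (a short sketch, not part of the slides):

from math import ceil, log2

for p in (0.9999, 0.0001):
    print(p, ceil(log2(1 / p)))   # Shannon length: 1 bit for 0.9999, 14 bits for 0.0001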


Hence, the codeword for the infrequent symbol is much longer in the Shannon code than in the optimal code.

Is it true that the codeword lengths for an optimal code are always less than ⌈log2 1/p(x)⌉?

The following example illustrates that this is not always true.


Example 5.7.3 Consider a random variable X with the distribution (1/3, 1/3, 1/4, 1/12). The Huffman coding procedure results in codeword lengths of (2, 2, 2, 2) or (1, 2, 3, 3). Both these codes achieve the same expected codeword length.
In the second code, the third symbol has length 3, which is greater than ⌈log2 1/p3⌉ = ⌈log2 4⌉ = 2. Thus, the codeword length for a Shannon code could be less than the codeword length of the corresponding symbol of an optimal (Huffman) code.
This example also illustrates the fact that the set of codeword lengths for an optimal code is not unique.
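Both claims in the example are easy to verify numerically; a small sketch (not from the slides):

from math import ceil, log2
from fractions import Fraction

p = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]

for lengths in [(2, 2, 2, 2), (1, 2, 3, 3)]:
    expected = sum(pi * li for pi, li in zip(p, lengths))
    print(lengths, expected)        # both assignments give expected length 2

# Shannon length for the third symbol vs. its length of 3 in the second code
print(ceil(log2(1 / p[2])))         # 2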
Optimality of Huffman Codes

To prove the optimality of Huffman codes, we first prove some properties of a particular optimal code.

Without loss of generality, we will assume that the probability masses are ordered, so that p1 ≥ p2 ≥ · · · ≥ pm.

Recall that a code is optimal if its expected length Σ pi li is minimal.


Lemma 5.8.1: For any distribution, there exists an optimal instantaneous code (with minimum expected length) that satisfies the following properties:
1. The lengths are ordered inversely with the probabilities (i.e., if pj > pk, then lj ≤ lk).
2. The two longest codewords have the same length.
3. Two of the longest codewords differ only in the last bit and correspond to the two least likely symbols.


Proof: The proof amounts to swapping, trimming, and rearranging, as shown in the figure: (a) a possible instantaneous code; (b) by trimming branches without siblings, we improve the code.

(c) We now rearrange the tree so that the word lengths are ordered by increasing length from top to bottom. (d) Finally, we swap probability assignments to improve the expected depth of the tree.

Proof: Consider an optimal code Cm.

1. Suppose pj > pk. Consider C′m, with the codewords of symbols j and k of Cm interchanged. Then

L(C′m) − L(Cm) = Σ pi l′i − Σ pi li = (pj − pk)(lk − lj).

We have pj − pk > 0, and since Cm is optimal, L(C′m) − L(Cm) ≥ 0. Hence, we must have lk ≥ lj. Thus, Cm itself satisfies property 1.

2. The two longest codewords are of the same length.


If the two longest codewords are not of the same length, one
can delete the last bit of the longer one, preserving the prefix
property and achieving lower expected codeword length.
Hence, the two longest codewords must have the same
length.


3. The two longest codewords differ only in the last bit and
correspond to the two least likely symbols.
• If there is a maximal-length codeword without a sibling, we can
delete the last bit of the codeword and still satisfy the prefix
property. This reduces the average codeword length and
contradicts the optimality
• Hence, every maximal-length codeword in any optimal code
has a sibling.
• Now we can exchange the longest codewords so that the two
lowest-probability source symbols are associated with two
siblings on the tree. This does not change the expected length.
Thus, the codewords for the two lowest-probability source
symbols have maximal length and agree in all but the last bit.

Canonical codes: the codes that satisfy the properties of the previous lemma.

For any probability mass function for an alphabet of size m, p = (p1, p2, . . . , pm) with p1 ≥ p2 ≥ · · · ≥ pm, we define the Huffman reduction p′ = (p1, p2, . . . , pm−2, pm−1 + pm) over an alphabet of size m − 1, as shown in the figure.


Let C∗m−1(p′) be an optimal code for p′, and let C∗m(p) be the canonical optimal code for p.

The proof of optimality will follow from two constructions: first, we expand an optimal code for p′ to construct a code for p, and then we condense an optimal canonical code for p to construct a code for the Huffman reduction p′.

Comparing the average codeword lengths for the two codes establishes that the optimal code for p can be obtained by extending the optimal code for p′.


From the optimal code for p′, we construct an extension code for m elements as follows:

Take the codeword in C∗m−1(p′) corresponding to the weight pm−1 + pm and extend it by adding a 0 to form a codeword for symbol m − 1 and by adding a 1 to form a codeword for symbol m.

The code construction is illustrated in the figure.



Calculation of the average length of the extended code gives L(p) = L∗(p′) + pm−1 + pm.


Similarly, from the canonical code for p, we construct a code for p′ by merging the codewords for the two lowest-probability symbols. The new code for p′ has average length L(p′) = L∗(p) − pm−1 − pm.


Adding the above two equations, we get

L(p′) + L(p) = L∗(p′) + L∗(p),

or

(L(p′) − L∗(p′)) + (L(p) − L∗(p)) = 0.


Now, since L∗(p′) is the optimal length for p′, we have L(p′) − L∗(p′) ≥ 0.

Similarly, the length of the extension of the optimal code for p′ has to have an average length at least as large as the optimal code for p [i.e., L(p) − L∗(p) ≥ 0].

But the sum of two nonnegative terms can only be 0 if both of them are 0, which implies that L(p) = L∗(p) (i.e., the extension of the optimal code for p′ is optimal for p).

Consequently, if we start with an optimal code for p′ with m − 1 symbols and construct a code for m symbols by extending the codeword corresponding to pm−1 + pm, the new code is also optimal. Starting with a code for two elements, for which the optimal code is obvious, we can by induction extend this result to prove the following theorem.

Theorem 5.8.1 Huffman coding is optimal; that is, if C∗ is a Huffman code and C is any other uniquely decodable code, L(C∗) ≤ L(C).
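As an illustration of the theorem (not from the slides), the distribution of Example 5.7.3 can be reused to compare the optimal (Huffman) average length with that of a Shannon code, a particular uniquely decodable alternative; the Huffman average is never larger.

from math import ceil, log2
from fractions import Fraction

# Distribution and one set of optimal codeword lengths from Example 5.7.3
p = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]
huffman_lengths = [1, 2, 3, 3]
shannon_lengths = [ceil(log2(1 / pi)) for pi in p]   # lengths of a Shannon code

L_star    = sum(pi * li for pi, li in zip(p, huffman_lengths))
L_shannon = sum(pi * li for pi, li in zip(p, shannon_lengths))
print(L_star, L_shannon, L_star <= L_shannon)        # 2, 13/6, True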

Shannon-Fano-Elias Coding

Without loss of generality, we can take X = {1, 2, . . . , m}. Assume that p(x) > 0 for all x. The cumulative distribution function F(x) is defined as F(x) = Σ p(a), where the sum is over all symbols a ≤ x.

Consider the modified cumulative distribution function F̄(x) = F(x − 1) + p(x)/2, the sum of the probabilities of all symbols less than x plus half the probability of x itself.


Since all the probabilities are positive, F̄(a) ≠ F̄(b) if a ≠ b, and hence we can determine x if we know F̄(x). Merely look at the graph of the cumulative distribution function and find the corresponding x. Thus, the value of F̄(x) can be used as a code for x.

In general, F(x) is a real number expressible only by an


infinite number of bits. So it is not efficient to use the exact
value of  F(x) as a code for x. If we use an approximate
value, what is the required accuracy?


Assume that we truncate F̄(x) to l(x) bits, denoted by ⌊F̄(x)⌋l(x). Thus, we use the first l(x) bits of F̄(x) as a code for x. By definition of rounding off, we have

F̄(x) − ⌊F̄(x)⌋l(x) < 1/2^l(x).

If we take l(x) = ⌈log2 1/p(x)⌉ + 1, then 1/2^l(x) < p(x)/2 = F̄(x) − F(x − 1), and therefore ⌊F̄(x)⌋l(x) lies within the step corresponding to x. Thus, l(x) bits are sufficient to describe x.

Now we need to prove that the codewords are prefix-free (i.e., no complete codeword of the code is a prefix of some other codeword).

To check whether the code is prefix-free, we consider each codeword z1z2 . . . zl(x) to represent not a point but the interval [0.z1z2 . . . zl(x), 0.z1z2 . . . zl(x) + 1/2^l(x)).

Now the code is prefix-free if and only if the intervals corresponding to codewords are disjoint.


The interval corresponding to any codeword has length 2^−l(x), which is less than half the height of the step corresponding to x, by the bound 1/2^l(x) < p(x)/2 above. The lower end of the interval is in the lower half of the step. Thus, the upper end of the interval lies below the top of the step.

Therefore, the intervals corresponding to different codewords are disjoint and the code is prefix-free.
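The construction described above is short enough to sketch directly. In the following Python sketch, the function name and the sample distribution are assumptions of this write-up, not taken from the slides; it computes F̄(x), chooses l(x) = ⌈log2 1/p(x)⌉ + 1, and truncates the binary expansion of F̄(x) to l(x) bits, using floating-point arithmetic for simplicity.

from math import ceil, log2

def sfe_code(probs):
    """Shannon-Fano-Elias codewords for an ordered list of symbol probabilities."""
    codewords = []
    F = 0.0                                  # total probability of earlier symbols
    for p in probs:
        fbar = F + p / 2                     # modified CDF: midpoint of this symbol's step
        l = ceil(log2(1 / p)) + 1            # codeword length l(x)
        bits, frac = "", fbar
        for _ in range(l):                   # binary expansion of fbar, truncated to l bits
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        codewords.append(bits)
        F += p
    return codewords

# Hypothetical distribution, used only to exercise the construction
print(sfe_code([0.25, 0.5, 0.125, 0.125]))   # ['001', '10', '1101', '1111'], a prefix-free code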


Since we use l(x) = ⌈log2 1/p(x)⌉ + 1 bits to represent x, the expected length of this code is

L = Σx p(x) l(x) = Σx p(x) (⌈log2 1/p(x)⌉ + 1) < H(X) + 2.

Thus, this coding scheme achieves an average codeword length that is within 2 bits of the entropy.


Example 5.9.1 We first consider an example where all the probabilities are dyadic. We construct the code in the following table:


In this case, the average codeword length is 2.75 bits and the
entropy is 1.75 bits. The Huffman code for this case achieves the
entropy bound. Looking at the codewords, it is obvious that there
is some inefficiency—for example, the last bit of the last two
codewords can be omitted. But if we remove the last bit from all
the codewords, the code is no longer prefix-free.
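The numbers quoted above are easy to reproduce. The example's table is not in the text, so the dyadic distribution in the sketch below is an assumption, chosen to be consistent with the stated entropy of 1.75 bits and average codeword length of 2.75 bits.

from math import ceil, log2

p = [0.25, 0.5, 0.125, 0.125]                    # assumed dyadic distribution
lengths = [ceil(log2(1 / px)) + 1 for px in p]   # l(x) = ceil(log2 1/p(x)) + 1

H = -sum(px * log2(px) for px in p)
L = sum(px * lx for px, lx in zip(p, lengths))
print(lengths, H, L)                             # lengths [3, 2, 4, 4]; H = 1.75; L = 2.75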


Example 5.9.2 We now give another example of the construction of the Shannon–Fano–Elias code. In this case, since the distribution is not dyadic, the binary representation of F̄(x) may have an infinite number of bits. We denote the repeating expansion 0.01010101 . . . by 0.01 with a bar over the repeating block. We construct the code in the following table:
