Huffman Coding 1



Huffman proposed an algorithm that assigns instantaneous codes to the source symbols si, with probabilities P(si).

Binary Huffman Coding Algorithm


For the design of binary Huffman codes, the algorithm is as follows:
1. Re-order the source symbols in decreasing order of symbol probability.
2. Successively reduce the source S to S1, then S2, and so on, by combining the two least probable symbols of Sj into a single combined symbol to form Sj+1, and re-ordering the new set of symbol probabilities in the same way. Keep track of the position of the combined symbol. Terminate the source reduction when a two-symbol source is produced.

3. Assign a compact code to the final reduced source. For a two-symbol source the trivial code is {0, 1}.
4. Backtrack to the original source S, assigning a compact code to each reduced source Sj along the way. The compact code assigned to S is the binary Huffman code.
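The steps above translate directly into a short program. The following is a minimal Python sketch of the binary Huffman construction (it is not taken from the slides); the five-symbol distribution is hypothetical, and a heap stands in for the explicit re-ordering of the probability table, with the 0/1 prefixes added while merging playing the role of the backtracking step.

import heapq

def huffman_code(probs):
    """Build a binary Huffman code for a {symbol: probability} mapping.

    Each heap entry is (group probability, tie-breaker, {symbol: partial codeword}).
    Popping and merging the two smallest entries is one source-reduction step;
    prepending '0'/'1' during the merge performs the backtracking code assignment.
    """
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable group
        p1, _, code1 = heapq.heappop(heap)   # next least probable group
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical five-symbol source, used only for illustration
probs = {"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}
code = huffman_code(probs)
print(code)
print(sum(probs[s] * len(code[s]) for s in probs))   # average length, about 2.2 bits/symbol

For this particular assumed distribution the resulting average length is 2.2 bits/symbol, which happens to match the figure quoted in Example 1 below; the slide's actual probability table is not reproduced in the text.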

Example 1

Consider a 5-symbol source with the following probability assignments. First construct the Huffman coding table using the algorithm.


Binary Huffman coding table


Construct binary Huffman coding tree



The binary Huffman code


Average length of the code is L = 2.2 bits/symbol, and the efficiency of the Huffman code is η = H(S)/L.
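Continuing with the hypothetical five-symbol distribution used in the sketch above (the slide's actual probability table is not in the text), the average length and efficiency can be checked numerically: with assumed probabilities {0.4, 0.2, 0.2, 0.1, 0.1} and Huffman lengths {2, 2, 2, 3, 3}, L = 2.2 bits/symbol and η = H(S)/L ≈ 96.5%.

from math import log2

# Assumed probabilities and Huffman codeword lengths (for illustration only)
probs   = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = [2, 2, 2, 3, 3]

L   = sum(p * l for p, l in zip(probs, lengths))   # average codeword length
H   = -sum(p * log2(p) for p in probs)             # source entropy, bits/symbol
eta = H / L                                        # code efficiency

print(f"L = {L:.2f} bits/symbol, H = {H:.3f} bits/symbol, efficiency = {eta:.1%}")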

Example 1 (Alternative method)

Different binary Huffman coding table


Binary Huffman coding tree




Average length of the code for the alternative method is also L = 2.2 bits/symbol, so the efficiency of the Huffman code is the same as before.

Some Comments on Huffman Codes

Equivalence of source coding and 20 questions:


“20 questions game”
• In the game one person chooses something (really anything)
and gives the category it belongs to (such as person, place or
thing). Then, the other player is allowed to ask up to 20
yes/no questions to try to determine what it is.
Suppose that we wish to find the most efficient series of yes–
no questions to determine an object from a class of objects.
Assuming that we know the probability distribution on the
objects, can we find the most efficient sequence of questions?


A sequence of questions is equivalent to a code for the object. Any question depends only on the answers to the questions before it.

Since the sequence of answers uniquely determines the object, each object has a different sequence of answers, and if we represent the yes–no answers by 0’s and 1’s, we have a binary code for the set of objects.


The average length of this code is the average number of questions for the questioning scheme.

The first question in this scheme becomes: Is the first bit equal to 1 in the object’s codeword?


Since the Huffman code is the best source code for a random variable, the optimal series of questions is that determined by the Huffman code. In Example 5.6.1 the optimal first question is: Is X equal to 2 or 3? The answer to this determines the first bit of the Huffman code.

Assuming that the answer to the first question is “yes,” the next question should be “Is X equal to 3?”, which determines the second bit.


However, we need not wait for the answer to the first question to ask the second. We can ask as our second question “Is X equal to 1 or 3?”, determining the second bit of the Huffman code independent of the first.


The expected number of questions E(Q) in this optimal scheme satisfies H(X) ≤ E(Q) < H(X) + 1.
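To make the equivalence concrete, here is a small sketch using a made-up four-symbol prefix-free codebook (the codewords are hypothetical, not from the slides). Each yes/no question asks whether bit i of the object's codeword equals 1, so the number of questions needed to identify an object equals its codeword length, and the expected number of questions equals the average codeword length, which for a Huffman code lies between H(X) and H(X) + 1.

# Hypothetical prefix-free codebook, e.g. as produced by a Huffman construction
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}

def identify(secret):
    """Identify `secret` by asking bit-by-bit questions about its codeword."""
    candidates = set(codebook)
    questions = 0
    i = 0
    while len(candidates) > 1:
        answer = codebook[secret][i] == "1"   # "Is bit i of the codeword equal to 1?"
        questions += 1
        candidates = {s for s in candidates
                      if len(codebook[s]) > i and (codebook[s][i] == "1") == answer}
        i += 1
    return candidates.pop(), questions

for sym in codebook:
    print(sym, identify(sym), len(codebook[sym]))   # questions asked == codeword length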

Huffman coding for weighted codewords:

Huffman’s algorithm for minimizing Σ pi li can be applied to any set of numbers pi ≥ 0, regardless of whether they sum to 1. In this case, the Huffman code minimizes the sum of weighted code lengths Σ wi li rather than the average code length.


We perform the weighted minimization using the same algorithm. In this case the code minimizes the weighted sum of the codeword lengths; for the example considered, the minimum weighted sum is 36.
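The weight table behind the value 36 is not reproduced in the text, so the weights in this sketch are made up; they only show how the same merge procedure applies to numbers that do not sum to 1, with the quantity being minimized now the weighted sum of codeword lengths rather than an average.

import heapq

def weighted_codeword_lengths(weights):
    """Huffman-style merging applied to arbitrary nonnegative weights.

    Returns the codeword length assigned to each weight; the construction
    minimizes sum(w * l) even when the weights do not sum to 1.
    """
    heap = [(w, i, [i]) for i, w in enumerate(weights)]   # (weight, tie-break, member indices)
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    counter = len(weights)
    while len(heap) > 1:
        w0, _, m0 = heapq.heappop(heap)
        w1, _, m1 = heapq.heappop(heap)
        for i in m0 + m1:                 # every member of a merged group gets one more bit
            lengths[i] += 1
        heapq.heappush(heap, (w0 + w1, counter, m0 + m1))
        counter += 1
    return lengths

weights = [5, 4, 3, 2, 1]                 # hypothetical weights, not the slide's table
lengths = weighted_codeword_lengths(weights)
print(lengths, sum(w * l for w, l in zip(weights, lengths)))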

Huffman codes and Shannon codes:

The Shannon code assigns codeword lengths l(x) = ⌈log2 1/p(x)⌉, which can be much worse than the optimal code length for some particular symbol.

For example, consider two symbols with probabilities 0.9999 and 0.0001. The Shannon code lengths for the two symbols are 1 bit and 14 bits, respectively.

The optimal codeword length is obviously 1 bit for both symbols.
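A quick check of those two Shannon code lengths (a short sketch, not part of the slides):

from math import ceil, log2

for p in (0.9999, 0.0001):
    print(p, ceil(log2(1 / p)))   # Shannon length: 1 bit for 0.9999, 14 bits for 0.0001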


Hence, the codeword for the infrequent symbol is much longer in the Shannon code than in the optimal code.

Is it true that the codeword lengths for an optimal code are always less than ⌈log2 1/p(x)⌉?

The following example illustrates that this is not always true.


Example 5.7.3 Consider a random variable X with the distribution (1/3, 1/3, 1/4, 1/12). The Huffman coding procedure results in codeword lengths of (2, 2, 2, 2) or (1, 2, 3, 3). Both these codes achieve the same expected codeword length.
In the second code, the third symbol has length 3, which is greater than ⌈log2 1/p3⌉ = ⌈log2 4⌉ = 2. Thus, the codeword length for a Shannon code could be less than the codeword length of the corresponding symbol of an optimal (Huffman) code.
This example also illustrates the fact that the set of codeword lengths for an optimal code is not unique.
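Both claims in the example are easy to verify numerically; a small sketch (not from the slides):

from math import ceil, log2
from fractions import Fraction

p = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]

for lengths in [(2, 2, 2, 2), (1, 2, 3, 3)]:
    expected = sum(pi * li for pi, li in zip(p, lengths))
    print(lengths, expected)        # both assignments give expected length 2

# Shannon length for the third symbol vs. its length of 3 in the second code
print(ceil(log2(1 / p[2])))         # 2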
Optimality of Huffman Codes

To prove the optimality of Huffman codes, we first prove some properties of a particular optimal code.

Without loss of generality, we will assume that the probability masses are ordered, so that p1 ≥ p2 ≥ · · · ≥ pm.

Recall that a code is optimal if its expected length Σ pi li is minimal.


Lemma 5.8.1: For any distribution, there exists an optimal instantaneous code (with minimum expected length) that satisfies the following properties:
1. The lengths are ordered inversely with the probabilities (i.e., if pj > pk, then lj ≤ lk).
2. The two longest codewords have the same length.
3. Two of the longest codewords differ only in the last bit and correspond to the two least likely symbols.


Proof: The proof amounts to swapping, trimming, and rearranging, as shown in the figure: (a) a possible instantaneous code; (b) by trimming branches without siblings, we improve the code.

(c) We now rearrange the tree so that the word lengths are ordered by increasing length from top to bottom. (d) Finally, we swap probability assignments to improve the expected depth of the tree.

Proof: Consider an optimal code Cm.

1. Suppose pj > pk. Consider C′m, with the codewords of symbols j and k of Cm interchanged. Then

L(C′m) − L(Cm) = Σ pi l′i − Σ pi li = (pj − pk)(lk − lj).

We have pj − pk > 0, and since Cm is optimal, L(C′m) − L(Cm) ≥ 0. Hence, we must have lk ≥ lj. Thus, Cm itself satisfies property 1.

2. The two longest codewords are of the same length.


If the two longest codewords are not of the same length, one
can delete the last bit of the longer one, preserving the prefix
property and achieving lower expected codeword length.
Hence, the two longest codewords must have the same
length.


3. The two longest codewords differ only in the last bit and
correspond to the two least likely symbols.
• If there is a maximal-length codeword without a sibling, we can
delete the last bit of the codeword and still satisfy the prefix
property. This reduces the average codeword length and
contradicts the optimality
• Hence, every maximal-length codeword in any optimal code
has a sibling.
• Now we can exchange the longest codewords so that the two
lowest-probability source symbols are associated with two
siblings on the tree. This does not change the expected length.
Thus, the codewords for the two lowest-probability source
symbols have maximal length and agree in all but the last bit.

Canonical codes: the codes that satisfy the properties of the previous lemma.

For any probability mass function for an alphabet of size m, p = (p1, p2, . . . , pm) with p1 ≥ p2 ≥ · · · ≥ pm, we define the Huffman reduction p′ = (p1, p2, . . . , pm−2, pm−1 + pm) over an alphabet of size m − 1, as shown in the figure.


Let C∗m−1(p′) be an optimal code for p′, and let C∗m(p) be the canonical optimal code for p.

The proof of optimality will follow from two constructions: first, we expand an optimal code for p′ to construct a code for p, and then we condense an optimal canonical code for p to construct a code for the Huffman reduction p′.

Comparing the average codeword lengths for the two codes establishes that the optimal code for p can be obtained by extending the optimal code for p′.


From the optimal code for p′, we construct an extension code for m elements as follows:

Take the codeword in C∗m−1(p′) corresponding to the weight pm−1 + pm and extend it by adding a 0 to form a codeword for symbol m − 1 and by adding a 1 to form a codeword for symbol m.

The code construction is illustrated in the figure.



Calculation of the average length of the extended code gives L(p) = L∗(p′) + pm−1 + pm.


Similarly, from the canonical code for p, we construct a code for p′ by merging the codewords for the two lowest-probability symbols. The new code for p′ has average length L(p′) = L∗(p) − pm−1 − pm.


Adding the above two equations, we get

L(p′) + L(p) = L∗(p′) + L∗(p),

or

(L(p′) − L∗(p′)) + (L(p) − L∗(p)) = 0.


Now, since L∗(p′) is the optimal length for p′, we have L(p′) − L∗(p′) ≥ 0.

Similarly, the length of the extension of the optimal code for p′ has to have an average length at least as large as the optimal code for p [i.e., L(p) − L∗(p) ≥ 0].

But the sum of two nonnegative terms can only be 0 if both of them are 0, which implies that L(p) = L∗(p) (i.e., the extension of the optimal code for p′ is optimal for p).

Consequently, if we start with an optimal code for p′ with m − 1 symbols and construct a code for m symbols by extending the codeword corresponding to pm−1 + pm, the new code is also optimal. Starting with a code for two elements, for which the optimal code is obvious, we can by induction extend this result to prove the following theorem.

Theorem 5.8.1 Huffman coding is optimal; that is, if C∗ is a Huffman code and C is any other uniquely decodable code, L(C∗) ≤ L(C).
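As an illustration of the theorem (not from the slides), the distribution of Example 5.7.3 can be reused to compare the optimal (Huffman) average length with that of a Shannon code, a particular uniquely decodable alternative; the Huffman average is never larger.

from math import ceil, log2
from fractions import Fraction

# Distribution and one set of optimal codeword lengths from Example 5.7.3
p = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]
huffman_lengths = [1, 2, 3, 3]
shannon_lengths = [ceil(log2(1 / pi)) for pi in p]   # lengths of a Shannon code

L_star    = sum(pi * li for pi, li in zip(p, huffman_lengths))
L_shannon = sum(pi * li for pi, li in zip(p, shannon_lengths))
print(L_star, L_shannon, L_star <= L_shannon)        # 2, 13/6, True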

Shannon-Fano-Elias Coding

Without loss of generality, we can take X = {1, 2, . . . , m}. Assume that p(x) > 0 for all x. The cumulative distribution function F(x) is defined as F(x) = Σ p(a), where the sum is over all symbols a ≤ x.

Consider the modified cumulative distribution function F̄(x) = F(x − 1) + p(x)/2, the sum of the probabilities of all symbols less than x plus half the probability of x itself.


Since all the probabilities are positive, F̄(a) ≠ F̄(b) if a ≠ b, and hence we can determine x if we know F̄(x). Merely look at the graph of the cumulative distribution function and find the corresponding x. Thus, the value of F̄(x) can be used as a code for x.

In general, F(x) is a real number expressible only by an


infinite number of bits. So it is not efficient to use the exact
value of  F(x) as a code for x. If we use an approximate
value, what is the required accuracy?


Assume that we truncate F̄(x) to l(x) bits, denoted by ⌊F̄(x)⌋l(x). Thus, we use the first l(x) bits of F̄(x) as a code for x. By definition of rounding off, we have

F̄(x) − ⌊F̄(x)⌋l(x) < 1/2^l(x).

If we take l(x) = ⌈log2 1/p(x)⌉ + 1, then 1/2^l(x) < p(x)/2 = F̄(x) − F(x − 1), and therefore ⌊F̄(x)⌋l(x) lies within the step corresponding to x. Thus, l(x) bits are sufficient to describe x.

Now we need to prove that the codewords are prefix-free (i.e., no complete codeword of the code is a prefix of some other codeword).

To check whether the code is prefix-free, we consider each codeword z1z2 . . . zl(x) to represent not a point but the interval [0.z1z2 . . . zl(x), 0.z1z2 . . . zl(x) + 1/2^l(x)).

Now the code is prefix-free if and only if the intervals corresponding to codewords are disjoint.


The interval corresponding to any codeword has length 2^−l(x), which is less than half the height of the step corresponding to x, by the bound 1/2^l(x) < p(x)/2 above. The lower end of the interval is in the lower half of the step. Thus, the upper end of the interval lies below the top of the step.

Therefore, the intervals corresponding to different codewords are disjoint and the code is prefix-free.
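The construction described above is short enough to sketch directly. In the following Python sketch, the function name and the sample distribution are assumptions of this write-up, not taken from the slides; it computes F̄(x), chooses l(x) = ⌈log2 1/p(x)⌉ + 1, and truncates the binary expansion of F̄(x) to l(x) bits, using floating-point arithmetic for simplicity.

from math import ceil, log2

def sfe_code(probs):
    """Shannon-Fano-Elias codewords for an ordered list of symbol probabilities."""
    codewords = []
    F = 0.0                                  # total probability of earlier symbols
    for p in probs:
        fbar = F + p / 2                     # modified CDF: midpoint of this symbol's step
        l = ceil(log2(1 / p)) + 1            # codeword length l(x)
        bits, frac = "", fbar
        for _ in range(l):                   # binary expansion of fbar, truncated to l bits
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        codewords.append(bits)
        F += p
    return codewords

# Hypothetical distribution, used only to exercise the construction
print(sfe_code([0.25, 0.5, 0.125, 0.125]))   # ['001', '10', '1101', '1111'], a prefix-free code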


Since we use l(x) = ⌈log2 1/p(x)⌉ + 1 bits to represent x, the expected length of this code is

L = Σx p(x) l(x) = Σx p(x) (⌈log2 1/p(x)⌉ + 1) < H(X) + 2.

Thus, this coding scheme achieves an average codeword length that is within 2 bits of the entropy.


Example 5.9.1 We first consider an example where all the probabilities are dyadic. We construct the code in the following table:


In this case, the average codeword length is 2.75 bits and the
entropy is 1.75 bits. The Huffman code for this case achieves the
entropy bound. Looking at the codewords, it is obvious that there
is some inefficiency—for example, the last bit of the last two
codewords can be omitted. But if we remove the last bit from all
the codewords, the code is no longer prefix-free.
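The numbers quoted above are easy to reproduce. The example's table is not in the text, so the dyadic distribution in the sketch below is an assumption, chosen to be consistent with the stated entropy of 1.75 bits and average codeword length of 2.75 bits.

from math import ceil, log2

p = [0.25, 0.5, 0.125, 0.125]                    # assumed dyadic distribution
lengths = [ceil(log2(1 / px)) + 1 for px in p]   # l(x) = ceil(log2 1/p(x)) + 1

H = -sum(px * log2(px) for px in p)
L = sum(px * lx for px, lx in zip(p, lengths))
print(lengths, H, L)                             # lengths [3, 2, 4, 4]; H = 1.75; L = 2.75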


Example 5.9.2 We now give another example of the construction of the Shannon–Fano–Elias code. In this case, since the distribution is not dyadic, the binary representation of F̄(x) may have an infinite number of bits. We denote the repeating expansion 0.01010101 . . . by 0.01 with a bar over the repeating block. We construct the code in the following table:
