HW 2 Sol
1. Huffman coding.
Consider the random variable
    Symbol x:   x1     x2     x3     x4     x5     x6     x7
    p(x):       0.50   0.26   0.11   0.04   0.04   0.03   0.02
(a) Find a binary Huffman code for X.
(b) Find the expected codelength for this encoding.
(c) Extend the binary Huffman method to a ternary alphabet (an alphabet of size 3)
and apply it to X.
Solution: Huffman coding.
(a) The Huffman tree for this distribution is
Codeword  Symbol  Probability (after successive merges)
1         x1      0.50  0.50  0.50  0.50  0.50  0.50  1.00
01        x2      0.26  0.26  0.26  0.26  0.26  0.50
001       x3      0.11  0.11  0.11  0.11  0.24
00011     x4      0.04  0.04  0.08  0.13
00010     x5      0.04  0.04  0.05
00001     x6      0.03  0.05
00000     x7      0.02
(b) The expected length of the codewords for the binary Huffman code
is 2 bits. (H(X) = 1.99 bits)
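For concreteness, here is a small Python sketch (an illustration, not part of the original solution) that rebuilds a binary Huffman code for this pmf with the standard heap-based construction and checks the expected length; the exact codewords depend on tie-breaking, but the lengths match the table above.

import heapq

def huffman_code(pmf):
    # heap entries: (probability, tie-breaker, list of symbols in the subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    code = {s: "" for s in pmf}
    tie = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # smallest probability: prepend '0'
        p1, _, group1 = heapq.heappop(heap)   # second smallest:      prepend '1'
        for s in group0:
            code[s] = "0" + code[s]
        for s in group1:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p0 + p1, tie, group0 + group1))
        tie += 1
    return code

pmf = {"x1": 0.50, "x2": 0.26, "x3": 0.11, "x4": 0.04,
       "x5": 0.04, "x6": 0.03, "x7": 0.02}
code = huffman_code(pmf)
L = sum(p * len(code[s]) for s, p in pmf.items())
print(code)
print(L)   # approximately 2.0 bits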
(c) The ternary Huffman tree is
Codeword  Symbol  Probability (after successive merges)
0         x1      0.50  0.50  0.50  1.00
1         x2      0.26  0.26  0.26
20        x3      0.11  0.11  0.24
21        x4      0.04  0.04
222       x5      0.04  0.09
221       x6      0.03
220       x7      0.02
This code has an expected length of 1.33 ternary symbols (H3(X) = 1.25 ternary
symbols).
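For reference, the ternary entropy quoted above is just the binary entropy converted
to base 3: H3(X) = H(X)/log2(3) ≈ 1.99/1.585 ≈ 1.25 ternary symbols.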
2. Codes.
Let X1 , X2 , . . . be i.i.d. with

    X =  1, with probability 1/2
         2, with probability 1/4
         3, with probability 1/4,

and consider the code C(1) = 0, C(2) = 01, C(3) = 11. The entropy rate of the
process is defined as

    H(X) ≜ lim_{n→∞} H(X^n)/n.                                            (1)

What is the entropy rate of the process?
Solution: Codes.
(a) Yes, this code is nonsingular because C(x) is different for every
x.
(b) Yes, this code is uniquely decodable. Reversing the codewords gives

    C′(x) =  0,  if x = 1
             10, if x = 2
             11, if x = 3,

which is prefix-free, so the reversed string can be decoded instantaneously; hence
the original code is uniquely decodable (decode by reading the string backwards).
(c) No, this code is not instantaneous because C(1) is a prefix of
C(2).
(d) The expected codeword length is

    L(C(X)) = 0.5 × 1 + 0.25 × 2 + 0.25 × 2 = 3/2.

Further, since the process is i.i.d., its entropy rate is

    H(X) = H(0.5, 0.25, 0.25) = 3/2 bits.
So the code is a uniquely decodable code with L = H(X), and therefore the
sequence is maximally compressed, with H(Z) = 1 bit (here Z^n denotes the binary
sequence obtained by concatenating the codewords of X1, X2, . . .). If H(Z) were
less than its maximum of 1 bit, then the Z^n sequence could be compressed further
to its entropy rate, and X^m could also be compressed further by block coding.
However, this would result in L_m < H(X), which contradicts Theorem 5.4.2 of
the text. So H(Z) = 1 bit.
Note that the Z^n sequence is not i.i.d. Bernoulli(1/2), even though H(Z) = 1 bit.
For example, P{Z1 = 1} = 1/4, and a sequence starting with 10 . . . is not allowed.
However, once Zi = 0 for some i, then Zk is Bernoulli(1/2) for k > i, so Z^n is
asymptotically Bernoulli(1/2) and gives the entropy rate of 1 bit.
(b) Assume you have a file of size 1,000 symbols where the symbols are
distributed i.i.d. according to the pmf above. After applying the
Huffman code, what would be the pmf of the compressed binary
file and what would be the expected length?
(c) Generate a sequence (using MATLAB or any other software) of length 10,000
symbols of X drawn i.i.d. according to PX . Assume the alphabet of X is
X = {0, 1, . . . , 6}.
(d) What is the percentage of each symbol (0, 1, . . . , 6) in the sequence that was
generated by MATLAB? Explain this result using the law of large numbers.
(e) Represent each symbol of X using the simple binary representation. Namely,
X = 0 is represented as '000', X = 1 as '001', X = 2 as '010', . . . , X = 6 as '110'.
(f) What is the length of the simple representation? What percentage of '0' and
'1' do you have in this representation?
(g) Now, compress the 10,000 symbols of X into bits using the Huffman code.
(h) What is the length of the compressed file? What percentage of '0' and '1' do
you have in this representation?
(i) Explain the results.
Solution:
Figure 1: Huffman tree for PX = (1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16); the resulting
codewords are 1, 011, 010, 0011, 0010, 0001, 0000 for x = 0, 1, . . . , 6.
(b) The Huffman code is an optimal code and achieves the entropy for a dyadic
distribution. If the distribution of the binary digits were not Bernoulli(1/2), the
file could be compressed further. Hence the binary digits are equally distributed
after applying the Huffman code, i.e., p0 = p1 = 1/2.
The expected length per symbol is

    E[l] = (1/2)·1 + (1/8)·3 + (1/8)·3 + (1/16)·4 + (1/16)·4 + (1/16)·4 + (1/16)·4 = 2.25.

Therefore, the expected length of the compressed file for 1000 symbols is 2250 bits.
(c-i) The results from simulation should be as follows:
Drawn symbols
-------------
Size: 10000
Percentage of x=0: 0.4984
Percentage of x=1: 0.125
Percentage of x=2: 0.1262
Percentage of x=3: 0.0636
Percentage of x=4: 0.0575
Percentage of x=5: 0.0647
Percentage of x=6: 0.0646
Before compression
------------------
Size: 30000
Percentage of b=0: 0.77213
Percentage of b=1: 0.22787
After compression
------------------
Size: 22463
Percentage of b=0: 0.49958
Percentage of b=1: 0.50042
Explanation: The symbols X(i), i ∈ {1, 2, . . . , 10,000}, are drawn i.i.d. according
to PX . Therefore, by the law of large numbers, the fraction of appearances of each
symbol x ∈ X should be approximately PX (x). Next, we represent each symbol
with 3 binary symbols. Since this file is uncompressed, the percentage of 0's is not
one half. The expected percentage can be calculated from the pmf: the expected
number of 0's per symbol is

    3·(1/2) + 2·(1/8) + 2·(1/8) + 1·(1/16) + 2·(1/16) + 1·(1/16) + 1·(1/16) = 2.3125,

so, out of the 30,000 bits of the simple representation, the expected percentage of
0's is 2.3125/3 ≈ 77.08%. After the Huffman code is applied, the numbers of 0's
and 1's are almost equal. This is because optimal lossless compression produces
(nearly) maximum-entropy output, so the bits 0 and 1 in the compressed file are
approximately uniform.
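A possible Python sketch of this experiment (the original was presumably done in MATLAB; the codebook below is the one of Figure 1):

import random
from collections import Counter

random.seed(0)
pX = [1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]            # pmf over the symbols 0..6
huffman = {0: "1", 1: "011", 2: "010", 3: "0011",
           4: "0010", 5: "0001", 6: "0000"}              # lengths 1, 3, 3, 4, 4, 4, 4

symbols = random.choices(range(7), weights=pX, k=10000)
counts = Counter(symbols)
print({x: counts[x] / 10000 for x in range(7)})          # close to pX by the LLN

plain = "".join(format(x, "03b") for x in symbols)       # simple 3-bit representation
compressed = "".join(huffman[x] for x in symbols)        # Huffman representation
for name, bits in [("before", plain), ("after", compressed)]:
    print(name, len(bits), "bits,", round(bits.count("0") / len(bits), 4), "zeros")
# expected: 30000 bits with about 77% zeros, about 22500 bits with about 50% zeros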
Solution
(a)
    H(X) = − Σ_{x∈X} p(x) log p(x)
         = − Σ_{i=1}^∞ 2^{−i} log2(2^{−i})
         = Σ_{i=1}^∞ i · 2^{−i}
         = 2 bits.
(b) Coding Scheme:

    i    codeword
    1    0
    2    10
    3    110
    4    1110
    5    11110
    ...  ...
Average Length:

    L* = Σ_{i=1}^∞ p(X = i) L(i) = Σ_{i=1}^∞ i · 2^{−i} = 2 = H(X).
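Both sums above are evaluated with the standard identity Σ_{i≥1} i x^i = x/(1 − x)^2;
at x = 1/2 this gives Σ_{i≥1} i · 2^{−i} = (1/2)/(1/2)^2 = 2.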
5. Bad wine.
One is given 6 bottles of wine. It is known that precisely one bottle
has gone bad (tastes terrible). From inspection of the bottles it is
determined that the probability pi that the ith bottle is bad is given by
(p1 , p2 , . . . , p6 ) = (7/26, 5/26, 4/26, 4/26, 3/26, 3/26). Tasting will determine the
bad wine.
Suppose you taste the wines one at a time. Choose the order of tasting
to minimize the expected number of tastings required to determine the
bad bottle. Remember, if the first 5 wines pass the test you don’t have
to taste the last.
Now you get smart. For the first sample, you mix some of the wines in
a fresh glass and sample the mixture. You proceed, mixing and tasting,
stopping when the bad bottle has been determined.
Solution: Bad wine.
(b) The first bottle to be tasted should be the one with probability 7/26.
(c) The idea is to use Huffman coding.
Codeword   26·pi (after successive merges)
(01)       7   7   8   11  15  26
(11)       5   6   7   8   11
(000)      4   5   6   7
(001)      4   4   5
(100)      3   4
(101)      3
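Each tasting in the mixing strategy corresponds to one code symbol, so (although not
stated explicitly above) the resulting expected number of tastings is

    E[# tastings] = (7·2 + 5·2 + 4·3 + 4·3 + 3·3 + 3·3)/26 = 66/26 ≈ 2.54.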
6. Relative entropy is cost of miscoding.
Let the random variable X have five possible outcomes {1, 2, 3, 4, 5}.
Consider two distributions on this random variable
Symbol p(x) q(x) C1 (x) C2 (x)
1 1/2 1/2 0 0
2 1/4 1/8 10 100
3 1/8 1/8 110 101
4 1/16 1/8 1110 110
5 1/16 1/8 1111 111
(a) Calculate H(p), H(q), D(p||q) and D(q||p).
(b) The last two columns above represent codes for the random vari-
able. Verify that the average length of C1 under p is equal to the
entropy H(p). Thus C1 is optimal for p. Verify that C2 is optimal
for q.
(c) Now assume that we use code C2 when the distribution is p. What is the
average length of the codewords? By how much does it exceed the entropy H(p)?
(d) What is the loss if we use code C1 when the distribution is q?
Solution:
(a) H(p) = 15/8 bits, H(q) = 2 bits, and D(p||q) = 1/8 bit. Similarly, D(q||p) = 1/8 bit.
(b) The average codeword length for C1 under p is

    E l1 = (1/2)·1 + (1/4)·2 + (1/8)·3 + 2·(1/16)·4 = 15/8 = H(p).

Similarly, the average codeword length for C2 under q is 2 = H(q).
(c) The average length of C2 under p is

    E_p l2 = (1/2)·1 + (1/4)·3 + (1/8)·3 + 2·(1/16)·3 = 2,

which exceeds H(p) by D(p||q) = 1/8.
(d) Similarly, E_q l1 = 17/8, which exceeds H(q) by D(q||p) = 1/8.
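The numbers in parts (a)-(d) can be checked with a few lines of Python (an illustration, not part of the original solution):

from math import log2

p = [1/2, 1/4, 1/8, 1/16, 1/16]
q = [1/2, 1/8, 1/8, 1/8, 1/8]
len_C1 = [1, 2, 3, 4, 4]    # lengths of C1 = 0, 10, 110, 1110, 1111
len_C2 = [1, 3, 3, 3, 3]    # lengths of C2 = 0, 100, 101, 110, 111

def H(r):
    return -sum(ri * log2(ri) for ri in r)

def D(r, s):
    return sum(ri * log2(ri / si) for ri, si in zip(r, s))

def avg_len(r, lens):
    return sum(ri * li for ri, li in zip(r, lens))

print(H(p), H(q))                               # 1.875, 2.0
print(D(p, q), D(q, p))                         # 0.125, 0.125
print(avg_len(p, len_C1), avg_len(q, len_C2))   # 1.875 = H(p), 2.0 = H(q)
print(avg_len(p, len_C2), avg_len(q, len_C1))   # 2.0 = H(p) + 1/8, 2.125 = H(q) + 1/8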
Let F_i = Σ_{k<i} p_k be the sum of the probabilities of all symbols less than i. Then
the codeword for i is the number F_i ∈ [0, 1] rounded off to l_i bits, where
l_i = ⌈log(1/p_i)⌉.
(a) Show that the code constructed by this process is prefix-free and the average
length satisfies

    log(1/p_i) ≤ l_i < log(1/p_i) + 1,                                     (4)

which implies that

    H(X) ≤ L = Σ_i p_i l_i < H(X) + 1.                                     (5)

(b) Construct the code for the probability distribution (0.5, 0.25, 0.125, 0.125).
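A short Python sketch of the construction in part (b) (an illustration, under the assumption that the probabilities are listed in decreasing order, which is what the prefix property requires):

from math import ceil, log2

def shannon_code(probs):
    # probs must be sorted in decreasing order
    codewords = []
    F = 0.0
    for p in probs:
        l = max(1, ceil(log2(1 / p)))
        bits, frac = "", F
        for _ in range(l):             # first l bits of the binary expansion of F
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        codewords.append(bits)
        F += p
    return codewords

print(shannon_code([0.5, 0.25, 0.125, 0.125]))   # ['0', '10', '110', '111']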
9. AEP. Let X1 , X2 , . . . be independent identically distributed random variables
drawn according to the probability mass function p(x), x ∈ {1, 2, . . . , m}. Thus
p(x1 , x2 , . . . , xn ) = ∏_{i=1}^n p(xi ). We know that −(1/n) log p(X1 , X2 , . . . , Xn ) → H(X)
in probability. Let q(x1 , x2 , . . . , xn ) = ∏_{i=1}^n q(xi ), where q is another probability
mass function on {1, 2, . . . , m}. Evaluate lim −(1/n) log[q(X1 , . . . , Xn )/p(X1 , . . . , Xn )],
where X1 , X2 , . . . are i.i.d. according to p(x).
Solution: AEP.
    lim −(1/n) log [ q(X1 , . . . , Xn ) / p(X1 , . . . , Xn ) ]
        = lim −(1/n) Σ_i log [ q(Xi ) / p(Xi ) ]
        = −E[ log (q(X)/p(X)) ]   w.p. 1   (by the strong law of large numbers)
        = − Σ_x p(x) log (q(x)/p(x))
        = Σ_x p(x) log (p(x)/q(x))
        = D(p||q).
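A quick numerical illustration of this convergence (with an arbitrary choice of p and q, not from the original solution):

import random
from math import log2

random.seed(1)
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
n = 100000
sample = random.choices(range(3), weights=p, k=n)

empirical = -sum(log2(q[x] / p[x]) for x in sample) / n
Dpq = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))
print(empirical, Dpq)    # the two values should be close for large n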
10. Empirical distribution of a sequence. Before starting the question, below are
two facts that you may find useful:
• Stirling approximation: n! ≈ √(2πn) (n/e)^n .
• Consider a sequence of length n that consists of two different numbers. The
  first number appears n1 times and the second number appears n2 times such
  that n1 + n2 = n. The number of different combinations of such sequences is
  given by (n choose n1, n2) = n!/(n1! n2!).
A fair die with 6 faces was thrown n times, where n is a very large number.
(a) Find how many different sequences exist with an empirical pmf (p1 , p2 , ..., p6 ),
where pi is the fraction of the sequence that is equal to i ∈ {1, 2, ..., 6}.
In this part you may assume that n! ≈ (n/e)^n , since only the exponential order
(the power of n/e) matters.
(b) Now, we are told that the fraction of odd numbers in the sequence is 2/3
(i.e., p1 + p3 + p5 = 2/3). For n very large, what is the most likely empirical pmf
of the sequence? Hint: Define

    X = { 1 w.p. p1 ; 3 w.p. p3 ; 5 w.p. p5 },   Y = { 2 w.p. p2 ; 4 w.p. p4 ; 6 w.p. p6 },
    Z = { X w.p. 2/3 ; Y w.p. 1/3 }.
Solution
(a) The number of combinations N is given by:

    N = ( n choose n1 , n2 , n3 , n4 , n5 , n6 )
      = n! / (n1 ! n2 ! n3 ! n4 ! n5 ! n6 !)
      ≈(a) (n/e)^n / [ (n1 /e)^{n1} (n2 /e)^{n2} (n3 /e)^{n3} (n4 /e)^{n4} (n5 /e)^{n5} (n6 /e)^{n6} ].

Finally, we get:

    N ≈ 2^{ −Σ_{i=1}^6 ni log(ni /n) }
      = 2^{ −n Σ_{i=1}^6 (ni /n) log(ni /n) }
      = 2^{ n H(n1 /n, n2 /n, n3 /n, n4 /n, n5 /n, n6 /n) }
      ≈(b) 2^{ n H(1/6, 1/6, 1/6, 1/6, 1/6, 1/6) }
      = 2^{ n log 6 },

where:
(a) follows from n! ≈ (n/e)^n ;
(b) follows since, for sufficiently large n, ni /n ≈ 1/6 (the throws are i.i.d. and the
die is fair, so the law of large numbers applies).
The result is plausible, since the typical set and the set defined by the law of large
numbers coincide as the number of samples goes to infinity.
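A small numeric sanity check (an illustration, not part of the original solution): the exact log2 of the multinomial coefficient is already close to n·H(empirical pmf) for moderate n.

from math import lgamma, log, log2

def log2_multinomial(counts):
    n = sum(counts)
    return (lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)) / log(2)

counts = [100] * 6                                  # n = 600, empirical pmf (1/6, ..., 1/6)
n = sum(counts)
H = -sum((c / n) * log2(c / n) for c in counts)     # = log2(6)
print(log2_multinomial(counts), n * H)              # roughly 1530 vs 1551: same exponential order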
(b) First, observe that:

    (n1 + n3 + n5 )/n = 2/3   ⇔   p1 + p3 + p5 = 2/3.
We would like to find {p1 , p2 , p3 , p4 , p5 , p6 } which, under the given constraint,
maximizes the entropy. This gives the largest typical set, which means the
corresponding empirical pmf is the most likely one. Equivalently, define

    X = { 1 w.p. p1 ; 3 w.p. p3 ; 5 w.p. p5 },   Y = { 2 w.p. p2 ; 4 w.p. p4 ; 6 w.p. p6 },
    Z = { X w.p. 2/3 ; Y w.p. 1/3 }.
Define the following auxiliary indicator variable:

    I = 1 if Z = X,   I = 0 if Z = Y.
Now, let us maximize the entropy of Z:

    H(Z) =(a) H(Z, I)
          = H(I) + H(Z|I)
          = H(2/3) + (1/3) H(Y) + (2/3) H(X),

where (a) follows since

    H(Z, I) = H(Z) + H(I|Z) = H(Z),

because I is deterministic given Z (the values of X and Y do not overlap).
Accordingly, maximizing H(Z) means maximizing H(X) and H(Y). As shown in
class, the uniform distribution maximizes entropy. Thus X and Y should be
uniform, which gives

    p1 = p3 = p5 = 2/9,    p2 = p4 = p6 = 1/9.
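A quick check in Python (an illustration only) that moving probability mass away from this pmf, while keeping p1 + p3 + p5 = 2/3, lowers the entropy:

from math import log2

def H(p):
    return -sum(x * log2(x) for x in p if x > 0)

best = [2/9, 1/9, 2/9, 1/9, 2/9, 1/9]                       # (p1, p2, ..., p6)
perturbed = [2/9 + 0.05, 1/9, 2/9 - 0.05, 1/9, 2/9, 1/9]    # odd mass still 2/3
print(H(best), H(perturbed))                                 # H(best) is larger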
11. Drawing a codebook. Let Xi be i.i.d. random variables distributed according
to P (x). We draw a codebook of 2^{nR} codewords, each a sequence of length n
drawn i.i.d. according to P (x). We would like to answer the question: what is
the probability that the first codeword is identical to another codeword in the
codebook, as n goes to infinity?
• Let x^n be a sequence in the typical set A_ε^{(n)} (X). What is the asymptotic
probability (you may provide an upper and lower bound), as n → ∞, that we draw
a sequence X^n i.i.d. according to P (x) and obtain exactly x^n ?
• Using your answer from the previous sub-question, find an α such that if R < α,
the probability that the first codeword in the codebook appears twice or more in
the codebook goes to zero as n → ∞.
• Find an α such that if R > α, the probability that the first codeword in the
codebook appears twice or more in the codebook goes to 1 as n → ∞.
where (a) follows from the union bound and (b) follows from the first sub-question.
If we set α = H(X) − ε, this probability goes to zero.
(c) We want to find α such that if R > α, then P (A) = 1 − Pr(∀ i ≠ 1 : x^n (i) ≠ x^n (1))
goes to 1. Consider the following derivation:
    Pr(∀ i ≠ 1 : x^n (i) ≠ x^n (1))
        = ∏_{i=2}^{2^{nR}} Pr(x^n (i) ≠ x^n (1))
        = ∏_{i=2}^{2^{nR}} ( 1 − Pr(x^n (i) = x^n (1)) )
      ≤(a) ∏_{i=2}^{2^{nR}} ( 1 − 2^{−n(H(X)+ε)} )
        = ( 1 − 2^{−n(H(X)+ε)} )^{2^{nR} − 1}
      ≤(b) exp( −(2^{nR} − 1) · 2^{−n(H(X)+ε)} ),

where (a) follows from the lower bound on the probability of a typical sequence
(first sub-question) and (b) follows from 1 − x ≤ e^{−x} . Since
(2^{nR} − 1) · 2^{−n(H(X)+ε)} ≈ 2^{n(R − H(X) − ε)} → ∞ whenever R > H(X) + ε, the
probability above goes to 0, and hence P (A) → 1. So we can take α = H(X) + ε.
12. Saving the princess. A princess was abducted and put in one of K rooms.
The rooms are labeled 1, 2, . . . , K, and room i has size si , i = 1, 2, . . . , K. The
probability pi that the princess is in room i is proportional to the size of the room,
namely, pi = α si , where α is a constant.
(a) Find α.
(b) In order to save the princess you need to find which room she is in. You may
ask the demon yes/no questions, such as "Is she in room 1?", "Is she in room 2
or 5?", "Is she in an odd-numbered room?", and so on. You will save the princess
if and only if the expected number of questions is the minimum possible. What
questions should you ask the demon in order to save the princess?
Solution
(a) Since p is a probability mass function, it must satisfy

    1 = Σ_{i=1}^K pi = Σ_{i=1}^K α si = α Σ_{i=1}^K si .

Therefore, α = 1 / Σ_{i=1}^K si .
(b) The idea is to use a Huffman code built from the probabilities that the princess
is in each room, following the solution of the Bad wine question. Here, instead of
mixing wines we ask questions such as "Is the princess in room 1 or 2?", and we
look for the princess instead of the bad bottle.
Figure 2: Lossless source coding with side information at the encoder and
decoder.
Show that a code with rate R < H(X|Y ) cannot be achievable, and interpret the
result.
Hint: Let T ≜ f (X^n , Y^n ). Consider

    nR ≥ H(T )
       ≥ H(T |Y^n ),                                                       (7)
Solution: Sketch of the solution (please fill in the explanation for each step):

    nR ≥ H(T )
       ≥ H(T |Y^n )
       ≥ I(X^n ; T |Y^n )
       = H(X^n |Y^n ) − H(X^n |T, Y^n )
       = nH(X|Y ) − ε_n ,

where ε_n → 0.
ii.  D( P_{A,B} || P_A P_B ) = D( P_{B|A} || P_B | P_A ).
iii. I(A; B) = D( P_{B|A} || P_B | P_A ).
iv.  D( P_{A|B} || Q_{A|B} | P_B ) = Σ_{b∈B} P_B (b) D( P_{A|B=b} || Q_{A|B=b} ).
(Figure: X^n → Encoder → T (X^n , Z^n ) → Decoder → X̂^n , with side information
Z^n available at both the encoder and the decoder.)
(b) What are the conditions under which the equality I(X; Y ) = H(Y ) − H(Z)
holds?
(c) Assume that the conditions for I(X; Y ) = H(Y ) − H(Z) are satisfied. Is it
true that there exists a function g such that Z = g(Y )?
Solution
    0 = H(Y |X, Z)
      = H(X, Y, Z) − H(X, Z)
      = H(X) + H(Y |X) + H(Z|X, Y ) − H(X) − H(Z|X)
      = H(Y |X) − I(Z; Y |X).

It follows that H(Y |X) = I(Z; Y |X).
17. True or False: Copy each relation and write true or false.
(b) For two probability distributions, pXY and qXY , that are defined
on X × Y, the following holds:
(c) If X and Y are dependent and also Y and Z are dependent, then
X and Z are dependent.
Solution
    I(X; W ) ≤ I(X; Z) ≤ I(Y ; Z),

where

    D( P_{Y|X} || Q_{Y|X} | P_X ) = Σ_{x∈X} P_X (x) D( P_{Y|X=x} || Q_{Y|X=x} ).
x 0 1 2 3
pX (x) 0.5 0.25 0.125 0.125
(b) Build a binary Huffman code for the source X.
(c) What is the expected length of the resulting compressed sequence?
(d) What is the expected number of zeros in the resulting compressed sequence?
(e) Let X̃^n be another source distributed i.i.d. according to pX̃ .
x̃ 0 1 2 3
pX̃ (x̃) 0.3 0.4 0.1 0.2
Solution
    H(X) = −0.5 · log 0.5 − 0.25 · log 0.25 − 2 · 0.125 · log 0.125 = 1.75 bits.
(b)
    c(x)   x   pX (x)
    1      0   0.5
    01     1   0.25
    001    2   0.125
    000    3   0.125
(The tree merges 0.125 + 0.125 = 0.25, then 0.25 + 0.25 = 0.5, then 0.5 + 0.5 = 1.)
(c) Denote the length of a codeword by L(c(xi )). Then the expected length of the
compressed sequence is

    E[L(c^n (X^n ))] = Σ_{i=1}^n E[L(c(Xi ))]
                    = Σ_{i=1}^n [ pX (0)·L(c(0)) + pX (1)·L(c(1)) + pX (2)·L(c(2)) + pX (3)·L(c(3)) ]
                    = n (0.5 · 1 + 0.25 · 2 + 0.125 · 3 + 0.125 · 3)
                    = 1.75n.

(d) Since the code is optimal, the expected number of zeros is half of the expected
length, i.e., 0.875n (see the following sub-question).
(e) Denote the length of a codeword by L(c(x̃i )). Then

    E[L(c^n (X̃^n ))] = Σ_{i=1}^n E[L(c(X̃i ))]
                     = Σ_{i=1}^n [ pX̃ (0)·L(c(0)) + pX̃ (1)·L(c(1)) + pX̃ (2)·L(c(2)) + pX̃ (3)·L(c(3)) ]
                     = n (0.3 · 1 + 0.4 · 2 + 0.1 · 3 + 0.2 · 3)
                     = 2n.

Note that here the expected number of zeros is not half of the expected length,
which implies that the code is not optimal for X̃.
The optimal compression rate for X̃ is R∗ = H(X̃) = 1.846 bits per symbol.
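The expected lengths and zero fractions in parts (c)-(e) can be checked with a short Python sketch (an illustration, not part of the original solution):

from math import log2

code = {0: "1", 1: "01", 2: "001", 3: "000"}
pX      = [0.5, 0.25, 0.125, 0.125]
pXtilde = [0.3, 0.4, 0.1, 0.2]

def stats(p):
    exp_len   = sum(pi * len(code[i]) for i, pi in enumerate(p))
    exp_zeros = sum(pi * code[i].count("0") for i, pi in enumerate(p))
    return exp_len, exp_zeros / exp_len

print(stats(pX))        # (1.75, 0.5): optimal, half of the bits are zeros
print(stats(pXtilde))   # (2.0, 0.6): not optimal for the second source
print(-sum(p * log2(p) for p in pXtilde))    # H(Xtilde) = 1.846...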