Hw1sol Sanov Rate Distortion
1. Sanov’s theorem:
Prove the simple version of Sanov's theorem for binary random variables, i.e., let $X_1, X_2, \ldots, X_n$ be a sequence of binary random variables drawn i.i.d. according to the distribution $\Pr(X_i = 1) = q$, $\Pr(X_i = 0) = 1 - q$. Show that, for $q < p \le 1$,
$$\lim_{n\to\infty} \frac{1}{n}\log \Pr\{(X_1,\ldots,X_n) : p_X \ge p\} = -D(p\|q),$$
where $p_X$ denotes the fraction of ones in the sequence.
Solution
Sanov’s theorem
$$\frac{\Pr(n\bar{X}_n = i+1)}{\Pr(n\bar{X}_n = i)} = \frac{\binom{n}{i+1} q^{i+1}(1-q)^{n-i-1}}{\binom{n}{i} q^{i}(1-q)^{n-i}} = \frac{n-i}{i+1}\cdot\frac{q}{1-q} \qquad (7)$$
where the second inequality follows from the fact that the sum is
less than the largest term times the number of terms. Taking the
logarithm and dividing by n and taking the limit as n → ∞, we
obtain
$$\lim_{n\to\infty} \frac{1}{n} \log \Pr\{(X_1, X_2, \ldots, X_n) : p_X \ge p\} \le -D(p\|q) \qquad (13)$$
Similarly, using the fact the sum of the terms is larger than the
largest term, we obtain
$$\begin{aligned}
\Pr\{(X_1, X_2, \ldots, X_n) : p_X \ge p\} &\ge \sum_{i=\lceil np\rceil}^{n} \binom{n}{i} q^i (1-q)^{n-i} \qquad &(14)\\
&\ge \binom{n}{\lceil np\rceil} q^{\lceil np\rceil}(1-q)^{n-\lceil np\rceil} \qquad &(15)\\
&\ge 2^{-nD(p\|q)} \qquad &(16)
\end{aligned}$$
and
$$\lim_{n\to\infty} \frac{1}{n} \log \Pr\{(X_1, X_2, \ldots, X_n) : p_X \ge p\} \ge -D(p\|q) \qquad (17)$$
2. Strong Typicality
Let $X^n$ be drawn i.i.d. $\sim P(x)$. Prove that for each $x^n \in T_\delta(X)$,
$$2^{-n(H(X)+\delta')} \le P^n(x^n) \le 2^{-n(H(X)-\delta')}.$$
Solution: Since $x^n \in T_\delta(X)$, we have $|P_{x^n}(a) - P(a)| < \delta/|\mathcal{X}|$ for all $a$. Therefore,
$$H(X) - \frac{\delta}{|\mathcal{X}|}\sum_{a} \log\frac{1}{P(a)} \;<\; \sum_{a\in\mathcal{X}} P_{x^n}(a)\log\frac{1}{P(a)} \;<\; H(X) + \frac{\delta}{|\mathcal{X}|}\sum_{a}\log\frac{1}{P(a)}$$
Since $P^n(x^n) = \prod_{i=1}^n P(x_i) = 2^{-n\sum_{a\in\mathcal{X}} P_{x^n}(a)\log\frac{1}{P(a)}}$, it follows that
$$2^{-n(H(X)+\delta')} \le P^n(x^n) \le 2^{-n(H(X)-\delta')},$$
where $\delta' = \frac{\delta}{|\mathcal{X}|}\sum_{a}\log\frac{1}{P(a)}$.
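As a sanity check, the following Python sketch (the alphabet, distribution, and the particular sequence are made-up examples) verifies the sandwich bound for one strongly typical sequence:

```python
import math
from collections import Counter

P = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # example source distribution (assumed)
delta = 0.1
H = sum(p * math.log2(1 / p) for p in P.values())
delta_prime = (delta / len(P)) * sum(math.log2(1 / p) for p in P.values())

# a length-100 sequence whose type is within delta/|X| of P for every letter
xn = 'a' * 52 + 'b' * 29 + 'c' * 19
n = len(xn)
counts = Counter(xn)
strongly_typical = all(abs(counts[a] / n - P[a]) < delta / len(P) for a in P)

log_Pn = sum(math.log2(P[s]) for s in xn)   # log2 of P^n(x^n)
in_band = -n * (H + delta_prime) <= log_Pn <= -n * (H - delta_prime)
print(strongly_typical, in_band)
```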
(a) Suppose P is such that $P(a) > 0$ for all $a \in \mathcal{X}$. Then there is an inclusion relationship between the weakly typical set $A_\epsilon(P)$ and the strongly typical set $T_\delta(P)$ for an appropriate choice of $\epsilon$. Which of the statements is true: $A_\epsilon(P) \subseteq T_\delta(P)$ or $A_\epsilon(P) \supseteq T_\delta(P)$? What is the appropriate relation between $\delta$ and $\epsilon$?
(b) Give a description of the sequences that belong to $A_\epsilon(P)$ vs. the sequences that belong to $T_\delta(P)$ when the source is uniformly distributed, i.e., $P(a) = 1/|\mathcal{X}|$ for all $a \in \mathcal{X}$. (Assume $|\mathcal{X}| < \infty$.)
(c) Can you explain why $T_\delta(P)$ is called the strongly typical set and $A_\epsilon(P)$ the weakly typical set?
Therefore, we can see that if $\epsilon > \frac{\delta}{|\mathcal{X}|}\sum_a \log\frac{1}{P(a)}$, then $A_\epsilon(P) \supseteq T_\delta(P)$. To show that the reverse inclusion does not hold, see the next part.
(b) When P is the uniform distribution, we can see that $A_\epsilon(P)$ includes every possible sequence $x^n$. To see this, note that $P^n(x^n) = (1/|\mathcal{X}|)^n$, and thus $-\frac{1}{n}\log P^n(x^n) = \log|\mathcal{X}| = H(P)$. Thus $x^n \in A_\epsilon(P)$ for all $x^n$ and for all $\epsilon > 0$. However, among the sequences in $A_\epsilon(P)$, only those whose type satisfies
$$\frac{1}{|\mathcal{X}|} - \frac{\delta}{|\mathcal{X}|} \le P_{x^n}(a) \le \frac{1}{|\mathcal{X}|} + \frac{\delta}{|\mathcal{X}|}$$
for each letter $a \in \mathcal{X}$ are in $T_\delta(P)$. Therefore, for sufficiently small values of $\delta$, there always exist sequences that are in $A_\epsilon(P)$ but not in $T_\delta(P)$.
(c) From the parts above, we can see that the strongly typical set is contained in the weakly typical set for an appropriate choice of $\delta$ and $\epsilon$. Thus the definition of the strongly typical set is indeed stronger than that of the weakly typical set.
and hence
We can achieve this lower bound by choosing p(x̂) to be the uniform
distribution, and the conditional distribution $p(\hat{x}|x)$ to be
$$p(\hat{x}|x) = \begin{cases} 1 - D & \text{if } \hat{x} = x\\ D/(m-1) & \text{if } \hat{x} \ne x. \end{cases} \qquad (22)$$
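The test channel in (22) can be verified numerically. The sketch below (m = 4 and D = 0.2 are arbitrary illustrative values) checks that it yields expected distortion exactly D, and that its mutual information matches $\log m - H(D) - D\log(m-1)$, assuming, as the use of $m$ suggests, a source uniform over $m$ symbols with Hamming distortion:

```python
import math

m, D = 4, 0.2                                # illustrative values (assumed)
px = [1 / m] * m                             # uniform source
# test channel of (22): keep the symbol w.p. 1-D, otherwise uniform over the rest
joint = [[px[x] * ((1 - D) if xh == x else D / (m - 1)) for xh in range(m)]
         for x in range(m)]

distortion = sum(joint[x][xh] for x in range(m) for xh in range(m) if xh != x)
pxh = [sum(joint[x][xh] for x in range(m)) for xh in range(m)]
mutual_info = sum(joint[x][xh] * math.log2(joint[x][xh] / (px[x] * pxh[xh]))
                  for x in range(m) for xh in range(m))
rate = math.log2(m) + D * math.log2(D) + (1 - D) * math.log2(1 - D) \
       - D * math.log2(m - 1)
print(distortion, mutual_info, rate)
```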
5. Erasure distortion
Consider $X \sim \mathrm{Bernoulli}(1/2)$, and let the distortion measure be given by the matrix
$$d(x, \hat{x}) = \begin{pmatrix} 0 & 1 & \infty\\ \infty & 1 & 0 \end{pmatrix}. \qquad (24)$$
Calculate the rate distortion function for this source. Can you suggest
a simple scheme to achieve any value of the rate distortion function for
this source?
[Figure: test channel — inputs 0 and 1 pass through unchanged with probability 1 − a, and map to the erasure symbol e with probability a.]
For this joint distribution, it is easy to calculate that the distortion is $D = a$ and that $I(X;\hat{X}) = H(X) - H(X|\hat{X}) = 1 - a$. Hence we have $R(D) = 1 - D$ for $0 \le D \le 1$. For $D > 1$, $R(D) = 0$.
It is easy to see how we could achieve this rate distortion function. If D
is rational, say k/n, then we send only the first n − k of any block of n
bits. We reproduce these bits exactly and reproduce the remaining bits
as erasures. Hence we can send information at rate 1 − D and achieve
a distortion D. If D is irrational, we can get arbitrarily close to D by
using longer and longer block lengths.
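The scheme just described is easy to simulate. A minimal Python sketch (the block length and the value of D are arbitrary choices) that sends the first n − k bits of each block intact and erases the rest:

```python
def erasure_scheme(bits, D):
    """Send the first n-k bits intact; reproduce the remaining k = Dn as erasures."""
    n = len(bits)
    k = round(D * n)
    reproduction = bits[:n - k] + ['e'] * k
    rate = (n - k) / n                      # bits sent per source symbol
    distortion = sum(r == 'e' for r in reproduction) / n
    return rate, distortion

bits = [0, 1] * 500                         # an arbitrary block of n = 1000 bits
rate, distortion = erasure_scheme(bits, 0.25)
# rate = 0.75 and distortion = 0.25, matching R(D) = 1 - D
```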
6. Rate distortion.
A memoryless source U is uniformly distributed on {0, . . . , r − 1}. The
distortion function is given by
$$d(u,v) = \begin{cases} 0, & v = u,\\ 1, & v = u \pm 1 \pmod{r},\\ \infty, & \text{otherwise.} \end{cases}$$
since U is uniform, and due to symmetry, V is also uniform. We know that $H(1-p, p/2, p/2) \le \log 3$, with equality when $p = 2/3$. Therefore,
$$R(D) = \log r - \log 3, \qquad \text{if } D \ge 2/3.$$
Now, let's consider the case when $D \le 2/3$. Denote $f(p) = H(1-p, p/2, p/2)$. Then
$$\frac{df(p)}{dp} = \log(1-p) + \log e - \log\frac{p}{2} - \log e = \log\frac{1-p}{p} + 1,$$
which vanishes at $p = 2/3$.
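A quick numerical check of this maximization, taking $f(p)$ to be the ternary entropy $H(1-p, p/2, p/2)$ in bits:

```python
import math

def f(p):
    """f(p) = H(1-p, p/2, p/2) in bits."""
    return -((1 - p) * math.log2(1 - p) + p * math.log2(p / 2))

def df(p):
    """The derivative log((1-p)/p) + 1 computed above."""
    return math.log2((1 - p) / p) + 1

# the derivative vanishes at p = 2/3, where all three masses equal 1/3
grid = [i / 1000 for i in range(1, 1000)]
p_star = max(grid, key=f)
print(p_star, f(2 / 3), math.log2(3))
```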
d(x, x̂) is given by
$$\begin{array}{c|cccc}
x\backslash\hat{x} & 1 & 2 & 3 & 4\\ \hline
1 & 0 & 0 & 1 & 1\\
2 & 0 & 0 & 1 & 1\\
3 & 1 & 1 & 0 & 0\\
4 & 1 & 1 & 0 & 0
\end{array}$$
(a) Find R(0), the rate necessary to describe the process with zero
distortion.
(b) Find the rate distortion function R(D).
(Hint: The distortion measure allows you to simplify the problem into one you have already seen.)
(c) Suppose we have a nonuniform distribution p(i) = pi , i = 1, 2, 3, 4.
What is R(D)?
Solution: Simplification
distortion measure for Y is d(y, ŷ) and its rate distortion function is
RY (D).
Suppose we now wish to describe the process $\{(X_i, Y_i)\}$ subject to distortion constraints $\lim_{n\to\infty} Ed(X^n, \hat{X}^n) \le D_1$ and $\lim_{n\to\infty} Ed(Y^n, \hat{Y}^n) \le D_2$. Our rate distortion theorem can be shown to extend naturally to this setting and imply that the minimum rate required to achieve these distortions is given by
Now, suppose the {Xi } process and the {Yi } process are independent
of each other.
(a) Show
RX,Y (D1 , D2 ) ≥ RX (D1 ) + RY (D2 ).
(b) Does equality hold?
Now answer the question.
Solution: Rate distortion for two independent sources.
(a) Given that X and Y are independent, we have
p(x, y, x̂, ŷ) = p(x)p(y)p(x̂, ŷ|x, y) (25)
Then
$$\begin{aligned}
I(X,Y;\hat{X},\hat{Y}) &= H(X,Y) - H(X,Y|\hat{X},\hat{Y}) \qquad &(26)\\
&= H(X) + H(Y) - H(X|\hat{X},\hat{Y}) - H(Y|X,\hat{X},\hat{Y}) \qquad &(27)\\
&\ge H(X) + H(Y) - H(X|\hat{X}) - H(Y|\hat{Y}) \qquad &(28)\\
&= I(X;\hat{X}) + I(Y;\hat{Y}) \qquad &(29)
\end{aligned}$$
where the inequality follows from the fact that conditioning re-
duces entropy. Therefore
$$\begin{aligned}
R_{X,Y}(D_1, D_2) &= \min_{p(\hat{x},\hat{y}|x,y):\, Ed(X,\hat{X})\le D_1,\, Ed(Y,\hat{Y})\le D_2} I(X,Y;\hat{X},\hat{Y}) \qquad &(30)\\
&\ge \min_{p(\hat{x},\hat{y}|x,y):\, Ed(X,\hat{X})\le D_1,\, Ed(Y,\hat{Y})\le D_2} \left[ I(X;\hat{X}) + I(Y;\hat{Y}) \right] \qquad &(31)\\
&\ge \min_{p(\hat{x}|x):\, Ed(X,\hat{X})\le D_1} I(X;\hat{X}) + \min_{p(\hat{y}|y):\, Ed(Y,\hat{Y})\le D_2} I(Y;\hat{Y}) \qquad &(32)\\
&= R_X(D_1) + R_Y(D_2). \qquad &(33)
\end{aligned}$$
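The inequality of part (a) can be spot-checked numerically. This Python sketch (binary alphabets, particular marginals, and a random test channel are all assumed for illustration) verifies $I(X,Y;\hat{X},\hat{Y}) \ge I(X;\hat{X}) + I(Y;\hat{Y})$ when X and Y are independent:

```python
import itertools
import math
import random

random.seed(1)
px, py = [0.4, 0.6], [0.3, 0.7]          # independent marginals (assumed example)

# a random test channel p(xh, yh | x, y) over binary reproduction alphabets
p = {}
for x, y in itertools.product(range(2), range(2)):
    w = [random.random() for _ in range(4)]
    for (xh, yh), wi in zip(itertools.product(range(2), range(2)), w):
        p[x, y, xh, yh] = px[x] * py[y] * wi / sum(w)

def marg(axes):
    """Marginal distribution over the listed coordinate positions."""
    out = {}
    for k, v in p.items():
        key = tuple(k[i] for i in axes)
        out[key] = out.get(key, 0.0) + v
    return out

def mi(a_axes, b_axes):
    """Mutual information (bits) between two groups of coordinates."""
    pab, pa, pb = marg(a_axes + b_axes), marg(a_axes), marg(b_axes)
    return sum(v * math.log2(v / (pa[k[:len(a_axes)]] * pb[k[len(a_axes):]]))
               for k, v in pab.items() if v > 0)

lhs = mi([0, 1], [2, 3])                 # I(X,Y; Xh,Yh)
rhs = mi([0], [2]) + mi([1], [3])        # I(X;Xh) + I(Y;Yh)
print(lhs, rhs)
```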
(b) If
$$p(x, y, \hat{x}, \hat{y}) = p(x)p(y)p(\hat{x}|x)p(\hat{y}|y), \qquad (34)$$
then
of each region. For example, for the positive real line, the centroid a is
given as
$$a = \int_0^\infty x\, \frac{2}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx = \sigma\sqrt{\frac{2}{\pi}} \int_0^\infty e^{-y}\, dy = \sigma\sqrt{\frac{2}{\pi}},$$
using the substitution $y = x^2/2\sigma^2$. The expected distortion for one-bit quantization is
$$\begin{aligned}
D &= \int_{-\infty}^{0} \left(x + \sigma\sqrt{\tfrac{2}{\pi}}\right)^2 \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx + \int_{0}^{\infty} \left(x - \sigma\sqrt{\tfrac{2}{\pi}}\right)^2 \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx\\
&= \int_{-\infty}^{\infty} \left(x^2 + \tfrac{2}{\pi}\sigma^2\right) \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx - 2\int_{0}^{\infty} 2x\sigma\sqrt{\tfrac{2}{\pi}}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx\\
&= \sigma^2 + \tfrac{2}{\pi}\sigma^2 - 4\sqrt{\tfrac{2}{\pi}}\,\sigma\,\frac{\sigma}{\sqrt{2\pi}}\\
&= \frac{\pi-2}{\pi}\,\sigma^2\\
&\approx 0.3634\,\sigma^2,
\end{aligned}$$
which is much larger than the distortion rate bound D(1) = σ 2 /4.
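A Monte Carlo check of this one-bit quantizer (the value $\sigma = 2$ and the sample size are arbitrary choices):

```python
import math
import random

random.seed(0)
sigma = 2.0
a = sigma * math.sqrt(2 / math.pi)       # centroid of each half-line

N = 200_000
# quantize each Gaussian sample to sign(x) * a and accumulate the squared error
mse = sum((x - math.copysign(a, x)) ** 2
          for x in (random.gauss(0, sigma) for _ in range(N))) / N
theory = (math.pi - 2) / math.pi * sigma ** 2   # the closed form derived above
print(mse, theory)
```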
We are interested in describing U when S, called side information, is known both at the encoder and at the decoder. Prove that the rate distortion function is given by
$$R_{U|S}(D) = \min_{p(v|u,s):\, Ed(U,V)\le D} I(U;V|S).$$
Solution: Since both the encoder and the decoder have the common side information S, we can exploit the correlation between the source U and the side information S. The idea for achievability is to use a different rate distortion code for the source symbols that have different corresponding side information. Thus, the symbols with side information S = s will be encoded with a rate distortion code at rate
It is clear that the overall rate distortion code has the expected distor-
tion ≤ D.
For the converse, we have
$$\begin{aligned}
nR &\ge H(V^n|S^n)\\
&= I(U^n;V^n|S^n)\\
&= H(U^n|S^n) - H(U^n|V^n,S^n)\\
&= \sum_{i=1}^{n} \left( H(U_i|S_i) - H(U_i|U^{i-1},V^n,S^n) \right)\\
&\ge \sum_{i=1}^{n} \left( H(U_i|S_i) - H(U_i|V_i) \right)\\
&= \sum_{i=1}^{n} I(U_i;V_i|S_i)\\
&\ge \sum_{i=1}^{n} R_{U|S}(Ed(U_i,V_i))\\
&\ge n R_{U|S}(Ed(U^n,V^n))\\
&\ge n R_{U|S}(D),
\end{aligned}$$
where each step holds for the same reason as in the original converse. Now, to compare $R_{U|S}(D)$ and $R_U(D)$, consider the following:
$$\begin{aligned}
R_{U|S}(D) &= \min_{p(v|u,s):\, Ed(U,V)\le D} I(U;V|S)\\
&\le \min_{p(v|u):\, Ed(U,V)\le D} I(U;V|S) \qquad &(40)\\
&\le \min_{p(v|u):\, Ed(U,V)\le D} I(U;V) \qquad &(41)\\
&= R_U(D).
\end{aligned}$$
Here, the inequality in (40) is from the fact that the minimization is held on the smaller set. The inequality in (41) is from the fact that $S \to U \to V$ forms a Markov chain with the joint distribution that we get from the minimization in (40). Also, we have
$$I(U;V|S) = I(U;V) + I(S;V|U) - I(S;V).$$
Since $I(S;V|U) = 0$ and $I(S;V) \ge 0$, we have the inequality in (41). Therefore, we can only gain from the side information. Intuitively, this also makes sense: we can always ignore the side information and use the original rate distortion code, but the rate may be worse. The two rates are equal if and only if U and S are independent, which achieves equality in both (40) and (41).
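The facts $I(S;V|U) = 0$ and $I(U;V|S) \le I(U;V)$ under the Markov chain $S \to U \to V$ can be confirmed numerically. The sketch below builds a toy joint distribution with that Markov structure (all probability values are made up for illustration):

```python
import itertools
import math

# toy joint with Markov chain S -> U -> V (values assumed for illustration)
pS = [0.5, 0.5]
pU_S = [[0.8, 0.2], [0.3, 0.7]]          # p(u|s)
pV_U = [[0.9, 0.1], [0.2, 0.8]]          # p(v|u): a test channel

p = {(s, u, v): pS[s] * pU_S[s][u] * pV_U[u][v]
     for s, u, v in itertools.product(range(2), repeat=3)}

def H(axes):
    """Joint entropy (bits) of the coordinates at positions axes (0=S, 1=U, 2=V)."""
    marg = {}
    for k, v in p.items():
        key = tuple(k[i] for i in axes)
        marg[key] = marg.get(key, 0.0) + v
    return -sum(v * math.log2(v) for v in marg.values() if v > 0)

I_UV_S = H([0, 1]) + H([0, 2]) - H([0]) - H([0, 1, 2])   # I(U;V|S)
I_UV = H([1]) + H([2]) - H([1, 2])                       # I(U;V)
I_SV_U = H([0, 1]) + H([1, 2]) - H([1]) - H([0, 1, 2])   # I(S;V|U)
I_SV = H([0]) + H([2]) - H([0, 2])                       # I(S;V)
print(I_UV_S, I_UV, I_SV_U, I_SV)
```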