(Some) Solutions For HW Set # 2
(a) Here’s the way most of you did the problem: since $Z = X + Y$, $P(Z = z \mid X = x) = P(Y = z - x \mid X = x)$. Hence
\begin{align*}
H(Z|X) &= \sum_x P(x)\, H(Z|X = x) \\
&= -\sum_x P(x) \sum_z P(Z = z|X = x) \log P(Z = z|X = x) \\
&= -\sum_x P(x) \sum_z P(Y = z - x|X = x) \log P(Y = z - x|X = x) \\
&= -\sum_x P(x) \sum_y P(Y = y|X = x) \log P(Y = y|X = x) \qquad (\text{substituting } y = z - x) \\
&= \sum_x P(x)\, H(Y|X = x) \\
&= H(Y|X).
\end{align*}
Here’s another way, which is more cute. Note that $(X, Y)$ and $(X + Y, X - Y)$ are of course in a 1-1 relationship. Then, since $Z = X + Y$ is determined by $(X, Y)$,
\[
H(Y|X) = H(Y, Z|X) = H(Z|X) + H(Y|X, Z) \ge H(Z|X),
\]
with equality iff $H(Y|X, Z) = 0$, that is, when $Y$ is a function of $X$ and $Z$. Since $Y = Z - X$, this is the case here, and so $H(Z|X) = H(Y|X)$.
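As a quick numerical sanity check (not part of the solution itself), the identity $H(Z|X) = H(Y|X)$ can be verified on a small example. The joint pmf `p_xy` and the helper `cond_entropy` below are arbitrary illustrative choices, not anything specified in the problem.

```python
# Sanity check of H(Z|X) = H(Y|X) for Z = X + Y on an arbitrary example pmf.
from collections import defaultdict
from math import log2

# Illustrative joint pmf P(X = x, Y = y); any joint pmf would do.
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def cond_entropy(joint):
    """H(B|A) in bits, for a joint pmf given as {(a, b): probability}."""
    p_a = defaultdict(float)
    for (a, _), p in joint.items():
        p_a[a] += p
    return -sum(p * log2(p / p_a[a]) for (a, _), p in joint.items() if p > 0)

# Joint pmf of (X, Z) under the deterministic map z = x + y.
p_xz = defaultdict(float)
for (x, y), p in p_xy.items():
    p_xz[(x, x + y)] += p

print(cond_entropy(p_xy))   # H(Y|X)
print(cond_entropy(p_xz))   # H(Z|X) -- agrees with H(Y|X)
```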
(b) Using the chain rule for entropy and the fact that conditioning reduces entropy,
where $\hat P_n^{(j)}$ is the empirical distribution induced by $X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{n+1}$. Then, by the convexity of the relative entropy,
\[
D(\hat P_{n+1} \| Q) = D\Bigl( \frac{1}{n+1} \sum_{1 \le j \le n+1} \hat P_n^{(j)} \;\Big\|\; Q \Bigr) \le \frac{1}{n+1} \sum_{1 \le j \le n+1} D(\hat P_n^{(j)} \| Q),
\]
and since the $\hat P_n^{(j)}$ all have the same distribution as $\hat P_n$, taking expectations yields
\[
E\bigl[ D(\hat P_{n+1} \| Q) \bigr] \le E\bigl[ D(\hat P_n \| Q) \bigr],
\]
as claimed.
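This monotonicity is easy to see in a quick Monte Carlo sketch (purely illustrative; the strictly positive pmfs `P` and `Q` below are arbitrary assumptions, not part of the problem):

```python
# Monte Carlo illustration that E[D(P_hat_n || Q)] is nonincreasing in n.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])   # sampling distribution (illustrative assumption)
Q = np.array([0.4, 0.4, 0.2])   # reference distribution (illustrative assumption)

def expected_divergence(n, trials=20000):
    """Monte Carlo estimate of E[D(P_hat_n || Q)] in bits."""
    total = 0.0
    for _ in range(trials):
        p_hat = rng.multinomial(n, P) / n    # empirical distribution of n samples
        mask = p_hat > 0                     # 0 * log 0 terms contribute nothing
        total += np.sum(p_hat[mask] * np.log2(p_hat[mask] / Q[mask]))
    return total / trials

for n in (5, 6, 10, 20):
    print(n, expected_divergence(n))   # decreases with n, up to Monte Carlo noise
```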
5. Estimating the entropy. The fact that E[H(P̂n )] ≤ H(P ) follows from the concavity of
the entropy (using Jensen’s inequality), upon noting that E(P̂n ) = P .
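As an aside (again illustrative only, with an arbitrarily chosen `P`), a short simulation makes the downward bias of the plug-in estimate $H(\hat P_n)$ visible:

```python
# Simulation of the downward bias E[H(P_hat_n)] <= H(P) of the plug-in estimator.
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.5, 0.25, 0.25])   # illustrative "true" distribution

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

n, trials = 10, 50000
plug_in = np.mean([entropy_bits(rng.multinomial(n, P) / n) for _ in range(trials)])
print(entropy_bits(P), plug_in)   # the plug-in estimate is smaller on average
```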
7. Hypothesis testing. Stein's lemma says we should use the decision region
\[
B_n^* = B_n^*(\epsilon) = \Bigl\{ x_1^n \in A^n : 2^{n(D(P_1\|P_2) - \epsilon)} \le \frac{P_1^n(x_1^n)}{P_2^n(x_1^n)} \le 2^{n(D(P_1\|P_2) + \epsilon)} \Bigr\}.
\]
When the $X_i$ are i.i.d. according to $Q$, the law of large numbers gives
\[
\frac{1}{n} \log \frac{P_1^n(X_1^n)}{P_2^n(X_1^n)} \to R := \sum_x Q(x) \log \frac{P_1(x)}{P_2(x)} = D(Q\|P_2) - D(Q\|P_1),
\]
in probability, as $n \to \infty$.
Now consider three cases: $R < D(P_1\|P_2) - \epsilon$, $D(P_1\|P_2) - \epsilon < R < D(P_1\|P_2) + \epsilon$, and $R > D(P_1\|P_2) + \epsilon$.
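To see the limit concretely, here is a small simulation sketch (not part of the solution; `P1`, `P2`, and `Q` are arbitrary example pmfs) of the normalized log-likelihood ratio under $Q$, together with the threshold $D(P_1\|P_2)$ that defines $B_n^*$:

```python
# Under X_i ~ Q, the normalized log-likelihood ratio concentrates around
# R = D(Q||P2) - D(Q||P1); compare it with the threshold D(P1||P2).
import numpy as np

rng = np.random.default_rng(2)
P1 = np.array([0.5, 0.3, 0.2])   # illustrative pmfs, not from the problem
P2 = np.array([0.2, 0.3, 0.5])
Q  = np.array([0.4, 0.4, 0.2])

def D(p, q):
    """Relative entropy D(p || q) in bits (full-support pmfs assumed)."""
    return float(np.sum(p * np.log2(p / q)))

R = D(Q, P2) - D(Q, P1)
n = 100_000
x = rng.choice(len(Q), size=n, p=Q)         # X_1, ..., X_n i.i.d. ~ Q
llr = np.mean(np.log2(P1[x] / P2[x]))       # (1/n) log P1^n(X_1^n) / P2^n(X_1^n)
print(R, llr)                               # llr is close to R for large n
print(D(P1, P2))                            # threshold defining B_n*(eps)
```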