
EE/Stats 376A: Information theory Winter 2017

Lecture 11 — February 14
Lecturer: David Tse Scribe: Junjie J, Jiada L, Yizheng L, Vivek B

11.1 Outline
• Achieving capacity using random codes

• Linear codes

11.2 Recap
In the last couple of lectures we showed that one cannot communicate reliably at a rate
above the capacity. Therefore, we are left with the question of whether we can design codes
whose rate is arbitrarily close to the capacity.

11.2.1 Sphere packing


We start by revisiting the geometric picture of coding shown in Figure 11.1. Consider a set
of M codewords each of block length n: xn (1), . . . , xn (M ).
Each codeword, when sent over the channel, generates a sphere of uncertainty (a.k.a. a noise sphere). That is, for a particular codeword x^n(m), the resulting output Y^n(m) lies (with high probability) in the noise sphere of x^n(m). We want to design codewords so that the overlap between the noise spheres is small: large overlap corresponds to a large probability of decoding error. Before we figure out how to do that, let us first give a rough upper bound on how many codewords we can choose while avoiding large overlap.
Observe that for every codeword x^n(m), the number of jointly typical pairs (x^n(m), Y^n(m)) (the size of its noise sphere) is approximately 2^{nH(Y|X)}. Coupling this with the fact that the total number of typical output sequences in the {0,1}^n-sphere is 2^{nH(Y)}, we have
\[
M \;\le\; \frac{\#\,\text{typical output sequences in the } \{0,1\}^n\text{-sphere}}{\#\,\text{typical sequences in a noise sphere}}
\;=\; \frac{2^{nH(Y)}}{2^{nH(Y\mid X)}} \;=\; 2^{nI(X;Y)}. \tag{11.1}
\]

This heuristic yields the same result as we proved rigorously in the last lecture, namely that we cannot communicate reliably above capacity. But the fact that this "sphere-packing" picture gives the correct upper bound suggests that it is a useful way of thinking about how to achieve it: we need to pack the {0,1}^n-sphere with 2^{nI(X;Y)} noise spheres that have negligible overlap.
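To make this concrete, here is a minimal numerical sketch (my own, not from the notes) for a binary symmetric channel BSC(p) with the uniform, capacity-achieving input; the parameter values and function names are illustrative choices.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Illustrative parameters: BSC crossover probability p and block length n.
p, n = 0.11, 1000

H_Y = 1.0            # with a uniform input, the BSC output is uniform
H_Y_given_X = h2(p)  # exponent of the noise-sphere size
I = H_Y - H_Y_given_X

# Sphere-packing estimate: at most about 2^{nH(Y)} / 2^{nH(Y|X)} = 2^{nI} codewords.
print(f"noise-sphere size   ~ 2^{n * H_Y_given_X:.1f}")
print(f"output-sphere size  ~ 2^{n * H_Y:.1f}")
print(f"max # codewords M   ~ 2^{n * I:.1f}   (rate ~ {I:.3f} bits/use)")
```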


Figure 11.1: Codewords x^n(1), x^n(2), x^n(3) and their noise spheres inside the {0,1}^n-sphere of all typical Y^n sequences.

11.3 Random Codes


Rather than constructing specific capacity-achieving codes, Shannon devised a random coding technique, which shows the existence of capacity-achieving codes without exhibiting one explicitly.

11.3.1 Coding scheme C


Unless otherwise mentioned, input typical sequences are defined w.r.t. the capacity-achieving distribution, i.e. p*(x) = argmax_{p(x)} I(X;Y), and output typical sequences are defined w.r.t. the induced output distribution p(y) = Σ_x p*(x) p(y|x), where p(y|x) is fixed for a given channel.
Construct a random code C by independently picking M codewords uniformly over the set
of input typical sequences. Since the coding scheme is random, we denote these codewords by
random variables X^n(1), X^n(2), ..., X^n(M). Even though the coding scheme C is randomly
chosen, both the encoder and the decoder ‘know’ the coding scheme (after it is chosen).
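The following is a minimal sketch (my own, not from the notes) of drawing such a random codebook for a BSC. Since the capacity-achieving input of a BSC is uniform, drawing codeword bits i.i.d. and fair stands in for sampling uniformly from the input typical set (the two coincide with high probability for large n); the function name and parameters are illustrative.

```python
import random

def random_codebook(n, R, seed=0):
    """Draw M = 2^{nR} codewords of length n with i.i.d. fair bits.

    For a BSC the capacity-achieving input is uniform, so i.i.d. fair-bit
    codewords are typical with high probability; this stands in for sampling
    uniformly from the set of input typical sequences."""
    rng = random.Random(seed)
    M = 2 ** int(n * R)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(M)]

# Both encoder and decoder share the codebook once it has been drawn.
codebook = random_codebook(n=16, R=0.25)   # 2^4 = 16 codewords of length 16
print(len(codebook), codebook[0])
```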

11.3.2 Decoder
For the sake of exposition, we shall fix the message m for the entire discussion below. Suppose we send the random codeword X^n(m) through the channel and receive an output sequence Y^n(m). The output Y^n(m) falls into one of three cases (a code sketch of the resulting decoding rule follows the case list):

• Case 1: there is a unique m̃ for which (X^n(m̃), Y^n(m)) is jointly typical, and m̃ = m. In this case, the decoder returns m.


Figure 11.2: Coding schema: the output Y^n(m) lies in the overlap of the noise spheres of x^n(m) and x^n(m̃).

• Case 2: there is another m̃ ̸= m for which (X n (m̃), Y n (m)) is jointly typical. Here,
the decoder declares an error.

• Case 3: there exists no m̃ for which (X n (m̃), Y n (m)) is jointly typical. Here too, the
decoder declares an error.
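Below is a minimal sketch of this joint-typicality decoding rule, specialized (an assumption of mine) to a BSC(p), where a pair (x^n, y^n) is treated as jointly typical when the fraction of positions in which they differ is within ε of p; the helper names and the tolerance ε are illustrative choices.

```python
def jointly_typical(x, y, p, eps):
    """BSC joint-typicality check (simplified): the empirical crossover
    fraction |{i : x_i != y_i}| / n must be within eps of p.  Marginal
    typicality of x and y is ignored here for brevity."""
    n = len(x)
    d = sum(xi != yi for xi, yi in zip(x, y))
    return abs(d / n - p) <= eps

def typicality_decode(y, codebook, p, eps):
    """Return the message index if exactly one codeword is jointly typical
    with y (Case 1); otherwise declare an error (Cases 2 and 3)."""
    hits = [m for m, x in enumerate(codebook) if jointly_typical(x, y, p, eps)]
    return hits[0] if len(hits) == 1 else None   # None signals a decoding error
```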

11.3.3 Probability of error Pe (C)


Case 3 occurs only when the output Y^n(m) does not belong to the set of output typical sequences, and by the AEP the probability of this event vanishes as n → ∞. Therefore, we only need to bound the probability of Case 2 in order to bound E[P_e(C)].
It is easy to see that case 2 occurs if and only if Y n (m) lies in an overlapping region of
two (or more) noise spheres (Figure 11.2); computing the probability of such an event for a
particular coding scheme C = c is hard. However, we can compute the average probability
of this event over all the coding schemes.
Before proceeding any further, let's take a step back and analyze the random variables Y^n(m) and X^n(m̃) for m̃ ≠ m.
1. From the construction of the codewords, X n (m) is uniform over the set of input typ-
ical sequences. Since the output Y n (m) depends only on the input X n (m) (and the
channel), it is uniform over the set of output typical sequences.

2. Since X^n(m) and X^n(m̃) are picked independently, and Y^n(m) depends only on the input X^n(m) (and the channel), Y^n(m) and X^n(m̃) are independent.

3. From the construction of the codewords, X n (m̃) is also uniform over the set of input
typical sequences.
Using the above three points, for m̃ ≠ m, the probability that (X^n(m̃), Y^n(m)) is a jointly typical pair is
\[
\frac{\#\,\text{jointly typical sequences}}{(\#\,\text{input typical sequences})\,(\#\,\text{output typical sequences})}
\;\approx\; \frac{2^{nH(X,Y)}}{2^{nH(X)}\,2^{nH(Y)}} \;=\; 2^{-nI(X;Y)}. \tag{11.2}
\]


Figure 11.3: Typical x^n sequences versus typical y^n sequences; only a subset of the intersections (black dots) are jointly typical.

Refer to Figure 11.3 for a diagrammatic explanation of the above derivation.
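As a rough numerical check of (11.2) (mine, not from the notes): for a BSC(p) with uniform input, an independently drawn pair disagrees in Binomial(n, 1/2) positions, and joint typicality essentially requires that count to lie within εn of np. The sketch below computes this probability exactly and compares its exponent with −nI(X;Y); the parameter values are my own choices.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# Illustrative parameters: BSC crossover p, typicality tolerance eps, length n.
p, eps, n = 0.11, 0.02, 200

# Two independently drawn (typical) sequences disagree in Binomial(n, 1/2)
# positions; joint typicality needs that count to lie near n*p.
lo, hi = math.ceil(n * (p - eps)), math.floor(n * (p + eps))
prob = sum(math.comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

I = 1 - h2(p)   # I(X;Y) for a BSC(p) with uniform input
print(f"exact  log2 Pr ≈ {math.log2(prob):7.1f}")
print(f"-n I(X;Y)      = {-n * I:7.1f}   (exponents agree up to the eps slack)")
```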


Equation (11.2) gives us the probability that X^n(m̃) is jointly typical with Y^n(m) for any particular message m̃ ≠ m. Therefore, by the union bound,
\[
\Pr\bigl(\exists\,\tilde m \ne m \ \text{s.t.}\ X^n(\tilde m)\ \text{is jointly typical with}\ Y^n(m)\bigr)
\;\le\; (M-1)\,2^{-nI(X;Y)} \;=\; 2^{nR}\,2^{-nI(X;Y)} \;\xrightarrow{\,n\to\infty\,}\; 0 \tag{11.3}
\]
for R < I(X;Y) = Capacity. Since the above analysis does not depend on the choice of the input message m, these arguments directly imply that the average probability of error across all random codes converges to 0, i.e.,
\[
E\bigl[P_e(C)\bigr] \;\xrightarrow{\,n\to\infty\,}\; 0 .
\]
Since the error probabilities P_e(C) cannot all be strictly above their average, there exists a code C^* such that
\[
P_e(C^*) \;\le\; E_C\bigl[P_e(C)\bigr] \;\to\; 0 .
\]
Therefore, we have shown the existence of a capacity-achieving code without specifying the
code itself!
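As a rough numerical illustration (parameters chosen here for concreteness, not taken from the notes): for a BSC with crossover probability 0.11, the capacity is 1 − H(0.11) ≈ 0.5 bit per channel use. At rate R = 0.4 and block length n = 1000, the bound (11.3) gives (M − 1) 2^{−nI(X;Y)} ≤ 2^{1000(0.4 − 0.5)} = 2^{−100} ≈ 10^{−30}, so the average error probability of the random-code ensemble is already negligible at moderate block lengths.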

11.3.4 Computational efficiency


Even though the random coding scheme achieves capacity, it is computationally inefficient
for the following reasons:
• Storage: To achieve capacity we need large block lengths, i.e., n ≫ 1. The above coding scheme needs to store all the codewords, since there is no structure in the codebook. Thus, the space and time complexity of this (brute-force) encoding scheme is exponential in n. For example, if R = 1/2 and n = 1000, we have M = 2^{nR} = 2^{500} ≈ 10^{150} codewords!


• Decoding: The time complexity of the decoder is also exponential in n, since one needs to test every codeword for joint typicality with the output sequence.

11.4 Linear codes


The space and time complexity of encoding a random code is large because the codewords have no structure relating them to the messages. We overcome this problem with random linear codes: each message W is mapped to a k-bit string u^k ∈ {0,1}^k, and its corresponding codeword is given by
\[
x^n = G\,u^k ,
\]
where G is a random matrix in {0,1}^{n×k} and all arithmetic is over GF(2) (i.e., mod 2). The space and time complexity of encoding a linear code is of order nk ≪ 2^n.
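A minimal encoding sketch (mine, with illustrative names and sizes), assuming all arithmetic is over GF(2):

```python
import numpy as np

def random_generator_matrix(n, k, seed=0):
    """A uniformly random n-by-k generator matrix over {0, 1}."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n, k), dtype=np.uint8)

def encode(G, u):
    """Linear encoding x^n = G u^k, with all arithmetic mod 2 (over GF(2))."""
    return (G.astype(int) @ u.astype(int)) % 2

n, k = 16, 8
G = random_generator_matrix(n, k)
u = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)   # the k-bit message
x = encode(G, u)   # O(nk) bit operations, versus storing 2^{nR} codewords
print(x)
```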
Since we now have a space/time-efficient encoding scheme, the next logical questions are: (i) Do random linear codes achieve capacity? (ii) What is the time complexity of the corresponding decoder?
Let us try to answer these questions for the BEC(p) channel. On average, np entries of x^n are erased, so the decoder observes only n(1 − p) entries of x^n (and knows which positions were erased). Equivalently, in the language of linear algebra, the decoder has n(1 − p) linear equations in k unknowns, which immediately implies
\[
k \le n(1-p) \quad\Longrightarrow\quad R = \frac{k}{n} \le 1 - p = \text{Capacity}.
\]
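The decoder can therefore be sketched as Gaussian elimination over GF(2) on the unerased rows of G. The sketch below is my own illustration (helper names and sizes are hypothetical), assuming the decoder knows G and the erasure locations, and that the surviving rows have full column rank, which holds with high probability when n(1 − p) is comfortably larger than k.

```python
import numpy as np

def solve_gf2(A, b):
    """Solve A u = b over GF(2) by Gauss-Jordan elimination.
    Assumes A has full column rank; raises ValueError otherwise."""
    A, b = A.copy() % 2, b.copy() % 2
    rows, cols = A.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i, c]), None)
        if pivot is None:
            raise ValueError("rank deficient: too many erasures to decode")
        A[[r, pivot]] = A[[pivot, r]]        # swap pivot row into place
        b[[r, pivot]] = b[[pivot, r]]
        for i in range(rows):                # clear column c everywhere else
            if i != r and A[i, c]:
                A[i] ^= A[r]
                b[i] ^= b[r]
        r += 1
    return b[:cols]                          # reduced system has identity on top

def bec_decode(G, y):
    """Decode a BEC output y (0/1 symbols, None for an erasure):
    keep the unerased positions and solve G_obs u = y_obs over GF(2)."""
    keep = [i for i, yi in enumerate(y) if yi is not None]
    return solve_gf2(G[keep, :], np.array([y[i] for i in keep], dtype=np.uint8))

# Tiny demo with illustrative sizes: erase each symbol independently w.p. p.
rng = np.random.default_rng(1)
n, k, p = 40, 8, 0.3
G = rng.integers(0, 2, size=(n, k), dtype=np.uint8)
u = rng.integers(0, 2, size=k, dtype=np.uint8)
x = (G.astype(int) @ u) % 2
y = [int(xi) if rng.random() > p else None for xi in x]
print(np.array_equal(bec_decode(G, y), u))   # True whenever G_obs has full rank
```

This decoder runs in time polynomial in n (roughly O(nk^2) for the elimination), in contrast to the exponential-time typicality decoder for unstructured random codes.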
For the binary symmetric channel, one can prove that random linear codes achieve capacity, using an argument similar to the proof that general random codes achieve capacity. However, no decoding algorithm substantially faster than an exponential brute-force search is known for such unstructured random linear codes.
In the upcoming lectures, we will design polar codes (a class of linear codes), which achieve capacity while admitting efficient encoding and decoding with space/time complexity of order O(n log n). Stay tuned!
