Channel Capacity
Radu T. Trı̂mbiţaş
November 2012
Contents
1 Channel Capacity
1.1 Introduction
1.2 Discrete Channels
3 Symmetric Channels
4 Properties of Channel Capacity
7 Feedback Capacity
8 Source-Channel Separation Theorem
9 Coding
9.1 Introduction to coding
9.2 Hamming Codes
1 Channel Capacity
1.1 Introduction
Towards Channels
The 1930s America
Radio Communications
• Q: Is there an upper bound on the amount of information that can be sent under different noise conditions?
• Key question: if we increase the transmission rate over a noisy channel, will the error rate increase?
Simple Example
A key idea
• If we choose the messages carefully at the sender, then with very high
probability, they will be uniquely identifiable at the receiver.
• The idea is to choose source messages that (tend to) have no ambiguity (no overlap) at the receiver end.
• This might restrict our possible set of source messages (in some cases severely, and thereby decrease our rate R), but if any message received in a region corresponds to only one source message, "perfect" communication can be achieved.
• We will see many instances of discrete memoryless channels (or just DMC).
• Recall back from lecture 1 our general model of communications:
Model of Communication
• The noisy channel distorts this message into a length-n string of receiver symbols $Y^n$.
• The decoder attempts to reconstruct the original message as well as possible, producing an estimate $\hat{W}$, one of the M possible sent messages.
$$I(X;Y) = I_{p(x)}(X;Y) = \sum_{x,y} \underbrace{p(x)\,p(y|x)}_{p(x,y)} \log \frac{p(y|x)}{p(y)} \qquad (1)$$
$$= \sum_{x,y} p(x)\,p(y|x) \log \frac{p(y|x)}{\sum_{x'} p(y|x')\,p(x')} \qquad (2)$$
Definition 4 (Rate). The rate R of a code is measured in the number of bits per
channel use.
• That is, the only possible way to get low error is if R < C. Something funny happens at the point C, the capacity of the channel.
• Note that C is not 0, so we can still achieve "perfect" communication over a noisy channel as long as R < C.
2.2 Noisy Channel with non-overlapping outputs
This channel has 2 possible outputs corresponding to each of the 2 inputs.
The channel appears to be noisy, but really is not.
$$I(X;Y) = H(X) - \underbrace{H(X \mid Y)}_{=0} = H(X)$$
• Then $C = \max_{p(x)} H(X) = \log k$ for a channel with k inputs and non-overlapping outputs (here k = 2, so C = 1 bit).
Aside: on the optimization used to compute the value C
• The solution p*(x) that achieves this maximum is not necessarily unique.
• Also, the solution p*(x) that we find is not necessarily the one we end up using when we actually do channel coding.
Noisy Typewriter
• In this case the channel input is either received unchanged at the output
with probability 1/2 or is transformed into the next letter with probabil-
ity 1/2 (Figure 1).
• So 26 input symbols, and each symbol maps probabilistically to itself or
its lexicographic neighbor.
$$C = \max_{p(x)} I(X;Y) = \max_{p(x)} \big(H(Y) - H(Y|X)\big) = \log 26 - 1 = \log 13$$
2.5 Binary Symmetric Channel (BSC)
• So H (Y ) = 1 if H ( X ) = 1.
• Thus, we get that C = 1 − H ( p) which happens when X is uniform.
• If p = 1/2 then C = 0, so if it randomly flips bits, then no information
can be sent.
• If p ≠ 1/2, then we can communicate, albeit potentially slowly. E.g., if p = 0.499 then C = 1 − H(0.499) ≈ 2.8854 × 10⁻⁶ bits per channel use, so to send one bit we need to use the channel many times (see the short computation below).
• If p = 0 or p = 1, then C = 1 and we get the maximum possible rate (the capacity is one bit per channel use).
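As a quick sanity check (not part of the original notes), here is a minimal Python sketch that evaluates C = 1 − H(p) for a few crossover probabilities, including the p = 0.499 case quoted above.

import numpy as np

def binary_entropy(p):
    """H(p) in bits; handles the endpoints p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel: C = 1 - H(p)."""
    return 1.0 - binary_entropy(p)

for p in [0.0, 0.1, 0.25, 0.499, 0.5]:
    print(f"p = {p:5.3f}   C = {bsc_capacity(p):.6g} bits/channel use")
# p = 0.499 gives C ≈ 2.8854e-06, matching the value in the text.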
Decoding
• Let's temporarily look ahead to address this problem.
• We can ”decode” the source using the received string, source distribu-
tion, and the channel model p(y| x ) via Bayes rule. I.e.
$$P(X|Y) = \frac{P(Y|X)P(X)}{P(Y)} = \frac{P(Y|X)P(X)}{\sum_{x'} P(Y|X = x')\,P(X = x')}$$
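To make this concrete, here is a small illustrative Python sketch (my own example, not from the notes) that applies Bayes' rule to a BSC with crossover probability p and a non-uniform prior on a single input bit.

import numpy as np

p = 0.1                          # BSC crossover probability (assumed for illustration)
prior = np.array([0.7, 0.3])     # P(X = 0), P(X = 1)
# Channel model P(Y | X): rows indexed by x, columns by y.
likelihood = np.array([[1 - p, p],
                       [p, 1 - p]])

y = 1                            # observed channel output
joint = prior * likelihood[:, y]         # P(Y = y | X = x) P(X = x)
posterior = joint / joint.sum()          # Bayes' rule: P(X = x | Y = y)
print(posterior)                 # e.g. [0.2059 0.7941]
x_map = posterior.argmax()       # MAP decision for the transmitted bit
print("MAP estimate of X:", x_map)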
Binary Erasure Channel (BEC)
• For the binary erasure channel, e is an erasure symbol; when it occurs we have no access to the transmitted bit.
• A transmitted bit is erased (dropped) with probability α.
• We want to compute the capacity. Obviously, C = 1 if α = 0.
$$C = \max_{p(x)} I(X;Y) = \max_{p(x)} \big(H(Y) - H(Y|X)\big) = \max_{p(x)} H(Y) - H(\alpha),$$
since H(Y|X) = H(α) for every input distribution.
• Let E = 1{Y = e} be the erasure indicator and let π = P(X = 1). Then
$$H(Y) = H(Y, E) = H(E) + H(Y|E) = H(\alpha) + (1 - \alpha) H(\pi),$$
because $H(Y|E) = \alpha \cdot H(Y|E=1) + (1-\alpha) \cdot H(Y|E=0) = \alpha \cdot 0 + (1-\alpha) H(\pi)$.
• Then we get
$$C = \max_{p(x)} H(Y) - H(\alpha) = \max_{\pi} \big[ H(\alpha) + (1-\alpha) H(\pi) \big] - H(\alpha) = 1 - \alpha,$$
achieved by the uniform input π = 1/2 (a numerical check follows below).
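As a sanity check (my own sketch, assuming the three-symbol output alphabet {0, e, 1} described above), the following Python snippet maximizes I(X;Y) over the input distribution by a simple grid search and compares the result with 1 − α.

import numpy as np

def mutual_information(pi, alpha):
    """I(X;Y) in bits for the BEC with P(X=1) = pi; outputs ordered as (0, e, 1)."""
    px = np.array([1 - pi, pi])
    py_given_x = np.array([[1 - alpha, alpha, 0.0],
                           [0.0, alpha, 1 - alpha]])
    pxy = px[:, None] * py_given_x            # joint distribution
    py = pxy.sum(axis=0)                      # output marginal
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask]))

alpha = 0.3
grid = np.linspace(0.001, 0.999, 999)
capacity = max(mutual_information(pi, alpha) for pi in grid)
print(capacity, 1 - alpha)    # both ≈ 0.7 bits per channel use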
3 Symmetric Channels
For a symmetric channel,
$$C = \log|\mathcal{Y}| - H(\mathbf{r}), \qquad (4)$$
where $\mathbf{r}$ is a row of the transition matrix.
Proof.
$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(\mathbf{r}) \le \log|\mathcal{Y}| - H(\mathbf{r}), \qquad (5)$$
with equality iff the output distribution is uniform. Choosing the uniform input $p(x) = 1/|\mathcal{X}|$ gives $p(y) = c/|\mathcal{X}|$, where c is the sum of the entries in one column of the transition matrix; since c does not depend on y, the output is then uniform and the bound is achieved.
Example 7. The channel with transition matrix
$$p(y|x) = \begin{pmatrix} 0.3 & 0.2 & 0.5 \\ 0.5 & 0.3 & 0.2 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}$$
is symmetric (each row and each column is a permutation of the same set of values), so $C = \log 3 - H(0.5, 0.3, 0.2)$ bits.
4 Properties of Channel Capacity
1. C ≥ 0 since I ( X; Y ) ≥ 0.
2. C ≤ log |X | since C = max p( x) I ( X; Y ) ≤ max H ( X ) = log |X |.
3. C ≤ log |Y | for the same reason. Thus, the alphabet sizes limit the trans-
mission rate.
4. $I(X;Y) = I_{p(x)}(X;Y)$ is a continuous function of p(x). (A numerical method for performing the maximization is sketched below.)
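Computing C = max over p(x) of I(X;Y) is a concave maximization problem. The notes do not give an algorithm for it, but a standard choice is the Blahut–Arimoto algorithm; the sketch below (my own, not from the notes) applies it to the transition matrix of Example 7 and should reproduce C = log 3 − H(0.5, 0.3, 0.2) ≈ 0.0995 bits.

import numpy as np

def blahut_arimoto(W, tol=1e-10, max_iter=10_000):
    """Capacity (in bits) of a DMC with transition matrix W[x, y] = p(y|x)."""
    n_inputs = W.shape[0]
    p = np.full(n_inputs, 1.0 / n_inputs)        # start from the uniform input
    for _ in range(max_iter):
        q = p @ W                                # output marginal p(y)
        # c[x] = exp( D( W(.|x) || q ) ), computed in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log(W / q), 0.0)
        c = np.exp(np.sum(W * log_ratio, axis=1))
        lower, upper = np.log(p @ c), np.log(c.max())
        if upper - lower < tol:
            break
        p = p * c / (p @ c)                      # multiplicative update of p(x)
    return lower / np.log(2)                     # convert nats -> bits

# Transition matrix of Example 7 (rows are p(y|x) for each input x).
W = np.array([[0.3, 0.2, 0.5],
              [0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3]])
print(blahut_arimoto(W))   # ≈ 0.0995 = log2(3) - H(0.5, 0.3, 0.2)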
• The smaller bins correspond to the noise in the channel, and the packing
problem depends on the underlying ”shape”
• Not really a partition, since there might be wasted space, also depending
on the bin and region shapes.
• There are ≈ $2^{nH(X)}$ typical sequences, each with probability ≈ $2^{-nH(X)}$, and $P(A_\varepsilon^{(n)}) \approx 1$, so the only thing with "any" probability is the typical set, and it has essentially all the probability.
• The same thing is true for conditional entropy.
• That is, for a typical input sequence $X^n$, there are about $2^{nH(Y|X)}$ possible output sequences.
• Overall, there are ≈ $2^{nH(Y)}$ typical output sequences (the marginally typical Y sequences), and $2^{nH(Y)} \ge 2^{nH(Y|X)}$.
• So the number of non-confusable inputs is at most
$$\frac{2^{nH(Y)}}{2^{nH(Y|X)}} = 2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)}. \qquad (7)$$
• We can view the count of non-confusable inputs in (7) as a volume ratio: $2^{nH(Y)}$ is the total number of possible slots, while $2^{nH(Y|X)}$ is the number of slots taken up (on average) by a given source word. Thus the number of source words that can be used is the ratio of the two.
5.2 Definitions
• No feedback means $p(x_k \mid x_{1:k-1}, y_{1:k-1}) = p(x_k \mid x_{1:k-1})$. We'll analyze feedback a bit later.
Remark. If the channel is used without feedback, the channel transition function reduces to
$$p(y^n | x^n) = \prod_{i=1}^{n} p(y_i | x_i). \qquad (8)$$
(M,n) code
Definition 13 (( M, n) code). An ( M, n) code for channel (X ; p(y| x ); Y ) is
1. An index set {1, 2, . . . , M } .
2. An encoding function $X^n : \{1, 2, \dots, M\} \to \mathcal{X}^n$ yielding codewords $X^n(1), X^n(2), X^n(3), \dots, X^n(M)$. Each source message has a codeword, and each codeword is n code symbols long.
3. Decoding function, i.e., g : Y n → {1, 2, . . . , M } which makes a ”guess”
about original message given channel output.
Remark. In an ( M, n) code, M = the number of possible messages to be
sent, and n = number of channel uses by the codewords of the code.
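For a concrete (if tiny) instance: the length-3 repetition code for a single bit, which we meet again in Section 9.1, is an (M, n) = (2, 3) code with index set {1, 2}, codewords $X^3(1) = 000$ and $X^3(2) = 111$, and the majority-vote decoder g.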
Error
Definition 14 (Probability of Error λi for message i ∈ {1, . . . , M}).
$$\lambda_i := P\big(g(Y^n) \neq i \mid X^n = x^n(i)\big) = \sum_{y^n} p\big(y^n \mid x^n(i)\big)\, I\big(g(y^n) \neq i\big) \qquad (9)$$
I (·) is the indicator function; the conditional probability of error given that
index i was sent.
Definition 15 (Maximal probability of error $\lambda^{(n)}$ for an (M, n) code).
$$\lambda^{(n)} := \max_{i \in \{1, \dots, M\}} \lambda_i$$
Definition 16 (Average probability of error $P_e^{(n)}$).
$$P_e^{(n)} = \frac{1}{M} \sum_{i=1}^{M} \lambda_i = P\big(W \neq g(Y^n)\big),$$
the last equality holding when the message W is uniform on {1, . . . , M}.
Rate
Definition 17. The rate R of an ( M, n) code is
$$R = \frac{\log M}{n} \text{ bits per transmission.}$$
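For example (an illustrative number, not from the notes), an (M, n) = (16, 7) code has rate $R = \log_2 16 / 7 = 4/7 \approx 0.57$ bits per transmission; this is exactly the rate of the (7,4) Hamming code of Section 9.2.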
Definition 18. A rate R is achievable if there exists a sequence of $(2^{nR}, n)$ codes such that the maximal probability of error $\lambda^{(n)} \to 0$ as $n \to \infty$.
Capacity
Definition 19. The capacity of a channel is the supremum of all achievable rates.
• So the capacity of a DMC is the rate beyond which the error won’t tend
to zero with increasing n.
The decoding will use jointly typical sequences: the jointly typical set $A_\varepsilon^{(n)}$ is the set of pairs $(x^n, y^n)$ whose empirical entropies are ε-close to the true entropies H(X), H(Y), and H(X,Y), where
$$p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i). \qquad (14)$$
Intuition for Jointly Typical Sequences
• So intuitively,
$$\frac{\text{num. jointly typical seqs.}}{\text{num. independently chosen typical seqs.}} = \frac{2^{nH(X,Y)}}{2^{nH(X)}\, 2^{nH(Y)}} = 2^{n(H(X,Y) - H(X) - H(Y))} = 2^{-nI(X;Y)}$$
3. If $(\tilde{X}^n, \tilde{Y}^n) \sim p(x^n)\, p(y^n)$, i.e., the two sequences are drawn independently with the same marginals, then
$$P\big((\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^{(n)}\big) \le 2^{-n(I(X;Y) - 3\varepsilon)}. \qquad (15)$$
Key property: we have a bound on the probability that independently drawn sequences are jointly typical, and it falls off exponentially fast with n whenever I(X;Y) > 0.
By the AEP,
$$-\frac{1}{n}\log P(X^n) \to -E\big[\log P(X)\big] = H(X), \qquad (17)$$
so for every ε > 0 there exists $m_1$ such that for $n > m_1$
$$P\Big(\underbrace{\Big|-\frac{1}{n}\log P(X^n) - H(X)\Big| > \varepsilon}_{S_1}\Big) < \frac{\varepsilon}{3}. \qquad (18)$$
Joint AEP proof
Proof that $P\big((X^n, Y^n) \in A_\varepsilon^{(n)}\big) \to 1$. • For $n > \max(m_1, m_2, m_3)$ we have $P(S_1 \cup S_2 \cup S_3) \le 3 \cdot \varepsilon/3 = \varepsilon$.
• So non-typicality has probability < ε, meaning $P\big((A_\varepsilon^{(n)})^c\big) \le \varepsilon$, giving $P(A_\varepsilon^{(n)}) \ge 1 - \varepsilon$, as desired.
• Counting:
$$1 = \sum_{x^n, y^n} p(x^n, y^n) \ge \sum_{(x^n,y^n)\in A_\varepsilon^{(n)}} p(x^n, y^n) \ge \big|A_\varepsilon^{(n)}\big|\, 2^{-n(H(X,Y)+\varepsilon)}$$
$$\Longrightarrow \big|A_\varepsilon^{(n)}\big| \le 2^{n(H(X,Y)+\varepsilon)}$$
• Also, since $P\big(A_\varepsilon^{(n)}\big) > 1 - \varepsilon$ for n large enough,
$$1 - \varepsilon \le \sum_{(x^n,y^n)\in A_\varepsilon^{(n)}} p(x^n, y^n) \le \big|A_\varepsilon^{(n)}\big|\, 2^{-n(H(X,Y)-\varepsilon)}$$
$$\Longrightarrow \big|A_\varepsilon^{(n)}\big| \ge (1-\varepsilon)\, 2^{n(H(X,Y)-\varepsilon)}$$
More Intuition
• There are ≈ $2^{nH(X)}$ typical X sequences.
• There are ≈ $2^{nH(Y)}$ typical Y sequences.
• The total number of independently chosen typical pairs is ≈ $2^{nH(X)} 2^{nH(Y)}$, but not all of them are jointly typical; only ≈ $2^{nH(X,Y)}$ of them are.
• The fraction of independent typical pairs that are jointly typical is
$$\frac{2^{nH(X,Y)}}{2^{nH(X)}\, 2^{nH(Y)}} = 2^{n(H(X,Y) - H(X) - H(Y))} = 2^{-nI(X;Y)},$$
and this is essentially the probability that a randomly chosen pair of (marginally) typical sequences is jointly typical.
• So if we use typicality to decode (which we will), we can use roughly $2^{nI(X;Y)}$ codewords before independently chosen pairs start to look jointly typical by accident (see the numerical check below).
• Analogy: if p(x) = 1/M, then on average we can draw about M samples before we see a given x.
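A small numerical illustration (my own, assuming X ~ Bernoulli(1/2) sent through a BSC with crossover 0.1): the snippet below computes H(X), H(Y), H(X,Y), and I(X;Y), and prints the fraction $2^{-nI(X;Y)}$ of independently chosen typical pairs that are jointly typical for a few block lengths n.

import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector/array."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution of (X, Y): X ~ Bernoulli(1/2), Y = X through a BSC(0.1).
eps = 0.1
pxy = np.array([[0.5 * (1 - eps), 0.5 * eps],
                [0.5 * eps, 0.5 * (1 - eps)]])
HX = entropy(pxy.sum(axis=1))
HY = entropy(pxy.sum(axis=0))
HXY = entropy(pxy)
I = HX + HY - HXY
print(f"H(X)={HX:.3f}  H(Y)={HY:.3f}  H(X,Y)={HXY:.3f}  I(X;Y)={I:.3f}")

for n in (10, 50, 100):
    print(f"n={n:3d}   fraction jointly typical ≈ 2^(-nI) = {2.0 ** (-n * I):.3e}")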
Theorem (Channel coding theorem). All rates R < C are achievable: for every R < C there exists a sequence of $(2^{nR}, n)$ codes with maximal probability of error $\lambda^{(n)} \to 0$. Conversely, any sequence of $(2^{nR}, n)$ codes with $\lambda^{(n)} \to 0$ as $n \to \infty$ must have $R \le C$.
Channel Theorem
• Two parts to prove: 1) all rates R < C are achievable (there exists a code with vanishing error); 2) conversely, if the error goes to zero, then we must have R ≤ C.
• Random codebook generation: for each message $\omega \in \{1, \dots, 2^{nR}\}$, draw a codeword
$$x^n(\omega) = \{x_1(\omega), x_2(\omega), \dots, x_n(\omega)\}$$
i.i.d. according to p(x).
All rates R < C are achievable
Proof that all rates R < C are achievable. • We can compute the probability of a given codebook realization; for message ω,
$$P\big(x^n(\omega)\big) = \prod_{i=1}^{n} p\big(x_i(\omega)\big), \qquad \omega \in \{1, 2, \dots, M\}.$$
• Decoding: declare $\hat\omega$ if $x^n(\hat\omega)$ is the unique codeword jointly typical with the received $y^n$; otherwise output a special invalid index "0" (error). Three types of errors might occur (type A, B, or C).
A. $\exists k \neq \hat\omega$ such that $(x^n(k), y^n) \in A_\varepsilon^{(n)}$ (i.e., more than one possible typical message).
Proof that all rates R < C are achievable (continued). There are three quality measures we might be interested in; for a fixed codebook C,
$$P_e^{(n)}(\mathcal{C}) = P(\hat\omega \neq \omega \mid \mathcal{C}) = \frac{1}{2^{nR}} \sum_{i=1}^{2^{nR}} \lambda_i.$$
We want to show that if R < C there exists a codebook C such that this error → 0 (and that if R > C the error must stay bounded away from 0).
1. Expand average error (bullet 2 above) and show that it is small.
2. Deduce that there exists at least one code with small average error.
3. Show that this code can be modified to have small maximum probability of error.
$$P(\mathcal{E}) = \sum_{\mathcal{C}} P(\mathcal{C})\, P_e^{(n)}(\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C})\, \frac{1}{2^{nR}} \sum_{\omega=1}^{2^{nR}} \lambda_\omega(\mathcal{C}) \qquad (24)$$
$$= \frac{1}{2^{nR}} \sum_{\omega=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C})\, \lambda_\omega(\mathcal{C}) \qquad (25)$$
but
$$\sum_{\mathcal{C}} P(\mathcal{C})\, \lambda_\omega(\mathcal{C}) = \sum_{x^n(1), \dots, x^n(2^{nR})} \underbrace{\prod_{i=1}^{2^{nR}} P\big(x^n(i)\big)}_{P(\mathcal{C})}\, P\big(g(Y^n) \neq \omega \mid X^n = x^n(\omega)\big)$$
$$= \underbrace{\sum_{\{x^n(i)\,:\, i \neq \omega\}} \prod_{i \neq \omega} P\big(x^n(i)\big)}_{=1}\; \sum_{x^n(\omega)} P\big(g(Y^n) \neq \omega \mid X^n = x^n(\omega)\big)\, P\big(x^n(\omega)\big)$$
$$= \sum_{x^n(\omega)} P\big(g(Y^n) \neq \omega \mid X^n = x^n(\omega)\big)\, P\big(x^n(\omega)\big)$$
The last sum is the same regardless of ω; call it β. Thus we can arbitrarily assume that ω = 1.
All rates R < C are achievable
$$P(\mathcal{E}) = \sum_{\mathcal{C}} P(\mathcal{C})\, P_e^{(n)}(\mathcal{C}) = \frac{1}{2^{nR}} \sum_{\omega=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C})\, \lambda_\omega(\mathcal{C}) = \beta = P(\mathcal{E} \mid W = 1)$$
• $E_1$ means that the transmitted and received codewords are not jointly typical (this is error type B from before).
• $E_2 \cup E_3 \cup \cdots \cup E_{2^{nR}}$, where $E_k$ is the event that codeword k is jointly typical with the received sequence. This is either:
– Type A: greater than 1 codeword is jointly typical with received se-
quence
All rates R < C are achievable
We have that
$$P(E_1) = P\big((X^n(1), Y^n) \notin A_\varepsilon^{(n)}\big) \to 0 \quad (n \to \infty)$$
• This will allow us to bound the error, as long as I ( X; Y ) > 3ε.
Proof that all rates R < C are achievable (continued). • So if we choose R < I(X;Y) (strictly), we can find ε and n so that the average probability of error P(E) ≤ 2ε can be made as small as we want.
• But, we need to get from an average to a max probability of error, and
bound that.
• First, choose $p(x) = \arg\max_{p(x)} I(X;Y)$ rather than the uniform p(x), to change the condition from R < I(X;Y) to R < C. This gives us the highest possible rate limit.
• If P(E) ≤ 2ε, the average error over codebooks is small, so there must exist some specific code, say $\mathcal{C}^*$, such that
$$P_e^{(n)}(\mathcal{C}^*) \le 2\varepsilon.$$
Proof that all rates R < C are achievable (continued). • Let's break apart this error probability:
$$P_e^{(n)}(\mathcal{C}^*) = \frac{1}{2^{nR}} \sum_{i=1}^{2^{nR}} \lambda_i(\mathcal{C}^*) = \frac{1}{2^{nR}} \sum_{i:\lambda_i < 4\varepsilon} \lambda_i(\mathcal{C}^*) + \frac{1}{2^{nR}} \sum_{i:\lambda_i \ge 4\varepsilon} \lambda_i(\mathcal{C}^*)$$
• Now suppose more than half of the indices had error ≥ 4ε, i.e., suppose $|\{i : \lambda_i \ge 4\varepsilon\}| > 2^{nR}/2 = 2^{nR-1}$. Under this assumption
$$\frac{1}{2^{nR}} \sum_{i:\lambda_i \ge 4\varepsilon} \lambda_i(\mathcal{C}^*) \ge \frac{1}{2^{nR}} \sum_{i:\lambda_i \ge 4\varepsilon} 4\varepsilon = \frac{4\varepsilon}{2^{nR}}\, \big|\{i : \lambda_i \ge 4\varepsilon\}\big| > 2\varepsilon,$$
contradicting $P_e^{(n)}(\mathcal{C}^*) \le 2\varepsilon$. So at most half of the codewords can have $\lambda_i \ge 4\varepsilon$.
• Create a new codebook that eliminates all bad codewords (those with index in $\{i : \lambda_i \ge 4\varepsilon\}$); as just argued, there are at most half of them.
• At least $2^{nR}/2 = 2^{nR-1} = 2^{n(R - 1/n)}$ codewords remain, and they all have probability of error ≤ 4ε.
• We now code with rate $R' = R - 1/n \to R$ as $n \to \infty$, and for this new sequence of codes the maximal error probability is $\lambda^{(n)} \le 4\varepsilon$, which can be made as small as we wish.
Discussion
• But we have yet to prove the converse . . .
• We next need to show that any sequence of $(2^{nR}, n)$ codes with $\lambda^{(n)} \to 0$ must have $R \le C$.
• First, let's consider the case $P_e^{(n)} = 0$; in this case it is easy to show that $R \le C$.
$$nR = H(W) = H(W|Y^n) + I(W; Y^n) = I(W; Y^n) \qquad \text{(} H(W|Y^n) = 0 \text{ since } P_e^{(n)} = 0\text{)}$$
$$\le I(X^n; Y^n) \qquad \text{// } W \to X^n \to Y^n \text{ is a Markov chain; data-processing inequality}$$
$$\le \sum_{i=1}^{n} I(X_i; Y_i) \qquad \text{// Lemma 24, proved next}$$
$$\le nC \qquad \text{// definition of capacity}$$
• Hence $R \le C$.
Lemma 24. Let $Y^n$ be the result of passing $X^n$ through a memoryless channel of capacity C. Then
$$I(X^n; Y^n) \le nC \quad \text{for all } p(x^n). \qquad (27)$$
Proof.
$$I(X^n; Y^n) = H(Y^n) - H(Y^n | X^n) = H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_1, \dots, Y_{i-1}, X^n)$$
$$= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid X_i) \qquad \text{// definition of a DMC used without feedback}$$
Fano’s Lemmas
Proof (continued).
$$I(X^n; Y^n) = H(Y^n) - \sum_{i=1}^{n} H(Y_i | X_i) \le \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i | X_i) = \sum_{i=1}^{n} \underbrace{I(X_i; Y_i)}_{\le C} \le nC.$$
Converse to the Channel Coding Theorem
• We rewrite (28) as
$$P_e^{(n)} \ge 1 - \frac{C}{R} - \frac{1}{nR}.$$
• As $n \to \infty$, if R > C the lower bound on the error is strictly positive, approaching $1 - C/R$.
• Even for small n, $P_e^{(n)} > 0$: otherwise, if $P_e^{(n_0)} = 0$ for some code of length $n_0$, we could concatenate that code with itself to get arbitrarily long codes of the same rate and zero error, contradicting the bound above.
• Hence we cannot achieve an arbitrarily low probability of error at rates above capacity (see the worked numbers below).
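As an illustrative number (my own, not from the notes): if we try to signal at twice capacity, R = 2C, the bound gives
$$P_e^{(n)} \ge 1 - \frac{C}{2C} - \frac{1}{2nC} = \frac{1}{2} - \frac{1}{2nC} \xrightarrow{\;n \to \infty\;} \frac{1}{2},$$
so the average probability of error is at least about 1/2, no matter how large the block length.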
Let us examine the requirements of any such code (one achieving capacity with zero error).
$$nR = H(W) = H\big(X^n(W)\big) = \underbrace{H(W \mid \hat W)}_{=0,\ \text{since } P_e = 0} + I(W; \hat W) = I(W; \hat W)$$
$$= I(X^n; Y^n) = H(Y^n) - H(Y^n \mid X^n) \qquad \text{// the data-processing inequality must be tight}$$
$$= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid X_i)$$
$$= \sum_i H(Y_i) - \sum_i H(Y_i \mid X_i) \qquad \text{// if all } Y_i \text{ independent}$$
$$= \sum_i I(X_i; Y_i)$$
$$= nC \qquad \text{// } p^*(x) \in \arg\max_{p(x)} I(X_i; Y_i)$$
7 Feedback Capacity
Consider a sequence of channel uses.
Another way of looking at it:
• Q: Does feedback increase the capacity of a discrete memoryless channel? A: No.
• Intuition: without memory, feedback tells us nothing more than what we already know, namely p(y|x).
• Can feedback make decoding easier? Yes: consider the binary erasure channel, where upon receiving Y = e we simply re-transmit.
• Can feedback help for channels with memory? In general, yes.
Definition 25. A $(2^{nR}, n)$ feedback code consists of an encoder $X_i(W, Y_{1:i-1})$, a decoder $g : \mathcal{Y}^n \to \{1, 2, \dots, 2^{nR}\}$, and error probability $P_e^{(n)} = P(g(Y^n) \neq W)$, where $H(W) = nR$ (W uniform).
Definition 26 (Capacity with feedback). The capacity with feedback $C_{FB}$ of a DMC is the supremum of all rates achievable by feedback codes.
Theorem 27 (Feedback capacity). For a discrete memoryless channel, $C_{FB} = C$: feedback does not increase capacity.
Proof.
$$nR = H(W) = H(W|\hat W) + I(W; \hat W)$$
$$\le 1 + P_e^{(n)} nR + I(W; \hat W) \qquad \text{// by Fano}$$
$$\le 1 + P_e^{(n)} nR + I(W; Y^n) \qquad \text{// data-processing inequality}$$
Feedback codes for DMC
. . . proof continued.
$$I(W; Y^n) = H(Y^n) - H(Y^n | W) = H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_{1:i-1}, W)$$
$$= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_{1:i-1}, W, X_i) \qquad \text{// } X_i = f(W, Y_{1:i-1})$$
$$= H(Y^n) - \sum_{i=1}^{n} H(Y_i | X_i) \le \sum_i \big[ H(Y_i) - H(Y_i | X_i) \big] = \sum_i I(X_i; Y_i) \le nC$$
Putting the two displays together, $nR \le 1 + P_e^{(n)} nR + nC$; dividing by n and letting $n \to \infty$ (with $P_e^{(n)} \to 0$) gives $R \le C$, so $C_{FB} \le C$. Since any non-feedback code is also a feedback code, $C \le C_{FB}$, and hence $C_{FB} = C$.
Joint Source/Channel Theorem: process
The process goes roughly as follows: first compress the source down to about H(V) bits per source symbol (source coding), then protect those bits with a channel code of rate below C (channel coding); at the receiver, channel-decode and then source-decode.
• Maybe obvious now, but at the time (1940s) it was a revolutionary idea!
Source/Channel Coding Theorem
Theorem 28 (Source-channel coding theorem). If $V_{1:n}$ satisfies the AEP and $H(V) < C$, then there exists a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)} \to 0$. Conversely, if $H(V) > C$, then $P_e^{(n)}$ is bounded away from 0, and we cannot send the process with an arbitrarily low probability of error.
Proof. • If V satisfies the AEP, then there is a typical set $A_\varepsilon^{(n)}$ with $|A_\varepsilon^{(n)}| \le 2^{n(H(V)+\varepsilon)}$ that carries essentially all the probability.
• We only encode the typical set and signal an error otherwise; this contributes at most ε to $P_e^{(n)}$.
• We index the elements of $A_\varepsilon^{(n)}$ as $\{1, 2, \dots, 2^{n(H+\varepsilon)}\}$, so we need at most $n(H+\varepsilon)$ bits.
Source/Channel Coding Theorem
. . . proof continued. • For the converse, assume $P_e^{(n)} \to 0$. We get the following derivation:
$$H(V) \le \frac{H(V_1, V_2, \dots, V_n)}{n} = \frac{H(V_{1:n})}{n}$$
$$= \frac{1}{n} H(V_{1:n} \mid \hat V_{1:n}) + \frac{1}{n} I(V_{1:n}; \hat V_{1:n})$$
$$\le \frac{1}{n}\big(1 + P_e^{(n)} \log |\mathcal{V}^n|\big) + \frac{1}{n} I(V_{1:n}; \hat V_{1:n}) \qquad \text{// by Fano}$$
$$\le \frac{1}{n}\big(1 + P_e^{(n)} \log |\mathcal{V}^n|\big) + \frac{1}{n} I(X_{1:n}; Y_{1:n}) \qquad \text{// } V \to X \to Y \to \hat V$$
$$\le \frac{1}{n} + P_e^{(n)} \log |\mathcal{V}| + C \qquad \text{// memoryless channel}$$
• Letting $n \to \infty$, both $1/n \to 0$ and $P_e^{(n)} \to 0$, which leaves $H(V) \le C$.
9 Coding
9.1 Introduction to coding
Coding and codes
• Shannon’s theorem says that there exists a sequence of codes such that if
R < C the error goes to zero.
• It doesn’t provide such a code, nor does it offer much insight on how to
find one.
• Typical-set coding is not practical. Why? It requires exponentially large blocks (and codebooks).
• In all cases, we add enough redundancy to a message so that the original message can be decoded unambiguously.
Repetition Code
• For many channels (e.g., BSC(p < 1/2)), the error goes to zero as the number of repetitions K → ∞.
• Easy decoding: when K is odd, take a majority vote (which is optimal for a BSC); a small simulation sketch follows after this example.
• On the other hand, R = 1/K → 0 as K → ∞.
• A transmission with the K = 3 repetition code over a BSC (s = source bits, t = transmitted blocks, n = noise, r = received blocks, ŝ = majority-vote decoding):

s                     0     0     1     0     1     1     0
t                   000   000   111   000   111   111   000
n                   000   001   000   000   101   000   000
r                   000   001   111   000   010   111   000
ŝ                     0     0     1     0     0     1     0
corrected errors            *
undetected errors                             *
• Thus, the K = 3 repetition code can correct only one bit error per block, not two.
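A minimal Monte Carlo sketch (my own, not from the notes) of the K-repetition code over a BSC(p): it estimates the per-bit error rate after majority-vote decoding for a few odd K, showing the error falling while the rate 1/K shrinks.

import numpy as np

rng = np.random.default_rng(0)

def repetition_error_rate(p, K, n_bits=200_000):
    """Empirical bit error rate of the K-repetition code over a BSC(p)."""
    bits = rng.integers(0, 2, size=n_bits)
    # Each bit is sent K times; flip each copy independently with probability p.
    flips = rng.random((n_bits, K)) < p
    received = (bits[:, None] + flips) % 2
    decoded = (received.sum(axis=1) > K // 2).astype(int)   # majority vote
    return np.mean(decoded != bits)

p = 0.1
for K in (1, 3, 5, 7, 9):
    print(f"K = {K}   rate = 1/{K}   bit error rate ≈ {repetition_error_rate(p, K):.5f}")
# Analytically, for K = 3 the error rate is 3 p^2 (1-p) + p^3 = 0.028 when p = 0.1.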
Parity Check Code
• The nth bit is an indicator of an odd number of 1 bits among the first n − 1 bits.
• I.e., $x_n \leftarrow \big(\sum_{i=1}^{n-1} x_i\big) \bmod 2$.
• Thus a necessary condition for a valid codeword is $\big(\sum_{i=1}^{n} x_i\big) \bmod 2 = 0$.
• Any instance of an odd number of errors (bit flips) will violate this condition (although an even number of errors will still pass it).
• Far from perfect: it cannot correct errors, and it only detects some kinds of errors (an odd number of flips).
• On the other hand, parity checks form the basis for many sophisticated
coding schemes (e.g., low-density parity check (LDPC) codes, Hamming
codes etc.).
• We study Hamming codes next.
9.2 Hamming Codes
• For the (7,4) Hamming code, the information bits are $(x_0, x_1, x_2, x_3)$ and the parity bits are
$$x_4 \equiv x_1 + x_2 + x_3 \pmod 2$$
$$x_5 \equiv x_0 + x_2 + x_3 \pmod 2$$
$$x_6 \equiv x_0 + x_1 + x_3 \pmod 2$$
• We can also describe this using linear equalities as follows (all mod 2):
$$x_1 + x_2 + x_3 + x_4 = 0$$
$$x_0 + x_2 + x_3 + x_5 = 0$$
$$x_0 + x_1 + x_3 + x_6 = 0$$
• Why is the minimum distance 3? The columns of H are all different and nonzero, so no single column is zero (no weight-1 codeword) and the sum of any two columns is nonzero (no weight-2 codeword).
• The minimum weight is exactly 3, since the sum of some pair of columns equals a third column, and adding that third column gives zero; those three positions form a weight-3 codeword.
Hamming Codes: Distance
• Now a BSC(p) (crossover probability p) will change some of the bits (noise), meaning a 0 might change to a 1 and vice versa.
• So if $x = (x_0, x_1, \dots, x_6)$ is transmitted, what is received is $y = x + z = (x_0 + z_0, x_1 + z_1, \dots, x_6 + z_6)$, where $z = (z_0, z_1, \dots, z_6)$ is the noise vector.
• The receiver knows y but wants to know x. We then compute the syndrome
$$s = Hy = H(x + z) = \underbrace{Hx}_{=0} + Hz = Hz.$$
• 16 candidate solutions is better than 128 (all possible z vectors), but that is still many.
• What is the probability of each solution? Since we are assuming a BSC(p) with p < 1/2, the most probable solution has the least weight: a solution of weight k has probability $p^k (1-p)^{7-k}$, which decreases with k.
• Notice that there is only one possible solution with weight 1, and this is the most probable solution.
• In the previous example, the most probable solution is $z^T = (0100000)$, and from $y = x + z$ with $y^T = 0111001$ this leads to the codeword $x = 0011001$ and information bits 0011.
• In fact, for any s there is a unique minimum-weight solution for z in s = Hz, and this weight is at most 1 (see the decoding sketch after this list).
• If s = (000) then the unique solution is z = (0000000).
• For any other s, s must equal one of the columns of H, so we can produce s by turning on the corresponding bit of z (giving a weight-1 solution).
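Here is a compact Python sketch of the encoder and the one-bit-flip syndrome decoder just described, using the parity checks $x_4 = x_1+x_2+x_3$, $x_5 = x_0+x_2+x_3$, $x_6 = x_0+x_1+x_3$ from above (the variable names and layout are my own).

import numpy as np

# Parity-check matrix H for the (7,4) Hamming code defined above:
# rows check x4 = x1+x2+x3, x5 = x0+x2+x3, x6 = x0+x1+x3 (all mod 2).
H = np.array([[0, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

def encode(info_bits):
    """Append the three parity bits to the information bits (x0, x1, x2, x3)."""
    x0, x1, x2, x3 = info_bits
    return np.array([x0, x1, x2, x3,
                     (x1 + x2 + x3) % 2,
                     (x0 + x2 + x3) % 2,
                     (x0 + x1 + x3) % 2])

def decode(y):
    """Syndrome decoding: flip the single bit whose column of H equals s."""
    s = H @ y % 2
    x = y.copy()
    if s.any():                                   # nonzero syndrome: assume one error
        j = np.where((H.T == s).all(axis=1))[0][0]
        x[j] ^= 1
    return x

# The example from the text: y = 0111001 decodes to x = 0011001, info bits 0011.
y = np.array([0, 1, 1, 1, 0, 0, 1])
x_hat = decode(y)
print(x_hat, x_hat[:4])          # [0 0 1 1 0 0 1] [0 0 1 1]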
• Here, the first four bits to be sent ($x_0, x_1, x_2, x_3$) are set as desired and the parity bits ($x_4, x_5, x_6$) are computed from them. The figure shows $(x_0, x_1, \dots, x_6) = (1, 0, 0, 0, 1, 0, 1)$ with parity check bits
$$x_4 \equiv x_0 + x_1 + x_2 \pmod 2$$
$$x_5 \equiv x_1 + x_2 + x_3 \pmod 2$$
$$x_6 \equiv x_0 + x_2 + x_3 \pmod 2$$
(note that the figure uses a different, but equivalent, assignment of the parity checks than the one above).
• The syndrome indicates which parity conditions are not satisfied.
• Above we argued that for s 6= (0, 0, 0) there is always a one bit flip that
will satisfy all parity conditions.
Hamming Decoding: Venn Diagrams
• Example: And here, there are two errors, y6 and y2 (marked with a *).
• Flipping y1 will achieve parity, but this will lead to three errors (i.e., we
will switch to a wrong codeword, and since codewords have minimum
Hamming distance of 3, we’ll get 3 bit errors).
Coding
• Convolutional codes
• Turbo codes (two convolutional codes with permutation network)
• Low Density Parity Check (LDPC) codes.
• All developed on the journey to find practical codes that approach the rates promised by Shannon's theorem.
References
[1] Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, 2nd edition, Wiley, 2006.
[2] David J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.
[3] Robert M. Gray, Entropy and Information Theory, Springer, 2009.
[4] D. A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE, 40: 1098-1101, 1952.
[5] C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal, 1948.
[6] R. W. Hamming, Error detecting and error correcting codes, Bell System Technical Journal, 29: 147-160, 1950.