Chapter 6
RELIABLE MESSAGES THROUGH UNRELIABLE CHANNELS
6-1. Introduction
In this chapter we shall prove Shannon's second theorem, the most surprising as well as the most important single result of information theory. Because of the significance of this theorem, it would be well to stand back and survey the main results we have already obtained. We have been able to justify our use of entropy and entropy-derived measures of information in two instances: Shannon's first theorem (Section 4-3) and the generalization of this theorem dealing with equivocation (Section 5-5). Shannon's first theorem
This channel has three inputs a_1, a_2, a_3 and three outputs b_1, b_2, b_3. When a particular channel output is received, which input should we say was sent? This question prompts the following definition.

Definition. Consider a channel with an r-symbol input alphabet A = {a_i}, i = 1, 2, . . . , r, and an s-symbol output alphabet B = {b_j}, j = 1, 2, . . . , s. A decision rule, d(b_j), is any function specifying a unique input symbol for each output symbol.
Example 6-1. Two possible decision rules for the channel of (6-2) are

    d(b_1) = a_1
    d(b_2) = a_1          (6-3)
    d(b_3) = a_2

and

    d(b_1) = a_1
    d(b_2) = a_2          (6-4)
    d(b_3) = a_3
For a channel with r inputs and s outputs, there are r^s different possible decision rules. The question which prompted our definition of a decision rule may, therefore, be rephrased as "Which one of these r^s decision rules should we use?" The answer to this question will depend upon what we are trying to accomplish, but a reasonable goal for our purposes is the minimization of the error probability of the channel. Hence, we seek a decision rule which minimizes the error probability of our channel. To find such a rule, we calculate the probability of error P_E. This probability may be written as the average of P(E/b_j), the conditional probability of error given that the output of the channel is b_j,
    P_E = Σ_B P(E/b) P(b)          (6-5)
Hence, when the a priori probabilities are all equal, the maximum-likelihood decision rule may be written

    d(b_j) = a*          (6-9a)

where

    P(b_j/a*) ≥ P(b_j/a_i)   for all i          (6-9b)
Note that this rule is not unique. There are, in fact, three maximum-likelihood decision rules for this channel.
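To make the rule concrete, here is a minimal sketch (not from the text) that computes the maximum-likelihood rule of (6-9) directly from a channel matrix; the matrix used is the one of (6-2), as it reappears in (6-16). The tie in column b_2 is what produces the three maximum-likelihood rules just noted.

```python
# Maximum-likelihood decision rule (6-9): for each output b_j, choose
# an input a_i that maximizes P(b_j / a_i).  Rows of P are inputs
# a_1, a_2, a_3; columns are outputs b_1, b_2, b_3 (matrix (6-2)).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.3, 0.5],
     [0.3, 0.3, 0.4]]

r, s = len(P), len(P[0])
for j in range(s):
    column = [P[i][j] for i in range(r)]
    best = max(column)
    # every maximizer is an admissible choice for d(b_j);
    # more than one maximizer means the ML rule is not unique
    choices = [i + 1 for i in range(r) if column[i] == best]
    print(f"d(b_{j + 1}) = a_i, i in {choices}")
```

Column b_2 is (0.3, 0.3, 0.3), so d(b_2) may be any of a_1, a_2, a_3, giving the three rules.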
    P_E = Σ_B P(b) − Σ_B P[d(b)/b] P(b)
        = 1 − Σ_B P[d(b), b]          (6-10)

The terms in the summation of (6-10) are just the joint probabilities that d(b_j) = a* is transmitted and b_j is received (for each j). Hence, defining P̄_E = 1 − P_E, we may rewrite (6-10) as

    P̄_E = Σ_B P(a*, b)          (6-11)
Since

    Σ_{A,B} P(a, b) = 1          (6-12)

we may also rewrite (6-10) as

    P_E = Σ_{B,A-a*} P(a, b)          (6-13)

where the summation runs over all output symbols b and over all members of the A alphabet except d(b_j) = a*. An alternative way of writing (6-13) is

    P_E = Σ_{B,A-a*} P(b/a) P(a)          (6-14)
If the a priori probabilities P(a) are all equal, then (6-14) becomes

    P_E = (1/r) Σ_{B,A-a*} P(b/a)          (6-15)
Equation (6-15) is of some interest since (for the special case of equal a priori probabilities) it provides an expression for the channel error probability in terms of a sum over the elements of the channel matrix P(b/a). The sum is over all elements of the channel matrix, except that one term [corresponding to d(b_j)] is omitted from each column.
Example 6-3. We calculate the error probability of the channel used in Examples 6-1 and 6-2.

    [0.5  0.3  0.2]
    [0.2  0.3  0.5]          (6-16)
    [0.3  0.3  0.4]

We assume that all three input symbols are chosen with equal probabilities and that a maximum-likelihood decision rule is used. (Recall that this rule results in a minimum P_E for equal a priori probabilities.)

    P_E = (1/3)[(0.2 + 0.3) + (0.3 + 0.3) + (0.2 + 0.4)]
        = 0.567
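As a check on the arithmetic (a sketch, using the matrix of (6-16)), the computation of (6-15) amounts to the total of all matrix entries minus the column maxima, divided by r:

```python
# P_E from (6-15): average, over the r equiprobable inputs, of all
# channel-matrix entries except the maximum-likelihood choice in
# each column (matrix (6-16)).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.3, 0.5],
     [0.3, 0.3, 0.4]]
r, s = len(P), len(P[0])

total = sum(P[i][j] for i in range(r) for j in range(s))
kept  = sum(max(P[i][j] for i in range(r)) for j in range(s))
print(round((total - kept) / r, 3))   # 0.567, as in Example 6-3
```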
The derivation of the Fano bound begins with the equivocation

    H(A/B) = Σ_{A,B} P(a, b) log [1/P(a/b)]
Now we employ (2-2) to change the base of the logarithms in the right side of (6-19),

    (log e)^{-1} [H(A/B) − H(P_E) − P_E log (r − 1)]

and use the inequality ln x ≤ x − 1 on each term in the summations. The right side of (6-20) is then less than or equal to
which of the remaining r − 1 input symbols was sent with at most log (r − 1) bits. The above interpretation of (6-23), unfortunately, is not adequate as a proof, although it may be used as the basis of a proof somewhat different from the one we have used.

Let us examine the condition for equality in the Fano bound (6-23). The inequality

    ln x ≤ x − 1          (6-21)

becomes an equality if and only if x = 1. Using this condition in (6-23), we find that the Fano bound becomes an equality if and only if

    P(a/b) = P_E / (r − 1)   for all b and all a ≠ a*          (6-24a)

and

    P(a*/b) = P̄_E   for all b          (6-24b)

Since

    Σ_A P(a/b) = 1   for all b

the second condition, (6-24b), follows from the first, (6-24a). Equation (6-24a) implies, for each b, that all input symbols except the one selected by our decision rule are equally probable. This condition thus serves to reinforce our interpretation of the Fano bound.
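Since the derivation above survives only in fragments, it may help to restate the bound it concerns; in this chapter's notation, with H(P_E) the entropy of the binary distribution {P_E, 1 − P_E}, the Fano bound of (6-23) reads:

```latex
% Fano bound (6-23): equivocation versus error probability
H(A/B) \;\le\; H(P_E) + P_E \log (r - 1)
% Equality holds if and only if, for every output b,
%   P(a/b)   = P_E / (r - 1)   for all a \ne a^*        (6-24a)
%   P(a^*/b) = \bar{P}_E = 1 - P_E                      (6-24b)
```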
6-4. Reliable Messages and Unreliable Channels
The object of Shannon's second theorem is to describe the fundamental limitations on the transmission of reliable, error-free messages through an unreliable channel. Consider first the use of a BSC to transmit reliable messages (Figure 6-1).

    A = {0, 1} → BSC → {0, 1} = B

    Figure 6-1. A BSC.
To be even more specific, we assume that p, the probability of error of this BSC, is 0.01. Thus, 99 per cent of the binits transmitted are received correctly. For many modern data-transmission systems, however, this level of reliability is far from adequate.
Probability-of-error requirements of 10^-6, or even lower, are often necessary. In order to increase the reliability of our channel we may have to repeat our message several times. For example, suppose we decide to send each message (0 or 1) three times. One method of viewing this procedure is illustrated in Figure 6-2.
    Figure 6-2. [Messages 000 and 111 sent through the (BSC)^3; all eight three-binit sequences are possible outputs, the remaining six being unused signals.]

Each message is sent through the (BSC)^3 as a binary sequence of length 3. The probability that no errors occur in the transmission of our three binits is

    (1 − p)^3 = p̄^3
    Figure 6-3. A method of increasing reliability. [Messages 00000 and 11111 sent through the (BSC)^5; all 32 five-binit sequences are possible outputs.]
The probability of zero, one, two, three, four, or five binit errors in transmission is

    p̄^5,  5p̄^4 p,  10p̄^3 p^2,  10p̄^2 p^3,  5p̄ p^4,  p^5
† The probability of message error will ordinarily depend upon the a priori message probabilities. Because of the symmetry in the situation we are describing, however, the error probability is independent of the a priori probabilities.
respectively. If we again use a majority rule (i.e., maximum likelihood) to decide whether 00000 or 11111 was transmitted, we obtain a probability of error

    P_E = p^5 + 5p^4 p̄ + 10p^3 p̄^2          (6-27)

(i.e., the sum of the probabilities of five, four, or three binit errors). For p = 0.01, this yields

    P_E ≈ 10^-5          (6-28)
There is, of course, no limit to this crude method of increasing reliability. In Table 6-1 we give the probability of message error when 1, 3, 5, 7, 9, and 11 binits per message are used in a BSC with single-binit probability of error of p = 0.01.
Table 6-1. Probability of message error in a BSC

    Binits per binary message    Probability of message error
    1                            10^-2
    3                            3 × 10^-4
    5                            10^-5
    7                            4 × 10^-7
    9                            10^-8
    11                           5 × 10^-10
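The entries of Table 6-1 follow from the binomial distribution of binit errors under majority-rule decoding. A short computation (a sketch; exact binomial tail sums at p = 0.01, which agree with the table to within its rounding):

```python
from math import comb

p = 0.01                       # single-binit error probability
for n in (1, 3, 5, 7, 9, 11):
    # majority decoding fails when more than n/2 binits are in error
    P_E = sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
              for k in range(n // 2 + 1, n + 1))
    print(n, f"{P_E:.1e}")
# n = 5 gives about 1e-05, agreeing with (6-28).
```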
the most obvious method of exchanging rate for reliability. Do there exist more sophisticated and more efficient methods of making this exchange? That is, for a given value of probability of message error in Figure 6-4, do there exist coding methods which give us a greater message rate than that achieved by simple repetition? The answer to this question is an emphatic "Yes!"
    Figure 6-5. Two possible exchanges of rate for reliability in a BSC. [Probability of message error versus message rate (binary messages/binit), with the limit given by Shannon's theorem marked.]
wished to transmit. As indicated in Figures 6-2 and 6-3, this can be viewed as increasing the order of the channel extension we use and selecting only two of the possible extension input symbols, α_i, as messages. A more powerful method of varying the message rate, the method we shall employ in proving Shannon's second theorem, is to fix the order of the extension we use but to vary the number of channel input symbols, α_i, used as messages. Again we illustrate this for the case of the BSC in Figure 6-6.

Suppose we can transmit binary symbols through a BSC at the rate of one per second. Then the (BSC)^3, consisting of sequences of
while the message rate is 1/3 binit per second. If, on the other hand, we use all eight of the α_i as messages, the probability that a message

    Figure 6-6. Third extension of the BSC. [All eight three-binit sequences 000 through 111 serve as both inputs and outputs of the (BSC)^3.]
    Figure 6-7. A maximum-likelihood decision rule. [Received sequences are decoded in pairs: 000, 001 → 000; 010, 011 → 011; 100, 101 → 101; 110, 111 → 110.]

    [Table 6-2 (code words of three codes): No. of messages M = 8, 4, 2.]
The code words of the three codes given in Table 6-2 may be shown as vertices of three-dimensional cubes. Then the Hamming distance between any two code points can be seen to be the number of steps we must take to go from one point to the other. Note that the minimum distances between code points in codes A, B, and C are 1, 2, and 3, respectively.

The minimum distance between code points of a code is closely related to the probability of error of the code. In general, the greater the minimum distance, the lower the error probability we can expect when using the code. Of course, the greater the minimum distance, the fewer code points we can pack onto the vertices of an n-dimensional cube. This is but another expression of an effect noted in the previous section. The advantages of being able to represent a large number of messages with a code must be weighed
    Figure 6-8. Three codes for the (BSC)^3. [Code words shown as vertices of the unit cube, 000 through 111, for codes A, B, and C.]
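Minimum distance is easy to compute directly. The sketch below assumes code-word lists for the three codes of Table 6-2 (the table itself is not reproduced here): all eight triples for code A, the four even-weight triples for code B, and the repetition pair for code C; these choices are consistent with M = 8, 4, 2 and minimum distances 1, 2, 3.

```python
from itertools import combinations, product

def hamming(u, v):
    # number of coordinates in which two equal-length words differ
    return sum(a != b for a, b in zip(u, v))

# Assumed code-word lists (hypothetical reconstruction of Table 6-2).
code_A = [''.join(bits) for bits in product('01', repeat=3)]  # M = 8
code_B = ['000', '011', '101', '110']                         # M = 4
code_C = ['000', '111']                                       # M = 2

for name, code in (('A', code_A), ('B', code_B), ('C', code_C)):
    d_min = min(hamming(u, v) for u, v in combinations(code, 2))
    print(name, d_min)   # prints 1, 2, 3 respectively
```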
If α_i is transmitted and the received sequence β_j differs from it in D places, the probability of receiving β_j is just the probability that errors will occur in the D places where they differ and that no errors will occur in the n − D remaining places:

    P(β_j/α_i) = p^D p̄^{n−D}          (6-36)

For p < ½ (the only case of any interest), P(β_j/α_i) decreases with increasing D. The farther β_j is from the transmitted binary sequence, the less likely it is to be received. The maximum-likelihood decision rule selects that code word which maximizes P(β_j/α_i); hence, for any received sequence β_j, the maximum-likelihood rule selects the code word closest to β_j in the Hamming-distance sense.
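Equivalently, maximum-likelihood decoding of the (BSC)^n is minimum-Hamming-distance decoding, as a short sketch makes plain (the code words here are hypothetical):

```python
def ml_decode(received, code_words):
    # For p < 1/2, maximizing P(beta_j / alpha_i) = p^D * (1 - p)^(n - D)
    # in (6-36) is the same as minimizing the Hamming distance D.
    return min(code_words,
               key=lambda w: sum(a != b for a, b in zip(w, received)))

print(ml_decode('101', ['000', '111']))   # -> '111' (distance 1, not 2)
```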
When one of these code words, say α_0, is sent through the channel, some other binary sequence of length n, say β_j, is received (Figure 6-11).

    α_i → (BSC)^n → β_j

    Figure 6-11.
We wish to show that we can select a number of code words (messages) from the set of 2^n possible inputs to the (BSC)^n such that the probability of error in decoding the channel output can be made as small as we wish.

In Figure 6-10 we have indicated the 2^n possible inputs and 2^n outputs of the nth extension of a BSC with probability of error p.
    Figure 6-10. The nth extension of a BSC. [Inputs 00···00 through 11···11; outputs 00···00 through 11···11.]
The inputs and outputs of this channel consist of sequences of n binary digits. In order to send M messages through this channel, we select M of the 2^n possible inputs as code words. In Section 6-5 we indicated how the probability of message error P_E increases as M increases. The question we must now answer is, "How many messages is it possible to send and still have the message error probability remain small?"

Of course, the answer to this question must depend upon how we select our input symbols for the messages. If we select the code words so that they cluster together, we can expect a higher probability of error than if we construct a code with the same number of code words more or less equally spaced from one another. The method of coding will be crucial to the probability of error we obtain and, therefore, to the maximum number of messages we may use. Let us defer this important question, however, and assume that somehow we have selected a code consisting of M code words of n binits each for use with the (BSC)^n.
none or because there are more than one), we just throw up our hands and make an error! Now the reader may object that, in the latter circumstance, we are giving up too easily. The reader is correct in this objection. Nevertheless, we shall still be able to show that this procedure leads to a negligible probability of error.
Using the procedure just described, there are two ways in which an error in decoding a received symbol may occur. Let S(np_ε) indicate a sphere of radius np_ε about the received symbol (Figure 6-12). The first way is if α_0, the transmitted code word, is not in the sphere.
† The coding procedure described may result in a singular code, that is, we may select the same slip of paper more than once, and therefore use the same code word for more than one message. For M << 2^n, such an occurrence is unlikely but possible. For M > 2^n, such an occurrence is inevitable.
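The slip-of-paper selection described in the footnote amounts to sampling code words with replacement; a toy sketch (names and parameters are illustrative only):

```python
import random

def random_code(M, n, seed=None):
    # draw M code words independently and uniformly from the 2^n
    # binary sequences of length n (sampling with replacement)
    rng = random.Random(seed)
    return [format(rng.randrange(2 ** n), f'0{n}b') for _ in range(M)]

code = random_code(M=4, n=7, seed=1)
print(code)
print("singular:", len(set(code)) < len(code))  # True if a word repeats
```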
We can bound this sum with an often used inequality of information theory (Peterson, 1961, p. 246; Wozencraft and Reiffen, 1961, p. 71):

    Σ_{i=0}^{np_ε} (n choose i) ≤ 2^{n H(p_ε)}   for p_ε < ½          (6-46)
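A numerical spot check of (6-46), with H the binary entropy function, is straightforward (a sketch; n and p_ε are arbitrary choices):

```python
from math import comb, log2

def H(x):
    # binary entropy function, in bits
    return -x * log2(x) - (1 - x) * log2(1 - x)

n, p_eps = 100, 0.3                     # any p_eps < 1/2
lhs = sum(comb(n, i) for i in range(int(n * p_eps) + 1))
rhs = 2 ** (n * H(p_eps))
print(lhs <= rhs)                       # True: tail sum <= 2^(n H(p_eps))
```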
6-9. Shannon's Second Theorem - Discussion

The theorem proved in the last two sections is admittedly of a very special nature. The channel considered is the simplest nontrivial channel, the BSC. Nevertheless, all the important ideas necessary for the proof of the more general theorem and all the important consequences which follow from the more general
theorem are illustrated in these sections. We use the present section to discuss these ideas and implications before proving the general theorem in Section 6-10.

The first idea we discuss is random coding, introduced by Shannon. It is necessary to understand what is meant by such a coding procedure if we are to appreciate the limitations of the theorem. Because the code words were selected at random, we could use Equation (6-47) to bound the probability that an arbitrary code word would be in the sphere of radius np_ε surrounding β_j. If we had fixed the code words in some deterministic fashion, we would not be able to speak of the probability that a code word is within a distance np_ε of the received sequence β_j in this way. Looking at this matter even more closely, it is possible to view the coding procedure we have described as really no coding procedure at all.
As a practical method of coding, or of finding a good set of code words, random coding leaves much to be desired. If we use this procedure, on the average we can make the probability of error as small as we wish. The average referred to here, unfortunately, is the average over the ensemble of possible codes. Thus we are not assured, once we arrive at some fixed code, that we have a good code. As an extreme example, note that it is possible to obtain the highly singular code where all M messages are mapped into the same code word.

Shannon's second theorem can, therefore, be characterized as a little more than an existence proof, but a little less than a constructive proof. The theorem does not exactly show us how to construct a good code, and so we do not quite have a method of finding codes.
the converse will be sufficient for our purposes. We note, however, that several stronger forms may be proved. Wolfowitz (1959) has shown that if we choose M = 2^{n(C+ε)} code words (where C is the channel capacity and ε > 0), then the probability of error of even the best code approaches 1 as n increases!
The coding theorem states that we can make the probability of misinterpreting a word sent through a noisy channel to represent one of our messages as small as we wish, while keeping the message rate of the channel fixed. The important point of the previous sentence is that the theorem deals with the probability of error of messages, or code words. For example, in the case of the BSC, the theorem says that the probability of misinterpreting a sequence of n zeros and ones is arbitrarily small. This is much stronger than merely saying that the probability of misinterpreting a single binit is arbitrarily small. This distinction has led to misunderstandings in interpreting the results of the various forms of the converse of Shannon's second theorem. In terms of the BSC treated in the previous section, the converse states that if the number of equiprobable messages M is greater than 2^{nC} (where again C is the capacity of the BSC), the probability of error in a word approaches 1 as n increases. This conclusion will hold for any set of code words (not just on the average with respect to an ensemble of codes) and for any decision rule. The theorem is of great mathematical interest, but its relevance to the communication problem is often overemphasized. The theorem does not state that effective communication is impossible if M > 2^{nC}.
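In symbols, the contrast drawn in this paragraph (restating the text, not adding to it) is:

```latex
% Coding theorem: for fixed message rate R < C, codes of length n exist
% whose word-error probability satisfies
P_E \longrightarrow 0 \quad (n \to \infty), \qquad M = 2^{nR},\; R < C.
% Converse: for M > 2^{nC} equiprobable messages, every code and every
% decision rule give word-error probability
P_E \longrightarrow 1 \quad (n \to \infty).
```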
To clarify this point let us plot the probability of a binit error in a BSC versus the message rate when the binits are chosen to be 0 or 1 with equal probability. For any message rate R, in binary messages per binit, less than C, the capacity of the channel, we know that the probability of a binit error can be made as small as we wish. For a message rate R greater than C, think of the following procedure. We again employ the nth extension of the BSC to transmit messages and let n increase. To achieve a message rate of R binary messages per binit, we must have 2^{nR} messages to send through the nth extension.
    Figure 6-14. Binit error rate in the BSC as a function of message rate R (binary messages/binit).
More precise results, however, have been obtained, detailing exactly how fast the probability of error can be made to approach 0 as n, the order of the extension used, is increased. We do not treat this subject other than to remark that many authors have been able to obtain exponential (or almost exponential) behavior of the probability of error with n. Some of these results are referenced in the notes at the end of this chapter.