Convolutional Coding and Viterbi Decoding
Convolutional Coding and Viterbi Decoding
Convolutional Coding and Viterbi Decoding
Waveform
Former
?
Baseband
Front-End
R(t)
6
6
Encoder
?
Decoder
6
c
1
, . . . , c
n
y
1
, . . . , y
n
d
1
, . . . , d
k
d
1
, . . . ,
d
k
c
j
-
-
y
j
6
Z N(0,
N
0
2
)
Figure 6.1: System view for the current chapter.
6.2 The Encoder
The encoder is the device that takes the message and produces the discrete-time channel
input. In this chapter the message consists of a sequence b
1
, b
2
, . . . , b
k
of binary source
symbols. To simplify the descriptoin of the encoder, we let the source symbols take value
in {1}.
For comparison with bit-by-bit on a pulse train, we let the channel symbols take value
in {
E
s
}. In describing the encoder output we do not want to carrying around the
factor
E
s
. Hence we declare the encoder output to be the sequence x
1
, x
2
, . . . , x
n
where
x
j
=
c
i
Es
{1}, j = 1, . . . , n.
The source symbols enter the encoder sequentially, at regular intervals determined by the
encoder clock. During the j th epoch, j = 1, 2, . . . , the encoder takes b
j
and produces
two output symbols, x
2j1
and x
2j
, according to the encoding map
x
2j1
= b
j
b
j2
x
2j
= b
j
b
j1
b
j2
.
6.2. The Encoder 197
To produce x
1
and x
2
the encoder needs b
1
and b
0
, which are assumed to be 1 by
default.
The circuit that implements the convolutional encoder is depicted in Figure 6.2, where
denotes multiplication in R or, equivalently, addition in the binary eld with ele-
ments {1} (see Problem 5). A shift register stores the past two inputs. As implied by
the indices, the outputs are serialized.
b
j1
b
j2
b
j
x
2j
= b
j
b
j1
b
j2
x
2j1
= b
j
b
j2
Figure 6.2: Convolutional encoder.
Notice that the encoder output has length n = 2k . The following is an example of
a source sequence of length k = 5 and the corresponding encoder output sequence of
length n = 10.
b
j
1 1 1 1 1
x
2j1
, x
2j
1, 1 1, 1 1, 1 1, 1 1, 1
j 1 2 3 4 5
Because the n = 2k encoder output symbols are determined by the k input bits, only 2
k
of the 2
n
seqeunces of {
E
s
}
n
are codewords. Hence we use only a fraction 2
k
/2
n
= 2
k
of all possible n-length channel input sequences. From a high level point of view, we give
up a factor two in the bit rate to make the signal space much less crowded, hoping that
this will signicantly reduce the probability of error.
We have already seen two ways to describe the encoder (the encoding map and the encoding
circuit). A third way, useful in determining the error probability, is the state diagram of
Figure 6.3. The diagram describes a nite state machine. The state of the convolutional
encoder is what the encoder needs to know about past inputs so that the state and the
current input determine the current output. For the convolutional encoder of Figure 6.2
the state at time j can be dened to be (b
j1
, b
j2
) . Hence we have 4 states.
As the diagram shows, there are two possible transitions from each state. The input
symbol b
j
decides which of the two transitions is taken during epoch j . Transitions are
labeled by b
j
|x
2j1
, x
2j
. To be consistent with the default b
1
= b
0
= 1, the state is (1, 1)
when b
1
enters the encoder.
198 Chapter 6.
b
l
r
t
1| 1, 1
1|1, 1
1|1, 1
1|1, 1
1|1, 1
1| 1, 1
1| 1, 1
1| 1, 1
State Labels
t = (1, 1)
l = (1, 1)
r = (1, 1)
b = (1, 1)
Figure 6.3: State diagram description of the convolutional encoder.
The choice of letting the encoder input and output symbols be the elements of {1}
is not standard. Most authors chose the input/output alphabet to be {0, 1} and use
addition modulo 2 instead of multiplication. Our choice is better suited for the AWGN
channel, whereas the conventional viewpoint has the advantage of making it evident that
the code is linear. As it is linear and time-invariant, the output is the result of convolving
the input with an imulse response (even though the coding literature does not talk about
imulse response). In Exercise 6 we establish the link between the two viewpoints and in
Exercise 5 we prove from the rst principles that the encoder is indeed linear.
In each epoch, the convolutional encoder we have chosen has k
0
= 1 symbol entering and
n
0
= 2 symbols exiting the encoder. In general, a convolutional encoder is specied by
(i) the number k
0
of source symbols entering the encoder in each epoch; (ii) the number
n
0
of symbols produced by the encoder in each epoch, where n
0
> k
0
; (iii) the constraint
length m
0
dened as the number of input k
0
-tuples used to determine an output n
0
-
tuple; and (iv) the encoding function, specied for instance by a k
0
m
0
matrix of 1s and
0s for each position of the output n
0
-tuple. In our example, k
0
= 1, n
0
= 2, m
0
= 3,
and the encoding function is specied by [1, 0, 1] and [0, 1, 1] (compare to the connections
that determine the top and bottom output in Figure 6.2). In our case, the elements of
the output n
0
tuple are serialized into a single sequence that we consider to be the actual
encoder output, but there are other possibilities. For instance, we could take the pair
x
2j1
, x
2j
and map it into an element of a 4-ary channel input symbol.
6.3. The Decoder 199
6.3 The Decoder
A maximum likelihood (ML) decoder for the AWGN channel decides for (one of) the
encoder output sequence x that maximizes
hy, ci
kck
2
2
,
where y is the decoder input sequence and c =
E
s
x. The last term in the above
expression is irrelevant as it is
nEs
2
, thus the same for all c. Furthermore, nding an x
that maximizes hy, ci = hy,
E
s
xi is the same as nding one that maximizes hy, xi .
In the above paragraph, we have introduced a slight abuse of notation. Up to this point
the inner product and the norm have been dened for vectors of C
n
written in column
form, with n being an arbitrary but xed positive integer. Considering n-tuples in colon
form is a standard mathematical practice when matrix operations are involved. (We have
used matrix notation to express the density of Gaussian random vectors.) In coding
theory, people nd it more useful to write n-tuples in row form because it saves space
when specic examples are give. For instance, it would be impractical to label trellis
branches with column vectors. Furthermore, the length k of the information sequence
and the length n of the encoder output sequence can vary from message to message. For
these reasons, we prefer referring to them as sequences rather than as k and n-tuples,
respectively. The abuse of notation we refer to is that we take inner products and norms
of sequences and their length varies as needed.
To nd an x that maximizes hx, yi , we could in principle compute hx, yi for all 2
k
sequences that can be produced by the encoder. This brute-force approach would be
quite unpractical. For instance, if k = 100 (which is a relatively modest value for k ),
2
k
= (2
10
)
10
which is approximately (10
3
)
10
= 10
30
. Using this approximation, a VLSI
chip that makes 10
9
inner products per second takes 10
21
seconds to check all possibilities.
This is roughly 4 10
13
years. The universe is only roughly 2 10
10
years old!
What we need is a method that nds a maximizing x with a number of operations that
grows linearly (as opposed to exponentially) in k . We will see that the so-called Viterbi
algorithm achieves this.
To describe the Viterbi algorithm (VA), we introduce a fourth way of describing a con-
volutional encoder, namely the trellis. The trellis is an unfolded transition diagram that
keeps track of the passage of time. For our example, if we assume that we start at state
(1, 1) , that we encode k = 5 source bits and then feed the encoder with the dummy
bits b
0
= b
1
= 1 to make it go back to the initial state and thus be ready for the next
transmission, we obtain the trellis description shown on the top of Figure 6.4.
There is a one-to-one correspondence between an encoder input sequence b , an encoder
output sequence x, and a path (or state sequence) that starts at the initial state (1, 1)
(left state) and ends at the nal state (1, 1) (right state) of the trellis. Hence we can refer
to a path by means of an input sequence, an output sequence or a sequence of states.
200 Chapter 6.
The trellis on the top of Figure 6.4 has edges labeled by the corresponding encoder output
pair. The encoder input symbol is omitted as it is the rst component of the next state.
To decode using the Viterbi algorithm, we replace the label of each edge with the edge
metric (also called branch metric) computed as follows. The edge with x
2j1
= a and
x
2j
= b , where a, b {1}, is assigned the edge metric ay
2j1
+by
2j
. Now if we add up
all the edge metrics along a path, we obtain the path metric hx, yi for that path.
Example 91. Consider the trellis on the top of Figure 6.4 and let the decoder input
sequence by y = (1, 3), (2, 1), (4, 1), (5, 5), (3, 3), (1, 6), (2, 4) . For convenience,
we chose the components of y to be integers, but in reality they are real-valued. Also
for convenience, we use parentheses to group the components of y into pairs (y
2j1
, y
2j
)
that belong to the same trellis section. The edge metrics are shown on the second trellis
(from the top) of Figure 6.4. Once again, by adding the edge metric along one path, we
obtain the hx, yi for that path.
The problem of nding (one of) the x that maximizes hx, yi is reduced to the problem
of nding the path with the largest path metric. Next we give an example of how to do
this and then we explain why it works.
Example 92. Our starting point is the the second trellis of Figure 6.4, which has been
labeled with the edge metrics. We construct the third trellis in which every state is
labeled with the metric of the surviving path to that state obtained as follows. We use
j = 0, 1, . . . , k + 2 to run over the trellis depth. (The +2 relates to the two dummy
bits.) Depth j = 0 refers to the initial state (leftmost) and depth j = k +2 to the nal
state (rightmost). Let j = 0 and to the single state at depth j assign the metric 0. Let
j = 1 and label each of the two states at depth j with the metric of the only subpath
to that state. (See the third trellis from the top.) Let j = 2 and label the four states at
depth j with the metric of the only subpath to that state. For instance, the label to the
state (1, 1) at depth j = 2 is obtained by adding the metric of the single state and
the single edge that precedes it, namely 1 = 4 + 3. From j = 3 on the situation is
more interesting, because now every state can be reached from two previous states. We
label the state under consideration with the largest of the two subpath metrics to that
state and make sure to remember to which of the two subpaths it corresponds. In the
gure, we make this distinction by dashing the last edge of the other path. (If we were
doing this by hand we would not need a third trellis. Rather we would label the states
on the second trellis and put a cross on the edges that are dashed on our third trellis.)
The subpath with the highest brach metric (the one that has not been dashed) is called
survivor. We continue similarly for j = 4, 5, . . . , k + 2. At depth j = k + 2 there is
only one state and its label maximizes hx, yi over all paths. By tracing back along the
non-dashed path, we nd the maximum likelihood path. From it, we can read out the
corresponding bit sequence. The maximum likelihood path is shown in bold on the fourth
and last trellis of Figure 6.4.
From the above example, it is clear that, starting from the left and working its way to
the right, the Viterbi algorithm visits all states and keeps track of the subpath that has
6.3. The Decoder 201
the largest metric to that state. In particular, the algorithm nds the path between the
initial state and the nal state that has the largest metric.
The complexity of the Viterbi algorithm is linear in the number of trellis sections, i.e.,
in k . Recall that the brute force approach has complexity exponential in k . The saving
of the Viterbi algorithm comes from not having to compute the metric of non-survivors.
When we prune an edge at depth j , we are in fact eliminating 2
kj
possible extension
of that edge. The brute force approach computes the metric of those extensions but not
the Viterbi algorithm.
A formal denition of the VA (one that can be programmed on a computer) and a more
formal argument that it nds the path that maximizes hx, yi is given in Appendix 6.A.
202 Chapter 6.
1
,1
!
1
,
!
1
!
1
,!
1
1,1
!
1
,1
1,!1
!
1
,
1
1
,!
1
1
,1
!
1
,
!
1
!
1
,!
1
1,1
!
1
,1
1,!1
!
1
,
1
1
,!
1
1
,1
!
1
,
!
1
!
1
,!
1
1,1
!
1
,1
1,!1
!
1
,
1
1
,!
1
1,1
!1,1
1,!1
!1,!1
STATE
!
4
4
1
!1
3
2
!2
5
!5
!
7
7
3
!
3
5
!
5
5
1
0
!
1
0
!
1
0
10
0
0
0
!
6
6
6
!6
0
0
0
0
3
!
3
!
3
!
5
0
1,1
!1,1
1,!1
!1,!1
STATE
!
1
,!
1
1,1
!
1
,!
1
1,1
!
1
,
1
1
,!
1
!
1
,
!
1
1,1
!
1
,
!
1
1,1
!
1
,1
!
1
,
1
1
,!
1
31
!4
5
4
3
!7
!1
6
0
10
4
1,1
!1,1
1,!1
!1,!1
STATE
!
4
4
1
!1
3
2
!2
5
!5
!
7
7
3
!
3
5
!
5
5
1
0
!
1
0
!
1
0
10
0
0
0
!
6
6
6
!6
0
0
0
0
3
!
3
!
3
!
5
0
4
4
20
16
20
20
22
10
29
25 31
!4
5
4
3
!7
!1
6
0
10
4
1,1
!1,1
1,!1
!1,!1
STATE
!
4
4
1
!1
3
!2
5
!5
!
7
7
3
!
3
5
!
5
5
1
0
!
1
0
!
1
0
10
0
0
0
!
6
6
6
!6
0
0
0
0
3
!
3
!
3
!
5
0
4
4
20
16
20
20
22
10
29
25
2
Figure 6.4: The Viterbi algorithm. Top gure: Trellis representing the encoder
where edges are labeled with the corresponding output symbols. Second gure:
Edges are re-labeled with the edge metric corresponding to the received sequence
(1, 3), (2, 1), (4, 1), (5, 5), (3, 3), (1, 6), (2, 4) (parentheses have been
inserted to facilitate parsing). Third gure: Each state has been labeled with the
metric of a survivor to that state and non-surviving edges are pruned (dashed).
Fourth gure: Tracing back from the end, we nd the decoded path (bold); it
corresponds to the source sequence 1, 1, 1, 1, 1, 1, 1.
6.4. Bit Error Probability 203
6.4 Bit Error Probability
In this section we derive an upper bound to the bit error probability P
b
. As usual, we x a
signal and we evaluate the error probability conditioned on this signal being transmitted.
If the result depends on the chosen signal (which is not the case here), then we remove
the conditioning by averaging over all signals.
Each signal that can be produced by the transmitter corresponds to a path in the trellis.
The path we condition on is referred to as the reference path. We are free to choose
the reference path and, for notational convenience, we choose the all-one path: it is the
one that corresponds to the information sequence being a sequence of k ones with initial
encoder state (1, 1) . The encoder output is a sequence of 1s of length n = 2k .
The task of the decoder is to nd (one of) the paths in the trellis that has the largest
hx, yi where x is the encoder output that corresponds to that path. If the decoder does
not come up with the correct path, it is because it chooses a path that contains one or
more detours.
Detours (with respect to the reference path) are trellis path segments that share with the
reference path only the starting and the ending state.
1
See Figure 6.5.
Start 1st detour End 1st detour
End 2nd detour Start 2nd detour
Reference path
Figure 6.5: Detours.
Errors are produced when the decoder follows a detour. To compute the bit error probabil-
ity, we study the random process produced by the decoder when it chooses the maximum
likelihood trellis path. Each such path is either the correct path or else it breaks down
in some number of detours. To the path selected by the decoder, we associate a se-
quence
0
,
1
, . . . ,
k1
dened as follows. If there is a detour that starts at depth j ,
j = 0, 1, . . . , k 1, we let
j
be the number of bit errors produced by that detour. In
all other cases we let
j
= 0. Then
k1
j=0
j
is the number of bits that are incorrectly
decoded and
1
kk
0
k1
j=0
j
is the corresponding fraction of bits ( k
0
=1 in our running ex-
ample). Over the ensemble of all possible noise processes,
j
becomes a random variable
1
For an analogy, the reader can think of the trellis as a road map, of the reference path as an intended
road for a journey, and the path selected by the decoder as the actual road taken during the journey.
Due to constructions, occasionally the actual path splits from the intended path to merge again with it
at some later point. A detour is the chunk of road from the split to the merge.
204 Chapter 6.
j
and the bit error probability is
P
b
4
= E
_
1
kk
0
k1
j=0
j
_
=
1
kk
0
k1
j=0
E[
j
] .
To upper bound the above expression, we need to learn how many detours of a certain
kind there are. We do so in the next session.
6.4.1 Counting Detours
In this subsection, we consider the innite trellis obtained by extending the nite trellis in
both directions. Each path of the innite trellis corresponds to an innite input sequence
b = . . . b
1
, b
0
, b
1
, b
2
, . . . and an innite output sequence x = . . . x
1
, x
0
, x
1
, x
2
, . . . . These
are sequences that belong to {1}
.
To each such detour we can associate two numbers, namely the input distance i and the
output distance d. The input distance is the number of positions where the two input
sequences dier over the course of the detour. Likewise, the output distance is the number
of output discrepancies over the course of the detour.
We seek the answer to the following question. For any given reference path and depth
j {0, 1, . . . }, what is the number a(i, d) of detours that start at depth j and have
input distance i and output distance d, with respect to the reference path? This number
depends neither on j nor on the reference path. It does not depend on j because the
encoder is a time-invariant machine, i.e., all the sections of the innite trellis are identical.
We will see that it does not depend on the reference path either, because the encoder is
linear in a sense that we will discuss.
Example 93. Using again the top subgure of Figure 6.4, we can verify by inspection
that for each reference path and each positive integer j there is a single detour that starts
at depth j and has parameters i = 1 and d = 5. Thus a(1, 5) = 1. We can also verify
that a(2, 5) = 0 and a(2, 6) = 2.
To determine a(i, d) we choose the all-one path as the reference. We modify the state
diagram into one for which all paths are detours with respect to the reference path. This
is the detour ow graph shown in Figure 6.6. It is obtained by removing from the state
diagram the self-loop of state b = (1, 1) and by split opening b to create two new states
denoted by s (for start) and e (for end). For every j , there is a one-to-one correspondence
between a detour to the all-one path that starts at depth j and a path between node s
and node e of the detour ow graph.
6.4. Bit Error Probability 205
s
l
t
r e
ID
2
ID
I
D
ID
D
2
D
Figure 6.6: Detour ow graph.
The label I
i
D
d
, ( i and d nonnegative integers), on an edge of the detour ow graph
indicates that the input and output distances increase by i and d, respectively, when the
detour takes this edge.
In terms of the detour ow graph, a(i, d) is the number of paths between s and e that
have path label I
i
D
d
, where the label of a path is the product of all labels along that
path. We will actually determine the generating function T(I, D) of a(i, d) dened as
T(I, D) =
i,d
I
i
D
d
a(i, d).
The letters I and D in the above expression should be seen as place holders without
any physical meaning. It is like describing a set of coecients a
0
, a
1
, . . . , a
n1
by means
of the polynomial p(x) = a
0
+a
1
x+. . . +a
n1
x
n1
. To determine T(I, D) , we introduce
auxiliary generating functions, one for each intermediate node of the detour ow graph,
namely
T
l
(I, D) =
i,d
I
i
D
d
a
l
(i, d)
T
t
(I, D) =
i,d
I
i
D
d
a
t
(i, d)
T
r
(I, D) =
i,d
I
i
D
d
a
r
(i, d),
T
e
(I, D) =
i,d
I
i
D
d
a
e
(i, d),
where in the rst line we dene a
l
(i, d) as the number of paths in the detour ow graph
that start at node s, end at node l , and have path label I
i
D
d
. Similarly, for x = t, r, e,
a
x
(i, d) is the number of paths in the detour ow graph that start at node s, end at node
x, and have path label I
i
D
d
. Notice that T
e
(I, D) is indeed the T(I, D) of interest to
us.
206 Chapter 6.
From the detour ow graph, we see that the various generating functions are related
as follows, where to simplify notation we drop the two arguments ( I and D) of the
generating functions:
T
l
= ID
2
+ T
r
I
T
t
= T
l
ID + T
t
ID
T
r
= T
l
D + T
t
D
T
e
= T
r
D
2
.
This system can be solved for T
e
(hence for T ) by pure formal manipulations, like solving
a system of equations. The result is
T(I, D) =
ID
5
1 2ID
.
As we will see shortly, the generating function T(I, D) of a(i, d) is more useful than
a(i, d) itself. However, to show that we can indeed obtain a(i, d) from T(I, D) we use
the expansion
1
1x
= 1 + x + x
2
+ x
3
+ to write
T(I, D) =
ID
5
1 2ID
= ID
5
(1 + 2ID + (2ID)
2
+ (2ID)
3
+
= ID
5
+ 2I
2
D
6
+ 2
2
I
3
D
7
+ 2
3
I
4
D
8
+
This means that there is one path with parameters d = 5, i = 1, that there are two
paths with d = 6, i = 2, etc. The general expression for i = 1, 2, . . . is
a(i, d) =
_
2
i1
, d = i + 4
0, otherwise.
By means of the detour ow graph, it is straightforward to verify this expression for small
values of i and d.
It remains to be shown that a(i, d) (the number of detours that start at any given depth
j and have parameter i and d) does not depend on which refrence path we choose. We
do this in Exercise 7.
6.4.2 Upper Bound to P
b
We are now ready for the nal step: the derivation of an upper bound to the bit-error
probability. We recapitulate.
6.4. Bit Error Probability 207
We x an arbitrary encoder input sequence, let x = x
1
, x
2
. . . , x
n
be the corresponding
encoder output and c =
E
s
x the channel input sequence. The waveform signal is
w(t) =
n
j=1
c
j
j
(t)
where
1
(t), . . . ,
n
(t) forms an orthonormal collection. We transmit this signal over the
AWGN channel with power spectral density N
0
/2. Let r(t) = w(t) +z(t) be the received
signal, where z(t) is a sample path of the noise process Z(t) , and let
y = y
1
, . . . , y
n
, where y
i
=
_
r(t)
i
(t)dt,
be the decoder input.
The Viterbi algorithm labels each edge in the trellis with the corresponding edge metric
and nds the path through the trellis with the largest path metric. An edge from depth
j 1 to j with output symbols x
2j1
, x
2j
is labeled with the edge metric y
2j1
x
2j1
+
y
2j
x
2j
.
The maximum likelihood path selected by the Viterbi decoder could contain detours. If
there is a detour that starts at depth j , j = 0, 1, . . . , k1, we set
j
to be the number of
bit errors made on that detour. Otherwise, we set
j
= 0. Let
j
be the corresponding
random variable (over all possible noise realizations).
For the path selected by the Viterbi algorithm, the total number of incorrect bits is
k1
j=0
j
and
1
kk
0
k1
j=0
j
is the fraction of errors with respect to the kk
0
source bits.
Hence the bit-error probability is
P
b
=
1
kk
0
k1
j=0
E[
j
]. (6.1)
The expected value E[
j
] can be written as follows
E[
j
] =
h
i(h)(h), (6.2)
where the sum is over all detours h that start at depth j with respect to the reference
path, (h) stands for the probability that detour h is taken, and i(h) for the input
distance between detour h and the reference path.
Next we upperbound (h) . If a detour starts at depth j and ends at depth l = j + m,
then the corresponding encoder-output symbols form a 2m tuple u {1}
2m
. Let
u = x
2j+1
, . . . , x
2l
{1}
2m
and = y
2j+1
, . . . , y
2l
be the corresponding sub-sequence of
the reference path and of the channel output, respectively, see Figure 6.7.
208 Chapter 6.
Detour starts at depth
Detour ends at depth
l = j + m
j
u
u
Figure 6.7: Detour and reference path, labeled with the corresponding output
subsequences.
If the Viterbi algorithm takes this detour, it must be that the subpath metric along the
detour is at least as large as the corresponding subpath metric along the reference path.
An equivalent condiditon is that is at least as close to
E
s
u as it is to
E
s
u. Observe
that has the statistic of
E
s
u+Z where Z N(0, I
n
0
N
0
2
) and n
0
is the common length
of u, u, and . The probability that is at least as close to
E
s
u as it is to
E
s
u is
Q
_
d
E
2
_
, where d
E
= 2
E
s
d is the Euclidean distance between
E
s
u and
E
s
u. Using
d
E
(h) to denote the Euclidean distance of detour h to the reference path, we obtain
(h) Q
d
E
(h)
2
= Q
_
_
2E
s
d(h)
N
0
_
_
,
where the inequality sign is needed because the event that is at least as close to
E
s
u
as it is to
E
s
u is no guarantee that the Viterbi decoder will take detour u. In fact,
there could be another detour even closer to . Inserting the above bound into (6.2) we
obtain the rst inequality in the following chain.
6.4. Bit Error Probability 209
E
j
=
h
i(h)(h)
h
i(h)Q
_
2E
s
d(h)
N
0
_
(a)
=
i=1
d=1
iQ
_
_
2E
s
d
N
0
_
a(i, d)
(b)
i=1
d=1
iQ
_
_
2E
s
d
N
0
_
a(i, d)
(c)
i=1
d=0
iz
d
a(i, d).
To obtain equality (a) we group the terms of the sum that have the same i and d and
introduce a(i, d) to denote the number of such terms in the nite trellis. a(i, d) is the
nite-trellis equivalent to a(i, d) introduced in Section 6.4.1. As the innite trellis contains
all the detours of the nite trellis and more, a(i, d) a(i, d) . This justies (b). In (c) we
use
Q
_
_
2E
s
d
N
0
_
e
Esd
N
0
= z
d
, for z = e
Es
N
0
.
For the nal step we use the relationship
i=1
if(i) =
I
i=0
I
i
f(i)
I=1
,
which holds for any function f and can be veried by taking the derivative of
i=0
I
i
f(i)
with respect to I and then setting I = 1. Hence
E[
j
]
i=1
d=0
iz
d
a(i, d)
=
I
i=1
d=0
I
i
D
d
a(i, d)
I=1,D=z
=
I
T(I, D)
I=1,D=z
.
Plugging into (6.1) and using the fact that the above bound does not depend on j yields
P
b
=
1
kk
0
k1
j=0
E[
j
]
1
k
0
I
T(I, D)
I=1,D=z
. (6.3)
210 Chapter 6.
In our specic example we have k
0
= 1 and T(D, I) =
ID
5
12ID
, hence
T
I
=
D
5
(12ID)
2
. Thus
P
b
z
5
(1 2z)
2
.
The bit error probability depends on the encoder and on the channel. Bound (6.3) nicely
separates the two contributions. The encoder is accounted for by T(I, D)/k
0
and the
channel by z . More precisely, z
d
is an upper bound to the probability that a maximum
likelihood receiver makes a decoding error when the choice is between two encoder output
sequences that have Hamming distance d. As shown in Problem 35(ii) of Chapter 2, we
can use the Bhattacharyya bound to determine z for any binary-input discrete memoryless
channel. For such a channel,
z =
y
_
P(y|a)P(y|b)
where a and b are the two letters of the input alphabet and y runs over all the elements
of the output alphabet. Hence, the technique used in this chapter is applicable to any
binary input discrete memoryless channel.
6.5 Concluding Remarks
To be specic, in thiese concluding remarks we assume that the waveform former is as in
the previous chapter, i.e., the transmitted signal is of the form
w(t) =
n1
i=0
c
i
(t iT)
for some pulse (t) that has unit norm and is orthogonal to is T -space translates.
Without coding, c
j
= b
j
E
s
_
E
s
_
. Then the relevant design parameters are (the
subscripts s and b stand for symbol and bit, respectively)
R
b
= R
s
=
1
T
[bits/s],
E
b
= E
s
,
P
b
= Q
E
s
= Q
_
_
2E
s
N
0
_
e
Es
N
0
,
where we have used
2
=
N
0
2
and Q(x) exp
_
x
2
2
_
.
With coding, if we want the transmitted signal to have the same form in both cases, the
6.5. Concluding Remarks 211
corresponding parameters are
R
b
=
R
s
2
=
1
2T
[bits/s],
E
b
= 2E
s
,
P
b
z
5
(1 2z)
2
where z = e
Es
N
0
.
We see that the encoder halves the bit rate and doubles the energy per bit consumed by
the system. As
E
b
N
0
becomes large, the denominator of the above bound for P
b
becomes
essentially 1 and the bound decreases as z
5
. The bound for the uncoded case is z .
The above bounds are plotted in Figure 15.
0 1 2 3 4 5 6
10
10
10
9
10
8
10
7
10
6
10
5
10
4
10
3
10
2
10
1
10
0
1 3 5
E
s
/N
0
in dB
P
b
z = e
Es
N
0
Q
_
2Es
N
0
z
5
(12z)
2
t
r
u
n
c
a
t
e
d
u
p
p
e
r
b
o
u
n
d
Figure 6.8: Bit error probability: bounds and approximation. The truncated
upper bound is the bound derived in Exercise 15.
In both cases the power spectral density is
Es
T
|
F
(f)|
2
(see Exercise 1). Hence the band-
width is the same in both cases. Since coding reduces the bit-rate by a factor 2, the
bandwidth eciency, dened as the number of bits per second per Hz, is smaller by a
factor of 2 in the coded case. With a more powerful code, we can further decrease the bit
error probability without aecting the other parameters. The price for a more powerful
conventional code is that the constraint length is bigger, which implies more states and,
in turn, a higher decoding complexity in terms of the number of operations per bit.
212 Chapter 6.
It is instructivce to compare the performance of the chosen code to a trivial code that
sends the information symbol twice. In this case the j th bit is sent as b
j
E
s
(t
(2j 1)T) + b
j
E
s
(t 2jT) which is the same as sending b
j
2E
s
(t 2jT) with
2
) . By
the law of large numbers,
_
(
Z
2
i
)/n goes to as n goes to innity. This means that
with probability approaching 1, the received n-tuple Y will be in a thin shell of radius
2
) . It also teaches us that the probability of error cannot be made
arbitrarily small if we use more than 2
nC
signals. Since (log
2
m)/n is the number of bits
per dimension that we are sending, when we use m signals embedded in an n dimensional
space, it is quite appropriate to call C [bits/dimension] the capacity of the discrete-time
additive white Gaussian noise channel. For the continuous-time AWGN channel of total
bandwidth W , the channel capacity is
W
2
log
2
1 +
2P
WN
0
[bits/sec],
where P is the transmitted power. It can be approached arbitrarily close, with an error
probability as small as desired, using signals of the form w(t) =
j
c
j
(t jT) and
powerful codes.
Appendix 6.A Formal Denition of the Viterbi Algorithm
Let = {(1, 1), (1, 1), (1, 1), (1, 1)} be the state space and dene the edge metric
j1,j
(, ) as follows. If there is an edge that connects state at depth j 1 to
state at depth j let
j1,j
(, ) = x
2j1
y
2j1
+ x
2j
y
2j
,
wherex
2j1
, x
2j
is the encoder output of the corresponding edge. If there is no such edge
we let
j1,j
(, ) = .
6.A. Formal Denition of the Viterbi Algorithm 213
Since
j1,j
(, ) is the j th term in hx, yi for any path that goes through state at
depth j 1 and state at depth j , hx, yi is obtained by adding the edge metrics along
the path specied by x.
The path metric is the sum of the edge metrics taken along the edges of a path. A longest
path from state (1, 1) at depth j = 0, denoted (1, 1)
0
, to a state at depth j , denoted
j
, is one of the paths that has the largest path metric. The Viterbi algorithm works
by constructing, for each j , a list of the longest paths to the states at depth j . The
following observation is key to understanding the Viterbi algorithm. If path
j1
j
is
a longest path to state of depth j , where path
j2
and denotes concatenation,
then path
j1
must be a longest path to state of depth j 1, for if another path, say
alternatepath
j1
were shorter for some alternatepath
j2
, then alternatepath
j1
j
would be shorter than path
j1
j
. So the longest depth j path to a state
can be obtained by checking the extension of the longest depth (j 1) paths by one
branch.
The following notation is useful for the formal description of the Viterbi algorithm. Let
j
() be the metric of a longest path to state
j
and let B
j
() {1}
j
be the encoder
input sequence that corresponds to this path. We call B
j
() {1}
j
the survivor
because it is the only path through state
j
that will be extended. (Paths through
j
that have a smaller metric have no chance of extending into a maximum likelihood path).
For each state, the Viterbi algorithms computes two things: a survivor and its metric.
The formal algorithm follows, where B(, ) is the encoder input that corresponds to the
transition from state to state if there is such a transition and is undened otherwise.
1. Initially set
0
(1, 1) = 0,
0
() = for all 6= (1, 1) ,
B
0
(1, 1) = , and j = 1.
2. For each , find one of the for which
j1
() +
j1,j
(, ) is a maximum. Then set
j
()
j1
() +
j1,j
(, ),
B
j
() B
j1
() B(, ).
3. If j = k + 2, output the first k bits of B
j
(1, 1) and
stop. Otherwise increment j by one and go to Step 2.
The reader should have no diculty verifying (by induction on j ) that
j
() as computed
by Viterbis algorithm is indeed the metric of a longest path from (1, 1)
0
to state at
depth j and that B
j
() is the encoder input sequence associated to it.
214 Chapter 6.
Appendix 6.B Exercises
Problem 1. (Power Spectral Density) Consider the random process
X(t) =
i=
X
i
_
E
s
(t iT
s
T
0
),
where T
s
and E
s
are xed positive numbers, (t) is some unit-energy function, T
0
is
a uniformly distributed random variable taking value in [0, T
s
) , and {X
i
}
i=
is the
output of the convolutional encoder described by
X
2n
= B
n
B
n2
X
2n+1
= B
n
B
n1
B
n2
with iid input sequence {B
i
}
i=
taking values in {1}.
(a) Express the power spectral density of the above random process for a general (t) .
(b) Plot the power special density for a rectangular pulse (t) of width T
s
.
Problem 2. (Power Spectral Density: Correlative Encoding) Repeat Exercise 1 using
the encoder:
X
i
= B
i
B
i1
.
Compare this exercise to Exercise 4 of Chapter 5.
Problem 3. (Viterbi Algorithm) An output sequence x
1
, . . . , x
10
from the convolutional
encoder of Figure 6.9 is transmitted over the discrete-time AWGN channel. The initial
and nal state of the encoder is (1, 1) . Using the Viterbi algorithm, nd the maxi-
mum likelihood information sequence
b
1
, . . . ,
b
4
, 1, 1, knowing that b
1
, . . . , b
4
are drawn
independently and uniformly from {1} and that the channel output y
1
, . . . , y
10
=
1, 2, 1, 4, 2, 1, 1, 3, 1, 2. (We are choosing integers for convenience.)
b
j1
b
j2
b
j
{1}
x
2j+1
x
2j
Figure 6.9:
6.B. Exercises 215
Problem 4. (Intersymbol Interference) We speak of intersymbol interference (ISI) when
the sampled matched-lter output is a linear combination of various channel input symbols
plus noise. ISI can result from the channel impulse response, or can be introduced by
the encoder to shape the transmitted signals spectrum, but in this case we speak of
correlative or partial response encoding (see Exercise 2). ISI can also be the result of
sampling the matched lter output at the wrong time (see e.g. Section 7.6). To leverage
on what we have learned in this chapter, we model the intersymbol interference channel
by an encoder followed by the AWGN channel, i.e.,
Y
i
= X
i
+ Z
i
X
i
=
L
j=0
B
ij
h
j
, i = 1, 2, . . . (6.4)
where B
i
is the i th information bit, h
0
, . . . , h
L
are coecients that describe the inter-
symbol interference, and Z
i
is zero-mean, Gaussian, of variance
2
and statistically inde-
pendent of everything else. The input/output relationship can be modeled by a trellis, as
we have learned for convolutional encoders, and the ML decision rule can be implemented
by the Viterbi algorithm.
(a) Draw the trellis that describes all sequences of the form X
1
, . . . , X
7
resulting from
information sequences of the form B
1
, . . . , B
5
, 0, B
i
{0, 1}, assuming
h
i
=
_
_
_
1, i = 0
2, i = 1
0, otherwise.
To determine the initial state, you may assume that the preceding information se-
quence terminated with 0. Label the trellis edges with the input/output symbols.
(b) Specify a metric f(x
1
, . . . , x
6
, ) =
6
i=1
f(x
i
, y
i
) whose minimization or maximization
with respect to the valid x
1
, . . . , x
6
leads to a maximum likelihood decision. Specify
if your metric needs to be minimized or maximized.
(c) Assume y
1
, . . . , y
6
= 2, 0, 1, 1, 0, 1. Find the maximum likelihood estimate of the
information sequence B
1
, . . . , B
5
.
Problem 5. (Linearity) In this exercise, we establish in what sense the encoder of Fig-
ure 6.2 is linear.
(a) For this part you might want to review the axioms of a eld. (See e.g. K. Homan
and R. Kunze, Linear Algebra, Prentice Hall or your favorite linear algebra book.)
Consider the set F
0
= {0, 1} with the following addition and multiplication tables:
216 Chapter 6.
+ 0 1
0 0 1
1 1 0
0 1
0 0 0
1 0 1
(The addition in F
0
is the usual addition over R with result taken modulo 2. The
multiplication is the usual multiplication over R and there is no need to take the
modulo 2 operation because the result is automatically in F
0
.) F
0
, +, and
form a binary eld denoted by F
2
. Now consider F
that sends 0 to 1
and 1 to 1. Hence, by construction, for a, b F
0
, T(a + b) = T(a) + T(b) and
T(a b) = T(a) T(b) . Be aware of the double meaning of + and in the
previous sentence.
(b) For this part you might want to review the notion of a vector space. Let F
0
, +
and be as dened in (a). Let V = F
0
. This is the set of innite sequences
taking values in F
0
. Does V , F
0
, + and form a vector space? (Addition of
vectors and multiplication of a vector with a scalar is done component-wise.) Repeat
using F
.
(c) For this part you might want to review the notion of linear transformation. Let
f : V V be the transformation that sends an innite sequence b V to an innite
sequence x V according to
x
2j1
= b
j1
+ b
j2
+ b
j3
x
2j
= b
j
+ b
j2
,
where the + is the one dened over the eld of scalars implicit in V . Argue that
this f is linear. Comment: When V = F
, but this is just a matter of notation, the result of the two operations
on the elements of F
2E
s
}. We are more energy ecient if we
use the channel input alphabet {
E
s
}. Towards that end, we map the encoder outut
symbols into F
b
j
) {1}. Hint: For a and b in F
2
,
T(a + b) = T(a)T(b) .
b
j1
b
j2
b
j
{0, 1}
+ +
+ T
T
x
2j
x
2j1
(a) Conventional description. Addition is modulo 2.
b
j1
b
j2
b
j
{1}
x
2j
x
2j1
(b) Description used in this text. Multiplication is over R.
Figure 6.10:
Comment: the encoder of Figure 6.10(b) is linear over the eld F
b) = f(b)f(
b) and the
all-one output sequence (which is the output to the all-one input sequence).
(b) Fix an arbitrary reference path and an arbitrary detour that splits from the reference
path at time 0. Let b and
b be the corresponding input sequences. Because the
the detour starts at time 0, b
i
=
b
i
for i < 0 and b
0
6=
b
0
. Argue that
b uniquely
denes a detour
b that splits from the all-one path at time 0 and such that:
(i) the distance between b and
b is the same as that between
b and the all-one
input sequence.
(ii) the distance between f(b) and f(
b) and the
all-one output sequence.
(c) Conclude that a(i, d) does not depend on the reference path.
Problem 8. (Rate 1/3 Convolutional Code.) For the convolutional encoder of Fig-
ure 6.11:
b
n1
b
n2
b
n
{1}
x
3n
= b
n
b
n2
x
3n+1
= b
n1
b
n2
x
3n+2
= b
n
b
n1
b
n2
Figure 6.11:
(a) Draw the state diagram and the detour ow graph.
(b) Suppose that the serialized encoder output symbols are scaled so that the resulting
energy per bit is E
b
and send over the AWGN channel of noise variance
2
= N
0
/2.
Derive an upper bound to the bit error probability assuming that the decoder imple-
ments the Viterbi algorithm.
Problem 9. (Rate 2/3 Convolutional Code) The following equations describe the output
sequence of a convolutional encoder that in each epoch takes k
0
= 2 input symbols from
{1} and outputs n
0
= 3 symbols from the same alphabet.
x
3n
= b
2n
b
2n1
b
2n2
x
3n+1
= b
2n+1
b
2n2
x
3n+2
= b
2n+1
b
2n
b
2n2
6.B. Exercises 219
(a) Draw an implementation of the encoder based on delay elements and multipliers.
(b) Draw the state diagram.
(c) Suppose that the serialized encoder output symbols are scaled so that the resulting
energy per bit is E
b
and send over the AWGN channel of noise variance
2
= N
0
/2.
Derive an upper bound to the bit error probability assuming that the decoder imple-
ments the Viterbi algorithm.
Problem 10. (Convolutional Encoder, Decoder and Error Probability) For the convolu-
tional code described by the state diagram of Figure 6.12:
(a) Draw the encoder.
(b) As a function of the energy per bit E
b
, upper bound the bit error probability of the
Viterbi algorithm when the scaled encoder output sequence is transmitted over the
discrete-time AWGN channel of noise variance
2
= N
0
/2.
1 | 1, !1
1,!1
!1 | !1, !1
!1 | 1, 1
!1,1
1 | 1, 1
!1 | !1, 1
1,1
!1,!1
1 | !1, !1
1 | !1, 1
!1 | 1, !1
Figure 6.12:
Problem 11. (Trellis with Antipodal Signals) Figure 6.13(a) shows a trellis section la-
beled with the output symbols x
2j1
, x
2j
of a convolutional encoder. Notice how branches
that are the mirror-image of each other have antipodal output symbols (symbols that are
220 Chapter 6.
the negative of each other). The purpose of this exercise is to see that when the trel-
lis has this particular structure and codewords are sent through the AWGN channel,
the maximum likelihood sequence detector further simplies (with respect to the Viterbi
algorithm).
+1
1, 1
1
,
1
1
1, 1
1
,
1
j 1 j
(a)
+1
a
b
1
a
b
j 1 j
c
d
c
d
j + 1
(b)
Figure 6.13:
Figure 6.13(b) shows two consecutive trellis sections labeled with the branch metric.
Notice that the mirror symmetry of Figure (a) implies the same kind of symmetry for
Figure (b). The maximum likelihood path is the one that has the largest path metric. To
avoid irrelevant complications we assume that there is only one path that maximizes the
path metric.
(a) Let
j
{1} be the state visited by the maximum likelihood path at depth j .
Suppose that a genie informs the decoder that
j1
=
j+1
= 1. Write down the
necessary and sucient condition for
j
= 1.
(b) Repeat for the remaining three possibilities of
j1
and
j+1
. Does the necessary
and sucient condition for
j
= 1 depend on the value of
j1
and
j+1
?
(c) The brach metric for the branch with output symbols x
2j1
, x
2j
is x
2j1
y
2j1
+
x
2j
y
2j
, where y
j
is x
j
plus noise. Using the result of the previous part, spec-
ify a maximum likelihood sequence decision for
j
= 1 based on the observation
y
2j1
, y
2j
, y
2j+1
, y
2j+2
.
Problem 12. (Viterbi for the Binary Erasure Channel) Consider the convolutional en-
coder of Figure 6.14 with inputs and outputs over {0, 1} and addition modulo 2. Its
output is sent over the the binary erasure channnel described by
P
Y |X
(0|0) = P
Y |X
(1|1) = 1 ,
P
Y |X
(?|0) = P
Y |X
(?|1) =
P
Y |X
(1|0) = P
Y |X
(0|1) = 0.
6.B. Exercises 221
b
j1
b
j2
+
+
b
j
{0, 1}
x
2j
= b
j
+ b
j2
x
2j1
= b
j1
+ b
j2
Figure 6.14:
(a) Draw a trellis section that describes the encoder map.
(b) Derive the branch metric and specify whether a maximum likelihood decoder chooses
the path with largest or smallest path metric.
(c) Suppose that the initial encoder state is (0, 0) and that the channel output is
{0, ?, ?, 1, 0, 1}. What is the most likely information sequence?
(d) Derive an upper bound to the bit error probability.
Problem 13. (Sampling Error) A transmitter sends
X(t) =
B
i
(t iT),
where {B
i
}
i=
, B
i
{1, 1}, is a sequence of independent and uniformly distributed
bits and (t) is a centered and unit-energy rectangular pulse of width T . The communi-
cation channel between the transmitter and the receiver is the AWGN channel of power
spectral density
N
0
2
. At the receiver, the channel output Z(t) is passed through a lter
matched to (t) , and the output is sampled, ideally at times t
k
= kT , k integer.
(a) Consider that there is a timing error, i.e., the sampling time is t
k
= kT where
T
= 0.25. Ignoring the noise, express the matched lter output observation w
k
at
time t
k
= kT as a function of the bit values b
k
and b
k1
.
(b) Extending to the noisy case, let r
k
= w
k
+ z
k
be the k -th matched lter output
observation. The receiver is not aware of the timing error. Compute the resulting
error probability.
(c) Now assume that the receiver knows the timing error (same as above) but it
can not correct for it. (This could be the case if the timing error becomes known
once the samples are taken.) Draw and label four sections of a trellis that describes
the noise-free sampled matched lter output for each input sequence b
1
, b
2
, b
3
, b
4
. In
your trellis, take into consideration the fact that the matched lter is at rest before
x(t) =
4
i=1
b
i
(t iT) enters the lter.
222 Chapter 6.
(d) Suppose that the sampled matched lter output consists of 2, 0.5, 0, 1. Use the
Viterbi algorithm to decide on the transmitted bit sequence.
Problem 14. (Simulation) The purpose of this exercise is to determine, by simulation,
the bit error probability of the communication system studied in this chapter. For the
simulation, we recommend using MATLAB, as it has high-level functions for the various
tasks, notably for generating a random information sequence, for doing convolutional
encoding, for simulating the discrete-time AWGN channel, and for decoding by means of
the Viterbi algorithm. Although the actual simulation is on the discrete-time AWGN, we
specify a continuous-time setup. It is part of your task to translate the continuous-time
specications into what you need for the simulation. We begin with the uncoded version
of the system of interest.
(a) By simulation, determine the minimum obtainable bit error probability P
b
of bit-by-
bit on a pulse train transmitted over the AWGN channel. Specically, the channel
input signal has the form
X(t) =
j
X
j
(t jT),
where the symbols are iid and take value in {
E
s
}, the pulse (t) has unit norm
and is orthogonal to its T -fold translates. Plot P
e
as function of (E
s
/N
0
) in the
range from 1 to 6 dB, where N
0
/2 is the noise power spectral density. Verify your
results with Figure 6.8. Hint: the following are useful MATLAB functions: To be
added.
(b) Repeat with the symbol sequence being the output of the convolutional encoder of
Figure 6.2 multiplied by
E
s
. The decoder shall implement the Viterbi algorithm.
Also in this case you can verify your results by comparing with Figure 6.8.
Problem 15. (Bit Error Probability) In the process of upper bounding the bit error
probability, in Section 6.4.2 we make the following step
E
j
i=1
d=1
iQ
_
_
2E
s
d
N
0
_
a(i, d)
i=1
d=0
iz
d
a(i, d).
(a) Instead of upper bounding the Q-function as done above, use the results of Sec-
tion 6.4.1 to substitute a(i, d) and d with explicit functions of i and get rid of the
second sum. You should obtain
P
b
i=1
iQ
2E
s
(i + 4)
N
0
2
i1
.
6.B. Exercises 223
(b) Truncate the above sum to the rst 5 terms and evaluate it numerically for (E
s
/N
0
)
between 1 and 6 dB. Plot the results and compare to Figure 6.8.
224 Chapter 6.