Convolutional Code
Wireless Information Transmission System Lab., Institute of Communications Engineering, National Sun Yat-sen University
Figure: a general convolutional encoder with K stages of k-bit shift registers and n output bits per k input bits.
Convolutional codes:
k = number of bits shifted into the encoder at one time
n = number of encoder output bits corresponding to the k information bits
r = k/n = code rate
K = constraint length (encoder memory)
Each encoded bit is a function of the present input bits and the past input bits.
Generator Sequence
Figure: shift-register encoder circuits defining the generator sequences.
Input | Present state | Next state | Output
0 | 00 | 00 | 00
1 | 00 | 10 | 11
0 | 01 | 00 | 11
1 | 01 | 10 | 00
0 | 10 | 01 | 01
1 | 10 | 11 | 10
0 | 11 | 01 | 10
1 | 11 | 11 | 01
State Diagram
Figure: state diagram of the encoder; each branch is labeled input(output), e.g., 0(00) and 1(11).
Encoding Process
Input: 1 0 1 1 1 0 0 (the last two 0s return the encoder to the all-zero state)
Output: 11 01 00 10 01 10 11
Figure: the encoding path through the trellis for the input sequence above.
Basic concept: Generate the code trellis at the decoder. The decoder penetrates the code trellis level by level in search of the transmitted code sequence. At each level of the trellis, the decoder computes and compares the metrics of all the partial paths entering a node. The decoder stores the partial path with the larger metric and eliminates all the other partial paths. The stored partial path is called the survivor.
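The following is a minimal sketch (not from the original slides) of this compare-and-store step, assuming a metric in which larger is better; the function name and data layout are illustrative only.

```python
# A sketch of the survivor selection (compare-and-store) step performed at
# each trellis node, assuming larger metrics are better.

def select_survivor(candidates):
    """candidates: (partial_path, partial_metric) pairs entering one node.
    The survivor is the partial path with the largest metric."""
    return max(candidates, key=lambda c: c[1])

# Two partial paths enter the same state; the one with metric 5 survives.
print(select_survivor([([1, 0, 1], 3), ([0, 1, 1], 5)]))  # -> ([0, 1, 1], 5)
```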
Figure: level-by-level Viterbi decoding on the trellis; at each level the partial-path metrics of the paths entering each node are compared and only the survivors are kept. The final survivor yields the decoded output 1 0 1 1 1 (0 0).
Convolutional Codes
Convolutional Code
Convolutional codes differ from block codes in that the encoder contains memory, and the n encoder outputs at any time unit depend not only on the k inputs but also on m previous input blocks. An (n, k, m) convolutional code can be implemented with a k-input, n-output linear sequential circuit with input memory m. Typically, n and k are small integers with k < n, but the memory order m must be made large to achieve low error probabilities. In the important special case when k = 1, the information sequence is not divided into blocks and can be processed continuously. Convolutional codes were first introduced by Elias in 1955 as an alternative to block codes.
Convolutional Code
Shortly thereafter, Wozencraft proposed sequential decoding as an efficient decoding scheme for convolutional codes, and experimental studies soon began to appear. In 1963, Massey proposed a less efficient but simpler-to-implement decoding method called threshold decoding. Then in 1967, Viterbi proposed a maximum likelihood decoding scheme that was relatively easy to implement for codes with small memory orders. This scheme, called Viterbi decoding, together with improved versions of sequential decoding, led to the application of convolutional codes to deep-space and satellite communication in the early 1970s.
Convolutional Code
A convolutional code is generated by passing the information sequence to be transmitted through a linear finite-state shift register. In general, the shift register consists of K (k-bit) stages and n linear algebraic function generators.
Convolutional Code
Convolutional codes:
k = number of bits shifted into the encoder at one time (k = 1 is usually used)
n = number of encoder output bits corresponding to the k information bits
Rc = k/n = code rate
K = constraint length, encoder memory
Each encoded bit is a function of the present input bits and the past input bits.
Example 1:
Consider the binary convolutional encoder with constraint length K = 3, k = 1, and n = 3. The generators are g1 = [100], g2 = [101], and g3 = [111]. The generators are more conveniently given in octal form as (4, 5, 7).
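As an added illustration (not from the original slides), this encoder can be sketched as a feedforward shift register; the register layout (current bit first, then the K-1 previous bits) is an assumption of this sketch.

```python
# A sketch of the Example 1 encoder (K = 3, k = 1, n = 3).

K = 3
GENS = (0o4, 0o5, 0o7)   # g1 = [100], g2 = [101], g3 = [111]

def encode(u):
    taps = [[(g >> (K - 1 - i)) & 1 for i in range(K)] for g in GENS]
    reg = [0] * K
    v = []
    for bit in u + [0] * (K - 1):          # K-1 zero tail flushes the register
        reg = [bit] + reg[:-1]             # shift the new bit in
        v.append(tuple(sum(t * r for t, r in zip(tp, reg)) % 2 for tp in taps))
    return v

print(encode([1, 0, 1]))  # [(1,1,1), (0,0,1), (1,0,0), (0,0,1), (0,1,1)]
```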
Example 2:
Consider a rate 2/3 convolutional encoder. The generators are g1 = [1011], g2 = [1101], and g3 = [1010]. In octal form, these generators are (13, 15, 12).
There are three alternative methods that are often used to describe a convolutional code: the tree diagram, the trellis diagram, and the state diagram.
Tree diagram
Note that the tree diagram in the figure repeats itself after the third stage. This is consistent with the fact that the constraint length is K = 3. The output sequence at each stage is determined by the input bit and the two previous input bits. In other words, we may say that the 3-bit output sequence for each input bit is determined by the input bit and the four possible states of the shift register, denoted as a = 00, b = 01, c = 10, and d = 11.
Tree diagram for rate 1/3, K = 3 convolutional code.
Trellis diagram
Figure: trellis diagram with states (00), (01), (10), (11).
State diagram
State | Next state (input 0) | Next state (input 1)
a | a | c
b | a | c
c | b | d
d | b | d
State diagram.
In general, we state that a rate k/n, constraint length K convolutional code is characterized by 2^k branches emanating from each node of the tree diagram. The trellis and the state diagrams each have 2^{k(K-1)} possible states. There are 2^k branches entering each state and 2^k branches leaving each state.
The encoder consists of an m = 3-stage shift register together with n = 2 modulo-2 adders and a multiplexer for serializing the encoder outputs. The mod-2 adders can be implemented as EXCLUSIVE-OR gates.
Since mod-2 addition is a linear operation, the encoder is a linear feedforward shift register. All convolutional encoders can be implemented using a linear feedforward shift register of this type.
The information sequence u = (u_0, u_1, u_2, ...) enters the encoder one bit at a time. Since the encoder is a linear system, the two encoder output sequences v^{(1)} = (v_0^{(1)}, v_1^{(1)}, v_2^{(1)}, ...) and v^{(2)} = (v_0^{(2)}, v_1^{(2)}, v_2^{(2)}, ...) can be obtained as the convolution of the input sequence u with the two encoder impulse responses. The impulse responses are obtained by letting u = (1 0 0 ...) and observing the two output sequences. Since the encoder has an m-time-unit memory, the impulse responses can last at most m + 1 time units, and are written as:
g^{(1)} = (g_0^{(1)}, g_1^{(1)}, ..., g_m^{(1)})
g^{(2)} = (g_0^{(2)}, g_1^{(2)}, ..., g_m^{(2)})
For the encoder of the binary (2, 1, 3) code,
g^{(1)} = (1 0 1 1)
g^{(2)} = (1 1 1 1).
The impulse responses g^{(1)} and g^{(2)} are called the generator sequences of the code. The encoding equations can now be written as
v^{(1)} = u * g^{(1)}
v^{(2)} = u * g^{(2)},
where * denotes discrete convolution and all operations are mod-2. The convolution operation implies that
v_l^{(j)} = \sum_{i=0}^{m} u_{l-i} g_i^{(j)} = u_l g_0^{(j)} + u_{l-1} g_1^{(j)} + \cdots + u_{l-m} g_m^{(j)},  j = 1, 2
(where u_{l-i} = 0 for l < i). Hence, for this code,
v_l^{(1)} = u_l + u_{l-2} + u_{l-3}
v_l^{(2)} = u_l + u_{l-1} + u_{l-2} + u_{l-3},
as can easily be verified by direct inspection of the encoding circuit. After encoding, the two output sequences are multiplexed into a single sequence, called the code word, for transmission over the channel. The code word is given by
v = (v_0^{(1)} v_0^{(2)}, v_1^{(1)} v_1^{(2)}, v_2^{(1)} v_2^{(2)}, ...).
Example 10.1
Let the information sequence u = (1 0 1 1 1). Then the output sequences are
v^{(1)} = (1 0 1 1 1) * (1 0 1 1) = (1 0 0 0 0 0 0 1)
v^{(2)} = (1 0 1 1 1) * (1 1 1 1) = (1 1 0 1 1 1 0 1),
and the code word is
v = (1 1, 0 1, 0 0, 0 1, 0 1, 0 1, 0 0, 1 1).
If the generator sequences g^{(1)} and g^{(2)} are interlaced and then arranged in the matrix
G = \begin{bmatrix} g_0^{(1)}g_0^{(2)} & g_1^{(1)}g_1^{(2)} & \cdots & g_m^{(1)}g_m^{(2)} & & \\ & g_0^{(1)}g_0^{(2)} & \cdots & g_{m-1}^{(1)}g_{m-1}^{(2)} & g_m^{(1)}g_m^{(2)} & \\ & & \ddots & & & \ddots \end{bmatrix},
where the blank areas are all zeros, the encoding equations can be rewritten in matrix form as v = uG. G is called the generator matrix of the code. Note that each row of G is identical to the preceding row but shifted n = 2 places to the right, and G is a semi-infinite matrix, corresponding to the fact that the information sequence u is of arbitrary length.
If u has finite length L, then G has L rows and 2(m + L) columns, and v has length 2(m + L).
Example 10.2
Consider a (3, 2, 1) convolutional code. Since k = 2, the encoder consists of two m = 1-stage shift registers together with n = 3 modulo-2 adders and two multiplexers.
The information sequence enters the encoder k = 2 bits at a time, and can be written as
u = (u_0^{(1)} u_0^{(2)}, u_1^{(1)} u_1^{(2)}, u_2^{(1)} u_2^{(2)}, ...),
or as the two input sequences
u^{(1)} = (u_0^{(1)}, u_1^{(1)}, u_2^{(1)}, ...)
u^{(2)} = (u_0^{(2)}, u_1^{(2)}, u_2^{(2)}, ...).
There are three generator sequences corresponding to each input sequence. Let g_i^{(j)} = (g_{i,0}^{(j)}, g_{i,1}^{(j)}, ..., g_{i,m}^{(j)}) represent the generator sequence corresponding to input i and output j. The generator sequences of the (3, 2, 1) convolutional code are
g_1^{(1)} = (1 1), g_1^{(2)} = (0 1), g_1^{(3)} = (1 1)
g_2^{(1)} = (0 1), g_2^{(2)} = (1 0), g_2^{(3)} = (1 0).
The convolution operation implies that
v_l^{(1)} = u_l^{(1)} + u_{l-1}^{(1)} + u_{l-1}^{(2)}
v_l^{(2)} = u_l^{(2)} + u_{l-1}^{(1)}
v_l^{(3)} = u_l^{(1)} + u_l^{(2)} + u_{l-1}^{(1)}.
After multiplexing, the code word is given by
v = (v_0^{(1)} v_0^{(2)} v_0^{(3)}, v_1^{(1)} v_1^{(2)} v_1^{(3)}, v_2^{(1)} v_2^{(2)} v_2^{(3)}, ...).
Example 10.3
For u^{(1)} = (1 0 1) and u^{(2)} = (1 1 0), the code word is
v = (1 1 0, 0 0 0, 0 0 1, 1 1 1).
Example 10.4
Arranging the interleaved generator sequences in the generator matrix (each set of k = 2 rows shifted n = 3 places to the right),
G = \begin{bmatrix} 101 & 111 & & \\ 011 & 100 & & \\ & 101 & 111 & \\ & 011 & 100 & \\ & & 101 & 111 \\ & & 011 & 100 \end{bmatrix},
and encoding u = (1 1, 0 1, 1 0) gives
v = uG = (1 1 0, 0 0 0, 0 0 1, 1 1 1),
which agrees with our previous calculation using discrete convolution.
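The matrix form can also be checked numerically. The sketch below (added here, not part of the original slides) builds the truncated generator matrix for a length-L input and computes v = uG over GF(2); helper names are illustrative.

```python
# Encoding via the truncated generator matrix of the (3, 2, 1) code.
# Block submatrices G0, G1 follow the generator sequences given earlier.

G0 = [[1, 0, 1], [0, 1, 1]]
G1 = [[1, 1, 1], [1, 0, 0]]

def build_G(L, n=3, k=2, m=1):
    cols = n * (L + m)
    rows = []
    for t in range(L):
        for i in range(k):
            row = [0] * cols
            row[n * t : n * t + n] = G0[i]
            row[n * (t + 1) : n * (t + 1) + n] = G1[i]
            rows.append(row)
    return rows

u = [1, 1, 0, 1, 1, 0]   # u = (1 1, 0 1, 1 0)
G = build_G(L=3)
v = [sum(ub * gb for ub, gb in zip(u, col)) % 2 for col in zip(*G)]
print(v)   # [1, 1, 0,  0, 0, 0,  0, 0, 1,  1, 1, 1]
```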
In particular, the encoder now contains k shift registers, not all of which must have the same length. If K_i is the length of the ith shift register, then the encoder memory order m is defined as
m = \max_{1 \le i \le k} K_i.
An example is a (4, 3, 2) convolutional encoder in which the shift register lengths are 0, 1, and 2.
The overall constraint length is defined as n_A = n(m + 1). Since each information bit remains in the encoder for up to m + 1 time units, and during each time unit can affect any of the n encoder outputs, n_A can be interpreted as the maximum number of encoder outputs that can be affected by a single information bit. For example, the overall constraint lengths of the (2,1,3), (3,2,1), and (4,3,2) convolutional codes are 8, 6, and 12, respectively. Note that different books or papers may have different definitions for constraint length.
The generator matrix can then be written as G = [G_0 G_1 ... G_m; (shifted copies)], where each G_l is a k x n submatrix whose entries are
G_l = \begin{bmatrix} g_{1,l}^{(1)} & g_{1,l}^{(2)} & \cdots & g_{1,l}^{(n)} \\ g_{2,l}^{(1)} & g_{2,l}^{(2)} & \cdots & g_{2,l}^{(n)} \\ \vdots & & & \vdots \\ g_{k,l}^{(1)} & g_{k,l}^{(2)} & \cdots & g_{k,l}^{(n)} \end{bmatrix}.
Note that each set of k rows of G is identical to the previous set of rows but shifted n places to the right.
For an information sequence
u = (u_0, u_1, ...) = (u_0^{(1)} u_0^{(2)} \cdots u_0^{(k)}, u_1^{(1)} u_1^{(2)} \cdots u_1^{(k)}, ...),
the code word
v = (v_0, v_1, ...) = (v_0^{(1)} v_0^{(2)} \cdots v_0^{(n)}, v_1^{(1)} v_1^{(2)} \cdots v_1^{(n)}, ...)
is given by v = uG. Since the code word v is a linear combination of rows of the generator matrix G, an (n, k, m) convolutional code is a linear code.
A convolutional encoder generates n encoded bits for each k information bits, and R = k/n is called the code rate. For a finite-length information sequence of kL bits, the corresponding code word has length n(L + m), where the final nm outputs are generated after the last nonzero information block has entered the encoder. Viewing a convolutional code as a linear block code with generator matrix G, the block code rate is given by kL/n(L + m), the ratio of the number of information bits to the length of the code word. If L >> m, then L/(L + m) ~ 1, and the block code rate and the convolutional code rate are approximately equal.
In a linear system, time-domain operations involving convolution can be replaced by more convenient transform-domain operations involving polynomial multiplication. Since a convolutional encoder is a linear system, each sequence in the encoding equations can be replaced by the corresponding polynomial, and the convolution operation replaced by polynomial multiplication. In the polynomial representation of a binary sequence, the sequence itself is represented by the coefficients of the polynomial. For example, for a (2, 1, m) code, the encoding equations become
v^{(1)}(D) = u(D) g^{(1)}(D)
v^{(2)}(D) = u(D) g^{(2)}(D),
where u(D) = u_0 + u_1 D + u_2 D^2 + \cdots is the information sequence.
After multiplexing, the code word becomes
v(D) = v^{(1)}(D^2) + D v^{(2)}(D^2).
The indeterminate D can be interpreted as a delay operator, the power of D denoting the number of time units a bit is delayed with respect to the initial bit.
Example 10.6
For the previous (2, 1, 3) convolutional code, the generator polynomials are g^{(1)}(D) = 1 + D^2 + D^3 and g^{(2)}(D) = 1 + D + D^2 + D^3. For the information sequence u(D) = 1 + D^2 + D^3 + D^4, the encoding equations are
v^{(1)}(D) = (1 + D^2 + D^3 + D^4)(1 + D^2 + D^3) = 1 + D^7
v^{(2)}(D) = (1 + D^2 + D^3 + D^4)(1 + D + D^2 + D^3) = 1 + D + D^3 + D^4 + D^5 + D^7,
and the code word is
v(D) = v^{(1)}(D^2) + D v^{(2)}(D^2) = 1 + D + D^3 + D^7 + D^9 + D^{11} + D^{14} + D^{15}.
Note that the result is the same as previously computed using convolution and matrix multiplication.
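The polynomial products can be checked with carry-less (GF(2)) multiplication; the sketch below is an addition for illustration, encoding polynomials as integer bit masks (bit i holds the coefficient of D^i).

```python
# Polynomial multiplication over GF(2) for Example 10.6.

def polymul_gf2(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

u  = 0b11101   # u(D)  = 1 + D^2 + D^3 + D^4
g1 = 0b1101    # g1(D) = 1 + D^2 + D^3
g2 = 0b1111    # g2(D) = 1 + D + D^2 + D^3

print(bin(polymul_gf2(u, g1)))  # 0b10000001 -> 1 + D^7
print(bin(polymul_gf2(u, g2)))  # 0b10111011 -> 1 + D + D^3 + D^4 + D^5 + D^7
```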
Since the encoder is a linear system, where u^{(i)}(D) is the ith input sequence and v^{(j)}(D) is the jth output sequence, the generator polynomial g_i^{(j)}(D) can be interpreted as the encoder transfer function relating input i to output j. As with any k-input, n-output linear system, there are a total of kn transfer functions. These can be represented by the k x n transfer function matrix
G(D) = \begin{bmatrix} g_1^{(1)}(D) & g_1^{(2)}(D) & \cdots & g_1^{(n)}(D) \\ g_2^{(1)}(D) & g_2^{(2)}(D) & \cdots & g_2^{(n)}(D) \\ \vdots & & & \vdots \\ g_k^{(1)}(D) & g_k^{(2)}(D) & \cdots & g_k^{(n)}(D) \end{bmatrix}.
Using the transfer function matrix, the encoding equation for an (n, k, m) code can be expressed as
V(D) = U(D) G(D),
where U(D) = [u^{(1)}(D), u^{(2)}(D), ..., u^{(k)}(D)] is the k-tuple of input sequences and V(D) = [v^{(1)}(D), v^{(2)}(D), ..., v^{(n)}(D)] is the n-tuple of output sequences. After multiplexing, the code word becomes
v(D) = v^{(1)}(D^n) + D v^{(2)}(D^n) + \cdots + D^{n-1} v^{(n)}(D^n).
Example 10.7
For the previous (3, 2, 1) code, with input sequences u^{(1)}(D) = 1 + D^2 and u^{(2)}(D) = 1 + D, the output sequences are v^{(1)}(D) = 1 + D^3, v^{(2)}(D) = 1 + D^3, and v^{(3)}(D) = D^2 + D^3, so the code word is
v(D) = v^{(1)}(D^3) + D v^{(2)}(D^3) + D^2 v^{(3)}(D^3)
= (1 + D^9) + (1 + D^9)D + (D^6 + D^9)D^2
= 1 + D + D^8 + D^9 + D^{10} + D^{11},
again in agreement with the previous calculation.
Then, we can find a means of representing the code word v(D) directly in terms of the input sequences. A little algebraic manipulation yields
v(D) = \sum_{i=1}^{k} u^{(i)}(D^n) g_i(D),
where
g_i(D) \triangleq g_i^{(1)}(D^n) + D g_i^{(2)}(D^n) + \cdots + D^{n-1} g_i^{(n)}(D^n),  1 \le i \le k,
is a composite generator polynomial relating the ith input sequence to v(D).
Example 10.8
For the previous (2, 1, 3) convolutional code, the composite generator polynomial is
g(D) = g^{(1)}(D^2) + D g^{(2)}(D^2) = 1 + D + D^3 + D^4 + D^5 + D^6 + D^7.
The state diagram can be modified to provide a complete description of the Hamming weights of all nonzero code words (i.e., a weight distribution function for the code). State S0 is split into an initial state and a final state, the self-loop around state S0 is deleted, and each branch is labeled with a branch gain X^i, where i is the weight of the n encoded bits on that branch. Each path connecting the initial state to the final state represents a nonzero code word that diverges from and remerges with state S0 exactly once. The path gain is the product of the branch gains along a path, and the weight of the associated code word is the power of X in the path gain.
Structural Properties of Convolutional Codes: Modified encoder state diagram of a (2, 1, 3) code.
The path representing the state sequence S0S1S3S7S6S5S2S4S0 has path gain X^2 \cdot X^1 \cdot X^1 \cdot X^1 \cdot X^2 \cdot X^1 \cdot X^2 \cdot X^2 = X^{12}.
Structural Properties of Convolutional Codes: Modified encoder state diagram of a (3, 2, 1) code.
The path representing the state sequence S0S1S3S2S0 has path gain X^2 \cdot X^1 \cdot X^0 \cdot X^1 = X^4.
The weight distribution function of a code can be determined by considering the modified state diagram as a signal flow graph and applying Mason's gain formula to compute its generating function
T(X) = \sum_i A_i X^i,
where A_i is the number of code words of weight i.
In a signal flow graph, a path connecting the initial state to the final state which does not go through any state twice is called a forward path. A closed path starting at any state and returning to that state without going through any other state twice is called a loop.
Let C_i be the gain of the ith loop. A set of loops is nontouching if no state belongs to more than one loop in the set. Let {i} be the set of all loops, {i', j'} the set of all pairs of nontouching loops, {i'', j'', l''} the set of all triples of nontouching loops, and so on. Define
\Delta = 1 - \sum_i C_i + \sum_{i',j'} C_{i'} C_{j'} - \sum_{i'',j'',l''} C_{i''} C_{j''} C_{l''} + \cdots,
where \sum_i C_i is the sum of the loop gains, \sum_{i',j'} C_{i'} C_{j'} is the sum of the products of the gains of all pairs of nontouching loops, and \sum_{i'',j'',l''} C_{i''} C_{j''} C_{l''} is the sum of the products of the gains of all triples of nontouching loops.
\Delta_i is defined exactly like \Delta, but only for that portion of the graph not touching the ith forward path; that is, all states along the ith forward path, together with all branches connected to these states, are removed from the graph when computing \Delta_i. Mason's formula for computing the generating function T(X) of a graph can now be stated as
T(X) = \frac{\sum_i F_i \Delta_i}{\Delta},
where the sum in the numerator is over all forward paths and F_i is the gain of the ith forward path.
Example (2,1,3) Code: The modified encoder state diagram contains 11 loops, with loop gains
C_1 = X^8, C_2 = X^3, C_3 = X^7, C_4 = X^2, C_5 = X^4, C_6 = X^3, C_7 = X^3, C_8 = X, C_9 = X^5, C_{10} = X^4, C_{11} = X.
There are ten pairs of nontouching loops:
Loop pair 1: (loop 2, loop 8), C_2 C_8 = X^4
Loop pair 2: (loop 3, loop 11), C_3 C_{11} = X^8
Loop pair 3: (loop 4, loop 8), C_4 C_8 = X^3
Loop pair 4: (loop 4, loop 11), C_4 C_{11} = X^3
Loop pair 5: (loop 6, loop 11), C_6 C_{11} = X^4
Loop pair 6: (loop 7, loop 9), C_7 C_9 = X^8
Loop pair 7: (loop 7, loop 10), C_7 C_{10} = X^7
Loop pair 8: (loop 7, loop 11), C_7 C_{11} = X^4
Loop pair 9: (loop 8, loop 11), C_8 C_{11} = X^2
Loop pair 10: (loop 10, loop 11), C_{10} C_{11} = X^5
Example (2,1,3) Code: (cont.) There are two triples of nontouching loops:
Loop triple 1: (loop 4, loop 8, loop 11), C_4 C_8 C_{11} = X^4
Loop triple 2: (loop 7, loop 10, loop 11), C_7 C_{10} C_{11} = X^8.
There are no other sets of nontouching loops. Therefore,
\Delta = 1 - (X^8 + X^3 + X^7 + X^2 + X^4 + X^3 + X^3 + X + X^5 + X^4 + X) + (X^4 + X^8 + X^3 + X^3 + X^4 + X^8 + X^7 + X^4 + X^2 + X^5) - (X^4 + X^8) = 1 - 2X - X^3.
There are seven forward paths in this state diagram:
Forward path 1: S0S1S3S7S6S5S2S4S0, F_1 = X^{12}
Forward path 2: S0S1S3S7S6S4S0, F_2 = X^7
Forward path 3: S0S1S3S6S5S2S4S0, F_3 = X^{11}
Forward path 4: S0S1S3S6S4S0, F_4 = X^6
Forward path 5: S0S1S2S5S3S7S6S4S0, F_5 = X^8
Forward path 6: S0S1S2S5S3S6S4S0, F_6 = X^7
Forward path 7: S0S1S2S4S0, F_7 = X^7.
Forward paths 1 and 5 touch all states in the graph, and hence the subgraph not touching these paths contains no states. Therefore,
\Delta_1 = \Delta_5 = 1.
The subgraph not touching forward paths 3 and 6 contains only the self-loop around S7, so
\Delta_3 = \Delta_6 = 1 - X.
Similarly, the subgraphs not touching forward paths 2, 4, and 7 give \Delta_2 = 1 - X, \Delta_4 = 1 - 2X + X^2, and \Delta_7 = 1 - X - X^4.
Example (2,1,3) Code: (cont.) The generating function for this graph is then given by
T(X) = \frac{X^{12} \cdot 1 + X^7(1 - X) + X^{11}(1 - X) + X^6(1 - 2X + X^2) + X^8 \cdot 1 + X^7(1 - X) + X^7(1 - X - X^4)}{1 - 2X - X^3}
= \frac{X^6 + X^7 - X^8}{1 - 2X - X^3}
= X^6 + 3X^7 + 5X^8 + 11X^9 + 25X^{10} + \cdots
T(X) provides a complete description of the weight distribution of all nonzero code words that diverge from and remerge with state S0 exactly once. In this case, there is one such code word of weight 6, three of weight 7, five of weight 8, and so on.
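The series expansion can be checked numerically (an added sketch, not from the original slides) by expanding the rational function through the linear recurrence implied by its denominator.

```python
# Expand T(X) = (X^6 + X^7 - X^8)/(1 - 2X - X^3) as a power series and read
# off the code-word weight counts A_i.

num = {6: 1, 7: 1, 8: -1}    # numerator coefficients
den = {0: 1, 1: -2, 3: -1}   # denominator coefficients (den[0] must be 1)

def series(num, den, n_terms):
    a = [0] * n_terms
    for i in range(n_terms):
        s = num.get(i, 0)
        for j, dj in den.items():
            if 1 <= j <= i:
                s -= dj * a[i - j]
        a[i] = s
    return a

print(series(num, den, 11))  # [0, 0, 0, 0, 0, 0, 1, 3, 5, 11, 25]
```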
Example (3,2,1) Code: (cont.) There are eight loops, six pairs of nontouching loops, and one triple of nontouching loops in the graph of the previous modified encoder state diagram of the (3, 2, 1) code, and
\Delta = 1 - \sum_i C_i + \sum_{i',j'} C_{i'} C_{j'} - \sum_{i'',j'',l''} C_{i''} C_{j''} C_{l''} = 1 - 2X - 2X^2 - X^3 + X^4 + X^5.
\sum_i F_i \Delta_i = X^5(1 - X - X^2 - X^3 + X^5) + X^4(1 - X^2) + X^6 \cdot 1 + X^5(1 - X^3) + X^4 \cdot 1 + X^3(1 - X - X^2) + X^6(1 - X^2) + X^6 \cdot 1 + X^5(1 - X) + X^8 \cdot 1 + X^4(1 - X - X^2 - X^3 + X^4) + X^7(1 - X^3) + X^6 \cdot 1 + X^3(1 - X) + X^6 \cdot 1 = 2X^3 + X^4 + X^5 + X^6 - X^7 - X^8.
Hence
T(X) = \frac{2X^3 + X^4 + X^5 + X^6 - X^7 - X^8}{1 - 2X - 2X^2 - X^3 + X^4 + X^5} = 2X^3 + 5X^4 + 15X^5 + \cdots
This code contains two nonzero code words of weight 3, five of weight 4, fifteen of weight 5, and so on.
Additional information about the structure of a code can be obtained using the same procedure. If the modified state diagram is augmented by labeling each branch corresponding to a nonzero information block with Y^j, where j is the weight of the k information bits on the branch, and labeling every branch with Z, the generating function is given by
T(X, Y, Z) = \sum_{i,j,l} A_{i,j,l} X^i Y^j Z^l.
The coefficient A_{i,j,l} denotes the number of code words with weight i, whose associated information sequence has weight j, and whose length is l branches.
Structural Properties of Convolutional Codes: The augmented state diagram for the (2, 1, 3) code.
For the graph of the augmented state diagram for the (2, 1, 3) code, we have:
\Delta = 1 - (X^8Y^4Z^7 + X^3Y^3Z^5 + X^7Y^3Z^6 + X^2Y^2Z^4 + X^4Y^4Z^7 + X^3Y^3Z^6 + X^3YZ^3 + XYZ^2 + X^5Y^3Z^4 + X^4Y^2Z^3 + XYZ)
+ (X^4Y^4Z^7 + X^8Y^4Z^7 + X^3Y^3Z^6 + X^3Y^3Z^5 + X^4Y^4Z^7 + X^8Y^4Z^7 + X^7Y^3Z^6 + X^4Y^2Z^4 + X^2Y^2Z^3 + X^5Y^3Z^4)
- (X^4Y^4Z^7 + X^8Y^4Z^7)
= 1 - XYZ(1 + Z) + X^2Y^2Z^3(1 - Z) - X^3YZ^3 - X^4Y^2Z^3(1 - Z).
\sum_i F_i \Delta_i = X^{12}Y^4Z^8 \cdot 1 + X^7Y^3Z^6(1 - XYZ^2) + X^{11}Y^3Z^7(1 - XYZ) + \cdots,
and dividing by \Delta gives
T(X, Y, Z) = X^6Y^2Z^5 + X^7YZ^4 + X^7Y^3Z^6 + X^7Y^3Z^7 + \cdots.
This implies that the code word of weight 6 has length 5 branches and an information sequence of weight 2; one code word of weight 7 has length 4 branches and information sequence weight 1, another has length 6 branches and information sequence weight 3, the third has length 7 branches and information sequence weight 3, and so on.
The state diagram can be used to obtain the distance properties of a convolutional code. Without loss of generality, we assume that the all-zero code sequence is the input to the encoder.
First, we label the branches of the state diagram as either D^0 = 1, D^1, D^2, or D^3, where the exponent of D denotes the Hamming distance between the sequence of output bits corresponding to each branch and the sequence of output bits corresponding to the all-zero branch. The self-loop at node a can be eliminated, since it contributes nothing to the distance properties of a code sequence relative to the all-zero code sequence. Furthermore, node a is split into two nodes, one of which represents the input and the other the output of the state diagram.
Using the modified state diagram, we can obtain four state equations:
X_c = D^3 X_a + D X_b
X_b = D X_c + D X_d
X_d = D^2 X_c + D^2 X_d
X_e = D^2 X_b.
The transfer function for the code is defined as T(D) = X_e / X_a. By solving the state equations, we obtain:
T(D) = \frac{D^6}{1 - 2D^2} = D^6 + 2D^8 + 4D^{10} + 8D^{12} + \cdots = \sum_{d=6}^{\infty} a_d D^d,
where a_d = 2^{(d-6)/2} for even d and a_d = 0 for odd d.
The transfer function can be used to provide more detailed information than just the distance of the various paths. Suppose we introduce a factor N into all branch transitions caused by the input bit 1. Furthermore, we introduce a factor of J into each branch of the state diagram so that the exponent of J will serve as a counting variable to indicate the number of branches in any given path from node a to node e.
The exponent of the factor J indicates the length of the path that merges with the all-zero path for the first time. The exponent of the factor N indicates the number of 1s in the information sequence for that path. The exponent of D indicates the distance of the sequence of encoded bits for that path from the all-zero sequence.
Reference: John G. Proakis, Digital Communications, Fourth Edition, pp. 477-482, McGraw-Hill, 2001.
An important subclass of convolutional codes is the class of systematic codes. In a systematic code, the first k output sequences are exact replicas of the k input sequences, i.e.,
v^{(i)} = u^{(i)}, i = 1, 2, ..., k,
and the generator sequences satisfy
g_i^{(j)} = (1 0 0 \cdots 0) if j = i, and g_i^{(j)} = (0 0 0 \cdots 0) if j \ne i, for i = 1, 2, ..., k and j = 1, 2, ..., k.
The generator matrix is then G = [I P_0  0 P_1  ...  0 P_m; (shifted copies)], where I is the k x k identity matrix, 0 is the k x k all-zero matrix, and P_l is the k x (n - k) matrix
P_l = \begin{bmatrix} g_{1,l}^{(k+1)} & g_{1,l}^{(k+2)} & \cdots & g_{1,l}^{(n)} \\ g_{2,l}^{(k+1)} & g_{2,l}^{(k+2)} & \cdots & g_{2,l}^{(n)} \\ \vdots & & & \vdots \\ g_{k,l}^{(k+1)} & g_{k,l}^{(k+2)} & \cdots & g_{k,l}^{(n)} \end{bmatrix}.
The transfer function matrix becomes
G(D) = \begin{bmatrix} 1 & 0 & \cdots & 0 & g_1^{(k+1)}(D) & \cdots & g_1^{(n)}(D) \\ 0 & 1 & \cdots & 0 & g_2^{(k+1)}(D) & \cdots & g_2^{(n)}(D) \\ \vdots & & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & g_k^{(k+1)}(D) & \cdots & g_k^{(n)}(D) \end{bmatrix}.
Since the first k output sequences equal the input sequences, they are called information sequences, and the last n - k output sequences are called parity sequences. Note that whereas in general kn generator sequences must be specified to define an (n, k, m) convolutional code, only k(n - k) sequences must be specified to define a systematic code. Systematic codes represent a subclass of the set of all possible codes.
Consider the (2, 1, 3) systematic code. The generator sequences are g^{(1)} = (1 0 0 0) and g^{(2)} = (1 1 0 1). The generator matrix is
G = \begin{bmatrix} 11 & 01 & 00 & 01 & & \\ & 11 & 01 & 00 & 01 & \\ & & \ddots & & & \ddots \end{bmatrix},
and the transfer function matrix is
G(D) = [1  1 + D + D^3].
One advantage of systematic codes is that encoding is somewhat simpler than for nonsystematic codes because less hardware is required. For an (n, k, m) systematic code with k > n - k, there exists a modified encoding circuit which normally requires fewer than K shift-register stages.
The (2, 1, 3) systematic code requires only one modulo-2 adder with three inputs.
Consider the (3, 2, 2) systematic code with transfer function matrix
G(D) = \begin{bmatrix} 1 & 0 & 1 + D + D^2 \\ 0 & 1 & 1 + D^2 \end{bmatrix}.
Example (3,2,2) Systematic Code: Since the information sequences are given by v^{(1)}(D) = u^{(1)}(D) and v^{(2)}(D) = u^{(2)}(D), the parity sequence is given by
v^{(3)}(D) = u^{(1)}(D) g_1^{(3)}(D) + u^{(2)}(D) g_2^{(3)}(D).
The (3, 2, 2) systematic encoder requires only two stages of encoder memory rather than four.
A complete discussion of the minimal encoder memory required to realize a convolutional code is given by Forney. In most cases the straightforward realization requiring K stages of shift register memory is most efficient. In the case of an (n, k, m) systematic code with k > n - k, a simpler realization usually exists. Another advantage of systematic codes is that no inverting circuit is needed for recovering the information sequence from the code word.
Nonsystematic codes, on the other hand, require an inverter to recover the information sequence; that is, an n x k matrix G^{-1}(D) must exist such that
G(D) G^{-1}(D) = I D^l
for some l \ge 0, where I is the k x k identity matrix. Since V(D) = U(D)G(D), we can obtain
V(D) G^{-1}(D) = U(D) G(D) G^{-1}(D) = U(D) D^l,
and the information sequence can be recovered with an l-time-unit delay from the code word by letting V(D) be the input to the n-input, k-output linear sequential circuit whose transfer function matrix is G^{-1}(D).
For an (n, 1, m) code, a transfer function matrix G(D) has a feedforward inverse G^{-1}(D) of delay l if and only if
GCD[g^{(1)}(D), g^{(2)}(D), ..., g^{(n)}(D)] = D^l
for some l \ge 0, where GCD denotes the greatest common divisor.
For an (n, k, m) code with k > 1, let \Delta_i(D), i = 1, 2, ..., \binom{n}{k}, be the determinants of the \binom{n}{k} distinct k x k submatrices of the transfer function matrix G(D). A feedforward inverse of delay l exists if and only if
GCD[\Delta_i(D) : i = 1, 2, ..., \binom{n}{k}] = D^l
for some l \ge 0.
For the (3, 2, 1) code, the 2 x 2 submatrices of G(D) yield determinants 1 + D + D^2, 1 + D^2, and 1. Since
GCD[1 + D + D^2, 1 + D^2, 1] = 1 = D^0,
there exists a feedforward inverse of delay 0. The required transfer function matrix is given by:
G^{-1}(D) = \begin{bmatrix} 0 & 0 \\ 1 & 1 + D \\ 1 & D \end{bmatrix}.
To understand what happens when a feedforward inverse does not exist, it is best to consider an example.
For the (2, 1, 2) code with g^{(1)}(D) = 1 + D and g^{(2)}(D) = 1 + D^2,
GCD[1 + D, 1 + D^2] = 1 + D,
and a feedforward inverse does not exist. If the information sequence is u(D) = 1/(1 + D) = 1 + D + D^2 + \cdots, the output sequences are v^{(1)}(D) = 1 and v^{(2)}(D) = 1 + D; that is, the code word contains only three nonzero bits even though the information sequence has infinite weight. If this code word is transmitted over a BSC, and the three nonzero bits are changed to zeros by the channel noise, the received sequence will be all zeros.
An MLD will then produce the all-zero code word as its estimate, since this is a valid code word and it agrees exactly with the received sequence. The estimated information sequence will be \hat{u}(D) = 0, implying an infinite number of decoding errors caused by a finite number (only three in this case) of channel errors. Clearly, this is a very undesirable circumstance, and the code is said to be subject to catastrophic error propagation and is called a catastrophic code.
GCD[g^{(1)}(D), ..., g^{(n)}(D)] = D^l and GCD[\Delta_i(D) : i = 1, 2, ..., \binom{n}{k}] = D^l are necessary and sufficient conditions for a code to be noncatastrophic.
Any code for which a feedforward inverse exists is noncatastrophic. Another advantage of systematic codes is that they are always noncatastrophic.
A code is catastrophic if and only if the state diagram contains a loop of zero weight other than the self-loop around the state S0. In the state diagram of a (2, 1, 2) catastrophic code, note that the self-loop around the state S3 has zero weight.
In choosing nonsystematic codes for use in a communication system, it is important to avoid the selection of catastrophic codes. Only a fraction 1/(2^n - 1) of (n, 1, m) nonsystematic codes are catastrophic. A similar result for (n, k, m) codes with k > 1 is still lacking.
Introduction
In 1967, Viterbi introduced a decoding algorithm for convolutional codes which has since become known as the Viterbi algorithm. Later, Omura showed that the Viterbi algorithm was equivalent to finding the shortest path through a weighted graph. Forney recognized that it was in fact a maximum likelihood decoding algorithm for convolutional codes; that is, the decoder output selected is always the code word that gives the largest value of the log-likelihood function.
In order to understand Viterbi's decoding algorithm, it is convenient to expand the state diagram of the encoder in time (i.e., to represent each time unit with a separate state diagram). The resulting structure is called a trellis diagram, and is shown in Figure (a) for the (3, 1, 2) code with
G(D) = [1 + D, 1 + D^2, 1 + D + D^2]
and an information sequence of length L = 5. The trellis diagram contains L + m + 1 time units or levels, and these are labeled from 0 to L + m in Figure (a). Assuming that the encoder always starts in state S0 and returns to state S0, the first m time units correspond to the encoder's departure from state S0, and the last m time units correspond to the encoder's return to state S0.
Not all states can be reached in the first m or the last m time units. However, in the center portion of the trellis, all states are possible, and each time unit contains a replica of the state diagram. There are two branches leaving and entering each state. The upper branch leaving each state at time unit i represents the input u_i = 1, while the lower branch represents u_i = 0. Each branch is labeled with the n corresponding outputs v_i, and each of the 2^L code words of length N = n(L + m) is represented by a unique path through the trellis. For example, the code word corresponding to the information sequence u = (1 1 1 0 1) is shown highlighted in Figure (a).
In the general case of an (n, k, m) code and an information sequence of length kL, there are 2^k branches leaving and entering each state, and 2^{kL} distinct paths through the trellis corresponding to the 2^{kL} code words. Now assume that an information sequence u = (u_0, ..., u_{L-1}) of length kL is encoded into a code word v = (v_0, v_1, ..., v_{L+m-1}) of length N = n(L + m), and that a sequence r = (r_0, r_1, ..., r_{L+m-1}) is received over a discrete memoryless channel (DMC). Alternatively, these sequences can be written as u = (u_0, ..., u_{kL-1}), v = (v_0, v_1, ..., v_{N-1}), r = (r_0, r_1, ..., r_{N-1}), where the subscripts now simply represent the ordering of the symbols in each sequence.
As a general rule of detection, the decoder must produce an estimate \hat{v} of the code word v based on the received sequence r. A maximum likelihood decoder (MLD) for a DMC chooses \hat{v} as the code word v that maximizes the log-likelihood function log P(r|v). Since for a DMC
P(r|v) = \prod_{i=0}^{L+m-1} P(r_i|v_i) = \prod_{i=0}^{N-1} P(r_i|v_i),
it follows that
log P(r|v) = \sum_{i=0}^{L+m-1} log P(r_i|v_i) = \sum_{i=0}^{N-1} log P(r_i|v_i),   (A)
where P(r_i|v_i) is a channel transition probability. This is a minimum error probability decoding rule when all code words are equally likely.
The log-likelihood function log P(r|v) is called the metric associated with the path v, and is denoted M(r|v). The terms log P(r_i|v_i) in the first sum of Equation (A) are called branch metrics, denoted M(r_i|v_i), whereas the terms log P(r_i|v_i) in the second sum are called bit metrics, denoted M(r_i|v_i). The path metric M(r|v) can be written as
M(r|v) = \sum_{i=0}^{L+m-1} M(r_i|v_i) = \sum_{i=0}^{N-1} M(r_i|v_i).
A decision made with the log-likelihood metric is called a soft decision. If the channel adds white Gaussian noise (AWGN), soft-decision decoding leads to finding the path with minimum Euclidean distance.
A partial path metric for the first j branches of a path can now be expressed as
M([r|v]_j) = \sum_{i=0}^{j-1} M(r_i|v_i).
The following algorithm, when applied to the received sequence r from a DMC, finds the path through the trellis with the largest metric (i.e., the maximum likelihood path). The algorithm processes r in an iterative manner. At each step, it compares the metrics of all paths entering each state, and stores the path with the largest metric, called the survivor, together with its metric.
There are 2^K survivors from time unit m through time unit L, one for each of the 2^K states (K is the total number of shift-register stages). After time unit L there are fewer survivors, since there are fewer states while the encoder is returning to the all-zero state. Finally, at time unit L + m, there is only one state, the all-zero state, and hence only one survivor, and the algorithm terminates.
Theorem: The final survivor \hat{v} in the Viterbi algorithm is the maximum likelihood path; that is,
M(r|\hat{v}) \ge M(r|v) for all v \ne \hat{v}.
Proof: Assume that the maximum likelihood path is eliminated by the algorithm at time unit j, as illustrated in the figure. This implies that the partial path metric of the survivor exceeds that of the maximum likelihood path at this point. Now, if the remaining portion of the maximum likelihood path is appended onto the survivor at time unit j, the total metric of this path will exceed the total metric of the maximum likelihood path. But this is a contradiction, since by definition no path has a larger metric than the maximum likelihood path. Hence the maximum likelihood path cannot be eliminated and must be the final survivor.
In the special case of a binary symmetric channel (BSC) with transition probability p < 1/2, the log-likelihood function becomes
log P(r|v) = d(r, v) log \frac{p}{1 - p} + N log(1 - p),
where d(r, v) is the Hamming distance between r and v.
Since log [p/(1 - p)] < 0 and N log(1 - p) is a constant for all v, an MLD for the BSC chooses \hat{v} as the code word v that minimizes the Hamming distance
d(r, v) = \sum_{i=0}^{L+m-1} d(r_i, v_i) = \sum_{i=0}^{N-1} d(r_i, v_i).
In applying the Viterbi algorithm to the BSC, d(r_i, v_i) becomes the branch metric, d(r_i, v_i) becomes the bit metric, and the algorithm must find the path through the trellis with the smallest metric (i.e., the path closest to r in Hamming distance). The details of the algorithm are exactly the same, except that the Hamming distance replaces the log-likelihood function as the metric and the survivor at each state is the path with the smallest metric. This kind of decoding is called hard-decision decoding.
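A minimal hard-decision Viterbi decoder for a rate 1/n feedforward code can be sketched as follows (an added illustration, not from the original slides). The register and generator conventions, and the use of a (2, 1, 2) code in the demo, are assumptions of this sketch.

```python
# Hard-decision Viterbi decoding with the Hamming distance as branch metric.

def viterbi_hard(received, gens, m):
    """received: list of n-tuples of hard bits, one per trellis level.
    gens: generator tap masks over the (m+1)-bit register (input bit first).
    Returns the survivor path (input bits, including the m tail bits)."""
    n_states = 1 << m
    INF = float('inf')
    metric = [0] + [INF] * (n_states - 1)        # encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for r in received:
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for bit in (0, 1):
                reg = (bit << m) | s             # current input + state bits
                out = [bin(reg & g).count('1') & 1 for g in gens]
                ns = reg >> 1                    # shift register one step
                d = metric[s] + sum(a != b for a, b in zip(out, r))
                if d < new_metric[ns]:           # add-compare-select
                    new_metric[ns], new_paths[ns] = d, paths[s] + [bit]
        metric, paths = new_metric, new_paths
    return paths[0]                              # trellis terminates in state 0

# (2, 1, 2) code with generators 111 and 101; one channel error is corrected:
r = [(1, 1), (0, 0), (0, 0), (0, 1), (1, 0), (0, 1), (1, 1)]
print(viterbi_hard(r, gens=(0b111, 0b101), m=2))  # -> [1, 0, 1, 1, 1, 0, 0]
```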
Example 11.2:
The application of the Viterbi algorithm to a BSC is shown in the following figure. Assume that a code word from the trellis diagram of the (3, 1, 2) code of Figure (a) is transmitted over a BSC and that the received sequence is
r = (1 1 0, 1 1 0, 1 1 0, 1 1 1, 0 1 0, 1 0 1, 1 0 1) .
The final survivor
\hat{v} = (1 1 1, 0 1 0, 1 1 0, 0 1 1, 1 1 1, 1 0 1, 0 1 1)
is shown as the highlighted path in the figure, and the decoded information sequence is \hat{u} = (1 1 0 0 1). The fact that the final survivor has a metric of 7 means that no other path through the trellis differs from r in fewer than seven positions. Note that at some states neither path is crossed out. This indicates a tie in the metric values of the two paths entering that state. If the final survivor goes through any of these states, there is more than one maximum likelihood path (i.e., more than one path whose distance from r is minimum).
From an implementation point of view, whenever a tie in metric values occurs, one path is arbitrarily selected as the survivor, because of the impracticality of storing a variable number of paths. This arbitrary resolution of ties has no effect on the decoding error probability.
In some practical applications, there is a need to employ high-rate convolutional codes, e.g., rates of (n-1)/n. The trellis for such high-rate codes has 2^{n-1} branches that enter each state. Consequently, there are 2^{n-1} metric computations per state that must be performed in implementing the Viterbi algorithm, and as many comparisons of the updated metrics in order to select the best path at each state. The computational complexity inherent in the implementation of the decoder of a high-rate convolutional code can be avoided by designing the high-rate code from a low-rate code in which some of the coded bits are deleted (punctured) from transmission. Puncturing a code reduces the free distance.
The puncturing process may be described as periodically deleting selected bits from the output of the encoder, thus creating a periodically time-varying trellis code. We begin with a rate 1/n parent code and define a puncturing period P, corresponding to P input information bits to the encoder. In one period, the encoder outputs nP coded bits. Associated with the nP encoded bits is an n x P puncturing matrix P of the form:
P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1P} \\ p_{21} & p_{22} & \cdots & p_{2P} \\ \vdots & & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nP} \end{bmatrix}.
Each column of P corresponds to the n possible output bits from the encoder for each input bit, and each element of P is either 0 or 1. When p_ij = 1, the corresponding output bit from the encoder is transmitted. When p_ij = 0, the corresponding output bit from the encoder is deleted. The code rate is determined by the period P and the number of bits deleted. If we delete N bits out of nP, the code rate is P/(nP - N).
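The rule can be sketched in a few lines (an added illustration); the particular matrix below is illustrative only, not an optimized puncturing pattern.

```python
# Puncturing a rate 1/3 output stream down to rate 3/4.
# n = 3 rows, P = 3 columns, N = 5 zeros: rate P/(nP - N) = 3/(9 - 5) = 3/4.

n, period = 3, 3
P = [[1, 0, 0],
     [1, 1, 0],
     [1, 0, 0]]

def puncture(coded_blocks):
    """coded_blocks: one n-bit encoder output block per input bit.
    Output bit j at time t is transmitted iff P[j][t % period] == 1."""
    sent = []
    for t, block in enumerate(coded_blocks):
        for j in range(n):
            if P[j][t % period]:
                sent.append(block[j])
    return sent

print(puncture([[1, 1, 1], [0, 0, 1], [1, 0, 0]]))  # 9 bits in -> [1, 1, 1, 0]
```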
Example: Constructing a rate 3/4 code by puncturing the output of the rate 1/3, K = 3 encoder shown in the following figure. There are many choices of P and N to achieve the desired rate. We may take the smallest value of P, namely, P = 3. Out of every nP = 9 output bits, we delete N = 5 bits, giving rate P/(nP - N) = 3/4.
Original Circuit.
Example: Trellis of a rate 3/4 convolutional code obtained by puncturing the output of the rate 1/3, K = 3 encoder.
Some puncturing matrices are better than others in that the resulting trellis paths have better Hamming distance properties. A computer search is usually employed to find good puncturing matrices. Generally, the high-rate punctured convolutional codes generated in this manner have a free distance that is either equal to or 1 bit less than that of the best convolutional code of the same high rate obtained directly without puncturing. In general, high-rate codes with good distance properties are obtained by puncturing rate 1/2 maximum free distance codes.
The decoding of punctured convolutional codes is performed in the same manner as the decoding of the low-rate 1/n parent code, using the trellis of the 1/n code. When one or more bits in a branch are punctured, the corresponding branch metric increment is computed based only on the non-punctured bits; thus, the punctured bits do not contribute to the branch metrics. Error events in a punctured code are generally longer than error events in the low-rate 1/n parent code. Consequently, the decoder must wait longer than five constraint lengths before making final decisions on the received bits.
In the transmission of compressed digital speech signals and in some other applications, there is a need to transmit some groups of information bits with more redundancy than others. In other words, the different groups of information bits require unequal error protection in the transmission of the information sequence, where the more important bits are transmitted with more redundancy. Instead of using separate codes to encode the different groups of bits, it is desirable to use a single code that has variable redundancy. This can be accomplished by puncturing the same low-rate 1/n convolutional code by different amounts, as described by Hagenauer (1988).
The puncturing matrices are selected to satisfy a rate-compatibility criterion, where the basic requirement is that lower-rate codes (higher redundancy) should transmit the same coded bits as all higher-rate codes, plus additional bits. The resulting codes obtained from a single rate 1/n convolutional code are called rate-compatible punctured convolutional (RCPC) codes. In applying RCPC codes to systems that require unequal error protection of the information sequence, we may format the groups of bits into a frame structure, and each frame is terminated after the last group of information bits by K - 1 zeros.
Example: Constructing an RCPC code from a rate 1/3, K = 4 maximum free distance convolutional code. Note that when a 1 appears in a puncturing matrix of a high-rate code, a 1 also appears in the same position for all lower-rate codes.
Practical Considerations
The minimum free distance d_free can be increased either by decreasing the code rate or by increasing the constraint length, or both. If hard-decision decoding is used, the performance (coding gain) is reduced by approximately 2 dB for the AWGN channel when compared with soft-decision decoding.
Practical Considerations
Two important issues in the implementation of Viterbi decoding are:
The effect of path memory truncation, which is a desirable feature that ensures a fixed decoding delay.
The degree of quantization of the input signal to the Viterbi decoder.
Path memory truncation to about five constraint lengths has been found to result in negligible performance loss.
Practical Considerations
Figure: bit error probability for rate 1/2 Viterbi decoding with eight-level quantized inputs to the decoder and 32-bit path memory.
Practical Considerations
Figure: performance of rate 1/2 codes with hard-decision Viterbi decoding and 32-bit path memory truncation.
Practical Considerations
Figure: performance of a rate 1/2, K = 5 code with eight-, four-, and two-level quantization at the input to the Viterbi decoder. Path truncation length = 32 bits.
Practical Considerations
Figure: performance of a rate 1/2, K = 5 code with 32-, 16-, and 8-bit path memory truncation and eight- and two-level quantization.