Math 422 Coding Theory & Cryptography: John C. Bowman University of Alberta
Math 422 Coding Theory & Cryptography: John C. Bowman University of Alberta
Math 422 Coding Theory & Cryptography: John C. Bowman University of Alberta
John C. Bowman
University of Alberta
Edmonton, Canada
Reproduction of these lecture notes in any form, in whole or in part, is permitted only for
nonprofit, educational use.
Contents
Preface 5
1 Introduction 6
1.A Error Detection and Correction . . . . . . . . . . . . . . . . . . . . . 7
1.B Balanced Block Designs . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.C The ISBN Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Linear Codes 23
2.A Encoding and Decoding . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.B Syndrome Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Hamming Codes 39
4 Golay Codes 44
5 Finite Fields 48
6 Cyclic Codes 57
7 BCH Codes 66
8 Cryptographic Codes 78
8.A Symmetric-Key Cryptography . . . . . . . . . . . . . . . . . . . . . . 78
8.B Public-Key Cryptography . . . . . . . . . . . . . . . . . . . . . . . . 81
8.B.1 RSA Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . 81
8.B.2 Rabin Public-Key Cryptosystem . . . . . . . . . . . . . . . . . 86
8.C Discrete Logarithm Schemes . . . . . . . . . . . . . . . . . . . . . . . 86
8.C.1 Diffie–Hellman Key Exchange . . . . . . . . . . . . . . . . . . 86
8.C.2 Okamoto Authentication Scheme . . . . . . . . . . . . . . . . 87
8.C.3 Digital Signature Standard . . . . . . . . . . . . . . . . . . . . 89
8.C.4 Silver–Pohlig–Hellman Discrete Logarithm Algorithm . . . . . 90
8.D Cryptographic Error-Correcting Codes . . . . . . . . . . . . . . . . . 90
Bibliography 91
3
4 CONTENTS
Index 92
Preface
These lecture notes are designed for a one-semester course on error-correcting codes
and cryptography at the University of Alberta. I would like to thank my col-
leagues, Professors Hans Brungs, Gerald Cliff, and Ted Lewis, for their written
notes and examples, on which these notes are partially based (in addition to the
references listed in the bibliography) and also Professor Jochen Kuttler, for pointing
out several errors in an earlier version of these notes. The figures in this text were
drawn with the high-level vector graphics language Asymptote (freely available at
http://asymptote.sourceforge.net).
5
Chapter 1
Introduction
In the modern era, digital information has become a valuable commodity. For exam-
ple, the news media, governments, corporations, and universities all exchange enor-
mous quantities of digitized information every day. However, the transmission lines
that we use for sending and receiving data and the magnetic media (and even semi-
conductor memory devices) that we use to store data are imperfect.
Since transmission line and storage devices are not 100% reliable device, it has
become necessary to develop ways of detecting when an error has occurred and,
ideally, correcting it. The theory of error-correcting codes originated with Claude
Shannon’s famous 1948 paper “A Mathematical Theory of Communication” and has
grown to connect to many areas of mathematics, including algebra and combinatorics.
The cleverness of the error-correcting schemes that have been developed since 1948 is
responsible for the great reliability that we now enjoy in our modern communications
networks, computer systems, and even compact disk players.
Suppose you want to send the message “Yes” (denoted by 1) or “No” (denoted
by 0) through a noisy communication channel. We assume that there is a uniform
probability p < 1 that any particular binary digit (often called a bit) could be altered,
independent of whether or not any other bits are transmitted correctly. This kind
of transmission line is called a binary symmetric channel. (In a q-ary symmetric
channel, the digits can take on any of q different values and the errors in each digit
occur independently and manifest themselves as the q − 1 other possible values with
equal probability.)
If a single bit is sent, a binary channel will be reliable only a fraction 1 − p of the
time. The simplest way of increasing the reliability of such transmissions is to send
the message twice. This relies on the fact that if p is small, the probability p2 of two
errors occurring is very small. The probability of no errors occurring is (1 − p)2 . The
probability of one error occurring is 2p(1 − p) since there are two possible ways this
could happen. While reception of the original message
√ is more likely than any other
particular result if p < 1/2, we need p < 1 − 1/ 2 ≈ 0.29 to be sure that the correct
message is received most of the time.
6
1.A. ERROR DETECTION AND CORRECTION 7
• The set of all words in the English language is a code over the 26-letter alphabet
{A, B, . . . , Z}.
One important aspect of all error-correcting schemes is that the extra information
that accomplishes this must itself be transmitted and is hence subject to the same
kinds of errors as is the data. So there is no way to guarantee accuracy; one simply
attempts to make the probability of accurate decoding as high as possible.
A good code is one in which the codewords have little resemblance to each other.
If the codewords are sufficiently different, we will soon see that it is possible not only
to detect errors but even to correct them, using nearest-neighbour decoding, where
one maps the received vector back to the closest nearby codeword.
• The set of all 10-digit telephone numbers in the United Kingdom is a 10-ary code of
length 10. It is possible to use a code of over 82 million 10-digit telephone numbers
(enough to meet the needs of the U.K.) such that if just one digit of any phone
number is misdialed, the correct connection can still be made. Unfortunately, little
thought was given to this, and as a result, frequently misdialed numbers do occur
in the U.K. (as well as in North America)!
Definition: We define the Hamming distance d(x, y) between two codewords x and y
of Fqn as the number of places in which they differ.
Remark: Notice that d(x, y) is a metric on Fqn since it is always non-negative and
satisfies
1. d(x, y) = 0 ⇐⇒ x = y,
2. d(x, y) = d(y, x) for all x, y ∈ Fqn ,
3. d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ Fqn .
The first two properties are immediate consequences of the definition, while the third
property, known as the triangle inequality, follows from the simple observation that
d(x, y) is the minimum number of digit changes required to change x to y, whereas if
we were to change x to y by first changing x to z and then changing z to y, we would
require d(x, z) + d(z, y) changes. Thus d(x, y) ≤ d(x, z) + d(z, y). This important
inequality is illustrated in Fig 1.1.
1.A. ERROR DETECTION AND CORRECTION 9
Definition: The weight w(x) of a q-ary codeword x is the number of nonzero digits
in x.
Remark: Let x and y be codewords in Znq . Then d(x, y) = w(x − y). Here, x − y is
computed mod q, digit by digit.
Definition: Let C be a code in Fqn . We define the minimum distance d(C) of the
code:
d(C) = min{d(x, y) : x, y ∈ C, x 6= y}.
Remark: In view of the previous discussion, a good code is one with a relatively
large minimum distance.
• For example, here is a (5, 4, 3) code, consisting of four codewords from Z52 , which
are at least a distance 3 from each other:
0 0 0 0 0
0 1 1 0 1
C3 = 1 0 1 1 0 .
1 1 0 1 1
The following theorem (cf. Fig. 1.2) shows how this works in general.
Theorem 1.1 (Error Detection and Correction): In a symmetric channel with error-
probability p > 0,
(i) a code C can detect up to t errors in every codeword ⇐⇒ d(C) ≥ t + 1;
(ii) a code C can correct up to t errors in any codeword ⇐⇒ d(C) ≥ 2t + 1.
Proof:
(i) “⇐” Suppose d(C) ≥ t + 1. Let a codeword x be transmitted such that t or
fewer errors are introduced, resulting in a new vector y ∈ Fqn . Then d(x, y) =
w(x−y) ≤ t < t+1 ≤ d(C), so the received vector cannot be another codeword.
Hence t errors can be detected.
“⇒” Suppose C can detect up to t errors. If d(C) < t + 1, then there is some
pair of codewords x and y with d(x, y) ≤ t. Since it is possible to send the
codeword x and receive another codeword y by the introduction of t errors,
we conclude that C cannot detect t errors, contradicting our premise. Hence
d(C) ≥ t + 1.
(ii) “⇐” Suppose d(C) ≥ 2t + 1. Let a codeword x be transmitted such that t
or fewer errors are introduced, resulting in a new vector y ∈ Fqn satisfying
d(x, y) ≤ t. If x0 is a codeword other than x, then d(x, x0 ) ≥ 2t + 1 and the
triangle inequality d(x, x0 ) ≤ d(x, y) + d(y, x0 ) implies that
d(y, x0 ) ≥ d(x, x0 ) − d(x, y) ≥ 2t + 1 − t = t + 1 > t ≥ d(y, x).
Hence the received vector y is closer to x than to any other codeword x0 , making
it possible to identify the original transmitted codeword x correctly.
“⇒” Suppose C can correct up to t errors. If d(C) < 2t + 1, there is some
pair of distinct codewords x and x0 with distance d(x, x0 ) ≤ 2t. If d(x, x0 ) ≤ t,
let y = x0 , so that 0 = d(y, x0 ) < d(y, x) ≤ t. Otherwise, if t < d(x, x0 ) ≤ 2t,
construct a vector y by changing t of the digits of x that are in disagreement
with x0 to their corresponding values in x0 , so that 0 < d(y, x0 ) ≤ d(y, x) = t.
In either case, it is possible to send the codeword x and receive the vector y
due to t or fewer transmission errors. But since d(y, x0 ) ≤ d(y, x), the received
vector y cannot not be unambiguously decoded as x using nearest-neighbour
decoding. This contradicts our premise. Hence d(C) ≥ 2t + 1.
1.A. ERROR DETECTION AND CORRECTION 11
Corollary 1.1.1: If a code C has minimum distance d, then C can be used either (i)
to detect up to d − 1 errors or (ii) to correct up to b d−1
2
c errors in any codeword.
Here bxc represents the greatest integer less than or equal to x.
A good (n, M, d) code has small n (for rapid message transmission), large M (to
maximize the amount of information transmitted), and large d (to be able to correct
many errors. A primary goal of coding theory is to find codes that optimize M for
fixed values of n and d.
Definition: Let Aq (n, d) be the largest value of M such that there exists a q-ary
(n, M, d) code.
To help us tabulate Aq (n, d), let us first consider the following special cases:
(i) Aq (n, 1) = q n ;
(ii) Aq (n, n) = q.
Proof:
(i) When the minimum distance d = 1, we require only that the codewords be
distinct. The largest code with this property is the whole of Fqn , which has
M = q n codewords.
(ii) When the minimum distance d = n, we require that any two distinct codewords
differ in all n positions. In particular, this means that the symbols appearing in
the first position must be distinct, so there can be no more than q codewords.
A q-ary repetition code of length n is an example of an (n, q, n) code, so the
bound Aq (n, n) = q can actually be realized.
Remark: There must be at least two codewords for d(C) even to be defined. This
means that Aq (n, d) is not defined if d > n, since d(x, y) = w(x−y) ≤ n for distinct
codewords x, y ∈ Fqn .
Lemma 1.1 (Reduction Lemma): If a q-ary (n, M, d) code exists, with d ≥ 2, there
also exists an (n − 1, M, d − 1) code.
Proof: Given an (n, M, d) code, let x and y be codewords such that d(x, y) = d
and choose any column where x and y differ. Delete this column from all codewords.
Since d ≥ 2, the codewords that result are distinct and form a (n − 1, M, d − 1) code.
12 CHAPTER 1. INTRODUCTION
Theorem 1.3 (Even Values of d): Suppose d is even. Then a binary (n, M, d) code
exists ⇐⇒ a binary (n − 1, M, d − 1) code exists.
Proof:
“⇒” This follows from Lemma 1.1.
“⇐” Suppose C is a binary (n − 1, M, d − 1) code. Let Ĉ be the code of
length n obtained by extending each codeword x of C by adding a parity
bit w(x) mod 2. This makes the weight w(x̂) of every codeword x̂ of Ĉ
even. Then d(x, y) = w(x) + w(y) − 2w(xy) must be even for every pair
of codewords x and y in Ĉ, so d(Ĉ) is even. Note that d − 1 = d(C) ≤
d(Ĉ) ≤ d. But d − 1 is odd, so in fact d(Ĉ) = d. Thus Ĉ is a (n, M, d)
code.
Corollary 1.3.1 (Maximum Code Size for Even d): If d is even, then A2 (n, d) =
A2 (n − 1, d − 1).
This result means that we only need to calculate A2 (n, d) for odd d. In fact, in
view of Theorem 1.1, there is little advantage in considering codes with even d if the
goal is error correction. In Table 1.1, we present values of A2 (n, d) for n ≤ 16 and for
odd values of d ≤ 7.
As an example, we now compute the value A2 (5, 3) entered in Table 1.1, after
establishing a useful simplification, beginning with the following definition.
Definition: Two q-ary codes are equivalent if one can be obtained from the other by
a combination of
Remark: Note that the distances between codewords are unchanged by each of these
operations. That is, equivalent codes have the same (n, M, d) parameters and can
correct the same number of errors. Furthermore, in a q-ary symmetric channel, the
error-correction performance of equivalent codes will be identical.
Lemma 1.2 (Zero Vector): Any code over an alphabet containing the symbol 0 is
equivalent to a code containing the zero vector 0.
Proof: Given a code of length n, choose any codeword x1 x2 . . . xn . For each i such
that xi 6= 0, apply the relabelling 0 ↔ xi to the symbols in the ith column.
• Armed with the above lemma and the concept of equivalence, it is now easy to
prove that A2 (5, 3) = 4. Let C be a (5, M, 3) code with M ≥ 4. Without loss
of generality, we may assume that C contains the zero vector (if necessary, by
replacing C with an equivalent code). Then there can be no codewords with just
one or two 1s since d = 3. Also, there can be at most one codeword with four or
more 1s; otherwise there would be two codewords with at least three 1s in common
positions and less than a distance 3 apart. Since M ≥ 4, there must be at least
two codewords containing exactly three 1s. By rearranging columns, if necessary,
we see that the code contains the codewords
0 0 0 0 0
1 1 1 0 0
0 0 1 1 1
There is no way to add any more codewords containing exactly three 1s and we
can also now rule out the possibility of five 1s. This means that there can be at
most four codewords, that is, A2 (5, 3) ≤ 4. Since we have previously shown that
A2 (5, 3) ≥ 4, we deduce that A2 (5, 3) = 4.
14 CHAPTER 1. INTRODUCTION
Remark: A fourth codeword, if present in the above code, must have exactly four 1s.
The only possible position for the 0 symbol is in the middle position, so the fourth
codeword must be 11011. We then see that the resulting code is equivalent to C3
and hence A2 (5, 3) is unique, up to equivalence.
The above trial-and-error approach becomes impractical for large codes. In some
of these cases, an important bound, known as the sphere-packing or Hamming bound,
can be used to establish that a code is as large as possible for given values of n and d.
t
X n
(q − 1)k
k=0
k
vectors.
Proof: The number of vectors that are a distance k from a fixed vector in Fqn is
n
(q − 1)k , because there are nk choices for the k positions that differ from those of
k
the fixed vector and there are q − 1 values that can be assigned independently to each
of these k positions. Summing over the possible values of k, we obtain the desired
result.
Proof: By the triangle inequality, any two spheres of radius t that are centered on
distinct codewords will have no vectors in common. The total number of vectors in
the M spheres of radius t centered on the M codewords is thus given by the left-hand
side of the above inequality; this number can be no more than the total number q n
of vectors in Fqn .
• For our binary (5, 4, 3) code, Eq. (1.1) gives the bound M (1 + 5) ≤ 25 = 32,
which implies that A2 (5, 3) ≤ 5. We have already seen that A2 (5, 3) = 4. This
emphasizes, that just because some set of numbers {n, M, t} satisfy Eq. (1.1), there
is no guarantee that such a code actually exists.
Definition: A perfect code is a code for which equality occurs in 1.1. For such a
code, the M spheres of radius t centered on the codewords fill the whole space Fqn
completely, without overlapping.
1.A. ERROR DETECTION AND CORRECTION 15
Remark: The codes that consist of a single codeword (taking t = n and M = 1),
codes that contain all vectors of Fqn (with t = 0 and M = q n ), and the binary
repetition code (with t = (n − 1)/2 and M = 2) of odd length n are trivially perfect
codes.
Each term in this sum is the number of vectors of weight t in Fqn . When we sum over
all possible values of t, we obtain q n , the total number of vectors in Fqn .
Alternatively, we see directly from the Binomial Theorem that
n
X n
(q − 1)t = (1 + (q − 1))n = q n .
t
t=0
Problem 1.2: Show that a q-ary (n, M, d) code must satisfy M ≤ q n−d+1 . Hint: what
can can you say about the vectors obtained by deleting the last d − 1 positions of
all codewords? It might help to first consider the special cases d = 1 and d = 2.
If you delete the last d − 1 positions of all codewords, the resulting vectors must be
distinct, or else the codewords could not be a distance d apart from each other. Since the
number of distinct q-ary vectors of length n − d + 1 is q n−d+1 , the number of codewords M
must be less or equal to this number.
Aq (n, d)
Aq (n − 1, d) ≥ .
q
Problem 1.4: Let C be a code with even distance d = 2m. Let t be the maximum
number of errors that C can be guaranteed to correct.
(b) Prove that C cannot be a perfect code. That is, there is no integer M such
that
t
X n
M (q − 1)k = q n .
k=0
k
For C to be perfect, each vector in Fqn would have to be contained in exactly one of the
M codeword spheres of radius t. However, we know that there are codewords x and y with
d(x, y) = 2m = 2t + 2. Consider the vector v obtained by changing t + 1 of those digits
where x disagrees with y to the corresponding digits in y. Then d(v, x) = d(v, y) = t + 1,
so v does not lie within the codeword spheres about x or y. If v were within a distance t
from another codeword z, the triangle inequality would imply that
contradicting the fact that the code has minimum distance 2t + 2. Thus v does not lie in
any codeword sphere. That is, C is not a perfect code.
• Let S = {1, 2, 3, 4, 5, 6, 7} and consider the subsets {1, 2, 4}, {2, 3, 5}, {3, 4, 6},
{4, 5, 7}, {5, 6, 1}, {6, 7, 2}, {7, 1, 3} of S. Each number lies in exactly 3 blocks, each
block contains 3 numbers, and each pair of numbers occurs together in exactly 1
block. The six lines and circle in Fig. 1.3 represent the blocks. Hence these subsets
form a (7, 7, 3, 3, 1) design.
Remark: The parameters (b, v, r, k, λ) are not independent. Consider the set of
ordered pairs
Since each of the v points lie in r blocks, there must be a total of vr ordered pairs
in T . Alternatively, we know that since there are b blocks and k points in each
block, we can form exactly bk such pairs. Thus bk = vr. Similarly, by considering
the set
we deduce
bk(k − 1) = λv(v − 1),
which, using bk = vr, simplifies to r(k − 1) = λ(v − 1).
Definition: A block design is symmetric if v = b (and hence k = r); that is, the
number of points and blocks is identical. For brevity, this is called a (v, k, λ)
design.
18 CHAPTER 1. INTRODUCTION
Definition: The incidence matrix of a block design is a v×b matrix with entries
1 if xi ∈ Bj ,
aij =
0 if xi ∈
/ Bj ,
• We now construct a (7, 16, 3) binary code C consisting of the zero vector 0, the
unit vector 1, the 7 rows of A, and the 7 rows of the matrix B obtained from A by
the relabelling 0 ↔ 1:
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
a1 1 0 0 0 1 0 1
a2 1 1 0 0 0 1 0
a3 0 1 1 0 0 0 1
a4 1 0 1 1 0 0 0
a5 0 1 0 1 1 0 0
a6 0 0 1 0 1 1 0
C=
a7 = 0
.
0 0 1 0 1 1
b1 0 1 1 1 0 1 0
b2 0 0 1 1 1 0 1
b3 1 0 0 1 1 1 0
b4 0 1 0 0 1 1 1
b5 1 0 1 0 0 1 1
b6 1 1 0 1 0 0 1
b7 1 1 1 0 1 0 0
To find the minimum distance of this code, note that each row of A has exactly
three 1s (since r = 3) and any two distinct rows of A have exactly one 1 in common
1.C. THE ISBN CODE 19
Thus C is a (7, 16, 3) code, which in fact is perfect, since the equality in Eq. (1.1) is
satisfied:
7 7
16 + = 16(1 + 7) = 128 = 27 .
0 1
The existence of a perfect binary (7, 16, 3) code establishes A2 (7, 3) = 16, so we
have now established another entry of Table 1.1.
If x10 turns out to be 10, an X is printed in place of the final digit. The tenth digit
serves to make the weighted check sum
10
X 9
X 9
X 9
X
kxk = kxk + 10 kxk = 11 kxk = 0 (mod 11).
k=1 k=1 k=1 k=1
So, if 10
P
k=1 kxk 6= 0 (mod 11), we know that an error has occurred. In fact, the ISBN
number is able to (ii) detect a single error or (ii) detect a transposition error that
results in two digits (not necessarily adjacent) being interchanged.
20 CHAPTER 1. INTRODUCTION
P10If a single error occurs, then some digit xj is received as xj + e with e 6= 0. Then
k=1 kxk + je = je (mod 11) 6= 0 (mod 11) since j and e are nonzero.
Let y be the vector obtained by exchanging the digits xj and xk in an ISBN
code x, where j 6= k. Then
10
X
ixi + (k − j)xj + (j − k)xk = (k − j)xj + (j − k)xk (mod 11)
i=1
= (k − j)(xj − xk ) (mod 11) 6= 0 (mod 11)
if xj 6= xk .
In the above arguments we have used the property of the field Z11 (the integers
modulo 11) that the product of two nonzero elements is always nonzero (since ab = 0
and a 6= 0 ⇒ a−1 ab = 0 ⇒ b = 0). Consequently, Zab with a, b > 1 cannot be a field
because the product ab = 0 (mod ab), even though a 6= 0 and b 6= 0. Note also that
there can be no inverse a−1 in Zab , for otherwise b = a−1 ab = a−1 0 = 0 (mod ab).
In fact, Zp is a field ⇐⇒ p is prime (cf. Theorem 5). For this reason, the ISBN
code is calculated in Z11 and not in Z10 , where 2 · 5 = 0 (mod n).
The ISBN code cannot be used to correct errors unless we know a priori which digit
is in error. To do this, we first need to construct a table of inverses modulo 11 using
the Euclidean division algorithm. For example, let y be the inverse of 2 modulo 11.
Then 2y = 1 (mod 11) implies 2y = 11q+1 or 1 = −11q+2y for some integers y and q.
On dividing 11 by 2 as we would to show that gcd(11, 2) = 1, we find 11 = 5 · 2 + 1 so
that 1 = 11 − 5 · 2, from which we see that q = −1 and y = −5 (mod 11) = 6 (mod 11)
are solutions. Similarly, 7−1 = 8 (mod 11) since 11 = 1 · 7 + 4 and 7 = 1 · 4 + 3 and
4 = 1·3+1, so 1 = 4−1·3 = 4−1·(7−1·4) = 2·4−1·7 = 2·(11−1·7)−1·7 = 2·11−3·7.
Thus −3 · 7 = −2 · 11 + 1; that is, 7 and −3 = 8 are inverses mod 11. The complete
table of inverses modulo 11 is shown in Table 1.2.
x 1 2 3 4 5 6 7 8 9 10
x−1 1 6 4 3 9 2 8 7 5 10
Suppose that we detect an error and we also know that it is the digit xj that is
in error (and hence unknown). Then we can use our table of inverses to solve for the
value of xj , assuming all of the other digits are correct. Since
10
X
jxj + kxk = 0 (mod 11),
k=1
k6=j
we know that
10
X
xj = −j −1 kxk (mod 11).
k=1
k6=j
1.C. THE ISBN CODE 21
For example, if we did not know the fourth digit x of the ISBN 0-19-x53803-0, we
would calculate
Problem 1.5: A smudge has obscured one of the digits of the ISBN code 0-8018-
6183057294739-1.
Determine the unknown digit.
The sixth digit is
Problem 1.6: A smudge has obscured one of the digits of the ISBN code 0-393-
051061830572940-X.
Determine the unknown digit.
The eighth digit is
Linear Codes
An important class of codes are linear codes in the vector space Fqn , where Fq is a
field.
Definition: A linear code C is a code for which, whenever u ∈ C and v ∈ C, then
αu + βv ∈ C for all α, β ∈ Fq . That is, C is a linear subspace of Fqn .
Remark: A binary code C is linear ⇐⇒ it contains 0 and the sum of any two
codewords in C is also in C.
Problem 2.1: Show that the (7, 16, 3) code developed in the previous chapter is
linear.
Remark: A linear code C will always be a k-dimensional linear subspace of Fqn for
some integer k between 1 and n. A k-dimensional code C is simply the set of all
linear combinations of k linearly independent codewords, called basis vectors. We
say that these k basis codewords generate or span the entire code space C.
Remark: Note that a q-ary [n, k, d] code is an (n, q k , d) code. To see this, let the k
basis vectors of an [n, k, d] code be uj , for j = 1, . . . , k. The q k codewords are
obtained as the linear combinations kj=1 aj uj ; there are q possible values for each
P
of the k coefficients aj . Note that
k
X k
X k
X
aj uj = bj u j ⇒ (aj − bj )uj = 0 ⇒ aj = bj , j = 1, . . . k,
j=1 j=1 j=1
by the linear independence of the basis vectors, so the q k generated codewords are
distinct.
22
23
Remark: Not every (n, q k , d) code is a q-ary [n, k, d] code (it might not be linear).
Lemma 2.1 (Distance of a Linear Code): If C is a linear code in Fqn , then d(C) =
w(C).
so w(C) = d(C).
Remark: Lemma 2.1 implies, for a linear code, that we only have to examine the
weights of the M − 1 nonzero codewords in order to find the minimum distance.
M
In contrast, for a general nonlinear code, we need to make 2 = M (M − 1)/2
comparisons (between all possible pairs of distinct codewords) to determine the
minimum distance.
Definition: A k × n matrix with rows that are basis vectors for a linear [n, k] code C
is called a generator matrix of C.
Problem 2.2: Show that the (7, 16, 3) perfect binary code in Chapter 1 is a [7, 4, 3]
linear code (note that 24 = 16) with generator matrix
1 1 1 1 1 1 1 1
a1 1 0 0 0 1 0 1
=
a2 1 1 0 0 0 1 0
a3 0 1 1 0 0 0 1
24 CHAPTER 2. LINEAR CODES
Remark: Linear q-ary codes are not defined unless q is a power of a prime (this is sim-
ply the requirement for the existence of the field Fq ). However, lower-dimensional
codes can always be obtained from linear q-ary codes by projection onto a lower-
dimensional subspace of Fqn . For example, the ISBN code is a subset of the 9-
10
dimensional subspace of F11 consisting of all vectors perpendicular to the vector
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); this is the space
( 10
)
X
(x1 x2 . . . x10 ) : kxk = 0 (mod 11) .
k=1
However, not all vectors in this set (for example X-00-000000-1) are in the ISBN
code. That is, the ISBN code is not a linear code.
For linear codes we must slightly restrict our definition of equivalence so that the
codes remain linear (e.g., in order that the zero vector remains in the code).
Definition: Two linear q-ary codes are equivalent if one can be obtained from the
other by a combination of
Remark: A generator matrix for a vector space can always be reduced to an equiv-
alent reduced echelon form spanning the same vector space, by permutation of its
rows and columns, multiplication of a row by a non-zero scalar, or addition of one
row to another. Note that any combination of these operations, including operation
(B) above, will generate equivalent linear codes.
Problem 2.3: Show that the generator matrix for the (7, 16, 3) perfect code in Chap-
ter 1 can be written in reduced echelon form as
1 0 0 0 1 0 1
0 1 0 0 1 1 1
G= 0 0 1 0 1 1 0 .
0 0 0 1 0 1 1
2.A. ENCODING AND DECODING 25
u = [ u1 u2 . . . uk ],
0 0 0 1 0 1 1
Problem 2.4: If a generator for a linear [n, k] code is in standard form, show that
the message vector is just the first k bits of the codeword.
Definition: Let C be a linear code over Fqn . Let a be any vector in Fqn . The set
a + C = {a + x : x ∈ C} is called a coset of C.
Lemma 2.2 (Equivalent Cosets): Let C be a linear code in Fqn and a ∈ Fqn . If b is
an element of the coset a + C, then
b + C = a + C.
b + y = (a + x) + y = a + (x + y) ∈ a + C,
Definition: The standard array (or Slepian) of a linear [n, k] code C in Fqn is a
q n−k × q k array listing all the cosets of C. The first row consists of the codewords
in C themselves, listed with 0 appearing in the first column. Subsequent rows are
listed one at a time, beginning with a vector of minimal weight that has not already
been listed in previous rows, such that the entry in the (i, j)th position is the sum
of the entries in position (i, 1) and position (1, j). The vectors in the first column
of the array are referred to as coset leaders.
0 0 0 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 1 0
0 0 0 1 0 0 1 1 1 1 1 0 1 0 0 1 1 0 0 1
0 0 1 0 0 0 1 0 0 1 1 0 0 1 0 1 1 1 1 1
0 1 0 0 0 0 0 1 0 1 1 1 1 1 0 1 0 0 1 1
1 0 0 0 0 1 1 1 0 1 0 0 1 1 0 0 1 0 1 1
0 0 0 1 1 0 1 1 1 0 1 0 1 0 1 1 1 0 0 0
0 1 0 1 0 0 0 1 1 1 1 1 1 0 0 1 0 0 0 1
2.A. ENCODING AND DECODING 27
Remark: The last two rows of the standard array for C3 could equally well have
been written as
1 1 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 0 1 1
1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 1 0 1 0
Definition: If the codeword x is sent, but the received vector is y, we define the
.
error vector e = y − x.
Remark: If no more than t errors have occurred, the coset leaders of weight t or
less are precisely the error vectors that can be corrected. Recall that the code C3 ,
having minimum distance 3, can only correct one error. For the code C3 , as long as
no more than one error has occurred, the error vector will have weight at most one.
We can then decode the received vector by checking to see under which codeword
it appears in the standard array, remembering that the codewords themselves are
listed in the first row. For example, if y = 10111 is received, we know that the
error vector e = 00001, and the transmitted codeword must have been x = y − e =
10111 − 00001 = 10110.
Remark: If two errors have occurred, one cannot determine the original vector with
certainty, because in each row with coset leader weight 2, there are actually two
vectors of weight 2. For a code with minimum distance 2t + 1, the rows in the
standard array of coset leader weight greater than t can be written in more than one
way, as we have seen above. Thus, if 01110 is received, then either 01110 − 00011 =
01101 or 01110 − 11000 = 10110 could have been transmitted.
Remark: Let C be a binary [n, k] linear code and αi denote the number of coset
leaders for C having weight i, where i = 0, . . . , n. If p is the error probability for a
single bit, then the probability Pcorr (C) that a received vector is correctly decoded
is n
X
Pcorr (C) = αi pi (1 − p)n−i .
i=0
Remark: If C can correct t errors then the coset leaders of weight no more than
t
are unique and hence the total number of such leaders of weight i is αi = ni for
0 ≤ i ≤ t. In particular, if n = t, then
n
X n i
Pcorr (C) = p (1 − p)n−i = (p + 1 − p)n = 1.
i=0
i
Such a code is able to correct all possible errors (no matter how poor the transmis-
sion line is); however, since C only contains a single codeword, it cannot be used
to send any information!
28 CHAPTER 2. LINEAR CODES
Remark: For i > t, the coefficients αi can be difficult to calculate. For a perfect
code, however, we know that every vector is within a distance t of some codeword.
Thus, the error vectors that can be corrected by a perfect code are precisely those
vectors of weight no more than t; consequently,
n for 0 ≤ i ≤ t,
αi = i
0 for i > t.
[(1 − p)5 + 5p(1 − p)4 + 10p2 (1 − p)3 ]2 = [(1 − p)3 (1 + 3p + 6p2 )]2 = 0.99998
so that Perr = 0.00002. While this error rate is almost four times lower than that
for C3 , bear in mind that the repetition scheme requires the transmission of twice
as much data for the same number of information digits (i.e. it has half the rate
of C3 ).
Definition: Let C be a [n, k] linear code. The dual code C ⊥ of C in Fqn is the set of
all vectors that are orthogonal to every codeword of C:
Problem 2.5: Show that dual code C ⊥ to a linear code C is itself linear.
2.B. SYNDROME DECODING 29
That is,
v ∈ C ⊥ ⇐⇒ Gv t = 0
(where the superscript t denotes transposition). This just says that v is orthogonal
to each of the rows of G. From linear algebra, we know that the space spanned by
the k independent rows of G is a k dimensional subspace and the null space of G,
which is just C ⊥ , is an n − k dimensional subspace.
.
Definition: The redundancy r = n − k of a code represents the number of parity
check digits in the code.
that is, C is the space of all vectors that are orthogonal to every vector in C ⊥ . In
other words, Hut = 0 ⇐⇒ u ∈ C.
Proof: Let C be a linear code and u be a codeword such that w(u) = d(C) = d.
Since
u ∈ C ⇐⇒ Hut = 0
and u has d nonzero components, we see that some d columns of H are linearly
dependent. However, any d − 1 columns of H must be linearly independent, or else
there would exist a nonzero codeword in C with weight d − 1.
• For a code with weight 3, Theorem 2.2 tells us that any two columns of its parity-
check matrix must be linearly independent, but that some 3 columns are linearly
dependent.
Definition: Given a linear code with parity-check matrix H, the column vector Hut
is called the syndrome of u.
Lemma 2.3: Two vectors u and v are in the same coset of a linear code C ⇐⇒
they have the same syndrome.
Proof:
u − v ∈ C ⇐⇒ H(u − v)t = 0 ⇐⇒ Hut = Hv t .
To compute the syndrome for a code, we need only first determine the parity check
matrix. The following lemma describes an easy way to construct the standard form
of the parity-check matrix from the standard-form generator matrix.
[ −At | 1n−k ].
Proof: This follows from the fact that the rows of G are orthogonal to every row
of H, in other words, that
t −A
GH = [ 1k A ] = 1k (−A) + (A)1n−k = −A + A = 0,
1n−k
Problem 2.6: Show that the null space of a matrix is invariant to standard row
reduction operations (permutation of rows, multiplication of a row by a non-zero
scalar, and addition of one row to another) and that these operations may be used
to put a matrix H of full rank into standard form.
Remark: The syndrome Het of a binary error vector e is just the sum of those
columns of H for which the corresponding entry in e is nonzero.
The following theorem makes it particularly easy to correct errors of unit weight.
It will play a particularly important role for the Hamming codes discussed in the next
chapter.
Theorem 2.3: The syndrome of a vector that has a single error of m in the ith
position is m times the ith column of H.
Proof: Let ei be the vector with the value m in the ith position and zero in all
other positions. If the codeword x is sent and the vector y = x + ei is received the
syndrome Hy t = Hxt + Heti = 0 + Heti = Heti is just m times the ith column of H.
Remark: If the syndrome does not match any of the columns of H, we know that
more than one error has occurred. We can still determine which coset the syndrome
belongs to by comparing the computed syndrome with a table of syndromes of all
coset leaders. If the corresponding coset leader has minimal weight within its coset,
we are able to correct the error. To decode errors of weight greater than one we
will need to construct a syndrome table, but this table, having only q n−k entries,
is smaller than the standard array, which has q n entries.
Problem 2.7: Using the binary linear code with parity check matrix
0 0 1 1
H= 1 0 1
0 ,
0 1 1 0
The syndrome [0, 0, 1]t corresponds to the second column of H. So the transmitted
vector was 1011 − 0100 = 1111.
32 CHAPTER 2. LINEAR CODES
Problem 2.9:
(b) Prove that exactly 2n−1 vectors in F2n have even weight.
By specifying that a vector in F2n has even weight, we are effectively imposing a parity
check equation on it; the last bit is constrained to be the sum of the previous n − 1 bits. So
one can construct exactly 2q−1 vectors of even weight.
(c) If C ⊥ is the binary repetition code of length n, prove that C is a binary code
consisting of all even weight vectors. Hint: find a generator matrix for C ⊥ .
The generator matrix for C ⊥ is the 1 × n matrix
G⊥ = [ 1 1 1 ... 1 1 1 ].
This must be a parity checkPmatrix for C, so we know that C consists of all vectors
x1 x2 . . . xn for which w(x) = ni=1 xi = 0 (mod 2). That is, C consists of all even weight
vectors.
Alternatively, we can explicitly find a parity check matrix for C ⊥ ; namely, the n − 1 × n
matrix
1 1 0 0 ··· 0 0
1 0 1 0 ··· 0 0
⊥
1 0 0 1 ··· 0 0
H = ..
.. .. .
. . .
1 0 0 ··· 0 1 0
1 0 0 0 ··· 0 1
We see that H ⊥ is a generator matrix for C and that C consists only of even weight vectors.
Furthermore, the vector space C has dimension n − 1, so we know that it contains all 2n−1
even weight vectors.
Problem 2.10: Let C be the code consisting of all vectors in Fqn with checksum
0 mod q. Let C 0 be the q-ary repetition code of length n.
(a) Find a generator G and parity-check matrix H for C. What are the sizes of
these matrices?
A generator for C is the (n − 1) × n matrix
1 0 0 ··· 0 −1
0 1 0 ··· 0 −1
··· −1
0 0 1 0
G= ... .. .. .
. .
0 0 ··· 1 0 −1
0 0 0 ··· 1 −1
A parity-check matrix for C is the 1 × n matrix
H = [1 1 ... 1 ].
(c) Which of the following statements are correct? Circle all correct statements.
0 ⊥
• C ⊂ C
,
• C 0 = C ⊥
,
0 ⊥
• C ⊃ C
,
• C 0 ∩ C ⊥ = ∅,
A odd-length binary (n, 2, n) code can correct correct t = (n − 1)/2 errors. Odd-length
binary codes are (trivially) perfect: they satisfy the Hamming equality
n−1
2
X n
2 = 2n ,
k
k=0
from which the desired result follows. (Equivalently, this is a consequence of the left-right
symmetry of Pascal’s triangle.)
(g) By examining the inner (dot) products of the rows of G with each other,
determine which of the following statements are correct (circle all correct statements
and explain):
• C ⊂ C ⊥
,
• C = C ⊥,
• C ⊃ C ⊥,
• C ∩ C ⊥ = ∅,
36 CHAPTER 2. LINEAR CODES
The only correct statement is C ⊂ C ⊥ , since the rows of G are orthogonal to each other.
Note that C cannot be self-dual because it has dimension k = 3 and C ⊥ has dimension
n − k = 7 − 3 = 4.
(h) Suppose the vector 1100011 is received. Can this vector be decoded, assuming
that no more than one error has occurred? If so, what was the transmitted codeword?
Yes, in fact this is the first row of G, so it must be a codeword. So no errors have
occurred; the transmitted codeword was 1100011. As a check, one can verify that the
syndrome is [0000]t .
(i) Suppose the vector 1010100 is received. Can this vector be decoded, assuming
that no more than one error has occurred? If so, what was the transmitted codeword?
The syndrome is [1101]t , which is the second column of H. So the transmitted vector
was 1110100.
(j) Suppose the vector 1111111 is received. Show that at least 3 errors have
occurred. Can this vector be unambiguously decoded by C? If so what was the
transmitted codeword? If not, and if only 3 errors have occurred, what are the
possible codewords that could have been transmitted?
Since the syndrome [1011]t is neither a column of H nor the sum of two columns of H,
it does not correspond to an error vector of weight 1 or 2. Thus, at least 3 errors have
occurred. We cannot unambiguously decode this vector because C can only correct 1 error.
In fact, since the rows of G have weight 4, by part (g) and Problem 4.3, we know that
all nonzero codewords in C have weight 4. So any nonzero codeword could have been
transmitted, with 3 errors, to receive 1111111.
(b) Use G to encode the information messages 100, 010, 001, 200, 201, and 221.
The information word x is encoded as xG. So the information messages can be encoded
as follows:
1 0 0 1 0 0 1 2 0
0 1 0 0 1 0 0 1 1
0 0 1 1 0 0 1 2 0
0 0 1 2 0 1
2 0 0 0 1 0 0 1 1 =
.
2 0 0 2 1 0
2 0 1 0 0 1 2 0 1
2 0 1 1 1 1
2 2 1 2 2 1 1 0 0
2.B. SYNDROME DECODING 37
That is, the encoded words are the rows of the resulting matrix, namely 100120, 010011,
001201, 200210, 201111, and 221100.
(c) What is the minimum distance of this code?
Since
1 0 2 2 0 0
2H = 2
1 0 0 2 0 ,
0 1 1 0 0 2
we see that the columns of H and 2H are distinct, so no two columns of H are linearly
dependent. But the first column of H is the fifth column plus twice the fourth column, so
by Theorem 2.2 we know that d = 3.
(d) Decode the received word 122112, if possible. If you can decode it, determine
the corresponding message vector.
The syndrome is
1
2
2 0 1 1 0 0
2
2
1 2 0 0 1 0
= 0 .
1
0 2 2 0 0 1
1
1
2
The syndrome 201 is twice the third column of H, so the corrected word is 122112 −
2(00100) = 120112. Since G is in standard form, the corresponding message word, 120, is
just the first three bits of the codeword.
(e) Decode the received word 102201, if possible. If you can decode it, determine
the corresponding message vector.
The syndrome is
1
0
2 0 1 1 0 0
2
0
1 2 0 0 1 0
= 1 .
2
0 2 2 0 0 1
0
2
1
The syndrome 012 is not a multiple of any column of H, so either an incorrect codeword
was transmitted or more than one error in transmission has occurred. But you can only
correct one error with this code, so you have to ask for a retransmission.
Chapter 3
Hamming Codes
One way to construct perfect binary [n, k] codes that can correct single errors is to
ensure that every nonzero vector in F2n−k appears as a unique column of H. In this
manner, the syndrome of every possible vector in F2n can be identified with a column
of H, so that every vector in F2n is at most a distance one away from a codeword.
This is called a binary Hamming code, which we now discuss in the general space Fqn ,
where Fq is a field.
Remark: One can form q − 1 distinct scalar multiples from any nonzero vector u
in Fqr : if λ, γ ∈ Fq , then
λu = γu ⇒ (λ − γ)u = 0 ⇒ λ = γ or u = (λ − γ)−1 0 = 0.
Remark: Not only are the columns of H distinct, all nonzero multiples of any two
columns are also distinct. That is, any two columns of H are linearly independent.
The total number of nonzero column multiples that can thus be formed is n(q−1) =
q r − 1. Including the zero vector, we see that H yields a total of q r distinct
syndromes, corresponding to all possible error vectors of unit weight in Fqr .
• The columns of a parity-check matrix for the binary Hamming code Ham(r, 2)
consist of all possible nonzero binary codewords of length r.
38
39
Problem 3.1: Given a parity-check matrix H for a binary Hamming code, show that
the standard form for H (obtained by row reduction) is just a rearrangement of
the columns of H.
The generator matrix is then seen to be [ 1 1 1 ]. That is, Ham(2, 2) is just the
binary triple-repetition code.
Problem 3.2: Show that this code is equivalent to the (7, 16, 3) perfect code in
Chapter 1.
• A parity check matrix for Ham(3, 2) can be constructed by considering all possible
nonempty subsets of {a, b, c}, each of which corresponds to one of the bits of a
codeword x = x1 x2 . . . x7 in F27 :
a a a a
b b b b
c c c c
x1 x2 x3 x4 x5 x6 x7
Given any four binary information digits x1 , x2 , x3 , and x4 , there will be a unique
codeword satisfying Hx = 0; the parity-check digits x5 , x6 , and x7 can be deter-
mined from the three checksum equations corresponding to each of the elements a,
b, and c:
a : x2 + x3 + x4 + x5 = 0 (mod 2),
b: x1 + x3 + x4 + x6 = 0 (mod 2),
and
c: x1 + x2 + x4 + x7 = 0 (mod 2).
For example, the vector x = 1100110 corresponds to the collection
Since there are an even number of as, bs, and cs in this collection, we know that x
is a codeword.
Problem 3.3: Show that two distinct codewords x and y that satisfy the above three
parity check equations must differ in at least 3 places.
• We can write a parity-check matrix for a Ham(3, 2) code in in the binary ascending
form
0 0 0 1 1 1 1
H = 0 1 1 0 0 1 1 .
1 0 1 0 1 0 1
If the vector 1110110 is received, the syndrome is [0, 1, 1]t , which corresponds to
the binary number 3, so we know immediately that the a single error must have
occurred in the third position, without even looking at H. Thus, the transmitted
codeword was 1100110.
41
Remark: For nonbinary Hamming codes, we need to compare the computed syn-
drome with all nonzero multiples of the columns of the parity-check matrix.
• Thus A2 (3, 3) = 2, A2 (7, 3) = 16, A2 (15, 3) = 211 = 2048, and A2 (31, 3) = 226 .
Problem 3.4: Determine the number αi of coset leaders of weight i for Ham(r, 2),
for each i = 0, . . . , n.
We know that the Hamming code is perfect and has minimum distance 3. The error
vectors that can be corrected by a Hamming code are precisely those vectors of weight one
or less. These vectors fill Fqn completely, where n = 2r − 1. Consequently, the coset weights
are distributed according to
n for 0 ≤ i ≤ 1,
αi = i
0 for i > 1.
That is, α0 = 1, α1 = n = 2r − 1, α2 = α3 = . . . = αn = 0. Note that the total number of
cosets is α0 + α1 = 2r = 2n−k and each of them contain 2k vectors, where k = n − r.
Problem 3.5: For all r ∈ N, describe how to construct from Ham(r, 2) a code of
r
length n = 2r with minimum distance d = 4 that contains M = 22 −1−r codewords.
Prove that the minimum distance of your code is 4 and that M is the maximum
number of possible codewords for these parameters.
r
Extend the Hamming code Ham(r, 2), with length n − 1 = 2r − 1, M = 2n−r = 22 −1−r ,
and distance 3 by adding a parity check to produce a code with n = 2r but still M codewords.
Since the parity check guarantees that the weight of all extended codewords is even, we know
that the distance between any two of these codewords x and y is w(x − y) = w(x) + w(y) −
2w(xy), which is even. Hence the minimum distance of the extended code, which is at least 3
and certainly no more than 4, must in fact be 4. We also know that the Hamming code is
perfect. The extended Hamming code is not perfect, but we know by Corollary 1.3.1 that
the maximum number of codewords for the parameters (n, 4) is the same as the maximum
number M of codewords for the parameters (n − 1, 3).
Chapter 4
Golay Codes
We saw in the last chapter that the linear Hamming codes are nontrivial perfect
codes.
A. Yes, two other linear perfect codes were found by Golay in 1949. In addition,
several nonlinear perfect codes are known that have the same n, M , and d
parameters as Hamming codes.
The condition for a code to be perfect is that its n, M , and d values satisfy the
sphere-packing bound
t
X n
M (q − 1)k = q n , (4.1)
k =0
k
with d = 2t + 1. Golay found three other possible integer triples (n, M, d) that do
not correspond to the parameters of a Hamming or trivial perfect code. They are
(23, 212 , 7) and (90, 278 , 5) for q = 2 and (11, 36 , 5) for q = 3. It turns out that there
do indeed exist linear binary [23, 12, 7] and ternary [11, 6, 5] codes; these are known as
Golay codes. But, as we shall soon, it is impossible for linear or nonlinear (90, 278 , 5)
codes to exist.
Problem 4.1: Show that the (n, M, d) triples (23, 212 , 7), (90, 278 , 5) for q = 2, and
(11, 36 , 5) for q = 3 satisfy the sphere-packing bound (1.1).
Remark: In view of Theorem 1.3, a convenient way of finding a binary [23, 12, 7]
Golay code is to construct first the extended Golay [24, 12, 8] code, which is just
the [23, 12, 7] Golay code augmented with a final parity check in the last position
(such that the weight of every codeword is even).
43
44 CHAPTER 4. GOLAY CODES
The extended binary Golay [24, 12, 8] code C24 can be generated by the matrix
G24 defined by
1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 1 0
0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 0 1
0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 1
0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 1 0 1 1 0
0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1
.
0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1
0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 0 0 0 1
Problem 4.2: Show that u·v = 0 for all rows u and v of G24 . Hint: note that the
first row of G is orthogonal to itself. Then establish that u·v = 0 when u is the
second row and v is any row of G24 . Then use the cyclic symmetry of the rows of
the matrix A0 formed by deleting the first column and first row of A.
Remark: The above exercise establishes that the rows of G24 are orthogonal to each
other. Noting that the weight of the rows of G24 is either 12 or 8, we make use of
the following result.
Problem 4.3: Let C be a binary linear code with generator matrix G. If each row
of G is orthogonal to itself and all other rows and has weight divisible by 4, prove
that C ⊂ C ⊥ and that the weight of every codeword in C is a multiple of 4.
⊥
Remark: Since k = 12 and n − k = 12, the linear spaces C24 and C24 have the same
⊥ ⊥
dimension. Hence C24 ⊂ C24 implies C24 = C24 . This means that the parity check
matrix H24 = [A | 112 ] for C24 is also a generator matrix for C24 !
We are now ready to show that distance of C24 is 8 and, consequently, that the bi-
nary Golay [23, 12] code generated by the first 23 columns of G24 must have minimum
distance either 7 or 8. But since the third row of this reduced generator matrix is a
codeword of weight 7, we can then be sure that the minimum distance is exactly 7.
45
Theorem 4.1 (Extended Golay [24, 12] code): The [24, 12] code generated by G24 has
minimum distance 8.
Proof: We know that the weight of the code generated by G24 must be divisible
by 4. Since both G24 and H24 are generator matrices for the code, any codeword
can be expressed either as a linear combination of the rows of G24 or as a linear
combination of the rows of H24 . We now show that a codeword x ∈ C24 cannot have
weight 4. It is not possible for the all of the left-most twelve bits of x to be 0 if x
is some nontrivial linear combination of the rows of G24 . Likewise, it is not possible
for all of the right-most twelve symbols of x to be 0 if x is some nontrivial linear
combination of the rows of H24 . If exactly one of the left-most (right-most) twelve
bits of x were 1, then x would then be identical to a row of G24 (H24 ), none of which
has weight 4. The only possible codeword of weight 4 is therefore a sum of two rows
of G24 , but it is easily seen (again using the cyclic symmetry of A0 ) that no two rows
of G24 differ in only four positions. Since the weight of every codeword in C24 must be
a multiple of 4, we now know that C24 must have a minimum distance of at least 8. In
fact, since the second row of G24 is a codeword of weight 8, we see that the minimum
distance of C24 is exactly 8.
Problem 4.4: Show that the ternary Golay [11, 6] code generated by the first 11
columns of the generator matrix
1 0 0 0 0 0 0 1 1 1 1 1
0 1 0 0 0 0 1 0 1 2 2 1
0 0 1 0 0 0 1 1 0 1 2 2
G12 =
0 0 0 1
0 0 1 2 1 0 1 2
0 0 0 0 1 0 1 2 2 1 0 1
0 0 0 0 0 1 1 1 2 2 1 0
from which we see that w(x) = 5 and d(x, y) = w(x − y) = 2. This means that x
must have a 1 in every position that y does.
46 CHAPTER 4. GOLAY CODES
Let X be the set of all codewords of weight 5 that begin with two 1s. We know
that for each y ∈ Y there is a unique x ∈ X such that d(x, y) = 2. That is, there are
exactly |Y | = 88 elements in the set {(x, y) : x ∈ X, y ∈ Y, d(x, y) = 2}. But each
x ∈ X contains exactly three ones after the first two positions. Thus, for each x ∈ X
there are precisely three vectors y ∈ Y such that d(x, y) = 2. That is, 3 |X| = 88.
This is a contradiction, since |X| must be an integer.
Remark: In 1973, Tietävainen, based on work by Van Lint, proved that any non-
trivial perfect code over the field Fqn must either have the parameters ((q r − 1)/(q −
1), q n−r , 3) of a Hamming code, the parameters (23, 212 , 7) of the binary Golay code,
or the parameters (11, 36 , 5) of the ternary Golay code.
Problem 4.5: Consider the extended ternary (q = 3) Golay [12, 6, 6] code C12 gen-
erated by
1 0 0 0 0 0 0 1 1 1 1 1
0 1 0 0 0 0 1 0 1 2 2 1
0 0 1 0 0 0 1 1 0 1 2 2
G12 = 0
0 0 1 0 0 1 2 1 0 1 2
0 0 0 0 1 0 1 2 2 1 0 1
0 0 0 0 0 1 1 1 2 2 1 0
⊥
(a) Use the fact that C12 is self-orthogonal (C12 ⊂ C12 ) to find a parity-check
matrix for C12 .
Since n − k = 12 − 6 = 6 = k, we know that the linear subspace C12 ⊥ and C
12 have the
⊥
same dimension, so C12 = C12 . Hence, G12 itself is a parity check matrix for C12 .
(b) Decode the received vector y = 010 000 010 101, assuming that at most two
errors have occurred.
The syndrome of this vector is
0 0 0
1 1 0
1 1
= + 2 0 .
1 1 0
1 1 0
0 1 1
Thus an error of 1 has occurred in position 7 and an error of 2 has occurred in position 6.
That is, the error vector is e = 000 002 100 000, so the transmitted vector was x = y − e =
010 001 210 101.
Chapter 5
Finite Fields
Until now we have always restricted our attention to the case where Fq is the set
Zq = {0, 1, 2, . . . , q − 1} for some prime number q. In this chapter, we review the fact
that Zq is a finite field if and only if q is prime. We then ask whether there exist finite
fields of order q when q is not a prime number. We will conclude that there exist
finite fields of order q if and only if q is an integral power pr of some prime p. Finally,
we demonstrate that although general finite fields of order q = pr may at first appear
to be somewhat complicated, they all have a very simple underlying structure.
Proof:
Theorem 5.2 (Subfield Isomorphic to Zp ): Every finite field has the order of a power
of a prime p and contains a subfield isomorphic to Zp .
47
48 CHAPTER 5. FINITE FIELDS
a1 x1 + a2 x2 + . . . + ar xr = 0 ⇒ a1 = a2 = . . . = 0, where ai ∈ Zp .
There must be at least one such subset having a maximal number of elements. Then,
if x is any element of F , the elements {x, x1 , . . . , xr } cannot be linearly independent,
so that x can be written as a linear combination of {x1 , . . . , xr }. Thus {x1 , . . . , xr }
forms a basis for F , so that the elements of F may be uniquely identified by all
possible values of the coefficients a1 , a2 , . . . , ar . Since there are p choices for each of
the r coefficients, there are exactly pr distinct elements in F .
Proof: Theorem 5.2 says that the prime p must be the power of a prime, which
can only be p itself. It also says that F contains Zp . Since the order of Zp is already p,
there are no other elements in F .
Proof:
• For example, we can construct a field with 8 = 23 elements using the polynomial
g(x) = x3 + x + 1 in Z2 [x]. Note that g is irreducible: the fact that g(c) =
c3 + c + 1 6= 0 for all c ∈ Z2 , implies that g(x) cannot have a linear factor (x − c).
Alternatively, we can establish the irreducibility of g in Z2 [x] directly: If g(x) =
(x2 + Bx + C)(x + D) = x3 + (B + D)x2 + (C + BD)x + CD, then
CD = 1 ⇒ C = D = 1
and hence
C + BD = 1 ⇒ B = 0,
which contradicts B + D = 0.
That is, if a and b are two polynomials in Z2 [x]/g, their product can be zero
(mod g) only if one of them is itself zero. Thus, Z2 [x]/g is a field with exactly
23 = 8 elements, corresponding to the 2 possible choices for each of the 3 polynomial
coefficients.
Definition: The Euler indicator or Euler totient function
.
ϕ(n) = |{m ∈ N : 1 ≤ m ≤ n, (m, n) = 1}|
is the number of positive integers less than or equal to n that are relatively prime
(share no common factors).
• ϕ(pr ) = pr − pr−1 for any prime number p and any r ∈ N since p, 2p, 3p, . . .,
(pr−1 − 1)p all have a factor in common with pr .
Remark: If we denote the set of integers in Zn that are not zero divisors by Z∗n , we
see that ϕ(n) = |Z∗n |.
x 1 2 3 4 5 6 7 8 9 10 11 12
ϕ(x) 1 1 2 2 4 2 6 4 6 4 10 4
Problem 5.1: The Chinese remainder theorem implies that ϕ(mn) = ϕ(m)ϕ(n)
whenever (m, n) = 1. Use this result to prove for any n ∈ N that
X
ϕ(d) = n.
d|n
Remark: Theorem 5.4 states that the elements of a finite field Fq can be listed in
terms of a primitive element, say α:
Fq = {0, α0 , α1 , α2 , . . . , αq−2 }.
Remark: The fact that all elements in a field Fq can be expressed as powers of a
primitive element can be exploited whenever we wish to multiply two elements
together. We can compute the product αi αj simply by determining which element
can be expressed as α raised to the power (i + j) mod(q − 1), in exactly the same
manner as one uses a table of logarithms to multiply real numbers.
Remark: A primitive element of a finite field Fpr need not be unique. In fact, we
see from the proof of Theorem 5.4 that the number of such elements is ϕ(pr − 1).
Specifically, if α is a primitive element, then the powers αi , for the ϕ(pr − 1) values
of i that are relatively prime to pr − 1, are also primitive elements.
Problem 5.3: Show that every nonzero element β of a finite field satisfies β q−1 = 1
in Fq .
Remark: In particular, Corollary 5.4.1 states that every element β in a finite field
Fpr is a root of some polynomial f (x) ∈ Fp [x].
Definition: Given an element β in a field Fpr , the monic polynomial m(x) in Fp [x]
of least degree with β as a root is called the minimal polynomial of β.
Theorem 5.5 (Minimal Polynomial): Let β ∈ Fpr . If f (x) ∈ Fp [x] has β as a root,
then f (x) is divisible by the minimal polynomial of β.
Proof: Let m(x) be the minimal polynomial of β. If f (β) = 0, then expressing
f (x) = q(x)m(x) + r(x) with deg r < deg m, we see that r(β) = 0. By the minimality
of deg m, we see that r(x) is identically zero.
Corollary 5.5.1 (Minimal Polynomials Divide xq − x): The minimal polynomial of
an element of a field Fq divides xq − x.
52 CHAPTER 5. FINITE FIELDS
Proof:
Proof: Exercise.
Theorem 5.7 (Reciprocal Polynomials): In a finite field Fpr the following statements
hold:
(a) If α ∈ Fpr is a nonzero root of f (x) ∈ Fp [x], then α−1 is a root of the reciprocal
polynomial of f (x).
Proof: Exercise.
Problem 5.4: Show that the pth powers of distinct elements of a field Fpr are distinct.
53
Problem 5.5: Show that the only elements a of Fpr that satisfy the property that
ap−1 = 1 in Fpr are the p − 1 nonzero elements of Fp .
Suppose we want to find the minimal polynomial m(x) of αi in Fpr . Identify the
2
set of distinct elements {αi , αip , αip , . . .}. The powers of α modulo pr − 1 in this set
form the cyclotomic coset of i. Suppose there are s distinct elements in this set. By
Corollary 5.6.1, each of these elements are distinct roots of m(x), and so the monic
polynomial
s−1
k
Y
f (x) = (x − αip )
k=0
Problem 5.6: Show that the elements generated by powers belonging to a particular
cyclotomic coset share the same minimal polynomial and order.
Remark: Every primitive polynomial of Fpr has degree r and each of its roots is
a primitive element of Fpr . This follows immediately from the distinctness of the
2 r−1 r
elements α, αp αp , . . ., αp for a primitive element α of Fpr , noting that αp = α.
• We now find the minimal polynomial for each of the 16 elements of the field F24 =
F2 [x]/(x4 + x3 + 1). It is straightforward to check that the polynomial x is a
primitive element of the field (see Table 5.1). Since x is a root of the irreducible
polynomial x4 + x3 + 1 in F2 [x]/(x4 + x3 + 1), we know from Corollary 5.5.2 that
x4 + x3 + 1 is the minimal polynomial of x and hence is a primitive polynomial
of F16 . The cyclotomic cosets consist of powers i2k (mod 15) of each element αi :
{1, 2, 4, 8},
54 CHAPTER 5. FINITE FIELDS
Remark: The cyclotomic cosets containing powers that are relatively prime to pr − 1
contain the ϕ(pr − 1) primitive elements of Fpr ; their minimal polynomials are
primitive and have degree r. Note that x4 + x3 + 1 and x4 + x + 1 are primitive
polynomials of F2 [x]/(x4 + x3 + 1) and their roots comprise the ϕ(15) = ϕ(5)ϕ(3) =
4 · 2 = 8 primitive elements of Fpr . Even though the minimal polynomial of the
element α3 also has degree r = 4, it is not a primitive polynomial, since (α3 )5 = 1.
Table 5.1: Nonzero elements of the field F2 [x]/(x4 + x3 + 1) expressed in terms of the
primitive element α = x.
and hence α = eiθ = cos θ + i sin θ for some real number θ. In fact, we see that
θ = 2π/3. Thus α3 = e3θi = e2πi = 1. In this way, we have constructed a number
α that is a primitive third root of unity, which is precisely what we mean when we
say that α is a primitive element of F4 . The field F4 may be thought of either as
the set {0, 1, x, x + 1} or as the set {0, 1, e2πi/3 , e−2πi/3 }, as illustrated in Fig. 5.1.
Similarly, the field F3 = {0, 1, 2} is isomorphic to {0, 1, −1} and F5 = {0, 1, 2, 3, 4}
is isomorphic to {0, 1, i, −1, −i}.
Cyclic Codes
Cyclic codes are an important class of linear codes for which the encoding and de-
coding can be efficiently implemented using shift registers. In the binary case, shift
registers are built out of two-state storage elements known as flip-flops and arith-
metic devices called binary adders that output the sum of their two binary inputs,
modulo 2.
Many common linear codes, including Hamming and Golay codes, have an equiv-
alent cyclic representation.
Definition: A linear code C is cyclic if
Remark: If x is a codeword of a cyclic code C, then all cyclic shifts of x also belong
to C.
• The (7, 16, 3) perfect code in Chapter 1, which we now know is equivalent to
Ham(3, 2), is cyclic.
• The binary linear code (0000, 1001, 0110, 1111) is not cyclic. However, upon inter-
changing the third and fourth positions, we note that it is equivalent to the linear
code (0000, 1010, 0101, 1111), which is cyclic.
It is convenient to identify a codeword a0 a1 . . . an−1 in a cyclic code C with the
polynomial
c(x) = a0 + a1 x + a2 x2 + . . . + an−1 xn−1 .
Then an−1 a0 a1 . . . an−2 corresponds to the polynomial
56
57
Definition: The polynomial ring Fq [x ] is the set of all polynomials P (x) with coeffi-
cients in Fq .
.
Definition: The residue class ring Rqn = Fq [x]/(xn − 1) is the set of all polynomial
remainders obtained by long division of polynomials in Fq [x] by xn − 1. That is,
Rqn is the set of all polynomials of degree less than n.
Remark: A cyclic code in Fqn can be thought of as a particular subset of the residue
class polynomial ring Rqn . In fact, the following theorem shows that a cyclic code C
is an ideal of Rqn .
Theorem 6.1 (Cyclic Codes are Ideals): A linear code C in Rqn is cyclic ⇐⇒
Remark: The next theorem states that every ideal in Rqn is a principal ideal (i.e. Rqn
is a Principal Ideal Domain).
58 CHAPTER 6. CYCLIC CODES
Definition: The monic polynomial of least degree in Theorem 6.2 is called the
generator polynomial of C.
Theorem 6.3 (Lowest Generator Polynomial Coefficient): Let g(x) = g0 +g1 x+. . .+
gr xr be the generator polynomial of a cyclic code. Then g0 6= 0.
Proof: Suppose g0 = 0. Then xn−1 g(x) = x−1 g(x) is a codeword of degree r − 1,
contradicting the minimality of deg g.
Theorem 6.4 (Cyclic Generator Matrix): A cyclic code with generator polynomial
g(x) = g0 + g1 x + . . . + gr xr
has dimension n − r and generator matrix
g0 g1 g2 . . . gr 0 0 ... 0
0 g0 g1 g2 . . . gr 0 ... 0
G= 0 0 g0 g1 g2 ... gr ... 0 .
. .. .. .. .. .. ..
.. . . . . . .
0 0 ... 0 g0 g1 g2 ... gr
59
Proof: Let c(x) be a codeword in a cyclic code C with generator g(x). From
Theorem 6.2, we know that
c(x) = q(x)g(x)
for some polynomial q(x). Note that deg q < n − r since deg c < n. That is,
c(x) = q0 + q1 x + . . . + qn−r−1 xn−r−1 g(x) = q0 g(x)+q1 xg(x)+. . .+qn−r−1 xn−r−1 g(x),
which is a linear combination of the n − r rows g(x), xg(x), x2 g(x), . . . , xn−r−1 g(x)
of G. The diagonal of nonzero g0 s next to a lower-triangular zero submatrix ensures
that the rows of G are linearly independent. Thus, the span of the rows of G is the
n − r dimensional code C.
Remark: Together, Theorems 6.1 and 6.2 say that an [n, k] code is cyclic ⇐⇒ it
is generated by a factor of xn − 1. The following lemma is useful in finding these
factors.
Lemma 6.1 (Linear Factors): A polynomial c(x) has a linear factor x − a ⇐⇒
c(a) = 0.
Proof: Exercise.
Definition: A polynomial is said to be irreducible in Fq [x] if it cannot be factored
into polynomials of smaller degree.
Lemma 6.2 (Irreducible 2nd or 3rd Degree Polynomials): A polynomial c(x) in Fq [x]
of degree 2 or 3 is irreducible ⇐⇒ c(a) 6= 0 for all a ∈ Fq .
Proof: c(x) can be factored into polynomials of smaller degree ⇐⇒ it has at
least one linear factor (x − a) ⇐⇒ c(a) = 0, by Lemma 6.1.
• Suppose we wish to find all ternary cyclic codes of length n = 4. The generators
for such codes must be factors of x4 − 1 in the ring F3 [x]. Since 1 is a root of the
equation x4 − 1 we know that (x − 1) is a factor and hence
(x4 − 1) = (x − 1)(x3 + x2 + x + 1)
By Lemma 6.2, the factor x3 + x2 + x + 1 is not irreducible because it has a linear
root at a = 2 = −1 in F3 . Using long division, we obtain
(x4 − 1) = (x − 1)(x + 1)(x2 + 1).
Since any combination of these three irreducible factors can be used to construct
a generator polynomial g(x) for a cyclic code, there are a total of 23 = 8 ternary
cyclic codes of length 4, as illustrated in Table 6.1. Upon examining the weights of
the rows of the possible generator matrices, we see that the generated codes either
have minimum distance less than or equal to 2 or else equal to 4. Hence, it is not
possible to have a cyclic code of length 4 and minimum distance 3. In particular,
Ham(2, 3), for which n = (32 − 1)/(3 − 1) = 4, cannot be cyclic. Thus, not all
Hamming codes have a cyclic representation.
60 CHAPTER 6. CYCLIC CODES
g(x) G
1 0 0 0
0 1 0 0
1
0
0 1 0
0 0 0 1
−1 1 0 0
x−1 0 −1 1 0
0 0 −1 1
1 1 0 0
x+1 0 1 1 0
0 0 1 1
2 1 0 1 0
x +1
0 1 0 1
2 −1 0 1 0
(x − 1)(x + 1) = x − 1
0 −1 0 1
(x − 1)(x2 + 1) = x3 − x2 + x − 1 [ −1 1 −1 1 ]
(x + 1)(x2 + 1) = x3 + x2 + x + 1 [1 1 1 1]
x4 − 1 = 0 [0 0 0 0]
Table 6.1: Generator polynomial g(x) and corresponding generator matrix G for all
possible ternary cyclic codes of length 4.
61
An easy way to find the parity check matrix for a cyclic [n, k] code (without
requiring that we first put G given by Theorem 6.4 in standard form) is to first
construct the check polynomial h(x) of C from its generator polynomial g(x), where
h(x) satisfies
xn − 1 = g(x)h(x)
in Fq [x]. Since g is monic and has degree n − k, we see that h is monic and has degree
k.
Proof:
“⇐” We can express any polynomial c(x) in Rqn as c(x) = q(x)g(x) + r(x)
where deg r < deg g = n − k. If c(x)h(x) = 0 then
Theorem 6.6 (Cyclic Parity Check Matrix): A cyclic code with check polynomial
h(x) = h0 + h1 x + . . . + hk xk
Proof: Since the degree of the generator polynomial g is r = n−k, by Theorem 6.4,
the dimension of the code must be k. From Theorem 6.5, we know that a codeword
c(x) = c0 +c1 x+. . .+cn−1 xn−1 must satisfy c(x)h(x) = 0. In particular, the coefficients
62 CHAPTER 6. CYCLIC CODES
But then, since each of these equations is one of the n − k rows of the matrix equation
the codewords are orthogonal to all cyclic shifts of the vector hk hk−1 hk−2 . . . h0 00 . . . 0.
The codewords are thus orthogonal to all linear combinations of the rows of H. This
means that C ⊥ contains the span of the rows of H. But hk = 1, so we see that H
has rank n − k and hence generates exactly the linear subspace C ⊥ . That is, H is a
parity check matrix for the code with check polynomial h(x).
h(x) = h0 + h1 x + . . . + hk xk
Remark: Since
we see that h(x) is a factor of xn − 1. In view of Theorems 6.1, 6.2, and 6.6, this
says that C ⊥ is itself a cyclic code, with (monic) generator h−1
0 h(x).
We are now able to show that all binary Hamming codes have an equivalent cyclic
representation.
Theorem 6.7 (Cyclic Binary Hamming Codes): The binary Hamming code Ham(r, 2)
is equivalent to a cyclic code.
H = [ 1 α α2 . . . αn−1 ]
is seen to be the parity check matrix for C = Ham(r, 2) since its columns are precisely
the distinct nonzero vectors of F2r . A codeword c(x) = c0 + c1 x + . . . + cn−1 xn−1 in
this code must then satisfy the vector equation c0 + c1 α1 + c1 α2 + . . . + cn−1 αn−1 = 0,
so that
C = {c(x) ∈ R2n : c(α) = 0 in F2 [x]/p(x)}.
If c(x) ∈ C and r(x) ∈ R2n , we have r(α)c(α) = r(α)0 = 0 in F2 [x]/p(x), noting that
αn = 1, so r(x)c(x) is also an element of C. Theorem 6.1 then implies that C is
cyclic.
• The irreducible polynomial x3 + x + 1 in F2 [x] can be used to generate the field
F8 = F2 [x]/(x3 + x + 1) with 23 = 8 elements. Note that F8 has x as a primitive
element since all polynomials in F2 [x] of degree less than 3 can be expressed as
powers of x:
F8 = {0, 1, x, x2 , x3 = x + 1, x4 = x2 + x, x5 = x2 + x + 1, x6 = x2 + 1}.
Note that x7 = x3 + x = 1; that is, the primitive element has order 7 = 8 − 1. The
primitive element x is a root of the primitive polynomial x3 + x + 1 in F8 .
A parity check matrix for a cyclic version of the Hamming code Ham(3, 2) is thus
1 0 0 1 0 1 1
H = 0 1 0 1 1 1 0 .
0 0 1 0 1 1 1
A. The close parallel between Theorem 6.2 and Theorem 5.5 when n = pr −1 gives us
a clue: on comparing these results, we see from Theorem 6.4 that any minimal
polynomial of a primitive element β of Fpr is a generator polynomial for a cyclic
code in Fpr (as we saw in Chapter 5, β is a root of xn −1 = 0). In particular, the
following corollary to Theorem 6.7 establishes that Ham(r, 2) can be generated
by any primitive polynomial of F2r , which is just a monic irreducible polynomial
in F2 [x] having a primitive element as a root.
64 CHAPTER 6. CYCLIC CODES
0 0 0 1 1 0 1
Chapter 7
BCH Codes
For noisy transmission lines, Hamming codes are of limited use because they cannot
correct more than one error. In this chapter, we discuss a class of important and
widely used cyclic codes that can correct multiple errors, developed by R. C. Bose
and D. K. Ray-Chaudhuri (1960) and independently by A. Hocquenghem (1959),
known as Bose–Chaudhuri–Hocquenghem (BCH) codes.
Definition: Let α be an element of order n in a finite field Fqs . A BCH code of
length n and design distance d is a cyclic code generated by the product of the
distinct minimal polynomials in Fq [x] of the elements α, α2 , . . . , αd−1 .
Remark: We will establish that a BCH code of odd design distance d has a minimum
distance of at least d, by showing that such a code can correct (d − 1)/2 errors.
To encode
Pk−1 the message word a0 a1 . . . ak−1 , we represent it by the polynomial
i
f (x) = i=0 ai x and form its product with the generator polynomial g(x), to obtain
the codeword c(x) = f (x)g(x).
• For the primitive element α = x of the field F24 = F2 [x]/(x4 +x+1), we can construct
a [15, 7] code that can correct two errors, by finding a generator polynomial g(x)
that has roots at α, α2 , a3 , and α4 . Such a generator can be created from the
product of the minimal polynomials m1 (x) = x4 + x + 1 of α and m3 (x) = x4 +
x3 + x2 + x + 1 of α3 :
g(x) = m1 (x)m3 (x) = (x4 + x + 1)(x4 + x3 + x2 + x + 1) = x8 + x7 + x6 + x4 + 1.
In fact, g(x) has even more roots than prescribed, namely at α, α2 , α4 , α8 , α3 , α6 ,
α12 , and α9 . Once we have shown that this code can correct two errors, we will
know that its minimum distance is exactly 5 since the codeword g(x) has weight 5.
65
66 CHAPTER 7. BCH CODES
Remark: In the binary case q = 2, the generator of a BCH code is just the product
of the distinct minimal polynomials of the odd powers, from 1 to d − 1, of the
primitive element.
Problem 7.1: Show that a polynomial c(x) belongs to a BCH code with design
distance d ⇐⇒ c(α) = c(α2 ) = . . . = c(αd−1 ) = 0.
We now describe the decoding procedure for BCH codes. To keep the notation
simple we begin by illustrating the procedure first for the binary case, where q = 2.
Suppose that y(x) is received rather than c(x) and that t errors have occurred. Then
the error polynomial e(x) = y(x) − c(x) can be written as e(x) = x`1 + x`2 + . . . + x`t
for some unknown powers `1 , `2 , . . . , `t . We then compute the syndrome S1 by
substituting α into y(x),
.
S1 = y(α) = c(α) + e(α) = e(α) = e1 + . . . + et ,
.
where ei = α`i for i = 1, 2, . . . , t. Likewise, we evaluate
.
S2 = y(α2 ) = c(α2 ) + e(α2 ) = e(α2 ) = e21 + . . . + e2t ,
.
S3 = y(α3 ) = c(α3 ) + e(α3 ) = e(α3 ) = e31 + . . . + e3t ,
...
.
Sd−1 = y(αd−1 ) = c(αd−1 ) + e(αd−1 ) = e(αd−1 ) = e1d−1 + . . . + ed−1
t .
The decoding problem now amounts to determining if there is a value of t and choices
of field elements e1 , e2 , . . . , et consistent with the above equations. If a solution
exists, from a table of the elements of Fqs , one would then like to determine the
.
corresponding powers `1 , `2 , . . . , `t such that ei = α`i . These powers tell us directly
which bits we need to toggle. To find a possible solution to the above equations, the
following definition will be helpful.
To understand how the above syndrome equations are solved, it will be helpful to
first discuss the case where d = 5 and t ≤ 2 errors have occurred. We define ei = 0
for i > t and write
S1 = e1 + e2 ,
S2 = e21 + e22 ,
67
S3 = e31 + e32 ,
S4 = e41 + e42 .
The error locator polynomial is σ(x) = b2 x2 +b1 x+1. Since σ(e−1
i ) = 0 for i = 1, 2
we know that
0 = e31 σ(e−1 3 −2 −1 2 3
1 ) = e1 (b2 e1 + b1 e1 + 1) = b2 e1 + b1 e1 + e1
and
0 = e32 σ(e−1 2 3
2 ) = b2 e2 + b1 e2 + e2 .
On adding these equations, we obtain
0 = b2 (e1 + e2 ) + b1 (e21 + e22 ) + (e31 + e32 ),
i.e.
S1 b2 + S2 b1 = −S3 .
If for each i we had multiplied σ(e−1 4 3
i ) = 0 by ei instead of ei and added the resulting
equations, we would have obtained
S2 b2 + S3 b1 = −S4 .
To find b1 and b2 , we only need to solve the system of equations
S1 S2 b2 S
=− 3 (7.1)
S2 S3 b1 S4
(Of course, in the binary case, the minus sign can be omitted.) If the coefficient
matrix in Eq. (7.1) has rank 0, then S1 = S2 = S3 = S4 = 0 and hence e1 + e2 = 0,
so that e1 = e2 . This would imply that `1 = `2 , in which case e(x) = 0; that is, no
error has occurred. That is, the system of equations will have rank 0 ⇐⇒ no errors
have occurred.
Suppose that the coefficient matrix has rank 1. Since q = 2, we know that S2 = S12 .
Note that S1 6= 0, for otherwise the first equation would imply that S3 = 0 and the
rank of the coefficient matrix would be 0. Since the determinant S1 S3 − S22 = 0, we
deduce S3 = S13 . But
e31 + e32 = (e1 + e2 )3 ⇒ 0 = 3e1 e2 (e1 + e2 ) = 3e1 e2 S1 .
This implies that e2 = 0 (only one error has occurred) and that S1 = e1 = α`1 .
Conversely, if only one error has occurred, then S3 = S13 6= 0 and the coefficient
matrix of Eq. (7.1) will have rank 1. Using a power table for F24 , we simply look
up the exponent `1 such that α`1 = S1 and then toggle bit `1 of y(x) to obtain the
transmitted codeword c(x).
Finally, if the rank of the coefficient matrix is 2, we can solve for the coefficients b1
and b2 . If two errors have occurred, the error locator polynomial σ(x) = b2 x2 +b1 x+1
must have two roots in F24 , which we can determine by trial and error. The powers
of α associated with the inverses of these roots identify the two bit positions in which
errors have occurred.
68 CHAPTER 7. BCH CODES
• Let us demonstrate this decoding scheme for the [15, 7] BCH code generated by
the polynomial g(x) = x8 + x7 + x6 + x4 + 1. Given the message word 110 0000,
the transmitted codeword is 110 011 100 100 000, i.e. c(x) = (1 + x) g(x) =
x9 + x6 + x5 + x4 + x + 1. Suppose that two errors have occurred, so that the
received vector is 110 010 101 100 000, that is, y(x) = x9 + x8 + x6 + x4 + x + 1.
Consulting the power table for F2 [x]/(x4 + x + 1), we see that the syndromes are
S1 = y(α) = α9 + α8 + α6 + α4 + α + 1
= (α3 + α) + (α2 + 1) + (α3 + α2 ) + (α + 1) + α + 1 = α + 1 = α4 ,
S2 = S12 = α8 ,
S3 = y(α3 ) = α27 + α24 + α18 + α12 + α3 + 1
= α12 + α9 + α3 + α12 + α3 + 1 = α9 + 1 = α3 + α + 1 = α7 ,
S4 = S22 = α16 = α.
Since S1 6= 0 and S1 S3 −S22 = S1 (S3 −S13 ) = S1 (α7 −α12 ) 6= 0, the system of equations
S1 b2 + S2 b1 = S3 ⇒ α4 b2 + α8 b1 = α7 ,
S2 b2 + S3 b1 = S4 ⇒ α8 b2 + α7 b1 = α,
has rank 2. Upon adding α4 times the first equation to the second equation, we see
that
(α12 + α7 )b1 = α11 + α.
Thus, b1 = α−2 α6 = α4 and hence b2 = α−4 (α7 + α8 α4 ) = α−4 α2 = α13 . Thus, the
error polynomial is
σ(x) = α13 x2 + α4 x + 1 = (e1 x − 1)(e2 x − 1)
We determine the roots of this equation by trial and error, That is, we search through
the field until we find an i such that α2i−2 + αi+4 = 1. Incrementing from i = 0, the
first such i we find is i = 7, so one root is α7 . The inverse, say e1 , of this root
is α8 . Since the product e1 e2 = b2 = α13 , we immediately see that e2 = α5 . That is,
the two errors are in the fifth and eighth positions, so we decode the received vector
110 010 101 100 000 as the codeword 110 011 100 100 000. Upon division of the
associated polynomial x9 + x6 + x5 + x4 + x + 1 by g(x) = x8 + x7 + x6 + x4 + 1 we
obtain x + 1, which corresponds to the original message, 110 0000.
• In the previous example, if instead the received vector was 110 010 100 100 000,
that is, y(x) = x9 + x6 + x4 + x + 1, the syndromes would be
S1 = y(α) = α9 + α6 + α4 + α + 1
= (α3 + α) + (α3 + α2 ) + (α + 1) + α + 1 = α2 + α = α5 ,
S2 = S12 = α10 ,
S3 = y(α3 ) = α27 + α18 + α12 + α3 + 1
= α12 + α3 + α12 + α3 + 1 = 1,
S4 = S22 = α20 = α5 .
69
Since S3 = S13 6= 0, we know that only one error has occurred. In fact, S1 = α5 ,
so the error must be in the fifth position; one again we see that the transmitted
codeword was 110 011 100 100 000.
In general, decoding requires that we solve the nonlinear system of d−1 syndrome
equations
Si = y(αi ) = ei1 + . . . + eit , i = 1, . . . , d − 1 (7.2)
for the value of t and the errors {ej : j = 1, . . . , t}. Here t ≤ (d − 1)/2 is the actual
number of errors that have occurred, so that each of the values ej for j = 1, . . . , t are
nonzero and distinct.
A straightforward generalization of the t = 2 decoding scheme leads to the t
equations:
0 = et+1
i σ(e−1 2 t t+1
i ) = bt ei + bt−1 ei + . . . + b1 ei + ei ,
0 = et+2
i σ(e−1 2 3 t+1
i ) = bt ei + bt−1 ei + . . . + b1 ei + et+2
i ,
...
−1 t+1
0 = e2t t
i σ(ei ) = bt ei + bt−1 ei + . . . + b1 e2t−1
i + e2t
i .
S1 S2 ... St bt St+1
S2 S3 ... St+1 bt−1 S
. = − t+2
... . (7.3)
. .. ..
.. . . ..
St St+1 . . . S2t−1 b1 S2t
Problem 7.2: Show that we may rewrite the coefficient matrix in Eq. (7.3) as
M = V DV t ,
t
Y
has determinant (ei − ej ).
i,j=1
i>j
Proof: When t = 2 we see that det V = e2 − e1 . For t > 2, suppose that all
(t − 1)×(t − 1) Vandermonde matrices have determinant
t−1
Y t−1 Y
Y i−1
(ei − ej ) = (ei − ej ).
i,j=1 i=1 j=1
i>j
e1 − et e2 − et ... et−1 − et
e1 (e1 − et ) e2 (e2 − et ) . . . et−1 (et−1 − et )
t−1
= (−1) .. .. ..
. . .
et−2 (e − e ) et−2 (e − e ) . . . et−2 (e − e )
1 1 t 2 2 t t−1 t−1 t
1 1 ... 1
e1 e2 . . . et−1
2
= e1 e22 . . . e2t−1 (et − e1 )(et − e2 ) . . . (et − et−1 )
... ..
.
..
.
et−2 et−2 . . . et−2
1 2 t−1
t−1 Y
Y i−1 t−1
Y Yt Y i−1
= (ei − ej ) · (et − ej ) = (ei − ej ).
i=1 j=1 j=1 i=1 j=1
71
Remark: If we attempt to increase the value of t in Eq. (7.2) beyond the actual
number of errors that have occurred, either the values ej will no longer be distinct
or at least one of them will be zero. In either case, M will no longer be invertible.
This gives us a method for finding the number of errors: t is just the largest number
such that
S1 S2 . . . St
. S2 S3 . . . St+1
M = ... .. ..
. .
St St+1 . . . S2t−1
is invertible.
Remark: If it is a priori known that no more than t errors have occurred in a received
polynomial y, then it is impossible for a (t + 1) × (t + 1) or larger syndrome matrix
based on y to be invertible.
Remark: Once we have determined the maximum value of t such that the coeffi-
cient matrix M is invertible, we simply solve the linear system Eq. (7.3) for the
coefficients b1 , b2 , . . . , bt of the error locator polynomial σ(x). We can determine
all t roots of σ(x) simply by searching through all of the field elements (at most
one pass is required). The exponents `1 , `2 , . . . , `t corresponding to the inverses of
these roots precisely identify the t positions of the received vector that are in error.
Remark: The above decoding procedure can be easily extended to nonbinary codes.
In this case, the error vector becomes e(x) = q1 x`1 + q2 x`2 + . . . + qt x`t , where each
qi ∈ Fq , the syndromes become Si = q1 ei1 +. . .+qt eit , and D = diag(q1 e1 , q2 e2 , . . . , qt et ).
We then see that any BCH code of design distance d can correct b(d − 1)/2c errors.
We encapsulate this result in the following theorem.
Theorem 7.2 (BCH Bound): The minimum distance of a BCH code of odd design
distance d is at least d.
Proof: This follows from Theorem 1.1 and the fact that the BCH code can correct
(d − 1)/2 errors.
Remark: Although Theorem 7.2 may be shown to hold also when the design dis-
tance d is even, we are normally interested only in the case of odd d.
Remark: It may happen that the minimum distance of a BCH code exceeds its
design distance d, as illustrated by the following example.
72 CHAPTER 7. BCH CODES
Remark: The special case of an [n, k] BCH code for s = 1, where the primitive
element α comes from the same field Fq as the coefficients of the generator poly-
nomial, is known as a Reed–Solomon code. Note that the minimal polynomial of
any element of Fq has degree s = 1. The generator polynomial of a Reed–Solomon
code of design distance d,
thus has degree n − k = d − 1. That is, the minimum distance of the code must
at least n − k + 1. But since there are rk H ≤ n − k independent columns in
the parity check matrix, we know from Theorem 2.2 that the minimum distance
can be no more than n − k + 1. Thus, a Reed–Solomon code achieves the so-
called singleton upper bound n − k + 1 for the minimum distance of a linear code.
Because Reed–Solomon codes are optimal in this sense and easily implementable
in hardware (using shift registers), they are widely used for error correction in
computer memory chips, magnetic and optical (compact disk) storage systems,
high-speed modems, and data transmission channels.
• Compact disk players use a double-error correcting [255, 251, 5] Reed–Solomon code
over F256 (the symbols are eight-bit bytes) with interleaving of the data over the
disk to increase robustness to localized data loss.
1
Since β 23 = 1, we know from Theorem 5.5 that x23 − 1 = (x − 1)m1 (x)m5 (x), moreover,
Theorem 5.7 implies that m5 (x) = m1 (x)
73
Problem 7.3: Prove that the Hamming code Ham(r, 2) is equivalent to an [n, k]
BCH code with distance d = 3. What is the value of k?
Problem 7.3 establishes the hierarchy of error-correcting codes illustrated in Fig. 7.1.
(e) We know that the minimum distance of C is at least the design distance 7.
Using the fact that g(x) itself is a codeword of C, prove that the actual minimum
distance of C is exactly 7.
The weight of g(x) is 7, so the minimum distance of this linear code cannot exceed 7.
(f) Use the irreducible factors of x15 − 1 (or any other means) to find the check
polynomial h(x) of C.
Since
x16 − x = x(x + 1)m1 (x)m3 (x)m5 (x)m7 (x),
we know that
(h) Suppose the vector 011 111 101 110 111 is received. Without computing
syndromes, find the transmitted codeword. How many errors have occurred?
Hint: Look for an obvious nearby codeword.
Since α15 = 1 in F16 , we see that
That is, the vector consisting of 15 ones is a codeword. Our received vector is only a distance
three away. Errors have occurred in positions 0, 7, and 11, but we can correct three errors.
The transmitted codeword was thus 111 111 111 111 111.
(i) Determine the original transmitted message polynomial and the corresponding
message.
The original message was
Thus, the original message polynomial was m7 (x) = x4 +x+1, corresponding to the message
11001.
Problem 7.6: In the field Z3 [x]/(x2 + 1), with primitive element x + 1, consider the
BCH code C of length n = 8 and design distance 5.
(a) Find the degrees of the minimal polynomials used to construct the generator
polynomial g of C and the degree of g itself.
Since we want α, α2 , α3 , and α4 to be roots of g(x), we know that g must be the
product of m1 (x) (degree 2), m2 (x) (degree 2), and m4 (degree 1). Hence g must have
degree 2 + 2 + 1 = 5. That is, n − k = 5.
(b) What is the dimension k of C?
k = n − 5 = 3.
(c) How many codewords are there in C?
There are 3k = 27 codewords in C.
(d) Compute the syndromes S1 , S2 , and S3 for the received polynomial v(x) =
x + x4 + 2x5 + x6 .
3
(e) Using the value of the syndrome S1 and the syndrome equation
S1 S2 b2 S
=− 3 ,
S2 S3 b1 S4
show that the received polynomial in part(d) contains exactly one error.
Since S1 6= 0 but S1 S3 − S22 = αα3 − (α2 )2 = 0, we know that exactly one error has
occurred.
76 CHAPTER 7. BCH CODES
2x + x3 + x4 + 2x5 + x6 .
Problem 7.7: Consider the [15, 7, 5] primitive BCH code C over F16 . Let α be a
primitive element of F16 .
{0}, {1, 2, 4, 8}, {3, 6, 12, 9}, {5, 10}, {7, 14, 13, 11},
we see that it must include the minimal polynomials of m1 (x) (degree 4) and m3 (x) (degree
4). Since the sum of these degrees is 4 + 4 = 8, we see that g cannot contain any additional
roots.
(c) Without computing syndromes, show that 100 100 100 100 100 is a codeword
of C.
We 3 6 9 12
know3 that
the3 geometric series c(x) = 1 + x + x + x + x has the value (1 +
x3 )−1 5
1 + (x ) for x 6= 1. Then for i = 1, 2, 3, 4,
−1 h 5 i −1 h i i
c(αi ) = 1 + α3i 1 + α3i = 1 + α3i 1 + α15
−1 −1
= 1 + α3i (1 + 1i ) = 1 + α3i 0=0
since α3i 6= 1. That is, α, α2 , α3 , and α4 (but incidentally, not α5 ) are roots of c(x). By
Theorem 5.5, c(x) is a multiple of the minimal polynomial of each of these elements, and
therefore, of g(x) itself. Thus c(x) is in the code generated by g(x).
Chapter 8
Cryptographic Codes
In contrast to error-correcting codes, which are designed only to increase the reliability
of data communications, cryptographic codes are designed to increase their security.
In cryptography, the sender uses a key to encrypt a message before it is sent through
an insecure channel (such as a telephone line, radio transmission or the internet).
An authorized receiver at the other end then uses a key to decrypt the received data
to a message. Often, data compression algorithms and error-correcting codes are
used in tandem with cryptographic codes to yield communications that are both
efficient, robust to data transmission errors, and secure to eavesdropping and/or
tampering. Typically, data compression is performed first; the resulting compressed
data is then encrypted and finally encoded with an error-correcting scheme before
being transmitted through the channel.
Definition: Let K be a set of cryptographic keys. A cryptosystem is a set
{e, d, Ee , Dd : e ∈ K}
of encrypting and decrypting keys, e and d, and their associated encrypting function
Ee and Dd , respectively.
Most cryptosystems, or ciphers, fall into one of two broad classes: symmetric-key
cryptosystems, where essentially the same key is used both to encrypt and decrypt
a message (precisely, where d can be easily determined whenever e is known) and
public-key cryptosystems, where the encryption key e is made publicly available, but
the decryption key d is kept secret and is (hopefully) known only to the receiver.
77
78 CHAPTER 8. CRYPTOGRAPHIC CODES
where d = −e. That is, encoding is accomplished by addition modulo e and decoding
is accomplished by subtraction modulo e. Caeser adopted the value e = 3 to encrypt
the n = 26 symbols of the Roman alphabet, using 0 to represent the letter A and
25 to represent the letter Z. Some fans of the film “2001: A Space Odyssey” even
suggest that the computer name HAL is really a shift cipher for IBM, with e = 25!
A slight generalization of the shift cipher is the affine cipher, defined by
Both shift and affine ciphers are very insecure since they are easily decoded simply
by trying all possible values of a and b! They are both special cases of simple sub-
stitution ciphers or monoalphabetic substitution ciphers, which permute the alphabet
symbols in a prescribed manner. Simple substitution ciphers can be cryptanalyzed
(decoded by an unauthorized third party) by frequency analysis, in which the en-
crypted symbol frequencies are compared to those of the original message language
to determine the applied permutation. Block substitution ciphers or polyalphabetic
substitution ciphers divide the message into blocks of length r and apply different
permutations to the symbols in individual positions of the block. Given enough en-
crypted text, block substitution ciphers are also easily cryptanalyzed once the block
size r is determined, simply by doing a frequency analysis on the letters in each fixed
position of all blocks.
Digraph ciphers map pairs of letters in the message text to encrypted pairs of
letters. A general example of this is the linear block or Hill cipher, which uses an
r × r invertible matrix e to encrypt an entire block m of r message symbols:
• Choose r = 2, n = 26 and
2 1
e= .
3 4
8.A. SYMMETRIC-KEY CRYPTOGRAPHY 79
We see that det e = 5 has no common factors with 26. To find the inverse of 5 in Z26
we use the Euclidean division algorithm: 1 = 5x+26y, 26 = 5·5+1 ⇒ 1 = 26−5·5,
from which we see that x = −5 is a solution. Thus
−1 4 −1 6 5
e = −5 = .
−3 2 15 16
We can use e to encrypt the word “SECRET”, in other words the message 18 4
2 17 4 19, by breaking the message up into vectors of length two: [18 4], [2 17],
[4 19] and then multiplying the transpose of each vector by e on the left. The
result is [14 18], [21 22], [1 10], or the cipher text “OSVWBK”. Note that the two
letters “E” are not mapped to the same symbol in the ciphertext. For this reason
the Hill cipher is less susceptible to frequency analysis (particularly for large block
sizes; however, the number of entries in the key matrix then becomes unreasonably
large).
Problem 8.1: Verify that the original message “SECRET” is recovered when “OS-
VWBK” is decoded with the matrix e−1 .
A special case of the block cipher is the permutation cipher, in which the order of
the characters in every block of text is rearranged in a prescribed manner. Permuta-
tion ciphers can be detected by frequency analysis since they preserve the frequency
distribution of each symbol. In fact, all linear or affine block methods are subject
to cryptanalysis using linear algebra, once r or r + 1 plaintext–ciphertext pairs are
known.
A widely used commercial symmetric-key cryptosystem is the Data Encryption
Standard (DES), which is a type of Feistel cipher endorsed in 1977 by the United
States National Bureau of Standards for the protection of confidential (but unclassi-
fied) government data. In 1981, DES was approved also for use in the private-sector.
Feistel ciphers are block ciphers over F22t . One divides a plaintext block message of 2t
bits into two halves, L0 and R0 , and then performs the iteration
Li = Ri−1 ,
Ri = Li−1 + f (Ri−1 , ei ), i = 1, . . . , r,
where the ei are the keys and f is a specific nonlinear cipher function. The encryption
function is Ee (L0 | R0 ) = Rr | Lr , where | denotes concatenation, and e = (e1 , . . . er )
denotes a set of r encryption keys. The decryption function De (Lr | Rr ) = R0 | L0 , is
implemented by applying the inverse iteration
Li−1 = Ri + f (Li , ei ),
Ri−1 = Li , i = r, . . . , 1.
With Feistel ciphers, the encryption and decryption algorithms are essentially the
same, except that the key sequence is reversed.
80 CHAPTER 8. CRYPTOGRAPHIC CODES
The DES cipher uses the half width t = 32 and r = 16 rounds of encryption.
All 16 keys are generated from a bit sequence of length 64 that is divided into 8
bytes. The eighth bit of each byte is an (odd) parity check, so in fact there are
only 56 information bits in the key. Hence the number of keys that need to be
searched in a brute-force attack on DES is 256 . In fact, because of an inherent ones-
complement symmetry, only 255 keys need to be checked. On a modern supercomputer
this is quite feasible; moreover, DES has recently been shown to be susceptible to
relatively new cryptanalytic techniques, such as differential cryptanalysis and linear
cryptanalysis. Somewhat more secure variants of DES (such as Triple-DES, where
DES is applied three times in succession with three different keys), were developed as
interim solutions. One common application of DES that persists today is its use in
encrypting computer passwords, using the password itself as a key. This is why many
computer passwords are still restricted to a length of eight characters (64 bits).
In October 2000, the Rijndael Cryptosystem was adopted by the National Bureau
of Standards as the Advanced Encryption Standard (AES) to replace DES. It is
based on a combinations of byte substitutions, shifts, and key additions, along with a
diffusion-enhancing technique based on cyclic coding theory, where the data values are
multiplied by the polynomial 3x3 + x2 + x + 2 in the polynomial ring F256 [x]/(x4 − 1).
and q chosen at random, but such that p and q cannot be easily determined from n.1
The receiver then selects a random integer e between 1 and ϕ(n) = (p − 1)(q − 1) that
is relatively prime to ϕ(n) and, using the Euclidean division algorithm, computes
d = e−1 in Zϕ(n) (why does e−1 exist?). The numbers n and e are made publicly
available, but d, p, q are kept secret.
Anyone who wishes to send a message m, where 0 ≤ m < n, to the receiver
encrypts the message using the encoding function
c = Ee (m) = me (mod n)
and transmits c. Because the receiver has knowledge of d, the receiver can decrypt c
using the decoding function
Theorem 8.1 (Modified Fermat’s Little Theorem): If s is prime and a and m are
natural numbers, then
m ma(s−1) − 1 = 0 (mod s).
Corollary 8.1.1 (RSA Inversion): The RSA decoding function De is the inverse of
the RSA encoding function Ee .
We first apply Theorem 8.1 with a = k(q − 1), s = p and then with a = k(p − 1),
s = q, to deduce that m[mk(q−1)(p−1) − 1] is a multiple of both of the distinct primes p
and q, that is, m[mk(q−1)(p−1) − 1] = 0 (mod pq). Thus
1
For example, if p and q are close enough that (p + q)2 − 4n = (p + q)2 − 4pq = (p − q)2 is small,
then the sum p + q could be determined by searching for a small value of p − q such that (p − q)2 + 4n
is a perfect square, which must be p + q. Knowledge of p − q and p + q is sufficient to determine
both p and q.
2
This follows from applying Theorem 5.4 to the field Zs .
82 CHAPTER 8. CRYPTOGRAPHIC CODES
• Let us encode the message “SECRET” (18 4 2 17 4 19) using the RSA scheme with
a block size of 1. The receiver chooses p = 5 and q = 11, so that n = pq = 55 and
ϕ(n) = 40. He then selects e = 17 and finds d = e−1 in Z40 , so that 17d = 40k + 1
for some k ∈ N. This amounts to finding gcd(17, 40):
40 = 2 · 17 + 6, 17 = 2 · 6 + 5, 6 = 1 · 5 + 1,
from which we see that
1 = 6 − 5 = 6 − (17 − 2 · 6) = 3 · (40 − 2 · 17) − 17 = 3 · 40 − 7 · 17.
That is, d = −7 (mod 40) = 33. The receiver publishes the numbers n = 55 and
e = 17, but keeps the factors p = 5, q = 11, and d = 33 (and ϕ(n)) secret.
The sender then encodes 18 4 2 17 4 19 as
1817 417 217 1717 417 1917 (mod 55) = 28 49 7 52 49 24
The two Es are encoded in exactly the same way, since the block size is 1: obviously,
a larger block size should be used to thwart frequency analysis attacks.3
The receiver would then decode the received message 28 49 7 52 49 24 as
2833 4933 733 5233 4933 2433 (mod 55) = 18 4 2 17 4 19.
a = 2833 (mod 4)
(mod 5) = 28 (mod 5) = 3,
Problem 8.2: You have intercepted an encrypted transmission that was intended
for your math professor. It consists of the three numbers 26, 14, and 19. Each of
these numbers was separately encoded, using RSA encoding and a block size of 1,
where the letters A to Z are mapped to Z26 in the usual way. On your professor’s
web page you notice that he lists his public key as (n = 33, e = 13).
(a) Without breaking into your professor’s computer, determine the “secret” prime
factors p and q of n.
p = 3, q = 11.
a = 21 (mod 3) = 2,
Problem 8.3: You have intercepted an encrypted transmission from the Prime Min-
ister’s Office to the Leader of the Official Opposition. It consists of the two numbers
27 and 14, which were separately encoded, using RSA encoding and a block size
of 1, where the letters A to Z are mapped to Z26 in the usual way. On the Leader
of the Official Opposition’s web page you notice that he lists his public key as
(n = 35, e = 11).
b = 145 (mod 7) = 0.
This yields m = −14(4) = −56 (mod 35) = 14.
Thus, the sent message was 13, 14 which spells NO!
8.C. DISCRETE LOGARITHM SCHEMES 85
To decode the message, the receiver must be able to compute square roots modulo n.
This can be efficiently accomplished in terms of integers x and y satisfying 1 = xp+yq.
First one notes from Lemma 6.1 that the equation 0 = x2 −c has at most two solutions
in Zp . In fact, these solutions are given by ±a, where a = c(p+1)/4 (mod p):
Similarly, the two square roots of c in Zq are ±b, where b = c(q+1)/4 (mod q). Conse-
quently, by the Chinese remainder theorem, the linear congruences
Let Fq be a publicly agreed upon field with q elements, with primitive element α.
If Alice and Bob want to agree on a common secret key, they each secretly choose
a random integer between 1 and q − 1, which they call a and b, respectively. Alice
computes and sends αa to Bob; likewise, Bob computes and sends αb to Alice. Their
common secret key is then αab . If the Diffie–Hellman assumption is correct, a third
party will be unable to determine αab from knowledge of the public keys αa and αb
alone.
In the ElGamal Cryptosystem, Alice sends a message m to Bob as the pair of
elements (αk , mαbk ), where k is a randomly chosen integer. Bob can then determine m
by dividing mαbk by (αk )b .
Bob first checks with the trusted authority to confirm that s and C(s) really belong
to Alice. But since s and C(s) aren’t secret, this doesn’t yet establish that Alice was
really the one who sent them to Bob. So Bob challenges the person claiming to be
Alice to prove that she really owns the signature s, by first sending her a random
number r ∈ Zq .
Alice now tries to prove to Bob that she owns the underlying private keys a1
and a2 , without actually revealing them to him, by choosing random numbers k1
and k2 in Zq and sending him the three numbers
y1 = k1 + a1 r (mod q),
y2 = k2 + a2 r (mod q),
γ = g1k1 g2k2 (mod p).
Bob uses the fact that g1q = g2q = 1 (mod p) and the value of s to verify in Zp that
g1y1 g2y2 sr = g1k1 +a1 r g2k2 +a2 r sr = g1k1 g2k2 = γ (mod p).
The agreement of the numbers g1y1 g2y2 sr and γ in Zp begins to convince Bob that
maybe he really is talking to Alice after all.
But the question is, is it possible that a third party, say Charlie, is impersonating
Alice, having somehow devised a clever algorithm to determine numbers y10 and y20 that
fool Bob into thinking the sender is Alice? We now show that impersonation under
the Okamoto authentication scheme is as difficult as solving the discrete logarithm
problem.
Suppose Charlie has managed to find a way to convince Bob that he is Alice. That
is, without knowing Alice’s secret key, he has figured out a clever way to generate
two exponents y10 and y20 such that
y0 y0 0
g1 1 g2 2 sr = γ (mod p),
no matter what challenge r0 Bob sends him. Suppose he uses his algorithm again, for
a distinct challenge r00 6= r0 , to determine exponents y100 and y200 :
y 00 y 00 00
g1 1 g2 2 sr = γ (mod p).
Then
y 0 −y100 y 00 −y20 r00 −r0
g1 1 = g2 2 s (mod p),
and so
(r00 −r0 )−1 (y10 −y100 ) (r00 −r0 )−1 (y20 −y200 )
g1 g2 = s.
Charlie has thus determined two exponents a01 and a02 such that
−a0 −a02
s = g1 1 g2 (mod p).
88 CHAPTER 8. CRYPTOGRAPHIC CODES
It is not hard to show, when q is sufficiently large, that for virtually all possible
challenges r0 and r00 , the resulting pair of exponents (a01 , a02 ) will be distinct from
Alice’s exponents (a1 , a2 ) (for example, see Stinson [1995]). Without loss of generality,
we can relabel these exponents so that a2 6= a02 .
Together with Alice’s original construction
s = g1−a1 g2−a2 (mod p),
Charlie’s scheme would then constitute a computationally feasible algorithm for com-
puting loga1 g2 in Zp : we would know
a −a01 a0 −a2
g1 1 = g2 2 (mod p),
so that, on using the fact that a02 6= a2 ,
logg1 g2 = (a02 − a2 )−1 (a1 − a01 ).
Note that since g1 and g2 were chosen by the trusted authority, neither Alice nor
Charlie had any prior knowledge of the value logg1 g2 .
[Lin & Daniel J. Costello 2004] S. Lin & J. Daniel J. Costello, Error Control Cod-
ing, Pearson Prentice Hall, Upper Saddle River, New
Jersey, 2nd edition, 2004.
[Ling & Xing 2004] S. Ling & C. Xing, Coding Theory: A First Course,
Cambridge Univ. Presso, Cambridge, 2004.
90
Index
91
92 INDEX