Unit Notes MAS225


Contents

1 Graph Theory: Introduction
1.1 Graphs
1.2 Isomorphism
1.3 Euler circuits
1.4 Hamilton cycles
1.5 Planar graphs
1.6 Colourings
1.7 Key Words
1.8 Solutions to Exercises

2 Graph Problems and Algorithms
2.1 Examples
2.2 Connectivity
2.2.1 The Adjacency Matrix and Connectivity
2.3 Shortest Paths
2.3.1 The Edge Weight Matrix and Shortest Paths
2.4 Euler Circuits
2.5 Hamiltonian Circuits
2.6 Degree Sequences
2.7 Matchings
2.7.1 The marriage theorem
2.8 Key Words
2.9 References
2.10 Exercises

3 Trees
3.1 Definition and essential features
3.2 Spanning trees
3.2.1 Algorithms for spanning trees
3.3 Rooted trees
3.3.1 Shortest distances
3.3.2 Tree searches
3.4 Vertex Traversals
3.4.1 Preorder Traversals
3.4.2 Postorder Traversals
3.4.3 Inorder Traversals
3.4.4 Uniqueness properties of traversals
3.5 Key Words
3.6 Exercises

4 Coding Theory
4.1 Introduction
4.1.1 Modular Arithmetic
4.1.2 Basic Definitions and Assumptions
4.1.3 Probability of Error
4.1.4 (n, m) binary block codes
4.2 Error Detection and Correction
4.2.1 Hamming Metric
4.2.2 Error Detection
4.2.3 Decoding Rule
4.2.4 Error Correction
4.2.5 Hamming Bound
4.3 Linear Codes
4.3.1 Brief overview of linear algebra for linear codes
4.3.2 Dual Codes
4.3.3 Algorithm for Finding Bases for Linear Codes
4.3.4 Generating Matrices for Linear Codes
4.3.5 Encoding with a Linear Code
4.3.6 Parity Check Matrices
4.3.7 Summary
4.4 Key Words
4.5 Exercises
4.6 Solutions to Exercises

5 Mathematical Programming
5.1 Introduction to Management Science (Taha Ch. 1)
5.2 Modelling with Linear Programming (Taha Ch. 2)
5.3 The Simplex Method and Sensitivity Analysis (Taha Ch. 3)
5.4 Duality and Post-Optimal Analysis (Taha Ch. 4)
5.5 Transportation Model (Taha Ch. 5)
5.6 Solutions to Selected Exercises

6 Networks and Project Scheduling (Taha Ch. 6)
6.1 Programme Evaluation and Review Technique (PERT)
6.2 Critical Path Method (CPM) cost model
6.3 Complex networks
Chapter 1

Graph Theory: Introduction

In mathematics the word “graph” often means a plot of a function f(x) in the x-y plane. In
this chapter we consider a different kind of graph.

1.1 Graphs
A graph is a pair of sets V and E. The elements of V are called vertices and we represent
them by points on the page. The elements of E are called edges and we represent them by lines
that join two vertices. If two vertices are joined by an edge, they are called adjacent vertices.
Some examples of graphs are shown in Figure 1.1 below. We will use these graphs to illustrate
a number of basic concepts.

Figure 1.1: Some examples of graphs


• In a drawing of a graph, the edges don’t have to be straight lines: they can be as “wiggly”
as the drawer pleases (as in G1).

• Edges are allowed to cross at points that aren’t vertices. For example, in G2 there are five
vertices. The crossing of edges in the middle of the square does not create a new vertex.

• Two vertices may be joined by more than one edge, as in G3. Two or more edges which
join the same pair of vertices are called parallel edges.

• An edge may join a vertex to itself, as in G4. Such an edge is called a loop.

• Not all vertices need to be linked together by means of edges. The graph G5 is still a
graph, even though it has three “parts” that are isolated from one another. We call G5 a
disconnected graph. The other graphs in this figure are connected because we can get
from one vertex to any other vertex by “walking” along some edges.

A simple graph is a graph that has no loops and no parallel edges. In Figure 1.1 the simple
graphs are G1 , G2 and G5 .
The degree of a vertex is the number of edges coming out of it (or going into it, depending
on how you look at it). We denote the degree of vertex v by deg(v). Note that a loop counts
two (not one) towards the degree of a vertex because both ends of the loop go into the vertex.
The total degree of a graph is the sum of all the degrees of the vertices.
We now come to our first theorem, which is also the first theorem ever proved about graphs!

Theorem 1.1 (Euler, 1736) The total degree of a graph equals twice the number of edges.

Proof. Every edge touches two vertices (a loop touches the same vertex twice), so every edge
contributes 2 towards the total degree of the graph. By adding up all the degrees of the vertices,
you count each edge twice (once at each end). Thus the total degree must equal twice the
number of edges. 

Theorem 1.1 is also known as the handshaking lemma, because of the following analogy:
Treating each vertex as a person and each edge as a handshake, we see that every handshake
involves two hands and so the total number of hands that have been shaken must be twice the
number of handshakes.

Corollary 1.2 The total degree of a graph is even.

Proof. By the handshaking lemma, the total degree is twice the number of edges. No matter
how many edges there are, twice this number is certainly even. 

Example 1.1 Is there such a thing as a graph with 5 vertices whose degrees are 1, 1, 2, 3, 4?
Solution. The total degree of such a graph would be 1 + 1 + 2 + 3 + 4 = 11, which is an
odd number. The above corollary shows that such a graph cannot exist. (By the handshaking
lemma, it would have to have 5½ edges!)

The complete graph Kn on n vertices is the simple graph in which every vertex is adjacent
to every other vertex. The first four complete graphs are shown in Figure 1.2.

Figure 1.2: The small complete graphs

Exercises
1. Is there such a thing as a graph with three vertices of degree 2 and two vertices of degree 3?
If so, draw one, and if not, explain why.

2. Is there such a thing as a graph with two vertices of degree 2, and three vertices of degree 3?
Why or why not?

3. Use graph theory to justify your answer to the following question: Is it possible in a group
of nine people for each person to know exactly three others? (Assume that if a knows b,
then b knows a.)

4. How many edges are there in Kn ?

1.2 Isomorphism
Consider the graphs in Figure 1.3.

Figure 1.3: Two very similar graphs

These graphs look more or less the same, except that one is the reflection of the other through
an invisible vertical line between them. We would like to consider these graphs as “equal” in
some way because they display exactly the same information. Put another way, by re-positioning
the vertices of G1 (and letting the edges stretch or shrink as we move these vertices), we get a
drawing of G2 .

Definition. Two graphs G and H are said to be isomorphic if they represent exactly the
same information, that is, it is possible to re-position the vertices of one to give a drawing of
the other. This means that G and H have the same number of vertices and edges and there is
a function θ from the vertex set of G to the vertex set of H such that the following conditions
hold.

(i) If u and v are not adjacent in G, then θ(u) and θ(v) are not adjacent in H.

(ii) If there are n edges between u and v in G, then there are n edges between θ(u) and θ(v)
in H.

The function θ is called an isomorphism between G and H.

To show that H is isomorphic to G we label the vertices of H and G with the same symbols
in such a way that exactly the same connections are displayed in each graph. This labelling
shows an isomorphism between the graphs because it shows how to re-position the vertices of G
to get a drawing of H.

Example 1.2 Look again at the two graphs in Figure 1.3. To show that G1 is isomorphic to G2
we use the vertex labels of G1 to re-label the vertices of G2 in such a way that the same vertex
connections occur in each graph. One possible re-labelling of G2 is shown in Figure 1.4.

Figure 1.4: Illustrating isomorphism

The new labelling of G2 shows an isomorphism because in each graph:

• a is adjacent to b and d (twice),

• b is adjacent to a, c and d,

• c is adjacent to b and d,

• d is adjacent to a (twice), b and c.



The isomorphism that this labelling yields is:

a ↔ s i.e., θ(a) = s
b ↔ r i.e., θ(b) = r
c ↔ u i.e., θ(c) = u
d ↔ t i.e., θ(d) = t

Isomorphisms are not always obvious. For example, the graphs in Figure 1.5 below seem to
be different until we have a closer look, whereupon they turn out to be isomorphic: you can
check that the labels on the second graph are consistent with the labels on the first.

Figure 1.5: Isomorphic graphs

Two graphs which are isomorphic have exactly the same structure (although they may be
drawn quite differently on the page.) Therefore, in order to show that two graphs are not
isomorphic we have to exhibit some structural feature that only one of them possesses. Of
course, we may notice many structural differences between the two graphs, but we only have to
mention one of these to show that the graphs are non-isomorphic.

Example 1.3 The graphs in Figure 1.6 below are not isomorphic because G2 has a vertex of
degree 4, but G1 doesn’t. Another reason we could have given is that G2 has two vertices of
degree 2 while G1 only has one.

Figure 1.6: Non-isomorphic graphs



Exercises
5. Draw all graphs with

(a) 1 vertex and 3 edges


(b) 2 vertices and 3 edges
(c) 3 vertices and 2 edges

Note: “all” in this sense means that your list of graphs should contain exactly one graph
of each isomorphism type.

6. Draw all simple graphs with

(a) 1 vertex
(b) 2 vertices
(c) 3 vertices
(d) 4 vertices

7. Are the graphs in Figure 1.7 isomorphic? If so, give an isomorphism, and if not, state a
structural difference.

Figure 1.7: Exercise 7

8. Are the graphs in Figure 1.8 isomorphic? Why or why not?

Figure 1.8: Exercise 8



1.3 Euler circuits


Graph theory is one of the few areas of mathematics that has a definite birth date. The first paper
on graphs was written in 1736 by the great Swiss mathematician Leonhard Euler. He began
the paper with his solution to a popular puzzle of the day, called the Königsberg Bridges
Problem. The city of Königsberg (now Kaliningrad) in East Prussia is located on the banks
and two islands of the river Pregel. The various parts of the city were joined by seven bridges
as shown in Figure 1.9.

Figure 1.9: The Königsberg bridges

The puzzle was this: Is it possible to start from one’s home (wherever that may be) and
stroll around the city, crossing each and every bridge exactly once before returning home?
Euler represented the city and its bridges as in Figure 1.10. Each part of the city is a vertex
and each bridge between parts is an edge. The question is now: Can this graph be traversed
in a single continuous path? That is, can one start at a vertex, travel along every edge of the
graph exactly once and return to the start? Euler argued that this is impossible because such a
path has to leave each vertex as many times as it enters it. Therefore each vertex needs to have
an even number of edges (half “in”, half “out”). Since the Königsberg graph does not have this
property it cannot be traversed in the desired way. This was Euler’s beautifully simple solution
to the Königsberg Bridges Problem!

Figure 1.10: The Königsberg graph

Euler realised that his argument had broader application and so he then went on to deal with
the general case: Given any graph, can one “walk” along every edge exactly once and return to
the starting vertex? Such a “walk” in a graph is now called an Euler circuit. (To be a little
more precise, you have to go through every vertex at least once too, which means disconnected
graphs automatically fail to have an Euler circuit.)

Theorem 1.3 (Euler, 1736) A graph has an Euler circuit if and only if it is connected and
every vertex has even degree.

Proof. (⇒) Suppose G is a graph with an Euler circuit. Then certainly G must be connected.
Furthermore, if v is any vertex of G, the circuit must enter v just as many times as it leaves.
Therefore v must have even degree.
(⇐) Suppose that G is a connected graph in which every vertex has even degree. Euler gave
an argument that showed that G must have an Euler circuit, but we won’t go through it here.
If you draw some connected graphs in which every vertex has even degree and try to find an
Euler circuit, you may begin to believe Euler’s claim. 

An Euler path (also known as an Euler trail) in a graph is a path that travels along every
edge exactly once but does not finish where it started. (Again, you have to go through every
vertex at least once too, so a disconnected graph can’t have an Euler path.)

Theorem 1.4 A graph has an Euler path if and only if it is connected and has exactly two
vertices of odd degree.

In a connected graph in which exactly two vertices have odd degree, any Euler path must
start at one of these vertices and finish at the other. Notice that the Königsberg graph doesn’t
even have an Euler path because all of its vertices have odd degree! However, if one more bridge
were built (it doesn’t matter where) then the graph would possess an Euler path. Alternatively,
if one bridge were knocked down there would also be an Euler path!

Exercises
9. Which of the graphs in Figure 1.1 have an Euler circuit and which have an Euler path?

10. For which values of n does Kn have an Euler circuit?

11. For which values of n does Kn have an Euler path?

1.4 Hamilton cycles


A Hamilton cycle in a graph is a “walk” that visits every vertex exactly once and returns
to its starting vertex. A Hamilton path is the same thing except that you don’t return to
the starting vertex. These ideas are named after the Irish mathematician William Hamilton,
who first considered them in 1856, although they had been investigated earlier by Euler (1759),
Vandermonde (1771) and Kirkman (1856).
Figure 1.11 below contains some graphs. The heavy lines in G1 and G2 show a Hamilton
cycle in each. The graph G3 is called the Petersen graph and, surprisingly, does not have a
Hamilton cycle. (Go on, try to find one!) However, G3 does contain Hamilton paths.

Figure 1.11: Two graphs with Hamilton cycles and one without

Hamilton cycles seem very much related to Euler circuits: the former visit every vertex once,
the latter use every edge once. Euler’s Theorem 1.3 gives a very simple property for testing
whether a graph has an Euler circuit or not. Naturally we wonder whether there is a similar
test to decide whether there is a Hamilton cycle or not. Nobody has ever found such a test, and
anybody who does will become very famous!
We now present the three oldest theorems about Hamiltonian graphs (i.e., graphs with
Hamilton cycles).

Theorem 1.5 (Dirac, 1952) Let G be a connected simple graph with n ≥ 3 vertices. If every
vertex has degree ≥ n/2, then G is Hamiltonian.

Theorem 1.6 (Erdős and Gallai, 1959) Let G be a connected simple graph with n ≥ 3 vertices.
If one vertex has degree ≥ 2, and if every other vertex has degree ≥ n/2, then G is Hamiltonian.

Theorem 1.7 (Ore, 1960) Let G be a connected simple graph with n ≥ 3 vertices. Suppose
that for every pair of non-adjacent vertices u and v, we have deg(u) + deg(v) ≥ n. Then G is
Hamiltonian.

Dirac’s theorem is superseded by each of the other two, but neither of these two supersedes
the other. For example, consider the graphs in Figure 1.12 below.

Figure 1.12: Two Hamiltonian graphs



Theorem 1.6 tells us that G1 is Hamiltonian, but doesn’t help with G2 (because there is
more than one vertex of degree < n/2). On the other hand, Theorem 1.7 tells us that G2 is
Hamiltonian, but doesn’t help with G1 (because deg(a) + deg(b) < 6).
There are many other theorems that go: “If G has certain properties, then G is Hamiltonian”.
But such a theorem is only “half” a test of Hamiltonicity. It can tell us if there is a Hamilton
cycle, but doesn’t tell us anything if G doesn’t have the “certain properties”. For example,
the graph in Figure 1.13 is obviously Hamiltonian but none of the above theorems detect this
because the graph doesn’t have the relevant properties.

Figure 1.13: A “hard-to-detect” Hamiltonian graph

No-one has ever discovered a theorem that goes: “G is Hamiltonian if and only if G has such-
and-such a list of properties”. This would be a full test of Hamiltonicity and would guarantee
instant fame for the discoverer!
The travelling salesperson problem is a practical problem related to the search for
Hamilton cycles in graphs. Refer to the graph in Figure 1.14 below.

Figure 1.14: A travelling salesperson problem

In this graph the vertices represent cities that a travelling salesperson must visit. The edges
between vertices represent direct air-routes or major roads. The numbers, or weights, attached
to each edge represent the efficiency of travelling along that route, where lower numbers mean
higher efficiency. For example, they might be distances, or travelling times, or ticket costs. The
travelling salesperson problem is this: What is the most efficient way for the salesperson to visit
every city in such a way that he begins and ends at his home base? Since it is inefficient to visit
any city more than once, we see that the salesperson needs to travel along a Hamilton cycle in
the graph! If there are several different Hamilton cycles, then the salesperson wants to choose
one which is “shortest”, that is, one whose edge-weights add up to the smallest number possible.
In the graph in Figure 1.14 there are two Hamilton cycles: the one whose vertex sequence is
a b c d e f a with length 99, and the one whose vertex sequence is a c b d e f a
with length 96.
Nobody has yet found a fast algorithm for solving the travelling salesperson problem. (There
are many companies that would be interested in such an algorithm!) The algorithms that are
used in practice don’t actually find the shortest Hamilton cycle (because nobody has found a fast
way of doing this). Instead they find a Hamilton cycle that is guaranteed (by the theory behind
the algorithm) to be “close” to the shortest possible. There are a number of such algorithms
that are reasonably fast.

Exercises
12. For which values of n is Kn Hamiltonian?

13. Find a Hamilton cycle in the graph in Figure 1.15.

Figure 1.15: Exercise 13

14. Solve the travelling salesperson problem for the graph in Figure 1.16. That is, find a
shortest Hamilton cycle.

Figure 1.16: Exercise 14



1.5 Planar graphs


A plane graph is a graph whose edges don’t cross (except where they meet at vertices). A
planar graph is a graph that is isomorphic to a plane graph. In other words, a planar graph is
one that can be re-drawn (if necessary) so that its edges don’t cross. In Figure 1.17 the graph
G1 is plane, and the graph G2 is planar because it is isomorphic to G1 , via the labelling shown.

Figure 1.17: Two planar graphs

A plane graph divides the page into faces, including the infinite “exterior” face. For example,
the graph in Figure 1.18 below has 6 faces, as shown by the circled numbers.

Figure 1.18: A planar graph with faces marked

Euler proved the following very famous theorem about plane graphs.

Theorem 1.8 (Euler, 1750) If G is a connected plane graph with v vertices, e edges and f faces
(including the exterior face), then
v − e + f = 2. (1.1)

Equation (1.1) is known as Euler’s polyhedron formula. You should check on a few
graphs that this formula actually works; for example, a plane drawing of K4 has v = 4, e = 6
and f = 4 (three triangular faces plus the exterior face), and indeed 4 − 6 + 4 = 2. There are
many consequences of the formula. Here are two:

Corollary 1.9 If G is a connected planar graph, then no matter how G is drawn in the plane,
the number of faces is always the same, namely f = e − v + 2.

Theorem 1.10 If G is a connected plane simple graph with v ≥ 3 vertices, e edges and f faces,
then e ≤ 3v − 6.

This last theorem is very useful for showing that certain graphs are not planar.

Theorem 1.11 K5 is not planar.

Proof. We can use Theorem 1.10, because K5 is certainly connected and simple. So let’s
suppose that K5 is planar. Then by the above theorem, we must have e ≤ 3v − 6, that is,
e ≤ 9. But in K5 there are 10 edges, and so we have a contradiction. Therefore K5 cannot be
planar after all.

You should note that Theorem 1.10 says: “If G has certain properties, then e ≤ 3v − 6”.
This does not mean the same thing as “If e ≤ 3v − 6, then G is planar”. (Recall from the chap-
ter on propositional logic in MAS162 Foundations of Discrete Mathematics that a conditional
statement p → q is not equivalent to its converse q → p.) For example, the Petersen graph
(graph G3 in Figure 1.11) satisfies e ≤ 3v − 6, but it can be shown that it is non-planar.

Exercises
15. In Figure 1.19 there are two graphs. One is planar and one is not. Give a plane drawing
of the planar one, and explain why the other is non-planar.

Figure 1.19: Exercise 15

16. From Lewis Carroll’s “Alice’s Adventures in Wonderland”: Three neighbours use the same
water, oil and treacle wells. Unfortunately they dislike each other so much that in order to
avoid meeting they want to find non-crossing paths from each of their houses to each of the
three wells. The problem is: Can this be done? Express the problem in graph theoretic
terms. Do you think there is a solution?

1.6 Colourings
When we have a map before us, we think of the interior faces as being countries or states and
the exterior face as being the ocean. In a good atlas, the countries, including the ocean, are
coloured with different colours to distinguish them from one another. This means that countries
with a shared border must have different colours.
In about 1850, Francis Guthrie, a mathematics student in England, wondered if there was
a number n such that every map could be coloured in ≤ n colours, and if there was such a
number, what was it? After playing with some maps, he guessed that there was such a number
and that it was equal to four: every map he drew only ever needed four colours (or less). A
couple of years later he happened to mention this observation to his younger brother Frederick
(who was studying physics). Frederick tried to invent a map needing five colours and found
that he couldn’t, yet neither could he convince himself that you only ever needed four. So he
asked a teacher of his, De Morgan, why you only needed four colours. De Morgan considered
this problem for a while but was also unable to prove that you only needed four colours. In
October 1852 he wrote to Hamilton asking whether he could solve the problem. Hamilton was
not interested in this problem, and it lay dormant for many years. Then in 1878, the famous and
respected English mathematician Cayley made the problem more widely known by mentioning
it in a seminar. The next year, Kempe gave a proof of the “four-colour theorem”: Every map
can be coloured with four or fewer colours.
However, in 1890 Heawood discovered an error in Kempe’s proof. Despite his best attempts,
Heawood could not “fix” this error. The best he could do was prove the five-colour theorem:
Every map can be coloured with five or fewer colours. So the four-colour “theorem” was no
longer a theorem! It became the “four-colour conjecture” again. The conjecture is so simple
to state that it serves well as a recreational puzzle: either prove the conjecture or invent a
map that really needs five colours. As more and more people played with this puzzle, the
mathematical community started to realise that this was a very difficult problem! Many very
talented mathematicians had tried to solve it and failed. Why was it so hard? Was it even true?
It certainly “seemed” true, but maybe there is a very large map, with thousands of countries,
that needs five colours?
For decades the four-colour conjecture was one of the most famous and difficult unsolved
problems in mathematics. Finally though, in 1976, Appel and Haken announced a proof of the
conjecture. No-one has ever found a mistake with their proof, so we do indeed now have a
four-colour theorem. But some mathematicians are dissatisfied with the proof: this is because
it relies on very complicated calculations done by a computer (following the algorithm designed
by Appel and Haken). In fact, there are so many calculations that no human could ever check
them all! The only way to check the proof is to gain an understanding of the algorithm, write
your own program from it, and see if your program comes to the same conclusion as the original.
This has already been done. In fact, a simpler proof using the same ideas and still relying on
computers was published in 1997 by Robertson, Sanders, Seymour, and Thomas. Later on,
in 2005, the theorem was proved by Georges Gonthier with general purpose theorem proving
software.
Nobody disputes the validity of the original proof by Appel and Haken, but many wish there
was a shorter one, one that they could sit down with, read and understand. Anybody who comes
up with a proof like this (even if it is dozens of pages long) would become very famous! Until
that day (if it ever comes) we are faced with the following unsettling fact: A mathematical truth
that is easy to state is not necessarily easy to prove. (Indeed, it may be extremely complicated
to prove!)
Well, now it is time to leave the history and start the mathematics of map colouring. First
we learn how to convert a map to a graph. Let M be a map with countries c1 , c2 , . . . (the
ocean is regarded as a country too). The graph GM representing M is obtained as follows: each
country ci of M corresponds to a vertex vi of GM , and two vertices vi and vj of GM are joined by
an edge if and only if the corresponding countries ci and cj share a border. Note that countries
that share only a single point of a border are not considered to be sharing a border (because
these two countries can be coloured with different colours) and so the corresponding vertices are
not joined in GM . A map and its graph are shown in Figure 1.20.

Figure 1.20: A map and its graph

Now we see that a colouring of the countries of a map M corresponds to a colouring of the
vertices of its graph GM so that adjacent vertices have different colours. We can extend this
idea to any graph (not all graphs are graphs of maps) as follows: A colouring of a graph is an
assignment of colours to its vertices in such a way that no two adjacent vertices have the same
colour. A graph is called k-colourable if it can be coloured with ≤ k colours. The chromatic
number of a graph is the least number of colours needed to colour it. So if a graph can be
coloured with 5 colours but not with 4, then its chromatic number is 5. Note that if the graph
has a loop, then it is impossible to colour the graph, whereas the presence of parallel edges is
irrelevant to the colouring. Some colourings are shown in Figure 1.21 below, where we have used
the numbers 1, 2, 3, . . . to represent different colours.

Figure 1.21: Some graph colourings

You should be able to convince yourself that the graph of any map must be planar (plane,
in fact). The four-colour theorem for maps tells us that every graph that is the graph of a map
is 4-colourable. What about planar graphs in general? In fact, the same result holds for them
too:

Theorem 1.12 (The four colour theorem for graphs) If G is a planar graph without loops, then
G is 4-colourable.

Exercises
17. Find the chromatic numbers of the graphs in Figure 1.1, except for G4 .

18. How many colours are needed for the map in Figure 1.20?

19. What is the chromatic number of Kn ?

20. Draw a graph with 6 vertices, each of degree 4, and whose chromatic number is 3.

1.7 Key Words


A list of keywords and concepts appears below. Use it to focus your revision of the material
covered in this chapter.
k-colourable graph
adjacent vertices
chromatic number
colouring of a graph
complete graph Kn
degree of a vertex
edges
Euler circuit
Euler path
Euler’s polyhedron formula
faces
graph
graph isomorphism
Hamilton cycle
Hamiltonian graphs
Hamilton path
handshaking lemma
Königsberg Bridges Problem
Petersen graph
planar graph
plane graph
simple graph
total degree of a graph
travelling salesperson problem
vertices
weights

1.8 Solutions to Exercises


1. Yes. An example is given in Figure 1.22.

Figure 1.22: Solution to Exercise 1

2. The total degree of such a graph would be 4 + 9 = 13. Since this is odd, there can be no
such graph.

3. Let G be a graph where each vertex represents one person, and an edge between vertices
means that the two people know each other. Then G has 9 vertices and each vertex has
degree 3. So the total degree of G is 27, which is odd. This is impossible. So the answer
to the original question is “No”.

4. There are n vertices, each of degree n − 1. So the total degree is n(n − 1), and therefore,
by the handshaking lemma, there are n(n − 1)/2 edges in Kn.

5. (a) There is 1 such graph.


(b) There are 6 such graphs.
(c) There are 5 such graphs.

6. (a) There is 1 such graph.


(b) There are 2 such graphs.
(c) There are 4 such graphs.
(d) There are 11 such graphs.

7. The graphs are isomorphic. This is shown by the labelling in Figure 1.23.

Figure 1.23: Solution to Exercise 7



8. The graphs are not isomorphic. Here are some structural differences, any one of which
suffices to prove non-isomorphism.

• In G2 there is a vertex of degree 2 that is adjacent to another vertex of degree 2,
whereas in G1 no vertex of degree 2 is adjacent to another vertex of degree 2.

• In G2 the four vertices of degree 3 make a square, whereas in G1 they don’t. (They
form two copies of K2.)

• G2 has a face with eight sides, whereas G1 does not.

9. The only graph with an Euler circuit is G3 . The graphs with Euler paths are G2 and G4 .

10. Kn has an Euler circuit if and only if n is odd.

11. K2 is the only complete graph with an Euler path.

12. Kn is Hamiltonian if and only if n ≥ 3. (Some mathematicians also regard K1 as
Hamiltonian.)

13. A Hamilton cycle is shown in Figure 1.24.

Figure 1.24: Solution to Exercise 13

14. Two different Hamilton cycles of length 26 are shown in Figure 1.25. These are the shortest
I can find, but if you can find a shorter one please let me know! (By the way, there are 24
different Hamilton cycles in K5 .)

Figure 1.25: Solution to Exercise 14

15. The graph on the left is planar. A plane drawing is shown in Figure 1.26 below. Now
consider the graph on the right. If this graph were planar, then it would satisfy e ≤ 3v − 6,
that is, e ≤ 15. However, we can see from the graph that e = 16. Therefore this graph
must be non-planar.

Figure 1.26: Solution to Exercise 15

16. Make a graph as follows. There is a vertex for each house and each well, and we join each
house to each well (representing the paths to the wells). The graph is shown in Figure 1.27
below.

Figure 1.27: Solution to Exercise 16

The problem is: Is this graph planar? It can be proved that the graph is not planar, but
we haven’t covered sufficient theory to do this.

17. G1 and G2 have chromatic number 4, and G3 and G5 have chromatic number 3.

18. Four.

19. n.

20. One such graph, with a 3-colouring, is shown in Figure 1.28.

Figure 1.28: Solution to Exercise 20


Chapter 2

Graph Problems and Algorithms

There are lots of interesting problems associated with graphs that are easy to state and whose
solutions have useful applications. We have seen some of these in the previous chapter. In this
and the next chapter we shall examine several important problems and describe algorithms that
can be used to solve them. We also show how these algorithms can be implemented using the
MATLAB computer package.

2.1 Examples
Some examples of graphs are shown below. Diagram (a) might be a road map showing towns
and distances between them, diagram (b) is showing the possible outcomes when a coin is tossed
three times, diagram (c) is a simple flowchart, and diagram (d) shows the chemical composition
of a molecule of normal butane. (Each C stands for a carbon atom, each H stands for a hydrogen
atom, and the lines represent chemical bonds.)

Figure 2.1: Examples of graphs


Figure 2.2: More examples of graphs

Diagrams such as those appearing in Figures 2.1 and 2.2 are useful for displaying graphs with
just a small number of vertices and edges. However if the graph is large, or if it is to be analysed
by a computer, then it must be represented in other ways. One way is simply to list all vertices
and edges. For example, a vertex list and an edge list for the graph in Figure 2.2 (c) are:

VERTEX LIST : µ, ν, ρ, σ, τ
EDGE LIST : {µ, ρ}, {µ, ν}, {µ, ν}, {ρ, σ}, {σ, σ}, {σ, τ }, {τ, ν}

An edge is specified simply by giving the two vertices that it joins. The set braces “{” and
“}” are used to indicate that the ordering does not matter. For example, the edge joining µ and
ρ could be listed as {µ, ρ} or as {ρ, µ}.
Because the vertices labelled µ and ν in the graph (c) are joined by two edges, {µ, ν} appears
twice in the edge list. Recall that edges such as these which join the same vertices are called
parallel edges.
An edge may start and finish at the same vertex. Recall that such an edge is called a loop.
The loop attached to the vertex labelled σ in the graph (c) appears as {σ, σ} in the edge list.
Recall that a simple graph is one which has no parallel edges and no loops.
In some situations, for example in the flowchart in Figure 2.1 (c), it is useful to specify a
direction for each edge of a graph. Directions are usually indicated by arrows. Graphs in which
the edges are directed are called directed graphs or digraphs. In a digraph the edges are
listed as ordered pairs. For example (α, β) would denote an edge starting at a vertex labelled α
and finishing at one labelled β. The edge (β, α) goes in the reverse direction.
In example (a) of Figure 2.1 and in many other graphs there are numbers attached to the
edges. These numbers are called weights and can have various interpretations. A graph in
which the edges have weights is called a weighted graph.

2.2 Connectivity
A sequence of vertices v0, v1, v2, . . . , vn in a graph lies on a path if {v0, v1}, {v1, v2}, {v2, v3}, . . . , {vn−1, vn}
are all edges in the graph. Usually the sequence of edges {v0, v1}, {v1, v2}, {v2, v3}, . . . , {vn−1, vn}
is called the path, and v0, v1, v2, . . . , vn is called its vertex sequence. Both vertices and edges
may be repeated in a path. Thus in the graph in Figure 2.2 (a), α, γ, δ, ε, γ, δ, α, β is the vertex
sequence of a path in which one edge appears twice and three vertices are visited twice.

A path may have just one edge, or it may have more than one edge. The number of edges
in a path is called the path length.
A vertex v in a graph is joined by a path to another vertex w if there is a path which
starts at v and finishes at w. It is also convenient to regard each vertex v as being joined by a
path to itself, even if there is no loop or other path starting and finishing at v. A graph is said
to be connected if each vertex is joined by a path to each other vertex.
How can we decide if a graph is connected?
If the graph is small enough we can see if it is connected by glancing at its picture. What
we are looking for is a systematic way of deciding that can be applied to graphs of any size and
can be implemented on a computer.

2.2.1 The Adjacency Matrix and Connectivity


To obtain the adjacency matrix of a graph G, directed or undirected, we start with the vertices
numbered 1, 2, 3, . . . , n. If the graph is directed, the adjacency matrix M(G) = [mij] is the
n × n matrix in which mij (the entry in the i-th row and the j-th column) is the number of
edges (if any) which start at vertex i and end at vertex j. In an undirected graph we set both mij
and mji equal to the number of edges joining vertices i and j, and hence the adjacency matrix
is symmetric.
In a simple graph 0 and 1 are the only possible values of mij ; larger values occur in graphs
with parallel edges. Non-zero entries on the main diagonal of M correspond to loops.
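
As a small illustration (a sketch of ours, not one of the unit's printed programs), the following
MATLAB fragment builds the adjacency matrix of the undirected graph in Figure 2.2 (c) from
its vertex and edge lists in Section 2.1, numbering the vertices µ, ν, ρ, σ, τ as 1, . . . , 5:

% Build the adjacency matrix of the graph in Figure 2.2 (c) from its edge list
n=5;                                         % vertices mu,nu,rho,sigma,tau = 1..5
edges=[1 3; 1 2; 1 2; 3 4; 4 4; 4 5; 5 2];   % one row per edge
M=zeros(n);
for k=1:size(edges,1)
i=edges(k,1); j=edges(k,2);
if i==j
M(i,i)=M(i,i)+1;   % a loop: stored once on the diagonal here
else
M(i,j)=M(i,j)+1;   % parallel edges accumulate
M(j,i)=M(j,i)+1;   % undirected, so keep M symmetric
end
end
disp(M)

Conventions for loops vary; if instead a loop is stored as a 2 on the diagonal, then each row
sum of M is exactly the degree of the corresponding vertex.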
A non-zero entry in the adjacency matrix shows that certain vertices are joined by edges.
Non-zero entries in powers of the adjacency matrix show that there are longer paths in the
graph. In fact, the formula for matrix multiplication can be used to show that

the entry in the i-th row and the j-th column of M(G)^k, the k-th power of the adjacency
matrix, equals the number of paths of length k starting at vertex i and ending at vertex j.

Hence questions about connectivity in a graph can be answered using the adjacency matrix.
If we find a non-zero entry in the (i, j) position of any one of the powers M, M^2, M^3, . . . , then we
know that the i-th vertex is connected to the j-th vertex. Of course we need look no higher than
M^(n−1), where n is the number of vertices, because if there is any path connecting two vertices
then there is a path of length n − 1 or less.

Example 2.1 Consider the graph G1 in Figure 1.3. Using the obvious ordering of the
vertices a, b, c, d, the adjacency matrix is

M =

0 1 0 2
1 0 1 1
0 1 0 1
2 1 1 0
The following MATLAB program connect.m computes the powers of the adjacency ma-
trix and determines if the graph is connected.

% Compute powers of the adjacency matrix of a graph to determine the number of joining paths
% between two given vertices and determine if the graph is connected
clear
% Enter the adjacency matrix:
M=[0 1 0 2;
1 0 1 1;
0 1 0 1;
2 1 1 0]
[m,n]=size(M);
if m~=n
disp('Adjacency matrix must be square')
return
end
MM=M;
for k=2:n-1
disp(['M to the power ' num2str(k) ' equals:'])
Mk=M^k;
disp(Mk)
MM=[MM; Mk];
end
% Enter two vertices to test if joined by a path
vert1=input('Enter number of first vertex: ');
vert2=input('Enter number of second vertex: ');
join=[];
for k=1:n-1
join=[join MM((k-1)*n+vert1,vert2)];
end
% Display table of number of paths joining vertices
disp('length k   number of joining paths of length k')
disp([[1:n-1]' join'])
% Determine if graph is connected
for i=1:n
for j=1:n
Mjoins(i,j)=0;
for k=1:n-1
Mjoins(i,j)=Mjoins(i,j)+MM((k-1)*n+i,j);
end
end
end
disp('Each entry in the following matrix equals')
disp(['the total no. of joining paths of length <= ' num2str(n-1)])
disp('for the corresponding pair of vertices:')
disp(Mjoins)
if all(Mjoins)==1
disp('The graph is connected')
else
disp('The graph is not connected')
end

Exercise Run the program connect.m, making sure you understand the meaning of each
statement. The output of the program is shown below.

M =

0 1 0 2
1 0 1 1
0 1 0 1
2 1 1 0

M to the power 2 equals:


5 2 3 1
2 3 1 3
3 1 2 1
1 3 1 6

M to the power 3 equals:


4 9 3 15
9 6 6 8
3 6 2 9
15 8 9 6

Enter number of first vertex: 2


Enter number of second vertex: 3
length k number of joining paths of length k
1 1
2 1
3 6

Each entry in the following matrix equals


the total no. of joining paths of length <= 3
for the corresponding pair of vertices:
9 12 6 18
12 9 8 12
6 8 4 11
18 12 11 12

The graph is connected

By inspection of the graph G1 we can list the joining paths from vertex b to vertex c as
follows:

We can show that C = I + B(G) + B(G)^[2] + B(G)^[3] + · · · + B(G)^[n−1], where I is the identity
matrix, B(G) is the Boolean adjacency matrix of G (with a 1 wherever at least one edge joins
the corresponding vertices, and 0 elsewhere), and B(G)^[2], B(G)^[3], . . . are the Boolean powers
of B(G).
We can get an idea of how quick it is to compute the connectivity matrix C(G) by considering
the number of elementary computations needed to calculate it. The Boolean product of two
n × n Boolean matrices requires n checks per entry: an entry in the product is zero if and only
if there are no 1’s in corresponding entries of the relevant row and column of the matrices being
multiplied. But there are n^2 entries, giving n^3 checks altogether, and we say that the method has
order of n^3 efficiency: the number of steps taken to perform the method (generally considered
a measure of time taken) is a third degree polynomial in the parameter n (usually the number
of vertices in the graph in what follows).
Similarly, it is not hard to see that computing C(G) has order of n^4 efficiency, because the
(n − 1)-st Boolean power of B(G) is needed. We say that computing C(G) takes polynomial
time, meaning that a polynomial function of n gives a measure of the number of elementary
computations or tests needed.
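
As a concrete sketch (ours, using graph G1 of Figure 1.3 as test data), the connectivity matrix
can be computed in MATLAB with logical arrays standing in for Boolean matrices: the Boolean
product is an ordinary matrix product with every non-zero entry replaced by 1, and the Boolean
sum is an elementwise OR.

% Connectivity matrix C(G) = I + B(G) + B(G)^[2] + ... + B(G)^[n-1]
M=[0 1 0 2; 1 0 1 1; 0 1 0 1; 2 1 1 0];  % adjacency matrix of G1 in Figure 1.3
n=size(M,1);
B=M>0;                 % Boolean adjacency matrix B(G)
C=logical(eye(n));     % start from I: each vertex is joined to itself
Bk=B;                  % current Boolean power of B(G)
for k=1:n-1
C=C|Bk;                            % Boolean sum: elementwise OR
Bk=(double(Bk)*double(B))>0;       % Boolean product: non-zero entries become 1
end
disp(double(C))
if all(C(:))
disp('The graph is connected')
else
disp('The graph is not connected')
end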

2.3 Shortest Paths


In a weighted graph the sum of the weights on the edges of a path is called the path weight.
If the weights on the edges represent distances, then the weight of a path is the total distance
along that path.
The problem of finding, amongst all possible paths which start and finish at two given vertices
in a graph, the one which has the smallest path weight, is known as the shortest path problem.

2.3.1 The Edge Weight Matrix and Shortest Paths


We can store the edge weights in a weighted graph as entries of a matrix. The edge weight
matrix of a simple weighted digraph G is the matrix W(G), whose entries wij are defined as
follows:

wij = 0, if i = j;
wij = the weight of the edge from vertex i to vertex j, if such an edge exists;
wij = ∞, if there is no edge from vertex i to vertex j.

We make the obvious modifications if the graph is undirected. If we want to store W (G) in
a computer, we replace ∞ by a very large number, i.e., any number much larger than any of the
weights in G.
If we regard the edge weights as distances, then the shortest path between two vertices is
the path from one vertex to the other with smallest path weight. We define W*(G), the matrix
of shortest distances in G, as the matrix whose entries are the weights of the shortest paths
between pairs of vertices in G. (If there is no path from a vertex i to a vertex j the corresponding
entry in W*(G) is ∞.)
There are several ways of obtaining W* from the edge weight matrix W. One of these is
Warshall’s algorithm.

% warshall.m (continued): the standard triple loop. The opening lines, which
% enter the edge weight matrix W and set n to its dimension, precede this.
for k=1:n
for i=1:n
for j=1:n
if W(i,j) > W(i,k)+W(k,j)
W(i,j) = W(i,k)+W(k,j);
end
end
end
end
disp('Matrix of shortest distances: ')
disp(W)

The output of the program is shown below.

Edge weight matrix:

W =

0 360 440 Inf Inf Inf


360 0 530 190 230 Inf
440 530 0 Inf 460 240
Inf 190 Inf 0 270 Inf
Inf 230 460 270 0 300
Inf Inf 240 Inf 300 0

Matrix of shortest distances:


0 360 440 550 590 680
360 0 530 190 230 530
440 530 0 720 460 240
550 190 720 0 270 570
590 230 460 270 0 300
680 530 240 570 300 0

Thus the shortest distance from vertex 1 to vertex 5 is the element W*(1, 5) = 590, which can
easily be checked by inspection of the graph.

2.4 Euler Circuits


Recall that Euler’s solution to the Königsberg Bridges Problem was based on a very simple idea,
namely the concept of vertex degree. The degree of a vertex in an undirected graph is the
number of edges to which it is joined. (Loops are counted twice.)
Each visit to a vertex on an edge tour of a graph allows us to pair two of the edges joined
to that vertex: we pair the edge by which we reach the vertex to the edge by which we leave.
If the edge tour is an Euler circuit, then each edge belongs to just one such pair. This tells us
that the degree of each vertex must be even if there is to be an Euler circuit. This condition is
not only necessary but sufficient.

Theorem 2.1 (Euler’s Theorem) An undirected graph has an Euler circuit if and only if it
is connected and each vertex has even degree.

% fleury.m continues below; the adjacency matrix M has already been entered.
[m,n]=size(M);
if m ~= n
disp('Adjacency matrix must be square')
return
end
if ~isequal(M',M)
disp('Adjacency matrix must be symmetric')
return
end
% Check the necessary and sufficient condition for existence of an Euler circuit
deg=sum(M);
vertodd=find(rem(deg,2) ~= 0);
if sum(vertodd > 0)
disp(['Vertex no. ' num2str(vertodd) ' has odd degree so there is no Euler circuit'])
return
end
% Initialise vertex and edge sequences
v0=1;
VS=v0, ES=[];
v=v0;
% The degree deg(v) equals sum of v'th row
degv=sum(M(v,:));
while degv > 0
if degv == 1
w=find(M(v,:)>0);
e=[v w];
% Last edge at v: remove v by zeroing its row and column
M(v,:)=zeros(1,n);
M(:,v)=zeros(n,1);
end
if degv > 1
e=input(['Enter edge [' num2str(v) ' ?] whose removal does not disconnect graph: ']);
w=e(2);
end
ES=[ES; e]
if M(v,w)<=1
M(v,w)=0;
else
M(v,w)=M(v,w)-1; % in case of parallel edges or loops
end
if M(w,v)<=1 % symmetry
M(w,v)=0;
else
M(w,v)=M(w,v)-1;
end
v=w;
VS=[VS v]
degv=sum(M(v,:));
end

The output of the program is shown below. Notice how the edge sequence ES and vertex
sequence VS for the Euler circuit grow as new edges are chosen. Follow the progress of the
algorithm on the graph in Figure 2.1(a). Note that when we reach vertex 3 we can’t choose edge
[3 1] because its removal would disconnect the current graph.

M =
0 1 1 0 0 0
1 0 1 1 1 0
1 1 0 0 1 1
0 1 0 0 1 0
0 1 1 1 0 1
0 0 1 0 1 0
VS =
1
Enter edge [1 ?] whose removal does not disconnect graph: [1 2]
ES =
1 2
VS =
1 2
Enter edge [2 ?] whose removal does not disconnect graph: [2 3]
ES =
1 2
2 3
VS =
1 2 3
Enter edge [3 ?] whose removal does not disconnect graph: [3 6]
ES =
1 2
2 3
3 6
VS =
1 2 3 6
ES =
1 2
2 3
3 6
6 5
VS =
1 2 3 6 5
Enter edge [5 ?] whose removal does not disconnect graph: [5 2]
ES =
1 2
2 3
3 6
6 5
5 2

VS =
1 2 3 6 5 2
ES =
1 2
2 3
3 6
6 5
5 2
2 4
VS =
1 2 3 6 5 2 4
ES =
1 2
2 3
3 6
6 5
5 2
2 4
4 5
VS =
1 2 3 6 5 2 4 5
ES =
1 2
2 3
3 6
6 5
5 2
2 4
4 5
5 3
VS =
1 2 3 6 5 2 4 5 3
ES =
1 2
2 3
3 6
6 5
5 2
2 4
4 5
5 3
3 1
VS =
1 2 3 6 5 2 4 5 3 1

Exercise The program fleury.m above requires the user to enter a new edge that does not
disconnect the current graph. See if you can make this part automatic by incorporating a test
using the code in the program connect.m in Section 2.2.

2.5 Hamiltonian Circuits


There is no known simple way of checking whether a graph has a Hamiltonian circuit or path.
Why should Hamiltonian circuits be so much more difficult to find than Euler circuits? These
unexplained differences are some of the reasons why some people find graph theory fascinating.
As we have seen in the previous chapter, there are some known sufficient conditions for the
existence of Hamiltonian circuits. One of these, of somewhat limited usefulness, appears below.

Theorem 2.2 A simple graph has a Hamiltonian circuit if it has n vertices, where n > 2, and
if each vertex has degree n/2 or more.
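
The condition of Theorem 2.2 is easy to test mechanically. A minimal MATLAB sketch
(ours; the adjacency matrix shown is just an illustration, here K4):

% Sufficient (but not necessary) test for a Hamiltonian circuit (Theorem 2.2)
M=[0 1 1 1; 1 0 1 1; 1 1 0 1; 1 1 1 0];  % adjacency matrix of a simple graph
n=size(M,1);
if n>2 && all(sum(M)>=n/2)               % for a simple graph, sum(M) gives degrees
disp('Theorem 2.2 applies: the graph has a Hamiltonian circuit')
else
disp('Theorem 2.2 gives no information about this graph')
end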

The travelling salesperson problem is another for which no quick and systematic method of
solution is known. For graphs with just a few vertices or few edges it may be feasible to write
down all possible Hamiltonian circuits. We could then find the circuit with minimum weight
simply by calculating the path weights of all circuits. However in larger graphs this is not
feasible. For example, in a graph with 20 vertices there may be more than 6 × 10^16 circuits
to be checked.
In general, to solve the travelling salesperson problem by a direct “brute force” method would
require that every possible listing of the n vertices be tested as a Hamiltonian circuit. There
are n! such listings (one for each permutation of the vertices), and each one requires a check of
whether consecutive vertices have an edge between them, so the method has order at least n!
efficiency. For large values of n, this is much bigger than any polynomial function of n. It is not
known whether there exists an algorithm with polynomial efficiency which solves the travelling
salesperson problem, and it is a remarkable fact that if such an algorithm exists, then every
problem in a very large class of problems (the class now known as NP) could also be solved by
a polynomial-time algorithm. In this sense the travelling salesperson problem is
“at least as hard” as any other problem in that class.
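
A direct implementation of this brute-force method makes the n! barrier vivid. In the MATLAB
sketch below (ours; the edge weight matrix is illustrative only), the starting vertex is fixed at 1,
which cuts the search down to (n − 1)! candidate circuits:

% Brute-force travelling salesperson: test every listing of the vertices
W=[0 3 5 4; 3 0 2 7; 5 2 0 6; 4 7 6 0];  % edge weight matrix, Inf = no edge
n=size(W,1);
P=perms(2:n);                  % fix vertex 1 as the start: (n-1)! tours
best=Inf; besttour=[];
for r=1:size(P,1)
tour=[1 P(r,:) 1];             % candidate Hamiltonian circuit
len=0;
for k=1:n
len=len+W(tour(k),tour(k+1));  % Inf here would mean a missing edge
end
if len<best
best=len; besttour=tour;
end
end
disp('Shortest Hamiltonian circuit found, and its length:')
disp(besttour), disp(best)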

2.6 Degree Sequences


The problem of deciding whether two given graphs are isomorphic is another apparently very
difficult one. No one yet knows of a simple test for isomorphism that is universally applicable.
Our purpose here is to examine another isomorphism invariant for graphs. This is not a
sufficient test for isomorphism, but it is quite useful for distinguishing between isomorphism
classes, at least for small graphs.
The degree sequence of an undirected graph is a list of the degrees of the vertices of
the graph, written out in non-decreasing order. The degrees of vertices are unchanged by
isomorphism, and so is the degree sequence. This particular isomorphism invariant is relevant
for graphs of chemical molecules, because atoms of a particular type usually have a fixed number
of chemical bonds.
Each of the two graphs in Figure 2.6 below has degree sequence 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
4, 4, 4, 4, but they are non-isomorphic. They represent molecules of the different types of chemical
compound butane. These different types are called isomers, and have slightly different physical
properties. Is there any other isomer of butane? In graphical terms we are asking if there is
any other simple connected graph with the same degree sequence as the ones above, but which
is isomorphic to neither of these.

Figure 2.6: The two butane isomers
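
Computing a degree sequence is straightforward. The sketch below (ours) reuses the
deg=sum(M) idea from connect.m; it assumes loops, if any, are stored so that each contributes 2
to its row sum:

% Degree sequence of a graph from its adjacency matrix
M=[0 1 0 2; 1 0 1 1; 0 1 0 1; 2 1 1 0];  % graph G1 of Figure 1.3
degseq=sort(sum(M));   % degrees in non-decreasing order
disp(degseq)           % isomorphic graphs must produce identical output

Two graphs with different degree sequences cannot be isomorphic; as the butane isomers show,
equal degree sequences do not guarantee isomorphism.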

2.7 Matchings
There is a class of combinatorial problems, commonly known as assignment or matching prob-
lems, in which the elements of two sets must be suitably paired. Such problems can often be
modelled by graphs.
In a typical assignment problem there are m positions for which there is a total of n
applicants. Some applicants may be suitable for more than one of the positions. Is it possible to
fill all of the positions with suitable applicants? If so, how? If not, what is the largest number
of assignments that can be made, and how can this be done?
In a typical representation problem there are m committees with a total membership of
n people. The committees need not be disjoint, i.e. some people may serve on more than one
committee. Is it possible to choose a chairperson for each committee in such a way that no
person chairs more than one committee?
The same kind of problem can be described more colourfully as a marriage problem. Given
m girls and n boys, some of whom like each other, when is it possible for each of the girls to be
married to a boy she likes?
Each of these problems can be described in terms of a special kind of graph. A directed
bipartite graph is a directed graph G whose vertex set is the disjoint union of subsets A and
B, and whose edges are all of the form (α, β), where α ∈ A and β ∈ B. That is, each edge starts
at a vertex in A and ends at a vertex in B. We call A the initial set and B the final set of G.
The bipartite graph G for the marriage problem is constructed as follows. Let A denote the
set of girls and let B denote the set of boys. The vertex set of G is A ∪ B. The edge set of G
shows the likings of the girls: the edge (α, β) belongs to the graph if and only if the girl α likes
the boy β.
The graphs for the assignment problem and the representation problem are constructed
similarly. For example, in the graph of the assignment problem the vertices are the positions
and the applicants, and there is an edge from a position to an applicant if and only if the
applicant is suitable for the position.
A matching for G is a set M of edges with no vertices in common. A matching matches
some (or perhaps all) of the vertices in A with distinct vertices in B. A maximal matching
is a matching with a maximal number of edges. A complete matching is a matching M in
which each vertex in A is the initial vertex of an edge in M .
Any finite directed bipartite graph has a maximal matching. The assignment, representation
and marriage problems all ask for complete matchings.
It is also reasonable to take a more constructive approach, and ask how a complete matching
can be found if there is one, and, in cases where there is no complete matching, how a maximal
matching can be found.
How can we decide if a directed bipartite graph has a complete matching?
Some necessary conditions are immediately apparent. Suppose that A is the initial set and that
B is the final set of the directed bipartite graph G. Then clearly there can only be a complete
matching if there are at least as many vertices in B as there are in A, i.e., |A| ≤ |B|.∗
This condition can be generalised a little further. For each subset X of A let R(X) denote
the subset of B consisting of all vertices in B which are joined to at least one vertex in X, i.e.,

R(X) = {b | b ∈ B and (x, b) is an edge in G for some x in X}.

Any matching for G matches vertices in the subset X with vertices in R(X), because, by the
definition of R(X), each edge starting at a vertex in X must finish at a vertex in R(X). So if
there is a complete matching for G, then |X| ≤ |R(X)|. The condition

|X| ≤ |R(X)| for all subsets X of A

is called the matching condition.

Figure 2.7: The matching condition


∗ Recall that the number of elements of a set S is usually denoted by |S|.

The graph shown in Figure 2.7 above has no complete matching because the matching
condition is violated by the indicated subset X.
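For small graphs the matching condition can be checked by brute force over all subsets of A. The MATLAB sketch below is our illustration (not a program from these notes); it represents a directed bipartite graph by a 0-1 matrix J, where J(i, j) = 1 exactly when there is an edge from the i-th vertex of A to the j-th vertex of B.

% Brute-force test of the matching condition |X| <= |R(X)|
J = [1 1 0;
     1 0 0;
     0 1 0];                    % example graph with |A| = |B| = 3
m = size(J,1);
ok = true;
for s = 1:2^m-1                 % run through all non-empty subsets X of A
    X = find(bitget(s, 1:m));   % A-vertices belonging to this subset
    RX = any(J(X,:), 1);        % R(X): the B-vertices joined to X
    if length(X) > sum(RX)
        ok = false;
        disp('Matching condition fails for the subset X ='), disp(X)
    end
end
if ok, disp('The matching condition holds'), end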
Is the matching condition sufficient for the existence of a complete matching? That is, is there
a complete matching for each directed bipartite graph which satisfies the matching condition?
Turning to more constructive aspects, we can ask the following: How can we find a maximal
matching in any given directed bipartite graph?

2.7.1 The marriage theorem


It was shown by Philip Hall in 1935† that the matching condition is sufficient for the existence
of a complete matching.

Theorem 2.3 (Hall’s marriage theorem) If A is the initial set and B is the final set of the
directed bipartite graph G, and if |X| ≤ |R(X)| for each subset X of A, where R(X) denotes
the subset of B consisting of all vertices in B which are joined to at least one vertex in X, then
there is a complete matching for G.

The theorem says that the matching condition is both necessary and sufficient for the exis-
tence of a complete matching.

Proof. The theorem can be proved using the theory of linear programming. However a proof
by induction can be given as follows. We shall use “matrimonial” terminology throughout the
proof. Thus the initial set A is regarded as a set of girls, and the final set B as a set of boys.
The theorem is trivially true if there is just one girl. Now suppose that m > 1, that the
theorem holds whenever there are fewer than m girls, and consider a bipartite graph G in which
there are m girls.
First suppose that every set of k girls (1 ≤ k < m) collectively like at least k + 1 boys (i.e.,
|X| < |R(X)| for all non-empty proper subsets X of A). Then if we choose any girl and marry her
to any boy that she likes, the matching condition will still hold for the remaining girls and boys.
These m − 1 girls can be married to boys they like by the inductive hypothesis, and this completes
the proof for this case.
Now suppose that there is a set X of k girls (k < m) who collectively like exactly k boys (i.e.,
|X| = |R(X)| = k). Then these k girls can be married to boys that they like by the induction
hypothesis, leaving m − k girls. Now any h of the remaining girls must like at least h of the
remaining boys, since otherwise these h girls together with the above collection of k girls would
like fewer than h + k boys, contrary to the hypothesis. It follows that the matching condition
applies to the remaining m − k girls and boys, and so by the induction hypothesis these girls
can also be married to boys they like. □


† It was later discovered that others had proved equivalent results somewhat earlier, but the
theorem remains known as Hall’s marriage theorem.

It is probably worth remarking that although the marriage theorem is interesting as an
existence theorem, it does have severe practical limitations. First, the matching condition is
usually difficult to check because it involves all subsets of the initial set, and there are 2^n of
these if n is the number of elements of that set. Secondly, it gives no indication of how the
complete matching can be constructed. However there are special linear programming algorithms
which will produce maximal matchings, and in particular, complete matchings if they exist.

2.8 Key Words


A list of keywords, concepts and algorithms appears below. Use it to focus your revision of the
material covered in this chapter.

adjacency matrix, 23
assignment problem, 35
Boolean addition, 26
Boolean adjacency matrix, 26
Boolean multiplication, 26
complete matching, 36
connected, 23
connectivity matrix, 26
degree of a vertex, 29
degree sequence, 34
directed bipartite graph, 35
directed graph, 22
edge list, 22
edge weight matrix, 27
Euler’s Theorem, 29
Fleury’s algorithm, 30
isomer, 35
loop, 22
marriage problem, 35
matching, 36
matching condition, 36
matrix of shortest distances, 27
maximal matching, 36
parallel edge, 22
path, 22
path length, 23
path weight, 27
representation problem, 35
shortest path, 27
shortest path problem, 27
simple graph, 22
vertex list, 22
vertex sequence, 22
Warshall’s algorithm, 27
weighted graph, 22

2.9 References
Most books on discrete mathematics and elementary graph theory cover the material discussed
here. Some references are listed in the Unit Information. You will notice inconsistencies between
the texts and these notes in the definitions and the notation. Unfortunately this is unavoidable,
because in this comparatively young branch of mathematics universally accepted definitions have
not yet been obtained. In these notes we have tried to use the most common meanings of terms,
and these are what you should use (in this unit, at least).

2.10 Exercises
Problems 1, 2, 3, 4, 5, 6 all refer to the graph shown below.

1. (a) Write down the Boolean adjacency matrix B(G) for this graph. (Use the natural
labelling of the vertices: s ↔ 1, t ↔ 2, . . . , z ↔ 8.)
(b) Calculate the Boolean powers B(G)[2] , B(G)[3] , and B(G)[4] , and comment on your
answers.

2. (a) Write down the edge weight matrix W for this graph.
(b) Find the matrix of shortest distances for this graph using Warshall’s algorithm. Show
the matrix W at the end of each stage of the outer loop in the algorithm.

3. Explain why this graph has an Euler circuit and then find one.

4. Solve by trial and error the travelling salesperson problem for this graph.

5. Find a planar drawing of this graph, and verify Euler’s polyhedron formula (see Theo-
rem 1.8).

6. Find the chromatic number of this graph.



7. How many non-isomorphic, connected, simple graphs are there which have the following
degree sequences?
(a) 1,1,1,1,1,1,2,4,4,4
(b) 1,1,1,1,1,1,3,3,3

8. A set of code words {bcd, aef g, abef, abdf, abc, cdeg} is to be transmitted. Is it possible to
represent each word by one of its letters so that different code words are represented by
different letters? If so, how?

9. A warden wishes to place six prisoners in two cells so that no two prisoners in the same
cell speak a common language.

Prisoner 1 speaks Chinese, Malay and Hebrew.


Prisoner 2 speaks German, Hebrew and Italian.
Prisoner 3 speaks English and Malay.
Prisoner 4 speaks Chinese and Spanish.
Prisoner 5 speaks English and German.
Prisoner 6 speaks Italian and Spanish.

Can the warden achieve his aim? If so, how?

10. Show that if each girl knows at least 3 boys, and if each boy is known by no more than 3
girls, then every girl can marry a boy she knows.
Chapter 3

Trees

In mathematics and computer science we often encounter precedence relations and hierarchical
structures. Trees are a type of graph especially suited to representing such structures.

3.1 Definition and essential features


A tree is a connected, acyclic, undirected graph. In Figure 3.1 below, G1 and G2 are trees, but
G3 and G4 are not. G3 has a cycle and G4 is not connected.

Figure 3.1: Trees and Non-trees

Since a tree is acyclic, it can have no loops and no parallel edges, i.e., it must be simple. The
following theorem gives some conditions which are necessary and sufficient for a simple graph
to be a tree.

Theorem 3.1 For a simple graph G the following are equivalent:

(i) G is a tree.

(ii) Any two vertices are joined by exactly one simple path.

(iii) G is connected, but will not be if any edge is removed.

(iv) G is acyclic, but will not be if any edge is added.

(v) G is connected and |V (G)| = |E(G)| + 1, where |V (G)| is the number of vertices of G and
|E(G)| is the number of edges.


3.2 Spanning trees


A subgraph of any graph G is a graph obtained by removing some of the vertices and some of
the edges of G. (Of course if any vertex v is removed then any edge joined to v must also be
removed.)
A subgraph of a graph G, which contains all the vertices of G and is a tree, is called a
spanning tree for G. Only connected graphs have spanning trees.
The weight of a spanning tree or of any subgraph of a weighted graph is the sum of the
weights of its edges. A minimal spanning tree is a spanning tree with minimal weight. A
graph may have several minimal spanning trees.
A spanning tree in a graph G contains just enough of the edges of G to connect all vertices in
G. If the weights of the edges represent the costs of using those edges, then a minimal spanning
tree is the cheapest way of connecting all of the vertices.

3.2.1 Algorithms for spanning trees


Every connected graph has a spanning tree, and hence a minimal spanning tree. We shall look
at two algorithms for finding minimal spanning trees. The first of these was introduced by R.C.
Prim in 1957.
Prim’s algorithm begins by choosing an edge of smallest weight in a graph G, and uses it
and the vertices that it joins to start a subgraph T which is a tree. The vertices of T are stored
in a list V′, and the edges of T are stored in a list E′. At each stage the algorithm chooses an
edge of smallest weight that joins a vertex in T to one not in T , and so T will continue to be a
tree.

• If G is connected, then T will eventually contain all vertices of G and so will span G. The
number w (determined as part of the algorithm) is the sum of the weights of the edges of T .
It can be shown that w is in fact the weight of any minimal spanning tree.

• If G is not connected, then T will span just some of its vertices (not all).

The second algorithm was introduced by J.B. Kruskal in 1956. Kruskal’s algorithm works
through a list of edges, sorted into increasing weight order, and adds edges which do not introduce
cycles. The subgraphs produced along the way are acyclic, but not necessarily connected.
However if the graph is connected then the algorithm finally produces a spanning tree. It is
possible to show that this spanning tree has minimal weight.
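The following MATLAB sketch of Kruskal’s algorithm is our own illustration (no such program is printed in these notes). It lists the edges of the graph in Figure 3.4 as rows [u v w], with the same weights as in the matrix W of prim.m below, and keeps an array of component labels so it can reject any edge whose endpoints already lie in the same component, i.e., any edge that would create a cycle.

% Kruskal's algorithm (sketch) for the weighted graph of Figure 3.4
E = [1 2 14; 1 3 8; 1 4 12; 1 5 11; 2 4 7; 2 5 2; 2 6 3;
     3 7 13; 4 7 9; 4 8 1; 5 7 5; 5 8 4; 6 8 10; 7 8 6];
n = 8;                         % number of vertices
E = sortrows(E, 3);            % sort edges into increasing weight order
comp = 1:n;                    % component label of each vertex
TE = [];                       % edge list of the spanning tree
for k = 1:size(E,1)
    u = E(k,1); v = E(k,2);
    if comp(u) ~= comp(v)      % different components, so no cycle is created
        TE = [TE; E(k,:)];
        comp(comp == comp(v)) = comp(u);   % merge the two components
    end
end
disp('Edge list [u v weight] of a minimal spanning tree:'), disp(TE)
disp('Weight of the tree:'), disp(sum(TE(:,3)))

The edges it accepts are the edges of the same minimal spanning tree found by prim.m below, obtained in increasing weight order.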

Figure 3.4: A Connected Weighted Graph and A Minimal Spanning Tree

A minimal spanning tree is shown on the right. Both Prim’s algorithm and Kruskal’s algo-
rithm produce the same minimal spanning tree, but the edges are obtained in different orders.
The edge lists of the minimal spanning tree are as follows.

Edge list obtained by Prim’s algorithm : e1 , e4 , e2 , e3 , e5 , e11 , e8


Edge list obtained by Kruskal’s algorithm : e1 , e2 , e3 , e4 , e5 , e8 , e11

Each of the algorithms is a greedy algorithm, because optimal choices are made at each
stage. We shall see that the greedy approach is not always the best strategy for solving optimi-
sation problems associated with graphs. There are some problems where choosing the best at
each stage does not guarantee that an overall optimal solution is obtained.
The initial search for an edge of minimal weight in Prim’s algorithm clearly takes order n
steps, where n is the number of vertices. More significant is the second stage, where for each of
the n vertices there are as many as order n² comparisons to be made, one for each edge, so the
algorithm has efficiency of order n³. The same is true of Kruskal’s algorithm, although the
reasoning is more difficult. Thus finding minimal spanning trees has polynomial complexity.
The algorithms can be used for purposes other than finding minimal spanning trees. If we are
just looking for a spanning tree for an unweighted graph, then we can assign arbitrary weights
to the edges and then set either Prim’s or Kruskal’s algorithm to work. Any minimal spanning
tree that is found will of course be a spanning tree of the original unweighted graph.
They can also be used to test connectivity of a graph. To do this add enough edges to
a graph G to ensure that G is connected. (For example, we might join one particular vertex
to each of the other vertices.) However, we distinguish the “artificial” edges from the original
edges by assigning huge weights to “artificial” ones. Next we examine the weight of a minimal
spanning tree found using one of the algorithms. If this is huge, then we know that the minimal
spanning tree uses one of the artificial edges. This implies that there is no spanning tree which
uses just the original edges, and hence G is not connected.
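As a small sketch of this test (assuming, hypothetically, that one of the algorithms has been wrapped up as a function mstweight(W) returning the weight of a minimal spanning tree; no such function appears in these notes):

% Connectivity test via a minimal spanning tree (sketch)
W = [0 3 Inf;
     3 0 Inf;
     Inf Inf 0];             % example: vertex 3 is joined to nothing
huge = 1e6;                  % far larger than any genuine tree weight
Wbig = W;
Wbig(isinf(Wbig)) = huge;    % the "artificial" edges
if mstweight(Wbig) >= huge   % mstweight is a hypothetical MST-weight function
    disp('Graph is not connected')
else
    disp('Graph is connected')
end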
The following MATLAB program prim.m implements Prim’s algorithm for the weighted
graph in Figure 3.4 (see Figure 3.7 for the weights). Note that the program works by successively
finding the minimum element in the submatrix of the edge weight matrix W defined by taking
the columns corresponding to the current tree vertices and rows corresponding to the remaining
vertices. This way it finds the shortest edge joining a current tree vertex to a remaining vertex.

% Construct a minimal spanning tree for an undirected weighted graph
% using Prim's algorithm
clear
format compact
% Enter the edge weight matrix
disp(’Edge weight matrix: ’)
W=[ 0 14 8 12 11 Inf Inf Inf;
14 0 Inf 7 2 3 Inf Inf;
8 Inf 0 Inf Inf Inf 13 Inf;
12 7 Inf 0 Inf Inf 9 1;
11 2 Inf Inf 0 Inf 5 4;
Inf 3 Inf Inf Inf 0 Inf 10;
Inf Inf 13 9 5 Inf 0 6;
Inf Inf Inf 1 4 10 6 0]
[m,n]=size(W);
if m ~= n
disp(’Edge weight matrix must be square’)
return
end
if any(any(W ~= W')) % elementwise symmetry check; a bare matrix test here would misbehave
disp(’Edge weight matrix must be symmetric’)
return
end
% Make diagonal infinite to simplify minimisation
W=W+diag(Inf*ones(1,n));
% Find initial edge of minimum weight
[wjmins,jmins]=min(W); % Find min values in each column
[wijmin,imin]=min(wjmins);
jmin=jmins(imin); % Minimum values are W(imin,jmin) and W(jmin,imin)
disp(’Initial edge is: ’)
disp([imin jmin])
% Initialise list of tree vertices T and tree edges TE
T=[imin jmin];
TE=[imin jmin];
% Find remaining vertices Tprime
Tprime=1:n;
Tprime(find(Tprime==imin))=[];
Tprime(Tprime==jmin)=[];
% Begin iteration
while length(Tprime) > 1
% Find next edge of minimum weight by
% finding minimum value in submatrix Wsub
Wsub=W(Tprime,T);
[wjminssub,jminssub]=min(Wsub);
[wijminsub,iminsub]=min(wjminssub);
jminsub=jminssub(iminsub);
imin=T(iminsub);
jmin=Tprime(jminsub);
% Remove vertex no. jmin from Tprime
Tprime(find(Tprime==jmin))=[];
% Add vertex no. jmin to T and edge [imin jmin] to TE
T=[T jmin];
disp(’Current edge list in minimal spanning tree: ’)
TE=[TE; imin jmin];
disp(TE)
end
% Find last edge of minimum weight by
% finding minimum value in row vector Wsub
Wsub=W(Tprime,T);
[wijminsub,iminsub]=min(Wsub);
imin=T(iminsub);
jmin=Tprime;
% Add vertex no. jmin to T and edge [imin jmin] to TE
T=[T jmin];
disp(’Final edge list in minimal spanning tree: ’)
TE=[TE; imin jmin];
disp(TE)

The output of this program is shown below. Note that the final edge list for the minimal
spanning tree is the same as given above.

Edge weight matrix:


W =
0 14 8 12 11 Inf Inf Inf
14 0 Inf 7 2 3 Inf Inf
8 Inf 0 Inf Inf Inf 13 Inf
12 7 Inf 0 Inf Inf 9 1
11 2 Inf Inf 0 Inf 5 4
Inf 3 Inf Inf Inf 0 Inf 10
Inf Inf 13 9 5 Inf 0 6
Inf Inf Inf 1 4 10 6 0
Initial edge is:
4 8
Current edge list in minimal spanning tree:
4 8
8 5
Current edge list in minimal spanning tree:
4 8
8 5
5 2
Current edge list in minimal spanning tree:
4 8
8 5
5 2
2 6
Current edge list in minimal spanning tree:
4 8
8 5
5 2
2 6
5 7
Current edge list in minimal spanning tree:
4 8
8 5
5 2
2 6
5 7
5 1
Final edge list in minimal spanning tree:
4 8
8 5
5 2
2 6
5 7
5 1
1 3

3.3 Rooted trees


In many applications of trees there is a special vertex which we call the root. Once a root is
specified it is possible to give a direction to each edge by directing each edge away from the root.
A tree, together with its root, produces a directed tree which is usually called a rooted tree.
We usually draw a rooted tree with its root at the top and with its edges directed downwards.
We don’t always bother to indicate the direction of the edges with arrows. It is assumed that
all edges are directed downwards and away from the root. Note that different choices of the
root produce different rooted trees. A tree, and two rooted trees obtained by choosing different
vertices as the roots, are shown in Figure 3.5 below.

Figure 3.5: A Tree And Two Rooted Trees

The terminology for trees has botanical and genealogical origins. Suppose that T is a rooted
tree and that r is its root. If v is any vertex in T other than r, then there is a unique edge in
T which finishes at v. If this edge starts at u then we say that u is the parent of v and that v
is a child of u. Vertices which have the same parent are called siblings. The ancestors of v
are the vertices (other than v itself) on the unique path from the root r to the vertex v. The
descendants of v are those vertices which have v as an ancestor. A vertex which has children
is called an internal vertex, and a vertex with no children is called a leaf.

3.3.1 Shortest distances

We apply Dijkstra’s algorithm to the graph shown on the left in Figure 3.7 below to find
shortest paths from the vertex s. The tree of shortest paths and the shortest distances are shown
on the right. Vertices are added to the list L in increasing order of distance away from vertex s.

Figure 3.7: A tree of shortest paths

The first and last loops in Dijkstra’s algorithm are of order n, while the middle one is of order
n² (for each of n vertices there is an order n inner loop), so the whole algorithm is of order n².
The following MATLAB program dijkstra.m implements Dijkstra’s algorithm for the graph
on the left in Figure 3.7.

% Find the tree of shortest paths in a weighted graph using Dijkstra’s algorithm
clear
format compact
% Enter the edge weight matrix
disp(’Edge weight matrix: ’)
W=[ 0 14 8 12 11 Inf Inf Inf;
14 0 Inf 7 2 3 Inf Inf;
8 Inf 0 Inf Inf Inf 13 Inf;
12 7 Inf 0 Inf Inf 9 1;
11 2 Inf Inf 0 Inf 5 4;
Inf 3 Inf Inf Inf 0 Inf 10;
Inf Inf 13 9 5 Inf 0 6;
Inf Inf Inf 1 4 10 6 0]
[m,n]=size(W);
if m~=n
disp(’Edge weight matrix must be square’)
return
end
L=1; % starting vertex for shortest path
for j=1:n
P(j)=0;
D(j)=W(j,1);
if D(j) < Inf
P(j)=1;
end
end
Lnew=L;
searchset=1:n;
while length(L) < n
% Find search set for next vertex
searchset(find(searchset==Lnew))=[];
[Dmin,kind]=min(D(searchset));
k=searchset(kind); % vertex k is next closest to vertex 1
Lnew=k;
L=[L Lnew];
for j=2:n
if all(j~=L)
% For all j not in L find
% D(j) = current shortest distance from vertex 1 to vertex j
if D(j) > D(k)+W(k,j)
D(j)=D(k)+W(k,j);
P(j)=k;
end
end
end
end
disp(’Vertex list in order of increasing distance from vertex 1: ’)
disp(L)
disp(’Shortest distance from vertex 1 to each vertex: ’)
disp(D(L))
E=[];
for j=L
if j > 1 & P(j) ~= 0
E=[E; [P(j) j]];
end
end
disp(’Edge list in tree of shortest paths: ’)
disp(E)

The output of the program is shown below. Note that the vertex list on the tree is computed
in order of increasing distance from vertex 1.

Edge weight matrix:


W =
0 14 8 12 11 Inf Inf Inf
14 0 Inf 7 2 3 Inf Inf
8 Inf 0 Inf Inf Inf 13 Inf
12 7 Inf 0 Inf Inf 9 1
11 2 Inf Inf 0 Inf 5 4
Inf 3 Inf Inf Inf 0 Inf 10
Inf Inf 13 9 5 Inf 0 6
Inf Inf Inf 1 4 10 6 0
Vertex list in order of increasing distance from vertex 1:


1 3 5 4 2 8 6 7
Shortest distance from vertex 1 to each vertex:
0 8 11 12 13 13 16 16
Edge list in tree of shortest paths:
1 3
1 5
1 4
5 2
4 8
2 6
5 7

Exercise Extend the program above to find the shortest distance from vertex 1 to a particular
vertex that is entered by the user.

3.3.2 Tree searches


The problem of finding efficient methods of searching for items in a list has received much
attention in computer science. In the simplest searching procedure we simply start at the
beginning of the list and work our way through it, one item at a time, until either we find the
item or we reach the end. Such a search will be quick if the item we are looking for happens to
be near the front of the list, but we won’t always be so lucky. If there are n items in the list,
then we expect on average to make n/2 comparisons before we find the item we want.
More efficient searching methods are available if the list is arranged as a rooted tree. When
we search a tree we start at the root and work out towards the leaves. At each stage we examine
the item at the current vertex, and if we don’t find what we are looking for then we move to one
of the children. The search stops when we find what we want or when we have reached a leaf.
To examine tree searches more carefully we need some more terminology. The level of a
vertex v in a rooted tree T is the length of the path from the root r to v. (We say that the level
of r is 0.) The height of T is the maximum of the levels of its vertices. The maximum number
of comparisons that may be needed in a tree search is determined by the height of the tree.
When searching a tree we move from a vertex to one of its children, and we have to decide
which child to choose. This will be easy if there are few children, or if the children are arranged
in order. The degree of a rooted tree is the maximum number of children of any of its
vertices.
In an ordered rooted tree the children of each parent are linearly ordered. The order
of the children is usually shown from left to right in any tree diagram. A binary tree is an
ordered, rooted tree of degree 2. In an ordered binary tree the first child is called the left child
and the second child is called the right child.
The average search of a tree will be quick if the height and the degree of the tree are small.
But this limits the size of the tree. The largest trees of given height and given degree are those
in which internal vertices all have the same number of children, and all leaves are at the same
level. Such trees are called full trees.

Figure 3.8: Rooted Trees With Height 3 And Degree 2

Each of the trees in Figure 3.8 above is a binary tree with height 3, but only T3 is full.
A simple counting argument produces the following estimate on the number of vertices of
a tree.

Theorem 3.2 A rooted tree with height k and degree h has no more than

1 + h + h² + · · · + h^k = (h^(k+1) − 1)/(h − 1) ≈ h^k vertices,

and this upper bound is attained if and only if the tree is full.

This means that if a full tree has n vertices then k ≈ log_h n. For all except very small values
of h and k, this number is considerably smaller than n/2. So searches of full trees are generally
much quicker than linear searches of lists.
Binary search trees are commonly used for arranging lists of items which can be linearly
ordered. The items might, for example, be numbers or words (ordered with the dictionary
order). The characteristic feature of a binary search tree is that any left child (together with
all its descendants) precedes its parent, and any right child (together with all its descendants)
follows its parent. A binary search tree is shown below in Figure 3.9.

Figure 3.9: A Binary Search Tree



The following MATLAB program bintree.m implements the search algorithm for the binary
search tree in Figure 3.9 with alphabetical ordering. Note that when entering the list of items
and edges of the tree, the children of each parent must be ordered from left to right. For example,
in the list E of edges in the program, vertex number 1 (i.e., Gauss) has children vertex numbers
2 and 8 (i.e., Euler and Newton), which are in the correct (alphabetical) order in E.

% Search algorithm for a binary search tree
clear
format compact
% Enter items i.e. names of the vertices as a string vector
% where vert(j) is the name of vertex j and vert(1) is the root
vert=char(’Gauss’, ’Euler’, ’Archimedes’, ’Fourier’, ’Boole’, ’Bernoulli’,...
’Cauchy’, ’Newton’, ’Leibniz’, ’von-Neumann’)
[n,k]=size(vert); % n is the number of items i.e. vertices
% Enter tree edges as a 2 by (n-1) matrix
% with columns [P(j) j]’ where j=2,...,n (excluding root vertex 1)
% and P(j) is the vertex preceding vertex j in the tree
E=[1 2 2 3 5 5 1 8 8;
2 3 4 5 6 7 8 9 10]
[mE,nE]=size(E);
if nE ~= n-1
disp(’The matrix E should have n-1 columns’)
return
end
% Enter search item
x=input(’Enter search item as a string in quotes: ’);
Z=[]; % the empty message
v=1; % start search at root
while isempty(Z) == 1
currvert=strtok(vert(v,:)); % removes blanks from string in matrix vert
if size(currvert) == size(x)
if currvert == x
Z=’FOUND’;
if v == 1
disp([’Item ’ x ’ is found as the root’])
else
parent=E(1,find(E(2,:)==v));
parentvert=vert(parent,:);
disp([’Item ’ x ’ is found as a child of ’ parentvert])
end
break
end
end
child=E(2,find(E(1,:)==v));
disp([’Is ’ x ’ less than ’ currvert ’ in alphabetical order?’])
xless = input(’Enter 1 for yes or 0 for no: ’);
if xless == 1
if length(child) > 0
v=child(1);
else
Z=’NOT FOUND’;
disp(’Item not found’)
disp([’Place item ’ x ’ as left child of ’ vert(v,:)])
end
else
if length(child) > 0
v=child(length(child));
else
Z=’NOT FOUND’;
disp(’Item not found’)
disp([’Place item ’ x ’ as right child of ’ vert(v,:)])
end
end
end % end while

For the search item Boole the program output is shown below.

vert =
Gauss
Euler
Archimedes
Fourier
Boole
Bernoulli
Cauchy
Newton
Leibniz
von-Neumann
E =
1 2 2 3 5 5 1 8 8
2 3 4 5 6 7 8 9 10
Enter search item as a string in quotes: ’Boole’
Is Boole less than Gauss in alphabetical order?
Enter 1 for yes or 0 for no: 1
Is Boole less than Euler in alphabetical order?
Enter 1 for yes or 0 for no: 1
Is Boole less than Archimedes in alphabetical order?
Enter 1 for yes or 0 for no: 0
Item Boole is found as a child of Archimedes

For the search item Turing the program output is shown below.

vert =
Gauss
Euler
Archimedes
Fourier
Boole
Bernoulli
Cauchy
Newton
Leibniz
von-Neumann
E =
1 2 2 3 5 5 1 8 8
2 3 4 5 6 7 8 9 10
Enter search item as a string in quotes: ’Turing’
Is Turing less than Gauss in alphabetical order?
Enter 1 for yes or 0 for no: 0
Is Turing less than Newton in alphabetical order?
Enter 1 for yes or 0 for no: 0
Is Turing less than von-Neumann in alphabetical order?
Enter 1 for yes or 0 for no: 1
Item not found
Place item Turing as left child of von-Neumann

Exercise Run the program to search for each of Bernoulli and Hilbert. Also modify the
program to search the binary tree in Exercise 4 at the end of this chapter.

The search algorithm certainly has no worse than order n² efficiency, where n is the
number of vertices in the final tree, since the final vertex to be positioned in the tree will need
no more than n − 1 comparisons, the second to last will require no more than n − 2, and so on,
giving a total of at most n² comparisons.


3.4 Vertex Traversals


In the preceding section we examined ways of turning a list into a tree. Now we do the reverse.
A tree traversal algorithm is an algorithm for listing (visiting or searching) all the vertices
of an ordered rooted tree. The three most commonly used ones are described below.

3.4.1 Preorder Traversals


In a preorder traversal, roots are listed first and then subtrees (if any) are listed in the order
of their roots. A preordering algorithm is given in Figure 3.12 below.
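The opening lines of preorder.m (its data and initialisation) fall on a page missing from these notes. A plausible reconstruction, consistent with bintree.m above and with the loop printed below, is the following; the convention that a negative entry −v in list marks the subtree rooted at vertex v as still to be expanded is inferred from that loop, not taken from the printed original.

% preorder.m -- preorder traversal of an ordered rooted tree
% (reconstructed opening lines; the while loop below is from the notes)
clear
format compact
vert=char('Gauss', 'Euler', 'Archimedes', 'Fourier', 'Boole', 'Bernoulli',...
'Cauchy', 'Newton', 'Leibniz', 'von-Neumann')
E=[1 2 2 3 5 5 1 8 8;
2 3 4 5 6 7 8 9 10]
list=-1;         % -1 marks the whole tree, rooted at vertex 1, as unexpanded
containstree=1;  % flag: does list still contain an unexpanded subtree?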
while containstree == 1
containstree=0;
newlist=[];
for j=1:length(list)
if list(j) < 0
containstree=1;
v=-list(j);
child=E(2,find(E(1,:)==v));
newlist=[newlist v -child]; % children marked negative: their subtrees still to be expanded
else
newlist=[newlist list(j)];
end
end
list=newlist;
end
disp(’Preorder list of items is:’)
disp(vert(list,:))

The output of the program is shown below.

vert =
Gauss
Euler
Archimedes
Fourier
Boole
Bernoulli
Cauchy
Newton
Leibniz
von-Neumann
E =
1 2 2 3 5 5 1 8 8
2 3 4 5 6 7 8 9 10
Preorder list of items is:
Gauss
Euler
Archimedes
Boole
Bernoulli
Cauchy
Fourier
Newton
Leibniz
von-Neumann

Exercise Modify the program preorder.m to find the preorder listing of each of the ordered
rooted trees in Figure 3.5. Check that your computed listing is the same as given above.

The inorder algorithm applied to the first rooted tree in Figure 3.5 (the only binary tree)
produces the following vertex lists:

T(a) → T(d), a → T(b), d, T(c), a → b, d, T(e), c, T(f), a → b, d, e, c, f, a

3.4.4 Uniqueness properties of traversals


Can we recover an ordered, rooted tree from any of the standard orderings of its vertices? In
general the answer is no: different ordered rooted trees can have the same ordering. However, there
are some interesting special cases. Before we can discuss this further we need to agree when two
such trees are the “same”, i.e., isomorphic, and when they are “different”.
Isomorphisms between mathematical structures are one-to-one correspondences which “pre-
serve” the key features of those structures. Ordered, rooted trees are quite complicated struc-
tures, and so the corresponding isomorphisms must be defined with care. It is best if we deal
with isomorphisms between simpler types of trees first. Recall that we discussed isomorphisms
for undirected simple graphs in Chapter 1.
A tree isomorphism has the same properties as an isomorphism between undirected simple
graphs. It is simply a one-to-one correspondence between the vertices of two trees T and T′
with the property that there is an edge between two vertices u and v in T if and only if there is
an edge between the corresponding vertices u′ and v′ in T′.
A rooted tree isomorphism is a tree isomorphism between rooted trees T and T′, with
the additional property that the roots must match, i.e.,

if r is the root of T then the corresponding vertex r′ is the root of T′.

An ordered rooted tree isomorphism is a rooted tree isomorphism between ordered rooted
trees T and T′, with the additional property that it must preserve the order between siblings.
This means that

if w1, w2, . . . , wk are the children of some vertex v in T , listed in order, then the
children of the corresponding vertex v′ in T′, listed in order, are w′1, w′2, . . . , w′k.

The following example should illustrate the rather subtle distinctions between these concepts.

Figure 3.15: Tree Isomorphisms



The three graphs in Figure 3.15 are isomorphic as trees. You should verify this by making
the correct correspondences between their vertices.
The trees T′ and T″ are also isomorphic as rooted trees because we can find a tree isomorphism
between them which matches the roots r′ and r″. However there is no tree isomorphism
between T and T′ which matches r with r′. (Can you verify this?) So T is not isomorphic to
T′ (or to T″) as a rooted tree.
Any rooted tree isomorphism between T′ and T″ must match s′ with s″ and t′ with t″.
Because s′ precedes t′ as a child of r′, whereas s″ follows t″, there can be no ordered, rooted
tree isomorphism between T′ and T″. So no two of T , T′ and T″ are isomorphic as ordered,
rooted trees.
In general none of the standard vertex traversals is enough to determine an ordered, rooted
tree. There are simple examples of non-isomorphic, ordered, rooted trees which have the same
preorder listings, others which have the same postorder listings, and non-isomorphic, ordered
binary trees which have the same inorder listings.

Figure 3.16: Non-isomorphic Ordered, Rooted Trees

The preordering of both T1 and T2 in Figure 3.16 above is 1, 2, 3, and the postordering of
both T2 and T3 is 2, 3, 1. The inordering of both T4 and T5 is 1, 2, 3, 4, 5. Note that T4 and T5
are isomorphic as rooted trees, but not as ordered, rooted trees.
The situation changes if we specify the number of children of each vertex.

Theorem 3.3 Let S and T be ordered rooted trees whose preorder (or postorder) listings of
the vertices are s1 , s2 , . . . , sn and t1 , t2 , . . . tn , respectively. Suppose also that, for each i, si has
the same number of children as ti . Then the correspondence s1 ↔ t1 , s2 ↔ t2 , . . . , sn ↔ tn , is
an ordered, rooted tree isomorphism between S and T .

The same cannot be said about inordering. The binary trees T4 and T5 in Figure 3.16 are
non-isomorphic as ordered rooted trees, but they have the same inordering of their vertices.
Furthermore, in each case 2 and 4 are internal vertices with two children each, and 1, 3, and 5
are leaves.

3.5 Key Words


A list of keywords, concepts and algorithms appears below. Use it to focus your revision of the
material covered in this chapter.

ancestor, 48
binary search tree, 53
binary tree, 52
child, 48
degree of a rooted tree, 52
descendant, 48
Dijkstra’s Algorithm, 49
full tree, 52
greedy algorithm, 45
height, 52
inorder traversal, 60
internal vertex, 49
Kruskal’s algorithm, 43
leaf, 49
level, 52
minimal spanning tree, 43
ordered rooted tree, 52
ordered rooted tree isomorphism, 61
parent, 48
postorder traversal, 60
preorder traversal, 57
Prim’s algorithm, 43
root, 48
rooted tree, 48
rooted tree isomorphism, 61
sibling, 48
spanning tree, 43
subgraph, 43
subtree, 49
tree, 42
tree isomorphism, 61
tree of shortest paths, 49
tree traversal, 57
weight of a spanning tree, 43

3.6 Exercises
Questions 1 and 2 refer to the following weighted graph.

1. (a) Use both Prim’s algorithm and Kruskal’s algorithm to find a minimal spanning tree
for the graph.
(b) What is the weight of any minimal spanning tree?

2. Regard the edge weights as distances, and find the tree of shortest paths from the vertex s.
Show your working, and give the shortest distances from s to the other vertices.

3. For the rooted tree shown below

(a) Give a preorder listing of its vertices.


(b) Give its height and degree.

4. Construct a binary search tree from the following list of numbers:

437, 56, 621, 143, 342, 548, 21, 298, 793, 313, 47, 246, 610, 98

5. (a) Find as many non-isomorphic trees with 5 vertices as you can.


(b) Find as many non isomorphic rooted trees with 5 vertices as you can.
(c) Find as many non-isomorphic ordered rooted trees with 5 vertices as you can.

6. True or false? Justify your answers briefly.

(a) Given any two edges e1 and e2 of a simple connected graph, there is a spanning tree
which contains e1 and e2 .
(b) Given any three edges e1 , e2 and e3 of a simple connected graph, there is a spanning
tree which contains e1 , e2 and e3 .
(c) Any two spanning trees of a simple connected graph have a common edge.

7. Prove that if a tree with n vertices has a path of length n − 1, then the degree of any
vertex is no more than 2.

8. Prove that if a tree has a vertex of degree 3, then there are at least three vertices of
degree 1.

9. Explain why every tree is planar.

10. Show that Kn has at least two spanning trees with no edges in common if and only if n ≥ 4.
Chapter 4

Coding Theory

Coding theory is the study of methods for efficient and accurate transfer of information from
one place to another. The aim of this section is to give an introduction to algebraic coding
theory. This will involve elements of combinatorics, algebra, probability, and number theory.
Note that some more advanced material (including some mathematical concepts and proofs
of theorems) on coding theory has been included in this chapter (particularly in Section 4.3 on
linear codes), both for the sake of completeness and to give students an idea of what concepts
underpin the development of codes. However, not all of this advanced material is required
knowledge for this unit. The examples and exercises in this chapter demonstrate the key concepts
that students are expected to understand about codes.

4.1 Introduction

Coding theory has many uses, such as data transfer over the Internet or from memory to CPU
in a computer.

When transferring information from one place to another the information passes through a
channel (e.g., a phone line, cable, or the atmosphere). Errors in the transferred information
occur due to “noise” on the channel, i.e., undesirable random disturbances that may cause
information received to differ from information transmitted.

Coding theory primarily deals with the problem of detecting and correcting transmission
errors caused by noise on the channel. The main goals are to provide: fast encoding of informa-
tion; easy transmission of encoded messages; fast decoding of received messages; detection and
correction of errors introduced in the channel; and maximum transfer of information per unit
time. We will mainly focus on error detection and correction.

(To guarantee that a message will always break up into blocks of the same size, we can always
append a string of copies of the same symbol, such as Z or ZZZ.)
One way to encode blocks is via matrices. For instance, if we replace each letter in A by the
number representing its position in the alphabet A (with Z assumed to have position 0), then
blocks of length 2 correspond to “vectors” in Z26^2 := Z26 × Z26 = {(x1, x2) | x1, x2 ∈ Z26}, where
Z26 := {0, 1, 2, . . . , 25}. (In general, for any integer k ≥ 2, we define Zk := {0, 1, 2, . . . , k − 1}.)
In our example, the block DE corresponds to (4, 5) ∈ Z26^2.
Now, taking any 2 × 2 non-singular (i.e., invertible) matrix M , such as

M = [ 2  3
      1  2 ],

we encode each block x = (x1, x2) as a block xM , i.e., we use the encoding rule

ψ(x) = xM.

For example, the block DE (which corresponds to (4, 5)) is encoded as

(4, 5)M = (4 · 2 + 5 · 1, 4 · 3 + 5 · 2) = (13, 22).

So the original message gets encoded as

13, 22, 47, 78, 47, 75, 60, 100, 24, 43, 35, 57, 36, 60, 33, 57, 32, 50, 42, 69, 25, 38, 55, 92.

Such a procedure is called a matrix encoding. It is highly efficient, and moreover decoding is
easy. To decode, we break an encoded message into blocks y and find x so that xM = y.
We can unambiguously decode provided the matrix M has an inverse (in which case the
encoding rule ψ is one-to-one), so that x = yM⁻¹. In our example,

M⁻¹ = [  2  −3
        −1   2 ].

So the encoded block (60, 100), for instance, is decoded as

(60, 100)M⁻¹ = (60 · 2 + 100 · (−1), 60 · (−3) + 100 · 2) = (20, 20) ↔ TT.

Now suppose that, after encoding and transmission, we receive the block (3, 4). Then decoding
gives

(3, 4)M⁻¹ = (3 · 2 + 4 · (−1), 3 · (−3) + 4 · 2) = (2, −1),

which does not correspond to any message block. So we have detected that an error occurred
during transmission, but we cannot correct this error.
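The whole worked example is easy to reproduce in MATLAB; the sketch below simply mirrors the arithmetic above (carried out over the ordinary integers, as in the example).

% Matrix encoding and decoding of 2-letter blocks (the example above)
M    = [2 3; 1 2];        % encoding matrix; det(M) = 1, so M is invertible
Minv = [2 -3; -1 2];      % inverse of M
y    = [4 5]*M            % block DE = (4, 5) encodes to (13, 22)
x    = [60 100]*Minv      % decodes to (20, 20), i.e. the block TT
bad  = [3 4]*Minv         % gives (2, -1): not a message block, error detected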

The above example referred to Z26 (known as the ring of integers modulo 26) in which
arithmetic is done modulo 26. Later, we will need to do arithmetic modulo 2 with Z2 = {0, 1},
so let us make the notions of Zn and arithmetic modulo n more concrete now.

• For any integer m ≥ 1, A^m denotes the set of all words of length m over A.

• Any subset C ⊆ A^+ in which every symbol in A appears in at least one word in C is called
a q-ary code. The elements of C are called codewords.

• A q-ary code in which each codeword has the same length, n say, is called a block code of
length n.

Note that 2-ary codes are called binary codes, and are usually based on the alphabet {0, 1}.
From now on, we will restrict our attention to binary block codes, so the word “code” will refer
to a binary block code, unless otherwise stated. This may appear to be too restrictive because
people rarely communicate with each other in messages consisting of binary strings. However,
the binary set {0, 1} is appropriate for communication between machines, precisely because it
has just two symbols. So in order to transmit mechanically a message, such as the text of a
book written in English, it is usually necessary to first translate that into binary strings.

Remark 4.1 Hereafter, we will also assume the following.

• Each binary codeword is transmitted by sending its digits, one at a time and in order.

• The only possible errors that can occur in transmission are interchanges of the digits 0
and 1. Other errors, such as addition or deletion of digits, will be disregarded.

• Transmission is over a binary symmetric channel (BSC), i.e., a binary channel over which
the probability p of an error occurring in transmission (i.e., the probability p of switching
0 to 1 or switching 1 to 0) is the same for each digit of the sent message, independent of
any previous errors that may have occurred. Moreover, we assume that 0 ≤ p < 0.5.

• If we ever receive a word that is not a codeword, then we have detected an error. Depending
on the code, we may also be able to correct the error (i.e., correctly guess the codeword
that was actually sent), which is useful in situations when retransmission is not possible.

I Example 4.2 Suppose that we send the string w = 0101100 ∈ {0, 1}^7. This string can be
identified with the element w = (0, 1, 0, 1, 1, 0, 0) of the Cartesian product

Z2^7 = Z2 × Z2 × · · · × Z2 (7 factors).

Suppose now that the received string is v = 0111101 ≠ w. As above, we can identify this string
with the element v = (0, 1, 1, 1, 1, 0, 1) of Z2^7.

Notice that two errors have occurred in transmission: one at position 3 and the other at
position 7. We can think of the “error” as e = (0, 0, 1, 0, 0, 0, 1) ∈ Z2^7, where an entry 1 indicates
an error in transmission (so we know that the 3rd and 7th entries have been incorrectly received,
whilst all other entries have been correctly received). Now, if we interpret w, v, and e as elements
of Z2^7 with coordinate-wise addition modulo 2, then we have

w + e = v,
w + v = e,
v + e = w.

Assuming we are transmitting the message over a binary symmetric channel with probability
p of error, the probability of receiving the “error” e = (0, 0, 1, 0, 0, 0, 1) is

(1 − p)² · p · (1 − p)³ · p = p² (1 − p)⁵.
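The coordinate-wise mod-2 relations between w, v and e are easy to verify in MATLAB:

% Coordinate-wise addition modulo 2 for the strings in Example 4.2
w = [0 1 0 1 1 0 0];
v = [0 1 1 1 1 0 1];
e = mod(w + v, 2)           % the error pattern (0 0 1 0 0 0 1)
isequal(mod(w + e, 2), v)   % 1 (true): w + e = v
isequal(mod(v + e, 2), w)   % 1 (true): v + e = w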

4.1.3 Probability of Error


More generally, suppose that we send the string w = w1 · · · wn ∈ {0, 1}^n and the received string
is v = v1 v2 · · · vn ∈ {0, 1}^n. We identify the strings w and v with the elements w = (w1, . . . , wn)
and v = (v1, v2, . . . , vn) of Z2^n. The “error” e = (e1, . . . , en) ∈ Z2^n is defined by e = w + v (so
that w + e = v), where we interpret w, v, and e as elements of Z2^n with coordinate-wise addition
modulo 2. (Note also that v + e = w.)
From now on, we will make no distinction between a string in {0, 1}^n and its corresponding
element of Z2^n, and so we will simply write w, v, e ∈ Z2^n and e = w + v in both senses.

Theorem 4.1 The probability that exactly k errors occur when transmitting a string w ∈ Z2^n
over a binary symmetric channel with probability p of error is

C(n, k) p^k (1 − p)^(n−k).

Proof. There are n!/(k!(n − k)!) = C(n, k) error patterns e ∈ Z2^n containing exactly k 1’s, and
the probability of any such error occurring is p^k (1 − p)^(n−k). □

This theorem shows that, under the assumptions in Remark 4.1, the number of transmission
errors has a binomial distribution.
If p is sufficiently small, then the chance of multiple errors will be negligible, and this possi-
bility can reasonably be ignored. If, for example, p = 10⁻⁵ and n = 10, then the probability of
making 1 error ≈ 10⁻⁴, whereas the probability of making 2 or more errors ≈ 4 × 10⁻⁹.

I Example 4.3 When transmitting strings of length 7 over a binary symmetric channel with
p = 0.01, the probability of no errors is

(1 − 0.01)^7 ≈ 0.932065,

the probability of one error is

C(7, 1) (0.01) (1 − 0.01)^6 ≈ 0.065904,

and the probability of two errors is

C(7, 2) (0.01)² (1 − 0.01)^5 ≈ 0.001997.
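These values follow directly from Theorem 4.1 and can be checked in MATLAB with the built-in function nchoosek:

% Probability of exactly k errors in an n-digit transmission (Theorem 4.1)
n = 7; p = 0.01;
for k = 0:2
    fprintf('P(%d errors) = %f\n', k, nchoosek(n,k) * p^k * (1-p)^(n-k))
end
% prints approximately 0.932065, 0.065904 and 0.001997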

4.1.4 (n, m) binary block codes


Suppose that we wish to transmit strings in Z2^m. One way to decrease the probability of error
is to use extra digits, as follows.

1. Encoding process: We add, or concatenate, extra digits to each string in Z2^m (of length
m) in order to make it a string in Z2^n (of length n). More precisely, we use a 1–1 encoding
function ψ : Z2^m → Z2^n to obtain the set C = ψ(Z2^m) of binary codewords of length n.

2. Suppose now that w ∈ Z2^m and that c = ψ(w) ∈ Z2^n. Suppose further that during
transmission, the string c ∈ Z2^n is received as τ(c). As errors may occur during transmission,
τ is not a function.

3. Decoding process: Upon receipt of the transmitted message τ(c), we now want to decode
it (with the hope that it is actually c) to recover the original message w. This is known
as the decoding process, and is represented by a function σ : Z2^n → Z2^m.

4. Ideally, the composition σ ∘ τ ∘ ψ should be the identity function. As this cannot be achieved
when τ(c) ≠ c, we hope to find two functions ψ : Z2^m → Z2^n and σ : Z2^n → Z2^m so that
either (σ ∘ τ ∘ ψ)(w) = w, or the error τ(c) ≠ c can be detected and/or corrected with high
probability.

A scheme of the kind described in Steps 1–3 above is known as an (n, m) binary block code.
The ratio m/n is known as the rate of the encoding, and measures the efficiency of our scheme.
Naturally, we hope that it will not be necessary to add too many digits during encoding (i.e.,
we hope that we do not need n to be too much larger than m).

I Example 4.4 (Parity Check Code) Consider an (8, 7) binary block code. Define the encoding
function ψ : Z2^7 → Z2^8 as follows: for each string w = w1 · · · w7 ∈ Z2^7, let ψ(w) = w1 · · · w7 w8,
where the extra digit (called the parity check digit) is defined by w8 ≡ w1 + · · · + w7 (mod 2). It
is easy to see that for every w ∈ Z2^7, the string ψ(w) always contains an even number of 1’s. It
follows that if there is at most one error in transmission, it will always be detected, but we have
no way to correct it.

I Example 4.5 (Repetition Code) Now consider a (21, 7) binary block code. Define the encoding
function ψ : Z2^7 → Z2^21 as follows: for each string w = w1 · · · w7 ∈ Z2^7, let

c = ψ(w) = w1 · · · w7 w1 · · · w7 w1 · · · w7.

In other words, we repeat the string w1 · · · w7 two more times. After transmission, suppose that
we receive the string

τ(c) = v1 · · · v7 v′1 · · · v′7 v″1 · · · v″7.

We now use the decoding function σ : Z2^21 → Z2^7, where

σ(v1 · · · v7 v′1 · · · v′7 v″1 · · · v″7) = u1 · · · u7,

and where, for every j = 1, . . . , 7, the digit uj is equal to the majority of the three entries vj,
v′j, and v″j. It follows that if at most one entry among vj, v′j, and v″j is different from wj, then
we still have uj = wj. So we can detect (and correct) up to seven errors with such a scheme
(called a 3-fold repetition code), provided that no two of the errors affect copies of the same digit.
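A minimal MATLAB sketch of the majority-vote decoder (our illustration, not a program from these notes): the 21 received digits are reshaped so that row j holds the three copies of digit j, and each digit is recovered as the majority of its copies.

% Majority-vote decoding for the 3-fold repetition code (sketch)
r = [0 1 0 1 1 0 0, 0 1 1 1 1 0 0, 0 1 0 1 1 0 1];  % 21 received digits
R = reshape(r, 7, 3);        % column k is the k-th copy; row j holds digit j's copies
u = (sum(R, 2) >= 2)'        % majority vote: decodes to (0 1 0 1 1 0 0)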

Let us now compare repetition codes and parity check codes. Suppose that we wish to send
a message of 1000 digits over a binary symmetric channel, with the probability of error p = 0.01.

• If there is no encoding, the probability of no errors occurring in transmission is

(1 − 0.01)^1000 ≈ 0.000043.

• If we use a 3-fold repetition code, then a single digit x (0 or 1) is encoded as xxx. The
probability of correctly sending the block xxx is (0.99)³ ≈ 0.970299, and by Theorem 4.1,
the probability of a single error occurring in transmission is C(3, 1)(0.01)(0.99)² ≈ 0.029403.
Therefore, since we can correct single errors, the probability of correctly interpreting the
message xxx, and hence correctly decoding to x, is approximately 0.970299 + 0.029403 =
0.999702. So the probability of correctly decoding the original 1000-digit message is
(0.999702)^1000 ≈ 0.742268, which is much greater than 0.000043. Note, however, that
the greater likelihood of error-free transmission comes at a price: we need to send 3000
digits to receive only 1000.

• Now suppose we use a parity check code in the following way. We break the 1000-digit
message into 100 blocks of length 10 and encode each block as an 11-digit block by adding a
parity check digit. The probability of sending an 11-digit block with no errors is (0.99)^11 ≈
0.895338, and by Theorem 4.1, the probability of a single error occurring in transmission
of the block is C(11, 1)(0.01)(0.99)^10 ≈ 0.099482. Now, if a single error occurs, we can detect
it and ask for retransmission. It is therefore reasonable to assume we can eliminate single
errors, and hence the probability of correctly receiving an 11-digit block is approximately
0.895338 + 0.099482 = 0.994820. Thus, the probability of correct transmission of the
original 1000-digit message is (0.994820)^100 ≈ 0.594909, which is less than with a 3-fold
repetition code, but much greater than if there was no encoding. Moreover, the “price”
is much less than with the 3-fold repetition code since we need only send 1100 digits to
receive 1000.

Redundancy plays a fundamental role in coding theory. We will be adding extra bits of
information to each word before transmission in order to (hopefully) allow the effect of noise to
be countered, and yet the correct message to be inferred. The challenge is to add as little extra
information as possible, while still achieving the desired level of error detection and correction.

4.2 Error Detection and Correction


There are good ways of designing error detecting and correcting codes. Suppose we choose the
encoding function so that in the set of codewords, no two codewords are too “close” or similar.
For example, suppose that the only codewords are

0000000, 0101010, 1010101, and 1111111.

Then it is very unlikely that one could confuse these codewords, and a received message that
is different from any of the above codewords could readily be interpreted as the codeword to
which it is “closest”. For instance, if we received the string 0101011, we would interpret it as
the codeword 0101010 (where an error occurred at the last position).

4.2.1 Hamming Metric

We need an efficient way of finding which codeword is closest to a received word. This is
where Hamming weights and Hamming distances come into play (named after the American
mathematician Richard Hamming).

Definition. Suppose that x = x1 · · · xn ∈ Z2^n. Then the weight (or Hamming weight) of x
is given by

ω(x) = |{j = 1, . . . , n | xj = 1}| = Σ_{j=1}^{n} xj.

That is, ω(x) denotes the number of 1’s amongst the digits of x.

Definition. Suppose that x = x1 · · · xn ∈ Z2^n and y = y1 · · · yn ∈ Z2^n. Then the distance (or
Hamming distance) between x and y is given by

d(x, y) = |{j = 1, . . . , n | xj ≠ yj}|.

That is, d(x, y) denotes the number of positions in which the strings x and y differ.

For example, ω(110101) = 4 and d(110101, 000010) = 5.
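With strings represented as 0-1 vectors in MATLAB, both quantities are one-liners:

% Hamming weight and Hamming distance
x = [1 1 0 1 0 1];
y = [0 0 0 0 1 0];
sum(x)                % omega(x) = 4
sum(x ~= y)           % d(x, y) = 5; equivalently sum(mod(x + y, 2))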

Lemma 4.2 Suppose that x, y ∈ Z2^n. Then d(x, y) = ω(x + y).

Proof. Let x = x1 · · · xn and y = y1 · · · yn. We simply note that xi ≠ yi if and only if xi + yi = 1
modulo 2 (since 0 + 0 = 0 and 1 + 1 = 0 modulo 2). Thus,

d(x, y) = Σ_{i : xi ≠ yi} 1 = Σ_{i=1}^{n} (xi + yi) = ω(x + y). □

Lemma 4.3 Suppose that x, y ∈ Z2^n. Then ω(x + y) ≤ ω(x) + ω(y).



Proof. Suppose x = x1 · · · xn and y = y1 · · · yn, and let x + y = z1 · · · zn ∈ Z2^n. Then

zj ≤ xj + yj for all j = 1, 2, . . . , n,

where the + sign on the right-hand side represents ordinary addition. Therefore,
Σ_{j=1}^{n} zj ≤ Σ_{j=1}^{n} xj + Σ_{j=1}^{n} yj, and hence ω(x + y) ≤ ω(x) + ω(y). □

Theorem 4.4 For every x, y, z ∈ Z2^n, the Hamming distance function d : Z2^n × Z2^n → ℕ satisfies
all of the following conditions:

(i) d(x, y) ≥ 0;

(ii) d(x, y) = 0 if and only if x = y;

(iii) d(x, y) = d(y, x) (symmetry);

(iv) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

Proof. The statements (i)–(iii) are straightforward. To prove (iv), we note that for any y ∈ Z2^n,
we have y + y = 0 (the zero word of length n). Therefore, letting x = x1 · · · xn, y = y1 · · · yn, and
z = z1 · · · zn, we have

ω(x + z) = ω(x + y + y + z) ≤ ω(x + y) + ω(y + z) by Lemma 4.3.

Hence, it follows from Lemma 4.2 that d(x, z) ≤ d(x, y) + d(y, z). □

Note: The above theorem shows that (Z2^n, d) is a metric space, so the function d is sometimes
referred to as the Hamming metric.

4.2.2 Error Detection


Definition. A code C is said to detect an error pattern e if and only if c + e is not a codeword
for every c ∈ C.

I Example 4.6 Let C = {001, 101, 110}. To show that C detects the error pattern e1 = 010, but
does not detect the error pattern e2 = 100, it suffices to show that c + e1 ∉ C for all c ∈ C,
but c + e2 ∈ C for some c ∈ C. Indeed, it is easy to see that c + e1 ∉ C for any c ∈ C and
101 + e2 = 101 + 100 = 001 ∈ C.

A good method for determining which error patterns a code can detect is to first determine
which error patterns it cannot detect. Given a code C and any pair of codewords v and w, if
e = v + w, then C cannot detect the error pattern e (as v + e = w, which is a codeword). Thus
the set of all error patterns that cannot be detected by C is the set of all words that can be
written as the sum of two codewords.

I Example 4.7 The error patterns that cannot be detected by the code C = {1000, 0100, 1111}
are:
1000 + 1000 = 0000
1000 + 0100 = 1100
1000 + 1111 = 0111
0100 + 1111 = 1011
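Such sums can be enumerated mechanically; this MATLAB sketch lists every word expressible as a sum of two codewords of C, i.e., every error pattern the code fails to detect.

% Error patterns NOT detected by C = {1000, 0100, 1111}
C = [1 0 0 0; 0 1 0 0; 1 1 1 1];        % one codeword per row
undetected = [];
for i = 1:size(C,1)
    for j = i:size(C,1)                  % include i = j, giving 0000
        undetected = [undetected; mod(C(i,:) + C(j,:), 2)];
    end
end
unique(undetected, 'rows')               % rows 0000, 0111, 1011, 1100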

For certain codes, we can calculate some of the error patterns that the code can detect,
without needing to go through any of the above calculations. For this, we make use of the
Hamming distance between two codewords.

Definition. The minimum distance of a code C ⊆ Z2^n is the smallest Hamming distance
between any pair of distinct codewords. That is, the minimum distance of C, denoted by δ,
is defined by
δ := min{d(x, y) | x, y ∈ C, x ≠ y}.

Remark 4.2 We know that d(v, w) = ω(v + w) (by Lemma 4.2), so δ is the smallest value of
ω(v + w) as v, w range over all possible pairs of distinct codewords.

I Example 4.8 To find the minimum distance of the code C = {0000, 1010, 0111}, we find the
smallest weight ω(w+v) over all pairs w, v of distinct codewords. The sums of distinct codewords
and their corresponding Hamming weights are listed below:

0000 + 1010 = 1010, ω(1010) = 2
0000 + 0111 = 0111, ω(0111) = 3
1010 + 0111 = 1101, ω(1101) = 3

We see that the minimum weight is 2, so this is the minimum distance of the code.
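Using Lemma 4.2, δ is the minimum of ω(v + w) over distinct codewords, which is immediate to compute:

% Minimum distance of the code C = {0000, 1010, 0111}
C = [0 0 0 0; 1 0 1 0; 0 1 1 1];
delta = Inf;
for i = 1:size(C,1)
    for j = i+1:size(C,1)
        delta = min(delta, sum(mod(C(i,:) + C(j,:), 2)));
    end
end
delta                    % minimum distance: 2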

The key ideas thus far are summarised by the following result.

Theorem 4.5 Let C be a code with minimum distance δ. Then C can detect all non-zero error
patterns of weight at most δ − 1. Moreover, there is at least one error pattern of weight δ that
C will not detect.

Proof. Let e be a non-zero error pattern with ω(e) ≤ δ − 1. Then for any c ∈ C, we have

d(c, c + e) = ω(c + (c + e)) = ω(e) < δ.

Hence, since C has minimum distance δ, c + e ∉ C. Therefore, C detects the error pattern e.
Now, from the definition of δ, there exist codewords c, c′ ∈ C with d(c, c′) = δ. Consider
the error pattern e = c + c′. Then c = c′ + e ∈ C, so C will not detect this error pattern e of
weight δ. □

Note: A code C with minimum distance δ could possibly detect some error patterns of weight
δ or more, but it does not detect all error patterns of weight δ.

Definition. A code C is said to be k-error-detecting if it detects all error patterns of weight
at most k, and does not detect at least one error pattern of weight k + 1.

By Theorem 4.5, if C has minimum distance δ, then C is (δ − 1)-error-detecting. Equivalently,
for a code C to be k-error-detecting, it must have minimum distance k + 1.

I Example 4.9 Let C = {0000, 1010, 0111}.

(a) For which k is C k-error-detecting?
Answer: As we found in Example 4.8, the minimum distance of C is δ = 2, so C is
1-error-detecting.

(b) Find all error patterns that C detects, and hence show that 1010 is the only error pattern
of weight 2 that C does not detect.
Solution: By part (a), C detects all error patterns of weight 1. These are all binary words
of length 4 that have weight 1 but which are not in C: 1000, 0100, 0010, 0001. The code C
also detects the following error patterns e of weight 2 because, for each of these e, c + e ∉ C
for all c ∈ C:

1100, 1001, 0110, 0101, 0011

These are all the possible error patterns of length 4 and weight 2, except 1010, which the
code cannot detect because 1010 + 1010 = 0000 ∈ C.

4.2.3 Decoding Rule


From now on, we assume that the following decoding process is being used.
Let C be a set of codewords. Suppose a codeword is transmitted and v is the word received.
If there is a unique codeword c ∈ C for which d(c, v) < d(c′, v) for all c′ ∈ C, c′ ≠ c, then
we decode v as c. However, if there are two or more codewords that are equally close to
v, then retransmission is requested. This process is called Incomplete Maximum Likelihood
Decoding. (Note that so-called Complete Maximum Likelihood Decoding arbitrarily selects one
of the equally closest codewords to the received word v and decodes v as that arbitrary choice,
without asking for retransmission.)
One of the main goals is to ensure that decoding to the closest codeword will (almost) always
produce the correct codeword.
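The decoding rule can be expressed compactly in MATLAB. Below is a minimal sketch (the
code and variable names are ours, not from the notes): it computes the distance from the
received word v to every codeword, then either decodes to the unique nearest codeword or
requests retransmission.

% Incomplete Maximum Likelihood Decoding (a sketch).
C = [0 0 0; 1 1 1];                          % rows are the codewords
v = [1 1 0];                                 % received word
d = sum(mod(C + repmat(v, size(C,1), 1), 2), 2);  % distances d(c, v)
[dmin, idx] = min(d);
if sum(d == dmin) == 1
    decoded = C(idx, :)                      % unique closest codeword
else
    disp('Two or more codewords equally close: request retransmission')
end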

4.2.4 Error Correction


We now formalise the definition of error correction.

Definition. A code C corrects an error pattern e if, for all codewords c ∈ C, c + e is closer
to c than any other codeword in C.

Definition. A code C is said to be k-error-correcting if it corrects all error patterns of


weight at most k, and does not correct at least one error pattern of weight k + 1.

Given a codeword c ∈ C ⊆ Z_2^n, we can think of a “closed ball” in n-dimensional space of
radius k centred on c by saying that another word w falls within this ball if and only if w is
within distance k of the codeword c. More formally:

Definition. Suppose k ∈ N and x ∈ Z_2^n. Then the set

B(x, k) = {y ∈ Z_2^n | d(x, y) ≤ k}

is the closed ball with centre x and radius k; it is the set of binary strings of length n that are
at most distance k from the string x.

Note: |B(x, k)| = C(n, 0) + C(n, 1) + · · · + C(n, k), where C(n, i) denotes the binomial coefficient
“n choose i”, since there are C(n, i) different ways in which to change i digits for i = 1, 2, . . . , k.

I Example 4.10 B(10101, 1) = {10101, 00101, 11101, 10001, 10111, 10100} with |B(10101, 1)| =
C(5, 0) + C(5, 1) = 1 + 5 = 6.

Intuitively, a code C is k-error-correcting if, for any two codewords c, c0 ∈ C, the closed balls
B(c, k) and B(c0 , k) do not intersect. Then any received word that falls within the ball of radius
k centred on a codeword c will be corrected unambiguously to c.

I Example 4.11 Show that C = {000, 111} is a 1-error-correcting code.

Solution: We need to show that C can correct all error patterns of weight at most 1, but there
exists an error pattern of weight 2 that it cannot correct. Indeed, C can correct all of the
error patterns of weight at most 1 (namely, 000, 100, 010, 001) because, for each of these error
patterns e, we have 000 + e is closer to 000 than the other codeword 111, and also 111 + e
is closer to 111 than 000. This is easily seen from the fact that the closed balls B(000, 1) =
{000, 100, 010, 001} and B(111, 1) = {111, 011, 101, 110} do not intersect. However, we note
that C cannot correct the error pattern 110 of weight 2, for example, because 000 + 110 = 110
is closer to the other codeword 111 than it is to 000. In fact, we see that the closed balls
B(000, 2) = {000, 100, 010, 001, 110, 101, 011} and B(111, 2) = {111, 011, 101, 110, 001, 010, 100}
intersect, i.e.,
B(000, 2) ∩ B(111, 2) = {100, 010, 001, 110, 101, 011}.

The intersection contains the following three error patterns of weight 2: 110, 101, 011, none of
which the code can correct.

The following fundamental theorem forms the basis for error correcting codes.

Theorem 4.6 Let C be a code with minimum distance δ.

(i) If δ is odd, then C can correct all error patterns of weight at most (δ − 1)/2.

(ii) If δ is even, then C can correct all error patterns of weight at most (δ − 2)/2.

Proof. (Sketch Proof) If C has minimum distance δ, then for any two codewords c, c′ ∈ C,
d(c, c′) ≥ δ. If δ is odd and an error pattern of weight ≤ (δ − 1)/2 occurs in transmission, or if
δ is even and an error pattern of weight ≤ δ/2 occurs in transmission, then the received word
v will not be a codeword, so such an error pattern will be detected. In most of these cases, it
is also possible to correct the error(s), by selecting the unique codeword which is closest to the
received word. But, if δ is even and δ/2 errors have occurred, then the received word v could be
equidistant from two distinct codewords, in which case it would not be possible to unambiguously
select the closest codeword, i.e., error correction may not be possible. Similarly, error correction
may not be possible if δ is odd and (δ + 1)/2 errors have occurred during transmission. □

The above theorem says that a code C with minimum distance δ can correct at most ⌊(δ − 1)/2⌋
errors (where ⌊z⌋ denotes the integer part of z). However, whilst C may correct some error
patterns of weight > ⌊(δ − 1)/2⌋, it will not correct all such error patterns as there is at least one
error pattern of weight 1 + ⌊(δ − 1)/2⌋ that C cannot correct.

I Example 4.12 The code C = {000, 111} from Example 4.11 has minimum distance 3, so by
Theorem 4.6, it can correct all error patterns of weight at most (3 − 1)/2 = 1, i.e., it is 1-error-
correcting, as we showed previously.

4.2.5 Hamming Bound


The problem of determining precisely the maximum number of codewords that a code C of
length n and minimum distance δ can possibly contain has not been solved in general (only for
certain values of n and δ). In practice, we would like to choose codes containing the maximum
number of possible codewords for given values of n and δ. There is a well-known upper bound,
called the Hamming bound.
Recall that for a given word w ∈ C ⊆ Z_2^n, the closed ball of radius k centred at w, namely
B(w, k) = {v ∈ Z_2^n | d(w, v) ≤ k}, consists of all the binary words of length n that are at most
(Hamming) distance k from w. As shown previously, |B(w, k)| = C(n, 0) + C(n, 1) + · · · + C(n, k)
for any binary word w ∈ Z_2^n.

We have the following important theorem.

Theorem 4.7 (The Hamming Bound) If C is a code of length n with minimum distance δ,
then

|C| ≤ 2^n / ( C(n, 0) + C(n, 1) + · · · + C(n, k) ),   where k = ⌊(δ − 1)/2⌋.

Remark 4.3 Recall that a code of minimum distance δ can correct all error patterns of weight
at most ⌊(δ − 1)/2⌋. Also note that C(n, 0) + C(n, 1) + · · · + C(n, n) = 2^n (by the well-known
Binomial Theorem).
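The bound is straightforward to evaluate numerically. Here is a small MATLAB sketch (ours,
with hypothetical values n = 5 and δ = 3); the final floor is our addition, justified because |C|
is an integer.

% Evaluate the Hamming bound for a code of length n and minimum distance delta.
n = 5; delta = 3;                 % hypothetical parameters
k = floor((delta - 1) / 2);
ballsize = 0;
for i = 0:k
    ballsize = ballsize + nchoosek(n, i);   % |B(x,k)| = C(n,0)+...+C(n,k)
end
bound = floor(2^n / ballsize)     % here: floor(32/6) = 5, so |C| <= 5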

4.3 Linear Codes


We will now consider a broad class of codes that not only allow for fast and efficient transmission
of information, but also detect and correct errors; namely, linear codes. These codes have many

properties, which are interesting both theoretically and practically. We will need to use our
knowledge of matrices and linear algebra to study them.

Definition. A linear code is a (binary block) code in which the sum modulo 2 of any two
codewords is also a codeword. That is, a code C is linear if and only if v + w ∈ C for any pair
of codewords v, w ∈ C.

So any linear code is closed under addition modulo 2.

I Example 4.13 Show that the code C1 = {0000, 0101, 1010, 1111} is linear, but the code C2 =
{0000, 1001, 1010, 0011, 1111} is not linear.
Solution: As we can see below, the sums of all pairs of codewords in C1 are also in C1 , so C1 is
a linear code.
0000 + 0000 = 0000
0000 + 0101 = 0101
0000 + 1010 = 1010
0000 + 1111 = 1111
0101 + 1010 = 1111
0101 + 1111 = 1010
1010 + 1111 = 0101
However, the code C2 is not linear because, for instance, the sum of the two codewords 1001 and
1111 is not a codeword, i.e., 1001 + 1111 = 0110 ∉ C2.

Note: Any linear code C must contain a codeword consisting entirely of zeros (called a “zero
word”) because, by definition of a linear code C, for any w ∈ C, w + w (which consists entirely
of zeros) is in C too.
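Closure under addition modulo 2 can be tested exhaustively. A brief MATLAB sketch (ours),
applied to the code C2 of Example 4.13:

% Test whether a code is linear, i.e. closed under addition modulo 2.
C2 = [0 0 0 0; 1 0 0 1; 1 0 1 0; 0 0 1 1; 1 1 1 1];
isLinear = true;
for i = 1:size(C2,1)
    for j = 1:size(C2,1)
        s = mod(C2(i,:) + C2(j,:), 2);
        if ~ismember(s, C2, 'rows')   % a sum that is not a codeword
            isLinear = false;
        end
    end
end
disp(isLinear)   % displays 0 (false): e.g. 1001 + 1111 = 0110 is not in C2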

One of the advantages of a linear code is that its minimum distance is easier to find than for
non-linear codes.

Theorem 4.8 For any linear code C, the minimum distance of C is the smallest weight of any
non-zero codeword in C.

Proof. Let C be a linear code of minimum distance δ, and let w be a non-zero codeword of
smallest weight. We will prove that ω(w) = δ by showing that ω(w) ≥ δ and ω(w) ≤ δ. First,
since the zero word is in C, we have δ ≤ d(0, w) = ω(0 + w) = ω(w), i.e., ω(w) ≥ δ. Now
suppose there exist codewords v1 , v2 ∈ C such that δ = d(v1 , v2 ) < ω(w). As C is linear, the
word v1 + v2 must be in C with ω(v1 + v2 ) = d(v1 , v2 ) < ω(w), contradicting the minimality of
w. Hence, δ ≥ ω(w), and therefore ω(w) = δ, as claimed. 

So we need only make |C| − 1 calculations to determine the minimum distance of a linear code,
compared to C(|C|, 2) calculations in the case of a non-linear code (for which we need to calculate
the weight of the sum of each pair of distinct codewords).

I Example 4.14 By Theorem 4.8, the minimum distance of the linear code

C = {0000, 1100, 0011, 1111}


82 Chapter 4. Coding Theory

is δ = 2 because the weights of the non-zero codewords are ω(1100) = ω(0011) = 2 and
ω(1111) = 4, the smallest of which is 2.
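In MATLAB, this single pass over the non-zero codewords is immediate (a sketch of ours, using
the code of Example 4.14):

% Minimum distance of a LINEAR code = smallest weight of a non-zero codeword.
C = [0 0 0 0; 1 1 0 0; 0 0 1 1; 1 1 1 1];   % the linear code of Example 4.14
w = sum(C, 2);            % Hamming weight of each codeword
delta = min(w(w > 0))     % displays 2, as found above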

Linear codes are also advantageous in that it is much easier to describe the set of error
patterns that a linear code will detect and correct than it is for non-linear codes. For instance,
the set of error patterns that a linear code C ⊆ Z_2^n can detect is Z_2^n \ C because C is itself the
set of all error patterns that it cannot detect (since C is precisely the set of all binary words of
length n that can be expressed as a sum of two codewords in C).
Before stating an important theorem concerning linear codes, we will need to review some
linear algebra.

4.3.1 Brief overview of linear algebra for linear codes


Viewing elements of Z_2^n = {x1 x2 · · · xn | xi ∈ Z_2, 1 ≤ i ≤ n} as vectors of length n and the digits
0, 1 as scalars, Z_2^n forms a so-called vector space under coordinate-wise addition modulo 2 and
scalar multiplication modulo 2. Moreover, any subset S ⊆ Z_2^n is a subspace of Z_2^n if and only if,
for any u, v ∈ S and for any scalar α ∈ Z_2, u + v ∈ S and αu ∈ S, i.e., if and only if S is closed
under coordinate-wise addition modulo 2 (since the only scalars are 0 and 1).

Thus, a code C ⊆ Z_2^n is linear if and only if C is a subspace of Z_2^n.

For any non-empty subset S = {v1, v2, . . . , vk} ⊆ Z_2^n, the span of S is defined by

⟨S⟩ := {α1 v1 + α2 v2 + · · · + αk vk | αi ∈ Z_2, 1 ≤ i ≤ k},

i.e., ⟨S⟩ is the set of all linear combinations of the vectors in S. Moreover, ⟨S⟩ is a subspace of
Z_2^n, called the subspace spanned or generated by S.

Hence, for any non-empty subset S ⊆ Z_2^n, ⟨S⟩ is a linear code.

A set of vectors S = {v1, v2, . . . , vk} is linearly dependent if there exist scalars α1, α2, . . . , αk,
with at least one αi ≠ 0, such that

α1 v1 + α2 v2 + · · · + αk vk = 0.

Otherwise, S is said to be linearly independent. A basis for a vector space V is any non-empty
subset B ⊆ V such that B spans V (i.e., ⟨B⟩ = V) and B is linearly independent. All bases for
a vector space contain the same number of elements and this number is called the dimension of
the vector space.

Definition. A basis for a linear code C ⊆ Z_2^n is a basis for C when viewed as a subspace of
Z_2^n. The dimension of a linear code C ⊆ Z_2^n is the dimension of C when viewed as a subspace
of Z_2^n.

If a linear code C has dimension k and B = {v1 , v2 , . . . , vk } is a basis for C, then each
codeword c ∈ C can be written as

c = α1 v1 + α2 v2 + · · · + αk vk

for a unique choice of binary digits α1, α2, . . . , αk. Since each αi = 0 or 1, there are 2^k distinct
choices for α1, α2, . . . , αk. We thus have the following important theorem.

Theorem 4.9 A linear code of dimension k contains exactly 2^k distinct codewords.
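Theorem 4.9 can be illustrated by generating all 2^k codewords directly from a basis; a short
MATLAB sketch (ours, with a hypothetical basis):

% Enumerate all 2^k codewords of a linear code from a basis (rows of B).
B = [1 0 1 1 0; 0 1 0 1 1; 0 0 1 0 1];   % a hypothetical basis, k = 3
k = size(B, 1);
U = dec2bin(0:2^k - 1) - '0';   % all 2^k binary words of length k, as rows
C = mod(U * B, 2);              % row u*B is the codeword for message word u
disp(size(C, 1))                % displays 8 = 2^3 codewords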

4.3.2 Dual Codes


The dot product (or scalar product) of two vectors x = x1 x2 · · · xn and y = y1 y2 · · · yn in Zn2 is
defined by
x · y = x1 y1 + x2 y2 + · · · + xn yn

with addition and multiplication modulo 2. Moreover, the vectors x and y are said to be
orthogonal if x · y = 0. If S is a set of vectors in Zn2 , we say that a vector v ∈ Zn2 is orthogonal
to the set S iff v · w = 0 for all vectors w ∈ S. The set of all vectors that are orthogonal to a set
S is denoted by S ⊥ , called the orthogonal complement of S. For any subset S of a vector space
V , S ⊥ is a subspace of V .

Definition. Let S ⊆ Z_2^n. The dual code of the linear code C = ⟨S⟩ is the code C⊥ := S⊥.

Theorem 4.10 Let C = ⟨S⟩ be the linear code generated by a subset S of Z_2^n. If the dimension
of C is k1 and the dimension of its dual C⊥ is k2, then k1 + k2 = n.

Hereafter, a linear (binary block) code of length n and dimension k will be called a linear
(n, k) code.

4.3.3 Algorithm for Finding Bases for Linear Codes


An algorithm for finding bases for C = ⟨S⟩ and its dual C⊥ = S⊥ is given in Figure 4.2 on the
next page.

Now, we apply row operations to find the RREF of M (using arithmetic modulo 2). Applying
R2 → R2 + R1 and R4 → R4 + R1, then R1 → R1 + R2 and R3 → R3 + R2, and finally
R2 → R2 + R3 and R4 → R4 + R3:

        1 1 0 1 0        1 1 0 1 0        1 0 0 0 1        1 0 0 0 1
M  =    1 0 0 0 1   →    0 1 0 1 1   →    0 1 0 1 1   →    0 1 0 0 1
        0 1 0 0 1        0 1 0 0 1        0 0 0 1 0        0 0 0 1 0
        1 1 0 0 0        0 0 0 1 0        0 0 0 1 0        0 0 0 0 0

Hence, we have

        1 0 0 0 1
G  =    0 1 0 0 1
        0 0 0 1 0

the rows of which form a basis for C. Now, we form the matrix G′ by swapping columns 3 and
4 of G, i.e., we have

        1 0 0 0 1
G′ =    0 1 0 0 1
        0 0 1 0 0

which is of the form (I3 | A) where

        0 1
A  =    0 1
        0 0

Hence, H′ is the matrix obtained by stacking A on top of I2:

         0 1
         0 1
H′  =    0 0
         1 0
         0 1

Lastly, swapping rows 3 and 4 (as was done to the columns of G), we obtain

        0 1
        0 1
H  =    1 0
        0 0
        0 1

the columns of which form a basis for C⊥.

The following MATLAB program codes.m implements the linear codes algorithm for
Example 4.15.
% Find bases for a linear code C and its dual
clear
% Enter the matrix M whose rows consist of the words in the set S s.t. C=<S>
M=[1 1 0 1 0;
   1 0 0 0 1;
   0 1 0 0 1;
   1 1 0 0 0]
[m,n]=size(M);
% Compute the RREF, R, of M using g2rref, a modified version of MATLAB's rref
% command that uses arithmetic modulo 2
R_dash=g2rref(M);
R=R_dash;
z=find(all(R_dash==0,2)==1);   % find the zero rows of R_dash
nz=find(~all(R_dash==0,2)==1); % find the non-zero rows of R_dash
R((m-length(z)+1):m,:) = zeros(length(z),n); % put zero rows at bottom of R
R(1:length(nz),:) = R_dash(nz',:);           % put non-zero rows at top of R
disp('The RREF of M:')
disp(R)
% Compute the matrix G consisting of the non-zero rows of R, which form a basis
% for the code C
for i=1:m
    if R(i,:)==zeros(1,n)
        G=R(1:i-1,:);
        break
    end
end
disp('The matrix G whose rows form a basis for the code:')
disp(G)
[k,n]=size(G);
% Compute the matrix G' of the form [I_k|A] obtained from G by permuting columns
G_dash=G;
perm=[1:n];
for i=1:k
    if G(i,i) ~= 1
        for j=(i+1):n
            if G(i,j) == 1 & find(G(:,j)==1) == i
                perm(i)=j;
                perm(j)=i;
                G_dash(:,i)=G(:,j);
                G_dash(:,j)=G(:,i);
            end
        end
    end
end
disp('The matrix G'' of the form [I_k|A] obtained by permuting columns of G:')
disp(G_dash)
disp('using the permutation:')
disp(perm)
A = G_dash(1:k,k+1:n);
% Compute the matrix H' of the form [A over I_{n-k}]
H_dash=[A;eye(n-k)];
disp('The matrix H'' of the form [A over I_{n-k}]:')
disp(H_dash)
% Compute the matrix H obtained from H' by applying the inverse of the
% permutation that was applied to the columns of G -- its columns form a
% basis for the dual of C
for i=1:n
    H(i,:) = H_dash(perm(i),:);
end
disp('The matrix H whose columns form a basis for the dual code:')
disp(H)

Note: The above MATLAB program uses g2rref, a modified version of MATLAB's rref
command that uses arithmetic modulo 2. The file g2rref.m is available on the LMS, and you
must put it in the same folder as codes.m in order for the latter program to run successfully.
The output is shown below.

M =
1 1 0 1 0
1 0 0 0 1
0 1 0 0 1
1 1 0 0 0

The RREF of M:
1 0 0 0 1
0 1 0 0 1
0 0 0 1 0
0 0 0 0 0

The matrix G whose rows form a basis for the code:


1 0 0 0 1
0 1 0 0 1
0 0 0 1 0

The matrix G’ of the form [I_k|A] obtained by permuting columns of G:


1 0 0 0 1
0 1 0 0 1
0 0 1 0 0

using the permutation:


1 2 4 3 5

The matrix H' of the form [A over I_{n-k}]:


0 1
0 1
0 0
1 0
0 1

The matrix H whose columns form a basis for the dual code:
0 1
0 1
1 0
0 0
0 1

Justification of the linear codes algorithm

Let us give some justification as to why the linear codes algorithm gives bases for C and C ⊥ .
Firstly, the rows of M generate C and elementary row operations only interchange codewords
(rows) and/or replace one codeword (row) by the sum of itself with another codeword (row)
giving a new set of codewords (rows) that still generate C. And clearly the rows in the resulting
RREF G are linearly independent so they form a basis for C.
Now, by construction, it is also clear that the n − k columns of H are linearly independent,
and by Theorem 4.10 the dimension of C⊥ is n minus the dimension of C, i.e., the dimension of
C⊥ is n − k, so the columns of H form a basis for a subspace of the correct dimension. Moreover,

G′H′ = (I_k | A)(A over I_{n−k}) = I_k A + A I_{n−k} = A + A = 0,

the zero matrix of size k × (n − k) (under addition modulo 2).
We apply the same permutation to the columns of G′ and to the rows of H′ to obtain the
respective matrices G and H, so we also have GH = 0. Thus, each row of G is orthogonal to
each column of H, and hence if the rows of G form a basis for C, then the columns of H form a
basis for C⊥.

4.3.4 Generating Matrices for Linear Codes


Definition. If C is a linear (n, k) code, then any matrix whose rows form a basis for C is called
a generating matrix for C. Such matrices have k rows, n columns, and rank k.

Note: The rank of a matrix M is the number of non-zero rows in any RREF of M .

We have the following two theorems about generating matrices.

Theorem 4.11 A matrix G is a generating matrix for some linear code C if and only if the rows
of G are linearly independent (and so the rank of G, or equivalently the dimension of C, is the
number of rows of G).

Theorem 4.12 If G is a generating matrix for a linear code C, then any matrix that is row
equivalent to G (i.e., any matrix that can be obtained from G by a sequence of elementary row
operations) is also a generating matrix for C. In particular, C has a (unique) generating matrix
in RREF, called the standard generating matrix for C.

Remark 4.4 Let C = ⟨S⟩ be the linear (n, k) code generated by a subset S of Z_2^n. The k × n
matrix G obtained via the linear codes algorithm is the standard generating matrix for C and
the (n − k) × n matrix H^T (i.e., the transpose of H) is the standard generating matrix for the
dual code C⊥.

To find the standard generating matrix for any linear code C (for which we must have C = ⟨C⟩
by definition of “linear”), we can use the linear codes algorithm by forming the matrix M whose
rows are the non-zero codewords of C in Step 1.

I Example 4.16 Find the standard generating matrix for the linear code C = {0000, 1110, 0111, 1001}.
Solution: Implementing the linear codes algorithm, we form the matrix M whose rows are the
non-zero codewords of C:

        1 0 0 1
M  =    1 1 1 0
        0 1 1 1

Applying row operations to find the RREF of M (R2 → R2 + R1, then R3 → R3 + R2):

        1 0 0 1          1 0 0 1          1 0 0 1
M  =    1 1 1 0    →     0 1 1 1    →     0 1 1 1
        0 1 1 1          0 1 1 1          0 0 0 0

Hence, we have

G  =    1 0 0 1
        0 1 1 1
which is the standard generating matrix for C and its rows form a basis for C.

4.3.5 Encoding with a Linear Code


Let C be a linear (n, k) code. If G is a generating matrix for C and u is a binary word of length
k written as a row vector, then

uG = v is a codeword in C,

since v is a linear combination of the rows of G and these rows form a basis for C. More precisely,
let u = α1 α2 · · · αk ∈ Z_2^k be a message word and let g1, g2, . . . , gk be the rows of G. Then

v = uG = α1 g1 + α2 g2 + · · · + αk gk ∈ C.

It is easy to see that if u1 G = u2 G then u1 = u2 , since each codeword in C is a unique linear


combination of the codewords in a basis of dimension k. Thus, no codeword v = uG in C is
obtained from more than one message word u ∈ Z_2^k.
These observations lead to the following important theorem.

Theorem 4.13 If G is a generating matrix for a linear (n, k) code C, then v = uG ranges over
all 2^k codewords in C as u ranges over all 2^k binary words of length k. Thus C is the set of all
words uG with u ∈ Z_2^k. Moreover, u1 G = u2 G if and only if u1 = u2.

Note: This theorem says that the message words that can be encoded by a linear (n, k) code
are precisely all the words in Z_2^k, and these message words are encoded as codewords of length
n via a generating matrix.

I Example 4.17 Let C be the linear (5, 3) code with generating matrix
 
1 0 1 1 0
G = 0 1 0 1 1 .
 

0 0 1 0 1
Assign letters/characters to each of the 2^3 = 8 (message) words in Z_2^3 as follows:

space E H L M P T W
000 100 010 001 110 101 011 111

We will use the given generating matrix G to encode the following message: HELP ME
For each u ∈ Z32 , u is encoded as uG, so we have:

H E L P M E
010 100 001 101 000 110 100
↓ ↓ ↓ ↓ ↓ ↓ ↓
01011 10110 00101 10011 00000 11101 10110

So the encoded message would be the sequence of bits (one after the other and in order) that
appear in the bottom row of the table above.
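The whole encoding is a single matrix multiplication modulo 2. A MATLAB sketch (ours)
reproducing the table above:

% Encode the message HELP ME with the generating matrix of Example 4.17.
G = [1 0 1 1 0; 0 1 0 1 1; 0 0 1 0 1];
msg = [0 1 0;    % H
       1 0 0;    % E
       0 0 1;    % L
       1 0 1;    % P
       0 0 0;    % space
       1 1 0;    % M
       1 0 0];   % E
codewords = mod(msg * G, 2)   % one codeword uG per row, as in the table above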

4.3.6 Parity Check Matrices


We now introduce a class of matrices that are used in error detection and correction.

Definition. A matrix H is called a parity check matrix for a linear code C if the columns
of H form a basis for the dual code C ⊥ (or equivalently, if the rows of HT form a basis for C ⊥ ).

If C is a linear code of length n and dimension k (i.e., if C is a linear (n, k) code), then, since
the sum of the dimensions of C and C⊥ is n (by Theorem 4.10), any parity check matrix for C
must have n rows, n − k columns, and rank n − k.

Theorem 4.14 A matrix H is a parity check matrix for some linear code if and only if the
columns of H are linearly independent.

The next theorem describes a linear code in terms of a parity check matrix.

Theorem 4.15 If H is a parity check matrix for some linear code C of length n, then C consists
of all words v ∈ Z_2^n such that vH = 0.

I Example 4.18 Consider the linear (4, 2) code C = {0000, 0011, 1100, 1111}. To verify that the
matrix

        1 1
H  =    1 1        is a parity check matrix for C,
        0 1
        0 1

we check that C consists of all possible codewords v such that vH = 0 (left as an exercise).
Alternatively, we can find another parity check matrix for C by implementing the linear codes
algorithm. We have

        1 1 1 1
M  =    1 1 0 0
        0 0 1 1

Applying row operations to find the RREF of M (R2 → R2 + R1, then R1 → R1 + R2 and
R3 → R3 + R2):

        1 1 1 1          1 1 1 1          1 1 0 0
M  =    1 1 0 0    →     0 0 1 1    →     0 0 1 1
        0 0 1 1          0 0 1 1          0 0 0 0

Hence, we have

G  =    1 1 0 0        and       G′ =    1 0 1 0
        0 0 1 1                          0 1 0 1

where G′ was obtained from G by swapping columns 2 and 3. Lastly,

         1 0                      1 0
H′  =    0 1       and    H  =    1 0
         1 0                      0 1
         0 1                      0 1

where the columns of H form a basis for the dual code C ⊥ (by the linear codes algorithm), and
hence H is a parity check matrix for C.

Why is the matrix H called a parity check matrix?

For a linear (n, k) code, the matrix equation xH = 0, with x = (x1, x2, . . . , xn) ∈ Z_2^n, represents
a system of n − k simultaneous equations in the variables x1, x2, . . . , xn. If the rows in which
the j-th column of H contains 1's are i1, i2, . . . , im with 1 ≤ i1 < i2 < · · · < im ≤ n, then

x_{i1} + x_{i2} + · · · + x_{im} ≡ 0 (mod 2).

So a binary word x of length n in Z_2^n satisfies xH = 0 if and only if certain (ordinary) sums of
its bits (as determined by H) are divisible by 2, and thus have even parity. The n − k linear
equations (determined by the n − k columns of H, as above) are thus called parity check equations
for the code, and H is the corresponding parity check matrix.
The following two theorems characterise the relationship between a generating matrix and
a parity check matrix for a linear code C, and the relationship between these matrices for the
dual code C ⊥ .

Theorem 4.16 Matrices G and H are respective generating and parity check matrices for some
linear (n, k) code C if and only if all of the following conditions are satisfied:

(i) the rows of G are linearly independent;

(ii) the columns of H are linearly independent;

(iii) G has k (non-zero) rows and n columns;

(iv) H has n − k (non-zero) columns and n rows;

(v) GH = 0.

Theorem 4.17 A matrix H is a parity check matrix for some linear code C if and only if HT
is a generating matrix for the dual code C ⊥ .

This theorem follows from Theorem 4.16 and the fact that H^T G^T = (GH)^T = 0^T = 0.

Remark 4.5 Let C = ⟨S⟩ be the linear (n, k) code generated by a subset S of Z_2^n. By Re-
mark 4.4, the (n − k) × n matrix H^T, where H is the n × (n − k) matrix obtained via the
linear codes algorithm, is the standard generating matrix for the dual code C⊥. Hence, by Theo-
rem 4.17, the n × (n − k) matrix (H^T)^T = H is a parity check matrix for C, called the standard
parity check matrix for C.

Let GC and HC denote the respective generating and parity check matrices for a linear code
C. Similarly, let GC⊥ and HC⊥ denote the respective generating and parity check matrices for
the dual code C ⊥ . Given any one of these four matrices, we can use the linear codes algorithm
and Theorem 4.17 to find the other three matrices, as shown in the following diagram:
                    linear codes algorithm
        HC⊥   ←───────────────────────────   GC⊥
         │                                     ↑
         │ Transpose                 Transpose │
         ↓                                     │
        GC    ───────────────────────────→    HC
                    linear codes algorithm

I Example 4.19 Let C be a linear code with parity check matrix

         1 1
         1 1
HC  =    0 1
         1 0
         0 1

which is of the form (A over I2), with A the top three rows.

• By taking the transpose of HC, we find a generating matrix for the dual code C⊥:

  GC⊥ = HC^T =    1 1 0 1 0
                  1 1 1 0 1

• Now, starting with GC⊥, we can use the linear codes algorithm to find a parity check matrix
  for C⊥. We find that the RREF of GC⊥ is

  G′C⊥ =    1 1 0 1 0
            0 0 1 1 1

  and from this matrix we obtain the following parity check matrix for C⊥ (via the linear
  codes algorithm):

            1 1 0
            1 0 0
  HC⊥  =    0 1 1
            0 1 0
            0 0 1

• Lastly, we find a generating matrix for C by taking the transpose of the parity check matrix
  HC⊥:

  GC = HC⊥^T =    1 1 0 0 0
                  1 0 1 1 0
                  0 0 1 0 1

Note: Alternatively, we could have found a generating matrix for C by applying the linear codes
algorithm backwards, starting with HC, which would give us the following (standard) generating
matrix for C:

     1 0 0 1 1
     0 1 0 1 1
     0 0 1 0 1
Parity check matrices can be used for error detection in the following way. Let C be a linear
code with parity check matrix H. If v is a codeword in C, then by Theorem 4.15, we have
vH = 0. So if we receive a word w and we find that wH ≠ 0, then we have detected an error.

I Example 4.20 Suppose the following matrix is a parity check matrix for a linear code C:

         1 1
         1 1
HC  =    0 1
         1 0
         0 1

The four words 11000, 01100, 11101, and 11010 are received. Use HC to determine whether it
is likely that errors have occurred in the transmission of these words.
Solution: For each of the four received words v ∈ {11000, 01100, 11101, 11010}, we compute
vHC:

(1 1 0 0 0) HC = (0 0),        (0 1 1 0 0) HC = (1 0),

(1 1 1 0 1) HC = (0 0),        (1 1 0 1 0) HC = (1 0).
Hence, the words 11000 and 11101 were received correctly (they are codewords in C), but errors
occurred in the transmission of the other two words.
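All four products vHC can be computed at once as a single matrix product; a MATLAB
sketch (ours):

% Error detection with a parity check matrix (data of Example 4.20).
H = [1 1; 1 1; 0 1; 1 0; 0 1];
words = [1 1 0 0 0;
         0 1 1 0 0;
         1 1 1 0 1;
         1 1 0 1 0];          % the four received words, one per row
S = mod(words * H, 2)         % a zero row means no error was detected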

Recall that the minimum distance of a linear code C is the smallest weight of any non-zero
codeword in C (see Theorem 4.8). The following theorem tells us how to use a parity check
matrix for a code to determine the minimum distance of the code.

Theorem 4.18 Let H be a parity check matrix for a linear code C. Then C has minimum
distance δ if and only if every set of δ − 1 rows of H is linearly independent, whilst at least one
set of δ rows of H is linearly dependent.

In other words, the minimum distance δ of C is the number of rows in the smallest collection of
distinct rows of the parity check matrix whose sum is zero.

Proof. Let H be a parity check matrix for a linear code C and let us first suppose that C
has minimum distance δ. Let v ∈ C be a non-zero codeword of minimum weight. Since C is
linear, ω(v) = δ (by Theorem 4.8), and by Theorem 4.15, we have vH = 0. Now vH is a linear
combination of exactly δ rows of H (since v contains exactly δ 1's). Thus, the set of δ rows of
H corresponding to the δ 1's in v forms a linearly dependent set. Moreover, for any word w of
weight less than δ, w ∉ C, so wH ≠ 0. Hence, no set containing fewer than δ rows of H can be
linearly dependent. Thus, every set of δ − 1 rows of H is linearly independent.

Conversely, suppose that every set of δ − 1 rows of H is linearly independent, whilst at least
one set of δ rows of H is linearly dependent. Then there exists a word u of weight δ such that
uH = 0. (The word u has 1's in the positions corresponding to the rows in the set of δ rows
of H that is linearly dependent.) Thus, since u must be a codeword (by Theorem 4.15) and
has weight δ, the minimum distance of C is at most δ. Now, since every set of δ − 1 rows of
H is linearly independent, no non-zero linear combination of fewer than δ rows of H will give 0.
So it follows that if w is a word of weight less than δ, then wH ≠ 0, and hence w ∉ C (by
Theorem 4.15). Thus u is a codeword of minimum weight in C, and so the minimum distance
of C is δ (by Theorem 4.8). □

Let’s now do an example demonstrating the use of Theorem 4.18 in determining the number
of errors a given linear code can detect and correct. For this, recall that a (linear) code of
minimum distance δ can detect at most δ − 1 errors (by Theorem 4.5) and can correct at most
⌊(δ − 1)/2⌋ errors (by Theorem 4.6).

I Example 4.21 Determine the minimum distance of the linear code C with the following parity
check matrix:

        1 0 1
        0 1 1
H  =    1 1 0
        1 0 0
        0 1 0
        0 0 1

Hence determine how many errors the code C can detect and correct.

Solution: We can easily see that the sum of every pair of rows of H does not give the zero
word (or equivalently, we see that the rows of H are distinct). Hence, every set of two rows of
H is linearly independent. However, the set consisting of the 3rd, 4th, and 5th rows of H,
namely the set {110, 100, 010}, is linearly dependent because 110 + 100 + 010 = 000. Hence,
by Theorem 4.18, the minimum distance of C is δ = 3, and therefore C is 2-error-detecting and
1-error-correcting by Theorems 4.5–4.6.
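Theorem 4.18 also suggests a brute-force computation: search the sets of rows of H in order of
increasing size for one that sums to zero modulo 2. A MATLAB sketch (ours; intended to be
run as a script, and practical only for small H, since the number of row subsets grows quickly):

% Minimum distance from a parity check matrix (Theorem 4.18), by brute force.
H = [1 0 1; 0 1 1; 1 1 0; 1 0 0; 0 1 0; 0 0 1];   % H from Example 4.21
n = size(H, 1);
for delta = 1:n
    subsets = nchoosek(1:n, delta);        % all row-index sets of this size
    for s = 1:size(subsets, 1)
        if all(mod(sum(H(subsets(s,:), :), 1), 2) == 0)
            fprintf('minimum distance = %d\n', delta)   % prints 3 here
            return
        end
    end
end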
4.4 Key Words

(n, m) binary block code, 73


k-error-correcting code, 78
k-error-detecting code, 77

basis for a linear code, 82


binary block code, 71
binary code, 71
binary symmetric channel, 71
binary word, 70
block code, 71

codeword, 71

dimension of a linear code, 82


dual code, 83

error correction, 78
error detection, 76

generating matrix for a linear code, 88

Hamming distance, 75
Hamming metric, 76
Hamming weight, 75

linear (n, k) code, 83


linear code, 80
linear codes algorithm, 83

Maximum Likelihood Decoding, 78


minimum distance of a code, 77
minimum distance of a linear code, 81
modular arithmetic, 70

parity check matrix for a linear code, 90


probability of error, 72

standard generating matrix, 88


standard parity check matrix, 92

vector space, 70

4.5 Exercises
1. Let C be the code consisting of all the binary words of length 3 (i.e., C = Z_2^3), and let E be
the 3-fold repetition code of length 9 formed by repeating each codeword in C three times.

(a) Can E detect any errors? If so, for which k is E k-error-detecting?


(b) Suppose that the word 001000001 is received. List the codeword(s) that were most
likely to have been transmitted.
(c) Suppose that the word 101100001 is received. List the codeword(s) that were most
likely to have been transmitted.

2. Write down the Hamming distance d(x, y) in each of the following cases:

(a) x = 1111, y = 0101


(b) x = 0110, y = 1001
(c) x = y = 1001
(d) x = 11000, y = 10101
(e) x = 1100101, y = 0111011

[Tip: Calculations are quicker using the fact that d(x, y) = ω(x + y) for any two binary
words x, y of the same length, where ω denotes Hamming weight.]

3. Consider the code C = {10110, 01011, 10011, 00101}.

(a) Determine whether or not the code C is linear.


(b) Find the minimum distance of C.
(c) Write down the set of error patterns that C can detect.
(d) For which k is C k-error-detecting? Write down an error pattern of weight k + 1 that
C cannot detect.
(e) For which k is C k-error-correcting? Find an error pattern of weight k + 1 that C
cannot correct.
(f ) List the elements of the linear code D := ⟨C⟩.
(g) Hence determine the minimum distance of D and state the dimension of D.

4. Let S = {0001111, 0110101, 1010011, 1011100, 1100110}.

(a) Find the standard generating matrix G (in RREF) for the linear code
C := ⟨S⟩, and hence write down a basis for C.
(b) Find a parity check matrix H for C, and hence write down a basis and a generating
matrix for the dual code C ⊥ := S ⊥ .
(c) Let dim C and dim C ⊥ denote the dimensions of the respective codes C and C ⊥ . Verify
that dim C + dim C ⊥ = 7.

(d) Using the parity check matrix H for C, determine the minimum distance of C.
(e) Assign letters/characters to each of the 2^3 = 8 (message) words in Z_2^3 as follows:
space A C E L M R T
000 100 010 001 110 101 011 111
Use the generating matrix G to encode the following message: CALL ME
(f ) The following three words are received: 1111100, 1101001, 0111110. Use the parity
check matrix H from part (b) to determine whether it is likely that errors have
occurred in the transmission of any of these words.

4.6 Solutions to Exercises


1. (a) E can detect any two errors. It can detect some sets of three errors, but there are
sets of three errors which it cannot detect, so E is 2-error-detecting.
(b) The unique codeword that is closest to 001000001 is 001001001 (at distance 1).
(c) The unique codeword that is closest to 101100001 is 101101101.

2. (a) d(x, y) = 2
(b) d(x, y) = 4
(c) d(x, y) = 0
(d) d(x, y) = 3
(e) d(x, y) = 5

3. (a) The code C is not linear because it is not closed under addition; for instance, 01011 ∈
C and 10011 ∈ C, but 01011 + 10011 = 11000 ∉ C.
(b) The minimum distance of C is the smallest weight of any sum of two distinct codewords
in C (see Remark 4.2). Adding together all C(4, 2) = 6 pairs of distinct codewords:

10110 + 01011 = 11101,   10110 + 10011 = 00101,   10110 + 00101 = 10011,

01011 + 10011 = 11000,   01011 + 00101 = 01110,   10011 + 00101 = 10110,

we see that the smallest weight of any sum is 2. Thus δ = 2 is the minimum distance
of the code C.
(c) The set of error patterns that C cannot detect is precisely the set of sums of pairs
of codewords. Thus C cannot detect 00000 and the six sums of pairs of distinct
codewords calculated in part (b). So the set of error patterns that C can detect is
Z_2^5 \ {00000, 11101, 00101, 10011, 11000, 01110, 10110}.
(d) Since δ = 2, C is 1-error-detecting. An error pattern of weight 2 that C cannot detect
is 11000 (= 01011 + 10011), for instance.

(e) Since δ = 2, C is 0-error-correcting. It cannot correct the error pattern e = 01000 (of
weight 1) since the word e + 01011 = 00011 (∉ C) is at distance 1 from two different
codewords (namely 01011 and 10011), i.e., 00011 ∈ B(01011, 1) ∩ B(10011, 1) (and
hence we do not know which codeword was originally sent when such an error pattern
occurs).
(f ) D = ⟨C⟩ = {00000, 10110, 01011, 10011, 00101, 01110, 11101, 11000}
(g) Since D is linear, the minimum distance of D is the smallest weight of any non-zero
codeword (see Theorem 4.8). Hence, we see that D has minimum distance δ = 2.
Moreover, by Theorem 4.9, the dimension of D is k = 3 since |D| = 2^3. This can be
verified by finding a basis for D using the linear codes algorithm.

4. (a) Implementing the linear codes algorithm, we find that

           1 0 1 0 0 1 1
    G  =   0 1 1 0 1 0 1
           0 0 0 1 1 1 1

is the standard generating matrix for C and its rows form a basis for C.
(b) Continuing to apply the linear codes algorithm, we find that

          1 0 1 1
          1 1 0 1
          1 0 0 0
    H  =  0 1 1 1
          0 1 0 0
          0 0 1 0
          0 0 0 1

    is a parity check matrix for C and its columns form a basis for C⊥. A generating
    matrix for C⊥ is

                      1 1 1 0 0 0 0
    GC⊥ := H^T  =     0 1 0 1 1 0 0
                      1 0 0 1 0 1 0
                      1 1 0 1 0 0 1

(c) By parts (a) and (b), we have dim C = 3 and dim C ⊥ = 4, and so dim C + dim C ⊥ = 7.
(d) In the parity check matrix H for C, no single row is a linearly dependent set (since
there is no row consisting entirely of 0’s), no set of two rows is a linearly dependent
set (since no sum of two rows is 0), no set of three rows is a linearly dependent set
(since no sum of three rows is 0), but the set of rows 4, 5, 6, and 7 is a linearly
dependent set (since the sum of these rows is 0). Hence the minimum distance of C
is δ = 4 (by Theorem 4.18).
(e) For each u ∈ Z32 , u is encoded as uG, so we have:

C A L L M E
010 100 110 110 000 101 001
↓ ↓ ↓ ↓ ↓ ↓ ↓
0110101 1010011 1100110 1100110 0000000 1011100 0001111

So the encoded message would be the sequence of bits (one after the other and in
order) that appear in the bottom row of the table above.
(f ) Given a received word w, if wH = 0, then w is a codeword; otherwise, if wH ≠ 0,
then w is not a codeword, in which case an error has occurred in transmission. We
have
(1111100)H = 1101, (1101001)H = 0000, (0111110)H = 0100.

Thus, errors have occurred in the received words 1111100 and 0111110, but the re-
ceived word 1101001 is a codeword so it is likely that no errors occurred in its trans-
mission.
Chapter 5

Mathematical Programming

5.1 Introduction to Management Science (Taha Ch. 1)


Most of the material for this Unit is covered either by these Unit Notes or by the textbook:
Taha, H.A.: Operations Research, An Introduction, 10th ed. (2017). The textbook discusses a
computer package TORA which will be used to solve numerical problems and to illustrate some
of the concepts of this Unit, starting in the next section. Using a package to carry out much of
the numerical work enables us to spend more time on important ideas, formulation of models,
their suitability, conclusions, possible further work and to treat more interesting and realistic
examples. Clearly packages are important in practical applications. Some examples using the
packages TORA and MATLAB are given in the Unit Notes and in the textbook and more will
be given in problems and assignments.
The section headings in Chapter 1 of these Unit Notes correspond to chapter headings in
the textbook, and each section specifies the chapter and sections in the textbook that you
are required to read. The objectives of each section are highlighted, although these are not
necessarily an exhaustive list. The discussion in the Unit Notes is intended to draw your attention
to important ideas or results or explain concepts in a slightly different way. At the end of each
section there is a list of problems from the textbook and sometimes a few additional problems.
You should attempt enough problems from each section to convince yourself that you have
grasped the material. You should then be ready to attempt the assignment problems, some of
which are similar to the practice problems. This applies to later chapters of the Unit Notes as
well.
You should read all of chapter 1 of Taha as an introduction to the basic concepts of Man-
agement Science (MS) or Operations Research (OR). As discussed in sections 1.3 and 1.4, there
are several different types of models in MS, with the most prominent being linear programming
(LP) problems. For LP and other mathematical programming problems, solutions can be found
analytically using appropriate algorithms. Some types of MS problems are too difficult to solve
analytically, but it may be possible to simulate the problem to give useful solution estimates.
This simulation approach will not be covered in MAS225, but it is discussed in the unit MAS354
Modelling and Simulation.

101

Consider the five phases of an OR study in section 1.7: 1. Definition of the problem;
2. Construction of the model; 3. Solution of the model; 4. Validation of the model; 5.
Implementation of the solution. The first and last of these are of course important, even critical,
but they are not really mathematical problems. Although in this unit they will often be dealt
with rather briefly, they should be kept in mind. Formulation (construction) of the mathematical
model from the problem description is a vital part of the study; it is not necessarily easy and
considerable practice is helpful. It involves recognising what kind of model is suitable (if there
exists one at all!) and ‘translating’ a word description of the problem into a mathematical form
of an appropriate model.
We will study various techniques for finding solutions of MS models. The simplex algorithm
will be used for finding optimal solutions of linear programming problems. Graphical methods
are helpful in understanding how and why the algorithm works. The simplex method also
yields important information about the optimal solution, in particular information for sensitivity
analysis. The idea of sensitivity analysis is to investigate the effect on an optimal solution of
small changes in parameter values of the model, which is an important part of the validation
phase. Special attention will be paid to this in linear programming.

5.2 Modelling with Linear Programming (Taha Ch. 2)

Objectives: When you have completed this section you should be able to

• Formulate a mathematical model of a linear program from a description of the problem.

• Understand concepts of decision variables, objective function, constraints and feasibility in
  an LP problem.

• Use graphical methods for finding solutions of a two-variable LP problem.

• Use TORA to solve an LP problem.

You should read all of Taha chapter 2 except for section 2.3 Computer Solution with Solver
and AMPL. This chapter introduces Linear Programming and many of its important ideas and
notation. Graphs help in understanding how problems with two variables can be solved, provide
motivation and feeling for how the simplex algorithm studied in the next section might work
and towards understanding sensitivity analysis in the third section of these notes. TORA should
be used in combination with graphical solutions in getting a feeling for the problems and the
solutions.

The TORA software can be downloaded as a zipped file from the Companion Website for the
textbook by Taha (10th edition) - see the Unit Information for information about this. Follow
the instructions in the “ReadMe” text file to install the software. TORA is totally menu driven
and so it is easy to use (there is no user’s manual). The “MAIN Menu” lists the types of MS
problems that can be solved with TORA and one of these is Linear Programming. After clicking
on “Linear Programming” you will see that you can enter a new problem or select an existing
file (corresponding to an existing problem). To start with it is best to select an existing file,
in particular test.txt(ReddyMikks).txt corresponding to the Reddy Mikks problem in the
textbook. This way you can see what goes into each editor cell and how it naturally relates
to the mathematical representation of the problem. You can then click on “SOLVE menu”
and “Solve Problem” using either “Graphical” or “Algebraic > Final solution”. Don’t worry
about “Iterations” at this stage; this relates to the simplex method which will be discussed in
the next section.
The formulation of linear programming models from word descriptions of problems is an
essential part of management science. But formulating the mathematical models can be quite
difficult, so it is important to spend plenty of time studying section 2.4. In any LP problem there
are three basic elements in its formulation: We need decision variables, an objective function to
optimize and constraints that must be satisfied. These concepts are best illustrated by doing
many examples. See the textbook and the extra problems below.
An LP problem with n variables and m constraints is of the form:

maximize   z = c1 x1 + c2 x2 + · · · + cn xn              (objective function)
subject to
           a11 x1 + a12 x2 + · · · + a1n xn ≤ b1
           a21 x1 + a22 x2 + · · · + a2n xn ≤ b2
           ...
           am1 x1 + am2 x2 + · · · + amn xn ≤ bm
           x1 ≥ 0, x2 ≥ 0, . . . , xn ≥ 0                 (non-negativity constraints)

The numbers cj , bi and aij (j = 1, 2, . . . , n; i = 1, 2, . . . , m) are given numbers and are called
parameters or coefficients of the problem. LP problems may have other forms, such as minimizing
the objective function, = or ≥ in the constraints and other than non-negativity constraints.
However, these can always be converted to the form given above.
The above LP problem can also be written in the matrix form:
maximize z = cT x,
subject to Ax ≤ b and x ≥ 0,
where x, b, c are column vectors and A is an m × n matrix, and cT denotes the transpose of c.
Graphical solutions are very useful for illustrating many important methods and ideas, both
in searching for the optimal solution to an LP problem and in a sensitivity analysis, that is
investigating the effect of (small) changes in parameter values of the problem. It can be seen
that an optimal solution is always found at a corner point of the feasible region. We can then
give a geometric interpretation of the way an optimal solution is reached. This idea extends to

more general LP problems and is basic to the simplex algorithm dealt with in the next section.

Practice Problems.
Taha Problems 2.1, 2.3, 2.5(a)(e), 2.6(a)(d), 2.9, 2.11, 2.22, 2.24, 2.26, 2.27, 2.32, 2.34.
Note that the problems marked with a * have answers at the back of the book. Use TORA to
find the optimal solution where this is required.
Formulation Exercises 1–5 below (exercise 5 is more complex than the others).
Note: Do parts (a)–(c) for each of the problems now; return to complete exercises 1–4 after
reading the next two sections on Linear Programming.

Formulation Exercises

1. A furniture manufacturer produces two types of desks: Standard and Executive. These
desks are sold to an office furniture wholesaler, and for all practical purposes there is an
unlimited market for any mix of these desks, at least within the manufacturer’s production
capacity. Each desk has to go through four basic operations: cutting the lumber, joining
the pieces, prefinishing, and final finishing. Each unit of the Standard desk produced takes
48 minutes of cutting time, 2 hours of joining, 40 minutes of prefinishing, and 5 hours and
20 minutes of final finishing time. Each unit of the Executive desk requires 72 minutes of
cutting, 3 hours of joining, 2 hours of prefinishing, and 4 hours of final finishing time. The
daily capacity for each operation amounts to 16 hours of cutting, 30 hours of joining, 16
hours of prefinishing, and 64 hours of final finishing time. The profit per unit produced is
$40 for the Standard desk and $50 for the Executive desk. What product mix is optimal?

(a) Formulate this problem as a linear program maximizing daily profit, and show each
constraint graphically. Show the objective function for z = $400 and z = $600. Are
there any redundant constraints? Which ones?
(b) Find the optimal solution graphically. What is the amount of slack for each con-
straint?
(c) Use TORA to find an optimal solution.
(d) Find an optimal solution of the LP problem using the simplex algorithm. Compare
your result with the graphical and TORA solution.
(e) Find the shadow price for each constraint graphically and interpret its meaning.
(f ) Determine individually for each objective function coefficient the range of values for
which the present solution remains optimal.
(g) The firm receives a request to produce 1 unit per day of a third type of desk style
called the Economy desk, which requires 30 minutes of cutting, 90 minutes of joining,
30 minutes of prefinishing, and 3 hours of finishing time. The profit would amount to
$20 per unit. Should the firm offer to make one unit of the Economy per day? Why
or why not?

2. A chicken feed manufacturer wants to find the lowest cost mix for a high protein formula
that contains 90 grams of nutrient A, 48 grams of nutrient B, 20 grams of nutrient C,
and 1.5 grams of vitamin X for each kilogram of feed. He can mix the formula from two
ingredients and a filler. Ingredient 1 contains 100 grams of nutrient A, 80 grams of nutrient
B, 40 grams of nutrient C, and 10 grams of vitamin X, and costs 40 cents per kilogram.
Ingredient 2 contains 200 grams of A, 150 grams of B, 20 grams of C, none of vitamin X,
and costs 60 cents per kilogram.

(a) Formulate this problem as a linear program minimizing cost per kilogram of mix.
Note that you do not need a variable for the filler. The amount of filler that has to
be added can be determined once the optimal mix of ingredients 1 and 2 has been
found. Show each constraint graphically, and show the objective function for z = 24
cents and z = 36 cents. Are there any redundant constraints? Which ones?
(b) Find the optimal solution that minimizes the cost of the mix. What is the amount
of slack for each constraint? How much filler has to be added?
(c) Use TORA to find an optimal solution.
(d) Write down the dual LP problem. Find an optimal solution to the dual using the
simplex algorithm. Check this with the graphical and TORA solution.
(e) Find the shadow price for each constraint graphically and interpret its meaning.
(f ) Determine individually for each objective function coefficient the range of values for
which the present solution remains optimal.

3. A firm produces three types of refined chemicals: A, B, and C. At least 4 tons of A, 2


tons of B, and 1 ton of C have to be produced per day. The inputs used are compounds
X and Y. Each ton of X yields 1/4 ton of A, 1/4 ton of B, and 1/12 ton of C. Each ton
of Y yields 1/2 ton of A, 1/10 ton of B, and 1/12 ton of C. Compound X costs $250 per ton,
compound Y $400 per ton. The cost of processing is $250 per ton of X and $200 per
ton of Y. Amounts produced in excess of the daily requirements have no value, as the
products undergo chemical changes if not used immediately. The problem is to find the
mix with minimum cost input.

(a) Formulate this problem as a linear program with the objective of minimizing total
daily costs. Show each constraint graphically and the objective function for z = $6000
and z = $12, 000.
(b) Find the optimal solution graphically. What is the amount of excess produced for
each chemical?
(c) Use TORA to find an optimal solution.
(d) Write down the dual LP problem. Find an optimal solution to the dual using the
simplex algorithm. Check this with the graphical and TORA solution.
(e) The daily requirement for C is increased to 1.25 tons. By how much does the daily
cost increase? The daily requirement for B is increased to 2.25 tons. By how much
does the daily cost increase? The requirement for A is reduced by 1/2 ton. By how
much does the daily cost change?
(f ) Determine for each individual compound the range of prices for which the present
solution remains optimal.
(g) The firm receives an offer for a third compound that yields 1/5 ton of A, 1/2 ton of B,
and 1/10 ton of C at a price of $300 per ton and a processing cost of $300 per ton.
Should the firm accept this offer? Why or why not?

4. The Playsafe Insurance Company of Knockville, ME or PICKME for short, has idle funds
of 20 million dollars available for short term and long term investments. Government
regulations require that no more than 80 percent of all investments be long term, that nor
more that 40 percent be invested at short term, and that the ratio of long-term to short-
term investments not exceed 3 to 1. Long-term investments currently yield 15 percent
annually, while the annual rate for short term investments is 10 percent.

(a) Formulate this problem as a linear program with the objective of maximizing the
weighted return. Formulate it in terms of what fraction of the funds to invest in each
investment, rather than in dollar amounts. Show all constraints graphically, shade
in the feasible region, and show the objective function for three values (one for 9
percent, one for 15 percent, and the optimal one).
(b) Find the optimal solution graphically. Which constraints are slack? What is the
amount of slack? Are any constraints redundant? What is the annual revenue from
the funds available for investment?
(c) Use TORA to find an optimal solution.
(d) Find an optimal solution using the simplex algorithm. Check this with the graphical
and TORA solution.
(e) Find the shadow prices of all constraints and interpret their meaning.
(f ) For each objective function coefficient, find the range for which the current solution
remains optimal.
(g) How does the solution change if no more than 20 percent of total funds can be invested
short term?

5. A firm would like to find the least-cost production schedule for a seasonal product. The
demand is 2000 units in May, 4000 in June, 6000 in July, 6000 in August and 2000 in
September. The product cannot be kept in storage for more than 2 months; e.g. if pro-
duced in May, it has to be sold by the end of July. The work force of seasonal workers must
be hired at the beginning of May and kept until the end of September. Initial training
costs per worker amount to $200. Each worker can produce 400 units a month on regular
time and, if desired, up to an additional 100 units on overtime. Each worker costs $800
per month for regular time and $3 per unit produced in overtime. Units produced are
available for sale that same month. Each unit put into storage incurs a handling cost of

$0.50. The cost of holding one unit in storage amounts to $0.40 per month stored.

(a) Formulate this problem as a linear program minimizing the total cost.
(b) Use TORA to find the optimal solution to the problem. Give an interpretation of the
optimal solution.
(c) It is not possible to hire a fraction of a worker. Modify the input to TORA by adding
an extra constraint fixing the number of workers, using the optimal solution to choose
this number, and find a new optimal solution. Give an interpretation to this solution.
Write a brief conclusion of what the firm should do, and why.

5.3 The Simplex Method and Sensitivity Analysis (Taha Ch. 3)

Objectives: When you have completed this section you should be able to

• Put an LP problem into standard form.

• Understand basic, non-basic, slack and surplus variables, and basic and feasible solutions.

• Construct an initial tableau for the simplex algorithm.

• Determine an optimal solution using the simplex algorithm for two and three variable LP
  problems.

• Understand the Big M Method and the Two-Phase Method for LP problems with
  artificial variables.

• Recognize special problems such as degeneracy, unbounded or infeasible solutions.

• Use TORA for obtaining the optimal solution to LP problems.

• Understand and carry out a sensitivity analysis graphically, algebraically and with TORA.

In this and later sections matrix methods will be used for the solution and analysis of
linear programming problems. Taha has an Appendix D titled Review of Vectors and Matrices
which is available from the Companion Website or from My Unit Readings for MAS225. Read
this Appendix up to the end of section D.2.6 Inverse of a Nonsingular Matrix, but you can
omit section D.2.4 Determinant of a Square Matrix. Also read the part titled Row Operations
(Gauss-Jordan) Method of section D.2.7 Methods of Computing the Inverse of a Matrix. A lot
of this Appendix should be revision of material that you have studied in a previous unit.
Taha chapter 3 describes and discusses the simplex algorithm, how it is used to arrive at
an optimal solution to an LP problem, extensions to problems with so-called artificial variables,
and some other special cases where some difficulties might arise. You should read all of this
chapter, with special attention to understanding the basic simplex algorithm, noting potential
problems with non-standard setups. Use TORA as a help in understanding the method and
interpretation of the optimal solution.
As indicated in the previous chapter the optimal LP solution is always at a corner point
of the feasible region (solution space). The simplex algorithm is a procedure for searching the
corner points of the feasible region in a way guaranteed to find the optimal solution. Firstly
the problem must be set up in an appropriate form so that the algorithm can be applied to
the data. This form is called the equation form in Taha. (Note that in some books it is called
the canonical form.) To get the problem into equation form may require use of slack and/or
surplus variables.

Figure 5.1: Feasible region of Example N1.1

In addition to the tableau type in Taha an extra row c of the cj values has been added at
the top. This is not strictly needed for the algorithm, but might help to make clearer what
is happening. It also provides an alternative way of obtaining the values in the z row. It is
known that, for each tableau of the simplex algorithm, the jth value in the z row is the sum
of the products (inner product) of the elements in the c column (i.e. the objective function
coefficients of the basic variables) and those in the xj column minus cj . This jth value can
be written concisely as cT āj − cj , where c is the vector in the c column and āj is the column
vector in the xj column of the current tableau. For example under the x1 column we have
0 × 2 + 0 × 3 + 0 × 0 + 0 × (−3) − 3 = −3, etc. Because the elements in the z row can be computed
this way, in some books the z row is labelled as “zj − cj ”, where zj = cT āj .
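As a quick numerical illustration (a sketch of our own, not part of Taha or TORA), the z-row
values of the initial tableau can be checked in a few lines of Python; the data are the Example
N1.1 coefficients shown in the TORA input grid below.

    import numpy as np

    # Example N1.1: objective coefficients and constraint columns for x1, x2.
    c = np.array([3.0, 2.0])
    A = np.array([[ 2.0, -1.0],
                  [ 3.0,  1.0],
                  [ 0.0,  1.0],
                  [-3.0,  4.0]])
    cB = np.zeros(4)                  # the four slacks are basic, so c_B = 0

    for j in range(2):
        # z_j - c_j = c_B^T abar_j - c_j; abar_j is just column j initially.
        print(f"x{j+1}: z_j - c_j = {cB @ A[:, j] - c[j]}")   # -3.0 and -2.0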
The basic feasible solution chosen to start with has the 4 slack variables as basic and x1 , x2
as non-basic. The z row value in the x1 column is the most negative, so x1 is chosen as the
variable to enter the solution and the x1 column is the pivot column. Then the values in the
b/a column are calculated using the ratios of the corresponding values in the b column and the
pivot column a (ignoring any ratios where the a value is 0 or negative), giving 12/2 = 6 and
28/3. Clearly the former is the smallest ratio, so s1 becomes the leaving variable and the s1 row
is the pivot row. Taking the pivot element 2 and using elementary row operations to create a
standard unit vector in the x1 column yields the next tableau as:
Section 5.3. The Simplex Method and Sensitivity Analysis (Taha Ch. 3) 113

the point (6, 0). Reduced costs will be discussed further in section 1.4.
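The entering-variable choice, ratio test and pivot just described can be sketched in code. The
following is a minimal illustration of one pivot step (our own layout, not TORA's internals),
assuming the z row is stored first and the RHS in the last column; repeated calls on the
Example N1.1 tableau should reproduce the iterations shown below.

    import numpy as np

    def pivot_once(T):
        """One simplex pivot on tableau T (z row first, RHS in the last column)."""
        j = int(np.argmin(T[0, :-1]))        # entering column: most negative z entry
        if T[0, j] >= 0:
            return False                     # optimal: no negative z-row entries
        col, rhs = T[1:, j], T[1:, -1]
        ratios = np.where(col > 1e-12, rhs / np.where(col > 1e-12, col, 1.0), np.inf)
        if not np.isfinite(ratios).any():
            raise ValueError("problem is unbounded in this direction")
        i = 1 + int(np.argmin(ratios))       # leaving row: smallest ratio b/a with a > 0
        T[i] /= T[i, j]                      # scale the pivot row
        for r in range(T.shape[0]):
            if r != i:
                T[r] -= T[r, j] * T[i]       # make a unit vector in the pivot column
        return True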
We will now look at the output from TORA for this problem. It is seen that it takes the
same steps as we have done by hand above (though it does it a lot faster!). But knowing how
the procedure works makes it easier to interpret the output. Furthermore, if you meet some LP
problem later in your career, it is important to have some idea of what is going on so that you
have a better chance of solving your problem.
The input for Example N1.1 to TORA is initiated by selecting “Linear Programming”,
“Enter New Problem”, with title Example N1.1 having 2 variables and 4 constraints, and then
entering the given parameter values in the input grid as:

                    x1         x2        RHS
Var. Name
Maximize             3          2
Constraint 1         2         -1       <= 12
Constraint 2         3          1       <= 28
Constraint 3         0          1       <= 7
Constraint 4        -3          4       <= 16
Lower Bound          0          0
Upper Bound       infinity   infinity
Unrestr’d (y/n)?     n          n
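For readers without TORA, the same problem can be cross-checked with scipy (a sketch, not
part of the unit software); scipy's linprog minimizes, so the objective is negated.

    from scipy.optimize import linprog

    # Example N1.1, data as in the input grid above; maximize 3x1 + 2x2.
    res = linprog(c=[-3, -2],
                  A_ub=[[2, -1], [3, 1], [0, 1], [-3, 4]],
                  b_ub=[12, 28, 7, 16],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)            # approximately [7. 7.] and 35.0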

To see the iterations of the simplex algorithm in TORA select “Solve Problem” → “Algebraic”
→ “Iterations” → “All-slack starting solution”. The initial tableau shown in TORA is:
Title: Example N1.1 (Maximize)
Iteration 1

Basic x1 x2 sx3 sx4 sx5 sx6 Solution


z (max) -3.00 -2.00 0.00 0.00 0.00 0.00 0.00
sx3        2.00   -1.00    1.00   0.00   0.00   0.00   12.00
sx4        3.00    1.00    0.00   1.00   0.00   0.00   28.00
sx5        0.00    1.00    0.00   0.00   1.00   0.00    7.00
sx6       -3.00    4.00    0.00   0.00   0.00   1.00   16.00
Lower Bound     0.00       0.00
Upper Bound     infinity   infinity
Unrestr’d (y/n)?   n       n

By clicking “Next Iteration” (for one at a time) or “All Iterations” (for all at once) TORA
successively displays the following tableaux.

Iteration 2

Basic x1 x2 sx3 sx4 sx5 sx6 Solution


z (max)    0.00   -3.50    1.50   0.00   0.00   0.00   18.00
x1         1.00   -0.50    0.50   0.00   0.00   0.00    6.00
sx4        0.00    2.50   -1.50   1.00   0.00   0.00   10.00
sx5 0.00 1.00 0.00 0.00 1.00 0.00 7.00
sx6 0.00 2.50 1.50 0.00 0.00 1.00 34.00
Lower Bound 0 0
Upper Bound infinity infinity
Unrestr’d (y/n)? n n

Iteration 3

Basic x1 x2 sx3 sx4 sx5 sx6 Solution


z (max)    0.00    0.00   -0.60   1.40   0.00   0.00   32.00
x1 1.00 0.00 0.20 0.20 0.00 0.00 8.00
x2 0.00 1.00 -0.60 0.40 0.00 0.00 4.00
sx5 0.00 0.00 0.60 -0.40 1.00 0.00 3.00
sx6 0.00 0.00 3.00 -1.00 0.00 1.00 24.00
Lower Bound 0 0
Upper Bound infinity infinity
Unrestr’d (y/n)? n n

Iteration 4

Basic x1 x2 sx3 sx4 sx5 sx6 Solution


z (max) 0.00 0.00 0.00 1.00 1.00 0.00 35.00
x1 1.00 0.00 0.00 0.33 -0.33 0.00 7.00
x2 0.00 1.00 0.00 0.00 1.00 0.00 7.00
sx3        0.00    0.00    1.00  -0.67   1.67   0.00    5.00
sx6 0.00 0.00 0.00 1.00 -5.00 1.00 9.00
Lower Bound 0 0
Upper Bound infinity infinity
Unrestr’d (y/n)? n n

The successive TORA tableaux shown correspond to the tableaux determined manually
above. It can be seen that Iteration 4 yields an optimal solution, as the z row at the top has
only non-negative elements. The optimal solution is x1 = 7, x2 = 7 and z = 35. This solution
can also be obtained directly in TORA by clicking on “Final solution” (instead of “Iterations”).
This gives:
LINEAR PROGRAMMING OUTPUT SUMMARY
Title: Example N1.1
Final Iteration No: 4
Objective Value (Max) = 35.00

Variable Value Obj Coeff Obj Val Contrib


x1: 7.00 3.00 21.00
x2: 7.00 2.00 14.00
Constraint RHS Slack-/Surplus+
1(<) 12.00 5.00-
2(<) 28.00 0.00
3(<) 7.00 0.00
4(<) 16.00 9.00-

***Sensitivity Analysis***

Variable Current Obj Coeff Min Obj Coeff Max Obj Coeff Reduced Cost
x1: 3.00 0.00 6.00 0.00
x2: 2.00 1.00 infinity 0.00
Constraint Current RHS Min RHS Max RHS Dual Price
1(<) 12.00 7.00 infinity 0.00
2(<) 28.00 19.00 35.50 1.00
3(<) 7.00 4.00 8.80 1.00
4(<) 16.00 7.00 infinity 0.00

The sensitivity analysis output here is discussed in section 5.4.



When the formulation of an LP problem with maximization of the objective function has
constraints with ‘=’ or ‘≥’, it is not usually easy to recognize a simple starting basic feasible so-
lution (like (0, 0, . . . , 0) before). In such cases artificial variables are included in the formulation,
and either the Big M Method or the Two Phase Method discussed in section 3.4 can be used to
drive the artificial variables out of the basis. Section 3.5 discusses some important special cases
for LP problems (degeneracy, alternative optima, unbounded solutions and infeasible solutions)
and how they relate to the simplex method.
Besides finding the optimal solution of an LP problem, it is often important to know how
the optimal solution will change (if at all) when changes are made to the parameters of the
problem. This is the subject of sensitivity analysis. Some aspects of sensitivity analysis are
discussed in Taha section 3.6 in both a graphical and an algebraic way. This section also shows
how to interpret the sensitivity analysis information in the TORA output. (You do not need to
read about Excel Solver or AMPL in section 3.6.)
Section 3.7 of Taha briefly discusses several computational issues in linear programming
including other algorithms (apart from the classical simplex algorithm) that are used for large
scale problems. This is not examinable material but it is useful to read this section to get an
idea about the state-of-the-art methods in this area.

Practice Problems.
Use the simplex algorithm to solve the Formulation Exercises 1 and 4 (i.e. do parts 1(d) and
4(d)) of section 5.2 of these Notes.
Taha Problems 3.1, 3.10, 3.11, 3.13, 3.14, 3.18, 3.23, 3.25, 3.26, 3.32, 3.39, 3.62, 3.64, 3.67

5.4 Duality and Post-Optimal Analysis (Taha Ch. 4)

Objectives: When you have completed this section you should be able to

 Understand the concept of duality and its relation to the primal LP problem.

 Use the dual problem in sensitivity analysis.

 Provide an economic interpretation of duality.

 Carry out further sensitivity analysis (post-optimal analysis) by hand and using TORA.

Taha chapter 4 describes and discusses the dual LP problem to a primal LP problem and
its use in sensitivity analysis (or post-optimal analysis). With this knowledge, changes to the
problem can be analysed without having to use the simplex algorithm from the start again.
These changes include changes to an existing parameter in the objective function or to the RHS
of a constraint, or the addition of a new activity or a new constraint. TORA output provides
useful information for sensitivity analysis.
You should read all of chapter 4 of Taha except section 4.4 Additional Simplex Algorithms.
The economic interpretation of duality in Taha section 4.3 is useful in drawing conclusions and
in understanding the consequences of management decisions.

Example N1.1 continued


As in section 5.3, the output from TORA for Example N1.1 is:
Title: Example N1.1
Final Iteration No: 4
Objective Value (Max) = 35.00

Variable Value Obj Coeff Obj Val Contrib


x1: 7.00 3.00 21.00
x2: 7.00 2.00 14.00
Constraint RHS Slack-/Surplus+
1(<) 12.00 5.00-
2(<) 28.00 0.00
3(<) 7.00 0.00
4(<) 16.00 9.00-

***Sensitivity Analysis***

Variable Current Obj Coeff Min Obj Coeff Max Obj Coeff Reduced Cost
x1: 3.00 0.00 6.00 0.00
x2: 2.00 1.00 infinity 0.00
Constraint Current RHS Min RHS Max RHS Dual Price
1(<) 12.00 7.00 infinity 0.00
2(<) 28.00 19.00 35.50 1.00
3(<) 7.00 4.00 8.80 1.00
4(<) 16.00 7.00 infinity 0.00

From the x1 row of the sensitivity analysis output the coefficient c1 of x1 could move to
anywhere in the range 0 to 6 without changing the corner of the feasibility region where the
optimum is, although the value of the objective function may alter from its original value of 35.
Look at the graphical solution to see that this must be true. Also from the optimal simplex
tableau found in section 5.3, we see that changing c1 in this range will leave all values in the
top z row non-negative, but if c1 > 6, then the value under s3 would become negative and the
optimal point would change. Similarly, the coefficient c2 could change from 2 to anywhere in
the range (1, ∞) without altering the optimal solution corner.
A change in the RHS coefficient bj is likely to change the feasible region and therefore the
position of some corners. It is possible, however, that the position of the optimal corner will
stay the same or at least its defining edges (equivalently the set of basic variables) will stay the
same. In the example, the RHS coefficients bj may move in the ranges given without altering
the defining edges of the optimal solution corner. For example, the current value of b1 = 12
could move anywhere in the range [7, ∞).
This fact can be derived algebraically from the optimal simplex tableau as follows. The
original matrix [P|I|b] in the initial simplex tableau is changed in the optimal tableau to
[B−1 P|B−1 |B−1 b], as described in Taha section 4.2.2. In this case we have:
 
B−1 =
[ 0    1/3   −1/3    0 ]
[ 0     0      1     0 ]
[ 1   −2/3    5/3    0 ]
[ 0     1     −5     1 ]

A change δb1 in b1 with no change in b2 , b3 , b4 corresponds to a column vector ∆b =
(δb1 , 0, 0, 0)T representing the change in the RHS vector. The new RHS vector in the origi-
nal matrix is b + ∆b and so, with the same steps of the simplex algorithm as before, it would
become
B−1 (b + ∆b) = B−1 b + B−1 ∆b

in the final tableau. Note that there is no change in the z row of the final tableau, so the new
RHS vector is optimal so long as it is feasible. Since B−1 ∆b = (0, 0, δb1 , 0)T , the change only
affects the third component (corresponding to the slack variable s1 ) which becomes 5+δb1 in the
final tableau. This component remains greater than or equal to 0 (as is required for feasibility)
if and only if 5 + δb1 ≥ 0, i.e. δb1 ≥ −5. As originally b1 = 12, this condition corresponds to
b1 = 12 + δb1 belonging to [7, ∞), which is the same as shown in the TORA sensitivity analysis
output. As the change only affected the value of s1 , the optimal values of x1 = 7 and x2 = 7
are unchanged.
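This ranging argument is easy to verify numerically (a sketch of our own, using the B−1 shown
above): the basic values B−1 (b + ∆b) stay non-negative exactly while δb1 ≥ −5.

    import numpy as np

    Binv = np.array([[0,  1/3, -1/3, 0],
                     [0,  0,    1,   0],
                     [1, -2/3,  5/3, 0],
                     [0,  1,   -5,   1]])
    b = np.array([12, 28, 7, 16])

    for delta in (0, -5, -6):                    # b1 = 12, 7, 6
        basics = Binv @ (b + np.array([delta, 0, 0, 0]))
        print(delta, basics, bool((basics >= 0).all()))
    # delta = -5 (b1 = 7) is still feasible; delta = -6 (b1 = 6) makes s1 negative.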
Note that this corresponds geometrically to the fact that (7, 7) remains the optimal corner
point in Figure 5.1 so long as the edge E3 (which is defined by the first constraint) is to the right
of (or on) the line 2x1 − x2 = 7 (i.e. using the lower endpoint of [7, ∞) as the RHS parameter).
Recall that in general the dual price (or shadow price) of a resource represented by a ≤
constraint with slack variable sk appears in the optimal simplex tableau in the z row under the
sk variable. Looking at the optimal simplex tableau for Example N1.1 in section 5.3, we see that
the dual price for the first constraint is 0, which is the same as shown in the TORA sensitivity
analysis output. This corresponds to the fact that the optimal value of the objective function
does not change as b1 is changed, which is clear geometrically because the optimal corner point
in Figure 5.1 does not change as the edge E3 is translated locally from its original position (so
long as it remains to the right of the line 2x1 − x2 = 7).
The argument for a change to each of the other RHS parameters can be made in the same
way. As an exercise, use the matrix B−1 to find the values for constraint 2 in the “min RHS”
and “max RHS” columns of the TORA sensitivity analysis output, and interpret the results
geometrically with Figure 5.1. Also identify the dual price for constraint 2 from the optimal
simplex tableau. This dual price is equal to 1 as shown in the TORA output above. This means
that the objective function value increases at the rate 1 as the value b2 is increased (locally)
from its current value of 28. The fact that it increases is clear geometrically from Figure 5.1,
because the optimal corner point will shift to the right as the edge E4 moves to the right. By
considering the change δb2 = 1 and the corresponding edge 3x1 + x2 = 29, convince yourself
that the rate of increase of the objective function is equal to 1.
Example 4.5-4 in Taha section 4.5.2 illustrates the usefulness of the optimal dual values to
investigate the addition of a new activity. In general, let xN be the new activity variable and
cN be its objective function coefficient. Suppose that one unit of the new activity requires aiN
units of resource i for i = 1, . . . , m. Let yi be the current optimal dual price corresponding to
resource i, which can be read off the optimal simplex tableau. Then the reduced cost of the new
activity can be found as

zN − cN = a1N y1 + a2N y2 + · · · + amN ym − cN .

This approach is sometimes called pricing out the new variable because the yi represent the
current optimal values of the resources per unit.
If the value of zN − cN is positive, then it is not profitable to bring the new activity in at a
positive level. If the value of zN − cN is negative, then it is profitable to bring the new activity
in at a positive level, i.e. to include xN in the optimal basic solution. In the latter case, we find
the new optimal solution by first finding the column corresponding to xN in the original optimal
tableau using the product B−1 (a1N , . . . , amN )T (which is identical to what would have resulted
if all the previous row operations had been applied to the column vector (a1N , . . . , amN )T ).
Then we simply apply one iteration of the simplex algorithm to find the required new optimal
solution. See Example 4.5-4 in Taha to see how this works.
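Pricing out is simple enough to script. Below is a sketch with made-up data: a hypothetical
new activity for Example N1.1 that uses (1, 2, 1, 0) units of the four resources and has
objective coefficient cN = 2.5; the dual prices are those in the TORA output above.

    def reduced_cost(a_N, y, c_N):
        """z_N - c_N = sum over i of a_iN * y_i, minus c_N."""
        return sum(a * price for a, price in zip(a_N, y)) - c_N

    y = [0.0, 1.0, 1.0, 0.0]                   # optimal dual prices of Example N1.1
    print(reduced_cost([1, 2, 1, 0], y, 2.5))  # 0.5 > 0: not profitable to introduce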

Practice Problems.
Taha Problems 4.2, 4.4, 4.8, 4.20, 4.25
Complete the remaining parts of the Formulation Exercises 1–4 at the end of section 5.2 of these
Notes. Also write down the dual LP formulation for these problems.
Exercises.
6. Consider the LP problem:
maximize z = 4x1 + 5x2
subject to 2x1 + 3x2 ≤ 6
2x1 + 2x2 ≤ 5
x1 , x2 ≥ 0
(a) Construct the dual problem.
(b) Solve both the primal and dual problems graphically.
(c) Find all the basic solutions of the primal and the complementary basic solutions of the
dual. Write down which of these are feasible.
(d) Solve the primal by the simplex algorithm. For each tableau, identify the current basic
feasible solution for the primal and the complementary basic solution for the dual.
7. A firm can produce four products in its factory. It takes only one day to produce a unit
of each product, but production is limited by floor space in the factory and the amount of labor
available. The relevant data are given in the table.

Product 1 2 3 4 Availability
Floor area, m2 /unit 10 30 80 40 900 m2
Labor/unit 2 1 1 3 80 workers
Variable cost/unit 20 30 45 58
Sales revenue/unit 30 50 85 90
The following linear program is formulated, where xj is the daily production of product j:
maximize z = x1 + 2x2 + 4x3 + 3.2x4 (profit in units of $10)
subject to x1 + 3x2 + 8x3 + 4x4 ≤ 90 (factory space in units of 10m2 )
2x1 + x2 + x3 + 3x4 ≤ 80 (labour)
xj ≥ 0 for all j
The optimal tableau is as shown.

cj                         1     2     4    3.2     0       0
cj   Basis   Solution     x1    x2    x3    x4     x5      x6
1     x1        10         1    -1    -4     0    -3/5     4/5
3.2   x4        20         0     1     3     1     2/5    -1/5
    zj - cj     74         0    1/5   8/5    0    17/25   4/25

(a) A raw material used in products 1 and 3 is very unstable in price. At the moment, it
costs $100 a ton. Product 1 uses 1/10 of a ton, and product 3 uses 1/5 of a ton. The cost of the
raw material is included in the variable costs shown above. What is the price range of this raw
material for which the present solution is still optimal?
(b) What are the optimal values of the dual variables of this problem? Interpret these
variables. What is the range of RHS values in which these variables’ values hold?
(c) The firm can increase its effective floor space to 1000m2 by renting a new conveyor and
stacking system. The machine costs $50 a day to rent and operate. Should it be rented? If so,
what is the new production schedule?
(d) The firm is considering a fifth product that requires a floor area of 30m2 /unit and 2
workers/unit. It costs $35/unit to produce and generates $65/unit revenue. Should this product
be produced and, if so, what is the new optimal production schedule?
(e) Five workers become sick and cannot work. How should the firm adjust its production
schedule and how much profit will be lost?
Note: Consider parts (c), (d), (e) independently.
Optional: Check your results using TORA. You may use TORA to directly solve some of the
above problems.

5.5 Transportation Model (Taha Ch. 5)

Objectives: When you have completed this section you should be able to

 Understand the transportation model and its relation to the LP problem.

 Use the northwest corner rule or the least-cost method for finding an improved starting
solution.

 Use the method of multipliers to determine an optimal solution.

 Use TORA for finding and interpreting the optimal solution.

Read Taha chapter 5, sections 5.1–5.3. In section 5.3 you can omit the discussion of the
transshipment model, the Solver Moment and the AMPL Moment. The remainder of the chapter
may be omitted.
The transportation model is a special type of linear program that arises in planning the
transportation of a commodity from a number of sources to a number of destinations. Each
source provides a certain supply of the commodity and each destination has a certain demand
for the commodity, and it is desired to minimize the total transportation cost. Taha section 5.2
shows that the transportation model can also be applied in areas other than transporting goods.
The transportation problem can be set up as an LP problem, as in Taha Example 5.1-1,
so the simplex method could be used directly to solve the problem. However, because of the
special structure of the transportation model, a better starting solution (as compared to the
standard simplex method) can be obtained and a simplified algorithm using a transportation
tableau can be used to find the optimal solution. Taha section 5.3.1 covers three methods for
finding an improved starting solution, but it is enough to consider the northwest corner rule and
the least-cost method (omitting the Vogel approximation method). Section 5.3.2 discusses the
transportation algorithm, including the method of multipliers (sometimes called the u-v method),
to find an optimal solution. TORA can be used to see how the algorithm works and to interpret
the optimal solution.
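As an illustration of the northwest corner rule, here is a short sketch (our own code, not
Taha's). It assumes a balanced problem; Exercise 8 below has total warehouse demand 3000
against total factory capacity 2800, so a dummy factory with capacity 200 and zero freight
costs is added to balance it.

    def northwest_corner(supply, demand):
        """Starting solution: fill cells from the top-left, exhausting rows/columns."""
        supply, demand = supply[:], demand[:]
        alloc = [[0] * len(demand) for _ in supply]
        i = j = 0
        while i < len(supply) and j < len(demand):
            q = min(supply[i], demand[j])    # ship as much as possible through (i, j)
            alloc[i][j] = q
            supply[i] -= q
            demand[j] -= q
            if supply[i] == 0:
                i += 1                       # row exhausted: move down
            else:
                j += 1                       # column exhausted: move right
        return alloc

    # Factories 1-3 plus a dummy; warehouses Sydney, Melbourne, Brisbane.
    print(northwest_corner([600, 800, 1400, 200], [1200, 800, 1000]))
    # [[600, 0, 0], [600, 200, 0], [0, 600, 800], [0, 0, 200]]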

Practice Problems.
Taha Problems 5.3, 5.4, 5.6, 5.9, 5.15, 5.23(c)
Exercise 8. The Link Company operates three factories. Their products are shipped to three
warehouses which have locations and capacities as follows:
Sydney 1200 units, Melbourne 800 units, Brisbane 1000 units.
The capacity of each factory and the freight cost per unit from each factory to each warehouse
are as follows:
Factory   Capacity      Freight to    Cost per Unit ($)


1 600 units Sydney 9
Melbourne 6
Brisbane 7
2 800 units Sydney 4
Melbourne 7
Brisbane 7
3 1400 units Sydney 8
Melbourne 6
Brisbane 9

The company wants to ship its products to the warehouses at minimum cost.
(a) Formulate this as a transportation problem.
(b) Use the northwest corner rule and the u-v method to solve the problem.
(c) Using part (b) (not the simplex algorithm), write down the optimal simplex tableau
corresponding to your solution.
(d) Find all optimal shipping schedules if the unit cost of freight from Factory 2 to Sydney
rises to $6.

5.6 Solutions to Selected Exercises

Objectives: It is intended that you should not look at these solutions to the exercises at the
ends of the sections of these Notes until you have made a real attempt to solve the problems
yourself.

Formulation Exercises.
1. (a) Let x1 (x2 ) be the number of standard (executive) desks produced per day. Consider
time in hours. Then the problem is to:
Maximize z = 40x1 + 50x2
subject to

0.8x1 + 1.2x2 ≤ 16 (cutting)
2x1 + 3x2 ≤ 30 (joining)
(2/3)x1 + 2x2 ≤ 16 (prefinishing)
(16/3)x1 + 4x2 ≤ 64 (finishing)
x1 ≥ 0, x2 ≥ 0 (non-negativity)

(b) [graph omitted]

(c) [graph omitted]

3. (a) Let x1 (x2 ) be the number of tons of compound X (Y) used. Then the problem is to:
Minimize z = 500x1 + 600x2
subject to

0.25x1 + 0.5x2 ≥ 4 (chemical A)
0.25x1 + 0.1x2 ≥ 2 (chemical B)
(1/12)x1 + (1/12)x2 ≥ 1 (chemical C)
x1 ≥ 0, x2 ≥ 0 (non-negativity)

(b) [graph omitted]

4. (a) Let x1 (x2 ) be the fraction of short (long) term investments. Then the problem is to:
Maximize z = 0.1x1 + 0.15x2
subject to
x2 /(x1 + x2 ) ≤ 0.8, i.e. −0.8x1 + 0.2x2 ≤ 0
x1 /(x1 + x2 ) ≤ 0.4, i.e. 0.6x1 − 0.4x2 ≤ 0
3x1 − x2 ≥ 0
x1 + x2 ≤ 1
x1 , x2 ≥ 0

5. (a) We need rather a lot of variables here; the problem is clearly more complex than the
previous four exercises. We have five months i = 1, 2, 3, 4, 5 for May to September. Let di be
the demand in month i, namely 2000, 4000, 6000, 6000 and 2000. Let Ri and Oi be the number
produced in month i in regular and in overtime respectively. Let Sij be the amount stored in
month i for j months. Let w be the number of workers hired.
Each worker costs $4,200 = 5 × 800 + 200, overtime costs are 3(O1 + O2 + O3 + O4 + O5 )
and storage costs are 0.5(sum of all Sij ) + 0.4(S11 + S21 + S31 + S41 ) + 0.8(S12 + S22 + S32 ).
A formulation of the problem is then:

Minimize z = 4200w + 3(O1 + · · · + O5 ) + 0.9(S11 + S21 + S31 + S41 ) + 1.3(S12 + S22 + S32 )

subject to
R1 + O1 − (S11 + S12 ) ≥ 2000
R2 + O2 + S11 − (S21 + S22 ) ≥ 4000
R3 + O3 + S12 + S21 − (S31 + S32 ) ≥ 6000
R4 + O4 + S22 + S31 − S41 ≥ 6000
R5 + O5 + S32 + S41 ≥ 2000
(These are the five demand constraints. Adding them shows that the total production
R1 + · · · + R5 + O1 + · · · + O5 must be at least the total demand of 20000.)

Ri ≤ 400w, i = 1, 2, 3, 4, 5 (regular time production, 5 of these)
Oi ≤ 100w, i = 1, 2, 3, 4, 5 (overtime production, 5 of these)
Ri ≥ 0, Oi ≥ 0, Sij ≥ 0, w ≥ 0 (18 of these).
There are 18 variables Ri , Oi , Sij , w, with respectively 5, 5, 7 and 1 of each. There are
15 constraints and 18 non-negativity constraints. This is not a problem we would want to
do manually by simplex. We let TORA do it for us. This gives an optimal solution with
z = $51,400, w = 10.5882, Ri = 4235.2939, i = 1, 2, 3, 4, R5 = 2000, O4 = 1058.8235, S21 =
235.2944, S31 = 705.8826, S12 = 2235.2939, with the other variables zero. But it
is not possible to hire a fraction of a worker. We can try with w = 10 or w = 11 to see which
is best. For w = 11 (an extra constraint) we get an optimal solution with z = $51,960, w =
11, Ri = 4400, i = 1, 2, 3, 4, R5 = 2000, O4 = 400, S21 = 400, S31 = 1200, S12 = 2400, with the
other variables zero. We see that the optimal cost is slightly larger when we insist on
an integer number of workers, as we should expect.
Chapter 6

Networks and Project Scheduling (Taha Ch. 6)

Objectives: When you have completed this chapter you should be able to

 Formulate and understand models for shortest route and maximum flow problems through
a network.

 Apply a minimum spanning tree, a shortest route (Dijkstra) and a maximal flow algorithm.

 Construct a critical path in a network for planning, scheduling and controlling a project.

 Use the Critical Path Method (CPM) and the Program Evaluation and Review Technique
(PERT).

 Use a package (TORA) for solving and interpreting solutions of network problems.

 Appreciate some of the basic ideas and questions about complex networks.

Read Taha chapter 6 (omitting the discussion of Solver and AMPL) and also the Notes below
on PERT and the CPM cost model.
Networks arise in many areas, such as business, communication systems, project scheduling,
electrical engineering and control systems. This chapter looks at problems which can conve-
niently be formulated by a graph or a network. This often makes the problems much clearer
and consequently easier to solve. A variety of algorithms are studied for these problems. TORA
is used for solving problems and for seeing how to interpret the results from the algorithms.
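As a taste of the algorithms in this chapter, here is a compact sketch of Dijkstra's shortest
route algorithm (our own illustration and data, not one of Taha's example networks).

    import heapq

    def dijkstra(adj, source):
        """adj: node -> list of (neighbour, weight); weights must be non-negative."""
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry; skip it
            for v, w in adj.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w           # found a shorter route to v
                    heapq.heappush(heap, (d + w, v))
        return dist

    adj = {1: [(2, 100), (3, 30)], 3: [(4, 10), (5, 60)],
           4: [(2, 15), (5, 50)], 2: [(5, 20)]}
    print(dijkstra(adj, 1))                   # {1: 0, 2: 55, 3: 30, 4: 40, 5: 75}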
Many firms use network models and methods in their management and production activities,
in particular the CPM and PERT methods. PERT involves some knowledge of probability and
distributions, which is covered in Taha section 14.4.4. Below are additional notes on PERT and
the CPM cost model.


6.1 Programme Evaluation and Review Technique – PERT

In actual projects there may be considerable uncertainty in the task durations. PERT considers
these durations to be independent random variables and attempts to measure the variability of
the project completion time.
Suppose that each activity (i, j) has a minimum (optimistic), a maximum (pessimistic) and
a most likely duration time. Denote these by a, b and m respectively. If the duration tij is
modelled to have a beta probability distribution (see graph below) then it can be shown that
the expected value Etij and the variance σij2 of t are:
ij

Etij = 31 [2m + 21 (a + b)] 2 = [ 1 (b


σij 6 a)]2 .

Define a critical path here to be a path with greatest expected length. Since the individual
task durations are independent random variables, the length of the critical path is also a random
variable, with expected value µ and variance σ 2 given by:
µ = Σ(i,j) Etij and σ² = Σ(i,j) σ²ij ,

where the sum is over each (i, j) on the critical path. The latter equation follows because the
variance of a sum of independent random variables equals the sum of the variances.
Assume that the length of the critical path has a normal distribution. This is sometimes
a reasonable assumption because the central limit theorem of probability theory implies that
the probability distribution of the sum of sufficiently many independent random variables is
approximately normal.
At first sight, it may appear that we now know the probability distribution of the earliest
project completion time. However this is not the case. Because, of all paths, the critical path
has the greatest expected length, it is true that the earliest expected project completion time
is the expected length of the critical path, i.e. µ above. But this is not necessarily the same as
the expected value of the earliest project completion time. That is because a non critical path
could, due to the variability, turn out to have a greater length than the critical path. This is
especially likely if the non-critical tasks have small slack but large variances. This also means
that the variance of the earliest project completion time could be significantly larger than σ 2
above.
Despite these limitations, if the variances of the task durations are relatively small, then the
length of the critical path is a useful guide to the variability of the earliest project completion
time.

For example, consider the following house building project, with activities, precedences and
durations as follows:

Activity Precedence Duration (days)


A excavation - 2
B foundation A 4
C frame B 10
D exterior plumbing C 4
E roof C 6
F wiring C 7
G interior plumbing D 5
H exterior siding E 7
I wall board F,G 8
J exterior painting D,H 9
K floor I 4
L interior painting I 5
M interior fixtures K,L 6
N exterior fixtures J 2

As an exercise show that the appropriate network and critical path are as shown on the
following graph:

The critical path has activities A B C D G I L M, taking 44 days. Now suppose that
for each task in the house building project the minimum, maximum and most likely durations
are 50% below, 50% above and equal to (respectively) the durations given above. What is the
probability that the house is completed in 50 days?
Since the expected values of the durations are simply those given above, the expected length
of the critical path is µ=44. The variance equals
σ² = (2/6)² + (4/6)² + (10/6)² + (4/6)² + (5/6)² + (8/6)² + (5/6)² + (6/6)² = 143/18,
so the standard deviation is σ = √(143/18) ≈ 2.82.
Since the length, ℓ, of the critical path is assumed to have a normal distribution, an estimate
of the required probability is

P (ℓ ≤ 50) = P (z ≤ (50 − 44)/σ) ≈ 0.983.
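The PERT calculation above is easy to reproduce in code (a sketch of our own; here b − a = m
for every task, so each variance is (m/6)²).

    from math import sqrt, erf

    durations = [2, 4, 10, 4, 5, 8, 5, 6]    # critical tasks A, B, C, D, G, I, L, M
    mu = sum(durations)                      # expected critical path length: 44 days
    sigma = sqrt(sum((m / 6) ** 2 for m in durations))   # sqrt(143/18), about 2.82

    def normal_cdf(x):
        """Standard normal CDF, via the error function."""
        return 0.5 * (1 + erf(x / sqrt(2)))

    print(mu, round(sigma, 2), round(normal_cdf((50 - mu) / sigma), 3))
    # 44 2.82 0.983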

6.2 Critical Path Method – CPM cost model

The CPM cost model considers task durations to be deterministic but incorporates a cost-time
tradeoff for each activity, and hence for the whole project.
Suppose that each task (i, j) has a normal time (duration) bij and a crash time aij , which
is the shortest time that this task can be completed by expending maximum effort (using e.g.
overtime, extra equipment etc.). Define the normal cost Cbij to be the direct cost of performing
task (i, j) in normal time and suppose that there is no cost saving in performing the task any
slower. Define the crash cost Caij to be the direct cost of performing task (i, j) in crash time.
Assuming a linear relationship between the direct cost of task (i, j) and its duration tij , then

direct cost of task (i, j) = cij − vij tij ,

where cij is the intercept for tij = 0, and

vij = (Caij − Cbij )/(bij − aij )

is the rate of increase of direct cost with a reduction in task duration.
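For instance, activity C in the table below has normal point (10 days, $6500) and crash point
(7 days, $7400), so vij = 900/3 = $300 per day. A small sketch (our own) evaluates the cost
line at several durations.

    b_C, a_C = 10, 7                    # normal and crash times for activity C (days)
    Cb_C, Ca_C = 6500, 7400             # normal and crash direct costs ($)
    v_C = (Ca_C - Cb_C) / (b_C - a_C)   # $300 per day of reduction
    c_C = Cb_C + v_C * b_C              # intercept at t = 0: $9500
    for t in (10, 9, 8, 7):
        print(t, c_C - v_C * t)         # 6500, 6800, 7100, 7400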

The objective of CPM is to determine which time-cost tradeoff to choose in order to meet
a scheduled project completion time at minimum cost. This can be set up and solved by linear
programming or by using the project network. Only the latter is considered here.
Begin with all the tasks at their normal times. Since the project completion time depends on
the critical path(s), it can be reduced only by shortening the critical tasks. Find the critical task
(or perhaps a combination of critical tasks) which reduces the project time at the least marginal
cost. Shorten this task until either it crashes or some other tasks become critical. Then repeat
the above step and iterate until the project time can no longer be reduced.
For example, suppose that the activities in the house building project can be crashed as
follows:
Activity Times Direct Cost Cost/Day
Activity Normal Crash Normal Crash Reduced
A (1,2) 2 2 2100 2100 -
B (2,3) 4 4 3500 3500 -
C (3,4) 10 7 6500 7400 300
D (4,5) 4 3 4400 4700 300
E (4,6) 6 4 2900 3400 250
F (4,7) 7 5 2400 2800 200
G (5,7) 5 3 2100 2600 250
H (6,8) 7 5 9300 9900 300
I (7,9) 8 5 4600 5200 200
J (8,10) 9 5 2300 2900 150
K (9,11) 4 3 1900 2100 200
L (9,12) 5 4 2800 3100 300
M (12,13) 6 4 3600 3900 150
N (10,13) 2 2 1300 1300 -
49700 54900

While carrying out the procedure below, consult the project network above. The first step is
to crash activity (12,13), giving the project time 42 days. Next, shorten activity (7,9), but only
by 2 days out of the possible 3, because then the project time is 40 days and there are two critical
paths: 1 → 2 → 3 → 4 → 5 → 7 → 9 → 12 → 13 and 1 → 2 → 3 → 4 → 6 → 8 → 10 → 13.
Now, activities after event 4 must be shortened in pairs, one from each critical path, in order to
reduce the project time. But first it is cheaper to crash activity (3,4). Then, with cheapest total
marginal cost of 200 + 150 = $350, the pair of activities (7,9) and (8,10) should be shortened by
one day, which crashes (7,9). The next cheapest pair of activities is (5,7) and (8,10) with total
marginal cost of 250 + 150 = $400. Shortening these by 2 days crashes (5,7) and also makes
activity (4,7) critical. Next, crash activities (9,12) and (8,10) at a total cost of 300 + 150 =
$450. Lastly, to reduce the project time any further we must shorten all three activities (4,7),
(4,5) and (4,6) by one day, hence crashing (4,5) and costing 200 + 300 + 250 = $750. Since all
activities on the critical path 1 → 2 → 3 → 4 → 5 → 7 → 9 → 12 → 13 have been crashed,
there can be no further reduction in the project time by shortening any other activities.

This procedure is summarized below.

Step   Activities Shortened   Marginal Cost   Project Time   Total Cost
0 - - 44 49700
1 M(12,13) by 2 150 42 50000
2 I (7,9) by 2 200 40 50400
3 C (3,4) by 3 300 37 51300
4 I (7,9) by 1 200 36 51650
J (8,10) by 1 150
5 G (5,7) by 2 250 34 52450
J (8,10) by 2 150
6 L (9,12) by 1 300 33 52900
J (8,10) by 1 150
7 F (4,7) by 1 200 32 53650
D (4,5) by 1 300
E (4,6) by 1 250

From these results, we have the following graph of the time-cost tradeoff for the house
building project. This can be useful for the builder. For example, if a contract required that
the house be completed in 38 days, then the minimum total direct cost would be $51,000.

In addition to direct costs of separate activities, a project usually has indirect costs as a
whole, e.g. management, facilities, interest etc. The total indirect cost increases with project
time.
Suppose, for example, the builder has indirect costs which increase linearly at the rate of
$375 per day of project time. Then clearly it is beneficial for him to reduce the project time
until the marginal total direct cost exceeds $375. Therefore, from the results above, the optimal
project time would be 36 days.
The total cost of the project is the total direct cost plus the total indirect cost. Suppose
that the builder’s total indirect cost is $1000 for a project time of 32 days and increases by
$375 per day of project time thereafter.

(a) Using the labeling technique, find the maximum flow in the network.
(b) Verify your answer in (a) by finding the minimum cut(s).

3. A passenger-freight vessel is nearing the Golden Gate to berth in San Francisco. The
activities listed in the table have to be performed before it can sail for Acapulco. All times are
in hours.
Activity                         Precedence   Mode    Minimum   Maximum
A  Get towed to berth                -          2      1 1/4     2 3/4
B  Disembark passengers              A          1       1/2        2
C  Unload cargo                      A          4        3         8
D  Carry out safety inspection      B,C       1 1/2      1       2 1/2
E  Refuel ship                       D          3        3         4
F  Load cargo                        C          5        4        10
G  Board passengers                 E,F         2        1         4
H  Order tug                        E,F         3      2 1/2     3 1/2
I  Leave port                       G,H       2 1/2      2       3 1/2
(a) For this PERT network, find the expected task durations and the variances of each task
duration.
(b) Draw a network, and find the critical path. What is the expected length of the critical
path, and what is its variance?
(c) What is the probability that the length of the critical path does not exceed 12 hours?

6.3 Complex networks

Over the last two decades there has been a big expansion in the study of complex networks and
complex systems. These systems are ones that have (relatively) simple individual components
linked together in a large-scale network which displays complex behaviour. There are many
important examples of complex networks, including information networks (e.g. the Internet and
the World Wide Web), social networks (e.g. friendships at a school), the brain (a network of neu-
rons and synapses), food webs (predators linked with prey) and metabolic networks (metabolites
connected by metabolic reactions).
In this section we will look at some of the basic ideas of complex networks and the important
questions about them. Read sections 1–4 of the following review article (available on My Unit
Readings):
M. Mitchell, Complex systems: Network thinking, Artificial Intelligence 170, 2006, pp. 1194–1212.
Also read sections 1 and 2 of the following review article (available on My Unit Readings):
M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 2003,
pp. 167–256.
We will also discuss some ideas about network searches, in particular searches on the World
Wide Web. Read section 8.3.1 in the article by Newman above.
