
Lecture Notes on Combinatorial Optimisation

Peter Butkovič, Sabrina Kombrink and Sergey Sergeev

2024
Contents

1 Introduction

I  The primal-dual algorithm

2 Brief review of duality in linear programming
  2.1 Linear programs
  2.2 Definition of Duality

3 The primal-dual algorithm

II  Efficiently solvable problems

4 The maximum flow problem in networks
  4.1 Introduction
  4.2 Directed graphs and flows. Definition of the max-flow problem
  4.3 MFP is a linear program
  4.4 The Ford and Fulkerson Algorithm
  4.5 Correctness of the Ford and Fulkerson algorithm
  4.6 Finite termination of the Ford and Fulkerson algorithm

5 Transportation problem
  5.1 Basic properties of the transportation problem
  5.2 An application of the primal-dual algorithm: algorithm ALPHA–BETA

6 Introduction to the theory of matroids
  6.1 Undirected graphs
  6.2 The greedy algorithm
  6.3 Independence systems
  6.4 Matroids
  6.5 Matroids and beyond

III  Hard problems

7 NP-completeness
  7.1 Why “efficient” means “polynomial”
  7.2 Examples of hard problems
  7.3 NP-completeness

8 Algorithms for NP-complete problems
  8.1 Relative error
  8.2 Multigraphs and Eulerian spanning graphs
  8.3 Approximation algorithms for ∆TSP
Chapter 1

Introduction

Combinatorial optimisation is a key area of discrete optimisation. It deals with optimisation problems formulated
in combinatorial terms. A substantial part of combinatorial optimisation is based on linear programming. We will
therefore review essential bits of linear programming at the beginning of this course, with emphasis on duality.
There are a good number of problems in combinatorial optimisation, for which no efficient solution method is
known (the meaning of the word “efficient” is explained in Section 7.1). We will pay special attention to these
problems towards the end of the course.
These lecture notes are based mainly on [1]. The classical monograph [2] is recommended for an advanced
reading on computational complexity. Additional examples can be found in [3] and [4].
All numbers, vectors and matrices in these notes are real. The symbol ⊤ indicates the transposition of a matrix.
The vectors are one-column matrices, therefore for instance the scalar (inner) product of vectors x and y is written
as x⊤ y.

Part I

The primal-dual algorithm

Chapter 2

Brief review of duality in linear programming

2.1 Linear programs


The course on linear programming is a prerequisite/corequisite for combinatorial optimisation and we therefore
assume good knowledge of that theory from the beginning. However, for convenience and to fix terminology and notation, we will state, without proofs, the basic results from linear programming, with emphasis on duality.
Recall that by a linear program we understand the task of minimising or maximising a linear function of several
variables with respect to a system of linear equations and/or inequalities. A general linear program can be written
as follows:

min (or max)  c1x1 + c2x2 + · · · + cnxn
s.t.  ai1x1 + ai2x2 + · · · + ainxn ⟨≤, =, ≥⟩ bi,  for i ∈ {1, . . . , m},      (2.1)
      x1, . . . , xp ≥ 0.

Here m, n ∈ N; cj , aij , bi ∈ R for all i ∈ {1, . . . , m}, j ∈ {1, . . . , n}; and the brackets ⟨⟩ indicate that for every i
exactly one of the operators inside applies. Notice that in general not all variables are required to be non-negative.
But strict inequalities are not allowed in linear programming problems that we study here.
In optimisation it does not matter too much whether we maximise or minimise, because the minimisation of a
function f (x) is essentially the same task as maximisation of −f (x). So we may assume without loss of generality
that our task is the minimisation (say). This is what we will normally assume (unless stated otherwise). The
inequalities with ≥ can be transformed to inequalities with ≤ (and the other way round) simply by multiplying
both sides by (−1). Also, a condition (symbolically)

LHS ≤ bi

can be replaced by two conditions (involving a new variable, called slack):


LHS + x = bi ,
x ≥ 0.
Finally, a free variable can be replaced by the difference of two (new) non-negative variables:

xk = xk^+ − xk^−,    xk^+, xk^− ≥ 0.

Hence any linear program can easily be transformed to a linear program with minimisation, equations only and all
variables non-negative, that is to:

min f (x) = c⊤ x
s.t. Ax = b, (2.2)
x ≥ 0.

Here f is called the objective function, A ∈ Rm×n , c = (c1 , . . . , cn )⊤ ∈ Rn , b = (b1 , . . . , bm )⊤ ∈ Rm are given
coefficient matrix and coefficient vectors, x = (x1 , . . . , xn )⊤ ∈ Rn is the sought vector of unknown variables,
and (2.2) is called a linear program in standard form (SLP). In contrast, the following linear program

min c⊤ x
s.t. Ax ≥ b, (2.3)
x ≥ 0.

is said to be in canonical form (CLP).

Example 2.1 Consider the linear program

max 3x1 − 2x2


s.t. 2x1 + 5x2 ≥ 17
3x1 − 5x2 = − 27
x1 − x2 ≤ 20
x2 ≥ 0

We start by multiplying the first inequality by (−1), then we add a slack variable in each of the two inequalities
and replace x1 by the difference of two new non-negative variables. Finally we multiply the objective function by
(−1) and we get the following SLP:

min −3x1 + 3x2 + 2x3

s.t. −2x1 + 2x2 − 5x3 + x4 = −17
      3x1 − 3x2 − 5x3 = −27
      x1 − x2 − x3 + x5 = 20
      x1, . . . , x5 ≥ 0
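As a quick numerical sanity check of this transformation (an added illustration, not part of the original notes), the sketch below solves both the original problem of Example 2.1 and the SLP above with scipy.optimize.linprog; both runs should report the same optimal value of the minimised (negated) objective.

```python
from scipy.optimize import linprog

# Original problem of Example 2.1: max 3x1 - 2x2 (x1 free, x2 >= 0).
# linprog minimises, so the objective is negated.
orig = linprog(c=[-3, 2],
               A_ub=[[-2, -5],                 # 2x1 + 5x2 >= 17  <=>  -2x1 - 5x2 <= -17
                     [1, -1]],                 # x1 - x2 <= 20
               b_ub=[-17, 20],
               A_eq=[[3, -5]], b_eq=[-27],
               bounds=[(None, None), (0, None)], method="highs")

# SLP derived above: variables (x1, ..., x5), all non-negative,
# where the original free x1 is x1 - x2 and the original x2 is renamed x3.
slp = linprog(c=[-3, 3, 2, 0, 0],
              A_eq=[[-2, 2, -5, 1, 0],
                    [3, -3, -5, 0, 0],
                    [1, -1, -1, 0, 1]],
              b_eq=[-17, -27, 20],
              bounds=(0, None), method="highs")

print(orig.fun, slp.fun)   # both equal: the minimum of the negated objective
```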

We will now introduce some terminology for the SLP (linear programs in the standard form). This is done
similarly for CLP or general linear programs.
The set of feasible solutions is
M = {x ∈ Rn : Ax = b, x ≥ 0},

and the set of optimal solutions is

M opt = {x ∈ M : c⊤ x ≤ c⊤ z for every z ∈ M }.

The notation M min may be used instead of M opt in this case.


The theory of linear programming is mainly concerned with two basic tasks: solution methods and duality
theory. There are a variety of solution methods for solving linear programs, the most famous is the simplex method
(briefly SIMPLEX), which is historically the first method for solving general linear programs and was voted one of
the 10 most important algorithms of the 20th century (actually it was placed 2nd after the Monte Carlo method)
because of its widespread and efficient use in many real-life situations. But it should be remembered that this

method is not polynomial since it behaves exponentially in extreme cases. However (and this is also important for combinatorial optimisation), linear programs are polynomially solvable by other methods, such as the ellipsoid method, Karmarkar's method and other interior point methods. We are not going to review SIMPLEX here but need to
see a fundamentally important consequence of this method which is summarised in the following statement offering
exactly three alternatives for each linear program.

Theorem 2.2 (Fundamental Theorem of Linear Programming) For every linear program (LP) exactly one
of the following holds:

1. LP is infeasible (that is M = ∅),

2. LP has an optimal solution (that is M opt ̸= ∅), or

3. minx∈M f (x) = −∞.

The third of these alternatives means that LP is not bounded from below, i.e., it is unbounded.
Due to this fundamental result, by a “solution” of a linear program we will always understand (unless stated
otherwise) either finding an optimal solution, or deciding that the linear program is infeasible or that the objective
function is unbounded.

2.2 Definition of Duality


We start with an example.

Example 2.3 Consider the following pair of linear programs:

min f (x) = 3x1 − 4x2 max g(π) = 5π1 + 7π2

2x1 − 7x2 ≥ 5 2π1 + 3π2 ≤ 3

3x1 + x2 ≥ 7 −7π1 + π2 ≤ −4

x1,2 ≥ 0 π1,2 ≥ 0

Table 2.1: An example of dualisation

Note that the coefficients of f (x) are the same as the right-hand side constants of the second problem and
similarly, the coefficients of g(π) are the same as the right-hand side constants of the first problem. Also, the matrix
of the left-hand side coefficients of the second problem is obtained by the transposition of the same matrix in the
first problem.
Let us bound the coefficients of f (x) from below by the left-hand sides of the second problem's constraints (using x ≥ 0 and the feasibility of x and π) and manipulate:

f (x) = 3x1 − 4x2


≥ (2π1 + 3π2 )x1 + (−7π1 + π2 )x2
= (2x1 − 7x2 )π1 + (3x1 + x2 )π2
≥ 5π1 + 7π2 = g(π).

In this example, it follows that f (x) ≥ g(π) for any feasible solution x to the first problem and feasible solution π
to the second problem. It is obvious that many such examples could be constructed where pairs of linear programs
are strongly related. Several questions then arise, for instance, can equality f (x) = g(π) be attained? Or if one of
the problems is infeasible, what can be said about the other problem?
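Before answering these questions in general, it is instructive to check this particular pair numerically. The sketch below is an added illustration using scipy.optimize.linprog; for the pair of Example 2.3 both problems have optimal solutions and the two optimal values should coincide, previewing the Strong Duality Theorem proved later in this chapter.

```python
from scipy.optimize import linprog

# First problem of Example 2.3: min 3x1 - 4x2, s.t. 2x1 - 7x2 >= 5, 3x1 + x2 >= 7, x >= 0.
# ">=" rows are passed to linprog as negated "<=" rows.
primal = linprog(c=[3, -4],
                 A_ub=[[-2, 7], [-3, -1]], b_ub=[-5, -7],
                 bounds=(0, None), method="highs")

# Second problem: max 5pi1 + 7pi2, s.t. 2pi1 + 3pi2 <= 3, -7pi1 + pi2 <= -4, pi >= 0.
dual = linprog(c=[-5, -7],
               A_ub=[[2, 3], [-7, 1]], b_ub=[3, -4],
               bounds=(0, None), method="highs")

print(primal.fun, -dual.fun)   # f(x*) and g(pi*); the two values should coincide
```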
Linear programming originated in the 1940s and already in the first years the following interesting phenomenon
was observed: there is a scheme by which every linear program can be associated with another linear program
(minimisation to maximisation or vice versa), called dual, so that the properties of these programs expressed in
Theorem 2.2 are strongly related as in the example above. This may seem to be of little practical importance at
first but the subsequent years showed that duality is actually very useful, and in particular it is a powerful tool
for constructing algorithms for combinatorial optimisation problems. The first part of this course will demonstrate
this.
It is important to understand with confidence how a dual problem to a given linear program is constructed. We
will now explain this in some detail. We will first show it for general linear programs as described in (2.1). However,
for ease of discussion we will assume that all inequalities use the same sign, we fix it to be ≥. So if necessary you
will need to multiply the inequalities with ≤ by (−1). (Keep this convention in mind to avoid a disappointment!)
So our linear program now has the form

min f(x) = c⊤x
s.t.  ai⊤x = bi,   i ∈ I,
      ai⊤x ≥ bi,   i ∈ Ī,                    (2.4)
      xj ≥ 0,      j ∈ J,
      xj free,     j ∈ J̄,

where "free" indicates that there is no sign restriction on xj. The vectors ai⊤, for i ∈ I, are the vectors of coefficients of the left-hand sides of the equations, and similarly ai⊤, i ∈ Ī, for the inequalities. The set J (respectively, J̄) is the set of indices of non-negative (respectively, free) variables. We will call this problem primal.
In the construction of the dual problem it is important to bear in mind that we assign dual variables to primal
constraints and dual constraints to primal variables. So the correspondence is not variable-variable and constraint-
constraint! This is a very frequent mistake that should be avoided. In order to define the dual problem we first
form the matrix A of all left-hand side coefficients. This matrix can be written row-wise as
A = ( ai⊤, i ∈ I
      ai⊤, i ∈ Ī ).

The same matrix can be written as

A = (Aj, j ∈ J | Aj, j ∈ J̄),
where Aj stands for column j of A. The dual problem is defined by the following table, where by convention the
dual variables are denoted by Greek letters.
We can informally summarise this construction (“dualisation”), displayed in Table 2.2, as follows:

• A dual variable is assigned to each primal constraint; this variable is free if it is assigned to an equation and non-negative if it is assigned to an inequality.

• A dual constraint is assigned to every primal variable; this constraint is an equation if it is assigned to a free variable and it is an inequality if it is assigned to a non-negative variable.

• The matrix of the left-hand side coefficients of the dual is obtained by the transposition of this matrix in the primal problem (because π⊤Aj = Aj⊤π).

• The coefficients of the dual objective function are equal to the right-hand side coefficients of the primal problem.

Primal: min f(x) = c⊤x                 Dual: max g(π) = π⊤b

ai⊤x = bi,   i ∈ I                     πi free
ai⊤x ≥ bi,   i ∈ Ī                     πi ≥ 0
xj ≥ 0,      j ∈ J                     π⊤Aj ≤ cj
xj free,     j ∈ J̄                     π⊤Aj = cj

Table 2.2: The dualisation rules

• The right-hand side coefficients in the dual problem are equal to the coefficients of the primal objective function.

• The minimisation is replaced by maximisation.

Example 2.4 Consider the linear program (slightly different from that of the previous example):

min f (x) = 3x1 − 4x2


s.t. 2x1 − 7x2 ≥ 5
3x1 + x2 = 7
x2 ≥ 0

The dual will have two variables π1 and π2 because the primal problem has two constraints. The first dual variable
will be non-negative since it is assigned to an inequality, the second will be free since it is assigned to an equation.
There will be two dual constraints, the first an equation since it is assigned to a free variable x1 , the second will
be an inequality since it is assigned to a non-negative variable x2 . Finally, after swapping the coefficients of the
objective function with the ones on the right-hand side we obtain the dual as follows:

max g(π) = 5π1 + 7π2


s.t. 2π1 + 3π2 = 3
−7π1 + π2 ≤ −4
π1 ≥ 0

Now we give a brief overview of the main theorems on duality.

Theorem 2.5 (On Symmetry) The dual problem to the dual problem is equal to the primal problem.

Due to the Theorem on Symmetry (Theorem 2.5) it is more appropriate to speak about a “pair of primal-dual
problems”, rather than about a primal and dual problem.
It will be practical to use the following notation:

• MP is the set of feasible solutions to the primal problem;

• MD is the set of feasible solutions to the dual problem;

• MP^opt is the set of optimal solutions to the primal problem;

• MD^opt is the set of optimal solutions to the dual problem.

The next theorem generalises the observation of Example 2.3.

Theorem 2.6 (Weak Duality Theorem) For any x ∈ MP and any π ∈ MD , we have c⊤ x ≥ π ⊤ b.

It is also quite important to know that, if x and π are feasible solutions and the values of objective functions
are the same, then x and π are optimal.

Corollary 2.7 Let x ∈ MP and π ∈ MD (i.e., x and π are primal feasible and dual feasible) and c⊤x = π⊤b. Then x ∈ MP^opt and π ∈ MD^opt (i.e., x and π are optimal).

The unboundedness of the primal problem implies that the dual problem is infeasible, and vice versa.

Corollary 2.8 The following implications hold:

(a) If minx∈MP c⊤ x = −∞, then MD = ∅.

(b) If maxπ∈MD π ⊤ b = +∞ then MP = ∅.

The following claim is the most nontrivial and striking statement of the duality theory of LP.

Theorem 2.9 (Strong Duality Theorem) MP^opt ≠ ∅ if and only if MD^opt ≠ ∅. If optimal solutions exist then

min c⊤x = max π⊤b.

Corollary 2.10 For every primal-dual pair exactly one of the possibilities indicated in the table below holds.

                              MP^opt ≠ ∅    min_{x∈MP} c⊤x = −∞    MP = ∅
MD^opt ≠ ∅                        ✓                  ✗                ✗
max_{π∈MD} π⊤b = +∞               ✗                  ✗                ✓
MD = ∅                            ✗                  ✓                ✓
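As an added numerical illustration of this table (not part of the notes), the sketch below feeds the primal-dual pair of Example 2.4 to scipy.optimize.linprog; for that pair the solver should report the primal as infeasible and the dual as unbounded, one of the combinations permitted by Corollary 2.10.

```python
from scipy.optimize import linprog

# Primal of Example 2.4: min 3x1 - 4x2, s.t. 2x1 - 7x2 >= 5, 3x1 + x2 = 7, x1 free, x2 >= 0.
primal = linprog(c=[3, -4],
                 A_ub=[[-2, 7]], b_ub=[-5],          # 2x1 - 7x2 >= 5
                 A_eq=[[3, 1]], b_eq=[7],
                 bounds=[(None, None), (0, None)], method="highs")

# Its dual: max 5pi1 + 7pi2, s.t. 2pi1 + 3pi2 = 3, -7pi1 + pi2 <= -4, pi1 >= 0, pi2 free.
dual = linprog(c=[-5, -7],
               A_ub=[[-7, 1]], b_ub=[-4],
               A_eq=[[2, 3]], b_eq=[3],
               bounds=[(0, None), (None, None)], method="highs")

print(primal.status, primal.message)   # expected: reported infeasible
print(dual.status, dual.message)       # expected: reported unbounded
```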

If a linear program is special then it may be unnecessary to use the general Table 2.2 for finding the dual
problem. The special schemes in Table 2.3 are easily derived from Table 2.2 and may help us to find the dual more
quickly.

CLP:  min f(x) = c⊤x, s.t. Ax ≥ b, x ≥ 0      ↔      max g(π) = π⊤b, s.t. π⊤A ≤ c⊤, π ≥ 0
SLP:  min f(x) = c⊤x, s.t. Ax = b, x ≥ 0      ↔      max g(π) = π⊤b, s.t. π⊤A ≤ c⊤ (π free)

Table 2.3: Dualisation of the CLP and of the SLP

A specific problem in duality theory is to find an optimal solution to the dual problem once an optimal solution
to the primal problem is known. The theorem below is helpful here but it also has far reaching consequences in
combinatorial optimisation.

Theorem 2.11 (Complementary Slackness) Suppose x ∈ MP and π ∈ MD. Then x ∈ MP^opt and π ∈ MD^opt if and only if for all i and j

(cj − π⊤Aj)xj = 0 = πi(ai⊤x − bi).
If we compare this statement with Table 2.2 we see it effectively requires as a necessary and sufficient condition
for optimality that in each row of Table 2.2 on at least one side the constraint is satisfied with equality.

Proof of the Complementary Slackness Theorem. Suppose x ∈ MP and π ∈ MD. Set

uj = (cj − π⊤Aj)xj,    vi = πi(ai⊤x − bi)

for every i and j, and u = Σj uj, v = Σi vi. It follows from Table 2.2 that uj ≥ 0 for all j and vi ≥ 0 for all i. We also have

u + v = Σj cj xj − Σj π⊤Aj xj + Σi πi ai⊤x − Σi πi bi
      = Σj cj xj − π⊤Ax + π⊤Ax − Σi πi bi
      = c⊤x − π⊤b.

Suppose first that x ∈ MP^opt and π ∈ MD^opt. Then c⊤x = π⊤b, hence Σj uj + Σi vi = 0, and since all uj and vi are non-negative, it follows that uj = vi = 0 for all i and j.

Conversely suppose that uj = vi = 0 for all i and j. Then u + v = 0, therefore c⊤x = π⊤b. Since x ∈ MP and π ∈ MD it follows now by Corollary 2.7 that x ∈ MP^opt and π ∈ MD^opt.

Example 2.12 For the linear program below construct its dual, solve the dual (for instance, graphically) and then
use duality theory to find a solution to the primal problem (recall that this means either to find an optimal solution,
or decide about infeasibility or unboundedness):

min x1 + 7x2 − 10x3


s.t. − x1 + x2 − 5x3 ≥ 1
x1 + x2 − x3 ≥ 2
x1, x2, x3 ≥ 0

The dual is

max π1 + 2π2
s.t. − π1 + π2 ≤ 1
π1 + π2 ≤ 7
−5π1 − π2 ≤ −10
π1,2 ≥ 0

An optimal solution π ∗ = (3, 4)⊤ to the dual can be found graphically. Hence by the Strong Duality Theorem
the primal also has an optimal solution and we can find it using the Complementary Slackness Theorem as follows.
By this theorem the following conditions must be satisfied by feasible x and π for them to be optimal:

π1 (−x1 + x2 − 5x3 − 1) = 0
π2 (x1 + x2 − x3 − 2) = 0
(1 + π1 − π2 )x1 = 0
(7 − π1 − π2 )x2 = 0
(−10 + 5π1 + π2 )x3 = 0

Since the third dual inequality is satisfied by π ∗ strictly it follows that x3 = 0 at optimality. At the same time both
components of π ∗ are positive and so the first two conditions imply (after setting x3 = 0):

−x1 + x2 − 1 = 0
x1 + x2 − 2 = 0
There is a unique solution (1/2, 3/2)⊤ to this system and so x∗ = (1/2, 3/2, 0)⊤. We can check that the values of both objective functions (with arguments x∗ and π∗) are equal to 11, which confirms that x∗ and π∗ are indeed optimal.
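The complementary slackness conditions can also be verified mechanically. The sketch below is an added check (not part of the notes): it solves the primal of Example 2.12 with scipy.optimize.linprog and evaluates the products (cj − π⊤Aj)xj and πi(ai⊤x − bi) at the computed x and at π∗ = (3, 4)⊤; all of them should vanish up to rounding.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1, 7, -10])
A = np.array([[-1, 1, -5],          # coefficient rows of the ">=" constraints
              [1, 1, -1]])
b = np.array([1, 2])

# Solve the primal (">=" rows are negated for linprog's "<=" convention).
res = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None), method="highs")
x = res.x
pi = np.array([3, 4])               # the optimal dual solution found graphically above

print("x* ~", x, " primal objective:", c @ x, " dual objective:", pi @ b)
print("variable products:", (c - pi @ A) * x)      # (c_j - pi^T A_j) x_j, all ~ 0
print("constraint products:", pi * (A @ x - b))    # pi_i (a_i^T x - b_i), all ~ 0
```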

This example shows a situation in which it is possible to solve a linear program easily using duality theory: it
arises when the dual problem can easily be solved and if an optimal solution exists, the Complementary Slackness
Theorem then enables us to find an optimal solution to the primal problem. This situation always arises when the
primal problem has only two constraints because then the dual has only two variables and therefore can be solved
graphically. But it should be stressed that this is not the only case when a solution to a linear program can be
found in this way.
In what follows we will show a much more substantial application of duality and specifically, of the Complemen-
tary Slackness Theorem.

Chapter 3

The primal-dual algorithm

The primal-dual algorithm is a meta-algorithm, that is an algorithm designed to generate algorithms for linear
programs of special forms, in particular for a class of combinatorial optimisation problems that can be formulated
as linear programs. It is based on the duality theory of linear programming. In this chapter we derive this
algorithm from the Complementary Slackness Theorem (Theorem 2.11) and later on we will show how it can be
used to generate an algorithm, called ALPHABETA, for the transportation problem.
We will assume without loss of generality that the linear program (referred to as P ) we want to solve is in
standard form

min c⊤ x
s. t. Ax = b
x≥0

where, as before, A = (aij ) ∈ Rm×n , c = (c1 , . . . , cn )⊤ ∈ Rn , b = (b1 , . . . , bm )⊤ ∈ Rm . In addition we will also


suppose without loss of generality that b ≥ 0. Together with its dual (D) we have

min f (x) = c⊤ x max g(π) = π ⊤ b

Ax = b
x≥0 π ⊤ A ≤ c⊤

By the Complementary Slackness Theorem (Theorem 2.11) a necessary and sufficient condition for x ∈ MP and
π ∈ MD to be optimal is that
(cj − π⊤Aj)xj = 0 = πi(ai⊤x − bi)      (3.1)

are satisfied by all i and j. In the current setting Ax = b is satisfied by any x ∈ MP so the products on the right
in (3.1) are all 0. Hence (3.1) reduces to
(cj − π⊤Aj)xj = 0      (3.2)

for all j. So, once we know a π ∈ MD^opt, all we need to do to find an optimal solution x to the primal is to find an
x ∈ MP that satisfies (3.2). Clearly, it is not reasonable, in general, to assume that we know an optimal solution
to the dual problem from the start. In the following we will build an algorithm, the primal-dual algorithm, which
requires a feasible solution π ∈ MD of the dual rather than an optimal solution to the dual to start with. However,
even this assumption is difficult, since finding a feasible solution to a linear program is in general equally difficult
as finding an optimal solution. All the discussion of the primal-dual algorithm that will follow is done under this

assumption. However, in many cases when the primal-dual algorithm is applied to combinatorial optimisation
problems, it is easily satisfied.
So, suppose now that a π ∈ MD is known and that we are looking for an optimal x ∈ MP . For the given π we
might not be able to find an x ∈ MP that satisfies (3.2) for all j. However, we will iteratively try to find a pair
(x∗ , π ∗ ) which satisfies (3.2) for all j starting with the given π ∈ MD .
Recall that π ∈ MD means π ⊤ A ≤ c⊤ and so (3.2) will be satisfied for all j for which π ⊤ Aj = cj . If, on the
other hand, π ⊤ Aj < cj then xj must be 0. Thus when solving the system Ax = b, x ≥ 0, we may remove all
variables xj where π ⊤ Aj < cj . We therefore set

J = {j | π ⊤ Aj = cj }

and solve the system Ax = b, x ≥ 0 with variables xj , j ∈ J only. That is we need to solve
Σ_{j∈J} aij xj = bi,  i ∈ {1, . . . , m},
xj ≥ 0,  j ∈ J.      (3.3)

One of the important applications of linear programming is a method for solving linear systems of equations
with non-negativity constraints. This method (sometimes called the Two-Phase Method) introduces a new variable,
called auxiliary, in every equation and then minimises the sum of all auxiliary variables. This is what we will now
do to solve (3.3). More precisely, let us consider
min ξ = Σ_{i=1}^m x_i^a
s.t. Σ_{j∈J} aij xj + x_i^a = bi,  i ∈ {1, . . . , m},      (3.4)
     x_i^a ≥ 0,  i ∈ {1, . . . , m},
     xj ≥ 0,  j ∈ J,

where xai stands for the i-th auxiliary variable. This linear program is called the restricted primal (abbr. RP ). The
set J is called the set of admissible indices or columns (or sometimes the set of admissible variables).
Proposition 3.1 The linear program (3.4) has an optimal solution and ξ min ≥ 0.

Proof. This linear program has a feasible solution (xj = 0, j ∈ J | xai = bi , i ∈ {1, . . . , m}), recall that b ≥ 0, and
the objective function is bounded below by 0. Therefore by Theorem 2.2 this LP has an optimal solution and the
optimal objective function value is non-negative.
Proposition 3.2 ξ min = 0 in (3.4) if and only if (3.3) has a feasible solution.

Proof. Suppose (xj , j ∈ J | xai , i ∈ {1, . . . , m}) is an optimal solution to (3.4) and ξ min = 0. Then all xai = 0 and
xj , j ∈ J is a solution to (3.3).
Conversely, if xj, j ∈ J is a solution to (3.3) then (xj, j ∈ J | x_i^a = 0, i ∈ {1, . . . , m}) is a feasible and therefore also optimal solution to (3.4) with ξ^min = Σ_{i=1}^m x_i^a = 0.
It follows from these two propositions that if ξ min = 0 in (3.4) then an optimal solution for (P ) has been found.
Let us now suppose that ξ min > 0. It follows from the above propositions that the π ∈ MD with which we
started is not dual optimal. We will therefore seek a “correction” of this vector in a way that increases our chances
of obtaining a dual optimal vector. This will be done using an optimal solution to the dual of the restricted primal
problem (DRP ). It follows from (3.4) that this dual is

max w = π̄⊤b
s.t. π̄⊤Aj ≤ 0,  j ∈ J,                         (3.5)
     π̄i ≤ 1,  i ∈ {1, . . . , m}.

We assume here that it is not difficult to find an optimal solution to the DRP; it will turn out to be the case in the applications of the primal-dual algorithm to combinatorial optimisation problems. So let π̄ be an optimal solution to (3.5). By the Strong Duality Theorem (Theorem 2.9) we have

π̄⊤b = ξ^min > 0.                                (3.6)

We will seek a correction of π in the form

π∗ = π + θπ̄,

where the parameter θ needs to be determined. This will be done using two requirements.
First, the objective function value at the new dual feasible solution should be closer to the optimal objective function value, that is

(π∗)⊤b > π⊤b,

which implies

π⊤b + θπ̄⊤b > π⊤b,

from which, using (3.6), it follows that θ should be positive.
Second, we need to make sure that the new vector actually is dual feasible, that is

(π∗)⊤Aj ≤ cj,  j ∈ {1, . . . , n},

or, equivalently,

π⊤Aj + θπ̄⊤Aj ≤ cj,  j ∈ {1, . . . , n}.          (3.7)

Since π⊤Aj ≤ cj for all j and θ > 0, this requirement will automatically be satisfied whenever π̄⊤Aj ≤ 0. If on the other hand π̄⊤Aj > 0 then we need to solve (3.7) for θ, which yields

θ ≤ (cj − π⊤Aj) / (π̄⊤Aj),  if π̄⊤Aj > 0,

or, equivalently,

θ ≤ min{ (cj − π⊤Aj) / (π̄⊤Aj) : π̄⊤Aj > 0 }.

For a while we will assume that π̄⊤Aj > 0 holds for at least one j and we will return to this question later on.
The value of θ should be as big as possible in order to get as close as possible to an optimal solution to the dual. Therefore the choice is

θ = min{ (cj − π⊤Aj) / (π̄⊤Aj) : π̄⊤Aj > 0 }.

This value will be denoted by θ1. Clearly θ1 > 0 because if π̄⊤Aj > 0 then j ∉ J and so π⊤Aj < cj. We also have an increase of the dual objective function value since, see (3.6),

w∗ = π⊤b + θ1π̄⊤b > π⊤b = w.

After finding the new dual feasible π∗ a new iteration can start: π∗ will be renamed to π, the set J re-defined, etc. If necessary further iterations will follow and this process will stop when ξ^min = 0. The corresponding vector (xj, j ∈ J) complemented by zero components of x is then an optimal solution of (P), that is, an optimal solution is (xj, j ∈ J | xj = 0, j ∉ J). It can be proved [1] that this will happen after a finite number of iterations; this proof is omitted here.
However, one point in the above procedure needs to be revisited, namely we need to clarify the case when π̄⊤Aj ≤ 0 for all j. This is necessary to discuss as in such a case θ1 is undefined. But the next theorem gives a clear answer.

Theorem 3.3 (Infeasibility Criterion) If ξ^min > 0 and π̄⊤Aj ≤ 0 for all j ∉ J then MP = ∅.

Proof. By Corollary 2.8 it is sufficient to prove that max_{π∈MD} π⊤b = +∞. For any positive real number θ we define

π(θ) = π + θπ̄.

If j ∈ J then π̄⊤Aj ≤ 0 because π̄ ∈ M_DRP^opt ⊂ M_DRP; combined with the theorem assumption, this gives π̄⊤Aj ≤ 0 for all j. Also, for any j we have π⊤Aj ≤ cj since π ∈ MD. Hence for every j and θ > 0 we have:

π(θ)⊤Aj = π⊤Aj + θπ̄⊤Aj ≤ cj.

At the same time

π(θ)⊤b = π⊤b + θπ̄⊤b → +∞

as θ → +∞, since π̄⊤b = ξ^min > 0 and π⊤b is constant. Hence max_{π∈MD} π⊤b = +∞.
We are ready to formulate the algorithm.
Algorithm 3.4 (Primal-Dual Algorithm)
Input: Linear program P :

min c⊤ x
s. t. Ax = b
x≥0

with b ≥ 0 and a dual feasible solution π.


Output: An optimal solution to P or an indication that MP = ∅.
1. Set J = {j | π ⊤ Aj = cj }.

2. Solve RP:

   min ξ = Σ_{i=1}^m x_i^a
   s.t. Σ_{j∈J} aij xj + x_i^a = bi,  i ∈ {1, . . . , m},
        x_i^a ≥ 0,  i ∈ {1, . . . , m},
        xj ≥ 0,  j ∈ J.

3. Let (xj, j ∈ J | x_i^a, i ∈ {1, . . . , m}) be an optimal solution to RP. If ξ^min = 0 then stop: (xj, j ∈ J | xj = 0, j ∉ J) is an optimal solution to P.

4. Find an optimal solution π̄ to DRP:

   max w = π̄⊤b
   s.t. π̄⊤Aj ≤ 0,  j ∈ J,
        π̄i ≤ 1,  i ∈ {1, . . . , m}.

5. If π̄⊤Aj ≤ 0 for all j ∉ J then stop: MP = ∅.

6. Set

   θ1 = min{ (cj − π⊤Aj) / (π̄⊤Aj) : π̄⊤Aj > 0 }

   and π := π + θ1π̄.

7. Go to 1.

Figure 3.1: The primal-dual algorithm

The algorithm is summarised in the scheme shown in Figure 3.1.
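For concreteness, the following is a compact Python sketch of Algorithm 3.4 (an illustration added here, not part of the original notes). It solves RP and DRP with a generic LP solver (scipy.optimize.linprog), whereas in the combinatorial applications later in the course these subproblems are solved by special-purpose methods; the function name, tolerance and solver choice are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linprog

def primal_dual(A, b, c, pi, tol=1e-9):
    """Sketch of Algorithm 3.4 for min c^T x s.t. Ax = b >= 0, x >= 0, given dual feasible pi."""
    A, b, c, pi = map(np.asarray, (A, b, c, pi))
    m, n = A.shape
    while True:
        # Step 1: admissible columns J = {j : pi^T A_j = c_j}
        J = np.where(np.isclose(pi @ A, c, atol=tol))[0]
        # Step 2: restricted primal -- minimise the sum of the auxiliary variables
        rp = linprog(np.concatenate([np.zeros(len(J)), np.ones(m)]),
                     A_eq=np.hstack([A[:, J], np.eye(m)]), b_eq=b,
                     bounds=(0, None), method="highs")
        # Step 3: xi_min = 0 means the restricted solution extends to an optimum of P
        if rp.fun < tol:
            x = np.zeros(n)
            x[J] = rp.x[:len(J)]
            return x, pi
        # Step 4: dual of the restricted primal (the variables pi_bar are free)
        drp = linprog(-b,
                      A_ub=np.vstack([A[:, J].T, np.eye(m)]),
                      b_ub=np.concatenate([np.zeros(len(J)), np.ones(m)]),
                      bounds=(None, None), method="highs")
        pi_bar = drp.x
        slopes = pi_bar @ A                 # pi_bar^T A_j for every column j
        mask = slopes > tol
        # Step 5: infeasibility criterion (Theorem 3.3)
        if not mask.any():
            raise ValueError("MP is empty")
        # Step 6: theta_1 and the dual update
        theta1 = np.min((c[mask] - pi @ A[:, mask]) / slopes[mask])
        pi = pi + theta1 * pi_bar

# Hypothetical usage with the data of Example 3.5 below; this should return
# x* ~ (13/14, 3/14, 0) together with the final dual vector (1, 4):
# primal_dual([[2, -4, 1], [1, 5, -5]], [1, 2], [6, 16, -15], [0, 3])
```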

Example 3.5 Given is the linear program

min 6x1 + 16x2 − 15x3

s.t. 2x1 − 4x2 + x3 = 1
     x1 + 5x2 − 5x3 = 2
     x1, x2, x3 ≥ 0

and a dual feasible solution π = (0, 3)⊤ . We will solve this linear program by the primal-dual algorithm.
To find J we need to start with constructing the dual:

max π1 + 2π2
s.t. 2π1 + π2 ≤ 6
−4π1 + 5π2 ≤ 16
π1 − 5π2 ≤ − 15

and substitute π = (0, 3)⊤ into the left-hand sides to obtain π⊤Aj for j ∈ {1, 2, 3}. It will be practical to write the results of the calculation in a table (the columns π̄⊤Aj and (π∗)⊤Aj will be filled in later):

j      cj     π⊤Aj    π̄⊤Aj    (π∗)⊤Aj
1       6       3
2      16      15
3     −15     −15

We deduce that J = {3} and so RP is

min xa1 + xa2


s.t. x3 + xa1 =1
−5x3 + xa2 =2
x3 ≥0
xa1,2 ≥ 0.

If ξ = x_1^a + x_2^a were 0 then the two equations would contradict each other, hence ξ^min > 0 and so we need to solve the DRP, which is:

max π̄1 + 2π̄2
s.t. π̄1 − 5π̄2 ≤ 0
     π̄1 ≤ 1
     π̄2 ≤ 1.

An optimal solution to DRP can be found graphically: π̄ = (1, 1)⊤, and we can calculate π̄⊤Aj:

j      cj     π⊤Aj    π̄⊤Aj    (π∗)⊤Aj
1       6       3       3
2      16      15       1
3     −15     −15      −4

This table is now helpful for calculating θ1:

θ1 = min{ (6 − 3)/3, (16 − 15)/1 } = 1.

So

π∗ = (0, 3)⊤ + 1 · (1, 1)⊤ = (1, 4)⊤.

The second iteration starts by finding (π∗)⊤Aj for j ∈ {1, 2, 3}:

j      cj     π⊤Aj    π̄⊤Aj    (π∗)⊤Aj
1       6       3       3        6
2      16      15       1       16
3     −15     −15      −4      −19

and then by finding a new J = {1, 2}. Hence the new RP is

min x_1^a + x_2^a

s.t. 2x1 − 4x2 + x_1^a = 1
     x1 + 5x2 + x_2^a = 2
     x1, x2 ≥ 0
     x_1^a, x_2^a ≥ 0

where again ξ = x_1^a + x_2^a. For testing whether ξ^min = 0 we set x_1^a = x_2^a = 0 and try to solve the system

2x1 − 4x2 = 1
x1 + 5x2 = 2
x1, x2 ≥ 0.

It has a (unique) solution (13/14, 3/14)⊤, from which we deduce that an optimal solution to P is x∗ = (13/14, 3/14, 0)⊤ and f^min = 9, where f(x) = 6x1 + 16x2 − 15x3.

Part II

Efficiently solvable problems

Chapter 4

The maximum flow problem in networks

4.1 Introduction
The max-flow problem is one of the fundamental combinatorial optimisation problems. It has a number of ap-
plications both within and outside combinatorial optimisation such as in scheduling, allocation of resources or
communication.
We will start with a very simple example. Suppose a company wants to ship the maximum possible amount of
oil per hour via the pipeline from station s (source) to station t (sink) in Figure 4.1. On its way from s to t, the oil
must pass through some or all of the stations a, b and c. The various arcs in Figure 4.1 represent pipes of different
diameters. The maximum number of barrels of oil (millions of barrels per hour) that can be pumped through each
pipe is indicated in rectangles at each pipe (“pipe capacity”).
In order to determine the maximum volume of oil per hour that can be transported via this pipeline we need
to take into account natural restrictions. One is of course the capacity of each individual pipe which cannot be
exceeded. The other one is the basic principle that what is entering a station must also leave this station (we
ignore any possible losses due to leakages etc). This principle is usually referred to as the conservation law. The
conservation law of course does not apply at the source and sink.
At the same time we need to introduce a measure that represents the volume of the oil transported and which
we wish to maximise. By definition it is the total amount leaving the source via any pipe incident with the source
(which, as we will prove later on, is the same as the total value entering the sink). In general we will use the words
nodes and arcs rather than “stations” and “pipes”. Nodes, arcs, arc capacities and a specification of the source and

Figure 4.1: Max-flow problem: A simple example

Figure 4.2: Max-flow problem: Example of a flow

Figure 4.3: Max-flow problem: No flow – capacity constraints are violated

sink together determine a network. We will give exact definitions shortly.


The actual flow values through individual pipes will be shown as free-standing numbers. By a “flow” we normally
understand the values on all individual arcs together. So for instance the free-standing numbers in Figure 4.2
represent a flow in the network of Figure 4.1. But the free-standing numbers in Figure 4.3 do not since the value
on arc (c, t) exceeds its capacity (although the conservation law at a, b and c is satisfied). On the other hand
the free-standing numbers in Figure 4.4 do not represent a flow because the conservation law at c is not satisfied,
although the capacities are not exceeded at any arc.

4.2 Directed graphs and flows. Definition of the max-flow problem


We need to introduce the terminology of directed graphs (briefly digraphs) before we can start the theory of flows
in networks. A digraph is any ordered pair D = (V, E) where V is a nonempty finite set called the set of nodes and
E ⊆ V × V , where V × V = {(u, v) : u, v ∈ V }, is the set of arcs. Thus (u, v) ∈ E does not imply that (v, u) ∈ E.
If (u, v) ∈ E then we say that the arc (u, v) is incident with u and v, and that the nodes u and v are adjacent. We
also say that (u, v) is outgoing for u and incoming for v or that it is leaving u and entering v. The arcs of the form
(u, u) are called loops. The number of nodes (respectively, arcs) will usually be denoted by m (respectively, n). An
example of a small digraph is in Figure 4.5, where m = 3 and n = 5.
A digraph (V, E) is called oriented if (u, v) ∈ E ⇒ (v, u) ∉ E. Hence the digraph of Figure 4.5 is not oriented.
A network is N = (V, E, s, t, b), where (V, E) is an oriented digraph without loops, s ∈ V has no incoming arcs,

Figure 4.4: Max-flow problem: No flow – conservation law is violated

Figure 4.5: Digraph

t ∈ V has no outgoing arcs and b : E → R+ ∪ {+∞} (i.e., b is a function defined on the set of arcs with values in R+ ∪ {+∞}). If (i, j) ∈ E then b(i, j) or just bij will be called the capacity of the arc (i, j). If E = {e1 , . . . , en },
then bi will stand for b(ei ).
Any function f : E → R+ ∪ {+∞} is called a flow on N if the following capacity constraints hold

0 ≤ f ≤ b      (4.1)

and if the following conservation law(s)

Σ_{y : (x,y)∈E} f(x, y) = Σ_{y : (y,x)∈E} f(y, x)      (4.2)

hold for all x ∈ V, x ≠ s, x ≠ t.


The number f (i, j) or just fk (if ek = (i, j)) is called the flow value on the arc (i, j). We can write the
conservation law alternatively as follows:
Σ_{ek enters x} fk = Σ_{ek leaves x} fk      (4.3)

for all x ∈ V, x ≠ s, x ≠ t. The quantity

v = Σ_{ek leaves s} fk      (4.4)

is called the value of the flow f , notation v(f ) or just v.


The max-flow problem (MFP) can now be defined as follows: Given a network, find a flow (“max-flow”) in this
network whose value is maximal.
The value of the flow in Figure 4.2 is 3. It is easily seen that the value of any flow in this network cannot be
greater than 3. Therefore this flow is a max-flow in this network. It is not a unique max-flow (can you find another
one?).
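Both defining conditions are easy to check mechanically. The helper below is a small added sketch (the arc and flow data of Figures 4.2–4.4 are not reproduced here); it takes the capacity function and a candidate flow as dictionaries keyed by arcs (u, v).

```python
from collections import defaultdict

def is_flow(caps, f, s, t, tol=1e-9):
    """Check the capacity constraints (4.1) and the conservation law (4.2).
    caps and f are dictionaries keyed by arcs (u, v)."""
    # capacity constraints: 0 <= f <= b on every arc
    if any(f.get(e, 0) < -tol or f.get(e, 0) > caps[e] + tol for e in caps):
        return False
    net = defaultdict(float)                    # net outflow of every node
    for (u, v), value in f.items():
        net[u] += value
        net[v] -= value
    # conservation at every node other than the source and the sink
    return all(abs(net[x]) <= tol for x in net if x not in (s, t))
```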

Figure 4.6: Digraph D whose node-arc incidence matrix AD is stated in (4.5)

4.3 MFP is a linear program


It will be important to know that the max-flow problem is a linear program. In order to prove it, we need to
introduce a few concepts.
Let D = (V, E) be a digraph without loops, V = {v1, . . . , vm}, E = {e1, . . . , en}. The matrix AD = (aij) ∈ R^{m×n} is called the node-arc incidence matrix of D if

aij = 1 if ej is leaving vi,   aij = −1 if ej is entering vi,   aij = 0 otherwise.

For the digraph of Figure 4.6 we have


           e1   e2   e3   e4   e5
      s [   1    1    0    0    0 ]
AD =  a [  −1    0   −1    1    0 ]        (4.5)
      b [   0   −1    1    0    1 ]
      t [   0    0    0   −1   −1 ]
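Building a node-arc incidence matrix from a list of arcs takes one assignment per arc; the sketch below is an added illustration. The arc list for Figure 4.6 is inferred from the signs in (4.5) (in particular e3 is taken to be the arc from b to a), so treat it as an assumption rather than part of the notes.

```python
import numpy as np

def incidence_matrix(nodes, arcs):
    """Node-arc incidence matrix: +1 where the arc leaves the node, -1 where it enters it."""
    A = np.zeros((len(nodes), len(arcs)), dtype=int)
    row = {v: i for i, v in enumerate(nodes)}
    for j, (u, v) in enumerate(arcs):
        A[row[u], j] = 1
        A[row[v], j] = -1
    return A

# Arcs e1..e5 of Figure 4.6 as suggested by the signs in (4.5); e3 is assumed to be (b, a).
A_D = incidence_matrix(['s', 'a', 'b', 't'],
                       [('s', 'a'), ('s', 'b'), ('b', 'a'), ('a', 't'), ('b', 't')])
print(A_D)
```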
It follows from the definition of a node-arc incidence matrix that if there are no loops in D then each column of
AD contains exactly one 1 and one −1 since every arc is leaving exactly one node and entering exactly one node.
We will now derive a linear program that describes the max-flow problem in a network. So suppose a network
N = (V, E, s, t, b) is given and A is the node-arc incidence matrix of (V, E). Let f be a flow in N and V =
{x1 , . . . , xm }.
Consider the sum Σ_j aij fj for a fixed node xi:

Σ_j aij fj =   Σ_{ej is leaving xi} fj − Σ_{ej is entering xi} fj = 0,   if xi ∉ {s, t},
               Σ_{ej is leaving xi} fj = v,                              if xi = s,        (4.6)
              −Σ_{ej is entering xi} fj = v′,                            if xi = t,

where v is (at the moment) an unknown quantity. This is a system of linear equations with variables f1 , f2 , . . . , fn ,
which can be written compactly as follows:

Af = (0, . . . , v, . . . , v ′ , . . . , 0)⊤ (4.7)

with v in the row corresponding to the source and v ′ in the row corresponding to the sink. Since every column of
A contains exactly one 1 and one −1, if we add up all equations we get:

0 = v + v′ ,

22
from which we find that v ′ = −v. Hence the last equation in (4.6) now yields what was intuitively clear at the
beginning of this section, namely that the total flow value on arcs leaving the source (that is the value of the flow) is
equal to the total flow value on arcs entering the sink. Hence we can write (4.7) with all variables on the left-hand
side as follows:
Af + dv = 0,

where d is the constant vector (0, . . . , 0, −1, 0, . . . , 0, +1, 0, . . . , 0)⊤ with −1 in the row corresponding to the source, +1 in the row corresponding to the sink and all other entries 0. It remains
to recall also the capacity conditions on f :
0 ≤ f ≤ b,
and the result below now follows.
Theorem 4.1 (MFP as LP) If A is a node-arc incidence matrix of the network N = (V, E, s, t, b) then f : E →
R+ is a flow in N if and only if it satisfies the following conditions:
Af + dv = 0,  f ≤ b,  f ≥ 0.

Hence the max-flow problem in N is equivalent to the linear program

max v
s.t. Af + dv = 0,
     f ≤ b,                    (4.8)
     f ≥ 0.
It may be useful to realise the following properties (recall that m is the number of nodes and n is the number
of arcs in N ):

• The linear program (4.8) has n + 1 variables (f1, . . . , fn, v) and m + n non-trivial constraints (m equations and n inequalities).

• v can be assumed to be non-negative without loss of generality.

• f = 0 (with v = 0) is always a feasible solution to (4.8), and thus it is a flow in any network.

Corollary 4.2 A max-flow exists in every network with finite capacities of arcs leaving the source (or entering the
sink).

Proof. Since f = 0 is always a feasible flow, by the Fundamental Theorem of Linear Programming (Theorem 2.2)
it is sufficient to show that the objective function in (4.8) is bounded. This is easily seen since by the assumption
of the corollary we have

v = v(f) = Σ_{ej is leaving s} f(ej) ≤ Σ_{ej is leaving s} b(ej) < +∞,

thus showing the claim.


In what follows we will assume that (although the capacities of arcs in a network may be +∞ in general) the
capacities of all arcs leaving the source are finite.

Figure 4.7: MFP as LP

Example 4.3 Let N be the network of Figure 4.7. The corresponding linear program is

max v
s.t. f1 + f2 = v
− f3 + f5 = 0
−f1 + f3 − f4 = 0
− f2 + f4 + f6 = 0
− f5 − f6 = −v
f ≤ (1, 2, 2, 2, 3, 1)⊤
f ≥ (0, 0, 0, 0, 0, 0)⊤

In compact form (writing out the node-arc incidence matrix A and the vector d):

max v
s.t.
[  1   1   0   0   0   0 ]         [ −1 ]
[  0   0  −1   0   1   0 ]         [  0 ]
[ −1   0   1  −1   0   0 ]  f  +   [  0 ]  v  =  0,
[  0  −1   0   1   0   1 ]         [  0 ]
[  0   0   0   0  −1  −1 ]         [  1 ]

f ≤ (1, 2, 2, 2, 3, 1)⊤,
f ≥ (0, 0, 0, 0, 0, 0)⊤.
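Since (4.8) is an ordinary linear program, it can be handed to a generic LP solver. The sketch below is an added check (not part of the notes) that feeds the data above to scipy.optimize.linprog and prints the optimal flow and its value.

```python
import numpy as np
from scipy.optimize import linprog

# Incidence matrix A and vector d read off the equality constraints of Example 4.3.
A = np.array([[ 1,  1,  0,  0,  0,  0],
              [ 0,  0, -1,  0,  1,  0],
              [-1,  0,  1, -1,  0,  0],
              [ 0, -1,  0,  1,  0,  1],
              [ 0,  0,  0,  0, -1, -1]])
d = np.array([-1, 0, 0, 0, 1])
b = [1, 2, 2, 2, 3, 1]                               # arc capacities

c = np.zeros(7); c[-1] = -1                          # variables (f1, ..., f6, v); maximise v
res = linprog(c,
              A_eq=np.hstack([A, d.reshape(-1, 1)]),
              b_eq=np.zeros(5),
              bounds=[(0, cap) for cap in b] + [(0, None)],
              method="highs")
print("max-flow value:", res.x[-1], " arc flows:", res.x[:-1])
```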

4.4 The Ford and Fulkerson Algorithm


The Ford and Fulkerson Algorithm (FFA), developed in the 1950s, is historically one of the first algorithms for
solving the max-flow problem. It has been superseded in speed by other algorithms, however it is the most suitable
algorithm for the application of the primal-dual algorithm to the transportation problem in the next Chapter.
The algorithm is based on the idea of a repeated improvement (called augmentation) of flows, starting from
any flow, for instance from the zero flow. The augmentation uses so called augmenting paths. All these paths are
s − t paths and they ignore the direction of arcs they include. Hence they contain two sorts of arcs: those whose
direction is the same as that of the path (forward arcs) and those whose direction is opposite (backward arcs).
Once an augmenting path is found the current flow is augmented along this path to a flow of greater value. This is
repeated until a max-flow is reached (but see the reachability question at the end of this chapter).

Figure 4.8: Augmenting path with forward arcs only

Figure 4.9: Augmenting path with some backward arcs

We start with two examples that explain the idea in some detail. In each an augmenting path is indicated by
dotted lines. In Figure 4.8 it is the path (s, c, b, a, t) on which the capacities of the arcs exceed the current flow
values by 1, 2, 1 and 2, so it is possible to transport extra flow along this path of value min{1, 2, 1, 2} = 1. By
adding 1 to the current flow on every arc on this path we obtain a new flow whose value will be greater by 1.
In Figure 4.9 no such path exists, nevertheless it is possible to augment the current flow of value 2 along the
indicated path, which, unlike the previous path, contains a backward arc (c, b). If the flow values are increased by
1 on the forward arcs, that is on (s, b) and (c, t), and decreased by the same value on the backward arc (c, b) then
the new function will again be a flow (check the capacity constraints and the conservation law!) and its value will
be greater by 1.
The above example demonstrates the need to consider also s − t paths with backward arcs. We will now give
precise definitions of the concepts which are important in the Ford and Fulkerson algorithm.
Given a network N = (V, E, s, t, b), and a flow f , an augmenting path p is an s − t path in the undirected graph
resulting from (V, E) by ignoring arc directions, with the following properties:

1. For every forward arc (i, j) we have f (i, j) < b(i, j). That is, forward arcs are unsaturated.

2. For every backward arc (j, i) we have f (j, i) > 0. That is backward arcs have non-zero flow.

The main purpose of an augmenting path is to find a new flow whose value is greater than that of the current
flow. This can be done as follows:

• increase the flow value on every forward arc, while still satisfying the capacity constraint, and

• decrease the flow value on every backward arc, while keeping it non-negative.

Each of these changes should be by the same value so that the conservation law remains intact. For this common
value we can take any positive real number not exceeding the remaining positive capacity of any forward arc and
not exceeding the (positive) flow value on any backward arc. The maximum such value for a fixed augmenting path
p therefore is

b(i, j) − f (i, j) if (i, j) is forward,
δ = min (4.9)
(i, j) on p f (j, i) if (j, i) is backward.

Next we need to explain how to find an augmenting path in a systematic way. We may suppose that a flow in
the network is already known (if necessary we start with the zero flow). The method for finding an augmenting
path is a labelling algorithm. Labelling starts from the source, proceeds through other nodes (not necessarily all)
and stops as soon as the sink is reached. The constructed labels should provide an efficient way of identifying an
augmenting path and the augmenting value δ. Therefore each label, say label(x) for a node x, will consist of two
parts, the first giving information about the previous node from which x has been reached, and the second a value
(positive real) of extra flow that can be brought from the source to x. More precisely we define

label(x) := (L1[x], L2[x])

where L1[x] is the node from which node x has been reached and L2[x] is the amount of extra flow that can be
brought from s to x. The labelling starts with the source:

label(s) := (∅, +∞),

reflecting that s has no predecessors, and continues by propagating the labels from node to node with the following
rules:

• the label is propagated from labelled nodes to adjacent nodes (this process is called scanning) subject to precisely specified conditions, see below;

• scanning of a node must be carried out completely (that is, to all unlabelled adjacent nodes that qualify, see below) before another node is scanned;

• the labelling stops as soon as the sink is labelled;

• otherwise it continues until no node can be scanned any more.

Scanning is the process of propagating the label from a node, say x, to another node, say y, that satisfies the
following conditions:

• y is adjacent to x (that is, either (x, y) is an arc or (y, x) is an arc);

• y is unlabelled;

• if (x, y) is an arc then require f(x, y) < b(x, y). The result of the scanning is label(y) := (L1[y], L2[y]), where L1[y] = x and

  L2[y] := min{L2[x], b(x, y) − f(x, y)}.

  If f(x, y) = b(x, y) then y is not scanned in this scanning step (it might be scanned later from a different node).

Figure 4.10: Before scanning x

• if (y, x) is an arc then require f(y, x) > 0. The result of the scanning is label(y) := (L1[y], L2[y]), where L1[y] = −x and

  L2[y] := min{L2[x], f(y, x)}.

  If f(y, x) = 0 then y is not scanned in this scanning step (it might be scanned later from a different node).

In order to prevent any re-scanning of a node within an iteration we introduce a list called LIST which will help
us to keep track of the nodes that can be scanned. That is at any stage LIST is the set of nodes that have been
labelled but not scanned yet.

Example 4.4 Figure 4.10 shows a section of a network before scanning node x, Figure 4.11 shows the same section
after the scanning.

We can summarise the labelling algorithm.


Initialisation:

• LIST = {s}.

Iteration (repeated until LIST = ∅):

• Select a node x from LIST and remove x from LIST;

• Scan x and add all nodes labelled from x to LIST.

Termination conditions of the labelling algorithm:

• Either t gets labelled: in this case reconstruct an augmenting path backwards from t using the L1's and augment the current flow along the augmenting path by δ = L2[t]; that is, for every arc ej on the augmenting path the change is carried out as follows:

  fj ↦ fj′ = fj + δ, if ej is a forward arc;   fj′ = fj − δ, if ej is a backward arc.

  Note that before the next run of the labelling algorithm (with the new flow) all labels must be removed.

Figure 4.11: After scanning x

• Or LIST becomes empty: in this case the current flow is optimal (this will be proved later on). When we achieve optimality after an application of FFA to MFP, we say we are at non-breakthrough (NBT).

We are now ready to summarise the whole algorithm.

Algorithm 4.5 (Ford and Fulkerson Algorithm)


Input: A network N = (V, E, s, t, b).
Output: A maximum flow in N
Set f := 0 or equal to any other flow if known. Set N BT := F ALSE.
while N BT = F ALSE do
Delete all existing labels (if any).
LIST := {s} , label(s) := (∅, +∞).
while LIST ̸= ∅ do
Take any x from LIST and remove it from LIST .
Scan x.
Extend LIST by all nodes scanned from x.
if LIST = ∅ then
Set N BT := T RU E and break.
end if
if t is labelled then
Augment flow f along the augmenting path by δ = L2[t] and break.
end if
end while
end while
Return f .
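For readers who prefer running code, here is a small Python sketch of Algorithm 4.5 (my own illustration, not part of the notes). LIST is kept as a FIFO queue, which is one of the orders permitted by "take any x from LIST"; the sign convention L1 = −x from the text is encoded by a separate '+'/'-' tag, and the network is represented as a dictionary of arc capacities.

```python
from collections import deque

def ford_fulkerson(arcs, s, t, flow=None):
    """Sketch of Algorithm 4.5. arcs: dict {(u, v): capacity}; returns a dict of flow values."""
    f = {e: 0 for e in arcs} if flow is None else dict(flow)
    while True:
        # Labelling phase: label[x] = (L1, L2) as in the text.
        label = {s: (None, float("inf"))}
        LIST = deque([s])
        while LIST and t not in label:
            x = LIST.popleft()
            for (u, v), cap in arcs.items():
                if u == x and v not in label and f[(u, v)] < cap:        # forward arc
                    label[v] = ((x, '+'), min(label[x][1], cap - f[(u, v)]))
                    LIST.append(v)
                elif v == x and u not in label and f[(u, v)] > 0:        # backward arc
                    label[u] = ((x, '-'), min(label[x][1], f[(u, v)]))
                    LIST.append(u)
        if t not in label:                   # non-breakthrough: the current flow is maximal
            return f
        # Reconstruct the augmenting path backwards from t and augment by delta = L2[t].
        delta = label[t][1]
        y = t
        while y != s:
            x, direction = label[y][0]
            if direction == '+':
                f[(x, y)] += delta           # forward arc (x, y)
            else:
                f[(y, x)] -= delta           # backward arc (y, x)
            y = x

# Hypothetical usage on the network read off Example 4.3 (placeholder node names):
caps = {('s', 'v3'): 1, ('s', 'v4'): 2, ('v3', 'v2'): 2, ('v4', 'v3'): 2,
        ('v2', 't'): 3, ('v4', 't'): 1}
maxflow = ford_fulkerson(caps, 's', 't')
print(sum(maxflow[e] for e in maxflow if e[0] == 's'))   # value of the max-flow
```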

Example 4.6 We find a max-flow in the network of Figure 4.12 using the FFA, starting from the flow given by the free-standing numbers. The first iteration of the algorithm is recorded in the table below.

x ∈ LIST label(x)
s (∅, +∞)

b (s, 2)

c (s, 1)

t (c, 1)
δ=1
p = (s, c, t)

The new flow is in Figure 4.13 and the second iteration is:

x ∈ LIST label(x)
s (∅, +∞)

b (s, 2)

c (−b, 2)

t (c, 1)
δ=1
p = (s, b, c, t)

The new flow is in Figure 4.14 and the third iteration is:

x ∈ LIST label(x)
s (∅, +∞)

b (s, 1)

c (−b, 1)
LIST = ∅

The non-breakthrough has been reached, the current flow is maximal, its value is 4.

Next we aim to prove that the algorithm we have deduced intuitively is indeed a correct solution method for
solving the max-flow problem; then we will also discuss its finite termination.

4.5 Correctness of the Ford and Fulkerson algorithm


The key tool for proving correctness is the concept of a cut in a network. Given a network N = (V, E, s, t, b) we say that a pair of sets (W, W̄) is a cut in N (Figure 4.15) if the following conditions are satisfied:

1. V = W ∪ W̄,

Figure 4.12: Initial flow of the Network Flow Problem of Example 4.6

Figure 4.13: Flow after iteration 1 of the Network Flow Problem of Example 4.6

Figure 4.14: Flow after iteration 2 of the Network Flow Problem of Example 4.6

Figure 4.15: A cut in a network

Figure 4.16: The network of Example 4.7

2. W ∩ W̄ = ∅ and

3. s ∈ W, t ∈ W̄.

Note that the first two conditions together are usually referred to by saying that (W, W̄) is a partition of V.
The capacity of a cut (W, W̄) is:

c(W, W̄) = Σ {b(x, y) : (x, y) ∈ E, x ∈ W, y ∈ W̄}.

A cut whose capacity is least is called a minimal cut (min-cut).

Example 4.7 We list all cuts and their capacities in the network of Figure 4.16.

W            W̄            c(W, W̄)
{s}          {a, b, t}     3 + 6 = 9
{s, a}       {b, t}        6 + 6 + 8 = 20
{s, b}       {a, t}        3 + 3 + 2 = 8
{s, a, b}    {t}           8 + 2 = 10

The cut ({s, b}, {a, t}) is a min-cut.
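Cut capacities of a small network can be enumerated by brute force over all sets W with s ∈ W and t ∉ W. The sketch below is an added illustration (the capacity data of Figure 4.16 is not reproduced in this text, so the usage line shows the network taken from Example 4.3 instead, with placeholder node names); by the Max-Flow Min-Cut Theorem proved below, the capacity returned should match the max-flow value of that network.

```python
from itertools import combinations

def min_cut(nodes, arcs, s, t):
    """Brute-force minimal cut; arcs is a dict {(u, v): capacity}."""
    inner = [x for x in nodes if x not in (s, t)]
    best = (float("inf"), None)
    for k in range(len(inner) + 1):
        for subset in combinations(inner, k):
            W = {s, *subset}                      # any W with s in W and t not in W
            cap = sum(c for (u, v), c in arcs.items() if u in W and v not in W)
            best = min(best, (cap, tuple(sorted(W))))
    return best

# Hypothetical usage on the network read off Example 4.3 (placeholder node names):
caps = {('s', 'v3'): 1, ('s', 'v4'): 2, ('v3', 'v2'): 2, ('v4', 'v3'): 2,
        ('v2', 't'): 3, ('v4', 't'): 1}
print(min_cut({'s', 'v2', 'v3', 'v4', 't'}, caps, 's', 't'))   # (min-cut capacity, W)
```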

Lemma 4.8 Let N = (V, E, s, t, b) be a network. The following relation holds for any cut (W, W̄) and flow f in N:

v = Σ_{x∈W, y∈W̄, (x,y)∈E} f(x, y) − Σ_{x∈W̄, y∈W, (x,y)∈E} f(x, y).      (4.10)

Proof. Let (W, W̄) be a cut and f a flow in N = (V, E, s, t, b). Let v = v(f); then by the conservation law at every node x we have:

Σ_{y : (x,y)∈E} f(x, y) − Σ_{y : (y,x)∈E} f(y, x) = 0,  if x ∈ V \ {s, t};
Σ_{y : (s,y)∈E} f(s, y) = v,  if x = s;
−Σ_{y : (y,t)∈E} f(y, t) = −v,  if x = t.

Let us add up all equations in the above system that correspond to x ∈ W. Since s ∈ W and t ∉ W we get (writing the sum of the right-hand sides first):

v = Σ_{x∈W} Σ_{y : (x,y)∈E} f(x, y) − Σ_{x∈W} Σ_{y : (y,x)∈E} f(y, x).

We can expand and then simplify this expression using the fact that (W, W̄) is a partition of V:

v = Σ_{x∈W, y∈W, (x,y)∈E} f(x, y) + Σ_{x∈W, y∈W̄, (x,y)∈E} f(x, y) − Σ_{x∈W, y∈W, (y,x)∈E} f(y, x) − Σ_{x∈W, y∈W̄, (y,x)∈E} f(y, x).

By swapping the symbols x and y in the third term and using the commutative law we see that the first and third terms are equal and can therefore be omitted. By the same swap in the fourth term and again using the commutative law we get exactly the expression in the lemma statement.
We can illustrate this lemma on the previous example, where we calculate the two terms on the right-hand side
of (4.10) and compare their difference with the value of the current flow v(f ) = 7:
W            W̄            Σ f(x, y) over x ∈ W, y ∈ W̄, (x, y) ∈ E      Σ f(x, y) over x ∈ W̄, y ∈ W, (x, y) ∈ E
{s}          {a, b, t}     7                                              0
{s, a}       {b, t}        10                                             3
{s, b}       {a, t}        7                                              0
{s, a, b}    {t}           7                                              0

Theorem 4.9 (Max-Flow, Min-Cut Theorem) Let N = (V, E, s, t, b) be a network. Then

(a) the relation v(f) ≤ c(W, W̄) holds for every flow f and every cut (W, W̄) in N, and

(b) the value of any max-flow is equal to the capacity of any min-cut.

Proof. (a) Let (W, W̄) be a cut and f a flow in N = (V, E, s, t, b). By Lemma 4.8 and using the capacity constraints we have:

v(f) = Σ_{x∈W, y∈W̄, (x,y)∈E} f(x, y) − Σ_{x∈W̄, y∈W, (x,y)∈E} f(x, y)
     ≤ Σ_{x∈W, y∈W̄, (x,y)∈E} b(x, y) − 0
     = c(W, W̄).

(b) Let f∗ be a max-flow in N (it exists by Corollary 4.2) and let us start the Ford and Fulkerson algorithm with f∗ as the initial flow. Then the algorithm will stop in the first iteration with a non-breakthrough. Let us denote

W∗ = {x ∈ V : x is labelled at NBT},    W̄∗ = V \ W∗.

Then s ∈ W∗, t ∈ W̄∗ and so (W∗, W̄∗) is a cut in N. By Lemma 4.8 and the rules of the Ford and Fulkerson algorithm (see Figure 4.17) we have:

v(f∗) = Σ_{x∈W∗, y∈W̄∗, (x,y)∈E} f∗(x, y) − Σ_{x∈W̄∗, y∈W∗, (x,y)∈E} f∗(x, y)
      = Σ_{x∈W∗, y∈W̄∗, (x,y)∈E} b(x, y) − 0
      = c(W∗, W̄∗).

It remains to show that c(W∗, W̄∗) is indeed the smallest cut capacity. Let (W, W̄) be any cut. Then by (a) we have:

c(W, W̄) ≥ v(f∗) = c(W∗, W̄∗)

and the statement now follows.

This theorem can be illustrated in Example 4.6, where at non-breakthrough W∗ = {s, b, c} and c(W∗, W̄∗) = 4 = v(f∗).
We are now ready to prove correctness of the Ford and Fulkerson algorithm.

Theorem 4.10 (Ford and Fulkerson, 1956) If the Ford and Fulkerson algorithm terminates it does so at a
max-flow.

Proof. Let f∗ be the flow reached at termination, thus a non-breakthrough has occurred. (Note that the next few lines repeat the corresponding part of the previous proof, but not its end.)
Let us denote

W∗ = {x ∈ V : x is labelled at NBT},    W̄∗ = V \ W∗.

Then s ∈ W∗, t ∈ W̄∗ and so (W∗, W̄∗) is a cut in N. By Lemma 4.8 and the rules of the Ford and Fulkerson
Figure 4.17: A non-breakthrough in the FFA

algorithm (see Figure 4.17) we have:

v(f∗) = Σ_{x∈W∗, y∈W̄∗, (x,y)∈E} f∗(x, y) − Σ_{x∈W̄∗, y∈W∗, (x,y)∈E} f∗(x, y)
      = Σ_{x∈W∗, y∈W̄∗, (x,y)∈E} b(x, y) − 0
      = c(W∗, W̄∗).

So by Part (a) of the Max-Flow Min-Cut Theorem (Theorem 4.9) we now have for any flow f and any cut (W, W̄) in N:

c(W, W̄) ≥ v(f∗) = c(W∗, W̄∗) ≥ v(f),

which completes the proof.


This theorem is a key result in the theory of network flows, however it contains the assumption of termination.
We will now discuss this question as the final point on flows in networks.

4.6 Finite termination of the Ford and Fulkerson algorithm


It may come as a surprise that the algorithm may not always finitely terminate. Fortunately such cases are not very
typical in applications as they may only happen when the capacities are irrational. One such example is presented
in [1]. That example shows that even if the sequence of flows constructed during the run of the Ford and Fulkerson
algorithm converges, the limit may not be a max-flow!
We will now show that if all capacities in a network are rational then the algorithm finds a max-flow in a finite
number of steps. We start with the case when all capacities are integer and the general rational case can then be
deduced as a corollary.

Theorem 4.11 Let N = (V, E, s, t, b) be a network, where b is an integer function on E. Then the Ford and
Fulkerson algorithm will find an integer max-flow, and therefore terminate, after a finite number of steps if it starts
with an integer flow.

Proof. Suppose that the algorithm starts to run from an integer flow f. The increment δ is calculated by (4.9) as the minimum of positive expressions of the form b(i, j) − f(i, j) and f(j, i). Each of these is an integer since both f and b are integer functions. Hence δ is a positive integer and therefore δ ≥ 1. If the flow after augmentation is
denoted as f ′ then v(f ′ ) = v(f ) + δ ≥ v(f ) + 1. So after k iterations (augmentations) the flow increases by at least

k. At the same time a max-flow exists by Corollary 4.2. If its value is v ∗ then it follows that the algorithm will
reach a max-flow (which will obviously be integer) and therefore terminate after at most v ∗ iterations.
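The integrality argument above can be made concrete in code. Below is a minimal sketch (not part of the notes; the dictionary-based graph representation and function names are our own assumptions) of the Ford and Fulkerson scheme with integer capacities: the breadth-first labelling either reaches t (breakthrough) or stops (non-breakthrough), and each augmentation increases the flow value by an integer δ ≥ 1, so the loop stops after at most v∗ iterations.

```python
from collections import deque

def ford_fulkerson(capacity, s, t):
    """Max-flow value for integer capacities capacity[u][v] (dict of dicts)."""
    # Residual capacities; the flow on (u, v) is capacity[u][v] - residual[u][v].
    residual = {u: dict(adj) for u, adj in capacity.items()}
    for u, adj in capacity.items():
        for v in adj:
            residual.setdefault(v, {}).setdefault(u, 0)   # backward (residual) arcs

    def augmenting_path():
        """Breadth-first labelling from s; returns the label tree on breakthrough."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:             # label v
                    parent[v] = u
                    if v == t:
                        return parent                     # breakthrough
                    queue.append(v)
        return None                                       # non-breakthrough

    value = 0
    while (parent := augmenting_path()) is not None:
        # delta = minimum residual capacity along the path, an integer >= 1
        delta, v = float('inf'), t
        while parent[v] is not None:
            u = parent[v]
            delta = min(delta, residual[u][v])
            v = u
        v = t
        while parent[v] is not None:                      # augment along the path
            u = parent[v]
            residual[u][v] -= delta
            residual[v][u] += delta
            v = u
        value += delta
    return value

# A small integer-capacity example: the max-flow value is 3.
cap = {'s': {'a': 2, 'b': 3}, 'a': {'t': 3}, 'b': {'t': 1}, 't': {}}
print(ford_fulkerson(cap, 's', 't'))   # 3
```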

Corollary 4.12 Let N = (V, E, s, t, b) be a network, where b is a rational function on E. Then the max-flow
problem on N can be solved by the Ford and Fulkerson algorithm in a finite number of steps.

Proof. We show that the rational case can easily be converted to the integer case, which then can be solved by the
Ford and Fulkerson algorithm in a finite number of steps due to the theorem. Suppose that

$$b(i,j) = \frac{p(i,j)}{q(i,j)}, \qquad \text{for every } (i,j) \in E,$$

where $p(i,j)$ and $q(i,j)$ are integers, $q(i,j) > 0$. Let $D = \operatorname{lcm}_{(i,j)\in E} q(i,j)$. Then the max-flow problem

max v
s.t. Af + dv = 0
f ≤b
f ≥0

is equivalent to

max v ′
s.t. Af ′ + dv ′ = 0
f ′ ≤ b′
f′ ≥ 0

where $f' = Df$, $v' = Dv$, $b' = Db$. This is a max-flow problem in which the capacity function $b'$ is integer, so by the theorem it can be solved in a finite number of steps. If the max-flow of the second problem is $f'$ then a max-flow $f$ of the original problem is
$$f = \frac{f'}{D}.$$
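As a small aside, the rescaling used in this proof is mechanical; the following sketch (illustrative only, Python 3.9+ for math.lcm, with made-up sample capacities) computes D as the least common multiple of the denominators and produces the integer capacities b′ = Db.

```python
from fractions import Fraction
from math import lcm

# Rational capacities b(i, j) given as Fractions on the arcs of E (sample data).
b = {('s', 'a'): Fraction(3, 4), ('a', 't'): Fraction(5, 6)}

# D = lcm of all denominators q(i, j); then b' = D * b is an integer function.
D = lcm(*(c.denominator for c in b.values()))
b_int = {arc: int(c * D) for arc, c in b.items()}

print(D)       # 12
print(b_int)   # {('s', 'a'): 9, ('a', 't'): 10}
# Solve the max-flow problem with capacities b_int; if f' is its max-flow,
# then f = f'/D is a max-flow for the original rational capacities.
```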

Note that a simple modification of the algorithm just presented guarantees finite termination for networks with arbitrary real capacities (Karzanov's algorithm [1]). It is avoided here as we need the basic version for the application of the primal-dual algorithm.
After the discovery of the Ford and Fulkerson algorithm many other algorithms for the max-flow problem have been found. There has been a tireless effort to improve the speed of these algorithms, measured by the number of operations to termination in the worst case (called computational complexity; see Chapter 7). Among the fastest is that of Goldberg and Tarjan (1988), whose computational complexity is $O(m^2 \cdot n)$, where as before $m = |V|$ and $n = |E|$.
It should be noted that this section provides a very basic introduction to the theory of max-flows. There are
many modifications that have not been presented here, for instance the case when the capacities have a general
lower bound rather than zero or the case of multi-source, multi-sink networks. The monograph [4] is recommended
for further reading.

Chapter 5

Transportation problem

5.1 Basic properties of the transportation problem


Transportation problems are at the centre of attention in combinatorial optimisation. They are all concerned with
finding the cheapest way of transporting objects between senders and recipients. We will discuss perhaps the most
classical of them, also called the Hitchcock problem (1941). We will use the classical terminology and will speak
about products, producers and customers. In this basic transportation problem the task is to transport one fixed
commodity which is assumed to be arbitrarily divisible (such as flour or oil).
More precisely (see Figure 5.1), we suppose that the following are given:

• Producers $A_1, \dots, A_m$ with capacities $a_1, \dots, a_m$.

• Customers $B_1, \dots, B_n$ with demands $b_1, \dots, b_n$.

• Unit transportation costs $c_{ij}$ (from $A_i$ to $B_j$) for every $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$.

Figure 5.1: Transportation problem for 5 producers and 5 customers (producers $A_1,\dots,A_5$ with capacities $a_1,\dots,a_5$ on the left, customers $B_1,\dots,B_5$ with demands $b_1,\dots,b_5$ on the right; the arc from $A_2$ to $B_4$ is labelled with its unit cost $c_{24}$ and the unknown amount $x_{24}$)

And the task is to find the amount of product xij to be transported from Ai to Bj for every i ∈ {1, . . . , m} and
j ∈ {1, . . . , n} (transportation plan) which minimises the total cost of the transport.
It will be practical to store the input data in a table called the transportation table. It has the following form:

              ···      bj      ···
   ⋮                    ⋮
   ai         ···     cij      ···
   ⋮                    ⋮

This table will also be used for making a record of the work of the solution method and for writing the output,
where the quantities xij (initial, current and final) replace cij . Hence the correspondence between the graph of
Figure 5.1 and such a table is as follows:

Table Graph
Cell (i, j) Arc Ai Bj
Row i Producer Ai
Column j Customer Bj

Example 5.1 Powerco has 3 electrical power plants that supply power for 4 cities. The table below shows capacities
of the plants in $10^6$ kWh, peak power demands of the cities in $10^6$ kWh and the costs (in GBP 1,000) of sending $10^6$ kWh from the plants to the cities. The task is to find the distribution of the electricity minimising the cost and
meeting each city’s peak power demand.

Demands → 45 20 30 30
↓ Capacities
35 8 6 10 9
50 9 12 13 7
40 14 9 16 5

Clearly the task is to enter real non-negative numbers in the table

45 20 30 30
35
50
40

so that their row and column sums are equal to the values given in the row/column headings. In the absence of any
solution method we could find a feasible solution using a greedy approach, that is repeatedly using a most attractive,
feasible link between the power stations and cities until all power is distributed. The cheapest link is between A3
and B4 , so we may suggest to transport the maximum amount (which is 30) via this link. This will reduce the
capacity of A3 and demand of B4 by 30, so our situation before the next step may be shown as follows:

45 20 30 0
35 ×
50 ×
10 30

where × indicates that the link is unavailable for the next steps. Now the cheapest available link is A1 B2 and the
maximum we can transport via this link is 20. If we continue in this way, we obtain the following transportation
plan (the original capacities and demands have been reinstated):

45 20 30 30
35 15 20 × ×
50 30 × 20 ×
40 × × 10 30

The cost of this plan is 15 × 8 + 20 × 6 + 30 × 9 + 20 × 13 + 10 × 16 + 30 × 5 = 1,080 (that is, GBP 1,080,000). However attractive this plan may seem to be, we will see later in this chapter that it is not the cheapest. Note that later in this
course we will discuss the greedy approach in more detail.
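The greedy construction just described is easy to automate. The sketch below is purely illustrative (the function and variable names are our own); run on the Powerco data it reproduces the plan obtained above, with ties broken by row and column order.

```python
def greedy_transportation(cost, supply, demand):
    """Greedy (generally non-optimal) feasible plan for a balanced problem."""
    a, b = supply[:], demand[:]
    m, n = len(a), len(b)
    plan = [[0] * n for _ in range(m)]
    # Scan cells from cheapest to most expensive and ship as much as possible.
    for c, i, j in sorted((cost[i][j], i, j) for i in range(m) for j in range(n)):
        amount = min(a[i], b[j])
        plan[i][j] = amount
        a[i] -= amount
        b[j] -= amount
    return plan

cost = [[8, 6, 10, 9], [9, 12, 13, 7], [14, 9, 16, 5]]
plan = greedy_transportation(cost, [35, 50, 40], [45, 20, 30, 30])
total = sum(cost[i][j] * plan[i][j] for i in range(3) for j in range(4))
print(plan)    # [[15, 20, 0, 0], [30, 0, 20, 0], [0, 0, 10, 30]]
print(total)   # 1080
```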

The rest of this chapter deals with developing a solution method for which we will prove that it always produces
an optimal solution to the transportation problem. It is called the Alphabeta algorithm (ALPHABETA) and it
is the result of the application of the Primal-Dual Algorithm (Algorithm 3.4) to the transportation problem. In
order to do this we first need to formulate the transportation problem as a linear program. We will do this using
the terminology and symbols introduced at the beginning of this chapter. Since cij is the unit cost of transport
between Ai and Bj and xij is the (initially unknown) amount of units transported between Ai and Bj , the cost of
the transport from Ai to Bj is cij xij . Hence the total cost of the transportation plan is
$$\sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij}.$$

The transportation problem is the following linear program:

$$\begin{aligned}
\min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} \\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ij} = a_i, \quad i \in \{1, \dots, m\} \\
& \sum_{i=1}^{m} x_{ij} = b_j, \quad j \in \{1, \dots, n\} \\
& x_{ij} \ge 0, \quad i \in \{1, \dots, m\},\ j \in \{1, \dots, n\}
\end{aligned} \tag{5.1}$$

Here we can obviously assume without loss of generality that $a_i > 0$ for $i \in \{1, \dots, m\}$ and $b_j > 0$ for $j \in \{1, \dots, n\}$. The matrix of the system in (5.1) is a sparse 0–1 matrix, that is, all its entries are zeros and ones, but there are relatively “few” ones. This is illustrated in the following example, from which one can also see the very special structure of this matrix, sometimes also called the transportation matrix.

Example 5.2 We write the transportation problem and then its matrix (with indicated structure) when m = 2 and
n = 3:
c11 x11 + c12 x12 + c13 x13 + c21 x21 + c22 x22 + c23 x23 → min
x11 + x12 + x13 = a1
x21 + x22 + x23 = a2
x11 + x21 = b1
x12 + x22 = b2
x13 + x23 = b3
xij ≥ 0 ∀i, j

The matrix of the system is:
$$\begin{pmatrix}
1 & 1 & 1 &   &   &   \\
  &   &   & 1 & 1 & 1 \\
1 &   &   & 1 &   &   \\
  & 1 &   &   & 1 &   \\
  &   & 1 &   &   & 1
\end{pmatrix}$$
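For any m and n the transportation matrix can be generated mechanically. The following sketch (an illustration using NumPy, with variables ordered x11, ..., x1n, x21, ..., xmn as in the example) reproduces the matrix above for m = 2, n = 3.

```python
import numpy as np

def transportation_matrix(m, n):
    """Constraint matrix of (5.1): m supply rows followed by n demand rows."""
    A = np.zeros((m + n, m * n), dtype=int)
    for i in range(m):
        for j in range(n):
            col = i * n + j
            A[i, col] = 1        # row i:     sum_j x_ij = a_i
            A[m + j, col] = 1    # row m + j: sum_i x_ij = b_j
    return A

print(transportation_matrix(2, 3))
# [[1 1 1 0 0 0]
#  [0 0 0 1 1 1]
#  [1 0 0 1 0 0]
#  [0 1 0 0 1 0]
#  [0 0 1 0 0 1]]
```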
Before we develop a solution method for transportation problems we need to clarify basic questions of existence
of feasible and optimal solutions.
Theorem 5.3 If a transportation problem has a feasible solution then it also has an optimal solution.

Proof. Every transportation problem is a linear program for which there are only three possibilities by the Funda-
mental Theorem of Linear Programming (Theorem 2.2). Since feasibility is assumed we only need to eliminate the
possibility that the problem is unbounded. This is easily seen from the following:
$$\sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} \ \ge\ -\sum_{i=1}^{m}\sum_{j=1}^{n} |c_{ij}|\cdot|x_{ij}| \ =\ -\sum_{i=1}^{m}\sum_{j=1}^{n} |c_{ij}|\cdot x_{ij} \ \ge\ -\sum_{i=1}^{m}\sum_{j=1}^{n} |c_{ij}|\cdot a_i,$$

since the last expression is a constant (real number).


Theorem 5.4 The transportation problem (5.1) has a feasible solution if and only if it is balanced, that is if
$$\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j. \tag{5.2}$$

Proof. Let (xij ) be a feasible solution to (5.1). Let us add up separately the first subsystem and then the second
subsystem. Using the commutative and associative laws we get
$$\sum_{i=1}^{m} a_i = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} = \sum_{j=1}^{n}\sum_{i=1}^{m} x_{ij} = \sum_{j=1}^{n} b_j.$$

On the other hand suppose that (5.2) is satisfied and define


$$x_{ij} = \frac{a_i b_j}{d}, \qquad i \in \{1, \dots, m\},\ j \in \{1, \dots, n\},$$
where d is the common value in (5.2). Then xij > 0 and for every i ∈ {1, . . . , m} we have
$$\sum_{j=1}^{n} x_{ij} = \sum_{j=1}^{n} \frac{a_i b_j}{d} = \frac{a_i}{d}\sum_{j=1}^{n} b_j = a_i$$

and similarly for every j ∈ {1, . . . , n}


$$\sum_{i=1}^{m} x_{ij} = b_j.$$

If (5.2) is not satisfied then it is possible to deal with a closely related balanced transportation problem by introducing a dummy producer (if $\sum_{i=1}^{m} a_i < \sum_{j=1}^{n} b_j$) or a dummy customer (if $\sum_{i=1}^{m} a_i > \sum_{j=1}^{n} b_j$).
Details can be found in [1]. This discussion is skipped here as we will focus on the use of the primal-dual algorithm
for the construction of ALPHABETA. So everywhere in what follows we will assume that the transportation problem
is balanced and thus it has both a feasible and optimal solution.
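As a small illustration (not part of the formal development), the balancing step and the proportional feasible solution xij = ai bj /d from the proof of Theorem 5.4 can be coded as follows; giving the dummy row or column zero costs is a common convention that we assume here.

```python
def balance(cost, a, b):
    """Add a dummy producer or customer (with zero costs) if needed."""
    cost, a, b = [row[:] for row in cost], a[:], b[:]
    diff = sum(a) - sum(b)
    if diff > 0:                      # excess supply: add a dummy customer
        b.append(diff)
        for row in cost:
            row.append(0)
    elif diff < 0:                    # excess demand: add a dummy producer
        a.append(-diff)
        cost.append([0] * len(b))
    return cost, a, b

def proportional_feasible(a, b):
    """Feasible solution x_ij = a_i * b_j / d of a balanced problem (Theorem 5.4)."""
    d = sum(a)
    return [[ai * bj / d for bj in b] for ai in a]

x = proportional_feasible([35, 50, 40], [45, 20, 30, 30])
print([round(sum(row), 6) for row in x])         # row sums:    [35.0, 50.0, 40.0]
print([round(sum(col), 6) for col in zip(*x)])   # column sums: [45.0, 20.0, 30.0, 30.0]
```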

5.2 An application of the primal-dual algorithm: algorithm ALPHA–BETA
The primal-dual algorithm starts with a linear program in standard form (we referred to this problem as (P )). In
the present case this problem is (5.1), which we repeat here for ease of reference:
$$\begin{aligned}
\min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} \\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ij} = a_i, \quad i \in \{1, \dots, m\} \\
& \sum_{i=1}^{m} x_{ij} = b_j, \quad j \in \{1, \dots, n\} \\
& x_{ij} \ge 0, \quad i \in \{1, \dots, m\},\ j \in \{1, \dots, n\}
\end{aligned} \tag{P}$$

Note that the right-hand sides are positive without loss of generality so that the first assumption of the PDA is
satisfied. Next we need to construct the dual. For a better understanding we find the dual first in an example.
Example 5.5 We find the dual to the problem discussed in Example 5.2. Recall that the dual variables are assigned
to the primal constraints. In our case (see Example 5.2) there are two types of constraints (first two equations and
the remaining three equations) and therefore there will be two types of dual variables, we will call them α1 , α2 and
β1 , β2 , β3 . The first dual constraint is assigned to the first column in the primal problem (corresponding to x11 )
which has only two ones and all other entries are zero. The two ones are in the rows assigned to α1 and β1 , so the
LHS will just be α1 + β1 . This constraint will be an inequality since x11 ≥ 0 and the RHS is the coefficient at x11
in the primal objective function, that is c11 . Thus the first constraint is α1 + β1 ≤ c11 . All dual variables are free as
they are all assigned to equations and the coefficients of the dual objective function are taken from the right-hand
sides of the primal problem. Hence the dual is
a1 α1 + a2 α2 + b1 β1 + b2 β2 + b3 β3 → max
α1 + β1 ≤ c11
α1 + β2 ≤ c12
α1 + β3 ≤ c13
α2 + β1 ≤ c21
α2 + β2 ≤ c22
α2 + β3 ≤ c23
It should be clear from this example that the dual to (P ) is
$$\begin{aligned}
\max\ & w = \sum_{i=1}^{m} a_i\alpha_i + \sum_{j=1}^{n} b_j\beta_j \\
\text{s.t.}\ & \alpha_i + \beta_j \le c_{ij}, \quad i \in \{1, \dots, m\},\ j \in \{1, \dots, n\}
\end{aligned} \tag{D}$$

We know that the primal-dual algorithm can only be used if a dual feasible solution is known or can be easily
found. This is satisfied here as for a dual feasible solution we can always take

$$\alpha_i = 0,\quad i \in \{1, \dots, m\}, \qquad\qquad \beta_j = \min_{i \in \{1, \dots, m\}} c_{ij},\quad j \in \{1, \dots, n\}.$$

It will be convenient to denote I = {1, . . . , m} and J = {1, . . . , n}. We will usually shorten the reference to Ai
to just i, similarly Bj to j. The construction of the dual can graphically be interpreted in Figure 5.2 by assigning
variables αi to every i ∈ I and βj to every j ∈ J so that αi + βj ≤ cij for (i, j).

Figure 5.2: Construction of the dual of the transportation problem (P): a variable $\alpha_i$ is attached to every producer node $i$ and a variable $\beta_j$ to every customer node $j$, with the constraint $\alpha_i + \beta_j \le c_{ij}$ for the arc $(i,j)$

The next step in the primal-dual algorithm is the construction of the set of admissible indices and the restricted
primal problem. The set of admissible indices clearly is

IJ = {(i, j) ∈ I × J | αi + βj = cij },

where the double letter IJ (rather than just J) indicates that these indices are two-dimensional, the first being
from I and the second from J. Hence the restricted primal is

$$\begin{aligned}
\min\ & \xi = \sum_{i=1}^{m+n} x_i^a \\
\text{s.t.}\ & \sum_{j:(i,j)\in IJ} x_{ij} + x_i^a = a_i, \quad i \in \{1, \dots, m\} \\
& \sum_{i:(i,j)\in IJ} x_{ij} + x_{m+j}^a = b_j, \quad j \in \{1, \dots, n\} \\
& x_i^a \ge 0, \quad i \in \{1, \dots, m+n\} \\
& x_{ij} \ge 0, \quad (i,j) \in IJ
\end{aligned} \tag{RP}$$

In the next step of the primal-dual algorithm we need to find out whether $\xi^{\min}$ is positive or zero; note that if $\xi^{\min} > 0$ then we are not actually interested in an optimal solution of (RP). Before (RP) is solved we simplify it to a problem that will be denoted (RP′), which will turn out to be a max-flow problem; we will also show a simple link between optimal solutions to (RP) and (RP′).
The simplification of (RP) is based on the elimination of the auxiliary variables $x_i^a$ and $x_{m+j}^a$. They can easily be eliminated from the equations in (RP) by omitting them and replacing the equations by inequalities. However, they also appear in the objective function. To eliminate them from the objective function we add up all equations in (RP) and get:
$$\sum_{i=1}^{m}\sum_{j:(i,j)\in IJ} x_{ij} + \sum_{j=1}^{n}\sum_{i:(i,j)\in IJ} x_{ij} + \sum_{i=1}^{m} x_i^a + \sum_{j=1}^{n} x_{m+j}^a = \sum_{i=1}^{m} a_i + \sum_{j=1}^{n} b_j.$$
The first two terms are the same and each can be written as $\sum_{(i,j)\in IJ} x_{ij}$; the third and fourth terms together give $\xi$, hence we have
$$\xi = \sum_{i=1}^{m} a_i + \sum_{j=1}^{n} b_j - 2\sum_{(i,j)\in IJ} x_{ij}. \tag{5.3}$$
Since $\sum_{i=1}^{m} a_i + \sum_{j=1}^{n} b_j$ is a constant, the minimisation of $\xi$ is equivalent to the maximisation of $\sum_{(i,j)\in IJ} x_{ij}$, so

Figure 5.3: (RP′) as a max-flow problem — the source $s$ is joined to every producer node $A_i$ by an arc of capacity $a_i$, every customer node $B_j$ is joined to the sink $t$ by an arc of capacity $b_j$, and $A_i$ is joined to $B_j$ (with capacity $+\infty$) if and only if $(i,j)$ is admissible

the problem (RP′) after the elimination of the $x_i^a$, which is equivalent to (RP), is:
$$\begin{aligned}
\max\ & g(x) = \sum_{(i,j)\in IJ} x_{ij} \\
\text{s.t.}\ & \sum_{j:(i,j)\in IJ} x_{ij} \le a_i, \quad i \in \{1, \dots, m\} \\
& \sum_{i:(i,j)\in IJ} x_{ij} \le b_j, \quad j \in \{1, \dots, n\} \\
& x_{ij} \ge 0, \quad (i,j) \in IJ
\end{aligned} \tag{RP'}$$

We will now show that (RP ′ ) is a max-flow problem. The underlying network is shown in Figure 5.3. Note that
the arcs between Ai and Bj are included if and only if they are admissible; we will set their capacities to +∞.

Lemma 5.6 (a) If $f$ is a flow in the network $N$ of Figure 5.3 and if
$$x_{ij} = f(i,j) \quad \text{for all } (i,j) \in IJ, \tag{5.4}$$
then $x = (x_{ij}) \in M_{RP'}$ and $g(x) = v(f)$.

(b) If $x = (x_{ij}) \in M_{RP'}$ and $f(i,j) = x_{ij}$ for $(i,j) \in IJ$, $f(s,i) = \sum_{j:(i,j)\in IJ} x_{ij}$ for $i \in I$, $f(j,t) = \sum_{i:(i,j)\in IJ} x_{ij}$ for $j \in J$, then $f$ is a flow in $N$ and $v(f) = g(x)$.

Proof.

(a) Let $f$ be a flow in $N$ and $i \in I$; then by the conservation law at node $i$ we have
$$\sum_{j:(i,j)\in IJ} x_{ij} = \sum_{j:(i,j)\in IJ} f(i,j) = f(s,i) \le a_i.$$
Similarly it can be shown that $\sum_{i:(i,j)\in IJ} x_{ij} \le b_j$ for any $j \in J$. Hence $x$ defined by (5.4) is a feasible solution of (RP′) and
$$g(x) = \sum_{(i,j)\in IJ} x_{ij} = \sum_{(i,j)\in IJ} f(i,j) = \sum_{i \in I}\left(\sum_{j:(i,j)\in IJ} f(i,j)\right) = \sum_{i \in I} f(s,i) = v(f).$$

(b) Let $f$ be defined on the set of arcs of $N$ as described in the lemma statement. In order to prove that $f$ is a flow we need to verify the capacity constraints on all arcs and the conservation law at all nodes. The capacities of the $(i,j)$ arcs are $+\infty$ so nothing needs to be done there. If $i \in I$ then $f(s,i) = \sum_{j:(i,j)\in IJ} x_{ij} \le a_i$, and if $j \in J$ then $f(j,t) = \sum_{i:(i,j)\in IJ} x_{ij} \le b_j$, which confirms that all capacity constraints are satisfied. If $i \in I$ then $f(s,i) = \sum_{j:(i,j)\in IJ} x_{ij} = \sum_{j:(i,j)\in IJ} f(i,j)$, which confirms the conservation law at $i$. Similarly, $f(j,t) = \sum_{i:(i,j)\in IJ} x_{ij} = \sum_{i:(i,j)\in IJ} f(i,j)$ verifies this law for any $j \in J$. Finally, $v(f) = \sum_{i \in I} f(s,i) = \sum_{i \in I}\sum_{j:(i,j)\in IJ} x_{ij} = g(x)$.

Lemma 5.6 enables us to easily check whether $\xi^{\min}$ is positive or zero: since the transportation problem is balanced, by (5.3) and using this lemma we have
$$\xi = 2\sum_{i \in I} a_i - 2g(x) = 2\sum_{i \in I} a_i - 2v(f).$$
Hence
$$\xi = 0 \iff v(f) = \sum_{i \in I} a_i.$$
Since $f(s,i) \le a_i$ for every $i \in I$ and $v(f) = \sum_{i \in I} f(s,i)$, we have equivalently
$$\xi = 0 \iff f(s,i) = a_i \text{ for every } i \in I,$$
that is, $\xi = 0$ if and only if the current flow saturates all arcs leaving the source.
So we can summarise how to solve (RP ): We construct the network of Figure 5.3 and find a max-flow using the
Ford and Fulkerson algorithm. At the end (that is at non-breakthrough) there are two possibilities:

1. The max-flow f saturates all arcs leaving the source, that is f (s, i) = ai for every i ∈ I. In this case an
optimal solution x = (xij ) to the transportation problem has been found and can be described as follows:

$$x_{ij} = \begin{cases} f(i,j), & \text{if } (i,j) \in IJ;\\ 0, & \text{otherwise.} \end{cases} \tag{5.5}$$

2. The max-flow $f$ does not saturate all arcs leaving the source, that is $f(s,i) < a_i$ for some $i \in I$. In this case the primal-dual algorithm continues by finding an optimal solution $(\overline{\alpha}, \overline{\beta})$ to the dual of the restricted primal (DRP) and then amends the current $(\alpha, \beta)$ using $(\overline{\alpha}, \overline{\beta})$.

So the next question we need to deal with is that of finding $(\overline{\alpha}, \overline{\beta})$. To do this we define
$$I^* = \{i \in I \mid \text{node } i \text{ is labelled at non-breakthrough}\} \quad\text{and}\quad J^* = \{j \in J \mid \text{node } j \text{ is labelled at non-breakthrough}\}.$$

Figure 5.4: At non-breakthrough some arcs do not exist

Lemma 5.7 If the Ford and Fulkerson algorithm is applied to the network of Figure 5.3 then the implication

i ∈ I∗ ⇒ j ∈ J∗

holds at non-breakthrough for every (i, j) ∈ IJ. (see Figure 5.4).

Proof. The capacity of every arc (i, j) ∈ IJ is +∞. Since i ∈ I ∗ , node i was scanned during the run of the labelling
algorithm in the last iteration and since f (i, j) is strictly less than the capacity of (i, j) the node j was labelled
during the scan of i or already was labelled. In any case j ∈ J ∗ .
We are now ready to prove the key statement for ALPHABETA.

Theorem 5.8 If the Ford and Fulkerson algorithm is applied to the network of Figure 5.3 then at non-breakthrough an optimal solution to DRP is given by $(\overline{\alpha}_i, \overline{\beta}_j)$, where
$$\overline{\alpha}_i = \begin{cases} 1, & i \in I^*\\ -1, & i \notin I^* \end{cases} \qquad\text{and}\qquad \overline{\beta}_j = \begin{cases} -1, & j \in J^*\\ 1, & j \notin J^*. \end{cases}$$

Proof.

(i) First we show that $(\overline{\alpha}, \overline{\beta})$ is feasible for the DRP. For this we need to construct the DRP (note that there is no need to use the DRP outside this proof). Similarly to (D), it will have two types of variables, forming vectors $\overline{\alpha}$ and $\overline{\beta}$. The constraints corresponding to the variables $x_{ij}$, $(i,j) \in IJ$, of RP will have zero right-hand sides, since such are the respective coefficients of the objective function of RP at $x_{ij}$. There will be extra constraints corresponding to the auxiliary variables $x_i^a$ with trivial left-hand sides and right-hand sides equal to 1. It may be useful to realise that the objective function of RP has all coefficients 0 or 1 and can be

seen as
$$\xi = \sum_{(i,j)\in IJ} 0\cdot x_{ij} + \sum_{i=1}^{m+n} 1\cdot x_i^a.$$

Thus the DRP is
$$\begin{aligned}
\max\ & w(\overline{\alpha}, \overline{\beta}) = \sum_{i=1}^{m} a_i\overline{\alpha}_i + \sum_{j=1}^{n} b_j\overline{\beta}_j \\
\text{s.t.}\ & \overline{\alpha}_i + \overline{\beta}_j \le 0, \quad (i,j) \in IJ \\
& \overline{\alpha}_i \le 1, \quad i \in I \\
& \overline{\beta}_j \le 1, \quad j \in J
\end{aligned} \tag{DRP}$$

All constraints in (DRP) are satisfied by $(\overline{\alpha}, \overline{\beta})$ trivially, except possibly in the case $\overline{\alpha}_i + \overline{\beta}_j = 1 + 1 = 2$, which occurs when $i \in I^*$ and $j \notin J^*$. But this case is excluded for all $(i,j) \in IJ$ by Lemma 5.7.

(ii) Next we show that $(\overline{\alpha}, \overline{\beta})$ is optimal for the DRP; see Figure 5.5. By a corollary of the weak duality theorem (Corollary 2.7) it is sufficient to show that
$$\xi^{\min} = w(\overline{\alpha}, \overline{\beta}). \tag{5.6}$$

First note that
$$w(\overline{\alpha}, \overline{\beta}) = \sum_{i \in I^*} a_i - \sum_{i \notin I^*} a_i + \sum_{j \notin J^*} b_j - \sum_{j \in J^*} b_j. \tag{5.7}$$

On the other hand, by (5.3) and Lemma 5.6 we have
$$\xi^{\min} = \sum_{i=1}^{m} a_i + \sum_{j=1}^{n} b_j - 2v(f), \tag{5.8}$$

where $f$ is a max-flow at non-breakthrough. From the definition of the value of a flow we also have
$$v(f) = \sum_{i \in I} f(s,i) = \sum_{i \in I^*} f(s,i) + \sum_{i \notin I^*} f(s,i). \tag{5.9}$$

However,
$$f(s,i) = a_i \tag{5.10}$$
for every $i \notin I^*$, because at non-breakthrough all scanning is finished and if $f(s,i) < a_i$ then node $i$ would have been labelled during the scan of $s$.
Let us consider the term $\sum_{i \in I^*} f(s,i)$ and fix $i \in I^*$. Then by the conservation law
$$f(s,i) = \sum_{\substack{j \in J\\ (i,j) \in IJ}} f(i,j) = \sum_{\substack{j \in J^*\\ (i,j) \in IJ}} f(i,j) + \sum_{\substack{j \notin J^*\\ (i,j) \in IJ}} f(i,j) = \sum_{\substack{j \in J^*\\ (i,j) \in IJ}} f(i,j), \tag{5.11}$$

where the last equality follows from Lemma 5.7.


If j ∈ J ∗ at non-breakthrough then the arc (j, t) is saturated since otherwise t would have been labelled during
the scanning of j. Hence, using also the conservation law at j, we have for any j ∈ J ∗ :
$$b_j = f(j,t) = \sum_{\substack{i \in I\\ (i,j) \in IJ}} f(i,j) = \sum_{\substack{i \in I^*\\ (i,j) \in IJ}} f(i,j) + \sum_{\substack{i \notin I^*\\ (i,j) \in IJ}} f(i,j),$$

where the last term is in fact zero because if f (i, j) > 0 for some i then i would have been labelled during the
scan of j along the (backward) arc (i, j). Thus we get:
$$b_j = \sum_{\substack{i \in I^*\\ (i,j) \in IJ}} f(i,j). \tag{5.12}$$

We will now substitute from (5.10), (5.11) and (5.12) in (5.9) to obtain (using also the commutative and
associative laws):
$$\begin{aligned}
v(f) &= \sum_{i \notin I^*} a_i + \sum_{i \in I^*}\ \sum_{\substack{j \in J^*\\ (i,j) \in IJ}} f(i,j) \\
&= \sum_{i \notin I^*} a_i + \sum_{j \in J^*}\ \sum_{\substack{i \in I^*\\ (i,j) \in IJ}} f(i,j) \\
&= \sum_{i \notin I^*} a_i + \sum_{j \in J^*} b_j.
\end{aligned}$$

Finally, we substitute the last expression for v(f ) in (5.8):


 
$$\begin{aligned}
\xi^{\min} &= \sum_{i=1}^{m} a_i + \sum_{j=1}^{n} b_j - 2\left(\sum_{i \notin I^*} a_i + \sum_{j \in J^*} b_j\right) \\
&= \sum_{i \in I^*} a_i + \sum_{i \notin I^*} a_i + \sum_{j \in J^*} b_j + \sum_{j \notin J^*} b_j - 2\sum_{i \notin I^*} a_i - 2\sum_{j \in J^*} b_j \\
&= \sum_{i \in I^*} a_i - \sum_{i \notin I^*} a_i + \sum_{j \notin J^*} b_j - \sum_{j \in J^*} b_j,
\end{aligned}$$
which together with (5.7) confirms that (5.6) holds and thus $(\overline{\alpha}, \overline{\beta})$ is indeed an optimal solution.

Figure 5.5: To the proof of Theorem 5.8

Once $(\overline{\alpha}, \overline{\beta}) \in M^{\mathrm{opt}}_{DRP}$ is found, the next step in the primal-dual algorithm is checking whether $\overline{\alpha}_i + \overline{\beta}_j > 0$ for at least one pair $(i,j) \notin IJ$. However, a negative answer to this question would mean that the transportation problem is infeasible, which is impossible by the assumption that this problem is balanced and so by Theorem 3.3 feasible. Therefore this check can be skipped.
The formula for $\theta_1$ takes the following form in the transportation problem:
$$\theta_1 = \min\left\{\left.\frac{c_{ij} - \alpha_i - \beta_j}{\overline{\alpha}_i + \overline{\beta}_j}\ \right|\ \overline{\alpha}_i + \overline{\beta}_j > 0\right\} = \min\left\{\left.\frac{c_{ij} - \alpha_i - \beta_j}{2}\ \right|\ i \in I^*,\ j \notin J^*\right\} \tag{5.13}$$

The update of $(\alpha, \beta)$ using $(\overline{\alpha}, \overline{\beta})$ can now also easily be formulated:
$$\alpha_i := \alpha_i + \theta_1\overline{\alpha}_i = \begin{cases} \alpha_i + \theta_1, & \text{if } i \in I^*\\ \alpha_i - \theta_1, & \text{if } i \notin I^* \end{cases} \tag{5.14}$$
and
$$\beta_j := \beta_j + \theta_1\overline{\beta}_j = \begin{cases} \beta_j - \theta_1, & \text{if } j \in J^*\\ \beta_j + \theta_1, & \text{if } j \notin J^*. \end{cases} \tag{5.15}$$
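Formulas (5.13)–(5.15) translate directly into a short routine. The sketch below (our own illustration; all names are assumptions) performs one dual update given the labelled sets I∗ and J∗ at a non-breakthrough, and reproduces the update of Iteration 1 of Example 5.10 below.

```python
def dual_update(c, alpha, beta, I_star, J_star):
    """One ALPHABETA dual update: theta_1 by (5.13), then (5.14) and (5.15)."""
    m, n = len(alpha), len(beta)
    theta1 = min((c[i][j] - alpha[i] - beta[j]) / 2
                 for i in I_star for j in range(n) if j not in J_star)
    alpha = [alpha[i] + theta1 if i in I_star else alpha[i] - theta1
             for i in range(m)]
    beta = [beta[j] - theta1 if j in J_star else beta[j] + theta1
            for j in range(n)]
    return alpha, beta, theta1

# Iteration 1 of Example 5.10: I* = {2, 3}, J* = {4} in the text (0-indexed here).
c = [[8, 6, 10, 9], [9, 12, 13, 7], [14, 9, 16, 5]]
alpha, beta, theta1 = dual_update(c, [0, 0, 0], [8, 6, 10, 5], {1, 2}, {3})
print(theta1)        # 0.5
print(alpha, beta)   # [-0.5, 0.5, 0.5] [8.5, 6.5, 10.5, 4.5]
```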

We are ready to formulate the algorithm.

Algorithm 5.9 (ALPHABETA)


Input: Transportation problem with inputs cij , ai , bj .
Output: An optimal solution (xij )
αi := 0, i ∈ I
βj := mini∈{1,...,m} cij , j ∈ J
again: IJ = {(i, j) ∈ I × J | αi + βj = cij }
Use the FFA to solve MFP of Figure 5.3 with the set of admissible arcs IJ.
If f (s, i) = ai , for all i ∈ I then stop (an optimal solution is given by (5.5))
Let I ∗ and J ∗ be the sets of labelled nodes at non-breakthrough
Find θ1 using (5.13)
Update (α, β) using (5.14) and (5.15)
Go to again

When using ALPHABETA we will extend the transportation table by an extra column for the $\alpha_i$'s and an extra row for the $\beta_j$'s:

              ···      bj      ···
   ⋮                    ⋮
   ai         ···     xij      ···      αi
   ⋮                    ⋮
              ···      βj      ···

Example 5.10 We return to Example 5.1 and solve it using ALPHABETA. At the end we will compare the solution
with that found by GREEDY.
Initialisation: α = (0, 0, 0)⊤ , β = (8, 6, 10, 5)⊤ (column minimum).

45 20 30 30 αi
35 8 6 10 9 0
50 9 12 13 7 0
40 14 9 16 5 0
βj 8 6 10 5

Iteration 1: IJ = {(1, 1), (1, 2), (1, 3), (3, 4)}, the network is shown in Figure 5.6. A max-flow can easily be found
by inspection and is given by free-standing numbers. (By convention the missing flow values are zero). However,
the labelling algorithm has to be run to reach a non-breakthrough, at which the labels are displayed in bold (as we
do not need the details of labels here). Hence I ∗ = {2, 3}, J ∗ = {4}. The arcs leaving the source are not saturated,
thus another iteration is needed. For this we calculate
$$\theta_1 = \tfrac{1}{2}\min\{9-8-0,\ 12-6-0,\ 13-10-0,\ 14-8-0,\ 9-6-0,\ 16-10-0\} = \tfrac{1}{2}$$
and update α and β. The result is in the table below:

45 20 30 30 αi
35 8 6 10 9 −0.5
50 9 12 13 7 0.5
40 14 9 16 5 0.5
βj 8.5 6.5 10.5 4.5

Iteration 2: IJ = {(1, 1), (1, 2), (2, 1), (1, 3), (3, 4)}, the network is shown in Figure 5.7. After finding a max-flow
we get I ∗ = {2, 3}, J ∗ = {1, 4}. The flow is still not saturating the arcs leaving the source and so another iteration
is needed. For this we calculate
$$\theta_1 = \tfrac{1}{2}\min\{12-6.5-0.5,\ 13-10.5-0.5,\ 9-6.5-0.5,\ 16-10.5-0.5\} = 1$$
and update α and β. The result is in the table below:

45 20 30 30 αi
35 8 6 10 9 −1.5
50 9 12 13 7 1.5
40 14 9 16 5 1.5
βj 7.5 7.5 11.5 3.5

Iteration 3: IJ = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 2), (3, 4)}, the network is shown in Figure 5.8. The max-flow now
saturates the arcs leaving s, an optimal solution to the transportation problem can be read off from the flow values
on the admissible arcs and is summarised in the table below:

     45   20   30   30
35         10   25
50   45          5
40         10        30

The cost of this transportation plan is GBP 1,020,000 which is GBP 60,000 cheaper than that found by GREEDY
in Example 5.1.

Figure 5.6: Iteration 1 — the admissible network with a max-flow shown by the free-standing numbers and the labelled nodes at non-breakthrough in bold

Figure 5.7: Iteration 2 — the admissible network with a max-flow shown by the free-standing numbers and the labelled nodes at non-breakthrough in bold

Figure 5.8: Iteration 3 — the admissible network with a max-flow saturating all arcs leaving s

Chapter 6

Introduction to the theory of matroids

This chapter presents a basic introduction to the theory of matroids. It provides an explanation of why some combinatorial optimisation problems can be solved by a straightforward “greedy” algorithm and characterises such problems. Before we do so we need to discuss a selection of properties of undirected graphs.

6.1 Undirected graphs


We have already introduced directed graphs in Section 4.2. Now we will introduce basic notation and terminology
of undirected graphs (usually called simply “graphs”).
A graph consists of nodes and (undirected) arcs. (Note that nodes are called “vertices” and arcs are called
“edges” by some authors.) There are various versions of the concept of a graph in the literature; we will restrict
our attention to what is often called a simple graph (but we will not use the word “simple”) where there are no
loops (that is arcs joining a node with itself) and there are no multiple arcs joining the same pair of nodes. Since
every arc in this setting is fully determined by its end nodes, our formal definition is: a graph is any ordered pair
G = (V, E) , where V is a non-empty set (of nodes) and E ⊆ {{u, v} : u, v ∈ V, u ̸= v} (the set of arcs). Once a
graph G is given it will be practical to denote its set of nodes V also by V (G) and the set of arcs E by E(G). We
will only consider finite graphs, where the set of nodes (and therefore also the set of arcs) is finite
We will use the convention that uv stands for the set (arc) {u, v}; clearly with this convention uv = vu for every
u and v. If e = uv is an arc then we say that the nodes u and v are adjacent and that the arc e is incident with
(or upon) u and v. Unless stated otherwise, the number of nodes, that is |V |, will be denoted by m, the number
of arcs, that is |E|, by n. Figure 6.1 shows a graph with 4 nodes and 5 arcs; more precisely, V = {a, b, c, d},
E = {ab, bc, cd, ad, bd}. Let G = (V, E) and G ′ = (V ′ , E ′ ) be graphs. We say that G ′ is a subgraph of G (notation
G ′ ⊆ G) if V ′ ⊆ V and E ′ ⊆ E. A subgraph G ′ of G is called proper if G ′ ̸= G. We say the subgraph G ′ of G is a
spanning subgraph of G if V (G ′ ) = V (G).
In a general graph each of the m nodes can be connected with up to m − 1 other nodes. If we add up the

d c

a b

Figure 6.1: A graph with 4 nodes and 5 arcs

Figure 6.2: Small complete graphs K1, K2 and K3

Figure 6.3: Acyclic but not connected

numbers of arcs incident with individual nodes then every arc is counted twice. Hence in any graph we have $0 \le n \le \frac{1}{2}m(m-1)$.
If $E = \{\{u,v\} : u, v \in V, u \ne v\}$, then $G$ is called complete, notation $K_m$. The complete graphs with 1, 2 and 3 nodes are shown in Figure 6.2. In $K_m$ clearly $n = \frac{1}{2}m(m-1)$.
Given a graph G = (V, E) the sequence p = (v0 , v1 , . . . , vk ) is a walk (in G) if v0 , v1 , . . . , vk ∈ V and vi−1 vi ∈ E
(i = 1, 2, . . . , k). If, moreover, vi ̸= vj (i ̸= j; i, j = 0, 1, . . . , k), then p is called a path. If p is indeed a path then
it is called a u − v path if v0 = u and vk = v and we say that p passes through v0 , v1 , . . . , vk or that p contains
v0 , v1 , . . . , vk . We also say that v is reachable from u (in G) if there is a u − v path (in G).
A walk c = (v0 , v1 , . . . , vk ) is called closed if v0 = vk . If, moreover, (v0 , v1 , . . . , vk−1 ) is a path in G and k ≥ 3,
then c is called a cycle. A cycle is called Hamiltonian (in G) if it contains all nodes of G.
The number of arcs incident with a node v is called the degree of v, notation deg(v). We say that v is isolated
if deg(v) = 0.
A graph G = (V, E) is called acyclic (or a forest) if there is no cycle in G; it is called connected if there is a u − v
path in G for all u, v ∈ V (G) with u ̸= v; and it is called a tree if G is both acyclic and connected. For illustration
see Figures 6.3, 6.4 and 6.5.
If you try to draw a few trees you may notice that each of them has the number of arcs one less than the number
of nodes, for instance in the tree of Figure 6.5 we have m = 9, n = 8. This is a simple but fundamentally important
observation confirmed by the next statement which is very well known but we repeat it here for completeness.

Theorem 6.1 If G is a tree then n = m − 1.

Proof. By induction on m.
If m = 1 then G has no arcs and so n = 0.

Figure 6.4: Connected but not acyclic

Figure 6.5: A tree

Suppose now m > 1. There are no isolated nodes since otherwise G would not be connected. If there was no
node of degree 1 then starting from any node and an incident arc it would be possible to move from node to node by
entering and leaving every node using different arcs, hence eventually returning back to one of the previous nodes
and thus creating a cycle, which is a contradiction.
Let v be a node of degree 1 and e be the incident arc. Let G ′ = (V \{v}, E\{e}) and denote the number of nodes
by m′ and arcs by n′ . Hence m′ = m − 1, n′ = n − 1 and G ′ is obviously again a tree. By the induction hypothesis
n′ = m′ − 1. We deduce that m − 1 = m′ = n′ + 1 = (n − 1) + 1 = n.
We will refer to the next statement quite often; its proof is trivial.

Lemma 6.2 (obvious) If a graph G is acyclic then also every subgraph of G is acyclic.

The concept of a connected component, briefly just component, will play a key role in the theory that follows.
Informally, a component of a graph G is any subgraph of G that is connected and cannot be extended to a larger
connected subgraph of G (but a larger connected subgraph of G may exist). More precisely, a graph H is called a
component of the graph G if

• H ⊆ G,

• H is connected and

• H ⊆ H′, H′ ⊆ G, H′ connected implies H = H′.

It follows from this definition that every component of a forest is a tree.


The number of components of a graph G will be denoted by p(G). For example the graph of Figure 6.6 has two
components, the graph of Figure 6.7 is one of its components whereas those in Figures 6.8 and 6.9 are not.
We can now generalise Theorem 6.1 stated for graphs consisting of one component to graphs with any number
of components.

Theorem 6.3 If G is an acyclic graph then n = m − p(G).

Proof. Let p = p(G) and mi (ni ) denote the number of nodes (arcs) in the i-th component of G. Then ni = mi − 1
(i = 1, . . . , p) by Theorem 6.1 and so
$$n = \sum_{i=1}^{p} n_i = \sum_{i=1}^{p}(m_i - 1) = \sum_{i=1}^{p} m_i - \sum_{i=1}^{p} 1 = m - p.$$
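The quantities in Theorem 6.3 are easy to compute with a disjoint-set (union–find) structure. The sketch below is only an illustration (not part of the notes): it counts the components of a graph and, as a by-product, detects whether a given set of arcs is acyclic.

```python
def components_and_acyclic(nodes, arcs):
    """Return (number of components p, True iff (nodes, arcs) is acyclic)."""
    parent = {v: v for v in nodes}

    def find(v):                      # root of the component containing v
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    acyclic = True
    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:                  # u, v already connected: the arc closes a cycle
            acyclic = False
        else:
            parent[ru] = rv
    p = len({find(v) for v in nodes})
    return p, acyclic

# A forest on m = 6 nodes with n = 4 arcs and p = 2 components: n = m - p.
nodes = ['a', 'b', 'c', 'd', 'e', 'f']
arcs = [('a', 'b'), ('b', 'c'), ('d', 'e'), ('e', 'f')]
print(components_and_acyclic(nodes, arcs))   # (2, True)
```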

Another key concept in this chapter is that of a spanning tree. Let G be a connected graph (this assumption
is essential!). Then a subgraph T of the graph G is called a spanning tree of G if

• T is a tree and

• V (T ) = V (G) (T is spanning G).

It follows from this definition and Theorem 6.1 that every spanning tree of a graph with m nodes has m − 1
arcs.
In summary, a spanning tree of a connected graph is any subgraph of this graph that is acyclic, connected and
spanning. Figure 6.10 shows a connected graph G and five subgraphs of G. The first two are spanning trees of
G, however the others are not: G3 is connected and spanning but not acyclic, G4 is acyclic and spanning but not
connected and G5 is connected and acyclic but not spanning.
Let G be a graph. A graph F is called a maximal acyclic subgraph (MAS) of G if F is an acyclic subgraph
of G but is not a proper subgraph of any acyclic subgraph of G. The next statement confirms what is probably
intuitively obvious. It is a basic building block of the matroid theory.

Theorem 6.4 Let G be a connected graph with m nodes. Then every MAS of G is a spanning tree of G and hence
has m − 1 arcs.

Proof. Let G be a connected graph with m nodes and F be a MAS of G. We need to prove that F is a spanning
tree of G, that is, it is acyclic, connected and spanning.
The first property is immediate.
If V (F) ̸= V (G) then by adding any node from V (G)\V (F) (but no arc) to F we get an acyclic subgraph of G
which contains F as a proper subgraph, a contradiction. Therefore V (F) = V (G), that is, F is a spanning subgraph
of G.

Figure 6.6: Graph G

Figure 6.7: Component of G in Figure 6.6

Figure 6.8: No component of G in Figure 6.6

Figure 6.9: No component of G in Figure 6.6

Figure 6.10: A connected graph G and 5 of its subgraphs G1, G2, G3, G4, G5

Finally, we prove by contradiction that F is connected. Suppose that p(F) > 1 and let F1 be one (fixed)
component of F. We define F2 to be the graph

(V (F)\V (F1 ), {uv ∈ E(G) : u, v ∈ V (F)\V (F1 )}).

Take any two nodes u ∈ V (F1 ) and v ∈ V (F2 ). The graph G is connected and so there exists a u − v path p in
G. On this path there are consecutive nodes u′ and v ′ such that u′ ∈ V (F1 ) and v ′ ∈ V (F2 ). Hence u′ v ′ ∈ E(G).
Consider now the graph F′ = (V (F), E(F) ∪ {u′ v ′ }). This graph is acyclic because it is obtained from an acyclic graph F by adding an arc joining two different components of F. We also have F ⊆ F′ ⊆ G and F ≠ F′ because u′ v ′ ∉ E(F). Hence F′ is an acyclic subgraph of G containing F as a proper subgraph, a contradiction.
This completes the proof.
Let G be a graph. A graph F is called a spanning forest of the graph G if V (F) = V (G) and each component of
F is a spanning tree of the corresponding component of G.

Corollary 6.5 (of Theorem 6.4) Let G be a graph with m nodes and p components. Then every maximal acyclic
subgraph of G is a spanning forest of G and (therefore) has m − p arcs.

Proof. Let F be a MAS of G. Then every component of F is a maximal acyclic subgraph of that component
(otherwise we have a contradiction with the assumption that F is a MAS of G) and hence a spanning tree of that
component by Theorem 6.4. If V (F) ̸= V (G) then by adding any node from V (G)\V (F) to F we get an acyclic
subgraph of G which contains F as a proper subgraph, a contradiction. Therefore V (F) = V (G) and so F is a
spanning forest of G. The rest follows from Theorem 6.3.

6.2 The greedy algorithm


The “greedy” approach to solving problems is used in various theoretical and practical situations and is popular
for its simplicity, especially when no solution method is available. The key question of course is whether such an
approach constitutes a correct solution method. As it can be expected it is the case for some problems but not
always; and obviously it is desirable to characterise problems for which this approach is a correct solution method.
The matroid theory provides an answer to this question. It should be noted that the greedy algorithm may be
useful even if it is not a correct solution method since it may provide a good approximation of a solution or a helpful
start in a more complex algorithm.
The greedy approach is intuitively obvious but may differ in details in various areas of mathematics. We will
give a precise formulation for combinatorial optimisation later on. For the time being we will assume that it means
the following: “Repeatedly select a most attractive feasible element until there are no feasible elements any more”.
We will refer to this as GREEDY.
We illustrate GREEDY on a few examples.

(1) The assignment problem (AP) is: Given an m × m matrix A, find m entries in A, no two from the same row
or column so that their sum is minimal (or maximal). Note that the minimisation and maximisation versions
of the assignment problem are converted to each other by taking −A instead of A.
If we take
$$A = \begin{pmatrix} 1 & 2\\ 3 & 5 \end{pmatrix},$$
then GREEDY will start by choosing 1, skip 2 and 3 (since these are not feasible as they are in the same row
or column as the already chosen 1) and finally take 5. The total then is 1 + 5 = 6 but obviously the choice of
2 + 3 would have been better. So GREEDY is not a correct solution method for AP. An explanation of this
simple observation will follow from the matroid theory in the next section. Observe that if 5 was replaced by

Figure 6.11: Shortest s–t path (the arcs sa and at have lengths 2 and 3, the arc st has length 4)

Figure 6.12: A graph and its MST (nodes a, b, c, d, e; arc weights ab = 5, bc = 2, ac = 6, ae = 8, ce = 1, cd = 9, bd = 10, ed = 12)

any big number, say M , then GREEDY would still choose 1 and M rather than the optimal choice of 2 + 3
so the error of GREEDY in AP may be arbitrarily large.

(2) The shortest path problem (SPP). Consider the problem of finding a shortest path from s to t in the graph
of Figure 6.11. Here GREEDY would obviously start by choosing the arc (s, a), followed by (a, t), yielding
the path of length 5 but clearly the path consisting of the single arc (s, t) is shorter. Thus GREEDY is not
a correct solution method for SPP.

(3) The minimal spanning tree problem (MSTP or just MST) is the following: Given a connected graph G =
(V, E) and a function w : E → R, find a spanning tree T = (V, E ∗ ) of G such that
$$w(E^*) = \sum_{e \in E^*} w(e)$$

is minimal.
In the graph of Figure 6.12 GREEDY will choose arcs (in this order) ce, bc, ab, then will reject arc ac because otherwise the cycle (a, b, c, a) would be created, and similarly it will reject ae. Then it will accept cd and reject the remaining arcs. So E∗ = {ab, bc, ce, cd} and the minimum weight is 17.
Indeed, GREEDY is a correct solution method for the MST in general, and the next section provides an explanation of why this is so.
The proof is actually done for the following maximum weight forest problem (MWF), which is equivalent to
the MST: Given a graph G = (V, E) and a function w : E → R≥0 , find a forest F = (V, E ∗ ) of G such that
$$w(E^*) = \sum_{e \in E^*} w(e)$$

is maximal. In this definition it is important that the weights of arcs are restricted to non-negativity. Note
that by definition w(∅) = 0.

Figure 6.13: A graph and its MWF (the graph of Figure 6.12 with transformed weights w′(e) = 12 − w(e))

Details of the conversion from MWF to MST are left as an exercise (see problem sheets) but we illustrate
this transformation on the graph of Figure 6.12. We now show how the MST in that graph can be solved
provided that we can solve the MWF (which will be shown to be solvable by GREEDY in Section 6.4). The
transformation that can be used is
w′ (e) = W − w(e), e ∈ E,
where W is the greatest arc weight in G. In Figure 6.12 we have W = 12 and the graph with transformed
weights is displayed in Figure 6.13. GREEDY now finds an MWF (remember we now have a maximisation
problem so GREEDY is choosing the arcs of greatest weights first) which coincides with the MST found
before.
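For illustration, here is a minimal sketch of GREEDY for the maximum weight forest problem (essentially Kruskal's algorithm run with largest weights first, with union–find as the acyclicity test), applied to the MST of Figure 6.12 via the transformation w′ = W − w; the arc weights are those read off Figure 6.12 and the function names are our own.

```python
def greedy_forest(nodes, weighted_arcs):
    """GREEDY for MWF: scan arcs by decreasing weight and keep an arc
    if and only if it does not close a cycle."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for w, u, v in sorted(weighted_arcs, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                      # acyclicity preserved: accept the arc
            parent[ru] = rv
            chosen.append((u, v, w))
    return chosen

# MST of the graph in Figure 6.12 via the transformation w' = W - w with W = 12.
arcs = {('a', 'b'): 5, ('b', 'c'): 2, ('a', 'c'): 6, ('a', 'e'): 8,
        ('c', 'e'): 1, ('c', 'd'): 9, ('b', 'd'): 10, ('e', 'd'): 12}
W = max(arcs.values())
forest = greedy_forest('abcde', [(W - w, u, v) for (u, v), w in arcs.items()])
print(sorted((u, v) for u, v, _ in forest))     # [('a','b'), ('b','c'), ('c','d'), ('c','e')]
print(sum(arcs[(u, v)] for u, v, _ in forest))  # 17, as found above
```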

6.3 Independence systems


Recall that our aim in this chapter is to characterise combinatorial optimisation problems that are solvable by
GREEDY. A key concept for achieving this aim is that of an independence system.
S = (X, F) is called an independence system (IS) if
1. X is a finite set (called the ground set), and

2. F is a collection of subsets of X closed under inclusion, that is A ∈ F, A′ ⊆ A ⇒ A′ ∈ F


If (X, F) is an independence system then any set A ∈ F is called an independent set.
We will now present four examples of independence systems, denoted S1 , S2 , S3 and S4 .

6.3.1 System 1
Let G = (V, E) be a graph and let S1 = (E, A) where A = {A ⊆ E : (V, A) is acyclic}. It follows immediately from
the “obvious lemma”, Lemma 6.2, that S1 is an independence system (because for any acyclic graph also every
subgraph is acyclic).
We illustrate S1 on the following instance. Let G be the graph of Figure 6.14. Then V = {a, b, c}, E = {ab, bc, ac} and A = {∅, {ab}, {bc}, {ac}, {ab, bc}, {ab, ac}, {bc, ac}} = $2^E \setminus \{E\}$.

6.3.2 System 2
For S2 we need a bit more of graph theory: Let G = (V, E) be a graph. Then M ⊆ E is called a matching (in G) if
e ∩ f = ∅ for all e, f ∈ M , e ̸= f . So a matching in a graph G is a set of edges of G, no two of which have a vertex
in common.

a

b c

Figure 6.14: Example of S1

Figure 6.15: Maximal, maximum and perfect matchings (panels, left to right: G; not a matching; matching but not maximal; maximal but not maximum; maximum and perfect)

Thus $|M| \le \frac{1}{2}m$ for every matching $M$. Clearly, ∅ is a matching in any graph, and {e} is a matching for any e ∈ E. If $|M| = \frac{1}{2}m$ then M is called perfect. If M is a matching in G and M is not a proper subset of any other matching in G then M is called maximal (“it cannot be extended to a bigger matching”). If M is a matching in G of biggest size then M is called maximum. These concepts are illustrated in Figure 6.15 where M is formed by red dashed arcs.
The difference between the size of a maximum matching and $\frac{1}{2}m$ can be arbitrarily big as can be seen in the graph of Figure 6.16 where any maximum matching consists of only one arc.
Let G = (V, E) be a graph. Then S2 = (E, M), where M = {M ⊆ E : M is a matching}, is an independence
system because with every matching also every subset is a matching.
We illustrate S2 on the following instance. Let G be the graph of Figure 6.17.
Then V = {a, b, c, d}, E = {ab, bc, cd, ad} and (E, M) is an instance of S2 , where

M = {∅, {ab}, {bc}, {cd}, {ad}, {ab, cd}, {ad, bc}}.

Note that independence systems can be constructed from graphs also in ways other than in S1 and S2 . For example

Figure 6.16: Maximum matchings of size 1

d c

a b

Figure 6.17: To example of S2

if G is the graph of Figure 6.17 and F = {∅, {ab}, {ad}, {ab, ad}} then (E, F) is an independence system (different
from S1 and S2 ): this can easily be verified and is left as an exercise.
The remaining two independence systems will not be graph related.

6.3.3 System 3
Let m ≥ 1 be an integer, S3 = (E, I) where E = {(i, j) : i, j = 1, . . . , m} (the set of all positions in an m × m
matrix) and I ∈ I if and only if I is a set of positions in E such that no two are from the same row or column. S3
is an independence system because if no two positions in a set of positions are from the same row or column then
the same holds also for any subset.
We illustrate S3 on the following instance. Let m = 2, that is we consider 2 × 2 matrices, generally of the form
$$A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix}.$$

Here E = {(1, 1), (1, 2), (2, 1), (2, 2)} and

I = {∅, {(1, 1)}, {(1, 2)}, {(2, 1)}, {(2, 2)}, {(1, 1), (2, 2)}, {(1, 2), (2, 1)}}.

Hence (E, I) is an instance of S3 .

6.3.4 System 4
Let A ∈ Rm×n and S4 = (C, L) where C = {1, . . . , n} (the set of column indices of A) and

L = {J ⊆ C : columns with indices from J are linearly independent}.

S4 is an independence system because, as is well known from linear algebra, every subsystem of a linearly independent system of vectors is also linearly independent.
We illustrate S4 on the following instance. Let
!
1 0 1 2 0
A=
0 1 1 0 0

Then C = {1, 2, 3, 4, 5} and L will contain ∅, every one-element subset of C except {5} (because the zero vector
alone is not linearly independent); it will also contain pairs of indices of linearly independent columns (for instance
it will not contain the set {1, 4} because the first and fourth columns are linearly dependent). It cannot contain
any set of 3 or more indices because in the two-dimensional space the maximum number of linearly independent
columns is 2. Hence (C, L) is an instance of S4 , where

L = {∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {2, 3}, {2, 4}, {3, 4}}.
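Membership in L can be tested numerically by comparing the rank of the chosen columns with their number. The following small illustration (using NumPy, with 0-indexed columns) recovers the family L for the matrix above.

```python
import numpy as np
from itertools import combinations

A = np.array([[1, 0, 1, 2, 0],
              [0, 1, 1, 0, 0]])

def independent(J):
    """True iff the columns of A with indices in J are linearly independent."""
    if not J:
        return True                       # the empty set is independent
    sub = A[:, sorted(J)]
    return np.linalg.matrix_rank(sub) == len(J)

n = A.shape[1]
L = [set(J) for r in range(n + 1)
     for J in combinations(range(n), r) if independent(J)]
print(L)   # the family L listed above, with columns numbered from 0
```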

6.3.5 Systems 1-4: some remarks
The size of an independent set plays a key role in the theory that will now follow. Let us therefore have a closer
look at the upper bounds of these sizes in S1 − S4 .
It follows from Theorem 6.3 that an acyclic subgraph of a graph with m nodes and p components cannot have
more than m − p arcs. Therefore for any independent set A in S1 we have |A| ≤ m − p.
We have observed immediately after the definition of a matching that a matching cannot have more than $\frac{1}{2}m$ arcs, so for any independent set $M$ in S2 we have $|M| \le \frac{1}{2}m$.
In S3 we have |I| ≤ m for any independent set I because in a set of m + 1 or more positions of an m × m matrix
at least two are from the same row (by, say, the pigeonhole principle).
Finally, it is proved in linear algebra that the size of any linearly independent system of columns of a real
matrix A does not exceed the rank of A (denoted by r(A)). Hence in S4 for any independent set J we have
|J| ≤ r(A) ≤ min(m, n).
Before we continue with matroids it may be useful to realise that “independence” in independence systems is a
generalisation of various concepts in specific independence systems. For instance the meaning of “independence”
in the four examples above is as follows:

S1 acyclic

S2 matching

S3 independent positions

S4 linearly independent

6.4 Matroids
Some independence systems have the desirable powerful feature: whatever combinatorial optimisation problem is
associated with them, GREEDY will always be a correct solution method. Such independence systems are called
matroids. In this section we will formalise this idea and characterise matroids. In order to do so, clearly we need first
to say precisely what a combinatorial optimisation problem is and what exactly GREEDY means. Independence
systems will help us do this fairly easily.
Let S = (E, F) be an independence system. A combinatorial optimisation problem (COP) associated with S,
notation (S, w), is the following problem:
Given a weight function w : E → R≥0 , find an independent set (called an optimal solution) that has the biggest
total weight, i.e., find X ∗ ∈ F such that
$$w(X^*) = \max_{X \in \mathcal{F}} w(X), \qquad\text{where}\quad w(X) = \sum_{e \in X} w(e)$$

for X ⊆ E.
Hence a combinatorial optimisation problem associated with S1 (where “independent” means “acyclic”) is:
Given a graph G = (V, E) and w : E → R≥0 , find an acyclic subgraph of G of maximum total weight. This is
exactly the definition of the maximum weight forest problem.
In S3 “independent” means “being from different rows and columns”. Assigning a weight with every position
(i, j) means creating a matrix (aij ); and so the combinatorial optimisation problem associated with S3 is:

Given an m × m matrix A = (aij ) ≥ 0, find m entries in A, no two either from the same row or column, of
maximum total value. This is exactly the definition of the assignment problem (with maximisation).
Next we formally define GREEDY.

Algorithm 6.6 (The Greedy Algorithm (GREEDY))


Input: Independence system S = (E, F) and w : E → R≥0
Output: X ∗ ∈ F, for which w(X ∗ ) is maximal. [Remark: this is an expectation rather than a statement]

(1) X ∗ := ∅

(2) (“Find a most attractive element in E”) Find an e ∈ E such that w(e) = maxz∈E w(z),

(3) E := E\{e},

(4) (“Check feasibility”) If X ∗ ∪ {e} ∈ F then X ∗ := X ∗ ∪ {e}

(5) If E = ∅ then stop, else go to (2).
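Algorithm 6.6 can be written generically, with the independence test supplied as an oracle. The sketch below is our own illustration; it runs GREEDY on a small maximisation instance of the assignment problem (an instance of S3, with made-up weights) on which GREEDY fails to find the optimum.

```python
def greedy(E, w, independent):
    """Generic GREEDY (Algorithm 6.6): scan elements by decreasing weight and
    keep an element if and only if the chosen set stays independent."""
    X = set()
    for e in sorted(E, key=lambda e: w[e], reverse=True):
        if independent(X | {e}):
            X = X | {e}
    return X

def positions_independent(S):
    """Independence test of S3: no two positions share a row or a column."""
    rows = [i for i, _ in S]
    cols = [j for _, j in S]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)

# A 2x2 maximisation instance of the AP (our own data) on which GREEDY fails:
w = {(1, 1): 4, (1, 2): 3, (2, 1): 3, (2, 2): 1}
X = greedy(w.keys(), w, positions_independent)
print(X, sum(w[e] for e in X))   # {(1, 1), (2, 2)} with value 5; the optimum is 6
```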

We are now ready to formulate and prove one of the key statements of this chapter, and in fact of the whole
course.

Theorem 6.7 Let S = (E, F) be an independence system. Then the following statements are equivalent:

(A) For every w : E → R≥0 GREEDY correctly solves the combinatorial optimisation problem (S, w).

(B) [Exchange Proposition (EP)] For all F, F ′ ∈ F, if |F ′ | = |F | + 1 then there is an e ∈ F ′ \F such that
F ∪ {e} ∈ F.

(C) [Rank Proposition (RP)] If A ⊆ E and F , F ′ are maximal independent subsets of A then |F | = |F ′ |.

An independence system satisfying one (and therefore all) of the conditions (A), (B), (C) in Theorem 6.7 is
called a matroid.
Proof. Let S = (E, F) be an independence system.
(A) ⇒ (B)
We prove, equivalently, that if (B) is not true then also (A) is not true. So suppose that for some $F, F' \in \mathcal{F}$ we have $|F'| = |F| + 1$ and for all $e \in F'\setminus F$ we have $F \cup \{e\} \notin \mathcal{F}$. Let us denote $|F|$ by $p$ and define $w : E \to \mathbb{R}_{\ge 0}$ as
$$w(e) = \begin{cases} p+1, & \text{if } e \in F'\setminus F,\\ p+2, & \text{if } e \in F,\\ 0, & \text{if } e \in E\setminus(F \cup F'). \end{cases}$$

If GREEDY is used for solving the combinatorial optimisation problem (S, w), then it will first choose (one by
one) all elements of F because they have the biggest weight and are feasible (since every subset of F is independent).
Then it will reject (again one by one) all elements from F ′ \F due to the assumption that F ∪ {e} ∈ / F for any
e ∈ F ′ \F . Next, GREEDY will accept or reject the remaining elements which are all in E\(F ∪ F ′ ), and therefore
all are of zero weight. Hence the set X ∗ found by GREEDY has the weight w(X ∗ ) = w(F ) = p(p + 2). However
the set F ′ is independent and has p + 1 elements all of weight p + 1 or p + 2 and hence w(F ′ ) ≥ (p + 1)2 > p(p + 2).
Thus GREEDY fails to find an optimal solution for (S, w), that is, (A) is not true.
(B) ⇒ (C)
Suppose (B) holds, $A \subseteq E$ and $F, F'$ are maximal independent subsets of $A$. We prove by contradiction that $|F| = |F'|$. Suppose that $|F| \ne |F'|$; without loss of generality $|F| < |F'|$. Let $F''$ be any subset of $F'$ such that $|F''| = |F| + 1$. Clearly $F'' \in \mathcal{F}$ (because any subset of an independent set is independent) and so by (B) applied

d c

a b

Figure 6.18: Graph to the proof that S2 is not a matroid

to F and F ′′ there exists an e ∈ F ′′ \F such that F ∪ {e} ∈ F. Since e ∈ F ′′ ⊆ F ′ ⊆ A and F ⊆ A, we have


that F ∪ {e} ⊆ A. At the same time F ̸= F ∪ {e} because e ̸∈ F . We deduce that F can be extended to a larger
independent subset of A which is a contradiction with the maximality of F . Hence indeed |F | = |F ′ |.
(C) ⇒ (A)
Suppose that (C) holds, we shall prove that (A) then also holds, by contradiction. So suppose that for some
w : E → R≥0 GREEDY fails to find an optimal solution, more precisely GREEDY finds X ∗ = {e1 , ..., em } but for
some independent set X = {e′1 , . . . , e′k }
w(X ∗ ) < w(X). (6.1)

We may assume without loss of generality that w(e1 ) ≥ w(e2 ) ≥ . . . ≥ w(em ) and w(e′1 ) ≥ w(e′2 ) ≥ . . . ≥ w(e′k ).
The set X ∗ is maximal independent in A = E by its construction by GREEDY. The set X may also be assumed
to be maximal in A = E without loss of generality (if necessary, suitable elements of zero weight are added until
this assumption is satisfied). By (C) then k = m. We will now show by induction that

w(ei ) ≥ w(e′i ), i = 1, . . . , m (6.2)

which yields a contradiction with (6.1). The set {e′1 } is independent since it is a subset of the independent set
X. Hence by the choice of GREEDY at the first step we have that w(e1 ) ≥ w(e′1 ). For the second induction step
suppose that
w(ei ) ≥ w(e′i ), i = 1, . . . , r − 1

but w(er ) < w(e′r ). Let us define A = {e ∈ E : w(e) ≥ w(e′r )}. The set {e1 , . . . , er−1 } is a maximal independent
subset of A because if there was an e ∈ A such that {e1 , . . . , er−1 , e} is also independent then w(e) ≥ w(e′r ) > w(er ),
but then GREEDY would have chosen e rather than er . So we have that {e1 , . . . , er−1 } is a maximal independent
subset of A. At the same time {e′1 , . . . , e′r } is an independent subset of A of a larger size which contradicts (C).
This concludes the inductive proof of (6.2), hence we have a contradiction with (6.1), which completes the proof.

Let S = (E, F) be a matroid. The (common) size of maximal independent subsets of a set A ⊆ E is called the
rank of A and any maximal independent subset of A is called a basis of A. A minimal dependent subset of E (that
is a dependent set whose every proper subset is independent) is called a circuit.
We will now apply Theorem 6.7 to each of the four examples of independence systems S1 − S4 in Section 6.3
and deduce whether they are matroids or not. The results are formulated as four corollaries of Theorem 6.7; we
will use the notation of Section 6.3.

Corollary 6.8 S1 = (E, A) is a matroid for every G = (V, E). Hence GREEDY will correctly solve every instance
of the maximum weight forest problem and, consequently, also every minimal spanning tree problem.

Proof. For the proof we use the Rank Proposition: Let A ⊆ E be fixed and take F ⊆ A. Then F is an independent
subset of A if and only if (V, F ) is an acyclic subgraph of (V, A), which we will denote by GA for ease of reference.
Hence F is a maximal independent subset of A if and only if (V, F ) is a maximal acyclic subgraph of GA . But by

Corollary 6.5 each such graph has m − p(GA ) arcs, thus the Rank Proposition holds, and the statement now follows.

A matroid defined in the same way as S1 for a graph G is called a graphic matroid associated with G and denoted
by Gr(G).

Corollary 6.9 S2 = (E, M) is not a matroid in general. Hence GREEDY is not a correct solution method for the
maximal weighted matching problem.

Proof. Let G be the graph of Figure 6.18 and A = {ab, ad, bc}. Then both {ad, bc} and {ab} are maximal
independent subsets of A, they are of different sizes, hence by the Rank Proposition GREEDY does not always
solve the maximal weighted matching problem correctly.

Corollary 6.10 S3 = (E, I) is not a matroid in general. Hence GREEDY is not a correct solution method for the
assignment problem.

Proof. Let m = 2, that is, E is the set of all positions of a 2 × 2 matrix


!
a11 a12
a21 a22

and A = {(1, 1), (1, 2), (2, 1)}. Then both {(1, 2), (2, 1)} and {(1, 1)} are maximal independent subsets of A.
Since they are of different sizes by the Rank Proposition GREEDY does not always solve AP correctly.

Corollary 6.11 S4 is a matroid.

Proof. It is proved in linear algebra that the size of any maximal linearly independent system of columns of a (fixed)
matrix is equal to the rank of this matrix. Hence the Rank Proposition holds and S4 therefore is a matroid.

6.5 Matroids and beyond


Matroids were discovered in 1935 independently by Whitney (US) and Nakasawa (Japan). The theory of matroids
today is a mathematical area on its own and by far exceeds the scope of this course. We will briefly sketch another
result of this theory. First we need to introduce some new concepts.
An ordered triple K = ⟨K, +, ·⟩ is called a field if +, · : K × K → K and if the following (axioms) hold:

a+b=b+a a·b=b·a

a + (b + c) = (a + b) + c a · (b · c) = (a · b) · c

a · (b + c) = a · b + a · c

(∃0 ∈ K) a + 0 = a (∃1 ∈ K) a · 1 = a

(∀a ∈ K) (∃ − a ∈ K) a + (−a) = 0 (∀a ∈ K, a ̸= 0) (∃a−1 ∈ K) a · a−1 = 1

These axioms resemble the well known properties of real numbers. Many concepts for fields are developed in a
way that is close or identical with real numbers. These include vectors, matrices and linear independence.
Let A be an m × n matrix over K. By col(A) we denote the set of column indices of A and

L(A) = {X ⊆ col(A) : columns of A with indices from X are linearly independent}.

Figure 6.19: Overview of matroids

Then for the same reasons as for S4 the pair (col(A), L(A)) is a matroid; notation M(A). Clearly, S4 is M(A)
for K = R.
Independence systems S = (E, F) and S ′ = (E ′ , F ′ ) are called isomorphic (notation S ≈ S ′ ) if a bijection
ψ : E → E ′ exists such that
F ∈ F ⇔ ψ(F ) ∈ F ′ ,

where ψ(F ) = {ψ(x) : x ∈ F }.


A matroid S is called matric if S ≈ M(A) for some field K = ⟨K, +, ·⟩ and for some matrix A over K; S is
called graphic if S ≈ Gr(G) for some graph G = (V, E).
The following statement can be found in [1].

Theorem 6.12 Every graphic matroid is matric.

It is known [1] that not every matric matroid is graphic and not every matroid is matric. We summarise these
results in Figure 6.19 which provides an overview of the concepts discussed in this chapter.

Part III

Hard problems

In combinatorial optimisation, similarly as in other parts of mathematics, there are problems that have always seemed to be harder. This intuitive perception may be misleading: there are numerous examples of problems in mathematics that people were unable to solve for many years or even centuries, until one day, suddenly, someone came up with a solution. However, there is a theory that enables us to conclude, for a wide class of problems, that this is extremely unlikely to ever happen. In 1971 S. A. Cook laid the foundations of this theory, called computational complexity. A key concept here is that of N P -completeness.
In this part we first explain the concept of an N P -complete problem. N P -complete problems are extremely unlikely to be efficiently solvable exactly, and therefore in the second chapter of this part we discuss approximation methods.

Chapter 7

N P - completeness

7.1 Why “efficient” means “polynomial”


It will be explained shortly that no efficient algorithms are known for N P -complete problems and, moreover, it
is extremely unlikely that such algorithms will ever be known in the future. Before we do this we need to explain
what “efficient” means. We will argue that this essentially means “polynomial”. So we start with the explanation
of the concept of a polynomial algorithm and then we will explain why “efficient” means “polynomial”. In the next
section we will use this discussion for an (informal) definition of N P-completeness.
Suppose we are given a certain problem Q. First of all realise that this may mean two very different things:
either a general formulation (called simply “problem Q”), such as: “Given real numbers a, b, c, find a complex
number x such that ax2 + bx + c = 0” or a specific formulation with concrete coefficients (called “an instance of Q”)
such as “Find a complex number x such that 3x2 − 7x + 4 = 0”. So the first was what we would call the problem
of solving a quadratic equation, the second was an instance of this problem (a quadratic equation with concrete
coefficients).
If we compare two algorithms for solving a fixed problem Q then it is natural to say that one of these algorithms
is more efficient (than the other) if it requires fewer operations for finding a solution (we will restrict our attention here
to basic arithmetic operations of addition, multiplication, division, checking equality or inequality and swapping
the order, and will assume that they are computationally all equally demanding). So the number of operations that
needs to be carried out by an algorithm is used as a measure of efficiency.
There are two main difficulties with this idea:

(a) typically the bigger the size of an instance, the more operations are needed or, in other words, the number of
operations is a function of the size of an instance (we also call this “the size of input”);

(b) it may happen that even if two instances have the same size the number of operations is different because
of the internal structure of instances (for example if a max-flow problem is solved for two networks with the
same number of nodes and arcs, the number of operations may still be very different because of the structure
of the underlying graphs).

In the case (b) a usual (but not exclusive) convention is to consider the worst possible case, that is the greatest
number of operations over all instances of the same size. The number of operations that an algorithm A needs to
find a solution in the worst case, expressed as a function, say f (L), of the size of input L is called the computational
complexity of A. Shortly below we will give a more formal definition.
For illustration consider the very simple task of calculating the scalar product of two vectors from the definition.
That is if u = (u1 , . . . , un )⊤ and v = (v1 , . . . , vn )⊤ then the scalar product is

u⊤ v = u1 v1 + · · · + un vn .

The size of input is L = n + n = 2n, the number of operations is n (multiplications) + (n − 1) additions
= 2n − 1 = L − 1 = f (L). Thus here f (L) is a linear function and we would therefore say that this (trivial)
algorithm is “of linear complexity”, or just “linear”.
As another simple example consider multiplication of two real n × n matrices, say A = (aij ) and B = (bij ) from
the definition. Thus L = 2n² and we need to calculate

    C = (cij) = AB = ( Σ_{k=1}^{n} aik bkj ).

Each of the n² entries of C is the scalar product of two n-dimensional vectors, hence the total number of
operations (using the example above) is (recall that in these discussions L ≥ 1):

    f(L) = n²(2n − 1) = 2n³ − n² = (1/√2) L^{3/2} − (1/2) L ≤ L².
We may say that the computational complexity is “less than quadratic”.
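As a sanity check of this operation count, the following sketch (in Python; not part of the notes) multiplies two n × n matrices from the definition while counting the multiplications and additions explicitly; the count agrees with n²(2n − 1) and, for L = 2n², stays below L².

def multiply_and_count(A, B):
    # Multiply two n x n matrices from the definition, counting arithmetic operations.
    n = len(A)
    mults = adds = 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = A[i][0] * B[0][j]
            mults += 1
            for k in range(1, n):
                s += A[i][k] * B[k][j]
                mults += 1
                adds += 1
            C[i][j] = s
    return C, mults + adds

n = 5
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j for j in range(n)] for i in range(n)]
_, ops = multiply_and_count(A, B)
L = 2 * n * n                       # the size of input, L = 2n^2
print(ops, n * n * (2 * n - 1))     # 225 225, i.e. f(L) = n^2(2n - 1)
print(ops <= L * L)                 # True: f(L) <= L^2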
More formally, given a problem Q and an algorithm A for solving Q, the computational complexity of A (also
denoted as cc(A)) is the function

fA (L) = sup{op(A, I) | I ∈ inst(Q), |I| = L},

where op(A, I) stands for the number of operations that algorithm A needs to solve instance I of the problem
Q, inst(Q) is the class of all instances of Q and |I| is the size of I. If there is no risk of confusion, we will say
“complexity” rather than “computational complexity”.
We should also give a more precise definition of the size of an instance. In fact, in the theory of computational
complexity there are three main ways of defining the size of an instance:

(a) Size L1 is the number of parameters. Thus if an input is given by one number (integer or real) then L1 = 1.
If it is an m × n matrix A = (aij ) then L1 = mn. Note that this concept of the size was used in the two
introductory examples above, so the symbol L used there was actually L1 .

(b) Size L2 is the number of all individual symbols used to formulate an instance, that is typically the number
of digits and symbols such as decimal points or minus signs. Since the number of decimal points and signs is
normally negligible, compared to the number of digits, we will only consider the number of digits. Note that
if the input is one integer such as 325 then the size of input in the decimal system is L2 = 3 = 1 + ⌊log10 325⌋,
in general the size of an integer k ≠ 0 is L2 = 1 + ⌊log2 |k|⌋, where the base 2 is used since computers work
in the binary; if the input is 0 then obviously L2 = 1. If the input is given by an m × n integer matrix, say
A = (aij), then

    L2 = Σ_{aij ≠ 0} (1 + ⌊log2 |aij|⌋) + Σ_{aij = 0} 1 = mn + Σ_{aij ≠ 0} ⌊log2 |aij|⌋ .

Obviously, L2 is applicable only if the number of symbols of any instance is finite: usually we restrict ourselves
to instances with integer entries.

(c) Size L3 is the sum of the moduli of all parameters. Thus if the input is given by one integer k then L3 = |k|.
If it is an m × n integer matrix A = (aij) then L3 = Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|.
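The three size measures are easy to compute; the sketch below (in Python; not part of the notes) does so for an integer matrix, using the fact that abs(a).bit_length() equals 1 + ⌊log2 |a|⌋ for a ≠ 0.

def sizes(A):
    # The three size measures of an integer matrix A, given row-wise.
    entries = [a for row in A for a in row]
    L1 = len(entries)                                      # number of parameters
    L2 = sum(1 if a == 0 else abs(a).bit_length()          # number of binary digits
             for a in entries)
    L3 = sum(abs(a) for a in entries)                      # sum of moduli
    return L1, L2, L3

print(sizes([[325, 0], [-7, 1]]))   # (4, 14, 333): 325, 7 and 1 have 9, 3 and 1 binary digits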

Clearly, L1 ≤ L2 . We also have L2 ≤ L3 because 1 + ⌊log2 x⌋ ≤ x for any positive real number x. Therefore if
the computational complexity is expressed in terms of L1 using a function f1 , in terms of L2 using a function f2
and in terms of L3 using a function f3 then

f1 (x) ≥ f2 (x) ≥ f3 (x). (7.1)

An algorithm is called strongly polynomial (polynomial, pseudopolynomial) if there is a polynomial p(x) such
that f1 (x) ≤ p(x) (f2 (x) ≤ p(x), f3 (x) ≤ p(x)). It follows from (7.1) that if an algorithm is strongly polynomial
then it is also polynomial and if it is polynomial then it is also pseudopolynomial. In what follows we will only
discuss polynomiality (where L2 is used) but it is important to know that in many cases an algorithm is proved to
be polynomial by actually showing that it is strongly polynomial because this is usually much easier. On the other
hand, if we cannot find a polynomial algorithm for a problem then we try (at least) to find a pseudopolynomial
algorithm.
We will now explain why it is generally accepted that “efficient” means “polynomial”. The table below shows
the actual running times of algorithms depending on the computational complexity.

Size of input →                 20           60            ···    100           1000

↓ Computational complexity
L                               0.002 µs     0.006 µs      ···    0.01 µs       0.1 µs
L²                              0.04 µs      0.36 µs       ···    1 µs          0.1 ms
L⁴                              16 µs        1.296 ms      ···    100 ms        100 s
2^L                             0.1 ms       3 years       ···    10¹² years    —
L!                              7.7 years    —             ···    —             —
The figures in this table are calculated on the assumption that the computer speed is 10 GHz (that is, 10¹⁰
operations per second or, equivalently, one operation is carried out in 10⁻¹⁰ seconds). So for instance the entry in
this table for size 60 and computational complexity L² is 60² operations × 10⁻¹⁰ seconds = 3600 × 10⁻¹⁰ seconds
= 0.36 × 10⁻⁶ seconds = 0.36 µs.
It is clearly visible from the table that the running times are acceptable even for problems of larger sizes as long as
the computational complexity is polynomial (that is, L, L² and L⁴). However, once it is exponential, the running time is
acceptable only for problems of very small sizes.
One may argue that computers are faster and faster and so in the future computational complexity will not
matter very much. The next table shows how much the capability of solving problems will increase in the future
assuming that the computer speed increases 1000 times, depending on the computational complexity. The column
“Size” shows an approximate size of instances of a (fixed) problem that will be solved by algorithms of various
complexities within a time limit within which today we can solve instances of this problem of size 100. For example
if the size solvable today is s = 100 and the (unknown) size in the future is s′ then for complexity L² we have

    (s′)² = 1000 s²  and thus  s′ = s√1000 ≈ 3,162.

Computational complexity        Size

L                               100,000
L²                              3,162
L⁴                              562
2^L                             109
L!                              101
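The entries of this table can be reproduced by a short computation. The sketch below (in Python; not part of the notes) solves for the future size s′ under a 1000-fold speed-up with s = 100: directly for the polynomial complexities, and by simple search for 2^L and L!.

from math import factorial

s, speedup = 100, 1000

print("L  :", speedup * s)                  # 100,000
print("L^2:", round(s * speedup ** 0.5))    # 3,162
print("L^4:", round(s * speedup ** 0.25))   # 562

# largest integer s' with 2^(s') <= 1000 * 2^s
t = 0
while 2 ** (s + t + 1) <= speedup * 2 ** s:
    t += 1
print("2^L:", s + t)                        # 109

# largest integer s' with s'! <= 1000 * s!
t = s
while factorial(t + 1) <= speedup * factorial(s):
    t += 1
print("L! :", t)                            # 101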

Again, we see that there is a clear borderline between algorithms of polynomial and exponential complexities:
faster computers in the future will enable us to solve problems of substantially larger sizes than today, provided
that the complexity is polynomial, but not in the case of algorithms of exponential complexity.
One may argue that to require “polynomial” alone is not enough because, for instance, a complexity of 10²⁰ would
hardly be considered more plausible than 10⁻⁵ · 2^L. But complexities with extremely big and small coefficients
practically never occur.
The above discussion motivates us to accept the convention that “efficient” means “polynomial”.

7.2 Examples of hard problems
The discussion in the previous section justifies the convention that “efficient algorithm” means “polynomial al-
gorithm”. Consequently it is reasonable to accept that a “hard problem” is a problem for which a polynomial
algorithm “does not exist”. However, the expression “does not exist” has two possible interpretations:

a. we (people) do not know a polynomial algorithm (for a selected fixed problem) today but admit that maybe
it will be known in the future, or

b. that it does not exist at all (independently of the human mind).

Obviously, we would like to make the assessment of hardness independently of the state of research, that is (if
true and possible) we would like to know not only that such an algorithm is not known today but that it will never
be known, or at least that it is extremely unlikely that this will ever happen.
We start with the formulations of a few problems for which no polynomial algorithm is known and later on we
will explain why it is extremely unlikely that such an algorithm will be known in the future for these and many
other problems (we will call such problems N P-complete).
Recall that a cycle in a graph G = (V, E) is called Hamiltonian (or a tour ) if it contains all nodes of G and G is
called Hamiltonian if it contains a Hamiltonian cycle.

Problem 7.1 (HAM – Existence of a Hamiltonian cycle:) Given a graph, is it Hamiltonian?


No polynomial method is known for HAM, nor for the digraph version of HAM (sometimes denoted DIRHAM).

Problem 7.2 (TSP – The Travelling Salesman Problem:) Given a complete arc-weighted graph, find a tour
of minimal total length.

No polynomial method is known for the TSP, nor for its digraph version.

Remark 7.3 The TSP problem can be thought of in the following way: A travelling salesman wishes to visit a number
of cities and return to the starting point, in such a way that each city is visited exactly once, and the total distance
covered is as short as possible.

Similarly, no polynomial method is known for the three problems below.

Problem 7.4 (ILP - Integer Linear Programming:) Given an integer matrix A and integer vectors b and c,
find an integer vector x which solves the optimisation problem

min c⊤ x
s. t. Ax = b
x ≥ 0.

Problem 7.5 (INTEGER KNAPSACK) Given integers a1, . . . , an and K, are there integers x1, . . . , xn ≥ 0
such that

    Σ_{j=1}^{n} aj xj = K?

Problem 7.6 (0-1 KNAPSACK) Given integers a1, . . . , an and K, is there a subset S of {1, . . . , n} such that

    Σ_{j∈S} aj = K?
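Although no polynomial algorithm is known for these problems, 0-1 KNAPSACK admits a standard pseudopolynomial dynamic programme whose running time grows with K itself, i.e. with the size measure L3 rather than L2 (compare Section 7.1). The sketch below (in Python; not part of the notes) decides 0-1 KNAPSACK in this way, under the simplifying assumption that the integers aj and K are non-negative.

def zero_one_knapsack(a, K):
    # Is there a subset S of indices with the sum of a_j over S equal to K?
    # Assumes a_j >= 0 and K >= 0; runs in O(nK) time and space.
    reachable = {0}                                   # subset sums reachable so far
    for aj in a:
        reachable |= {s + aj for s in reachable if s + aj <= K}
    return K in reachable

print(zero_one_knapsack([3, 34, 4, 12, 5, 2], 9))    # True  (4 + 5 = 9)
print(zero_one_knapsack([3, 34, 4, 12, 5, 2], 30))   # False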

7.3 N P -completeness
As mentioned before the concept of an N P-complete problem is a formalisation of the intuitive idea that a problem
is “hard”, not only because no polynomial algorithm is known today but it is extremely unlikely that such an
algorithm will ever be known. The question we aim to address in this section is how we know that this is the case
once a problem is given.
Recall that any optimization problem can be represented as

min{cT x | x ∈ S} (7.2)

where S is the feasible set, which is normally defined by functional inequalities or equalities. This problem is
associated with the following decision problem:

For a given k is there an x ∈ S with value cT x ≤ k? (7.3)

Clearly, (7.2) can be easily solved if and only if (7.3) is easy to decide for any k.
A problem whose answer is “yes” or “no” is called a decision problem. So (7.3) is a decision problem.
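One direction of this equivalence can be made explicit: if the optimal value is known to be an integer in a range [lo, hi], then a decision oracle for (7.3) yields the optimum by binary search, using only about log2(hi − lo) oracle calls. The sketch below (in Python; not part of the notes, and the toy oracle is purely illustrative) makes this precise.

def optimal_value(decide, lo, hi):
    # decide(k) answers (7.3): "is there x in S with c^T x <= k?".
    # Assumes S is non-empty and the optimum is an integer in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid          # some x has value <= mid, so the optimum is at most mid
        else:
            lo = mid + 1      # no such x, so the optimum exceeds mid
    return lo

print(optimal_value(lambda k: k >= 42, 0, 1000))   # 42 (the hidden optimum of the toy oracle)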

Definition 7.7 (Complexity class P) A decision problem is in the complexity class P (the class of polynomial-
time solvable problems) if there exists an algorithm for which there exist constants C, p > 0 such that

a) the algorithm gives the correct decision “yes” or “no” for all instances I of the problem, and

b) the number of bit operations and comparisons required to arrive at this answer is bounded by C(L(I))p , where
L(I) denotes the size of the instance I.

Definition 7.8 (Complexity class N P) A decision problem is in the complexity class N P (the class of polynomial-
time checkable problems) if there exist constants C, p > 0 such that every “yes”-instance I can be verified as such
using at most C(L(I))p bit operations and comparisons.

Proposition 7.9 P ⊆ N P.

Proof. For problems in P, both “Yes” and “No”-instances can be correctly decided in polynomial time.
Note that the letters “N P” stand for “non-deterministically polynomial”. The class N P can be viewed as the
class of all “reasonable” problems (including those for which a polynomial solution method is not known and those
which are polynomially solvable). Let us point out that a decision problem lies in N P if once a solution is known
for an arbitrary instance, it can be verified by a polynomial algorithm (but it is not required that a solution can
be found by a polynomial algorithm).
For instance if a solution to HAM is known it can be easily verified (that is it is easy to verify that a sequence
is a cycle in the graph, passing through all nodes of this graph) although it is in general not easy to find such a
cycle; but it is not clear whether NON-HAM is in N P, where NON-HAM is the question: given a graph, is it non-Hamiltonian?
Indeed, what certificate could be used to verify efficiently that a graph is non-Hamiltonian, even if this is known to be true?
The five problems presented in the previous section, for which existence of polynomial algorithms is not known,
are in N P .
It is a long-standing open problem to prove that P ≠ N P. Equality would of course mean that all reasonable
problems are polynomially solvable. The reason why this conjecture is believed to be true is that the class N P
includes some notoriously hard problems. So far, the best one can do to prove that a polynomial-time checkable
problem is intractable is to show that it is “hardest” among all problems in N P. The conjecture P ≠ N P remains
one of the greatest unsolved conjectures of mathematics.

Definition 7.10 (N P-hard) A decision problem P is N P-hard if for all problems Q ∈ N P there exists a mapping
θ that transforms any instance I of Q into an instance θ(I) of P , such that for some C, p > 0,

a) the number of bit operations and comparisons required for computing θ(I) is bounded by C(L(I))p , and

b) the output (i.e. “yes/no”) for the instance I is the same as the output of θ(I).

This is a sensible notion of hardness because if an N P-hard problem Q were polynomial-time solvable then it would
follow that N P ⊆ P. Thus, if a problem Q is N P-hard then Q is extremely unlikely to be polynomially solvable,
because its polynomial solvability would imply that all reasonable problems are polynomially solvable.

Definition 7.11 (N P-complete) A polynomial-time checkable problem (i.e. a problem in N P) is called N P-complete if it is N P-hard.

The class of N P-complete problems will be denoted by N PC. Figure 7.1 provides an overview of the defined
concepts.

Figure 7.1: The class N P

The methodology for proving that a problem is N P-complete is beyond the scope of this course; recommended
reading for this is [1] and [2]. Important examples of N P-complete problems are all problems of the previous
section, that is HAM, TSP, ILP, KNAPSACK and 0-1 KNAPSACK. Examples of problems in P are LP, MFP, SPP
and AP.
Interestingly, some important special cases of the above mentioned N P problems are also N P-complete. We
discuss this now because it will be important in the next chapter.
The TSP is called metric (notation ∆TSP) if the direct-distances matrix A = (aij ) satisfies the triangle inequal-
ity, that is
aik + akj ≥ aij (7.4)
for all i, j, k.
The graph of Figure 7.2 is an instance of the ∆TSP (verification of all inequalities is left as an exercise) whereas
that of Figure 7.3 is not (for instance 1 + 2 < 7).
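Checking the triangle inequality from the definition takes n³ comparisons; a small sketch (in Python; not part of the notes) is given below, applied to an illustrative 4 × 4 direct-distances matrix (the same matrix reappears as Figure 8.6 in the next chapter).

def is_metric(A):
    # Does A satisfy a_ik + a_kj >= a_ij for all i, j, k?
    n = len(A)
    return all(A[i][k] + A[k][j] >= A[i][j]
               for i in range(n) for j in range(n) for k in range(n))

A = [[0, 12, 9, 8],
     [12, 0, 15, 13],
     [9, 15, 0, 7],
     [8, 13, 7, 0]]
print(is_metric(A))   # True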
In what follows the word “distance” may have various practical meanings: geographical distance, time, cost, . . .
The TSP is called Euclidean (notation ETSP) if the direct-distances matrix A = (aij ) is the matrix of Euclidean
distances between points in a plane with the Cartesian coordinate system, that is R². For instance, given four
points z1 , z2 , z3 , z4 in the x-y plane we may create a direct-distances matrix A = (aij ), where

aij = d(zi , zj )

Figure 7.2: An instance of ∆TSP (a complete arc-weighted graph on the nodes a, b, c, d)

Figure 7.3: An instance of a general TSP (a complete arc-weighted graph on the nodes a, b, c, d)

and for a = (x1, y1), b = (x2, y2) the symbol d(a, b) stands for the Euclidean distance, that is

    d(a, b) = √((x1 − x2)² + (y1 − y2)²).

It follows that every ETSP is ∆TSP. Both ETSP and ∆TSP are proved to be N P-complete [2].
We finish this chapter with an important remark. The hardness of two problems may be very different even if
they are very similar. The table below displays pairs of closely formulated problems of profoundly different hardness:

In P            N P-complete
LP              ILP
AP              3AP
AP              TSP
MINCUT          MAXCUT
SEPP+           LEPP+

In this table SEPP+ (LEPP+) stands for the shortest (longest) elementary path problem in a digraph with
non-negative weights of arcs and 3AP for the 3-dimensional assignment problem.

Chapter 8

Algorithms for N P - complete problems

The aim of this chapter is to present approximation algorithms for N P- complete problems. We will restrict
our attention to ∆TSP where feasible solutions are tours (Hamiltonian cycles) in graphs. So the algorithms will
produce tours in given graphs. The construction is based on a related concept of an Eulerian walk in a multigraph
(see Section 8.2) whose construction is easy. A key feature of the theory presented in this chapter is that we will
provide bounds on the relative error (see Section 8.1) of the considered algorithms. This makes them different from
heuristics. We therefore start with the definition and explanation of the relative error.

8.1 Relative error


In order to quantitatively assess the quality of an algorithm we need to define the relative error of an algorithm.
Let Q be a combinatorial optimisation problem (minimisation). As before, we denote by inst(Q) the set of all
instances of Q. Let c > 0 be the cost function on the set of feasible solutions of an instance and A an (approximation)
algorithm for Q. Hence A can be seen as a mapping that assigns to every I ∈inst(Q) a feasible solution f (I) (not
necessarily optimal). The cost of f (I) therefore is c(f (I)). Let c∗ (I) be the optimal cost for I ∈inst(Q). Then the
relative error of A with respect to instance I is defined as

    re(A, I) = (c(f(I)) − c∗(I)) / c∗(I).

Finally, if ε ≥ 0 then A is called ε-approximate for Q if re(A, I) ≤ ε for all I ∈inst(Q).


For example if Q=TSP then inst(Q) is the set of all arc-weighted graphs, such as those in Figure 8.1 (and
an infinite number of other arc-weighted graphs). Feasible solutions are Hamiltonian cycles. For I = I1 all three
feasible solutions are in Figure 8.2. Clearly c(σ1 ) = 17, c(σ2 ) = 20, c(σ3 ) = 13, and so c∗ (I) = 13. Let us apply
GREEDY to find a solution of the TSP associated with I1 . GREEDY will accept the arcs of weights 1 and 2, reject
3, accept 4, reject 5 and accept 10. So f(I1) = σ1 and therefore re(GREEDY, I1) = (17 − 13)/13 = 4/13.
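For tiny instances the optimal cost c∗(I) can be found by brute force, which makes it possible to compute relative errors exactly. The sketch below (in Python; not part of the notes) does this for the TSP, using the 4 × 4 direct-distances matrix that appears again in Section 8.2.

from itertools import permutations

def tour_cost(A, tour):
    # Total length of a tour given as a sequence of nodes (the return arc is included).
    return sum(A[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def optimal_tour_cost(A):
    # c*(I) by brute force, fixing node 0 as the starting point; only for small n.
    n = len(A)
    return min(tour_cost(A, (0,) + p) for p in permutations(range(1, n)))

def relative_error(A, tour):
    c_star = optimal_tour_cost(A)
    return (tour_cost(A, tour) - c_star) / c_star

A = [[0, 12, 9, 8],
     [12, 0, 15, 13],
     [9, 15, 0, 7],
     [8, 13, 7, 0]]
print(optimal_tour_cost(A))              # 41
print(relative_error(A, (0, 1, 2, 3)))   # (42 - 41)/41 ≈ 0.024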

8.2 Multigraphs and Eulerian spanning graphs


The main tool for finding approximate solutions to ∆TSP will be that of an Eulerian walk. We need to introduce
a few new concepts before we give its definition.
A multigraph is a diagram that can be obtained from an undirected graph by adding a nonnegative number of
parallel arcs (hence every graph is also a multigraph), see Figure 8.3.
A walk in a multigraph is a sequence of nodes and arcs (v1 , e1 , v2 , e2 , . . . , vk ) where each ei is an arc with
endpoints vi and vi+1 (for i ∈ {1, . . . , k − 1}).

Figure 8.1: Three instances of the TSP: I1, I2 and I3.

Figure 8.2: All three feasible solutions for I1: σ1, σ2 and σ3.

Figure 8.3: An example of a multigraph

Figure 8.4: An illustration of walks in a multigraph (nodes v1, . . . , v4 and arcs f1, . . . , f8).

If v1 = vk then the walk is called closed. An Eulerian walk in a multigraph is a walk satisfying the following
properties:

(1) the walk is closed,

(2) every node appears in the walk at least once and

(3) every arc appears in the walk exactly once.

In the multigraph of Figure 8.4 the sequence (v1 , f8 , v4 ) is not a walk, (v1 , f1 , v2 , f2 , v1 , f3 , v2 ) is a walk but
not closed, (v1 , f1 , v2 , f2 , v1 , f3 , v2 , f4 , v1 ) is closed but not Eulerian and

v = (v1 , f1 , v2 , f2 , v1 , f3 , v2 , f4 , v1 , f6 , v4 , f8 , v2 , f7 , v3 , f5 , v1 )

is Eulerian.
A multigraph G is called Eulerian if there is an Eulerian walk in G. Probably the first theorem of graph theory
(motivated by the famous problem of Königsberg bridges) is the following:

Theorem 8.1 (Euler, 1736) A multigraph G = (V, E) is Eulerian if and only if

(1) G is connected and

(2) the degree of every node is even.

We will not prove this theorem here but it is noted that the necessity of (1) is trivial and that of (2) almost
immediate since an Eulerian walk enters and leaves every node every time by different arcs (and uses all arcs),
implying that there must be an even number of arcs incident with each node. The converse implication is slightly
less trivial and is essentially an algorithm for finding an Eulerian walk. We will need to construct Eulerian walks
in examples but they will all be very simple, solvable by inspection and so this part of the proof is skipped.
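For completeness, the construction behind the converse implication can be sketched as follows (in Python; not part of the notes). The standard Hierholzer-type procedure below returns a closed Eulerian walk of a connected multigraph in which every degree is even, the multigraph being given as a list of unordered arc pairs; the example arc list is a reading of Figure 8.4 taken from the Eulerian walk v displayed above.

def eulerian_walk(arcs, start):
    # Closed Eulerian walk (as a node sequence) in a connected multigraph
    # with all degrees even; arcs is a list of unordered pairs, possibly repeated.
    from collections import defaultdict
    incident = defaultdict(list)              # node -> indices of incident arcs
    for idx, (u, v) in enumerate(arcs):
        incident[u].append(idx)
        incident[v].append(idx)
    used = [False] * len(arcs)
    stack, walk = [start], []
    while stack:
        v = stack[-1]
        while incident[v] and used[incident[v][-1]]:
            incident[v].pop()                 # discard arcs traversed already
        if incident[v]:
            idx = incident[v].pop()
            used[idx] = True
            a, b = arcs[idx]
            stack.append(b if a == v else a)  # walk along the arc to its other endpoint
        else:
            walk.append(stack.pop())          # backtrack: v has no unused arcs left
    return walk[::-1]

# the multigraph of Figure 8.4: f1-f4 join v1 and v2, f5 = v1v3, f6 = v1v4, f7 = v2v3, f8 = v2v4
arcs = [('v1', 'v2'), ('v1', 'v2'), ('v1', 'v2'), ('v1', 'v2'),
        ('v1', 'v3'), ('v1', 'v4'), ('v2', 'v3'), ('v2', 'v4')]
print(eulerian_walk(arcs, 'v1'))   # a closed walk through all 8 arcs, starting and ending at v1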
Thus we assume in what follows that once the conditions of Theorem 8.1 are satisfied an Eulerian walk can be
found. The aim of this section is to devise algorithms which produce tours (as close as possible to optimal tours).
These algorithms construct tours from Eulerian walks and we will now explain how this is done.
Let A = (aij ) be a symmetric n × n matrix with zero diagonal. An Eulerian spanning graph of A (ESG) is
any Eulerian multigraph G = (V, E) with V = {1, . . . , n}. The cost of G is

    c(G) = Σ_{(i,j)∈E} aij ,

Figure 8.5: Multigraphs G1–G4; G1–G3 are Eulerian spanning graphs.

Figure 8.6: An instance of ∆TSP.

where the summation is carried out for all (possibly multiple) arcs. Hence c(G) is equal to the cost of any Eulerian
walk in G. For example, if

        ( 0  4  3 )
    A = ( 4  0  1 )
        ( 3  1  0 )

then each of the multigraphs G1 , G2 and G3 in Figure 8.5 is an ESG of A. Note that G4 is not, since it is not
Eulerian. We calculate easily c(G1 ) = 8, c(G2 ) = 14, c(G3 ) = 10.
Given a ∆TSP in the form of a direct-distances matrix A we will consider various ESGs for A. Once we have
selected an ESG we will construct a tour, called the embedded tour, from any Eulerian walk w as follows: Start
with the first node in w and then, in the order of appearance, repeatedly take the first node in w that has not been
taken yet; at the end add the first node.
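This shortcutting step is easy to code; a small sketch (in Python; not part of the notes) follows, applied to the node sequence of the Eulerian walk v that appears in the example below.

def embedded_tour(walk):
    # Keep the first appearance of each node of the walk, then return to the start.
    seen, tour = set(), []
    for v in walk:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    tour.append(walk[0])
    return tour

w = [1, 2, 1, 4, 3, 2, 3, 1]     # node indices of the Eulerian walk v of Figure 8.7
print(embedded_tour(w))          # [1, 2, 4, 3, 1], i.e. the tour (v1, v2, v4, v3, v1)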
For example consider the ∆TSP for the graph of Figure 8.6. We observe that in this graph there are three tours
of lengths 41, 42 and 45. So the optimal solution to the ∆TSP has length 41. The corresponding direct-distances
matrix is

        ( 0   12    9    8 )
    A = ( 12   0   15   13 )
        ( 9   15    0    7 )
        ( 8   13    7    0 )
One possible ESG is displayed in Figure 8.7. It has total length 78. One of many Eulerian walks in this
multigraph (each must be of length 78) is

v = (v1, e1, v2, e2, v1, e7, v4, e6, v3, e3, v2, e4, v3, e5, v1),

with the embedded tour τ = (v1 , v2 , v4 , v3 , v1 ). We see that c(τ ) = 41, which is optimal as we already know, but
this is rather because we were lucky with the selection of the ESG and w. To see this consider another ESG, of
Figure 8.8, its cost is 56 and for the Eulerian walk (writing node indices only) w = 1213431 the embedded tour is
12341, of cost 42.

77
e1

e2
v1 v2
e3

e4
e5

e7

v3 v4
e6

Figure 8.7: An ESG for the ∆TSP of Figure 8.6.

Figure 8.8: Another ESG for the ∆TSP of Figure 8.6: two parallel copies of each of the arcs v1v2, v1v3 and v3v4, of weights 12, 9 and 7.

8.3 Approximation algorithms for ∆TSP


We will present two approximation algorithms for ∆TSP called TREE and CHRISTOFIDES’ ALGORITHM. The
first is simpler and faster but the second has substantially better (that is smaller) relative error. We will find a
bound on the relative error for both algorithms. For maximum generality we will assume that the ∆TSP is given
on a complete graph (if necessary, missing arcs can be added with weights +∞). In examples we will draw only arcs
that are being discussed but we should be aware that all arcs are present in the background (although not visible).
For the discussion below it is useful to realise that an instance of the ∆TSP for a complete graph can be
given in two equivalent ways: as a symmetric matrix with zero diagonal, say A = (aij ) ∈ Rn×n , or as a complete
arc-weighted graph (Kn , (aij )) (but only arcs necessary for the calculations are drawn).

Theorem 8.2 Let (Kn , (aij )) be an instance of ∆TSP with associated symmetric direct-distances matrix A = (aij ).
If G = (V, E) is an Eulerian spanning graph for A = (aij ) then every tour in Kn embedded in an Eulerian walk in
G has length not exceeding the cost of G.

Proof. Consider an instance I of the ∆TSP with associated symmetric direct-distances matrix A = (aij ), recall
that A is symmetric.
First we prove the generalised triangle inequality, that is

aj1 js ≤ aj1 j2 + aj2 j3 + aj3 j4 + ... + ajs−1 js (8.1)

for any j1 , j2 , . . . , js in 1, . . . , n, for s ≥ 3. To do this, start with the RHS and repeatedly use the triangle

inequality (7.4) starting from the first two terms:

RHS ≥ aj1 j3 + aj3 j4 + aj4 j5 + . . . + ajs−1 js


≥ aj1 j4 + aj4 j5 + . . . + ajs−1 js
..
.
≥ aj1 js−1 + ajs−1 js
≥ aj1 js .
Now, let G be an ESG of I and w be an Eulerian walk in G. Let τ be the tour embedded in w. We have to prove
that c(τ ) ≤ c(G). We use the fact that c(G) = c(w). The walk w has the form

w = (i1 , α1 , i2 , α2 , i3 , α3 , . . . , in , αn , i1 ),

where (ir , αr , ir+1(mod n) ) is a walk (r = 1, . . . , n) and τ = (i1 , i2 , . . . , in , i1 ) is the embedded tour, where repeated
ij are not listed. Hence using the generalized triangle inequality (8.1) we have

c(τ ) ≤ ai1 i2 + ai2 i3 + . . . + ain i1


≤ c(i1 , α1 , i2 ) + c(i2 , α2 , i3 ) + . . . + c(in , αn , i1 )
= c(w) = c(G).

We are now ready to formulate the first of the two algorithms.


Algorithm 8.3 (TREE)
Input: ∆TSP given by an n × n symmetric direct-distances matrix A = (aij ) ≥ 0.
Output: A tour in Kn .
(1) Find a minimal spanning tree T in (Kn , (aij )) (note: GREEDY may be used)

(2) Create a multigraph G = (V, E) by using two copies of each arc of T .

(3) Find an Eulerian walk w in G and the tour embedded in w.

Remark 8.4 The degree of every node in G is twice its degree in T , hence even. G is also connected as it contains a
spanning tree (namely T ) as a sub-graph. Therefore by Theorem 8.1 G is an ESG of A.
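A compact sketch of TREE (in Python; it is not the notes' pseudocode and makes some particular choices) is given below. Step (1) uses Prim's algorithm; for steps (2) and (3) it exploits the fact that, for the particular Eulerian walk produced by a depth-first traversal of the doubled tree, the embedded tour simply lists the nodes of T in depth-first preorder. The test matrix is the one of Figure 8.6.

def tree_algorithm(A):
    # Algorithm 8.3 (TREE) for a symmetric direct-distances matrix A;
    # returns a tour as a list of node indices, ending back at the start node.
    n = len(A)
    # (1) minimal spanning tree T by Prim's algorithm on the complete graph
    in_tree, tree_edges = {0}, []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: A[e[0]][e[1]])
        tree_edges.append((i, j))
        in_tree.add(j)
    # (2)+(3) doubling every arc of T and shortcutting the Eulerian walk given by a
    # depth-first traversal of the doubled tree lists the nodes of T in DFS preorder
    adj = {v: [] for v in range(n)}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    tour, seen, stack = [], set(), [0]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        tour.append(v)
        stack.extend(adj[v])
    return tour + [0]

A = [[0, 12, 9, 8],
     [12, 0, 15, 13],
     [9, 15, 0, 7],
     [8, 13, 7, 0]]
tau = tree_algorithm(A)
print(tau, sum(A[tau[k]][tau[k + 1]] for k in range(len(tau) - 1)))
# [0, 1, 3, 2, 0] 41 -- here the result happens to be optimal; in general TREE only guarantees c(tau) <= 2 c(tau*)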

Theorem 8.5 TREE is a 1-approximate algorithm for ∆TSP.

Proof. Let T be the minimal spanning tree, G the multigraph and τ the tour, all found by TREE. Let τ ∗ be an
optimal tour and T ∗ any tree obtained from τ ∗ by removing one arc. Hence we have
c(τ ) ≤ c(G)
= 2c(T )
≤ 2c(T ∗ )
≤ 2c(τ ∗ )
where the first relation is by Theorem 8.2, the second follows from the construction of G in step (2) of TREE, the third
by minimality of T and the fourth because A ≥ 0. By re-arranging we obtain

    (c(τ) − c(τ∗)) / c(τ∗) ≤ 1,

which completes the proof.
The next lemma is probably well known but is included here in full because it is essential for the second algorithm
(see below) and has a simple proof.

Lemma 8.6 In any graph the number of nodes of odd degree is even.

Proof. Let G = (V, E) be a graph. Then

    Σ_{v∈V} deg(v) = 2|E|,

because every arc is counted in the summation exactly twice. Hence the number of odd terms in the sum is even.

We are ready to formulate the final algorithm. Note that it differs from the previous algorithm only in the
second step.

Algorithm 8.7 (CHRISTOFIDES)


Input,Output: As in TREE.

(1) Find a minimal spanning tree T in (Kn , (aij )) (note: GREEDY may be used).

(2) Take the nodes of odd degrees in T and find a perfect matching M of minimal total weight in the complete
graph induced by these nodes. Let G be the multigraph with arcs from T and M .

(3) Find an Eulerian walk w in G and the tour embedded in w.

Remark 8.8 When the perfect matching M in step (2) is added to T , it will increase the degree of every node with
odd degree by 1 and will not affect the nodes of even degree. Therefore every node of G constructed in step (2)
will have even degree. G is also connected as it contains a spanning tree (namely T ) as a sub-graph. Therefore by
Theorem 8.1 G is an ESG of A.
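Step (2) requires a minimum-weight perfect matching on the nodes of odd degree in T, of which there is an even number by Lemma 8.6. Polynomial algorithms for this exist but, as noted at the end of the chapter, are beyond the scope of the course; the sketch below (in Python; not part of the notes) finds such a matching by brute-force recursion, which is adequate for the small examples considered here.

def min_weight_perfect_matching(nodes, A):
    # Minimum-weight perfect matching on an even set of nodes of (Kn, (aij)),
    # by brute-force recursion; returns (total weight, list of matched pairs).
    def solve(rest):
        if not rest:
            return 0, []
        i, others = rest[0], rest[1:]
        best_cost, best_pairs = float('inf'), []
        for k, j in enumerate(others):
            cost, pairs = solve(others[:k] + others[k + 1:])
            if A[i][j] + cost < best_cost:
                best_cost, best_pairs = A[i][j] + cost, [(i, j)] + pairs
        return best_cost, best_pairs
    return solve(tuple(sorted(nodes)))

A = [[0, 12, 9, 8],
     [12, 0, 15, 13],
     [9, 15, 0, 7],
     [8, 13, 7, 0]]
# for the minimal spanning tree T = {01, 03, 23} of this matrix, nodes 1 and 2 have odd degree in T
print(min_weight_perfect_matching([1, 2], A))   # (15, [(1, 2)])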

Theorem 8.9 Christofides’ algorithm is a 1/2-approximate algorithm for ∆TSP.

Proof. Let T be the minimal spanning tree, M the matching, G the multigraph and τ the tour, all found by
Christofides’ algorithm. Let τ ∗ be an optimal tour and T ∗ any tree obtained from τ ∗ by removing one arc. Hence
by Theorem 8.2 we have
c(τ ) ≤ c(G) = c(T ) + c(M ). (8.2)

The inequality
c(T ) ≤ c(T ∗ ) ≤ c(τ ∗ ) (8.3)

holds by minimality of T and since A ≥ 0.


Let i1 , i2 , i3 , . . . be the nodes of odd degree in T in the order they are visited by τ ∗ . Consider the following two
matchings:
M1 = {i1 i2 , i3 i4 , . . .}, M2 = {i2 i3 , i4 i5 , . . .}.

Since M is a minimal matching we have

2c(M ) = c(M ) + c(M ) ≤ c(M1 ) + c(M2 ). (8.4)

But by the generalized triangle inequality (8.1), see also Figure 8.9 (where the zig-zag line corresponds to τ ∗ ),
we have:
c(M1 ) + c(M2 ) ≤ c(τ ∗ ). (8.5)

Inequalities (8.4) and (8.5) imply


    c(M) ≤ (1/2) c(τ∗),

Figure 8.9: To the proof of Theorem 8.9

which together with (8.2) and (8.3) yield


    c(τ) ≤ (3/2) c(τ∗).
By re-arranging we obtain
    c(τ) − c(τ∗) ≤ (1/2) c(τ∗),
which completes the proof.
Note that statistical tests show that the average relative error of CHRISTOFIDES’ ALGORITHM is about
19.5% which is currently the best result for all algorithms for which an estimate of the relative error is known.
Finding a perfect matching of minimal total weight in step (2) is beyond the scope of this course, but polynomial
algorithms for this problem exist [1]. Hence Christofides’ algorithm is polynomial.

Bibliography

[1] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity.
Dover Publications, 1998.

[2] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H.
Freeman, 1979.

[3] R. Bronson. Theory and Problems of Operations Research. McGraw-Hill, 1982.

[4] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice
Hall, 1993.
