CO Lecture Notes
2024
Contents
1 Introduction
5 Transportation problem
5.1 Basic properties of the transportation problem
5.2 An application of the primal-dual algorithm: algorithm ALPHA–BETA
7 NP-completeness
7.1 Why “efficient” means “polynomial”
7.2 Examples of hard problems
7.3 NP-completeness
8 Algorithms for NP-complete problems
8.1 Relative error
8.2 Multigraphs and Eulerian spanning graphs
8.3 Approximation algorithms for ∆TSP
Chapter 1
Introduction
Combinatorial optimisation is a key area of discrete optimisation. It deals with optimisation problems formulated
in combinatorial terms. A substantial part of combinatorial optimisation is based on linear programming. We will
therefore review essential bits of linear programming at the beginning of this course, with emphasis on duality.
There are a good number of problems in combinatorial optimisation, for which no efficient solution method is
known (the meaning of the word “efficient” is explained in Section 7.1). We will pay special attention to these
problems towards the end of the course.
These lecture notes are based mainly on [1]. The classical monograph [2] is recommended for an advanced
reading on computational complexity. Additional examples can be found in [3] and [4].
All numbers, vectors and matrices in these notes are real. The symbol ⊤ indicates the transposition of a matrix.
The vectors are one-column matrices, therefore for instance the scalar (inner) product of vectors x and y is written
as x⊤ y.
Part I
Chapter 2
Here m, n ∈ N; cj , aij , bi ∈ R for all i ∈ {1, . . . , m}, j ∈ {1, . . . , n}; and the brackets ⟨⟩ indicate that for every i
exactly one of the operators inside applies. Notice that in general not all variables are required to be non-negative.
But strict inequalities are not allowed in linear programming problems that we study here.
In optimisation it does not matter too much whether we maximise or minimise, because the minimisation of a
function f (x) is essentially the same task as maximisation of −f (x). So we may assume without loss of generality
that our task is the minimisation (say). This is what we will normally assume (unless stated otherwise). The
inequalities with ≥ can be transformed to inequalities with ≤ (and the other way round) simply by multiplying
both sides by (−1). Also, a condition (symbolically)
LHS ≤ bi
Hence any linear program can easily be transformed to a linear program with minimisation, equations only and all
variables non-negative, that is to:
min f (x) = c⊤ x
s.t. Ax = b, (2.2)
x ≥ 0.
Here f is called the objective function, A ∈ Rm×n , c = (c1 , . . . , cn )⊤ ∈ Rn , b = (b1 , . . . , bm )⊤ ∈ Rm are given
coefficient matrix and coefficient vectors, x = (x1 , . . . , xn )⊤ ∈ Rn is the sought vector of unknown variables,
and (2.2) is called a linear program in standard form (SLP). In contrast, the following linear program
min c⊤ x
s.t. Ax ≥ b, (2.3)
x ≥ 0
is called a linear program in canonical form (CLP).
We start by multiplying the first inequality by (−1), then we add a slack variable in each of the two inequalities
and replace x1 by the difference of two new non-negative variables. Finally we multiply the objective function by
(−1) and we get the following SLP:
We will now introduce some terminology for the SLP (linear programs in the standard form). This is done
similarly for CLP or general linear programs.
The set of feasible solutions is
M = {x ∈ Rn : Ax = b, x ≥ 0},
method is not polynomial since it behaves exponentially in extreme cases. However (and this is also important for combinatorial optimisation), linear programs are polynomially solvable by other methods, such as the ellipsoid method, Karmarkar's method and, more recently, interior point methods. We are not going to review SIMPLEX here, but we need to see a fundamentally important consequence of this method, which is summarised in the following statement offering exactly three alternatives for each linear program.
Theorem 2.2 (Fundamental Theorem of Linear Programming) For every linear program (LP) exactly one of the following holds:
1. LP has an optimal solution;
2. LP is infeasible (it has no feasible solutions);
3. the objective function of LP is unbounded.
The third of these alternatives means that LP is not bounded from below, i.e., it is unbounded.
Due to this fundamental result, by a “solution” of a linear program we will always understand (unless stated
otherwise) either finding an optimal solution, or deciding that the linear program is infeasible or that the objective
function is unbounded.
3x1 + x2 ≥ 7 −7π1 + π2 ≤ −4
x1,2 ≥ 0 π1,2 ≥ 0
Note that the coefficients of f (x) are the same as the right-hand side constants of the second problem and
similarly, the coefficients of g(π) are the same as the right-hand side constants of the first problem. Also, the matrix
of the left-hand side coefficients of the second problem is obtained by the transposition of the same matrix in the
first problem.
Let us replace the constants of f (x) using the right-hand sides of the second problem and manipulate:
In this example, it follows that f (x) ≥ g(π) for any feasible solution x to the first problem and feasible solution π
to the second problem. It is obvious that many such examples could be constructed where pairs of linear programs
are strongly related. Several questions then arise, for instance, can equality f (x) = g(π) be attained? Or if one of
the problems is infeasible, what can be said about the other problem?
Linear programming originated in the 1940s and already in the first years the following interesting phenomenon
was observed: there is a scheme by which every linear program can be associated with another linear program
(minimisation to maximisation or vice versa), called dual, so that the properties of these programs expressed in
Theorem 2.2 are strongly related as in the example above. This may seem to be of little practical importance at
first but the subsequent years showed that duality is actually very useful, and in particular it is a powerful tool
for constructing algorithms for combinatorial optimisation problems. The first part of this course will demonstrate
this.
It is important to understand with confidence how a dual problem to a given linear program is constructed. We
will now explain this in some detail. We will first show it for general linear programs as described in (2.1). However,
for ease of discussion we will assume that all inequalities use the same sign, we fix it to be ≥. So if necessary you
will need to multiply the inequalities with ≤ by (−1). (Keep this convention in mind to avoid a disappointment!)
So our linear program now has the form
min f (x) = c⊤ x
s.t. a⊤i x = bi ,  i ∈ I,
     a⊤i x ≥ bi ,  i ∈ Ī,        (2.4)
     xj ≥ 0,  j ∈ J,
     xj ≥≤ 0,  j ∈ J̄,
where the sign ≥≤ indicates that there is no restriction on xj . The vectors a⊤i , for i ∈ I, are the vectors of coefficients of the left-hand sides of the equations; similarly a⊤i , i ∈ Ī, for the inequalities. The set J (respectively, J̄) is the set of indices of the non-negative (respectively, free) variables. We will call this problem primal.
In the construction of the dual problem it is important to bear in mind that we assign dual variables to primal
constraints and dual constraints to primal variables. So the correspondence is not variable-variable and constraint-
constraint! This is a very frequent mistake that should be avoided. In order to define the dual problem we first
form the matrix A of all left-hand side coefficients. This matrix can be written row-wise as
A = ( a⊤i , i ∈ I ; a⊤i , i ∈ Ī ),
that is, the rows of A are the vectors a⊤i for i ∈ I followed by the vectors a⊤i for i ∈ Ī.
A dual variable is assigned to each primal constraint; this variable is free if it is assigned to an equation and
non-negative if it is assigned to an inequality.
A dual constraint is assigned to every primal variable; this constraint is an equation if it is assigned to a free
variable and it is an inequality if it is assigned to a non-negative variable.
The matrix of the left-hand side coefficients of the dual is obtained by the transposition of this matrix in the
primal problem (because π ⊤ Aj = A⊤ j π).
The coefficients of the dual objective function are equal to the right-hand side coefficients of the primal
problem.
Primal (min f (x) = c⊤ x)               Dual (max g(π) = π⊤ b)
a⊤i x = bi ,  i ∈ I                     πi ≥≤ 0 (free)
a⊤i x ≥ bi ,  i ∈ Ī                     πi ≥ 0
xj ≥ 0,  j ∈ J                          π⊤ Aj ≤ cj
xj ≥≤ 0 (free),  j ∈ J̄                  π⊤ Aj = cj
The right-hand side coefficients in the dual problem are equal to the coefficients of the primal objective
function.
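The rules above are mechanical, so they are easy to automate. The following Python sketch (not part of the original notes; the function name, the data layout and the illustrative numbers are my own) builds the dual data of a primal in the form (2.4) exactly as prescribed by Table 2.2.

import numpy as np

def build_dual(A, b, c, eq_rows, nonneg_vars):
    # Dual of:  min c^T x  s.t.  a_i^T x = b_i (i in eq_rows),  a_i^T x >= b_i (other i),
    #           x_j >= 0 (j in nonneg_vars),  x_j free (other j).
    m, n = A.shape
    dual_objective = b.copy()                       # dual objective: maximise pi^T b
    dual_lhs = A.T.copy()                           # left-hand sides of the dual: the columns of A
    dual_rhs = c.copy()                             # dual right-hand sides come from c
    pi_is_free = [i in eq_rows for i in range(m)]                    # pi_i free  <=>  row i is an equation
    dual_row_is_equation = [j not in nonneg_vars for j in range(n)]  # equation   <=>  x_j is free
    return dual_objective, dual_lhs, dual_rhs, pi_is_free, dual_row_is_equation

# Illustrative data only (two constraints: an inequality and an equation; x1 free, x2 >= 0):
A = np.array([[2.0, 1.0], [1.0, -1.0]])
b = np.array([4.0, 1.0])
c = np.array([3.0, 5.0])
print(build_dual(A, b, c, eq_rows={1}, nonneg_vars={1}))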
Example 2.4 Consider the linear program (slightly different from that of the previous example):
The dual will have two variables π1 and π2 because the primal problem has two constraints. The first dual variable
will be non-negative since it is assigned to an inequality, the second will be free since it is assigned to an equation.
There will be two dual constraints, the first an equation since it is assigned to a free variable x1 , the second will
be an inequality since it is assigned to a non-negative variable x2 . Finally, after swapping the coefficients of the
objective function with the ones on the right-hand side we obtain the dual as follows:
Theorem 2.5 (On Symmetry) The dual problem to the dual problem is equal to the primal problem.
Due to the Theorem on Symmetry (Theorem 2.5) it is more appropriate to speak about a “pair of primal-dual
problems”, rather than about a primal and dual problem.
It will be practical to use the following notation: MP and MD denote the sets of feasible solutions to the primal and to the dual problem, respectively, while MPopt and MDopt denote the corresponding sets of optimal solutions.
Theorem 2.6 (Weak Duality Theorem) For any x ∈ MP and any π ∈ MD , we have c⊤ x ≥ π ⊤ b.
It is also quite important to know that, if x and π are feasible solutions and the values of objective functions
are the same, then x and π are optimal.
Corollary 2.7 Let x ∈ MP and π ∈ MD (i.e., x and π are primal feasible and dual feasible) and c⊤ x = π⊤ b. Then x ∈ MPopt and π ∈ MDopt (i.e., x and π are optimal).
The unboundedness of the primal problem implies that the dual problem is infeasible, and vice versa.
The following claim is the most nontrivial and striking statement of the duality theory of LP.
Theorem 2.9 (Strong Duality Theorem) If one of the problems of a primal-dual pair has an optimal solution then so does the other, and the optimal objective function values are equal, that is,
min c⊤ x = max π⊤ b.
Corollary 2.10 For every primal-dual pair exactly one of the possibilities indicated in the table below holds.
                           MPopt ≠ ∅     min_{x∈MP} c⊤ x = −∞     MP = ∅
MDopt ≠ ∅                     ✓                  ✗                   ✗
max_{π∈MD} π⊤ b = +∞          ✗                  ✗                   ✓
MD = ∅                        ✗                  ✓                   ✓
If a linear program is special then it may be unnecessary to use the general Table 2.2 for finding the dual
problem. The special schemes in Table 2.3 are easily derived from Table 2.2 and may help us to find the dual more
quickly.
Primal: min c⊤ x, Ax ≥ b, x ≥ 0        Dual: max π⊤ b, π⊤ A ≤ c⊤ , π ≥ 0
Primal: min c⊤ x, Ax = b, x ≥ 0        Dual: max π⊤ b, π⊤ A ≤ c⊤ , π free
A specific problem in duality theory is to find an optimal solution to the dual problem once an optimal solution
to the primal problem is known. The theorem below is helpful here but it also has far reaching consequences in
combinatorial optimisation.
Theorem 2.11 (Complementary Slackness Theorem) Let x ∈ MP and π ∈ MD . Then x and π are optimal if and only if
(cj − π⊤ Aj )xj = 0 for every j   and   πi (a⊤i x − bi ) = 0 for every i.
If we compare this statement with Table 2.2 we see it effectively requires, as a necessary and sufficient condition for optimality, that in each row of Table 2.2 on at least one side the constraint is satisfied with equality.
Proof. Define
uj = (cj − π⊤ Aj )xj ,    vi = πi (a⊤i x − bi )
for every i and j, and u = ∑_j uj , v = ∑_i vi . It follows from Table 2.2 that uj ≥ 0 for all j and vi ≥ 0 for all i. We also have
u + v = ∑_j cj xj − ∑_j π⊤ Aj xj + ∑_i πi a⊤i x − ∑_i πi bi
      = ∑_j cj xj − π⊤ Ax + π⊤ Ax − ∑_i πi bi
      = c⊤ x − π⊤ b.
Example 2.12 For the linear program below construct its dual, solve the dual (for instance, graphically) and then
use duality theory to find a solution to the primal problem (recall that this means either to find an optimal solution,
or decide about infeasibility or unboundedness):
The dual is
max π1 + 2π2
s.t. − π1 + π2 ≤ 1
π1 + π2 ≤ 7
−5π1 − π2 ≤ −10
π1,2 ≥ 0
An optimal solution π ∗ = (3, 4)⊤ to the dual can be found graphically. Hence by the Strong Duality Theorem
the primal also has an optimal solution and we can find it using the Complementary Slackness Theorem as follows.
By this theorem the following conditions must be satisfied by feasible x and π for them to be optimal:
π1 (−x1 + x2 − 5x3 − 1) = 0
π2 (x1 + x2 − x3 − 2) = 0
(1 + π1 − π2 )x1 = 0
(7 − π1 − π2 )x2 = 0
(−10 + 5π1 + π2 )x3 = 0
Since the third dual inequality is satisfied by π ∗ strictly it follows that x3 = 0 at optimality. At the same time both
components of π ∗ are positive and so the first two conditions imply (after setting x3 = 0):
−x1 + x2 − 1 = 0
x1 + x2 − 2 = 0
There is a unique solution (1/2, 3/2)⊤ to this system and so x∗ = (1/2, 3/2, 0)⊤ . We can check that the values of both objective functions (with arguments x∗ and π∗ ) are equal to 11, which confirms that x∗ and π∗ are indeed optimal.
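A quick numerical cross-check of this example (a sketch, not part of the notes): the dual displayed above can be handed to scipy.optimize.linprog, which minimises, so the objective is negated.

from scipy.optimize import linprog

# Dual of Example 2.12:  max pi1 + 2*pi2  subject to the three inequalities, pi >= 0.
res = linprog(c=[-1, -2],
              A_ub=[[-1, 1],      # -pi1 +  pi2 <=  1
                    [ 1, 1],      #  pi1 +  pi2 <=  7
                    [-5, -1]],    # -5pi1 - pi2 <= -10
              b_ub=[1, 7, -10],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)            # approximately [3. 4.] and 11, as found graphically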
This example shows a situation in which it is possible to solve a linear program easily using duality theory: it
arises when the dual problem can easily be solved and if an optimal solution exists, the Complementary Slackness
Theorem then enables us to find an optimal solution to the primal problem. This situation always arises when the
primal problem has only two constraints because then the dual has only two variables and therefore can be solved
graphically. But it should be stressed that this is not the only case when a solution to a linear program can be
found in this way.
In what follows we will show a much more substantial application of duality and specifically, of the Complemen-
tary Slackness Theorem.
Chapter 3
The primal-dual algorithm is a meta-algorithm, that is an algorithm designed to generate algorithms for linear
programs of special forms, in particular for a class of combinatorial optimisation problems that can be formulated
as linear programs. It is based on the duality theory of linear programming. In this chapter we derive this
algorithm from the Complementary Slackness Theorem (Theorem 2.11) and later on we will show how it can be
used to generate an algorithm, called ALPHABETA, for the transportation problem.
We will assume without loss of generality that the linear program (referred to as P ) we want to solve is in
standard form
min c⊤ x
s. t. Ax = b
x ≥ 0
with b ≥ 0 (which may be assumed without loss of generality). Its dual is
max π⊤ b
s. t. π⊤ A ≤ c⊤ .
By the Complementary Slackness Theorem (Theorem 2.11) a necessary and sufficient condition for x ∈ MP and
π ∈ MD to be optimal is that
(cj − π⊤ Aj )xj = 0 = πi (a⊤i x − bi )        (3.1)
are satisfied by all i and j. In the current setting Ax = b is satisfied by any x ∈ MP so the products on the right
in (3.1) are all 0. Hence (3.1) reduces to
(cj − π ⊤ Aj )xj = 0 (3.2)
for all j. So, once we know a π ∈ MDopt , all we need to do to find an optimal solution x to the primal is to find an
x ∈ MP that satisfies (3.2). Clearly, it is not reasonable, in general, to assume that we know an optimal solution
to the dual problem from the start. In the following we will build an algorithm, the primal-dual algorithm, which
requires a feasible solution π ∈ MD of the dual rather than an optimal solution to the dual to start with. However, even this assumption is restrictive, since finding a feasible solution to a linear program is in general as difficult as finding an optimal solution. All the discussion of the primal-dual algorithm that will follow is done under this
assumption. However, in many cases when the primal-dual algorithm is applied to combinatorial optimisation
problems, it is easily satisfied.
So, suppose now that a π ∈ MD is known and that we are looking for an optimal x ∈ MP . For the given π we
might not be able to find an x ∈ MP that satisfies (3.2) for all j. However, we will iteratively try to find a pair
(x∗ , π ∗ ) which satisfies (3.2) for all j starting with the given π ∈ MD .
Recall that π ∈ MD means π ⊤ A ≤ c⊤ and so (3.2) will be satisfied for all j for which π ⊤ Aj = cj . If, on the
other hand, π ⊤ Aj < cj then xj must be 0. Thus when solving the system Ax = b, x ≥ 0, we may remove all
variables xj where π ⊤ Aj < cj . We therefore set
J = {j | π ⊤ Aj = cj }
and solve the system Ax = b, x ≥ 0 with variables xj , j ∈ J only. That is we need to solve
∑_{j∈J} aij xj = bi ,   i ∈ {1, . . . , m}        (3.3)
xj ≥ 0,   j ∈ J
One of the important applications of linear programming is a method for solving linear systems of equations
with non-negativity constraints. This method (sometimes called the Two-Phase Method) introduces a new variable,
called auxiliary, in every equation and then minimises the sum of all auxiliary variables. This is what we will now
do to solve (3.3). More precisely, let us consider
min ξ = ∑_{i=1}^{m} xai
s. t. ∑_{j∈J} aij xj + xai = bi ,   i ∈ {1, . . . , m}        (3.4)
xai ≥ 0,   i ∈ {1, . . . , m}
xj ≥ 0,   j ∈ J
where xai stands for the i-th auxiliary variable. This linear program is called the restricted primal (abbr. RP ). The
set J is called the set of admissible indices or columns (or sometimes the set of admissible variables).
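The construction of J and of the restricted primal is easy to illustrate in code. The Python sketch below is my own (it assumes b ≥ 0 and a dual feasible π, and uses scipy.optimize.linprog to solve (3.4)); the data at the bottom is read off from the worked example later in this chapter (dual constraints 2π1 + π2 ≤ 6, −4π1 + 5π2 ≤ 16, π1 − 5π2 ≤ −15, dual objective π1 + 2π2, starting point π = (0, 3)⊤).

import numpy as np
from scipy.optimize import linprog

def solve_restricted_primal(A, b, c, pi, tol=1e-9):
    # Given a dual feasible pi, form J = {j : pi^T A_j = c_j} and solve the restricted primal (3.4).
    m, n = A.shape
    reduced = c - pi @ A                                    # c_j - pi^T A_j (>= 0 for dual feasible pi)
    J = [j for j in range(n) if abs(reduced[j]) <= tol]
    obj = np.concatenate([np.zeros(len(J)), np.ones(m)])    # minimise the sum of the auxiliary variables
    A_eq = np.hstack([A[:, J], np.eye(m)])                  # columns A_j (j in J) plus the identity
    res = linprog(obj, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (len(J) + m))
    return J, res.fun, res.x

A = np.array([[2.0, -4.0, 1.0],
              [1.0, 5.0, -5.0]])
b = np.array([1.0, 2.0])
c = np.array([6.0, 16.0, -15.0])
pi = np.array([0.0, 3.0])                                   # the dual feasible starting point used there
print(solve_restricted_primal(A, b, c, pi))                 # xi_min = 3 > 0, so pi must be corrected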
Proposition 3.1 The linear program (3.4) has an optimal solution and ξ min ≥ 0.
Proof. This linear program has a feasible solution (xj = 0, j ∈ J | xai = bi , i ∈ {1, . . . , m}), recall that b ≥ 0, and
the objective function is bounded below by 0. Therefore by Theorem 2.2 this LP has an optimal solution and the
optimal objective function value is non-negative.
Proposition 3.2 ξ min = 0 in (3.4) if and only if (3.3) has a feasible solution.
Proof. Suppose (xj , j ∈ J | xai , i ∈ {1, . . . , m}) is an optimal solution to (3.4) and ξ min = 0. Then all xai = 0 and
xj , j ∈ J is a solution to (3.3).
Conversely, if xj , j ∈ J is a solution to (3.3) then (xj , j ∈ J | xai = 0, i ∈ {1, . . . , m}) is a feasible and therefore also optimal solution to (3.4) with ξ min = ∑_{i=1}^{m} xai = 0.
It follows from these two propositions that if ξ min = 0 in (3.4) then an optimal solution for (P ) has been found.
Let us now suppose that ξ min > 0. It follows from the above propositions that the π ∈ MD with which we
started is not dual optimal. We will therefore seek a “correction” of this vector in a way that increases our chances
of obtaining a dual optimal vector. This will be done using an optimal solution to the dual of the restricted primal
problem (DRP ). It follows from (3.4) that this dual is
max w = π⊤ b
s. t. π⊤ Aj ≤ 0,   j ∈ J        (3.5)
πi ≤ 1,   i ∈ {1, . . . , m}
We assume here that it is not difficult to find an optimal solution to the DRP; it will turn out to be the case in the applications of the primal-dual algorithm to combinatorial optimisation problems. So let π̄ be an optimal solution to (3.5). By the Strong Duality Theorem (Theorem 2.9) we have
π̄⊤ b = ξ min > 0.        (3.6)
The corrected dual vector will be of the form π∗ = π + θπ̄ for some θ > 0. For π∗ to be dual feasible we need
π∗⊤ Aj ≤ cj ,   j ∈ {1, . . . , n}
or, equivalently,
π⊤ Aj + θπ̄⊤ Aj ≤ cj ,   j ∈ {1, . . . , n}.        (3.7)
Since π⊤ Aj ≤ cj for all j and θ > 0, this requirement will automatically be satisfied whenever π̄⊤ Aj ≤ 0. If on the other hand π̄⊤ Aj > 0 then we need to solve (3.7) for θ, which yields
θ ≤ (cj − π⊤ Aj ) / (π̄⊤ Aj ),   if π̄⊤ Aj > 0,
or, equivalently,
θ ≤ min { (cj − π⊤ Aj ) / (π̄⊤ Aj ) | π̄⊤ Aj > 0 }.
For a while we will assume that π̄⊤ Aj > 0 holds for at least one j; we will return to this question later on. The value of θ should be as big as possible in order to get as close as possible to an optimal solution to the dual. Therefore the choice is
θ = min { (cj − π⊤ Aj ) / (π̄⊤ Aj ) | π̄⊤ Aj > 0 }.
This value will be denoted by θ1 . Clearly θ1 > 0 because if π̄⊤ Aj > 0 then j ∉ J and so π⊤ Aj < cj . We also have an increase of the dual objective function value since, see (3.6),
w∗ = π⊤ b + θ1 π̄⊤ b > π⊤ b = w.
After finding the new dual feasible π ∗ a new iteration can start, π ∗ will be renamed to π, the set J re-defined,
etc. If necessary further iterations will follow and this process will stop when ξ min = 0. The corresponding vector
(xj ; j ∈ J) complemented by zero components of x is then an optimal solution of (P ), that is an optimal solution
is (xj , j ∈ J | xj = 0, j ̸∈ J). It can be proved [1] that this will happen after a finite number of iterations; this
proof is omitted here.
However, one point in the above procedure needs to be revisited, namely we need to clarify the case when π̄⊤ Aj ≤ 0 for all j. This needs to be discussed since in such a case θ1 is undefined. But the next theorem gives a clear answer.
Theorem 3.3 (Infeasibility Criterion) If ξ min > 0 and π̄⊤ Aj ≤ 0 for all j ∉ J then MP = ∅.
Proof. By Corollary 2.8 it is sufficient to prove that max_{π∈MD} π⊤ b = +∞. For any positive real number θ we define
π(θ) = π + θπ̄.
If j ∈ J then π̄⊤ Aj ≤ 0 because π̄ ∈ MDRPopt ⊆ MDRP , and so by the theorem assumption π̄⊤ Aj ≤ 0 for all j. Also, for any j we have π⊤ Aj ≤ cj since π ∈ MD . Hence for every j and θ > 0 we have:
π(θ)⊤ Aj = π⊤ Aj + θπ̄⊤ Aj ≤ cj .
min c⊤ x
s. t. Ax = b
x≥0
2. Solve RP :
min ξ = ∑_{i=1}^{m} xai
s. t. ∑_{j∈J} aij xj + xai = bi ,   i ∈ {1, . . . , m}
xai ≥ 0,   i ∈ {1, . . . , m}
xj ≥ 0,   j ∈ J.
3. Let (xj , j ∈ J | xai , i ∈ {1, . . . , m}) be an optimal solution to RP . If ξ min = 0 then stop, (xj , j ∈ J) is an
optimal solution to P .
6. Set
θ1 = min { (cj − π⊤ Aj ) / (π̄⊤ Aj ) | π̄⊤ Aj > 0 }
and
π := π + θ1 π̄.
7. Go to 1.
(Flowchart of the primal-dual algorithm: stop when ξ(x∗ ) = 0; stop when π̄⊤ Aj ≤ 0 for all j ∉ J (then P is infeasible); otherwise, if π̄⊤ Aj > 0 for some j, adjust π and repeat.)
and a dual feasible solution π = (0, 3)⊤ . We will solve this linear program by the primal-dual algorithm.
To find J we need to start with constructing the dual:
max π1 + 2π2
s.t. 2π1 + π2 ≤ 6
−4π1 + 5π2 ≤ 16
π1 − 5π2 ≤ − 15
and substitute π = (0, 3)⊤ in the left-hand sides to obtain π ⊤ Aj for j ∈ {1, 2, 3}. It will be practical to write the
results of the calculation in a table:
j     cj     π⊤Aj     π̄⊤Aj     π∗⊤Aj
1      6       3
2     16      15
3    −15     −15
Since setting ξ = xa1 + xa2 = 0 would make the two equations contradict each other, we have ξ min > 0 and so we need to solve the DRP, which is:
max π1 + 2π2
s.t. π1 − 5π2 ≤0
π1 ≤1
π2 ≤ 1.
An optimal solution to DRP can be found graphically: π̄ = (1, 1)⊤ , and we can calculate π̄⊤ Aj :
j     cj     π⊤Aj     π̄⊤Aj     π∗⊤Aj
1      6       3        3
2     16      15        1
3    −15     −15       −4
So
π∗ = (0, 3)⊤ + 1 · (1, 1)⊤ = (1, 4)⊤ .
j     cj     π⊤Aj     π̄⊤Aj     π∗⊤Aj
1      6       3        3        6
2     16      15        1       16
3    −15     −15       −4      −19
where again ξ = xa1 + xa2 . For testing whether ξ min = 0 we set xa1 = xa2 = 0 and try to solve the system
2x1 − 4x2 = 1
x1 + 5x2 = 2
x1,2 ≥ 0.
It has a (unique) solution (13/14, 3/14)⊤ , from which we deduce that an optimal solution to P is x∗ = (13/14, 3/14, 0)⊤ and f min = 9, where f (x) = 6x1 + 16x2 − 15x3 .
Part II
Chapter 4
4.1 Introduction
The max-flow problem is one of the fundamental combinatorial optimisation problems. It has a number of ap-
plications both within and outside combinatorial optimisation such as in scheduling, allocation of resources or
communication.
We will start with a very simple example. Suppose a company wants to ship the maximum possible amount of
oil per hour via the pipeline from station s (source) to station t (sink) in Figure 4.1. On its way from s to t, the oil
must pass through some or all of the stations a, b and c. The various arcs in Figure 4.1 represent pipes of different
diameters. The maximum number of barrels of oil (millions of barrels per hour) that can be pumped through each
pipe is indicated in rectangles at each pipe (“pipe capacity”).
In order to determine the maximum volume of oil per hour that can be transported via this pipeline we need
to take into account natural restrictions. One is of course the capacity of each individual pipe which cannot be
exceeded. The other one is the basic principle that what is entering a station must also leave this station (we
ignore any possible losses due to leakages etc). This principle is usually referred to as the conservation law. The
conservation law of course does not apply at the source and sink.
At the same time we need to introduce a measure that represents the volume of the oil transported and which
we wish to maximise. By definition it is the total amount leaving the source via any pipe incident with the source
(which, as we will prove later on, is the same as the total value entering the sink). In general we will use the words
nodes and arcs rather than “stations” and “pipes”. Nodes, arcs, arc capacities and a specification of the source and
t ∈ V has no outgoing arcs and b : E → R+ ∪ {+∞} (i.e., b is a function defined on the set of arcs with values in R+ ∪ {+∞}). If (i, j) ∈ E then b(i, j) or just bij will be called the capacity of the arc (i, j). If E = {e1 , . . . , en }, then bi will stand for b(ei ).
Any function f : E → R+ ∪ {+∞} is called a flow on N if the following capacity constraints hold
0≤f ≤b (4.1)
with v in the row corresponding to the source and v ′ in the row corresponding to the sink. Since every column of
A contains exactly one 1 and one −1, if we add up all equations we get:
0 = v + v′ ,
from which we find that v ′ = −v. Hence the last equation in (4.6) now yields what was intuitively clear at the
beginning of this section, namely that the total flow value on arcs leaving the source (that is the value of the flow) is
equal to the total flow value on arcs entering the sink. Hence we can write (4.7) with all variables on the left-hand
side as follows:
Af + dv = 0
where d is the constant vector
d = (0, . . . , 0, −1, 0, . . . , 0, 1, 0, . . . , 0)⊤
with −1 in the row corresponding to the source, +1 in the row corresponding to the sink and all other entries 0. It remains
to recall also the capacity conditions on f :
0 ≤ f ≤ b,
and the result below now follows.
Theorem 4.1 (MFP as LP) If A is a node-arc incidence matrix of the network N = (V, E, s, t, b) then f : E →
R+ is a flow in N if and only if it satisfies the following conditions:
Af + dv = 0
f ≤b
f ≥0
Hence the max-flow problem in N is equivalent to the linear program
max v
s.t. Af + dv = 0
(4.8)
f ≤b
f ≥0
It may be useful to realise the following properties (recall that m is the number of nodes and n is the number
of arcs in N ):
The linear program (4.8) has n + 1 variables (f1 , . . . , fn , v) and m + n non-trivial constraints (m equations
and n inequalities).
f = 0 (with v = 0) is always a feasible solution to (4.8), and thus it is a flow in any network.
Corollary 4.2 A max-flow exists in every network with finite capacities of arcs leaving the source (or entering the
sink).
Proof. Since f = 0 is always a feasible flow, by the Fundamental Theorem of Linear Programming (Theorem 2.2)
it is sufficient to show that the objective function in (4.8) is bounded. This is easily seen since by the assumption
of the corollary we have
v = v(f ) = ∑_{ej leaving s} f (ej ) ≤ ∑_{ej leaving s} b(ej ) < +∞,
Example 4.3 Let N be the network of Figure 4.7. The corresponding linear program is
max v
s.t. f1 + f2 = v
− f3 + f5 = 0
−f1 + f3 − f4 = 0
− f2 + f4 + f6 = 0
− f5 − f6 = −v
f ≤ (1, 2, 2, 2, 3, 1)⊤
f ≥ (0, 0, 0, 0, 0, 0)⊤
max v
s.t. Af + dv = 0, where
A = [  1   1   0   0   0   0
       0   0  −1   0   1   0
      −1   0   1  −1   0   0
       0  −1   0   1   0   1
       0   0   0   0  −1  −1 ],    d = (−1, 0, 0, 0, 1)⊤ ,
f ≤ (1, 2, 2, 2, 3, 1)⊤ ,
f ≥ (0, 0, 0, 0, 0, 0)⊤ .
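As an illustration (a sketch, not part of the notes), this linear program can be solved directly with scipy.optimize.linprog; the matrix A and the vector d are transcribed from the display above, and maximising v is done by minimising −v.

import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1,  1,  0,  0,  0,  0],
              [ 0,  0, -1,  0,  1,  0],
              [-1,  0,  1, -1,  0,  0],
              [ 0, -1,  0,  1,  0,  1],
              [ 0,  0,  0,  0, -1, -1]])
d = np.array([-1, 0, 0, 0, 1])
cap = [1, 2, 2, 2, 3, 1]

A_eq = np.hstack([A, d.reshape(-1, 1)])          # variables are (f1, ..., f6, v)
obj = np.zeros(7)
obj[6] = -1.0                                    # maximise v  <=>  minimise -v
bounds = [(0, u) for u in cap] + [(0, None)]
res = linprog(obj, A_eq=A_eq, b_eq=np.zeros(5), bounds=bounds)
print(res.x[:6], res.x[6])                       # a max-flow and its value (3 for these capacities)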
We start with two examples that explain the idea in some detail. In each an augmenting path is indicated by
dotted lines. In Figure 4.8 it is the path (s, c, b, a, t) on which the capacities of the arcs exceed the current flow
values by 1, 2, 1 and 2, so it is possible to transport extra flow along this path of value min{1, 2, 1, 2} = 1. By
adding 1 to the current flow on every arc on this path we obtain a new flow whose value will be greater by 1.
In Figure 4.9 no such path exists, nevertheless it is possible to augment the current flow of value 2 along the
indicated path, which, unlike the previous path, contains a backward arc (c, b). If the flow values are increased by
1 on the forward arcs, that is on (s, b) and (c, t), and decreased by the same value on the backward arc (c, b) then
the new function will again be a flow (check the capacity constraints and the conservation law!) and its value will
be greater by 1.
The above example demonstrates the need to consider also s − t paths with backward arcs. We will now give
precise definitions of the concepts which are important in the Ford and Fulkerson algorithm.
Given a network N = (V, E, s, t, b), and a flow f , an augmenting path p is an s − t path in the undirected graph
resulting from (V, E) by ignoring arc directions, with the following properties:
1. For every forward arc (i, j) we have f (i, j) < b(i, j). That is, forward arcs are unsaturated.
2. For every backward arc (j, i) we have f (j, i) > 0. That is backward arcs have non-zero flow.
The main purpose of an augmenting path is to find a new flow whose value is greater than that of the current
flow. This can be done as follows:
increase the flow value on every forward arc but still satisfying the capacity constraint and
decrease the flow value on every backward arc but keeping it non-negative.
Each of these changes should be by the same value so that the conservation law remains intact. For this common
value we can take any positive real number not exceeding the remaining positive capacity of any forward arc and
not exceeding the (positive) flow value on any backward arc. The maximum such value for a fixed augmenting path
p therefore is
δ = min_{(i,j) on p} { b(i, j) − f (i, j) if (i, j) is forward;  f (j, i) if (j, i) is backward }.        (4.9)
Next we need to explain how to find an augmenting path in a systematic way. We may suppose that a flow in
the network is already known (if necessary we start with the zero flow). The method for finding an augmenting
path is a labelling algorithm. Labelling starts from the source, proceeds through other nodes (not necessarily all)
and stops as soon as the sink is reached. The constructed labels should provide an efficient way of identifying an
augmenting path and the augmenting value δ. Therefore each label, say label(x) for a node x, will consist of two
parts, the first giving information about the previous node from which x has been reached, and the second a value
(positive real) of extra flow that can be brought from the source to x. More precisely we define
label(x) := (L1[x], L2[x]),
where L1[x] is the node from which node x has been reached and L2[x] is the amount of extra flow that can be brought from s to x. The labelling starts with the source:
label(s) := (∅, +∞),
reflecting that s has no predecessors, and continues by propagating the labels from node to node with the following
rules:
the label is propagated from labelled nodes to adjacent nodes (this process is called scanning) subject to
precisely specified conditions, see below;
scanning of a node must be carried out completely (that is to all unlabelled adjacent nodes that qualify,
see below) before another node is scanned;
Scanning is the process of propagating the label from a node, say x, to another node, say y, that satisfies the
following conditions:
y is unlabelled;
If (x, y) is an arc then require f (x, y) < b(x, y). The result of the scanning is label(y) := (L1[y], L2[y]), where
L1[y] = x and
L2[y] := min{L2[x], b(x, y) − f (x, y)}
If f (x, y) = b(x, y) then y is not scanned in this scanning step (it might be scanned later from a different
node).
If (y, x) is an arc then require f (y, x) > 0. The result of scanning is label(y) := (L1[y], L2[y]), where
L1[y] = −x and
L2[y] := min{L2[x], f (y, x)}.
If f (y, x) = 0 then y is not scanned in this scanning step (it might be scanned later from a different node).
In order to prevent any re-scanning of a node within an iteration we introduce a list called LIST which will help
us to keep track of the nodes that can be scanned. That is at any stage LIST is the set of nodes that have been
labelled but not scanned yet.
Example 4.4 Figure 4.10 shows a section of a network before scanning node x, Figure 4.11 shows the same section
after the scanning.
LIST = {s}
Either t gets labelled: in this case reconstruct an augmenting path backwards from t using L1’s and augment
the current flow along the augmenting path by δ = L2[t], that is, for every arc ej on the augmenting path the
change is carried out as follows:
fj ↦ fj′ = fj + δ if ej is a forward arc,  fj − δ if ej is a backward arc.
Note that before the next run of the labelling algorithm (with the new flow) all labels must be removed.
Or LIST becomes empty: in this case the current flow is optimal (this will be proved later on). When we
achieve optimality after an application of FFA to MFP, we say we are at non-breakthrough (NBT).
Example 4.6 We find a max-flow in the network of Figure 4.12 using the FFA, starting from the flow given by the free-standing numbers. The first iteration of the algorithm is recorded in the table below.
x ∈ LIST label(x)
s (∅, +∞)
b (s, 2)
c (s, 1)
t (c, 1)
δ=1
p = (s, c, t)
The new flow is in Figure 4.13 and the second iteration is:
x ∈ LIST label(x)
s (∅, +∞)
b (s, 2)
c (−b, 2)
t (c, 1)
δ=1
p = (s, b, c, t)
The new flow is in Figure 4.14 and the third iteration is:
x ∈ LIST label(x)
s (∅, +∞)
b (s, 1)
c (−b, 1)
LIST = ∅
The non-breakthrough has been reached, the current flow is maximal, its value is 4.
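The labelling procedure above translates almost line by line into code. The following Python sketch (my own naming and data structures, not part of the notes) implements the Ford and Fulkerson algorithm with the forward/backward labelling rules and a LIST of labelled-but-not-yet-scanned nodes; the test data at the bottom uses the arcs and capacities of Example 4.3 (the names of the two intermediate nodes are my guesses).

from collections import deque

def ford_fulkerson(arcs, cap, s, t, flow=None):
    # arcs: list of (x, y) pairs; cap: dict of arc capacities; flow: optional starting flow.
    f = {a: 0 for a in arcs} if flow is None else dict(flow)
    while True:
        label = {s: (None, None, float("inf"))}    # label(x) = (previous node, arc type, L2[x])
        LIST = deque([s])                          # labelled but not yet scanned
        while LIST and t not in label:
            x = LIST.popleft()                     # scan x completely before scanning another node
            for (u, v) in arcs:
                if u == x and v not in label and f[(u, v)] < cap[(u, v)]:
                    label[v] = (x, "fwd", min(label[x][2], cap[(u, v)] - f[(u, v)]))
                    LIST.append(v)                 # unsaturated forward arc
                elif v == x and u not in label and f[(u, v)] > 0:
                    label[u] = (x, "bwd", min(label[x][2], f[(u, v)]))
                    LIST.append(u)                 # backward arc carrying positive flow
        if t not in label:                         # non-breakthrough: the current flow is maximal
            return f, sum(f[a] for a in arcs if a[0] == s)
        delta = label[t][2]                        # augment along the path reconstructed from the labels
        y = t
        while y != s:
            x, kind, _ = label[y]
            if kind == "fwd":
                f[(x, y)] += delta
            else:
                f[(y, x)] -= delta
            y = x

arcs = [("s", "c"), ("s", "b"), ("c", "a"), ("b", "c"), ("a", "t"), ("b", "t")]
cap = dict(zip(arcs, [1, 2, 2, 2, 3, 1]))
print(ford_fulkerson(arcs, cap, "s", "t")[1])      # value of a max-flow (3 for this data)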
Next we aim to prove that the algorithm we have deduced intuitively is indeed a correct solution method for
solving the max-flow problem; then we will also discuss its finite termination.
Figure 4.12: Initial flow of the Network Flow Problem of Example 4.6
Figure 4.13: Flow after iteration 1 of the Network Flow Problem of Example 4.6
Figure 4.14: Flow after iteration 2 of the Network Flow Problem of Example 4.6
Figure 4.16
A cut (W, W̄) in a network N = (V, E, s, t, b) is a pair of subsets of V such that
1. V = W ∪ W̄ ,
2. W ∩ W̄ = ∅ and
3. s ∈ W , t ∈ W̄ .
Note that the first two conditions together are usually referred to by saying that (W, W̄) is a partition of V.
The capacity of a cut (W, W̄) is:
c (W, W̄) = ∑ {b(x, y) : x ∈ W, y ∈ W̄}.
Example 4.7 We list all cuts and their capacities in the network of Figure 4.16.
W            W̄            c (W, W̄)
{s} {a, b, t} 3+6=9
{s, a} {b, t} 6 + 6 + 8 = 20
{s, a, b} {t} 8 + 2 = 10
Lemma 4.8 Let N = (V, E, s, t, b) be a network. The following relation holds for any cut (W, W̄) and flow f in N:
v = ∑_{x∈W, y∈W̄, (x,y)∈E} f (x, y) − ∑_{x∈W̄, y∈W, (x,y)∈E} f (x, y).        (4.10)
Proof. Let (W, W̄) be a cut and f a flow in N = (V, E, s, t, b). Let v = v(f ); then by the conservation law at every node x we have:
∑_{y : (x,y)∈E} f (x, y) − ∑_{y : (y,x)∈E} f (y, x) = 0,   if x ∈ V \{s, t};
∑_{y : (s,y)∈E} f (s, y) = v,   if x = s;
−∑_{y : (y,t)∈E} f (y, t) = −v,   if x = t.
Let us add up all equations in the above system that correspond to x ∈ W. Since s ∈ W and t ∉ W we get (writing the sum of the right-hand sides first):
v = ∑_{x∈W} ∑_{y : (x,y)∈E} f (x, y) − ∑_{x∈W} ∑_{y : (y,x)∈E} f (y, x).
We can expand and then simplify this expression using the fact that (W, W̄) is a partition of V:
v = ∑_{x∈W, y∈W, (x,y)∈E} f (x, y) + ∑_{x∈W, y∈W̄, (x,y)∈E} f (x, y) − ∑_{x∈W, y∈W, (y,x)∈E} f (y, x) − ∑_{x∈W, y∈W̄, (y,x)∈E} f (y, x).
By swapping the symbols x and y in the third term and using the commutative law we see that the first and
third terms are equal and can therefore be omitted. By the same swap in the fourth term and again using the
commutative law we get exactly the expression in the lemma statement.
We can illustrate this lemma on the previous example, where we calculate the two terms on the right-hand side
of (4.10) and compare their difference with the value of the current flow v(f ) = 7:
W             W̄            ∑_{x∈W, y∈W̄, (x,y)∈E} f (x, y)     ∑_{x∈W̄, y∈W, (x,y)∈E} f (x, y)
{s}           {a, b, t}                  7                                   0
{s, a}        {b, t}                    10                                   3
{s, b}        {a, t}                     7                                   0
{s, a, b}     {t}                        7                                   0
Theorem 4.9 (Max-Flow Min-Cut Theorem) In any network N = (V, E, s, t, b):
(a) the value of any flow is less than or equal to the capacity of any cut;
(b) the value of any max-flow is equal to the capacity of any min-cut.
Proof. (a) Let (W, W̄) be a cut and f a flow in N = (V, E, s, t, b). By Lemma 4.8 and using the capacity constraints we have:
v(f ) = ∑_{x∈W, y∈W̄, (x,y)∈E} f (x, y) − ∑_{x∈W̄, y∈W, (x,y)∈E} f (x, y)
      ≤ ∑_{x∈W, y∈W̄, (x,y)∈E} b(x, y) − 0
      = c (W, W̄).
(b) Let f∗ be a max-flow in N (it exists by Corollary 4.2) and let us start the Ford and Fulkerson algorithm with f∗ as the initial flow. Then the algorithm will stop in the first iteration with a non-breakthrough. Let us denote
W∗ = {x ∈ V : x is labelled at NBT},    W̄∗ = V \ W∗ .
Then s ∈ W∗ , t ∈ W̄∗ and so (W∗ , W̄∗ ) is a cut in N . By Lemma 4.8 and the rules of the Ford and Fulkerson algorithm (see Figure 4.17) we have:
v(f∗ ) = ∑_{x∈W∗, y∈W̄∗, (x,y)∈E} f∗ (x, y) − ∑_{x∈W̄∗, y∈W∗, (x,y)∈E} f∗ (x, y)
       = ∑_{x∈W∗, y∈W̄∗, (x,y)∈E} b(x, y) − 0
       = c (W∗ , W̄∗ ).
It remains to show that c (W∗ , W̄∗ ) is indeed the smallest. Let (W, W̄) be any cut. Then by (a) we have:
c (W, W̄) ≥ v(f∗ ) = c (W∗ , W̄∗ ),
so (W∗ , W̄∗ ) is indeed a min-cut.
Theorem 4.10 (Ford and Fulkerson, 1956) If the Ford and Fulkerson algorithm terminates it does so at a
max-flow.
Proof. Let f ∗ be the flow reached at termination, thus a non-breakthrough has occurred. (Note that the next eight
lines are an exact copy of the lines in the previous proof, but not the end of the proof).
Let us denote
W∗ = {x ∈ V : x is labelled at NBT},    W̄∗ = V \ W∗ .
Then s ∈ W∗ , t ∈ W̄∗ and so (W∗ , W̄∗ ) is a cut in N . By Lemma 4.8 and the rules of the Ford and Fulkerson
algorithm (see Figure 4.17: at non-breakthrough f = b on every arc from W∗ to W̄∗ and f = 0 on every arc from W̄∗ to W∗ ) we have:
v(f∗ ) = ∑_{x∈W∗, y∈W̄∗, (x,y)∈E} f∗ (x, y) − ∑_{x∈W̄∗, y∈W∗, (x,y)∈E} f∗ (x, y)
       = ∑_{x∈W∗, y∈W̄∗, (x,y)∈E} b(x, y) − 0
       = c (W∗ , W̄∗ ).
So by Part (a) of the Max-Flow Min-Cut Theorem (Theorem 4.9) we now have for any flow f and any cut (W, W̄) in N:
c(W, W̄) ≥ v(f∗ ) = c (W∗ , W̄∗ ) ≥ v(f ),
which shows that f∗ is a max-flow.
Theorem 4.11 Let N = (V, E, s, t, b) be a network, where b is an integer function on E. Then the Ford and
Fulkerson algorithm will find an integer max-flow, and therefore terminate, after a finite number of steps if it starts
with an integer flow.
Proof. Suppose that the algorithm starts to run from an integer flow f . The increment δ is calculated by (4.9) as
the minimum of the positive expressions of the form b(i, j), f (i, j) and f (j, i). Each of these is an integer since both
f and b are integer functions. Hence δ is a positive integer and therefore δ ≥ 1. If the flow after augmentation is
denoted as f ′ then v(f ′ ) = v(f ) + δ ≥ v(f ) + 1. So after k iterations (augmentations) the flow increases by at least
k. At the same time a max-flow exists by Corollary 4.2. If its value is v ∗ then it follows that the algorithm will
reach a max-flow (which will obviously be integer) and therefore terminate after at most v ∗ iterations.
Corollary 4.12 Let N = (V, E, s, t, b) be a network, where b is a rational function on E. Then the max-flow
problem on N can be solved by the Ford and Fulkerson algorithm in a finite number of steps.
Proof. We show that the rational case can easily be converted to the integer case, which then can be solved by the
Ford and Fulkerson algorithm in a finite number of steps due to the theorem. Suppose that
b(i, j) = p(i, j) / q(i, j),   for any (i, j) ∈ E,
where p(i, j) and q(i, j) are integers, q(i, j) > 0. Let D = lcm_{(i,j)∈E} q(i, j). Then the max-flow problem
max v
s.t. Af + dv = 0
f ≤b
f ≥0
is equivalent to
max v ′
s.t. Af ′ + dv ′ = 0
f ′ ≤ b′
f′ ≥ 0
where f′ = Df , v′ = Dv, b′ = Db. This is a max-flow problem where the capacity function b′ is integer. By the theorem this max-flow problem can be solved in a finite number of steps. If the max-flow in the second problem is f′ then the max-flow f of the original problem is f = f′/D.
Note that a simple modification of the just presented algorithm may guarantee finite termination for networks
with arbitrary real capacities (Karzanov’s algorithm [1]). It is avoided here as we need the basic version for the
application of the primal-dual algorithm.
After the discovery of the Ford and Fulkerson algorithm many other algorithms for the max-flow problem have
been found. There was a tireless effort to improve the speed of these algorithms measured by the number of
operations to termination in the worst case (called computational complexity, see Chapter 7). Among the fastest is that by Goldberg and Tarjan (1998), whose computational complexity is O(m² · n), where as before m = |V| and n = |E|.
It should be noted that this section provides a very basic introduction to the theory of max-flows. There are
many modifications that have not been presented here, for instance the case when the capacities have a general
lower bound rather than zero or the case of multi-source, multi-sink networks. The monograph [4] is recommended
for further reading.
Chapter 5
Transportation problem
Unit transportation costs cij (from Ai to Bj ) for every i ∈ {1, . . . , m} and j ∈ {1, . . . , n}.
(Figure 5.1: producers A1 , A2 , . . . on the left with their capacities ai , customers B1 , B2 , . . . on the right with their demands bj ; the arc from Ai to Bj carries the unit cost cij and the unknown amount xij , e.g. c24 and x24 .)
And the task is to find the amount of product xij to be transported from Ai to Bj for every i ∈ {1, . . . , m} and
j ∈ {1, . . . , n} (transportation plan) which minimises the total cost of the transport.
It will be practical to store the input data in a table called the transportation table. It has the following form:
        b1   · · ·   bj   · · ·   bn
a1
 ⋮
ai      · · ·        cij   · · ·
 ⋮
am
This table will also be used for making a record of the work of the solution method and for writing the output,
where the quantities xij (initial, current and final) replace cij . Hence the correspondence between the graph of
Figure 5.1 and such a table is as follows:
Table Graph
Cell (i, j) Arc Ai Bj
Row i Producer Ai
Column j Customer Bj
Example 5.1 Powerco has 3 electrical power plants that supply power for 4 cities. The table below shows capacities
of the plants in 106 kWh, peak power demands of the cities in 106 kWh and the costs (in GBP 1,000) of sending
106 kWh from the plants to the cities. The task is to find the distribution of the electricity minimising the cost and
meeting each city’s peak power demand.
Demands → 45 20 30 30
↓ Capacities
35 8 6 10 9
50 9 12 13 7
40 14 9 16 5
45 20 30 30
35
50
40
so that their row and column sums are equal to the values given in the row/column headings. In the absence of any
solution method we could find a feasible solution using a greedy approach, that is repeatedly using a most attractive,
feasible link between the power stations and cities until all power is distributed. The cheapest link is between A3
and B4 , so we may suggest to transport the maximum amount (which is 30) via this link. This will reduce the
capacity of A3 and demand of B4 by 30, so our situation before the next step may be shown as follows:
45 20 30 0
35 ×
50 ×
10 30
where × indicates that the link is unavailable for the next steps. Now the cheapest available link is A1 B2 and the
maximum we can transport via this link is 20. If we continue in this way, we obtain the following transportation
plan (the original capacities and demands have been reinstated):
45 20 30 30
35 15 20 × ×
50 30 × 20 ×
40 × × 10 30
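For completeness, a short Python sketch of the GREEDY procedure just described (the code and variable names are mine); on the Powerco data it reproduces the plan in the table above.

cost = [[8, 6, 10, 9],
        [9, 12, 13, 7],
        [14, 9, 16, 5]]
supply, demand = [35, 50, 40], [45, 20, 30, 30]

plan = [[0] * 4 for _ in range(3)]
for _, i, j in sorted((cost[i][j], i, j) for i in range(3) for j in range(4)):
    amount = min(supply[i], demand[j])      # use the cheapest available link as much as possible
    plan[i][j] = amount
    supply[i] -= amount
    demand[j] -= amount

total = sum(cost[i][j] * plan[i][j] for i in range(3) for j in range(4))
print(plan)    # [[15, 20, 0, 0], [30, 0, 20, 0], [0, 0, 10, 30]] -- the plan in the table above
print(total)   # 1080, i.e. GBP 1,080,000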
The rest of this chapter deals with developing a solution method for which we will prove that it always produces
an optimal solution to the transportation problem. It is called the Alphabeta algorithm (ALPHABETA) and it
is the result of the application of the Primal-Dual Algorithm (Algorithm 3.4) to the transportation problem. In
order to do this we first need to formulate the transportation problem as a linear program. We will do this using
the terminology and symbols introduced at the beginning of this chapter. Since cij is the unit cost of transport
between Ai and Bj and xij is the (initially unknown) amount of units transported between Ai and Bj , the cost of
the transport from Ai to Bj is cij xij . Hence the total cost of the transportation plan is
∑_{i=1}^{m} ∑_{j=1}^{n} cij xij .
The transportation problem is therefore the linear program
min ∑_{i=1}^{m} ∑_{j=1}^{n} cij xij
s. t. ∑_{j=1}^{n} xij = ai ,   i ∈ {1, . . . , m}
      ∑_{i=1}^{m} xij = bj ,   j ∈ {1, . . . , n}        (5.1)
      xij ≥ 0,   i ∈ {1, . . . , m}, j ∈ {1, . . . , n}
Here we can obviously assume without loss of generality that ai > 0, i ∈ {1, . . . , m} and bj > 0, j ∈ {1, . . . , n}.
The matrix of the system in (5.1) is a sparse 0 − 1 matrix, that is all entries are zeros and ones only but there
are relatively “few” ones. This is illustrated in the following example, from which one can also see the very special
structure of this matrix, sometimes also called the transportation matrix .
Example 5.2 We write the transportation problem and then its matrix (with indicated structure) when m = 2 and
n = 3:
c11 x11 + c12 x12 + c13 x13 + c21 x21 + c22 x22 + c23 x23 → min
x11 + x12 + x13 = a1
x21 + x22 + x23 = a2
x11 + x21 = b1
x12 + x22 = b2
x13 + x23 = b3
xij ≥ 0 ∀i, j
The matrix of the system is:
1 1 1 0 0 0
0 0 0 1 1 1
1 0 0 1 0 0
0 1 0 0 1 0
0 0 1 0 0 1
Before we develop a solution method for transportation problems we need to clarify basic questions of existence
of feasible and optimal solutions.
Theorem 5.3 If a transportation problem has a feasible solution then it also has an optimal solution.
Proof. Every transportation problem is a linear program for which there are only three possibilities by the Funda-
mental Theorem of Linear Programming (Theorem 2.2). Since feasibility is assumed we only need to eliminate the
possibility that the problem is unbounded. This is easily seen from the following:
∑_{i=1}^{m} ∑_{j=1}^{n} cij xij ≥ − ∑_{i=1}^{m} ∑_{j=1}^{n} |cij | · |xij |
= − ∑_{i=1}^{m} ∑_{j=1}^{n} |cij | · xij
≥ − ∑_{i=1}^{m} ∑_{j=1}^{n} |cij | · ai ,
so the objective function is bounded below on the feasible set.
If a transportation problem has a feasible solution then the following balance condition holds:
∑_{i=1}^{m} ai = ∑_{j=1}^{n} bj .        (5.2)
Proof. Let (xij ) be a feasible solution to (5.1). Let us add up separately the first subsystem and then the second subsystem. Using the commutative and associative laws we get
∑_{i=1}^{m} ai = ∑_{i=1}^{m} ∑_{j=1}^{n} xij = ∑_{j=1}^{n} ∑_{i=1}^{m} xij = ∑_{j=1}^{n} bj .
If (5.2) is not satisfied then it is possible to deal with a closely related balanced transportation problem by introducing a dummy producer (if ∑_{i=1}^{m} ai < ∑_{j=1}^{n} bj ) or a dummy customer (if ∑_{i=1}^{m} ai > ∑_{j=1}^{n} bj ).
Details can be found in [1]. This discussion is skipped here as we will focus on the use of the primal-dual algorithm
for the construction of ALPHABETA. So everywhere in what follows we will assume that the transportation problem
is balanced and thus it has both a feasible and optimal solution.
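A small sketch of the balancing step (my own code; giving the dummy links zero unit cost is one common convention, the notes defer the details to [1]):

def balance(supply, demand, cost):
    # Add a dummy producer or a dummy customer so that total supply equals total demand.
    total_a, total_b = sum(supply), sum(demand)
    if total_a < total_b:
        supply = supply + [total_b - total_a]
        cost = cost + [[0] * len(demand)]        # zero unit costs on the dummy links
    elif total_a > total_b:
        demand = demand + [total_a - total_b]
        cost = [row + [0] for row in cost]
    return supply, demand, cost

print(balance([35, 50, 40], [45, 20, 30, 30],
              [[8, 6, 10, 9], [9, 12, 13, 7], [14, 9, 16, 5]]))   # already balanced: unchanged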
5.2 An application of the primal-dual algorithm: algorithm ALPHA–BETA
The primal-dual algorithm starts with a linear program in standard form (we referred to this problem as (P )). In
the present case this problem is (5.1), which we repeat here for ease of reference:
min ∑_{i=1}^{m} ∑_{j=1}^{n} cij xij
s. t. ∑_{j=1}^{n} xij = ai ,   i ∈ {1, . . . , m}
      ∑_{i=1}^{m} xij = bj ,   j ∈ {1, . . . , n}        (P)
      xij ≥ 0,   i ∈ {1, . . . , m}, j ∈ {1, . . . , n}
Note that the right-hand sides are positive without loss of generality so that the first assumption of the PDA is
satisfied. Next we need to construct the dual. For a better understanding we find the dual first in an example.
Example 5.5 We find the dual to the problem discussed in Example 5.2. Recall that the dual variables are assigned
to the primal constraints. In our case (see Example 5.2) there are two types of constraints (first two equations and
the remaining three equations) and therefore there will be two types of dual variables, we will call them α1 , α2 and
β1 , β2 , β3 . The first dual constraint is assigned to the first column in the primal problem (corresponding to x11 )
which has only two ones and all other entries are zero. The two ones are in the rows assigned to α1 and β1 , so the
LHS will just be α1 + β1 . This constraint will be an inequality since x11 ≥ 0 and the RHS is the coefficient at x11
in the primal objective function, that is c11 . Thus the first constraint is α1 + β1 ≤ c11 . All dual variables are free as
they are all assigned to equations and the coefficients of the dual objective function are taken from the right-hand
sides of the primal problem. Hence the dual is
a1 α1 + a2 α2 + b1 β1 + b2 β2 + b3 β3 → max
α1 + β1 ≤ c11
α1 + β2 ≤ c12
α1 + β3 ≤ c13
α2 + β1 ≤ c21
α2 + β2 ≤ c22
α2 + β3 ≤ c23
It should be clear from this example that the dual to (P ) is
max w = ∑_{i=1}^{m} ai αi + ∑_{j=1}^{n} bj βj
s. t. αi + βj ≤ cij ,   i ∈ {1, . . . , m}, j ∈ {1, . . . , n}        (D)
We know that the primal-dual algorithm can only be used if a dual feasible solution is known or can be easily
found. This is satisfied here as for a dual feasible solution we can always take
αi = 0,   i ∈ {1, . . . , m}
βj = min_{i∈{1,...,m}} cij ,   j ∈ {1, . . . , n}.
It will be convenient to denote I = {1, . . . , m} and J = {1, . . . , n}. We will usually shorten the reference to Ai
to just i, similarly Bj to j. The construction of the dual can graphically be interpreted in Figure 5.2 by assigning
variables αi to every i ∈ I and βj to every j ∈ J so that αi + βj ≤ cij for (i, j).
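For instance, the initial dual feasible solution can be computed in two lines (a sketch, not part of the notes, using the Powerco costs of Example 5.1):

import numpy as np

c = np.array([[8, 6, 10, 9],
              [9, 12, 13, 7],
              [14, 9, 16, 5]])                       # the Powerco costs of Example 5.1
alpha = np.zeros(c.shape[0])                          # alpha_i = 0
beta = c.min(axis=0)                                  # beta_j = min_i c_ij (column minima)
assert np.all(alpha[:, None] + beta[None, :] <= c)    # dual feasibility: alpha_i + beta_j <= c_ij
print(alpha, beta)                                    # beta = [8 6 10 5], as used in Example 5.10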
The next step in the primal-dual algorithm is the construction of the set of admissible indices and the restricted
primal problem. The set of admissible indices clearly is
IJ = {(i, j) ∈ I × J | αi + βj = cij },
where the double letter IJ (rather than just J) indicates that these indices are two-dimensional, the first being
from I and the second from J. Hence the restricted primal is
min ξ = ∑_{i=1}^{m+n} xai
s. t. ∑_{j:(i,j)∈IJ} xij + xai = ai ,   i ∈ {1, . . . , m}
      ∑_{i:(i,j)∈IJ} xij + xam+j = bj ,   j ∈ {1, . . . , n}        (RP)
      xai ≥ 0,   i ∈ {1, . . . , m + n}
      xij ≥ 0,   (i, j) ∈ IJ
In the next step of the primal-dual algorithm we need to find out whether ξ min is positive or zero; note that if
ξ min > 0 then we are not actually interested in an optimal solution of (RP ). Before (RP ) is solved we simplify it
to a problem that will be denoted (RP ′ ), which will turn out to be a max-flow problem; we will also show a simple
link between optimal solutions to (RP ) and (RP ′ ).
The simplification of (RP ) is based on the elimination of auxiliary variables xai and xam+j . They can easily be
eliminated from the equations in (RP ) by omitting them and replacing the equations by inequalities. However,
they also appear in the objective function. To eliminate them we add up all equations in (RP ) and get:
∑_{i=1}^{m} ∑_{j:(i,j)∈IJ} xij + ∑_{j=1}^{n} ∑_{i:(i,j)∈IJ} xij + ∑_{i=1}^{m} xai + ∑_{j=1}^{n} xam+j = ∑_{i=1}^{m} ai + ∑_{j=1}^{n} bj .
The first two terms are the same and each can be written as ∑_{(i,j)∈IJ} xij ; the third and fourth together give ξ, hence we have
ξ = ∑_{i=1}^{m} ai + ∑_{j=1}^{n} bj − 2 ∑_{(i,j)∈IJ} xij .        (5.3)
Since ∑_{i=1}^{m} ai + ∑_{j=1}^{n} bj is a constant, the minimisation of ξ is equivalent to the maximisation of ∑_{(i,j)∈IJ} xij , so
(Figure 5.3: the network for (RP′): a source s joined to every Ai by an arc of capacity ai , every Bj joined to a sink t by an arc of capacity bj , and an arc Ai Bj of capacity +∞ for every admissible pair (i, j) ∈ IJ.)
the problem (RP ′ ) after the elimination of xai , which is equivalent to (RP ), is:
max g(x) = ∑_{(i,j)∈IJ} xij
s. t. ∑_{j:(i,j)∈IJ} xij ≤ ai ,   i ∈ {1, . . . , m}
      ∑_{i:(i,j)∈IJ} xij ≤ bj ,   j ∈ {1, . . . , n}        (RP′)
      xij ≥ 0,   (i, j) ∈ IJ
We will now show that (RP ′ ) is a max-flow problem. The underlying network is shown in Figure 5.3. Note that
the arcs between Ai and Bj are included if and only if they are admissible; we will set their capacities to +∞.
Proof.
(a) Let f be a flow in N and i ∈ I; then by the conservation law at node i we have:
∑_{j:(i,j)∈IJ} xij = ∑_{j:(i,j)∈IJ} f (i, j) = f (s, i) ≤ ai .
Similarly it can be shown that ∑_{i:(i,j)∈IJ} xij ≤ bj for any j ∈ J. Hence x defined by (5.4) is a feasible solution of (RP′) and
g(x) = ∑_{(i,j)∈IJ} xij = ∑_{(i,j)∈IJ} f (i, j) = ∑_{i∈I} ∑_{j:(i,j)∈IJ} f (i, j) = ∑_{i∈I} f (s, i) = v(f ).
(b) Let f be defined on the set of arcs of N as described in the lemma statement. In order to prove that f is a flow we need to verify the capacity constraints at all arcs and the conservation law at all nodes. The capacities of the (i, j) arcs are +∞ so nothing needs to be done there. If i ∈ I then f (s, i) = ∑_{j:(i,j)∈IJ} xij ≤ ai and if j ∈ J then f (j, t) = ∑_{i:(i,j)∈IJ} xij ≤ bj , which confirms that all capacity constraints are satisfied. If i ∈ I then f (s, i) = ∑_{j:(i,j)∈IJ} xij = ∑_{j:(i,j)∈IJ} f (i, j), which confirms the conservation law at i. Similarly, f (j, t) = ∑_{i:(i,j)∈IJ} xij = ∑_{i:(i,j)∈IJ} f (i, j) verifies this law for any j ∈ J. Finally, v(f ) = ∑_{i∈I} f (s, i) = ∑_{i∈I} ∑_{j:(i,j)∈IJ} xij = g(x).
Lemma 5.6 enables us to easily check whether ξ min is positive or zero: since the transportation problem is balanced, by (5.3) and this lemma we have
ξ = 2 ∑_{i∈I} ai − 2g(x) = 2 ∑_{i∈I} ai − 2v(f ).
Hence
ξ = 0 ⇔ v(f ) = ∑_{i∈I} ai .
Since f (s, i) ≤ ai for every i ∈ I, and v(f ) = ∑_{i∈I} f (s, i), we have equivalently
ξ = 0 ⇔ f (s, i) = ai for every i ∈ I,
that is ξ = 0 if and only if the current flow saturates all arcs leaving the source.
So we can summarise how to solve (RP ): We construct the network of Figure 5.3 and find a max-flow using the
Ford and Fulkerson algorithm. At the end (that is at non-breakthrough) there are two possibilities:
1. The max-flow f saturates all arcs leaving the source, that is f (s, i) = ai for every i ∈ I. In this case an
optimal solution x = (xij ) to the transportation problem has been found and can be described as follows:
xij = f (i, j) if (i, j) ∈ IJ, and xij = 0 otherwise.        (5.5)
2. The max-flow f does not saturate all arcs leaving the source, that is f (s, i) < ai for some i ∈ I. In this case the primal-dual algorithm continues by finding an optimal solution (ᾱ, β̄) to the dual of the restricted primal (DRP) and then amends the current (α, β) using (ᾱ, β̄).
So the next question we need to deal with is that of finding (ᾱ, β̄). To do this we define
I∗ = {i ∈ I : Ai is labelled at non-breakthrough}   and   J∗ = {j ∈ J : Bj is labelled at non-breakthrough}.
Figure 5.4: At non-breakthrough some arcs do not exist
Lemma 5.7 If the Ford and Fulkerson algorithm is applied to the network of Figure 5.3 then at non-breakthrough the implication
i ∈ I∗ ⇒ j ∈ J∗
holds for every (i, j) ∈ IJ.
Proof. The capacity of every arc (i, j) ∈ IJ is +∞. Since i ∈ I ∗ , node i was scanned during the run of the labelling
algorithm in the last iteration and since f (i, j) is strictly less than the capacity of (i, j) the node j was labelled
during the scan of i or already was labelled. In any case j ∈ J ∗ .
We are now ready to prove the key statement for ALPHABETA.
Theorem 5.8 If the Ford and Fulkerson algorithm is applied to the network of Figure 5.3 then at non-breakthrough an optimal solution to DRP is given by (ᾱi , β̄j ), where
ᾱi = 1 if i ∈ I∗ ,  ᾱi = −1 if i ∉ I∗ ,   and   β̄j = −1 if j ∈ J∗ ,  β̄j = 1 if j ∉ J∗ .
Proof.
(i) First we show that (ᾱ, β̄) is feasible to the DRP. For this we need to construct the DRP (note that there is no need to use the DRP outside this proof). Similarly to (D) it will have two types of variables, forming vectors ᾱ and β̄. The constraints corresponding to the variables xij , (i, j) ∈ IJ, in RP will have zero right-hand sides, since such are the respective coefficients of the objective function in RP at xij . There will be extra constraints corresponding to the auxiliary variables xai , with trivial left-hand sides and right-hand sides equal to 1. It may be useful to realise that the objective function in RP has all coefficients 0 or 1 and can be seen as
ξ = ∑_{(i,j)∈IJ} 0 · xij + ∑_{i=1}^{m+n} 1 · xai .
All constraints in (DRP) are satisfied by (ᾱ, β̄) trivially except for the case when ᾱi + β̄j = 1 + 1 = 2, which occurs when i ∈ I∗ and j ∉ J∗ . But this case is excluded for all (i, j) ∈ IJ by Lemma 5.7.
(ii) Next we show that (ᾱ, β̄) is optimal to DRP, see Figure 5.5. By a corollary of the weak duality theorem (Corollary 2.7) it is sufficient to show that
where f is a max-flow at non-breakthrough. From the definition of the value of a flow we also have
v(f ) = ∑_{i∈I} f (s, i) = ∑_{i∈I∗} f (s, i) + ∑_{i∉I∗} f (s, i).        (5.9)
However,
f (s, i) = ai        (5.10)
for every i ∉ I∗ because at non-breakthrough all scanning is finished, and if f (s, i) < ai then node i would have been labelled during the scan of s.
Let us consider the term ∑_{i∈I∗} f (s, i) and fix i ∈ I∗. Then by the conservation law
f (s, i) = ∑_{j∈J, (i,j)∈IJ} f (i, j) = ∑_{j∈J∗, (i,j)∈IJ} f (i, j) + ∑_{j∉J∗, (i,j)∈IJ} f (i, j) = ∑_{j∈J∗, (i,j)∈IJ} f (i, j),        (5.11)
where the last term is in fact zero because if f (i, j) > 0 for some i then i would have been labelled during the scan of j along the (backward) arc (i, j). Thus we get for every j ∈ J∗:
bj = ∑_{i∈I∗, (i,j)∈IJ} f (i, j).        (5.12)
We will now substitute from (5.10), (5.11) and (5.12) in (5.9) to obtain (using also the commutative and associative laws):
v(f ) = ∑_{i∉I∗} ai + ∑_{i∈I∗} ∑_{j∈J∗, (i,j)∈IJ} f (i, j)
      = ∑_{i∉I∗} ai + ∑_{j∈J∗} ∑_{i∈I∗, (i,j)∈IJ} f (i, j)
      = ∑_{i∉I∗} ai + ∑_{j∈J∗} bj .
which together with (5.7) confirms that (5.6) holds and thus (ᾱ, β̄) is indeed an optimal solution.
Once (ᾱ, β̄) ∈ MDRPopt is found, the next step in the primal-dual algorithm is checking whether ᾱi + β̄j > 0 for at least one pair (i, j) ∉ IJ. However, a negative answer to this question would mean, by Theorem 3.3, that the transportation problem is infeasible, which is impossible by the assumption that this problem is balanced and thus feasible. Therefore this check can be skipped.
The formula for θ1 gets the following form in the transportation problem:
θ1 = min{ (cij − αi − βj ) / (ᾱi + β̄j ) : ᾱi + β̄j > 0 }
   = min{ (cij − αi − βj ) / 2 : i ∈ I ∗ , j ∉ J ∗ }. (5.13)
The update of (α, β) using (ᾱ, β̄) can now also easily be formulated:
αi := αi + θ1 ᾱi , that is, αi := αi + θ1 if i ∈ I ∗ and αi := αi − θ1 if i ∉ I ∗ , (5.14)
and
βj := βj + θ1 β̄j , that is, βj := βj − θ1 if j ∈ J ∗ and βj := βj + θ1 if j ∉ J ∗ . (5.15)
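The dual update (5.13)–(5.15) is mechanical enough to be written in a few lines. The following Python sketch is not part of the original notes; the names are illustrative, and it assumes that the labelled sources I ∗ and destinations J ∗ at non-breakthrough are given as 0-based index sets.

```python
def alphabeta_dual_update(c, alpha, beta, I_star, J_star):
    """One dual step of ALPHABETA: theta_1 by (5.13), then (5.14)-(5.15).

    c is the cost matrix, alpha/beta the current dual variables,
    I_star/J_star the labelled sources/destinations at non-breakthrough."""
    # (5.13): only pairs with i in I*, j not in J* have alpha_bar_i + beta_bar_j = 2 > 0
    theta1 = min((c[i][j] - alpha[i] - beta[j]) / 2
                 for i in I_star for j in range(len(beta)) if j not in J_star)
    # (5.14)-(5.15): alpha moves up on I* and down elsewhere; beta the other way round
    alpha = [a + theta1 if i in I_star else a - theta1 for i, a in enumerate(alpha)]
    beta = [b - theta1 if j in J_star else b + theta1 for j, b in enumerate(beta)]
    return theta1, alpha, beta

# Iteration 1 of Example 5.10 below (rows and columns numbered from 0):
c = [[8, 6, 10, 9], [9, 12, 13, 7], [14, 9, 16, 5]]
print(alphabeta_dual_update(c, [0, 0, 0], [8, 6, 10, 5], {1, 2}, {3}))
# (0.5, [-0.5, 0.5, 0.5], [8.5, 6.5, 10.5, 4.5])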
When using ALPHABETA we will extend the transportation table by an extra column for the αi ’s and an extra
row for the βj ’s:
        · · ·   bj    · · ·
  ⋮
 ai     · · ·   xij   · · ·   αi
  ⋮
        · · ·   βj    · · ·
Example 5.10 We return to Example 5.1 and solve it using ALPHABETA. At the end we will compare the solution
with that found by GREEDY.
Initialisation: α = (0, 0, 0)⊤ , β = (8, 6, 10, 5)⊤ (column minimum).
45 20 30 30 αi
35 8 6 10 9 0
50 9 12 13 7 0
40 14 9 16 5 0
βj 8 6 10 5
Iteration 1: IJ = {(1, 1), (1, 2), (1, 3), (3, 4)}, the network is shown in Figure 5.6. A max-flow can easily be found
by inspection and is given by free-standing numbers. (By convention the missing flow values are zero). However,
the labelling algorithm has to be run to reach a non-breakthrough, at which the labels are displayed in bold (as we
do not need the details of labels here). Hence I ∗ = {2, 3}, J ∗ = {4}. The arcs leaving the source are not saturated,
thus another iteration is needed. For this we calculate
θ1 = (1/2) min{9 − 8 − 0, 12 − 6 − 0, 13 − 10 − 0, 14 − 8 − 0, 9 − 6 − 0, 16 − 10 − 0} = 1/2
and update α and β. The result is in the table below:
45 20 30 30 αi
35 8 6 10 9 −0.5
50 9 12 13 7 0.5
40 14 9 16 5 0.5
βj 8.5 6.5 10.5 4.5
Iteration 2: IJ = {(1, 1), (1, 2), (2, 1), (1, 3), (3, 4)}, the network is shown in Figure 5.7. After finding a max-flow
we get I ∗ = {2, 3}, J ∗ = {1, 4}. The flow is still not saturating the arcs leaving the source and so another iteration
is needed. For this we calculate
θ1 = (1/2) min{12 − 6.5 − 0.5, 13 − 10.5 − 0.5, 9 − 6.5 − 0.5, 16 − 10.5 − 0.5} = 1
and update α and β. The result is in the table below:
45 20 30 30 αi
35 8 6 10 9 −1.5
50 9 12 13 7 1.5
40 14 9 16 5 1.5
βj 7.5 7.5 11.5 3.5
Iteration 3: IJ = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 2), (3, 4)}, the network is shown in Figure 5.8. The max-flow now
saturates the arcs leaving s, so an optimal solution to the transportation problem can be read off from the flow values
on the admissible arcs; it is summarised in the table below:
       45   20   30   30
 35         10   25
 50    45         5
 40         10        30
The cost of this transportation plan is GBP 1,020,000 which is GBP 60,000 cheaper than that found by GREEDY
in Example 5.1.
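As a quick sanity check, the total cost of the plan above can be recomputed directly (a throwaway snippet, not part of the notes):

```python
costs = [[8, 6, 10, 9], [9, 12, 13, 7], [14, 9, 16, 5]]
plan = [[0, 10, 25, 0], [45, 0, 5, 0], [0, 10, 0, 30]]   # the table above
total = sum(c * x for crow, xrow in zip(costs, plan) for c, x in zip(crow, xrow))
print(total)   # 1020, i.e. GBP 1,020,000
```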
[Figure 5.6: the network for Iteration 1]
[Figure 5.7: the network for Iteration 2]
[Figure 5.8: the network for Iteration 3]
Chapter 6
This chapter presents a basic introduction to the theory of matroids. It explains why some
combinatorial optimisation problems can be solved by a straightforward “greedy” algorithm and characterises such
problems. Before we do so we need to discuss a selection of properties of undirected graphs.
[Figure: an example graph on the nodes a, b, c, d]
[Figure 6.2: the complete graphs K1 , K2 , K3 ]
If we add up the numbers of arcs incident with individual nodes then every arc is counted twice. Hence in any graph we have
0 ≤ n ≤ m(m − 1)/2.
If E = {{u, v} : u, v ∈ V, u ̸= v}, then G is called complete, notation Km . The complete graphs with 1, 2 and 3
nodes are shown in Figure 6.2. In Km clearly n = m(m − 1)/2.
Given a graph G = (V, E) the sequence p = (v0 , v1 , . . . , vk ) is a walk (in G) if v0 , v1 , . . . , vk ∈ V and vi−1 vi ∈ E
(i = 1, 2, . . . , k). If, moreover, vi ̸= vj (i ̸= j; i, j = 0, 1, . . . , k), then p is called a path. If p is indeed a path then
it is called a u − v path if v0 = u and vk = v and we say that p passes through v0 , v1 , . . . , vk or that p contains
v0 , v1 , . . . , vk . We also say that v is reachable from u (in G) if there is a u − v path (in G).
A walk c = (v0 , v1 , . . . , vk ) is called closed if v0 = vk . If, moreover, (v0 , v1 , . . . , vk−1 ) is a path in G and k ≥ 3,
then c is called a cycle. A cycle is called Hamiltonian (in G) if it contains all nodes of G.
The number of arcs incident with a node v is called the degree of v, notation deg(v). We say that v is isolated
if deg(v) = 0.
A graph G = (V, E) is called acyclic (or a forest) if there is no cycle in G; it is called connected if there is a u − v
path in G for all u, v ∈ V (G) with u ̸= v; and it is called a tree if G is both acyclic and connected. For illustration
see Figures 6.3, 6.4 and 6.5.
If you try to draw a few trees you may notice that each of them has the number of arcs one less than the number
of nodes, for instance in the tree of Figure 6.5 we have m = 9, n = 8. This is a simple but fundamentally important
observation confirmed by the next statement which is very well known but we repeat it here for completeness.
Theorem 6.1 If G = (V, E) is a tree with m nodes and n arcs then n = m − 1.
Proof. By induction on m.
If m = 1 then G has no arcs and so n = 0.
Figure 6.4: Connected but not acyclic
Suppose now m > 1. There are no isolated nodes since otherwise G would not be connected. If there was no
node of degree 1 then starting from any node and an incident arc it would be possible to move from node to node by
entering and leaving every node using different arcs, hence eventually returning back to one of the previous nodes
and thus creating a cycle, which is a contradiction.
Let v be a node of degree 1 and e be the incident arc. Let G ′ = (V \{v}, E\{e}) and denote the number of nodes
by m′ and arcs by n′ . Hence m′ = m − 1, n′ = n − 1 and G ′ is obviously again a tree. By the induction hypothesis
n′ = m′ − 1. We deduce that m − 1 = m′ = n′ + 1 = (n − 1) + 1 = n.
We will refer to the next statement quite often; its proof is trivial.
Lemma 6.2 (obvious) If a graph G is acyclic then also every subgraph of G is acyclic.
The concept of a connected component, briefly just component, will play a key role in the theory that follows.
Informally, a component of a graph G is any subgraph of G that is connected and cannot be extended to a larger
connected subgraph of G (but a larger connected subgraph of G may exist). More precisely, a graph H is called a
component of the graph G if
H ⊆ G,
H is connected and
H ⊆ H′ , H′ ⊆ G, H′ connected implies H = H′ .
Theorem 6.3 If G is an acyclic graph (forest) with m nodes, n arcs and p components then n = m − p.
Proof. Let p = p(G) and mi (ni ) denote the number of nodes (arcs) in the i-th component of G. Then ni = mi − 1
(i = 1, . . . , p) by Theorem 6.1 and so
n = Σ_{i=1}^{p} ni = Σ_{i=1}^{p} (mi − 1) = Σ_{i=1}^{p} mi − Σ_{i=1}^{p} 1 = m − p.
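The identity n = m − p is easy to confirm experimentally. The sketch below is illustrative code (not from the notes): it counts components with a simple union–find structure and checks the identity on a small forest.

```python
def number_of_components(m, arcs):
    """p(G) for a graph on the nodes 0, ..., m-1, computed via union-find."""
    parent = list(range(m))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    p = m
    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            p -= 1                          # merging two components
    return p

forest = [(0, 1), (1, 2), (3, 4)]           # an acyclic graph on 6 nodes
assert len(forest) == 6 - number_of_components(6, forest)   # n = m - p
```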
Another key concept in this chapter is that of a spanning tree. Let G be a connected graph (this assumption
is essential!). Then a subgraph T of the graph G is called a spanning tree of G if
T is a tree and V (T ) = V (G) (that is, T is a spanning subgraph of G).
It follows from this definition and Theorem 6.1 that every spanning tree of a graph with m nodes has m − 1
arcs.
In summary, a spanning tree of a connected graph is any subgraph of this graph that is acyclic, connected and
spanning. Figure 6.10 shows a connected graph G and five subgraphs of G. The first two are spanning trees of
G, however the others are not: G3 is connected and spanning but not acyclic, G4 is acyclic and spanning but not
connected and G5 is connected and acyclic but not spanning.
Let G be a graph. A graph F is called a maximal acyclic subgraph (MAS) of G if F is an acyclic subgraph
of G but is not a proper subgraph of any acyclic subgraph of G. The next statement confirms what is probably
intuitively obvious. It is a basic building block of the matroid theory.
Theorem 6.4 Let G be a connected graph with m nodes. Then every MAS of G is a spanning tree of G and hence
has m − 1 arcs.
Proof. Let G be a connected graph with m nodes and F be a MAS of G. We need to prove that F is a spanning
tree of G, that is, it is acyclic, connected and spanning.
The first property is immediate.
If V (F) ̸= V (G) then by adding any node from V (G)\V (F) (but no arc) to F we get an acyclic subgraph of G
which contains F as a proper subgraph, a contradiction. Therefore V (F) = V (G), that is, F is a spanning subgraph
of G.
Figure 6.7: Component of G in Figure 6.6
[Figure 6.10: a connected graph G and the subgraphs G1 , G2 , G3 , G4 , G5 ]
Finally, we prove by contradiction that F is connected. Suppose that p(F) > 1 and let F1 be one (fixed)
component of F. We define F2 to be the graph formed by the remaining components of F, that is, F2 = (V (F)\V (F1 ), E(F)\E(F1 )).
Take any two nodes u ∈ V (F1 ) and v ∈ V (F2 ). The graph G is connected and so there exists a u − v path p in
G. On this path there are consecutive nodes u′ and v ′ such that u′ ∈ V (F1 ) and v ′ ∈ V (F2 ). Hence u′ v ′ ∈ E(G).
Consider now the graph F ′ = (V (F), E(F) ∪ {u′ v ′ }). This graph is acyclic because it is obtained from an
acyclic graph F by adding an arc joining two different components of F. We also have F ⊆ F ′ ⊆ G and F ≠ F ′
because u′ v ′ ∉ E(F). Hence F ′ is an acyclic subgraph of G containing F as a proper subgraph, a contradiction.
This completes the proof.
Let G be a graph. A graph F is called a spanning forest of the graph G if V (F) = V (G) and each component of
F is a spanning tree of the corresponding component of G.
Corollary 6.5 (of Theorem 6.4) Let G be a graph with m nodes and p components. Then every maximal acyclic
subgraph of G is a spanning forest of G and (therefore) has m − p arcs.
Proof. Let F be a MAS of G. Then every component of F is a maximal acyclic subgraph of that component
(otherwise we have a contradiction with the assumption that F is a MAS of G) and hence a spanning tree of that
component by Theorem 6.4. If V (F) ̸= V (G) then by adding any node from V (G)\V (F) to F we get an acyclic
subgraph of G which contains F as a proper subgraph, a contradiction. Therefore V (F) = V (G) and so F is a
spanning forest of G. The rest follows from Theorem 6.3.
(1) The assignment problem (AP) is: Given an m × m matrix A, find m entries in A, no two from the same row
or column so that their sum is minimal (or maximal). Note that the minimisation and maximisation versions
of the assignment problem are converted to each other by taking −A instead of A.
If we take !
1 2
A= ,
3 5
then GREEDY will start by choosing 1, skip 2 and 3 (since these are not feasible as they are in the same row
or column as the already chosen 1) and finally take 5. The total then is 1 + 5 = 6 but obviously the choice of
2 + 3 would have been better. So GREEDY is not a correct solution method for AP. An explanation of this
simple observation will follow from the matroid theory in the next section. Observe that if 5 was replaced by
any big number, say M , then GREEDY would still choose 1 and M rather than the optimal choice of 2 + 3,
so the error of GREEDY in AP may be arbitrarily large.
[Figure 6.11: the shortest path instance on the nodes s, a, b, t]
[Figure 6.12: the arc-weighted graph on the nodes a, b, c, d, e used in the MST example]
(2) The shortest path problem (SPP). Consider the problem of finding a shortest path from s to t in the graph
of Figure 6.11. Here GREEDY would obviously start by choosing the arc (s, a), followed by (a, t), yielding
the path of length 5 but clearly the path consisting of the single arc (s, t) is shorter. Thus GREEDY is not
a correct solution method for SPP.
(3) The minimal spanning tree problem (MSTP or just MST) is the following: Given a connected graph G =
(V, E) and a function w : E → R, find a spanning tree T = (V, E ∗ ) of G such that
w(E ∗ ) = Σ_{e∈E ∗} w(e)
is minimal.
In the graph of Figure 6.12 GREEDY will choose arcs (in this order) ce, bc, ab, then will reject arc ac because
otherwise the cycle (a, b, c, a) is created and similarly it will reject ae. Then it will accept cd and reject the
remaining arcs. So E ∗ = {ab, bc, ce, cd} and the minimum weight is 17.
Indeed, GREEDY is a correct solution method for the MST in general; the next section explains why.
The proof is actually done for the following maximum weight forest problem (MWF), which is equivalent to
the MST: Given a graph G = (V, E) and a function w : E → R≥0 , find a forest F = (V, E ∗ ) of G such that
w(E ∗ ) = Σ_{e∈E ∗} w(e)
is maximal. In this definition it is important that the arc weights are restricted to be non-negative. Note
that by definition w(∅) = 0.
[Figure 6.13: the graph of Figure 6.12 with transformed weights w′ (e) = 12 − w(e)]
Details of the conversion from MWF to MST are left as an exercise (see problem sheets) but we illustrate
this transformation on the graph of Figure 6.12. We now show how the MST in that graph can be solved
provided that we can solve the MWF (which will be shown to be solvable by GREEDY in Section 6.4). The
transformation that can be used is
w′ (e) = W − w(e), e ∈ E,
where W is the greatest arc weight in G. In Figure 6.12 we have W = 12 and the graph with transformed
weights is displayed in Figure 6.13. GREEDY now finds an MWF (remember we now have a maximisation
problem so GREEDY is choosing the arcs of greatest weights first) which coincides with the MST found
before.
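For concreteness, here is a short Kruskal-style implementation of GREEDY for the MST. It is a sketch with an arbitrarily chosen illustrative instance (not the graph of Figure 6.12): arcs are scanned in order of increasing weight and accepted whenever they do not close a cycle.

```python
def greedy_mst(nodes, weighted_arcs):
    """GREEDY for the MST: accept the cheapest remaining arc iff the chosen
    arcs stay acyclic (checked with union-find)."""
    parent = {v: v for v in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(weighted_arcs):
        ru, rv = find(u), find(v)
        if ru != rv:                     # adding uv keeps the subgraph acyclic
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

arcs = [(4, 'a', 'b'), (1, 'b', 'c'), (3, 'a', 'c'), (2, 'c', 'd')]
print(greedy_mst('abcd', arcs))   # [('b', 'c', 1), ('c', 'd', 2), ('a', 'c', 3)]
```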
6.3.1 System 1
Let G = (V, E) be a graph and let S1 = (E, A) where A = {A ⊆ E : (V, A) is acyclic}. It follows immediately from
the “obvious lemma”, Lemma 6.2, that S1 is an independence system (because for any acyclic graph also every
subgraph is acyclic).
We illustrate S1 on the following instance. Let G be the graph of Figure 6.14. Then V = {a, b, c}, E =
{ab, bc, ac} and A = {∅, {ab}, {bc}, {ac}, {ab, bc}, {ab, ac}, {bc, ac}} = 2^E \ {E}.
6.3.2 System 2
For S2 we need a bit more of graph theory: Let G = (V, E) be a graph. Then M ⊆ E is called a matching (in G) if
e ∩ f = ∅ for all e, f ∈ M , e ̸= f . So a matching in a graph G is a set of edges of G, no two of which have a vertex
in common.
[Figure 6.14: the triangle on the nodes a, b, c]
[Figure 6.15: examples of edge sets in a graph G — not a matching; a matching that is not maximal; maximal but not maximum; maximum and perfect]
Thus |M | ≤ m/2 for every matching M . Clearly, ∅ is a matching in any graph, {e} is a matching for any e ∈ E.
If |M | = m/2 then M is called perfect. If M is a matching in G and M is not a proper subset of any other matching
in G then M is called maximal (“it cannot be extended to a bigger matching”). If M is a matching in G of biggest
size then M is called maximum. These concepts are illustrated in Figure 6.15 where M is formed by red dashed
arcs.
The difference between the size of a maximum matching and m/2 can be arbitrarily big as can be seen in the
graph of Figure 6.16 where any maximum matching consists of only one arc.
Let G = (V, E) be a graph. Then S2 = (E, M), where M = {M ⊆ E : M is a matching}, is an independence
system because with every matching also every subset is a matching.
We illustrate S2 on the following instance. Let G be the graph of Figure 6.17.
Then V = {a, b, c, d}, E = {ab, bc, cd, ad} and (E, M) is an instance of S2 , where
M = {∅, {ab}, {bc}, {cd}, {ad}, {ab, cd}, {ad, bc}}.
[Figure 6.17: the cycle on the nodes a, b, c, d]
Note that independence systems can be constructed from graphs also in ways other than in S1 and S2 . For example,
if G is the graph of Figure 6.17 and F = {∅, {ab}, {ad}, {ab, ad}} then (E, F) is an independence system (different
from S1 and S2 ): this can easily be verified and is left as an exercise.
The remaining two independence systems will not be graph related.
6.3.3 System 3
Let m ≥ 1 be an integer, S3 = (E, I) where E = {(i, j) : i, j = 1, . . . , m} (the set of all positions in an m × m
matrix) and I ∈ I if and only if I is a set of positions in E such that no two are from the same row or column. S3
is an independence system because if no two positions in a set of positions are from the same row or column then
the same holds also for any subset.
We illustrate S3 on the following instance. Let m = 2, that is we consider 2 × 2 matrices, generally of the form
A = [ a11 a12 ]
    [ a21 a22 ] .
Here E = {(1, 1), (1, 2), (2, 1), (2, 2)} and
I = {∅, {(1, 1)}, {(1, 2)}, {(2, 1)}, {(2, 2)}, {(1, 1), (2, 2)}, {(1, 2), (2, 1)}}.
6.3.4 System 4
Let A ∈ Rm×n and S4 = (C, L) where C = {1, . . . , n} (the set of column indices of A) and
L = {J ⊆ C : the columns of A with indices in J are linearly independent}.
S4 is an independence system because, as is well known from linear algebra, every subsystem of a linearly independent
system of vectors is also linearly independent.
We illustrate S4 on the following instance. Let
A = [ 1 0 1 2 0 ]
    [ 0 1 1 0 0 ]
Then C = {1, 2, 3, 4, 5} and L will contain ∅, every one-element subset of C except {5} (because the zero vector
alone is not linearly independent); it will also contain pairs of indices of linearly independent columns (for instance
it will not contain the set {1, 4} because the first and fourth columns are linearly dependent). It cannot contain
any set of 3 or more indices because in the two-dimensional space the maximum number of linearly independent
columns is 2. Hence (C, L) is an instance of S4 , where
L = {∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {2, 3}, {2, 4}, {3, 4}}.
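The family L can also be generated mechanically by testing ranks of column submatrices. The following sketch (illustrative only, using numpy and 0-based column indices) reproduces the list above up to the shift of indices by one.

```python
import numpy as np
from itertools import combinations

A = np.array([[1, 0, 1, 2, 0],
              [0, 1, 1, 0, 0]])

def independent_index_sets(A):
    """All J such that the columns of A indexed by J are linearly independent;
    the empty set is independent by convention."""
    n = A.shape[1]
    family = [()]
    for k in range(1, n + 1):
        family += [J for J in combinations(range(n), k)
                   if np.linalg.matrix_rank(A[:, list(J)]) == k]
    return family

print(independent_index_sets(A))
# [(), (0,), (1,), (2,), (3,), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```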
6.3.5 Systems 1-4: some remarks
The size of an independent set plays a key role in the theory that will now follow. Let us therefore have a closer
look at the upper bounds of these sizes in S1 − S4 .
It follows from Theorem 6.3 that an acyclic subgraph of a graph with m nodes and p components cannot have
more than m − p arcs. Therefore for any independent set A in S1 we have |A| ≤ m − p.
We have observed immediately after the definition of a matching that a matching cannot have more than m/2
arcs, so for any independent set M in S2 we have |M | ≤ m/2.
In S3 we have |I| ≤ m for any independent set I because in a set of m + 1 or more positions of an m × m matrix
at least two are from the same row (by, say, the pigeonhole principle).
Finally, it is proved in linear algebra that the size of any linearly independent system of columns of a real
matrix A does not exceed the rank of A (denoted by r(A)). Hence in S4 for any independent set J we have
|J| ≤ r(A) ≤ min(m, n).
Before we continue with matroids it may be useful to realise that “independence” in independence systems is a
generalisation of various concepts in specific independence systems. For instance the meaning of “independence”
in the four examples above is as follows:
S1 acyclic
S2 matching
S3 independent positions
S4 linearly independent
6.4 Matroids
Some independence systems have the desirable powerful feature: whatever combinatorial optimisation problem is
associated with them, GREEDY will always be a correct solution method. Such independence systems are called
matroids. In this section we will formalise this idea and characterise matroids. In order to do so, clearly we need first
to say precisely what a combinatorial optimisation problem is and what exactly GREEDY means. Independence
systems will help us do this fairly easily.
Let S = (E, F) be an independence system. A combinatorial optimisation problem (COP) associated with S,
notation (S, w), is the following problem:
Given a weight function w : E → R≥0 , find an independent set (called an optimal solution) that has the biggest
total weight, i.e., find X ∗ ∈ F such that
w(X ∗ ) = max_{X∈F} w(X),
where
w(X) = Σ_{e∈X} w(e)
for X ⊆ E.
Hence a combinatorial optimisation problem associated with S1 (where “independent” means “acyclic”) is:
Given a graph G = (V, E) and w : E → R≥0 , find an acyclic subgraph of G of maximum total weight. This is
exactly the definition of the maximum weight forest problem.
In S3 “independent” means “being from different rows and columns”. Assigning a weight to every position
(i, j) means creating a matrix (aij ); and so the combinatorial optimisation problem associated with S3 is:
Given an m × m matrix A = (aij ) ≥ 0, find m entries in A, no two either from the same row or column, of
maximum total value. This is exactly the definition of the assignment problem (with maximisation).
Next we formally define GREEDY.
(1) X ∗ := ∅,
(2) (“Find a most attractive element in E”) Find an e ∈ E such that w(e) = max_{z∈E} w(z),
(3) E := E\{e},
(4) if X ∗ ∪ {e} ∈ F then X ∗ := X ∗ ∪ {e},
(5) if E ≠ ∅ then go to (2), otherwise stop and output X ∗ .
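In code, GREEDY for a general independence system takes only a few lines. The sketch below is illustrative (not from the notes): the independence test is supplied as a function, and it is applied to the system S1 on the triangle of Figure 6.14 with arbitrarily chosen weights.

```python
def greedy(E, is_independent, w):
    """GREEDY: scan the elements in order of decreasing weight and accept an
    element iff the current set stays independent."""
    X = set()
    for e in sorted(E, key=w, reverse=True):   # a most attractive element first
        if is_independent(X | {e}):
            X.add(e)
    return X

# S1 on the triangle abc: every proper subset of E is acyclic, E itself is not
E = ['ab', 'bc', 'ac']
weights = {'ab': 3, 'bc': 2, 'ac': 1}          # weights chosen for illustration
acyclic = lambda S: S != {'ab', 'bc', 'ac'}
print(greedy(E, acyclic, weights.get))         # {'ab', 'bc'}
```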
We are now ready to formulate and prove one of the key statements of this chapter, and in fact of the whole
course.
Theorem 6.7 Let S = (E, F) be an independence system. Then the following statements are equivalent:
(A) For every w : E → R≥0 GREEDY correctly solves the combinatorial optimisation problem (S, w).
(B) [Exchange Proposition (EP)] For all F, F ′ ∈ F, if |F ′ | = |F | + 1 then there is an e ∈ F ′ \F such that
F ∪ {e} ∈ F.
(C) [Rank Proposition (RP)] If A ⊆ E and F , F ′ are maximal independent subsets of A then |F | = |F ′ |.
An independence system satisfying one (and therefore all) of the conditions (A), (B), (C) in Theorem 6.7 is
called a matroid.
Proof. Let S = (E, F) be an independence system.
(A) ⇒ (B)
We prove, equivalently, that if (B) is not true then also (A) is not true. So suppose that for some F, F ′ ∈ F,
we have |F ′ | = |F | + 1 and for all e ∈ F ′ \F we have F ∪ {e} ∉ F. Let us denote |F | by p and define w : E → R≥0
as
w(e) = p + 2 if e ∈ F , w(e) = p + 1 if e ∈ F ′ \F , and w(e) = 0 if e ∈ E\(F ∪ F ′ ).
If GREEDY is used for solving the combinatorial optimisation problem (S, w), then it will first choose (one by
one) all elements of F because they have the biggest weight and are feasible (since every subset of F is independent).
Then it will reject (again one by one) all elements from F ′ \F due to the assumption that F ∪ {e} ∈ / F for any
e ∈ F ′ \F . Next, GREEDY will accept or reject the remaining elements which are all in E\(F ∪ F ′ ), and therefore
all are of zero weight. Hence the set X ∗ found by GREEDY has the weight w(X ∗ ) = w(F ) = p(p + 2). However
the set F ′ is independent and has p + 1 elements all of weight p + 1 or p + 2 and hence w(F ′ ) ≥ (p + 1)2 > p(p + 2).
Thus GREEDY fails to find an optimal solution for (S, w), that is, (A) is not true.
(B) ⇒ (C)
Suppose (B) holds, A ⊆ E and F , F ′ are maximal independent subsets of A. We prove by contradiction that
|F | = |F ′ |. Suppose that |F | ≠ |F ′ |; without loss of generality |F | < |F ′ |. Let F ′′ be any subset of F ′ such that
|F ′′ | = |F | + 1. Clearly F ′′ ∈ F (because any subset of an independent set is independent) and so by (B) applied
[Figure 6.18: a graph on the nodes a, b, c, d]
We may assume without loss of generality that w(e1 ) ≥ w(e2 ) ≥ . . . ≥ w(em ) and w(e′1 ) ≥ w(e′2 ) ≥ . . . ≥ w(e′k ).
The set X ∗ is maximal independent in A = E by its construction by GREEDY. The set X may also be assumed
to be maximal in A = E without loss of generality (if necessary, suitable elements of zero weight are added until
this assumption is satisfied). By (C) then k = m. We will now show by induction that
w(ei ) ≥ w(e′i ), i = 1, . . . , m, (6.2)
which yields a contradiction with (6.1). The set {e′1 } is independent since it is a subset of the independent set
X. Hence by the choice of GREEDY at the first step we have that w(e1 ) ≥ w(e′1 ). For the second induction step
suppose that
w(ei ) ≥ w(e′i ), i = 1, . . . , r − 1
but w(er ) < w(e′r ). Let us define A = {e ∈ E : w(e) ≥ w(e′r )}. The set {e1 , . . . , er−1 } is a maximal independent
subset of A because if there was an e ∈ A such that {e1 , . . . , er−1 , e} is also independent then w(e) ≥ w(e′r ) > w(er ),
but then GREEDY would have chosen e rather than er . So we have that {e1 , . . . , er−1 } is a maximal independent
subset of A. At the same time {e′1 , . . . , e′r } is an independent subset of A of a larger size which contradicts (C).
This concludes the inductive proof of (6.2), hence we have a contradiction with (6.1), which completes the proof.
Let S = (E, F) be a matroid. The (common) size of maximal independent subsets of a set A ⊆ E is called the
rank of A and any maximal independent subset of A is called a basis of A. A minimal dependent subset of E (that
is a dependent set whose every proper subset is independent) is called a circuit.
We will now apply Theorem 6.7 to each of the four examples of independence systems S1 − S4 in Section 6.3
and deduce whether they are matroids or not. The results are formulated as four corollaries of Theorem 6.7; we
will use the notation of Section 6.3.
Corollary 6.8 S1 = (E, A) is a matroid for every G = (V, E). Hence GREEDY will correctly solve every instance
of the maximum weight forest problem and, consequently, also every minimal spanning tree problem.
Proof. For the proof we use the Rank Proposition: Let A ⊆ E be fixed and take F ⊆ A. Then F is an independent
subset of A if and only if (V, F ) is an acyclic subgraph of (V, A), which we will denote by GA for ease of reference.
Hence F is a maximal independent subset of A if and only if (V, F ) is a maximal acyclic subgraph of GA . But by
Corollary 6.5 each such graph has m − p(GA ) arcs, thus the Rank Proposition holds, and the statement now follows.
A matroid defined in the same way as S1 for a graph G is called a graphic matroid associated with G and denoted
by Gr(G).
Corollary 6.9 S2 = (E, M) is not a matroid in general. Hence GREEDY is not a correct solution method for the
maximal weighted matching problem.
Proof. Let G be the graph of Figure 6.18 and A = {ab, ad, bc}. Then both {ad, bc} and {ab} are maximal
independent subsets of A, they are of different sizes, hence by the Rank Proposition GREEDY does not always
solve the maximal weighted matching problem correctly.
Corollary 6.10 S3 = (E, I) is not a matroid in general. Hence GREEDY is not a correct solution method for the
assignment problem.
Proof. Consider the instance of S3 with m = 2 and A = {(1, 1), (1, 2), (2, 1)}. Then both {(1, 2), (2, 1)} and {(1, 1)} are maximal independent subsets of A.
Since they are of different sizes by the Rank Proposition GREEDY does not always solve AP correctly.
Corollary 6.11 S4 = (C, L) is a matroid for every matrix A.
Proof. It is proved in linear algebra that the size of any maximal linearly independent system of columns of a (fixed)
matrix is equal to the rank of this matrix. Hence the Rank Proposition holds and S4 therefore is a matroid.
Recall that a field K is a set with two operations + and · satisfying, for all a, b, c ∈ K, axioms including the following:
a + b = b + a                         a · b = b · a
a + (b + c) = (a + b) + c             a · (b · c) = (a · b) · c
a · (b + c) = a · b + a · c
(∃0 ∈ K) a + 0 = a                    (∃1 ∈ K) a · 1 = a
These axioms resemble the well known properties of real numbers. Many concepts for fields are developed in a
way that is close or identical with real numbers. These include vectors, matrices and linear independence.
Let A be an m × n matrix over K. By col(A) we denote the set of column indices of A and by L(A) the family of all
sets of column indices whose corresponding columns are linearly independent over K.
Figure 6.19: Overview of matroids
Then for the same reasons as for S4 the pair (col(A), L(A)) is a matroid; notation M(A). Clearly, S4 is M(A)
for K = R.
Independence systems S = (E, F) and S ′ = (E ′ , F ′ ) are called isomorphic (notation S ≈ S ′ ) if a bijection
ψ : E → E ′ exists such that
F ∈ F ⇔ ψ(F ) ∈ F ′ for every F ⊆ E.
It is known [1] that not every matric matroid is graphic and not every matroid is matric. We summarise these
results in Figure 6.19 which provides an overview of the concepts discussed in this chapter.
Part III
Hard problems
In combinatorial optimisation, similarly as in other parts of mathematics, there are problems that always seemed
to be harder. This intuitive perception may be tricky: there are numerous examples of problems in mathematics
that people were unable to solve for many years or even centuries but then one day, suddenly, someone came with a
solution. However, there is a theory that enables us to conclude, for a wide class of problems, that this is extremely
unlikely to ever happen. In 1971 S. A. Cook laid the foundations of this theory, called computational complexity.
A key concept here is that of N P -completeness.
In this chapter we first explain the concept of an N P -complete problem. N P -complete problems are extremely
unlikely to be efficiently solvable exactly and therefore in the second chapter of this part we discuss approximation
methods.
Chapter 7
N P - completeness
When an algorithm is applied to the instances of a problem, two basic observations can be made:
(a) typically the bigger the size of an instance, the more operations are needed or, in other words, the number of
operations is a function of the size of an instance (we also call this “the size of input”);
(b) it may happen that even if two instances have the same size the number of operations is different because
of the internal structure of instances (for example if a max-flow problem is solved for two networks with the
same number of nodes and arcs, the number of operations may still be very different because of the structure
of the underlying graphs).
In the case (b) a usual (but not exclusive) convention is to consider the worst possible case, that is the greatest
number of operations over all instances of the same size. The number of operations that an algorithm A needs to
find a solution in the worst case, expressed as a function, say f (L), of the size of input L is called the computational
complexity of A. Shortly below we will give a more formal definition.
For illustration consider the very simple task of calculating the scalar product of two vectors from the definition.
That is if u = (u1 , . . . , un )⊤ and v = (v1 , . . . , vn )⊤ then the scalar product is
u⊤ v = u1 v1 + · · · + un vn .
The size of input is L = n + n = 2n, the number of operations is n (multiplications) + (n − 1) additions
= 2n − 1 = L − 1 = f (L). Thus here f (L) is a linear function and we would therefore say that this (trivial)
algorithm is “of linear complexity”, or just “linear”.
As another simple example consider multiplication of two real n × n matrices, say A = (aij ) and B = (bij ) from
the definition. Thus L = 2n^2 and we need to calculate
C = (cij ) = A·B = ( Σ_{k=1}^{n} aik bkj ).
Each of the n^2 entries of C is the scalar product of two n-dimensional vectors, hence the total number of
operations (using the example above) is (recall that in these discussions L ≥ 1):
f (L) = n^2 (2n − 1) = 2n^3 − n^2 = (1/√2) L^{3/2} − (1/2) L ≤ L^2 .
We may say that the computational complexity is “less than quadratic”.
More formally, given a problem Q and an algorithm A for solving Q, the computational complexity of A (also
denoted as cc(A)) is the function
cc(A)(L) = max{op(A, I) : I ∈ inst(Q), |I| = L},
where op(A, I) stands for the number of operations that algorithm A needs to solve instance I of the problem
Q, inst(Q) is the class of all instances of Q and |I| is the size of I. If there is no risk of confusion, we will say
“complexity” rather than “computational complexity”.
We should also give a more precise definition of the size of an instance. In fact, in the theory of computational
complexity there are three main ways of defining the size of an instance:
(a) Size L1 is the number of parameters. Thus if an input is given by one number (integer or real) then L1 = 1.
If it is an m × n matrix A = (aij ) then L1 = mn. Note that this concept of the size was used in the two
introductory examples above, so the symbol L used there was actually L1 .
(b) Size L2 is the number of all individual symbols used to formulate an instance, that is typically the number
of digits and symbols such as decimal points or minus signs. Since the number of decimal points and signs is
normally negligible, compared to the number of digits, we will only consider the number of digits. Note that
if the input is one integer such as 325 then the size of input in the decimal system is L2 = 3 = 1 + ⌊log10 325⌋,
in general the size of an integer k ̸= 0 is L2 = 1 + ⌊log2 |k|⌋, where the base 2 is used since computers work
in the binary; if the input is 0 then obviously L2 = 1. If the input is given by an m × n integer matrix, say
A = (aij ), then
L2 = Σ_{aij ≠0} (1 + ⌊log2 |aij |⌋) + Σ_{aij =0} 1 = mn + Σ_{aij ≠0} ⌊log2 |aij |⌋ .
Obviously, L2 is applicable only if the number of symbols of any instance is finite: usually we restrict ourselves
to instances with integer entries.
(c) Size L3 is the sum of the moduli of all parameters. Thus if the input is given by one integer k then L3 = |k|.
If it is an m × n integer matrix A = (aij ) then L3 = Σ_{i=1}^{m} Σ_{j=1}^{n} |aij |.
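The three size measures are easy to compute; here is a small illustrative sketch (not from the notes):

```python
from math import floor, log2

def input_sizes(matrix):
    """L1, L2 (number of binary digits) and L3 for an integer matrix."""
    entries = [a for row in matrix for a in row]
    L1 = len(entries)
    L2 = sum(1 if a == 0 else 1 + floor(log2(abs(a))) for a in entries)
    L3 = sum(abs(a) for a in entries)
    return L1, L2, L3

print(input_sizes([[325, 0], [-7, 1]]))   # (4, 14, 333)
```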
Clearly, L1 ≤ L2 . We also have L2 ≤ L3 because 1 + ⌊log2 x⌋ ≤ x for any positive real number x. Therefore if
the computational complexity is expressed in terms of L1 using a function f1 , in terms of L2 using a function f2
and in terms of L3 using a function f3 then
f3 (L) ≤ f2 (L) ≤ f1 (L). (7.1)
An algorithm is called strongly polynomial (polynomial, pseudopolynomial) if there is a polynomial p(x) such
that f1 (x) ≤ p(x) (f2 (x) ≤ p(x), f3 (x) ≤ p(x)). It follows from (7.1) that if an algorithm is strongly polynomial
then it is also polynomial and if it is polynomial then it is also pseudopolynomial. In what follows we will only
discuss polynomiality (where L2 is used) but it is important to know that in many cases an algorithm is proved to
be polynomial by actually showing that it is strongly polynomial because this is usually much easier. On the other
hand, if we cannot find a polynomial algorithm for a problem then we try (at least) to find a pseudopolynomial
algorithm.
We will now explain why it is generally accepted that “efficient” means “polynomial”. The table below shows
the actual running times of algorithms depending on the computational complexity.
The figures in this table are calculated on the assumption that the computer speed is 10 GHz (that is 10^10
operations per second or, equivalently, one operation is carried out in 10^−10 seconds). So for instance the entry in
this table for size 60 and computational complexity L^2 is 60^2 operations × 10^−10 seconds = 3600 × 10^−10 seconds
= 0.36 × 10^−6 seconds = 0.36 µs.
It is clearly visible from the table that the running times are acceptable even for problems of larger sizes while
the computational complexity is polynomial (that is L, L^2 and L^4 ). However, once it is exponential it is acceptable
only for problems of very small sizes.
One may argue that computers are faster and faster and so in the future computational complexity will not
matter very much. The next table shows how much the capability of solving problems will increase in the future
assuming that the computer speed increases 1000 times, depending on the computational complexity. The column
“Size” shows an approximate size of instances of a (fixed) problem that will be solved by algorithms of various
complexities within a time limit within which today we can solve instances of this problem of size 100. For example
if the size solvable today is s = 100 and the (unknown) size in the future is s′ then for complexity L^2 we have
(s′ )^2 = 1000 s^2 and thus s′ = s √1000 ≈ 3,162.
Again, we see that there is a clear borderline between algorithms of polynomial and exponential complexities:
faster computers in the future will enable us to solve problems of substantially larger sizes than today, provided
that the complexity is polynomial, but not in the case of algorithms of exponential complexity.
One may argue that to require “polynomial” alone is not enough because for instance a complexity 10^20 would
hardly be considered more plausible than 10^−5 · 2^L . But complexities with extremely big and small coefficients
practically never occur.
The above discussion motivates us to accept the convention that “efficient” means “polynomial”.
7.2 Examples of hard problems
The discussion in the previous section justifies the convention that “efficient algorithm” means “polynomial al-
gorithm”. Consequently it is reasonable to accept that a “hard problem” is a problem for which a polynomial
algorithm “does not exist”. However, the expression “does not exist” has two possible interpretations:
a. we (people) do not know a polynomial algorithm (for a selected fixed problem) today but admit that maybe
it will be known in the future, or
b. no polynomial algorithm for the problem exists at all, and hence none will ever be known.
Obviously, we would like to make the assessment of hardness independently of the state of research, that is (if
true and possible) we would like to know not only that such an algorithm is not known today but that it will never
be known, or at least that it is extremely unlikely that this will ever happen.
We start with the formulations of a few problems for which no polynomial algorithm is known and later on we
will explain why it is extremely unlikely that such an algorithm will be known in the future for these and many
other problems (we will call such problems N P-complete).
Recall that a cycle in a graph G = (V, E) is called Hamiltonian (or a tour ) if it contains all nodes of G and G is
called Hamiltonian if it contains a Hamiltonian cycle.
Problem 7.1 (HAM – Hamiltonicity:) Given a graph G, decide whether G is Hamiltonian.
Problem 7.2 (TSP – The Travelling Salesman Problem:) Given a complete arc-weighted graph, find a tour
of minimal total length.
No polynomial method is known for TSP and neither for the digraph version of the TSP.
Remark 7.3 The TSP can be thought of in the following way: A travelling salesman wishes to visit a number
of cities and return to the starting point, in such a way that each city is visited exactly once, and the total distance
covered is as short as possible.
Problem 7.4 (ILP - Integer Linear Programming:) Given an integer matrix A and integer vectors b and c,
find an integer vector x which solves the optimisation problem
min c⊤ x
s. t. Ax = b
x ≥ 0.
Problem 7.5 (INTEGER KNAPSACK) Given integers a1 , . . . , an and K, are there integers x1 , . . . , xn ≥ 0
such that
Σ_{j=1}^{n} aj xj = K?
Problem 7.6 (0-1 KNAPSACK) Given integers a1 , . . . , an and K, is there a subset S of {1, . . . , n} such that
Σ_{j∈S} aj = K?
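For 0-1 KNAPSACK a pseudopolynomial dynamic programme is easy to write. The sketch below is illustrative (not from the notes) and assumes the aj are non-negative: it maintains the set of reachable subset sums, so its running time is polynomial in n and K (hence in L3), but not in the binary size L2.

```python
def zero_one_knapsack(a, K):
    """Is there S with sum of a_j over j in S equal to K?  (a_j assumed non-negative)"""
    reachable = {0}
    for aj in a:
        reachable |= {s + aj for s in reachable if s + aj <= K}
    return K in reachable

print(zero_one_knapsack([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5 = 9)
```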
7.3 N P -completeness
As mentioned before the concept of an N P-complete problem is a formalisation of the intuitive idea that a problem
is “hard”, not only because no polynomial algorithm is known today but it is extremely unlikely that such an
algorithm will ever be known. The question we aim to address in this section is how we know that this is the case
once a problem is given.
Recall that any optimization problem can be represented as
min{c⊤ x | x ∈ S} (7.2)
where S is the feasible set, which is normally defined by functional inequalities or equalities. This problem is
associated with the following decision problem:
Given a number k, is there an x ∈ S such that c⊤ x ≤ k? (7.3)
Clearly, (7.2) can be easily solved if and only if (7.3) is easy to decide for any k.
A problem whose answer is “yes” or “no” is called a decision problem. So (7.3) is a decision problem.
Definition 7.7 (Complexity class P) A decision problem is in the complexity class P (the class of polynomial-
time solvable problems) if there exists an algorithm for which there exist constants C, p > 0 such that
a) the algorithm gives the correct decision ”Yes/No” for all instances I of the problem, and
b) the number of bit operations and comparisons required to arrive at this answer is bounded by C(L(I))p , where
L(I) denotes the size of the instance I.
Definition 7.8 (Complexity class N P) A decision problem is in the complexity class N P (the class of polynomial-
time checkable problems) if there exist constants C, p > 0 such that every “yes”-instance I can be verified as such
using at most C(L(I))p bit operations and comparisons.
Proposition 7.9 P ⊆ N P.
Proof. For problems in P, both “Yes” and “No”-instances can be correctly decided in polynomial time.
Note that the letters “N P” stand for “non-deterministically polynomial”. The class N P can be viewed as the
class of all “reasonable” problems (including those for which a polynomial solution method is not known and those
which are polynomially solvable). Let us point out that a decision problem lies in N P if once a solution is known
for an arbitrary instance, it can be verified by a polynomial algorithm (but it is not required that a solution can
be found by a polynomial algorithm).
For instance if a solution to HAM is known it can be easily verified (that is it is easy to verify that a sequence
is a cycle in the graph, passing through all nodes of this graph) although it is in general not easy to find such a
cycle; but it is not clear whether NON-HAM is in N P, where NON-HAM is the question: given a graph is it non-
Hamiltonian? Because what and how would you use to verify in an efficient way that a graph is non-Hamiltonian
even if you know that?
The five problems presented in the previous section, for which existence of polynomial algorithms is not known,
are in N P .
It is a long-standing open conjecture to prove that P ̸= N P. Equality would of course mean that all reasonable
problems are polynomially solvable. The reason why this conjecture is believed to be true is that the class N P
includes some notoriously hard problems. So far, the best one can do to prove that a polynomial-time checkable
problem is intractable is to show that it is “hardest” among all problems in N P. The conjecture P ≠ N P remains
one of the greatest unsolved conjectures of mathematics.
Definition 7.10 (N P-hard) A decision problem P is N P-hard if for all problems Q ∈ N P there exists a mapping
θ that transforms any instance I of Q into an instance θ(I) of P , such that for some C, p > 0,
a) the number of bit operations and comparisons required for computing θ(I) is bounded by C(L(I))p , and
b) the output (i.e. “yes/no”) for the instance I is the same as the output of θ(I).
This is a sensible notion of hardness because if an N P-hard problem were polynomial-time solvable then it would be the
case that N P ⊆ P. Thus, if a problem Q is N P-hard then Q is extremely unlikely to be polynomially solvable
because it would imply that all reasonable problems are polynomially solvable.
A decision problem is called N P-complete if it is N P-hard and belongs to N P.
The class of N P-complete problems will be denoted by N PC. Figure 7.1 provides an overview of the defined
concepts.
Methodology for proving that a problem is N P-complete is beyond the scope of this course; recommended
reading for this is [1] and [2]. Important examples of N P-complete problems are all problems of the previous
section, that is HAM, TSP, ILP, KNAPSACK and 0-1 KNAPSACK. Examples of problems in P are LP, MFP, SPP
and AP.
Interestingly, some important special cases of the above mentioned N P-complete problems are also N P-complete. We
discuss this now because it will be important in the next chapter.
The TSP is called metric (notation ∆TSP) if the direct-distances matrix A = (aij ) satisfies the triangle inequal-
ity, that is
aik + akj ≥ aij (7.4)
for all i, j, k.
The graph of Figure 7.2 is an instance of the ∆TSP (verification of all inequalities is left as an exercise) whereas
that of Figure 7.3 is not (for instance 1 + 2 < 7).
In what follows the word “distance” may have various practical meanings: geographical distance, time, cost, . . .
The TSP is called Euclidean (notation ETSP) if the direct-distances matrix A = (aij ) is the matrix of Euclidean
distances between points in a plane with the Cartesian coordinate system, that is R2 . For instance, given four
points z1 , z2 , z3 , z4 in the x-y plane we may create a direct-distances matrix A = (aij ), where
aij = d(zi , zj )
[Figure 7.2: a ∆TSP instance on the nodes a, b, c, d]
[Figure 7.3: an instance on the nodes a, b, c, d violating the triangle inequality (e.g. 1 + 2 < 7)]
and for a = (x1 , y1 ), b = (x2 , y2 ) the symbol d(a, b) stands for the Euclidean distance, that is
d(a, b) = √( (x1 − x2 )^2 + (y1 − y2 )^2 ).
It follows that every ETSP is ∆TSP. Both ETSP and ∆TSP are proved to be N P-complete [2].
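The implication is easy to check numerically: build the direct-distances matrix of a few points in the plane and verify (7.4). An illustrative sketch (not from the notes):

```python
from math import dist   # Euclidean distance between two points (Python 3.8+)

def direct_distances(points):
    """Direct-distances matrix of an ETSP instance."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def satisfies_triangle_inequality(a):
    """Check (7.4) for all triples i, j, k."""
    n = len(a)
    return all(a[i][k] + a[k][j] >= a[i][j]
               for i in range(n) for j in range(n) for k in range(n))

A = direct_distances([(0, 0), (3, 0), (3, 4), (0, 4)])
print(satisfies_triangle_inequality(A))   # True: every ETSP instance is a ∆TSP instance
```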
We finish this chapter with an important remark. The hardness of two problems may be very different even if
they are very similar. The table below displays pairs of closely formulated problems of profoundly different hardness:
In P        N P-complete
LP          ILP
AP          3AP
AP          TSP
MINCUT      MAXCUT
SEPP+       LEPP+
In this table SEPP+ (LEPP+) stands for the shortest (longest) elementary path problem in a digraph with
non-negative weights of arcs and 3AP for the 3-dimensional assignment problem.
Chapter 8
The aim of this chapter is to present approximation algorithms for N P- complete problems. We will restrict
our attention to ∆TSP where feasible solutions are tours (Hamiltonian cycles) in graphs. So the algorithms will
produce tours in given graphs. The construction is based on a related concept of an Eulerian walk in a multigraph
(see Section 8.2) whose construction is easy. A key feature of the theory presented in this chapter is that we will
provide bounds on the relative error (see Section 8.1) of the considered algorithms. This makes them different from
heuristics. We therefore start with the definition and explanation of the relative error.
[Figure: three instances I1 , I2 , I3 and corresponding solutions σ1 , σ2 , σ3 , illustrating the relative error]
[Figure 8.4: a multigraph on the nodes v1 , . . . , v4 with arcs f1 , . . . , f8 ]
If v1 = vk then the walk is called closed. An Eulerian walk in a multigraph is a walk satisfying the following
properties: it is closed and every arc of the multigraph appears in it exactly once.
In the multigraph of Figure 8.4 the sequence (v1 , f8 , v4 ) is not a walk, (v1 , f1 , v2 , f2 , v1 , f3 , v2 ) is a walk but
not closed, (v1 , f1 , v2 , f2 , v1 , f3 , v2 , f4 , v1 ) is closed but not Eulerian and
v = (v1 , f1 , v2 , f2 , v1 , f3 , v2 , f4 , v1 , f6 , v4 , f8 , v2 , f7 , v3 , f5 , v1 )
is Eulerian.
A multigraph G is called Eulerian if there is an Eulerian walk in G. Probably the first theorem of graph theory
(motivated by the famous problem of Königsberg bridges) is the following:
Theorem 8.1 A multigraph G is Eulerian if and only if (1) G is connected and (2) the degree of every node of G is even.
We will not prove this theorem here but it is noted that the necessity of (1) is trivial and that of (2) almost
immediate since an Eulerian walk enters and leaves every node every time by different arcs (and uses all arcs),
implying that there must be an even number of arcs incident with each node. The converse implication is slightly
less trivial and is essentially an algorithm for finding an Eulerian walk. We will need to construct Eulerian walks
in examples but they will all be very simple, solvable by inspection and so this part of the proof is skipped.
Thus we assume in what follows that once the conditions of Theorem 8.1 are satisfied an Eulerian walk can be
found. The aim of this section is to devise algorithms which produce tours (as close as possible to optimal tours).
These algorithms construct tours from Eulerian walks and we will now explain how this is done.
Let A = (aij ) be a symmetric n × n matrix with zero diagonal. An Eulerian spanning graph of A (ESG) is
any Eulerian multigraph G = (V, E) with V = {1, . . . , n}. The cost of G is
c(G) = Σ_{(i,j)∈E} aij ,
[Figure 8.5: multigraphs G1 , G2 , G3 , G4 on the node set {1, 2, 3, 4}]
[Figure 8.6: the complete graph on the nodes v1 , . . . , v4 with arc weights 12, 9, 8, 15, 13, 7]
where the summation is carried out for all (possibly multiple) arcs. Hence c(G) is equal to the cost of any Eulerian
walk in G. For example, if
A = [ 0 4 3 ]
    [ 4 0 1 ]
    [ 3 1 0 ]
then each of the multigraphs G1 , G2 and G3 in Figure 8.5 is an ESG of A. Note that G4 is not, since it is not
Eulerian. We calculate easily c(G1 ) = 8, c(G2 ) = 14, c(G3 ) = 10.
Given a ∆TSP in the form of a direct-distances matrix A we will consider various ESGs for A. Once we have
selected an ESG we will construct a tour, called the embedded tour, from any Eulerian walk w as follows: Start
with the first node in w and then, in the order of appearance, repeatedly take the first node in w that has not been
taken yet; at the end add the first node.
For example consider the ∆TSP for the graph of Figure 8.6. We observe that in this graph there are three tours
of lengths 41, 42 and 45. So the optimal solution to the ∆TSP has length 41. The corresponding direct-distances
matrix is
A = [  0  12   9   8 ]
    [ 12   0  15  13 ]
    [  9  15   0   7 ]
    [  8  13   7   0 ]
One possible ESG is displayed in Figure 8.7. It has total length 78. One of many Eulerian walks in this
multigraph (each must be of length 78) is
v = (v1 , e1 , v2 , e2 , v1 , e7 , v4 , e6 , v3 , e3 , v2 , e4 , v3 , e5 , v1 ).
with the embedded tour τ = (v1 , v2 , v4 , v3 , v1 ). We see that c(τ ) = 41, which is optimal as we already know, but
this is rather because we were lucky with the selection of the ESG and w. To see this consider another ESG, of
Figure 8.8, its cost is 56 and for the Eulerian walk (writing node indices only) w = 1213431 the embedded tour is
12341, of cost 42.
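Extracting the embedded tour from an Eulerian walk is a one-pass operation. A short illustrative sketch (not from the notes), applied to the walk w = 1213431 used above:

```python
def embedded_tour(walk_nodes):
    """Keep the first occurrence of every node, in order of appearance,
    and return to the starting node at the end."""
    tour, seen = [], set()
    for v in walk_nodes:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour + [tour[0]]

print(embedded_tour([1, 2, 1, 3, 4, 3, 1]))   # [1, 2, 3, 4, 1]
```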
[Figure 8.7: an ESG of A with arcs e1 , . . . , e7 on the nodes v1 , . . . , v4 , of total length 78]
[Figure 8.8: another ESG of A, consisting of the arcs v1 v2 , v1 v3 , v3 v4 , each taken twice]
Theorem 8.2 Let (Kn , (aij )) be an instance of ∆TSP with associated symmetric direct-distances matrix A = (aij ).
If G = (V, E) is an Eulerian spanning graph for A = (aij ) then every tour in Kn embedded in an Eulerian walk in
G has length not exceeding the cost of G.
Proof. Consider an instance I of the ∆TSP with associated symmetric direct-distances matrix A = (aij ), recall
that A is symmetric.
First we prove the generalised triangle inequality, that is
a_{j1 js} ≤ a_{j1 j2} + a_{j2 j3} + · · · + a_{j_{s−1} js} , (8.1)
for any j1 , j2 , . . . , js in 1, . . . , n, for s ≥ 3. To do this, start with the RHS and repeatedly use the triangle
inequality (7.4) starting from the first two terms:
a_{j1 j2} + a_{j2 j3} + a_{j3 j4} + · · · + a_{j_{s−1} js} ≥ a_{j1 j3} + a_{j3 j4} + · · · + a_{j_{s−1} js} ≥ · · · ≥ a_{j1 j_{s−1}} + a_{j_{s−1} js} ≥ a_{j1 js} .
Now let w be an Eulerian walk in G and write it as
w = (i1 , α1 , i2 , α2 , i3 , α3 , . . . , in , αn , i1 ),
where (ir , αr , ir+1(mod n) ) is a walk (r = 1, . . . , n) and τ = (i1 , i2 , . . . , in , i1 ) is the embedded tour, where repeated
ij are not listed. Hence, using the generalised triangle inequality (8.1), the length a_{ir ir+1} of each arc of τ does
not exceed the total direct-distance length of the sub-walk (ir , αr , ir+1(mod n) ); summing over r = 1, . . . , n we obtain
c(τ ) ≤ c(G), which completes the proof.
The first approximation algorithm, which we will call TREE, is the following:
(1) Find a minimal spanning tree T in (Kn , (aij )) (note: GREEDY may be used).
(2) Let G be the multigraph obtained from T by taking every arc of T twice.
(3) Find an Eulerian walk in G and output the tour τ embedded in it.
Remark 8.4 The degree of every node in G is twice its degree in T , hence even. G is also connected as it contains a
spanning tree (namely T ) as a sub-graph. Therefore by Theorem 8.1 G is an ESG of A.
Theorem 8.5 For every instance of the ∆TSP the tour τ found by TREE and an optimal tour τ ∗ satisfy c(τ ) ≤ 2c(τ ∗ ),
that is, the relative error of TREE is at most 1.
Proof. Let T be the minimal spanning tree, G the multigraph and τ the tour, all found by TREE. Let τ ∗ be an
optimal tour and T ∗ any tree obtained from τ ∗ by removing one arc. Hence we have
c(τ ) ≤ c(G)
= 2c(T )
≤ 2c(T ∗ )
≤ 2c(τ ∗ )
where the first relation is by Theorem 8.2, the second follows from the construction of T by TREE, the third by
minimality of T and the fourth because A ≥ 0. By re-arranging we obtain
(c(τ ) − c(τ ∗ )) / c(τ ∗ ) ≤ 1
which completes the proof.
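The whole algorithm fits comfortably in a screenful of Python. The sketch below is illustrative and is not the notes' formal statement of TREE: it finds a minimal spanning tree by Prim's method and then outputs the nodes in the order of a depth-first traversal of the tree, which is exactly the tour embedded in an Eulerian walk of the doubled tree.

```python
from math import inf

def tree_tour(a):
    """Double-tree approximation for the ∆TSP with a symmetric matrix a;
    the returned tour has length at most twice the optimum."""
    n = len(a)
    # Prim's method for a minimal spanning tree rooted at node 0
    in_tree, parent, key = [False] * n, [None] * n, [inf] * n
    key[0] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: key[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and a[u][v] < key[v]:
                key[v], parent[v] = a[u][v], u
    # Doubling every arc of T gives an Eulerian multigraph; a preorder traversal
    # of T lists the nodes in the order of the embedded tour.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]

A = [[0, 12, 9, 8], [12, 0, 15, 13], [9, 15, 0, 7], [8, 13, 7, 0]]
print(tree_tour(A))   # [0, 3, 2, 1, 0], a tour of length 42 for the example above
```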
The next lemma is probably well known but is fully included here because it is essential for the second algorithm
(see below) and has a simple proof.
Lemma 8.6 In any graph the number of nodes of odd degree is even.
Proof. The sum of the degrees of all nodes equals twice the number of arcs, because every arc is counted in the summation exactly twice. Hence the number of odd terms in the sum is even.
We are ready to formulate the final algorithm. Note that it differs from the previous algorithm only in the
second step.
(1) Find a minimal spanning tree T in (Kn , (aij )) (note: GREEDY may be used).
(2) Take the nodes of odd degrees in T and find a perfect matching M of minimal total weight in the complete
graph induced by these nodes. Let G be the multigraph with arcs from T and M .
(3) Find an Eulerian walk in G and output the tour τ embedded in it.
Remark 8.8 When the perfect matching M in step (2) is added to T , it will increase the degree of every node with
odd degree by 1 and will not affect the nodes of even degree. Therefore every node of G constructed in step (2)
will have even degree. G is also connected as it contains a spanning tree (namely T ) as a sub-graph. Therefore by
Theorem 8.1 G is an ESG of A.
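The only step that is genuinely different from TREE is the minimum-weight perfect matching in step (2). For the small instances appearing in examples it can be found by brute force, as in the illustrative sketch below (not from the notes); in general a polynomial matching algorithm, such as Edmonds' blossom algorithm, is used instead.

```python
def min_weight_perfect_matching(odd_nodes, a):
    """Brute-force minimum-weight perfect matching on an even-sized list of nodes;
    exponential running time, so only intended for very small instances."""
    if not odd_nodes:
        return 0, []
    u, rest = odd_nodes[0], odd_nodes[1:]
    best_cost, best_matching = float('inf'), None
    for k, v in enumerate(rest):
        cost, matching = min_weight_perfect_matching(rest[:k] + rest[k + 1:], a)
        if a[u][v] + cost < best_cost:
            best_cost, best_matching = a[u][v] + cost, [(u, v)] + matching
    return best_cost, best_matching
```

Adding the matched arcs to T raises every odd degree by one, which is exactly the observation of Remark 8.8.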
Theorem 8.9 For every instance of the ∆TSP the tour τ found by Christofides’ algorithm and an optimal tour τ ∗
satisfy c(τ ) ≤ (3/2) c(τ ∗ ), that is, the relative error of Christofides’ algorithm is at most 1/2.
Proof. Let T be the minimal spanning tree, M the matching, G the multigraph and τ the tour, all found by
Christofides’ algorithm. Let τ ∗ be an optimal tour and T ∗ any tree obtained from τ ∗ by removing one arc. Hence
by Theorem 8.2 we have
c(τ ) ≤ c(G) = c(T ) + c(M ). (8.2)
The inequality
c(T ) ≤ c(T ∗ ) ≤ c(τ ∗ ) (8.3)
holds for the same reasons as in the proof of Theorem 8.5. It remains to estimate c(M ). Let u1 , u2 , . . . , u2k be the
nodes of odd degree in T (their number is even by Lemma 8.6), listed in the order in which they are visited by the
optimal tour τ ∗ , and let M1 = {u1 u2 , u3 u4 , . . . , u2k−1 u2k } and M2 = {u2 u3 , u4 u5 , . . . , u2k u1 } be the two perfect
matchings on these nodes determined by this order. Since M is a perfect matching of minimal total weight on the
same node set,
2c(M ) ≤ c(M1 ) + c(M2 ). (8.4)
But by the generalized triangle inequality (8.1), see also Figure 8.9 (where the zig-zag line corresponds to τ ∗ ),
we have:
c(M1 ) + c(M2 ) ≤ c(τ ∗ ). (8.5)
Combining (8.2)–(8.5) we obtain c(τ ) ≤ c(τ ∗ ) + (1/2) c(τ ∗ ), that is, (c(τ ) − c(τ ∗ ))/c(τ ∗ ) ≤ 1/2, which completes
the proof.
Figure 8.9: To the proof of Theorem 8.9
Bibliography
[1] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity.
Dover Publications, 1998.
[2] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H.
Freeman and Company, 1979.
[4] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice
Hall, 1993.