DDRE F-65/1998
Metaheuristics - an introduction
by
Magnus Hindsberger
DANISH DEFENCE RESEARCH ESTABLISHMENT
DDRE F-65/1998
Abstract
This report is meant as an introduction for people interested in using metaheuristics for solving
combinatorial optimization problems. Four metaheuristics: Descent Strategies, Simulated Annealing, Tabu Search, and Genetic Algorithms, are covered by giving a brief introduction to how
they work, how they should be implemented, and how they can be improved in various ways.
References for further research in the area have been included.
Keywords: Combinatorial optimization, metaheuristics.
Acknowledgment
I am grateful to dr. tech. Rene Victor Valqui Vidal, Department of Mathematical Modelling,
Technical University of Denmark, and to my fellow researchers at the Danish Defence Research
Establishment for their constructive comments on earlier versions of this paper.
Contents

1 Introduction
2 Heuristics in general
  2.1 Why use heuristics?
  2.2 Properties of heuristics
3 Descent Strategies
  3.1 Implementation
  3.2 Multistart techniques
4 Simulated Annealing
  4.1 Implementation
  4.2 Proof of convergence
  4.3 Variations and improvements
5 Tabu Search
  5.1 Implementation
  5.2 Improvements
6 Genetic Algorithms
  6.1 Implementation
  6.2 Advanced operators
7 Other issues
  7.1 Making the right topology
  7.2 Designing and reporting on heuristic experiments
  7.3 Pointers for further research
Bibliography
Chapter 1
Introduction
Within many Operational Research and Management Science applications you will find the so-called
combinatorial optimization problems. Most of them share the property that they
are hard to solve with the methods available, since they belong to the NP-hard class of problems
(for a definition see Garey and Johnson [2]). NP-hard problems in general cannot be solved to
optimality in an efficient way, because the computation time grows exponentially
with the size of such optimization problems. Unfortunately, in most practical applications the
size of the problem is so large that the optimal solution cannot be found within the acceptable
or available time.
Those problems instead have to be solved using heuristic methods. A heuristic (short for heuristic
method or algorithm) has been described by Nicholson [4] as a procedure "for solving
problems by an intuitive approach in which the structure of the problem can be interpreted
and exploited intelligently to obtain a reasonable solution". In other words, a heuristic is an
approximative method where common sense and rules of thumb are used to construct and improve
possible solutions. The idea is to obtain an acceptable solution within reasonable time.
Heuristics have always been used for solving specific problems, but some 20 years ago research
turned towards the search for generic principles for making heuristics capable of solving a
large range of problems. These heuristic principles, which were given the name "metaheuristic"
by Glover in 1986, describe a heuristic superimposed upon another. A metaheuristic therefore
describes a high-level procedure for developing heuristics for solving different optimization
problems. The purpose of this report is to introduce the reader to this approach to problem
solving.
This report starts by introducing the basics of heuristics, e.g. where they are used and
which properties they should have. Then come chapters 3, 4, 5, and 6, where some of the
best known metaheuristics, Descent Strategies, Simulated Annealing, Tabu Search, and Genetic
Algorithms, are covered. The last chapter addresses some important issues to note when
designing and implementing a metaheuristic method.
In addition an extensive bibliography has been included, where references for further research
can be found.
Chapter 2
Heuristics in general
As mentioned above, the name heuristic denotes an approximative method for solving a
problem. For larger problems heuristics are generally faster than the exact methods and
often, but not necessarily every time, return an acceptable but seldom the optimal solution. This
chapter gives a short introduction to heuristics, mostly based on the article by Silver et al. [4].
Figure 2.1: Example of tree structure
To get an idea of how a heuristic works, consider the tree structure shown in figure 2.1.
It consists of 5 leaves (the 5 circles in the upper row) and 3 nodes, of which the one at the bottom
is denoted the root. The elements are connected by branches as shown, each with a weight ci.
The task is to find the route from the root to a leaf along the branches for which the total weight
(i.e. the sum of the used ci's) is highest.
One way of doing this exactly would be to write an algorithm that calculates the weight
of all the possible routes and returns the one with the highest value.
But in some situations the tree might be so big that the exact method above (or any other)
cannot be used, since the time needed to calculate the weights of all the routes exceeds
the time available to solve the problem. In such cases a heuristic solution to the problem is the
only way out. An intuitive possibility is at each node to pick the branch reaching up with the
highest weight. This particular search method, which would normally be referred to as a greedy
algorithm, is very similar to one of the Descent Strategies discussed in the next chapter. Using
this algorithm on the tree structure of figure 2.1, the optimal value 9 is found by adding the
weights of the marked branches.
If the problem is instead to minimize the weight, the branch with the smallest weight reaching
up at the visited nodes should be chosen. But in this example the algorithm then returns
the value 7, which is greater than the optimal value of 6. In other examples even the worst
possible solution might be returned. In general, though, this algorithm will return a good (i.e.
satisfying), but not necessarily optimal, solution to a problem.
Another heuristic algorithm that could be used on the example is to pick three routes at
random, calculate the total weight of each, and return the one with the highest value (or lowest if
minimizing). This algorithm too would generally return a good solution.
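To make the two heuristics concrete, a small Python sketch is given below. It is an illustration only, not code from the report, and the tree with its weights is a made-up example rather than the tree of figure 2.1.

    import random

    # Hypothetical tree: node -> list of (child, branch weight). Not figure 2.1.
    tree = {"root": [("n1", 4), ("n2", 3)],
            "n1": [("a", 4), ("b", 1)],
            "n2": [("c", 3), ("d", 6)]}

    def greedy_path(tree, root):
        # At every node, follow the branch reaching up with the highest weight.
        total, node = 0, root
        while node in tree:                       # stop when a leaf is reached
            node, weight = max(tree[node], key=lambda branch: branch[1])
            total += weight
        return total, node

    def random_route(tree, root):
        # Follow randomly chosen branches from the root to a leaf.
        total, node = 0, root
        while node in tree:
            node, weight = random.choice(tree[node])
            total += weight
        return total, node

    print(greedy_path(tree, "root"))     # (8, 'a'); the optimum is root-n2-d, weight 9
    print(max(random_route(tree, "root")[0] for _ in range(3)))  # best of 3 random routes

Note that in this made-up tree the greedy choice of the heavier first branch misses the optimal route, illustrating why such heuristics return good but not necessarily optimal solutions.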
2.1 Why use heuristics?
Some reasons to use a heuristic method have already been mentioned, but not all. To sum up,
the most important cases where heuristics should be used are shown below:
- Problems for which exact algorithms are still unknown. In those cases
heuristic methods are the only way out.
- Problems where the computation time of the exact algorithm is too long. Many
combinatorial optimization problems belong to the NP-hard class, where the computation
time grows exponentially with the size of the problem. The size could for instance be
the number of nodes in a tree, the number of towns in a "Traveling Salesman" problem, etc.
So for large NP-hard problems, exact algorithms do not exist in practice, due to the long
computation times.
- Exact algorithms which initially require a feasible solution and subsequently try to
improve it. These algorithms will generally terminate faster if a good initial solution
is given, and such a solution can be determined by heuristic methods. An example
is the "Northwest Corner principle", which is used for finding a good initial solution to
the classical transportation problem.
- Branch and Bound methods, which require good upper/lower bounds for each
node. Heuristic methods may be used to determine these bounds.
Though heuristics can be useful in many cases, it has to be remembered that heuristic methods
are only a supplement to the exact algorithms. Many practical problems require
exact solutions and not just solutions close to the global optimum. And a heuristic algorithm is
not necessarily easy to develop, as many have experienced.
2.2 Properties of heuristics
When developing a heuristic algorithm, the following should be kept in mind. Not all
heuristics are of the same quality. In general the quality depends upon the following properties,
which a heuristic should possess:
- return good solutions with a high probability.
- similarly, return poor (i.e. not satisfying) solutions with a small probability.
- fast termination, at least faster than any known exact algorithm used on the same problem.
- the heuristic should be easy to understand. Many people prefer an algorithm "only"
giving a satisfying solution, whose principles they understand, to a
complex algorithm that magically returns the exact solution without any possibility of
understanding it.
A heuristic is designed with a specific problem in mind, and can for that reason seldom
be used on other problems or when new constraints are added. But as mentioned, much focus is
now on a group of heuristic principles, the metaheuristics, which can provide guidance on how
to implement heuristics for solving many different problems.
Hansen [5] characterized metaheuristics as either continuous local search, multistart local search,
or recombination-based.
In the local search based techniques, a solution is repeatedly replaced by another very similar
solution, which is said to belong to a defined neighbourhood of the first solution. These
methods are therefore also described as neighbourhood based. The continuous local search methods stop
when the time available has been used, while the multistart methods stop when some criteria
concerning the solution have been met, starting over from a brand new solution a number of
times. Both variants return the best solution visited during the execution of the algorithm.
The recombination based methods combine two or more solutions into new solutions, sharing
some characteristics with both (or all) of the originals. This is done using a defined generation
mechanism, which should favor solutions with the best characteristics (i.e. those which improve
the objective function the most).
The metaheuristics do not work on the solution space of the specific problems but on a topology
of those, since some kind of ordering of the solutions is needed.
The topology of the solution space is defined by the chosen neighbourhood or generation
mechanism. For the problem to be solved, the user must find these possible transformations
between solutions. A topology can usually be defined in many ways, and some are better suited
than others. The metaheuristics, how they work, and how a good topology of the solution space
should be constructed will be described in the following chapters.
Since the topologies cannot always take advantage of the structure of the problem as much
as problem specific heuristics can, you can expect the computation time or the quality of the
solution of a metaheuristic method to be inferior to that of a problem specific heuristic. So
why should anyone want to use a metaheuristic approach, if he can make a better problem
specific heuristic? According to Pirlot [6], the metaheuristics have some advantages over the
problem specific techniques in terms of transparency, ease of implementation, suitability
for experimentation, always producing usable solutions, and flexibility when, for instance, more
constraints are added to the problem.
This is why much research is done on metaheuristics and why their use is becoming more and
more common.
Chapter 3
Descent Strategies
The Descent Strategies (DS) are the simplest neighbourhood based methods. DS are iterative
algorithms replacing the current solution only with solutions which improve the objective
function value of the problem. This is done until a local optimum has been found. The DS
are important since they form the basis for the other neighbourhood based metaheuristics. This
chapter should give the reader a clear comprehension of how the strategies work.
Look at a combinatorial optimization problem with a well-defined solution space S. Each of the
possible solutions x ∈ S, x = (x1, x2, ..., xn), can be evaluated by an objective function (or cost
function) denoted C(x). In addition each solution also has a defined neighbourhood, denoted
N(x), which could be the result of adjusting each xi one by one. In the following it is assumed,
unless otherwise stated, that the objective function value should be minimized.

Graphically, such a problem can be visualized as trying to find the lowest point in a complex and
often highly contoured landscape. An optimization algorithm can thus be viewed as an explorer
wandering through valleys and across hilltops searching for the topologically lowest point. The
grid location where the explorer is can be likened to the current solution x, and the topological
height of this location to the result of the evaluation function, C(x). The neighbourhood N(x)
can be defined as the grid locations found at a one meter distance from x in each of the four
corners of the earth respectively.
Figure 3.1: The "landscape" of an unknown function, defined by the topology of S
Staying in the explorer analogy, two Descent Strategies will be introduced below:
1. the explorer picks randomly one of the four neighbouring locations and evaluates the height;
if the height is lower than the one at his current position, he walks there and continues
from this new location, picking one of the new neighbours randomly. Otherwise he remains
at his old location, choosing a neighbour from the neighbourhood again. If he has not
changed location for a number of iterations, he is possibly trapped in one of the local
extremes (which might be the global one), and the algorithm is stopped. This strategy is
called Random Search.
2. the explorer evaluates the height of each of the four neighbours. He walks to the best (i.e.
lowest) of them and continues from there, as long as it is located lower than the position
he came from. Otherwise he is in one of the local extremes, and the algorithm is stopped.
This strategy, which was sketched in the previous chapter, is called Steepest Descent (or
in case of maximizing, Steepest Ascent). This type of algorithm is often called a Greedy
Algorithm.
As can be noted, the explorer in a DS algorithm only accepts downhill moves. This severely
decreases the probability of finding the globally optimal solution, as seen in figure 3.1. The explorer
is standing at point A, with (in this example) only two neighbours, B and C. He will either pick
one of them randomly and evaluate its height, or he will evaluate both B and C. But he
will never choose C as the next current solution, since it is located higher than A. Instead he
will walk downhill toward B and eventually reach the local minimum point D, thus never
coming to the global minimum in E. But by rerunning the algorithm several times with different
start locations, the chance of finding the optimal, or at least a better, solution is improved
considerably. Using DS in this way will be described later.
The DS algorithms are useful, since they are both simple and generally applicable, as long as
a topology defining the neighbourhood and an evaluation function can be constructed for the
problem. A pseudo-code of the Greedy Algorithm is shown in figure 3.2.
procedure Greedy Algorithm
begin
  choose initial solution, x ∈ S
  repeat
    choose neighbour, y ∈ N(x)
    ΔC := C(y) - C(x)
    if ΔC ≤ 0 then
      x := y
      C := C(y)
    endif
  until stop
end

Figure 3.2: Pseudo-code example of the Greedy Algorithm
3.1 Implementation
Three things are of importance when implementing local search based methods: the definition
of the neighbourhood, N(x); the objective function, C(x), used to evaluate the neighbours; and the
stopping criterion. How they can be implemented is shown in Example 3.1 below.
Example 3.1 (The Traveling Salesman Problem) The famous Traveling Salesman Problem, TSP,
involves the design of a minimum cost path for a salesman who has to visit a
number of cities. Each city must be visited exactly once and the path has to end in the same city
as it started.
In this case the problem has 6 cities numbered 0 through 5, where the salesman has to start from
(and thus end in) city no. 0.
A solution can be expressed as:

  City no.         0  1  2  3  4  5
  Succession no.   0  1  3* 2  4* 5

where the Succession number denotes the number in the succession at which the city with that
City number is visited. E.g. in the solution above, city 2 is visited as number 3, when the salesman
departs from city 0, as illustrated in solution A in figure 3.3.
Figure 3.3: Possible solutions (A, B, and C) to the Traveling Salesman Problem
The neighbourhood:
A neighbourhood of the TSP can be defined in several ways:

- as moving one city in the sequence to another place in the sequence, e.g. city 1 is visited
last instead of first and the other cities are thus visited one "step" earlier.
- as an exchange of two cities in the succession they are visited in, for instance the two marked
with a *, resulting in solution B of figure 3.3, where city 2 is now visited 4th and
city 4 is visited 3rd. This kind of neighbourhood is traditionally denoted 2-OPT in the
literature, sometimes with the restriction of not exchanging two cities next to each other
in the succession.

In the rest of the example the neighbourhood is the latter of the two mentioned, with no
restriction.
Note that in this example the column for city 0 must be unchanged, since the city to start from
is fixed. The solution space S is then given by the number of permutations of cities 1 through 5,
resulting in 5 · 4 · 3 · 2 · 1 = 5! = 120 possible solutions.
The number of neighbours of each solution is 4 · 3 · 2 · 1 = 4! = 24. For a larger number of cities,
n, the number of neighbours of each solution, corresponding to (n-1)!, would be astronomical,
and the Steepest Descent strategy unusable.
Objective cost: If the cost of traveling between two cities is assumed to be proportional to the
distance, the cost of each solution can be calculated from the distance chart below.
       1    2    3    4    5
  0   2.0  2.0  3.5  3.5  4.0
  1        3.5  2.0  4.0  3.5
  2             4.0  2.0  3.5
  3                  3.5  2.0
  4                       2.0

Figure 3.4: The distances between the cities in the TSP
Assuming the current solution is A, the objective cost, C(A), of this solution is found as:

  C(A) = d(0,1) + d(1,3) + d(3,2) + d(2,4) + d(4,5) + d(5,0) = 16

where d(0,1) denotes the distance between city 0 and city 1. Now a random solution, B, from
the defined neighbourhood is picked. C(B) is calculated as:

  C(B) = d(0,1) + d(1,3) + d(3,4) + d(4,2) + d(2,5) + d(5,0) = 17

Since C(B) > C(A) the move to B is rejected and the search from A is continued. Another
random neighbour is picked from the neighbourhood, this time C from figure 3.3, obtained by
exchanging the numbers in the succession city 2 and 5 are visited. Evaluating the objective
function as before, C(C) is found as:

  C(C) = d(0,1) + d(1,3) + d(3,5) + d(5,4) + d(4,2) + d(2,0) = 12

Now a better solution is found and C can be assigned as the current solution, so in the next
iteration the neighbourhood of this is searched.
Stopping criterion: It is chosen to continue as above until 20 new neighbours in a row have
been rejected, since it is then assumed that a local minimum point has been reached. Other
stopping criteria will be discussed in the next chapters.
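As an illustration (not part of the original example), the descent search just described could be sketched in Python as follows; the distance chart of figure 3.4 is stored in the matrix D, and solution A is used as the initial solution.

    import random

    # Distances of figure 3.4 (symmetric): D[i][j] = distance between city i and j.
    D = [[0.0, 2.0, 2.0, 3.5, 3.5, 4.0],
         [2.0, 0.0, 3.5, 2.0, 4.0, 3.5],
         [2.0, 3.5, 0.0, 4.0, 2.0, 3.5],
         [3.5, 2.0, 4.0, 0.0, 3.5, 2.0],
         [3.5, 4.0, 2.0, 3.5, 0.0, 2.0],
         [4.0, 3.5, 2.0, 2.0, 2.0, 0.0]]

    def cost(tour):
        # Length of the closed tour; the salesman returns to the start city.
        return sum(D[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

    def random_neighbour(tour):
        # Exchange two cities in the succession; city 0 stays fixed in front.
        i, j = random.sample(range(1, len(tour)), 2)
        y = tour[:]
        y[i], y[j] = y[j], y[i]
        return y

    def greedy(tour, max_rejects=20):
        # Accept only non-worsening moves; stop after 20 rejections in a row,
        # the stopping criterion chosen above.
        rejects = 0
        while rejects < max_rejects:
            y = random_neighbour(tour)
            if cost(y) <= cost(tour):
                tour, rejects = y, 0
            else:
                rejects += 1
        return tour

    tour = [0, 1, 3, 2, 4, 5]     # solution A, listing the cities in visiting order
    print(cost(tour))             # 16, as computed above
    best = greedy(tour)
    print(best, cost(best))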
Whether the Greedy Algorithm or Steepest Descent should be used depends upon the problem.
The Greedy Algorithm requires C(x) to be calculated only once in each iteration, but
in general needs more iterations than Steepest Descent, since the latter goes more directly for
the "closest" minimum point. If the size of the defined neighbourhood is big compared to
the computation time of the evaluation function, the Greedy Algorithm will normally be most
efficient. On the other hand, Steepest Descent will generally do best with small neighbourhoods.
The two strategies can be combined, searching a selected sub-neighbourhood, which in some
cases will make the implementation more efficient.
For both methods the computation time of the procedures finding neighbours and calculating
C(x) should be optimized, since these procedures will be called many times during a single run.
3.2 Multistart techniques
As mentioned above, performing multiple descent runs and returning the best value achieved will
normally improve the quality of the results. Figure 3.5 illustrates the idea. Here three different
starting points have been selected, and one of these, x3, leads to the global minimum point, B.
Multistart techniques can be implemented in two different ways. Either you can run a certain
number of descent runs starting at randomly selected solutions, or you can hope that
the solution space, S, has a structure which can be exploited (the so-called global convexity
phenomenon explained in chapter 7), thus making it possible to say in which parts of S the
optimal solution most likely lies, and then select your starting points in these areas.
In Hindsberger [30] the quality of the solutions when using multiple random initial solutions was
better than the quality of solutions achieved using Simulated Annealing and only slightly inferior
to Tabu Search. This approach is especially appealing if the speed increase of parallelization
is wanted, since it is easily parallelized.
Figure 3.5: Multistart LS improves the probability of finding the global minimum point
But using random selection has its weaknesses. The number of descent runs needed to obtain
stable solutions, i.e. solutions with a low standard deviation, grows fast with the
problem size for some problems (Boese et al. [9]).
Boese et al. [9] is also a good introduction to the more advanced techniques. It describes how to
analyse the global structure of the cost surface, C(S), as well as introducing a simple multistart
method. Feo and Resende [14] and Tsubakitani and Evans [27] cover two different methods,
which have both achieved very good results on several problems.
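A minimal sketch of the multistart idea, reusing the cost and greedy functions from the Python sketch in example 3.1; the number of starts is an arbitrary illustrative choice.

    import random

    def multistart(n_starts=10):
        # Repeated descent from random initial tours; city 0 stays fixed in front.
        best = None
        for _ in range(n_starts):
            start = [0] + random.sample(range(1, 6), 5)
            candidate = greedy(start)
            if best is None or cost(candidate) < cost(best):
                best = candidate
        return best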
Chapter 4
Simulated Annealing
This method, which was introduced by Kirkpatrick et al. [23] and independently by Černý [10],
is inspired by the annealing process of solids and is thus known as Simulated Annealing (SA).
Annealing is a process where a material is heated to a very high temperature at which all particles
move randomly, without interaction with each other. The material is then slowly cooled
in order to obtain a highly ordered structure with an almost minimal energy
state when it freezes. SA can be introduced by showing the analogy between the annealing
process and finding the minimum cost in a combinatorial optimization problem, but the easiest
way of introducing SA is to show that it is an extension of the Greedy Algorithm, which was
introduced in the previous chapter.
The Descent methods are easy to understand and, depending upon the problem, are normally
quite simple to implement, but their weakness is being unable to overcome local minimum
points. Simulated Annealing (SA) can be viewed as an advanced version of the Greedy
Algorithm, which with the introduction of a stochastic element to some degree allows uphill
moves. It thereby becomes possible to escape from local minimum points, improving the chance of
finding a better solution, including the optimal one.
Like in the Greedy Algorithm, downhill moves are always accepted. The new part, the probability
of moving uphill to a neighbour, is controlled by the acceptance function, formulated
as exp(-ΔC/Tk), which denotes the probability of accepting the neighbour resulting in a ΔC
change of C(x). The probability depends, as can be seen, upon the control parameter Tk, called
the temperature. In the beginning the temperature is so high that almost every uphill move
is allowed, but as time goes by the temperature is lowered and the probability of moving uphill is
decreased until it is practically zero. Because of the dependence upon ΔC, a neighbour solution
being only slightly worse than the existing one will be accepted with a greater probability than a
neighbour resulting in a major increase of C(x).
Example 4.1 (The Traveling Salesman Problem - continued) This is best illustrated by
resuming the TSP example from chapter 3. Assuming the temperature Tk = 100, the probability
of accepting solution B, when the current solution is A, can be found as:

  exp(-ΔC/Tk) = exp(-1/100) = 0.99

So a random number between 0 and 1 is generated, and if it is below 0.99, solution B is assigned
as the "current" solution. If the temperature instead is Tk = 1, the probability of acceptance will
instead be as low as about 37%.
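A small Python sketch of this acceptance test (an illustration, not report code) might look as follows.

    import math, random

    def accept(delta_c, temp):
        # Downhill moves always pass; uphill moves pass with prob. exp(-dC/T).
        return delta_c <= 0 or math.exp(-delta_c / temp) > random.random()

    print(math.exp(-1 / 100))     # 0.990..., the acceptance probability at Tk = 100
    print(math.exp(-1 / 1))       # 0.367..., about 37% at Tk = 1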
procedure Homogeneous Simulated Annealing
begin
  choose initial solution, x ∈ S
  initialize T0, Lk
  k := 0
  while not stop do
    counter := 0
    repeat
      choose neighbour, y ∈ N(x)
      ΔC := C(y) - C(x)
      if ΔC ≤ 0 then
        x := y
      elseif exp(-ΔC/Tk) > random[0,1] then
        x := y
      endif
      counter := counter + 1
      k := k + 1
    until counter ≥ Lk
    update Tk, Lk
  end
end

Figure 4.1: Pseudo-code example of homogeneous Simulated Annealing
Depending upon how often the temperature is lowered, one talks about inhomogeneous
respectively homogeneous SA. A pseudo-algorithm of the latter is shown in figure 4.1.
The inhomogeneous version is a special case of the homogeneous one, where the temperature is
lowered each iteration, corresponding to Lk = 1 for all k.
The mentioned stochastic element comes from the function random[0,1], which produces
uniformly distributed values between 0 and 1. Because of this the algorithm, like the Greedy
Algorithm, is nondeterministic and might in each run return different solutions, even when started
with the same initial solution to the same problem.
When evaluating SA with respect to the four properties that heuristics should possess, it will be
noted that good and poor solutions are obtained with respectively high and low probabilities,
but that depends upon the problem.
Returning to the explorer analogy, the explorer now has a limited amount of energy, which allows
him to move uphill, but as time goes by he will become tired and less and less eager to move
uphill. As his energy resources reach zero, he will only walk downhill. If the landscape he is
exploring is highly contoured with a lot of valleys of different depth, the chance that he will
end in one of the deepest is not that high, since he could easily be trapped in a "non-optimal"
valley without the energy to climb the hills around him. Nevertheless SA generally returns much
better solutions than a single descent run, and though it takes considerably more time, it turns
out that SA is well suited to solve many problems.
And the method is easy both to understand and implement, since it is only a quite simple
extension of the Greedy Algorithm.
4.1 Implementation
When implementing SA in order to solve a problem, some problem-specific and some generic
decisions have to be made.
The problem-specific decisions are:

- Neighbourhood: how the neighbourhood should be defined in the solution space and how
it should be searched.
- Objective function: how a solution can be evaluated.
- Initial solution: should it be picked randomly or systematically.

They are all similar to the ones described in the TSP example in the previous chapter, and will
therefore not be discussed here.
The generic decisions to make are all independent of the actual problem and all concern, except
for the stopping criteria, the cooling strategy. A cooling strategy consists of:

- The initial temperature, T0.
- A cooling function to reduce the temperature.
- For the homogeneous version only: the number of iterations between each reduction in
temperature, Lk.
The initial temperature T0 should be chosen so that almost every generated neighbour is accepted
initially, which can be formulated as:

  exp(-ΔC/T0) ≈ 1    (4.1)

where the left hand side, as mentioned, denotes the probability of accepting a neighbour which
results in a ΔC change in the objective function.
Several cooling strategies exist. The most common is the so-called geometric cooling scheme,
which involves a geometric temperature function formulated as:

  Tk+1 = α Tk,   0 < α < 1    (4.2)

Practice has shown that the value of α should be high, e.g. in the interval 0.8 to 0.99,
in order to make SA run efficiently. Higher values of α give SA a better chance to overcome
local minima, but will on the other hand increase the computation time, since the cooling
takes longer. A similar situation applies to the choice of Lk: high values usually improve the
quality of the results, but take longer to compute.
There are no rules for selecting α and Lk, so the best values must be found experimentally.
Experience shows that for a fixed time, several shorter runs return a better result than a single
long run.
Another cooling strategy is the logarithmic cooling scheme, formulated as:

  Tk = Γ / (1 + log(k)),   k = 1, 2, ...    (4.3)

where k is increased after the usual Lk steps and Γ is a sufficiently large constant.
Additional cooling strategies are described in Collin et al. [11] and Vidal [28], but generally
the performance depends more upon the choice of parameters like α and Lk than upon the chosen
strategy, which is why no other strategies are covered in this report.
As stopping criteria there are three obvious possibilities. You can choose to stop when:

- the value of C(x) has not improved by more than ε1 during the last K1 steps.
- the percentage of accepted moves is below ε2 for the last K2 steps.
- the total number of iterations has reached K3.

The best values of ε (usually between 0 and 5%) and K should be found experimentally.
Alternatively, the above criteria can be combined. The first two stopping rules both correspond
to the "frozen" situation where a local minimum has been reached and the temperature is so
low that it cannot be escaped. The last one is the easiest to implement, but you have to be sure
that the temperature has been lowered so much that the "frozen" state has occurred. Another
stopping rule, much like the latter, is to stop the computation after a fixed time, which could
be the time available for the computation.
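Putting the generic decisions together, a homogeneous SA run with geometric cooling might be sketched as below. It reuses cost and random_neighbour from the TSP sketch in chapter 3; the parameter values t0, alpha, lk, and t_frozen are illustrative guesses which, as stated above, should be tuned experimentally. The sketch also stores the best solution visited, anticipating the "SA with memory" variant of section 4.3.

    import math, random

    def simulated_annealing(x, t0=100.0, alpha=0.95, lk=50, t_frozen=0.01):
        best, temp = x, t0
        while temp > t_frozen:                 # stop when practically "frozen"
            for _ in range(lk):                # Lk iterations at each temperature
                y = random_neighbour(x)
                delta = cost(y) - cost(x)
                if delta <= 0 or math.exp(-delta / temp) > random.random():
                    x = y
                    if cost(x) < cost(best):   # remember the best solution visited
                        best = x
            temp *= alpha                      # geometric cooling, cf. (4.2)
        return best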
4.2 Proof of convergence
Some interesting theoretical results about SA have been achieved. The sequence of accepted
solutions can be viewed as an inhomogeneous Markov chain, since the probability of moving to
another solution, state y, only depends upon the current solution, state x, and the transition
probability matrix PT depends upon the current temperature.
As shown in Laarhoven and Aarts [24] and Pirlot [6], Hajek used Markov theory to prove that
SA will, with a probability of 1, end in the global minimum of S if the following conditions are
met:
- The neighbourhood is symmetric, i.e. y ∈ N(x) ⟺ x ∈ N(y).
- The solution space, S, is coherent, so every solution can be reached within a finite number
of iterations.
- The neighbourhood is weakly reversible. That is, for all x, y, and T it is possible to reach y from
x at temperature T if and only if x can be reached from y at the same temperature.
- The temperature is cooled using the logarithmic cooling scheme (4.3).
But one thing is theory; another thing is reality. In order to ensure convergence, an infinite
number of iterations has to be made, since the temperature never actually reaches zero.
4.3 Variations and improvements
The SA algorithm can be improved by other means than changing parameters. One possibility
is changing the neighbourhood generating function, so that another neighbourhood is defined;
an unfavorably designed neighbourhood can severely degrade the performance of the algorithm.
Some computational tricks can be used to improve performance too. Random numbers can be
calculated beforehand and stored in a vector containing for instance 10,000 numbers.
Accessing a number in an array is generally much faster than calculating a new random
number based upon the current one.
On computers, exp() function calls are usually slow. Replacing them with an approximation
can save much computation time. According to [6] about 1/3 of the time can be saved when
solving the Graph Partitioning problem if using a precomputed table of exponentials. Another
possibility, instead of using the exp() function in the acceptance function, is to use the first terms
of the Taylor expansion of the exponential function around zero, which on most systems will also
take less computation time and ensure faster convergence at lower temperatures. When P(x)
denotes the probability of accepting solution x and ΔC is the change in the objective function
value which comes from accepting x, the standard acceptance function can be written as (4.4)
and the Taylor acceptance function as (4.5).
  P(x) = 1                    for ΔC < 0
         exp(-ΔC/Tk)          for ΔC ≥ 0    (4.4)

  P(x) = 1                    for ΔC < 0
         max(0, 1 - ΔC/Tk)    for ΔC ≥ 0    (4.5)
In Borges and Vidal [29] the use of the Taylor acceptance function proved to be superior to the
implementation of the standard acceptance function when assigning channels in cellular mobile
telecommunication systems.
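For comparison, the two acceptance functions (4.4) and (4.5) can be written side by side as follows (a sketch, not report code).

    import math

    def p_standard(delta_c, temp):
        # Standard acceptance function (4.4).
        return 1.0 if delta_c < 0 else math.exp(-delta_c / temp)

    def p_taylor(delta_c, temp):
        # Taylor acceptance function (4.5): first-order approximation of exp.
        return 1.0 if delta_c < 0 else max(0.0, 1.0 - delta_c / temp)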
Several variations of SA also exist. Some of them, which might be superior to SA in some cases,
are:

- Locally optimized SA. Here the best solution of a randomly selected sub-neighbourhood is
picked instead of just one random neighbour of the total neighbourhood. This can improve
both the computation time and the quality of the solutions, as shown on page 25 in Vidal
[28].
- The Threshold Accepting method. Here a neighbour x is accepted if ΔC(x) is lower than a
threshold parameter. The threshold is decreased with time, prohibiting more and more of the
worse solutions. It can be studied in Dueck and Scheuer [13].
- SA with memory. The proof of convergence, and thus of optimality, does not apply to real
world usage because of the finite running time. So the solution SA freezes in is not necessarily
the best one, since it might in some cases have visited superior solutions during the run.
Storing the best solution found might therefore improve the quality, as in Hindsberger [30], much
depending upon the look of the cost surface and the cooling speed.
Chapter 5
Tabu Search
Tabu Search (TS) is another of the best known general heuristic methods. The framework was
introduced in 1986 by Glover [15] and Hansen [21], and TS is thus somewhat younger than SA.
TS can, like SA, be seen as an improved Descent Strategy, but unlike SA it builds upon the
Steepest Descent strategy. TS utilizes flexible memory to remember the steps taken previously
(they will be designated as tabu) and will choose other steps in order to explore new parts of the
solution space by taking advantage of history. It is important to note that the same step does
not have to be between the same two solutions. If solution A from the TSP example back in
figure 3.3 is a result of exchanging the numbers in the succession city 1 and 2 are visited, and the
next step is to solution C, where city 2 and 5 are exchanged in the sequence, then the step where
city 1 and 2 are exchanged will not bring the situation back to the solution prior to A. But by
remembering some information about the previous steps, TS can make sure that the algorithm
does not alternate between the same two solutions and, if implemented properly, assure that no
cycle between the same solutions occurs.
In figure 5.1 a pseudo-code of TS is shown. The notation is unchanged from before, except for
the neighbourhood, which is now denoted N(x, k), since it also describes which solutions
are tabu at time k.
A TS algorithm is not always stochastic like SA. If the whole neighbourhood of each solution is
searched, corresponding to the situation V = N(x, k) in the Steepest Descent strategy, where
the best possible solution is chosen, the algorithm is deterministic. But as mentioned in the TSP
example, the number of neighbours can be so huge that calculating the cost of all of them is
impractical. Instead a smaller part of the entire neighbourhood should be searched, as in the
locally optimized version of SA. Which part of the neighbourhood to search can either be
chosen using some specific rules or by random selection, making the algorithm deterministic or
stochastic respectively.
5.1 Implementation
Like SA, TS is quite easy to implement if you have already implemented a Descent algorithm.
The procedure for calculating C(x) is unchanged, and the one for finding N(x) is easily altered
to include the tabu-classified elements.
The new, and hardest, part is the implementation of memory. An implementation of TS
always includes a short term memory of the steps taken most recently. This, also called the
recency memory, is used for an intensification strategy, searching for local minima while still being
able to overcome those. In addition a longer term memory can be implemented, remembering
the number of times each step has been taken. This is usually denoted the frequency memory
and is used as a diversification strategy to prevent the same steps from being chosen over and
over again, as shown in the example below.
procedure Tabu Search
begin
  choose initial solution, x ∈ S
  x* := x
  C* := C(x)
  k := 0
  choose V ⊆ N(x, k)
  y := argmin{C(y) | y ∈ V}
  while not stop do
    x := y
    if C(x) < C* then
      x* := x
      C* := C(x*)
    endif
    k := k + 1
    choose V ⊆ N(x, k)
    y := argmin{C(y) | y ∈ V}
  end
end

Figure 5.1: Pseudo-code of a Tabu Search procedure
Example 5.1 (The Traveling Salesman Problem - continued) How TS, and especially
the memory, can be implemented is best shown by returning to the TSP example from chapter
3. The solution space was the set of permutations of 5 numbers, and a neighbour to a
solution was the exchange of two cities in the succession the cities are visited in. A common
implementation of memory, when using a neighbourhood definition like the one from Example 3.1,
is an n × n matrix. In this example n = 5, for reasons that should become clear later on.
In figure 5.2 the values assigned to some of the data structures of the TS algorithm during two
iterations are shown. The values are explained below.
The first part is the memory matrix. Its upper triangular part is the recency memory.
For iteration 10 it can be seen which steps are tabu during this iteration:
exchanging city 1 and 2 is tabu for the next 3 iterations, exchanging 2 and 5 is tabu for the next 2
iterations, etc. The frequency memory is stored in the lower triangular part. For iteration 10
it can be seen that cities 1 and 2 have been exchanged once, cities 1 and 4 twice, etc., for a total of
9 exchanges (since the current iteration is number 10).
The current solution, corresponding to solution A of figure 3.3, is stored as described in example
3.1, with city 0 left out, since it cannot be exchanged with the others. The objective function
value of this solution is 16, but as can be noted, a better solution, with a value of 15, has been
found earlier (this solution is stored too, but is not pictured here).
A neighbourhood of three neighbours has been generated. Neighbour 1 exchanges city 2 and
5, resulting in an objective function value of 12. The T denotes that this step is tabu, as can
be seen in the recency memory. The other two neighbours swap city 2 with 4 and city 3
with 5, resulting in objective values of 17 and 18 respectively.
Which of the neighbours should be chosen? If a step is tabu, it cannot be chosen unless a special
condition, called an aspiration criterion, occurs. Some common criteria will be discussed below.
Iteration 10:
  Current solution:  City 1 2 3 4 5 / Seq. 1 3 2 4 5
  Objective value: 16;  best objective value: 15
  Neighbourhood (swap : obj. value):  2-5 : 12 (T),  2-4 : 17,  3-5 : 18

Iteration 11:
  Current solution:  City 1 2 3 4 5 / Seq. 1 5 2 4 3
  Objective value: 12;  best objective value: 12
  Neighbourhood (swap : obj. value):  1-3 : 15,  2-4 : 15,  3-4 : 16

(The recency/frequency memory matrices of the original figure are not reproduced here.)

Figure 5.2: The matrix implementation of Tabu Search memory
The number of iterations a step is tabu, which will be denoted the tabulength, can be constant
as above, where a step is tabu for three iterations. Another option is to let the time vary between
two bounds, either randomly or systematically. The latter, dynamic, form will normally prevent
the algorithm from cycling between the same solutions.
The most common aspiration criterion is to accept a tabu step if it results in a better solution
than the best found until then. Another case where a tabu step has to be chosen is when all
the steps in the neighbourhood are tabu. Then the step with the oldest tabu restriction should be
chosen. Other useful criteria can be found in [16].
Because of the first aspiration criterion mentioned, it is chosen to swap city 2 and 5, resulting in
solution C from figure 3.3. In iteration 11 the data structures have been updated as shown.
Note that only two steps are tabu now, since a tabu step was chosen last. But since none of
the solutions in the neighbourhood are tabu this time, there will be three tabu steps again from
iteration 12.
The frequency memory is usually not as strict as the recency memory. If implemented at all, it is
normally only used when no solutions are found with a better objective function value than the
current best. A penalty, depending upon the frequency of the steps until now, is then added to
the objective function values of the neighbours. E.g. the penalty in this example could correspond to
the frequency of the steps. The adjusted objective values for the three neighbours are then 15,
16, and 19, since swapping 3 and 4 has been done 3 times until now. The neighbour with the
smallest adjusted value should be chosen, since it will most commonly take the algorithm to the
least explored part of the solution space.
As stopping criterion you can continue for a fixed number of iterations, or until no change of x*
or C* has occurred for a number of iterations.
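A single TS iteration with recency memory and the aspiration criterion of this example might be sketched as follows. The sketch reuses cost from the TSP sketch in chapter 3; the dictionary tabu_until and the parameter values are illustrative, and the oldest-tabu fallback rule mentioned above is omitted for brevity.

    import random

    def tabu_step(x, best_cost, k, tabu_until, tabulength=3, sample_size=3):
        # tabu_until maps a swap (i, j) to the first iteration it is allowed again.
        candidates = []
        for _ in range(sample_size):
            i, j = sorted(random.sample(range(1, len(x)), 2))
            y = x[:]
            y[i], y[j] = y[j], y[i]
            is_tabu = k < tabu_until.get((i, j), 0)
            if not is_tabu or cost(y) < best_cost:       # aspiration criterion
                candidates.append((cost(y), y, (i, j)))
        if not candidates:                               # all sampled steps were tabu
            return x
        _, y, move = min(candidates)                     # best admissible neighbour
        tabu_until[move] = k + tabulength                # recency memory update
        return y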
Because of the memory, TS will in general find very good solutions for most problems. It
may be a bit more complicated to implement than SA, but is still quite easy, if
procedures for neighbourhood generation and calculation of the objective cost either exist or
can easily be written. Possible ways to improve the quality of the solutions will be discussed below.
5.2 Improvements
The quality of the results of TS is, like for SA, much dependent upon the chosen parameters. So
the optimal values of the tabulength and the penalty to apply for diversification must be found
by testing. In general a highly contoured cost surface requires a very long tabu list to overcome
the local minima.
In [16] Glover describes several possible improvements of TS. He divides the refinements into three
groups: tactical, technical, and computational. A tactical improvement addresses the actual
implementation of the problem. Most importantly, as in any other LS-based heuristic, the
defined neighbourhood must be good. The technical improvements do not directly address the
problem to be solved; selecting how many and which neighbours to evaluate, the tabu list
size, etc. are considered technical improvements. Varying the tabu list size using some transition
probabilities, and clearing all tabu restrictions when a new best solution x* is found in order to
search unhindered from this new solution, are other technical refinements.
Parallelization counts as a computational improvement, as does proper use of already known
information. One good example concerns the memory matrix implementation of figure 5.2. Instead of
storing the number of iterations a move is tabu and decreasing this number each iteration until
zero, it is better to store the iteration number at which the move is no longer tabu and keep
an iteration counter. Refreshing the matrix is then no longer needed, since you can simply test whether
the value in the matrix is higher than the current iteration number to see if the move is tabu.
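The trick could be sketched as follows (illustrative names; n is the number of cities).

    n = 6                                         # e.g. the number of cities
    tabu_until = [[0] * n for _ in range(n)]

    def is_tabu(i, j, iteration):
        # No refreshing needed: just compare against the current iteration number.
        return iteration < tabu_until[i][j]

    def make_tabu(i, j, iteration, tabulength):
        tabu_until[i][j] = iteration + tabulength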
Chapter 6
Genetic Algorithms
This heuristic, which was introduced by Holland [22] in 1975, is very different from the
methods described earlier, being a recombination based method. Genetic Algorithms, GA, are,
as the name suggests, inspired by the theory of genetic evolution. The method is stochastic and works with several
solutions at a time. By using the information about the current solutions and their objective function
values, GA creates new and hopefully better solutions. The method is best explained
using a small example.
Example 6.1 (The Traveling Salesman Problem - continued) GA is implemented to
work on an alphabet, thus representing the solutions as strings of the alphabet's characters. In this
TSP example the solutions used by the GA are encoded as a string listing the cities in the order
they are visited; thus 013425 represents the solution where city 1 is visited after city 0, city 3
after city 1, etc., ending with city 5 before returning to city 0.
The algorithm starts with the set of 4 randomly generated solutions shown below.
In GA terms the set of solutions is called the population, the individual solutions are denoted
chromosomes, while the single characters in the solutions are genes.
Initial solutions:

  No.  Solution     Fitness  Prob. of select
  1    1 3 4 2 5    17       0.233
  2    2 4 5 3 1    12       0.330
  3    2 3 4 1 5    21       0.189
  4    1 3 2 4 5    16       0.248
For each initial solution a fitness value is calculated. In GA terms this represents the value of
the objective function, which in this case should be minimized.
Reproduction: The information held by these 4 solutions shall now be recombined into new
solutions. This is called reproduction. The first phase of the reproduction is the mating or
selection phase. During this phase 8 solutions (one may be selected more than once) are randomly
selected from the initial population, making up the four pairs shown below. The probability of
selecting a solution for mating depends upon its fitness value.
Reproduction (the selection phase; the four mated pairs):

  Pair 1:  1 3 4 2 5   and   2 4 5 3 1
  Pair 2:  2 3 4 1 5   and   2 4 5 3 1
  Pair 3:  1 3 2 4 5   and   1 3 4 2 5
  Pair 4:  2 4 5 3 1   and   1 3 2 4 5

(The crossover and mutation columns of the original reproduction figure are not reproduced here.)
If the problem had been a maximizing problem, a way of implementing the dependence could be
to select solution i, with fitness value fi, with a probability Pi, where Pi can be found as:

  Pi = fi / Σj fj    (6.1)

When, as here, minimization is wanted instead, the formula is:

  Pi = (1/fi) / Σj (1/fj)    (6.2)
So in this case solution no. 1 should be selected with a probability of 23%. Doing it this way, it
can be seen that the probability of selecting the best solution, no. 2, is almost twice as large as
that of the most inferior, solution no. 3.
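A sketch of this fitness-proportional selection for a minimization problem, following (6.2); the population strings and fitness values are those of the table above (illustration, not report code).

    import random

    def select_parent(population, fitness):
        # Probability proportional to 1/fitness, cf. (6.2): lower cost, higher weight.
        weights = [1.0 / f for f in fitness]
        return random.choices(population, weights=weights, k=1)[0]

    population = ["13425", "24531", "23415", "13245"]
    fitness = [17, 12, 21, 16]
    pairs = [(select_parent(population, fitness),
              select_parent(population, fitness)) for _ in range(4)]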
The next phase of the reproduction is crossover. Here the information of the two solutions in a
mated pair is mixed. A simple way of doing this is to select a sequence of genes in the first
solution and fix them. The missing genes are then filled in using the order in which they come in the
second solution. The crossover elements are selected randomly, but sometimes none should
be selected, keeping the first of the parent solutions. This is what happened for the fourth pair.
The final part, before the new generation of the population is ready, is mutation. For each
solution there should be a small chance that some genes are changed. In the example above
this is what happened to the offspring solution of the second pair after the crossover. Three
genes were selected and swapped randomly, which helps diversify the search by possibly bringing
back lost information. Mutation should only happen with a very small probability.
Now a new generation of the population is ready. This is the 4 solutions in the "crossover"
column, except of course the mutated second solution. With this population the process of calculating
fitness values, mating, etc. is continued.
Stopping criteria: A common stopping criterion in a GA algorithm is to stop when most of
the solutions in the population are the same. In this example, the stopping criterion could be to
stop when 3 of the 4 solutions are the same.
As the example shows, GA works by using mating of pairs, crossover, and mutation on
a sufficient number of initial solutions. If P denotes the population, i.e. the set of individuals,
and O denotes the offspring of the current population, then a pseudo-code of the algorithm can
be written as:
procedure Genetic Algorithm
begin
  generate initial population of n individuals, xi ∈ S
  P := {x1, ..., xn}
  O := ∅
  calculate C(P)
  while P not converged do
    repeat
      (x1, x2) := mate(P, C(P))
      y := crossover(x1, x2)
      z := mutation(y)
      O := O ∪ {z}
    until members(O) = n
    P := O
    O := ∅
    calculate C(P)
  end
end

Figure 6.1: Pseudo-code of a Genetic Algorithm procedure
6.1 Implementation
According to Beasley [8], four considerations are important when implementing a GA algorithm:
the internal representation, the fitness function, the reproductive plan, and the stopping criteria.
All four areas have been introduced in example 6.1, and below some additional coverage has
been added.
Implementing GA as a method of solving a problem first of all requires a suitable internal
representation of the solutions. The solutions should be coded in an alphabet. If the same
alphabet is used for several problems, like the binary one, the reproductive plan working on the alphabet
becomes problem independent and can be reused. Unfortunately that is not always possible.
Like in the other metaheuristics, an objective function to compare different solutions must be
implemented. In GA the objective function is normally known as the fitness function, and can
be seen as an evaluation of the fitness of the different chromosomes.
The reproductive plan consists of three parts: mating, crossover, and mutation. When
mating, the parents should be selected depending upon their fitness. Solutions with the
best fitness values should thus be selected more often than those with worse fitness values, as
(6.1) and (6.2) in the example depict. Given this, the "strong" parents should create offspring
more often than their weaker counterparts, making the next generation as strong as or hopefully
stronger than the current one. The size of the population must be carefully chosen. The bigger the
size, the better results you can expect, but each iteration will also take longer.
Having mated the parents, a procedure for crossover must be made. In some cases a simple
"one-cut" technique can be used, where the crossing point is randomly selected. The genes to
the left of the point are then swapped between the two solutions. In our example this method
could not be used, since it could create infeasible solutions, so in some cases more advanced
methods must be used. Crossover should not always be applied, allowing a pair of parents to
reproduce themselves, adding the unchanged parents to the new generation. This should happen
with a probability between 10 and 40 percent.
Mutation is normally simple to implement and is, just as the frequency memory was in TS, the
diversification part of the algorithm. The probability with which mutation should happen is normally
small, like about 0.1% per gene or 1% per chromosome. Like the probability of not applying
the crossover function, the best value should be found by testing.
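Sketches of a crossover and a mutation operator in the spirit of example 6.1 are shown below (illustrations, not report code): crossover fixes a random slice of the first parent and fills the rest in the second parent's order, while mutation here swaps just two genes, a simplification that keeps the chromosome a valid permutation.

    import random

    def crossover(p1, p2):
        # Fix a random slice of parent 1; fill the remaining genes in the order
        # they appear in parent 2.
        a, b = sorted(random.sample(range(len(p1)), 2))
        fixed = p1[a:b + 1]
        rest = [g for g in p2 if g not in fixed]
        return rest[:a] + fixed + rest[a:]

    def mutate(solution, prob=0.01):
        # With a small probability, swap two random genes.
        solution = solution[:]
        if random.random() < prob:
            i, j = random.sample(range(len(solution)), 2)
            solution[i], solution[j] = solution[j], solution[i]
        return solution

    print(crossover(list("13425"), list("24531")))   # e.g. ['2', '3', '4', '5', '1']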
As stopping criterion you can choose to stop when a certain percentage of the population has the
same chromosomes; the population has then converged, as it is called. Again the best value of this
percentage should be found experimentally. Another possibility is to stop after a fixed number of
iterations or a fixed time.
6.2 Advanced operators
The selection, crossover, and mating operators described in the example were all basic techniques.
Depending upon the problem, implementing more advanced operators can improve the quality
of the solutions.
Using random number generators for choosing the parents, as in formulas (6.1) and (6.2), is for
smaller populations a high variance process, i.e. the expected and the actual numbers of
copies made of the solutions can be very different. One way of overcoming this is stochastic
remainder selection without replacement. It defines Ei as the expected number of copies of
solution i:

  Ei = N Pi    (6.3)

Each solution is then copied Ii times, where Ii is the integer part of Ei. The probability of
selecting solution i one more time is given by the fractional remainder Ri:

  Ri = Ei - Ii    (6.4)

Using this scheme, a solution with an Ei of 1.2 will be selected once for sure and a second time
with a probability of 0.2. The random additions are continued until the new population
is complete.
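A sketch of the scheme (illustrative, with at most one extra copy per solution):

    import random

    def stochastic_remainder(population, probs):
        n = len(population)
        pool, remainders = [], []
        for solution, p in zip(population, probs):
            e = n * p                           # Ei = N Pi, cf. (6.3)
            pool += [solution] * int(e)         # Ii guaranteed copies
            remainders.append(e - int(e))       # Ri = Ei - Ii, cf. (6.4)
        while len(pool) < n:                    # fill up using the remainders
            i = random.randrange(n)
            if random.random() < remainders[i]:
                pool.append(population[i])
                remainders[i] = 0.0             # without replacement: one extra at most
        return pool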
In the selection phase another possibility is to use tournament selection. Several variants of
this exist. In the simplest, a number of participants in each tournament must be chosen. If
the number is three, then three solutions are randomly selected, and the one with the best fitness
value is allowed to mate. Its mate is selected by another tournament.
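A sketch of such a tournament for a minimization problem (illustrative):

    import random

    def tournament_select(population, fitness, size=3):
        contenders = random.sample(range(len(population)), size)
        winner = min(contenders, key=lambda i: fitness[i])   # lowest cost wins
        return population[winner]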
Crossover can be done in many ways. Two different ones have been described, and more can be
found in [20], so no other methods will be covered here. A special greedy crossover technique
should be mentioned, though. It is a combination of a local improvement method and GA, and is
especially useful when small changes (i.e. a few genes swapped) highly affect the fitness
of the solution. It is a GA approach by all means, except that every time crossover has been
performed, a Descent strategy or a more advanced heuristic is used to find the local optimum.
This optimum is then used as offspring.
Depending upon the defined crossover function, two parents produce either one or two offspring.
You can then choose to keep either the child (or children) or the fittest one (or two) of them
all, i.e. of both parents and offspring. By keeping the offspring every time, convergence will usually
be slower, but the diversity of the population is sustained to a higher degree, which is important for
some problems.
The elitism scheme can be used too. Here the best parent is tested against the worst offspring
and the better one stays, the second best parent is tested against the second worst offspring,
etc. This continues until the chosen number of parents allowed to compete against their offspring,
the elitism number, has been tested.
If one or more solutions begin to dominate the population within a short time, it is called premature
convergence. For most problems this is unwanted, and some steps can be taken to prevent it.
The usual one is mutation. Another possibility is called immigration: with a certain rate
or probability, completely new solutions are generated and allowed to compete against
the existing population, as when using the elitism scheme. But sometimes neither mutation nor
immigration is enough. Then the niching technique might help.
Niching is when you attempt to keep several genetically similar sub-populations within the total
population in order to keep diversity. This can be done by designing the selection and crossover
operations with care. In the selection phase, for instance, sharing can be used. When sharing, the
algorithm calculates for each individual how many other individuals are similar to it (i.e. only
a few genes different). The more individuals that are similar, the more the individuals' fitness
is derated. In this way no single solution can take over the entire population.
Chapter 7
Other issues
After this brief introduction to some of the metaheuristic methods, this chapter will
cover some other important issues. First the choice of a topology and global convexity
will be described more deeply. Then comes a section on how to design, analyse, and report
on experiments. At last a section has been included with some suggestions for where to find
references with more detailed information.
7.1 Making the right topology
By defining the neighbourhood or the reproduction function, i.e. the possible transitions from
one solution to another, you also define a topology of the solution space.
First, when you are to decide on a method to solve a specific problem, you should
consider whether a neighbourhood can be easily defined or not. A clumsy neighbourhood
structure will often lead to a time-consuming neighbourhood generating function, which can
severely degrade the general performance of the algorithm. Therefore problems where no clear
neighbourhood structure can be found might be better solved using for instance Branch and
Bound or a problem specific heuristic algorithm, where the solution space does not need to be
transformed.
The most important thing to remember when defining the possible transitions is that every part
of the solution space must be reachable within a finite number of iterations from the current
solution. In some cases a part of the solution space can be omitted, since it does not contain
the optimal solution, but for the part included, the rule above should always hold.
Figure 7.1 shows four different shapes of topologies of a solution space. Of these, figure
7.1(a) is the topology most wanted: even the simple Descent Strategies will always succeed in
finding the optimal solution. This topology is globally convex, but unfortunately it is almost
impossible to find such nice topologies. The chances are that you will get something like figure
7.1(b) or 7.1(c), or, if you are unlucky, 7.1(d).
With a relaxed definition of global convexity (see Hu et al. [3]), the topologies in figures
7.1(a), (b), and (c) are all convex over their topology of the solution space. If such a property
exists, the quality of the metaheuristic solutions will most often be superior to those of a
similar metaheuristic solving the problem using the topology of figure 7.1(d).
A good topology will therefore use as much information about the problem as possible in order
to create this global convexity. A great deal of literature exists presenting topologies well
suited for different problems, like the Traveling Salesman Problem, when using specific
metaheuristics. In general, a neighbour, or a change of a single gene, should represent a
solution only slightly different from the current one.
[Figure 7.1: Topologies of different cost surfaces. Each panel plots the objective value over the topology S: (a) nice global convexity, (b) semirough global convexity, (c) rough global convexity, (d) no global convexity.]
The topologies in figures 7.1(b) and (c) both have many local minima, preventing the simple
Descent Strategies from working efficiently. Using multiple starting locations will help,
especially in this case if the starting points are chosen close to the best solutions achieved
so far. The roughness of the topology does not really matter for these methods.
The other local search based heuristics, Simulated Annealing and Tabu Search, have to be tuned
differently depending upon the roughness of the local minima. For greater "depths" of the local
minima, SA should have a slower cooling speed and TS a longer tabu list in order to get clear of
these optima.
Concerning the final implementation, the computation speed of the neighbourhood-generating
function or the reproduction function is very important, since it, like the evaluation function,
is called many times during a single run.
7.2 Designing and reporting on heuristic experiments
The best stopping criteria and the optimal values of parameters like the tabu length, the
frequency penalty, T0, α, etc. should normally be found experimentally, since no universal rules
for choosing those values exist. How to design and later report on these experiments is covered
in this section.
Hooker [33] describes two different forms of experimentation: competitive testing and scientific
testing. While the first is widely used, it is less suited for the purpose. In competitive
testing you use benchmark tests to create the fastest implementation of your algorithm. While
this helps telling you which implementation is the fastest, it does not tell you why. The latter
is the main aspect of scientific testing, where statistical analysis is used to test how the
algorithm depends upon different parameters. The problems are generated according to the factor
dependence to be investigated, and instead of raw computation time the measurements are the
number of procedure calls, elementary operations, or other statistics predicted by the algorithm.
Barr et al. [32] discuss the experimentation further. They divide the making of an experiment
into several phases:
- Define the goal of the experiments
- Decide how to measure the performance of the different factors
- Design and run the experiment
- Analyse the data and draw conclusions
- Report the results
The goal can be speed, quality, theoretical interest, etc. When choosing a performance measure,
time and quality are the most common, but it is also interesting to see how quickly a good
solution is found, how often the algorithm fails the quality demand, and the quality span
between the best solution found and the mean of all solutions found.
For the design and analysis phases, knowledge of statistical methods is worth a lot; Montgomery
[34] gives an extensive introduction to this topic. Statistical analysis is especially important
when correlation between different factors is expected. It is important to keep the environment,
i.e. the hardware and software platform, stable, unless of course the goal of the experiment is
to measure the dependence on environmental factors. The random number generator should also be
tested, in order to avoid interference from poorly generated numbers.
It is equally important to ensure reproducibility. When reporting the results, it is essential
to include documentation, so the results can be reproduced later. While the quality of the
solutions might not be the goal of every experiment, an important issue is the tradeoff between
computation time and the quality of the solutions.
Presenting the results graphically will often be more inviting to the reader than tables. Below,
two examples of graphical display are shown.
The boxplot is especially suited for the nondeterministic heuristics. In order to get an idea of
the results, several similar runs are performed, making it possible to estimate the expected
mean value, the standard deviation, etc. In a boxplot the maximum and minimum values achieved by
each method are marked as the ends of the vertical lines. The upper and lower edges of the box
show the 75% and 25% percentiles respectively, while the line through it is the median. Outliers,
i.e. observations statistically too far from the others, can be marked as circles. Showing all
this, the boxplot makes it easy to compare different methods.
The right plot of figure 7.2 shows the tradeoff between quality and computation time. Again,
several methods can easily be compared visually by using a different line style for each. In the
plot shown, the improvement during a single run is displayed. Another possibility is to plot the
end results of several runs with different computation times. Plotting the computation time vs.
the problem size makes another interesting plot.
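As a minimal sketch, both kinds of display can be produced with the Python library matplotlib; the data below are illustrative placeholders, not results from this report.

    import matplotlib.pyplot as plt

    # Placeholder objective values from several runs of each method.
    results = {"LSS": [412, 431, 424, 440],
               "SA":  [452, 461, 457, 459],
               "TS":  [449, 458, 462, 455]}

    fig, (ax1, ax2) = plt.subplots(1, 2)
    ax1.boxplot(list(results.values()), labels=list(results.keys()))
    ax1.set_ylabel("obj. value")

    # Placeholder trace of solution quality during a single run.
    iterations = range(0, 2000, 400)
    trace = [300, 395, 440, 455, 461]
    ax2.plot(iterations, trace, "k--")
    ax2.set_xlabel("iterations")
    ax2.set_ylabel("obj. value")
    plt.show()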
Unexpected behaviour should be emphasized and, if possible, explained. Otherwise it should be
presented as a problem worth further research.
[Figure 7.2: Examples of illustrations (from Hindsberger [30]). (a) Boxplot comparing the objective values of five different algorithms (LSS, LSM, SA, TSS, and TSA). (b) Quality vs. iterations for a single run with T0 = 1000.00, α = 0.99, and Lk = 25.00; the maximum objective value reached was 464.08 and the end objective value 461.12.]
7.3 Pointers for further research
In the bibliography included in this report, selected references for further research can be
found. They have been divided into four parts:
- Introductions to heuristics
- Metaheuristics:
  - general descriptions
  - specific metaheuristics
- Applications
- Design, analysis, and reporting
The first section in the bibliography contains references about heuristics and the complexity of
problems. Section 2 has two parts: one with general introductions to metaheuristics and the
other with references describing specific metaheuristics. The metaheuristics covered in the
latter part are:
Continuous local search based:
- Simulated Annealing: [10], [11], [23], [24], and [28].
- Tabu Search: [15], [16], and [19].
- Threshold Algorithms: [13].
- Variable Neighbourhood Search: [25].
Multistart local search based:
- G.R.A.S.P.: [14].
- Jump Search: [27].
Reconstruction based:
- Genetic Algorithms: [8], [20], and [22].
- Heuristic Concentration: [26].
- Ant Colonies, Cooperative Search: [12].
- Scatter Search: [17] and [18].
Section 3 contains a few references about metaheuristics applied to real-world problems, while
the last section has references on experimental design, analysis, and reporting.
Bibliography
Introduction to heuristics:
[1] Fisher, M. L.
Worst-Case Analysis of Heuristic Algorithms
Management Science 26, 1980.
[2] Garey, M. R. and Johnson, D. S.
Computers and Intractability: a Guide to the Theory of NP-Completeness
Freeman, San Francisco, 1979.
[3] Hu, T. C., Klee, V. and Larman, D.
Optimization of Globally Convex Functions
SIAM Journal on Control and Optimization, 27, 1989.
[4] Silver, E. A., Vidal, R. V. V. and de Werra, D.
A Tutorial on Heuristic Methods
European Journal of Operational Research, 5, 1980.
Metaheuristics:
General description
[5] Hansen, M. P.
Metaheuristics for Multiple Objective Combinatorial Optimization
Ph.D. dissertation, IMM, Technical University of Denmark, 1998.
[6] Pirlot, M.
General Local Search Heuristics in Combinatorial Optimization
Belgian Journal of Operations Research, Statistics, and Computer Science, Vol. 32 (1,2),
1992.
[7] Reeves, C. R. (Editor)
Modern Heuristic Techniques for Combinatorial Problems
Wiley & Sons Inc., 1993.
Specific algorithms
[8] Beasley, D., Bull, D. R. and Martin, R. R.
An Overview of Genetic Algorithms: Part 1, Fundamentals
University Computing, 15, No. 2, 1993.
[9] Boese, K. D., Kahng, A. B. and Muddu, S.
A new Adaptive Multi-start Technique for Combinatorial Global Optimizations
Operations Research Letters, 16, 1994.
[10] Černý, V.
A Thermodynamical Approach to the Travelling Salesman Problem: An Efficient Simulation Algorithm
Journal of Optimization Theory and Applications, 45, No. 1, 1985.
[11] Collins, N. E. et al.
Simulated Annealing - an Annotated Bibliography
American Journal of Mathematical and Management Sciences, 8, 1988.
[12] Dorigo, M., Maniezzo, V. and Colorni, A.
The Ant System: Optimization by a Colony of Cooperating Agents
IEEE Transactions on Systems, Man, and Cybernetics - Part B, 26, No. 1, 1996.
[13] Dueck, G. and Scheuer, T.
Threshold Accepting: A General Purpose Optimization Algorithm Appearing Superior to Simulated
Annealing
Journal of Computational Physics, 90, 1990.
[14] Feo, T. A. and Resende, M. G. C.
Greedy Randomized Adaptive Search Procedures
Journal of Global Optimization, Vol. 6, 1995.
[15] Glover, F.
Future Paths for Integer Programming and Links to Artificial Intelligence
Computers and Operations Research, 5, 1986.
[16] Glover, F., Taillard, E. and de Werra, D.
A User's Guide to Tabu Search
Annals of Operations Research, 41, 1993.
[17] Glover, F. and Laguna, M.
General Purpose Heuristics for Integer Programming - Part I
Journal of Heuristics, 2, no. 4, 1997.
[18] Glover, F. and Laguna, M.
General Purpose Heuristics for Integer Programming - Part II
Journal of Heuristics, 3, no. 2, 1997.
[19] Glover, F. and Laguna, M.
Tabu Search
Kluwer Academic Publishers, Boston, 1997.
[20] Goldberg, D. E.
Genetic Algorithms in Search, Optimization and Machine Learning
Addison-Wesley, 1989.
[21] Hansen, P.
The Steepest Ascent Mildest Descent Heuristic for Combinatorial Programming
Congress on Numerical Methods in Combinatorial Optimization, Capri, Italy, 1986.
[22] Holland, J. H.
Adaptation in Natural and Artificial Systems
The University of Michigan Press, Ann Arbor, Michigan, USA, 1975.
[23] Kirkpatrick, S., Gelatt, C. D., Jr. and Vecchi, M. P.
Optimization by Simulated Annealing
Science, 220, 1983.
[24] van Laarhoven, P. J. M. and Aarts, E. H. L.
Simulated Annealing: Theory and Applications
Reidel Publishing Company, 1987.
[25] Mladenovic, N. and Hansen, P.
Variable Neighbourhood Search
Computers and Operations Research, 24, No. 11, 1997.
[26] Rosing, K. E. and ReVelle, C. S.
Heuristic Concentration: Two Stage Solution Construction
European Journal of Operational Research, 97, 1997.
[27] Tsubakitani, S. and Evans, J. R.
An Empirical Study of a new Metaheuristic for the Traveling Salesman Problem
European Journal of Operational Research, 104, 1998.
[28] Vidal, R. V. V. (Editor)
Simulated Annealing Applied to Combinatorial Optimization
Control and Cybernetics, 25, 1996.
Applications:
[29] Borges, P. C. and Vidal, R. V. V.
Fixed Channel Assignment in Cellular Mobile Telecommunication Systems
Technical Report 1996-25, IMM, Technical University of Denmark, 1996.
[30] Hindsberger, M.
The Target-Radar Allocation Problem
Master Thesis 1998-9, IMM, Technical University of Denmark, 1998.
[31] Vidal, R. V. V. (Editor)
Applied Simulated Annealing
Lecture Notes in Economics and Mathematical Systems, 396, 1993.
Design, analysis, and reporting:
[32] Barr, R. S., Golden, B. R., Kelly, J. P., Resende, M. G. C. and Stewart, W. R.
Designing and Reporting on Computational Experiments with Heuristic Methods
Journal of Heuristics, 1, No. 1, 1995.
[33] Hooker, J.
Testing Heuristics: We have it all Wrong
Journal of Heuristics, 1, No. 1, 1995.
[34] Montgomery, D. C.
Design and Analysis of Experiments
Fourth edition, Wiley & Sons, 1997.