DDRE F-65/1998

Metaheuristics - an introduction

by Magnus Hindsberger

DANISH DEFENCE RESEARCH ESTABLISHMENT

Abstract

This report is meant as an introduction for people interested in using metaheuristics for solving combinatorial optimization problems. Four metaheuristics: Descent Strategies, Simulated Annealing, Tabu Search, and Genetic Algorithms, are covered by giving a brief introduction to how they work, how they should be implemented, and how they can be improved in various ways. References for further research in the area have been included.

Keywords: Combinatorial optimization, metaheuristics.

Acknowledgment

I am grateful to dr. tech. Rene Victor Valqui Vidal, Department of Mathematical Modelling, Technical University of Denmark, and to my fellow researchers at the Danish Defence Research Establishment for their constructive comments on earlier versions of this paper.

Contents

1 Introduction
2 Heuristics in general
  2.1 Why use heuristics?
  2.2 Properties of heuristics
3 Descent Strategies
  3.1 Implementation
  3.2 Multistart techniques
4 Simulated Annealing
  4.1 Implementation
  4.2 Proof of convergence
  4.3 Variations and improvements
5 Tabu Search
  5.1 Implementation
  5.2 Improvements
6 Genetic Algorithms
  6.1 Implementation
  6.2 Advanced operators
7 Other issues
  7.1 Making the right topology
  7.2 Designing and reporting on heuristic experiments
  7.3 Pointers for further research
Bibliography

Chapter 1
Introduction

Within many Operational Research and Management Science applications you will find the so-called combinatorial optimization problems. Most of them share the property that they are hard to solve with the methods available, since they belong to the NP-hard class of problems (for a definition see Garey and Johnson [2]). NP-hard problems in general cannot be solved to optimality in an efficient way, because the computation time grows exponentially with the size of such optimization problems. Unfortunately, in most practical applications the size of the problem is so large that the optimal solution cannot be found within the acceptable or available time. Such problems instead have to be solved using heuristic methods. A heuristic (short for heuristic method or algorithm) has been described by Nicholson [4] as a procedure "for solving problems by an intuitive approach in which the structure of the problem can be interpreted and exploited intelligently to obtain a reasonable solution".
In other words, a heuristic is an approximative method in which common sense and rules of thumb are used to construct and improve possible solutions. The idea is to obtain an acceptable solution within reasonable time. Heuristics have always been used for solving specific problems, but some 20 years ago research was oriented towards the search for generic principles for making heuristics capable of solving a large range of problems. These heuristic principles, which were given the name "meta-heuristic" by Glover in 1986, describe a heuristic superimposed upon another. A metaheuristic is therefore a high-level procedure for developing heuristics for solving different optimization problems. The purpose of this report is to introduce the reader to this approach to problem solving.

This report starts by introducing the basics about heuristics - e.g. where they are used and which properties they should have. Next come chapters 3, 4, 5, and 6, where some of the best known metaheuristics: Descent Strategies, Simulated Annealing, Tabu Search, and Genetic Algorithms, are covered. The last chapter addresses some important things to note when designing and implementing a metaheuristic method. In addition an extensive bibliography has been included, where references for further research can be found.

Chapter 2
Heuristics in general

As mentioned above, the name heuristic denotes an approximative method for solving a problem. For larger problems heuristics are generally faster than the exact methods, and they often, but not necessarily every time, return an acceptable, but seldom the optimal, solution. This chapter gives a short introduction to heuristics, mostly based on the article by Silver et al. [4].

[Figure 2.1: Example of a tree structure - 5 leaves in the upper row connected through 3 nodes to the root at the bottom, each branch carrying a weight c_i.]

To get an idea of how a heuristic works, consider the tree structure shown in figure 2.1. It consists of 5 leaves (the 5 circles in the upper row) and 3 nodes, of which the one at the bottom is denoted the root. The elements are connected by branches as shown, each with a weight c_i. We want to find the route from the root to a leaf along the branches where the total weight (i.e. the sum of the used c_i's) is highest. One exact way of doing this could be to write an algorithm which calculates the weight of all the possible routes and returns the one with the highest value. But in some situations the tree might be so big that the exact method above (or any other) cannot be used, since the time needed to calculate the weights of all the routes is greater than the time available to solve the problem. In such cases a heuristic solution to the problem is the only way out. An intuitive possibility is, at each node, to pick the branch leading upwards with the highest weight. This particular search method, which would normally be referred to as a greedy algorithm, is very similar to one of the Descent Strategies discussed in the next chapter. Using this algorithm on the tree structure of figure 2.1, the optimal value 9 is found by adding the weights of the marked branches. If we instead look at the problem of minimizing the weight, the branch with the smallest weight leading upwards from the visited nodes should be chosen. But in this example the algorithm then returns the value 7, which is greater than the optimal value of 6. In other examples even the worst possible solution might be returned. In general, though, this algorithm will return a good (i.e. satisfying), but not necessarily the optimal, solution to a problem.
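To make the greedy principle concrete, here is a minimal Python sketch of the route choice just described. It is not part of the original report, and the tree and branch weights below are illustrative stand-ins, since the exact weights of figure 2.1 cannot be recovered from the text:

    # Hypothetical tree: each entry maps a node to its (child, branch weight) pairs.
    tree = {
        "root": [("n1", 4), ("n2", 3)],
        "n1": [("leaf1", 1), ("leaf2", 3)],
        "n2": [("leaf3", 4), ("leaf4", 4), ("leaf5", 5)],
    }

    def greedy_route(tree, node="root", maximize=True):
        """Follow the locally best branch from the root until a leaf is reached."""
        total = 0
        while node in tree:  # leaves do not appear as keys, so the loop stops there
            pick = max if maximize else min
            node, weight = pick(tree[node], key=lambda branch: branch[1])
            total += weight
        return node, total

    print(greedy_route(tree))                  # greedy maximization
    print(greedy_route(tree, maximize=False))  # greedy minimization

As the chapter notes, the local choice is cheap - one comparison per node instead of enumerating every root-to-leaf route - but nothing guarantees that the returned route is the optimal one.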
Another heuristic algorithm that could be used on the example is to pick three routes at random, calculate the total weight of each, and return the one with the highest value (or lowest if minimizing). This algorithm would also generally return a good solution.

2.1 Why use heuristics?

Some reasons to use a heuristic method have already been mentioned, but not all. To sum up, the most important cases where heuristics should be used are:

- Problems for which exact algorithms are still unknown. In those cases heuristic methods of solution are the only way out.
- Problems where the computation time of the exact algorithm is too long. Many combinatorial optimization problems belong to the NP-hard class, where the computation time grows exponentially with the size of the problem. The size could for instance be the number of nodes in a tree, the number of towns in a "Traveling Salesman" problem, etc. So for large NP-hard problems, due to the long computation times, exact algorithms do not exist in practice.
- Exact algorithms which initially require a possible solution and subsequently try to improve it. These algorithms will generally terminate faster if a good initial solution is given. A good initial solution might be determined by heuristic methods. An example is the "Northwest Corner principle", which is used for finding a good initial solution to the classical transportation problem.
- Branch and Bound methods, which require the determination of good upper/lower bounds for each node. Heuristic methods might be used for determining these bounds.

Though heuristics can be useful in many cases, it has to be remembered that heuristic methods of solution are only a supplement to the exact algorithms. Many practical problems require exact solutions and not just solutions close to the global optimum. And a heuristic algorithm is not necessarily easy to develop, as many have experienced.

2.2 Properties of heuristics

When developing a heuristic algorithm, the following should be kept in mind. The quality of all heuristics is not the same. In general the quality depends upon the following properties, which a heuristic should possess:

- it returns good solutions with a high probability;
- similarly, it returns poor (i.e. not satisfying) solutions with a small probability;
- it terminates fast, at least faster than any known exact algorithm used on the same problem;
- it should be easy to understand. Many people prefer an algorithm "only" giving a satisfying solution, where they understand the principles behind it, to a complex algorithm that magically returns the exact solution without any possibility of understanding it.

A heuristic is designed with a specific problem in mind, and can for that reason seldom be used on other problems or when new constraints are added. But as mentioned, much focus is now on a group of heuristic principles, the metaheuristics, which can provide guidance on how to implement heuristics for solving many different problems. Hansen [5] characterized metaheuristics as either continuous local search, multistart local search, or recombination based. In the local search based techniques a solution is repeatedly replaced by another very similar solution, which is said to belong to a defined neighbourhood of the first solution. Thus these methods are also described as neighbourhood based.
The continuous local search methods stop when the time available has been used, while the multistart methods stop when some criteria concerning the solution have been met, starting over from a brand new solution a number of times. Both variants return the best solution visited during execution of the algorithm. The recombination based methods combine two or more solutions into new solutions sharing some characteristics with both (or all) of the originals. This is done using a defined generation mechanism, which should favour solutions with the best characteristics (i.e. those which improve the objective function the most).

The metaheuristics do not work on the solution space of the specific problems but on a topology of it, since some kind of ordering of the solutions is needed. The topology of the solution space is defined by the chosen neighbourhood or the generation mechanism. The user must, for the problem to be solved, find these possible transformations between solutions. A topology can usually be defined in many ways, and some are better suited than others. The metaheuristics, how they work, and how a good topology of the solution space should be constructed will be described in the following chapters. Since the topologies cannot always take advantage of the structure of the problem as much as the problem specific heuristics can, you can expect the computation time or the quality of the solution of a metaheuristic method to be inferior to that of a problem specific heuristic. So why should anyone want to use a metaheuristic approach, if he can make a better problem specific heuristic? The metaheuristics have, according to Pirlot [6], some advantages over the problem specific techniques in terms of: transparency, ease of implementation, suitability for experimentation, always producing usable solutions, and flexibility when, for instance, more constraints are added to the problem. This is why much research is done in metaheuristics and why they are used more and more commonly.

Chapter 3
Descent Strategies

The Descent Strategies (DS) are the simplest neighbourhood based methods. DS are iterative algorithms replacing the current solution only with solutions which improve the objective function value of the problem. This is done until a local optimum has been found. The DS are important since they form the basis for the other neighbourhood based metaheuristics. This chapter should give the reader a clear comprehension of how the strategies work.

Consider a combinatorial optimization problem with a well-defined solution space S. Each of the possible solutions x ∈ S, x = (x_1, x_2, ..., x_n), can be evaluated by an objective function (or cost function) denoted C(x). In addition each solution has a defined neighbourhood denoted N(x), which could for instance be the result of adjusting each x_i one by one. In the following it is assumed, unless otherwise stated, that the objective function value should be minimized.

Graphically, such a problem can be visualized as trying to find the lowest point in a complex and often highly contoured landscape. An optimization algorithm can thus be viewed as an explorer wandering through valleys and across hilltops searching for the topologically lowest point. The grid location where the explorer stands can be likened to the current solution x, and the topological height of this location to the result of the evaluation function, C(x). The neighbourhood N(x) can be defined as the grid locations found at a one metre distance from x in each of the four corners of the earth respectively.
[Figure 3.1: The "landscape" of an unknown function, defined by the topology of S. The points A, B, C, D, and E referred to below are marked on the cost surface.]

Staying with the explorer analogy, two Descent Strategies will be introduced below:

1. The explorer picks one of the four neighbouring locations at random and evaluates the height. If the height is lower than the one at his current position, he walks there and continues from this new location, picking one of the new neighbours randomly. Otherwise he remains at his old location, choosing a neighbour from the neighbourhood again. If he has not changed location for a number of iterations, he is possibly trapped in one of the local extrema (which might be the global one), and the algorithm is stopped. This strategy is called Random Search.

2. The explorer evaluates the height of each of the four neighbours. He walks to the best (i.e. lowest) of them and continues from there, as long as it is lower than the position he came from. Otherwise he is in one of the local extrema, and the algorithm is stopped. This strategy, which was sketched in the previous chapter, is called Steepest Descent (or, in case of maximizing, Steepest Ascent). This type of algorithm is often called a Greedy Algorithm.

As can be noted, the explorer in a DS algorithm only accepts downhill moves. This severely decreases the probability of finding the global optimal solution, as seen in figure 3.1. The explorer is standing at point A, with, in this example, only two neighbours, B and C. He will either pick one of them randomly and evaluate its height, or he will evaluate both B and C. But he will never choose C as the next current solution, since it is located higher than A. Instead he will walk downhill toward B and eventually reach the local minimum point D, and thus never come to the global minimum in E. But by rerunning the algorithm several times with different start locations, the chance of finding the optimal, or at least a better, solution is improved considerably. Using DS in this way will be described later.

The DS algorithms are useful, since they are both simple and generally applicable, as long as a topology defining the neighbourhood and an evaluation function can be constructed for the problem. A pseudo-code of the Greedy Algorithm is shown in figure 3.2.

    procedure Greedy Algorithm
    begin
      choose initial solution, x ∈ S
      repeat
        choose neighbour, y ∈ N(x)
        ΔC := C(y) - C(x)
        if ΔC ≤ 0 then
          x := y
          C* := C(y)
        endif
      until stop
    end

Figure 3.2: Pseudo-code example of the Greedy Algorithm

3.1 Implementation

Three things are of importance when implementing local search based methods: the definition of the neighbourhood, N(x), the objective function, C(x), used to evaluate the neighbours, and the stopping criterion. How they can be implemented is shown in Example 3.1 below.

Example 3.1 (The Traveling Salesman Problem)
The famous Traveling Salesman Problem, TSP, involves the design of a minimum cost path for a salesman who has to visit a number of cities. Each city must be visited exactly once and the path has to end in the same city as it started. In this case the problem has 6 cities numbered 0 through 5, where the salesman has to start from (and thus end in) city no. 0. A solution can be expressed as:

    City no.        0  1  2  3  4  5
    Succession no.  0  1  3* 2  4* 5

where the Succession number denotes the position in the visiting order at which the city with that City number is visited.
For example, in the solution above city 2 is visited as number 3 when the salesman departs from city 0, as illustrated in solution A in figure 3.3.

[Figure 3.3: Possible solutions to the Traveling Salesman Problem - the tours of solutions A, B, and C drawn on the six cities.]

The neighbourhood: A neighbourhood of the TSP can be defined in several ways:

- as moving one city in the sequence to another place in the sequence, e.g. city 1 is visited last instead of first and the other cities are thus visited one "step" earlier;
- as an exchange of two cities in the succession they are visited in, for instance the two marked with a *, resulting in the B solution of figure 3.3, where city 2 is now visited 4th and city 4 is visited 3rd. This kind of neighbourhood is traditionally denoted 2-OPT in the literature, sometimes with the restriction of not exchanging two cities next to each other in the succession.

In the remainder of the example the neighbourhood is the latter of the two mentioned, with no restriction. Note that in this example the column for city 0 must be unchanged, since the city to start from is fixed. The solution space S is then given by the number of permutations of cities 1 through 5, resulting in 5 · 4 · 3 · 2 · 1 = 5! = 120 possible solutions, or (n-1)! in general for n cities - a number which quickly becomes astronomical, so enumerating all solutions is out of the question. The number of neighbours of each solution, i.e. the number of ways of exchanging two of the five movable cities, is 5 · 4 / 2 = 10, or n(n-1)/2 in general; for a large number of cities, evaluating the whole neighbourhood in each iteration, as the Steepest Descent strategy does, becomes costly.

Objective cost: If the cost of traveling between two cities is assumed to be proportional to the distance, the cost of each solution can be calculated from the distance chart below.

         1    2    3    4    5
    0   2.0  2.0  3.5  3.5  4.0
    1        3.5  2.0  4.0  3.5
    2             4.0  2.0  3.5
    3                  3.5  2.0
    4                       2.0

Figure 3.4: The distances between the cities in the TSP

Assuming the current solution is A, the objective cost, C(A), of this solution is found as:

    C(A) = d(0,1) + d(1,3) + d(3,2) + d(2,4) + d(4,5) + d(5,0) = 16

where d(0,1) denotes the distance between city 0 and city 1. Now a random solution, B, from the defined neighbourhood is picked. C(B) is calculated as:

    C(B) = d(0,1) + d(1,3) + d(3,4) + d(4,2) + d(2,5) + d(5,0) = 17

Since C(B) > C(A), the move to B is rejected and the search from A is continued. Another random neighbour is picked from the neighbourhood, this time C from figure 3.3, obtained by exchanging the positions in the succession at which cities 2 and 5 are visited. Evaluating the objective function as before, C(C) is found as:

    C(C) = d(0,1) + d(1,3) + d(3,5) + d(5,4) + d(4,2) + d(2,0) = 12

Now a better solution has been found and C can be assigned as the current solution, so in the next iteration the neighbourhood of C is searched.

Stopping criterion: It is chosen to continue as above until 20 new neighbours in a row have been rejected, since it is then assumed that a local minimum point has been reached. Other stopping criteria will be discussed in the next chapters.

Whether the Greedy Algorithm or Steepest Descent should be used depends upon the problem. The Greedy Algorithm only requires C(x) to be calculated once in each iteration, but in general it needs more iterations than Steepest Descent, since the latter goes more directly for the "closest" minimum point.
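The pieces of Example 3.1 translate almost directly into code. The following Python sketch (an illustration added here, not taken from the report) implements the 2-exchange neighbourhood, the objective cost from the distance chart of figure 3.4, and the 20-rejections stopping rule, using a plain list of cities instead of the succession-number table:

    import random

    # Distances from figure 3.4, stored once for i < j.
    D = {(0, 1): 2.0, (0, 2): 2.0, (0, 3): 3.5, (0, 4): 3.5, (0, 5): 4.0,
         (1, 2): 3.5, (1, 3): 2.0, (1, 4): 4.0, (1, 5): 3.5,
         (2, 3): 4.0, (2, 4): 2.0, (2, 5): 3.5,
         (3, 4): 3.5, (3, 5): 2.0,
         (4, 5): 2.0}

    def d(i, j):
        return D[(min(i, j), max(i, j))]

    def cost(tour):
        """Length of the round trip 0 -> tour[0] -> ... -> tour[-1] -> 0."""
        path = [0] + tour + [0]
        return sum(d(a, b) for a, b in zip(path, path[1:]))

    def greedy_descent(tour, max_rejects=20):
        """Random-neighbour descent with the 2-exchange neighbourhood."""
        rejects = 0
        while rejects < max_rejects:
            i, j = random.sample(range(len(tour)), 2)  # two positions to swap
            neighbour = tour[:]
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            if cost(neighbour) <= cost(tour):
                tour, rejects = neighbour, 0  # accept the move, reset the counter
            else:
                rejects += 1
        return tour

    tour = [1, 3, 2, 4, 5]  # solution A; cost(tour) evaluates to 16.0
    best = greedy_descent(tour)
    print(best, cost(best))

Running it a few times shows the point made above: the run usually ends in a good tour, but which local minimum is reached depends on the random choices.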
If the size of the defined neighbourhood is big compared to the computation time of the evaluation function, the Greedy Algorithm will normally be the most efficient; Steepest Descent, on the other hand, generally does best with small neighbourhoods. The two strategies can be combined by searching a selected sub-neighbourhood, which in some cases will make the implementation more efficient. For both methods the computation time of the procedures finding neighbours and calculating C(x) should be optimized, since these procedures will be called many times during a single run.

3.2 Multistart techniques

As mentioned above, performing multiple descent runs and returning the best value achieved will normally improve the quality of the results. Figure 3.5 illustrates the idea. Here three different starting points have been selected, and one of them, x3, leads to the global minimum point, B.

[Figure 3.5: Multistart local search improves the probability of finding the global minimum point - three starting points x1, x2, and x3 on a cost surface with a local minimum A and the global minimum B.]

Multistart techniques can be implemented in two different ways. Either you can run a certain number of descent runs starting at randomly selected solutions, or you can hope that the solution space, S, has a structure which can be exploited (the so-called global convexity phenomenon explained in chapter 7), making it possible to say in which parts of S the optimal solution most likely lies, and then select your starting points in those areas. In Hindsberger [30] the quality of the solutions when using multiple random initial solutions was better than the quality of solutions achieved using Simulated Annealing, and only slightly inferior to Tabu Search. This approach is especially appealing if the speed-up of parallelization is wanted, since parallelization is easily done here.

But using random selection has its weaknesses. The number of descent runs needed to obtain stable solutions, i.e. solutions with low standard deviation, grows fast with the problem size for some problems (Boese et al. [9]). Boese et al. [9] is also a good introduction to the more advanced techniques. It describes how to analyse the global structure of the cost surface, C(S), as well as introducing a simple multistart method. Feo and Resende [14] and Tsubakitani and Evans [27] cover two different methods, which both have achieved very good results on several problems.

Chapter 4
Simulated Annealing

This method, which was introduced by Kirkpatrick et al. [23] and independently by Černý [10], is inspired by the annealing process of solids and is thus known as Simulated Annealing (SA). Annealing is a process where a material is heated to a very high temperature at which all particles move randomly, without interaction with each other. The material is then slowly cooled in order to obtain a highly ordered structure with an almost minimum energy state when it freezes. SA can be introduced by showing the analogy between the annealing process and finding the minimum cost in a combinatorial optimization problem, but the easiest way of introducing SA is to show that it is an extension of the Greedy Algorithm, which was introduced in the previous chapter. The Descent methods are easy to understand and, depending upon the problem, normally quite simple to implement, but they have their weakness in being unable to overcome local minimum points. Simulated Annealing can be viewed as an advanced version of the Greedy Algorithm which, with the introduction of a stochastic element, to some degree allows uphill moves.
It now becomes possible to escape from local minimum points, improving the chance of finding a better - including the optimal - solution. As in the Greedy Algorithm, downhill moves are always accepted. The new part - the probability of moving uphill to a neighbour - is controlled by the acceptance function, formulated as exp(-ΔC/T_k), which denotes the probability of accepting a neighbour resulting in a change ΔC of C(x). The probability depends, as can be seen, upon the control parameter T_k, called the temperature. In the beginning the temperature is so high that almost every uphill move is allowed, but as time goes by the temperature is lowered and the probability of moving uphill is decreased until it is practically zero. Because of the dependence upon ΔC, a neighbour solution which is only slightly worse than the existing one will be accepted with a greater probability than a neighbour resulting in a major increase of C(x).

Example 4.1 (The Traveling Salesman Problem - continued)
This is best illustrated by resuming the TSP example from chapter 3. Assuming the temperature T_k = 100, the probability of accepting solution B, when the current solution is A, can be found as:

    exp(-ΔC/T_k) = e^(-1/100) = 0.99

So a random number between 0 and 1 is generated, and if it is below 0.99, solution B is assigned as the "current" solution. If the temperature instead is T_k = 1, the probability of acceptance will be as low as about 37%.

    procedure Homogeneous Simulated Annealing
    begin
      choose initial solution, x ∈ S
      initialize T_0, L_k
      k := 0
      while not stop do
        counter := 0
        repeat
          choose neighbour, y ∈ N(x)
          ΔC := C(y) - C(x)
          if ΔC ≤ 0 then
            x := y
          elseif exp(-ΔC/T_k) > random[0,1] then
            x := y
          endif
          counter := counter + 1
          k := k + 1
        until counter ≥ L_k
        update T_k, L_k
      end
    end

Figure 4.1: Pseudo-code example of homogeneous Simulated Annealing

Depending upon how often the temperature is lowered, one speaks of inhomogeneous or homogeneous SA respectively. A pseudo-code of the latter is shown in figure 4.1. The inhomogeneous version is a special case of the homogeneous one, where the temperature is lowered each iteration, corresponding to L_k = 1 for all k. The mentioned stochastic element comes from the function random[0,1], which produces uniformly distributed values between 0 and 1. Because of this, the algorithm, like the Greedy Algorithm, is nondeterministic and might in each run return different solutions, even when started with the same initial solution to the same problem.

When evaluating SA with respect to the four properties that heuristics should possess, it will be noted that good and poor solutions can be obtained with respectively high and low probabilities, but that depends upon the problem. Returning to the explorer analogy, the explorer now has a limited amount of energy, which allows him to move uphill, but as time goes by he becomes tired and less and less eager to move uphill. As his energy resources reach zero, he will only walk downhill. If the landscape he is exploring is highly contoured with a lot of valleys of different depths, the chance that he will end in one of the deepest is not that high, since he could easily be trapped in a "non-optimal" valley without the energy to climb the hills around him. Nevertheless SA generally returns much better solutions than a single Descent run, and though it takes considerably more time, it turns out that SA is well suited to solve many problems. And the method is easy both to understand and implement, since it is only a quite simple extension of the Greedy Algorithm.
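As a concrete illustration, the pseudo-code of figure 4.1 can be sketched in Python as below. This is an added sketch, not code from the report; the parameter values and the frozen-temperature stopping rule are illustrative choices, and the best solution seen is stored along the way (the "SA with memory" variant discussed in section 4.3):

    import math
    import random

    def simulated_annealing(x, cost, neighbour,
                            T0=100.0, alpha=0.95, Lk=100, Tmin=0.01):
        """Homogeneous SA: Lk iterations per temperature, geometric cooling."""
        best, best_cost = x, cost(x)
        T = T0
        while T > Tmin:  # stop when the process is practically "frozen"
            for _ in range(Lk):
                y = neighbour(x)
                dC = cost(y) - cost(x)
                # Downhill moves are always accepted; uphill moves with
                # probability exp(-dC/T), the acceptance function.
                if dC <= 0 or math.exp(-dC / T) > random.random():
                    x = y
                    if cost(x) < best_cost:
                        best, best_cost = x, cost(x)
            T *= alpha  # geometric cooling, T_{k+1} = alpha * T_k
        return best

    # Toy usage: minimize (x - 3)^2 over the integers with a +/-1 neighbourhood.
    result = simulated_annealing(50,
                                 lambda x: (x - 3) ** 2,
                                 lambda x: x + random.choice((-1, 1)))
    print(result)  # almost always 3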
4.1 Implementation

When implementing SA in order to solve a problem, some problem-specific and some generic decisions have to be made. The problem-specific decisions are:

- Neighbourhood: how should the neighbourhood be defined in the solution space and how should it be searched.
- Objective function: how can a solution be evaluated.
- Initial solution: should it be picked randomly or systematically.

They are all similar to the ones described in the TSP example in the previous chapter, and will therefore not be discussed here. The generic decisions are all independent of the actual problem and, except for the stopping criteria, all concern the cooling strategy. A cooling strategy consists of:

- The initial temperature, T_0.
- A cooling function to reduce the temperature.
- For the homogeneous version only - the number of iterations between each reduction in temperature, L_k.

The initial temperature T_0 should be chosen so that almost every generated neighbour is accepted initially, which can be formulated as:

    exp(-ΔC/T_0) ≈ 1    (4.1)

where the left hand side, as mentioned, denotes the probability of accepting a neighbour which results in a change ΔC of the objective function. Several cooling strategies exist. The most common is the so-called geometric cooling scheme, which involves a geometric temperature function formulated as:

    T_{k+1} = α T_k,  0 < α < 1    (4.2)

Practice has shown that the value of α should be high, e.g. in the interval 0.8 ≤ α ≤ 0.99, in order to make SA run efficiently. Higher values of α give SA a better chance to overcome local minima, but will on the other hand increase the computation time, since the cooling then takes longer. A similar situation applies to the choice of L_k: high values usually improve the quality of the results, but take longer to compute. There are no rules for selecting α and L_k, so the best values must be found experimentally. Experience shows that for a fixed time, several shorter runs return a better result than a single long run.

Another cooling strategy is the logarithmic cooling scheme, formulated as:

    T_k = Γ / (1 + log(k)),  k = 1, 2, ...    (4.3)

where k is increased after the usual L_k steps and Γ is a sufficiently large constant. Additional cooling strategies are described in Collin et al. [11] and Vidal [28], but generally the performance depends more upon the choice of parameters like α and L_k than upon the chosen strategy, which is why no other strategies are covered in this report.

As stopping criteria there are three obvious possibilities. You can choose to stop when:

- the value of C(x) has not been improved by more than ε_1 during the last K_1 steps;
- the percentage of accepted moves is below ε_2 for the last K_2 steps;
- the total number of iterations has reached K_3.

The best values of ε (usually between 0 and 5%) and K should be found experimentally. Alternatively the above criteria can be combined. The first two stopping rules both correspond to the "frozen" situation, where a local minimum has been reached and the temperature is so low that it cannot be escaped. The last one is the easiest to implement, but you have to be sure that the temperature has been lowered so much that the "frozen" state has occurred. Another stopping rule, much like the latter, is to stop the computation after a fixed time, which could be the time available for the computation.
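The two cooling schemes (4.2) and (4.3) are easy to state as code. In this added sketch the schemes are written as Python generators; the constant of the logarithmic scheme, written Γ above, appears as the parameter gamma:

    import itertools
    import math

    def geometric_cooling(T0, alpha):
        """T_{k+1} = alpha * T_k, eq. (4.2)."""
        T = T0
        while True:
            yield T
            T *= alpha

    def logarithmic_cooling(gamma):
        """T_k = gamma / (1 + log k), eq. (4.3)."""
        for k in itertools.count(1):
            yield gamma / (1.0 + math.log(k))

    print(list(itertools.islice(geometric_cooling(100.0, 0.9), 5)))
    # approximately [100.0, 90.0, 81.0, 72.9, 65.61]
    print([round(T, 1) for T in itertools.islice(logarithmic_cooling(100.0), 5)])
    # [100.0, 59.1, 47.7, 41.9, 38.3]

Comparing the two outputs makes visible why the logarithmic scheme, required for the convergence proof in the next section, is rarely used in practice: it cools extremely slowly.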
4.2 Proof of convergence

Some interesting theoretical results about SA have been achieved. The sequence of accepted solutions can be viewed as an inhomogeneous Markov chain, since the probability of moving to another solution, state y, only depends upon the current solution, state x, and the transition probability matrix P_T depends upon the current temperature. As shown in Laarhoven and Aarts [24] and Pirlot [6], Hajek used Markov theory to prove that SA will end in the global minimum of S with probability 1 if the following conditions are met:

- The neighbourhood is symmetric, i.e. y ∈ N(x) ⟺ x ∈ N(y).
- The solution space, S, is coherent, so every solution can be reached within a finite number of iterations.
- The neighbourhood is weakly reversible. That is, for all x, y, T it is possible to reach y from x at the temperature T if and only if x can be reached from y at the same temperature.
- The temperature is cooled using the logarithmic cooling scheme (4.3).

But one thing is theory - another thing is reality. In order to ensure the convergence, an infinite number of iterations has to be made, since the temperature never actually reaches zero.

4.3 Variations and improvements

The SA algorithm can be improved by other means than changing parameters. One possibility is changing the neighbourhood generating function, so that another neighbourhood is defined. An unfavourably designed neighbourhood can severely degrade the performance of the algorithm.

Some computational tricks can be used to improve performance too. Random numbers can be calculated beforehand and stored in a vector containing, for instance, 10,000 numbers. Accessing a number in an array is generally much faster than calculating a new random number from the current one. On computers the exp() function calls are usually slow, and replacing them with an approximation can save much computation time. According to [6], about 1/3 of the time can be saved when solving the Graph Partitioning problem if a precomputed table of exponentials is used. Another possibility, instead of using the exp() function in the acceptance function, is to use the first terms of the Taylor expansion of the exponential function around zero, which on most systems will also take less computation time and ensure faster convergence at lower temperatures. When P(x) denotes the probability of accepting solution x and ΔC is the change in the objective function value which comes from accepting x, the standard acceptance function can be written as (4.4) and the Taylor acceptance function as (4.5):

    P(x) = 1                   for ΔC < 0
         = exp(-ΔC/T_k)        for ΔC ≥ 0    (4.4)

    P(x) = 1                   for ΔC < 0
         = max(0, 1 - ΔC/T_k)  for ΔC ≥ 0    (4.5)

In Borges and Vidal [29] the use of the Taylor acceptance function proved to be superior to the implementation of the standard acceptance function when assigning channels in cellular mobile telecommunication systems.
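The two acceptance functions can be compared directly. This added Python sketch (not from the report) implements (4.4) and (4.5); note how the Taylor form rejects outright once ΔC ≥ T_k, which is what gives the faster convergence at low temperatures:

    import math

    def p_standard(dC, T):
        """Standard acceptance function, eq. (4.4)."""
        return 1.0 if dC < 0 else math.exp(-dC / T)

    def p_taylor(dC, T):
        """Taylor acceptance function, eq. (4.5): first-order expansion of
        exp(-dC/T) around zero, clamped at 0 to stay a valid probability."""
        return 1.0 if dC < 0 else max(0.0, 1.0 - dC / T)

    for dC in (0.1, 0.5, 1.0, 2.0):
        print(dC, round(p_standard(dC, 1.0), 3), p_taylor(dC, 1.0))
    # dC = 0.1: 0.905 vs 0.9   (close for small dC/T)
    # dC = 2.0: 0.135 vs 0.0   (the Taylor form rejects outright)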
Several variations of SA also exist. Some of them, which might be superior to SA in some cases, are:

- Locally optimized SA. Here the best solution of a randomly selected sub-neighbourhood is picked instead of just one random neighbour from the total neighbourhood. This can improve both the computation time and the quality of the solutions, as shown on page 25 in Vidal [28].
- The Threshold Accepting method. Here a neighbour x is accepted if the change ΔC is lower than a threshold parameter. The threshold is decreased with time, prohibiting more and more of the worsening moves. It can be studied in Dueck and Scheuer [13].
- SA with memory. The proof of convergence, and thus of optimality, does not apply to real world usage because of the finite running time. So the solution SA freezes in is not necessarily the best one, since the run might in some cases have visited superior solutions along the way. Storing the best found solution might therefore improve the quality, as in Hindsberger [30], much depending upon the look of the cost surface and the cooling speed.

Chapter 5
Tabu Search

Tabu Search (TS) is another of the best known general heuristic methods. The framework was introduced in 1986 by Glover [15] and Hansen [21], and TS is thus somewhat younger than SA. TS can, like SA, be seen as an improved Descent Strategy, but unlike SA it builds upon the Steepest Descent strategy. TS utilizes flexible memory to remember the steps taken previously (they will be designated as tabu) and will choose other steps in order to explore new parts of the solution space by taking advantage of history.

It is important to note that the same step does not have to be between the same two solutions. If solution A from the TSP example back in figure 3.3 is the result of exchanging the positions in the succession at which cities 1 and 2 are visited, and the next step is to solution C, where cities 2 and 5 are exchanged in the sequence, then the step where cities 1 and 2 are exchanged will not bring the situation back to the solution prior to A. But by remembering some information about the previous steps, TS can make sure that the algorithm does not alternate between the same two solutions and, if implemented properly, ensure that no cycle between the same solutions occurs. In figure 5.1 a pseudo-code of TS is shown. The notation is unchanged from before, except for the neighbourhood, which is now denoted N(x, k), since it also describes which solutions are tabu at time k.

A TS algorithm is not always stochastic like SA. If the whole neighbourhood of each solution is searched, corresponding to the situation V = N(x, k) in the Steepest Descent strategy, where the best possible solution is chosen, the algorithm is deterministic. But as mentioned in the TSP example, the number of neighbours can be so huge that calculating the cost of all of them is impractical. Instead a smaller part of the entire neighbourhood should be searched, as in the locally optimized version of SA. Which part of the neighbourhood is searched can be chosen either using some specific rules or by random selection - making the algorithm deterministic or stochastic respectively.

    procedure Tabu Search
    begin
      choose initial solution, x ∈ S
      x* := x
      C* := C(x)
      k := 0
      choose V ⊆ N(x, k)
      y := argmin{C(y) | y ∈ V}
      while not stop do
        x := y
        if C(x) ≤ C* then
          x* := x
          C* := C(x*)
        endif
        k := k + 1
        choose V ⊆ N(x, k)
        y := argmin{C(y) | y ∈ V}
      end
    end

Figure 5.1: Pseudo-code of a Tabu Search procedure

5.1 Implementation

Like SA, TS is quite easy to implement if you have already implemented a Descent algorithm. The procedure for calculating C(x) is unchanged, and the one for finding N(x) is easily altered to include the tabu-classified elements. The new, and the hardest, part is the implementation of memory. An implementation of TS always includes a short term memory of the steps taken most recently. This, also called the recency memory, is used for an intensification strategy, searching for local minima while still being able to overcome them. In addition a longer term memory can be implemented, remembering the number of times each step has been taken. This is usually denoted the frequency memory and is used as a diversification strategy to prevent the same steps from being chosen over and over again, as shown in the example below.
Example 5.1 (The Traveling Salesman Problem - continued)
How TS, and especially the memory, can be implemented is best shown by returning to the TSP example from chapter 3. The solution space was the set of permutations of 5 numbers, and a neighbour of a solution was the exchange of two cities in the succession the cities are visited in. A common implementation of memory when using a neighbourhood definition like the one from Example 3.1 is an n x n matrix. In this example n = 5, for reasons that should become clear later on. In figure 5.2 the values assigned to some of the data structures of the TS algorithm during two iterations are shown. The values will be explained below.

[Figure 5.2: The matrix implementation of Tabu Search memory at iterations 10 and 11. For each iteration the figure shows the 5 x 5 memory matrix (recency memory in the upper triangle, frequency memory in the lower triangle), the current solution, its objective value, the best objective value, and the generated neighbourhood. At iteration 10 the current solution is A (Seq. 1 3 2 4 5, objective value 16, best value 15) with the neighbourhood: swap 2-5 giving 12 (tabu), swap 2-4 giving 17, swap 3-5 giving 18. At iteration 11 the current solution is C (Seq. 1 5 2 4 3, objective value 12, best value 12) with the neighbourhood: swap 1-3 giving 15, swap 2-4 giving 15, swap 3-4 giving 16.]

The first part is the memory matrix. The upper triangular part of it is the recency memory. For iteration 10 it can be seen which steps are tabu during this iteration: exchanging cities 1 and 2 is tabu for the next 3 iterations, exchanging 2 and 5 is tabu for the next 2 iterations, etc. The frequency memory is stored in the lower triangular part. For iteration 10 it can be seen that cities 1 and 2 have been exchanged once, cities 1 and 4 twice, etc., for a total of 9 exchanges (since the current iteration is number 10). The current solution, corresponding to solution A of figure 3.3, is stored as described in Example 3.1, with city 0 left out, since it cannot be exchanged with the others. The objective function value of this solution is 16, but as can be noted, a better solution, with a value of 15, has been found earlier (this solution is stored too, but is not pictured here). A neighbourhood of three neighbours has been generated. Neighbour 1 is the exchange of cities 2 and 5, resulting in an objective function value of 12. The T denotes that this step is tabu, as can be seen in the recency memory. The other two neighbours are swapping city 2 with 4 and city 3 with 5, resulting in objective values of 17 and 18 respectively. Which of the neighbours should be chosen?

If a step is tabu, it cannot be chosen unless a special condition, called an aspiration criterion, occurs. Some common criteria will be discussed below. The number of iterations a step is tabu, which will be denoted the tabulength, can be constant as above, where a step is tabu for three iterations. Another option is to let the length vary between two bounds, either randomly or systematically. The latter, dynamic, form will normally prevent the algorithm from cycling between the same solutions. The most common aspiration criterion is to accept a tabu step if it results in a better solution than the best found until then. Another case where a tabu step has to be chosen is when all the steps in the neighbourhood are tabu; then the step with the oldest tabu restriction should be chosen. Other useful criteria can be found in [16]. Because of the aspiration criterion mentioned, it is chosen to swap cities 2 and 5, resulting in solution C from figure 3.3. In iteration 11 the data structures have been updated as shown.
Note that only two steps are tabu now, since a tabu step was chosen last. But since none of the solutions in the neighbourhood are tabu this time, there will be three tabu steps again from iteration 12. Frequency memory is usually not as strict as the recency memory. If implemented at all, it is normally only used when no solutions are found with a better objective function value than the current best. A penalty, depending upon the frequency of the steps until now, is then added to the objective function values of the neighbours. E.g. the penalty in this example could correspond to the frequency of the steps. The adjusted objective values for the three neighbours are then 15, 16, and 19, since swapping 3 and 4 has been done 3 times until now. The neighbour with the smallest adjusted value should be chosen, since it will most commonly take the algorithm to the least explored part of the solution space. As stopping criterion you can continue for a fixed number of iterations, or until no change of x* or C* has occurred for a number of iterations.

Because of the memory, TS will in general produce very good solutions for most problems. It may be a bit more complicated to implement than SA, but is still quite easy to implement if procedures for neighbourhood generation and calculation of objective cost either exist or can easily be written. Possible ways to improve the quality of the solutions will be discussed below.

5.2 Improvements

The quality of the results of TS is, like for SA, much dependent upon the chosen parameters. So the optimal values of the tabulength and the penalty to apply for diversification must be found by testing. In general a highly contoured cost surface requires a very long tabu list to overcome the local minima.

In [16] Glover describes several possible improvements of TS. He divides the refinements into three groups: tactical, technical, and computational. A tactical improvement addresses the actual implementation of the problem. Most importantly, as in any other local search based heuristic, the defined neighbourhood must be good. The technical improvements do not directly address the problem to be solved. Selecting how many and which neighbours to evaluate, the tabu list size, etc. are considered technical improvements. Varying the tabu list size using some transition probabilities, and clearing all tabu restrictions when a new better value of x* is found in order to search unhindered from this new solution, are other technical refinements. Parallelization counts as a computational improvement, as does proper use of already known information. One good example concerns the memory matrix implementation of figure 5.2: instead of storing the number of iterations a move is tabu and decreasing this number each iteration until zero, it is better to store the iteration number at which the move is no longer tabu and keep an iteration counter. Refreshing the matrix is then no longer needed, since you can simply test whether the value in the matrix is higher than the current iteration number to see if the move is tabu.
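A small Python sketch of this refinement (added here for illustration) could look as follows: the matrix stores, for each possible swap, the first iteration at which the swap is no longer tabu, so no per-iteration refresh is needed:

    class TabuMatrix:
        """Recency memory storing expiry iterations instead of countdowns."""

        def __init__(self, n, tabulength=3):
            self.expiry = [[0] * n for _ in range(n)]  # one entry per swap (i, j)
            self.tabulength = tabulength

        def make_tabu(self, i, j, k):
            """At iteration k: forbid the swap for the next tabulength iterations."""
            i, j = min(i, j), max(i, j)  # the swap (i, j) equals the swap (j, i)
            self.expiry[i][j] = k + self.tabulength

        def is_tabu(self, i, j, k):
            """A single comparison decides the tabu status - no refreshing needed."""
            i, j = min(i, j), max(i, j)
            return k < self.expiry[i][j]

    tabu = TabuMatrix(n=6)
    tabu.make_tabu(2, 5, k=10)
    print(tabu.is_tabu(2, 5, k=12))  # True  - still tabu
    print(tabu.is_tabu(5, 2, k=13))  # False - the restriction has expired

Since only the upper triangle is used for recency information, the lower triangle of the matrix stays free for the frequency memory, exactly as in figure 5.2.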
Chapter 6
Genetic Algorithms

This heuristic, which was introduced by Holland [22] in 1975, is very different from the methods described earlier, being a recombination based method. Genetic Algorithms, GA, are, as the name suggests, inspired by the theory of genetic evolution. The method is stochastic and works with several solutions at a time. By using the information in the current solutions and their objective function values, GA creates new and hopefully better solutions. The method is best explained using a small example.

Example 6.1 (The Traveling Salesman Problem - continued)
GA is implemented to work on an alphabet, thus representing the solutions as strings of the alphabet's characters. In this TSP example the solutions used by the GA are encoded as strings listing the cities in the order they are visited - thus 013425 represents the solution where city 1 is visited after city 0, city 3 after city 1, etc., ending with city 5 before returning to city 0. The algorithm starts with the set of 4 randomly generated solutions shown below. In GA terms the set of solutions is called the population, the individual solutions are denoted chromosomes, while the single characters in the solutions are genes.

Initial solutions:

    No.  Solution     Fitness  Prob. of selection
    1    1 3 4 2 5    17       0.233
    2    2 4 5 3 1    12       0.330
    3    2 3 4 1 5    21       0.189
    4    1 3 2 4 5    16       0.248

For each initial solution a fitness value is calculated. In GA terms this represents the value of the objective function, which in this case should be minimized.

Reproduction: The information held by these 4 solutions shall now be recombined into new solutions. This is called reproduction. The first phase of the reproduction is the mating or selection phase. During this phase 8 solutions (one may be selected more than once) are randomly selected from the initial population, making up four pairs. The probability of selecting a solution for mating depends upon its fitness value. If the problem had been a maximization problem, a way of implementing this dependence could be to select solution i with fitness value f_i with probability P_i, where P_i can be found as:

    P_i = f_i / Σ_i f_i    (6.1)

When instead, as here, minimization is wanted, the formula is:

    P_i = (1/f_i) / Σ_i (1/f_i)    (6.2)

So in this case solution no. 1 should be selected with a probability of 23%. Doing it this way, it can be seen that the probability of selecting the best solution, no. 2, is almost twice as large as that of the most inferior one, solution no. 3.

The next phase of the reproduction is crossover. Here the information of the two solutions in a mated pair is mixed. A simple way of doing this is to select a sequence of genes in the first solution and fix them. The missing genes are then filled in using the order in which they come in the second solution. The crossover elements are selected randomly - but sometimes none should be selected, keeping the first of the parent solutions. This is what happened for the fourth pair.

[Figure: The reproduction step - the four selected pairs, the crossover offspring of each pair, and the mutated offspring of the second pair.]

The final part, before the new generation of the population is ready, is mutation. For each solution there should be a small chance that some genes are changed. In the example above this is what happened to the offspring solution of the second pair after the crossover: three genes were selected and swapped randomly, which helps diversify the search by possibly bringing back lost information. Mutation should only happen with a very small probability. Now a new generation of the population is ready: the 4 solutions in the "crossover" column, except of course the mutated second solution. With this population the process of calculating fitness values, mating, etc. is continued.

Stopping criteria: A common stopping criterion in a GA algorithm is to stop when most of the solutions in the population are the same. In this example, the stopping criterion could be to stop when 3 of the 4 solutions are the same.
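The selection probabilities in the table above follow directly from formula (6.2). A small added Python check (not from the report) reproduces them and shows a straightforward roulette-wheel selection based on the probabilities:

    import random

    fitness = [17, 12, 21, 16]  # fitness values of the four initial solutions

    # Formula (6.2): minimization, so low fitness gets high probability.
    total = sum(1.0 / f for f in fitness)
    probs = [(1.0 / f) / total for f in fitness]
    print([round(p, 3) for p in probs])  # [0.233, 0.33, 0.189, 0.248]

    def select(population, probs):
        """Roulette-wheel selection: pick one individual according to probs."""
        r, acc = random.random(), 0.0
        for individual, p in zip(population, probs):
            acc += p
            if r <= acc:
                return individual
        return population[-1]  # guard against floating point rounding

    population = ["1 3 4 2 5", "2 4 5 3 1", "2 3 4 1 5", "1 3 2 4 5"]
    pair = (select(population, probs), select(population, probs))
    print(pair)  # one mating pair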
As the example shows, the way GA works is by applying mating of pairs, crossover, and mutation to a sufficient number of initial solutions. If P denotes the population, i.e. the set of individuals, and O denotes the offspring of the current population, then a pseudo-code of the algorithm can be written as shown in figure 6.1.

    procedure Genetic Algorithm
    begin
      generate initial population of n individuals, x_i ∈ S
      P := {x_i}, i = 1, ..., n
      O := ∅
      calculate C(P)
      while P not converged do
        repeat
          (x1, x2) := mate(P, C(P))
          y := crossover(x1, x2)
          z := mutation(y)
          O := O ∪ {z}
        until members(O) = n
        P := O
        O := ∅
        calculate C(P)
      end
    end

Figure 6.1: Pseudo-code of a Genetic Algorithm procedure

6.1 Implementation

According to Beasley [8], four considerations are important when implementing a GA algorithm: the internal representation, the fitness function, the reproductive plan, and the stopping criteria. All four areas have been introduced in Example 6.1, and below some additional coverage is added.

Implementing GA as a method of solving a problem first of all requires a suitable internal representation of the solutions. The solutions should be coded in an alphabet. If the same alphabet is used for several problems, like the binary one, the reproductive plan working on the alphabet becomes problem independent and can be reused. Unfortunately that is not always possible. As in the other metaheuristics, an objective function to compare different solutions must be implemented. In GA the objective function is normally known as the fitness function, and can be seen as an evaluation of the fitness of the different chromosomes.

The reproductive plan consists of three parts: mating, crossover, and mutation. When mating, the parents should be selected depending upon their fitness. Solutions with the best fitness values should thus be selected more often than those with worse fitness values, as (6.1) and (6.2) in the example depict. Given this, the "strong" parents will create offspring more often than their weaker counterparts, making the next generation as strong as, or hopefully stronger than, the current one. The size of the population must be carefully chosen. The bigger the size, the better the result you can expect, but each iteration will also take longer.

Having mated the parents, a procedure for crossover must be made. In some cases a simple "one-cut" technique can be used, where the crossing point is randomly selected. The genes to the left of the point are then swapped between the two solutions. In our example this method could not be used, since it could create infeasible solutions. So in some cases more advanced methods must be used. Crossover should not always be applied, allowing a pair of parents to reproduce themselves, adding the unchanged parents to the new generation. This should happen with a probability between 10 and 40 percent.

Mutation is normally simple to implement and is, just as the frequency memory was in TS, the diversification part of the algorithm. The probability with which mutation should happen is normally small, like about 0.1% per gene or 1% per chromosome. As with the probability of not applying the crossover function, the best value should be found by testing.

As stopping criterion you can choose to stop when a certain percentage of the population has the same chromosomes - the population has converged, as it is called. Again the best percentage should be found experimentally. Another possibility is to stop after a fixed number of iterations or a fixed time.
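A compact Python version of figure 6.1 might look as below. This is an added sketch, not the report's code: the problem-specific operators are passed in as functions, a fixed number of generations serves as the stopping rule, and selection is done by tournament (described in the next section). The toy demo evolves a 10-bit chromosome towards all zeros:

    import random

    def genetic_algorithm(init, fitness, mate, crossover, mutate,
                          n=30, generations=100):
        """Generational GA following the pseudo-code of figure 6.1."""
        population = [init() for _ in range(n)]
        for _ in range(generations):
            offspring = []
            while len(offspring) < n:          # build the next generation O
                x1, x2 = mate(population)
                offspring.append(mutate(crossover(x1, x2)))
            population = offspring             # P := O
        return min(population, key=fitness)    # best individual of the final generation

    # Toy problem: minimize the number of ones in a 10-bit chromosome.
    init = lambda: [random.randint(0, 1) for _ in range(10)]
    fitness = lambda x: sum(x)

    def mate(population):
        """Tournament selection with three participants per parent."""
        pick = lambda: min(random.sample(population, 3), key=fitness)
        return pick(), pick()

    def crossover(a, b):
        cut = random.randrange(1, len(a))      # one-cut crossover
        return a[:cut] + b[cut:]

    def mutate(x, rate=0.01):                  # small per-gene probability, see above
        return [1 - gene if random.random() < rate else gene for gene in x]

    print(genetic_algorithm(init, fitness, mate, crossover, mutate))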
6.2 Advanced operators

The selection, crossover, and mating operators described in the example were all basic techniques. Depending upon the problem, implementing more advanced operators can improve the quality of the solutions.

Using random number generators to choose the parents as in formulas (6.1) and (6.2) is, for smaller populations, a high-variance process, i.e. the expected and the actual numbers of copies made of the solutions can be very different. One way of overcoming this is stochastic remainder selection without replacement. It defines E_i as the expected number of copies of solution i:

    E_i = N · P_i    (6.3)

Each solution is then copied I_i times, where I_i is the integer part of E_i. The probability of selecting a solution one more time than that is given by the fractional remainder R_i:

    R_i = E_i - I_i    (6.4)

Using this scheme, a solution with an E_i of 1.2 will be selected once for sure, and a second time with a probability of 0.2. The random additions are continued until the new population is complete.

In the selection phase another possibility is to use tournament selection. Several variants of this exist. In the simplest, a number of participants in each tournament must be chosen. If the number is three, then three solutions are randomly selected and the one with the best fitness value is allowed to mate. Its mate is selected by another tournament.
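A sketch of the remainder scheme in Python (an added illustration, not the report's code):

    import math
    import random

    def stochastic_remainder_selection(population, probs):
        """Stochastic remainder selection without replacement, eqs. (6.3)-(6.4)."""
        n = len(population)
        pool = []
        remainders = []
        for x, p in zip(population, probs):
            E = n * p                        # expected copies, E_i = N * P_i  (6.3)
            I = math.floor(E)
            pool += [x] * I                  # the certain copies
            remainders.append((x, E - I))    # fractional part, R_i = E_i - I_i  (6.4)
        while len(pool) < n:                 # random additions until the pool is full
            x, R = random.choice(remainders)
            if random.random() < R:
                pool.append(x)
        return pool

    print(stochastic_remainder_selection(["A", "B", "C", "D"],
                                         [0.233, 0.330, 0.189, 0.248]))

With the probabilities from Example 6.1, solution B (E = 1.32) is guaranteed one copy in the mating pool, while the remaining places are filled at random according to the fractional remainders.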
Crossover can be done in many ways. Two different ones have been described, and more can be found in [20], so no other methods will be covered here. A special greedy crossover technique should be mentioned though. It is a combination of a local improvement method and GA, and is especially useful when small changes (i.e. a few genes swapped) can greatly affect the fitness of a solution. It is a GA approach by all means, except that every time crossover has been performed, a Descent Strategy or a more advanced heuristic is used to find the local optimum. This optimum is then used as offspring.

Depending upon the defined crossover function, two parents produce either one or two offspring. You can then choose to keep either the child (or children) or the fittest one (or two) of them all, i.e. of both parents and offspring. By keeping the offspring every time, convergence will usually be slower, but the diversity of the population is sustained to a higher degree, which is important for some problems. The elitism scheme can be used too. Here the best parent is tested against the worst offspring and the better one stays, the second best parent is tested against the second worst offspring, etc. This continues until the chosen number of parents allowed to compete against their offspring, the elitism number, have been tested.

If one or more solutions begin to dominate the population within a short time, it is called premature convergence. For most problems this is unwanted, and some steps to prevent it can be taken. The usual one is mutation. Another possibility is called immigration: with a certain rate or probability, completely new possible solutions are generated and allowed to compete against the existing population, as when using the elitism scheme. But sometimes neither mutation nor immigration is enough. Then the niching technique might help. Niching is when you attempt to keep several genetically similar sub-populations within the total population in order to keep diversity. This can be done by designing the selection and crossover operations with care. In the selection phase, for instance, sharing can be used. When sharing, the algorithm calculates for each individual how many other individuals are similar to it (i.e. only a few genes different). The more individuals that are similar, the more the individual's fitness is derated. In this way no single solution can take over the entire population.

Chapter 7
Other issues

After this brief introduction to some of the metaheuristic methods of solution, this chapter covers some other important issues. First the choice of a topology and global convexity are described more deeply. Then comes a section on how to design, analyse, and report on experiments. Finally a section has been included with some suggestions for where to find references with more detailed information.

7.1 Making the right topology

By defining the neighbourhood or the reproduction function, i.e. the possible transitions from one solution to another, you also define a topology of the solution space. First, when you are to decide on a method to use for solving a specific problem, you should consider whether a neighbourhood can be easily defined or not. A clumsy neighbourhood structure will often lead to a time-consuming neighbourhood generating function, which can severely degrade the general performance of the algorithm. Therefore problems where no clear neighbourhood structure can be found might better be solved using, for instance, Branch and Bound or a problem specific heuristic algorithm, where the solution space does not need to be transformed.

The most important thing to remember when defining the possible transitions is that every part of the solution space must be reachable within a finite number of iterations from the current solution. In some cases a part of the solution space can be omitted, since it does not contain the optimal solution. But for the part included, the rule above should always hold.

Figure 7.1 shows four different shapes of topologies of a solution space. Of these, figure 7.1(a) is the topology most wanted: even the simple Descent Strategies will always succeed in finding the optimal solution. This topology is globally convex, but unfortunately it is almost impossible to find such nice topologies. The chance is that you might get something like figure 7.1(b) or 7.1(c) - or if you are unlucky - 7.1(d). With a relaxed definition of global convexity (see Hu et al. [3]), a, b, and c are all convex over their topology of the solution space. If such a property exists, the quality of metaheuristic solutions will most often be superior to that of a similar metaheuristic solving the problem using the topology of figure 7.1(d). A good topology will therefore use as much information about the problem as possible, in order to create this global convexity. A great deal of literature exists presenting topologies well suited for different problems, like the Traveling Salesman Problem, when using specific metaheuristics. In general a neighbour, or a change of a single gene, should represent a solution only slightly different from the current one.

[Figure 7.1: Topologies of different cost surfaces (objective value plotted over the topology of S): (a) nice global convexity, (b) semirough global convexity, (c) rough global convexity, (d) no global convexity.]

The topologies in (b) and (c) both have many local minima, preventing the simple Descent Strategies from working efficiently. Using multiple starting locations will help - especially, in this case, if the starting points are chosen close to the best solutions achieved until then.
For these multistart Descent Strategies, the roughness of the topology does not really matter. The other local search based heuristics, Simulated Annealing and Tabu Search, have to be tuned differently depending upon the roughness of the local minima: the greater the "depth" of the local minima, the slower the cooling of SA and the longer the tabu list of TS should be in order to get clear of these optima.

Concerning the final implementation, the computation speed of the neighbourhood generation function or the reproduction function is very important, since it, like the evaluation function, is called many times during a single run.

7.2 Designing and reporting on heuristic experiments

The best stopping criteria and the optimal values of parameters like the tabu length, the frequency penalty, T0, α, etc. should normally be found experimentally, since no universal rules for choosing those values exist. How to design, and later report on, such experiments is covered in this section.

Hooker [33] describes two different forms of experimentation: competitive testing and scientific testing. While the first is widely used, it is less suited for the purpose. In competitive testing you use benchmark tests to create the fastest implementation of your algorithm. While this helps tell you which implementation is the fastest, it does not tell you why. The latter is the main aspect of scientific testing. Here statistical analysis is used to test how the algorithm depends upon different parameters. The problems are generated according to the factor dependence to be investigated, and instead of raw computation time the measurement is the number of procedure calls, elementary operations, or other statistics predicted by the algorithm.

Barr et al. [32] discuss the experimentation further. They divide the making of an experiment into several phases:

- Define the goal of the experiment
- Decide how to measure the performance of the different factors
- Design and run the experiment
- Analyse the data and draw conclusions
- Report the results

The goal can be speed, quality, theoretical interest, etc. When choosing a performance measure, time and quality are the most common. But it is also interesting to see how quickly a good solution is found, how often the algorithm fails the quality demand, and the quality span between the best solution found and the mean of all solutions found.

For the design and analysis phases, knowledge of statistical methods is worth a lot. Montgomery [34] gives an extensive introduction to this topic. Statistical analysis is especially important when correlation between different factors is expected. It is important to keep the environment stable, i.e. the hardware and software platform, unless of course the goal of the experiment is to measure the dependence on environmental factors. The random number generator should also be tested, in order to avoid interference from poorly generated numbers. It is equally important to ensure reproducibility: when reporting the results it is essential to include enough documentation that the results can be reproduced later. While the quality of the solutions might not be the goal of every experiment, an important issue is the tradeoff between computation time and the quality of the solutions. A small harness for collecting such measurements in a reproducible way is sketched below.
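As a minimal sketch of such a setup in Python, assume a user-supplied `heuristic(problem, seed)` function that seeds its own random number generator and returns the best objective value found in one run; every name here is hypothetical and maximization is assumed.

    import statistics

    def run_experiment(heuristic, problem, replications=30):
        # Run a nondeterministic heuristic several times with fixed seeds,
        # so each individual result can be reproduced later.
        results = [heuristic(problem, seed) for seed in range(replications)]
        return {
            "best": max(results),                 # assumes maximization
            "worst": min(results),
            "mean": statistics.mean(results),
            "stdev": statistics.stdev(results),
            "raw": results,                       # e.g. for a boxplot
        }

Keeping the raw results also makes it easy to produce the boxplots discussed next.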
Presenting the results graphically will often be more inviting to the reader than tables. Below, two examples of graphical display are shown. The boxplot is especially suited for the nondeterministic heuristics: to get an idea of the results, several similar runs are performed, making it possible to estimate the expected mean value, the standard deviation, etc. In a boxplot the maximum and minimum values achieved by each method are marked as the ends of the vertical lines. The upper and lower edges of the box show the 75% and 25% percentiles respectively, while the line through it is the median. Optionally, observations statistically too far from the others, the outliers, can be marked as circles. Showing all of this, the boxplot makes it easy to compare different methods.

The right plot of figure 7.2 shows the tradeoff between quality and computation time. Again, several methods can easily be compared visually by using a different line style for each. In the plot shown, the improvement during a single run is displayed. Another possibility is to plot the end results of several runs with different computation times. Plotting the computation time against the problem size makes another interesting plot. Unexpected behaviour should be emphasized and, if possible, explained. Otherwise it should be presented as a problem worth further research.

[Figure 7.2: Examples of illustrations (from Hindsberger [30]). (a) Boxplot of objective values comparing five different algorithms (LSS, LSM, SA, TSS, and TSA); (b) objective value vs. iterations for a single run with T0 = 1000.00, α = 0.99, and Lk = 25.00, reaching a maximum objective value of 464.08 and an end value of 461.12.]

7.3 Pointers of further research

In the bibliography included in this report, selected references for further research can be found. They have been divided into four parts:

- Introductions to heuristics
- Metaheuristics:
  - general description
  - specific metaheuristics
- Applications
- Design, analysis, and reporting

The first section in the bibliography contains references about heuristics and the complexity of problems. Section 2 has two parts: one with general introductions to metaheuristics and one with references describing specific metaheuristics. The metaheuristics covered in the latter part are:

Continuous local search based:
- Simulated Annealing: [10], [11], [23], [24], and [28]
- Tabu Search: [15], [16], and [19]
- Threshold Algorithms: [13]
- Variable Neighbourhood Search: [25]

Multistart local search based:
- G.R.A.S.P.: [14]
- Jump Search: [27]

Reconstruction based:
- Genetic Algorithms: [8], [20], and [22]
- Heuristic Concentration: [26]
- Ant Colonies, Cooperative Search: [12]
- Scatter Search: [17] and [18]

In section 3, a few references about metaheuristics applied to real-world problems are given, while the last section has references on experimental design, analysis, and reporting.

Bibliography

Introduction to heuristics:

[1] Fisher, M. L. Worst-Case Analysis of Heuristic Algorithms. Management Science, 26, 1980.
[2] Garey, M. R. and Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
[3] Hu, T. C., Klee, V. and Larman, D. Optimization of Globally Convex Functions. SIAM Journal on Control and Optimization, 27, 1989.
[4] Silver, E. A., Vidal, R. V. V. and de Werra, D. A Tutorial on Heuristic Methods. European Journal of Operational Research, 5, 1980.

Metaheuristics: general description

[5] Hansen, M. P. Metaheuristics for Multiple Objective Combinatorial Optimization. Ph.D. dissertation, IMM, Technical University of Denmark, 1998.
[6] Pirlot, M. General Local Search Heuristics in Combinatorial Optimization. Belgian Journal of Operations Research, Statistics, and Computer Science, 32 (1,2), 1992.
[7] Reeves, C. R. (editor). Modern Heuristic Techniques for Combinatorial Problems. Wiley & Sons, 1993.

Specific algorithms

[8] Beasley, D., Bull, D. R. and Martin, R. R. An Overview of Genetic Algorithms: Part 1, Fundamentals. University Computing, 15, No. 2, 1993.
[9] Boese, K. D., Kahng, A. B. and Muddu, S. A New Adaptive Multi-start Technique for Combinatorial Global Optimizations. Operations Research Letters, 16, 1994.
[10] Černý, V. A Thermodynamical Approach to the Travelling Salesman Problem: An Efficient Simulation Algorithm. Journal of Optimization Theory and Applications, 45, No. 1, 1985.
[11] Collins, N. E. et al. Simulated Annealing: An Annotated Bibliography. American Journal of Mathematical and Management Sciences, 8, 1988.
[12] Dorigo, M., Maniezzo, V. and Colorni, A. The Ant System: Optimization by a Colony of Cooperating Agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26, No. 1, 1996.
[13] Dueck, G. and Scheuer, T. Threshold Accepting: A General Purpose Optimization Algorithm Appearing Superior to Simulated Annealing. Journal of Computational Physics, 90, 1990.
[14] Feo, T. A. and Resende, M. G. C. Greedy Randomized Adaptive Search Procedures. Journal of Global Optimization, 6, 1995.
[15] Glover, F. Future Paths for Integer Programming and Links to Artificial Intelligence. Computers and Operations Research, 13, No. 5, 1986.
[16] Glover, F., Taillard, E. and de Werra, D. A User's Guide to Tabu Search. Annals of Operations Research, 41, 1993.
[17] Glover, F. and Laguna, M. General Purpose Heuristics for Integer Programming, Part I. Journal of Heuristics, 2, No. 4, 1997.
[18] Glover, F. and Laguna, M. General Purpose Heuristics for Integer Programming, Part II. Journal of Heuristics, 3, No. 2, 1997.
[19] Glover, F. and Laguna, M. Tabu Search. Kluwer Academic Publishers, Boston, 1997.
[20] Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[21] Hansen, P. The Steepest Ascent Mildest Descent Heuristic for Combinatorial Programming. Congress on Numerical Methods in Combinatorial Optimization, Capri, Italy, 1986.
[22] Holland, J. H. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan, 1975.
[23] Kirkpatrick, S., Gelatt, C. D., Jr. and Vecchi, M. P. Optimization by Simulated Annealing. Science, 220, 1983.
[24] van Laarhoven, P. J. M. and Aarts, E. H. L. Simulated Annealing: Theory and Applications. Reidel Publishing Company, 1987.
[25] Mladenovic, N. and Hansen, P. Variable Neighbourhood Search. Computers and Operations Research, 24, No. 11, 1997.
[26] Rosing, K. E. and ReVelle, C. S. Heuristic Concentration: Two Stage Solution Construction. European Journal of Operational Research, 97, 1997.
[27] Tsubakitani, S. and Evans, J. R. An Empirical Study of a New Metaheuristic for the Traveling Salesman Problem. European Journal of Operational Research, 104, 1998.
[28] Vidal, R. V. V. (editor). Simulated Annealing Applied to Combinatorial Optimization. Control and Cybernetics, 25, 1996.

Applications:

[29] Borges, P. C. and Vidal, R. V. V. Fixed Channel Assignment in Cellular Mobile Telecommunication Systems. Technical Report 1996-25, IMM, Technical University of Denmark, 1996.
[30] Hindsberger, M. The Target-Radar Allocation Problem. Master Thesis 1998-9, IMM, Technical University of Denmark, 1998.
[31] Vidal, R. V. V. (editor). Applied Simulated Annealing. Lecture Notes in Economics and Mathematical Systems, 396, 1993.

Design, analysis, and reporting:

[32] Barr, R. S., Golden, B. L., Kelly, J. P., Resende, M. G. C. and Stewart, W. R. Designing and Reporting on Computational Experiments with Heuristic Methods. Journal of Heuristics, 1, No. 1, 1995.
[33] Hooker, J. Testing Heuristics: We Have It All Wrong. Journal of Heuristics, 1, No. 1, 1995.
[34] Montgomery, D. C. Design and Analysis of Experiments, fourth edition. Wiley & Sons, 1997.