
A Study of Some Implications of the No Free Lunch Theorem

2008


A Study of Some Implications of the No Free Lunch Theorem

Andrea Valsecchi and Leonardo Vanneschi
Department of Informatics, Systems and Communication (D.I.S.Co.)
University of Milano-Bicocca, Milan, Italy
a.valsecchi8@campus.unimib.it, vanneschi@disco.unimib.it

Abstract. We introduce the concept of “minimal” search algorithm for a set of functions to optimize. We investigate the structure of closed under permutation (c.u.p.) sets and we calculate the performance of an algorithm applied to them. We prove that sets of functions based on the distance to a given optimal solution (among which trap functions, onemax and the recently introduced onemix functions), as well as the NK-landscapes, are not c.u.p., and thus the thesis of the sharpened No Free Lunch Theorem does not hold for them. It therefore makes sense to look for a specific algorithm for those sets. Finally, we propose a method to build a “good” (although not necessarily minimal) search algorithm for a specific given set of problems. The algorithms produced with this technique show better average performance than a genetic algorithm executed on the same set of problems, which was expected given that those algorithms are problem-specific. Nevertheless, in general they cannot be applied to real-life problems, given their high computational complexity, which we have been able to estimate.

1 Introduction

The No Free Lunch (NFL) theorem states that all non-repeating search algorithms, if tested on all possible cost functions, have the same average performance [1]. As a consequence, one may informally say that looking for a search algorithm that outperforms all the others on all possible optimization problems is hopeless. But there are sets of functions for which the thesis of the NFL theorem does not hold, and thus talking about a “good” (or even the “best”) algorithm on those sets of functions makes sense.
In this paper, we investigate some of those sets of functions, we define for the first time the concept of minimal algorithm, and we investigate some of its properties. The sharpened NFL [2] states that the thesis of the NFL theorem holds for a set of functions F if and only if F is closed under permutation (c.u.p.). For these particular sets of functions, calculating the performance of an algorithm is relatively simple. For instance, for c.u.p. sets of functions with a constant number of globally optimal solutions, a method to estimate the average performance of an algorithm has been presented in [3]. In this paper, we try to generalize this result and to give for the first time an equation to calculate the average performance of algorithms for any c.u.p. set of functions. To prove that a set of cost functions is not c.u.p., it is possible to use some properties of c.u.p. sets. For instance, over the functions that belong to c.u.p. sets, it is not possible to define a non-trivial neighborhood structure of a specific type, as explained in [4]. Furthermore, a set of functions with sufficiently bounded description length is not c.u.p. [5]. In this paper, we prove that some particular sets of problems, which are typically used as benchmarks for experimental or theoretical optimization studies, are not c.u.p. In particular, we focus on problems for which the fitness (or cost) of the solutions is a function of the distance to a given optimum. These problems include, for instance, trap functions [6], onemax and the recently defined onemix [7] functions. We also consider the NK-landscapes. As a consequence of the fact that these sets of functions are not c.u.p., and thus the NFL does not hold for them, we could informally say that it makes sense to look for a “good” (or even the “best”) optimization algorithm for them.

[M. Giacobini et al. (Eds.): EvoWorkshops 2008, LNCS 4974, pp. 633–642, 2008. © Springer-Verlag Berlin Heidelberg 2008]
In this paper, we present a method to build a “good” (although not necessarily minimal) search algorithm for a given set of problems, and we apply it to some sets of trap functions and NK-landscapes. Those algorithms are experimentally compared with a standard Genetic Algorithm (GA) on the same sets of functions. This paper is structured as follows: in Section 2 we briefly recall the NFL theorem. In Section 3 we define the concept of minimal search algorithm and we prove some properties of its performance; furthermore, we give an equation to estimate the performance of an algorithm applied to any c.u.p. set of functions. In Section 4 we prove that each set of functions based on the distance to an optimal solution is not c.u.p.; we then prove the same property for NK-landscapes. In Section 5 we present a method to automatically generate an optimization algorithm specialized for a given set of functions. In Section 6, the performances of some of the algorithms generated by our method are compared with those of a GA. Finally, in Section 7 we offer our conclusions and discuss possible future research activities.

2 No Free Lunch Theorem

Let X, Y be two finite sets and let f : X → Y. We call a trace of length m over X and Y a sequence of pairs t = (x1, y1), …, (xm, ym) such that xi ∈ X and yi = f(xi), ∀i = 1, 2, …, m. If we interpret X as the space of all possible solutions of an optimization problem (search space), f(.) as the fitness (or cost) function and Y as the set of all possible fitness values, a trace t can be interpreted as a sequence of points visited by a search algorithm, along with their fitness values. We call a trace t simple if t[i] = t[j] ⇒ i = j, i.e. a trace in which each solution appears only once. Let T be the set of all possible traces over the sets X and Y. We call search operator a function g : T → X.
A search operator can be interpreted as a function that, given a trace representing all the solutions visited by a search algorithm up to the current instant (along with their fitness values), returns the solution that will be visited at the next step. We say that g is non-repeating¹ if ∀t ∈ T, g(t) ∉ t^X. We can observe that, if t ∈ T is simple and g is non-repeating, then t′ = t ∥ (g(t), f ∘ g(t)), where ∥ is the concatenation operator, is also a simple trace. A deterministic search algorithm A_g is an application A_g : ((X → Y) × T) → T with ∀t ∈ T, A_g(f, t) = t ∥ (g(t), f ∘ g(t)), where g is a search operator. We say that A_g is non-repeating if g is non-repeating. From now on, we use the notation A to indicate a search algorithm (i.e. omitting the search operator g used by A). Furthermore, except where differently specified, with the term “algorithm” we indicate a deterministic non-repeating search algorithm. We denote by A^m(f, t) the application of m iterations of algorithm A to trace t, defined as: A^0(f, t) = t, A^{m+1}(f, t) = A(f, A^m(f, t)). Finally, we define A^m(f) = A^m(f, ε), where ε is the empty trace.

¹ Given a trace t = t1, t2, …, tm, with ∀i = 1, 2, …, m : ti = (xi, yi), we use the notation t^X to indicate the set {x1, x2, …, xm} and the notation t^Y to indicate the set {y1, y2, …, ym}.

A set F ⊆ Y^X is called closed under permutation (c.u.p.) if ∀f ∈ F and for each permutation σ of X, (f ∘ σ) ∈ F. Then, we can enunciate:

Proposition 1 (Sharpened NFL [2]). Let A, B be two deterministic non-repeating search algorithms and let F ⊆ Y^X be c.u.p. Then ∀m ∈ {1, …, |X|}:

{(A^m(f))^Y : f ∈ F} = {(B^m(f))^Y : f ∈ F}

3 The Minimal Algorithm and Its Performance

Given an algorithm A and a function f : X → Y, we define the minimum number of fitness evaluations to reach an optimum as: φ_A(f) = min{m ∈ N | A^m(f) contains an optimal solution}.
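These definitions can be made concrete with a short sketch (illustrative Python; the `sweep` operator and the example fitness function are our own choices, not from the paper):

```python
# A trace is a list of (x, f(x)) pairs; a search operator maps a trace to the
# next point to visit. A deterministic non-repeating algorithm extends the
# trace by one point per step.

def run_algorithm(g, f, X, m):
    """Apply m steps of the algorithm A_g to f, starting from the empty trace."""
    trace = []
    for _ in range(m):
        x = g(trace, X)
        trace.append((x, f(x)))
    return trace

def phi(g, f, X, y_opt):
    """phi_A(f): fewest evaluations until the trace contains an optimal solution."""
    trace = []
    for m in range(1, len(X) + 1):
        x = g(trace, X)
        trace.append((x, f(x)))
        if f(x) == y_opt:
            return m
    return None  # unreachable for a non-repeating operator on a finite X

# A non-repeating operator: visit unvisited points in a fixed order.
def sweep(trace, X):
    visited = {x for x, _ in trace}
    return next(x for x in X if x not in visited)

X = list(range(8))
f = lambda x: -abs(x - 5)      # illustrative fitness, maximal value 0 at x = 5
print(phi(sweep, f, X, 0))     # -> 6 (points 0..5 are visited in order)
```

The sketch only fixes one concrete operator; any function g that never returns an already-visited point plays the role of a non-repeating search operator.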
Given F ⊆ Y^X, the average of this value over F is: φ̄_A(F) = (1/|F|) Σ_{f∈F} φ_A(f). If X, Y are finite, then the number of search algorithms is also finite (each algorithm produces a simple trace over X and Y, and the number of possible simple traces over X and Y is finite if X and Y are finite). Let {A1, A2, …, An} be the finite set of all possible algorithms. Then the set {φ̄_{A1}(F), φ̄_{A2}(F), …, φ̄_{An}(F)} is also finite and thus has a minimum. Let φ̄_{Amin}(F) be that minimum, i.e. φ̄_{Amin}(F) = min{φ̄_{A1}(F), φ̄_{A2}(F), …, φ̄_{An}(F)}. We call an algorithm like Amin a minimal algorithm for F [2]. From now on, we use the notation ∆(F) to indicate φ̄_{Amin}(F) for simplicity.

Proposition 2. If F is c.u.p. and all the functions in F have k optimal solutions, then: ∆(F) = (|X| + 1)/(k + 1).

The proof of this proposition can be found in [3]. Let F be c.u.p. and let ∼ be the relation defined as follows: ∀f, g ∈ F, f ∼ g ⇐⇒ ∃σ ∈ S_X : f = g ∘ σ, where S_X is the set of all permutations of X. We remark that, given a c.u.p. set F and any two functions f and g in F, f ∼ g does not necessarily hold. ∼ is an equivalence relation and [f]∼ can be written as {f ∘ σ | σ ∈ S_X}. Thus [f]∼ is c.u.p. Furthermore, all functions in [f]∼ have the same number of optimal solutions. According to the sharpened NFL, for each algorithm A, ∆(F) = φ̄_A(F) (all the algorithms have the same average performance over a c.u.p. set of functions), and thus ∆([f]) = φ̄_A([f]) also holds. Let R be a set of class representatives of F/∼; then, applying Proposition 2 to each class [r], the following property holds:

∆(F) = φ̄(F) = (1/|F|) Σ_{f∈F} φ(f)
             = (1/|F|) Σ_{r∈R} Σ_{f∈[r]} φ(f)
             = (1/|F|) Σ_{r∈R} |[r]| φ̄([r])
             = (1/|F|) Σ_{r∈R} |[r]| ∆([r])
             = (1/|F|) Σ_{r∈R} |[r]| (|X| + 1)/(op(r) + 1)
             = ((|X| + 1)/|F|) Σ_{r∈R} |[r]|/(op(r) + 1)        (1)

where op(r) is the number of optimal solutions of r. Considering the image of r as a multiset, we have Im(r) = {a1, …, a_{|Y|}} (i.e. a_i = |{x ∈ X : r(x) = i}|).
It follows that:

|[r]| = |X|! / (a1! ⋯ a_{|Y|}!)

i.e. the multinomial coefficient (|X| choose a1, …, a_{|Y|}). Equation (1) thus allows us to estimate ∆(F) for any c.u.p. set of functions.

4 Some Sets of Non-c.u.p. Functions

The objective of this section is to prove that some sets of functions that are often used as benchmarks in experimental or theoretical optimization studies are not c.u.p. As a consequence, for these sets of functions it makes sense to look for a “good” (or even for the minimal) algorithm. In particular, in this work we focus on the set of all the problems where fitness can be calculated as a function of the distance to a given optimal solution (like for instance trap functions [6], onemax and the recently defined onemix [7] functions) and on the NK-landscapes [8].

4.1 Functions of the Distance to the Optimum

Here we consider the set of functions of the distance to a unique known globally optimal solution. If we normalize all fitness values into the range [0, 1] and we consider 1 as the best possible fitness value and zero as the worst one, we can characterize these functions as follows. Let G = {g : {0, …, n} → [0, 1] | g(k) = 1 ⇐⇒ k = 0}. We say that f_{g,o} : {0, 1}^n → [0, 1] is a function of the distance if ∃o ∈ {0, 1}^n, ∃g ∈ G such that f_{g,o}(z) = g(d(z, o)), where d is a given distance, z is a (binary string) solution and o is the unique global optimum. In this work we focus on the Hamming distance. We call a function f_{g,o} trivial if g ∈ G is constant over {1, …, n}. In other words, we call trivial a “needle in a haystack” function. We want to prove that each set of functions of the Hamming distance containing at least one non-trivial function is not c.u.p.

Proposition 3. Let n ≥ 1 and let F = {f_{g,o} | f_{g,o}(z) = g(d(z, o)), z ∈ {0, 1}^n, o ∈ {0, 1}^n, g ∈ G}, where d is the Hamming distance. For each F′ ⊆ F, if a non-trivial function f_{g,o} ∈ F′ exists, then F′ is not c.u.p.

Proof. If n = 1, the thesis is trivially true.
The proof of the case n = 2 is not reported here to save space (the case of binary strings of length 2 is probably not very interesting). Let us consider n > 2 and let z = z1 ⋯ zn ∈ {0, 1}^n. We call flip(z, i) := z̄1, …, z̄i, z_{i+1}, …, zn and backflip(z, i) := z1, …, z_{n−i}, z̄_{n−i+1}, …, z̄n, i.e. the strings obtained from z by flipping its first i bits and its last i bits, respectively. Then: d(flip(z, i), z) = i and d(backflip(z, i), z) = i. Assume, by contradiction, that F′ is c.u.p. A non-trivial f_{g,o} ∈ F′ exists by hypothesis; thus ∃i, j with 0 < i < j ≤ n such that g(i) ≠ g(j). Let σ be a permutation of {0, 1}^n such that σ(o) = o, σ(flip(o, 1)) = flip(o, i) and σ(backflip(o, 1)) = flip(o, j). Since F′ is c.u.p., we can say that g′ ∈ G and o′ ∈ {0, 1}^n exist such that (f_{g,o} ∘ σ) = f_{g′,o′}. Then the following property holds: g′(d(o, o′)) = f_{g′,o′}(o) = f_{g,o}(σ(o)) = f_{g,o}(o) = g(d(o, o)) = g(0) = 1. By definition, it follows that d(o, o′) = 0, i.e. o′ = o. Then the following properties hold: g′(1) = g′(d(flip(o, 1), o)) = f_{g′,o′}(flip(o, 1)) = f_{g,o}(σ(flip(o, 1))) = f_{g,o}(flip(o, i)) = g(d(flip(o, i), o)) = g(i), and g′(1) = g′(d(backflip(o, 1), o)) = f_{g′,o′}(backflip(o, 1)) = f_{g,o}(σ(backflip(o, 1))) = f_{g,o}(flip(o, j)) = g(d(flip(o, j), o)) = g(j). From these properties, we can deduce that g(i) = g′(1) = g(j), which contradicts the hypothesis. □

4.2 NK-Landscapes

In this section, we prove that the set of all the NK-landscapes [8] with adjacent neighbourhood is not c.u.p.

Proposition 4. Let n ≥ 1 and let 0 ≤ k < n. Then the set of functions NK = {F_φ | φ : {0, 1}^{k+1} → {0, 1}}, where F_φ(x) = (1/n) Σ_{i=1}^n φ(x_i, …, x_{i+k}) (indices taken cyclically), is not c.u.p.

Proof. In order to prove that this set of functions is not c.u.p., we consider a function that belongs to this set and we show that one of its permutations does not belong to it. Let k < n − 1, φ = XOR and σ a permutation such that σ(0^n) = 10^{n−1}. We show that a g such that F_φ ∘ σ = F_g does not exist.
The following properties hold: (F_φ ∘ σ)(0^n) = F_φ(10^{n−1}) = (k + 1)/n, and F_g(0^n) = (1/n) Σ_{i=1}^n g(0^{k+1}) = g(0^{k+1}) ∈ {0, 1}. It follows that (k + 1)/n ∈ {0, 1}, which is absurd, since 0 < k + 1 < n. Let now k = n − 1, φ = XOR and σ such that σ(10^{n−1}) = 0^n, σ(0^n) = 10^{n−1} and σ(x) = x for the remaining elements. We have: (F_φ ∘ σ)(0^{n−1}1) = F_φ(0^{n−1}1) = (1/n)[φ(0^{n−1}1) + … + φ(10^{n−1})] = n/n = 1; thus: F_g(0^{n−1}1) = (1/n)[g(0^{n−1}1) + … + g(10^{n−1})] = 1; moreover: (F_φ ∘ σ)(10^{n−1}) = F_φ(0^n) = 0; but: F_g(10^{n−1}) = (1/n)[g(10^{n−1}) + … + g(0^{n−1}1)] = F_g(0^{n−1}1) = 1, which is a contradiction. □

5 Automatic Generation of an Optimization Algorithm

We now introduce a method to automatically generate a specific algorithm for a given set of functions F. Let us consider the following “game” with two players A and B: player A chooses a cost function f in F (without revealing the choice to player B) and player B must find an optimal solution of this function. At each step of the game, player B can choose a solution i in the search space X, and player A is forced to reveal the fitness of that solution, i.e. the value of f(i). The objective of player B is to find an optimal solution in the smallest possible number of steps. We assume that player B knows all the values of all the possible solutions for all the functions in F (for instance, we could imagine that they have been stored in a huge database). Here we propose a strategy that player B may use to identify the function f chosen by player A. Once player B has identified this function, since he knows all the cost values of all the possible solutions, he can immediately exhibit the optimal solution at the subsequent step. According to this strategy, in order to identify the function f chosen by player A, at each step of the game player B should perform the action that allows him to “eliminate” the largest possible number of functions among the candidates for the role of f.
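This greedy criterion — query the point whose answer is expected to eliminate the most candidates — can be sketched as follows (illustrative Python, not the authors' code; the candidate functions are those of Example 1 below, encoded as tuples of values):

```python
from collections import Counter

def expected_survivors(point, candidates):
    """Expected size of the candidate set after querying `point`, when the
    hidden function is uniformly distributed over `candidates`:
    (sum_i s_i^2) / |Fc|, where s_i counts candidates whose value at `point`
    is i (the rule s_1 | ... | s_m)."""
    counts = Counter(f[point] for f in candidates)
    return sum(s * s for s in counts.values()) / len(candidates)

def best_query(points, candidates):
    """Point whose answer eliminates the largest expected number of candidates."""
    return min(points, key=lambda p: expected_survivors(p, candidates))

# Functions represented as value tuples indexed by the solutions 0..4.
F = [(1, 2, 3, 4, 0), (3, 1, 2, 0, 0), (4, 1, 1, 2, 3), (0, 0, 0, 1, 2),
     (0, 0, 0, 1, 1), (0, 0, 0, 0, 1), (4, 1, 0, 2, 3), (0, 1, 0, 0, 0)]
print(best_query(range(5), F))  # -> 3 (tied with 4; min keeps the first)
```

Since |Fc| is the same for every candidate query point, minimizing this expectation is the same as minimizing the sum of squared counts, which is exactly the rule cost used in the text.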
We call Fc this set of candidate functions. The strategy initializes Fc := F. Then, for each point k of X, we create a “rule” s1 | … | sm, where si is the number of functions f in Fc such that f(k) = i. The most suitable rule is the one that minimizes the average of |Fc| after its application, i.e. the one that minimizes (Σ_{i=1}^m s_i²)/|Fc|, or more simply Σ_{i=1}^m s_i², given that |Fc| is the same for all the rules before their application. Thus, at each step we choose the point b with the most suitable rule, we ask player A the value of f(b), and we eliminate from the set Fc all the functions f′ with f′(b) ≠ f(b).

Example 1. Let X = Y = {0, 1, 2, 3, 4} and let us consider the set of functions F = {f1, …, f8} with: f1 = (1, 2, 3, 4, 0), f2 = (3, 1, 2, 0, 0), f3 = (4, 1, 1, 2, 3), f4 = (0, 0, 0, 1, 2), f5 = (0, 0, 0, 1, 1), f6 = (0, 0, 0, 0, 1), f7 = (4, 1, 0, 2, 3), f8 = (0, 1, 0, 0, 0); where ∀i = 1, 2, …, 8, fi has been defined with the notation fi = (y0, y1, y2, y3, y4), where ∀j = 0, …, 4, fi(j) = yj. Let f1, f2, …, f8 be functions to be maximized. Let us suppose that the target function (the one chosen by player A) is f6. Our method begins by initializing the set of candidate functions: Fc := F. Then a loop over all the solutions begins. Let us first consider solution x = 0. We have to generate a rule s1 | … | sm, where si is the number of functions f in Fc such that f(0) = i. The number of functions fi in F for which fi(0) = 0 is equal to 4, thus s0 = 4; the number of functions for which fi(0) = 1 is equal to 1, thus s1 = 1. Iterating this process, we obtain a rule for solution x = 0 equal to 4|1|1|2. Now we have to calculate a measure of the “cost” of this rule. We define it as Σ_{i=1}^m s_i², and thus it is equal to 22. We now repeat the same process for all the solutions in the search space and we obtain the data reported in Table 1.

Table 1. The rules produced for all the possible solutions at the first iteration in the example introduced in the text.

  solution   rule      cost
  0          4|1|1|2   22
  1          3|4|1     26
  2          5|1|1|1   28
  3          3|2|2|1   18
  4          3|2|1|2   18

The two rules that minimize the cost are the ones associated with solutions 3 and 4. Thus player B asks player A the value of the chosen function for the first of those solutions, i.e. solution 3. The answer of player A is f(3) = 0. Now, player B can modify the set of candidate functions Fc, by eliminating those functions fi such that fi(3) ≠ 0. Thus: Fc := {f2, f6, f8}. If we iterate this process, we can see that at the second iteration one of the solutions associated with the rule with the minimum cost is 0, and at the third iteration it is 1. After that, Fc = {f6} and the algorithm terminates, since player B has identified the function chosen by player A. All that remains is to return the solution that maximizes function f6, i.e. x = 4.

To automatically produce a search algorithm inspired by this strategy, it is sufficient to store the choices that have been made in a decision tree. Such a tree has its nodes labelled with solutions x ∈ X, and the arcs departing from a node labelled with x are labelled with all possible values of fi(x) for fi ∈ Fc. The child of a node labelled with x, linked to x by an arc labelled with fi(x), is the solution related to the rule with minimum cost after all functions fj with fj(x) ≠ fi(x) have been eliminated from Fc.

Fig. 1. The decision tree generated by our algorithm for the example introduced in the text. The dashed line refers to the last step of the algorithm, where the function has been identified and thus the optimal solution can be returned.

Figure 1 represents the decision tree generated by our method for the previous example. The first solution that we have considered (the one that minimizes the cost in Table 1) was 3.
Then we have considered solutions 0 and 1, and we have been able to return solution 4. We remark that the leaves of this tree are labelled with points that are optimal solutions for at least one function in F. We call “RULE” this method to automatically generate algorithms² for a given set of functions F. The following property holds:

Proposition 5. RULE has a computational complexity of Θ(|X||F|⁴).

Proof. The algorithm is executed on each solution in X and for each function in F. Thus, the generation of the sequence of rules has a complexity of Θ(|X||F|). At each application of the rules, at least one function is eliminated from the set of candidates; thus one iteration has a complexity of |X||F| Σ_{i=0}^{|F|} (|F| − i) = |X||F| · |F|(|F| + 1)/2. Thus the total complexity is Θ(|X||F|⁴). □

Proposition 6. The decision tree produced by RULE has a number of nodes of O(|X||F|).

Proof. For each f ∈ F, the search for an optimal solution consists in examining a sequence of points of X with no repetitions. □

It is possible to show that RULE does not necessarily generate the minimal algorithm for a given set of functions. For instance, one may imagine a set of functions that all share the same set of optimal solutions Opt = {o1, o2, …, oq}; for this set of functions, a minimal algorithm is clearly one that outputs a solution oi ∈ Opt at the first step, and not the algorithm generated by RULE. For this reason, we informally say that RULE generates “good”, although not necessarily minimal, algorithms. Furthermore, from Proposition 5, we can deduce that RULE is clearly too complex to be used for real-life problems. Nevertheless, in the next section we show how RULE (and the algorithms generated by RULE) can be used for interesting (although rather “small”) instances of trap functions and NK-landscapes.
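Leaving the decision tree aside, the identification loop of Example 1 can be written as a minimal sketch (illustrative Python, not the authors' implementation); it reproduces the query sequence 3, 0, 1 described above:

```python
from collections import Counter

def rule_cost(point, candidates):
    """Cost of the rule for `point`: the sum of squared value-counts s_i."""
    counts = Counter(f[point] for f in candidates)
    return sum(s * s for s in counts.values())

def identify(target, F, points):
    """Greedily query points until one candidate remains; return it together
    with the number of queries spent. Assumes `target` is in F."""
    Fc, asked, queries = list(F), set(), 0
    while len(Fc) > 1:
        p = min((q for q in points if q not in asked),
                key=lambda q: rule_cost(q, Fc))
        asked.add(p)
        queries += 1
        y = target[p]                      # player A reveals f(p)
        Fc = [f for f in Fc if f[p] == y]  # drop every f with f(p) != y
    return Fc[0], queries

# The eight functions of Example 1, as value tuples over the solutions 0..4.
F = [(1, 2, 3, 4, 0), (3, 1, 2, 0, 0), (4, 1, 1, 2, 3), (0, 0, 0, 1, 2),
     (0, 0, 0, 1, 1), (0, 0, 0, 0, 1), (4, 1, 0, 2, 3), (0, 1, 0, 0, 0)]
f6 = F[5]
found, queries = identify(f6, F, range(5))
assert found == f6
print(queries, max(range(5), key=lambda x: found[x]))  # -> 3 4
```

After three queries the hidden function is identified and its maximizer, x = 4, can be returned, matching the four evaluations (three queries plus the returned optimum) of the worked example.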
² Strictly speaking, RULE does not generate “search algorithms”, since they are specialized for a particular set of functions and domain; nevertheless, we keep calling the output of RULE an “algorithm” for simplicity.

6 Experimental Results

Propositions 5 and 6 show that, given a particular set of functions F defined over a domain X, executing RULE to generate an algorithm A and then running A on all the functions in F has a larger computational cost than exhaustively examining all the possible solutions for each function in F (whose cost is clearly O(|X||F|)). For this reason, it must be clear that the goal of this study is not to produce a technique to efficiently solve particular sets of functions in practice. Nevertheless, we think that it might be interesting to quantify the theoretical performance improvement of a problem-specific algorithm compared to a “general” one, like for instance a GA. For this reason, in this section the algorithms produced by RULE are compared with a standard GA. The performance measure used for this comparison, consistently with our definition of minimal algorithm for a set of functions F (see Section 3), will be φ̄(F). Since GAs are repeating, we count their fitness evaluations without repetitions. For the GA, we have used the following set of parameters: population size of 100 potential solutions, standard single-point crossover [9,10] with rate equal to 0.9, standard point mutation [9,10] with rate equal to 0.01, tournament selection with tournament size equal to 2, elitism (i.e., the best individual is copied unchanged into the next population), and a maximum number of generations equal to 200. The sets of functions that we use in our experiments are partitioned into two groups, each one composed of three sets of functions. The first group contains three sets of trap functions.
Trap functions [6] are a particular set of functions of the distance (as defined in Section 4.1) that depend on the values of two constants: B (the width of the attractive basin for each optimum) and R (their relative importance). The three sets of trap functions used for our experiments are respectively composed of 100, 250 and 500 “randomly chosen” trap functions. By “randomly chosen” trap function we mean a trap function where the constants B and R and the (unique) optimal solution have been chosen randomly with uniformly distributed probability over their domains (the range [0, 1] for B and R, the search space for the optimal solution). The second group of functions that we have used contains three sets of NK-landscape functions. NK-landscape functions [8] are completely defined by the value of two constants (N and K) and one “kernel” function φ : [0, 1]^{K+1} → [0, 1]. The sets of functions we have used are respectively composed of 100, 250 and 500 “randomly generated” NK-landscapes, i.e. NK-landscapes where K and φ have been generated uniformly at random. For all these functions, the search space X that we have chosen is composed of binary strings of 8 bits (thus N = 8 for NK-landscapes). Table 2 shows the results obtained by the GA. The first column represents the set of functions F on which the experiments have been done (for instance, “Trap p” means a set of p “randomly chosen” trap functions). The second column reports the average number of evaluations with no repetitions that have been spent by the GA to find an optimal solution, with the standard deviations; more precisely, for each f ∈ F we have executed 100 independent GA runs, and only for those runs where the optimal solution has been found (before generation 200) we have calculated the number of evaluations without repetitions that have been performed before finding the optimum. Then, we have averaged all those numbers over the 100 independent runs.
The result that we report is the average of all those averages over all functions in F. The third column reports the average number (calculated as above) of evaluations (also counting repetitions) that have been spent by the GA to find an optimal solution, with the standard deviations. Finally, the fourth column reports the success rate, i.e. the number of runs where an optimal solution has been found divided by the total number of runs that we have performed (100 in our experiments), averaged over all functions in F.

Table 2. Results returned by the GA. Each line reports the results for a different set of functions.

  F          φ̄_GA(F)             Avg Total FE           SR
  Trap 500   145.18 (σ = 13.6)   3633.41 (σ = 7232)     0.82
  Trap 250   145.52 (σ = 13.7)   3607.35 (σ = 7200.7)   0.83
  Trap 100   145.64 (σ = 13.4)   4128.53 (σ = 7641.6)   0.85
  NK 500     141.61 (σ = 12.5)   804.15 (σ = 3024.8)    0.98
  NK 250     142.05 (σ = 12.9)   886.54 (σ = 3267.2)    0.97
  NK 100     141.86 (σ = 12.5)   754.18 (σ = 2867.6)    0.98

Table 3 reports the results of the algorithms generated by RULE on the same sets of problems. The first column identifies the set of functions F on which the experiments have been done; the second column reports the average (calculated over all functions in F) number of evaluations spent to find an optimal solution, with the standard deviations. An optimal solution has always been found in each of these executions (thus we do not report success rates).

Table 3. Results returned by the algorithms generated by RULE. Each line reports the results for a different set of functions.

  F          φ̄_rule(F)
  Trap 500   2.99 (σ = 0.22)
  Trap 250   2.95 (σ = 0.21)
  Trap 100   2.75 (σ = 0.43)
  NK 500     4.57 (σ = 0.62)
  NK 250     4.22 (σ = 0.52)
  NK 100     3.81 (σ = 0.49)

Comparing the results in Tables 2 and 3, we can clearly see that the algorithms generated by RULE have a remarkably better performance than the GA. This was expected, since these algorithms are problem-specific, i.e.
they have been generated to solve those particular problems.

7 Conclusions and Future Work

We have defined the concept of minimal search algorithm for a given set of problems. We have also introduced an equation to calculate the average performance of an algorithm over a closed under permutation (c.u.p.) set of functions. Furthermore, we have proven that some particular sets of functions are not c.u.p. In particular, we focused on any set of functions of the distance to a given optimal solution (this set contains some well known benchmarks, like trap functions, onemax and onemix) and on NK-landscapes. Not being c.u.p., for those sets the No Free Lunch theorem does not hold, and thus it makes sense to look for a minimal algorithm. Inspired by this, we have presented a method to build a specific (not necessarily minimal) search algorithm for a given set of functions to optimize. We have experimentally shown that the algorithms generated by such a method remarkably outperform a standard Genetic Algorithm on some “small” instances of trap functions and NK-landscapes. This was expected, given that the generated algorithms are problem-specific. Our method cannot be applied to real-life applications, given its complexity, which we have estimated as a function of the size of the search space and of the cardinality of the considered set of functions. In the future, we plan to prove other interesting properties of the minimal algorithm, to determine whether other interesting sets of functions are c.u.p. or not, and to improve the RULE algorithm, possibly employing some concepts of Rough Sets.

References

1. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1(1), 67–82 (1997)
2. Schumacher, C., Vose, M.D., Whitley, L.D.: The no free lunch and problem description length.
In: Spector, L., Goodman, E.D., Wu, A., Langdon, W.B., Voigt, H.-M., Gen, M., Sen, S., Dorigo, M., Pezeshk, S., Garzon, M.H., Burke, E. (eds.) Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 565–570. Morgan Kaufmann, San Francisco (2001)
3. Igel, C., Toussaint, M.: Recent results on no-free-lunch theorems for optimization. CoRR: Neural and Evolutionary Computing cs.NE/0303032 (2003)
4. Igel, C., Toussaint, M.: On classes of functions for which no free lunch results hold. Inf. Process. Lett. 86(6), 317–321 (2003)
5. Streeter, M.J.: Two broad classes of functions for which a no free lunch result does not hold. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 1418–1430. Springer, Heidelberg (2003)
6. Deb, K., Goldberg, D.E.: Analyzing deception in trap functions. In: Whitley, D. (ed.) Foundations of Genetic Algorithms, vol. 2, pp. 93–108. Morgan Kaufmann, San Francisco (1993)
7. Poli, R., Vanneschi, L.: Fitness-proportional negative slope coefficient as a hardness measure for genetic algorithms. In: Thierens, D., et al. (eds.) Genetic and Evolutionary Computation Conference, GECCO 2007, pp. 1335–1342. ACM Press, New York (2007)
8. Altenberg, L.: NK fitness landscapes. In: Bäck, T., et al. (eds.) Handbook of Evolutionary Computation, Section B2.7.2, pp. B2.7:5–B2.7:10. IOP Publishing Ltd and Oxford University Press (1997)
9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading (1989)
10. Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan (1975)