A Study of Some Implications
of the No Free Lunch Theorem
Andrea Valsecchi and Leonardo Vanneschi
Department of Informatics, Systems and Communication (D.I.S.Co.)
University of Milano-Bicocca, Milan, Italy
a.valsecchi8@campus.unimib.it
vanneschi@disco.unimib.it
Abstract. We introduce the concept of “minimal” search algorithm for a set of
functions to optimize. We investigate the structure of closed under permutation
(c.u.p.) sets and we calculate the performance of an algorithm applied to them.
We prove that sets of functions based on the distance to a given optimal solution, among which trap functions, onemax and the recently introduced onemix functions, as well as NK-landscapes, are not c.u.p., and thus the thesis of the sharpened No Free Lunch Theorem does not hold for them. It therefore makes sense to look for a specific algorithm for those sets. Finally, we propose a method to build
a “good” (although not necessarily minimal) search algorithm for a specific given
set of problems. The algorithms produced with this technique show better average performance than a genetic algorithm executed on the same set of problems,
which was expected given that those algorithms are problem-specific. Nevertheless, in general they cannot be applied to real-life problems, given their high computational complexity, which we have been able to estimate.
1 Introduction
The No Free Lunch (NFL) theorem states that all non-repeating search algorithms, if
tested on all possible cost functions, have the same average performance [1]. As a consequence, one may informally say that looking for a search algorithm that outperforms
all the others on all the possible optimization problems is hopeless. But there are sets
of functions for which the thesis of the NFL theorem does not hold, and thus talking
about a “good” (or even the “best”) algorithm on those sets of functions makes sense.
In this paper, we investigate some of those sets of functions, we define for the first time
the concept of minimal algorithm and we investigate some of its properties.
The sharpened-NFL [2] states that the thesis of the NFL theorem holds for a set of
functions F if and only if F is closed under permutation (c.u.p.). For these particular
sets of functions calculating the performance of an algorithm is relatively simple. For
instance, for c.u.p. sets of functions with a constant number of globally optimal solutions, a method to estimate the average performance of an algorithm has been presented
in [3]. In this paper, we try to generalize this result and to give for the first time an equation to calculate the average performance of algorithms for any c.u.p. set of functions.
To prove that a set of cost functions is not c.u.p., it is possible to use some properties of c.u.p. sets. For instance, over the functions that belong to c.u.p. sets, it is not
M. Giacobini et al. (Eds.): EvoWorkshops 2008, LNCS 4974, pp. 633–642, 2008.
© Springer-Verlag Berlin Heidelberg 2008
possible to define a non-trivial neighborhood structure of a specific type, as explained
in [4]. Furthermore, a set of functions with description length sufficiently bounded is
not c.u.p. [5]. In this paper, we prove that some particular sets of problems, which are
typically used as benchmarks for experimental or theoretical optimization studies, are
not c.u.p. In particular, we focus on problems for which the fitness (or cost) of the solutions is a function of the distance to a given optimum. These problems include, for
instance, trap functions [6], onemax and the recently defined onemix [7] functions. We
also consider the NK-landscapes. As a consequence of the fact that these sets of functions are not c.u.p., and thus the NFL does not hold for them, we could informally say
that it makes sense to look for a “good” (or even the “best”) optimization algorithm for
them. In this paper, we present a method to build a “good” (although not necessarily
minimal) search algorithm for a given set of problems and we apply it to some sets of
trap functions and NK-landscapes. Those algorithms are experimentally compared with
a standard Genetic Algorithm (GA) on the same sets of functions.
This paper is structured as follows: in Section 2 we briefly recall the NFL theorem. In
Section 3 we define the concept of minimal search algorithm and we prove some properties of its performance; furthermore, we give an equation to estimate the performance
of an algorithm applied to any set of c.u.p. functions. In section 4 we prove that each
set of functions based on the distance to an optimal solution is not c.u.p.; successively,
we prove the same property for NK-landscapes. In section 5 we present a method to
automatically generate an optimization algorithm specialized for a given set of functions. In section 6, the performance of some of the algorithms generated by our method
are compared with the ones of a GA. Finally, in section 7 we offer our conclusions and
discuss possible future research activities.
2 No Free Lunch Theorem
Let X, Y be two finite sets and let f : X → Y. We call trace of length m over X and Y a sequence of pairs t = (x_1, y_1), ..., (x_m, y_m) such that x_i ∈ X and y_i = f(x_i), ∀i = 1, 2, ..., m. If we interpret X as the space of all possible solutions of an optimization
problem (search space), f (.) as the fitness (or cost) function and Y as the set of all
possible fitness values, a trace t can be interpreted as a sequence of points visited by a
search algorithm along with their fitness values.
We call simple a trace t such that: t[i] = t[ j] ⇒ i = j, i.e. a trace in which each
solution appears only once. Let T be the set of all possible traces over the sets X and Y .
We call search operator a function g : T → X . A search operator can be interpreted as
a function that given a trace representing all the solutions visited by a search algorithm
until the current instant (along with their fitness values) returns the solution that will
be visited at the next step. We say that g is non-repeating¹ if ∀t ∈ T, g(t) ∉ t^X. We can observe that, if t ∈ T is simple and g is non-repeating, then t′ = t ⊕ (g(t), f ◦ g(t)), where ⊕ is the concatenation operator, is also a simple trace.
A deterministic search algorithm A_g is a mapping A_g : ((X → Y) × T) → T with ∀t ∈ T, A_g(f, t) = t ⊕ (g(t), f ◦ g(t)), where g is a search operator. We say that A_g is
¹ Given a trace t = t_1, t_2, ..., t_m, with t_i = (x_i, y_i) ∀i = 1, 2, ..., m, we use the notation t^X to indicate the set {x_1, x_2, ..., x_m} and the notation t^Y to indicate the set {y_1, y_2, ..., y_m}.
non-repeating if g is non-repeating. From now on, we use the notation A to indicate
a search algorithm (i.e. omitting the search operator g used by A). Furthermore, unless otherwise specified, with the term “algorithm” we indicate a deterministic non-repeating search algorithm. We indicate by A^m(f, t) the application of m iterations of algorithm A to trace t, defined as: A^0(f, t) = t and A^{m+1}(f, t) = A(f, A^m(f, t)). Finally, we define A^m(f) = A^m(f, ε), where ε is the empty trace.
A set F ⊆ Y^X is called closed under permutation (c.u.p.) if ∀f ∈ F and for each permutation σ of X, (f ◦ σ) ∈ F. Then, we can state:

Proposition 1 (Sharpened NFL [2]). Let A, B be two deterministic non-repeating search algorithms and let F ⊆ Y^X be c.u.p. Then ∀m ∈ {1, ..., |X|}: {(A^m(f))^Y : f ∈ F} = {(B^m(f))^Y : f ∈ F}.
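The statement of Proposition 1 can be checked exhaustively on a tiny example. The following sketch (Python; our illustration, not part of the paper) models two deterministic non-repeating algorithms, one with a fixed visit order and one that adapts to the first observed fitness value, and verifies that they produce the same multiset of Y-traces over the smallest non-trivial c.u.p. set, namely all functions from a three-point search space to {0, 1}:

```python
from itertools import product
from collections import Counter

X = (0, 1, 2)        # search space
Y = (0, 1)           # fitness values
# F = all functions from X to Y: trivially c.u.p.
F = [dict(zip(X, values)) for values in product(Y, repeat=len(X))]

def scan(f, m):
    """Algorithm A: visit 0, 1, 2 in fixed order; return the Y-part of the trace."""
    return tuple(f[x] for x in (0, 1, 2)[:m])

def adaptive(f, m):
    """Algorithm B: visit 0 first, then choose the remaining order based on f(0)."""
    order = (0, 1, 2) if f[0] == 1 else (0, 2, 1)
    return tuple(f[x] for x in order[:m])

# Sharpened NFL: the same multiset of Y-traces, for every trace length m.
for m in range(1, len(X) + 1):
    assert Counter(scan(f, m) for f in F) == Counter(adaptive(f, m) for f in F)
```

Any other deterministic non-repeating pair of algorithms would pass the same check on this set, which is exactly the content of the proposition.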
3 The Minimal Algorithm and Its Performance
Given an algorithm A and a function f : X → Y, we define the minimum number of fitness evaluations needed to reach an optimum as: φ_A(f) = min{m ∈ N | A^m(f) contains an optimal solution}. Given F ⊆ Y^X, the average of this value over F is: φ̄_A(F) = (1/|F|) ∑_{f∈F} φ_A(f).
If X , Y are finite, then also the number of search algorithms is finite (each algorithm
produces a simple trace over X and Y and the number of possible simple traces over X
and Y is finite if X and Y are finite). Let {A1 , A2 , ..., An } be the finite set of all possible
algorithms. Then the set {φ̄A1 (F), φ̄A2 (F), ..., φ̄An (F)} is also finite and thus has a minimum. Let φ̄Amin (F) be that minimum, i.e. φ̄Amin (F) = min{φ̄A1 (F), φ̄A2 (F), ..., φ̄An (F)}.
We call an algorithm like Amin a minimal algorithm for F [2]. From now on, we use the
notation ∆(F) to indicate φ̄Amin (F) for simplicity.
Proposition 2. If F is c.u.p. and all the functions in F have k optimal solutions, then:
∆(F) = (|X | + 1)/(k + 1).
The proof of this proposition can be found in [3].
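Proposition 2 lends itself to a direct numerical check. The sketch below (Python; our illustration, with an arbitrarily chosen base function) builds the permutation closure of a function with k = 2 optimal solutions and verifies that the average number of evaluations of a fixed left-to-right scan, which by the sharpened NFL represents every non-repeating algorithm on a c.u.p. set, equals (|X| + 1)/(k + 1):

```python
from fractions import Fraction
from itertools import permutations

X = range(5)
base = (3, 1, 3, 0, 2)      # a function with k = 2 optimal solutions (value 3)
best = max(base)

# Closure under permutation: every f ∘ σ for every permutation σ of X.
F = {tuple(base[s] for s in sigma) for sigma in permutations(X)}

def phi(f):
    """Evaluations needed by a fixed left-to-right scan (a valid deterministic
    non-repeating algorithm) to first hit an optimum of f."""
    return next(i + 1 for i, y in enumerate(f) if y == best)

avg = Fraction(sum(phi(f) for f in F), len(F))
assert avg == Fraction(len(X) + 1, 2 + 1)    # ∆(F) = (|X|+1)/(k+1) = 2
```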
Let F be c.u.p. and let ∼ be the relation defined as follows: ∀f, g ∈ F, f ∼ g ⟺ ∃σ ∈ S_X : f = g ◦ σ, where S_X is the set of all permutations of X. We remark that, given a c.u.p. set F and any two functions f and g in F, f ∼ g does not necessarily hold. ∼ is an equivalence relation and [f]_∼ can be written as {f ◦ σ | σ ∈ S_X}.
Thus [ f ]∼ is c.u.p. Furthermore, all functions in [ f ]∼ have the same number of optimal
solutions. According to the sharpened NFL, for each algorithm A, ∆(F) = φ̄A (F) (all
the algorithms have the same average performance over a c.u.p. set of functions) and
thus also ∆([ f ]) = φ̄A ([ f ]) holds. Let R be a set of class representatives of F/∼, then the
following property holds:
∆(F) = φ̄(F) = (1/|F|) ∑_{f∈F} φ(f) = (1/|F|) ∑_{r∈R} ∑_{f∈[r]} φ(f)
     = (1/|F|) ∑_{r∈R} |[r]| φ̄([r]) = (1/|F|) ∑_{r∈R} |[r]| ∆([r])
     = (1/|F|) ∑_{r∈R} |[r]| (|X|+1)/(op(r)+1)
     = ((|X|+1)/|F|) ∑_{r∈R} |[r]|/(op(r)+1)          (1)
where op(r) is the number of optimal solutions of r. Considering the image of r as a multiset, we have Im(r) = {a_1, ..., a_{|Y|}} (i.e. a_i = |{x ∈ X : r(x) = y_i}|, where Y = {y_1, ..., y_{|Y|}}). It follows that |[r]| is the multinomial coefficient:

|[r]| = |X|! / (a_1! ··· a_{|Y|}!)
Equation (1) allows us to estimate ∆(F) for any set of c.u.p. functions.
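Equation (1) can be checked mechanically. The sketch below (Python; our illustration, with two arbitrarily chosen class representatives) computes ∆(F) from the representatives and the multinomial formula for |[r]|, and compares it with the direct average of a fixed scan algorithm over the whole c.u.p. set:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial
from collections import Counter

X = range(4)
# Two class representatives with 1 and 2 optimal solutions respectively.
reps = [(2, 0, 1, 1), (2, 2, 0, 1)]

def orbit(r):
    return {tuple(r[s] for s in sigma) for sigma in permutations(X)}

F = set().union(*map(orbit, reps))      # a c.u.p. set: union of the two orbits

def class_size(r):
    """|[r]| = |X|! / (a_1! ... a_|Y|!), with a_i the multiplicities of Im(r)."""
    denom = 1
    for a in Counter(r).values():
        denom *= factorial(a)
    return factorial(len(X)) // denom

def op(r):
    return r.count(max(r))              # number of optimal solutions of r

# Right-hand side of equation (1).
delta = Fraction(len(X) + 1, len(F)) * sum(
    Fraction(class_size(r), op(r) + 1) for r in reps)

# Direct computation with a fixed scan algorithm; by the sharpened NFL, any
# non-repeating algorithm gives the same average over a c.u.p. set.
def phi(f):
    return next(i + 1 for i, y in enumerate(f) if y == max(f))

assert delta == Fraction(sum(phi(f) for f in F), len(F))
```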
4 Some Sets of Non-c.u.p. Functions
The objective of this section is to prove that some sets of functions that are often used as
benchmarks in experimental or theoretical optimization studies are not c.u.p. As a consequence, for these sets of functions it makes sense to look for a “good” (or even for the
minimal) algorithm. In particular, in this work we focus on the set of all the problems
where fitness can be calculated as a function of the distance to a given optimal solution (like for instance trap functions [6], onemax and the recently defined onemix [7]
functions) and the NK-landscapes [8].
4.1 Functions of the Distance to the Optimum
Here we consider the set of functions of the distance to a unique known global optimum
solution. If we normalize all fitness values into the range [0, 1] and we consider 1 as
the best possible fitness value and zero as the worst one, we can characterize these
functions as follows: Let G = {g : {0, . . . , n} → [0, 1] | g(k) = 1 ⇐⇒ k = 0}. We say
that fo,g : {0, 1}n → [0, 1] is a function of the distance if ∃o ∈ {0, 1}n, ∃g ∈ G such that
fo,g (z) = g(d(z, o)), where d is a given distance, z is a (binary string) solution and o is
the unique global optimum. In this work we focus on the Hamming distance. We call trivial a function f_{o,g} such that g ∈ G is constant over {1, ..., n}. In other words, we call trivial
a “needle in a haystack” function. We want to prove that each set of functions of the
Hamming distance containing at least one non-trivial function is not c.u.p.
Proposition 3. Let n ≥ 1 and let F = {f_{g,o} | f_{g,o}(z) = g(d(z, o)), z ∈ {0,1}^n, o ∈ {0,1}^n, g ∈ G}, where d is the Hamming distance. For each F′ ⊆ F, if a non-trivial function f_{g,o} ∈ F′ exists, then F′ is not c.u.p.
Proof. If n = 1, the thesis is trivially true. The proof of the case n = 2 is not reported here to save space (the case of binary strings of length 2 is probably not very
interesting). Let us consider n > 2 and let z = z1 · · · zn ∈ {0, 1}n . We call flip(z, i) :=
z̄1 , . . . , z̄i , zi+1 , . . . , zn and backflip(z, i) := z1 , . . . , zn−i , z̄n−i+1 , . . . , z̄n . Then: d(flip(z, i),
z) = i and d(backflip(z, i), z) = i.
Assume by contradiction that F′ is c.u.p. A non-trivial f_{g,o} ∈ F′ exists by hypothesis. Thus ∃i, j with 0 < i < j ≤ n such that g(i) ≠ g(j). Let σ be a permutation of {0,1}^n such that σ(o) = o, σ(flip(o, 1)) = flip(o, i) and σ(backflip(o, 1)) = flip(o, j). Since F′ is c.u.p., we can say that g′ ∈ G and o′ ∈ {0,1}^n exist such that (f_{g,o} ◦ σ) = f_{g′,o′}. Then the following property holds: g′(d(o, o′)) = f_{g′,o′}(o) = f_{g,o}(σ(o)) = f_{g,o}(o) = g(d(o, o)) = g(0) = 1. By definition, it follows that d(o, o′) = 0, i.e. o′ = o. Then the following properties hold: g′(1) = g′(d(flip(o, 1), o)) = f_{g′,o′}(flip(o, 1)) = f_{g,o}(σ(flip(o, 1))) = f_{g,o}(flip(o, i)) = g(d(flip(o, i), o)) = g(i), and g′(1) = g′(d(backflip(o, 1), o)) = f_{g′,o′}(backflip(o, 1)) = f_{g,o}(σ(backflip(o, 1))) = f_{g,o}(flip(o, j)) = g(d(flip(o, j), o)) = g(j). From these properties we deduce that g(i) = g′(1) = g(j), against the hypothesis.
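Proposition 3 can be illustrated concretely. The following check (Python; our illustration, using onemax on strings of length 3) verifies that composing a function of the Hamming distance with a permutation that swaps two points at different distances from the optimum yields a function that is no longer a function of the distance:

```python
from itertools import product

n = 3
X = [''.join(bits) for bits in product('01', repeat=n)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def is_distance_function(f):
    """Check whether f has the form g(d(., o)) for a unique maximizer o."""
    opts = [x for x in X if f[x] == max(f.values())]
    if len(opts) != 1:
        return False
    o = opts[0]
    by_dist = {}
    for x in X:
        d = hamming(x, o)
        if by_dist.setdefault(d, f[x]) != f[x]:
            return False    # two points at equal distance, different fitness
    return True

# onemax-style function of the distance to o = '000': fitness 1 - d/n.
f = {x: 1 - hamming(x, '000') / n for x in X}
assert is_distance_function(f)

# A permutation swapping a point at distance 1 with one at distance 3.
sigma = {x: x for x in X}
sigma['100'], sigma['111'] = '111', '100'
f_sigma = {x: f[sigma[x]] for x in X}
assert not is_distance_function(f_sigma)   # f ∘ σ left the set: not c.u.p.
```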
4.2 NK-Landscapes
In this section, we prove that the set of all the NK-landscapes [8] with adjacent neighbourhood is not c.u.p.
Proposition 4. Let n ≥ 1 and let 0 ≤ k < n. Then the set of functions NK = {F_φ | φ : {0,1}^{k+1} → {0,1}}, where F_φ(x) = (1/n) ∑_{i=1}^{n} φ(x_i, ..., x_{i+k}) (with indices taken modulo n), is not c.u.p.
Proof. In order to prove that this set of functions is not c.u.p., we consider a function that belongs to the set and we show that one of its permutations does not. Let k < n − 1, φ = XOR and σ a permutation such that σ(0^n) = 10^{n−1}. We show that no g such that F_φ ◦ σ = F_g exists. The following properties hold: (F_φ ◦ σ)(0^n) = F_φ(10^{n−1}) = (k+1)/n and F_g(0^n) = (1/n) ∑_{i=1}^{n} g(0^{k+1}) = g(0^{k+1}) ∈ {0, 1}. It follows that (k+1)/n ∈ {0, 1}, which is absurd, since 0 < k + 1 < n.

Let now k = n − 1, φ = XOR and σ such that σ(10^{n−1}) = 0^n, σ(0^n) = 10^{n−1} and σ(x) = x for the remaining elements. We have: (F_φ ◦ σ)(0^{n−1}1) = F_φ(0^{n−1}1) = (1/n)[φ(0^{n−1}1) + ... + φ(10^{n−1})] = n/n = 1; thus F_g(0^{n−1}1) = (1/n)[g(0^{n−1}1) + ... + g(10^{n−1})] = 1; and thus: (F_φ ◦ σ)(10^{n−1}) = F_φ(0^n) = 0; but F_g(10^{n−1}) = (1/n)[g(10^{n−1}) + ... + g(0^{n−1}1)] = F_g(0^{n−1}1) = 1, which is absurd.
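The first case of this proof can be replayed exhaustively for small parameters. The sketch below (Python; our illustration, with n = 4, k = 1, and cyclic window indices) verifies that F_XOR ∘ σ differs from F_g for every one of the 16 possible kernels g:

```python
from itertools import product

n, k = 4, 1          # k < n - 1: the first case of the proof

def F(phi, x):
    """Adjacent-neighbourhood NK evaluation, window indices taken cyclically."""
    return sum(phi[tuple(x[(i + j) % n] for j in range(k + 1))]
               for i in range(n)) / n

windows = list(product((0, 1), repeat=k + 1))
xor = {w: w[0] ^ w[1] for w in windows}

# sigma swaps 0^n and 10^(n-1) and fixes every other string.
zero, one = (0,) * n, (1,) + (0,) * (n - 1)
sigma = {x: x for x in product((0, 1), repeat=n)}
sigma[zero], sigma[one] = one, zero

# (F_xor ∘ sigma)(0^n) = F_xor(10^(n-1)) = (k+1)/n, while F_g(0^n) = g(0^(k+1))
# is 0 or 1 for every kernel g, so no kernel can reproduce F_xor ∘ sigma.
assert F(xor, sigma[zero]) == (k + 1) / n

def matches(g):
    return all(F(g, x) == F(xor, sigma[x]) for x in sigma)

kernels = [dict(zip(windows, vals))
           for vals in product((0, 1), repeat=len(windows))]
assert not any(matches(g) for g in kernels)
```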
5 Automatic Generation of an Optimization Algorithm
We now introduce a method to automatically generate a specific algorithm for a given
set of functions F. Let us consider the following “game” with two players A and B:
player A chooses a cost function f in F (without revealing its choice to player B) and
player B must find an optimal solution of this function. At each step of this game, player
B can choose a solution i in the search space X and player A is forced to reveal the fitness
of that solution, i.e. the value of f (i). The objective of player B is to find an optimal
solution in the smallest possible number of steps. We assume that player B knows all
the values of all the possible solutions for all the functions in F (for instance, we could
imagine that they have been stored in a huge database). Here we propose a strategy
that player B may use to identify the function f chosen by player A. Once player B has
identified this function, since he knows all the cost values of all the possible solutions,
he can immediately exhibit the optimal solution at the subsequent step. According to
this strategy, in order to identify the function f chosen by player A, at each step of
the game player B should perform the action that allows him to “eliminate” the largest possible number of functions among the candidates for function f. We call F_c this set of candidate functions. The strategy initializes F_c := F. Then, for each point k of X, we create a “rule” s_1 | ... | s_m, where s_i is the number of functions f in F_c such that f(k) = i. The most suitable rule is the one that minimizes the average
of |F_c| after its application, i.e. the one that minimizes (∑_{i=1}^{m} s_i²)/|F_c|, or more simply ∑_{i=1}^{m} s_i², given that |F_c| is the same for all the rules before their application. Thus, at each step we choose the point b with the most suitable rule, we ask player A the value of f(b) and we eliminate from the set F_c all the functions f′ with f′(b) ≠ f(b).
Example 1. Let X = Y = {0, 1, 2, 3, 4} and let us consider the set of functions F =
{ f1 , . . . , f8 } with: f1 = (1, 2, 3, 4, 0), f2 = (3, 1, 2, 0, 0), f3 = (4, 1, 1, 2, 3), f4 = (0, 0, 0,
1, 2), f5 = (0, 0, 0, 1, 1), f6 = (0, 0, 0, 0, 1), f7 = (4, 1, 0, 2, 3), f8 = (0, 1, 0, 0, 0); where
∀i = 1, 2, ..., 8, fi has been defined with the notation: fi = (y0 , y1 , y2 , y3 , y4 ) where ∀ j =
0, ..., 4 fi ( j) = y j . Let f1 , f2 , ..., f8 be functions to be maximized. Let us suppose that
the target function (the one chosen by player A) is f₆. Our method begins by initializing the set of candidate functions: F_c := F. Then a loop over all the solutions begins. Let us first consider solution x = 0. We have to generate a rule s_1 | ... | s_m, where s_i is the number of functions in F_c such that f(0) = i. The number of functions f_i in F for which f_i(0) = 0 is equal to 4, thus s_0 = 4; the number of functions for which f_i(0) = 1 is equal to 1, thus s_1 = 1. Iterating this process, we obtain a rule for solution x = 0 equal
to 4|1|1|2. Now we have to calculate a measure of the “cost” of this rule. We define it as ∑_{i=1}^{m} s_i², and thus in this case it is equal to 4² + 1² + 1² + 2² = 22. We now repeat the same process for all the solutions in the search space and we obtain the data reported in Table 1. The two rules
Table 1. The rules produced for all the possible solutions at the first iteration in the example introduced in the text

solution   rule      cost
0          4|1|1|2   22
1          3|4|1     26
2          5|1|1|1   28
3          3|2|2|1   18
4          3|2|1|2   18
that minimize the cost are the ones associated with solutions 3 and 4. Thus player B asks player A the value of the chosen function for the first of those solutions, i.e. solution 3. The answer of player A is f(3) = 0. Now, player B can modify the set of candidate functions F_c by eliminating those functions f_i such that f_i(3) ≠ 0. Thus: F_c := {f₂, f₆, f₈}.
If we iterate this process, we can see that at the second iteration one of the solutions associated with the rule with the minimum cost is 0, and at the third iteration it is 1. After that, F_c = {f₆} and the algorithm terminates, since player B has identified the function chosen by player A. All player B has to do now is to return the solution that maximizes function f₆, i.e. x = 4.
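The strategy and the example above can be reproduced with a short program (Python; our illustration; ties between rules of equal cost are broken by the smallest solution index, and already-queried points are skipped, consistently with non-repeating algorithms):

```python
def rule_cost(Fc, k):
    """Cost sum(s_i^2) of the rule at point k: s_i counts functions with f(k) = i."""
    counts = {}
    for f in Fc:
        counts[f[k]] = counts.get(f[k], 0) + 1
    return sum(s * s for s in counts.values())

def play(F, target):
    """Player B's strategy against the (hidden) target function."""
    Fc, visited, queries = list(F), set(), []
    while len(Fc) > 1:
        # pick the unvisited point whose rule has minimum cost (ties: lowest index)
        b = min((k for k in range(len(target)) if k not in visited),
                key=lambda k: rule_cost(Fc, k))
        visited.add(b)
        queries.append(b)
        Fc = [f for f in Fc if f[b] == target[b]]   # player A reveals target(b)
    winner = Fc[0]
    return queries, winner.index(max(winner))       # the optimum can now be returned

F = [(1, 2, 3, 4, 0), (3, 1, 2, 0, 0), (4, 1, 1, 2, 3), (0, 0, 0, 1, 2),
     (0, 0, 0, 1, 1), (0, 0, 0, 0, 1), (4, 1, 0, 2, 3), (0, 1, 0, 0, 0)]

assert [rule_cost(F, k) for k in range(5)] == [22, 26, 28, 18, 18]  # Table 1
queries, optimum = play(F, F[5])          # player A chooses f6
assert queries == [3, 0, 1] and optimum == 4
```

The assertions reproduce the rule costs of Table 1, the query sequence 3, 0, 1 of the example, and the final answer x = 4.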
To automatically produce a search algorithm inspired by this strategy, it is sufficient to store the choices that have been made in a decision tree. Such a tree has its nodes labelled with solutions x ∈ X, and the arcs departing from a node labelled with x are labelled with all possible values of f_i(x) for each f_i ∈ F_c. The child of a node labelled with x and linked to x by an arc labelled with f_i(x) is the solution related to the rule with minimum cost after all functions f_j with f_j(x) ≠ f_i(x) have been eliminated from F_c. Figure 1
Fig. 1. The decision tree generated by our algorithm for the example introduced in the text. The
dashed line refers to the last step of the algorithm, where the function has been identified and thus
the optimal solution can be returned.
represents the decision tree generated by our method for the previous example. The
first solution that we have considered (the one that minimizes the cost in Table 1) was
3. Successively, we have considered solutions 0 and 1 and we have been able to return
solution 4. We remark that the leaves of this tree are labelled with points that are optimal
solutions for at least one function in F.
We call “RULE” this method to automatically generate algorithms² for a given set
of functions F. The following property holds:
Proposition 5. RULE has a computational complexity of Θ(|X||F|^4).

Proof. The algorithm is executed on each solution in X and for each function in F. Thus, the generation of one sequence of rules has a complexity of Θ(|X||F|). At each application of the rules, at least one function is eliminated from the set of candidate ones, thus building one root-to-leaf path has a complexity of |X||F| ∑_{i=0}^{|F|} (|F| − i) = |X||F| · (|F|+1)|F|/2. Since a path must be built for each of the |F| functions, the total complexity is Θ(|X||F|^4).
Proposition 6. The decision tree produced by RULE has a number of nodes of
O(|X ||F|).
Proof. For each f ∈ F, the search for an optimal solution consists in examining a sequence of points of X with no repetitions; the tree is thus the union of at most |F| root-to-leaf paths, each of length at most |X|, hence it has O(|X||F|) nodes.
It is possible to show that RULE does not necessarily generate the minimal algorithm
for a given set of functions. For instance, one may imagine a set of functions that all
share the same set of optimal solutions Opt = {o1 , o2 , ..., oq }; for this set of functions,
a minimal algorithm is clearly one that outputs a solution oi ∈ Opt at the first step, and
not the algorithm generated by RULE. For this reason, we informally say that RULE
generates “good”, although not necessarily minimal, algorithms.
Furthermore, from proposition 5, we can deduce that RULE is clearly too complex
to be used for real-life problems. Nevertheless, in the next section we show how RULE
(and the algorithms generated by RULE) can be used for interesting (although rather
“small”) instances of trap functions and NK-landscapes.
² Strictly speaking, RULE does not generate “search algorithms”, since they are specialized for a particular set of functions and domain; nevertheless, we continue calling the output of RULE an “algorithm” for simplicity.
6 Experimental Results
Propositions 5 and 6 show that given a particular set of functions F defined over a
domain X , executing RULE to generate an algorithm A and then running A on all the
functions in F has a larger computational cost than exhaustively examining all the
possible solutions for each function in F (whose cost is clearly O(|X ||F|)). For this
reason, it must be clear that the goal of this study is not trying to produce a technique
to efficiently solve particular sets of functions in practice. Nevertheless, we think that it
might be interesting to quantify the theoretical performance improvement of a problemspecific algorithm, compared to a “general” one, like for instance a GA. For this reason,
in this section the algorithms produced by RULE are compared with a standard GA.
The performance measure used for this comparison, consistently with our definition of minimal algorithm for a set of functions F (see Section 3), will be φ̄(F). Since GAs are repeating, we count their fitness evaluations without repetitions. For the GA, we have used
the following set of parameters: population size of 100 potential solutions, standard
single-point crossover [9,10] with rate equal to 0.9, standard point mutation [9,10] with
rate equal to 0.01, tournament selection with tournament size equal to 2, elitism (i.e.,
the best individual is copied unchanged into the next population), maximum number of
generations equal to 200.
The sets of functions that we use in our experiments are partitioned into two groups,
each one composed by three sets of functions. The first group contains three sets of
trap functions. Trap functions [6] are a particular set of functions of the distance (as
defined in Section 4.1) that depend on the values of two constants: B (the width of the
attractive basin for each optimum) and R (their relative importance). The three sets of
trap functions used for our experiments are respectively composed by 100, 250 and
500 “randomly chosen” trap functions. By a “randomly chosen” trap function we mean
a trap function where the B and R constants and the (unique) optimal solution have
been chosen randomly with uniformly distributed probability over their domains (the
range [0, 1] for B and R, the search space for the optimal solution). The second group
of functions that we have used contains three sets of NK-landscape functions. NK-landscape functions [8] are completely defined by the value of two constants (N and K) and one “kernel” function φ : {0,1}^{K+1} → [0, 1]. The sets of functions we have used are
respectively composed by 100, 250 and 500 “randomly generated” NK-landscapes, i.e.
NK-landscapes where K and φ have been generated uniformly at random. For all these
functions, the search space X that we have chosen is composed by binary strings of 8
bits (thus N = 8 for NK-landscapes).
Table 2 shows the results obtained by the GA. The first column represents the set
of functions F on which the experiments have been done (for instance “Trap p” means
a set of p “randomly chosen” trap functions). The second column reports the average
number of evaluations with no repetitions that have been spent by the GA for finding
an optimal solution, with standard deviations; in particular, for each f ∈ F
we have executed 100 independent GA runs and only for those runs where the optimal solution has been found (before generation 200) we have calculated the number of
evaluations without repetitions that have been performed before finding the optimum.
Then, we have averaged all those numbers over the 100 independent runs. The result
that we report is the average of all those averages over all functions in F. The third
Table 2. Results returned by the GA. Each line reports the results for a different set of functions.

F          φ̄_GA(F)            Avg Total FE            SR
Trap 500   145.18 (σ = 13.6)   3633.41 (σ = 7232)     0.82
Trap 250   145.52 (σ = 13.7)   3607.35 (σ = 7200.7)   0.83
Trap 100   145.64 (σ = 13.4)   4128.53 (σ = 7641.6)   0.85
NK 500     141.61 (σ = 12.5)   804.15 (σ = 3024.8)    0.98
NK 250     142.05 (σ = 12.9)   886.54 (σ = 3267.2)    0.97
NK 100     141.86 (σ = 12.5)   754.18 (σ = 2867.6)    0.98
column reports the average number (calculated as above) of evaluations (also counting
repetitions) that have been spent by the GA for finding an optimal solution with their
standard deviations. Finally, the fourth column reports the success rate, i.e. the number
of runs where an optimal solution has been found divided by the total number of runs
that we have performed (100 in our experiments) averaged over all functions in F.
Table 3 reports the results of the algorithms generated by RULE on the same sets of
problems. The first column identifies the set of functions F on which the experiments
have been done; the second column reports the average (calculated over all functions
in F) number of evaluations spent to find an optimal solution with their standard deviations. An optimal solution has always been found for each one of these executions (thus
we do not report success rates).
Table 3. Results returned by the algorithms generated by RULE. Each line reports the results for a different set of functions.

F          φ̄_rule(F)
Trap 500   2.99 (σ = 0.22)
Trap 250   2.95 (σ = 0.21)
Trap 100   2.75 (σ = 0.43)
NK 500     4.57 (σ = 0.62)
NK 250     4.22 (σ = 0.52)
NK 100     3.81 (σ = 0.49)
Comparing results in Tables 2 and 3 we can clearly see that the algorithms generated by RULE have a remarkably better performance than the GA. This was expected
since these algorithms are problem-specific, i.e. they have been generated to solve those
particular problems.
7 Conclusions and Future Work
We have defined the concept of minimal search algorithm for a given set of problems.
We have also introduced an equation to calculate the average performance of an algorithm over a closed under permutation (c.u.p.) set of functions. Furthermore, we have
proven that some particular sets of functions are not c.u.p. In particular, we focused
on any set of functions of the distance to a given optimal solution (this set contains
some well-known benchmarks, like trap functions, onemax and onemix) and on NK-landscapes. Not being c.u.p., for those sets the No Free Lunch theorem does not hold
and thus it makes sense to look for a minimal algorithm. Inspired by this, we have presented a method to build a specific (not necessarily minimal) search algorithm for a
given set of functions to optimize. We have experimentally shown that the algorithms
generated by such a method remarkably outperform a standard Genetic Algorithm on
some “small” instances of trap functions and NK-landscapes. This was expected given
that the generated algorithms are problem-specific. Our method cannot be applied to
real-life applications, given its complexity, which we have estimated as a function of
the size of the search space and of the cardinality of the considered set of functions.
In the future, we plan to prove other interesting properties of the minimal algorithm,
to investigate whether other interesting sets of functions are c.u.p. or not, and to improve the RULE algorithm, possibly employing some concepts of Rough Sets.
References
1. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transactions
on Evolutionary Computation 1(1), 67–82 (1997)
2. Schumacher, C., Vose, M.D., Whitley, L.D.: The no free lunch and problem description
length. In: Spector, L., Goodman, E.D., Wu, A., Langdon, W.B., Voigt, H.-M., Gen, M.,
Sen, S., Dorigo, M., Pezeshk, S., Garzon, M.H., Burke, E. (eds.) Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 565–570.
Morgan Kaufmann, San Francisco (2001)
3. Igel, C., Toussaint, M.: Recent results on no-free-lunch theorems for optimization. CoRR:
Neural and Evolutionary Computing cs.NE/0303032 (2003)
4. Igel, C., Toussaint, M.: On classes of functions for which no free lunch results hold. Inf.
Process. Lett. 86(6), 317–321 (2003)
5. Streeter, M.J.: Two broad classes of functions for which a no free lunch result does not hold.
In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G.,
Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz,
A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS,
vol. 2724, pp. 1418–1430. Springer, Heidelberg (2003)
6. Deb, K., Goldberg, D.E.: Analyzing deception in trap functions. In: Whitley, D. (ed.) Foundations of Genetic Algorithms, vol. 2, pp. 93–108. Morgan Kaufmann, San Francisco (1993)
7. Poli, R., Vanneschi, L.: Fitness-proportional negative slope coefficient as a hardness measure
for genetic algorithms. In: Thierens, D., et al. (eds.) Genetic and Evolutionary Computation
Conference, GECCO 2007, pp. 1335–1342. ACM Press, New York (2007)
8. Altenberg, L.: NK fitness landscapes. In: Bäck, T., et al. (eds.) Handbook of Evolutionary Computation, Section B2.7.2, pp. B2.7:5–B2.7:10. IOP Publishing Ltd and Oxford University Press (1997)
9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading (1989)
10. Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan
Press, Ann Arbor, Michigan (1975)