Evolutionary Computation and Convergence To A Pareto Front

David A. Van Veldhuizen and Gary B. Lamont
Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433-7765. lamont@afit.af.mil, (937) 255-3636 ext. 4718.

This paper focuses on EC's ability to converge to the p-dimensional hypersurface defined by simultaneously optimizing p functions. For example, by optimizing a bi-objective problem where both objectives are functions of one variable, we can determine the necessary analytical expressions for use in a specific EC convergence computation. We then show an arbitrary Evolutionary Algorithm (EA) statistically converges (given specific MOPs) and generalize our results. The remainder of this paper is organized as follows. Section 2 introduces relevant MOP concepts. Section 3 describes our experiments with an example MOP and the software environment used. Section 4 presents a theorem showing a multiobjective EA statistically converges to the Pareto front. Finally, Section 5 presents our conclusions and proposes promising directions for further research.
ABSTRACT: Research into solving multiobjective optimization problems (MOPs) has as one of its overall goals the development and definition of foundations for an Evolutionary Computation (EC)-based MOP theory. In this paper we introduce relevant MOP concepts, and the notion of Pareto optimality in particular. Specific notation is defined and theorems are presented ensuring Pareto-based Evolutionary Algorithm (EA) implementations are clearly understood. Then, a specific experiment investigating the convergence of an arbitrary EA to a Pareto front is presented. This experiment gives a basis for a theorem showing a specific multiobjective EA statistically converges to the Pareto front. We conclude by using this work to justify further exploration into the theoretical foundations of EC-based MOP solution methods.
Introduction
Our research focuses on solving scientific and engineering multiobjective optimization problems (MOPs), contributing to the overall goal of developing and defining foundations of an Evolutionary Computation (EC)-based MOP theory. This paper addresses shortfalls recently identified by Horn (Horn 1997), focusing on MOP theory development and experimentation. Solving MOPs with EC methods presents several unique challenges. For a good introduction to the relevant issues and past EC-based approaches, see the articles by Fonseca and Fleming (Fonseca and Fleming 1995) and by Horn (Horn 1997).

Although single-objective optimization problems may have a unique optimal solution, MOPs (as a rule) offer a possibly uncountable set of solutions, which when evaluated produce vectors whose components represent trade-offs in decision space. A decision maker then implicitly chooses an acceptable solution by selecting one of these vectors. Mathematically speaking, an MOP minimizes (or maximizes, since min{F(x)} = -max{-F(x)}) the components of a vector f(x), where x is an n-dimensional decision variable vector from some universe U. Or, in general:

minimize f(x) = (f1(x), f2(x), ..., fp(x))
subject to gi(x) <= 0, i = 1, ..., m, x in U.   (1)
An MOP then consists of n variables, m constraints, and p objectives (p >= 2), of which any or all of the objective functions may be nonlinear (Hwang and Masud 1979). MOPs are often characterized by measures of performance (objectives) which may be (in)dependent and/or non-commensurable; the multiple objectives being optimized almost always conflict. These opposing objectives place a partial, rather than total, ordering on the search space. In order to successfully deal with these characteristics, several EC-based methods (Fonseca and Fleming 1995, Horn 1997) were developed to determine optimal solutions given an MOP's objectives and constraints. Key to many of these methods, however, is the use of Pareto optimality in determining a (set of) MOP solutions.
Problem F1's two objectives are defined as:

Minimize f11 = x^2,
Minimize f12 = (x - 2)^2.   (3)
The solution x = 0 is optimal with respect to f11 but not f12; the solution x = 2 is optimal with respect to f12 but not f11. Any solution {x | x not in [0, 2]} is not a member of the Pareto optimal set because it is not better than some solution in the set with respect to either objective. For the general case, Rudolph (Rudolph 1998b) recently showed that given

Minimize f11 = ||x||^2,
Minimize f12 = ||x - z||^2,   (4)

the corresponding Pareto optimal set is

X* = {x in R^n | x = rz, r in [0, 1]}.   (5)
Definition 1 (Pareto Dominance): A vector u = (u1, ..., up) is said to dominate v = (v1, ..., vp) if and only if u is partially less than v, i.e., ui <= vi for all i in {1, ..., p} and ui < vi for at least one i.

As an example, consider

minimize f(xu), u in {a, b, c, d}, where
a: f(xa) = (3.25, 1.76, 4.67),
b: f(xb) = (3.25, 1.76, 4.67),
c: f(xc) = (3.15, 1.76, 4.67),
d: f(xd) = (3.15, 1.76, 4.22).   (2)

Here, a and b are dominated by both c and d; c is dominated by d, and d dominates all other vectors.
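The dominance relations claimed for Equation 2 can be checked mechanically. A minimal Python sketch (the function name is ours, not the paper's):

```python
def dominates(u, v):
    """True if u Pareto-dominates v under minimization: u is no worse in
    every component and strictly better in at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

# The four vectors of Equation 2.
f = {
    "a": (3.25, 1.76, 4.67),
    "b": (3.25, 1.76, 4.67),
    "c": (3.15, 1.76, 4.67),
    "d": (3.15, 1.76, 4.22),
}

print([x for x in "abc" if dominates(f["d"], f[x])])  # ['a', 'b', 'c']: d dominates all others
print(dominates(f["c"], f["d"]))                      # False: c does not dominate d
```

Note that equal vectors (here a and b) do not dominate each other, since the strict inequality fails in every component.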
Definition 2 (Pareto Optimality): A solution xu in U is said to be Pareto optimal if and only if there is no xv in U for which v = f(xv) = (v1, ..., vp) dominates u = f(xu) = (u1, ..., up).

The solution xd in Equation 2's example is the only element of the associated Pareto optimal set; i.e., it is the only Pareto optimal solution of the set {xa, xb, xc, xd}. Pareto optimal solutions are also termed non-inferior, admissible, or efficient solutions. Their corresponding vectors are termed nondominated (Horn 1997); selecting a vector (or vectors) from this nondominated vector set implicitly indicates acceptable Pareto optimal solutions. These solutions may have no clearly apparent relationship besides their membership in the Pareto optimal set. We stress here that Pareto optimal solutions are classified as such based on their phenotypical expression. Their expression (the non-dominated vectors), when plotted in criterion space, is known as the Pareto front. Researchers have sometimes used these terms inconsistently in the literature. As another example of Pareto optimality, consider the one-variable, two-objective problem F1 defined by Equation 3 above; this is the same problem used by Vincent and Grantham, Schaffer, and Srinivas and Deb for the same purpose (Srinivas and Deb 1994). Figure 1 plots both objective functions over a range of x.
[Figure 1: The objective functions f11 and f12 plotted against x.]
[Figure 2: The Pareto front, f11 plotted against f12.]
We point out a significant difference between Figures 1 and 2. Figure 1 plots the values of functions f11 and f12 for different values of the independent variable. Figure 2, however, represents the values of function f11 plotted against those of function f12 for the same value of the independent variable. In other words, Figure 2 is a graph in phenotype space. This plot graphically displays the non-dominated vectors for this problem as points in criterion space; this representation forms the Pareto front. Identifying a set of Pareto optimal solutions is thus key for a decision maker to choose a compromise solution satisfying the objectives as well as possible. Choosing an MOP solution optimized for only one objective may well ignore solutions which, from an overall standpoint, are better. The Pareto optimal solution set defines that overall view. Of course, the accuracy of the decision maker's view depends on both the true Pareto optimal set and the set presented as Pareto optimal. EA-derived solutions of real-world MOPs offer only a finite number of points, which may or may not be truly Pareto optimal. Also, complex MOPs don't generally lend themselves to analytical determination of the actual Pareto front. This raises the issue of EC convergence to the true Pareto front, which we address in Section 4.
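Assuming only the definitions in Equation 3, the Pareto optimal set x in [0, 2] can be recovered numerically by sampling and nondominated filtering. A minimal Python sketch (function and variable names are ours):

```python
def nondominated(points):
    """Subset of objective vectors not dominated by any other (minimization)."""
    def dom(u, v):
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))
    return [p for p in points if not any(dom(q, p) for q in points)]

# Sample x in [-3, 5] and evaluate both objectives of Equation 3.
xs = [i / 100.0 for i in range(-300, 501)]
objs = [(x * x, (x - 2.0) ** 2) for x in xs]

front = set(nondominated(objs))
pareto_x = [x for x, f in zip(xs, objs) if f in front]

# The Pareto optimal samples span exactly [0, 2].
print(min(pareto_x), max(pareto_x))  # 0.0 2.0
```

Within [0, 2] the two objectives trade off strictly (f11 increases while f12 decreases), so no sample there dominates another; every sample outside [0, 2] is dominated by its reflection into the interval.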
Theorem 1: Given any non-empty solution set, at least one Pareto optimal solution exists within that set.

Proof: Assume n elements of an arbitrary solution set. When evaluated in a minimization problem, n vectors result. Without loss of generality, assume a one-dimensional function which, when evaluated, results in n 1-D vectors. Order these vectors from smallest to largest, which we can do because, by definition, a total ordering exists. There are then two cases to consider:

Case I. The smallest vector is strictly less than all other vectors. Since the vectors are ordered, this vector is Pareto dominant (by definition) and its associated solution is Pareto optimal.

Case II. Beginning with the smallest vector, two or more vectors are equal in value. Since the vectors are ordered, these particular vectors are guaranteed to be nondominated: no other vector dominates them, and none of the equal-valued vectors dominates the others. Thus, since they are nondominated, their associated solutions are also Pareto optimal.

Without loss of generality, the same reasoning holds when extended to vectors of two or more dimensions, except that the ordering is now partial, and to maximization problems, where the vector ordering is now largest to smallest. Thus, any non-empty solution set has at least one Pareto optimal solution.

Experimentation
1 Horn (Horn 1997) uses the terms Ponline, Poffline, and Pactual instead of Pcurrent, Pknown, and Ptrue, respectively.
EC embodies the techniques of Genetic Algorithms (GAs), Genetic Programming (GP), Evolution Strategies (ESs), and Evolutionary Programming (EP), collectively known as EAs. Several research efforts have successfully used Pareto optimality as the basis for EA fitness assignment in solving MOPs (Fonseca and Fleming 1995, Horn 1997). We investigate here whether such a Pareto-based EA in fact converges to the true Pareto front.
For the first MOP, a Pareto optimal solution x in [0, 2] yields the objective values

y1 = x^2,  y2 = (x - 2)^2;   (6)

eliminating x then gives the Pareto front's equation:

y2 = f(y1) = y1 - 4*sqrt(y1) + 4,  y1 in [0, 4].   (7)

Because we've shown y2 is a function of y1, we can assume a standard coordinate system for the following computations. The equation determining the distance between any given point (i.e., an EA MOP phenotype) and a point on the given curve (Equation 7) is then

d = sqrt( (y1 - y10)^2 + (y2 - y20)^2 )
  = sqrt( (y1 - y10)^2 + (f(y1) - y20)^2 ),   (8)

where y10 and y20 are EA-returned fitness values, and y1 and y2 are coordinates of a point on PFtrue. We wish to determine the distance between an arbitrary point and the nearest point on a given curve, where the point corresponds to a member of PFcurrent and the given curve is PFtrue. For easier mathematical manipulation, Equation 8 is modified to reflect the distance squared:

F(y1) = (y1 - y10)^2 + (f(y1) - y20)^2.   (9)

Using the Pareto front's equation (Equation 7) and setting the derivative of F equal to zero (since minimizing the square of the distance simultaneously minimizes the distance) results in:

0 = F'(y1) = 2(y1 - y10) + 2 [f(y1) - y20] [f'(y1)]
           = 2y1 - 2y10 + 2 [y1 - 4*sqrt(y1) + 4 - y20] [1 - 2/sqrt(y1)]
           = 4y1 - 12*sqrt(y1) + (4/sqrt(y1))(y20 - 4) + 24 - 2y20 - 2y10.   (10)

Finally, for an arbitrary tuple (y1i, y2i) in Pcurrent, solving Equation 10 for y1 allows Equation 8 to be solved and a distance di between (y1i, y2i) and the nearest point on PFtrue determined. Several solutions in each generation may be elements of Pcurrent. Thus, as a measure of distance between PFcurrent and PFtrue, we define a generational distance:

Gj = sqrt( d1^2 + d2^2 + ... + dn^2 ) / n,   (11)

where n is the number of points in Pcurrent at generation j; i.e., Equation 8 is solved for each member of Pcurrent and the results are used to derive a value for Equation 11. The generational distance then measures the error between PFcurrent and PFtrue.

To perform our experiment we chose the default GA implementation provided by GEATbx (Pohlheim 1998), requiring few user-specified parameters. We set the following necessary options: a real-valued representation, 50 individuals in the GA population, termination after 100 generations, and search space bounds of [-100, 100]. Also, a secondary solution population (Pknown) was constructed and integrated into the GA. At each generation, Pcurrent was unioned with Pknown and Pknown then updated. To solve the MOP under study, the GA was augmented by the Pareto-based ranking and linear fitness assignment scheme described by Fonseca (Fonseca 1995). Appropriate fitness values were assigned to each member of the GA's population by developing two additional MATLAB routines. The first determines which associated vectors of a given solution set are non-dominated and assigns them a rank of 0. The remaining solutions (all of which are dominated) are given a rank equal to the number of associated vectors dominating theirs. The second routine transforms the rankings into fitness; all solutions of equal rank receive identical fitness values.

Using the above parameters and methodology, we see the implemented GA does in fact converge using Pareto optimality as a fitness assignment criterion. Figure 3 shows PFcurrent generational distances. It's seen that at some generations the currently known Pareto front appears to be close to or part of the true front, but at other generations is farther away from it. This clearly indicates the contextual nature of Pareto-based fitness assignment. Figure 4

(MATLAB is a Trademark of The MathWorks, Inc. The objective functions must be invertible in order to obtain an analytical formula for a Pareto front.)
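Equations 8 and 11 can be sketched in code. Rather than solving Equation 10 analytically for each point, this sketch finds each di by scanning a fine sample of PFtrue (Equation 7); all names and the sample size are ours:

```python
import math

def f_front(y1):
    # PFtrue for the MOP of Equation 3: y2 = y1 - 4*sqrt(y1) + 4, y1 in [0, 4].
    return y1 - 4.0 * math.sqrt(y1) + 4.0

def generational_distance(points, samples=10001):
    """G = sqrt(d_1^2 + ... + d_n^2) / n (Equation 11), each d_i taken as the
    minimum Euclidean distance from a point to sampled points of PFtrue."""
    curve = [(y1, f_front(y1)) for y1 in (4.0 * k / (samples - 1) for k in range(samples))]
    d2s = [min((a - y1) ** 2 + (b - y2) ** 2 for y1, y2 in curve) for a, b in points]
    return math.sqrt(sum(d2s)) / len(points)

# Points lying on PFtrue give (near-)zero generational distance.
on_front = [(1.0, f_front(1.0)), (2.25, f_front(2.25))]
print(generational_distance(on_front) < 1e-6)  # True
```

The scan is a numerical stand-in for the analytic minimization; its accuracy is bounded by the sample spacing on the curve.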
shows the generational distance of the secondary population, indicating that as better Pareto optimal solutions are found, worse ones are discarded and PFknown moves ever closer to PFtrue .
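The two fitness-assignment routines described above can be sketched as follows (in Python rather than MATLAB; all names are ours). Nondominated vectors receive rank 0, each dominated vector a rank equal to the number of vectors dominating it, and ranks are then transformed so equal ranks yield identical fitness:

```python
def pareto_ranks(vectors):
    """Rank 0 for nondominated vectors; otherwise the number of dominating vectors."""
    def dom(u, v):
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))
    return [sum(dom(q, p) for q in vectors) for p in vectors]

def ranks_to_fitness(ranks):
    """Simple linear transform: lower rank -> higher fitness, equal ranks -> equal fitness."""
    worst = max(ranks)
    return [worst - r + 1 for r in ranks]

vecs = [(1.0, 4.0), (4.0, 1.0), (2.0, 2.0), (3.0, 3.0), (5.0, 5.0)]
print(pareto_ranks(vecs))                    # [0, 0, 0, 1, 4]
print(ranks_to_fitness(pareto_ranks(vecs)))  # [5, 5, 5, 4, 1]
```

The linear transform here is one simple choice; Fonseca (Fonseca 1995) describes the particular linear fitness assignment actually used.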
[Figure 6: PFknown generational distance vs. generation number (second MOP).]
[Figure 3: PFcurrent generational distance vs. generation number (first MOP).]
Two changes were made to the GA parameters mentioned above. First, the search space bounds were set to [-10, 10]. Second, because this MOP involves two independent variables, we used MATLAB to derive an approximation to the equation describing PFtrue; the RMS error between the actual and computed solutions was 0.0224. We used this equation in determining the distance between any member of PFcurrent or PFknown and the nearest point on the curve representing PFtrue. Figures 5 and 6 show the resulting data. Again, the implemented GA converges to PFtrue using Pareto optimality as a fitness assignment criterion.
[Figure 4: PFknown generational distance vs. generation number (first MOP).]

[Figure 5: PFcurrent generational distance vs. generation number (second MOP).]
The second MOP is defined as:

minimize f(x, y) = (f1(x, y), f2(x, y)), where
f1(x, y) = 1 - exp(-(x - 1)^2 - (y + 1)^2),
f2(x, y) = 1 - exp(-(x + 1)^2 - (y - 1)^2).   (12)
We must consider the real world before discussing experimental results. The major issue concerns our definition of the Pareto front in the above problems. It is only because of the problems' unconstrained and bi-objective nature that we are able to determine expressions for PFtrue. As Horn also notes (Horn 1997), in real problems we have no way of knowing when to terminate the search. Unlike our examples, there is no general method to determine PFknown's distance from PFtrue. In the first example (Equation 3) we were able to derive an exact analytical expression for PFtrue because both objective functions are functions of one variable. In the second case we deterministically computed the objective function values at equidistant values of x and y over a given range, thus determining which values gave rise to PFtrue. Using those values, a MATLAB polynomial curve-fitting function then gave us an approximation of f2(f1) for our distance computations. We realize EAs are overkill for finding solutions to these particular problems; these MOPs were selected for their simplicity and ease of implementation. We plan to extend this experiment to MOPs of three or more functions (of both single and multiple variables) and apply the results to real-world problems. Note also that EA implementations discretize continuous models because of computational limitations. Thus, if one is searching a discrete space, the PFtrue of that space is most likely only an approximation, to a particular level of resolution, of the continuous PFtrue.
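The deterministic sampling described above can be sketched as follows (grid range and spacing are our choices; the MATLAB curve fit is omitted): evaluate Equation 12 on an equidistant grid and keep the nondominated objective vectors as an approximation of PFtrue.

```python
import math

def f1(x, y):
    return 1.0 - math.exp(-(x - 1.0) ** 2 - (y + 1.0) ** 2)

def f2(x, y):
    return 1.0 - math.exp(-(x + 1.0) ** 2 - (y - 1.0) ** 2)

def nondominated(points):
    def dom(u, v):
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))
    return [p for p in points if not any(dom(q, p) for q in points)]

# Equidistant samples over [-2, 2] x [-2, 2].
grid = [i / 5.0 for i in range(-10, 11)]
vecs = [(f1(x, y), f2(x, y)) for x in grid for y in grid]
front = sorted(nondominated(vecs))

# The Pareto optimal set of this MOP lies on x = -y; e.g. (0, 0) maps onto the front.
print((f1(0.0, 0.0), f2(0.0, 0.0)) in front)  # True
```

A finer grid tightens the approximation; the curve fit to these points (as done with MATLAB above) then gives the analytical stand-in for PFtrue.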
EA Convergence
We first consider a single-objective progress measure,

P = ln sqrt( fmax(0) / fmax(T) ),   (13)

where fmax(i) is the best objective function value in the parent population at generation i. We modify this definition to the following:

RP = ln sqrt( G1 / GT ),   (14)

where G1 is the generational distance at generation 1, and GT the distance at generation T. We also select GT as a metric showing the EA's effectiveness in converging to the Pareto front.

Initial experimental results indicate convergence occurs as conjectured, as shown in Table 1. Each reported result was derived from 10 GA executions. Niching and sharing weren't explicitly considered in these runs; we wished to see only whether PFknown converged to PFtrue, not how much of the front it covered. Each result is labeled in the form DEB(X) or FNF(X), where DEB refers to the MOP of Equation 3 and FNF to the MOP of Equation 12. The X designator is interpreted as follows: A had a recombination probability of 0.7 and a mutation probability of 0.7; B's recombination probability was 0.7 and its mutation probability 0.35. The mean and variance are reported for each metric.

Table 1: Experimental Results.

Function    RP (Mean)   RP (Variance)   GT (Mean)    GT (Variance)
DEB(A)      11.0071     6.8985          1.0346e-04   1.8771e-04
DEB(B)      14.0125     11.0616         4.9123e-05   1.5534e-04
DEB(ALL)    12.5098     9.1038          7.6292e-05   1.6999e-04
FNF(A)      1.4247      0.1999          0.0056       0.0013
FNF(B)      1.3617      0.2565          0.0065       0.0022
FNF(ALL)    1.3932      0.2262          0.0061       0.0018

RP (Equation 14) measures the relative convergence of the implemented GAs. These results cannot be generalized at this time because supporting data and analysis are lacking; addressing this shortfall is part of our future research. However, for both examples, after 100 generations GT indicates the GA was quite effective in converging PFknown to PFtrue. That is, looking at Figures 4 and 6, we see the trend for GT values moving towards zero error (i.e., they are within some epsilon of PFtrue).

Theorem 2: An EA using Pareto-based ranking and a monotonic selection algorithm converges to the global optimum of an MOP (a Pareto front F*, composed of at most an uncountably infinite number of vectors v1, v2, ...) with probability one.

Proof: Bäck proves (Bäck 1996, pg. 129) that an EA converges with probability one if it fulfills the following conditions: every state (population) must be reachable from any other state via mutation and recombination (15), and the sequence of best objective function values in the population must be monotone (16).

We assume an EA with infinite precision (see footnote 4), a minimization MOP, an uncountably infinite population size, and appropriate mutation and recombination operators allowing every point in the search space to be visited. Thus, for all states F and F' in the state space I, F' is reachable from F. This satisfies (15), and it remains to prove that the EA's population sequence is monotone. This situation occurs when the EA used is admissible (both fitness function and selection algorithm are monotonic). We note here that the fitness function is not a combination of the objective functions (see Section 3.2). Grefenstette defines a monotonic fitness function as one which does not reverse the sense of any pairwise solution ranking (Grefenstette 1997). When using Pareto-based fitness assignment, any given pair of Pareto optimal solutions receives identical fitness values; they also receive better fitness than dominated solutions. Any fitness function assigning fitness in this way is monotonic. A monotonic selection algorithm likewise respects the fitness of pairwise solutions as regards the expected number of offspring; any selection algorithm selecting individuals based on fitness is monotonic. All Pareto optimal solutions present in generation P(t) have the best fitness and are selected for inclusion in generation P(t + 1). Pareto optimal solutions whose associated vectors are non-dominated in generation P(t + 1) receive higher fitness than non-Pareto solutions. In either case, Pareto-based ranking ensures Equation 16 is satisfied. Thus, an EA using Pareto optimality-based ranking and a monotonic selection algorithm converges to the global optimum of an MOP (i.e., PFtrue) with probability one.
4 This assumption is based on our use of a real-valued EA; if the optimal solutions were rationals, we could use a finite representation.
Conclusions

We have shown an EA statistically converges to a given Pareto front and presented a theorem supporting the experimental evidence. We also introduced a consistent notation not reflected in the current literature, ensuring various Pareto concepts are clearly understood. We now plan to investigate how to make the EA more efficient, i.e., how to make it converge more quickly to, and uniformly across, PFtrue. Insight into examples of the types presented here may help in appropriate EA parameter selection. We plan to evaluate more complex multiobjective problems from this convergence perspective and apply insights to real-world MOP solving.

Acknowledgments

We thank Maj (Dr) Thomas Reid and the anonymous reviewers of an earlier draft.

References

Bäck, Thomas (1996). Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York.
Bäck, Thomas, Fogel, David and Michalewicz, Zbigniew (Eds.) (1997). Handbook of Evolutionary Computation. Vol. 1. IOP Publishing Ltd. and Oxford University Press.
Ben-Tal, Aharon (1980). Characterization of Pareto and lexicographic optimal solutions. In: Multiple Criteria Decision Making Theory and Application (G. Fandel and T. Gal, Eds.). Vol. 177 of Lecture Notes in Economics and Mathematical Systems, pp. 1-11. Springer-Verlag, Berlin.
Fonseca, Carlos M. and Peter J. Fleming (1995). An overview of evolutionary algorithms in multiobjective optimization. Evolutionary Computation 3(1), 1-16.
Fonseca, Carlos Manuel Mira (1995). Multiobjective Genetic Algorithms with Application to Control Engineering Problems. PhD thesis. The University of Sheffield, Sheffield, UK.
Grefenstette, John (1997). Rank-based selection. In: Handbook of Evolutionary Computation, Chap. C2.4, pp. C2.4:1-C2.4:6. Vol. 1 of Bäck et al. (1997).
Horn, Jeffrey (1997). Multicriterion decision making. In: Handbook of Evolutionary Computation, Chap. F1.9, pp. F1.9:1-F1.9:15. Vol. 1 of Bäck et al. (1997).
Hwang, Ching-Lai and Abu Syed Md. Masud (1979). Multiple Objective Decision Making - Methods and Applications. Springer-Verlag.
Pohlheim, Hartmut (1998). Genetic and evolutionary algorithm toolbox for use with MATLAB. Technical report. Technical University Ilmenau.
Rudolph, Günter (1998a). Evolutionary search for minimal elements in partially ordered finite sets. In: Proceedings of the Seventh Annual Conference on Evolutionary Programming. Springer, Berlin.
Rudolph, Günter (1998b). On a multi-objective evolutionary algorithm and its convergence to the Pareto set. In: Proceedings of the 1998 IEEE Conference on Evolutionary Computation. IEEE.
Srinivas, N. and Kalyanmoy Deb (1994). Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation 2(3), 221-248.