Multiobjective Optimization Using Evolutionary Algorithms

By Ivo F. Sbalzarini, Sibylle Müller AND Petros Koumoutsakos
1. Introduction
Evolutionary algorithms (EAs) such as evolution strategies and genetic algorithms
have become the method of choice for optimization problems that are too complex to be
solved using deterministic techniques such as linear programming or gradient (Jacobian)
methods. The large number of applications (Beasley (1997)) and the continuously growing interest in this field are due to several advantages of EAs compared to gradient based
methods for complex problems. EAs require little knowledge about the problem being
solved, and they are easy to implement, robust, and inherently parallel. To solve a certain
optimization problem, it is enough to require that one is able to evaluate the objective
(cost) function for a given set of input parameters. Because of their universality, ease
of implementation, and fitness for parallel computing, EAs often take less time to find
the optimal solution than gradient methods. However, most real-world problems involve the simultaneous optimization of several, often conflicting, objectives. Multiobjective EAs are able to find optimal trade-offs and deliver a set of solutions that are optimal in an overall sense. In multiobjective optimization, gradient-based methods are often impossible to apply. Multiobjective EAs, however, can always be applied, and they inherit all of the favorable properties of their single objective relatives.
Section 2 of this paper introduces the main concepts of single objective EAs. Section 3
extends these ideas to multiobjective cases and introduces the principles of dominance
and Pareto optimality. Section 4 describes the Strength Pareto Approach used in this
work, and in section 5 we extend it with a targeting capability. In section 6 the results
of both single and multiobjective optimization of a microchannel flow are shown and
discussed.
2. Evolutionary algorithms
An EA encodes the solution of the optimization problem as a vector of free parameters (design variables). One then creates a group of λ (λ > 0) different parameter vectors and considers it as a population of individuals. The quantity λ is called the population size.
size. The quality of a certain vector of parameters (i.e. an individual in the population)
is expressed in terms of a scalar valued fitness function (objective function). Depending
on whether one wants to minimize or maximize the objective function, individuals (i.e.
parameter vectors) with lower or greater fitness are considered better, respectively. The
algorithm then proceeds to choose the , ( < ) best individuals out of the population
to become the parents of the next generation (natural selection, survival of the fittest).
Therefore, denotes the number of parents. The smaller is chosen compared to , the
higher the selection pressure will be. Out of the individuals chosen to be parents for
the next generation, one then creates a new population of offspring xg+1
by applying
i
mutation on the parents xgj as follows:
xg+1
= xgj + N (0, )
i
, i = 1, . . . , , j {1, . . . , }
(2.1)
where N(0, Σ) denotes a vector of jointly distributed Gaussian random numbers with zero mean and covariance matrix Σ. The standard deviations σ_i (i.e. the square roots of the diagonal elements σ_i² of Σ) of the additive random numbers determine how far away from its parent a child will be and are called the step sizes of the mutation. Now, the first iteration
is completed and the algorithm loops back to the evaluation of the fitness function for
the new individuals. Several different techniques for adaptation and control of the step
size have been developed (see e.g. Bäck (1997a), Bäck (1997b), Bäck (1993), Hansen & Ostermeier (1996), or Hansen & Ostermeier (1997)). In the following subsections, some
of the single objective Evolution Strategies used in this work are outlined.
2.1. The (1+1)-ES
One of the simplest and yet powerful evolution strategies is the one plus one evolution
strategy, denoted by (1+1)-ES. In this strategy, both the number of parents and the population size (i.e. number of offspring) are set to one: μ = λ = 1. Mutation is accomplished by adding a vector of usually uncorrelated Gaussian random numbers, i.e. Σ = diag(σ_i²) is a diagonal matrix. Step size adaptation can be performed according to
Rechenberg's 1/5 rule: if less than 20% of the generations are successful (i.e. offspring
better than parent), then decrease the step size for the next generation; if more than
20% are successful, then increase the step size in order to accelerate convergence. This
adaptation is done every N·L_R generations, where N is the number of parameters (i.e. the dimension of the search space) and L_R is a constant, usually equal to one. Selection is done
out of the set union of parent and offspring, i.e. the better one of the two is chosen to
become the parent of the next generation.
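As an illustration, the following minimal Python sketch implements a (1+1)-ES with the 1/5 rule described above; the function name, the adaptation factor 0.85, and the evaluation budget are illustrative choices, not values prescribed by this work.

import numpy as np

def one_plus_one_es(f, x0, sigma0=0.1, generations=1000, lr=1):
    # Minimize f with a (1+1)-ES using Rechenberg's 1/5 success rule.
    x, sigma = np.asarray(x0, dtype=float), sigma0
    fx = f(x)
    n = len(x)                                  # N: dimension of the search space
    successes = 0
    for g in range(1, generations + 1):
        child = x + sigma * np.random.randn(n)  # mutation with N(0, sigma^2 I)
        fc = f(child)
        if fc < fx:                             # selection out of {parent, child}
            x, fx = child, fc
            successes += 1
        if g % (n * lr) == 0:                   # adapt every N*L_R generations
            if successes / (n * lr) < 0.2:
                sigma *= 0.85                   # few successes: shrink step size
            else:
                sigma /= 0.85                   # many successes: grow step size
            successes = 0
    return x, fx

# example: one_plus_one_es(lambda x: np.sum(x**2), np.ones(11))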
2.2. The (μ, λ)-ES
A slightly more advanced method is to take one or more parents and even more offspring, i.e. μ ≥ 1 and λ > μ. Mutation is accomplished in a similar way as in the (1+1)-ES.
Besides the 1/5 rule, another method for step size adaptation becomes available which
is called self-adaptive mutation (Bäck (1997a)). In this method, the mutation steps are
adapted every generation. They are either increased, decreased, or kept the same, each
with a probability of 1/3. On the average, 1/3 of the offspring will now be closer to their
parents than before, 1/3 keeps progressing at the same speed, and 1/3 explores further
areas. Depending on how far away from the optimum we currently are, one of these three
groups will do better than the others and, therefore, more individuals out of it will be selected to the next generation, where their step sizes are inherited. The algorithm adapts
the step size by itself, i.e. by means of mutation and selection.
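The following sketch shows one generation of such self-adaptive mutation. Whether the 1/3 choice is made per parameter or per individual is not specified above, so the per-parameter variant is an assumption here, as are the factors 1.5 and 0.5 (taken from the SPEA implementation in section 4).

import numpy as np

def self_adaptive_offspring(parents, steps, lam):
    # parents: (mu, n) array, steps: (mu, n) array of individual step sizes.
    mu, n = parents.shape
    offspring = np.empty((lam, n))
    new_steps = np.empty((lam, n))
    for i in range(lam):
        j = np.random.randint(mu)                           # pick a parent
        factor = np.random.choice([1.5, 1.0, 0.5], size=n)  # grow/keep/shrink, p = 1/3 each
        new_steps[i] = steps[j] * factor                    # offspring inherits mutated steps
        offspring[i] = parents[j] + new_steps[i] * np.random.randn(n)
    return offspring, new_steps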
2.3. The (μ/μ_I, λ)-CMA-ES
The covariance matrix adaptation (CMA) is a sophisticated method for online adaptation
of step sizes in (μ, λ)-ES with intermediate recombination (i.e. averaging of parents). It
was first described by Hansen & Ostermeier (1996) and further improved and evaluated
by Hansen & Ostermeier (1997). For a complete description of the algorithm, the reader
is referred to the latter publication. The basic idea is to adapt step sizes and covariances
in such a way that the longest axis of the hyperellipsoid of mutation distribution always
aligns in the direction of greatest estimated progress. This is done by accumulating
information about former mutation steps and their success (evolution path) and searching
it for correlations. Besides this very sophisticated method for step size adaptation, a
CMA-ES also includes mutation (with Σ now being a full covariance matrix) and selection.
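For a flavor of the mechanism, here is a deliberately simplified Python sketch of a rank-one covariance update driven by an evolution path. It omits the global step size control and the exact constants of Hansen & Ostermeier (1997); all parameter values are illustrative, and minimization is assumed.

import numpy as np

def cma_like_step(mean, C, sigma, path, f, lam=12, mu=3, cc=0.1, c1=0.1):
    # Sample lambda offspring from N(mean, sigma^2 C).
    n = len(mean)
    A = np.linalg.cholesky(C)                  # C = A A^T
    z = np.random.randn(lam, n)
    X = mean + sigma * z @ A.T
    best = np.argsort([f(x) for x in X])[:mu]  # select the mu best (minimization)
    new_mean = X[best].mean(axis=0)            # intermediate recombination
    # Evolution path: exponentially smoothed history of successful steps.
    path = (1 - cc) * path + cc * (new_mean - mean) / sigma
    # Rank-one update: stretch C along the direction of accumulated progress.
    C = (1 - c1) * C + c1 * np.outer(path, path)
    return new_mean, C, path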
3. Multiobjective optimization
3.1. Basic concepts
A general multiobjective optimization problem can be described as a vector function f that maps a tuple of m parameters (decision variables) to a tuple of n objectives; without loss of generality, maximization is assumed:

maximize    y = f(x) = (f1(x), . . . , fn(x))
subject to  x = (x1, . . . , xm) ∈ X ,   y = (y1, . . . , yn) ∈ Y   (3.1)
and where x is called the decision (parameter) vector, X the parameter space, y the objective vector, and Y the objective space. A decision vector a ∈ X is said to dominate a decision vector b ∈ X (also written as a ≻ b) if and only if:

∀i ∈ {1, . . . , n} : fi(a) ≥ fi(b)   and   ∃j ∈ {1, . . . , n} : fj(a) > fj(b)   (3.2)

Additionally, we say a covers b (a ⪰ b) if and only if a ≻ b or f(a) = f(b).
Based on the relation of dominance, nondominated and Pareto-optimal solutions are defined as follows:
(a) A decision vector a ∈ X is said to be nondominated regarding a set X′ ⊆ X if and only if there is no vector in X′ that dominates it, i.e.

∄ a′ ∈ X′ : a′ ≻ a   (3.3)

(b) The decision (parameter) vector a is called Pareto-optimal if and only if a is nondominated regarding the whole parameter space X.
If the set X′ is not explicitly specified, the whole parameter space X is implied.
Pareto-optimal parameter vectors cannot be improved in any objective without causing
a degradation in at least one of the other objectives. They represent in that sense globally
optimal solutions. Note that a Pareto-optimal set does not necessarily contain all Pareto-optimal solutions in X. The set of objective vectors f(a*), a* ∈ X*, corresponding to a set of Pareto-optimal parameter vectors a* ∈ X*, is called the Pareto-optimal front or Pareto-front.
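In Python, the dominance relation of eq. (3.2) and the extraction of the nondominated subset of a finite set of objective vectors can be written compactly (a sketch, using the maximization convention of eq. (3.2)):

import numpy as np

def dominates(fa, fb):
    # a dominates b: no worse in every objective, strictly better in at least one.
    return bool(np.all(fa >= fb) and np.any(fa > fb))

def nondominated(F):
    # Indices of the nondominated objective vectors in the list F.
    return [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)]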
3.2. Difficulties in multiobjective optimization
In extending the ideas of single objective EAs to multiobjective cases, two major problems
must be addressed:
1. How to accomplish fitness assignment and selection in order to guide the search
towards the Pareto-optimal set.
2. How to maintain a diverse population in order to prevent premature convergence
and achieve a well distributed, wide spread trade-off front.
Note that the objective function itself no longer qualifies as fitness function since it
is vector valued and fitness has to be a scalar value. Different approaches to relate the
fitness function to the objective function can be classified with regard to the first issue.
For further information, the reader is referred to Horn (1997). The second problem is
usually solved by introducing elitism and intermediate recombination. Elitism is a way
to ensure that good individuals do not get lost (by mutation or set reduction), simply by storing them away in an external set, which only participates in selection. Intermediate
recombination, on the other hand, averages the parameter vectors of two parents in order
to generate one offspring according to:
x̃_j = ξ x_{j1}^g + (1 − ξ) x_{j2}^g ,   j, j1, j2 ∈ {1, . . . , μ}
x_i^{g+1} = x̃_j + N(0, Σ) ,   i = 1, . . . , λ ,   j ∈ {1, . . . , μ}   (3.4)

where ξ ∈ [0, 1] is the recombination weight.
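A sketch of eq. (3.4) in Python; drawing the parent indices uniformly at random, inheriting the step sizes of the first parent, and setting ξ = 0.5 are illustrative assumptions:

import numpy as np

def recombine_and_mutate(parents, steps, lam, xi=0.5):
    # parents, steps: (mu, n) arrays; returns (lam, n) offspring.
    mu, n = parents.shape
    offspring = np.empty((lam, n))
    for i in range(lam):
        j1, j2 = np.random.randint(mu, size=2)               # two parents
        xtilde = xi * parents[j1] + (1 - xi) * parents[j2]   # intermediate recombination
        offspring[i] = xtilde + steps[j1] * np.random.randn(n)  # mutation
    return offspring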
4. The Strength Pareto Evolutionary Algorithm (SPEA)
In SPEA (Zitzler & Thiele (1999)), each member i of the external nondominated set P′ is assigned a strength s_i, and the fitness of a population member j is one plus the sum of the strengths of all external members that cover it:

f_j = 1 + Σ_{i, i ⪰ j} s_i ,   f_j ∈ [1, N)   (4.1)
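A sketch of this fitness assignment in Python, following Zitzler & Thiele (1999); the maximization convention of eq. (3.2) is assumed, and lower fitness values are better:

import numpy as np

def spea_fitness(pop_F, ext_F):
    # pop_F, ext_F: objective vectors of the population P and external set P'.
    N = len(pop_F)
    covers = lambda fa, fb: bool(np.all(fa >= fb))   # a covers b
    # Strength of external member i: covered fraction of the population.
    s = [sum(covers(fe, fp) for fp in pop_F) / (N + 1) for fe in ext_F]
    # Fitness of population member j per eq. (4.1).
    f_pop = [1 + sum(si for si, fe in zip(s, ext_F) if covers(fe, fp))
             for fp in pop_F]
    return s, f_pop          # external members use their strength as fitness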
The mating pool is then filled by binary tournament selection:
Step 1: Randomly (uniformly distributed random numbers) select two individuals out
of the population P .
Step 2: Copy the one with the better (i.e. lower for SPEA) fitness value to the mating
pool.
Step 3: If the mating pool is full, then stop, else go to Step 1.
Adaptation of the step sizes was done using the self-adaptive mutation method (cf. section 2.2). Each element of P and P′ is assigned an individual step size for every parameter, i.e. Σ = diag(σ_i²) is a diagonal matrix for each individual. The step sizes of all members of the mating pool are then either increased by 50%, cut to half, or kept the same, each with a probability of 1/3.
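The binary tournament of Steps 1-3 can be sketched as follows; the fitness array is assumed to cover P and P′ jointly, and drawing with replacement is an assumption:

import numpy as np

def binary_tournament(pool_size, fitness):
    # Fill the mating pool by repeated binary tournaments (Steps 1-3).
    n, pool = len(fitness), []
    while len(pool) < pool_size:                          # Step 3: until pool is full
        i, j = np.random.randint(n, size=2)               # Step 1: two random individuals
        pool.append(i if fitness[i] < fitness[j] else j)  # Step 2: lower fitness wins
    return pool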
4.3. Reduction by clustering
In Step 5, the number of externally stored nondominated solutions is limited to some number N′. This is necessary because otherwise P′ would grow indefinitely, since there generally is an infinite number of points along the Pareto-front. Moreover, one wants to
be able to control the number of proposed possible solutions because, from a decision maker's point of view, a few points along the front are often enough. A third reason for
introducing clustering is the distribution of solutions along the Pareto-front. In order
to explore as much of the front as possible, the nondominated members of P′ should
be equally distributed along the Pareto-front. Without clustering, the fitness assignment
method would probably be biased towards a certain region of the search space, leading to
an unbalanced distribution of the solutions. For this work, the average linkage method, a
clustering algorithm which has proven to perform well on Pareto optimization, has been
chosen. The reader is referred to Morse (1980) or Zitzler & Thiele (1999) for details.
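A sketch of such pruning by average linkage clustering; keeping, from each final cluster, the member closest to the cluster centroid is one common choice and an assumption here:

import numpy as np

def prune_by_clustering(F, n_max):
    # F: (n, d) array of objective vectors of the external set; returns indices to keep.
    clusters = [[i] for i in range(len(F))]
    while len(clusters) > n_max:
        best, pair = np.inf, None
        for a in range(len(clusters)):              # find the pair of clusters
            for b in range(a + 1, len(clusters)):   # with minimal average linkage
                d = np.mean([np.linalg.norm(F[i] - F[j])
                             for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)              # merge the closest pair
    keep = []
    for c in clusters:                              # one representative per cluster
        centroid = np.mean([F[i] for i in c], axis=0)
        keep.append(min(c, key=lambda i: np.linalg.norm(F[i] - centroid)))
    return keep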
Table 1. Parameter settings of the algorithm; among them, 250 generations and the target value (0.5, 0.7).
5. Targeting
For practical applications it is often desirable to guide the search towards a particular region of the trade-off front. The SPEA was therefore extended with a targeting capability as follows:
1. The user specifies a target point in objective space.
2. In every generation, the algorithm calculates the distances of all members of P′ to the target and picks the one with minimal distance into P_best. At all times, P_best contains only one solution.
3. At the end of the algorithm, not only the Pareto-front is output but also the solution
stored in P_best. Note: due to clustering and removal in P′, the solution in P_best is not necessarily contained in P′. It is, therefore, an optimal solution which otherwise would
not have appeared in the output.
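The bookkeeping for P_best can be sketched as follows (function and variable names are illustrative):

import numpy as np

def update_pbest(pbest, ext_set, ext_F, target):
    # pbest: (solution, distance) pair or None; survives clustering of P'.
    dists = [np.linalg.norm(f - target) for f in ext_F]
    i = int(np.argmin(dists))                  # member of P' closest to the target
    if pbest is None or dists[i] < pbest[1]:   # replace only if strictly closer
        return (ext_set[i], dists[i])
    return pbest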
The algorithm has been implemented and tested for convex and nonconvex test functions. Figures 1 to 4 show some results for the nonconvex test function T2 as proposed in
Zitzler & Thiele (2000):
Minimize    T(x) = (f1(x1), f2(x))
subject to  f2(x) = g(x2, . . . , xm) · h(f1(x1), g(x2, . . . , xm))
where       x = (x1, . . . , xm)

f1(x1) = x1
g(x2, . . . , xm) = 1 + 9 · ( Σ_{i=2}^{m} xi ) / (m − 1)
h(f1, g) = 1 − (f1/g)²   (5.1)
where m is the dimension of the parameter space and xi ∈ [0, 1]. The exact Pareto-optimal
front is given by g(x) = 1. The parameters of the algorithm were set as summarized in
table 1.
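For reference, a Python transcription of T2 from eq. (5.1):

import numpy as np

def t2(x):
    # Nonconvex test function T2; x_i in [0, 1], m = len(x); to be minimized.
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1 + 9 * np.sum(x[1:]) / (len(x) - 1)
    f2 = g * (1 - (f1 / g) ** 2)               # f2 = g * h(f1, g)
    return np.array([f1, f2])

The exact front g(x) = 1 is attained for x2 = · · · = xm = 0.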
The chosen target value is slightly off-front. Therefore, the targeting error will never
be zero. Figure 1 shows the final population after 250 generations without targeting. The
diamonds indicate members of the external nondominated set (Pareto-optimal front)
whereas members of the regular population are denoted by crosses. In Fig. 2 the same
run has been repeated with targeting. Figure 3 shows the targeting error as a function
of the generation number. The dashed line indicates the theoretical minimum of the
distance. After about 80 to 100 generations, the point on the front which is closest to the
target has been found with good accuracy. Figure 4 shows the path of Pbest towards the
target. The jumps are due to the fact that the individual stored in Pbest gets replaced as
soon as another individual is closer to the target.
The best objective value that was achieved was f(P_best) = (0.5265, 0.7247); its Euclidean distance from the target is 3.6287 · 10⁻², which is equal to the theoretical minimal distance within the given computational accuracy.
Figure 1. Final population after 250 generations without targeting (f1 vs. f2); diamonds: members of the external nondominated set, crosses: members of the regular population.
Figure 2. The same run repeated with targeting (f1 vs. f2).
Figure 3. Targeting error (distance to target) vs. generation number; the dashed line indicates the theoretical minimum of the distance.
Figure 4. Path of P_best towards the target in objective space (f1 vs. f2).
6. Optimization of a microchannel flow
6.1. Single objective optimization
The single objective to be minimized is

J = ∫_Ω (a(x, T) − U(x)) dx   (6.1)
Figure 5. Cost function vs. number of function calls for the (1+1)-ES and the (3,12)-CMA-ES, averaged over 5 runs.
with Ω being the cross section of the channel exit. The shape of the 90-degree turn is
described by 11 parameters. Therefore, the parameter search space is of dimension 11.
The objective space is scalar since it is a single objective problem.
The calculation of the flow field and evaluation of the objective function was done
by an external flow solver provided by Mohammadi, Molho & Santiago (2000). Both
a (1+1)-ES and a (3,12)-CMA-ES were applied to the problem and their convergence
was compared. The results were statistically averaged from 5 runs with different initial
conditions, i.e. starting points.
Since the CMA-ES has a population size of 12, it performs 12 function evaluations per
generation. Figure 5 shows the convergence normalized to the same number of function
calls. Figures 6 and 7 show the corresponding solutions after 20 and 180 generations of
the best 1+1 run out of the ensemble (the lines are iso-potential lines of the electric
field). After 20 generations the contour of the channel develops a clearly visible dent.
After 80 evaluations of the objective function, the algorithm has found a double-bump
shape to be even better, and after 180 calls to the solver, the optimum has been reached.
The value of the objective function has dropped to about 10⁻⁶ for the best run out of
the ensemble. This means that dispersion is almost zero and the channel will have very
good separation properties.
6.2. Multiobjective optimization
We then introduced the total deformation of the channel contour as a second objective
to be minimized simultaneously in order to minimize manufacturing costs. The second
objective was thus given by:
K = Σ_{i=1}^{11} p_i²   (6.2)
where p_i are the shape parameters of the channel as introduced by Mohammadi, Molho
& Santiago (2000). The first objective remained unchanged. The algorithm used for this
optimization was a SPEA with a population size of 20, a maximum size of the external nondominated set of 30, and a mating pool of size 10.

Figure 6. Optimized channel shape after 20 function evaluations of the best (1+1)-ES run (the lines are iso-potential lines of the electric field).
Figure 7. Optimized channel shape after 180 function evaluations of the best (1+1)-ES run.
Figure 8. Pareto-optimal trade-off front (skew vs. deformation) after 80 generations.
Figure 8 shows the Pareto-optimal trade-off front after 80 generations of the algorithm,
and Figs. 9 and 10 show the corresponding solutions, i.e. optimized shapes of the channel.
One is now free to choose whether to go for minimal skewness at the expense of a higher deformation (cf. Fig. 9), choose some intermediate result, or minimize deformation in order to reduce manufacturing costs and still get the lowest skewness possible with the given amount of deformation (cf. Fig. 10).
The results obtained with evolutionary optimization are comparable to those of the gradient-based method. However, far less mathematics and fewer complex formulas were involved, which leads to greater flexibility and shorter time-to-solution.
Figure 9. Channel shape for minimal skewness at higher deformation.
Figure 10. Channel shape for minimal deformation.
REFERENCES
Bäck, T. 1993 Optimal mutation rates in genetic search. In Forrest, S., editor, Proceedings of the 5th International Conference on Genetic Algorithms, Urbana-Champaign, IL, July 1993. Morgan Kaufmann, San Mateo, CA, 2-8.
Bäck, T. 1997a Self-adaptation. In Bäck, T., Fogel, D. B. & Michalewicz, Z., editors, Handbook of Evolutionary Computation, 97/1, C7.1, IOP Publishing Ltd. and Oxford University Press.
Bäck, T. 1997b Mutation parameters. In Bäck, T., Fogel, D. B. & Michalewicz, Z., editors, Handbook of Evolutionary Computation, 97/1, E1.2, IOP Publishing Ltd. and Oxford University Press.
Beasley, D. 1997 Possible applications of evolutionary computation. In Bäck, T., Fogel, D. B. & Michalewicz, Z., editors, Handbook of Evolutionary Computation, 97/1, A1.2, IOP Publishing Ltd. and Oxford University Press.
Coello, C. A. C. 1999 A comprehensive survey of evolutionary-based multiobjective
optimization. Knowledge and Information Systems. 1(3), 269-308.
Fonseca, C. M. & Fleming, P. J. 1995 An overview of evolutionary algorithms in
multiobjective optimization. Evolutionary Computation. 3(1), 1-16.
Hansen, N. & Ostermeier, A. 1996 Adapting Arbitrary Normal Mutation Distributions in Evolution Strategies: The Covariance Matrix Adaptation. Proceedings of
the 1996 IEEE International Conference on Evolutionary Computation (ICEC 96),
312-317.
Hansen, N. & Ostermeier, A. 1997 Convergence Properties of Evolution Strategies with the Derandomized Covariance Matrix Adaptation: The (μ/μ_I, λ)-CMA-ES. Proceedings of the 5th European Conference on Intelligent Techniques and Soft Computing (EUFIT '97), 650-654.
Horn, J. 1997 Multicriteria decision making. In Bäck, T., Fogel, D. B. & Michalewicz, Z., editors, Handbook of Evolutionary Computation, 97/1, F1.9, IOP Publishing Ltd. and Oxford University Press.
Jonathan, M., Zebulum, R. S., Pacheco, M. A. C., & Vellasco, M. B. R.
2000 Multiobjective Optimization Techniques: A Study of the Energy Minimization Method and Its Application to the Synthesis of OTA Amplifiers. Proceedings of the
second NASA/DoD Workshop on Evolvable Hardware, July 13-15, 2000, Palo Alto,
CA, 133-140.
Mohammadi, B., Molho, J. I. & Santiago, J. A. 2000 Design of Minimal Dispersion
Fluidic Channels in a CAD-Free Framework. Summer Program Proceedings. Center
for Turbulence Research, NASA Ames/Stanford Univ. This volume.
Morse, J. N. 1980 Reducing the size of the nondominated set: Pruning by clustering.
Computers and Operations Research. 7(1-2), 55-66.
Schaffer, J. D. 1984 Multiple Objective Optimization with Vector Evaluated Genetic
Algorithms. Unpublished Ph.D. thesis, Vanderbilt University, Nashville, Tennessee.
Schaffer, J. D. 1985 Multiple objective optimization with vector evaluated genetic
Algorithms. Proceedings of an International Conference on Genetic Algorithms and
Their Applications, sponsored by Texas Instruments and the U.S. Navy Center for
Applied Research in Artificial Intelligence (NCARAI), 93-100.
Van Veldhuizen, D. A. & Lamont, G. B. 1998 Multiobjective evolutionary algorithm research: A history and analysis. Technical Report TR-98-03, Department of
Electrical and Computer Engineering, Graduate School of Engineering, Air Force
Institute of Technology, Wright-Patterson AFB, Ohio.
Zitzler, E. & Thiele, L. 1999 Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on
Evolutionary Computation. 3(4), 257-271.
Zitzler, E. & Thiele, L. 2000 Comparison of Multiobjective Evolutionary Algorithms:
Empirical Results. Evolutionary Computation. 8(2), 173-195.