Optimization for
Engineering Design
Algorithms and Examples
SECOND EDITION
KALYANMOY DEB
Department of Mechanical Engineering
Indian Institute of Technology Kanpur
2012
OPTIMIZATION FOR ENGINEERING DESIGN—Algorithms and Examples, Second Edition
Kalyanmoy Deb
© 2012 by PHI Learning Private Limited, New Delhi. All rights reserved. No part of this book
may be reproduced in any form, by mimeograph or any other means, without permission in
writing from the publisher.
ISBN: 978-81-203-4678-9
The export rights of this book are vested solely with the publisher.
Published by Asoke K. Ghosh, PHI Learning Private Limited, M-97, Connaught Circus,
New Delhi-110001 and Printed by Rajkamal Electric Press, Plot No. 2, Phase IV, HSIDC,
Kundli-131028, Sonepat, Haryana.
To
My Parents
Contents
Preface  xi
Preface to the First Edition  xiii
Acknowledgements  xvii

Preface
The first edition of this book, which was published in 1995, has been well
tested at IIT Kanpur and at many other universities over the past 17 years.
It is unusual for the second edition of a book to be published after so many
years, but it is the nature of the book that prompted me to wait until there
was enough feedback from students and teachers before sitting down to revise
the first edition. The optimization algorithms laid out in this book do not
change with time, although their explanations and presentations could always
be improved. The feedback I received from several of my students and a large
number of instructors has been positive, so I did not have much motivation
to revise the book in a major way. The simplified presentation of optimization
algorithms remains a hallmark feature of this book. A few topics of
optimization were purposefully left out of the first edition, and I have now
included them in this edition. Specifically, a section on quadratic programming
and its extension to sequential quadratic programming has been added.
Genetic algorithms (GAs) for optimization have developed significantly in the
past 17 years, and an account of all the current GA methods would be a book
of its own. I could not, however, resist including some details on real-parameter
GAs and multi-objective optimization. Readers interested in knowing more
about GAs are encouraged to refer to the most recent books and conference
proceedings on the topic.
A major modification has been made to the Linear Programming (LP)
chapter in the Appendix. Several methods, including sensitivity analysis
procedures, have been added so that students can get a comprehensive idea
of different LP methods. While making the modifications, the simplicity
of the algorithms, as presented in the first edition, has been retained.
Finally, more exercise problems have been added, not only to this chapter
but to all other chapters of this revised book.
Kalyanmoy Deb
Preface to the First Edition
The person who introduced me to the field of optimization and who has had
a significant role in moulding my career is Professor David E. Goldberg of
the University of Illinois at Urbana-Champaign. Over a lunch table, he once
made me understand that probably the most effective way of communicating
one's ideas is through books. That discussion certainly motivated me to
take up this project. The main inspiration for writing this book came
from Professor Amitabha Ghosh, Mechanical Engineering Department, IIT
Kanpur, when in one tutorial class I showed him the fifty-page handout
I had prepared for my postgraduate course entitled "Optimization Methods in
Engineering Design". Professor Ghosh looked at the handout and encouraged
me to revise it in the form of a textbook. Although it took me about a
year and a half to execute that revision, I have enjoyed every bit of the
experience.
Most of the algorithms presented in this text are collected from various
books and research papers related to engineering design optimization. My
sincere thanks and appreciation are due to all authors of those books and
papers. I have been particularly influenced by the concise and algorithmic
approach adopted in the book entitled ‘Engineering Optimization–Methods
and Applications’ by G.V. Reklaitis, A. Ravindran, and K.M. Ragsdell.
Many algorithms presented here are modified abstractions from that book.
I am also grateful to Professor Brahma Deo and Dr. Partha Chakroborty
for their valuable comments which significantly improved the contents of this
book. The computer facility of the Computer Aided Design (CAD) Project,
generously provided by Professor Sanjay Dhande, is highly appreciated. My
special thanks are due to two of my students N. Srinivas and Ram Bhusan
Agrawal for helping me in drawing some of the diagrams and checking
some exercise problems. The computer expertise provided by P.V.M. Rao,
Samir Kulkarni, and Sailesh Srivastava in preparing one of the computer
codes is also appreciated. Discussions with Professors David Blank and
M.P. Kapoor on different issues of optimization were also helpful. I am
thankful to my colleagues and staff of the CAD Project for their constant
support.
It would have taken at least twice the time to complete this book had I
not had the privilege of meeting Dr. Subhransu Roy, who generously
provided me with his text-writing and graph-plotting software. My visits
to TELCO, TISCO, Hindustan Motors and Engineers India Ltd, and the
discussions I had with many design engineers were valuable in writing
some of the chapters. The financial assistance provided by the Continuing
Education Centre at the Indian Institute of Technology Kanpur to partially
compensate for the preparation of the manuscript is gratefully acknowledged.
I also wish to thank the Publishers, PHI Learning for the meticulous care
they took in processing the book.
This book could not have been complete without the loving support and
encouragement of my wife, Debjani. Her help in typing a significant portion
of the manuscript, in proofreading, and in preparing the diagrams has always
kept me on schedule. Encouragements from my two children, Debayan and
Dhriti, have always motivated me. Finally, I take this opportunity to
express my gratitude to my parents, Late Sri Kumud Chandra Deb and
Mrs. Chaya Deb, and my loving affection to my brothers—Asis, Debasis,
and Subhasis.
Kalyanmoy Deb
1
Introduction
1.1.2 Constraints
Having chosen the design variables, the next task is to identify the constraints
associated with the optimization problem. The constraints represent
functional relationships among the design variables and other design
parameters that satisfy certain physical phenomena and certain resource
limitations. Some of these considerations require that the design remain in
static or dynamic equilibrium. In many mechanical and civil engineering
problems, the constraints are formulated to satisfy stress and deflection
limitations. Often, a component needs to be designed in such a way that
it can be placed inside a fixed housing, thereby restricting the size of the
component. There is, however, no unique way to formulate a constraint in
all problems. The nature and number of constraints to be included in the
formulation depend on the user. In many algorithms discussed in this book,
it is not necessary to have an explicit mathematical expression of a constraint;
but an algorithm or a mechanism to calculate the constraint is mandatory. For
example, a mechanical engineering component design problem may involve
a constraint restricting the maximum stress developed anywhere in the
component to within the strength of the material. In an irregular-shaped component,
there may not exist an exact mathematical expression for the maximum stress
developed in the component. A finite element simulation software may be
necessary to compute the maximum stress. But the simulation procedure and
the necessary input to the simulator and the output from the simulator must
be understood at this step.
There are usually two types of constraints that emerge from most
considerations. Either the constraints are of an inequality type or of an
equality type. Inequality constraints state that the functional relationships
among design variables are either greater than, smaller than, or equal to,
a resource value. For example, the stress (σ(x)) developed anywhere in a
component must be smaller than or equal to the allowable strength (Sallowable )
of the material. Mathematically,
σ(x) ≤ Sallowable .
Most of the constraints encountered in engineering design problems are of this
type. Some constraints may be of the greater-than-or-equal-to type: for example,
the natural frequency (ν(x)) of a system may be required to be greater than
2 Hz, or mathematically, ν(x) ≥ 2. Fortunately, one type of inequality
constraints can be transformed into the other type by multiplying both sides
by −1 or by interchanging the left and right sides. For example, the former
constraint can be transformed into a greater-than-equal-to type by either
−σ(x) ≥ −Sallowable or Sallowable ≥ σ(x).
1.1.3 Objective Function
The third task in the formulation procedure is to find the objective function
in terms of the design variables and other problem parameters. Common
engineering objectives involve minimization of the overall cost of manufacturing,
minimization of the overall weight of a component, maximization of the net
profit earned, maximization of the total life of a product, and others. Although
most of the above objectives can be quantified (expressed in a mathematical
form), there are some objectives that may not be quantified easily. For
example, the esthetic aspect of a design, ride characteristics of a car suspension
design, and reliability of a design are important objectives that one may be
interested in maximizing in a design, but the exact mathematical formulation
may not be possible. In such a case, usually an approximating mathematical
expression is used. Moreover, in any real-world optimization problem, there
could be more than one objective that the designer may want to optimize
simultaneously. Even though a few multi-objective optimization algorithms
exist in the literature (Chankong and Haimes, 1983), they are complex and
computationally expensive. Thus, in most optimal design problems, multiple
objectives are avoided. Instead, the designer chooses the most important
objective as the objective function and treats the remaining objectives as
constraints by restricting their values within certain limits.
1.1.4 Variable Bounds
The final task of the formulation procedure is to set the minimum and the
maximum bounds on each design variable. Certain optimization algorithms
do not require this information. In these problems, the constraints completely
surround the feasible region. Other problems require this information in order
to confine the search algorithm within these bounds. In general, all N design
variables are restricted to lie within the minimum and the maximum bounds
as follows:
$$x_i^{(L)} \le x_i \le x_i^{(U)} \qquad \text{for } i = 1, 2, \ldots, N.$$
Figure 1.2 Illustration of the duality principle. The maximum point of f (x) is the
same as the minimum point of F (x).
In any given problem, the determination of the variable bounds $x_i^{(L)}$ and
$x_i^{(U)}$ may be difficult. One way to remedy this situation is to make a guess
about the optimal solution and set the minimum and maximum bounds so
that the optimal solution lies within these two bounds. After simulating the
optimization algorithm once, if the optimal solution is found to lie within
the chosen variable bounds, there is no problem. On the other hand, if any
design variable corresponding to the optimal solution is found to lie on or
near the minimum or the maximum bound, the chosen bound may not be
correct. The chosen bound may be readjusted and the optimization algorithm
may be simulated again. Although this strategy may seem to work only with
linear problems, it has been found useful in many real-world engineering
optimization problems.
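This trial-and-error strategy is easy to automate. The following is a minimal sketch under stated assumptions: `run_optimizer(lo, hi)` is a hypothetical stand-in for any bounded optimizer that returns the best point found within the given bounds, and the tolerance and widening factor are illustrative choices only.

```python
import numpy as np

def optimize_with_bound_adjustment(run_optimizer, lo, hi, tol=1e-2, max_restarts=5):
    """Re-run an optimizer, widening any bound the solution presses against.

    run_optimizer(lo, hi) -> best point found within [lo, hi] (hypothetical
    stand-in for any bounded optimization routine).
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = None
    for _ in range(max_restarts):
        x = np.asarray(run_optimizer(lo, hi), float)
        width = hi - lo
        at_lo = x - lo < tol * width        # solution sits on/near a lower bound
        at_hi = hi - x < tol * width        # solution sits on/near an upper bound
        if not (at_lo.any() or at_hi.any()):
            return x                        # interior optimum: bounds were adequate
        lo = np.where(at_lo, lo - width, lo)   # widen only the offending bounds
        hi = np.where(at_hi, hi + width, hi)
    return x
```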
After the above four tasks are completed, the optimization problem
can be mathematically written in a special format, known as nonlinear
programming (NLP) format. Denoting the design variables as a column
vector$^1$ $\mathbf{x} = (x_1, x_2, \ldots, x_N)^T$, the objective function as a scalar quantity
$f(\mathbf{x})$, the $J$ inequality constraints as $g_j(\mathbf{x}) \ge 0$, and the $K$ equality constraints as
$h_k(\mathbf{x}) = 0$, the NLP problem can be stated as shown below.
$^1$ The representation of the design variables in the above column vector helps
to achieve some matrix operations in certain multivariable optimization methods
described in Chapters 3 and 4.
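Collecting the four formulation tasks in this notation, the NLP problem reads:
$$\begin{aligned}
\text{Minimize} \quad & f(\mathbf{x}) \\
\text{subject to} \quad & g_j(\mathbf{x}) \ge 0, && j = 1, 2, \ldots, J;\\
& h_k(\mathbf{x}) = 0, && k = 1, 2, \ldots, K;\\
& x_i^{(L)} \le x_i \le x_i^{(U)}, && i = 1, 2, \ldots, N.
\end{aligned}$$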
(A1 to A7 ). Using the symmetry of the truss structure and loading, we observe
that for the optimal solution, A7 = A1 , A6 = A2 , and A5 = A3 . Thus, there
are practically four design variables (A1 to A4 ). This completes the first task
of the optimization procedure.
The next task is to formulate the constraints. In order for the truss to
carry the given load P = 2 kN, the tensile and compressive stress generated
in each member must not be more than the corresponding allowable strength
Syt and Syc of the material. Let us assume that the material strength for all
elements is Syt = Syc = 500 MPa and the modulus of elasticity E = 200 GPa.
For the given load, we can compute the axial force generated in each element
(Table 1.1). The positive force signifies tensile load and the negative force
signifies compressive load acting on the member.
Table 1.1 Axial force in each member (partially recovered): member AB carries $-\frac{P}{2}\csc\theta$ and member BC carries $+\frac{P}{2}\csc\alpha$.
Thereafter, the axial stress can be calculated by dividing the axial load
by the cross-sectional area of that member. Thus, the first set of constraints
can be written as
$$\frac{P\csc\theta}{2A_1} \le S_{yc}, \qquad
\frac{P\cot\theta}{2A_2} \le S_{yt}, \qquad
\frac{P\csc\alpha}{2A_3} \le S_{yt}, \qquad
\frac{P}{2A_4}\,(\cot\theta + \cot\alpha) \le S_{yc}.$$
In the above truss structure, tan θ = 1.0 and tan α = 2/3. The other set of
constraints arises from the stability consideration of the compression members
AB, BD, and DE. Realizing that each of these members is connected by pin
joints, we can write the Euler buckling conditions for the axial load in members
AB and BD (Shigley, 1986):
$$\frac{P}{2\sin\theta} \le \frac{\pi E A_1^2}{1.281\,\ell^2}, \qquad
\frac{P}{2}\,(\cot\theta + \cot\alpha) \le \frac{\pi E A_4^2}{5.76\,\ell^2}.$$
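A candidate design can be screened against these constraints directly. The following is a minimal sketch assuming SI units and an illustrative member length scale of $\ell = 1$ m (the text does not fix $\ell$ numerically); it returns the margins of any violated constraints.

```python
import math

P, Syt, Syc, E = 2e3, 500e6, 500e6, 200e9      # N, Pa, Pa, Pa (as given above)
theta, alpha = math.atan(1.0), math.atan(2/3)  # tan(theta) = 1, tan(alpha) = 2/3
l = 1.0                                        # assumed length scale (m), illustrative

def constraint_violations(A):
    """A = (A1, A2, A3, A4) cross-sectional areas in m^2."""
    A1, A2, A3, A4 = A
    g = [
        Syc - P/(2*A1*math.sin(theta)),                          # stress, member 1
        Syt - P/(2*A2*math.tan(theta)),                          # stress, member 2
        Syt - P/(2*A3*math.sin(alpha)),                          # stress, member 3
        Syc - (P/(2*A4))*(1/math.tan(theta) + 1/math.tan(alpha)),
        math.pi*E*A1**2/(1.281*l**2) - P/(2*math.sin(theta)),    # Euler buckling, AB
        math.pi*E*A4**2/(5.76*l**2)
            - (P/2)*(1/math.tan(theta) + 1/math.tan(alpha)),     # Euler buckling, BD
    ]
    return [v for v in g if v < 0]     # empty list => feasible design

print(constraint_violations([1e-4, 1e-4, 1e-4, 1e-4]))
```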
This shows the formulation of the truss structure problem (Deb et al.,
2000). The seven-bar truss shown in Figure 1.3 is statically determinate,
and the axial force, stress, and deflection could be computed exactly.
In cases where the truss is statically indeterminate and large (for hand
calculations), the exact computations of stress and deflection may not be
possible. A finite element software may be necessary to compute the stress
and deflection in any member and at any point in the truss. Although similar
constraints can then be formulated with the simulated stresses and deflections,
an optimization algorithm that can solve the above seven-bar
truss problem may not be efficient for the resulting NLP problem for
statically indeterminate or large truss problems. The difficulty arises due to
the inability to compute the gradients of the constraints. We shall discuss
more about this aspect in Chapter 4.
In some cars, the axle assembly is directly supported on the wheel. The
tyre of the wheel can also be assumed to have some stiffness in the vertical
direction. A two-dimensional dynamic model of a car suspension system is
shown in Figure 1.5. In this model, only two wheels (one each at rear
and front) are considered. The sprung mass of the car is considered to be
supported on two axles (front and rear) by means of a suspension coil spring
and a shock absorber (damper). Each axle contains some unsprung mass
which is supported by the tyre.
In order to formulate the optimal design problem, the first task is to
identify the important design variables. Let us first identify all the design
parameters that could govern the dynamic behaviour of the car vibration. In
the following, we list all these parameters:
Figure 1.5 The dynamic model of the car suspension system. The above model
has four degrees-of-freedom (q1 to q4 ).
We may consider all the above parameters as design variables, but then the
time taken for convergence of the optimization algorithm may be too long.
In order to simplify the formulation, we consider only four of the above
parameters—front coil stiffness kf s , rear coil stiffness krs , front damper
coefficient αf , and rear damper coefficient αr —as design variables. We keep
the other design parameters as constant:
The parameters ℓ1 and ℓ2 are the horizontal distance of the front and rear
axles from the centre of gravity of the sprung mass. Using these parameters,
differential equations governing the vertical motion of the unsprung mass at
the front axle (q1 ), the sprung mass (q2 ), and the unsprung mass at the
14 Optimization for Engineering Design: Algorithms and Examples
rear axle (q4 ), and the angular motion of the sprung mass (q3 ) are written
(Figure 1.5):
$$F_1 = k_{ft}\, d_1, \qquad F_2 = k_{fs}\, d_2, \qquad F_3 = \alpha_f\, \dot{d}_2,$$
$$d_1 = q_1 - f_1(t), \qquad d_2 = q_2 + \ell_1 q_3 - q_1, \qquad
d_3 = q_4 - f_2(t), \qquad d_4 = q_2 - \ell_2 q_3 - q_4.$$
The time-varying functions f1 (t) and f2 (t) are road irregularities as functions
of time. Any function can be used for f1 (t). For example, a bump can be
modelled as f1 (t) = A sin(πt/T ), where A is the amplitude of the bump and T
the time required to cross the bump. When a car is moving forward, the front
wheel experiences the bump first, while the rear wheel experiences the same
bump a little later, depending upon the speed of the car. Thus, the function
f2 (t) can be written as f2 (t) = f1 (t − ℓ/v), where ℓ is the axle-to-axle distance
and v is the speed of the car. For the above bump, f2 (t) = A sin(π(t−ℓ/v)/T ).
The coupled differential equations specified in Equations (1.2) to (1.5) can
be solved using a numerical integration technique (for example, a fourth-
order Runge-Kutta method can be used) to obtain the pitching and bouncing
dynamics of the sprung mass ms . Equations can be integrated for a time range
from zero to tmax .
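A minimal sketch of this simulation step follows. The bump parameters, axle distance, and speed are illustrative, and `derivs` is only a placeholder for the four coupled equations (1.2) to (1.5), which are not reproduced in this excerpt.

```python
import numpy as np
from scipy.integrate import solve_ivp

A, T, l, v = 0.05, 0.5, 2.5, 10.0    # bump height (m), crossing time (s), m, m/s

def f1(t):
    """Front-wheel road input: a half-sine bump of amplitude A and duration T."""
    return A*np.sin(np.pi*t/T) if 0.0 <= t <= T else 0.0

def f2(t):
    """Rear wheel sees the same bump delayed by the axle-to-axle travel time."""
    return f1(t - l/v)

def derivs(t, s):
    # placeholder: would evaluate Equations (1.2)-(1.5) for the full state
    return -s + f1(t)                 # dummy first-order dynamics for illustration

sol = solve_ivp(derivs, (0.0, 5.0), [0.0], max_step=1e-2)   # Runge-Kutta (RK45)
response = sol.y[0]                   # placeholder for the bouncing response q2(t)
```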
After the design variables are chosen, the next task is to formulate the
constraints associated with the above car suspension problem. In order to
simplify the problem, we consider only one constraint. The jerk (the rate
of change of the vertical acceleration of the sprung mass) is a major factor
concerning the comfort of the riding passengers; car manufacturers use
guidelines that limit the maximum allowable jerk.
When the four coupled differential equations (1.2) to (1.5) are solved, the
above constraint can be computed by numerically differentiating the vertical
movement of the sprung mass (q2 ) thrice with respect to time.
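A minimal sketch of this numerical differentiation, assuming the displacement q2 has been sampled on a uniform time grid:

```python
import numpy as np

def max_jerk(t, q2):
    """Estimate max |jerk| by differentiating the sampled q2(t) three times."""
    dt = t[1] - t[0]                               # uniform sampling assumed
    accel = np.gradient(np.gradient(q2, dt), dt)   # second derivative of q2
    jerk = np.gradient(accel, dt)                  # third derivative of q2
    return np.max(np.abs(jerk))

t = np.linspace(0.0, 5.0, 2001)
print(max_jerk(t, 0.05*np.sin(2*np.pi*t)))         # toy signal as a quick check
```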
The next task is to formulate the objective function. In this problem, the
primary objective is to minimize the transmissibility factor which is calculated
as the ratio of the bouncing amplitude q2 (t) of the sprung mass to the road
excitation amplitude A. Thus, we write the objective function as
0 ≤ kf s , krs ≤ 2 kg/mm,
0 ≤ αf , αr ≤ 300 kg/(m/s).
Thus, the above optimal car suspension design problem can be written in
NLP form as follows (Deb and Saxena, 1997):
subject to
0 ≤ kf s , krs ≤ 2,
0 ≤ αf , αr ≤ 300.
considered that the operation will remove 219,912 mm³ of material. The set-
up time, tool-change time and the time during which the tool does not cut
have been assumed as 0.15, 0.20 and 0.05 min, respectively. The objectives are
minimization of operation time (Tp ) and used tool life (ξ). The multi-objective
optimization problem formulation is given below (Deb and Datta, 2012):
Minimize $F_1(\mathbf{x}) = T_p(\mathbf{x})$, Minimize $F_2(\mathbf{x}) = \xi(\mathbf{x})$,
subject to
$$g_1(\mathbf{x}) \equiv 1 - \frac{P(\mathbf{x})}{\eta P^{\max}} \ge 0,$$
$$g_2(\mathbf{x}) \equiv 1 - \frac{F_c(\mathbf{x})}{F_c^{\max}} \ge 0,$$
$$g_3(\mathbf{x}) \equiv 1 - \frac{R(\mathbf{x})}{R^{\max}} \ge 0,$$
$$x_i^{\min} \le x_i \le x_i^{\max},$$
where
$$T_p(\mathbf{x}) = 0.15 + \frac{219{,}912\left(1 + \frac{0.20}{T(\mathbf{x})}\right)}{MRR(\mathbf{x})} + 0.05,$$
$$\xi(\mathbf{x}) = \frac{219{,}912}{MRR(\mathbf{x})\,T(\mathbf{x})} \times 100,$$
$$T(\mathbf{x}) = \frac{5.48(10^9)}{v^{3.46}\, f^{0.696}\, a^{0.460}},$$
$$R(\mathbf{x}) = \frac{125 f^2}{r_n}.$$
The objective ξ(x) represents the portion of the whole tool life that is
consumed in the machining process; hence an operating condition that
minimizes this objective utilizes the tool optimally. Here, R is the
surface roughness and rn is the nose radius of the tool. The constant parameter
values are given below:
The following variable bounds can be used for the cutting speed, feed, and depth of
cut:
vmin = 250 m/min, vmax = 400 m/min,
fmin = 0.15 mm/rev, fmax = 0.55 mm/rev,
amin = 0.5 mm, amax = 6 mm.
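A minimal sketch that evaluates these expressions for a candidate operating condition follows. Two quantities are assumed, since the excerpt does not list the constant parameter values: the material removal rate is taken as MRR(x) = 1000·v·f·a mm³/min (a standard expression), and the nose radius as r_n = 0.8 mm.

```python
def tool_life(v, f, a):
    """T(x) in minutes, from the expression given above."""
    return 5.48e9 / (v**3.46 * f**0.696 * a**0.460)

def objectives(v, f, a, rn=0.8):
    """Return (Tp, xi, R) for cutting speed v (m/min), feed f (mm/rev),
    depth of cut a (mm). MRR and rn are assumptions, not from the excerpt."""
    mrr = 1000.0 * v * f * a            # assumed MRR expression (mm^3/min)
    T = tool_life(v, f, a)
    Tp = 0.15 + 219912.0*(1.0 + 0.20/T)/mrr + 0.05   # operation time (min)
    xi = 219912.0/(mrr*T) * 100.0       # percentage of tool life consumed
    R = 125.0*f**2/rn                   # surface roughness
    return Tp, xi, R

print(objectives(v=300.0, f=0.3, a=2.0))  # a point inside the variable bounds
```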
1.2.2 Modelling
The next major application area of optimization methods is in modelling of
systems and processes that are often faced in engineering studies. Modelling
refers to the task of describing a system or process in a manner
that can later be analyzed to gather further information about the system
or process. For most engineering systems, a crisp and clear idea of the
working principle is not known in any reasonable scientific detail.
Take, for example, modelling a blast furnace in which steel is produced from
raw materials, such as iron ore, sinter, coke, etc. Given all the input materials
and their sizes and compositions that are fed into a blast furnace, the quality
of the produced steel after some hours of its operation cannot be written in
mathematical form in any accurate way. This is because the quality of the raw
materials is not uniform across their charges. Moreover, all heat and mass
transfer phenomena and chemical reactions may not be known appropriately.
In such a scenario, a mathematical modelling of such a system becomes a
difficult task. However, steel has been produced for more than 100 years and
there exists a plethora of data (input raw material information versus output
quality of steel production) with most steel plants. In such a scenario, there
are two types of modelling tasks that can be performed:
(i) Manipulation of governing equations: Whatever scientific
knowledge is available about the system or process can be used,
with some modification parameters introduced. For example, if the following
system of (hypothetical) partial differential equations (PDEs) describes
a particular process:
$$\frac{\partial E}{\partial x} = a_{11} x^2 + a_{12} y, \qquad
\frac{\partial E}{\partial y} = a_{21} x y^2 + a_{12} y^3,$$
then the difference between the predicted E values (obtained by solving
above equations) and the actual E values (obtained from practice) can
be taken care of by modifying the above PDEs as follows:
$$\frac{\partial E}{\partial x} = \beta_{11} a_{11} x^2 + \beta_{12} a_{12} y, \qquad
\frac{\partial E}{\partial y} = \beta_{21} a_{21} x y^2 + \beta_{22} a_{12} y^3.$$
In the above PDEs, the β-terms are unknown. We can then formulate
an optimization procedure to find a set of values of β-terms so that the
difference between simulated and observed E values is minimum:
$$\text{Minimize } f(\boldsymbol{\beta}) = \left(E^{\text{observed}} - E^{\text{simulated}}\right)^2,$$
subject to
$$E^{\text{simulated}} = \text{Solution(PDEs)}.$$
For a given point β in the search space, the above PDEs can be
numerically (or exactly, if possible) solved and simulated E values can
be found as a function of x and y. These simulated values can then be
compared with observed values from practice and the objective function
f (β) can be calculated. The optimization algorithm should drive the
search towards a solution β ∗ that would correspond to the minimum
possible difference. The advantage with such a procedure is that at the
end a mathematical expression of E(x, y) can be known and a further
analysis can be performed to optimize its performance and get useful
insights about an optimal operation of a system or a plant.
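A minimal sketch of this calibration loop follows. Here `simulate_E` is a dummy closed-form stand-in for the numerical PDE solution, so the example runs end to end; in practice it would call a PDE solver for the given β.

```python
import numpy as np
from scipy.optimize import minimize

def simulate_E(beta, x, y):
    """Placeholder surrogate for E(x, y; beta) obtained by solving the PDEs."""
    b11, b12, b21, b22 = beta
    return b11*x**2 + b12*y + b21*x*y**2 + b22*y**3

x_obs = np.linspace(0.0, 1.0, 20)
y_obs = np.linspace(0.0, 1.0, 20)
E_obs = simulate_E([1.2, 0.8, 1.1, 0.9], x_obs, y_obs)   # synthetic "plant" data

def f(beta):
    """Sum of squared deviations between observed and simulated E values."""
    return np.sum((E_obs - simulate_E(beta, x_obs, y_obs))**2)

res = minimize(f, x0=[1.0, 1.0, 1.0, 1.0], method="Nelder-Mead")
print(res.x)      # beta* should approach the values used to generate E_obs
```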
(ii) Use of non-mathematical techniques: Most recent modelling
tasks are performed using a non-mathematical procedure in which
instead of arriving at a mathematical function of the model, the
modelling information is stored in a network of entities or by some other
means. One such common approach is to use artificial neural networks
(ANN) (Haykin, 1999). In an ANN, a network of connectivities
between input and output layers through a series of hidden layers is
established. This is done by first choosing a network of connectivities
and then employing an ANN learning algorithm to arrive at optimal
connection weight for each connection link by minimizing the error
between ANN and observed output data. This process is called the
training phase of ANN. One popular approach is the use of back-
propagation learning algorithm (Rumelhart and McClelland, 1986).
The learning optimization task is achieved through an optimization
procedure. Both the tasks of finding the optimal connectivity and
the optimal connection weights can be achieved by other more
sophisticated ANN methodologies. However, at the end of the ANN
learning process, the optimal network is capable of accurately providing
the output for a new input set, provided the new input
interpolates the training data well. Thus, such an ANN model can
be used to perform an analysis of the system or process for a better
understanding of the system.
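To make the training phase concrete, here is a minimal sketch of gradient-descent learning of a one-hidden-layer network on synthetic data. The layer sizes, learning rate, and data are illustrative; a practical ANN study would use a full back-propagation library.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))            # training inputs
y = (X[:, :1]**2 + X[:, 1:]).copy()         # target outputs to be learned

W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)   # input -> hidden weights
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.1
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                # hidden-layer activations
    out = h @ W2 + b2                       # network prediction
    err = out - y                           # error between ANN and observed data
    gW2 = h.T @ err / len(X);  gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h**2)          # error back-propagated through tanh
    gW1 = X.T @ dh / len(X);   gb1 = dh.mean(0)
    W1 -= lr*gW1; b1 -= lr*gb1; W2 -= lr*gW2; b2 -= lr*gb2
print(float((err**2).mean()))               # training error after learning
```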
Here, we illustrate a mathematical modelling task as an optimization problem.
Optimal design of an ammonia reactor
In an ammonia reactor design problem, feed gas containing nitrogen,
hydrogen, methane, argon, and a small percentage of ammonia enters the
bottom of the reactor (Figure 1.6).
Thereafter, the feed gas rises till it reaches the top of the reactor. Then,
while moving downward, the nitrogen and hydrogen present in the feed gas
undergo reaction to form ammonia in the presence of a catalyst placed in the
reactor. The production of ammonia depends on the temperature of the feed
gas, the temperature at the top of the reactor, the partial pressures of the
reactants (nitrogen and hydrogen), and the reactor length. The modelling
task attempts to arrive at mathematical governing reaction equations so as
to correctly predict certain problem parameters.
In this problem, for a given reactor length x, we identify three design
parameters—the molar flow rate of nitrogen per unit catalyst area NN2 , the
feed gas temperature Tf , and the reacting gas temperature Tg . In order
to maintain the energy balance of reactions in the reactor, three coupled
differential equations must be satisfied (Murase, Roberts, and Converse, 1970;
Upreti and Deb, 1997). First, the decrease in the feed gas temperature must
be according to the heat loss to the reaction gas:
$$\frac{dT_f}{dx} = -\frac{U S_1}{W C_{pf}}\,(T_g - T_f). \tag{1.6}$$
In Equation (1.6), U is the overall heat transfer coefficient, S1 is the
surface area of the catalyst tubes per unit reactor length, W is the total mass
flow rate, and Cpf is the specific heat capacity of the feed gas. Secondly, the
change in the reaction gas temperature must be according to the heat gain
from the feed gas and heat generated in the reaction:
$$\frac{dT_g}{dx} = -\frac{U S_1}{W C_{pg}}\,(T_g - T_f)
+ \frac{(-\Delta H)\,S_2}{W C_{pg}}\, f_a
\left( K_1 \frac{p_{N_2}\, p_{H_2}^{1.5}}{p_{NH_3}} - K_2 \frac{p_{NH_3}}{p_{H_2}^{1.5}} \right), \tag{1.7}$$
$$p_{H_2} = 3\,p_{N_2}, \qquad
p_{NH_3} = \frac{286\,(2.23\,N_{N_2}^0 - 2\,N_{N_2})}{2.598\,N_{N_2}^0 + 2\,N_{N_2}},$$
where $N_{N_2}^0$ is the molar flow rate of nitrogen per unit catalyst area at the top
of the reactor. We use the following parameter values:
$f_a = 1$.
Note that all the above three ordinary differential equations (ODEs)
(1.6)–(1.8) are coupled to each other. In order to solve these equations, we
use the following boundary conditions:
$$T_f(x = 0) = T_0, \qquad T_g(x = 0) = 694~\mathrm{K},$$
The three constraints (Equations (1.6) to (1.8)) can be used to compute the
three parameters of the problem. By comparing these three parameters with
their recorded values from an actual reactor, the chemical process can be
modelled. There are various practical reasons why the solutions of the above
ODEs usually do not match with the plant values. By using the objective
of the model optimization problem as minimization of the difference between
simulated parameter values and their actual plant values, we hope to represent
the model of the reactor with the ODEs. Thus, the NLP problem is as follows:
$$\text{Minimize } \left(N_{N_2} - N_{N_2}^{\text{plant}}\right)^2
+ \left(T_f - T_f^{\text{plant}}\right)^2
+ \left(T_g - T_g^{\text{plant}}\right)^2,$$
subject to
$$\frac{dT_f}{dx} = -\beta_1 \frac{U S_1}{W C_{pf}}\,(T_g - T_f),$$
Data fitting and regression analysis are activities commonly used by scientists,
engineers and managers. We often employ a software package to help us find a fitted
curve for a set of data. Thought of carefully, the software formulates and solves
an optimization problem to arrive at the fitted curve. For a set of given data
points (say $(x^{i,p}, y^{i,p})$ for $i = 1, 2, \ldots, K$) involving an input variable set $\mathbf{x}$ and
an output $y$, any curve $y = f(\mathbf{x})$ is assigned an objective function equal to the sum of
the squared differences between $y^{i,p}$ and $f(\mathbf{x}^{i,p})$:
$$\text{Minimize } \sum_{i=1}^{K} \left( f(\mathbf{x}^{i,p}) - y^{i,p} \right)^2. \tag{1.9}$$
The decision variables are the coefficients of the fitted curve f (x). For a
single-input linear regression (y = ax + b), there are two decision variables a
and b. The minimization procedure will find a and b values that will make the
overall error between the fitted line and the supplied data points minimum.
The same idea can be extended to fit a curve of any sort. The regression task
can be made more generic, by keeping the nature of the curve y = f (x) also
as a part of the variable set. This domain then falls under the category of
modelling of systems which is discussed in the previous subsection.
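For the single-input linear case, a minimal sketch with illustrative data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])          # sample data points (x, y)
A = np.column_stack([x, np.ones_like(x)])        # design matrix for [a, b]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes objective (1.9)
print(a, b)                                      # fitted slope and intercept
print(np.sum((a*x + b - y)**2))                  # value of objective (1.9)
```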
Figure 1.7 A schematic of the overhead crane consisting of a trolley and a swaying
load.
N = M g + 2T cos(α),
where T is twice the tension in each cable, M is the mass (in kg) of the trolley,
and ẍ is the acceleration of the trolley in the x-direction. Performing a similar
task for the hanging load (of mass m kg), we have the following two equations:
$$-2T\sin(\alpha) - c\dot{x}_1 = m\ddot{x}_1,$$
Here, $l_o$ is the length of the cable, and $\dot\alpha$ and $\ddot\alpha$ are the angular velocity and
acceleration of the cable. By eliminating $T$, $x_1$ and $y_1$ from the above
expressions, we get the following two equations of motion of the trolley and
the hanging mass:
$$\ddot{x} = F - c\dot{x}\sin^2(\alpha) + m l_o \sin(\alpha)\,\dot\alpha^2 + m g \sin(\alpha)\cos(\alpha)
- f\left(m l_o \cos(\alpha)\,\dot\alpha^2 - c\dot{x}\sin(\alpha)\cos(\alpha) - m g \sin^2(\alpha)\right), \tag{1.10}$$
$$\ddot{\alpha} = -\frac{\cos(\alpha)}{l_o}\left(\ddot{x} + r\dot{x} + g\tan(\alpha)\right) - r\dot{\alpha}, \tag{1.11}$$
where r is the ratio of c to m. These two equations can be solved using a
numerical integration technique and the variation of x and α with time t can
be found.
A little thought about the problem makes it clear that (i) the total energy
supplied to the system and (ii) the total time for the trolley-load system
to reach the desired position and stabilize are two conflicting
objectives. The supplied energy will be minimum if the trolley moves ever
so slowly towards the destination, but such a solution will require quite a long
time to complete the task. On the other hand, reaching the destination with
a large velocity and suddenly stopping there would be quick; however, some
time must then elapse for the sway of the load to diminish, so such a solution
may not be the quickest in overall time. There would exist a solution with a
reasonable velocity which minimizes the overall time. Ideally, this is a two-objective
problem, but for the sake of our discussion here, we may formulate a single-
objective time minimization of the crane maneuvering problem:
Minimize Time = T ,
subject to
α(T ) ≤ ϵ.
Values x(T ) and α(T ) can be found by solving the differential equations given
in Equations (1.10) and (1.11) from suitable initial conditions. The parameter
ϵ is a user-specified small sway angle for a stable system at time T . The
decision variables of this problem are F (t) and ℓ(t).
Figure 1.8 (a) Ray coverage, (b) Ultrasound signal indicating TOF.
The simulation of TOF assumes that the ultrasound rays follow straight
paths from source to the detector. This assumption is reasonable when the
impedance mismatch within the specimen is small.
A solution to this problem is a configuration in which some cells are
occupied by foreign materials or voids (Kodali et al., 2008). The task of the
optimization is then to identify the cells with foreign materials or voids that
minimize the error between actual TOF and simulated TOF. The simulated
TOF for a ray r originating from j-th source and terminating at the k-th
detector is estimated using Equation (1.12):
$$\text{TOF}^{\text{simulated}}(r) = \sum_{m=1}^{M} \frac{l_m^{(r)}(j,k)}{v_m}, \tag{1.12}$$
where
$M$ = number of cells intercepted by the ray,
$l_m^{(r)}(j,k)$ = length of the ray intercepted by the $m$-th cell along the ray path,
$v_m$ = velocity of propagation of ultrasound through the $m$-th cell.
where R is the number of rays that are considered. The optimization procedure
will then enable finding the size and location of voids or foreign material
embedded in a component.
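A minimal sketch of Equation (1.12) follows. The intercept lengths l_m are approximated by fine sampling along the straight ray rather than by exact cell-crossing geometry, and the grid, velocities, and inclusion are illustrative.

```python
import numpy as np

def tof_simulated(src, det, velocity, cell=1.0, n=10000):
    """Approximate TOF of a straight ray from src to det over a cell grid."""
    src, det = np.asarray(src, float), np.asarray(det, float)
    pts = src + np.linspace(0, 1, n)[:, None]*(det - src)   # samples along the ray
    seg = np.linalg.norm(det - src)/(n - 1)                 # length per sample
    idx = np.floor(pts/cell).astype(int)                    # cell of each sample
    idx = np.clip(idx, 0, np.array(velocity.shape) - 1)
    return np.sum(seg/velocity[idx[:, 0], idx[:, 1]])       # sum of l_m / v_m

v = np.full((10, 10), 6.0)      # background material velocity (e.g. mm/us)
v[4:6, 4:6] = 1.5               # a slower inclusion (void/foreign material)
print(tof_simulated((0.0, 0.0), (9.9, 9.9), v))
```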
Figure 1.10 shows a typical transit system network. The solid lines represent
different routes, the points on the lines represent the stops and the circled
intersections of the routes represent the transfer stations. The problem is to
determine schedules for the routes such that the transit system provides the
best level of service (LOS) to its passengers, within the resources available.
A good measure of the LOS is the amount of time passengers wait
during their journey: the less the waiting time, the better the LOS
(Chakroborty et al., 1995). On any transit network, either the passengers
wait to board the vehicle at the station of origin or they wait at a transfer
Figure 1.9 A typical schedule for a travelling salesperson problem with some
Indian cities.
station at which they transfer from one vehicle to another. For example, a
passenger wishing to travel from station A to station B (in the network shown
in Figure 1.10) will have to wait at station A to board a vehicle on Route 1.
Further, the passenger will have to wait at transfer station C to board a vehicle
on Route 3 (which will take him/her to the destination). We will refer to the
wait at station A as the initial wait time (IWT) and the wait at station C
as the transfer time (TT). A good schedule is one which minimizes the sum
of IWT and TT for all passengers. Thus, the optimization problem involves
finding a schedule of vehicles on all routes (arrival and departure times) such
that the total waiting time for the passengers is minimum.
The design variables in this problem are the arrival time $a_i^k$ and the departure
time $d_i^k$ of the $k$-th vehicle on the $i$-th route. Thus, if there are
a total of $M$ routes and each route has $K$ vehicles, the total number of design
variables is $2MK$. In addition, there are a few more artificial variables, which
we shall discuss later.
The constraints in this problem appear from different service-related
limitations. Some of these constraints are formulated in the following:
Minimum stopping time: A vehicle cannot start as soon as it stops; it has to
wait at the stop for at least a certain period of time $s^{\min}$, or
$$(d_i^k - a_i^k) \ge s^{\min} \qquad \text{for all } i \text{ and } k.$$
Maximum stopping time: A vehicle cannot stop for more than a certain period
of time $s^{\max}$, even if it means increasing the total transfer time on the network, or
$$s^{\max} - (d_i^k - a_i^k) \ge 0 \qquad \text{for all } i \text{ and } k.$$
Figure 1.11 Transfers from the k-th vehicle in the i-th route to three consecutive
vehicles in the j-th route.
in the $i$-th route is not possible to the $(l-1)$-th vehicle in the $j$-th route,
because the departure time of the latter vehicle ($d_j^{l-1}$) is earlier than
$a_i^k$. Thus, the parameter $\delta_{i,j}^{k,l-1}$ takes a value zero, whereas the parameter $\delta_{i,j}^{k,l}$
takes a value one. In order to simplify the model, we assume that transfers to
vehicles departing after the $l$-th vehicle in the $j$-th route are also not possible:
all parameters $\delta_{i,j}^{k,q}$ for $q = (l+1), (l+2), \ldots$ are also zero. Thus, between
any two vehicles, the following condition must be satisfied:
$$(d_j^l - a_i^k)\,\delta_{i,j}^{k,l} \le T \qquad \text{for all } i, j, k \text{ and } l.$$
It is clear that the left-side expression of the above condition is zero for those
transfers that are not feasible. Since transfers only to the next available vehicle
are assumed, only one $\delta_{i,j}^{k,l}$ (for $l = 1, 2, \ldots$) is one and the rest are all zeros
for fixed values of $i$, $j$, and $k$. Mathematically,
$$\sum_{l} \delta_{i,j}^{k,l} = 1 \qquad \text{for all } i, j \text{ and } k.$$
The introduction of the artificial variables $\delta_{i,j}^{k,l}$ makes the formulation easier,
but causes a difficulty. Many optimization algorithms cannot handle discrete
design variables efficiently. Since the artificial design variables $\delta_{i,j}^{k,l}$ can only
take a value zero or one, another set of constraints is added to enforce the
binary values:
$$(d_j^l - a_i^k) + M\,(1 - \delta_{i,j}^{k,l}) \ge 0 \qquad \text{for all } i, j, k \text{ and } l,$$
where $M$ is a large positive number. The above constraint ensures that the
variable $\delta_{i,j}^{k,l}$ always takes a value one whenever a transfer is possible and the
value zero whenever a transfer is not possible. This constraint is derived purely
from the knowledge of the available optimization algorithms. There may be
other ways to formulate the concept of feasible transfers, but the inclusion of such
artificial design variables often makes the understanding of the problem easier.
Maximum headway: The headway between two consecutive vehicles should be
less than or equal to the policy headway $h_i$, or
$$(a_i^{k+1} - a_i^k) \le h_i \qquad \text{for all } i \text{ and } k.$$
The objective function consists of two terms: the first term represents
the total transfer time (TT) over all the passengers and the second term
represents the initial waiting time (IWT) for all the passengers. The objective
is to minimize the following function:
$$\sum_{i}\sum_{j}\sum_{k}\sum_{l} \delta_{i,j}^{k,l}\,(d_j^l - a_i^k)\,w_{i,j}^k
+ \sum_{i}\sum_{k} \int_{0}^{a_i^k - a_i^{k-1}} v_{i,k}(t)\left[(a_i^k - a_i^{k-1}) - t\right]dt.$$
The parameter $w_{i,j}^k$ is the number of passengers transferring from the $k$-th
vehicle of the $i$-th route to the $j$-th route. The first term is obtained by
summing the individual transfer times $(d_j^l - a_i^k)$ over all passengers for all
the vehicles for every pair of routes. The parameter $v_{i,k}(t)$ is the number of
passengers arriving at the stop for the $k$-th vehicle in the $i$-th route at a given
time $t$. Since the arrival time for passengers can be anywhere between $t = 0$
and $t = (a_i^k - a_i^{k-1})$ (the headway), the initial waiting time also differs from one
passenger to another. For example, a passenger arriving at the stop just after
the previous vehicle has left has to wait for the full headway time $(a_i^k - a_i^{k-1})$
before the next vehicle arrives. On the other hand, a passenger arriving at the
stop later has to wait for a shorter time. The calculation of the second term
assumes that passengers arrive at the stop during the time interval $a_i^{k-1}$ to
$a_i^k$ according to the known time-varying function $v_{i,k}(t)$, where $t$ is measured
from $a_i^{k-1}$. Then the quantity
$$\int_{0}^{a_i^k - a_i^{k-1}} v_{i,k}(t)\left[(a_i^k - a_i^{k-1}) - t\right]dt$$
gives the sum of the initial waiting times for all passengers who board the
k-th vehicle of the i-th route. We then sum it over all the routes and vehicles
to estimate the network total of the IWT. Thus, the complete NLP problem
can be written as follows:
$$\text{Minimize } \sum_{i}\sum_{j}\sum_{k}\sum_{l} \delta_{i,j}^{k,l}\,(d_j^l - a_i^k)\,w_{i,j}^k
+ \sum_{i}\sum_{k} \int_{0}^{a_i^k - a_i^{k-1}} v_{i,k}(t)\left[(a_i^k - a_i^{k-1}) - t\right]dt$$
subject to
$$s^{\max} - (d_i^k - a_i^k) \ge 0 \qquad \text{for all } i \text{ and } k,$$
$$(d_j^l - a_i^k) + M\,(1 - \delta_{i,j}^{k,l}) \ge 0 \qquad \text{for all } i, j, k \text{ and } l,$$
$$h_i - (a_i^{k+1} - a_i^k) \ge 0 \qquad \text{for all } i \text{ and } k,$$
$$\sum_{l} \delta_{i,j}^{k,l} = 1 \qquad \text{for all } i, j \text{ and } k.$$
In the above NLP problem, the variables $\delta_{i,j}^{k,l}$ are binary, taking
only a value zero or one, while the variables $a_i^k$ and $d_i^k$ are real-valued.
Thus, a mixed integer programming technique described in Chapter 5 or
genetic algorithms described in Chapter 6 can be used to solve the above
NLP problem (Chakroborty et al., 1995).
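A minimal numerical sketch of the two objective terms for a toy instance follows. For a constant arrival rate λ, the initial-wait integral over a headway h reduces in closed form to λh²/2; the transfer term is a weighted sum over the feasible transfers only. All numbers below are illustrative.

```python
def initial_wait(headways, lam):
    """Sum of lam*h^2/2 over headways: the IWT integral for constant arrivals."""
    return sum(lam*h*h/2.0 for h in headways)

def transfer_time(transfers):
    """transfers: list of (d_j^l - a_i^k, passengers) for feasible transfers."""
    return sum(wait*w for wait, w in transfers)

headways = [10.0, 12.0, 8.0]                 # a_i^k - a_i^{k-1}, in minutes
transfers = [(4.0, 25), (6.0, 10)]           # (wait in minutes, passengers)
print(transfer_time(transfers) + initial_wait(headways, lam=2.0))
```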
Data mining activities have become an inseparable part of any scientific work,
mainly due to the availability and ease of generation of data. When the data
generation task is expensive or tedious, data is usually generated in parallel
among various research groups and they are shared by scientists. In such an
activity, the first task is often to cluster the data into several categories to
establish which data from what source are related to each other. Optimization
is usually a tool employed for such a clustering activity. Here, we discuss such
a procedure.
Let us consider the two-dimensional data shown in Figure 1.12. When the
data are received, they are all jumbled together. One way to formulate an
optimization problem is to first decide on the number of clusters into which
the data may be divided. Then, for each cluster, a few parameters are used as
variables for the optimization task. These parameters describe an affinity
function for the cluster. Say, for example, we define a simple affinity function
for the $i$-th cluster as follows:
$$c_i(x, y) = \frac{(x - x_i)^2}{a_i^2} + \frac{(y - y_i)^2}{b_i^2}.$$
This is the equation of an ellipse with centre at $(x_i, y_i)$ and major and minor
axis values $a_i$ and $b_i$, respectively. Next, each data point $(x, y)$ is tested against
each cluster affinity function $c_i$. The point is associated with the cluster for
which the affinity function value is minimum. This procedure will associate
every data point with a particular cluster. Point A (in the figure) may get
associated with Cluster 2.
Next, an intra-cluster distance $D_i^{\text{intra}}$ over all associated points of
each cluster is computed as follows:
$$D_i^{\text{intra}} = \frac{1}{|C_i|} \sum_{(x,y)\in C_i} \sqrt{(x - x_i)^2 + (y - y_i)^2}. \tag{1.14}$$
$$\text{Minimize } \frac{D_i^{\text{intra}}}{D_i^{\text{inter}}},$$
subject to
$$(x_i, y_i) \in R,$$
where R is the specified range in which the centres of ellipses are supposed to
lie. The minimization will ensure that intra-cluster points are as close to the
centre as possible and all cluster centres are as far away from each other as
possible, thereby achieving the goal of clustering.
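A minimal sketch of the assignment-and-scoring step, with illustrative points, centres, and axis lengths (the inter-cluster term is omitted because its definition is not reproduced in this excerpt):

```python
import numpy as np

def affinity(p, centre, ab):
    """Elliptical affinity c_i(x, y) of point p to a cluster."""
    (x, y), (xi, yi), (a, b) = p, centre, ab
    return (x - xi)**2/a**2 + (y - yi)**2/b**2

def assign_and_score(points, centres, axes):
    # associate each point with the cluster of minimum affinity
    labels = [min(range(len(centres)),
                  key=lambda i: affinity(p, centres[i], axes[i])) for p in points]
    d_intra = []
    for i, (xi, yi) in enumerate(centres):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:                       # Equation (1.14) for this cluster
            d = [np.hypot(x - xi, y - yi) for x, y in members]
            d_intra.append(sum(d)/len(d))
    return labels, d_intra

pts = [(0.1, 0.2), (0.0, -0.1), (2.1, 2.0), (1.9, 2.2)]
print(assign_and_score(pts, centres=[(0, 0), (2, 2)], axes=[(1, 1), (1, 1)]))
```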
Prediction is another data mining activity which is used on a daily basis by
weather forecasters, share market predictors, etc. The task here is to model
the past few days (or weeks or months) of data and then extrapolate the
model to predict how the future is going to be for the next hours, days or
weeks. Since the modelling activity can be achieved through an optimization
problem-solving (as discussed before), the prediction activity can be achieved
using an optimization procedure.
algorithms work at finding the optimal solution in the search space, which
is not any arbitrary solution, they can be ideal candidates for arriving at an
intelligent solution. For example, consider the robot path planning problem in
which the task is to design the ‘brain’ of the robot which will help it to avoid
obstacles that may be moving and still reach a destination with minimum
time or using minimum energy or optimizing some other criterion.
The use of an optimization method to solve such problems causes an inherent
problem. The optimal path depends on how the obstacles will be moving
in a real scenario and unfortunately this information may not be known a
priori. One way to address such a task is to first develop an optimal ‘rule
base’ by performing an off-line optimization on several training cases. Once
the optimal rule-base is developed, the robot can then utilize it to navigate
in a real scenario. If sufficient training cases are considered, a fairly robust
rule-base can be developed. Figure 1.13 depicts the above-discussed off-line
optimization-based approach (Pratihar et al., 1999). On a set of instantia-
tions, an optimization algorithm is applied to find a knowledge base using
rules or by other means. The optimization task would find a set of rules or
classifiers which will determine the nature of the outcome based on the vari-
able values at any time instant. In the following, we describe the procedure
in the context of an on-line robot navigation problem. There are essentially
two parts of the problem:
(i) Learn to find any obstacle-free path from point A to B, and
(ii) Learn to choose that obstacle-free path which takes the robot in a
minimum possible time.
This process of avoiding an object can be implemented using a rule of the
following sort:
If an object is very near and is approaching, then turn right to the original path.
Figure 1.14 A schematic showing condition and action variables for the robot
navigation problem.
The angle and distance of a robot from a critical obstacle and its decision
on deviation, if any, are illustrated in Figure 1.14. Because of the imprecise
definition of the deviation in this problem, it seems natural to use a fuzzy logic
technique (Mendel, 2000) here. Table 1.2 shows a set of possible fuzzy rules.
The (1,1) position of the table suggests that if an obstacle is very near (VN)
to the robot and it is left (L) of the robot, the decision is to move ahead (A).
                     Angle
             L     AL    A     AR    R
Distance VN  A     AR    AL    AL    A
         N   A     A     AL    A     A
         F   A     A     AR    A     A
         VF  A     A     A     A     A
This way, the rule-base shows every possible decision that the robot should
take in navigating in a real scenario. Finding what decision to take in every
combination of angle and distance is the task of an off-line optimization
algorithm. To extend the scope, the presence or absence of a rule for every
combination of angle and distance may be considered as another kind of
decision variable. The fuzzy rules involve membership functions that define
the attributes such as the extent of distance values for VN, N, etc. and also A,
L, etc. Triangular membership functions are commonly used for this purpose.
This involves another set of decision variables: the widths of the triangles
(such as the parameters b1 and b2 shown in Figure 1.15). Thus, once a rule-base
is chosen, the optimization problem can be stated as: Minimize Time (rule-base),
subject to
Path (rule-base) is obstacle-free.
Note that both Time and Path are functions of the rule-base and the elements
of the rule-base are decision variables of the optimization task.
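A minimal sketch of these two ingredients follows: a triangular membership function of the kind mentioned above, and a lookup into the rule-base of Table 1.2 (only the first row is spelled out; the breakpoints are illustrative, not the book's calibrated values).

```python
def tri(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    return (x - left)/(peak - left) if x < peak else (right - x)/(right - peak)

RULES = {                       # Table 1.2: (distance, angle) -> action
    ("VN", "L"): "A", ("VN", "AL"): "AR", ("VN", "A"): "AL",
    ("VN", "AR"): "AL", ("VN", "R"): "A",
    # remaining rows of Table 1.2 default to "A" (move ahead)
}

def decide(distance_label, angle_label):
    return RULES.get((distance_label, angle_label), "A")

print(tri(1.5, 0.0, 1.0, 3.0), decide("VN", "AL"))
```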
one problem, but may perform poorly on other problems. That is why the
optimization literature contains a large number of algorithms, each suitable to
solve a particular type of problem. The choice of a suitable algorithm for an
optimization problem is, to a large extent, dependent on the user’s experience
in solving similar problems. This book provides a number of optimization
algorithms used in engineering design activities.
Since the optimization algorithms involve repetitive application of certain
procedures, they need to be used with the help of a computer. That is why the
algorithms are presented in a step-by-step format so that they can be easily
coded. To demonstrate the ease of conversion of the given algorithms into
computer codes, most chapters contain a representative working computer
code. Further, in order to give a clear understanding of the working of
the algorithms, they are hand-simulated on numerical exercise problems.
Simulations are performed for two to three iterations following the steps
outlined in the algorithm sequentially. Thus, for example, when the algorithm
suggests moving from Step 5 to Step 2 in order to carry out a conditional
statement, the exercise problem demonstrates this by performing Step 2 after
Step 5. For the sake of clarity, the optimization algorithms are classified into
a number of groups, which are now briefly discussed.
Single-variable optimization algorithms. Because of their simplicity, single-
variable optimization techniques are discussed first. These algorithms provide
a good understanding of the properties of the minimum and maximum
points in a function and how optimization algorithms work iteratively to
find the optimum point in a problem. The algorithms are classified into two
categories—direct methods and gradient-based methods. Direct methods do
not use any derivative information of the objective function; only objective
function values are used to guide the search process. However, gradient-based
methods use derivative information (first and/or second-order) to guide the
search process. Although engineering optimization problems usually contain
more than one design variable, single-variable optimization algorithms are
mainly used as unidirectional search methods in multivariable optimization
algorithms.
Multivariable optimization algorithms. A number of algorithms for
unconstrained, multivariable optimization problems are discussed next. These
algorithms demonstrate how the search for the optimum point progresses in
multiple dimensions. Depending on whether the gradient information is used
or not used, these algorithms are also classified into direct and gradient-based
techniques.
Constrained optimization algorithms. Constrained optimization algorithms
are described next. These algorithms use the single-variable and multivariable
optimization algorithms repeatedly and simultaneously maintain the search
effort inside the feasible search region. Since these algorithms are mostly used
in engineering optimization problems, the discussion of these algorithms covers
most of the material of this book.
to solve such problems (Deb and Sinha, 2010). Tri-level and higher-level
optimization problems are also possible.
Most optimization problems discussed in this chapter are deterministic
in nature, meaning that the objective function, constraint function or
parameters are fixed from the beginning till the end of the optimization run.
These problems are called deterministic optimization problems. However, as
discussed somewhat in Section 1.2.8, some problems may have a dynamically
changing objective function, constraint function, or parameters. While the
optimization run is in process, one or a few of the functions and parameters
may change. This usually happens when a system depends on the environment
(ambient temperature or humidity) and the process continues over most of the
day: the ambient condition changes from morning to noon, so the solution
that was optimal in the morning may not remain optimal at noon.
Such problems are called stochastic optimization problems,
or sometimes dynamic optimization problems as well. Although an off-line
optimization, as discussed in Section 1.2.8, is one approach, recent dynamic
optimization methodologies (Deb et al., 2007) are other viable and pragmatic
approaches.
At the end of the optimization process, one obvious question may arise:
Is the obtained solution a true optimum solution? Unfortunately, there is
no easy answer to this question for an arbitrary optimization problem. In
problems where the objective functions and constraints can be written in
simple, explicit mathematical forms, the Kuhn-Tucker conditions described
in Chapter 4 may be used to check the optimality of the obtained solution.
However, those conditions can only be used for a few classes of optimization
problems. In a generic problem, this question is answered in a more practical
way. In many engineering design problems, a good solution is usually known
either from previous studies or from experience. If, after formulating the
optimization problem and applying the optimization algorithm, a better solution
than the known solution is obtained, the new solution becomes the current
best solution. The optimality of the obtained solution is usually confirmed by
applying the optimization algorithms a number of times from different initial
solutions.
Having discussed different types of optimization algorithms that exist to
solve different kinds of optimization problems, we suggest that for practical
applications there is always a need for customizing an optimization algorithm
for solving a particular problem. The reasons are many. First, there may
not exist a known efficient optimization algorithm for the particular problem at
hand. Second, the size and complexity of the problem may make solving it with
a standard algorithm computationally too expensive to be pragmatic. Third, in
many scenarios, the user may not be interested in finding the exact optimal
solution; rather, a reasonably good solution (or one better than an existing
solution) may be adequate. In such scenarios, knowledge (or problem information)
about the current problem may be used to design a customized optimization
algorithm that is appropriate for the current problem (Deb, Reddy and Singh,
2003). Such optimization methods are often known as heuristic optimization
methods.
1.4 Summary
In order to use optimization algorithms in engineering design activities, the
first task is to formulate the optimization problem. The formulation process
begins with identifying the important design variables that can be changed
in a design. The other design parameters are usually kept fixed. Thereafter,
constraints associated with the design are formulated. The constraints may
arise due to resource limitations such as deflection limitations, strength
limitations, frequency limitations, and others. Constraints may also arise due
to codal restrictions that govern the design. The next task is to formulate
the objective function which the designer is interested in minimizing or
maximizing. The final task of the formulation phase is to identify some
bounding limits for the design variables.
The formulation of an optimization problem can be more difficult than
solving the optimization problem. Unfortunately, every optimization problem
requires different considerations for formulating objectives, constraints, and
variable bounds. Thus, it is not possible to describe all considerations in a
single book. However, many of these considerations require some knowledge
about the problem, which is usually available with the experienced designers
due to their involvement with similar other design problems.
The rest of the book assumes that the formulation of an optimization
problem is available. Chapters 2 to 6 describe a number of different
optimization algorithms—traditional and nontraditional—in step-by-step
format. To demonstrate the working of each algorithm, hand-simulations on
a numerical example problem are illustrated. Sample computer codes for a
number of optimization algorithms are also appended to demonstrate the ease
of conversion of other algorithms into similar computer codes.
REFERENCES
2
Single-variable Optimization Algorithms
We begin with the exhaustive search method, simply because it is the
simplest of all the methods. In the exhaustive search method, the
optimum of a function is bracketed by calculating the function values at
a number of equally spaced points (Figure 2.3). Usually, the search begins
from a lower bound on the variable and three consecutive function values are
compared at a time based on the assumption of unimodality of the function.
Based on the outcome of comparison, the search is either terminated or
continued by replacing one of the three points by a new point. The search
continues until the minimum is bracketed.
Figure 2.3 The exhaustive search method that uses equally spaced points.
Algorithm
EXERCISE 2.2.1
Consider the problem:
$$\text{Minimize } f(x) = x^2 + 54/x$$
in the interval (0, 5). A plot of the function is shown in Figure 2.4. The plot
shows that the minimum lies at x∗ = 3. The corresponding function value at
Figure 2.4 The unimodal, single-variable function used in the exercise problems.
The minimum point is at x = 3.
that point is f(x*) = 27. By calculating the first and second derivatives at
this point, we observe that f′(3) = 0 and f′′(3) = 6 > 0. Thus, the point x = 3 is
a minimum point.
Comparing these function values, we observe that f (x1 ) > f (x2 ) > f (x3 ).
Thus, the minimum does not lie in the interval (0, 1). We set x1 = 0.5,
x2 = 1.0, x3 = 1.5, and proceed to Step 3.
Step 3 At this step, we observe that x3 < 5. Therefore, we move to Step 2.
This completes one iteration of the exhaustive search method. Since the
minimum is not bracketed, we continue to perform the next iteration.
Step 2 At this iteration, we have function values at x1 = 0.5, x2 = 1.0, and
x3 = 1.5. Since we have already calculated function values at the first two
points, we compute the function value at x3 only: f (x3 ) = f (1.5) = 38.25.
Thus, f (x1 ) > f (x2 ) > f (x3 ), and the minimum does not lie in the interval
(0.5, 1.5). Therefore, we set x1 = 1.0, x2 = 1.5, x3 = 2.0, and move to Step 3.
Step 3 Once again, x3 < 5.0. Thus, we move to Step 2.
Step 2 At this step, we require to compute the function value only at
x3 = 2.0. The corresponding function value is f (x3 ) = f (2.0) = 31.00. Since
f (x1 ) > f (x2 ) > f (x3 ), we continue with Step 3. We set x1 = 1.5, x2 = 2.0,
and x3 = 2.5.
Step 3 At this iteration, x3 < 5.0. Thus, we move to Step 2.
Step 2 The function value at x3 = 2.5 is f (x3 ) = 27.85. Like previous
iterations, we observe that f (x1 ) > f (x2 ) > f (x3 ), and therefore, we go
to Step 3. The new set of three points are x1 = 2.0, x2 = 2.5, and x3 = 3.0.
As is evident from the progress of the algorithm so far, if a large number of
points is desired to attain good accuracy in the solution, this method may
require a large number of passes through Steps 2 and 3.
Step 3 Once again, x3 < 5.0. Thus, we move to Step 2.
Step 2 Here, f (x3 ) = f (3.0) = 27.00. Thus, we observe that f (x1 ) >
f (x2 ) > f (x3 ). We set x1 = 2.5, x2 = 3.0, and x3 = 3.5.
Step 3 Again, x3 < 5.0, and we move to Step 2.
EXERCISE 2.2.2
f(x) = x² + 54/x
Step 4 The function value at x(1) is 50.301 which is less than that at x(0) .
Thus, we set k = 1 and go to Step 3. This completes one iteration of the
bounding phase algorithm.
Step 3 The next guess is x(2) = x(1) + 2¹∆ = 1.1 + 2(0.5) = 2.1.
Step 4 The function value at x(2) is 30.124 which is smaller than that at
x(1) . Thus we set k = 2 and move to Step 3.
Step 3 The next guess is x(3) = x(2) + 2²∆ = 2.1 + 4(0.5) = 4.1.
Step 4 The function value f(x(3)) is 29.981, which is smaller than f(x(2)) =
30.124. We set k = 3.
Step 3 The next guess is x(4) = x(3) + 2³∆ = 4.1 + 8(0.5) = 8.1.
Step 4 The function value at this point is f (8.1) = 72.277 which is larger
than f (x(3) ) = 29.981. Thus, we terminate with the obtained interval as
(2.1, 8.1).
With ∆ = 0.5, the obtained bracketing is poor, but the number of function
evaluations required is only 7. It is found that with x(0) = 0.6 and ∆ = 0.001,
the obtained interval is (1.623, 4.695), and the number of function evaluations
is 15. The algorithm approaches the optimum exponentially fast, but the
accuracy in the obtained interval may not be very good; in the exhaustive
search method, on the other hand, the number of iterations required to reach
near the optimum may be large, but the obtained accuracy is good. An
algorithm with a mixed strategy
may be more desirable. At the end of this chapter, we present a FORTRAN
code implementing this algorithm. A sample simulation result obtained using
the code is also presented.
Let us consider two points x1 and x2 which lie in the interval (a, b) and
satisfy x1 < x2 . For unimodal functions for minimization, we can conclude
the following:
• If f (x1 ) > f (x2 ) then the minimum does not lie in (a, x1 ).
• If f (x1 ) < f (x2 ) then the minimum does not lie in (x2 , b).
• If f (x1 ) = f (x2 ) then the minimum does not lie in (a, x1 ) and (x2 , b).
Consider a unimodal function drawn in Figure 2.5. If the function value at
x1 is larger than that at x2 , the minimum point x∗ cannot lie on the left-side
of x1 . Thus, we can eliminate the region (a, x1 ) from further consideration.
Therefore, we reduce our interval of interest from (a, b) to (x1 , b). Similarly,
the second possibility (f(x1) < f(x2)) can be explained. If the third situation
occurs, that is, when f(x1) = f(x2) (this is a rare situation, especially when
numerical computations are performed), we can conclude that the regions
(a, x1) and (x2, b) can be eliminated with the assumption that there exists only
one local minimum in the search space (a, b). The following algorithms
construct their search by using the above fundamental rule for region elimination.
Figure 2.5 A typical single-variable unimodal function with function values at two
distinct points.
Figure 2.6 Three points x1 , xm , and x2 used in the interval halving method.
is eliminated. There are three scenarios that may occur. If f (x1 ) < f (xm ),
then the minimum cannot lie beyond xm . Therefore, we reduce the interval
from (a, b) to (a, xm ). The point xm being the middle of the search space,
this elimination reduces the search space to 50 per cent of the original search
space. On the other hand, if f (x1 ) > f (xm ), the minimum cannot lie in the
interval (a, x1). The point x1 being at the one-quarter point of the search space,
this reduction is only 25 per cent. Thereafter, we compare function values at
xm and x2 to eliminate further 25 per cent of the search space. This process
continues until a small enough interval is found. The complete algorithm is
described below. Since in each iteration of the algorithm, exactly half of the
search space is retained, the algorithm is called the interval halving method.
Algorithm
Step 1 Choose a lower bound a and an upper bound b. Choose also a small
number ϵ. Let xm = (a + b)/2, L0 = L = b − a. Compute f (xm ).
Step 2 Set x1 = a + L/4, x2 = b − L/4. Compute f (x1 ) and f (x2 ).
Step 3 If f (x1 ) < f (xm ) set b = xm ; xm = x1 ; go to Step 5;
Else go to Step 4.
Step 4 If f (x2 ) < f (xm ) set a = xm ; xm = x2 ; go to Step 5;
Else set a = x1 , b = x2 ; go to Step 5.
Step 5 Calculate L = b − a. If |L| < ϵ, Terminate;
Else go to Step 2.
Single-variable Optimization Algorithms 53
At every iteration, two new function evaluations are performed and the
interval reduces to half of that at the previous iteration; the interval therefore
reduces to about (0.5)^(n/2) L0 after n function evaluations. The function
evaluations required to achieve a desired accuracy ϵ can be computed by
solving the following equation:
(0.5)^(n/2) (b − a) = ϵ.
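As a rough check (assuming the interval a = 0, b = 5 and the accuracy ϵ = 10⁻³ used in the exercises of this chapter), the above equation gives
(0.5)^(n/2) × 5 = 10⁻³ or n = 2 log₂(5 × 10³) ≈ 24.6,
so that n = 26 function evaluations (13 iterations of two evaluations each) are sufficient.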
EXERCISE 2.3.1
We again consider the unimodal, single-variable function used before:
f(x) = x² + 54/x.
Step 3 By comparing these function values, we observe that f (x1 ) > f (xm ).
Thus we continue with Step 4.
Step 4 We again observe that f (x2 ) > f (xm ). Thus, we drop the intervals
(0.00, 1.25) and (3.75, 5.00). In other words, we set a = 1.25 and b = 3.75.
The outcome of this iteration is pictorially shown in Figure 2.7.
Figure 2.7 First two iterations of the interval halving method. The figure shows
how exactly half of the search space is eliminated at every iteration.
Step 5 The new interval is L = 3.75 − 1.25 = 2.5, which is exactly half of
that in the original interval (L0 = 5). Since |L| is not small, we continue with
Step 2. This completes one iteration of the interval halving method.
Step 2 We compute x1 = 1.875 and x2 = 3.125. The function values are
f(x1) = 32.32 and f(x2) = 27.05, respectively. It is
important to note that even though three function values are required for
comparison at Steps 3 and 4, we have to compute function values at two new
points only; the other point (xm , in this case) always happens to be one from
the previous iteration.
Step 3 We observe that f (x1 ) = 32.32 > f (xm ) = 27.85. Thus, we go to
Step 4.
Step 4 Here, f (x2 ) = 27.05 < f (xm ) = 27.85. Thus, we eliminate the
interval (1.25, 2.5) and set a = 2.5 and xm = 3.125. This procedure is also
depicted in Figure 2.7.
Step 5 At the end of the second iteration, the new interval length is
L = 3.75 − 2.5 = 1.25, which is again half of that in the previous iteration.
Since this interval is not smaller than ϵ, we perform another iteration.
Step 2 We compute x1 = 2.8125 and x2 = 3.4375. The corresponding
function values are f (x1 ) = 27.11 and f (x2 ) = 27.53.
Step 3 We observe that f (x1 ) = 27.11 > f (xm ) = 27.05. So we move to
Step 4.
Step 4 Here, f (x2 ) = 27.53 > f (xm ) = 27.05 and we drop the boundary
intervals. Thus, a = 2.8125 and b = 3.4375.
Step 5 The new interval L = 0.625. We continue this process until an L
smaller than a specified small value (ϵ) is obtained.
We observe that at the end of each iteration, the interval is reduced to half
of its original size and after three iterations, the interval is (1/2)³ L0 = 0.625.
Since two function evaluations are required per iteration and half of the region
is eliminated at each iteration, the effective region elimination per function
evaluation is 25 per cent. In the following subsections, we discuss two more
algorithms with larger region elimination capabilities per function evaluation.
Algorithm
Step 1 Choose a lower bound a and an upper bound b. Set L = b − a.
Assume the desired number of function evaluations to be n. Set k = 2.
Step 2 Compute L∗k = (Fn−k+1/Fn+1) L. Set x1 = a + L∗k and x2 = b − L∗k.
Step 3 Compute one of f (x1 ) or f (x2 ), which was not evaluated earlier.
Use the fundamental region-elimination rule to eliminate a region. Set new a
and b.
Step 4 Is k = n? If no, set k = k + 1 and go to Step 2;
Else Terminate.
In this algorithm, the interval reduces to (2/Fn+1 )L after n function
evaluations. Thus, for a desired accuracy ϵ, the number of required function
evaluations n can be calculated using the following equation:
(2/Fn+1) (b − a) = ϵ.
As is clear from the algorithm, only one function evaluation is required at each
iteration. At iteration k, a proportion of Fn−k /Fn−k+2 of the search space at
the previous iteration is eliminated. For large values of n, this quantity is close
to 38.2 per cent, which is better than that in the interval halving method.
(Recall that in the interval halving method this quantity is 25 per cent.)
However, one difficulty with this algorithm is that the Fibonacci numbers
must be calculated in each iteration.
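Since the Fibonacci numbers grow quickly, both the numbers themselves and the required n are easily generated on the fly. The following is a minimal sketch in the style of the FORTRAN codes given at the end of this chapter (the subroutine name fibn is our own illustration, not one of the printed codes):

subroutine fibn(a,b,eps,n)
c ****************************************************
c finds the smallest n for which F(n+1) >= 2(b-a)/eps
c ****************************************************
implicit real*8 (a-h,o-z)
target = 2.0*(b-a)/eps
c.....fprev and fcur hold F(k-1) and F(k), with F(0) = F(1) = 1
fprev = 1.0
fcur = 1.0
k = 1
10 if (fcur .lt. target) then
fnext = fcur + fprev
fprev = fcur
fcur = fnext
k = k + 1
go to 10
endif
n = k - 1
return
end

For a = 0, b = 5, and ϵ = 10⁻³, the subroutine returns n = 19, since F20 = 10946 is the first Fibonacci number exceeding 2(b − a)/ϵ = 10⁴.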
We illustrate the working of this algorithm on the same function used
earlier.
EXERCISE 2.3.2
Minimize the function
f(x) = x² + 54/x.
Step 1 We choose a = 0 and b = 5. Thus, the initial interval is L = 5. Let
us also choose the desired number of function evaluations to be three (n = 3).
In practice, a large value of n is usually chosen. We set k = 2.
Step 2 We compute L∗2 as follows:
L∗2 = (F3−2+1/F3+1) L = (F2/F4) · 5 = (2/5) · 5 = 2.
Thus, we calculate x1 = 0 + 2 = 2 and x2 = 5 − 2 = 3.
Step 3 We compute the function values: f (x1 ) = 31 and f (x2 ) = 27. Since
f (x1 ) > f (x2 ), we eliminate the region (0, x1 ) or (0, 2). In other words, we set
a = 2 and b = 5. Figure 2.9 shows the function values at these two points
and the resulting region after Step 3. The exact minimum of the function is
also shown.
Step 4 Since k = 2 ≠ n, we set k = 3 and move to Step 2.
Step 2 We compute L∗3 = (F1/F4) L = (1/5) · 5 = 1. Thus, x1 = 2 + 1 = 3 and
x2 = 5 − 1 = 4.
Step 3 We observe that one of the points (x1 = 3) was evaluated in the
previous iteration. It is important to note that this is not an accident. The
property of the Fibonacci search method is such that at every iteration only
one new point will be considered. Thus, we need to compute the function
value only at point x2 = 4: f (x2 ) = 29.5. By comparing function values at
x1 = 3 and x2 = 4, we observe that f (x1 ) < f (x2 ). Therefore, we set a = 2
and b = x2 = 4, since the fundamental rule suggests that the minimum cannot
lie beyond x2 = 4.
Step 4 At this iteration, k = n = 3 and we terminate the algorithm. The
final interval is (2, 4).
One difficulty of the Fibonacci search method is that the Fibonacci numbers
have to be calculated and stored. Another problem is that at every iteration
the proportion of the eliminated region is not the same. In order to overcome
these two problems and yet calculate one new function evaluation per
iteration, the golden section search method is used. In this algorithm, the
search space (a, b) is first linearly mapped to a unit interval search space
(0, 1). Thereafter, two points, each at a distance of τ from one of the two ends
of the search space, are chosen so that at every iteration a fraction (1 − τ) of
the search space is eliminated, that is, the retained interval is τ times that in
the previous iteration (Figure 2.10). This can be achieved by equating 1 − τ
with (τ × τ), which yields the golden number τ = 0.618. Figure 2.10 can be
used to verify that in each iteration one of the two points x1 and x2 is always
a point considered in the previous iteration.
Figure 2.10 The points (x1 and x2 ) used in the golden section search method.
Algorithm
Step 1 Choose a lower bound a and an upper bound b. Also choose a small
number ϵ. Normalize the variable x by using the equation w = (x − a)/(b − a).
Thus, aw = 0, bw = 1, and Lw = 1. Set k = 1.
Step 2 Set w1 = aw + (0.618)Lw and w2 = bw − (0.618)Lw . Compute f (w1 )
or f (w2 ), depending on whichever of the two was not evaluated earlier. Use
the fundamental region-elimination rule to eliminate a region. Set new aw and
bw .
Step 3 Is |Lw| < ϵ? If no, set k = k + 1, go to Step 2;
Else Terminate.
In this algorithm, the interval reduces to (0.618)^(n−1) of the original after
n function evaluations. Thus, the number of function evaluations n required
to achieve a desired accuracy ϵ is calculated by solving the following equation:
(0.618)^(n−1) (b − a) = ϵ.
Like the Fibonacci method, only one function evaluation is required at each
iteration and the effective region elimination per function evaluation is exactly
38.2 per cent, which is higher than that in the interval halving method. This
quantity is the same as that in the Fibonacci search for large n. In fact, for
a large n, the Fibonacci search is equivalent to the golden section search.
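As a rough check with the same data as before (a = 0, b = 5, ϵ = 10⁻³), the above equation gives
(0.618)^(n−1) × 5 = 10⁻³ or n − 1 = ln(5 × 10³)/ln(1/0.618) ≈ 17.7,
so that n = 19 function evaluations suffice, the same number obtained from the Fibonacci relation for this accuracy, which illustrates the equivalence of the two methods.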
EXERCISE 2.3.3
Consider the following function again:
f(x) = x² + 54/x.
Figure 2.11 Region eliminations in the first two iterations of the golden section
search algorithm.
w2 = 1 − (0.618)(0.618) = 0.618.
A FORTRAN code implementing this algorithm is also presented at the end
of this chapter. For other functions, the subroutine funct may be modified
and the code rerun.
Figure 2.12 The function f (x) and the interpolated quadratic function.
Figure 2.12 shows a function f(x) with three initial points x1, x2, and x3. The
fitted quadratic curve through these three points is also plotted with a dashed
line. The minimum (x̄) of this curve is used as one of the candidate points for
the next iteration. For non-quadratic
functions, a number of iterations of this algorithm is necessary, whereas for
quadratic objective functions the exact minimum can be found in one iteration
only.
A general quadratic function passing through two points x1 and x2 can
be written as
q(x) = a0 + a1 (x − x1 ) + a2 (x − x1 )(x − x2 ).
If (x1 , f1 ), (x2 , f2 ), and (x3 , f3 ) are three points on this function, then the
following relationships can be obtained:
a0 = f1 , (2.2)
a1 = (f2 − f1)/(x2 − x1), (2.3)
a2 = (1/(x3 − x2)) ((f3 − f1)/(x3 − x1) − a1). (2.4)
Setting the derivative of q(x) to zero gives the estimated minimum point:
x̄ = (x1 + x2)/2 − a1/(2a2). (2.5)
The point x̄ is an estimate of the minimum point provided q″(x̄) > 0 or
a2 > 0, which depends only on the choice of the three basic points. Among
the four points (x1, x2, x3, and x̄), the best three points are kept and a new
interpolated function q(x) is found again. This procedure continues until two
consecutive estimates are close to each other.
Based on these results, we present Powell’s algorithm (Powell, 1964).
Algorithm
Step 1 Let x1 be an initial point and ∆ be the step size. Compute x2 =
x1 + ∆.
Step 2 Evaluate f (x1 ) and f (x2 ).
Step 3 If f (x1 ) > f (x2 ), let x3 = x1 + 2∆;
Else let x3 = x1 − ∆. Evaluate f (x3 ).
Step 4 Determine Fmin = min(f1, f2, f3) and let Xmin be the point xi that
corresponds to Fmin.
Step 5 Use points x1, x2, and x3 to calculate x̄ using Equation (2.5).
Step 6 Are |Fmin − f(x̄)| and |Xmin − x̄| small? If not, go to Step 7;
Else the optimum is the best of the current four points and Terminate.
Step 7 Save the best point and the two points bracketing it, if possible;
otherwise, save the best three points. Relabel them according to x1 < x2 < x3
and go to Step 4.
In the above algorithm, no check is made to satisfy a2 > 0. The same
can be incorporated in Step 5. If a2 is found to be negative, one of the three
points may be replaced by a random point. This process is continued until
the quantity a2 becomes positive.
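A minimal sketch of this estimation step, in the style of the codes given at the end of this chapter, is shown below (the subroutine name qest and the flag ierr are our own illustration; the check on a2 discussed above is included):

subroutine qest(x1,f1,x2,f2,x3,f3,xbar,ierr)
c ****************************************************
c quadratic estimate of the minimum point from three
c points, using Equations (2.2) to (2.5);
c ierr = 1 signals a2 <= 0 (no minimum estimate)
c ****************************************************
implicit real*8 (a-h,o-z)
ierr = 0
a1 = (f2 - f1)/(x2 - x1)
a2 = ((f3 - f1)/(x3 - x1) - a1)/(x3 - x2)
if (a2 .le. 0.0) then
ierr = 1
return
endif
xbar = (x1 + x2)/2.0 - a1/(2.0*a2)
return
end

For the three points (1, 55), (2, 31), and (3, 27) used in the exercise below, the subroutine returns x̄ = 2.7.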
EXERCISE 2.4.1
We consider again the same unimodal, single-variable function
f(x) = x² + 54/x
to illustrate the working principle of the algorithm.
Step 1 We choose x1 = 1 and ∆ = 1. Thus, x2 = 1 + 1 = 2.
Step 2 The corresponding function values are f (x1 ) = 55 and f (x2 ) = 31.
Step 3 Since f (x1 ) > f (x2 ), we set x3 = 1 + 2(1) = 3 and the function
value is f (x3 ) = 27.
Step 4 By comparing function values at these points, we observe that the
minimum function value Fmin = min (55, 31, 27) = 27 and the corresponding
point is Xmin = x3 = 3.
Step 5 Using Equations (2.2) to (2.4), we calculate the following
parameters:
a0 = 55,
a1 = (31 − 55)/(2 − 1) = −24,
a2 = (1/(3 − 2)) ((27 − 55)/(3 − 1) − (−24)) = 10.
Since a2 > 0, the estimated minimum is
x̄ = (1 + 2)/2 − (−24)/(2 × 10) = 2.7.
The corresponding function value is f(x̄) = 27.29.
Step 6 Let us assume that |27 − 27.29| and |3 − 2.7| are not small enough
to terminate. Thus, we proceed to Step 7.
Step 7 The best point is x3 = 3, which is an extreme point. Thus, we
consider the best three points: x1 = 2, x2 = 2.7, and x3 = 3. This completes
one iteration of the algorithm. To continue with the next iteration, we proceed
to Step 4.
Step 4 At this stage, Fmin = min(31, 27.29, 27) = 27 and the corresponding
point is Xmin = 3.
Step 5 Using Equations (2.3) and (2.4), we obtain a1 = −5.3 and a2 = 4.33,
which is positive. The estimated minimum is x̄ = 2.96. The corresponding
function value is f(x̄) = 27.005.
Step 6 Here, the values |27 − 27.005| and |3 − 2.96| may be assumed to
be small. Therefore, we terminate the process and declare that the minimum
solution of the function is the best of the current four points. In this case, the
minimum is x∗ = 3 with f (x∗ ) = 27.
x(t+1) = x(t) − f′(x(t))/f″(x(t)). (2.6)
Algorithm
Step 1 Choose initial guess x(1) and a small number ϵ. Set k = 1. Compute
f′(x(1)).
Step 2 Compute f″(x(k)).
Step 3 Calculate the next guess x(k+1) using Equation (2.6). Compute
f′(x(k+1)).
Step 4 If |f′(x(k+1))| < ϵ, Terminate;
Else set k = k + 1 and go to Step 2.
Convergence of the algorithm depends on the initial point and the nature
of the objective function. For mathematical functions, the derivative may
be easy to compute, but in practice, the gradients have to be computed
numerically. At a point x(t), the first and second derivatives are computed as
follows, using the central difference method (Scarborough, 1966):
f′(x(t)) = (f(x(t) + ∆x(t)) − f(x(t) − ∆x(t)))/(2∆x(t)), (2.7)
f″(x(t)) = (f(x(t) + ∆x(t)) − 2f(x(t)) + f(x(t) − ∆x(t)))/(∆x(t))². (2.8)
According to Equations (2.7) and (2.8), the first derivative requires two
function evaluations and the second derivative requires three function
evaluations.
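A minimal sketch of one iteration of this method, combining Equations (2.6) to (2.8) and using the funct subroutine of the codes at the end of this chapter, is shown below (the subroutine name newstp is our own illustration; dx plays the role of ∆x(t)):

subroutine newstp(x,dx,xnew,fd,nfun)
c ****************************************************
c one Newton-Raphson iteration with numerically
c computed derivatives; dx is the perturbation used
c in Equations (2.7) and (2.8)
c ****************************************************
implicit real*8 (a-h,o-z)
call funct(x+dx,fp,nfun)
call funct(x,f0,nfun)
call funct(x-dx,fn,nfun)
c.....first and second derivatives by central difference
fd = (fp - fn)/(2.0*dx)
sd = (fp - 2.0*f0 + fn)/(dx*dx)
c.....next guess by Equation (2.6)
xnew = x - fd/sd
return
end

The three function calls per iteration agree with the operation count mentioned above.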
EXERCISE 2.5.1
Consider the minimization problem:
f(x) = x² + 54/x.
Step 3 The next guess, as computed using Equation (2.6), is x(3) = 2.086
and f ′ (x(3) ) = −8.239 computed numerically.
Step 4 Since |f′(x(3))| is not smaller than ϵ, we set k = 3 and move to Step 2. This is the
end of the second iteration.
Step 2 The second derivative at the point is f ′′ (x(3) ) = 13.899.
Step 3 The new point is calculated as x(4) = 2.679 and the derivative is
f ′ (x(4) ) = −2.167. Nine function evaluations were required to obtain this
point.
Step 4 Since the absolute value of this derivative is not smaller than ϵ, the
search proceeds to Step 2.
After three more iterations, we find that x(7) = 3.0001 and the derivative
is f′(x(7)) = −4 × 10⁻⁸, which is small enough to terminate the algorithm.
Since at every iteration the first and second-order derivatives are calculated
at a new point, a total of three function values are evaluated at every iteration.
Algorithm
Step 1 Choose two points a and b such that f ′ (a) < 0 and f ′ (b) > 0. Also
choose a small number ϵ. Set x1 = a and x2 = b.
Step 2 Calculate z = (x2 + x1)/2 and evaluate f′(z).
Step 3 If |f′(z)| ≤ ϵ, Terminate;
Else if f′(z) < 0, set x1 = z and go to Step 2;
Else set x2 = z and go to Step 2.
The sign of the first-derivative at the mid-point of the current search region
is used to eliminate half of the search region. If the derivative is negative, the
minimum cannot lie in the left-half of the search region and if the derivative
is positive, the minimum cannot lie in the right-half of the search space.
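A minimal sketch of one iteration of this scheme, again using the funct subroutine of this chapter's codes, is as follows (the subroutine name bisstp is our own illustration; dx is the perturbation of Equation (2.7)):

subroutine bisstp(x1,x2,dx,z,fdz,nfun)
c ****************************************************
c one bisection iteration: the sign of the derivative
c at the mid-point decides which half of (x1,x2) is
c eliminated
c ****************************************************
implicit real*8 (a-h,o-z)
z = (x1 + x2)/2.0
call funct(z+dx,fp,nfun)
call funct(z-dx,fn,nfun)
fdz = (fp - fn)/(2.0*dx)
c.....fundamental region-elimination rule on the sign
if (fdz .lt. 0.0) then
x1 = z
else
x2 = z
endif
return
end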
EXERCISE 2.5.2
Consider again the function:
f (x) = x2 + 54/x.
Step 1 We choose two points a = 2 and b = 5 such that f′(a) = −9.501 and
f′(b) = 7.841 are of opposite sign. The derivatives are computed numerically
using Equation (2.7). We also choose a small number ϵ = 10⁻³.
Step 2 We calculate z = (x1 + x2)/2 = 3.5 and evaluate f′(z), which turns
out to be positive.
Step 3 Since f′(z) > 0, the right-half of the search space needs to be
eliminated. Thus, we set x1 = 2 and x2 = z = 3.5. This completes one
iteration of the algorithm. This algorithm works much like the interval halving
method described earlier: at each iteration, only half of the search region is
eliminated, but here the decision about which half to delete depends on the
derivative at the mid-point of the interval.
Step 2 The new point z is the average of the two bounds: z = 3.125. The
derivative at this point is f′(z) = 0.720.
In the secant method, both magnitude and sign of derivatives are used
to create a new point. The derivative of the function is assumed to vary
linearly between the two chosen boundary points. Since boundary points have
derivatives with opposite signs and the derivatives vary linearly between the
boundary points, there exists a point between these two points with a zero
derivative. Knowing the derivatives at the boundary points, the point with
zero derivative can be easily found. If at two points x1 and x2 , the quantity
f′(x1)f′(x2) ≤ 0, the linear approximation of the derivative between x1 and x2
will have a zero value at the point z given by
z = x2 − f′(x2) (x2 − x1)/(f′(x2) − f′(x1)). (2.10)
In this method, in one iteration more than half the search space may
be eliminated depending on the gradient values at the two chosen points.
However, less than half of the search space may also be eliminated in one
iteration.
Algorithm
The algorithm is the same as the bisection method except that Step 2 is
modified as follows:
Step 2 Calculate the new point z using Equation (2.10) and evaluate f ′ (z).
This algorithm also requires only one gradient evaluation at every
iteration. Thus, only two function values are required per iteration.
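A minimal sketch of one iteration is shown below; it differs from the bisection sketch only in how the new point z is computed (the subroutine name secstp is our own illustration; fd1 and fd2 carry the derivatives at x1 and x2, assumed to be of opposite sign):

subroutine secstp(x1,fd1,x2,fd2,dx,z,fdz,nfun)
c ****************************************************
c one secant iteration; the new point has a zero
c derivative of the linear approximation, Eq. (2.10)
c ****************************************************
implicit real*8 (a-h,o-z)
z = x2 - fd2*(x2 - x1)/(fd2 - fd1)
call funct(z+dx,fp,nfun)
call funct(z-dx,fn,nfun)
fdz = (fp - fn)/(2.0*dx)
c.....eliminate the region using the sign of the derivative
if (fdz .lt. 0.0) then
x1 = z
fd1 = fdz
else
x2 = z
fd2 = fdz
endif
return
end

With x1 = 2, f′(x1) = −9.501, x2 = 5, and f′(x2) = 7.841, the first call returns z = 3.644, as in the exercise below.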
EXERCISE 2.5.3
Consider once again the function:
f(x) = x² + 54/x.
Step 2 Using Equation (2.10), we compute the new point
z = 5 − 7.841(5 − 2)/(7.841 − (−9.501)) = 3.644.
Step 3 The derivative f′(z) is found to be positive. Thus, the eliminated
region is (b − z) = 1.356, which is less than half the search space (b − a)/2 = 2.5.
We set x1 = 2 and x2 = 3.644. This completes one iteration of the secant
method.
Step 2 The next point which is computed using Equation (2.10) is z =
3.228. The derivative at this point is f ′ (z) = 1.127.
Step 3 Since f ′ (z) > 0, we eliminate the right part of the search space,
that is, we discard the region (3.228, 3.644). The amount of eliminated search
space is 0.416, which is also smaller than half of the previous search space
(3.644 − 2)/2 or 0.822. In both these iterations, the eliminated region is less
than half of the search space, but in some iterations a region more than half
of the search space can also be eliminated. Thus, we set x1 = 2 and
x2 = 3.228.
Step 2 The new point, z = 3.101 and f ′ (z) = 0.586.
Step 3 Since |f′(z)| is not smaller than ϵ, we continue with Step 2.
At the end of 10 function evaluations, the guess of the true minimum
point is computed using Equation (2.10): x = 3.037. This point is closer to
the true minimum point (x∗ = 3.0) than that obtained using the bisection
method.
A cubic function of the form
f̄(x) = a0 + a1(x − x1) + a2(x − x1)(x − x2) + a3(x − x1)²(x − x2)
can be fitted
exactly by specifying the function value as well as the first derivative at only
two points: ((x1 , f1 , f1′ ), (x2 , f2 , f2′ )). Thereafter, by setting the derivative
of the above equation to zero, the minimum of the above function can be
obtained (Reklaitis, et al., 1983):
x̄ = x2, if µ < 0,
x̄ = x2 − µ(x2 − x1), if 0 ≤ µ ≤ 1, (2.11)
x̄ = x1, if µ > 1,
where
z = 3(f1 − f2)/(x2 − x1) + f1′ + f2′,
w = ((x2 − x1)/|x2 − x1|) √(z² − f1′f2′),
µ = (f2′ + w − z)/(f2′ − f1′ + 2w).
Similar to Powell’s successive quadratic estimation method, the minimum
of the approximation function f (x) can be used as an estimate of the true
minimum of the objective function. This estimate and the earlier two points
(x1 and x2 ) may be used to find the next estimate of the true minimum point.
Two points (x1 and x2 ) are so chosen that the product of their first derivative
is negative. This procedure may be continued until the desired accuracy is
obtained.
Algorithm
Step 1 Choose an initial point x(0) , a step size ∆, and two termination
parameters ϵ1 and ϵ2 . Compute f ′ (x(0) ). If f ′ (x(0) ) > 0, set ∆ = −∆. Set
k = 0.
Step 2 Compute x(k+1) = x(k) + 2^k ∆.
Step 3 Evaluate f ′ (x(k+1) ).
If f ′ (x(k+1) )f ′ (x(k) ) ≤ 0, set x1 = x(k) , x2 = x(k+1) , and go to Step 4;
Else set k = k + 1 and go to Step 2.
Step 4 Calculate the point x̄ using Equation (2.11).
Step 5 If f(x̄) < f(x1), go to Step 6;
Else set x̄ = x̄ − ½(x̄ − x1) until f(x̄) ≤ f(x1) is achieved.
Step 6 Compute f′(x̄). If |f′(x̄)| ≤ ϵ1 and |(x̄ − x1)/x̄| ≤ ϵ2, Terminate;
Else if f′(x̄)f′(x1) < 0, set x2 = x̄;
Else set x1 = x̄.
Go to Step 4.
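A minimal sketch of the computation in Step 4, that is, of Equation (2.11), is shown below (the subroutine name cubest is our own illustration; fd1 and fd2 denote f1′ and f2′, and fd1·fd2 ≤ 0 is assumed so that the square root is real):

subroutine cubest(x1,f1,fd1,x2,f2,fd2,xbar)
c ****************************************************
c cubic-interpolation estimate of the minimum point
c from (x1,f1,fd1) and (x2,f2,fd2), Equation (2.11)
c ****************************************************
implicit real*8 (a-h,o-z)
z = 3.0*(f1 - f2)/(x2 - x1) + fd1 + fd2
w = sqrt(z*z - fd1*fd2)
if (x2 .lt. x1) w = -w
amu = (fd2 + w - z)/(fd2 - fd1 + 2.0*w)
c.....clamp the estimate to the interval (x1,x2)
if (amu .lt. 0.0) then
xbar = x2
else if (amu .gt. 1.0) then
xbar = x1
else
xbar = x2 - amu*(x2 - x1)
endif
return
end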
EXERCISE 2.5.4
Consider the following function again:
f(x) = x² + 54/x.
the stationary points of the original function h(x), roots of h(x) also become
minima. Consider the function h(x) shown in Figure 2.15, which has two
minima and two maxima. When the function is squared to form H(x) = h(x)²,
these minima and maxima remain stationary points of H(x), since
dH/dx = 2h(x)h′(x) = 0, d²H/dx² = 2([h′(x)]² + h(x)h″(x)).
The first condition reveals that either h(x) = 0 or h′ (x) = 0. The former
ensures that the root of h(x) is a stationary point of H(x) and the latter
condition ensures that all stationary points of h(x) remain stationary. The
second-order derivative is positive for h(x) = 0 points, thereby indicating
that roots of h(x) are minima of H(x) (like the point x = 3 in the figure).
However, for an original minimum or maximum point x̂ of h(x), there can be
two situations. If the h(x̂) is positive (like x = 5 and x = 4 in the example
problem), it remains as minimum or maximum of H(x), respectively. On the
other hand, if h(x̂) is negative, then the minimum becomes maximum and
vice versa, as shown for points x = 1 and x = 2.
Once the optimization problem is formed, a bracketing technique can be
used to first bracket the root and then a region-elimination method or a
gradient-based search method can be used to find the root with the desired
accuracy. We illustrate the root-finding procedure on a simple problem.
EXERCISE 2.6.1
Let us find the cube-root of a number (say 10). This can be formulated as a
root-finding problem as follows:
h(x) = x³ − 10 = 0.
A point that satisfies the above equation is the cube-root of number 10.
Incidentally, the cube-root of 10 obtained using a calculator is 2.154 (up to
three decimal places of accuracy). Let us investigate whether we can get this
solution using the above root-finding procedure. We use the bounding phase
and golden section search routines given at the end of this chapter to solve
this problem. We modify the objective function expression in the subroutine
funct to code the function f(x) = abs(x³ − 10). First, the bounding phase
method is applied with an initial point x(0) = 0 and ∆ = 1. (These numbers
are chosen at random.) After five function evaluations, the lower and upper
limits of the root are found to be 0 and 3, respectively. The input and output
of the bounding phase code are shown below:
enter x0, delta
0 1
enter 1 for intermediate results
0
0 3
Enter accuracy desired
0.001
Enter 1 for intermediate results
0
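The change needed in the subroutine funct of the bounding phase code (listed at the end of this chapter) amounts to one line; a minimal sketch is shown below. For the golden section code, the same expression goes into its funct subroutine after the mapping from w to x.

subroutine funct(x,f,nfun)
c ****************************************************
c objective for the root-finding problem:
c f(x) = abs(x**3 - 10)
c ****************************************************
implicit real*8 (a-h,o-z)
nfun = nfun + 1
f = abs(x**3 - 10.0)
return
end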
2.7 Summary
In this chapter, a number of optimization techniques suitable for finding the
minimum point of single-variable functions have been discussed. The problem
of finding the minimum point can be divided into two phases. At first, the
minimum point of the function needs to be bracketed between a lower and
an upper bound. Secondly, the minimum needs to be found as accurately as
possible by keeping the search effort enclosed in the bounds obtained in the
first phase.
Two different techniques to bracket the minimum point have been
discussed. The exhaustive search method requires, in general, more function
evaluations to bracket the minimum, but the user has control over the final
bracketing range. On the other hand, the bounding phase method can bracket
the minimum very fast (usually exponentially fast) but the final bracketing
range may be poor.
Once the minimum is bracketed, region-elimination methods, point
estimation methods, or gradient-based methods may be used to find a close
estimate of the minimum. Region-elimination methods exclude some portion
of the search space at every iteration by comparing function values at two
points. Among the region-elimination methods discussed in this chapter, the
golden section search is the most economical. Gradient-based methods use
derivative information, which may be computed numerically. Point estimation
methods work by approximating the objective function by simple unimodal
functions iteratively, each time finding a better estimate of the true minimum.
Two methods—quadratic approximation search based on function values at
three points and cubic interpolation search based on function and gradient
information at two points—have also been discussed.
For well-behaved objective functions, the convergence to the optimum
point is faster with Powell’s method than with the region-elimination method.
However, for any arbitrary unimodal objective function, the golden section
search method is more reliable than other methods described in this chapter.
Single-variable Optimization Algorithms 75
PROBLEMS
2-1 Identify the optimum points of the following functions. Find the
optimum function values.
(i) f(x) = x³ − 10x − 2x² + 10.
(ii) f(x) = (x − 1)² − 0.01x⁴.
(iii) f(x) = 2(x − 2) exp(x − 2) − (x + 3)².
(iv) f(x) = (x² − 10x + 2) exp(0.1x).
(v) f(x) = 2x − x³ − 5x² − 2 exp(0.01x).
(vi) f(x) = 0.01x⁵ − 2x⁴ + 500(x − 2)².
(vii) f(x) = exp(x) − x³.
s(x) = a0 + a1 sin a2 (x − x1 )
2-9 Use three iterations of the bisection and the secant method to minimize
the following function:
Compare the algorithms in terms of the interval obtained at the end of three
iterations.
2-10 Compare the golden section search and interval halving method
in terms of the obtained interval after 10 function evaluations for the
minimization of the function
in the interval (2, 5). How does the outcome change if the interval (−2, 5) is
chosen?
2-12 Find at least one root of the following functions:
(i) f(x) = x³ + 5x² − 3.
(ii) f(x) = (x + 10)² − 0.01x⁴.
(iii) f(x) = exp(x) − x³.
(iv) f(x) = (2x − 5)⁴ − (x² − 1)³.
(v) f(x) = ((x + 2)² + 10)² − x⁴.
2-13 Perform two iterations of the cubic search method to minimize the
function
f(x) = (x² − 1)³ − (2x − 5)⁴.
subject to
4.5x1 + x2² − 18 ≤ 0,
2x1 − x2 − 1 ≥ 0,
x1 , x2 ≥ 0.
Use two iterations of the interval halving method to find the bracketing points.
Find out the exact minimum point along d(t) and compare.
¹The feasible direction method is discussed in Chapter 4.
COMPUTER PROGRAMS
subroutine bphase(a,b,nfun)
c bounding phase algorithm
c ****************************************************
c a and b are lower and upper bounds (output)
c nfun is the number of function evaluations (output)
c ****************************************************
implicit real*8 (a-h,o-z)
c.....step 1 of the algorithm
1 write(*,*) ’enter x0, delta’
read(*,*) x0,delta
call funct(x0-delta,fn,nfun)
call funct(x0,f0,nfun)
call funct(x0+delta,fp,nfun)
write(*,*) ’enter 1 for intermediate results’
read(*,*) iprint
c.....step 2 of the algorithm
if (fn .ge. f0) then
if (f0 .ge. fp) then
delta = 1 * delta
else
a = x0 - delta
b = x0 + delta
endif
elseif ((fn .le. f0) .and. (f0 .le. fp)) then
delta = -1 * delta
else
go to 1
endif
k=0
xn = x0 - delta
c.....step 3 of the algorithm
3 x1 = x0 + (2**k) * delta
call funct(x1,f1,nfun)
if (iprint .eq. 1) then
write(*,4) x1, f1
4 format(2x,’Current point ’,f10.4,
- ’ function value ’,1pe15.4)
endif
c.....step 4 of the algorithm
if (f1 .lt. f0) then
k = k+1
xn = x0
fn = f0
x0 = x1
f0 = f1
go to 3
else
a = xn
b = x1
endif
if (b .lt. a) then
temp = a
a = b
b = temp
endif
return
end
subroutine funct(x,f,nfun)
c ****************************************************
c x is the current point (input)
c f is the function value (output)
c nfun is the number of function evaluations (output)
c ****************************************************
implicit real*8 (a-h,o-z)
nfun = nfun + 1
f = x*x + 54.0/x
return
end
Simulation run
subroutine golden(a,b,eps,xstar,nfun,ierr,iprint)
c ******************************************************
c a and b are lower and upper limits of the search
c interval eps is the final accuracy desired
c xstar is the obtained solution (output)
c nfun is the number of function evaluations (output)
c ******************************************************
implicit real*8 (a-h,o-z)
real*8 lw
c.....step 1 of the algorithm
xstar = a
ierr=0
maxfun = 10000
aw=0.0
bw=1.0
lw=1.0
nfun=0
k=1
gold=(sqrt(5.0)-1.0)/2.0
w1prev = gold
w2prev = 1.0-gold
call funct(a,b,w1prev,fw1,nfun)
call funct(a,b,w2prev,fw2,nfun)
ic=0
c.....step 2 of the algorithm
10 w1 = w1prev
w2 = w2prev
if (ic .eq. 1) then
fw2 = fw1
call funct(a,b,w1,fw1,nfun)
else if (ic .eq. 2) then
fw1 = fw2
call funct(a,b,w2,fw2,nfun)
else if (ic .eq. 3) then
call funct(a,b,w1,fw1,nfun)
call funct(a,b,w2,fw2,nfun)
endif
if (fw1 .lt. fw2) then
c...... first scenario
ic = 1
aw = w2
lw = bw-aw
w1prev = aw + gold * lw
w2prev = w1
c...... second scenario
else if (fw2 .lt. fw1) then
ic = 2
bw = w1
lw=bw-aw
w1prev = w2
w2prev = bw - gold * lw
c...... third scenario
else
ic = 3
aw = w2
bw = w1
lw = bw-aw
w1prev = aw + gold * lw
w2prev = bw - gold * lw
endif
c.....print intermediate solutions
if (iprint .eq. 1) then
write(*,9) a+aw*(b-a), a+bw*(b-a)
9 format(2x,’Current interval is (’,1pe12.4,’, ’,
- 1pe12.4,’)’)
endif
k=k+1
c.....step 3 of the algorithm
subroutine funct(a,b,w,fw,nfun)
c ***************************************************
c a and b are lower and upper limits
c w is the current point
c fw is the function value (output)
c nfun is the current number of function evaluations
c ***************************************************
implicit real*8 (a-h,o-z)
c.....mapping from w to x
x = a + w * (b-a)
c.....calculate the function value, change here
c fw = x*x + 54.0/x
x1 = sqrt(4.84-(x-2.5)**2) + 0.05
fw = (x1**2 + x - 11.0)**2 + (x1 + x*x - 7)**2
nfun = nfun + 1
return
end
Simulation run
The above code is run on a PC-386 under Microsoft FORTRAN compiler with
the bounds obtained from the simulation run on the bounding phase method.
Thus, we input a = 2.1 and b = 8.1. The accuracy is set to 10⁻³ in order to
get the solution with three decimal places of accuracy. The input and output
statements are shown below.
Multivariable Optimization Algorithms
EXERCISE 3.2.1
Consider the objective function:
Minimize f(x1, x2) = (x1 − 10)² + (x2 − 10)².
Figure 3.1 shows the contour plot of this function. The property of a contour
line is that any two points on a contour line have the same function value.
Thus, it is convenient to show the optimum point of a function on contour
plots. The figure shows that the minimum point lies at the point (10, 10)T .
The function value at this point is zero. Let us say that the current point of
interest is x(t) = (2, 1)T and we are interested in finding the minimum point
and the minimum function value in a search direction s(t) = (2, 5)T from
the current point. From the right-angled triangle shown in dotted line, we
obtain the optimal point x∗ = (6.207, 11.517)T . Let us investigate whether
we can find this solution by performing a unidirectional search along s(t) on
the function.
The search can be achieved by first using a bracketing algorithm to enclose
the optimum point and then using a single-variable function optimization
technique to find the optimum point with the desired accuracy. Here, we use
the bounding phase method and the golden section search method for these
purposes, respectively. The code for the bounding phase method presented at
the end of Chapter 2 is used with the following modifications to the subroutine
funct:
subroutine funct(x,f,nfun)
implicit real*8 (a-h,o-z)
dimension xt(2),s(2),xl(2)
c xt : current point (a vector)
c s : specified direction vector
c xl : a point along the search direction
data xt/2.0,1.0/
data s/2.0,5.0/
nfun = nfun + 1
do 10 i = 1,2
xl(i) = xt(i) + x * s(i)
10 continue
f = (xl(1)-10)**2 + (xl(2)-10)**2
return
end
The input to the code and the corresponding results obtained by running
the code are shown as follows:
An algorithm having searches along each variable one at a time can only
successfully solve linearly separable functions. These algorithms (called one-
variable-at-a-time methods) cannot usually solve functions having nonlinear
interactions among design variables. Ideally, we require algorithms which
either completely eliminate the concept of search direction and manipulate a
set of points to create a better set of points or use complex search directions
to effectively decouple the nonlinearity of the function. In the following
subsections, we describe two algorithms of each kind.
In the above algorithm, x(0) is always set as the current best point.
Thus, at the end of simulation, x(0) becomes the obtained optimum point.
It is evident from the algorithm that at most 2^N function evaluations are
performed at each iteration. Thus, the required number of function evaluations increases
exponentially with N . The algorithm, however, is simple to implement and
has had success in solving many industrial optimization problems (Box, 1957;
Box and Draper, 1969). We illustrate the working of this algorithm through
an exercise problem.
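Step 2 of the algorithm requires the 2^N corner points of a hypercube around the current point. A minimal sketch of that bookkeeping is given below (the subroutine name corners is our own illustration; the binary digits of the point index select the plus or minus half-step in each dimension):

subroutine corners(x0,delta,n,npmax,xc,np)
c ****************************************************
c generates the np = 2**n corner points xc of the
c hypercube of sizes delta(i) around the point x0
c ****************************************************
implicit real*8 (a-h,o-z)
dimension x0(n),delta(n),xc(npmax,n)
np = 2**n
do 20 j = 1,np
k = j - 1
do 10 i = 1,n
if (mod(k,2) .eq. 0) then
xc(j,i) = x0(i) - delta(i)/2.0
else
xc(j,i) = x0(i) + delta(i)/2.0
endif
k = k/2
10 continue
20 continue
return
end

With x0 = (1, 1)T and ∆ = (2, 2)T, this produces the four points (0, 0)T, (2, 0)T, (0, 2)T, and (2, 2)T used in the exercise below.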
¹This algorithm should not be confused with the well-established evolutionary
optimization field (Goldberg, 1989; Holland, 1975). We discuss one evolutionary
optimization method, largely known as Genetic Algorithms (GAs), in Chapter 6.
²An N-dimensional hypercube is an N-dimensional box, whose length in each
dimension is fixed according to the precision required in the respective variable.
EXERCISE 3.3.1
Consider the Himmelblau function (Reklaitis et al., 1983):
Minimize f(x1, x2) = (x1² + x2 − 11)² + (x1 + x2² − 7)²
in the interval 0 ≤ x1 , x2 ≤ 5. This function is chosen throughout this text
for a particular reason. The function is a summation of two squared terms.
Each term inside the bracket can be considered as an error term. The first
term calculates the difference between the term (x1² + x2) and 11 and the
second term calculates the difference between the term (x1 + x2²) and 7. Since
the objective is to minimize these squared differences (or deviations of the
two terms from 11 and 7, respectively), the optimum solution will be a set of
values of x1 and x2 , satisfying the following two equations:
x1² + x2 = 11, x1 + x2² = 7.
Many engineering design problems aim to find a set of design
parameters satisfying a number of goals simultaneously. In these problems, a
mathematical expression for each goal is usually written and the difference of
the expression from the target is calculated. The differences are then squared
and added together to form an overall objective function, which must be
minimized. Thus, the above Himmelblau function resembles the mathematical
expression of an objective function in many engineering design problems.
Himmelblau’s function is plotted in Figure 3.2 in the range 0 ≤ x1 , x2 ≤ 5.
It appears from the plot that the minimum point is at (3, 2)T . The minimum
Figure 3.2 A plot of the Himmelblau function vs. x1 and x2 . The contour plot of
the function shows that the minimum point is at (3, 2)T .
point can also be obtained by solving the above two equations. The function
value at this minimum point is equal to zero. The contour of the function is
also shown at the bottom of the function plot.
A contour line is a collection of points having identical function values.
The same contour plot is shown in Figure 3.3. The continual decrease in the
function value of successive contour lines as they approach a point means that
the point is the minimum point. Thus, a contour plot gives an efficient (visual)
means to identify the minimum points in an objective function. Throughout
this text, we shall demonstrate the progress of optimization algorithms on the
contour plot shown in Figure 3.3.
Figure 3.3 The contour plot of the Himmelblau function. The function value
corresponding to each contour line is also listed.
Step 1 We choose an initial point x(0) = (1, 1)T and a size reduction
parameter ∆ = (2, 2)T. We also choose ϵ = 10⁻³ and initialize x = x(0) = (1, 1)T.
Step 2 Since ∥∆∥ = 2.828 > 10⁻³, we create a two-dimensional hypercube
(a square) around x(0) :
x(1) = (0, 0)T , x(2) = (2, 0)T , x(3) = (0, 2)T , x(4) = (2, 2)T .
Figure 3.4 shows this square with five chosen points.
Step 3 The function values at the five points are
f (x(0) ) = 106, f (x(1) ) = 170, f (x(2) ) = 74,
f (x(3) ) = 90, f (x(4) ) = 26.
The minimum of the five function values is 26 and the corresponding point is
x(4) = (2, 2)T . Thus we designate x = (2, 2)T .
Figure 3.4 Five iterations of the Box’s evolutionary optimization method shown
on a contour plot of Himmelblau’s function.
Step 4 Since x ̸= x(0) , we set x(0) = (2, 2)T and go to Step 2. This completes
one iteration of the Box’s evolutionary optimization method. Figure 3.4 shows
how the initial point (1, 1)T has moved to the point (2, 2)T in one iteration.
Step 2 The quantity ∥∆∥ = 2.828 is not small and therefore, we create a
square around the point (2, 2)T by adding and subtracting ∆i/2 to the variable
xi:
x(1) = (1, 1)T , x(2) = (3, 1)T , x(3) = (1, 3)T , x(4) = (3, 3)T .
Step 3 The function values at the five points are f(x(0)) = 26, f(x(1)) = 106,
f(x(2)) = 10, f(x(3)) = 58, and f(x(4)) = 26. The minimum point is x = (3, 1)T
having the function value equal to 10.
Step 4 Since x ̸= x(0) , we set the current best point x(0) = (3, 1)T and
proceed to Step 2.
Step 2 The new square around (3, 1)T consists of the points
x(1) = (2, 0)T, x(2) = (4, 0)T, x(3) = (2, 2)T, x(4) = (4, 2)T.
Step 3 The corresponding function values are f (x(0) ) = 10, f (x(1) ) = 74,
f (x(2) ) = 34, f (x(3) ) = 26, and f (x(4) ) = 50. The minimum point is
x = (3, 1)T having the function value equal to 10.
Step 4 Since the new point is the same as the previous best point x(0) , we
reduce the size parameter ∆ = (1, 1)T and move to Step 2.
Step 2 The new square with a reduced size around (3, 1)T is as follows
(Figure 3.4):
x(1) = (2.5, 0.5)T, x(2) = (3.5, 0.5)T, x(3) = (2.5, 1.5)T, x(4) = (3.5, 1.5)T.
Step 3 The minimum of the function values at these five points is 9.125,
attained at x = (3.5, 1.5)T.
Step 4 Since x ≠ x(0), we set x(0) = (3.5, 1.5)T and move to Step 2.
Step 2 The new square around (3.5, 1.5)T is as follows:
x(1) = (3, 1)T, x(2) = (4, 1)T, x(3) = (3, 2)T, x(4) = (4, 2)T.
Step 3 The minimum point is x = (3, 2)T having a function value equal to
zero.
Step 4 The point x is better than the previous best point and we proceed to
Step 2 by setting x(0) = (3, 2)T . Figure 3.4 shows the progress of the algorithm
by indicating the movement of the best point from one iteration to another
using arrows.
It is interesting to note that although the minimum point is found,
the algorithm does not terminate at this step. Since the current point is
the minimum, no other point can be found better than x(0) = (3, 2)T and
therefore, in subsequent iterations the value of the size parameter will continue
to decrease (according to Step 4 of the algorithm). When the value ∥∆∥
becomes smaller than ϵ, the algorithm terminates.
It is clear from the working of the algorithm that its convergence depends
on the initial hypercube size and location, and the chosen size reduction
parameter ∆i . Starting with a large ∆i is good, but the convergence to the
minimum may require more iterations and hence more function evaluations.
On the other hand, starting with a small hypercube may lead to premature
convergence on a suboptimal point, especially in the case of highly nonlinear
functions. Even with a large initial hypercube, the algorithm does not
guarantee convergence to a local or global optimal solution. It is worth noting
here that the reduction of the size parameter (∆i ) by a factor of two in Step 4
is not always necessary. A smaller or larger reduction can be used. However,
a smaller reduction (a factor smaller than two and greater than one) is usually
recommended for a better convergence.
In the simplex search method, the number of points in the initial simplex is
much less compared to that in Box’s evolutionary optimization method. This
reduces the number of function evaluations required in each iteration. With
N variables only (N + 1) points are used in the initial simplex. Even though
some guidelines are suggested to choose the initial simplex (Reklaitis et al.,
1983), it should be kept in mind that the points chosen for the initial simplex
should not form a zero-volume N-dimensional simplex. Thus, in a function
with two variables, the chosen three points in the simplex should not lie along
a line. Similarly, in a function with three variables, four points in the initial
simplex should not lie on a plane.
At each iteration, the worst point in the simplex is found first. Then, a new
simplex is formed from the old simplex by some fixed rules that steer the search
away from the worst point in the simplex. The extent of steering depends on
the relative function values of the simplex. Four different situations may arise
depending on the function values. The situations are depicted in Figure 3.5.
At first, the centroid (xc ) of all but the worst point is determined. Thereafter,
the worst point in the simplex is reflected about the centroid and a new point
xr is found. The reflection operation is depicted in Figure 3.5(a). If the
function value at this point is better than the best point in the simplex, the
reflection is considered to have taken the simplex to a good region in the
search space. Thus, an expansion along the direction from the centroid to
the reflected point is performed (Figure 3.5(b)). The amount of expansion is
controlled by the factor γ. On the other hand, if the function value at the
reflected point is worse than the worst point in the simplex, the reflection
is considered to have taken the simplex to a bad region in the search space.
Thus, a contraction in the direction from the centroid to the reflected point is
made (Figure 3.5(c)). The amount of contraction is controlled by a factor β
(a negative value of β is used). Finally, if the function value at the reflected
point is better than the worst and worse than the next-to-worst point in the
simplex, a contraction is made with a positive β value (Figure 3.5(d)). The
default scenario is the reflected point itself. The obtained new point replaces
the worst point in the simplex and the algorithm continues with the new
simplex. This algorithm was originally proposed by Spendley, et al. (1962)
and later modified by Nelder and Mead (1965).
Algorithm
Step 1 Choose γ > 1, β ∈ (0, 1), and a termination parameter ϵ. Create an
initial simplex³.
Step 2 Find xh (the worst point), xl (the best point), and xg (next to the
worst point). Calculate
xc = (1/N) ∑ x(i), where the sum is over i = 1, 2, . . . , N + 1 with i ≠ h.
Any other termination criteria may also be used. The performance of the
above algorithm depends on the values of β and γ. If a large value of γ or 1/β
is used, the approach to the optimum point may be faster, but the convergence
to the optimum point may be difficult. On the other hand, smaller values of γ
or 1/β may require more function evaluations to converge near the optimum
point. The recommended values for parameters are γ ≈ 2.0 and |β| ≈ 0.5.
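A minimal sketch of the reflection, expansion, and contraction calculations described above is shown below (the subroutine name newpt is our own illustration; the vectors are handled component-wise):

subroutine newpt(xc,xh,gamma,beta,icase,xnew,n)
c ****************************************************
c new point from the centroid xc and the worst point
c xh: icase = 1 reflection, 2 expansion, 3 contraction
c ****************************************************
implicit real*8 (a-h,o-z)
dimension xc(n),xh(n),xnew(n)
do 10 i = 1,n
c.....reflection: xr = 2*xc - xh
if (icase .eq. 1) xnew(i) = 2.0*xc(i) - xh(i)
c.....expansion: (1+gamma)*xc - gamma*xh
if (icase .eq. 2) xnew(i) = (1.0+gamma)*xc(i) - gamma*xh(i)
c.....contraction: (1-beta)*xc + beta*xh
if (icase .eq. 3) xnew(i) = (1.0-beta)*xc(i) + beta*xh(i)
10 continue
return
end

With xc = (1.5, 0.5)T, xh = (0, 0)T, and γ = 1.5, the expansion case reproduces the point (3.75, 1.25)T obtained in the exercise below.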
EXERCISE 3.3.2
Consider, as an example, the Himmelblau function:
Minimize f(x1, x2) = (x1² + x2 − 11)² + (x1 + x2² − 7)².
Step 1 We choose γ = 1.5, β = 0.5, and the initial simplex
x(1) = (0, 0)T, x(2) = (2, 0)T, x(3) = (1, 1)T.
³Any other initial simplex may be used, but care should be taken not to choose a
simplex with a zero hypervolume.
Step 2 The worst point is xh = x(1) , the best point is xl = x(2) , and next
to the worst point is xg = x(3) . Thus, we calculate the centroid of x(2) and
x(3) as follows:
xc = (x(2) + x(3) )/2 = (1.5, 0.5)T .
Step 3 We compute the reflected point, xr = 2xc − xh = (3, 1)T . The
corresponding function value is f(xr) = 10. Since f(xr) = 10 is less
than f(xl) = 74, we expand the simplex to find a new point
xnew = (1 + 1.5)xc − 1.5xh = (3.75, 1.25)T .
The function value at this point is 21.440. Thus, the new simplex is
x(1) = (3.75, 1.25)T , x(2) = (2, 0)T , and x(3) = (1, 1)T . The new simplex is
shown in Figure 3.6 (x(2) , x(3) and xnew ). It is interesting to note that even
though the reflected point xr is better than the new point, the basic simplex
search algorithm does not allow this point in the new simplex.
Figure 3.6 Three iterations of the simplex search method shown on a contour plot
of Himmelblau’s function.
Step 3 The reflected point is xr = 2xc −xh = (4.75, 0.25)T and the function
value is 144.32. Since f (xr ) > f (xh ), we contract the simplex and find a new
point
xnew = (1 − 0.5)xc + 0.5xh = (1.937, 0.812)T .
The corresponding function value is 60.772. Thus, the new simplex is as
follows:
x(1) = (3.75, 1.25)T, x(2) = (2, 0)T, x(3) = (1.937, 0.812)T.
Figure 3.6 also shows the new simplex found at this iteration.
Step 4 Since Q = 94.91 > ϵ, we proceed to Step 2 with the new simplex.
Step 4 Set k = k + 1 and perform the pattern move:
xp(k+1) = x(k) + (x(k) − x(k−1)).
Step 5 Perform another exploratory move using xp(k+1) as the base point.
Let the result be x(k+1).
Step 6 Is f (x(k+1) ) < f (x(k) )? If yes, go to Step 4;
Else go to Step 3.
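A minimal sketch of the exploratory move described above is given below; the names explor and fmulti are our own illustration, where fmulti is assumed to evaluate the multivariable objective function at the vector x, in the same spirit as the funct subroutine of the single-variable codes:

subroutine explor(x,delta,n,fx,nfun,isucc)
c ****************************************************
c exploratory move about the base point x; fx carries
c f(x) on entry and the best value found on exit;
c isucc = 1 if the move is a success
c ****************************************************
implicit real*8 (a-h,o-z)
dimension x(n),delta(n)
isucc = 0
do 10 i = 1,n
xi = x(i)
x(i) = xi + delta(i)
call fmulti(x,fp,nfun)
x(i) = xi - delta(i)
call fmulti(x,fn,nfun)
x(i) = xi
c.....retain the best of the three points for variable i
if (fp .lt. fx .and. fp .le. fn) then
x(i) = xi + delta(i)
fx = fp
isucc = 1
else if (fn .lt. fx) then
x(i) = xi - delta(i)
fx = fn
isucc = 1
endif
10 continue
return
end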
EXERCISE 3.3.3
Consider the Himmelblau function again:
Minimize f(x1, x2) = (x1² + x2 − 11)² + (x1 + x2² − 7)².
Step 1 Let us choose the initial point to be x(0) = (0, 0)T , the increment
vector ∆ = (0.5, 0.5)T , and reduction factor α = 2. We also set a termination
parameter ϵ = 10⁻³ and an iteration counter k = 0.
Step 2 We perform an iteration of exploratory move using x(0) as the base
point. Thus, we set x = xc = (0, 0)T and i = 1. The steps of the exploratory
move are worked out in the following:
Step 1 We first explore the vicinity of the variable x1 . We calculate the
function values at three points:
(x1(0) + ∆1, x2(0))T = (0.5, 0.0)T, f⁺ = f((0.5, 0)T) = 157.81,
(x1(0), x2(0))T = (0.0, 0.0)T, f = f((0, 0)T) = 170.00,
(x1(0) − ∆1, x2(0))T = (−0.5, 0.0)T, f⁻ = f((−0.5, 0)T) = 171.81.
Step 2 The minimum of the three function values is fmin = 157.81 and the
corresponding point is (0.5, 0)T.
Step 3 Since i = 1 ≠ 2, we set i = 2 and move to Step 1 to explore the second
variable.
Step 1 Note that at this point, the base point is (0.5, 0)T . We explore
variable x2 at this point and calculate function values at the following three
points:
f⁺ = f((0.5, 0.5)T) = 144.12,
f = f((0.5, 0)T) = 157.81,
f⁻ = f((0.5, −0.5)T) = 165.62.
Step 2 Here, fmin = 144.12 and the corresponding point is x = (0.5, 0.5)T .
Step 3 At this step i = 2 and we move to Step 4 of the exploratory move.
Step 4 Since x ̸= xc , the exploratory move is a success and we set
x = (0.5, 0.5)T .
Since the exploratory move is a success, we set x(1) = x = (0.5, 0.5)T and
move to Step 4. The successful points in the exploratory move of the Hooke-
Jeeves algorithm are marked with a filled circle in Figure 3.7.
Step 4 We set k = 1 and perform the pattern move:
xp(2) = x(1) + (x(1) − x(0)) = 2(0.5, 0.5)T − (0, 0)T = (1, 1)T.
Step 5 We perform another exploratory move using xp(2) as the base point.
After performing the exploratory move as before, we observe that the search
is a success and the new point is x = (1.5, 1.5)T . We set the new point
x(2) = x = (1.5, 1.5)T .
Step 6 We observe that f (x(2) ) = 63.12, which is smaller than f (x(1) ) =
144.12. Thus, we proceed to Step 4 to perform another pattern move. This
completes one iteration of the Hooke-Jeeves method.
Step 4 We set k = 2 and create a new point
xp(3) = 2x(2) − x(1) = (2.5, 2.5)T.
It is interesting to note here that since x(2) is better than x(1) , a jump along
the direction (x(2) − x(1) ) is made. This jump takes the search closer towards
the true minimum.
Step 5 We perform another exploratory move locally to find out if there is
any better point around the new point. After performing an exploratory move
on both variables, it is found that the search is a success and the new point is
x(3) = (3.0, 2.0)T . This new point is incidentally the true minimum point. In
this example, the chosen initial point and the increment vector happen to be
such that in two iterations of the Hooke-Jeeves algorithm the minimum point
is obtained. But, in general, more iterations may be required. An interesting
point to note that even though the minimum point is found in two iterations,
the algorithm has no way of knowing whether the optimum is reached or not.
The algorithm proceeds until the norm of the increment vector ∆ is small.
We continue to perform two more iterations to see how the algorithm may
finally terminate at the minimum point.
Step 6 Calculating the function value at the new point, we observe that
f (x(3) ) = 0 < f (x(2) ) = 63.12. Thus, we move to Step 4.
Step 4 The iteration counter k = 3 and the new point is xp(4) = 2x(3) − x(2) =
(4.5, 2.5)T.
Step 5 By performing an exploratory move with xp(4) as the base point,
we find that the search is a success and x = (4.0, 2.0)T . Thus, we set
x(4) = (4.0, 2.0)T .
Step 6 The function value at this point is f (x(4) ) = 50, which is larger than
f (x(3) ) = 0. Thus, we move to Step 3.
Step 3 Since ∥∆∥ is not smaller than ϵ, we reduce the increment vector to
∆ = (0.25, 0.25)T and move to Step 2.
Step 3 Since ∥∆∥ is not small, we reduce the increment vector by α again
and move to Step 2. The new increment vector is ∆ = (0.125, 0.125)T .
The algorithm now continues with Steps 2 and 3 until ∥∆∥ is smaller than
the termination factor ϵ. Thus, the final solution is x∗ = (3.0, 2.0)T with a
function value f (x∗ ) = 0. Figure 3.7 shows the intermediate points obtained
using the Hooke-Jeeves pattern search algorithm.
The conjugate direction method is probably the most successful and popular
direct search method used in many engineering optimization problems. It
uses a history of previous solutions to create new search directions. Unlike
previous methods, this method has a convergence proof for quadratic objective
functions, even though many non-quadratic functions have been successfully
solved using this method.
The basic idea is to create a set of N linearly independent search directions
and perform a series of unidirectional searches along each of these search
directions, starting each time from the previous best point. This procedure
guarantees to find the minimum of a quadratic function by one pass of N
unidirectional searches along each search direction. In other functions, more
than one pass of N unidirectional searches are necessary. The algorithm is
designed on the basis of solving a quadratic function and has the following
property.
Parallel subspace property
Given a quadratic function q(x) = A + Bᵀx + ½xᵀCx of two variables (where
A is a scalar quantity, B is a vector, and C is a 2 × 2 matrix), two arbitrary
but distinct points x(1) and x(2), and a direction d.
If y(1) is the solution to the problem: minimize q(x(1) + λd), and y(2) is the
solution to the problem: minimize q(x(2) + λd),
then the direction (y(2) − y(1)) is conjugate to d or, in other words, the quantity
(y(2) − y(1))ᵀCd is zero.
Thus, if two arbitrary points x(1) and x(2) and an arbitrary search
direction d are chosen, two unidirectional searches, one from each point will
create two points y (1) and y (2) . For quadratic functions, we can say that the
minimum of the function lies on the line joining the points y (1) and y (2) , as
depicted in Figure 3.8(a). The vector (y (2) − y (1) ) forms a conjugate direction
with the original direction vector d.
Instead of using two points (x(1) and x(2) ) and a direction vector (d) to
create one pair of conjugate directions, one point (x(1) ) and both coordinate
Figure 3.8 An illustration of the parallel subspace property with two arbitrary
points and an arbitrary search direction in (a). The same can also be
achieved from one point and two coordinate points in (b).
directions ((1, 0)T and (0, 1)T ) can be used to create a pair of conjugate
directions (d and (y (2) − y (1) )) (Figure 3.8(b)). The point y (1) is obtained by
performing a unidirectional search along (1, 0)T from the point x(1) . Then,
the point x(2) is obtained by performing a unidirectional search along (0, 1)T
from y (1) and finally the point y (2) is found by a unidirectional search along
the direction (1, 0)T from the point x(2) . By comparing Figures 3.8(a) and
3.8(b), we notice that both figures follow the parallel subspace property. The
former approach requires two unidirectional searches to find a pair of conjugate
directions, whereas the latter approach requires three unidirectional searches.
For a quadratic function, the minimum lies in the direction (y (2) − y (1) ),
but for higher-order polynomials the true minimum may not lie in the above
direction. Thus, in the case of quadratic function, four unidirectional searches
will find the minimum point and in higher-order polynomials more than four
unidirectional searches may be necessary. In the latter case, a few iterations
of this procedure are required to find the true minimum point. This concept
of parallel subspace property can also be extended to higher dimensions.
Suppose the point y(1) is found after a unidirectional search along
e(1) = (1, 0, . . . , 0)T from a chosen point. Thereafter the point y(2) is found after
searches in N coordinate directions (e(i), i = 2, 3, . . . , N, 1) with e(1) being the final search
direction. Then the vector (y (2) − y (1) ) is conjugate to the search direction
e(1) . The coordinate direction e(N ) can now be replaced with the new search
direction (y (2) − y (1) ) and the same procedure can be followed starting from
e(2) . With this property, we now present the algorithm:
Algorithm
Step 1 Choose a starting point x(0) and a set of N linearly independent
directions; possibly s(i) = e(i) for i = 1, 2, . . . , N .
Step 2 Minimize along N unidirectional search directions using the previous
minimum point to begin the next search. Begin with the search direction s(1)
and end with s(N ) . Thereafter, perform another unidirectional search along
s(1) .
Step 3 Form a new conjugate direction d using the extended parallel
subspace property.
Step 4 If ∥d∥ is small or search directions are linearly dependent,
Terminate;
Else replace s(j) = s(j−1) for all j = N, N − 1, . . . , 2. Set s(1) = d/∥d∥ and go
to Step 2.
If the function is quadratic, exactly (N − 1) loops through Steps 2
to 4 are required. Since in every iteration of the above algorithm exactly
(N + 1) unidirectional searches are necessary, a total of (N − 1) × (N + 1) or
(N² − 1) unidirectional searches are necessary to find N conjugate directions.
Thereafter, one final unidirectional search is necessary to obtain the minimum
point. Thus, in order to find the minimum of a quadratic objective function,
the conjugate direction method requires a total of N² unidirectional searches.
For other functions, more loops of the above algorithm may be required. One
difficulty with this algorithm is that since unidirectional searches are carried
out numerically by using a single-variable search method, the computation of
the minimum for unidirectional searches may not be exact. Thus, the resulting
directions may not be exactly conjugate to each other. To calculate the
extent of deviation, linear independence of the conjugate directions is usually
checked. If the search directions are not found to be linearly independent,
a completely new set of search directions (possibly conjugate to each other)
may be created at the current point. To make the implementation simpler,
the coordinate directions can be used again as search directions at the current
point.
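The algorithm can be coded compactly by reusing the line-search routines of the FORTRAN listing given at the end of this chapter. The following sketch of one iteration for N = 2 assumes the bphase, golden and mapfun subroutines of that listing; the subroutine names conjdir and linmin, and their argument lists, are chosen here for illustration only.

subroutine conjdir(x,s1,s2,nfun,xd)
c.....one iteration of the conjugate direction method for
c.....two variables: three unidirectional searches followed
c.....by the construction of a new conjugate direction
implicit real*8 (a-h,o-z)
dimension x(2),s1(2),s2(2),xd(2),y1(2),d(2)
c.....step 2: search along s1, then s2, then s1 again
call linmin(2,x,s1,nfun,xd)
do 10 i = 1,2
y1(i) = x(i)
10 continue
call linmin(2,x,s2,nfun,xd)
call linmin(2,x,s1,nfun,xd)
c.....step 3: conjugate direction d = y2 - y1 (parallel
c.....subspace property); x now holds y2
dnorm = 0.0
do 20 i = 1,2
d(i) = x(i) - y1(i)
dnorm = dnorm + d(i)*d(i)
20 continue
dnorm = dsqrt(dnorm)
c.....step 4: replace s2 by s1 and s1 by d/|d|
do 30 i = 1,2
s2(i) = s1(i)
s1(i) = d(i)/dnorm
30 continue
return
end
subroutine linmin(n,x,s,nfun,xd)
c.....unidirectional search along s from x; overwrites x
c.....with the best point found by bphase and golden
implicit real*8 (a-h,o-z)
dimension x(n),s(n),xd(n)
call bphase(n,x,s,a,b,nfun,xd)
call golden(n,x,s,a,b,1.0d-3,wstar,nfun,ierr,xd)
do 10 i = 1,n
x(i) = x(i) + wstar*s(i)
10 continue
return
end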
EXERCISE 3.3.4
Consider again the Himmelblau function:
Minimize $f(x_1, x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$.
Step 1 We begin with a point x(0) = (0, 4)T . We assume initial search
directions as s(1) = (1, 0)T and s(2) = (0, 1)T .
Step 2 We first find the minimum point along the search direction s(1) .
Any point along that direction can be written as xp = x(0) + αs(1) , where α
is a scalar quantity expressing the distance of the point xp from x(0) . Thus,
the point xp can be written as xp = (α, 4)T . Now the two-variable function
f (x1 , x2 ) can be expressed in terms of one variable α as
$$F(\alpha) = (\alpha^2 - 7)^2 + (\alpha + 9)^2,$$
which represents the function value of any point along the direction s(1)
and passing through x(0) . Since we are looking for the point for which
the function value is minimum, we may differentiate the above expression
with respect to α and equate to zero. But in any arbitrary problem, it
may not be possible to write an explicit expression of the single-variable
function F (α) and differentiate. In those cases, the function F (α) can be
obtained by substituting each variable xi by xpi . Thereafter, any single-
variable optimization methods, as described in Chapter 2, can be used to find
the minimum point. The first task is to bracket the minimum and then the
subsequent task is to find the minimum point. Here, we could have found the
exact minimum solution by differentiating the single-variable function F (α)
with respect to α and then equating the term to zero, but we follow the more
generic procedure of numerical differentiation, a method which will be used
in many real-world optimization problems. Using the bounding phase method
in the above problem we find that the minimum is bracketed in the interval
(1, 4) and using the golden section search we obtain the minimum α∗ = 2.083
with three decimal places of accuracy4 . Thus, x(1) = (2.083, 4.000)T . The
above procedure of obtaining the best point along a search direction is also
described in Section 3.2.
Similarly, we find the minimum point along the second search direction
s(2) from the point x(1) . A general point on that line is
$$\mathbf{x}_p = \mathbf{x}^{(1)} + \alpha\,\mathbf{s}^{(2)} = (2.083,\; 4 + \alpha)^T.$$
The optimum point found using a combined application of the bounding phase
and the golden section search method is α∗ = −1.592 and the corresponding
point is x(2) = (2.083, 2.408)T .
From the point x(2) , we perform a final unidirectional search along the first
search direction s(1) and obtain the minimum point x(3) = (2.881, 2.408)T .
Step 3 According to the parallel subspace property, we find the new
conjugate direction
$$\mathbf{d} = \mathbf{x}^{(3)} - \mathbf{x}^{(1)} = (2.881, 2.408)^T - (2.083, 4.000)^T = (0.798, -1.592)^T.$$
Step 4 The magnitude of the search vector d is not small. Thus, the new
conjugate search directions are
$$\mathbf{s}^{(2)} = (1, 0)^T, \qquad \mathbf{s}^{(1)} = \mathbf{d}/\|\mathbf{d}\| = (0.448, -0.894)^T.$$
Step 4 After one more pass of unidirectional searches (Steps 2 and 3), the
new pair of conjugate search directions are s(1) = (0.448, −0.894)T
and s(2) = (0.055, −0.039)T , respectively. The search direction d (before nor-
malizing) may now be considered to be small and therefore the algorithm may
terminate.
We observe that in one iteration of Step 2, (N + 1) unidirectional searches
are necessary. Thus, computationally this method may be expensive. In terms
of the storage requirement, the algorithm has to store (N + 1) points and N
search directions at any stage of the iteration.
$$\left.\frac{\partial f(\mathbf{x})}{\partial x_i}\right|_{\mathbf{x}^{(t)}} = \frac{f\big(x_i^{(t)} + \Delta x_i^{(t)}\big) - f\big(x_i^{(t)} - \Delta x_i^{(t)}\big)}{2\,\Delta x_i^{(t)}}, \tag{3.4}$$

$$\left.\frac{\partial^2 f(\mathbf{x})}{\partial x_i^2}\right|_{\mathbf{x}^{(t)}} = \frac{f\big(x_i^{(t)} + \Delta x_i^{(t)}\big) - 2f\big(\mathbf{x}^{(t)}\big) + f\big(x_i^{(t)} - \Delta x_i^{(t)}\big)}{\big(\Delta x_i^{(t)}\big)^2}, \tag{3.5}$$

$$\left.\frac{\partial^2 f(\mathbf{x})}{\partial x_i\,\partial x_j}\right|_{\mathbf{x}^{(t)}} = \Big[f\big(x_i^{(t)}+\Delta x_i^{(t)},\, x_j^{(t)}+\Delta x_j^{(t)}\big) - f\big(x_i^{(t)}+\Delta x_i^{(t)},\, x_j^{(t)}-\Delta x_j^{(t)}\big) - f\big(x_i^{(t)}-\Delta x_i^{(t)},\, x_j^{(t)}+\Delta x_j^{(t)}\big) + f\big(x_i^{(t)}-\Delta x_i^{(t)},\, x_j^{(t)}-\Delta x_j^{(t)}\big)\Big]\Big/\big(4\,\Delta x_i^{(t)}\,\Delta x_j^{(t)}\big). \tag{3.6}$$
The quantity $f(x_i^{(t)} + \Delta x_i^{(t)})$ represents the function value at the point
$(x_1^{(t)}, \ldots, x_i^{(t)} + \Delta x_i^{(t)}, \ldots, x_N^{(t)})^T$, a point obtained by perturbing the variable
$x_i$ only. The quantity $f(x_i^{(t)} + \Delta x_i^{(t)},\, x_j^{(t)} + \Delta x_j^{(t)})$ represents the function
value at the point $(x_1^{(t)}, \ldots, x_i^{(t)} + \Delta x_i^{(t)}, \ldots, x_j^{(t)} + \Delta x_j^{(t)}, \ldots, x_N^{(t)})^T$, a point
obtained by perturbing the variables $x_i$ and $x_j$ only.
The computation of the first derivative with respect to each variable
requires two function evaluations, thus totaling 2N function evaluations
for the complete first derivative vector. The computation of the second
derivative ∂ 2f /∂x2i requires three function evaluations, but the second-order
partial derivative ∂ 2f /(∂xi ∂xj ) requires four function evaluations as given in
Equation (3.6). Thus, the computation of the Hessian matrix requires (2N² + 1)
function evaluations (assuming the symmetry of the matrix). For example, in
two-variable functions the first and the second derivatives are computed as
$$\left.\frac{\partial f(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}^{(t)}} = \frac{f\big(x_1^{(t)}+\Delta x_1^{(t)},\, x_2^{(t)}\big) - f\big(x_1^{(t)}-\Delta x_1^{(t)},\, x_2^{(t)}\big)}{2\,\Delta x_1^{(t)}}, \tag{3.7}$$

$$\left.\frac{\partial^2 f(\mathbf{x})}{\partial x_1^2}\right|_{\mathbf{x}^{(t)}} = \frac{f\big(x_1^{(t)}+\Delta x_1^{(t)},\, x_2^{(t)}\big) - 2f\big(x_1^{(t)}, x_2^{(t)}\big) + f\big(x_1^{(t)}-\Delta x_1^{(t)},\, x_2^{(t)}\big)}{\big(\Delta x_1^{(t)}\big)^2}, \tag{3.8}$$

$$\left.\frac{\partial^2 f(\mathbf{x})}{\partial x_1\,\partial x_2}\right|_{\mathbf{x}^{(t)}} = \Big[f\big(x_1^{(t)}+\Delta x_1^{(t)},\, x_2^{(t)}+\Delta x_2^{(t)}\big) - f\big(x_1^{(t)}+\Delta x_1^{(t)},\, x_2^{(t)}-\Delta x_2^{(t)}\big) - f\big(x_1^{(t)}-\Delta x_1^{(t)},\, x_2^{(t)}+\Delta x_2^{(t)}\big) + f\big(x_1^{(t)}-\Delta x_1^{(t)},\, x_2^{(t)}-\Delta x_2^{(t)}\big)\Big]\Big/\big(4\,\Delta x_1^{(t)}\,\Delta x_2^{(t)}\big). \tag{3.9}$$
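In a program, these derivative formulas translate directly into repeated calls of a function-evaluation routine. The following sketch computes the full Hessian matrix using Equations (3.5) and (3.6); it assumes the funct subroutine of the FORTRAN listing at the end of this chapter, and the subroutine name hessmat and the one per cent perturbation rule (mirroring the fderiv subroutine of that listing) are illustrative choices.

subroutine hessmat(n,x,h,nfun,xd)
c.....numerical Hessian matrix using equations (3.5)-(3.6)
implicit real*8 (a-h,o-z)
dimension x(n),h(n,n),xd(n)
do 10 i = 1,n
xd(i) = x(i)
10 continue
call funct(n,xd,f0,nfun)
do 40 i = 1,n
dxi = 0.01d0*dmax1(dabs(x(i)),1.0d0)
c........diagonal term, equation (3.5)
xd(i) = x(i) + dxi
call funct(n,xd,fp,nfun)
xd(i) = x(i) - dxi
call funct(n,xd,fn,nfun)
h(i,i) = (fp - 2.0d0*f0 + fn)/dxi**2
xd(i) = x(i)
do 30 j = i+1,n
dxj = 0.01d0*dmax1(dabs(x(j)),1.0d0)
c........off-diagonal term, equation (3.6)
xd(i) = x(i) + dxi
xd(j) = x(j) + dxj
call funct(n,xd,fpp,nfun)
xd(j) = x(j) - dxj
call funct(n,xd,fpn,nfun)
xd(i) = x(i) - dxi
call funct(n,xd,fnn,nfun)
xd(j) = x(j) + dxj
call funct(n,xd,fnp,nfun)
h(i,j) = (fpp - fpn - fnp + fnn)/(4.0d0*dxi*dxj)
h(j,i) = h(i,j)
xd(i) = x(i)
xd(j) = x(j)
30 continue
40 continue
return
end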
A search direction, d(t) , is a descent direction at the point x(t) if the function
value decreases locally along it. This can be seen by writing a new point as
x(t+1) = x(t) + αd(t) for a small α > 0, demanding f (x(t+1) ) < f (x(t) ), and
expanding the expression f (x(t+1) ) in Taylor's series up to the linear term:
$$f(\mathbf{x}^{(t+1)}) = f(\mathbf{x}^{(t)}) + \alpha\,\nabla f(\mathbf{x}^{(t)}) \cdot \mathbf{d}^{(t)},$$
which shows that the condition $\nabla f(\mathbf{x}^{(t)}) \cdot \mathbf{d}^{(t)} < 0$ must hold for d(t) to be
a descent direction.
The magnitude of the quantity ∇f (x(t) ) · d(t) for a descent direction d(t) specifies
how steep the descent along that direction is. For example, if d(t) = −∇f (x(t) ) is used,
the quantity ∇f (x(t) ) · d(t) is maximally negative. Thus, the search direction
−∇f (x(t) ) is called the steepest descent direction.
EXERCISE 3.4.1
We take an exercise problem to illustrate this concept. Let us consider
Himmelblau’s function again:
We would like to determine whether the direction d(t) = (1, 0)T at the point
x(t) = (1, 1)T is a descent direction or not. The point and the direction
are shown on a contour plot in Figure 3.11. It is clear from the figure that
moving locally along d(t) from the point x(t) will reduce the function value.
We investigate this aspect by calculating the derivative ∇f (x(t) ) at the point.
The derivative, as calculated numerically, is ∇f (x(t) ) = (−46, −38)T . Taking
the dot product between ∇f (x(t) ) and d(t) , we obtain
$$\nabla f(\mathbf{x}^{(t)})^T \mathbf{d}^{(t)} = (-46, -38)\begin{pmatrix}1\\0\end{pmatrix} = -46,$$
which is a negative quantity. Thus, the search direction d(t) = (1, 0)T is a
descent direction. The magnitude of this negative quantity indicates the extent of descent
Figure 3.11 The direction (1, 0)T is a descent direction, whereas the direction
−∇f (x(t) ) is the steepest descent direction.
in the direction. If the search direction d(t) = −∇f (x(t) ) = (46, 38)T is used,
the magnitude of the above dot product becomes
$$(-46, -38)\begin{pmatrix}46\\38\end{pmatrix} = -3560.$$
Thus, the direction (46, 38)T or, after normalizing, (0.771, 0.637)T provides a steeper descent than d(t) . In
fact, it can be proved that the above direction (0.771, 0.637)T is the steepest
descent direction at the point x(t) , as shown in Figure 3.11. It is noteworthy
that for any nonlinear function, the steepest descent direction at any point
may not exactly pass through the true minimum point. The steepest descent
direction is a direction which is a local best direction. It is not guaranteed
that moving along the steepest descent direction will always take the search
closer to the true minimum point. We shall discuss more about this aspect in
Chapter 6.
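The descent check illustrated above is easy to automate. A small sketch, using the fderiv and funct subroutines of the FORTRAN listing at the end of this chapter (the program name dcheck and the hard-coded point and direction are illustrative):

program dcheck
c.....checks whether the direction d is descent at the
c.....point x by computing the dot product of the
c.....numerically computed gradient with d
implicit real*8 (a-h,o-z)
dimension x(2),d(2),grad(2),xd(2)
data x /1.0d0, 1.0d0/, d /1.0d0, 0.0d0/
nfun = 0
call fderiv(2,x,grad,f,nfun,xd)
dot = grad(1)*d(1) + grad(2)*d(2)
if (dot .lt. 0.0) then
write(*,*) 'descent direction, slope =', dot
else
write(*,*) 'not a descent direction, slope =', dot
endif
stop
end

For the point (1, 1)T and the direction (1, 0)T of the above exercise, the computed dot product is about −46, in agreement with the hand calculation.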
Most gradient-based methods work by searching along several directions
iteratively. The algorithms vary according to the way the search directions
are defined. Along each direction, s(k) , a unidirectional search is performed
to locate the minimum. This is performed by first writing a representative
point along the direction s(k) as follows:
$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)}\mathbf{s}^{(k)},$$
where α(k) is the step length. Since x(k) and s(k) are known, the point x(k+1)
can be expressed with only one variable α(k) . Thus, a unidirectional search on
α(k) can be performed and the new point x(k+1) can be obtained. Thereafter,
the search is continued from the new point along another search direction
s(k+1) . This process continues until the search converges to a local minimum
point. If a gradient-based search method is used for unidirectional search, the
search can be terminated using the following procedure. By differentiating the
expression f (x(k+1) ) = f (x(k) + αs(k) ) with respect to α and satisfying the
optimality criterion, it can be shown that the minimum of the unidirectional
search occurs when
∇f (x(k+1) ) · s(k) = 0.
The above criterion can be used to check the termination of the unidirectional
search method.
The search direction used in Cauchy's method is the negative of the gradient
at any particular point x(k) :
$$\mathbf{s}^{(k)} = -\nabla f(\mathbf{x}^{(k)}).$$
Since the direction s(k) = −∇f (x(k) ) is a descent direction, the function
value f (x(k+1) ) is always smaller than f (x(k) ) for small positive values of α(k) .
Cauchy’s method works well when x(0) is far away from x∗ . When the current
point is very close to the minimum, the change in the gradient vector is small.
Thus, the new point created by the unidirectional search is also close to the
current point. This slows the convergence process near the true minimum.
Convergence can be made faster by using the second-order derivatives, a
method which is discussed in Section 3.4.2.
EXERCISE 3.4.2
Consider the Himmelblau function:
Minimize $f(x_1, x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$.
Step 1 In order to ensure proper convergence, a large value of M (≈ 100)
is usually chosen. The choice of M also depends on the available time and
computing resource. Let us choose M = 100, an initial point x(0) = (0, 0)T ,
and termination parameters ϵ1 = ϵ2 = 10−3 . We also set k = 0.
Step 2 The derivative at x(0) is first calculated using Equation (3.4) and
found to be (−14, −22)T , which is identical to the exact derivative at that
point (Figure 3.12).
Using the golden section search (Section 2.3.3) in the interval5 (0, 1), we
obtain $\alpha^{(0)*} = 0.127$. Thus, the minimum point along the search direction
is x(1) = (1.788, 2.810)T .
Step 5 Since, x(1) and x(0) are quite different, we do not terminate; rather
we move back to Step 2. This completes one iteration of Cauchy’s method.
The total number of function evaluations required in this iteration is equal
to 30.
Step 2 The derivative vector at this point, computed numerically, is
(−30.707, 18.803)T .
Step 3 The magnitude of the derivative vector is not smaller than ϵ1 . Thus,
we continue with Step 4.
Step 4 Another unidirectional search along (30.707, −18.803)T from the
point x(1) = (1.788, 2.810)T using the golden section search finds the new
point x(2) = (3.008, 1.999)T with a function value equal to 0.018.
We continue this process until one of the termination criteria is satisfied.
The progress of the algorithm is shown in Figure 3.12. At the end of this
chapter, we present a FORTRAN code implementing the steepest descent
algorithm. The code uses the bounding phase and the golden section search
methods for performing the unidirectional search.
In Newton's method, the search direction
$$\mathbf{s}^{(k)} = -\big[\nabla^2 f(\mathbf{x}^{(k)})\big]^{-1}\nabla f(\mathbf{x}^{(k)})$$
is used. It can also be shown that if the matrix [∇2f (x(k) )]−1 is positive-
semidefinite, the direction s(k) must be descent. But if the matrix
[∇2f (x(k) )]−1 is not positive-semidefinite, the direction s(k) may or may not be
descent, depending on whether the quantity $\nabla f(\mathbf{x}^{(k)})^T\big[\nabla^2 f(\mathbf{x}^{(k)})\big]^{-1}\nabla f(\mathbf{x}^{(k)})$
is positive or not. Thus, the above search direction may not always guarantee
a decrease in the function value in the vicinity of the current point. But the
second-order optimality condition suggests that ∇2f (x∗ ) be positive-definite
(Section 3.1) for the minimum point. Thus, it can be assumed that the matrix
∇2f (x∗ ) is positive-definite in the vicinity of the minimum point and the
above search direction becomes descent near the minimum point. Thus, this
5 Ideally, the bounding phase method should be used to bracket the minimum.
Thereafter, the golden section search method may be used within the obtained
interval.
method is suitable and efficient when the initial point is close to the optimum
point. Since the function value is not guaranteed to reduce at every iteration,
occasional restart of the algorithm from a different point is often necessary.
Algorithm
The algorithm is the same as Cauchy’s method except that Step 4 is modified
as follows:
Step 4 Perform a unidirectional search to find α(k) using ϵ2 such that
$$f(\mathbf{x}^{(k+1)}) = f\Big(\mathbf{x}^{(k)} - \alpha^{(k)}\big[\nabla^2 f(\mathbf{x}^{(k)})\big]^{-1}\nabla f(\mathbf{x}^{(k)})\Big)$$
is minimum.
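For two variables, the matrix inversion demanded in Step 4 can be performed explicitly. A minimal sketch of the computation of the Newton search direction, assuming the fderiv subroutine of the chapter-end listing and a numerical-Hessian subroutine such as the hessmat sketch shown earlier (the name newdir is illustrative):

subroutine newdir(x,s,nfun,xd)
c.....Newton search direction s = -[Hessian]^(-1) grad f
c.....for a two-variable function, using the explicit
c.....inverse of a 2 x 2 matrix
implicit real*8 (a-h,o-z)
dimension x(2),s(2),grad(2),h(2,2),xd(2)
call fderiv(2,x,grad,f,nfun,xd)
call hessmat(2,x,h,nfun,xd)
det = h(1,1)*h(2,2) - h(1,2)*h(2,1)
s(1) = -( h(2,2)*grad(1) - h(1,2)*grad(2))/det
s(2) = -(-h(2,1)*grad(1) + h(1,1)*grad(2))/det
return
end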
EXERCISE 3.4.3
We choose the Himmelblau function once again:
Minimize $f(x_1, x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$.
We begin with the initial point x(0) = (0, 0)T , at which the derivative vector is
∇f (x(0) ) = (−14, −22)T . The Hessian matrix is computed numerically
using Equations (3.8) and (3.9). At the point x(0) , the Hessian matrix and
the corresponding inverse are given below:
$$\nabla^2 f(\mathbf{x}^{(0)}) = \begin{pmatrix}-42.114 & 0\\ 0 & -25.940\end{pmatrix}, \qquad \big[\nabla^2 f(\mathbf{x}^{(0)})\big]^{-1} = \frac{1}{1092.4}\begin{pmatrix}-25.940 & 0\\ 0 & -42.114\end{pmatrix}.$$
Since one of the two principal determinants of the above matrix is negative, the
matrix is not positive-semidefinite. Thus, we are not sure whether the search
along the direction s(0) will improve the objective function value or not. To be
sure, we may compute the quantity $\nabla f(\mathbf{x}^{(0)})^T\big[\nabla^2 f(\mathbf{x}^{(0)})\big]^{-1}\nabla f(\mathbf{x}^{(0)})$ and find
its value to be −23.274, which is a negative quantity. Thus, we can conclude
that the search direction s(0) is non-descent. Nevertheless, we carry on with
the steps of the algorithm and investigate the descent property of the search
direction.
Using the derivative and inverse Hessian matrix, we can compute the
search direction. Along that direction, we specify any point as follows:
$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha^{(0)}\big[\nabla^2 f(\mathbf{x}^{(0)})\big]^{-1}\nabla f(\mathbf{x}^{(0)}) = \begin{pmatrix}0\\0\end{pmatrix} - \alpha^{(0)}\,\frac{1}{1092.4}\begin{pmatrix}-25.940 & 0\\ 0 & -42.114\end{pmatrix}\begin{pmatrix}-14\\-22\end{pmatrix} = \begin{pmatrix}-0.333\,\alpha^{(0)}\\ -0.846\,\alpha^{(0)}\end{pmatrix}.$$
Performing a unidirectional search along this direction, we obtain
$\alpha^{(0)*} = -3.349$. Since this quantity is negative, the function value does not reduce in
the given search direction. Instead, the function value reduces in the opposite
direction (refer to Figure 3.13). This demonstrates that the search direction
Figure 3.13 Two iterations of Newton’s method. When the initial point (0, 0)T is
chosen, the search direction is not descent.
in Newton’s method may not always be descent. When this happens, the
algorithm is usually restarted with a new point. Thus, we try with another
initial point.
Step 1 We choose an initial point x(0) = (2, 1)T . The function value at this
point is f (x(0) ) = 52.
Step 2 The derivative vector at this point, computed numerically, is
∇f (x(0) ) = (−56, −28)T .
Step 3 Since the termination criteria are not met at this point, we move to
Step 4.
Step 4 The Hessian matrix at this point is computed numerically using
Equations (3.8) and (3.9). Calculating the inverse of this matrix, we write a
generic point along the new search direction as follows:
$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha^{(0)}\big[\nabla^2 f(\mathbf{x}^{(0)})\big]^{-1}\nabla f(\mathbf{x}^{(0)}) = \begin{pmatrix}2\\1\end{pmatrix} - \alpha^{(0)}\,\frac{1}{-202.8}\begin{pmatrix}-6.027 & -11.940\\ -11.940 & 9.994\end{pmatrix}\begin{pmatrix}-56\\-28\end{pmatrix} = \begin{pmatrix}2 + 3.313\,\alpha^{(0)}\\ 1 + 1.917\,\alpha^{(0)}\end{pmatrix}.$$
A unidirectional search along this direction reveals that the minimum occurs
at $\alpha^{(0)*} = 0.342$. Since this quantity is positive, we accept this search
direction and calculate the new point: x(1) = (3.134, 1.656)T . The function
value at this point is 1.491, which is substantially smaller than the initial
function value. The search direction s(0) and the resulting point x(1) are
shown in Figure 3.13.
Step 5 Assuming that we do not terminate the algorithm at this iteration,
we increment the iteration counter to k = 1 and proceed to Step 2. This
completes one iteration of Newton’s method.
Step 2 The derivative vector (computed numerically) at this point is
∇f (x(1) ) = (3.757, −6.485)T .
Step 3 The magnitude of the derivative vector is not small. Thus, we move
to Step 4 to find a new search direction.
Step 4 The Hessian matrix at this point is computed numerically using
Equations (3.8) and (3.9). Any generic point along the search direction is
calculated as follows:
$$\mathbf{x}^{(2)} = \mathbf{x}^{(1)} - \alpha^{(1)}\big[\nabla^2 f(\mathbf{x}^{(1)})\big]^{-1}\nabla f(\mathbf{x}^{(1)}) = \begin{pmatrix}3.134\\1.656\end{pmatrix} - \alpha^{(1)}\,\frac{1}{1236.6}\begin{pmatrix}19.441 & -19.158\\ -19.158 & 82.488\end{pmatrix}\begin{pmatrix}3.757\\-6.485\end{pmatrix} = \begin{pmatrix}3.134 - 0.160\,\alpha^{(1)}\\ 1.656 + 0.491\,\alpha^{(1)}\end{pmatrix}.$$
The minimum point is $\alpha^{(1)*} = 0.7072$ and the corresponding point is
x(2) = (3.021, 2.003)T with a function value f (x(2) ) = 0.0178.
We continue to compute one more iteration of this algorithm and find the
point x∗ = (3.000, 2.000)T (up to three decimal places of accuracy) with the
function value f (x(3) ) = 0. As discussed earlier, this method is effective for
initial points close to the optimum. Therefore, this demands some knowledge
of the optimum point in the search space. Moreover, the computations of the
Hessian matrix and its inverse are also computationally expensive.
Cauchy’s method works well when the initial point is far away from the
minimum point and Newton’s method works well when the initial point is near
the minimum point. In any given problem, it is usually not known whether the
chosen initial point is away from the minimum or close to the minimum, but
wherever be the minimum point, a method can be devised to take advantage
of both these methods. In Marquardt’s method, Cauchy’s method is initially
followed. Thereafter, Newton’s method is adopted. The transition from
Cauchy’s method to Newton’s method is adaptive and depends on the history
of the obtained intermediate solutions, as outlined in the following algorithm.
Algorithm
Step 1 Choose a starting point, x(0) , the maximum number of iterations,
M , and a termination parameter, ϵ. Set k = 0 and λ(0) = $10^4$ (a large
number).
Step 2 Compute the derivative ∇f (x(k) ).
Step 3 If ∥∇f (x(k) )∥ ≤ ϵ or k ≥ M , Terminate; else go to Step 4.
Step 4 Compute the search direction s(x(k) ) = −[∇2f (x(k) ) + λ(k) I]−1 ∇f (x(k) )
and set x(k+1) = x(k) + s(x(k) ).
Step 5 Is f (x(k+1) ) < f (x(k) )? If yes, go to Step 6; else set λ(k) = 2λ(k) and
go to Step 4.
Step 6 Set λ(k+1) = λ(k) /2, k = k + 1, and go to Step 2.
A large value of the parameter λ is used initially. Thus, the Hessian matrix
has little effect on the determination of the search direction (see Step 4 of
the above algorithm). Initially, the search is similar to that in Cauchy’s
method. After a number of iterations (when hopefully the current solution
has converged close to the minimum), the value of λ becomes small and the
effect is more like that in Newton’s method. The algorithm can be made faster
by performing a unidirectional search while finding the new point in Step 4:
x(k+1) = x(k) + α(k) s(x(k) ). Since the computations of the Hessian matrix
and its inverse are computationally expensive, the unidirectional search along
s(k) is not usually performed. For simpler objective functions, however, a
unidirectional search in Step 4 can be achieved to find the new point x(k+1) .
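In code, the only change from the Newton direction is the addition of λ to the diagonal of the Hessian before inversion. A sketch for two variables, under the same assumptions as the newdir sketch given earlier (the name marqdir is illustrative):

subroutine marqdir(x,s,alam,nfun,xd)
c.....Marquardt search direction
c.....s = -[Hessian + lambda I]^(-1) grad f for n=2
implicit real*8 (a-h,o-z)
dimension x(2),s(2),grad(2),h(2,2),xd(2)
call fderiv(2,x,grad,f,nfun,xd)
call hessmat(2,x,h,nfun,xd)
c.....add lambda to the diagonal elements
h(1,1) = h(1,1) + alam
h(2,2) = h(2,2) + alam
det = h(1,1)*h(2,2) - h(1,2)*h(2,1)
s(1) = -( h(2,2)*grad(1) - h(1,2)*grad(2))/det
s(2) = -(-h(2,1)*grad(1) + h(1,1)*grad(2))/det
return
end

For a large alam the added diagonal dominates and the direction approaches −∇f /λ (Cauchy-like); for a small alam it approaches the Newton direction.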
EXERCISE 3.4.4
We consider once again the Himmelblau function:
Step 1 We begin with a point x(0) = (0, 0)T , M = 100, and termination
factor 10−3 . We set a counter k = 0 and a parameter λ(0) = 100. Intentionally,
a smaller value of λ(0) is chosen to show the progress of the algorithm in a few
iterations. But in practice, a larger value must be chosen to ensure a smooth
transition from Cauchy's to Newton's method.
Step 2 The derivative at this point is ∇f (x(0) ) = (−14, −22)T .
Step 3 Since the derivative is not small and k = 0, we move to Step 4.
Step 4 In order to calculate the search direction, we need to compute the
Hessian matrix. We have already computed the Hessian matrix at the point
(0, 0)T in the previous exercise. Using that result, we compute the search
direction as follows:
$$\mathbf{s}^{(0)} = -\big[\nabla^2 f(\mathbf{x}^{(0)}) + \lambda^{(0)} I\big]^{-1}\nabla f(\mathbf{x}^{(0)}) = -\begin{pmatrix}57.886 & 0\\ 0 & 74.060\end{pmatrix}^{-1}\begin{pmatrix}-14\\-22\end{pmatrix} = \begin{pmatrix}0.242\\0.297\end{pmatrix},$$
and the new point is x(1) = x(0) + s(0) = (0.242, 0.297)T .
Step 5 The function value at this point is f (x(1) ) = 157.79, which is smaller
than that at x(0) , (recall that f (x(0) ) = 170). Thus, we move to Step 6.
Step 6 We set a new λ(1) = 100/2 = 50. This has an effect of switching
from Cauchy’s to Newton’s method. We set k = 1 and go to Step 2. This
completes one iteration of Marquardt’s algorithm. The point x(1) is shown in
Figure 3.14.
Step 2 The derivative at this point is ∇f (x(1) ) = (−23.645, −29.213)T .
Figure 3.14 Three iterations of Marquardt's method. The deviation from
Cauchy's method is highlighted.
Step 3 Since the termination criteria are not met, we move to Step 4.
Step 4 At this step, a new search direction is calculated by computing the
Hessian matrix at the current point x(1) : s(1) = (2.738, 1.749)T . Thus, the
next point is
x(2) = x(1) + s(1) = (2.980, 2.046)T
with a function value f (x(2) ) = 0.033.
Step 5 Since this point is found to be better than x(1) , we decrease the
parameter λ further: λ(2) = 25.
The process continues until one of the termination criteria is satisfied.
After one more iteration, the new point is found to be x(3) = (2.994, 2.005)T ,
having a function value equal to 0.001. Figure 3.14 shows how Marquardt’s
algorithm converges the search near the optimum point. The difference
between Cauchy’s method and Marquardt’s method is also shown in the same
figure. One difficulty with Marquardt’s method is the need for computing the
Hessian matrix at every iteration. In the following subsections, we present two
algorithms that use only the first-order derivatives to find a search direction.
scope of this book. Interested readers may refer to Reklaitis et al. (1983) or
Rao (1984). Fletcher and Reeves (1964) suggested the following conjugate
search directions and proved that s(k) is conjugate to all previous search
directions s(i) for i = 1, 2, . . . , (k − 1):
$$\mathbf{s}^{(k)} = -\nabla f(\mathbf{x}^{(k)}) + \frac{\|\nabla f(\mathbf{x}^{(k)})\|^2}{\|\nabla f(\mathbf{x}^{(k-1)})\|^2}\,\mathbf{s}^{(k-1)}, \tag{3.11}$$
with s(0) = −∇f (x(0) ). Note that this recursive equation for search direction
s(k) requires only first-order derivatives at two points x(k) and x(k−1) . The
initial search direction s(0) is assumed to be the steepest descent direction at
the initial point. Thereafter, the subsequent search directions are found by
using the above recursive equation.
Algorithm
Step 1 Choose x(0) and termination parameters ϵ1 , ϵ2 , ϵ3 .
Step 2 Find ∇f (x(0) ) and set s(0) = −∇f (x(0) ).
Step 3 Find $\lambda^{(0)*}$ such that $f(\mathbf{x}^{(0)} + \lambda^{(0)*}\mathbf{s}^{(0)})$ is minimum with termination
parameter ϵ1 . Set $\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \lambda^{(0)*}\mathbf{s}^{(0)}$ and k = 1. Calculate ∇f (x(1) ).
Step 4 Set
$$\mathbf{s}^{(k)} = -\nabla f(\mathbf{x}^{(k)}) + \frac{\|\nabla f(\mathbf{x}^{(k)})\|^2}{\|\nabla f(\mathbf{x}^{(k-1)})\|^2}\,\mathbf{s}^{(k-1)}.$$
Step 5 Find $\lambda^{(k)*}$ such that $f(\mathbf{x}^{(k)} + \lambda^{(k)*}\mathbf{s}^{(k)})$ is minimum with termination
parameter ϵ1 . Set $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \lambda^{(k)*}\mathbf{s}^{(k)}$.
Step 6 Is $\dfrac{\|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\|}{\|\mathbf{x}^{(k)}\|} \le \epsilon_2$ or $\|\nabla f(\mathbf{x}^{(k+1)})\| \le \epsilon_3$?
If yes, Terminate;
Else set k = k + 1 and go to Step 4.
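A compact implementation of the above algorithm, assuming the fderiv subroutine of the chapter-end listing and the linmin sketch shown earlier (the subroutine name frcg and the single gradient-based termination test are illustrative simplifications):

subroutine frcg(n,x,eps,nfun,xd,grad,s)
c.....Fletcher-Reeves conjugate gradient method
implicit real*8 (a-h,o-z)
dimension x(n),xd(n),grad(n),s(n)
c.....steps 1-2: initial direction is steepest descent
call fderiv(n,x,grad,f,nfun,xd)
g0 = 0.0
do 10 i = 1,n
s(i) = -grad(i)
g0 = g0 + grad(i)*grad(i)
10 continue
c.....steps 3 and 5: unidirectional search, new gradient
20 call linmin(n,x,s,nfun,xd)
call fderiv(n,x,grad,f,nfun,xd)
g1 = 0.0
do 30 i = 1,n
g1 = g1 + grad(i)*grad(i)
30 continue
c.....step 6: terminate on a small gradient
if (dsqrt(g1) .le. eps) return
c.....step 4: equation (3.11)
do 40 i = 1,n
s(i) = -grad(i) + (g1/g0)*s(i)
40 continue
g0 = g1
go to 20
end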
EXERCISE 3.4.5
Consider the Himmelblau function:
Minimize $f(x_1, x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$.
Step 1 Let us choose the initial point to be x(0) = (0, 0)T . All termination
parameters are chosen as 10−3 .
Step 2 The derivative vector at x(0) , calculated numerically, is equal to
(−14, −22)T . Thus, the initial search direction is s(0) = −∇f (x(0) ) = (14, 22)T .
Step 3 A unidirectional search from x(0) along s(0) using the golden section
search finds the minimum point x(1) = (1.788, 2.810)T . We set the iteration
counter, k = 1. The derivative vector at the new point is ∇f (x(1) ) =
(−30.707, 18.803)T . Once the new point is found, we have to create a new
search direction conjugate to s(0) . We find that search direction in the next
step.
Step 4 A new search direction s(1) is calculated using Equation (3.11):
$$\mathbf{s}^{(1)} = -\nabla f(\mathbf{x}^{(1)}) + \frac{\|\nabla f(\mathbf{x}^{(1)})\|^2}{\|\nabla f(\mathbf{x}^{(0)})\|^2}\,\mathbf{s}^{(0)} = -\begin{pmatrix}-30.707\\18.803\end{pmatrix} + \left[\frac{(-30.707)^2 + (18.803)^2}{(-14)^2 + (-22)^2}\right]\begin{pmatrix}14\\22\end{pmatrix} = \begin{pmatrix}57.399\\23.142\end{pmatrix}.$$
The unit vector along this search direction is (0.927, 0.374)T . In order to check
the linear independence of the search directions s(0) and s(1) , let us compute
the angle between them:
$$\cos^{-1}\left[\frac{\mathbf{s}^{(0)}}{\|\mathbf{s}^{(0)}\|} \cdot \frac{\mathbf{s}^{(1)}}{\|\mathbf{s}^{(1)}\|}\right] = \cos^{-1}\left[(0.537, 0.844)\begin{pmatrix}0.927\\0.374\end{pmatrix}\right] \approx 35.6^\circ,$$
which is not close to zero. Thus, the two vectors s(0) and s(1) are not linearly
dependent. This can also be observed in Figure 3.15.
Step 5 Performing another unidirectional search along s(1) from x(1) , we
find the minimum x(2) = (2.260, 2.988)T .
Step 6 The quantities $\|\mathbf{x}^{(2)} - \mathbf{x}^{(1)}\|/\|\mathbf{x}^{(1)}\| \approx 0.151$ and $\|\nabla f(\mathbf{x}^{(2)})\| \approx 47.7$
are both greater than the chosen termination factors. Thus, we increment the
iteration counter k to 2 and move to Step 4. This step completes one iteration
of the Fletcher-Reeves method. We observe that in the first iteration, the
function value has improved from an initial value of 170.0 to 25.976. Points
x(1) and x(2) are shown in the contour plot in Figure 3.15.
Step 4 In the beginning of the second iteration, we first create a search
direction s(2) which should be linearly independent to the direction s(1) . Using
Equation (3.11), we find the new direction:
$$\mathbf{s}^{(2)} = -\begin{pmatrix}-17.875\\44.259\end{pmatrix} + 1.757\begin{pmatrix}57.399\\23.142\end{pmatrix} = \begin{pmatrix}118.746\\-3.590\end{pmatrix}.$$
We observe that the angle between s(1) and s(2) is 23.7◦ . The new direction
is shown in Figure 3.15.
Figure 3.15 Several iterations of the conjugate gradient method. The convergence
to the minimum is slow after a few iterations because of the linear
dependence of the search directions.
Step 5 In order to find the best point along s(2) , we perform another
unidirectional search and find the new point x(3) = (2.708, 2.974)T with a
function value f (x(3) ) = 21.207. Note that the point x(3) is better than the
point x(2) .
The second iteration will be complete after checking the termination
criteria. In the third iteration, another search direction s(3) is computed and
the angle between directions s(2) and s(3) is found to be 20◦ . A unidirectional
search finds a new point x(4) = (3.014, 2.852)T with a function value
f (x(4) ) = 18.082. The algorithm continues this way until the termination
criteria are met. We continue simulating the algorithm for another five
iterations and find the point x(9) = (3.317, 1.312)T having a function value
equal to 5.576.
As seen from Figure 3.15, after a few iterations, search directions
tend to become less independent from each other. This causes many
iterations of the above algorithm to converge to the optimum solution.
If we restart the algorithm at the point x(3) , then we reassign
x(0) = (2.708, 2.974)T and move to Step 2. The search direction at this point
is s(0) = −∇f (x(0) ) = (−1.610, −52.784)T . A unidirectional search along this
direction finds the new point (2.684, 2.179)T . After one more search, we obtain
the point (2.992, 2.072)T with a function value equal to 0.082, which is much
better than even the point x(9) observed without restart. Figure 3.15 also
shows how this restart at the point x(3) improves the search process (with
dashed lines).
We have discussed earlier that once the current solution is close to the
minimum point of a function, Newton’s method is very efficient. But one
difficulty with that method is the computation of the inverse of the Hessian
matrix. In variable-metric methods, an estimate of the inverse of the Hessian
matrix at the minimum point is obtained by iteratively using first-order
derivatives. This eliminates expensive computation of the Hessian matrix
and its inverse. We replace the inverse of the Hessian matrix by a matrix A(k)
at iteration k and have a search direction
$$\mathbf{s}(\mathbf{x}^{(k)}) = -A^{(k)}\,\mathbf{e}(\mathbf{x}^{(k)}). \tag{3.12}$$
In the above equation, the quantity e(x(k) ) is the gradient of the function at
point x(k) , or e(x(k) ) = ∇f (x(k) ). The change in the design variable vector is
denoted by ∆x(k−1) = x(k) − x(k−1) and the change in the gradient vector by
∆e(x(k−1) ) = e(x(k) ) − e(x(k−1) ). Starting from the identity matrix A(0) = I,
the matrix A is updated at every iteration as
$$A^{(k)} = A^{(k-1)} + \frac{\Delta\mathbf{x}^{(k-1)}\,\Delta\mathbf{x}^{(k-1)T}}{\Delta\mathbf{x}^{(k-1)T}\,\Delta\mathbf{e}(\mathbf{x}^{(k-1)})} - \frac{\big[A^{(k-1)}\,\Delta\mathbf{e}(\mathbf{x}^{(k-1)})\big]\big[A^{(k-1)}\,\Delta\mathbf{e}(\mathbf{x}^{(k-1)})\big]^T}{\Delta\mathbf{e}(\mathbf{x}^{(k-1)})^T A^{(k-1)}\,\Delta\mathbf{e}(\mathbf{x}^{(k-1)})}. \tag{3.13}$$
In the DFP method, the modification of the matrix using Equation (3.13)
preserves the symmetry and the positive-definiteness of the matrix. This
property makes the DFP method attractive. Let us recall that in order
to achieve a descent direction with Newton’s method, the Hessian matrix
must be positive-semidefinite. Since an identity matrix is symmetric and
positive-definite, at every iteration the positive-definiteness of the matrix A(k)
is retained by the above transformation and the function value is guaranteed
to decrease monotonically. Thus, a monotonic improvement in function values
in every iteration is expected with the DFP method.
Algorithm
The algorithm is the same as the Fletcher-Reeves algorithm except that
the expression for the search direction s(x(k) ) in Step 4 is set according to
Equation (3.12) and the matrix A(k) is calculated using Equation (3.13).
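The heart of the method is the update of the A matrix. A sketch of Equation (3.13) in the style of the chapter-end listing (the subroutine name dfpupd is illustrative; dx and de denote ∆x(k−1) and ∆e(x(k−1) ), and w is a workspace vector):

subroutine dfpupd(n,a,dx,de,w)
c.....DFP update of the matrix A by equation (3.13)
implicit real*8 (a-h,o-z)
dimension a(n,n),dx(n),de(n),w(n)
c.....w = A*de (A is symmetric)
do 10 i = 1,n
w(i) = 0.0
do 5 j = 1,n
w(i) = w(i) + a(i,j)*de(j)
5 continue
10 continue
c.....scalars dx.de and de.A.de
dxde = 0.0
dead = 0.0
do 20 i = 1,n
dxde = dxde + dx(i)*de(i)
dead = dead + de(i)*w(i)
20 continue
c.....rank-one correction terms of equation (3.13)
do 30 i = 1,n
do 25 j = 1,n
a(i,j) = a(i,j) + dx(i)*dx(j)/dxde - w(i)*w(j)/dead
25 continue
30 continue
return
end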
EXERCISE 3.4.6
Let us consider the Himmelblau function again:
Minimize $f(x_1, x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$.
The inverse of the Hessian matrix at the minimum point x∗ = (3, 2)T is found
to be
$$H^{-1}(\mathbf{x}^*) = \begin{pmatrix}0.016 & -0.009\\ -0.009 & 0.035\end{pmatrix}. \tag{3.14}$$
Successive iterations of the DFP method transform an initial identity matrix
into the above matrix. This allows the DFP search to become similar to
Newton’s search after a few iterations. The advantage of this method over
Newton’s method is that the inverse of the Hessian matrix need not be
calculated. Thus, the algorithm has the effect of a second-order search, but
the search is achieved only with first-order derivatives.
Step 1 Once again we begin with the initial point x(0) = (0, 0)T . The
termination parameters are all set to be 10−3 .
Step 2 The derivative vector at the initial point is equal to ∇f (x(0) ) =
(−14, −22)T . The search direction is s(0) = (14, 22)T .
Step 3 A unidirectional search from x(0) along s(0) gives the minimum point
x(1) = (1.788, 2.810)T , as found in Step 3 of the Exercise 3.4.5. We calculate
the gradient at this point: ∇f (x(1) ) = (−30.707, 18.803)T .
Step 4 In order to calculate the new search direction, we first compute the
parameters required to be used in Equation (3.13):
∆x(0) = x(1) − x(0) = (1.788, 2.810)T , ∆e(x(0) ) = ∇f (x(1) ) − ∇f (x(0) ) = (−16.707, 40.803)T .
Substituting these quantities in Equation (3.13), we obtain
$$A^{(1)} = \begin{pmatrix}0.894 & 0.411\\ 0.411 & 0.237\end{pmatrix}.$$
Knowing A(1) , we now obtain the search direction s(1) using Equation (3.12)
as follows:
$$\mathbf{s}^{(1)} = -\begin{pmatrix}0.894 & 0.411\\ 0.411 & 0.237\end{pmatrix}\begin{pmatrix}-30.707\\18.803\end{pmatrix} = \begin{pmatrix}19.724\\8.164\end{pmatrix}.$$
Figure 3.16 Three iterations of the DFP method shown on a contour plot of
Himmelblau’s function.
Step 6 We observe that the point x(2) is very different from the point x(1)
and calculate the gradient at x(2) : ∇f (x(2) ) = (−18.225, 44.037)T . Since
∥∇f (x(2) )∥ is not close to zero, we increment k and proceed to Step 4. This
completes one iteration of the DFP method. The initial point x(0) , the point
x(1) , and the point x(2) are shown in Figure 3.16.
Step 4 The second iteration begins with computing another search direction
s(2) = −A(2) ∇f (x(2) ). The matrix A(2) can be computed by calculating
∆x(1) and ∆e(x(1) ) as before and using Equation (3.13). By simplifying the
expression, we obtain the new matrix
$$A^{(2)} = \begin{pmatrix}0.070 & -0.017\\ -0.017 & 0.015\end{pmatrix}.$$
Note that this direction is a steeper descent direction than that obtained in
the Fletcher-Reeves method after one iteration.
Step 5 The unidirectional search along s(2) finds the best point: x(3) =
(2.995, 1.991)T with a function value equal to f (x(3) ) = 0.003.
Another iteration updates the A matrix as follows:
$$A^{(3)} = \begin{pmatrix}0.030 & -0.005\\ -0.005 & 0.020\end{pmatrix},$$
and the new search direction s(4) = (0.003, −0.003)T . A unidirectional search
finds the point x(5) = (3.000, 2.000)T with a function value equal to zero. One
final iteration finds the A matrix:
$$A^{(5)} = \begin{pmatrix}0.016 & -0.009\\ -0.009 & 0.035\end{pmatrix},$$
which is identical to the inverse of the Hessian matrix at the minimum point,
given in Equation (3.14).
3.5 Summary
Optimization algorithms have been discussed to solve multivariable functions.
At first, optimality conditions have been presented. Thereafter, four direct
search algorithms have been discussed followed by five different gradient-based
search methods.
In multivariable function optimization, the first-order optimality condition
requires that all components of the gradient vector be zero. Any point
that satisfies this condition is a likely candidate for the minimum point.
The second-order optimality condition for minimization requires the Hessian
matrix to be positive-definite. Thus, a point is minimum if both the first and
the second-order optimality conditions are satisfied.
Among the direct search methods, the evolutionary optimization method
compares 2N (where N is the number of design variables) function values at
each iteration to find the current best point. This algorithm usually requires a
large number of function evaluations to find the minimum point. The simplex
search method also uses a number of points ((N +1) of them) at each iteration
to find one new point, which is subsequently used to replace the worst point
in the simplex. If the size of the simplex is large, this method has a tendency
to wander about the minimum. The Hooke-Jeeves pattern search method
REFERENCES
PROBLEMS
3-1 Locate and classify the stationary points of the following functions:
(i) $f(x_1, x_2) = x_1^2 + 2x_2^2 - 4x_1 - 2x_1x_2$.
(ii) $f(x_1, x_2) = 10(x_2 - x_1^2)^2 + (1 - x_1)^2$.
(iii) $f(x_1, x_2, x_3) = (x_1 + 3x_2 + x_3)^2 + 4(x_1 - x_3 - 2)^2$.
3-2 Find the stationary points of the function:
(i) Which of these points are local minima, which are local maxima, and
which are neither?
(ii) How many minima exist along the line joining (0, 1)T and (1, 2)T ?
Which are these points? Explain.
3-3 Consider the unconstrained function:
(i) s = (2, 1)T from the point (−5, 5)T up to the point (5, 0)T .
(ii) s = (−1, −1)T from the point (5, 5)T up to the point (−5, −5)T .
3-4 Using three iterations of the golden section search method, estimate the
minimum point along the line joining the points (−3, −4)T and (3, 2)T .
Restrict the search between the above two points.
3-5 In trying to find the maximum of the following function, the point
x(t) = (3, 2, −1)T is encountered:
Determine whether the search direction s(t) = (−1, 1, −2)T would be able to
find better solutions locally from x(t) .
3-6 Given the function:
$f(\mathbf{x}) = 10 - x_1 + x_1x_2 + x_2^2$,
an initial point x(0) = (2, 4)T , and a direction vector d(0) = (−1, −1)T , such
that any point along d(0) can be written as x(α) = x(0) + αd(0) . Using three
iterations of the interval halving method estimate the value of α in a one-
dimensional search along d(0) for which f (α) = 15. Assume αmin = 0 and
αmax = 3. Do not solve the quadratic equation to find the value of α.
3-7 In solving the following problem:
using x1 = (−1, 1)T and x2 = (1, −1)T and a search direction d = (1, −1)T , we
would like to use the conjugate direction method using the parallel subspace
property.
(i) Find the direction s which is C-conjugate to d. Show that s is C-
conjugate to d.
(ii) Continue to find the minimum solution of the above function. Verify
this solution by finding the minimum using the first and second-order
optimality conditions.
3-8 In order to minimize the unconstrained objective function
a search direction (cos θ, sin θ)T needs to be used at the point (1, 2)T . What
is the range of θ for which the resulting search direction is descent? What is
the steepest descent direction?
132 Optimization for Engineering Design: Algorithms and Examples
3-9 For the function
$f(x_1, x_2) = 10 - x_1 + x_1x_2 + x_2^2$,
use x(1) = (0, 2)T , x(2) = (0, 0)T and x(3) = (1, 1)T as the initial simplex of
three points. Complete two iterations of Nelder and Mead’s simplex search
algorithm to find the new simplex. Assume β = 0.5 and γ = 2.
3-10 Consider the following function for minimization
and a search direction s(1) = (1, −1, −1)T . Using two points (1, 0, 1)T and
(0, 1, 0)T , find a new search direction s(2) conjugate to s(1) using parallel
subspace property. Show that the search direction s(2) is conjugate to s(1)
with respect to the above function.
3-11 Find whether the given direction s at the point x is descent for the
respective functions:
(i) For $f(x_1, x_2) = 2x_1^2 + x_2^2 - 2x_1x_2 + 4$,
+ 10(x_1 − x_4)^4.
Perform two iterations of the following algorithms from the point x(0) = (2, −1, 0, 1)T .
(i) Hooke-Jeeves method with ∆ = (1, 1, 1, 1)T .
(ii) Cauchy’s method.
(iii) Fletcher-Reeves method.
3-14 What are the differences between Cauchy’s and Newton’s search
methods? Determine for what values of x1 and x2 , Newton’s search is
guaranteed to be successful for the following unconstrained minimization
problem:
$f(x_1, x_2) = x_1^3 - 4x_1x_2 + x_2^2$.
3-17 Prove that two consecutive search directions obtained in the steepest
descent search algorithm are mutually orthogonal to each other.
3-18 Solve the following equations by formulating suitable optimization
problems:
(i) $2x + y = 5$, $3x - 2y = 2$.
(ii) $x^2 - 5xy + y^3 = 2$, $x + 3y = 6$.
(iii) $z\,e^x - x^2 = 10y$, $x^2 z = 0.5$, $x + z = 1$.
Use the computer program listed in the following to obtain the solutions with
two decimal places of accuracy.
3-19 Starting from the point (1, 1)T , perform two iterations of DFP method
to find a stationary point of the following function:
$f(x_1, x_2) = 10 - x_1 + x_1x_2 + x_2^2$.
COMPUTER PROGRAM
dimension x0(10),xstar(10),gr0(10),s0(10),x1(10),
- xd(10)
write(*,*) 'enter the dimension of design vector'
read(*,*) n
write(*,*) 'enter initial vector'
read(*,*) (x0(i),i=1,n)
write(*,*) 'enter accuracy in steepest descent'
read(*,*) eps
write(*,*) 'enter accuracy in golden section'
read(*,*) epss
write(*,*) 'enter 1 for intermediate results'
read(*,*) iprint
nfun = 0
call steepest(n,eps,epss,x0,xstar,fstar,nfun,ierr,
- gr0,s0,x1,xd,iprint)
if (ierr .ne. 1) then
write(*,1) (xstar(i),i=1,n)
1 format(/,2x,'The minimum solution is : (',
- 2(f10.4,','),')')
write(*,2) fstar
2 format(2x,'Function value: ',1pe13.5)
write(*,3) nfun
3 format(2x,'Number of function evaluations: ',i8)
else
write(*,*) 'Terminated due to above error.'
endif
stop
end
subroutine steepest(n,eps,epss,x0,xstar,fstar,
- nfun,ierr,grad0,s0,x1,xdummy,ip)
c.....steepest descent method
c.....n : dimension of design vector
c.....eps : accuracy in steepest-descent search
c.....epss : accuracy in golden section search
c.....x0 : initial design vector
c.....xstar : final design solution (output)
c.....fstar : final objective function value (output)
c.....nfun : number of function evaluations required
c.....ierr : error code, 1 for error
c.....rest all are dummy variables of size n
implicit real*8 (a-h,o-z)
dimension x0(n),xstar(n),grad0(n),s0(n),x1(n),
- xdummy(n)
maxiter = 10000
k = 0
15 continue
go to 2
endif
end
subroutine golden(n,x,s,a,b,eps,xstar,nfun,ierr,
- xdummy)
c.....golden section search algorithm
c.....finds xstar such that f(x+xstar*s) is minimum
c.....x : solution vector
c.....s : direction vector
c.....a,b : lower and upper limits
c.....nfun : function evaluations
c.....ierr : error code, 1 for error
c.....xdummy : dummy variable of size n
implicit real*8 (a-h,o-z)
real*8 lw,x(n),s(n),xdummy(n)
c.....step 1 of the algorithm
xstar = a
ierr=0
maxfun = 10000
aw=0.0
bw=1.0
lw=1.0
k=1
c.....golden number
gold=(sqrt(5.0)-1.0)/2.0
w1prev = gold
w2prev = 1.0-gold
c.....initial function evaluations
call mapfun(n,x,s,a,b,w1prev,fw1,nfun,xdummy)
call mapfun(n,x,s,a,b,w2prev,fw2,nfun,xdummy)
ic=0
c.....step 2 of the algorithm
10 w1 = w1prev
w2 = w2prev
c.....calculate function value for new points only
if (ic .eq. 1) then
fw2 = fw1
call mapfun(n,x,s,a,b,w1,fw1,nfun,xdummy)
else if (ic .eq. 2) then
fw1 = fw2
call mapfun(n,x,s,a,b,w2,fw2,nfun,xdummy)
else if (ic .eq. 3) then
call mapfun(n,x,s,a,b,w1,fw1,nfun,xdummy)
call mapfun(n,x,s,a,b,w2,fw2,nfun,xdummy)
endif
c.....region-elimination rule
if (fw1 .lt. fw2) then
ic = 1
aw = w2
lw = bw-aw
w1prev = aw + gold * lw
w2prev = w1
else if (fw2 .lt. fw1) then
ic = 2
bw = w1
lw=bw-aw
w1prev = w2
w2prev = bw - gold * lw
else
ic = 3
aw = w2
bw = w1
lw = bw-aw
w1prev = aw + gold * lw
w2prev = bw - gold * lw
endif
k=k+1
c.....step 3 of the algorithm
if (dabs(lw) .lt. eps) then
xstar = a + (b-a) * (aw+bw)/2
return
else if (nfun .gt. maxfun) then
write(*,3) maxfun,a+aw*(b-a),a+bw*(b-a)
3 format('Golden section did not converge in',i6,
- ' function evaluations'/,'Interval (',
- 1pe12.5,',',1pe12.5,')')
ierr = 1
return
endif
go to 10
end
subroutine bphase(n,x,s,a,b,nfun,xdummy)
c.....bounding phase method
c.....arguments are explained in subroutine golden
implicit real*8 (a-h,o-z)
dimension x(n),s(n),xdummy(n)
c.....step 1 of the algorithm
c.....initial guess, change if you like
w0 = 0.0
delta = 1.0
1 call mapfun(n,x,s,0d0,1d0,w0-delta,fn,nfun,xdummy)
call mapfun(n,x,s,0d0,1d0,w0,f0,nfun,xdummy)
call mapfun(n,x,s,0d0,1d0,w0+delta,fp,nfun,xdummy)
c.....step 2 of the algorithm
if (fn .ge. f0) then
if (f0 .ge. fp) then
delta = 1 * delta
else
a = w0 - delta
b = w0 + delta
endif
elseif ((fn .le. f0) .and. (f0 .le. fp)) then
delta = -1 * delta
else
delta = delta / 2.0
go to 1
endif
k=0
wn = w0 - delta
c.....step 3 of the algorithm
3 w1 = w0 + (2**k) * delta
call mapfun(n,x,s,0d0,1d0,w1,f1,nfun,xdummy)
c.....step 4 of the algorithm
if (f1 .lt. f0) then
c.....bracketing isn't over, reset wn to w0 and w0 to w1
k = k+1
wn = w0
fn = f0
w0 = w1
f0 = f1
go to 3
else
c.....bracketing is complete, so quit
a = wn
b = w1
endif
if (b .lt. a) then
temp = a
a = b
b = temp
endif
return
end
subroutine fderiv(n,x,grad,f,nfun,xd)
c.....derivative calculation at point x
implicit real*8 (a-h,o-z)
c.....calculates the first derivative of the function
dimension x(n),grad(n),xd(n)
do 10 i = 1,n
xd(i) = x(i)
10 continue
call funct(n,xd,f,nfun)
c.....set delta_x
do 12 i = 1,n
if (xd(i) .lt. 0.01) then
dx = 0.01
else
dx = 0.01 * xd(i)
endif
xd(i) = xd(i) + dx
c...... compute two function evaluations
call funct(n,xd,fp,nfun)
xd(i) = xd(i) - 2 * dx
call funct(n,xd,fn,nfun)
c...... then compute the gradient
grad(i) = (fp-fn)/(2.0*dx)
xd(i)=x(i)
12 continue
return
end
subroutine mapfun(n,x,s,a,b,w,f,nfun,xd)
c.....calculates a point and function value in mapped
c.....w units away from a point x in s direction
implicit real*8 (a-h,o-z)
dimension x(n),s(n),xd(n)
xw = a + w * (b-a)
do 10 i = 1,n
xd(i) = x(i) + xw * s(i)
10 continue
call funct(n,xd,f,nfun)
return
end
subroutine funct(n,x,f,nfun)
c.....calculates the function value at x
implicit real*8 (a-h,o-z)
dimension x(n)
nfun = nfun + 1
f=(x(1)*x(1)+x(2)-11.0)**2+(x(1)+x(2)*x(2)-7.0)**2
return
end
subroutine unitvec(n,x,sum)
c.....computes a unit vector
implicit real*8 (a-h,o-z)
dimension x(n)
sum = 0.0
do 1 i = 1,n
sum = sum + x(i)*x(i)
1 continue
sum = dsqrt(sum)
if (sum .gt. 1.0e-06) then
do 2 i = 1,n
x(i) = x(i)/sum
2 continue
endif
return
end
Simulation Run
The above code is run on a PC-386 using Microsoft FORTRAN compiler for
minimizing Himmelblau’s function starting from x(0) = (0, 0)T . The solution
is achieved with three decimal places of accuracy. First, the input to the code
is given. Thereafter, the output from the code is presented.
-------------------------------------
Iteration: 0
Solution vector: ( 0.0000E+00, 0.0000E+00,)
Function value: 1.7000E+02 Function Eval. : 5
-------------------------------------
Iteration: 1
Solution vector: ( 1.7880E+00, 2.8098E+00,)
Constrained Optimization
Algorithms
$$\nabla f(\mathbf{x}) - \sum_{j=1}^{J} u_j\,\nabla g_j(\mathbf{x}) - \sum_{k=1}^{K} v_k\,\nabla h_k(\mathbf{x}) = 0, \tag{4.2}$$
$$g_j(\mathbf{x}) \ge 0, \qquad j = 1, 2, \ldots, J; \tag{4.3}$$
$$h_k(\mathbf{x}) = 0, \qquad k = 1, 2, \ldots, K; \tag{4.4}$$
$$u_j\,g_j(\mathbf{x}) = 0, \qquad j = 1, 2, \ldots, J; \tag{4.5}$$
$$u_j \ge 0, \qquad j = 1, 2, \ldots, J. \tag{4.6}$$
In order to verify whether a point is a K-T point, all the above conditions
are expressed in terms of u and v vectors. If there exists at least one set of u
and v vectors, which satisfy all K-T conditions, the point is said to be a K-T
point.
EXERCISE 4.1.1
Let us take an exercise problem to illustrate the Kuhn-Tucker points. We
consider the following constrained Himmelblau function:
Minimize $f(\mathbf{x}) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$
subject to
$g_1(\mathbf{x}) = 26 - (x_1 - 5)^2 - x_2^2 \ge 0,$
$g_2(\mathbf{x}) = 20 - 4x_1 - x_2 \ge 0,$
$x_1, x_2 \ge 0.$
Notice that the objective function is the same as that used in exercise problems
in Chapter 3. However, not every point in the search space is feasible.
The feasible points are those that satisfy the above two constraints and
variable bounds. Let us also choose four points x(1) = (1, 5)T , x(2) = (0, 0)T ,
x(3) = (3, 2)T , and x(4) = (3.396, 0)T to investigate whether each point is a
K-T point. The feasible search space and these four points are shown on a
contour plot of the objective function in Figure 4.1. The region on the other
side of the hatched portion of a constraint line is feasible. The combination
of two constraints and variable bounds makes the interior region feasible, as
depicted in the figure. We would like to find out whether each of these points
is a likely candidate for the minimum of the above NLP problem.
Figure 4.1 The feasible search space and four points x(1) , x(2) , x(3) , and x(4) .
Table 4.1 Gradient and Constraint Values at Four Different Points for the Constrained
Himmelblau Function. [The gradients for constraints g2 , g3 , and g4
are the same for all points: ∇g2 (x(t) ) = (−4, −1)T , ∇g3 (x(t) ) = (1, 0)T , and
∇g4 (x(t) ) = (0, 1)T .]

t   x(t)           ∇f (x(t) )      ∇g1 (x(t) )    g1 (x(t) )   g2 (x(t) )   g3 (x(t) )   g4 (x(t) )
1   (1, 5)T        (18, 370)T      (8, −10)T      −15.000      11.000       1.000        5.000
2   (0, 0)T        (−14, −22)T     (10, 0)T       1.000        20.000       0.000        0.000
3   (3, 2)T        (0, 0)T         (4, −4)T       18.000       6.000        3.000        2.000
4   (3.396, 0)T    (0, 1)T         (3.21, 0)T     23.427       6.416        3.396        0.000
The K-T conditions for the first point x(1) = (1, 5)T are
18 − 8u1 + 4u2 − u3 = 0, 370 + 10u1 + u2 − u4 = 0,
−15.000 ≥ 0, 11.000 ≥ 0, 1.000 ≥ 0, 5.000 ≥ 0,
(−15.000)u1 = 0, (11.000)u2 = 0, (1.000)u3 = 0, (5.000)u4 = 0,
u1 , u2 , u3 , u4 ≥ 0.
The condition g1 (x(1) ) ≥ 0 is violated (−15.000 < 0) irrespective of the
u-vector; the point x(1) is infeasible and hence cannot be a K-T point.
For the second point x(2) = (0, 0)T , we obtain the following conditions:
−14 − 10u1 + 4u2 − u3 = 0, −22 + u2 − u4 = 0,
1.000 ≥ 0, 20.000 ≥ 0, 0 = 0, 0 = 0,
(1.000)u1 = 0, (20.000)u2 = 0, (0)u3 = 0, (0)u4 = 0,
u1 , u2 , u3 , u4 ≥ 0.
The third and fourth conditions force u1 = u2 = 0, and the first equation then
demands u3 = −14, which violates the nonnegativity of u3 . Thus, the point
x(2) is not a K-T point. For the third point x(3) = (3, 2)T , the derivative
∇f (x(3) ) is zero and no constraint is active. The vector u∗ = (0, 0, 0, 0)T
satisfies all the above conditions. Thus, the point
x(3) is a K-T point (Figure 4.1). As mentioned earlier, K-T points are likely
candidates for minimal points. To conclude, we may say that the optimality
of a point requires satisfaction of more conditions, a point we shall discuss
later.
The K-T conditions obtained for the point x(4) = (3.396, 0)T are
−3.21u1 + 4u2 − u3 = 0, 1 + u2 − u4 = 0,
23.427 > 0, 6.416 > 0, 3.396 > 0, 0 = 0,
(23.427)(u1 ) = 0, (6.416)u2 = 0, (3.396)u3 = 0, (0)u4 = 0,
u1 , u2 , u3 , u4 ≥ 0.
The solution to the above conditions is the vector u∗ = (0, 0, 0, 1)T . Thus,
the point x(4) is also a K-T point. It is clear from the figure that the point
x(3) is the minimum point, but the point x(4) is not a minimum point. Thus,
we may conclude from the above exercise problem that a K-T point may or
may not be a minimum point. But if a point is not a K-T point (like the point
x(1) or x(2) ), then it cannot be an optimum point. In order to say more about
the optimality of points, the following two theorems are useful.
Kuhn-Tucker necessity theorem
Consider the NLP problem described above. Let f , g, and h be differentiable
functions and x∗ be a feasible solution to NLP. Let I = {j|gj (x∗ ) = 0} denote
the set of active inequality constraints. Furthermore, let ∇gj (x∗ ) for j ∈ I and
∇hk (x∗ ) for k = 1, 2, . . . , K be linearly independent (known as the constraint
qualification). If x∗ is an optimal solution to NLP, there exists a (u∗ , v ∗ )
such that (x∗ , u∗ , v ∗ ) satisfies Kuhn-Tucker conditions.
If a feasible point satisfies the constraint qualification condition, the K-T
necessity theorem can be used to prove that the point is not optimal. However,
if the constraint qualification is not satisfied at any point, the point may or
may not be an optimal point. For a non-boundary yet feasible point, the
constraint qualification condition depends on equality constraints only. In
the absence of equality constraints, all feasible, non-boundary points meet
the constraint qualification condition. The above theorem can only be used
to conclude whether a point is not an optimum point. We use the above
theorem to investigate the non-optimality of four points described before.
The first point is simply not a feasible point; thus the point cannot be
an optimum point. At the second point, there are two active constraints:
g3 (x(2) ) = 0 and g4 (x(2) ) = 0. Since their derivatives (1, 0)T and (0, 1)T are
linearly independent, they meet the constraint qualification. But since the
point is not a K-T point (shown earlier), it cannot be an optimal point. The
third point is a feasible as well as a non-boundary point. Since the point is
a K-T point, we cannot conclude anything about it—the point could be or
could not be a minimum point. The fourth point makes the constraint g4 (x)
active. Thus, the constraint qualification is met. Again, the point is found to
be a K-T point; thus we cannot conclude whether the point is a minimum or
not.
The optimality of a Kuhn-Tucker point can be checked using the
sufficiency theorem described below. However, the theorem is only applicable
to a particular class of problems having a convex1 objective function and
concave2 constraints.
The Hessian matrix of the Himmelblau function at x(0) = (0, 0)T is
calculated in Step 4 of the first iteration in Exercise 3.4.3. The leading
principal determinants are −42.114 and 1092.4. Since the first of them is
negative, the Hessian matrix at x(0) is neither positive-definite nor positive-
semidefinite. Thus, the Himmelblau function is not a convex function, as can
also be seen from the contour plot. As a result, the sufficiency theorem cannot
be applied to this function. We mention here that the Himmelblau function is
chosen in this exercise problem simply because most algorithms presented in
this chapter are applied on this function. In order to maintain the continuity
of our discussion, we present the sufficiency theorem and illustrate the usage
of the theorem on a different problem with a convex objective function.
1 A function f (x) is defined as a convex function if for any two points x(1) and
x(2) in the search space and for 0 ≤ λ ≤ 1,
$$f\big(\lambda\mathbf{x}^{(1)} + (1-\lambda)\mathbf{x}^{(2)}\big) \le \lambda f(\mathbf{x}^{(1)}) + (1-\lambda) f(\mathbf{x}^{(2)}).$$
The convexity of a function is tested by checking the Hessian matrix of the function.
If the Hessian matrix is positive-definite or positive-semidefinite for all values of x
in the search space, the function is a convex function (Strang, 1980).
2 A function f (x) is defined as a concave function if the function −f (x) is a convex
function.
Kuhn-Tucker sufficiency theorem
Let the objective function be convex, the inequality constraints gj (x) be all
concave functions for j = 1, 2, . . . , J and equality constraints hk (x) for
k = 1, 2, . . . , K be linear. If there exists a solution (x∗ , u∗ , v ∗ ) that satisfies
the K-T conditions, then x∗ is an optimal solution to the NLP problem.
EXERCISE 4.1.2
To illustrate the use of the above sufficiency theorem, we consider a convex
objective function as follows:
Minimize $f(x_1, x_2) = (x_1 - 3)^2 + (x_2 - 2)^2$.
We construct an NLP problem with the above objective function and the same
constraints and variable bounds as in Exercise 4.1.1. The objective function
has the following Hessian matrix:
$$\nabla^2 f(\mathbf{x}) = \begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix}.$$
The leading principal determinants are |2| and |∇2f (x)| or 2 and 4,
respectively. Since both these values are positive, the Hessian matrix is
positive-definite and the function f (x1 , x2 ) is a convex function. Let us
consider the point x(3) = (3, 2)T . The point x(3) is a K-T point with a u-
vector: u∗ = (0, 0, 0, 0)T . The first constraint g1 (x) is a concave function
because the matrix
$$-\nabla^2 g_1(\mathbf{x}) = \begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix}$$
is positive-definite for all x.
$$L(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i\,g_i(\mathbf{x}), \tag{4.9}$$
where λi is the dual variable corresponding to the i-th constraint. For the
dual problem (D), components of λ-vector are treated as variables. For an
inequality constraint, λi is restricted to be nonnegative. The function θ(λ),
defined above, is known as the Lagrangian dual function. Let us denote the
solution to dual problem (D) as λD (dual solution) and the objective of the
dual problem as θD . The x-vector corresponding to the dual solution (xD ) need
not be feasible to the primal problem, nor need it always be identical to the primal
solution xP . The duality gap (ρ) is defined as the difference between the
objective values of the primal and dual solutions, or ρ = (f P − θD ).
It is important to note that for any x feasible to (P) and any λ feasible to
(D), that is λi ≥ 0, the weak duality theorem states the following (Bazaraa
et al., 2004; Bector et al., 2005; Rockafellar, 1996):
Consider the following problem:
Minimize $f(x_1, x_2) = (x_1 - 2)^2 + (x_2 - 2)^2$
subject to $g(x_1, x_2) = x_1 + x_2 - 2 \le 0$.
Figure 4.2 shows the feasible region for the objective function values. Clearly,
the optimum is at (x1 , x2 )T = (1, 1)T with a function value equal to 2. We
find the Lagrange multiplier for the inequality constraint by writing the KKT
conditions at the optimum point:
$$\left.\begin{pmatrix}2(x_1 - 2)\\ 2(x_2 - 2)\end{pmatrix}\right|_{(1,1)^T} + \lambda\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix},$$
or, $-2 + \lambda = 0$, or, $\lambda = 2$.
The corresponding dual problem is to maximize $\theta(\lambda) = \min_{\mathbf{x}} L(\mathbf{x}, \lambda)$ subject to
λ ≥ 0. (4.13)
Using the first and second-order optimality conditions on the minimization
problem reveals the optimum x1 = x2 = 2 − λ/2. At this point, θ(λ) =
2λ − λ2 /2. Maximizing this function reveals λ∗ = 2 and θ(λ∗ ) = 2, which
is identical to the optimal objective value at the primal solution. Figure 4.3
shows the dual function and its maximum point. Interestingly, every feasible
x has a higher function value than every feasible dual solution λ. Thus, both
weak and strong duality conditions are satisfied for this problem.
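Explicitly, substituting x1 = x2 = 2 − λ/2 into the Lagrangian gives the dual function and its maximizer:
$$\theta(\lambda) = 2\Big(\frac{\lambda}{2}\Big)^2 + \lambda(2 - \lambda) = 2\lambda - \frac{\lambda^2}{2}, \qquad \frac{d\theta}{d\lambda} = 2 - \lambda = 0 \;\Rightarrow\; \lambda^* = 2, \quad \theta(\lambda^*) = 2.$$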
With this background on the optimality conditions, we shall now present
a number of optimization algorithms which attempt to find an optimum point
iteratively. We begin with the direct search methods.
Infinite barrier penalty
$$\Omega = R \sum_{j \in J} \big|g_j(\mathbf{x})\big|.$$
This penalty term is used for handling inequality constraints. Here, the
term R is a large number (usually $10^{20}$) and J denotes the set
of violated constraints at the current point. Thus, a penalty proportionate to
the constraint violation is added to the objective function. In this term, only
one sequence with a large value of the penalty parameter is used. Since only
infeasible points are penalized, this is also an exterior penalty term.
Log penalty
Ω = −R ln [g(x)].
This penalty term is also used for inequality constraints. For infeasible points,
g(x) < 0. Thus, this penalty term cannot assign penalty to infeasible points.
For feasible points, more penalty is assigned to points close to the constraint
boundary or points with very small g(x). Since only feasible points are
penalized, this is an interior penalty term. In order to use this term, special
care needs to be taken to handle infeasible points. Using this term, the first
sequence is started with a large value of R. Thereafter, the penalty parameter
R is gradually reduced to a small value.
Inverse penalty
$$\Omega = R\left[\frac{1}{g(\mathbf{x})}\right].$$
Like the log penalty term, this term is also suitable for inequality constraints.
This term penalizes only feasible points—the penalty is more for boundary
points. This is also an interior penalty term and the penalty parameter is
assigned a large value in the first sequence. In subsequent sequences, the
parameter R is reduced gradually.
EXERCISE 4.3.1
Consider the constrained Himmelblau function:
Minimize $f(x_1, x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$
subject to
$(x_1 - 5)^2 + x_2^2 - 26 \ge 0, \qquad x_1, x_2 \ge 0.$
The inclusion of the constraint changes the unconstrained optimum point.
The feasible region and the optimum point of this NLP is shown in Figure 4.4.
The figure shows that the original optimum point (3, 2)T is now an infeasible
point. The new optimum is a point on the constraint line that touches a
contour line at that point.
Step 1 We use the bracket-operator penalty term to solve this problem.
The bracket operator penalty term is an exterior penalty term. We choose
an infeasible point x(0) = (0, 0)T as the initial point. We also choose a small
value for the penalty parameter: R(0) = 0.1. We choose two convergence
parameters ϵ1 = ϵ2 = 10−5 .
Step 2 The next task is to form the penalty function:

P(x, R(0)) = (x1² + x2 − 11)² + (x1 + x2² − 7)² + 0.1 ⟨(x1 − 5)² + x2² − 26⟩².

Figure 4.4 The feasible search space and the true minimum of the constrained
problem in Exercise 4.3.1.

Step 3 In this step, we use the steepest descent method to solve the above
problem (the FORTRAN code presented at the end of Chapter 3 is used).
We begin the algorithm with an initial
solution x(0) = (0, 0)T having f (x(0) ) = 170.0. At this point, the constraint
violation is −1.0 and the penalized function value P (x(0) , R(0) ) = 170.100.
Intermediate points obtained by the steepest descent algorithm are tabulated
in Table 4.3.1, and some of these points are shown in Figure 4.5. After 150
function evaluations, the solution x∗ = (2.628, 2.475)T having a function value
equal to f(x*) = 5.709 is obtained. At this point, the constraint violation is
equal to −14.248 and the penalized function value is 25.996, which is
smaller than that at the initial point. Even though the constraint violation at
this point is greater than that at the initial point, the steepest descent method
has minimized the penalized function P (x, R(0) ) from 170.100 to 25.996. We
set x(1) = (2.628, 2.475)T and proceed to the next step.
Step 5 At this step, we update the penalty parameter R(1) = 10 × 0.1 = 1.0
and move to Step 2. This is the end of the first sequence. It is important
here to note that with a different initial point, we could have also converged
to the same point. But simulation runs with certain initial points may have
taken a longer time to converge than with other points. However, solutions
in subsequent sequences will be identical for all simulations.
Figure 4.5 A simulation of the steepest descent method on the penalized function
with R = 0.1. The feasible region is marked. The hashes used to mark
the feasible region are different from those in most other figures in this
book.
Step 3 At this step, we once again use the steepest descent method to solve
the above problem from the starting point (2.628, 2.475)T . Table 4.3.1 shows
intermediate points of the simulation run. The minimum of the function is
found after 340 function evaluations and is x(2) = (1.011, 2.939)T . At this
point, the constraint violation is equal to −1.450, which suggests that the
point is still an infeasible point. The penalized function and the minimum of
the function are both shown in Figure 4.6. The progress of the previous
sequence is also shown using dashed lines.

Figure 4.6 Intermediate points using the steepest descent method for the
penalized function with R = 1.0 (solid lines). The hashes used to
mark the feasible region are different from those in most other figures
in this book.

Observe that this penalized
function is distorted with respect to the original Himmelblau function. This
distortion is necessary to shift the minimum point of the current function
closer to the true constrained minimum point. Also notice that the penalized
function is undistorted in the feasible region.
Step 4 Comparing the penalized function values, we observe that
P (x(2) , 1.0) = 58.664 and P (x(1) , 0.1) = 25.996. Since they are very different
from each other, we continue with Step 5.
Step 5 The new value of the penalty parameter is R(2) = 10.0. We
increment the iteration counter t = 2 and go to Step 2.
In the next sequence, the penalized function is formed with R(2) = 10.0.
The penalized function and the corresponding solution is shown in Figure 4.7.
This time the steepest descent algorithm starts with an initial solution x(2) .
The minimum point of the sequence is found to be x(3) = (0.844, 2.934)T
with a constraint violation equal to −0.119. Figure 4.7 shows the extent
of distortion of the original objective function. Compare the contour levels
Figure 4.7 Intermediate points obtained using the steepest descent method for the
penalized function with R = 10.0 (solid lines near the true optimum).
Notice the distortion in the function. The hashes used to mark the
feasible region are different from those in most other figures in this
book.
shown at the top right corner of Figures 4.5 and 4.7. With R = 10.0, the effect
of the objective function f (x) is almost insignificant compared to that of the
constraint violation in the infeasible search region. Thus, the contour lines are
almost parallel to the constraint line. Fortunately in this problem, the increase
in the penalty parameter R only makes the penalty function steeper in the
infeasible search region. In problems with a sufficiently nonlinear objective
function and with multiple constraints, a large value of the penalty parameter
may create one or more artificial local optima in the search space, thereby
making it difficult for the unconstrained search to obtain the correct solution.
The advantage of using the unconstrained search method sequentially is that
the unconstrained search is always started from the best point found in the
previous sequence. Thus, despite the presence of many local optima in the
search space, the search at every sequence is initiated from a point near the
correct optimum point. This makes it easier for the unconstrained search to
find the correct solution.
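The sequential procedure of Exercise 4.3.1 can be sketched in a few lines. Here scipy's general-purpose Nelder-Mead minimizer stands in for the steepest descent code of Chapter 3, so the intermediate points may differ from those tabulated above, although the sequence of penalized problems is the same:

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
g = lambda x: (x[0] - 5)**2 + x[1]**2 - 26     # g(x) >= 0 required
bracket = lambda a: min(a, 0.0)                # the bracket operator <a>

def penalized(x, R):
    # Bracket-operator (exterior) penalty term.
    return f(x) + R * bracket(g(x))**2

x, R, P_prev = np.array([0.0, 0.0]), 0.1, None
for t in range(10):                            # one pass per sequence
    res = minimize(lambda y: penalized(y, R), x, method='Nelder-Mead')
    x, P_curr = res.x, res.fun
    # Step 4: compare penalized values of two successive sequences.
    if P_prev is not None and abs(P_curr - P_prev) < 1e-5:
        break
    P_prev, R = P_curr, 10 * R                 # Step 5: raise R tenfold
print(x, f(x), g(x))

The printed point should approach the constrained minimum found above, with the constraint violation shrinking in every sequence.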
After another sequence (iteration) of this algorithm, the obtained solution
is
x(4) = (0.836, 2.940)T
with a constraint violation of only −0.012. This point is very close to the
true constrained optimum solution. A few more iterations of the penalty
function method may be performed to get a solution with the desired accuracy.
A convergence check based on a small difference in the penalized function
values of two successive sequences (as used in Step 4) may be employed to
terminate the algorithm, but a few more sequences are usually necessary to
achieve the desired accuracy.

When a problem contains many constraints, it is a common practice to
normalize the constraints so that each contributes comparably to the penalty
term. A constraint gj(x) ≥ 0 can be normalized as

gj(x)/gmax ≥ 0,

where gmax is the maximum value of the constraint gj(x) in the search space.
Often, engineering design problems contain constraints restricting a resource
or capacity bj in the form g′j(x) ≤ bj. Such a constraint can be normalized
as follows:

1 − g′j(x)/bj ≥ 0.
The problem with the penalty function method is that the penalized function
becomes distorted at later sequences and that the unconstrained methods
may face difficulty in optimizing those distorted functions. There exist a
number of techniques to alleviate this problem. One method is to use a fixed
penalty parameter R with a multiplier corresponding to each constraint. The
constraint violation is increased by the multiplier value before calculating the
penalty term. Thereafter, an equivalent term is subtracted from the penalty
term. This method works in successive sequences, each time updating the
multipliers in a prescribed manner. The penalty function is modified as
follows:
P(x, σ(t), τ(t)) = f(x) + R Σ_{j=1}^{J} [⟨gj(x) + σj(t)⟩² − (σj(t))²]
                 + R Σ_{k=1}^{K} [(hk(x) + τk(t))² − (τk(t))²].
By differentiating the term P(x, σ(t), τ(t)), it can be shown that the final
solution x(T) of the above procedure is a K-T point (Reklaitis et al., 1983). The
above formulation does not distort the original objective function but shifts it
towards the constrained optimum point. Thus, the complexity of solving the
penalized function remains the same as that of the original objective function.
Another advantage of this method is that the final values of the multipliers
can be used to compute the corresponding Lagrange multipliers, which are
also known as shadow prices, using the following two equations:
uj = −2Rσj(T), (4.15)
vk = −2Rτk(T). (4.16)
The importance of Lagrange multipliers in the context of constrained
optimization will be understood better, when we discuss sensitivity analysis
in the next section.
The method of multipliers (MOM) is similar to the penalty function
method. The initial values of the multipliers (σj(0) and τk(0)) are usually
set to zero. The penalty parameter R is kept constant in all sequences. At
every sequence, the unconstrained function is minimized and a new point is
found. The multipliers are updated using the constraint value at the new
point and a new unconstrained function is formed. This process continues
until a convergence criterion is met.
Algorithm
Step 1 Choose a penalty parameter R and termination parameters ϵ1 and
ϵ2. Choose an initial solution x(0). Set multipliers σj(0) = τk(0) = 0 and the
iteration counter t = 0.
Step 2 Next, form the penalized function:

P(x, σ(t), τ(t)) = f(x) + R Σ_{j=1}^{J} [⟨gj(x) + σj(t)⟩² − (σj(t))²]
                 + R Σ_{k=1}^{K} [(hk(x) + τk(t))² − (τk(t))²].
EXERCISE 4.3.2
Consider the following problem:

Minimize f(x) = (x1² + x2 − 11)² + (x1 + x2² − 7)²

subject to

g1(x) = (x1 − 5)² + x2² − 26 ≥ 0,
x1, x2 ≥ 0.
Step 1 We choose R = 0.1 and an initial point x(0) = (0, 0)T. We set σ1(0) = 0.
We also choose convergence factors ϵ1 = ϵ2 = 10−5. We set an iteration
counter t = 0. Once again, we simplify our calculations by not considering
the variable bounds as inequality constraints.

Step 2 The penalized function is formed as

P(x, σ1(0)) = (x1² + x2 − 11)² + (x1 + x2² − 7)² + 0.1 [⟨g1(x) + σ1(0)⟩² − (σ1(0))²].
Recall that the bracket operator ⟨α⟩ takes the value α if α is negative, and zero otherwise.
Since this function and the initial point considered here are the same as in
the first iteration of Exercise 4.3.1, Steps 3 and 4 are not repeated here. We
simply reproduce the solution x(1) = (2.628, 2.475)T (Table 4.3.2) and move
to Step 5.
Step 5 The multiplier is updated as

σ1(1) = ⟨−14.248⟩ + 0 = −14.248.

We set t = 1 and proceed to Step 2. This completes one sequence of the MOM
algorithm.
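A compact sketch of the MOM sequence for this exercise is given below. The multiplier update σ ← ⟨g(x) + σ⟩ is inferred from the numbers shown above and from the multiplier values tabulated in the next section; scipy's Nelder-Mead minimizer again replaces the steepest descent code:

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
g = lambda x: (x[0] - 5)**2 + x[1]**2 - 26
bracket = lambda a: min(a, 0.0)

R = 0.1                                     # kept constant in all sequences
def P(x, sigma):
    return f(x) + R * (bracket(g(x) + sigma)**2 - sigma**2)

x, sigma = np.array([0.0, 0.0]), 0.0
for t in range(30):
    x = minimize(lambda y: P(y, sigma), x, method='Nelder-Mead').x
    sigma_new = bracket(g(x) + sigma)       # inferred multiplier update
    if abs(sigma_new - sigma) < 1e-5:       # g(x) has (nearly) vanished
        break
    sigma = sigma_new
u1 = -2 * R * sigma                         # Lagrange multiplier, Eq. (4.15)
print(x, f(x), g(x), u1)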
Step 3 Starting from the point x(1) = (2.628, 2.475)T , we use the
unconstrained steepest descent search method to solve the above function.
We obtain the solution x(2) = (1.948, 2.846)T . The intermediate points of the
steepest descent method are shown in Table 4.3.2.
At this point, the objective function value is f (x(2) ) = 28.292 and the
constraint violation is g1 (x(2) ) = −8.586, an improvement from the first
iteration. Figure 4.8 shows the contour of the penalized function and the
progress of the steepest descent method. It is clear that the function is less
distorted as compared to that in the penalty function method. In fact, this
function can be viewed as a shift of the original penalized function (Figure 4.5)
towards the current minimum point. Note that the function does not change
in the feasible region.
Figure 4.8 Intermediate points obtained using the steepest descent method for
the minimization of P(x, σ1(1)). The hashes used to mark the feasible
region are different from those in most other figures in this book.
Figure 4.9 Intermediate points obtained at the third iteration of the MOM
algorithm. The penalized function is not distorted, but translated
towards the constrained optimum point. The hashes used to mark the
feasible region are different from those in most other figures in this book.
Once the optimal Lagrange multipliers are found, the sensitivity analysis
can be performed using the following procedure. The NLP problem given in
Equation (4.1) can be rewritten by separating the constant term from the rest
of the expression as follows:
Minimize f(x)

subject to

g′j(x) ≥ αj, j = 1, 2, . . . , J; (4.18)
h′k(x) = βk, k = 1, 2, . . . , K,

where αj and βk are the right-side parameters. At the optimum, the sensitivities
of the optimal objective function value f* with respect to these parameters are
given by the Lagrange multipliers:

∂f*/∂αj = uj, j = 1, 2, . . . , J; (4.19)
∂f*/∂βk = vk, k = 1, 2, . . . , K. (4.20)
The net change in the optimal objective function value is then obtained as
follows:
∆f* = Σ_{j=1}^{J} (∂f*/∂αj) ∆αj + Σ_{k=1}^{K} (∂f*/∂βk) ∆βk
    = Σ_{j=1}^{J} uj ∆αj + Σ_{k=1}^{K} vk ∆βk. (4.21)
The above analysis is valid for small changes in the right-side parameters and
can be used to estimate the change in the optimal objective function value
due to such changes without performing another optimization run.
We illustrate the sensitivity analysis procedure with the help of the
following exercise problem.
EXERCISE 4.4.1
We consider the geometric optimization problem:

Minimize f(x) = (x1 − 3)² + (x2 − 2)²

subject to

4 − x1² − x2² ≥ 0,
x1 + x2 − 2 ≥ 0.
Figure 4.10 The bounded region and the point (3, 2)T . The optimum point and
the minimum distance of the bounded region from the point are also
shown. The plot also shows how the minimum distance changes with
the change in one of the constraints.
u2 = −2Rσ2(T) = −2(0.1)(0) = 0,
t    x(t)                 σ1(t)     σ2(t)
0    (5.000, 5.000)T      0         0
1    (2.076, 1.384)T      −2.226    0
2    (1.844, 1.229)T      −3.135    0
3    (1.751, 1.167)T      −3.565    0
4    (1.708, 1.139)T      −3.781    0
5    (1.687, 1.125)T      −3.893    0
6    (1.676, 1.117)T      −3.950    0
7    (1.670, 1.113)T      −3.977    0
8    (1.668, 1.112)T      −3.996    0
9    (1.668, 1.110)T      −4.004    0
10   (1.665, 1.110)T      −4.010    0
which are also close to the true Lagrange multipliers found using the Kuhn-
Tucker conditions. With these values of Lagrange multipliers, let us now
investigate the effect of each constraint on the optimal objective function value
by using the sensitivity analysis procedure and by referring to the bounded
region shown in Figure 4.10. Since u1 is greater than u2, the effect of the
first constraint on the optimal objective function value is greater. This aspect
can also be visualized in Figure 4.10. Comparing this NLP problem with the
problem stated in Equation (4.18), we observe that α1 = −4 and α2 = 2.
A small change in the value of α1 changes the outcome of the optimum
solution. On the other hand, a small change in the value of α2 does not
change the outcome of the optimum solution, because u2 = 0. Thus, the first
constraint is more crucial than the second one. Let us now compute the net
change in the optimal objective function value for small changes in α1 and
α2 values. If α1 is now changed to −3 and α2 is unchanged, the changes
are ∆α1 = −3 − (−4) = 1 and ∆α2 = 0. Thus, the net change in the optimal
function value according to Equation (4.21) is

∆f* = u1∆α1 + u2∆α2 = 0.802(1) + 0(0) = 0.802.
Therefore, the optimal function value increases, which means that we now
have a larger distance from the point (3, 2). A change of the constraint from
4 − x21 − x22 ≥ 0 to 3 − x21 − x22 ≥ 0 (shown with a dashed line in the figure)
takes the bounded region away from the point (3, 2) (as shown by the dashed
curve line in the figure), thereby increasing the minimum distance of the
Constrained Optimization Algorithms 173
new bounded region from the fixed point (3, 2)T . In fact, the new optimum
function value is 3.510 which is 0.931 units more than the original optimal
function value. This value is closer to the estimate ∆f ∗ obtained using the
sensitivity analysis.
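As a quick numerical illustration of Equations (4.15) and (4.21), the following lines reproduce the estimate computed above from the converged multiplier values in the table:

R = 0.1
sigma_T = (-4.010, 0.0)              # converged multipliers from the table
u1, u2 = (-2 * R * s for s in sigma_T)   # Equation (4.15)
delta_alpha = (1.0, 0.0)             # alpha1: -4 -> -3, alpha2 unchanged
delta_f = u1 * delta_alpha[0] + u2 * delta_alpha[1]   # Equation (4.21)
print(u1, u2, delta_f)               # u1 = 0.802, estimated change 0.802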
Minimize Σ_{k=1}^{K} abs[hk(x(t), x̂)],  x̂i(L) ≤ x̂i ≤ x̂i(U).
The nomenclature used in the above description may look intimidating, but
this algorithm is one of the simplest algorithms we have discussed in this
chapter. We take an exercise problem to illustrate this method.
EXERCISE 4.5.1
Consider the constrained Himmelblau function:

Minimize f(x) = (x1² + x2 − 11)² + (x1 + x2² − 7)²

subject to
h(x) = 26 − (x1 − 5)2 − x22 = 0,
0 ≤ x1 , x2 ≤ 5.
In the explicit method, the expression for the dependent variable in terms
of the independent variable is directly substituted to the objective function.
In the above problem, we may write the expression from the constraint as
follows:

x2 = √(26 − (x1 − 5)²).
We also observe that when x2 is eliminated, the variable bounds on the
independent variable x1 need to be changed to 0 ≤ x1 ≤ 4. This is because
any value of 4 < x1 ≤ 5 will produce a value of x2 > 5, which is not acceptable.
Thus, the unconstrained minimization problem becomes

Minimize f(x1) = (x1² + √(26 − (x1 − 5)²) − 11)² + (x1 + (26 − (x1 − 5)²) − 7)².
This function is plotted in Figure 4.11. Since the above problem is a single-
variable minimization problem, we use the golden section search method
described in Chapter 2 and obtain the minimum point in the interval
(0, 4): x1* = 0.829. The corresponding solution of the exercise problem is
x∗ = (0.829, 2.933)T with a function value equal to 60.373. The variable
substitution method adopted here may not be possible in general. In those
cases, an implicit method needs to be used.
Figure 4.11 The function f (x1 ) vs. x1 . The plot shows that the function is
unimodal in the interval 0 ≤ x1 ≤ 4.
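The explicit variable elimination just performed is easy to reproduce. In the sketch below, scipy's bounded scalar minimizer stands in for the golden section search of Chapter 2:

import numpy as np
from scipy.optimize import minimize_scalar

def f_reduced(x1):
    x2 = np.sqrt(26.0 - (x1 - 5.0)**2)    # x2 eliminated explicitly
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

# Single-variable search on the reduced interval 0 <= x1 <= 4.
res = minimize_scalar(f_reduced, bounds=(0.0, 4.0), method='bounded')
x1 = res.x
x2 = np.sqrt(26.0 - (x1 - 5.0)**2)
print(x1, x2, res.fun)   # about (0.829, 2.933), f about 60.373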
Minimize (x21 + x2 (x1 ) − 11)2 + (x1 + (x2 (x1 ))2 − 7)2 , (4.22)
where 0 ≤ x1 ≤ 4.
The above objective function is a function of x1 only. The variable x2 is
written as a function of x1 . Let us use the Newton-Raphson method (for more
than one variable, a multivariable optimization technique may be used), which
requires calculation of both first and second-order derivatives. We compute
the derivatives numerically. The expression x2(x1) is computed by solving the
following minimization problem (root-finding problem) for a given value of
x1 = x1(t):

Minimize abs[26 − (x1(t) − 5)² − x2²],  0 ≤ x2 ≤ 5. (4.23)
This minimization solves the equation 26 − (x1(t) − 5)² − x2² = 0 for a fixed
value of x1(t). For example, at the initial point x(0) = (4, 5)T, the value of
the dependent variable x̂ = x2 is found by solving the above problem for
x1(0) = 4. One iteration of the Newton-Raphson method then updates the
independent variable:

x1(1) = x1(0) − f′(x1(0))/f″(x1(0)),

where f(x1) = (x1² + x2(0) − 11)² + (x1 + (x2(0))² − 7)². Computing the above
equation results in x1(1) = 2.80. The resulting point is (2.80, 5.00)T. As shown
in Figure 4.12, this point is not a feasible point. Thus, in order to make
the point feasible, we keep the same value of the independent variable x1 , but
alter the dependent variable x2 so that the constraint is satisfied. This involves
another optimization procedure (problem stated in (4.23)). The corresponding
value for x2 found using the golden section search on the problem given
in Equation (4.23) for x1(1) = 2.80 is x2(1) = 4.60. The next iteration of the
Newton-Raphson method results in the point (2.09, 4.60)T . After adjusting
this solution for constraint satisfaction, we find the point (2.09, 4.19)T .
Another iteration of Newton-Raphson search and constraint satisfaction
using the golden section search yields the point x(3) = (1.91, 4.06)T . This
process continues until the minimum point is found. Figure 4.12 shows
Figure 4.12 The constraint h(x) and the progress of the implicit variable
elimination method shown on a contour plot of the Himmelblau
function.
how the implicit variable elimination method begins from the initial point
x(0) = (4, 5)T and moves towards the true minimum point.
Thus, the implicit variable elimination method works by using two
optimization problems—an unconstrained optimization problem followed by
a root-finding problem for constraint satisfaction. Besides using the two
optimization problems successively as illustrated here, the unconstrained
optimization problem (given in Equation (4.22)) can be solved for x (x1 here),
and whenever an objective function value is required to be evaluated, the
corresponding x̂(x) (x2 here) can be found using a root-finding technique.
The latter technique is discussed in detail in Section 4.9.
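The nested structure described here—an outer single-variable search with an inner step recovering x2 for every trial x1—can be sketched as follows. A bracketing root-finder replaces the minimization formulation of Equation (4.23), and a bounded scalar search replaces the Newton-Raphson iterations:

import numpy as np
from scipy.optimize import brentq, minimize_scalar

def x2_of(x1):
    # Inner problem: solve h = 26 - (x1-5)^2 - x2^2 = 0 for x2.
    h = lambda x2: 26.0 - (x1 - 5.0)**2 - x2**2
    return brentq(h, 0.0, 6.0)       # sign change guaranteed on [0, 6]

def f_outer(x1):
    x2 = x2_of(x1)                   # dependent variable found implicitly
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

res = minimize_scalar(f_outer, bounds=(0.0, 4.0), method='bounded')
print(res.x, x2_of(res.x), res.fun)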
In the complex search method, the worst point of the current simplex is
reflected about the centroid of the remaining points to create a new point.
Depending on the feasibility and function value of the new point,
the point is further modified or accepted. If the new point falls outside the
variable boundaries, the point is modified to fall on the violated boundary. If
the new point is infeasible, the point is retracted towards the feasible points.
The worst point in the simplex is replaced by this new feasible point and the
algorithm continues for the next iteration.
Algorithm
Step 1 Assume a bound in x (x(L) , x(U ) ), a reflection parameter α and
termination parameters ϵ, δ.
Step 2 Generate an initial set of P (usually 2N) feasible points. For each
point

(a) Sample N times to determine the point xi(p) in the given bound.
(b) If x(p) is infeasible, calculate x̄ (centroid) of the current set of points and
reset

x(p) = x(p) + ½(x̄ − x(p))

until x(p) is feasible; else if x(p) is feasible, continue with (a) until P points
are created.
(c) Evaluate f(x(p)) for p = 0, 1, 2, . . . , (P − 1).
Step 3 Carry out the reflection step:

(a) Select xR such that f(xR) = max f(x(p)) = Fmax.
(b) Calculate the centroid x̄ (of points except xR) and the new point

xm = x̄ + α(x̄ − xR).

(c) If xm is feasible and f(xm) ≥ Fmax, retract half the distance to the
centroid x̄. Continue until f(xm) < Fmax;
Else if xm is feasible and f(xm) < Fmax, go to Step 5.
Else if xm is infeasible, go to Step 4.
Step 4 Check for feasibility of the solution.

(a) For all i, reset violated variable bounds:
    If xim < xi(L), set xim = xi(L).
    If xim > xi(U), set xim = xi(U).
(b) If the resulting xm is infeasible, retract half the distance to the centroid.
Continue until xm is feasible. Go to Step 3(c).
Step 5 If Σp (f(x(p)) − f̄)² ≤ ϵ² and Σp ∥x(p) − x̄∥² ≤ δ²,
Terminate;
Else set k = k + 1 and go to Step 3(a).
For the successful working of this algorithm, it is necessary that the
feasible region be convex; otherwise, the retraction towards the centroid in
Steps 3(c) and 4(b) may result in infeasible points. Often, the points of the
simplex may end up close to a constraint boundary, in which case the
algorithm tends to slow down. In problems where the feasible
search space is narrow or the optimum lies at the constraint boundary, the
algorithm is not very efficient. On the other hand, for problems with convex
feasible search space and with the optimum well inside the search space, this
algorithm is efficient.
EXERCISE 4.5.2
Consider the constrained Himmelblau function in the range 0 ≤ x1, x2 ≤ 5:

Minimize f(x) = (x1² + x2 − 11)² + (x1 + x2² − 7)²

subject to

g1(x) = 26 − (x1 − 5)² − x2² ≥ 0,
g2(x) = 20 − 4x1 − x2 ≥ 0,
x1, x2 ≥ 0.
Step 1 The minimum and maximum bounds on both variables are taken to
be 0 and 5, respectively. The reflection parameter α = 1.3 and convergence
parameters ϵ = δ = 10−3 are chosen.
Step 2 Since N = 2, we need to create a set of four feasible points. Since the
points are required to be random in the chosen range, we create two random
numbers to choose a point:
xi = 0 + ri (5 − 0).
(a) We create two random numbers r1 = 0.10 and r2 = 0.15. Thus, the first
point is x(1) = (0.50, 0.75)T . To investigate whether this point is feasible, we
compute the constraints: g1 (x(1) ) = 5.19 > 0 and g2 (x(1) ) = 17.25 > 0. Thus,
the point x(1) is feasible. The feasibility of the point can also be observed
from Figure 4.13.
(a) We create two new random numbers to find the second point: r1 = 0.4
and r2 = 0.7. Thus, the point is x(2) = (2.00, 3.50)T . Calculating the
constraint values, we observe that the point is feasible (Figure 4.13).
(a) We create another set of random numbers: r1 = 0.9 and r2 = 0.8 to
create the third point. The point is x(3) = (4.50, 4.00)T. For this point, we
observe that g1(x(3)) = 9.75 > 0, but g2(x(3)) = −2.00 < 0, which is violated.
Figure 4.13 shows that this point falls on the right side of the constraint g2(x).
Thus the point is infeasible. In order to create a feasible point, we push the
point towards the centroid of the first two points. This approach is likely to
provide a feasible point.
(b) The centroid of the previous two points is

x̄1,2 = (1.250, 2.125)T,

and the reset point is

x(3) = x(3) + ½(x̄1,2 − x(3)) = (2.875, 3.062)T.

Geometrically, this point is obtained by pushing the point x(3) halfway towards
x̄1,2. Calculating the constraint values, we observe that the new point x(3) is
feasible. Thus, we accept the third point as x(3) = (2.875, 3.062)T.
(a) Another set of random numbers (r1 = 0.6, r2 = 0.1) creates a feasible
point x(4) = (3.0, 0.5)T .
(c) The function values for these four points are f (x(1) ) = 135.25,
f (x(2) ) = 64.812, f (x(3) ) = 27.711, and f (x(4) ) = 16.312.
Step 3 After creating four feasible points, we now find out the worst point
and reflect it over the centroid of rest of the points.
(a) The worst point is x(1) , as can be seen from the above function values
(or from the figure). Thus, we set xR = x(1) and Fmax = 135.250.
(b) The centroid of the second, the third, and the fourth points is
x = (2.625, 2.354)T and is marked as x2,3,4 in the figure. The new point is
computed by reflection as follows:
xm = (2.625, 2.354)T + 1.3 [(2.625, 2.354)T − (0.500, 0.750)T] = (5.387, 4.439)T.
This point falls outside the feasible region as shown in the figure.
By calculating the constraint values we observe that g1 (xm ) = 6.14 and
g2 (xm ) = −5.99, which is violated. Thus, the point xm is infeasible. Since
the point is infeasible, before we do any modification, we check whether the
obtained point violates a variable bound or not.

(a) Since x1m = 5.387 exceeds the upper bound of 5, we reset x1m = 5, so
that xm = (5.000, 4.439)T. This point is still infeasible.

(b) We retract half the distance to the centroid. This can be achieved by
taking the average of the points x̄ (x̄2,3,4 in the figure) and xm:

xm = ½ [(2.625, 2.354)T + (5.000, 4.439)T] = (3.813, 3.396)T.

At this point, the constraints are not violated; thus the point is feasible. We
accept this point and move to Step 3(c). This point is marked x(5) in the
figure.
Step 3(c) The function value at this point is f (x(5) ) = 117.874, which is
smaller than Fmax = 135.250. We continue with Step 5.
Step 5 We now form the new simplex and check for termination.

(a) The new simplex consists of the points x(2), x(3), x(4), and x(5).
(b) The sum of the squared differences in function values from f̄ is found
to be 8685.9, which is large compared to ϵ². The quantity Σp ∥x(p) − x̄∥² is
also large compared to δ 2 . This completes one iteration of the complex search
algorithm. Figure 4.13 shows the progress of one iteration of this algorithm.
In order to proceed to the next iteration, we need to move to Step 3(a), but
we do not show the results here for brevity.
After another iteration of the complex search algorithm, the new
point is found to be x(6) = (1.081, 0.998)T with a function value equal to
f(x(6)) = 102.264. The new simplex consists of the points x(2), x(3), x(4), and
x(6). This process continues until the simplex size becomes smaller than the
chosen termination parameters. The parameters P and α are two important
parameters for the successful working of the complex search method. The
larger the value of P (number of points in the simplex), the better the
convergence characteristics. But a large number of points in the simplex
requires more function evaluations in each iteration. A compromise of P ≈ 2N
is usually followed (Box, 1965). As seen in the above exercise, a large value of
α may create points outside the feasible range. Although the infeasible points
can be brought back to the feasible region by successive retraction, frequent
retraction may cause the points to lie along the constraint boundaries. This
reduces the search power considerably. Thus, the algorithm works well when
the simplex points lie well inside the feasible region.
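A bare-bones rendering of the complex search method on this exercise problem is sketched below. The retraction loops are capped at a fixed number of steps (a safeguard not part of the algorithm above), and the random points will differ from the hand computations:

import numpy as np

f  = lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
gs = [lambda x: 26 - (x[0] - 5)**2 - x[1]**2,
      lambda x: 20 - 4*x[0] - x[1]]
feasible = lambda x: all(g(x) >= 0 for g in gs)

lo, hi, alpha = np.zeros(2), 5.0*np.ones(2), 1.3
rng = np.random.default_rng(1)

# Step 2: generate P = 2N = 4 feasible points; an infeasible sample is
# pushed halfway towards the centroid of the points accepted so far.
pts = []
while len(pts) < 4:
    x = lo + rng.random(2)*(hi - lo)
    for _ in range(100):
        if feasible(x):
            pts.append(x)
            break
        if not pts:
            break                      # no accepted point yet: resample
        x = x + 0.5*(np.mean(pts, axis=0) - x)

for it in range(500):                  # Steps 3-5, repeated
    fs = [f(p) for p in pts]
    w = int(np.argmax(fs))             # worst point x^R with f = Fmax
    xbar = np.mean([p for i, p in enumerate(pts) if i != w], axis=0)
    xm = np.clip(xbar + alpha*(xbar - pts[w]), lo, hi)  # reflect, Step 4(a)
    for _ in range(50):                # Steps 3(c)/4(b): retract
        if feasible(xm) and f(xm) < fs[w]:
            break
        xm = 0.5*(xm + xbar)
    pts[w] = xm
    if sum((v - np.mean(fs))**2 for v in fs) < 1e-10:
        break

best = min(pts, key=f)
print(best, f(best))                   # approaches the optimum near (3, 2)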
Algorithm
Step 1 Given an initial feasible point x0, an initial range z0 such that the
minimum, x*, lies in (x0 − ½z0, x0 + ½z0). Choose a parameter 0 < ϵ < 1.
For each of Q blocks, initially set q = 1 and p = 1.
EXERCISE 4.5.3
Consider again the constrained Himmelblau function:

Minimize f(x) = (x1² + x2 − 11)² + (x1 + x2² − 7)²

subject to
g1 (x) = 26 − (x1 − 5)2 − x22 ≥ 0,
g2 (x) = 20 − 4x1 − x2 ≥ 0,
x1 , x2 ≥ 0.
Step 1 Let us assume that the initial point is x0 = (3, 3)T , the initial
interval z 0 = (6, 6)T . Other parameters are P = 3 (in practice, a large
value is suggested), Q = 10, and ϵ = 0.25 (in practice, a much smaller value
is suggested). We set counters p = q = 1.
Step 2 We create two random numbers between −0.5 to 0.5: r1 = 0.018
and r2 = −0.260. The corresponding point is
x(1) = (3 + (0.018)6, 3 + (−0.260)6)T = (3.108, 1.440)T.

Step 3 The point x(1) is feasible and has a function value f(x(1)) = 3.317.

Step 2 Two new random numbers create the second point: x(2) = (2.754, 1.836)T.
Step 3 The point x(2) is also feasible and the function value is f (x(2) ) =
3.299.
Step 2 To create the third point, we create two new random numbers: r1 =
−0.464 and r2 = 0.149. The corresponding point is x(3) = (0.216, 3.894)T .
Step 3 This point is not feasible, since the first constraint is violated
(g1(x(3)) = −12.050), as can also be seen in Figure 4.14. Thus, we do not
accept this point; rather, we create a new point.
Step 2 We choose two new random numbers r1 = −0.344 and r2 = −0.405.
The new point is
x(3) = (0.936, 0.570)T .
Step 3 Since the point x(3) is feasible, we compute the function value
f(x(3)) = 124.21. At this stage, p = 3 = P; we find the best of all P
feasible points. We observe that the best point is x(2) with function value
3.299. Thus, x̄1 = (2.754, 1.836)T. We reset the counter p = 1.
Step 4 We reduce the interval for the next iteration: z1 = (1 − ϵ)z0 = 0.75z0 = (4.5, 4.5)T.
Step 5 Since q = 1 ̸> Q = 10, we continue with Step 2. This completes one
iteration of the Luus and Jaakola algorithm. Three points and the reduced
search space are shown in Figure 4.14.
Figure 4.14 Two iterations of random search method. The search region after five
more iterations is also shown.
Step 2 Creating two more random numbers in the interval (−0.5, 0.5)
(r1 = −0.284 and r2 = 0.311), we obtain the first point for the second
iteration:
x(1) = (2.754 + (−0.284)4.5, 1.836 + (0.311)4.5)T = (1.476, 3.236)T.
Step 3 The point x(1) is feasible and has a function value f (x(1) ) =
56.030. We create two other points for comparison. The points
are x(2) = (3.857, 0.647)T and x(3) = (4.070, 2.490)T with function values
f (x(2) ) = 27.884 and f (x(3) ) = 75.575, respectively. Comparing these points
and the previously found best point x1 , we find that the current best point is
x2 = x1 = (2.754, 1.836)T .
Step 4 We reduce the interval z 2 = 0.75z 1 = (3.375, 3.375)T .
Thus, the search interval is centred around the point x2 with a reduced
size. This process continues until the counter q is equal to the specified Q.
The progress of the algorithm is shown in Figure 4.14. If the current best
solution remains the best point for another five iterations, the search region
(the small box in the vicinity of the true minimum) is reduced to a small size as
shown in the figure. Since the search region is small, the probability of finding
either the true minimum or a point close to the true optimum is high. The
reduction of the search space from one iteration to another must be marginal;
otherwise the algorithm may exclude the true optimum and converge to a
wrong solution.
In case the feasible search space is very narrow, this method may be
inefficient, creating many infeasible points. Instead of using this method in
actually finding the optimum point, this method is usually used to find a
feasible initial guess for other more sophisticated constrained optimization
methods.
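The Luus and Jaakola method is short enough to sketch in full. The cap of 1000 resamplings per block is an added safeguard for the narrow-feasible-region case just mentioned:

import numpy as np

f = lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
feasible = lambda x: (26 - (x[0] - 5)**2 - x[1]**2 >= 0
                      and 20 - 4*x[0] - x[1] >= 0 and np.all(x >= 0))

rng = np.random.default_rng(0)
x_best = np.array([3.0, 3.0])          # initial point x^0
z = np.array([6.0, 6.0])               # initial interval z^0
P, Q, eps = 3, 10, 0.25

for q in range(Q):
    pts = []
    for _ in range(1000):              # cap on resampling (safeguard)
        x = x_best + (rng.random(2) - 0.5)*z
        if feasible(x):
            pts.append(x)
            if len(pts) == P:
                break
    if pts:
        cand = min(pts, key=f)
        if f(cand) < f(x_best):
            x_best = cand              # recentre the box on the best point
    z = (1 - eps)*z                    # shrink the search interval
print(x_best, f(x_best))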
Such a linear approximation is valid close to the chosen point x(0). Two linearized search
techniques are discussed here. One uses an LP algorithm successively to solve
LP problems created by linearizing the objective function and all constraints
at intermediate points. This method is known as the Frank-Wolfe algorithm.
The other method also uses a successive LP algorithm but each time an
LP problem is created by linearizing the objective function and only a few
governing constraints. This method is known as the cutting plane method.
These methods use gradient information of the constraints and the objective
function.
In linearized search methods and in a number of other NLP methods, LP
methodology is extensively used mainly due to its simplicity and availability
of computer codes for implementing LP techniques. The LP methods are
discussed in detail in texts on operations research (Taha, 1989). Without
deviating from the main focus of this book, we present a brief discussion on
one linear programming technique in the Appendix.
Step 4 Find α(t) that minimizes f (x(t) + α(y (t) − x(t) )) in the range α ∈
(0, 1).
Step 5 Calculate x(t+1) = x(t) + α(t) (y (t) − x(t) ).
Step 6 If ∥x(t+1) − x(t)∥ < δ∥x(t)∥ and ∥f(x(t+1)) − f(x(t))∥ < ϵ∥f(x(t))∥,
Terminate;
Else t = t + 1 and go to Step 2.
As given in the above algorithm, the simulation can terminate either from
Step 2 or from Step 6. But in most cases, the simulation terminates from
Step 6. The above algorithm works similarly to the steepest descent search
method. The convergence of this algorithm to a Kuhn-Tucker point is proved
by Zangwill (1969). Since linear approximations of the objective function and
the constraints are used, in highly nonlinear problems the search process may
have to be restricted to a small neighbourhood of the point x(t) .
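The Frank-Wolfe iteration—linearize, solve an LP, then perform a unidirectional search—can be sketched as follows for the exercise below. scipy's linprog replaces the graphical LP solutions, and a bounded scalar search replaces the golden section search:

import numpy as np
from scipy.optimize import linprog, minimize_scalar

f = lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
gs = [lambda x: 26 - (x[0] - 5)**2 - x[1]**2,   # g1(x) >= 0
      lambda x: 20 - 4*x[0] - x[1]]             # g2(x) >= 0

def grad(fun, x, h=1e-6):
    # Central-difference gradient, as in Chapter 3.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2*h)
    return g

x = np.array([1.0, 1.0])       # the restarted initial point used below
for t in range(100):
    c = grad(f, x)
    if np.linalg.norm(c) < 1e-3:
        break
    # Each g_j is linearized as g_j(x) + grad.(y - x) >= 0, rewritten
    # in linprog's form  -grad . y <= g_j(x) - grad . x.
    A_ub = [-grad(g, x) for g in gs]
    b_ub = [g(x) - grad(g, x) @ x for g in gs]
    lp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    if not lp.success:
        break
    y = lp.x
    res = minimize_scalar(lambda al: f(x + al*(y - x)),
                          bounds=(0.0, 1.0), method='bounded')
    x_new = x + res.x*(y - x)
    if np.linalg.norm(x_new - x) < 1e-3:
        x = x_new
        break
    x = x_new
print(x, f(x))   # should approach the constrained optimum near (3, 2)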
EXERCISE 4.6.1
Consider again the Himmelblau function:

Minimize f(x) = (x1² + x2 − 11)² + (x1 + x2² − 7)²

subject to
g1 (x) = 26 − (x1 − 5)2 − x22 ≥ 0,
g2 (x) = 20 − 4x1 − x2 ≥ 0,
x1 , x2 ≥ 0.
Step 1 We choose an initial point x(0) = (0, 0)T and convergence parameters
ϵ = δ = 10−3 . We set the iteration counter t = 0.
Step 2 The gradient at this point is ∇f (x(0) ) = (−14, −22)T . Since the
magnitude of the gradient is not close to zero, we move to Step 3.
Step 3 At this step, we first form the LP problem by linearizing the
objective function and all constraints with respect to x = (x1 , x2 )T . In the
above problem, the linearized objective function is calculated as follows:
f(x) = f(x(0)) + ∇f(x(0))(x − x(0))
     = 170 + (−14, −22)(x1 − 0, x2 − 0)T
     = −14x1 − 22x2 + 170.
This way the constraints can also be linearized. The complete LP problem is
given as follows:
Minimize − 14x1 − 22x2 + 170
subject to
10x1 + 1 ≥ 0,
−4x1 − x2 + 20 ≥ 0,
x1 , x2 ≥ 0.
The objective function and the two constraints are linear functions of x1 and
x2 . Also observe that the linear constraint g2 (x) remains the same after
linearization. The simplex method of LP search can now be used to solve
the above problem. Since there are only two variables, we solve the above
LP problem graphically. Interested readers may refer to the Appendix for
the simplex technique of LP method. Figure 4.15 shows the feasible region
of the above LP problem. By plotting the contours of the objective function at
various function values, we observe that the solution to the above LP problem
is y (0) = (0, 20)T .
Figure 4.15 The linear programming problem in the first iteration of the Frank-
Wolfe algorithm. The optimum point is (0, 20)T .
Step 4 The search direction for the unidirectional search is y (0) − x(0) =
(0, 20)T . Performing the golden section search in the domain (0 ≤ α ≤ 1)
yields the minimum α(0) = 0.145.
Step 5 Thus, the new point is x(1) = (0.000, 2.898)T .
Step 6 Since points x(0) and x(1) are not close enough (with respect to ϵ),
we continue with Step 2. This completes the first iteration of the Frank-Wolfe
algorithm.
Step 2 The gradient at the new point is ∇f (x(1) ) = (2.797, −0.016)T .
Step 3 We form another LP problem by linearizing the objective function
and the two constraints at this point.
Minimize 2.797x1 − 0.016x2 + 67.55
subject to
10x1 − 3.885x2 + 1 ≥ 0,
−4x1 − x2 + 20 ≥ 0,
x1 , x2 ≥ 0.
Figure 4.16 The linear programming problem in the second iteration of the Frank-
Wolfe algorithm. The optimum solution is (0, 0.257)T .
Step 5 Thus, the new point is x(2) = (0, 2.898)T , which is identical to the
previous point x(1) .
Step 6 Since the points x(1) and x(2) are identical, the algorithm
terminates prematurely at a wrong solution. The history of the intermediate
points is shown on a contour plot of the objective function in Figure 4.17,
where starting from the point (0, 0)T the algorithm converges to the point
(0, 2.898)T on the x2 axis. This premature convergence to a wrong solution is
one of the drawbacks of the Frank-Wolfe algorithm. When this happens, the
algorithm is usually restarted from a different initial point.
Step 1 We restart the algorithm from an initial point x(0) = (1, 1)T . Other
parameters are the same as before.
Figure 4.17 Intermediate points obtained using the Frank-Wolfe algorithm. The
first attempt with an initial solution x(0) = (0, 0)T prematurely
converges to a wrong solution. A restart with an initial solution
x(0) = (1, 1)T finds the true optimum.
Minimize − 46x1 − 38x2 + 190

subject to
8x1 − 2x2 + 3 ≥ 0,
−4x1 − x2 + 20 ≥ 0,
x1 , x2 ≥ 0.
Step 6 Since the points x(0) and x(1) are not close to each other, we move
to Step 2. This is the end of one iteration of the Frank-Wolfe algorithm with
the new starting point.
Figure 4.18 The linear programming problem in the first iteration of the Frank-
Wolfe algorithm. The algorithm is started from the point (1, 1)T .
The optimum solution is (2.312, 10.750)T .
subject to
7.538x1 − 3.936x2 + 5.830 ≥ 0,
−4x1 − x2 + 20 ≥ 0,
x1 , x2 ≥ 0.
Figure 4.19 The linear programming problem in the second iteration of the Frank-
Wolfe algorithm. The optimum solution is (5, 0)T .
The cutting plane method begins with a user-defined search space. At every
iteration, some part of that search space is cut (or eliminated) by constructing
linear hyperplanes from the most violated constraint at the current point. The
objective function is minimized in the resulting search space and a new point is
found. Depending on the obtained point, a certain portion of the search space
is further eliminated. Although the algorithm is designed to work for linear
objective functions and convex feasible regions, the algorithm has been applied
successfully to nonlinear objective functions as well. Since the algorithm works
most efficiently for linear objective functions and also the obtained cutting
planes (or constraints) are always linear, a linear programming technique is
used to solve every subproblem.
We first describe the cutting plane algorithm for solving NLP problems
with linear objective functions of the following type:
Minimize Σ_{i=1}^{N} ci xi

subject to

gj(x) ≥ 0, j = 1, 2, . . . , J;
xi(L) ≤ xi ≤ xi(U), i = 1, 2, . . . , N.
The initial search space is usually chosen as

Z0 = {x : xi(L) ≤ xi ≤ xi(U), i = 1, 2, . . . , N},

such that Z0 contains the true feasible region. However, any other search
space that includes the optimum point can also be chosen. At any iteration
t, a new cutting plane (p(t) ≥ 0) is found and a new search space Zt is found
by performing an intersection of the previous search space Zt−1 with the new
cutting plane p(t).
Algorithm
Step 1 Solve the following LP problem:

Minimize Σ_{i=1}^{N} ci xi

subject to

x ∈ Z0.

Let us say that the solution is x(1). Set a counter k = 1.

Step 2 At the current point x(k), find the most violated constraint, say gm.

Step 3 Form the cutting plane from the most violated constraint:

p(k)(x) ≡ g̃m(x; x(k)) = gm(x(k)) + ∇gm(x(k))(x − x(k)) ≥ 0.
Let H(k) define the space H(k) = {x : p(k)(x) ≥ 0}. Solve the following LP
problem:

Minimize Σ_{i=1}^{N} ci xi

subject to

x ∈ Z(k−1) ∩ H(k).

Designate the solution x(k+1).
EXERCISE 4.6.2
Consider the following NLP problem:
subject to
g1 (x) = 26 − (x1 − 5)2 − x22 ≥ 0,
g2 (x) = 20 − 4x1 − x2 ≥ 0,
x1 , x2 ≥ 0.
Figure 4.20 The NLP problem used to illustrate the basic cutting plane algorithm.
The minimum point lies at x∗ = (3.763, 4.947)T with a function value
equal to 8.481.
subject to

0 ≤ x1 ≤ 6, 0 ≤ x2 ≤ 6.

The solution to this LP problem is x(1) = (6, 6)T, at which the first constraint
is the most violated. The cutting plane formed at this point is

p(1)(x) ≡ g̃1(x; x(1)) = 73 − 2x1 − 12x2 ≥ 0.
The vector ∇g1 (x(1) ) is computed numerically using the central difference
technique described in Chapter 3. With the new constraint, we now construct
the second LP problem by including a new constraint p(1) (x) ≥ 0:
subject to
73 − 2x1 − 12x2 ≥ 0,
0 ≤ x1 ≤ 6,
0 ≤ x2 ≤ 6.
Since the new constraint added to the previous problem is linear, the resulting
problem is an LP problem. The solution to this LP problem can be found
graphically. We solve this problem in Figure 4.21; the obtained solution is
designated x(2).

Figure 4.21 Four cutting planes and intermediate solutions. Note how the initial
square region is cut by various cutting planes to take the shape of
the true feasible region.
Step 3 At the point x(2), the cutting plane obtained from the second
constraint is calculated as follows:

p(2)(x) ≡ 20 − 4x1 − x2 ≥ 0.
It is interesting to note that the constraint p(2) (x) ≥ 0 is the same as the
original constraint g2 (x) ≥ 0. Since g2 (x) is linear, the linearization of this
constraint at any point will always produce the same constraint. With the
new constraint, we form another LP problem:
subject to
20 − 4x1 − x2 ≥ 0,
73 − 2x1 − 12x2 ≥ 0,
0 ≤ x1 ≤ 6,
0 ≤ x2 ≤ 6.
Each cutting plane eliminates a part of the search space lying beyond the
linearized constraint surface. Thus, after a number of iterations when the cutting planes
surround the true feasible search space closely, the solution to the LP problem
constructed by these cutting planes will be close to the true minimum point.
However, there are some disadvantages with this method. Firstly, the
method can be applied efficiently only to problems with a convex feasible
region and a linear objective function. Secondly, the method cannot be
terminated prematurely, because at any iteration the obtained solution is
usually not feasible. Thirdly, the algorithm creates a new constraint at every
iteration. Soon, the number of constraints becomes large and the speed of
the algorithm slows down.
However, as seen from Figure 4.21, some constraints may dominate other
constraints, which may then be eliminated from further consideration. In the
above problem, the constraint p(3)(x) ≥ 0 implies the constraint p(4)(x) ≥ 0.
Thus, we can eliminate the latter constraint from further consideration. This
problem can be alleviated by advocating a cut-deletion procedure mentioned
in the following algorithm.
In the cut-deletion method, at every iteration t, some constraints are
deleted based on their inactiveness and the objective function value at the
current point. For each previously created constraint, two conditions are
checked. At first, the constraints are checked for inactiveness at the current
point. Secondly, the objective function value at the current point is compared
with the penalized function value at the point where each constraint is created.
If the constraint is found to be inactive and the current function value is
greater than the penalized function value at the previously found best point,
there exists at least one other constraint that dominates this constraint at the
point. Thus, this constraint is included in the set for deletion. This procedure
requires the storage of all previously found best points corresponding to each
constraint. The cut-deletion algorithm is obtained by replacing Step 4 of the
cutting plane algorithm by two steps, given as follows:
Algorithm
Step 4(a) Determine the set D of cutting planes to be deleted. For each
i ∈ I(t) ∪ {t}, include i in D if both p(i)(x(t+1)) > 0 (the cut is inactive at the
current point) and the current objective function value is greater than the
penalized function value recorded when the cut was created.
subject to
p(3) (x) = 44.22 + 2.74x1 − 10.96x2 ≥ 0,
0 ≤ x1 , x2 ≤ 6.
At the end of Step 3, we have solved the LP problem and found the solution
x(4) = (3.756, 4.974)T . The third constraint is also created at the current step.
Thus, the set of constraints that are accumulated but not deleted before this
iteration is I (t) = {1, 2}. Since none of the constraints is deleted yet, D = ∅,
an empty set. We now proceed to Step 4 to find out the constraints to be
deleted.
Step 4(a) We consider constraints in the set I(t) ∪ {t} or {1, 2, 3}. For the
first constraint, both conditions for deletion are satisfied: p(1)(x(4)) > 0 and the
current function value is greater than the penalized function value at the point
where the cut was created. Thus, we include this constraint in the set D. We
update the deletion set D = {1}. We now check the second constraint for
deletion and find that p(2)(x(4)) ̸> 0. Thus, we do not delete the second
constraint. Similarly, we find that p(3)(x(4)) ̸> 0 and we do not delete it
either. Before we move to Step 4(b), we update the set

I(t+1) = (I(t) ∪ {t}) \ D = {2, 3}.
Note that the first constraint p(1) (x) ≥ 0 has been eliminated from further
consideration. As seen from Figure 4.21, the combination of the second
constraint (p(2) (x) ≥ 0), the third constraint (p(3) (x) ≥ 0), and the variable
bounds take care of the first constraint (p(1) (x) ≥ 0) at point x(4) . The cut-
deletion method helps in reducing the number of constraints to be carried
along, thereby reducing the effort in solving successive LP problems. Now, we
set t = 4 and move to Step 2. This completes the third iteration.
Step 2 At this step, we find the most violated constraint first. It is found
that the constraint g1 (x) is violated the most.
Step 3 Thus, we form the new cutting plane and solve the resulting LP problem:

subject to

39.89 + 2.48x1 − 9.95x2 ≥ 0,
44.22 + 2.74x1 − 10.96x2 ≥ 0,
20 − 4x1 − x2 ≥ 0,
0 ≤ x1, x2 ≤ 6.
Notice that the above problem contains the second, third, and fourth cutting
planes; the first cutting plane is eliminated. The solution to this LP problem
is x(5) = (3.763, 4.948)T with a function value f (x(5) ) = 8.462, which is now
very close to the actual minimum of the problem. We now move to Steps 4(a)
and 4(b) for deletion of constraints.
This process continues until the convergence criterion is met. With the
cut-deletion procedure, only a few constraints are used in each LP problem.
The cutting plane method described above can also be used to solve
convex, nonlinear objective functions with a simple modification. A new
variable x0 is introduced with an additional artificial constraint: x0 ≥ f (x).
Thus, the new NLP problem becomes as follows:
Minimize x0

subject to

gj(x) ≥ 0, j = 1, 2, . . . , J;
x0 − f(x) ≥ 0,
xi(L) ≤ xi ≤ xi(U), i = 1, 2, . . . , N.
EXERCISE 4.6.3
Since the Himmelblau function is not convex, we cannot use the modified
cutting plane method described above to solve the problem. To illustrate
the working of the cutting plane method on nonlinear objective functions, we
solve another problem with a convex objective function:

Minimize f(x) = (x1 − 3)² + (x2 − 2)²

subject to

26 − (x1 − 5)² − x2² ≥ 0,
20 − 4x1 − x2 ≥ 0,
x1, x2 ≥ 0.
The minimum of this NLP problem lies at the point x∗ = (3, 2)T . Since the
original function has two variables, the addition of the artificial variable x0
takes the total number of variables to three. Thus, the transformed problem
becomes as follows:
Minimize x0

subject to

g1(x) = 26 − (x1 − 5)² − x2² ≥ 0,
g2(x) = 20 − 4x1 − x2 ≥ 0,
g3(x) = x0 − (x1 − 3)² − (x2 − 2)² ≥ 0,
x1, x2 ≥ 0.

Step 1 The first LP problem is formed with the variable bounds alone:

Minimize x0

subject to

0 ≤ x1 ≤ 6, 0 ≤ x2 ≤ 6, 0 ≤ x0 ≤ 30.
Step 2 At this step, we observe that the third constraint is violated the
most. Thus, we form the next cutting plane based on the third constraint.
Step 3 The new cutting plane is p(2) (x) = x0 − 6x1 − 8x2 + 59 ≥ 0. The
new LP problem is formed by adding this constraint and the feasible region
is shown in Figure 4.22. At this point, the LP problem is solved and a new
cutting plane is found.
This way, cutting planes can be formed at various points and the resulting
feasible region narrows. Finally, the feasible region is small enough to find
the optimal solution. Cut-deletion method can be used to keep the number of
constraints limited to a reasonable size. Figure 4.22 shows a number of cutting
planes that surround the feasible region in a three-dimensional space. In the
x1 -x2 plane, the region ABCDEFA is found after three cutting planes are
formed. When more cutting planes are formed, the feasible region narrows
around the true minimum point. Although the cutting plane method can
be used to tackle nonlinear objective functions by the above method, it is
primarily used in problems with a linear objective function.
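A minimal sketch of the basic cutting plane loop on the feasible region of Exercise 4.6.2 is shown below. Since the exercise's linear objective is not reproduced on this page, an assumed stand-in objective (maximize x1 + x2) is used; it happens to attain its constrained optimum at the reported point (3.763, 4.947)T:

import numpy as np
from scipy.optimize import linprog

# Assumed stand-in objective: maximize x1 + x2, i.e. minimize c.x
# with c = (-1, -1). The original exercise's coefficients are not shown.
c = np.array([-1.0, -1.0])
gs = [lambda x: 26 - (x[0] - 5)**2 - x[1]**2,
      lambda x: 20 - 4*x[0] - x[1]]

def grad(fun, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2*h)
    return g

A_ub, b_ub = [], []                   # accumulated cutting planes
bounds = [(0, 6), (0, 6)]             # initial search space Z^0
x = linprog(c, bounds=bounds).x       # Step 1: gives (6, 6) here
for k in range(30):
    viol = [g(x) for g in gs]
    m = int(np.argmin(viol))          # Step 2: most violated constraint
    if viol[m] >= -1e-6:
        break                         # x is (nearly) feasible: stop
    gg = grad(gs[m], x)               # Step 3: cutting plane p_k(x) >= 0,
    A_ub.append(-gg)                  # rewritten as -grad.y <= g(x)-grad.x
    b_ub.append(gs[m](x) - gg @ x)
    x = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                bounds=bounds).x
print(x)                              # approaches (3.763, 4.947)

The first cut generated at (6, 6) is 2x1 + 12x2 ≤ 73, matching the plane computed by hand above.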
Figure 4.23 shows a point x(t) on a constraint surface g(x) ≥ 0. The feasible
region is shown hatched by dashed lines. Any direction d in the hatched region
is a feasible direction.
The algorithm begins with a random point. At this point, the set of
active constraints is found. If none of the constraints are active, the current
point is an intermediate point in the search space and the steepest descent
search direction is used. But if one or more search directions are active, the
current point is on at least one constraint boundary. Thus, any arbitrary
search direction (or the steepest descent direction) may not find a feasible
point. Thus, a search direction which is maximally feasible and descent is
found by solving an artificial LP problem. Once a search direction is found, a
unidirectional search is performed along that direction to find the minimum
point. This completes one iteration of Zoutendijk’s feasible direction method
(Zoutendijk, 1960).
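The direction-finding LP at the heart of this method can be set up directly. The sketch below reproduces the first iteration of the exercise that follows (the point (0, 0)T with the two variable bounds active), using scipy's linprog instead of the hand simplex computations:

import numpy as np
from scipy.optimize import linprog

# Variables (d1, d2, theta): maximize theta subject to
#   grad_f . d <= -theta,  grad_g . d >= theta for each active g,
#   -1 <= d_i <= 1.
grad_f = np.array([-14.0, -22.0])        # gradient of f at (0, 0)
active_grads = [np.array([1.0, 0.0]),    # gradient of g3 = x1
                np.array([0.0, 1.0])]    # gradient of g4 = x2

c = np.array([0.0, 0.0, -1.0])           # minimize -theta
A_ub = [np.append(grad_f, 1.0)]          # grad_f.d + theta <= 0
for gg in active_grads:
    A_ub.append(np.append(-gg, 1.0))     # theta - grad_g.d <= 0
b_ub = np.zeros(len(A_ub))
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(-1, 1), (-1, 1), (None, None)])
d, theta = res.x[:2], res.x[2]
print(d, theta)                          # expect d = (1, 1), theta = 1

The result d = (1, 1)T with θ = 1 agrees with the simplex computations carried out by hand in the tables below.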
Algorithm
Step 2 At the current point x(t), let I(t) be the set of indices of active
constraints. In other words, I(t) = {j : gj(x(t)) ≤ ϵ}, where ϵ is a small
tolerance. If I(t) is empty, use d(t) = −∇f(x(t)), normalize d(t), and go to
Step 4; otherwise, continue with Step 3.

Step 3 Solve the following LP problem:
Maximize θ

subject to

∇f(x(t))d ≤ −θ,
∇gj(x(t))d ≥ θ, j ∈ I(t),
−1 ≤ di ≤ 1, i = 1, 2, . . . , N.
EXERCISE 4.7.1

Consider the constrained Himmelblau function:

Minimize f(x) = (x1² + x2 − 11)² + (x1 + x2² − 7)²

subject to

g1(x) = 26 − (x1 − 5)² − x2² ≥ 0,
g2(x) = 20 − 4x1 − x2 ≥ 0,
x1, x2 ≥ 0.

Let us recall that the minimum point of the above problem lies at x* = (3, 2)T
with a function value equal to zero.
Step 1 Let us choose an initial feasible point x(0) = (0, 0)T and a tolerance
parameter ϵ = 10−3 . We also set the iteration counter t = 0.
Step 2 At this step, let us find the active constraints at point x(0) . It
turns out that only variable bounds are active. Calling these two inequality
constraints g3 (x) = x1 ≥ 0 and g4 (x) = x2 ≥ 0, we update the active
constraint set I (0) = {3, 4}. Since this set is not empty, we continue with
Step 3.
Maximize θ

subject to

14d1 + 22d2 ≥ θ,
d1 ≥ θ,
d2 ≥ θ,
−1 ≤ d1, d2 ≤ 1.
Firstly, the simplex method requires all variables to be nonnegative.
Substituting ti = di + 1 (so that 0 ≤ ti ≤ 2), the above problem becomes

Maximize θ

subject to

14t1 + 22t2 − θ ≥ 36,
t1 − θ ≥ 1,
t2 − θ ≥ 1,
0 ≤ t1, t2 ≤ 2.
Secondly, the simplex method can handle only equality constraints. Slack
variables are usually added or subtracted to convert inequality constraints to
equality constraints. Therefore, for each of the above constraints we add a
slack variable (y1 to y5 ). Thirdly, we observe that the problem variables and
the slack variables do not constitute an initial basic feasible solution (refer to
the Appendix for details). Thus, we introduce three more artificial variables
(y6 , y7 , and y8 ) to constitute an initial basic feasible solution for the first
phase of the dual phase LP method (see Section A.3 for details). Thus, the
underlying LP problem becomes as follows:
Maximize θ

subject to

14t1 + 22t2 − θ − y1 + y6 = 36,
t1 − θ − y2 + y7 = 1,
t2 − θ − y3 + y8 = 1, (4.26)
t1 + y4 = 2,
t2 + y5 = 2,
t1, t2, y1, y2, y3, y4, y5, y6, y7, y8 ≥ 0.
Since all artificial variables must be nonnegative, the first phase maximizes
−(y6 + y7 + y8); this objective attains the value zero exactly when
y6 = y7 = y8 = 0. The solution of the first phase is then a candidate initial
basic feasible solution for the problem presented in Equation (4.26).
The dual phase LP method is described in the Appendix. The successive
tableaus for the first phase, which obtains a feasible starting solution, are
shown in Tables 4.5 to 4.8.
Table 4.5 The First Tableau for the First Phase of the Dual Phase LP Method

                      0    0    0    0    0    0    0    0   −1   −1   −1
cB   Basic     t1    t2    θ    y1   y2   y3   y4   y5   y6   y7   y8     b   Ratio
−1   y6        14    22   −1   −1    0    0    0    0    1    0    0    36   36/22 = 1.64
−1   y7         1     0   −1    0   −1    0    0    0    0    1    0     1   1/0 = ∞
−1   y8         0     1   −1    0    0   −1    0    0    0    0    1     1   1/1 = 1 ←
 0   y4         1     0    0    0    0    0    1    0    0    0    0     2   2/0 = ∞
 0   y5         0     1    0    0    0    0    0    1    0    0    0     2   2/1 = 2
     (∆f)q     15    23   −3   −1   −1   −1    0    0    0    0    0   f = −38

(The maximum (∆f)q marks t2 as the entering variable (↑); the minimum ratio
rule marks y8 as the leaving variable (←).)
Table 4.6 The Second Tableau for the First Phase of the Dual Phase LP Method

                      0    0    0    0    0    0    0    0   −1   −1    −1
cB   Basic     t1    t2    θ    y1   y2   y3   y4   y5   y6   y7    y8     b   Ratio
−1   y6        14     0   21   −1    0   22    0    0    1    0   −22    14   14/22 = 0.64 ←
−1   y7         1     0   −1    0   −1    0    0    0    0    1     0     1   1/0 = ∞
 0   t2         0     1   −1    0    0   −1    0    0    0    0     1     1   −ve
 0   y4         1     0    0    0    0    0    1    0    0    0     0     2   2/0 = ∞
 0   y5         0     0    1    0    0    1    0    1    0    0    −1     1   1/1 = 1
     (∆f)q     15     0   20   −1   −1   22    0    0    0    0   −23   f = −15

(The entering variable is y3 (↑); the leaving variable is y6 (←).)
Note that the objective function value has improved considerably from
the previous iteration. Here, we also observe that the nonbasic variable y3
corresponds to the maximum value of the quantity (∆f )q . Thus, we choose y3
as the new basic variable. Using the minimum ratio rule, we also observe that
the current basic variable y6 must be replaced by the variable y3 (Table 4.7).
At the end of the third iteration, we observe that the nonbasic variable t1 must
replace the basic variable y7 . The objective function value at this iteration
208 Optimization for Engineering Design: Algorithms and Examples
Table 4.7 The Third Tableau for the First Phase of the Dual Phase LP Method

                         0      0      0      0    0    0    0     −1      −1   −1
cB   Basic     t1       t2     θ      y1     y2   y3   y4   y5     y6      y7   y8      b
 0   y3        0.64      0     0.95  −0.04    0    1    0    0     0.04     0   −1    0.64
−1   y7        1.00      0    −1.00   0.00   −1    0    0    0     0.00     1    0    1.00 ←
 0   t2        0.64      1    −0.04  −0.04    0    0    0    0     0.04     0    0    1.64
 0   y4        1.00      0     0.00   0.00    0    0    1    0     0.00     0    0    2.00
 0   y5       −0.64      0     0.04   0.04    0    0    0    1    −0.04     0    0    0.36
     (∆f)q     1.00      0    −1.00   0.00   −1    0    0    0    −1.00     0   −1   f = −1

Ratios: 1 (first row), 1 (second row), 2.57 (third row), 2 (fourth row), and
−ve (fifth row). The entering variable is t1 (↑); the leaving variable is y7 (←).
is f = −1. We form the next row-echelon matrix in Table 4.8. At this stage,
we observe that all artificial variables are zero and the objective function is
also equal to zero. This is the termination criterion for the first phase of the
dual phase LP method. The solution of the above iteration is t1 = 1, t2 = 1,
y1 = 0, y2 = 0, y3 = 0, y4 = 1, and y5 = 1. This solution was not obvious in
the formulation of the problem presented in Equation (4.26).
Table 4.8 The Fourth Tableau for the First Phase of the Dual Phase LP Method

                      0     0      0      0      0    0    0     −1      −1     −1
cB   Basic     t1    t2    θ      y1     y2     y3   y4   y5     y6      y7     y8     b
 0   y3         0     0    1.59  −0.04   0.64    1    0    0     0.04   −0.64   −1     0
 0   t1         1     0   −1.00   0.00  −1.00    0    0    0     0.00    1.00    0     1
 0   t2         0     1    0.59  −0.04   0.64    0    0    0     0.04   −0.64    0     1
 0   y4         0     0    1.00   0.00   1.00    0    1    0     0.00   −1.00    0     1
 0   y5         0     0   −0.59   0.04   0.04    0    0    1    −0.04    0.64    0     1
     (∆f)q      0     0    0.00   0.00   0       0    0    0    −1.00   −1.00   −1   f = 0
We begin the second phase with the above solution as the initial solution.
The objective in the second phase is to maximize the original function:
f(x) = θ. Since the artificial variables are no longer required, we drop them
from subsequent computations. The row-echelon matrix is shown in
Table 4.9. The objective function value at this iteration is f (x) = 0. Table 4.9
shows that the basic variable y3 must be replaced by the nonbasic variable θ.
The new set of basic and nonbasic variables are formed in Table 4.10. At
this iteration, the objective function value does not change, but the algorithm
Table 4.9 The First Tableau for the Second Phase of the Dual Phase LP Method

                      0     1      0      0      0    0    0
cB   Basic     t1    t2    θ      y1     y2     y3   y4   y5     b   Ratio
 0   y3         0     0    1.59  −0.04   0.64    1    0    0     0   0/1.59 = 0 ←
 0   t1         1     0   −1.00   0.00  −1.00    0    0    0     1   −ve
 0   t2         0     1    0.59  −0.04   0.64    0    0    0     1   1/0.59 = 1.69
 0   y4         0     0    1.00   0.00   1.00    0    1    0     1   1/1 = 1
 0   y5         0     0   −0.59   0.04   0.04    0    0    1     1   −ve
     (∆f)q      0     0    1.00   0.00   0.00    0    0    0   f(x) = 0

(The entering variable is θ (↑); the leaving variable is y3 (←).)
moves into a new solution. The table shows that the nonbasic variable y1
must become basic. The minimum ratio rule suggests that the basic variable
y4 must be replaced with the new basic variable y1 .
Table 4.10 The Second Tableau for the Second Phase of the Dual Phase LP
Method

                      0    1    0      0      0      0    0
cB   Basic     t1    t2    θ    y1     y2     y3     y4   y5     b   Ratio
 1   θ          0     0    1   −0.03   0.40   0.63    0    0     0   −ve
 0   t1         1     0    0   −0.03  −0.60   0.63    0    0     1   1/0 = ∞
 0   t2         0     1    0   −0.03   0.40  −0.37    0    0     1   −ve
 0   y4         0     0    0    0.03   0.60   0.63    1    0     1   1/0.03 = 35 ←
 0   y5         0     0    0    0.03  −0.40   0.37    0    1     1   1/0.03 = 35
     (∆f)q      0     0    0    0.03  −0.4   −0.63    0    0   f(x) = 0

(The entering variable is y1 (↑); the leaving variable is y4 (←).)
It is important to note that even though the ratio calculated in the right-
most column for the first row is zero, it is considered to be negative. In the
implementation of the linear programming method, care should be taken to
check the sign for both numerator and denominator. If they are of opposite
sign, that row must be excluded from consideration. We form the new row-
echelon matrix in Table 4.11. In this table, the quantity (∆f )q corresponding
to all nonbasic variables is nonpositive. Thus, we have obtained the optimum
solution and we terminate the linear programming method. Thus, the final
solution is t1 = 2, t2 = 2, θ = 1, y1 = 35, y2 = 0, y3 = 0, y4 = 0, and
y5 = 0. These values satisfy all constraints in the problem presented in
Equation (4.26). Now, we get back to Step 3 of the feasible direction search
method.
Table 4.11 The Third Tableau for the Second Phase of the Dual Phase LP Method

                      0    1    0    0    0    0    0
cB   Basic     t1    t2    θ    y1   y2   y3   y4   y5     b
 1   θ          0     0    1    0    1    0    1    0      1
 0   t1         1     0    0    0    0    0    1    0      2
 0   t2         0     1    0    0    1   −1    1    0      2
 0   y1         0     0    0    1   21  −22    1    0     35
 0   y5         0     0    0    0   −1    1   −1    1      0
     (∆f)q      0     0    0    0   −1    0   −1    0    f(x) = 1
For example, the upper limit along d(0) can be found for the first constraint
by minimizing the unidirectional function:

Minimize abs[g1(x(0) + αd(0))]. (4.28)

Since the absolute value of the argument is always considered, the above
function allows only positive values. Since we are looking for points for which
function allows only positive values. Since we are looking for points for which
the constraint has a value zero, those points correspond to the minimum value
of the above expression. Note that the problem described in Equation (4.28)
is a single-variable function. Thus, we first bracket the minimum and then
minimize the function. Using the bounding phase method from a starting
point α(0) = 5 and ∆ = 1, we obtain the bracketing interval (4, 6). Next, we
use the golden section search in that interval to obtain the minimum point with
three decimal places of accuracy: α1∗ = 5.098. The same solution can also be
obtained by solving the quadratic expression g1 (x(α)) = 0. Similarly, the limit
on the second constraint can also be calculated: α2∗ = 4.0. Other constraints
produce upper limits α3∗ = α4∗ = 0, which are not acceptable. Thus, the true
upper limit is α = 4.0.
Step 5 Once the lower and upper limit on α are found, we perform another
one-dimensional search with the given objective function to find the minimum
point along that direction. Using the golden section search, we obtain the
minimum point in the interval (0, 4): α∗ = 2.541, which corresponds to the
new point x(1) = (2.541, 2.541)T . At this point, we increment the iteration
counter and go to Step 2. This completes one iteration of the feasible direction
method. The progress of this iteration is shown in Figure 4.24.
Step 2 At the new point, we find that no constraints are active. Thus,
I (1) = ∅, which means that the point is not on any constraint boundary and
we are free to search in any direction locally. Therefore, we choose the steepest
descent direction and the search direction is set according to the negative of
the gradient of the objective function at the new point:
d(1) = −∇f (x(1) ) = (16.323, −16.339)T ,
which is computed numerically. At this point the function value is f (x(1) ) =
8.0.
Step 4 Once again, we compute the upper limit along the search direction
d(1) . Posing the root-finding problem as an optimization problem as shown
in the previous iteration, we obtain the parameter α = min [0.162, 0.149] =
0.149.
Step 5 Performing a unidirectional search along d(1) in the domain (0, 0.149),
we obtain α∗ = 0.029. The corresponding point is x(2) = (3.018, 2.064)T with
an objective function value f (x(2) ) = 0.107.
This is needed in order to ensure that all variable values take nonnegative
values.
Thereafter, the objective function can be approximated to a quadratic
function by using Taylor’s series expansion principle and the constraints can
be approximated to linear functions. The following quadratic problem can
then be formed at the current point x′(t):

    Minimize  q(x′; x′(t)) = f(x′(t)) + ∇f(x′(t))ᵀ(x′ − x′(t))
                             + (1/2)(x′ − x′(t))ᵀH(x′(t))(x′ − x′(t)),
The variable bounds are also treated as linear inequality constraints from
here on. The linearized inequality constraints and the variable bounds can
then be written together as Ax′ ≥ b, with
    A = [ ∇g1(x′(t))ᵀ ]            b = [ −g1(x′(t)) + ∇g1(x′(t))ᵀx′(t) ]
        [ ∇g2(x′(t))ᵀ ]                [ −g2(x′(t)) + ∇g2(x′(t))ᵀx′(t) ]
        [      ⋮      ]                [                ⋮              ]
        [ ∇gJ(x′(t))ᵀ ] ,  and         [ −gJ(x′(t)) + ∇gJ(x′(t))ᵀx′(t) ]
        [    −I1ᵀ     ]                [      −(x1(U) − x1(L))         ]
        [      ⋮      ]                [                ⋮              ]
        [    −Inᵀ     ]                [      −(xn(U) − xn(L))         ] ,
where Ii is a vector of size n whose i-th element is one and all other elements are zero. Note that
the matrix A is of size (J + n) × n and vector b is of size (J + n). Also, we
compute the C-matrix (of size K × n) and the d-vector (of size K) as follows
from the set of equality constraints:
    C = [ ∇h1(x′(t))ᵀ ]            d = [ −h1(x′(t)) + ∇h1(x′(t))ᵀx′(t) ]
        [ ∇h2(x′(t))ᵀ ]                [ −h2(x′(t)) + ∇h2(x′(t))ᵀx′(t) ]
        [      ⋮      ]                [                ⋮              ]
        [ ∇hK(x′(t))ᵀ ] ,  and         [ −hK(x′(t)) + ∇hK(x′(t))ᵀx′(t) ] .
Let us also consider the quadratic form of the objective function, as
follows:

    f(x′) = F + eᵀx′ + (1/2) x′ᵀHx′,                                     (4.31)

where the constant term in the objective function is
F = f(x′(t)) − ∇f(x′(t))ᵀx′(t) + (1/2) x′(t)ᵀH(x′(t))x′(t). The e-vector and
H-matrix can be computed by comparing the above equation with the quadratic
approximation of f(x′). Ignoring the constant term, the e-vector can be
written as follows:

    e = ∇f(x′(t)) − H(x′(t)) x′(t).                                      (4.32)
The matrix H is simply the Hessian matrix of the quadratic approximation
of f with respect to x′. Thus, the quadratic programming problem can be
written as follows:

    Minimize  F + eᵀx′ + (1/2) x′ᵀHx′,

    subject to
        Ax′ − b ≥ 0,
        Cx′ − d = 0,                                                     (4.33)
        x′ ≥ 0.
The KKT optimality conditions for the above optimization problem can
be written as follows:

    e + Hx′ − Aᵀµ + Cᵀω − ν = 0,                                         (4.34)
    Ax′ − b ≥ 0,                                                         (4.35)
    Cx′ − d = 0,                                                         (4.36)
    µi(Ax′ − b)i = 0,   νi(x′)i = 0,                                     (4.37)
    x′ ≥ 0,  µ ≥ 0,  ν ≥ 0,  ω free.                                     (4.38)
Here, the Lagrange multipliers µ ∈ R^(J+n), ω ∈ R^K, and ν ∈ R^n. The first
condition (Equation (4.34)) is called the equilibrium condition, the second
and third sets of conditions (Equations (4.35) and (4.36)) are constraint
satisfaction conditions, the fourth set of conditions (Equation (4.37)) are
called complementarity conditions, and the fifth set of conditions
(Equation (4.38)) imposes nonnegativity of the variables and the Lagrange
multipliers. Notice that there is no sign restriction on the ω parameters. A
solution (x′, µ, ω, ν) that satisfies all the above conditions is the solution
to the quadratic programming problem given in Equation (4.33) under some
regularity conditions. When the solution x′* to the above system of equations
is found, the objective function value f(x′*) can be computed using
Equation (4.31).
Notice also that the above KKT conditions are all linear functions of
variables (x′ , µ, ω, ν). This implies that we may attempt to find a solution to
the KKT conditions using a linear programming (LP) technique (Taha, 1989).
However, an LP problem requires a linear objective function that is usually
maximized and a set of linear equality or inequality constraints. Importantly,
all variables of the LP must also take nonnegative values. The first three sets
of conditions can be rewritten, with the help of new nonnegative variables
p, q, and r, as follows:

    Hx + Aᵀµ + Cᵀω − ν + p = −e,
    Ax + λ + q = b,
    Cx + r = d,
    (x, µ, ω, ν, λ, p, q, r) ≥ 0.
Note that the sign of the Aᵀµ term is now positive in the first set of
constraints of the above system. Both the λ and q variables need to be added
on the left-hand side of the second set of equations; of these two, however,
only the q variables carry a nonzero cost coefficient.
The usual simplex method (discussed in Section A.2 in the Appendix)
does not automatically satisfy conditions given in Equations (4.41) and
(4.42) among the variables. Thus, the simplex method needs to be changed
somewhat to satisfy these conditions. While using the simplex method, the
complementarity conditions must be honoured when entering a new variable
into the set of basic variables. Since basic variables take nonzero values,
two complementary variables must not exist together in the basic variable
set. In other words, if λk is present in the basic variable set, the variable
µk is not accepted into the basic set, even if the maximum (∆f)q consideration
selects it. Instead, the nonbasic variable having the next highest (∆f)q is
chosen, provided its complementary variable does not exist in the basic
variable set. This process continues till all the above conditions are
satisfied within the basic variables. We call this method the conditioned
simplex method.
After the solution x′* is found, the objective function must be computed
using Equation (4.31).
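The restricted-entry rule of the conditioned simplex method can be sketched
as follows. This Python fragment is illustrative only; the variable names and
data structures are assumptions, not the book's notation:

def entering_variable(delta_f, basis, complement):
    # delta_f: dict var -> (delta f)_q; basis: set of basic variables;
    # complement: dict pairing complementary variables, e.g. 'mu1' <-> 'lambda1'
    for var in sorted(delta_f, key=delta_f.get, reverse=True):
        if delta_f[var] <= 0:
            return None                      # optimum reached
        if complement.get(var) in basis:
            continue                         # would violate mu_k * lambda_k = 0
        return var
    return None

comp = {'mu1': 'lambda1', 'lambda1': 'mu1', 'nu1': 'x1', 'x1': 'nu1'}
print(entering_variable({'mu1': 5.0, 'x1': 2.0}, basis={'lambda1'}, complement=comp))
# 'mu1' is skipped because lambda1 is basic; 'x1' enters instead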
EXERCISE 4.8.1
Consider the following quadratic programming problem:

    Minimize 2x1² + x2² − 2x1x2 − 4x1 − 2x2
    subject to
        g1(x) = x1 + x2 ≤ 2,
        g2(x) = 2x2 − x1 = 0,
        x1, x2 ≥ 0.

Here, H = (4  −2; −2  2), e = (−4, −2)ᵀ, A = (1  1), b = (2), C = (−1  2),
and d = (0). The rewritten conditions for this problem become:
    (  4  −2 ) ( x1 )   ( 1 )     ( −1 )     ( ν1 )   ( p1 )     ( −4 )
    ( −2   2 ) ( x2 ) + ( 1 ) µ + (  2 ) ω − ( ν2 ) + ( p2 ) = − ( −2 ) ,

    (1  1)(x1, x2)ᵀ + λ + q = 2,

    (−1  2)(x1, x2)ᵀ + r = 0,

    µλ = 0,   ν1x1 = 0,   ν2x2 = 0.                                      (4.44)
Table 4.12 The First Tableau for the QP Problem of Exercise 4.8.1

                    0     0     0     0     0     0     0    −1    −1    −1    −1
 cB   Basic        x1    x2     µ     ω    ν1    ν2     λ    p1    p2     q     r      b    Ratio
 −1   p1            4    −2     1    −1    −1     0     0     1     0     0     0      4    −ve
 −1   p2           −2     2     1     2     0    −1     0     0     1     0     0      2    2/2 = 1
 −1   q             1     1     0     0     0     0     1     0     0     1     0      2    2/1 = 2
 −1   r            −1     2     0     0     0     0     0     0     0     0     1      0    0/2 = 0  ←
      (∆f)q         2     3     2     1    −1    −1     1     0     0     0     0           f(x) = −8
                          ↑
The basic variables are p1 , p2 , q and r. It can be observed that the variable
r will get replaced by x2 in the next tableau (Table 4.13).
Table 4.13 The Second Tableau for the QP Problem of Exercise 4.8.1

                    0     0     0     0     0     0     0    −1    −1    −1     −1
 cB   Basic        x1    x2     µ     ω    ν1    ν2     λ    p1    p2     q      r      b    Ratio
 −1   p1            3     0     1    −1    −1     0     0     1     0     0      1      4    4/3 = 1.33  ←
 −1   p2           −1     0     1     2     0    −1     0     0     1     0     −1      2    −ve
 −1   q           1.5     0     0     0     0     0     1     0     0     1   −0.5      2    2/1.5 = 1.33
  0   x2         −0.5     1     0     0     0     0     0     0     0     0    0.5      0    −ve
      (∆f)q       3.5     0     2     1    −1    −1     1     0     0     0   −1.5           f(x) = −8
                    ↑
Table 4.14 The Third Tableau for the QP Problem of Exercise 4.8.1

                    0     0      0      0      0     0     0     −1    −1    −1     −1
 cB   Basic        x1    x2      µ      ω     ν1    ν2     λ     p1    p2     q      r       b    Ratio
  0   x1            1     0   0.33  −0.33  −0.33     0     0   0.33     0     0   0.33    1.33    −ve
 −1   p2            0     0   1.33   1.67  −0.33    −1     0   0.33     1     0  −0.67    3.33    3.33/1.67 = 2
 −1   q             0     0  −0.5    0.5    0.5      0     1  −0.5      0     1  −1          0    0/0.5 = 0  ←
  0   x2            0     1   0.17  −0.17  −0.17     0     0   0.17     0     0   0.67    0.67    −ve
      (∆f)q         0     0   0.83   2.17   0.17    −1     1  −1.17     0     0  −2.67            f(x) = −3.33
                                     ↑
The solution here is still (x1 , x2 )T = (1.33, 0.67)T . Now, p2 gets replaced
by µ. Table 4.16 shows the tableau.
Algorithm
Step 1 Set an iteration counter t = 0. Choose an initial feasible point x(0)
and two termination parameters ϵ1 and ϵ2 .
Step 2 At the current point x(t), make a coordinate transformation (in
terms of x′) as shown in Equation (4.29) and then a quadratic approximation
of f(x′). Also, make linear approximations of gj(x′) and hk(x′) at the point
x′(t).
Step 3 Formulate the QP problem given in Equation (4.33) and solve it using
the conditioned simplex method described above. Label the solution y(t).
Step 4 If ∥y(t) − x′(t)∥ ≤ ϵ1 and |f(y(t)) − f(x′(t))| ≤ ϵ2, Terminate;
Else increment the counter t = t + 1 and go to Step 2.
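A schematic Python sketch of this loop follows (illustrative only, not the
book's implementation). It assumes an equality-constrained subproblem so that
the QP of Equation (4.33) can be solved through its KKT linear system; the
book instead solves the subproblem with the conditioned simplex method, which
also handles inequality constraints. The demonstration problem, minimize
x1² + 2x2² subject to x1² + x2² = 5, is an assumption chosen for brevity:

import numpy as np

def f(x):      return x[0]**2 + 2.0*x[1]**2
def grad(x):   return np.array([2.0*x[0], 4.0*x[1]])
def hess(x):   return np.array([[2.0, 0.0], [0.0, 4.0]])
def h(x):      return np.array([x[0]**2 + x[1]**2 - 5.0])
def grad_h(x): return np.array([[2.0*x[0], 2.0*x[1]]])

def sqp(x0, eps1=1e-4, eps2=1e-4, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        H = hess(x)
        e = grad(x) - H @ x                    # Equation (4.32)
        C = grad_h(x)                          # linearized equality constraint
        d = -h(x) + C @ x
        K = C.shape[0]
        kkt = np.block([[H, C.T], [C, np.zeros((K, K))]])
        y = np.linalg.solve(kkt, np.concatenate([-e, d]))[:x.size]
        if np.linalg.norm(y - x) <= eps1 and abs(f(y) - f(x)) <= eps2:
            return y                           # Step 4: terminate
        x = y                                  # else go back to Step 2
    return x

print(sqp([2.0, 4.0]))    # approaches (sqrt(5), 0), the constrained minimum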
EXERCISE 4.8.2
Consider the constrained Himmelblau function again:

    Minimize (x1² + x2 − 11)² + (x1 + x2² − 7)²
    subject to
        g1(x) = (x1 − 5)² + x2² ≤ 26,
        g2(x) = 4x1 + x2 ≤ 20,
        x1, x2 ≥ 0.
Let us recall that the minimum point of the above problem lies at x∗ = (3, 2)T
with a function value equal to zero.
Step 1 We choose an initial point x(0) = (1, 1)T and set iteration counter
t = 0. Also, we set ϵ1 = ϵ2 = 0.001 for termination.
Step 2 The above problem (with n = 2, J = 2, and K = 0) does not have
specified upper bounds on the variables. Thus, the µ and λ-vectors have
two (J = 2) elements each. Since the lower bound of both variables is also
zero, no variable transformation is needed for this problem. Thus,
x′(t) = x(t) at every t.
The quadratic approximation of f(x) at x(0) yields the following e-vector
and H-matrix:

    e = ( −28 )          H = ( −26     8 )
        ( −36 ) ,            (   8   −10 ) .
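These values can be cross-checked with a few lines of Python (not part of the
book's code), using the analytical derivatives of the Himmelblau function and
Equation (4.32):

import numpy as np

def grad_hess(x1, x2):
    u = x1**2 + x2 - 11.0
    v = x1 + x2**2 - 7.0
    g = np.array([4.0*x1*u + 2.0*v, 2.0*u + 4.0*x2*v])      # gradient of f
    H = np.array([[4.0*u + 8.0*x1**2 + 2.0, 4.0*x1 + 4.0*x2],
                  [4.0*x1 + 4.0*x2, 2.0 + 4.0*v + 8.0*x2**2]])  # Hessian of f
    return g, H

g, H = grad_hess(1.0, 1.0)
e = g - H @ np.array([1.0, 1.0])   # Equation (4.32)
print(e)   # [-28. -36.]
print(H)   # [[-26.   8.] [  8. -10.]]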
The KKT conditions are then rewritten as the following LP problem:

    Minimize −p1 − p2 − q1 − q2
    subject to
        Hx + Aᵀµ − ν + p = −e,
        Ax + λ + q = b,                                                  (4.45)
        (x, µ, ν, λ, p, q) ≥ 0.
Figure 4.26 shows the two linear inequality constraints and the contour
of the quadratic objective function at this iteration. The optimal solution
((2.312, 10.750)T ) to this quadratic programming problem is shown in the
figure. However, we show the tableau of the conditioned simplex method to
solve the above problem in Table 4.17. First, variables p1 , p2 , q1 and q2 are
basic variables.
Figure 4.26 LP problem for the first iteration of the QP method is shown.
Table 4.17 The First Tableau for the QP Subproblem of Exercise 4.8.2

                    0     0     0     0     0     0     0     0    −1    −1    −1    −1
 cB   Basic        x1    x2    µ1    µ2    ν1    ν2    λ1    λ2    p1    p2    q1    q2      b    Ratio
 −1   p1          −26     8    −8     4    −1     0     0     0     1     0     0     0     28    28/4 = 7  ←
 −1   p2            8   −10     2     1     0    −1     0     0     0     1     0     0     36    36/1 = 36
 −1   q1           −8     2     0     0     0     0     1     0     0     0     1     0      3    3/0 = ∞
 −1   q2            4     1     0     0     0     0     0     1     0     0     0     1     20    20/0 = ∞
      (∆f)q       −22     1    −6     5    −1    −1     1     1     0     0     0     0           f(x) = −87
                               ↑
After this iteration, µ2 enters the basic variable set and p1 becomes non-basic.
Since λ2 is not present in the basic variable set, this inclusion is allowed.
Iteration 2 is applied next and the simplex is shown in Table 4.18.
Now, x1 enters the basic variable set. Since ν1 does not exist in the basic
variable set, this is also allowed. Notice how the function value increases from
−87 to −52 in one iteration of the simplex method. Iteration 3 is tabulated
in Table 4.19.
Now, x2 enters in place of q1 in the basic variable set. Again, this move is allowed.
The function value increases to −23. Iteration 5 is next and is shown in
Table 4.21.
Next, variable x1 enters the basic variable set. Iteration 6 is then executed
and is shown in Table 4.22.
where

    J = [∇bh1;  ∇bh2;  … ;  ∇bhK]   and   C = [∇h1;  ∇h2;  … ;  ∇hK],

the rows of J being the gradients of the equality constraints with respect to
the basic variables, and the rows of C the gradients with respect to the
nonbasic variables.
gj(x) − xN+j = 0.

The variable xN+j takes a positive value at a feasible point and a negative
value at an infeasible point. If xN+j is zero at a point, the point lies on
the constraint boundary. Thus, by adding an extra variable, each inequality
constraint can be converted into an equality constraint.
EXERCISE 4.9.1
We consider the constrained Himmelblau function to illustrate the working of
the above algorithm:

    Minimize (x1² + x2 − 11)² + (x1 + x2² − 7)²
    subject to
        g1(x) = 26 − (x1 − 5)² − x2² ≥ 0,
        g2(x) = 20 − 4x1 − x2 ≥ 0,
        x1, x2 ≥ 0.
Here, both constraints are inequality constraints. Thus, we introduce two slack
variables (x3 and x4 ) to transform them to equality constraints. This strategy
of handling inequality constraints is known as the slack variable strategy.
Thus, the constrained Himmelblau’s problem is modified into the following
problem:
Minimize (x21 + x2 − 11)2 + (x1 + x22 − 7)2
subject to
h1 (x) = 26 − (x1 − 5)2 − x22 − x3 = 0,
h2 (x) = 20 − 4x1 − x2 − x4 = 0,
x1 , x2 , x3 , x4 ≥ 0.
In this problem, there are four decision variables and two constraints (that is
N = 4 and K = 2). Therefore, there must be two basic and two nonbasic
variables.
Step 1 We choose an initial point x1(0) = 1 and x2(0) = 2. Since we require
an initial feasible solution, we find the two other variables so that both
constraints are satisfied. We obtain the initial point x(0) = (1, 2, 6, 14)ᵀ.
We choose all termination parameters equal to 10⁻³ and set the iteration
counter t = 0.
Step 2 At this step, we determine which two variables would serve best as
basic variables. Since upper limits on the variables x1 and x2 are not
specified, we assume some bounds. Let us assume that variables x1 and x2
lie in the domain (0, 5). In this range, the minimum and maximum values of
variable x3 computed from the first constraint are 1 and 26, respectively. For
variable x4 they are −5 and 20, respectively, as obtained from the second
constraint. Thus, we compute the vector y = (0.2, 0.4, 0.2, 0.24). We choose
variables x2 and x4 as basic variables, since they correspond to the larger
values in the vector y. Thus, the first and third variables are chosen as nonbasic
variables. Next, we compute the gradient of the constraints to form J and C
matrices. By using numerical differentiation (Equation (3.4)), we obtain the
gradient of the constraints at the initial point: ∇h1 (x(0) ) = (8, −4, −1, 0)T and
∇h2 (x(0) ) = (−4, −1, 0, −1)T . The matrix J is formed with the basic variables
as rows and the matrix C is formed with the nonbasic variables as rows:
    J = ( −4    0 )          C = (  8   −1 )
        ( −1   −1 ) ,            ( −4    0 ) .
The first-order derivative of the objective function is computed numerically
at the point x(0): ∇f(x(0)) = (−36, −32, 0, 0)ᵀ. Thus, the basic and nonbasic
parts of this gradient are (−32, 0) and (−36, 0), respectively. Therefore, we
compute the reduced gradient by using Equation (4.46) and by calculating the
inverse of the matrix J:

    ∇f̃(x(0)) = (−36, 0) − (−32, 0) ( −0.25     0 ) (  8   −1 )
                                   (  0.25    −1 ) ( −4    0 )
              = (−100, 8).
The above vector is a 1 × (N − K) or 1 × 2 row vector.
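The same computation in Python (illustrative only, not the book's code)
confirms the reduced gradient and, looking ahead to Step 3, the basic part of
the direction vector:

import numpy as np

J = np.array([[-4.0, 0.0], [-1.0, -1.0]])     # w.r.t. basic variables (x2, x4)
C = np.array([[8.0, -1.0], [-4.0, 0.0]])      # w.r.t. nonbasic variables (x1, x3)
grad_b  = np.array([-32.0, 0.0])              # basic part of grad f
grad_nb = np.array([-36.0, 0.0])              # nonbasic part of grad f

red_grad = grad_nb - grad_b @ np.linalg.inv(J) @ C   # Equation (4.46)
print(red_grad)        # [-100.    8.]

d_nb = -red_grad                               # nonbasic direction components
d_b  = -np.linalg.inv(J) @ C @ d_nb            # basic direction components
print(d_b)             # [ 202. -602.]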
Step 3 Since the magnitude of the reduced gradient is not small, we
compute the direction vector d(0). The nonbasic component of the direction
vector is computed first. Since none of the nonbasic variables is at its
boundary at the current point, we assign the nonbasic part as the negative of
the reduced gradient: (100, −8)ᵀ. The basic part, a K × 1 column vector, is
computed as −J⁻¹C(100, −8)ᵀ = (202, −602)ᵀ. Thus, the
overall direction vector is d = (100, 202, −8, −602)ᵀ. At this stage, we use
the modified version of Step 4, because one of the constraints is nonlinear and
the new point generated along this direction after a unidirectional search may
not be feasible. In order to reduce the computational effort in each step, we
follow the modified Step 4.
Step 4 We first set a step factor α = 0.015. Thereafter, we set i = 1.
(a) The new point is v (1) = x(0) + 0.015d and is found to be the point
(2.500, 5.030, 5.880, 4.970)T . The constraint values are h1 (v (1) ) = −11.430
and h2 (v (1) ) = 0.000. The linear constraint is always satisfied.
Since all constraint violations are not small, we go to Step (b). The points
x(0) and v (1) and the search direction d(0) are shown in Figure 4.28.
(b) The new basic variables are calculated using Equation (4.47):
vb(2) = vb(1) − J⁻¹(v(1)) · h(v(1)). Before we compute the new point, we
calculate the matrix J at the current point v(1) and then compute the inverse
of the matrix. The new J matrix and its inverse are as follows:

    J(v(1)) = ( −10.060     0.000 )
              (  −1.000    −1.000 ) ,

    J⁻¹(v(1)) = (1/10.060) ( −1.000      0.000 )
                           (  1.000   −10.060 ) .
Thus, the new point is calculated as follows:

    vb(2) = ( 5.030 ) − (1/10.060) ( −1.000      0.000 ) ( −11.430 )
            ( 4.970 )              (  1.000   −10.060 ) (   0.000 )

          = ( 3.894 )
            ( 6.106 ) .
The nonbasic variables are the same as before: v (2) = (2.500, 5.880)T . Using
the above transformation, only basic variables are modified so that along with
nonbasic variables they satisfy both equality constraints.
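This restoration step of Equation (4.47) is reproduced below in a short
Python sketch (not part of the book's code):

import numpy as np

J = np.array([[-10.060, 0.000], [-1.000, -1.000]])   # J at v(1)
h = np.array([-11.430, 0.000])                        # constraint values at v(1)
vb = np.array([5.030, 4.970])                         # basic variables (x2, x4)

vb_new = vb - np.linalg.solve(J, h)                   # vb(2) = vb(1) - J^{-1} h
print(vb_new)   # approximately [3.894 6.106]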
(c) The two points v (1) and v (2) are not close to each other, thus we
increment the counter i = 2 and go to Step (b).
The nonbasic variables remain the same as before. Thus the new point is
v (3) = (2.500, 3.728, 5.880, 6.272)T with a violation of the first constraint of
only −0.028. Figure 4.28 shows how the infeasible point v (1) is made feasible.
Since the variable x2 is a basic variable and the variable x1 is a nonbasic
variable, the infeasible point is modified only in the variable x2 to make it
feasible. One may wonder why the Newton-Raphson method could not
find a point on the constraint g1(x). Recall that the original problem has
now been changed to a four-variable problem. A plot of only two variables
does not reveal the true scenario. The point v (3) actually lies close to the
intersection of the constraint boundaries in four variables. At the end of this
step, we move to Step (d).
(d) The function value at the initial point is f (x(0) ) = 68 and at the current
point is f (v (3) ) = 88.81, which is worse than the initial point. Thus, we reduce
the step parameter: α = 0.5α0 = 0.0075. We set i = 1 and go to Step (a).
(a) The new point is v (1) = x(0) + αd = (1.750, 3.515, 5.940, 9.485)T . The
constraint violations are h1 (v (1) ) = −2.875 and h2 (v (1) ) = 0.0. The point v (1)
is also shown in Figure 4.28, close to the constraint g1 (x). The figure shows
that the point is feasible as far as the original two constraints g1 (x) and g2 (x)
(in two-variables) are concerned. But this point is not feasible for the two
modified equality constraints h1 (x) and h2 (x). Computing the J matrix at
this point, we find the new point v (2) = (1.750, 3.108, 5.940, 9.892)T . The
violation of the first constraint reduces to a value −0.162. One more iteration
yields the point
v (3) = (1.750, 3.082, 5.940, 9.918)T .
At this point, the constraint violation is below the permissible limit. Thus,
we move to Step (d) and compare the function values.
(d) The function value at the point v (3) is f (v (3) ) = 41.627, which
is less than that at x(0) . Thus, we accept this point x(1) = (1.750,
3.082, 5.940, 9.918)T and move to Step 2. This completes one iteration of the
generalized reduced gradient method.
than a specified small number, either the optimum is reached or the active
constraints at the current point are orthogonal to each other. In the latter
case, one active constraint is excluded from the constraint set and a new search
direction is found. Once a feasible search direction is found, a unidirectional
search is performed to obtain the minimum point along that direction. The
difficulty with this technique is that in the case of nonlinear constraints,
the intermediate points in the unidirectional search may not fall on the
constraint boundaries. Thus, in order to make the search points feasible, the
unidirectional search method is modified. Every point created in the search is
projected onto the intersection surface of the constraints in order to make the
point feasible. Since this may require a number of function evaluations, only
three feasible points are chosen along the search direction and the quadratic
interpolation search method described in Chapter 2 is used to obtain a guess
for the optimum point along that direction. The search is continued from the
new point. This process continues until no better solution can be found.
In the following, we describe the algorithm and then show hand-
calculations of a few iterations of this algorithm on the constrained
Himmelblau’s function.
Algorithm
Step 1 Choose an initial point x(0) , termination factors ϵ1 and ϵ2 , and an
iteration counter t = 0.
Step 2 Evaluate all inequality constraints at x(t) to identify the active set
Calculate the matrix A at the point w(t); the vector H contains the constraint
values of all equality and active inequality constraints at the point w(t). Then
use a quadratic interpolation to estimate α∗ .
Step 7 Set x(t+1) = w(α∗ ), t = t + 1, and go to Step 2.
The difficulty with this method is that the matrix (AAT )−1 is evaluated
at every iteration and every new infeasible point needs to be projected on the
constraint surface. If the optimum lies on only one of the constraints, this
method makes the search faster.
EXERCISE 4.10.1
We illustrate the working of this algorithm on the constrained Himmelblau
function.
Minimize (x21 + x2 − 11)2 + (x1 + x22 − 7)2
subject to
g1 (x) = 26 − (x1 − 5)2 − x22 ≥ 0,
g2 (x) = 20 − 4x1 − x2 ≥ 0,
g3 (x) = x1 ≥ 0,
g4 (x) = x2 ≥ 0.
Step 1 We choose an initial point x(0) = (0, 0)T , all termination factors
ϵ1 = ϵ2 = 10−3 , and set the iteration counter t = 0.
Step 2 At the initial point, two constraints (g3 and g4 ) are active. Thus,
I (0) = {3, 4}.
Step 3 At this step, we form the matrix A from the gradient of the active
constraints: g3 (x(0) ) = (1, 0)T and g4 (x(0) ) = (0, 1)T . Thus the matrix A is
formed as follows:
    A = ( 1   0 )
        ( 0   1 ) ,
which is an identity matrix. We observe that the matrix (AAT )−1 is also an
identity matrix. We now compute the projection matrix as follows:
    P = ( 1   0 ) − ( 1   0 ) ( 1   0 ) ( 1   0 ) = ( 0   0 )
        ( 0   1 )   ( 0   1 ) ( 0   1 ) ( 0   1 )   ( 0   0 ) .
The gradient vector of the objective function at the current point is ∇f(x(0))
= (−14, −22)ᵀ. Thus, the search direction is s = −P∇f(x(0)) = (0, 0)ᵀ, a zero
vector.
Step 4 Since ∥s∥ = 0, we calculate the constraint multipliers:
    u = ( 1   0 ) ( −14 ) = ( −14 )
        ( 0   1 ) ( −22 )   ( −22 ) .
Since u2 is most negative, the constraint g4 (x) is excluded from the active
constraint set. Thus, we update the set I (0) = {3} and we move to Step 3.
The set I (0) is not empty; we construct a new A matrix.
Step 3 The matrix A now has only one row: A = (1, 0). The projection
matrix is calculated as
    P = ( 0   0 )
        ( 0   1 ) .
The search direction is computed by multiplying the negative of the gradient
vector ∇f (x(0) ) with this projection matrix: s = (0, 22)T .
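The projection computations of this and the previous step can be verified
with a small Python sketch (illustrative only, not the book's code):

import numpy as np

def projection(A):
    # P = I - A^T (A A^T)^{-1} A, as in Step 3 of the algorithm
    return np.eye(A.shape[1]) - A.T @ np.linalg.inv(A @ A.T) @ A

grad = np.array([-14.0, -22.0])                  # grad f at x(0) = (0, 0)

P = projection(np.array([[1.0, 0.0], [0.0, 1.0]]))
print(-P @ grad)                                 # [0. 0.]: both g3 and g4 active

P = projection(np.array([[1.0, 0.0]]))           # g4 dropped from the active set
print(P)                                         # [[0. 0.] [0. 1.]]
print(-P @ grad)                                 # [ 0. 22.]: the direction s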
Step 4 The norm of the search vector s is not small. Thus, we move to
Step 5.
Step 5 At this step, we have to determine the maximum permissible step
size for the other three constraints. We find a generic point along the search
direction s as
w(α) = x(0) + αs = (0, 22α)T .
For the first constraint, the maximum permissible α can be calculated by
solving g1 (w(α)) = 0. This is a root-finding problem which can be formulated
as an optimization problem described in Section 2.6. Using that technique,
we obtain the solution α1 = 0.045. Similarly, for the second and fourth
constraints, the corresponding limits are α2 = 0.909 and α4 = 0, respectively.
Since α4 is zero, we find the minimum of the other two values only and obtain
αmax = 0.045.
Step 6 Thus, the bounds of the search parameter α are 0 and
0.045, respectively. We take three points along the search direction
to estimate the minimum point along that direction. Let us say the
chosen values for α are α1 = 0.000, α2 = 0.020 and α3 = 0.045. We
observe that the corresponding points (w(α)) are w1(0) = (0.00, 0.00)T ,
w2(0) = (0.000, 0.440)T , and w3(0) = (0.000, 1.000)T , respectively. Knowing
that the constraint g3 (x) is the only active constraint being considered in
this iteration, we check each of these points for feasibility. Since the active
constraint g3 (x) is linear, all points along the projected search direction s lie
on the constraint. This can be verified by projecting any of the above points
onto the constraint surface g3 (x). Knowing the matrices
A = (1, 0), H = (0.000)T ,
we project the second point w2(0) onto the constraint surface g3(x): since
H = (0.000)ᵀ, the projection w2(0) − Aᵀ(AAᵀ)⁻¹H returns w2(0) itself.
Thus, we get back the same point w2(0). Similarly, other points can also
be shown to lie on g3 (x) after projection. The function values at these
points are f (w1) = 170.000, f (w2) = 157.841, and f (w3) = 136.000. With
these function values, we now have to estimate the minimum point along s in
the range α ∈ (0, 0.045). We observe that the function monotonically reduces
with the increase in α. Assuming that there is only one minimum point along
s, we conclude that the minimum lies at α∗ = 0.045 or at the point w3(0) .
Step 2 At this step, there are two active constraints: I (1) = {1, 3}, as seen
from Figure 4.29.
Step 4 Since the search direction is a zero vector, we calculate the constraint
multipliers. The gradient of the objective function at this point is
∇f(x(1)) = (−12, −44)ᵀ.
The constraint multipliers are u = (22, −232)ᵀ. Thus, we eliminate the third
constraint from the active set and recalculate the projection matrix at Step 3.
Step 3 The matrix A is now A = (10, −2) and the corresponding projection
matrix is
    P = ( 0.029   0.192 )
        ( 0.192   0.962 ) .
The search vector is s = (8.922, 44.613)T , which is shown in Figure 4.29 with
a dashed arrow from x(1) .
Step 5 Like in the previous iteration, we compute the upper bounds for
the second, third, and fourth constraints: α2 = 0.237, α3 = 0.000, and
α4 = −0.020. The negative value of α4 signifies that the boundary of the
constraint g4(x) lies along the negative s direction. The latter two values are not accepted;
thus we set αmax = 0.237.
    w2(1) = w2(0) − Aᵀ(AAᵀ)⁻¹H

          = ( 0.446 ) − (  9.108 ) [ (9.108  −6.462) (  9.108 ) ]⁻¹ (−5.178)
            ( 3.231 )   ( −6.462 ) [                 ( −6.462 ) ]

          = ( 0.824 )
            ( 2.963 ) .
At this point, we again search along the steepest descent direction and obtain
the next point x(4) = (3.148, 1.752)ᵀ with a function value f(x(4)) = 1.054.
Continuing this process further will lead to the optimum solution. Figure 4.29
shows the progress of these iterations. In general, the search process continues
along the boundary of the feasible region. If the optimum point is on the
boundary, the optimum is eventually found by traversing from one boundary
to another. On the other hand, if the optimum is inside the feasible region,
the search deviates from the boundary after some iterations and follows the
steepest descent directions through the feasible region.
Although the active constraint strategy is used in this exercise problem,
the slack variable strategy could also be used to handle inequality constraints.
Similarly, the active constraint strategy can also be used equally efficiently in
the reduced gradient method. The active constraint strategy requires
book-keeping of the active constraints at each iteration, whereas the slack
variable strategy usually requires more computational time owing to the use of
a larger number of variables. Thus, for a few inequality constraints, the slack
variable strategy may be efficient; otherwise, the active constraint strategy can be used.
4.11 Summary
In this chapter, we have presented a number of nonlinear programming
methods for constrained optimization. To begin with, the Kuhn-Tucker
conditions for optimality have been discussed. Kuhn-Tucker conditions for
constrained optimization problems are derived on the basis of unconstrained
optimization of the corresponding Lagrange function. Kuhn-Tucker points are
those points which satisfy all Kuhn-Tucker conditions. It turns out that Kuhn-
Tucker points are likely candidates for constrained optimal points. The Kuhn-
Tucker necessity theorem helps identify non-optimality of a point. Although
the Kuhn-Tucker sufficiency theorem helps to identify the optimality of a
point, the theorem can only be applied for a limited class of constrained NLP
problems.
Optimization algorithms described in this chapter are divided into two
broad categories—direct search methods and gradient-based methods. Among
the direct search methods, the penalty function method is most widely used.
In the penalty function method, infeasible solutions are penalized by adding
a penalty term in relation to the amount of the constraint violation at
that solution. There are primarily two types of penalty functions—interior
penalty functions which penalize only feasible solutions close to constraint
boundaries and exterior penalty functions which penalize infeasible points.
The interior penalty methods require an initial feasible solution, whereas the
exterior penalty methods do not require the point to be feasible. This is why
exterior penalty methods are more popular than interior penalty methods.
The bracket-operator exterior penalty function method has been applied to a
constrained Himmelblau function. The results have shown that even though
the penalty function method solves the above function to optimality, it distorts
straightforward, it may not work if the initial point is far away from the true
optimum point, because the linearized constraint and the objective function
may be very different than the true functions at a point away from the current
point. The cutting plane method begins with a large search space formed by
linear constraints. The resulting LP problem is solved and a linear cutting
plane is formed using the most violated constraint at the resulting point.
This is how cutting planes are formed at each iteration to reduce the initial
large search space into the shape of the true feasible region. It has been
proved elsewhere (Kelly, 1960) that for convex objective functions and feasible
regions, the cutting planes do not exclude any part of the true feasible region.
One way to handle nonlinear objective functions is also discussed. Since the
cutting plane method always surrounds the true feasible search space with
linear planes and the solution is obtained by using those linear planes as
constraints, the obtained solution is always infeasible. Thus, sufficiently large
number of iterations are required so that the solution, although infeasible, is
very close to the true optimum solution. This method is not very efficient if
the true optimum point is not a boundary point. A local exhaustive search
can then be performed to find the true optimum solution.
Next, we have described quadratic programming (QP) algorithm which
makes a quadratic approximation of the objective function and a linear
approximation of all constraints. It turns out that the resulting problem
can be solved using the LP technique. The sequential quadratic programming
(SQP) method has been described thereafter.
The feasible direction method starts its search along a feasible direction
from an initial feasible point. At each point, the active constraints are first
found. Thereafter, a search direction is found by maximally satisfying the
descent and feasibility properties of a direction. This requires solving a linear
programming problem. If the point does not make any constraint active,
the steepest descent direction is used. When a search direction is found,
a unidirectional search is adopted and a new point is found. This method
is largely known as Zoutendijk’s method. If the true optimum falls on a
constraint boundary, this method may require a number of iterations before
converging close to the optimum point.
The generalized reduced gradient method is a sophisticated version of
the variable elimination method of handling equality constraints. Some
variables (called basic variables) are expressed in terms of other variables
(called nonbasic variables) by using the equality constraints. The gradient
of the objective function at any point is then expressed in terms of only
nonbasic variables. This gradient is called the reduced gradient. Thus, this
method is similar in principle to the steepest descent method except that
some variables are expressed in terms of other variables. Although a method
to handle inequality constraints is discussed, this algorithm works very well
for NLP problems with equality constraints only.
REFERENCES
PROBLEMS
0 ≤ x1 , x2 ≤ 3,
find whether the following points are feasible. If they are feasible, which of
the above constraints are active?
(i) (0, 1)T .
(ii) (1, 4)T .
(iii) (2, 1)T .
(iv) (3, 0)T .
4-2 Write down the Kuhn-Tucker conditions for the above NLP problem.
Check whether the above points are K-T points.
4-3 Consider the constrained optimization problem:
Minimize 10x21 + 2.5x22 − 5x1 x2 − 1.5x1 + 10
subject to
x21 + 2x22 + 2x1 ≤ 5.
Find whether any of the following points are likely candidates of the optimum
point:
(i) (0, 0)T .
(ii) (0.1, 0.1)T .
(iii) (2, 1)T .
4-4 Identify whether the points
(i) (0, 6)T ,
(ii) (1.5, 1.5)T ,
(iii) (2, 2)T
are optimal points to the following NLP problem:
Minimize x21 + x22 − 10x1 + 4x2 + 2
subject to
x21 + x2 − 6 ≤ 0,
x 2 ≥ x1 ,
x1 ≥ 0.
4-5 Write down the Kuhn-Tucker conditions for the following problem:
subject to
2x1 + x2 = 4,
x1 ≥ 0.
Find out whether points (0, 4)T and (3.4, −2.8)T are Kuhn-Tucker points.
How would the maximum function value change if the equality constraint is
changed to the following: 2x1 + x2 = 6?
4-6 In an NLP problem, the following constraints are used:
g2 (x) = x1 + 2x2 − 12 ≤ 0,
g3 (x) = 2x1 + x2 + 4 ≥ 0.
The current point is (0, 0)T and a search direction db = (2, 1)T is found. Find
the minimum and maximum allowable bounds along this direction.
4-7 Consider the following constrained Himmelblau's function:

Minimize (x1² + x2 − 11)² + (x1 + x2² − 7)²
subject to
(x1 − 5)² + x2² ≤ 26,
4x1 + x2 ≤ 20,
x1, x2 ≥ 0.
How does the solution change if a constraint x ≥ 0.7 is added? Form the
primal and dual problem in each case and solve.
subject to 2x1 + x2 ≤ 2,
develop the dual function, maximize it and find the corresponding point in
x-space.
4-10 For the primal problem
Minimize (x1 − 2)2 + (x2 − 1)2 ,
subject to
x1 + x2 − 2 ≤ 0,
x21 − x2 ≤ 0,
construct the dual problem and find the dual solution.
4-11 The primal problem is to minimize x3 such that x ≥ 0. Find the dual
solution.
4-12 For the problem
Maximize f(x) = x², subject to 0 ≤ x ≤ 1,
show that there exists an infinite duality gap for this problem (Bector,
Chandra and Dutta, 2005).
4-13 A thin right circular conical container (base diameter 10 cm and height
12.5 cm) is cut by a plane x + y + 1.5z = 10 (the origin is assumed to be at
the centre of the base circle). Find the point on the cut surface closest to the
apex of the cone using the variable elimination method.
4-14 Find the point on the ellipse defined by the intersection of the surfaces
x + y = 1 and x² + 2y² + z² = 1 that is nearest to the origin. Use the Lagrange
function method.
4-15 The intersection of the planar surface x + y + 4z = 2 and the ellipsoid
x² + 2y² + z² = 1 creates an ellipse.
(i) Formulate an optimization problem to locate a point on the ellipse
which is nearest to the origin.
(ii) Write and solve the resulting KKT conditions to find the nearest point.
4-16 Consider the NLP problem:
Minimize f(x) = 10 + x² − 8x
subject to
x ≥ 6.
Use the penalty function method with the following penalty terms to find the
constrained minimum point of the above problem:
(i) Bracket operator penalty term with the following values of R: 0.01,
0.1, 1.0, 10.0, 100.0, and ∞.
(ii) Inverse penalty term with following values of R: 1,000.0, 100.0, 10.0,
1.0, 0.1, and 0.01.
Form the penalized function and use the exact differentiation technique to
compute the optimal point in each sequence.
4-17 Consider the following optimization problem:
subject to
x1 − x2 − 2 = 0,
x1 + x2 − 0.5 ≤ 0.
4-19 Solve Problem 4-18 using the method of multiplier technique. Use
five iterations of the MOM technique with R = 1. Compare the resulting
multiplier σ1 with the Lagrange multiplier u1 obtained using the Kuhn-Tucker
conditions.
4-20 Consider the NLP problem:
subject to
25 − x21 − x22 ≥ 0,
x21 + x2 ≤ 9,
0 ≤ x1 ≤ 5,
0 ≤ x2 ≤ 10.
(i) Set up and solve the first subproblem for the cutting plane method.
(ii) Calculate the cutting plane at the resulting point.
(iii) Set up and solve the second subproblem and generate the next cut.
Minimize f (x),
subject to
g(x) ≥ 0.
4-22 Use two iterations of the cutting plane method to solve the following
problem:
Maximize f (x) = x2
subject to
2x1 − x2 ≥ 1,
x1 , x2 ≥ 0.
(i) The feasible region bounded by two linear inequality constraints (g1
and g2 ) and one nonlinear inequality constraint (circle) is shown
shaded. If point A is the current point, can you draw the cutting plane
which will be formed from these three constraints?
(ii) If x1 is considered as a nonbasic variable in a two-variable minimization
problem having one equality constraint (h(x) = 0), indicate the
reduced gradient direction in the figure at the point x(t) indicated.
The gradient direction at this point is shown.
(iii) Using the reduced gradient method, the point w is found from x(t)
on the equality constraint h(x) = 0. How would this point be
made feasible if x1 is considered as the basic variable? Show the
corresponding feasible point on the figure.
(iv) The minimum distance of a point A from a given circle (x21 + x22 = a2 )
is needed to be found. If an equivalent minimization problem is formed
with one equality constraint (h(x1 , x2 ) = x21 + x22 − a2 = 0), what is
the sign of the Lagrange multiplier of this constraint at the minimum
solution? What would be the sign of the Lagrange multiplier if point
B is used, instead of A?
(v) For the feasible direction method applied to a minimization problem,
the gradient direction is shown at the current point. What will be the
resulting search direction?
(vi) In another minimization problem by the feasible direction method, the
location of the current point is shown. Indicate the region on which the
resulting search direction must lie.
4-26 Consider the problem:
subject to
2 ≤ x ≤ 3.
Use Zoutendijk’s feasible direction method to solve the above problem. Use
a starting point x(0) = 2. Show the progress of the algorithm on an f(x)-x
plot.
4-27 For an optimization problem, the following constraint set is given:
4-28 Complete one iteration of the reduced gradient technique to find the
point x(1) in solving the following NLP problem:
Minimize x1² + 2x2²
subject to
x1² + x2² = 5.
Use a starting feasible solution x(0) = (1, −2)T , initial α = 0.25, and all
termination factors equal to 0.01. Show the intermediate points on an x1 -x2
plot.
4-29 Repeat Problem (4.28) using the gradient projection method.
4-30 We would like to solve the following problem:
subject to
x21 + 4x2 − 20 = 0.
We would like to use the generalized reduced gradient method (GRG) to solve
the problem using x2 as the basic variable and using x(0) = (2, 4)T .
subject to
x1 − x2 − 2 = 0,
x1 + x2 − 0.5 ≤ 0.
subject to
x21 + 4x2 − 20 ≤ 0,
using the sequential quadratic programming (SQP) method. Start from x(0) =
(1, 1)T .
subject to
2x1 − x2 ≥ 1,
x1 , x2 ≥ 0.
cos(x) ≥ 0,
0 ≤ x ≤ 2π.
4-37 Use the gradient projection method to find the optimal solution of the
following NLP problem:
Minimize f (x, y) = x2 + y 2 − 6x − 2y + 2
subject to
y ≤ x,
x + 5y ≤ 15,
y ≥ 0.
Begin your search from x(0) = (0, 0)T . Use the active constraint strategy.
Show each step of the algorithm clearly and show the intermediate solutions
on an x-y plot.
Help: Projection matrix is (I − AT (AAT )−1 A) and the Lagrange multiplier
vector is (AAT )−1 A∇f .
4-38 In Problem (4-28), the equality constraint is changed into an inequality
constraint as follows:
Minimize x21 + 2x22
subject to
g(x) = x21 + x22 ≥ 5.
subject to
h(x) = x21 + x22 + x23 − 1 = 0.
Recognizing that both the objective function and constraint surface are
spherical, calculate the optimum point using elementary calculus and
analytical geometry. In order to use the generalized reduced gradient (GRG)
method on this problem, we choose to use (x1, x2) as the nonbasic variables
and x3 as the basic variable. For a feasible initial point x(0), the
nonbasic variables are (0.3, 0.3).
4-40 In trying to solve the following problem using the feasible direction
method at the point x(5) = (1, 1)T , a direction vector d(5) = (1, 1/7)T is
obtained.
Maximize (x1 − 1.5)2 + (x2 − 4)2
subject to
4.5x1 + x22 ≤ 18,
2x1 − x2 ≥ 1,
x1 , x2 ≥ 0.
Use two iterations of the golden section search method to bracket the minimum
point along the direction d(5) . Assume the mid-point of that interval as the
new point x(6) . Create a search direction at the new point x(6) .
4-41 Consider the following optimization problem:
subject to
3 − x1 − x2 ≥ 0,
10x1 − x2 − 2 ≥ 0.
Starting from x(0) = (0.2, 0)T , perform one iteration of the feasible direction
method. Show that the search direction obtained by the feasible direction
method is within the region bounded by the steepest descent and the most
feasible directions. Solve the resulting LP problem graphically.
4-42 Complete one iteration of the gradient projection method to find the
point x(1) for the following NLP problem:
subject to
x21 + x22 = 5.
Use a starting feasible solution x(0) = (1, 2)T , an initial α = 0.2, and a
termination factor equal to 0.01. Show intermediate points on an x1 -x2 plot.
4-43 Repeat Problem 4-42 using the generalized reduced gradient method.
Handle the constraint using the slack variable strategy. Use other parameters
same as that in Problem (4-28).
COMPUTER PROGRAM
subroutine steepest(n,eps,epss,x0,xstar,fstar,
- nfun,ierr,grad0,s0,x1,xdummy,iprint)
c.....steepest Descent method
c.....n : dimension of design vector
c.....eps : accuracy in steepest-descent method
c.....epss: accuracy in golden section search
c.....x0 : initial design vector
c.....xstar : final design solution (output)
c.....fstar : final objective function value (output)
c.....nfun : number of function evaluations required
c.....ierr : error code, 1 for error
c.....rest all are dummy variables of size n
implicit real*8 (a-h,o-z)
dimension x0(n),xstar(n),grad0(n),s0(n),x1(n),
- xdummy(n)
maxiter = 10000
k = 0
c.....step 2 of the algorithm
2 call fderiv(n,x0,grad0,f0,nfun,xdummy)
if (iprint .eq. 1) then
write(*,*) '----------------------------',
- '----------------------------'
write(*,8) k,(x0(i),i=1,n)
8 format(2x,'Iteration: ',i5,/,5x,
- 'Solution vector: ',4(1pe12.4,','))
write(*,9) f0,nfun
9 format(5x,'Function value : ',1pe13.4,
- ' Function Eval. : ',i7)
endif
c.....step 3 of the algorithm
call unitvec(n,grad0,gradmag)
if ((gradmag .le. eps) .or. (k .ge. maxiter)) then
do 11 i = 1,n
11 xstar(i) = x0(i)
fstar = f0
return
endif
c.....step 4 of the algorithm
do 10 i = 1,n
s0(i) = -1.0 * grad0(i)
10 continue
call bphase(n,x0,s0,a,b,nfun,xdummy)
call golden(n,x0,s0,a,b,epss,alfastr,nfun,ierr,
- xdummy)
sum = 0.0
do 12 i = 1,n
x1(i) = x0(i) + alfastr * s0(i)
if (dabs(x0(i)) .gt. eps) then
sum = sum + dabs(x1(i) - x0(i))/x0(i)
else
sum = sum + dabs(x1(i) - x0(i))/eps
endif
12 continue
call funct(n,x1,f1,nfun)
c.....step 5 of the algorithm
if (sum .le. eps) then
do 14 i = 1,n
xstar(i) = x1(i)
14 continue
fstar = f1
return
else
k = k + 1
do 15 i = 1,n
x0(i) = x1(i)
15 continue
go to 2
endif
end
subroutine golden(n,x,s,a,b,eps,xstar,nfun,ierr,
- xdummy)
c.....golden section search algorithm
c.....finds xstar such that f(x + xstar * s) is minimum
c.....x : solution vector
c.....s : direction vector
c.....a,b : lower and upper limits
c.....nfun : function evaluations
c.....ierr : error code, 1 for error
c.....xdummy : dummy variable of size n
implicit real*8 (a-h,o-z)
real*8 lw,x(n),s(n),xdummy(n)
c.....step 1 of the golden section search
xstar = a
ierr=0
maxfun = 10000
aw=0.0
bw=1.0
lw=1.0
k=1
c.....golden number
gold=(sqrt(5.0)-1.0)/2.0
w1prev = gold
w2prev = 1.0-gold
c.....initial function evaluations
call mapfun(n,x,s,a,b,w1prev,fw1,nfun,xdummy)
call mapfun(n,x,s,a,b,w2prev,fw2,nfun,xdummy)
ic=0
c.....Step 2 of the golden section search
10 w1 = w1prev
w2 = w2prev
c.....calculate function value for new points only
if (ic .eq. 1) then
fw2 = fw1
call mapfun(n,x,s,a,b,w1,fw1,nfun,xdummy)
else if (ic .eq. 2) then
fw1 = fw2
call mapfun(n,x,s,a,b,w2,fw2,nfun,xdummy)
else if (ic .eq. 3) then
call mapfun(n,x,s,a,b,w1,fw1,nfun,xdummy)
call mapfun(n,x,s,a,b,w2,fw2,nfun,xdummy)
endif
c.....region-elimination rule
if (fw1 .lt. fw2) then
ic = 1
aw = w2
lw = bw-aw
w1prev = aw + gold * lw
w2prev = w1
else if (fw2 .lt. fw1) then
ic = 2
bw = w1
lw=bw-aw
w1prev = w2
w2prev = bw - gold * lw
else
ic = 3
aw = w2
bw = w1
lw = bw-aw
w1prev = aw + gold * lw
w2prev = bw - gold * lw
endif
k=k+1
c.....step 3 of the golden section search
if (dabs(lw) .lt. eps) then
xstar = a + (b-a) * (aw+bw)/2
return
else if (nfun .gt. maxfun) then
write(*,3) maxfun, a+aw*(b-a), a+bw*(b-a)
3 format(' The algorithm did not converge in',i6,
- ' function evaluations',/,' Interval (',
- 1pe12.5,',',1pe12.5,')')
ierr = 1
return
endif
go to 10
end
subroutine bphase(n,x,s,a,b,nfun,xdummy)
c.....bounding phase method
c.....all arguments are explained in subroutine golden
implicit real*8 (a-h,o-z)
dimension x(n),s(n),xdummy(n)
c.....step 1 of the algorithm
c.....initial guess, change if you like
w0 = 0.0
delta = 1.0
1 call mapfun(n,x,s,0d0,1d0,w0-delta,fn,nfun,xdummy)
call mapfun(n,x,s,0d0,1d0,w0,f0,nfun,xdummy)
call mapfun(n,x,s,0d0,1d0,w0+delta,fp,nfun,xdummy)
subroutine fderiv(n,x,grad,f,nfun,xd)
c.....derivative calculation at point x
implicit real*8 (a-h,o-z)
c.....calculates the first derivative of the function
dimension x(n),grad(n),xd(n)
do 10 i = 1,n
xd(i) = x(i)
10 continue
call funct(n,xd,f,nfun)
c.....set delta_x
do 12 i = 1,n
if (xd(i) .lt. 0.01) then
dx = 0.01
else
dx = 0.01 * xd(i)
endif
xd(i) = xd(i) + dx
call funct(n,xd,fp,nfun)
xd(i) = xd(i) - 2 * dx
call funct(n,xd,fn,nfun)
grad(i) = (fp-fn)/(2.0*dx)
xd(i)=x(i)
12 continue
return
end
subroutine mapfun(n,x,s,a,b,w,f,nfun,xd)
c.....first, a unit interval is mapped (for golden section)
c.....then, a point is found in s direction
implicit real*8 (a-h,o-z)
dimension x(n),s(n),xd(n)
xw = a + w * (b-a)
do 10 i = 1,n
xd(i) = x(i) + xw * s(i)
10 continue
call funct(n,xd,f,nfun)
return
end
subroutine funct(n,x,f,nfun)
c.....calculates the function value at x
implicit real*8 (a-h,o-z)
dimension x(n),g(10)
common/constr/nc,r
nfun = nfun + 1
c.....code your objective function and constraints here
c.....objective function value
f=(x(1)*x(1)+x(2)-11.0)**2+(x(1)+x(2)*x(2)-7.0)**2
c.....constraints
g(1) = -26.0 + (x(1)-5.0)**2 + x(2)*x(2)
subroutine unitvec(n,x,sum)
c.....finds unit vector and magnitude of a vector s
implicit real*8 (a-h,o-z)
dimension x(n)
sum = 0.0
do 1 i = 1,n
sum = sum + x(i)*x(i)
1 continue
sum = dsqrt(sum)
if (sum .ge. 1e-06) then
do 2 i = 1,n
x(i) = x(i)/sum
2 continue
endif
return
end
Simulation Run
The above code is run on a PC-386 under Microsoft FORTRAN for
minimizing the constrained Himmelblau function starting from x(0) = (0, 0)T .
Intermediate solutions obtained for the first five sequences of the penalty
function method are shown.
enter accuracy in steepest descent
0.001
enter accuracy in golden section
0.001
enter number of sequences
5
enter c: R^(t+1) = c R^(t)
10
enter initial vector
0 0
enter initial R
0.01
enter 1 for intermediate results
0
Sequence Number = 1
========================================================
Starting Solution is: 0.0000E+00, 0.0000E+00,
Function value: 1.70010E+02
--------------------------------------------------------
The minimum point is: 2.9639E+00, 2.0610E+00,
Function value: 3.16851E+00
Total function evaluations so far: 134
========================================================
Sequence Number = 2
========================================================
Starting Solution is: 2.9639E+00, 2.0610E+00,
Function value: 3.10671E+01
--------------------------------------------------------
The minimum point is: 2.6280E+00, 2.4750E+00,
Function value: 2.59956E+01
Total function evaluations so far: 240
========================================================
Sequence Number = 3
========================================================
Starting Solution is: 2.6280E+00, 2.4750E+00,
Function value: 2.08695E+02
--------------------------------------------------------
The minimum point is: 1.0111E+00, 2.9391E+00,
Function value: 5.86645E+01
Total function evaluations so far: 425
========================================================
Sequence Number = 4
========================================================
Starting Solution is: 1.0111E+00, 2.9391E+00,
Function value: 7.76006E+01
--------------------------------------------------------
The minimum point is: 8.5058E-01, 2.9421E+00,
Function value: 6.02360E+01
Total function evaluations so far: 556
========================================================
Sequence Number = 5
========================================================
Starting Solution is: 8.5058E-01, 2.9421E+00,
Function value: 6.16682E+01
--------------------------------------------------------
The minimum point is: 8.4169E-01, 2.9485E+00,
Function value: 6.03703E+01
Total function evaluations so far: 609
========================================================
5
Specialized Algorithms
    + qt ∑_{i=1}^{I} Qt(xi),                                             (5.2)
where Gj (y) is a penalty term handling the j-th inequality constraint, Hk (y)
is a penalty term handling the k-th equality constraint, and Qt (xi ) is the
penalty term handling the i-th integer variable. A number of penalty terms
for handling inequality and equality constraints are discussed in Chapter 4
and can also be used here. But the penalty term Qt in new and relates to the
integer variables only. The penalty term Qt for integer variables should be
such that there is no penalty for the integer values but there is an increasing
penalty for values away from integers. The following penalty term is suggested
for this purpose (Rao, 1984):
    Qt(xi) = [4(xi − li)(1 − xi + li)]^βt,                               (5.3)
where li = ⌊xi ⌋ (the operator ⌊ ⌋ takes the largest integer value smaller than
the operand). Usually, a value of βt ≥ 1 is used. In Figure 5.1, the above
penalty function is shown for two integer variables. The figure shows that the
penalty term Qt is zero for integer values and increases as the value deviates
from integer values. The penalty is maximum when the value is midway between
two consecutive integer values.
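The behaviour of this penalty term is easy to verify with a few lines of
Python (not part of the book's code):

import math

def Q(x, beta):
    l = math.floor(x)                       # largest integer not exceeding x
    return (4.0 * (x - l) * (1.0 - x + l)) ** beta

print(Q(3.0, 2.0))    # 0.0 : no penalty at an integer value
print(Q(3.5, 2.0))    # 1.0 : maximum penalty midway between integers
print(Q(2.928, 2.0))  # about 0.07 : small but nonzero penalty near an integer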
This method works similar to the penalty function method described in
Chapter 4 except that in every sequence parameters rt , qt , and βt are all
changed. The initial parameters are chosen so as to have a minimal distortion
of the original function due to the addition of penalty terms in Equation (5.2).
In successive sequences the parameter βt is gradually reduced, the parameter
qt is gradually increased, and the parameter rt is changed according to the
penalty term used. Since the penalty function Qt pushes the real values
towards their nearest integers, a small tolerance parameter ϵI is defined to
check whether the current real value is close to an integer or not. If the real
EXERCISE 5.1.1
Consider the constrained INLP problem:
Minimize f (x) = (x21 + x2 − 9)2 + (x1 + x22 − 7)2
subject to
g1 (x) = 26 − (x1 − 5)2 − x22 ≥ 0,
g2 (x) = 20 − 4x1 − x2 ≥ 0,
x1 , x2 ≥ 0, x1 , x2 integers.
Figure 5.2 The feasible search space and the optimum point for the INLP problem
in Exercise 5.1.1.
    P(x, r1, q1) = f(x) + r1 ∑_{j=1}^{2} ⟨gj(x)⟩² + q1 ∑_{i=1}^{2} Q1(xi).
Here, we are using the bracket-operator exterior penalty term. Recall that the
quantity ⟨α⟩ takes a nonzero value α only if α < 0. The function Q1 is defined
for β1 = 2.0. Figure 5.3 shows the contour plot of the above function and the
intermediate points found in optimizing the above penalized function using
the steepest descent method. The initial point has a penalized function value
equal to 130. The final point obtained is (2.928, 1.993)ᵀ with a function value
equal to 2.463. This point is not feasible, as neither of the variables has
converged to an integer value.
    P(x, r2, q2) = f(x) + r2 ∑_{j=1}^{2} ⟨gj(x)⟩² + q2 ∑_{i=1}^{2} Q2(xi).
The function Q2 is defined for β2 . Starting from the solution found in the
previous iteration (2.928, 1.993)T , we use the steepest descent method and
obtain the solution (2.975, 1.999)T with a function value equal to 3.422. This
point is also not feasible, but is closer to an integer solution than the initial
point. The penalized function value at this point is 3.881.
Figure 5.3 Intermediate points obtained using the steepest descent method on the
penalized INLP problem.
Step 3 Since the penalized value at this point is very different from that
in the previous iteration, we do not terminate the algorithm. Instead, we
update the parameters and move to Step 2 again. The updated parameters
are r3 = 100.0, q3 = 22.5, and β3 = 1.125.
Step 2 The new penalty function is formed with the above parameter values
and solved from the starting point (2.975, 1.999)T . The solution obtained at
this stage is (3.000, 1.999)T which can be approximated to be (3, 2)T with the
given tolerance ϵI . Incidentally, this point is the true minimum of the original
INLP problem.
Step 4 Since the penalized values for two consecutive iterations are not
close, the algorithm may proceed for one or two more iterations and finally
terminate.
Figure 5.4 shows the distorted penalty function and the convergence of the
solution to the true minimum. The above three iterations require a total of 374
function evaluations. This method, like all penalty function methods, suffers
from the distortion of the original objective function—a matter which can
be minimized by properly updating parameter values in successive iterations
or by using the method of multiplier approach described in Chapter 4. The
increase in values of the parameter q introduces multimodality in the function.
But since in later sequences of the simulation the initial point is close to the
true optimum, the penalty function method works successfully.
In discrete-variable problems where the design variables take discrete
values in regular or irregular intervals, the penalty function Qt (xi ) given in
Equation (5.3) can be modified as follows:
    Qt(xi) = [ 4 ((xi − li)/(hi − li)) (1 − (xi − li)/(hi − li)) ]^βt,   (5.4)
Figure 5.4 Intermediate points on a contour plot of the penalized function. Notice
the multimodality of the function obtained due to the addition of the
penalty function Q3 .
where li is the largest permissible discrete value smaller than xi and hi is the
smallest permissible discrete value larger than xi . For example, if a discrete
variable takes values from the set [0.2, 0.4, 0.7, 1.0, 1.2, . . .], and the current
solution is xi = 0.82, the above penalty function Qt (xi ) is used with the
following parameter values: li = 0.7 and hi = 1.0. It is interesting to note
that for integer variables, the above equation reduces to Equation (5.3).
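A small Python sketch (illustrative only; it assumes the current value lies
strictly inside the range of permissible values) of this discrete-variable
penalty term is given below:

import bisect

def Q_discrete(x, values, beta):
    # values is a sorted list of permissible discrete values;
    # l_i and h_i are the neighbours bracketing the current value x
    k = bisect.bisect_right(values, x)      # values[k-1] <= x < values[k]
    l, h = values[k - 1], values[k]
    u = (x - l) / (h - l)
    return (4.0 * u * (1.0 - u)) ** beta

vals = [0.2, 0.4, 0.7, 1.0, 1.2]
print(Q_discrete(0.82, vals, 2.0))   # uses l = 0.7 and h = 1.0, as in the text
print(Q_discrete(0.7, vals, 2.0))    # 0.0: no penalty at a permissible value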
Step 3 Fathom all NLPs one at a time, if one or more of the following
conditions are satisfied:
(i) The optimal solution of the NLP for all i ∈ I is integer valued.
(ii) The NLP problem is infeasible.
(iii) The optimal function value of the NLP is not better than the current
best function value.
Step 4 If all nodes have been fathomed, Terminate;
Else go to Step 2.
The selection of the integer variable to be branched is usually guided by
one of the following rules (Reklaitis et al., 1983):
(i) The integer variable corresponding to the better function value is used.
(ii) The most important integer variable is used first (guided by the
experience of the user).
(iii) An arbitrary integer design variable is used.
When a number of NLP problems in any level are required to be solved,
there is usually no preference for solving one NLP over the other. This is
because all created NLPs in any level must be solved before proceeding to the
next level.
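The branching step itself can be sketched in a few lines of Python
(illustrative only; the variable bounds used in the demonstration are
assumptions, not from the book):

import math

def branch(bounds, solution, i, eps=1e-3):
    """Split variable i at its relaxed value; bounds is a list of (lo, hi)."""
    xi = solution[i]
    if abs(xi - round(xi)) <= eps:
        return []                            # already (nearly) integer: no branch
    lo, hi = bounds[i]
    left, right = list(bounds), list(bounds)
    left[i]  = (lo, math.floor(xi))          # e.g. x1 <= 2.0 (NLP-II)
    right[i] = (math.ceil(xi), hi)           # e.g. x1 >= 3.0 (NLP-III)
    return [left, right]                     # their union excludes no integer point

print(branch([(0.0, 20.0), (0.0, 20.0)], (2.628, 2.091), i=0))
# [[(0.0, 2), (0.0, 20.0)], [(3, 20.0), (0.0, 20.0)]]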
We present hand simulation of this method on a numerical problem.
EXERCISE 5.1.2
We consider the INLP problem used in the previous exercise problem and
apply the branch-and-bound algorithm to solve the problem:
Minimize f(x) = (x1^2 + x2 − 9)^2 + (x1 + x2^2 − 7)^2
subject to
g1(x) = 26 − (x1 − 5)^2 − x2^2 ≥ 0,
g2(x) = 20 − 4x1 − x2 ≥ 0,
x1, x2 ≥ 0, x1, x2 integers.
Recall that the optimal integer solution lies at the point (3, 2)T with a function
value equal to 4.0.
Step 1 At first, we assume that both variables can take any real values. We
solve the resulting NLP problem using the penalty function method described
in the previous chapter, starting from the initial point x(0) = (0, 0)T. We
choose R(0) = 1.0 and obtain the solution x(1) = (2.628, 2.091)T. The second
sequence is performed with R(1) = 10.0 and with x(1) as the initial point.
We obtain the point x(2) = (2.628, 2.091)T having an objective function value
equal to f(x(2)) = 4.152 × 10^−7. Since the solutions x(1) and x(2) are the
same up to three decimal places of accuracy, we terminate the search process.
Intermediate points are tabulated in Table 5.2 and the points are shown on a
contour plot in Figure 5.5. This solution requires 306 function evaluations.
Figure 5.5 Intermediate points for NLP-I on a contour plot of the objective
function f (x) for real values of x1 and x2 . Based on the obtained
optimum of NLP-I, the search space is divided into two nonoverlapping
regions (NLP-II and NLP-III). Note that no feasible point in NLP-I is
eliminated in forming the two NLPs. The choice of division along x1
is arbitrary.
Step 2 At this point, both solutions are real-valued. Thus, we branch along
any one of the variables. Let us assume that we choose to branch on the x1
variable. Thus, we form two NLP problems as shown in Table 5.3.

Table 5.3 NLP-II and NLP-III Problems Formed from the Solution of
NLP-I. (One extra constraint is added to the original INLP
problem to form two NLP problems. Note that no feasible
solution is eliminated.)

NLP-II:  Minimize f(x), subject to g1(x) ≥ 0, g2(x) ≥ 0, x1 ≤ 2.0, x1, x2 ≥ 0.
NLP-III: Minimize f(x), subject to g1(x) ≥ 0, g2(x) ≥ 0, x1 ≥ 3.0, x1, x2 ≥ 0.

Let us solve the NLP-II problem first. We use the penalty function method
with the steepest descent method as the unconstrained optimizer and the
golden section search for unidirectional searches. The sequences of the
penalty function method find the minimum point (2.001, 2.324)T. Since
the solution to the first variable is within the tolerance level (which is assumed
to be ϵI = 0.001), we set the solution to be (2, 2.324)T . At this point, the
function value is 7.322. The penalty function method requires 850 function
evaluations. The intermediate points of the penalty function method are
shown in Figure 5.6.
Figure 5.6 Intermediate points for NLP-II and NLP-III on a contour plot of the
objective function f (x) for real values of x1 and x2 . NLP-II is branched
into NLP-IV and NLP-V based on the obtained solution of NLP-II.
After the solution of NLP-II, we solve the NLP-III problem from the same
starting point and penalty parameters. This time, we obtain the solution
(2.999, 1.925)T (refer to Table 5.5). This solution can be accepted to be
the solution (3, 1.925)T with the specified tolerance level. At this point, the
function value is 3.792. The number of function evaluations required to solve
this problem is 952. Figure 5.6 shows how the penalty function method finds
this minimum point.
Step 3 Since none of the problems NLP-II and NLP-III satisfy any of the
three conditions for fathoming a node, we proceed to Step 4.
Step 4 Since NLP-II and NLP-III are not yet fathomed, we move to Step 2.
This completes one iteration of the branch-and-bound method.
Step 2 From each node (NLP-II and NLP-III), we now have to branch into
two more nodes. This makes a total of four NLP problems. Recalling that the
solution obtained from NLP-II is (2, 2.324)T , we branch only on the second
variable. Two problems arising from NLP-II are tabulated in Table 5.6. Using
the penalty function method and starting from the point (0, 0)T , we solve
NLP-IV and obtain the solution (2.001, 2.000)T . We approximate this solution
to be (2, 2)T with a function value equal to 10.0. This procedure requires 510
function evaluations.
Solving NLP-V, we obtain the solution (1.999, 3.000)T . This solution can
be approximated as (2, 3)T with a function value equal to 20.0. The number
of function evaluations required to solve NLP-V is 612.
In NLP-III, the solution obtained is (3, 1.925)T . The progress of this
simulation is not shown in Figure 5.6, for the sake of brevity. We branch
on the second variable only. Two NLP problems arising from NLP-III are
shown in Table 5.7.
Solutions to these two problems are also found using the penalty
function method from an initial point (0, 0)T . The solution for NLP-VI is
(3.000, 1.000)T with a function value equal to 10.0 and the solution for NLP-
VII is found to be (3.000, 2.001)T . This solution is approximated to be (3, 2)T
with a function value equal to 4.0. The function evaluations required to solve
these NLPs are 544 and 754, respectively.
Table 5.6 NLP-IV and NLP-V Problems Formed from the Solution
of NLP-II. (One extra constraint is added to NLP-II to
form two NLP problems.)

NLP-IV: Minimize f(x), subject to g1(x) ≥ 0, g2(x) ≥ 0, x1 ≤ 2.0, x2 ≤ 2.0, x1, x2 ≥ 0.
NLP-V:  Minimize f(x), subject to g1(x) ≥ 0, g2(x) ≥ 0, x1 ≤ 2.0, x2 ≥ 3.0, x1, x2 ≥ 0.

Table 5.7 NLP-VI and NLP-VII Problems Formed from the Solution
of NLP-III. (One extra constraint is added to NLP-III to
form two NLP problems.)

NLP-VI:  Minimize f(x), subject to g1(x) ≥ 0, g2(x) ≥ 0, x1 ≥ 3.0, x2 ≤ 1.0, x1, x2 ≥ 0.
NLP-VII: Minimize f(x), subject to g1(x) ≥ 0, g2(x) ≥ 0, x1 ≥ 3.0, x2 ≥ 2.0, x1, x2 ≥ 0.
Step 4 Since all open nodes are now fathomed, we terminate the algorithm.
Therefore, the solution to the original INLP problem is (3, 2)T with optimal
function value equal to 4.0.
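The whole branch-and-bound procedure of this exercise can be expressed compactly. The Python sketch below is only illustrative and is not the book's procedure: it substitutes scipy's SLSQP solver for the penalty function method when solving each NLP relaxation, branches depth-first on the first fractional variable, and assumes variable bounds of (0, 10).

    import math
    from scipy.optimize import minimize

    def f(x):   # objective function of Exercise 5.1.2
        return (x[0]**2 + x[1] - 9)**2 + (x[0] + x[1]**2 - 7)**2

    cons = [{'type': 'ineq', 'fun': lambda x: 26 - (x[0] - 5)**2 - x[1]**2},
            {'type': 'ineq', 'fun': lambda x: 20 - 4*x[0] - x[1]}]

    def branch_and_bound(bounds, best=(None, float('inf')), eps=1e-3):
        # Solve the NLP relaxation over the current box `bounds`.
        res = minimize(f, [b[0] for b in bounds], method='SLSQP',
                       bounds=bounds, constraints=cons)
        if not res.success or res.fun >= best[1]:   # infeasible or fathomed by bound
            return best
        frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > eps]
        if not frac:                                # integer-valued: fathom the node
            return (tuple(round(v) for v in res.x), res.fun)
        i = frac[0]                                 # branch on a fractional variable
        lo, hi = list(bounds), list(bounds)
        lo[i] = (bounds[i][0], math.floor(res.x[i]))   # add x_i <= floor(x_i*)
        hi[i] = (math.ceil(res.x[i]), bounds[i][1])    # add x_i >= ceil(x_i*)
        best = branch_and_bound(lo, best, eps)
        return branch_and_bound(hi, best, eps)

    print(branch_and_bound([(0, 10), (0, 10)]))     # expected: ((3, 2), 4.0)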
Figure 5.7 The optimum points for NLP-IV, NLP-V, NLP-VI, and NLP-VII on
a contour plot of the objective function f (x) for real values of x1 and
x2 . No feasible solution in NLP-IV is eliminated.
A posynomial function of N positive variables is written as

f(x) = ∑_{t=1}^{T} ct ∏_{i=1}^{N} xi^{ait},

where ct > 0 and ait is any real number (positive, negative, or zero). The
terms are called posynomials because all variables xi can take only positive
values. We shall discuss functions having negative terms later.
In the above expression, T posynomial terms are added together. Each of
the posynomial terms may contain at most N design variables. Fortunately,
many engineering design problems can be expressed in the above form and in
those problems GP methodology works more efficiently than other methods
(Beightler and Phillips, 1976).
In the GP method, the original optimization problem (supposedly in
posynomial form) is known as the primal problem. If the original problem
is not in posynomial form, variable substitutions may be used to convert the
original problem into the primal form. Thereafter, the primal problem is
converted to an equivalent dual problem expressed in terms of a set of dual
variables. The transformation is achieved using the arithmetic-geometric-
mean inequality which states that the arithmetic mean is greater than or equal
to the geometric mean of a set of positive numbers. Using this inequality, it
can be shown that for a set of variables δt > 0 (where t = 1, 2, . . . , T ), the
following inequality is true (Rao, 1984):
f(x) ≥ ∏_{t=1}^{T} (ct/δt)^{δt},  (5.5)

provided that ∑_{t=1}^{T} δt = 1 and ∑_{t=1}^{T} ait δt = 0 for all i = 1, 2, . . . , N.
The equality sign in inequality (5.5) holds good when all δt are equal (with
a value 1/T ). Thus, an N -dimensional, unconstrained primal problem can
be substituted by an equivalent T -dimensional, linearly constrained dual
problem. Consider the primal problem given below:
Minimize f(x) = ∑_{t=1}^{T} ct ∏_{i=1}^{N} xi^{ait}

subject to

xi > 0 for i = 1, 2, . . . , N.
The posynomial objective function can be expressed in the dual form by using
the arithmetic-geometric-mean inequality principle. It can be shown that this
problem is equivalent to solving the following dual problem (Reklaitis et al.,
1983):
Maximize Z(δ) = ∏_{t=1}^{T} (ct/δt)^{δt}

subject to

∑_{t=1}^{T} δt = 1,

∑_{t=1}^{T} ait δt = 0,  i = 1, 2, . . . , N,

δt ≥ 0.
Once the optimal dual variables (δ ∗ ) are found by solving the above dual
NLP problem, the corresponding primal solutions (x∗ ) can be obtained by
solving the following linear simultaneous equations:

∑_{i=1}^{N} ait ln(xi*) = ln[δt* Z(δ*)/ct],  t = 1, 2, . . . , T.  (5.6)
Primal problem

Minimize f0(x) = ∑_{t=1}^{T0} c0t ∏_{i=1}^{N} xi^{a0it}

subject to

fj(x) = ∑_{t=1}^{Tj} cjt ∏_{i=1}^{N} xi^{ajit} ≤ 1,  j = 1, 2, . . . , J;

xi > 0,  i = 1, 2, . . . , N.
Note that the constraints are also in posynomial form. The inequality
constraints are required to be less-than-or-equal type and the right side of
each inequality constraint must be equal to one. If the NLP problem has a
greater-than-or-equal type constraint, it must be converted to a less-than-or-
equal type constraint. Fortunately, in engineering problems, the inequality
constraints occur primarily due to some resource limitation. Thus, the
inequality constraints are mainly less-than-or-equal type and are suitable for
GP methodology. Equality constraints are usually excluded from consideration.
They can be included by relaxing the equality requirement and by using two
inequality constraints, as discussed in Chapter 1.
The greatest difficulty in using GP methodology for constrained
optimization problems lies in the formulation of constraints in the above form.
However, if the objective function and constraints can be written in the above
form, the optimization procedure is efficient. The number of dual variables in
the above problem is the total number of terms in the objective function and
constraints, or

T = ∑_{j=0}^{J} Tj.
In order to derive the dual problem, Lagrange multipliers (uj ) for all
inequality constraints are used. As we have discussed before, the Lagrange
multiplier for an inequality constraint is the sum of the dual variables
corresponding to each posynomial term in the left-side expression of the
constraint. Thus, for the j-th inequality constraint having Tj terms, the
Lagrange multiplier is
uj = ∑_{t=1}^{Tj} δjt.
We may write the dual variables for the objective function and constraints as
follows:

δ0t = c0t ∏_{i=1}^{N} xi^{a0it} / f0(x),

δjt = uj cjt ∏_{i=1}^{N} xi^{ajit}.
Using the dual variables and Lagrange multipliers, we now present the
corresponding dual problem. The analysis for conversion of the primal
problem to the dual problem is beyond the scope of this book. Interested
readers may refer to a more advanced book (Rao, 1984).
Dual problem
We write the dual problem in terms of the dual variables δ. The original
minimization problem becomes a maximization problem in terms of the dual
variables. The resulting dual problem contains a number of linear constraints.
One advantage of working with the dual problem is that the constraints are
linear in terms of the dual variables. Thus, the Frank-Wolfe method discussed
in Chapter 4 becomes suitable to solve the dual problem.
Maximize Z(δ) = ∏_{j=0}^{J} ∏_{t=1}^{Tj} (cjt uj/δjt)^{δjt}

subject to

∑_{t=1}^{T0} δ0t = 1,

∑_{j=0}^{J} ∑_{t=1}^{Tj} ajit δjt = 0,  i = 1, 2, . . . , N;

δjt ≥ 0,  j = 0, 1, 2, . . . , J, t = 1, 2, . . . , Tj;

where

uj = ∑_{t=1}^{Tj} δjt.
Both the above equations are valid for t = 1, 2, . . . , Tm and δmt* > 0. The
degree of difficulty in this problem is equal to

d = ∑_{j=0}^{J} Tj − N − 1.
EXERCISE 5.2.1
We consider the following two-variable NLP problem where both variables are
nonnegative:
Minimize 10/(x1 x2 )
subject to
x1^2 + 4x2^2 ≤ 32,  x1, x2 > 0.
The contour of the objective function, the feasible space, and the minimum
point are shown in Figure 5.9. The minimum point is (4, 2)T with a function
value equal to 1.25.
Figure 5.9 The feasible search space and the optimum point.
Step 1 In order to form the dual problem, we first find the degree of
difficulty. We observe that there is only one constraint (J = 1). We also
observe that N = 2, T0 = 1, and T1 = 2. Thus, the degree of difficulty
is d = (1 + 2) − 2 − 1 = 0, which suggests that an optimization method is
not required to solve the dual problem; the solution of simultaneous linear
equations is sufficient to find the optimal solution.
We first write the primal problem in posynomial form and then present
the corresponding dual problem.
Primal problem
Minimize f0(x) = 10 x1^(−1) x2^(−1)

subject to

f1(x) = (1/32) x1^2 + (1/8) x2^2 ≤ 1,

x1, x2 > 0.
Dual problem
Maximize Z(δ) = 10^δ01 [(δ11 + δ12)/(32δ11)]^δ11 [(δ11 + δ12)/(8δ12)]^δ12

subject to

δ01 = 1,
(−1)δ01 + (2)δ11 = 0,
(−1)δ01 + (2)δ12 = 0,
δ01, δ11, δ12 ≥ 0.
Step 2 From the equality constraints, we obtain the optimal values: δ01* = 1
and δ11* = δ12* = 0.5. In any other problem, the Gauss-elimination method may
be used to find the optimal dual variables.
Step 3 The optimal dual function value is Z(δ*) = 1.25, which is also equal
to the optimal primal function value f0(x*). In order to find the optimal
primal variables, we write Equations (5.7) and (5.8) at the optimum point:

(−1) ln x1 + (−1) ln x2 = ln[(1)(1.25)/10],
(2) ln x1 = ln[0.5/((1)(1/32))],
(2) ln x2 = ln[0.5/((1)(1/8))].
From the last two equations, we obtain x1* = 4 and x2* = 2. It is interesting to
note that these values satisfy the first equation. We emphasize here that this
did not happen accidentally. The formulation of the primal to dual problem is
such that optimal primal variables can be calculated by using any N equations.
The rest of the equations will be automatically satisfied. Thus, the minimum
solution to the given problem is x∗ = (4, 2)T with a function value equal to
1.25.
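Since the degree of difficulty is zero here, the dual reduces to a linear system. A small numerical sketch (our own illustration, using numpy) solves that system and recovers the primal variables from the dual-to-primal relations written in Step 3:

    import numpy as np

    # Dual equality constraints of Exercise 5.2.1:
    #   delta01 = 1;  -delta01 + 2*delta11 = 0;  -delta01 + 2*delta12 = 0.
    A = np.array([[ 1.0, 0.0, 0.0],
                  [-1.0, 2.0, 0.0],
                  [-1.0, 0.0, 2.0]])
    b = np.array([1.0, 0.0, 0.0])
    d01, d11, d12 = np.linalg.solve(A, b)       # -> 1.0, 0.5, 0.5

    u1 = d11 + d12                              # Lagrange multiplier of the constraint
    Z = 10.0**d01 * (u1/(32*d11))**d11 * (u1/(8*d12))**d12
    print(Z)                                    # optimal dual (and primal) value: 1.25

    # Primal recovery from the constraint terms: delta_1t = u1 c_1t prod(x_i^a).
    x1 = np.sqrt(d11 / (u1 * (1/32)))           # 2 ln x1 = ln[d11/(u1 c11)]
    x2 = np.sqrt(d12 / (u1 * (1/8)))            # 2 ln x2 = ln[d12/(u1 c12)]
    print(x1, x2)                               # -> 4.0, 2.0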
The above formulation can solve problems where the objective function
and constraints can be expressed by the summation of a number of
posynomials. This restricts the application of the above method to a narrow
class of problems. In order to apply the GP method to problems formed by
addition and/or subtraction of posynomials, an extension of the original GP
algorithm is suggested.
Let us consider the following primal problem, which is written as an algebraic
sum of several posynomials:

Minimize f0(x) = ∑_{t=1}^{T0} σ0t c0t ∏_{i=1}^{N} xi^{a0it}

subject to

fj(x) = ∑_{t=1}^{Tj} σjt cjt ∏_{i=1}^{N} xi^{ajit} ≤ σj,  j = 1, 2, . . . , J;

xi > 0,  i = 1, 2, . . . , N.
Here, the parameters σjt and σj can each take a value of either 1 or −1. The
corresponding dual problem can be written as follows:

Maximize Z(δ, σ0) = σ0 ∏_{j=0}^{J} ∏_{t=1}^{Tj} (cjt uj/δjt)^{σ0 σjt δjt}

subject to

u0 = σ0 ∑_{t=1}^{T0} σ0t δ0t = 1,

∑_{j=0}^{J} ∑_{t=1}^{Tj} ajit δjt σjt = 0,  i = 1, 2, . . . , N;

uj = σj ∑_{t=1}^{Tj} σjt δjt ≥ 0,  j = 1, 2, . . . , J;

δjt ≥ 0,  j = 0, 1, 2, . . . , J, t = 1, 2, . . . , Tj.
EXERCISE 5.2.2
We would like to solve the following NLP problem:
Minimize f (x) = 10 − x2
subject to
g(x) = 26 − (x1 − 5)^2 − x2^2 ≥ 0,  x1, x2 > 0.

The constraint g(x) makes all points inside a circle of radius √26 and centre at
(5, 0)T feasible. Since the objective is to minimize the function f (x) = 10 − x2 ,
the optimal solution is the point on the circle having maximum x2 value. This
solution is (5.000, 5.099)T . The optimal function value at this point is 4.901.
We shall use the above primal-dual formulations to find this solution.
Step 1 We first write the given problem in the primal form. We recognize
that the constant term in the objective function does not affect the minimum
solution and therefore can be omitted from the following discussion. Once the
optimal function value is found, the constant can be added to obtain the true
optimum value. By performing simple algebraic manipulation, we rewrite the
above problem without the constant term in the objective function:
Minimize f0(x) = −1 (x1)^0 (x2)^1

subject to

g(x) = (x1)^2 (x2)^0 + (x1)^0 (x2)^2 − 10 (x1)^1 (x2)^0 ≤ 1,  x1, x2 > 0.
The above problem has two variables (N = 2). The objective function has
only one posynomial term (T0 = 1). In the above problem, there is only one
constraint (J = 1) and there are three posynomial terms in the constraint
(T1 = 3). Thus, the degree of difficulty of the problem is (1 + 3 − 2 − 1) or 1.
In the above problem, we observe the following parameter values:

Objective function: σ01 = −1, c01 = 1, a011 = 0, a021 = 1.
Constraint: σ1 = 1; σ11 = 1, c11 = 1, a111 = 2, a121 = 0; σ12 = 1, c12 = 1,
a112 = 0, a122 = 2; σ13 = −1, c13 = 10, a113 = 1, a123 = 0.
The corresponding dual problem is as follows:

Maximize Z(δ, σ0) = σ0 [(1/δ01)^(−δ01) (u1/δ11)^(δ11) (u1/δ12)^(δ12) (10u1/δ13)^(−δ13)]^σ0

subject to

−σ0 δ01 = 1,
2δ11 − δ13 = 0,
−δ01 + 2δ12 = 0,
u1 = δ11 + δ12 − δ13 ≥ 0,  δjt ≥ 0 for all j and t.
In general, the optimal solution to the above NLP problem can be obtained
using one of the algorithms discussed in Chapter 4. Since the number of
variables is small and the equality constraints are linear, we use the variable
elimination method to find the optimal solution. From the first equality
constraint, we notice that σ0 must be negative, because δ01 has to be positive.
Thus, we set σ0 = −1 in the remaining calculations. Since there are four
variables and three equality constraints, the objective function may be written
in terms of only one variable. From the first and third equality constraints,
we observe that δ01 = 1 and δ12 = 0.5. By writing δ13 in terms of δ11, we
obtain the one-dimensional constrained problem:

Maximize Z(δ11) = −(1/√2) (0.04δ11)^(−δ11) (0.5 − δ11)^(−(0.5−δ11))

subject to

δ11 ≤ 0.5,
δ11 ≥ 0.

Using the golden section search method in the domain (0.0, 0.5), we obtain
the solution δ11* = 0.4808. The other optimal dual variables are δ12* = 0.5,
δ13* = 0.9615, u1* = 0.0192, and σ0 = −1.
Equations (5.7) and (5.8), written at the optimal dual point, yield the solution
x1* = 5.000 and x2* = 5.099. The true optimum function value is (f0* + 10) or 4.901.
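The one-dimensional dual search above is easy to reproduce numerically. One caution about the sign: with σ0 = −1, the interior stationary point δ11 ≈ 0.4808 is numerically a minimum of Z(δ11) on (0, 0.5) (the boundary value at δ11 = 0.5 is larger), so the golden section sketch below locates the stationary point by minimizing Z. The bracketing tolerances are our own choices.

    import math

    def Z(d):   # dual function of Exercise 5.2.2 written in delta11 alone
        return -(1/math.sqrt(2)) * (0.04*d)**(-d) * (0.5 - d)**(-(0.5 - d))

    a, b = 1e-6, 0.5 - 1e-6                 # search domain (0, 0.5)
    g = (math.sqrt(5) - 1) / 2              # golden ratio conjugate, about 0.618
    x1, x2 = b - g*(b - a), a + g*(b - a)
    while b - a > 1e-6:
        if Z(x1) > Z(x2):                   # minimum lies in (x1, b)
            a, x1 = x1, x2
            x2 = a + g*(b - a)
        else:                               # minimum lies in (a, x2)
            b, x2 = x2, x1
            x1 = b - g*(b - a)
    d11 = 0.5*(a + b)
    print(d11, Z(d11) + 10)                 # about 0.4808 and 4.901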
5.3 Summary
Two special purpose optimization algorithms are described in this chapter.
In addition to constraints associated with an optimization problem, many
problems have the additional restriction that some variables take only discrete or integer
variables. Most optimization algorithms discussed in the previous chapters
may fail to correctly solve these problems because the underlying notion of
search direction and gradient information at a particular point are local in
nature. In discrete or integer programming problems, the points in the vicinity
of a feasible point are not feasible. In this chapter, two different methods—a
penalty function method and a branch-and-bound method—are described.
In the penalty function method, all infeasible solutions corresponding to
noninteger solutions of integer variables are penalized. The resulting penalty
function is solved recursively to find the optimum. In the branch-and-bound
method, the optimum solution is found by recursively partitioning (branching)
the search space into several regions and by subsequently fathoming (bounding)
the nodes.
Since many engineering design problems (objective function and
constraints) can be written as summation of several posynomial terms,
the geometric programming (GP) method is found suitable and efficient in
solving those problems (Beightler and Phillips, 1976; Duffin et al., 1967). In
the geometric programming method, the original (primal) problem is first
transformed into an equivalent dual problem and solved. Thereafter, the
optimal dual solution is transformed back to an equivalent primal solution.
The GP method is efficient if the NLP problem can be written in the required
primal form. Fortunately, many engineering problems can be written in the
required primal form and, thus, GP methodology is popular in engineering
design optimization.
REFERENCES
PROBLEMS
5-1 For the INLP problem

Minimize (x2^2 + 2x1 + 2)^4 + (x5 − x1)^2 + (x3 − 3)^2 + x4^4 + (x2 + x1 − 2)^2

subject to

2x1 + x2 − 3x3^2 + 10 ≥ 0,
x1^2 − 2x5 ≥ 0,
x1, x2, x3 integers,

we waive the integer restriction and solve the resulting NLP problem using the
penalty function method described in Chapter 4. The obtained minimum
point is x = (1.9, 0.1, 3.2, 0, 2.0)T. Find the true optimum solution by choosing
suitable values for the integer variables.
5-2 Perform two iterations of the penalty function method to minimize the
following INLP problem:
subject to
x ≥ y,
x, y integers.
subject to
x1^2 + 2x2^2 ≤ 100,
x1 , x2 integers.
5-4 Use the branch-and-bound method to find the optimal solution of the
following NLP problem:
Maximize f (x, y) = x + 2y
subject to
x^2 + 7y ≤ 49,
(ii) Make a separate plot of the feasible region of each subproblem and
show the corresponding optimal solution.
(iii) Show a flowchart of all NLPs and their solutions with function values.
5-5 Solve the following mixed integer program using the branch-and-bound
algorithm.
Maximize 3x1 + 2x2
subject to
2x1 + x2 ≤ 9,
−x1 + 2x2 ≤ 4,
x1 − x2 ≤ 3,
x1 , x2 ≥ 0,
x1 , x2 integers.
Show the progress of the algorithm by plotting feasible regions.
5-6 Formulate a suitable primal problem for the following NLP problem:

Minimize (x1 + 2x2)^3.5/x3 + x2^2 x3

subject to

2x1 + 3x2x3^2 + 4x2 ≤ 1,
x2 ≥ x3^2 + 1,
x1, x2, x3 > 0.
Write the corresponding dual problem. What is the degree of difficulty of the
problem?
5-7 Solve the following integer linear program using the branch-and-bound
method:
Minimize x1 + 10x2 ,
subject to
66x1 + 14x2 ≥ 1428,
x1, x2 integers.
Solve the intermediate linear programs graphically by clearly showing the
feasible space and corresponding optimum solution in separate plots. Branch
based on the largest absolute value of a non-integer variable. Maintain
calculations up to three decimal places of accuracy. Also show a flowchart
of the branching and bounding procedure.
subject to
x1^2 + exp(−2x2) ≤ 9,
x2 < 0,
x1 , x3 > 0.
Minimize x1 x2 x3
subject to
x1x2/x3 + x1x3/x2 + x2x3/x1 ≤ 1,
x1 , x2 , x3 > 0.
subject to
x^2 − 6x + y ≤ 0,
x ≥ 3,
y ≥ 2.
6
Nontraditional Optimization Algorithms
Coding
In order to use GAs to solve the above problem, variables xi ’s are first
coded in some string structures. It is important to mention here that the
coding of the variables is not absolutely necessary. There exist some studies
where GAs are directly used on the variables themselves, but here we shall
ignore the exceptions and discuss the working principle of a simple genetic
algorithm. Binary-coded strings having 1’s and 0’s are mostly used. The
length of the string is usually determined according to the desired solution
accuracy. For example, if four bits are used to code each variable in a two-
variable function optimization problem, the strings (0000 0000) and (1111
1111) would represent the points
(x1^(L), x2^(L))T and (x1^(U), x2^(U))T,
respectively, because the substrings (0000) and (1111) have the minimum
and the maximum decoded values. Any other eight-bit string can be found
to represent a point in the search space according to a fixed mapping rule.
Usually, the following linear mapping rule is used:
xi = xi^(L) + [(xi^(U) − xi^(L))/(2^ℓi − 1)] × (decoded value of si).  (6.1)
With four bits used to code each variable, only 2^4 or 16 distinct substrings are
possible, because each bit-position can take a value either 0 or 1. The accuracy
that can be obtained with a four-bit coding is only approximately 1/16th of
the search space. But as the string length is increased by one, the obtainable
accuracy improves to 1/32nd of the search space. It is not
necessary to code all variables in equal substring length. The length of a
substring representing a variable depends on the desired accuracy in that
variable. Generalizing this concept, we may say that with an ℓi -bit coding
for a variable, the obtainable accuracy in that variable is approximately
(U ) (L)
(xi − xi )/2ℓi . Once the coding of the variables has been done, the
corresponding point x = (x1 , x2 , . . . , xN )T can be found using Equation (6.1).
Thereafter, the function value at the point x can also be calculated by
substituting x in the given objective function f (x).
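The decoding of Equation (6.1) is a one-line computation once the decoded value of the substring is known. A minimal Python sketch follows, with variable bounds chosen only for illustration:

    def decode(substring, x_min, x_max):
        # Map a binary substring to a real value using Equation (6.1).
        s = int(substring, 2)                        # decoded value of the substring
        return x_min + (x_max - x_min) / (2**len(substring) - 1) * s

    # A four-bit coding of a variable in [0, 6]:
    print(decode('0000', 0.0, 6.0))    # lower bound: 0.0
    print(decode('1111', 0.0, 6.0))    # upper bound: 6.0
    print(decode('1010', 0.0, 6.0))    # 10/15 of the range: 4.0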
Fitness function
As pointed out earlier, GAs mimic the survival-of-the-fittest principle of
nature to make a search process. Therefore, GAs are naturally suitable
for solving maximization problems. Minimization problems are usually
transformed into maximization problems by some suitable transformation.
In general, a fitness function F(x) is first derived from the objective function
and used in successive genetic operations. Certain genetic operators require
that the fitness function be nonnegative, although some operators do not
have this requirement. For maximization problems, the fitness function can
be considered to be the same as the objective function or F(x) = f (x). For
minimization problems, the fitness function is an equivalent maximization
problem chosen such that the optimum point remains unchanged. A number
of such transformations are possible. The following fitness function is often
used:
F(x) = 1/(1 + f (x)). (6.2)
This transformation does not alter the location of the minimum, but converts
a minimization problem to an equivalent maximization problem. The fitness
function value of a string is known as the string’s fitness.
The operation of GAs begins with a population of random strings
representing design or decision variables. Thereafter, each string is evaluated
to find the fitness value. The population is then operated by three main
operators—reproduction, crossover, and mutation—to create a new population
of points. The new population is further evaluated and tested for termination.
If the termination criterion is not met, the population is iteratively operated
by the above three operators and evaluated. This procedure is continued
until the termination criterion is met. One cycle of these operations and
the subsequent evaluation procedure is known as a generation in GA’s
terminology. The operators are described next.
GA operators
The reproduction operator selects good strings for a mating pool: the i-th
string is chosen with a probability proportional to its fitness,
pi = Fi/∑_{j=1}^{n} Fj, where n is the population size. One way to implement
this selection scheme is to imagine a roulette-wheel with its circumference
marked for each string in proportion to the string's fitness. The
roulette-wheel is spun n times, each
time selecting an instance of the string chosen by the roulette-wheel pointer.
Since the circumference of the wheel is marked according to a string’s fitness,
this roulette-wheel mechanism is expected to make Fi/F̄ copies of the i-th
string in the mating pool. The average fitness of the population is calculated
as
F̄ = ∑_{i=1}^{n} Fi/n.
Figure 6.1 shows a roulette-wheel for five individuals having different fitness
values. Since the third individual has a higher fitness value than any other, it
is expected that the roulette-wheel selection will choose the third individual
more than any other individual.

Figure 6.1 A roulette-wheel marked for five individuals according to their fitness
values. The third individual has a higher probability of selection than
any other.

This roulette-wheel selection scheme can be
simulated easily. Using the fitness value Fi of all strings, the probability of
selecting a string pi can be calculated. Thereafter, the cumulative probability
(Pi ) of each string being copied can be calculated by adding the individual
probabilities from the top of the list. Thus, the bottom-most string in
the population should have a cumulative probability (Pn ) equal to 1. The
roulette-wheel concept can be simulated by realizing that the i-th string in
the population represents the cumulative probability values from Pi−1 to Pi .
The first string represents the cumulative values from zero to P1 . Thus, the
cumulative probability of any string lies between 0 and 1. In order to choose
n strings, n random numbers between zero and one are created. The string
whose cumulative probability range (calculated from the fitness values)
contains a chosen random number is copied to the mating pool.
This way, the string with a higher fitness value will
represent a larger range in the cumulative probability values and therefore has
a higher probability of being copied into the mating pool. On the other hand,
a string with a smaller fitness value represents a smaller range in cumulative
probability values and has a smaller probability of being copied into the
mating pool. We illustrate the working of this roulette-wheel simulation later
through a computer simulation of GAs.
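The roulette-wheel simulation described above follows directly from the cumulative probabilities. A minimal sketch (the fitness values at the bottom are made up for the example):

    import random

    def roulette_wheel_selection(fitness, rng=random):
        # Form a mating pool of len(fitness) indices chosen in proportion to fitness.
        total = sum(fitness)
        cum, c = [], 0.0
        for F in fitness:                     # cumulative probabilities P_i
            c += F / total
            cum.append(c)
        pool = []
        for _ in range(len(fitness)):
            r = rng.random()                  # a random number between 0 and 1
            i = 0
            while i < len(cum) - 1 and cum[i] < r:
                i += 1                        # string i owns the range (P_{i-1}, P_i]
            pool.append(i)
        return pool

    fitness = [0.12, 0.05, 0.60, 0.20, 0.03]      # the third string is the fittest
    print(roulette_wheel_selection(fitness))      # e.g. [2, 2, 3, 0, 2]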
In reproduction, good strings in a population are probabilistically assigned
a larger number of copies and a mating pool is formed. It is important to note
that no new strings are formed in the reproduction phase. In the crossover
operator, new strings are created by exchanging information among strings of
the mating pool. Many crossover operators exist in the GA literature. In most
crossover operators, two strings are picked from the mating pool at random
and some portions of the strings are exchanged between the strings. A single-
point crossover operator is performed by randomly choosing a crossing site
along the string and by exchanging all bits on the right side of the crossing
site as shown:
0 0 0 0 0          0 0 1 1 1
             ⇒
1 1 1 1 1          1 1 0 0 0
The two strings participating in the crossover operation are known as parent
strings and the resulting strings are known as children strings. It is intuitive
from this construction that good substrings from parent strings can be
combined to form a better child string, if an appropriate site is chosen. Since
the knowledge of an appropriate site is usually not known beforehand, a
random site is often chosen. With a random site, the children strings produced
may or may not have a combination of good substrings from parent strings,
depending on whether or not the crossing site falls in the appropriate place.
But we do not worry about this too much, because if good strings are created
by crossover, there will be more copies of them in the next mating pool
generated by the reproduction operator. But if good strings are not created
by crossover, they will not survive too long, because reproduction will select
against those strings in subsequent generations.
It is clear from this discussion that the effect of crossover may be
detrimental or beneficial. Thus, in order to preserve some of the good strings
that are already present in the mating pool, not all strings in the mating
pool are used in crossover. When a crossover probability of pc is used, only
100pc per cent of the strings in the population are used in the crossover
operation and 100(1 − pc) per cent of the strings remain as they are in the
current population¹.
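A sketch of the single-point crossover with the coin-flip on pc described above (the function name and the default values are illustrative):

    import random

    def single_point_crossover(p1, p2, pc=0.8, rng=random):
        # Cross two parent strings at a random site with probability pc.
        if rng.random() >= pc:               # coin-flip fails: copy parents as they are
            return p1, p2
        site = rng.randint(1, len(p1) - 1)   # random crossing site
        return p1[:site] + p2[site:], p2[:site] + p1[site:]

    print(single_point_crossover('00000', '11111'))   # e.g. ('00111', '11000')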
A crossover operator is mainly responsible for the search of new strings,
even though a mutation operator is also used for this purpose sparingly.
The mutation operator changes 1 to 0 and vice versa with a small mutation
probability, pm . The bit-wise mutation is performed bit by bit by flipping a
coin² with a probability pm. If at any bit the outcome is true, then the bit
is altered; otherwise the bit is kept unchanged. The need for mutation is to
create a point in the neighbourhood of the current point, thereby achieving
a local search around the current solution. The mutation is also used to
maintain diversity in the population. For example, consider the following
population having four eight-bit strings:
0110 1011
0011 1101
0001 0110
0111 1100
Notice that all four strings have a 0 in the left-most bit position. If the true
optimum solution requires 1 in that position, then neither reproduction nor
crossover operator described above will be able to create 1 in that position.
The inclusion of mutation introduces some probability (Npm ) of turning 0
into 1.
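Bit-wise mutation is equally short; a sketch follows (the mutation probability used here is larger than practical values, only so that a flip is visible):

    import random

    def mutate(string, pm=0.05, rng=random):
        # Flip each bit independently with the mutation probability pm.
        return ''.join(('1' if b == '0' else '0') if rng.random() < pm else b
                       for b in string)

    print(mutate('01101011'))   # occasionally differs from the parent in a bit or two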
These three operators are simple and straightforward. The reproduction
operator selects good strings and the crossover operator recombines good
substrings from good strings together to hopefully create a better substring.
The mutation operator alters a string locally to hopefully create a better
string. Even though none of these claims are guaranteed and/or tested while
creating a string, it is expected that if bad strings are created they will
be eliminated by the reproduction operator in the next generation and if
good strings are created, they will be increasingly emphasized. Interested
readers may refer to Goldberg (1989) and other GA literature given in the
references for further insight and some mathematical foundations of genetic
algorithms.
Here, we outline some differences and similarities of GAs with traditional
optimization methods.
¹ Even though the best (1 − pc)100% of the current population can be copied
deterministically to the new population, this is usually performed at random.
² Flipping of a coin with a probability p is simulated as follows. A number between
0 and 1 is chosen at random. If the random number is smaller than p, the outcome
of coin-flipping is true; otherwise the outcome is false.
As seen from the above description of the working principles of GAs, they
are radically different from most of the traditional optimization methods
described in Chapters 2 to 4. The fundamental differences are described in
the following paragraphs.
GAs work with a string-coding of variables instead of the variables. The
advantage of working with a coding of variables is that the coding discretizes
the search space, even though the function may be continuous. On the other
hand, since GAs require only function values at various discrete points, a
discrete or discontinuous function can be handled with no extra cost. This
allows GAs to be applied to a wide variety of problems. Another advantage
is that the GA operators exploit the similarities in string-structures to make
an effective search. Let us discuss this important aspect of GAs in somewhat
more detail. A schema (pl. schemata) represents a number of strings with
similarities at certain string positions. For example, in a five-bit problem,
the schema (101∗∗) (a ∗ denotes either a 0 or a 1) represents four strings
(10100), (10101), (10110), and (10111). In the decoded parameter space,
a schema represents a continuous or discontinuous region in the search space.
Figure 6.2 shows that the above schema represents one-eighth of the search
space. Since in an ℓ-bit schema every position can take either 0, 1, or ∗, there
are a total of 3^ℓ schemata possible.
Figure 6.2 A schema with three fixed positions divides the search space into eight
regions. The schema (101∗∗) is highlighted.
Under reproduction, single-point crossover, and bit-wise mutation, the
expected growth of a schema H from one generation to the next can be
written in terms of a growth factor

ϕ = [F(H)/F̄] [1 − pc δ(H)/(ℓ − 1) − pm o(H)],

where F(H) is the fitness of the schema H calculated by averaging the fitness
of all strings representing the schema, δ(H) is the defining length of the
schema H calculated as the difference in the outermost defined positions,
and o(H) is the order of the schema H calculated as the number of fixed
positions in the schema. For example, the schema H = 101∗∗ has a defining
length equal to δ(H) = 3 − 1 = 2 and has an order o(H) = 3. The growth
factor ϕ defined in the above equation can be greater than, less than, or
equal to one; schemata with above-average fitness, short defining length, and
low order are expected to grow from generation to generation.

Another fundamental difference is that GAs do not require any auxiliary
information about the problem except the objective function values.
Although the direct search methods
used in traditional optimization methods do not explicitly require the gradient
information, some of those methods use search directions that are similar in
concept to the gradient of the function. Moreover, some direct search methods
work under the assumption that the function to be optimized is unimodal and
continuous. In GAs, no such assumption is necessary.
One other difference in the operation of GAs is the use of probabilities
in their operators. None of the genetic operators work deterministically. In
the reproduction operator, even though a string is expected to have F i /F
copies in the mating pool, a simulation of the roulette-wheel selection scheme
is used to assign the true number of copies. In the crossover operator, even
though good strings (obtained from the mating pool) are crossed, the strings
to be crossed are chosen at random and the cross-sites are chosen at random.
In the mutation operator, a random bit is suddenly altered. The action of
these operators may appear to be naive, but careful studies may provide some
interesting insights about this type of search. The basic problem with most
of the traditional methods is that they use fixed transition rules to move from
one point to another. For instance, in the steepest descent method, the search
direction is always calculated as the negative of the gradient at any point,
because in that direction the reduction in the function value is maximum.
In trying to solve a multimodal problem with many local optimum points
(interestingly, many real-world engineering optimization problems are likely
to be multimodal), search procedures may easily get trapped in one of the
local optimum points. Consider the bimodal function shown in Figure 6.3.
The objective function has one local minimum and one global minimum.

Figure 6.3 An objective function with one local optimum and one global optimum.
The point x(t) is in the local basin.

If the initial point is chosen to be a point in the local basin (point x(t) in the
figure), the steepest descent algorithm will eventually find the local optimum
point. Since the transition rules are rigid, there is no escape from these
local optima. The only way to solve the above problem to global optimality
is to have a starting point in the global basin. Since this information is
usually not known in any problem, the steepest-descent method (and for that
matter most traditional methods) fails to locate the global optimum. We show
simulation results demonstrating the inability of the steepest descent method
to find the global optimum later in this chapter.

Even though GAs are different from most traditional search algorithms, there
are some similarities. In traditional search methods, where a search direction
is used to find a new point, at least two points are either implicitly or
explicitly used to define the search direction. In the Hooke-Jeeves pattern
search method, a pattern move is created using two points. In gradient-based
methods, the search direction requires derivative information which is usually
calculated using function values at two neighbouring points. In the crossover
operator (which is mainly responsible for the GA search), two points are also
used to create two new points. Thus, the crossover operation is similar to a
directional search method except that the search direction is not fixed for all
points in the population and that no effort is made to find the optimal point
in any particular direction. Consider a two-variable optimization problem
shown in Figure 6.4, where two parent points p1 and p2 participate
in the crossover. Under the single-point crossover operator, one of the two
substrings is crossed. It can be shown that the two children points can only
lie along directions (c1 and c2 ) shown in the figure (either along solid arrows
or along dashed arrows). The exact locations of the children points along
these directions depend on the relative distance between the parents (Deb
and Agrawal, 1995). The points y1 and y2 are the two typical children points
obtained after crossing the parent points p1 and p2 . Thus, it may be envisaged
that point p1 has moved along the direction d1 up to the point y1 and
similarly the point p2 has moved to the point y2 .
Since the two points used in the crossover operator are chosen at random,
many such search directions are possible. Among them some directions may
lead to the global basin and some directions may not. The reproduction
operator has an indirect effect of filtering the good search directions and helps
guide the search. The purpose of the mutation operator is to create a point
in the vicinity of the current point. The search in the mutation operator is
similar to a local search method such as the exploratory search used in the
Hooke-Jeeves method. With the discussion of the differences and similarities
of GAs with traditional methods, we are now ready to present the algorithm
in a step-by-step format.
Algorithm
Step 1 Choose a coding to represent problem parameters, a selection
operator, a crossover operator, and a mutation operator. Choose population
size, n, crossover probability, pc, and mutation probability, pm. Initialize
a random population of n strings, each of length ℓ. Choose a maximum allowable
generation number tmax . Set t = 0.
Step 2 Evaluate each string in the population.
Step 3 If t > tmax or another termination criterion is satisfied, Terminate.
Step 4 Perform reproduction on the population.
Step 5 Perform crossover on random pairs of strings.
Step 6 Perform mutation on every string.
Step 7 Evaluate strings in the new population. Set t = t + 1 and go to
Step 3.
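Putting Steps 1 to 7 together, a compact and self-contained binary-coded GA can be sketched as follows. The operators mirror the descriptions above, but the parameter values, the use of Python's random.choices for the roulette wheel, and Himmelblau's function as the test problem are our own illustrative choices.

    import random

    def ga_minimize(f, bounds, bits=10, n=20, pc=0.8, pm=0.05, tmax=50):
        # A minimal binary-coded GA following Steps 1 to 7 (minimization).
        L = bits * len(bounds)

        def decode(s):                  # Equation (6.1), variable by variable
            return [lo + (hi - lo) * int(s[i*bits:(i+1)*bits], 2) / (2**bits - 1)
                    for i, (lo, hi) in enumerate(bounds)]

        def fitness(s):                 # Equation (6.2)
            return 1.0 / (1.0 + f(decode(s)))

        pop = [''.join(random.choice('01') for _ in range(L)) for _ in range(n)]
        for t in range(tmax):
            F = [fitness(s) for s in pop]
            pool = random.choices(pop, weights=F, k=n)   # roulette-wheel reproduction
            nxt = []
            for a, b in zip(pool[::2], pool[1::2]):      # single-point crossover
                if random.random() < pc:
                    k = random.randint(1, L - 1)
                    a, b = a[:k] + b[k:], b[:k] + a[k:]
                nxt += [a, b]
            pop = [''.join(c if random.random() >= pm else '10'[int(c)]  # mutation
                           for c in s) for s in nxt]
        best = min(pop, key=lambda s: f(decode(s)))
        return decode(best), f(decode(best))

    himmelblau = lambda x: (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
    print(ga_minimize(himmelblau, bounds=[(0, 6), (0, 6)]))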
EXERCISE 6.1.1
The objective is to minimize Himmelblau's function
f(x1, x2) = (x1^2 + x2 − 11)^2 + (x1 + x2^2 − 7)^2.
The next step is to compute the expected count of each string as F(x)/F̄. The
values are calculated and shown in column A of Table 6.1. In other words, we
can compute the probability of each string being copied in the mating pool
by dividing these numbers by the population size (column B). Once these
probabilities are calculated, the cumulative probability can also be computed.
These distributions are also shown in column C of Table 6.1. In order to form
the mating pool, we create random numbers between zero and one (given in
column D) and identify the particular string which is specified by each of these
random numbers. For example, if the random number 0.472 is created, the
tenth string gets a copy in the mating pool, because that string occupies the
interval (0.401, 0.549), as shown in column C. Column E refers to the selected
string. Similarly, other strings are selected according to the random numbers
shown in column D. After this selection procedure is repeated n times (n is the
population size), the number of selected copies for each string is counted. This
number is shown in column F. The complete mating pool is also shown in the
table. Columns A and F reveal that the theoretical expected count and the
true count of each string more or less agree with each other. Figure 6.5 shows
the initial random population and the mating pool after reproduction.

Figure 6.5 The initial population (marked with empty circles) and the mating
pool (marked with boxes) on a contour plot of the objective function.
The best point in the population has a function value 39.849 and the
average function value of the initial population is 360.540.

The points marked with an enclosed box are the points in the mating pool. The
action of the reproduction operator is clear from this plot. The inferior points
have been probabilistically eliminated from further consideration. Notice that
not all selected points are better than all rejected points. For example, the
14th individual (with a fitness value 0.002) is selected but the 16th individual
(with a function value 0.005) is not selected.
Although the above roulette-wheel selection is easier to implement, it is
noisy. A more stable version of this selection operator is sometimes used.
After the expected count for each individual string is calculated, the strings
are first assigned copies exactly equal to the integer part of the expected count.
Thereafter, the regular roulette-wheel selection is implemented using the
fractional part of the expected count as the probability of selection. This
selection method is less noisy and is known as the stochastic remainder
selection.
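A sketch of the stochastic remainder selection: deterministic copies from the integer part of the expected count, and a probabilistic extra copy from the fractional part. Note that the pool size produced this way equals n only in expectation.

    import random

    def stochastic_remainder_selection(expected, rng=random):
        # Mating-pool indices from expected counts F_i / F-bar.
        pool = []
        for i, e in enumerate(expected):
            pool.extend([i] * int(e))          # integer part: deterministic copies
        for i, e in enumerate(expected):
            if rng.random() < e - int(e):      # fractional part: one extra copy
                pool.append(i)
        rng.shuffle(pool)
        return pool

    print(stochastic_remainder_selection([1.6, 0.3, 2.1]))   # e.g. [2, 0, 0, 2, 2]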
Step 5 At this step, the strings in the mating pool are used in the crossover
operation. In a single-point crossover, two strings are selected at random and
crossed at a random site. Since the mating pool contains strings at random, we
pick pairs of strings from the top of the list. Thus, strings 3 and 10 participate
in the first crossover operation. When two strings are chosen for crossover,
first a coin is flipped with a probability pc = 0.8 to check whether a crossover
is desired or not. If the outcome of the coin-flipping is true, the crossing
over is performed, otherwise the strings are directly placed in an intermediate
population for subsequent genetic operation. It turns out that the outcome
of the first coin-flipping is true, meaning that a crossover is required to be
performed. The next step is to find a cross-site at random. We choose a site
by creating a random number between (0, ℓ − 1) or (0, 19). It turns out that
the obtained random number is 11. Thus, we cross the strings at the site 11
and create two new strings. After crossover, the children strings are placed
in the intermediate population. Then, strings 14 and 2 (selected at random)
are used in the crossover operation. This time the coin-flipping comes true
again and we perform the crossover at the site 8 found at random. The new
children strings are put into the intermediate population. Figure 6.6 shows
how points cross over and form new points.

Figure 6.6 The population after the crossover operation. Two points are crossed
over to form two new points. Of ten pairs of strings, seven pairs are
crossed.

The points marked with a small box are the points in the mating pool and the points marked with a small
circle are children points created after crossover operation. Notice that not all
10 pairs of points in the mating pool cross with each other. With the flipping
of a coin with a probability pc = 0.8, it turns out that fourth, seventh, and
tenth crossovers come out to be false. Thus, in these cases, the strings are
copied directly into the intermediate population. The complete population at
the end of the crossover operation is shown in Table 6.2. It is interesting to
note that with pc = 0.8, the expected number of crossovers in a population
of size 20 is 0.8 × 20/2 or 8. In this exercise problem, we performed seven
crossovers and in three cases we simply copied the strings to the intermediate
population. Figure 6.6 shows that some good points and some not-so-good
points are created after crossover. In some cases, points far away from the
parent points are created and in some cases points close to the parent points
are created.
Figure 6.7 The population after mutation operation. Some points do not get
mutated and remain unaltered. The best point in the population has a
function value 18.886 and the average function value of the population
is 140.210, an improvement of over 60 per cent.
As seen in Figure 6.7, in some cases the mutation operator changes a point
locally, while in others it brings about a large change. The points marked
with a small circle are the points in the intermediate population. The points
marked with a small box constitute the new population.
Figure 6.8 All 20 points in the population at generation 25 shown on the contour
plot of the objective function. The figure shows that most points are
clustered around the true minimum.
The fitness value at the best point is equal to 0.999 and the average fitness
of the population is 0.474. The figure shows how points are clustered around
the true minimum of the function in this generation. A few inferior points
are still found in the plot. They are the result of some unsuccessful crossover
events. We also observe that the total number of function evaluations required
to obtain this solution is 0.8 × 20 × 26 or 416 (including the evaluations of the
initial population).
In order to show the efficiency of GAs in arriving at a point close to the
true optimum, we perform two more simulations starting with different initial
populations. Figure 6.9 shows how the function value of the best point in a
population reduces with generation number. Although all three runs have a
different initial best point, they quickly converge to a solution close to the
true optimum (recall that the optimum point has a function value equal to
zero).
Figure 6.9 The function value of the best point in the population for three
independent GA runs. All runs quickly converge to a point near the
true optimum.
P(x) = f(x) + ∑_{j=1}^{J} uj ⟨gj(x)⟩^2 + ∑_{k=1}^{K} vk [hk(x)]^2,  (6.4)
where uj and vk are penalty coefficients, which are usually kept constant
throughout GA simulation. The fitness function is formed by the usual
transformation: F(x) = 1/(1 + P (x)). Since GAs are population based search
techniques, the final population converges to a region, rather than a point
as depicted in the simulation on Himmelblau’s function in Exercise 6.1.1.
Unlike the penalty function method described in Chapter 4, the update of
penalty parameters in successive sequences is not necessary with GAs. Recall
that in the traditional penalty function method (described in Chapter 4),
the increase of penalty parameter R in successive sequences distorted the
penalized function. On some occasions, this distortion may create some
artificial local optima. This makes it difficult for the traditional penalty
function method to solve a constrained optimization problem in a single sequence.
Since GAs can handle multimodal functions better than the traditional
methods, a large penalty parameter R can be used. Since the exact optimum
point can only be obtained with an infinite value of penalty parameter R,
in most cases the GA solution would be close to the true optimum. With
a solution close to the true optimum, an arbitrary large value of R can be
used with the steepest descent method to find the optimum point in only one
sequence of the penalty function method. In order to illustrate this procedure,
we reconsider the NLP problem described in Exercise 4.3.1.
With a population of 30 points, a crossover probability of 0.9 and a
mutation probability of 0.01, we perform a GA simulation for 30 generations
with a penalty parameter R = 100. This is very large compared to R = 0.1
used in the first sequence in Exercise 4.3.1. Figure 6.10 shows the initial
population (with empty boxes) and the population at generation 30 (with
empty circles) on the contour plot of the NLP problem. The figure shows
that the initial population is fairly spread out over the search space. After 30
generations, the complete population is in the feasible region and is placed
close to the true optimum point.

Figure 6.10 Initial population and the population after generation 30 shown on
a contour plot of the NLP problem described in Exercise 4.3.1. The
hashes used to mark the feasible region are different from those in
most other figures in this book.

The best point found at this population
is (0.809, 3.085)T and may now be used as an initial point for one sequence
of the penalty function method described in Chapter 4 (with a large R) to
obtain the true optimum point with the desired accuracy.
Although the problem contains only two variables and the function is fairly
well-behaved, it is interesting to compare the overall function evaluations in
GAs and in the penalty function method. Recall that the penalty function
method required 609 evaluations in five sequences (the sample run at the
end of Chapter 4). In genetic algorithms, one generation required (0.9 ×
30) or 27 function evaluations. At the end of 30 generations, the total
function evaluations required were 837 (including the evaluation of the initial
population). It is worth mentioning here that no effort is made to optimally
set the GA parameters to obtain the above solution. It is anticipated that with
a proper choice of GA parameters, a better result may have been obtained.
Nevertheless, for a comparable number of function evaluations, GAs have
found a population near the true optimum.
In 2000, the author suggested a penalty-parameter-less constraint-
handling approach which has become quite popular in the subsequent years,
mainly due to its simplicity and successful performance on most problems
(Deb, 2000). The concept is simple. In a tournament selection operator
comparing two population members, the following three possibilities and
corresponding selection strategies were suggested:
(i) When one solution is feasible and the other is infeasible, the feasible
solution is selected.
(ii) When both solutions are feasible, the one with better function value is
selected.
(iii) When both solutions are infeasible, the one with smaller constraint
violation (which we define below) is selected.
The first selection criterion makes sense and is also pragmatic. If a solution
cannot be implemented at all, what good is it to know its objective function
value? Although there may be some benefits in preserving certain infeasible
solutions that are close (in the variable space) to the optimal solution, the
above simple strategy seemed to have worked in many difficult problems. The
second selection criterion is also obvious and is followed in unconstrained
optimization problems. The third criterion makes sense, but a proper
definition of a constraint violation may be needed for the procedure to work
well. In Deb (2000), we suggested a two-step procedure. First, normalize all
constraints using the constant term in the constraint function. For example,
the constraint gj(x) − bj ≥ 0 is normalized as ḡj(x) = gj(x)/bj − 1 ≥ 0.
Equality constraints can also be normalized similarly to obtain normalized
constraint h̄k (x). In constraints where no constant term exists (gj (x) ≥ 0),
the constraint can be divided by gj (x̄), where x̄ = 0.5(x(L) + x(U ) ). Second,
the constraint violation (CV(x)) can be determined as follows:
CV(x) = ∑_{j=1}^{J} ⟨ḡj(x)⟩ + ∑_{k=1}^{K} |h̄k(x)|.  (6.5)
Note that the above function always takes nonnegative values. When the
constraint violation is zero, the solution is feasible. Thus, in the early
generations while solving complex problems, most or all population members
may be infeasible. Then, the above selection operation will face the third
scenario most often and will emphasize the solutions that have smaller
constraint violations. Thus, early on, a GA with the above approach will
attempt to progress towards the feasible region and when a few feasible
solutions emerge, they will be emphasized by the first scenario and later on
when most solutions are feasible the second scenario will be predominant.
Another way to use the above penalty-parameter-less approach is to
convert the problem into the following unconstrained fitness function:
F(x) = f(x),           if x is feasible,
       fmax + CV(x),   if x is infeasible,     (6.6)
where fmax is the maximum objective function value among the feasible
solutions in a population. If a population does not have any feasible solution,
fmax can be chosen as zero. Thus, when no feasible solution exists in the
population, the GA minimizes constraint violation and when some feasible
solutions exist, the GA emphasizes them more than the infeasible solutions.
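A minimal sketch of the constraint violation of Equation (6.5) and the three tournament rules follows; the objective function and the normalized constraint at the bottom are made up for illustration.

    def constraint_violation(x, gs, hs=()):
        # CV(x) of Equation (6.5), for normalized constraints g(x) >= 0 and h(x) = 0.
        cv = 0.0
        for g in gs:
            value = g(x)
            if value < 0:               # the bracket operator counts violations only
                cv += -value
        for h in hs:
            cv += abs(h(x))
        return cv

    def tournament(x1, x2, f, gs, hs=()):
        # Binary tournament with the penalty-parameter-less rules (Deb, 2000).
        cv1 = constraint_violation(x1, gs, hs)
        cv2 = constraint_violation(x2, gs, hs)
        if cv1 == 0 and cv2 == 0:               # both feasible: better objective wins
            return x1 if f(x1) <= f(x2) else x2
        if cv1 == 0 or cv2 == 0:                # exactly one feasible: it wins
            return x1 if cv1 == 0 else x2
        return x1 if cv1 <= cv2 else x2         # both infeasible: smaller CV wins

    f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2
    g = lambda x: x[0]/2.0 - 1.0                # x1 >= 2, normalized by its constant
    print(tournament([2.5, 2.0], [0.0, 0.0], f, gs=[g]))   # -> [2.5, 2.0]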
Constrained optimization has received a lot of attention in the recent past.
Various methods have been tried and the following are worth mentioning (Deb
and Datta, 2010; Takahama and Sakai, 2006; Brest, 2009).
In the single-point crossover operator described above, the right-most bits
have a higher probability of getting exchanged than the left-most bits in the
string. Thus, if ten variables are coded left to right, with the first variable
at the left-most position and the tenth variable at the right-most position,
the search is effectively more vigorous on the tenth variable than on the first.
In order to overcome this difficulty, a multi-point crossover
is often used. The operation of a two-point crossover operator is shown below:
0 0 0 0 0          0 1 1 0 0
             ⇒
1 1 1 1 1          1 0 0 1 1
Two random sites are chosen along the string length and bits inside the cross-
sites are swapped between the parents. An extreme of the above crossover
operator is to have a uniform crossover operator where a bit at any location
is chosen from either parent with a probability 0.5. In the following, we show
the working of a uniform crossover operator, where the first and the fourth
bit positions have been exchanged.
0 0 0 0 0          1 0 0 1 0
             ⇒
1 1 1 1 1          0 1 1 0 1
This operator has the maximum search power among all of the above
crossover operators. Simultaneously, this crossover has the minimum survival
probability for the good bit combinations (schemata) from parents to
children.
As GAs use a coding of variables, they work with a discrete search space.
Even though the underlying objective function is a continuous function, GAs
convert the search space into a discrete set of points. In order to obtain the
optimum point with a desired accuracy, strings of sufficient length need to
be chosen. GAs have also been developed to work directly with continuous
variables (instead of discrete variables). In those GAs, binary strings are not
used. Instead, the variables are directly used. Once a population of random
sets of points is created, a reproduction operator (roulette-wheel or other
selection operators) can be used to select good strings in the population. In
order to create new strings, the crossover and mutation operators described
earlier cannot be used efficiently. Even though simple single-point crossover
can be used on these points by forcing the cross-sites to fall only on the
variable boundaries, the search is not adequate. The search process then
mainly depends on the mutation operator. This type of GA has been used in
earlier studies (Wright, 1991). Recently, new and efficient crossover operators
have been designed so that search along the variables is also possible. Let us
consider x_i^(j) and x_i^(k), the values of the design variable x_i in two parent
strings j and k. The crossover between these two values may produce the
following new value (Radcliffe, 1990):

        x_i^new = (1 − λ) x_i^(j) + λ x_i^(k),    0 ≤ λ ≤ 1.        (6.7)
The parameter λ is a random number between zero and one. The above
equation calculates a new value bracketed by x_i^(j) and x_i^(k). This calculation
is performed for each variable in the string. This crossover has a uniform
probability of creating a point inside the region bounded by two parents. An
extension to this crossover is also suggested to create points outside the range
bounded by the parents. Eshelman and Schaffer (1993) have suggested a
blend crossover operator (BLX-α), in which a new point is created uniformly
at random from a larger range extending an amount α|x_i^(j) − x_i^(k)| on either
side of the region bounded by two parents. The crossover operation depicted
in Equation (6.7) can also be used to achieve BLX-α by varying λ in the
range (−α, 1 + α). In a number of test problems, Eshelman and Schaffer have
observed that α = 0.5 provides good results. One interesting feature of this
type of crossover operator is that the created point depends on the location of
both parents. If both parents are close to each other, the new point will also
be close to the parents. On the other hand, if parents are far from each other,
the search is more like a random search. The random search feature of these
crossover operators can be relaxed by using a probability distribution other
than the uniform distribution between the parents. A recent study using a polynomial probability
distribution with a bias towards near-parent points has been found to perform
better than BLX-0.5 in a number of test problems (Deb and Agrawal, 1995).
Moreover, the distribution of children points with this crossover resembles that
of the single-point, binary crossover operator. More studies in this direction
are necessary to investigate the efficacy of real-coded GAs.
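A minimal free-form Fortran sketch of these two recombinations, assuming fixed parent values and ignoring variable bounds for brevity, is as follows:

program blend_demo
  implicit none
  real :: xj, xk, lambda, alpha, child, u

  xj = 2.0                    ! parent value x_i^(j)
  xk = 5.0                    ! parent value x_i^(k)
  alpha = 0.5                 ! BLX-0.5, as recommended by Eshelman and Schaffer

! Equation (6.7): the child lies between the two parents
  call random_number(lambda)
  child = (1.0 - lambda)*xj + lambda*xk
  print *, 'Eq. (6.7) child:', child

! BLX-alpha: vary lambda in (-alpha, 1+alpha) so that the child may
! also fall up to alpha|xk - xj| outside the parents
  call random_number(u)
  lambda = -alpha + u*(1.0 + 2.0*alpha)
  child = (1.0 - lambda)*xj + lambda*xk
  print *, 'BLX-0.5 child:  ', child
end program blend_demo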
In 1995, Deb and Agrawal (1995) suggested a real-parameter
recombination operator that mimics the inherent probability distribution of
creating children in a binary single-point crossover applied to a real-parameter
space. In the so-called simulated binary crossover (SBX) operator, a large
probability is assigned to a point close to a parent and a small probability to
a point away from a parent. The probability distribution is controlled by a
user-defined parameter ηc (called the distribution index). SBX is implemented
variable-wise, that is, for a variable x_i, two parent values of that variable
(p1 = x_i^(1) and p2 = x_i^(2), assuming p2 > p1) of two solutions x^(1) and x^(2) are
recombined (blended) to create two child values (c1 and c2) as follows:

        c1 = (p1 + p2)/2 − β_L (p2 − p1)/2,        (6.8)

        c2 = (p1 + p2)/2 + β_R (p2 − p1)/2,        (6.9)
where

        α_L = 0.5 / [1 + 2(p1 − x_i^(L))/(p2 − p1)]^(η_c + 1),        (6.12)

        α_R = 0.5 / [1 + 2(x_i^(U) − p2)/(p2 − p1)]^(η_c + 1).        (6.13)
The above calculations ensure that the created child solutions c1 and c2 always
lie within the specified variable bounds (x_i^(L), x_i^(U)). For single-objective
optimization problems, experimental results have shown that ηc ≈ 2 produces
good results (Deb and Agrawal, 1995). The larger the value of ηc , the smaller
is the difference between parents and respective created child. Figure 6.11
shows the probability distribution of children solutions near two parents
(p1 = 2.0 and p2 = 5.0) with η_c = 2 and having bounds [1, 8].

[Figure 6.11 Probability density function of creating a child solution from two
parents p1 = 2.0 and p2 = 5.0 over the entire search space [1, 8].]

Notice that the probability density is zero outside the specified variable bounds. It is also
clear that points near p1 = 2.0 and p2 = 5.0 are more likely to be created
than away from them.
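A sketch of the bounded SBX operator for a single variable is given below. Since the equations defining β_L and β_R are not repeated here, the draw of the spread factor in this sketch follows the widely used NSGA-II implementation of SBX, which may differ in form from Equations (6.10)–(6.11) of the text while realizing the same bounded distribution:

program sbx_demo
  implicit none
  real :: p1, p2, xl, xu, eta, u, beta, alpha, betaq, c1, c2

! parent values (p2 > p1) and variable bounds, as in Figure 6.11
  p1 = 2.0;  p2 = 5.0
  xl = 1.0;  xu = 8.0
  eta = 2.0                     ! distribution index eta_c

  call random_number(u)

! left child: the distribution is contracted so that c1 >= xl
  beta  = 1.0 + 2.0*(p1 - xl)/(p2 - p1)
  alpha = 2.0 - beta**(-(eta + 1.0))
  if (u .le. 1.0/alpha) then
     betaq = (u*alpha)**(1.0/(eta + 1.0))
  else
     betaq = (1.0/(2.0 - u*alpha))**(1.0/(eta + 1.0))
  end if
  c1 = 0.5*((p1 + p2) - betaq*(p2 - p1))

! right child: the distribution is contracted so that c2 <= xu
  beta  = 1.0 + 2.0*(xu - p2)/(p2 - p1)
  alpha = 2.0 - beta**(-(eta + 1.0))
  if (u .le. 1.0/alpha) then
     betaq = (u*alpha)**(1.0/(eta + 1.0))
  else
     betaq = (1.0/(2.0 - u*alpha))**(1.0/(eta + 1.0))
  end if
  c2 = 0.5*((p1 + p2) + betaq*(p2 - p1))

  print *, 'children:', c1, c2
end program sbx_demo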
A similar mutation operator can also be devised for real-parameter GAs.
Here, we describe the commonly used polynomial mutation operator (Deb,
2001). As the name suggests, a polynomial probability distribution is used to
create a mutated solution. The order of the polynomial (ηm ) is a user-defined
parameter. This operator is also applied variable-wise. For a given parent
(L) (U )
solution p ∈ [xi , xi ], the mutated solution p′ for a particular variable is
created for a random number u created within [0,1], as follows:
        p′ = { p + δ̄_L (p − x_i^(L)),    for u ≤ 0.5,
             { p + δ̄_R (x_i^(U) − p),    for u > 0.5.        (6.14)
The parameter ηm is usually chosen in the range [20, 100]. The larger the
value, the smaller is the difference between the parent and the created child
solution. The above calculations ensure that the created mutated child is
always bounded within the specified lower and upper bounds. Figure 6.12
shows the probability density of creating a mutated child point from a parent
point p = 3 in a bounded range of [1, 8] with η_m = 20.

[Figure 6.12 Probability density function of creating a mutated child solution from
a parent p = 3.0 over the entire search space [1, 8].]

Notice that with η_m = 20, only points near the parent (x_i = 3) can be created.
Also, since the lower boundary is closer to the parent solution, points are
created closer to the parent on the left side than on the right side, although
the overall probability of creating a mutated child is equal on the left and
right sides of the
parent. Thus, when a solution lying on a boundary (say, on the left bound)
is mutated, there is a 50 per cent probability that the mutated point remains
on that boundary, and the remaining 50 per cent probability produces an
interior mutated child.
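A sketch of this operator for one variable is shown below. The expressions used for δ̄_L and δ̄_R are the commonly used ones; they are consistent with the bounded behaviour described above (u = 0 maps the child to the lower bound and u = 1 to the upper bound):

program polymut_demo
  implicit none
  real :: p, xl, xu, eta, u, delta, child

  p  = 3.0                     ! parent value
  xl = 1.0;  xu = 8.0          ! variable bounds
  eta = 20.0                   ! distribution index eta_m

  call random_number(u)
  if (u .le. 0.5) then
! perturb towards the lower bound: delta_L in (-1, 0]
     delta = (2.0*u)**(1.0/(eta + 1.0)) - 1.0
     child = p + delta*(p - xl)
  else
! perturb towards the upper bound: delta_R in [0, 1)
     delta = 1.0 - (2.0*(1.0 - u))**(1.0/(eta + 1.0))
     child = p + delta*(xu - p)
  end if
  print *, 'mutated child:', child
end program polymut_demo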
Solution B, on the other hand, costs more, but is less prone to accidents.
In their respective ways, both these solutions are useful, but solution C is
not as good as solution B in both objectives. It incurs more cost as well as
more accidents. Thus, in multi-objective optimization problems, there exist
a number of solutions which are optimum in some sense. These solutions
constitute a Pareto-optimal front shown by the thick-dashed line in the figure.
Since any point on this front is an optimum point, it is desirable to find as
many such points as possible. Recently, a number of extensions to simple GAs
have been tried to find many of these Pareto-optimal points simultaneously
(Horn and Nafpliotis, 1993; Fonseca and Fleming, 1993; Srinivas and Deb,
1995). Consider the following two objective functions for minimization:
        f1(x) = x²,
        f2(x) = (x − 2)²,
in the interval −1000 ≤ x ≤ 1000. The Pareto-optimal front for this problem
is the complete region 0 ≤ x ≤ 2. With a random initial population of 100
points in the range −1000 ≤ x ≤ 1000, the modified GA converges to the range 0 ≤ x ≤ 2.
The NSGA algorithm was modified by the author and his students to
devise an elitist NSGA or NSGA-II which has received a lot of attention since
its original suggestion in 2000 (Deb et al., 2000) and then a modified version
in 2002 (Deb et al., 2002). The NSGA-II algorithm has the following three
features:
(i) It uses an elitist principle,
(ii) it uses an explicit diversity-preserving mechanism, and
(iii) it emphasizes non-dominated solutions.
At any generation t, the offspring population (say, Qt ) is first created by using
the parent population (say, Pt ) and the usual genetic operators. Thereafter,
the two populations are combined together to form a new population (say,
Rt ) of size 2N . Then, the population Rt is classified into different non-
dominated classes. Thereafter, the new population is filled by points of
different non-dominated fronts, one at a time. The filling starts with the first
non-dominated front (of class one) and continues with points of the second
non-dominated front, and so on. Since the overall population size of Rt is
2N , not all fronts can be accommodated in N slots available for the new
population. All fronts which could not be accommodated are deleted. When
the last allowed front is being considered, there may exist more points in
the front than the remaining slots in the new population. This scenario is
illustrated in Figure 6.15. Instead of arbitrarily discarding some members
from the last front, the points which will make the diversity of the selected
points the highest are chosen.
The points of the last front which could not be fully accommodated are
sorted in the descending order of their crowding distance values, and points
from the top of the ordered list are chosen. The
crowding distance di of point i is a measure of the objective space around
i which is not occupied by any other solution in the population. Here, we
simply calculate this quantity di by estimating the perimeter of the cuboid
(Figure 6.16) formed by using the nearest neighbours in the objective space
as the vertices (we call this the crowding distance).
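The following free-form Fortran sketch (with made-up objective values for one front) computes crowding distances in the manner described above; assigning an infinite distance to the boundary points and normalizing by the objective range follow common practice:

program crowding_demo
  implicit none
  integer, parameter :: n = 5, m = 2
  real :: f(n, m), d(n), fmin, fmax
  integer :: idx(n), i, j, k, tmp

! made-up objective values of one non-dominated front
  f(:,1) = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
  f(:,2) = (/ 5.0, 3.5, 2.5, 1.5, 1.0 /)

  d = 0.0
  do j = 1, m
     idx = (/ (i, i = 1, n) /)
! simple insertion sort of the front indices by objective j
     do i = 2, n
        k = i
        do while (k .gt. 1)
           if (f(idx(k), j) .lt. f(idx(k-1), j)) then
              tmp = idx(k); idx(k) = idx(k-1); idx(k-1) = tmp
              k = k - 1
           else
              exit
           end if
        end do
     end do
     fmin = f(idx(1), j);  fmax = f(idx(n), j)
     d(idx(1)) = huge(1.0)          ! boundary points are always kept
     d(idx(n)) = huge(1.0)
     do i = 2, n - 1
        if (d(idx(i)) .lt. huge(1.0)) &
           d(idx(i)) = d(idx(i)) + (f(idx(i+1), j) - f(idx(i-1), j))/(fmax - fmin)
     end do
  end do
  print *, 'crowding distances:', d
end program crowding_demo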
The second problem (KUR), with three variables, has a disconnected Pareto-
optimal front:
                Minimize f1(x) = Σ_{i=1}^{2} [ −10 exp(−0.2 √(x_i² + x_{i+1}²)) ],
KUR :           Minimize f2(x) = Σ_{i=1}^{3} [ |x_i|^0.8 + 5 sin(x_i³) ],        (6.18)
                −5 ≤ x_i ≤ 5,    i = 1, 2, 3.
NSGA-II is run with a population size of 100 and for 250 generations. The
variables are used as real numbers and an SBX recombination operator (Deb
and Agrawal, 1995) with pc = 0.9 and distribution index of ηc = 10 and a
polynomial mutation operator (Deb, 2001) with pm = 1/n (n is the number
of variables) and distribution index of ηm = 20 are used. Figures 6.17 and
6.18 show that NSGA-II converges to the Pareto-optimal front and maintains
a good spread of solutions on both test problems.
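A short sketch evaluating the two KUR objectives of Equation (6.18) at an arbitrary point is shown below:

program kur_demo
  implicit none
  real :: x(3), f1, f2
  integer :: i

  x = (/ 0.5, -0.3, 1.2 /)     ! an arbitrary three-variable point

  f1 = 0.0
  do i = 1, 2
     f1 = f1 - 10.0*exp(-0.2*sqrt(x(i)**2 + x(i+1)**2))
  end do

  f2 = 0.0
  do i = 1, 3
     f2 = f2 + abs(x(i))**0.8 + 5.0*sin(x(i)**3)
  end do

  print *, 'f1 =', f1, '  f2 =', f2
end program kur_demo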
(Zitzler et al., 2001), Pareto archived evolution strategy (PAES) and its im-
proved versions PESA and PESA2 (Corne et al., 2000), multi-objective messy
GA (MOMGA) (Veldhuizen and Lamont, 2000), multi-objective Micro-GA
(Coello and Toscano, 2000), neighbourhood constraint GA (Loughlin and
Ranjithan, 1997), ARMOGA (Sasaki et al., 2001) and others. Besides, there
exist other EA-based methodologies, such as Particle swarm EMO (Coello
and Lechuga, 2002; Mostaghim and Teich, 2003), Ant-based EMO (McMullen,
2001; Gravel et al., 2002), and differential evolution-based EMO (Babu and
Jehan, 2003).
NSGA-II employs an efficient constraint-handling procedure. The
constraint-handling method modifies the binary tournament selection, where
two solutions are picked from the population and the better solution is chosen.
In the presence of constraints, each solution can be either feasible or infeasible.
Thus, there are three possible situations: (i) both solutions are feasible,
(ii) one is feasible and the other is not, and (iii) both are infeasible. We
consider each case by simply redefining the domination principle as follows
(we call it the constrained-domination condition for any two solutions x(i)
and x(j) ): A solution x(i) is said to ‘constrained-dominate’ a solution x(j) (or
x(i) ≼c x(j) ), if any of the following conditions are true:
(i) Solution x(i) is feasible and solution x(j) is not.
(ii) Solutions x(i) and x(j) are both infeasible, but solution x(i) has a
smaller constraint violation, which can be computed by adding the
normalized violation of all constraints:
        CV(x) = Σ_{j=1}^{J} max(0, −ḡ_j(x)) + Σ_{k=1}^{K} |h̄_k(x)|.
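The constrained-domination check can be coded directly. In the sketch below, the constraint violations are assumed to be precomputed by the above formula, and the both-feasible case uses the usual domination test (no worse in all objectives and strictly better in at least one), which is how the unconstrained definition applies when both solutions are feasible:

program cdom_demo
  implicit none
  real :: fa(2), fb(2)
  fa = (/ 1.0, 2.0 /);  fb = (/ 1.5, 2.5 /)
  print *, 'a constrained-dominates b:', cdom(fa, 0.0, fb, 0.0, 2)
contains
  logical function cdom(f1, cv1, f2, cv2, m)
! returns .true. if solution 1 constrained-dominates solution 2
! (objectives are to be minimized; cv = constraint violation CV(x))
    integer, intent(in) :: m
    real, intent(in) :: f1(m), f2(m), cv1, cv2
    if (cv1 .eq. 0.0 .and. cv2 .gt. 0.0) then
       cdom = .true.                       ! feasible beats infeasible
    else if (cv1 .gt. 0.0 .and. cv2 .gt. 0.0) then
       cdom = (cv1 .lt. cv2)               ! smaller violation wins
    else if (cv1 .gt. 0.0 .and. cv2 .eq. 0.0) then
       cdom = .false.
    else
! both feasible: usual domination test
       cdom = all(f1 .le. f2) .and. any(f1 .lt. f2)
    end if
  end function cdom
end program cdom_demo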
Figure 6.20 A function with five maximum points. A population of 50 points after
100 generations of a GA simulation with sharing functions shows that
all optimum points are found.
optimization using GAs is beyond the scope of this book. Interested readers
may refer to the above-mentioned GA literature for more details.
Besides, a plethora of other applications including non-stationary function
optimization, job shop scheduling problems, routing problems, and travelling
salesperson problems have been tried using GAs. Among engineering
applications, structural optimization problems (Hajela, 1990; Jenkins, 1991;
Deb, 1991; Rajeev and Krishnamoorthy, 1992), turbine blade design problem
(Powell et al., 1989), laminate stacking problem in composite structures
(Callahan and Weeks, 1992), and pipeline optimization (Goldberg, 1983) are
some of the applications. The growing list of successful applications of GAs
to different engineering problems confirms the robustness of GAs and shows
promise of GAs in solving other engineering design problems.
EXERCISE 6.2.1
The objective of this exercise is to minimize the following problem using the
simulated annealing method.
        Minimize f(x1, x2) = (x1² + x2 − 11)² + (x1 + x2² − 7)².
range (0, 1): r = 0.746. The probability of accepting the new point is
exp(−9.379/50.625) = 0.831. Since r < 0.831, we accept this point also.
Even though the temperature is comparatively low, this point is accepted
because the difference in the objective function values is small. We increment
the iteration counter to t = 4 and move to Step 4.
Step 2 The point created at this step is x(5) = (0.793, 1.950)T with a
function value equal to f(x(5)) = 76.697.

Step 3 Since the difference in the function values at x(5) and x(4)
(∆E = −23.152) is negative, the point x(5) is better than the earlier point
x(4). Thus, we accept this point.
Figure 6.21 The contour plot of the multimodal objective function. The figure
shows the four minima at different locations. The global minimum is
also marked.
steepest descent method works from a chosen initial point by finding a search
direction along the negative of the gradient at the point. At every iteration,
the best point along the search direction is found and the search continues
from this point by finding a new search direction along the negative of the
gradient at that point. We have highlighted in Chapter 3 that the solution of
the steepest descent method largely depends on the chosen initial point. In
order to eliminate this bias, we consider 121 points for each variable (uniformly
spaced in the range x1 , x2 ∈ (−6, 6)), thereby making a total of 121 × 121 or
14,641 initial points. Thereafter, we apply the steepest descent algorithm
14,641 times, every time starting from a different point. Since there are many
results to report, we have plotted the outcome of all runs in Figure 6.22 in a
simple way. If a simulation starting from an initial solution x0 converges⁶ to
the vicinity of the global minimum (3, 2)T , we mark that initial point x0 by a
small solid box in the figure. This means that all simulation runs starting with
points marked in solid boxes in the figure have successfully found the global
minimum, whereas rest of the runs could not find the global minimum. It is
clear from the plot that the convergence of steepest descent algorithm largely
depends on the chosen initial point in the case of problems with multiple
optima. Out of 14,641 points, 4,012 points have successfully converged to the
global minimum. The successful runs have taken on an average 215 function
evaluations for convergence. Since the set of successful initial points contains
about one-fourth of the search space (Figure 6.22), it can be concluded that
on an average one out of four simulations of the steepest descent algorithm
solves the above function to global optimality. This result is typical for many
⁶ A simulation run is considered to have converged if, in two successive
sequences, points with three decimal places of accuracy are found.
Figure 6.22 The starting points that successfully converged to the global
minimum (3, 2)T with the steepest descent method are marked in
solid boxes. Approximately 72 per cent of the points could not
converge to the global minimum.
Genetic algorithms are applied next. Recall that in order to use GAs, the
variables need to be coded first. A 10-bit binary coding for each variable is
chosen. With 10 bits for each variable in the interval (−6, 6), the obtainable
accuracy is 12/(2¹⁰ − 1) ≈ 0.012. The total length of a string becomes 20. A population
size of n = 20 is used. Binary tournament selection procedure is used. For
crossover, the single-point crossover with a crossover probability pc = 0.8
is used. For mutation, the bit-wise mutation with a mutation probability
pm = 0.05 is chosen. In Figure 6.23, the initial population of points is marked
with empty circles. The figure shows that all quadrants have almost equal
number of points to start with. The population after 50 generations is also
shown in Figure 6.23 (marked with solid circles). The figure shows that the
population has converged near the global minimum, even though it could have
converged to any of the other three minima. The population at generation 50
contains a point (2.997, 1.977)T with a function value equal to 0.0106.
The above result is obtained by simulating GAs on only one initial
population. In order to investigate the robustness of GAs, we apply them on
a number of different initial populations. As mentioned earlier, the successful
working of GAs depends on the proper selection of GA parameters. Thus, we
investigate the effect of GA parameters on the number of function evaluations
required to solve the above multimodal problem. Table 6.3 shows the outcome
of these experiments. For each parameter setting (crossover probability pc ,
Figure 6.23 All 20 points in the initial population and the population at
generation 50 shown on the contour plot of the multimodal function.
The empty circles represent the initial population and the solid circles
represent the population at generation 50.
Figure 6.24 The progress of the simulated annealing algorithm for a simulation
run starting from the point (−6, −6)T . The algorithm finally
converges to the point (3, 2)T .
variable) and the simulated annealing algorithm is started from each point
independently. If the simulation starting from an initial point converges to a
point close to the global minimum, that initial point is marked with a small
solid box in Figure 6.25. In each case, the annealing schedule and parameters
as mentioned above are used.

[Figure 6.25 Initial points that were successful in converging near the global
minimum using the simulated annealing procedure are shown by small solid
boxes. About 88 per cent points have converged to the true solution.]

The figure shows that simulations starting from
most points in the domain have converged to the global optimum. In fact,
out of 14,641 points, 12,845 points (about 88 per cent) have converged to the
global minimum. The average number of function evaluations required in all
successful simulations is 603.
It is worth mentioning here that no effort is made to find the optimal
parameter setting (annealing schedule or number of iterations in a stage) for
solving the above problem. It may be possible to improve the performance
with a better parameter setting. Nevertheless, these results show that GAs
and simulated annealing stand as better candidates than the steepest-descent
technique in solving multimodal functions to global optimality. Although
a simple numerical problem with only four optima is used to compare the
performance of nontraditional methods with that of traditional methods, the
performance of nontraditional methods is expected to be much better than
traditional methods in more complex multimodal problems.
6.4 Summary
Two nontraditional search and optimization algorithms motivated by
natural principles have been described in this chapter. Genetic algorithms
work according to the principles of natural genetics on a population of
REFERENCES
PROBLEMS
6-1 What should be the minimum string length of any point (x, y, z)T coded in
binary string to achieve the following accuracy in the solution:
(i) Two significant digits.
(ii) Three significant digits.
6-2 Repeat Problem 6-1 if ternary strings (with three alleles 0, 1, and 2)
are used instead of binary strings.
6-3 We would like to use a binary-coded genetic algorithm to solve the
following NLP problem:
subject to
4.5x1 + x2² − 22.5 ≤ 0,
2x1 − x2 − 1 ≥ 0,
0 ≤ x1 , x2 ≤ 4.
We decide to have two and three decimal places of accuracy for variables x1
and x2 , respectively.
(i) Find the optimum solution graphically and clearly indicate the
optimum solution on the plot.
(ii) At least how many bits are required for coding the variables?
(iii) Use the penalty-parameter-less fitness assignment procedure to
compute the fitness of the following population members: (i) (2, 1)
(ii) (6, 2) (iii) (0, 3) (iv) (2, 8) (v) (3, 0) (vi) (1, −1) (vii) (2.5, 2).
subject to
4.5x1 + x2² − 18 ≤ 0,
2x1 − x2 − 1 ≥ 0,
0 ≤ x1 , x2 ≤ 4.
We decide to have three and two decimal places of accuracy for variables x1
and x2 , respectively.
(i) How many bits are required for coding the variables?
(ii) Write down the fitness function which you would be using in
reproduction.
(iii) Which schemata represent the following regions according to your
coding:
(a) 0 ≤ x1 ≤ 2,
(b) 1 ≤ x1 ≤ 2, and x2 > 2?
Strings Fitness
0101101 10
1001001 8
1010110 14
0111110 20
1110111 2
0111100 12
Strings Fitness
01101 5
11000 2
10110 1
00111 10
10101 3
00010 100
Find out the expected number of copies of the best string in the above
population in the mating pool under
If only the reproduction operator is used, how many generations are required
before the best individual occupies the complete population under each
selection operator?
6-9 Consider the following population of five strings, having three choices
in each place:
6-11 From a schema processing point of view, derive the schema theorem
for a ternary string coding (three options at each position) under roulette-
wheel selection, single-point crossover and allele-wise mutation (one allele
value changing to any of the two other allele values with equal probability).
6-12 Consider the following single-variable maximization problem with x ∈
[0, 1] and having global maximum at x = 1 and a local maximum at x = 0.25
that is to be solved using a binary-coded GA with a very large-size string
coding for x:
        f(x) = { 3.2x,             if 0 ≤ x ≤ 0.25,
               { 3.2(0.5 − x),     if 0.25 ≤ x ≤ 0.5,
               { 0,                if 0.5 ≤ x ≤ 0.875,
               { 8(x − 0.875),     if 0.875 ≤ x ≤ 1.
6-14 Using real-coded GAs with simulated binary crossover (SBX) having
η = 2, find the probability of creating children solutions in the range 0 ≤ x ≤ 1
with two parents x(1) = 0.5 and x(2) = 3.0. Recall that for SBX operator, the
children solutions are created using the following probability distribution:
        P(β) = { 0.5(η + 1)β^η,           if β ≤ 1,
               { 0.5(η + 1)/β^(η+2),      if β > 1.
Calculate the probability of finding a child in the range xi ∈ [0.0, 5.0] for
i = 1, 2 using the simulated binary crossover (SBX) operator with ηc = 2.
6-17 Apply the polynomial mutation operator (with ηm = 5.0) to create
a mutated child of the solution x(t) = 6.0 using a random number 0.675.
Assume x ∈ [0, 10].
6-18 Find the probability of creating an offspring in the range x ∈ [0.8, 1.0]
from the parent p = 0.9 using polynomial mutation operator with ηm = 20
for a bounded problem with x ∈ [0, 1].
6-19 Calculate the overall probability of creating a child in the range
0.0 ≤ (x1 , x2 ) ≤ 0.2 by mutating a parent (0.5, 0.3)T for a two-variable
unbounded problem using polynomial mutation with ηm = 10. Assume a
variable-wise mutation probability of pm = 0.5.
6-20 A cantilever beam of circular cross-section (diameter d) has to be
designed for minimizing the cost of the beam. The beam is of length l = 0.30
m and carries a maximum of F = 1 kN force at the free end. The beam
material has E = 100 GPa, Sy = 100 MPa, and density ρ = 7866 kg/m3 . The
material costs Rs. 20 per kg. There are two constraints: (a) Maximum stress
in the beam (σ = 32Fl/(πd³)) must not be more than the allowable strength
Sy, (b) Maximum deflection of the beam must not be more than 1 mm. The
deflection at the free end due to the load F is δ = 64Fl³/(3πEd⁴). The
volume of the beam is πd²l/4. The initial population contains the following
solutions: d = 0.06, 0.1, 0.035, 0.04, 0.12, and 0.02 m.
(i) For each of six solutions, calculate cost in rupees and determine its
feasibility in a tabular format.
(ii) Determine the fitness of each solution using penalty-parameter-less
constraint handling strategy. Which is the best solution in the above
population?
Minimize x³ − x,
Minimize x²,
subject to x ≥ 0,
Minimize f (x),
Maximize f (x),
subject to
x ∈ S.
Individual optimal solutions of the above two objectives are x∗min and x∗max ,
respectively. Which solution(s) in the search space S constitute the Pareto-
optimal front?
6-23 For the following problem
Minimize f1 (x) = x1 ,
Minimize f2 (x) = x2 ,
subject to
Soln. Id. f1 f2 f3
1 2 3 1
2 5 1 10
3 3 4 10
4 2 2 2
5 3 3 2
6 4 4 5
6-25 Consider the following parent and offspring populations for a problem
of minimizing the first objective and maximizing the second objective:
        Parent population Pt         Offspring population Qt
        Soln.    f1     f2           Soln.    f1     f2
1 5.0 2.5 a 3.5 0.0
2 1.0 1.0 b 2.2 0.5
3 1.5 0.0 c 5.0 2.0
4 4.5 1.0 d 3.0 3.0
5 3.5 2.0 e 3.0 1.5
subject to
(i) If constraints are not used, sort the population in increasing level of
non-domination.
(ii) By considering the constraints, sort the population in increasing level
of constraint non-domination.
COMPUTER PROGRAM
ncross = 0
return
end
common/sgaparam/ipopsize,lchrom,maxgen,ncross,
- nmute,pcross,pmute,nparam
integer*1 chrom(ipopsize,lchrom), chr(lchrom)
real oldx(ipopsize,nparam), fitness(ipopsize)
real alow(nparam), ahigh(nparam), factor(nparam)
integer lsubstr(nparam)
real x(nparam)
do 121 j = 1,ipopsize
do 122 j1 = 1,lchrom
c.......... create random strings of 1 and 0
chr(j1) = iflip(0.5)
chrom(j,j1) = chr(j1)
122 continue
c........ calculate x values from the string
call decodevars(chr,x,lsubstr,alow,ahigh,
- factor)
do 123 j1 = 1,nparam
oldx(j,j1) = x(j1)
123 continue
c........ calculate the fitness of the string
fitness(j) = funct(x)
121 continue
return
end
10 continue
c.....compute the average fitness of the population
avg = sumfitness/float(ipopsize)
return
end
do 11 j1 = 1,nparam
newx(j+1,j1) = x(j1)
11 continue
c.....compute fitness of the second child string
newfit(j+1) = funct(x)
c.....update the individual count in the population
j = j + 2
c.....continue creating children until the population is filled up
if(j .le. ipopsize) go to 181
return
end
c *** primer for the stochastic rem. RW selection
c updates array ’choices’ (mate pool) ******
subroutine preselect(oldfit,choices,fraction,
- nremain)
common/sgaparam/ipopsize,lchrom,maxgen,ncross,
- nmute,pcross,pmute,nparam
common/statist/igen,avg,amax,amin,sumfitness
real oldfit(ipopsize)
integer choices(ipopsize)
real fraction(ipopsize)
integer*1 winner
j=0
k=0
141 j=j+1
expected = oldfit(j) / avg
c.....assign the integer part as guaranteed copies first
jassign = ifix(expected)
c.....do roulette wheel operation with decimals
fraction(j) = expected - jassign
142 if(jassign .le. 0) go to 143
k = k + 1
jassign = jassign - 1
choices(k) = j
go to 142
143 if(j .lt. ipopsize) go to 141
j = 0
144 if(k .ge. ipopsize) go to 145
j = j + 1
if(j .gt. ipopsize) j = 1
if(fraction(j) .gt. 0.0) then
c....... check if the wheel points to the individual
winner = iflip(fraction(j))
if(winner .eq. 1) then
k = k + 1
choices(k) = j
integer*1 oldchr(ipopsize,lchrom)
integer*1 newchr(ipopsize,lchrom)
c.....check if a crossover is to be performed
if(iflip(pcross) .eq. 1) then
c.....if yes, create a random cross site
jcross = irnd(1,lchrom-1)
ncross = ncross + 1
else
jcross = lchrom
endif
c.....copy till the cross site as it is
do 171 j = 1,jcross
newchr(ipop, j) = mutation(oldchr(mate1,j))
newchr(ipop+1,j) = mutation(oldchr(mate2,j))
171 continue
if(jcross .eq. lchrom) go to 173
c.....swap from the cross site till the end of string
do 172 j = jcross + 1,lchrom
newchr(ipop, j) = mutation(oldchr(mate2,j))
newchr(ipop+1,j) = mutation(oldchr(mate1,j))
172 continue
173 return
end
subroutine warmup_random(random_seed)
real oldrand(55)
common/randvar/oldrand,jrand
oldrand(55) = random_seed
rand_new = 1.0e-9
prev_rand = random_seed
do 21 j1 = 1,54
ii = modop(21*j1,55)
oldrand(ii) = rand_new
rand_new = prev_rand - rand_new
if(rand_new.lt.0.0) rand_new = rand_new + 1.0
prev_rand = oldrand(ii)
21 continue
call advance_random
call advance_random
call advance_random
jrand = 0
return
end
real oldrand(55)
common/randvar/oldrand,jrand
jrand = jrand + 1
if(jrand.gt.55) then
jrand = 1
call advance_random
endif
random = oldrand(jrand)
return
end
Simulation Run
The above code runs successfully on a PC-386 under Microsoft FORTRAN.
The two-variable unconstrained Himmelblau function is coded in the function
funct for minimization. In order to demonstrate the working of the code, we
present the results of one simulation run. The input values to the code are
given below.
Genetic Algorithms in FORTRAN
Kalyanmoy Deb
All rights reserved
GA parameters
-----------------
population size = 20
chromosome length = 20
max. # of generations = 30
crossover probability = 0.800
mutation probability = 0.050
seed random number = 0.123
     x1      x2     fitness   |     x1      x2     fitness
1) 0.860 4.326 0.00515 | 2.620 2.111 0.19574
2) 0.279 3.964 0.00768 | 2.439 1.955 0.08980
3) 2.165 1.955 0.04760 | 2.791 2.170 0.42838
4) 2.630 2.869 0.05752 | 0.850 4.301 0.00528
5) 0.836 4.301 0.00529 | 2.542 3.069 0.03598
6) 2.791 3.025 0.03925 | 2.791 2.107 0.43999
7) 2.620 2.131 0.19803 | 0.860 1.823 0.01248
8) 0.547 1.408 0.00932 | 2.678 3.983 0.00744
9) 2.678 1.481 0.09066 | 3.050 1.559 0.30053
10) 4.027 0.547 0.02415 | 0.802 1.408 0.01011
11) 0.440 1.926 0.01136 | 2.620 2.869 0.05735
12) 1.799 4.164 0.00620 | 2.551 2.131 0.15204
13) 2.688 3.123 0.03221 | 2.165 2.072 0.05191
14) 2.287 4.027 0.00733 | 2.678 1.911 0.19546
15) 3.871 1.496 0.03142 | 1.804 4.164 0.00620
16) 2.620 1.408 0.07040 | 2.620 1.505 0.08068
17) 3.407 1.955 0.13145 | 0.283 3.964 0.00767
18) 2.004 1.725 0.03055 | 2.625 2.869 0.05743
19) 2.786 0.367 0.03862 | 0.127 0.469 0.00642
20) 2.107 1.569 0.03140 | 2.776 2.004 0.36967
Maximize f (x) = c1 x1 + c2 x2 + . . . + cN xN
subject to
a11 x1 + a12 x2 + · · · + a1N xN = b1 ,
a21 x1 + a22 x2 + · · · + a2N xN = b2 , (A.1)
    ⋮
aJ1 x1 + aJ2 x2 + · · · + aJN xN = bJ ,
xi ≥ 0, i = 1, 2, . . . , N.
Even though all constraints in the above problem are shown to be equality
constraints, inequality constraints can also be handled using the linear
programming method. An inequality constraint is usually transformed into an
equivalent equality constraint by introducing a slack variable. If the inequality
constraint is of the less-than-or-equal-to type, a slack variable is added to the left-
side expression. For example, the constraint gj (x) ≤ 0 can be converted into
an equality constraint by adding a slack variable xN +j as follows:
gj (x) + xN +j = 0,
where xN +j ≥ 0. The idea is that since the quantity gj (x) is less than or equal
to zero, we add a positive quantity to make the sum zero. On the other hand,
if the constraint is of the greater-than-or-equal-to type, a slack variable is subtracted
from the left side expression. The constraint gj (x) ≥ 0 is converted to
gj (x) − xN +j = 0,
            ⋮
        x_J + a′_{J(J+1)} x_{J+1} + · · · + a′_{JN} x_N = b′_J.        (A.2)
Thus, we can calculate (∆f )q for all nonbasic variables and choose the one
with the maximum positive value. In the case of a tie (same (∆f )q for more
than one nonbasic variables), any nonbasic variable can be chosen at random.
It is worth mentioning here that if for all remaining nonbasic variables, the
quantity (∆f )q is non-positive, no increment in the objective function value
is possible. This suggests that the optimum solution has been obtained, and
we terminate the algorithm.
Once a nonbasic variable (xq ) is chosen, the next question is, which of the
basic variables has to be made nonbasic. It is clear that as xq is increased
from zero to one, the objective function value will also increase. Therefore, we
may want to increase the nonbasic variable xq indefinitely. But there is a limit
to the extent of this increment. When xq is increased, all basic variables must
either be increased, decreased, or kept the same in order to make the solution
feasible. Recall that all variables in a linear program must be nonnegative.
Thus, the critical basic variable is the one which, when reduced, becomes zero
first. Any more increase in the chosen nonbasic variable will make that basic
variable negative. From the row-echelon formulation, we can write the value
of a basic variable as follows:
xj = b′j − a′jq xq , j = 1, 2, . . . , J.
A basic variable becomes zero when xq = b′j /a′jq . Since, in the row-echelon
form, all b′j are nonnegative, this can happen only when a′jq is positive. In
order to find the critical basic variable, we compute the quantity b′_j/a′_jq for
each basic variable x_j and choose the one for which this quantity is minimum.
This rule is also known as the minimum ratio rule in linear programming.
The simplex method begins with an initial feasible solution. Thereafter,
a basic variable is replaced by a nonbasic variable chosen according to rules
described above. Thus, the simplex method is an iterative method which
works by alternating among various basic and nonbasic variables so as to
achieve the optimum point efficiently. Since all constraints and the objective
function are linear, these points are the corners of the feasible search region.
In the following, we describe the algorithm:
Algorithm
Step 1 Choose a basic feasible solution. Set all nonbasic variables to zero.
Step 2 Calculate the quantity (∆f )q for all nonbasic variables and choose
the one having the maximum value. If (∆f )q ≤ 0 for all nonbasic variables,
Terminate;
Else use the minimum ratio rule to choose the basic variable to be replaced.
Step 3 Perform a row-echelon formulation for new basic variables and go
to Step 2.
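A compact free-form Fortran sketch of the above algorithm, written for the small problem solved in the next exercise after slack variables have been added, is given below; the tolerance 1.0e-9 is an implementation choice:

program simplex_demo
  implicit none
  integer, parameter :: nj = 2, nv = 4        ! constraints, variables
  real :: a(nj, nv), b(nj), c(nv), df(nv), ratio, piv
  integer :: basis(nj), j, q, r, k

  a(1,:) = (/ 1.0, 0.0, 1.0, 0.0 /)           ! x1 + x3 = 6
  a(2,:) = (/ 1.0, 2.0, 0.0, 1.0 /)           ! x1 + 2x2 + x4 = 10
  b = (/ 6.0, 10.0 /)
  c = (/ 2.0, 3.0, 0.0, 0.0 /)
  basis = (/ 3, 4 /)                          ! slacks form the initial basis

  do
! Step 2: (delta f)_q = c_q - sum of c_basic * a(.,q) over the rows
     do k = 1, nv
        df(k) = c(k)
        do j = 1, nj
           df(k) = df(k) - c(basis(j))*a(j,k)
        end do
     end do
     q = maxloc(df, 1)
     if (df(q) .le. 1.0e-9) exit              ! optimal: no improving column

! minimum ratio rule: find the leaving row r
     r = 0;  ratio = huge(1.0)
     do j = 1, nj
        if (a(j,q) .gt. 1.0e-9 .and. b(j)/a(j,q) .lt. ratio) then
           ratio = b(j)/a(j,q);  r = j
        end if
     end do
     if (r .eq. 0) stop 'unbounded objective'

! Step 3: row-echelon (pivot) operation on row r, column q
     piv = a(r,q);  a(r,:) = a(r,:)/piv;  b(r) = b(r)/piv
     do j = 1, nj
        if (j .ne. r) then
           b(j) = b(j) - a(j,q)*b(r)
           a(j,:) = a(j,:) - a(j,q)*a(r,:)
        end if
     end do
     basis(r) = q
  end do
  print *, 'basic variables:', basis, '  values:', b
end program simplex_demo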
The algorithm assumes that the underlying problem can be written in the
form shown in Equation (A.1). If some variables in the original problem take
negative values, they need to be suitably transformed so that the new variables
can only take nonnegative values. Inequality constraints are converted into
equality constraints by adding or subtracting slack variables. The algorithm
stated above works only for maximization problems. There are two ways
the above algorithm can be used to solve minimization problems. The duality
principle (by multiplying the objective function by −1) can be used to convert
the minimization problem into an equivalent maximization problem. In the
other approach, the criterion for choosing a nonbasic variable to make it basic
needs to be changed. Recall that in the maximization problem, the nonbasic
variable for which the quantity (∆f)q is the most positive is chosen.
That variable increases the objective function at the highest rate. Similarly,
the nonbasic variable for which the quantity (∆f)q is the most negative
will decrease the objective function value at the highest rate. Thus, the above
algorithm can be used for the minimization problems by selecting nonbasic
variables according to the most negative (∆f )q . The rest of the algorithm can
be used as above. The other change is that the algorithm is terminated when
all nonbasic variables have nonnegative values of the quantity (∆f )q .
We illustrate the working of the simplex algorithm by considering a simple
constrained optimization problem.
EXERCISE A.2.1
Consider the following problem:
Maximize f(x) = 2x1 + 3x2
subject to
x1 ≤ 6,
x1 + 2x2 ≤ 10,
x1 , x2 ≥ 0. (A.3)
The feasible solution space and the optimum solution are shown in
Figure A.1. It is clear that the optimum solution is (6, 2)T with a function
value equal to 18. Let us investigate how we can obtain this solution using
the simplex method.
Figure A.1 The linear program described in Exercise A.2.1. The feasible region
and the optimal solution are shown.
x1 + x3 = 6,
x1 + 2x2 + x4 = 10.
The variables x3 and x4 are slack variables and can take only nonnegative
values. Thus, the optimization problem becomes as follows:
Maximize f(x) = 2x1 + 3x2
subject to
x1 + x3 = 6,
x1 + 2x2 + x4 = 10,
x1 , x2 , x3 , x4 ≥ 0.
                          2      3      0      0
  cB     Basic           x1     x2     x3     x4      bj      bj/ajq
  0      x3               1      0      1      0       6      6/0 = ∞
  0      x4               1      2      0      1      10     10/2 = 5   ←
         (∆f)q            2      3      0      0      f(x) = 0
                                 ↑
The top row of the table represents the coefficients of the variables in the
objective function. The next row shows all variables. The next two rows
show the coefficients (aij ) in the constraint expressions. There will be as many
rows as there are constraints. The seventh column shows the corresponding
bj values in the constraints. The second column shows the basic variables
and the first column shows the corresponding coefficients in the objective
function. The bottom row shows the quantity (∆f )q for all variables. For
basic variables, this quantity must be zero. For nonbasic variables, this
quantity is calculated in Step 2. The above table represents the current linear
programming problem and it contains all the information necessary to form
the next row-echelon table. The table also represents the current solution—all
nonbasic variables are zero and basic variables (xq ) are set equal to bj where
the row j corresponds to the coefficient aqj = 1 of the basic variable. It is
important to note that in the column of every basic variable there must be
only one entry of aij = 1 and rest of the entries must be zero. This initial
solution is marked as A in Figure A.2.
Figure A.2 The intermediate points found in the simulation of the simplex
algorithm. The figure shows how neighbouring basic feasible solutions
are found iteratively and the optimal solution is finally obtained.
Step 2 The quantities (∆f )q are calculated for the nonbasic variables and
are shown in the bottom row. It turns out that the quantity (∆f )q is maximum
for the variable x2 . Thus, the nonbasic variable x2 is to be made a basic
variable. Since (∆f )2 is positive, the optimal solution is not found. We use
the minimum ratio rule to determine the basic variable which is to be replaced.
The right-most column shows these calculations. By calculating the quantity
bj /a2j , we observe that the quantity is minimum for the basic variable x4 .
Thus, the new basic variables are x2 and x3 , and the new nonbasic variables
are x1 and x4 .
                          2      3      0      0
  cB     Basic           x1     x2     x3     x4      bj      bj/ajq
  0      x3               1      0      1     0.0      6      6/1 = 6   ←
  3      x2              0.5     1      0     0.5      5     5/0.5 = 10
         (∆f)q           0.5     0      0    −1.5      f(x) = 15
                          ↑
Step 2 Next, the quantity (∆f )q is calculated for all nonbasic variables. It
is observed that only (∆f )1 is positive. Thus, the variable x1 is chosen to
be the new basic variable. In order to decide which of the two older basic
variables to replace, we use the minimum ratio rule. It turns out that the
variable x3 has the smallest bj /a1j value. Thus, the new basic variables are
x1 and x2 .
                          2      3      0      0
  cB     Basic           x1     x2     x3     x4      bj
  2      x1               1      0     1.0    0.0      6
  3      x2               0      1    −0.5    0.5      2
         (∆f)q            0      0    −0.5   −1.5      f(x) = 18
Step 2 At this step, the quantity (∆f )q for all nonbasic variables is non-
positive. Thus, the optimal solution is found. The optimal solution is
x∗ = (6, 2)T and the optimal function value is f (x∗ ) = 18.
Figure A.2 shows that the simplex method begins at a corner point of
the feasible search region and visits only the neighbouring corner points in
successive iterations. Since the optimal solution is bound to be one of the
corner points of the feasible bounded region, the simplex method guarantees
the convergence to the optimal solution in linear programming problems.
In the case of an unbounded optimal solution (where the optimal function
value is not finite), the constraint entries for all basic variables corresponding
to the chosen nonbasic variables are negative.
EXERCISE A.3.1
Let us add one more inequality constraint to the problem used in the previous
exercise. Thus, the new problem is as follows:
Maximize f (x) = 2x1 + 3x2
subject to
x1 ≤ 6,
x1 + 2x2 ≤ 10,
x1 + x2 ≥ 2,
x1 , x2 ≥ 0.
The solution to the above problem remains the same as that in the previous
problem: x∗ = (6, 2)T .
Step 1 In order to convert the new inequality constraint into an equality
constraint, we use another slack variable:
x1 + x2 − x5 = 2,
x1 + x3 = 6,
x1 + 2x2 + x4 = 10,
x1 + x2 − x5 = 2,
x1, x2, x3, x4, x5 ≥ 0.
Since there are three constraints, there must be three basic variables. The
variables x3 and x4 can be considered as basic variables. But the variables
x1 , x2 , and x5 cannot be considered as basic variables, because none of them
have a coefficient one in one equation and zero in other equations. Thus, we
add an artificial variable x6 at the third constraint and form the following
row-echelon form:
x1 + x3 = 6,
x1 + 2x2 + x4 = 10,
x 1 + x2 − x5 + x6 = 2,
x1 , x2 , x3 , x4 , x5 , x6 ≥ 0.
                          0      0      0      0      0     −1
  cB     Basic           x1     x2     x3     x4     x5     x6      bj      bj/ajq
  0      x3               1      0      1      0      0      0       6      6/0 = ∞
  0      x4               1      2      0      1      0      0      10     10/2 = 5
 −1      x6               1      1      0      0     −1      1       2      2/1 = 2   ←
         (∆f)q            1      1      0      0     −1      0      f(x) = −2
                                 ↑
The initial solution is (0, 0)T and is marked as A in Figure A.3. It is interesting
to note that this solution is not a feasible solution to the original problem,
violating the third constraint.
Figure A.3 The intermediate points found in the simulation of the dual phase
method of the simplex algorithm. The first phase begins with the
point A and the basic feasible solution B is found. In the next phase,
the optimal solution D is obtained after two iterations.
                          0      0      0      0      0     −1
  cB     Basic           x1     x2     x3     x4     x5     x6      bj
  0      x3               1      0      1      0      0      0       6
  0      x4              −1      0      0      1      2     −2       6
  0      x2               1      1      0      0     −1      1       2
         (∆f)q            0      0      0      0      0     −1      f(x) = 0
At this step, the solution is (0, 2)T and is marked as B in Figure A.3.
Step 2 Since the quantities (∆f )q for all basic variables are non-positive,
the optimal solution for the first phase is found. Therefore, we terminate the
first phase and use this solution as the initial solution for the next phase.
Here the solution corresponds to x1 = 0, x2 = 2, x3 = 6, x4 = 6, x5 = 0, and
x6 = 0. The artificial variable at this solution is zero. Thus, this solution is a
feasible solution.
Step 1 In the second phase, the objective function is the same as in the
original problem: f (x) = 2x1 + 3x2 . Since the artificial variable x6 was
introduced to obtain an initial basic feasible solution, we need not continue
with that variable in the second phase. Thus, we tabulate the new row-echelon
form:
                          2      3      0      0      0
  cB     Basic           x1     x2     x3     x4     x5      bj      bj/ajq
  0      x3               1      0      1      0      0       6      6/0 = ∞
  0      x4              −1      0      0      1      2       6      6/2 = 3   ←
  3      x2               1      1      0      0     −1       2         ∞
         (∆f)q           −1      0      0      0      3      f(x) = 6
                                                      ↑
                          2      3      0      0      0
  cB     Basic           x1     x2     x3     x4     x5      bj      bj/ajq
  0      x3              1.0     0      1     0.0     0       6      6/1 = 6   ←
  0      x5             −0.5     0      0     0.5     1       3         ∞
  3      x2              0.5     1      0     0.5     0       5     5/0.5 = 10
         (∆f)q           0.5     0      0    −1.5     0      f(x) = 15
                          ↑
At this step, the solution is (0, 5)T and is marked as C in Figure A.3. Notice
that the objective function value at the current point is better than that in
the previous iteration.
Step 2 The quantity (∆f )q is calculated for all nonbasic variables and only
(∆f )1 is positive. Again, the minimum ratio rule suggests that the basic
variable x3 needs to be made nonbasic.
                          2      3      0      0      0
  cB     Basic           x1     x2     x3     x4     x5      bj
  2      x1               1      0     1.0    0.0     0       6
  0      x5               0      0     0.5    0.5     1       6
  3      x2               0      1    −0.5    0.5     0       2
         (∆f)q            0      0    −0.5   −1.5     0      f(x) = 18
The solution obtained from the above table is (6, 2)T with an objective
function value equal to 18.
Step 2 Since the quantities (∆f )q for all nonbasic variables (x3 and x4 ) are
non-positive, the optimal solution is obtained. Thus, we terminate the simplex
method. Therefore, the final solution is (6, 2)T as marked D in Figure A.3.
                          2      3      0      0      0    −100
  cB     Basic           x1     x2     x3     x4     x5     x6      bj      bj/ajq
  0      x3               1      0      1      0      0      0       6      6/0 = ∞
  0      x4               1      2      0      1      0      0      10     10/2 = 5
 −100    x6               1      1      0      0     −1      1       2      2/1 = 2   ←
         (∆f)q          102    103      0      0   −100      0      f(x) = −200
                                 ↑
The tableau is now updated and the following new tableau is obtained
using the original simplex method:
                          2      3      0      0      0    −100
  cB     Basic           x1     x2     x3     x4     x5     x6      bj
  0      x3               1      0      1      0      0      0       6
  0      x4              −1      0      0      1      2     −2       6   ←
  3      x2               1      1      0      0     −1      1       2
         (∆f)q           −1      0      0      0      3   −103      f(x) = 6
                                                      ↑
The next tableau is given as follows:
                          2      3      0      0      0    −100
  cB     Basic           x1     x2     x3     x4     x5     x6      bj
  0      x3               1      0      1      0      0      0       6   ←
  0      x5             −0.5     0      0     0.5     1     −1       3
  3      x2              0.5     1      0     0.5     0      0       5
         (∆f)q           0.5     0      0    −1.5     0   −100      f(x) = 15
                          ↑
The next tableau finds the optimal solution:
                          2      3      0      0      0    −100
  cB     Basic           x1     x2     x3     x4     x5     x6      bj
  2      x1               1      0      1      0      0      0       6
  0      x5               0      0     0.5    0.5     1     −1       6
  3      x2               0      1    −0.5    0.5     0      0       2
         (∆f)q            0      0    −0.5   −1.5     0   −100      f(x) = 18
The optimal solution is (x1 , x2 ) = (6, 2)T with a function value of 18. This
solution is the same as that found in the previous section.
Maximize f (x) = z = cT x,
subject to
Ax = b,
(A.4)
x ≥ 0.
Maximize z = c_B^T x_B + c_N^T x_N,
subject to
BxB + NxN = b,
(A.5)
xB , xN ≥ 0.
The simplex method identifies the basic variable set xB and manipulates
the constraints in the following way:

        x_B + B⁻¹N x_N = B⁻¹b.

In each constraint (or, row in the simplex tableau), only one basic variable is
present with a coefficient of one. From the above equation, the basic variable
set can be written as x_B = B⁻¹b − B⁻¹N x_N, so that the objective function
becomes

        z = c_B^T x_B + c_N^T x_N
          = c_B^T B⁻¹b + (c_N^T − c_B^T B⁻¹N) x_N.
If the term inside the bracket is positive, the nonbasic variables can be
increased from zero to improve the objective function value z. Thus, such a
solution cannot be optimal. So, a criterion for a solution to become optimum
is that c̄^T = c_N^T − c_B^T B⁻¹N is non-positive (or, c̄^T ≤ 0) and at the optimal
solution the objective function value is c_B^T B⁻¹b.
Let us now discuss the inequality constraint case (Ax ≤ b). In such a
case, first, the slack variables are added to find a set of basic variable set.
Let us say that the slack variables are represented as xS and there are J
such variables, one for each constraint. At the initial starting iteration, the
following simplex tableau is formed:
                 c_B^T      c_N^T      0^T
        Basic    x_B^T      x_N^T      x_S^T
  0      x_S       B          N          I         b
        (∆f)q    c_B^T      c_N^T      0^T       f(x) = 0
The basic variable set is x_S and the combined x = (x_N, x_B)^T is the nonbasic
variable set. The parameters (∆f)q are cost terms and are usually positive.
As the simplex algorithm proceeds and arrives at the final tableau
representing the optimal solution, the above scenario changes. In most cases,
J variables from x (and not from xS usually) become basic variables. Let us
call this basic variable set as xB and the remaining variables (xN ) together
with the slack variable set xS become the combined nonbasic variable set. Let
us write the final tableau in algebraic form, as shown in Table A.1. There are
several important observations that can be made from this final tableau.
                 c_B^T        c_N^T                    0^T
        Basic    x_B^T        x_N^T                    x_S^T
  c_B    x_B       I          B⁻¹N                     B⁻¹            B⁻¹b
        (∆f)q     0^T     c_N^T − c_B^T B⁻¹N       −c_B^T B⁻¹        f(x) = c_B^T B⁻¹b
is f(x^(t+1)) = (c_B^(t+1))^T x_B^(t+1). Increment the iteration counter by one (t = t + 1)
and go to Step 2.

        (c̄_N^(0))^T = (2, 3).
Step 3 Since both c̄j values are non-negative, we do not terminate the
algorithm.
Step 4 We compute p = argmax{c̄_1^(0), c̄_2^(0)} = 2. Thus, variable x2 should
become a basic variable.
Step 6 We update the variable vectors as follows: x_B^(1) = (x3, x2)^T and
x_N^(1) = (x1, x4)^T. Also, c_B^(1) = (0, 3)^T and c_N^(1) = (2, 0)^T. Now,

        B^(1) = (a3, a2) = [ 1   0 ]
                           [ 0   2 ].

Its inverse is

        (B^(1))⁻¹ = [ 1    0  ]
                    [ 0   0.5 ].
One iteration of the revised simplex method is over and we now move to
Step 2.
Step 2 We compute the reduced cost

        (c̄_N^(1))^T = (2, 0) − (0, 3) [ 1    0  ] [ 1   0 ]
                                      [ 0   0.5 ] [ 1   1 ]  = (0.5, −1.5).
Step 6 The new basic variable set is x_B^(2) = (x1, x2)^T and x_N^(2) = (x3, x4)^T.
Also, c_B^(2) = (2, 3)^T and c_N^(2) = (0, 0)^T. Now,

        B^(2) = (a1, a2) = [ 1   0 ]
                           [ 1   2 ].

Its inverse is

        (B^(2))⁻¹ = [  1     0  ]
                    [ −0.5  0.5 ].
The basic variable values are computed as follows:

        x_B^(2) = (B^(2))⁻¹ b = [  1     0  ] (  6 )   ( 6 )
                                [ −0.5  0.5 ] ( 10 ) = ( 2 ).
The second iteration of the revised simplex method is also over and we now
move to Step 2 for the third iteration (t = 2).
Step 2 The reduced cost is

        (c̄_N^(2))^T = (0, 0) − (2, 3) [  1     0  ] [ 1   0 ]
                                      [ −0.5  0.5 ] [ 0   1 ]  = (−0.5, −1.5).
Step 3 Since both elements of c̄_N^(2) are negative, we have found the optimal
solution. We terminate the algorithm and declare x* = (x1, x2, x3, x4)^(2) =
(6, 2, 0, 0)^T as the final solution with a function value of 18.
The revised simplex method is easier to convert into a computer algorithm
and, as mentioned before, it provides a computationally quicker implementation
of the original simplex method.
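The following sketch repeats the final optimality check of the above example numerically; the 2×2 basis inverse is written out directly for brevity:

program revised_demo
  implicit none
  real :: bmat(2,2), binv(2,2), nmat(2,2), bvec(2), xb(2)
  real :: cb(2), cn(2), cbar(2), det

  bmat(:,1) = (/ 1.0, 1.0 /)          ! a1, column of x1
  bmat(:,2) = (/ 0.0, 2.0 /)          ! a2, column of x2
  nmat(:,1) = (/ 1.0, 0.0 /)          ! a3, column of slack x3
  nmat(:,2) = (/ 0.0, 1.0 /)          ! a4, column of slack x4
  bvec = (/ 6.0, 10.0 /)
  cb = (/ 2.0, 3.0 /);  cn = (/ 0.0, 0.0 /)

! inverse of the 2x2 basis matrix
  det = bmat(1,1)*bmat(2,2) - bmat(1,2)*bmat(2,1)
  binv(1,1) =  bmat(2,2)/det;  binv(1,2) = -bmat(1,2)/det
  binv(2,1) = -bmat(2,1)/det;  binv(2,2) =  bmat(1,1)/det

  xb = matmul(binv, bvec)                       ! basic variable values
  cbar = cn - matmul(matmul(cb, binv), nmat)    ! reduced costs
  print *, 'x_B =', xb                          ! gives (6, 2)
  print *, 'reduced costs =', cbar              ! gives (-0.5, -1.5)
  print *, 'objective =', dot_product(cb, xb)   ! gives 18
end program revised_demo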
(i) The sensitivity of the optimal solution with a change in the right-hand
side (RHS) parameters (bk values): The right-hand side parameter
usually signifies the capacity or resource values. After obtaining the
optimal solution, the user may be interested in finding how the solution
will change if the capacity or resource values are increased or decreased.
Thus, such a sensitivity analysis has tremendous practical significance.
This analysis is similar in principle to the sensitivity analysis procedure
discussed in the case of nonlinear programming problem in Section 4.4.
(ii) The sensitivity of the optimal solution with respect to a change in
the cost coefficient (cB ) in the objective function: If the prices are
increased or decreased, the user may be interested in knowing whether
the optimal solution changes and if changes, by how much.
One other important matter of interest in both the above cases is to know
the extent of changes (either in RHS or cost terms) from their current values
which will not alter the current optimal solution. In some studies, both the
RHS and the cost terms may be changed simultaneously, and their combined
effects on the optimal solution are required to be known. We discuss each of the
two cases in the following subsections.
Equation (A.6) can be used to find the allowable range in the change in RHS
parameters so that xB remains feasible, provided we have an easier way to
compute the matrix B⁻¹. The algebraic terms in Table A.1 help us to find
this matrix. Notice that this matrix is already computed at the final tableau
from the columns of the slack variable set xS corresponding to constraint rows.
Note that the new optimum solution becomes xB given in Equation (A.6)
and the corresponding objective value is cTB B−1 b. We take an example to
illustrate the above procedure.
Let us consider the example problem given in Equation (A.3). Note that
for this problem, the quantities x_B and B⁻¹ are as follows:

        x_B = ( 6 ),        B⁻¹ = [  1     0  ]
              ( 2 )               [ −0.5  0.5 ].
Let us also assume that we perturb constraints 1 and 2 by ∆b1 and ∆b2 ,
respectively. Equation (A.6) implies that in order for the corresponding
optimal solution to remain feasible, the following conditions must be met:
        ( 6 )   [  1     0  ] ( ∆b1 )
        ( 2 ) + [ −0.5  0.5 ] ( ∆b2 ) ≥ 0,

or,     ( 6 + ∆b1               )
        ( 2 − 0.5∆b1 + 0.5∆b2 ) ≥ 0.
The above two conditions remain as important restrictions for the optimal
solution to remain feasible due to changes in the RHS of both constraints.
For a fixed change in b2 (say by ∆b2 ), there is a range of ∆b1 that will ensure
feasibility of the resulting optimal solution:
−6 ≤ ∆b1 ≤ 4 + ∆b2 .
For a change in the RHS of constraint 1 alone, ∆b2 = 0 and the allowable
change in b1 is −6 ≤ ∆b1 ≤ 4, or 0 ≤ b1 ≤ 10. For this case, the optimal
solution (for a fixed b2 = 10) ranges between

        x_B = B⁻¹b = ( 0 )  at b1 = 0,    and    x_B = ( 10 )  at b1 = 10.
                     ( 5 )                             (  0 )
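In code, the feasibility check of a simultaneous perturbation (∆b1, ∆b2) is a single matrix-vector product, as the following sketch (with an arbitrary trial perturbation) shows:

program rhs_sens_demo
  implicit none
  real :: binv(2,2), b(2), db(2), xb(2)

  binv(1,:) = (/ 1.0, 0.0 /)           ! B^{-1} of the example above
  binv(2,:) = (/ -0.5, 0.5 /)
  b  = (/ 6.0, 10.0 /)
  db = (/ 2.0, -1.0 /)                 ! arbitrary trial perturbation

  xb = matmul(binv, b + db)            ! perturbed basic variable values
  if (all(xb .ge. 0.0)) then
     print *, 'basis remains feasible; new x_B =', xb
  else
     print *, 'perturbation infeasible for the current basis'
  end if
end program rhs_sens_demo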
The above sensitivity analysis can also be verified from Figure A.4. Keeping
the second constraint identical, if the first constraint is allowed to change from
its original value b1 = 6, the allowable range of values of b1 is [0, 10]. When
b1 = 0, the constraint becomes x1 ≤ 0, meaning the solutions along the x2 -
axis and in 0 ≤ x2 ≤ 5 are feasible. The optimal solution is the point (0, 5)T
with a function value of 2 × 0 + 3 × 5 or 15. The estimated function value
at this point is also found to be 15 as shown above. The RHS parameter b1
cannot be reduced below zero. On the other hand, when b1 is increased to 10,
Figure A.4 Change of right-hand side coefficient of the first constraint and its
effect on the optimum.
When the RHS parameters for more than one constraint are changed
simultaneously, the conditions for checking if the perturbed solution is feasible
are different from those described above. Let us say that for the j-th constraint,
∆bj is the desired perturbation, Uj is the maximum feasible increase of bj for
keeping the solution feasible, and Lj is the minimum feasible decrease of bj for
keeping the solution feasible. Let us also define a ratio rj for the perturbation
in the j-th constraint as follows:
        r_j = { ∆b_j/U_j,      if ∆b_j ≥ 0,
              { −∆b_j/L_j,     if ∆b_j < 0.        (A.9)
x1 ≤ 4,
x1 + 2x2 ≤ 8.
After the optimal solution is obtained, one may be interested in knowing how
would the solution and its function value change if the cost terms (cB and
cN) are changed. It could be that, unlike in the case of a change in the RHS
parameters, here, the optimal solution does not change for a perturbation in
the cost terms. Then, one may be interested in knowing what are the ranges
of change in cost terms that will not alter the original optimal solution.
For this purpose, we recall that the condition for optimality is when
c̄N ≤ 0. Let us say that all cost terms are changed, such that the new
c̄N is calculated as follows:
        c̄_N = (c_N^T + ∆c_N^T) − (c_B^T + ∆c_B^T) B⁻¹N ≤ 0.        (A.10)
Since the solution xB does not change, satisfaction of Equation A.10 will
provide the range of cost terms that will not affect the optimal solution. The
revised objective function becomes (cB + ∆cB )T xB .
Let us take the example problem given in Equation (A.3). Say, we change
c1 and c2 terms by ∆c1 and ∆c2 , respectively. Recall that in this problem,
there is no cN term. Therefore, the reduced cost term becomes as follows:
        c̄_N^T = (0, 0) − (2 + ∆c1, 3 + ∆c2) [  1     0  ]
                                            [ −0.5  0.5 ]

              = (−0.5 − ∆c1 + 0.5∆c2,  −1.5 − 0.5∆c2) ≤ 0.
These conditions give ∆c1 ≥ 0.5∆c2 − 0.5 and ∆c2 ≥ −3.
For a fixed ∆c1 , there is a range of values of ∆c2 which will result in an
identical optimal solution: −3 ≤ ∆c2 ≤ 2∆c1 + 1. For example, if the cost
term for the first variable is unchanged (that is, ∆c1 = 0), the range of
allowable values for ∆c2 is −3 ≤ ∆c2 ≤ 1 or, 0 ≤ c2 ≤ 4. When c2 = 0,
there is no importance of the second variable, the optimal solution can still be
(6, 2)T (in fact, any point for x1 = 6 and 0 ≤ x2 ≤ 2 is an optimal solution).
When c2 = 4, all solutions on the constraint 2 within x1 ∈ [0, 6] are optimal,
including the original solution (6, 2)T . If a change ∆c2 = −2 is chosen, the
optimal solution is still (6, 2)T , but the function value changes by cTB (0, −2)T
or −1.5. Figure A.5 depicts these scenarios. Any slope of the contour line of
the objective function in the range (2/0, 2/4) keeps the optimum at (6, 2)T .
Figure A.5 Change of cost term for x2 and its effect on the optimum. Cost term
c2 in the range [0, 4] does not change the position of the optimum for
c1 = 2.
There also exists a 100% rule for changes in multiple cost terms
simultaneously. Let us say that there is a change ∆ci anticipated in the
i-th cost term ci . Let us also say that Ui is the maximum allowable increase
in ∆ci to keep optimality, and Li is the minimum allowable decrease in ∆ci
to keep optimality. Then the ratio ri is computed as follows:
        r_i = { ∆c_i/U_i,      if ∆c_i ≥ 0,
              { −∆c_i/L_i,     if ∆c_i < 0.        (A.11)
If the combined ratio Σ_i r_i is smaller than or equal to one, the changes in
cost terms do not change the original optimal solution; otherwise, nothing can
be said about the optimality of the original solution in the new context.
For the example problem considered in the previous subsection, condition
(A.10) yields ∆c1 ≥ −0.5 and −3 ≤ ∆c2 ≤ 1. Therefore, L1 = −0.5,
U1 = ∞, L2 = −3 and U2 = 1. For desired changes of ∆c1 = −0.5 and
∆c2 = 0.5, the respective ratios are r1 = (−0.5)/(−0.5) = 1 and
r2 = 0.5/1 = 0.5. The combined ratio is greater than one; hence nothing can
be said about the optimality of the solution. The above changes make the
objective function f(x) = 1.5x1 + 3.5x2. The original solution (6, 2)T with
a function value of 16 is no longer optimal, as the solution (0, 5)T has a
better function value of 17.5. However, if ∆c1 = −0.2 is used instead, then
r1 = (−0.2)/(−0.5) = 0.4 and r1 + r2 = 0.9 < 1. Now the solution (6, 2)T has
a function value of 17.8, while the solution (0, 5)T has a function value of
17.5. Thus, the 100% rule correctly predicts the optimality of the original
solution.
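Both cases are easy to reproduce. The sketch below (my illustration using
SciPy's linprog, not from the book) computes the combined ratio and re-solves
the perturbed problem for each set of changes:

    import numpy as np
    from scipy.optimize import linprog

    A, b = [[1, 0], [1, 2]], [6, 10]           # constraints of problem (A.3)
    L1, U1 = -0.5, np.inf                       # allowable ranges from (A.10)
    L2, U2 = -3.0, 1.0

    def ratio(dc, L, U):
        return dc / U if dc >= 0 else dc / L    # signed-L convention, as in the text

    for dc1, dc2 in [(-0.5, 0.5), (-0.2, 0.5)]:
        r = ratio(dc1, L1, U1) + ratio(dc2, L2, U2)
        res = linprog([-(2 + dc1), -(3 + dc2)], A_ub=A, b_ub=b,
                      bounds=[(0, None)] * 2)
        print(f"sum of ratios = {r:.1f}, optimum = {np.round(res.x, 2)}, "
              f"f = {-res.fun:.1f}")

The first case prints a combined ratio of 1.5 and the new optimum (0, 5)T with
f = 17.5; the second prints 0.9 and retains (6, 2)T with f = 17.8, as the 100%
rule predicts.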
Note that we are required to maximize the objective function f(x) = 2x1 + 3x2.
Since both x1 and x2 are non-negative, we can safely say that the optimal
objective function value cannot be greater than 20. This is because the second
constraint gives 2x1 + 4x2 = 2(x1 + 2x2) ≤ 20, and since x2 ≥ 0, the sum
2x1 + 3x2 cannot exceed 2x1 + 4x2 and hence cannot be greater than 20.
Although from the first constraint alone we may never get such a bound, we may
generalize this idea. Since the upper bounds of some linear combinations of
the variables are known from the constraints, we
can use two variables (called dual variables), say y1 and y2 (both
non-negative), to combine the constraints as follows:

    y1 (x1) + y2 (x1 + 2x2) ≤ 6y1 + 10y2,
    or, (y1 + y2) x1 + 2y2 x2 ≤ 6y1 + 10y2.

Our goal is to choose y1 and y2 such that the left-side expression approaches
the objective function f(x) = 2x1 + 3x2 from above, by keeping the coefficients
of x1 and x2 not smaller than 2 and 3, respectively. That is, we would like to
satisfy the following constraints:
y1 + y2 ≥ 2,
2y2 ≥ 3.
For any such set of values of y1 and y2, the right-hand side (6y1 + 10y2) puts
an upper bound on f(x). Since our goal is to come as close as possible to the
optimum of f(x), the following optimization problem gives us the requisite
values of y1 and y2:
Minimize w(y) = 6y1 + 10y2,
subject to
y1 + y2 ≥ 2,
2y2 ≥ 3, (A.12)
y1 , y2 ≥ 0.
This problem is known as the dual problem to the primal problem given in
Equation (A.3).
Thus, for the following primal LP problem (maximization):
Maximize f (x) = cT x,
subject to
Ax ≤ b,
(A.13)
x ≥ 0,
the corresponding dual LP problem is as follows:
Minimize w(y) = bT y,
subject to
AT y ≥ c,
(A.14)
y ≥ 0.
The primal problem has n variables and J constraints, whereas the dual
problem has J variables and n constraints. Notice how the cost term c
and RHS coefficient vector b are interchanged between the two problems.
Also, the primal problem is a maximization type and the dual problem is a
minimization type. One other vital difference is that while the constraints in
the primal problem are of ‘≤’-type, those of the dual problem are of ‘≥’-type.
Besides the differences, there are certain relationships between the two
problems which are of great importance in solving LP problems in general.
The weak duality theorem states that for a feasible solution x̂ to the primal
problem and for a feasible solution ŷ to the dual problem,
cT x̂ ≤ bT ŷ. (A.15)
This means that a feasible primal solution always has an objective function
value smaller than or equal to that of any feasible dual solution. If a
feasible dual solution can somehow be identified for a primal problem, its
dual objective value can be taken as an upper bound on the primal optimal
function value. There also exists a strong duality concept that is even more
useful.
The strong duality theorem states that if one of the problems has a
bounded optimal solution, then the same holds for the other problem as well,
and importantly the two optimal objective function values are equal, that is,
cT x∗ = bT y∗ . (A.16)
This theorem suggests that if we solve the dual problem, the optimal objective
value of the primal problem is identical to the optimal objective value of the
dual problem.
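Both theorems are easy to check numerically on the example problem. The sketch
below (my illustration using SciPy's linprog, not the book's code) solves the
primal and dual problems and also evaluates an arbitrary feasible dual point:

    import numpy as np
    from scipy.optimize import linprog

    # Primal: maximize 2x1 + 3x2 s.t. x1 <= 6, x1 + 2x2 <= 10, x >= 0.
    primal = linprog([-2, -3], A_ub=[[1, 0], [1, 2]], b_ub=[6, 10],
                     bounds=[(0, None)] * 2)
    # Dual: minimize 6y1 + 10y2 s.t. y1 + y2 >= 2, 2y2 >= 3, y >= 0.
    dual = linprog([6, 10], A_ub=[[-1, -1], [0, -2]], b_ub=[-2, -3],
                   bounds=[(0, None)] * 2)

    print(-primal.fun, dual.fun)          # both print 18.0 (strong duality)
    y_hat = np.array([1.0, 2.0])          # a feasible (non-optimal) dual point
    print(6 * y_hat[0] + 10 * y_hat[1])   # 26.0 >= 18.0 (weak duality bound)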
We now develop another interesting relationship between the two problems
at their respective optima (x∗ and y∗). Recall from the algebraic analysis
of the simplex method that the optimal dual variables are given by

    y∗ = (cTB B−1)T.                                              (A.17)

This means that the optimal dual solution can be obtained from the final
simplex tableau (Table A.1) of the primal LP procedure. Recall that −cTB B−1
vector appears under the column of slack variables xS for the (∆f )q row. We
now demonstrate the interchangeability of both problems through the same
example problem given in Equation (A.3).
The final tableau using the simplex method is reproduced here for
convenience (Table A.2). Here, the slack variables are (x3, x4)T.

          2    3     0      0
cB Basic  x1   x2    x3     x4
2  x1     1    0     1.0    0.0    6
3  x2     0    1    −0.5    0.5    2
   (∆f)q  0    0    −0.5   −1.5    f(x) = 18

The vector under xS in the (∆f)q row is (−0.5, −1.5). In Table A.1, we have
marked this vector as −cTB B−1. Thus, the negative of this vector is nothing
but the optimal dual solution y∗ = (cTB B−1)T = (0.5, 1.5)T. Hence, without
solving the dual problem, we can predict its solution using the final tableau
of the primal LP procedure.
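In code, reading the dual solution off the tableau is a one-liner (a sketch;
the two numbers are the (∆f)q entries under the slack columns of Table A.2):

    import numpy as np
    df_under_slacks = np.array([-0.5, -1.5])  # (∆f)q entries under x3 and x4
    y_star = -df_under_slacks                 # y* = (cB^T B^-1)^T
    print(y_star)                             # [0.5 1.5]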
We now solve the dual problem graphically and investigate whether the
above-obtained solution is indeed the dual solution. The dual problem is
formulated in Equation (A.12). Figure A.6 shows the feasible space and the
corresponding optimal (dual) solution. Since the function 6y1 + 10y2 increases
away from the origin, the minimum lies at the intersection of the two dual
constraints, that is, at y∗ = (0.5, 1.5)T with w(y∗) = 18, identical to the
primal optimal function value.
Figure A.6 Graphical representation of the dual problem and its solution.
If we are to solve the dual problem directly using the simplex method, two
difficulties arise: (i) first, the problem is of minimization type, requiring
a conversion into an equivalent maximization problem; (ii) second, the
constraints are of ‘≥’-type, thereby requiring the use of artificial variables
and resorting to the dual phase method.
Although the Big-M method discussed in Section A.4 can be employed, in the
following section we suggest another method that makes solving such problems
relatively easier and also allows an easier way to perform a sensitivity
analysis.
Consider the following LP problem:
Minimize f(x) = 6x1 + 10x2,
subject to
x1 + x2 ≥ 2,
2x2 ≥ 3,                                                          (A.18)
x1, x2 ≥ 0.
Converting it to an equivalent maximization problem and multiplying the
‘≥’-type constraints by −1 so that slack variables x3 and x4 can be added, we
obtain:
Maximize −6x1 − 10x2,
subject to
−x1 − x2 + x3 = −2,
−2x2 + x4 = −3,                                                   (A.19)
x1, x2, x3, x4 ≥ 0.
It is clear that the initial basic variables x3 and x4 both take negative values
(x3 = −2 and x4 = −3). This is not allowed by the original simplex method
and we cannot proceed with the simplex algorithm. One way out is to
introduce artificial variables as discussed in Section A.3 and resort to a dual
phase LP method.
Here, we suggest a more efficient algorithm. The dual-simplex algorithm keeps
the dual of the given primal problem in mind. If the dual problem is feasible,
a method can be devised that, in principle, solves the dual problem while
operating only on the primal tableau. As we have discussed earlier, at the
optimum of the primal problem the dual problem is also at its optimum. Thus,
although the intermediate solutions remain infeasible for the primal problem,
the attempt to solve the dual problem eventually makes the algorithm converge
to the primal optimal solution. We describe the dual-simplex algorithm in the
following.
Algorithm
Step 1 Choose a dual-feasible solution, that is, for minimization problems,
choose c ≥ 0 and for maximization problems c ≤ 0. Set all nonbasic variables
to zero. Set iteration counter t = 0.
Step 2 If xB ≥ 0, terminate. Else, go to Step 3.
Step 3 Identify the leaving basic variable by the following criterion.
Determine

    xr = min_j { xj | xj < 0 }.

That is, find the basic variable xj that has the most negative bj value. This
variable xr leaves the basic variable set.
Step 4 Identify the nonbasic variable that should be made basic using the
following criterion. Determine

    q = argmin_k { (∆f)k / ark | ark < 0 }.

Variable xq then becomes basic in place of xr through a row operation (pivot)
at element arq. Set t = t + 1 and go to Step 2.
The reasons for the operations in Steps 3 and 4 are as follows. Note that
−b in the primal problem plays the role of the (∆f)q row of the dual problem.
Since the variable having the most positive (∆f)q entry is chosen as the
entering basic variable in the original simplex algorithm (applied here to
our dual problem), this amounts to choosing the variable having the most
negative bj value in the primal problem. The basic variables in the primal
problem are equivalent to the nonbasic variables in the dual problem.
Similarly, the minimum ratio rule for choosing the nonbasic variable that
would become basic in the original simplex algorithm (for our dual problem)
should now be applied as a ratio between (∆f)k and ark, but only for those
columns for which ark is negative. Thus, the above operations are designed by
visualizing how the dual problem would have been solved by the original
simplex method. Hence, this method is named the dual-simplex method.
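The following compact NumPy sketch implements these steps (my illustration,
not the book's code; the tableau convention stores one row per constraint as
[coefficients | b], plus a last row holding the (∆f) entries and −f):

    import numpy as np

    def dual_simplex(T, basis):
        """T: (m+1) x (n+1) tableau, last row = reduced costs and -f."""
        m = T.shape[0] - 1
        while True:
            rhs = T[:m, -1]
            if np.all(rhs >= 0):              # Step 2: primal feasible, stop
                return T, basis
            r = int(np.argmin(rhs))           # Step 3: most negative b leaves
            cols = [k for k in range(T.shape[1] - 1) if T[r, k] < 0]
            if not cols:                      # no negative entry in the row
                raise ValueError("primal problem is infeasible")
            q = min(cols, key=lambda k: T[m, k] / T[r, k])   # Step 4: min ratio
            T[r] /= T[r, q]                   # pivot at (r, q)
            for i in range(m + 1):
                if i != r:
                    T[i] -= T[i, q] * T[r]
            basis[r] = q

    # Problem (A.19): maximize -6x1 - 10x2 with slacks x3, x4 initially basic.
    T = np.array([[-1.0,  -1.0, 1.0, 0.0, -2.0],
                  [ 0.0,  -2.0, 0.0, 1.0, -3.0],
                  [-6.0, -10.0, 0.0, 0.0,  0.0]])
    T, basis = dual_simplex(T, [2, 3])
    print(basis, T[:2, -1], -T[2, -1])        # [0, 1] [0.5 1.5] -18.0

Run on problem (A.19), the sketch reproduces the tableaus shown below: x2
replaces x4, then x1 replaces x3, and the procedure stops at
(x1, x2) = (0.5, 1.5)T with f = −18.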
We now illustrate the working of the dual-simplex algorithm on the example
problem given in Equation (A.19).
Step 1 Since the cost terms (−6, −10) of this maximization problem are
non-positive, the initial solution with basic variables (x3, x4) is
dual-feasible. We set t = 0 and form the initial tableau:
−6 −10 0 0
cB Basic x1 x2 x3 x4
0 x3 −1 −1 1 0 −2
0 x4 0 −2 0 1 −3 ←
(∆f )q −6 −10 0 0 f (x) = 0
↑
Step 2 Here xB = (−2, −3)T . Since they are negative, we do not terminate.
Step 3 We now identify xr by looking at all negative bj values and choosing
the most negative one. The tableau above indicates that xr = x4 having
b4 = −3.
Step 4 Now we identify the corresponding nonbasic variable that has
a4k < 0 and minimum ratio between (∆f )k and a4k . In the above tableau,
there is only one entry for which a4k is negative. It is for variable x2 . That
is, q = 2. This indicates that variable x4 should now be replaced by variable
x2 in the next simplex.
We now form the second tableau and perform a row-echelon operation.
−6 −10 0 0
cB Basic x1 x2 x3 x4
0 x3 −1 0 1 −0.5 −0.5 ←
−10 x2 0 1 0 −0.5 1.5
(∆f)q −6 0 0 −5 f(x) = −15
↑
Repeating Steps 2 to 4, variable x3 (with the most negative RHS value, −0.5)
leaves the basis, and x1 (having the minimum ratio among the columns with
negative entries in that row) enters it. Another row operation yields the
following tableau:
−6 −10 0 0
cB Basic x1 x2 x3 x4
−6 x1 1 0 −1 0.5 0.5
−10 x2 0 1 0 −0.5 1.5
(∆f )q 0 0 −6 −2 f (x) = −18
Since all basic variables are now non-negative, the algorithm terminates with
the solution (x1, x2) = (0.5, 1.5)T and a function value of −18, that is, a
minimum of 18 for problem (A.18), identical to the dual solution found earlier.
Let us consider the primal problem given in Equation (A.3) again. After we
solve the problem, we obtain the tableau shown in Table A.2. At this point,
the optimal solution is (6, 2)T with a function value of 18. To perform a
sensitivity analysis, let us say that we would like to add a new constraint
x1 + x2 ≥ 2 (as used in Section A.3) and re-solve the problem, not by
formulating a new problem, but by starting from the final tableau obtained
after solving the original problem. With the third constraint, one more slack
variable (x5) is needed:
−x1 − x2 + x5 = −2.
The modified tableau is as follows:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 1 0 0 6
3 x2 0 1 −0.5 0.5 0 2
0 x5 −1 −1 0 0 1 −2
(∆f )q 0 0 −0.5 −1.5 0 f (x) = 18
We shall now apply the dual-simplex method to find the new solution.
Since the addition of the new constraint disturbs the row-echelon form of the
rows, we first restore it. For this purpose, we add rows 1, 2 and 3 together
and replace row 3 with the result. We obtain the following tableau:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 1 0 0 6
3 x2 0 1 −0.5 0.5 0 2
0 x5 0 0 0.5 0.5 1 6
(∆f )q 0 0 −0.5 −1.5 0 f (x) = 18
Since all the basic variables are non-negative, we terminate the dual-
simplex procedure in Step 2 of the algorithm and declare (x1, x2) = (6, 2)T as
the optimal solution to the new problem. Notice that the addition of the new
constraint x1 + x2 ≥ 2 does not alter the optimal solution, and note how
quickly the dual-simplex method is able to establish this fact. Recall that
the dual phase method solved the same problem in Section A.3 in a much more
computationally expensive manner.
Let us now add a constraint x2 ≥ 3 to the original problem. This changes
the optimal solution to (4, 3)T with a function value 17. Let us apply the
dual-simplex method to investigate if it is able to find this solution. We shall
begin with the final tableau obtained for solving the original problem. The
tableau is modified with the new constraint in the following:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 1 0 0 6
3 x2 0 1 −0.5 0.5 0 2
0 x5 0 −1 0 0 1 −3
(∆f )q 0 0 −0.5 −1.5 0 f (x) = 18
Since this disturbs the row-echelon nature of the rows, we add rows 2 and
3 together and replace row 3 with the result:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 1 0 0 6
3 x2 0 1 −0.5 0.5 0 2
0 x5 0 0 −0.5 0.5 1 −1 ←
(∆f )q 0 0 −0.5 −1.5 0 f (x) = 18
↑
Now, x5 (with b = −1) leaves the basis and x3 (the only column with a
negative entry in the x5-row) enters it. A row operation yields the following
tableau:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 0 1 2 4
3 x2 0 1 0 0 −1 3
0 x3 0 0 1 −1 −2 2
(∆f )q 0 0 0 −2 −1 f (x) = 17
Since all the basic variables take non-negative values, we stop the dual-
simplex algorithm. The optimal solution obtained from the final tableau
is (4, 3)T with a function value of 17. As additional information, the
corresponding shadow price (or Lagrange multiplier) values are 0, 2 and 1 for
constraints 1, 2 and 3, respectively. This means that the first constraint is
not active at the current optimal solution. As shown in Figure A.7, the
optimal solution corresponds to the intersection of the second and third
constraints, while constraint 1 is inactive at the optimal solution.
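Continuing the dual_simplex sketch given earlier, the same re-solve can be
reproduced by appending the new constraint row to the final tableau and
restoring row-echelon form before iterating (again my illustration; the last
cell of the cost row holds −f):

    import numpy as np
    # Final tableau of (A.3) with the row for x2 >= 3 (i.e., -x2 + x5 = -3).
    T = np.array([[1.0,  0.0,  1.0,  0.0, 0.0,   6.0],
                  [0.0,  1.0, -0.5,  0.5, 0.0,   2.0],
                  [0.0, -1.0,  0.0,  0.0, 1.0,  -3.0],
                  [0.0,  0.0, -0.5, -1.5, 0.0, -18.0]])
    T[2] += T[1]                 # restore row-echelon form (add row 2 to row 3)
    T, basis = dual_simplex(T, [0, 1, 4])
    print(T[:3, -1], -T[3, -1])  # [4. 3. 2.] 17.0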
Without solving the problem with the old and new constraints together, the
above dual-simplex method shows how the final simplex tableau of the original
problem can be modified to take care of any new constraint and obtain the new
optimal solution in a computationally quick manner.
Figure A.7 A new constraint (x2 ≥ 3) is added. The optimum changes to (4, 3)T .
Next, let us change the RHS of the first constraint of the original problem
to 12, that is, x1 ≤ 12, so that b = (12, 10)T. Thus, the new RHS vector is
B−1b = (12, −1)T. We modify the final tableau as follows:
2 3 0 0
cB Basic x1 x2 x3 x4
2 x1 1 0 1 0 12
3 x2 0 1 −0.5 0.5 −1 ←
(∆f )q 0 0 −0.5 −1.5
Variable x2 (with b = −1) leaves the basis and x3 (the only column with a
negative entry in the x2-row) enters it. A row operation produces the
following tableau:
2 3 0 0
cB Basic x1 x2 x3 x4
2 x1 1 2 0 1 10
3 x3 0 −2 1 −1 2
(∆f )q 0 −1 0 −2 f (x) = 20
Since all basic variables are now non-negative, the new optimal solution is
(10, 0)T with a function value of 20, as depicted in Figure A.8. The first
constraint is inactive at this solution, which is also evident from the zero
value of the Lagrange multiplier of the first constraint, as computed from
the tableau: cTB B−1 = (0, 2)T.
Figure A.8 RHS of g1 is changed to 12. The optimum changes to (10, 0)T.
We now consider a final problem in which a new constraint x1 + 5x2 ≥ 15
is added to the original problem and the RHS of the first constraint is changed
as x1 ≤ 12. The modified optimization problem is given as follows:
Maximize f(x) = 2x1 + 3x2,
subject to
x1 ≤ 12,
x1 + 2x2 ≤ 10,
x1 + 5x2 ≥ 15,
x1 , x2 ≥ 0.
Figure A.9 shows the feasible region and the optimal solution. It can be seen
that the optimal solution is (6.67, 1.67)T having a function value equal to
18.33.
Figure A.9 A new constraint (g3 ) is added and RHS of g1 is changed. Dual-
simplex method finds the true optimum (6.67, 1.67)T .
We begin solving the problem from the final tableau of the original simplex
method (of solving the problem in Equation (A.3)) by adding a third row
corresponding to the new constraint and by changing the RHS vector for the
first two rows as B−1 b = (12, −1)T . The modified tableau is as follows:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 1 0 0 12
3 x2 0 1 −0.5 0.5 0 −1
0 x5 −1 −5 0 0 1 −15
(∆f )q 0 0 −0.5 −1.5 0
We perform a row-echelon operation (adding row 1 and five times row 2 to
row 3) and obtain the following tableau.
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 1 0 0 12
3 x2 0 1 −0.5 0.5 0 −1
0 x5 0 0 −1.5 2.5 1 −8 ←
(∆f )q 0 0 −0.5 −1.5 0
↑
Now x5 (with b = −8) leaves the basis and x3 (the only column with a negative
entry in the x5-row) enters it. A row operation yields the final tableau:
2 3 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x1 1 0 0 1.67 0.67 6.67
3 x2 0 1 0 −0.33 −0.33 1.67
0 x3 0 0 1 −1.67 −0.67 5.33
(∆f )q 0 0 0 −2.33 −0.33 f (x) = 18.33
All basic variables take non-negative values; hence the tableau represents
the optimal solution (x1, x2) = (6.67, 1.67)T with a function value equal to
18.33. As a by-product, we obtain the shadow prices as zero, 2.33 and 0.33
for constraints 1, 2, and 3, respectively. As is evident, the first constraint
is inactive and the optimal solution lies at the intersection of constraints 2
and 3.
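As a cross-check, the modified problem can also be solved from scratch (a
sketch using SciPy's linprog, not the book's code; the ‘≥’ constraint is
negated into ‘≤’ form):

    from scipy.optimize import linprog
    res = linprog([-2, -3], A_ub=[[1, 0], [1, 2], [-1, -5]],
                  b_ub=[12, 10, -15], bounds=[(0, None)] * 2)
    print(res.x, -res.fun)   # approx. [6.67 1.67] 18.33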
A.10 Summary
Linear programming methods are widely used in problems where the objective
function as well as the constraints are linear. Usually, all design variables are
restricted to be nonnegative. In these problems, one of the corner points
of the feasible search space is the optimum point. The simplex method of
linear programming begins from a basic feasible solution (a corner point
in the feasible search space) and moves to a neighbouring basic feasible
solution that increases the objective function value the most. This feature
of the simplex search method makes it efficient and popular in solving various
linear programming problems. In such formulations, inequality constraints are
converted into equality constraints by adding or subtracting slack variables.
Often, the addition of slack variables alone does not yield an initial basic
feasible solution. Artificial variables are then added and a dual phase
strategy is used: an initial basic feasible solution is found first, and the
regular simplex method is then used to find the exact optimal solution. Simple
numerical exercise problems are taken to illustrate the working of the simplex
method. Interested
readers may refer to other textbooks for more details on linear programming
methods (Taha, 1989).
Thereafter, the Big-M method is described to handle greater-than-or-equal-to
type inequality constraints. An algebraic description of the simplex method is
presented next to provide a symbolic representation of the working principle
of the simplex method. Initial and final simplexes are shown in algebraic
form, so that a clear idea of the quantities available at the final simplex
can be obtained. The results are used in subsequent sections related to sensitivity
analysis and the dual-simplex method.
REFERENCES
PROBLEMS
(b) Maximize x1
subject to
2x1 + x2 ≤ 2,
x1 + 5x2 + 10 ≥ 0,
x2 ≤ 1.
4x1 + x2 ≥ 3,
4x1 + 3x2 ≤ 6,
x1 + 2x2 ≤ 3,
x1 , x2 ≥ 0.
A-2 Solve the following linear program by using the simplex method
(tableau method):
Maximize 0.9x1 + x2 ,
subject to
2x1 + 3x2 ≤ 9,
|x2 − x1 | ≤ 1,
x1 , x2 ≥ 0.
Plot a neat sketch of the constraints and the feasible region and mark the
proceedings of each tableau on the plot.
A-3 Solve the following linear program by using the simplex method
(tableau method):
Maximize x1 + x2 ,
subject to
2x1 + 3x2 ≤ 12,
|x2 − x1 | ≤ 1,
x1 , x2 ≥ 0.
Plot a neat sketch of the constraints and the feasible region and mark the
proceedings of each tableau on the plot.
subject to
4x1 + 2x2 + x3 ≤ 8,
x1 + x2 + x3 ≥ 4,
x1, x2, x3 ≥ 0.
A-5 Find the intersecting point of the following pairs of straight lines by
formulating an LP program:
(a) x1 − 2x2 + 1 = 0,
5x1 + 3x2 − 10 = 0.
(b) x1 + x2 − 1 = 0,
10x1 + x2 + 5 = 0.
Maximize θ
subject to
∇f (x(t) ) · t ≤ ∇f (x(t) ) · A − θ,
x2 ≤ 2,
2x1 + x2 ≥ 1,
x1 , x2 ≥ 0.
2x1 + x2 ≥ 1,
x1 + 2x2 ≥ 1,
x1 , x2 ≥ 0.
0 ≤ x1 , x2 ≤ 1.
A-11 Formulate the dual problem and solve to find the dual solution:
Maximize 3x1 + 2x2
subject to
x1 + 3x2 ≤ 3,
5x1 − x2 = 4,
x1 + 2x2 ≤ 2,
x1 , x2 ≥ 0.
x1 + 2x2 ≤ 2,
x1 , x2 ≥ 0, x3 free,
x1 , x2 ≥ 0.
A-14 Find the range of values of the cost terms in Problem A-13 above for
which the optimum solution does not change.
A-15 If the objective function of Problem A-13 is changed to 1.5x1 + 0.8x2,
determine whether the original optimal solution remains optimal using the
100% rule.
A-16 If a new constraint 3x1 + 5x2 ≤ 15 is added to Problem A-13, determine
the new optimal solution using the dual-simplex method.
A-17 Consider the following LP problem:
Maximize x1 + 3x2
subject to
x1 + x2 ≤ 8,
−x1 + x2 ≤ 4,
x1 ≤ 6,
x1 , x2 ≥ 0.
Maximize x1 + x2
subject to
3x1 + 5x2 ≤ 7,
0 ≤ x1 , x2 ≤ 1.
Maximize x1 + 2x2
subject to
2x1 + x2 ≤ 4,
x1 − x2 ≤ 2,
x1 , x2 ≥ 0.
Maximize 3x1 + 2x2,
subject to
2x1 + x2 ≤ 10,
x1 + x2 ≤ 8,
x1 ≤ 4,
x1 , x2 ≥ 0.
the following final tableau is obtained (x3, x4 and x5 are slack variables
for the three constraints, respectively):
3 2 0 0 0
cB Basic x1 x2 x3 x4 x5
2 x2 0 1 −1 2 0 6
0 x5 0 0 −1 1 1 2
3 x1 1 0 1 −1 0 2
(∆f)q 0 0 −1 −1 0 f(x) = 18
subject to
x1 + x2 + x3 ≥ 1,
x1 , x2 , x3 ≥ 0,
the following optimal tableau is obtained (x4 , x5 , and x6 are slack variables
for three constraints):
Index
100% rule for cost coefficients, 393
100% rule for RHS coefficients, 391
Active constraint, 144
  strategy, 232, 239
Ant-based EMO, 323
Arithmetic-geometric-mean inequality, 278
Artificial neural networks, 18
Artificial variables, 377
Basic feasible solution, 371, 377
Basic variable, 371
BFS method, 134
Bi-level optimization, 38
Big-M method, 381
Binary tournament selection, 314
Bisection method, 65
  algorithm of, 65
BLX operator, 316
Boltzmann distribution, 325
Bounding phase method, 49, 73, 88, 106
  algorithm of, 49
  computer code of, 78
Box’s evolutionary optimization, 90
  algorithm of, 90
Box’s method
  constrained functions for, 177
  unconstrained functions for, 90
Bracket operator penalty, 156, 162
Bracketing algorithms, 46
Branch-and-bound method, 270
  algorithm of, 270
Car suspension design, 12
  dynamic model of, 12
Cauchy’s method, 112–114, 118, 204
  algorithm of, 112
Central difference technique, 108
Classical optimization methods, 37
Classification of optimization methods, 35
Complex search method, 177–182
  algorithm of, 178
Conditioned simplex method, 215
Conjugate, 103
Conjugate direction method, 103–108
  algorithm of, 105
Conjugate gradient method, 120–124
  algorithm of, 121
Constrained domination, 323
Constrained optimization algorithms, 143
Constraint qualification, 149
Constraints, 4
  active, 144, 233
  equality, 4, 143
  inactive, 144
  inequality, 4, 144