
A VIEW ON NONLINEAR OPTIMIZATION

JOSÉ HERSKOVITS

Mechanical Engineering Program
COPPE / Federal University of Rio de Janeiro¹
Caixa Postal 68503, 21945-970 Rio de Janeiro, BRAZIL

1. Introduction
Once the concepts of a material object are established, the act of designing consists in choosing the values of the quantities that prescribe the object, or dimensioning. These quantities are called Design Variables. A particular value assumed by the design variables defines a configuration. The design must meet Constraints given by physical or other limitations. We have a feasible configuration if all the constraints are satisfied.

A better design is obtained if an appropriate cost or objective function can be reduced. The objective is a function of the design variables and quantifies the quality of the material object to be designed. The design is optimum when the cost is the lowest among all the feasible designs.

If we call $x \equiv [x_1, x_2, ..., x_n]^t$ the design variables, $f(x)$ the objective function and $\Omega$ the feasible set, the optimization problem can be denoted

$$\text{minimize } f(x); \quad \text{subject to } x \in \Omega. \qquad (1.1)$$

This problem is said to be a Mathematical Program and the discipline that studies the numerical techniques to solve Problem (1.1) is called Mathematical Programming. Mathematical programs arise naturally in optimization problems related to a wide set of disciplines that employ mathematical models; moreover, several physical phenomena can themselves be modeled by means of mathematical programs. This is the case when the equilibrium is attained at the minimum of an energy function.
¹ Partially written at INRIA, Institut National de Recherche en Informatique et en Automatique, Rocquencourt, France.


The first stage to get an optimal design is to define the Optimization Model; that is, to select appropriate design variables, an objective function and the feasible set. In engineering design, we generally have
$$\Omega \equiv \{x \in R^n \,/\, g_i(x) \le 0,\ i = 1, 2, ..., m;\ h_i(x) = 0,\ i = 1, 2, ..., p\},$$
where $x \equiv [x_1, x_2, ..., x_n]^t$ and $g_i$ and $h_i$ are the Inequality and Equality Constraints respectively. Since it involves nonlinear functions, Problem (1.1) is a Nonlinear Program. The optimization problem is said to be Unconstrained when $\Omega \equiv R^n$.
Mathematical Programming provides a general and flexible formulation for engineering design problems. Once the optimization model is established, nonlinear programming algorithms only require the computation of $f$, $g_i$, $h_i$ and their derivatives at each iteration.

Nowadays, strong and efficient mathematical programming techniques for several kinds of problems, based on solid theoretical results and extensive numerical studies, are available. Approximate functions, derivatives and optimal solutions can be employed together with optimization algorithms to reduce the computer time.
The aim of this paper is not to describe the state of the art of nonlinear programming, but to explain in a simple way most of the modern techniques applied in this discipline. With this objective, we include the corresponding algorithms in the framework of a general approach, based on Newton-like iterations for nonlinear systems. These iterations are used to find points verifying first order optimality conditions. Some favorable characteristics of optimization problems, conveniently exploited, lead to strong algorithms with global convergence.
There exists a very large bibliography on nonlinear programming. We only mention the books by Bazaraa and Shetty [7], Dennis and Schnabel [10], Fletcher [14], Gill et al. [19], Hiriart-Urruty and Lemarechal [32], Luenberger [35], [36] and Minoux [39]. Several books written by engineering and/or structural designers include numerical optimization techniques in the framework of optimal design, such as Arora [1], Haftka et al. [20], Haug and Arora [22], Kirsch [34], Fox [15] and Vanderplaats [32].
We discuss some basic concepts of mathematical programming in the next section and, in the following one, optimality conditions are studied. A view of Newton-like algorithms for nonlinear systems is given in section 4. Unconstrained and equality constrained optimization techniques are studied in sections 5 and 6. The Sequential Quadratic Programming method is discussed in section 7, where, in addition to a classical algorithm based on this method, a feasible directions algorithm is presented. A Newton-like approach for interior point algorithms, proposed by the author [30], [31], is presented in the last section.
2. Some basic concepts
We deal with the nonlinear optimization problem

$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g_i(x) \le 0;\ i = 1, 2, ..., m \\ \text{and} & h_i(x) = 0;\ i = 1, 2, ..., p, \end{array} \qquad (2.1)$$

where $f$, $g$ and $h$ are smooth functions in $R^n$ and at least one of these functions is nonlinear. An inequality constraint is said to be Active at $x$ if $g_i(x) = 0$ and Inactive if $g_i(x) < 0$. Denoting $g(x) = [g_1(x), g_2(x), ..., g_m(x)]^t$ and $h(x) = [h_1(x), h_2(x), ..., h_p(x)]^t$, we have

$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g(x) \le 0 \\ \text{and} & h(x) = 0. \end{array} \qquad (2.2)$$

We introduce now the auxiliary variables $\lambda \in R^m$ and $\mu \in R^p$, called Dual Variables or Lagrange Multipliers, and define the Lagrangian Function associated with Problem (2.1) as
$$l(x, \lambda, \mu) \equiv f(x) + \lambda^t g(x) + \mu^t h(x).$$
In what follows, some definitions concerning Problem (2.1) and the methods to solve this problem are presented. First, the meaning of the statement of the optimization problem is discussed.

DEFINITION 2.1. A point $x^* \in \Omega$ is a Local Minimum (or Relative Minimum) of $f$ over $\Omega$ if there exists a neighborhood $\Theta \equiv \{x \in \Omega \,/\, \|x - x^*\| \le \varepsilon\}$ such that $f(x) \ge f(x^*)$ for any $x \in \Theta$. If $f(x) > f(x^*)$ for any $x \in \Theta$, $x \ne x^*$, then $x^*$ is a Strict Local Minimum.

DEFINITION 2.2. A point $x^* \in \Omega$ is a Global Minimum (or Absolute Minimum) of $f$ over $\Omega$ if $f(x) \ge f(x^*)$ for any $x \in \Omega$.

Note that a global minimum is also a local minimum. The nature of optimization implies the search for the global minimum. Unfortunately, global minima can be characterized only in some particular cases, as in Convex Programming.
Usually Nonlinear Programming methods are iterative. Given an initial point $x^0$, a sequence of points, $\{x^k\}$, is obtained by repeated applications of an algorithmic rule. This sequence must converge to a solution $x^*$ of the problem. The convergence is said to be asymptotic when the solution is not achieved after a finite number of iterations. Except in some particular cases, like linear or quadratic programming, such is the case in nonlinear optimization.
DEFINITION 2.3. An iterative algorithm is said to be Globally Convergent if for any initial point $x^0 \in R^n$ (or $x^0 \in \Omega$) it generates a sequence of points converging to a solution of the problem.

DEFINITION 2.4. An iterative algorithm is Locally Convergent if there exists a positive $\varepsilon$ such that for any initial point $x^0 \in R^n$ (or $x^0 \in \Omega$) verifying $\|x^0 - x^*\| \le \varepsilon$, it generates a sequence of points converging to a solution of the problem.
Modern Mathematical Programming techniques seek globally convergent methods; locally convergent algorithms are not useful in practice since the neighborhood of convergence is not known in advance.

The major objective of numerical techniques is to have strong methods with global convergence. Once this is obtained, engineers are concerned with efficiency. In Design Optimization, the evaluation of functions and derivatives generally takes more computer time than the internal computations of the algorithm itself. Then, the number of iterations gives a good idea of the computer time required to solve the problem. The following definition introduces a criterion to evaluate the speed of convergence of asymptotically convergent iterative methods.
DEFINITION 2.5. The Order of Convergence of a sequence $\{x^k\} \to x^*$ is the largest of the nonnegative numbers $p$ satisfying
$$\lim_{k \to \infty} \frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|^p} = \beta < \infty.$$

When $p = 1$ we have Linear Convergence with Convergence Ratio $\beta < 1$. If $\beta = 0$ the convergence is said to be Superlinear. The convergence is Quadratic in the case when $p = 2$.
Since they involve the limit when $k \to \infty$, $p$ and $\beta$ are a measure of the asymptotic speed of convergence. Unfortunately, a sequence with a good order of convergence may be very slow far from the solution.

The convergence is faster when $p$ is larger and $\beta$ is smaller. Near the solution, if the convergence is linear the error is multiplied by $\beta$ at each iteration, while the error is squared for quadratic convergence. The methods that will be studied here have rates varying between linear and quadratic.
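To make these rates concrete, the following small Python script (an illustration added here, not taken from the original text) compares the error decay of a linearly convergent sequence with ratio $\beta = 0.5$ and of a quadratically convergent one; after five iterations the quadratic error is already below $10^{-9}$ while the linear one has only been halved five times.

    # Linear versus quadratic error decay, starting from the same initial error.
    e_lin, e_quad = 0.5, 0.5
    for k in range(6):
        print(f"k={k}:  linear e={e_lin:.3e}   quadratic e={e_quad:.3e}")
        e_lin *= 0.5        # linear convergence: error multiplied by beta = 0.5
        e_quad = e_quad**2  # quadratic convergence: error squared each iteration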

Figure 1. (a) Descent directions (left); (b) Feasible directions (right)

In general, globally convergent algorithms define at each point a search direction and look for a new point on that direction.

DEFINITION 2.6. A vector $d \in R^n$ is a Descent Direction of a real function $f$ at $x \in R^n$ if there exists $\delta > 0$ such that $f(x + td) < f(x)$ for any $t \in (0, \delta)$.

If $f$ is differentiable at $x$ and $d^t \nabla f(x) < 0$, it is easy to prove [7] that $d$ is a descent direction of $f$.

In Figure 1.a, constant value contours of $f(x)$ are represented. We note that $f(x)$ decreases in any direction that makes an angle greater than 90 degrees with $\nabla f(x)$. The set of all descent directions constitutes the half space $D$.
DEFINITION 2.7. A vector $d \in R^n$ is a Feasible Direction of the problem (2.1) at $x \in \Omega$, if for some $\theta > 0$ we have $x + td \in \Omega$ for all $t \in [0, \theta]$.

In Figure 1.b the feasible region of an inequality constrained problem is represented. The vector $d$ is a feasible direction, since it supports a nonzero segment $[x, x + d]$. Any direction is feasible at an interior point. At the boundary, the feasible directions constitute a cone $F$ that we call the Cone of Feasible Directions. This cone is not necessarily closed.
3. Optimality conditions
A first requirement to solve optimization problems is to characterize the
solutions by conditions that are easy to verify. These conditions will be
useful to identify a minimum point and, frequently, will be at the heart of
numerical techniques to solve the problem.

Figure 2. Illustration of the first order optimality condition

Optimality conditions are based on differential calculus, and they are said to be of first order if they involve only first derivatives. They are of second order if second derivatives are also required. In what follows we describe a series of optimality conditions for unconstrained and constrained problems based on Luenberger [36], where they are proved. Since differential calculus gives only local information about the problem and we do not include convexity assumptions, only relative minima can be characterized.

All these conditions can be proved by considering that any feasible curve $x(t)$ passing through a local minimum $x^*$ of problem (2.1) has a local minimum of $f[x(t)]$ at $x^*$. The results are obtained by applying optimality conditions of one dimensional problems to $f[x(t)]$. The following theorem gives a geometric interpretation of optimality conditions of a large class of problems.
THEOREM 3.1. First and second order necessary conditions. If $x^*$ is a local minimum of $f$ over $\Omega$ then, for any feasible direction $d \in R^n$, we have
i) $d^t \nabla f(x^*) \ge 0$;
ii) if $d^t \nabla f(x^*) = 0$, then $d^t \nabla^2 f(x^*) d \ge 0$. $\Box$

The first result means that no improving direction of $f$ at $x^*$ is a feasible direction, that is, $F \cap D = \emptyset$. This is the case in Figure 2, where $x^*$ is a minimum while $\bar{x}$ is not. In fact, walking along $d$ from $\bar{x}$, we can get a new feasible point with lower $f$.
In what follows, optimality conditions for unconstrained, equality constrained and inequality constrained optimization are discussed.


3.1. UNCONSTRAINED OPTIMIZATION

Since any $d \in R^n$ is a feasible direction, the well known optimality conditions for unconstrained optimization are easily derived from Theorem 3.1.

COROLLARY 3.2. First and second order necessary conditions. If $x^*$ is a local minimum of $f$ over $R^n$, then
i) $\nabla f(x^*) = 0$;
ii) for all $d \in R^n$, $d^t \nabla^2 f(x^*) d \ge 0$. That is, $\nabla^2 f(x^*)$ is positive semidefinite. $\Box$

A sufficient local optimality condition for this problem is stated below. It requires the computation of the first two derivatives of $f$.

THEOREM 3.3. Sufficient optimality conditions. Let $f$ be a twice continuously differentiable scalar function in $R^n$ and $x^*$ such that
i) $\nabla f(x^*) = 0$;
ii) $\nabla^2 f(x^*)$ is positive definite.
Then, $x^*$ is a strict local minimum point of $f$. $\Box$

3.2. EQUALITY CONSTRAINED OPTIMIZATION

We consider now the equality constrained problem

$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & h(x) = 0 \end{array} \qquad (3.1)$$

and introduce some definitions concerning the constraints of this problem.

DEFINITION 3.4. Let $x$ be a point in $\Omega$ and consider all the continuously differentiable curves in $\Omega$ that pass through $x$. The collection of all the vectors tangent to these curves at $x$ is said to be the Tangent Set to $\Omega$ at $x$.

DEFINITION 3.5. A point $x \in \Omega$ is a Regular Point of the constraints if the vectors $\nabla h_i(x)$, for $i = 1, 2, ..., p$, are linearly independent.


Regularity will be a requirement for most of the theoretical results and numerical methods in constrained optimization. At a regular point $x$, it is proved that
i) the Tangent Set to $\Omega$ at $x$ constitutes a subspace, called the Tangent Space;
ii) the tangent space at $x$ is
$$T \equiv \{y \,/\, \nabla h^t(x) y = 0\}.$$

For a given constraint geometry, even in the case when the tangent set constitutes a subspace, a point may or may not be regular depending on how $h$ is defined [36]. The following optimality conditions are stated.

LEMMA 3.6. Let $x^*$, a regular point of the constraints $h(x) = 0$, be a local minimum of Problem (3.1). Then $\nabla f(x^*)$ is orthogonal to the tangent space. $\Box$

Since $\nabla f(x^*)$ is orthogonal to the tangent space of the constraints, it can be expressed as a linear combination of their gradients. This gives the following result.
THEOREM 3.7. First Order Necessary Conditions. Let $x^*$, a regular point of the constraints $h(x) = 0$, be a local minimum of Problem (3.1). Then, there is a vector $\mu^* \in R^p$ such that
$$\nabla f(x^*) + \nabla h(x^*)\mu^* = 0$$
$$h(x^*) = 0. \qquad \Box$$

This theorem is valid even in the trivial case when $p = n$. Introducing now the Lagrangian, we see that a feasible point satisfying the first order optimality conditions can be obtained by solving the nonlinear system in $(x, \mu)$
$$\nabla_x l(x, \mu) = 0$$
$$\nabla_\mu l(x, \mu) = 0. \qquad (3.2)$$

Then, the Lagrangian plays a role similar to that of the objective function in unconstrained optimization. This is also true when we consider the following second order conditions.
THEOREM 3.8. Second Order Necessary Conditions. Let $x^*$, a regular point of the constraints $h(x) = 0$, be a local minimum of Problem (3.1). Then there is a vector $\mu^* \in R^p$ such that the result of Theorem 3.7 is true and the matrix
$$H(x^*, \mu^*) = \nabla^2 f(x^*) + \sum_{i=1}^{p} \mu_i^* \nabla^2 h_i(x^*)$$
is positive semidefinite on the tangent space, that is, $y^t H(x^*, \mu^*) y \ge 0$ for all $y \in T$. $\Box$
The matrix $H(x^*, \mu^*)$ plays a very important role in constrained optimization. When the constraints are linear, we have $H(x^*, \mu^*) = \nabla^2 f(x^*)$, and it follows from the previous theorem that $\nabla^2 f(x^*)$ is positive semidefinite on the space defined by the constraints. This is a natural result in view of Theorem 3.3. When there are nonlinear constraints, $H(x^*, \mu^*)$ takes into account their curvature.

THEOREM 3.9. Second Order Sufficiency Conditions. Let the point $x^*$ satisfy $h(x^*) = 0$. Let $\mu^* \in R^p$ be a vector such that
$$\nabla f(x^*) + \nabla h(x^*)\mu^* = 0$$
and $H(x^*, \mu^*)$ be positive definite on the tangent space. Then $x^*$ is a Strict Local Minimum of Problem (3.1). $\Box$
3.3. INEQUALITY CONSTRAINED OPTIMIZATION

Let us consider the inequality constrained problem

$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g(x) \le 0. \end{array} \qquad (3.3)$$

We call $I(x) \equiv \{i \,/\, g_i(x) = 0\}$ the Set of Active Constraints at $x$ and say that $x$ is a Regular Point if the vectors $\nabla g_i(x)$ for $i \in I(x)$ are linearly independent. The Number of Active Constraints at $x$ is $\mathrm{Card}[I(x)]$.

It is easy to prove that, if $d^t \nabla g_i(x) < 0$ for $i \in I(x)$, then $d$ is a feasible direction of the constraints at $x$.

Suppose now that $x^*$, a regular point, is a local minimum of Problem (3.3). It is clear that $x^*$ is also a local minimum of $f(x)$ subject to $g_i(x) = 0$, for $i \in I(x^*)$. Then, it follows from Theorem 3.7 that there is a vector $\lambda^* \in R^m$ such that
$$\nabla f(x^*) + \nabla g(x^*)\lambda^* = 0, \qquad (3.4)$$
where $\lambda_i^* = 0$ for $i \notin I(x^*)$.


The condition $\lambda_i^* = 0$ for $i \notin I(x^*)$ is called the Complementarity Condition and can be represented by means of the following equalities:
$$\lambda_i^* g_i(x^*) = 0; \quad \text{for } i = 1, 2, ..., m.$$
If we define $G(x) \equiv \mathrm{diag}[g(x)]$, a diagonal matrix such that $G_{ii}(x) = g_i(x)$, the complementarity condition is expressed as
$$G(x^*)\lambda^* = 0.$$

An additional necessary condition,
$$\lambda^* \ge 0,$$
is obtained as a consequence of the first result of Theorem 3.1. In effect, let us assume that the condition is not true. Then, for some $l \in I(x^*)$, we have $\lambda_l^* < 0$. As $x^*$ is a regular point, given $\alpha_i < 0$, $i \in I(x^*)$, we can find a feasible direction $d$ verifying $d^t \nabla g_i(x^*) = \alpha_i$. It follows from (3.4) that
$$d^t \nabla f(x^*) = - \sum_{i \in I(x^*),\, i \ne l} \lambda_i^* \alpha_i - \lambda_l^* \alpha_l.$$

Taking now $|\alpha_i|$, for $i \in I(x^*)$ and $i \ne l$, small enough, we can get a feasible direction $d$ such that $d^t \nabla f(x^*) < 0$. Then, $d$ is a descent direction of $f$, but this conclusion is in contradiction with Theorem 3.1. These results constitute the Karush-Kuhn-Tucker optimality conditions.
THEOREM 3.10. First Order Necessary Conditions. Let $x^*$, a regular point of the constraints $g(x) \le 0$, be a Local Minimum of Problem (3.3). Then, there is a vector $\lambda^* \in R^m$ such that
$$\nabla f(x^*) + \nabla g(x^*)\lambda^* = 0$$
$$G(x^*)\lambda^* = 0$$
$$\lambda^* \ge 0$$
$$g(x^*) \le 0. \qquad \Box$$

In Figure 3 we have the convex cone $F^*$, defined by all the positive linear combinations of the gradients of the active constraints [7]. The previous theorem implies that, if $x^*$ is a local minimum, then $-\nabla f(x^*) \in F^*$.
3.4. GENERAL CONSTRAINED OPTIMIZATION

The optimality conditions discussed above are easily generalized to the optimization problem (2.1), with equality and inequality constraints.

Figure 3. Illustration of Karush-Kuhn-Tucker conditions

DEFINITION 3.11. A point $x \in \Omega$ is a Regular Point of the constraints if the vectors $\nabla h_i(x)$, for $i = 1, 2, ..., p$, and $\nabla g_i(x)$ for $i \in I(x)$ are linearly independent. The tangent space at $x$ is now
$$T \equiv \{y \,/\, \nabla g_i^t(x) y = 0 \text{ for } i \in I(x) \text{ and } \nabla h^t(x) y = 0\}.$$

THEOREM 3.12. Karush-Kuhn-Tucker First Order Necessary Conditions. Let $x^*$, a regular point of the constraints $g(x) \le 0$ and $h(x) = 0$, be a Local Minimum of Problem (2.1). Then, there is a vector $\lambda^* \in R^m$ and a vector $\mu^* \in R^p$ such that
$$\nabla f(x^*) + \nabla g(x^*)\lambda^* + \nabla h(x^*)\mu^* = 0 \qquad (3.5)$$
$$G(x^*)\lambda^* = 0 \qquad (3.6)$$
$$h(x^*) = 0 \qquad (3.7)$$
$$g(x^*) \le 0 \qquad (3.8)$$
$$\lambda^* \ge 0. \qquad (3.9) \qquad \Box$$

THEOREM 3.13. Second Order Necessary Conditions. Let $x^*$, a regular point of the constraints $g(x) \le 0$ and $h(x) = 0$, be a local minimum of Problem (2.1). Then there is a vector $\lambda^* \in R^m$ and a vector $\mu^* \in R^p$ such that the result of Theorem 3.12 is true and the matrix
$$H(x^*, \lambda^*, \mu^*) = \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 g_i(x^*) + \sum_{i=1}^{p} \mu_i^* \nabla^2 h_i(x^*)$$
is positive semidefinite on the tangent space, that is, $y^t H(x^*, \lambda^*, \mu^*) y \ge 0$ for all $y \in T$. $\Box$
THEOREM 3.14. Second Order Sufficiency Conditions. Let the point $x^*$ satisfy $g(x^*) \le 0$ and $h(x^*) = 0$. Let there be a vector $\lambda^* \in R^m$, $\lambda^* \ge 0$, and a vector $\mu^* \in R^p$ such that
$$\nabla f(x^*) + \nabla g(x^*)\lambda^* + \nabla h(x^*)\mu^* = 0$$
and $H(x^*, \lambda^*, \mu^*)$ be positive definite on the tangent space. Then $x^*$ is a Strict Local Minimum of Problem (2.1). $\Box$
4. Newton-like Algorithms for Nonlinear Systems
In this section we discuss iterative methods for solving
$$\Phi(y) = 0, \qquad (4.1)$$
where $\Phi: R^n \to R^n$ is continuously differentiable.

Let us write (4.1) as follows:
$$y = y - \Phi(y). \qquad (4.2)$$

To find a solution of (4.1) by Successive Approximations, we give an initial trial point and repeatedly substitute it in the right side of (4.2), obtaining the sequence
$$y^{k+1} = y^k - \Phi(y^k). \qquad (4.3)$$

Under appropriate conditions, this sequence converges to a solution of the system.

ALGORITHM 4.1 Successive Approximations Method

Data. Initial $y \in R^n$.

Step 1. Computation of the step $d$:
$$d = -\Phi(y). \qquad (4.4)$$

Step 2. Update. Set
$$y := y + d.$$

Step 3. Go back to Step 1.

The above algorithm is said to be a Fixed Point Algorithm because, if a solution is attained, then $d = 0$ and the rest of the sequence stays unchanged.

We define now the function $\Psi(y) \equiv y - \Phi(y)$. The theorem that follows gives conditions for global convergence and results about the speed of convergence of Algorithm 4.1. These results are proved in [35].

THEOREM 4.1. Let $\|\nabla \Psi(y)\| \le \beta < 1$ on $R^n$. Then there is a unique solution $y^*$ of (4.1) and the sequence generated by the Successive Approximations Method converges to $y^*$ for any initial $y^0 \in R^n$. The order of convergence is linear and the rate is equal to $\beta$. $\Box$
The assumptions of the previous theorem restrict the application of the successive approximations method to a particular class of problems.

Figure 4. Iterations of the successive approximations method

In Figure 4 the process of solving a one-dimensional equation is illustrated. This is equivalent to finding the point of intersection of $z = \Psi(y)$ with $z = y$. In Figure 4.a the slope of $\Psi(y)$ is less than unity and the iterates converge, while in 4.b the process diverges. Since the order of convergence is linear, Algorithm 4.1 is also called the Linear Iterations Algorithm.
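As an illustration, a minimal Python sketch of Algorithm 4.1 might look as follows; the stopping test, the tolerance and the example function are additions for the sake of a runnable example and are not part of the algorithm as stated above.

    import numpy as np

    def successive_approximations(phi, y0, tol=1e-10, max_iter=100):
        # Fixed point iterations y := y + d with d = -phi(y), as in Algorithm 4.1.
        y = np.asarray(y0, dtype=float)
        for _ in range(max_iter):
            d = -phi(y)                      # Step 1: computation of the step
            y = y + d                        # Step 2: update
            if np.linalg.norm(d) < tol:      # stopping rule (added for practicality)
                break
        return y

    # Example: phi(y) = 0.5 y - 1, so Psi(y) = y - phi(y) has slope 0.5 < 1 and
    # Theorem 4.1 guarantees linear convergence to the solution y* = 2.
    root = successive_approximations(lambda y: 0.5 * y - 1.0, np.array([0.0]))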
Let $y^k$ be an estimate of $y^*$. In a neighborhood of $y^k$, we have
$$\Phi(y) \approx \Phi(y^k) + \nabla \Phi^t(y^k)(y - y^k).$$
Then, a better estimate $y^{k+1}$ can be obtained by making
$$\Phi(y^k) + \nabla \Phi^t(y^k)(y^{k+1} - y^k) = 0, \qquad (4.5)$$

which defines Newton's Method iteration.

Figure 5. Newton's iterations


ALGORITHM 4.2 Newton's Method

Data. Initial $y \in R^n$.

Step 1. Computation of the step $d$.
Solve the linear system for $d$:
$$[\nabla \Phi(y)]^t d = -\Phi(y). \qquad (4.6)$$

Step 2. Update. Set
$$y := y + d.$$

Step 3. Go back to Step 1.

The process of Newton's method is illustrated in Figure 5. At a given estimate of $y^*$ the function is approximated by its tangent. A new estimate is then taken at the point where the tangent crosses the $y$ axis.

The theorem below gives the conditions for local convergence and studies the speed of convergence. It can be proved in a similar way as in [9], [35] and [36].

THEOREM 4.2. Let $\Phi(y)$ be twice continuously differentiable and $y^*$ a solution of $\Phi(y) = 0$. Assume that $\nabla \Phi(y^*)^{-1}$ exists. Then, if started close enough to $y^*$, Algorithm 4.2 does not fail and it generates a sequence that converges to $y^*$. The convergence is at least quadratic. $\Box$

The algorithm does not fail if (4.6) has a unique solution. This method's major advantage comes from its speed of convergence. However, it requires the evaluation of the Jacobian $\nabla \Phi$ and the solution of a linear system at each iteration, which can be very expensive in terms of computer effort. Moreover, global convergence is not assured. The analytic Jacobian can be replaced by a finite-difference approximation, but this is also costly, since $n$ additional evaluations of the function per iteration are required.
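A minimal Python sketch of Algorithm 4.2 is given below; it treats jac(y) as the full Jacobian matrix of $\Phi$, so the transpose appearing in (4.6) is absorbed in that definition. The convergence test and the scalar example are additions for illustration.

    import numpy as np

    def newton(phi, jac, y0, tol=1e-12, max_iter=50):
        # Newton iterations for phi(y) = 0; jac(y) returns the Jacobian of phi.
        y = np.asarray(y0, dtype=float)
        for _ in range(max_iter):
            d = np.linalg.solve(jac(y), -phi(y))   # Step 1: linear system (4.6)
            y = y + d                              # Step 2: update
            if np.linalg.norm(d) < tol:
                break
        return y

    # Example: phi(y) = y^2 - 2 from y0 = 1; the iterates converge quadratically
    # to sqrt(2), roughly doubling the number of correct digits per iteration.
    root = newton(lambda y: y**2 - 2.0,
                  lambda y: np.array([[2.0 * y[0]]]),
                  np.array([1.0]))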
With the objective of reducing the computational effort, quasi-Newton methods generate an approximation of the Jacobian or of its inverse. The basic idea of most quasi-Newton techniques is to try to construct an approximation of the Jacobian, or of its inverse, using information gathered as the iterates progress. Let $B^k$ be the current approximation of $\nabla \Phi(y^k)$; a new approximation $B^{k+1}$ is obtained from
$$B^{k+1} = B^k + \Delta B^k. \qquad (4.7)$$
Since
$$\Phi(y^{k+1}) - \Phi(y^k) \approx \nabla \Phi^t(y^k)(y^{k+1} - y^k),$$
$\Delta B^k$ is defined in such a way that
$$\Phi(y^{k+1}) - \Phi(y^k) = [B^{k+1}]^t (y^{k+1} - y^k). \qquad (4.8)$$

Substitution of (4.7) in (4.8) gives $n$ conditions to be satisfied by $\Delta B^k$. Since $\Delta B^k$ has $n^2$ elements, these conditions are not enough to determine it. Several updating rules for $B^{k+1}$ were proposed [10], Broyden's Rule being the most successful:
$$B^{k+1} = B^k + (\gamma - B^k \delta)\delta^t / \delta^t \delta, \qquad (4.9)$$
where $\delta = y^{k+1} - y^k$ and $\gamma = \Phi(y^{k+1}) - \Phi(y^k)$.


ALGORITHM 4.3 Quasi-Newton Method

Data. Initial $y \in R^n$ and $B \in R^{n \times n}$.

Step 1. Computation of the step $d$.
Solve the linear system for $d$:
$$B^t d = -\Phi(y). \qquad (4.10)$$

Step 2. Update. Set
$$y := y + d$$
and
$$B := B + \Delta B.$$

Step 3. Go back to Step 1.

In Step 2, $\Delta B$ can be obtained using (4.9) or other rules. The following theorem is proved in [10].

THEOREM 4.3. Let $\Phi(y)$ be twice continuously differentiable and $y^*$ a solution of $\Phi(y) = 0$. Assume that $\nabla \Phi(y^*)^{-1}$ exists. Then, if started close enough to $y^*$ and if the initial $B$ is close enough to $\nabla \Phi(y^*)$, Algorithm 4.3 does not fail and it generates a sequence that converges to $y^*$. The convergence is superlinear. $\Box$

Although quasi-Newton methods have the advantage of avoiding the computation of $\nabla \Phi(y)$, the initial $B$ must be a good approximation of $\nabla \Phi(y^*)$ to have local convergence.
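The sketch below assembles these pieces into a runnable Python quasi-Newton solver with Broyden's rule (4.9); for simplicity it applies $B$ directly as the Jacobian approximation, so the system solved in Step 1 is $Bd = -\Phi(y)$ rather than the transposed form (4.10).

    import numpy as np

    def broyden(phi, y0, B0, tol=1e-10, max_iter=100):
        # Quasi-Newton iterations (Algorithm 4.3) with Broyden's update (4.9).
        y, B = np.asarray(y0, dtype=float), np.asarray(B0, dtype=float)
        for _ in range(max_iter):
            d = np.linalg.solve(B, -phi(y))        # Step 1
            if np.linalg.norm(d) < tol:
                return y
            gamma = phi(y + d) - phi(y)            # here delta = d
            # Broyden's rule: B := B + (gamma - B delta) delta^t / (delta^t delta)
            B = B + np.outer(gamma - B @ d, d) / (d @ d)
            y = y + d                              # Step 2
        return y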
Looking at Algorithms 4.1 to 4.3 we note that they have a similar structure. All of them define a step by the expression
$$S d = -\Phi(y),$$
where $S \equiv I$ in the linear iterations, $S$ is an approximation of $\nabla \Phi(y)$ in quasi-Newton methods, or $S \equiv \nabla \Phi(y)$ in Newton's method. The rate of convergence goes from linear to quadratic. We call this kind of iterations Newton-like algorithms.
5. Unconstrained Optimization
Let us consider the unconstrained optimization problem
$$\text{minimize } f(x); \quad x \in R^n. \qquad (5.1)$$

According to Corollary 3.2, a local minimum $x^*$ is a solution of the system of equations
$$\nabla f(x) = 0. \qquad (5.2)$$
In this section we show that the best known techniques for unconstrained optimization can be obtained by applying the Newton-like algorithms studied above to solve (5.2). Some favorable characteristics of optimization problems, conveniently exploited, lead to globally convergent algorithms.
This system is generally nonlinear in $x$, being linear only when $f$ is quadratic. The Jacobian $\nabla^2 f(x)$ is symmetric. As a consequence of the second order optimality conditions, $\nabla^2 f(x)$ is positive definite, or at least positive semidefinite, at a local minimum.

Following the techniques discussed in the previous section, (5.2) can be solved using Newton-like iterations. The change of $x$, called $d$, is given now by the expression
$$S d = -\nabla f(x), \qquad (5.3)$$
where $S$ can be taken equal to the identity, to $\nabla^2 f(x)$ or to a quasi-Newton approximation of $\nabla^2 f(x)$.
In general this procedure is not globally convergent. To get global convergence, each iterate must be nearer the solution than the previous one. In unconstrained optimization this happens if the function is reduced at each new point.

If $S$ is positive definite, $d$ is a descent direction of $f$ at $x$. In effect, it follows from (5.3) that $d^t \nabla f(x) = -d^t S d$. Then, $d^t \nabla f(x) < 0$. This result means that $d$ points towards the lower values of the function, but it does not imply that $f(x + d) < f(x)$.

Since $d$ is a descent direction, we can find a new point $x + td$ in a way to have a satisfactory reduction of $f$. In this case, $d$ is called the Search Direction and the positive number $t$, the Step Length. The procedure to find $t$ is called the Line Search. Different Line Search Criteria can be adopted to decide whether the step length is adequate or not.

As the search direction is zero only at a solution of (5.2), the line search cannot allow step lengths that are null or that go to zero. Otherwise, premature convergence to points that are not a solution would be obtained.

The algorithm that follows is based on these ideas. It is globally convergent to points satisfying the first order optimality conditions.
ALGORITHM 5.1 A Basic Algorithm for Unconstrained Optimization

Data. Initial $x \in R^n$ and $S \in R^{n \times n}$ symmetric and positive definite.

Step 1. Computation of the search direction $d$.
Solve the linear system for $d$:
$$S d = -\nabla f(x). \qquad (5.4)$$

Step 2. Line search.
Find a step length $t$ satisfying a given line search criterion.

Step 3. Update. Set
$$x := x + td$$
and define a new $S \in R^{n \times n}$ symmetric and positive definite.

Step 4. Go back to Step 1.

Particular versions of this algorithm can be obtained by choosing $S$ and a line search procedure. The best alternative depends on the problem to be solved, the available information about $f$ and the desired speed of convergence. Since, far from a local minimum, $\nabla^2 f(x)$ is not necessarily positive definite, to get descent search directions $S$ must be taken positive definite. Moreover, we make the following assumption about $S$.

ASSUMPTION 5.1. There exist positive numbers $\sigma_1$ and $\sigma_2$ such that
$$\sigma_1 \|d\|^2 \le d^t S d \le \sigma_2 \|d\|^2$$
for any $d \in R^n$.
5.1. ABOUT THE LINE SEARCH

At a given point $x$, once the search direction $d$ is defined, $f(x + td)$ becomes a function of the single variable $t$. The first idea is to walk on $d$ until the minimum on that direction is reached; that is, to find $t$ that minimizes $f(x + td)$. This procedure is called Line Search by Exact Minimization, even though in practice the exact minimum can rarely be obtained. Exact minimization is done in an iterative way and is very costly.

Modern algorithms include Inaccurate Line Search Criteria that also ensure global convergence [10], [32], [36]. The line search can then be completed in a small number of iterations. A very simple procedure is due to Armijo.

Armijo's Line Search

Define the step length $t$ as the first number of the sequence $\{1, \nu, \nu^2, \nu^3, ...\}$ satisfying
$$f(x + td) \le f(x) + t \eta_1 \nabla f^t(x) d, \qquad (5.5)$$

where $\eta_1 \in (0, 1)$ and $\nu \in (0, 1)$.

Figure 6. (a) Armijo's search (left); (b) Wolfe's criterion (right)

Condition (5.5) on $t$ ensures that the reduction of the function is at least $\eta_1$ times the reduction of a linear function, tangent to $f$ at $t = 0$. In Figure 6.a we have $p_1(t) = f(x) + t \eta_1 \nabla f^t(x) d$. The acceptable steps are in $[t_l, t_u]$. It is easy to deduce that $t_l = \inf(1, \nu t_u)$.
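A small Python sketch of this rule, together with one iteration of Algorithm 5.1 for $S = I$, is given below; the parameter names follow the symbols used above, and the quadratic test function is an added example.

    import numpy as np

    def armijo_step(f, grad_f, x, d, eta1=0.1, nu=0.5):
        # First t in {1, nu, nu^2, ...} satisfying the Armijo condition (5.5).
        t, slope = 1.0, grad_f(x) @ d     # slope < 0 since d is a descent direction
        while f(x + t * d) > f(x) + t * eta1 * slope:
            t *= nu
        return t

    # One iteration of Algorithm 5.1 with S = I (steepest descent direction):
    f = lambda x: x[0]**2 + 10.0 * x[1]**2
    grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
    x = np.array([1.0, 1.0])
    d = -grad_f(x)                             # Step 1: solve I d = -grad f(x)
    x = x + armijo_step(f, grad_f, x, d) * d   # Steps 2 and 3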
Wolfe's inaccurate line search criterion also establishes bounds on the step length by requiring a reduction of the function and, at the same time, a reduction of the modulus of its directional derivative.

Wolfe's Criterion

Accept a step length $t$ if (5.5) is true and
$$\nabla f^t(x + td) d \ge \eta_2 \nabla f^t(x) d, \qquad (5.6)$$
where $\eta_1 \in (0, 1/2)$ and $\eta_2 \in (\eta_1, 1)$. $\Box$

This criterion is illustrated in Figure 6.b, where the slope is $\eta_2 \nabla f^t(x) d$ at $t_l$. Condition (5.5) defines an upper bound on the step length and (5.6) a lower bound. Then, a step $t$ is too long if (5.5) is false and too short if (5.6) is false.

A step length satisfying Wolfe's criterion can be obtained iteratively [32]. Given an initial $t$, if it is too short, extrapolations are done until a good or a too long step is obtained. Once a too long step has been obtained, interpolations based on the longest short step and the shortest long step are done, until the criterion is satisfied. Since the function and the directional derivative are evaluated at each new $t$, cubic interpolations of $f$ can be done. As the criterion of acceptance is quite wide, the process generally requires very few iterations.

Figure 7. Steepest Descent iterates

The number of iterations required by Armijo's line search is usually greater than with Wolfe's criterion, but it is simpler to code and it does not require the calculation of $\nabla f$. Goldstein's Criterion, described in [36], also does not require $\nabla f$. Very efficient line search algorithms can be obtained by combining polynomial interpolations with Goldstein's criterion.
5.2. CONSIDERATIONS ABOUT GLOBAL CONVERGENCE

Global convergence of Algorithm 5.1 can be proved if Assumption 5.1 is true and any of the previously discussed line search criteria is adopted. We are not going to present the proof in this paper. In the case of Armijo's search this result is a particular case of Theorem 4.7 in [40].
5.3. FIRST ORDER ALGORITHMS

Taking $S \equiv I$ in Algorithm 5.1, we have $d = -\nabla f(x)$. That is, the search direction is opposite to the gradient. It is easy to prove that the downhill slope of $f$ is maximal in the direction of $d$.

One of the most widely known methods for unconstrained optimization is the Steepest Descent Algorithm, which includes an exact minimization in the line search. In Figure 7 we illustrate the iterative process to solve a two dimensional problem described by some level lines $f(x) = \text{constant}$. Since the search directions are normal to the level line at the current point and $\nabla f^t(x + td) \nabla f(x) = 0$ at the exact minimum of the line search, each direction is orthogonal to the previous one.
Modern first order algorithms include inaccurate line search procedures instead of the exact minimization. Although the number of iterations is not smaller, there is generally a reduction in the overall computer time.

Let $r$ be the Condition Number of $\nabla^2 f(x^*)$, defined as the ratio of the largest to the smallest eigenvalue. If the steepest descent algorithm generates a sequence $\{x^k\}$ converging to $x^*$, then the sequence of objective values $f(x^k)$ converges linearly to $f(x^*)$ with a convergence ratio no greater than $\beta = \left[\frac{r-1}{r+1}\right]^2$. This result is proved in refs. [19] and [36]. It implies that the convergence is slower as the conditioning of $\nabla^2 f(x^*)$ becomes worse.
5.4. NEWTON'S METHOD

Newton's algorithm is obtained by taking $S \equiv \nabla^2 f(x)$. To have descent search directions, $S$ must be positive definite. This is not necessarily true at any point, even though, according to Theorem 3.3, $\nabla^2 f(x^*)$ is positive definite at a strict local minimum.

It follows from Theorem 4.2 that, if $f(x)$ is three times continuously differentiable at $x^*$ and there exists $K > 0$ such that $t^k = 1$ for $k > K$, then the convergence of Algorithm 5.1 with $S \equiv \nabla^2 f(x)$ is at least quadratic. When Armijo's line search or Wolfe's criterion is adopted, since $\nabla^2 f(x^*)$ is positive definite, it is easy to prove that a unit step length can be obtained near the solution. This is a requirement for quasi-Newton and Newton algorithms to have superlinear and quadratic convergence.

To make $S$ positive definite, Newton's method is modified by taking $S \equiv \nabla^2 f(x) + \varepsilon I$, where $\varepsilon \ge 0$ is taken large enough to satisfy Assumption 5.1 [10], [19], [36]. Since the search direction is now a combination of the steepest descent and Newton directions, the speed of convergence is not as good as that of Newton's method. This approach is known as the Levenberg-Marquardt method. The major difficulty is to determine an $\varepsilon$ that is not too large, in a way to perturb Newton's iteration as little as possible.
5.5. QUASI-NEWTON METHOD

We define $S$ as $B$, a quasi-Newton approximation matrix of $\nabla^2 f(x)$ [9]. Since the Hessian is symmetric, it is reasonable to generate $B$ symmetric. Then, given an initial symmetric $B$, we need symmetric updates $\Delta B$. The Rank One Updating Rule adds to $B$ a rank one matrix of the form
$$\Delta B = \alpha z z^t,$$
where $\alpha$ is a number and $z$ a vector in $R^n$. By imposing (4.8) the following rule is obtained [36]:
$$B^{k+1} = B^k + \frac{(\gamma - B^k \delta)(\gamma - B^k \delta)^t}{\delta^t (\gamma - B^k \delta)}, \qquad (5.7)$$
where now $\delta = x^{k+1} - x^k$ and $\gamma = \nabla f(x^{k+1}) - \nabla f(x^k)$.


To obtain descent directions in Algorithm 5.1, $B$ is required to be positive definite. If $B^k$ is positive definite, we need $\delta^t (\gamma - B^k \delta) > 0$ to have $B^{k+1}$ positive definite. Unfortunately this is not always true. Several Rank Two Updating Rules overcome this problem. This is the case of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula
$$B^{k+1} = B^k + \frac{\gamma \gamma^t}{\delta^t \gamma} - \frac{B^k \delta \delta^t B^k}{\delta^t B^k \delta}. \qquad (5.8)$$

It can be proved that, if $B^k$ is positive definite, then
$$\delta^t \gamma > 0 \qquad (5.9)$$
is a sufficient condition to have $B^{k+1}$ positive definite. It can easily be shown that this condition is automatically satisfied if the line search in Algorithm 5.1 is an exact minimization or if Wolfe's criterion is adopted. As a consequence of Theorem 4.3, the convergence is superlinear. A unit step length near the solution is also required.
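In code, the update (5.8) is a pair of rank one corrections; the following Python sketch applies it, with $\delta$ and $\gamma$ as defined above.

    import numpy as np

    def bfgs_update(B, delta, gamma):
        # BFGS rule (5.8): B stays positive definite whenever delta^t gamma > 0,
        # which is condition (5.9). Here delta = x^{k+1} - x^k and
        # gamma = grad f(x^{k+1}) - grad f(x^k).
        Bd = B @ delta
        return B + np.outer(gamma, gamma) / (delta @ gamma) \
                 - np.outer(Bd, Bd) / (delta @ Bd)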
6. Equality Constrained Optimization
The techniques for unconstrained optimization studied above can be extended to the problem
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & h(x) = 0. \end{array} \qquad (6.1)$$

The pair $(x^*, \mu^*)$, satisfying the first order optimality conditions, is obtained by solving with Newton-like iterations the nonlinear system
$$\nabla f(x) + \nabla h(x)\mu = 0 \qquad (6.2)$$
$$h(x) = 0, \qquad (6.3)$$
where the unknowns $x$ and $\mu$ are called the Primal and Dual Variables respectively. To have a unique $\mu$ in (6.2), $x$ must be a regular point of the problem. Due to this fact, the algorithms that solve the first order optimality conditions require the assumption that all the iterates $x^k$ are regular points of the problem.
Denoting $y \equiv (x, \mu)$ and
$$\Phi(y) \equiv \begin{bmatrix} \nabla f(x) + \nabla h(x)\mu \\ h(x) \end{bmatrix}, \qquad (6.4)$$
we have
$$\nabla \Phi(y) \equiv \begin{bmatrix} H(x, \mu) & \nabla h(x) \\ \nabla h^t(x) & 0 \end{bmatrix}, \qquad (6.5)$$
where $H(x, \mu) = \nabla^2 f(x) + \sum_{i=1}^{p} \mu_i \nabla^2 h_i(x)$ is the Hessian of the Lagrangian defined in Theorem 3.8.


A Newtons iteration that starts at (xk , k ) and gives a new estimate
(xk+1 , k+1 ) is then stated as follows:
"

H(xk , k ) h(xk )
ht (xk )
0

# "

xk+1 xk
k+1 k

"

f (xk ) + h(xk )k
h(xk )

(6.6)
As the system (6.4) is linear in , when
is known Newtons method
gives in one iteration. In fact, taking xk = x in (6.6), we have that
(xk+1 , k+1 ) = (x , ) for any k . We conclude that k+1 when
{xk } x . This remark suggests that a line search concerning only the
primal variables x is enough to get global convergence in (x, ).
In a similar way as in Section 4, $\nabla \Phi$ can be substituted by a quasi-Newton approximation or by the identity matrix. Since in the present problem we need $\nabla h$ anyway to evaluate $\Phi$, it seems more efficient to substitute only $H(x^k, \mu^k)$ in $\nabla \Phi$ by its quasi-Newton approximation or by the identity. Calling $S^k$ this matrix and $d^k$ the change in $x$, we have the linear system of equations
$$S^k d^k + \nabla h(x^k)\mu^{k+1} = -\nabla f(x^k) \qquad (6.7)$$
$$\nabla h^t(x^k) d^k = -h(x^k) \qquad (6.8)$$
that gives $d^k$, a search direction in $x$, and $\mu^{k+1}$, a new estimate of $\mu$. If $x^k$ is a regular point and $S^k$ is positive definite, then it can be proved that the solution of (6.7), (6.8) is unique.
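For illustration, the linear system (6.7), (6.8) can be assembled and solved directly; in the Python sketch below, jac_h(x) stands for the $n \times p$ matrix $\nabla h(x)$ whose columns are the gradients of the $h_i$, and all names are chosen here for the example.

    import numpy as np

    def eq_constrained_step(grad_f, h, jac_h, S, x):
        # Solve (6.7)-(6.8) for the search direction d and the new multiplier
        # estimate: [[S, grad h], [grad h^t, 0]] [d; mu] = [-grad f; -h].
        A = jac_h(x)                                  # n x p matrix grad h(x)
        n, p = A.shape
        K = np.block([[S, A], [A.T, np.zeros((p, p))]])
        rhs = np.concatenate([-grad_f(x), -h(x)])
        sol = np.linalg.solve(K, rhs)
        return sol[:n], sol[n:]                       # d^k and mu^{k+1}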
In unconstrained optimization we know that a minimum is approached as the objective function decreases. This is not always true in constrained minimization, since an increase of the function can be necessary to obtain feasibility. Then, an appropriate objective for the line search is needed. With this purpose we define the auxiliary function
$$\phi(x, r) = f(x) + \sum_{i=1}^{p} r_i |h_i(x)|, \qquad (6.9)$$

where the $r_i$ are positive constants. It can be shown that $\phi(x, r)$ is an Exact Penalty Function of the equality constrained problem if the $r_i$ are large enough [36]. In other words, there exists a finite $\bar{r}$ such that, for $r_i \ge \bar{r}_i$, the unconstrained minimum of $\phi(x, r)$ occurs at the solution of the problem (6.1). When compared with other penalty functions, $\phi$ is numerically very advantageous since it does not require penalty parameters going to infinity [35]. However, $\phi$ is not differentiable at points on the constraints, requiring nonsmooth optimization techniques [32].
Denoting by $SG(h)$ a diagonal matrix such that $SG_{ii}(h) \equiv sg(h_i)$, where $sg(a) = a/|a|$ for $a \ne 0$ and $sg(0) = 0$, we can write
$$\phi(x, r) = f(x) + r^t SG[h(x)] h(x). \qquad (6.10)$$

Suppose now that $x^k$ is a regular point, $S$ is positive definite and
$$r_i \ge |\mu_i^{k+1}|; \quad i = 1, 2, ..., p. \qquad (6.11)$$

Then, $d^k$ given by (6.7) and (6.8) is a descent direction of $\phi(x, r)$ at $x^k$. In effect, there exists $\tau > 0$ such that $sg[h_i(x^k + td^k)]$ does not change for any $t \in (0, \tau]$. Calling $\Delta\phi(x^k, td^k) \equiv \phi(x^k + td^k, r) - \phi(x^k, r)$, we have
$$\Delta\phi(x^k, td^k) = t\, d^{kt} \nabla f(x^k) + t\, d^{kt} \nabla h(x^k) SG[h(x^k + td^k)] r + o(t),$$
for any $t \in (0, \tau]$, where $o(t) \to 0$ faster than $t$. As a consequence of (6.7) and (6.8),
$$d^{kt} \nabla f(x^k) = -d^{kt} S^k d^k + h^t(x^k) \mu^{k+1}. \qquad (6.12)$$

Then, considering (6.8) and (6.12), we get
$$\Delta\phi(x^k, td^k) = -t\, d^{kt} S^k d^k + t\, h^t(x^k) \{\mu^{k+1} - SG[h(x^k + td^k)] r\} + o(t)$$
and there exists $\bar{\tau} \in (0, \tau]$ such that $\phi(x^k + td^k, r) < \phi(x^k, r)$ for any $t \in (0, \bar{\tau})$, which proves the assertion above.

The following globally convergent algorithm takes $\phi(x, r)$ as the objective of a line search along $d^k$.
ALGORITHM 6.1 A Basic Algorithm for Equality Constrained Optimization

Data. Initial $x \in R^n$, $S \in R^{n \times n}$ symmetric and positive definite, and $r \in R^p$, $r > 0$.

Step 1. Computation of the search direction $d$ and an estimate $\mu$ of the Lagrange multipliers.
Solve the linear system in $(d, \mu)$:
$$S d + \nabla h(x)\mu = -\nabla f(x) \qquad (6.13)$$
$$\nabla h^t(x) d = -h(x) \qquad (6.14)$$

Step 2. Line search.
i) If $r_i \le |\mu_i|$, then set $r_i > |\mu_i|$, for $i = 1, 2, ..., p$.
ii) Find a step length $t$ satisfying a given line search criterion on the auxiliary function
$$\phi(x, r) = f(x) + r^t SG[h(x)] h(x).$$

Step 3. Update. Set
$$x := x + td$$
and define a new $S \in R^{n \times n}$ symmetric and positive definite.

Step 4. Go back to Step 1.

Assumption 5.1 is also adopted in this algorithm. The same line search procedures as in Algorithm 5.1 for unconstrained optimization can be employed. However, some precautions must be taken here since $\phi$ is nonsmooth [32].
6.1. FIRST ORDER METHOD

First order algorithms are obtained by taking $S \equiv I$ in Algorithm 6.1. In the case when the constraints are linear, a natural extension of gradient methods to equality constrained optimization consists in taking an initial point on the constraints and a search direction obtained by projecting $-\nabla f(x^k)$ on the constraints. This direction is known as the Projected Gradient Direction [36], [43] and denoted here by $\bar{d}$. A new point on the constraints is then obtained.

We have that $-\nabla f(x)$ can be written as the sum of its projection on the tangent space and a vector orthogonal to all the constraints. Then, if the constraints are regular, there exists $\bar{\mu} \in R^p$ such that
$$-\nabla f(x) = \bar{d} + \nabla h(x)\bar{\mu} \qquad (6.15)$$
and, since $\bar{d}$ is orthogonal to the set $\{\nabla h_1, \nabla h_2, ..., \nabla h_p\}$, we have
$$\nabla h^t(x) \bar{d} = 0. \qquad (6.16)$$

It can be concluded that, taking $S \equiv I$ in Algorithm 6.1, the search direction at points where $h(x) = 0$ becomes the Projected Gradient.


The results about the speed of convergence of gradient algorithms for unconstrained optimization are easily extended to the projected gradient method. The convergence is linear and the convergence ratio is no greater than $\beta = \left[\frac{r-1}{r+1}\right]^2$, where now $r$ is the ratio of the largest to the smallest eigenvalue of $\nabla^2 f(x^*)$ as measured on the constraints. This is not surprising, since the iterates walk on the constraints.

The Projected Gradient Method was extended by Rosen [20], [44] to problems with nonlinear constraints. Given a point $x^k$ on the constraints, a better point $\bar{x}^k$ on the projected gradient is obtained. Since the projected gradient no longer follows the constraint surface, $\bar{x}^k$ is generally infeasible. The new iterate $x^{k+1}$ is the projection of $\bar{x}^k$ on the constraints. This projection is done in an iterative way, which is very costly.

When the constraints are nonlinear, it is the ratio of the largest to the smallest eigenvalue of the Hessian of the Lagrangian $H(x^*, \mu^*)$, as measured on the tangent space, that determines the speed of convergence [36]. The effect of the curvature of the constraints on the rate of convergence is included in $H(x^*, \mu^*)$.

Algorithm 6.1, when applied to nonlinearly constrained problems, is much more efficient than the projected gradient method. Since the iterates are not required to be feasible, the projection stage is avoided.
6.2. NEWTON'S METHOD

Taking $S \equiv H(x^k, \mu^k)$ we have a Newton's equality constrained optimization algorithm. Even though in the majority of applications the computation of the second derivatives of $f$ and $h$ is very expensive, in several problems they are available or easy to obtain.

The Hessian of the Lagrangian function, $H(x, \mu)$, is not necessarily positive definite. Theorem 3.9 only requires $H(x^*, \mu^*)$ to be positive definite on the tangent space. To have a positive definite $S$, a procedure similar to the Levenberg-Marquardt method for unconstrained optimization can be employed.
6.3. QUASI-NEWTON METHOD

In this class of algorithms, $S$ is defined as a quasi-Newton approximation matrix of the Hessian of the Lagrangian $H(x^*, \mu^*)$, which we call $B$. In principle, $B$ can be obtained using the same updating rules as in unconstrained optimization, but taking $\nabla_x l(x, \mu)$ instead of $\nabla f(x)$.

As $H(x, \mu)$ is not necessarily positive definite at $(x^*, \mu^*)$, it is not always possible to get $B$ positive definite. To overcome this difficulty, Powell [42]


proposed a modification of the BFGS updating rule that takes
$$\delta = x^{k+1} - x^k$$
and
$$\gamma = \nabla_x l(x^{k+1}, \mu^k) - \nabla_x l(x^k, \mu^k).$$
If
$$\delta^t \gamma < 0.2\, \delta^t B \delta,$$
then one computes
$$\theta = \frac{0.8\, \delta^t B \delta}{\delta^t B \delta - \delta^t \gamma}$$
and takes
$$\gamma := \theta \gamma + (1 - \theta) B \delta.$$
Finally, the BFGS updating rule (5.8) is employed.
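A compact Python sketch of this damped update is given below; the name theta for the damping factor follows the reconstruction above, and the corrected gamma is then passed to the BFGS rule (5.8).

    import numpy as np

    def powell_damped_gamma(B, delta, gamma):
        # Powell's correction of gamma before the BFGS update (5.8); it keeps
        # delta^t gamma safely positive so that B remains positive definite.
        dBd, dg = delta @ B @ delta, delta @ gamma
        if dg < 0.2 * dBd:
            theta = 0.8 * dBd / (dBd - dg)
            gamma = theta * gamma + (1.0 - theta) * (B @ delta)
        return gamma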
6.4. ABOUT THE SPEED OF CONVERGENCE

Since the techniques studied in this section are based on iterative algorithms for nonlinear systems, $(x^k, \mu^k)$ converges to $(x^*, \mu^*)$ at the same speed as the original algorithm. Then, the convergence of $(x^k, \mu^k)$ is quadratic, superlinear or linear, depending on whether a Newton, a quasi-Newton or a first order algorithm for equality constrained optimization is employed.

In practice, we are more interested in the speed of convergence of $x^k$ than of $(x^k, \mu^k)$. Several authors studied this point, in particular Powell [41], Gabay [16], Hoyer [33], Gilbert [18]. Basically the same results are obtained, but in two steps; i.e., Powell proved that quasi-Newton algorithms are two-step superlinearly convergent. That is,
$$\lim_{k \to \infty} \frac{\|x^{k+2} - x^*\|}{\|x^k - x^*\|} = 0.$$

As in unconstrained optimization, when applying Newton or quasi-Newton algorithms, the step length must be the unity near the solution. There are examples showing that, taking $t = 1$ in Step 3 of Algorithm 6.1, it is not always possible to get a reduction of $\phi$ near the solution. This is known as the Maratos effect [37]. Several researchers have been looking for methods to avoid the Maratos effect [18], [24], [27].
7. Sequential Quadratic Programming Method
To extend the ideas presented above in a way to solve the general nonlinear programming problem (2.1), one major difficulty has to be overcome. While in unconstrained or in equality constrained optimization a point satisfying the first order necessary optimality conditions can be obtained by solving a system of equations, problems including inequality constraints require the solution of the system of equations and inequations (3.5) - (3.9). That is, a solution of the system of equations (3.5) - (3.7) that satisfies the inequalities (3.8) and (3.9) has to be found.

Sequential Quadratic Programming (SQP), which is at the moment the most widely employed method for nonlinear constrained optimization, is a quasi-Newton technique based on an idea proposed by Wilson [50] in 1963 and interpreted by Beale [8] in 1967.
A Quadratic Program is a class of constrained optimization problems such that the objective is a convex quadratic function and the constraints are linear. Efficient techniques to solve this problem are available, even when inequality constraints are included. The exact solution is obtained after a finite number of iterations [36].

To explain Wilson's idea, we take $x$ constant and consider the following quadratic programming problem that has $d \in R^n$ as unknown:
$$\begin{array}{ll} \text{minimize} & \frac{1}{2} d^t S d + \nabla f^t(x) d \\ \text{subject to} & \nabla h^t(x) d + h(x) = 0. \end{array} \qquad (7.1)$$

Since (7.1) is a convex problem, the global minimum satisfies the Karush-Kuhn-Tucker optimality conditions
$$S d + \nabla f(x) + \nabla h(x)\mu = 0 \qquad (7.2)$$
$$\nabla h^t(x) d + h(x) = 0, \qquad (7.3)$$
where $\mu$ are the Lagrange multipliers. Then, $(d, \mu)$ in Algorithm 6.1 can be obtained by solving the quadratic program (7.1) instead of the linear system (6.13), (6.14). Based on this fact, to solve the problem

$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g(x) \le 0 \\ \text{and} & h(x) = 0, \end{array} \qquad (7.4)$$

Wilson proposed to define the search direction $d$ and new estimates of the Lagrange multipliers $\lambda$ and $\mu$ by solving at each iteration
$$\begin{array}{ll} \text{minimize} & \frac{1}{2} d^t S d + \nabla f^t(x) d \\ \text{subject to} & \nabla g^t(x) d + g(x) \le 0 \\ \text{and} & \nabla h^t(x) d + h(x) = 0. \end{array} \qquad (7.5)$$

Wilson's is a Newton algorithm. Garcia Palomares and Mangasarian later proposed a quasi-Newton technique [17], Han obtained a globally convergent algorithm [21] and Powell proved superlinear convergence [41].
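The Python sketch below builds and solves the quadratic subproblem (7.5) at a given iterate; a general-purpose solver from SciPy is used here as a stand-in for a dedicated QP code, and jac_g, jac_h denote the Jacobian matrices of $g$ and $h$ (all names are chosen for this example). In a complete SQP implementation the multiplier estimates would be recovered from the subproblem as well.

    import numpy as np
    from scipy.optimize import minimize

    def sqp_direction(grad_f, g, jac_g, h, jac_h, S, x):
        # Quadratic subproblem (7.5): minimize (1/2) d^t S d + grad f(x)^t d
        # subject to the linearized constraints of the original problem.
        c, Gx, Ax = grad_f(x), jac_g(x), jac_h(x)
        qp_obj = lambda d: 0.5 * d @ S @ d + c @ d
        cons = [{'type': 'ineq', 'fun': lambda d: -(Gx @ d + g(x))},  # g linearized
                {'type': 'eq',   'fun': lambda d: Ax @ d + h(x)}]     # h linearized
        res = minimize(qp_obj, np.zeros_like(c), constraints=cons)
        return res.x                     # search direction d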

The exact penalty function
$$\phi(x, s, r) = f(x) + \sum_{i=1}^{m} s_i \sup[0, g_i(x)] + \sum_{i=1}^{p} r_i |h_i(x)| \qquad (7.6)$$
is taken as the objective of the line search. If $r$ satisfies (6.11) and
$$s_i \ge \lambda_i, \quad \text{for } i = 1, 2, ..., m, \qquad (7.7)$$
then the $d$ that solves (7.5) is a descent direction of $\phi(x, s, r)$. This result is proved in [36].
In Sequential Quadratic Programming algorithms, the matrix $S$ is defined as a quasi-Newton approximation $B$ of the Hessian of the Lagrangian. Most of the optimizers employ the BFGS rule modified by Powell, as explained in the last section. The SQP algorithm can be stated as follows.

ALGORITHM 7.1 Sequential Quadratic Programming

Parameters. $r \in R^p$ and $s \in R^m$ positive.

Data. Initialize $x \in R^n$ and $B \in R^{n \times n}$ symmetric and positive definite.

Step 1. Computation of the search direction $d$ and estimates of the Lagrange multipliers $\lambda$ and $\mu$.
Solve the quadratic program for $d$:
$$\begin{array}{ll} \text{minimize} & \frac{1}{2} d^t B d + \nabla f^t(x) d \\ \text{subject to} & \nabla g^t(x) d + g(x) \le 0 \\ \text{and} & \nabla h^t(x) d + h(x) = 0. \end{array} \qquad (7.8)$$

Step 2. Line search.
i) If $r_i \le |\mu_i|$, then set $r_i > |\mu_i|$, for $i = 1, 2, ..., p$.
ii) If $s_i \le \lambda_i$, then set $s_i > \lambda_i$, for $i = 1, 2, ..., m$.
iii) Find a step length $t$ satisfying a given line search criterion on the auxiliary function
$$\phi(x, s, r) = f(x) + \sum_{i=1}^{m} s_i \sup[0, g_i(x)] + \sum_{i=1}^{p} r_i |h_i(x)|.$$

Step 3. Updates.
Let $\delta = td$ and $\gamma = \nabla_x l(x + td, \lambda, \mu) - \nabla_x l(x, \lambda, \mu)$.
i) If $\delta^t \gamma < 0.2\, \delta^t B \delta$, then compute
$$\theta = \frac{0.8\, \delta^t B \delta}{\delta^t B \delta - \delta^t \gamma}$$
and set $\gamma := \theta \gamma + (1 - \theta) B \delta$.
ii) Set
$$B := B + \frac{\gamma \gamma^t}{\delta^t \gamma} - \frac{B \delta \delta^t B}{\delta^t B \delta}$$
and
$$x := x + td.$$

Step 4. Go back to Step 1.

This algorithm generates sequences that are globally convergent to Karush-Kuhn-Tucker points of the problem. However, it fails at points where the quadratic program has no solution. In effect, since the constraints of the quadratic program solved in Step 1 are linear approximations of the constraints of the original problem, its feasible region may be empty.

The asymptotic speed of convergence has similar properties to that of quasi-Newton algorithms for equality constrained optimization, and the Maratos effect can also occur.
7.1. A FEASIBLE DIRECTIONS ALGORITHM

Feasible directions algorithms are an important class of methods for solving constrained optimization problems. At each iteration, the search direction is a feasible direction of the inequality constraints and, at the same time, a descent direction of the objective or of another appropriate function. A constrained line search is then performed to obtain a satisfactory reduction of the function without losing feasibility.

The fact of giving feasible points makes feasible directions algorithms very efficient in engineering design, where function evaluation is in general very expensive. Since any intermediate design can be employed, the iterations can be stopped when the cost reduction per iteration becomes small enough.
There are also several examples that deal with an objective function, or constraints, that are not defined at infeasible points. This is the case of size and shape constraints in structural optimization. When applying feasible directions algorithms to real time problems, as feasibility is maintained and the cost reduced, the controls can be activated at each iteration.
In what follows we describe an SQP feasible directions algorithm based on a technique presented by Herskovits in [26] and by Herskovits and Carvalho in [27]. By solving the quadratic program (7.5), this algorithm first defines $(d_0, \lambda_0)$, where $d_0$ is a descent direction in the primal space of $l(x, \lambda_0)$. However, $d_0$ is not necessarily a feasible direction since, if an inequality constraint of problem (7.4) is active and the corresponding constraint in (7.5) is also active, then $d_0$ is tangent to the feasible set. In a second stage, the algorithm obtains a feasible and descent search direction $d$ by solving a modified quadratic program with equality constraints only.
To present the method, we consider the inequality constrained optimization problem
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g(x) \le 0, \end{array} \qquad (7.9)$$
whose feasible set is $\Omega \equiv \{x \in R^n \,/\, g(x) \le 0\}$, and introduce the following definition:

DEFINITION 7.1. A vector field $d(x)$ defined on $\Omega$ is said to be a uniformly feasible directions field of the problem (7.9) if there exists a step length $\tau > 0$ such that $x + td(x) \in \Omega$ for all $t \in [0, \tau]$ and all $x \in \Omega$.

This condition is much stronger than the simple feasibility of $d(x)$ for any $x \in \Omega$. When $d(x)$ constitutes a uniformly feasible directions field, it supports a feasible segment $[x, x + \theta(x)d(x)]$, such that $\theta(x)$ is bounded below in $\Omega$ by $\tau > 0$.

As a consequence of the feasibility requirement, the search directions of feasible directions algorithms must constitute a uniformly feasible directions field. Otherwise, the step length may go to zero, forcing convergence to points which are not KKT points.
We state now the algorithm.

ALGORITHM 7.2 SQP Feasible Directions Algorithm

Parameters. $\alpha \in (0, 1)$ and $\varphi > 0$.

Data. Initialize $x \in R^n$ feasible and $B \in R^{n \times n}$ symmetric and positive definite.

HERSKOVITS
JOSE

32

Step 1. Computation of a descent direction $d_0$ and an estimate $\lambda_0$ of the Lagrange multipliers.
Solve the quadratic program in $d_0$:
$$\begin{array}{ll} \text{minimize} & \frac{1}{2} d_0^t B d_0 + \nabla f^t(x) d_0 \\ \text{subject to} & \nabla g^t(x) d_0 + g(x) \le 0. \end{array} \qquad (7.10)$$

Step 2. Computation of the search direction $d$.
i) Let the active set be $J(x) \equiv \{j \in 1, 2, ..., m \,/\, \lambda_{0j} > 0\}$ and $\nabla g_J(x) \equiv [\nabla g_j(x)]$, $j \in J(x)$.
If $g_J^t(x)[\nabla g_J^t(x) B^{-1} \nabla g_J(x)]^{-1} e < 0$, find
$$\rho_J = \frac{(1 - \alpha)\, \nabla l^t(x, \lambda_0) d_0}{g_J^t(x)[\nabla g_J^t(x) B^{-1} \nabla g_J(x)]^{-1} e}, \qquad (7.11)$$
where the Lagrangian is $l(x, \lambda) = f(x) + \lambda^t g(x)$ and $e \equiv [1, 1, ..., 1]^t$.
ii) Let the inactive set be $\bar{J}(x) \equiv \{j \in 1, 2, ..., m \,/\, \lambda_{0j} = 0\}$.
For each $j \in \bar{J}(x)$, if
$$\{1 - \nabla g_j^t(x) B^{-1} \nabla g_J(x)[\nabla g_J^t(x) B^{-1} \nabla g_J(x)]^{-1} e\} < 0,$$
find
$$\rho_j = \frac{g_j(x) + \nabla g_j^t(x) d_0}{1 - \nabla g_j^t(x) B^{-1} \nabla g_J(x)[\nabla g_J^t(x) B^{-1} \nabla g_J(x)]^{-1} e}. \qquad (7.12)$$
iii) Set $\rho = \inf\{\varphi \|d_0\|^2, \rho_J; \rho_j,\ j \in \bar{J}(x)\}$.
iv) Solve the quadratic program in $d$:
$$\begin{array}{ll} \text{minimize} & \frac{1}{2} d^t B d + \nabla f^t(x) d \\ \text{subject to} & \nabla g_J^t(x) d + g_J(x) = -\rho e. \end{array} \qquad (7.13)$$

Step 3. Line search.
Find a step length $t$ satisfying a given constrained line search criterion on the Lagrangian function $l(x, \lambda_0)$ and such that $x + td$ is a feasible point of the nonlinear program (7.9).


Step 4. Updates.
Let $\delta = td$ and $\gamma = \nabla_x l(x + td, \lambda_0) - \nabla_x l(x, \lambda_0)$.
i) If $\delta^t \gamma < 0.2\, \delta^t B \delta$, then compute
$$\theta = \frac{0.8\, \delta^t B \delta}{\delta^t B \delta - \delta^t \gamma}$$
and set $\gamma := \theta \gamma + (1 - \theta) B \delta$.
ii) Set
$$B := B + \frac{\gamma \gamma^t}{\delta^t \gamma} - \frac{B \delta \delta^t B}{\delta^t B \delta}$$
and
$$x := x + td.$$

Step 5. Go back to Step 1.

As $x$ is an admissible point of problem (7.9), the feasible region of (7.10) is nonempty. In effect, $d_0 = 0$ satisfies the constraints of (7.10). Considering the KKT conditions of problem (7.10), since $B$ is positive definite, it is easy to show that $d_0$ is a descent direction of $l(x, \lambda_0)$. However, as discussed above, $d_0$ is not necessarily a feasible direction of the problem.

To obtain a feasible direction, we define the equality constrained quadratic program (7.13) for $d$, where $\rho > 0$. It follows from the constraints of this last problem that $\nabla g_J^t(x) d < 0$; then $d$ is a feasible direction of the original nonlinear program. The constraints of (7.13) are obtained by moving slightly the active constraints of (7.10) to the interior of the feasible region. Since this movement is proportional to $\rho$, by establishing bounds on $\rho$ we can ensure that $d$ is a descent direction of $l(x, \lambda_0)$ and that $d$ is also a feasible direction, even in relation to the constraints of the original problem that are not included in (7.13). These bounds are defined by (7.11) and (7.12) respectively. In fact, in [27] it is proved that
$$\nabla l^t(x, \lambda_0) d \le \alpha\, \nabla l^t(x, \lambda_0) d_0, \qquad (7.14)$$
when (7.11) is true, and also that
$$g_j(x) + \nabla g_j^t(x) d \le -\rho, \qquad (7.15)$$
if $\rho$ satisfies (7.12).

if satisfies (7.12). As a consequence of (7.14), to get a feasible direction


we pay the price reducing the directional derivative of l(x, 0 ) by .

HERSKOVITS
JOSE

34

Although the present method requires a constrained line search, its objective is smooth. This is an important advantage of feasible directions algorithms.

In [27], the authors show that the search directions obtained by the present approach constitute a uniformly feasible directions field. It can be proved that if the sequence given by the algorithm converges, the limit is a KKT point of the problem. In a similar way as in [40], it can be shown that the rate of convergence of the present method is superlinear and that the Maratos effect is avoided.
7.2. THE CONSTRAINED LINE SEARCH

Feasible directions algorithms, and other methods that produce a sequence of admissible points with respect to the inequality constraints, need a constrained line search. We consider Problem (7.9) and call $\psi(x)$ the line search objective of an algorithm to solve this problem. It is assumed that $\psi(x)$ is continuously differentiable in $\Omega$. The previous feasible directions algorithm takes $\psi(x) = l(x, \lambda_0)$. The corresponding Constrained Line Search by Exact Minimization consists in finding the $t$ that minimizes $\psi(x + td)$, subject to $g(x + td) \le 0$, where $x$ is the current iterate and $d$ the search direction.

To define Inaccurate Constrained Line Search Criteria we must take into consideration that the exact minimizer $t$ may be in the interior or on the boundary. In both cases there exists $t$ such that (5.5) is true. That is, a decrease of the line search objective can always be obtained. On the other hand, if the exact minimizer is not in the interior, the existence of $t$ such that (5.6) holds is not assured. As an example, when $\psi$ is linear the derivative cannot be reduced.
We now propose extensions of Armijo's and Wolfe's criteria to the constrained line search.

Armijo's Constrained Line Search
Define the step length t as the first number of the sequence {1, ν, ν², ν³, ...} satisfying

φ(x + td) ≤ φ(x) + t η₁ ∇φᵗ(x)d   (7.16)

and

g(x + td) ≤ 0,   (7.17)

where η₁ ∈ (0, 1) and ν ∈ (0, 1).
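For illustration, a minimal Python sketch of this criterion follows, assuming callables phi, grad_phi and g that return NumPy values; the function name, the default parameter values and the safeguard t_min are ours, not part of the criterion.

```python
import numpy as np

def constrained_armijo(phi, grad_phi, g, x, d, eta1=0.1, nu=0.5, t_min=1e-12):
    """Backtracking search enforcing (7.16)-(7.17): sufficient decrease of
    phi plus feasibility of all inequality constraints. d is assumed to be
    a feasible descent direction at the interior point x."""
    slope = grad_phi(x) @ d        # directional derivative, negative for descent
    t = 1.0
    while t > t_min:
        ok_decrease = phi(x + t * d) <= phi(x) + t * eta1 * slope   # (7.16)
        ok_feasible = np.all(g(x + t * d) <= 0.0)                   # (7.17)
        if ok_decrease and ok_feasible:
            return t
        t *= nu                    # next element of the sequence {1, nu, nu^2, ...}
    raise RuntimeError("no acceptable step length found")
```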

Wolfe's Constrained Line Search Criterion
Accept a step length t if (7.16) and (7.17) are true and at least one of the following m + 1 conditions holds:

∇φᵗ(x + td)d ≥ η₂ ∇φᵗ(x)d   (7.18)

or

gᵢ(x + td) ≥ γ gᵢ(x); i = 1, 2, ..., m,   (7.19)

where now η₁ ∈ (0, 1/2), η₂ ∈ (η₁, 1) and γ ∈ (0, 1).

Conditions (7.16) and (7.17) define upper bounds on the step length in both criteria and, in Wolfe's criterion, a lower bound is given by one of the conditions (7.18) and (7.19). The iterative procedure to find a step length satisfying Wolfe's criterion is similar to that employed in unconstrained optimization, but interpolations of the constraints also have to be carried out.
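A minimal sketch of the acceptance test follows, under the reconstruction of (7.18)-(7.19) given above: the upper-bound conditions are (7.16)-(7.17), and the lower bound holds when either the directional derivative has increased enough or some constraint has consumed a fraction (1 − γ) of its distance to the boundary. Names and default values are illustrative.

```python
import numpy as np

def wolfe_constrained_ok(phi, grad_phi, g, x, d, t,
                         eta1=0.1, eta2=0.7, gamma=0.5):
    """Test of the constrained Wolfe criterion, a minimal sketch."""
    slope = grad_phi(x) @ d
    upper = (phi(x + t * d) <= phi(x) + t * eta1 * slope     # (7.16)
             and np.all(g(x + t * d) <= 0.0))                # (7.17)
    lower = (grad_phi(x + t * d) @ d >= eta2 * slope         # (7.18)
             or np.any(g(x + t * d) >= gamma * g(x)))        # one of (7.19)
    return upper and lower
```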
8. A Newton-like Interior Point Method

In the last section, the Sequential Quadratic Programming method was presented as a technique to extend Newton-like iterations to inequality constrained optimization. We now present a family of Newton-like algorithms to solve the Karush-Kuhn-Tucker optimality conditions. To obtain these algorithms, Newton-like iterations to solve the nonlinear system in (x, λ) given by the equalities (3.5)-(3.7) in the KKT conditions are first defined. These iterations are then slightly modified in such a way that the inequalities (3.8) and (3.9) are satisfied at each iteration.
The algorithms based on this approach require an initial estimate of x in the interior of the feasible region defined by the inequality constraints, and they generate a sequence of points that also lie in the interior of this set. They are feasible directions algorithms, since at each iteration they define a search direction that is a feasible direction with respect to the inequality constraints and a descent direction of the objective, or of another appropriate function. When only inequality constraints are considered, the objective is reduced at each iteration. In the general problem, an increase of the objective may be necessary to have the equalities satisfied.
The present method is simple to code, strong and efficient. It does not involve penalty functions, active set strategies or quadratic programming subproblems. It only requires the solution of two linear systems with the same matrix at each iteration, followed by an inaccurate line search.
This approach, proposed by the author, is the basis of the first order algorithm and of the quasi-Newton one described in [25], [26], [27] and [28]. This is also the case of the quasi-Newton algorithms described in [29] and


[40] and of a quadratic programming algorithm developed by Tits et al. [45]. A unified study of this technique is presented in [30] and [31].
Several problems in engineering optimization have been solved with the present method. We can mention applications in structural optimization [25], fluid mechanics [4], multidisciplinary optimization with aerodynamics and electromagnetism [4], [3], and real time optimal control [11], [12]. A Newton algorithm for limit analysis of solids with nonlinear yield functions was proposed by Zouain et al. [49], and a Newton algorithm for stress analysis of linear elastic solids in contact is described in [2] and [48].
To explain the ideas behind the Newton-like Interior Point Method, we consider the nonlinear inequality constrained program

minimize f(x)
subject to g(x) ≤ 0   (8.1)

and the corresponding KKT optimality conditions

∇f(x) + ∇g(x)λ = 0   (8.2)

G(x)λ = 0   (8.3)

λ ≥ 0   (8.4)

g(x) ≤ 0.   (8.5)
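Conditions (8.2)-(8.5) can be checked numerically. The following minimal Python sketch, a helper of our own rather than part of the method, measures how far a primal-dual pair is from satisfying them:

```python
import numpy as np

def kkt_residual(grad_f, grad_g, g, lam):
    """Residual of the KKT conditions (8.2)-(8.5), a minimal sketch.

    grad_f : gradient of f at x, shape (n,)
    grad_g : matrix whose columns are the constraint gradients, shape (n, m)
    g      : constraint values at x, shape (m,)
    lam    : multiplier estimate, shape (m,)
    """
    stationarity = grad_f + grad_g @ lam      # (8.2)
    complementarity = g * lam                 # (8.3), G(x)lam with G = diag(g)
    dual_feas = np.minimum(lam, 0.0)          # violation of (8.4), lam >= 0
    primal_feas = np.maximum(g, 0.0)          # violation of (8.5), g(x) <= 0
    return max(np.linalg.norm(stationarity),
               np.linalg.norm(complementarity),
               np.linalg.norm(dual_feas),
               np.linalg.norm(primal_feas))
```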

A Newton iteration to solve the nonlinear system of equations (8.2), (8.3) in (x, λ) is stated as

[ H(xᵏ, λᵏ)    ∇g(xᵏ) ] [ xᵏ⁺¹ − xᵏ  ]      [ ∇f(xᵏ) + ∇g(xᵏ)λᵏ ]
[ Λᵏ∇gᵗ(xᵏ)   G(xᵏ)  ] [ λ₀ᵏ⁺¹ − λᵏ ] = −  [ G(xᵏ)λᵏ           ]   (8.6)

where (xᵏ, λᵏ) is the starting point of the iteration, (xᵏ⁺¹, λ₀ᵏ⁺¹) is the new estimate, H(x, λ) is the Hessian of the Lagrangian and Λ a diagonal matrix with Λᵢᵢ ≡ λᵢ. Proceeding in a similar way as in Section 6, we define the linear system in (d₀ᵏ, λ₀ᵏ⁺¹)

Sᵏd₀ᵏ + ∇g(xᵏ)λ₀ᵏ⁺¹ = −∇f(xᵏ)   (8.7)

Λᵏ∇gᵗ(xᵏ)d₀ᵏ + G(xᵏ)λ₀ᵏ⁺¹ = 0,   (8.8)

where d₀ᵏ is a direction in the primal space, defined by d₀ᵏ = xᵏ⁺¹ − xᵏ, and λ₀ᵏ⁺¹ is a new estimate of λ. The matrix Sᵏ is symmetric and positive definite; it can be taken equal to H(xᵏ, λᵏ), to a quasi-Newton approximation of H(xᵏ, λᵏ), or to the identity.
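For illustration, one solve of (8.7)-(8.8) can be sketched as below, assuming NumPy arrays and the notation above; Λ and G are built as diagonal matrices and the function name is ours.

```python
import numpy as np

def newton_kkt_step(S, grad_f, grad_g, g, lam):
    """Solve (8.7)-(8.8) for (d0, lam0), a minimal sketch. The same
    coefficient matrix is reused later for the auxiliary system."""
    n, m = grad_g.shape
    M = np.block([[S, grad_g],
                  [np.diag(lam) @ grad_g.T, np.diag(g)]])
    rhs = np.concatenate([-grad_f, np.zeros(m)])
    sol = np.linalg.solve(M, rhs)
    return sol[:n], sol[n:]      # d0 and the new multiplier estimate lam0
```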


It can be proved that d₀ is a descent direction of f. However, d₀ is not useful as a search direction, since it is not necessarily feasible. This is due to the fact that, as any constraint goes to zero, (8.8) forces d₀ to tend to a direction tangent to the feasible set. In fact, (8.8) is equivalent to

λᵢᵏ ∇gᵢᵗ(xᵏ)d₀ᵏ + gᵢ(xᵏ)λ₀ᵢᵏ⁺¹ = 0;  i = 1, 2, ..., m,   (8.9)

which implies that ∇gᵢᵗ(xᵏ)d₀ᵏ = 0 for i such that gᵢ(xᵏ) = 0.
k+1
To avoid this effect, we define the linear system in dk and
k+1 = f (xk )
S k dk + g(xk )

(8.10)

k+1 = k k ,
k g t (xk )dk + G(xk )

(8.11)

obtained by adding a negative vector to the right side of (8.8), where the
scalar factor is positive. In this case, (8.11) is equivalent to
k+1 = k k ; i = 1, m,
ki git (xk )dk + gi (xk )
i
i
and then, git (xk )dk < 0 for the active constraints. Thus, dk is a feasible
direction.
The addition of a negative number in the right hand side of (8.8) produces the effect of deflecting dk0 into the feasible region, where the deflection
is proportional to k . As the deflection of dk0 is proportional to k and dk0
is a descent direction of f , it is possible to find bounds on k , in a way to
t
ensure that dk is also a descent direction. Since dk0 f (xk ) < 0, we can get
these bounds by imposing
t

dk f (xk ) dk0 f (xk ),

(8.12)

which implies dk f (xk ) < 0.


Condition (8.12) means that dᵏ lies in a circular cone whose axis is −∇f(xᵏ). In general, the rate of descent of f along dᵏ will be smaller than along d₀ᵏ. This is the price we pay for obtaining a good feasible descent direction.
Let us consider the auxiliary linear system

Sᵏd₁ᵏ + ∇g(xᵏ)λ₁ᵏ⁺¹ = 0   (8.13)

Λᵏ∇gᵗ(xᵏ)d₁ᵏ + G(xᵏ)λ₁ᵏ⁺¹ = −λᵏ.   (8.14)

It is easy to deduce that

dᵏ = d₀ᵏ + ρᵏd₁ᵏ.


Substitution of this last expression into (8.12) gives

ρᵏ ≤ (α − 1) d₀ᵏᵗ∇f(xᵏ) / d₁ᵏᵗ∇f(xᵏ).   (8.15)

Finally, to get a new feasible primal point and a satisfactory decrease of the objective function, an inaccurate constrained line search along dᵏ is performed. Different updating rules can be adopted to define a positive λᵏ⁺¹.
Figure 8 illustrates the search direction for an optimization problem with two design variables and one constraint. At xᵏ on the boundary, d₀ is tangent to the constraint; in this example, d₁ is normal to the boundary.
ALGORITHM 8.1. Newton-like Interior Point Algorithm for inequality constraints.
Parameters. α ∈ (0, 1) and φ > 0.
Data. Initialize x in the interior of the feasible region, λ > 0 and S ∈ Rⁿˣⁿ symmetric and positive definite.
Step 1. Computation of the search direction d.
i) Solve the linear system for (d₀, λ₀)

Sd₀ + ∇g(x)λ₀ = −∇f(x),   (8.16)

Λ∇gᵗ(x)d₀ + G(x)λ₀ = 0.   (8.17)

If d₀ = 0, stop.
ii) Solve the linear system for (d₁, λ₁)

Sd₁ + ∇g(x)λ₁ = 0,   (8.18)

Λ∇gᵗ(x)d₁ + G(x)λ₁ = −λ.   (8.19)

iii) If d₁ᵗ∇f(x) > 0, set

ρ = inf [ φ‖d₀‖² ; (α − 1) d₀ᵗ∇f(x) / d₁ᵗ∇f(x) ].   (8.20)

Otherwise, set

ρ = φ‖d₀‖².   (8.21)

Figure 8. Search direction of Algorithm 8.1.

iv) Compute the search direction

d = d₀ + ρd₁,   (8.22)

and also

λ̄ = λ₀ + ρλ₁.   (8.23)

Step 2. Line search.
Find a step length t satisfying a given constrained line search criterion on the objective function f and such that

gᵢ(x + td) < 0 if λ̄ᵢ ≥ 0,   (8.24)

or

gᵢ(x + td) ≤ gᵢ(x) otherwise.   (8.25)

Step 3. Updates.
i) Set

x := x + td

and define new values for λ > 0 and S symmetric and positive definite.
ii) Go back to Step 1.
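For illustration, Step 1 of Algorithm 8.1 can be sketched as below, assuming NumPy arrays and the notation above; the function name and the default values of α and φ are ours. Note that the two linear systems share the coefficient matrix, so in practice a single factorization serves both solves; here we simply call np.linalg.solve twice for clarity.

```python
import numpy as np

def fdipa_direction(grad_f, grad_g, g, lam, S, alpha=0.7, phi=1.0):
    """Search direction of Algorithm 8.1, a minimal sketch.

    grad_f : gradient of f at the interior point x, shape (n,)
    grad_g : matrix whose columns are the constraint gradients, shape (n, m)
    g      : constraint values at x, all strictly negative, shape (m,)
    lam    : current positive estimate of the dual variables, shape (m,)
    S      : symmetric positive definite matrix, shape (n, n)
    """
    n, m = grad_g.shape
    # Systems (8.16)-(8.17) and (8.18)-(8.19) share the same matrix.
    M = np.block([[S, grad_g],
                  [np.diag(lam) @ grad_g.T, np.diag(g)]])
    sol0 = np.linalg.solve(M, np.concatenate([-grad_f, np.zeros(m)]))
    sol1 = np.linalg.solve(M, np.concatenate([np.zeros(n), -lam]))
    d0, lam0 = sol0[:n], sol0[n:]
    d1, lam1 = sol1[:n], sol1[n:]
    # Deflection bound (8.20)-(8.21): keeps d a descent direction of f.
    rho = phi * (d0 @ d0)
    if d1 @ grad_f > 0.0:
        rho = min(rho, (alpha - 1.0) * (d0 @ grad_f) / (d1 @ grad_f))
    # d as in (8.22), lam_bar as in (8.23); d0 is returned for the stopping test.
    return d0 + rho * d1, lam0 + rho * lam1, d0
```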


In addition to (8.15), we have from (8.20) and (8.21) that ρ is bounded above by φ‖d₀‖². In [31] it is shown that ρ = O(‖d₀‖²); a larger ρ is not needed to ensure that d constitutes a uniformly feasible directions field. The line search includes conditions (8.24) and (8.25), which force the new primal point to be in the interior. Moreover, (8.25) prevents the saturation of the constraints associated with negative dual variables, as is needed to prove convergence to Karush-Kuhn-Tucker points [31]. Armijo's constrained line search and Wolfe's criterion, given in the previous section, can easily be modified to take these requirements into account.
Different algorithms can be obtained according to the way of updating λ and S. We assume that the updating rule for S is such that Assumption 5.1 is true and that λ satisfies the following:
Assumption 8.1. There exist positive numbers λᴵ, λˢ and ḡ such that 0 < λᵢ ≤ λˢ for i = 1, 2, ..., m, and λᵢ ≥ λᴵ for any i such that gᵢ(x) ≥ −ḡ.
The theoretical analysis in [31] includes first a proof that, if xᵏ is a regular point of the problem, then the solutions of the linear systems (8.16), (8.17) and (8.18), (8.19) are unique. It is then shown that any sequence {xᵏ} generated by the algorithm converges to a Karush-Kuhn-Tucker point of the problem, for any way of updating λ and S, provided that the previous assumptions are true. We also have that (xᵏ, λ₀ᵏ) converges to a Karush-Kuhn-Tucker pair (x*, λ*) and, depending on the way of updating λ, global convergence in the dual space can also be obtained.
The matrix S can be defined in the same way as in the equality constrained algorithms discussed in Section 6. For the dual variables λ, the following updating rules can be stated in Step 3:
8.1. UPDATE OF λ

Rule 8.1.
Set, for i = 1, 2, ..., m,

λᵢ := sup [ λ₀ᵢ ; φ‖d₀‖² ].   (8.26)

If gᵢ(x) ≥ −ḡ and λᵢ < λᴵ, set λᵢ = λᴵ.

The parameters φ, ḡ and λᴵ are taken positive. If ḡ and λᴵ are taken small enough, after a finite number of iterations λᵢ becomes equal to λ₀ᵢ for the active constraints. For the inactive ones, λᵢ → 0 as xᵏ → x*. As a consequence, λ → λ*.


If Assumption 5.1 is verified, then λ defined above satisfies Assumption 8.1. In fact, it can be shown that λ₀ and d₀ are bounded, so that λᵢ has a positive upper bound λˢ.
Far from the solution it is convenient to have a search direction that is not pointing at the boundary of the constraints but sliding along it. In this way the steps will be longer and, consequently, the efficiency of the algorithm will improve. This effect can be obtained by increasing the dual variable as the corresponding constraint goes to zero since, as follows from (8.9), ∇gᵢᵗ(x)d₀ becomes smaller as λᵢ grows. The following rule satisfies this requirement in a very simple way:
Rule 8.2.
Set, for i = 1, 2, ..., m,

λᵢ := −gᵢ⁻¹(x).   (8.27)

If λᵢ > λˢ, set λᵢ = λˢ.

This rule is based on Dikin's algorithm for Linear Programming, presented in [13] and also studied in [46]. In the case of linear programs, since H(x, λ) = 0, we can take S = 0. In [31] it is shown that the d₀ obtained with this rule is identical to the search direction given by Dikin's algorithm.
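Both rules can be sketched in a few lines of Python; the function names, the default parameter values and the NumPy setting are ours, chosen only to illustrate (8.26) and (8.27).

```python
import numpy as np

def update_lambda_rule81(lam0, d0, g, phi=1.0, lam_I=1e-6, g_bar=1e-6):
    """Rule 8.1, a minimal sketch; parameter values are illustrative."""
    lam = np.maximum(lam0, phi * (d0 @ d0))        # (8.26)
    # Keep the duals of near-active constraints away from zero.
    lam[(g >= -g_bar) & (lam < lam_I)] = lam_I
    return lam

def update_lambda_rule82(g, lam_S=1e6):
    """Rule 8.2 (Dikin-like), a minimal sketch: since g < 0 at interior
    points, -1/g is positive and grows as a constraint goes to zero."""
    return np.minimum(-1.0 / g, lam_S)             # (8.27), capped by lam_S
```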
8.2. ASYMPTOTIC CONVERGENCE

When (8.26) and BFGS rules are used for λ and B, the convergence is two-step superlinear, provided that a unit step length is obtained after a finite number of iterations. This result can be obtained in a similar way as in Theorem 4.6 of [40], by showing that the search directions of the present method and of the SQP algorithm locally differ by a term of the same order as ‖d₀‖², and then by extending to the present algorithm the proof of two-step superlinear convergence of the SQP method.
To satisfy the requirement about the step length, d must be such that Definition 7.1 holds for some constant greater than one. This constant increases whenever ρ grows but, unfortunately, in some problems the upper bound on ρ may not be large enough to allow a unit step length. This effect, which is similar to the Maratos effect, can in theory produce rates of convergence lower than superlinear.
In the quasi-Newton algorithm studied in [24], as in Algorithm 7.2, the Maratos effect is avoided by looking at each iteration for a decrease of an estimate of the Lagrangian, instead of the objective function. An algorithm


that also solves the optimality conditions by means of Newton-like iterates was presented in [40]. The authors obtained global and local superlinear convergence by applying a technique presented by Mayne and Polak [38]. First, as in the present approach, a feasible descent direction d is obtained. Then, the constraints are computed at (x + d) and an approximate projection x̃ of (x + d) on the active constraints is found. Finally, a new primal point is determined by a search along a parabola that is tangent to d and contains x̃. In this search, a decrease of the objective and the feasibility of the new iterate are required.
8.3. INCLUDING EQUALITY CONSTRAINTS

In view of Algorithms 6.1 and 8.1, a Newton-like algorithm to solve the general nonlinear program (2.1) is easily derived [31]. This algorithm requires an initial point in the interior of the region defined by the inequality constraints. It generates a sequence {xᵏ}, interior with respect to the inequalities, that converges to a Karush-Kuhn-Tucker point of the problem. In general, the equalities are active only at the limit. Then, to have the equalities satisfied, the objective function must be allowed to increase. The auxiliary function φ(x, r), defined in Section 6, is taken as the objective in the line search. The algorithm is stated below, with d₀ a descent direction of φ(x, r) and d obtained by deflecting d₀ with respect to the inequality constraints.

ALGORITHM 8.2. Newton-like Interior Point Algorithm for constrained optimization.
Parameters. α ∈ (0, 1), φ > 0 and r ∈ Rᵖ, r > 0.
Data. Initialize x in the interior of the region defined by the inequality constraints, λ > 0 and S ∈ Rⁿˣⁿ symmetric and positive definite.
Step 1. Computation of the search direction d.
i) Solve the linear system for (d₀, λ₀, μ₀)

Sd₀ + ∇g(x)λ₀ + ∇h(x)μ₀ = −∇f(x),
Λ∇gᵗ(x)d₀ + G(x)λ₀ = 0,
∇hᵗ(x)d₀ = −h(x).

If d₀ = 0, stop.


ii) Solve the linear system for (d₁, λ₁, μ₁)

Sd₁ + ∇g(x)λ₁ + ∇h(x)μ₁ = 0,
Λ∇gᵗ(x)d₁ + G(x)λ₁ = −λ,
∇hᵗ(x)d₁ = 0.

iii) If rᵢ ≤ |μ₀ᵢ|, then set rᵢ > |μ₀ᵢ|, for i = 1, 2, ..., p.
iv) Let φ(x, r) = f(x) + rᵗ sg[h(x)]h(x), where sg[h(x)] denotes the diagonal matrix whose entries are the signs of hᵢ(x). If d₁ᵗ∇φ(x, r) > 0, set

ρ = inf [ φ‖d₀‖² ; (α − 1) d₀ᵗ∇φ(x, r) / d₁ᵗ∇φ(x, r) ].

Otherwise, set

ρ = φ‖d₀‖².

v) Compute the search direction

d = d₀ + ρd₁,

and also

λ̄ = λ₀ + ρλ₁.

Step 2. Line search.
Find a step length t satisfying a given constrained line search criterion on the auxiliary function φ(x, r) and such that

gᵢ(x + td) < 0 if λ̄ᵢ ≥ 0,

or

gᵢ(x + td) ≤ gᵢ(x) otherwise.

Step 3. Updates.
i) Set

x := x + td

and define new values for λ > 0 and S symmetric and positive definite.
ii) Go back to Step 1.
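A minimal sketch of the auxiliary function of Step 1 iv) and of the penalty update of Step 1 iii) follows; since rᵗ sg[h(x)]h(x) = Σᵢ rᵢ|hᵢ(x)|, the function simply penalizes equality violations. The names phi_aux, update_penalty and the margin value are ours, chosen for illustration.

```python
import numpy as np

def phi_aux(f_val, h_val, r):
    """Auxiliary line search function of Algorithm 8.2, a minimal sketch:
    phi(x, r) = f(x) + r' sg[h(x)] h(x) = f(x) + sum_i r_i |h_i(x)|."""
    return f_val + r @ np.abs(h_val)

def update_penalty(r, mu0, margin=1.0):
    """Step 1 iii): enlarge r_i whenever r_i <= |mu0_i|; the amount by
    which r_i exceeds |mu0_i| is left free, here a fixed margin."""
    return np.where(r <= np.abs(mu0), np.abs(mu0) + margin, r)
```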


Acknowledgements. I thank Prof. Jasbir S. Arora for his careful review of the manuscript and helpful comments.
References
1. Arora, J. S. Introduction to Optimum Design, McGraw-Hill, New York, 1989.
2. Auatt, S. S. Nonlinear Programming Algorithms for Elastic Contact Problems (in Portuguese), M.Sc. Dissertation, COPPE - Federal University of Rio de Janeiro, Mechanical Engineering Program, Caixa Postal 68503, 21945-970, Rio de Janeiro, Brazil, 1993.
3. Baron, F. J. and Pironneau, O. Multidisciplinary Optimal Design of a Wing Profile, Proceedings of STRUCTURAL OPTIMIZATION 93, Rio de Janeiro, Aug. 1993, J. Herskovits ed.
4. Baron, F. J. Constrained Shape Optimization of Coupled Problems with Electromagnetic Waves and Fluid Mechanics (in Spanish), D.Sc. Dissertation, University of Malaga, Spain, 1994.
5. Baron, F. J., Duffa, G., Carrere, F. and Le Tallec, P. Optimisation de forme en aerodynamique, CHOCS, Revue scientifique et technique de la Direction des Applications Militaires du CEA, France, 1994.
6. Baron, F. J. Radar Visibility Optimization under Aerodynamic Constraints for a Wing Profile, Research Report No. 2073, INRIA, BP 105, 78153 Le Chesnay CEDEX, France, 1994.
7. Bazaraa, M. S. and Shetty, C. M. Nonlinear Programming, Theory and Algorithms, John Wiley and Sons, New York, 1979.
8. Beale, E. M. L. Numerical Methods, in Nonlinear Programming, J. Abadie ed., North-Holland, Amsterdam, 1967.
9. Dennis, J. E. and Moré, J. J. Quasi-Newton Methods, Motivation and Theory, SIAM Review, 19, pp. 46-89, 1977.
10. Dennis, J. E. and Schnabel, R. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, 1983.
11. Dew, M. C. A First Order Feasible Directions Algorithm for Constrained Optimization, Technical Report No. 153, Numerical Optimization Centre, The Hatfield Polytechnic, P.O. Box 109, College Lane, Hatfield, Hertfordshire AL10 9AB, U.K., 1985.
12. Dew, M. C. A Feasible Directions Method for Constrained Optimization based on the Variable Metric Principle, Technical Report No. 155, Numerical Optimization Centre, The Hatfield Polytechnic, P.O. Box 109, College Lane, Hatfield, Hertfordshire AL10 9AB, U.K., 1985.
13. Dikin, I. I. About the Convergence of an Iterative Procedure (in Russian), Soviet Mathematics Doklady, Vol. 8, pp. 674-675, 1967.
14. Fletcher, R. Practical Methods of Optimization, Vols. I and II, John Wiley, Chichester, 1980.
15. Fox, R. L. Optimization Methods for Engineering Design, Addison-Wesley, Reading, 1973.
16. Gabay, D. Reduced Quasi-Newton Methods with Feasibility Improvement for Nonlinearly Constrained Optimization, Mathematical Programming Study 16, pp. 18-44, 1982.
17. Garcia Palomares, U. M. and Mangasarian, O. L. Superlinearly Convergent Quasi-Newton Algorithms for Nonlinearly Constrained Optimization Problems, Mathematical Programming 11, pp. 1-13, 1976.
18. Gilbert, J. C. Superlinear Convergence of a Reduced BFGS Method with Piecewise Line-search and Update Criterion, Research Report No. 103, INRIA, BP 105, 78153 Le Chesnay CEDEX, France, 1993.
19. Gill, P. E., Murray, W. and Wright, M. H. Practical Optimization, Academic Press, New York, 1981.
20. Haftka, R. T., Gürdal, Z. and Kamat, M. P. Elements of Structural Optimization, Kluwer Academic Publishers, 1990.
21. Han, S. P. A Globally Convergent Method for Nonlinear Programming, Journal of Optimization Theory and Applications, 22, pp. 297-309, 1977.
22. Haug, E. J. and Arora, J. S. Applied Optimal Design, Wiley Interscience, New York, 1979.
23. Herskovits, J. A Two-Stage Feasible Direction Algorithm for Non-Linear Constrained Optimization, Research Report No. 103, INRIA, 1982.
24. Herskovits, J. A Two-Stage Feasible Directions Algorithm Including Variable Metric Techniques for Nonlinear Optimization, Research Report No. 118, INRIA, 1982.
25. Herskovits, J. Développement d'une méthode numérique pour l'optimisation non-linéaire (in English), Thèse de Docteur-Ingénieur en Mathématiques de la Décision, Paris IX University, published by INRIA, 1982.
26. Herskovits, J. A Two-Stage Feasible Directions Algorithm for Nonlinear Constrained Optimization Using Quadratic Programming Subproblems, 11th IFIP Conference on System Modeling and Optimization, Copenhagen, Denmark, July 1983.
27. Herskovits, J. N. and Carvalho, L. A. V. A Successive Quadratic Programming based Feasible Directions Algorithm, Proceedings of the 17th International Conference on Analysis and Optimization of Systems, Antibes, France, Springer-Verlag, 1986.
28. Herskovits, J. A Two-Stage Feasible Directions Algorithm for Nonlinear Constrained Optimization, Mathematical Programming, Vol. 36, pp. 19-38, 1986.
29. Herskovits, J. and Coelho, C. A. B. An Interior Point Algorithm for Structural Optimization Problems, in Computer Aided Optimum Design of Structures: Recent Advances, Brebbia, C. A. and Hernandez, S. eds., Computational Mechanics Publications, Springer-Verlag, 1989.
30. Herskovits, J. An Interior Points Method for Nonlinearly Constrained Optimization, in Optimization of Large Structural Systems, Rozvany, G. ed., Vol. I, pp. 589-608, NATO/ASI Series, Springer-Verlag, 1991.
31. Herskovits, J. An Interior Point Technique for Nonlinear Optimization, Research Report No. 1808, INRIA, BP 105, 78153 Le Chesnay CEDEX, France, 1992.
32. Hiriart-Urruty, J. B. and Lemaréchal, C. Convex Analysis and Minimization Algorithms, I and II, Springer-Verlag, Berlin Heidelberg, 1993.
33. Hoyer, W. Variants of Reduced Newton Method for Nonlinear Equality Constrained Optimization Problems, Optimization, 17, pp. 757-774, 1986.
34. Kirsch, U. Structural Optimization, Fundamentals and Applications, Springer-Verlag, Berlin, 1993.
35. Luenberger, D. G. Optimization by Vector Space Methods, John Wiley and Sons, New York, 1969.
36. Luenberger, D. G. Linear and Nonlinear Programming, 2nd Edition, Addison-Wesley, Reading, 1984.
37. Maratos, N. Exact Penalty Function Algorithms for Finite Dimensional and Control Optimization Problems, Ph.D. Dissertation, Imperial College, University of London, 1978.
38. Mayne, D. Q. and Polak, E. Feasible Directions Algorithms for Optimization Problems with Equality and Inequality Constraints, Mathematical Programming 11, pp. 67-80, 1976.
39. Minoux, M. Programmation Mathématique: Théorie et Algorithmes, I and II, Dunod, Paris, 1983.
40. Panier, E. R., Tits, A. L. and Herskovits, J. A QP-Free, Globally Convergent, Locally Superlinearly Convergent Algorithm for Inequality Constrained Optimization, SIAM Journal on Control and Optimization, Vol. 26, pp. 788-810, 1988.
41. Powell, M. J. D. The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimization Calculations, in Nonlinear Programming 3, O. L. Mangasarian, R. R. Meyer and S. M. Robinson eds., Academic Press, London, 1978.
42. Powell, M. J. D. Variable Metric Methods for Constrained Optimization, in Mathematical Programming - The State of the Art, A. Bachem, M. Grötschel and B. Korte eds., Springer-Verlag, Berlin, 1983.
43. Rosen, J. The Gradient Projection Method for Nonlinear Programming, I, Linear Constraints, SIAM J., 8, pp. 181-217, 1960.
44. Rosen, J. The Gradient Projection Method for Nonlinear Programming, II, Nonlinear Constraints, SIAM J., 9, pp. 514-532, 1961.
45. Tits, A. L. and Zhou, J. L. A Simple, Quadratically Convergent Interior Point Algorithm for Linear Programming and Convex Quadratic Programming, in Large Scale Optimization: State of the Art, W. W. Hager, D. W. Hearn and P. M. Pardalos eds., Kluwer Academic Publishers B.V., 1993.
46. Vanderbei, R. J. and Lagarias, J. C. I. I. Dikin's Convergence Result for the Affine Scaling Algorithm, Technical Report, AT&T Bell Laboratories, Murray Hill, NJ, 1988.
47. Vanderplaats, G. N. Numerical Optimization Techniques for Engineering Design with Applications, McGraw-Hill, New York, 1984.
48. Vautier, I., Salaun, M. and Herskovits, J. Application of an Interior Point Algorithm to the Modeling of Unilateral Contact Between Spot-Welded Shells, Proceedings of STRUCTURAL OPTIMIZATION 93, Rio de Janeiro, Aug. 1993, J. Herskovits ed.
49. Zouain, N. A., Herskovits, J. N., Borges, L. A. and Feijóo, R. A. An Iterative Algorithm for Limit Analysis with Nonlinear Yield Functions, International Journal of Solids and Structures, Vol. 30, No. 10, pp. 1397-1417, Gt. Britain, 1993.
50. Wilson, R. B. A Simplicial Algorithm for Concave Programming, Ph.D. Dissertation, Harvard University Graduate School of Business Administration, 1963.
