Optimization Methods & Software
Vol. 25, No. 4, August 2010, 531–552

An adjoint-based SQP algorithm with quasi-Newton Jacobian updates for inequality constrained optimization

Moritz Diehl^a, Andrea Walther^b,*, Hans Georg Bock^c and Ekaterina Kostina^d

^a OPTEC&ESAT-SCD, Katholieke Universiteit Leuven, Leuven, Belgium; ^b Institut für Mathematik, Universität Paderborn, Paderborn, Germany; ^c Interdisciplinary Centre for Scientific Computing, University of Heidelberg, Heidelberg, Germany; ^d Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany

*Corresponding author. Email: Andrea.Walther@uni-paderborn.de

(Received 12 October 2006; final version received 3 March 2009)

We present a sequential quadratic programming (SQP) type algorithm, based on quasi-Newton approxima-
tions of Hessian and Jacobian matrices, which is suitable for the solution of general nonlinear programming
problems involving equality and inequality constraints. In contrast to most existing SQP methods, no eval-
uation of the exact constraint Jacobian matrix needs to be performed. Instead, in each SQP iteration only
one evaluation of the constraint residuals and two evaluations of the gradient of the Lagrangian function are
necessary, the latter of which can efficiently be performed by the reverse mode of automatic differentiation.
Factorizations of the Hessian and of the constraint Jacobian are approximated by the recently proposed
STR1 update procedure. Inequality constraints are treated by solving within each SQP iteration a quadratic
program (QP), the dimension of which equals the number of degrees of freedom. A recently proposed
gradient modification in these QPs takes account of Jacobian inexactness in the active set determination.
Superlinear convergence of the procedure is shown under mild conditions. The convergence behaviour of
the algorithm is analysed using several problems from the Hock–Schittkowski test library. Furthermore,
we present numerical results for an optimization problem based on a small periodic adsorption process,
where the Jacobian of the equality constraints is dense.

Keywords: automatic differentiation; inequality constraints; sequential quadratic programming; inexact Newton methods; nonlinear optimization; quasi-Newton updates

AMS (MOS): 49M37; 90C53; 65K05

1. Introduction

In this paper, we propose a sequential quadratic programming (SQP) type algorithm to solve
general nonlinear programming problems that are given in the form
$$\min_{x\in\mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0,\;\; Gx \le b, \qquad (1)$$

with twice differentiable nonlinear objective and constraint functions $f:\mathbb{R}^n\to\mathbb{R}$ and $c:\mathbb{R}^n\to\mathbb{R}^m$. The linear inequality constraints are specified by a matrix $G\in\mathbb{R}^{p\times n}$ and a vector $b\in\mathbb{R}^p$.


In contrast to the majority of SQP type algorithms, our algorithm does not evaluate the Jaco-
bian of the equality constraints exactly in each iteration, but approximates it by a quasi-Newton
update procedure first proposed in [13]. While this approach, as well as related inexact SQP type methods [4,16], has so far only been applicable to equality-constrained problems, it was recently
discovered that inequalities could be handled by SQP type methods with inexact Jacobians by a
simple gradient modification in the quadratic programming (QP) subproblems [3]. In this paper,
we study a combination of the adjoint-based quasi-Newton concept presented in [13] with the
gradient modification proposed in [3]. The main feature of this new algorithm is that one does
not need to form and factor the Jacobian of the equality constraints. Especially for optimization
problems where the Jacobian of the equality constraints is dense, this may allow a considerable
reduction of the computing time. Hence, we aim at the solution of optimization problems with
moderate size but a special structure of the constraint Jacobian. The corresponding applications
cover a wide range of periodic adsorption processes including, for example, the purification of
hydrogen. In these cases, the Jacobian of the equality constraints is dense due to the periodicity of
the considered process. As a consequence, the runtime needed for the optimization process may
be dominated significantly by the computation of the dense Jacobian and its factorization [19].
The proposed SQP-type algorithm with superlinear local convergence properties is applicable
to general nonlinear programming problems (NLPs) with inequality constraints, since one can
convert any NLP into the form (1) by introducing suitable slacks.
SQP methods that use inexact Jacobians or that inexactly solve the linear subproblems have often been considered for equality-constrained optimization; see [16,18] and the references therein. Quasi-Newton update procedures for the Hessian of the Lagrangian have been widely employed since the classical papers by Han [15] and Powell [21]. However, the use of quasi-Newton updates for both the Hessian and the constraint Jacobian was first proposed in [13] for equality-constrained problems to avoid the forming and factoring of the constraint Jacobian matrix.
The treatment of inequalities in the presence of inexact Jacobians poses the question of correct
active set determination, and usually it is assumed that the gradients are sufficiently accurately
computed to ensure correct multiplier signs, or that interior point methods are used, see e.g. the
articles on inequality constrained optimization in [1]. In contrast to these approaches, the SQP
type methods considered here can cope with wrong constraint gradients as long as an accurate
gradient of the Lagrangian is computed.
The outline of this paper is as follows. In Section 2, we describe the algorithm for solving
optimization problems with arbitrary equality and linear inequalities in detail. We prove the
stability of the active set and the local convergence of the iterates under the assumption that
the approximate derivative information satisfies certain accuracy requirements in Section 3. It is
shown in Section 4 that the required quality of the derivative approximations can be achieved. First
numerical results are stated in Section 5 to illustrate the applicability of the proposed algorithm
for the Hock–Schittkowski problems [17] contained in the CUTE test set [5]. Furthermore, we
apply the proposed optimization algorithm to a small simulated moving bed (SMB) process as
one possible test case for periodic optimal control problems with a dense Jacobian of the equality
constraints. Finally, conclusions and ideas for further developments are presented in Section 6.

2. An SQP algorithm with inexact derivatives

In general, SQP algorithms aim at finding a local minimizer of Equation (1) by iteratively
refined guesses of the primal variables x ∈ Rn and of the dual variables λ ∈ Rm . Here, the
main idea is to find a root of the necessary optimality conditions of the considered optimization
problem.

For our purpose of solving an optimization problem with inequality constraints using only inexact derivative matrices, we define a 'truncated Lagrangian' $L$ by

$$L(x,\lambda) = f(x) + \lambda^T c(x), \qquad (2)$$

disregarding inequalities and their multipliers. Furthermore, we say that the $i$th inequality is active at a point $x$ iff $(Gx - b)_i = 0$. We denote the matrix of rows within $G$ that correspond to active inequalities by the shorthand $G_a$. We say that the linear independence constraint qualification (LICQ) holds at a point $x$ iff the row vectors of $\nabla_x c(x)^T$ and $G_a$ are linearly independent. Here and throughout this paper, we use the notation $\nabla_x c(x) := \partial c/\partial x(x)^T$. By $\|\cdot\|$ we will denote the Euclidean norm. Necessary conditions for optimality can then be formulated as follows [20].

Theorem 2.1 (Karush–Kuhn–Tucker (KKT) conditions) If $x^*$ is a local minimum of Equation (1) and LICQ holds at $x^*$, then there exist unique multipliers $\lambda^*, \mu^*$ such that

$$\nabla_x L(x^*,\lambda^*) + G^T\mu^* = 0, \qquad (3a)$$
$$c(x^*) = 0, \qquad (3b)$$
$$Gx^* \le b, \qquad (3c)$$
$$\mu^* \ge 0, \qquad (3d)$$
$$\mu_i^*\,(Gx^* - b)_i = 0, \quad i = 1,2,\ldots,p. \qquad (3e)$$
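For concreteness, the KKT residuals (3a)–(3e) can be evaluated numerically. The following is a minimal numpy sketch under assumed user-supplied callables f_grad, c and c_jac for $\nabla f$, $c$ and $\nabla_x c$; the names and signatures are illustrative, not part of the paper:

```python
import numpy as np

def kkt_residuals(x, lam, mu, f_grad, c, c_jac, G, b):
    """Evaluate the KKT residuals (3a)-(3e) at a primal-dual point (x, lam, mu).

    f_grad(x): gradient of f, shape (n,)
    c(x):      equality constraint values, shape (m,)
    c_jac(x):  nabla_x c(x) = (dc/dx)^T, shape (n, m)  -- the paper's convention
    G, b:      linear inequality data with G x <= b
    """
    r_stat = f_grad(x) + c_jac(x) @ lam + G.T @ mu   # stationarity (3a)
    r_eq = c(x)                                      # equalities (3b)
    r_ineq = np.maximum(G @ x - b, 0.0)              # inequality violation (3c)
    r_sign = np.maximum(-mu, 0.0)                    # multiplier sign violation (3d)
    r_comp = mu * (G @ x - b)                        # complementarity (3e)
    return r_stat, r_eq, r_ineq, r_sign, r_comp
```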

In the remainder of this section, we will first review an existing adjoint-based quasi-Newton algorithm for the solution of problems without inequalities [13]. Here, the KKT conditions simplify to

$$\nabla_x L(x^*,\lambda^*) = 0, \qquad (4a)$$
$$c(x^*) = 0. \qquad (4b)$$

The aim of the Newton-type algorithm is to find a point satisfying this system of nonlinear
equations. In a second step, we present our novel SQP algorithm, which is able to treat inequality
constraints, i.e. to find points x ∗ satisfying the full set of KKT conditions (3). The algorithm uses
the same quasi-Newton update procedure as the algorithm for equality constraints, but is based
on successive solution of the QPs, where a modified gradient is used within each QP to account
for the inexactness of the derivatives.

2.1 Equality constrained optimization: STR1 update

If one has to solve an equality-constrained problem of the form (1) in the absence of inequality
constraints, one may apply Newton’s method to the corresponding KKT conditions (4). A full
step Newton’s method would iterate

$$x^{k+1} = x^k + s^k, \qquad \lambda^{k+1} = \lambda^k + \sigma^k,$$

where in each iteration $k$, the Newton step $(s^k, \sigma^k)$ is given as the solution of

$$\begin{bmatrix} \nabla^2_{xx} L(x^k,\lambda^k) & \nabla_x c(x^k)\\ \nabla_x c(x^k)^T & 0 \end{bmatrix}\begin{bmatrix} s^k\\ \sigma^k\end{bmatrix} = -\begin{bmatrix} g(x^k,\lambda^k)\\ c(x^k)\end{bmatrix}. \qquad (5)$$

Here, $g(x,\lambda) := \nabla f(x) + \nabla_x c(x)\lambda$ is the gradient of the Lagrange function (2) with respect to $x$. An SQP method based only on quasi-Newton approximations $B_k$ and $A_k$ of $\nabla^2_{xx} L(x^k,\lambda^k)$ and $\nabla_x c(x^k)$ was first presented in [13]. In order to approximate the second-order information that is
required for solving the KKT system (5), the well-known SR1 update given by

$$B_{k+1} = B_k + \varepsilon_k\,\frac{(w^k - B_k s^k)(w^k - B_k s^k)^T}{(w^k - B_k s^k)^T s^k} \qquad (6)$$

is employed, with $s^k := x^{k+1} - x^k$. The matrix $B_{k+1}$ is chosen to satisfy the secant condition

$$B_{k+1} s^k = w^k := g(x^k + s^k, \lambda^k) - g(x^k, \lambda^k) \in \mathbb{R}^n.$$

The parameter $\varepsilon_k$ in Equation (6) allows a modification of the SR1 update such that the norm of the Hessian approximation remains bounded. Furthermore, it can be used to avoid negative curvature of the next Hessian approximation [14]. For most SQP algorithms [6,22], it is necessary that the full Jacobian of the constraints be available, computed using either numerical differentiation or analytical methods. However, only vector×Jacobian products are required by the updating algorithm proposed in [13]. This adjoint information can be computed efficiently using either an approach based on continuous adjoint equations, if the underlying model is based on differential equations,
or the technique of automatic differentiation [11]. As a result, one obtains a quasi-Newton update
also for the Jacobian of the constraints, avoiding the possibly very time-consuming construction
of the full Jacobian in each optimization step. For that purpose, the TR1 update is defined as

$$A_{k+1} = A_k + \frac{(y^k - A_k s^k)(\gamma^k - A_k^T\sigma^k)^T}{(\gamma^k - A_k^T\sigma^k)^T s^k} \qquad (7)$$

to satisfy both the direct secant condition

$$A_{k+1} s^k = y^k := c(x^k + s^k) - c(x^k) \approx \nabla_x c(x^k)^T s^k$$

as well as the adjoint or transposed secant condition

$$A_{k+1}^T\sigma^k \approx \gamma^k := g(x^k + s^k, \lambda^k + \sigma^k) - g(x^k + s^k, \lambda^k) \approx \nabla_x c(x^k + s^k)\,\sigma^k$$

approximately, with $\sigma^k := \lambda^{k+1} - \lambda^k \in \mathbb{R}^m$. To compute the vector×Jacobian product required
for the update and to evaluate the right-hand side of (5), one can apply once more either adjoint
differential equations, if possible, or automatic differentiation.
Note that both the direct and the adjoint secant condition will, in general, not be exactly
consistent unless the constraint function c(x) is affine.
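To make the update formulas concrete, the following is a minimal numpy sketch of the SR1 update (6) and the TR1 update (7). It already includes the acceptance tests (13) and (14) introduced in Section 2.3 below, and it simply sets the damping parameter $\varepsilon_k = 1$; the safeguarded choice of $\varepsilon_k$ from [14] is not reproduced here:

```python
import numpy as np

def sr1_update(B, s, w, c1=1e-8):
    """SR1 update (6) with eps_k = 1, skipped unless condition (13) holds.

    s = x_{k+1} - x_k,  w = g(x_k + s, lam_k) - g(x_k, lam_k).
    """
    r = w - B @ s
    if abs(r @ s) >= c1 * np.linalg.norm(r) * np.linalg.norm(s):   # test (13)
        B = B + np.outer(r, r) / (r @ s)
    return B

def tr1_update(A, s, sigma, y, gamma, c1=1e-8):
    """TR1 update (7), skipped unless condition (14) holds.

    y     = c(x_k + s) - c(x_k)                            direct secant data
    gamma = g(x_k + s, lam_k + sigma) - g(x_k + s, lam_k)  adjoint secant data
    """
    rho = y - A @ s             # direct secant residual
    tau = gamma - A.T @ sigma   # adjoint secant residual
    if abs(tau @ s) >= c1 * np.linalg.norm(sigma) * np.linalg.norm(rho):  # test (14)
        A = A + np.outer(rho, tau) / (tau @ s)
    return A
```

Note that after an accepted TR1 step, $A_{k+1}s^k = y^k$ holds exactly, while the adjoint secant condition is only satisfied approximately, in line with the remark above.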

2.2 Treatment of inequalities

To handle inequalities, we follow an idea from [3] and formulate in each SQP iteration for given
iterates (x k , λk ) and for given Jacobian and Hessian approximations Ak and Bk the QP
$$\min_{s\in\mathbb{R}^n}\ \tfrac{1}{2}\,s^T B_k s + (m^k)^T s \quad \text{s.t.} \quad A_k s = -c(x^k),\;\; Gs \le b - Gx^k, \qquad (8)$$

where $m^k$ is a modification of the objective gradient, namely

$$m^k := \nabla f(x^k) + \left(\nabla_x c(x^k) - A_k^T\right)\lambda^k.$$

In this definition, the rightmost term can be interpreted as a correction to the exact objective
gradient ∇f (x k ) that is usually employed in SQP methods. The correction term accounts for the
fact that the constraint Jacobian $\nabla_x c(x^k)$ is not exactly known within the QP. As the Jacobian of the inequalities is correctly known in the QP, no correction with respect to the inequalities needs to be added, in contrast to the generic NLP algorithm proposed in [3]. Using adjoint differentiation techniques, as before, the modified gradient $m^k$ can efficiently be computed as $m^k = \nabla_x L(x^k,\lambda^k) - A_k^T\lambda^k$, avoiding the expensive computation of $\nabla_x c(x^k)$.
The solution of Equation (8) yields an optimal step $s^k$ and corresponding equality and inequality multipliers $\lambda^k_{QP}$ and $\mu^k_{QP}$. As the inequality multipliers are needed neither for the computation of the modified gradient nor for the Hessian update, since the linear inequalities do not contribute to the Hessian $\nabla^2_{xx} L$, we discard them altogether in our algorithm. Defining the multiplier step $\sigma^k := \lambda^k_{QP} - \lambda^k$, the full step method would then iterate, again, as

$$x^{k+1} = x^k + s^k, \qquad \lambda^{k+1} = \lambda^k + \sigma^k.$$
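As an illustration of the subproblem (8), the sketch below assembles the modified gradient $m^k = \nabla_x L(x^k,\lambda^k) - A_k^T\lambda^k$ and solves the QP. For simplicity the QP is handed to scipy's general-purpose SLSQP solver rather than to a dedicated convex QP code, and grad_L is an assumed user-supplied callable for $\nabla_x L$; both choices are illustrative assumptions, not the implementation used in the paper:

```python
import numpy as np
from scipy.optimize import minimize

def solve_modified_qp(x, lam, A, B, c_val, G, b, grad_L):
    """Solve QP (8): min_s 0.5 s'B s + m's  s.t.  A s = -c(x),  G s <= b - G x.

    The modified gradient m = grad_L(x, lam) - A^T lam accounts for the
    inexactness of the Jacobian approximation A (Section 2.2).
    """
    m = grad_L(x, lam) - A.T @ lam
    cons = [
        {'type': 'eq', 'fun': lambda s: A @ s + c_val},          # A s = -c(x)
        {'type': 'ineq', 'fun': lambda s: (b - G @ x) - G @ s},  # G s <= b - G x
    ]
    res = minimize(lambda s: 0.5 * s @ B @ s + m @ s, np.zeros(x.size),
                   jac=lambda s: B @ s + m, constraints=cons)
    return res.x  # the step s^k
```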

2.3 Efficient setup of the algorithm

First, we observe that the primal and dual steps s k and σ k can be obtained from the solution of a
slight modification of the original QP Equation (8), namely from the following equivalent QP
$$\min_{s\in\mathbb{R}^n}\ \tfrac{1}{2}\,s^T B_k s + \nabla_x L(x^k,\lambda^k)^T s \quad \text{s.t.} \quad A_k s = -c(x^k),\;\; Gs \le b - Gx^k. \qquad (9)$$
Here, we directly obtain the equality multiplier step σ k from the QP equality multipliers. In the
overall SQP algorithm that we propose, this QP subproblem is solved in each iteration to obtain
s k and σ k .

2.3.1 Factorized nullspace representation

To avoid an expensive factorization of the matrix Ak in each SQP iteration, we directly maintain
a factorized nullspace representation of the approximated derivatives. Hence, we do not update
$A_k$ and $B_k$ directly, but the matrices

$$L_k \in \mathbb{R}^{m\times m},\quad Y_k \in \mathbb{R}^{n\times m},\quad Z_k \in \mathbb{R}^{n\times d},\quad M_k \in \mathbb{R}^{d\times d},\quad D_k \in \mathbb{R}^{m\times n},$$

with $d = n - m$, $Y_k$ orthonormal and $L_k$ lower triangular, that are defined by

$$M_k = Z_k^T B_k Z_k,\qquad A_k = L_k Y_k^T,\qquad Y_k^T Z_k = 0,\qquad D_k = Y_k^T B_k.$$

The damping parameter $\varepsilon_k$ in Equation (6) is chosen such that the approximation of the reduced Hessian, i.e. $M_k$, is positive definite. The corresponding updating formulas for the five matrices can be found in [14].
The factorization of $A_k$ allows us to consider a step vector $s^k$ of the form

$$s^k = Y_k s_y^k + Z_k s_z^k,$$

where the range space step $s_y^k$ is given by

$$s_y^k = -L_k^{-1} c(x^k),$$

and the null space step $s_z^k$ is determined by the solution of a reduced QP, i.e. a QP in a space of dimension $d = n - m$, that is equivalent to the QP (9), namely

$$\min_{s_z\in\mathbb{R}^d}\ \tfrac{1}{2}\,s_z^T M_k s_z + s_z^T Z_k^T\left(g(x^k,\lambda^k) + D_k^T s_y^k\right) \qquad (10)$$
$$\text{s.t.}\quad G Z_k s_z \le b - Gx^k - G Y_k s_y^k. \qquad (11)$$
Note that solving the reduced QPs (10) and (11) yields not only $s_z^k$ but also the Lagrange multipliers $\mu^k_{QP}$ of the inequalities. Then we can compute $\sigma^k$ using the formula

$$\sigma^k = L_k^{-T}\left(Y_k^T G^T \mu^k_{QP} - Y_k^T g(x^k,\lambda^k) - D_k s^k\right). \qquad (12)$$

Once $s^k$ and $\sigma^k$ are computed from the QP solution, the new iterate is given by

$$x^{k+1} = x^k + t_k s^k, \qquad \lambda^{k+1} = \lambda^k + t_k\sigma^k,$$

where $t_k \in (0,1]$ is determined by a line search. The presented approach is summarized in the
following algorithm:

Algorithm 1

Start: Choose an initial point $x^0$ and initial matrices $M_0, L_0, Y_0, Z_0, D_0$.
for $k = 0, 1, \ldots$
1. Compute $s_y^k = -L_k^{-1} c(x^k)$.
2. Compute $s_z^k$ and $\mu^k_{QP}$ by solving the reduced QPs (10) and (11).
3. Set $s^k = Y_k s_y^k + Z_k s_z^k$.
4. Compute $\sigma^k$ from Equation (12).
5. Perform a line search to determine $t_k$ and set
   $x^{k+1} = x^k + t_k s^k$ and $\lambda^{k+1} = \lambda^k + t_k\sigma^k$.
6. Update the factors $M_{k+1}, L_{k+1}, Y_{k+1}, Z_{k+1}, D_{k+1}$.
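The step computation of Algorithm 1 can be sketched directly in terms of the stored factors. The sketch below is simplified in that it assumes no inequality is active in the reduced QP, so that $\mu^k_{QP} = 0$ and (10)–(11) collapse to the linear system $M_k s_z = -Z_k^T(g(x^k,\lambda^k) + D_k^T s_y^k)$; g_val is an assumed input vector holding $g(x^k,\lambda^k)$:

```python
import numpy as np
from scipy.linalg import solve_triangular

def sqp_step(L, Y, Z, M, D, c_val, g_val):
    """Steps 1-4 of Algorithm 1 in the case of no active inequalities (mu_QP = 0).

    L: m x m lower triangular, Y: n x m orthonormal, Z: n x d, M: d x d, D: m x n,
    c_val = c(x^k), g_val = g(x^k, lam^k).
    """
    # Step 1: range-space step  s_y = -L^{-1} c(x^k)
    s_y = -solve_triangular(L, c_val, lower=True)
    # Step 2: reduced QP (10)-(11) without active inequalities reduces to
    #         M s_z = -Z^T (g + D^T s_y)
    s_z = np.linalg.solve(M, -Z.T @ (g_val + D.T @ s_y))
    # Step 3: assemble the primal step
    s = Y @ s_y + Z @ s_z
    # Step 4: dual step via Equation (12) with mu_QP = 0
    sigma = solve_triangular(L.T, -(Y.T @ g_val) - D @ s, lower=False)
    return s, sigma
```

When inequalities are active, step 2 requires an actual QP solve and the term $Y_k^T G^T\mu^k_{QP}$ re-enters formula (12).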

To keep the quasi-Newton approximations well defined, we perform the update (6) in step 6 only if the inequality

$$|(w^k - B_k s^k, s^k)| \ge c_1\,\|w^k - B_k s^k\|\,\|s^k\| \qquad (13)$$

holds for the SR1 update, with $c_1 \in (0,1)$, where here and throughout we denote by $(x,y)$ the scalar product $x^T y$. The update (7) is used only if the inequality

$$|(\gamma^k - A_k^T\sigma^k, s^k)| \ge c_1\,\|\sigma^k\|\,\|\rho^k\| \qquad (14)$$

is valid, where $\rho^k = y^k - A_k s^k$ and

$$(\gamma^k - A_k^T\sigma^k, s^k) = \left((\nabla_x c(x^k + s^k) - A_k^T)\sigma^k,\, s^k\right) \approx (\sigma^k, \rho^k).$$

3. Local convergence with inexact derivatives

We first investigate whether the active set in the QP subproblems is stable in the presence of inexact Jacobians, and under which circumstances we can ensure local convergence.

Theorem 3.1 (Stability of QP active set near NLP solution) Assume that vectors $x^*, \lambda^*$ are given and
(i) at $x^*$ LICQ holds, and there exists a $\mu^*$ such that $(x^*,\lambda^*,\mu^*)$ satisfies the KKT conditions (3a)–(3e);
(ii) at $x^*$ strict complementarity holds, i.e. the multipliers $\mu_a^*$ of the active inequalities $G_a x^* = b_a$ satisfy $\mu_a^* > 0$, where $G_a$ is a matrix consisting of all rows of $G$ that correspond to the active inequalities in $x^*$.

Then for any constant $\alpha > 0$ there exist a neighbourhood $\mathcal{N}$ of $(x^*,\lambda^*)$ and a constant $C > 0$ such that for all $(x,\lambda) \in \mathcal{N}$ the following statement holds. For any two matrices $A, B$ with $B$ positive semidefinite on the null space of $A$, and such that the matrix

$$J(A,B) := \begin{bmatrix} B & A^T & G_a^T\\ A & 0 & 0\\ G_a & 0 & 0\end{bmatrix} \qquad (15)$$

is invertible and satisfies $\|J(A,B)^{-1}\| \le \alpha$, the QP (9) has a unique solution $(s,\sigma,\mu)$ that satisfies

$$\|(s,\sigma,\mu-\mu^*)\| \le C\left(\|x - x^*\| + \|\lambda - \lambda^*\|\right). \qquad (16)$$

This QP solution has the same active set as the NLP solution $x^*$.

Proof The linear system in the variables $(s,\sigma,\mu_{act})$ given by

$$J(A,B)\begin{bmatrix} s\\ \sigma\\ \mu_{act}-\mu^*_{act}\end{bmatrix} = \begin{bmatrix} Bs + A^T\sigma + G_{act}^T(\mu_{act}-\mu^*_{act})\\ As\\ G_{act}\,s\end{bmatrix} = \begin{bmatrix} -\nabla_x L(x,\lambda) - G_{act}^T\mu^*_{act}\\ -c(x)\\ b_{act} - G_{act}\,x\end{bmatrix}$$

is uniquely solvable if $J(A,B)$ is invertible. Moreover, for $(x,\lambda) = (x^*,\lambda^*)$ the solution is given by $(0,0,\mu^*_{act})$ because $\nabla_x L(x^*,\lambda^*) + G_{act}^T\mu^*_{act} = 0$, $c(x^*) = 0$, $b_{act} - G_{act}x^* = 0$. As the norm of the inverse of $J(A,B)$ is bounded, the size of $(s,\sigma,\mu_{act}-\mu^*_{act})$ is bounded by $\alpha$ times the norm of the change in the right-hand side, and because the right-hand side depends differentiably on $(x,\lambda)$ we find $C > 0$ such that $\|(s,\sigma,\mu_{act}) - (0,0,\mu^*_{act})\| \le C(\|x-x^*\| + \|\lambda-\lambda^*\|)$. When $\mu_{act}$ is augmented to the vector $\mu$ by suitable zeros at the positions of the nonactive constraints, we obtain Equation (16). To show that $(s,\sigma,\mu)$ is indeed the unique solution of the QP (9), we first deduce from the fact that $B$ is assumed to be positive semidefinite on the null space of $A$ that the QP is convex. Moreover, $\mu_{act} > 0$ if $\mathcal{N}$ is chosen sufficiently small, due to strict complementarity in the NLP solution. Likewise, for the inactive inequalities, the fact that $G_{inact}\,x - b_{inact} < 0$ implies that $G_{inact}(x+s) - b_{inact} < 0$ for $\mathcal{N}$ sufficiently small. Therefore, $(s,\sigma,\mu)$ satisfies the KKT conditions of the QP, namely

$$Bs + \nabla_x L(x,\lambda) + A^T\sigma + G^T\mu = 0,$$
$$c(x) + As = 0,$$
$$G(x+s) \le b,$$
$$\mu \ge 0,$$
$$\mu_i\,(G(x+s) - b)_i = 0,\quad i = 1,2,\ldots,p,$$

with the same active set as the NLP solution and with strict complementarity. Thus $s$ is a global minimizer of the convex QP. This minimizer is unique if the Hessian of the QP, $B$, is positive definite on the null space of the Jacobian of the active constraints, $\begin{bmatrix} A\\ G_{act}\end{bmatrix}$, i.e. if $z^T B z > 0$ holds for any vector $z \in \mathbb{R}^n$, $z \ne 0$, such that $\begin{bmatrix} A\\ G_{act}\end{bmatrix} z = 0$. This is indeed the case, as $z^T B z \ge 0$ due to positive semidefiniteness on the null space of $A$, and $z^T B z \ne 0$ must hold because otherwise $J(A,B)$ would be singular, due to $J(A,B)\begin{bmatrix} z\\ 0\\ 0\end{bmatrix} = 0$. $\square$
3.1 Local convergence for fixed active set

The fact that the active set remains stable near a solution allows us to establish conditions for local
convergence of our SQP method with inexact Jacobians. In this subsection, we will assume that
the active set is fixed, and we will only consider the full step method with tk = 1 in all iterations.
We observe that the iterates $(x^k,\lambda^k)$ for a fixed active set can be interpreted as iterates of an inexact Newton method towards the solution $y^* = (x^*,\lambda^*)$ of the nonlinear system

$$F(y) = 0 \quad\text{with}\quad y := \begin{bmatrix} x\\ \lambda\end{bmatrix} \quad\text{and}\quad F(y) := \begin{bmatrix} N^T\nabla_x L(x,\lambda)\\ c(x)\\ G_a x - b_a\end{bmatrix}. \qquad (17)$$

Here, $N \in \mathbb{R}^{n\times q}$ is a matrix of $q := n - p_a$ orthonormal column vectors such that $G_a N = 0$. Introducing this matrix allows us to disregard the active inequality multipliers $\mu_a$ in the iteration. For given $y^k$, we iterate

$$y^{k+1} = y^k - J_k^{-1} F(y^k) \quad\text{with}\quad J_k = \begin{bmatrix} N^T B_k & N^T A_k^T\\ A_k & 0\\ G_a & 0\end{bmatrix}. \qquad (18)$$

The matrix $J_k$ shall be a suitable approximation of the matrix

$$\frac{\partial F}{\partial y}(y^k) = \begin{bmatrix} N^T\nabla^2_{xx} L(x^k,\lambda^k) & N^T\nabla_x c(x^k)\\ \nabla_x c(x^k)^T & 0\\ G_a & 0\end{bmatrix}. \qquad (19)$$
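The fixed-active-set iteration (18) is an ordinary (inexact) Newton method and can be sketched directly; F_fun is an assumed callable returning the residual (17), and B, A, N, Ga are the matrices introduced above:

```python
import numpy as np

def inexact_newton_step(y, F_fun, B, A, N, Ga):
    """One iteration of (18): y+ = y - J_k^{-1} F(y).

    J_k = [[N^T B, N^T A^T], [A, 0], [Ga, 0]] is square because q = n - p_a,
    so its row dimension q + m + p_a equals its column dimension n + m.
    """
    m = A.shape[0]
    J = np.vstack([
        np.hstack([N.T @ B, N.T @ A.T]),              # q   x (n + m)
        np.hstack([A, np.zeros((m, m))]),             # m   x (n + m)
        np.hstack([Ga, np.zeros((Ga.shape[0], m))]),  # p_a x (n + m)
    ])
    return y - np.linalg.solve(J, F_fun(y))
```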

The following lemma is a variant of standard results that have similarly been formulated, e.g. in [2,8].

Lemma 3.2 (Local convergence) Assume that $F: D \to \mathbb{R}^{n_y}$, $D \subset \mathbb{R}^{n_y}$ open, is continuously differentiable and consider the sequence

$$y^{k+1} = y^k + \Delta y^k, \qquad \Delta y^k = -J_k^{-1} F(y^k),$$

starting with some $y^0 \in D$. Let us make the following assumptions:

(i) The sequence of invertible matrices $J_k$ is uniformly bounded and has uniformly bounded inverses.
(ii) There exists a $\kappa < 1$ such that for all $k \in \mathbb{N}$ it can be guaranteed that

$$\left\| J_{k+1}^{-1}\left(J_k - \frac{\partial F}{\partial y}(y^k + t\,\Delta y^k)\right)\Delta y^k \right\| \le \kappa\,\|\Delta y^k\|, \quad \forall t \in [0,1]. \qquad (20)$$

(iii) The set

$$B_{\|\Delta y^0\|/(1-\kappa)}(y^0) = \left\{ y \in \mathbb{R}^{n_y} \,\Big|\, \|y - y^0\| \le \frac{\|\Delta y^0\|}{1-\kappa} \right\}$$

is contained in $D$.

Then the sequence $(y^k)_{k\in\mathbb{N}}$ remains in $D$ and converges towards a point $y^* \in B$ satisfying $F(y^*) = 0$. Moreover, if

$$\lim_{k\to\infty} \frac{\left\| J_{k+1}^{-1}\left(J_k - (\partial F/\partial y)(y^*)\right)\Delta y^k \right\|}{\|\Delta y^k\|} = 0, \qquad (21)$$

the convergence rate is q-superlinear.
Proof We start by observing that

$$\Delta y^{k+1} = -J_{k+1}^{-1} F(y^{k+1}) = -J_{k+1}^{-1}\left(F(y^{k+1}) - F(y^k) + F(y^k)\right)$$
$$= -J_{k+1}^{-1}\left(\int_0^1 \frac{\partial F}{\partial y}(y^k + t\,\Delta y^k)\,\Delta y^k\,\mathrm{d}t + F(y^k)\right)$$
$$= -\int_0^1 J_{k+1}^{-1}\left(\frac{\partial F}{\partial y}(y^k + t\,\Delta y^k)\,\Delta y^k + F(y^k)\right)\mathrm{d}t.$$

From Assumption (20), we conclude that $\|\Delta y^{k+1}\| \le \kappa\,\|\Delta y^k\|$ and, therefore, for arbitrary $k \ge m \ge 0$,

$$\|y^k - y^m\| \le \frac{\|\Delta y^m\|}{1-\kappa},$$

i.e. $y^k$ is a Cauchy sequence that remains in the closed set $B$ and thus converges towards a point $y^* \in B$. Because of Assumption (20) and the uniform boundedness of $J_k$, we also have

$$\lim_{k\to\infty} F(y^k) = \lim_{k\to\infty} -J_k\,\Delta y^k = 0. \qquad \square$$

3.2 Local convergence and stability of an active set

As a consequence of Theorem 3.1 and Lemma 3.2, we obtain the following result.

Theorem 3.3 (Stability of active set, local convergence) Let the vectors $x^*, \lambda^*$ be given and assume that:
(i) at $x^*$ the LICQ holds, and there exists a $\mu^*$ such that $(x^*,\lambda^*,\mu^*)$ satisfies the KKT conditions (3a)–(3e);
(ii) at $x^*$ strict complementarity as defined in Theorem 3.1 holds;
(iii) there are two sequences of uniformly bounded matrices $(A_k), (B_k)$, each $B_k$ positive semidefinite on the null space of $A_k$, such that the sequence of matrices $(J_k)$ defined in Equation (18) is uniformly bounded and invertible with uniformly bounded inverses;
(iv) there is a sequence $y^k := (x^k,\lambda^k)$ generated according to

$$y^{k+1} = y^k + \Delta y^k \quad\text{with}\quad \Delta y^k := (s^k, \sigma^k),$$

where $s^k$ is the solution of the QP

$$\min_{s\in\mathbb{R}^n}\ \tfrac{1}{2}\,s^T B_k s + \nabla_x L(x^k,\lambda^k)^T s \quad \text{s.t.}\quad A_k s = -c(x^k),\;\; Gs \le b - Gx^k \qquad (22)$$

and $\sigma^k$ is the multiplier vector of the equality constraints in the QP solution;
(v) there exists a $\kappa < 1$ such that for all $k \in \mathbb{N}$ it can be guaranteed that

$$\left\| J_{k+1}^{-1}\left(J_k - \frac{\partial F}{\partial y}(y^k + t\,\Delta y^k)\right)\Delta y^k \right\| \le \kappa\,\|\Delta y^k\|, \quad \forall t \in [0,1]. \qquad (23)$$

Then there exists a neighbourhood $\bar{\mathcal{N}}$ of $(x^*,\lambda^*)$ such that for all initial guesses $(x^0,\lambda^0) \in \bar{\mathcal{N}}$ the sequence $(x^k,\lambda^k)$ converges q-linearly towards $(x^*,\lambda^*)$ with rate $\kappa$, and the solution of each QP (22) has the same active set as $x^*$.
Proof We start by noting that the boundedness of $J_k$ also implies boundedness of $J(A_k,B_k)$ as defined in Equation (15) for Theorem 3.1. Conversely, the inverse $J(A_k,B_k)^{-1}$ is also defined and bounded, as it can easily be shown that

$$J(A_k,B_k)^{-1} = \begin{bmatrix} J_k^{-1} & 0\\ -(G_a G_a^T)^{-1}\left(G_a B_k \,\middle|\, G_a A_k^T\right) J_k^{-1} & (G_a G_a^T)^{-1}\end{bmatrix}\begin{bmatrix} N^T & 0 & 0\\ 0 & I_m & 0\\ 0 & 0 & I_{p_a}\\ G_a & 0 & 0\end{bmatrix},$$

which is uniformly bounded due to the uniform boundedness of $J_k^{-1}$, $A_k$, $B_k$ and the invertibility of the constant matrix $G_a G_a^T$, which follows from the LICQ assumption. Therefore, we can apply Theorem 3.1.
We continue by choosing a neighbourhood $\mathcal{N}$ according to Theorem 3.1, such that the stability of the active set in the QP (22) is guaranteed for the given bounds on $J_k^{-1}$ whenever $(x^k,\lambda^k) \in \mathcal{N}$. Then, the QP solution is given by

$$\begin{bmatrix} s^k\\ \sigma^k\end{bmatrix} = -J_k^{-1}\begin{bmatrix} N^T\nabla_x L(x^k,\lambda^k)\\ c(x^k)\\ G_a x^k - b_a\end{bmatrix}.$$

It remains to determine the neighbourhood $\bar{\mathcal{N}}$ of $(x^*,\lambda^*)$ such that we can guarantee that the ball

$$B_{\|\Delta y^0\|/(1-\kappa)}(y^0) := \left\{ y \in \mathbb{R}^{n+m} \,\Big|\, \|y - y^0\| \le \frac{\|\Delta y^0\|}{1-\kappa} \right\}$$

is contained in the set $\mathcal{N}$ for all $(x^0,\lambda^0) \in \bar{\mathcal{N}}$. Because $\mathcal{N}$ is a neighbourhood of $y^* = (x^*,\lambda^*)$, there is an $\epsilon > 0$ such that $B_\epsilon(y^*) \subset \mathcal{N}$. Because $F(y^*) = 0$, the first step $\Delta y^0 = -J_0^{-1}F(y^0)$ can be made arbitrarily small if the distance $\|y^0 - y^*\|$ is small enough, due to the differentiability of $F$ and the boundedness of $J_0^{-1}$. Thus, for the given $\epsilon > 0$ we can find a $\delta$ such that $\|y^0 - y^*\| + \|\Delta y^0\|/(1-\kappa) \le \epsilon$ whenever $\|y^0 - y^*\| \le \delta$. Hence, for all $y^0 \in B_\delta(y^*)$ we can guarantee that $B_{\|\Delta y^0\|/(1-\kappa)}(y^0) \subset B_\epsilon(y^*) \subset \mathcal{N}$. $\square$

Theorem 3.4 (Superlinear convergence) If the equality

$$\lim_{k\to\infty} \begin{bmatrix} N^T B_k N & N^T A_k^T\\ A_k N & 0\end{bmatrix} = \begin{bmatrix} N^T\nabla^2_{xx} L(x^*,\lambda^*) N & N^T\nabla_x c(x^*)\\ \nabla_x c(x^*)^T N & 0\end{bmatrix} \qquad (24)$$

holds in addition to the assumptions of Theorem 3.3, then the convergence rate is q-superlinear.

Proof We will show that Equation (21) holds, so that q-superlinear convergence follows from Lemma 3.2. First, we observe that the matrices

$$M_k := \begin{bmatrix} N^T B_k N & N^T A_k^T\\ A_k N & 0\end{bmatrix} = \begin{bmatrix} I_{q+m} & 0\end{bmatrix} J_k \begin{bmatrix} N & 0\\ 0 & I_m\end{bmatrix}$$

have inverses

$$M_k^{-1} = \begin{bmatrix} N^T & 0\\ 0 & I_m\end{bmatrix} J_k^{-1} \begin{bmatrix} I_{q+m}\\ 0\end{bmatrix}$$

that are uniformly bounded due to Assumption (iii) in Theorem 3.3. Defining the shorthand

$$M_* := \begin{bmatrix} N^T\nabla^2_{xx} L(x^*,\lambda^*) N & N^T\nabla_x c(x^*)\\ \nabla_x c(x^*)^T N & 0\end{bmatrix},$$

from Equation (24) we therefore obtain $\lim_{k\to\infty} M_{k+1}^{-1}(M_k - M_*) = 0$. Furthermore, we note that for $k \ge 1$ all iterations are performed in the null space of $G_a$, i.e.

$$\begin{bmatrix} N & 0\\ 0 & I_m\end{bmatrix}\begin{bmatrix} N^T & 0\\ 0 & I_m\end{bmatrix}\Delta y^k = \Delta y^k \quad\text{and}\quad J_{k+1}^{-1}\begin{bmatrix} I_{q+m}\\ 0\end{bmatrix} = \begin{bmatrix} N & 0\\ 0 & I_m\end{bmatrix} M_{k+1}^{-1}.$$

Because of

$$J_{k+1}^{-1}\left(J_k - \frac{\partial F}{\partial y}(y^*)\right)\begin{bmatrix} N & 0\\ 0 & I_m\end{bmatrix} = \begin{bmatrix} N & 0\\ 0 & I_m\end{bmatrix} M_{k+1}^{-1}(M_k - M_*),$$

we obtain

$$\frac{\left\| J_{k+1}^{-1}\left(J_k - (\partial F/\partial y)(y^*)\right)\Delta y^k \right\|}{\|\Delta y^k\|} \le \left\| M_{k+1}^{-1}(M_k - M_*) \right\| \to 0. \qquad \square$$
Remark The convergence theory for inexact, inequality constrained SQP methods in this section
is also, without quasi-Newton schemes, a useful tool for efficient algorithms. In particular, in the
context of nonlinear model predictive control (NMPC), where a series of neighbouring problems
is solved, it is often possible to keep the Hessian and Jacobian just constant for different prob-
lems [25]. A detailed analysis of the use of adjoint-based SQP methods for nonlinear optimal
control and NMPC has been performed in [24], where detailed proofs of very similar results as
in this section are given. Recent work on adjoint-based NMPC [26] shows that the adjoint-based
methods can efficiently be coupled with online active set strategies for QP solution as presented
in [10].

4. Local convergence of STR1 update

The local convergence result for the STR1 update presented in this section is mainly based on ideas already used in [7]. Since we know that the active set will finally be fixed, we are interested in matrix convergence only on a subspace. We again use $q := n - p_a$ and the orthonormal column matrix $N \in \mathbb{R}^{n\times q}$ that spans the null space of $G_a$, i.e. $N^T N = I_q$ and $G_a N = 0$. We do not need full steps $t_k = 1$ for the matrix convergence results; therefore, we define $s^k := x^{k+1} - x^k$ in this section. For the convergence proof of this section, we will use the following assumptions:

(AS1) The Lagrangian (2) is twice continuously differentiable.
(AS2) The function $\nabla^2_{xx} L(x,\lambda)$ is Lipschitz continuous, i.e. there exists a constant $c_2$ such that for all $(x_1,\lambda_1), (x_2,\lambda_2) \in \mathbb{R}^{n+m}$
$$\|\nabla^2_{xx} L(x_1,\lambda_1) - \nabla^2_{xx} L(x_2,\lambda_2)\| \le c_2\,\|(x_1,\lambda_1) - (x_2,\lambda_2)\|.$$
(AS3) The function $\nabla_x c(x)$ is Lipschitz continuous, i.e. there exists a constant $c_3$ such that for all $x_1, x_2 \in \mathbb{R}^n$
$$\|\nabla_x c(x_1) - \nabla_x c(x_2)\| \le c_3\,\|x_1 - x_2\|.$$
(AS4) The iterates $\{(x^k,\lambda^k)\}$ generated by Algorithm 1 converge to a limit $(x^*,\lambda^*)$.
(AS5) There is a $k_0$ such that the active set is stable for all iterates $k \ge k_0$.
(AS6) The sequence $\{s^k\}$ is uniformly linearly independent in the null space of $G_a$, i.e. there exist $c_4 > 0$ and $i \ge q$ such that, for each $k \ge k_0$, one can find $q$ distinct indices $k_j$ with $k \le k_1 < \cdots < k_q \le k + i$ and $s_N^{k_j} \in \mathbb{R}^q$, $s^{k_j} = N s_N^{k_j}$, $j = 1,\ldots,q$, such that the minimum singular value $\sigma_{\min}(S_N^k)$ of the matrix
$$S_N^k := \left[ \frac{s_N^{k_1}}{\|s_N^{k_1}\|}, \cdots, \frac{s_N^{k_q}}{\|s_N^{k_q}\|} \right]$$
is bounded below by $c_4$, i.e. $\sigma_{\min}(S_N^k) \ge c_4$.
(AS7) For all $k \ge k_2$, one has $|\varepsilon_k| = 1$ in Equation (6).
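Assumption (AS6) can be monitored numerically along the iteration: project the selected steps onto the null space of $G_a$, normalize them, and check the smallest singular value of the resulting matrix $S_N^k$. A minimal sketch, assuming steps is a list holding the $q$ selected steps $s^{k_j}$:

```python
import numpy as np

def as6_sigma_min(steps, N):
    """Smallest singular value of S_N^k from assumption (AS6).

    steps: list of q steps s^{k_j} lying in the null space of G_a,
    N:     n x q orthonormal basis of that null space, so s = N s_N.
    """
    cols = []
    for s in steps:
        s_N = N.T @ s                       # reduced coordinates of the step
        cols.append(s_N / np.linalg.norm(s_N))
    return np.linalg.svd(np.column_stack(cols), compute_uv=False).min()
```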
Assumptions (AS1)–(AS4) are moderate and widely used in convergence proofs, where (AS2) and (AS3) are assumed to hold globally. Since we examine only local superlinear convergence, we ensure the convergence of the iterates by (AS4). Assumption (AS5) holds due to the results presented in the last section; assumption (AS6) is rather strong. Near a solution, one will have $|\varepsilon_k| = 1$. Therefore, for a convergent sequence of iterates $\{(x^k,\lambda^k)\}$, assumption (AS7) should be fulfilled for $k_2$ large enough. Then, we can prove the following statement.

Theorem 4.1 Assume that assumptions (AS1)–(AS7) hold, where $\{(x^k,\lambda^k)\}$ is a sequence of iterates generated by Algorithm 1. Furthermore, let Equations (13) and (14) hold for every iteration. Then Equation (24) holds.

This convergence result, together with the local convergence proved in Section 3, yields the local superlinear convergence of the iterates generated by Algorithm 1. The proof of Theorem 4.1 consists of three parts. First, we will show that $A_k N$ converges to $\nabla_x c(x^*)^T N$; then we will show that $B_k N$ converges to $\nabla^2_{xx} L(x^*,\lambda^*) N$. In a third step, both results are combined to prove the stated convergence of the approximate KKT matrix.

4.1 Convergence of the matrices Ak

As a first step, we consider the approximations $\{A_k\}$ of the Jacobians. Here, we will employ ideas used in the proof of Lemma 1 in [7]. Then, it is possible to show the following result.

Lemma 4.2 Assume that assumptions (AS1) and (AS3) hold. Let $\{(x^k,\lambda^k)\}$ be a sequence of iterates generated by Algorithm 1 where Equation (14) holds for every iteration. Then

$$\|y^k - A_i s^k\| \le \frac{c_3}{c_1}\left(\frac{2}{c_1} + 1\right)^{i-k}\eta_{i,k}\,\|s^k\| \qquad (25)$$

for all $i \ge k+1$, where

$$\eta_{i,k} := \max\{\|x^p - x^s\| \,:\, k \le s \le p \le i\}. \qquad (26)$$

Proof The proof of this lemma is by induction. For $i = k+1$ one has $y^k - A_{k+1} s^k = 0$ because of Equation (7). Now we assume that Equation (25) holds for all $j = k+1,\ldots,i$. Using Equation (25) and the definition of $\rho^k$, we obtain

$$\|y^k - A_{i+1} s^k\| \le \|y^k - A_i s^k\| + \frac{|(\tau^i, s^k)|}{|(\tau^i, s^i)|}\,\|\rho^i\|,$$

where $\tau^i = \gamma^i - A_i^T\sigma^i$. Exploiting Equation (25) we can derive that

$$|(\tau^i, s^k)| \le |(\gamma^i, s^k) - (\sigma^i, y^k)| + |(\sigma^i, y^k) - (\sigma^i, A_i s^k)| \le |(\gamma^i, s^k) - (\sigma^i, y^k)| + \frac{c_3}{c_1}\left(\frac{2}{c_1}+1\right)^{i-k}\eta_{i,k}\,\|\sigma^i\|\,\|s^k\|.$$

Furthermore, it follows with the mean-value theorem that

$$|(\gamma^i, s^k) - (\sigma^i, y^k)| = \left| (\sigma^i)^T\nabla_x c(x^i + s^i)^T s^k - (\sigma^i)^T\left(\int_0^1 \nabla_x c(x^k + t s^k)\,\mathrm{d}t\right)^T s^k \right| \le c_3\,\eta_{i+1,k}\,\|\sigma^i\|\,\|s^k\|.$$

From Equation (14) and $\eta_{i,k} \le \eta_{i+1,k}$, we obtain

$$\|y^k - A_{i+1} s^k\| \le \frac{c_3}{c_1}\left(\frac{2}{c_1}+1\right)^{i-k}\eta_{i,k}\,\|s^k\| + \left(c_3\,\eta_{i+1,k} + \frac{c_3}{c_1}\left(\frac{2}{c_1}+1\right)^{i-k}\eta_{i,k}\right)\frac{\|\sigma^i\|\,\|s^k\|}{|(\tau^i, s^i)|}\,\|\rho^i\| \le \frac{c_3}{c_1}\left(\frac{2}{c_1}+1\right)^{i+1-k}\eta_{i+1,k}\,\|s^k\|. \qquad \square$$


Using the uniform linear independence of $\{s^k\}$ and employing ideas already used for proving Theorem 2 in [7], we can show the following convergence result for the matrices $\{A_k\}$:

Theorem 4.3 Assume that assumptions (AS1), (AS3)–(AS6) hold. Let $\{x^k\}$ be a sequence of iterates generated by Algorithm 1, where Equation (14) holds for every iteration. Then

$$\lim_{k\to\infty} \|A_k N - \nabla_x c(x^*)^T N\| = 0.$$

Proof From the inequality $\|x^p - x^s\| \le \|x^p - x^*\| + \|x^s - x^*\|$ one obtains

$$\eta_{k+i+1,k} \le 2\,\nu_k \quad\text{for}\quad \nu_k := \max\{\|x^s - x^*\| \,:\, k \le s \le k+i+1\} \qquad (27)$$

for $i \ge q$ because of definition (26). Furthermore, we have

$$\|y^j - \nabla_x c(x^*)^T s^j\| = \left\| \left(\int_0^1 \nabla_x c(x^j + t s^j)\,\mathrm{d}t - \nabla_x c(x^*)\right)^T s^j \right\| \le c_3\,\nu_k\,\|s^j\| \qquad (28)$$

for any $k \le j \le k+i$. Moreover, we have for any such $j$ that

$$\|y^j - A_{k+i+1} s^j\| \le \frac{2c_3}{c_1}\left(\frac{2}{c_1}+1\right)^{i}\nu_k\,\|s^j\| \qquad (29)$$

because of Lemma 4.2 and Equation (27). Combining Equations (28), (29) and the triangle inequality yields for any $k \le j \le k+i$ that

$$\left\| (A_{k+i+1} - \nabla_x c(x^*)^T)\,\frac{s^j}{\|s^j\|} \right\| \le \left(\frac{2c_3}{c_1}\left(\frac{2}{c_1}+1\right)^{i} + c_3\right)\nu_k, \qquad (30)$$

which holds in particular for $j = k_1,\ldots,k_q$. Using assumption (AS6), which guarantees the existence of the inverse $(S_N^k)^{-1}$ with $\|(S_N^k)^{-1}\| \le 1/c_4$, and Equation (30), it follows that

$$\|(A_{k+i+1} - \nabla_x c(x^*)^T)N\| \le \frac{1}{c_4}\,\|(A_{k+i+1} - \nabla_x c(x^*)^T)N S_N^k\| \le c_5\,\nu_k$$

with

$$c_5 = \frac{c_3}{c_4}\left(\frac{2}{c_1}\left(\frac{2}{c_1}+1\right)^{i} + 1\right)\sqrt{q}.$$

Since assumption (AS4) implies that $\nu_k$ tends to zero, the assertion is proved. $\square$
4.2 Convergence of the matrices Bk

Turning to the approximations $\{B_k\}$ of the Hessians, we prove a result that is very similar to Lemma 1 of [7].

Lemma 4.4 Assume that assumptions (AS1) and (AS2) hold. Let $\{(x^k,\lambda^k)\}$ be a sequence of iterates generated by Algorithm 1 where Equation (13) holds for every iteration. Then

$$\|w^k - B_i s^k\| \le \frac{c_2}{c_1}\left(\frac{2}{c_1}+1\right)^{i-k-2}\tilde{\eta}_{i,k}\,\|s^k\|$$

for all $i \ge k+1 > k_2$, where

$$\tilde{\eta}_{i,k} := \max\{\|(x^p,\lambda^p) - (x^s,\lambda^s)\| \,:\, k \le s \le p \le i\}. \qquad (31)$$

Proof Observe that

$$w^k = H_k s^k \quad\text{with}\quad H_k = \int_0^1 \nabla^2_{xx} L(x^k + t s^k, \lambda^k)\,\mathrm{d}t.$$

This yields $|(s^k)^T(H_k - H_j)s^j| \le c_2\,\tilde{\eta}_{k+1,j}\,\|s^j\|\,\|s^k\|$. Using this estimate and

$$\|w^k - B_{i+1} s^k\| \le \|w^k - B_i s^k\| + \frac{|(r^i, s^k)|}{|(r^i, s^i)|}\,\|r^i\|$$

with $r^i = w^i - B_i s^i$, the proof of Lemma 4.4 is identical to the corresponding proof in [7]. $\square$

Next, we prove a result similar to Theorem 2 of [7].

Theorem 4.5 Assume that assumptions (AS1), (AS2), (AS4)–(AS7) hold. Let $\{(x^k,\lambda^k)\}$ be a sequence of iterates generated by Algorithm 1, where Equation (13) holds for every iteration. Then

$$\lim_{k\to\infty} \|B_k N - \nabla^2_{xx} L(x^*,\lambda^*) N\| = 0.$$

Proof From the inequality

$$\|(x^p,\lambda^p) - (x^s,\lambda^s)\| \le \|(x^p,\lambda^p) - (x^*,\lambda^*)\| + \|(x^s,\lambda^s) - (x^*,\lambda^*)\|$$

one obtains

$$\tilde{\eta}_{k+i+1,k} \le 2\,\tilde{\nu}_k \quad\text{with}\quad \tilde{\nu}_k := \max\{\|(x^s,\lambda^s) - (x^*,\lambda^*)\| \,:\, k \le s \le k+i+1\} \qquad (32)$$

for $i \ge q$ because of the definition (31). Furthermore, we have

$$\|w^j - \nabla^2_{xx} L(x^*,\lambda^*)s^j\| = \|(H_j - \nabla^2_{xx} L(x^*,\lambda^*))s^j\| \le c_2\,\tilde{\nu}_k\,\|s^j\| \qquad (33)$$

for any $k \le j \le k+i$. Using Lemma 4.4 and Equation (32), we have for any such $j$ that

$$\|w^j - B_{k+i+1} s^j\| \le \frac{2c_2}{c_1}\left(\frac{2}{c_1}+1\right)^{i}\tilde{\nu}_k\,\|s^j\|. \qquad (34)$$

Combining Equations (33), (34) and the triangle inequality yields for any $k \le j \le k+i$

$$\left\| (B_{k+i+1} - \nabla^2_{xx} L(x^*,\lambda^*))\,\frac{s^j}{\|s^j\|} \right\| \le \left(\frac{2c_2}{c_1}\left(\frac{2}{c_1}+1\right)^{i} + c_2\right)\tilde{\nu}_k, \qquad (35)$$

which holds in particular for $j = k_1,\ldots,k_q$. Using assumption (AS6), which guarantees the existence of the inverse $(S_N^k)^{-1}$ with $\|(S_N^k)^{-1}\| \le 1/c_4$, and Equation (35), it follows that

$$\|(B_{k+i+1} - \nabla^2_{xx} L(x^*,\lambda^*))N\| \le \frac{1}{c_4}\,\|(B_{k+i+1} - \nabla^2_{xx} L(x^*,\lambda^*))N S_N^k\| \le c_6\,\tilde{\nu}_k$$

with

$$c_6 = \frac{c_2}{c_4}\left(\frac{2}{c_1}\left(\frac{2}{c_1}+1\right)^{i} + 1\right)\sqrt{q}.$$

Since assumption (AS4) implies that $\tilde{\nu}_k$ tends to zero, the assertion is proved. $\square$

4.3 Convergence of the matrices Mk

Now everything is prepared to prove the main result of this section, which is restated for completeness.

Theorem 4.6 Assume that assumptions (AS1)–(AS7) hold, where $\{(x^k,\lambda^k)\}$ is a sequence of iterates generated by Algorithm 1. Furthermore, let Equations (13) and (14) hold for every iteration. Then Equation (24) holds, i.e. the following identity is valid:

$$\lim_{k\to\infty}\left\| \begin{bmatrix} N^T B_k N & N^T A_k^T\\ A_k N & 0\end{bmatrix} - \begin{bmatrix} N^T\nabla^2_{xx} L(x^*,\lambda^*) N & N^T\nabla_x c(x^*)\\ \nabla_x c(x^*)^T N & 0\end{bmatrix} \right\| = 0.$$

Proof Using Theorems 4.3 and 4.5, one obtains

$$\left\| \begin{bmatrix} N^T B_k N & N^T A_k^T\\ A_k N & 0\end{bmatrix} - \begin{bmatrix} N^T\nabla^2_{xx} L(x^*,\lambda^*) N & N^T\nabla_x c(x^*)\\ \nabla_x c(x^*)^T N & 0\end{bmatrix} \right\|$$
$$\le \left\| \begin{bmatrix} N^T B_k N & 0\\ 0 & 0\end{bmatrix} - \begin{bmatrix} N^T\nabla^2_{xx} L(x^*,\lambda^*) N & 0\\ 0 & 0\end{bmatrix} \right\| + \left\| \begin{bmatrix} 0 & N^T A_k^T\\ A_k N & 0\end{bmatrix} - \begin{bmatrix} 0 & N^T\nabla_x c(x^*)\\ \nabla_x c(x^*)^T N & 0\end{bmatrix} \right\|$$
$$\le \|N^T B_k N - N^T\nabla^2_{xx} L(x^*,\lambda^*) N\| + \left\| \begin{bmatrix} 0 & N^T A_k^T\\ A_k N & 0\end{bmatrix} - \begin{bmatrix} 0 & N^T\nabla_x c(x^*)\\ \nabla_x c(x^*)^T N & 0\end{bmatrix} \right\| \to 0$$

for $k \to \infty$. $\square$

Hence, under quite usual assumptions it can be shown that the matrices $M_k$ converge to the exact derivative information at the solution, i.e. $M_*$. Summing up the results obtained in the last two sections, one now has the following situation. Provided the assumptions of Theorem 3.3 hold, the iterates generated by Algorithm 1 converge linearly if the initial guess is contained in a certain neighbourhood of the solution. If, in addition, the assumptions (AS1)–(AS7) made at the beginning of this section hold, then the approximations $A_k$ and $B_k$ converge – on the null space of the active constraints – to the true derivative matrices at the solution, $\nabla_x c(x^*)$ and $\nabla^2_{xx} L(x^*,\lambda^*)$. This convergence is expressed in the above Theorem 4.1 and in Equation (24), respectively. Now, Theorem 3.4 ensures q-superlinear convergence of the iterates. Thus, the following theorem holds.

Theorem 4.7 (Q-superlinear convergence of the SQP algorithm) If the assumptions of Theorem 3.3 hold, in particular, full steps $t_k = 1$ are taken, if the assumptions (AS1)–(AS7) hold and if the conditions (13) and (14) are fulfilled for every iteration, then the iterates $(x^k,\lambda^k)$ generated by Algorithm 1 converge q-superlinearly to the stationary point $(x^*,\lambda^*)$.

As far as we know, this is the first convergence result for an algorithm based only on quasi-Newton updates of the Jacobian matrices for solving NLPs with inequality constraints. The performance of the algorithm will be studied for several test problems in the next section. For an implementation ensuring that the assumptions of Theorem 4.6 hold, we refer the reader to [14]. In that paper, a variant of the STR1 update is proposed that guarantees the positive definiteness of the reduced Hessian throughout the whole solution procedure.

5. Numerical tests

We present first numerical results for the proposed algorithm. For that purpose, we employ several
Hock–Schittkowski problems [17] contained in the CUTE test set [5] and a periodic optimal
control problem. Nonlinear inequalities are transformed into equality constraints by introducing
slack variables.

5.1 Implementation details

The optimization algorithm is implemented as a C/C++ routine. To provide the exact derivative
information that is required, we employ ADOL-C [12] for the automatic differentiation of C and
C++ code. Additionally, we use the exact derivative matrices as initializations for A0 and B0 .
To measure progress, we provide the $l_1$-merit function with $m + q$ weights, i.e. each constraint has its own weight factor (or, alternatively, with one weight for the norm of the constraints). These factors are updated using a common heuristic due to Powell [21], see also [20]. For the determination of the step length we use the backtracking approach as proposed in [20]. Note that we examine only the local superlinear convergence results. Therefore, the starting points of the test problems are chosen such that the globalization strategy plays only a minor role. As termination criterion, we use

$$\max\left(\|\nabla_x L(x^k,\lambda^k)\|,\; \|c(x^k)\| + \|(Gx^k - b)^+\|\right) < \epsilon$$

with $\epsilon = 10^{-8}$. To solve the reduced QPs (10) and (11), we apply the QP solver E04NAF from the NAG library with the parameter values proposed by NAG.
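The termination test stated above can be implemented directly; a minimal sketch with grad_L and c as assumed user-supplied callables, as before:

```python
import numpy as np

def converged(x, lam, grad_L, c, G, b, eps=1e-8):
    """Termination criterion of Section 5.1:
    max(||grad_L(x, lam)||, ||c(x)|| + ||(G x - b)^+||) < eps."""
    viol = np.linalg.norm(np.maximum(G @ x - b, 0.0))
    return max(np.linalg.norm(grad_L(x, lam)),
               np.linalg.norm(c(x)) + viol) < eps
```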

5.2 Results for Hock–Schittkowski test set

As a first numerical test, we took several problems of the Hock–Schittkowski optimization suite
that are included in the CUTE test set with the original starting point given in the literature [17].
These examples contain all possible combinations of equality constraints, bounds as well as linear
and nonlinear inequality constraints. To analyse the convergence behaviour in more detail, we
distinguish five different problem classes.
5.2.1 Class 1: problems with bounds as inequality constraints

As shown in Table 1, the algorithm converges for all selected Hock–Schittkowski problems of the CUTE collection that belong to this problem class. The obtained solutions coincide with the optimal values reported in the literature [17,22] except for problem hs025. For this problem, the algorithm terminates at a completely different value without an error message. To illustrate the superlinear convergence property that is proven in this paper, Figure 1 shows the convergence behaviour of the algorithm. For this purpose, we include all test problems where the iteration count exceeds three. For readability, we only include the last 50 iterates of the test problem hs038. As can be seen, for all shown test problems, the algorithm converges superlinearly for the last iterates.
5.2.2 Class 2: problems with linear inequality constraints and bounds

For the second class of test problems, i.e. the Hock–Schittkowski problems contained in the CUTE
collection with linear inequality constraints and bounds, the algorithm does not converge for the
problems hs086 and hs205 because of difficulties with the solution of the quadratic subproblem.
Since we focus here on the superlinear convergence property, we do not examine these difficulties
in more detail. For all other test problems of this class, the iteration count is given in Table 2.
In these cases, the obtained solutions coincide with the optimal values reported in the literature
[17,22]. Once more, the convergence behaviour of the algorithm is illustrated in Figure 1 for all
test problems where the iteration count exceeds three. As can be seen, after a short startup phase,
the algorithm reaches the optimal solution within one step.

5.2.3 Class 3: problems with equality constraints and bounds

The third class of test cases contains the Hock–Schittkowski problems of the CUTE collection
with equality constraints and bounds. Our optimization algorithm does not converge for problems

Table 1. Iteration counts for the first class of Hock–Schittkowski problems.

prob       iterations    prob       iterations    prob       iterations    prob       iterations
hs001      44            hs002      20            hs003      2             hs003mod   3
hs004      1             hs005      5             hs025      1             hs038      625
hs045      21            hs110      5

Figure 1. Convergence behaviour for the first two groups of test problems (error ‖x^i − x^*‖ versus iteration, semi-logarithmic scale).
Table 2. Iteration counts for the second class of Hock–Schittkowski problems.

prob       iterations    prob       iterations    prob       iterations    prob       iterations
hs021      1             hs021mod   1             hs024      5             hs035      1
hs035i     1             hs036      1             hs037      8             hs044      4
hs044new   9             hs076      3             hs076i     3             hs118      1
hs268      4

Table 3. Iteration counts for the third class of Hock–Schittkowski problems.

prob       iterations    prob       iterations    prob       iterations    prob       iterations
hs041      9             hs053      4             hs054      3             hs055      7
hs060      13            hs062      242           hs080      25            hs081      46
hs112      40            hs119      29

hs063, hs099, hs107, hs111, either because of difficulties with the solution of the quadratic subproblems or because of difficulties with the globalization strategy. Once more, since we only examine local convergence properties, we do not examine these cases in detail. The required iteration count to solve the test problems is given in Table 3. For the problems hs054 and hs062, the algorithm terminates at a value that differs from the solution stated in the literature [17,22] without an error message. All other obtained solutions coincide with the known optimal solutions. Figure 2 shows the convergence behaviour of the proposed algorithm for the 10 test problems, where only the last 40 iterates are shown for hs062.

5.2.4 Class 4: problems with equality constraints, linear inequality constraints, and bounds

This class contains only the three problems hs035mod, hs074, and hs075. For the first test case, the
algorithm yields within two iterations the known optimal solutions [17,22]. For the remaining two
problems, our optimization algorithm fails because of problems with the solution of the quadratic
subproblems.

Figure 2. Convergence behaviour for the third group of test problems (error ‖x^i − x^*‖ versus iteration, semi-logarithmic scale).



Table 4. Iteration counts for the fifth class of Hock–Schittkowski problems.

prob       iterations    prob       iterations    prob       iterations    prob       iterations
hs010      29            hs011      7             hs012      13            hs013      585
hs014      9             hs015      11            hs016      16            hs022      5
hs023      46            hs029      771           hs030      13            hs031      10
hs032      9             hs033      6             hs034      14            hs057      3
hs065      36            hs066      9             hs067      176           hs083      12
hs088      33            hs089      36            hs090      42            hs091      45
hs092      34            hs104      281           hs108      414

5.2.5 Class 5: problems involving nonlinear inequalities

Despite the very simple globalization strategy, our optimization algorithm converges for 27 out of 56 test problems to the optimal solution known from the literature [17,22]. For the remaining test problems, either the line search fails, a point with negative curvature is detected, or difficulties with the solution of the QP subproblems arise. Since enough test problems are left to analyse the convergence behaviour for a reasonable number of test problems, we did not analyse the reasons for the failures in more detail. Table 4 states the names and the required iteration counts for the solved test problems. Figure 3 shows the convergence behaviour for the solved test problems. As can be seen, the convergence is indeed superlinear for all problems solved with a reasonable iteration count. For the five problems that require a really high number of iterations, the algorithm converges only linearly and with a reduced accuracy. This behaviour was either caused by errors in the solution of the quadratic subproblem or by a lack of positive definiteness.
Figure 3. Convergence behaviour for the fifth group of test problems (error ‖x^i − x^*‖ versus iteration, semi-logarithmic scale).



5.3 Optimizing a simulated moving bed process

Periodic adsorption processes consist of vessels or beds packed with solid sorbent. The sorbent
is in contact with a multi-component fluid feed to preferentially adsorb one of the chemical
components onto the solid. The SMB chromatography to separate two dissolved species in a
liquid phase is one typical example of this class of periodic optimal control problems. Due to the periodicity of the considered system, described by the equality constraints of the optimization problem, the corresponding Jacobian is dense. That is, we face a major difference to numerous PDE-constrained optimization problems, where the Jacobian is sparse with a known structure.
For our numerical tests, we use a strongly simplified model for a specific SMB process described in detail in [9,23]. This medium-scale problem with 127 variables comprises 122 equality constraints. Due to the periodic behaviour of the considered system, the equality constraint Jacobian is basically dense. The objective is to maximize the feed throughput.
Note that each function evaluation necessitates a forward simulation of an ODE with 125 states. In the practical computations, we have chosen a Runge–Kutta integrator of order four with a fixed number of steps, namely 100, on the nonfixed time interval [0, T]. If the problem were attacked by a full discretization approach like collocation, it would have a size of at least 100 times 125 times the number of collocation points per interval (say at least two). With more than 25,000 variables, it must thus be considered a large-scale problem. The shooting formulation reduces the number of variables to only 127, and the Jacobian factorization within the TR1 update scheme allows us to reduce the number of free variables further, namely to only five independents. Thus, the reduced QP in each iteration has only five degrees of freedom and 18 inequality constraints, and costs a negligible fraction of the computing time needed by each iteration.
It is exactly this type of simulation-based optimization problem with many ‘hidden variables’
causing expensive function evaluations, for which our novel adjoint-based SQP algorithm with
Jacobian updates is designed. For the same initialization strategy of the matrices A0 and B0 as
above and the same termination criterion, our proposed optimization algorithm converges in 230
iterations. The convergence behaviour is illustrated in Figure 4. Even for this application-oriented
test problem one can observe superlinear convergence in the last iterations.

Figure 4. Convergence behaviour for the SMB problem (error ‖x^i − x^*‖ versus iteration, semi-logarithmic scale).



6. Conclusions and outlook

We have presented an SQP type algorithm for the solution of nonlinear programming prob-
lems with nonlinear equality and linear inequality constraints, which is based on quasi-Newton
approximations of both the Hessian and the constraint Jacobian within each QP subproblem. The
Hessian update uses the well-known SR1 formula, whereas the constraint Jacobian is updated
according to the recently proposed TR1 formula [13], which can be shown to be affine invariant.
Both matrices are updated in a factorized form that directly allows us to set up a reduced QP in
the null space of the linear constraints. The dimension of the QP equals the number of degrees of
freedom. It is solved by a standard convex QP solver. In addition to the QP solution, one constraint
residual and two Lagrange gradients need to be evaluated in each SQP iteration, the latter of which
can efficiently be performed by the reverse mode of automatic differentiation. We have shown
that the SQP algorithm is able to detect the correct active set in the QP solutions, despite inexact-
ness of the constraint Jacobians. Furthermore, superlinear convergence of the procedure can be
shown under quite common assumptions, and the particular assumption that the search directions
uniformly span the null space of the active inequality constraints.
First numerical results with the Hock–Schittkowski test set suggest good practical performance
of the algorithm, which is able to solve 86% of the considered 124 test problems. The application
to a nonlinear periodic optimal control problem, where the equality constraints result in a dense
Jacobian, shows that the algorithm is able to treat medium-scale problems well.
The considered problem class allows only linear inequality constraints. A very natural extension
would be to use an update procedure similar to the TR1 also for the Jacobian of the inequalities
to avoid the additional slack variables that are needed for nonlinear inequalities. Furthermore, the
STR1 update as proposed in this paper does not exploit any structure that is present in a specific
optimization problem. For that reason, future work may be dedicated to modified STR1 update
procedures that respect block structure in Jacobian and Hessian, e.g. by high-rank block updates.
This approach could be coupled with appropriate adjoint solvers in ODE and PDE optimal control
applications. Finally, the performance of the algorithm would presumably benefit from a more
sophisticated globalization strategy.

Acknowledgements
We thank the anonymous referees for their valuable comments, which helped us to improve the quality of the paper.
The first author also gratefully acknowledges support by the Research Council KUL: CoE EF/05/006 Optimization in
Engineering (OPTEC), GOA AMBioRICS, IOF-SCORES4CHEM, PhD/postdoc and fellow grants; Flemish Government:
FWO: PhD/postdoc grants, projects G.0452.04, G.0499.04, G.0211.05, G.0226.06, G.0321.06, G.0302.07, G.0320.08,
G.0558.08, G.0557.08, research communities (ICCoS, ANMMM, MLDM); IWT: PhD Grants, McKnow-E, Eureka-Flite;
Helmholtz: viCERP; EU: ERNSI,HD-MPC; Contracts: AMINAL; Belgian Federal Science Policy Office: IUAP P6/04
(DYSCO, Dynamical systems, control and optimization, 2007-2011).

References

[1] L.T. Biegler, O. Ghattas, M. Heinkenschloss, and B.V. Bloemen Waanders (eds.), Large-Scale PDE-Constrained
Optimization, Lecture Notes in Computational Science and Engineering Vol. 30, Springer, Berlin, 2003.
[2] H.G. Bock, Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, Bonner Mathematische Schriften Vol. 183, University of Bonn, Bonn, 1987.
[3] H.G. Bock, M. Diehl, E.A. Kostina, and J.P. Schlöder, Constrained optimal feedback control of systems governed
by large differential algebraic equations, in Real-Time and Online PDE-Constrained Optimization, L. Biegler,
O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloemen Waanders, eds., SIAM, Philadelphia, pp. 3–22,
2007.
[4] H.G. Bock, W. Egartner, W. Kappis, and V. Schulz, Practical shape optimization for turbine and compressor blades
by the use of PRSQP methods, Optim. Eng. 3(4) (2002), pp. 395–414.

[5] I. Bongartz, A.R. Conn, N. Gould, and P. Toint, CUTE: Constrained and unconstrained testing environment, ACM
Trans. Math. Software 21 (1995), pp. 123–160.
[6] R. Byrd, J. Gilbert, and J. Nocedal, A trust region method based on interior point techniques for nonlinear
programming, Math. Program. 89A (2000), pp. 149–185.
[7] A. Conn, N. Gould, and P. Toint, Convergence of quasi-Newton matrices generated by the symmetric rank one update,
Math. Program. 50A (1991), pp. 177–195.
[8] J.E. Dennis and J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev. 19(1) (1977), pp. 46–89.
[9] M. Diehl and A. Walther, A test problem for periodic optimal control algorithms, Tech. Rep. MATH-WR-01-2006,
Technische Universität Dresden, Dresden, 2006.
[10] H.J. Ferreau, H.G. Bock, and M. Diehl, An online active set strategy to overcome the limitations of explicit MPC,
Internat. J. Robust Nonlinear Control 18(8) (2008), pp. 816–830.
[11] A. Griewank, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Frontiers in Applied
Mathematics Vol. 19, SIAM, Philadelphia, 2000.
[12] A. Griewank, D. Juedes, and J. Utke, ADOL-C: A package for the automatic differentiation of algorithms written in
C/C++, ACM Trans. Math. Software 22 (1996), pp. 131–167.
[13] A. Griewank and A. Walther, On constrained optimization by adjoint-based quasi-Newton methods, Optim. Methods Softw. 17 (2003), pp. 869–889.


[14] A. Griewank, A. Walther, and M. Korzec, Maintaining factorized KKT systems subject to rank-one updates of
Hessians and Jacobians, Optim. Methods Softw. 22 (2007), pp. 279–295.
[15] S.P. Han, Superlinearly convergent variable-metric algorithms for general nonlinear programming problems, Math.
Program. 11 (1976), pp. 263–282.
[16] M. Heinkenschloss and L.N. Vicente, Analysis of inexact trust-region SQP algorithms, SIAM J. Optim. 12(2) (2001),
pp. 283–302.
[17] W. Hock and K. Schittkowski, Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and
Mathematical Systems Vol. 187, Springer, 1981.
[18] H. Jäger and E. Sachs, Global convergence of inexact reduced SQP methods, Optim. Methods Softw. 7 (1997), pp.
83–110.
[19] L. Jiang, L.T. Biegler, and G. Fox, Optimization of pressure swing adsorption systems for air separation, AIChE J.
49 (2003), pp. 1140–1157.
[20] J. Nocedal and S. Wright, Numerical Optimization, Springer, New York, 1999.
[21] M.J.D. Powell, A fast algorithm for nonlinearly constrained optimization calculations, In Numerical Analysis,
Dundee 1977, G.A. Watson, ed., Lecture Notes in Mathematics Vol. 630, Springer, Berlin, 1978.
[22] A. Wächter and L. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale
nonlinear programming, Math. Program. 106(1) (2006), pp. 25–57.
[23] A. Walther and L. Biegler, A trust-region algorithm for nonlinear programming problems with dense constraint
Jacobians, Tech. Rep. MATH-WR-01-2007, Technische Universität Dresden, 2007 (submitted to Comput. Opt.
Appl.).
[24] L. Wirsching, An SQP algorithm with inexact derivatives for a direct multiple shooting method for optimal control
problems, Master’s thesis, University of Heidelberg, 2006.
[25] L. Wirsching, H.G. Bock, and M. Diehl, Fast NMPC of a chain of masses connected by springs, In Proceedings of
the IEEE International Conference on Control Applications, Munich, pp. 591–596, 2006.
[26] L. Wirsching, H.J. Ferreau, H.G. Bock, and M. Diehl, An online active set strategy for fast adjoint-based non-
linear model predictive control, In Proceedings of the 7th Symposium on Nonlinear Control Systems (NOLCOS),
Pretoria, 2007.
