Optimization Methods and Software
To cite this article: Moritz Diehl , Andrea Walther , Hans Georg Bock & Ekaterina Kostina (2010):
An adjoint-based SQP algorithm with quasi-Newton Jacobian updates for inequality constrained
optimization, Optimization Methods and Software, 25:4, 531-552
Optimization Methods & Software
Vol. 25, No. 4, August 2010, 531–552
We present a sequential quadratic programming (SQP) type algorithm, based on quasi-Newton approxima-
tions of Hessian and Jacobian matrices, which is suitable for the solution of general nonlinear programming
problems involving equality and inequality constraints. In contrast to most existing SQP methods, no eval-
uation of the exact constraint Jacobian matrix needs to be performed. Instead, in each SQP iteration only
one evaluation of the constraint residuals and two evaluations of the gradient of the Lagrangian function are
necessary, the latter of which can efficiently be performed by the reverse mode of automatic differentiation.
Factorizations of the Hessian and of the constraint Jacobian are approximated by the recently proposed
STR1 update procedure. Inequality constraints are treated by solving within each SQP iteration a quadratic
program (QP), the dimension of which equals the number of degrees of freedom. A recently proposed
gradient modification in these QPs takes account of Jacobian inexactness in the active set determination.
Superlinear convergence of the procedure is shown under mild conditions. The convergence behaviour of
the algorithm is analysed using several problems from the Hock–Schittkowski test library. Furthermore,
we present numerical results for an optimization problem based on a small periodic adsorption process,
where the Jacobian of the equality constraints is dense.
1. Introduction
In this paper, we propose a sequential quadratic programming (SQP) type algorithm to solve
general nonlinear programming problems that are given in the form
min_{x∈R^n} f(x)  s.t.  c(x) = 0,  Gx ≤ b. (1)
In contrast to the majority of SQP type algorithms, our algorithm does not evaluate the Jaco-
bian of the equality constraints exactly in each iteration, but approximates it by a quasi-Newton
update procedure first proposed in [13]. While this approach, as well as related inexact SQP type
methods [4,16], have so far only been applicable to equality constrained problems, recently it was
discovered that inequalities could be handled by SQP type methods with inexact Jacobians by a
simple gradient modification in the quadratic programming (QP) subproblems [3]. In this paper,
we study a combination of the adjoint-based quasi-Newton concept presented in [13] with the
gradient modification proposed in [3]. The main feature of this new algorithm is that one does
not need to form and factor the Jacobian of the equality constraints. Especially for optimization
problems where the Jacobian of the equality constraints is dense, this may allow a considerable
reduction of the computing time. Hence, we aim at the solution of optimization problems with
moderate size but a special structure of the constraint Jacobian. The corresponding applications
cover a wide range of periodic adsorption processes including, for example, the purification of
hydrogen. In these cases, the Jacobian of the equality constraints is dense due to the periodicity of
the considered process. As a consequence, the runtime needed for the optimization process may
be dominated significantly by the computation of the dense Jacobian and its factorization [19].
The proposed SQP-type algorithm with superlinear local convergence properties is applicable
to general nonlinear programming problems (NLPs) with inequality constraints, since one can
convert any NLP into the form (1) by introducing suitable slacks.
SQP methods that use inexact Jacobians or that inexactly solve the linear subproblems have
often been considered for equality-constrained optimization; see [16,18] and the references therein.
Quasi-Newton update procedures for the Hessian of the Lagrangian are widely employed since
the classical papers by Han [15] and Powell [21]. However, the use of quasi-Newton updates for
both the Hessian and the constraint Jacobian was first proposed in [13] for equality-constrained
problems, to avoid forming and factoring the constraint Jacobian matrix.
The treatment of inequalities in the presence of inexact Jacobians poses the question of correct
active set determination, and usually it is assumed that the gradients are sufficiently accurately
computed to ensure correct multiplier signs, or that interior point methods are used, see e.g. the
articles on inequality constrained optimization in [1]. In contrast to these approaches, the SQP
type methods considered here can cope with wrong constraint gradients as long as an accurate
gradient of the Lagrangian is computed.
The outline of this paper is as follows. In Section 2, we describe the algorithm for solving
optimization problems with arbitrary equality and linear inequalities in detail. We prove the
stability of the active set and the local convergence of the iterates under the assumption that
the approximate derivative information satisfies certain accuracy requirements in Section 3. It is
shown in Section 4 that the required quality of the derivative approximations can be achieved. First
numerical results are stated in Section 5 to illustrate the applicability of the proposed algorithm
for the Hock–Schittkowski problems [17] contained in the CUTE test set [5]. Furthermore, we
apply the proposed optimization algorithm to a small simulated moving bed (SMB) process as
one possible test case for periodic optimal control problems with a dense Jacobian of the equality
constraints. Finally, conclusions and ideas for further developments are presented in Section 6.
In general, SQP algorithms aim at finding a local minimizer of Equation (1) by iteratively
refined guesses of the primal variables x ∈ Rn and of the dual variables λ ∈ Rm . Here, the
main idea is to find a root of the necessary optimality conditions of the considered optimization
problem.
For our purpose to solve an optimization problem with inequality constraints by using only
inexact derivative matrices, we define a 'truncated Lagrangian' L by

L(x, λ) := f(x) + λ^T c(x), (2)

disregarding inequalities and their multipliers. Furthermore, we say that the ith inequality is
active at a point x iff (Gx − b)i = 0. We denote the matrix of rows within G that correspond to
active inequalities by the shorthand Ga . We say that the linear independence constraint qualifi-
cation (LICQ) holds at a point x iff the row vectors of ∇_x c(x)^T and G_a are linearly independent.
Here and throughout this paper, we use the notation ∇_x c(x) := ∂c/∂x(x)^T. By ‖·‖ we denote
the Euclidean norm. Necessary conditions for optimality can then be formulated as follows [20].
If x∗ is a local minimizer of Equation (1) and LICQ holds at x∗, then there exist unique multipliers λ∗, μ∗ such that
∇_x L(x∗, λ∗) + G^T μ∗ = 0, (3a)
c(x∗) = 0, (3b)
Gx∗ ≤ b, (3c)
μ∗ ≥ 0, (3d)
μ∗_i (Gx∗ − b)_i = 0,  i = 1, 2, . . . , p. (3e)
In the remainder of this section, we will first review an existing adjoint-based quasi-Newton
algorithm for the solution of problems without inequalities [13]. Here, the KKT conditions
simplify to
∇x L(x ∗ , λ∗ ) = 0, (4a)
c(x ∗ ) = 0. (4b)
The aim of the Newton-type algorithm is to find a point satisfying this system of nonlinear
equations. In a second step, we present our novel SQP algorithm, which is able to treat inequality
constraints, i.e. to find points x ∗ satisfying the full set of KKT conditions (3). The algorithm uses
the same quasi-Newton update procedure as the algorithm for equality constraints, but is based
on successive solution of the QPs, where a modified gradient is used within each QP to account
for the inexactness of the derivatives.
If one has to solve an equality-constrained problem of the form (1) in the absence of inequality
constraints, one may apply Newton’s method to the corresponding KKT conditions (4). A full
step Newton’s method would iterate
x k+1 = x k + s k , λk+1 = λk + σ k ,
Here, g(x, λ) := ∇f (x) + ∇x c(x)λ is the gradient of the Lagrange function (2) with respect to
x. A SQP method based only on quasi-Newton approximations Bk and Ak of ∇xx 2
L(x k , λk ) and
534 M. Diehl et al.
∇x c(x k ) was first presented in [13]. In order to approximate the second-order information that is
required for solving the KKT system (5), the well-known SR1 update given by
B_{k+1} = B_k + ε_k (w^k − B_k s^k)(w^k − B_k s^k)^T / ((w^k − B_k s^k)^T s^k) (6)
is employed, with s^k := x^{k+1} − x^k. The matrix B_{k+1} is chosen to satisfy the secant condition B_{k+1} s^k = w^k (for ε_k = 1).
The parameter εk in Equation (6) allows a modification of the SR1 update such that the norm
of the Hessian approximation remains bounded. Furthermore, it can be used to avoid negative
curvature of the next Hessian approximation [14]. For most SQP algorithms, [6,22], it is necessary
that the full Jacobian of the constraints is available, computed either by numerical differentiation
or by analytical methods. However, only vector–Jacobian products are required by the updating algorithm
proposed in [13]. This adjoint information can be computed efficiently using either an approach
based on continuous adjoint equations if the underlying model is based on differential equations
or the technique of automatic differentiation [11]. As a result, one obtains a quasi-Newton update
also for the Jacobian of the constraints, avoiding the possibly very time-consuming construction
of the full Jacobian in each optimization step. For that purpose, the TR1 update is defined as
A_{k+1} = A_k + (y^k − A_k s^k)(γ^k − A_k^T σ^k)^T / ((γ^k − A_k^T σ^k)^T s^k) (7)
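As a plain illustration, the two rank-one updates (6) and (7) can be sketched in a few lines of NumPy. This is a schematic sketch, not the authors' implementation; the secant vectors w^k, y^k, γ^k and the damping parameter ε_k are assumed to be given:

```python
import numpy as np

def sr1_update(B, s, w, eps=1.0):
    """Damped SR1 update (6): B + eps * r r^T / (r^T s), with r = w - B s."""
    r = w - B @ s
    return B + eps * np.outer(r, r) / (r @ s)

def tr1_update(A, s, sigma, y, gamma):
    """TR1 update (7): A + (y - A s)(gamma - A^T sigma)^T / ((gamma - A^T sigma)^T s)."""
    r = y - A @ s             # residual of the direct secant condition
    t = gamma - A.T @ sigma   # residual of the adjoint secant condition
    return A + np.outer(r, t) / (t @ s)
```

For ε_k = 1 the updated matrices satisfy the secant conditions B_{k+1}s^k = w^k and A_{k+1}s^k = y^k by construction, which is easy to verify numerically.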
To handle inequalities, we follow an idea from [3] and formulate in each SQP iteration for given
iterates (x k , λk ) and for given Jacobian and Hessian approximations Ak and Bk the QP
min_{s∈R^n} (1/2) s^T B_k s + (m^k)^T s  s.t.  A_k s = −c(x^k),  Gs ≤ b − Gx^k, (8)
In this definition, the rightmost term can be interpreted as a correction to the exact objective
gradient ∇f (x k ) that is usually employed in SQP methods. The correction term accounts for the
fact that the constraint Jacobian ∇x c(x k ) is not exactly known within the QP. As the Jacobian
of the inequalities is correctly known in the QP, no correction with respect to the inequalities
needs to be added, in contrast to the generic NLP algorithm proposed in [3]. Using adjoint
differentiation techniques, as before, the modified gradient mk can efficiently be computed as
mk = ∇x L(x k , λk ) − ATk λk avoiding the expensive computation of ∇x c(x k ).
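In code, the modified gradient costs one adjoint sweep plus one matrix–vector product. A minimal sketch, assuming a callback `grad_lagrangian` that returns ∇_x L(x, λ), e.g. via reverse-mode automatic differentiation:

```python
import numpy as np

def modified_gradient(grad_lagrangian, A_k, x, lam):
    """m^k = grad_x L(x^k, lam^k) - A_k^T lam^k.

    Only the gradient of the Lagrangian is evaluated; the exact constraint
    Jacobian grad_x c(x^k) is never formed.
    """
    return grad_lagrangian(x, lam) - A_k.T @ lam
```

As a sanity check, for linear constraints c(x) = Cx − d and A_k = C, the correction cancels the multiplier term exactly and m^k reduces to ∇f(x^k).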
The solution of Equation (8) yields an optimal step s^k and corresponding equality and inequality
multipliers λ^k_QP and μ^k_QP. As the inequality multipliers are needed neither for the computation of
the modified gradient nor for the Hessian update, since the linear inequalities do not contribute
to the Hessian ∇²_{xx}L, we discard them altogether in our algorithm. Defining the multiplier step
σ^k := λ^k_QP − λ^k, the full step method would then iterate, again, as

x^{k+1} = x^k + s^k,  λ^{k+1} = λ^k + σ^k.
First, we observe that the primal and dual steps s k and σ k can be obtained from the solution of a
slight modification of the original QP Equation (8), namely from the following equivalent QP
min_{s∈R^n} (1/2) s^T B_k s + ∇_x L(x^k, λ^k)^T s  s.t.  A_k s = −c(x^k),  Gs ≤ b − Gx^k. (9)
Here, we directly obtain the equality multiplier step σ k from the QP equality multipliers. In the
overall SQP algorithm that we propose, this QP subproblem is solved in each iteration to obtain
s k and σ k .
To avoid an expensive factorization of the matrix Ak in each SQP iteration, we directly maintain
a factorized nullspace representation of the approximated derivatives. Hence, we do not update
Ak and Bk directly, but the matrices
L_k ∈ R^{m×m},  Y_k ∈ R^{n×m},  Z_k ∈ R^{n×d},  M_k ∈ R^{d×d},  D_k ∈ R^{m×n}

with d = n − m, Y_k orthonormal and L_k lower triangular, that are defined by

M_k = Z_k^T B_k Z_k,  A_k = L_k Y_k^T,  Y_k^T Z_k = 0,  D_k = Y_k^T B_k.
The damping parameter εk in Equation (6) is chosen such that the approximation of the reduced
Hessian, i.e. Mk , is positive definite. The corresponding updating formulas for the five matrices
can be found in [14].
The factorization of Ak allows us to consider a step vector s k of the form
s k = Yk syk + Zk szk ,
where the range space step s_y^k is given by

s_y^k = −L_k^{-1} c(x^k),
and the null space step szk is determined by the solution of a reduced QP, i.e. a QP in a space of
dimension d = n − m, that is equivalent to the QP (9), namely
min_{s_z∈R^d} (1/2) s_z^T M_k s_z + s_z^T Z_k^T (g(x^k, λ^k) + D_k^T s_y^k) (10)
s.t. G Z_k s_z ≤ b − Gx^k − G Y_k s_y^k. (11)
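Ignoring the inequality constraints (11) for brevity, one step in this factorized representation reduces to a triangular solve in R^m and a small solve in R^d. A minimal sketch under these assumptions (in practice the lower-triangular structure of L_k would be exploited, and the reduced QP with inequalities would be passed to a QP solver):

```python
import numpy as np

def nullspace_step(L, Y, Z, M, D, c_val, g_val):
    """Compute s = Y s_y + Z s_z for the equality-constrained case.

    s_y = -L^{-1} c(x^k)                       (range-space step)
    s_z solves M s_z = -Z^T (g + D^T s_y)      (unconstrained reduced QP (10))
    """
    s_y = -np.linalg.solve(L, c_val)   # a triangular solve, since L is lower triangular
    s_z = np.linalg.solve(M, -(Z.T @ (g_val + D.T @ s_y)))
    return Y @ s_y + Z @ s_z
```

With consistent data (A = L Y^T, M = Z^T B Z, D = Y^T B, B symmetric), the resulting s satisfies the KKT conditions of the equality-constrained QP: A s = −c(x^k) and Z^T(B s + g) = 0.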
Note that solving the reduced QP (10)–(11) yields not only s_z^k but also the Lagrange multipliers
μ^k_QP of the inequalities. Then we can compute σ^k using the formula
Once s k and σ k are computed from the QP solution, the new iterate is given by
x k+1 = x k + tk s k , λk+1 = λk + tk σ k ,
where tk ∈ (0, 1] is determined by a line search. The presented approach is summarized in the
following algorithm:
Algorithm 1
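One iteration of the method described in this section can be summarized schematically as follows. This is a sketch of the structure only, with all helper names (`solve_qp9`, `line_search`, `secant_data`, `update_A`, `update_B`) being placeholders, not identifiers from the paper:

```python
def sqp_iteration(x, lam, A, B, problem, solve_qp9, line_search,
                  secant_data, update_A, update_B):
    """One schematic iteration of the adjoint-based SQP method.

    problem.c(x)          -- equality-constraint residuals c(x^k)
    problem.grad_L(x, l)  -- gradient of the truncated Lagrangian
                             (one reverse-mode AD sweep)
    solve_qp9             -- solves QP (9) for the primal/dual steps (s, sigma)
    secant_data           -- returns the vectors (w, y, gamma) entering (6), (7)
    update_A, update_B    -- the TR1 update (7) and the damped SR1 update (6)
    """
    s, sigma = solve_qp9(B, A, problem.grad_L(x, lam), problem.c(x))
    t = line_search(x, lam, s, sigma)            # step length t in (0, 1]
    x1, lam1 = x + t * s, lam + t * sigma
    w, y, gamma = secant_data(x, lam, x1, lam1)  # secant information
    return (x1, lam1,
            update_A(A, x1 - x, lam1 - lam, y, gamma),
            update_B(B, x1 - x, w))
```

For a convex equality-constrained quadratic problem with exact B and A and a full step, a single such iteration already lands on the KKT point, which makes the structure easy to test.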
To keep the quasi-Newton approximations well defined, we perform the update (6) in step 6
only if the inequality
|(w^k − B_k s^k, s^k)| ≥ c_1 ‖w^k − B_k s^k‖ ‖s^k‖ (13)

holds for the SR1 update, with c_1 ∈ (0, 1). Here and throughout, we denote by (x, y) the scalar
product x^T y. The update (7) is used only if the inequality
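The safeguard (13) is a one-line check; the analogous condition (14) for the TR1 update is not reproduced in this excerpt, so only (13) is sketched here:

```python
import numpy as np

def sr1_update_is_safe(B, s, w, c1=0.1):
    """Condition (13): |<w - B s, s>| >= c1 * ||w - B s|| * ||s||, c1 in (0, 1).

    The SR1 update (6) is skipped whenever this fails, so the denominator
    (w - B s)^T s stays bounded away from zero relative to the factors.
    """
    r = w - B @ s
    return abs(r @ s) >= c1 * np.linalg.norm(r) * np.linalg.norm(s)
```

The guard accepts updates whose residual r = w − B s has a significant component along s and rejects (near-)orthogonal residuals, for which the rank-one correction would blow up.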
We investigate first the question if the active set in the QP subproblems is stable in the presence
of inexact Jacobians, and under which circumstances we can ensure local convergence.
Theorem 3.1 (Stability of QP active set near NLP solution) Assume that vectors x ∗ , λ∗ are
given and
(i) at x ∗ LICQ holds, and there exists a μ∗ such that (x ∗ , λ∗ , μ∗ ) satisfies the KKT condi-
tions (3a)–(3e).
(ii) at x ∗ strict complementarity holds, i.e. the multipliers μ∗a of the active inequalities Ga x ∗ = ba
satisfy μ∗a > 0, where Ga is a matrix consisting of all rows of G that correspond to the active
inequalities in x ∗ .
Then for any constant α > 0 there exists a neighbourhood N of (x ∗ , λ∗ ) and a C > 0 such that
for all (x, λ) ∈ N the following statement holds. For any two matrices A, B with B positive
semidefinite on the null space of A, and such that the matrix
J(A, B) := [ B, A^T, G_a^T ; A, 0, 0 ; G_a, 0, 0 ] (15)

is invertible and satisfies ‖J(A, B)^{-1}‖ ≤ α, the QP (9) has a unique solution (s, σ, μ) that
satisfies
‖(s, σ, μ − μ∗)‖ ≤ C (‖x − x∗‖ + ‖λ − λ∗‖). (16)
This QP solution has the same active set as the NLP solution x ∗ .
is uniquely solvable if J(A, B) is invertible. Moreover, for (x, λ) = (x∗, λ∗) the solution is given
by (0, 0, μ∗_act) because ∇_x L(x∗, λ∗) + G_act^T μ∗_act = 0, c(x∗) = 0, b_act − G_act x∗ = 0. As the norm
of the inverse of J(A, B) is bounded, the size of (s, σ, μ_act − μ∗_act) is bounded by α times the
norm of the change in the right-hand side, and because the right-hand side depends differentiably
on (x, λ) we find C > 0 such that ‖(s, σ, μ_act) − (0, 0, μ∗_act)‖ ≤ C(‖x − x∗‖ + ‖λ − λ∗‖). When
μact is augmented to the vector μ by suitable zeros at the positions of the nonactive constraints, we
obtain Equation (16). To show that (s, σ, μ) is indeed the unique solution of the QP (9), we first
deduce from the fact that B is assumed to be positive semidefinite on the nullspace of A that the QP
is convex. Moreover, μact > 0 if N is chosen sufficiently small, due to strict complementarity in
the NLP solution. Likewise, for the inactive inequalities, the fact that Ginact x − binact < 0 implies
that Ginact (x + s) − binact < 0 for N sufficiently small. Therefore, (s, σ, μ) satisfies the KKT
conditions of the QP, namely
Bs + ∇x L(x, λ) + AT σ + GT μ = 0,
c(x) + As = 0,
G(x + s) ≤ b,
μ ≥ 0,
μi (G(x + s) − b)i = 0, i = 1, 2, . . . , nb ,
with the same active set as the NLP solution and with strict complementarity. Thus s is a global
minimizer of the convex QP. This minimizer is unique if the Hessian of the QP, B, is positive
definite on the null space of the Jacobian of the active constraints, [ A ; G_act ], i.e. if z^T Bz > 0
holds for any vector z ∈ R^n, z ≠ 0, such that Az = 0 and G_act z = 0. This is indeed the case, as
z^T Bz ≥ 0 due to positive semidefiniteness on the null space of A, and z^T Bz ≠ 0 must hold because
otherwise J(A, B) would be singular, due to J(A, B) [ z ; 0 ; 0 ] = 0.
The fact that the active set remains stable near a solution allows us to establish conditions for local
convergence of our SQP method with inexact Jacobians. In this subsection, we will assume that
the active set is fixed, and we will only consider the full step method with tk = 1 in all iterations.
We observe that the iterates (x k , λk ) for a fixed active set can be interpreted as iterates of an
inexact Newton method towards the solution y ∗ = (x ∗ , λ∗ ) of the nonlinear system
F(y) = 0 with y := [ x ; λ ] and F(y) := [ N^T ∇_x L(x, λ) ; c(x) ; G_a x − b_a ], (17)

where N ∈ R^{n×q}, q := n − p_a, is a matrix with orthonormal columns spanning the null space of G_a.
Introducing this matrix allows us to disregard the active inequality multipliers μ_a from the iteration.
For given y k , we iterate
y^{k+1} = y^k − J_k^{-1} F(y^k) with J_k = [ N^T B_k, N^T A_k^T ; A_k, 0 ; G_a, 0 ]. (18)
The matrix Jk shall be a suitable approximation of the matrix
∂F/∂y (y^k) = [ N^T ∇²_{xx}L(x^k, λ^k), N^T ∇_x c(x^k) ; ∇_x c(x^k)^T, 0 ; G_a, 0 ]. (19)
The following lemma is a variant of standard results that have similarly been formulated, e.g.
in [2,8].
Lemma 3.1 (Local convergence) Assume that F : D → Rny , D ⊂ Rny open, is continuously
differentiable and consider the sequence
y^{k+1} = y^k + Δy^k,  Δy^k = −J_k^{-1} F(y^k),
From Assumption (20), we conclude that ‖Δy^{k+1}‖ ≤ κ ‖Δy^k‖ and, therefore, for arbitrary k ≥
m ≥ 0,

‖y^k − y^m‖ ≤ ‖Δy^m‖ / (1 − κ),

i.e. y^k is a Cauchy sequence that remains in the closed set B and thus converges towards a point
ỹ∗ ∈ B. Because of Assumption (20) and uniform boundedness of J_k, we also have

lim_{k→∞} F(y^k) = − lim_{k→∞} J_k Δy^k = 0.
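The contraction argument behind this lemma is easy to observe numerically: a Newton-type iteration with a frozen, slightly wrong Jacobian still converges, only linearly instead of quadratically. A toy sketch (the system F and the perturbed Jacobian J0 are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Inexact Newton iteration y_{k+1} = y_k - J_k^{-1} F(y_k) with a frozen,
# slightly perturbed Jacobian J_k = J0 for all k.
def F(y):
    return np.array([y[0] + y[1]**2 - 1.0, y[0] - y[1]])

# Exact Jacobian at the root has entry (0,1) equal to 2*y1* ~ 1.236;
# we deliberately use 1.1 instead.
J0 = np.array([[1.0, 1.1],
               [1.0, -1.0]])

y_star = np.array([(np.sqrt(5) - 1) / 2] * 2)  # exact root: y0 = y1 = (sqrt(5)-1)/2
y = np.array([0.8, 0.8])
errors = []
for _ in range(25):
    y = y - np.linalg.solve(J0, F(y))
    errors.append(np.linalg.norm(y - y_star))
```

The errors decrease by a roughly constant factor per iteration, i.e. q-linearly, as the lemma predicts for a uniformly bounded, sufficiently accurate Jacobian approximation.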
As a consequence of Theorem 3.1 and Lemma 3.2, we obtain the following result.
Theorem 3.2 (Stability of active set, local convergence) Let the vectors x ∗ , λ∗ be given and
assume that:
(i) at x ∗ the LICQ holds, and there exists a μ∗ such that (x ∗ , λ∗ , μ∗ ) satisfies the KKT
conditions (3a)–(3e);
(ii) at x ∗ strict complementarity as defined in Theorem 3.1 holds;
(iii) there are two sequences of uniformly bounded matrices (Ak ), (Bk ), each Bk positive semidef-
inite on the null space of Ak , such that the sequence of matrices (Jk ) defined in Equation (18)
is uniformly bounded and invertible with the uniformly bounded inverse;
(iv) there is a sequence y^k := [x^k, λ^k]^T generated according to

y^{k+1} = y^k + Δy^k with Δy^k := [s^k, σ^k]^T,

where s^k and σ^k are obtained from the solution of the QP

min_{s∈R^n} (1/2) s^T B_k s + ∇_x L(x^k, λ^k)^T s  s.t.  A_k s = −c(x^k),  Gs ≤ b − Gx^k. (22)
Then there exists a neighbourhood N̄ of (x ∗ , λ∗ ) such that for all initial guesses (x 0 , λ0 ) ∈ N̄
the sequence (x k , λk ) converges q-linearly towards (x ∗ , λ∗ ) with rate κ, and the solution of each
QP (22) has the same active set as x ∗ .
Proof We start by noting that the boundedness of J_k implies also boundedness of J(A_k, B_k) as
defined in Equation (15) for Theorem 3.1. Consequently, the inverse J(A_k, B_k)^{-1} is also defined
and bounded, as it can easily be shown that

J(A_k, B_k)^{-1} = [ J_k^{-1} T ; (G_a G_a^T)^{-1} ( (G_a, 0, 0) − (G_a B_k | G_a A_k^T) J_k^{-1} T ) ],
with T := [ N^T, 0, 0 ; 0, I_m, 0 ; 0, 0, I_{p_a} ].
We continue by choosing a neighbourhood N according to Theorem 3.1, such that the stability
of the active set in the QP (22) is guaranteed for the given bounds on Jk−1 whenever (x k , λk ) ∈ N .
Then, the QP solution is given by
⎡ T ⎤
k N ∇x L(x k , λk )
s
= −Jk−1 ⎣ c(x k ) ⎦.
σk
Ga x − ba
It remains to determine the neighbourhood N̄ of (x∗, λ∗) such that we can guarantee that the ball

B_{‖Δy^0‖/(1−κ)}(y^0) := { y ∈ R^{n+m} : ‖y − y^0‖ ≤ ‖Δy^0‖ / (1 − κ) }
holds in addition to the assumptions of Theorem 3.3, then the convergence rate is q-superlinear.
Proof We will show that Equation (21) holds so that q-superlinear convergence follows from
Lemma 3.2. First, we observe that the matrices
M_k := [ N^T B_k N, N^T A_k^T ; A_k N, 0 ] = ( I_{q+m} | 0 ) J_k [ N, 0 ; 0, I_m ]

have inverses

M_k^{-1} = [ N^T, 0 ; 0, I_m ] J_k^{-1} [ I_{q+m} ; 0 ]

that are uniformly bounded due to Assumption (iii) in Theorem 3.3. Defining the shorthand

M∗ := [ N^T ∇²_{xx}L(x∗, λ∗) N, N^T ∇_x c(x∗) ; ∇_x c(x∗)^T N, 0 ],
from Equation (24) we therefore obtain lim_{k→∞} M_{k+1}^{-1}(M_k − M∗) = 0. Furthermore, we note that
for k ≥ 1 all iterations are performed in the null space of G_a, i.e.

[ N N^T, 0 ; 0, I_m ] Δy^k = Δy^k  and  J_{k+1}^{-1} [ I_{q+m} ; 0 ] = [ N, 0 ; 0, I_m ] M_{k+1}^{-1}.

Because of

J_{k+1}^{-1} ( J_k − ∂F/∂y (y∗) ) [ N, 0 ; 0, I_m ] = J_{k+1}^{-1} [ I_{q+m} ; 0 ] (M_k − M∗),

we obtain

‖J_{k+1}^{-1} ( J_k − ∂F/∂y (y∗) ) Δy^k‖ / ‖Δy^k‖ ≤ ‖M_{k+1}^{-1} (M_k − M∗)‖ → 0.
Remark The convergence theory for inexact, inequality constrained SQP methods in this section
is also, without quasi-Newton schemes, a useful tool for efficient algorithms. In particular, in the
context of nonlinear model predictive control (NMPC), where a series of neighbouring problems
is solved, it is often possible to keep the Hessian and Jacobian just constant for different prob-
lems [25]. A detailed analysis of the use of adjoint-based SQP methods for nonlinear optimal
control and NMPC has been performed in [24], where detailed proofs of very similar results as
in this section are given. Recent work on adjoint based NMPC [26] shows that the adjoint-based
methods can efficiently be coupled with online active set strategies for QP solution as presented
in [10].
The local convergence result for the STR1 update presented in this section is mainly based on the
ideas already used in [7]. Since we know that the active set will finally be fixed, we are interested
in matrix convergence only on a subspace. We are using again q := n − pa and the orthonormal
column matrix N ∈ Rn×q that spans the null space of Ga , i.e. N T N = Iq and Ga N = 0. We do
not need full steps tk = 1 for the matrix convergence results, therefore we define s k := x k+1 − x k
in this section. For the convergence proof of this section, we will use the following assumptions:
Assumptions (AS1) – (AS4) are moderate and used widely for the proof of convergence,
where (AS2) and (AS3) are assumed to hold globally. Since we examine only local superlinear
convergence, we ensure the convergence of the iterate by (AS4). Assumption (AS5) holds due to
the results presented in the last section, assumption (AS6) is rather strong. Near a solution, one
will have |εk | = 1. Therefore, for a convergent sequence of iterates {(x k , λk )}, assumption (AS7)
should be fulfilled for k2 large enough. Then, we can prove the following statement.
Theorem 4.1 Assume that the assumptions (AS1)–(AS7) hold, where {(x k , λk )} is a sequence
of iterates generated by Algorithm 1. Furthermore let Equation (13) and (14) hold for every
iteration. Then Equation (24) holds.
This convergence result together with the local convergence proved in Section 3 will yield the
local superlinear convergence of the iterates generated by Algorithm 1. The proof of Theorem 4.1
consists of three parts. First, we will show that Ak N converges to ∇x c(x ∗ )N , then we will show
that Bk N converges to ∇xx 2
L(x ∗ , λ∗ )N. In a third step, both results are combined to prove the
stated convergence of the approximate KKT matrix.
As a first step, we consider the approximations {Ak } of the Jacobians. Here, we will employ ideas
used in the proof of Lemma 1 in [7]. Then, it is possible to show the following result.
Lemma 4.1 Assume that the assumptions (AS1) and (AS3) hold. Let {(x k , λk )} be a sequence of
iterates generated by Algorithm 1 where Equation (14) holds for every iteration. Then
‖y^k − A_i s^k‖ ≤ (c_3/c_1) (2/c_1 + 1)^{i−k} η_{i,k} ‖s^k‖ (25)
for all i ≥ k + 1 where
Proof The proof of this lemma is by induction. For i = k + 1, one has y^k − A_i s^k = 0 because
of Equation (7). Now we assume that Equation (25) holds for all j = k + 1, . . . , i. Using
Equation (25) and the definition of ρ^k, we obtain

‖y^k − A_{i+1} s^k‖ ≤ ‖y^k − A_i s^k‖ + ( |(τ^i, s^k)| / |(τ^i, s^i)| ) ‖ρ^i‖
≤ (c_3/c_1) (2/c_1 + 1)^{i+1−k} η_{i+1,k} ‖s^k‖.
Optimization Methods & Software 543
Using the uniform linear independence of {s^k} and employing ideas already used for proving
Theorem 2 in [7], we can show the following convergence result for the matrices {A_k}:
Theorem 4.2 Assume that the assumptions (AS1), (AS3)–(AS6) hold. Let {x k } be a sequence
of iterates generated by Algorithm 1, where Equation (14) holds for every iteration. Then
which holds in particular for j = k_1, . . . , k_q. Using assumption (AS6), which guarantees the
existence of the inverse (S_N^k)^{-1} with ‖(S_N^k)^{-1}‖ ≤ 1/c_4, and Equation (30), it follows that

‖(A_{k+i+1} − ∇_x c(x∗)) N‖ ≤ (1/c_4) ‖(A_{k+i+1} − ∇_x c(x∗)) N S_N^k‖ ≤ c_5 ν_k

with

c_5 = (c_3/c_4) (2/c_1 + 1)^i (2/c_1 + 1) √q.
Since the assumption (AS4) implies that νk tends to zero, the assertion is proved.
Turning to the approximations {Bk } of the Hessians, we prove a result that is very similar to
Lemma 1 of [7].
Lemma 4.2 Assume that the assumption (AS1) and (AS2) hold. Let {(x k , λk )} be a sequence of
iterates generated by Algorithm 1 where Equation (13) holds for every iteration. Then
‖w^k − B_i s^k‖ ≤ (c_2/c_1) (2/c_1 + 1)^{i−k−2} η̃_{i,k} ‖s^k‖
with r^i = w^i − B_i s^i, the proof of Lemma 4.4 is identical to the corresponding proof in [7].
Theorem 4.3 Assume that the assumptions (AS1), (AS2), (AS4)–(AS7) hold. Let {(x k , λk )} be
a sequence of iterates generated by Algorithm 1, where Equation (13) holds for every iteration.
Then
one obtains

‖w^j − ∇²_{xx}L(x∗, λ∗) s^j‖ = ‖(H_j − ∇²_{xx}L(x∗, λ∗)) s^j‖ ≤ c_2 ν̃_k ‖s^j‖ (33)

for any k ≤ j ≤ k + m. Using Lemma 4.4 and Equation (32), we have for any such j that

‖w^j − B_{k+i+1} s^j‖ ≤ (2c_2/c_1) (2/c_1 + 1)^i ν̃_k ‖s^j‖. (34)
Combining Equations (33) and (34) and the triangle inequality yields, for any k ≤ j ≤ k + i,

‖(B_{k+i+1} − ∇²_{xx}L(x∗, λ∗)) s^j‖ / ‖s^j‖ ≤ ( (2c_2/c_1)(2/c_1 + 1)^i + c_2 ) ν̃_k, (35)
which holds in particular for j = k_1, . . . , k_q. Using assumption (AS6), which guarantees the
existence of the inverse (S_N^k)^{-1} with ‖(S_N^k)^{-1}‖ ≤ 1/c_4, and Equation (35), it follows that

‖(B_{k+i+1} − ∇²_{xx}L(x∗, λ∗)) N‖ ≤ (1/c_4) ‖(B_{k+i+1} − ∇²_{xx}L(x∗, λ∗)) N S_N^k‖ ≤ c_6 ν̃_k

with

c_6 = (c_2/c_4) (2/c_1 + 1)^i (2/c_1 + 1) √q.
Since the assumption (AS4) implies that ν̃k tends to zero, the assertion is proved.
Now, everything is prepared to prove the main result of this section that is restated for
completeness.
Theorem 4.4 Assume that the assumptions (AS1)–(AS7) hold, where {(x k , λk )} is a sequence
of iterates generated by Algorithm 1. Furthermore, let Equation (13) and (14) hold for every
iteration. Then Equation (24) holds, i.e. the following identity is valid:
lim_{k→∞} ‖ [ N^T B_k N, N^T A_k^T ; A_k N, 0 ] − [ N^T ∇²_{xx}L(x∗, λ∗) N, N^T ∇_x c(x∗) ; ∇_x c(x∗)^T N, 0 ] ‖ = 0.
Hence, under quite usual assumptions it can be shown that the matrices Mk converge to the
exact derivative information at the solution, i.e. M∗ . Summing up the results obtained in the last
two sections, one has now the following situation. Provided the assumptions of Theorem 3.3
hold, the iterates generated by Algorithm 1 converge linearly if the initial guess is contained in
a certain neighbourhood of the solution. If the assumptions (AS1)–(AS7) made at the begin-
ning of this section hold in addition, then we have that the approximations Ak and Bk converge
– on the null space of the active constraints – to the true derivative matrices at the solution, i.e. Equation (24) holds.
Theorem 4.5 (Q-superlinear convergence of the SQP algorithm) If the assumptions of Theo-
rem 3.3 hold, in particular, full steps tk = 1 are taken, if the assumptions (AS1)–(AS7) hold and
if the conditions (13) and (14) are fulfilled for every iteration, then the iterates x k , λk generated
by Algorithm 1 converge q-superlinearly to the stationary point x ∗ , λ∗ .
As far as we know, this is the first convergence result for an algorithm based only on quasi-Newton
updates of the Jacobian matrices for solving NLPs with inequality constraints. The performance of
the algorithm will be studied for several test problems in the next section. For an implementation
ensuring that the assumptions of Theorem 4.6 hold, we refer the reader to [14]. In that paper, a
variant of the STR1 update is proposed that guarantees the positive definiteness of the reduced
Hessian throughout the whole solution procedure.
5. Numerical tests
We present first numerical results for the proposed algorithm. For that purpose, we employ several
Hock–Schittkowski problems [17] contained in the CUTE test set [5] and a periodic optimal
control problem. Nonlinear inequalities are transformed into equality constraints by introducing
slack variables.
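The slack transformation can be illustrated generically: p nonlinear inequalities h(x) ≤ 0 become the equalities h(x) + s = 0 together with the linear inequalities −s ≤ 0 in the augmented variable z = (x, s), which matches the problem form (1). A sketch (the helper name is illustrative):

```python
import numpy as np

def add_slacks(h, n, p):
    """Reformulate h(x) <= 0 (p nonlinear inequalities, x in R^n) into the
    form c(z) = 0, G z <= b of (1), with z = (x, s) in R^{n+p}:

        c(z) := h(x) + s      (equality constraints)
        -s <= 0               (linear inequalities G z <= b)
    """
    c = lambda z: h(z[:n]) + z[n:]
    G = np.hstack([np.zeros((p, n)), -np.eye(p)])
    b = np.zeros(p)
    return c, G, b
```

A point x with h(x) ≤ 0 then corresponds to the feasible augmented point z = (x, −h(x)).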
The optimization algorithm is implemented as a C/C++ routine. To provide the exact derivative
information that is required, we employ ADOL-C [12] for the automatic differentiation of C and
C++ code. Additionally, we use the exact derivative matrices as initializations for A0 and B0 .
To measure progress, we use an l1 merit function with m + q weights, i.e. each constraint
has its own weight factor (alternatively, one weight for the norm of the constraints). These factors
are updated using a common heuristic due to Powell [21]; see also [20]. For the determination
of the step length we use the backtracking approach as proposed in [20]. Note that we examine
only the local superlinear convergence results. Therefore, the starting points of the test problems
are chosen such that the globalization strategy plays only a minor role. As termination criterion,
we use
with ε = 10^{-8}. To solve the reduced QPs (10) and (11), we apply the QP solver E04NAF from
the NAG library with the parameter values proposed by NAG.
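The globalization ingredients can be sketched generically; the following is a simplified stand-in (single fixed weights and a plain decrease test, rather than the Powell weight-update heuristic and the sufficient-decrease condition of [20]):

```python
import numpy as np

def l1_merit(f, c, x, weights):
    """phi(x) = f(x) + sum_i weights_i * |c_i(x)|, one weight per constraint."""
    return f(x) + weights @ np.abs(c(x))

def backtrack(phi, x, s, t0=1.0, beta=0.5, max_iter=30):
    """Halve the step length until the merit function decreases (simplified test)."""
    t = t0
    for _ in range(max_iter):
        if phi(x + t * s) < phi(x):
            return t
        t *= beta
    return t
```

Near the solution the full step t = 1 passes the test, which is essential for the q-superlinear rate of Theorem 4.5 to be observed in practice.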
As a first numerical test, we took several problems of the Hock–Schittkowski optimization suite
that are included in the CUTE test set with the original starting point given in the literature [17].
These examples contain all possible combinations of equality constraints, bounds as well as linear
and nonlinear inequality constraints. To analyse the convergence behaviour in more detail, we
distinguish five different problem classes.
As shown in Table 1, the algorithm converges for all selected Hock–Schittkowski problems of
the CUTE collection that belong to this problem class. The obtained solutions coincide with the
optimal values reported in the literature [17,22] except for problem hs025. For this problem,
the algorithm terminates at a completely different value without an error message. To illustrate
the superlinear convergence property that is proven in this paper, Figure 1 shows the convergence
behaviour of the algorithm. For this purpose, we include all test problems where the iteration count
exceeds three. For readability, we only include the last 50 iterates of the test problem hs038. As
can be seen, for all shown test problems, the algorithm converges superlinearly for the last iterates.
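The superlinear behaviour visible in these plots can also be checked numerically from the recorded errors ||xi − x*||: the successive contraction ratios should tend to zero. A tiny helper for this purpose (ours, for illustration only):

```python
def contraction_ratios(errors):
    """Ratios e_{k+1}/e_k of an error sequence e_k = ||x_k - x*||.

    Superlinear convergence means these ratios tend to zero, whereas
    plain linear convergence keeps them bounded away from zero.
    """
    return [e1 / e0 for e0, e1 in zip(errors, errors[1:]) if e0 > 0.0]
```
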
5.2.2 Class 2: problems with linear inequality constraints and bounds

For the second class of test problems, i.e. the Hock–Schittkowski problems contained in the CUTE collection with linear inequality constraints and bounds, the algorithm does not converge for the problems hs086 and hs205 because of difficulties with the solution of the quadratic subproblem.
Since we focus here on the superlinear convergence property, we do not examine these difficulties
in more detail. For all other test problems of this class, the iteration count is given in Table 2.
In these cases, the obtained solutions coincide with the optimal values reported in the literature
[17,22]. Once more, the convergence behaviour of the algorithm is illustrated in Figure 1 for all
test problems where the iteration count exceeds three. As can be seen, after a short startup phase,
the algorithm reaches the optimal solution within one step.
Figure 1. Convergence behaviour for the first two groups of test problems (error ||xi − x*|| versus iteration for hs001, hs002, hs005, hs038, hs110, hs024, hs037, hs044, hs105, hs268).

5.2.3 Class 3: problems with equality constraints and bounds

The third class of test cases contains the Hock–Schittkowski problems of the CUTE collection with equality constraints and bounds. Our optimization algorithm does not converge for problems hs063, hs099, hs107, and hs111, either because of difficulties with the solution of the quadratic subproblems or because of difficulties with the globalization strategy. Once more, since we only
examine local convergence properties, we do not examine these cases in detail. The required
iteration count to solve the test problems is given in Table 3. For the problems hs054 and hs062,
the algorithm terminates at a value that differs from the solution stated in the literature [17,22]
without an error message. All other obtained solutions coincide with the known optimal solutions.
Figure 2 shows the convergence behaviour of the proposed algorithm for the 10 test problems,
where only the last 40 iterates are shown for hs062.
5.2.4 Class 4: problems with equality constraints, linear inequality constraints, and bounds
This class contains only the three problems hs035mod, hs074, and hs075. For the first test case, the
algorithm yields within two iterations the known optimal solutions [17,22]. For the remaining two
problems, our optimization algorithm fails because of problems with the solution of the quadratic
subproblems.
Figure 2. Convergence behaviour for the third group of test problems (error ||xi − x*|| versus iteration for hs041, hs053, hs054, hs055, hs060, hs062, hs080, hs081, hs112, hs119).
5.2.5 Class 5: problems with nonlinear inequality constraints
Despite the very simple globalization strategy, our optimization algorithm converges for 27 out of 56 test problems to the optimal solution known from the literature [17,22]. For the remaining test problems, either the line search fails, a point with negative curvature is detected, or difficulties with the solution of the QP subproblems arise. Since enough test problems remain to analyse the convergence behaviour, we did not investigate the reasons for the failures in more detail. Table 4 states the names and the required iteration counts for the solved test problems. Figure 3 shows the convergence behaviour for the solved test problems. As can be seen, the convergence is indeed superlinear for all problems solved with a reasonable iteration count. For the five problems that require a very high number of iterations, the algorithm converges only linearly and with reduced accuracy. This behaviour
was either caused by errors in the solution of the quadratic subproblems or by a lack of positive definiteness.

Figure 3. Convergence behaviour for the solved test problems of this group (error ||xi − x*|| versus iteration).
5.3 A periodic optimal control problem
Periodic adsorption processes consist of vessels or beds packed with solid sorbent. The sorbent is in contact with a multi-component fluid feed so as to preferentially adsorb one of the chemical components onto the solid. Simulated moving bed (SMB) chromatography, which separates two dissolved species in a liquid phase, is one typical example of this class of periodic optimal control problems. Due to the periodicity of the considered system, described by the equality constraints of the optimization problem, the corresponding Jacobian is dense. That is, we face a major difference to
numerous PDE-constrained optimization problems where the Jacobian is sparse with a known
structure.
For our numerical tests, we use a highly simplified model of a specific SMB process, described in detail in [9,23]. This medium-scale problem with 127 variables comprises 122 equality constraints. Due to the periodic behaviour of the considered system, the equality constraint Jacobian is essentially dense. The objective is to maximize the feed throughput.
Note that each function evaluation necessitates a forward simulation of an ODE with 125
states. In the practical computations, we have chosen a Runge–Kutta integrator of order four
with a fixed number of steps, namely 100, on the non-fixed time interval [0, T]. If the problem were attacked by a full discretization approach such as collocation, it would have a size of at least 100 times 125 times the number of collocation points per interval (say at least two). With more than 25,000 variables, it must thus be considered a large-scale problem. The shooting
formulation reduces the number of variables to only 127, and the Jacobian factorization within
the TR1 update scheme allows us to reduce the number of free variables further, namely to only
five independents. Thus, the reduced QP in each iteration has only five degrees of freedom and
18 inequality constraints, and costs a negligible fraction of the computing time needed by each
iteration.
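For illustration, a fixed-step classical Runge–Kutta integrator of order four, as used for the forward simulation here, can be sketched as follows (a generic textbook RK4, not the authors' integrator):

```python
def rk4(f, x0, t0, t1, n=100):
    """Classical 4th-order Runge-Kutta with n fixed steps on [t0, t1].

    f(t, x) returns the list dx/dt; x0 is the initial state (a list).
    """
    h = (t1 - t0) / n
    t, x = t0, list(x0)
    for _ in range(n):
        k1 = f(t, x)
        k2 = f(t + 0.5 * h, [xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = f(t + 0.5 * h, [xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x
```

With 100 steps the method's O(h⁴) global error is more than adequate for the tolerances used in these tests.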
It is exactly this type of simulation-based optimization problem with many ‘hidden variables’
causing expensive function evaluations, for which our novel adjoint-based SQP algorithm with
Jacobian updates is designed. For the same initialization strategy of the matrices A0 and B0 as
above and the same termination criterion, our proposed optimization algorithm converges in 230
iterations. The convergence behaviour is illustrated in Figure 4. Even for this application-oriented
test problem one can observe superlinear convergence in the last iterations.
Figure 4. Convergence behaviour for the SMB test problem (error ||xi − x*|| versus iteration).
6. Conclusions
We have presented an SQP type algorithm for the solution of nonlinear programming prob-
lems with nonlinear equality and linear inequality constraints, which is based on quasi-Newton
approximations of both the Hessian and the constraint Jacobian within each QP subproblem. The
Hessian update uses the well-known SR1 formula, whereas the constraint Jacobian is updated
according to the recently proposed TR1 formula [13], which can be shown to be affine invariant.
Both matrices are updated in a factorized form that directly allows us to set up a reduced QP in
the null space of the linear constraints. The dimension of the QP equals the number of degrees of
freedom. It is solved by a standard convex QP solver. In addition to the QP solution, one constraint
residual and two Lagrange gradients need to be evaluated in each SQP iteration, the latter of which
can efficiently be performed by the reverse mode of automatic differentiation. We have shown
that the SQP algorithm is able to detect the correct active set in the QP solutions, despite the inexactness of the constraint Jacobians. Furthermore, superlinear convergence of the procedure can be shown under quite common assumptions, together with the particular assumption that the search directions uniformly span the null space of the active inequality constraints.
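For reference, the SR1 Hessian update mentioned above can be sketched as follows (a dense, unfactorized textbook version with the standard skip safeguard; the paper's STR1 procedure instead maintains factorized forms):

```python
def sr1_update(B, s, y, tol=1e-8):
    """One SR1 update of a dense symmetric matrix B (list of lists).

    B <- B + r r^T / (r^T s) with r = y - B s, where s is the step and
    y the difference of Lagrangian gradients. The update is skipped
    (B returned unchanged) when the denominator is too small relative
    to ||r|| ||s||, which is the standard SR1 safeguard.
    """
    n = len(B)
    r = [y[i] - sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    denom = sum(ri * si for ri, si in zip(r, s))
    norm_r = sum(ri * ri for ri in r) ** 0.5
    norm_s = sum(si * si for si in s) ** 0.5
    if abs(denom) <= tol * norm_r * norm_s:
        return B
    return [[B[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]
```

After an accepted update, B satisfies the secant condition B s = y by construction.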
First numerical results with the Hock–Schittkowski test set suggest good practical performance
of the algorithm, which is able to solve 86% of the considered 124 test problems. The application
to a nonlinear periodic optimal control problem, where the equality constraints result in a dense
Jacobian, shows that the algorithm is able to treat medium-scale problems well.
The considered problem class allows only linear inequality constraints. A natural extension would be to apply an update procedure similar to TR1 also to the Jacobian of the inequality constraints, in order to avoid the additional slack variables that are needed for nonlinear inequalities. Furthermore, the
STR1 update as proposed in this paper does not exploit any structure that is present in a specific
optimization problem. For that reason, future work may be dedicated to modified STR1 update
procedures that respect block structure in Jacobian and Hessian, e.g. by high-rank block updates.
This approach could be coupled with appropriate adjoint solvers in ODE and PDE optimal control
applications. Finally, the performance of the algorithm would presumably benefit from a more
sophisticated globalization strategy.
Acknowledgements
We thank the anonymous referees for their valuable comments, which helped us to improve the quality of the paper.
The first author also gratefully acknowledges support by the Research Council KUL: CoE EF/05/006 Optimization in Engineering (OPTEC), GOA AMBioRICS, IOF-SCORES4CHEM, PhD/postdoc and fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0452.04, G.0499.04, G.0211.05, G.0226.06, G.0321.06, G.0302.07, G.0320.08, G.0558.08, G.0557.08, research communities (ICCoS, ANMMM, MLDM); IWT: PhD grants, McKnow-E, Eureka-Flite; Helmholtz: viCERP; EU: ERNSI, HD-MPC; Contracts: AMINAL; Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007–2011).
References
[1] L.T. Biegler, O. Ghattas, M. Heinkenschloss, and B.V. Bloemen Waanders (eds.), Large-Scale PDE-Constrained
Optimization, Lecture Notes in Computational Science and Engineering Vol. 30, Springer, Berlin, 2003.
[2] H.G. Bock, Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, Bonner Mathematische Schriften Vol. 183, University of Bonn, Bonn, 1987.
[3] H.G. Bock, M. Diehl, E.A. Kostina, and J.P. Schlöder, Constrained optimal feedback control of systems governed
by large differential algebraic equations, in Real-Time and Online PDE-Constrained Optimization, L. Biegler,
O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloemen Waanders, eds., SIAM, Philadelphia, pp. 3–22,
2007.
[4] H.G. Bock, W. Egartner, W. Kappis, and V. Schulz, Practical shape optimization for turbine and compressor blades
by the use of PRSQP methods, Optim. Eng. 3(4) (2002), pp. 395–414.
[5] I. Bongartz, A.R. Conn, N. Gould, and P. Toint, CUTE: Constrained and unconstrained testing environment, ACM
Trans. Math. Software 21 (1995), pp. 123–160.
[6] R. Byrd, J. Gilbert, and J. Nocedal, A trust region method based on interior point techniques for nonlinear
programming, Math. Program. 89A (2000), pp. 149–185.
[7] A. Conn, N. Gould, and P. Toint, Convergence of quasi-Newton matrices generated by the symmetric rank one update,
Math. Program. 50A (1991), pp. 177–195.
[8] J.E. Dennis and J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev. 19(1) (1977), pp. 46–89.
[9] M. Diehl and A. Walther, A test problem for periodic optimal control algorithms, Tech. Rep. MATH-WR-01-2006,
Technische Universität Dresden, Dresden, 2006.
[10] H.J. Ferreau, H.G. Bock, and M. Diehl, An online active set strategy to overcome the limitations of explicit MPC,
Internat. J. Robust Nonlinear Control 18(8) (2008), pp. 816–830.
[11] A. Griewank, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Frontiers in Applied
Mathematics Vol. 19, SIAM, Philadelphia, 2000.
[12] A. Griewank, D. Juedes, and J. Utke, ADOL-C: A package for the automatic differentiation of algorithms written in
C/C++, ACM Trans. Math. Software 22 (1996), pp. 131–167.
[13] A. Griewank and A. Walther, On constrained optimization by adjoint-based quasi-Newton methods, Optim. Methods