School of Computer Science and Applied Mathematics
7.1 Introduction
We consider problems of minimizing an objective function subject to constraints on the variables. A general
formulation for these problems is
min_x f(x)   subject to   ci(x) = 0, i ∈ E,   ci(x) ≥ 0, i ∈ I,   (7.1)
where f and ci are continuously differentiable, real-valued functions on a subset of Rn , and E and I are
finite sets of equality and inequality indices, respectively. A point x is called a feasible point if it satisfies
all constraints simultaneously. Thus, we look for a feasible point with the smallest objective function
value.
The goal of this lecture is to establish optimality conditions for constrained optimization problems. The
optimality conditions for nonlinear constrained problems are important because they form the basis for
algorithms for solving such problems.
To introduce the basic principles behind the characterization of solutions of constrained optimization
problems, we work through two simple examples. The ideas discussed in the examples will be made rigorous
to define the first order optimality conditions for constrained optimization. We start by noting one item of
terminology that recurs throughout this document: At a feasible point x, the inequality constraint i ∈ I
is said to be active if ci (x) = 0 and inactive if the strict inequality ci (x) > 0 is satisfied.
7.2 Two simple examples

7.2.1 Equality constraint

The first example is a two-variable problem with a single equality constraint:

min_x x1 + x2   subject to   x1² + x2² − 2 = 0.   (7.2)

In the language of (7.1), we have f(x) := x1 + x2 and c1(x) = x1² + x2² − 2; therefore I = ∅ and E = {1}.
We can see by inspection that the feasible set for this problem is the circle of radius √2 centred at the origin − just the boundary of this circle, not its interior. Obviously, x∗ = (−1, −1) is the optimal solution.
From any other point on the circle, it is easy to find a way to move that stays feasible (i.e. remains on the circle) while decreasing f. For instance, from the point x = (√2, 0), any move in the clockwise direction around the circle has the desired effect.
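This observation can be checked numerically. A minimal sketch (the parametrisation of the circle by an angle t is an illustration, not part of the original text):

```python
import numpy as np

# Parametrise the feasible circle x1^2 + x2^2 = 2 as x(t) = sqrt(2)*(cos t, sin t),
# so the objective f(x) = x1 + x2 becomes a function of the angle t alone.
def f(t):
    return np.sqrt(2) * (np.cos(t) + np.sin(t))

# Start at x = (sqrt(2), 0), i.e. t = 0; a small clockwise move decreases t.
eps = 1e-2
print(f(0.0), f(-eps))        # the clockwise move decreases f

# The minimum over the circle is at t = 5*pi/4, i.e. x = (-1, -1), where f = -2.
print(f(5 * np.pi / 4))
```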
Notice that at the optimal solution x∗, the constraint normal ∇c1(x∗) is parallel to ∇f(x∗). That is, there is a scalar λ∗1 such that

∇f(x∗) = λ∗1 ∇c1(x∗).   (7.3)

In this example, λ∗1 = −1/2.
To see why such a relation must hold, note that by Taylor's theorem c1(x + d) ≈ c1(x) + ∇c1(x)^T d for a small step d. Hence, the direction d retains feasibility with respect to the equality constraint c1, to first order, when it satisfies

∇c1(x)^T d = 0.   (7.5)
On the other hand, a direction of improvement must produce a decrease in the objective function f, so that, to first order,

∇f(x)^T d < 0.   (7.6)
Hence, if there exists a direction d that satisfies both equation (7.5) and inequality (7.6), we conclude that improvement on the current point x is possible. It follows that a necessary condition for optimality for problem (7.2) is that there exist no direction d satisfying both (7.5) and (7.6) simultaneously. The only way such a direction can fail to exist is when ∇f(x) and ∇c1(x) are parallel, that is, when the condition ∇f(x) = λ1∇c1(x) holds at x for some scalar λ1.
Hence, we introduce the Lagrangian function

L(x, λ1) = f(x) − λ1 c1(x).   (7.7)
Since ∇x L(x, λ1) = ∇f(x) − λ1∇c1(x), we can state the optimality condition (7.3) equivalently as follows: at the optimal point x∗, there is a scalar λ∗1 such that

∇x L(x∗, λ∗1) = 0.   (7.8)
This observation suggests that we can search for solutions of the equality constrained problem (7.2) by searching for stationary points of the Lagrangian function. The scalar quantity λ1 is called the Lagrange multiplier for the constraint c1(x) = 0.
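The stationarity condition (7.8) for problem (7.2) is easy to verify numerically; a minimal sketch:

```python
import numpy as np

# Problem (7.2): f(x) = x1 + x2, c1(x) = x1^2 + x2^2 - 2 = 0.
def grad_L(x, lam):
    # gradient of the Lagrangian L(x, lam) = f(x) - lam * c1(x)
    grad_f = np.array([1.0, 1.0])
    grad_c1 = np.array([2 * x[0], 2 * x[1]])
    return grad_f - lam * grad_c1

x_star, lam_star = np.array([-1.0, -1.0]), -0.5
print(grad_L(x_star, lam_star))            # -> [0. 0.] at the optimal pair

# An arbitrary other point with the same multiplier is not stationary:
print(grad_L(np.array([1.0, 1.0]), -0.5))  # nonzero
```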
Though the condition (7.3) (equivalently (7.8)) appears to be necessary for an optimal solution of problem (7.2), it is clearly not sufficient. For instance, in problem (7.2), the condition (7.3) is satisfied at the point x = (1, 1) with λ1 = 1/2. This point is obviously not a solution − in fact, it maximizes the objective function f on the circle. Moreover, in the case of equality constrained problems, we cannot turn condition (7.3) into a sufficient condition simply by placing some restriction on the sign of λ1. To see this, consider replacing the constraint x1² + x2² − 2 = 0 by its negative 2 − x1² − x2² = 0 in problem (7.2). The solution of the problem is not affected, but the value of λ1 that satisfies the condition (7.3) changes from λ1 = −1/2 to λ1 = 1/2.
7.2.2 Inequality constraint
The second example is a two-variable problem with a single inequality constraint, a slight modification of problem (7.2) in which the equality constraint is replaced by an inequality. We consider the problem

min_x x1 + x2   subject to   2 − x1² − x2² ≥ 0,   (7.9)

for which the feasible region is the circle and its interior. Here the objective function is f(x) = x1 + x2 and c1(x) = 2 − x1² − x2²; therefore E = ∅ and I = {1}.
Note that the constraint normal ∇c1(x) points toward the interior of the feasible region at each point on the boundary of the circle. By inspection, we see that the optimal solution is still x∗ = (−1, −1), and the condition (7.3) holds with Lagrange multiplier λ∗1 = 1/2. However, this inequality constrained problem differs from the equality constrained problem (7.2) in that the sign of the Lagrange multiplier plays a significant role, as we now argue.
As before, we conjecture that a given feasible point x is not optimal if we can find a step d that retains
feasibility and decreases the objective function f to first order. The main difference between equality
constrained problem (7.2) and inequality constrained problem (7.9) comes in the handling of the feasibility
condition. As in (7.6), the direction d improves the objective function, to first order, if ∇f (x)T d < 0.
Meanwhile, for the inequality constraint, the direction d retains feasibility, to first order, if

c1(x) + ∇c1(x)^T d ≥ 0.   (7.10)
In determining whether a direction d exists that satisfies both conditions (7.6) and (7.10), we consider the
following two cases.
• Case I: Consider first the case in which x lies strictly inside the circle, so that the strict inequality c1(x) > 0 holds. In this case, any vector d satisfies the condition (7.10), provided that its length is sufficiently small. In particular, the problem behaves locally like an unconstrained optimization problem: whenever ∇f(x) ≠ 0, we can obtain a direction d that satisfies both (7.6) and (7.10) simultaneously by setting

d = −c1(x) ∇f(x) / ‖∇f(x)‖.

The only situation in which such a direction fails to exist is when

∇f(x) = 0.   (7.11)
• Case II: Now consider the case in which x lies on the boundary of the circle, so that c1(x) = 0. The conditions (7.6) and (7.10) therefore become

∇f(x)^T d < 0,   ∇c1(x)^T d ≥ 0.

The first of these conditions defines an open half-space, while the second defines a closed half-space. We can easily see by inspection that the two regions fail to intersect only when ∇f(x) and ∇c1(x) point in the same direction, that is, when

∇f(x) = λ1∇c1(x), for some λ1 ≥ 0.   (7.12)

Note that the sign of the multiplier is significant here. If condition (7.3) were satisfied with a negative value of λ1, then ∇f(x) and ∇c1(x) would point in opposite directions, and the set of directions that satisfy both (7.6) and (7.10) would make up an entire open half-plane.
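The Case I construction can be checked numerically; a minimal sketch (the interior test point is chosen for illustration):

```python
import numpy as np

# Inequality problem (7.9): f(x) = x1 + x2, c1(x) = 2 - x1^2 - x2^2 >= 0.
f_grad = lambda x: np.array([1.0, 1.0])
c1 = lambda x: 2 - x[0]**2 - x[1]**2
c1_grad = lambda x: np.array([-2 * x[0], -2 * x[1]])

x = np.array([0.5, 0.0])            # strictly inside the circle: c1(x) > 0
g = f_grad(x)
d = -c1(x) * g / np.linalg.norm(g)  # the Case I direction

descent = g @ d                             # condition (7.6): negative
first_order_feas = c1(x) + c1_grad(x) @ d   # condition (7.10): nonnegative
print(descent, first_order_feas)
```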
As in the equality constrained case, the optimality conditions for both Case I and Case II can be summarized neatly with reference to the Lagrangian function. When no first-order feasible descent direction exists at some point x∗, we have that

∇x L(x∗, λ∗1) = 0, for some λ∗1 ≥ 0,   (7.13)

where we also require that

λ∗1 c1(x∗) = 0.   (7.14)

Condition (7.14) is known as a complementarity condition; it implies that the Lagrange multiplier λ∗1 can be strictly positive only when the corresponding constraint c1 is active. Conditions of this type play a central role in constrained optimization. In Case I, we have c1(x∗) > 0, so (7.14) requires λ∗1 = 0 and (7.13) reduces to ∇f(x∗) = 0, as required by (7.11). In Case II, condition (7.14) allows λ∗1 to take on a nonnegative value, so (7.13) becomes equivalent to (7.12).
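Conditions (7.13) and (7.14) can be verified at x∗ = (−1, −1) with λ∗1 = 1/2 for the inequality problem; a minimal sketch:

```python
import numpy as np

# Inequality problem (7.9): f(x) = x1 + x2, c1(x) = 2 - x1^2 - x2^2 >= 0.
c1 = lambda x: 2 - x[0]**2 - x[1]**2

def grad_L(x, lam):
    # gradient of L(x, lam) = f(x) - lam * c1(x)
    return np.array([1.0, 1.0]) - lam * np.array([-2 * x[0], -2 * x[1]])

x_star, lam_star = np.array([-1.0, -1.0]), 0.5

stationarity = grad_L(x_star, lam_star)   # (7.13): zero, with lam_star >= 0
complementarity = lam_star * c1(x_star)   # (7.14): zero, since c1 is active
print(stationarity, complementarity)
```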
The above discussion suggests that a number of conditions are important in the characterization of solutions for the constrained problem (7.1). These include the relation ∇x L(x, λ) = 0, the nonnegativity of λi for all inequality constraints ci, and the complementarity condition λi ci(x) = 0 that is required for all the inequality constraints. We now generalize the observations made above to state the first-order conditions in a rigorous fashion.
In general, the Lagrangian for the constrained optimization problem (7.1) is defined as

L(x, λ) = f(x) − Σ_{i∈E∪I} λi ci(x).   (7.15)
Note that every constraint ci has its associated Lagrange multiplier λi . We need the following definitions
to state the first-order optimality conditions.
Definition 7.1 (Active set). The active set A(x) at any feasible point x is the union of the equality index set E with the indices of the active inequality constraints; that is,

A(x) = E ∪ {i ∈ I | ci(x) = 0}.   (7.16)
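Definition 7.1 translates directly into code; a minimal sketch (the helper name and the tolerance are illustrative):

```python
import numpy as np

def active_set(x, eq_cons, ineq_cons, tol=1e-8):
    """Return the indices of all equality constraints plus the active
    inequality constraints, as in (7.16)."""
    active = list(range(len(eq_cons)))                 # E: always active
    for j, c in enumerate(ineq_cons, start=len(eq_cons)):
        if abs(c(x)) <= tol:                           # c_i(x) = 0: active
            active.append(j)
    return set(active)

# Inequality example: c1(x) = 2 - x1^2 - x2^2 >= 0, so E = {} and I = {0}.
c1 = lambda x: 2 - x[0]**2 - x[1]**2
print(active_set(np.array([-1.0, -1.0]), [], [c1]))  # boundary point: {0}
print(active_set(np.array([0.0, 0.0]), [], [c1]))    # interior point: set()
```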
Next, we need to pay more attention to the properties of the constraint gradients. The vector ∇ci(x) is often called the normal to the constraint ci at the point x, because it is usually a vector that is perpendicular to the contours of ci at x. In the case of an inequality constraint, it points toward the feasible side of the constraint. It is possible, however, that ∇ci(x) vanishes due to the algebraic representation of ci, so that the term λi∇ci(x) vanishes for all values of λi and does not play a role in the Lagrangian gradient ∇x L(x, λ).
For instance, if we replace the constraint in problem (7.2) by the equivalent condition

c1(x) = (x1² + x2² − 2)² = 0,

then we have the gradient

∇c1(x) = ( 4(x1² + x2² − 2)x1,  4(x1² + x2² − 2)x2 )^T.
It is obvious that ∇c1(x) = 0 at all feasible points, so the condition ∇f(x) = λ1∇c1(x) no longer holds at the optimal point x∗ = (−1, −1). We usually make an assumption called a constraint qualification to ensure that such degenerate behaviour does not occur at the value of x in question. One such constraint qualification is defined below.
Definition 7.2 (LICQ). Given the point x∗ and the active set A(x∗) defined by (7.16), we say that the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients {∇ci(x∗), i ∈ A(x∗)} is linearly independent.
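LICQ can be tested numerically by stacking the active constraint gradients and checking the matrix rank; a minimal sketch using the squared-constraint example above:

```python
import numpy as np

def licq_holds(grads):
    """True if the given active constraint gradients are linearly independent."""
    if not grads:
        return True                      # no active constraints: LICQ holds trivially
    A = np.vstack(grads)
    return np.linalg.matrix_rank(A) == len(grads)

x = np.array([-1.0, -1.0])
# Original constraint c1(x) = x1^2 + x2^2 - 2: gradient (2x1, 2x2).
g_orig = np.array([2 * x[0], 2 * x[1]])
# Squared form c1(x) = (x1^2 + x2^2 - 2)^2: its gradient vanishes on the circle.
r = x[0]**2 + x[1]**2 - 2
g_sq = np.array([4 * r * x[0], 4 * r * x[1]])

print(licq_holds([g_orig]))  # True
print(licq_holds([g_sq]))    # False: the zero gradient is not independent
```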
Note that if LICQ holds, none of the active constraint gradients can be zero. This condition allows us to state the following optimality conditions for the general nonlinear programming problem (7.1). These conditions provide the foundation for many of the algorithms designed for problem (7.1). They are called first-order conditions because they concern themselves with properties of the gradients of the objective and constraint functions.
Theorem 7.1 (First Order Necessary Conditions under LICQ). Suppose that x∗ is a local solution of (7.1)
and the LICQ holds at x∗ . Then there is a Lagrange multiplier λ∗ with components λ∗i , i ∈ E ∪ I, such
that the following conditions are satisfied at (x∗ , λ∗ )
∇x L(x∗, λ∗) = 0,   (7.17)
ci(x∗) = 0, for all i ∈ E,   (7.18)
ci(x∗) ≥ 0, for all i ∈ I,   (7.19)
λ∗i ≥ 0, for all i ∈ I,   (7.20)
λ∗i ci(x∗) = 0, for all i ∈ E ∪ I.   (7.21)
Here, L(x, λ) is the Lagrangian function of the constrained problem, defined in (7.15) as L(x, λ) = f(x) − Σ_{i∈E∪I} λi ci(x).
The conditions (7.17)-(7.21) are often known as the Karush-Kuhn-Tucker conditions, or simply the KKT conditions for short. They are necessary conditions for an optimum of a constrained problem. For the given problem (7.1) and solution point x∗, there may be many vectors λ∗ for which the conditions (7.17)-(7.21) are satisfied. When the LICQ holds, however, the optimal λ∗ is unique.
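The conditions of Theorem 7.1 can be gathered into a single numerical check; a minimal sketch (the function and the constraint encoding are illustrative, not a standard API):

```python
import numpy as np

def kkt_satisfied(grad_f, cons, x, lam, tol=1e-8):
    """Check the first-order conditions (7.17)-(7.21); each constraint is a
    (c, grad_c, kind) triple with kind either 'eq' or 'ineq'."""
    gL = grad_f(x) - sum(l * gc(x) for l, (c, gc, kind) in zip(lam, cons))
    if not np.allclose(gL, 0.0, atol=tol):               # (7.17) stationarity
        return False
    for l, (c, gc, kind) in zip(lam, cons):
        v = c(x)
        if kind == 'eq' and abs(v) > tol:                # (7.18) feasibility
            return False
        if kind == 'ineq' and (v < -tol or l < -tol):    # (7.19), (7.20)
            return False
        if abs(l * v) > tol:                             # (7.21) complementarity
            return False
    return True

# Problem (7.2): one equality constraint.
grad_f = lambda x: np.array([1.0, 1.0])
c1 = lambda x: x[0]**2 + x[1]**2 - 2
gc1 = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

print(kkt_satisfied(grad_f, [(c1, gc1, 'eq')], np.array([-1.0, -1.0]), [-0.5]))  # True
print(kkt_satisfied(grad_f, [(c1, gc1, 'eq')], np.array([0.0, 0.0]), [0.0]))     # False
```

Note that the check only encodes first-order information; as the text explains, points such as the maximizer can also pass it.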
As in the unconstrained case, the first order conditions are not sufficient to guarantee a local minimum. For this, we turn to the second order conditions. The first order conditions − the KKT conditions − tell us how the first derivatives of f and the active constraints ci are related at the optimal solution x∗. For the second order conditions, we are concerned with the behaviour of the Hessian of the Lagrangian, denoted ∇²xx L(x, λ), at points where the KKT conditions hold. In particular, we look for positive definiteness on a subspace defined by the linearized constraints. We define this subspace by F:
Definition 7.3 (Feasible space). Given a point x∗, the active constraint set A(x∗), and a Lagrange multiplier vector λ∗, the set F(λ∗) is defined by

w ∈ F(λ∗)  ⟺   ∇ci(x∗)^T w = 0, for all i ∈ E,
               ∇ci(x∗)^T w = 0, for all i ∈ A(x∗) ∩ I with λ∗i > 0,   (7.22)
               ∇ci(x∗)^T w ≥ 0, for all i ∈ A(x∗) ∩ I with λ∗i = 0.
The set F(λ∗) contains the directions w that tend to adhere to the active inequality constraints for which the Lagrange multiplier component λ∗i is positive, as well as to the equality constraints. The first theorem gives a necessary condition involving the second derivatives: if the point x∗ is a local solution, then the curvature of the Lagrangian along directions in the feasible space F(λ∗) must be nonnegative.
Theorem 7.2 (Second Order Necessary Conditions under LICQ). Suppose that x∗ is a local solution of problem (7.1) and that the LICQ condition is satisfied. Let λ∗ be a Lagrange multiplier vector such that the KKT conditions (7.17)-(7.21) are satisfied, and let F(λ∗) be defined as above. Then

w^T ∇²xx L(x∗, λ∗) w ≥ 0, for all w ∈ F(λ∗).   (7.23)
The second theorem gives a sufficient condition involving the second derivatives: if the KKT conditions hold at x∗ and the curvature of the Lagrangian along directions in F(λ∗) is strictly positive, then x∗ is a strict local solution. Note that the constraint qualification is not required.

Theorem 7.3 (Second Order Sufficient Conditions). Suppose that for some feasible point x∗ there is a Lagrange multiplier vector λ∗ such that the KKT conditions (7.17)-(7.21) are satisfied. Suppose also that

w^T ∇²xx L(x∗, λ∗) w > 0, for all w ∈ F(λ∗), w ≠ 0.   (7.24)

Then x∗ is a strict local solution of problem (7.1).

We illustrate the second order conditions on the equality constrained problem (7.2). The Lagrangian is

L(x, λ1) = x1 + x2 − λ1(x1² + x2² − 2),

so that ∇²xx L(x, λ1) = −2λ1 I.

• For λ1 = 1/2, with the corresponding point x = (1, 1)^T, we have ∇²xx L = −I, so that w^T ∇²xx L w < 0 for every nonzero w. Hence the Hessian is negative definite in the set F(λ1 = 1/2). Therefore the KKT point x = (1, 1) with λ1 = 1/2 is not a minimizer of the problem.
• Similarly, for λ1 = −1/2, with the corresponding point x = (−1, −1)^T, we have

∇c1(x) = (2x1, 2x2)^T |_{x=(−1,−1)} = (−2, −2)^T  ⟹  ∇c1(x∗)^T w = −2w1 − 2w2 = 0  ⟹  w2 = −w1.

In this case ∇²xx L = I, so that w^T ∇²xx L w = w1² + w2² > 0 for every nonzero w. Hence the Hessian is positive definite in the set F(λ1 = −1/2). Therefore the KKT point x = (−1, −1) with λ1 = −1/2 is a strict minimizer of the problem.
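The curvature computation in the two cases can be verified numerically; a minimal sketch (in R², the null space of the single active gradient is spanned by its 90° rotation):

```python
import numpy as np

# Problem (7.2): the Hessian of the Lagrangian is -2*lam*I (constant in x).
def curvature_on_F(x, lam):
    H = -2 * lam * np.eye(2)
    g = np.array([2 * x[0], 2 * x[1]])   # active constraint gradient at x
    w = np.array([-g[1], g[0]])          # spans the null space {w : g^T w = 0}
    return w @ H @ w

print(curvature_on_F(np.array([1.0, 1.0]), 0.5))    # negative: not a minimizer
print(curvature_on_F(np.array([-1.0, -1.0]), -0.5)) # positive: strict minimizer
```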
As a final example, consider the problem

min_x −0.1(x1 − 4)² + x2²   subject to   x1² + x2² − 1 ≥ 0,   (7.26)

in which we seek to minimise a non-convex function over the exterior of the unit circle. The objective function is not bounded below in the feasible region, so no global solution exists, but it may still be possible to identify a strict local solution on the boundary of the constraint. We search for such a solution by using the KKT conditions (7.17)-(7.21) and the second order conditions of Theorem 7.3.
By defining the Lagrangian for (7.26) in the usual way, it is easy to verify that

∇x L(x, λ) = ( −0.2(x1 − 4) − 2λ1 x1,  2x2 − 2λ1 x2 )^T,   (7.27)

∇²xx L(x, λ) = [ −0.2 − 2λ1   0 ;  0   2 − 2λ1 ].   (7.28)
The point x∗ = (1, 0)^T satisfies the KKT conditions with λ∗1 = 0.3 and the active set A(x∗) = {1}. To check that the second order sufficient conditions are satisfied at this point, we note that

∇c1(x∗) = (2, 0)^T,

so that, since λ∗1 > 0, the set F(λ∗) consists of the vectors w = (0, w2)^T.
Now, by substituting x∗ and λ∗ into (7.28), we have for any w ∈ F(λ∗) with w ≠ 0 that

w^T ∇²xx L(x∗, λ∗) w = (0, w2) [ −0.8  0 ;  0  1.4 ] (0, w2)^T = 1.4w2² > 0.
Hence, the second order sufficient conditions are satisfied, and we conclude from Theorem 7.3 that x∗ =
(1, 0) is a strict local minimizer for problem (7.26).
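The numbers in this example can be reproduced directly; a minimal sketch (assuming the objective f(x) = −0.1(x1 − 4)² + x2² and constraint x1² + x2² − 1 ≥ 0, consistent with the Lagrangian gradient (7.27)):

```python
import numpy as np

# Assumed data, consistent with (7.27):
# f(x) = -0.1*(x1 - 4)^2 + x2^2,  c1(x) = x1^2 + x2^2 - 1 >= 0.
grad_f = lambda x: np.array([-0.2 * (x[0] - 4), 2 * x[1]])
grad_c1 = lambda x: np.array([2 * x[0], 2 * x[1]])

x_star, lam_star = np.array([1.0, 0.0]), 0.3

# KKT stationarity (7.17):
gL = grad_f(x_star) - lam_star * grad_c1(x_star)
print(gL)                                # -> [0. 0.]

# Second order sufficiency on F = {(0, w2)}, using the Hessian (7.28):
H = np.diag([-0.2 - 2 * lam_star, 2 - 2 * lam_star])
w = np.array([0.0, 1.0])
print(w @ H @ w)                         # -> 1.4 > 0
```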
PROBLEM SET VI
1. Consider the following inequality constrained optimization problem:

min_x (x1 − 3/2)² + (x2 − 1/2)⁴
subject to: 1 − x1 − x2 ≥ 0
x1 − x2 − 1 ≤ 0
x2 − x1 − 1 ≤ 0
1 + x1 + x2 ≥ 0
4x22 + x21 ≤ 4
(x1 − 2)2 + x22 ≤ 5
x1 ≥ 0, x2 ≥ 0
Show that the LICQ is not satisfied at the feasible point x∗ = [0, 1]^T.
3. Use the First and Second order optimality conditions to solve the following constrained optimization
problems.
(a)
min x1
s.t: x2 ≥ 0,
(x1 − 1)2 + x22 − 1 ≤ 0.
(b)
4. Consider the problem of finding a point on the parabola y = (1/5)(x − 1)² that is closest to the point (x, y) = (1, 2), in the Euclidean norm sense.
(a) Formulate this problem as a constrained optimization problem.
(b) Find the KKT points for this problem. Is the LICQ satisfied?
(c) Which points are solutions?
(d) By directly substituting the constraint into the objective function and eliminating the variable
x, we obtain an unconstrained optimization problem. Show that the solution of this problem
cannot be the solution of the original problem.
5. Find the maxima of f(x) = x1x2 over the unit disk defined by the inequality constraint x1² + x2² ≤ 1.
6. Solve the following constrained optimization problem