LECTURE NOTES
INTERIOR POINT
POLYNOMIAL TIME METHODS
IN CONVEX PROGRAMMING
ISYE 8813
Arkadi Nemirovski
On sabbatical leave from Technion – Israel Institute of Technology
Contents:
Introduction: what the course is about
Developing Tools, I: self-concordant functions, self-concordant barriers and the Newton method
Interior Point Polynomial methods, I: the path-following scheme
Developing Tools, II: Conic Duality
Interior Point Polynomial methods, II: the potential reduction scheme
Developing Tools, III: how to construct self-concordant barriers
Applications:
Linear and Quadratic Programming
Quadratically Constrained Quadratic Problems
Geometrical Programming
Semidefinite Programming
About Exercises
2 Self-concordant functions
2.1 Examples and elementary combination rules
2.2 Properties of self-concordant functions
2.3 Exercises: Around Symmetric Forms
3 Self-concordant barriers
3.1 Definition, examples and combination rules
3.2 Properties of self-concordant barriers
3.3 Exercises: Self-concordant barriers
What we are about to study this semester are the theory and the applications of interior point polynomial time methods in Convex Programming. Today, in this introductory lecture, I am not going to prove theorems or present algorithms. My goal is to explain what the course is about, what interior point methods are, and why so many researchers and practitioners are now deeply involved in this new area.
CHAPTER 1. INTRODUCTION TO THE COURSE
specifying the problem instance. The new method of Karmarkar possessed the complexity bound of O(m^{3/2} n^2 L) operations. In the case, standard for complexity analysis, of more or less "square" problems m = O(n), the former estimate becomes O(n^4 L), the latter O(n^{3.5} L). Thus, there was some progress in complexity. And it can be said for sure that neither this moderate progress, nor the remarkable elegance of the new algorithm, could by itself have caused a revolution in Optimization. What indeed was a sensation, what inspired extremely intensive activity in the new area and in a few years resulted in significant theoretical and computational progress, was the claim that in real-world computations the new algorithm was orders of magnitude more efficient than the Simplex method. Let me explain why this was a sensation. It is known that the Simplex method is not polynomial: there exist bad problem instances where the number of pivotings grows exponentially with the dimension of the instance. Thus, any polynomial time algorithm for LP (the Ellipsoid method, the method of Karmarkar, or whatever else) is for sure incomparably better in its worst-case behaviour than the Simplex method. But this is the theoretical worst-case behaviour which, as almost 50 years of practice demonstrate, never occurs in real-world applications; from the practical viewpoint, the Simplex method is an extremely efficient algorithm with fairly low empirical complexity; this is why the method is able to solve very large-scale real-world LP problems in reasonable time. In contrast to this, the Ellipsoid method works more or less in accordance with its theoretical worst-case complexity bound, so that in practical computations this "theoretically good" method is by far dominated by the Simplex method even on very small problems with tens of variables and constraints. If the method of Karmarkar also behaved according to its theoretical complexity bound, it would be only slightly better than the Ellipsoid method and still incomparably worse than the Simplex method. The point, however, is that the actual behaviour of the method of Karmarkar turned out to be much better than what the worst-case theoretical complexity bound says. This phenomenon, combined with the theoretical advantages of a polynomial time algorithm, and not those advantages alone (same as, I believe, not the empirical behaviour of the method alone), inspired an actual revolution in optimization which continues to this day and will hardly terminate in the near future.
I have said something about the birth of the "interior point science". As often happens in our field, it later turned out that this was the second birth; the first one was in 1967 in Russia, where Ilya Dikin, then a Ph.D. student of Leonid Kantorovich, invented what is now called the affine scaling algorithm for LP. This algorithm, which is hardly polynomial in theory, is a certain simplification of the method of Karmarkar which shares all the practical advantages of the basic Karmarkar algorithm; thus, as a computational tool, interior point methods have existed at least since 1967. A good question is why this computational tool, which is in extreme fashion now, was completely overlooked in the West, as well as in Russia. I think this happened for two reasons. First, Dikin came too early, when there was no interest in iterative procedures for LP: a newborn iterative procedure, even one of great potential, could hardly overcome, as a practical tool, the perfectly polished Simplex method, and the theoretical complexity issues in those years did not bother optimization people (even today we do not know whether the theoretical complexity of the Dikin algorithm is better than that of the Simplex method; and in 1967 the question itself could hardly have occurred). Second, the Dikin algorithm appeared in Russia, where there was neither the hardware base for Dikin to perform large-scale tests of his algorithm, nor a "social demand" for solving large-scale LP problems, so it was almost impossible to realize the practical potential of the new algorithm and to convince people in Russia, to say nothing of the West, that it was something worth attention.
Thus, although the prehistory of the interior point technique for LP started in 1967, the
actual history of this subject started only in 1984. It would be impossible to outline the numerous significant contributions to the field made since then; it would require mentioning tens, if not hundreds, of authors. There is, however, one contribution which must be indicated explicitly.
1.2. THE GOAL: POLYNOMIAL TIME METHODS
I mean the second cornerstone of the subject, the paper of James Renegar (1986) where the first path-following polynomial time interior point method for LP was developed. The efficiency estimate of this method was better than that of the method of Karmarkar, namely, O(n^3 L) 2): cubic in the dimension, the same as for classical methods of solving systems of linear equations; up to now this is the best known theoretical complexity bound for LP. Besides this remarkable theoretical advantage, the method of Renegar possesses an important advantage in, let me say, the human dimension: the method belongs to a scheme quite classical and well known in Optimization, in contrast to the rather unusual Ellipsoid and Karmarkar algorithms. The paper of Renegar was extremely important for the understanding of the new methods, and it, together with the slightly later independent paper of Clovis Gonzaga with a close result, brought the area into a position very favourable for future developments.
Up to this moment I have been speaking about interior point methods for Linear Programming, and this reflects the actual history of the subject: not only were the first interior point methods developed for this case, but until the very last years the main activity in the field, both theoretical and computational, was focused on Linear Programming and the closely related Linearly Constrained Quadratic Programming. Extending the approach to more general classes of problems was an actual challenge: the original constructions and proofs heavily exploited the polyhedral structure of the feasible domain of an LP problem, and in order to pass to the nonlinear case, it was necessary to realize the deep intrinsic nature of the methods. This latter problem was solved in a series of papers of Yurii Nesterov in 1988; the ideas of these papers form the basis of the theory this course is devoted to, a theory which has now become a kind of standard for the unified explanation and development of polynomial time interior point algorithms for convex problems, both linear and nonlinear. To present this theory and its applications is the goal of my course. In the remaining part of this introductory lecture I am going to explain what we are looking for and what our general strategy will be.
of a given analytical structure, like the family of LP problems, or Linearly Constrained Quadratic problems, or Quadratically Constrained Quadratic ones, etc. The only formal assumption on the family is that a problem instance p from it is identified by a finite-dimensional data vector D(p); normally you can understand this vector as the collection of the numeric coefficients in the analytical expressions for the objective and the constraints; these expressions themselves are fixed by the description of the family. The dimension of the data vector is called the size l(p) of the problem instance. A numerical method for solving problems from the family is a routine which, given on input the data vector, generates a sequence of approximate solutions to the problem in such a way that each of these solutions is obtained in finitely many operations of precise real arithmetic, like the four arithmetic operations, taking square roots, exponents, logarithms and other elementary functions; each operand in an operation is either an entry of the data vector or the result of one of the preceding operations. We call a numerical method convergent if, for any positive ε and for any problem instance p from the family, the approximate solutions xi
2) Recall that we are speaking about "almost square" problems with the number m of inequalities being of the same order as the number n of variables.
generated by the method, starting with a certain i = i∗(ε, p), are ε-solutions to the problem, i.e., they belong to G and satisfy the relation

f(xi) − f∗ ≤ ε

(f∗ is the optimal value in the problem). We call a method polynomial if it is convergent and the arithmetic cost C(ε, p) of an ε-solution, i.e., the total number of arithmetic operations at the first i∗(ε, p) steps of the method as applied to p, admits an upper bound as follows:

C(ε, p) ≤ π(l(p)) ln(V(p)/ε),

where π is a certain polynomial independent of the data and V(p) is a certain data-dependent scale factor. The ratio V(p)/ε can be interpreted as the relative accuracy which corresponds to the absolute accuracy ε, and the quantity ln(V(p)/ε) can be thought of as the number of accuracy digits in an ε-solution. With this interpretation, the polynomiality of a method means that for this method the arithmetic cost of an accuracy digit is bounded from above by a polynomial of the problem size, and this polynomial can be thought of as the characteristic of the complexity of the method.
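The cost-per-digit reading of the bound is easy to check numerically. A minimal sketch (the polynomial π, the scale factor V(p) and the size l(p) below are hypothetical placeholders, not taken from any concrete family):

```python
import math

def complexity_bound(poly, size, V, eps):
    """The bound C(eps, p) <= pi(l(p)) * ln(V(p)/eps): cost of an eps-solution."""
    return poly(size) * math.log(V / eps)

# hypothetical illustration: pi(l) = l**1.5, size l(p) = 1000, V(p) = 100;
# halving eps always costs the same extra amount pi(l)*ln 2, i.e. the
# "arithmetic cost of an accuracy digit" does not depend on the accuracy itself
c1 = complexity_bound(lambda l: l**1.5, 1000, 100.0, 1e-6)
c2 = complexity_bound(lambda l: l**1.5, 1000, 100.0, 0.5e-6)
extra = c2 - c1   # = pi(l) * ln 2, independent of eps
```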
It is reasonable to compare this approach with the information-based approach we dealt with in the previous course. In the information-based complexity theory the problem was assumed to be represented by an oracle, a black box, so that a method, starting its work, had no information on the instance; this information was accumulated via sequential calls to the oracle, and the number of calls sufficient to find an ε-solution was regarded as the complexity of the method; this complexity included neither the computational effort of the oracle, nor the arithmetic cost of processing the oracle's answers by the method. In contrast to this, in our present approach the data specifying the problem instance form the input to the method, so that the method from the very beginning possesses complete global information on the problem instance. What the method should do is transform this input information into an ε-solution to the problem, and the complexity of the method (which now might be called algorithmic or combinatorial complexity) is defined by the arithmetic cost of this transformation. It is clear that our new approach is not as general as the information-based one, since now we can speak only about families of problems with a reasonable analytic structure (otherwise the notion of the data vector becomes senseless). As a compensation, the combinatorial complexity is a much more adequate measure of the actual computational effort than the information-based complexity.
Now that I have outlined our final goal, let me give you an idea of how this goal will be achieved. In what follows we will develop methods of two different types: path-following and potential reduction methods; the LP prototypes of these methods are, respectively, the methods of Renegar and Gonzaga, which are path-following routines, and the method of Karmarkar, which is a potential reduction one. In contrast to the actual historical order, we shall start with the quite traditional path-following scheme, since without it we would be unprepared to understand what in fact happens in the methods of the Karmarkar type.
associated with smooth (at least twice continuously differentiable) convex functions f, gi on Rn. Let

G = {x ∈ Rn | gi(x) ≤ 0}

be the feasible domain of the problem; assume for the sake of simplicity that this domain is bounded, and let the constraints {gi} satisfy the Slater condition:

∃x̄ : gi(x̄) < 0, i = 1, ..., m.

Under these assumptions the feasible domain G is a solid: a closed and bounded convex set in Rn with a nonempty interior.
In 60’s people believed that it is not difficult to solve unconstrained smooth convex problems,
and it was very natural to try to reduce the constrained problem (P ) to a series of unconstrained
problems. To this end it was suggested to associate with the feasible domain G of problem (P )
a barrier - an interior penalty function F (x), i.e., a smooth convex function F defined on the
interior of G and tending to ∞ when we approach from inside the boundary of G:
lim F (xi ) = ∞ for any sequence {xi ∈ int G} with lim xi ∈ ∂G.
i→∞ i→∞
Here the penalty parameter t is positive. Of course, x in (Pt ) is subject to the ”induced”
restriction x ∈ int G, since Ft is outside the latter set.
From our assumptions on G it immediately follows that
a) each of the problems (Pt) has a unique solution x∗(t); this solution is, of course, in the interior of G;
b) the path x∗(t) of solutions to (Pt) is a continuous function of t ∈ [0, ∞), and all its limiting points as t → ∞ belong to the set of optimal solutions to (P).
It immediately follows that if we are able to follow the path x∗(t) along a certain sequence ti → ∞ of values of the penalty parameter, i.e., know how to form "good enough" approximations xi ∈ int G to the points x∗(ti), say, such that

xi − x∗(ti) → 0, i → ∞, (1.1)

then we know how to solve (P): b) and (1.1) imply that all limiting points of the sequence of our iterates {xi} belong to the optimal set of (P).
Now, to be able to meet requirement (1.1) is basically the same as to be able to solve each of the "penalized" problems (Pt) to a prescribed accuracy. What are our abilities in this respect? (Pt) is a minimization problem with a smooth and nondegenerate (i.e., with nonsingular Hessian) objective. Of course, this objective is defined on a proper open convex subset of Rn rather than on the whole of Rn, so that, rigorously speaking, the problem is a constrained one, same as the initial problem (P). The constrained nature of (Pt) is, however, nothing but an illusion: the solution to the problem is unique and belongs to the interior of G, and any converging minimization method of relaxation type (i.e., monotonically decreasing the value of the objective along the sequence of iterates) started at an interior point of G would automatically keep the iterates away from the boundary of G (since Ft → ∞ together with F as the argument approaches the boundary from inside); thus, qualitatively speaking, the behaviour of the method as applied to (Pt) would be the same as if the objective Ft were defined everywhere. In other words, we have basically the same possibilities to solve (Pt) as if it were an unconstrained problem with a smooth and nondegenerate objective. Thus, the outlined path-following scheme indeed achieves our goal: it reduces the constrained problem (P) to a series of in fact unconstrained problems (Pt).
We have outlined our abilities to solve each particular problem (Pt) to a prescribed accuracy: to this end we can apply to the problem any relaxation iterative routine for smooth unconstrained minimization, starting the routine from an interior point of G. What we need, however, is to solve not a single problem from the family, but a sequence of these problems associated with a certain sequence of values of the penalty parameter tending to ∞. Of course, in principle we could choose an arbitrary sequence {ti} and solve each of the problems (Pti) independently, but anybody understands that this is senseless. What makes sense is to use the approximate solution xi to the "previous" problem (Pti) as the starting point when solving the "new" problem (Pti+1). Since x∗(t), as we have just mentioned, is a continuous function of t, a good approximate solution to the previous problem will be a good initial point for solving the new one, provided that ti+1 − ti is not too large; this latter assumption can be ensured by a proper policy of updating the penalty parameter.
To implement the aforementioned scheme, one should specify its main blocks, namely, to
choose somehow:
1) the barrier F ;
2) the ”working horse” - the unconstrained minimization method for solving the problems
(Pt ), along with the stopping criterion for the method;
3) the policy for updating the penalty parameter.
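The three blocks can be sketched in a few lines. The following is a toy illustration only, not any concrete method from the literature: the "working horse" is a damped Newton routine, the penalty is doubled at every stage, and the (one-dimensional) objective and barrier of a test problem are supplied by the caller:

```python
import numpy as np

def newton_minimize(grad, hess, x, tol=1e-8, max_iter=50):
    """Block 2), the 'working horse': damped Newton steps with a stopping
    criterion based on the (squared) Newton decrement."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        step = np.linalg.solve(H, g)
        lam2 = float(g @ step)                 # squared Newton decrement
        if lam2 < tol:
            break
        x = x - step / (1.0 + np.sqrt(lam2))   # damping keeps iterates interior
    return x

def path_following(f_grad, f_hess, F_grad, F_hess, x0, t0=1.0, rate=2.0, stages=30):
    """Blocks 1)-3): minimize F_t = t*f + F for an increasing sequence of t,
    warm-starting each stage at the approximate minimizer of the previous one."""
    x, t = x0, t0
    for _ in range(stages):
        x = newton_minimize(lambda z: t * f_grad(z) + F_grad(z),
                            lambda z: t * f_hess(z) + F_hess(z), x)
        t *= rate                              # block 3): penalty updating policy
    return x

# toy problem: minimize f(x) = x over G = [0, 2] with barrier F = -ln x - ln(2-x);
# the optimal solution is the boundary point x = 0
x = path_following(lambda z: np.array([1.0]),
                   lambda z: np.array([[0.0]]),
                   lambda z: np.array([-1.0 / z[0] + 1.0 / (2.0 - z[0])]),
                   lambda z: np.array([[1.0 / z[0]**2 + 1.0 / (2.0 - z[0])**2]]),
                   np.array([1.0]))
```

The iterates stay strictly inside G while the final point approaches the boundary solution, exactly the behaviour the scheme promises.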
The traditional recommendations here were rather diffuse. The qualitative theory insisted on at least C2-smoothness and nondegeneracy of the barrier, and this was basically all; within this class of barriers there were no clear theoretical priorities. What people were advised to do was:
for 1): to choose F as a certain "smoothness-preserving" aggregate of the gi, e.g.,

F(x) = Σ_{i=1}^m ( 1/(−gi(x)) )^α (1.2)

or something else of this type; the idea was that the local information on this barrier required by the "working horse" should be easily computed via similar information on the constraints gi;
for 2): to choose as the "working horse" the Newton method; this recommendation came from computational experience and had no serious theoretical justification;
for 3): qualitatively, updating the penalty at a high rate, we reduce the number of auxiliary unconstrained problems at the cost of making each of them harder (since for large ti+1 − ti a good approximation of x∗(ti) may be a bad starting point for solving the updated problem); a low rate of updating the penalty simplifies the auxiliary problems but increases the number of problems to be solved before a prescribed value of the penalty (which corresponds to the required accuracy of solving (P)) is achieved. The traditional theory was unable to offer
explicit recommendations on the "balanced" rate resulting in the optimal overall effort, and this question was normally settled on the basis of "computational experience".
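To see why an aggregate of the type (1.2) is an interior penalty, it suffices to evaluate it near the boundary. A minimal sketch (the constraint values at a point are passed in directly; the barrier function and its blow-up are all that is illustrated):

```python
import numpy as np

def aggregate_barrier(g_vals, alpha=1.0):
    """The interior penalty (1.2): F(x) = sum_i (1/(-g_i(x)))**alpha, defined
    where all g_i(x) < 0; g_vals holds the constraint values at the point x."""
    g = np.asarray(g_vals, dtype=float)
    assert np.all(g < 0.0), "the barrier is defined only on int G"
    return float(np.sum((1.0 / -g) ** alpha))

inner = aggregate_barrier([-1.0, -0.5])           # deep inside G
near_boundary = aggregate_barrier([-1.0, -1e-9])  # one constraint nearly active
```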
What was said above looks very natural and has been known for more than 30 years. Nevertheless, the classical results on the path-following scheme have nothing in common with polynomial complexity bounds, and not only because in the 60's nobody bothered about polynomiality: even after you pose this question, the traditional results do not allow you to answer it affirmatively. The reason is as follows: to perform the complexity analysis of the path-following scheme, one needs not only qualitative information like "the Newton method, as applied to a smooth convex function with nondegenerate Hessian, converges quadratically, provided that the starting point is close enough to the minimizer of the objective", but also quantitative information: what exactly is this "close enough". Results of this latter type also existed, and everybody in Optimization knew them, but this did not help much. Indeed, the typical quantitative result on the behaviour of the Newton optimization method was as follows:
let φ be a C2-smooth convex function defined in the Euclidean ball V of radius R centered at x∗, taking its minimum at x∗, and such that
φ''(x∗) is nondegenerate, with spectrum in a certain segment [L0, L1], 0 < L0 < L1;
φ''(x) is Lipschitz continuous at x∗ with a certain constant L3:

|φ''(x) − φ''(x∗)| ≤ L3 |x − x∗|, x ∈ V.

Then the Newton iterate x+ = x − [φ''(x)]^{-1} φ'(x) of a point x satisfies

|x+ − x∗| ≤ c(L0, L1, L3, R) |x − x∗|^2, (1.4)

provided that

|x − x∗| ≤ ρ ≡ ρ(L0, L1, L3, R).
The functions ρ(·) and c(·) can be written down explicitly, and the statement itself can be modified and somewhat strengthened, but this does not matter for us: the point is the structure of the traditional results on the Newton method, not the results themselves. These results are local: the quantitative description of the convergence properties of the method is given in terms of the parameters responsible for the smoothness and nondegeneracy of the objective, and the "constant factor" c in the rate-of-convergence expression (1.4), same as the size ρ of the "domain of quadratic convergence", become worse and worse as the aforementioned parameters of smoothness and nondegeneracy of the objective become worse. This is the structure of the traditional rate-of-convergence results for the Newton method; the structure of the traditional results on any other standard method for smooth unconstrained optimization is completely similar: these results always involve some data-dependent parameters of smoothness and/or nondegeneracy of the objective, and the quantitative description of the rate of convergence always becomes worse as these parameters deteriorate.
Now it is easy to realize why the traditional rate-of-convergence results for our candidate "working horses" (the Newton method or something else) do not allow us to establish polynomiality of the path-following scheme. As the method goes on, the parameters of smoothness and nondegeneracy of our auxiliary objectives Ft inevitably become worse and worse: if the solution to (P) is on the boundary of G, and this is the only case of interest in constrained minimization, the minimizers x∗(t) of Ft approach the boundary of G as t grows, and the behaviour of Ft in a neighbourhood of x∗(t) becomes less and less regular (indeed, for large t the function Ft goes to ∞ very close to x∗(t)). Since the parameters of smoothness/nondegeneracy of Ft become worse and worse as t grows, the auxiliary problems, from the traditional viewpoint, become quantitatively more and more complicated, and the progress in accuracy (# of new accuracy digits per unit of computational effort) tends to 0 as the method goes on.
The seminal contribution of Renegar and Gonzaga was the demonstration of the fact that the above scheme, applied to a Linear Programming problem

minimize f(x) = cT x s.t. aTj x ≤ bj, j = 1, ..., m,

and to a concrete barrier for the feasible domain G of the problem, namely, the standard logarithmic barrier

F(x) = − Σ_{j=1}^m ln(bj − aTj x),

with the penalty updating and Newton rules

ti+1 = (1 + 0.001/√m) ti; xi+1 = xi − [∇2x Fti+1(xi)]^{-1} ∇x Fti+1(xi) (1.5)
(a single Newton step per each step in the penalty parameter) keeps the iterates in the interior
of G, maintains the ”closeness relation”
(provided that this relation was satisfied by the initial pair (t0, x0)) and ensures a linear data-independent rate of convergence

f(xi) − f∗ ≤ 2m ti^{-1} ≤ 2m t0^{-1} exp{−O(1) i m^{-1/2}}. (1.6)
Thus, in spite of the above discussion, it turned out that for the particular barrier in question the path-following scheme is polynomial: the penalty can be increased at a constant rate (1 + 0.001 m^{-1/2}) depending only on the size of the problem instance, and each step in the penalty should be accompanied by a single Newton step in x. According to (1.6), the absolute inaccuracy is inversely proportional to the penalty parameter, so that to add an extra accuracy digit it suffices to increase the parameter by an absolute constant factor, which, in view of the description of the method, takes O(√m) steps. Thus, the Newton complexity (the # of Newton steps) of finding an ε-solution is

N(ε, p) = O(√m) ln(V(p)/ε), (1.7)

and since each Newton step costs, as is easily seen, O(m n^2) operations, the combinatorial complexity of the method turns out to be polynomial, namely,

C(ε, p) ≤ O(m^{1.5} n^2) ln(V(p)/ε).
of minimizing a linear objective over a convex set. Thus, the possibilities to solve convex problems by interior point polynomial time methods are restricted only by our ability to point out "explicit polynomially computable" self-concordant barriers for the corresponding feasible domains, which normally is not difficult.
Self-concordant functions
In this lecture I introduce the main concept of the theory in question: the notion of a self-concordant function. The goal is to define a family of smooth convex functions convenient for minimization by the Newton method. Recall that a step of the Newton method as applied to the problem of (unconstrained) minimization of a smooth convex function f is based on the following rule:

in order to find the Newton iterate of a point x, compute the second-order Taylor expansion of f at x, find the minimizer x̂ of this expansion, and perform a step from x along the direction x̂ − x.

What the step should be depends on the version of the method: in the pure Newton routine the iterate is exactly x̂; in the relaxation version of the method one minimizes f along the ray [x, x̂), etc.
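The rule just stated can be written down directly: minimizing the second-order Taylor expansion f(x) + Df(x)[h] + (1/2)D2f(x)[h, h] in h gives the Newton displacement −[f''(x)]^{-1} f'(x). A minimal sketch (the quadratic test function is an arbitrary illustration):

```python
import numpy as np

def newton_iterate(grad, hess, x):
    """Pure Newton rule: the minimizer of the second-order Taylor expansion
    of f at x is x_hat = x - [f''(x)]^{-1} f'(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

# on a strictly convex quadratic the expansion is exact, so the pure Newton
# iterate reaches the minimizer in a single step:
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 1.0)])  # f = (x1-1)^2 + 2(x2+1)^2
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
x_hat = newton_iterate(grad, hess, np.array([5.0, 5.0]))
```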
As was mentioned in the introductory lecture, the traditional results on the Newton method state, under reasonable smoothness and nondegeneracy assumptions, its local quadratic convergence. These results, as became clear recently, possess a generic conceptual drawback: the quantitative description of the region of quadratic convergence, same as of the convergence itself, is given in terms of the condition number of the Hessian of f at the minimizer and the Lipschitz constant of this Hessian. These quantities, however, are "frame-dependent": they are defined not by f alone, but by f together with the Euclidean structure in the space of variables. Indeed, we need this structure simply to define the Hessian matrix of f, same, by the way, as to define the gradient of f. When we change the Euclidean structure, the gradient and the Hessian are subject to a certain transformation which does not leave invariant quantities like the condition number of the Hessian or its Lipschitz constant. As a result, the traditional description of the behaviour of the method depends not only on the objective itself, but also on an arbitrary choice of the Euclidean structure used in the description, which contradicts the affine-invariant nature of the method (note that no "metric notions" are involved in the formulation of the method). To overcome this drawback, note that the objective itself at any point x induces a certain Euclidean structure Ex; to define this structure, let us regard the second order differential

D2f(x)[h, g] = ∂2/∂t∂s |t=s=0 f(x + th + sg)

of f, taken at x along the pair of directions h and g, as the inner product of the vectors h and g. Since f is convex, this inner product possesses all the required properties (except, possibly, the nondegeneracy requirement "the square of a nonzero vector is strictly positive"; as we shall see, this is a minor difficulty). Of course, this Euclidean structure is local: it depends on x. Note that the Hessian of f, taken at x with respect to the Euclidean structure Ex, is fine; this is
simply the unit matrix, the matrix with the smallest possible condition number, namely, 1. The traditional results on the Newton method say that what is important, besides this condition number, is the Lipschitz constant of the Hessian, or, what is basically the same, the magnitude of the third order derivatives of f. What happens if we relate these latter quantities to the local Euclidean structure defined by f? This is the key to the notion of self-concordance, and the definition is as follows:
Definition 2.0.1 Let Q be a nonempty open convex set in Rn and let F be a C3 smooth convex function defined on Q. F is called self-concordant on Q if it possesses the following two properties:
[Barrier property] F(xi) → ∞ along every sequence {xi ∈ Q} converging, as i → ∞, to a boundary point of Q;
[Differential inequality of self-concordance] F satisfies the differential inequality

|D3F(x)[h, h, h]| ≤ 2 (D2F(x)[h, h])^{3/2} (2.1)

for all x ∈ Q and all h ∈ Rn, where

DkF(x)[h1, ..., hk] ≡ ∂k/∂t1...∂tk |t1=...=tk=0 F(x + t1h1 + ... + tkhk)

is the k-th differential of F taken at x along the directions h1, ..., hk.
(2.1) says exactly that if a vector h is of local Euclidean length 1, then the third order derivative of F in the direction h is, in absolute value, at most 2; this is nothing but the aforementioned "Lipschitz continuity", with a certain once for ever fixed constant, namely 2, of the second-order derivative of F with respect to the local Euclidean metric defined by this derivative itself.
You can ask what is so magic about the constant 2. The answer is as follows: both sides of (2.1) should be, and actually are, of the same homogeneity degree with respect to h (this is the origin of the exponent 3/2 in the right hand side). As a consequence, they are of different homogeneity degrees with respect to F. Therefore, given a function F satisfying the inequality

|D3F(x)[h, h, h]| ≤ 2α (D2F(x)[h, h])^{3/2}

with a certain positive α, you always may scale F, namely, multiply it by α^2, and come to a function satisfying (2.1). We see that the choice of the constant factor in (2.1) is of no actual importance and is nothing but a normalization condition. The indicated choice of this factor is motivated by the desire to make the function − ln t, which plays an important role in what follows, satisfy (2.1) "as it is", without any scaling.
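The rescaling argument can be checked in closed form. The sketch below takes F(t) = c·(−ln t), for which the best constant in the inequality is α = c^{-1/2}, and verifies that multiplying F by α^2 restores (2.1):

```python
# F(t) = c*(-ln t) has D^2 F = c/t^2 and D^3 F = -2c/t^3, hence satisfies
# |D^3 F| <= 2*alpha*(D^2 F)^{3/2} with the best constant alpha = c^{-1/2};
# multiplying F by alpha^2 then restores (2.1) with the normalized constant 2
c, t = 4.0, 0.37
d2F, d3F = c / t**2, -2.0 * c / t**3
alpha = abs(d3F) / (2.0 * d2F**1.5)          # = c**-0.5
d2G, d3G = alpha**2 * d2F, alpha**2 * d3F    # derivatives of G = alpha^2 * F
```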
Example 2.1.1 A convex quadratic function

f(x) = xT Ax − 2bT x + c

(A positive semidefinite) is self-concordant on Rn.
This is immediate: the left hand side of (2.1) is identically zero. A single-line verification of the definition also justifies the following example:
Example 2.1.2 The function − ln t is self-concordant on the positive ray {t ∈ R | t > 0}.
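The single-line verification behind Example 2.1.2 is just calculus: F''(t) = 1/t^2 and F'''(t) = −2/t^3, so (2.1) holds with equality in every direction, which is exactly what fixes the normalization constant 2. A closed-form check over a grid of points and directions:

```python
# F(t) = -ln t on t > 0: F''(t) = 1/t**2, F'''(t) = -2/t**3, so for every h
#   |D^3 F(t)[h,h,h]| = 2*|h|**3/t**3 = 2*(h*h/t**2)**1.5 = 2*(D^2 F(t)[h,h])**1.5,
# i.e. (2.1) holds with equality
violation = 0.0
for t in (0.1, 1.0, 7.5):
    for h in (-2.0, 0.3, 1.0):
        d2 = h * h / t**2              # D^2 F(t)[h,h]
        d3 = -2.0 * h**3 / t**3        # D^3 F(t)[h,h,h]
        violation = max(violation, abs(abs(d3) - 2.0 * d2**1.5))
```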
The number of examples can easily be increased, due to the following extremely simple (and very useful) combination rules:
Proposition 2.1.1 (i) [stability with respect to affine substitutions of argument] Let F be self-concordant on Q ⊂ Rn and let x = Ay + b be an affine mapping from Rk to Rn with image intersecting Q. Then the inverse image of Q under the mapping, i.e., the set

Q+ = {y ∈ Rk | Ay + b ∈ Q},

is an open convex domain, and the function

F+(y) = F(Ay + b) : Q+ → R

is self-concordant on Q+.
(ii) [stability with respect to summation and multiplication by reals ≥ 1] Let Fi be self-concordant functions on the open convex domains Qi ⊂ Rn and let αi ≥ 1 be reals, i = 1, ..., m. Assume that the set Q = ∩_{i=1}^m Qi is nonempty. Then the function

F(x) = α1F1(x) + ... + αmFm(x)

is self-concordant on Q.
(iii) [stability with respect to direct summation] Let Fi be self-concordant on open convex domains Qi ⊂ Rni, i = 1, ..., m. Then the function

F(x1, ..., xm) = F1(x1) + ... + Fm(xm)

is self-concordant on the direct product Q = Q1 × ... × Qm.
Proof is given by immediate and absolutely trivial verification of the definition. E.g., let us prove (ii). Since the Qi are open convex domains with nonempty intersection Q, Q is an open convex domain, as it should be. Further, F is, of course, C3 smooth and convex on Q. To prove the barrier property, note that since the Fi are convex, they are bounded below on any bounded subset of Q. It follows that if {xj ∈ Q} is a sequence converging to a boundary point x of Q, then all the sequences {αi Fi(xj)}, i = 1, ..., m, are bounded below, and at least one of them diverges to ∞ (since x belongs to the boundary of at least one of the sets Qi); consequently, F(xj) → ∞, as required.
To verify (2.1), add the inequalities
α_i |D^3F_i(x)[h, h, h]| ≤ 2α_i (D^2F_i(x)[h, h])^{3/2}
(x ∈ Q, h ∈ R^n). The left hand side of the resulting inequality clearly will be ≥ |D^3F(x)[h, h, h]|,
while the right hand side will be ≤ 2(D^2F(x)[h, h])^{3/2}, since for nonnegative b_i and α_i ≥ 1 one has
Σ_i α_i b_i^{3/2} ≤ (Σ_i α_i b_i)^{3/2}.
Since the function − ln t is self-concordant on the positive half-axis, each of the functions
F_i(x) = − ln(b_i − a_i^T x) is self-concordant on G_i (item (i) of the Proposition; note that G_i is the inverse
image of the positive half-axis under the affine mapping x ↦ b_i − a_i^T x), whence F(x) = Σ_i F_i(x)
is self-concordant on G = ∩_i G_i (item (ii) of the Proposition).
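To make the Corollary concrete, here is a small computational sketch (the function names are ours, not the notes'): the logarithmic barrier F(x) = −Σ_i ln(b_i − a_i^T x) of a polyhedron and its gradient, assembled term by term from the one-dimensional barrier − ln t composed with affine maps, exactly as in the proof.

```python
import numpy as np

def log_barrier(A, b, x):
    """F(x) = -sum_i ln(b_i - a_i^T x); +inf outside G = {x : Ax < b}."""
    s = b - A @ x          # slacks b_i - a_i^T x
    if np.any(s <= 0):
        return np.inf
    return -np.sum(np.log(s))

def log_barrier_grad(A, b, x):
    s = b - A @ x
    return A.T @ (1.0 / s)  # sum_i a_i / (b_i - a_i^T x)

# The unit box [-1, 1]^2 written as {x : Ax <= b}
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

x = np.zeros(2)
print(log_barrier(A, b, x))        # 0.0 at the center (all slacks equal 1)
print(log_barrier_grad(A, b, x))   # zero gradient at the center, by symmetry
```

The barrier property is visible in the code: as x approaches the boundary, some slack tends to 0 and −ln of it tends to +∞.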
In spite of its extreme simplicity, the fact stated in the Corollary is, as we shall see,
responsible for 50% of all polynomial time results in Linear Programming.
Now let us come to a systematic investigation of the properties of self-concordant functions,
with the final goal of analyzing the behaviour of the Newton method as applied to a function of this
type.
Comment. This is the result of applying, to the symmetric 3-linear form D^3F(x)[h_1, h_2, h_3]
and the positive semidefinite bilinear form D^2F(x)[h_1, h_2], the following general fact:
let A[h_1, ..., h_k] be a symmetric k-linear form on R^n and B[h_1, h_2] be a symmetric positive
semidefinite bilinear form such that
I. The open unit Dikin ellipsoid W_1(x) = {y | |y − x|_x < 1} is contained in Q. Within this ellipsoid the Hessians of F
are "almost proportional" to F''(x):
(1 − |h|_x)^2 F''(x) ≤ F''(x + h) ≤ (1 − |h|_x)^{-2} F''(x) whenever |h|_x < 1, (2.2)
the gradients satisfy
|z^T(F'(x + h) − F'(x))| ≤ (|h|_x/(1 − |h|_x)) |z|_x ∀z whenever |h|_x < 1, (2.3)
and the values of F admit the bounds
F(x) + DF(x)[h] + ρ(−|h|_x) ≤ F(x + h) ≤ F(x) + DF(x)[h] + ρ(|h|_x), (2.4)
where
ρ(s) = − ln(1 − s) − s = s^2/2 + s^3/3 + s^4/4 + ... (2.5)
The lower bound in (2.4) is valid for all h such that x + h ∈ Q, not only for those h with |h|_x < 1.
Proof. Let h be such that
r ≡ |h|x < 1 and x + h ∈ Q.
Let us prove that relations (2.2), (2.3) and (2.4) are satisfied at this particular h.
1°. Let us set
φ(t) = D^2F(x + th)[h, h],
so that φ is continuously differentiable on [0, 1]. We have
0 ≤ φ(t), r^2 = φ(0) < 1, |φ'(t)| = |D^3F(x + th)[h, h, h]| ≤ 2φ^{3/2}(t)
(when φ can vanish, replace it in the estimates below by φ + ε, ε > 0, and pass to the limit in the end),
so that
|(d/dt) φ^{-1/2}(t)| ≤ 1.
It follows that
φ^{-1/2}(0) − t ≤ φ^{-1/2}(t) ≤ φ^{-1/2}(0) + t, 0 ≤ t ≤ 1,
whence
φ(0)/(1 + tφ^{1/2}(0))^2 ≤ φ(t) ≤ φ(0)/(1 − tφ^{1/2}(0))^2.
24 CHAPTER 2. SELF-CONCORDANT FUNCTIONS
The resulting inequalities hold true for all t ∈ [0, 1] and all ε > 0; passing to the limit as ε → +0,
we come to
r^2/(1 + rt)^2 ≤ φ(t) ≡ D^2F(x + th)[h, h] ≤ r^2/(1 − rt)^2, 0 ≤ t ≤ 1. (2.6)
2°. Two sequential integrations of (2.6) result in
F(x) + DF(x)[h] + ∫_0^1 {∫_0^τ r^2/(1 + rt)^2 dt} dτ ≤ F(x + h) ≤ F(x) + DF(x)[h] + ∫_0^1 {∫_0^τ r^2/(1 − rt)^2 dt} dτ,
which after a straightforward computation leads to (2.4) (recall that r = |h|_x).
Looking at the presented reasoning, one can immediately see that the restriction r < 1 was
used only in the derivation of the upper, not the lower bound in (2.4); therefore this lower bound
is valid for all h such that x + h ∈ Q, as claimed.
3°. Now let us fix g ∈ E and set
ψ(t) = D^2F(x + th)[g, g],
so that
|ψ'(t)| = |D^3F(x + th)[g, g, h]| ≤ 2ψ(t)φ^{1/2}(t), 0 ≤ t ≤ 1, (2.7)
(we have used 0.). Relation (2.7) means that ψ satisfies the linear differential inequality
|ψ'(t)| ≤ 2ψ(t)φ^{1/2}(t) ≤ 2ψ(t) r/(1 − rt), 0 ≤ t ≤ 1
(the second inequality follows from (2.6) combined with ψ ≥ 0). It follows that
(d/dt)[(1 − rt)^2 ψ(t)] ≡ (1 − rt)^2 [ψ'(t) − 2r(1 − rt)^{-1}ψ(t)] ≤ 0, 0 ≤ t ≤ 1,
and
(d/dt)[(1 − rt)^{-2} ψ(t)] ≡ (1 − rt)^{-2} [ψ'(t) + 2r(1 − rt)^{-1}ψ(t)] ≥ 0, 0 ≤ t ≤ 1,
whence, respectively,
ψ(t) ≤ (1 − rt)^{-2}ψ(0) and ψ(t) ≥ (1 − rt)^2 ψ(0), 0 ≤ t ≤ 1;
recalling what ψ is, we obtain (2.2).
than 1 and bounded away from 1. It follows from (2.4) that F is bounded on the half-segment,
which is the desired contradiction: since y is a boundary point of Q, F should tend to ∞ as a
point from [x, y) approaches y.
5°. It remains to prove (2.3). To this end let us fix an arbitrary vector z and set
g(t) = z^T(F'(x + th) − F'(x)), 0 ≤ t ≤ 1.
Since the open unit Dikin ellipsoid W_1(x) is contained in Q, the function g is well-defined on
the segment [0, 1]. We have
g(0) = 0;
|g'(t)| = |z^T F''(x + th)h| ≤ (z^T F''(x + th)z)^{1/2} (h^T F''(x + th)h)^{1/2}
[we have used Cauchy's inequality]
≤ (1 − t|h|_x)^{-2} (z^T F''(x)z)^{1/2} (h^T F''(x)h)^{1/2}
[we have used (2.2)]
= |h|_x (1 − t|h|_x)^{-2} (z^T F''(x)z)^{1/2},
whence
|g(1)| ≤ ∫_0^1 |h|_x (1 − t|h|_x)^{-2} dt · (z^T F''(x)z)^{1/2} = (|h|_x/(1 − |h|_x)) (z^T F''(x)z)^{1/2},
as claimed in (2.3).
II. Recessive subspace of a self-concordant function. For x ∈ Q consider the subspace
{h ∈ E | D2 F (x)[h, h] = 0} - the kernel of the Hessian of F at x. This recessive subspace EF
of F is independent of the choice of x and is such that
Q = Q + EF .
In particular, the Hessian of F is nonsingular everywhere if and only if there exists a point where
the Hessian of F is nonsingular; this is for sure the case if Q is bounded.
Terminology: we call F nondegenerate, if EF = {0}, or, which is the same, if the Hessian of
F is nonsingular somewhere (and then everywhere) on Q.
Proof of II. To prove that the kernel of the Hessian of F is independent of the point where
the Hessian is taken is the same as to prove that if D2 F (x0 )[h, h] = 0, then D2 F (y)[h, h] ≡ 0
identically in y ∈ Q. To demonstrate this, let us fix y ∈ Q and consider the function
ψ(t) = D^2F(x_0 + t(y − x_0))[h, h],
which is continuously differentiable on the segment [0, 1]. Same as in item 3° of the previous
proof, we have
|ψ'(t)| = |D^3F(x_0 + t(y − x_0))[h, h, y − x_0]| ≤
≤ 2D^2F(x_0 + t(y − x_0))[h, h] (D^2F(x_0 + t(y − x_0))[y − x_0, y − x_0])^{1/2} ≡ ψ(t)ξ(t)
with a certain function ξ continuous on [0, 1]. It follows that
|ψ'(t)| ≤ M ψ(t)
with a certain constant M, whence 0 ≤ ψ(t) ≤ ψ(0) exp{M t}, 0 ≤ t ≤ 1 (look at the derivative of
the function ψ(t) exp{−M t}). Since ψ(0) = 0, we conclude that ψ(1) = 0, i.e., D^2F(y)[h, h] = 0, as
claimed.
Thus, the kernel of the Hessian of F is independent of the point where the Hessian is taken.
If h ∈ EF and x ∈ Q, then, of course, |h|x = 0, so that x + h ∈ W1 (x); from I. we know that
W1 (x) belongs to Q, so that x + h ∈ Q; thus, x + EF ⊂ Q whenever x ∈ Q, as required.
Now it is time to introduce the very important concept of the Newton decrement of a self-concordant function at a point. Let x ∈ Q. The Newton decrement of F at x is defined
as
λ(F, x) = max{DF (x)[h] | h ∈ E, |h|x ≤ 1}.
In other words, the Newton decrement is nothing but the norm of the first-order derivative of F at x, conjugate to the norm |·|_x. To be more exact, we should note that |·|_x is not necessarily a norm:
it may be a seminorm, i.e., it may vanish at certain nonzero vectors; this happens if and only
if the recessive subspace E_F of F is nontrivial, or, which is the same, if the Dikin ellipsoid of
F is not an actual ellipsoid, but an unbounded set, an elliptic cylinder. In this latter case the
maximum in the definition of the Newton decrement may (but need not) be +∞. We
can immediately identify when this is the case.
III. Continuity of the Newton decrement. The Newton decrement of F at x ∈ Q is finite
if and only if DF(x)[h] = 0 for all h ∈ E_F. If this is the case for some x = x_0 ∈ Q, then it is
the case for all x ∈ Q; the Newton decrement is then continuous in x ∈ Q, and
F is constant along its recessive subspace:
F (x + h) = F (x) ∀x ∈ Q ∀h ∈ EF ; (2.8)
is the (unique) representation of h ∈ E as the sum of vectors from E_F and E_F^⊥, then
πh = h_F^⊥.
It is clear that
|πh|x ≡ |h|x
(since the difference h − πh belongs to EF and therefore is of zero | · |x -seminorm), and since we
have assumed that DF (x)[u] is zero for u ∈ EF , we also have
DF (x)[h] = DF (x)[πh].
Combining these observations, we see that it is possible to replace E in the definition of the
Newton decrement by EF⊥ :
Since | · |x restricted onto EF⊥ is a norm rather than a seminorm, the right hand side of the latter
relation is finite, as claimed.
Now let us demonstrate that if λ(F, x) is finite at some point x_0 ∈ Q, then it is also finite
at every other point x of Q and is continuous in x. To prove finiteness, as we have just seen, it
2.2. PROPERTIES OF SELF-CONCORDANT FUNCTIONS 27
suffices to demonstrate that DF (x)[h] = 0 for any x and any h ∈ EF . To this end let us fix
x ∈ Q and h ∈ E_F and consider the function
ψ(t) = DF(x_0 + t(x − x_0))[h].
This function is continuously differentiable on [0, 1] and is zero at the point t = 0 (since λ(F, x_0)
is assumed finite); besides this,
ψ'(t) = D^2F(x_0 + t(x − x_0))[h, x − x_0] = 0
(since h belongs to the null space of the positive semidefinite symmetric bilinear form D^2F(x_0 +
t(x − x_0))[h_1, h_2]), so that ψ is constant, namely, 0, and ψ(1) = 0, as required. As a byproduct
of our reasoning, we see that if λ(F, ·) is finite, then
F (x + h) = F (x), x ∈ Q, h ∈ EF ,
since the derivative of F at any point from Q in any direction from EF is zero.
It remains to prove that if λ(F, x) is finite at some (and then, as we have just proved, at
every) point, then it is a continuous function of x. This is immediate: we already know that if
λ(F, x) is finite, it can be defined by relation (2.9), and this relation, for standard reasons,
defines a continuous function of x (since |·|_x restricted onto E_F^⊥ is a norm, not a seminorm,
depending continuously on x).
The following simple observation clarifies the origin of the Newton decrement and its relation
to the Newton method.
IV. Newton Decrement and Newton Iterate. Given x ∈ Q, consider the second-order
Newton expansion of F at x, i.e., the convex quadratic form
N_{F,x}(h) = F(x) + DF(x)[h] + (1/2)D^2F(x)[h, h] ≡ F(x) + DF(x)[h] + (1/2)|h|_x^2.
This form is bounded below if and only if it attains its minimum on E, which in turn is the case if and only if
λ(F, x) < ∞; if this is the case, then for (any) Newton direction e of F at x, i.e., any minimizer
of this form, one has
D2 F (x)[e, h] ≡ −DF (x)[h], h ∈ E, (2.10)
|e|x = λ(F, x) (2.11)
and
N_{F,x}(0) − N_{F,x}(e) = (1/2)λ^2(F, x). (2.12)
Thus, the Newton decrement is closely related to the amount by which the Newton iteration
x ↦ x + e
reduces the second-order expansion of F.
λ = max{b^T h | h^T Ah ≤ 1}
is finite; if it is the case, then the minimizers y of the form are exactly the vectors such that
y T Ah = −bT h, h ∈ E,
The observation in IV. allows us to compute the Newton decrement in the nondegenerate
case E_F = {0}.
IVa. Expressions for the Newton direction and the Newton decrement. If F is
nondegenerate and x ∈ Q, then the Newton direction of F at x is unique and is nothing but
e(F, x) = −[F''(x)]^{-1}F'(x),
F' and F'' being the gradient and the Hessian of F with respect to a certain Euclidean structure
on E, and the Newton decrement is given by
λ(F, x) = ((F'(x))^T [F''(x)]^{-1} F'(x))^{1/2} = (e^T(F, x) F''(x) e(F, x))^{1/2} = (−e^T(F, x) F'(x))^{1/2}.
Proof. This is an immediate consequence of IV. (pass from the ”coordinateless” differentials
to ”coordinate” representation in terms of the gradient and the Hessian).
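A numerical illustration of IVa. (the helper names are ours, not the notes'): for a log-barrier of a box we can form the Newton direction e = −[F''(x)]^{-1}F'(x) and confirm that the three expressions for λ(F, x) agree.

```python
import numpy as np

def barrier_grad_hess(A, b, x):
    """Gradient and Hessian of F(x) = -sum_i ln(b_i - a_i^T x)."""
    s = b - A @ x                     # slacks, positive in the interior
    g = A.T @ (1.0 / s)
    H = A.T @ np.diag(1.0 / s**2) @ A
    return g, H

# barrier of the unit box [-1, 1]^2
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x = np.array([0.3, -0.2])

g, H = barrier_grad_hess(A, b, x)
e = -np.linalg.solve(H, g)                  # Newton direction e(F, x)

lam1 = np.sqrt(g @ np.linalg.solve(H, g))   # (F'^T [F'']^{-1} F')^{1/2}
lam2 = np.sqrt(e @ H @ e)                   # (e^T F'' e)^{1/2}
lam3 = np.sqrt(-e @ g)                      # (-e^T F')^{1/2}
assert np.allclose([lam1, lam2], [lam2, lam3])
print(float(lam1))
```

The agreement of the three expressions is exactly the content of (2.10)-(2.11): e is defined by F''(x)e = −F'(x).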
Now comes the main statement about the behaviour of the Newton method as applied to a
self-concordant function.
V. Damped Newton Method: relaxation property. Let λ(F, ·) be finite on Q. Given
x ∈ Q, consider the damped Newton iterate of x
x^+ ≡ x^+(F, x) = x + (1/(1 + λ(F, x))) e,
e being (any) Newton direction of F at x. Then
x+ ∈ Q
and
F (x) − F (x+ ) ≥ λ(F, x) − ln(1 + λ(F, x)). (2.13)
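Property V. is easy to test numerically. A sketch (with our own naming; any nondegenerate self-concordant F would do) on the barrier F(x) = − ln(1 − x) − ln(1 + x) of the interval (−1, 1): every damped Newton step stays in Q and decreases F by at least λ − ln(1 + λ).

```python
import numpy as np

# F(x) = -ln(1 - x) - ln(1 + x), self-concordant on Q = (-1, 1)
F  = lambda x: -np.log(1 - x) - np.log(1 + x)
F1 = lambda x: 1/(1 - x) - 1/(1 + x)        # F'
F2 = lambda x: 1/(1 - x)**2 + 1/(1 + x)**2  # F''

x = 0.9
for _ in range(10):
    lam = abs(F1(x)) / np.sqrt(F2(x))       # Newton decrement (1D case)
    e = -F1(x) / F2(x)                      # Newton direction
    x_next = x + e / (1 + lam)              # damped Newton iterate
    assert -1 < x_next < 1                  # stays in Q, as V. guarantees
    assert F(x) - F(x_next) >= lam - np.log(1 + lam) - 1e-12  # (2.13)
    x = x_next
print(x)  # approaches the minimizer x* = 0
```

Note the behaviour the theory predicts: far from the minimizer the step size 1/(1 + λ) is small, and once λ becomes small the iteration turns into an almost pure Newton step.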
Proof. As we know from IV., |e|x = λ ≡ λ(F, x), and therefore |x+ − x|x = λ/(1 + λ) < 1.
Thus, x+ belongs to the open unit Dikin ellipsoid of F centered at x, and, consequently, to Q
(see I.). In view of (2.4) we have
F(x^+) ≤ F(x) + (1/(1 + λ)) DF(x)[e] + ρ((1 + λ)^{-1}|e|_x) =
[see (2.10) - (2.12)]
= F(x) − λ^2/(1 + λ) + ρ(λ/(1 + λ)) = F(x) − λ + ln(1 + λ),
and (2.13) follows.
then, according to V., we would have a possibility to pass from any point x ∈ Q to another
point x^+ with the value of F smaller by at least the constant λ − ln(1 + λ), which, of course, is
impossible, since F is assumed bounded below. Since inf_{x∈Q} λ(F, x) = 0, there exists a point x
with λ ≡ λ(F, x) ≤ 1/6. From (2.4) it follows that
(we have used the Cauchy inequality), which combined with (2.11) results in
DF (x)[h] ≥ −λ|h|x ,
and we come to
F (x + h) ≥ F (x) − λ|h|x + |h|x − ln(1 + |h|x ). (2.14)
the domain of f^* is, by definition, comprised of those y for which the right hand side is finite.
It is immediately seen that Dom f^* is convex and f^* is convex on its domain.
Let Dom f be open and let f be k ≥ 2 times continuously differentiable on its domain, the
Hessian of f being nondegenerate. It is clearly seen that
(L.1) if x ∈ Dom f, then y = f'(x) ∈ Dom f^*, and f^*(y) = x^T y − f(x).
Since f'' is nondegenerate, by the Implicit Function Theorem the set Dom^* f^* of values of f' is
open; since, in addition, f is convex, the mapping
x ↦ f'(x)
is a (k − 1) times continuously differentiable one-to-one mapping from Dom f onto Dom^* f^* with
a (k − 1) times continuously differentiable inverse. From (L.1) it follows that this inverse mapping
is also given by the gradient of some function, namely, f^*. Thus,
(L.2) The mapping x ↦ f'(x) is a one-to-one mapping of Dom f onto an open set Dom^* f^* ⊂
Dom f^*, and the inverse mapping is given by y ↦ (f^*)'(y).
As an immediate consequence of (L.2), we come to the following statement:
(L.3) f^* is k times continuously differentiable on Dom^* f^*, and (f^*)''(f'(x)) = [f''(x)]^{-1}, x ∈ Dom f.
VII. Self-concordance of the Legendre transformation. Let the Hessian of the self-
concordant function F be nondegenerate at some (and then, as we know from II., at any) point.
Then Dom F ∗ = Dom∗ F ∗ is an open convex set, and the function F ∗ is self-concordant on
Dom F ∗ .
Proof. 1°. Let us prove first that Dom F^* = Dom^* F^*. If y ∈ Dom F^*, then, by definition, the
function y^T x − F(x) is bounded from above on Q, or, which is the same, the function F(x) − y^T x
[since g = [F''(x)]^{-1}h]
= 2 (h^T [F''(x)]^{-1} h)^{3/2}.
The latter quantity, due to (L.3), is exactly 2 (h^T (F^*)''(F'(x)) h)^{3/2}, and we come to
|D^3F^*(y)[h, h, h]| ≤ 2 (D^2F^*(y)[h, h])^{3/2}
for all h and all y = F 0 (x) with x ∈ Q. When x runs over Q, y, as we already know, runs
through the whole Dom F ∗ , and we see that (2.1) indeed holds true.
VIII. Existence of minimizer, B. F attains its minimum on Q if and only if there exists
x ∈ Q with λ(F, x) < 1, and for every x with the latter property one has
F(x) − min_Q F ≤ ρ(λ(F, x)). (2.16)
^1 We use the following rule for differentiating the mapping x ↦ B(x) ≡ A^{-1}(x), A(x) being a square nonsingular
matrix smoothly depending on x:
DB(x)[g] = −B(x) DA(x)[g] B(x)
(to get it, differentiate the identity B(x)A(x) ≡ I).
Proof. The ”only if” part is evident: λ(F, x) = 0 at any minimizer x of F . To prove the
”if” part, we, same as in the proof of VI., can reduce the situation to the case when F is
nondegenerate. Let x be such that λ ≡ λ(F, x) < 1, and let y = F 0 (x). In view of (L.3) we have
(the latter relation follows from VIa.). Since λ < 1, we see that 0 belongs to the centered at y
open Dikin ellipsoid of the self-concordant (as we know from VII.) function F ∗ and therefore
(I.) to the domain of this function. From VII. we know that this domain is comprised of values
of the gradient of F at the points of Q; thus, there exists x∗ ∈ Q such that F 0 (x∗ ) = 0, and F
attains its minimum on Q. Furthermore, from (2.4) as applied to F^* and from (2.18) we
have
F ∗ (0) ≤ F ∗ (y) − y T (F ∗ )0 (y) + ρ(λ);
since y = F 0 (x) and 0 = F 0 (x∗ ), we have (see (L.1))
and we come to
−F (x∗ ) ≤ y T x − F (x) − y T x + ρ(λ),
which is nothing but (2.16).
Finally, setting
|h|_y = (h^T (F^*)''(y) h)^{1/2}
and noticing that, by (2.18), |y|y = λ < 1, we get for an arbitrary vector z
Remark 2.2.1 Note how sharp the condition of existence of a minimizer given by VIII. is: for
the function F(x) = − ln x, which is self-concordant on the positive ray and unbounded below, one has
λ(F, x) ≡ 1!
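The computation behind the Remark is one line: F'(x) = −1/x and F''(x) = 1/x^2, so λ(F, x) = |F'(x)|/(F''(x))^{1/2} ≡ 1; the threshold λ(F, x) < 1 of VIII. is never reached, matching the absence of a minimizer. A numerical confirmation (a sketch):

```python
import math

# F(x) = -ln x on x > 0: F'(x) = -1/x, F''(x) = 1/x**2
def newton_decrement(x):
    return abs(-1.0 / x) / math.sqrt(1.0 / x**2)

# lambda(F, x) equals 1 identically on the positive ray
for x in [1e-6, 0.5, 1.0, 42.0, 1e9]:
    assert math.isclose(newton_decrement(x), 1.0, rel_tol=1e-12)
print("lambda(F, x) == 1 on the whole positive ray (sampled)")
```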
IX. Damped Newton method: local quadratic convergence. Let λ(F, ·) be finite, let
x ∈ Q, and let x^+ be the damped Newton iterate of x (see V.). Then
λ(F, x^+) ≤ 2λ^2(F, x). (2.19)
Besides this, if λ(F, x) < 1, then F attains its minimum on Q, and for any minimizer x^* of F
one has
|x − x^*|_{x^*} ≤ λ(F, x)/(1 − λ(F, x)) (2.20)
and
|x − x^*|_x ≤ λ(F, x)/(1 − λ(F, x)). (2.21)
It follows that
DF(x^+)[h] ≡ ψ(r) ≤ ψ(0) + rψ'(0) + |h|_x ∫_0^r {∫_0^t 2(1 − τλ)^{-3}λ^2 dτ} dt =
= ψ(0) + rψ'(0) + |h|_x λ^2 r^2/(1 − λr) =
[the definition of ψ]
= DF(x)[h] + rD^2F(x)[h, e] + |h|_x λ^2 r^2/(1 − λr) =
[see (2.10)]
= (1 − r)DF(x)[h] + |h|_x λ^2 r^2/(1 − λr) =
[the definition of r]
= (λ/(1 + λ)) DF(x)[h] + (λ^2/(1 + λ)) |h|_x ≤
[since DF(x)[h] ≤ λ|h|_x by definition of λ = λ(F, x)]
≤ 2(λ^2/(1 + λ)) |h|_x ≤
[see (2.2) and take into account that |x^+ − x|_x = r|e|_x = rλ]
≤ 2(λ^2/(1 + λ)) (1 − rλ)^{-1} |h|_{x^+} = 2λ^2 |h|_{x^+}.
Thus, for any h ∈ E we have DF (x+ )[h] ≤ 2λ2 |h|x+ , as claimed in (2.19).
2°. Let x ∈ Q be such that λ ≡ λ(F, x) < 1. We already know from VIII. that in this case
F attains its minimum on Q, and that
Let x^* be a minimizer of F on Q and let r = |x − x^*|_{x^*}. From (2.4) applied with x^* in the role of
x and x − x^* in the role of h it follows that
for some x.
B. Given x1 ∈ Q, consider the Damped Newton minimization process given by the recurrence
x_{i+1} = x_i − (1/(1 + λ(F, x_i))) [F''(x_i)]^{-1} F'(x_i). (2.23)
The recurrence keeps the iterates in Q and possesses the following properties
B.1 [relaxation property]
in particular, if λ(F, x_i) is greater than an absolute constant, then the progress in the value of F
at step i is at least another absolute constant; e.g., if λ(F, x_i) ≥ 1/4, then F(x_i) − F(x_{i+1}) ≥
1/4 − ln(5/4) = 0.026856...
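The arithmetic of the example can be checked directly: λ − ln(1 + λ) at λ = 1/4 equals 1/4 − ln(5/4).

```python
import math

# lambda - ln(1 + lambda) at lambda = 1/4:
val = 0.25 - math.log(1.25)
print(round(val, 6))  # 0.026856
assert abs(val - 0.026856) < 1e-6
```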
B.2 [local quadratic convergence] If at certain step i we have λ(F, x_i) ≤ 1/4, then we are in
the region of quadratic convergence of the method, namely, for every j ≥ i we have
λ(F, x_{j+1}) ≤ 2λ^2(F, x_j) [≤ (1/2)λ(F, x_j)], (2.25)
F(x_j) − min_Q F ≤ ρ(λ(F, x_j)) [≤ λ^2(F, x_j)/(2(1 − λ(F, x_j)))], (2.26)
and for the (unique) minimizer x^* of F we have
|x_j − x^*|_{x^*} ≤ λ(F, x_j)/(1 − λ(F, x_j)) (2.27)
and
|x_j − x^*|_{x_j} ≤ λ(F, x_j)/(1 − λ(F, x_j)). (2.28)
C. If F is below bounded, then the Newton complexity (i.e., the # of steps (2.23)) of finding a point
x ∈ Q with λ(F, x) ≤ κ (0 < κ ≤ 0.1) does not exceed the quantity
O(1) [F(x_1) − min_Q F] + ln ln(1/κ). (2.29)
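The ln ln(1/κ) term in (2.29) comes from the recurrence (2.25): once λ ≤ 1/4, repeated squaring drives λ below any κ in a doubly-logarithmic number of steps. A sketch of the count (ours):

```python
# Once lambda <= 1/4, the recurrence lambda <- 2*lambda**2 (cf. (2.25))
# reaches any accuracy kappa in O(ln ln(1/kappa)) steps:
# lambda_k = 2**(-(2**k + 1)) when lambda_0 = 1/4.
def steps_to_reach(kappa, lam=0.25):
    n = 0
    while lam > kappa:
        lam = 2 * lam * lam
        n += 1
    return n

for kappa in [1e-2, 1e-4, 1e-8, 1e-16]:
    # the step count stays tiny even for extreme accuracies
    print(kappa, steps_to_reach(kappa))
assert steps_to_reach(1e-16) <= 8
```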
Then
|A[h_1, ..., h_k]| ≤ α ∏_{i=1}^k B^{1/2}[h_i, h_i]. (2.31)
A k-linear form is called symmetric, if it remains unchanged under every permutation of the
collection of arguments.
Exercise 2.3.1 Prove that any 2-linear form on R^n can be represented as A[h_1, h_2] = h_1^T a h_2
via a certain n × n matrix a. When is the form symmetric? Which of the forms in the above
examples are symmetric?
The restriction of a symmetric k-linear form A[h_1, ..., h_k] onto the "diagonal" h_1 = h_2 = ... =
h_k = h, which is a function of h ∈ R^n, is called a homogeneous polynomial of full degree k on
R^n; the definition coincides with the usual Calculus definition: "a polynomial of n variables is
a finite sum of monomials, every monomial being a constant times a product of nonnegative integer
powers of the variables. A polynomial is called homogeneous of full degree k if the sum of the
powers in every monomial is equal to k".
Exercise 2.3.2 Prove the equivalence of the aforementioned two definitions of a homogeneous
polynomial. What is the 3-linear form on R^2 which produces the polynomial xy^2 ((x, y) being the
coordinates on R^2)?
Of course, you can restrict an arbitrary k-linear form, not necessarily symmetric, onto the diagonal
and get a certain function on E. You will not, however, get anything new: for any k-linear form
A[h_1, ..., h_k] there exists a symmetric k-linear form A^S[h_1, ..., h_k] with the same restriction on
the diagonal:
A[h, ..., h] ≡ A^S[h, ..., h], h ∈ E;
to get A^S, it suffices to take the average, over all permutations σ of the k-element index set, of the
forms A_σ[h_1, ..., h_k] = A[h_{σ(1)}, ..., h_{σ(k)}].
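The averaging recipe just described can be carried out mechanically when the form is given by its coefficient tensor (a sketch of ours; a 3-linear form on R^n is stored as an n×n×n array, and permuting the arguments amounts to transposing the axes):

```python
import itertools
import numpy as np

def symmetrize(T):
    """Average a k-linear form (given as a k-dimensional coefficient
    array) over all permutations of its arguments."""
    k = T.ndim
    perms = list(itertools.permutations(range(k)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3, 3))   # a generic non-symmetric 3-linear form
S = symmetrize(T)

# S is symmetric in its arguments ...
assert np.allclose(S, np.transpose(S, (1, 0, 2)))
assert np.allclose(S, np.transpose(S, (2, 1, 0)))

# ... and has the same restriction to the diagonal: A[h,h,h] = A^S[h,h,h]
h = rng.standard_normal(3)
a  = np.einsum('ijk,i,j,k->', T, h, h, h)
aS = np.einsum('ijk,i,j,k->', S, h, h, h)
assert np.isclose(a, aS)
print("diagonal values agree")
```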
2.3. EXERCISES: AROUND SYMMETRIC FORMS 37
From the polylinearity of a k-linear form A[h_1, ..., h_k] it follows that its value at a
collection of linear combinations
h_i = Σ_{j∈J} a_{i,j} u_{i,j}, i = 1, ..., k,
can be expanded as
A[h_1, ..., h_k] = Σ_{j_1,...,j_k∈J} a_{1,j_1} a_{2,j_2} ... a_{k,j_k} A[u_{1,j_1}, u_{2,j_2}, ..., u_{k,j_k}];
this is nothing but the usual rule for "opening the parentheses". In particular, A[·] is uniquely
defined by its values on the collections comprised of basis vectors e_1, ..., e_n:
A[h_1, ..., h_k] = Σ_{1≤j_1,...,j_k≤n} h_{1,j_1} h_{2,j_2} ... h_{k,j_k} A[e_{j_1}, e_{j_2}, ..., e_{j_k}],
h_{i,j} being the j-th coordinate of the vector h_i with respect to the basis. It follows that a polylinear
form is a continuous (even C^∞) function of its arguments.
A symmetric bilinear form A[h_1, h_2] is called positive semidefinite if the corresponding homogeneous polynomial is nonnegative, i.e., if A[h, h] ≥ 0 for all h. A symmetric positive semidefinite bilinear form satisfies all the requirements imposed on an inner product, except, possibly, the
nondegeneracy requirement "the square of a nonzero vector is nonzero". If this requirement also is
satisfied, i.e., if A[h, h] > 0 whenever h ≠ 0, then A[h_1, h_2] defines a Euclidean structure on
E. As we know from Exercise 2.3.1, a bilinear form on R^n can always be represented by an
n × n matrix a as h_1^T a h_2; the form is symmetric if and only if a = a^T, and is symmetric positive
(semi)definite if and only if a is a symmetric positive (semi)definite matrix.
A symmetric k-linear form produces, as we know, a uniquely defined homogeneous polyno-
mial of degree k. It turns out that the polynomial ”remembers everything” about the related
k-linear form:
Exercise 2.3.3 + Prove that for every k there exist:
• an integer m,
• real weights w_l, l = 1, ..., m,
• real scale factors r_{i,l}, i = 1, ..., k, l = 1, ..., m,
with the following property: for any n and any k-linear symmetric form A[h_1, ..., h_k] on R^n,
identically in h_1, ..., h_k one has
A[h_1, ..., h_k] = Σ_{l=1}^m w_l A[Σ_{i=1}^k r_{i,l}h_i, Σ_{i=1}^k r_{i,l}h_i, ..., Σ_{i=1}^k r_{i,l}h_i].
In other words, A can be restored, in a linear fashion, via its restriction on the diagonal.
Find a set of scale factors and weights for k = 2 and k = 3.
Now let us come to the proof of (P). Of course, it suffices to consider the case when B is
positive definite rather than semidefinite (replace B[h_1, h_2] with B_ε[h_1, h_2] = B[h_1, h_2] + ε h_1^T h_2,
ε > 0, thus making B positive definite and preserving the assumption (2.30); given that (P)
is valid for positive definite B, we would know that (2.31) is valid for B replaced with B_ε and
would be able to pass to the limit as ε → 0). Thus, from now on we assume that B is symmetric
38 CHAPTER 2. SELF-CONCORDANT FUNCTIONS
positive definite. In this case B[h1 , h2 ] can be taken as an inner product on Rn , and in the
associated ”metric” terms (P) reads as follows:
(P’): let | · | be a Euclidean norm on Rn , A[h1 , ..., hk ] be a k-linear symmetric form on Rn such
that
|A[h, ..., h]| ≤ α|h|k , h ∈ Rn .
Then
|A[h1 , ..., hk ]| ≤ α|h1 |...|hk |, h1 , ..., hk ∈ Rn .
Now, due to homogeneity of A with respect to every hi , to prove the conclusion in (P’) is
the same as to prove that |A[h1 , ..., hk ]| ≤ α whenever |hi | ≤ 1, i = 1, ..., k. Thus, we come to
the following equivalent reformulation of (P'):
prove that for a k-linear symmetric form A[h_1, ..., h_k] one has
max_{|h|≤1} |A[h, ..., h]| = max_{|h_1|≤1,...,|h_k|≤1} |A[h_1, ..., h_k]|. (2.32)
Note that from Exercise 2.3.3 it immediately follows that the right hand side of (2.32) is
majorized by a constant times the left hand side, with the constant depending on k only. For
this latter statement it is completely unimportant whether the norm |·| in question is or is
not Euclidean. The point, however, is that in the case of a Euclidean norm the aforementioned
constant factor can be set to 1. This is something which should be "common knowledge";
surprisingly, I was unable to find anywhere even the statement, let alone the proof. I
do not think that the proof presented in the remaining exercises is the simplest one, and you
are welcome to find something better. We shall prove (2.32) by induction on k.
Exercise 2.3.4 Prove the base, i.e., that (2.32) holds true for k = 2.
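For k = 2 the statement reduces to a familiar linear-algebra fact: for a symmetric matrix a, the maximum of |h^T a h| over unit h and the maximum of |h_1^T a h_2| over unit h_1, h_2 are both equal to the largest |eigenvalue| of a. A numerical confirmation (a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
a = (M + M.T) / 2                       # symmetric form A[h1, h2] = h1^T a h2

eigs = np.linalg.eigvalsh(a)
diag_max = np.max(np.abs(eigs))         # max |A[h, h]| over |h| = 1
full_max = np.linalg.norm(a, 2)         # max |A[h1, h2]| over |h1| = |h2| = 1
                                        # (spectral norm = max |eigenvalue|
                                        #  for a symmetric matrix)
assert np.isclose(diag_max, full_max)   # (2.32) for k = 2
print(float(diag_max))
```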
Now assume that (2.32) is valid for k = l − 1 and any (l − 1)-linear symmetric form A, and let us
prove that it is valid also for k = l.
Let us fix a symmetric l-linear form A, and let us call a collection T = {T_1, ..., T_l} of one-dimensional subspaces of R^n an extremal, if for some (and then for each) choice of unit vectors
e_i ∈ T_i one has
|A[e_1, ..., e_l]| = ω ≡ max_{|h_1|=...=|h_l|=1} |A[h_1, ..., h_l]|.
Clearly, extremals exist (we have seen that A[·] is continuous). Let T be the set of all extremals.
To prove (2.32) is the same as to prove that T contains an extremal of the type {T, ..., T }.
Exercise 2.3.5 #+ Let {T1 , ..., Tl } ∈ T and T1 6= T2 . Let ei ∈ Ti be unit vectors, h = e1 +
e2 , q = e1 − e2 . Prove that then both {Rh, Rh, T3 , ..., Tl } and {Rq, Rq, T3 , ..., Tl } are extremals.
Let T_* be the subset of T formed by the extremals of the type {T, ..., T, S, ..., S} (t copies of
T and s copies of S, t and s depending on the extremal). By virtue of the inductive assumption, T_* is nonempty
(in fact, T_* contains an extremal of the type {T, ..., T, S}). For T = {T, ..., T, S, ..., S} ∈ T_* let
α(T) denote the angle (from [0, π/2]) between T and S.
Exercise 2.3.6 #+ Prove that if T = {T, ..., T, S, ..., S} is an extremal of the aforementioned
"2-line" type, then there exists an extremal T' of the same type with α(T') ≤ (1/2)α(T). Derive
from this observation that there exists a 2-line extremal with α(T) = 0, i.e., of the type {T, ..., T},
and thus complete the inductive step.
Exercise 2.3.7 * Let A[h_1, ..., h_k], h_1, ..., h_k ∈ R^n, be a mapping taking values in a certain R^l
which is linear with respect to every argument and invariant under permutations of the arguments,
and let B[h_1, h_2] be a symmetric positive semidefinite bilinear scalar form on R^n such that
Self-concordant barriers
We have introduced and studied the notion of a self-concordant function for an open convex
domain. To complete the development of our technical tools, we should investigate a specific subfamily of
this family - self-concordant barriers.
Recall that self-concordance is, basically, Lipschitz continuity of the Hessian of F with respect
to the local Euclidean metric defined by the Hessian itself. Similarly, (3.1) says that F should
be Lipschitz continuous, with constant ϑ1/2 , with respect to the same local metric.
Recall also that the quantity
was called the Newton decrement of F at x; this quantity played a crucial role in our investigation
of self-concordant functions. Relation (3.1) means exactly that the Newton decrement of F
should be bounded from above, independently of x, by a certain constant; the square of this
constant is called the parameter of the barrier.
Let us point out preliminary examples of self-concordant barriers. To this end let us look at
the basic examples of self-concordant functions given in the previous lecture.
It can be proved that a constant is the only self-concordant barrier for the whole space, and
the only self-concordant barrier with the value of the parameter less than 1. In what follows we
never deal with the trivial - constant - barrier, so that you should remember that the parameters
of barriers in question will always be ≥ 1.
In connection with the above trivial example, note that the self-concordant functions on
the whole space known to us - the linear and the convex quadratic ones - are not self-concordant barriers,
provided that they are nonconstant. This claim follows from the aforementioned general fact
that the only self-concordant barrier for the whole space is a constant; it can also be easily
verified directly.
Another basic example of a self-concordant function known to us is more productive:
Example 3.1.2 The function F (x) = − ln x is a self-concordant barrier with parameter 1 for
the non-negative ray.
This is seen from an immediate computation.
The number of examples can be immediately increased, due to the following simple combi-
nation rules (completely similar to those for self-concordant functions):
Proposition 3.1.1 (i) [stability with respect to affine substitutions of argument] Let F be a
ϑ-self-concordant barrier for G ⊂ R^n and let x = Ay + b be an affine mapping from R^k to R^n with
the image intersecting int G. Then the inverse image of G under the mapping, i.e., the set
G^+ = {y ∈ R^k | Ay + b ∈ G},
is a closed convex domain in R^k, and the function
F^+(y) = F(Ay + b)
is a ϑ-self-concordant barrier for G^+.
[Cauchy's inequality]
≤ [Σ_{i=1}^m α_i ϑ_i]^{1/2} [Σ_{i=1}^m α_i D^2F_i(x)[h, h]]^{1/2} = [Σ_{i=1}^m α_i ϑ_i]^{1/2} [D^2F(x)[h, h]]^{1/2},
as required.
An immediate consequence of our combination rules is as follows (cf. Corollary 2.1.1):
3.2. PROPERTIES OF SELF-CONCORDANT BARRIERS 43
In other words, to find π_x(y), consider the ray [x, y) and look where this ray intersects the
boundary of G. If the intersection point y' exists, then π_x(y) is the length of the segment [x, y]
divided by the length of the segment [x, y']; if the ray [x, y) is contained in G, then π_x(y) = 0.
Note that the Minkowski function is convex, continuous and positive homogeneous; besides
this, it is zero at x, is ≤ 1 in G, equals 1 on the boundary of G and is > 1 outside G. Note
that this function is in fact defined in purely affine terms (the lengths of segments are, of course,
metric notions, but the ratio of lengths of parallel segments is metric-independent).
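For a polyhedral G = {u | Au ≤ b} the Minkowski function is computable in closed form (a sketch of ours): the ray x + t(y − x) stays in G as long as t · a_i^T(y − x) ≤ b_i − a_i^T x for every i, so π_x(y) is the largest of the ratios a_i^T(y − x)/(b_i − a_i^T x) over the rows with a_i^T(y − x) > 0, and 0 if there are none.

```python
import numpy as np

def minkowski(A, b, x, y):
    """pi_x(y) for G = {u : A u <= b}; x must be an interior point."""
    d = A @ (y - x)            # growth of each constraint along the ray
    s = b - A @ x              # positive slacks at the pole x
    rates = d[d > 0] / s[d > 0]
    if rates.size == 0:        # the ray [x, y) is contained in G
        return 0.0
    return float(np.max(rates))  # = 1 / T, T the boundary parameter

# Unit box [-1, 1]^2, pole at the center
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x = np.zeros(2)

print(minkowski(A, b, x, np.array([0.5, 0.0])))   # interior point: 0.5
print(minkowski(A, b, x, np.array([1.0, 1.0])))   # boundary (corner): 1.0
print(minkowski(A, b, x, np.array([4.0, 0.0])))   # outside: 4.0
```

The three printed values illustrate the trichotomy stated above: < 1 inside, = 1 on the boundary, > 1 outside.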
Now let us switch to properties of self-concordant barriers.
0. Explosure property: Let x ∈ int G and let y be such that DF (x)[y − x] > 0. Then
π_x(y) ≥ γ ≡ DF(x)[y − x]/ϑ, (3.2)
so that the point x + γ −1 (y − x) is not an interior point of G.
Proof. Let
φ(t) = F (x + t(y − x)) : ∆ → R,
where ∆ = [0, T ) is the largest half-interval of the ray t ≥ 0 such that x + t(y − x) ∈ int G
whenever t ∈ ∆. Note that the function φ is three times continuously differentiable on ∆ and
that
T = π_x^{-1}(y) (3.3)
(the definition of the Minkowski function; here 0^{-1} = +∞).
From the fact that F is a ϑ-self-concordant barrier for G it immediately follows (see Proposition
3.1.1.(i)) that
|φ'(t)| ≤ ϑ^{1/2} (φ''(t))^{1/2},
or, which is the same,
ϑψ 0 (t) ≥ ψ 2 (t), t ∈ ∆, (3.4)
where ψ(t) = φ0 (t). Note that ψ(0) = DF (x)[y − x] is positive by assumption and ψ is nonde-
creasing (as the derivative of a convex function), so that ψ is positive on ∆. From (3.4) and the
relation ψ(0) > 0 it follows that ϑ > 0. In view of the latter relation and since ψ(·) > 0, we can
rewrite (3.4) as
(−ψ^{-1}(t))' ≡ ψ'(t)ψ^{-2}(t) ≥ ϑ^{-1},
whence
ψ(t) ≥ ϑψ(0)/(ϑ − tψ(0)), t ∈ ∆. (3.5)
The left hand side of the latter relation is bounded on any segment [0, T'], 0 < T' < T, and we
conclude that
T ≤ ϑ/ψ(0).
Recalling that T = πx−1 (y) and that ψ(0) = DF (x)[y − x], we come to (3.2).
I. Semiboundedness. For any x ∈ int G and y ∈ G one has
DF (x)[y − x] ≤ ϑ. (3.6)
Proof. The relation is evident in the case of DF (x)[y − x] ≤ 0; for the case DF (x)[y − x] > 0
the relation is an immediate consequence of (3.2), since πx (y) ≤ 1 whenever y ∈ G.
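Semiboundedness can be probed numerically (a sketch of ours): for the barrier F(x) = − ln(1 − x) − ln(1 + x) of the segment G = [−1, 1], the parameter is ϑ = 2 (a sum of two parameter-1 barriers, cf. Proposition 3.1.1), and DF(x)[y − x] = F'(x)(y − x) indeed never exceeds 2 over a grid of interior x and y ∈ G.

```python
import numpy as np

# theta = 2 barrier for G = [-1, 1]
F1 = lambda x: 1/(1 - x) - 1/(1 + x)   # derivative of -ln(1-x) - ln(1+x)
theta = 2.0

xs = np.linspace(-0.999, 0.999, 400)   # interior points x
ys = np.linspace(-1.0, 1.0, 400)       # arbitrary points y of G

vals = F1(xs)[:, None] * (ys[None, :] - xs[:, None])   # DF(x)[y - x]
print(vals.max())                       # stays below theta = 2
assert vals.max() <= theta + 1e-9
```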
II. Upper bound. Let x, y ∈ int G. Then
F(y) ≤ F(x) + ϑ ln(1/(1 − π_x(y))). (3.7)
π_{x+t(y−x)}(y) = (1 − t)π_x(y)/(1 − tπ_x(y));
whence
(1 − t) DF(x + t(y − x))[y − x] ≤ ϑ (1 − t)π_x(y)/(1 − tπ_x(y)),
or
DF(x + t(y − x))[y − x] ≤ ϑ π_x(y)/(1 − tπ_x(y)).
Integrating over t ∈ [0, 1], we come to
F(y) − F(x) ≤ ϑ ln(1/(1 − π_x(y))),
as required.
III. Lower bound. Let x, y ∈ int G. Then
F(y) ≥ F(x) + DF(x)[y − x] + ln(1/(1 − π_x(y))) − π_x(y). (3.8)
Proof. Let φ(t) = F(x + t(y − x)), −T_− < t < T ≡ π_x^{-1}(y), where T_− is the largest t such that
x − t(y − x) ∈ G. By Proposition 3.1.1.(i), φ is a self-concordant barrier for ∆ = [−T_−, T], and
therefore this function is self-concordant on ∆; the closed unit Dikin ellipsoid of φ centered at
t ∈ int ∆ should therefore belong to the closure of ∆ (Lecture 2, I.), which means that
V = {y + h | |h|x ≤ (1 − πx (y))}
is an | · |x -ball centered at y and at the same time V ⊂ G (since W ⊂ G and the dilation maps
G into itself). From the semiboundedness property I. it follows that
DF (y)[h] ≤ ϑ ∀h : y + h ∈ G,
Proof. As we know from Lecture 2, II., the recessive subspace EF of any self-concordant
function is also the recessive subspace of its domain: int G + EF = int G. Therefore if G does
not contain lines, then EF = {0}, so that F is nondegenerate. Vice versa, if G contains a line
with direction h, then y = x + th ∈ int G for all x ∈ int G and all t ∈ R, and from semiboundedness
(see I.) it immediately follows that DF (x)[y − x] = DF (x)[th] ≤ ϑ for all x ∈ int G and all
t ∈ R, which implies that DF (x)[h] = 0. Thus, F is constant along the direction h at any point
of int G, so that D2 F (x)[h, h] = 0 and therefore F is degenerate.
From now on assume that G does not contain lines. If G is bounded, then F , of course,
attains its minimum on int G due to the standard compactness reasons. Now assume that F
attains its minimum on int G; due to nondegeneracy, the minimizer x∗F is unique. Let W be the
closed unit Dikin ellipsoid of F centered at x∗F ; as we know from I., Lecture 2, it is contained
in G (recall that G is closed). Let us prove that the (ϑ + 2√ϑ) times larger concentric ellipsoid
W + contains G; this will result both in the boundedness of G and in the announced centering
property and therefore will complete the proof.
Lemma 3.2.1 Let x ∈ int G and let h be an arbitrary direction with |h|x = 1 such that
DF (x)[h] ≥ 0. Then the point x + (ϑ + 2√ϑ)h is outside the interior of G.
Note that Lemma 3.2.1 immediately implies the desired inclusion G ⊂ W + , since when x = x∗F
is the minimizer of F , so that DF (x)[h] = 0 for all h, the premise of the lemma is valid for any
h with |h|x = 1.
Proof of Lemma. Let φ(t) = D2 F (x + th)[h, h] and T = sup{t | x + th ∈ G}. From
self-concordance of F it follows that |φ ′ (t)| ≤ 2φ 3/2 (t), whence

|(φ −1/2 (t)) ′ | ≤ 1,

so that

1/√φ(t) − 1/√φ(0) ≤ t, 0 ≤ t < T.

In view of φ(0) = |h|2x = 1 we come to

φ(t) ≥ (1 + t)−2 , 0 ≤ t < T,
whence DF (x + rh)[h] = DF (x)[h] + ∫0^r φ(s) ds ≥ r/(1 + r) for 0 ≤ r < T . Applying I. to the
points x + rh ∈ int G and x + th ∈ G, 0 ≤ r ≤ t < T , we get (t − r)DF (x + rh)[h] ≤ ϑ, whence

t ≤ r + (1 + r)ϑ/r.                     (3.12)
Taking here r = 1/2, we get a certain upper bound on t; thus, T ≡ sup{t | x + th ∈ G} < ∞,
and (3.12) is valid for t = T . If T > √ϑ, then (3.12) is valid for t = T , r = √ϑ, and we come to

T ≤ ϑ + 2√ϑ;                     (3.13)

this latter inequality is, of course, valid in the case of T ≤ √ϑ as well. Thus, T always satisfies
(3.13). By construction, x + T h is not an interior point of G, and, consequently, x + [ϑ + 2√ϑ]h
also is not an interior point of G, as claimed.
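Lemma 3.2.1 can be illustrated on the simplest example (a sketch; the ϑ = 1 barrier F(u) = −ln u of G = [0, +∞) and the particular point and direction are illustrative choices):

```python
import math

# Illustration of Lemma 3.2.1 for F(u) = -ln(u) on G = [0, +inf), theta = 1:
# at x = 1 the direction h = -1 has |h|_x = sqrt(F''(1))*|h| = 1 and
# DF(x)[h] = F'(1)*h = 1 >= 0, so x + (theta + 2*sqrt(theta))*h must leave G.
theta, x, h = 1.0, 1.0, -1.0
assert abs(math.sqrt(1.0 / x**2) * abs(h) - 1.0) < 1e-12   # |h|_x = 1
assert (-1.0 / x) * h >= 0.0                                # DF(x)[h] >= 0
assert x + (theta + 2.0 * math.sqrt(theta)) * h < 0.0       # outside int G
```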
Proof. Let x ∈ int G; since h is a recessive direction, y = x + th ∈ G for all t > 0, and I.
implies that DF (x)[y−x] = DF (x)[th] ≤ ϑ for all t ≥ 0, whence DF (x)[h] ≤ 0; thus, F indeed is
nonincreasing in the direction h at any point x ∈ int G. To prove (3.14), consider the restriction
f (t) of F onto the intersection of the line x + Rh with G. Since h is a recessive direction for G,
the domain of f is a certain ray ∆ of the type (−a, ∞), a > 0. According to Proposition 3.1.1.(i),
f is a self-concordant barrier for the ray ∆. It is possible that f is degenerate: Ef ≠ {0}. Since
f is a function of one variable, this is possible only if ∆ = Ef = R (see II., Lecture 2), so that
f ′′ ≡ 0; in this case (3.14) is an immediate consequence of the already proved nonnegativity of
the left hand side of the relation. Now assume that f is nondegenerate. In view of V., f does not
attain its minimum on ∆ (since f is a nondegenerate self-concordant barrier for an unbounded
domain). From VIII., Lecture 2, we conclude that λ(f, t) ≥ 1 for all t ∈ ∆. Thus,
this is nothing but the (semi)norm of h associated with the symmetrization of G with respect
to x, i.e., the norm with the unit ball
Gx = {y ∈ E | x ± y ∈ G}.
One has

px (h) ≤ |h|x ≤ (ϑ + 2√ϑ)px (h).                     (3.15)
Proof. The first inequality in (3.15) is evident: we know that the closed unit Dikin ellipsoid
of F centered at x is contained in G (since F is self-concordant and G is closed, see I., Lecture
2). In other words, G contains the unit | · |x -ball Ŵ1 (x) centered at x; by definition, the unit
px (·)-ball centered at x is the largest symmetric with respect to x subset of G and therefore it
contains the set Ŵ1 (x), which is equivalent to the left inequality in (3.15). To prove the right
inequality is the same as to demonstrate that if |h|x = 1, then px (h) ≥ (ϑ + 2√ϑ)−1 , or,
which is the same in view of the origin of p, that at least one of the two vectors x ± (ϑ + 2√ϑ)h
does not belong to the interior of G. Without loss of generality, let us assume that DF (x)[h] ≥ 0
(if it is not the case, one should replace in what follows h with −h). The pair x, h satisfies the
premise of Lemma 3.2.1, and this lemma says to us that the vector x + (ϑ + 2√ϑ)h indeed does
not belong to the interior of G.
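A numeric spot check of (3.15) (a sketch; the ϑ = 2 barrier F(u) = −ln(1 − u) − ln(1 + u) of the segment G = [−1, 1] and the helper `norms` are illustrative choices):

```python
import math

# Check p_x(h) <= |h|_x <= (theta + 2*sqrt(theta)) p_x(h)   (3.15)
# for F(u) = -ln(1-u) - ln(1+u) on G = [-1, 1], a theta = 2 barrier.
theta = 2.0

def norms(x, h=1.0):
    hx = abs(h) * math.sqrt(1.0/(1.0 - x)**2 + 1.0/(1.0 + x)**2)  # |h|_x
    px = abs(h) / min(1.0 - x, 1.0 + x)  # norm of the symmetrization of G at x
    return px, hx

for x in [-0.9, -0.3, 0.0, 0.6]:
    px, hx = norms(x)
    assert px <= hx + 1e-12
    assert hx <= (theta + 2.0*math.sqrt(theta)) * px + 1e-12
```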
VII. Compatibility of Hessians. Let x, y ∈ int G. Then for any h ∈ E one has

D2 F (y)[h, h] ≤ ((ϑ + 2√ϑ)/(1 − πx (y)))2 D2 F (x)[h, h].                     (3.16)
Now, if |h|x ≤ 1, then x + h ∈ G (since the closed unit Dikin ellipsoid of F centered at x is
contained in G), so that the point y + (1 − πx (y))h
belongs to G. We conclude that the | · |x -ball of radius 1 − πx (y) centered at y is contained
in G and therefore is contained in the largest symmetric with respect to y subset of G; in other
words, we have

|h|x ≤ 1 − πx (y) ⇒ py (h) ≤ 1,

or, which is the same,

py (h) ≤ [1 − πx (y)]−1 |h|x ∀h.

Combining this inequality with (3.15), we come to (3.16).
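(3.16) as well admits a quick numeric test (a sketch, on the same illustrative ϑ = 2 segment barrier used above):

```python
import math

# Check D^2F(y)[h,h] <= ((theta + 2*sqrt(theta))/(1 - pi_x(y)))^2 * D^2F(x)[h,h]
# for F(u) = -ln(1-u) - ln(1+u) on G = [-1, 1] (theta = 2).
theta = 2.0
D2F = lambda u: 1.0/(1.0 - u)**2 + 1.0/(1.0 + u)**2

def pi(x, y):
    # Minkowski function of G = [-1, 1] with the pole at x
    if y == x:
        return 0.0
    t = (1.0 - x)/(y - x) if y > x else (-1.0 - x)/(y - x)
    return 1.0/t

for x in [-0.5, 0.0, 0.7]:
    for y in [-0.9, 0.2, 0.9]:
        factor = ((theta + 2.0*math.sqrt(theta))/(1.0 - pi(x, y)))**2
        assert D2F(y) <= factor * D2F(x) + 1e-9
```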
We have established the main properties of self-concordant barriers; these properties, along
with the already known properties of general self-concordant functions, underlie all our
further developments. Let me conclude with a statement of another type:
ϑ ≤ O(1)n,
O(1) being an appropriate absolute constant. If G does not contain lines, then the above barrier
is given by
F (x) = O(1) ln Vol{Px (G)},
where O(1) is an appropriate absolute constant, Vol is the n-dimensional volume and
Px (G) = {ξ | ξ T (z − x) ≤ 1 ∀z ∈ G}
fi (x) ≤ 0, i = 1, ..., m,
Prove that if m > 2n, then one can eliminate from the system at least one inequality in such a
way that the remaining system still defines a bounded domain.
2) Derive from 1) that if {Gα }α∈I are closed convex domains in Rn with bounded and
nonempty intersection, then there exists an at most 2n-element subset I 0 of the index set I such
that the intersection of the sets Gα over α ∈ I 0 also is bounded.
Note that the requirement m > 2n in the latter exercise is sharp, as is immediately demon-
strated by the n-dimensional cube.
Exercise 3.3.3 #+ Prove that the function
F (x) = − ln Det x
Those who are not afraid of computations are kindly asked to solve the following
Exercise 3.3.4 Let
K = {(t, x) ∈ R × Rn | t ≥ |x|2 }
be the "ice cream" cone. Prove that the function
My congratulations if you have solved the latter exercise! In the mean time we shall develop a
technique which will allow us to demonstrate self-concordance of numerous barriers (including
those given by the three previous exercises) without any computations; those who have solved
Exercises 3.3.1 - 3.3.4, especially the latter one, will, I believe, appreciate this technique.
Now let us switch to another topic. As it was announced in Lecture 1 and as we shall see in
the mean time, the value of the parameter of a self-concordant barrier is something extremely
important: this quantity is responsible for the Newton complexity (i.e., # of Newton steps) of
finding an ε-solution by the interior point methods associated with the barrier. This is why it
is interesting to realize what the value of the parameter could be.
Let us come to the statement announced in the beginning of Lecture 3:
(P): Let F be a ϑ-self-concordant barrier for a closed convex domain G ⊂ Rn . Then either G = Rn
and F = const, or G is a proper subset of Rn and ϑ ≥ 1.
Exercise 3.3.5 #∗ Prove that the only self-concordant barrier for Rn is constant.
Exercise 3.3.6 #∗ Prove that if ∆ is a segment with a nonempty interior on the axis which
differs from the whole axis and f is a ϑ-self-concordant barrier for ∆, then ϑ ≥ 1. Using this
observation, complete the proof of (P).
(P) says that the parameter of any self-concordant barrier for a nontrivial (differing from the
whole space) convex domain G is ≥ 1. This lower bound can be extended as follows:
(Q) Let G be a closed convex domain in Rn and let u be a boundary point of G. Assume that
there is a neighbourhood U of u where G is given by m independent inequalities, i.e., there exist
m continuously differentiable functions g1 , ..., gm on U such that

G ∩ U = {x ∈ U | gj (x) ≤ 0, j = 1, ..., m},

and the gradients of gj at u are linearly independent. Then the parameter ϑ of any self-
concordant barrier F for G is at least m.
We are about to prove (Q). This is not that difficult, but to make the underlying construction
clear, let us start with the case of a simple polyhedral cone.
Now let us look at (Q). Under the premise of this statement G locally is similar to the above
polyhedral cone; to make the similarity more explicit, let us translate G to make u the origin
and let us choose the coordinates in Rn in such a way that the gradients of gj at the origin,
taken with respect to these coordinates, will be simply the first m basic orths. Thus, we come to
the situation when G contains the origin and in certain neighbourhood U of the origin is given
by
G ∩ U = {x ∈ U | xi ≥ hi (x), i = 1, ..., m},
where hi are continuously differentiable functions such that hi (0) = 0, h′i (0) = 0.
Those who have solved the latter exercise understand that what we need in order to
prove (Q) is a certain version of (3.17), something like

−r (∂/∂xi )F (x(r)) ≥ 1 − α(r), i = 1, ..., m,                     (3.18)

where x(r) is the vector with the first m coordinates equal to r > 0 and the remaining ones
equal to 0, and α(r) → 0 as r → +0.
A relation of the type (3.18) does exist, as is seen from the following exercise:
Exercise 3.3.8 #+ Let f (t) be a ϑ-self-concordant barrier for an interval ∆ = [−a, 0], 0 < a ≤
+∞, of the real axis. Assume that t < 0 is such that the point γt belongs to ∆, where

γ > (√ϑ + 1)2 .

Prove that

−f ′ (t)t ≥ 1 − (√ϑ + 1)2 /γ.                     (3.19)

Derive from this fact that if F is a ϑ-self-concordant barrier for G ⊂ Rn , z is a boundary point
of G and x is an interior point of G such that z + γ(x − z) ∈ G with γ > (√ϑ + 1)2 , then

DF (x)[z − x] ≥ 1 − (√ϑ + 1)2 /γ.                     (3.20)
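Relation (3.19) can be spot-checked numerically (a sketch; the ϑ = 2 barrier f(t) = −ln(−t) − ln(t + a) of ∆ = [−a, 0] is an illustrative choice):

```python
import math

# Spot check of (3.19): -f'(t)*t >= 1 - (sqrt(theta)+1)^2/gamma whenever
# gamma*t belongs to Delta = [-a, 0], for the theta = 2 barrier
# f(t) = -ln(-t) - ln(t + a).
theta, a = 2.0, 1.0
fprime = lambda t: -1.0/t - 1.0/(t + a)

gamma = 10.0
assert gamma > (math.sqrt(theta) + 1.0)**2
for t in [-0.001, -0.01, -0.05, -a/gamma]:
    assert gamma * t >= -a                 # the point gamma*t belongs to Delta
    assert -fprime(t)*t >= 1.0 - (math.sqrt(theta) + 1.0)**2/gamma
```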
The results on self-concordant functions and self-concordant barriers allow us to develop the first
polynomial interior point scheme - the path-following one; on the qualitative level, the scheme
was presented in Lecture 1.
4.1 Situation
Let G ⊂ Rn be a closed and bounded convex domain, and let c ∈ Rn , c 6= 0. In what follows
we deal with the problem of minimizing the linear objective cT x over the domain, i.e., with the
problem
P : minimize cT x s.t. x ∈ G.
I shall refer to problem P as a convex programming problem in the standard form. This
indeed is a universal format of a convex program, since a general-type convex problem

minimize f (u) s.t. gj (u) ≤ 0, j = 1, ..., m, u ∈ H,

associated with convex continuous functions f , gj on a closed convex set H always can be
rewritten as a standard problem; to this end it clearly suffices to set

x = (t, u), c = (1, 0, 0, ..., 0)T , G = {(t, u) | u ∈ H, gj (u) ≤ 0, j = 1, ..., m, f (u) − t ≤ 0}.
The feasible domain G of the equivalent standard problem is convex and closed; passing, if
necessary, to the affine hull of G, we enforce G to be a domain. In our standard formulation, G
is assumed to be bounded, which is not always the case, but the boundedness assumption is not
so crucial from the practical viewpoint, since we can approximate the actual problem with an
unbounded G by a problem with bounded feasible domain, adding, say, the constraint |x|2 ≤ R
with large R.
Thus, we may focus on the case of a problem in the standard form P. What we need to solve
P by an interior point method is a ϑ-self-concordant barrier for the domain, and in what follows
we assume that we are given such a barrier, let it be called F . The exact meaning of the words
"we know F " is that, given x ∈ int G, we are able to compute the value, the gradient and the
Hessian of the barrier at x.
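For a concrete instance, here is a minimal sketch of such a barrier "oracle" for the case G = {x | Ax ≤ b} with the standard logarithmic barrier (the function name and the polyhedral form of G are assumptions of the illustration; the formulas for F′ and F′′ are the usual ones for the log-barrier):

```python
import numpy as np

# Barrier "oracle" for G = {x : A x <= b} with the standard logarithmic
# barrier F(x) = -sum_j ln(b_j - a_j^T x); given an interior point x it
# returns the value, the gradient and the Hessian of F, which is all the
# path-following scheme ever asks for.
def barrier_oracle(A, b, x):
    s = b - A @ x                          # slacks; positive iff x in int G
    assert np.all(s > 0), "x is not an interior point of G"
    val = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                 # F'(x)  = sum_j a_j / s_j
    hess = (A.T * (1.0 / s**2)) @ A        # F''(x) = sum_j a_j a_j^T / s_j^2
    return val, grad, hess
```

E.g., for the unit box in R2 (rows ±ei , b = 1) at x = 0 the oracle returns F = 0, F′ = 0, F′′ = 2I.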
The indicated rules specify the method, up to the initialization rule - where to take the very
first pair (t0 , x0 ) satisfying the closeness relation; in the mean time we will come to this latter
issue. What we are interested in now are the convergence and the complexity properties of the
method.
Proposition 4.4.2 [Newton complexity of a step] The updating recurrency (4.3) is well-defined,
i.e., it keeps the iterates in int G and terminates after finitely many steps; the Newton complexity
of the recurrency, i.e., the # of Newton steps (4.3) before termination, does not exceed a certain
constant N which depends on the path tolerance κ and the penalty rate γ only.
Proof. As we have mentioned in the previous proof, the function Fti+1 is self-concordant on
int G and is below bounded on this set (since G is bounded). Therefore the damped Newton
method does keep the iterates y l in int G and ensures the stopping criterion λ(Fti+1 , y l ) ≤ κ
after a finite number of steps (IX., Lecture 2). What we should prove is the fact that the
Newton complexity of the updating is bounded from above by something depending solely on
the path tolerance and the penalty rate. To make clear why it is important here that F is a
self-concordant barrier rather than an arbitrary self-concordant function, let us start with the
following reasoning.
We already have associated with a point x ∈ int G the Euclidean norm
|h|x = √(hT F ′′ (x)h) ≡ √(hT Ft′′ (x)h);
in our case F is nondegenerate, so that | · |x is an actual norm, not a seminorm. Let | · |∗x be the
conjugate norm:
|u|∗x = max{uT h | |h|x ≤ 1}.
By definition of the Newton decrement,

λ(Ft , x) = |tc + F ′ (x)|∗x ,

and similarly

λ(F, x) = |F ′ (x)|∗x .                     (4.10)
Now, (ti , xi ) satisfies the closeness relation λ(Fti , xi ) ≤ κ, i.e.,

|ti c + F ′ (xi )|∗xi ≤ κ,                     (4.11)

whence

|(ti+1 − ti )c|∗xi = (γ/√ϑ)|ti c|∗xi ≤ γ + γκ/√ϑ.

Combining the resulting inequality with (4.11), we come to

λ(Fti+1 , xi ) = |ti+1 c + F ′ (xi )|∗xi ≤ γ + [1 + κ/√ϑ]γ ≤ 3γ                     (4.13)
(the concluding inequality follows from the fact that the parameter of any nontrivial self-
concordant barrier is ≥ 1, see the beginning of Lecture 3). Thus, the Newton decrement of
the new function Fti+1 at the previous iterate xi is at most the quantity 3γ; if γ and κ are small
enough, this quantity is ≤ 1/4, so that xi is within the region of the quadratic convergence of
the damped Newton method (see IX., Lecture 2), and therefore the method quickly restores the
closeness relation. E.g., let the path tolerance κ and the penalty rate γ be set to the value 0.05.
Then the above computation results in
λ(Fti+1 , xi ) ≤ 0.15,
and from the description of the local properties of the damped Newton method as applied to
a self-concordant function (see (2.19), Lecture 2) it follows that the Newton iterate y 1 of the
starting point y 0 = xi , the Newton method being applied to Fti+1 , satisfies the relation
i.e., for the indicated values of the parameters a single damped Newton step restores the closeness
to the path after the penalty parameter is updated, so that in this particular case N = 1. Note
that the policy for updating the penalty - which in our presentation looked as something ad hoc
- in fact is a consequence of the outlined reasoning: the growth of the penalty given by

t 7→ (1 + O(1)ϑ−1/2 )t

is the highest one which results in the relation λ(Fti+1 , xi ) ≤ O(1).
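The whole scheme - penalty update followed by damped Newton steps until closeness to the path is restored - fits in a few lines. The sketch below is a toy implementation under stated assumptions: G = {x | Ax ≤ b} with the standard log-barrier (so ϑ equals the number of inequalities); the function name and the fixed number of outer iterations are choices of the illustration, not of the text.

```python
import numpy as np

# Toy basic path-following method for min c^T x over G = {x : A x <= b},
# standard log-barrier F (theta = number of rows of A):
#   outer loop: t <- (1 + gamma/sqrt(theta)) t;
#   inner loop: damped Newton on F_t(x) = t c^T x + F(x) until the
#               Newton decrement lambda(F_t, x) <= kappa.
def path_follow(A, b, c, x, t=1.0, kappa=0.05, gamma=0.05, outer=200):
    theta = A.shape[0]
    for _ in range(outer):
        t *= 1.0 + gamma / np.sqrt(theta)       # penalty updating
        while True:                              # damped Newton steps
            s = b - A @ x                        # slacks (positive inside G)
            g = t * c + A.T @ (1.0 / s)          # F_t'(x)
            H = (A.T * (1.0 / s**2)) @ A         # F_t''(x) = F''(x)
            d = np.linalg.solve(H, g)
            lam = np.sqrt(g @ d)                 # Newton decrement
            if lam <= kappa:
                break                            # closeness restored
            x = x - d / (1.0 + lam)              # damped Newton step
    return x
```

On G = [−1, 1] (so ϑ = 2) with c = 1 and the starting point 0, a couple of hundred outer iterations drive cᵀx close to the optimal value −1, in accordance with the O(ϑ/t) accuracy estimate.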
The indicated reasoning gives an insight into the intrinsic nature of the method; it
does not allow, anyhow, to establish the announced statement in its complete form, since it
requires certain bounds on the penalty rate. Indeed, our complexity results on the behaviour
of the damped Newton method bound the complexity only when the Newton decrement at the
starting point is less than 1. To "globalize" the reasoning, we should look at the initial residual
in terms of the objective the Newton method is applied to rather than in terms of the initial
Newton decrement. To this end let us prove the following
Proposition 4.4.3 Let t and τ be two values of the penalty parameter, and let (t, x) satisfy the
closeness relation Cκ (·, ·) with some κ < 1. Then

Fτ (x) − min_u Fτ (u) ≤ ρ(κ) + (κ/(1 − κ))|1 − τ /t|√ϑ + ϑρ(1 − τ /t),                     (4.14)

where, as always,

ρ(s) = − ln(1 − s) − s.
Proof. The path x∗ (τ ) is given by the equation
F 0 (u) + τ c = 0; (4.15)
since F 00 is nondegenerate, the Implicit Function Theorem says to us that x∗ (t) is continuously
differentiable, and the derivative of the path can be found by differentiating (4.15) in τ :
Now let
φ(τ ) = [τ cT x∗ (t) + F (x∗ (t))] − [τ cT x∗ (τ ) + F (x∗ (τ ))]
be the residual in terms of the objective Fτ (·) taken at the point x∗ (t). We have
Fτ (x) − min_u Fτ (u) = [Fτ (x) − Fτ (x∗ (t))] + φ(τ ) = [Ft (x) − Ft (x∗ (t))] + (τ − t)cT (x − x∗ (t)) + φ(τ ); (4.20)
since Ft (·) is self-concordant and λ(Ft , x) ≤ κ < 1, we have Ft (x) − Ft (x∗ (t)) = Ft (x) −
minu Ft (u) ≤ ρ(λ(Ft , x)) (see (2.16), Lecture 2), whence
Ft (x) − Ft (x∗ (t)) ≤ ρ(κ). (4.21)
(4.8) says to us that |cT (x − x∗ (t))| ≤ κ(1 − κ)−1 √ϑ t−1 ; combining this inequality, (4.20) and
(4.19), we come to (4.14).
Now we are able to complete the proof of Proposition 4.4.2. Applying (4.14) to x = xi , t = ti
and τ = ti+1 = (1 + γ/√ϑ)ti , we come to

Fti+1 (xi ) − min_u Fti+1 (u) ≤ ρ(κ) + κγ/(1 − κ) + ϑρ(γ/√ϑ),

and the left hand side of this inequality is bounded from above uniformly in ϑ ≥ 1 by a certain
function depending on κ and γ only (as is immediately seen from the evident relation ρ(s) ≤
O(s2 ), |s| ≤ 1/2)1 .
An immediate consequence of Propositions 4.4.1 and 4.4.2 is the following
Theorem 4.4.1 Let problem P with a closed convex domain G ⊂ Rn be solved by the path-
following method associated with a ϑ-self-concordant barrier F , let κ ∈ (0, 1) and γ > 0 be
the path tolerance and the penalty rate used in the method, and let (t0 , x0 ) be the starting pair
satisfying the closeness relation Cκ (·, ·). Then the absolute inaccuracy cT xi − c∗ of the approximate
solutions generated by the method admits the upper bound

cT xi − c∗ ≤ (2ϑ/t0 )(1 + γ/√ϑ)−i , i = 1, 2, ...,                     (4.22)

and the Newton complexity of each iteration (ti , xi ) 7→ (ti+1 , xi+1 ) of the method does not exceed
a certain constant N depending on κ and γ only. In particular, the Newton complexity (total
# of Newton steps) of finding an ε-solution to the problem, i.e., of finding x ∈ G such that
cT x − c∗ ≤ ε, is bounded from above by

O(1)√ϑ (ln(ϑ/(t0 ε)) + 1),

with the constant factor O(1) depending solely on κ and γ.
1
here is the corresponding reasoning: if s ≡ γϑ−1/2 ≤ 1/2, then g ≡ ϑρ(γϑ−1/2 ) ≤ O(1)γ 2 due to 0 ≤ s ≤ 1/2;
if s > 1/2, then ϑ ≤ 4γ 2 , and consequently g ≤ 4γ 2 ln γ; note that ϑ ≥ 1. Thus, in all cases the last term in the
estimate is bounded from above by a certain function of γ.
tends to the analytic center of G with respect to F , to the minimizer x∗F of F over G (since G
is bounded, we know from V., Lecture 3, that this minimizer does exist and is unique). Thus,
all F -generated paths associated with various objectives c start at the same point - the analytic
center of G - and run away from this point as t → ∞, each to the optimal set associated with
the corresponding objective. In other words, the analytic center of G is close to all the paths
generated by F , so that it is a good position to start following the path we are interested in.
Now, how to come to this position? An immediate idea is as follows: the paths associated with
various objectives cover the whole interior of G: if x ≠ x∗F is an interior point of G, then a path
passing through x is given by any objective of the form

d = −λF ′ (x),
λ being positive; the path with the indicated objective passes through x when the value of the
penalty parameter is exactly λ. This observation suggests the following initialization scheme:
given a starting point x̂ ∈ int G, let us follow the artificial path

u∗ (τ ) = argmin[τ dT x + F (x)], d = −F ′ (x̂),

in the "inverse time", i.e., decreasing the penalty parameter τ rather than increasing it. The
artificial path clearly passes through the point x̂:

x̂ = u∗ (1),

and we can start tracing it with the pair (τ0 = 1, u0 = x̂), which is exactly on the path. When
tracing the path in the outlined manner, we in the mean time come close to the analytic center
of G and, consequently, to the path x∗ (t) we are interested in; when it happens, we can switch
to tracing this target path.
The outlined ideas underlie the
Two-Phase Path-Following Method:
Input: starting point x̂ ∈ int G; path tolerance κ ∈ (0, 1); penalty rate γ > 0.
Phase 0 [approximating the analytic center] Starting with (τ0 , u0 ) = (1, x̂), generate the se-
quence {(τi , ui )}, updating (τi , ui ) into (τi+1 , ui+1 ) as follows:
•
τi+1 = (1 + γ/√ϑ)−1 τi ;
• to get ui+1 , apply to the function
starting with y 0 = ui . Terminate the method when the pair (τi+1 , y l ) turns out to satisfy
the predicate
thus obtaining the pair (t0 , x0 ) satisfying the predicate Cκ (·, ·).
Phase 1. [approximating optimal solution to P] Starting with the pair (t0 , x0 ), form the
sequence {(ti , xi )} according to the Basic path-following scheme from Section 4.3, namely, given
(ti , xi ), update it into (ti+1 , xi+1 ) as follows:
•
ti+1 = (1 + γ/√ϑ)ti ;

• to get xi+1 , apply to Fti+1 the damped Newton method

y l+1 = y l − (1 + λ(Fti+1 , y l ))−1 [∇2x Fti+1 (y l )]−1 ∇x Fti+1 (y l ),                     (4.26)
starting with y 0 = xi . Terminate the method when the pair (ti+1 , y l ) turns out to satisfy
the predicate Cκ (·, ·); when it happens, set
xi+1 = y l ,
thus obtaining the updated pair satisfying the predicate Cκ , and go to the next step of
Phase 1.
The properties of the indicated method are described in the following statement:
Theorem 4.5.1 Let problem P be solved by the two-phase path-following method associated with
a ϑ-self-concordant barrier for the domain G (the latter is assumed to be bounded). Then
(i) Phase 0 is finite and is comprised of no more than

Nini = O(1)√ϑ (ln(ϑ/(1 − πx∗F (x̂))) + 1)                     (4.27)
iterations, with no more than O(1) Newton steps (4.23) at every iteration; here and further O(1)
are constant factors depending solely on the path tolerance κ and the penalty rate γ used in the
method.
(ii) For any ε > 0, the number of iterations of Phase 1 before an ε-solution to P is generated
does not exceed the quantity

Nmain (ε) = O(1)√ϑ (ln(ϑVar G (c)/ε) + 1),                     (4.28)

where

Var G (c) = max_{x∈G} cT x − min_{x∈G} cT x,

with no more than O(1) Newton steps (4.26) at every iteration.
In particular, the overall Newton complexity (total # of Newton steps of both phases) of
finding an ε-solution to the problem does not exceed the quantity

Ntotal (ε) = O(1)√ϑ (ln(V/ε) + 1),

where the data-dependent constant V is given by

V = ϑVar G (c)/(1 − πx∗F (x̂)).
Proof.
1°. Following the line of argument used in the proof of Proposition 4.4.2, one can immediately
verify that the iterations of Phase 0 are well-defined and maintain along the sequence {(τi , ui )}
the predicate Ĉκ/2 (·, ·), while the Newton complexity of every iteration of the phase does not
exceed O(1). To complete the proof of (i), we should establish the upper bound (4.27) on the
number of iterations of Phase 0. To this end let us note that Ĉκ/2 (τi , ui ) means exactly that
We see that the variation (the difference between the minimum and the maximum values) of
the linear form f (y) = y T F ′ (x̂) over the unit Dikin ellipsoid of F centered at x∗F does not exceed
2α. Consequently, the variation of the form on the (ϑ + 2√ϑ)-larger concentric ellipsoid W ∗
does not exceed 2α(ϑ + 2√ϑ). From the Centering property V., Lecture 3, we know that W ∗
contains the whole G; in particular, W ∗ contains the unit Dikin ellipsoid Ŵ1 (ui ) of F centered
at ui (I., Lecture 2). Thus, the variation of the linear form y T F ′ (x̂) over the ellipsoid Ŵ1 (ui ),
and this is nothing but twice the quantity |F ′ (x̂)|∗ui , does not exceed 2α(ϑ + 2√ϑ):

|F ′ (x̂)|∗ui ≤ β ≡ ϑ(ϑ + 2√ϑ)/(1 − πx∗F (x̂)).
λ(F, ui ) ≤ κ/2 + τi β.

Taking into account that τi = (1 + γ/√ϑ)−i , we conclude that the stopping criterion λ(F, ui ) ≤ 3κ/4
for sure is satisfied when i is O(1) ln(1 + ϑ(1 − πx∗F (x̂))−1 ), as claimed in (i).
2°. Now let us verify that

t0 ≥ κ/(2Var G (c)).                     (4.31)

Indeed, since c ≠ 0, it follows from the origin of t0 (see (4.25)) that
(the term n3 is responsible for the arithmetic cost of solving the Newton system at a Newton
step).
and assume that the system of linear inequalities aTj x ≤ bj , j = 1, ..., m, satisfies the Slater
condition and defines a polytope (i.e., a bounded polyhedral set) G. As we know from Corollary
3.1.1, the standard logarithmic barrier
F (x) = −Σ_{j=1}^{m} ln(bj − aTj x)
and we see that the arithmetic cost of computing F (x), F 0 (x) and F 00 (x) is O(mn2 ), while the
dimension of the data vector for a problem instance is O(mn). Therefore the path-following
method associated with the standard logarithmic barrier for the polytope G finds an ε-solution
to the problem at the cost of
N (ε) = O(1)√m (ln(V/ε) + 1)
Newton steps, with the arithmetic cost of a step O(1)mn2 (the arithmetic cost O(n3 ) of solving
the Newton system is dominated by the cost of assembling the system, i.e., that one of computing
F 0 and F 00 ; indeed, since G is bounded, we have m > n). Thus, the overall arithmetic cost of
finding an ε-solution to the problem is
M(ε) = O(1)m1.5 n2 (ln(V/ε) + 1),

so that the "arithmetic cost of an accuracy digit" is O(m1.5 n2 ). In fact the latter cost can be
reduced to O(mn2 ) by proper implementation of the method (the Newton systems arising at the
neighbouring steps of the method are ”close” to each other, which allows to reduce the average
over steps arithmetic cost of solving the Newton systems), but I am not going to speak about
these acceleration issues.
What should be stressed is that the outlined method is fine from the viewpoint of its the-
oretical complexity; it is, anyhow, far from being appropriate in practice. The main drawback
of the method is its "short-step" nature: to ensure the theoretical complexity bounds, one is
enforced to increase the penalty parameter at the rate (1 + O(1)ϑ−1/2 ), so that the number of
Newton steps is proportional to √ϑ. For an LP problem of a not too large size - say, n = 1000,
m = 10000, the method would require solving several hundreds, if not thousands, linear systems
with 1000 variables, which will take hours - time incomparable with that one required by the
simplex method; and even moderate increasing of sizes results in days and months instead of
hours. You should not think that these unpleasant practical consequences are caused by the
intrinsic drawbacks of the scheme; they come from our "pessimistic" approach to the implemen-
tation of the scheme. It turns out that "most of the time" you can increase the penalty at a
significantly larger rate than that given by the worst-case theoretical complexity analysis,
and still will be able to restore closeness to the path by a small number - 1-2 - of Newton
steps. There are very good practical implementations of the scheme which use various on-line
strategies to control the penalty rate and result in a very reasonable - 20-40 - total number of
Newton steps, basically independent of the size of the problem. From the theoretical viewpoint,
anyhow, it is important to develop computationally cheap rules for on-line adjusting the penalty
rate which ensure the theoretical O(√ϑ) Newton complexity of the method; in the mean time
we shall speak about recent progress in this direction.
we shall speak about recent progress in this direction.
In turn, looking at the proof of this property (0., I., Lecture 3), one can find out that the only
properties of F and G used there were the following ones:
S(ϑ): G ⊂ Rn is a closed convex domain; F is a twice continuously differentiable convex function
on int G such that
Exercise 4.7.2 # Prove that property S(·) is stable with respect to affine substitutions of argu-
ment and with respect to summation; namely, prove that
1) if the pair (G ⊂ Rn , F ) satisfies S(ϑ) and y = A(x) ≡ Ax + a is an affine mapping from Rk
into Rn with the image intersecting int G, then the pair (A−1 (G), F (A(·))) also satisfies S(ϑ);
2) if the pairs (Gi ⊂ Rn , Fi ), i = 1, ..., m, satisfy S(ϑi ) and G = ∩i Gi is a domain, then the
pair (G, Σi αi Fi ), αi ≥ 0, satisfies S(Σi αi ϑi ).
Now let us formulate a simple necessary and sufficient condition for a pair (G, F ) to satisfy S(ϑ).
Exercise 4.7.3 # Let ϑ > 0, and let (G ⊂ Rn , F ) be a pair comprised of a closed convex
domain and a function twice continuously differentiable on the interior of the domain. Prove
that (G, F ) satisfies S(ϑ) if and only if the function exp{−F/ϑ} is concave on int G. Derive
from this observation and the result of the previous exercise the following statement (due to
Fiacco and McCormick):
let gi , i = 1, ..., m, be convex twice continuously differentiable functions on Rn satisfying the
Slater condition. Consider the logarithmic barrier
F (x) = −Σi ln(−gi (x))
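The Fiacco-McCormick fact behind this exercise - for convex gᵢ the geometric mean Πᵢ(−gᵢ)^(1/m), which by algebra equals exp{−F/m} for the logarithmic barrier above, is concave on {x | gᵢ(x) < 0} - is easy to spot-check by midpoint concavity (a sketch; the two sample quadratics are illustrative):

```python
import numpy as np

# Midpoint-concavity spot check of exp{-F/m} = prod_i (-g_i(x))^(1/m) for the
# logarithmic barrier F(x) = -sum_i ln(-g_i(x)) of two sample convex g_i.
g = [lambda x: x**2 - 1.0,            # g_1 < 0 on (-1, 1)
     lambda x: (x - 0.5)**2 - 4.0]    # g_2 < 0 on (-1.5, 2.5)
m = len(g)

def geo_mean(x):
    return np.prod([(-gi(x))**(1.0/m) for gi in g])

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-0.99, 0.99, size=2)   # both points inside {g_i < 0}
    assert geo_mean(0.5*(x + y)) >= 0.5*(geo_mean(x) + geo_mean(y)) - 1e-12
```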
where
fj (x) = xT Aj x + 2bTj x + cj , j = 0, ..., m
are convex quadratic forms. Assume that you are given a point x̂ such that fj (x̂) < 0, j =
1, ..., m, and R > 0 such that the feasible set of the problem is inside the ball {x | |x|2 ≤ R}.
1) reduce the problem to the standard form with a bounded feasible domain and point out an
(m + 2)-self-concordant barrier for the domain, same as an interior point of the domain;
2) write down the algorithmic scheme of the associated path-following method. Evaluate the
arithmetic cost of a Newton step of the method.
Now let us discuss the following issue. In the Basic path-following method the rate of
updating the penalty parameter, i.e., the penalty ratio
ω = ti+1 /ti ,
is set to 1 + O(1)ϑ−1/2 , ϑ being the parameter of the underlying barrier. This choice of the
penalty ratio results in the best known, namely, proportional to √ϑ, theoretical complexity
bound for the method. In Lecture 4 it was explained that this theoretically fine choice of the
penalty ratio in practice makes the method almost useless, since it for sure enforces the method
to work according to its theoretical worst-case complexity bound; the latter bound is in many cases
too large for actual computations. In practice people normally take as the initial value of the
penalty ratio certain moderate constant, say, 2 or 3, and then use various routines for on-line
adjusting the ratio, slightly increasing/decreasing it depending on whether the previous updating
xi 7→ xi+1 took "small" or "large" (say, ≤ 2 or > 2) number of Newton steps. An immediate
theoretical question here is: what can be said about the Newton complexity of a path-following
method where the penalty ratio is a once for ever fixed constant ω > 1 (or, more generally, varies
somehow between once for ever fixed bounds ω− ≤ ω+ , with 1 < ω− ≤ ω+ < ∞)? The answer
is that in this case the Newton complexity of an iteration (ti , xi ) 7→ (ti+1 , xi+1 ) is of order of ϑ
rather than of order of 1.
Exercise 4.7.5 Consider the Basic path-following method from Section 4.3 with rule (4.1) re-
placed with
ti+1 = ωi ti ,
where ω− ≤ ωi ≤ ω+ and 1 < ω− ≤ ω+ < ∞. Prove that for this version of the method the
statement of Theorem 4.4.1 should be modified as follows: the total # of Newton steps required
to find an ε-solution to P can be bounded from above as
O(1)ϑ (ln(ϑ/(t0 ε)) + 1),
In the previous lecture we dealt with the Basic path-following interior point method. It was
explained that the method, being fine theoretically, is not too attractive from the practical
viewpoint, since it is a routine with a prescribed (and normally close to 1) rate of updating
the penalty parameter; as a result, the actual number of Newton steps in the routine is more
or less the same as the number given by the theoretical worst-case analysis and for sure is
proportional to √ϑ, ϑ being the parameter of the underlying self-concordant barrier. For large-
scale problems, ϑ normally is large, and the # of Newton steps turns out to be too large
for practical applications. The source of difficulty is the conceptual drawback of our scheme:
everything is strictly regulated, there is no place to exploit favourable circumstances which may
occur. As we shall see in the mean time, this conceptual drawback can be eliminated, to certain
extent, even within the path-following scheme; there is, anyhow, another family of interior
point methods, the so called potential reduction ones, which are free of this drawback of strict
regulation; some of these methods, e.g., the famous - and the very first - interior point method
of Karmarkar for Linear Programming, turn out to be very efficient in practice. The methods
of this potential reduction type are what we are about to investigate now; the investigation,
anyhow, should be preceded by developing a new portion of tools, interesting in their own right.
This development is our today goal.
tx ∈ K whenever x ∈ K and t ≥ 0;
in other words, a cone should contain with any of its points the whole ray spanned by the point.
A convex cone is called pointed, if it does not contain lines.
Given a convex cone K ⊂ Rn , one can define its dual as
K ∗ = {s ∈ Rn | sT x ≥ 0 ∀x ∈ K}.
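For the simplest cone K = Rn+ the definition gives K* = Rn+ again: s^T x ≥ 0 for all x ≥ 0 exactly when s ≥ 0. Membership in the dual cone can be probed numerically - a minimal sketch (random sampling is only a heuristic illustration of the definition, not a proof of membership):

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_dual_feasible(s, trials=2000):
    """Heuristic test of s^T x >= 0 over random samples x from K = R^n_+."""
    X = rng.random((trials, s.size))       # points of the nonnegative orthant
    return bool(np.all(X @ s >= -1e-12))

print(looks_dual_feasible(np.array([1.0, 2.0])))   # s >= 0: in K* = R^2_+
print(looks_dual_feasible(np.array([1.0, -0.5])))  # violation found w.h.p.
```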
In what follows we use the following elementary facts about convex cones: let K ⊂ Rn be a
closed convex cone and K ∗ be its dual. Then
68 CHAPTER 5. CONIC PROBLEMS AND CONIC DUALITY
• K is pointed if and only if K ∗ has a nonempty interior; K ∗ is pointed if and only if K has
a nonempty interior. The interior of K ∗ is comprised of all vectors s strictly positive on
K, i.e., such that sT x > 0 for all nonzero x ∈ K.
An immediate corollary of the indicated facts is that a closed convex cone K is pointed and
possesses a nonempty interior if and only if its dual shares these properties.
Conic problem. Let K ⊂ Rn be a closed pointed convex cone with a nonempty interior.
Consider the optimization problem
$$\text{(P)}:\quad \text{minimize } c^Tx \ \text{ s.t. } \ x \in \{b+L\}\cap K,$$
where
• L is a linear subspace in Rn ;
• b is a vector from Rn .
Geometrically: we should minimize a linear objective (cT x) over the intersection of an affine
plane (b+L) with the cone K. This intersection is a convex set, so that (P) is a convex program;
let us refer to it as a convex program in the conic form.
Note that a program in the conic form strongly resembles a Linear Programming program in
the standard form; this latter problem is nothing but (P) with K specified as the nonnegative
orthant Rn+ . On the other hand, (P) is a universal form of a convex programming problem.
Indeed, it suffices to demonstrate that a standard convex problem
$$\text{(S)}:\quad \text{minimize } d^Tu \ \text{ s.t. } \ u \in G \subset \mathbf{R}^k,$$
G being a closed convex domain, can be equivalently rewritten in the conic form (P). To this
end it suffices to represent G as an intersection of a closed convex cone and an affine plane,
which is immediate: identifying Rk with the affine hyperplane {(t, x) ∈ Rk+1 | t = 1}, we can
rewrite (S) equivalently as
$$\text{(S}_c\text{)}:\quad \text{minimize } c^Tz \ \text{ s.t. } \ z = (t, x) \in \{t = 1\}\cap K,$$
where
$$c = \begin{pmatrix} 0 \\ d \end{pmatrix}$$
and
$$K = \mathrm{cl}\{(t, x) \mid t > 0,\ t^{-1}x \in G\}$$
is the conic hull of G. It is easily seen that (S) is equivalent to (Sc) and that the latter problem
is conic (i.e., K is a closed convex pointed cone with a nonempty interior), provided that the
closed convex domain G does not contain lines (which actually is not a restriction at all). Thus,
(P) indeed is a universal form of a convex program.
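For instance, taking G to be the unit disk in R2, the conic hull construction above yields the cone K = {(t, x) : t ≥ ||x||2}, and the equivalence of (S) and (Sc) can be checked numerically (a sketch; a grid search over the boundary stands in for an exact solver):

```python
import numpy as np

# G: the unit disk in R^2; its conic hull is K = {(t, x) : t >= ||x||_2}.
d = np.array([3.0, -4.0])

# (S): minimize d^T u over u in G; the optimum is -||d|| at u = -d/||d||
opt_S = -float(np.linalg.norm(d))

# (Sc): minimize c^T z, c = (0; d), over the slice {z = (1, x)} of K,
# i.e. over ||x||_2 <= 1; a linear objective attains its min on the boundary
ang = np.linspace(0.0, 2.0 * np.pi, 200001)
circle = np.stack([np.cos(ang), np.sin(ang)], axis=1)
opt_Sc = float(np.min(circle @ d))

print(round(opt_S, 6), round(opt_Sc, 6))  # the two optima agree (~ -5.0)
```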
5.2. CONIC DUALITY 69
Recall that the conjugate (the Legendre transformation) of a convex, proper and closed
function f on Rn is defined as
$$f^*(s) = \sup_{x}\,[s^Tx - f(x)],$$
which again is a convex, proper and closed function; the conjugacy is an involution:
(f^*)^* = f.
Now, let f1, ..., fk be convex, proper and closed functions on Rn such that the
relative interiors of the domains of the functions (i.e., the interiors taken with respect
to the affine hulls of the domains) have a point in common. The Fenchel Duality
theorem says that if the function
$$f(x) = \sum_{i=1}^k f_i(x)$$
is below bounded, then
$$\inf_x f(x) = -\min\Big\{\sum_{i=1}^k f_i^*(s_i)\ \Big|\ s_1 + ... + s_k = 0\Big\}$$
(note this min in the right hand side: the theorem says, in particular, that it indeed
is achieved).
The problem
$$\text{minimize } \sum_{i=1}^k f_i^*(s_i) \ \text{ s.t. } \ \sum_i s_i = 0$$
is called the Fenchel dual to the problem of minimizing f.
Now let us derive the Fenchel dual to the conic problem (P). To this end let us set
$$f_1(x) = c^Tx;\qquad f_2(x) = \begin{cases}0,& x\in b+L\\ +\infty,&\text{otherwise}\end{cases};\qquad f_3(x) = \begin{cases}0,& x\in K\\ +\infty,&\text{otherwise}\end{cases};$$
these functions clearly are convex, proper and closed, and (P) evidently is nothing but the
problem of minimizing f1 + f2 + f3 over Rn . To write down the Fenchel dual to the latter
problem, we should realize what are the functions fi∗ , i = 1, 2, 3. This is immediate:
$$f_1^*(s) = \sup\{s^Tx - c^Tx \mid x\in \mathbf{R}^n\} = \begin{cases}0,& s=c\\ +\infty,&\text{otherwise}\end{cases};$$
1
equivalently: f is lower semicontinuous, or: the level sets {x | f (x) ≤ a} are closed for every a ∈ R
$$f_2^*(s) = \sup\{s^Tx - 0 \mid x\in \mathrm{dom}\,f_2 \equiv b+L\} = \begin{cases}s^Tb,& s\in L^\perp\\ +\infty,&\text{otherwise}\end{cases},$$
where L⊥ is the orthogonal complement to L;
$$f_3^*(s) = \sup\{s^Tx - 0 \mid x\in \mathrm{dom}\,f_3 \equiv K\} = \begin{cases}0,& s\in -K^*\\ +\infty,&\text{otherwise}\end{cases},$$
where K ∗ is the cone dual to K.
Now, in the Fenchel dual to (P), i.e., in the problem of minimizing f_1^*(s_1) + f_2^*(s_2) + f_3^*(s_3)
over s_1, s_2, s_3 subject to s_1 + s_2 + s_3 = 0, we clearly can restrict s_i to belong to dom f_i^* without
affecting the optimum; thus, we may restrict ourselves to the case when s_1 = c, s_2 ∈ L⊥
and s_3 ∈ −K*, while s_1 + s_2 + s_3 = 0; under these restrictions the objective in the Fenchel dual
is equal to s_2^T b. Expressing s_1, s_2, s_3 in terms of s = s_1 + s_2 ≡ −s_3, we come to the following
equivalent reformulation of the Fenchel dual to (P):
$$\text{(D)}:\quad \text{minimize } b^Ts \ \text{ s.t. } \ s \in \{c + L^\perp\}\cap K^*.$$
Note that the actual objective in the Fenchel dual is s_2^T b ≡ s^T b − c^T b; writing down (D), we
omit the constant term c^T b (this does not influence the optimal set, although it shifts the optimal
value). Problem (D) is called the conic dual to the primal conic problem (P).
Note that K is assumed to be a closed, convex and pointed cone with a nonempty interior;
therefore the dual cone K* also is closed, pointed, convex and with a nonempty interior, so that
the dual problem also is conic. Bearing in mind that (K ∗ )∗ = K, one can immediately verify
that the indicated duality is completely symmetric: the problem dual to dual is exactly the
primal one. Note also that in the Linear Programming case the conic dual is nothing but the
usual dual problem written down in terms of slack variables.
0. For every primal feasible x and every dual feasible s one has x − b ∈ L and s − c ∈ L⊥, whence
$$(x - b)^T(s - c) = 0,$$
i.e.,
$$c^Tx + b^Ts = c^Tb + x^Ts \ge c^Tb$$
(the concluding inequality follows from x ∈ K, s ∈ K*).
I. For the optimal values P* and D* of (P) and (D), respectively, one has
$$P^* + D^* \ge c^Tb,$$
where, for finite a, ±∞ + a = ±∞, the sum of two infinities of the same sign is the infinity of
this sign and (+∞) + (−∞) = +∞.
This is immediate: take infimums in primal feasible x and dual feasible s in the relation cT x +
bT s ≥ cT b (see 0.).
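The relation c^T x + b^T s ≥ c^T b is easy to sanity-check numerically on a toy LP in conic form (a sketch with hypothetical data; here K = R^3_+ and L = span{e}, so that L⊥ consists of the zero-sum vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
b = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 0.0, 2.0])
e = np.ones(n)                    # L = span{e}; L_perp = {v : sum(v) = 0}

def primal_feasible():            # x in {b + L} ∩ K,  K = R^n_+
    while True:
        x = b + rng.uniform(-1.0, 3.0) * e
        if np.all(x >= 0):
            return x

def dual_feasible():              # s in {c + L_perp} ∩ K*,  K* = R^n_+
    while True:
        v = rng.normal(size=n)
        v -= v.mean()             # project onto the zero-sum hyperplane
        s = c + 0.3 * v
        if np.all(s >= 0):
            return s

for _ in range(1000):
    x, s = primal_feasible(), dual_feasible()
    gap = c @ x + b @ s - c @ b   # equals x^T s, hence is nonnegative
    assert gap >= -1e-10 and abs(gap - x @ s) < 1e-9
print("c^T x + b^T s >= c^T b held on 1000 random primal-dual pairs")
```

The identity gap = x^T s checked in the loop is exactly the computation in 0.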
II. If the dual problem is feasible, then the primal is below bounded2 ; if the primal problem is
feasible, then the dual is below bounded.
This is an immediate corollary of I.: if, say, D∗ is < +∞, then P ∗ > −∞, otherwise D∗ + P ∗
would be −∞, which is impossible in view of I.
III. Conic Duality Theorem. If one of the problems in the primal-dual pair (P), (D) is
strictly feasible (i.e., possesses feasible solutions from the interior of the corresponding cone)
and is below bounded, then the second problem is solvable, the optimal values in the problems
are finite and optimal duality gap P ∗ + D∗ − cT b is zero.
If both of the problems are strictly feasible, then both of them are solvable, and a pair
(x*, s*) comprised of feasible solutions to the problems is comprised of optimal solutions if and
only if the duality gap c^T x* + b^T s* − c^T b is zero, and if and only if the complementary slackness
(x*)^T s* = 0 holds.
Proof. Let us start with the first statement of the theorem. Due to the primal-dual symmetry, we
can restrict ourselves to the case when the strictly feasible below bounded problem is (P).
Strict feasibility means exactly that the relative interiors of the domains of the functions f1 , f2 ,
f3 (see the derivation of (D)) have a point in common, due to the description of the domains of
f1 (the whole space), f2 (the affine plane b + L), f3 (the cone K). The below boundedness of (P)
means exactly that the function f1 + f2 + f3 is below bounded. Thus, the situation is covered
by the premise of the Fenchel duality theorem, and according to this theorem, the Fenchel dual
to (P), which can be obtained from (D) by subtracting the constant cT b from the objective, is
solvable. Thus, (D) is solvable, and the sum of optimal values in (P) and (D) (which is by cT b
greater than the zero sum of optimal values stated in the Fenchel theorem) is cT b, as claimed.
Now let us prove the second statement of the theorem. Under the premise of this statement
both problems are strictly feasible; from II. we conclude that both of them are also below
bounded. Applying the first statement of the theorem, we see that both of the problems are
solvable and the sum of their optimal values is cT b. It immediately follows that a primal-dual
feasible pair (x, s) is comprised of primal-dual optimal solutions if and only if cT x + bT s = cT b,
i.e., if and only if the duality gap at the pair is 0; since the duality gap also equals xT s (see
0.), we conclude that the pair is comprised of optimal solutions if and only if xT s = 0.
Remark 5.2.1 The Conic duality theorem, although very similar to the Duality theorem in
LP, is a little bit weaker than the latter statement. In the LP case, already (feasibility + below
boundedness), not (strict feasibility + below boundedness), of one of the problems implies
solvability of both of them and a characterization of optimality identical to that given
by the second statement of the Conic duality theorem. A "word by word" extension of the LP
Duality theorem fails to be true for general cones, which is quite natural: in the non-polyhedral
case we need a certain qualification of constraints, and strict feasibility is the simplest (and the
strongest) form of this qualification. From the exercises accompanying the lecture you can find
out what the possibilities are to strengthen the Conic duality theorem, on one hand, and what
pathologies may occur if the assumptions are weakened too much, on the other
hand.
Let me conclude this part of the lecture by saying that conic duality is, as we shall see, useful
for developing potential reduction interior point methods. It also turns out to be a powerful tool
for analytical - on paper - processing of a problem; in several interesting cases, as we shall see
2
i.e., P ∗ > −∞; it may happen, anyhow, that (P) is unfeasible
in the meantime, it allows one to derive (completely mechanically!) nontrivial and informative
reformulations of the initial setting.
Definition 5.3.1 Let K ⊂ Rn be a convex, closed and pointed cone with a nonempty interior,
and let ϑ ≥ 1 be a real. A function F : int K → R is called a ϑ-logarithmically homogeneous self-
concordant barrier for K, if it is self-concordant on int K and satisfies the identity
$$F(tx) = F(x) - \vartheta \ln t \quad \forall x \in \mathrm{int}\,K\ \forall t > 0.$$
Our terminology at this point looks confusing: it is not clear whether a ”logarithmically homo-
geneous self-concordant barrier” for a cone is a ”self-concordant barrier” for it. This temporary
difficulty is resolved by the following statement.
Proof. Since, by assumption, K does not contain lines, F is nondegenerate (II., Lecture 2).
Now let us prove (5.3) - (5.5). Differentiating the identity F(tx) = F(x) − ϑ ln t in t and setting
t = 1, we get
$$-x^TF'(x) = \vartheta.$$
Due to the already proved (5.4), this relation implies all equalities in (5.5), excluding the very
first of them; this latter follows from the fact that x, due to (5.4), is the Newton direction
−[F''(x)]^{-1} F'(x) of F at x, so that λ^2(F, x) = −x^T F'(x) (IVa., Lecture 2).
From (5.5) it follows that the Newton decrement of F is identically equal to √ϑ; since, by
definition, F is self-concordant on int K, F is a ϑ-self-concordant barrier for K.
Let us list some examples of self-concordant barriers.
Example: the standard logarithmic barrier
$$F(x) = -\sum_{i=1}^n \ln x_i$$
for the nonnegative orthant Rn+ is an n-logarithmically homogeneous self-concordant barrier for the
orthant.
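Both the logarithmic homogeneity identity F(tx) = F(x) − ϑ ln t and the relation −x^T F'(x) = ϑ are easy to verify numerically for this barrier (a sketch; here ϑ = n, and the gradient is the explicit −(1/x_1, ..., 1/x_n)):

```python
import numpy as np

n = 3                                  # here theta = n
F = lambda x: -np.sum(np.log(x))       # the standard logarithmic barrier
gradF = lambda x: -1.0 / x             # its gradient on int R^n_+

x = np.array([0.5, 1.5, 2.0])          # a point in int R^n_+
t = 2.7

# logarithmic homogeneity: F(tx) = F(x) - theta * ln t
assert abs(F(t * x) - (F(x) - n * np.log(t))) < 1e-12
# the identity -x^T F'(x) = theta (cf. (5.5))
assert abs(-(x @ gradF(x)) - n) < 1e-12
print("logarithmic homogeneity and -x^T F'(x) = theta confirmed for n =", n)
```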
5.3. LOGARITHMICALLY HOMOGENEOUS BARRIERS 73
Proof.
1°. From Proposition 5.3.2 we know that F is nondegenerate; therefore F* is self-concordant
on its domain Q, and the latter is nothing but the image of int K under the one-to-one mapping
(5.7), the inverse to the mapping being s 7→ (F*)'(s) (see Lecture 2, (L.1)-(L.3) and VII.).
Further, from (5.3) it follows that Q is an (open) cone; indeed, any point s ∈ Q, due to the already
proved relation Q = F'(int K), can be represented as F'(x) for some x ∈ int K, and then
ts = F'(t^{-1}x) also belongs to Q. It follows that K+ = cl Q is a closed convex cone with a
nonempty interior.
2°. Let us prove that K+ = −K*. This is exactly the same as to prove that the interior
of −K* (which is comprised of s strictly negative on K, i.e., with s^T x negative for any
nonzero x ∈ K, see Section 5.1) coincides with Q ≡ F'(int K):
the concluding quantity here is strictly positive, since y is nonzero and F , as we already know,
is nondegenerate.
2°.2. To complete the proof of (5.10), we need to verify the inclusion inverse to (5.11), i.e.,
we should prove that if s is strictly negative on K, then s = F'(x) for certain x ∈ int K. Indeed,
since s is strictly negative on K, the cross-section
$$K_s = \{y \in K \mid s^Ty = -1\} \qquad (5.12)$$
is bounded (Section 5.1). The restriction of F onto the relative interior of this cross-section is
a self-concordant function on rint K_s (stability of self-concordance with respect to affine substi-
tutions of argument, Proposition 2.1.1.(i)). Since K_s is bounded, F attains its minimum on the
relative interior of K_s at a certain point y, so that
$$F'(y) = \lambda s$$
for some λ. The coefficient λ is positive (since y^T F'(y) = λ y^T s is negative in view of (5.5),
and y^T s = −1 also is negative (recall that y ∈ K_s)). Since λ is positive and F'(y) = λs, we conclude
from (5.3) that F'(λy) = s, and s indeed is F'(x) for some x ∈ int K (namely, x = λy). The
inclusion (5.10) is proved.
3°. Summarizing our considerations, we see that F* is self-concordant on the interior of the
cone −K*; to complete the proof of (i), it suffices to verify that
$$F^*(ts) = F^*(s) - \vartheta \ln t.$$
This is immediate:
$$F^*(ts) = \sup_x\,[ts^Tx - F(x)] = \sup_u\,[s^Tu - F(t^{-1}u)] = \sup_u\,[s^Tu - F(u)] - \vartheta\ln t = F^*(s) - \vartheta\ln t$$
(we have substituted u = tx and used the logarithmic homogeneity of F).
(i) is proved.
4°. Let us prove (ii). First of all, for x ∈ int K and s = −tF'(x) with some t > 0 we have
$$V(x, s) \equiv F(x) + F^*(-s) + \vartheta\ln(x^Ts) = F(x) + F^*(tF'(x)) + \vartheta\ln(t\vartheta) =$$
[since F* is ϑ-logarithmically homogeneous due to (i) and −x^T F'(x) = ϑ, see (5.5)]
$$= F(x) + F^*(F'(x)) + \vartheta\ln\vartheta =$$
[since F*(F'(x)) = x^T F'(x) − F(x) due to the definition of the Legendre transformation]
$$= x^TF'(x) + \vartheta\ln\vartheta = \vartheta\ln\vartheta - \vartheta$$
(we have used (5.5)). Thus, (5.8) indeed is an equality when s = −tF'(x) with certain t > 0.
5°. To complete the proof of (5.8), it suffices to demonstrate that if x ∈ int K and s ∈ int K* are
such that (5.8) is an equality, i.e.,
$$V(x, s) = \vartheta\ln\vartheta - \vartheta, \qquad (5.13)$$
then s is proportional, with positive coefficient, to −F'(x). To this end consider the cross-section
of K as follows:
$$K_s = \{y \in K \mid s^Ty = s^Tx\}.$$
The restriction of V(·, s) onto the relative interior of K_s is, up to an additive constant, equal to the
restriction of F, i.e., it is self-concordant (since K_s is a cross-section of K by an affine hyperplane
passing through an interior point of K; we have used similar reasoning in 2°.2). Since K_s is
bounded (by virtue of s ∈ int K*), F, and, consequently, V(·, s), attains its minimum on the
relative interior of K_s, and this minimum is unique (since F is nondegenerate). At the minimizer,
let it be y, one should have
$$F'(y) = -\lambda s;$$
taking here the inner product with y and using (5.5) and the inclusion y ∈ K_s, we get λ > 0. As we
already know, the relation F'(y) = −λs with positive λ implies that V(y, s) = ϑ ln ϑ − ϑ; now
from (5.13) it follows that V(y, s) ≥ V(x, s). Since, by construction, x ∈ rint K_s and y is the
unique minimizer of V(·, s) on the latter set, we conclude that x = y, so that F'(x) = −λs, and
we are done.
Exercise 5.4.2 #+ Prove that K possesses a nonempty interior if and only if K ∗ is pointed,
and that K ∗ possesses a nonempty interior if and only if K is pointed.
Exercise 5.4.3 #+ Let s ∈ Rn . Prove that the following properties of s are equivalent:
(i) s is strictly positive on K, i.e., sT x > 0 whenever x ∈ K is nonzero;
(ii) The set K(s) = {x ∈ K | sT x ≤ 1} is bounded;
(iii) s ∈ int K ∗ .
Formulate ”symmetric” characterization of the interior of K.
Exercise 5.4.5 Demonstrate by examples that the following situations (which for sure do not
occur in LP duality) are possible:
1) the primal problem is strictly feasible and below bounded, and at the same time it is
unsolvable (cf. Exercise 5.4.4, 2));
2) the primal problem is solvable, and the dual is unfeasible (cf. Exercise 5.4.4, 2), 3), 4));
3) the primal problem is feasible with a bounded feasible set, and the dual is unsolvable (cf.
Exercise 5.4.4, 2), 3));
4) both the primal and the dual problems are solvable, but there is a nonzero duality gap: the
sum of optimal values in the problems is strictly greater than cT b (cf. Exercise 5.4.4, 2), 3)).
Exercise 5.4.6 ∗ Assume that both the primal and the dual problem are feasible. Prove that
the feasible set of at least one of the problems is unbounded.
• the cone Rn+ - the n-dimensional nonnegative orthant in Rn ; the latter space from now on
is equipped with the standard Euclidean structure given by the inner product xT y;
• the cone Sn+ of positive semidefinite symmetric n×n matrices in the space Sn of symmetric
n × n matrices; this latter space from now on is equipped with the Frobenius Euclidean
structure given by the inner product hx, yi = Tr{xy}, Tr being the trace; this is nothing
but the sum, over all entries, of the products of the corresponding entries in x and in y;
• the second-order cone (the "ice-cream" cone) K^2_n = {x ∈ R^{n+1} | x_{n+1} ≥ \sqrt{x_1^2 + ... + x_n^2}};
this is a cone in Rn+1, and we already have said what is the Euclidean structure the space
is equipped with.
Exercise 5.4.7 # Prove that each of the aforementioned cones is closed, pointed, convex and
with a nonempty interior and, besides this, is self-dual, i.e., coincides with its dual cone3 .
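One half of the self-duality claim for S^n_+ - nonnegativity of Tr{xy} over pairs of positive semidefinite matrices, plus existence of a separating PSD witness for a non-PSD matrix - can be illustrated as follows (a sampled sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_psd(n):
    a = rng.normal(size=(n, n))
    return a @ a.T                          # PSD by construction

# Tr{xy} >= 0 for PSD x, y: one half of the self-duality of S^n_+
for _ in range(500):
    x, y = random_psd(4), random_psd(4)
    assert np.trace(x @ y) >= -1e-9

# conversely, a non-PSD matrix is "caught" by some PSD witness:
s = np.diag([1.0, -1.0])                    # not PSD
witness = np.diag([0.0, 1.0])               # PSD, and Tr{s * witness} = -1
assert np.trace(s @ witness) < 0            # so s is outside the dual cone
print("sampled check of S^n_+ self-duality passed")
```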
Now let us look at what complementary slackness means in the case of our standard cones.
Exercise 5.4.8 # Let K be a cone, K* be its dual cone and let x, s satisfy the complementary
slackness relation
$$S(K):\ \{x \in K\}\ \&\ \{s \in K^*\}\ \&\ \{x^Ts = 0\}.$$
Prove that
1) in the case of K = Rn+ the relation S says exactly that x and s are nonnegative n-
dimensional vectors with zero componentwise product x × s = (x_1 s_1, ..., x_n s_n)^T;
2)+ in the case of K = Sn+ the relation S says exactly that x and s are positive semidefinite
symmetric matrices with zero product xs; if it is the case, then x and s commute and possess,
therefore, a common eigenbasis, and the componentwise product of the diagonals of x and s in this
basis is zero;
3)+ in the case of K = K^2_n the relation S says exactly that
$$x_{n+1} = \sqrt{x_1^2 + ... + x_n^2},\qquad s_{n+1} = \sqrt{s_1^2 + ... + s_n^2}$$
and
$$x_1 : s_1 = x_2 : s_2 = ... = x_n : s_n = -[x_{n+1} : s_{n+1}].$$
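The characterization in 2) can be illustrated by constructing a complementary pair in S^3_+ supported on orthogonal eigenspaces of a common eigenbasis (a sketch; the matrices are hypothetical examples):

```python
import numpy as np

rng = np.random.default_rng(4)

# put x and s on orthogonal eigenspaces of a common eigenbasis q
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
x = q @ np.diag([2.0, 1.0, 0.0]) @ q.T   # PSD, rank 2
s = q @ np.diag([0.0, 0.0, 3.0]) @ q.T   # PSD, supported on the kernel of x

assert abs(np.trace(x @ s)) < 1e-10           # <x, s> = Tr{xs} = 0
assert np.allclose(x @ s, np.zeros((3, 3)))   # zero matrix product, as in 2)
assert np.allclose(x @ s, s @ x)              # hence x and s commute
print("complementary pair in S^3_+ constructed")
```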
We have presented the ”explicit characterization” of complementary slackness for our particular
cones which often occur in applications, sometimes as they are, and sometimes - as certain
”building blocks”. I mean that there are decomposable situations where the cone in question is
a direct product:
K = K1 × ... × Kk ,
and the Euclidean embedding space for K is the direct product of Euclidean embedding spaces for
the "component cones" Ki. In such a situation the complementary slackness is "componentwise":
$$K^* = K_1^* \times ... \times K_k^*,$$
and a pair x = (x_1, ..., x_k), s = (s_1, ..., s_k) possesses the complementary slackness property S(K)
if and only if each of the pairs x_i, s_i possesses the property S(K_i), i = 1, ..., k.
Thus, if we are in a decomposable situation and the cones Ki belong each to its own of our three
standard families, then we are able to interpret explicitly the complementary slackness relation.
Let me complete this section with a certain useful observation related to the three families
of cones in question. We know from Lecture 5 that these cones admit explicit logarithmically
homogeneous self-concordant barriers; on the other hand, we know that the Legendre transfor-
mation of a logarithmically homogeneous self-concordant barrier for a cone is a similar barrier
3
self-duality, of course, makes sense only with respect to certain Euclidean structure on the embedding linear
space, since this structure underlies the construction of the dual cone. We have already indicated what are these
structures for the spaces where our cones live
5.4. EXERCISES: CONIC PROBLEMS 79
for the anti-dual cone. It is interesting to look at what the Legendre transformations of the
particular barriers known to us are. The answer is as it should be: these barriers are, basically,
"self-adjoint" - their Legendre transformations coincide with the barriers, up to negating the
argument and adding a constant:
Exercise 5.4.10 # Prove that
1) the Legendre transformation of the standard logarithmic barrier
$$F(x) = -\sum_{i=1}^n \ln x_i$$
with
β = b + Ap. (5.15)
5.5. EXERCISES: TRUSS TOPOLOGY DESIGN VIA CONIC DUALITY 81
• loading scenarios f1, ..., fk - vectors from Rn; here n is the total number of degrees of
freedom of the nodes (i.e., the dimension of the space of virtual nodal displacements), and
the entries of f are the components of the external forces acting at the nodes.
n is something like twice (for 2D constructions) or 3 times (for 3D ones) the number of
nodes; "something like", because some of the nodes may be partially or completely fixed
(say, be in the foundation of the construction), which reduces the total # of degrees of
freedom;
Under reasonable mechanical hypotheses, these matrices are symmetric positive semidefi-
nite with positive definite sum, and in fact even dyadic:
Ai = bi bTi
for certain vectors bi ∈ Rn (these vectors are defined by the geometry of the nodal set).
These assumptions on Ai are crucial for what follows4 .
4
crucial are positive semidefiniteness and symmetry of Ai , not the fact that they are dyadic; this latter
assumption, quite reasonable for actual trusses, is not too important, although simplifies some relations
From our initial formulation it is not even seen that the problem is convex (since the equality
constraints (5.17) are bilinear in t and xj). It is, anyhow, easy to demonstrate that in fact the
problem is convex. The motivation of the reasoning is as follows: when t is strictly positive,
A(t) is positive definite (since the Ai are positive semidefinite with positive definite sum), and the
equilibrium equations can be solved explicitly:
$$x_j = A^{-1}(t)f_j,$$
so that the j-th compliance, as a function of strictly positive t, becomes
$$c_j(t) = f_j^T A^{-1}(t) f_j = \sup_{z\in \mathbf{R}^n}\,[2z^Tf_j - z^TA(t)z]$$
(the supremum is attained at z = xj).
Note that the cj(t) are closed and proper convex functions (as upper bounds of families of
functions linear in t; the fact that the functions are proper is an immediate consequence of the
fact that A(t) is positive definite for strictly positive t), so that (TTD1) is a convex program.
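The compliance and its variational characterization can be checked on random data (a sketch; B, t, f are hypothetical stand-ins for the bar vectors b_i, the bar volumes and a load):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 6
B = rng.normal(size=(m, n))        # rows play the role of the vectors b_i
t = rng.random(m) + 0.1            # strictly positive bar volumes
f = rng.normal(size=n)             # a single load f_j

A = sum(t[i] * np.outer(B[i], B[i]) for i in range(m))   # A(t), pos. def.
x = np.linalg.solve(A, f)          # equilibrium displacements x = A^{-1}(t) f
c = f @ x                          # compliance c(t) = f^T A^{-1}(t) f

# c(t) is also sup_z [2 z^T f - z^T A(t) z], the sup being attained at z = x
assert abs((2 * x @ f - x @ A @ x) - c) < 1e-9
for _ in range(100):
    z = x + rng.normal(size=n)     # any other z gives a smaller value
    assert 2 * z @ f - z @ A @ z <= c + 1e-9
print("compliance matches its variational characterization")
```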
Our next step will be to reduce (TTD1 ) to a conic form. Let us first make the objective linear.
This is immediate: by introducing an extra variable τ , we can rewrite (TTD1 ) equivalently as
(TTD2 ): minimize τ by choice of t ∈ Rn and τ subject to the constraints (5.16) and
$$\tau + z^TA(t)z - 2z^Tf_j \ge 0\quad \forall z \in \mathbf{R}^n\ \forall j = 1, ..., k. \qquad (5.18)$$
((5.18) clearly expresses the inequalities τ ≥ cj(t), j = 1, ..., k.)
Our next step is guided by the following evident observation:
the inequality
$$\tau + z^TAz - 2z^Tf \ge 0,$$
τ being real, A being a symmetric n × n matrix and f being an n-dimensional vector, is valid for
all z ∈ Rn if and only if the symmetric (n + 1) × (n + 1) matrix
$$\begin{pmatrix} \tau & f^T \\ f & A \end{pmatrix}$$
is positive semidefinite.
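For positive definite A the observation reduces, by minimizing the quadratic form in z, to the threshold τ ≥ f^T A^{-1} f, which is easy to test against an eigenvalue check of the block matrix (a numerical sketch, assuming A ≻ 0):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
a = rng.normal(size=(n, n))
A = a @ a.T + np.eye(n)                 # a positive definite A
f = rng.normal(size=n)

# for A > 0:  min_z [tau + z^T A z - 2 z^T f] = tau - f^T A^{-1} f,
# so the inequality holds for all z  iff  tau >= f^T A^{-1} f
threshold = f @ np.linalg.solve(A, f)

def block_psd(tau):
    M = np.block([[np.array([[tau]]), f[None, :]],
                  [f[:, None], A]])
    return bool(np.min(np.linalg.eigvalsh(M)) >= -1e-9)

assert block_psd(threshold + 0.1)       # above the threshold: PSD
assert not block_psd(threshold - 0.1)   # below the threshold: not PSD
print("block-matrix test agrees with the quadratic inequality")
```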
Exercise 5.5.3 Prove the latter statement. Derive from this statement that (TTD2 ) can be
equivalently written down as
(TTDp ): minimize τ by choice of s ∈ Rm and τ ∈ R subject to the constraint
$$A(\tau, s) \in K; \qquad \sum_{i=1}^m s_i = 0,$$
where
• K is the direct product of R^m_+ and k copies of the cone S^{n+1}_+;
the component of A associated with the j-th of the copies of the cone S^{n+1}_+ is
$$A_j(\tau, s) = \begin{pmatrix} \tau & f_j^T \\ f_j & A(e) + A(s) \end{pmatrix}.$$
Note that A_t(τ, s) is nothing but our previous t; the constraint A_t(τ, s) ∈ R^m_+ (which is the part of
the constraint A(τ, s) ∈ K) together with the constraint $\sum_i s_i = 0$ gives an equivalent reformulation
of the constraint (5.16), while the remaining components of the constraint A(τ, s) ∈ K, i.e., the
inclusions A_j(τ, s) ∈ S^{n+1}_+, represent the constraints (5.18).
Note that the problem (TTDp) is in fact in the conic form (cf. Section 5.4.4). Indeed, it requires
to minimize a linear objective under the constraints that, first, the design vector (τ, s) belongs
to a certain linear subspace E (given by $\sum_i s_i = 0$) and, second, that the image of the design
vector under a given affine mapping belongs to a certain cone (closed, pointed, convex and with
a nonempty interior). Now, the objective evidently can be represented as a linear form cT u of
the image u = A(τ, s) of the design vector under the mapping, so that our problem is exactly that of
minimizing a linear objective over the intersection of an affine plane (namely, the image of the
linear subspace E under the affine mapping A) and a given cone, which is a conic problem.
Up to this moment we acted in a certain "clever" way; from now on we act in a completely "mechan-
ical" manner, simply writing down and straightforwardly simplifying the conic dual to (TTDp).
First step: writing down conic dual to (TTDp ). What we should do is to apply to (TTDp )
the general construction from Lecture 5 and look at the result. The data in the primal problem
are as follows:
• K is the direct product of K_t = R^m_+ and k copies K_j of the cone S^{n+1}_+; the embedding
space for this cone is
$$E = \mathbf{R}^m \times \mathbf{S}^{n+1} \times ... \times \mathbf{S}^{n+1};$$
we denote a point from this latter space by u = (t, p_1, ..., p_k), t ∈ R^m and p_j being
(n + 1) × (n + 1) symmetric matrices, and denote the inner product by (·, ·);
note that there are many other ways to choose c in accordance with this relation;
Now let us build up the dual problem. We know that the cone K is self-dual (as a direct product
of self-dual cones, see Exercises 5.4.7, 5.4.9), so that K ∗ = K. We should realize only what is
L⊥ , in other words, what are the vectors
s = (r, q1 , ..., qk ) ∈ E
which are orthogonal to the image of E under the homogeneous part of the affine mapping A.
This requires nothing but completely straightforward computations.
Exercise 5.5.4 + Prove that the feasible plane c + L⊥ of the dual problem is comprised of exactly
those w = (r, q_1, ..., q_k) for which the symmetric (n + 1) × (n + 1) matrices q_j, j = 1, ..., k, are
of the form
$$q_j = \begin{pmatrix} \lambda_j & z_j^T \\ z_j & \sigma_j \end{pmatrix}, \qquad (5.19)$$
with λj satisfying the relation
$$\sum_{j=1}^k \lambda_j = 1 \qquad (5.20)$$
and the n × n symmetric matrices σ_1, ..., σ_k, along with the m-dimensional vector r and a real
ρ, satisfying the equations
$$r_i + \sum_{j=1}^k b_i^T\sigma_j b_i = \rho, \quad i = 1, ..., m. \qquad (5.21)$$
(The b_i are the vectors involved in the representation A_i = b_i b_i^T, so that b_i^T σ_j b_i = Tr{A_i σ_j}.)
Derive from this observation that the conic dual to (TTDp ) is the problem
(TTDd ): minimize the linear functional
$$2\sum_{j=1}^k z_j^Tf_j + V\rho \qquad (5.22)$$
by choice of positive semidefinite matrices q_j of the form (5.19), nonnegative vector r ∈ R^m and
real ρ under the constraints (5.20) and (5.21).
Second step: simplifying the dual problem. Now let us simplify the dual problem. It is
immediately seen that one can eliminate the ”heavy” matrix variables σj and the vector r by
performing partial optimization in these variables:
can be extended to a feasible plan (r; q1 , ..., qk ; ρ) of problem (TTDd ) if and only if the collection
satisfies the following requirements:
$$\lambda_j \ge 0,\quad j = 1, ..., k;\qquad \sum_{j=1}^k \lambda_j = 1; \qquad (5.23)$$
$$\rho \ge \sum_{j=1}^k \frac{(b_i^Tz_j)^2}{\lambda_j}\quad \forall i \qquad (5.24)$$
(a fraction with zero denominator from now on is +∞), so that (TTDd) is equivalent to the
problem of minimizing the linear objective (5.22) in the variables λ·, z·, ρ under the constraints (5.23),
(5.24).
Eliminating ρ from this latter problem, we obtain the following equivalent reformulation of
(TTDd):
(TTDd): minimize the function
$$\max_{i=1,...,m} \sum_{j=1}^k \left[2z_j^Tf_j + V\,\frac{(b_i^Tz_j)^2}{\lambda_j}\right] \qquad (5.25)$$
by choice of the vectors z_j ∈ Rn and the reals λ_j subject to the constraints
$$\lambda_j \ge 0;\qquad \sum_{j=1}^k \lambda_j = 1. \qquad (5.26)$$
Note that in the important single-load case k = 1 the problem (TTDd) simply amounts to minimizing,
with respect to z_1 ∈ Rn, the maximum over i = 1, ..., m of the quadratic forms
$$\psi_i(z_1) = 2z_1^Tf_1 + V\,(b_i^Tz_1)^2.$$
Now look: the initial problem (TTDp) contained an m-dimensional design vector (τ, s) (the
"formal" dimension of the vector is m + 1, but we remember that the sum of the si should be 0).
The dual problem (TTDd) has k(n + 1) − 1 variables (there are k n-dimensional vectors z_j and
k reals λ_j subject to a single linear equation). In the "full topology TTD" (it is allowed to
link by a bar any pair of nodes), m is of order of n2 and n is at least of order of hundreds, so
that m is of order of thousands and tens of thousands. In contrast to these huge numbers, the
number k of loading scenarios is, normally, a small integer (less than 10). Thus, the dimension
of (TTDd) is by orders of magnitude less than that of (TTDp). At the same time, solving
the dual problem, one can easily recover, via the Conic duality theorem, the optimal solution to
the primal problem. As a kind of "penalty" for the relatively small # of variables, (TTDd) has a lot
of inequality constraints; note, anyhow, that for many methods it is much easier to struggle with
many constraints than with many variables; this is, in particular, the case with the Newton-
based methods5. Thus, passing - in a completely mechanical way! - from the primal problem to
the dual one, we improve the "computational tractability" of the problem.
Third step: back to primal. And now let us demonstrate how duality allows one to obtain
better insight into the problem. To this end let us derive the problem dual to (TTDd). This looks
crazy: we know that the dual to the dual is the primal - the problem we started with. There is, anyhow, an
important point: (TTDd) is equivalent to the conic dual to (TTDp), not the conic dual itself;
therefore, taking the dual to (TTDd), we should not necessarily obtain the primal problem, although
we may expect that the result will be equivalent to this primal problem.
Let us implement our plan. First, we rewrite (TTDd ) in an equivalent conic form. To
this end we introduce extra variables yij ∈ R, i = 1, ..., m, j = 1, ..., k, in order to ”localize”
nonlinearities, and an extra variable f to represent the objective (5.25) (look: a minute ago we
tried to eliminate as many variables as possible, and now we go in the opposite direction... This
5
since the number of constraints influences only the complexity of assembling the Newton system, and the
complexity is linear in this number; in contrast to this, the # of variables defines the size of the Newton system,
and the complexity of solving the system is cubic in # of variables
is life, isn’t it?) More specifically, consider the system of constraints on the variables z j , λj , yij ,
f (i runs from 1 to m, j runs from 1 to k):
$$y_{ij} \ge \frac{(b_i^Tz_j)^2}{\lambda_j};\quad \lambda_j \ge 0,\quad i = 1, ..., m,\ j = 1, ..., k; \qquad (5.27)$$
$$f \ge \sum_{j=1}^k \left[2z_j^Tf_j + V y_{ij}\right],\quad i = 1, ..., m; \qquad (5.28)$$
$$\sum_{j=1}^k \lambda_j = 1. \qquad (5.29)$$
It is immediately seen that (TTDd ) is equivalent to minimization of the variable f under the
constraints (5.27) - (5.29). This latter problem is in the conic form (P) of Section 5.4.4, since
(5.27) can be equivalently rewritten as
$$\begin{pmatrix} y_{ij} & b_i^Tz_j \\ b_i^Tz_j & \lambda_j \end{pmatrix} \ge 0,\quad i = 1, ..., m,\ j = 1, ..., k \qquad (5.30)$$
(”≥ 0” for symmetric matrices stands for ”positive semidefinite”); to justify this equivalence,
think what is the criterion of positive semidefiniteness of a 2 × 2 symmetric matrix.
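The criterion in question - nonnegative diagonal together with the determinant condition y_{ij} λ_j ≥ (b_i^T z_j)^2 - can be confirmed against a direct eigenvalue test (a sketch on a few hand-picked triples):

```python
import numpy as np

def psd2(y, lam, a):
    """eigenvalue test of [[y, a], [a, lam]] >= 0"""
    m = np.array([[y, a], [a, lam]])
    return bool(np.min(np.linalg.eigvalsh(m)) >= -1e-12)

def criterion(y, lam, a):
    """nonnegative diagonal and nonnegative determinant: y*lam >= a^2"""
    return y >= 0 and lam >= 0 and y * lam >= a * a

for y, lam, a in [(1.0, 4.0, 2.0), (1.0, 4.0, 2.1), (0.0, 1.0, 0.0),
                  (-1.0, 1.0, 0.0), (3.0, 2.0, -2.0)]:
    assert psd2(y, lam, a) == criterion(y, lam, a)
print("the 2x2 PSD criterion matches the eigenvalue test")
```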
We see that (TTDd ) is equivalent to the problem of minimizing f under the constraints (5.28)
- (5.30). This problem, let it be called (π), is of form (P), Section 5.4.4, with the following data:
$$(A\zeta)_i = f - \sum_{j=1}^k \left[2z_j^Tf_j + V y_{ij}\right],$$
$$A^{\pi}_{ij} = \begin{pmatrix} y_{ij} & b_i^Tz_j \\ b_i^Tz_j & \lambda_j \end{pmatrix};$$
• χ is the vector with the only nonzero component (associated with the f -component of the
design vector) equal to 1.
• The system P(ξ − p) = 0 is $\sum_j \lambda_j = 1$, so that P^T r, r ∈ R, is the vector with λ·-components
equal to r and remaining components equal to 0, and p is $P^T\frac{1}{k}$.
Exercise 5.5.6 + Prove that the conic dual, in the sense of Section 5.4.4, to problem (π) is
equivalent to the following program:
(ψ): minimize
$$\max_{j=1,...,k} \sum_{i=1}^m \frac{\beta_{ij}^2}{\phi_i} \qquad (5.31)$$
by choice of m-dimensional vector φ and mk reals βij subject to the constraints
φ ≥ 0;   Σ_{i=1}^m φ_i = V;        (5.32)

Σ_{i=1}^m β_ij b_i = f_j ,   j = 1, ..., k.        (5.33)
Fourth step: from primal to primal. We do not know what is the actual relation between
problem (ψ) and our very first problem (TTDini ) - what we can say is:
”(ψ) is equivalent to the problem which is conic dual to the problem which is equivalent to
the conic dual to the problem which is equivalent to (TTDini )”;
it sounds awful, especially taking into account that the notion of equivalence between problems
has no exact meaning. At the same time, looking at (ψ), namely, at equation (5.32), we may
guess that the φ_i are nothing but our bar volumes t_i - the design variables we actually are
interested in, so that (ψ) is a ”direct reformulation” of (TTDini ) - the φ-component of an optimal
solution to (ψ) is nothing but the t-component of an optimal solution to (TTDini ). This actually
is the case, and the proof could be given by tracing the chain which led us to (ψ). There is,
anyhow, a direct, simple and instructive way to establish the equivalence between the initial and
the final problems in our chain, which is as follows.
Given a feasible solution (t, x1 , ..., xk ) to (TTDini ), consider the bar forces
β_ij = t_i x_j^T b_i ;
these quantities are magnitudes of the reaction forces caused by elongations of the bars under
the corresponding loads. The equilibrium equations
A(t)xj = fj
in view of A(t) = Σ_i t_i A_i ≡ Σ_i t_i b_i b_i^T say exactly that

Σ_i β_ij b_i = f_j ,   j = 1, ..., k;        (5.34)

thus, setting

φ = t,   β_ij = t_i x_j^T b_i ,        (5.35)

we obtain a feasible plan (φ, β· ) to problem (ψ). What is the value of the objective of the latter problem at the indicated plan?
Multiplying (5.34) by xTj and taking into account the origin of βij , we see that j-th compliance
cj = xTj fj is equal to
Σ_i β_ij x_j^T b_i = Σ_i t_i (x_j^T b_i)^2 = Σ_i β_ij^2 / t_i = Σ_i β_ij^2 / φ_i ,
so that the value of the objective of (TTDini ) at (t, x1 , ..., xk ), which is maxj cj , is exactly the
value of the objective (5.31) of the problem (ψ) at the feasible plan (5.35) of the latter problem.
Thus, we have established the following proposition:
A. Transformation (5.35) maps a feasible plan (t, x1 , ..., xk ) to problem (TTDini ) into a feasible
plan (φ, β· ) to problem (ψ), and the value of the objective of the first problem at the first plan
is equal to the value of the objective of the second problem at the second plan.
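Proposition A is easy to check numerically on a randomly generated toy truss (a sketch with made-up data; the names `b`, `t`, `f` are illustrative, not from the text):

```python
import numpy as np

# Hypothetical tiny planar truss: n = 2 nodal degrees of freedom, m = 3 bars.
rng = np.random.default_rng(1)
b = rng.normal(size=(3, 2))          # b_i: geometry vector of bar i
t = rng.uniform(0.5, 2.0, size=3)    # bar volumes (all positive)
f = rng.normal(size=2)               # a single load

A_t = sum(t[i] * np.outer(b[i], b[i]) for i in range(3))  # A(t) = sum_i t_i b_i b_i^T
x = np.linalg.solve(A_t, f)          # equilibrium displacements: A(t) x = f

beta = t * (b @ x)                   # bar forces beta_i = t_i x^T b_i (mapping (5.35))
assert np.allclose(beta @ b, f)      # (5.34): sum_i beta_i b_i = f
# compliance equality: x^T f = sum_i beta_i^2 / t_i
assert np.isclose(x @ f, np.sum(beta**2 / t))
print("bar forces balance the load; compliance matches sum beta_i^2/phi_i")
```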
Are we done? Have we established the desired equivalence between the problems? No! Why do
we know that images of the feasible plans to (TTDini ) under mapping (5.35) cover the whole
set of feasible plans of (ψ)? And if it is not the case, how can we be sure that the problems are
equivalent - it may happen that optimal solution to (ψ) corresponds to no feasible plan of the
initial problem!
And the image of mapping (5.35) indeed does not cover the whole feasible set of (ψ), which
is clear by dimension reasons: the dimension of the feasible domain of (TTDini ), regarded as a
nonlinear manifold, is m − 1 (this is the # of independent ti ’s; xj are functions of t given by
the equilibrium equations); and the dimension of the feasible domain of (ψ), also regarded as
a manifold, is m − 1 (# of independent φi ’s) plus mk (# of βij ) minus nk (# of scalar linear
equations (5.33)), i.e., it might be by orders of magnitude greater than the dimension of the
feasible domain of (TTDini ) (recall that normally m >> n). In other words, transformation
(5.35) allows us to obtain only those feasible plans of (ψ) where the β-part is determined, via the
expressions
β_ij = t_i x_j^T b_i ,
by k n-dimensional vectors x_j (which is also clear from the origin of the problem: the actual
bar forces should be caused by certain displacements of the nodes), and this is in no sense a
consequence of the constraints of problem (ψ): relations (5.33) say only that the sum of the
reaction forces balances the external load, and say nothing on the ”mechanical validity” of the
reaction forces, i.e., whether or not they are caused by certain displacements of the nodes. Our
dimension analysis demonstrates that the reaction forces caused by nodal displacements - i.e.,
those valid mechanically - form a very small part of all reaction forces allowed by equations
(5.33).
In spite of these pessimistic remarks, we know that the optimal value in (ψ) - which is
basically dual to the dual to (TTDini ) - is the same as that in (TTDini ), so that in fact
the optimal solution to (ψ) is in the image of mapping (5.35). Can we see it directly, without
referring to the chain of transformations which led us to (ψ)? Yes! It is very simple to verify
that the following proposition holds:
B. Let (φ, β· ) be a feasible plan to (ψ) and ω be the corresponding value of the objective. Then
φ can be extended to a feasible plan (t = φ, x1 , ..., xk ) to (TTDini ), and the maximal, over the
loads f1 , ..., fk , compliance of the truss t is ≤ ω.
matrices), but most of them were completely routine - we used in a straightforward manner
the general scheme of conic duality. In fact the ”clever” steps also are completely routine;
small experience suffices to see immediately that the epigraph of the compliance can be
represented in terms of nonnegativity of certain quadratic forms or, which is the same,
in terms of positive semidefiniteness of certain matrices linearly depending on the control
vectors; this is even easier to do with the constraints (5.30). I would qualify our chain of
reformulations as completely straightforward.
• Let us look, anyhow, what are the results of our effort. There are two of them:
(a) the ”compressed”, as far as # of variables is concerned, form (TTDd ) of the problem; as
was mentioned, by reducing the # of variables we get better possibilities for numerical processing
of the problem;
(b) very instructive ”bar forces” reformulation (ψ) of the problem.
• After the ”bar forces” formulation is guessed, one can easily establish its equivalence to the
initial formulation; thus, if our only goal were to replace (TTDini ) by (ψ), we could restrict
ourselves with the fourth step of our construction and skip the preceding three steps. The
question, anyhow, is how to guess that (ψ) indeed is equivalent to (TTDini ). It is not that
difficult to see what the equilibrium equations are in terms of the bar forces β_ij = t_i x_j^T b_i ;
but one hardly could be courageous enough (and, to the best of our knowledge, in fact nobody
was courageous enough) to conjecture that the ”heart of the situation” - the restriction that the
bar forces should be caused by certain displacements of the nodes - simply is redundant:
in fact we can forget that the bar forces should belong to an ”almost negligible”, as far
as dimensions are concerned, manifold (given by the equations βij = ti xTj bi ), since this
restriction on the bar forces is automatically satisfied at any optimal solution to (ψ) (this
is what actually is said by B.).
Thus, the things are as they should be: routine transformations result in something which,
in principle, could be guessed and proved directly and quickly; the bottleneck is in this ”in
principle”: it is not difficult to justify the answer, it is difficult to guess what the answer is.
In our case, this answer was ”guessed” via straightforward applications of a quite routine
general scheme, a scheme useful in other cases as well; to demonstrate the efficiency of this
scheme and some ”standard” tricks in its implementation is exactly the goal of this text.
• To conclude, let me say several words on the ”bar forces” formulation of the TTD problem.
First of all, let us look what is this formulation in the single-load case k = 1. Here the
problem becomes

minimize   Σ_i β_i^2 / φ_i
where κ > 0 is given. In this case the ”direct” reasoning establishing the equivalence
between (TTDini ) and (ψ) remains valid and results in the following ”bar forces” setting:
minimize   max_{j=1,...,k} Σ_{i=1}^m β_ij^2 / t_i^κ
Bad news here is that the problem turns out to be convex in (t, β· ) if and only if κ ≤ 1,
and from the mechanical viewpoint, the only interesting case in this range of values of κ
is that of the linear model (κ = 1).
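The κ = 1 term β^2/φ appearing above is jointly convex in (t, β) on t > 0 (it is the standard quad-over-lin function); a quick randomized midpoint check, as a sketch:

```python
import numpy as np

# Sketch: randomized midpoint test of joint convexity of beta^2 / t on t > 0
# (the kappa = 1, quad-over-lin term entering the objective of (psi)).
def h(t, beta):
    return beta**2 / t

rng = np.random.default_rng(7)
for _ in range(10000):
    t1, t2 = rng.uniform(0.1, 5.0, size=2)
    b1, b2 = rng.uniform(-5.0, 5.0, size=2)
    lhs = h(0.5 * (t1 + t2), 0.5 * (b1 + b2))
    rhs = 0.5 * (h(t1, b1) + h(t2, b2))
    assert lhs <= rhs + 1e-9 * (1.0 + rhs)  # midpoint convexity, with fp tolerance
print("beta^2/t passed all midpoint convexity tests on t > 0")
```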
Chapter 6
The Method of Karmarkar
The goal of this lecture is to develop the method which extends to the general convex case
the very first polynomial time interior point method - the method of Karmarkar. Let me say
that there is no necessity to start with the initial LP method and then pass to the extensions,
since the general scheme seems to be clearer than its particular LP implementation.
94 CHAPTER 6. THE METHOD OF KARMARKAR
b + L = {x ∈ M | e^T x = 1},   M = {x | Ax = 0}.

minimize c^T x   s.t.   x ∈ K ∩ M,  e^T x = 1,

σ = c − c∗ e;
since on the feasible plane of the problem e^T x is identically 1, this updating indeed results in
an equivalent problem with the optimal value equal to 0.
Thus, we have seen that (P) can be easily rewritten in the so called Karmarkar format
with M being a linear subspace in Rn and the optimal value in the problem being zero; this
transformation preserves, of course, properties A, B.
Remark 6.2.1 In the original description of the method of Karmarkar, the problem from the
very beginning is assumed to be in the form (PK ), with K = Rn+ ; moreover, Karmarkar assumes
that
e = (1, ..., 1)T ∈ Rn
and that the given in advance strictly feasible solution x̂ to the problem is the barycenter
n^{−1} e of the standard simplex; thus, in the original version of the method it is assumed that
the feasible set Kf of the problem is the intersection of the standard simplex

∆ = {x ∈ R^n_+ | Σ_{i=1}^n x_i ≡ e^T x = 1}
6.3. THE KARMARKAR POTENTIAL FUNCTION 95
and a linear subspace of Rn passing through the barycenter n−1 e of the simplex and, besides
this, that the optimal value in the problem is 0.
And, of course, in the Karmarkar paper the barrier for the cone K = R^n_+ underlying the
whole construction is the standard n-logarithmically homogeneous barrier

F(x) = − Σ_{i=1}^n ln x_i
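As a quick illustration (a numerical sketch, not part of the original text), one can verify the n-logarithmic homogeneity of this barrier, together with the identity x^T F′(x) = −ϑ used repeatedly in these lectures:

```python
import numpy as np

# The standard logarithmic barrier for the nonnegative orthant, parameter theta = n.
def F(x):
    return -np.sum(np.log(x))

rng = np.random.default_rng(2)
n = 6
x = rng.uniform(0.5, 3.0, size=n)
for t in (0.5, 2.0, 7.5):
    # theta-logarithmic homogeneity: F(t x) = F(x) - n ln t
    assert np.isclose(F(t * x), F(x) - n * np.log(t))
# a standard consequence: x^T F'(x) = -theta (here F'(x) = -1/x)
assert np.isclose(x @ (-1.0 / x), -n)
print("F is n-logarithmically homogeneous on the interior of R^n_+")
```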
σ^T x ≡ c^T x − c∗ ≤ V exp{− (v(x̂) − v(x)) / ϑ},   V = (c^T x̂ − c∗) exp{ (F(x̂) − min_{rint Kf} F) / ϑ };        (6.4)

note that min_{rint Kf} F is well defined, since Kf is bounded (due to A) and the restriction of F
onto the relative interior of Kf is a self-concordant barrier for Kf (Proposition 3.1.1.(i)).
The proof is immediate:

v(x̂) − v(x) = ϑ[ln(σ^T x̂) − ln(σ^T x)] + F(x̂) − F(x) ≤
            ≤ ϑ[ln(σ^T x̂) − ln(σ^T x)] + F(x̂) − min_{rint Kf} F,
K : x ↦ x+
1) Given strictly feasible solution x to problem (PK ), compute the gradient F 0 (x) of the barrier
F;
v_x(y) = F(y) + ϑ (σ^T (y − x)) / (σ^T x) + ϑ ln(σ^T x)

E_x = {y | y ∈ M, (y − x)^T F′(x) = 0}

e_x = argmin{ h^T ∇_y v_x(x) + (1/2) h^T ∇²_y v_x(x) h  |  h ∈ M, h^T F′(x) = 0 };
and set
x′ = x + (1/(1 + ω)) e_x .
4) The point x′ belongs to the intersection of the subspace M and the interior of K. Find a
point x″ from this intersection such that

v(x″) ≤ v(x′)
Proposition 6.4.1 The above updating is well defined, maps a strictly feasible solution x to
(PK ) into another strictly feasible solution x+ to (PK ) and decreases the Karmarkar potential at
least by an absolute constant:

v(x+) ≤ v(x) − χ,   χ = 1/3 − ln(4/3) > 0.        (6.5)
6.4. THE KARMARKAR UPDATING SCHEME 97
Proof.
0⁰. Let us start with the following simple observations:
are well defined and, moreover, belong to Kf (indeed, since both x and y are in K ∩ M and
φ(t) is positive for 0 ≤ t < t∗ , the points xt also are in K ∩ M ; to establish feasibility, we should
verify, in addition, that eT xt = 1, which is evident).
Thus, x_t , 0 ≤ t < t∗ , is a certain curve in the feasible set. Let us prove that |x_t|_2 → ∞ as
t → t∗ − 0; this will be the desired contradiction, since Kf is assumed to be bounded (see A).
Indeed, φ(t) → 0 as t → t∗ − 0, while x + t(y − x) has a nonzero limit x + t∗ (y − x) (this limit
is nonzero as a convex combination of two points from the interior of K and, therefore, a point
from this interior; recall that K is pointed, so that the origin is not in its interior).
We have proved (6.6); (6.7) is an immediate consequence of this relation, since if there were
y ∈ int K ∩ M with σ T y ≤ 0, the vector [eT y]−1 y would be a strictly feasible solution to the
problem (since we already know that eT y > 0, so that the normalization y 7→ [eT y]−1 y would
keep the point in the interior of the cone) with nonnegative value of the objective, which, as we
know, is impossible.
1⁰. Let us set

G = K ∩ E_x ;

since x ∈ M is an interior point of K, G is a closed convex domain in the affine plane E_x (this
latter plane from now on is regarded as the linear space G is embedded into); the (relative) interior
of G is exactly the intersection of E_x and the interior of the cone K.
2⁰. Further, let f (·) be the restriction of the barrier F on rint G; due to our combination
rules for self-concordant barriers, namely, the one on affine substitutions of argument, f is a
ϑ-self-concordant barrier for G.
3⁰. By construction, the ”partially linearized” potential, regarded as a function on rint G, is
the sum of the barrier f and a linear form:
vx (y) = f (y) + pT (y − x) + q,
where the linear term pT (y −x)+q is nothing but the first order Taylor expansion of the function
ϑ ln(σ T y)
at the point y = x. From (6.7) it immediately follows that this function (and therefore v(·)) is
well defined on int K ∩ M and, consequently, on rint G; besides this, the function is concave
in y ∈ rint G. Thus, we have
5⁰. Now comes the first crucial point of the proof: the reduced Newton decrement ω is not
too small, namely,

ω ≥ 1/3.        (6.10)
Indeed, x is the analytic center of G with respect to the barrier f (since, by construction, E_x is
orthogonal to the gradient F′ of the barrier F at x, and f is the restriction of F onto E_x ). Since
f, as we have just mentioned, is a ϑ-self-concordant barrier for G, and f is nondegenerate (as
a restriction of a nondegenerate self-concordant barrier F, see Proposition 5.3.1), the enlarged
Dikin ellipsoid
W+ = {y ∈ E_x | |y − x|_x ≤ ϑ + 2√ϑ}
(| · |_x is the Euclidean norm generated by F″(x)) contains the whole G (the Centering property,
Lecture 3, V.). Now, the optimal solution x∗ to (PK ) satisfies the relation σ^T x∗ = 0 (the origin
of σ) and is a nonzero vector from K ∩ M (since x∗ is feasible for the problem). It follows
that the quantity (x∗)^T F′(x) is negative (since F′(x) ∈ int (−K ∗ ), Proposition 5.3.3.(i)), and
therefore the ray spanned by x∗ intersects G at a certain point y∗ (indeed, G is the part of K ∩ M
given by the linear equation y^T F′(x) = x^T F′(x), and the right hand side in this equation is −ϑ,
see (5.5), Lecture 5, i.e., is of the same sign as (x∗)^T F′(x)). Since σ^T x∗ = 0, we have σ^T y∗ = 0;
thus,
there exists y ∗ in G, and, consequently, in the ellipsoid W + , with σ T y ∗ = 0.
We conclude that the linear form

ψ(y) = ϑ σ^T y / (σ^T x),

which is equal to ϑ at the center x of the ellipsoid W+, attains the zero value somewhere in
the ellipsoid, and therefore its variation over the ellipsoid is at least 2ϑ. Consequently, the
variation of the form over the unit Dikin ellipsoid of the barrier f centered at x is at least
2ϑ (ϑ + 2√ϑ)^{−1} ≥ 2/3:

max{ ϑ σ^T h / (σ^T x)  |  h ∈ M, h^T F′(x) = 0, |h|_x ≤ 1 } ≥ 1/3.
But the linear form in question is exactly ∇y vx (x), since ∇y f (x) = 0 (recall that x is the analytic
center of G with respect to f ), so that the left hand side in the latter inequality is the Newton
decrement of vx (·) (as always, regarded as a function on rint G) at x, i.e., it is nothing but ω.
6.5. OVERALL COMPLEXITY OF THE METHOD 99
6⁰. Now comes the concluding step: the Karmarkar potential v is constant along rays:
v(tu) = v(u) whenever u ∈ Dom v and t > 0 [this is an immediate consequence of the ϑ-logarithmic
homogeneity of the barrier F][1]. As we just have seen,

v(x′) ≤ v(x) − ρ(−1/3);
by construction, x″ is a point from int K ∩ M such that

v(x″) ≤ v(x′).

According to (6.6), when passing from x″ to x+ = [e^T x″]^{−1} x″, we get a strictly feasible solution
to the problem, and due to the fact that v remains constant along rays, v(x+) = v(x″). Thus,
we come to v(x+) ≤ v(x) − ρ(−1/3), as claimed.
x_i = K(x_{i−1}),   x_0 = x̂,        (6.11)

x̂ being the initial strictly feasible solution to the problem (see B).
An immediate corollary of Propositions 6.3.1 and 6.4.1 is the following complexity result:
Theorem 6.5.1 Let problem (PK ) be solved by the method of Karmarkar associated with ϑ-
logarithmically homogeneous barrier F for the cone K, and let assumptions A - C be satisfied.
Then the iterates xi generated by the method are strictly feasible solutions to the problem and
c^T x_i − c∗ ≤ V exp{− (v(x̂) − v(x_i)) / ϑ} ≤ V exp{− iχ / ϑ},   χ = 1/3 − ln(4/3),        (6.12)
with the data-dependent scale factor V given by
V = (c^T x̂ − c∗) exp{ (F(x̂) − min_{rint Kf} F) / ϑ }.        (6.13)
In particular, the Newton complexity (# of iterations of the method) of finding an ε-solution to
the problem does not exceed the quantity
N_Karm(ε) = O(1) ϑ ln( V/ε + 1 ) + 1,        (6.14)
O(1) being an absolute constant.
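The updating K of Section 6.4 can be sketched numerically in the LP case. The code below is a minimal illustration with made-up data `A`, `c` (it is not Karmarkar's original projective implementation): it runs the updating on a toy problem in Karmarkar format with known optimal value 0 and checks the guaranteed per-step decrease (6.5):

```python
import numpy as np

# Toy data (ours): Karmarkar-format LP with A e = 0 and optimal value c* = 0.
n = 3
A = np.array([[1.0, -1.0, 0.0]])        # M = {x : Ax = 0}; note A e = 0
c = np.array([1.0, 1.0, 0.0])           # min over the feasible set is 0, at (0, 0, 1)
e = np.ones(n)
chi = 1.0 / 3.0 - np.log(4.0 / 3.0)     # the constant from (6.5)

def potential(x):
    # Karmarkar potential with theta = n and sigma = c (since c* = 0)
    return n * np.log(c @ x) - np.sum(np.log(x))

def karmarkar_step(x):
    g = -1.0 / x + n * c / (c @ x)      # gradient of the partially linearized potential at x
    H = np.diag(1.0 / x**2)             # F''(x) for F(x) = -sum ln x_i
    B = np.vstack([A, (-1.0 / x)[None, :]])   # constraints: h in M and h^T F'(x) = 0
    KKT = np.block([[H, B.T], [B, np.zeros((2, 2))]])
    h = np.linalg.solve(KKT, np.concatenate([-g, np.zeros(2)]))[:n]
    omega = np.sqrt(h @ H @ h)          # reduced Newton decrement; theory: omega >= 1/3
    x_prime = x + h / (1.0 + omega)     # damped Newton step, stays in the interior
    return x_prime / (e @ x_prime)      # renormalize onto e^T x = 1 (v is constant on rays)

x = e / n                               # strictly feasible start: the barycenter
for _ in range(20):
    v_old = potential(x)
    x = karmarkar_step(x)
    assert np.all(x > 0) and np.linalg.norm(A @ x) < 1e-9
    assert potential(x) <= v_old - chi + 1e-9   # progress of at least chi, as in (6.5)
print("objective after 20 steps:", c @ x)
```

In practice the observed decrease per step is much larger than χ, in line with the comments below.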
Comments.
• We see that the Newton complexity of finding an ε-solution by the method of Karmarkar is
proportional to ϑ; on the other hand, the restriction of F on the feasible set Kf is a ϑ-self-
concordant barrier for this set (Proposition 3.1.1.(i)), and we might solve the problem by
the path-following method associated with this restriction, which would result in a better
Newton complexity, namely, proportional to √ϑ. Thus, from the theoretical complexity

[1] in fact the assumption of logarithmic homogeneity of F , same as the form of the Karmarkar potential,
originates exactly from the desire to make the potential constant along rays
viewpoint the method of Karmarkar is significantly worse than the path-following method;
why should we be interested in the method of Karmarkar?
The answer is: due to the potential reduction nature of the method, the nature which un-
derlies the excellent practical performance of the algorithm. Look: in the above reasoning,
the only thing we are interested in is to decrease as fast as possible certain explicitly given
function - the potential. The theory gives us certain ”default” way of updating the current
iterate in a manner which guarantees certain progress (at least by an absolute constant)
in the value of the potential at each iteration, and it does not forbid as to do whatever we
want to get a better progress (this possibility was explicitly indicated in our construction,
see the requirements on x00 ). E.g., after x0 is found, we can perform the line search on the
intersection of the ray [x, x0 ) with the interior of G in order to choose as x00 the best, in
terms of the potential, point of this intersection rather than the ”default” point x0 . There
are strong reasons to expect that in some important cases the line search decreases the
value of the potential by much larger quantity than that one given by the above theoretical
analysis (see exercises accompanying this lecture); in accordance with these expectations,
the method in fact behaves itself incomparably better than it is said by the theoretical
complexity analysis.
• What is also important is that all ”common sense” improvements of the basic Karmarkar
scheme, like the aforementioned line search, do not spoil the theoretical complexity bound;
and from the practical viewpoint a very attractive property of the method is that the
potential gives us a clear criterion to decide what is good and what is bad. In contrast to
this, in the path-following scheme we either should follow the theoretical recommendations
on the rate of updating the penalty - and then for sure will be forced to perform a lot of
Newton steps - or could increase the penalty at a significantly higher rate, thus destroying
the theoretical complexity bound and raising very difficult questions of how to choose
and to tune this higher rate.
• Let me say several words about the original method of Karmarkar for LP. In fact this
is exactly the particular case of the aforementioned scheme for the situation described in
Remark 6.2.1; Karmarkar, anyhow, presents the same method in a different way. Namely,
instead of processing the same data in varying, from iteration to iteration, plane Ex , he
uses scaling - after a new iterate xi is found, he performs fractional-linear substitution of
the argument
x ↦ X_i^{−1} x / (e^T X_i^{−1} x),   X_i = Diag{x_i}
(recall that in the Karmarkar situation e = (1, ..., 1)T ). With this substitution, the problem
becomes another problem of the same type (with new objective σ and new linear subspace
M ), and the image of the actual iterate xi becomes the barycenter n−1 e of the simplex ∆.
It is immediately seen that in the Karmarkar case to decrease by something the Karmarkar
potential for the new problem at the image n−1 e of the current iterate is the same as to
decrease by the same quantity the potential of the initial problem at the actual iterate xi ;
thus, scaling allows us to reduce the question of how to decrease the potential to the particular
case when the current iterate is the barycenter of ∆; this (specific for LP) possibility to deal
with certain convenient ”standard configuration” allows to carry out all required estimates
(which in our approach were consequences of general properties of self-concordant barriers)
via direct analysis of the behaviour of the standard logarithmic barrier F(x) = − Σ_i ln x_i
in a neighbourhood of the point n^{−1} e, which is quite straightforward.
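The scaling just described can be sketched numerically (the data below are illustrative): the fractional-linear substitution sends the current iterate to the barycenter, and the Karmarkar potential of the transformed problem differs from the potential of the original one by a constant:

```python
import numpy as np

# Sketch of Karmarkar's scaling in the LP case: x -> X^{-1} x / (e^T X^{-1} x),
# X = Diag{x_i}, maps the iterate x_i to e/n and shifts the potential by a constant.
rng = np.random.default_rng(3)
n = 5
e = np.ones(n)
sigma = rng.uniform(0.5, 2.0, size=n)       # objective of the problem in Karmarkar format
x_i = rng.uniform(0.2, 1.0, size=n)
x_i /= e @ x_i                              # current iterate on the simplex
X = np.diag(x_i)

def v(s, x):                                # Karmarkar potential with objective s
    return n * np.log(s @ x) - np.sum(np.log(x))

def scale(x):                               # the fractional-linear substitution
    y = np.linalg.solve(X, x)
    return y / (e @ y)

sigma_new = X @ sigma                       # objective of the transformed problem
assert np.allclose(scale(x_i), e / n)       # the iterate goes to the barycenter
# the potential shift v'(scale(x)) - v(sigma, x) is the same constant for all x:
shifts = [v(sigma_new, scale(x)) - v(sigma, x)
          for x in rng.uniform(0.1, 1.0, size=(4, n))]
assert np.allclose(shifts, shifts[0])
print("scaling maps x_i to e/n and shifts the potential by a constant")
```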
6.6. HOW TO IMPLEMENT THE METHOD OF KARMARKAR 101
Let me also add that in the Karmarkar situation our general estimate becomes

c^T x_i − c∗ ≤ (c^T x̂ − c∗) exp{− iχ / n},

since the parameter of the barrier in the case in question is ϑ = n and the starting point
x̂ = n^{−1} e is the minimizer of F on ∆ and, consequently, on the feasible set of the problem.
σ = c − c∗ e,

c∗_i = c∗_{i−1}

and, consequently,

σ_i = σ_{i−1} .
If it turns out that ωi < 1/3, we act as follows. The quantity ω given by rule 3) depends on the
objective σ the rules 1)-3) are applied to:
ω = Ωi (σ).
It follows that the equation Ω_i(c − te) = 1/3 is solvable, and its closest to c∗_{i−1} root to the right of
c∗_{i−1} separates c∗ and c∗_{i−1}, i.e., this root (which can be immediately computed) is an improved
lower bound for c∗ . This is exactly the lower bound which we take as c∗_i ; after it is found, we
set
σi = c − c∗i e
and update x_i into x_{i+1} by the basic scheme applied to this ”improved” objective (for which
this scheme, by construction, results in ω = 1/3).
Following the line of argument used in the proofs of Propositions 6.3.1, 6.4.1, one can verify
that the modification in question produces strictly feasible solutions x_i and nondecreasing lower
bounds c∗_i ≤ c∗ of the unknown optimal value in such a way that the sequence of local potentials
v_i(x_i) decreases at least by χ per step, which results in the efficiency estimate

c^T x_i − c∗ ≤ V exp{− (v_0(x_0) − v_i(x_i)) / ϑ} ≤ V exp{− iχ / ϑ},

V = (c^T x̂ − c∗_0) exp{ (F(x̂) − min_{rint Kf} F) / ϑ }

completely similar to that for the case of known optimal value.
6.7. EXERCISES ON THE METHOD OF KARMARKAR 103
K^σ = {z ∈ M ∩ K | σ^T z = 1},
find a new point y + of this relative interior with F (y + ) being ”significantly less” than F (y).
Could you guess what the ”linesearch” (with x″ = argmin_{y = x + t(x′ − x)} v(y)) version of the
Karmarkar updating K is in terms of this new parameterization of R?
Exercise 6.7.2 # Verify that the Karmarkar updating with linesearch is nothing but the Newton
iteration with linesearch as applied to the restriction of F onto the relative interior of K σ .
Now, can we see from our new interpretation of the method why it converges at the rate
given by Theorem 6.5.1? This is immediate:
Exercise 6.7.3 #+ Prove that
Exercise 6.7.4 # Prove that the scaling X possesses the following properties:
F(X u) = F(u) + const(x);

in particular,

|X h|_{X u} = |h|_u ,   u ∈ int K, h ∈ S^n ;

• the scaling maps the feasible set Kf of problem (PK ) onto the feasible set of another
problem (PK′ ) of the same type; the updated problem is defined by the subspace

M′ = X M

and by the vector

e′ = x^{1/2} e x^{1/2} ;
• let v(·) be the potential of the initial problem, and v 0 be the potential of the new one. Then
the potentials at the corresponding points coincide, up to an additive constant:
• X maps the point x onto the unit matrix I, and the iterate x+ of x given by the linesearch
version of the method as applied to the initial problem into the similar iterate I + of I given
by the linesearch version of the method as applied to the transformed problem.
From Exercise 6.7.4 it is clear that in order to answer the question (?), it suffices to answer the
similar question (of course, not about the initial problem itself, but about a problem of the same
type with updated data) for the particular case when the current iterate is the unit matrix I.
Let us consider this special case. In what follows we use the original notation for the data of the
transformed problem; this should not cause any confusion, since we shall speak about exactly
one step of the method.
Now, what is the situation in our ”standard configuration” case x = I? It is as follows:
we are given a linear subspace M passing through x = I and the objective σ; what we know
is that [2]

I. (σ, u) ≥ 0 whenever u ∈ int K ∩ M , and there exists a nonzero matrix x∗ ∈ K ∩ M
such that (σ, x∗ ) = 0;
II. In order to update x = I into x+ , we compute the steepest descent direction ξ of the
Karmarkar potential v(·) at the point x along the affine plane
Ex = {y ∈ M | (F 0 (x), y − x) = 0},
[2] from now on we denote the inner product on the space in question, i.e., on the space S^n of symmetric n × n
matrices, by (x, y) (recall that this is the Frobenius inner product Tr{xy}), in order to avoid confusion with
matrix products like x^T y
the metric in the subspace being |h|_x ≡ (F″(x)h, h)^{1/2}, i.e., find among the unit, with respect
to the indicated norm, directions parallel to E_x the one with the smallest (i.e., the ”most
negative”) inner product with v′(x). Note that the Newton direction e_x is proportional, with
positive coefficient, to the steepest descent direction ξ. Note also that the steepest descent
direction of v at x is the same as the similar direction for the function n ln((σ, u)) at u = x (recall
that for the barrier in question ϑ = n), since x is the minimizer of the remaining component
F(·) of v(·) along E_x .
Now, in our standard configuration case x = I we have F′(x) = −I, and |h|_x = (h, h)^{1/2} is
the usual Frobenius norm [3]; thus, ξ is the steepest descent direction, taken along the subspace

Π = M ∩ {h : Tr h ≡ −(F′(I), h) = 0}

with respect to the standard Euclidean structure of our universe S^n, of the linear form n(S, h). In
other words, ξ is proportional, with negative coefficient, to the orthogonal projection η of

S ≡ (σ, I)^{−1} σ

onto Π.
III. says to us that (S, I) = 1; since η is the orthoprojection of S onto Π (see II.), we have also
(S, η) = (η, η). Thus,

φ(t) ≡ v(I − tη) = − ln Det(I − tη) + n ln(1 − t(η, η)) = − Σ_{i=1}^n ln(1 − t g_i) + n ln(1 − t |g|_2^2),        (6.17)

where g = (g_1 , ..., g_n )^T is the vector comprised of the eigenvalues of the symmetric matrix η.
Exercise 6.7.5 #+ Prove that

1) Σ_{i=1}^n g_i = 0;

2) |g|_∞ ≥ n^{−1} .
[3] due to the useful formulae for the derivatives of the barrier F(u) = − ln Det u: F′(u) = −u^{−1}, F″(u)h =
u^{−1} h u^{−1}; those who solved Exercise 3.3.3 surely know these formulae, and all others are kindly asked to derive them
[4] recall that e_x is proportional, with positive coefficient, to ξ and, consequently, is proportional, with negative
coefficient, to η
Now, from (6.17) it turns out that the progress in the potential is given by

α = φ(0) − min_{t∈T} φ(t) = max_{t∈T} [ Σ_{i=1}^n ln(1 − t g_i) − n ln(1 − t |g|_2^2) ],        (6.18)
demonstrate that

α ≥ (1 − ln 2) ( |g|_2 / |g|_∞ )^2 .        (6.19)
The conclusion of our analysis is as follows:
each step of the method of Karmarkar with linesearch applied to a semidefinite program can
be associated with an n-dimensional vector g (depending on the data and the iteration number)
in such a way that the progress in the Karmarkar potential at a step is at least the quantity
given by (6.19).
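The bound (6.19) can be probed numerically for a sample admissible g (a sketch; the particular g is ours). Scanning φ(t) on a grid gives a lower bound on α, which can then be compared against the right hand side of (6.19):

```python
import numpy as np

# Sample trace-zero g (eigenvalues of eta must sum to 0, Exercise 6.7.5.1)).
# Any grid maximum of -phi(t) is a valid lower bound on alpha of (6.18).
n = 4
g = np.array([0.3, -0.1, -0.1, -0.1])       # sum g_i = 0, |g|_inf = 0.3 >= 1/n
g2 = np.sum(g**2)                           # |g|_2^2 = 0.12

def neg_phi(t):                             # -phi(t), see (6.17)
    return np.sum(np.log(1.0 - t * g)) - n * np.log(1.0 - t * g2)

# admissible t: 1 - t g_i > 0 for all i and 1 - t |g|_2^2 > 0
t_max = min(1.0 / g.max(), 1.0 / g2)
ts = np.linspace(0.0, 0.999 * t_max, 10000)
alpha_lower = max(neg_phi(t) for t in ts)
bound = (1.0 - np.log(2.0)) * g2 / np.max(np.abs(g))**2   # (1 - ln 2)(|g|_2/|g|_inf)^2
assert alpha_lower >= bound
print(f"alpha >= {alpha_lower:.3f} >= (1 - ln 2)(|g|_2/|g|_inf)^2 = {bound:.3f}")
```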
Now, the worst case complexity bound for the method comes from the worst case value of
the right hand side in (6.19); this latter value (equal to 1 − ln 2) corresponds to the case when
|g|_2 |g|_∞^{−1} ≡ π(g) attains its minimum in g (which is equal to 1); note that π(g) is of order of 1
only if g is an ”orth-like” vector - its 2-norm comes from O(1) dominating coordinates. Note,
anyhow, that the ”typical” n-dimensional vector is far from being an ”orth-like” one, and the
”typical” value of π(g) is much larger than 1. Namely, if g is a random vector in R^n with the
direction uniformly distributed on the unit sphere, then the ”typical value” of π(g) is of order of
√(n / ln n) (the probability for π to be less than a certain absolute constant times this square root
tends to 0 as n → ∞; please prove this simple statement). If (if!) we could use this ”typical”
value of π(g) in our lower bound for the progress in the potential, we would come to the progress
per step equal to O(n/ ln n) rather than to the worst-case value O(1); as a result, the Newton
complexity of finding ε-solution would be proportional to ln n rather than to n, which would be
actually excellent! Needless to say, there is no way to prove something definite of this type, even
after we equip the family of problems in question with a probability distribution in order to treat
the vectors g arising at sequential steps as a random sequence. The difficulty is that the future
of the algorithm is strongly predetermined by its past, so that any initial symmetry seems to be
destroyed as the algorithm goes on.
Note, anyhow, that the impossibility of proving something does not necessarily imply the
impossibility of understanding it. The ”anticipated” complexity of the method (proportional to
ln n rather than to n) seems to be quite similar to its empirical complexity; given the results of
the above ”analysis”, one hardly could be too surprised by this phenomenon.
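The claim about the typical value of π(g) is easy to check by simulation (a numerical sketch):

```python
import numpy as np

# Monte Carlo check: for g uniformly distributed on the unit sphere in R^n,
# pi(g) = |g|_2 / |g|_inf concentrates around sqrt(n / (2 ln n)),
# i.e., is of order sqrt(n / ln n), far from its minimum value 1.
rng = np.random.default_rng(4)
n, trials = 1000, 200
z = rng.normal(size=(trials, n))
g = z / np.linalg.norm(z, axis=1, keepdims=True)     # uniform random directions
pi = 1.0 / np.max(np.abs(g), axis=1)                 # since |g|_2 = 1
med = np.median(pi)
print(f"median pi(g) = {med:.1f},  sqrt(n/ln n) = {np.sqrt(n / np.log(n)):.1f}")
assert med > 0.5 * np.sqrt(n / np.log(n))            # of order sqrt(n/ln n), not O(1)
```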
Chapter 7
The Primal-Dual Potential Reduction Method
We became acquainted with the very first of the potential reduction interior point methods
- the method of Karmarkar. Theoretically, a disadvantage of that method is its not so good
complexity bound - the bound is proportional to the parameter ϑ of the underlying barrier, not
to the square root of this parameter, as in the case of the path-following method. There are,
anyhow, potential reduction methods with the same theoretical O(√ϑ) complexity bound as in
the path-following scheme; these methods combine the best known theoretical complexity with
the practical advantages of the potential reduction algorithms. Today's lecture is devoted to
one of these methods, the so called Primal-Dual algorithm; the LP prototype of the construction
is due to Todd and Ye.
where
K ∗ = {s ∈ Rn | sT x ≥ 0 ∀x ∈ K}
109
110 CHAPTER 7. THE PRIMAL-DUAL POTENTIAL REDUCTION METHOD
This assumption, by virtue of the Conic duality theorem (Lecture 5), implies that both the
primal and the dual problem are solvable, and the sum of the optimal values in the problems is
equal to cT b:
P ∗ + D∗ = cT b. (7.1)
Besides this, we know from Lecture 5 that for any pair (x, s) of feasible solutions to the problems
one has
δ(x, s) ≡ cT x + bT s − cT b = sT x ≥ 0. (7.2)
Subtracting equality (7.1) from this identity, we come to the following conclusion:
(*): for any primal-dual feasible pair (x, s), the duality gap δ(x, s) is nothing but the sum of
inaccuracies, in terms of the corresponding objectives, of x regarded as an approximate solution
to the primal problem and s regarded as an approximate solution to the dual one.
In particular, all we need is to generate somehow a sequence of primal-dual feasible pairs
with the duality gap tending to zero.
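The identity underlying (*) is pure linear algebra: it uses only x ∈ b + L and s ∈ c + L⊥, not the cone constraints, and can be checked numerically (a sketch with random data):

```python
import numpy as np

# Check of the identity (7.2): for x in {b + L} and s in {c + L^perp},
# c^T x + b^T s - c^T b = s^T x, whatever the cone.
rng = np.random.default_rng(5)
n, dimL = 7, 3
B = rng.normal(size=(n, dimL))              # L = column span of B
b, c = rng.normal(size=n), rng.normal(size=n)
for _ in range(100):
    x = b + B @ rng.normal(size=dimL)       # x in b + L
    w = rng.normal(size=n)
    s = c + w - B @ np.linalg.lstsq(B, w, rcond=None)[0]   # project w onto L^perp
    assert abs(B.T @ s - B.T @ c).max() < 1e-8             # s - c is orthogonal to L
    assert np.isclose(c @ x + b @ s - c @ b, s @ x)
print("duality-gap identity delta(x, s) = s^T x verified")
```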
Now, how to enforce the duality gap to go to zero? To this end we shall use a certain potential;
our first goal is to construct this potential.
(”we know”, as usual, means that given x, we can check whether x ∈ Dom F and, if it is the
case, can compute F(x), F′(x), F″(x), and similarly for F ∗ ).
As we know from Lecture 5, F∗ is a ϑ-logarithmically homogeneous self-concordant barrier for
the cone −K∗ anti-dual to K, and, consequently, the function

F+(s) = F∗(−s)

is a ϑ-logarithmically homogeneous self-concordant barrier for the dual cone K∗ involved in
the dual problem. In what follows I refer to F as the primal barrier, and to F+ as the dual
barrier.
Now let us consider the following aggregate:

V0(x, s) = F(x) + F+(s) + ϑ ln(sT x).

This function is well-defined on the direct product of the interiors of the primal and the dual
cones, and, in particular, on the direct product

rint Kp × rint Kd

of the relative interiors of the primal and dual feasible sets

Kp = {b + L} ∩ K, Kd = {c + L⊥} ∩ K∗.
The function V0 resembles the Karmarkar potential; indeed, when s ∈ rint Kd is fixed, this
function, regarded as a function of the primal feasible x, is, up to an additive constant, the
Karmarkar potential of the primal problem, where one should replace the initial objective c by
the objective s.
Note that we know something about the aggregate V0: Proposition 5.3.3 tells us that

V0(x, s) ≥ ϑ ln ϑ − ϑ, (7.4)

the inequality being an equality if and only if ts + F′(x) = 0 for some positive t.
Now comes the crucial step. Let us choose a positive µ and pass from the aggregate V0 to the
potential

Vµ(x, s) = V0(x, s) + µ ln(sT x) = F(x) + F+(s) + (ϑ + µ) ln(sT x).
My claim is that this potential possesses the same fundamental property as the Karmarkar
potential: when it is small (i.e., negative with large absolute value) at a strictly feasible primal-
dual pair (x, s), then the pair is comprised of good primal and dual approximate solutions.
The reason for this claim is clear: before we added to the aggregate V0 the "penalty
term" µ ln(sT x), the aggregate was bounded below, as stated in (7.4); therefore the only way
for the potential to be small is to have a small (negative of large modulus) value of the penalty
term, which, in turn, may happen only when the duality gap (which at a primal-dual feasible
pair (x, s) is exactly sT x, see (7.2)) is close to zero.
The quantitative expression of this observation is as follows:
Proposition 7.2.1 For any strictly feasible primal-dual pair (x, s) one has

δ(x, s) ≤ Γ exp{Vµ(x, s)/µ}, Γ = exp{−µ^{-1}ϑ(ln ϑ − 1)}. (7.5)

Indeed,

ln δ(x, s) = ln(sT x) = [Vµ(x, s) − V0(x, s)]/µ ≤

[due to (7.4)]

≤ Vµ(x, s)/µ − µ^{-1}ϑ(ln ϑ − 1),

and (7.5) follows.
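As a quick numerical sanity check of (7.4) and (7.5), one can evaluate both sides on made-up data for the simplest cone, the nonnegative orthant, where F(x) = −Σ ln xi, ϑ = n, and F+(s) = F∗(−s) = −n − Σ ln si. The instance below (n = 3, the particular x and s) is our own toy choice, not part of the lecture:

```python
import numpy as np

# Toy check of (7.4) and (7.5) on K = K* = R^3_+ with F(x) = -sum ln x_i,
# theta = 3, F+(s) = -3 - sum ln s_i. The points x, s are made up.
theta, mu = 3.0, 1.0
x = np.array([1.0, 2.0, 0.5])   # treated as a strictly feasible primal point
s = np.array([1.0, 1.0, 1.0])   # treated as a strictly feasible dual point

delta = s @ x                                        # duality gap s^T x, see (7.2)
V0 = -np.sum(np.log(x)) - 3 - np.sum(np.log(s)) + theta * np.log(s @ x)
Vmu = V0 + mu * np.log(s @ x)                        # the potential V_mu
Gamma = np.exp(-theta * (np.log(theta) - 1) / mu)

assert V0 >= theta * np.log(theta) - theta - 1e-12   # (7.4)
assert delta <= Gamma * np.exp(Vmu / mu) + 1e-9      # (7.5)
```

Making Vµ more negative therefore forces the computable upper bound in (7.5), and hence the gap itself, towards zero.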
Let us fix the dual component s and regard the potential Vµ as a function of the primal
component y:

v(y) ≡ Vµ(y, s) = F(y) + ζ ln(sT y) + const(s),

where

ζ = ϑ + µ, const(s) = F+(s).
Now, same as in the method of Karmarkar, let us linearize the logarithmic term in v(·), i.e.,
form the function

vx(y) = F(y) + ζ (sT y)/(sT x) + const(x, s) : rint Kp → R, (7.6)

where, as is immediately seen,

const(x, s) = const(s) + ζ ln(sT x) − ζ.
Since the logarithm is concave, the linearization overestimates it, whence

v(y) ≤ vx(y), y ∈ rint Kp; v(x) = vx(x), (7.7)

so that in order to update x into a new strictly feasible primal solution x′ with an improved value
of the potential v(·), it suffices to improve the value of the upper bound vx(·) of the potential.
Now, vx is the sum of a self-concordant barrier for the primal feasible set (namely, the restriction
of F onto this set) and a linear form, and therefore it is self-concordant on the relative interior
rint Kp of the primal feasible set; consequently, to decrease the function, we may use the damped
Newton method. Thus, we come to the following
Rule 1. In order to update a given strictly feasible pair (x, s) into a new strictly feasible pair
(x′, s) with the same dual component and with a better value of the potential Vµ, act as follows:
1) Form the "partially linearized" reduced potential vx(y) according to (7.6);
2) Update x into x′ by a damped Newton iteration applied to vx(·), i.e.,
- compute the (reduced) Newton direction

ex = argmin{hT ∇y vx(x) + (1/2) hT ∇²y vx(x)h | h ∈ L} (7.8)

and the (reduced) Newton decrement

ω = [−eTx ∇y vx(x)]^{1/2}; (7.9)

- set

x′ = x + ex/(1 + ω).
As we know from Lecture 2, the damped Newton step keeps the iterate within the domain of
the function, so that x′ ∈ rint Kp, and decreases the function at least by ρ(−ω) ≡ ω − ln(1 + ω).
This is the progress in vx; from (7.7) it follows that the progress in the potential v(·), and,
consequently, in Vµ, is at least the progress in vx. Thus, we come to the following conclusion:
I. Rule 1 transforms the initial strictly feasible primal-dual pair (x, s) into a new strictly
feasible primal-dual pair (x′, s), and the potential Vµ at the updated pair satisfies

Vµ(x, s) − Vµ(x′, s) ≥ ω − ln(1 + ω),

ω being the reduced Newton decrement (7.9).
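Rule 1 is easy to sketch in code. The block below is a minimal illustration with made-up instance data (K = R³₊ with F(x) = −Σ ln xi, so ϑ = 3, and L realized as the kernel of the row (1,1,1)); it performs one damped Newton step on vx and checks the guaranteed progress ω − ln(1 + ω):

```python
import numpy as np

# Sketch of Rule 1 for K = R^3_+, F(x) = -sum ln x_i (theta = 3).
# The subspace L and the strictly feasible points x, s are toy data.
theta, mu = 3.0, 1.0
zeta = theta + mu
Z = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])  # columns span L = ker(1,1,1)
x = np.array([1.0, 2.0, 0.5])
s = np.array([1.0, 0.5, 2.0])

g = -1.0 / x + zeta * s / (s @ x)        # gradient of v_x at x, see (7.6)
H = np.diag(1.0 / x**2)                  # F''(x)
eta = np.linalg.solve(Z.T @ H @ Z, -(Z.T @ g))
e_x = Z @ eta                            # reduced Newton direction (7.8)
omega = np.sqrt(-e_x @ g)                # reduced Newton decrement (7.9)
x_new = x + e_x / (1.0 + omega)          # the damped Newton step

def V(x, s):                             # the potential V_mu for this toy cone
    return (-np.sum(np.log(x)) - 3 - np.sum(np.log(s))
            + (theta + mu) * np.log(s @ x))

assert np.all(x_new > 0)                 # the step keeps strict feasibility
assert V(x_new, s) <= V(x, s) - (omega - np.log(1.0 + omega)) + 1e-9
```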
Note that for primal-dual feasible (y, s) the product sT y is nothing but the duality gap cT y + bT s − cT b
(Lecture 5). The duality gap is always nonnegative, so that the quantity
cT b − bT s
associated with a dual feasible s is a lower bound for the primal optimal value. Thus, the
potential Vµ , regarded as a function of y, resembles the ”local” potential used in the sliding
objective version of the method of Karmarkar - the Karmarkar potential where the primal
optimal value is replaced by its lower bound. Now, in the sliding objective version of the
method of Karmarkar we also met with the situation when the reduced Newton decrement was
small, and, as we remember, in this situation we were able to update the lower bound for the
primal optimal value and thus got the possibility to go ahead. This is more or less what we are
going to do now: we shall see in a while that if ω turns out to be small, then there is a possibility
to update the current dual strictly feasible solution s into a new solution s0 of this type and to
improve by this ”significantly” the potential.
To get the idea of how to update the dual solution, consider the "worst" case for Rule 1 - the
reduced Newton decrement ω is zero. What happens in this situation? The reduced Newton
decrement is zero if and only if the gradient of vx, taken at x along the primal feasible plane, is 0,
or, which is the same, if the gradient taken with respect to the whole primal space is orthogonal
to L, i.e., if and only if

F′(x) + ζ s/(sT x) ∈ L⊥. (7.11)

This is a very interesting relation. Indeed, let

s∗ ≡ −(sT x/ζ) F′(x). (7.12)
The above inclusion says that s∗ − s ∈ L⊥, i.e., that s∗ ∈ s + L⊥; since s ∈ c + L⊥, we come
to the relation

s∗ ≡ −(sT x/ζ) F′(x) ∈ c + L⊥. (7.13)

The latter relation says that the vector −F′(x) can be normalized, by multiplication by a positive
constant, to yield a vector s∗ from the dual feasible plane. On the other hand, s∗ belongs
to the interior of the dual cone K∗, since −F′(x) does (Proposition 5.3.3). Thus, in the case
in question (when ω = 0), a proper normalization of the vector −F′(x) gives us a new strictly
feasible dual solution s′ ≡ s∗. Now, what happens with the potential when we pass from s to s∗
(and do not vary the primal solution x)? The answer is immediate:
Vµ(x, s) = V0(x, s) + µ ln(sT x) ≥ ϑ ln ϑ − ϑ + µ ln(sT x);

on the other hand, s∗ is proportional, with positive coefficient, to −F′(x), so that
V0(x, s∗) = ϑ ln ϑ − ϑ (this is the equality case in (7.4)), and therefore

Vµ(x, s) − Vµ(x, s∗) ≥ µ ln[sT x/((s∗)T x)] = µ ln[ζ/((−F′(x))T x)] = µ ln(ζ/ϑ) = µ ln(1 + µ/ϑ) (7.14)

(the second equality in the chain is given by (7.12), the third by the identity (5.5), see Lecture
5). Thus, we see that in the particular case ω = 0 the updating

(x, s) ↦ (x, s∗ = −(sT x/ζ) F′(x))

results in a strictly feasible primal-dual pair and decreases the potential at least by the quantity
µ ln(1 + µ/ϑ).
We have seen what to do in the case of ω = 0, when Rule 1 does not work at all. This is
insufficient: we should also understand what to do when Rule 1 works, but works badly, i.e., when
ω is small, although nonzero. But this is more or less clear: what is good for the limiting case
ω = 0 should work also when ω is small. Thus, we get the idea to use, in the case of small ω, the
updating of the dual solution given by (7.12). This updating, anyhow, cannot be used directly,
since in the case of positive ω it results in an s∗ which is infeasible for the dual problem. Indeed,
dual feasibility of s∗ in the case of ω = 0 was a consequence of two facts:
dual feasibility of s∗ in the case of ω = 0 was a consequence of two facts:
1. The inclusion s∗ ∈ int K∗ - since s∗ is proportional, with negative coefficient, to F′(x), and
all vectors of this type do belong to int K∗ (Proposition 5.3.3); this inclusion is
therefore completely independent of whether ω is large or small;
2. The inclusion s∗ ∈ c + L⊥. This inclusion came from (7.11), and it does use the hypothesis
that ω = 0 (in fact it is equivalent to this hypothesis).
Thus, we meet with the difficulty that 2. does not remain valid when ω is positive, although
small. OK, if the only difficulty is that s∗ given by (7.12) does not belong to the dual feasible
plane, we can correct s∗ - replace it by a properly chosen projection s′ of s∗ onto the dual
feasible plane. When ω = 0, s∗ is in the dual feasible plane and in the interior of the cone K∗;
by continuity, for small ω, s∗ is close to the dual feasible plane, so the projection will
be close to s∗ and therefore, hopefully, will still be in the interior of the dual cone (so that s′,
which by construction is in the dual feasible plane, will be strictly dual feasible); besides
this, the updating (x, s) ↦ (x, s′) would result in "almost" the same progress in the potential
as in the above case ω = 0.
The outlined idea is exactly what we are going to use. The implementation of it is as follows.
Rule 2. In order to update a strictly feasible primal-dual pair (x, s) into a new strictly
feasible primal-dual pair (x, s′), act as follows. Same as in Rule 1, compute the reduced Newton
direction ex and the reduced Newton decrement ω, and set

s′ = −(sT x/ζ)[F′(x) + F″(x)ex]. (7.15)

Note that in the case of ω = 0 (which is equivalent to ex = 0), updating (7.15) becomes
exactly the updating (7.12). As can easily be seen, s′ is the projection of s∗ onto the dual
feasible plane in the metric given by the Hessian (F+)″(s∗) of the dual barrier at the point s∗;
in particular, s′ always belongs to the dual feasible plane, although not necessarily to the interior
of the dual cone K∗; this latter inclusion, anyhow, for sure takes place if ω < 1, so that in this
latter case s′ is strictly dual feasible. Moreover, in the case of small ω the updating given by
Rule 2 decreases the potential "significantly", so that Rule 2 for sure works well when Rule 1
does not, and choosing the best of these two rules, we come to an updating which always works
well.
The exact formulation of the above claim is as follows:
II. (i) The point s′ given by (7.15) always belongs to the dual feasible plane.
(ii) The point s′ is in the interior of the dual cone K∗ (and, consequently, is dual strictly
feasible) whenever ω < 1, and in this case one has

Vµ(x, s) − Vµ(x, s′) ≥ µ ln[(ϑ + µ)/(ϑ + ω√ϑ)] − ρ(ω), ρ(r) = −ln(1 − r) − r, (7.16)

and the progress in the potential is therefore positive for all small enough positive ω.
Proof.
1°. By definition, ex is the minimizer of the quadratic form

Q(h) = hT[F′(x) + γs] + (1/2) hT F″(x)h,

γ = ζ/(sT x) ≡ (ϑ + µ)/(sT x), (7.17)

over h ∈ L; note that

hT[F′(x) + γs] = hT ∇y vx(x), h ∈ L.

Writing down the optimality condition, we come to

F′(x) + γs + F″(x)ex ∈ L⊥; (7.18)

multiplying this inclusion by ex ∈ L, we get

eTx F″(x)ex = −eTx [F′(x) + γs] = ω². (7.19)

2°. From the definition (7.15) of s′ and the definition (7.12) of s∗ we see that

s′ = s∗ − γ^{-1} F″(x)ex; (7.22)

in particular, by (7.18), s′ − s = −γ^{-1}[F′(x) + γs + F″(x)ex] ∈ L⊥, so that s′, same as s,
belongs to the dual feasible plane c + L⊥, as claimed in (i).
3°. Substituting u = −F′(x) and t = 1/γ and taking into account the relation between F+ and the
Legendre transformation F∗ of the barrier F (so that (F+)″(tu) = t^{-2}(F+)″(u) and
(F+)″(−F′(x)) = (F∗)″(F′(x)) = [F″(x)]^{-1}), we come to

(F+)″(s∗) = γ²[F″(x)]^{-1}. (7.23)

Combining this observation with relation (7.22), we come to

(s′ − s∗)T (F+)″(s∗)(s′ − s∗) = γ^{-2}[F″(x)ex]T · γ²[F″(x)]^{-1} · [F″(x)ex] = eTx F″(x)ex = ω²

(the concluding equality is given by (7.19)). Thus, we come to the following conclusion:
IIa. The distance |s′ − s∗| between s∗ and s′ in the Euclidean metric given by the Hessian
(F+)″(s∗) of the dual barrier F+ at the point s∗ is equal to the reduced Newton decrement ω.
In particular, if this decrement is < 1, s′ belongs to the open unit Dikin ellipsoid of the
self-concordant barrier F+ centered at s∗ and, consequently, s′ belongs to the domain of the barrier (I.,
Lecture 2), i.e., to int K∗. Since we already know that s′ always belongs to the dual feasible
plane (see 2°), s′ is strictly dual feasible whenever ω < 1.
We have proved all that is required in (i)-(ii), except inequality (7.16) relating to the progress in the
potential. This is the issue we turn to now, and from now on we assume that ω < 1, as stated
in (7.16).
4°. Thus, let us look at the progress in the potential

α = Vµ(x, s) − Vµ(x, s′) = V0(x, s) − V0(x, s′) − µ ln(xT s′/(xT s)). (7.24)
We have

V0(x, s′) = F(x) + F+(s′) + ϑ ln(xT s′) =

= [F(x) + F+(s∗) + ϑ ln(xT s∗)]_1 + [F+(s′) − F+(s∗) + ϑ ln(xT s′/(xT s∗))]_2; (7.25)

since s∗ = −tF′(x) with some positive t, (**) says that

[·]_1 = ϑ ln ϑ − ϑ. (7.26)
Now, s′, as we know from IIa., is in the open unit Dikin ellipsoid of F+ centered at s∗, and
the corresponding local distance is equal to ω; therefore, applying the upper bound (2.4) from
Lecture 2 (recall that F+ is self-concordant), we come to

F+(s′) − F+(s∗) ≤ (s′ − s∗)T (F+)′(s∗) + ρ(ω).

Since F+(u) = F∗(−u), F∗ being the Legendre transformation of F, we have
(F+)′(−F′(x)) = −x, whence, by the (−1)-homogeneity of (F+)′ ((5.3), Lecture 5),

(F+)′(s∗) = −γx;

combining this relation with (7.22), we get

[·]_2 ≤ xT F″(x)ex + ρ(ω) + ϑ ln(xT s′/(xT s∗)),

which combined with (7.25) and (7.26) results in

V0(x, s′) ≤ ϑ ln ϑ − ϑ + xT F″(x)ex + ρ(ω) + ϑ ln(xT s′/(xT s∗)). (7.28)
On the other hand, we know from (**) that V0(x, s) ≥ ϑ ln ϑ − ϑ; combining this inequality,
(7.24) and (7.28), we come to

α ≥ −xT F″(x)ex − ρ(ω) − ϑ ln(xT s′/(xT s∗)) − µ ln(xT s′/(xT s)). (7.29)
5°. Now let us find appropriate representations for the inner products involved in (7.29).
To this end let us set

π = −xT F″(x)ex. (7.30)

In view of (7.22) we have

xT s′ = xT s∗ − γ^{-1} xT F″(x)ex = xT s∗ + π/γ;

besides this, by (7.12) and the identity (5.5) (Lecture 5), xT s∗ = (sT x/ζ)(−F′(x))T x = ϑ/γ,
so that

xT s′ = (ϑ + π)/γ, xT s∗ = ϑ/γ,

whence

xT s′/(xT s∗) = 1 + π/ϑ, (7.31)

and

xT s′/(xT s) = (ϑ + π)/(γ xT s) = (ϑ + π)/(ϑ + µ) (7.32)

(the concluding equality follows from the definition of γ, see (7.17)).
(the concluding equality follows from the definition of γ, see (7.17)).
Substituting (7.31) and (7.32) into (7.29), we come to the following expression for the progress
in potential:

α ≥ π − ρ(ω) − ϑ ln(1 + π/ϑ) − µ ln[(ϑ + π)/(ϑ + µ)]. (7.33)

Taking into account that ln(1 + z) ≤ z, we derive from this inequality that

α ≥ µ ln[(ϑ + µ)/(ϑ + π)] − ρ(ω). (7.34)
Our last task is to evaluate π, which is immediate:

|π| = |xT F″(x)ex| ≤ [xT F″(x)x]^{1/2} [eTx F″(x)ex]^{1/2} ≤ ω√ϑ

(we have used (7.19) and the identity (5.5), Lecture 5). With this estimate we derive from (7.34)
that

α ≥ µ ln[(ϑ + µ)/(ϑ + ω√ϑ)] − ρ(ω), (7.35)

as claimed in II.
The updatings given by the two rules are combined as follows: at step i,
1) given (x_{i-1}, s_{i-1}), apply to the pair Rules 1 and 2 to get the updated pairs (x′_{i-1}, s_{i-1})
and (x_{i-1}, s′_{i-1}), respectively;
2) check whether s′_{i-1} is strictly dual feasible. If it is not the case, forget about the pair
(x_{i-1}, s′_{i-1}) and set (x_i^+, s_i^+) = (x′_{i-1}, s_{i-1}); otherwise choose as (x_i^+, s_i^+) the best (with the
smallest value of the potential Vµ) of the two pairs given by 1);
3) the pair (x_i^+, s_i^+) for sure is a strictly feasible primal-dual pair, and the value of the
potential Vµ at this pair is less than at the pair (x_{i-1}, s_{i-1}). Choose as (x_i, s_i) an arbitrary
strictly feasible primal-dual pair such that the potential Vµ at this pair is not greater than at
(x_i^+, s_i^+) (e.g., set x_i = x_i^+, s_i = s_i^+) and loop.
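The loop 1)-3) can be sketched as follows. The instance is the same made-up toy setup on the nonnegative orthant as before (K = R³₊, F(x) = −Σ ln xi, ϑ = 3, L = ker(1,1,1)), and we take µ = √ϑ, i.e., κ = 1; all of this data is our own illustrative assumption:

```python
import numpy as np

# Toy run of the primal-dual potential reduction method on K = R^3_+.
theta = 3.0
mu = np.sqrt(theta)                      # mu = kappa * sqrt(theta) with kappa = 1
zeta = theta + mu
Z = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])  # columns span L

def V(x, s):                             # the potential V_mu
    return (-np.sum(np.log(x)) - 3 - np.sum(np.log(s))
            + (theta + mu) * np.log(s @ x))

x = np.array([1.0, 2.0, 0.5])
s = np.array([1.0, 0.5, 2.0])
gap0 = s @ x
for _ in range(100):
    g = -1.0 / x + zeta * s / (s @ x)    # gradient of v_x at x
    H = np.diag(1.0 / x**2)              # F''(x)
    e_x = Z @ np.linalg.solve(Z.T @ H @ Z, -(Z.T @ g))
    omega = np.sqrt(max(-e_x @ g, 0.0))
    # Rule 1 candidate
    cand = [(x + e_x / (1.0 + omega), s)]
    # Rule 2 candidate (7.15), kept only if strictly dual feasible
    s_new = -(s @ x) / zeta * (-1.0 / x + H @ e_x)
    if np.all(s_new > 0):
        cand.append((x, s_new))
    x, s = min(cand, key=lambda p: V(*p))  # best of the two pairs

assert np.all(x > 0) and np.all(s > 0)
assert s @ x < 0.1 * gap0                # the duality gap went down substantially
```

The final assertion is safe even under the worst-case theory: each step decreases Vµ by at least Ω(1) ≈ 0.08, and the gap is bounded through (7.5).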
The method, as it is stated now, involves the parameter µ, which in principle can be chosen
as an arbitrary positive real. Let us find out what is a reasonable choice of the parameter.
To this end let us note that what we are interested in is not the progress p in the potential Vµ
per step, but the quantity β = p/µ, since it is this ratio which governs the exponent in the
accuracy estimate (7.5). Now, at a step it may happen that we are in the situation ω = O(1),
say, ω = 1, so that the only productive rule is Rule 1, and the progress in the potential, according
to I., is of order 1, which results in β = O(1/µ). On the other hand, we may come to the
situation ω = 0, when the only productive rule is Rule 2, and the progress in the potential is
p = µ ln(1 + µ/ϑ), see (7.16), i.e., β = ln(1 + µ/ϑ). A reasonable choice of µ should balance the
values of β for these two cases, which leads to

µ = κ√ϑ,

κ being of order 1. The complexity of the primal-dual method for this - "optimal" - choice of
µ is given by the following
Theorem 7.4.1 Assume that the primal-dual pair of conic problems (P), (D) (which satisfies
assumption A) is solved by the primal-dual potential reduction method associated with ϑ-
logarithmically homogeneous self-concordant primal and dual barriers F and F+, and that the parameter µ of
the method is chosen according to

µ = κ√ϑ,

with certain κ > 0. Then the method generates a sequence of strictly feasible primal-dual pairs
(x_i, s_i), and the duality gap δ(x_i, s_i) (equal to the sum of residuals, in terms of the corresponding
objectives, of the components of the pair) admits the following upper bound:

δ(x_i, s_i) ≤ V exp{−[Vµ(x̂, ŝ) − Vµ(x_i, s_i)]/(κ√ϑ)} ≤ V exp{−iΩ(κ)/(κ√ϑ)}, (7.36)

where

Ω(κ) = min{ 1 − ln 2; inf_{0≤ω<1} max[ω − ln(1 + ω); κ ln(1 + κ) − (κ − 1)ω + ln(1 − ω)] }, (7.37)

(x̂, ŝ) is the starting strictly feasible primal-dual pair and

V = Γ exp{Vµ(x̂, ŝ)/µ}. (7.38)
Proof. By (7.5),

δ(x_i, s_i) ≤ Γ exp{Vµ(x_i, s_i)/µ} = [Γ exp{Vµ(x̂, ŝ)/µ}] exp{−[Vµ(x̂, ŝ) − Vµ(x_i, s_i)]/µ},

which, after substituting the value of Γ from (7.5), results in the first inequality in (7.36), with
V given by (7.38).
To prove the second inequality in (7.36), it suffices to demonstrate that the progress in the
potential Vµ at a step of the method is at least the quantity Ω(κ) given by (7.37). To this
end let us note that, by construction, this progress is at least the progress given by each of the
rules 1 and 2 (when Rule 2 does not result in a strictly feasible dual solution, the corresponding
progress is −∞). Let ω be the reduced Newton decrement at the step in question. If ω ≥ 1, then
the progress related to Rule 1 is at least 1 − ln 2, see I., which clearly is ≥ Ω(κ). Now consider
the case when ω < 1. Here both of the rules 1 and 2 are productive, and the corresponding
reductions in the potential are, respectively,
p1 = ω − ln(1 + ω)
[since ln(1 + z) ≤ z] √ √
≥ κ ϑ ln(1 + κ/ ϑ) − κω + ln(1 − ω) + ω ≥
[since, as it is immediately seen, z ln(1 + a/z) ≥ ln(1 + a) whenever z ≥ 1 and a > 0]
≥ κ ln(1 + κ) − κω + ln(1 − ω) + ω,
so that the progress in the potential in the case of ω < 1 is at least the quantity given by (7.37).
The claim that the right hand side of (7.37) is a positive continuous function of κ > 0 is
evidently true. The complexity bound (7.39) is an immediate consequence of (7.36).
Let us look at the directions generated at a step. Applying Rule 1, we get a
primal-dual feasible direction (i.e., a direction from L × L⊥) d1 = (x′ − x, 0); shifting the current
strictly feasible pair (x, s) in this direction, we for sure get a strictly feasible pair with a better
(or, in the case of ω = 0, the same) value of the potential. Similarly, applying Rule 2, we get
another primal-dual feasible direction d2 = (0, s′ − s); shifting the current pair in this direction,
we always get a pair from the primal-dual feasible plane L = {b + L} × {c + L⊥}, although not
necessarily belonging to the interior of the primal-dual cone K = K × K∗. What we always get
is certain 2-dimensional plane D (passing through (x, s) parallel to the directions d1 , d2 ) which
is contained in the primal-dual feasible plane L, and one (or two, depending on whether Rule
2 was or was not productive) strictly feasible primal-dual pairs - candidates to the role of the
next iterate; what we know from our theoretical analysis, is that the value of the potential at
one of the candidate pairs is "significantly" - at least by the quantity Ω(κ) - less than the value
of the potential at the previous iterate (x, s). Given this situation, a reasonable policy to get
additional progress in the potential at the step is 2-dimensional minimization of the potential
over the intersection of the plane D with the interior of the cone K × K ∗ . The potential is not
convex, and it would be difficult to ensure a prescribed quality of its minimization even over the
2-dimensional plane D, but this is not the point where we must get a good minimizer; for our
purposes it suffices to perform a once for ever fixed (and small) number of steps of any relaxation
method for smooth minimization (the potential is smooth), running the method from the best
of our candidate pairs. In the case of LP, same as in some other interesting cases, there are
possibilities to implement this 2-dimensional search in a way which almost does not increase the
total computational effort per step³, and at the same time accelerates the method dramatically.

³ This total effort normally is dominated by the cost of computing the reduced Newton direction ex.
In applications, the primal feasible set is usually given as the image of an affine set in the space
of design variables: the design vector ξ runs through the set

{ξ ∈ Rl | P(ξ − p) = 0},

and the corresponding primal variable is

x = A(ξ) ≡ Aξ + b,
A being n × l and P being k × l matrices; usually one can assume that A is of full column rank,
i.e., that its columns are linearly independent, and that P is of full row rank, i.e., the rows of
P are linearly independent; from now on we make this regularity assumption. As far as the
objective is concerned, it is a linear form χT ξ of the design vector.
Thus, the typical for applications form of the primal problem is

(P): minimize χT ξ s.t. P(ξ − p) = 0, A(ξ) ∈ K,

K being a pointed closed and convex cone with a nonempty interior in Rn. This is exactly the
setting presented in Section 5.4.4.
setting presented in Section 5.4.4.
As we know from Exercise 5.4.11, the problem dual to (P) is
where the control vector is comprised of s ∈ Rn and r ∈ Rk , K ∗ is the cone dual to K, and
β = A(p).
In what follows F denotes the primal barrier - ϑ-logarithmically homogeneous self-concordant
barrier for K, and F + denotes the dual barrier (see Lecture 7).
Let us look at how the primal-dual method could be implemented in the case when the primal-
dual pair of problems is in the form (P) - (D). We should answer the following basic questions:
- how to represent the primal and the dual solutions;
- how to update strictly feasible primal-dual pairs.
As far as the first of these issues is concerned, the most natural decision is
- to represent x's of the form A(ξ) (note that all our primal feasible x's are of this type) by
storing both x (as an n-dimensional vector) and ξ (as an l-dimensional one);
- to represent s's and r's "as they are" - as n- and k-dimensional vectors, respectively.
Now, what can be said about the main issue - how to implement the updating of strictly
feasible primal-dual pairs? In what follows we speak about the basic version of the method
only, not discussing the large step strategy from Section 7.5, since implementation of the latter
strategy (and even the possibility to implement it) heavily depends on the specific analytic
structure of the problem.
Looking at the description of the primal-dual method, we see that the only nontrivial issue
is how to compute the Newton direction

ex = argmin{hT g + (1/2) hT F″(x)h | h ∈ L},

where (x, s) is the current iterate to be updated and

g = F′(x) + [(ϑ + µ)/(sT x)] s.

Since L is the image of the linear space

L0 = {ζ ∈ Rl | Pζ = 0}

under the mapping ζ ↦ Aζ, we have

ex = Aηx

for certain ηx ∈ L0, and the problem is how to compute ηx.
Exercise 7.6.1 # Prove that ηx is uniquely defined by the linear system of equations

Qη + PT u = −q, Pη = 0 (7.40)

in the unknowns η ∈ Rl, u ∈ Rk, where

Q = AT F″(x)A, q = AT g, (7.41)

so that ηx is given by the relation

ηx = −Q^{-1}[AT g − PT [PQ^{-1}PT]^{-1}PQ^{-1}AT g]; (7.42)

in particular, when the matrix P is absent, simply

ηx = −Q^{-1}AT g. (7.43)
Note that normally k is a small integer, so that the main effort in computing ηx is to assemble
and to invert the matrix Q. Usually this is the main part of the overall effort per iteration, since
other actions, like computing F (x), F 0 (x), F 00 (x), are relatively cheap.
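A quick way to convince oneself that the explicit formula (7.42) agrees with the system (7.40) is to solve both on toy data. The sizes and the barrier below (F(x) = −Σ ln xi on R⁵₊, random A, a single equality row P) are our own illustrative choices:

```python
import numpy as np

# Check of (7.40)-(7.42) on made-up data.
rng = np.random.default_rng(0)
n, l, k = 5, 4, 1
A = rng.standard_normal((n, l))          # n x l, full column rank (a.s.)
P = np.ones((k, l))                      # k x l, full row rank
x = rng.uniform(0.5, 2.0, size=n)        # a point with x > 0
g = rng.standard_normal(n)               # stands for F'(x) + (theta+mu)/(s^T x) * s

H = np.diag(1.0 / x**2)                  # F''(x) for F = -sum ln
Q = A.T @ H @ A                          # (7.41)
q = A.T @ g

# solve the system (7.40) directly ...
KKT = np.block([[Q, P.T], [P, np.zeros((k, k))]])
eta_u = np.linalg.solve(KKT, np.concatenate([-q, np.zeros(k)]))
eta_direct = eta_u[:l]

# ... and via the explicit formula (7.42)
Qinv = np.linalg.inv(Q)
S = P @ Qinv @ P.T                       # k x k matrix; cheap, since k is small
eta_formula = -Qinv @ (q - P.T @ np.linalg.solve(S, P @ Qinv @ q))

assert np.allclose(eta_direct, eta_formula)
assert np.allclose(P @ eta_direct, 0)    # eta_x indeed lies in L0 = ker P
```

Note how the computation is organized around assembling and inverting the l × l matrix Q, exactly as the text indicates.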
derivative in t of the quantity vT(t)Lv(t) is strictly negative for every t and every trajectory v(t)
with nonzero v(t). This latter requirement, in view of v′(t) = V(t)v(t), is equivalent to

vT(t)[VT(t)L + LV(t)]v(t) < 0 whenever v(t) ≠ 0,

or, which is the same (since for a given t, v(t) can be an arbitrary vector and V(t) can be an
arbitrary matrix from Conv{V1, ..., Vm}), is equivalent to the requirement

vT VT Lv = (1/2) vT [VT L + LV]v < 0, v ≠ 0, V ∈ Conv{V1, ..., Vm}.
In other words, L should be a positive definite symmetric matrix such that all the matrices of
the form V T L + LV associated with V ∈ Conv{V1 , ..., Vm } are negative definite; matrix L with
these properties will be called appropriate.
Our first (and extremely simple) task is to characterize the appropriate matrices.
We see that to find an appropriate matrix (and to demonstrate by this stability of (C) via a
quadratic Lyapunov function) is the same as to find a solution to the following system of strict
matrix inequalities
L > 0; ViT L + LVi < 0, i = 1, ..., m, (7.44)
where inequalities with symmetric matrices are understood as positive definiteness (for strict
inequalities) or semidefiniteness (for non-strict ones) of the corresponding differences.
We can immediately pose our problem as a conic problem with trivial objective; to this end
it suffices to treat L as the design variable (which varies over the space Sν of symmetric ν × ν
matrices) and introduce the linear mapping

B(L) = Diag{L; −V1T L − LV1; ...; −VmT L − LVm}

from this space into the space (Sν)m+1 - the direct product of m + 1 copies of the space Sν, so
that (Sν )m+1 is the space of symmetric block-diagonal [(m+1)ν]×[(m+1)ν] matrices with m+1
diagonal blocks of the size ν × ν each. Now, (Sν )m+1 contains the cone K of positive semidefinite
matrices of the required block-diagonal structure; it is clearly seen that L is appropriate if and
only if B(L) ∈ int K, so that the set of appropriate matrices is the same as the set of strictly
feasible solutions to the conic problem

minimize 0 s.t. B(L) ∈ K.

To solve this problem by our interior point machinery, we need an initial strictly feasible solution
to the problem, and in our particular case such a solution is exactly what should be finally found.
There is, anyhow, a straightforward way to avoid the difficulty. First of all, our system (7.44) is
homogeneous in L; therefore we can normalize L to be ≤ I (I stands for the unit matrix of the
context-determined size) and pass from the initial system to the new one

L > 0; L ≤ I; ViT L + LVi < 0, i = 1, ..., m. (7.45)

Now let us extend our design vector L by one variable t, so that the new design vector becomes

ξ = (t, L) ∈ R × Sν,

and consider the optimization problem

minimize t s.t. tI + L ≥ 0; I − L ≥ 0; tI − [ViT L + LVi] ≥ 0, i = 1, ..., m. (7.46)

Clearly, to solve system (7.45) is the same as to find a feasible solution to optimization problem
(7.46) with negative value of the objective; on the other hand, in (7.46) we have no difficulties
with an initial strictly feasible solution: we may set L = (1/2)I and then choose t large enough to
make all remaining inequalities strict.
It is clear that (7.46) is of the form (P) with the data given by the affine mapping

A(t, L) = Diag{tI + L; I − L; tI − V1T L − LV1; ...; tI − VmT L − LVm} : R × Sν → E,

E being the space (Sν)m+2 of block-diagonal symmetric matrices with m + 2 diagonal blocks of
the size ν × ν each; the cone K in our case is the cone of positive semidefinite matrices from E,
and the matrix P is absent, so that our problem is

(Pr): minimize t s.t. A(t, L) ∈ K.

The problem dual to (Pr) is
(Dl): minimize Tr{s0} under choice of m + 2 symmetric ν × ν matrices s−1, s0, ..., sm such that

s−1 − s0 − Σ_{i=1}^{m} [Vi si + si ViT] = 0,

Tr{s−1} + Σ_{i=1}^{m} Tr{si} = 1,

the matrices s−1, s0, ..., sm being positive semidefinite.
It is time now to think of the initialization. Could we in fact point out strictly feasible
solutions x̂ and ŝ to the primal and to the dual problems? As we have just mentioned, as far as
the primal problem (Pr) is concerned, there is nothing to do: we can set

x̂ = A(t̂, L̂)

for the strictly feasible solution (t̂, L̂) of (7.46) indicated above (L̂ = (1/2)I, t̂ large enough).
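The recipe "set L = ½I and choose t large enough to make all remaining inequalities strict" is easy to carry out numerically. Below is a sketch on a made-up pair of stable matrices V1, V2 (our own example), checking by eigenvalues that L > 0, I − L > 0 and tI − [ViT L + LVi] > 0 all hold strictly:

```python
import numpy as np

# Building a strictly feasible starting point: L = I/2 and t large enough.
# The "interval" matrices V1, V2 below are made up for illustration.
V = [np.array([[-1.0, 2.0], [0.0, -3.0]]),
     np.array([[-2.0, 0.0], [1.0, -1.0]])]
nu = 2
L = 0.5 * np.eye(nu)
# choose t strictly above the largest eigenvalue of every Vi^T L + L Vi
t = 1.0 + max(np.linalg.eigvalsh(Vi.T @ L + L @ Vi).max() for Vi in V)

blocks = [L, np.eye(nu) - L]
blocks += [t * np.eye(nu) - (Vi.T @ L + L @ Vi) for Vi in V]
assert all(np.linalg.eigvalsh(B).min() > 0 for B in blocks)  # all strictly PD
```

Once such a (t, L) is at hand, the interior point method can be launched from it and driven towards a negative objective value, which yields an appropriate L.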
Chapter 8

Long-Step Path-Following Methods

Up to now we have become acquainted with three particular interior point algorithms, namely, with
the short-step path-following method and with two potential reduction algorithms. As we know,
the main advantage of the potential reduction scheme is not of theoretical origin (in fact one
of the potential reduction routines, the method of Karmarkar, is even worse theoretically than
the path-following algorithm), but in the possibility to implement "long step" tactics. Recently it
became clear that such a possibility also exists within the path-following scheme; the goal
of this lecture is to present the "long step" version of the path-following method.
The problem in question is

minimize cT x s.t. x ∈ G, (8.1)

G being a closed and bounded convex domain in Rn. In order to solve the problem, we take a
ϑ-self-concordant barrier F for the feasible domain G and trace the path

x∗(t) = argmin_{x∈int G} Ft(x), Ft(x) = tcT x + F(x), (8.2)

as the penalty parameter t tends to infinity. More specifically, we generate a sequence of pairs
(ti, xi) κ-close to the path, i.e., satisfying the predicate

{t > 0} & {x ∈ int G} & {λ(Ft, x) ≡ [(∇x Ft(x))T [∇²x Ft(x)]^{-1} ∇x Ft(x)]^{1/2} ≤ κ}, (8.3)
the path tolerance κ < 1 being the parameter of the method. The policy of tracing the path
in the basic scheme of the method was very simple: in order to update (t, x) ≡ (ti−1 , xi−1 ) into
(t+ , x+ ) = (ti , xi ), we first increased, in certain prescribed ratio, the value of the penalty, i.e.,
set
t+ = t + dt, dt = (γ/√ϑ) t, (8.4)

and then applied to the new function Ft+(·) the damped Newton method in order to update x
into x+:

y^{l+1} = y^l − [1 + λ(Ft+, y^l)]^{-1} [∇²x Ft+(y^l)]^{-1} ∇x Ft+(y^l); (8.5)

we initialized this recurrence by setting y^0 = x and terminated it when the closeness to the path
was restored, i.e., when λ(Ft+, y^l) turned out to be ≤ κ, and took the corresponding y^l as x+.
Looking at the scheme, we immediately see at least two of its weak points: first, we use a once
for ever fixed penalty rate and do not try to use larger dt's; second, when applying the damped
Newton method to the function Ft+, we start the recurrence at y^0 = x; why don't we use a
better forecast for our target point x∗(t + dt)? Let us start with discussing this second point.
The path x∗(·) is smooth (at least twice continuously differentiable), as is immediately
seen from the Implicit Function Theorem applied to the equation

tc + F′(x) = 0, (8.6)
which defines the path. Given a tight approximation x to the point x∗(t) of the path, we could
try to use the first-order prediction

xf(dt) = x + x′ dt

of our target point x∗(t + dt); here x′ is some approximation of the derivative (d/dt)x∗(·) at the point
t. The simplest way to get this approximation is to note that what we finally are interested in
is to solve with respect to y the equation

(t + dt)c + F′(y) = 0;

a good idea is to linearize the left hand side at y = x and to use, as the forecast of x∗(t + dt),
the solution to the linearized equation. The linearized equation is

(t + dt)c + F′(x) + F″(x)[y − x] = 0,

and we come to

dx(dt) ≡ y − x = −[F″(x)]^{-1}[(t + dt)c + F′(x)] = −[F″(x)]^{-1}∇x Ft+dt(x). (8.7)
Thus, it is reasonable to start the damped Newton method with the forecast

xf(dt) = x + dx(dt).

Note that in fact we do not get anything significantly new: xf(dt) is simply the Newton (not
the damped Newton) iterate of x with respect to the function Ft+(·); nevertheless, this is not
exactly the same as the initial implementation. The actual challenge is, of course, to get rid of
the once for ever fixed penalty rate. To realize what could be done here, let us write down the
generic scheme we came to:
generic scheme we came to:
Predictor-Corrector Updating scheme:
in order to update a given κ-close to the path x∗ (·) pair (t, x) into a new pair (t+ , x+ ) of the
same type, act as follows
• Predictor step:
1) form the primal search line

x(dt) = x + dx(dt), dx(dt) = −[F″(x)]^{-1}∇x Ft+dt(x);

2) choose a stepsize δt > 0 and set

t+ = t + δt, xf = x + dx(δt);

• Corrector step:
3) starting with y^0 = xf, run the damped Newton method (8.5) until λ(Ft+, y^l) becomes
≤ κ; when it happens, set x+ = y^l, thus completing the updating (t, x) ↦ (t+, x+).
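The predictor step can be illustrated on a one-dimensional toy problem of our own choosing: minimize x over G = [−1, 1] with the 2-self-concordant barrier F(x) = −ln(1−x) − ln(1+x), where the path x∗(t) is available in closed form, so the quality of the forecast is easy to inspect:

```python
import numpy as np

# Toy illustration of the forecast (8.7): c = 1, G = [-1, 1],
# F(x) = -ln(1-x) - ln(1+x), so F'(x) = 2x/(1-x^2).
def x_star(t):
    # solves t + F'(x) = 0 inside (-1, 1): t*x^2 - 2x - t = 0
    return (1.0 - np.sqrt(1.0 + t * t)) / t

t, dt = 1.0, 0.5
x = x_star(t)                              # pretend we sit exactly on the path
Fpp = 1.0 / (1.0 - x)**2 + 1.0 / (1.0 + x)**2          # F''(x)
grad_next = (t + dt) * 1.0 + 2.0 * x / (1.0 - x * x)   # grad of F_{t+dt} at x
dx = -grad_next / Fpp                      # dx(dt), see (8.7)
xf = x + dx                                # the forecast

# the forecast is closer to x*(t+dt) than the old point x itself
assert abs(xf - x_star(t + dt)) < abs(x - x_star(t + dt))
```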
Now let us look at which stepsizes δt are acceptable for us. Of course, there is an immediate
requirement that xf = x + dx(δt) should be strictly feasible - otherwise we simply will be unable
to start the damped Newton method with xf . There is, anyhow, a more severe restriction.
Remember that the complexity estimate for the method in question heavily depended on the
fact that the ”default” stepsize (8.4) results in a once for ever fixed (depending on the penalty
rate γ and the path tolerance κ only) Newton complexity of the corrector step. If we wish to
preserve the complexity bounds - and we do wish to preserve them - we should take care of fixed
Newton complexity of the corrector step. Recall that our basic results on the damped Newton
method as applied to the self-concordant function Ft+ (·) (X., Lecture 2) say that the number of
Newton iterations of the method, started at certain point y 0 ∈ int G and ran until the relation
λ(Ft+, y^l) ≤ κ becomes true, is bounded from above by the quantity

O(1)[Ft+(y^0) − min_{y∈int G} Ft+(y) + ln(1 + ln(1/κ))],
O(1) being an appropriate absolute constant. We see that in order to bound from above the
Newton complexity of the corrector step it suffices to bound from above the residual
where κ̄ is a once for ever fixed constant - an additional parameter of the method, on top of
the path tolerance κ. The problem, of course, is how to ensure (8.11). If it were easy to compute
the residual at a given pair (t+ , xf ), we could apply a linesearch in the stepsize δt in order to
choose the largest stepsize compatible with a prescribed upper bound on the residual. Given a
candidate stepsize δt, we normally have no problems with ”cheap” computation of t+ , xf and
the quantity Ft+ (xf ) (usually the cost of computing the value of the barrier is much less than
our natural ”complexity unit” - the arithmetic cost of a Newton step); the difficulty, however,
is that the residual involves not only the value of Ft+ at the forecast, but also the minimum
value of Ft+ (·), which is unknown to us. What we are about to do is to derive certain duality-based and
computationally cheap lower bounds for the latter minimum value, thus obtaining ”computable”
upper bounds for the residual.
Note that Q indeed defines a ϑ-self-concordant barrier for G, see Proposition 3.1.1.(i).
Note that the essence of the Structural assumption is that we know the Legendre transfor-
mation of Φ (otherwise there would be no assumption at all - we simply could set Φ ≡ F ). This
assumption indeed is satisfied in many important cases, e.g., in Linear Programming, where G
is a polytope given by linear inequalities aTi x ≤ bi , i = 1, ..., m, and
F (x) = − Σ_{i=1}^m ln(bi − aTi x);
here
G+ = Rm+ , Φ(u) = − Σ_{i=1}^m ln ui
and
(πx + p)i = bi − aTi x, i = 1, ..., m;
the Legendre transformation of Φ, as it is immediately seen, is
Φ∗ (s) = Φ(−s) − m, s ∈ Rm− .
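The computation behind this identity - the sup in the definition of the Legendre transformation is attained componentwise at u = −1/s - is easy to confirm numerically; a small sketch (numpy only, function names ours):

```python
import numpy as np

def Phi(u):      return -np.sum(np.log(u))          # log-barrier for the nonnegative orthant
def Phi_star(s): return Phi(-s) - s.size            # claimed Legendre transformation, s < 0

rng = np.random.default_rng(0)
s = -np.exp(rng.normal(size=5))                     # arbitrary point with negative entries
u_star = -1.0 / s                                   # stationary point of u -> s.u - Phi(u)
val = s @ u_star - Phi(u_star)                      # value of the sup at the maximizer
assert abs(val - Phi_star(s)) < 1e-10
# the objective u -> s.u - Phi(u) is concave, so perturbations only decrease the value
for _ in range(100):
    u = u_star * np.exp(0.1 * rng.normal(size=5))
    assert s @ u - Phi(u) <= val + 1e-12
```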
In the mean time we shall speak about other important cases where the assumption is valid.
Now let us make the following simple and crucial observation:
Proposition 8.2.1 Let a pair (τ, s) ∈ R+ × Dom Φ∗ satisfy the linear homogeneous equation
τ c + π T s = 0. (8.13)
Then for every y ∈ int G one has
Fτ (y) ≡ τ cT y + F (y) ≡ τ cT y + Φ(πy + p) ≥
≥ τ cT y + [πy + p]T s − Φ∗ (s) = [τ c + π T s]T y + pT s − Φ∗ (s) = pT s − Φ∗ (s)
(the inequality is the Fenchel-Young inequality defining the Legendre transformation Φ∗ , and the
concluding equality follows from (8.13)); thus, pT s − Φ∗ (s) is a lower bound for the minimum value of Fτ .
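In the Linear Programming illustration above, this lower bound is easy to test numerically; a sketch (numpy only; the instance and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 2
A = rng.normal(size=(m, n))                # rows a_i^T of the constraints a_i^T x <= b_i
b = 1.0 + rng.random(m)                    # chosen so that x = 0 is strictly feasible
pi, p = -A, b                              # (pi x + p)_i = b_i - a_i^T x

def Phi(u):      return -np.sum(np.log(u))
def Phi_star(s): return Phi(-s) - m        # Legendre transformation, s < 0
def F(x):        return Phi(pi @ x + p)

tau = 1.0
s = -np.exp(rng.normal(size=m))            # any s in Dom Phi* = R^m_-
c = -(pi.T @ s) / tau                      # enforce (8.13): tau c + pi^T s = 0
lower = p @ s - Phi_star(s)                # the dual lower bound

F_tau = lambda y: tau * (c @ y) + F(y)
checked = 0
for _ in range(200):                       # sample strictly feasible points, check the bound
    y = 0.3 * rng.normal(size=n)
    if np.all(A @ y < b):
        assert F_tau(y) >= lower - 1e-9
        checked += 1
assert checked > 0
```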
Our next observation is that there exists a systematic way to generate dual feasible pairs
(τ, s), i.e., the pairs satisfying the premise of the above proposition.
Proposition 8.2.2 Let (t, x) be a primal feasible pair (i.e., with t > 0 and x ∈ int G), and let
Proof.
(i): from (8.16) it follows that
(t + dt)c + π T (s + ds(dt)) = (t + dt)c + π T [Φ′ (u) + Φ″ (u)πdx(dt)] =
[since F ′ (x) = π T Φ′ (u) and F ″ (x) = π T Φ″ (u)π in view of (8.12) and (8.16)]
= (t + dt)c + F ′ (x) + F ″ (x)dx(dt),
and the concluding quantity is 0 due to the origin of dx(dt), see (8.7). (i) is proved.
(ii): let us start with the following simple observation:
|ds(0)|²(Φ∗ )″ (s) = |dx(0)|²F ″ (x) = λ² (Ft , x) (8.18)
(see (L.3), Lecture 2). Besides this, ds(dt) = Φ″ (u)du(dt) by (8.16), whence
|ds(dt)|²(Φ∗ )″ ≡ [ds(dt)]T [(Φ∗ )″ ][ds(dt)] = [Φ″ du(dt)]T [Φ″ ]−1 [Φ″ du(dt)] = [du(dt)]T Φ″ (u)[du(dt)] = |du(dt)|²Φ″ (u) .
Now we can immediately complete the proof of item (ii) of the Proposition. Indeed, as we
know from VII., Lecture 2, the function Φ∗ is self-concordant on its domain; since s = Φ0 (u),
we have s ∈ Dom Φ∗ . (8.18) says that the | · |(Φ∗ )″ (s) -distance between s ∈ Dom Φ∗ and sf (0)
equals λ(Ft , x) and is therefore < 1 due to the premise of (ii). Consequently, sf (0) belongs to
the open unit Dikin ellipsoid of the self-concordant function Φ∗ centered at s and is therefore in
the domain of the function (I., Lecture 2). The latter domain is open (VII., Lecture 2), so that
sf (dt) ∈ Dom Φ∗ for all small enough dt ≥ 0; since S(dt) always satisfies (8.13), we conclude
that S(dt) is dual feasible for all small enough |dt|.
Propositions 8.2.1 and 8.2.2 lead to the following
Acceptability Test:
given a κ-close to the path primal feasible pair (t, x) and a candidate stepsize dt, form
the corresponding primal and dual pairs X(dt) = (t + dt, xf (dt) = x + dx(dt)), S(dt) = (t +
dt, sf (dt) = s + ds(dt)) and check whether the associated upper bound
v(dt) ≡ Vsf (dt) (t + dt, xf (dt)) = (t + dt)cT xf (dt) + F (xf (dt)) + Φ∗ (sf (dt)) − pT sf (dt) (8.20)
for the residual V (t + dt, xf (dt)) is ≤ κ̄ (by definition, v(dt) = +∞ if xf (dt) 6∈ Dom F or if
sf (dt) 6∈ Dom Φ∗ ).
If v(dt) ≤ κ̄, accept the stepsize dt, otherwise reject it.
An immediate corollary of Propositions 8.2.1, 8.2.2 is the following
Proposition 8.2.3 If (t, x) is a κ-close to the path primal feasible pair and a stepsize dt passes
the Acceptability Test, then
V (t + dt, xf (dt)) ≤ κ̄
and, consequently, the Newton complexity of the corrector step under the choice δt = dt does not
exceed the quantity
N (κ, κ̄) = O(1) [ κ̄ + ln(1 + ln(1/κ)) ],
O(1) being an absolute constant.
Now it is clear that in order to get a ”long step” version of the path-following method,
it suffices to equip the Predictor-Corrector Updating scheme with a linesearch-based rule for
choosing the largest possible stepsize δt which passes our Acceptability Test. Such a rule for
sure keeps the complexity of a corrector step at a fixed level; at the same time, the rule is
computationally cheap, since to test a stepsize, we should compute the values of Φ and Φ∗ only,
which normally is nothing as compared to the cost of the corrector step.
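To make the scheme concrete, here is a small numerical sketch of the long-step predictor-corrector loop for the LP log-barrier of this lecture (numpy only; the instance, the crude halving linesearch, the parameter values and all names are ours, and the primal search line is taken as the Newton forecast dx(dt) = −[F″(x)]⁻¹[(t + dt)c + F′(x)], consistent with Proposition 8.2.2):

```python
import numpy as np

# small LP instance: minimize c^T x over G = {x : a_i^T x <= b_i}
rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.normal(size=(m, n))
b = 1.0 + rng.random(m)                     # x = 0 is strictly feasible
c = rng.normal(size=n)
kappa, kappa_bar = 0.1, 2.0                 # path tolerance and threshold

def slack(x): return b - A @ x
def F(x):     return -np.sum(np.log(slack(x)))
def gradF(x): return A.T @ (1.0 / slack(x))
def hessF(x):
    d = 1.0 / slack(x)
    return A.T @ (d[:, None] ** 2 * A)
def Phi_star(s): return -np.sum(np.log(-s)) - m   # Legendre transform of Phi

def newton_dec(t, x):                       # lambda(F_t, x)
    g = t * c + gradF(x)
    return np.sqrt(g @ np.linalg.solve(hessF(x), g))

def corrector(t, x):
    # damped Newton method, run until lambda(F_t, x) <= kappa
    while True:
        g = t * c + gradF(x)
        dx = -np.linalg.solve(hessF(x), g)
        lam = np.sqrt(-g @ dx)
        if lam <= kappa:
            return x
        x = x + dx / (1.0 + lam)

def v_bound(t_new, x, dx):
    # computable upper bound (8.20) on the residual; +inf outside the domains
    xf = x + dx
    if np.any(slack(xf) <= 0):
        return np.inf
    u = slack(x)
    sf = -1.0 / u + (-(A @ dx)) / u ** 2    # dual forecast s + Phi''(u) du
    if np.any(sf >= 0):
        return np.inf
    return t_new * (c @ xf) + F(xf) + Phi_star(sf) - b @ sf

t, x = 1.0, corrector(1.0, np.zeros(n))
for _ in range(15):
    dt = 3.0 * t                            # crude linesearch: halve until accepted
    while True:
        g = (t + dt) * c + gradF(x)
        dx = -np.linalg.solve(hessF(x), g)  # predictor (Newton forecast)
        if v_bound(t + dt, x, dx) <= kappa_bar:
            break
        dt *= 0.5
    t, x = t + dt, corrector(t + dt, x + dx)

assert np.all(slack(x) > 0) and t > 5.0     # penalty rate grew, x stayed feasible
```

The halving loop always terminates: for small dt the forecast lands in the Dikin ellipsoid of the current iterate and the bound v(dt) is small.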
The outlined approach needs, of course, theoretical justification. Indeed, to the moment we
do not know what is the ”power” of our Acceptability Test - does it accept, e.g., the ”short”
stepsizes dt = O(t/√ϑ) used in the very first version of the method? This is the issue we now
address.
thus coming to the conjugate point s ∈ Dom Φ∗ and to the conjugate direction δs. Now, let
ρ∗u [δu] be the remainder in the second-order Taylor expansion of the function Φ(v) + Φ∗ (w) at
the point (u, s) along the direction (δu, δs):
Proof. (8.21) is proved exactly as relation (8.17), see Lemma 8.2.1. From (8.21) it follows that
if ζ < 1, then both u + δu and s + δs are in the centered at u, respectively, s open unit Dikin
ellipsoids of the self-concordant functions Φ, Φ∗ (the latter function is self-concordant due to
VII., Lecture 2). Applying to Φ and Φ∗ I., Lecture 2, we come to
(h is fixed) with respect to v in the direction h (cf. item 40 in the proof of VII., Lecture 2).
The differentiation results in
D3 Φ∗ (Φ0 (v))[h, h, h] = −D3 Φ(v)[[Φ00 (v)]−1 h, [Φ00 (v)]−1 h, [Φ00 (v)]−1 h];
Proposition 8.3.1 Let (t, x) be κ-close to the path, and let dt, |dt| < t, be a stepsize. Then the
quantity v(dt) (see (8.20)) satisfies the inequality
while
|du(dt)|Φ″ (u) ≤ ω ≡ λ(Ft , x) + (|dt|/t) [λ(Ft , x) + λ(F, x)] ≤ κ + (|dt|/t) [κ + √ϑ]. (8.24)
In particular, if ω < 1, then v(dt) is well-defined and is ≤ 2ρ(ω) − ω 2 . Consequently, if
|dt|/t ≤ (κ+ − κ)/(κ + λ(F, x)), (8.26)
then the stepsize dt passes the Acceptability Test.
Proof. Let u, s, du(dt), ds(dt) be given by (8.16). In view of (8.16), s is conjugate to u and
ds(dt) is conjugate to du(dt), so that by definition of ρ∗u [·], we have, denoting ζ = |du(dt)|Φ00 (u) =
|ds(dt)|(Φ∗ )00 (s) (see (8.21))
[since u + du(dt) = π[x + dx(dt)] + p and, by Proposition 8.2.2, π T [s + ds(dt)] = −(t + dt)c]
v(dt) ≡ (t + dt)cT xf (dt) + F (xf (dt)) + Φ∗ (sf (dt)) − pT sf (dt) = ρ∗u [du(dt)],
as required in (8.23).
8.4 Summary
Summarizing our observations and results, we come to the following
Long-Step Predictor-Corrector Path-Following method:
• The parameters of the method are the path tolerance κ ∈ (0, 1) and the threshold κ̄ >
2ρ(κ) − κ2 ; the input to the method is a κ-close to the path primal feasible pair (t0 , x0 ).
• The method forms, starting with (t0 , x0 ), the sequence of κ-close to the path pairs (ti , xi ),
with the updating
(ti−1 , xi−1 ) 7→ (ti , xi )
being given by the Predictor-Corrector Updating scheme, where the stepsizes δti ≡ ti −ti−1
are nonnegative reals passing the Acceptability Test associated with the pair (ti−1 , xi−1 ).
Since, as we know from Proposition 8.3.1, the stepsizes
δti∗ = ti−1 (κ+ − κ)/(κ + λ(F, xi−1 ))
for sure pass the Acceptability Test, we may assume that the stepsizes in the above method are
at least the default values δti∗ :
δti ≥ ti−1 (κ+ − κ)/(κ + λ(F, xi−1 )); (8.27)
note that to use the default stepsizes δti ≡ δti∗ , no Acceptability Test, and, consequently, no
Structural assumption on the barrier F is needed. Note also that to initialize the method (to get
the initial close to the path pair (t0 , x0 )), one can trace ”in the reverse time” the auxiliary path
associated with a given strictly feasible initial solution x b ∈ int G (see Lecture 4); and, of course,
when tracing the auxiliary path, we also can use the long-step predictor-corrector technique.
The method in question, of course, fits the standard complexity bounds:
Theorem 8.4.1 Let problem (8.1) be solved by the Long-Step Predictor-Corrector Path-Following
method which starts at a κ-close to the path primal feasible pair (t0 , x0 ) and uses stepsizes
δti passing the Acceptability Test and satisfying (8.27). Then the total number of Newton steps
in the method before an ε-solution to the problem is found does not exceed
O(1) [ √ϑ ln(1 + ϑ/(t0 ε)) + 1 ],
Proof. Since (ti , xi ) are κ-close to the path, we have cT xi − minx∈G cT x ≤ O(1)ϑ/ti with
certain O(1) depending on κ only (see Proposition 4.4.1, Lecture 4); this inaccuracy bound
combined with (8.27) (where one should take into account that λ(F, xi−1 ) ≤ √ϑ) implies that
cT xi − minx∈G cT x becomes ≤ ε after no more than O(1)√ϑ ln(1 + ϑ/(t0 ε)) + 1 steps, with O(1)
depending on κ and κ̄ only. It remains to note that since the stepsizes pass the Acceptability
Test, the Newton complexity of a step in the method, due to Proposition 8.2.3, is O(1).
8.5 Exercises: Long-Step Path-Following Methods
Exercise 8.5.1 #∗ The Structural assumption requires F to be obtained from a barrier with
known Legendre transformation by an affine substitution of the argument. Why didn’t we simplify
things by assuming that F itself has a known Legendre transformation?
The remaining exercises tell us another story. We have presented a certain ”long step” variant
of the path-following scheme; note, however, that the ”cost” of the ”long steps” is a certain
structural assumption on the underlying barrier. Although this assumption is automatically
satisfied in many important cases, we have paid something. Can we say something definite about
the advantages we have paid for? ”Definite” in the previous sentence means ”something which can
be proved”, not ”something which can be supported by computational experience” (this latter
aspect of the situation is more or less clear).
The answer is as follows. As far as the worst case complexity bound is concerned, there is no
progress at all, and the current state of the theory of interior point methods does not give us any
hope to get a worst-case complexity estimate better than O(√ϑ ln(V/ε)). Thus, if we actually
have got something, this is not an improvement in the worst case complexity. The goal of the
forthcoming exercises is to explain what is the improvement.
Let us start with some preliminary considerations. Consider a step of a path-following
predictor-corrector method; for the sake of simplicity, assume that at the beginning of the step
we are exactly on the path rather than close to it (what follows can be extended without any
difficulties to this latter situation). Thus, we are given t > 0 and x = x∗ (t), and our goal is to
update the pair (t, x) into a new pair (t+ , x+ ) close to the path with larger value of the penalty
parameter. To this end we choose a stepsize dt > 0, set t+ = t + dt and make the predictor step
x 7→ xf = x + (x∗ )′ (t)dt,
shifting x along l, the tangent line to the path. At the corrector step we apply to Ft+ the damped
Newton method, starting with xf , to restore closeness to the path. Assume that the method in
question ensures that the residual
is ≤ O(1) (this is more or less the same as to ensure a fixed Newton complexity of the corrector
step). Given that the method in question possesses the aforementioned properties, we may ask
ourselves what is the length of the displacement xf − x which is guaranteed by the method. It
is natural to measure the length in the local metric | · |F 00 (x) given by the Hessian of the barrier.
Note that in the short-step version of the method, where dt = O(1)t(1 + λ(F, x))−1 , we have
(see (8.7))
dx(dt) = −dt[F ″ (x)]−1 c = t−1 dt[F ″ (x)]−1 F ′ (x)
(since at the path tc + F ′ (x) = 0), whence
|xf (dt) − x|F ″ (x) = |dx(dt)|F ″ (x) = t−1 dt|[F ″ (x)]−1 F ′ (x)|F ″ (x) = t−1 dt λ(F, x) = O(1)λ(F, x)(1 + λ(F, x))−1 ,
so that Ω = O(1), provided that λ(F, x) ≥ O(1), or, which is the same, provided that we are
not too close to the analytic center of G.
Thus, the quantity Ω - let us call it the prediction power of the method - for the default
short-step version of the method is O(1). The goal of what follows is to investigate the prediction
power of the long-step version of the method and to compare it with the above reference point
- the O(1)-power of the short-step version; this is a natural formalization of the question ”how
long are the long steps”.
First of all, let us note that there is a natural upper bound on the prediction power - namely,
the distance (measured, of course, in | · |F 00 (x) ) from x to the boundary of G along the tangent
line l. Actually there are two distances, since there are two ways to reach ∂G along l - the
”forward” and the ”backward” movement. It is reasonable to speak about the shortest of these
distances - about the quantity
∆ = sup{r ≥ 0 | x ± re ∈ G},
e being the unit (in the metric | · |F ″ (x) ) direction vector of the tangent line l.
Since G contains the unit Dikin ellipsoid of F centered at x (i.e., the | · |F ″ (x) -unit ball centered
at x), we have
∆ ≥ 1.
Note that there is no prediction policy which always results in Ω >> 1, since it may happen
that both the ”forward” and ”backward” distances from x to the boundary of G are of order 1
(look at the case when G is the unit cube {y ∈ Rn | |y|∞ ≤ 1}, F (y) is the standard logarithmic
barrier − Σ_{i=1}^n [ln(1 − yi ) + ln(1 + yi )] for the cube, x = (0.5, 0, ..., 0)T and c = (−1, 0, ..., 0)T ).
What we can speak about is the type of dependence Ω = Ω(∆); in other words, it is reasonable
to ask ourselves ”how large is Ω when ∆ is large”, not ”how large is Ω” - the answer to this
latter question cannot be better than O(1).
In what follows we answer the above question for the following particular case:
Semidefinite Programming: the barrier Φ involved in our Structural assumption is the barrier
Φ(X) = − ln Det X
for the cone Sk+ of positive semidefinite symmetric k × k matrices.
Note that the Semidefinite Programming case (very important in its own right) covers, in par-
ticular, Linear Programming (look what happens when πx + p takes values in the subspace of
diagonal matrices).
Let us summarize our current knowledge on the situation in question.
• Φ is a k-self-concordant barrier for the cone Sk+ ; the derivatives of the barrier are given by
DΦ(u)[h] = − Tr{u−1 h} = − Tr{ĥ}, ĥ = u−1/2 hu−1/2 ,
so that
Φ′ (u) = −u−1 ; (8.28)
D2 Φ(u)[h, h] = Tr{u−1 hu−1 h} = Tr{ĥ2 },
so that
Φ″ (u)h = u−1 hu−1 ; (8.29)
D3 Φ(u)[h, h, h] = −2 Tr{u−1 hu−1 hu−1 h} = −2 Tr{ĥ3 }
(see Example 5.3.3, Lecture 5, and references therein);
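These derivative formulas are easy to confirm by finite differences; a small numerical sanity check (numpy only, names and tolerances ours):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4
B = rng.normal(size=(k, k))
u = B @ B.T + k * np.eye(k)                 # u in the interior of S^k_+
h = rng.normal(size=(k, k)); h = (h + h.T) / 2   # symmetric direction

def Phi(v): return -np.log(np.linalg.det(v))

ui = np.linalg.inv(u)
d1 = -np.trace(ui @ h)                           # DPhi(u)[h] = -Tr{u^-1 h}
d2 = np.trace(ui @ h @ ui @ h)                   # D^2Phi(u)[h,h] = Tr{u^-1 h u^-1 h}
d3 = -2.0 * np.trace(ui @ h @ ui @ h @ ui @ h)   # D^3Phi(u)[h,h,h]

eps = 1e-4
f = lambda r: Phi(u + r * h)
fd1 = (f(eps) - f(-eps)) / (2 * eps)
fd2 = (f(eps) - 2 * f(0) + f(-eps)) / eps ** 2
fd3 = (f(2 * eps) - 2 * f(eps) + 2 * f(-eps) - f(-2 * eps)) / (2 * eps ** 3)
assert abs(fd1 - d1) < 1e-5
assert abs(fd2 - d2) < 1e-4
assert abs(fd3 - d3) < 1e-2

# the same quantities via hhat = u^{-1/2} h u^{-1/2}: Tr{hhat^2}, -2 Tr{hhat^3}
w, V = np.linalg.eigh(u)
us = V @ np.diag(w ** -0.5) @ V.T
hb = us @ h @ us
assert abs(d2 - np.trace(hb @ hb)) < 1e-8
assert abs(d3 + 2.0 * np.trace(hb @ hb @ hb)) < 1e-8
```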
• the cone Sk+ is self-dual; the Legendre transformation of Φ is
Φ∗ (s) = Φ(−s) + const, Dom Φ∗ = − int Sk+
(Exercises 5.4.7, 5.4.10).
Let us get more information on the barrier Φ. Let us call an arrow a pair (v, dv) comprised
of v ∈ int Sk+ and dv ∈ Sk with |dv|Φ00 (v) = 1. Given an arrow (v, dv), let us define the conjugate
co-arrow (v ∗ , dv ∗ ) as
v ∗ = Φ0 (v) = −v −1 , dv ∗ = Φ00 (v)dv = v −1 dvv −1 .
Let also
ζ(v, dv) = sup{p | v ± pdv ∈ Sk+ }, (8.30)
ζ ∗ (v ∗ , dv ∗ ) = sup{p | v ∗ ± pdv ∗ ∈ −Sk+ }. (8.31)
In what follows |w|∞ , |w|2 are the spectral norm (maximum modulus of eigenvalues) and the
Frobenius norm Tr1/2 {w2 } of a symmetric matrix w, respectively.
Exercise 8.5.2 Let (v, dv) be an arrow and (v ∗ , dv ∗ ) be the conjugate co-arrow. Prove that
1 = |dv|Φ″ (v) = |v −1/2 dvv −1/2 |2 = |dv ∗ |(Φ∗ )″ (v∗ ) = Tr1/2 {dv dv ∗ } (8.32)
and that
ζ(v, dv) = ζ ∗ (v ∗ , dv ∗ ) = 1/|v −1/2 dvv −1/2 |∞ . (8.33)
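The claims of the exercise admit a quick numerical sanity check (numpy only, names ours): normalize a random symmetric dv so that |dv|Φ″(v) = 1, compute ζ from the eigenvalues of v^{−1/2} dv v^{−1/2}, and verify that v ± p dv stays positive definite exactly up to p = ζ.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 5
B = rng.normal(size=(k, k))
v = B @ B.T + np.eye(k)                     # v in the interior of S^k_+
dv = rng.normal(size=(k, k)); dv = (dv + dv.T) / 2

w, Q = np.linalg.eigh(v)
vmh = Q @ np.diag(w ** -0.5) @ Q.T          # v^{-1/2}
dvh = vmh @ dv @ vmh
dv /= np.linalg.norm(dvh)                   # now |dv|_{Phi''(v)} = |dvh|_2 = 1
dvh /= np.linalg.norm(dvh)

# claimed (8.33): zeta = 1 / |v^{-1/2} dv v^{-1/2}|_inf
zeta = 1.0 / np.max(np.abs(np.linalg.eigvalsh(dvh)))
assert zeta >= 1.0 - 1e-12                  # Dikin-ellipsoid bound: zeta >= 1

pd = lambda M: np.all(np.linalg.eigvalsh(M) > 0)
eps = 1e-5
assert pd(v + (zeta - eps) * dv) and pd(v - (zeta - eps) * dv)
assert not (pd(v + (zeta + eps) * dv) and pd(v - (zeta + eps) * dv))

# claimed (8.32): Tr{dv dv*} = 1 for the conjugate co-arrow dv* = v^{-1} dv v^{-1}
vi = np.linalg.inv(v)
assert abs(np.trace(dv @ vi @ dv @ vi) - 1.0) < 1e-8
```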
Exercise 8.5.3 Prove that for any positive integer j, any v ∈ int Sk+ and any h ∈ Sk one has
Dj Φ(v)[h, ..., h] = (−1)j (j − 1)! Tr{ĥj }, ĥ = v −1/2 hv −1/2 , (8.34)
and, in particular,
|Dj Φ(v)[h, ..., h]| ≤ (j − 1)! |ĥ|22 |ĥ|j−2∞ , j ≥ 2. (8.35)
Let ρj (z) be the remainder in the j-th order Taylor expansion of the function − ln(1 − z) at z = 0:
ρj (z) = (1/(j + 1)) z j+1 + (1/(j + 2)) z j+2 + ...
(so that the perfectly known to us function ρ(z) = − ln(1 − z) − z is nothing but ρ1 (z)).
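As a quick sanity check of this definition (numpy only, names ours), ρj is the tail of the power series of − ln(1 − z):

```python
import numpy as np

def rho(j, z):
    # remainder of the j-th order Taylor expansion of -ln(1-z) at z = 0
    return -np.log1p(-z) - sum(z ** i / i for i in range(1, j + 1))

z = 0.3
# the tail of the series: rho_j(z) = z^{j+1}/(j+1) + z^{j+2}/(j+2) + ...
tail = sum(z ** i / i for i in range(3, 200))
assert abs(rho(2, z) - tail) < 1e-12
assert abs(rho(1, z) - (-np.log(1.0 - z) - z)) < 1e-12   # rho_1 is the familiar rho
```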
Exercise 8.5.4 Let (v, dv) be an arrow, and let Rj(v,dv) (r), j ≥ 2, be the remainder in the j-th order
Taylor expansion of the function f (r) = Φ(v + rdv) at r = 0:
Rj(v,dv) (r) = f (r) − Σ_{i=0}^{j} (f (i) (0)/i!) r i .
Exercise 8.5.5 Let (v, dv) be an arrow and (v ∗ , dv ∗ ) be the conjugate co-arrow. Let Rj(v,dv) (r),
j ≥ 2, be the remainder in the j-th order Taylor expansion of the function ψ(r) = Φ(v + rdv) +
Φ∗ (v ∗ + rdv ∗ ) at r = 0:
Rj(v,dv) (r) = ψ(r) − Σ_{i=0}^{j} (ψ (i) (0)/i!) r i .
Now let us come back to our goal - investigating the prediction power of the long step predictor-
corrector scheme for the case of Semidefinite Programming. Thus, let us fix a pair (t, x)
belonging to the path (so that t > 0 and x = x∗ (t) = argminy∈G [tcT y + F (y)]). We use the
following notation:
• u = πx + p, du = πdx;
Exercise 8.5.6 Prove that (u, du) is an arrow, (s, ds) is the conjugate co-arrow and that
∆ = ζ(u, du).
Now we are ready to answer what is the prediction power of the long step predictor-corrector
scheme.
Exercise 8.5.7 Consider the Long-Step Predictor-Corrector Updating scheme with linesearch
(which chooses, as the stepsize, the largest value of dt which passes the Acceptability Test) as
applied to Semidefinite Programming. Prove that the prediction power of the scheme is at least
• in order to form the forecast xf , we move along the tangent line l to the path [in principle
we could use higher-order polynomial approximations on it; here we ignore this possibility]
It can be proved that in the case of Linear (and, consequently, Semidefinite) Programming the
prediction power of any predictor-corrector scheme subject to the above restrictions cannot be
better than O(1)∆2/3 (x) (which is slightly better than the prediction power O(1)∆1/2 (x) of our
method). I do not know the origin of this gap - drawbacks of the long-step method in
question, or a too optimistic upper bound - and you are welcome to investigate the problem.
Chapter 9
How to Construct Self-Concordant Barriers
To the moment we are acquainted with four interior point methods; the ”interior point toolbox”
contains more of them, but we are forced to stop somewhere, and I think it is the right time to
stop. Let us think how we could exploit our knowledge in order to solve a convex program by
one of our methods. Our actions are clear:
(a) we should reformulate our problem in the standard form
of a problem of minimizing a linear objective over a closed convex domain (or in the conic form
- as a problem of minimizing a linear objective over the intersection of a convex cone and an
affine plane; for the sake of definiteness, let us speak about the standard form).
In principle (a) does not cause any difficulty - we know that both standard and conic problems
are universal forms of convex programs.
(b) we should equip the domain/cone given by step (a) with a ”computable” self-concordant
barrier.
Sometimes we need something more - e.g., to apply the potential reduction methods, we are
interested in logarithmically homogeneous barriers, possibly, along with their Legendre trans-
formations, and to use the long-step path-following scheme, we need a barrier satisfying the
Structural assumption from Lecture 8.
Now, our current knowledge on the crucial issue of constructing self-concordant barriers is
rather restricted. We know exactly 3 ”basic” self-concordant barriers:
• (I) the 1-self-concordant barrier − ln x for the nonnegative axis (Example 3.1.2, Lecture
3);
• (II) the k-self-concordant barrier − ln Det x for the cone Sk+ of positive semidefinite
symmetric k × k matrices (Example 5.3.3, Lecture 5);
• (III) the 2-self-concordant barrier − ln(t2 −xT x) for the second-order cone {(t, x) ∈ R×Rk |
t ≥ |x|2 } (Example 5.3.2, Lecture 5).
Note that the latter two examples were not justified in the lectures; and it is not that easy to
prove that (III) indeed is a self-concordant barrier for the second-order cone.
Given the aforementioned basic barriers, we can produce many other self-concordant barriers
by applying the combination rules, namely, by taking sums of these barriers, their direct sums
and superpositions with affine mappings (Proposition 3.1.1, Lecture 3). These rules, although
very simple, are surprisingly powerful; what should be mentioned first is that the rules allow us
to treat all constraints defining the feasible set G separately. We mean the following. Normally
the feasible set G is defined by a finite number m of constraints; each of them defines its own
feasible set Gi , so that the resulting feasible set G is the intersection of the Gi :
G = ∩_{i=1}^m Gi .
(the image of A should intersect the interior of G+ ); given such representation, you can take as
F the superposition
F (x) = F + (A(x))
of F + and the mapping A.
The Decomposition and the Substitution rules as applied to the particular self-concordant
barriers (I) - (III) allow us to obtain the barriers required by several important generic Convex
Programming problems; e.g., they immediately imply self-concordance of the standard logarithmic
barrier
F (x) = − Σ_{i=1}^m ln(bi − aTi x)
for the polyhedral set
G = {x | aTi x ≤ bi , i = 1, ..., m};
this latter fact covers all needs of Linear Programming. Thus, we cannot say that we are
completely unequipped; at the same time, our equipment is not too rich. Consider, for example,
the problem of the best | · |p -approximation:
(Lp ): given a sample uj ∈ Rn , j = 1, ..., N , of ”regressors” along with responses vj ∈ R,
find the linear model
v = xT u
which optimally fits the observations in the | · |p -norm, i.e., minimizes the quantity
f (x) = Σ_{j=1}^N |vj − xT uj |p
(in fact the | · |p -criterion is f 1/p (x), but it is, of course, the same whether to minimize f or f 1/p ).
f (·) clearly is a convex function, so that our approximation problem is a convex program. In
order to solve it by an interior point method, we can write the problem down in the standard
form, which is immediate:
Definition 9.1.1 Let G+ ⊂ RN be a closed convex domain, and let K = R(G+ ) be the recessive
cone of G+ . A mapping
A(x) : int G− → RN
defined and C3 smooth on the interior of a closed convex domain G− ⊂ Rn is called β-appropriate
for G+ (here β ≥ 0) if
(i) A is concave with respect to K, i.e.,
D2 A(x)[h, h] ≤K 0 ∀x ∈ int G− ∀h ∈ Rn
G ≡ cl G0 ,
ϑ = ϑ+ + max[1, β 2 ]ϑ− .
The above Theorem resembles the Substitution rule: we see that the affine mapping A in
the latter rule can be replaced by an arbitrary nonlinear mapping (which should, however,
be appropriate for G+ ), and the substitution F + (·) 7→ F + (A(·)) should be accompanied by
adding to the result a self-concordant barrier for the domain of A. Let us call this new rule the
”Substitution rule (N)” (nonlinear); to distinguish between this rule and the initial one, let us
call the latter the ”Substitution rule (L)” (linear). In fact Substitution rule (L) is a very particular
case of Substitution rule (N); indeed, an affine mapping A, as we know, is appropriate for any
domain G+ , and since the domain of A is the whole Rn , one can set F − ≡ 0 (this is the 0-self-concordant
barrier for Rn ), thus coming to the Substitution rule (L).
Proposition 9.2.2 Let f (x) be a 3 times continuously differentiable real-valued convex function
on the ray {x > 0} such that
• G+ = R+ [K = R+ ],
• G− = {(x, t) | t ≥ 0},
• A(x, t) = f (t) − x,
which results in
G = cl{(x, t) | t > 0, x ≤ f (t)}.
The assumptions on f say exactly that A is β-appropriate for G+ , so that the conclusion in
Proposition 9.2.1 is immediately given by Theorem 9.1.1.
To get Proposition 9.2.2, it suffices to apply Proposition 9.2.1 to the image of the set Gf
under the mapping (x, t) 7→ (t, −x).
Example 9.2.1 [epigraph of the increasing power function] Whenever p ≥ 1, the function
− ln t − ln(t1/p − x)
is a 2-self-concordant barrier for the epigraph {(x, t) ∈ R2 | t ≥ (x+ )p } of the increasing power
function, and the function
−2 ln t − ln(t2/p − x2 )
is a 4-self-concordant barrier for the set
{(x, t) ∈ R2 | t ≥ |x|p }.
The result on the epigraph of (x+ )p is given by Proposition 9.2.1 with f (t) = t1/p , β = (2p − 1)/(3p); to
get the result on the epigraph of |x|p , take the sum of the already known to us barriers for the
epigraphs E+ , E− of the functions (x+ )p and ([−x]+ )p , thus obtaining a barrier for E− ∩ E+ ,
which is exactly the epigraph of |x|p .
The case of 0 < p ≤ 1 is given by Proposition 9.2.2 applied with f (x) = x−p , β = (2 + p)/3. The case
of p > 1 can be reduced to the former one by swapping x and t.
The function
− ln t − ln(ln t − x)
is a 2-self-concordant barrier for the epigraph
{(x, t) ∈ R2 | t ≥ exp{x}}
of the exponent [Proposition 9.2.1 applied with f (t) = ln t, β = 2/3].
The function
− ln x − ln(t − x ln x)
is a 2-self-concordant barrier for the set
cl{(x, t) ∈ R2 | t ≥ x ln x, x > 0}
[Proposition 9.2.2 applied with f (x) = x ln x, β = 1/3].
The indicated examples allow us to handle those of the constraints defining the feasible set G
which are separable, i.e., are of the type
Σi fi (aTi x + bi ),
fi being a convex function on the axis. To make this point clear, let us look at the typical
example - the | · |p -approximation problem (Lp ). Introducing N additional variables ti , we can
rewrite this problem equivalently as
minimize Σ_{i=1}^N ti s.t. ti ≥ |vi − uTi x|p , i = 1, ..., N,
so that now there are N ”simple” constraints rather than a single complicated one.
Now, the feasible set of i-th of the ”simple” constraints is the inverse image of the epigraph of
the increasing power function under an affine mapping, so that the feasible domain G of the
reformulated problem admits the following explicit self-concordant barrier (Example 9.2.1 plus
the usual Decomposition and Substitution rules):
F (t, x) = − Σ_{i=1}^N [ln(ti2/p − (vi − uTi x)2 ) + 2 ln ti ]
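One can sanity-check that the domain of this barrier is exactly the feasible set of the reformulated problem, i.e., that F is finite precisely when ti > |vi − uiT x|p for all i; a sketch (numpy only; data and names ours):

```python
import numpy as np

rng = np.random.default_rng(5)
N, n, p = 7, 2, 3.0
U = rng.normal(size=(N, n))                # rows u_i^T
v = rng.normal(size=N)

def barrier(t, x):
    r = v - U @ x
    arg = t ** (2.0 / p) - r ** 2
    if np.any(t <= 0) or np.any(arg <= 0):
        return np.inf                      # outside the domain of F
    return -np.sum(np.log(arg) + 2.0 * np.log(t))

x = rng.normal(size=n)
res = np.abs(v - U @ x)
t_in = res ** p + 0.5                      # t_i > |v_i - u_i^T x|^p : strictly feasible
t_out = 0.5 * res ** p                     # each constraint t_i >= |v_i - u_i^T x|^p violated
assert np.isfinite(barrier(t_in, x))
assert barrier(t_out, x) == np.inf
```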
A(τ, ξ, η) = τ − ξ 2 /η
(ξ, η, τ are real variables and η > 0); the general case is given by ”vectorization” of the numerator
and the denominator and looks as follows:
Given are a bilinear mapping
Q[ξ ′ , ξ ″ ] : Rn × Rn → RN
with components
Qi [ξ ′ , ξ ″ ] = (ξ ′ )T Qi ξ ″ ,
and a matrix A(η) depending on a vector η of parameters;
it turns out that this mapping is, under reasonable restrictions, appropriate for domains in RN .
To formulate the restrictions, note first that A is not necessarily everywhere defined, since the
matrix A(η) may, for some η, be singular. Therefore it is reasonable to restrict η to vary in
a certain closed convex domain Y ⊂ Rqη ; thus, from now on the mapping A is considered along
with the domain Y where η varies. The conditions which ensure that A is compatible with a
given closed convex domain G+ ⊂ RN are as follows:
(A): A(η) is positive definite for η ∈ int Y ;
(B): the bilinear form Q[A−1 (η)ξ 0 , ξ 00 ] of ξ 0 , ξ 00 is symmetric in ξ 0 , ξ 00 for any η ∈ int Y ;
(C): the quadratic form Q[ξ, ξ] takes its values in the recessive cone K of the domain G+ .
and
B(ξ, η) = −Q[A−1 (η)ξ, ξ] : G− ≡ Y × Rnξ → RN
are 1-appropriate for G+ .
In particular, if F + is a ϑ+ -self-concordant barrier for G+ and FY is a ϑY -self-concordant
barrier for Y , then
F (τ, ξ, η) = F + (τ − Q[A−1 (η)ξ, ξ]) + FY (η)
is a (ϑ+ + ϑY )-self-concordant barrier for the closed convex domain
The proof of the proposition is given in Section 9.5. What we are about to do now is to present
several examples.
{(x, t) ∈ Rn × R | t ≥ f (x)}.
• G+ = R+ [N = 1];
• Q[ξ 0 , ξ 00 ] = (ξ 0 )T ξ 00 , ξ 0 , ξ 00 ∈ Rn ;
• Rqη = R = Y, A(η) ≡ I
(from now on I stands for the identity operator).
Conditions (A) - (C) clearly are satisfied; Proposition 9.3.1 applied with
F + (τ ) = − ln τ, FY (·) ≡ 0
G = {(τ, ξ, η) | τ ≥ ξ T ξ}.
The epigraph of the quadratic form f clearly is the inverse image of G under the affine mapping
(t, x) 7→ (τ = t − bT x − c, ξ = P x, η = 0),
F (t, x) = − ln(t2 − xT x)
• G+ = R+ [N = 1];
• Q[ξ 0 , ξ 00 ] = (ξ 0 )T ξ 00 , ξ 0 , ξ 00 ∈ Rn ;
F + (τ ) = − ln τ, FY (η) = − ln η
The second order cone Kn2 clearly is the inverse image of G under the affine mapping
(t, x) 7→ (τ = t, ξ = x, η = t),
and to prove that F (t, x) is a 2-self-concordant barrier for the second order cone, it remains to
apply the Substitution rule (L). Logarithmic homogeneity of F (t, x) is evident.
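Self-concordance of F(t, x) = − ln(t² − xT x) can also be probed numerically: along random directions, finite differences should satisfy the two defining inequalities |D³F[h, h, h]| ≤ 2(D²F[h, h])^{3/2} and |DF[h]| ≤ √2 (D²F[h, h])^{1/2}. A sketch (numpy only; the test point, step sizes and tolerances are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
def F(z):                                  # z = (t, x), defined for t > |x|_2
    t, x = z[0], z[1:]
    return -np.log(t ** 2 - x @ x)

z0 = np.concatenate(([2.0], 0.3 * rng.normal(size=n)))   # strictly inside the cone
for _ in range(50):
    h = rng.normal(size=n + 1)             # random direction
    g = lambda r: F(z0 + r * h)
    e = 1e-3
    d1 = (g(e) - g(-e)) / (2 * e)
    d2 = (g(e) - 2 * g(0) + g(-e)) / e ** 2
    d3 = (g(2 * e) - 2 * g(e) + 2 * g(-e) - g(-2 * e)) / (2 * e ** 3)
    assert d2 > 0                                          # convexity
    assert abs(d1) <= np.sqrt(2.0) * np.sqrt(d2) * 1.01    # barrier inequality, theta = 2
    assert abs(d3) <= 2.0 * d2 ** 1.5 * 1.01               # self-concordance inequality
```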
The next example originally required a roughly 15-page ”brute force” justification, which was
by far more complicated than the justification of the general results presented in this lecture.
Example 9.3.3 [epigraph of the spectral norm of a matrix] The function
F (t, x) = − ln Det(tI − t−1 xT x) − ln t
is obtained by applying Proposition 9.3.1 with
F + (τ ) = − ln Det τ, FY (η) = − ln η,
which yields a self-concordant barrier for
G = cl{(τ, ξ, η) | τ − η −1 ξ T ξ ∈ int Sm+ , η > 0}.
The spectral norm of a k × m matrix x is < t if and only if the maximum eigenvalue of the
matrix xT x is < t2 , or, which is the same, if the m × m matrix tI − t−1 xT x is positive definite;
thus, the epigraph of the spectral norm of x is the inverse image of G under the affine mapping
τ = tI
(t, x) 7→ ξ = x ,
η=t
and to prove that F (t, x) is an (m + 1)-self-concordant barrier for the epigraph of the spectral norm,
it suffices to apply the Substitution rule (L). The logarithmic homogeneity of F (t, x) is evident.
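The equivalence used in the example - positive definiteness of tI − t⁻¹xT x versus spectral norm of x below t - is easy to confirm numerically (numpy sketch, names ours):

```python
import numpy as np

def in_epigraph(x, t):
    # t I - t^{-1} x^T x is positive definite iff the spectral norm of x is < t
    m = x.shape[1]
    M = t * np.eye(m) - x.T @ x / t
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

rng = np.random.default_rng(7)
k, m = 5, 3
for _ in range(100):
    x = rng.normal(size=(k, m))
    t = float(np.exp(rng.normal()))                 # random positive level
    sigma = np.linalg.norm(x, 2)                    # largest singular value of x
    assert in_epigraph(x, t) == (sigma < t)
```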
The indicated examples of self-concordant barriers are sufficient for applications which will
be our goal in the remaining lectures; at the same time, these examples explain how to use the
general results of the lecture to obtain barriers for other convex domains.
y ≥K αy ′ + (1 − α)y ″ ; (9.2)
indeed, the right hand side in this inequality belongs to int G+ together with y ′ , y ″ ; since K
is the recessive cone of G+ , the translation of any vector from int G+ by a vector from K also
belongs to int G+ , so that (9.2) - which says that y is a translation of the right hand side by a
direction from K - would imply that y ∈ int G+ .
To prove (9.2) is the same as to demonstrate that
D2 f (z)[h, h] = sT D2 A(z)[h, h] ≤ 0
as required in (9.3).
B. Now let us prove self-concordance of F . To this end let us fix x ∈ G0 and h ∈ Rn and
verify that
|D3 F (x)[h, h, h]| ≤ 2{D2 F (x)[h, h]}3/2 , (9.4)
|DF (x)[h]| ≤ ϑ1/2 {D2 F (x)[h, h]}1/2 . (9.5)
B.1. Let us start with writing down the derivatives of F . Under the notation
a = A(x), a′ = DA(x)[h], a″ = D2 A(x)[h, h], a‴ = D3 A(x)[h, h, h],
we have
DF (x)[h] = DF + (a)[a′ ] + γ 2 DF − (x)[h], γ = max[1, β], (9.6)
D2 F (x)[h, h] = D2 F + (a)[a′ , a′ ] + DF + (a)[a″ ] + γ 2 D2 F − (x)[h, h], (9.7)
D3 F (x)[h, h, h] = D3 F + (a)[a′ , a′ , a′ ] + 3D2 F + (a)[a′ , a″ ] + DF + (a)[a‴ ] + γ 2 D3 F − (x)[h, h, h]. (9.8)
B.2. Now let us summarize our knowledge on the quantities involved into (9.6) - (9.8).
Since F + is a ϑ+ -self-concordant barrier, we have
|DF + (a)[a′ ]| ≤ √ϑ+ p, p ≡ {D2 F + (a)[a′ , a′ ]}1/2 , (9.9)
since the inequality t3 a‴ ≤K −3βt2 a″ is valid for all t with |t|q ≤ 1, (9.15) follows.
Note that from (9.13) and (9.15) it follows that the quantity
r ≡ {DF + (a)[a″ ]}1/2 (9.16)
D2 F (x)[h, h] = p2 + r2 + γ 2 q 2 , (9.20)
|D3 F (x)[h, h, h]| ≤ 2[p3 + (3/2)pr2 + (3/2)βqr2 ] + 2γ 2 q 3 . (9.21)
By passing from q to s = γq, we come to the inequalities
|DF (x)[h]| ≤ √ϑ+ p + √ϑ− γs, D2 F (x)[h, h] = p2 + r2 + s2 ,
and
|D3 F (x)[h, h, h]| ≤ 2[p3 + (3/2)pr2 + (3β/(2γ))sr2 ] + 2γ −1 s3 ≤
[since γ ≥ β and γ ≥ 1]
≤ 2[p3 + s3 + (3/2)r2 (p + s)] ≤
[straightforward computation]
≤ 2[p2 + r2 + s2 ]3/2 .
Thus,
|DF (x)[h]| ≤ {ϑ+ + γ 2 ϑ− }1/2 {D2 F (x)[h, h]}1/2 , |D3 F (x)[h, h, h]| ≤ 2{D2 F (x)[h, h]}3/2 . (9.22)
C. (9.22) says that F satisfies the differential inequalities required by the definition of a
(ϑ+ + γ 2 ϑ− )-self-concordant barrier for G = cl G0 . To complete the proof, we should demonstrate that F
is a barrier for G, i.e., that F (xi ) → ∞ whenever xi ∈ G0 are such that x ≡ limi xi ∈ ∂G. To
prove the latter statement, set
yi = A(xi )
and consider two possible cases:
C.1: x ∈ int G− ;
C.2: x ∈ ∂G− .
In the easy case of C.1 there exists y = limi yi = A(x), since A is continuous on the interior
of G− and, consequently, in a neighbourhood of x. Since x 6∈ G0 , y 6∈ int G+ , so that the
sequence yi comprised of the interior points of G+ converges to a boundary point of G+ and
therefore F + (yi ) → ∞. Since xi converge to an interior point of G− , the sequence F − (xi ) is
bounded, and the sequence F (xi ) = F + (yi ) + γ 2 F − (xi ) diverges to +∞, as required.
Now consider the more difficult case when x ∈ ∂G^-. Here we know that F^-(x_i) → ∞ (since
the x_i converge to a boundary point of the domain G^-, for which F^- is a self-concordant barrier);
in order to prove that F(x_i) ≡ F^+(y_i) + γ^2 F^-(x_i) → ∞ it suffices, therefore, to prove that the
sequence F^+(y_i) is bounded below. From concavity of A we have (compare with A)
F^+(y_i) ≥ F^+(z_i).
Now the boundedness below of F^+(y_i) is an immediate consequence of the fact that the sequence
F^+(z_i) is bounded below (indeed, {x_i} is a bounded sequence, and consequently its image {z_i}
under an affine mapping is also bounded; and the convex function F^+ is bounded below on any bounded
subset of its domain).
A_i(X) = τ_i − ξ^T α(η) Q_i ξ;
D. Now we are basically done. First, α commutes with Q_i and is positive definite in view
of condition (A) (since α = A^{-1}(η) and η ∈ int Y). It follows that α^{1/2} also commutes with
Q_i, so that (9.24) can be rewritten as
D^2 A_i(X)[Ξ] = −2[α^{1/2} ζ]^T Q_i [α^{1/2} ζ],
(as always, ≥ 0 for symmetric matrices stands for "positive semidefinite"), whence also
γ = α[α^{-1} − a']α ≥ 0;
and since γ is positive semidefinite and, due to its origin, commutes with Q_i (since α and a'
do), we have ζ^T γ Q_i ζ = ζ^T γ^{1/2} Q_i γ^{1/2} ζ, so that
F(t, x) = −ln(t^{2/p} − x^T x) − ln t
G2 = cl{(x, t) ∈ R^n × R | (x^T x / t, t) ∈ G, t > 0}.
Derive from this observation that if 1 ≤ p ≤ 2, then the function
F(t, x) = −ln(t^{2/p} − x^T x) − ln t
Exercise 9.6.2 #
1) Let
    A = [ τ    ξ^T ]
        [ ξ    η   ]
be a symmetric matrix (τ is p × p, η is q × q). Prove that A is positive definite if and only if both
the matrices η and τ − ξ^T η^{-1} ξ are positive definite; in other words, the cone S^{p+q}_+ of positive
158 CHAPTER 9. HOW TO CONSTRUCT SELF-CONCORDANT BARRIERS
Now we are in a position to complete, in a sense, our considerations related to the Truss
Topology Design problem (Section 5.7, Lecture 5). So far we know two formulations
of the problem:
Dual form (TTD_d): minimize t by choice of the vector x = (t; λ_1, ..., λ_k; z_1, ..., z_m) (t and λ_j
are reals, z_i ∈ R^n) subject to the constraints
t ≥ Σ_{j=1}^k [ 2 z_j^T f_j + V (b_i^T z_j)^2 / λ_j ],  i = 1, ..., m,
9.6. EXERCISES ON CONSTRUCTING SELF-CONCORDANT BARRIERS 159
λ ≥ 0;  Σ_j λ_j = 1.
Primal form (ψ): minimize t by choice of x = (t; φ; β_ij) (t and β_ij, i = 1, ..., m, j = 1, ..., k,
are reals, φ ∈ R^m) subject to the constraints
t ≥ Σ_{i=1}^m β_ij^2 / φ_i,  j = 1, ..., k;
φ ≥ 0;  Σ_{i=1}^m φ_i = V;
Σ_{i=1}^m β_ij b_i = f_j,  j = 1, ..., k.
Both forms are respectable convex problems; the question, however, is whether we are
well enough equipped to solve them via the interior point machinery, or, in other words, whether we are clever
enough to point out explicit self-concordant barriers for the corresponding feasible domains. The
answer is positive.
Exercise 9.6.4 Consider the problem (TTDd ), and let
minimize c^T x ≡ t  s.t.  x ∈ G ⊂ E,
where
E = {x | Σ_{j=1}^k λ_j = 1}
Prove that Φ is an (m + k)-logarithmically homogeneous self-concordant barrier for the closed convex
cone
G^+ = cl{u | r_j > 0, j = 1, ..., k; s_i ≥ Σ_{j=1}^k r_j^{-1} t_ij^2, i = 1, ..., m},
3) Prove that the domain G of the standard reformulation of (TTD_d) given by 1) is the
inverse image of G^# = cl G' under the affine mapping
x ↦ πx + p:  s_i = t − 2 Σ_{j=1}^k z_j^T f_j,  t_ij = (b_i^T z_j) √V,  r_j = λ_j,
F(x) = Φ(πx + p),
and thus get the possibility to solve (TTDd ) by the long-step path-following method.
Note also that the problem
minimize c^T x ≡ t  s.t.  x ∈ E, πx + p ∈ G^+
Note that the primal formulation (ψ) of TTD can be treated in a completely similar way (its
formal structure is similar to that of (TTD_d), up to the presence of a larger number of linear
equality constraints; and linear equalities do not affect our ability to point
out self-concordant barriers, due to the Substitution rule (L)).
is 1-appropriate for G+ = R+ .
Conclude from this observation that the function
Applications in Convex
Programming
So far we know several general schemes of polynomial time interior point methods; in
the previous lecture we also developed a technique for constructing the self-concordant barriers
the methods are based on. It is now time to look at how this machinery works. To this end let us
consider several standard classes of convex programming problems. The order of exposition is as
follows: for each class of problems in question, I shall present the usual description of the problem
instances, the standard and conic reformulations required by the interior point approach, the
related self-concordant barriers and, finally, the complexities (Newton and arithmetic) of the
resulting methods.
In what follows, unless the opposite is explicitly stated, we always assume that the constraints
involved in the problem satisfy the Slater condition.
164 CHAPTER 10. APPLICATIONS IN CONVEX PROGRAMMING
Thus, to solve an LP problem, we can use both the basic and the long-step versions of the
path-following method.
Complexity: as we remember, the Newton complexity of finding an ε-solution by a path-following
method associated with a ϑ-self-concordant barrier is M = O(1)√ϑ ln(V ε^{-1}), O(1) being a certain
absolute constant^2 and V a data-dependent scale factor. Consequently, the arithmetic cost of
an ε-solution is M·N, where N is the arithmetic cost of a single Newton step. We see that the
complexity of the method is completely characterized by the quantities ϑ and N. Note that the
product
C = √ϑ · N
is the factor at the term ln(V ε^{-1}) in the expression for the arithmetic cost of an ε-solution; thus,
C can be thought of as the arithmetic cost of an accuracy digit in the solution (since ln(V ε^{-1})
can be naturally interpreted as the number of accuracy digits in an ε-solution).
Now, in the situation in question ϑ = m is the larger size of the LP problem, and it remains
to understand what is the cost N of a Newton step. At a step we are given an x and should
form and solve with respect to y the linear system of the type
the gradient and the Hessian of the barrier in our case, as is immediately seen, are given by
F'(x) = Σ_{j=1}^m d_j a_j,  F''(x) = A^T D^2 A,
where
d_j = [b_j − a_j^T x]^{-1}
are the inverse residuals in the constraints at the point x and
D = Diag(d_1, ..., d_m).
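These formulas translate directly into code. The following NumPy sketch (function and variable names are mine, not from the text) assembles F' and F'' for the log-barrier of the polytope {x | Ax ≤ b} and computes the Newton direction for the penalized objective t·c^T x + F(x) used by the path-following scheme:

```python
import numpy as np

def newton_direction(A, b, c, x, t):
    """Newton direction for t*c^T x + F(x), where F(x) = -sum_j ln(b_j - a_j^T x)
    is the log-barrier of {x | Ax <= b} and x is strictly feasible.
    Assembling F' and F'' costs O(m n^2); solving the system costs O(n^3)."""
    d = 1.0 / (b - A @ x)                  # inverse residuals d_j = [b_j - a_j^T x]^{-1}
    grad = t * c + A.T @ d                 # t*c + F'(x),  F'(x) = sum_j d_j a_j
    hess = A.T @ ((d ** 2)[:, None] * A)   # F''(x) = A^T D^2 A,  D = Diag(d_1, ..., d_m)
    return np.linalg.solve(hess, -grad)
```

The O(mn^2) term comes from forming A^T D^2 A; in a structured or sparse problem one would exploit the structure of A instead of this dense product.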
It is immediately seen that the arithmetic cost of assembling the Newton system (i.e., the cost
of computing F 0 and F 00 ) is O(mn2 ); to solve the system after it is assembled, it takes O(n3 )
operations more3 . Since m ≥ n (recall that Rank A = n), the arithmetic complexity of a step is
dominated by the cost O(mn2 ) of assembling the Newton system. Thus, we come to
• first, when evaluating the arithmetic cost of a Newton step, we have implicitly assumed
that the matrix of the problem is dense and "unstructured"; this case never occurs in
actual large-scale computations, so that the arithmetic cost of a Newton step normally has
nothing in common with the above O(mn^2) and depends heavily on the specific structure
of the problem;
• second, and more importantly, the "long-step" versions of the methods (like the
potential reduction ones and the long-step path-following method) in practice possess much
better Newton complexity than the theoretical worst-case efficiency estimate suggests.
According to the latter estimate, the Newton complexity should be proportional at least
to the square root of the larger size m of the problem; in practice the dependence turns
out to be much better, something like O(ln m); in the real-world range of problem sizes this
means that the Newton complexity of long-step interior point methods for LP is basically
independent of the size of the problem and is something like 20-50 iterations. This is the
source of the "competitive potential" of the interior point methods versus the Simplex method.
3) Infeasible start. So far, all the schemes of interior point methods known to us have a com-
mon practical drawback: they are indeed "interior point schemes", and to start a method, one
should know in advance a strictly feasible solution to the problem. In real-world computations
this might be a rather restrictive requirement. There are several ways to avoid this drawback,
e.g., the following "big M" approach: to solve (10.1), let us extend x by an artificial design
variable t and pass from the original problem to the new one
here e = (1, ..., 1)^T. The new problem admits an evident strictly feasible solution x = 0, t = 1;
on the other hand, when M is large, the x-component of an optimal solution to the problem is
"almost feasible, almost optimal" for the initial problem (theoretically, for large enough M the
x-components of all optimal solutions to the modified problem are optimal solutions to the initial
one). Thus, we can apply our methods to the modified problem (where we have no difficulty
with an initial strictly feasible solution) and thus get a good approximate solution to the problem
of interest. Note that the same trick can be used in our forthcoming situations.
Standard reformulation: the problem from the very beginning is in the standard form.
Barrier: as we know from Lecture 9, the function
− ln(t − f (x))
is a 1-self-concordant barrier for the epigraph {t ≥ f(x)} of a convex quadratic form f(x) =
x^T B^T B x + b^T x + c. Since the Lebesgue set G_f = {x | f(x) ≤ 0} of f is the inverse image
of this epigraph under the linear mapping x ↦ (0, x), we conclude from the Substitution rule
(L) (Lecture 9) that the function −ln(−f(x)) is a 1-self-concordant barrier for G_f, provided that
f(x) < 0 at some x. Applying the Decomposition rule (Lecture 9), we see that the function
F(x) = − Σ_{j=1}^m ln(−f_j(x))  (10.8)
r(B_j) = k(B_j) + 1.
Thus, we see that G_j is exactly the inverse image of the second order cone K^2_{r(B_j)} under the
affine mapping
x ↦ π_j x + p_j:  τ = (1/2)[1 − b_j^T x − c_j],  σ = (1/2)[1 + b_j^T x + c_j],  ξ = B_j x.
It is immediately seen that the above barrier −ln(−f_j(x)) for G_j is the superposition of the
standard barrier
Ψ_j(τ, σ, ξ) = −ln(τ^2 − σ^2 − ξ^T ξ)
for the cone K^2_{r(B_j)} and the affine mapping x ↦ π_j x + p_j. Consequently, the barrier F(x) for
the feasible domain G of our quadratically constrained problem can be represented as
F(x) = Φ(πx + p),  πx + p = (τ_1; σ_1; ξ_1; ...; τ_m; σ_m; ξ_m),  (10.9)
τ_j = (1/2)[1 − b_j^T x − c_j],  σ_j = (1/2)[1 + b_j^T x + c_j],  ξ_j = B_j x,  j = 1, ..., m,
where
Φ(τ_1, σ_1, ξ_1, ..., τ_m, σ_m, ξ_m) = − Σ_{j=1}^m ln(τ_j^2 − σ_j^2 − ξ_j^T ξ_j)  (10.10)
is the direct sum of the standard self-concordant barriers for the second order cones K^2_{r(B_j)};
as we know from Proposition 5.3.2.(iii), Φ is a (2m)-logarithmically homogeneous self-concordant
barrier for the direct product K of the cones K^2_{r(B_j)}. The barrier Φ possesses the immediately
computable Legendre transformation
(as in the LP case, expressions for N and C correspond to the case of dense ”unstructured”
matrices Bj ; in the case of sparse matrices with reasonable nonzero patterns these characteristics
become better).
L⊥ + f = {s | π T s = c}
10.3. APPROXIMATION IN LP NORM 169
Path-following approach seems to be the only one which can be easily carried out (in the
potential reduction scheme there are difficulties with explicit formulae for the Legendre trans-
formation of the primal barrier).
Standard reformulation of the problem is obtained by adding m extra variables tj and rewriting
the problem in the equivalent form
minimize Σ_{j=1}^m t_j  s.t.  (t, x) ∈ G = {(t, x) ∈ R^{m+n} | |v_j − u_j^T x|^p ≤ t_j, j = 1, ..., m}.  (10.16)
Barrier: self-concordant barrier for the feasible set G of problem (10.16) was constructed in
Lecture 9 (Example 9.2.1, Substitution rule (L) and Decomposition rule):
F(t, x) = Σ_{j=1}^m F_j(t_j, x),  F_j(t_j, x) = −ln(t_j^{2/p} − (v_j − u_j^T x)^2) − 2 ln t_j,  ϑ = 4m.
Complexity of the path-following method associated with the indicated barrier is characterized
by
ϑ = 4m; N = O([m + n]n2 ); C = O(m1/2 [m + n]n2 ).
The above expression for the arithmetic complexity N needs certain clarification: our barrier
depends on m + n variables, and its Hessian is therefore an (m + n) × (m + n) matrix; how can
it be that we are able to assemble and invert this matrix at the cost of O(n^2[m + n]) operations,
rather than at the "normal" cost O([m + n]^3)?
The estimate for N is given by the following reasoning. Since the barrier is separable, its
Hessian H is the sum of the Hessians of the "partial barriers" F_j(t_j, x); the latter Hessians, as is
easily seen, can be computed at the arithmetic cost O(n^2) and are of very specific form:
the m × m block corresponding to the t-variables contains only one nonzero entry (coming from
∂^2 F_j / ∂t_j^2). It follows that H can be computed at the cost O(mn^2) and is an (m + n) × (m + n) matrix
of the form
    H = [ T    P^T ]
        [ P    Q   ],
u = T^{-1}[p − P^T v]
and substitute this expression into the remaining equations to get an n × n system for v:
[Q − P T^{-1} P^T] v = q − P T^{-1} p.
Assembling this latter system clearly costs O(mn^2) operations, solving it O(n^3) operations,
and the subsequent computation of u takes O(mn) operations, so that the total arithmetic cost
of assembling and solving the entire Newton system is indeed O([m + n]n^2).
What should be noticed here is not the particular expression for N , but the general rule which
is illustrated by this expression: the Newton systems which arise in the interior point machinery
normally possess nontrivial structure, and a reasonable solver should use this structure in order
to reduce the arithmetic cost of Newton steps.
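The elimination just described can be sketched as follows (a minimal NumPy illustration with my own function name; the diagonal block T is stored as the vector of its diagonal entries):

```python
import numpy as np

def solve_structured_newton(T, P, Q, p, q):
    """Solve [[Diag(T), P^T], [P, Q]] [u; v] = [p; q] by eliminating the
    t-component u via the Schur complement: O(m n^2 + n^3) operations
    instead of O((m + n)^3) for a dense solve."""
    Tinv = 1.0 / T                              # inverting the diagonal block is O(m)
    S = Q - P @ (Tinv[:, None] * P.T)           # n x n Schur complement Q - P T^{-1} P^T
    v = np.linalg.solve(S, q - P @ (Tinv * p))  # n x n system for the x-component v
    u = Tinv * (p - P.T @ v)                    # back-substitution for u, O(mn)
    return u, v
```

Comparing against a dense solve of the assembled (m + n) × (m + n) matrix gives the same answer while the structured path never forms or factors the full matrix.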
Here x ∈ Rn , Ij are subsets of the index set I = {1, ..., k} and all coefficients cij are positive,
j = 1, ..., m.
Note that in the standard formulation of a Geometrical Programming program the objective
and the constraints are sums, with nonnegative coefficients, of "monomials" ξ_1^{α_1} ··· ξ_n^{α_n}, ξ_i being
the design variables (which are restricted to be positive); the exponential form (10.17) is obtained
from the "monomial" one by passing from ξ_i to the new variables x_i = ln ξ_i.
Here it is again difficult to compute the Legendre transformation of the barrier associated
with the conic reformulation of the problem, so we restrict ourselves to the path-following
approach only.
Standard reformulation: to get it, we introduce k additional variables t_i, one per each of the
exponents exp{a_i^T x} involved in the problem, and rewrite (10.17) in the following equivalent
form:
minimize Σ_{i∈I_0} c_{i0} t_i  s.t.  (t, x) ∈ G,  (10.18)
with
G = {(t, x) ∈ R^k × R^n | Σ_{i∈I_j} c_{ij} t_i ≤ d_j, j = 1, ..., m; exp{a_i^T x} ≤ t_i, i = 1, ..., k}.
10.4. GEOMETRICAL PROGRAMMING 171
Barrier. The feasible domain G of the resulting standard problem is given by a number of linear
constraints and a number of exponential inequalities exp{a_i^T x} ≤ t_i. We know how to penalize
the feasible set of a linear constraint, and there is no difficulty in penalizing the feasible set of
an exponential inequality, since this set is the inverse image of the epigraph
{(τ, ξ) | τ ≥ exp{ξ}}
under an affine mapping.
Now, a 2-self-concordant barrier for the epigraph of the exponent, namely, the function
Ψ(τ, ξ) = − ln(ln τ − ξ) − ln τ
was found in Lecture 9 (Example 9.2.3). Consequently, the barrier for the feasible set G is
F(t, x) = Σ_{i=1}^k Ψ(t_i, a_i^T x) − Σ_{j=1}^m ln(d_j − Σ_{i∈I_j} c_{ij} t_i) = Φ(π(t; x) + p),
where
Φ(τ_1, ξ_1, ..., τ_k, ξ_k; τ_{k+1}, τ_{k+2}, ..., τ_{k+m}) = Σ_{i=1}^k Ψ(τ_i, ξ_i) − Σ_{j=1}^m ln τ_{k+j}
is a self-concordant barrier with parameter 2k + m, and the affine substitution π(t; x) + p is given
by
τ_i = t_i,  ξ_i = a_i^T x,  i = 1, ..., k;
τ_{k+j} = d_j − Σ_{i∈I_j} c_{ij} t_i,  j = 1, ..., m.
Structural assumption. To demonstrate that the indicated barrier satisfies the Structural as-
sumption, it suffices to point out the Legendre transformation of Φ; since this latter barrier is
the direct sum of k copies of the barrier
Ψ(τ, ξ) = − ln(ln τ − ξ) − ln τ
and m copies of the barrier
ψ(τ ) = − ln τ,
the Legendre transformation of Φ is the direct sum of the indicated number of copies of the
Legendre transformations of Ψ and ψ. The latter transformations can be computed explicitly:
Ψ*(σ, η) = (η + 1) ln((η + 1)/(−σ)) − η − ln η − 2,  Dom Ψ* = {σ < 0, η > 0},
ψ*(σ) = −ln(−σ) − 1,  Dom ψ* = {σ < 0}.
Thus, we can solve Geometrical programming problems by both the basic and the long-step
path-following methods.
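The closed-form expressions above are easy to sanity-check numerically; here is a small brute-force check of ψ* (the grid bounds are arbitrary choices of mine):

```python
import numpy as np

def legendre_of_neg_log(sigma):
    """Approximate psi*(sigma) = sup_{tau > 0} [sigma*tau + ln(tau)] for
    psi(tau) = -ln(tau) by maximizing over a fine grid; finite only for sigma < 0."""
    taus = np.linspace(1e-4, 50.0, 2_000_000)
    return np.max(sigma * taus + np.log(taus))
```

For σ < 0 the supremum is attained at τ = −1/σ, and the grid maximum agrees with the closed form −ln(−σ) − 1 to high accuracy.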
Complexity of the path-following method associated with the aforementioned barrier is given by
ϑ = 2k + m; N = O(mk 2 + k 3 + n3 ); C = O((k + m)1/2 [mk 2 + k 3 + n3 ]).
where Q is a polytope in R^n and f is a convex function. The InsEll, which can be regarded as a
multidimensional extension of the usual bisection, generates a decreasing sequence of polytopes
Qi which cover the optimal set of the problem; these localizers are defined as
f(x_i) − min_Q f ≤ exp{−κ i/n} [max_Q f − min_Q f],
κ > 0 being an absolute constant; it is also known that the indicated rate of convergence is, in a
certain rigorous sense, the best rate a convex minimization method can achieve, so that InsEll
is optimal. And to run the method, you should solve at each step an auxiliary problem of the
type (Inner) related to a polytope Q given by a list of linear inequalities defining the polytope.
As for problem (Outer), the applications known to me come from Control. Consider a
discrete time linear controlled plant given by
where x(t) ∈ Rn and u(t) ∈ Rk are the state of the plant and the control at moment t and A,
B are given n × n and n × k matrices, A being nonsingular. Assume that u(·) can take values
in a polytope U ⊂ Rk given as a convex hull of finitely many points u1 , ..., um :
U = Conv{u1 , ..., um }.
Let the initial state of the plant be known, say, be zero. The question is: what is the set XT of
possible states of the plant at a given moment T ?
This is a difficult question which, in the multi-dimensional case, normally cannot be answered
in "closed analytic form". One of the ways to get certain numerical information here is to
compute outer ellipsoidal approximations of the sets X_t, t = 0, ..., T - ellipsoids E_t which cover
the sets X_t. The advantage of this approach is that these approximations are of a once and forever
fixed "tractable" geometry, in contrast to the sets X_t, which may become more and more complicated
10.5. EXERCISES ON APPLICATIONS OF INTERIOR POINT METHODS 173
• an ellipsoid W ⊂ Rn is the image of the unit Euclidean ball under a one-to-one affine
mapping of Rn onto itself:
W = {u | (u + X^{-1} x)^T X (u + X^{-1} x) + r − x^T X^{-1} x ≤ 0},
δ(r, x, X) ≡ x^T X^{-1} x − r > 0.
The representation of W via r, x, X is not unique (proportional triples define the same ellipsoid).
Therefore we can always enforce the quantity δ to be ≤ 1, and in what follows this is our default
convention on the parameterization in question.
It is clearly seen that the volume of the ellipsoid E(r, x, X) is nothing but
Now let us look at problem (Inner). From the above discussion we see that it can be written
down as
(Inner’) minimize F (X) = − ln Det X s.t. (x, X) ∈ GI ,
with
GI = {(x, X) | X ∈ Sn+ , I(x, X) ⊂ Q};
here Sn+ is the cone of positive semidefinite matrices in the space Sn of symmetric n×n matrices.
Exercise 10.5.1 Prove that (Inner') is a convex program: its feasible domain G_I is a closed
and bounded convex set with a nonempty interior in the space R^n × S^n, and the objective is a
continuous convex function (taking values in R ∪ {+∞}) on G_I and finite on the interior of the
domain G_I.
Exercise 10.5.2 + Prove that (Outer') is a convex program: G_O is a closed convex
domain, and F is a continuous convex function on G_O taking values in R ∪ {+∞} and finite on
int G_O. Prove that the problem is equivalent to (Outer).
Thus, both (Inner) and (Outer) can be reformulated as convex programs. This does not,
however, mean that the problems are computationally tractable. Indeed, the minimal "well-posedness"
requirement on a convex problem which allows one to speak about its numerical solution
is as follows:
(!) given a candidate solution to the problem, you should be able to check whether the
solution is feasible, and if it is the case, you should be able to compute the value of the objective
at this solution^5.
Whether (!) is satisfied for problems (Inner) and (Outer) depends on what the
set Q is and how it is represented; and, as we shall see in a while, "well-posed" cases for one of
our problems can be "ill-posed" for the other. Note that "well-posedness" for (Inner) means the
possibility, given an ellipsoid W, to check whether W is contained in Q; for (Outer) you should
be able to check whether W contains Q.
Consider a couple of examples.
• Q is a polytope given ”by facets”, more exactly, by a list of linear inequalities (not all of
them should represent facets, some may be redundant).
This leads to well-posed (Inner) (indeed, to check whether W is contained in Q, i.e., in
the intersection of a given finite family of half-spaces, is the same as to check whether W
is contained in each of the half-spaces, and this is immediate). In contrast to this, in the
^5 to apply interior point methods, you need, of course, much stronger assumptions: you should be able to point
out a "computable" self-concordant barrier for the feasible set
case in question (Outer) is ill-posed: to check whether, say, a Euclidean ball W contains
a polytope given by a list of linear inequalities is, basically, the same as to maximize a
convex quadratic form (namely, |x|22 ) under linear inequality constraints, and this is an
NP-hard problem.
• Q is a polytope given ”by vertices”, i.e., represented as a convex hull of a given finite set
S.
Here (Outer) is well-posed (indeed, W contains Q if and only if it contains S, which can
be immediately verified), and (Inner) is ill-posed (it is NP-hard).
As we shall see in a while, in the case of a polytope Q our problems can be efficiently solved
by interior point machinery, provided that they are well-posed.
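For a polytope given by facets, the containment test behind the well-posedness of (Inner) is a one-line computation once the ellipsoid is written, as above, as the image of the unit ball under an affine mapping u ↦ center + Ev (a sketch under this parameterization; the function name is mine):

```python
import numpy as np

def ellipsoid_in_polytope(center, E, A, b):
    """Check whether W = {center + E v : ||v||_2 <= 1} lies in {u | A u <= b}.
    W is in the half-space {u | a_j^T u <= b_j} iff
    max over W of a_j^T u = a_j^T center + ||E^T a_j||_2 <= b_j."""
    return bool(np.all(A @ center + np.linalg.norm(A @ E, axis=1) <= b))
```

Each half-space check costs one matrix-vector product and a norm, which is exactly why (Inner) is well-posed in this representation.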
Q = {x | aTj x ≤ bj , j = 1, ..., m}
Exercise 10.5.3 Prove that in the case in question problem (Inner) can be equivalently formu-
lated as follows:
To solve (Inner Lin) by interior point machinery, we need a self-concordant barrier for the feasible
set of the problem. This set is given by a number of constraints, and in our "barrier toolbox"
we have self-concordant barriers for the feasible sets of all of these constraints, except the last
of them. This shortcoming, however, can be immediately overcome.
of the function − ln Det X. Derive from this observation that the function
F(t, x, X) = − Σ_{j=1}^m ln([b_j − a_j^T x]^2 − a_j^T X^T X a_j) − ln(t + ln Det X) − ln Det X
is a (2m + n + 1)-self-concordant barrier for the feasible domain G of problem (Inner Lin). What
are the complexity characteristics of the path-following method associated with this barrier?
176 CHAPTER 10. APPLICATIONS IN CONVEX PROGRAMMING
Exercise 10.5.5 Prove that in the case in question problem (Outer’) becomes the problem
(Outer Lin) minimize t s.t. (t, r, x, X) ∈ G,
with
G = {(t, r, x, X) |
for all (v, t) with t ≠ 0:  v^T X v + 2t x^T v + r t^2 ≤ 0  ⇒  v^T Y v + 2t y^T v + s t^2 ≤ 0.
In fact we can omit here "t ≠ 0", since for t = 0 the first inequality can be valid only when
v = 0 (recall that X is positive definite), and then the second inequality is also valid. Thus, we
come to the following conclusion:
w^T S w ≤ 0 ⇒ w^T R w ≤ 0,
where
    S = [ X    x ]      R = [ Y    y ]
        [ x^T  r ],         [ y^T  s ].
In fact we can say something additional about the quadratic forms S and R we actually are
interested in:
(*) in the case of matrices coming from ellipsoids there is a direction w with negative w^T S w,
and there is a direction w' with positive (w')^T R w'.
Now, there is an evident sufficient condition for the validity of (Impl): if
R ≤ λS with some nonnegative λ, then, of course, (Impl) is valid. It is a kind of miracle that
this sufficient condition is also necessary, provided that w^T S w < 0 for some w:
Exercise 10.5.7 * Prove that if S and R are symmetric matrices of the same size such that
the implication (Impl) is valid and S is such that w^T S w < 0 for some w, then there exists a
nonnegative λ such that
R ≤ λS;
if, in addition, (w')^T R w' > 0 for some w', then the above λ is positive.
Conclude from the above that if S and R are symmetric matrices of the same size such that
w_S^T S w_S < 0 for some w_S and w_R^T R w_R > 0 for some w_R, then the implication (Impl) is valid if and
only if
R ≤ λS
It is worth explaining why the statement given in the latter exercise is so amazing. (Impl)
says exactly that the quadratic form f_1(w) = −w^T R w is nonnegative whenever the quadratic
form f_2(w) = w^T S w is nonpositive, or, in other words, that the function
is nonnegative everywhere and therefore attains its minimum at w = 0. If the functions f_1 and f_2
were convex, we could conclude from this that a certain convex combination μf_1(w) + (1 − μ)f_2(w)
of these functions also attains its minimum at w = 0, so that −μR + (1 − μ)S is positive
semidefinite; the conclusion is exactly what is said by our statement (it says also that μ > 0,
so that the matrix inequality can be rewritten as R ≤ λS with λ = (1 − μ)μ^{-1}; this additional
information is readily given by the assumption that w^T S w < 0 and causes no surprise). Thus,
the conclusion is the same as in the situation of convex f1 and f2 ; but we did not assume the
functions to be convex! Needless to say, the ”statement” of the type
max{f1 , f2 } ≥ 0 everywhere ⇒ ∃µ ∈ [0, 1] : µf1 + (1 − µ)f2 ≥ 0 everywhere
fails to be true for arbitrary f1 and f2 , but, as we have seen, it is true for homogeneous quadratic
forms. Let me add that the implication
where S_i = R(p_i, a_i, A_i).
Exercise 10.5.8 Prove that in the case in question problem (Outer) can be equivalently formu-
lated as the following convex program:
(Outer Ell) minimize t s.t. (t, r, x, X, λ) ∈ G,
where
G = cl{(t, r, x, X, λ) |
X ∈ int S^n_+, t + ln Det X ≥ 0, δ(r, x, X) ≤ 1, R(r, x, X) ≤ λ_i S_i, i = 1, ..., m}.
Prove that the function
is an ([m + 2]n + 2)-self-concordant barrier for the feasible domain G of the problem. What are the
complexity characteristics of the path-following method associated with this barrier?
Chapter 11
Semidefinite Programming
This concluding lecture is devoted to an extremely interesting and important class of convex
programs - the so-called Semidefinite Programming.
Aj (x) ≥ 0, j = 1, ..., M,
where Aj (x) are symmetric matrices affinely depending on x (i.e., each entry of Aj (·) is an affine
function of x), and A ≥ 0 for a symmetric matrix A stands for ”A is positive semidefinite”.
Note that a system of M Linear Matrix Inequality constraints (LMI's) A_j(x) ≥ 0, j = 1, ..., M,
is equivalent to the single LMI
A(x) ≥ 0,  A(x) = Diag{A_1(x), ..., A_M(x)} =
    [ A_1(x)                         ]
    [         A_2(x)                 ]
    [                 ...            ]
    [                        A_M(x)  ]
(blank space corresponds to zero blocks). Further, an affine in x matrix-valued function A(x)
can be represented as
A(x) = A_0 + Σ_{i=1}^n x_i A_i,
i=1
A0 ,...,An being fixed matrices of the same size; thus, in a semidefinite program we should
minimize a linear form of x1 , ..., xn provided that a linear combination of given matrices Ai with
the coefficients xi plus the constant term A0 is positive semidefinite.
The indicated problem seems to be rather artificial. Let me start with indicating several
examples of important problems covered by Semidefinite Programming.
179
180 CHAPTER 11. SEMIDEFINITE PROGRAMMING
f (x) ≡ xT B T Bx + bT x + c ≤ 0,
with positive definite block Q is positive semidefinite if and only if the matrix P − R^T Q^{-1} R is
positive semidefinite^1; thus, A_f(x) is positive semidefinite if and only if −c − b^T x ≥ x^T B^T B x,
i.e., if and only if f(x) ≤ 0.
Thus, a convex quadratic constraint can be equivalently represented by an LMI; it follows
that a convex quadratically constrained quadratic problem can be represented as a problem of
optimization under LMI constraints, i.e., as a semidefinite program.
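The text leaves the explicit form of A_f(x) to the reader; one standard arrangement consistent with the Schur-complement equivalence stated above (a hypothetical reconstruction, not necessarily the author's exact matrix) is the following, which can be verified numerically on small instances:

```python
import numpy as np

def quad_constraint_lmi(B, b, c, x):
    """Build the matrix [[-c - b^T x, (Bx)^T], [Bx, I]]; by the Schur complement
    criterion it is positive semidefinite iff -c - b^T x >= x^T B^T B x,
    i.e. iff f(x) = x^T B^T B x + b^T x + c <= 0."""
    Bx = B @ x
    k = B.shape[0]
    top = np.concatenate([[-c - b @ x], Bx])       # first row of the block matrix
    bottom = np.hstack([Bx[:, None], np.eye(k)])   # [Bx, I] block
    return np.vstack([top, bottom])
```

Checking positive semidefiniteness of the assembled matrix (e.g. via its smallest eigenvalue) then reproduces the sign of f(x).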
The outlined examples are not that convincing: there are direct ways to deal with LP and
QCQP, and it hardly makes sense to reduce these problems to evidently more complicated
semidefinite programs. In the forthcoming examples LMI constraints come from the nature of
the problem in question.
As an application of the Eigenvalue problem, let us look at the computation of the Lovasz capacity
number of a graph. Consider a graph Γ with the set of vertices V and set of arcs E. One of
the fundamental characteristics of the graph is its inner stability number α(Γ) - the maximum
cardinality of an independent subset of vertices (a subset is called independent, if no two vertices
in it are linked by an arc). Computing α(Γ) is an NP-hard problem.
There is another interesting characteristic of a graph - the Shannon capacity number σ(Γ)
defined as follows. Let us interpret the vertices of Γ as letters of certain alphabet. Assume that
^1 to verify this statement, note that the minimum of the quadratic form v^T P v + 2v^T R^T u + u^T Q u with respect
to u is attained at u = −Q^{-1} R v, and the corresponding minimum value is v^T P v − v^T R^T Q^{-1} R v; A is positive
semidefinite if and only if this latter quantity is ≥ 0 for all v
11.2. SEMIDEFINITE PROGRAMMING: EXAMPLES 181
we are transmitting words comprised of these letters via an unreliable communication channel;
unreliability of the channel is described by the arcs of the graph, namely, letter i on input can
become letter j on output if and only if i and j are linked by an arc in the graph. Now, what is
the maximum number s_k of k-letter words which you can send through the channel without risk
that one word will be converted into another? When k = 1, the answer is clear - exactly
α(Γ); you can use, as these words, letters from (any) maximal independent set V* of vertices.
Now, s_k ≥ s_1^k - words comprised of letters which cannot be "mixed" also cannot be mixed.
In fact s_k can be greater than s_1^k, as is seen from simple examples. E.g., if Γ is the 5-letter
graph-pentagon, then s_1 = α(Γ) = 2, but s_2 = 5 > 4 (you can draw the 25 2-letter words in
our alphabet and find 5 of them which cannot be mixed). Similarly to the inequality s_k ≥ s_1^k,
you can prove that s_{p·q} ≥ s_p^q (consider s_p p-letter words which cannot be mixed as your new
alphabet and note that the words comprised of these q "macro-letters" also cannot be mixed).
From the relation s_{p·q} ≥ s_p^q (combined with the evident relation s_p ≤ |V|^p) it follows that there
exists
σ(Γ) = lim_{p→∞} s_p^{1/p} = sup_p s_p^{1/p};
this limit is exactly the Shannon capacity number. Since σ(Γ) ≥ s_p^{1/p} for every p, and, in
particular, for p = 1, we have
σ(Γ) ≥ α(Γ);
for the above 5-letter graph we also have σ(Γ) ≥ √s_2 = √5.
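The claim s_2 = 5 for the pentagon is easy to verify directly: the five words (i, 2i mod 5) are pairwise non-confusable (a small self-contained check; the word-confusability rule below encodes the channel model described above):

```python
import itertools

# Letters are the vertices 0..4 of the pentagon C5; letters i, j can be
# confused iff they coincide or are adjacent on the cycle.
def confusable(i, j):
    return i == j or (i - j) % 5 in (1, 4)

# Distinct words are confusable iff they are confusable letter by letter.
def words_confusable(w, v):
    return w != v and all(confusable(a, b) for a, b in zip(w, v))

# The classical Shannon code: 5 pairwise non-confusable 2-letter words,
# showing s_2 = 5 > s_1^2 = 4.
code = [(i, (2 * i) % 5) for i in range(5)]
ok = all(not words_confusable(w, v)
         for w, v in itertools.combinations(code, 2))
```

A brute-force search over single letters confirms at the same time that s_1 = α(C5) = 2.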
The Shannon capacity number is an upper bound for the inner stability number, which is
good news; the bad news is that σ(Γ) is even less computationally tractable than α(Γ). E.g., for
more than 20 years nobody knew whether the Shannon capacity of the above 5-letter graph is
equal to √5 or is greater than this quantity.
In 1979, Lovasz introduced a "computable" upper bound for σ(Γ) (and, consequently, for
α(Γ)) - the Lovasz capacity number θ(Γ), which is defined as follows: let N be the number of
vertices in the graph, and let the vertices be numbered 1, ..., N. Let us associate with each arc
γ in the graph its own variable x_γ, and let B(x) be the following symmetric matrix depending
on the collection x of these variables: B_ij(x) is 1 if either i = j or the vertices i and j are not
adjacent; if the vertices are linked by arc γ, then B_ij(x) = x_γ. For the above 5-letter graph,
e.g.,
    B(x) = [ 1     x_12  1     1     x_51 ]
           [ x_12  1     x_23  1     1    ]
           [ 1     x_23  1     x_34  1    ]
           [ 1     1     x_34  1     x_45 ]
           [ x_51  1     1     x_45  1    ].
Now, by definition the Lovasz capacity number is the minimum, over all x’s, of the maximum
eigenvalue of the matrix B(x). Lovasz has proved that his capacity number is an upper bound
for the Shannon capacity number and the inner stability number:

    α(Γ) ≤ σ(Γ) ≤ θ(Γ).
Thus, the Lovasz capacity number (which can be computed via solving a semidefinite program)
gives important information on the fundamental combinatorial characteristic of a graph. In
many cases the information is complete, as it happens in our example, where θ(Γ) = √5;
consequently, σ(Γ) = √5, since we know that for the graph in question σ(Γ) ≥ √5; and since
α(Γ) is integer, we can rewrite the Lovasz inequality as α(Γ) ≤ ⌊θ(Γ)⌋ and get for our example
the correct answer α(Γ) = 2.
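Numerically, θ(Γ) for the pentagon can be recovered with plain eigenvalue computations. Since λmax is convex and the pentagon is vertex-transitive, averaging over its symmetry group shows it suffices to search among points with all arc variables equal to a common value x, in which case B(x) = J + (x − 1)A, J being the all-ones matrix and A the adjacency matrix; a grid-search sketch:

```python
import numpy as np

# Pentagon: vertices 0..4, arcs (i, i+1 mod 5). With all arc variables equal
# to x, B(x) = J + (x - 1) * A, J the all-ones matrix, A the adjacency matrix.
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0
J = np.ones((5, 5))

def max_eig_B(x):
    return np.linalg.eigvalsh(J + (x - 1.0) * A).max()

# Minimize the largest eigenvalue over a fine grid; the minimum is theta(C5).
xs = np.linspace(-1.0, 0.0, 20001)
theta = min(max_eig_B(x) for x in xs)
```

The minimum is attained near x ≈ −0.382 and equals √5 up to grid resolution.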
182 CHAPTER 11. SEMIDEFINITE PROGRAMMING
One of the standard ways to solve the problem is to use the branch-and-bound scheme, and for
this scheme it is crucial to generate lower bounds for the optimal values of the subproblems arising
in the course of running the method. These subproblems are of the same structure as the initial
problem, so we may think of how to bound from below the optimal value of the problem itself. The
traditional way here is to pass from the Boolean problem to its Linear Programming relaxation
by replacing the Boolean restrictions uj ∈ {0; 1} with the linear inequalities 0 ≤ uj ≤ 1. Some years
ago Shor suggested using a nonlinear relaxation, which is as follows. We can rewrite the Boolean
constraints equivalently as quadratic equalities
uj (1 − uj ) = 0, j = 1, ..., k;
further, we can add to our initial linear equations their quadratic implications

    [q_i − Σ_{j=1}^k p_{ij} u_j] [q_{i'} − Σ_{j=1}^k p_{i'j} u_j] = 0,   i, i' = 1, ..., n.
Thus, we can equivalently rewrite our problem as a problem of continuous optimization with
linear objective and quadratic equality constraints
where all Ki are quadratic forms. Let us form the Lagrange function

    L(u, x) = d^T u + Σ_{i=1}^N x_i K_i(u) = u^T A(x) u + 2 b^T(x) u + c(x),
where A(x), b(x), c(x) clearly are affine functions of the vector x of Lagrange multipliers. Now
let us pass to the ”dual” problem
If our primal problem (11.1) were convex, the optimal value c∗ of the dual would, under mild
regularity assumptions, be the same as the optimal value of the primal problem; our situation has
nothing in common with convexity, so we should not hope that c∗ equals the optimal value of
(11.1); nevertheless, independently of any convexity assumptions, c∗ is a lower bound for the primal
optimal value²; this is the bound suggested by Shor.
Let us look at how to compute Shor's bound. We have

so that f(x) is the largest real f for which the quadratic form of u

is nonnegative for all u; substituting u = t^{−1}v, we see that the latter quadratic form of u is
nonnegative for all u if and only if the homogeneous quadratic form of (v, t)

is nonnegative whenever t ≠ 0. By continuity reasons the resulting form is nonnegative for all
(v, t) with t ≠ 0 if and only if it is nonnegative for all (v, t), i.e., if and only if the matrix
    A(f, x) = ( c(x) − f   b^T(x) )
              ( b(x)       A(x)   )
is positive semidefinite. Thus, f (x) is the largest f for which the matrix A(f, x) is positive
semidefinite; consequently, the quantity supx f (x) we are interested in is nothing but the optimal
value in the following semidefinite program:
It can be easily seen that the lower bound c∗ given by Shor's relaxation is never worse than
the one given by the usual LP relaxation. Normally the "semidefinite" bound is better, as is the
case, e.g., in the following toy problem
x1, x2, x3, x4 ∈ {0, 1}
with optimal value 68 (x∗1 = x∗3 = 1, x∗2 = x∗4 = 0); here Shor’s bound is 43, and the LP-based
bound is 40.
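As a sketch of how Shor's bound is evaluated for fixed multipliers x: the quantity f(x) - the largest f with A(f, x) positive semidefinite - can be located by bisection, since the minimal eigenvalue of A(f, x) is nonincreasing in f. The data A_, b_, c_ below are hypothetical stand-ins for the values A(x), b(x), c(x) at some fixed x:

```python
import numpy as np

# f(x) is the largest f for which
#     A(f, x) = [[c(x) - f, b(x)^T], [b(x), A(x)]]
# is positive semidefinite; the minimal eigenvalue of A(f, x) is nonincreasing
# in f, so f(x) can be located by bisection. A_, b_, c_ are hypothetical
# stand-ins for A(x), b(x), c(x) at a fixed x.
def f_of_x(A_, b_, c_, lo=-1e6, hi=1e6):
    def psd(f):
        M = np.block([[np.array([[c_ - f]]), b_.reshape(1, -1)],
                      [b_.reshape(-1, 1), A_]])
        return np.linalg.eigvalsh(M).min() >= -1e-12
    for _ in range(200):          # invariant: psd(lo) holds, psd(hi) fails
        mid = 0.5 * (lo + hi)
        if psd(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For instance, with A(x) = I and b(x) = b, the Schur complement gives f(x) = c(x) − |b|², which the bisection reproduces.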
where
Q(x) = Conv{Q1 x, ..., QM x},
Qi being k × k matrices. Thus, every vector x ∈ Rk is associated with the polytope Q(x),
and the trajectories of the inclusion are differentiable functions x(t) whose derivatives
x′(t) belong, for every t, to the polytope Q(x(t)). When M = 1, we come to the usual linear
time-invariant system

    x′(t) = Q1 x(t).
³ this example was the subject of exercises to Lecture 7, see Section 7.6.1
The general case M > 1 makes it possible to model time-varying systems with uncertainty; indeed, a
trajectory of the inclusion is a solution to the time-varying equation

and the trajectory of any time-varying equation of this type clearly is a trajectory of the inclusion.
One of the most fundamental questions about a dynamic system is its stability: what happens
with the trajectories as t → ∞ - do they tend to 0 (this is stability), do they remain bounded,
or do some of them go to infinity? A natural way to prove stability is to point out a quadratic
Lyapunov function f(x) = x^T Lx, L being a positive definite symmetric matrix, which "proves the
decay rate α of the system", i.e., satisfies, for some α, the inequality
    (d/dt) f(x(t)) ≤ −α f(x(t))
along all trajectories x(t) of the inclusion. From this differential inequality it immediately follows
that
f (x(t)) ≤ f (x(0)) exp{−αt};
if α > 0, this proves stability (the trajectories approach the origin at a known exponential rate);
if α = 0, the trajectories remain bounded; if α < 0, we do not know whether the system is
stable, but we have a certain upper bound on the rate at which the trajectories may go to infinity.
It is worth noting that in the case of a linear time-invariant system the existence of a quadratic
Lyapunov function which "proves a positive decay rate" is a necessary and sufficient stability
condition (this is stated by the famous Lyapunov Theorem); in the general case M > 1 this
condition is only sufficient, and is no longer necessary.
Now, where could we take a quadratic Lyapunov function which proves stability? The
derivative of the function x^T(t) L x(t) in t is 2 x^T(t) L x′(t); if L proves the decay rate α, this
quantity should be ≤ −α x^T(t) L x(t) for all trajectories x(·). Now, x(t) can be an arbitrary
point of Rk, and for given x = x(t) the vector x′(t) can be an arbitrary vector from Q(x).
Thus, L "proves the decay rate α" if and only if it is symmetric positive definite (this is our a priori
restriction on the Lyapunov function) and is such that
2xT Ly ≤ −αxT Lx
for all x and for all y ∈ Q(x); since the required inequality is linear in y, it is valid for all
y ∈ Q(x) if and only if it is valid for y = Qi x, i = 1, ..., M (recall that Q(x) is the convex hull
of the points Qi x). Thus, positive definite symmetric L proves the decay rate α if and only if
for all x, i.e., if and only if L satisfies the system of Linear Matrix Inequalities
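The resulting test is immediate to implement: a candidate L proves the decay rate α exactly when L is positive definite and Qi^T L + L Qi + αL is negative semidefinite for every i. A minimal numerical sketch (the matrix Q used in the check is a toy illustration, not taken from the lecture):

```python
import numpy as np

# L "proves the decay rate alpha" for x'(t) in Conv{Q_1 x, ..., Q_M x} iff
# L is symmetric positive definite and
#     Q_i^T L + L Q_i + alpha L  is negative semidefinite, i = 1, ..., M
# (the matrix form of 2 x^T L Q_i x <= -alpha x^T L x).
def proves_decay(L, Qs, alpha, tol=1e-9):
    if np.linalg.eigvalsh(L).min() <= 0:       # L must be positive definite
        return False
    return all(np.linalg.eigvalsh(Q.T @ L + L @ Q + alpha * L).max() <= tol
               for Q in Qs)

# Toy illustration: the skew part of Q cancels in Q^T + Q, so L = I
# certifies the decay rate alpha = 2 for this Q.
Q = np.array([[-1.0, 0.5], [-0.5, -1.0]])
ok = proves_decay(np.eye(2), [Q], 2.0)
```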
where
Q(x, u) = Conv{Q1 x + B1 u, ..., QM x + BM u}
with k × k matrices Qi and k × l matrices Bi. Here x ∈ Rk denotes the state of the system and
u ∈ Rl denotes the control. Our goal is to "close" the system by a linear time-invariant feedback
u(t) = Kx(t),
K being an l × k feedback matrix, in a way which ensures stability of the closed-loop system
Here again we can try to achieve our goal via a quadratic Lyapunov function x^T Lx. Namely,
if, for some given α > 0, we are able to find simultaneously an l × k matrix K and a positive
definite symmetric k × k matrix L in such a way that
    (d/dt) (x^T(t) L x(t)) ≤ −α x^T(t) L x(t)    (11.7)
for all trajectories of (11.6), then we will get both the stabilizing feedback and a certificate that
it indeed stabilizes the system.
Same as above, (11.7) and the initial requirement that L should be positive definite result
in the system of matrix inequalities
the unknowns in the system being both L and K. The system is not linear in (L, K); nevertheless,
the LMI-based approach still works. Namely, let us perform the nonlinear substitution L = R^{−1},
K = P R^{−1}, which turns the system into

    Qi^T R^{−1} + R^{−1} Qi + R^{−1} P^T Bi^T R^{−1} + R^{−1} Bi P R^{−1} ≤ −α R^{−1},  i = 1, ..., M;  R > 0,
or, which is the same (multiply by R from the left and from the right),

    R Qi^T + Qi R + P^T Bi^T + Bi P ≤ −α R,  i = 1, ..., M;  R > 0,

which is a system of LMI's in the variables R, P, or, which is the same, a semidefinite program with
trivial objective.
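The congruence step - multiplying the inequality by R from the left and from the right - is easy to confirm numerically on random data; the sketch below checks that the left-hand side indeed collapses to R Q^T + Q R + P^T B^T + B P, which is affine in (R, P):

```python
import numpy as np

# Check of the congruence step: multiplying
#   Q^T R^{-1} + R^{-1} Q + R^{-1} P^T B^T R^{-1} + R^{-1} B P R^{-1}
# by R from the left and from the right yields
#   R Q^T + Q R + P^T B^T + B P,
# verified here on random data.
rng = np.random.default_rng(0)
k, l = 4, 2
Q = rng.standard_normal((k, k))
B = rng.standard_normal((k, l))
P = rng.standard_normal((l, k))
M = rng.standard_normal((k, k))
R = M @ M.T + k * np.eye(k)            # a random symmetric positive definite R
Ri = np.linalg.inv(R)

lhs = R @ (Q.T @ Ri + Ri @ Q + Ri @ P.T @ B.T @ Ri + Ri @ B @ P @ Ri) @ R
rhs = R @ Q.T + Q @ R + P.T @ B.T + B @ P
err = np.abs(lhs - rhs).max()
```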
There are many other examples of semidefinite problems arising in Control (and in other
areas like Structural Design), but I believe that the already indicated examples demonstrate
that Semidefinite Programming possesses a wide variety of important applications.
It is reasonable to assume that A(·) possesses a certain structure, namely, that it is a block-
diagonal matrix with a certain number M of diagonal blocks, the blocks being of row sizes
m1, ..., mM. Indeed, normally A(·) represents a system of LMI's rather than a single LMI; and
when assembling the system of LMI's
Ai (x) ≥ 0, i = 1, ..., M
    x ↦ A(x) : Rn → Sµ,
The Structural assumption is satisfied simply by the origin of the barrier F: it comes from the
m-logarithmically homogeneous self-concordant barrier Φ for Sµ+, and the latter barrier possesses
the explicit Legendre transformation
Φ∗ (S) = Φ(−S) − m.
Complexity. The only complexity characteristic which needs special investigation is the arith-
metic cost N of a Newton step. Let us look at what this step is, computationally. First of all, a
straightforward computation results in the following expressions for the derivatives of the barrier
Φ:
DΦ(X)[H] = − Tr{X −1 H}; D2 Φ(X)[H, H] = Tr{X −1 HX −1 H}.
Therefore the derivatives of the barrier F (x) = Φ(A(x)) are given by the relations
    ∂F(x)/∂xi = − Tr{A^{−1}(x) Ai}

(recall that A(x) = A0 + Σ_{i=1}^n xi Ai), and

    ∂²F(x)/∂xi ∂xj = Tr{A^{−1}(x) Ai A^{−1}(x) Aj}.
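These expressions are easy to sanity-check by finite differences; in the sketch below the data A0, Ai are random symmetric matrices (hypothetical illustration; A0 = 10 I keeps A(x) positive definite at the test point):

```python
import numpy as np

# Finite-difference check of
#   dF/dx_i       = -Tr{A(x)^{-1} A_i},
#   d2F/dx_i dx_j =  Tr{A(x)^{-1} A_i A(x)^{-1} A_j}
# for F(x) = -ln Det A(x), A(x) = A0 + sum_i x_i A_i (random hypothetical data).
rng = np.random.default_rng(1)
m, n = 5, 3
sym = lambda M: 0.5 * (M + M.T)
A0 = 10.0 * np.eye(m)
As = [sym(rng.standard_normal((m, m))) for _ in range(n)]
x = 0.1 * rng.standard_normal(n)

A = lambda x: A0 + sum(xi * Ai for xi, Ai in zip(x, As))
F = lambda x: -np.log(np.linalg.det(A(x)))

Ainv = np.linalg.inv(A(x))
grad = np.array([-np.trace(Ainv @ Ai) for Ai in As])
hess = np.array([[np.trace(Ainv @ Ai @ Ainv @ Aj) for Aj in As] for Ai in As])

h, E = 1e-4, np.eye(n)
grad_fd = np.array([(F(x + h * e) - F(x - h * e)) / (2 * h) for e in E])
hess_fd = np.array([[(F(x + h*E[i] + h*E[j]) - F(x + h*E[i] - h*E[j])
                      - F(x - h*E[i] + h*E[j]) + F(x - h*E[i] - h*E[j])) / (4*h*h)
                     for j in range(n)] for i in range(n)])
```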
we should perform computations as follows (the expressions in brackets {·} represent the arith-
metic cost of the computation; for the sake of clarity, I omit absolute constant factors):
• given x, compute X = A(x)   {n Σ_{i=1}^M mi²: you should multiply the n block-diagonal matrices
Ai by the xi's and take the sum of these matrices and the matrix A0};

• given X, compute X^{−1}   {Σ_{i=1}^M mi³; recall that X is block-diagonal};

• given X^{−1}, compute the n components − Tr{X^{−1}Ai} of the vector F′(x)   {n Σ_{i=1}^M mi²};

• given X^{−1}, compute the n matrices Âi = X^{−1}Ai X^{−1}   {n Σ_{i=1}^M mi³} and then compute the
n(n + 1)/2 quantities F″(x)ij = Tr{Âi Aj}, 1 ≤ i ≤ j ≤ n   {n² Σ_{i=1}^M mi²}.
It takes O(n³) operations more to solve the Newton system after it is assembled. Note that we
may assume that A(·) is an embedding - otherwise the feasible set G of the problem contains
lines, and the problem is unstable: a small perturbation of the objective makes the problem
unbounded below. Assuming from now on that A(·) is an embedding (as a byproduct, this assumption
ensures nonsingularity of F″(·)), we see that n ≤ Σ_{i=1}^M mi(mi + 1)/2 - simply because the latter
quantity is the dimension of the space where the mapping A(·) takes its values. Thus, here,
as in the (dense) Linear Programming case, the cost of assembling the Newton system (which
is at least O(n² Σ_{i=1}^M mi²)) dominates the cost O(n³) of solving the system, and we come to
N = O(Nass). Thus, the complexity characteristics of the path-following method for solving
semidefinite programs are

    ϑ = m = Σ_{i=1}^M mi;   N = O(n² Σ_{i=1}^M mi² + n Σ_{i=1}^M mi³);   C = N √m.    (11.10)
Potential reduction approach also is immediate: the conic reformulation of the problem is given
by

    minimize Tr{f y}  s.t.  y = A(x) ∈ Sµ+,    (11.11)

where f ∈ Sµ "represents the objective c^T x in terms of y = Σ_{i=1}^n xi Ai", i.e., is such that
Logarithmically homogeneous self-concordant barrier: we already know that Sµ+ admits explicit
m-logarithmically homogeneous self-concordant barrier Φ(X) = − ln Det X with explicit Legen-
dre transformation Φ∗ (S) = Φ(−S) − m; thus, we have no conceptual difficulties with applying
the methods of Karmarkar or the primal-dual method.
Complexity: it is easily seen that the complexity characteristics of the primal-dual method
associated with the indicated barrier are given by (11.10); the characteristic C for the method of
Karmarkar is O(√m) times worse than the one given by (11.10).

Comments. One should take into account that in the case of Semidefinite Programming, same
as in the Linear Programming case, the complexity characteristics (11.10) give a very poor
impression of the actual performance of the algorithms. The first source of this phenomenon is
that "real-world" semidefinite programs normally possess additional structure which was ignored
in our evaluation of the arithmetic cost of a Newton step; e.g., for the Lyapunov Stability
problem (11.4) we have mi = k, i = 1, ..., M, k being the dimension of the state space of the
system, and n = O(k²) (the # of design variables equals the # of free entries in a k × k symmetric
matrix L). Our general considerations result in
    N = O(k^6 M)

(see (11.10)) and in the qualitative conclusion that the cost of a step is dominated by the cost
of assembling the Newton system. It turns out, however, that the structure of our LMI's allows
one to reduce Nass to O(k^4 M), which results in N = O(k^6 + k^4 M); in particular, if M ≪ k², then
the cost of assembling the Newton system is negligible as compared to the cost of solving the
system.
Further, numerical experiments demonstrate that the Newton complexity of finding an ε-
solution of a semidefinite program by a long-step path-following or a potential reduction interior
point method normally is significantly less than its theoretical O(√m) upper bound; in practice
the # of Newton steps looks like a moderate constant (something like 30-60). Thus, Semidefinite
Programming is, basically, as computationally tractable as Linear Programming.
11.4. EXERCISES ON SEMIDEFINITE PROGRAMMING 189
G = {x | ∃u : AG (x, u) ∈ Sk+ }.
G = {x ∈ R5 | x1, x2, x3, x4 ≥ 0, x5 ≤ [x1 x2 x3 x4]^{1/4}}
It is clear that a given x can be extended, by a certain u, to a collection satisfying the indicated
inequalities if and only if x1, ..., x4 are nonnegative and x5 ≤ [x1 x2 x3 x4]^{1/4}, i.e., if and only if
x ∈ G.
The relation of the introduced notion to Semidefinite Programming is clear from the following
Exercise 11.4.1 # Let G be an SDR domain with semidefinite representation AG. Prove that
the convex program
minimize cT x s.t. x ∈ G
is equivalent to the semidefinite program
Exercise 11.4.2 # 1) Let G+ ⊂ Rn be SDR, and let x = B(y) be an affine mapping from
Rl into Rn with image intersecting int G+. Prove that G = B^{−1}(G+) is SDR, and that a
semidefinite representation of G+ induces, in an explicit manner, a semidefinite representation
of G.
2) Let G = ∩_{i=1}^m Gi be a closed convex domain in Rn, and let all Gi be SDR. Prove that
G also is SDR, and that semidefinite representations of Gi induce, in an explicit manner, a
semidefinite representation of G.
3) Let Gi ⊂ Rni be SDR, i = 1, ..., m. Prove that the direct product G = G1 ×G2 ×...×Gm is
SDR, and that semidefinite representations of Gi induce, in an explicit manner, a semidefinite
representation of G.
The above exercises demonstrate that the possibilities to pose convex problems as semidefinite
programs are limited only by our ability to find semidefinite representations for the constraints
involved in the problem. The family of convex sets which admit explicit semidefinite repre-
sentations is surprisingly wide. Lecture 11 already gives us a number of examples, which are
summarized in the following
Exercise 11.4.3 # Verify that the below sets are SDR and point out their explicit semidefinite
representations:
• a half-space

• the Lebesgue set {x | f(x) ≤ 0} of a convex quadratic form f such that f(x) < 0 for some x

• the second order cone K² = {(t, x) ∈ R × Rn | t ≥ |x|2}

• the epigraph {(t, X) ∈ R × Sk | t ≥ λmax(X)} of the maximal eigenvalue of a symmetric
k × k matrix X
Now some more examples.
Exercise 11.4.4 Prove that
A(t, x) = Diag{t − x1 , t − x2 , ..., t − xn }
is SDR for the epigraph
{(t, x) ∈ R × Rn | t ≥ xi , i = 1, ..., n}
of the function max{x1 , ..., xn }.
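A one-line numerical illustration of the exercise: A(t, x) is positive semidefinite exactly when t ≥ xi for all i, i.e., exactly when (t, x) lies in the epigraph of max{x1, ..., xn}.

```python
import numpy as np

# A(t, x) = Diag{t - x_1, ..., t - x_n} is PSD iff t - x_i >= 0 for all i,
# i.e., iff t >= max{x_1, ..., x_n}.
def in_epigraph_via_sdr(t, x):
    return np.linalg.eigvalsh(np.diag(t - np.asarray(x))).min() >= 0

ok_on = in_epigraph_via_sdr(3.0, [1.0, 2.0, 3.0])    # t = max x_i: on the boundary
ok_off = in_epigraph_via_sdr(2.5, [1.0, 2.0, 3.0])   # t < max x_i: outside
```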
Exercise 11.4.5 Prove that

    A(t, x) = ( t   x^T )
              ( x   X   )

is SDR for the epigraph

    cl{(t, x, X) ∈ R × Rn × (int Sn+) | t ≥ x^T X^{−1} x}

of the fractional-quadratic function x^T X^{−1} x of a vector x and a symmetric positive semidefinite matrix
X.
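The statement is the Schur complement lemma in disguise: for positive definite X, the block matrix is positive semidefinite iff t ≥ x^T X^{−1} x. A quick numerical check on random data:

```python
import numpy as np

# Schur complement check: for X positive definite,
#     [[t, x^T], [x, X]]  is PSD  iff  t >= x^T X^{-1} x.
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + np.eye(n)                    # random positive definite X
x = rng.standard_normal(n)
q = x @ np.linalg.solve(X, x)              # x^T X^{-1} x

def block_psd(t):
    B = np.block([[np.array([[t]]), x.reshape(1, -1)],
                  [x.reshape(-1, 1), X]])
    return np.linalg.eigvalsh(B).min() >= -1e-10

above, below = block_psd(q + 1e-6), block_psd(q - 1e-3)
```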
Exercise 11.4.6 The above example gives an SDR of the hypograph of the geometric mean
[x1 x2 x3 x4]^{1/4} of four nonnegative variables. Find an SDR for the hypograph of the geometric mean
of 2^l nonnegative variables.
Exercise 11.4.7 Find a semidefinite representation of the epigraph

    {(t, x) ∈ R² | t ≥ (x+)^p},  x+ = max[0, x],

of the power function for
1) p = 1; 2) p = 2; 3) p = 3; 4) arbitrary integer p > 0.
X, X′ ∈ Sk, X ≤ X′ ⇒ λi(X) ≤ λi(X′), i = 1, ..., k.
are convex.
{(t, X) ∈ R × Sk | t ≥ Sm (X)};
in particular, Sm(X) is a convex (since its epigraph is SDR and is therefore convex) monotone
function.
For an arbitrary k × k matrix X let σi(X) be the singular values of X, i.e., the square roots of
the eigenvalues of the matrix X^T X. In what follows we always use the descending order of singular
values:

    σ1(X) ≥ σ2(X) ≥ ... ≥ σk(X).

Let also

    Σm(X) = Σ_{i=1}^m σi(X).
The importance of singular values is seen from the following fundamental Singular Value De-
composition Theorem (which for non-symmetric matrices plays basically the same role as the
theorem that a symmetric matrix is orthogonally equivalent to a diagonal matrix):

If X is a k × k matrix with singular values σ1, ..., σk, then there exists a pair of orthonormal bases
{ei} and {fi} such that

    X = Σ_{i=1}^k σi ei fi^T
⁴ I strongly recommend that those who do not know this characterization pay attention to it; a good (and not
difficult) exercise is to prove the characterization.
(geometrically: the mapping x ↦ Xx takes the coordinates of x in the basis {fi}, multiplies
them by the singular values and makes the result the coordinates of Xx in the basis {ei}).

In particular, the spectral norm of X (the quantity max_{|x|2 ≤ 1} |Xx|2) is nothing but the
largest singular value σ1 of X.

In the symmetric case we, of course, have ei = ±fi (plus corresponds to eigenvectors fi of X
with positive eigenvalues, minus to those with negative eigenvalues).
What we are about to do is to prove that the functions Σm (X) are convex, and to find their
SDR’s. To this end we make the following important observation:
let A and B be two k × k matrices. Then the sequences of eigenvalues (counted with their
multiplicities) of the matrices AB and BA are equal (more exactly, become equal under appro-
priate reordering). The proof is immediate: we should prove that the characteristic polynomials
Det(λI − AB) and Det(λI − BA) are equal to each other. By continuity reasons, it suffices
to establish this identity when A is nondegenerate. But then it is evident:

    Det(λI − AB) = Det(A(λI − BA)A^{−1}) = (Det A) Det(λI − BA) (Det(A^{−1})) = Det(λI − BA).
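A quick numerical illustration of this observation on a random (non-symmetric) pair:

```python
import numpy as np

# AB and BA have the same eigenvalues (with multiplicities), even for a
# random, generally non-symmetric pair A, B.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))

eig_AB = np.sort_complex(np.linalg.eigvals(A @ B))
eig_BA = np.sort_complex(np.linalg.eigvals(B @ A))
gap = np.abs(eig_AB - eig_BA).max()
```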
Now we are sufficiently equipped to construct SDR's for sums of singular values.
Prove that the eigenvalues of this matrix are as follows: the first k of them are σ1(X), σ2(X), ...,
σk(X), and the remaining k are −σk(X), −σk−1(X), ..., −σ1(X). Derive from this observation
that

    Σm(X) = Sm(Y(X))

and use the SDR's for Sm(·) given by Exercise 11.4.8 to get SDR's for Σm(X).
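Taking Y(X) to be the symmetric block matrix [[0, X], [X^T, 0]] (an assumption: this is the standard choice for this construction), the claim is easy to confirm numerically:

```python
import numpy as np

# Y(X) = [[0, X], [X^T, 0]] (assumed form) has eigenvalues
# sigma_1(X), ..., sigma_k(X), -sigma_k(X), ..., -sigma_1(X);
# hence S_m(Y(X)) = Sigma_m(X). Checked on a random X.
rng = np.random.default_rng(4)
k = 5
X = rng.standard_normal((k, k))

Y = np.block([[np.zeros((k, k)), X], [X.T, np.zeros((k, k))]])
eigs = np.sort(np.linalg.eigvalsh(Y))[::-1]          # descending
sigmas = np.linalg.svd(X, compute_uv=False)          # descending by convention

m = 3
Sigma_m = sigmas[:m].sum()                           # sum of m largest singular values
S_m_of_Y = eigs[:m].sum()                            # sum of m largest eigenvalues of Y
```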
The results stated in the exercises from this subsection play the central role in constructing
semidefinite representations for the epigraphs of functions of eigenvalues/singular values of sym-
metric/arbitrary matrices.
Hints to Exercises
    ‖u‖∗ ≡ sup { u^T v : v ∈ Rk, ‖v‖ ≤ 1 }.
is a self-concordant barrier for G (Exercise 3.3.1). Since G is bounded, F attains its minimum
on int G at a certain point x∗ (V., Lecture 3). Choosing appropriate coordinates in Rn, we may
assume that F″(x∗) is the unit matrix. Now let j∗ be the index of that one of the matrices
Fj″(x∗) which has the minimal trace; eliminate the j∗th of the inequalities and look at the Newton
decrement of the self-concordant function Σ_{j≠j∗} Fj(x) at x∗.
2): we clearly can eliminate from the list of the sets Gα all elements which coincide with
the whole space, without violating boundedness of the intersection. Now, every closed convex
set which differs from the whole space is intersection of closed half-spaces, and these half-spaces
can be chosen in such a way that their interiors have the same intersection as the half-spaces
themselves. Representing all Gα as intersections of the above type, we see that the statement
in question clearly can be reduced to a similar statement with all Gα being closed half-spaces
such that the intersection of the interiors of these half-spaces is the same as the intersection of
the half-spaces themselves. Prove that if ∩α∈I Gα is bounded and nonempty, then there exists a
finite I 0 ⊂ I such that ∩α∈I 0 Gα also is bounded (and, of course, nonempty); after this is proved,
apply 1).
Exercise 3.3.5: this is an immediate consequence of II., Lecture 3.
Exercise 3.3.6: without loss of generality we may assume that ∆ = (a, 0) with some a < 0.
Choose an arbitrary x ∈ ∆ and look what are the conclusions of II., III., Lecture 3, when
y → −0.
To complete the proof of (P), note that if G differs from Rn, then the intersection of G with
a certain line is a segment ∆ with a nonempty interior which is a proper part of the line, and
choose as f the restriction of F onto ∆ (this restriction is a ϑ-self-concordant barrier for ∆ in
view of Proposition 3.1.1.(i)).
Exercise 3.3.7: note that the standard basis orths ei are recessive directions of G (see
Corollary 3.2.1) and therefore, according to the Corollary,
194 HINTS TO EXERCISES
To prove (3.17), combine (12.13) and the fact that D²F(x)[ei, ei] ≥ xi^{−2}, 1 ≤ i ≤ m (since
x − xi ei ∉ int G, while the open unit Dikin ellipsoid of F centered at x is contained in int G (I.,
Lecture 2)).
To derive from (3.17) the lower bound ϑ ≥ m, note that, in view of II., Lecture 3, it should
be
ϑ ≥ DF (x)[0 − x],
while (3.17) says that the latter quantity is at least m.
Exercise 3.3.9: as it was already explained, we can reduce the situation to the case of
associated with certain r0 > 0, belongs to G; here ei are the standard basis orths. Now,
let ∆i(r) be the set of those t for which the vector x(r) − (t + r)ei belongs to G. Prove
that ∆i(r) is of the type [−ai(r), bi(r)], contains r in its interior, and that bi(r)/r → 0,
ai(r)/r → ∞ as r → +0. Derive from these observations and the statement of Exercise 3.3.8
that −DF(x(r))[ei] r ≥ 1 − α(r), i = 1, ..., m, with certain α(r) → 0 as r → +0. To complete the
proof of (Q), apply the Semiboundedness inequality I., Lecture 3, to x = x(r) and y = 0.
Hints to Section 7.6
Exercise 7.6.7: (Pr′) could be used, but not when we intend to solve it by the primal-dual
method. Indeed, it is immediately seen that if (7.44) is solvable, i.e., in the case we actually are
interested in, the objective in (Pr′) is unbounded below, so that the problem dual to (Pr′) is
infeasible (why?). Thus, we simply would be unable to start the method!
Hints to Section 8.5
Exercise 8.5.1: we could, of course, assume that the Legendre transformation F∗ of F is
known; but it would be less restrictive to assume instead that the solution to the problem
is given in advance. Indeed, knowledge of F∗ means, in particular, the ability to solve "in one
step" any equation of the type F′(x) = d (the solution is given by x = (F∗)′(d)); thus, setting
x = (F∗)′(−10^20 c), we could get - in one step - the point of the path x∗(·) associated with
t = 10^20.
Exercise 8.5.3: to get (8.34), prove by induction that
is 2/3-appropriate for the domain G+ = R+ and apply the Superposition rule (N) from Lecture 9.
Exercise 10.5.7+: for a vector v let the set Lv on the axis be defined as

    Lv = {λ ≥ 0 | v^T R v ≤ λ v^T S v}.
This is a closed convex set, and the premise of the statement we are proving says that Lv
is nonempty for every v; the statement we should prove is that all these sets have a point
in common. Of course, the proof should use the Helly Theorem; according to this theorem, all
we need to prove is that
(a) Lv ∩ Lv′ ≠ ∅ for any pair v, v′;
(b) Lv is bounded for some v.
Solutions to Exercises
    A[Σ_{i=1}^k ri hi, Σ_{i=1}^k ri hi, ..., Σ_{i=1}^k ri hi] = Σ_{α∈A} ωα(r) Aα[h1, ..., hk]    (13.14)
(open the parentheses and take into account the symmetry of A), with ωα(r) being certain polynomials
of r.
What we are asked to do is to find a certain number m of vectors r1, r2, ..., rm and certain
weights w1, ..., wm in such a way that when substituting r = rl into (13.14) and taking the sum of
the resulting identities with the weights w1, ..., wm, we get on the right hand side the only term
A[h1, ..., hk] ≡ A(1,...,1)[h1, ..., hk], with unit coefficient; then the resulting identity will be the
required representation of A[h1, ..., hk] as a linear combination of the restrictions of A[·] onto the
diagonal.
Our reformulated problem is to choose m vectors from the family
    F = { ω̂(r) = (ωα(r) | α ∈ A) }_{r∈Rk}
of Sk-dimensional vectors in such a way that a certain given Sk-dimensional vector (unit at a certain
specified place, zeros at the remaining places) will be a linear combination of the selected vectors.
This for sure is possible, with m = Sk, if the linear span of the vectors from F is the entire space
RSk of Sk-dimensional vectors; and we are about to prove that this is actually the case (this
will complete the proof). Assume, on the contrary, that the linear span of F is a proper subspace
of RSk. Then there exists a nonzero linear functional on the space which vanishes on F, i.e.,
there exists a set of coefficients λα, not all zero, such that

    p(r) ≡ Σ_{α∈A} λα ωα(r) = 0
    ωα(r) = [k!/(α1! α2! ... αk!)] r1^{α1} r2^{α2} ... rk^{αk}.
198 SOLUTIONS TO EXERCISES
It follows that the partial derivative ∂^k p(·)/∂r1^{α1} ∂r2^{α2} ... ∂rk^{αk} is identically equal to λα; if p ≡ 0,
then all these derivatives, and, consequently, all λα's, are zero, which is the desired contradiction.
Exercise 2.3.5: first of all, e1 and e2 are linearly independent, since T1 ≠ T2; therefore h ≠
0, q ≠ 0. Let (Qx, y) = A[x, y, e3, ..., el]; then Q is a symmetric matrix.
Since {T1 , ..., Tl } is an extremal, we have
    ω = |(Qe1, e2)| ≤ |ω(e1+, e2+) − ω(e1−, e2−)| + ω⁰ ‖e1⁰‖ ‖e2⁰‖ ≤
      ≤ ω(‖e1+‖ ‖e2+‖ + ‖e1−‖ ‖e2−‖) + ω⁰ ‖e1⁰‖ ‖e2⁰‖ ≤
      ≤ ω {‖e1+‖² + ‖e1−‖²}^{1/2} {‖e2+‖² + ‖e2−‖²}^{1/2} + ω⁰ ‖e1⁰‖ ‖e2⁰‖ ≤ ω

(we have taken into account that ‖ei+‖² + ‖ei−‖² + ‖ei⁰‖² = 1, i = 1, 2). We see that all the
inequalities in the above chain are equalities.
moreover, |(e1+, e2+)| = ‖e1+‖ ‖e2+‖ and |(e1−, e2−)| = ‖e1−‖ ‖e2−‖, which means that e1+ = ±e2+
and e1− = ±e2−. Since e1 and e2 are linearly independent, only two cases are possible:
(a) e1+ = e2+ ≠ 0, e1− = −e2− ≠ 0, e1⁰ = e2⁰ = 0;
(b) e1+ = −e2+ ≠ 0, e1− = e2− ≠ 0, e1⁰ = e2⁰ = 0.
In case (a) h is proportional to e1+ and q is proportional to e1−; therefore
and
{Rq, Rq, T3 , ...Tl } ∈ T.
The same arguments can be used in case (b).
Exercise 2.3.6: let e ∈ T and f ∈ S be unit vectors with the angle between them being equal to
α(T ). Without loss of generality we can assume that t ≤ s (note that reordering of an extremal
leads to an extremal, since A is symmetric). By virtue of Exercise 2.3.5 in the case of α(T ) 6= 0
the collection
    T′ = { R(e + f), ..., R(e + f)  [2t times],  S, ..., S  [s − t times] }
belongs to T∗ and clearly α(T′) = α(T)/2. Thus, either T∗ contains an extremal T with α(T) =
0, or we can find a sequence {Ti ∈ T∗} with α(Ti) → 0. In the latter case the sequence {Ti}
contains a subsequence converging (in the natural sense) to a certain collection T, which clearly
belongs to T∗ and satisfies α(T) = 0. Thus, in both cases T∗ contains an extremal T with α(T) = 0, or,
which is the same, an extremal of the type {T, ..., T}.
Solutions to Section 3.3
Exercise 3.3.1: F clearly is C3 smooth on Q = int G and possesses the barrier property, i.e.,
tends to ∞ along every sequence of interior points of G converging to a boundary point. Let
x ∈ Q and h ∈ Rn . We have
    F(x) = − ln(−f(x));   DF(x)[h] = − Df(x)[h]/f(x);
at the point x∗. Since the gradient of F at the point is 0, the gradient of Φ is −g; since the
Hessian of F at x∗ is I, the Hessian of Φ is I − Q1 ≥ (1 − n/m)I (the latter inequality immediately
follows from the fact that Q1 ≥ 0 and Tr Q1 ≤ n/m). We see that

    λ²(Φ, x∗) = [Φ′(x∗)]^T [Φ″(x∗)]^{−1} Φ′(x∗) = g^T [Φ″(x∗)]^{−1} g ≤ |g|₂² (1 − n/m)^{−1} ≤ n/(m − n) < 1

(we have used the already proved estimate |g|₂² ≤ n/m and the fact that m > 2n). Thus, the
Newton decrement of the nondegenerate (in view of Φ″(x∗) > 0) self-concordant barrier (in view
therefore Φ attains its minimum on int G+ (VII., Lecture 2). Since Φ is a nondegenerate self-
concordant barrier for G+ , the latter is possible only when G+ is bounded (V., Lecture 3).
2): as explained in Hints, we can reduce the situation to that one with Gα being closed
half-spaces such that the intersection of the interiors of these half-spaces coincides with the
intersection of the half-spaces themselves; in particular, the intersection of any finite subfamily
of the half-spaces Gα possesses a nonempty interior. Let us first prove that there exists a finite
I′ ⊂ I such that ∩_{α∈I′} Gα is bounded. Without loss of generality we may assume that 0 ∈ Gα,
α ∈ I (since the intersection of all Gα is nonempty). Assume that for every finite subset I′
of I the intersection G^{I′} = ∩_{α∈I′} Gα is unbounded. Then for every R > 0 and every I′ the
set G^{I′}_R = {x ∈ G^{I′} | |x|₂ = R} is a nonempty compact set; these compact sets form a nested
family, and therefore their intersection is nonempty, which means that ∩_{α∈I} Gα contains, for
every R > 0, a vector of norm R and is therefore an unbounded set, which in fact is not the
case.
Thus, we can reduce the situation to a similar one for a finite family of closed half-spaces Gα
with the intersection of the interiors being bounded and nonempty; for this case the required
statement is given by 1).
Remark 13.0.1 I do not think that the above proof of item 1) of Exercise 3.3.2 is the simplest
one; please try to find a better proof.
Exercise 3.3.3: it is clear that F is C3 smooth on the interior of Sm+ and possesses the barrier
property, i.e., tends to ∞ along every sequence of interior points of the cone converging to a
boundary point of it. Now, let x be an interior point of Sm+ and h be an arbitrary direction in
the space Sm of symmetric m × m matrices, which is the embedding space of the cone. We have
    F(x) = − ln Det x;

    DF(x)[h] = ∂/∂t|t=0 [− ln Det(x + th)] = ∂/∂t|t=0 [− ln Det x − ln Det(I + t x^{−1}h)] =

             = − (∂/∂t|t=0 Det(I + t x^{−1}h)) / Det(I) = − Tr(x^{−1}h)
(to understand the concluding step, look at the matrix I + t x^{−1}h: its diagonal entries are
1 + t[x^{−1}h]ii, and the entries outside the diagonal are of order t. Representing the determinant
as a sum of products, we obtain m! terms, one of them being Π_i (1 + t[x^{−1}h]ii) and the remaining
ones of the type t^k p with k ≥ 2 and p independent of t. These latter terms do not contribute to
the derivative with respect to t at t = 0, and the contribution of the "diagonal" term is exactly
Σ_i [x^{−1}h]ii = Tr(x^{−1}h)).
Thus,
DF (x)[h] = − Tr(x−1 h),
whence
D2 F (x)[h, h] = Tr(x−1 hx−1 h)
(we have already met the relation DB(x)[h] = −B(x)hB(x), B(x) ≡ x^{−1}; to prove it,
differentiate the identity B(x)x ≡ I).
Differentiating the expression for D2 F , we come to
(we again have used the rule for differentiating the mapping x ↦ x^{−1}). Now, x is a positive
definite symmetric matrix; therefore there exists a positive definite symmetric y such that
x−1 = y 2 . Replacing x−1 by y and taking into account that Tr(AB) = Tr(BA), we come to the
expressions
(compare these relations with the expressions for the derivatives of the function − ln t). The
matrix ξ clearly is symmetric; expressing the traces via the eigenvalues λ1 , ..., λm of the matrix
ξ, we come to
    DF(x)[h] = − Σ_{i=1}^m λi;   D²F(x)[h, h] = Σ_{i=1}^m λi²;   D³F(x)[h, h, h] = −2 Σ_{i=1}^m λi³,
and

    |D³F(x)[h, h, h]| ≤ 2 [D²F(x)[h, h]]^{3/2}.
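The concluding inequality reduces to an elementary fact about reals: |Σ λi³| ≤ (Σ λi²)^{3/2}, since |λi|³ ≤ (Σ_j λj²)^{1/2} λi² and one can sum over i. A randomized check:

```python
import numpy as np

# |sum_i lam_i^3| <= (sum_i lam_i^2)^{3/2} for any reals: this is the scalar
# fact behind |D^3 F| <= 2 (D^2 F)^{3/2}. Verified on random data.
rng = np.random.default_rng(5)
for _ in range(1000):
    lam = rng.standard_normal(7)
    assert abs((lam ** 3).sum()) <= (lam ** 2).sum() ** 1.5 + 1e-12
checked = True
```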
Exercise 3.3.8: If ∆ = (−∞, 0], then the statement in question is given by Corollary 3.2.1.
From now on we assume that ∆ is finite (i.e., that a < +∞). Then f attains its minimum
on int ∆ at a unique point t∗ (V., Lecture 3), and t∗ partitions ∆ in a ratio not exceeding
(ϑ + 2√ϑ) : 1 (this is the centering property stated by the same V.). Thus, t∗ ≤ −a/(ϑ + 2√ϑ + 1);
the latter quantity is < t, since γt ∈ ∆ and therefore t ≥ −a/γ. Since t∗ < t, we have f′(t) > 0.
Note that we have also established that

    t/t∗ ≤ (1 + √ϑ)²/γ.
Let λ be the Newton decrement of a self-concordant function f at t; since f′(t) > 0, we have

λ = f′(t)/√(f″(t)).

Note that f″(t) ≥ t^{-2} (because the open Dikin ellipsoid of f centered at t should be contained
in int ∆ and 0 is a boundary point of ∆), and therefore

f(t) ≥ f(t∗) + f′(t∗)(t − t∗) − ln(1 − π_{t∗}(t)) − π_{t∗}(t) ≡ f(t∗) + ρ(π_{t∗}(t)).

Thus, we come to

ρ(λ) ≥ ρ(π_{t∗}(t)),

whence

λ ≥ π_{t∗}(t) ≡ |(t − t∗)/t∗| ≥ 1 − (1 + √ϑ)^2 / γ,
as required in (3.19). (3.20) is nothing but (3.19) applied to the restriction of F onto the
part of the line passing through x and z which is contained in G.
Solutions to Section 5.4
Exercise 5.5.4: let α(τ, s) be the homogeneous part of the affine mapping A. A vector
w = (r, q1 , ..., qk ) is in c + L⊥ if and only if
identically in (τ, s) with the zero sum of si , which immediately results in (5.20), (5.21).
To complete the derivation of the dual problem, we should realize what is (b, w) for w ∈
c + L⊥ . This is immediate:
(b, w) = ∑_{j=1}^k [e^T r + Tr{A(e)σ_j}] + 2 ∑_{j=1}^k z_j^T f_j,

λ_j ≥ 0;  λ_j = 0 ⇒ z_j = 0;  σ_j ≥ λ_j^{-1} z_j z_j^T

(these relations say exactly that the symmetric matrix

q_j = ( λ_j   z_j^T
        z_j   σ_j  )

is positive semidefinite, cf. Exercise 5.5.3).
From these observations we immediately conclude that, replacing in the feasible plan in
question the matrices σ_j by the matrices σ′_j = λ_j^{-1} z_j z_j^T for λ_j > 0 and by zero matrices for λ_j = 0,
we preserve positive semidefiniteness of the updated matrices q_j and ensure that ∑_j b_i^T σ′_j b_i ≤
∑_j b_i^T σ_j b_i; these latter quantities were equal to ρ − r_i with nonnegative r_i, so that the former
ones also can be represented as ρ − r′_i with nonnegative r′_i. Thus, we may pass from a feasible
plan of (TTD_d) to another feasible plan with the same value of the objective, and with σ_j being
of the dyadic form λ_j^{-1} z_j z_j^T; the remaining simplifications are straightforward.
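The 2 × 2-block criterion behind this replacement — q_j is positive semidefinite for every σ_j ⪰ λ_j^{-1} z_j z_j^T, and the dyadic choice can only decrease the quadratic forms b^T σ_j b — can be illustrated numerically. A minimal sketch with randomly drawn data of my own choosing (not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
z = rng.standard_normal(3)

# any sigma >= lam^{-1} z z^T makes the block matrix q PSD (Schur complement)
c = rng.standard_normal((3, 3))
sigma = np.outer(z, z) / lam + c @ c.T   # PSD surplus on top of the dyadic part

q = np.block([[np.array([[lam]]), z[None, :]],
              [z[:, None], sigma]])
assert np.linalg.eigvalsh(q).min() >= -1e-10          # q is positive semidefinite

# replacing sigma by the dyadic sigma' = lam^{-1} z z^T keeps q PSD ...
sigma_p = np.outer(z, z) / lam
q_p = np.block([[np.array([[lam]]), z[None, :]],
                [z[:, None], sigma_p]])
assert np.linalg.eigvalsh(q_p).min() >= -1e-10

# ... and can only decrease the quadratic forms b^T sigma b
b = rng.standard_normal(3)
assert b @ sigma_p @ b <= b @ sigma @ b + 1e-12
```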
Exercise 5.5.6: as we know, K is self-dual, so that the formalism presented in Exercise 5.4.11
results in the following description of the problem dual to (π):

minimize β^T η by choice of

η = (ζ, π·) ∈ K

and real r subject to the constraint that the equality

A^T η = χ + P^T r,

A being the matrix of the mapping A (in our case this mapping is linear homogeneous); we have
taken into account that P^T r = krp, see the description of the data of (π).
Now, using in the straightforward manner the description of the data in (π) and denoting

π_ij = ( α_ij   β_ij
         β_ij   γ_ij ),
we can rewrite identity (13.16) as the following identity with respect to f, λ, y_ij and z_j (in what
follows i varies from 1 to m, j varies from 1 to k):

∑_i ζ_i {f − ∑_j [2z_j^T f_j + V y_ij]} + ∑_{i,j} [y_ij α_ij + 2β_ij b_i^T z_j + λ_j γ_ij] = f + r ∑_j λ_j.

Comparing the coefficients at f, y_ij, z_j and λ_j on both sides, we get

∑_i ζ_i = 1; (13.17)

V ζ_i = α_ij; (13.18)

(∑_i ζ_i) f_j = ∑_i β_ij b_i; (13.19)

∑_i γ_ij = r. (13.20)
Expressing via equations (13.17) - (13.20) all components of η in terms of the variables φ_i ≡ V ζ_i,
β_ij and r, taking into account that the condition π_ij ≥ 0 is equivalent to α_ij ≥ 0, γ_ij ≥ 0,
α_ij γ_ij ≥ β_ij^2, and eliminating in the resulting problem the variables γ_ij by partial optimization
with respect to these variables, we immediately come to the desired formulation of the problem
dual to (π).
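The elementary fact used here — a symmetric 2 × 2 matrix with entries α, β, γ is positive semidefinite exactly when α ≥ 0, γ ≥ 0 and αγ ≥ β^2 — can be checked exhaustively on a small grid. A throwaway sketch of my own (names hypothetical, not from the notes):

```python
import numpy as np
from itertools import product

def psd(m):
    # numerical test for positive semidefiniteness
    return np.linalg.eigvalsh(m).min() >= -1e-12

def cond(a, b, g):
    # the scalar criterion: alpha >= 0, gamma >= 0, alpha*gamma >= beta^2
    return a >= 0 and g >= 0 and a * g >= b * b

# the two criteria coincide on a grid of half-integer test values
vals = np.linspace(-2, 2, 9)
for a, b, g in product(vals, repeat=3):
    m = np.array([[a, b], [b, g]])
    assert psd(m) == cond(a, b, g), (a, b, g)
```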
Exercise 5.5.7: let (φ, β· ) be a feasible solution to (ψ), and let I be the set of indices of
nonzero φi . Then βij = 0 whenever i 6∈ I - otherwise the objective of (ψ) at the solution would
be infinite (this is our rule for interpreting fractions with zero denominators), and the solution
is assumed to be feasible. Let us fix j and consider the following optimization problem:
(P_j):  minimize ∑_{i∈I} v_i^2 φ_i^{-1}  s.t.  ∑_{i∈I} v_i b_i = f_j,
v_i being the control variables. The problem clearly is feasible: a feasible plan is given by
v_i = β_ij, i ∈ I. Now, (P_j) is a quadratic problem with nonnegative objective and linear equality
constraints; therefore it is solvable. Let β*_ij, i ∈ I, be an optimal solution to the problem,
and let β*_ij = 0 for i 6∈ I. From the optimality conditions for (P_j) it follows that there is an
n-dimensional vector 2x_j - the vector of Lagrange multipliers for the equality constraints - such
that β*_ij, i ∈ I, is an optimal solution to the unconstrained problem

minimize ∑_{i∈I} v_i^2 φ_i^{-1} + 2x_j^T (f_j − ∑_{i∈I} v_i b_i).
This latter relation combined with (13.22) says that the plan (φ, β·∗ ) is the image of the feasible
plan (φ, x1 , ..., xk ) under the mapping (5.35).
What are the compliances c_j associated with the plan (φ, x_1, ..., x_k)? In view of (13.22) -
(13.23) we have

c_j = x_j^T f_j = x_j^T ∑_i β*_ij b_i = ∑_{i∈I} β*_ij (x_j^T b_i) = ∑_{i∈I} [β*_ij]^2 φ_i^{-1};
and since β_ij form a feasible, and β*_ij an optimal, plan to (P_j), we come to

c_j ≤ ∑_i β_ij^2 φ_i^{-1}.
Thus, the value of the objective (i.e., maxj cj ) of (TTDini ) at the plan (φ, x1 , ..., xk ) does not
exceed the value of the objective of (ψ) at the plan (φ, β· ), and we are done.
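The computation in this solution — the stationarity condition v_i = φ_i b_i^T x_j hidden behind the Lagrange multipliers, the resulting compliance identity, and the optimality of the corresponding plan — can be reproduced numerically. A sketch under assumed random data (sizes, names and the matrix G are mine, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 6                        # hypothetical sizes
phi = rng.uniform(0.5, 2.0, m)     # positive weights phi_i
B = rng.standard_normal((n, m))    # columns play the role of the vectors b_i
f = rng.standard_normal(n)

# Stationarity of sum_i v_i^2/phi_i + 2 x^T (f - sum_i v_i b_i) gives
# v_i = phi_i b_i^T x; the constraint sum_i v_i b_i = f then determines x:
G = B @ (phi[:, None] * B.T)       # Gram matrix sum_i phi_i b_i b_i^T
x = np.linalg.solve(G, f)
v_star = phi * (B.T @ x)

assert np.allclose(B @ v_star, f)              # feasibility
obj = lambda v: np.sum(v ** 2 / phi)
assert np.isclose(obj(v_star), x @ f)          # optimal value = x^T f (the compliance)

# any other feasible plan is no better: shift by a null-space direction of B
w = rng.standard_normal(m)
u = w - phi * (B.T @ np.linalg.solve(G, B @ w))
assert np.allclose(B @ u, 0)
assert obj(v_star + u) >= obj(v_star) - 1e-10
```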
Solutions to Section 6.7
Exercise 6.7.3: if the set K^σ = {y ∈ K ∩ M | σ^T y = 1} were bounded, the set K(σ) =
{y ∈ K ∩ M | σ^T y ≤ 1} also would be bounded (since, as we know from (6.7), σ^T y is positive
on M ∩ int K). From this latter fact it would follow that σ is strictly positive on the cone
K′ = K ∩ M (see basic statements on convex cones in Lecture 5). The optimal solution x* is a
nonzero vector from the cone K′, and we know that σ^T x* = 0; this is the desired contradiction.
All remaining statements are immediate: φ is a nondegenerate self-concordant barrier for K^σ
(regarded as a domain in its affine hull) due to Proposition 5.3.1; Dom φ is unbounded, and
therefore φ is unbounded below on its domain (V., Lecture 3); since φ is unbounded below, its
Newton decrement is ≥ 1 at any point (VIII., Lecture 2), and therefore the damped Newton
step decreases φ at least by ρ(−1) = 1 − ln 2 (V., Lecture 2).
Exercise 6.7.5: 1) is an immediate consequence of III.. To prove 2), note that (S, χ∗ ) = 0
for certain positive semidefinite χ∗ = I − δ with δ ∈ Π (IVb.). Since (S, I) = 1 (III.), we have
(δ, S) = 1; since η is the orthoprojection of S onto Π and δ ∈ Π, we have (δ, η) = (δ, S), whence
(δ, η) = 1. Now, (η, I) = 0 (recall that η ∈ Π and Π is contained in the subspace of matrices
with zero trace, see II.). Thus, we come to (I − δ, η) ≡ (χ∗ , η) = −1. Writing down the latter
relation in the eigenbasis of η, we come to
∑_{i=1}^n χ_i g_i = −1,

χ_i being the diagonal entries of χ* with respect to the basis; since χ_i ≥ 0 (recall that χ* is
positive semidefinite) and ∑_i χ_i = n (see IVb.), we conclude that max_i |g_i| ≥ n^{-1}.
Exercise 6.7.6: one clearly has τ ∈ T, and, consequently, τ ∈ Dom φ. We have

φ(0) − φ(τ) = ∑_i ln(1 − τ g_i) − n ln(1 − τ|g|_2^2) ≥

≥ ∑_i ln(1 − τ g_i) + nτ|g|_2^2 = − ∑_{j=1}^∞ ∑_i j^{-1}(τ g_i)^j + nτ|g|_2^2 =

[since ∑_i g_i = 0, see Exercise 6.7.5, 1)]

= − ∑_{j=2}^∞ ∑_i j^{-1}(τ g_i)^j + nτ|g|_2^2 ≥

≥ − ∑_{j=2}^∞ j^{-1}[τ|g|_2]^2 [τ|g|_∞]^{j−2} + nτ|g|_2^2 =

= − (|g|_2^2/|g|_∞^2) ∑_{j=2}^∞ j^{-1}(τ|g|_∞)^j + nτ|g|_2^2 =

= (|g|_2^2/|g|_∞^2) [ln(1 − τ|g|_∞) + τ|g|_∞] + nτ|g|_2^2.

Substituting into the resulting lower bound for φ(0) − φ(τ) the value of τ indicated in the exercise,
we come to the lower bound

α ≥ (|g|_2^2/|g|_∞^2) [n|g|_∞ − ln(1 + n|g|_∞)];
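The chain of inequalities above can be verified numerically for random g with zero sum. The following sketch (NumPy, my own test data; the range of τ is chosen only to keep all logarithms defined) checks the resulting lower bound on φ(0) − φ(τ):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
g = rng.standard_normal(n)
g -= g.mean()                      # enforce sum_i g_i = 0 (Exercise 6.7.5, 1))
g2 = np.linalg.norm(g)             # |g|_2
ginf = np.abs(g).max()             # |g|_inf

# test a range of admissible tau (tau*|g|_inf < 1 and tau*|g|_2^2 < 1)
for tau in np.linspace(0.01, 0.5 / max(ginf, g2**2), 20):
    lhs = np.sum(np.log(1 - tau * g)) - n * np.log(1 - tau * g2**2)  # phi(0) - phi(tau)
    rhs = (g2**2 / ginf**2) * (np.log(1 - tau * ginf) + tau * ginf) + n * tau * g2**2
    assert lhs >= rhs - 1e-12
```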
[Exercise 8.5.3]
= f(0) + f′(0)r + ∑_{i=2}^∞ (−1)^i (Tr{ĥ^i}/i) r^i,    ĥ = v^{-1/2} dv v^{-1/2}.

In view of (8.35) the absolute values of the coefficients in the latter series are bounded from
above by i^{-1}|ĥ|_2^2 |ĥ|_∞^{i−2}, so that the series converges (and, consequently, represents f - recall that
f is analytic on its domain) when r < ζ(v, dv) ≡ |ĥ|_∞^{-1} (see (8.33)). It follows that the remainder
for the aforementioned r is bounded from above by the series

(|ĥ|_2^2/|ĥ|_∞^2) ∑_{i=j+1}^∞ i^{-1} (r|ĥ|_∞)^i,
so that (u, du) indeed is an arrow; by construction, (s, ds) is the co-arrow conjugate to (u, du).
It remains to note that by definition of ∆ and due to the normalization |dx|_{F″(x)} = 1 we have
Exercise 8.5.7: by Lemma 8.3.1 and Proposition 8.3.1, the upper bound v(r) for the residual
F_{t+dt}(x + dx(dt)) − min_y F_{t+dt}(y) is bounded from above by the remainder ρ*(r) in the third order
Taylor expansion of the function Φ(u + r du(dt)) + Φ*(s + r ds(dt)); here dt is an arbitrary positive
scale factor, and we are in our right to choose dt in a way which ensures that |dx(dt)|_{F″(x)} = 1;
with this normalization, Ω = Ω(x) will be exactly the quantity δt/dt, where δt is the stepsize
given by the linesearch. The quantity Ω is therefore such that v(Ω) = O(1) (since we use
linesearch to get the largest r which results in v(r) ≤ κ); consequently, ρ*(Ω) ≥ O(1). On the
other hand, in view of Exercises 8.5.5 and 8.5.6, ρ*(r) is exactly R^3_{(u,du)}(r); combining (8.37)
and the inequality ρ*(Ω) ≥ O(1), we come to

ρ_3(Ω/∆) ≥ O(1)∆^{-2}.

The resulting inequality for sure is true if Ω/∆ > 1/2, since, as we know, ∆ ≥ 1.
Solutions to Section 9.6
Exercise 9.6.1:
1): the ”general” part is an immediate consequence of the Substitution rule (N) as applied
to the mapping

B : (t, x) ↦ ( x^T x, t )   [G^- = R × R^n]

which is 1-appropriate for G^+ in view of Proposition 9.3.1.
G+ = {(u, s) | u ≤ s2/p , s ≥ 0}
and the 2-self-concordant barrier F + (u, s) = − ln(s2/p − u) − ln s for G+ , see Example 9.2.1.
2): the ”general” part is an immediate consequence of the Substitution rule (N) applied to
the mapping

B : (t, x) ↦ ( x^T x / t, t )   [G^- = R_+ × R^n]

which is appropriate for G^+ in view of Proposition 9.3.1.
The ”particular” part is given by the general one as applied to

F^+(u, s) = − ln(s^{2/p−1} − u) − ln s,

F^-(t, x) = − ln t
• G^+ = S^m_+, F^+(τ) = − ln Det τ;

• Q[ξ′, ξ″] = (1/2) ∑_{j=1}^q [(ξ′_j)^T ξ″_j + (ξ″_j)^T ξ′_j], ξ = (ξ_1, ..., ξ_q);

• q = k, n_1 = ... = n_k = m;

η ≥ 0;   τ − ∑_{j=1}^k η_j^{-1} ξ_j^T ξ_j ≥ 0.

The cone G^+ is the inverse image of the ”huge” cone K under the linear mapping

(s_i; t_ij; r_j) ↦ ( τ = Diag{s_1, ..., s_m}, ξ_j = Diag{t_{1j}, ..., t_{mj}}, η_j = r_j ),
and Φ is nothing but the superposition of the barrier F for K given by the result of Exercise
9.6.3 and this mapping.
Exercise 9.6.5: let us compute the derivatives of A at a point u = (t, y) ∈ int G^- in a direction
du = (dt, dy) such that u ± du ∈ G^-; what we should prove is that

D^2A(u)[du, du] ≤ 0 (13.24)

and that

D^3A(u)[du, du, du] ≤ −3D^2A(u)[du, du]. (13.25)

Let us set η_i = dy_i/y_i, σ_k = ∑_{i=1}^p η_i^k, so that, in the clear notation, dσ_k = −kσ_{k+1}, and let
φ(t, y) = (y_1...y_p)^{1/p}. We have

D^2A(t, y)[du, du] = p^{-2}σ_1^2 φ(t, y) − p^{-1}σ_2 φ(t, y) = p^{-2}[σ_1^2 − pσ_2]φ(t, y),

D^3A(t, y)[du, du, du] = −p^{-2}[2σ_1σ_2 − 2pσ_3]φ(t, y) + p^{-3}σ_1[σ_1^2 − pσ_2]φ(t, y).
Now, let

λ = p^{-1}σ_1,   α_i = η_i − λ.

We clearly have

σ_1 = pλ;  σ_2 = ∑_{i=1}^p η_i^2 = pλ^2 + ∑_{i=1}^p α_i^2;  σ_3 = ∑_{i=1}^p η_i^3 = pλ^3 + 3λ ∑_{i=1}^p α_i^2 + ∑_{i=1}^p α_i^3. (13.26)
Substituting these expressions for σ_k in the expressions for the second and the third derivative
of A, we come to

d^2 ≡ −D^2A(t, y)[du, du] = p^{-1}φ(t, y) ∑_{i=1}^p α_i^2 ≥ 0, (13.27)

d^3 ≡ D^3A(t, y)[du, du, du] = (4/p)φ(u)λ ∑_{i=1}^p α_i^2 + (2/p)φ(u) ∑_{i=1}^p α_i^3 − p^{-1}φ(u)λ ∑_{i=1}^p α_i^2 =

= (3/p)φ(u)λ ∑_{i=1}^p α_i^2 + (2/p)φ(u) ∑_{i=1}^p α_i^3 = (3/p)φ(u) ∑_{i=1}^p [λ + (2/3)α_i]α_i^2 =

= (3/p)φ(u) ∑_{i=1}^p [(1/3)λ + (2/3)η_i]α_i^2. (13.28)
Now, the inclusion u ± du ∈ G^- means exactly that −1 ≤ η_i ≤ 1, i = 1, ..., p, whence also
|λ| ≤ 1; therefore |(1/3)λ + (2/3)η_i| ≤ 1, and comparing (13.28) and (13.27), we come to (13.25).
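Formulas (13.27), (13.28) and the inequality (13.25) lend themselves to a direct numerical check via finite differences of the geometric mean along the segment y + s·dy. A sketch with random admissible data (all names mine, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 5
y = rng.uniform(0.5, 2.0, p)
dy = rng.uniform(-1.0, 1.0, p) * y          # |dy_i| <= y_i, i.e. u +- du in G^-

A = lambda s: np.prod(y + s * dy) ** (1.0 / p)   # geometric mean along the segment

# finite-difference second and third directional derivatives
eps = 1e-3
d2 = (A(eps) - 2 * A(0) + A(-eps)) / eps**2
d3 = (A(2*eps) - 2*A(eps) + 2*A(-eps) - A(-2*eps)) / (2 * eps**3)

eta = dy / y
lam = eta.mean()
alpha = eta - lam
phi = np.prod(y) ** (1.0 / p)

# closed forms (13.27)-(13.28)
d2_cf = -phi * np.sum(alpha**2) / p
d3_cf = (3.0 / p) * phi * np.sum((lam / 3 + 2 * eta / 3) * alpha**2)

assert abs(d2 - d2_cf) < 1e-4
assert abs(d3 - d3_cf) < 1e-2
assert d3_cf <= -3 * d2_cf + 1e-12          # the required inequality (13.25)
```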
Exercise 9.6.6: The mapping B(·) is the superposition A(L(·)) of the mapping
is β-appropriate for G+ .
Thus, our particular B indeed is 1-appropriate with R+ ; the remaining claims of the Exercise
are given by Theorem 9.1.1 applied with F + (z) = − ln z and F − (τ, ξ, η) = − ln τ − ln η.
Solutions to Section 10.5
Exercise 10.5.2: what we should prove is that GO is convex and that the solutions to (Outer’)
are exactly the minimum volume ellipsoids which contain Q.
To prove convexity, assume that (r′, x′, X′) and (r″, x″, X″) are two points of GO, λ ∈ [0, 1]
and (r, x, X) = λ(r′, x′, X′) + (1 − λ)(r″, x″, X″); we should prove that (r, x, X) ∈ GO. Indeed,
by the definition of GO we have for all u ∈ Q
uT Xu + 2xT u + r ≤ 0.
Thus, the points of Q indeed satisfy the quadratic inequality associated with (r, x, X); since
X clearly is symmetric positive definite and Q possesses a nonempty interior, this quadratic
inequality does define an ellipsoid, and, as we have seen, this ellipsoid E(r, x, X) contains Q. It
remains to prove that the triple (r, x, X) satisfies the normalizing condition δ(r, x, X) ≤ 1; but
this is an immediate consequence of convexity of the function xT X −1 x − r on the set (r, x, X)
with X ∈ int Sn+ (see the section on the fractional-quadratic mapping in Lecture 9).
It remains to prove that optimal solutions to (Outer’) represent exactly minimum volume
ellipsoids which cover Q. Indeed, let (r, x, X) be a feasible solution to (Outer’) with finite value
of the objective. I claim that δ(r, x, X) > 0. Indeed, X is positive definite (since it is in Sn+ and
F is finite at X), therefore the set E(r, x, X) is empty, a point or an ellipsoid, depending on
whether δ(r, x, X) is negative, zero or positive; since (r, x, X) ∈ GO , the set E(r, x, X) contains
Q, and is therefore neither empty nor a point (since int Q 6= ∅), so that δ(r, x, X) must be
positive. Thus, feasible solutions (r, x, X) to (Outer’) with finite value of the objective are such
that the sets E(r, x, X) are ellipsoids containing Q; it is immediately seen that every ellipsoid
with the latter property comes from certain feasible solution to (Outer’). Note that the objective
in (Outer’) is ”almost” (monotone transformation of) the objective in (Outer):
ln Vol(E(r, x, X)) = ln κ_n + (n/2) ln δ(r, x, X) − (1/2) ln Det X,
and the objective in (Outer’) is F (X) = − ln Det X. We conclude that (Outer) is equivalent to
the problem (Outer”) which is obtained from (Outer’) by replacing the inequality δ(r, x, X) ≤ 1
with the equation δ(r, x, X) = 1. But this is immediate: if (r, x, X) is a feasible solution
to (Outer’) with finite value of the objective, then, as we know, δ(r, x, X) > 0; setting γ =
δ^{-1}(r, x, X) and (r′, x′, X′) = γ(r, x, X), we come to E(r, x, X) = E(r′, x′, X′), δ(r′, x′, X′) = 1,
so that (r′, x′, X′) ∈ GO, and F(X′) = F(X) − n ln γ ≤ F(X). From this latter observation it
immediately follows that (Outer’) is equivalent to (Outer”), and this latter problem, as we just
have seen, is nothing but (Outer).
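The volume identity used above, Vol E(r, x, X) = κ_n δ^{n/2}(r, x, X) (Det X)^{-1/2}, can be sanity-checked by Monte Carlo in R^2. A rough sketch with data of my own choosing (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2
a = rng.standard_normal((n, n))
X = a @ a.T + np.eye(n)                  # positive definite
x = rng.standard_normal(n)
r = x @ np.linalg.solve(X, x) - 1.5      # makes delta = 1.5 > 0: a genuine ellipsoid

delta = x @ np.linalg.solve(X, x) - r
kappa = np.pi                             # volume of the unit ball in R^2
vol_formula = kappa * delta ** (n / 2) / np.sqrt(np.linalg.det(X))

# Monte Carlo over a box guaranteed to contain the ellipsoid
c = -np.linalg.solve(X, x)                # center of E(r, x, X)
R = np.sqrt(delta / np.linalg.eigvalsh(X).min())
N = 200_000
u = c + rng.uniform(-R, R, (N, n))
inside = np.einsum('ij,jk,ik->i', u, X, u) + 2 * u @ x + r <= 0
vol_mc = inside.mean() * (2 * R) ** n

assert abs(vol_mc - vol_formula) / vol_formula < 0.05
```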
Exercise 10.5.4: to prove that A is 2/3-appropriate for G^+, note that a direct computation
says that for positive definite symmetric X and any (dt, dX) one has
− ln Det X ≤ t
− ln(1 + r − xT X −1 x) − ln Det X
on R2 , and let
S = ( a   d        R = ( α   δ
      d   b ),           δ   β )

be the matrices of these forms. What we should prove is that there exists a nonnegative λ such
that

α ≤ λa,   β ≤ λb. (13.29)
The following four cases are possible:
Case A: a > 0, b > 0. In this case (13.29) is valid for all large enough positive λ.
Case B: a ≤ 0, b ≤ 0. Since a = v T Sv and α = v T Rv, in the case of a ≤ 0 we have also
α ≤ 0 (this is given by (Impl)). Similarly, b ≤ 0 ⇒ β ≤ 0. Thus, in the case in question α ≤ 0,
β ≤ 0, and (13.29) is satisfied by λ = 0.
Case C: a ≤ 0, b > 0; Case D: a > 0, b ≤ 0. These are the only nontrivial cases which we
should consider; due to the symmetry, we may restrict ourselves with the case C only. Thus,
from now on a ≤ 0, b > 0.
2⁰. Assume (case C.1) that a < 0. Then the determinant ab − d^2 of the matrix S is negative,
so that in appropriate coordinates p′, q′ on the plane the matrix S′ of the quadratic form S[·]
becomes

S′ = ( 1   0
       0  −1 ).

Let

R′ = ( ξ   ζ
       ζ  −η )

be the matrix of the form R[·] in the coordinates p′, q′. (Impl) says to us that for any
2-dimensional vector z = (p′, q′)^T we have

(p′)^2 − (q′)^2 ≤ 0 ⇒ ξ(p′)^2 + 2ζp′q′ − η(q′)^2 ≤ 0. (13.30)

The premise in this implication is satisfied by z = (0, 1)^T, z = (1, 1)^T and z = (1, −1)^T, and the
conclusion of it says to us that η ≥ 0, ξ − η ± 2ζ ≤ 0, whence

η ≥ 0;   η − ξ ≥ 2|ζ|. (13.31)
Assume first that the quantity

λ = (η + ξ)/2

is nonnegative. Then the matrix

λS′ − R′ = ( (η − ξ)/2      −ζ
                −ζ      (η − ξ)/2 )

is positive semidefinite by (13.31), whence the matrix

λS − R

also is positive semidefinite, and (13.29) follows. In the remaining case λ < 0, i.e., ξ < −η ≤ 0,
the premise of (Impl) is satisfied by appropriately chosen vectors z
for all small enough positive ε. Passing to limit as ε → 0, we come to |ζ| ≤ √(η|ξ|). Thus, in
the case in question

R′ = ( −|ξ|    ζ
         ζ   −|η| )

is a 2 × 2 matrix with nonpositive diagonal entries and nonnegative determinant; consequently,
this matrix is negative semidefinite, so that R also is negative semidefinite, and (13.29) is
satisfied by λ = 0.
3⁰. It remains to consider the case C.2 when b > 0, a = 0. Here we have a = v^T Sv = 0, so
that α = v^T Rv ≤ 0 by (Impl). Since b > 0, (13.29) is satisfied for all large enough positive λ.
Solutions to Section 11.4
Exercise 11.4.8: we should prove that
(i) if Am (t, X; τ, U ) is positive semidefinite, then Sm (X) ≤ t;
(ii) if Sm (X) ≤ t, then there exist τ and U such that Am (t, X; τ, U ) is positive semidefinite.
Let us start with (i). Due to construction of Am (·), both matrices τ I + U − X and U are
positive semidefinite; in particular, X ≤ τI + U, whence, due to monotonicity of S_m(·), S_m(X) ≤
S_m(τI + U). The latter quantity clearly is mτ + S_m(U) ≤ mτ + Tr U. Thus, S_m(X) ≤ mτ + Tr U,
while t ≥ mτ + Tr U, again due to the construction of A_m(·). Thus, S_m(X) ≤ t, as required.
To prove (ii), let us denote by λ_1 ≥ ... ≥ λ_k the eigenvalues of X, and let U have the same
eigenvectors as X and the eigenvalues

λ_1 − λ_m, ..., λ_m − λ_m, 0, ..., 0.

Set also τ = λ_m. Then U is positive semidefinite, while τI + U − X is the matrix with the
eigenvalues

0, 0, ..., 0, λ_m − λ_{m+1}, ..., λ_m − λ_k,
so that it also is positive semidefinite. At the same time mτ + Tr U = Sm (X) ≤ t, so that
Am (t, X; τ, U ) is positive semidefinite.
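The certificate constructed in (ii) is easy to verify numerically. A sketch (random symmetric X, hypothetical sizes; names mine) that builds τ and U as above and checks the required properties:

```python
import numpy as np

rng = np.random.default_rng(6)
k, m = 6, 2
a = rng.standard_normal((k, k))
X = (a + a.T) / 2
lam = np.sort(np.linalg.eigvalsh(X))[::-1]    # lam_1 >= ... >= lam_k
Sm = lam[:m].sum()                             # sum of the m largest eigenvalues

# tau = lam_m; U shares eigenvectors with X, with eigenvalues
# lam_i - lam_m for i <= m and 0 for i > m
w, V = np.linalg.eigh(X)
w = w[::-1]; V = V[:, ::-1]                    # descending order
mu = np.where(np.arange(k) < m, w - w[m - 1], 0.0)
tau = w[m - 1]
U = V @ np.diag(mu) @ V.T

assert np.linalg.eigvalsh(U).min() >= -1e-10                        # U is PSD
assert np.linalg.eigvalsh(tau * np.eye(k) + U - X).min() >= -1e-10  # tau I + U - X is PSD
assert np.isclose(m * tau + np.trace(U), Sm)                        # m tau + Tr U = S_m(X)
```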
Exercise 11.4.9: let λ_i, i = 1, ..., 2k, be the eigenvalues of Y(X), and σ_1, ..., σ_k be the singular
values of X. It is immediately seen that

Y^2(X) = ( XX^T    0
             0   X^T X ).

We know that the sequence of eigenvalues of XX^T is the same as the sequence of eigenvalues
of X^T X, and the latter sequence is the same as the sequence of squared singular values of X,
by definition of the singular values. Since Y^2(X) is block diagonal with diagonal blocks XX^T
and X^T X, and both blocks have the same sequences of eigenvalues, to get the sequence of
eigenvalues of Y^2(X), you should double the multiplicity of each eigenvalue of X^T X. Thus, the
sequence of eigenvalues of Y^2(X) is

σ_1^2, σ_1^2, σ_2^2, σ_2^2, ..., σ_k^2, σ_k^2. (I)
On the other hand, the sequence of eigenvalues of Y^2(X) is comprised of (possibly, reordered)
squared eigenvalues of Y(X). Thus, the sequence

λ_1^2, λ_2^2, ..., λ_{2k}^2

differs from (I) only by order. To derive from this intermediate conclusion the statement in
question, it suffices to prove that if certain λ ≠ 0 is an eigenvalue of Y(X) of certain multiplicity
s, then −λ also is an eigenvalue of the same multiplicity s. But this is simple. Let L be the
eigenspace of Y(X) associated with the eigenvalue λ. In other words, L is comprised of all
vectors

( u
  v ),   u, v ∈ R^k,

for which

( Xv    )       ( u )
( X^T u ) = λ ( v ). (13.32)
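The conclusion of the exercise — the eigenvalues of Y(X) are the singular values of X together with their negatives — can be confirmed numerically. A quick sketch (random X; names mine, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(7)
k = 4
X = rng.standard_normal((k, k))
Y = np.block([[np.zeros((k, k)), X],
              [X.T, np.zeros((k, k))]])

sv = np.linalg.svd(X, compute_uv=False)          # singular values of X
ev = np.sort(np.linalg.eigvalsh(Y))              # eigenvalues of the symmetric Y(X)

# eigenvalues of Y(X) = singular values of X together with their negatives
expected = np.sort(np.concatenate([sv, -sv]))
assert np.allclose(ev, expected)

# and Y^2(X) is block-diagonal with blocks X X^T and X^T X
assert np.allclose(Y @ Y, np.block([[X @ X.T, np.zeros((k, k))],
                                    [np.zeros((k, k)), X.T @ X]]))
```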