
GEORGIA INSTITUTE OF TECHNOLOGY

SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING

LECTURE NOTES

INTERIOR POINT
POLYNOMIAL TIME METHODS
IN CONVEX PROGRAMMING
ISYE 8813

Arkadi Nemirovski
On sabbatical leave from Technion – Israel Institute of Technology

Spring Semester 2004



Interior Point Polynomial Methods in Convex Programming


Goals. During the last decade the area of interior point polynomial methods (started in 1984, when N. Karmarkar invented his famous algorithm for Linear Programming) has become one of the dominating fields, or even the dominating field, of theoretical and computational activity in Convex Optimization. The goal of the course is to present a general theory of interior point polynomial algorithms in Convex Programming. The theory allows one to explain all known methods of this type and to extend them from the initial area of interior point techniques (Linear and Quadratic Programming) to a wide variety of essentially nonlinear classes of convex programs. We present in a self-contained manner the basic theory along with its applications to several important classes of convex programs (LP, QP, Quadratically constrained Quadratic programming, Geometrical programming, Eigenvalue problems, etc.).
The course follows the recent book
Yu. Nesterov, A. Nemirovski, Interior-Point Polynomial Algorithms in Convex Programming, SIAM Studies in Applied Mathematics, 1994.
Prerequisites for the course are the standard Calculus and the most elementary parts of
Convex Analysis.
Duration: one semester, 2 hours weekly

Contents:
Introduction: what the course is about
Developing Tools, I: self-concordant functions, self-concordant barriers and the Newton method
Interior Point Polynomial methods, I: the path-following scheme
Developing Tools, II: Conic Duality
Interior Point Polynomial methods, II: the potential reduction scheme
Developing Tools, III: how to construct self-concordant barriers
Applications:
Linear and Quadratic Programming
Quadratically Constrained Quadratic Problems
Geometrical Programming
Semidefinite Programming

About Exercises

The majority of Lectures are accompanied by "Exercise" sections. In several cases, the exercises relate to the lecture where they are placed; sometimes they prepare the reader for the next lecture.
The mark ∗ at the word "Exercise" or at an item of an exercise means that you may use hints given in Appendix "Hints". A hint, in turn, may refer you to the solution of the exercise given in Appendix "Solutions"; this is denoted by the mark +. Some exercises are marked by + rather than by ∗; this refers you directly to the solution of an exercise.
Exercises marked by # are closely related to the lecture where they are placed; it would be a good thing to solve such an exercise or at least to become acquainted with its solution (if one is given).
Exercises which I find difficult are marked with >.
The exercises, as a rule, are not that simple. They are by no means obligatory, and the reader is not expected to solve all or even the majority of them. Those who would like to work on the solutions should take into account that the order of the exercises is important: a problem which could cause serious difficulties on its own becomes much simpler in context (at least I hope so).
Contents

1 Introduction to the Course
1.1 Some history
1.2 The goal: polynomial time methods
1.3 The path-following scheme
1.4 What is inside: self-concordance
1.5 Structure of the course

2 Self-concordant functions
2.1 Examples and elementary combination rules
2.2 Properties of self-concordant functions
2.3 Exercises: Around Symmetric Forms

3 Self-concordant barriers
3.1 Definition, examples and combination rules
3.2 Properties of self-concordant barriers
3.3 Exercises: Self-concordant barriers

4 Basic path-following method
4.1 Situation
4.2 F-generated path-following method
4.3 Basic path-following scheme
4.4 Convergence and complexity
4.5 Initialization and two-phase path-following method
4.6 Concluding remarks
4.7 Exercises: Basic path-following method

5 Conic problems and Conic Duality
5.1 Conic problems
5.2 Conic duality
5.2.1 Fenchel dual to (P)
5.2.2 Duality relations
5.3 Logarithmically homogeneous barriers
5.4 Exercises: Conic problems
5.4.1 Basic properties of cones
5.4.2 More on conic duality
5.4.3 Complementary slackness: what it means?
5.4.4 Conic duality: equivalent form
5.5 Exercises: Truss Topology Design via Conic duality

6 The method of Karmarkar
6.1 Problem setting and assumptions
6.2 Homogeneous form of the problem
6.3 The Karmarkar potential function
6.4 The Karmarkar updating scheme
6.5 Overall complexity of the method
6.6 How to implement the method of Karmarkar
6.7 Exercises on the method of Karmarkar

7 The Primal-Dual potential reduction method
7.1 The idea
7.2 Primal-dual potential
7.3 The primal-dual updating
7.4 Overall complexity analysis
7.5 Large step strategy
7.6 Exercises: Primal-Dual method
7.6.1 Example: Lyapunov Stability Analysis

8 Long-Step Path-Following Methods
8.1 The predictor-corrector scheme
8.2 Dual bounds and Dual search line
8.3 Acceptable steps
8.4 Summary
8.5 Exercises: Long-Step Path-Following methods

9 How to construct self-concordant barriers
9.1 Appropriate mappings and Main Theorem
9.2 Barriers for epigraphs of functions of one variable
9.3 Fractional-Quadratic Substitution
9.4 Proof of Theorem 10.1
9.5 Proof of Proposition 10.1
9.6 Exercises on constructing self-concordant barriers
9.6.1 Epigraphs of functions of Euclidean norm
9.6.2 How to guess that − ln Det x is a self-concordant barrier
9.6.3 "Fractional-quadratic" cone and Truss Topology Design
9.6.4 Geometrical mean

10 Applications in Convex Programming
10.1 Linear Programming
10.2 Quadratically Constrained Quadratic Programming
10.3 Approximation in Lp norm
10.4 Geometrical Programming
10.5 Exercises on applications of interior point methods
10.5.1 (Inner) and (Outer) as convex programs
10.5.2 Problem (Inner), polyhedral case
10.5.3 Problem (Outer), polyhedral case
10.5.4 Problem (Outer), ellipsoidal case

11 Semidefinite Programming
11.1 A Semidefinite program
11.2 Semidefinite Programming: examples
11.2.1 Linear Programming
11.2.2 Quadratically Constrained Quadratic Programming
11.2.3 Minimization of Largest Eigenvalue and Lovasz Capacity of a graph
11.2.4 Dual bounds in Boolean Programming
11.2.5 Problems arising in Control
11.3 Interior point methods for Semidefinite Programming
11.4 Exercises on Semidefinite Programming
11.4.1 Sums of eigenvalues and singular values

Hints to exercises

Solutions to exercises


Chapter 1

Introduction to the Course

What we are about to study this semester is the theory and applications of interior point polynomial time methods in Convex Programming. Today, in the introductory lecture, I am not going to prove theorems or present algorithms. My goal is to explain what the course is about, what interior point methods are, and why so many researchers and practitioners are now deeply involved in this new area.

1.1 Some history


The modern theory of polynomial time interior point methods takes its origin in the seminal paper of Narendra Karmarkar published in 1984. Now, after 10 years, there are hundreds of researchers working in the area, and thousands of papers and preprints on the subject. The electronic bibliography on interior point methods collected and maintained by Dr. Eberhard Kranich, although far from complete, now contains over 1,500 entries. For the Optimization Community, which is not very large, this is a tremendous concentration of effort in a single area, certainly incomparable with anything that happened in previous years.
Although so far the majority of papers on interior point methods deal with theoretical issues, the practical yield is also very remarkable. It suffices to say that the Karmarkar algorithm for Linear Programming was used as the workhorse for US Army logistics planning (i.e., planning of all kinds of supplies) in the Gulf War. Another interior point method for Linear Programming, the so-called primal-dual one, forms the nucleus of OSL2, an extremely efficient and currently very popular software package. Let me quote G. Dantzig: "At the present time (1990), interior algorithms are in open competition with variants of the simplex methods"1). It means something when newborn methods can compete with the Simplex method, an extremely powerful method polished for almost 50 years by thousands of people.

1) History of Mathematical Programming, J.K. Lenstra, A.H.G. Rinnooy Kan, A. Schrijver, Eds., CWI, North-Holland, 1991.

Now let me switch from the style of advertisements to the normal one. What actually happened in 1984 was the appearance of a new iterative polynomial time algorithm for Linear Programming. We already know what "a polynomial time algorithm for LP" means: recall the lecture about the Ellipsoid method and the Khachiyan theorem on polynomial solvability of LP. As we remember, Khachiyan proved in 1979 that Linear Programming is polynomially solvable, namely, that an LP problem with rational coefficients, m inequality constraints and n variables can be solved exactly in O(n^3 (n + m) L) arithmetic operations, L being the input length of the problem, i.e., the total binary length of the numerical data specifying the problem instance. The new method of Karmarkar possessed the complexity bound of O(m^{3/2} n^2 L) operations. In the case of more or less "square" problems, m = O(n), which is standard for complexity analysis, the former estimate becomes O(n^4 L) and the latter O(n^{3.5} L). Thus,
there was some progress in complexity. And it can be said for sure that neither this moderate progress nor the remarkable elegance of the new algorithm could ever have caused a revolution in Optimization. What indeed was a sensation, what inspired extremely intensive activity in the new area and in a few years resulted in significant theoretical and computational progress, was the claim that in real-world computations the new algorithm was orders of magnitude more efficient than the Simplex method. Let me explain why this was a sensation. It is known that the Simplex method is not polynomial: there exist bad problem instances where the number of pivots grows exponentially with the dimension of the instance. Thus, any polynomial time algorithm for LP, be it the Ellipsoid method, the method of Karmarkar or whatever else, is certainly incomparably better than the Simplex method in its worst-case behaviour. But this is theoretical worst-case behaviour which, as almost 50 years of practice demonstrate, never occurs in real-world applications; from the practical viewpoint, the Simplex method is an extremely efficient algorithm with fairly low empirical complexity, and this is why it is able to solve very large-scale real-world LP problems in reasonable time. In contrast to this, the Ellipsoid method works more or less in accordance with its theoretical worst-case complexity bound, so that in practical computations this "theoretically good" method is by far dominated by the Simplex method even on very small problems with tens of variables and constraints. If the method of Karmarkar also behaved according to its theoretical complexity bound, it would be only slightly better than the Ellipsoid method and still incomparably worse than the Simplex method. The point, however, is that the actual behaviour of the method of Karmarkar turned out to be much better than predicted by the worst-case theoretical complexity bound. This phenomenon, combined with the theoretical advantages of a polynomial time algorithm (not the latter advantages alone, just as, I believe, not the empirical behaviour of the method alone), inspired an actual revolution in optimization which continues to this day and will hardly end in the near future.
I have said something about the birth of the "interior point science". As often happens in our field, it later turned out that this was its second birth; the first one was in 1967 in Russia, where Ilya Dikin, then a Ph.D. student of Leonid Kantorovich, invented what is now called the affine scaling algorithm for LP. This algorithm, which is hardly polynomial in theory, is a certain simplification of the method of Karmarkar which shares all the practical advantages of the basic Karmarkar algorithm; thus, as a computational tool, interior point methods have existed at least since 1967. A good question is why this computational tool, which is extremely fashionable now, was completely overlooked in the West as well as in Russia. I think this happened for two reasons. First, Dikin came too early, when there was no interest in iterative procedures for LP: a newborn iterative procedure, even one of great potential, could hardly overcome the perfectly polished Simplex method as a practical tool, and theoretical complexity issues did not bother optimization people in those years (even today we do not know whether the theoretical complexity of the Dikin algorithm is better than that of the Simplex method, and in 1967 the question itself could hardly have arisen). Second, the Dikin algorithm appeared in Russia, where there was neither a hardware base for Dikin to perform large-scale tests of his algorithm nor a "social demand" for solving large-scale LP problems, so it was almost impossible to realize the practical potential of the new algorithm and to convince people in Russia, let alone in the West, that it was worth attention.
Thus, although the prehistory of the interior point technique for LP started in 1967, the actual history of the subject started only in 1984. It would be impossible to outline the numerous significant contributions made to the field since then; it would require mentioning tens, if not hundreds, of authors. There is, however, one contribution which must be indicated explicitly. I mean the second cornerstone of the subject, the paper of James Renegar (1986), where the first path-following polynomial time interior point method for LP was developed. The efficiency estimate of this method was better than that of the method of Karmarkar, namely, O(n^3 L)2), cubic in the dimension, the same as for classical methods of solving systems of linear equations; up to now this is the best known theoretical complexity bound for LP. Besides this remarkable theoretical advantage, the method of Renegar possesses an important advantage in, let me say, the human dimension: the method belongs to a classical scheme well known in Optimization, in contrast to the rather unusual Ellipsoid and Karmarkar algorithms. The paper of Renegar was extremely important for the understanding of the new methods, and, together with the slightly later independent paper of Clovis Gonzaga with a close result, it brought the area into a position very favourable for future developments.
So far I have been speaking about interior point methods for Linear Programming, and this reflects the actual history of the subject: not only were the first interior point methods developed for this case, but until very recently the main activity in the field, both theoretical and computational, was focused on Linear Programming and the closely related Linearly constrained Quadratic Programming. Extending the approach to more general classes of problems was a real challenge: the original constructions and proofs heavily exploited the polyhedral structure of the feasible domain of an LP problem, and passing to the nonlinear case required understanding the deep intrinsic nature of the methods. This latter problem was solved in a series of papers by Yurii Nesterov in 1988; the ideas of these papers form the basis of the theory this course is devoted to, a theory which has now become a kind of standard for the unified explanation and development of polynomial time interior point algorithms for convex problems, both linear and nonlinear. The goal of my course is to present this theory and its applications. In the remaining part of this introductory lecture I am going to explain what we are looking for and what our general strategy will be.

2) Recall that we are speaking about "almost square" problems, with the number m of inequalities of the order of the number n of variables.

1.2 The goal: polynomial time methods


I have declared that the purpose of the theory to be presented is to develop polynomial time algorithms for convex problems. Let me start by explaining what a polynomial time method is. Consider a family of convex problems

$$(p):\quad \text{minimize } f(x) \quad \text{s.t.} \quad g_j(x) \le 0,\ j = 1, \dots, m, \quad x \in G$$

of a given analytical structure, like the family of LP problems, or Linearly constrained Quadratic problems, or Quadratically constrained Quadratic ones, etc. The only formal assumption on the family is that a problem instance p from it is identified by a finite-dimensional data vector D(p); normally you can understand this vector as the collection of the numerical coefficients in the analytical expressions for the objective and the constraints; these expressions themselves are fixed by the description of the family. The dimension of the data vector is called the size l(p) of the problem instance. A numerical method for solving problems from the family is a routine which, given the data vector as input, generates a sequence of approximate solutions to the problem in such a way that each of these solutions is obtained in finitely many operations of precise real arithmetic, like the four arithmetic operations, taking square roots, exponents, logarithms and other elementary functions; each operand in an operation is either an entry of the data vector or the result of one of the preceding operations. We call a numerical method convergent if, for any positive ε and for any problem instance p from the family, the approximate solutions x_i generated by the method, starting with a certain i = i∗(ε, p), are ε-solutions to the problem, i.e., they belong to G and satisfy the relations

$$f(x_i) - f^* \le \varepsilon, \qquad g_j(x_i) \le \varepsilon, \quad j = 1, \dots, m,$$

(f∗ is the optimal value in the problem). We call a method polynomial if it is convergent and the arithmetic cost C(ε, p) of an ε-solution, i.e., the total number of arithmetic operations at the first i∗(ε, p) steps of the method as applied to p, admits an upper bound as follows:

$$C(\varepsilon, p) \le \pi(l(p)) \ln\left(\frac{V(p)}{\varepsilon}\right),$$

where π is a polynomial independent of the data and V(p) is a data-dependent scale factor. The ratio V(p)/ε can be interpreted as the relative accuracy which corresponds to the absolute accuracy ε, and the quantity ln(V(p)/ε) can be thought of as the number of accuracy digits in an ε-solution. With this interpretation, the polynomiality of a method means that for this method the arithmetic cost of an accuracy digit is bounded from above by a polynomial of the problem size, and this polynomial can be thought of as the characteristic of the complexity of the method.
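To make the two definitions above concrete, here is a minimal Python sketch. The helper names `is_eps_solution` and `cost_bound` are hypothetical, and the instance in the usage lines (minimize x subject to -x ≤ 0 and x - 1 ≤ 0, so f∗ = 0) is our own illustrative choice. The second helper shows why ln(V(p)/ε) behaves like a count of accuracy digits: asking for twice as many digits doubles the bound, whatever the value of π(l(p)). Natural and decimal logarithms differ only by the constant factor ln 10, which can be absorbed into π.

```python
import math
from typing import Callable, Sequence

def is_eps_solution(x, f: Callable, gs: Sequence[Callable], f_star: float,
                    eps: float, in_G: Callable[[object], bool]) -> bool:
    """Check the eps-solution conditions: x in G, f(x) - f* <= eps, g_j(x) <= eps."""
    if not in_G(x):
        return False
    if f(x) - f_star > eps:
        return False
    return all(g(x) <= eps for g in gs)

def cost_bound(pi_of_size: float, scale: float, eps: float) -> float:
    """Upper bound pi(l(p)) * ln(V(p)/eps) on the arithmetic cost of an eps-solution."""
    return pi_of_size * math.log(scale / eps)

# Hypothetical instance: minimize x s.t. -x <= 0, x - 1 <= 0 (so f* = 0), G = R.
f = lambda x: x
gs = [lambda x: -x, lambda x: x - 1.0]
in_G = lambda x: True

print(is_eps_solution(1e-4, f, gs, 0.0, 1e-3, in_G))   # objective within eps of f*
print(is_eps_solution(-1e-4, f, gs, 0.0, 1e-3, in_G))  # slightly infeasible, but eps-feasible
print(is_eps_solution(1e-2, f, gs, 0.0, 1e-3, in_G))   # objective gap exceeds eps
print(cost_bound(1.0, 1.0, 1e-6) / cost_bound(1.0, 1.0, 1e-3))  # twice the digits
```

Note that an ε-solution may violate the constraints by up to ε, which is why the second call is accepted.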
It is reasonable to compare this approach with the information-based approach we dealt with in the previous course. In information-based complexity theory the problem was assumed to be represented by an oracle, a black box, so that a method, when starting its work, had no information on the instance; this information was accumulated via sequential calls to the oracle, and the number of these calls sufficient to find an ε-solution was thought of as the complexity of the method; we included in this complexity neither the computational effort of the oracle nor the arithmetic cost of processing the oracle's answers by the method. In contrast to this, in our present approach the data specifying the problem instance form the input to the method, so that the method from the very beginning possesses complete global information on the problem instance. What the method should do is transform this input information into an ε-solution to the problem, and the complexity of the method (which now might be called algorithmic or combinatorial complexity) is defined by the arithmetic cost of this transformation. Clearly, our new approach is not as general as the information-based one, since now we can speak only about families of problems of a reasonable analytic structure (otherwise the notion of the data vector becomes senseless). In compensation, combinatorial complexity is a much more adequate measure of the actual computational effort than information-based complexity.
Having outlined our final goal, let me give you an idea of how this goal will be achieved. In what follows we will develop methods of two different types: the path-following and the potential reduction ones; the LP prototypes of these methods are, respectively, the methods of Renegar and Gonzaga, which are path-following routines, and the method of Karmarkar, which is a potential reduction one. In contrast to the actual historical order, we shall start with the quite traditional path-following scheme, since we are not yet prepared to understand what in fact happens in methods of the Karmarkar type.

1.3 The path-following scheme


The, let me say, "classical" stage in developing the scheme is summarized in the seminal monograph of Fiacco and McCormick (1967). Assume we intend to solve a convex program

$$(P):\quad \text{minimize } f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ i = 1, \dots, m$$

associated with smooth (at least twice continuously differentiable) convex functions f, g_i on R^n. Let

$$G = \{x \in \mathbb{R}^n \mid g_i(x) \le 0,\ i = 1, \dots, m\}$$

be the feasible domain of the problem; assume for the sake of simplicity that this domain is bounded, and let the constraints {g_i} satisfy the Slater condition:

$$\exists x:\ g_i(x) < 0, \quad i = 1, \dots, m.$$

Under these assumptions the feasible domain G is a solid, i.e., a closed and bounded convex set in R^n with a nonempty interior.
In the 1960s people believed that it was not difficult to solve unconstrained smooth convex problems, and it was very natural to try to reduce the constrained problem (P) to a series of unconstrained problems. To this end it was suggested to associate with the feasible domain G of problem (P) a barrier, i.e., an interior penalty function F(x): a smooth convex function F defined on the interior of G and tending to ∞ as we approach the boundary of G from inside:

$$\lim_{i \to \infty} F(x_i) = \infty \quad \text{for any sequence } \{x_i \in \operatorname{int} G\} \text{ with } \lim_{i \to \infty} x_i \in \partial G.$$

It is also reasonable to assume that F is nondegenerate, i.e.,

$$F''(x) > 0, \quad x \in \operatorname{int} G$$

(here > 0 stands for "positive definite").
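As a one-dimensional illustration of both requirements (our own example, not taken from the text), let n = 1 and let the constraints be g_1(x) = -x and g_2(x) = x - 1, so that G = [0, 1] and the Slater condition holds at x = 1/2. The logarithmic aggregate of these constraints is a barrier in the above sense, and it is nondegenerate:

```latex
\begin{aligned}
F(x)   &= -\ln x - \ln(1 - x), \qquad x \in \operatorname{int} G = (0, 1),\\
F''(x) &= \frac{1}{x^2} + \frac{1}{(1 - x)^2} > 0 \quad \text{for all } x \in (0, 1),\\
F(x_i) &\to +\infty \quad \text{whenever } x_i \in (0, 1) \text{ and } x_i \to 0 \text{ or } x_i \to 1.
\end{aligned}
```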


Given such a barrier, one can associate with it and with the objective f of (P) the barrier-generated family comprised of the problems

$$(P_t):\quad \text{minimize } F_t(x) \equiv t f(x) + F(x).$$

Here the penalty parameter t is positive. Of course, x in (P_t) is subject to the "induced" restriction x ∈ int G, since F_t is undefined outside the latter set.
From our assumptions on G it immediately follows that
a) each of the problems (P_t) has a unique solution x∗(t); this solution, of course, lies in the interior of G;
b) the path x∗(t) of solutions to (P_t) is a continuous function of t ∈ [0, ∞), and all its limit points as t → ∞ belong to the set of optimal solutions to (P).
It immediately follows that if we are able to follow the path x∗(t) along a sequence t_i → ∞ of values of the penalty parameter, i.e., know how to form "good enough" approximations x_i ∈ int G to the points x∗(t_i), say, such that

$$x_i - x^*(t_i) \to 0, \quad i \to \infty, \tag{1.1}$$

then we know how to solve (P): b) and (1.1) imply that all limit points of the sequence of our iterates {x_i} belong to the optimal set of (P).
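A hypothetical one-dimensional instance makes the path x∗(t) explicit. Take (P) to be: minimize f(x) = x subject to -x ≤ 0 and x - 1 ≤ 0 (so G = [0, 1] and the optimal solution is x = 0), with the barrier F(x) = -ln x - ln(1 - x). Setting F_t'(x) = t - 1/x + 1/(1 - x) to zero gives t x^2 - (t + 2) x + 1 = 0, whose root in (0, 1) can be written without cancellation as x∗(t) = 2 / (t + 2 + sqrt(t^2 + 4)). The sketch below (function names are our own) evaluates this formula and checks stationarity; x∗(t) equals 1/2 in the limit t → 0 and decreases to 0 as t → ∞, in line with property b).

```python
import math

def central_point(t: float) -> float:
    """Minimizer x*(t) of F_t(x) = t*x - ln(x) - ln(1 - x) on (0, 1), for t > 0.

    F_t'(x) = 0 is equivalent to t*x**2 - (t + 2)*x + 1 = 0; the smaller root,
    rewritten via the conjugate to avoid cancellation for large t, is returned.
    """
    return 2.0 / (t + 2.0 + math.sqrt(t * t + 4.0))

def dF_t(t: float, x: float) -> float:
    """Derivative of F_t; it vanishes exactly at x*(t)."""
    return t - 1.0 / x + 1.0 / (1.0 - x)

for t in [1.0, 10.0, 100.0, 1000.0]:
    x = central_point(t)
    print(f"t={t:7.1f}  x*(t)={x:.6f}  F_t'(x*(t))={dF_t(t, x):.1e}")
```

In this instance the optimality gap f(x∗(t)) - f∗ = x∗(t) never exceeds 1/t, because t + 2 + sqrt(t^2 + 4) > 2t.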
Now, being able to meet requirement (1.1) is basically the same as being able to solve each of the "penalized" problems (P_t) to a prescribed accuracy. What are our abilities in this respect? (P_t) is a minimization problem with a smooth and nondegenerate (i.e., with nonsingular Hessian) objective. Of course, this objective is defined on a proper open convex subset of R^n rather than on the whole R^n, so that the problem, rigorously speaking, is a constrained one, the same as the initial problem (P). The constrained nature of (P_t) is, however, nothing but an illusion: the solution to the problem is unique and belongs to the interior of G, and any converging minimization method of a relaxation type (i.e., monotonically decreasing the value of the objective along the sequence of iterates) started at an interior point of G would automatically keep the iterates away from the boundary of G (since F_t → ∞ together with F as the argument approaches the boundary from inside); thus, qualitatively speaking, the behaviour of the method as applied to (P_t) would be the same as if the objective F_t were defined everywhere. In other words, we have basically the same possibilities to solve (P_t) as if it were an unconstrained problem with a smooth and nondegenerate objective. Thus, the outlined path-following scheme indeed achieves our goal: it reduces the constrained problem (P) to a series of in fact unconstrained problems (P_t).
We have outlined our abilities to solve every particular problem (P_t) to a prescribed accuracy: to this end we can apply to the problem any relaxation iterative routine for smooth unconstrained minimization, starting the routine from an interior point of G. What we need, however, is to solve not a single problem from the family, but a sequence of these problems associated with a sequence of values of the penalty parameter tending to ∞. Of course, in principle we could choose an arbitrary sequence {t_i} and solve each of the problems (P_{t_i}) independently, but clearly this makes no sense. What does make sense is to use the approximate solution x_i to the "previous" problem (P_{t_i}) as the starting point when solving the "new" problem (P_{t_{i+1}}). Since x∗(t), as we have just mentioned, is a continuous function of t, a good approximate solution to the previous problem will be a good initial point for solving the new one, provided that t_{i+1} − t_i is not too large; this latter assumption can be ensured by a proper policy of updating the penalty parameter.
To implement the aforementioned scheme, one should specify its main blocks, namely, choose:
1) the barrier F;
2) the "working horse", i.e., the unconstrained minimization method for solving the problems (P_t), along with the stopping criterion for the method;
3) the policy for updating the penalty parameter.
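The three blocks can be assembled into a short sketch for the special case of linear constraints g_i(x) = a_i^T x - b_i (so G = {x : Ax ≤ b}) and a linear objective f(x) = c^T x. The specific choices are our own assumptions, not recommendations from the text: the logarithmic barrier for block 1; a damped Newton method with backtracking for block 2 (trial points outside int G are rejected, so it is a relaxation method that keeps the iterates interior); and the geometric update t ← μt with μ = 10 for block 3. All function names and tolerances are hypothetical, and NumPy is assumed to be available. Since G is bounded, A has full column rank, so the barrier Hessian is nonsingular.

```python
import numpy as np

def barrier_derivs(A, b, x):
    """Block 1: log barrier F(x) = -sum ln(b - A x) with its gradient and Hessian."""
    s = b - A @ x                              # slacks; positive inside G
    val = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                     # sum_i a_i / s_i
    hess = (A.T * (1.0 / s**2)) @ A            # sum_i a_i a_i^T / s_i^2
    return val, grad, hess

def newton_stage(c, A, b, x, t, tol=1e-9, max_iter=100):
    """Block 2: damped Newton for F_t(x) = t c^T x + F(x) from a strictly feasible x."""
    def Ft(z):
        return t * (c @ z) - np.sum(np.log(b - A @ z))
    for k in range(max_iter):
        _, g, H = barrier_derivs(A, b, x)
        grad = t * c + g
        step = np.linalg.solve(H, -grad)       # Newton direction
        lam_sq = -(grad @ step)                # squared Newton decrement
        if lam_sq / 2 <= tol:                  # stopping criterion
            return x, k
        alpha, fx = 1.0, Ft(x)
        while True:                            # backtracking: stay interior, decrease F_t
            trial = x + alpha * step
            if np.all(b - A @ trial > 0) and Ft(trial) <= fx - 0.25 * alpha * lam_sq:
                break
            alpha *= 0.5
            if alpha < 1e-16:
                raise RuntimeError("line search failed")
        x = trial
    return x, max_iter

def path_following(c, A, b, x0, t0=1.0, mu=10.0, t_final=1e8):
    """Block 3: follow x*(t) along t_{i+1} = mu * t_i, warm-starting each stage."""
    x, t, newton_steps = x0, t0, 0
    while True:
        x, k = newton_stage(c, A, b, x, t)     # start from the previous approximate solution
        newton_steps += k
        if t >= t_final:
            return x, newton_steps
        t *= mu

# Hypothetical instance: minimize x1 + x2 over the unit square [0, 1]^2.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])
c = np.array([1.0, 1.0])
x, steps = path_following(c, A, b, x0=np.array([0.5, 0.5]))
print(x, c @ x, steps)
```

The warm start in `path_following` is exactly the point made in the text: each stage begins at the approximate solution of the previous one rather than from scratch.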
The traditional recommendations here were rather diffuse. The qualitative theory insisted on at least C^2-smoothness and nondegeneracy of the barrier, and this was basically all; within this class of barriers, there were no clear theoretical priorities. What people were advised to do was
for 1): to choose F as a certain "smoothness-preserving" aggregate of the g_i, e.g.,

$$F(x) = \sum_{i=1}^{m} \left(\frac{1}{-g_i(x)}\right)^{\alpha} \tag{1.2}$$

with some α > 0, or

$$F(x) = -\sum_{i=1}^{m} \ln(-g_i(x)), \tag{1.3}$$

or something else of this type; the idea was that the local information on this barrier required by the "working horse" should be easily computed via similar information on the constraints g_i;
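The point about local information can be made explicit. If the values g_i(x), gradients g_i'(x) and Hessians g_i''(x) are available at a point with all g_i(x) < 0, then, writing u_i = -g_i(x) > 0, the chain rule gives for (1.3): F' = Σ g_i'/u_i and F'' = Σ [g_i''/u_i + g_i' g_i'^T / u_i^2]; and for (1.2): F' = Σ α u_i^{-α-1} g_i' and F'' = Σ [α u_i^{-α-1} g_i'' + α(α+1) u_i^{-α-2} g_i' g_i'^T]. A minimal NumPy sketch of this assembly follows; the function names and the sample constraint g(x) = x^T x - 1 (the unit ball) are hypothetical.

```python
import numpy as np

def log_barrier(gvals, ggrads, ghess):
    """(1.3): F = -sum ln(-g_i), assembled from the local data of the g_i."""
    n = len(ggrads[0])
    val, grad, hess = 0.0, np.zeros(n), np.zeros((n, n))
    for g, dg, d2g in zip(gvals, ggrads, ghess):
        u = -g                                   # must be > 0 (strict feasibility)
        val -= np.log(u)
        grad += dg / u
        hess += d2g / u + np.outer(dg, dg) / u**2
    return val, grad, hess

def power_barrier(gvals, ggrads, ghess, alpha=1.0):
    """(1.2): F = sum (-g_i)^(-alpha), alpha > 0, from the same local data."""
    n = len(ggrads[0])
    val, grad, hess = 0.0, np.zeros(n), np.zeros((n, n))
    for g, dg, d2g in zip(gvals, ggrads, ghess):
        u = -g
        val += u ** (-alpha)
        grad += alpha * u ** (-alpha - 1) * dg
        hess += (alpha * u ** (-alpha - 1) * d2g
                 + alpha * (alpha + 1) * u ** (-alpha - 2) * np.outer(dg, dg))
    return val, grad, hess

def ball_data(x):
    """Hypothetical single constraint g(x) = x.x - 1 with its gradient and Hessian."""
    return [x @ x - 1.0], [2.0 * x], [2.0 * np.eye(len(x))]

x = np.array([0.3, -0.2])
print(log_barrier(*ball_data(x)))
print(power_barrier(*ball_data(x), alpha=2.0))
```

Both barriers need nothing beyond first and second derivatives of the constraints, which is exactly what a Newton-type working horse consumes.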
for 2): to choose the Newton method as the "working horse"; this recommendation came from computational experience and had no serious theoretical justification;
for 3): qualitatively, by updating the penalty at a high rate, we reduce the number of auxiliary unconstrained problems at the cost of making each of them harder (since for large t_{i+1} − t_i a good approximation of x∗(t_i) may be a bad starting point for solving the updated problem); a low rate of updating the penalty simplifies the auxiliary problems and increases the number of problems to be solved before the prescribed value of the penalty (which corresponds to the required accuracy of solving (P)) is achieved. The traditional theory was unable to offer explicit recommendations on the "balanced" rate resulting in the optimal overall effort, and this question was normally settled on the basis of "computational experience".
What was said looks very natural and has been known for more than 30 years. Nevertheless, the classical results on the path-following scheme have nothing in common with polynomial complexity bounds, and not only because in the 1960s nobody bothered about polynomiality: even once this question is posed, the traditional results do not allow one to answer it affirmatively. The reason is as follows: to perform the complexity analysis of the path-following scheme, one needs not only qualitative information like "the Newton method, as applied to a smooth convex function with nondegenerate Hessian, converges quadratically, provided that the starting point is close enough to the minimizer of the objective", but also quantitative information: how close is "close enough"? Results of this latter type also existed and everybody in Optimization knew them, but they did not help much. Indeed, the typical quantitative result on the behaviour of the Newton optimization method was as follows:
let $\phi$ be a twice continuously differentiable ($C^2$) convex function defined in the Euclidean ball $V$ of radius $R$ centered at $x^*$ and attaining its minimum at $x^*$, such that
$\phi''(x^*)$ is nondegenerate with spectrum in a segment $[L_0, L_1]$, $0 < L_0 < L_1$;
$\phi''(x)$ is Lipschitz continuous at $x^*$ with a constant $L_2$:
$$|\phi''(x) - \phi''(x^*)| \le L_2 |x - x^*|, \quad x \in V.$$
Then there exist
$$\rho = \rho(R, L_0, L_1, L_2) > 0, \qquad c = c(R, L_0, L_1, L_2)$$
such that the Newton iterate
$$x^+ = x - [\phi''(x)]^{-1}\phi'(x)$$
of a point $x$ satisfies the relation
$$|x^+ - x^*| \le c\,|x - x^*|^2, \tag{1.4}$$
provided that
$$|x - x^*| \le \rho.$$
The functions $\rho(\cdot)$ and $c(\cdot)$ can be written down explicitly, and the statement itself can be modified and slightly strengthened, but this does not matter for us: the point is the structure of the traditional results on the Newton method, not the results themselves. These results are local: the quantitative description of the convergence properties of the method is given in terms of the parameters responsible for smoothness and nondegeneracy of the objective, and both the "constant factor" $c$ in the rate-of-convergence expression (1.4) and the size $\rho$ of the "domain of quadratic convergence" become worse and worse as these parameters of smoothness and nondegeneracy of the objective become worse. This is the structure of the traditional rate-of-convergence results for the Newton method; the structure of the traditional results on any other standard method for smooth unconstrained optimization is completely similar: these results always involve some data-dependent parameters of smoothness and/or nondegeneracy of the objective, and the quantitative description of the rate of convergence always deteriorates as these parameters deteriorate.
Now it is easy to see why the traditional rate-of-convergence results for our candidate "working horses" (the Newton method or something else) do not allow one to establish polynomiality of the path-following scheme. As the method goes on, the parameters of smoothness and nondegeneracy of our auxiliary objectives $F_t$ inevitably become worse and worse: if the solution to $(P)$ is on the boundary of $G$, and this is the only case of interest in constrained minimization, the minimizers $x^*(t)$ of $F_t$ approach the boundary of $G$ as $t$ grows, and the behaviour of $F_t$ in a neighbourhood of $x^*(t)$ becomes less and less regular (indeed, for large $t$ the function $F_t$ goes to $\infty$ very close to $x^*(t)$). Since the parameters of smoothness/nondegeneracy of $F_t$ become worse and worse as $t$ grows, the auxiliary problems, from the traditional viewpoint, become quantitatively more and more complicated, and the progress in accuracy (# of new digits of accuracy per unit computational effort) tends to 0 as the method goes on.
The seminal contribution of Renegar and Gonzaga was to demonstrate that the above scheme, applied to a Linear Programming problem
$$\text{minimize } f(x) = c^T x \ \text{ s.t. } g_j(x) \equiv a_j^T x - b_j \le 0,\ j = 1, ..., m,\ x \in \mathbb{R}^n$$
and to a concrete barrier for the feasible domain $G$ of the problem, namely the standard logarithmic barrier
$$F(x) = -\sum_{j=1}^m \ln(b_j - a_j^T x)$$
for the polytope $G$, is polynomial.


More specifically, it was proved that the method
$$t_{i+1} = \left(1 + \frac{0.001}{\sqrt{m}}\right) t_i; \qquad x_{i+1} = x_i - [\nabla_x^2 F_{t_{i+1}}(x_i)]^{-1} \nabla_x F_{t_{i+1}}(x_i) \tag{1.5}$$
(a single Newton step per each step in the penalty parameter) keeps the iterates in the interior of $G$, maintains the "closeness relation"
$$F_{t_i}(x_i) - \min F_{t_i} \le 0.01$$
(provided that this relation was satisfied by the initial pair $(t_0, x_0)$) and ensures a linear data-independent rate of convergence
$$f(x_i) - f^* \le 2m t_i^{-1} \le 2m t_0^{-1} \exp\{-O(1)\, i\, m^{-1/2}\}. \tag{1.6}$$
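A minimal sketch of the short-step rule (1.5), under stated assumptions: we take $F_t(x) = t\,c^Tx + F(x)$ with $F$ the standard logarithmic barrier (the usual penalized objective of the scheme above), and the box LP, the starting penalty $t_0 = 0.1$ and the number of steps are purely illustrative choices; the helper names are hypothetical. The run only illustrates (1.6) on one instance, it proves nothing.

```python
import numpy as np

def penalized_derivs(A, b, c, t, x):
    """Gradient and Hessian of F_t(x) = t*c^T x - sum_j ln(b_j - a_j^T x)."""
    s = b - A @ x                                  # slacks; positive inside G
    grad = t * c + A.T @ (1.0 / s)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)       # A^T diag(1/s^2) A
    return grad, hess

def newton_step(A, b, c, t, x):
    g, H = penalized_derivs(A, b, c, t, x)
    return x - np.linalg.solve(H, g)

def short_step_path_following(A, b, c, x0, t0, n_steps):
    """Rule (1.5): multiply t by (1 + 0.001/sqrt(m)), then take one Newton step."""
    m = A.shape[0]
    x, t = np.asarray(x0, dtype=float), float(t0)
    for _ in range(n_steps):
        t *= 1.0 + 0.001 / np.sqrt(m)
        x = newton_step(A, b, c, t, x)
    return x, t

# Illustrative LP: minimize x1 + x2 over the unit square, so f* = 0 and m = 4.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])
c = np.array([1.0, 1.0])

# Start close to x*(t0): a few pure Newton steps on F_{t0} from the centre of the square.
t0 = 0.1
x0 = np.array([0.5, 0.5])
for _ in range(20):
    x0 = newton_step(A, b, c, t0, x0)

x, t = short_step_path_following(A, b, c, x0, t0, n_steps=20000)
print(t, c @ x, 2 * A.shape[0] / t)   # gap c^T x - f* versus the bound 2m/t of (1.6)
```

The tiny rate $0.001/\sqrt{m}$ is deliberately conservative, which is why many thousands of steps are needed even on this toy problem.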

Thus, in spite of the above discussion, it turned out that for the particular barrier in question the path-following scheme is polynomial: the penalty can be increased at a constant rate $(1 + 0.001 m^{-1/2})$ depending only on the size of the problem instance, and each step in the penalty should be accompanied by a single Newton step in $x$. According to (1.6), the absolute inaccuracy is inversely proportional to the penalty parameter, so that to add an extra accuracy digit it suffices to increase the parameter by an absolute constant factor, which, in view of the description of the method, takes $O(\sqrt{m})$ steps. Thus, the Newton complexity (the # of Newton steps) of finding an $\varepsilon$-solution is
$$N(\varepsilon, p) = O(\sqrt{m}) \ln\left(\frac{\mathcal{V}(p)}{\varepsilon}\right), \tag{1.7}$$
and since each Newton step, as is easily seen, costs $O(mn^2)$ operations, the combinatorial complexity of the method turns out to be polynomial, namely,
$$C(\varepsilon, p) \le O(m^{1.5} n^2) \ln\left(\frac{\mathcal{V}(p)}{\varepsilon}\right).$$
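The arithmetic behind the $O(\sqrt{m})$ count can be checked directly: the number of penalty updates of the form (1.5) needed to multiply $t$ by 10 (by (1.6), roughly one extra accuracy digit) is $\lceil \ln 10 / \ln(1 + 0.001\,m^{-1/2}) \rceil \approx 1000 \ln(10)\sqrt{m} \approx 2303\sqrt{m}$. A small sketch; the function name is ours.

```python
import math

def steps_per_digit(m: int) -> int:
    """Updates t <- (1 + 0.001/sqrt(m)) t needed to increase t tenfold."""
    return math.ceil(math.log(10.0) / math.log1p(0.001 / math.sqrt(m)))

for m in (10, 100, 10_000):
    k = steps_per_digit(m)
    print(m, k, k / math.sqrt(m))   # the last column stays close to 1000*ln(10)
```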

1.4 What is inside: self-concordance


Needless to say, the proofs of the announced results given by Renegar and Gonzaga were completely non-standard and heavily exploited the specific form of the logarithmic barrier for the polytope. The same can be said about subsequent papers devoted to the Linear Programming case. The key to nonlinear extensions, found by Yurii Nesterov, was the realization that among the various properties of the logarithmic barrier for a polytope, only two are in fact responsible for the polynomiality of the path-following methods associated with this polytope. These properties are expressed by the following pair of differential inequalities:
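[self-concordance]:
$$\left|\frac{d^3}{dt^3}\Big|_{t=0} F(x + th)\right| \le 2\left(\frac{d^2}{dt^2}\Big|_{t=0} F(x + th)\right)^{3/2} \quad \forall h\ \forall x \in \operatorname{int} G,$$
[finiteness of the barrier parameter]:
$$\exists \vartheta < \infty:\ \left|\frac{d}{dt}\Big|_{t=0} F(x + th)\right| \le \vartheta^{1/2}\left(\frac{d^2}{dt^2}\Big|_{t=0} F(x + th)\right)^{1/2} \quad \forall h\ \forall x \in \operatorname{int} G.$$
For the logarithmic barrier of a polytope given by $m$ linear inequalities, the second relation is in fact satisfied with $\vartheta = m$.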
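For the polytope barrier both inequalities can be checked in closed form: along $x + th$, with slacks $s_j = b_j - a_j^Tx$ and $u_j = a_j^Th/s_j$, the first three derivatives at $t = 0$ are $\sum_j u_j$, $\sum_j u_j^2$ and $2\sum_j u_j^3$. Self-concordance then reads $|\sum_j u_j^3| \le (\sum_j u_j^2)^{3/2}$, and $\vartheta = m$ is the Cauchy–Schwarz inequality $(\sum_j u_j)^2 \le m\sum_j u_j^2$. The sketch below probes both on a random polytope chosen purely for illustration.

```python
import numpy as np

def line_derivatives(A, b, x, h):
    """Derivatives at t = 0 of t -> F(x + t h), F(x) = -sum_j ln(b_j - a_j^T x)."""
    u = (A @ h) / (b - A @ x)          # u_j = a_j^T h / s_j
    return u.sum(), (u**2).sum(), 2.0 * (u**3).sum()

rng = np.random.default_rng(0)
m, n = 30, 5
A = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, size=m)      # x = 0 is strictly feasible
x = np.zeros(n)

worst_sc, worst_theta = 0.0, 0.0
for _ in range(1000):
    h = rng.standard_normal(n)
    d1, d2, d3 = line_derivatives(A, b, x, h)
    worst_sc = max(worst_sc, abs(d3) / d2**1.5)     # self-concordance: at most 2
    worst_theta = max(worst_theta, d1**2 / d2)       # barrier parameter: at most m
print(worst_sc, worst_theta)
```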


I am not going to comment on these properties now; this is the goal of the forthcoming lectures. What should be said is that these properties do not refer explicitly to the polyhedral structure of $G$. Given an arbitrary solid $G$, not necessarily polyhedral, one can try to find for this solid a barrier $F$ with the indicated properties. It turns out that such a self-concordant barrier always exists; moreover, in many important cases it can be written down in explicit and "computable" form. And the essence of the theory is that
given a self-concordant barrier $F$ for a solid $G$, one can associate with this barrier interior-point methods for minimizing linear objectives over $G$ in exactly the same manner as in the case when $G$ is a polytope and $F$ is the standard logarithmic barrier for $G$. E.g., to get a path-following method, it suffices to replace in the relations (1.5) the standard logarithmic barrier for a polytope with the given self-concordant barrier for the solid $G$, and the quantity $m$ with the parameter $\vartheta$ of the latter barrier, with the similar substitution $m \Leftarrow \vartheta$ in the expression for the Newton complexity of the method.
In particular, if $F$ is "polynomially computable", so that its gradient and Hessian at a given point can be computed at a polynomial arithmetic cost, then the path-following method associated with $F$ turns out to be polynomial.
Note that in the above claim I spoke about minimizing linear objectives only. This does not cause any loss of generality, since, given a general convex problem
$$\text{minimize } f(u) \ \text{ s.t. } g_j(u) \le 0,\ j = 1, ..., m,\ u \in Q \subset \mathbb{R}^k,$$
one can always pass from it to the equivalent problem
$$\text{minimize } t \ \text{ s.t. } x \equiv (t, u) \in G \equiv \{(t, u) \mid f(u) - t \le 0,\ g_j(u) \le 0,\ j = 1, ..., m,\ u \in Q\}$$
of minimizing a linear objective over a convex set. Thus, the possibilities to solve convex problems by interior point polynomial time methods are restricted only by our ability to point out "explicit polynomially computable" self-concordant barriers for the corresponding feasible domains, which normally is not so difficult.

1.5 Structure of the course


I hope you now have a preliminary impression of what we are going to do. More specifically, our plans are as follows.
1) First of all, we should study the basic properties of self-concordant functions and barriers; these properties underlie all our future constructions and proofs. This preliminary part of the course is technical; I hope we shall survive the technicalities, which, I think, will take two lectures.
2) As an immediate consequence of our technical effort, we shall find ourselves in a fine position to develop and study path-following interior point methods for convex problems, and this will be the first application of our theory.
3) To extend to the nonlinear case another group of interior point methods known for LP, the potential reduction ones (like the method of Karmarkar), we start with a specific geometry which is very interesting in its own right: the conic formulation of a Convex Programming problem and Conic Duality. After developing the corresponding geometrical tools, we will be in a position to develop potential reduction methods for general convex problems.
4) The outlined "general" part of the course is, in a sense, conditional: the typical statements here claim that, given a "good" (self-concordant) barrier for the feasible domain of the problem in question, you should act in such and such a way and will obtain such and such a polynomial efficiency estimate. As far as applications are concerned, these general schemes should, of course, be accompanied by techniques for constructing the required "good" barriers. These techniques are developed in the second part of the course. Applying them together with our general schemes, we shall come to concrete "ready-to-use" interior point polynomial time algorithms for a series of important classes of Convex Programming problems, including, besides Linear Programming, Linearly constrained Quadratic Programming, Quadratically constrained Quadratic Programming, Geometrical Programming, Optimization over the cone of positive semidefinite matrices, etc.
Chapter 2

Self-concordant functions

In this lecture I introduce the main concept of the theory in question: the notion of a self-concordant function. The goal is to define a family of smooth convex functions convenient for minimization by the Newton method. Recall that a step of the Newton method as applied to the problem of (unconstrained) minimization of a smooth convex function $f$ is based on the following rule:
in order to find the Newton iterate of a point $x$, compute the second-order Taylor expansion of $f$ at $x$, find the minimizer $\hat{x}$ of this expansion and perform a step from $x$ along the direction $\hat{x} - x$.
What the step should be depends on the version of the method: in the pure Newton routine the iterate is exactly $\hat{x}$; in the relaxation version of the method one minimizes $f$ along the ray emanating from $x$ through $\hat{x}$, etc.
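A minimal sketch of the pure Newton iterate, on a hypothetical smooth convex test function $f(x) = \ln\sum_i e^{x_i} + \tfrac12|x|^2$ (our choice, not from the notes). The check at the end also illustrates the affine invariance discussed in the next paragraph: after an invertible change of variables $x = By + d$, the Newton iterate computed in $y$ maps exactly onto the Newton iterate computed in $x$, while a plain gradient step does not.

```python
import numpy as np

def f_grad(x):
    """Gradient of the test function f(x) = log(sum(exp(x))) + 0.5*|x|^2."""
    p = np.exp(x - x.max())
    p /= p.sum()
    return p + x

def f_hess(x):
    p = np.exp(x - x.max())
    p /= p.sum()
    return np.diag(p) - np.outer(p, p) + np.eye(x.size)

def newton_iterate(grad, hess, x):
    """Pure Newton iterate: the minimizer of the second-order Taylor expansion at x."""
    return x - np.linalg.solve(hess(x), grad(x))

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n)) + 3.0 * np.eye(n)   # change of variables, invertible almost surely
d = rng.standard_normal(n)
y = rng.standard_normal(n)
x = B @ y + d

def g_grad(v):          # g(v) = f(Bv + d): chain rule for the gradient
    return B.T @ f_grad(B @ v + d)

def g_hess(v):          # ... and for the Hessian
    return B.T @ f_hess(B @ v + d) @ B

x_plus = newton_iterate(f_grad, f_hess, x)
y_plus = newton_iterate(g_grad, g_hess, y)
print(np.allclose(B @ y_plus + d, x_plus))   # Newton commutes with the substitution

x_grad = x - f_grad(x)                        # plain gradient steps, for contrast
y_grad = y - g_grad(y)
print(np.allclose(B @ y_grad + d, x_grad))   # generally not equal
```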
As was mentioned in the introductory lecture, the traditional results on the Newton method establish, under reasonable smoothness and nondegeneracy assumptions, its local quadratic convergence. These results, as became clear recently, possess a generic conceptual drawback: the quantitative description of the region of quadratic convergence, as well as of the convergence itself, is given in terms of the condition number of the Hessian of $f$ at the minimizer and the Lipschitz constant of this Hessian. These quantities, however, are "frame-dependent": they are defined not by $f$ alone, but also by the Euclidean structure in the space of variables. Indeed, we need this structure simply to define the Hessian matrix of $f$, as well as, by the way, to define the gradient of $f$. When we change the Euclidean structure, the gradient and the Hessian are subject to a certain transformation which does not preserve quantities like the condition number of the Hessian or its Lipschitz constant. As a result, the traditional description of the behaviour of the method depends not only on the objective itself, but also on an arbitrary choice of the Euclidean structure used in the description, which contradicts the affine-invariant nature of the method (note that no "metric notions" are involved in the formulation of the method). To overcome this drawback, note that the objective itself at any point $x$ induces a certain Euclidean structure $E_x$; to define this structure, let us regard the second-order differential
$$D^2 f(x)[h, g] = \frac{\partial^2}{\partial t\,\partial s}\Big|_{t=s=0} f(x + th + sg)$$
of $f$ taken at $x$ along the pair of directions $h$ and $g$ as the inner product of the vectors $h$ and $g$. Since $f$ is convex, this inner product possesses all required properties (except, possibly, the nondegeneracy requirement "the square of a nonzero vector is strictly positive"; as we shall see, this is a minor difficulty). Of course, this Euclidean structure is local: it depends on $x$. Note that the Hessian of $f$, taken at $x$ with respect to the Euclidean structure $E_x$, is fine: it is simply the unit matrix, the matrix with the smallest possible condition number, namely, 1. The traditional results on the Newton method say that what matters, besides this condition number, is the Lipschitz constant of the Hessian, or, which is basically the same, the magnitude of the third-order derivatives of $f$. What happens if we relate these latter quantities to the local Euclidean structure defined by $f$? This is the key to the notion of self-concordance. And the definition is as follows:

Definition 2.0.1 Let $Q$ be a nonempty open convex set in $\mathbb{R}^n$ and $F$ be a $C^3$ smooth convex function defined on $Q$. $F$ is called self-concordant on $Q$ if it possesses the following two properties:
[Barrier property] $F(x_i) \to \infty$ along every sequence $\{x_i \in Q\}$ converging, as $i \to \infty$, to a boundary point of $Q$;
[Differential inequality of self-concordance] $F$ satisfies the differential inequality
$$|D^3 F(x)[h, h, h]| \le 2\left(D^2 F(x)[h, h]\right)^{3/2} \tag{2.1}$$
for all $x \in Q$ and all $h \in \mathbb{R}^n$.


From now on
$$D^k F(x)[h_1, ..., h_k] \equiv \frac{\partial^k}{\partial t_1 \cdots \partial t_k}\Big|_{t_1 = ... = t_k = 0} F(x + t_1 h_1 + ... + t_k h_k)$$
denotes the $k$th differential of $F$ taken at $x$ along the directions $h_1, ..., h_k$.

(2.1) says exactly that if a vector $h$ is of local Euclidean length 1, then the third-order derivative of $F$ in the direction $h$ is, in absolute value, at most 2; this is nothing but the aforementioned "Lipschitz continuity", with a once and for all fixed constant, namely, 2, of the second-order derivative of $F$ with respect to the local Euclidean metric defined by this derivative itself.
You may ask what is so magic about the constant 2. The answer is as follows: both sides of (2.1) should be and actually are of the same homogeneity degree with respect to $h$ (this is the origin of the exponent 3/2 in the right hand side). As a consequence, they are of different homogeneity degrees with respect to $F$. Therefore, given a function $F$ satisfying the inequality
$$|D^3 F(x)[h, h, h]| \le 2\alpha\left(D^2 F(x)[h, h]\right)^{3/2}$$
with a certain positive $\alpha$, you can always scale $F$, namely, multiply it by $\alpha^2$, and come to a function satisfying (2.1). We see that the choice of the constant factor in (2.1) is of no actual importance and is nothing but a normalization condition. The indicated choice of this factor is motivated by the desire to make the function $-\ln t$, which plays an important role in what follows, satisfy (2.1) "as it is", without any scaling.
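A quick numerical illustration of this normalization on the one-parameter family $F(t) = -c\ln t$, $c > 0$ (our example): here $|F'''(t)|/(F''(t))^{3/2} = 2/\sqrt{c}$ at every $t > 0$, so the inequality above holds with $\alpha = c^{-1/2}$, and the rescaled function $\alpha^2 F = -\ln t$ satisfies (2.1) with equality.

```python
import math

def sc_ratio(c: float, t: float) -> float:
    """|F'''(t)| / F''(t)**1.5 for F(t) = -c*ln(t), c > 0, t > 0."""
    d2 = c / t**2            # F''(t)
    d3 = -2.0 * c / t**3     # F'''(t)
    return abs(d3) / d2**1.5

c = 4.0
alpha = 1.0 / math.sqrt(c)
for t in (0.1, 1.0, 7.5):
    # ratio for F, the predicted constant 2*alpha, and the ratio for alpha^2 * F
    print(sc_ratio(c, t), 2 * alpha, sc_ratio(alpha**2 * c, t))
```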

2.1 Examples and elementary combination rules


We start with a pair of examples of self-concordant functions.

Example 2.1.1 A convex quadratic form
$$f(x) = x^T A x - 2b^T x + c$$
on $\mathbb{R}^n$ (and, in particular, a linear form on $\mathbb{R}^n$) is self-concordant on $\mathbb{R}^n$.



This is immediate: the left hand side of (2.1) is identically zero. A single-line verification of the definition also justifies the following example:

Example 2.1.2 The function − ln t is self-concordant on the positive ray {t ∈ R | t > 0}.

The number of examples can be easily increased, due to the following extremely simple (and
very useful) combination rules:

Proposition 2.1.1 (i) [stability with respect to affine substitutions of argument] Let $F$ be self-concordant on $Q \subset \mathbb{R}^n$ and $x = Ay + b$ be an affine mapping from $\mathbb{R}^k$ to $\mathbb{R}^n$ whose image intersects $Q$. Then the inverse image of $Q$ under the mapping, i.e., the set
$$Q^+ = \{y \in \mathbb{R}^k \mid Ay + b \in Q\},$$
is an open convex subset of $\mathbb{R}^k$, and the composite function
$$F^+(y) = F(Ay + b): Q^+ \to \mathbb{R}$$
is self-concordant on $Q^+$.
(ii) [stability with respect to summation and multiplication by reals $\ge 1$] Let $F_i$ be self-concordant functions on the open convex domains $Q_i \subset \mathbb{R}^n$ and $\alpha_i \ge 1$ be reals, $i = 1, ..., m$. Assume that the set $Q = \cap_{i=1}^m Q_i$ is nonempty. Then the function
$$F(x) = \alpha_1 F_1(x) + ... + \alpha_m F_m(x): Q \to \mathbb{R}$$
is self-concordant on $Q$.
(iii) [stability with respect to direct summation] Let $F_i$ be self-concordant on open convex domains $Q_i \subset \mathbb{R}^{n_i}$, $i = 1, ..., m$. Then the function
$$F(x_1, ..., x_m) = F_1(x_1) + ... + F_m(x_m): Q \equiv Q_1 \times ... \times Q_m \to \mathbb{R}$$
is self-concordant on $Q$.

Proof is given by an immediate and absolutely trivial verification of the definition. E.g., let us prove (ii). Since the $Q_i$ are open convex domains with nonempty intersection $Q$, $Q$ is an open convex domain, as it should be. Further, $F$ is, of course, $C^3$ smooth and convex on $Q$. To prove the barrier property, note that since the $F_i$ are convex, they are bounded below on any bounded subset of $Q$. It follows that if $\{x_j \in Q\}$ is a sequence converging to a boundary point $x$ of $Q$, then all the sequences $\{\alpha_i F_i(x_j)\}$, $i = 1, ..., m$, are bounded below, and at least one of them diverges to $\infty$ (since $x$ belongs to the boundary of at least one of the sets $Q_i$); consequently, $F(x_j) \to \infty$, as required.
To verify (2.1), add the inequalities
$$\alpha_i |D^3 F_i(x)[h, h, h]| \le 2\alpha_i \left(D^2 F_i(x)[h, h]\right)^{3/2}$$
($x \in Q$, $h \in \mathbb{R}^n$). The left hand side of the resulting inequality clearly will be $\ge |D^3 F(x)[h, h, h]|$, while the right hand side will be $\le 2\left(D^2 F(x)[h, h]\right)^{3/2}$, since for nonnegative $b_i$ and $\alpha_i \ge 1$ one has
$$\sum_i \alpha_i b_i^{3/2} \le \Big(\sum_i \alpha_i b_i\Big)^{3/2}.$$
Thus, $F$ satisfies (2.1).
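The elementary inequality used in the last step can be probed numerically. The sketch below also shows why the restriction $\alpha_i \ge 1$ matters: with a single weight $\alpha = 1/4$ and $b = 1$ the inequality fails, in line with the fact that $-\tfrac14 \ln t$ violates (2.1) (its ratio $|F'''|/(F'')^{3/2}$ equals 4). The random sampling ranges are arbitrary.

```python
import numpy as np

def sides(alpha, b):
    """Left and right hand sides of sum_i alpha_i b_i^{3/2} <= (sum_i alpha_i b_i)^{3/2}."""
    alpha, b = np.asarray(alpha, dtype=float), np.asarray(b, dtype=float)
    return np.sum(alpha * b**1.5), np.sum(alpha * b) ** 1.5

rng = np.random.default_rng(3)
holds = True
for _ in range(10_000):
    k = rng.integers(1, 6)
    alpha = 1.0 + 5.0 * rng.random(k)        # weights >= 1
    b = 10.0 * rng.random(k)                  # nonnegative b_i
    lhs, rhs = sides(alpha, b)
    holds = holds and lhs <= rhs * (1 + 1e-12)
print(holds)

lhs, rhs = sides([0.25], [1.0])               # a weight below 1
print(lhs, rhs)                               # here the left side exceeds the right side
```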


An immediate consequence of our combination rules is the following

Corollary 2.1.1 Let
$$G = \{x \in \mathbb{R}^n \mid a_i^T x - b_i \le 0,\ i = 1, ..., m\}$$
be a convex polyhedron defined by a set of linear inequalities satisfying the Slater condition:
$$\exists x \in \mathbb{R}^n:\ a_i^T x - b_i < 0,\ i = 1, ..., m.$$
Then the standard logarithmic barrier for $G$ given by
$$F(x) = -\sum_{i=1}^m \ln(b_i - a_i^T x)$$
is self-concordant on the interior of $G$.

Proof. From the Slater condition it follows that
$$\operatorname{int} G = \{x \in \mathbb{R}^n \mid a_i^T x - b_i < 0,\ i = 1, ..., m\} = \cap_{i=1}^m G_i, \qquad G_i = \{x \in \mathbb{R}^n \mid a_i^T x - b_i < 0\}.$$
Since the function $-\ln t$ is self-concordant on the positive half-axis, each of the functions $F_i(x) = -\ln(b_i - a_i^T x)$ is self-concordant on $G_i$ (item (i) of the Proposition; note that $G_i$ is the inverse image of the positive half-axis under the affine mapping $x \mapsto b_i - a_i^T x$), whence $F(x) = \sum_i F_i(x)$ is self-concordant on $\operatorname{int} G = \cap_i G_i$ (item (ii) of the Proposition).
In spite of its extreme simplicity, the fact stated in the Corollary, as we shall see in due course, is responsible for 50% of all polynomial time results in Linear Programming.
Now let us proceed to a systematic investigation of the properties of self-concordant functions, with the final goal of analyzing the behaviour of the Newton method as applied to a function of this type.

2.2 Properties of self-concordant functions


Let $Q$ be an open convex domain in $E = \mathbb{R}^n$ and $F$ be self-concordant on $Q$. For $x \in Q$ and $h, g \in E$ let us define
$$\langle g, h\rangle_x = D^2 F(x)[g, h], \qquad |h|_x = \langle h, h\rangle_x^{1/2},$$
so that $|\cdot|_x$ is a Euclidean seminorm on $E$; it is a norm if and only if $D^2 F(x)$ is nondegenerate.


Let us establish the basic properties of $F$.
0. Basic inequality. For any $x \in Q$ and any triple $h_i \in E$, $i = 1, 2, 3$, one has
$$|D^3 F(x)[h_1, h_2, h_3]| \le 2\prod_{i=1}^3 |h_i|_x.$$

Comment. This is the result of applying to the symmetric 3-linear form $D^3 F(x)[h_1, h_2, h_3]$ and the 2-linear positive semidefinite form $D^2 F(x)[h_1, h_2]$ the following general fact:
let $A[h_1, ..., h_k]$ be a symmetric $k$-linear form on $\mathbb{R}^n$ and $B[h_1, h_2]$ be a symmetric positive semidefinite bilinear form such that
$$|A[h, h, ..., h]| \le \alpha B^{k/2}[h, h]$$
for a certain $\alpha$ and all $h$. Then
$$|A[h_1, ..., h_k]| \le \alpha B^{1/2}[h_1, h_1] B^{1/2}[h_2, h_2] \cdots B^{1/2}[h_k, h_k]$$
for all $h_1, ..., h_k$.
The proof of this statement is among the exercises to the lecture.
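For the polytope barrier of Corollary 2.1.1 the basic inequality can be checked directly: with $u^{(k)}_j = a_j^Th_k/s_j$ one has $|h_k|_x = \|u^{(k)}\|_2$ and $D^3F(x)[h_1,h_2,h_3] = 2\sum_j u^{(1)}_j u^{(2)}_j u^{(3)}_j$, so the bound follows from Hölder's inequality. A sketch on a random instance, for illustration only:

```python
import numpy as np

def local_coords(A, b, x, h):
    """u_j = a_j^T h / s_j, so that |h|_x = ||u||_2 for the polytope log-barrier."""
    return (A @ h) / (b - A @ x)

rng = np.random.default_rng(4)
m, n = 25, 6
A = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, size=m)
x = np.zeros(n)

worst = 0.0
for _ in range(2000):
    u1, u2, u3 = (local_coords(A, b, x, rng.standard_normal(n)) for _k in range(3))
    d3 = 2.0 * np.sum(u1 * u2 * u3)                              # D^3F(x)[h1, h2, h3]
    bound = 2.0 * np.linalg.norm(u1) * np.linalg.norm(u2) * np.linalg.norm(u3)
    worst = max(worst, abs(d3) / bound)
print(worst)   # should not exceed 1
```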
I. Behaviour in the Dikin ellipsoid. For $x \in Q$ let us define the open Dikin ellipsoid of radius $r$ centered at $x$ as the set
$$W_r(x) = \{y \in E \mid |y - x|_x < r\},$$
and the closed Dikin ellipsoid as the set
$$\widehat{W}_r(x) = \operatorname{cl} W_r(x) = \{y \in E \mid |y - x|_x \le r\}.$$
The open unit Dikin ellipsoid $W_1(x)$ is contained in $Q$. Within this ellipsoid the Hessians of $F$ are "almost proportional" to $F''(x)$:
$$(1 - |h|_x)^2 F''(x) \preceq F''(x + h) \preceq (1 - |h|_x)^{-2} F''(x) \quad \text{whenever } |h|_x < 1, \tag{2.2}$$
the gradients of $F$ satisfy the following Lipschitz-type condition:
$$|z^T(F'(x + h) - F'(x))| \le \frac{|h|_x}{1 - |h|_x}\,|z|_x \quad \forall z \quad \text{whenever } |h|_x < 1, \tag{2.3}$$
and we have the following lower and upper bounds on $F$:
$$F(x) + DF(x)[h] + \rho(-|h|_x) \le F(x + h) \le F(x) + DF(x)[h] + \rho(|h|_x), \quad |h|_x < 1, \tag{2.4}$$
where
$$\rho(s) = -\ln(1 - s) - s = \frac{s^2}{2} + \frac{s^3}{3} + \frac{s^4}{4} + \cdots \tag{2.5}$$
The lower bound in (2.4) is valid for all $h$ such that $x + h \in Q$, not only for those $h$ with $|h|_x < 1$.
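A numerical sketch of (2.2)–(2.4) for the polytope log-barrier at the point $x = 0$ of a random polytope (an illustrative instance of our choosing). Directions are rescaled so that $|h|_x = r$; (2.2) is tested through the eigenvalues of $L^{-1}F''(x+h)L^{-T}$, where $F''(x) = LL^T$ is a Cholesky factorization, and (2.3) with random test vectors $z$. Small absolute tolerances absorb rounding.

```python
import numpy as np

def F(A, b, x):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else -np.sum(np.log(s))

def grad(A, b, x):
    return A.T @ (1.0 / (b - A @ x))

def hess(A, b, x):
    s = b - A @ x
    return A.T @ ((1.0 / s**2)[:, None] * A)

def rho(s):
    return -np.log1p(-s) - s                   # rho(s) = -ln(1 - s) - s, s < 1

rng = np.random.default_rng(5)
m, n = 20, 4
A = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, size=m)
x = np.zeros(n)
F0, g0, H0 = F(A, b, x), grad(A, b, x), hess(A, b, x)
Linv = np.linalg.inv(np.linalg.cholesky(H0))   # H0 = L L^T

checks = []
for r in (0.1, 0.5, 0.9):
    for _ in range(200):
        h = rng.standard_normal(n)
        h *= r / np.sqrt(h @ H0 @ h)           # rescale so that |h|_x = r
        y = x + h
        Fy = F(A, b, y)
        checks.append(np.isfinite(Fy))        # y is in the unit Dikin ellipsoid, hence in Q
        lin = F0 + g0 @ h                                                       # (2.4)
        checks.append(lin + rho(-r) <= Fy + 1e-9 and Fy <= lin + rho(r) + 1e-9)
        ev = np.linalg.eigvalsh(Linv @ hess(A, b, y) @ Linv.T)                 # (2.2)
        checks.append(ev.min() >= (1 - r)**2 - 1e-9 and ev.max() <= (1 - r)**-2 + 1e-9)
        z = rng.standard_normal(n)                                            # (2.3)
        checks.append(abs(z @ (grad(A, b, y) - g0)) <= r / (1 - r) * np.sqrt(z @ H0 @ z) + 1e-9)
print(all(checks))
```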
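Proof. Let $h$ be such that
$$r \equiv |h|_x < 1 \quad \text{and} \quad x + h \in Q.$$
Let us prove that relations (2.2), (2.3) and (2.4) are satisfied at this particular $h$.
1°. Let us set
$$\phi(t) = D^2 F(x + th)[h, h],$$
so that $\phi$ is continuously differentiable on $[0, 1]$. We have
$$0 \le \phi(t), \quad r^2 = \phi(0) < 1, \quad |\phi'(t)| = |D^3 F(x + th)[h, h, h]| \le 2\phi^{3/2}(t),$$
whence, for all small enough positive $\epsilon$,
$$0 < \phi_\epsilon(t) \equiv \epsilon + \phi(t), \quad \phi_\epsilon(0) < 1, \quad |\phi_\epsilon'(t)| \le 2\phi_\epsilon^{3/2}(t),$$
so that
$$\left|\frac{d}{dt}\phi_\epsilon^{-1/2}(t)\right| \le 1.$$
It follows that
$$\phi_\epsilon^{-1/2}(0) - t \le \phi_\epsilon^{-1/2}(t) \le \phi_\epsilon^{-1/2}(0) + t, \quad 0 \le t \le 1,$$
whence
$$\frac{\phi_\epsilon(0)}{(1 + t\phi_\epsilon^{1/2}(0))^2} \le \phi_\epsilon(t) \le \frac{\phi_\epsilon(0)}{(1 - t\phi_\epsilon^{1/2}(0))^2}.$$
The resulting inequalities hold true for all $t \in [0, 1]$ and all small enough $\epsilon > 0$; passing to the limit as $\epsilon \to +0$, we come to
$$\frac{r^2}{(1 + rt)^2} \le \phi(t) \equiv D^2 F(x + th)[h, h] \le \frac{r^2}{(1 - rt)^2}, \quad 0 \le t \le 1. \tag{2.6}$$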
2°. Two successive integrations of (2.6) result in
$$F(x) + DF(x)[h] + \int_0^1\left\{\int_0^\tau \frac{r^2}{(1 + rt)^2}\,dt\right\}d\tau \le F(x + h) \le F(x) + DF(x)[h] + \int_0^1\left\{\int_0^\tau \frac{r^2}{(1 - rt)^2}\,dt\right\}d\tau,$$
which after a straightforward computation leads to (2.4) (recall that $r = |h|_x$).
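For completeness, the straightforward computation amounts to the following elementary integrals (a worked check; the first line needs $0 \le r < 1$, the second is valid for every $r \ge 0$, which is why the lower bound needs no restriction on $r$):

```latex
\begin{aligned}
\int_0^\tau \frac{r^2\,dt}{(1 - rt)^2} &= \frac{r}{1 - r\tau} - r, &
\int_0^1 \Big(\frac{r}{1 - r\tau} - r\Big)\,d\tau &= -\ln(1 - r) - r = \rho(r),\\
\int_0^\tau \frac{r^2\,dt}{(1 + rt)^2} &= r - \frac{r}{1 + r\tau}, &
\int_0^1 \Big(r - \frac{r}{1 + r\tau}\Big)\,d\tau &= r - \ln(1 + r) = \rho(-r).
\end{aligned}
```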
Looking at the presented reasoning, one can immediately see that the restriction r < 1 was
used only in the derivation of the upper, not the lower bound in (2.4); therefore this lower bound
is valid for all h such that x + h ∈ Q, as claimed.
3°. Now let us fix $g \in E$ and set
$$\psi(t) = D^2 F(x + th)[g, g],$$
so that $\psi$ is a continuously differentiable nonnegative function on $[0, 1]$. We have
$$|\psi'(t)| = |D^3 F(x + th)[g, g, h]| \le 2 D^2 F(x + th)[g, g]\left[D^2 F(x + th)[h, h]\right]^{1/2} \tag{2.7}$$
(we have used 0.). Relation (2.7) means that $\psi$ satisfies the linear differential inequality
$$|\psi'(t)| \le 2\psi(t)\phi^{1/2}(t) \le 2\psi(t)\frac{r}{1 - rt}, \quad 0 \le t \le 1$$
(the second inequality follows from (2.6) combined with $\psi \ge 0$). It follows that
$$\frac{d}{dt}\left[(1 - rt)^2\psi(t)\right] \equiv (1 - rt)^2\left[\psi'(t) - 2r(1 - rt)^{-1}\psi(t)\right] \le 0, \quad 0 \le t \le 1,$$
and
$$\frac{d}{dt}\left[(1 - rt)^{-2}\psi(t)\right] \equiv (1 - rt)^{-2}\left[\psi'(t) + 2r(1 - rt)^{-1}\psi(t)\right] \ge 0, \quad 0 \le t \le 1,$$
whence, respectively,
$$(1 - rt)^2\psi(t) \le \psi(0), \qquad (1 - rt)^{-2}\psi(t) \ge \psi(0),$$
or, recalling what $\psi$ and $r$ are,
$$(1 - |h|_x t)^{-2} D^2 F(x + th)[g, g] \ge D^2 F(x)[g, g] \ge (1 - |h|_x t)^2 D^2 F(x + th)[g, g];$$
since $g$ is arbitrary, we come to (2.2).


4°. We have proved that (2.2) and (2.4) hold true for any $h$ such that $x + h$ is in the open unit Dikin ellipsoid $W_1(x)$ and $x + h \in Q$. To complete the proof, it remains to demonstrate that the latter "and" is redundant: $x + h \in Q$ whenever $x + h$ belongs to the open unit Dikin ellipsoid $W_1(x)$. To prove the latter statement, assume, on the contrary, that $W_1(x)$ is not contained in $Q$. Then there is a point $y$ in $W_1(x)$ such that the half-segment $[x, y)$ belongs to $Q$ and $y$ itself does not belong to $Q$. The function $F$ is well-defined on this half-segment; moreover, as we have already seen, at any point $x + h$ of this half-segment (2.4) holds. When $x + h$ runs over the half-segment, the quantities $|h|_x$ are bounded from above by $|y - x|_x$ and are therefore less than 1 and bounded away from 1. It follows from (2.4) that $F$ is bounded on the half-segment, which is the desired contradiction: since $y$ is a boundary point of $Q$, $F$ should tend to $\infty$ as a point from $[x, y)$ approaches $y$.
5°. It remains to prove (2.3). To this end let us fix an arbitrary vector $z$ and set
$$g(t) = z^T(F'(x + th) - F'(x)).$$
Since the open unit Dikin ellipsoid $W_1(x)$ is contained in $Q$, the function $g$ is well-defined on the segment $[0, 1]$. We have
$$\begin{aligned}
g(0) &= 0;\\
|g'(t)| &= |z^T F''(x + th)h|\\
&\le \sqrt{z^T F''(x + th)z}\,\sqrt{h^T F''(x + th)h} && \text{[we have used Cauchy's inequality]}\\
&\le (1 - t|h|_x)^{-2}\sqrt{z^T F''(x)z}\,\sqrt{h^T F''(x)h} && \text{[we have used (2.2)]}\\
&= |h|_x(1 - t|h|_x)^{-2}\sqrt{z^T F''(x)z},
\end{aligned}$$
whence
$$|g(1)| \le \int_0^1 \frac{|h|_x}{(1 - t|h|_x)^2}\,dt\,\sqrt{z^T F''(x)z} = \frac{|h|_x}{1 - |h|_x}\sqrt{z^T F''(x)z},$$
as claimed in (2.3).
II. Recessive subspace of a self-concordant function. For $x \in Q$ consider the subspace $\{h \in E \mid D^2 F(x)[h, h] = 0\}$, the kernel of the Hessian of $F$ at $x$. This recessive subspace $E_F$ of $F$ is independent of the choice of $x$ and is such that
$$Q = Q + E_F.$$
In particular, the Hessian of $F$ is nonsingular everywhere if and only if there exists a point where the Hessian of $F$ is nonsingular; this is certainly the case if $Q$ is bounded.
Terminology: we call $F$ nondegenerate if $E_F = \{0\}$, or, which is the same, if the Hessian of $F$ is nonsingular somewhere (and then everywhere) on $Q$.
Proof of II. To prove that the kernel of the Hessian of $F$ is independent of the point where the Hessian is taken is the same as to prove that if $D^2 F(x_0)[h, h] = 0$, then $D^2 F(y)[h, h] = 0$ identically in $y \in Q$. To demonstrate this, let us fix $y \in Q$ and consider the function
$$\psi(t) = D^2 F(x_0 + t(y - x_0))[h, h],$$
which is continuously differentiable on the segment $[0, 1]$. As in item 3° of the previous proof, we have
$$|\psi'(t)| = |D^3 F(x_0 + t(y - x_0))[h, h, y - x_0]| \le 2 D^2 F(x_0 + t(y - x_0))[h, h]\left[D^2 F(x_0 + t(y - x_0))[y - x_0, y - x_0]\right]^{1/2} \equiv \psi(t)\xi(t)$$
with a certain function $\xi$ continuous on $[0, 1]$. It follows that
$$|\psi'(t)| \le M\psi(t)$$
with a certain constant $M$, whence $0 \le \psi(t) \le \psi(0)\exp\{Mt\}$, $0 \le t \le 1$ (look at the derivative of the function $\psi(t)\exp\{-Mt\}$). Since $\psi(0) = 0$, we come to $\psi(1) = 0$, i.e., $D^2 F(y)[h, h] = 0$, as claimed.
Thus, the kernel of the Hessian of $F$ is independent of the point where the Hessian is taken. If $h \in E_F$ and $x \in Q$, then, of course, $|h|_x = 0$, so that $x + h \in W_1(x)$; from I. we know that $W_1(x)$ belongs to $Q$, so that $x + h \in Q$; thus, $x + E_F \subset Q$ whenever $x \in Q$, as required.
Now it is time to introduce a very important concept: the Newton decrement of a self-concordant function at a point. Let $x \in Q$. The Newton decrement of $F$ at $x$ is defined as
$$\lambda(F, x) = \max\{DF(x)[h] \mid h \in E,\ |h|_x \le 1\}.$$
In other words, the Newton decrement is nothing but the norm, conjugate to $|\cdot|_x$, of the first-order derivative of $F$ at $x$. To be more exact, we should note that $|\cdot|_x$ is not necessarily a norm: it may be a seminorm, i.e., may be zero at certain nonzero vectors; this happens if and only if the recessive subspace $E_F$ of $F$ is nontrivial, or, which is the same, if the Dikin ellipsoid of $F$ is not an actual ellipsoid, but an unbounded set, an elliptic cylinder. In this latter case the maximum in the definition of the Newton decrement may (but need not) be $+\infty$. We can immediately see when this is the case.
III. Continuity of the Newton decrement. The Newton decrement of $F$ at $x \in Q$ is finite if and only if $DF(x)[h] = 0$ for all $h \in E_F$. If this is the case for a certain $x = x_0 \in Q$, then it is also the case for all $x \in Q$, and in this case the Newton decrement is continuous in $x \in Q$ and $F$ is constant along its recessive subspace:
$$F(x + h) = F(x) \quad \forall x \in Q\ \forall h \in E_F; \tag{2.8}$$
otherwise the Newton decrement is identically $+\infty$.


Proof. It is clear that if there is $h \in E_F$ such that $DF(x)[h] \ne 0$, then $\lambda(F, x) = \infty$, since $|th|_x = 0$ for all real $t$ and, consequently, $DF(x)[u]$ is unbounded above on the set $\{|u|_x \le 1\}$. Conversely, assume that $DF(x)[h] = 0$ for all $h \in E_F$, and let us prove that then $\lambda(F, x) < \infty$. There is nothing to prove if $E_F = E$, so let us assume that $E_F \ne E$. Let $E_F^\perp$ be a subspace of $E$ complementary to $E_F$: $E_F \cap E_F^\perp = \{0\}$, $E_F + E_F^\perp = E$, and let $\pi$ be the projector of $E$ onto $E_F^\perp$ parallel to $E_F$, i.e., if
$$h = h_F + h_F^\perp$$
is the (unique) representation of $h \in E$ as the sum of vectors from $E_F$ and $E_F^\perp$, then
$$\pi h = h_F^\perp.$$
It is clear that
$$|\pi h|_x \equiv |h|_x$$
(since the difference $h - \pi h$ belongs to $E_F$ and therefore is of zero $|\cdot|_x$-seminorm), and since we have assumed that $DF(x)[u]$ is zero for $u \in E_F$, we also have
$$DF(x)[h] = DF(x)[\pi h].$$
Combining these observations, we see that it is possible to replace $E$ in the definition of the Newton decrement by $E_F^\perp$:
$$\lambda(F, x) = \max\{DF(x)[h] \mid h \in E_F^\perp,\ |h|_x \le 1\}. \tag{2.9}$$
Since $|\cdot|_x$ restricted to $E_F^\perp$ is a norm rather than a seminorm, the right hand side of the latter relation is finite, as claimed.
Now let us demonstrate that if $\lambda(F, x)$ is finite at a certain point $x_0 \in Q$, then it is also finite at any other point $x$ of $Q$ and is continuous in $x$. To prove finiteness, as we have just seen, it suffices to demonstrate that $DF(x)[h] = 0$ for any $x$ and any $h \in E_F$. To this end let us fix $x \in Q$ and $h \in E_F$ and consider the function
$$\psi(t) = DF(x_0 + t(x - x_0))[h].$$
This function is continuously differentiable on $[0, 1]$ and is zero at the point $t = 0$ (since $\lambda(F, x_0)$ is assumed finite); besides this,
$$\psi'(t) = D^2 F(x_0 + t(x - x_0))[h, x - x_0] = 0$$
(since $h$ belongs to the null space of the positive semidefinite symmetric bilinear form $D^2 F(x_0 + t(x - x_0))[h_1, h_2]$), so that $\psi$ is constant, namely, 0, and $\psi(1) = 0$, as required. As a byproduct of our reasoning, we see that if $\lambda(F, \cdot)$ is finite, then
$$F(x + h) = F(x), \quad x \in Q,\ h \in E_F,$$
since the derivative of $F$ at any point of $Q$ in any direction from $E_F$ is zero.
It remains to prove that if $\lambda(F, x)$ is finite at a certain (and then, as we have just proved, at any) point, then it is a continuous function of $x$. This is immediate: we already know that if $\lambda(F, x)$ is finite, it can be defined by relation (2.9), and this relation, for the standard reasons, defines a continuous function of $x$ (since $|\cdot|_x$ restricted to $E_F^\perp$ is a norm, not a seminorm, depending continuously on $x$).
The following simple observation clarifies the origin of the Newton decrement and its relation
to the Newton method.
IV. Newton Decrement and Newton Iterate. Given x ∈ Q, consider the second-order Newton expansion of F at x, i.e., the convex quadratic form
$$N_{F,x}(h) = F(x) + DF(x)[h] + \frac12 D^2F(x)[h,h] \equiv F(x) + DF(x)[h] + \frac12|h|_x^2.$$
This form is bounded below if and only if it attains its minimum on E, and if and only if λ(F, x) < ∞; if this is the case, then for (any) Newton direction e of F at x, i.e., any minimizer of this form, one has
$$D^2F(x)[e,h] \equiv -DF(x)[h], \quad h \in E, \tag{2.10}$$
$$|e|_x = \lambda(F,x) \tag{2.11}$$
and
$$N_{F,x}(0) - N_{F,x}(e) = \frac12\lambda^2(F,x). \tag{2.12}$$
Thus, the Newton decrement is closely related to the amount by which the Newton iteration

$$x \mapsto x + e$$

decreases F in its second-order expansion.


Proof. This is an immediate consequence of the standard fact of Linear Algebra: a convex quadratic form
$$f_{A,b}(h) = \frac12 h^TAh + b^Th + c$$

is bounded below if and only if it attains its minimum, and if and only if the quantity

$$\lambda = \max\{b^Th \mid h^TAh \le 1\}$$

is finite; if this is the case, then the minimizers y of the form are exactly the vectors such that

$$y^TAh = -b^Th, \quad h \in E,$$

for every minimizer y one has

$$y^TAy = \lambda^2$$
and
$$f_{A,b}(0) - \min f_{A,b} = \frac12\lambda^2.$$
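The Linear Algebra fact can be spot-checked numerically. The sketch below assumes a positive definite A (the nondegenerate case), for which the maximum defining λ is attained at $h^* = A^{-1}b/\lambda$ and the unique minimizer is $y = -A^{-1}b$. The matrix, vector and constant are random illustrative data, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # assumed: symmetric positive definite
b = rng.standard_normal(n)
c = 1.0

def f(h):
    """The quadratic form f_{A,b}(h) = 0.5 h^T A h + b^T h + c."""
    return 0.5 * h @ A @ h + b @ h + c

Ainv_b = np.linalg.solve(A, b)
lam = np.sqrt(b @ Ainv_b)        # candidate value of max{b^T h : h^T A h <= 1}
h_star = Ainv_b / lam            # feasible point with b^T h = lam
y = -Ainv_b                      # solves y^T A h = -b^T h for all h, i.e. A y = -b

# No random direction beats lam after normalization (Cauchy inequality in the A-metric).
H = rng.standard_normal((1000, n))
ratios = (H @ b) / np.sqrt(np.einsum("ij,jk,ik->i", H, A, H))

print(np.isclose(h_star @ A @ h_star, 1.0), np.isclose(b @ h_star, lam))
print(ratios.max() <= lam + 1e-12)
print(np.isclose(y @ A @ y, lam**2), np.isclose(f(np.zeros(n)) - f(y), 0.5 * lam**2))
```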

The observation given by IV. allows us to compute the Newton decrement in the nondegenerate case $E_F = \{0\}$.
IVa. Expressions for the Newton direction and the Newton decrement. If F is nondegenerate and x ∈ Q, then the Newton direction of F at x is unique and is nothing but

$$e(F,x) = -[F''(x)]^{-1}F'(x),$$

F′ and F″ being the gradient and the Hessian of F with respect to a certain Euclidean structure on E, and the Newton decrement is given by
$$\lambda(F,x) = \sqrt{(F'(x))^T[F''(x)]^{-1}F'(x)} = \sqrt{e^T(F,x)F''(x)e(F,x)} = \sqrt{-e^T(F,x)F'(x)}.$$

Proof. This is an immediate consequence of IV. (pass from the "coordinateless" differentials to the "coordinate" representation in terms of the gradient and the Hessian).
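In coordinates, IVa amounts to a single linear solve. The sketch below computes e(F, x) and the three expressions for λ(F, x) from a given gradient and Hessian. The function name and the example F(x) = cᵀx − Σ ln x_i on the positive orthant (with an arbitrarily chosen c and x) are assumptions made for illustration only.

```python
import numpy as np

def newton_direction_and_decrement(grad, hess):
    """Return e(F,x) = -[F''(x)]^{-1} F'(x) and the three expressions for lambda(F,x) in IVa."""
    e = -np.linalg.solve(hess, grad)
    lam_a = np.sqrt(grad @ np.linalg.solve(hess, grad))   # sqrt(F'^T [F'']^{-1} F')
    lam_b = np.sqrt(e @ hess @ e)                          # sqrt(e^T F'' e)
    lam_c = np.sqrt(-e @ grad)                             # sqrt(-e^T F')
    return e, (lam_a, lam_b, lam_c)

# Illustrative example: F(x) = c^T x - sum(ln x_i), so F'(x) = c - 1/x, F''(x) = diag(1/x^2).
c = np.array([1.0, 2.0, 0.5])
x = np.array([0.7, 0.3, 1.5])
grad = c - 1.0 / x
hess = np.diag(1.0 / x**2)
e, lams = newton_direction_and_decrement(grad, hess)
print(e, lams)
```

All three expressions agree up to rounding. For this diagonal Hessian the direction reduces to $e_i = -(c_i - 1/x_i)x_i^2$.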
Now comes the main statement about the behaviour of the Newton method as applied to a
self-concordant function.
V. Damped Newton Method: relaxation property. Let λ(F, ·) be finite on Q. Given x ∈ Q, consider the damped Newton iterate of x
$$x^+ \equiv x^+(F,x) = x + \frac{1}{1+\lambda(F,x)}e,$$
e being (any) Newton direction of F at x. Then

$$x^+ \in Q$$
and
$$F(x) - F(x^+) \ge \lambda(F,x) - \ln(1+\lambda(F,x)). \tag{2.13}$$

Proof. As we know from IV., $|e|_x = \lambda \equiv \lambda(F,x)$, and therefore $|x^+ - x|_x = \lambda/(1+\lambda) < 1$. Thus, $x^+$ belongs to the open unit Dikin ellipsoid of F centered at x, and, consequently, to Q (see I.). In view of (2.4) we have
$$F(x^+) \le F(x) + \frac{1}{1+\lambda}DF(x)[e] + \rho\left((1+\lambda)^{-1}|e|_x\right) =$$
[see (2.10) - (2.12)]
$$= F(x) - \frac{1}{1+\lambda}D^2F(x)[e,e] + \rho\left(\frac{\lambda}{1+\lambda}\right) = F(x) - \frac{\lambda^2}{1+\lambda} + \rho\left(\frac{\lambda}{1+\lambda}\right) =$$

[see the definition of ρ in (2.4)]

$$= F(x) - \frac{\lambda^2}{1+\lambda} - \ln\left(1 - \frac{\lambda}{1+\lambda}\right) - \frac{\lambda}{1+\lambda} =$$

$$= F(x) - \lambda + \ln(1+\lambda),$$

so that
$$F(x) - F(x^+) \ge \lambda - \ln(1+\lambda),$$
as claimed.
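The relaxation bound (2.13) can be spot-checked numerically. The sketch below assumes the illustrative function F(x) = cᵀx − Σ ln x_i on the positive orthant (a linear function plus logarithms, hence self-concordant) with an arbitrarily chosen c. It samples random points, takes the damped Newton step and counts violations of either $x^+ \in Q$ or (2.13). This is a numerical illustration, not a proof.

```python
import numpy as np

c = np.array([1.0, 2.0, 0.5])   # assumed example: F(x) = c^T x - sum(ln x_i), Q = positive orthant

def F(x):
    return c @ x - np.sum(np.log(x))

def damped_newton_iterate(x):
    """x+ = x + e / (1 + lambda), with e = -[F'']^{-1} F' and F''(x) = diag(1/x^2)."""
    grad = c - 1.0 / x
    e = -grad * x**2
    lam = np.sqrt(grad @ (grad * x**2))
    return x + e / (1.0 + lam), lam

rng = np.random.default_rng(1)
violations = 0
for _ in range(1000):
    x = rng.uniform(0.01, 10.0, size=3)
    x_plus, lam = damped_newton_iterate(x)
    stays_in_Q = bool(np.all(x_plus > 0))
    enough_decrease = F(x) - F(x_plus) >= lam - np.log1p(lam) - 1e-12   # (2.13)
    if not (stays_in_Q and enough_decrease):
        violations += 1
print(violations)
```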
VI. Existence of minimizer, A. F attains its minimum on Q if and only if it is bounded below on Q; if this is the case, then λ(F, ·) is finite and, moreover, $\min_{x\in Q}\lambda(F,x) = 0$.
Proof. Of course, if F attains its minimum on Q, it is bounded below on this set. To prove the converse statement, assume that F is bounded below on Q, and let us prove that it attains its minimum on Q. First of all, λ(F, ·) is finite. Indeed, if there were $x \in Q$ with infinite λ(F, x), it would mean that the derivative of F taken at x in a certain direction $h \in E_F$ is nonzero. As we know from II., the affine plane $x + E_F$ is contained in Q, and the second order derivative of the restriction of F onto this plane is identically zero, so that the restriction is linear (and nonconstant, since the first order derivative of F at x in a certain direction from $E_F$ is nonzero). And a nonconstant linear function $F|_{x+E_F}$ is, of course, unbounded below. Now let $Q^\perp$ be the cross-section of Q by the plane $x + E_F^\perp$, where $x \in Q$ is a certain fixed point and $E_F^\perp$ is a subspace complementary to $E_F$. Then $Q^\perp$ is an open convex set in a certain $\mathbb{R}^k$ and, in view of II., $Q = Q^\perp + E_F$; in view of III., F is constant along any translation of $E_F$, and we see that proving that F attains its minimum on Q is the same as proving that the restriction of F onto $Q^\perp$ attains its minimum on $Q^\perp$. This restriction is a self-concordant function on $Q^\perp$ (Proposition 2.1.1); of course, it is bounded below on $Q^\perp$, and its recessive subspace is trivial. Passing from $(Q, F)$ to $(Q^\perp, F|_{Q^\perp})$, we see that the statement in question can be reduced to a similar statement for a nondegenerate self-concordant function bounded below; to avoid complicated notation, let us assume that F itself is nondegenerate.
Since F is bounded below, the quantity $\inf_{x\in Q}\lambda(F,x)$ is 0; indeed, if it were positive:

$$\lambda(F,x) > \lambda > 0 \quad \forall x \in Q,$$

then, according to V. (recall that $s \mapsto s - \ln(1+s)$ is increasing on $[0,\infty)$), from any point $x \in Q$ we could pass to another point $x^+$ with a value of F smaller by at least the constant $\lambda - \ln(1+\lambda)$, which, of course, is impossible, since F is assumed bounded below. Since $\inf_{x\in Q}\lambda(F,x) = 0$, there exists a point x with $\lambda \equiv \lambda(F,x) \le 1/6$. From (2.4) it follows that

$$F(x+h) \ge F(x) + DF(x)[h] + |h|_x - \ln(1+|h|_x), \quad |h|_x < 1.$$

Further, in view of (2.10),

$$DF(x)[h] = -D^2F(x)[e,h] \ge -|e|_x|h|_x$$

(we have used the Cauchy inequality), which combined with (2.11) results in

$$DF(x)[h] \ge -\lambda|h|_x,$$

and we come to
$$F(x+h) \ge F(x) - \lambda|h|_x + |h|_x - \ln(1+|h|_x). \tag{2.14}$$
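When $0 \le t < 1$, we have

$$f(t) \equiv -\lambda t + t - \ln(1+t) = -\lambda t + t - \left[t - \frac{t^2}{2} + \frac{t^3}{3} - \frac{t^4}{4} + \dots\right] \ge -\lambda t + \frac{t^2}{2} - \frac{t^3}{3} = t\left(\frac{t}{2} - \frac{t^2}{3} - \lambda\right),$$

and we see that if
$$t(\lambda) = 2(1+3\lambda)\lambda,$$
then $f(t(\lambda)) > 0$ and $t(\lambda) < 1$ (indeed, for $\lambda \le 1/6$ we have $t(\lambda) \le 1/2$ and $t(\lambda)/2 - \lambda = 3\lambda^2 \ge \frac43(1+3\lambda)^2\lambda^2 = t^2(\lambda)/3$, while the alternating tail $t^4/4 - t^5/5 + \dots$ dropped above is positive for $0 < t < 1$). From (2.14) we conclude that F(x + h) > F(x) whenever x + h belongs to the boundary of the closed Dikin ellipsoid $W^c_{t(\lambda)}(x)$, which in the case in question is a compact subset of Q (recall that F is assumed to be nondegenerate). It follows that the minimizer of F over the ellipsoid (which for sure exists) is an interior point of the ellipsoid and therefore (due to convexity of F) is a minimizer of F over Q, so that F attains its minimum over Q.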
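The elementary claim about t(λ) can also be checked on a grid. This sketch uses only the definitions $f(t) = -\lambda t + t - \ln(1+t)$ and $t(\lambda) = 2(1+3\lambda)\lambda$ from the proof above, with λ ranging over a grid in (0, 1/6]. It is a numerical sanity check, not a replacement for the series argument.

```python
import math

def f(t, lam):
    """f(t) = -lam*t + t - ln(1 + t), the lower bound in (2.14) as a function of t = |h|_x."""
    return -lam * t + t - math.log1p(t)

def t_of(lam):
    return 2.0 * (1.0 + 3.0 * lam) * lam

grid = [k / 6000.0 for k in range(1, 1001)]   # lambda in (0, 1/6]
print(max(t_of(l) for l in grid))              # largest radius t(lambda) on the grid
print(all(f(t_of(l), l) > 0.0 for l in grid))
```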
To proceed, let us recall the concept of the Legendre transformation. Given a convex function f defined on a convex subset Dom f of $\mathbb{R}^n$, one can define the Legendre transformation $f^*$ of f as
$$f^*(y) = \sup_{x \in \mathrm{Dom}\, f}\,[y^Tx - f(x)];$$

the domain of $f^*$ is, by definition, the set of those y for which the right hand side is finite. It is immediately seen that Dom $f^*$ is convex and $f^*$ is convex on its domain.
Let Dom f be open and f be k ≥ 2 times continuously differentiable on its domain, the Hessian of f being nondegenerate. It is clearly seen that
(L.1) if $x \in$ Dom f, then $y = f'(x) \in$ Dom $f^*$, and

$$f^*(f'(x)) = (f'(x))^Tx - f(x); \qquad x \in \partial f^*(f'(x)).$$

Since f″ is nondegenerate, by the Implicit Function Theorem the set $\mathrm{Dom}_* f^*$ of values of f′ is open; since, in addition, f is convex, the mapping

$$x \mapsto f'(x)$$

is a (k − 1) times continuously differentiable one-to-one mapping from Dom f onto $\mathrm{Dom}_* f^*$ with a (k − 1) times continuously differentiable inverse. From (L.1) it follows that this inverse mapping also is given by the gradient of some function, namely, $f^*$. Thus,
(L.2) The mapping $x \mapsto f'(x)$ is a one-to-one mapping of Dom f onto an open set $\mathrm{Dom}_* f^* \subset$ Dom $f^*$, and the inverse mapping is given by $y \mapsto (f^*)'(y)$.
As an immediate consequence of (L.2), we come to the following statement:
(L.3) $f^*$ is k times continuously differentiable on $\mathrm{Dom}_* f^*$, and

$$(f^*)''(f'(x)) = [f''(x)]^{-1}, \quad x \in \mathrm{Dom}\, f. \tag{2.15}$$
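As a concrete instance of (L.1), (L.2) and (2.15), take f(x) = −ln x on (0, ∞) (an example chosen here for illustration). Its Legendre transformation is $f^*(y) = \sup_{x>0}[yx + \ln x] = -1 - \ln(-y)$ for y < 0, and $\mathrm{Dom}_* f^* = \mathrm{Dom}\, f^* = (-\infty, 0)$. The closed forms below are derived by hand and checked at a few points.

```python
import math

# f(x) = -ln x on (0, inf); f'(x) = -1/x, f''(x) = 1/x^2.
def f(x):
    return -math.log(x)

def df(x):
    return -1.0 / x

def d2f(x):
    return 1.0 / x**2

# f*(y) = -1 - ln(-y) for y < 0 (the sup is attained at x = -1/y).
def f_star(y):
    return -1.0 - math.log(-y)

def df_star(y):
    return -1.0 / y

def d2f_star(y):
    return 1.0 / y**2

for x in [0.1, 1.0, 3.7]:
    y = df(x)
    assert math.isclose(f_star(y), y * x - f(x))       # (L.1)
    assert math.isclose(df_star(y), x)                  # (L.2): (f*)' inverts f'
    assert math.isclose(d2f_star(y), 1.0 / d2f(x))      # (2.15)
print("ok")
```

Note that $f^*$ is again of the form "constant minus the logarithm of an affine function", in line with the statement on self-concordance of the Legendre transformation below.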

VII. Self-concordance of the Legendre transformation. Let the Hessian of the self-concordant function F be nondegenerate at some (and then, as we know from II., at any) point. Then $\mathrm{Dom}\,F^* = \mathrm{Dom}_* F^*$ is an open convex set, and the function $F^*$ is self-concordant on $\mathrm{Dom}\,F^*$.
Proof. 1⁰. Let us prove first that $\mathrm{Dom}\,F^* = \mathrm{Dom}_* F^*$. If $y \in \mathrm{Dom}\,F^*$, then, by definition, the function $y^Tx - F(x)$ is bounded from above on Q, or, which is the same, the function $F(x) - y^Tx$ is bounded below on Q. This function is self-concordant (Proposition 2.1.1.(ii) and Example 2.1.1), and since it is bounded below, it attains its minimum on Q (VI.). At the minimizer $x^*$ of the function we have $F'(x^*) = y$, and we see that $y \in \mathrm{Dom}_* F^*$. Thus, $\mathrm{Dom}\,F^* = \mathrm{Dom}_* F^*$ (the inclusion $\mathrm{Dom}_* F^* \subset \mathrm{Dom}\,F^*$ is given by (L.1)).
2⁰. The set $\mathrm{Dom}\,F^*$ is convex, and the set $\mathrm{Dom}_* F^*$ is open ((L.2)); from 1⁰ it follows therefore that $F^*$ is a convex function with a convex open domain $\mathrm{Dom}\,F^*$. The function is 3 times continuously differentiable on $\mathrm{Dom}\,F^* = \mathrm{Dom}_* F^*$ in view of (L.3). To prove self-concordance of $F^*$, it suffices to verify the barrier property and the differential inequality (2.1).
3⁰. The barrier property is immediate: if a sequence $y_i \in \mathrm{Dom}\,F^*$ converges to a point y and the sequence $\{F^*(y_i)\}$ is bounded from above, then the functions $y_i^Tx - F(x)$ are uniformly bounded from above on Q and therefore their pointwise limit $y^Tx - F(x)$ also is bounded from above on Q; by definition of $\mathrm{Dom}\,F^*$ it means that $y \in \mathrm{Dom}\,F^*$, and since we already know that $\mathrm{Dom}\,F^*$ is open, we conclude that any convergent sequence of points from $\mathrm{Dom}\,F^*$ along which $F^*$ is bounded from above converges to an interior point of $\mathrm{Dom}\,F^*$; this, of course, is an equivalent reformulation of the barrier property.
4⁰. It remains to verify (2.1). From (L.3) for any fixed h we have

$$h^T(F^*)''(F'(x))h = h^T[F''(x)]^{-1}h, \quad x \in Q.$$

Differentiating this identity in x in a direction g, we come to¹⁾

$$D^3F^*(F'(x))[h,h,F''(x)g] = -D^3F(x)\left[[F''(x)]^{-1}h,\ [F''(x)]^{-1}h,\ g\right];$$

substituting $g = [F''(x)]^{-1}h$, we come to

$$|D^3F^*(F'(x))[h,h,h]| = |D^3F(x)[g,g,g]| \le 2\left(D^2F(x)[g,g]\right)^{3/2} \equiv 2\left(g^TF''(x)g\right)^{3/2} =$$

[since $g = [F''(x)]^{-1}h$]
$$= 2\left(h^T[F''(x)]^{-1}h\right)^{3/2}.$$
The latter quantity, due to (L.3), is exactly $2\left(h^T(F^*)''(F'(x))h\right)^{3/2}$, and we come to
$$|D^3F^*(y)[h,h,h]| \le 2\left(D^2F^*(y)[h,h]\right)^{3/2}$$

for all h and all y = F′(x) with x ∈ Q. When x runs over Q, y, as we already know, runs through the whole $\mathrm{Dom}\,F^*$, and we see that (2.1) indeed holds true.
VIII. Existence of minimizer, B. F attains its minimum on Q if and only if there exists x ∈ Q with λ(F, x) < 1, and for every x with the latter property one has

$$F(x) - \min_Q F \le \rho(\lambda(F,x)); \tag{2.16}$$

moreover, for an arbitrary minimizer $x^*$ of F on Q and the above x one has

$$D^2F(x)[x^* - x, x^* - x] \le \left(\frac{\lambda(F,x)}{1-\lambda(F,x)}\right)^2. \tag{2.17}$$

¹⁾ We use the following rule for differentiating the mapping $x \mapsto B(x) \equiv A^{-1}(x)$, A(x) being a square nonsingular matrix smoothly depending on x:
$$DB(x)[g] = -B(x)\,DA(x)[g]\,B(x)$$
(to get it, differentiate the identity $B(x)A(x) \equiv I$).

Proof. The "only if" part is evident: λ(F, x) = 0 at any minimizer x of F. To prove the "if" part, we, same as in the proof of VI., can reduce the situation to the case when F is nondegenerate. Let x be such that λ ≡ λ(F, x) < 1, and let y = F′(x). In view of (L.3) we have

$$y^T(F^*)''(y)y = (F'(x))^T[F''(x)]^{-1}F'(x) = \lambda^2 \tag{2.18}$$

(the latter relation follows from IVa.). Since λ < 1, we see that 0 belongs to the open Dikin ellipsoid, centered at y, of the function $F^*$ (which is self-concordant, as we know from VII.), and therefore (I.) to the domain of this function. From VII. we know that this domain is comprised of the values of the gradient of F at the points of Q; thus, there exists $x^* \in Q$ such that $F'(x^*) = 0$, and F attains its minimum on Q. Furthermore, from (2.4) as applied to $F^*$ and from (2.18) we have
$$F^*(0) \le F^*(y) - y^T(F^*)'(y) + \rho(\lambda);$$
since $y = F'(x)$ and $0 = F'(x^*)$, we have (see (L.1))

$$F^*(y) = y^Tx - F(x), \quad (F^*)'(y) = x, \quad F^*(0) = -F(x^*),$$

and we come to
$$-F(x^*) \le y^Tx - F(x) - y^Tx + \rho(\lambda),$$
which is nothing but (2.16).
Finally, setting $|h|_y = \sqrt{h^T(F^*)''(y)h}$ and noticing that, by (2.18), $|y|_y = \lambda < 1$, we get for an arbitrary vector z

$$|z^T(x^* - x)| = |z^T[(F^*)'(0) - (F^*)'(y)]| \le \frac{\lambda}{1-\lambda}\sqrt{z^T(F^*)''(y)z}$$
[we have applied (2.3) to $F^*$ at the point y with h = −y]
$$= \frac{\lambda}{1-\lambda}\sqrt{z^T[F''(x)]^{-1}z};$$

substituting $z = F''(x)(x^* - x)$, we get

$$\sqrt{(x^* - x)^TF''(x)(x^* - x)} \le \frac{\lambda}{1-\lambda},$$
as required in (2.17).

Remark 2.2.1 Note how sharp the condition of existence of minimizer given by VIII. is: for the function F(x) = − ln x, which is self-concordant on the positive ray and unbounded below, one has λ(F, x) ≡ 1!
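Both bounds of VIII. and Remark 2.2.1 can be seen on one-dimensional examples. The sketch assumes F(x) = x − ln x on the positive ray, an illustrative choice with minimizer x* = 1, min F = 1 and λ(F, x) = |x − 1|, and checks (2.16) and (2.17) at a few points with λ(F, x) < 1. For F(x) = −ln x, which is unbounded below, the decrement is identically 1. A small relative tolerance absorbs rounding, since for x < 1 both inequalities hold with equality in this example.

```python
import math

def rho(s):
    """rho(s) = -ln(1 - s) - s for s < 1, as in (2.22)."""
    return -math.log1p(-s) - s

def F(x):
    """Illustrative F(x) = x - ln x on the positive ray; minimizer x* = 1, min F = 1."""
    return x - math.log(x)

def decrement(x):
    """lambda(F,x) = |F'(x)| / sqrt(F''(x)) with F'(x) = 1 - 1/x, F''(x) = 1/x^2."""
    return abs(1.0 - 1.0 / x) / math.sqrt(1.0 / x**2)

def leq(a, b):
    """a <= b up to a small relative tolerance."""
    return a <= b + 1e-9 * max(1.0, abs(b))

x_star, F_min = 1.0, 1.0
checks = []
for x in [0.05, 0.5, 0.9, 1.2, 1.9]:                            # lambda(F, x) = |x - 1| < 1
    lam = decrement(x)
    checks.append(leq(F(x) - F_min, rho(lam)))                    # (2.16)
    checks.append(leq((x_star - x) ** 2 / x**2, (lam / (1 - lam)) ** 2))   # (2.17)
print(all(checks))

def decrement_log(x):
    """Remark 2.2.1: G(x) = -ln x has G'(x) = -1/x, G''(x) = 1/x^2, hence lambda(G, x) = 1."""
    return abs(-1.0 / x) / math.sqrt(1.0 / x**2)

print([decrement_log(x) for x in (0.01, 1.0, 100.0)])
```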

IX. Damped Newton method: local quadratic convergence. Let λ(F, ·) be finite, let x ∈ Q, and let $x^+$ be the damped Newton iterate of x (see V.). Then

$$\lambda(F,x^+) \le 2\lambda^2(F,x). \tag{2.19}$$

Besides this, if λ(F, x) < 1, then F attains its minimum on Q, and for any minimizer $x^*$ of F one has
$$|x - x^*|_{x^*} \le \frac{\lambda(F,x)}{1-\lambda(F,x)} \tag{2.20}$$

and
$$|x - x^*|_x \le \frac{\lambda(F,x)}{1-\lambda(F,x)}. \tag{2.21}$$

Proof. 1⁰. To prove (2.19), denote by e the Newton direction of F at x, set

$$\lambda = \lambda(F,x), \qquad r = \frac{1}{1+\lambda},$$
and let h ∈ E. The function
$$\psi(t) = DF(x + te)[h]$$
is twice continuously differentiable on [0, r]; we have

$$\psi'(t) = D^2F(x+te)[h,e], \qquad \psi''(t) = D^3F(x+te)[h,e,e],$$

whence, in view of 0.,

$$|\psi''(t)| \le 2|h|_{x+te}|e|^2_{x+te} \le$$
[in view of (2.2) and since $|e|_x = \lambda$, see (2.11)]

$$\le 2(1-t\lambda)^{-3}|h|_x|e|_x^2 = 2(1-t\lambda)^{-3}\lambda^2|h|_x.$$

It follows that
$$DF(x^+)[h] \equiv \psi(r) \le \psi(0) + r\psi'(0) + |h|_x\int_0^r\left\{\int_0^t 2(1-\tau\lambda)^{-3}\lambda^2\,d\tau\right\}dt =$$

$$= \psi(0) + r\psi'(0) + \frac{\lambda^2r^2}{1-\lambda r}|h|_x =$$
[the definition of ψ]
$$= DF(x)[h] + rD^2F(x)[h,e] + \frac{\lambda^2r^2}{1-\lambda r}|h|_x =$$
[see (2.10)]
$$= (1-r)DF(x)[h] + \frac{\lambda^2r^2}{1-\lambda r}|h|_x =$$
[the definition of r]
$$= \frac{\lambda}{1+\lambda}DF(x)[h] + \frac{\lambda^2}{1+\lambda}|h|_x \le$$
[since $DF(x)[h] \le \lambda|h|_x$ by definition of λ = λ(F, x)]

$$\le 2\frac{\lambda^2}{1+\lambda}|h|_x \le$$
[see (2.2) and take into account that $|x^+ - x|_x = r|e|_x = r\lambda$]

$$\le 2\frac{\lambda^2}{1+\lambda}\cdot\frac{1}{1-r\lambda}|h|_{x^+} = 2\lambda^2|h|_{x^+}.$$
Thus, for any h ∈ E we have $DF(x^+)[h] \le 2\lambda^2|h|_{x^+}$, as claimed in (2.19).
2⁰. Let x ∈ Q be such that λ ≡ λ(F, x) < 1. We already know from VIII. that in this case F attains its minimum on Q, and that

$$F(x) - \min_Q F \le \rho(\lambda) \equiv -\ln(1-\lambda) - \lambda. \tag{2.22}$$

Let $x^*$ be a minimizer of F on Q and let $r = |x - x^*|_{x^*}$. From (2.4) applied to $x^*$ in the role of x and $x - x^*$ in the role of h it follows that

$$F(x) \ge F(x^*) + \rho(-r) \equiv F(x^*) + r - \ln(1+r).$$

Combining this observation with (2.22), we come to

$$r - \ln(1+r) \le -\lambda - \ln(1-\lambda),$$

and it immediately follows that $r \le \frac{\lambda}{1-\lambda}$ (the function $s \mapsto s - \ln(1+s)$ is increasing on $[0,\infty)$, and at $s = \frac{\lambda}{1-\lambda}$ its value $\frac{\lambda}{1-\lambda} + \ln(1-\lambda)$ is at least $-\lambda - \ln(1-\lambda)$), as required in (2.20). (2.21) is nothing but (2.17).
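The estimates (2.19) to (2.21) can be spot-checked on the same illustrative function as before, F(x) = cᵀx − Σ ln x_i on the positive orthant with an arbitrarily chosen c. This function has the unique minimizer x* = 1/c, and $F''(y) = \mathrm{diag}(1/y^2)$, so $|h|_y = \sqrt{\sum_i h_i^2/y_i^2}$. The helper names are hypothetical and the check is numerical, not a proof.

```python
import numpy as np

c = np.array([1.0, 2.0, 0.5])   # illustrative F(x) = c^T x - sum(ln x_i); minimizer x* = 1/c
x_star = 1.0 / c

def decrement(x):
    g = c - 1.0 / x
    return np.sqrt(np.sum(g**2 * x**2))      # sqrt(g^T [F''(x)]^{-1} g), F''(x) = diag(1/x^2)

def damped_iterate(x):
    e = -(c - 1.0 / x) * x**2                # Newton direction -[F''(x)]^{-1} F'(x)
    return x + e / (1.0 + decrement(x))

def local_norm(h, at):
    return np.sqrt(np.sum(h**2 / at**2))     # |h|_at = sqrt(h^T F''(at) h)

rng = np.random.default_rng(2)
ok = True
for _ in range(1000):
    x = rng.uniform(0.01, 10.0, size=3)
    lam = decrement(x)
    ok = ok and decrement(damped_iterate(x)) <= 2 * lam**2 + 1e-12      # (2.19)
    if lam < 1:
        bound = lam / (1 - lam)
        ok = ok and local_norm(x - x_star, x_star) <= bound + 1e-12    # (2.20)
        ok = ok and local_norm(x - x_star, x) <= bound + 1e-12         # (2.21)
print(ok)
```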
The main consequence of the indicated properties of self-concordant functions is the following description of the behaviour of the Damped Newton method (for the sake of simplicity, we restrict ourselves to the case of nondegenerate F):
X. Summary on the Damped Newton method. Let F be a self-concordant nondegenerate function on Q. Then
A. [existence of minimizer] F attains its minimum on Q if and only if it is bounded below on Q; this is for sure the case if
$$\lambda(F,x) \equiv \sqrt{(F'(x))^T[F''(x)]^{-1}F'(x)} < 1$$

for some x.
B. Given $x_1 \in Q$, consider the Damped Newton minimization process given by the recurrence
$$x_{i+1} = x_i - \frac{1}{1+\lambda(F,x_i)}[F''(x_i)]^{-1}F'(x_i). \tag{2.23}$$

The recurrence keeps the iterates in Q and possesses the following properties:
B.1 [relaxation property]

$$F(x_{i+1}) \le F(x_i) - [\lambda(F,x_i) - \ln(1+\lambda(F,x_i))]; \tag{2.24}$$

in particular, if $\lambda(F,x_i)$ is greater than an absolute constant, then the progress in the value of F at the step i is at least another absolute constant; e.g., if $\lambda(F,x_i) \ge 1/4$, then $F(x_i) - F(x_{i+1}) \ge \frac14 - \ln\frac54 = 0.026856...$
B.2 [local quadratic convergence] If at a certain step i we have $\lambda(F,x_i) \le \frac14$, then we are in the region of quadratic convergence of the method, namely, for every j ≥ i we have
$$\lambda(F,x_{j+1}) \le 2\lambda^2(F,x_j) \quad \left[\le \frac12\lambda(F,x_j)\right], \tag{2.25}$$

$$F(x_j) - \min_Q F \le \rho(\lambda(F,x_j)) \quad \left[\le \frac{\lambda^2(F,x_j)}{2(1-\lambda(F,x_j))}\right], \tag{2.26}$$
and for the (unique) minimizer $x^*$ of F we have

$$|x_j - x^*|_{x^*} \le \frac{\lambda(F,x_j)}{1-\lambda(F,x_j)} \tag{2.27}$$

and
$$|x_j - x^*|_{x_j} \le \frac{\lambda(F,x_j)}{1-\lambda(F,x_j)}. \tag{2.28}$$

C. If F is bounded below, then the Newton complexity (i.e., the number of steps (2.23)) of finding a point x ∈ Q with λ(F, x) ≤ κ (κ ≤ 0.1) does not exceed the quantity
$$O(1)\left([F(x_1) - \min_Q F] + \ln\ln\frac{1}{\kappa}\right) \tag{2.29}$$

with an absolute constant O(1).


The statements collected in X. are in fact already proved: A is given by VI. and VIII.; B.1 is V.; B.2 is IX.; C is an immediate consequence of B.1 and B.2.
Note that the description of the convergence properties of the Newton method as applied to a self-concordant function is completely objective-independent: it does not involve any specific numeric characteristics of F.
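The recurrence (2.23) is short enough to sketch directly. The sketch assumes the test function F(x) = −Σ ln(b_i − a_iᵀx), the logarithmic barrier of a bounded polytope. This function is a sum of −ln(·) of affine arguments, nondegenerate when A has full column rank, and bounded below because the slacks are bounded above. The polytope, the starting point, the name `damped_newton` and its parameters are illustrative assumptions. The stopping rule λ(F, x_i) ≤ κ mirrors item C.

```python
import numpy as np

def damped_newton(A, b, x, kappa=1e-10, max_iter=100):
    """Damped Newton recurrence (2.23) for F(x) = -sum(ln(b - A x)), started at an interior x.
    Returns the final point and the list of Newton decrements lambda(F, x_i)."""
    decrements = []
    for _ in range(max_iter):
        s = b - A @ x                               # slacks; positive at interior points
        grad = A.T @ (1.0 / s)                      # F'(x)
        hess = A.T @ ((1.0 / s**2)[:, None] * A)    # F''(x) = A^T diag(1/s^2) A
        step = np.linalg.solve(hess, grad)          # [F''(x)]^{-1} F'(x)
        lam = np.sqrt(grad @ step)                  # lambda(F, x)
        decrements.append(lam)
        if lam <= kappa:
            break
        x = x - step / (1.0 + lam)                  # x_{i+1}
    return x, decrements

# Illustrative polytope: the square [-1, 1]^2 cut by x_1 + x_2 <= 1.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]])
b = np.ones(5)
x_opt, lams = damped_newton(A, b, np.array([0.9, -0.9]))
for i, lam in enumerate(lams):
    print(i, lam)
```

Once a printed decrement falls below 1/4, each subsequent one should be at most twice the square of its predecessor, as (2.25) predicts.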

2.3 Exercises: Around Symmetric Forms


The goal of the exercises below is to establish the statement underlying 0.:
(P): let $A[h_1,\dots,h_k]$ be a k-linear symmetric form on $\mathbb{R}^n$ and $B[h_1,h_2]$ be a symmetric positive semidefinite 2-linear form on $\mathbb{R}^n$. Assume that for some α one has

$$|A[h,\dots,h]| \le \alpha B^{k/2}[h,h], \quad h \in \mathbb{R}^n. \tag{2.30}$$

Then
$$|A[h_1,\dots,h_k]| \le \alpha\prod_{i=1}^k B^{1/2}[h_i,h_i] \tag{2.31}$$

for all $h_1,\dots,h_k$.


Let me start by recalling the terminology. A k-linear form $A[h_1,\dots,h_k]$ on $E = \mathbb{R}^n$ is a real-valued function of k arguments $h_1,\dots,h_k$, each of them varying over E, which is a linear function of every argument when the remaining arguments are set to arbitrary (fixed) values. The examples are:

• a linear form $A[h] = a^Th$ (k = 1);

• a bilinear form $A[h_1,h_2] = h_1^Tah_2$, a being an n × n matrix (k = 2);

• a 3-linear form of the type $A[h_1,h_2,h_3] = (a^Th_1)(h_2^Th_3)$;

• the n-linear form $A[h_1,\dots,h_n] = \mathrm{Det}\,(h_1;\dots;h_n)$.

A k-linear form is called symmetric, if it remains unchanged under every permutation of the
collection of arguments.

Exercise 2.3.1 Prove that any 2-linear form on $\mathbb{R}^n$ can be represented as $A[h_1,h_2] = h_1^Tah_2$ via a certain n × n matrix a. When is the form symmetric? Which of the forms in the above examples are symmetric?

The restriction of a symmetric k-linear form $A[h_1,\dots,h_k]$ onto the "diagonal" $h_1 = h_2 = \dots = h_k = h$, which is a function of $h \in \mathbb{R}^n$, is called a homogeneous polynomial of full degree k on $\mathbb{R}^n$; the definition coincides with the usual Calculus definition: "a polynomial of n variables is a finite sum of monomials, every monomial being a constant times a product of nonnegative integer powers of the variables. A polynomial is called homogeneous of full degree k if the sum of the powers in every monomial is equal to k".

Exercise 2.3.2 Prove the equivalence of the aforementioned two definitions of a homogeneous polynomial. What is the 3-linear form on $\mathbb{R}^2$ which produces the polynomial $xy^2$ ((x, y) are coordinates on $\mathbb{R}^2$)?

Of course, you can restrict an arbitrary k-linear form, not necessarily symmetric, onto the diagonal and get a certain function on E. You will not, however, get anything new: for any k-linear form $A[h_1,\dots,h_k]$ there exists a symmetric k-linear form $A^S[h_1,\dots,h_k]$ with the same restriction on the diagonal:
$$A[h,\dots,h] \equiv A^S[h,\dots,h], \quad h \in E;$$
to get $A^S$, it suffices to take the average, over all permutations σ of the k-element index set, of the forms $A_\sigma[h_1,\dots,h_k] = A[h_{\sigma(1)},\dots,h_{\sigma(k)}]$.
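The averaging recipe for $A^S$ is easy to carry out on coefficient tensors $T[j_1,\dots,j_k] = A[e_{j_1},\dots,e_{j_k}]$ (the representation with respect to a basis discussed next). The sketch below uses the non-symmetric 3-linear form $A[h_1,h_2,h_3] = (a^Th_1)(h_2^Th_3)$ from the examples above, with an arbitrarily chosen vector a. It checks that $A^S$ is symmetric and has the same diagonal $A[h,h,h] = (a^Th)|h|^2$. The helper names are illustrative.

```python
import itertools
import numpy as np

def form_value(T, *hs):
    """A[h_1, ..., h_k] = sum over j_1..j_k of T[j_1, ..., j_k] h_{1,j_1} ... h_{k,j_k}."""
    out = T
    for h in reversed(hs):
        out = out @ h                  # contract the last remaining index with h
    return out

def symmetrize(T):
    """Average of the coefficient tensors of A_sigma over all permutations sigma."""
    perms = list(itertools.permutations(range(T.ndim)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

# A[h1, h2, h3] = (a^T h1)(h2^T h3): coefficient tensor T[i, j, k] = a_i * delta_{jk}.
n = 3
a = np.array([1.0, -2.0, 0.5])
T = np.einsum("i,jk->ijk", a, np.eye(n))
S = symmetrize(T)

rng = np.random.default_rng(3)
h = rng.standard_normal(n)
print(form_value(T, h, h, h), form_value(S, h, h, h), (a @ h) * (h @ h))
```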

From polylinearity of a k-linear form $A[h_1,\dots,h_k]$ it follows that the value of the form at the collection of linear combinations
$$h_i = \sum_{j\in J} a_{i,j}u_{i,j}, \quad i = 1,\dots,k,$$

J being a finite index set, can be expressed as

$$\sum_{j_1,\dots,j_k\in J}\left(\prod_{i=1}^k a_{i,j_i}\right)A[u_{1,j_1},u_{2,j_2},\dots,u_{k,j_k}];$$

this is nothing but the usual rule for "opening the parentheses". In particular, A[·] is uniquely defined by its values on the collections comprised of basis vectors $e_1,\dots,e_n$:
$$A[h_1,\dots,h_k] = \sum_{1\le j_1,\dots,j_k\le n} h_{1,j_1}h_{2,j_2}\cdots h_{k,j_k}\,A[e_{j_1},e_{j_2},\dots,e_{j_k}],$$

$h_{i,j}$ being the j-th coordinate of the vector $h_i$ with respect to the basis. It follows that a polylinear form is a continuous (even $C^\infty$) function of its arguments.
A symmetric bilinear form $A[h_1,h_2]$ is called positive semidefinite, if the corresponding homogeneous polynomial is nonnegative, i.e., if A[h, h] ≥ 0 for all h. A symmetric positive semidefinite bilinear form satisfies all requirements imposed on an inner product, except, possibly, the nondegeneracy requirement "square of nonzero vector is nonzero". If this requirement also is satisfied, i.e., if A[h, h] > 0 whenever h ≠ 0, then $A[h_1,h_2]$ defines a Euclidean structure on E. As we know from Exercise 2.3.1, a bilinear form on $\mathbb{R}^n$ always can be represented by an n × n matrix a as $h_1^Tah_2$; the form is symmetric if and only if $a = a^T$, and is symmetric positive (semi)definite if and only if a is a symmetric positive (semi)definite matrix.
A symmetric k-linear form produces, as we know, a uniquely defined homogeneous polyno-
mial of degree k. It turns out that the polynomial ”remembers everything” about the related
k-linear form:

Exercise 2.3.3 #+ Prove that for every k there exist:

• integer m,

• real "scale factors" $r_{1,l}, r_{2,l}, \dots, r_{k,l}$, l = 1, ..., m,

• real weights $w_l$, l = 1, ..., m,

with the following property: for any n and any k-linear symmetric form $A[h_1,\dots,h_k]$ on $\mathbb{R}^n$ identically in $h_1,\dots,h_k$ one has
$$A[h_1,\dots,h_k] = \sum_{l=1}^m w_l\,A\left[\sum_{i=1}^k r_{i,l}h_i,\ \sum_{i=1}^k r_{i,l}h_i,\ \dots,\ \sum_{i=1}^k r_{i,l}h_i\right].$$

In other words, A can be restored, in a linear fashion, via its restriction on the diagonal.
Find a set of scale factors and weights for k = 2 and k = 3.

Now let us come to the proof of (P). Of course, it suffices to consider the case when B is positive definite rather than semidefinite (replace $B[h_1,h_2]$ with $B_\epsilon[h_1,h_2] = B[h_1,h_2] + \epsilon h_1^Th_2$, $\epsilon > 0$, thus making B positive definite and preserving the assumption (2.30); given that (P) is valid for positive definite B, we would know that (2.31) is valid for B replaced with $B_\epsilon$ and would be able to pass to the limit as $\epsilon \to 0$). Thus, from now on we assume that B is symmetric positive definite. In this case $B[h_1,h_2]$ can be taken as an inner product on $\mathbb{R}^n$, and in the associated "metric" terms (P) reads as follows:
(P′): let $|\cdot|$ be a Euclidean norm on $\mathbb{R}^n$, $A[h_1,\dots,h_k]$ be a k-linear symmetric form on $\mathbb{R}^n$ such that
$$|A[h,\dots,h]| \le \alpha|h|^k, \quad h \in \mathbb{R}^n.$$
Then
$$|A[h_1,\dots,h_k]| \le \alpha|h_1|\cdots|h_k|, \quad h_1,\dots,h_k \in \mathbb{R}^n.$$

Now, due to homogeneity of A with respect to every $h_i$, proving the conclusion in (P′) is the same as proving that $|A[h_1,\dots,h_k]| \le \alpha$ whenever $|h_i| \le 1$, i = 1, ..., k. Thus, we come to the following equivalent reformulation of (P′):
prove that for a k-linear symmetric form $A[h_1,\dots,h_k]$ one has

$$\max_{|h|=1}|A[h,\dots,h]| = \max_{|h_i|\le 1}|A[h_1,\dots,h_k]|. \tag{2.32}$$

Note that from Exercise 2.3.3 it immediately follows that the right hand side of (2.32) is bounded from above by a constant times the left hand side, with the constant depending on k only. For this latter statement it is completely unimportant whether the norm $|\cdot|$ in question is or is not Euclidean. The point, however, is that in the case of a Euclidean norm the aforementioned constant factor can be set to 1. This is something which should be "common knowledge"; surprisingly, I was unable to find even the statement anywhere, let alone the proof. I do not think that the proof presented in the remaining exercises is the simplest one, and you are welcome to find something better. We shall prove (2.32) by induction on k.

Exercise 2.3.4 Prove the base, i.e., that (2.32) holds true for k = 2.
Now assume that (2.32) is valid for k = l − 1 and any (l − 1)-linear symmetric form A, and let us prove that it is valid also for k = l.
Let us fix a symmetric l-linear form A, and let us call a collection $\mathcal{T} = \{T_1,\dots,T_l\}$ of one-dimensional subspaces of $\mathbb{R}^n$ an extremal, if for some (and then for each) choice of unit vectors $e_i \in T_i$ one has
$$|A[e_1,\dots,e_l]| = \omega \equiv \max_{|h_1|=\dots=|h_l|=1}|A[h_1,\dots,h_l]|.$$

Clearly, extremals exist (we have seen that A[·] is continuous). Let $\mathbb{T}$ be the set of all extremals. To prove (2.32) is the same as to prove that $\mathbb{T}$ contains an extremal of the type {T, ..., T}.
Exercise 2.3.5 #+ Let $\{T_1,\dots,T_l\} \in \mathbb{T}$ and $T_1 \ne T_2$. Let $e_i \in T_i$ be unit vectors, $h = e_1 + e_2$, $q = e_1 - e_2$. Prove that then both $\{\mathbb{R}h, \mathbb{R}h, T_3,\dots,T_l\}$ and $\{\mathbb{R}q, \mathbb{R}q, T_3,\dots,T_l\}$ are extremals.
Let $\mathbb{T}_*$ be the subset of $\mathbb{T}$ formed by the extremals of the type $\{\overbrace{T,\dots,T}^{t\ \text{times}},\overbrace{S,\dots,S}^{s\ \text{times}}\}$ for some t and s (depending on the extremal). By virtue of the inductive assumption, $\mathbb{T}_*$ is nonempty (in fact, $\mathbb{T}_*$ contains an extremal of the type {T, ..., T, S}). For $\mathcal{T} = \{\overbrace{T,\dots,T}^{t\ \text{times}},\overbrace{S,\dots,S}^{s\ \text{times}}\} \in \mathbb{T}_*$ let $\varphi(\mathcal{T})$ denote the angle (from $[0, \frac{\pi}{2}]$) between T and S.

Exercise 2.3.6 #+ Prove that if $\mathcal{T} = \{T,\dots,T,S,\dots,S\}$ is an extremal of the aforementioned "2-line" type, then there exists an extremal $\mathcal{T}'$ of the same type with $\varphi(\mathcal{T}') \le \frac12\varphi(\mathcal{T})$. Derive from this observation that there exists a 2-line extremal with $\varphi(\mathcal{T}) = 0$, i.e., of the type {T, ..., T}, and thus complete the inductive step.

Exercise 2.3.7 ∗ Let $A[h_1,\dots,h_k]$, $h_1,\dots,h_k \in \mathbb{R}^n$, be a mapping taking values in a certain $\mathbb{R}^l$ which is linear with respect to every argument and invariant with respect to permutations of the arguments, and let $B[h_1,h_2]$ be a symmetric positive semidefinite bilinear scalar form on $\mathbb{R}^n$ such that

$$\|A[h,\dots,h]\| \le \alpha B^{k/2}[h,h], \quad h \in \mathbb{R}^n,$$

$\|\cdot\|$ being a certain norm on $\mathbb{R}^l$. Prove that then

$$\|A[h_1,\dots,h_k]\| \le \alpha\prod_{i=1}^k B^{1/2}[h_i,h_i], \quad h_1,\dots,h_k \in \mathbb{R}^n.$$
Chapter 3

Self-concordant barriers

We have introduced and studied the notion of a self-concordant function for an open convex domain. To complete the development of our technical tools, we should investigate a specific subfamily of this family: self-concordant barriers.

3.1 Definition, examples and combination rules


Definition 3.1.1 Let G be a closed convex domain in $\mathbb{R}^n$ ("domain" means "a set with a nonempty interior"), and let ϑ ≥ 0. A function F : int G → R is called a self-concordant barrier for G with the parameter value ϑ (in short, a ϑ-self-concordant barrier for G), if
a) F is self-concordant on int G;
b) one has
$$|DF(x)[h]| \le \vartheta^{1/2}\left[D^2F(x)[h,h]\right]^{1/2} \tag{3.1}$$
for all x ∈ int G and all $h \in \mathbb{R}^n$.

Recall that self-concordance is, basically, Lipschitz continuity of the Hessian of F with respect to the local Euclidean metric defined by the Hessian itself. Similarly, (3.1) says that F should be Lipschitz continuous, with constant $\vartheta^{1/2}$, with respect to the same local metric.
Recall also that the quantity

$$\lambda(F,x) = \max\{DF(x)[h] \mid D^2F(x)[h,h] \le 1\}$$

was called the Newton decrement of F at x; this quantity played a crucial role in our investigation of self-concordant functions. Relation (3.1) means exactly that the Newton decrement of F should be bounded from above, independently of x, by a certain constant, and the square of this constant is called the parameter of the barrier.
Let us point out preliminary examples of self-concordant barriers. To this end let us look at
the basic examples of self-concordant functions given in the previous lecture.

Example 3.1.1 A constant is self-concordant barrier for Rn with the parameter 0.

It can be proved that a constant is the only self-concordant barrier for the whole space, and the only self-concordant barrier with the value of the parameter less than 1. In what follows we never deal with the trivial (constant) barrier, so that you should remember that the parameters of the barriers in question will always be ≥ 1.
In connection with the above trivial example, note that the functions we know to be self-concordant on the whole space, namely the linear and the convex quadratic ones, are not self-concordant barriers, provided that they are nonconstant. This claim follows from the aforementioned general fact that the only self-concordant barrier for the whole space is a constant, and it also can be easily verified directly.
Another basic example of a self-concordant function known to us is more productive:
Example 3.1.2 The function F (x) = − ln x is a self-concordant barrier with parameter 1 for
the non-negative ray.
This is seen from an immediate computation.
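For completeness, the "immediate computation" behind Example 3.1.2 can be written out as follows (h ∈ R is the direction):

```latex
\begin{aligned}
&F(x) = -\ln x, \quad DF(x)[h] = -\frac{h}{x}, \quad D^2F(x)[h,h] = \frac{h^2}{x^2}, \quad D^3F(x)[h,h,h] = -\frac{2h^3}{x^3};\\
&|D^3F(x)[h,h,h]| = 2\left(D^2F(x)[h,h]\right)^{3/2}, \qquad |DF(x)[h]| = \frac{|h|}{x} = 1^{1/2}\left(D^2F(x)[h,h]\right)^{1/2}.
\end{aligned}
```

Thus (3.1) holds with ϑ = 1, in fact as an equality. Together with F(x) → +∞ as x → +0, this gives the claim.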
The number of examples can be immediately increased, due to the following simple combi-
nation rules (completely similar to those for self-concordant functions):
Proposition 3.1.1 (i) [stability with respect to affine substitutions of argument] Let F be a ϑ-self-concordant barrier for $G \subset \mathbb{R}^n$ and let x = Ay + b be an affine mapping from $\mathbb{R}^k$ to $\mathbb{R}^n$ with the image intersecting int G. Then the inverse image of G under the mapping, i.e., the set

$$G^+ = \{y \in \mathbb{R}^k \mid Ay + b \in G\}$$

is a closed convex domain in $\mathbb{R}^k$, and the composite function

$$F^+(y) = F(Ay + b) : \mathrm{int}\,G^+ \to \mathbb{R}$$

is a ϑ-self-concordant barrier for $G^+$.


(ii) [stability with respect to summation and multiplication by reals ≥ 1] Let $F_i$ be $\vartheta_i$-self-concordant barriers for the closed convex domains $G_i \subset \mathbb{R}^n$ and $\alpha_i \ge 1$ be reals, i = 1, ..., m. Assume that the set $G = \bigcap_{i=1}^m G_i$ has a nonempty interior. Then the function

$$F(x) = \alpha_1F_1(x) + \dots + \alpha_mF_m(x) : \mathrm{int}\,G \to \mathbb{R}$$

is a $(\sum_i \alpha_i\vartheta_i)$-self-concordant barrier for G.
(iii) [stability with respect to direct summation] Let $F_i$ be $\vartheta_i$-self-concordant barriers for closed convex domains $G_i \subset \mathbb{R}^{n_i}$, i = 1, ..., m. Then the function

$$F(x_1,\dots,x_m) = F_1(x_1) + \dots + F_m(x_m) : \mathrm{int}\,G \to \mathbb{R}, \quad G \equiv G_1 \times \dots \times G_m,$$

is a $(\sum_i \vartheta_i)$-self-concordant barrier for G.
Proof is given by an immediate and absolutely trivial verification of the definition. E.g., let us prove (ii). From Proposition 2.1.1.(ii) we know that F is self-concordant on $\mathrm{int}\,G \equiv \bigcap_{i=1}^m \mathrm{int}\,G_i$. The verification of (3.1) is as follows:
$$|DF(x)[h]| = \left|\sum_{i=1}^m \alpha_iDF_i(x)[h]\right| \le \sum_{i=1}^m \alpha_i|DF_i(x)[h]| \le$$

[since $F_i$ are $\vartheta_i$-self-concordant barriers]

$$\le \sum_{i=1}^m \alpha_i\vartheta_i^{1/2}\left[D^2F_i(x)[h,h]\right]^{1/2} = \sum_{i=1}^m [\alpha_i\vartheta_i]^{1/2}\left[\alpha_iD^2F_i(x)[h,h]\right]^{1/2} \le$$

[Cauchy's inequality]
$$\le \left[\sum_{i=1}^m \alpha_i\vartheta_i\right]^{1/2}\left[\sum_{i=1}^m \alpha_iD^2F_i(x)[h,h]\right]^{1/2} = \left[\sum_{i=1}^m \alpha_i\vartheta_i\right]^{1/2}\left[D^2F(x)[h,h]\right]^{1/2},$$

as required.
An immediate consequence of our combination rules is as follows (cf. Corollary 2.1.1):

Corollary 3.1.1 Let

$$G = \{x \in \mathbb{R}^n \mid a_i^Tx - b_i \le 0,\ i = 1,\dots,m\}$$
be a convex polyhedron defined by a set of linear inequalities satisfying the Slater condition:

$$\exists x \in \mathbb{R}^n:\ a_i^Tx - b_i < 0,\ i = 1,\dots,m.$$

Then the standard logarithmic barrier for G given by

$$F(x) = -\sum_{i=1}^m \ln(b_i - a_i^Tx)$$

is an m-self-concordant barrier for G.


Proof. The function − ln t is a 1-self-concordant barrier for the positive half-axis (Example 3.1.2); therefore each of the functions $F_i(x) = -\ln(b_i - a_i^Tx)$ is a 1-self-concordant barrier for the closed half-space $G_i = \{x \in \mathbb{R}^n \mid b_i - a_i^Tx \ge 0\}$ (item (i) of Proposition 3.1.1; note that $G_i$ is the inverse image of the nonnegative half-axis under the affine mapping $x \mapsto b_i - a_i^Tx$), whence $F(x) = \sum_i F_i(x)$ is an m-self-concordant barrier for the intersection G of these half-spaces (item (ii) of Proposition 3.1.1).
The fact stated in the Corollary is responsible for 100% of polynomial time results in Linear Programming.
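Corollary 3.1.1 says, in the language of (3.1), that $\lambda^2(F,x) \le m$ at every interior point. The sketch below spot-checks this for a random polyhedron. It assumes A has full column rank, so that the Hessian $A^T\mathrm{diag}(1/s^2)A$ is invertible, and chooses b > 0 so that x = 0 satisfies the Slater condition. The data and helper name are illustrative.

```python
import numpy as np

def log_barrier_decrement(A, b, x):
    """lambda(F, x) for F(x) = -sum(ln(b_i - a_i^T x)); requires A x < b and A of full column rank."""
    s = b - A @ x
    g = A.T @ (1.0 / s)                         # F'(x)
    H = A.T @ ((1.0 / s**2)[:, None] * A)       # F''(x)
    return np.sqrt(g @ np.linalg.solve(H, g))

rng = np.random.default_rng(5)
m, n = 7, 3
A = rng.standard_normal((m, n))
b = rng.uniform(0.5, 2.0, size=m)               # x = 0 satisfies the Slater condition
worst = 0.0
for _ in range(500):
    x = 0.2 * rng.standard_normal(n)
    if np.all(A @ x < b):                       # keep interior points only
        worst = max(worst, log_barrier_decrement(A, b, x) ** 2)
print(worst, "<=", m)
```

Here $\lambda^2$ is the squared length of the orthogonal projection of the all-ones vector onto the range of $\mathrm{diag}(1/s)A$, which explains the bound m.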
Now let us come to a systematic investigation of properties of self-concordant barriers. Please do not be surprised by the forthcoming miscellanea; everything will be heavily exploited later on.

3.2 Properties of self-concordant barriers


Let G be a closed convex domain in $E = \mathbb{R}^n$, and let F be a ϑ-self-concordant barrier for G.
Preliminaries: the Minkowski function of a convex domain. Recall that, given an interior point x of G, one can define the Minkowski function of G with the pole at x as

$$\pi_x(y) = \inf\{t > 0 \mid x + t^{-1}(y - x) \in G\}.$$

In other words, to find $\pi_x(y)$, consider the ray [x, y) and look where this ray intersects the boundary of G. If the intersection point y′ exists, then $\pi_x(y)$ is the length of the segment [x, y] divided by the length of the segment [x, y′]; if the ray [x, y) is contained in G, then $\pi_x(y) = 0$.
Note that the Minkowski function is convex, continuous and positively homogeneous with respect to the pole:

$$\pi_x(x + \lambda(y - x)) = \lambda\,\pi_x(y),\quad \lambda \ge 0;$$

besides this, it is zero at x, is ≤ 1 on G, equals 1 on the boundary of G and is > 1 outside G. Note that this function is in fact defined in purely affine terms (the lengths of segments are, of course, metric notions, but the ratio of lengths of parallel segments is metric-independent).
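For a polyhedron the Minkowski function has a closed form, which is handy for experimenting with the properties below. The sketch (Python with NumPy) assumes $G = \{z \mid Az \le b\}$ with the pole x in int G; `minkowski_polyhedron` is a hypothetical helper name. Since $x + s(y - x) \in G$ iff $s\,a_i^T(y - x) \le b_i - a_i^Tx$ for all i, one gets $\pi_x(y) = \max\{0, \max_i a_i^T(y - x)/(b_i - a_i^Tx)\}$, and the homogeneity $\pi_x(x + \lambda(y - x)) = \lambda\pi_x(y)$ is visible directly from this formula.

```python
import numpy as np

def minkowski_polyhedron(A, b, x, y):
    """pi_x(y) for G = {z : A z <= b}, with the pole x assumed in int G.

    x + s(y - x) stays in G while s * a_i^T (y - x) <= b_i - a_i^T x for all i,
    so pi_x(y) is the largest ratio a_i^T (y - x) / (b_i - a_i^T x); if no
    ratio is positive, the ray [x, y) lies in G and pi_x(y) = 0.
    """
    s = b - A @ x
    if np.any(s <= 0):
        raise ValueError("the pole x must be an interior point of G")
    rates = (A @ (y - x)) / s
    return max(0.0, float(np.max(rates)))

# Usage: the unit box [-1, 1]^2 written as A z <= b, pole at the origin.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
x = np.zeros(2)
print(minkowski_polyhedron(A, b, x, np.array([0.5, 0.2])))  # 0.5: interior point
print(minkowski_polyhedron(A, b, x, np.array([2.0, 0.0])))  # 2.0: outside G
```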
Now let us switch to properties of self-concordant barriers.
0. Explosure property: Let x ∈ int G and let y be such that DF(x)[y − x] > 0. Then
$$\pi_x(y) \ge \gamma \equiv \frac{DF(x)[y - x]}{\vartheta}, \qquad (3.2)$$
so that the point $x + \gamma^{-1}(y - x)$ is not an interior point of G.
Proof. Let
φ(t) = F (x + t(y − x)) : ∆ → R,
where $\Delta = [0, T)$ is the largest half-interval of the ray $t \ge 0$ such that $x + t(y - x) \in \operatorname{int} G$ whenever $t \in \Delta$. Note that the function φ is three times continuously differentiable on Δ and that
$$T = \pi_x^{-1}(y) \qquad (3.3)$$
(the definition of the Minkowski function; here $0^{-1} = +\infty$).
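From the fact that F is a ϑ-self-concordant barrier for G it immediately follows (see Proposition 3.1.1.(i)) that
$$|\varphi'(t)| \le \vartheta^{1/2}\sqrt{\varphi''(t)},$$
or, which is the same,
$$\vartheta\psi'(t) \ge \psi^2(t),\quad t \in \Delta, \qquad (3.4)$$
where $\psi(t) = \varphi'(t)$. Note that $\psi(0) = DF(x)[y - x]$ is positive by assumption and ψ is nondecreasing (as the derivative of a convex function), so that ψ is positive on Δ. From (3.4) and the relation ψ(0) > 0 it follows that ϑ > 0. In view of the latter relation and since ψ(·) > 0, we can rewrite (3.4) as
$$\left(-\psi^{-1}(t)\right)' \equiv \psi'(t)\psi^{-2}(t) \ge \vartheta^{-1},$$
whence, integrating from 0 to t, $\psi^{-1}(t) \le \psi^{-1}(0) - t/\vartheta$, i.e.,
$$\psi(t) \ge \frac{\vartheta\psi(0)}{\vartheta - t\psi(0)},\quad t \in \Delta. \qquad (3.5)$$
Since $\psi^{-1}(t) > 0$, the inequality $\psi^{-1}(t) \le \psi^{-1}(0) - t/\vartheta$ implies $t < \vartheta/\psi(0)$ for every $t \in \Delta$ (in terms of (3.5): its left hand side is finite at every point of Δ, while its right hand side blows up as t approaches $\vartheta/\psi(0)$), and we conclude that
$$T \le \frac{\vartheta}{\psi(0)}.$$
Recalling that $T = \pi_x^{-1}(y)$ and that $\psi(0) = DF(x)[y - x]$, we come to (3.2).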
I. Semiboundedness. For any x ∈ int G and y ∈ G one has

DF (x)[y − x] ≤ ϑ. (3.6)

Proof. The relation is evident in the case of DF (x)[y − x] ≤ 0; for the case DF (x)[y − x] > 0
the relation is an immediate consequence of (3.2), since πx (y) ≤ 1 whenever y ∈ G.
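For the polyhedral barrier of Corollary 3.1.1 (ϑ = m) both 0. and I. can be checked directly. With $r_i = a_i^T(y - x)/(b_i - a_i^Tx)$ one has $DF(x)[y - x] = \sum_i r_i$ and, when this sum is positive, $\pi_x(y) = \max_i r_i$; so (3.2) reduces to "a sum of m numbers is at most m times their maximum", and (3.6) follows because $r_i \le 1$ for y ∈ G. The Python/NumPy sketch below (`check_0_and_I` and its helpers are hypothetical names) tests both relations at random points.

```python
import numpy as np

def barrier_grad(A, b, x):
    """Gradient of F(x) = -sum_i ln(b_i - a_i^T x) at an interior x."""
    return A.T @ (1.0 / (b - A @ x))

def minkowski(A, b, x, y):
    """pi_x(y) for G = {z : A z <= b}, pole x in int G."""
    return max(0.0, float(np.max((A @ (y - x)) / (b - A @ x))))

def check_0_and_I(A, b, x, y, tol=1e-12):
    """Test (3.2) and (3.6) for the log barrier, whose parameter is theta = m."""
    theta = A.shape[0]
    d = float(barrier_grad(A, b, x) @ (y - x))           # DF(x)[y - x]
    p = minkowski(A, b, x, y)
    explosure = d <= 0 or theta * p >= d - tol * abs(d)  # (3.2), claimed when d > 0
    semibounded = p > 1 or d <= theta + tol              # (3.6), claimed for y in G
    return explosure, semibounded

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 2))
b = rng.uniform(0.5, 1.5, size=8)                        # x = 0 is an interior point
x = np.zeros(2)
print(check_0_and_I(A, b, x, rng.standard_normal(2)))    # expected: (True, True)
```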
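II. Upper bound. Let x, y ∈ int G. Then
$$F(y) \le F(x) + \vartheta\ln\frac{1}{1 - \pi_x(y)}. \qquad (3.7)$$

Proof. For 0 ≤ t ≤ 1 we clearly have
$$\pi_{x + t(y - x)}(y) = \frac{(1 - t)\pi_x(y)}{1 - t\pi_x(y)};$$
from (3.2) applied to the pair $(x + t(y - x);\, y)$ (if the derivative below is nonpositive, there is nothing to prove; equivalently, one may apply (3.6) at the point $x + t(y - x)$) it follows that
$$DF(x + t(y - x))[y - [x + t(y - x)]] \le \vartheta\,\pi_{x + t(y - x)}(y),$$
whence
$$(1 - t)\,DF(x + t(y - x))[y - x] \le \vartheta\frac{(1 - t)\pi_x(y)}{1 - t\pi_x(y)},$$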
or
$$DF(x + t(y - x))[y - x] \le \vartheta\frac{\pi_x(y)}{1 - t\pi_x(y)}.$$
Integrating over t ∈ [0, 1], we come to
$$F(y) - F(x) \le \vartheta\ln\frac{1}{1 - \pi_x(y)},$$
as required.
III. Lower bound. Let x, y ∈ int G. Then
$$F(y) \ge F(x) + DF(x)[y - x] + \ln\frac{1}{1 - \pi_x(y)} - \pi_x(y). \qquad (3.8)$$

Proof. Let $\varphi(t) = F(x + t(y - x))$, $-T_- < t < T \equiv \pi_x^{-1}(y)$, where $T_-$ is the largest t such that $x - t(y - x) \in G$. By Proposition 3.1.1.(i) φ is a self-concordant barrier for $\Delta = [-T_-, T]$, and therefore this function is self-concordant on int Δ; the closed unit Dikin ellipsoid of φ centered at $t \in \operatorname{int}\Delta$ should therefore belong to the closure of Δ (Lecture 2, I.), which means that
$$t + [\varphi''(t)]^{-1/2} \le T,\quad 0 \le t < T$$
(here $0^{-1/2} = +\infty$). We come to the inequality
$$\varphi''(t) \ge (T - t)^{-2},\quad 0 \le t < T.$$
Two sequential integrations of this inequality result in
$$F(y) - F(x) - DF(x)[y - x] \equiv \varphi(1) - \varphi(0) - \varphi'(0) = \int_0^1\int_0^t \varphi''(\tau)\,d\tau\,dt \ge \int_0^1\left\{\int_0^t (T - \tau)^{-2}\,d\tau\right\}dt = \ln\frac{T}{T - 1} - T^{-1};$$
substituting $T = \pi_x^{-1}(y)$, we come to (3.8).


IV. Upper bound on local norm of the first derivative. Let x, y ∈ int G. Then for any h ∈ E one has
$$|DF(y)[h]| \le \frac{\vartheta}{1 - \pi_x(y)}|h|_x \equiv \frac{\vartheta}{1 - \pi_x(y)}\left[D^2F(x)[h, h]\right]^{1/2}. \qquad (3.9)$$

Comment: By definition, the first-order derivative of the ϑ-self-concordant barrier F at a point x in any direction h is bounded from above by $\sqrt{\vartheta}$ times the x-norm $|h|_x$ of the direction. The announced statement says that the derivative of F at another point y is also bounded from above by a constant (depending on $\pi_x(y)$) times the same x-norm of the direction.
Proof of IV. Since x ∈ int G, the closed unit Dikin ellipsoid W of F centered at x is contained in G (Lecture 2, I.; note that G is closed). Assume, first, that $\pi_x(y) > 0$. Then there exists w ∈ G such that
$$y = x + \pi_x(y)(w - x).$$
Consider the image V of the ellipsoid W under the dilation mapping $z \mapsto z + \pi_x(y)(w - z)$; then

$$V = \{y + h \mid |h|_x \le 1 - \pi_x(y)\}$$
is an | · |x -ball centered at y and at the same time V ⊂ G (since W ⊂ G and the dilation maps
G into itself). From the semiboundedness property I. it follows that

DF (y)[h] ≤ ϑ ∀h : y + h ∈ G,

and since V ⊂ G, we conclude that

DF (y)[h] ≤ ϑ ∀h : |h|x ≤ 1 − πx (y),

which is nothing but (3.9).


It remains to consider the case when πx (y) = 0, so that the ray [x, y) is contained in G.
From convexity of G it follows that in the case in question y − x is a recessive direction of G:
u + t(y − x) ∈ G whenever u ∈ G and t ≥ 0. In particular, the translation V = W + (y − x) of
W by the vector y − x belongs to G; V is nothing but the | · |x -unit ball centered at y, and it
remains to repeat word by word the above reasoning.
V. Uniqueness of minimizer and Centering property. F is nondegenerate if and only if
G does not contain lines. If G does not contain lines, then F attains its minimum on int G if
and only if G is bounded, and if it is the case, the minimizer $x^*_F$ (the F-center of G) is unique and possesses the following Centering property:
The closed unit Dikin ellipsoid of F centered at $x^*_F$ is contained in G, and the $(\vartheta + 2\sqrt{\vartheta})$ times larger concentric ellipsoid contains G:

$$x \in G \Rightarrow |x - x^*_F|_{x^*_F} \le \vartheta + 2\sqrt{\vartheta}. \qquad (3.10)$$

Proof. As we know from Lecture 2, II., the recessive subspace EF of any self-concordant
function is also the recessive subspace of its domain: int G + EF = int G. Therefore if G does
not contain lines, then $E_F = \{0\}$, so that F is nondegenerate. Vice versa, if G contains a line with direction h, then $y = x + th \in \operatorname{int} G$ for all $x \in \operatorname{int} G$ and all $t \in \mathbb{R}$, and semiboundedness (see I.) immediately implies that $DF(x)[y - x] = DF(x)[th] \le \vartheta$ for all $x \in \operatorname{int} G$ and all $t \in \mathbb{R}$, which is possible only if $DF(x)[h] = 0$. Thus, F is constant along the direction h at any point of int G, so that $D^2F(x)[h, h] = 0$ and therefore F is degenerate.
From now on assume that G does not contain lines. If G is bounded, then F, of course, attains its minimum on int G due to the standard compactness reasons. Now assume that F attains its minimum on int G; due to nondegeneracy, the minimizer $x^*_F$ is unique. Let W be the closed unit Dikin ellipsoid of F centered at $x^*_F$; as we know from I., Lecture 2, it is contained in G (recall that G is closed). Let us prove that the $(\vartheta + 2\sqrt{\vartheta})$ times larger concentric ellipsoid $W^+$ contains G; this will result both in the boundedness of G and in the announced centering property and therefore will complete the proof.

Lemma 3.2.1 Let x ∈ int G and let h be an arbitrary direction with $|h|_x = 1$ such that $DF(x)[h] \ge 0$. Then the point $x + (\vartheta + 2\sqrt{\vartheta})h$ is outside the interior of G.

Note that Lemma 3.2.1 immediately implies the desired inclusion $G \subset W^+$: when $x = x^*_F$ is the minimizer of F, so that $DF(x)[h] = 0$ for all h, the premise of the lemma is valid for any h with $|h|_x = 1$.
Proof of Lemma. Let $\varphi(t) = D^2F(x + th)[h, h]$ and $T = \sup\{t \mid x + th \in G\}$. From self-concordance of F it follows that

$$\varphi'(t) \ge -2\varphi^{3/2}(t),\quad 0 \le t < T,$$


whence
$$\left(\varphi^{-1/2}(t)\right)' \le 1,$$
so that
$$\frac{1}{\sqrt{\varphi(t)}} - \frac{1}{\sqrt{\varphi(0)}} \le t,\quad 0 \le t < T.$$
In view of $\varphi(0) = |h|_x^2 = 1$ we come to
$$\varphi(t) \ge \frac{1}{(1 + t)^2},\quad 0 \le t < T,$$
which, after integration and taking into account that $DF(x)[h] \ge 0$, results in
$$DF(x + rh)[h] \ge DF(x + rh)[h] - DF(x)[h] = \int_0^r \varphi(t)\,dt \ge \int_0^r \frac{dt}{(1 + t)^2} = \frac{r}{1 + r},\quad 0 \le r < T. \qquad (3.11)$$
Now, let $t \ge 1$ be such that $y = x + th \in G$. Then, as we know from the semiboundedness relation (3.6), for $0 < r < t$
$$(t - r)\,DF(x + rh)[h] \equiv DF(x + rh)[y - (x + rh)] \le \vartheta.$$
Combining the inequalities, we come to
$$t \le r + \frac{(1 + r)\vartheta}{r}. \qquad (3.12)$$
Taking here r = 1/2, we get a certain upper bound on t; thus, $T \equiv \sup\{t \mid x + th \in G\} < \infty$, and (3.12) is valid for t = T (recall that G is closed). If $T > \sqrt{\vartheta}$, then (3.12) is valid for t = T, $r = \sqrt{\vartheta}$, and we come to
$$T \le \vartheta + 2\sqrt{\vartheta}; \qquad (3.13)$$
this latter inequality is, of course, valid in the case of $T \le \sqrt{\vartheta}$ as well. Thus, T always satisfies (3.13). By construction, x + Th is not an interior point of G, and, consequently, $x + [\vartheta + 2\sqrt{\vartheta}]h$ also is not an interior point of G, as claimed.
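The Centering property can be observed numerically. The sketch below (Python with NumPy; `analytic_center` is a hypothetical helper, and the test polyhedron is an arbitrary choice) computes the F-center of a bounded polyhedron for the barrier of Corollary 3.1.1 by damped Newton steps. It then checks the inner inclusion exactly, using the fact that the largest value of $a_i^Th$ over the unit Dikin ellipsoid is $\sqrt{a_i^T[F''(x)]^{-1}a_i}$, and the outer inclusion (3.10), with ϑ = m, on random points of G.

```python
import numpy as np

def analytic_center(A, b, x0, tol=1e-10, max_iter=500):
    """F-center of G = {x : A x <= b} for F(x) = -sum_i ln(b_i - a_i^T x).

    Damped Newton: x <- x - [F''(x)]^{-1} F'(x) / (1 + lambda(F, x)); each step
    stays in the open unit Dikin ellipsoid, hence in int G. Returns x*_F, F''(x*_F).
    """
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        s = b - A @ x
        g = A.T @ (1.0 / s)
        H = A.T @ (A / s[:, None] ** 2)
        step = np.linalg.solve(H, g)
        lam = float(np.sqrt(g @ step))             # Newton decrement lambda(F, x)
        if lam < tol:
            return x, H
        x = x - step / (1.0 + lam)
    raise RuntimeError("damped Newton did not converge")

# Usage: the box [-1, 1]^2 cut by two extra half-planes; theta = m = 6.
A = np.vstack([np.eye(2), -np.eye(2), [[1.0, 1.0], [1.0, -2.0]]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.2, 1.5])
theta = A.shape[0]
xc, Hc = analytic_center(A, b, np.zeros(2))
sc = b - A @ xc

# Inner inclusion: max of a_i^T h over |h|_x <= 1 is sqrt(a_i^T H^{-1} a_i) <= s_i.
reach = np.sqrt(np.sum(A * np.linalg.solve(Hc, A.T).T, axis=1))
print(np.all(reach <= sc))

# Outer inclusion (3.10), checked on random points of G (G lies in the box).
rng = np.random.default_rng(3)
Z = rng.uniform(-1.0, 1.0, size=(20000, 2))
Z = Z[np.all(Z @ A.T <= b, axis=1)]
D = Z - xc
local = np.sqrt(np.einsum("ij,jk,ik->i", D, Hc, D))  # |z - x*_F|_{x*_F}
print(local.max() <= theta + 2 * np.sqrt(theta))
```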

Corollary 3.2.1 Let h be a recessive direction of G, i.e., such that x + th ∈ G whenever x ∈ G and t ≥ 0. Then F is nonincreasing in the direction h, and the following inequality holds:
$$-DF(x)[h] \ge \sqrt{D^2F(x)[h, h]}\quad \forall x \in \operatorname{int} G. \qquad (3.14)$$

Proof. Let x ∈ int G; since h is a recessive direction, y = x + th ∈ G for all t > 0, and I. implies that $DF(x)[y - x] = DF(x)[th] \le \vartheta$ for all t ≥ 0, whence $DF(x)[h] \le 0$; thus, F indeed is nonincreasing in the direction h at any point x ∈ int G. To prove (3.14), consider the restriction $f(t) = F(x + th)$ of F onto the intersection of the line $x + \mathbb{R}h$ with G. Since h is a recessive direction for G, the domain of f is a ray Δ of the type (−a, ∞), 0 < a ≤ ∞. According to Proposition 3.1.1.(i), f is a self-concordant barrier for (the closure of) the ray Δ. It is possible that f is degenerate: $E_f \ne \{0\}$. Since f is a function of one variable, this is possible only if $\Delta = E_f = \mathbb{R}$ (see II., Lecture 2), so that $f'' \equiv 0$; in this case (3.14) is an immediate consequence of the already proved nonnegativity of the left hand side of the relation. Now assume that f is nondegenerate. In view of V., f does not attain its minimum on Δ (since f is a nondegenerate self-concordant barrier for an unbounded domain). From VIII., Lecture 2, we conclude that λ(f, t) ≥ 1 for all t ∈ Δ. Thus,
$$1 \le \lambda^2(f, 0) = \frac{(f'(0))^2}{f''(0)} = \frac{(DF(x)[h])^2}{D^2F(x)[h, h]},$$
which combined with already proved nonpositivity of DF (x)[h] results in (3.14).


VI. Geometry of Dikin's ellipsoids. For x ∈ int G and h ∈ E let

$$p_x(h) = \inf\{r \ge 0 \mid x \pm r^{-1}h \in G\};$$

this is nothing but the (semi)norm of h associated with the symmetrization of G with respect to x, i.e., the norm with the unit ball

$$G_x = \{y \in E \mid x \pm y \in G\}.$$

One has
$$p_x(h) \le |h|_x \le (\vartheta + 2\sqrt{\vartheta})\,p_x(h). \qquad (3.15)$$

Proof. The first inequality in (3.15) is evident: we know that the closed unit Dikin ellipsoid of F centered at x is contained in G (since F is self-concordant and G is closed, see I, Lecture 2). In other words, G contains the unit $|\cdot|_x$-ball $W_1(x)$ centered at x; by definition, the unit $p_x(\cdot)$-ball centered at x is the largest subset of G symmetric with respect to x, and therefore it contains the set $W_1(x)$, which is equivalent to the left inequality in (3.15). To prove the right inequality, it suffices to demonstrate that if $|h|_x = 1$, then $p_x(h) \ge (\vartheta + 2\sqrt{\vartheta})^{-1}$, or, which is the same in view of the definition of $p_x$, that at least one of the two vectors $x \pm (\vartheta + 2\sqrt{\vartheta})h$ does not belong to the interior of G. Without loss of generality, let us assume that $DF(x)[h] \ge 0$ (if it is not the case, one should replace h with −h in what follows). The pair x, h satisfies the premise of Lemma 3.2.1, and this lemma tells us that the vector $x + (\vartheta + 2\sqrt{\vartheta})h$ indeed does not belong to the interior of G.
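For the polyhedral barrier of Corollary 3.1.1 both norms in (3.15) are explicit: with $u_i = a_i^Th/(b_i - a_i^Tx)$ one has $p_x(h) = \max_i |u_i|$ and $|h|_x = (\sum_i u_i^2)^{1/2}$, so in this particular case the right inequality even holds with $\sqrt{m}$ in place of $\vartheta + 2\sqrt{\vartheta} = m + 2\sqrt{m}$. A short Python/NumPy sketch (hypothetical helper names) comparing the two norms:

```python
import numpy as np

def sym_norm(A, b, x, h):
    """p_x(h) for G = {z : A z <= b}: x +/- h / r lies in G iff r >= |a_i^T h| / s_i."""
    return float(np.max(np.abs(A @ h) / (b - A @ x)))

def local_norm(A, b, x, h):
    """|h|_x for the log barrier: sqrt(sum_i (a_i^T h / s_i)^2)."""
    return float(np.linalg.norm((A @ h) / (b - A @ x)))

rng = np.random.default_rng(5)
m, n = 10, 3
A = rng.standard_normal((m, n))
b = rng.uniform(0.5, 1.5, size=m)             # x = 0 is an interior point
theta = m
x, h = np.zeros(n), rng.standard_normal(n)
p, q = sym_norm(A, b, x, h), local_norm(A, b, x, h)
print(p <= q <= np.sqrt(m) * p, q <= (theta + 2 * np.sqrt(theta)) * p)
```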
VII. Compatibility of Hessians. Let x, y ∈ int G. Then for any h ∈ E one has
$$D^2F(y)[h, h] \le \left(\frac{\vartheta + 2\sqrt{\vartheta}}{1 - \pi_x(y)}\right)^2 D^2F(x)[h, h]. \qquad (3.16)$$

Proof. By definition of the Minkowski function, there exists w ∈ G such that

$$y = x + \pi_x(y)(w - x) = [1 - \pi_x(y)]x + \pi_x(y)w.$$

Now, if $|h|_x \le 1$, then x + h ∈ G (since the closed unit Dikin ellipsoid of F centered at x is contained in G), so that the point

$$y + [1 - \pi_x(y)]h = [1 - \pi_x(y)](x + h) + \pi_x(y)w$$

belongs to G. We conclude that the $|\cdot|_x$-ball of radius $1 - \pi_x(y)$ centered at y is contained in G and therefore is contained in the largest subset of G symmetric with respect to y; in other words, we have
$$|h|_x \le 1 - \pi_x(y) \Rightarrow p_y(h) \le 1,$$
or, which is the same,
$$p_y(h) \le [1 - \pi_x(y)]^{-1}|h|_x\quad \forall h.$$
Combining this inequality with the right inequality in (3.15) taken at the point y, i.e., $|h|_y \le (\vartheta + 2\sqrt{\vartheta})p_y(h)$, we come to (3.16).
We have established the main properties of self-concordant barriers; these properties, along with the properties of general self-concordant functions already known to us, underlie all our further developments. Let me conclude with a statement of another type:
VIII. Existence of a self-concordant barrier for a given domain. Let G be a closed convex domain in $\mathbb{R}^n$. Then there exists a ϑ-self-concordant barrier for G, with

$$\vartheta \le O(1)n,$$

O(1) being an appropriate absolute constant. If G does not contain lines, then the above barrier is given by
$$F(x) = O(1)\ln \operatorname{Vol}\{P_x(G)\},$$
where O(1) is an appropriate absolute constant, Vol is the n-dimensional volume and

$$P_x(G) = \{\xi \mid \xi^T(z - x) \le 1\ \forall z \in G\}$$

is the polar of G with respect to x.


I shall not prove this theorem, since we are not going to use it. Let me stress that to apply the theory we are developing to a particular convex problem, it is necessary and more or less sufficient to point out an explicit self-concordant barrier for the corresponding feasible domain. The aforementioned theorem says that such a barrier always exists, and thus gives us certain encouragement. At the same time, the "universal" barrier given by the theorem is usually too complicated numerically, since straightforward computation of the multidimensional integral involved in the construction is, typically, an intractable task. Later on we shall develop a technique for constructing "computable" self-concordant barriers; although not that universal, this technique will equip us with good barriers for feasible domains of a wide variety of interesting and important convex programs.
3.3 Exercises: Self-concordant barriers


Let us start with a pair of simple exercises which will extend our list of examples of self-
concordant barriers.
Exercise 3.3.1 #+ Let f(x) be a convex quadratic function on $\mathbb{R}^n$, and let the set $Q = \{x \mid f(x) < 0\}$ be nonempty. Prove that
$$F(x) = -\ln(-f(x))$$
is a 1-self-concordant barrier for G = cl Q.
Derive from this observation that if $G \subset \mathbb{R}^n$ is defined by a system

$$f_i(x) \le 0,\quad i = 1, \dots, m,$$

of convex quadratic inequalities which satisfies the Slater condition

$$\exists x:\ f_i(x) < 0,\quad i = 1, \dots, m,$$

then the function
$$F(x) = -\sum_{i=1}^m \ln(-f_i(x))$$
is an m-self-concordant barrier for G.
Note that the result in question is a natural extension of Corollary 3.1.1.
Exercise 3.3.2 ∗
1) Let G be a bounded convex domain in $\mathbb{R}^n$ given by m linear or convex quadratic inequalities $f_j(x) \le 0$ satisfying the Slater condition:

$$G = \{x \in \mathbb{R}^n \mid f_j(x) \le 0,\ j = 1, \dots, m\}.$$

Prove that if m > 2n, then one can eliminate from the system at least one inequality in such a way that the remaining system still defines a bounded domain.
2) Derive from 1) that if $\{G_\alpha\}_{\alpha \in I}$ are closed convex domains in $\mathbb{R}^n$ with bounded and nonempty intersection, then there exists an at most 2n-element subset I′ of the index set I such that the intersection of the sets $G_\alpha$ over α ∈ I′ also is bounded.
Note that the requirement m > 2n in the latter exercise is sharp, as the n-dimensional cube immediately demonstrates.
Exercise 3.3.3 #+ Prove that the function

$$F(x) = -\ln \operatorname{Det} x$$

is an m-self-concordant barrier for the cone $\mathbf{S}^m_+$ of symmetric positive semidefinite m × m matrices.

Those who are not afraid of computations are kindly asked to solve the following
Exercise 3.3.4 Let
$$K = \{(t, x) \in \mathbb{R} \times \mathbb{R}^n \mid t \ge |x|_2\}$$
be the "ice cream" cone. Prove that the function

$$F(t, x) = -\ln(t^2 - |x|_2^2)$$

is a 2-self-concordant barrier for K.


My congratulations if you have solved the latter exercise! Later on we shall develop a technique which will allow us to demonstrate self-concordance of numerous barriers (including those given by the three previous exercises) without any computations; those who have solved Exercises 3.3.1 to 3.3.4, especially the last one, will, I believe, appreciate this technique.
Now let us switch to another topic. As was announced in Lecture 1 and as we shall see later, the value of the parameter of a self-concordant barrier is extremely important: this quantity is responsible for the Newton complexity (i.e., the number of Newton steps) of finding an ε-solution by the interior point methods associated with the barrier. This is why it is interesting to understand what values the parameter can take.
Let us come to the statement announced in the beginning of Lecture 3:
(P): Let F be a ϑ-self-concordant barrier for a closed convex domain $G \subset \mathbb{R}^n$. Then either $G = \mathbb{R}^n$ and F = const, or G is a proper subset of $\mathbb{R}^n$ and ϑ ≥ 1.

Exercise 3.3.5 #∗ Prove that the only self-concordant barrier for Rn is constant.

Exercise 3.3.6 #∗ Prove that if ∆ is a segment with a nonempty interior on the axis which
differs from the whole axis and f is a ϑ-self-concordant barrier for ∆, then ϑ ≥ 1. Using this
observation, complete the proof of (P).

(P) says that the parameter of any self-concordant barrier for a nontrivial (differing from the
whole space) convex domain G is ≥ 1. This lower bound can be extended as follows:
(Q) Let G be a closed convex domain in Rn and let u be a boundary point of G. Assume that
there is a neighbourhood U of u where G is given by m independent inequalities, i.e., there exist
m continuously differentiable functions g1 , ..., gm on U such that

G ∩ U = {x ∈ U | gj (x) ≥ 0, j = 1, ..., m}, gj (u) = 0, j = 1, ..., m,

and the gradients of gj at u are linearly independent. Then the parameter ϑ of any self-
concordant barrier F for G is at least m.
We are about to prove (Q). This is not that difficult, but to make the underlying construction
clear, let us start with the case of a simple polyhedral cone.

Exercise 3.3.7 #∗ Let
$$G = \{x \in \mathbb{R}^n \mid x_i \ge 0,\ i = 1, \dots, m\},$$
where $x_i$ are the coordinates in $\mathbb{R}^n$ and m is a positive integer ≤ n, and let F be a ϑ-self-concordant barrier for G. Prove that for any x ∈ int G one has

$$-x_i\frac{\partial}{\partial x_i}F(x) \ge 1,\quad i = 1, \dots, m; \qquad (3.17)$$

derive from this observation that the parameter ϑ of the barrier F is at least m.

Now let us look at (Q). Under the premise of this statement G locally is similar to the above
polyhedral cone; to make the similarity more explicit, let us translate G to make u the origin
and let us choose the coordinates in Rn in such a way that the gradients of gj at the origin,
taken with respect to these coordinates, will be simply the first m basic orths. Thus, we come to
the situation when G contains the origin and in certain neighbourhood U of the origin is given
by
G ∩ U = {x ∈ U | xi ≥ hi (x), i = 1, ..., m},
where $h_i$ are continuously differentiable functions such that $h_i(0) = 0$, $h_i'(0) = 0$.
Those who have solved the latter exercise understand that what we need in order to prove (Q) is a certain version of (3.17), something like

$$-r\frac{\partial}{\partial x_i}F(x(r)) \ge 1 - \alpha(r),\quad i = 1, \dots, m, \qquad (3.18)$$

where x(r) is the vector with the first m coordinates equal to r > 0 and the remaining ones equal to 0, and α(r) → 0 as r → +0.
A relation of the type (3.18) does exist, as is seen from the following exercise:

Exercise 3.3.8 #+ Let f(t) be a ϑ-self-concordant barrier for an interval Δ = [−a, 0], 0 < a ≤ +∞, of the real axis. Assume that t < 0 is such that the point γt belongs to Δ, where
$$\gamma > (\sqrt{\vartheta} + 1)^2.$$
Prove that
$$-f'(t)t \ge 1 - \frac{(\sqrt{\vartheta} + 1)^2}{\gamma}. \qquad (3.19)$$
Derive from this fact that if F is a ϑ-self-concordant barrier for $G \subset \mathbb{R}^n$, z is a boundary point of G and x is an interior point of G such that $z + \gamma(x - z) \in G$ with $\gamma > (\sqrt{\vartheta} + 1)^2$, then

$$DF(x)[z - x] \ge 1 - \frac{(\sqrt{\vartheta} + 1)^2}{\gamma}. \qquad (3.20)$$

Now we are in a position to prove (Q).

Exercise 3.3.9 #∗ Prove (Q).


Chapter 4

Basic path-following method

The results on self-concordant functions and self-concordant barriers allow us to develop the first
polynomial interior point scheme - the path-following one; on the qualitative level, the scheme
was presented in Lecture I.

4.1 Situation
Let $G \subset \mathbb{R}^n$ be a closed and bounded convex domain, and let $c \in \mathbb{R}^n$, c ≠ 0. In what follows we deal with the problem of minimizing the linear objective $c^Tx$ over the domain, i.e., with the problem
$$P:\quad \text{minimize } c^Tx\ \text{ s.t. } x \in G.$$
I shall refer to problem P as a convex program in the standard form. This indeed is a universal format of a convex program, since a general-type convex problem

$$\text{minimize } f(u)\ \text{ s.t. } g_j(u) \le 0,\ j = 1, \dots, m,\ u \in H \subset \mathbb{R}^k$$

associated with convex continuous functions f, $g_j$ on a closed convex set H can always be rewritten as a standard problem; to this end it clearly suffices to set

$$x = (t, u),\quad c = (1, 0, 0, \dots, 0)^T,\quad G = \{(t, u) \mid u \in H,\ g_j(u) \le 0,\ j = 1, \dots, m,\ f(u) - t \le 0\}.$$

The feasible domain G of the equivalent standard problem is convex and closed; passing, if necessary, to the affine hull of G, we can ensure that G is a domain. In our standard formulation, G is assumed to be bounded, which is not always the case, but the boundedness assumption is not so crucial from the practical viewpoint, since we can approximate the actual problem with an unbounded G by a problem with a bounded feasible domain, adding, say, the constraint $|x|_2 \le R$ with large R.
Thus, we may focus on the case of a problem in the standard form P. What we need to solve P by an interior point method is a ϑ-self-concordant barrier for the domain, and in what follows we assume that we are given such a barrier; let it be called F. The exact meaning of the words "we know F" is that, given x ∈ int G, we are able to compute the value, the gradient and the Hessian of the barrier at x.

4.2 F-generated path-following method


Recall that the general path-following scheme for solving P is as follows: given a convex, smooth and nondegenerate barrier F for the feasible domain G of the problem, we associate with this
barrier and the objective the penalized family
$$F_t(x) = tc^Tx + F(x):\ \operatorname{int} G \to \mathbb{R},$$
t > 0 being the penalty parameter, and the path of minimizers of the family
$$x^*(t) = \operatorname*{argmin}_{\operatorname{int} G} F_t(\cdot),$$
which is well-defined due to nondegeneracy of F and boundedness of G. The method generates a sequence $x_i \in \operatorname{int} G$ which approximates the sequence $x^*(t_i)$ of points of the path along a certain sequence of values of the penalty parameter $t_i \to \infty$; namely, given the current pair $(t_i, x_i)$ with $x_i$ being "close" to $x^*(t_i)$, at an iteration of the method we replace $t_i$ by a larger value $t_{i+1}$ of the parameter and then update $x_i$ into an approximation $x_{i+1}$ to our new target point $x^*(t_{i+1})$. To update $x_i$, we apply to the new function of our family, i.e., to $F_{t_{i+1}}$, a method for smooth unconstrained minimization, $x_i$ being the starting point. This is the general path-following scheme. Note that a self-concordant barrier for a bounded convex domain does satisfy the general requirements imposed by the scheme; indeed, such a barrier is convex, $C^3$-smooth and nondegenerate (the latter property is given by V., Lecture 3). The essence of the matter is, of course, in the specific properties of a self-concordant barrier which make the scheme polynomial.

4.3 Basic path-following scheme


Even with the barrier fixed, the path-following scheme represents a family of methods rather than a single method; to get a method, one should specify
• the policy for updating the penalty parameter;
• the "workhorse", i.e., the optimization method used to update the x's;
• the stopping criterion for the latter method, or, which is the same, the "closeness to the path x*(·)" which is maintained when tracing the path.
In the basic path-following method we are about to present, these issues are specified as follows:
• we fix a parameter γ > 0, the penalty rate, and update the t's according to the rule
$$t_{i+1} = \left(1 + \frac{\gamma}{\sqrt{\vartheta}}\right)t_i; \qquad (4.1)$$
• to define the notion of "closeness to the path", we fix another parameter κ ∈ (0, 1), the path tolerance, and maintain along the sequence $\{(t_i, x_i)\}$ the closeness relation, namely, the predicate
$$C_\kappa(t, x):\ \{t > 0\}\ \&\ \{x \in \operatorname{int} G\}\ \&\ \left\{\lambda(F_t, x) \equiv \sqrt{[\nabla_x F_t(x)]^T[\nabla_x^2 F(x)]^{-1}[\nabla_x F_t(x)]} \le \kappa\right\} \qquad (4.2)$$
(we write $\nabla_x^2 F$ instead of $\nabla_x^2 F_t$, since F differs from $F_t$ by a linear function);
• the updating $x_i \mapsto x_{i+1}$ is given by the damped Newton method:
$$y^{l+1} = y^l - \frac{1}{1 + \lambda(F_{t_{i+1}}, y^l)}[\nabla_x^2 F(y^l)]^{-1}\nabla_x F_{t_{i+1}}(y^l); \qquad (4.3)$$
the recurrence starts at $y^0 = x_i$ and is continued until the pair $(t_{i+1}, y^l)$ turns out to satisfy the closeness relation $C_\kappa(\cdot, \cdot)$; when it happens, we set $x_{i+1} = y^l$, thus coming to the updated pair $(t_{i+1}, x_{i+1})$.
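The basic method is short enough to write out. Below is a minimal sketch in Python with NumPy, assuming a bounded polyhedron $G = \{x \mid Ax \le b\}$ and the barrier of Corollary 3.1.1, so that ϑ = m; all function names are hypothetical. It uses the rule (4.1), the closeness test (4.2) and the damped Newton recurrence (4.3), and stops once the rate-of-convergence bound (4.4) of the next section guarantees accuracy ε. The initialization is an assumption of the example: at the analytic center $x_0$ one has $F'(x_0) = 0$, hence $\lambda(F_{t_0}, x_0) = t_0\sqrt{c^T[F''(x_0)]^{-1}c}$, and $t_0$ is chosen to make this slightly less than κ.

```python
import numpy as np

def barrier_derivs(A, b, x):
    """Gradient and Hessian of F(x) = -sum_i ln(b_i - a_i^T x)."""
    s = b - A @ x
    if np.any(s <= 0):
        raise ValueError("iterate left int G")
    return A.T @ (1.0 / s), A.T @ (A / s[:, None] ** 2)

def decrement(t, c, A, b, x):
    """lambda(F_t, x) and the Newton direction [F''(x)]^{-1} grad F_t(x)."""
    g, H = barrier_derivs(A, b, x)
    gt = t * c + g                                   # gradient of F_t = t c^T x + F(x)
    d = np.linalg.solve(H, gt)
    return float(np.sqrt(gt @ d)), d

def path_following(A, b, c, x0, t0, gamma=0.05, kappa=0.05, eps=1e-6):
    """Basic path-following method (4.1)-(4.3) with theta = m (Corollary 3.1.1).

    Assumes G = {x : A x <= b} is bounded and (t0, x0) satisfies C_kappa;
    stops when the bound chi / t from (4.4) drops below eps.
    """
    theta = A.shape[0]
    chi = theta + kappa / (1.0 - kappa) * np.sqrt(theta)
    t, x = t0, np.array(x0, dtype=float)
    if decrement(t, c, A, b, x)[0] > kappa:
        raise ValueError("(t0, x0) violates the closeness relation C_kappa")
    outer = newton = 0
    while chi / t > eps:
        t *= 1.0 + gamma / np.sqrt(theta)            # penalty update (4.1)
        outer += 1
        lam, d = decrement(t, c, A, b, x)
        while lam > kappa:                           # damped Newton (4.3) until C_kappa
            x = x - d / (1.0 + lam)
            newton += 1
            lam, d = decrement(t, c, A, b, x)
    return x, t, outer, newton

# Usage: minimize c^T x over the box [-1, 1]^4, where c* = -sum_j |c_j|.
n = 4
A = np.vstack([np.eye(n), -np.eye(n)])
b = np.ones(2 * n)
c = np.array([1.0, -2.0, 0.5, 3.0])
x0 = np.zeros(n)                                     # analytic center: F'(x0) = 0
_, H0 = barrier_derivs(A, b, x0)
t0 = 0.99 * 0.05 / np.sqrt(c @ np.linalg.solve(H0, c))   # lambda(F_t0, x0) < kappa
x, t, outer, newton = path_following(A, b, c, x0, t0)
print(c @ x + np.abs(c).sum(), outer, newton)        # gap at most 1e-6
```

With κ = γ = 0.05 one expects at most one damped Newton step per update of t, in line with the discussion that follows Proposition 4.4.2.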
The indicated rules specify the method, up to the initialization rule, i.e., where to take the very first pair $(t_0, x_0)$ satisfying the closeness relation; we shall address this issue later. What we are interested in now are the convergence and the complexity properties of the method.

4.4 Convergence and complexity


The convergence and the complexity properties of the basic path-following method are described by the following two propositions:
Proposition 4.4.1 [Rate of convergence] If a pair (t, x) satisfies the closeness relation $C_\kappa$ with a certain κ ≤ 1/4, then
$$c^Tx - c^* \le \frac{\chi}{t},\quad \chi = \vartheta + \frac{\kappa}{1 - \kappa}\sqrt{\vartheta}, \qquad (4.4)$$
$c^*$ being the optimal value in P and ϑ being the parameter of the underlying self-concordant barrier F. In particular, in the above scheme one has
$$c^Tx_i - c^* \le \frac{\chi}{t_0}\left(1 + \frac{\gamma}{\sqrt{\vartheta}}\right)^{-i} \le \frac{\chi}{t_0}\exp\left\{-O(1)\frac{i}{\sqrt{\vartheta}}\right\}, \qquad (4.5)$$
with a positive constant O(1) depending on γ only.
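The estimate (4.5) translates into an iteration count. The sketch below (Python standard library only; `outer_iterations` is a hypothetical helper) returns the smallest i for which the middle expression in (4.5) is at most a target accuracy ε. Since $\ln(1 + \gamma/\sqrt{\vartheta}) \approx \gamma/\sqrt{\vartheta}$ for large ϑ, the count grows roughly like $\sqrt{\vartheta}\,\ln(\chi/(t_0\varepsilon))/\gamma$.

```python
import math

def outer_iterations(theta, t0, eps, gamma=0.05, kappa=0.05):
    """Smallest i >= 0 with (chi / t0) * (1 + gamma / sqrt(theta)) ** (-i) <= eps, cf. (4.5)."""
    chi = theta + kappa / (1.0 - kappa) * math.sqrt(theta)
    if chi / t0 <= eps:
        return 0
    rate = math.log1p(gamma / math.sqrt(theta))      # ln(1 + gamma / sqrt(theta))
    return math.ceil(math.log(chi / (t0 * eps)) / rate)

# The count grows roughly like sqrt(theta) * ln(chi / (t0 * eps)) / gamma.
for theta in (10, 1000, 100000):
    print(theta, outer_iterations(theta, t0=1.0, eps=1e-6))
```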
Proof. Let $x^* = x^*(t)$ be the minimizer of $F_t$; let us start with proving that
$$c^Tx^* - c^* \le \frac{\vartheta}{t}; \qquad (4.6)$$
in other words, when we are exactly on the trajectory, the residual in terms of the objective admits an objective-independent upper bound which is inversely proportional to the penalty parameter. This is immediate; indeed, denoting by $x^+$ a minimizer of our objective $c^Tx$ over G, we have
$$\nabla_x F_t(x^*) = 0 \Rightarrow tc = -F'(x^*) \Rightarrow t(c^Tx^* - c^Tx^+) \equiv t(c^Tx^* - c^*) = [F'(x^*)]^T(x^+ - x^*) \le \vartheta$$
(the concluding inequality is the Semiboundedness property I., Lecture 3), and (4.6) follows.
To derive (4.4) from (4.6), let us act as follows. The function $F_t(x)$ is self-concordant on int G (as a sum of two self-concordant functions, namely, F and the linear function $tc^Tx$, see Proposition 2.1.1.(ii)) and, by assumption, $\lambda \equiv \lambda(F_t, x) \le \kappa < 1$; applying (2.20) (see Lecture 2), we come to
$$|x - x^*|_{x^*} \le \frac{\kappa}{1 - \kappa}, \qquad (4.7)$$
where $|\cdot|_{x^*}$ is the Euclidean norm defined by the Hessian of $F_t$, or, which is the same, of F, at $x^*$. We now have
$$tc = -F'(x^*) \Rightarrow t(c^Tx - c^Tx^*) = [F'(x^*)]^T(x^* - x) \le |x^* - x|_{x^*}\sup\{DF(x^*)[h] \mid |h|_{x^*} \le 1\} = |x^* - x|_{x^*}\,\lambda(F, x^*) \le \frac{\kappa}{1 - \kappa}\sqrt{\vartheta}$$
(the concluding inequality follows from (4.7) and the fact that F is a ϑ-self-concordant barrier for G, so that $\lambda(F, \cdot) \le \sqrt{\vartheta}$). Thus,
$$|c^Tx - c^Tx^*| \le \frac{\kappa}{t(1 - \kappa)}\sqrt{\vartheta}, \qquad (4.8)$$
which combined with (4.6) results in (4.4).
Now we come to the central result
Proposition 4.4.2 [Newton complexity of a step] The updating recurrency (4.3) is well-defined,
i.e., it keeps the iterates in int G and terminates after finitely many steps; the Newton complexity
of the recurrency, i.e., the # of Newton steps (4.3) before termination, does not exceed certain
constant N which depends on the path tolerance κ and the penalty rate γ only.

Proof. As we have mentioned in the previous proof, the function $F_{t_{i+1}}$ is self-concordant on int G and is bounded from below on this set (since G is bounded). Therefore the damped Newton method does keep the iterates $y^l$ in int G and ensures the stopping criterion $\lambda(F_{t_{i+1}}, y^l) \le \kappa$ after a finite number of steps (IX., Lecture 2). What we should prove is the fact that the Newton complexity of the updating is bounded from above by something depending solely on the path tolerance and the penalty rate. To make clear why it is important here that F is a self-concordant barrier rather than an arbitrary self-concordant function, let us start with the following reasoning.
We have already associated with a point x ∈ int G the Euclidean norm
$$|h|_x = \sqrt{h^TF''(x)h} \equiv \sqrt{h^TF_t''(x)h};$$
in our case F is nondegenerate, so that $|\cdot|_x$ is an actual norm, not a seminorm. Let $|\cdot|_x^*$ be the conjugate norm:
$$|u|_x^* = \max\{u^Th \mid |h|_x \le 1\}.$$
By definition of the Newton decrement,
$$\lambda(F_t, x) = \max\{[\nabla_x F_t(x)]^Th \mid |h|_x \le 1\} = |\nabla_x F_t(x)|_x^* = |tc + F'(x)|_x^*, \qquad (4.9)$$
and similarly
$$\lambda(F, x) = |F'(x)|_x^*. \qquad (4.10)$$
Now, $(t_i, x_i)$ satisfies the closeness relation $\lambda(F_{t_i}, x_i) \le \kappa$, i.e.
$$|t_ic + F'(x_i)|_{x_i}^* \le \kappa, \qquad (4.11)$$
and F is a ϑ-self-concordant barrier, so that $\lambda(F, x_i) \le \sqrt{\vartheta}$, or, which is the same in view of (4.10),
$$|F'(x_i)|_{x_i}^* \le \sqrt{\vartheta}. \qquad (4.12)$$
Combining (4.11) and (4.12), we come to
$$|t_ic|_{x_i}^* \le \kappa + \sqrt{\vartheta},$$
whence
$$|(t_{i+1} - t_i)c|_{x_i}^* = \frac{\gamma}{\sqrt{\vartheta}}|t_ic|_{x_i}^* \le \gamma + \frac{\gamma\kappa}{\sqrt{\vartheta}}.$$
Combining the resulting inequality with (4.11), we come to
$$\lambda(F_{t_{i+1}}, x_i) = |t_{i+1}c + F'(x_i)|_{x_i}^* \le \kappa + \left[1 + \frac{\kappa}{\sqrt{\vartheta}}\right]\gamma \le 3\gamma \qquad (4.13)$$
(the concluding inequality assumes κ ≤ γ, as in the example below, and uses the fact that the parameter of any nontrivial self-concordant barrier is ≥ 1, see the beginning of Lecture 3). Thus, the Newton decrement of the new function $F_{t_{i+1}}$ at the previous iterate $x_i$ is at most the quantity 3γ; if γ and κ are small enough, this quantity is ≤ 1/4, so that $x_i$ is within the region of the quadratic convergence of the damped Newton method (see IX., Lecture 2), and therefore the method quickly restores the
closeness relation. E.g., let the path tolerance κ and the penalty rate γ both be set to 0.05. Then the above computation results in
$$\lambda(F_{t_{i+1}}, x_i) \le 0.15,$$
and from the description of the local properties of the damped Newton method as applied to a self-concordant function (see (2.19), Lecture 2) it follows that the damped Newton iterate $y^1$ of the starting point $y^0 = x_i$, the method being applied to $F_{t_{i+1}}$, satisfies the relation
$$\lambda(F_{t_{i+1}}, y^1) \le 2 \times (0.15)^2 = 0.045 < 0.05 = \kappa,$$
i.e., for the indicated values of the parameters a single damped Newton step restores the closeness to the path after the penalty parameter is updated, so that in this particular case N = 1. Note that the policy for updating the penalty, which in our presentation looked like something ad hoc, is in fact a consequence of the outlined reasoning: the growth of the penalty given by
$$t \mapsto \left(1 + \frac{O(1)}{\sqrt{\vartheta}}\right)t$$
is the highest one which results in the relation $\lambda(F_{t_{i+1}}, x_i) \le O(1)$.
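The arithmetic of this example can be replayed in a few lines (Python standard library only; helper names are hypothetical). The first function is the bound $\kappa + (1 + \kappa/\sqrt{\vartheta})\gamma$ on $\lambda(F_{t_{i+1}}, x_i)$ obtained by combining (4.11) with the estimate of $|(t_{i+1} - t_i)c|^*_{x_i}$; it is largest at ϑ = 1. The second is the local bound $\lambda(f, y^1) \le 2\lambda^2(f, y^0)$ for one damped Newton step, as quoted above.

```python
import math

def decrement_after_update(kappa, gamma, theta):
    """Bound on lambda(F_{t_{i+1}}, x_i): kappa + (1 + kappa / sqrt(theta)) * gamma."""
    return kappa + (1.0 + kappa / math.sqrt(theta)) * gamma

def after_damped_step(lam):
    """Local bound quoted in the text: lambda(f, y^1) <= 2 * lambda(f, y^0) ** 2."""
    return 2.0 * lam ** 2

kappa = gamma = 0.05
worst = decrement_after_update(kappa, gamma, theta=1.0)            # theta = 1 is the worst case
print(worst <= 3 * gamma, after_damped_step(3 * gamma) <= kappa)   # True True
```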
The indicated reasoning gives an insight into the intrinsic nature of the method; it does not, however, allow us to establish the announced statement in its complete form, since it requires certain bounds on the penalty rate. Indeed, our complexity results on the behaviour of the damped Newton method bound the complexity only when the Newton decrement at the starting point is less than 1. To "globalize" the reasoning, we should look at the initial residual in terms of the objective the Newton method is applied to rather than at the initial Newton decrement. To this end let us prove the following

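Proposition 4.4.3 Let t and τ be two values of the penalty parameter, and let (t, x) satisfy the closeness relation $C_\kappa(\cdot, \cdot)$ with some κ < 1. Then
$$F_\tau(x) - \min_u F_\tau(u) \le \rho(\kappa) + \frac{\kappa}{1 - \kappa}\left|1 - \frac{\tau}{t}\right|\sqrt{\vartheta} + \vartheta\rho\left(1 - \frac{\tau}{t}\right), \qquad (4.14)$$
where, as always,
$$\rho(s) = -\ln(1 - s) - s.$$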
Proof. The path $x^*(\tau)$ is given by the equation

$$F'(u) + \tau c = 0; \qquad (4.15)$$

since F'' is nondegenerate, the Implicit Function Theorem says to us that $x^*(\tau)$ is continuously differentiable, and the derivative of the path can be found by differentiating (4.15) in τ:

$$(x^*)'(\tau) = -[F''(x^*(\tau))]^{-1}c. \qquad (4.16)$$

Now let
$$\varphi(\tau) = [\tau c^Tx^*(t) + F(x^*(t))] - [\tau c^Tx^*(\tau) + F(x^*(\tau))]$$
be the residual in terms of the objective $F_\tau(\cdot)$ taken at the point $x^*(t)$. We have

$$\varphi'(\tau) = c^Tx^*(t) - c^Tx^*(\tau) - [\tau c + F'(x^*(\tau))]^T(x^*)'(\tau) = c^Tx^*(t) - c^Tx^*(\tau)$$

(see (4.15)). We conclude that

$$\varphi(t) = \varphi'(t) = 0 \qquad (4.17)$$
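and that $\varphi'(\cdot) = c^T x^*(t) - c^T x^*(\cdot)$ is continuously differentiable; differentiating in $\tau$ once more and taking into account (4.16), we come to

$$\varphi''(\tau) = -c^T (x^*)'(\tau) = c^T [F''(x^*(\tau))]^{-1} c,$$

which combined with (4.15) results in

$$0 \le \varphi''(\tau) = \frac{1}{\tau^2}[F'(x^*(\tau))]^T [F''(x^*(\tau))]^{-1} F'(x^*(\tau)) = \frac{1}{\tau^2}\lambda^2(F, x^*(\tau)) \le \frac{\vartheta}{\tau^2} \qquad (4.18)$$

(we have used the fact that $F$ is a $\vartheta$-self-concordant barrier). From (4.17), (4.18) it follows that

$$\varphi(\tau) \le \vartheta\,\rho\!\left(1 - \frac{\tau}{t}\right) \qquad (4.19)$$

(indeed, the function $\psi(\tau) = \vartheta\rho(1 - \tau/t)$ satisfies $\psi(t) = \psi'(t) = 0$ and $\psi''(\tau) = \vartheta/\tau^2$, so that $\psi - \varphi$ vanishes together with its derivative at $\tau = t$ and is convex).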
Now let us estimate the residual involved in our target inequality (4.14):

$$F_\tau(x) - \min_u F_\tau(u) = F_\tau(x) - F_\tau(x^*(\tau)) = [F_\tau(x) - F_\tau(x^*(t))] + [F_\tau(x^*(t)) - F_\tau(x^*(\tau))]$$
$$= [F_\tau(x) - F_\tau(x^*(t))] + \varphi(\tau) = [F_t(x) - F_t(x^*(t))] + (\tau - t)\,c^T(x - x^*(t)) + \varphi(\tau); \qquad (4.20)$$

since $F_t(\cdot)$ is self-concordant and $\lambda(F_t, x) \le \kappa < 1$, we have $F_t(x) - F_t(x^*(t)) = F_t(x) - \min_u F_t(u) \le \rho(\lambda(F_t, x))$ (see (2.16), Lecture 2), whence

$$F_t(x) - F_t(x^*(t)) \le \rho(\kappa). \qquad (4.21)$$

(4.8) tells us that $|c^T(x - x^*(t))| \le \kappa(1-\kappa)^{-1}\sqrt{\vartheta}\,t^{-1}$; combining this inequality, (4.20), (4.21) and (4.19), we come to (4.14).
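As a numerical sanity check of (4.19), here is a minimal sketch on a hypothetical one-dimensional instance that is not taken from the text: $G = [0, 1]$, $c = 1$, and $F(x) = -\ln x - \ln(1 - x)$, which is a 2-self-concordant barrier (a sum of two 1-self-concordant ones). The path point $x^*(\tau)$ is found by bisection on equation (4.15); the script compares $\varphi(\tau)$ with $\vartheta\rho(1 - \tau/t)$ for $t = 3$ and a few values of $\tau$.

```python
import math


def rho(s):
    """rho(s) = -ln(1 - s) - s, defined for s < 1."""
    return -math.log1p(-s) - s


def barrier(x):
    """F(x) = -ln x - ln(1 - x), a 2-self-concordant barrier for G = [0, 1]."""
    return -math.log(x) - math.log(1.0 - x)


def path_point(tau):
    """x*(tau) for c = 1: the root of tau + F'(x) = tau - 1/x + 1/(1 - x) = 0.

    The left-hand side increases on (0, 1), so plain bisection applies.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tau - 1.0 / mid + 1.0 / (1.0 - mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def phi(tau, t):
    """phi(tau) = F_tau(x*(t)) - F_tau(x*(tau)) with F_tau(x) = tau * x + F(x)."""
    xt, xtau = path_point(t), path_point(tau)
    return (tau * xt + barrier(xt)) - (tau * xtau + barrier(xtau))


THETA, T = 2.0, 3.0
checks = [(tau, phi(tau, T), THETA * rho(1.0 - tau / T))
          for tau in (0.5, 1.0, 2.0, 3.0, 4.5, 10.0, 30.0)]
for tau, lhs, rhs in checks:
    print(f"tau={tau:5.1f}  phi={lhs:.6f}  bound={rhs:.6f}")
```

Every printed $\varphi$ should lie between 0 and the bound; the check only illustrates (4.19) on one instance and proves nothing.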
Now we are able to complete the proof of Proposition 4.4.2. Applying (4.14) to $x = x_i$, $t = t_i$ and $\tau = t_{i+1} = (1 + \gamma/\sqrt{\vartheta})\,t_i$, we come to

$$F_{t_{i+1}}(x_i) - \min_u F_{t_{i+1}}(u) \le \rho(\kappa) + \frac{\kappa\gamma}{1-\kappa} + \vartheta\,\rho\!\left(-\frac{\gamma}{\sqrt{\vartheta}}\right),$$

and the right hand side of this inequality is bounded from above, uniformly in $\vartheta \ge 1$, by a certain function depending on $\kappa$ and $\gamma$ only (as is immediately seen from the evident relation $\rho(s) \le O(s^2)$, $|s| \le 1/2$)¹.
An immediate consequence of Propositions 4.4.1 and 4.4.2 is the following
Theorem 4.4.1 Let problem P with a closed convex domain $G \subset \mathbb{R}^n$ be solved by the path-following method associated with a $\vartheta$-self-concordant barrier $F$, let $\kappa \in (0, 1)$ and $\gamma > 0$ be the path tolerance and the penalty rate used in the method, and let $(t_0, x_0)$ be the starting pair satisfying the closeness relation $C_\kappa(\cdot,\cdot)$. Then the absolute inaccuracy $c^T x_i - c^*$ of the approximate solutions generated by the method admits the upper bound

$$c^T x_i - c^* \le \frac{2\vartheta}{t_0}\left(1 + \frac{\gamma}{\sqrt{\vartheta}}\right)^{-i}, \qquad i = 1, 2, \dots \qquad (4.22)$$

and the Newton complexity of each iteration $(t_i, x_i) \mapsto (t_{i+1}, x_{i+1})$ of the method does not exceed a certain constant $N$ depending on $\kappa$ and $\gamma$ only. In particular, the Newton complexity (total # of Newton steps) of finding an $\varepsilon$-solution to the problem, i.e., of finding $x \in G$ such that $c^T x - c^* \le \varepsilon$, is bounded from above by

$$O(1)\sqrt{\vartheta}\,\ln\left(\frac{\vartheta}{t_0\varepsilon} + 1\right),$$

with the constant factor $O(1)$ depending solely on $\kappa$ and $\gamma$.
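To see what the $\sqrt{\vartheta}$ factor means in numbers, the following sketch (the function name and the parameter values are illustrative assumptions) counts the smallest $i$ for which the right hand side of (4.22) drops below $\varepsilon$. Each such iteration costs at most the constant number $N$ of Newton steps, so this count is proportional to the Newton complexity.

```python
import math


def phase1_iterations(theta, t0, eps, gamma=1.0):
    """Smallest i with (2*theta/t0) * (1 + gamma/sqrt(theta))**(-i) <= eps, cf. (4.22)."""
    rate = 1.0 + gamma / math.sqrt(theta)   # penalty update factor t -> rate * t
    bound, i = 2.0 * theta / t0, 0          # right hand side of (4.22) at i = 0
    while bound > eps:
        bound /= rate
        i += 1
    return i


for theta in (100, 400, 1600):
    print(theta, phase1_iterations(theta, t0=1.0, eps=1e-6))
```

Quadrupling $\vartheta$ roughly doubles the count (up to the slowly growing logarithmic factor), in line with the $O(\sqrt{\vartheta})$ bound.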
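¹ Here is the corresponding reasoning: let $s \equiv \gamma\vartheta^{-1/2}$, so that the last term in the estimate is $g \equiv \vartheta\rho(-s)$. If $s \le 1/2$, then $g \le O(1)\vartheta s^2 = O(1)\gamma^2$, since $\rho(-s) \le O(1)s^2$ for $0 \le s \le 1/2$; if $s > 1/2$, then $\vartheta < 4\gamma^2$, and since $\rho(-s) = s - \ln(1 + s) \le s$, we get $g \le \vartheta s = \gamma\sqrt{\vartheta} < 2\gamma^2$. Thus, in all cases the last term in the estimate is bounded from above by a certain function of $\gamma$.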

4.5 Initialization and two-phase path-following method


The description of the method given above is incomplete: we know how to follow the path $x^*(\cdot)$ once we have come close to it, but we do not yet know how to get close to the path in order to start tracing it. There are several ways to resolve this initialization difficulty, and the simplest one is as follows. We know where the path $x^*(t)$ ends, i.e., where it tends as $t \to \infty$: all cluster points of the path belong to the optimal set of the problem. Let us look at where the path starts, i.e., where it tends as $t \to +0$. The answer is evident: as $t \to +0$, the path

$$x^*(t) = \mathrm{argmin}[t c^T x + F(x)]$$

tends to the analytic center of $G$ with respect to $F$, i.e., to the minimizer $x^*_F$ of $F$ over $G$ (since $G$ is bounded, we know from V., Lecture 3, that this minimizer exists and is unique). Thus, all $F$-generated paths associated with various objectives $c$ start at the same point, the analytic center of $G$, and run away from this point as $t \to \infty$, each to the optimal set associated with the corresponding objective. In other words, the analytic center of $G$ is close to all the paths generated by $F$, so that it is a good position from which to start following the path we are interested in.
Now, how do we come to this position? An immediate idea is as follows: the paths associated with various objectives cover the whole interior of $G$: if $x \ne x^*_F$ is an interior point of $G$, then a path passing through $x$ is given by any objective of the form

$$d = -\lambda F'(x),$$

$\lambda$ being positive; the path with the indicated objective passes through $x$ when the value of the penalty parameter is exactly $1/\lambda$ (indeed, $\lambda^{-1}d + F'(x) = 0$). This observation suggests the following initialization scheme: given a starting point $\hat{x} \in \mathrm{int}\, G$, let us follow the artificial path

$$u^*(\tau) = \mathrm{argmin}[\tau d^T x + F(x)], \qquad d = -F'(\hat{x}),$$

in "inverse time", i.e., decreasing the penalty parameter $\tau$ rather than increasing it. The artificial path clearly passes through the point $\hat{x}$:

$$\hat{x} = u^*(1),$$

and we can start tracing it with the pair $(\tau_0 = 1, u_0 = \hat{x})$, which lies exactly on the path. When tracing the path in the outlined manner, we eventually come close to the analytic center of $G$ and, consequently, to the path $x^*(t)$ we are interested in; when this happens, we can switch to tracing this target path.
The outlined ideas underlie the
Two-Phase Path-Following Method:
Input: starting point $\hat{x} \in \mathrm{int}\, G$; path tolerance $\kappa \in (0, 1)$; penalty rate $\gamma > 0$.
Phase 0 [approximating the analytic center]. Starting with $(\tau_0, u_0) = (1, \hat{x})$, generate the sequence $\{(\tau_i, u_i)\}$, updating $(\tau_i, u_i)$ into $(\tau_{i+1}, u_{i+1})$ as follows:

• $$\tau_{i+1} = \left(1 + \frac{\gamma}{\sqrt{\vartheta}}\right)^{-1} \tau_i;$$

• to get $u_{i+1}$, apply to the function

$$\hat{F}_{\tau_{i+1}}(x) \equiv \tau_{i+1} d^T x + F(x)$$



the damped Newton method

$$y^{l+1} = y^l - \frac{1}{1 + \lambda(\hat{F}_{\tau_{i+1}}, y^l)}\,[\nabla_x^2 F(y^l)]^{-1}\nabla_x \hat{F}_{\tau_{i+1}}(y^l)$$

starting with $y^0 = u_i$. Terminate the method when the pair $(\tau_{i+1}, y^l)$ turns out to satisfy the predicate

$$\hat{C}_{\kappa/2}(\tau, u):\ \{\tau > 0\}\ \&\ \{u \in \mathrm{int}\, G\}\ \&\ \{\lambda(\hat{F}_\tau, u) \le \kappa/2\}; \qquad (4.23)$$

when this happens, set

$$u_{i+1} = y^l;$$

• after $(\tau_{i+1}, u_{i+1})$ is formed, check whether

$$\lambda(F, u_{i+1}) \le \frac{3}{4}\kappa; \qquad (4.24)$$

if this holds, terminate Phase 0 and call $u^* \equiv u_{i+1}$ the result of the phase; otherwise go to the next step of Phase 0.

Initialization of Phase 1. Given the result $u^*$ of Phase 0, set

$$t_0 = \max\{t \mid \lambda(F_t, u^*) \le \kappa\}, \qquad x_0 = u^*, \qquad (4.25)$$

thus obtaining the pair $(t_0, x_0)$ satisfying the predicate $C_\kappa(\cdot,\cdot)$.
Phase 1 [approximating an optimal solution to P]. Starting with the pair $(t_0, x_0)$, form the sequence $\{(t_i, x_i)\}$ according to the Basic path-following scheme from Section 4.3, namely, given $(t_i, x_i)$, update it into $(t_{i+1}, x_{i+1})$ as follows:

• $$t_{i+1} = \left(1 + \frac{\gamma}{\sqrt{\vartheta}}\right) t_i;$$

• to get $x_{i+1}$, apply to $F_{t_{i+1}}$ the damped Newton method

$$y^{l+1} = y^l - \frac{1}{1 + \lambda(F_{t_{i+1}}, y^l)}\,[\nabla_x^2 F(y^l)]^{-1}\nabla_x F_{t_{i+1}}(y^l), \qquad (4.26)$$

starting with $y^0 = x_i$. Terminate the method when the pair $(t_{i+1}, y^l)$ turns out to satisfy the predicate $C_\kappa(\cdot,\cdot)$; when this happens, set

$$x_{i+1} = y^l,$$

thus obtaining the updated pair satisfying the predicate $C_\kappa$, and go to the next step of Phase 1.
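The following is a minimal sketch of the two-phase method on a hypothetical instance that is not taken from the text: $G$ is the unit square in $\mathbb{R}^2$ described by four linear inequalities, and $F$ is the standard logarithmic barrier (so $\vartheta = m = 4$, see Section 4.6). The values $\kappa = 0.25$, $\gamma = 0.5$ and the stopping rule $2\vartheta/t \le \varepsilon$ (read off from (4.22)) are assumptions of the sketch. The two-dimensional linear algebra is written out by hand so that only the standard library is needed.

```python
import math

# Hypothetical instance: 0 <= x1, x2 <= 1 written as a_j^T x <= b_j; theta = m = 4.
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
B = [0.0, 0.0, 1.0, 1.0]


def barrier_grad_hess(x):
    """Gradient and Hessian of F(x) = -sum_j ln(b_j - a_j^T x) at an interior x."""
    g, H = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(A, B):
        s = b - (a[0] * x[0] + a[1] * x[1])          # slack, positive inside G
        for i in range(2):
            g[i] += a[i] / s
            for k in range(2):
                H[i][k] += a[i] * a[k] / s**2
    return g, H


def solve2(H, r):
    """Solve the 2x2 system H y = r by Cramer's rule (H is positive definite)."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(r[0] * H[1][1] - r[1] * H[0][1]) / det,
            (H[0][0] * r[1] - H[1][0] * r[0]) / det]


def newton_data(x, t, obj):
    """Newton direction and decrement lambda of t * obj^T x + F(x) at x."""
    g, H = barrier_grad_hess(x)
    grad = [t * obj[0] + g[0], t * obj[1] + g[1]]
    step = solve2(H, grad)
    return step, math.sqrt(grad[0] * step[0] + grad[1] * step[1])


def damped_newton(x, t, obj, tol):
    """Damped Newton steps y <- y - step / (1 + lambda) until lambda <= tol."""
    while True:
        step, lam = newton_data(x, t, obj)
        if lam <= tol:
            return x
        x = [x[0] - step[0] / (1 + lam), x[1] - step[1] / (1 + lam)]


def two_phase(c, x_hat, eps, kappa=0.25, gamma=0.5):
    theta = len(A)
    rate = 1 + gamma / math.sqrt(theta)
    d = [-v for v in barrier_grad_hess(x_hat)[0]]     # d = -F'(x_hat)
    tau, u = 1.0, list(x_hat)
    # Phase 0: trace u*(tau), decreasing tau, until lambda(F, u) <= 3*kappa/4.
    while newton_data(u, 0.0, c)[1] > 0.75 * kappa:  # t = 0 gives lambda(F, u)
        tau /= rate
        u = damped_newton(u, tau, d, kappa / 2)
    # (4.25): largest root t of lambda(F_t, u)^2 = kappa^2, a quadratic in t.
    g, H = barrier_grad_hess(u)
    p, q = solve2(H, c), solve2(H, g)
    a = c[0] * p[0] + c[1] * p[1]                     # c^T H^{-1} c
    b = c[0] * q[0] + c[1] * q[1]                     # c^T H^{-1} F'(u)
    k = g[0] * q[0] + g[1] * q[1] - kappa**2          # F'(u)^T H^{-1} F'(u) - kappa^2
    t, x = (-b + math.sqrt(b * b - a * k)) / a, u
    # Phase 1: basic path-following; by (4.22), c^T x - c* <= 2*theta/t.
    while 2 * theta / t > eps:
        t *= rate
        x = damped_newton(x, t, c, kappa)
    return x, t
```

For example, `two_phase([1.0, 1.0], [0.9, 0.2], 1e-3)` minimizes $x_1 + x_2$ over the square starting from $\hat{x} = (0.9, 0.2)$; the returned point is strictly interior, and by (4.22) its objective value is within $10^{-3}$ of the optimal value 0. The Phase 1 initialization uses the fact that $\lambda^2(F_t, u) = (tc + F'(u))^T[F''(u)]^{-1}(tc + F'(u))$ is a quadratic in $t$, so that (4.25) amounts to taking its largest root.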

The properties of the indicated method are described in the following statement:
Theorem 4.5.1 Let problem P be solved by the two-phase path-following method associated with a $\vartheta$-self-concordant barrier for the domain $G$ (the latter is assumed to be bounded). Then
(i) Phase 0 is finite and comprises no more than

$$N_{\rm ini} = O(1)\sqrt{\vartheta}\,\ln\left(\frac{\vartheta}{1 - \pi_{x^*_F}(\hat{x})} + 1\right) \qquad (4.27)$$

iterations, with no more than $O(1)$ damped Newton steps at every iteration; here and further $O(1)$ are constant factors depending solely on the path tolerance $\kappa$ and the penalty rate $\gamma$ used in the method.
(ii) For any $\varepsilon > 0$, the number of iterations of Phase 1 before an $\varepsilon$-solution to P is generated does not exceed the quantity

$$N_{\rm main}(\varepsilon) = O(1)\sqrt{\vartheta}\,\ln\left(\frac{\vartheta\,{\rm Var}_G(c)}{\varepsilon} + 1\right), \qquad (4.28)$$

where

$${\rm Var}_G(c) = \max_{x \in G} c^T x - \min_{x \in G} c^T x,$$

with no more than $O(1)$ Newton steps (4.26) at every iteration.
In particular, the overall Newton complexity (total # of Newton steps of both phases) of finding an $\varepsilon$-solution to the problem does not exceed the quantity

$$N_{\rm total}(\varepsilon) = O(1)\sqrt{\vartheta}\,\ln\left(\frac{V}{\varepsilon} + 1\right),$$

where the data-dependent constant $V$ is given by

$$V = \frac{\vartheta\,{\rm Var}_G(c)}{1 - \pi_{x^*_F}(\hat{x})}.$$

Proof.
1°. Following the line of argument used in the proof of Proposition 4.4.2, one can immediately verify that the iterations of Phase 0 are well-defined and maintain along the sequence $\{(\tau_i, u_i)\}$ the predicate $\hat{C}_{\kappa/2}(\cdot,\cdot)$, while the Newton complexity of every iteration of the phase does not exceed $O(1)$. To complete the proof of (i), we should establish the upper bound (4.27) on the number of iterations of Phase 0. To this end let us note that $\hat{C}_{\kappa/2}(\tau_i, u_i)$ means exactly that

$$\lambda(\hat{F}_{\tau_i}, u_i) = |\tau_i d + F'(u_i)|^*_{u_i} \le \kappa/2 \qquad (4.29)$$

(compare with (4.9)), whence

$$\lambda(F, u_i) = |F'(u_i)|^*_{u_i} \le \kappa/2 + \tau_i |d|^*_{u_i} = \kappa/2 + \tau_i |F'(\hat{x})|^*_{u_i}. \qquad (4.30)$$
We have

$$|F'(\hat{x})|^*_{x^*_F} \equiv \max\{h^T F'(\hat{x}) \mid |h|_{x^*_F} \le 1\} = \max\{DF(\hat{x})[h] \mid D^2F(x^*_F)[h,h] \le 1\} \le \alpha \equiv \frac{\vartheta}{1 - \pi_{x^*_F}(\hat{x})}$$

[see IV., Lecture 3, namely, (3.9)].
We see that the variation (the difference between the maximum and the minimum values) of the linear form $f(y) = y^T F'(\hat{x})$ over the unit Dikin ellipsoid of $F$ centered at $x^*_F$ does not exceed $2\alpha$. Consequently, the variation of the form over the concentric ellipsoid $W^*$, larger by the factor $\vartheta + 2\sqrt{\vartheta}$, does not exceed $2\alpha(\vartheta + 2\sqrt{\vartheta})$. From the Centering property V., Lecture 3, we know that $W^*$ contains the whole $G$; in particular, $W^*$ contains the unit Dikin ellipsoid $\hat{W}_1(u_i)$ of $F$ centered at $u_i$ (I., Lecture 2). Thus, the variation of the linear form $y^T F'(\hat{x})$ over the ellipsoid $\hat{W}_1(u_i)$, and this is nothing but twice the quantity $|F'(\hat{x})|^*_{u_i}$, does not exceed $2\alpha(\vartheta + 2\sqrt{\vartheta})$:

$$|F'(\hat{x})|^*_{u_i} \le \beta \equiv \frac{\vartheta(\vartheta + 2\sqrt{\vartheta})}{1 - \pi_{x^*_F}(\hat{x})}.$$

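Substituting this estimate in (4.30), we come to

$$\lambda(F, u_i) \le \kappa/2 + \tau_i\beta.$$

Taking into account that $\tau_i = (1 + \gamma/\sqrt{\vartheta})^{-i}$, we conclude that the stopping criterion $\lambda(F, u_i) \le 3\kappa/4$ is for sure satisfied as soon as $\tau_i\beta \le \kappa/4$, i.e., when $i$ is $O(1)\sqrt{\vartheta}\ln(1 + \vartheta(1 - \pi_{x^*_F}(\hat{x}))^{-1})$, as claimed in (i).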
2°. Now let us verify that

$$t_0 \ge \frac{\kappa}{2\,{\rm Var}_G(c)}. \qquad (4.31)$$

Indeed, since $c \ne 0$, it follows from the origin of $t_0$ (see (4.25)) that

$$\lambda(F_{t_0}, u^*) \equiv |t_0 c + F'(u^*)|^*_{u^*} = \kappa, \qquad (4.32)$$

while from the termination rule for Phase 0 we know that

$$\lambda(F, u^*) \equiv |F'(u^*)|^*_{u^*} \le \frac{3}{4}\kappa;$$

we immediately conclude that

$$t_0 |c|^*_{u^*} \ge \frac{\kappa}{4}.$$

Now, as above, $|c|^*_{u^*}$ is half the variation of the linear form $y^T c$ over the closed unit Dikin ellipsoid of $F$ centered at $u^*$; this ellipsoid is contained in $G$ (I., Lecture 2), whence $2|c|^*_{u^*} \le {\rm Var}_G(c)$. Thus,

$$t_0\,{\rm Var}_G(c) \ge \frac{\kappa}{2},$$

and (4.31) follows.
3°. In view of (4.32), the starting pair $(t_0, x_0 \equiv u^*)$ for Phase 1 satisfies the predicate $C_\kappa$; applying Theorem 4.4.1 and taking into account (4.31), we come to (ii).

4.6 Concluding remarks


We have seen that the basic path-following method for solving P associated with a $\vartheta$-self-concordant barrier $F$ for the feasible domain $G$ of the problem finds an $\varepsilon$-solution to P in no more than

$$N(\varepsilon) = O(1)\sqrt{\vartheta}\,\ln\left(\frac{V}{\varepsilon}\right)$$

damped Newton steps; here $O(1)$ depends on the path tolerance $\kappa$ and the penalty rate $\gamma$ only, and $V$ is a certain data-dependent quantity (note that we include the starting point $\hat{x} \in \mathrm{int}\, G$ into the data as well). When $\kappa$ and $\gamma$ are fixed once and for all as absolute constants, the above $O(1)$ also is an absolute constant; in this case we see that if the barrier $F$ is "computable", i.e., given $x$ and the data vector $D(p)$ identifying the problem instance, one can compute $F(x)$, $F'(x)$ and $F''(x)$ in a number $\mathcal{M}$ of arithmetic operations polynomial in $l(p) \equiv \dim D(p)$, then the method is polynomial (see Lecture 1), and the arithmetic cost of finding an $\varepsilon$-solution by the method does not exceed the quantity

$$\mathcal{M}(\varepsilon) = O(1)[\mathcal{M} + n^3]N(\varepsilon)$$

(the term $n^3$ is responsible for the arithmetic cost of solving the Newton system at a Newton step).

Consider, e.g., a Linear Programming problem

$$\text{minimize } c^T x \ \text{ s.t. } a_j^T x \le b_j,\ j = 1, \dots, m,\ x \in \mathbb{R}^n,$$

and assume that the system of linear inequalities $a_j^T x \le b_j$, $j = 1, \dots, m$, satisfies the Slater condition and defines a polytope (i.e., a bounded polyhedral set) $G$. As we know from Corollary 3.1.1, the standard logarithmic barrier

$$F(x) = -\sum_{j=1}^m \ln(b_j - a_j^T x)$$

is an $m$-self-concordant barrier for $G$. Of course, this barrier is "computable":

$$F'(x) = \sum_{j=1}^m \frac{a_j}{b_j - a_j^T x}, \qquad F''(x) = \sum_{j=1}^m \frac{a_j a_j^T}{(b_j - a_j^T x)^2},$$

and we see that the arithmetic cost of computing $F(x)$, $F'(x)$ and $F''(x)$ is $O(mn^2)$, while the dimension of the data vector for a problem instance is $O(mn)$. Therefore the path-following method associated with the standard logarithmic barrier for the polytope $G$ finds an $\varepsilon$-solution to the problem at the cost of

$$N(\varepsilon) = O(1)\sqrt{m}\,\ln\left(\frac{V}{\varepsilon} + 1\right)$$

Newton steps, with the arithmetic cost of a step $O(1)mn^2$ (the arithmetic cost $O(n^3)$ of solving the Newton system is dominated by the cost of assembling the system, i.e., of computing $F'$ and $F''$; indeed, since $G$ is bounded, we have $m > n$). Thus, the overall arithmetic cost of finding an $\varepsilon$-solution to the problem is

$$\mathcal{M}(\varepsilon) = O(1)m^{1.5}n^2\ln\left(\frac{V}{\varepsilon} + 1\right),$$

so that the "arithmetic cost of an accuracy digit" is $O(m^{1.5}n^2)$. In fact the latter cost can be reduced to $O(mn^2)$ by a proper implementation of the method (the Newton systems arising at neighbouring steps of the method are "close" to each other, which allows one to reduce the arithmetic cost of solving the Newton systems, averaged over the steps), but I am not going to speak about these acceleration issues.
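A minimal pure-Python sketch of assembling $F$, $F'$ and $F''$ for the LP barrier above (the function name and the list-of-rows data layout are assumptions). The double loop over $i, k$ inside the loop over $j$ makes the $O(mn^2)$ cost of the Hessian explicit; a practical code would use a linear algebra library instead.

```python
import math


def log_barrier(A, b, x):
    """Value, gradient and Hessian of F(x) = -sum_j ln(b_j - a_j^T x).

    A is a list of the m rows a_j (each of length n); x must lie in int G.
    The value and gradient cost O(mn) operations, the Hessian O(mn^2).
    """
    n = len(x)
    value, grad = 0.0, [0.0] * n
    hess = [[0.0] * n for _ in range(n)]
    for a_j, b_j in zip(A, b):
        slack = b_j - sum(a * xi for a, xi in zip(a_j, x))   # b_j - a_j^T x
        if slack <= 0:
            raise ValueError("x is not an interior point of G")
        value -= math.log(slack)
        for i in range(n):
            grad[i] += a_j[i] / slack                         # a_j / (b_j - a_j^T x)
            for k in range(n):
                hess[i][k] += a_j[i] * a_j[k] / slack**2      # a_j a_j^T / (b_j - a_j^T x)^2
    return value, grad, hess


# Hypothetical triangle x1 <= 1, x2 <= 1, -x1 - x2 <= 1 (m = 3 > n = 2)
print(log_barrier([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [1.0, 1.0, 1.0], [0.1, 0.2]))
```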
What should be stressed is that the outlined method is fine from the viewpoint of its theoretical complexity; it is, however, far from being appropriate in practice. The main drawback of the method is its "short-step" nature: to ensure the theoretical complexity bounds, one is forced to increase the penalty parameter at the rate $(1 + O(1)\vartheta^{-1/2})$, so that the number of Newton steps is proportional to $\sqrt{\vartheta}$. For an LP problem of a not too large size, say, $n = 1000$, $m = 10000$, the method would require solving several hundreds, if not thousands, of linear systems with 1000 variables, which would take hours, a time incomparable with that required by the simplex method; and even a moderate increase of sizes results in days and months instead of hours. You should not think that these unpleasant practical consequences are caused by intrinsic drawbacks of the scheme; they come from our "pessimistic" approach to the implementation of the scheme. It turns out that "most of the time" you can increase the penalty at a significantly larger rate than that given by the worst-case theoretical complexity analysis, and still be able to restore closeness to the path by a small number (1-2) of Newton steps. There are very good practical implementations of the scheme which use various on-line

strategies to control the penalty rate and result in a very reasonable total number of Newton steps (20-40), basically independent of the size of the problem. From the theoretical viewpoint, however, it is important to develop computationally cheap rules for on-line adjustment of the penalty rate which ensure the theoretical $O(\sqrt{\vartheta})$ Newton complexity of the method; later on we shall speak about recent progress in this direction.

4.7 Exercises: Basic path-following method


The proof of our main rate-of-convergence statement, Proposition 4.4.1, is based on the following fact:
(*) if $x$ belongs to the path $x^*(t) = \mathrm{argmin}_{\mathrm{int}\, G}[t c^T x + F(x)]$, i.e., $x = x^*(t)$ for a certain $t > 0$, then

$$c^T x - c^* \le \frac{\vartheta}{t},$$

$c^*$ being the optimal value in P. What is responsible for this remarkable and simple inequality? The only property of a $\vartheta$-self-concordant barrier $F$ used in the corresponding place of the proof of Proposition 4.4.1 was the semiboundedness property:

$$DF(x)[y - x] \le \vartheta \quad \forall x \in \mathrm{int}\, G\ \forall y \in G. \qquad (4.33)$$

In turn, looking at the proof of this property (0., I., Lecture 3), one can find out that the only properties of $F$ and $G$ used there were the following ones:
S($\vartheta$): $G \subset \mathbb{R}^n$ is a closed convex domain; $F$ is a twice continuously differentiable convex function on $\mathrm{int}\, G$ such that

$$DF(x)[h] \le \vartheta^{1/2}\{D^2F(x)[h,h]\}^{1/2} \quad \forall x \in \mathrm{int}\, G\ \forall h \in \mathbb{R}^n.$$

Thus, (4.33) has nothing to do with self-concordance of $F$.


Exercise 4.7.1 # Verify that S(ϑ) implies (4.33).

Exercise 4.7.2 # Prove that property S(·) is stable with respect to affine substitutions of argument and with respect to summation; namely, prove that
1) if the pair $(G \subset \mathbb{R}^n, F)$ satisfies S($\vartheta$) and $y = \mathcal{A}(x) \equiv Ax + a$ is an affine mapping from $\mathbb{R}^k$ into $\mathbb{R}^n$ with the image intersecting $\mathrm{int}\, G$, then the pair $(\mathcal{A}^{-1}(G), F(\mathcal{A}(\cdot)))$ also satisfies S($\vartheta$);
2) if the pairs $(G_i \subset \mathbb{R}^n, F_i)$, $i = 1, \dots, m$, satisfy S($\vartheta_i$) and $G = \cap_i G_i$ is a domain, then the pair $(G, \sum_i \alpha_i F_i)$, $\alpha_i \ge 0$, satisfies S($\sum_i \alpha_i\vartheta_i$).

Now let us formulate a simple necessary and sufficient condition for a pair $(G, F)$ to satisfy S($\vartheta$).
Exercise 4.7.3 # Let $\vartheta > 0$, and let $(G \subset \mathbb{R}^n, F)$ be a pair comprised of a closed convex domain and a function twice continuously differentiable on the interior of the domain. Prove that $(G, F)$ satisfies S($\vartheta$) if and only if the function $\exp\{-\vartheta^{-1}F\}$ is concave on $\mathrm{int}\, G$. Derive from this observation and the result of the previous exercise the following statement (due to Fiacco and McCormick):
let $g_i$, $i = 1, \dots, m$, be convex twice continuously differentiable functions on $\mathbb{R}^n$ satisfying the Slater condition. Consider the logarithmic barrier

$$F(x) = -\sum_i \ln(-g_i(x))$$

for the domain

$$G = \{x \in \mathbb{R}^n \mid g_i(x) \le 0,\ i = 1, \dots, m\}.$$

Then the pair $(G, F)$ satisfies S($m$), and therefore $F$ satisfies relation (4.33) with $\vartheta = m$. In particular, let

$$x \in \mathrm{Argmin}_{u \in \mathrm{int}\, G}[t c^T u + F(u)]$$

for some positive $t$; then $f(u) \equiv c^T u$ is bounded below on $G$ and

$$c^T x - \inf_G f \le \frac{m}{t}.$$
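The Fiacco-McCormick statement can be probed numerically. The sketch below uses a hypothetical instance, not taken from the text: disc constraints $g_i(x) = |x - p_i|_2^2 - r^2$ with random centers $p_i$ and $r = 1.5$. It evaluates the ratio $(DF(x)[h])^2 / (m\,D^2F(x)[h,h])$ for the logarithmic barrier at random interior points $x$ and directions $h$; S($m$) requires this ratio to be at most 1. A random test of this kind illustrates the claim but does not replace the proof asked for in the exercise.

```python
import random


def s_theta_ratio(centers, radius, x, h):
    """(DF(x)[h])^2 / (m * D^2F(x)[h, h]) for F(x) = -sum_i ln(-g_i(x)),
    with convex constraints g_i(x) = |x - p_i|^2 - radius^2 <= 0.

    S(m) holds at (x, h) exactly when the returned ratio is <= 1.
    """
    m = len(centers)
    d1 = d2 = 0.0
    for p in centers:
        diff = [xi - pi for xi, pi in zip(x, p)]
        g = sum(v * v for v in diff) - radius**2          # g_i(x) < 0 inside G
        dg = 2 * sum(v * hv for v, hv in zip(diff, h))     # Dg_i(x)[h]
        d2g = 2 * sum(hv * hv for hv in h)                 # D^2 g_i(x)[h, h]
        d1 += dg / (-g)                                    # contribution to DF(x)[h]
        d2 += d2g / (-g) + dg**2 / g**2                    # contribution to D^2F(x)[h, h]
    return d1**2 / (m * d2)


random.seed(0)
centers = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(5)]
ratios = []
for _ in range(1000):
    x = [random.uniform(-0.35, 0.35) for _ in range(2)]   # lies inside every disc
    h = [random.gauss(0.0, 1.0) for _ in range(2)]
    ratios.append(s_theta_ratio(centers, 1.5, x, h))
print(max(ratios))
```

The bound is the Cauchy inequality $(\sum_i a_i)^2 \le m\sum_i a_i^2$ applied to $a_i = Dg_i(x)[h]/(-g_i(x))$, combined with $D^2 g_i \succeq 0$.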
The next exercise is an "exercise" in the direct meaning of the word.

Exercise 4.7.4 Consider a Quadratically Constrained Quadratic Programming program

$$\text{minimize } f_0(x) \ \text{ s.t. } f_j(x) \le 0,\ j = 1, \dots, m,\ x \in \mathbb{R}^n,$$

where

$$f_j(x) = x^T A_j x + 2b_j^T x + c_j, \quad j = 0, \dots, m,$$

are convex quadratic forms. Assume that you are given a point $\hat{x}$ such that $f_j(\hat{x}) < 0$, $j = 1, \dots, m$, and $R > 0$ such that the feasible set of the problem is inside the ball $\{x \mid |x|_2 \le R\}$.
1) reduce the problem to the standard form with a bounded feasible domain and point out an $(m + 2)$-self-concordant barrier for the domain, as well as an interior point of the domain;
2) write down the algorithmic scheme of the associated path-following method. Evaluate the arithmetic cost of a Newton step of the method.

Now let us discuss the following issue. In the Basic path-following method the rate of updating the penalty parameter, i.e., the penalty ratio

$$\omega = t_{i+1}/t_i,$$

is set to $1 + O(1)\vartheta^{-1/2}$, $\vartheta$ being the parameter of the underlying barrier. This choice of the penalty ratio results in the best known theoretical complexity bound for the method, namely, one proportional to $\sqrt{\vartheta}$. In Lecture 4 it was explained that this theoretically fine choice of the penalty ratio makes the method almost useless in practice, since it for sure forces the method to work according to its theoretical worst-case complexity bound; the latter bound is in many cases too large for actual computations. In practice people normally take as the initial value of the penalty ratio a moderate constant, say, 2 or 3, and then use various routines for on-line adjustment of the ratio, slightly increasing or decreasing it depending on whether the previous update $x_i \mapsto x_{i+1}$ took a "small" or a "large" (say, $\le 2$ or $> 2$) number of Newton steps. An immediate theoretical question here is: what can be said about the Newton complexity of a path-following method where the penalty ratio is a constant $\omega > 1$ fixed once and for all (or, more generally, varies somehow between bounds $\omega_- \le \omega_+$ fixed once and for all, with $1 < \omega_- \le \omega_+ < \infty$)? The answer is that in this case the Newton complexity of an iteration $(t_i, x_i) \mapsto (t_{i+1}, x_{i+1})$ is of order of $\vartheta$ rather than of order of 1.

Exercise 4.7.5 Consider the Basic path-following method from Section 4.3 with rule (4.1) replaced with

$$t_{i+1} = \omega_i t_i,$$

where $\omega_- \le \omega_i \le \omega_+$ and $1 < \omega_- \le \omega_+ < \infty$. Prove that for this version of the method the statement of Theorem 4.4.1 should be modified as follows: the total # of Newton steps required to find an $\varepsilon$-solution to P can be bounded from above as

$$O(1)\,\vartheta\,\ln\left(\frac{\vartheta}{t_0\varepsilon} + 1\right),$$

with $O(1)$ depending only on $\kappa$, $\omega_-$, $\omega_+$.


Chapter 5

Conic problems and Conic Duality

In the previous lecture we dealt with the Basic path-following interior point method. It was explained that the method, being fine theoretically, is not too attractive from the practical viewpoint, since it is a routine with a prescribed (and normally close to 1) rate of updating the penalty parameter; as a result, the actual number of Newton steps in the routine is more or less the same as the number given by the theoretical worst-case analysis and for sure is proportional to $\sqrt{\vartheta}$, $\vartheta$ being the parameter of the underlying self-concordant barrier. For large-scale problems, $\vartheta$ normally is large, and the # of Newton steps turns out to be too large for practical applications. The source of difficulty is a conceptual drawback of our scheme: everything is strictly regulated, and there is no room to exploit favourable circumstances which may occur. As we shall see later, this conceptual drawback can be eliminated, to a certain extent, even within the path-following scheme; there is, however, another family of interior point methods, the so-called potential reduction ones, which are free of this drawback of strict regulation; some of these methods, e.g., the famous (and the very first) interior point method of Karmarkar for Linear Programming, turn out to be very efficient in practice. Methods of this potential reduction type are what we are about to investigate now; the investigation, however, should be preceded by developing a new portion of tools, interesting in their own right. This development is our goal today.

5.1 Conic problems


In order to use the path-following method from the previous lecture, one should reduce the problem to the specific form of minimizing a linear objective over a convex domain; we called this form standard. Similarly, to use a potential reduction method, one also needs to represent the problem in a certain specific form, called conic; I am about to introduce this form.
Cones. Recall that a convex cone $K$ in $\mathbb{R}^n$ is a nonempty convex set with the property

$$tx \in K \ \text{ whenever } x \in K \text{ and } t \ge 0;$$

in other words, a cone should contain, with any of its points, the whole ray spanned by the point. A convex cone is called pointed if it does not contain lines.
Given a convex cone $K \subset \mathbb{R}^n$, one can define its dual as

$$K^* = \{s \in \mathbb{R}^n \mid s^T x \ge 0\ \forall x \in K\}.$$

In what follows we use the following elementary facts about convex cones: let $K \subset \mathbb{R}^n$ be a closed convex cone and $K^*$ be its dual. Then


• $K^*$ is a closed convex cone, and the cone $(K^*)^*$ dual to it is nothing but $K$.

• $K$ is pointed if and only if $K^*$ has a nonempty interior; $K^*$ is pointed if and only if $K$ has a nonempty interior. The interior of $K^*$ is comprised of all vectors $s$ strictly positive on $K$, i.e., such that $s^T x > 0$ for all nonzero $x \in K$.

• $s \in K^*$ is strictly positive on $K$ if and only if the set $K(s) = \{x \in K \mid s^T x \le 1\}$ is bounded.

An immediate corollary of the indicated facts is that a closed convex cone $K$ is pointed and possesses a nonempty interior if and only if its dual shares these properties.
Conic problem. Let $K \subset \mathbb{R}^n$ be a closed pointed convex cone with a nonempty interior. Consider the optimization problem

$$(\mathrm{P}):\quad \text{minimize } c^T x \ \text{ s.t. } x \in \{b + L\} \cap K,$$

where

• $L$ is a linear subspace in $\mathbb{R}^n$;

• $b$ is a vector from $\mathbb{R}^n$.

Geometrically: we should minimize a linear objective ($c^T x$) over the intersection of an affine plane ($b + L$) with the cone $K$. This intersection is a convex set, so that (P) is a convex program; let us refer to it as a convex program in the conic form.
Note that a program in the conic form strongly resembles a Linear Programming program in the standard form; this latter problem is nothing but (P) with $K$ specified as the nonnegative orthant $\mathbb{R}^n_+$. On the other hand, (P) is a universal form of a convex programming problem. Indeed, it suffices to demonstrate that a standard convex problem

$$(\mathrm{S})\quad \text{minimize } d^T u \ \text{ s.t. } u \in G \subset \mathbb{R}^k,$$

$G$ being a closed convex domain, can be equivalently rewritten in the conic form (P). To this end it suffices to represent $G$ as the intersection of a closed convex cone and an affine plane, which is immediate: identifying $\mathbb{R}^k$ with the affine hyperplane

$$\Gamma = \{x = (t, u) \in \mathbb{R}^{k+1} \mid t = 1\},$$

we can rewrite (S) equivalently as

$$(\mathrm{S_c})\quad \text{minimize } c^T x \ \text{ s.t. } x \in \Gamma \cap K,$$

where

$$c = \begin{pmatrix} 0 \\ d \end{pmatrix}$$

and

$$K = \mathrm{cl}\{(t, u) \mid t > 0,\ t^{-1}u \in G\}$$

is the conic hull of $G$. It is easily seen that (S) is equivalent to $(\mathrm{S_c})$ and that the latter problem is conic (i.e., $K$ is a closed convex pointed cone with a nonempty interior), provided that the closed convex domain $G$ does not contain lines (which actually is not a restriction at all). Thus, (P) indeed is a universal form of a convex program.
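The conic hull construction used in $(\mathrm{S_c})$ can be illustrated by a membership test built from a membership oracle for $G$. The sketch below assumes, as a hypothetical example, that $G$ is bounded (here the box $[-1, 1]^k$), in which case taking the closure adds only the origin $(t, u) = (0, 0)$.

```python
def in_conic_hull(t, u, in_G):
    """Membership in K = cl{(t, u) : t > 0, u / t in G} for a bounded G.

    For bounded G the closure adds only the origin: u = t * g with g in G
    tends to 0 as t -> 0.
    """
    if t > 0:
        return in_G([ui / t for ui in u])
    return t == 0 and all(ui == 0 for ui in u)


def in_box(u):
    """Hypothetical G: the box [-1, 1]^k."""
    return all(-1.0 <= ui <= 1.0 for ui in u)


print(in_conic_hull(1.0, [0.5, -1.0], in_box))   # True: a point of Gamma with u in G
print(in_conic_hull(3.0, [1.5, -3.0], in_box))   # True: 3 * (1, [0.5, -1.0])
print(in_conic_hull(1.0, [1.5, 0.0], in_box))    # False: u lies outside G
```

On the plane $t = 1$ the test reduces to $u \in G$, and $c^T(1, u) = d^T u$; this is exactly why (S) and $(\mathrm{S_c})$ have the same feasible points and objective values.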

5.2 Conic duality


The similarity between the conic problem (P) and a Linear Programming problem becomes very clear when duality issues are concerned. This duality, which is important for developing potential reduction methods and interesting in its own right, is our current subject.

5.2.1 Fenchel dual to (P)


We are about to derive the Fenchel dual of the conic problem (P), and let me start by recalling what Fenchel duality is.
Given a convex, proper, and closed function $f$ on $\mathbb{R}^n$ taking values in the extended real axis $\mathbb{R} \cup \{+\infty\}$ ("proper" means that the domain $\mathrm{dom} f$ of the function $f$, i.e., the set where $f$ is finite, is nonempty; "closed" means that the epigraph of the function is closed¹), one can define its conjugate (the Legendre transformation)

$$f^*(s) = \sup_{x \in \mathbb{R}^n}\{s^T x - f(x)\} = \sup_{x \in \mathrm{dom} f}\{s^T x - f(x)\},$$

which again is a convex, proper and closed function; the conjugacy is an involution: $(f^*)^* = f$.
Now, let $f_1, \dots, f_k$ be convex proper and closed functions on $\mathbb{R}^n$ such that the relative interiors of the domains of the functions (i.e., the interiors taken with respect to the affine hulls of the domains) have a point in common. The Fenchel Duality theorem says that if the function

$$f(x) = \sum_{i=1}^k f_i(x)$$

is bounded below, then

$$-\inf f = \min_{s_1, \dots, s_k:\ s_1 + \dots + s_k = 0}\{f_1^*(s_1) + \dots + f_k^*(s_k)\} \qquad (5.1)$$

(note the min in the right hand side: the theorem says, in particular, that it indeed is achieved). The problem

$$\text{minimize } \sum_{i=1}^k f_i^*(s_i) \ \text{ s.t. } \sum_i s_i = 0$$

is called the Fenchel dual to the problem

$$\text{minimize } \sum_i f_i(x).$$

Now let us derive the Fenchel dual to the conic problem (P). To this end let us set

$$f_1(x) = c^T x; \qquad f_2(x) = \begin{cases} 0, & x \in b + L \\ +\infty, & \text{otherwise} \end{cases}; \qquad f_3(x) = \begin{cases} 0, & x \in K \\ +\infty, & \text{otherwise} \end{cases};$$

these functions clearly are convex, proper and closed, and (P) evidently is nothing but the problem of minimizing $f_1 + f_2 + f_3$ over $\mathbb{R}^n$. To write down the Fenchel dual to the latter problem, we should realize what the functions $f_i^*$, $i = 1, 2, 3$, are. This is immediate:

$$f_1^*(s) = \sup\{s^T x - c^T x \mid x \in \mathbb{R}^n\} = \begin{cases} 0, & s = c \\ +\infty, & \text{otherwise} \end{cases};$$
¹ equivalently: $f$ is lower semicontinuous, or: the level sets $\{x \mid f(x) \le a\}$ are closed for every $a \in \mathbb{R}$
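$$f_2^*(s) = \sup\{s^T x - 0 \mid x \in \mathrm{dom} f_2 \equiv b + L\} = \begin{cases} s^T b, & s \in L^\perp \\ +\infty, & \text{otherwise} \end{cases},$$

where $L^\perp$ is the orthogonal complement to $L$;

$$f_3^*(s) = \sup\{s^T x - 0 \mid x \in \mathrm{dom} f_3 \equiv K\} = \begin{cases} 0, & s \in -K^* \\ +\infty, & \text{otherwise} \end{cases},$$

where $K^*$ is the cone dual to $K$.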
Now, in the Fenchel dual to (P), i.e., in the problem of minimizing $f_1^*(s_1) + f_2^*(s_2) + f_3^*(s_3)$ over $s_1, s_2, s_3$ subject to $s_1 + s_2 + s_3 = 0$, we clearly can restrict $s_i$ to lie in $\mathrm{dom} f_i^*$ without affecting the optimal solution; thus, we may restrict ourselves to the case when $s_1 = c$, $s_2 \in L^\perp$ and $s_3 \in -K^*$, while $s_1 + s_2 + s_3 = 0$; under these restrictions the objective in the Fenchel dual is equal to $s_2^T b$. Expressing $s_1, s_2, s_3$ in terms of $s = s_1 + s_2 \equiv -s_3$, we come to the following equivalent reformulation of the Fenchel dual to (P):

$$(\mathrm{D})\quad \text{minimize } b^T s \ \text{ s.t. } s \in \{c + L^\perp\} \cap K^*.$$

Note that the actual objective in the Fenchel dual is $s_2^T b \equiv s^T b - c^T b$; when writing down (D), we omit the constant term $-c^T b$ (this does not influence the optimal set, although it shifts the optimal value). Problem (D) is called the conic dual to the primal conic problem (P).
Note that $K$ is assumed to be a closed, convex and pointed cone with a nonempty interior; therefore the dual cone $K^*$ also is closed, pointed, convex and with a nonempty interior, so that the dual problem also is conic. Bearing in mind that $(K^*)^* = K$, one can immediately verify that the indicated duality is completely symmetric: the problem dual to the dual is exactly the primal one. Note also that in the Linear Programming case the conic dual is nothing but the usual dual problem written down in terms of slack variables.

5.2.2 Duality relations


Now let us establish several useful facts about conic duality; all of them are completely similar to what we know from LP duality.
0. Let $(x, s)$ be a primal-dual feasible pair, i.e., a pair comprised of feasible solutions to (P) and (D). Then

$$c^T x + b^T s - c^T b = x^T s \ge 0.$$

The left hand side of the latter relation is called the duality gap; 0. says that the duality gap is equal to $x^T s$ and always is nonnegative. The proof is immediate: since $x$ is primal feasible, $x - b \in L$, and since $s$ is dual feasible, $s - c \in L^\perp$, whence

$$(x - b)^T(s - c) = 0,$$

or, which is the same,

$$c^T x + b^T s - c^T b = x^T s;$$

the right hand side here is nonnegative, since $x \in K$ and $s \in K^*$.
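Relation 0. can be checked exactly on a small LP instance written in conic form with $K = K^* = \mathbb{R}^3_+$. The data below are hypothetical: $L = \{x : x_1 + x_2 + x_3 = 0\}$, so that $L^\perp$ is spanned by $(1, 1, 1)$, $b = (1, 0, 0)$ and $c = (3, 1, 2)$; rational arithmetic makes the identity hold exactly rather than up to rounding.

```python
from fractions import Fraction as Fr

# Hypothetical data: K = R^3_+, L = {x : x1 + x2 + x3 = 0}, L-perp = span{(1, 1, 1)}
b = [Fr(1), Fr(0), Fr(0)]
c = [Fr(3), Fr(1), Fr(2)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def duality_gap(x, s):
    """Return (c^T x + b^T s - c^T b, x^T s) for a primal-dual feasible pair."""
    # primal feasibility: x - b in L and x in K
    if sum(xi - bi for xi, bi in zip(x, b)) != 0 or min(x) < 0:
        raise ValueError("x is not primal feasible")
    # dual feasibility: s - c is a multiple of (1, 1, 1), i.e. lies in L-perp, and s in K*
    mu = s[0] - c[0]
    if any(si - ci != mu for si, ci in zip(s, c)) or min(s) < 0:
        raise ValueError("s is not dual feasible")
    return dot(c, x) + dot(b, s) - dot(c, b), dot(x, s)


x = [Fr(1, 5), Fr(1, 2), Fr(3, 10)]   # feasible for (P)
s = [ci + 1 for ci in c]              # feasible for (D): c + 1 * (1, 1, 1)
print(duality_gap(x, s))              # both entries equal 27/10

x_opt = [Fr(0), Fr(1), Fr(0)]
s_opt = [ci - 1 for ci in c]
print(duality_gap(x_opt, s_opt))      # zero gap: complementary slackness
```

Here $P^* = 1$ (attained at $x = (0, 1, 0)$) and $D^* = 2$ (attained at $s = (2, 0, 1)$), so $P^* + D^* = 3 = c^T b$, in agreement with the Conic Duality Theorem below.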
I. Let $P^*$ and $D^*$ be the optimal values in the primal and the dual problem, respectively (the optimal value is $+\infty$ if the problem is infeasible, and $-\infty$ if it is unbounded below). Then

$$P^* + D^* \ge c^T b,$$

where, for finite $a$, $\pm\infty + a = \pm\infty$, the sum of two infinities of the same sign is the infinity of this sign, and $(+\infty) + (-\infty) = +\infty$.

This is immediate: take infima over primal feasible $x$ and dual feasible $s$ in the relation $c^T x + b^T s \ge c^T b$ (see 0.).
II. If the dual problem is feasible, then the primal is bounded below²; if the primal problem is feasible, then the dual is bounded below.
This is an immediate corollary of I.: if, say, $D^* < +\infty$, then $P^* > -\infty$, since otherwise $D^* + P^*$ would be $-\infty$, which is impossible in view of I.
III. Conic Duality Theorem. If one of the problems in the primal-dual pair (P), (D) is strictly feasible (i.e., possesses feasible solutions from the interior of the corresponding cone) and is bounded below, then the other problem is solvable, the optimal values in the problems are finite, and the optimal duality gap $P^* + D^* - c^T b$ is zero.
If both of the problems are strictly feasible, then both of them are solvable, and a pair $(x^*, s^*)$ comprised of feasible solutions to the problems is comprised of optimal solutions if and only if the duality gap $c^T x^* + b^T s^* - c^T b$ is zero, and if and only if the complementary slackness $(x^*)^T s^* = 0$ holds.
Proof. Let us start with the first statement of the theorem. Due to primal-dual symmetry, we
can restrict ourselves with the case when the strictly feasible below bounded problem is (P).
Strict feasibility means exactly that the relative interiors of the domains of the functions f1 , f2 ,
f3 (see the derivation of (D)) have a point in common, due to the description of the domains of
f1 (the whole space), f2 (the affine plane b + L), f3 (the cone K). The below boundedness of (P)
means exactly that the function f1 + f2 + f3 is below bounded. Thus, the situation is covered
by the premise of the Fenchel duality theorem, and according to this theorem, the Fenchel dual
to (P), which can be obtained from (D) by substracting the constant cT b from the objective, is
solvable. Thus, (D) is solvable, and the sum of optimal values in (P) and (D) (which is by cT b
greater than the zero sum of optimal values stated in the Fenchel theorem) is C T b, as claimed.
Now let us prove the second statement of the theorem. Under its premise both problems are strictly feasible; from II. we conclude that both of them are also bounded below. Applying the first statement of the theorem, we see that both problems are solvable and the sum of their optimal values is $c^T b$. It immediately follows that a primal-dual feasible pair $(x, s)$ is a pair of primal-dual optimal solutions if and only if $c^T x + b^T s = c^T b$, i.e., if and only if the duality gap at the pair is 0; since the duality gap also equals $x^T s$ (see 0.), we conclude that the pair is a pair of optimal solutions if and only if $x^T s = 0$.
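A quick numerical sanity check of the identity used in the last step, namely that for any $x \in b + L$ and $s \in c + L^\perp$ the duality gap $c^Tx + b^Ts - c^Tb$ equals $x^Ts$. This is a sketch that assumes the pair has the form (P): minimize $c^Tx$ over $x \in (b + L) \cap K$, (D): minimize $b^Ts$ over $s \in (c + L^\perp) \cap K^*$, which is the form consistent with the derivation above. The identity is purely linear-algebraic, so the cones play no role and random data suffice; the function name `duality_gap_identity` is illustrative.

```python
import numpy as np

def duality_gap_identity(n=6, dim_L=3, seed=0):
    """Return (c^T x + b^T s - c^T b, x^T s) for random x in b + L, s in c + L^perp."""
    rng = np.random.default_rng(seed)
    # A complete QR factorization of a random n x dim_L matrix gives orthonormal bases
    # of L = range(B) (first dim_L columns) and of L^perp (remaining columns).
    B = rng.standard_normal((n, dim_L))
    Q, _ = np.linalg.qr(B, mode="complete")
    basis_L, basis_Lperp = Q[:, :dim_L], Q[:, dim_L:]
    b = rng.standard_normal(n)
    c = rng.standard_normal(n)
    x = b + basis_L @ rng.standard_normal(dim_L)            # a point of b + L
    s = c + basis_Lperp @ rng.standard_normal(n - dim_L)    # a point of c + L^perp
    gap = c @ x + b @ s - c @ b
    return gap, x @ s

gap, xs = duality_gap_identity()
print(abs(gap - xs) < 1e-10)
```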

Remark 5.2.1 The Conic duality theorem, although very similar to the Duality theorem in LP, is a little bit weaker than the latter statement. In the LP case, already feasibility plus boundedness below (not strict feasibility plus boundedness below) of one of the problems implies solvability of both of them and the same characterization of optimality as the one given by the second statement of the Conic duality theorem. A ”word by word” extension of the LP Duality theorem fails to be true for general cones, which is quite natural: in the non-polyhedral case we need a certain qualification of constraints, and strict feasibility is the simplest (and the strongest) form of this qualification. In the exercises accompanying the lecture you can find out to what extent the Conic duality theorem can be strengthened, on one hand, and which pathologies may occur if the assumptions are weakened too much, on the other hand.

Let me conclude this part of the lecture by saying that conic duality is, as we shall see, useful for developing potential reduction interior point methods. It also turns out to be a powerful tool for the analytical (on paper) processing of a problem; in several interesting cases, as we shall see in the meantime, it allows us to derive (completely mechanically!) nontrivial and informative reformulations of the initial setting.

² i.e., $P^* > -\infty$; it may happen, however, that (P) is infeasible.

5.3 Logarithmically homogeneous barriers


To develop potential reduction methods, we need to deal with conic formulations of convex programs and should equip the corresponding cones with specific self-concordant barriers, the logarithmically homogeneous ones. This latter issue is our current goal.

Definition 5.3.1 Let $K \subset \mathbb{R}^n$ be a convex, closed and pointed cone with a nonempty interior, and let $\vartheta \ge 1$ be a real number. A function $F : \operatorname{int} K \to \mathbb{R}$ is called a $\vartheta$-logarithmically homogeneous self-concordant barrier for K if it is self-concordant on int K and satisfies the identity

$$F(tx) = F(x) - \vartheta \ln t \quad \forall x \in \operatorname{int} K,\ \forall t > 0. \tag{5.2}$$

Our terminology at this point looks confusing: it is not clear whether a ”logarithmically homogeneous self-concordant barrier” for a cone is a ”self-concordant barrier” for it. This temporary difficulty is resolved by the following statement.

Proposition 5.3.1 A $\vartheta$-logarithmically homogeneous self-concordant barrier F for K is a nondegenerate $\vartheta$-self-concordant barrier for K. Besides this, F satisfies the following identities ($x \in \operatorname{int} K$, $t > 0$):

$$F'(tx) = t^{-1} F'(x); \tag{5.3}$$
$$F'(x) = -F''(x)\,x; \tag{5.4}$$
$$\lambda^2(F, x) \equiv -x^T F'(x) \equiv x^T F''(x)\,x \equiv \vartheta. \tag{5.5}$$

Proof. Since, by assumption, K does not contain lines, F is nondegenerate (II., Lecture 2). Now let us prove (5.3)-(5.5). Differentiating the identity

$$F(tx) = F(x) - \vartheta \ln t \tag{5.6}$$

in x, we come to (5.3); differentiating (5.3) in t and setting t = 1, we obtain (5.4). Differentiating (5.6) in t and setting t = 1, we come to

$$-x^T F'(x) = \vartheta.$$

Due to the already proved (5.4), this relation implies all equalities in (5.5) except the very first one; the latter follows from the fact that x, due to (5.4), is the Newton direction $-[F''(x)]^{-1} F'(x)$ of F at x, so that $\lambda^2(F, x) = -x^T F'(x)$ (IVa., Lecture 2).
From (5.5) it follows that the Newton decrement of F is identically equal to $\sqrt{\vartheta}$; since, by definition, F is self-concordant on int K, F is a $\vartheta$-self-concordant barrier for K.
Let us list some examples of logarithmically homogeneous self-concordant barriers.

Example 5.3.1 The standard logarithmic barrier

$$F(x) = -\sum_{i=1}^{n} \ln x_i$$

for the nonnegative orthant $\mathbb{R}^n_+$ is an n-logarithmically homogeneous self-concordant barrier for the orthant.

Example 5.3.2 The function

$$F(t, x) = -\ln(t^2 - |x|_2^2)$$

is a 2-logarithmically homogeneous self-concordant barrier for the ice-cream cone

$$K^n_2 = \{(t, x) \in \mathbb{R}^{n+1} \mid t \ge |x|_2\}.$$
Example 5.3.3 The function

$$F(x) = -\ln \operatorname{Det} x$$

is an n-logarithmically homogeneous self-concordant barrier for the cone $\mathbb{S}^n_+$ of symmetric positive semidefinite $n \times n$ matrices.
Indeed, self-concordance of the functions listed in the above examples is given, respectively, by
Corollary 2.1.1, Exercise 3.3.4 and Exercise 3.3.3; logarithmic homogeneity is evident.
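The identities (5.2), (5.3) and the relation $-x^TF'(x) = \vartheta$ from (5.5) are easy to sanity-check numerically for the three examples. The sketch below assumes NumPy, uses closed-form gradients (for $-\ln \operatorname{Det} x$ the gradient with respect to the Frobenius inner product is $-x^{-1}$), and evaluates residuals at one arbitrary interior point of each cone; the helper names and test points are hypothetical. Small residuals are only a check, not a proof.

```python
import numpy as np

# Barriers and their gradients (closed forms assumed for this sketch).
def F_orthant(x):
    return -np.sum(np.log(x))

def grad_orthant(x):
    return -1.0 / x

def F_icecream(u):                      # u = (t, x), cone t >= |x|_2
    t, x = u[0], u[1:]
    return -np.log(t ** 2 - x @ x)

def grad_icecream(u):
    t, x = u[0], u[1:]
    d = t ** 2 - x @ x
    return np.concatenate(([-2.0 * t / d], 2.0 * x / d))

def F_logdet(X):                        # X symmetric positive definite
    return -np.linalg.slogdet(X)[1]

def grad_logdet(X):                     # gradient w.r.t. <X, Y> = Tr(XY)
    return -np.linalg.inv(X)

def residuals(F, grad, x, theta, t=3.7):
    """Residuals of (5.2), of (5.3), and of -<x, F'(x)> = theta from (5.5)."""
    r52 = F(t * x) - (F(x) - theta * np.log(t))
    r53 = np.max(np.abs(grad(t * x) - grad(x) / t))
    r55 = -np.sum(x * grad(x)) - theta  # summed elementwise product = dot / Frobenius product
    return r52, r53, r55

M = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, -0.2], [0.0, -0.2, 1.5]])  # diagonally dominant, PD
cases = {
    "orthant": (F_orthant, grad_orthant, np.array([0.5, 2.0, 1.3]), 3),
    "ice-cream": (F_icecream, grad_icecream, np.array([2.0, 0.3, -1.1, 0.7]), 2),
    "logdet": (F_logdet, grad_logdet, M, 3),
}
for name, (F, grad, x, theta) in cases.items():
    print(name, residuals(F, grad, x, theta))
```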
The logarithmically homogeneous self-concordant barriers admit combination rules completely similar to those for self-concordant barriers:
Proposition 5.3.2 (i) [stability with respect to linear substitutions of the argument] Let F be a $\vartheta$-logarithmically homogeneous self-concordant barrier for a cone $K \subset \mathbb{R}^n$, and let $x = Ay$ be a linear homogeneous mapping from $\mathbb{R}^k$ into $\mathbb{R}^n$, with the matrix A being of rank k, such that the image of the mapping intersects int K. Then the inverse image $K^+ = A^{-1}(K)$ of K under the mapping is a convex, pointed and closed cone with a nonempty interior in $\mathbb{R}^k$, and the function $F(Ay)$ is a $\vartheta$-logarithmically homogeneous self-concordant barrier for $K^+$.
(ii) [stability with respect to summation] Let $F_i$, $i = 1, ..., k$, be $\vartheta_i$-logarithmically homogeneous self-concordant barriers for cones $K_i \subset \mathbb{R}^n$, and let $\alpha_i \ge 1$. Assume that the cone $K = \cap_{i=1}^k K_i$ possesses a nonempty interior; then the function $\sum_{i=1}^k \alpha_i F_i$ is a $(\sum_i \alpha_i \vartheta_i)$-logarithmically homogeneous self-concordant barrier for K.
(iii) [stability with respect to direct summation] Let $F_i$, $i = 1, ..., k$, be $\vartheta_i$-logarithmically homogeneous self-concordant barriers for cones $K_i \subset \mathbb{R}^{n_i}$. Then the direct sum

$$F_1(x_1) + ... + F_k(x_k)$$

of the barriers is a $(\sum_i \vartheta_i)$-logarithmically homogeneous self-concordant barrier for the direct product $K_1 \times ... \times K_k$ of the cones.
The proposition is an immediate corollary of Proposition 3.1.1 and Definition 5.3.1.
In what follows we heavily exploit the following property of logarithmically homogeneous self-concordant barriers:
Proposition 5.3.3 Let $K \subset \mathbb{R}^n$ be a convex pointed closed cone with a nonempty interior, and let F be a $\vartheta$-logarithmically homogeneous self-concordant barrier for K. Then
(i) The domain Dom F* of the Legendre transformation of the barrier F is exactly the interior of the cone $-K^*$ anti-dual to K, and F* is a $\vartheta$-logarithmically homogeneous self-concordant barrier for this anti-dual cone. In particular, the mapping

$$x \mapsto F'(x) \tag{5.7}$$

is a one-to-one mapping of int K onto $-\operatorname{int} K^*$ with the inverse given by $s \mapsto (F^*)'(s)$.
(ii) For any $x \in \operatorname{int} K$ and $s \in \operatorname{int} K^*$ the following inequality holds:

$$F(x) + F^*(-s) + \vartheta \ln(x^T s) \ge \vartheta \ln \vartheta - \vartheta. \tag{5.8}$$

This inequality is an equality if and only if

$$s = -tF'(x) \tag{5.9}$$

for some positive t.
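For the orthant barrier, (5.8) reduces to the arithmetic-geometric mean inequality. The sketch below assumes the formula $F^*(s) = F(-s) - n$ from Exercise 5.4.10, so that $F^*(-s) = -\sum_i \ln s_i - n$ for $s > 0$, and uses $\vartheta = n$. It evaluates the left-hand side of (5.8) minus the right-hand side at random positive x, s, and at $s = -tF'(x) = t/x$, where by (5.9) the difference should vanish. NumPy and the function names are illustrative assumptions.

```python
import numpy as np

def F(x):                               # standard log barrier for R^n_+
    return -np.sum(np.log(x))

def F_star_at_minus(s):                 # F*(-s) = F(s) - n for s > 0 (Exercise 5.4.10)
    return F(s) - s.size

def gap_58(x, s):
    """Left-hand side of (5.8) minus its right-hand side, with theta = n."""
    n = x.size
    return F(x) + F_star_at_minus(s) + n * np.log(x @ s) - (n * np.log(n) - n)

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 5.0, size=4)
s = rng.uniform(0.1, 5.0, size=4)
print(gap_58(x, s))            # nonnegative by AM-GM
print(gap_58(x, 2.5 / x))      # s = -t F'(x) with t = 2.5: zero up to rounding
```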

Proof.
1⁰. From Proposition 5.3.1 we know that F is nondegenerate; therefore F* is self-concordant on its domain Q, and the latter is nothing but the image of int K under the one-to-one mapping (5.7), the inverse to the mapping being $s \mapsto (F^*)'(s)$ (see Lecture 2, (L.1)-(L.3) and VII.). Further, from (5.3) it follows that Q is an (open) cone; indeed, any point $s \in Q$, due to the already proved relation $Q = F'(\operatorname{int} K)$, can be represented as $F'(x)$ for some $x \in \operatorname{int} K$, and then $ts = F'(t^{-1}x)$ also belongs to Q. It follows that $K^+ = \operatorname{cl} Q$ is a closed convex cone with a nonempty interior.
2⁰. Let us prove that $K^+ = -K^*$. This is exactly the same as proving that the interior of $-K^*$ (which is comprised of the s strictly negative on K, i.e., with $s^T x$ negative for any nonzero $x \in K$, see Section 5.1) coincides with $Q \equiv F'(\operatorname{int} K)$:

$$F'(\operatorname{int} K) = -\operatorname{int} K^*. \tag{5.10}$$

2⁰.1. The inclusion

$$F'(\operatorname{int} K) \subset -\operatorname{int} K^* \tag{5.11}$$

is immediate: indeed, we should verify that for any $x \in \operatorname{int} K$ the vector $F'(x)$ is strictly negative on K, i.e., that $y^T F'(x)$ is negative whenever $y \in K$ is nonzero. This is readily given by Corollary 3.2.1: since K is a cone, $y \in K$ is a recessive direction for K, and, due to the Corollary,

$$-y^T F'(x) \equiv -DF(x)[y] \ge \{D^2F(x)[y, y]\}^{1/2};$$

the concluding quantity here is strictly positive, since y is nonzero and F, as we already know, is nondegenerate.
2⁰.2. To complete the proof of (5.10), we need to verify the inclusion inverse to (5.11), i.e., we should prove that if s is strictly negative on K, then $s = F'(x)$ for a certain $x \in \operatorname{int} K$. Indeed, since s is strictly negative on K, the cross-section

$$K_s = \{y \in K \mid s^T y = -1\} \tag{5.12}$$

is bounded (Section 5.1). The restriction of F onto the relative interior of this cross-section is a self-concordant function on rint $K_s$ (stability of self-concordance with respect to affine substitutions of argument, Proposition 2.1.1.(i)). Since $K_s$ is bounded, F attains its minimum on the relative interior of $K_s$ at a certain point y, so that

$$F'(y) = \lambda s$$

for some λ. The coefficient λ is positive, since $y^T F'(y) = \lambda y^T s$ is negative in view of (5.5), and $y^T s = -1$ is also negative (recall that $y \in K_s$). Since λ is positive and $F'(y) = \lambda s$, we conclude from (5.3) that $F'(\lambda y) = \lambda^{-1} F'(y) = s$, so s indeed is $F'(x)$ for some $x \in \operatorname{int} K$ (namely, $x = \lambda y$). Relation (5.10) is proved.
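The construction of step 2⁰.2 can be traced explicitly for the orthant barrier $F(x) = -\sum_i \ln x_i$, $K = \mathbb{R}^n_+$, where s is strictly negative on K exactly when all $s_i < 0$. Minimizing F over the cross-section $K_s$ by a Lagrange multiplier gives $y_i = -1/(n s_i)$ (a closed form derived here for illustration, not taken from the text), and then $F'(y) = \lambda s$ with $\lambda = n$. The NumPy sketch below checks that the rescaled point $x = \lambda y$ satisfies $F'(x) = s$, as (5.3) predicts; the function names are hypothetical.

```python
import numpy as np

def grad_F(x):                        # gradient of F(x) = -sum(log x_i)
    return -1.0 / x

def preimage_of_s(s):
    """For s < 0 componentwise, follow step 2.2: minimize F on K_s, then rescale."""
    n = s.size
    y = -1.0 / (n * s)                # minimizer of F on {y > 0 : s^T y = -1}
    lam = np.mean(grad_F(y) / s)      # F'(y) = lam * s; here lam = n
    return y, lam, lam * y            # x = lam * y satisfies F'(x) = s by (5.3)

s = np.array([-0.5, -2.0, -1.0])
y, lam, x = preimage_of_s(s)
print(s @ y, lam)                     # approximately -1 (y lies on K_s) and 3 (= n)
print(np.allclose(grad_F(x), s))      # True
```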
3⁰. Summarising our considerations, we see that F* is self-concordant on the interior of the cone $-K^*$; to complete the proof of (i), it suffices to verify that

$$F^*(ts) = F^*(s) - \vartheta \ln t.$$

This is immediate:

$$F^*(ts) = \sup_{x \in \operatorname{int} K} \{t s^T x - F(x)\} = \sup_{y \equiv tx \in \operatorname{int} K} \{s^T y - F(y/t)\} =$$
$$= \sup_{y \in \operatorname{int} K} \{s^T y - [F(y) - \vartheta \ln(1/t)]\} = F^*(s) - \vartheta \ln t.$$

(i) is proved.
4⁰. Let us prove (ii). First of all, for $x \in \operatorname{int} K$ and $s = -tF'(x)$ we have

$$F(x) + F^*(-s) + \vartheta \ln(x^T s) = F(x) + F^*(tF'(x)) + \vartheta \ln(-t x^T F'(x)) =$$

[since F* is $\vartheta$-logarithmically homogeneous due to (i), and $-x^T F'(x) = \vartheta$, see (5.5)]

$$= F(x) + F^*(F'(x)) + \vartheta \ln \vartheta =$$

[since $F^*(F'(x)) = x^T F'(x) - F(x)$ due to the definition of the Legendre transformation]

$$= x^T F'(x) + \vartheta \ln \vartheta = \vartheta \ln \vartheta - \vartheta$$

(we have used (5.5)). Thus, (5.8) indeed is an equality when $s = -tF'(x)$ with a certain $t > 0$.
5⁰. To complete the proof of (5.8), it suffices to demonstrate that if x and s are such that

$$V(x, s) = F(x) + F^*(-s) + \vartheta \ln(s^T x) \le \vartheta \ln \vartheta - \vartheta, \tag{5.13}$$

then s is proportional, with a positive coefficient, to $-F'(x)$. To this end consider the following cross-section of K:

$$K_s = \{y \in K \mid s^T y = s^T x\}.$$

The restriction of $V(\cdot, s)$ onto the relative interior of $K_s$ is, up to an additive constant, equal to the restriction of F, i.e., it is self-concordant (since $K_s$ is the cross-section of K by an affine hyperplane passing through an interior point of K; we have used similar reasoning in 2⁰.2). Since $K_s$ is bounded (by virtue of $s \in \operatorname{int} K^*$), F, and, consequently, $V(\cdot, s)$ attains its minimum on the relative interior of $K_s$, and this minimum is unique (since F is nondegenerate). At the minimizer, let it be y, one should have

$$F'(y) = -\lambda s;$$

taking here the inner product with y and using (5.5) and the inclusion $y \in K_s$, we get $\lambda > 0$. As we already know, the relation $F'(y) = -\lambda s$ with positive λ implies that $V(y, s) = \vartheta \ln \vartheta - \vartheta$; now from (5.13) it follows that $V(y, s) \ge V(x, s)$. Since, by construction, $x \in \operatorname{rint} K_s$ and y is the unique minimizer of $V(\cdot, s)$ on the latter set, we conclude that $x = y$, so that $F'(x) = -\lambda s$, and we are done.

5.4 Exercises: Conic problems


The list of exercises below is unusually large; you are kindly asked at least to look through the formulations.

5.4.1 Basic properties of cones


Those not familiar with some of the facts on convex cones used in the lecture (see Section 5.1) are recommended to solve the exercises from this subsection; in these exercises, $K \subset \mathbb{R}^n$ is a closed convex cone and $K^*$ is its dual.
Exercise 5.4.1 #+ Prove that $K^*$ is a closed cone and $(K^*)^* = K$.

Exercise 5.4.2 #+ Prove that K possesses a nonempty interior if and only if $K^*$ is pointed, and that $K^*$ possesses a nonempty interior if and only if K is pointed.

Exercise 5.4.3 #+ Let $s \in \mathbb{R}^n$. Prove that the following properties of s are equivalent:
(i) s is strictly positive on K, i.e., $s^T x > 0$ whenever $x \in K$ is nonzero;
(ii) the set $K(s) = \{x \in K \mid s^T x \le 1\}$ is bounded;
(iii) $s \in \operatorname{int} K^*$.
Formulate a ”symmetric” characterization of the interior of K.

5.4.2 More on conic duality


Here we list some duality relations for the primal-dual pair (P), (D) of conic problems (see Lecture 5). The goal is to realize to what extent the standard properties of LP duality are preserved in the general case. The forthcoming exercises are not accompanied by solutions, although some of them are not so simple.
Given a conic problem, let it be called (T), with the data Q (the cone), r (the objective), $d + M$ (the feasible plane; M is the corresponding linear subspace), denote by D(T) the feasible set of the problem and consider the following properties:
• (F): Feasibility: $D(T) \ne \emptyset$;
• (B): Boundedness of the feasible set (D(T) is bounded, e.g., empty);
• (SB): Boundedness of the solution set (the set of optimal solutions to (T) is nonempty and bounded);
• (BO): Boundedness of the objective (the objective is bounded below on D(T), e.g., due to $D(T) = \emptyset$);
• (I): Existence of a feasible interior point (D(T) intersects int Q);
• (S): Solvability ((T) is solvable);
• (WN): Weak normality (both (T) and its conic dual are feasible, and the sum of their optimal values equals $r^T d$);
• (N): Normality (weak normality + solvability of both (T) and its conic dual).
Considering a primal-dual pair of conic problems (P), (D), we mark by the superscript p, respectively d, that the property in question is shared by the primal, respectively the dual, problem of the pair; e.g., $(S^d)$ is an abbreviation for the property ”the dual problem (D) is solvable”.
Good news about conic duality:

Exercise 5.4.4 Prove the following implications:

1) $(F^p) \Rightarrow (BO^d)$
”if the primal is feasible, then the dual is bounded below”; this is II., Lecture 5; this is exactly as in LP;
2) $[(F^p)\ \&\ (B^p)] \Rightarrow [(S^p)\ \&\ (WN)]$
”if the primal is feasible and its feasible set is bounded, then the primal is solvable, the dual is feasible and bounded below, and the sum of the primal and dual optimal values equals $c^T b$”; in LP one can add to the conclusion ”the dual is solvable”;
3) $[(I^p)\ \&\ (BO^p)] \Rightarrow [(S^d)\ \&\ (WN)]$
this is exactly the Conic duality theorem;
4) $(SB^p) \Rightarrow (WN)$
”if the primal is solvable and its optimal set is bounded, then the dual is feasible and bounded below, and the sum of the primal and dual optimal values equals $c^T b$”; in LP one can omit ”optimal set is bounded” in the premise and add ”dual is solvable” to the conclusion.
Formulate the ”symmetric” versions of these implications, by interchanging the primal and the dual problems.

Bad news about conic duality:

Exercise 5.4.5 Demonstrate by examples that the following situations (which for sure do not occur in LP duality) are possible:
1) the primal problem is strictly feasible and bounded below, and at the same time it is unsolvable (cf. Exercise 5.4.4, 2));
2) the primal problem is solvable, and the dual is infeasible (cf. Exercise 5.4.4, 2), 3), 4));
3) the primal problem is feasible with a bounded feasible set, and the dual is unsolvable (cf. Exercise 5.4.4, 2), 3));
4) both the primal and the dual problems are solvable, but there is a nonzero duality gap: the sum of optimal values in the problems is strictly greater than $c^T b$ (cf. Exercise 5.4.4, 2), 3)).

The next exercise is of some interest:

Exercise 5.4.6 ∗ Assume that both the primal and the dual problems are feasible. Prove that the feasible set of at least one of the problems is unbounded.

5.4.3 Complementary slackness: what does it mean?


The Conic duality theorem says that if both the primal problem (P) and the dual problem (D), see Lecture 5, are strictly feasible, then both of them are solvable, and a pair (x, s) of feasible solutions to the problems is a pair of optimal solutions if and only if $x^T s = 0$. What the latter relation actually means depends on the analytic structure of the underlying cone K. Let us look at what happens in several specific cases which are responsible for a wide spectrum of applications.
Recall that in Lecture 5 we have mentioned three particular (families of) cones:

• the cone $\mathbb{R}^n_+$, the n-dimensional nonnegative orthant in $\mathbb{R}^n$; the latter space from now on is equipped with the standard Euclidean structure given by the inner product $x^T y$;

• the cone $\mathbb{S}^n_+$ of positive semidefinite symmetric $n \times n$ matrices in the space $\mathbb{S}^n$ of symmetric $n \times n$ matrices; this latter space from now on is equipped with the Frobenius Euclidean structure given by the inner product $\langle x, y \rangle = \operatorname{Tr}\{xy\}$, Tr being the trace; this is nothing but the sum, over all entries, of the products of the corresponding entries in x and in y;

• the ”ice-cream” (more scientific name: second-order) cone

$$K^n_2 = \{x \in \mathbb{R}^{n+1} \mid x_{n+1} \ge \sqrt{x_1^2 + ... + x_n^2}\};$$

this is a cone in $\mathbb{R}^{n+1}$, and we have already said what Euclidean structure the space is equipped with.

Exercise 5.4.7 # Prove that each of the aforementioned cones is closed, pointed, convex and with a nonempty interior and, besides this, is self-dual, i.e., coincides with its dual cone³.

Now let us look at what complementary slackness means in the case of our standard cones.

Exercise 5.4.8 # Let K be a cone, $K^*$ be its dual cone, and let x, s satisfy the complementary slackness relation

$$S(K):\quad \{x \in K\}\ \&\ \{s \in K^*\}\ \&\ \{x^T s = 0\}.$$

Prove that
1) in the case of $K = \mathbb{R}^n_+$ the relation S says exactly that x and s are nonnegative n-dimensional vectors with zero componentwise product $x \times s = (x_1 s_1, ..., x_n s_n)^T$;
2)+ in the case of $K = \mathbb{S}^n_+$ the relation S says exactly that x and s are positive semidefinite symmetric matrices with zero product xs; if this is the case, then x and s commute and therefore possess a common eigenbasis, and the dot product of the diagonals of x and s in this basis is zero;
3)+ in the case of $K = K^n_2$ the relation S says exactly that $x_{n+1} = \sqrt{x_1^2 + ... + x_n^2}$, $s_{n+1} = \sqrt{s_1^2 + ... + s_n^2}$ and

$$x_1 : s_1 = x_2 : s_2 = ... = x_n : s_n = -[x_{n+1} : s_{n+1}].$$
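To make the three characterizations concrete, the sketch below builds one complementary pair for each cone in the way the exercise describes (disjoint supports for the orthant, a common eigenbasis with complementary eigenvalues for $\mathbb{S}^n_+$, opposite directions of the first n coordinates for the ice-cream cone) and checks cone membership and $x^Ts = 0$. It assumes NumPy; the particular numbers are arbitrary, and such checks illustrate only the ”if” direction, not the ”only if” one.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Orthant: nonnegative vectors with disjoint supports, so x * s = 0 componentwise.
x_or = np.array([1.0, 0.0, 2.5, 0.0])
s_or = np.array([0.0, 3.0, 0.0, 0.7])

# 2) PSD cone: common orthonormal eigenbasis U, complementary nonnegative eigenvalues.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
x_psd = U @ np.diag([2.0, 1.0, 0.0, 0.0]) @ U.T
s_psd = U @ np.diag([0.0, 0.0, 0.5, 3.0]) @ U.T

# 3) Ice-cream cone (last coordinate is x_{n+1}): boundary points with
#    (s_1, ..., s_n) = -a (x_1, ..., x_n) and s_{n+1} = a x_{n+1}, a > 0.
xbar = rng.standard_normal(3)
a = 0.8
x_ic = np.append(xbar, np.linalg.norm(xbar))
s_ic = np.append(-a * xbar, a * np.linalg.norm(xbar))

def in_icecream(u, tol=1e-12):
    return u[-1] >= np.linalg.norm(u[:-1]) - tol

print(x_or @ s_or, np.trace(x_psd @ s_psd), x_ic @ s_ic)   # all zero up to rounding
print(np.abs(x_psd @ s_psd).max())                         # the product xs vanishes too
print(in_icecream(x_ic), in_icecream(s_ic))                # True True
```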

We have presented the ”explicit characterization” of complementary slackness for our particular cones, which often occur in applications, sometimes as they are, and sometimes as certain ”building blocks”. I mean that there are decomposable situations where the cone in question is a direct product:

$$K = K_1 \times ... \times K_k,$$

and the Euclidean embedding space for K is the direct product of the Euclidean embedding spaces for the ”component cones” $K_i$. In such a situation the complementary slackness is ”componentwise”:

Exercise 5.4.9 # Prove that in the aforementioned decomposable situation

$$K^* = K_1^* \times ... \times K_k^*,$$

and a pair $x = (x_1, ..., x_k)$, $s = (s_1, ..., s_k)$ possesses the complementary slackness property S(K) if and only if each of the pairs $x_i, s_i$ possesses the property $S(K_i)$, $i = 1, ..., k$.

Thus, if we are in a decomposable situation and each of the cones $K_i$ belongs to one of our three standard families, then we are able to interpret the complementary slackness relation explicitly.
³ Self-duality, of course, makes sense only with respect to a certain Euclidean structure on the embedding linear space, since this structure underlies the construction of the dual cone. We have already indicated what these structures are for the spaces where our cones live.

Let me complete this section with a useful observation related to the three families of cones in question. We know from Lecture 5 that these cones admit explicit logarithmically homogeneous self-concordant barriers; on the other hand, we know that the Legendre transformation of a logarithmically homogeneous self-concordant barrier for a cone is a similar barrier for the anti-dual cone. It is interesting to look at the Legendre transformations of the particular barriers known to us. The answer is as it should be: these barriers are, basically, ”self-adjoint”: their Legendre transformations coincide with the barriers, up to negating the argument and adding a constant:
Exercise 5.4.10 # Prove that
1) the Legendre transformation of the standard logarithmic barrier

$$F(x) = -\sum_{i=1}^{n} \ln x_i$$

for the cone $\mathbb{R}^n_+$ is

$$F^*(s) = F(-s) - n, \qquad \operatorname{Dom} F^* = -\operatorname{int} \mathbb{R}^n_+;$$

2) the Legendre transformation of the standard barrier

$$F(x) = -\ln \operatorname{Det} x$$

for the cone $\mathbb{S}^n_+$ is

$$F^*(s) = F(-s) - n, \qquad \operatorname{Dom} F^* = -\operatorname{int} \mathbb{S}^n_+;$$

3) the Legendre transformation of the standard barrier

$$F(x) = -\ln(x_{n+1}^2 - x_1^2 - ... - x_n^2)$$

for the cone $K^n_2$ is

$$F^*(s) = F(-s) + 2\ln 2 - 2, \qquad \operatorname{Dom} F^* = -\operatorname{int} K^n_2.$$
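The three formulas can be sanity-checked numerically (this is not a proof). For s in the domain, the supremum defining $F^*(s) = \sup_x \{s^Tx - F(x)\}$ is attained where $F'(x) = s$; the sketch below uses closed-form solutions of that equation which we derived for this illustration ($x = -1/s$ componentwise, $X = -S^{-1}$, and for the ice-cream cone $x = 2(s_1, ..., s_n, -s_{n+1})/(s_{n+1}^2 - s_1^2 - ... - s_n^2)$). It compares $s^Tx - F(x)$ at that point with the claimed formula and checks that random moves inside the cone never give a larger value. NumPy and all helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(a, b):
    """Dot product for vectors, Frobenius inner product Tr(ab) for symmetric matrices."""
    return np.sum(a * b)

# Barriers; for the ice-cream cone the last coordinate plays the role of x_{n+1}.
def F_orth(x):
    return -np.sum(np.log(x))

def F_soc(u):
    return -np.log(u[-1] ** 2 - u[:-1] @ u[:-1])

def F_ld(X):
    return -np.linalg.slogdet(X)[1]

def argmax_soc(s):
    """Solution of F_soc'(u) = s for s in -int K (closed form derived for this sketch)."""
    d = s[-1] ** 2 - s[:-1] @ s[:-1]
    return 2.0 * np.append(s[:-1], -s[-1]) / d

# Random moves that stay in the interior of each cone.
def move_orth(x):
    return x * np.exp(0.2 * rng.standard_normal(x.shape))

def move_soc(u):
    ybar = u[:-1] + 0.2 * rng.standard_normal(u.size - 1)
    slack = (u[-1] - np.linalg.norm(u[:-1])) * np.exp(0.3 * rng.standard_normal())
    return np.append(ybar, np.linalg.norm(ybar) + slack)

def move_ld(X):
    C = np.linalg.cholesky(X) @ (np.eye(len(X)) + 0.2 * rng.standard_normal(X.shape))
    return C @ C.T

def check(F, x_opt, s, claimed, move, trials=200):
    """|s^T x_opt - F(x_opt) - claimed| and whether random moves never do better."""
    value = inner(s, x_opt) - F(x_opt)
    no_better = all(inner(s, y) - F(y) <= value + 1e-12
                    for y in (move(x_opt) for _ in range(trials)))
    return abs(value - claimed), no_better

s1 = -np.array([0.5, 2.0, 1.5])                  # in -int R^3_+
s2 = np.array([0.3, -0.4, -1.0])                 # in -int K^2_2: s_3 < -|(s_1, s_2)|
S3 = -np.array([[2.0, 0.3], [0.3, 1.0]])         # in -int S^2_+
results = [
    check(F_orth, -1.0 / s1, s1, F_orth(-s1) - 3, move_orth),
    check(F_soc, argmax_soc(s2), s2, F_soc(-s2) + 2 * np.log(2) - 2, move_soc),
    check(F_ld, -np.linalg.inv(S3), S3, F_ld(-S3) - 2, move_ld),
]
print(results)  # each pair should be (error near 0, True)
```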

5.4.4 Conic duality: equivalent form


In many applications the ”natural” form of a conic problem is

$$(\mathcal{P}):\quad \text{minimize } \chi^T \xi \ \text{ s.t. } \xi \in \mathbb{R}^l,\ P(\xi - p) = 0,\ \mathcal{A}(\xi) \in K,$$

where ξ is the vector of design variables, P is a given $k \times l$ matrix, p is a given l-dimensional vector, $\chi \in \mathbb{R}^l$ is the objective,

$$\mathcal{A}(\xi) = A\xi + b$$

is an affine embedding of $\mathbb{R}^l$ into $\mathbb{R}^n$, and K is a convex, closed and pointed cone with a nonempty interior in $\mathbb{R}^n$. Since $\mathcal{A}$ is an embedding (different ξ's have different images), the objective can be expressed in terms of the image $x = \mathcal{A}(\xi)$ of the vector ξ under the embedding: there exists a (not necessarily unique) $c \in \mathbb{R}^n$ such that

$$c^T \mathcal{A}(\xi) = c^T \mathcal{A}(0) + \chi^T \xi$$

identically in $\xi \in \mathbb{R}^l$.
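Since A has full column rank (the affine map is an embedding), the linear system $A^Tc = \chi$ is solvable, and any of its solutions gives a valid c, because $c^T\mathcal{A}(\xi) = c^Tb + (A^Tc)^T\xi$. A minimal NumPy sketch with random data standing in for the problem data; picking the minimum-norm solution via least squares is our choice, and any other solution would do equally well.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 7, 4
A = rng.standard_normal((n, l))       # linear part of the embedding (full column rank a.s.)
b = rng.standard_normal(n)            # A(0) = b
chi = rng.standard_normal(l)          # objective of the "natural" form

# Minimum-norm solution of the underdetermined system A^T c = chi.
c, *_ = np.linalg.lstsq(A.T, chi, rcond=None)

xi = rng.standard_normal(l)
lhs = c @ (A @ xi + b)                # c^T A(xi)
rhs = c @ b + chi @ xi                # c^T A(0) + chi^T xi
print(np.isclose(lhs, rhs))           # True
```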
It is clear that $(\mathcal{P})$ is equivalent to the problem

$$(\mathcal{P}'):\quad \text{minimize } c^T x \ \text{ s.t. } x \in \{\beta + L\} \cap K,$$

where the affine plane $\beta + L$ is nothing but the image of the affine space

$$\{\xi \in \mathbb{R}^l \mid P(\xi - p) = 0\}$$

under the affine mapping $\mathcal{A}$. Problem $(\mathcal{P}')$ is a conic program in our ”canonical” form, and we can write down the conic dual to it; let this dual be called (D). A useful thing (which saves a lot of time in computations with conic duality) is to know how to write down this dual directly in terms of the data involved in $(\mathcal{P})$, thus avoiding the necessity to compute c.

Exercise 5.4.11 # Prove that (D) is as follows:

$$\text{minimize } \beta^T s \ \text{ s.t. } s \in K^*,\ A^T s = \chi + P^T r \text{ for some } r \in \mathbb{R}^k, \tag{5.14}$$

with

$$\beta = b + Ap. \tag{5.15}$$
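One way to see why β in (5.15) is the right shift: $\beta = \mathcal{A}(p)$, so for a primal feasible ξ the point $x = \mathcal{A}(\xi)$ satisfies $x - \beta = A(\xi - p) \in L$, while any s with $A^Ts = \chi + P^Tr$ differs from a vector c with $A^Tc = \chi$ by an element of $L^\perp$. Combining this with identity (0.) yields, as a consequence we derive here (it is not stated in the text), the gap formula $x^Ts = \chi^T\xi + \beta^Ts - \chi^Tp$, which avoids c altogether. The NumPy sketch below checks it on random data; the cone constraints are ignored, since the identity is purely linear.

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, k = 8, 5, 2
A = rng.standard_normal((n, l))
b = rng.standard_normal(n)
P = rng.standard_normal((k, l))
p = rng.standard_normal(l)
chi = rng.standard_normal(l)

# Primal-feasible xi (cone ignored): xi = p + h with P h = 0.
_, _, Vt = np.linalg.svd(P)
null_P = Vt[k:].T                       # orthonormal basis of the null space of P (rank k a.s.)
xi = p + null_P @ rng.standard_normal(l - k)
x = A @ xi + b

# Dual-feasible s (cone ignored): a solution of A^T s = chi + P^T r plus a null(A^T) part.
r = rng.standard_normal(k)
U, _, _ = np.linalg.svd(A)              # columns l: of U span the null space of A^T
s0, *_ = np.linalg.lstsq(A.T, chi + P.T @ r, rcond=None)
s = s0 + U[:, l:] @ rng.standard_normal(n - l)

beta = b + A @ p                        # (5.15)
print(np.isclose(x @ s, chi @ xi + beta @ s - chi @ p))   # True
```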

5.5 Exercises: Truss Topology Design via Conic duality


It was said in Lecture 5 that conic duality is a powerful tool for the mathematical processing of a convex problem. Let us illustrate this point by considering an interesting example, the Truss Topology Design problem (TTD).
”Human” formulation. We should design a truss, a construction, like the Eiffel Tower, comprised of thin bars linked with each other at certain points, the nodes of the truss. The construction is subject to loads, i.e., external forces acting at the nodes. A particular collection of these forces, each element of the collection specifying the external force acting at the corresponding node, is called a loading scenario. A given load causes a certain deformation of the truss: the nodes move a little bit, and the bars become shorter or longer. As a result, the truss stores a certain energy, the compliance. It is reasonable to regard this compliance as a measure of rigidity of the truss under the loading in question: the larger the compliance, the less rigid the construction. For a given loading scenario, the compliance depends on the truss, i.e., on how thick the bars linking the nodes are. Now, the rigidity of a truss with respect to a given set of loading scenarios is usually defined as its largest, over the scenarios, compliance. And the problem is to design, given the set of scenarios and restrictions on the total mass of the construction, the most rigid truss.
More specifically, when solving the problem you are given a finite 2D or 3D set of tentative nodes, as well as a finite set of tentative bars; for each of these bars it is said at which node it should start and at which node it should end. To specify a truss is the same as to choose the volumes $t_i$, $i = 1, ..., m$, of the tentative bars (some of these volumes may be 0, which means that the corresponding bar in fact is not present in the truss); the sum V of these volumes (proportional to the total mass of the construction) is given in advance.
Mathematical formulation. Given are

• loading scenarios $f_1, ..., f_k$, vectors from $\mathbb{R}^n$; here n is the total number of degrees of freedom of the nodes (i.e., the dimension of the space of virtual nodal displacements), and the entries of f are the components of the external forces acting at the nodes. n is something like twice (for 2D constructions) or 3 times (for 3D ones) the number of nodes; ”something like”, because some of the nodes may be partially or completely fixed (say, be in the foundation of the construction), which reduces the total number of degrees of freedom;

• bar-stiffness matrices, $n \times n$ matrices $A_i$, $i = 1, ..., m$, where m is the number of tentative bars. The meaning of these matrices is as follows: for a truss with bar volumes $t_i$, a virtual displacement $x \in \mathbb{R}^n$ of the nodes results in reaction forces

$$f = A(t)x, \qquad A(t) = t_1 A_1 + ... + t_m A_m.$$

Under reasonable mechanical hypotheses, these matrices are symmetric positive semidefinite with positive definite sum, and in fact even dyadic:

$$A_i = b_i b_i^T$$

for certain vectors $b_i \in \mathbb{R}^n$ (these vectors are defined by the geometry of the nodal set). These assumptions on $A_i$ are crucial for what follows⁴;

• total bar volume $V > 0$ of the truss.

⁴ Crucial are positive semidefiniteness and symmetry of $A_i$, not the fact that they are dyadic; this latter assumption, quite reasonable for actual trusses, is not too important, although it simplifies some relations.


Now, the vector x of nodal displacements caused by a loading scenario f satisfies the equilibrium equation

$$A(t)x = f$$

(which says that the reaction forces A(t)x caused by the deformation of the truss under the load should balance the load; if the equilibrium equation has no solution, that means that the truss is unable to carry the load in question). The compliance, up to an absolute constant factor, turns out to be

$$x^T f.$$
Thus, we come to the following problem of Multi-Loaded Truss Topology Design:
$(\mathrm{TTD}_{\mathrm{ini}})$: find a vector $t \in \mathbb{R}^m$ of bar volumes satisfying the constraints

$$t \ge 0; \qquad \sum_{i=1}^{m} t_i = V \tag{5.16}$$

and displacement vectors $x_j \in \mathbb{R}^n$, $j = 1, ..., k$, satisfying the equilibrium equations

$$A(t)x_j = f_j, \quad j = 1, ..., k, \tag{5.17}$$

which minimize the worst-case compliance

$$C(t, x_1, ..., x_k) = \max_{j=1,...,k} x_j^T f_j.$$
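To make the formulation concrete, here is a small NumPy sketch that evaluates the worst-case compliance for a given vector of bar volumes. The data are made up for illustration (random vectors standing in for the geometry-defined $b_i$, two random load scenarios, V = 1), and the names `stiffness` and `worst_case_compliance` are hypothetical. With t > 0 the matrix A(t) is positive definite in this example, so the equilibrium equations (5.17) are solved directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, V = 4, 7, 2, 1.0
bvecs = rng.standard_normal((m, n))          # hypothetical b_i, A_i = b_i b_i^T
loads = rng.standard_normal((k, n))          # hypothetical load scenarios f_1, ..., f_k

def stiffness(t):
    """A(t) = sum_i t_i b_i b_i^T."""
    return (bvecs.T * t) @ bvecs

def worst_case_compliance(t):
    At = stiffness(t)
    xs = [np.linalg.solve(At, f) for f in loads]    # equilibrium A(t) x_j = f_j
    return max(x @ f for x, f in zip(xs, loads))    # max_j x_j^T f_j

t_uniform = np.full(m, V / m)                # satisfies (5.16)
print(worst_case_compliance(t_uniform))
```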

From our initial formulation it is not even seen that the problem is convex (since the equality constraints (5.17) are bilinear in t and $x_j$). It is, however, easy to demonstrate that in fact the problem is convex. The motivation of the reasoning is as follows: when t is strictly positive, A(t) is positive definite (since the $A_i$ are positive semidefinite with positive definite sum), and the equilibrium equations can be solved explicitly:

$$x_j = A^{-1}(t) f_j,$$

so that the j-th compliance, as a function of $t > 0$, is

$$c_j(t) = f_j^T A^{-1}(t) f_j.$$

This function is convex in $t > 0$, since the interior of its epigraph

$$G_j = \{(\tau, t) \mid t > 0,\ \tau > f_j^T A^{-1}(t) f_j\}$$

is convex, due to the following useful observation:

(*): a symmetric block matrix

$$\begin{pmatrix} \tau & f^T \\ f & A \end{pmatrix}$$

(τ and A are $l \times l$ and $n \times n$ symmetric matrices, f is an $n \times l$ matrix) is positive definite if and only if both the matrices A and $\tau - f^T A^{-1} f$ are positive definite.
The convexity of $G_j$ is an immediate consequence of this observation, since, due to it (applied with l = 1 and $f = f_j$), $G_j$ is the intersection of the convex set $\{(\tau, t) \mid t > 0\}$ and the inverse image of a convex set (the cone of positive definite $(n+1) \times (n+1)$ matrices) under the affine mapping

$$(\tau, t) \mapsto \begin{pmatrix} \tau & f_j^T \\ f_j & A(t) \end{pmatrix}.$$

Exercise 5.5.1 # Prove (*).
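A numerical illustration of (*) in the case l = 1 used for $G_j$ (an illustration, not a proof): for a positive definite A, the smallest eigenvalue of the block matrix is positive exactly when τ exceeds $f^TA^{-1}f$. The data are random, NumPy is assumed, and the helper name `block` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
A = B @ B.T + 0.5 * np.eye(n)          # positive definite
f = rng.standard_normal(n)
threshold = f @ np.linalg.solve(A, f)  # f^T A^{-1} f

def block(tau):
    """The (n+1) x (n+1) matrix [[tau, f^T], [f, A]]."""
    return np.block([[np.array([[tau]]), f[None, :]], [f[:, None], A]])

for tau in (threshold - 0.1, threshold + 0.1):
    print(tau > threshold, np.linalg.eigvalsh(block(tau)).min() > 0)
# the two booleans agree on each line, as (*) predicts
```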


The outlined reasoning is insufficient for our purposes: it does not say what happens if some of the $t_i$'s are zero, which may cause degeneracy of A(t). In fact, of course, nothing happens: the epigraph of the function ”compliance with respect to the j-th load”, regarded as a function of $t \ge 0$, is simply the closure of the above $G_j$ (and is therefore convex). Instead of proving this latter fact directly, we shall come to the same conclusion in another way.
Exercise 5.5.2 Prove that the linear equation

$$Ax = f$$

with a symmetric positive semidefinite matrix A is solvable if and only if the concave quadratic function

$$q_f(z) = 2z^T f - z^T A z$$

is bounded above, and if this is the case, then the quantity $x^T f$, x being an arbitrary solution to the equation, coincides with $\max_z q_f(z)$.
Derive from this observation that one can eliminate the displacements $x_j$ from $(\mathrm{TTD}_{\mathrm{ini}})$ by passing to the problem
$(\mathrm{TTD}_1)$: find a vector t of bar volumes subject to the constraint (5.16) which minimizes the objective

$$c(t) = \max_{j=1,...,k} c_j(t), \qquad c_j(t) = \sup_{z \in \mathbb{R}^n} [2z^T f_j - z^T A(t) z].$$

Note that the $c_j(t)$ are closed and proper convex functions (as upper bounds of linear forms; the fact that the functions are proper is an immediate consequence of the fact that A(t) is positive definite for strictly positive t), so that $(\mathrm{TTD}_1)$ is a convex program.
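The statement of Exercise 5.5.2 is easy to test numerically in the degenerate case it is designed for: A singular (as happens when some $t_i = 0$) but f in the range of A, so that $Ax = f$ is solvable. Any solution z of $Az = f$ maximizes $q_f$, and the maximal value equals $x^Tf$ for every solution x. The NumPy sketch below uses a rank-deficient A built from dyads $b_ib_i^T$ as in the truss model, with random data, and `lstsq` to obtain one solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
bvecs = rng.standard_normal((3, n))
A = bvecs.T @ bvecs                      # PSD of rank 3 < n: singular
f = A @ rng.standard_normal(n)           # f in the range of A, so Ax = f is solvable

def q(z):
    return 2 * z @ f - z @ A @ z

x, *_ = np.linalg.lstsq(A, f, rcond=None)   # one solution of Ax = f
best = q(x)                                  # any solution of Az = f maximizes q_f
# another solution: shift x along the null space of A; x^T f does not change
_, _, Vt = np.linalg.svd(A)
x2 = x + 1.7 * Vt[-1]
print(np.isclose(best, x @ f), np.isclose(x2 @ f, x @ f))
print(all(q(x + rng.standard_normal(n)) <= best + 1e-9 for _ in range(100)))
```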
Our next step will be to reduce $(\mathrm{TTD}_1)$ to a conic form. Let us first make the objective linear. This is immediate: by introducing an extra variable τ, we can rewrite $(\mathrm{TTD}_1)$ equivalently as
$(\mathrm{TTD}_2)$: minimize τ by choice of $t \in \mathbb{R}^m$ and τ subject to the constraints (5.16) and

$$\tau + z^T A(t) z - 2z^T f_j \ge 0 \quad \forall z \in \mathbb{R}^n,\ \forall j = 1, ..., k. \tag{5.18}$$

((5.18) clearly expresses the inequalities $\tau \ge c_j(t)$, $j = 1, ..., k$.)
Our next step is guided by the following evident observation:
the inequality

$$\tau + z^T A z - 2z^T f \ge 0,$$

τ being real, A being a symmetric $n \times n$ matrix and f being an n-dimensional vector, is valid for all $z \in \mathbb{R}^n$ if and only if the symmetric $(n+1) \times (n+1)$ matrix

$$\begin{pmatrix} \tau & f^T \\ f & A \end{pmatrix} \succeq 0$$

is positive semidefinite.
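The semidefinite version matters precisely when A is singular (some $t_i = 0$). Combining the statement above with Exercise 5.5.2, the block matrix is then positive semidefinite iff f lies in the range of A and $\tau \ge f^TA^{+}f$, where $A^+$ is the Moore-Penrose pseudoinverse; this pseudoinverse formulation is our addition, not the text's. The NumPy sketch below checks both situations on random data with a small tolerance; the helper `min_eig` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
bvecs = rng.standard_normal((2, n))
A = bvecs.T @ bvecs                          # PSD of rank 2: singular

def min_eig(tau, f):
    M = np.block([[np.array([[tau]]), f[None, :]], [f[:, None], A]])
    return np.linalg.eigvalsh(M).min()

f_in = A @ rng.standard_normal(n)            # in the range of A
thr = f_in @ np.linalg.pinv(A) @ f_in        # f^T A^+ f = max_z q_f(z)
print(min_eig(thr + 1e-3, f_in) > -1e-9)     # PSD just above the threshold
print(min_eig(thr - 1e-1, f_in) < -1e-9)     # not PSD below it

f_out = f_in + np.linalg.svd(A)[2][-1]       # add a null-space component of A
print(min_eig(1e3, f_out) < -1e-9)           # not PSD even for a large tau
```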
Exercise 5.5.3 Prove the latter statement. Derive from this statement that $(\mathrm{TTD}_2)$ can be equivalently written down as
$(\mathrm{TTD}_p)$: minimize τ by choice of $s \in \mathbb{R}^m$ and $\tau \in \mathbb{R}$ subject to the constraint

$$\mathcal{A}(\tau, s) \in K; \qquad \sum_{i=1}^{m} s_i = 0,$$

where

• K is the direct product of $\mathbb{R}^m_+$ and k copies of the cone $\mathbb{S}^{n+1}_+$;

• the affine mapping $\mathcal{A}(\tau, s)$ is as follows: the $\mathbb{R}^m_+$-component of $\mathcal{A}$ is

$$\mathcal{A}_t(\tau, s) = s + m^{-1}(V, ..., V)^T \equiv s + e;$$

the component of $\mathcal{A}$ associated with the j-th copy of the cone $\mathbb{S}^{n+1}_+$ is

$$\mathcal{A}_j(\tau, s) = \begin{pmatrix} \tau & f_j^T \\ f_j & A(e) + A(s) \end{pmatrix}.$$

Note that $\mathcal{A}_t(\tau, s)$ is nothing but our previous t; the constraint $\mathcal{A}_t(\tau, s) \in \mathbb{R}^m_+$ (which is part of the constraint $\mathcal{A}(\tau, s) \in K$) together with the constraint $\sum_i s_i = 0$ gives an equivalent reformulation of the constraint (5.16), while the remaining components of the constraint $\mathcal{A}(\tau, s) \in K$, i.e., the inclusions $\mathcal{A}_j(\tau, s) \in \mathbb{S}^{n+1}_+$, represent the constraints (5.18).
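A sketch of the mapping $\mathcal{A}(\tau, s)$ of (TTD_p) on made-up data (random vectors standing in for the $b_i$ and for the loads; NumPy assumed; the names `A_of`, `cal_A` and `feasible` are hypothetical). For a t satisfying (5.16) it sets $s = t - e$ and $\tau = \max_j f_j^TA(t)^{-1}f_j$, then checks that every component of $\mathcal{A}(\tau, s)$ lies in its cone up to rounding, and that lowering τ breaks feasibility.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k, V = 3, 6, 2, 2.0
bvecs = rng.standard_normal((m, n))            # hypothetical b_i
loads = rng.standard_normal((k, n))            # hypothetical f_j
e = np.full(m, V / m)

def A_of(t):
    return (bvecs.T * t) @ bvecs               # sum_i t_i b_i b_i^T (linear in t)

def cal_A(tau, s):
    """Components (s + e, [[tau, f_j^T], [f_j, A(e) + A(s)]] for j = 1..k)."""
    blocks = [np.block([[np.array([[tau]]), f[None, :]], [f[:, None], A_of(e) + A_of(s)]])
              for f in loads]
    return s + e, blocks

def feasible(tau, s, tol=1e-9):
    t_part, blocks = cal_A(tau, s)
    return bool(abs(s.sum()) < tol and t_part.min() >= -tol
                and all(np.linalg.eigvalsh(Q).min() >= -tol for Q in blocks))

t = rng.uniform(0.1, 1.0, m)
t *= V / t.sum()                               # t > 0 with sum V: satisfies (5.16)
s = t - e
tau = max(f @ np.linalg.solve(A_of(t), f) for f in loads)
print(feasible(tau, s), feasible(tau - 0.05, s))   # expected: True False
```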

Note that the problem $(\mathrm{TTD}_p)$ is in fact in the conic form (cf. Section 5.4.4). Indeed, it requires to minimize a linear objective under the constraints that, first, the design vector (τ, s) belongs to a certain linear subspace E (given by $\sum_i s_i = 0$) and, second, that the image of the design vector under a given affine mapping belongs to a certain cone (closed, pointed, convex and with a nonempty interior). Now, the objective evidently can be represented as a linear form $c^T u$ of the image $u = \mathcal{A}(\tau, s)$ of the design vector under the mapping, so that our problem is exactly to minimize a linear objective over the intersection of an affine plane (namely, the image of the linear subspace E under the affine mapping $\mathcal{A}$) and a given cone, which is a conic problem.
Up to this moment we acted in a certain ”clever” way; from now on we act in a completely ”mechanical” manner, simply writing down and straightforwardly simplifying the conic dual to $(\mathrm{TTD}_p)$.

First step: writing down the conic dual to (TTDp). What we should do is to apply to (TTDp) the general construction from Lecture 5 and look at the result. The data in the primal problem are as follows:

• K is the direct product of K_t = R^m_+ and k copies K_j of the cone S^{n+1}_+; the embedding space for this cone is
$$E = \mathbf{R}^m \times \mathbf{S}^{n+1} \times \dots \times \mathbf{S}^{n+1};$$
we denote a point from this latter space by u = (t, p_1, ..., p_k), t ∈ R^m and p_j being (n + 1) × (n + 1) symmetric matrices, and denote the inner product by (·, ·);

• c ∈ E is given by c = (0, χ, ..., χ), where
$$\chi = \begin{pmatrix} k^{-1} & 0 \\ 0 & 0 \end{pmatrix}$$
is the (n + 1) × (n + 1) matrix with a single nonzero entry, which ensures the desired relation
$$(c, \mathcal{A}(\tau, s)) \equiv \tau;$$
note that there are many other ways to choose c in accordance with this relation;

• L is the image of E under the homogeneous part of the affine mapping A;

• b = A(0, 0) = (e, φ_1, ..., φ_k), where
$$\varphi_j = \begin{pmatrix} 0 & f_j^T \\ f_j & A(e) \end{pmatrix}.$$
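The primal data listed above can be assembled explicitly. The NumPy sketch below is a minimal illustration under stated assumptions: the vectors b_i and the loads f_j are stored as rows of arrays, the data are random, and the helper names are hypothetical. It builds A(τ, s), checks that the chosen c gives (c, A(τ, s)) = τ, and checks that A(0, 0) reproduces the components of b = (e, φ_1, ..., φ_k).

```python
import numpy as np

def stiffness(t, b):
    """A(t) = sum_i t_i b_i b_i^T, where row i of b holds the vector b_i."""
    return (b.T * t) @ b

def primal_map(tau, s, b, f, V):
    """Affine mapping A(tau, s) = (t-part, A_1, ..., A_k) of (TTDp)."""
    m = b.shape[0]
    e = np.full(m, V / m)            # e = m^{-1}(V, ..., V)^T
    t_part = s + e
    A_es = stiffness(e + s, b)       # A(e) + A(s) = A(e + s) by linearity
    blocks = []
    for fj in f:                     # one (n+1)x(n+1) block per load f_j
        n = fj.size
        Aj = np.empty((n + 1, n + 1))
        Aj[0, 0] = tau
        Aj[0, 1:] = fj
        Aj[1:, 0] = fj
        Aj[1:, 1:] = A_es
        blocks.append(Aj)
    return t_part, blocks

def inner_with_c(blocks):
    """(c, u) for c = (0, chi, ..., chi): sum_j Tr(chi A_j) = sum_j (A_j)_{11} / k."""
    k = len(blocks)
    return sum(Aj[0, 0] / k for Aj in blocks)

rng = np.random.default_rng(1)
m, n, k, V = 8, 3, 2, 1.0
b = rng.standard_normal((m, n))
f = rng.standard_normal((k, n))
s = rng.standard_normal(m)
s -= s.mean()                        # enforce sum_i s_i = 0
t_part, blocks = primal_map(2.5, s, b, f, V)
print(inner_with_c(blocks))          # equals tau = 2.5
```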

Now let us build the dual problem. We know that the cone K is self-dual (as a direct product of self-dual cones, see Exercises 5.4.7, 5.4.9), so that K^* = K. It remains only to find L^⊥, in other words, to find the vectors
$$w = (r, q_1, \dots, q_k) \in E$$
which are orthogonal to the image of E under the homogeneous part of the affine mapping A. This requires nothing but completely straightforward computations.

Exercise 5.5.4+ Prove that the feasible plane c + L^⊥ of the dual problem is comprised of exactly those w = (r, q_1, ..., q_k) for which the symmetric (n + 1) × (n + 1) matrices q_j, j = 1, ..., k, are of the form
$$q_j = \begin{pmatrix} \lambda_j & z_j^T \\ z_j & \sigma_j \end{pmatrix}, \tag{5.19}$$
with λ_j satisfying the relation
$$\sum_{j=1}^k \lambda_j = 1 \tag{5.20}$$
and the n × n symmetric matrices σ_1, ..., σ_k, along with the m-dimensional vector r and a real ρ, satisfying the equations
$$r_i + \sum_{j=1}^k b_i^T \sigma_j b_i = \rho, \quad i = 1, \dots, m \tag{5.21}$$
(b_i are the vectors involved in the representation A_i = b_i b_i^T, so that b_i^T σ_j b_i = Tr{A_i σ_j}). Derive from this observation that the conic dual to (TTDp) is the problem
(TTDd): minimize the linear functional
$$2\sum_{j=1}^k z_j^T f_j + V\rho \tag{5.22}$$
by choice of positive semidefinite matrices q_j of the form (5.19), a nonnegative vector r ∈ R^m and a real ρ under the constraints (5.20) and (5.21).
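The passage from (5.21) to the objective (5.22) can be spot-checked numerically: for any w = (r, q_1, ..., q_k) satisfying (5.21), the inner product (b, w) with b = (e, φ_1, ..., φ_k) should equal 2 Σ_j z_j^T f_j + Vρ. The sketch below uses random data (NumPy, hypothetical helper name, b_i and f_j stored as rows); the σ_j are arbitrary symmetric matrices, so it checks only this identity, not positive semidefiniteness or feasibility.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, V = 7, 3, 2, 2.0
b = rng.standard_normal((m, n))              # row i is b_i, A_i = b_i b_i^T
f = rng.standard_normal((k, n))              # row j is the load f_j
e = np.full(m, V / m)                        # e = m^{-1}(V, ..., V)^T
A_e = (b.T * e) @ b                          # A(e) = sum_i e_i b_i b_i^T

# data of w = (r, q_1, ..., q_k): q_j as in (5.19), lambda_j as in (5.20);
# sigma_j are random symmetric matrices (positive semidefiniteness is not checked)
lam = rng.random(k)
lam /= lam.sum()
z = rng.standard_normal((k, n))
sig = [S + S.T for S in rng.standard_normal((k, n, n))]
rho = 0.7
# r chosen so that (5.21) holds: r_i = rho - sum_j b_i^T sigma_j b_i
r = rho - sum(np.einsum("ip,pq,iq->i", b, S, b) for S in sig)

def sym_block(corner, vec, lower):
    """Symmetric (n+1)x(n+1) matrix [[corner, vec^T], [vec, lower]]."""
    M = np.empty((n + 1, n + 1))
    M[0, 0], M[0, 1:], M[1:, 0], M[1:, 1:] = corner, vec, vec, lower
    return M

# (b, w) = e^T r + sum_j Tr(phi_j q_j), with phi_j = [[0, f_j^T], [f_j, A(e)]]
lhs = e @ r + sum(
    np.trace(sym_block(0.0, f[j], A_e) @ sym_block(lam[j], z[j], sig[j])) for j in range(k)
)
rhs = 2.0 * np.sum(z * f) + V * rho          # objective (5.22)
print(np.isclose(lhs, rhs))                  # True
```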

Second step: simplifying the dual problem. Now let us simplify the dual problem. It is
immediately seen that one can eliminate the ”heavy” matrix variables σj and the vector r by
performing partial optimization in these variables:

Exercise 5.5.5+ Prove that in the notation given by (5.19), a collection
$$(\lambda_1, \dots, \lambda_k; z_1, \dots, z_k; \rho)$$
can be extended to a feasible plan (r; q_1, ..., q_k; ρ) of problem (TTDd) if and only if the collection satisfies the following requirements:
$$\lambda_j \ge 0, \quad \sum_{j=1}^k \lambda_j = 1; \tag{5.23}$$
$$\rho \ge \sum_{j=1}^k \frac{(b_i^T z_j)^2}{\lambda_j} \quad \forall i \tag{5.24}$$

(a fraction with zero denominator from now on is +∞), so that (TTDd) is equivalent to the problem of minimizing the linear objective (5.22) of the variables λ_·, z_·, ρ under the constraints (5.23), (5.24).
Eliminate ρ from this latter problem to obtain the following equivalent reformulation of (TTDd):
(TTDd): minimize the function
$$\max_{i=1,\dots,m} \sum_{j=1}^k \left[ 2 z_j^T f_j + V \frac{(b_i^T z_j)^2}{\lambda_j} \right] \tag{5.25}$$
by choice of λ_j, j = 1, ..., k, and z_j ∈ R^n subject to the constraints
$$\lambda_j \ge 0, \quad \sum_{j=1}^k \lambda_j = 1. \tag{5.26}$$

Note that in the important single-load case k = 1 the problem (TTDd) simply consists in minimizing, with respect to z_1 ∈ R^n, the maximum over i = 1, ..., m of the quadratic functions
$$\psi_i(z_1) = 2 z_1^T f_1 + V (b_i^T z_1)^2.$$
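For concreteness, here is a minimal NumPy sketch that evaluates the objective (5.25) for given λ and z_j. The array layout (b_i and f_j stored as rows) and the function name are assumptions made for illustration; terms with λ_j = 0 follow the text's convention that a fraction with zero denominator is +∞. For k = 1 and λ_1 = 1 the value reduces to max_i ψ_i(z_1).

```python
import numpy as np

def ttd_dual_objective(lam, z, b, f, V):
    """Objective (5.25): max_i sum_j [2 z_j^T f_j + V (b_i^T z_j)^2 / lam_j].

    lam: (k,), z: (k, n), b: (m, n) with rows b_i, f: (k, n) with rows f_j.
    A term with lam_j == 0 is +inf, as in the text's convention.
    """
    proj = b @ z.T                                   # proj[i, j] = b_i^T z_j
    with np.errstate(divide="ignore", invalid="ignore"):
        quad = np.where(lam > 0, proj**2 / lam, np.inf)   # shape (m, k)
    linear = 2.0 * np.sum(z * f, axis=1)             # 2 z_j^T f_j, shape (k,)
    return np.max(np.sum(linear + V * quad, axis=1))

rng = np.random.default_rng(3)
m, n, V = 10, 3, 1.5
b = rng.standard_normal((m, n))
f1 = rng.standard_normal(n)
z1 = rng.standard_normal(n)
psi = 2.0 * z1 @ f1 + V * (b @ z1) ** 2             # psi_i(z_1), i = 1, ..., m
value = ttd_dual_objective(np.array([1.0]), z1[None, :], b, f1[None, :], V)
print(np.isclose(value, psi.max()))                  # True
```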

Now look: the initial problem (TTDp) contained an m-dimensional design vector (τ, s) (the "formal" dimension of the vector is m + 1, but we remember that the sum of the s_i should be 0). The dual problem (TTDd) has k(n + 1) − 1 variables (there are k n-dimensional vectors z_j and k reals λ_j subject to a single linear equation). In the "full topology TTD" (where it is allowed to link by a bar any pair of nodes), m is of order of n^2 and n is at least of order of hundreds, so that m is of order of thousands and tens of thousands. In contrast to these huge numbers, the number k of loading scenarios is, normally, a small integer (less than 10). Thus, the dimension of (TTDd) is orders of magnitude less than that of (TTDp). At the same time, solving the dual problem, one can easily recover, via the Conic duality theorem, the optimal solution to the primal problem. As a kind of "penalty" for the relatively small number of variables, (TTDd) has a lot of inequality constraints; note, however, that for many methods it is much easier to struggle with many constraints than with many variables; this is, in particular, the case with the Newton-based methods^5. Thus, passing (in a completely mechanical way!) from the primal problem to the dual one, we improve the "computational tractability" of the problem.

^5 Since the number of constraints influences only the complexity of assembling the Newton system, and this complexity is linear in this number; in contrast to this, the number of variables defines the size of the Newton system, and the complexity of solving the system is cubic in the number of variables.

Third step: back to primal. And now let us demonstrate how duality allows us to obtain a better insight into the problem. To this end let us derive the problem dual to (TTDd). This looks crazy: we know that the dual to the dual is the primal, the problem we started with. There is, however, an important point: (TTDd) is equivalent to the conic dual to (TTDp), not the conic dual itself; therefore, taking the dual to (TTDd), we need not obtain the primal problem, although we may expect that the result will be equivalent to this primal problem.
Let us implement our plan. First, we rewrite (TTDd) in an equivalent conic form. To this end we introduce extra variables y_ij ∈ R, i = 1, ..., m, j = 1, ..., k, in order to "localize" nonlinearities, and an extra variable f to represent the objective (5.25) (look: a minute ago we tried to eliminate as many variables as possible, and now we go in the opposite direction... This is life, isn't it?). More specifically, consider the following system of constraints on the variables z_j, λ_j, y_ij, f (i runs from 1 to m, j runs from 1 to k):

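$$y_{ij} \ge \frac{(b_i^T z_j)^2}{\lambda_j}, \quad \lambda_j \ge 0, \quad i = 1, \dots, m, \; j = 1, \dots, k; \tag{5.27}$$
$$f \ge \sum_{j=1}^k \left[ 2 z_j^T f_j + V y_{ij} \right], \quad i = 1, \dots, m; \tag{5.28}$$
$$\sum_{j=1}^k \lambda_j = 1. \tag{5.29}$$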

It is immediately seen that (TTDd) is equivalent to the minimization of the variable f under the constraints (5.27)-(5.29). This latter problem is in the conic form (P) of Section 5.4.4, since (5.27) can be equivalently rewritten as
$$\begin{pmatrix} y_{ij} & b_i^T z_j \\ b_i^T z_j & \lambda_j \end{pmatrix} \succeq 0, \quad i = 1, \dots, m, \; j = 1, \dots, k \tag{5.30}$$
("⪰ 0" for symmetric matrices stands for "positive semidefinite"); to justify this equivalence, think of the criterion of positive semidefiniteness of a 2 × 2 symmetric matrix.
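The 2 × 2 criterion in question: a symmetric matrix with diagonal entries y, λ and off-diagonal entry a is positive semidefinite iff y ≥ 0, λ ≥ 0 and yλ ≥ a^2; for λ > 0 this is exactly y ≥ a^2/λ, i.e., (5.27). A small randomized check of this equivalence follows (NumPy; the helper names are hypothetical, and samples are restricted to λ > 0 so the fraction in (5.27) is finite).

```python
import numpy as np

def psd_2x2(y, a, lam):
    """Closed-form PSD test for [[y, a], [a, lam]]: nonnegative diagonal and determinant."""
    return y >= 0 and lam >= 0 and y * lam - a * a >= 0

def agreement_rate(trials=1000, seed=4):
    """Fraction of random samples (lam > 0) on which three descriptions agree:
    the closed-form test, the fraction form (5.27), and the smallest eigenvalue."""
    rng = np.random.default_rng(seed)
    agree = 0
    for _ in range(trials):
        y, a = 2.0 * rng.standard_normal(2)
        lam = rng.random() + 1e-3            # keep lambda_j > 0
        by_fraction = y >= a * a / lam       # y_ij >= (b_i^T z_j)^2 / lambda_j
        by_eigs = np.linalg.eigvalsh(np.array([[y, a], [a, lam]])).min() >= -1e-12
        agree += bool(psd_2x2(y, a, lam)) == bool(by_fraction) == bool(by_eigs)
    return agree / trials

print(agreement_rate())
```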
We see that (TTDd) is equivalent to the problem of minimizing f under the constraints (5.28)-(5.30). This problem, let it be called (π), is of the form (P), Section 5.4.4, with the following data:

• the design vector is
$$\xi = (f; \lambda_\cdot; y_\cdot; z_\cdot);$$

• K is the direct product of R^m_+ and mk copies of the cone S^2_+ of symmetric positive semidefinite 2 × 2 matrices; we denote the embedding space of the cone by F, the vectors from F by η = (ζ, {π_ij}_{i=1,...,m, j=1,...,k}), ζ being m-dimensional and π_ij being 2 × 2 matrices, and equip F with the natural inner product
$$(\eta', \eta'') = (\zeta')^T \zeta'' + \sum_{i,j} \operatorname{Tr}\{\pi'_{ij} \pi''_{ij}\};$$

• A is the homogeneous linear mapping with the components
$$(\mathcal{A}_\zeta \xi)_i = f - \sum_{j=1}^k \left[ 2 z_j^T f_j + V y_{ij} \right], \qquad \mathcal{A}_{\pi_{ij}} \xi = \begin{pmatrix} y_{ij} & b_i^T z_j \\ b_i^T z_j & \lambda_j \end{pmatrix};$$

• χ is the vector with the only nonzero component (associated with the f-component of the design vector) equal to 1;

• the system P(ξ − p) = 0 is Σ_j λ_j = 1, so that P^T r, r ∈ R, is the vector with λ_·-components equal to r and remaining components equal to 0, and p is P^T k^{-1}.

Exercise 5.5.6+ Prove that the conic dual, in the sense of Section 5.4.4, to problem (π) is equivalent to the following program:
(ψ): minimize
$$\max_{j=1,\dots,k} \sum_{i=1}^m \frac{\beta_{ij}^2}{\varphi_i} \tag{5.31}$$
by choice of an m-dimensional vector φ and mk reals β_ij subject to the constraints
$$\varphi \ge 0, \quad \sum_{i=1}^m \varphi_i = V; \tag{5.32}$$
$$\sum_{i=1}^m \beta_{ij} b_i = f_j, \quad j = 1, \dots, k. \tag{5.33}$$

Fourth step: from primal to primal. We do not know what the actual relation is between problem (ψ) and our very first problem (TTDini); what we can say is:
"(ψ) is equivalent to the problem which is conic dual to the problem which is equivalent to the conic dual to the problem which is equivalent to (TTDini)";
it sounds awful, especially taking into account that the notion of equivalence between problems has no exact meaning. At the same time, looking at (ψ), namely, at equation (5.32), we may guess that the φ_i are nothing but our bar volumes t_i, the design variables we actually are interested in, so that (ψ) is a "direct reformulation" of (TTDini): the φ-component of an optimal solution to (ψ) is nothing but the t-component of an optimal solution to (TTDini). This actually is the case, and the proof could be given by tracing the chain which led us to (ψ). There is, however, a direct, simple and instructive way to establish equivalence between the initial and the final problems in our chain, which is as follows.
Given a feasible solution (t, x_1, ..., x_k) to (TTDini), consider the bar forces
$$\beta_{ij} = t_i x_j^T b_i;$$
these quantities are magnitudes of the reaction forces caused by elongations of the bars under the corresponding loads. The equilibrium equations
$$A(t) x_j = f_j,$$
in view of A(t) = Σ_i t_i A_i ≡ Σ_i t_i b_i b_i^T, say exactly that
$$\sum_i \beta_{ij} b_i = f_j, \quad j = 1, \dots, k; \tag{5.34}$$
thus, we come to a feasible plan
$$(\varphi, \beta_\cdot): \quad \varphi = t, \quad \beta_{ij} = t_i x_j^T b_i \tag{5.35}$$
to problem (ψ). What is the value of the objective of the latter problem at the indicated plan? Multiplying (5.34) by x_j^T and taking into account the origin of β_ij, we see that the j-th compliance c_j = x_j^T f_j is equal to
$$\sum_i \beta_{ij} x_j^T b_i = \sum_i t_i (x_j^T b_i)^2 = \sum_i \frac{\beta_{ij}^2}{t_i} = \sum_i \frac{\beta_{ij}^2}{\varphi_i},$$

so that the value of the objective of (TTDini) at (t, x_1, ..., x_k), which is max_j c_j, is exactly the value of the objective (5.31) of problem (ψ) at the feasible plan (5.35) of the latter problem.
Thus, we have established the following proposition:
A. Transformation (5.35) maps a feasible plan (t, x_1, ..., x_k) to problem (TTDini) into a feasible plan (φ, β_·) to problem (ψ), and the value of the objective of the first problem at the first plan is equal to the value of the objective of the second problem at the second plan.
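Proposition A can be spot-checked numerically. The NumPy sketch below uses random vectors b_i, loads f_j and a positive volume vector t summing to V (all illustrative, with b_i and f_j stored as rows; the helper name is hypothetical). It computes the displacements from the equilibrium equations, forms the bar forces (5.35), and confirms both (5.34) and the identity between the compliances c_j and the terms of (5.31).

```python
import numpy as np

def bar_forces(t, b, f):
    """Displacements x_j solving A(t) x_j = f_j, and bar forces beta_ij = t_i x_j^T b_i.

    t: (m,) positive bar volumes; b: (m, n) with rows b_i; f: (k, n) with rows f_j.
    """
    A_t = (b.T * t) @ b                      # A(t) = sum_i t_i b_i b_i^T
    x = np.linalg.solve(A_t, f.T).T          # row j is x_j
    beta = t[:, None] * (b @ x.T)            # beta[i, j] = t_i b_i^T x_j
    return x, beta

rng = np.random.default_rng(5)
m, n, k, V = 12, 4, 3, 1.0
b = rng.standard_normal((m, n))
f = rng.standard_normal((k, n))
t = rng.random(m) + 0.1
t *= V / t.sum()                             # feasible volumes: t >= 0, sum_i t_i = V
x, beta = bar_forces(t, b, f)

equilibrium_ok = np.allclose(b.T @ beta, f.T)        # (5.34): sum_i beta_ij b_i = f_j
compliance = np.sum(x * f, axis=1)                   # c_j = x_j^T f_j
psi_terms = np.sum(beta**2 / t[:, None], axis=0)     # sum_i beta_ij^2 / phi_i, phi = t
print(equilibrium_ok, np.allclose(compliance, psi_terms))   # True True
```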
Are we done? Have we established the desired equivalence between the problems? No! Why should the images of the feasible plans to (TTDini) under mapping (5.35) cover the whole set of feasible plans of (ψ)? And if this is not the case, how can we be sure that the problems are equivalent? It may happen that an optimal solution to (ψ) corresponds to no feasible plan of the initial problem!
And the image of mapping (5.35) indeed does not cover the whole feasible set of (ψ), which is clear for dimension reasons: the dimension of the feasible domain of (TTDini), regarded as a nonlinear manifold, is m − 1 (this is the number of independent t_i's; the x_j are functions of t given by the equilibrium equations), while the dimension of the feasible domain of (ψ), also regarded as a manifold, is m − 1 (the number of independent φ_i's) plus mk (the number of β_ij) minus nk (the number of scalar linear equations (5.33)), i.e., it might be orders of magnitude greater than the dimension of the feasible domain of (TTDini) (recall that normally m ≫ n). In other words, transformation (5.35) allows us to obtain only those feasible plans of (ψ) where the β-part is determined, via the expressions
$$\beta_{ij} = t_i x_j^T b_i,$$
by k n-dimensional vectors x_j (which is also clear from the origin of the problem: the actual bar forces should be caused by certain displacements of the nodes), and this is in no sense a consequence of the constraints of problem (ψ): relations (5.33) say only that the sum of the reaction forces balances the external load, and say nothing about the "mechanical validity" of the reaction forces, i.e., whether or not they are caused by certain displacements of the nodes. Our dimension analysis demonstrates that the reaction forces caused by nodal displacements (i.e., those valid mechanically) form a very small part of all reaction forces allowed by equations (5.33).
In spite of these pessimistic remarks, we know that the optimal value in (ψ), which is basically dual to dual to (TTDini), is the same as that in (TTDini), so that in fact the optimal solution to (ψ) is in the image of mapping (5.35). Can we see it directly, without referring to the chain of transformations which led us to (ψ)? Yes! It is very simple to verify that the following proposition holds:
B. Let (φ, β_·) be a feasible plan to (ψ) and ω be the corresponding value of the objective. Then φ can be extended to a feasible plan (t = φ, x_1, ..., x_k) to (TTDini), and the maximal, over the loads f_1, ..., f_k, compliance of the truss t is ≤ ω.

Exercise 5.5.7 + Prove B.


From A. and B. it follows, of course, that problems (TTDini) and (ψ) are equivalent: an ε-solution to either of them can be immediately transformed into an ε-solution to the other.
Concluding remarks. Let me make several comments on our "truss adventure".
• Our main effort was to pass from the initial form (TTDini) of the Truss Topology Design problem to its dual (TTDd) and then to the "dual to dual" bar forces reformulation (ψ) of the initial problem. Some steps seemed to be "clever" (convex reformulation of (TTDini); conic reformulation of (TTDd) in terms of cones of positive semidefinite 2 × 2 matrices), but most of them were completely routine: we used in a straightforward manner the general scheme of conic duality. In fact the "clever" steps also are completely routine; a little experience suffices to see immediately that the epigraph of the compliance can be represented in terms of nonnegativity of certain quadratic forms or, which is the same, in terms of positive semidefiniteness of certain matrices linearly depending on the control vectors; this is even easier to do with the constraints (5.30). I would qualify our chain of reformulations as completely straightforward.
• Let us look, however, at the results of our effort. There are two of them:
(a) a "compressed", as far as the number of variables is concerned, form (TTDd) of the problem; as was mentioned, by reducing the number of variables we get better possibilities for numerical processing of the problem;
(b) a very instructive "bar forces" reformulation (ψ) of the problem.
• After the "bar forces" formulation is guessed, one can easily establish its equivalence to the initial formulation; thus, if our only goal were to replace (TTDini) by (ψ), we could restrict ourselves to the fourth step of our construction and skip the preceding three steps. The question, however, is how to guess that (ψ) indeed is equivalent to (TTDini). It is not that difficult to look at the equilibrium equations in terms of the bar forces β_ij = t_i x_j^T b_i; but one hardly could be courageous enough (and, to the best of our knowledge, nobody in fact was) to conjecture that the "heart of the situation", the restriction that the bar forces should be caused by certain displacements of the nodes, simply is redundant: in fact we can forget that the bar forces should belong to an "almost negligible", as far as dimensions are concerned, manifold (given by the equations β_ij = t_i x_j^T b_i), since this restriction on the bar forces is automatically satisfied at any optimal solution to (ψ) (this is what actually is said by B.).
Thus, things are as they should be: routine transformations result in something which, in principle, could be guessed and proved directly and quickly; the bottleneck is in this "in principle": it is not difficult to justify the answer, it is difficult to guess what the answer is. In our case, this answer was "guessed" via straightforward applications of a quite routine general scheme, a scheme useful in other cases as well; to demonstrate the efficiency of this scheme and some "standard" tricks in its implementation is exactly the goal of this text.
• To conclude, let me say several words on the "bar forces" formulation of the TTD problem. First of all, let us look at what this formulation is in the single-load case k = 1. Here the problem becomes
$$\text{minimize} \; \sum_i \frac{\beta_i^2}{\varphi_i}$$
under the constraints
$$\varphi \ge 0, \quad \sum_i \varphi_i = V, \quad \sum_i \beta_i b_i = f.$$
We can immediately perform partial optimization in the φ_i:
$$\varphi_i = V \left[ \sum_l |\beta_l| \right]^{-1} |\beta_i|.$$
The remaining optimization in the β_i, i.e., the problem
$$\text{minimize} \; V^{-1} \left[ \sum_i |\beta_i| \right]^2 \quad \text{s.t.} \quad \sum_i \beta_i b_i = f,$$
can be immediately reduced to an LP program.


Another useful observation is as follows: above we dealt with A(t) = Σ_{i=1}^m t_i A_i, A_i = b_i b_i^T; in mechanical terms, this is the linear elastic model of the material. For other mechanical models, other types of dependencies A(t) occur, e.g.,
$$A(t) = \sum_{i=1}^m t_i^\kappa A_i, \quad A_i = b_i b_i^T,$$
where κ > 0 is given. In this case the "direct" reasoning establishing the equivalence between (TTDini) and (ψ) remains valid and results in the following "bar forces" setting:
$$\text{minimize} \; \max_{j=1,\dots,k} \sum_{i=1}^m \frac{\beta_{ij}^2}{t_i^\kappa}$$
under the constraints
$$t \ge 0, \quad \sum_i t_i = V, \quad \sum_i \beta_{ij} b_i = f_j, \; j = 1, \dots, k.$$
The bad news here is that the problem turns out to be convex in (t, β_·) if and only if κ ≤ 1, and from the mechanical viewpoint, the only interesting case in this range of values of κ is that of the linear model (κ = 1).
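Two claims of this last remark are easy to check numerically: the closed-form partial minimizer in φ for the single-load case, and the convexity of an individual term β^2/t^κ. A direct computation gives the Hessian of that term in (β, t) the determinant 2κ(1 − κ)β^2 t^{−2κ−2}, so for κ > 0 it is positive semidefinite exactly when κ ≤ 1. The NumPy sketch below uses illustrative helper names and random data; it is a spot check, not a proof.

```python
import numpy as np

def single_load_volumes(beta, V):
    """Partial minimizer in phi of sum_i beta_i^2 / phi_i over phi >= 0, sum_i phi_i = V."""
    a = np.abs(beta)
    return V * a / a.sum()

def single_load_value(beta, V):
    """Value after the partial minimization: V^{-1} (sum_i |beta_i|)^2."""
    return np.abs(beta).sum() ** 2 / V

def term_hessian(beta, t, kappa):
    """Hessian of g(beta, t) = beta^2 / t^kappa in the variables (beta, t), t > 0."""
    g_bb = 2.0 * t ** (-kappa)
    g_bt = -2.0 * kappa * beta * t ** (-kappa - 1.0)
    g_tt = kappa * (kappa + 1.0) * beta**2 * t ** (-kappa - 2.0)
    return np.array([[g_bb, g_bt], [g_bt, g_tt]])

rng = np.random.default_rng(6)
beta, V = rng.standard_normal(6), 2.0
phi = single_load_volumes(beta, V)
print(np.isclose(np.sum(beta**2 / phi), single_load_value(beta, V)))  # True

for kappa in (0.5, 1.0, 2.0):
    # smallest Hessian eigenvalue at (beta, t) = (1, 1); a negative value means nonconvex
    print(kappa, np.linalg.eigvalsh(term_hessian(1.0, 1.0, kappa)).min())
```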
Chapter 6

The method of Karmarkar

The goal of this lecture is to develop the method which extends onto the general convex case the very first polynomial time interior point method, the method of Karmarkar. Let me say that there is no necessity to start with the initial LP method and then pass to the extensions, since the general scheme seems to be clearer than its particular LP implementation.

6.1 Problem setting and assumptions


The method in question is for solving a convex program in the conic form:
$$(\mathrm{P}): \quad \text{minimize } c^T x \;\; \text{s.t.} \;\; x \in \{b + L\} \cap K, \tag{6.1}$$
where
• K is a closed convex pointed cone with a nonempty interior in R^n;
• L is a linear subspace in R^n;
• b and c are given n-dimensional vectors.
We assume that
A: the feasible set
$$K_f = \{b + L\} \cap K$$
of the problem is bounded and intersects the interior of the cone K;
B: we are given in advance a strictly feasible solution x̂ to the problem, i.e., a feasible solution belonging to the interior of K.
Assumptions A and B are more or less standard for the interior point approach. The next assumption is specific for the method of Karmarkar:
C: the optimal value, c^*, of the problem is known.
Assumption C might look rather restrictive; later on we shall see how one can eliminate it.
Our last assumption is as follows:
D: we are given a ϑ-logarithmically homogeneous self-concordant barrier F for the cone K.
As in the case of the path-following method, "we are given F" means that we are able, for any x ∈ R^n, to decide whether x ∈ Dom F ≡ int K, and, if this is the case, to compute the value F(x), the gradient F'(x) and the Hessian F''(x) of the barrier at x. Note that the barrier F is the only representation of the cone used in the method.


6.2 Homogeneous form of the problem


To proceed, let us note that the feasible affine plane b + L of problem (P) can be represented, in many ways, as the intersection of a linear subspace M and an affine hyperplane Π = {x ∈ R^n | e^T x = 1}. Indeed, our feasible affine plane always can be represented as the plane of solutions to a system
$$Px = p$$
of, say, m + 1 linear equations. Note that the system for sure is not homogeneous, since otherwise the feasible plane would pass through the origin; and since, in view of A, it also intersects the interior of the cone, the feasible set K_f would be a nontrivial cone, which is impossible, since K_f is assumed to be bounded (by the same A). Thus, at least one of the equations, say, the last of them, has a nonzero right hand side; normalizing the equation, we may think that it is of the form e^T x = 1. Subtracting this equation, with properly chosen coefficients, from the remaining m equations of the system, we can make these equations homogeneous, thus reducing the system to the form
$$Ax = 0, \quad e^T x = 1;$$
now b + L is represented in the desired form
$$b + L = \{x \in M \mid e^T x = 1\}, \quad M = \{x \mid Ax = 0\}.$$
Thus, we can rewrite (P) as
$$\text{minimize } c^T x \;\; \text{s.t.} \;\; x \in K \cap M, \;\; e^T x = 1,$$
with M being a linear subspace in R^n.


It is convenient to convert the problem into an equivalent one where the optimal value of the objective (which, according to C, is known in advance) is zero; to this end it suffices to replace the initial objective c with the new one
$$\sigma = c - c^* e;$$
since on the feasible plane of the problem e^T x is identically 1, this updating indeed results in an equivalent problem with the optimal value equal to 0.
Thus, we have seen that (P) can be easily rewritten in the so-called Karmarkar format
$$(\mathrm{P}_K): \quad \text{minimize } \sigma^T x \;\; \text{s.t.} \;\; x \in K \cap M, \;\; e^T x = 1, \tag{6.2}$$
with M being a linear subspace in R^n and the optimal value in the problem being zero; this transformation preserves, of course, properties A, B.
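The reduction to the Karmarkar format is a few lines of linear algebra. A NumPy sketch follows, assuming the plane b + L is given by an explicit system Px = p. The function name, and the choice of the row with the largest |p_k| as the normalizing equation, are illustrative; the text only requires some row with a nonzero right hand side.

```python
import numpy as np

def to_karmarkar_format(P, p, c, c_star):
    """Rewrite {x : P x = p} as {x : A x = 0, e^T x = 1} and shift the objective.

    Uses a row of P with nonzero right hand side to define e (scaled so that
    e^T x = 1), subtracts multiples of e from the other rows, and returns
    sigma = c - c_star * e.
    """
    P = np.asarray(P, dtype=float)
    p = np.asarray(p, dtype=float)
    k = int(np.argmax(np.abs(p)))            # a row with nonzero right hand side
    if p[k] == 0.0:
        raise ValueError("homogeneous system: the feasible plane passes through the origin")
    e = P[k] / p[k]
    others = np.arange(p.size) != k
    A = P[others] - np.outer(p[others], e)   # row i: P_i - p_i e, zero on the plane
    sigma = c - c_star * e
    return A, e, sigma

rng = np.random.default_rng(7)
n = 6
P = rng.standard_normal((3, n))
x0 = rng.random(n)                           # any point of the plane P x = p
p = P @ x0
c = rng.standard_normal(n)
A, e, sigma = to_karmarkar_format(P, p, c, c_star=0.3)
print(np.allclose(A @ x0, 0.0), np.isclose(e @ x0, 1.0), np.isclose(sigma @ x0, c @ x0 - 0.3))
# True True True
```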

Remark 6.2.1 In the original description of the method of Karmarkar, the problem from the very beginning is assumed to be in the form (P_K), with K = R^n_+; moreover, Karmarkar assumes that
$$e = (1, \dots, 1)^T \in \mathbf{R}^n$$
and that the strictly feasible solution x̂ to the problem given in advance is the barycenter n^{-1}e of the standard simplex; thus, in the original version of the method it is assumed that the feasible set K_f of the problem is the intersection of the standard simplex
$$\Delta = \Big\{x \in \mathbf{R}^n_+ \;\Big|\; \sum_{i=1}^n x_i \equiv e^T x = 1\Big\}$$

and a linear subspace of R^n passing through the barycenter n^{-1}e of the simplex and, besides this, that the optimal value in the problem is 0.
And, of course, in the Karmarkar paper the barrier for the cone K = R^n_+ underlying the whole construction is the standard n-logarithmically homogeneous barrier
$$F(x) = -\sum_{i=1}^n \ln x_i$$
for the nonnegative orthant.


In what follows we refer to the particular LP situation presented in the above remark as the Karmarkar case.

6.3 The Karmarkar potential function


In what follows we assume that the objective c^T x, or, which is the same, our new objective σ^T x, is nonconstant on the feasible set of the problem (otherwise there is nothing to do: σ^T x̂ = 0, i.e., the initial strictly feasible solution, same as any feasible solution, is optimal). Since the new objective is nonconstant on the feasible set and its optimal value is 0, it follows that the objective is strictly positive at any strictly feasible solution to the problem, i.e., on the relative interior rint K_f of K_f (due to A, this relative interior is nothing but the intersection of the feasible plane and the interior of K, i.e., nothing but the set of all strictly feasible solutions to the problem). Since σ^T x is strictly positive on the relative interior of K_f, the following Karmarkar potential
$$v(x) = F(x) + \vartheta \ln(\sigma^T x): \quad \operatorname{Dom} v \equiv \{x \in \operatorname{int} K \mid \sigma^T x > 0\} \to \mathbf{R} \tag{6.3}$$
is well-defined on rint K_f; this potential is the main hero of our story.
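In the Karmarkar case (K = R^n_+, F(x) = −Σ_i ln x_i, ϑ = n) the potential (6.3) is straightforward to write down. The minimal NumPy sketch below (the helper name is hypothetical) also illustrates the fact, used later in the analysis, that the ϑ-logarithmic homogeneity of F makes v constant along rays: the −n ln t coming from F cancels the +n ln t coming from ϑ ln(σ^T x).

```python
import numpy as np

def karmarkar_potential(x, sigma):
    """v(x) = F(x) + theta ln(sigma^T x), F(x) = -sum_i ln x_i, theta = n (Karmarkar case)."""
    x = np.asarray(x, dtype=float)
    s = sigma @ x
    if np.any(x <= 0) or s <= 0:
        raise ValueError("x is outside Dom v")
    return -np.sum(np.log(x)) + x.size * np.log(s)

sigma = np.array([0.0, 1.0, 2.0, 1.0])
x = np.array([0.4, 0.3, 0.2, 0.1])
print(np.isclose(karmarkar_potential(3.7 * x, sigma), karmarkar_potential(x, sigma)))  # True
```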


The first observation related to the potential is that when x is strictly feasible and the potential at x is small (negative with large absolute value), then x is a good approximate solution.
The exact statement is as follows:
Proposition 6.3.1 Let x ∈ int K be feasible for (P_K). Then
$$\sigma^T x \equiv c^T x - c^* \le V \exp\left\{-\frac{v(\hat x) - v(x)}{\vartheta}\right\}, \qquad V = (c^T \hat x - c^*) \exp\left\{\frac{F(\hat x) - \min_{\operatorname{rint} K_f} F}{\vartheta}\right\}; \tag{6.4}$$
note that min_{rint K_f} F is well defined, since K_f is bounded (due to A) and the restriction of F onto the relative interior of K_f is a self-concordant barrier for K_f (Proposition 3.1.1.(i)).
The proof is immediate:
$$v(\hat x) - v(x) = \vartheta[\ln(\sigma^T \hat x) - \ln(\sigma^T x)] + F(\hat x) - F(x) \le \vartheta[\ln(\sigma^T \hat x) - \ln(\sigma^T x)] + F(\hat x) - \min_{\operatorname{rint} K_f} F,$$
and (6.4) follows.


The above observation tells us that all we need is a certain rule for updating a strictly feasible solution x into another strictly feasible solution x_+ with a "significantly smaller" value of the potential; iterating this updating, we obtain a sequence of strictly feasible solutions with the potential tending to −∞, so that the solutions converge in terms of the objective. This is how the method works; and the essence of the matter is, of course, the aforementioned updating which we are about to present.

6.4 The Karmarkar updating scheme


The updating of strictly feasible solutions
$$\mathcal{K}: x \mapsto x_+$$
which underlies the method of Karmarkar is as follows:

1) Given a strictly feasible solution x to problem (P_K), compute the gradient F'(x) of the barrier F;

2) Find the Newton direction e_x of the "partially linearized" potential
$$v_x(y) = F(y) + \vartheta \frac{\sigma^T (y - x)}{\sigma^T x} + \vartheta \ln(\sigma^T x)$$
at the point x along the affine plane
$$E_x = \{y \mid y \in M, \; (y - x)^T F'(x) = 0\}$$
tangent to the corresponding level set of the barrier, i.e., set
$$e_x = \operatorname{argmin}\left\{h^T \nabla_y v_x(x) + \tfrac{1}{2} h^T \nabla_y^2 v_x(x) h \;\middle|\; h \in M, \; h^T F'(x) = 0\right\};$$

3) Compute the reduced Newton decrement
$$\omega = \sqrt{-e_x^T \nabla_y v_x(x)}$$
and set
$$x' = x + \frac{1}{1 + \omega} e_x;$$

4) The point x' belongs to the intersection of the subspace M and the interior of K. Find a point x'' from this intersection such that
$$v(x'') \le v(x')$$
(e.g., set x'' = x') and set
$$x_+ = (e^T x'')^{-1} x'',$$
thus completing the updating x ↦ x_+.
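For the Karmarkar case (K = R^n_+, F(x) = −Σ_i ln x_i, ϑ = n, e = (1, ..., 1)^T) the updating above can be sketched in a few lines of NumPy. This is an illustration under these assumptions, not a reference implementation: it computes the Newton direction in the scaled variable d = X^{-1}h, X = diag(x), where F''(x) becomes the identity and e_x becomes a projected gradient; it uses x'' = x' in step 4); and the small test problem below is made up for the example.

```python
import numpy as np

def potential(x, sigma):
    """Karmarkar potential (6.3) for K = R^n_+, F(x) = -sum ln x_i, theta = n."""
    return -np.sum(np.log(x)) + x.size * np.log(sigma @ x)

def karmarkar_update(x, A, sigma):
    """One updating x -> x_+ (steps 1)-4), with x'' = x') in the Karmarkar case.

    A: rows defining M = {y : A y = 0}; sigma: objective with optimal value 0.
    """
    n = x.size
    theta = n
    # steps 1)-2): scaled gradient X grad v_x(x), using X F'(x) = -1
    g = -np.ones(n) + theta * x * sigma / (sigma @ x)
    # constraints h in M and h^T F'(x) = 0, written for d = X^{-1} h
    C = np.vstack([A * x, -np.ones((1, n))])
    y, *_ = np.linalg.lstsq(C.T, g, rcond=None)
    d = -(g - C.T @ y)                       # minus the projection of g onto null(C)
    # step 3): reduced Newton decrement and damped Newton step
    omega = np.linalg.norm(d)                # = sqrt(-e_x^T grad v_x(x))
    x_prime = x * (1.0 + d / (1.0 + omega))  # x + e_x / (1 + omega), with e_x = X d
    # step 4): renormalize onto the hyperplane e^T x = 1
    return x_prime / x_prime.sum(), omega

A = np.array([[0.0, 0.0, 1.0, -1.0]])        # M = {x : x_3 = x_4}
sigma = np.array([0.0, 1.0, 1.0, 1.0])       # optimal value 0, attained at (1, 0, 0, 0)
x = np.full(4, 0.25)                         # barycenter, strictly feasible
chi = 1.0 / 3.0 - np.log(4.0 / 3.0)
for it in range(10):
    v_old = potential(x, sigma)
    x, omega = karmarkar_update(x, A, sigma)
    print(it, round(omega, 3), potential(x, sigma) <= v_old - chi)
```

Working in scaled coordinates keeps the linear algebra well conditioned as some coordinates of x approach zero.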

The following proposition is the central one.

Proposition 6.4.1 The above updating is well defined, maps a strictly feasible solution x to (P_K) into another strictly feasible solution x_+ to (P_K), and decreases the Karmarkar potential at least by an absolute constant:
$$v(x_+) \le v(x) - \chi, \qquad \chi = \frac{1}{3} - \ln\frac{4}{3} > 0. \tag{6.5}$$

Proof.
0^0. Let us start with the following simple observations:
$$y \in \operatorname{int} K \cap M \;\Rightarrow\; e^T y > 0; \tag{6.6}$$
$$y \in \operatorname{int} K \cap M \;\Rightarrow\; \sigma^T y > 0. \tag{6.7}$$
To prove (6.6), assume, on the contrary, that there exists y ∈ int K ∩ M with e^T y ≤ 0. Consider the linear function
$$\phi(t) = e^T[\hat x + t(y - \hat x)], \quad 0 \le t \le 1.$$
This function is positive at t = 0 (since x̂ is feasible) and nonpositive at t = 1; therefore it has a unique root t^* ∈ (0, 1] and is positive to the left of this root. We conclude that the points
$$x_t = \phi^{-1}(t)[\hat x + t(y - \hat x)], \quad 0 \le t < t^*,$$
are well defined and, moreover, belong to K_f (indeed, since both x̂ and y are in K ∩ M and φ(t) is positive for 0 ≤ t < t^*, the points x_t also are in K ∩ M; to establish feasibility, we should verify, in addition, that e^T x_t = 1, which is evident).
Thus, x_t, 0 ≤ t < t^*, is a certain curve in the feasible set. Let us prove that |x_t|_2 → ∞ as t → t^* − 0; this will be the desired contradiction, since K_f is assumed to be bounded (see A). Indeed, φ(t) → 0 as t → t^* − 0, while x̂ + t(y − x̂) has a nonzero limit x̂ + t^*(y − x̂) (this limit is nonzero as a convex combination of two points from the interior of K and, therefore, a point from this interior; recall that K is pointed, so that the origin is not in its interior).
We have proved (6.6); (6.7) is an immediate consequence of this relation, since if there were y ∈ int K ∩ M with σ^T y ≤ 0, the vector [e^T y]^{-1} y would be a strictly feasible solution to the problem (since we already know that e^T y > 0, so that the normalization y ↦ [e^T y]^{-1} y keeps the point in the interior of the cone) with a nonpositive value of the objective, which, as we know, is impossible.
1^0. Let us set
$$G = K \cap E_x \equiv K \cap \{y \mid y \in M, \; (y - x)^T F'(x) = 0\};$$
since x ∈ M is an interior point of K, G is a closed convex domain in the affine plane E_x (this latter plane from now on is regarded as the linear space in which G is embedded); the (relative) interior of G is exactly the intersection of E_x and the interior of the cone K.
2^0. Further, let f(·) be the restriction of the barrier F onto rint G; due to our combination rules for self-concordant barriers, namely, the one on affine substitutions of argument, f is a ϑ-self-concordant barrier for G.
3^0. By construction, the "partially linearized" potential, regarded as a function on rint G, is the sum of the barrier f and a linear form:
$$v_x(y) = f(y) + p^T(y - x) + q,$$
where the linear term p^T(y − x) + q is nothing but the first order Taylor expansion of the function
$$\vartheta \ln(\sigma^T y)$$
at the point y = x. From (6.7) it immediately follows that this function (and therefore v(·)) is well-defined on int K ∩ M and, consequently, on rint G; besides this, the function is concave in y ∈ rint G. Thus, we have
$$v(y) \le v_x(y), \quad y \in \operatorname{rint} G; \qquad v(x) = v_x(x). \tag{6.8}$$

v(y) ≤ vx (y), y ∈ rint G; v(x) = vx (x). (6.8)



4^0. Since v_x is the sum of a self-concordant barrier and a linear form, it is self-concordant on the set rint G. From the definitions of e_x and ω it is immediately seen that e_x is nothing but the Newton direction of v_x(y) (regarded as a function on rint G) at the point y = x, and ω is the corresponding Newton decrement; consequently (look at rule 3)) x' is the iterate of y = x under the action of the damped Newton method. From Lecture 2 we know that this iterate belongs to rint G and that the iteration of the method decreases v_x "significantly", namely, that
$$v_x(x) - v_x(x') \ge \rho(-\omega) = \omega - \ln(1 + \omega).$$
Taking into account (6.8), we conclude that x' belongs to the intersection of the subspace M and the interior of the cone K and
$$v(x) - v(x') \ge \rho(-\omega). \tag{6.9}$$

5^0. Now comes the first crucial point of the proof: the reduced Newton decrement ω is not too small, namely,
$$\omega \ge \frac{1}{3}. \tag{6.10}$$
Indeed, x is the analytic center of G with respect to the barrier f (since, by construction, E_x is orthogonal to the gradient F' of the barrier F at x, and f is the restriction of F onto E_x). Since f, as we have just mentioned, is a ϑ-self-concordant barrier for G, and f is nondegenerate (as a restriction of a nondegenerate self-concordant barrier F, see Proposition 5.3.1), the enlarged Dikin ellipsoid
$$W^+ = \{y \in E_x \mid |y - x|_x \le \vartheta + 2\sqrt{\vartheta}\}$$
(|·|_x is the Euclidean norm generated by F''(x)) contains the whole G (the Centering property, Lecture 3, V.). Now, the optimal solution x^* to (P_K) satisfies the relation σ^T x^* = 0 (the origin of σ) and is a nonzero vector from K ∩ M (since x^* is feasible for the problem). It follows that the quantity (x^*)^T F'(x) is negative (since F'(x) ∈ int(−K^*), Proposition 5.3.3.(i)), and therefore the ray spanned by x^* intersects G at a certain point y^* (indeed, G is the part of K ∩ M given by the linear equation y^T F'(x) = x^T F'(x), and the right hand side in this equation is −ϑ, see (5.5), Lecture 5, i.e., is of the same sign as (x^*)^T F'(x)). Since σ^T x^* = 0, we have σ^T y^* = 0; thus,
there exists y^* in G, and, consequently, in the ellipsoid W^+, with σ^T y^* = 0.
We conclude that the linear form
$$\psi(y) = \vartheta \frac{\sigma^T y}{\sigma^T x},$$
which is equal to ϑ at the center x of the ellipsoid W^+, attains the zero value somewhere in the ellipsoid, and therefore its variation over the ellipsoid is at least 2ϑ. Consequently, the variation of the form over the unit Dikin ellipsoid of the barrier f centered at x is at least 2ϑ(ϑ + 2√ϑ)^{-1} ≥ 2/3:
$$\max\left\{\vartheta \frac{\sigma^T h}{\sigma^T x} \;\middle|\; h \in M, \; h^T F'(x) = 0, \; |h|_x \le 1\right\} \ge \frac{1}{3}.$$
But the linear form in question is exactly ∇_y v_x(x), since ∇_y f(x) = 0 (recall that x is the analytic center of G with respect to f), so that the left hand side in the latter inequality is the Newton decrement of v_x(·) (as always, regarded as a function on rint G) at x, i.e., it is nothing but ω.

6^0. Now comes the concluding step: the Karmarkar potential v is constant along rays: v(tu) = v(u) whenever u ∈ Dom v and t > 0 [this is an immediate consequence of the ϑ-logarithmic homogeneity of the barrier F]^1. As we have just seen,
$$v(x') \le v(x) - \rho\left(-\tfrac{1}{3}\right);$$
by construction, x'' is a point from int K ∩ M such that
$$v(x'') \le v(x').$$
According to (6.6), when passing from x'' to x_+ = [e^T x'']^{-1} x'', we get a strictly feasible solution to the problem, and due to the fact that v remains constant along rays, v(x_+) = v(x''). Thus, we come to v(x_+) ≤ v(x) − ρ(−1/3), as claimed.

6.5 Overall complexity of the method


As already indicated, the method of Karmarkar as applied to problem $(P_K)$ simply iterates the updating $\mathcal{K}$ presented in Section 6.4, i.e., generates the sequence

$$x_i = \mathcal{K}(x_{i-1}), \qquad x_0 = \hat x, \tag{6.11}$$

$\hat x$ being the initial strictly feasible solution to the problem (see B).
An immediate corollary of Propositions 6.3.1 and 6.4.1 is the following complexity result:

Theorem 6.5.1 Let problem $(P_K)$ be solved by the method of Karmarkar associated with a $\vartheta$-logarithmically homogeneous barrier $F$ for the cone $K$, and let assumptions A–C be satisfied. Then the iterates $x_i$ generated by the method are strictly feasible solutions to the problem and

$$c^T x_i - c^* \le V \exp\left\{-\frac{v(\hat x) - v(x_i)}{\vartheta}\right\} \le V \exp\left\{-\frac{i\chi}{\vartheta}\right\}, \qquad \chi = \frac13 - \ln\frac43, \tag{6.12}$$

with the data-dependent scale factor $V$ given by

$$V = (c^T \hat x - c^*) \exp\left\{\frac{F(\hat x) - \min_{\mathrm{rint}\, K_f} F}{\vartheta}\right\}. \tag{6.13}$$

In particular, the Newton complexity (# of iterations of the method) of finding an $\varepsilon$-solution to the problem does not exceed the quantity

$$N_{\rm Karm}(\varepsilon) = O(1)\,\vartheta \ln\left(\frac{V}{\varepsilon} + 1\right) + 1, \tag{6.14}$$

$O(1)$ being an absolute constant.
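To make (6.12)–(6.14) concrete, here is a minimal sketch that solves $V\exp\{-i\chi/\vartheta\} \le \varepsilon$ for the iteration count $i$, with $\chi = \frac13 - \ln\frac43 \approx 0.0457$. It treats $V$ as a given number (computing it exactly would require $\min F$, which is generally unknown), and the function name `karmarkar_iterations` is ours, not from the notes. Read this way, the bound gives $1/\chi \approx 21.9$ as one admissible value of the constant $O(1)$ in (6.14).

```python
import math

# chi = rho(-1/3) = 1/3 - ln(4/3): guaranteed potential decrease per step, see (6.12)
CHI = 1.0 / 3.0 - math.log(4.0 / 3.0)


def karmarkar_iterations(theta: float, V: float, eps: float) -> int:
    """Smallest i with V * exp(-i * chi / theta) <= eps (right-hand bound in (6.12))."""
    if eps >= V:
        return 0  # the bound already certifies an eps-solution at the start
    return math.ceil(theta * math.log(V / eps) / CHI)


# Example: theta = 10, V = 1, target accuracy 1e-6
i = karmarkar_iterations(10.0, 1.0, 1e-6)
```

The count grows linearly in $\vartheta$, which is exactly the feature contrasted with the $\sqrt{\vartheta}$ behaviour of path-following methods in the comments below.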

Comments.

• We see that the Newton complexity of finding an $\varepsilon$-solution by the method of Karmarkar is proportional to $\vartheta$; on the other hand, the restriction of $F$ onto the feasible set $K_f$ is a $\vartheta$-self-concordant barrier for this set (Proposition 3.1.1.(i)), and we could solve the problem by the path-following method associated with this restriction, which would result in a better Newton complexity, namely, one proportional to $\sqrt{\vartheta}$. Thus, from the theoretical complexity viewpoint the method of Karmarkar is significantly worse than the path-following method; why should we be interested in the method of Karmarkar?
The answer is: because of the potential reduction nature of the method, which underlies the excellent practical performance of the algorithm. Look: in the above reasoning, the only thing we are interested in is to decrease, as fast as possible, a certain explicitly given function, the potential. The theory gives us a certain "default" way of updating the current iterate which guarantees a certain progress (at least an absolute constant) in the value of the potential at each iteration, and it does not forbid us to do whatever we want to get a better progress (this possibility was explicitly indicated in our construction, see the requirements on $x''$). E.g., after $x'$ is found, we can perform a line search on the intersection of the ray $[x, x')$ with the interior of $G$ in order to choose as $x''$ the best, in terms of the potential, point of this intersection rather than the "default" point $x'$. There are strong reasons to expect that in some important cases the line search decreases the value of the potential by a much larger quantity than the one given by the above theoretical analysis (see the exercises accompanying this lecture); in accordance with these expectations, the method in fact behaves incomparably better than the theoretical complexity analysis predicts.

¹ In fact, the assumption of logarithmic homogeneity of $F$, as well as the form of the Karmarkar potential, originates exactly from the desire to make the potential constant along rays.

• What is also important is that all "common sense" improvements of the basic Karmarkar scheme, like the aforementioned line search, do not spoil the theoretical complexity bound; and from the practical viewpoint a very attractive property of the method is that the potential gives us a clear criterion to decide what is good and what is bad. In contrast to this, in the path-following scheme we should either follow the theoretical recommendations on the rate of updating the penalty, and then we are bound to perform a lot of Newton steps, or increase the penalty at a significantly higher rate, thus destroying the theoretical complexity bound and raising the very difficult question of how to choose and tune this higher rate.

• Let me say several words about the original method of Karmarkar for LP. In fact it is exactly the particular case of the above scheme for the situation described in Remark 6.2.1; Karmarkar, however, presents the same method in a different way. Namely, instead of processing the same data in the plane $E_x$ varying from iteration to iteration, he uses scaling: after a new iterate $x_i$ is found, he performs the fractional-linear substitution of the argument

$$x \mapsto \frac{X_i^{-1} x}{e^T X_i^{-1} x}, \qquad X_i = \mathrm{Diag}\{x_i\}$$

(recall that in the Karmarkar situation $e = (1, ..., 1)^T$). With this substitution, the problem becomes another problem of the same type (with a new objective $\sigma$ and a new linear subspace $M$), and the image of the actual iterate $x_i$ becomes the barycenter $n^{-1}e$ of the simplex $\Delta$. It is immediately seen that in the Karmarkar case decreasing the Karmarkar potential of the new problem by some amount at the image $n^{-1}e$ of the current iterate is the same as decreasing the potential of the initial problem by the same amount at the actual iterate $x_i$. Thus, scaling allows us to reduce the question of how to decrease the potential to the particular case when the current iterate is the barycenter of $\Delta$; this possibility (specific for LP) to deal with a certain convenient "standard configuration" allows all required estimates (which in our approach were consequences of general properties of self-concordant barriers) to be carried out via direct analysis of the behaviour of the standard logarithmic barrier $F(x) = -\sum_i \ln x_i$ in a neighbourhood of the point $n^{-1}e$, which is quite straightforward.

Let me also add that in the Karmarkar situation our general estimate becomes

$$c^T x_i - c^* \le (c^T \hat x - c^*) \exp\left\{-\frac{i\chi}{n}\right\},$$

since the parameter of the barrier in the case in question is $\vartheta = n$ and the starting point $\hat x = n^{-1}e$ is the minimizer of $F$ on $\Delta$ and, consequently, on the feasible set of the problem.
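The claim that scaling shifts the LP potential by a constant can be checked numerically. The sketch below assumes $F(x) = -\sum_i \ln x_i$ ($\vartheta = n$) and, as our own derivation rather than a formula from the notes, takes the objective of the scaled problem to be $X_i\sigma$; the linear subspace $M$ is not modelled, and `sigma` is hypothetical positive data. Under these assumptions $v_{\rm new}(T(x)) - v(x) = \sum_j \ln (x_i)_j$ for every $x$ in the simplex, so potential decreases are preserved exactly.

```python
import numpy as np


def karmarkar_scaling(x_i):
    """Fractional-linear map x -> X_i^{-1} x / (e^T X_i^{-1} x), with X_i = Diag{x_i}."""
    x_i = np.asarray(x_i, dtype=float)

    def T(x):
        y = np.asarray(x, dtype=float) / x_i  # X_i^{-1} x
        return y / y.sum()                    # normalize onto the simplex e^T y = 1

    return T


def karmarkar_potential(x, sigma):
    """v(x) = F(x) + n ln(sigma^T x) with F(x) = -sum_i ln x_i (theta = n)."""
    return -np.sum(np.log(x)) + x.size * np.log(sigma @ x)


rng = np.random.default_rng(1)
n = 5
x_i = rng.uniform(0.1, 1.0, n)
x_i /= x_i.sum()                  # current iterate in the interior of the simplex
sigma = rng.uniform(0.1, 1.0, n)  # hypothetical objective, positive on the simplex
z = rng.uniform(0.1, 1.0, n)
z /= z.sum()                      # some other point of the simplex

T = karmarkar_scaling(x_i)
sigma_new = x_i * sigma           # assumed objective of the scaled problem: X_i sigma
shift_at_x = karmarkar_potential(T(x_i), sigma_new) - karmarkar_potential(x_i, sigma)
shift_at_z = karmarkar_potential(T(z), sigma_new) - karmarkar_potential(z, sigma)
```

Both shifts equal $\sum_j \ln (x_i)_j$, and `T(x_i)` is the barycenter $n^{-1}e$, which is the "standard configuration" referred to above.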

6.6 How to implement the method of Karmarkar


So far, our ability to solve conic problems by the method of Karmarkar is restricted by assumptions A–C. Among these assumptions, A (strict feasibility of the problem and boundedness of the feasible set) is not that restrictive. Assumption B (a strictly feasible solution should be known in advance) is not so pleasant, but let me postpone discussing this issue: this is a common problem of interior point methods, and we shall speak about it in due time. What is really restrictive is assumption C: we should know in advance the optimal value of the problem. There are several ways to eliminate this unpleasant hypothesis; let me present to you the simplest one, the sliding objective approach. Assume, instead of C, that

C*: we are given in advance a lower bound $c^*_0$ for the unknown optimal value $c^*$

(this, of course, is far less restrictive than the assumption that we know $c^*$ exactly). In this case we may act as follows: at the $i$-th iteration of the method, we use a certain lower bound $c^*_{i-1}$ for $c^*$ (the initial lower bound $c^*_0$ is given by C*). When updating $x_i$ into $x_{i+1}$, we begin exactly as in the original method, but use, instead of the objective

$$\sigma = c - c^* e,$$

the "current objective"

$$\sigma_{i-1} = c - c^*_{i-1} e.$$

Now, after the current "reduced Newton decrement" $\omega = \omega_i$ is computed, we check whether it is $\ge \frac13$. If this is the case, we proceed exactly as in the original scheme and do not vary the current lower bound for the optimal value, i.e., set

$$c^*_i = c^*_{i-1}$$

and, consequently, $\sigma_i = \sigma_{i-1}$. If it turns out that $\omega_i < \frac13$, we act as follows. The quantity $\omega$ given by rule 3) depends on the objective $\sigma$ to which rules 1)–3) are applied:

$$\omega = \Omega_i(\sigma).$$

In the case in question we have

$$\Omega_i(c - te) < \frac13 \quad \text{when } t = c^*_{i-1}. \tag{6.15}$$

The left hand side of this relation is a certain explicit function of $t$ (the square root of a nonnegative fractional-quadratic form of $t$); and, as we know from the proof of Proposition 6.4.1,

$$\Omega_i(c - c^* e) \ge \frac13. \tag{6.16}$$

It follows that the equation $\Omega_i(c - te) = \frac13$ is solvable, and its root closest to $c^*_{i-1}$ from the right lies between $c^*_{i-1}$ and $c^*$, i.e., this root (which can be computed immediately) is an improved lower bound for $c^*$. This is exactly the lower bound which we take as $c^*_i$; after it is found, we set

$$\sigma_i = c - c^*_i e$$

and update $x_i$ into $x_{i+1}$ by the basic scheme applied to this "improved" objective (for which this scheme, by construction, results in $\omega = \frac13$).
Following the line of argument used in the proofs of Propositions 6.3.1 and 6.4.1, one can verify that the modification in question produces strictly feasible solutions $x_i$ and nondecreasing lower bounds $c^*_i \le c^*$ of the unknown optimal value in such a way that the sequence of local potentials

$$v_i(x_i) = F(x_i) + \vartheta \ln(\sigma_i^T x_i) \equiv F(x_i) + \vartheta \ln(c^T x_i - c^*_i)$$

decreases at a reasonable rate:

$$v_i(x_i) \le v_{i-1}(x_{i-1}) - \rho(-\tfrac13),$$

which, in turn, ensures the rate of convergence

$$c^T x_i - c^* \le V \exp\left\{-\frac{v_0(x_0) - v_i(x_i)}{\vartheta}\right\} \le V \exp\left\{-\frac{i\chi}{\vartheta}\right\}, \qquad V = (c^T \hat x - c^*_0) \exp\left\{\frac{F(\hat x) - \min_{\mathrm{rint}\, K_f} F}{\vartheta}\right\},$$

completely similar to the one for the case of known optimal value.
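The lower-bound update of the sliding objective approach can be sketched as follows. We assume, hypothetically, that the current iterate supplies the two quadratics in $\Omega_i(c - te)^2 = N(t)/D(t)$ as coefficient triples, with $D > 0$ on the range of interest; how these coefficients are obtained from the data is not spelled out here. Under that assumption, $\Omega_i(c - te) = \frac13$ becomes the quadratic equation $N(t) - \frac19 D(t) = 0$, and we return its smallest root to the right of $c^*_{i-1}$. The names `update_lower_bound`, `num` and `den` are illustrative only.

```python
import math


def eval_quad(coeffs, t):
    a2, a1, a0 = coeffs
    return a2 * t * t + a1 * t + a0


def update_lower_bound(c_prev, num, den, target=1.0 / 3.0):
    """One sliding-objective update c*_{i-1} -> c*_i.

    num, den: hypothetical coefficients (a2, a1, a0) such that
    Omega_i(c - t e)^2 = num(t) / den(t), with den(t) > 0.
    """
    omega_prev = math.sqrt(eval_quad(num, c_prev) / eval_quad(den, c_prev))
    if omega_prev >= target:
        return c_prev  # keep the current bound: c*_i = c*_{i-1}
    # Omega(t) = target  <=>  num(t) - target^2 * den(t) = 0, a quadratic in t
    A = num[0] - target**2 * den[0]
    B = num[1] - target**2 * den[1]
    C = num[2] - target**2 * den[2]
    if abs(A) < 1e-15:
        if abs(B) < 1e-15:
            raise ValueError("degenerate equation")
        roots = [-C / B]
    else:
        disc = B * B - 4.0 * A * C
        if disc < 0:
            raise ValueError("no real root: data inconsistent with (6.16)")
        r = math.sqrt(disc)
        roots = [(-B - r) / (2.0 * A), (-B + r) / (2.0 * A)]
    right = [t for t in roots if t > c_prev]
    if not right:
        raise ValueError("no root to the right of c_prev")
    return min(right)  # root closest to c*_{i-1} from the right


# Toy data: Omega(t) = 0.1 / (1 - t) for t < 1, i.e. num = 0.01, den = (1 - t)^2
new_bound = update_lower_bound(0.0, (0.0, 0.0, 0.01), (1.0, -2.0, 1.0))  # 0.7
```

In the toy data the quadratic has the roots $0.7$ and $1.3$; taking the smallest root to the right of the old bound selects $0.7$, consistent with the rule of choosing the root closest to $c^*_{i-1}$.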

6.7 Exercises on the method of Karmarkar


Our first exercise is quite natural.
Exercise 6.7.1 #. Justify the sliding objective approach presented in Section 6.6.
Our next story gives a very instructive equivalent description of the method of Karmarkar (in the LP case, this description is due to Bayer and Lagarias). At a step of the method the situation is as follows: we are given a strictly feasible solution $x$ to $(P_K)$ and are trying to update it into a new strictly feasible solution with a "significantly smaller" value of the potential. Now, strictly feasible solutions are in one-to-one correspondence with strictly feasible rays, i.e., rays $r = \{ty \mid t > 0\}$ generated by $y \in M \cap \mathrm{int}\, K$. Indeed, any strictly feasible solution $x$ spans a unique ray of this type, and any strictly feasible ray intersects the relative interior of the feasible set at a unique point (since, as we know from (6.6), the quantity $e^T y$ is positive whenever $y \in M \cap \mathrm{int}\, K$, and therefore the normalization $[e^T y]^{-1} y$ is a strictly feasible solution to the problem). On the other hand, the Karmarkar potential $v$ is constant along rays, and therefore it can be thought of as a function defined on the space $\mathcal{R}$ of strictly feasible rays. Thus, the goal of a step can be reformulated as follows:

given a strictly feasible ray $r$, find a new ray $r^+$ of this type with a "significantly smaller" value of the potential.
Now let us make the following observation: there are many ways to identify strictly feasible rays with points of a certain set; e.g., given a linear functional $g^T x$ which is positive on $M \cap \mathrm{int}\, K$, we may consider the cross-section $K^g$ of $M \cap K$ by the hyperplane given by the equation $g^T x = 1$. It is immediately seen that any strictly feasible ray intersects the relative interior of $K^g$ and, vice versa, any point from this relative interior spans a strictly feasible ray. What we used in the initial representation of the method was the "parameterization" of the space $\mathcal{R}$ of strictly feasible rays by the points of the relative interior of the feasible set $K_f$ (i.e., by the set $K^e$ associated, in the aforementioned sense, with the constraint functional $e^T x$). Now, what happens if we use another parameterization of $\mathcal{R}$? Note that we have a natural candidate for the role of $g$: the objective $\sigma$ (indeed, we know that $\sigma^T x$ is positive at any strictly feasible $x$ and therefore is positive on $M \cap \mathrm{int}\, K$). What is the potential in terms of our new parameterization of $\mathcal{R}$, where a strictly feasible ray $r$ is represented by its intersection $y(r)$ with the plane $\{y \mid \sigma^T y = 1\}$? The answer is immediate (since $\ln(\sigma^T y(r)) = 0$):

$$v(y(r)) = F(y(r)).$$

In other words, the goal of a step can be equivalently reformulated as follows:

given a point $y$ from the relative interior of the set

$$K^\sigma = \{z \in M \cap K \mid \sigma^T z = 1\},$$

find a new point $y^+$ of this relative interior with $F(y^+)$ "significantly less" than $F(y)$.
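The identity $v(y(r)) = F(y(r))$ is easy to check in the LP setting $K = \mathbf{R}^n_+$, $F(y) = -\sum_i \ln y_i$, $\vartheta = n$. The sketch below verifies that the potential is constant along a ray and that, on the section $\sigma^T y = 1$, it reduces to the barrier. The subspace $M$ is not modelled, and `sigma` and `y` are hypothetical positive data.

```python
import numpy as np


def F(y):
    """LP barrier F(y) = -sum_i ln y_i (theta = n)."""
    return -np.sum(np.log(y))


def potential(y, sigma):
    """Karmarkar potential v(y) = F(y) + n ln(sigma^T y)."""
    return F(y) + y.size * np.log(sigma @ y)


def to_sigma_section(y, sigma):
    """y(r): the point of the ray through y lying on the plane sigma^T y = 1."""
    return y / (sigma @ y)


rng = np.random.default_rng(0)
n = 5
sigma = rng.uniform(0.1, 1.0, n)  # hypothetical objective, positive on int K
y = rng.uniform(0.1, 1.0, n)      # a point of int K
z = to_sigma_section(y, sigma)
```

Here `potential(t * y, sigma)` does not depend on `t > 0`, and `potential(z, sigma)` coincides with `F(z)`, because the logarithmic term vanishes on the section.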
Could you guess what the "linesearch" version (with $x'' = \mathrm{argmin}_{y = x + t(x' - x)} v(y)$) of the Karmarkar updating $\mathcal{K}$ is in terms of this new parameterization of $\mathcal{R}$?
Exercise 6.7.2 # Verify that the Karmarkar updating with linesearch is nothing but the Newton iteration with linesearch as applied to the restriction of $F$ onto the relative interior of $K^\sigma$.
Now, can we see from our new interpretation of the method why it converges at the rate given by Theorem 6.5.1? This is immediate:
Exercise 6.7.3 #+ Prove that

• the set $K^\sigma$ is unbounded;

• the Newton decrement $\lambda(\phi, u)$ of the restriction $\phi$ of the barrier $F$ onto the relative interior of $K^\sigma$ is $\ge 1$ at any point $u \in \mathrm{rint}\, K^\sigma$;

• each damped Newton iteration (and therefore each Newton iteration with linesearch) as applied to $\phi$ decreases $\phi$ at least by $1 - \ln 2 > 0$.

Conclude from these observations that each iteration of the Karmarkar method with linesearch reduces the potential at least by $1 - \ln 2$.
Now we understand what in fact goes on in the method of Karmarkar. We start from the problem of minimizing a linear objective over a closed and bounded convex domain $K_f$; we know the optimal value, i.e., we know the hyperplane $\{c^T x = c^*\}$ which touches the feasible set; what we do not know, and what should be found, is where this plane touches the feasible set. What we do is as follows (the explanation below is illustrated by a picture on the next page): we perform a projective transformation of the affine hull of $K_f$ which moves the target plane $\{c^T x = c^*\}$ to infinity (this is exactly the transformation of $K_f$ onto $K^\sigma$ given by the recipe: to find the image of $x \in \mathrm{rint}\, K_f$, take the intersection of the ray spanned by $x$ with the hyperplane $\{\sigma^T y = 1\}$). The image of the feasible set $K_f$ of the problem is an unbounded convex domain $K^\sigma$, and our goal is to go to infinity while staying within this image (the inverse image of the point moving in $K^\sigma$ will then stay within $K_f$ and approach the target plane $\{c^T x = c^*\}$). Now, in order to solve this latter problem, we take a self-concordant barrier $\phi$ for $K^\sigma$ and apply to this barrier the damped Newton method (or the Newton method with linesearch). As explained in Exercise 6.7.3, the routine decreases $\phi$ at every step at least by an absolute constant, thus forcing $\phi$ to tend to $-\infty$ at a certain rate. Since $\phi$ is convex (and therefore bounded below on any bounded subset of $K^\sigma$), this inevitably forces the iterates to go to infinity. A rather sophisticated way to go far away, isn't it?
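A hypothetical one-dimensional illustration of Exercise 6.7.3 (our own toy, not taken from the notes): let the unbounded domain be the half-line $(0, \infty)$ and $\phi(u) = -\ln u$. Its Newton decrement equals $1$ at every point, and each damped Newton step maps $u$ to $1.5u$. The barrier therefore drops by $\ln 1.5 \approx 0.405 \ge 1 - \ln 2 \approx 0.307$ per step, and the iterates march off to infinity.

```python
import math


def damped_newton_step(u: float):
    """One damped Newton step for phi(u) = -ln u on (0, inf)."""
    grad, hess = -1.0 / u, 1.0 / u**2      # phi'(u), phi''(u)
    lam = abs(grad) / math.sqrt(hess)      # Newton decrement; equals 1 for every u > 0
    newton_dir = -grad / hess              # = u
    return u + newton_dir / (1.0 + lam), lam


u = 1.0
history = []                               # (decrement, decrease of phi) per step
for _ in range(5):
    u_new, lam = damped_newton_step(u)
    decrease = -math.log(u) + math.log(u_new)  # phi(u) - phi(u_new)
    history.append((lam, decrease))
    u = u_new
# u grows geometrically: 1.5 ** 5 after five steps
```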
Our last story is related to a quite different issue: the anticipated behaviour of the method of Karmarkar. The question, informally, is as follows: we know that a step of the method decreases the potential at least by an absolute constant; this is given by our theoretical worst-case analysis. What is the "expected" progress in the potential?
It hardly makes sense to pose this question in the general case. In what follows we restrict ourselves to the case of semidefinite programming, where

$$K = \mathbf{S}^n_+$$

is the cone of positive semidefinite symmetric $n \times n$ matrices and

$$F(x) = -\ln \mathrm{Det}\, x$$

is the standard $n$-logarithmically homogeneous self-concordant barrier for the cone (Lecture 5, Example 5.3.3); the considerations below can be repeated word by word for the case of LP ($K = \mathbf{R}^n_+$, $F(x) = -\sum_i \ln x_i$).
Consider a step of the method of Karmarkar with linesearch, the method being applied to a semidefinite program. Let $x$ be the current strictly feasible solution and $x^+$ be its iterate given by a single step of the linesearch version of the method. Let us pose the following question:

(?) what is the progress $\alpha = v(x) - v(x^+)$ in the potential at the step in question?

To answer this question, it is convenient to pass to a certain "standard configuration", i.e., to perform scaling. Namely, consider the linear transformation

$$u \mapsto \mathcal{X} u = x^{-1/2} u x^{-1/2}$$

in the space of symmetric $n \times n$ matrices.

Exercise 6.7.4 # Prove that the scaling $\mathcal{X}$ possesses the following properties:

• it is a one-to-one mapping of $\mathrm{int}\, K$ onto itself;

• it "almost preserves" the barrier:

$$F(\mathcal{X} u) = F(u) + \mathrm{const}(x);$$

in particular,

$$|\mathcal{X} h|_{\mathcal{X} u} = |h|_u, \qquad u \in \mathrm{int}\, K,\ h \in \mathbf{S}^n;$$

• the scaling maps the feasible set $K_f$ of problem $(P_K)$ onto the feasible set of another problem $(P'_K)$ of the same type; the updated problem is defined by the subspace

$$M' = \mathcal{X} M,$$

the normalizing equation $(e', x) = 1$ with

$$e' = x^{1/2} e x^{1/2},$$

and the objective

$$\sigma' = x^{1/2} \sigma x^{1/2};$$

this problem also satisfies the assumptions A–C;

• let $v(\cdot)$ be the potential of the initial problem, and $v'$ be the potential of the new one. Then the potentials at the corresponding points coincide up to an additive constant:

$$\mathrm{Dom}\, v' = \mathcal{X}(\mathrm{Dom}\, v); \qquad v'(\mathcal{X} u) - v(u) \equiv \mathrm{const}, \quad u \in \mathrm{Dom}\, v;$$

• $\mathcal{X}$ maps the point $x$ onto the unit matrix $I$, and maps the iterate $x^+$ of $x$ given by the linesearch version of the method as applied to the initial problem onto the similar iterate $I^+$ of $I$ given by the linesearch version of the method as applied to the transformed problem.
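Several items of Exercise 6.7.4 can be checked numerically before being proved. The sketch below assumes random positive definite test matrices (`random_spd` and the matrices `x`, `u`, `e` are hypothetical data) and verifies four facts: $\mathcal{X}x = I$; $F(\mathcal{X}u) - F(u) = \ln\mathrm{Det}\, x$, a constant depending only on $x$; $|\mathcal{X}h|_{\mathcal{X}u} = |h|_u$ with $|h|_u^2 = \mathrm{Tr}(u^{-1}hu^{-1}h)$; and $(e', \mathcal{X}u) = (e, u)$.

```python
import numpy as np


def sqrtm_spd(x):
    """Symmetric square root of a positive definite matrix."""
    w, Q = np.linalg.eigh(x)
    return (Q * np.sqrt(w)) @ Q.T


def inv_sqrtm_spd(x):
    """x^{-1/2} for a positive definite matrix x."""
    w, Q = np.linalg.eigh(x)
    return (Q / np.sqrt(w)) @ Q.T


def barrier(u):
    """F(u) = -ln Det u."""
    sign, logdet = np.linalg.slogdet(u)
    assert sign > 0, "u must be positive definite"
    return -logdet


def local_norm(h, u):
    """|h|_u = (F''(u)h, h)^{1/2} = Tr(u^{-1} h u^{-1} h)^{1/2}."""
    ui = np.linalg.inv(u)
    return np.sqrt(np.trace(ui @ h @ ui @ h))


def scaling(x):
    """The map u -> x^{-1/2} u x^{-1/2}."""
    r = inv_sqrtm_spd(x)
    return lambda u: r @ u @ r


def random_spd(n, rng):
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)


rng = np.random.default_rng(0)
n = 4
x, u, e = random_spd(n, rng), random_spd(n, rng), random_spd(n, rng)
h = rng.standard_normal((n, n))
h = h + h.T                                  # a symmetric direction
X = scaling(x)
e_new = sqrtm_spd(x) @ e @ sqrtm_spd(x)      # e' = x^{1/2} e x^{1/2}
```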

From Exercise 6.7.4 it is clear that in order to answer question (?), it suffices to answer the similar question (of course, not about the initial problem itself, but about a problem of the same type with updated data) for the particular case when the current iterate is the unit matrix $I$. Let us consider this special case. In what follows we use the original notation for the data of the transformed problem; this should not cause any confusion, since we shall speak about exactly one step of the method.
Now, what is the situation in our "standard configuration" case $x = I$? It is as follows: we are given a linear subspace $M$ passing through $x = I$ and the objective $\sigma$; what we know is that²

I. $(\sigma, u) \ge 0$ whenever $u \in \mathrm{int}\, K \cap M$, and there exists a nonzero matrix $x^* \in K \cap M$ such that $(\sigma, x^*) = 0$;
II. In order to update $x = I$ into $x^+$, we compute the steepest descent direction $\xi$ of the Karmarkar potential $v(\cdot)$ at the point $x$ along the affine plane

$$E_x = \{y \in M \mid (F'(x), y - x) = 0\},$$

the metric in the subspace being $|h|_x \equiv (F''(x)h, h)^{1/2}$, i.e., we find, among the directions parallel to $E_x$ that have unit length with respect to the indicated norm, the one with the smallest (i.e., the "most negative") inner product with $v'(x)$. Note that the Newton direction $e_x$ is proportional, with a positive coefficient, to the steepest descent direction $\xi$. Note also that the steepest descent direction of $v$ at $x$ is the same as the similar direction for the function $n \ln((\sigma, u))$ at $u = x$ (recall that for the barrier in question $\vartheta = n$), since $x$ is the minimizer of the remaining component $F(\cdot)$ of $v(\cdot)$ along $E_x$.

² From now on we denote the inner product on the space in question, i.e., on the space $\mathbf{S}^n$ of symmetric $n \times n$ matrices, by $(x, y)$ (recall that this is the Frobenius inner product $\mathrm{Tr}\{xy\}$), in order to avoid confusion with matrix products like $x^T y$.
Now, in our standard configuration case $x = I$ we have $F'(x) = -I$, and $|h|_x = (h, h)^{1/2}$ is the usual Frobenius norm³; thus, $\xi$ is the steepest descent direction of the linear form

$$\phi(h) = n(\sigma, h)/(\sigma, I)$$

(this is the differential of $n \ln((\sigma, u))$ at $u = I$) taken along the subspace

$$\Pi = M \cap \{h : \mathrm{Tr}\, h = -(F'(I), h) = 0\}$$

with respect to the standard Euclidean structure of our universe $\mathbf{S}^n$. In other words, $\xi$ is proportional, with a negative coefficient, to the orthogonal projection $\eta$ of

$$S \equiv (\sigma, I)^{-1} \sigma$$

onto the subspace $\Pi$.


From these observations we conclude that

III. $\mathrm{Tr}\, \eta = 0$, $\mathrm{Tr}\, S = 1$ (since $\eta \in \Pi$ and $\Pi$ is contained in the subspace of matrices with zero trace, and due to the origin of $S$, respectively);

IVa. $(S, u) > 0$ for all positive definite $u$ of the form $I + r\eta$, $r \in \mathbf{R}$ (an immediate consequence of I.);

IVb. There exists a positive semidefinite matrix $\chi^*$ such that $\chi^* - I \in \Pi$ and $(S, \chi^*) = 0$ ($\chi^*$ is proportional to $x^*$ with the coefficient given by the requirement that $(F'(I), \chi^* - I) = 0$, or, which is the same, by the requirement that $\mathrm{Tr}\, \chi^* = n$; recall that $F'(I) = -I$).

Now, at the step we choose $t^*$ as the minimizer of the potential $v(I - t\eta)$ over the set $T$ of nonnegative $t$ such that $I - t\eta \in \mathrm{Dom}\, v$, or, which is the same in view of I., such that $I - t\eta$ is positive definite⁴, and define $x^+$ as $(e, x'')^{-1} x''$, $x'' = I - t^*\eta$; the normalization $x'' \mapsto x^+$ does not vary the potential, so that the quantity $\alpha$ we are interested in is simply $v(I) - v(x'')$.
To proceed, let us look at the potential along our search ray (we drop the additive constant $n \ln((\sigma, I))$, which does not affect the progress):

$$v(I - t\eta) = -\ln \mathrm{Det}(I - t\eta) + n \ln((S, I - t\eta)).$$

III. tells us that $(S, I) = 1$; since $\eta$ is the orthoprojection of $S$ onto $\Pi$ (see II.), we also have $(S, \eta) = (\eta, \eta)$. Thus,

$$\phi(t) \equiv v(I - t\eta) = -\ln \mathrm{Det}(I - t\eta) + n \ln(1 - t(\eta, \eta)) = -\sum_{i=1}^n \ln(1 - t g_i) + n \ln(1 - t|g|_2^2), \tag{6.17}$$

where $g = (g_1, ..., g_n)^T$ is the vector comprised of the eigenvalues of the symmetric matrix $\eta$.
Exercise 6.7.5 #+ Prove that
1) $\sum_{i=1}^n g_i = 0$;
2) $|g|_\infty \ge n^{-1}$.

³ This is due to the useful formulae for the derivatives of the barrier $F(u) = -\ln \mathrm{Det}\, u$: $F'(u) = -u^{-1}$, $F''(u)h = u^{-1} h u^{-1}$; those who solved Exercise 3.3.3 surely know these formulae, and all others are kindly asked to derive them.
⁴ Recall that $e_x$ is proportional, with a positive coefficient, to $\xi$ and, consequently, is proportional, with a negative coefficient, to $\eta$.

Now, it follows from (6.17) that the progress in the potential is given by

$$\alpha = \phi(0) - \min_{t \in T} \phi(t) = \max_{t \in T}\left[\sum_{i=1}^n \ln(1 - t g_i) - n \ln(1 - t|g|_2^2)\right], \tag{6.18}$$

where $T = \{t \ge 0 \mid 1 - t g_i > 0,\ i = 1, ..., n\}$.

Exercise 6.7.6 #+ Testing the value of $t$ equal to

$$\tau \equiv \frac{n}{1 + n|g|_\infty},$$

demonstrate that

$$\alpha \ge (1 - \ln 2)\left(\frac{|g|_2}{|g|_\infty}\right)^2. \tag{6.19}$$
The conclusion of our analysis is as follows:
each step of the method of Karmarkar with linesearch applied to a semidefinite program can
be associated with an n-dimensional vector g (depending on the data and the iteration number)
in such a way that the progress in the Karmarkar potential at a step is at least the quantity
given by (6.19).
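The bracket in (6.18) and the bound (6.19) can be evaluated for concrete vectors $g$. The sketch below checks two hand-picked vectors that satisfy the properties a real step provides: $\sum_i g_i = 0$, $|g|_\infty \ge 1/n$, and $1 - t|g|_2^2 > 0$ on $T$ (the latter follows from IVa). Checking a few examples is, of course, no substitute for Exercise 6.7.6. The names `progress` and `check_step` are ours.

```python
import math


def progress(g, t):
    """alpha(t) = phi(0) - phi(t), the bracket in (6.18); -inf outside the admissible set."""
    n = len(g)
    g2sq = sum(gi * gi for gi in g)
    if t < 0 or any(1.0 - t * gi <= 0.0 for gi in g) or 1.0 - t * g2sq <= 0.0:
        return -math.inf
    return sum(math.log(1.0 - t * gi) for gi in g) - n * math.log(1.0 - t * g2sq)


def check_step(g):
    """Return (tau, progress at tau, right-hand side of (6.19))."""
    n = len(g)
    g_inf = max(abs(gi) for gi in g)
    g_2 = math.sqrt(sum(gi * gi for gi in g))
    tau = n / (1.0 + n * g_inf)
    return tau, progress(g, tau), (1.0 - math.log(2.0)) * (g_2 / g_inf) ** 2


# sum g_i = 0, |g|_inf = 1/n, |g|_2^2 <= max_i g_i
tau, alpha_tau, bound = check_step([0.25, 0.25, -0.25, -0.25])
# tau = 2.0, alpha_tau = ln 9 (about 2.197), bound about 1.227
```

For this $g$ we have $\pi(g) = |g|_2/|g|_\infty = 2$, so the guaranteed progress is four times the worst-case constant $1 - \ln 2$.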
Now, the worst-case complexity bound for the method comes from the worst-case value of the right hand side in (6.19); this latter value (equal to $1 - \ln 2$) corresponds to the case when $|g|_2 |g|_\infty^{-1} \equiv \pi(g)$ attains its minimum in $g$ (which is equal to 1); note that $\pi(g)$ is of order of 1 only if $g$ is an "orth-like" vector, i.e., its 2-norm comes from $O(1)$ dominating coordinates. Note, however, that the "typical" $n$-dimensional vector is far from being an "orth-like" one, and the "typical" value of $\pi(g)$ is much larger than 1. Namely, if $g$ is a random vector in $\mathbf{R}^n$ with the direction uniformly distributed on the unit sphere, then the "typical value" of $\pi(g)$ is of order of $\sqrt{n/\ln n}$ (the probability for $\pi$ to be less than a certain absolute constant times this square root tends to 0 as $n \to \infty$; please prove this simple statement). If (if!) we could use this "typical" value of $\pi(g)$ in our lower bound for the progress in the potential, we would come to a progress per step of $O(n/\ln n)$ rather than the worst-case value $O(1)$; as a result, the Newton complexity of finding an $\varepsilon$-solution would be proportional to $\ln n$ rather than to $n$, which would be actually excellent! Needless to say, there is no way to prove something definite of this type, even after we equip the family of problems in question with a probability distribution in order to treat the vectors $g$ arising at sequential steps as a random sequence. The difficulty is that the future of the algorithm is strongly predetermined by its past, so that any initial symmetry seems to be destroyed as the algorithm goes on.
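The "typical" size of $\pi(g)$ can be illustrated by a quick Monte Carlo experiment. This is evidence, not the requested proof. A standard Gaussian vector has its direction uniformly distributed on the sphere; subtracting the mean (our choice) additionally enforces $\sum_i g_i = 0$, as in Exercise 6.7.5. The sample median of $\pi(g)$ is then compared with $\sqrt{n/\ln n}$. For Gaussian vectors one expects a ratio of roughly $1/\sqrt 2$, since $|g|_2 \approx \sqrt n$ and $|g|_\infty \approx \sqrt{2\ln n}$.

```python
import numpy as np


def pi(g):
    """pi(g) = |g|_2 / |g|_inf."""
    return np.linalg.norm(g, 2) / np.linalg.norm(g, np.inf)


def typical_pi(n, trials=50, seed=0):
    """Median of pi(g) over random g with uniformly distributed direction, sum g_i = 0."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        g = rng.standard_normal(n)  # Gaussian => direction uniform on the unit sphere
        g -= g.mean()               # restrict to the hyperplane sum g_i = 0
        vals.append(pi(g))
    return float(np.median(vals))


for n in (100, 10000):
    print(n, typical_pi(n), np.sqrt(n / np.log(n)))
```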
Note, however, that the impossibility of proving something does not necessarily imply the impossibility of understanding it. The "anticipated" complexity of the method (proportional to $\ln n$ rather than to $n$) seems to be quite similar to its empirical complexity; given the results of the above "analysis", one could hardly be too surprised by this phenomenon.
Chapter 7

The Primal-Dual potential reduction method

We have become acquainted with the very first of the potential reduction interior point methods, the method of Karmarkar. Theoretically, a disadvantage of the method is its not so good complexity bound: it is proportional to the parameter $\vartheta$ of the underlying barrier, not to the square root of this parameter, as in the case of the path-following method. There are, however, potential reduction methods with the same theoretical $O(\sqrt{\vartheta})$ complexity bound as in the path-following scheme; these methods combine the best known theoretical complexity with the practical advantages of the potential reduction algorithms. Today's lecture is devoted to one of these methods, the so-called Primal-Dual algorithm; the LP prototype of the construction is due to Todd and Ye.

7.1 The idea


The idea of the method is as follows. Consider a convex problem in the conic form

$$(\mathrm{P}):\quad \text{minimize } c^T x \ \text{ s.t. } x \in \{b + L\} \cap K$$

along with its conic dual

$$(\mathrm{D}):\quad \text{minimize } b^T s \ \text{ s.t. } s \in \{c + L^\perp\} \cap K^*,$$

where

• $K$ is a cone (closed, pointed, convex and with a nonempty interior) in $\mathbf{R}^n$ and

$$K^* = \{s \in \mathbf{R}^n \mid s^T x \ge 0 \ \forall x \in K\}$$

is the cone dual to $K$;

• $L$ is a linear subspace in $\mathbf{R}^n$, $L^\perp$ is its orthogonal complement, and $c$, $b$ are given vectors from $\mathbf{R}^n$: the primal objective and the primal translation vector, respectively.

From now on, we assume that

A: both primal and dual problems are strictly feasible, and we are given an initial strictly feasible primal-dual pair $(\hat x, \hat s)$ [i.e., a pair of strictly feasible solutions to the problems].


This assumption, by virtue of the Conic duality theorem (Lecture 5), implies that both the primal and the dual problem are solvable, and the sum of the optimal values in the problems is equal to $c^T b$:

$$P^* + D^* = c^T b. \tag{7.1}$$

Besides this, we know from Lecture 5 that for any pair $(x, s)$ of feasible solutions to the problems one has

$$\delta(x, s) \equiv c^T x + b^T s - c^T b = s^T x \ge 0. \tag{7.2}$$

Subtracting equality (7.1) from this identity, we come to the following conclusion:

(*): for any primal-dual feasible pair $(x, s)$, the duality gap $\delta(x, s)$ is nothing but the sum of inaccuracies, in terms of the corresponding objectives, of $x$ regarded as an approximate solution to the primal problem and $s$ regarded as an approximate solution to the dual one.

In particular, all we need is to generate somehow a sequence of primal-dual feasible pairs with the duality gap tending to zero.
Now, how can we force the duality gap to go to zero? To this end we shall use a certain potential; our first goal is to construct this potential.
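The identity in (7.2) uses only the affine constraints $x \in b + L$ and $s \in c + L^\perp$. The cone constraints are needed only for the sign $s^T x \ge 0$. The following sketch checks the identity with randomly generated $L$, $b$, $c$ (all hypothetical data); it does not check membership in $K$ or $K^*$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
B = rng.standard_normal((n, k))                   # columns span L (random example)
Q, _ = np.linalg.qr(B, mode="complete")
L_basis, Lperp_basis = Q[:, :k], Q[:, k:]         # orthonormal bases of L and L^perp
b = rng.standard_normal(n)                        # primal translation vector
c = rng.standard_normal(n)                        # primal objective


def gap(x, s):
    """delta(x, s) = c^T x + b^T s - c^T b, see (7.2)."""
    return c @ x + b @ s - c @ b


x = b + L_basis @ rng.standard_normal(k)          # a point of b + L
s = c + Lperp_basis @ rng.standard_normal(n - k)  # a point of c + L^perp
```

Expanding $(c + l^\perp)^T(b + l)$ with $l \in L$, $l^\perp \in L^\perp$ shows why `gap(x, s)` equals `s @ x`: the cross term $(l^\perp)^T l$ vanishes.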

7.2 Primal-dual potential


From now on we assume that

B: we know a $\vartheta$-logarithmically homogeneous self-concordant barrier $F$ for the primal cone $K$ along with its Legendre transformation

$$F^*(s) = \sup_{x \in \mathrm{int}\, K} [s^T x - F(x)]$$

("we know", as usual, means that given $x$, we can check whether $x \in \mathrm{Dom}\, F$ and, if it is the case, can compute $F(x)$, $F'(x)$, $F''(x)$, and similarly for $F^*$).
As we know from Lecture 5, $F^*$ is a $\vartheta$-logarithmically homogeneous self-concordant barrier for the cone $-K^*$ anti-dual to $K$, and, consequently, the function

$$F^+(s) = F^*(-s)$$

is a $\vartheta$-logarithmically homogeneous self-concordant barrier for the dual cone $K^*$ involved in the dual problem. In what follows I refer to $F$ as the primal barrier, and to $F^+$ as the dual barrier.
Now let us consider the following aggregate:

$$V_0(x, s) = F(x) + F^+(s) + \vartheta \ln(s^T x). \tag{7.3}$$

This function is well-defined on the direct product of the interiors of the primal and the dual cones, and, in particular, on the direct product

$$\mathrm{rint}\, K_p \times \mathrm{rint}\, K_d$$

of the relative interiors of the primal and dual feasible sets

$$K_p = \{b + L\} \cap K, \qquad K_d = \{c + L^\perp\} \cap K^*.$$

The function $V_0$ resembles the Karmarkar potential; indeed, when $s \in \mathrm{rint}\, K_d$ is fixed, this function, regarded as a function of primal feasible $x$, is, up to an additive constant, the Karmarkar potential of the primal problem, where one should replace the initial objective $c$ by the objective $s$¹.
Note that we know something about the aggregate $V_0$: Proposition 5.3.3 tells us that

(**) for any pair $(x, s) \in \mathrm{Dom}\, V_0 \equiv \mathrm{int}(K \times K^*)$, one has

$$V_0(x, s) \ge \vartheta \ln \vartheta - \vartheta, \tag{7.4}$$

the inequality being equality if and only if $ts + F'(x) = 0$ for some positive $t$.

Now comes the crucial step. Let us choose a positive $\mu$ and pass from the aggregate $V_0$ to the potential

$$V_\mu(x, s) = V_0(x, s) + \mu \ln(s^T x) \equiv F(x) + F^+(s) + (\vartheta + \mu) \ln(s^T x).$$

My claim is that this potential possesses the same fundamental property as the Karmarkar potential: when it is small (i.e., negative with large absolute value) at a strictly feasible primal-dual pair $(x, s)$, then the pair is comprised of good primal and dual approximate solutions.
The reason for this claim is clear: before we added to the aggregate $V_0$ the "penalty term" $\mu \ln(s^T x)$, the aggregate was bounded below, as stated by (7.4); therefore the only way for the potential to be small is to have a small (negative of large modulus) value of the penalty term, which, in turn, may happen only when the duality gap (which at a primal-dual feasible pair $(x, s)$ is exactly $s^T x$, see (7.2)) is close to zero.
The quantitative expression of this observation is as follows:

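Proposition 7.2.1 For any strictly feasible primal-dual pair $(x, s)$ one has

$$\delta(x, s) \le \Gamma \exp\left\{\frac{V_\mu(x, s)}{\mu}\right\}, \qquad \Gamma = \exp\{-\mu^{-1}\vartheta(\ln \vartheta - 1)\}. \tag{7.5}$$

The proof is immediate:

$$\ln \delta(x, s) = \ln(s^T x) = \frac{V_\mu(x, s) - V_0(x, s)}{\mu} \le \frac{V_\mu(x, s)}{\mu} - \mu^{-1}\vartheta(\ln \vartheta - 1),$$

where the inequality is due to (7.4).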

Thus, by forcing the potential to $-\infty$ along a sequence of strictly feasible primal-dual pairs, we force the sequence to converge to the primal-dual optimal set. Similarly to the method of Karmarkar, the essence of the matter is how to update a strictly feasible pair $(x, s)$ into another strictly feasible pair $(x^+, s^+)$ with a "significantly smaller" value of the potential. This is the issue we come to now.
¹ By the way, on the primal feasible set this replacement of the primal objective changes it only by an additive constant (this is an immediate consequence of the fact that $s$ is dual feasible).

7.3 The primal-dual updating


The question we address in this section is:

given a strictly feasible pair $(x, s)$, how to update it into a new strictly feasible pair $(x^+, s^+)$ in a way which ensures "significant" progress in the potential $V_\mu$?

It is natural to start by investigating the possibilities to reduce the potential by changing one of our two (primal and dual) variables, not both of them simultaneously. Let us look at our abilities to improve the potential by changing the primal variable.
The potential Vµ (y, v), regarded as a function of the primal variable, resembles the Karmarkar
potential, and it is natural to improve it as it was done in the method of Karmarkar. There is,
anyhow, important difference: the Karmarkar potential was constant along primal feasible rays,
and in order to improve it, we first pass from the ”unconvenient” fesible set Kp of the original
primal problem to a more convenient set G (see Lecture 6), which is in fact the projective image
of Kp . Now the potential is not constant along rays, and we should reproduce the Karmarkar
construction in the actual primal feasible set. Well, there is nothing difficult in it. Let us write
down the potential as the function of the primal variable:

$$ v(y) \equiv V_\mu(y,s) = F(y) + \zeta\ln s^T y + \mathrm{const}(s) : \mathrm{rint}\,K_p \to \mathbf{R}, $$

where
$$ \zeta = \vartheta + \mu, \qquad \mathrm{const}(s) = F^+(s). $$
Now, as in the method of Karmarkar, let us linearize the logarithmic term in $v(\cdot)$, i.e., form the function
$$ v_x(y) = F(y) + \zeta\frac{s^T y}{s^T x} + \mathrm{const}(x,s) : \mathrm{rint}\,K_p \to \mathbf{R}, \tag{7.6} $$
where, as is immediately seen,
$$ \mathrm{const}(x,s) = \mathrm{const}(s) + \zeta\ln s^T x - \zeta. $$

As in the Karmarkar situation, $v_x$ is an upper bound for $v$ (by concavity of the logarithm, $\ln s^T y \le \ln s^T x + \frac{s^T y}{s^T x} - 1$):

$$ v_x(y) \ge v(y),\ y\in\mathrm{rint}\,K_p; \qquad v_x(x) = v(x), \tag{7.7} $$

so that in order to update $x$ into a new strictly feasible primal solution $x^+$ with an improved value of the potential $v(\cdot)$, it suffices to improve the value of the upper bound $v_x(\cdot)$. Now, $v_x$ is the sum of a self-concordant barrier for the primal feasible set (namely, the restriction of $F$ onto this set) and a linear form, and therefore it is self-concordant on the relative interior $\mathrm{rint}\,K_p$ of the primal feasible set; consequently, to decrease the function we may use the damped Newton method. Thus, we come to the following
Rule 1. In order to update a given strictly feasible pair $(x,s)$ into a new strictly feasible pair $(x',s)$ with the same dual component and a better value of the potential $V_\mu$, act as follows:
1) Form the "partially linearized" reduced potential $v_x(y)$ according to (7.6);
2) Update $x$ into $x'$ by a damped Newton iteration applied to $v_x(\cdot)$, i.e.,
- compute the (reduced) Newton direction
$$ e_x = \mathrm{argmin}\left\{h^T\nabla_y v_x(x) + \frac12 h^T\nabla_y^2 v_x(x)h \;\middle|\; h\in L\right\} \tag{7.8} $$
and the (reduced) Newton decrement
$$ \omega = \sqrt{-e_x^T\nabla_y v_x(x)}; \tag{7.9} $$

- set
$$ x' = x + \frac{1}{1+\omega}e_x. $$

As we know from Lecture 2, the damped Newton step keeps the iterate within the domain of the function, so that $x'\in\mathrm{rint}\,K_p$, and decreases the function by at least $\rho(-\omega) \equiv \omega - \ln(1+\omega)$. This is the progress in $v_x$; from (7.7) it follows that the progress in the potential $v(\cdot)$, and consequently in $V_\mu$, is at least the progress in $v_x$. Thus, we come to the following conclusion:
I. Rule 1 transforms the initial strictly feasible primal-dual pair $(x,s)$ into a new strictly feasible primal-dual pair $(x',s)$, and the potential $V_\mu$ at the updated pair satisfies

$$ V_\mu(x,s) - V_\mu(x',s) \ge \omega - \ln(1+\omega), \tag{7.10} $$

$\omega$ being the reduced Newton decrement given by (7.8)-(7.9).
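The following sketch illustrates Rule 1 and the guarantee (7.10) in the LP setting, under the same assumptions as before ($F(x) = -\sum\ln x_i$, $\vartheta = n$), with the primal feasible plane taken to be $\{y : Ay = Ax\}$ for a hypothetical random matrix $A$, so that $L = \ker A$. Instead of solving (7.8) directly, it substitutes $h = \mathrm{Diag}(x)u$, which turns $F''(x)$ into the identity and reduces (7.8) to a Euclidean projection onto $\ker(A\,\mathrm{Diag}(x))$; this is an implementation choice of the sketch, not something prescribed by the text.

```python
import numpy as np

def lp_potential(x, s, mu):
    n = x.size
    return -np.log(x).sum() - np.log(s).sum() - n + (n + mu) * np.log(s @ x)

def reduced_newton(A, x, s, mu):
    """Reduced Newton direction e_x (7.8) and decrement omega (7.9) of v_x at x."""
    zeta = x.size + mu
    g = -1.0 / x + zeta * s / (s @ x)          # gradient of v_x at y = x
    B = A * x                                  # A Diag(x)
    Xg = x * g
    w, *_ = np.linalg.lstsq(B.T, Xg, rcond=None)
    u = -(Xg - B.T @ w)                        # minus the projection of Xg onto ker B
    return x * u, np.linalg.norm(u)            # e_x = Diag(x) u, omega = ||u||

def rule1(A, x, s, mu):
    """Damped Newton step x' = x + e_x / (1 + omega) on v_x; s is unchanged."""
    e, omega = reduced_newton(A, x, s, mu)
    return x + e / (1.0 + omega), omega

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 6))
x, s = rng.uniform(0.5, 2.0, 6), rng.uniform(0.5, 2.0, 6)
mu = np.sqrt(6)
x_new, omega = rule1(A, x, s, mu)
print(omega, lp_potential(x, s, mu) - lp_potential(x_new, s, mu))
```

The printed progress should never fall below $\omega - \ln(1+\omega)$, and $x'$ stays on the same primal plane.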


Now, in the method of Karmarkar we proceeded by proving that the reduced Newton decrement is not small. This is no longer the case: the quantity $\omega$ can be very close to zero or even equal to zero. What should we do in this unpleasant situation, where Rule 1 fails? Here again our experience with the method of Karmarkar gives the answer. Look: the potential

$$ V_\mu(y,s) = F(y) + F^+(s) + \zeta\ln s^T y, $$

regarded as a function of the strictly feasible primal solution $y$, is nothing but

$$ F(y) + F^+(s) + \zeta\ln(c^T y - [c^T b - b^T s]), $$

since for primal-dual feasible $(y,s)$ the product $s^T y$ is exactly the duality gap $c^T y + b^T s - c^T b$ (Lecture 5). The duality gap is always nonnegative, so that the quantity

$$ c^T b - b^T s $$

associated with a dual feasible $s$ is a lower bound for the primal optimal value. Thus, the potential $V_\mu$, regarded as a function of $y$, resembles the "local" potential used in the sliding objective version of the method of Karmarkar, namely, the Karmarkar potential with the primal optimal value replaced by its lower bound. Now, in the sliding objective version of the method of Karmarkar we also met the situation where the reduced Newton decrement was small, and, as we remember, in this situation we were able to update the lower bound for the primal optimal value and thus got the possibility to go ahead. This is more or less what we are going to do now: we shall see in a while that if $\omega$ turns out to be small, then it is possible to update the current strictly feasible dual solution $s$ into a new solution $s'$ of the same type and thereby to improve the potential "significantly".
To get an idea of how to update the dual solution, consider the case that is "worst" for Rule 1: the reduced Newton decrement $\omega$ is zero. What happens in this situation? The reduced Newton decrement is zero if and only if the gradient of $v_x$ at $x$ along the primal feasible plane is $0$, or, which is the same, if the gradient taken with respect to the whole primal space is orthogonal to $L$, i.e., if and only if
$$ F'(x) + \zeta\frac{s}{s^T x} \in L^\perp. \tag{7.11} $$
This is a very interesting relation. Indeed, let

$$ s^* \equiv -\frac{s^T x}{\zeta}F'(x). \tag{7.12} $$

The above inclusion says that $s - s^* \in L^\perp$, i.e., that $s^* \in s + L^\perp$; since $s \in c + L^\perp$, we come to the relation
$$ s^* \equiv -\frac{s^T x}{\zeta}F'(x) \in c + L^\perp. \tag{7.13} $$
The latter relation says that the vector $-F'(x)$ can be normalized, by multiplication by a positive constant, to give a vector $s^*$ from the dual feasible plane. On the other hand, $s^*$ belongs to the interior of the dual cone $K^*$, since $-F'(x)$ does (Proposition 5.3.3). Thus, in the case in question ($\omega = 0$), a proper normalization of the vector $-F'(x)$ gives us a new strictly feasible dual solution $s' \equiv s^*$. Now, what happens with the potential when we pass from $s$ to $s^*$ (without varying the primal solution $x$)? The answer is immediate:

$$ V_\mu(x,s) = V_0(x,s) + \mu\ln s^T x \ge \vartheta\ln\vartheta - \vartheta + \mu\ln s^T x; $$

$$ V_\mu(x,s^*) = V_0(x,s^*) + \mu\ln (s^*)^T x = \vartheta\ln\vartheta - \vartheta + \mu\ln (s^*)^T x $$

(indeed, we know from (**) that $V_0(y,u) \ge \vartheta\ln\vartheta - \vartheta$, and that this inequality is an equality when $u = -tF'(y)$, which is exactly the case for the pair $(x,s^*)$). Thus, the progress in the potential is at least the quantity
$$ \alpha = \mu[\ln s^T x - \ln (s^*)^T x] = \mu\left[\ln s^T x - \ln\left(\frac{s^T x}{\zeta}(-F'(x))^T x\right)\right] = \mu\ln\frac{\zeta}{(-F'(x))^T x} = \mu\ln\frac{\zeta}{\vartheta} = \mu\ln\left(1 + \frac{\mu}{\vartheta}\right) \tag{7.14} $$
(the second equality in the chain is (7.12), the fourth comes from the identity (5.5), see Lecture 5). Thus, we see that in the particular case $\omega = 0$ the updating
$$ (x,s) \mapsto \left(x,\ s^* = -\frac{s^T x}{\zeta}F'(x)\right) $$
results in a strictly feasible primal-dual pair and decreases the potential by at least the quantity $\mu\ln(1 + \mu/\vartheta)$.
We have seen what to do in the case of $\omega = 0$, when Rule 1 does not work at all. This is insufficient: we should also understand what to do when Rule 1 works, but works badly, i.e., when $\omega$ is small, although nonzero. But this is more or less clear: what is good for the limiting case $\omega = 0$ should also work when $\omega$ is small. Thus, we get the idea to use, in the case of small $\omega$, the updating of the dual solution given by (7.12). This updating, however, cannot be used directly, since in the case of positive $\omega$ it results in an $s^*$ which is infeasible for the dual problem. Indeed, dual feasibility of $s^*$ in the case of $\omega = 0$ was a consequence of two facts:
1. The inclusion $s^*\in\mathrm{int}\,K^*$, since $s^*$ is proportional, with a negative coefficient, to $F'(x)$, and all vectors of this type belong to $\mathrm{int}\,K^*$ (Proposition 5.3.3); the inclusion $s^*\in\mathrm{int}\,K^*$ is therefore completely independent of whether $\omega$ is large or small;
2. The inclusion $s^*\in c + L^\perp$. This inclusion came from (7.11), and it does use the hypothesis that $\omega = 0$ (in fact, it is equivalent to this hypothesis).
Thus, we face the difficulty that 2. does not remain valid when $\omega$ is positive, although small. If the only difficulty is that $s^*$ given by (7.12) does not belong to the dual feasible plane, we can correct $s^*$ by replacing it with a properly chosen projection $s'$ of $s^*$ onto the dual feasible plane. When $\omega = 0$, $s^*$ is in the dual feasible plane and in the interior of the cone $K^*$; by continuity, for small $\omega$ the point $s^*$ is close to the dual feasible plane, and the projection will be close to $s^*$ and therefore, hopefully, will still be in the interior of the dual cone (so that $s'$, which by construction is in the dual feasible plane, will be strictly dual feasible). Besides this, the updating $(x,s)\mapsto(x,s')$ should then result in "almost" the same progress in the potential as in the above case $\omega = 0$.
The outlined idea is exactly what we are going to use. The implementation of it is as follows.

Rule 2. In order to update a strictly feasible primal-dual pair $(x,s)$ into a new strictly feasible primal-dual pair $(x,s')$, act as follows. As in Rule 1, compute the reduced Newton direction $e_x$ and the reduced Newton decrement $\omega$, and set

$$ s' = -\frac{s^T x}{\zeta}[F'(x) + F''(x)e_x]. \tag{7.15} $$

Note that in the case of $\omega = 0$ (which is equivalent to $e_x = 0$), the updating (7.15) becomes exactly the updating (7.12). As can easily be seen², $s'$ is the projection of $s^*$ onto the dual feasible plane in the metric given by the Hessian $(F^+)''(s^*)$ of the dual barrier at the point $s^*$; in particular, $s'$ always belongs to the dual feasible plane, although not necessarily to the interior of the dual cone $K^*$. The latter inclusion, however, certainly holds if $\omega < 1$, so that in this case $s'$ is strictly dual feasible. Moreover, in the case of small $\omega$ the updating given by Rule 2 decreases the potential "significantly", so that Rule 2 certainly works well when Rule 1 does not, and by choosing the better of these two rules we come to an updating that always works well.
The exact formulation of the above claim is as follows:
II. (i) The point $s'$ given by (7.15) always belongs to the dual feasible plane.
(ii) The point $s'$ is in the interior of the dual cone $K^*$ (and, consequently, is strictly dual feasible) whenever $\omega < 1$, and in this case one has

$$ V_\mu(x,s) - V_\mu(x,s') \ge \mu\ln\frac{\vartheta+\mu}{\vartheta+\omega\sqrt{\vartheta}} - \rho(\omega), \qquad \rho(r) = -\ln(1-r) - r, \tag{7.16} $$
and the progress in the potential is therefore positive for all small enough positive $\omega$.
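Continuing the LP sketch (same assumptions: $F(x) = -\sum\ln x_i$, $L = \ker A$ for a hypothetical matrix $A$), Rule 2 reads $s' = \frac{s^T x}{\zeta}(x^{-1} - x^{-2}e_x)$ componentwise, since $F'(x) = -x^{-1}$ and $F''(x) = \mathrm{Diag}(x^{-2})$. The data below are chosen with all products $x_i s_i$ equal, which gives $\omega \le \mu/\sqrt{n} = 1/2$, so claim II.(ii) applies; the checks confirm $s' - s\in L^\perp = \mathrm{range}(A^T)$, $s' > 0$, and the bound (7.16).

```python
import numpy as np

def lp_potential(x, s, mu):
    n = x.size
    return -np.log(x).sum() - np.log(s).sum() - n + (n + mu) * np.log(s @ x)

def reduced_newton(A, x, s, mu):
    """e_x and omega from (7.8)-(7.9), computed in the scaled variable h = Diag(x) u."""
    g = -1.0 / x + (x.size + mu) * s / (s @ x)
    B, Xg = A * x, x * g
    w, *_ = np.linalg.lstsq(B.T, Xg, rcond=None)
    u = -(Xg - B.T @ w)
    return x * u, np.linalg.norm(u)

def rule2(A, x, s, mu):
    """s' = -(s^T x / zeta) [F'(x) + F''(x) e_x], i.e. (7.15) for F = -sum ln x."""
    e, omega = reduced_newton(A, x, s, mu)
    zeta = x.size + mu
    return (s @ x) / zeta * (1.0 / x - e / x**2), omega

def rho(r):
    return -np.log1p(-r) - r

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((2, n))
x = rng.uniform(0.5, 2.0, n)
s = 1.0 / x                      # all products x_i s_i equal, hence omega <= mu / sqrt(n)
mu = 0.5 * np.sqrt(n)
s_new, omega = rule2(A, x, s, mu)

gain = lp_potential(x, s, mu) - lp_potential(x, s_new, mu)
bound = mu * np.log((n + mu) / (n + omega * np.sqrt(n))) - rho(omega)
print(omega, gain, bound)
```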
Proof.
1°. By definition, $e_x$ is the minimizer of the quadratic form

$$ Q(h) = h^T[F'(x) + \gamma s] + \frac12 h^T F''(x)h, \qquad \gamma = \frac{\zeta}{s^T x} \equiv \frac{\vartheta+\mu}{s^T x}, \tag{7.17} $$
over $h\in L$; note that
$$ h^T[F'(x) + \gamma s] = h^T\nabla_y v_x(x), \quad h\in L. $$
Writing down the optimality condition, we come to

$$ F''(x)e_x + [F'(x) + \gamma s] \equiv \xi \in L^\perp; \tag{7.18} $$

multiplying both sides by $e_x\in L$, we come to

$$ \omega^2 \equiv -e_x^T\nabla_y v_x(x) = -e_x^T[F'(x) + \gamma s] = e_x^T F''(x)e_x. \tag{7.19} $$


² We skip the verification, since we do not use this fact; those interested can carry out the corresponding computation.

2°. From (7.18) and (7.15) it follows that

$$ s' \equiv -\frac{1}{\gamma}[F'(x) + F''(x)e_x] = s - \gamma^{-1}\xi \in s + L^\perp, \tag{7.20} $$

and since $s\in c + L^\perp$ (recall that $s$ is dual feasible), we conclude that $s'\in c + L^\perp$, as claimed in (i).
Besides this,
$$ s^* = -\frac{1}{\gamma}F'(x) \tag{7.21} $$
(see (7.12), (7.17)), so that the identity $\equiv$ in (7.20) says that
$$ s' = s^* - \frac{1}{\gamma}F''(x)e_x. \tag{7.22} $$

3°. Since $F^+(u) = F^*(-u)$ is a $\vartheta$-logarithmically homogeneous self-concordant barrier for $K^*$ (Proposition 5.3.3), we have

$$ (F^+)'(tu) = t^{-1}(F^+)'(u), \quad u\in\mathrm{int}\,K^*,\ t > 0 $$

(see (5.3), Lecture 5); differentiating in $u$, we come to

$$ (F^+)''(tu) = t^{-2}(F^+)''(u). $$

Substituting $u = -F'(x)$ and $t = 1/\gamma$ and taking into account the relation between $F^+$ and the Legendre transformation $F^*$ of the barrier $F$, we come to

$$ (F^+)''(s^*) = \gamma^2(F^+)''(-F'(x)) = \gamma^2(F^*)''(F'(x)). $$

But $F^*$ is the Legendre transformation of $F$, and therefore (see (L.3), Lecture 2)

$$ (F^*)''(F'(x)) = [F''(x)]^{-1}; $$

thus, we come to
$$ (F^+)''(s^*) = \gamma^2[F''(x)]^{-1}. \tag{7.23} $$
Combining this observation with relation (7.22), we come to

$$ [s' - s^*]^T(F^+)''(s^*)[s' - s^*] = [F''(x)e_x]^T[F''(x)]^{-1}[F''(x)e_x] = e_x^T F''(x)e_x = \omega^2 $$

(the concluding equality is given by (7.19)). Thus, we come to the following conclusion:
IIa. The distance $|s' - s^*|_{F^+,s^*}$ between $s^*$ and $s'$ in the Euclidean metric given by the Hessian $(F^+)''(s^*)$ of the dual barrier $F^+$ at the point $s^*$ is equal to the reduced Newton decrement $\omega$.
In particular, if this decrement is $< 1$, then $s'$ belongs to the open unit Dikin ellipsoid of the self-concordant barrier $F^+$ centered at $s^*$ and, consequently, $s'$ belongs to the domain of the barrier (I., Lecture 2), i.e., to $\mathrm{int}\,K^*$. Since we already know that $s'$ always belongs to the dual feasible plane (see 2°), $s'$ is strictly dual feasible whenever $\omega < 1$.
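A quick numerical check of (7.23) and IIa in the same hypothetical LP setting ($F(x) = -\sum\ln x_i$, $F^+(s) = -\sum\ln s_i - n$, $L = \ker A$ for a random $A$): there $(F^+)''(s) = \mathrm{Diag}(s^{-2})$ and $s^* = (\gamma x)^{-1}$ componentwise, and the local distance from $s^*$ to $s'$ should coincide with $\omega$ up to rounding.

```python
import numpy as np

def reduced_newton(A, x, s, mu):
    """e_x and omega from (7.8)-(7.9) for F(x) = -sum ln x, L = ker A."""
    g = -1.0 / x + (x.size + mu) * s / (s @ x)
    B, Xg = A * x, x * g
    w, *_ = np.linalg.lstsq(B.T, Xg, rcond=None)
    u = -(Xg - B.T @ w)
    return x * u, np.linalg.norm(u)

rng = np.random.default_rng(3)
n = 7
A = rng.standard_normal((3, n))
x, s = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
mu = np.sqrt(n)
gamma = (n + mu) / (s @ x)                          # (7.17)
e, omega = reduced_newton(A, x, s, mu)

s_star = 1.0 / (gamma * x)                          # (7.21): -F'(x) / gamma
s_prime = s_star - (e / x**2) / gamma               # (7.22)
hess_dual = np.diag(1.0 / s_star**2)                # (F^+)''(s*)
hess_primal_inv = np.diag(x**2)                     # [F''(x)]^{-1}
d = s_prime - s_star
dikin = np.sqrt(d @ hess_dual @ d)                  # |s' - s*|_{F^+, s*}
print(dikin, omega)
```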
We have proved everything required in (i)-(ii), except inequality (7.16) related to the progress in the potential. This is the issue we come to now, and from now on we assume that $\omega < 1$, as stated in (7.16).
4°. Thus, let us look at the progress in the potential

$$ \alpha = V_\mu(x,s) - V_\mu(x,s') = V_0(x,s) - V_0(x,s') - \mu\ln\frac{x^T s'}{x^T s}. \tag{7.24} $$

We have
$$ V_0(x,s') = \underbrace{\left[F(x) + F^+(s^*) + \vartheta\ln x^T s^*\right]}_{[\cdot]_1} + \underbrace{\left[F^+(s') - F^+(s^*) + \vartheta\ln\frac{x^T s'}{x^T s^*}\right]}_{[\cdot]_2}; \tag{7.25} $$
since $s^* = -tF'(x)$ with some positive $t$, (**) tells us that

$$ [\cdot]_1 = \vartheta\ln\vartheta - \vartheta. \tag{7.26} $$

Now $s'$, as we know from IIa., is in the open unit Dikin ellipsoid of $F^+$ centered at $s^*$, and the corresponding local distance is equal to $\omega$; therefore, applying the upper bound (2.4) from Lecture 2 (recall that $F^+$ is self-concordant), we come to

$$ F^+(s') - F^+(s^*) \le [s' - s^*]^T(F^+)'(s^*) + \rho(\omega), \qquad \rho(r) = -\ln(1-r) - r. \tag{7.27} $$

We have $s^* = -\gamma^{-1}F'(x)$, and since $F^+$ is $\vartheta$-logarithmically homogeneous,

$$ (F^+)'(s^*) = \gamma(F^+)'(-F'(x)) $$

((5.3), Lecture 5); since $F^+(u) = F^*(-u)$, $F^*$ being the Legendre transformation of $F$, we have

$$ (F^+)'(-F'(x)) = -(F^*)'(F'(x)), $$

and the latter quantity is $-x$ ((L.2), Lecture 2). Thus,

$$ (F^+)'(s^*) = -\gamma x. $$

Now, by (7.22) we have $s' - s^* = -\gamma^{-1}F''(x)e_x$, so that

$$ [s' - s^*]^T(F^+)'(s^*) = x^T F''(x)e_x. $$

From this observation and (7.27) we conclude that

$$ [\cdot]_2 \le x^T F''(x)e_x + \rho(\omega) + \vartheta\ln\frac{x^T s'}{x^T s^*}, $$
which combined with (7.25) and (7.26) results in

$$ V_0(x,s') \le \vartheta\ln\vartheta - \vartheta + x^T F''(x)e_x + \rho(\omega) + \vartheta\ln\frac{x^T s'}{x^T s^*}. \tag{7.28} $$
On the other hand, we know from (**) that $V_0(x,s) \ge \vartheta\ln\vartheta - \vartheta$; combining this inequality, (7.24) and (7.28), we come to

$$ \alpha \ge -x^T F''(x)e_x - \rho(\omega) - \vartheta\ln\frac{x^T s'}{x^T s^*} - \mu\ln\frac{x^T s'}{x^T s}. \tag{7.29} $$
5°. Now let us find appropriate representations for the inner products involved in (7.29). To this end let us set
$$ \pi = -x^T F''(x)e_x. \tag{7.30} $$
In view of (7.22) we have
$$ x^T s' = x^T s^* - \frac{1}{\gamma}x^T F''(x)e_x = x^T s^* + \frac{\pi}{\gamma} $$

and, besides this,

$$ x^T s^* = -\frac{1}{\gamma}x^T F'(x) = \frac{\vartheta}{\gamma} $$
(see (7.21) and (5.5), Lecture 5). We come to

$$ x^T s' = \frac{\vartheta+\pi}{\gamma}, \qquad x^T s^* = \frac{\vartheta}{\gamma}, $$

whence
$$ \frac{x^T s'}{x^T s^*} = 1 + \frac{\pi}{\vartheta} \tag{7.31} $$
and
$$ \frac{x^T s'}{x^T s} = \frac{\vartheta+\pi}{\gamma x^T s} = \frac{\vartheta+\pi}{\vartheta+\mu} \tag{7.32} $$
(the concluding equality follows from the definition of $\gamma$, see (7.17)).
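Substituting (7.31) and (7.32) into (7.29), we come to the following bound on the progress in the potential:
$$ \alpha \ge \pi - \rho(\omega) - \vartheta\ln\left(1 + \frac{\pi}{\vartheta}\right) - \mu\ln\frac{\vartheta+\pi}{\vartheta+\mu}. \tag{7.33} $$
Taking into account that $\ln(1+z) \le z$, we derive from this inequality that

$$ \alpha \ge \mu\ln\frac{\vartheta+\mu}{\vartheta+\pi} - \rho(\omega). \tag{7.34} $$
Our last task is to evaluate $\pi$, which is immediate:
$$ |\pi| = |x^T F''(x)e_x| \le \sqrt{x^T F''(x)x}\,\sqrt{e_x^T F''(x)e_x} \le \omega\sqrt{\vartheta} $$

(we have used the Cauchy-Schwarz inequality, (7.19) and identity (5.5), Lecture 5). With this estimate we derive from (7.34) that
$$ \alpha \ge \mu\ln\frac{\vartheta+\mu}{\vartheta+\omega\sqrt{\vartheta}} - \rho(\omega), \tag{7.35} $$
as claimed in II.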

7.4 Overall complexity analysis


We have presented two rules, Rule 1 and Rule 2, for updating a strictly feasible primal-dual pair $(x,s)$ into a new pair of the same type. The first rule is always productive, although its progress in the potential is small when the reduced Newton decrement $\omega$ is small; the second rule, on the contrary, is certainly productive when $\omega$ is small, although for large $\omega$ it may result in an infeasible $s'$. And, of course, what we should do is to apply both rules and choose the better of the results. Thus, we come to the
Primal-Dual Potential Reduction method $PD(\mu)$:
form the sequence of strictly feasible primal-dual pairs $(x_i,s_i)$, starting with the initial pair $(x_0 = \hat x, s_0 = \hat s)$ (see A), as follows:

1) given $(x_{i-1},s_{i-1})$, apply Rules 1 and 2 to the pair to get the updated pairs $(x'_{i-1},s_{i-1})$ and $(x_{i-1},s'_{i-1})$, respectively.
2) Check whether $s'_{i-1}$ is strictly dual feasible. If it is not, discard the pair $(x_{i-1},s'_{i-1})$ and set $(x_i^+,s_i^+) = (x'_{i-1},s_{i-1})$; otherwise choose as $(x_i^+,s_i^+)$ the better (the one with the smaller value of the potential $V_\mu$) of the two pairs given by 1).
3) The pair $(x_i^+,s_i^+)$ is certainly a strictly feasible primal-dual pair, and the value of the potential $V_\mu$ at this pair is less than at the pair $(x_{i-1},s_{i-1})$. Choose as $(x_i,s_i)$ an arbitrary strictly feasible primal-dual pair such that the potential $V_\mu$ at the pair is not greater than at $(x_i^+,s_i^+)$ (e.g., set $x_i = x_i^+$, $s_i = s_i^+$) and loop.
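Putting Rules 1 and 2 together, a minimal sketch of $PD(\mu)$ for LP might look as follows. It assumes the same LP setting as the earlier sketches: primal plane $\{y : Ay = Ax_0\}$ for a hypothetical random matrix $A$, and dual plane $s_0 + \mathrm{range}(A^T)$ (i.e., $c = s_0 + A^T y_0$ for some $y_0$), so that $s^T x$ is the duality gap. It takes $(x_i,s_i) = (x_i^+,s_i^+)$ in step 3) and stops once the gap drops below a tolerance; the tolerance, the iteration cap and $\kappa = 1$ are arbitrary choices.

```python
import numpy as np

def potential(x, s, mu):
    n = x.size
    return -np.log(x).sum() - np.log(s).sum() - n + (n + mu) * np.log(s @ x)

def reduced_newton(A, x, s, mu):
    """e_x and omega of (7.8)-(7.9) for F(x) = -sum ln x and L = ker A."""
    g = -1.0 / x + (x.size + mu) * s / (s @ x)
    B, Xg = A * x, x * g
    w, *_ = np.linalg.lstsq(B.T, Xg, rcond=None)
    u = -(Xg - B.T @ w)
    return x * u, np.linalg.norm(u)

def pd_step(A, x, s, mu):
    """Steps 1)-3): apply Rules 1 and 2 and keep the pair with the smaller potential."""
    e, omega = reduced_newton(A, x, s, mu)
    candidates = [(x + e / (1.0 + omega), s)]                   # Rule 1
    s2 = (s @ x) / (x.size + mu) * (1.0 / x - e / x**2)         # Rule 2, (7.15)
    if np.all(s2 > 0):                                           # strictly dual feasible?
        candidates.append((x, s2))
    return min(candidates, key=lambda pair: potential(pair[0], pair[1], mu))

def pd_method(A, x, s, kappa=1.0, eps=1e-6, max_iter=2000):
    mu = kappa * np.sqrt(x.size)                                 # mu = kappa sqrt(theta)
    history = [potential(x, s, mu)]
    for _ in range(max_iter):
        if s @ x <= eps:
            break
        x, s = pd_step(A, x, s, mu)
        history.append(potential(x, s, mu))
    return x, s, history

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 8))
x0, s0 = rng.uniform(0.5, 2.0, 8), rng.uniform(0.5, 2.0, 8)
x, s, hist = pd_method(A, x0, s0)
print(len(hist) - 1, s @ x)
```

The substitution $h = \mathrm{Diag}(x)u$ inside `reduced_newton` is used here because it keeps the linear algebra well conditioned as some $x_i$ approach zero; any correct way of solving (7.8) would do.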

The method, as stated, involves the parameter $\mu$, which in principle can be chosen as an arbitrary positive real. Let us find out what a reasonable choice of the parameter is. To this end, note that what we are interested in is not the progress $p$ in the potential $V_\mu$ per step, but the quantity $\beta = p/\mu$, since it is this ratio that governs the exponent in the accuracy estimate (7.5). Now, at a step it may happen that we are in the situation $\omega = O(1)$, say, $\omega = 1$, so that the only productive rule is Rule 1 and the progress in the potential, according to I., is of order of 1, which results in $\beta = O(1/\mu)$. On the other hand, we may come to the situation $\omega = 0$, when the only productive rule is Rule 2, and the progress in the potential is $p = \mu\ln(1 + \mu/\vartheta)$, see (7.16), i.e., $\beta = \ln(1 + \mu/\vartheta)$. A reasonable choice of $\mu$ should balance the values of $\beta$ for these two cases (for $\mu \ll \vartheta$ we have $\ln(1 + \mu/\vartheta) \approx \mu/\vartheta$, and $1/\mu \approx \mu/\vartheta$ means $\mu \approx \sqrt{\vartheta}$), which leads to

$$ \mu = \kappa\sqrt{\vartheta}, $$

$\kappa$ being of order of 1. The complexity of the primal-dual method for this "optimal" choice of $\mu$ is given by the following

Theorem 7.4.1 Assume that the primal-dual pair of conic problems (P), (D) (which satisfies assumption A) is solved by the primal-dual potential reduction method associated with $\vartheta$-logarithmically homogeneous self-concordant primal and dual barriers $F$ and $F^+$, and that the parameter $\mu$ of the method is chosen according to
$$ \mu = \kappa\sqrt{\vartheta}, $$
with certain $\kappa > 0$. Then the method generates a sequence of strictly feasible primal-dual pairs $(x_i,s_i)$, and the duality gap $\delta(x_i,s_i)$ (equal to the sum of the residuals, in terms of the corresponding objectives, of the components of the pair) admits the following upper bound:

$$ \delta(x_i,s_i) \le \mathcal{V}\exp\left\{-\frac{V_\mu(\hat x,\hat s) - V_\mu(x_i,s_i)}{\kappa\sqrt{\vartheta}}\right\} \le \mathcal{V}\exp\left\{-\frac{i\,\Omega(\kappa)}{\kappa\sqrt{\vartheta}}\right\}, \tag{7.36} $$
where
$$ \Omega(\kappa) = \min\left\{1 - \ln 2;\ \inf_{0\le\omega<1}\max\{\omega - \ln(1+\omega);\ \kappa\ln(1+\kappa) - (\kappa-1)\omega + \ln(1-\omega)\}\right\} \tag{7.37} $$

is a positive continuous function of $\kappa > 0$; the data-dependent scale factor $\mathcal{V}$ is given by

$$ \mathcal{V} = \delta(\hat x,\hat s)\exp\left\{\frac{V_0(\hat x,\hat s) - [\vartheta\ln\vartheta - \vartheta]}{\kappa\sqrt{\vartheta}}\right\}. \tag{7.38} $$
In particular, the Newton complexity (# of iterations of the method) of finding $\varepsilon$-solutions to the primal and the dual problems does not exceed the quantity
$$ N_{\mathrm{PrDl}}(\varepsilon) \le O_\kappa(1)\sqrt{\vartheta}\ln\left(\frac{\mathcal{V}}{\varepsilon} + 1\right) + 1, \tag{7.39} $$
with the constant factor $O_\kappa(1)$ depending on $\kappa$ only.

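The proof is immediate. Indeed, we know from Proposition 7.2.1 that

$$ \delta(x_i,s_i) \le \Gamma\exp\left\{\frac{V_\mu(x_i,s_i)}{\mu}\right\} = \left[\Gamma\exp\left\{\frac{V_\mu(\hat x,\hat s)}{\mu}\right\}\right]\exp\left\{-\frac{V_\mu(\hat x,\hat s) - V_\mu(x_i,s_i)}{\mu}\right\}, $$

which, after substituting the value of $\Gamma$ from (7.5) and using $V_\mu(\hat x,\hat s) = V_0(\hat x,\hat s) + \mu\ln\delta(\hat x,\hat s)$, results in the first inequality in (7.36), with $\mathcal{V}$ given by (7.38).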
To prove the second inequality in (7.36), it suffices to demonstrate that the progress in the potential $V_\mu$ at a step of the method is at least the quantity $\Omega(\kappa)$ given by (7.37). To this end note that, by construction, this progress is at least the progress given by each of Rules 1 and 2 (when Rule 2 does not result in a strictly feasible dual solution, the corresponding progress is $-\infty$). Let $\omega$ be the reduced Newton decrement at the step in question. If $\omega \ge 1$, then the progress related to Rule 1 is at least $1 - \ln 2$, see I., which clearly is $\ge \Omega(\kappa)$. Now consider the case $\omega < 1$. Here both Rules 1 and 2 are productive, and the corresponding reductions in the potential are at least, respectively,

$$ p_1 = \omega - \ln(1+\omega) $$

(see I.) and

$$ p_2 = \mu\ln\frac{\vartheta+\mu}{\vartheta+\omega\sqrt{\vartheta}} + \ln(1-\omega) + \omega = \kappa\sqrt{\vartheta}\ln\frac{1+\kappa/\sqrt{\vartheta}}{1+\omega/\sqrt{\vartheta}} + \ln(1-\omega) + \omega $$

(see II.). We clearly have


$$ p_2 = \kappa\sqrt{\vartheta}\ln(1+\kappa/\sqrt{\vartheta}) - \kappa\sqrt{\vartheta}\ln(1+\omega/\sqrt{\vartheta}) + \ln(1-\omega) + \omega $$
$$ \ge \kappa\sqrt{\vartheta}\ln(1+\kappa/\sqrt{\vartheta}) - \kappa\omega + \ln(1-\omega) + \omega \qquad [\text{since } \ln(1+z)\le z] $$
$$ \ge \kappa\ln(1+\kappa) - \kappa\omega + \ln(1-\omega) + \omega \qquad [\text{since, as is immediately seen, } z\ln(1+a/z)\ge\ln(1+a) \text{ whenever } z\ge 1,\ a>0], $$

and we come to the inequality

$$ \max\{p_1,p_2\} \ge \max\{\omega - \ln(1+\omega);\ \kappa\ln(1+\kappa) - (\kappa-1)\omega + \ln(1-\omega)\}, $$

so that the progress in the potential in the case of $\omega < 1$ is at least the quantity given by (7.37). The claim that the right hand side of (7.37) is a positive continuous function of $\kappa > 0$ is evidently true. The complexity bound (7.39) is an immediate consequence of (7.36).
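To get a feel for the size of $\Omega(\kappa)$, one can evaluate (7.37) numerically. The sketch below approximates the infimum over $0\le\omega<1$ by a minimum over a fine grid (so it can only slightly overestimate the true infimum); the grid size is an arbitrary choice.

```python
import numpy as np

def Omega_of_kappa(kappa, grid=200_000):
    """Grid approximation of Omega(kappa) from (7.37)."""
    w = np.linspace(0.0, 1.0, grid, endpoint=False)                     # 0 <= omega < 1
    p1 = w - np.log1p(w)                                                  # Rule 1 bound, see I.
    p2 = kappa * np.log1p(kappa) - (kappa - 1.0) * w + np.log1p(-w)       # Rule 2 bound
    return min(1.0 - np.log(2.0), np.min(np.maximum(p1, p2)))

for kappa in (0.5, 1.0, 2.0):
    print(kappa, Omega_of_kappa(kappa))
```

For $\kappa = 1$ the value is roughly $0.08$, i.e., each step is guaranteed to reduce $V_\mu$ by about $0.08$.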

7.5 Large step strategy


To conclude the presentation of the primal-dual method, let me briefly outline how one could exploit the advantages of the potential reduction nature of the method. Due to this nature, the only thing we are interested in is "significant" progress in the potential at a step, as was the case in the method of Karmarkar. In the latter method, the simplest way to get better progress than the one given by the "default" theoretical step was to perform a linesearch in the direction of this default step and to find the best, in terms of the potential, point in this direction. What is the analogue of linesearch for the primal-dual method? It is as follows. Applying Rule 1, we get a certain primal feasible direction $x' - x$, which we can extend in the trivial way to a primal-dual feasible direction (i.e., a direction from $L\times L^\perp$) $d_1 = (x' - x, 0)$; shifting the current strictly feasible pair $(x,s)$ in this direction, we certainly get a strictly feasible pair with a better (or, in the case of $\omega = 0$, the same) value of the potential. Similarly, applying Rule 2, we get another primal-dual feasible direction $d_2 = (0, s' - s)$; shifting the current pair in this direction, we always get a pair from the primal-dual feasible plane $\mathcal{L} = \{b + L\}\times\{c + L^\perp\}$, although not necessarily belonging to the interior of the primal-dual cone $\mathcal{K} = K\times K^*$. What we always get is a certain 2-dimensional plane $D$ (passing through $(x,s)$ parallel to the directions $d_1, d_2$) which is contained in the primal-dual feasible plane $\mathcal{L}$, and one or two (depending on whether Rule 2 was or was not productive) strictly feasible primal-dual pairs, the candidates for the role of the next iterate; what we know from our theoretical analysis is that the value of the potential at one of the candidate pairs is "significantly" (at least by the quantity $\Omega(\kappa)$) less than the value of the potential at the previous iterate $(x,s)$. Given this situation, a reasonable policy to get additional progress in the potential at the step is 2-dimensional minimization of the potential over the intersection of the plane $D$ with the interior of the cone $K\times K^*$. The potential is not convex, and it would be difficult to ensure a prescribed quality of its minimization even over the 2-dimensional plane $D$, but this is not the point where we must get a good minimizer; for our purposes it suffices to perform a small number, fixed once and for all, of steps of any relaxation method for smooth minimization (the potential is smooth), running the method from the best of our candidate pairs. In the case of LP, as in some other interesting cases, this 2-dimensional search can be implemented in a way which almost does not increase the total computational effort per step³, and at the same time accelerates the method dramatically.

³ This total effort is normally dominated by the cost of computing the reduced Newton direction $e_x$.

7.6 Exercises: Primal-Dual method


The subject of the forthcoming problems is the implementation of the primal-dual method. We shall start with some remarks related to the general situation and then consider a particular problem coming from Control.
When speaking about implementation, i.e., about algorithmic issues, we should, of course, somehow fix the way the data are represented; for a conic problem this is, basically, the question of how the feasible subspace $L$ is described. In most applications known to me the situation is as follows. The plane $b + L\subset\mathbf{R}^n$ is defined as the image of a certain affine subspace

$$ \{\xi\in\mathbf{R}^l \mid P(\xi - p) = 0\} $$

($\xi$ is the vector of the design variables) under a given affine mapping

$$ x = \mathcal{A}(\xi) \equiv A\xi + b, $$

$A$ being an $n\times l$ and $P$ a $k\times l$ matrix; usually one can assume that $A$ is of full column rank, i.e., that its columns are linearly independent, and that $P$ is of full row rank, i.e., the rows of $P$ are linearly independent; from now on we make this regularity assumption. As far as the objective is concerned, it is a linear form $\chi^T\xi$ of the design vector.
Thus, the typical form of the primal problem in applications is

$$ (\mathrm{P}):\quad \text{minimize } \chi^T\xi \ \text{ s.t. } \xi\in\mathbf{R}^l,\ P(\xi - p) = 0,\ x \equiv A\xi + b\in K, $$

$K$ being a pointed closed convex cone with a nonempty interior in $\mathbf{R}^n$. This is exactly the setting presented in Section 5.4.4.
As we know from Exercise 5.4.11, the problem dual to (P) is

$$ (\mathrm{D}):\quad \text{minimize } \beta^T s \ \text{ s.t. } A^T s = \chi + P^T r,\ s\in K^*, $$

where the design vector consists of $s\in\mathbf{R}^n$ and $r\in\mathbf{R}^k$, $K^*$ is the cone dual to $K$, and $\beta = \mathcal{A}(p)$.
In what follows $F$ denotes the primal barrier, a $\vartheta$-logarithmically homogeneous self-concordant barrier for $K$, and $F^+$ denotes the dual barrier (see Lecture 7).
Let us look at how the primal-dual method could be implemented when the primal-dual pair of problems is in the form (P)-(D). We should answer the following basic questions:

• how to represent the primal and the dual solutions;

• how to perform the updating $(x_i,s_i)\mapsto(x_{i+1},s_{i+1})$.

As far as the first of these issues is concerned, the most natural decision is
to represent $x$'s of the form $\mathcal{A}(\xi)$ (note that all our primal feasible $x$'s are of this type) by storing both $x$ (as an $n$-dimensional vector) and $\xi$ (as an $l$-dimensional one);
to represent $s$'s and $r$'s "as they are", as $n$- and $k$-dimensional vectors, respectively.
Now, what can be said about the main issue, namely, how to implement the updating of strictly feasible primal-dual pairs? In what follows we speak about the basic version of the method only, not discussing the large step strategy from Section 7.5, since the implementation of the latter strategy (and even the possibility to implement it) heavily depends on the specific analytic structure of the problem.

Looking at the description of the primal-dual method, we see that the only nontrivial issue is how to compute the Newton direction
$$ e_x = \mathrm{argmin}\left\{h^T g + \frac12 h^T F''(x)h \;\middle|\; h\in L\right\}, $$
where $(x,s)$ is the current iterate to be updated and $g = F'(x) + \frac{\vartheta+\mu}{s^T x}s$. Since $L$ is the image of the linear space
$$ L' = \{\zeta\in\mathbf{R}^l \mid P\zeta = 0\} $$
under the mapping $\zeta\mapsto A\zeta$, we have
$$ e_x = A\eta_x $$
for a certain $\eta_x\in L'$, and the problem is how to compute $\eta_x$.

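Exercise 7.6.1 # Prove that $\eta_x$ is uniquely defined by the linear system of equations
$$ \begin{pmatrix} Q & P^T\\ P & 0\end{pmatrix}\begin{pmatrix}\eta\\ u\end{pmatrix} = \begin{pmatrix}-q\\ 0\end{pmatrix}, \tag{7.40} $$

where
$$ Q = A^T F''(x)A, \qquad q = A^T g, \tag{7.41} $$
so that $\eta_x$ is given by the relation
$$ \eta_x = -Q^{-1}\left[A^T g - P^T[PQ^{-1}P^T]^{-1}PQ^{-1}A^T g\right]; \tag{7.42} $$

in the particular case when $P$ is absent (formally, $k = 0$), $\eta_x$ is given by

$$ \eta_x = -Q^{-1}A^T g. \tag{7.43} $$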

Note that normally $k$ is a small integer, so that the main effort in computing $\eta_x$ is to assemble and to invert the matrix $Q$. Usually this is the main part of the overall effort per iteration, since other actions, like computing $F(x)$, $F'(x)$, $F''(x)$, are relatively cheap.
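The two expressions in Exercise 7.6.1 can be compared numerically. The sketch below assumes, purely for illustration, the LP barrier $F(x) = -\sum\ln x_i$ (so $F''(x) = \mathrm{Diag}(x^{-2})$) and random data $A$, $P$, $g$; it solves (7.40) as a single linear system and evaluates (7.42) through linear solves with $Q$ rather than an explicit inverse.

```python
import numpy as np

def eta_kkt(A, H, P, g):
    """Solve the saddle-point system (7.40) with Q = A^T H A, q = A^T g (7.41)."""
    Q, q = A.T @ H @ A, A.T @ g
    k, l = P.shape
    K = np.block([[Q, P.T], [P, np.zeros((k, k))]])
    sol = np.linalg.solve(K, np.concatenate([-q, np.zeros(k)]))
    return sol[:l]

def eta_formula(A, H, P, g):
    """Evaluate (7.42); Q^{-1} is applied through linear solves."""
    Q, q = A.T @ H @ A, A.T @ g
    Qinv_q = np.linalg.solve(Q, q)
    Qinv_Pt = np.linalg.solve(Q, P.T)
    u = np.linalg.solve(P @ Qinv_Pt, P @ Qinv_q)   # [P Q^{-1} P^T]^{-1} P Q^{-1} A^T g
    return -np.linalg.solve(Q, q - P.T @ u)

rng = np.random.default_rng(5)
n, l, k = 7, 4, 2
A = rng.standard_normal((n, l))            # full column rank with probability 1
P = rng.standard_normal((k, l))            # full row rank with probability 1
x = rng.uniform(0.5, 2.0, n)
H = np.diag(1.0 / x**2)                    # F''(x) for F(x) = -sum ln x
g = rng.standard_normal(n)
eta1, eta2 = eta_kkt(A, H, P, g), eta_formula(A, H, P, g)
print(np.max(np.abs(eta1 - eta2)), np.max(np.abs(P @ eta1)))
```

Both printed numbers should be at rounding level: the two formulas agree, and $\eta_x$ lies in $L'$.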

7.6.1 Example: Lyapunov Stability Analysis


The goal of the forthcoming exercises is to develop the (principal elements of the) algorithmic scheme of the primal-dual method as applied to the following interesting and important problem coming from Control theory:
(C) given a "polytopic" linear time-varying $\nu$-dimensional system

$$ v'(t) = V(t)v(t), \qquad V(t)\in\mathrm{Conv}\{V_1,\dots,V_m\}, $$

find a quadratic Lyapunov function $v^T L v$ which demonstrates stability of the system.


Let us start by explaining what we are asked to do. The system in question is a time-varying linear dynamic system with uncertainty: $v(t)$ is a $\nu$-dimensional vector-function of time $t$ (the trajectory), and $V(t)$ is the time-varying matrix of the system. Note that we do not know in advance what this matrix is; all we know is that, for every $t$, the matrix $V(t)$ belongs to the convex hull of a given finite set of matrices $V_i$, $i = 1,\dots,m$.
Now, the system in question is called stable if $v(t)\to 0$ as $t\to\infty$ for all trajectories. A good sufficient condition for stability is the existence of a positive definite quadratic Lyapunov function $v^T L v$ for the system, i.e., a positive definite symmetric $\nu\times\nu$ matrix $L$ such that the derivative in $t$ of the quantity $v^T(t)Lv(t)$ is strictly negative for every $t$ and every trajectory $v(t)$ with nonzero $v(t)$. Since, in view of $v'(t) = V(t)v(t)$, this derivative equals $2[V(t)v(t)]^T L v(t)$, the latter requirement is equivalent to

$$ [V(t)v(t)]^T L v(t) < 0 \quad\text{whenever } v(t)\ne 0 \text{ and } V(t)\in\mathrm{Conv}\{V_1,\dots,V_m\}, $$

or, which is the same (since for a given $t$, $v(t)$ can be an arbitrary vector and $V(t)$ can be an arbitrary matrix from $\mathrm{Conv}\{V_1,\dots,V_m\}$), to the requirement
$$ v^T V^T L v = \frac12 v^T[V^T L + LV]v < 0, \quad v\ne 0,\ V\in\mathrm{Conv}\{V_1,\dots,V_m\}. $$
In other words, $L$ should be a positive definite symmetric matrix such that all matrices of the form $V^T L + LV$ associated with $V\in\mathrm{Conv}\{V_1,\dots,V_m\}$ are negative definite; a matrix $L$ with these properties will be called appropriate.
Our first (and extremely simple) task is to characterize the appropriate matrices.

Exercise 7.6.2 # Prove that a symmetric $\nu\times\nu$ matrix $L$ is appropriate if and only if it is positive definite and the matrices

$$ V_i^T L + LV_i, \quad i = 1,\dots,m, $$

are negative definite.
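The criterion of Exercise 7.6.2 is easy to test numerically via eigenvalues. The sketch below uses hypothetical $2\times 2$ vertex matrices chosen only for illustration; it also evaluates one random convex combination, which inherits negative definiteness from the vertices because $V\mapsto V^T L + LV$ is linear.

```python
import numpy as np

def is_appropriate(L, vertices):
    """Criterion of Exercise 7.6.2: L > 0 and V_i^T L + L V_i < 0 for every vertex V_i."""
    if np.linalg.eigvalsh(L).min() <= 0:
        return False
    return all(np.linalg.eigvalsh(V.T @ L + L @ V).max() < 0 for V in vertices)

V1 = -np.eye(2)
V2 = np.array([[-1.0, 1.0], [0.0, -1.0]])
L = np.eye(2)
print(is_appropriate(L, [V1, V2]))                     # True

lam = np.random.default_rng(6).uniform()
V = lam * V1 + (1.0 - lam) * V2                        # a point of Conv{V1, V2}
print(np.linalg.eigvalsh(V.T @ L + L @ V).max() < 0)   # True

V3 = np.array([[0.0, 1.0], [-1.0, 0.0]])               # V3^T + V3 = 0: not negative definite
print(is_appropriate(L, [V1, V3]))                     # False
```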

We see that to find an appropriate matrix (and thereby to demonstrate stability of (C) via a quadratic Lyapunov function) is the same as to find a solution to the following system of strict matrix inequalities
$$ L > 0; \qquad V_i^T L + LV_i < 0,\ i = 1,\dots,m, \tag{7.44} $$
where inequalities with symmetric matrices are understood as positive definiteness (for strict inequalities) or semidefiniteness (for non-strict ones) of the corresponding differences.
We can immediately pose our problem as a conic problem with trivial objective; to this end it suffices to treat $L$ as the design variable (which varies over the space $\mathbf{S}^\nu$ of symmetric $\nu\times\nu$ matrices) and introduce the linear mapping

$$ \mathcal{B}(L) = \mathrm{Diag}\{L;\ -V_1^T L - LV_1;\ \dots;\ -V_m^T L - LV_m\} $$

from this space into the space $(\mathbf{S}^\nu)^{m+1}$, the direct product of $m+1$ copies of the space $\mathbf{S}^\nu$, so that $(\mathbf{S}^\nu)^{m+1}$ is the space of symmetric block-diagonal $[(m+1)\nu]\times[(m+1)\nu]$ matrices with $m+1$ diagonal blocks of size $\nu\times\nu$ each. Now, $(\mathbf{S}^\nu)^{m+1}$ contains the cone $K$ of positive semidefinite matrices of the required block-diagonal structure; it is clearly seen that $L$ is appropriate if and only if $\mathcal{B}(L)\in\mathrm{int}\,K$, so that the set of appropriate matrices is the same as the set of strictly feasible solutions to the conic problem

$$ \text{minimize } 0 \ \text{ s.t. } \mathcal{B}(L)\in K $$

with trivial objective.


Thus, the problem in question is reduced to a conic problem involving the cone of positive
semidefinite matrices of certain block-diagonal structure; the problems of this type are called
semidefinite programs or optimization under LMI’s (Linear Matrix Inequality constraints).
Of course, we could try to solve the problem by an interior point potential reduction method
known to us, say, by the method of Karmarkar or by the primal-dual method; we immdeiately
discover, anyhow, that the technique developed so far cannot be applied to our problem - indeed,
in all methods known to us it was required at least to know in advance a strictly feasible solution
7.6. EXERCISES: PRIMAL-DUAL METHOD 125

to the problem, and in our particular case such a solution is exactly what should be finally found.
There is, anyhow, a straightforward way to avoid the difficulty. First of all, our system (7.44) is
homogeneous in L; therefore we can normalize L to be ≤ I (I stands for the unit matrix of the
context-determined size) and pass from the initial system to the new one

$$L > 0;\quad L \le I;\quad V_i^T L + L V_i < 0,\; i = 1, \dots, m. \qquad (7.45)$$

Now let us extend our design vector L by one variable t, so that the new design vector becomes

$$\xi = (t, L) \in E \equiv \mathbf{R} \times S^\nu,$$

and consider the semidefinite program

$$\text{minimize } t \;\text{ s.t. }\; L + tI \ge 0;\;\; I - L \ge 0;\;\; tI - V_i^T L - L V_i \ge 0,\; i = 1, \dots, m. \qquad (7.46)$$

Clearly, solving system (7.45) is the same as finding a feasible solution to optimization problem
(7.46) with a negative value of the objective; on the other hand, in (7.46) we have no difficulty
with an initial strictly feasible solution: we may set $L = \frac{1}{2} I$ and then choose t large enough to
make all remaining inequalities strict.
It is clear that (7.46) is of the form (P) with the data given by the affine mapping

$$A(\xi) \equiv A(t, L) = \mathrm{Diag}\{L + tI;\; I - L;\; tI - V_1^T L - L V_1;\; \dots;\; tI - V_m^T L - L V_m\} : E \to E,$$

E being the space $(S^\nu)^{m+2}$ of block-diagonal symmetric matrices with m + 2 diagonal blocks of
size ν × ν each; the cone K in our case is the cone of positive semidefinite matrices from E,
and the matrix P is absent, so that our problem is

(Pr) minimize t s.t. A(t, L) ∈ K.

Now let us form the method.


Exercise 7.6.3 #+ Prove that
1) the cone K is self-dual;
2) the function
$$F(x) = -\ln \mathrm{Det}\, x$$
is a $(m+2)\nu$-logarithmically homogeneous self-concordant barrier for the cone K;
3) the dual barrier $F^+$ associated with the barrier F is, up to an additive constant, the barrier
F itself:
$$F^+(s) = -\ln \mathrm{Det}\, s - (m+2)\nu.$$
Thus, we are equipped with the primal and the dual barriers required to solve (Pr) via the
primal-dual method. Now let us look at what the method is. First of all, what is the problem (Dl)
dual to (Pr)?
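Exercise 7.6.4 # Prove that when the primal problem (P) is specified to be (Pr), the dual
problem (D) becomes

$$(\mathrm{Dl})\qquad \text{minimize } \mathrm{Tr}\{s_0\} \text{ under the choice of } m+2 \text{ symmetric } \nu \times \nu \text{ matrices } s_{-1}, \dots, s_m \text{ s.t.}$$
$$s_{-1} - s_0 - \sum_{i=1}^m [V_i s_i + s_i V_i^T] = 0;$$
$$\mathrm{Tr}\{s_{-1}\} + \sum_{i=1}^m \mathrm{Tr}\{s_i\} = 1.$$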

It is time now to think of the initialization. Could we in fact point out strictly feasible
solutions $\hat x$ and $\hat s$ to the primal and to the dual problems? As we have just mentioned, as far as
the primal problem (Pr) is concerned, there is nothing to do: we can set
$$\hat x = A(\hat t, \hat L),$$
where $\hat L < I$, e.g., $\hat L = \frac{1}{2} I$, and $\hat t$ is large enough to ensure that $\hat L + \hat t I > 0$ and
$\hat t I > V_i^T \hat L + \hat L V_i$, $i = 1, \dots, m$.
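A small numerical sketch of this initialization follows. It assumes NumPy; the matrices $V_1, V_2$ and the helper names `A_blocks`, `initial_point` and `margin` are hypothetical, chosen only for illustration. It takes $\hat L = \frac{1}{2} I$ and sets $\hat t$ just above every lower bound on t imposed by the strict versions of the inequalities in (7.46).

```python
import numpy as np

def A_blocks(t, L, Vs):
    """Diagonal blocks of A(t, L) in (7.46): L + tI, I - L, tI - V_i^T L - L V_i."""
    I = np.eye(L.shape[0])
    return [L + t * I, I - L] + [t * I - V.T @ L - L @ V for V in Vs]

def initial_point(Vs, margin=1.0):
    """Strictly feasible (t_hat, L_hat) for (7.46) with L_hat = I/2 (hypothetical helper)."""
    nu = Vs[0].shape[0]
    L_hat = 0.5 * np.eye(nu)
    # L_hat + tI > 0 needs t > -lambda_min(L_hat);
    # tI > V_i^T L_hat + L_hat V_i needs t > lambda_max(V_i^T L_hat + L_hat V_i).
    bounds = [-np.linalg.eigvalsh(L_hat).min()]
    bounds += [np.linalg.eigvalsh(V.T @ L_hat + L_hat @ V).max() for V in Vs]
    return max(bounds) + margin, L_hat

# Hypothetical data (not from the text)
Vs = [np.array([[-1.0, 2.0], [0.0, -3.0]]),
      np.array([[-2.0, 0.5], [0.5, -1.0]])]
t_hat, L_hat = initial_point(Vs)
smallest = min(np.linalg.eigvalsh(B).min() for B in A_blocks(t_hat, L_hat, Vs))
print(t_hat, smallest)  # every block of A(t_hat, L_hat) is positive definite
```

Any positive `margin` works; it only keeps the inequalities strict with some slack.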
Exercise 7.6.5 # Point out a strictly feasible solution $\hat s$ to (Dl).
It remains to understand what the basic operations at a step of the method are.
Exercise 7.6.6 # Verify that in the case in question the quantities involved in the description
of the primal-dual method can be specified as follows:
1) The quantities related to F are given by
$$F'(x) = -x^{-1};\qquad F''(x)h = x^{-1} h x^{-1};$$
2) The matrix Q involved in the system for finding $\eta_x$ (see Exercise 7.6.1), taken with
respect to a certain orthonormal basis $\{e_\alpha\}_{\alpha=1,\dots,N}$ in the space E, is given by
$$Q_{\alpha\beta} = \mathrm{Tr}\{A_\alpha x^{-1} A_\beta x^{-1}\},\qquad A_\alpha = A e_\alpha.$$
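The formulas of Exercise 7.6.6 lend themselves to a quick numerical sanity check. The sketch below (NumPy assumed; the random matrices and the three images `A_imgs` standing in for $A e_\alpha$ are hypothetical test data) compares $F'(x) = -x^{-1}$ and $F''(x)h = x^{-1}hx^{-1}$ with central finite differences of $F(x) = -\ln \mathrm{Det}\, x$, and assembles a small matrix Q by the formula $Q_{\alpha\beta} = \mathrm{Tr}\{A_\alpha x^{-1} A_\beta x^{-1}\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # size of the test matrices (illustrative)

def F(x):
    """Barrier F(x) = -ln Det x; +inf outside the positive definite cone."""
    sign, logdet = np.linalg.slogdet(x)
    return -logdet if sign > 0 else np.inf

def sym(M):
    return 0.5 * (M + M.T)

# Random test data: a positive definite point x and a symmetric direction h
G = rng.standard_normal((n, n))
x = G @ G.T + n * np.eye(n)
h = sym(rng.standard_normal((n, n)))
x_inv = np.linalg.inv(x)

e = 1e-4
# DF(x)[h] = Tr{F'(x) h} with F'(x) = -x^{-1}
first_fd = (F(x + e * h) - F(x - e * h)) / (2 * e)
first_exact = np.trace(-x_inv @ h)
# D^2F(x)[h, h] = Tr{h F''(x)h} with F''(x)h = x^{-1} h x^{-1}
second_fd = (F(x + e * h) - 2 * F(x) + F(x - e * h)) / e**2
second_exact = np.trace(h @ x_inv @ h @ x_inv)

# Q_ab = Tr{A_a x^{-1} A_b x^{-1}} for hypothetical images A_a = A e_a of basis vectors
A_imgs = [sym(rng.standard_normal((n, n))) for _ in range(3)]
Q = np.array([[np.trace(Aa @ x_inv @ Ab @ x_inv) for Ab in A_imgs] for Aa in A_imgs])
# A diagonal entry of Q is the second directional derivative of F along A_a
q00_fd = (F(x + e * A_imgs[0]) - 2 * F(x) + F(x - e * A_imgs[0])) / e**2

print(first_fd - first_exact, second_fd - second_exact, q00_fd - Q[0, 0])
print(np.linalg.eigvalsh(Q))  # Q is a Gram matrix, hence positive semidefinite
```

This naive assembly forms a full matrix product for every entry of Q, which is exactly the kind of cost the questions below ask you to think about.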
Think about the algorithmic implementation of the primal-dual method and, in particular,
about the following issues:
• What is the dimension N of the "design space" E? What is the dimension M of the
"image space" E?
• How would you choose a "natural" orthonormal basis in E?
• Is it necessary/reasonable to store F''(x) as an M × M square array? How would you assemble
the matrix Q? What is the arithmetic cost of the assembling?
• Is it actually necessary to invert Q explicitly? Which method of Linear Algebra would you
choose to solve system (7.40)?
• What is the arithmetic cost of a step in the basic version of the primal-dual method?
Where do the dominating expenses come from?
• Are there ways to implement a large step strategy at a relatively low cost? How would you
do it?
• When would you terminate the computations? How could you recognize that the optimal
value in the problem is positive, so that you are unable to find a quadratic Lyapunov
function which proves stability? Is it possible that, running the method, you will never
be able either to present an appropriate L or to conclude that it does not exist?
The last exercise is as follows:
Exercise 7.6.7 #∗ Is it reasonable to replace (Pr) by the "less redundant" problem
$$(\mathrm{Pr}')\qquad \text{minimize } t \;\text{ s.t. }\; L \ge I;\;\; tI - V_i^T L - L V_i \ge 0,\; i = 1, \dots, m$$
(here we normalize L in (7.44) by L ≥ I and, as in (Pr), add the "slack" variable t to
make the problem "evidently feasible")?
Chapter 8

Long-Step Path-Following Methods

By now we are acquainted with three particular interior point algorithms, namely, the
short-step path-following method and two potential reduction algorithms. As we know,
the main advantage of the potential reduction scheme is not of theoretical origin (in fact one
of the potential reduction routines, the method of Karmarkar, is even worse theoretically than
the path-following algorithm), but the possibility of implementing "long step" tactics. Recently it
became clear that such a possibility also exists within the path-following scheme; the goal
of this lecture is to present the "long step" version of the path-following method.

8.1 The predictor-corrector scheme


Recall that in the path-following scheme (Lecture 4) we were interested in the problem

$$\text{minimize } c^T x \;\text{ s.t. }\; x \in G, \qquad (8.1)$$

G being a closed and bounded convex domain in $\mathbf{R}^n$. In order to solve the problem, we take a
ϑ-self-concordant barrier F for the feasible domain G and trace the path

$$x^*(t) = \mathop{\mathrm{argmin}}_{x \in \mathrm{int}\, G} F_t(x),\qquad F_t(x) = t c^T x + F(x), \qquad (8.2)$$

as the penalty parameter t tends to infinity. More specifically, we generate a sequence of pairs
$(t_i, x_i)$ κ-close to the path, i.e., satisfying the predicate
$$\{t > 0\}\ \&\ \{x \in \mathrm{int}\, G\}\ \&\ \Big\{\lambda(F_t, x) \equiv \sqrt{[\nabla_x F_t(x)]^T [\nabla_x^2 F_t(x)]^{-1} \nabla_x F_t(x)} \le \kappa\Big\}, \qquad (8.3)$$

the path tolerance κ < 1 being the parameter of the method. The policy of tracing the path
in the basic scheme of the method was very simple: in order to update $(t, x) \equiv (t_{i-1}, x_{i-1})$ into
$(t_+, x_+) = (t_i, x_i)$, we first increased the value of the penalty by a prescribed ratio, i.e.,
set
$$t_+ = t + dt,\qquad dt = \frac{\gamma}{\sqrt{\vartheta}}\, t, \qquad (8.4)$$
and then applied to the new function $F_{t_+}(\cdot)$ the damped Newton method in order to update x
into $x_+$:
$$y^{l+1} = y^l - \frac{1}{1 + \lambda(F_{t_+}, y^l)} [\nabla_x^2 F(y^l)]^{-1} \nabla_x F_{t_+}(y^l); \qquad (8.5)$$
we initialized this recurrence by setting $y^0 = x$, terminated it when closeness to the path
was restored, i.e., when $\lambda(F_{t_+}, y^l)$ turned out to be ≤ κ, and took the corresponding $y^l$ as $x_+$.
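As an illustration of the Newton decrement (8.3) and the damped Newton recurrence (8.5), here is a minimal sketch for the Linear Programming barrier $F(x) = -\sum_i \ln(b_i - a_i^T x)$, assuming NumPy. The instance (the box $[-1, 1]^2$ written as $Ax \le b$, $c = (1, 1)^T$, t = 1, κ = 0.1) and the function names are hypothetical. Since the damped step has local length λ/(1 + λ) < 1, the iterates stay strictly feasible.

```python
import numpy as np

def barrier_derivs(x, A, b):
    """Gradient and Hessian of the LP barrier F(x) = -sum_i ln(b_i - a_i^T x)."""
    slack = b - A @ x
    if np.any(slack <= 0):
        raise ValueError("x is not strictly feasible")
    g = A.T @ (1.0 / slack)
    H = A.T @ ((1.0 / slack**2)[:, None] * A)
    return g, H

def newton_decrement(t, x, c, A, b):
    """lambda(F_t, x) from (8.3) and the Newton direction [F''(x)]^{-1} grad F_t(x)."""
    g, H = barrier_derivs(x, A, b)
    grad = t * c + g          # the linear term t c^T x does not change the Hessian
    direction = np.linalg.solve(H, grad)
    return np.sqrt(grad @ direction), direction

def damped_newton(t, y, c, A, b, kappa=0.1, max_iter=100):
    """Recurrence (8.5), run until lambda(F_t, y) <= kappa."""
    for _ in range(max_iter):
        lam, direction = newton_decrement(t, y, c, A, b)
        if lam <= kappa:
            return y, lam
        y = y - direction / (1.0 + lam)
    raise RuntimeError("damped Newton did not reach the tolerance")

# Hypothetical instance: G = [-1, 1]^2, c = (1, 1), t = 1
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])
y, lam = damped_newton(1.0, np.zeros(2), c, A, b)
print(y, lam)  # each coordinate of x*(1) equals 1 - sqrt(2) for this instance
```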


Looking at the scheme, we immediately see at least two weak points: first, we use a
once-and-for-all fixed penalty rate and do not try to use larger dt's; second, when applying the damped
Newton method to the function $F_{t_+}$, we start the recurrence at $y^0 = x$; why don't we use a
better forecast of our target point $x^*(t + dt)$? Let us start with discussing this second point.
The path $x^*(\cdot)$ is smooth (at least twice continuously differentiable), as is immediately
seen from the Implicit Function Theorem applied to the equation

$$tc + F'(x) = 0 \qquad (8.6)$$

which defines the path. Given a tight approximation x to the point $x^*(t)$ of the path, we could
try to use the first-order prediction

$$x_f(dt) = x + x'\,dt$$

of our target point $x^*(t + dt)$; here $x'$ is some approximation of the derivative $\frac{d}{dt}x^*(\cdot)$ at the point
t. The simplest way to get this approximation is to note that what we are finally interested in
is to solve with respect to y the equation

$$(t + dt)c + F'(y) = 0;$$

a good idea is to linearize the left hand side at y = x and to use, as the forecast of $x^*(t + dt)$,
the solution to the linearized equation. The linearized equation is

$$(t + dt)c + F'(x) + F''(x)[y - x] = 0,$$

and we come to
$$dx(dt) \equiv y - x = -[F''(x)]^{-1} \nabla_x F_{t+dt}(x). \qquad (8.7)$$
Thus, it is reasonable to start the damped Newton method with the forecast

$$x_f(dt) \equiv x + dx(dt) = x - [F''(x)]^{-1} \nabla_x F_{t+dt}(x). \qquad (8.8)$$

Note that in fact we do not get anything significantly new: $x_f(dt)$ is simply the Newton (not
the damped Newton) iterate of x with respect to the function $F_{t_+}(\cdot)$; nevertheless, this is not
exactly the same as the initial implementation. The actual challenge is, of course, to get rid of
the once-and-for-all fixed penalty rate. To see what could be done here, let us write down the
generic scheme we have come to:
Predictor-Corrector Updating scheme:
in order to update a given pair (t, x) which is κ-close to the path $x^*(\cdot)$ into a new pair $(t_+, x_+)$ of the
same type, act as follows

• Predictor step:
1) form the primal search line

$$P = \{X(dt) = (t + dt,\ x + dx(dt)) \mid dt \in \mathbf{R}\}, \qquad (8.9)$$

dx(dt) being given by (8.7);

2) choose a stepsize δt > 0 and form the forecast

$$t_+ = t + \delta t,\qquad x_f = x + dx(\delta t); \qquad (8.10)$$



• Corrector step:
3) starting with $y^0 = x_f$, run the damped Newton method (8.5) until $\lambda(F_{t_+}, y^l)$ becomes
≤ κ; when it happens, set $x_+ = y^l$, thus completing the updating $(t, x) \mapsto (t_+, x_+)$.
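One complete update of the Predictor-Corrector Updating scheme can be sketched as follows (NumPy assumed). The LP instance is the same hypothetical box $[-1, 1]^2$ with $c = (1, 1)^T$; the stepsize `dt` is simply passed in as an argument, since choosing it well is precisely the question addressed next. For this instance the starting point $x^*(1)$ and the target $x^*(1.5)$ are known in closed form ($1 - \sqrt{2}$ and $(2 - \sqrt{13})/3$ per coordinate), which makes the result easy to check.

```python
import numpy as np

def derivs(x, A, b):
    """Gradient and Hessian of F(x) = -sum ln(b - A x), or None outside int G."""
    slack = b - A @ x
    if np.any(slack <= 0):
        return None
    return A.T @ (1.0 / slack), A.T @ ((1.0 / slack**2)[:, None] * A)

def predictor_corrector_step(t, x, dt, c, A, b, kappa=0.1, max_newton=50):
    """Update (t, x) -> (t_plus, x_plus); also return the number of corrector steps."""
    g, H = derivs(x, A, b)
    t_plus = t + dt
    # Predictor (8.7), (8.10): x_f = x - [F''(x)]^{-1} grad F_{t+dt}(x)
    y = x - np.linalg.solve(H, t_plus * c + g)
    if derivs(y, A, b) is None:
        raise ValueError("forecast is not strictly feasible; try a smaller dt")
    # Corrector: damped Newton (8.5) on F_{t_plus}, started at x_f
    for k in range(max_newton):
        g, H = derivs(y, A, b)
        grad = t_plus * c + g
        direction = np.linalg.solve(H, grad)
        lam = np.sqrt(grad @ direction)
        if lam <= kappa:
            return t_plus, y, k
        y = y - direction / (1.0 + lam)
    raise RuntimeError("corrector did not converge")

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])
x0 = np.full(2, 1.0 - np.sqrt(2.0))      # exactly x*(1) for this instance
t1, x1, k = predictor_corrector_step(1.0, x0, 0.5, c, A, b)
print(t1, x1, k)
```

Because the forecast is the full Newton step for $F_{t_+}$, the corrector usually has little left to do when dt is moderate.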
Now let us look at which stepsizes δt are acceptable for us. Of course, there is an immediate
requirement that $x_f = x + dx(\delta t)$ should be strictly feasible; otherwise we simply will be unable
to start the damped Newton method with $x_f$. There is, however, a more severe restriction.
Remember that the complexity estimate for the method in question heavily depended on the
fact that the "default" stepsize (8.4) results in a once-and-for-all fixed (depending only on the penalty
rate γ and the path tolerance κ) Newton complexity of the corrector step. If we wish to
preserve the complexity bounds, and we do wish to preserve them, we should take care to keep the
Newton complexity of the corrector step fixed. Recall that our basic results on the damped Newton
method as applied to the self-concordant function $F_{t_+}(\cdot)$ (X., Lecture 2) say that the number of
Newton iterations of the method, started at a point $y^0 \in \mathrm{int}\, G$ and run until the relation
$\lambda(F_{t_+}, y^l) \le \kappa$ becomes true, is bounded from above by the quantity
$$O(1)\Big[F_{t_+}(y^0) - \min_{y \in \mathrm{int}\, G} F_{t_+}(y) + \ln\Big(1 + \ln\frac{1}{\kappa}\Big)\Big],$$
O(1) being an appropriate absolute constant. We see that in order to bound from above the
Newton complexity of the corrector step it suffices to bound from above the residual

$$V(t_+, x_f) \equiv F_{t_+}(x_f) - \min_{y \in \mathrm{int}\, G} F_{t_+}(y),$$

i.e., to choose the stepsize δt in a way which ensures that

$$V(t + \delta t, x_f(\delta t)) \le \bar\kappa, \qquad (8.11)$$

where $\bar\kappa$ is a once-and-for-all fixed constant, the parameter of the method additional to the path
tolerance κ. The problem, of course, is how to ensure (8.11). If it were easy to compute
the residual at a given pair $(t_+, x_f)$, we could apply a linesearch in the stepsize δt in order to
choose the largest stepsize compatible with a prescribed upper bound on the residual. Given a
candidate stepsize δt, we normally have no problems with "cheap" computation of $t_+$, $x_f$ and
the quantity $F_{t_+}(x_f)$ (usually the cost of computing the value of the barrier is much less than
our natural "complexity unit", the arithmetic cost of a Newton step); the difficulty, however,
is that the residual involves not only the value of $F_{t_+}$ at the forecast, but also the minimum
value of $F_{t_+}(\cdot)$, which is unknown to us. What we are about to do is to derive certain duality-based and
computationally cheap lower bounds for the latter minimum value, thus obtaining "computable"
upper bounds for the residual.

8.2 Dual bounds and Dual search line


From now on, let us make the following Structural assumption on the barrier in question:
Q : the barrier F is of the form

$$F(x) = \Phi(\pi x + p), \qquad (8.12)$$

where Φ is a ϑ-self-concordant nondegenerate barrier for a certain closed convex domain $G^+ \subset \mathbf{R}^m$
with known Legendre transformation $\Phi^*$, and $x \mapsto \pi x + p$ is an affine mapping from $\mathbf{R}^n$ into $\mathbf{R}^m$
with the image intersecting $\mathrm{int}\, G^+$, so that G is the inverse image of $G^+$ under the mapping
$x \mapsto \pi x + p$.

Note that under Q the function F indeed is a ϑ-self-concordant barrier for G, see Proposition 3.1.1.(i).
Note that the essence of the Structural assumption is that we know the Legendre transformation
of Φ (otherwise there would be no assumption at all: we simply could set Φ ≡ F). This
assumption indeed is satisfied in many important cases, e.g., in Linear Programming, where G
is a polytope given by linear inequalities $a_i^T x \le b_i$, $i = 1, \dots, m$, and
$$F(x) = -\sum_{i=1}^m \ln(b_i - a_i^T x);$$
here
$$G^+ = \mathbf{R}^m_+,\qquad \Phi(u) = -\sum_{i=1}^m \ln u_i$$
and
$$(\pi x + p)_i = b_i - a_i^T x,\quad i = 1, \dots, m;$$
the Legendre transformation of Φ, as is immediately seen, is
$$\Phi^*(s) = \Phi(-s) - m,\qquad s \in \mathrm{int}\, \mathbf{R}^m_-.$$
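The closed form can be checked by brute force: the supremum defining $\Phi^*(s)$ splits into m one-dimensional problems $\max_u [s_i u + \ln u]$, each attained at $u = -1/s_i$. The sketch below (NumPy assumed; the grid and the test vector s are arbitrary choices) compares a grid search with $\Phi(-s) - m$.

```python
import numpy as np

def Phi(u):
    """Phi(u) = -sum_i ln u_i on the positive orthant."""
    return -np.sum(np.log(u))

def Phi_star_closed(s):
    """Closed form Phi*(s) = Phi(-s) - m, valid for s < 0."""
    return Phi(-s) - s.size

def Phi_star_grid(s, grid=np.linspace(1e-3, 50.0, 200001)):
    """Phi*(s) = sup_u [s^T u - Phi(u)]; the sup splits into m one-dimensional ones."""
    return sum(np.max(si * grid + np.log(grid)) for si in s)

s = np.array([-1.0, -0.5, -2.0])
print(Phi_star_closed(s), Phi_star_grid(s))
```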

Later on we shall discuss other important cases where the assumption is valid.
Now let us make the following simple and crucial observation:

Proposition 8.2.1 Let a pair $(\tau, s) \in \mathbf{R}_+ \times \mathrm{Dom}\,\Phi^*$ satisfy the linear homogeneous equation

$$\tau c + \pi^T s = 0. \qquad (8.13)$$

Then the quantity

$$f_s(\tau) = p^T s - \Phi^*(s) \qquad (8.14)$$
is a lower bound for the quantity
$$f_*(\tau) = \min_{y \in \mathrm{int}\, G} F_\tau(y)$$

and, consequently, the quantity

$$V_s(\tau, y) = F_\tau(y) - f_s(\tau) \equiv \tau c^T y + F(y) + \Phi^*(s) - p^T s \qquad (8.15)$$

is an upper bound for the residual

$$V(\tau, y) = F_\tau(y) - \min F_\tau(\cdot).$$

Proof. As we know from VII., Lecture 2, the Legendre transformation of $\Phi^*$ is exactly Φ.
Consequently,

$$\Phi(\pi y + p) = \sup_{v \in \mathrm{Dom}\,\Phi^*} \big[[\pi y + p]^T v - \Phi^*(v)\big] \ge [\pi y + p]^T s - \Phi^*(s),$$

whence
$$F_\tau(y) \equiv \tau c^T y + F(y) \equiv \tau c^T y + \Phi(\pi y + p) \ge \tau c^T y + [\pi y + p]^T s - \Phi^*(s) = [\tau c + \pi^T s]^T y + p^T s - \Phi^*(s) = p^T s - \Phi^*(s)$$
(the concluding equality follows from (8.13)).
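Proposition 8.2.1 is easy to test on the Linear Programming example above. In the sketch below (NumPy assumed; the box $G = [-1, 1]^2$, τ = 2 and the random s are hypothetical test data), we pick a dual pair first and then choose c so that (8.13) holds; $F_\tau$ then stays above the bound $f_s(\tau)$ at every sampled interior point.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical LP instance: G = [-1, 1]^2, so u = pi x + p = b - A x with pi = -A, p = b
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

def Phi_star(s):
    """Phi*(s) = Phi(-s) - m for Phi(u) = -sum ln u_i, s < 0."""
    return -np.sum(np.log(-s)) - s.size

# Pick tau > 0 and s < 0 first, then define c so that (8.13) holds:
# tau c + pi^T s = tau c - A^T s = 0
tau = 2.0
s = -rng.uniform(0.5, 2.0, size=4)
c = A.T @ s / tau
f_s = b @ s - Phi_star(s)                      # lower bound (8.14)

def F_tau(y):
    return tau * c @ y - np.sum(np.log(b - A @ y))

ys = rng.uniform(-0.99, 0.99, size=(1000, 2))  # strictly feasible test points
gap = min(F_tau(y) - f_s for y in ys)
print(gap)  # nonnegative, as Proposition 8.2.1 asserts
```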
Our next observation is that there exists a systematic way to generate dual feasible pairs
(τ, s), i.e., the pairs satisfying the premise of the above proposition.

Proposition 8.2.2 Let (t, x) be a primal feasible pair (i.e., with t > 0 and $x \in \mathrm{int}\, G$), and let

$$u = \pi x + p,\quad du(dt) = \pi\, dx(dt),\quad s = \Phi'(u),\quad ds(dt) = \Phi''(u)\, du(dt), \qquad (8.16)$$

where dx(dt) is given by (8.7). Then

(i) Every pair S(dt) on the Dual search line

$$D = \{S(dt) = (t + dt,\ s_f(dt) = s + ds(dt)) \mid dt \in \mathbf{R}\}$$

satisfies equation (8.13).

(ii) If (t, x) is κ-close to the path, then the pair S(0), and, consequently, every pair S(dt)
with small enough |dt|, has its s-component in the domain of $\Phi^*$ and is therefore dual feasible.

Proof.
(i): from (8.16) it follows that
$$(t + dt)c + \pi^T(s + ds(dt)) = (t + dt)c + \pi^T\big[\Phi'(u) + \Phi''(u)\pi\, dx(dt)\big] =$$

[since $F'(x) = \pi^T \Phi'(u)$ and $F''(x) = \pi^T \Phi''(u)\pi$ in view of (8.12) and (8.16)]

$$= (t + dt)c + F'(x) + F''(x)dx(dt) = \nabla_x F_{t+dt}(x) + F''(x)dx(dt),$$

and the concluding quantity is 0 due to the origin of dx(dt), see (8.7). (i) is proved.
(ii): let us start with the following simple

Lemma 8.2.1 One has

$$|ds(dt)|^2_{(\Phi^*)''(s)} = |du(dt)|^2_{\Phi''(u)} = [du(dt)]^T ds(dt) \qquad (8.17)$$

and
$$|ds(0)|_{(\Phi^*)''(s)} = |dx(0)|_{F''(x)} = \lambda(F_t, x). \qquad (8.18)$$

Proof. Since $s = \Phi'(u)$ and $\Phi^*$ is the Legendre transformation of Φ, we have

$$(\Phi^*)''(s) = [\Phi''(u)]^{-1} \qquad (8.19)$$

(see (L.3), Lecture 2). Besides this, $ds(dt) = \Phi''(u)du(dt)$ by (8.16), whence

$$|ds(dt)|^2_{(\Phi^*)''} \equiv [ds(dt)]^T[(\Phi^*)''][ds(dt)] = [\Phi'' du(dt)]^T[\Phi'']^{-1}[\Phi'' du(dt)] = [du(dt)]^T[\Phi''][du(dt)],$$

as claimed in the first equality in (8.17); the second equality there is an immediate consequence
of $ds(dt) = [\Phi'']du(dt)$.
To prove (8.18), note that, as we know from (8.17), $|ds(0)|^2_{(\Phi^*)''} = |du(0)|^2_{\Phi''}$; the latter quantity,
in view of (8.16), is nothing but $[\pi dx(0)]^T \Phi'' [\pi dx(0)]$, which, in turn, equals $|dx(0)|^2_{F''(x)}$
in view of $F''(x) = \pi^T \Phi''(u)\pi$. We have proved the first equality in (8.18); the second is
immediate, since $dx(0) = -[F''(x)]^{-1}\nabla_x F_t(x)$ by (8.7), and, consequently,
$$|dx(0)|^2_{F''(x)} = \big[[F''(x)]^{-1}\nabla_x F_t(x)\big]^T [F''(x)] \big[[F''(x)]^{-1}\nabla_x F_t(x)\big] = [\nabla_x F_t(x)]^T [F''(x)]^{-1} \nabla_x F_t(x) \equiv \lambda^2(F_t, x).$$



Now we can immediately complete the proof of item (ii) of the Proposition. Indeed, as we
know from VII., Lecture 2, the function $\Phi^*$ is self-concordant on its domain; since $s = \Phi'(u)$,
we have $s \in \mathrm{Dom}\,\Phi^*$. (8.18) says that the $|\cdot|_{(\Phi^*)''(s)}$-distance between $s \in \mathrm{Dom}\,\Phi^*$ and $s_f(0)$
equals $\lambda(F_t, x)$ and is therefore < 1 due to the premise of (ii). Consequently, $s_f(0)$ belongs to
the open unit Dikin ellipsoid of the self-concordant function $\Phi^*$ centered at s and is therefore in
the domain of the function (I., Lecture 2). The latter domain is open (VII., Lecture 2), so that
$s_f(dt) \in \mathrm{Dom}\,\Phi^*$ for all small enough |dt|; since S(dt) always satisfies (8.13), we conclude
that S(dt) is dual feasible for all small enough |dt|.
Propositions 8.2.1 and 8.2.2 lead to the following
Acceptability Test:
given a primal feasible pair (t, x) κ-close to the path and a candidate stepsize dt, form
the corresponding primal and dual pairs $X(dt) = (t + dt,\ x_f(dt) = x + dx(dt))$, $S(dt) = (t + dt,\ s_f(dt) = s + ds(dt))$
and check whether the associated upper bound

$$v(dt) \equiv V_{s_f(dt)}(t + dt, x_f(dt)) = (t + dt)c^T x_f(dt) + F(x_f(dt)) + \Phi^*(s_f(dt)) - p^T s_f(dt) \qquad (8.20)$$

for the residual $V(t + dt, x_f(dt))$ is ≤ $\bar\kappa$ (by definition, $v(dt) = +\infty$ if $x_f(dt) \notin \mathrm{Dom}\,F$ or if
$s_f(dt) \notin \mathrm{Dom}\,\Phi^*$).
If $v(dt) \le \bar\kappa$, accept the stepsize dt, otherwise reject it.
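The Acceptability Test needs nothing beyond one linear solve and evaluations of F and $\Phi^*$. The sketch below implements (8.7), (8.16) and (8.20) for the Linear Programming barrier, where $\pi = -A$ and $p = b$ (NumPy assumed; the instance, $\bar\kappa = 1$ and the helper names are hypothetical). For comparison only, it also computes the true residual $V(t + dt, x_f(dt))$ by running damped Newton to high accuracy, which the method itself never needs to do.

```python
import numpy as np

def F(x, A, b):
    u = b - A @ x
    return np.inf if np.any(u <= 0) else -np.sum(np.log(u))

def Phi_star(s):
    return np.inf if np.any(s >= 0) else -np.sum(np.log(-s)) - s.size

def acceptability_bound(t, x, dt, c, A, b):
    """v(dt) of (8.20) for F(x) = -sum ln(b - Ax): Phi(u) = -sum ln u, pi = -A, p = b."""
    u = b - A @ x
    H = A.T @ ((1.0 / u**2)[:, None] * A)                    # F''(x)
    dx = -np.linalg.solve(H, (t + dt) * c + A.T @ (1.0 / u))  # (8.7)
    du = -A @ dx                                             # du = pi dx
    s, ds = -1.0 / u, du / u**2                              # s = Phi'(u), ds = Phi''(u) du
    xf, sf = x + dx, s + ds
    return (t + dt) * c @ xf + F(xf, A, b) + Phi_star(sf) - b @ sf, xf

def true_residual(tau, y, c, A, b, iters=60):
    """F_tau(y) - min F_tau, the minimum found by damped Newton (reference only)."""
    z = y.copy()
    for _ in range(iters):
        u = b - A @ z
        g = tau * c + A.T @ (1.0 / u)
        H = A.T @ ((1.0 / u**2)[:, None] * A)
        d = np.linalg.solve(H, g)
        z = z - d / (1.0 + np.sqrt(g @ d))
    return tau * c @ y + F(y, A, b) - (tau * c @ z + F(z, A, b))

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])
t, x = 1.0, np.full(2, 1.0 - np.sqrt(2.0))   # a point exactly on the path
kappa_bar = 1.0
results = []
for dt in (0.1, 0.5, 1.0):
    v, xf = acceptability_bound(t, x, dt, c, A, b)
    res = true_residual(t + dt, xf, c, A, b)
    results.append((dt, v, res))
    print(dt, v, res, "accept" if v <= kappa_bar else "reject")
```

By Proposition 8.2.1 the computed v(dt) never falls below the true residual.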
An immediate corollary of Propositions 8.2.1, 8.2.2 is the following

Proposition 8.2.3 If (t, x) is a primal feasible pair κ-close to the path and a stepsize dt passes
the Acceptability Test, then
$$V(t + dt, x_f(dt)) \le \bar\kappa$$

and, consequently, the Newton complexity of the corrector step under the choice δt = dt does not
exceed the quantity
$$N(\kappa, \bar\kappa) = O(1)\Big[\bar\kappa + \ln\Big(1 + \ln\frac{1}{\kappa}\Big)\Big],$$
O(1) being an absolute constant.

Now it is clear that in order to get a "long step" version of the path-following method,
it suffices to equip the Predictor-Corrector Updating scheme with a linesearch-based rule for
choosing the largest possible stepsize δt which passes our Acceptability Test. Such a rule
certainly keeps the complexity of a corrector step at a fixed level; at the same time, the rule is
computationally cheap, since to test a stepsize we need to compute only the values of Φ and $\Phi^*$,
which normally is negligible compared to the cost of the corrector step.
The outlined approach needs, of course, theoretical justification. Indeed, so far we
do not know what the "power" of our Acceptability Test is: does it accept, e.g., the "short"
stepsizes $dt = O(t/\sqrt{\vartheta})$ used in the very first version of the method? This is the issue we come
to now.

8.3 Acceptable steps


Let us start with the following construction. Given a point $u \in \mathrm{Dom}\,\Phi$ and a direction $\delta u \in \mathbf{R}^m$,
let us set
$$s = \Phi'(u),\qquad \delta s = \Phi''(u)\delta u,$$

thus coming to the conjugate point $s \in \mathrm{Dom}\,\Phi^*$ and to the conjugate direction δs. Now, let
$\rho^*_u[\delta u]$ be the remainder in the second-order Taylor expansion of the function $\Phi(v) + \Phi^*(w)$ at
the point (u, s) along the direction (δu, δs):

$$\rho^*_u[\delta u] = \Phi(u + \delta u) + \Phi^*(s + \delta s) - \Big[\Phi(u) + \Phi^*(s) + [\delta u]^T\Phi'(u) + [\delta s]^T(\Phi^*)'(s) + \frac{[\delta u]^T\Phi''(u)\delta u}{2} + \frac{[\delta s]^T(\Phi^*)''(s)\delta s}{2}\Big]$$

(the right hand side is $+\infty$ if $u + \delta u \notin \mathrm{Dom}\,\Phi$ or if $s + \delta s \notin \mathrm{Dom}\,\Phi^*$).
Our local goal is to establish the following

Lemma 8.3.1 One has

$$\zeta \equiv |\delta u|_{\Phi''(u)} = |\delta s|_{(\Phi^*)''(s)} = \sqrt{[\delta u]^T \delta s}. \qquad (8.21)$$

Besides this, if ζ < 1, then

$$\rho^*_u[\delta u] \le 2\rho(\zeta) - \zeta^2 = \frac{2}{3}\zeta^3 + \frac{2}{4}\zeta^4 + \frac{2}{5}\zeta^5 + \dots,\qquad \rho(z) = -\ln(1 - z) - z. \qquad (8.22)$$

Last, the third derivative of $\Phi(\cdot) + \Phi^*(\cdot)$ taken at the point (u, s) along the direction (δu, δs) is
zero, so that $\rho^*_u[\delta u]$ is in fact the remainder in the third-order Taylor expansion of $\Phi(\cdot) + \Phi^*(\cdot)$.

Proof. (8.21) is proved exactly as relation (8.17), see Lemma 8.2.1. From (8.21) it follows that
if ζ < 1, then u + δu and s + δs belong to the open unit Dikin ellipsoids, centered at u and s
respectively, of the self-concordant functions Φ and $\Phi^*$ (the latter function is self-concordant due to
VII., Lecture 2). Applying I., Lecture 2, to Φ and $\Phi^*$, we come to

$$u + \delta u \in \mathrm{Dom}\,\Phi,\qquad \Phi(u + \delta u) \le \Phi(u) + [\delta u]^T\Phi'(u) + \rho(|\delta u|_{\Phi''(u)}),$$

$$s + \delta s \in \mathrm{Dom}\,\Phi^*,\qquad \Phi^*(s + \delta s) \le \Phi^*(s) + [\delta s]^T(\Phi^*)'(s) + \rho(|\delta s|_{(\Phi^*)''(s)}),$$

whence
$$\rho^*_u[\delta u] \le 2\rho(\zeta) - \frac{1}{2}|\delta u|^2_{\Phi''(u)} - \frac{1}{2}|\delta s|^2_{(\Phi^*)''(s)} = 2\rho(\zeta) - \zeta^2,$$
as claimed in (8.22).
To prove that the third-order derivative of $\Phi(\cdot) + \Phi^*(\cdot)$ taken at the point (u, s) in the
direction (δu, δs) is zero, let us differentiate the identity

$$h^T[(\Phi^*)''(\Phi'(v))]h = h^T[\Phi''(v)]^{-1}h$$

(h is fixed) with respect to v in the direction $[\Phi''(v)]^{-1}h$ (cf. item $4^0$ in the proof of VII., Lecture 2).
The differentiation results in

$$D^3\Phi^*(\Phi'(v))[h, h, h] = -D^3\Phi(v)\big[[\Phi''(v)]^{-1}h, [\Phi''(v)]^{-1}h, [\Phi''(v)]^{-1}h\big];$$

substituting v = u, h = δs (so that $[\Phi''(u)]^{-1}\delta s = \delta u$), we come to

$$D^3\Phi(u)[\delta u, \delta u, \delta u] = -D^3\Phi^*(s)[\delta s, \delta s, \delta s].$$

Now we are ready to prove the following central result.



Proposition 8.3.1 Let (t, x) be κ-close to the path, and let dt, |dt| < t, be a stepsize. Then the
quantity v(dt) (see (8.20)) satisfies the inequality

$$v(dt) \le \rho^*_u[du(dt)], \qquad (8.23)$$

while
$$|du(dt)|_{\Phi''(u)} \le \omega \equiv \lambda(F_t, x) + \frac{|dt|}{t}\big[\lambda(F_t, x) + \lambda(F, x)\big] \le \kappa + \frac{|dt|}{t}\big[\kappa + \sqrt{\vartheta}\big]. \qquad (8.24)$$
In particular, if ω < 1, then v(dt) is well-defined and is ≤ $2\rho(\omega) - \omega^2$. Consequently, if

$$2\rho(\kappa) - \kappa^2 < \bar\kappa \qquad (8.25)$$

then all stepsizes dt satisfying the inequality

$$\frac{|dt|}{t} \le \frac{\kappa_+ - \kappa}{\kappa + \lambda(F, x)}, \qquad (8.26)$$

$\kappa_+$ being the root in (0, 1) of the equation

$$2\rho(z) - z^2 = \bar\kappa,$$
pass the Acceptability Test.

Proof. Let u, s, du(dt), ds(dt) be given by (8.16). In view of (8.16), s is conjugate to u and
ds(dt) is conjugate to du(dt), so that by definition of $\rho^*_u[\cdot]$ we have, denoting $\zeta = |du(dt)|_{\Phi''(u)} = |ds(dt)|_{(\Phi^*)''(s)}$
(see (8.21)),

$$\Phi(u + du(dt)) + \Phi^*(s + ds(dt)) = \Phi(u) + [du(dt)]^T\Phi'(u) + \Phi^*(s) + [ds(dt)]^T(\Phi^*)'(s) + \zeta^2 + \rho^*_u[du(dt)] =$$

[since $s = \Phi'(u)$ and, consequently, $\Phi(u) + \Phi^*(s) = u^T s$ and $u = (\Phi^*)'(s)$, since $\Phi^*$ is the
Legendre transformation of Φ]

$$= u^T s + [du(dt)]^T s + u^T ds(dt) + \zeta^2 + \rho^*_u[du(dt)] = [u + du(dt)]^T[s + ds(dt)] - [du(dt)]^T ds(dt) + \zeta^2 + \rho^*_u[du(dt)] =$$

[since $[du(dt)]^T ds(dt) = \zeta^2$ by (8.21)]

$$= [u + du(dt)]^T[s + ds(dt)] + \rho^*_u[du(dt)] =$$

[since $u + du(dt) = \pi[x + dx(dt)] + p$ and, by Proposition 8.2.2, $\pi^T[s + ds(dt)] = -(t + dt)c$]

$$= p^T[s + ds(dt)] - (t + dt)c^T[x + dx(dt)] + \rho^*_u[du(dt)] =$$

[the definition of $x_f(dt)$ and $s_f(dt)$]

$$= p^T s_f(dt) - (t + dt)c^T x_f(dt) + \rho^*_u[du(dt)],$$

whence (see (8.20))

$$v(dt) \equiv (t + dt)c^T x_f(dt) + F(x_f(dt)) + \Phi^*(s_f(dt)) - p^T s_f(dt) = \rho^*_u[du(dt)],$$

as required in (8.23).

Now let us prove (8.24). In view of (8.16) and (8.12) we have

$$|du(dt)|_{\Phi''(u)} = |\pi\, dx(dt)|_{\Phi''(u)} = |dx(dt)|_{F''(x)} =$$
[see (8.7)]
$$= \big|[F''(x)]^{-1}\nabla_x F_{t+dt}(x)\big|_{F''(x)} \equiv \sqrt{\big[[F''(x)]^{-1}\nabla_x F_{t+dt}(x)\big]^T[F''(x)]\big[[F''(x)]^{-1}\nabla_x F_{t+dt}(x)\big]} =$$
$$= |\nabla_x F_{t+dt}(x)|_{[F''(x)]^{-1}} = |(t + dt)c + F'(x)|_{[F''(x)]^{-1}} = \Big|\Big(1 + \frac{dt}{t}\Big)[tc + F'(x)] - \frac{dt}{t}F'(x)\Big|_{[F''(x)]^{-1}} \le$$
$$\le \Big(1 + \frac{|dt|}{t}\Big)|\nabla_x F_t(x)|_{[F''(x)]^{-1}} + \frac{|dt|}{t}|F'(x)|_{[F''(x)]^{-1}} \le$$
[due to the definition of $\lambda(F_t, x)$ and $\lambda(F, x)$]
$$\le \Big(1 + \frac{|dt|}{t}\Big)\lambda(F_t, x) + \frac{|dt|}{t}\lambda(F, x) = \omega \le$$
[since (t, x) is κ-close to the path, so that $\lambda(F_t, x) \le \kappa$, and since F is a ϑ-self-concordant barrier, so that $\lambda(F, x) \le \sqrt{\vartheta}$]
$$\le \Big(1 + \frac{|dt|}{t}\Big)\kappa + \frac{|dt|}{t}\sqrt{\vartheta}.$$
The remaining statements of the Proposition are immediate consequences of (8.23), (8.24) and
Lemma 8.3.1.
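Computing $\kappa_+$ requires only a one-dimensional root finder. A minimal sketch using bisection follows (plain Python; the values κ = 0.1, $\bar\kappa$ = 1 and λ(F, x) = 1 are illustrative, not prescribed by the text). Bisection is safe here because $2\rho(z) - z^2$ has derivative $2z^2/(1 - z) \ge 0$ on [0, 1) and grows to $+\infty$ as z → 1.

```python
import math

def rho(z):
    """rho(z) = -ln(1 - z) - z for 0 <= z < 1."""
    return -math.log1p(-z) - z

def kappa_plus(kappa_bar, iters=200):
    """Root in (0, 1) of 2 rho(z) - z^2 = kappa_bar, found by bisection.

    The left hand side increases from 0 at z = 0 to +infinity as z -> 1,
    so the root exists and is unique."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * rho(mid) - mid * mid > kappa_bar:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def default_stepsize(t, kappa, kappa_bar, lam_F):
    """Largest dt allowed by (8.26): dt = t (kappa_plus - kappa) / (kappa + lambda(F, x))."""
    if not 2.0 * rho(kappa) - kappa**2 < kappa_bar:
        raise ValueError("condition (8.25) fails")
    return t * (kappa_plus(kappa_bar) - kappa) / (kappa + lam_F)

kp = kappa_plus(1.0)
print(kp, default_stepsize(t=1.0, kappa=0.1, kappa_bar=1.0, lam_F=1.0))
```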

8.4 Summary
Summarizing our observations and results, we come to the following
Long-Step Predictor-Corrector Path-Following method:
• The parameters of the method are the path tolerance κ ∈ (0, 1) and the threshold $\bar\kappa > 2\rho(\kappa) - \kappa^2$;
the input to the method is a primal feasible pair $(t_0, x_0)$ κ-close to the path.
• The method forms, starting with $(t_0, x_0)$, the sequence of pairs $(t_i, x_i)$ κ-close to the path,
with the updating
$$(t_{i-1}, x_{i-1}) \mapsto (t_i, x_i)$$
being given by the Predictor-Corrector Updating scheme, where the stepsizes $\delta t_i \equiv t_i - t_{i-1}$
are nonnegative reals passing the Acceptability Test associated with the pair $(t_{i-1}, x_{i-1})$.
Since, as we know from Proposition 8.3.1, the stepsizes
$$\delta t_i^* = t_{i-1}\,\frac{\kappa_+ - \kappa}{\kappa + \lambda(F, x_{i-1})}$$
certainly pass the Acceptability Test, we may assume that the stepsizes in the above method are
at least the default values $\delta t_i^*$:
$$\delta t_i \ge t_{i-1}\,\frac{\kappa_+ - \kappa}{\kappa + \lambda(F, x_{i-1})}; \qquad (8.27)$$
note that to use the default stepsizes $\delta t_i \equiv \delta t_i^*$, no Acceptability Test and, consequently, no
Structural assumption on the barrier F is needed. Note also that to initialize the method (to get
the initial close-to-the-path pair $(t_0, x_0)$), one can trace "in reverse time" the auxiliary path
associated with a given strictly feasible initial solution $\hat x \in \mathrm{int}\, G$ (see Lecture 4); and, of course,
when tracing the auxiliary path, we can also use the long-step predictor-corrector technique.
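The whole method can be put together for Linear Programming in a few dozen lines. The sketch below is illustrative only (NumPy assumed): the halving/doubling linesearch is one hypothetical way to look for a large stepsize passing the Acceptability Test and does not enforce (8.27) explicitly, and the stopping rule $t \ge t_{\max}$ as well as all parameter values are assumptions. The test instance is again the box $[-1, 1]^2$ with $c = (1, 1)^T$, whose optimal value is −2; $x_0 = 0$ is the analytic center, where $\lambda(F_{t_0}, 0) = t_0$, so $t_0 = 0.05$ gives a κ-close starting pair.

```python
import numpy as np

def long_step_lp(c, A, b, x0, t0, kappa=0.1, kappa_bar=1.0, t_max=1e6):
    """Long-step predictor-corrector path-following for min c^T x s.t. A x <= b.

    Sketch: F(x) = -sum ln(b - A x), i.e. Phi(u) = -sum ln u, pi = -A, p = b;
    (t0, x0) is assumed to be kappa-close to the path."""
    def F(x):
        u = b - A @ x
        return np.inf if np.any(u <= 0) else -np.sum(np.log(u))

    def Phi_star(s):
        return np.inf if np.any(s >= 0) else -np.sum(np.log(-s)) - s.size

    def forecast(t, x, dt):
        """Primal forecast x_f(dt) and the Acceptability Test quantity v(dt) of (8.20)."""
        u = b - A @ x
        H = A.T @ ((1.0 / u**2)[:, None] * A)
        dx = -np.linalg.solve(H, (t + dt) * c + A.T @ (1.0 / u))   # (8.7)
        du = -A @ dx
        xf, sf = x + dx, -1.0 / u + du / u**2                      # x + dx, s + ds
        return xf, (t + dt) * c @ xf + F(xf) + Phi_star(sf) - b @ sf

    t, x, outer, newton = t0, x0.copy(), 0, 0
    while t < t_max:
        # Hypothetical linesearch: halve until accepted, then double while still accepted
        dt = t
        xf, v = forecast(t, x, dt)
        while not v <= kappa_bar:
            dt *= 0.5
            xf, v = forecast(t, x, dt)
        for _ in range(60):
            xf2, v2 = forecast(t, x, 2.0 * dt)
            if not v2 <= kappa_bar:
                break
            dt, xf = 2.0 * dt, xf2
        # Corrector: damped Newton (8.5) on F_{t+dt}, started at x_f
        t, y = t + dt, xf
        for _ in range(100):
            u = b - A @ y
            g = t * c + A.T @ (1.0 / u)
            H = A.T @ ((1.0 / u**2)[:, None] * A)
            d = np.linalg.solve(H, g)
            lam = np.sqrt(g @ d)
            if lam <= kappa:
                break
            y = y - d / (1.0 + lam)
            newton += 1
        x, outer = y, outer + 1
    return x, t, outer, newton

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
c = np.array([1.0, 1.0])
x, t, outer, newton = long_step_lp(c, A, b, np.zeros(2), 0.05)
print(c @ x + 2.0, outer, newton)   # c^T x - min c^T x, with min = -2
```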
The method in question, of course, fits the standard complexity bounds:

Theorem 8.4.1 Let problem (8.1) be solved by the Long-Step Predictor-Corrector Path-Following
method which starts at a primal feasible pair $(t_0, x_0)$ κ-close to the path and uses stepsizes
$\delta t_i$ passing the Acceptability Test and satisfying (8.27). Then the total number of Newton steps
in the method before an ε-solution to the problem is found does not exceed
$$O(1)\Big[\sqrt{\vartheta}\,\ln\Big(\frac{\vartheta}{t_0\,\varepsilon} + 1\Big) + 1\Big],$$

with O(1) depending on the parameters κ and $\bar\kappa$ of the method only.

Proof. Since $(t_i, x_i)$ are κ-close to the path, we have $c^T x_i - \min_{x \in G} c^T x \le O(1)\vartheta t_i^{-1}$ with
a certain O(1) depending on κ only (see Proposition 4.4.1, Lecture 4); this inaccuracy bound
combined with (8.27) (where one should take into account that $\lambda(F, x_{i-1}) \le \sqrt{\vartheta}$) implies that
$c^T x_i - \min_{x \in G} c^T x$ becomes ≤ ε after no more than $O(1)\sqrt{\vartheta}\ln(1 + \vartheta t_0^{-1}\varepsilon^{-1}) + 1$ steps, with O(1)
depending on κ and $\bar\kappa$ only. It remains to note that since the stepsizes pass the Acceptability
Test, the Newton complexity of a step in the method, due to Proposition 8.2.3, is O(1).

8.5 Exercises: Long-Step Path-Following methods


Let us start by clarifying an immediate question motivated by the above construction.

Exercise 8.5.1 #∗ The Structural assumption requires F to be obtained from a barrier with
known Legendre transformation by affine substitution of the argument. Why didn't we simplify
things by assuming that F itself has a known Legendre transformation?

The remaining exercises tell us another story. We have presented a certain "long step" variant
of the path-following scheme; note, however, that the "cost" of "long steps" is a certain structural
assumption on the underlying barrier. Although this assumption is automatically satisfied in
many important cases, we have paid something. Can we say something definite about the
advantages we have paid for? "Definite" in the previous sentence means "something which can
be proved", not "something which can be supported by computational experience" (this latter
aspect of the situation is more or less clear).
The answer is as follows. As far as the worst-case complexity bound is concerned, there is no
progress at all, and the current state of the theory of interior point methods does not give us any
hope of getting a worst-case complexity estimate better than $O(\sqrt{\vartheta}\ln(V/\varepsilon))$. Thus, if we actually
have gained something, it is not an improvement in the worst-case complexity. The goal of the
forthcoming exercises is to explain what the improvement is.
Let us start with some preliminary considerations. Consider a step of a path-following
predictor-corrector method; for the sake of simplicity, assume that at the beginning of the step
we are exactly on the path rather than close to it (what follows can be extended to this latter
situation without any difficulty). Thus, we are given t > 0 and $x = x^*(t)$, and our goal is to
update the pair (t, x) into a new pair $(t_+, x_+)$ close to the path with a larger value of the penalty
parameter. To this end we choose a stepsize dt > 0, set $t_+ = t + dt$ and make the predictor step

$$x \mapsto x_f = x + (x^*)'(t)\,dt,$$

shifting x along the line l tangent to the path. At the corrector step we apply to $F_{t_+}$ the damped
Newton method, starting with $x_f$, to restore closeness to the path. Assume that the method in
question ensures that the residual

$$F_{t_+}(x_f) - \min_x F_{t_+}(x)$$

is ≤ O(1) (this is more or less the same as ensuring a fixed Newton complexity of the corrector
step). Given that the method in question possesses the aforementioned properties, we may ask
ourselves what is the length of the displacement $x_f - x$ which is guaranteed by the method. It
is natural to measure the length in the local metric $|\cdot|_{F''(x)}$ given by the Hessian of the barrier.
Note that in the short-step version of the method, where $dt = O(1)t(1 + \lambda(F, x))^{-1}$, we have
(see (8.7))
$$dx(dt) = -dt\,[F''(x)]^{-1}c = t^{-1}dt\,[F''(x)]^{-1}F'(x)$$
(since on the path $tc + F'(x) = 0$), whence

$$|x_f(dt) - x|_{F''(x)} = |dx(dt)|_{F''(x)} = t^{-1}dt\,\big|[F''(x)]^{-1}F'(x)\big|_{F''(x)} = t^{-1}dt\,|F'(x)|_{[F''(x)]^{-1}} = t^{-1}dt\,\lambda(F, x),$$

and, substituting the expression for dt, we come to
$$\Omega \equiv |x_f(dt) - x|_{F''(x)} = O(1)\frac{\lambda(F, x)}{1 + \lambda(F, x)},$$

so that Ω = O(1), provided that λ(F, x) ≥ O(1), or, which is the same, provided that we are
not too close to the analytic center of G.
Thus, the quantity Ω - let us call it the prediction power of the method - for the default
short-step version of the method is O(1). The goal of what follows is to investigate the prediction
power of the long-step version of the method and to compare it with the above reference point
- the O(1)-power of the short-step version; this is a natural formalization of the question ”how
long are the long steps”.
First of all, let us note that there is a natural upper bound on the prediction power, namely,
the distance (measured, of course, in $|\cdot|_{F''(x)}$) from x to the boundary of G along the tangent
line l. Actually there are two distances, since there are two ways to reach ∂G along l: the
"forward" and the "backward" movement. It is reasonable to speak about the shorter of these
distances, i.e., about the quantity

$$\Delta \equiv \Delta(x) = \min\big\{|p\,(x^*)'(t)|_{F''(x)} \;\big|\; x + p\,(x^*)'(t) \notin \mathrm{int}\, G\big\}.$$

Since G contains the unit Dikin ellipsoid of F centered at x (i.e., the unit $|\cdot|_{F''(x)}$-ball
centered at x), we have
$$\Delta \ge 1.$$
Note that there is no prediction policy which always results in Ω ≫ 1, since it may happen
that both the "forward" and the "backward" distances from x to the boundary of G are of order of 1
(look at the case when G is the unit cube $\{y \in \mathbf{R}^n \mid |y|_\infty \le 1\}$, F(y) is the standard logarithmic
barrier $-\sum_{i=1}^n[\ln(1 - y_i) + \ln(1 + y_i)]$ for the cube, $x = (0.5, 0, \dots, 0)^T$ and $c = (-1, 0, \dots, 0)^T$).
What we can speak about is the type of dependence Ω = Ω(∆); in other words, it is reasonable
to ask ourselves "how large is Ω when ∆ is large", not "how large is Ω": the answer to this
latter question cannot be better than O(1).
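For the cube example both distances can be computed explicitly. The sketch below (NumPy assumed; n = 3 is an arbitrary choice) finds the t for which x lies on the path, forms the tangent direction by differentiating (8.6) in t, and measures both exit distances in the local norm. They come out as $\sqrt{10}/3 \approx 1.05$ (forward) and $\sqrt{10} \approx 3.16$ (backward), so ∆ stays close to its lower bound 1.

```python
import numpy as np

n = 3                                   # dimension of the cube (illustrative)
x = np.zeros(n); x[0] = 0.5
c = np.zeros(n); c[0] = -1.0

# F(y) = -sum [ln(1 - y_i) + ln(1 + y_i)]: gradient and (diagonal) Hessian at x
grad = 1.0 / (1.0 - x) - 1.0 / (1.0 + x)
hess = np.diag(1.0 / (1.0 - x)**2 + 1.0 / (1.0 + x)**2)

# x = x*(t) for the t with t c + F'(x) = 0; here t = 4/3
t = -grad[0] / c[0]
assert np.allclose(t * c + grad, 0.0)

# Differentiating (8.6) in t gives the tangent (x*)'(t) = -[F''(x)]^{-1} c
tangent = -np.linalg.solve(hess, c)
length = np.sqrt(tangent @ hess @ tangent)     # |(x*)'(t)|_{F''(x)}

def exit_param(d):
    """Largest p > 0 such that x + p d stays in the closed cube (d must be nonzero)."""
    return min((1.0 - xi) / di if di > 0 else (-1.0 - xi) / di
               for xi, di in zip(x, d) if di != 0)

forward = exit_param(tangent) * length
backward = exit_param(-tangent) * length
Delta = min(forward, backward)
print(t, forward, backward, Delta)
```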
In what follows we answer the above question for the following particular case:
Semidefinite Programming: the barrier Φ involved in our Structural assumption is the barrier

$$\Phi(X) = -\ln \mathrm{Det}\, X$$

for the cone $S^k_+$ of symmetric positive semidefinite k × k matrices.

In other words, we restrict ourselves to the case when G is the inverse image of $S^k_+$ under the
affine mapping
$$x \mapsto A(x) = \pi x + p$$
taking values in the space $S^k$ of k × k symmetric matrices, and

$$F(x) = -\ln \mathrm{Det}\, A(x).$$

Note that the Semidefinite Programming case (very important in its own right) covers, in par-
ticular, Linear Programming (look what happens when πx + p takes values in the subspace of
diagonal matrices).
Let us summarize our current knowledge on the situation in question.

• Φ is a k-self-concordant barrier for the cone $S^k_+$; the derivatives of the barrier are given by
$$D\Phi(u)[h] = -\mathrm{Tr}\{u^{-1}h\} = -\mathrm{Tr}\{\hat h\},\qquad \hat h = u^{-1/2}hu^{-1/2},$$

so that
$$\Phi'(u) = -u^{-1}; \qquad (8.28)$$
$$D^2\Phi(u)[h, h] = \mathrm{Tr}\{u^{-1}hu^{-1}h\} = \mathrm{Tr}\{\hat h^2\},$$

so that
$$\Phi''(u)h = u^{-1}hu^{-1}; \qquad (8.29)$$
$$D^3\Phi(u)[h, h, h] = -2\,\mathrm{Tr}\{u^{-1}hu^{-1}hu^{-1}h\} = -2\,\mathrm{Tr}\{\hat h^3\}$$
(see Example 5.3.3, Lecture 5, and references therein);
• the cone $S^k_+$ is self-dual; the Legendre transformation of Φ is
$$\Phi^*(s) = \Phi(-s) - k,\qquad \mathrm{Dom}\,\Phi^* = -\mathrm{int}\, S^k_+$$
(Exercises 5.4.7, 5.4.10).
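The formulas (8.28), (8.29) and the expression for $D^3\Phi$ can be checked numerically, together with the Legendre transformation. The sketch below (NumPy assumed; k = 4 and the random test matrices are arbitrary) compares the closed forms $(-1)^j (j-1)!\,\mathrm{Tr}\{\hat h^j\}$, j = 1, 2, 3, with finite differences of $r \mapsto \Phi(u + rh)$, and verifies that at $s = \Phi'(u) = -u^{-1}$ the value $\Phi^*(s) = \Phi(-s) - k$ turns the Fenchel inequality $\Phi(u) + \Phi^*(s) \ge \mathrm{Tr}\{us\}$ into an equality.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4

def Phi(u):
    """Phi(u) = -ln Det u; +inf outside the interior of S^k_+."""
    sign, logdet = np.linalg.slogdet(u)
    return -logdet if sign > 0 else np.inf

G = rng.standard_normal((k, k))
u = G @ G.T + k * np.eye(k)           # a point of int S^k_+ (test data)
h = rng.standard_normal((k, k))
h = 0.5 * (h + h.T)                   # a symmetric direction

# h_hat = u^{-1/2} h u^{-1/2}, with u^{-1/2} from the eigendecomposition of u
w, Q = np.linalg.eigh(u)
u_inv_half = Q @ np.diag(w ** -0.5) @ Q.T
h_hat = u_inv_half @ h @ u_inv_half

# Closed forms (-1)^j (j-1)! Tr{h_hat^j} for j = 1, 2, 3
exact = np.array([-np.trace(h_hat),
                  np.trace(h_hat @ h_hat),
                  -2.0 * np.trace(h_hat @ h_hat @ h_hat)])

# Central finite differences of r -> Phi(u + r h) at r = 0
e = 1e-3
f = lambda r: Phi(u + r * h)
fd = np.array([(f(e) - f(-e)) / (2 * e),
               (f(e) - 2 * f(0.0) + f(-e)) / e**2,
               (f(2 * e) - 2 * f(e) + 2 * f(-e) - f(-2 * e)) / (2 * e**3)])

# At s = Phi'(u) = -u^{-1} the Fenchel inequality Phi(u) + Phi*(s) >= Tr{u s} is tight
s = -np.linalg.inv(u)
Phi_star_s = Phi(-s) - k              # Phi*(s) = Phi(-s) - k
fenchel_gap = Phi(u) + Phi_star_s - np.trace(u @ s)
print(fd - exact, fenchel_gap)
```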
Let us get more information on the barrier Φ. Let us call an arrow a pair (v, dv) consisting
of $v \in \mathrm{int}\, S^k_+$ and $dv \in S^k$ with $|dv|_{\Phi''(v)} = 1$. Given an arrow (v, dv), let us define the conjugate
co-arrow $(v^*, dv^*)$ as
$$v^* = \Phi'(v) = -v^{-1},\qquad dv^* = \Phi''(v)dv = v^{-1}dv\,v^{-1}.$$
Let also
$$\zeta(v, dv) = \sup\{p \mid v \pm p\,dv \in S^k_+\}, \qquad (8.30)$$
$$\zeta^*(v^*, dv^*) = \sup\{p \mid v^* \pm p\,dv^* \in -S^k_+\}. \qquad (8.31)$$
In what follows $|w|_\infty$ and $|w|_2$ are, respectively, the spectral norm (maximum modulus of eigenvalues) and the
Frobenius norm $\mathrm{Tr}^{1/2}\{w^2\}$ of a symmetric matrix w.
Exercise 8.5.2 Let (v, dv) be an arrow and (v*, dv*) be the conjugate co-arrow. Prove that
$$1 = |dv|_{\Phi''(v)} = |v^{-1/2}\,dv\,v^{-1/2}|_2 = |dv^*|_{(\Phi_*)''(v^*)} = \sqrt{\mathrm{Tr}\{dv\,dv^*\}} \tag{8.32}$$
and that
$$\zeta(v, dv) = \zeta^*(v^*, dv^*) = \frac{1}{|v^{-1/2}\,dv\,v^{-1/2}|_\infty}. \tag{8.33}$$
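The following is a numerical illustration of Exercise 8.5.2, not a solution. The sketch builds a random arrow by rescaling a symmetric direction and forms the conjugate co-arrow. It then checks (8.32) and (8.33) by looking at the eigenvalues of v ± p dv and v* ± p dv* near p = ζ. The helpers `sym_power`, `hat`, `make_arrow` and `zeta` are hypothetical names introduced here.

```python
import numpy as np

def sym_power(v, p):
    """v**p for a symmetric positive definite matrix v, via its eigendecomposition."""
    w, q = np.linalg.eigh(v)
    return (q * w**p) @ q.T

def hat(v, dv):
    """v^{-1/2} dv v^{-1/2}."""
    vm = sym_power(v, -0.5)
    return vm @ dv @ vm

def make_arrow(v, h):
    """Rescale h so that |h|_{Phi''(v)} = |v^{-1/2} h v^{-1/2}|_2 = 1 (Frobenius norm)."""
    return h / np.linalg.norm(hat(v, h), "fro")

def zeta(v, dv):
    """zeta(v, dv) = 1 / |v^{-1/2} dv v^{-1/2}|_inf, cf. (8.33)."""
    return 1.0 / np.max(np.abs(np.linalg.eigvalsh(hat(v, dv))))

rng = np.random.default_rng(1)
k = 5
a = rng.standard_normal((k, k))
v = a @ a.T + np.eye(k)
b = rng.standard_normal((k, k))
dv = make_arrow(v, (b + b.T) / 2)
v_inv = np.linalg.inv(v)
v_star, dv_star = -v_inv, v_inv @ dv @ v_inv      # conjugate co-arrow
z = zeta(v, dv)
print(z, np.trace(dv @ dv_star))                  # the trace should be 1 up to rounding
```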
Exercise 8.5.3 Prove that for any positive integer j, any $v \in \mathrm{int}\,\mathbf{S}^k_+$ and any $h \in \mathbf{S}^k$ one has
$$D^j\Phi(v)[h, ..., h] = (-1)^j (j-1)!\,\mathrm{Tr}\{\hat h^j\}, \qquad \hat h = v^{-1/2} h v^{-1/2}, \tag{8.34}$$
and, in particular,
$$|D^j\Phi(v)[h, ..., h]| \le (j-1)!\,|\hat h|_2^2\,|\hat h|_\infty^{j-2}, \quad j \ge 2. \tag{8.35}$$
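Along the line r ↦ v + rh one has $\Phi(v + rh) = \Phi(v) - \sum_i \ln(1 + r\lambda_i)$, where $\lambda_i$ are the eigenvalues of $\hat h$; this is the source of (8.34). The sketch below compares (8.34) for j = 1, 2, 3 with central finite differences and checks (8.35) for several j. The random data and the function names are assumptions of this illustration.

```python
import math
import numpy as np

def hat_eigenvalues(v, h):
    """Eigenvalues of hhat = v^{-1/2} h v^{-1/2}."""
    w, q = np.linalg.eigh(v)
    v_inv_sqrt = (q / np.sqrt(w)) @ q.T
    return np.linalg.eigvalsh(v_inv_sqrt @ h @ v_inv_sqrt)

def phi_on_line(v, h, r):
    """f(r) = Phi(v + r h) = -ln Det(v + r h); +inf if v + r h is not positive definite."""
    w = np.linalg.eigvalsh(v + r * h)
    return np.inf if w.min() <= 0 else -np.sum(np.log(w))

def djphi(v, h, j):
    """Right hand side of (8.34): (-1)^j (j-1)! Tr(hhat^j)."""
    lam = hat_eigenvalues(v, h)
    return (-1) ** j * math.factorial(j - 1) * np.sum(lam ** j)

def bound_835(v, h, j):
    """Right hand side of (8.35): (j-1)! |hhat|_2^2 |hhat|_inf^(j-2)."""
    lam = hat_eigenvalues(v, h)
    return math.factorial(j - 1) * np.sum(lam ** 2) * np.max(np.abs(lam)) ** (j - 2)

rng = np.random.default_rng(2)
k = 4
a = rng.standard_normal((k, k))
v = a @ a.T + k * np.eye(k)
b = rng.standard_normal((k, k))
h = (b + b.T) / 2

# central finite differences of f(r) = Phi(v + r h) at r = 0, for j = 1, 2, 3
e = 1e-3
f = lambda r: phi_on_line(v, h, r)
fd = {
    1: (f(e) - f(-e)) / (2 * e),
    2: (f(e) - 2 * f(0.0) + f(-e)) / e**2,
    3: (f(2 * e) - 2 * f(e) + 2 * f(-e) - f(-2 * e)) / (2 * e**3),
}
for j in (1, 2, 3):
    print(j, fd[j], djphi(v, h, j))
```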

Let $\rho_j(z)$ be the remainder in the j-th order Taylor expansion of the function $-\ln(1 - z)$ at z = 0:
$$\rho_j(z) = \frac{1}{j+1}z^{j+1} + \frac{1}{j+2}z^{j+2} + \dots$$
(so that the function $\rho(z) = -\ln(1 - z) - z$, well known to us, is nothing but $\rho_1(z)$).
Exercise 8.5.4 Let (v, dv) be an arrow, and let $R^j_{(v,dv)}(r)$, j ≥ 2, be the remainder in the j-th order Taylor expansion of the function $f(r) = \Phi(v + r\,dv)$ at r = 0:
$$R^j_{(v,dv)}(r) = f(r) - \sum_{i=0}^{j} \frac{f^{(i)}(0)}{i!} r^i$$
(the right hand side is +∞ if f is undefined at r). Prove that
$$R^j_{(v,dv)}(r) \le \zeta^2(v, dv)\,\rho_j\!\left(\frac{|r|}{\zeta(v, dv)}\right), \qquad |r| < \zeta(v, dv) \tag{8.36}$$
(the quantity ζ(v, dv) is given by (8.30), see also (8.33)).

Exercise 8.5.5 Let (v, dv) be an arrow and (v*, dv*) be the conjugate co-arrow. Let $R^j_{(v,dv)}(r)$, j ≥ 2, be the remainder in the j-th order Taylor expansion of the function $\psi(r) = \Phi(v + r\,dv) + \Phi_*(v^* + r\,dv^*)$ at r = 0:
$$R^j_{(v,dv)}(r) = \psi(r) - \sum_{i=0}^{j} \frac{\psi^{(i)}(0)}{i!} r^i$$
(the right hand side is +∞ if ψ is undefined at r). Prove that
$$R^j_{(v,dv)}(r) \le 2\zeta^2(v, dv)\,\rho_j\!\left(\frac{|r|}{\zeta(v, dv)}\right), \qquad |r| < \zeta(v, dv) \tag{8.37}$$
(the quantity ζ(v, dv) is given by (8.30), see also (8.33)).
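With the eigenvalue representation $\Phi(v + r\,dv) = \Phi(v) - \sum_i \ln(1 + r\lambda_i)$, the remainders in (8.36) and (8.37) are sums of terms $\rho_j(\mp r\lambda_i)$, so both bounds can be checked numerically on a random arrow. The sketch assumes $\Phi_*(s) = -\ln\mathrm{Det}(-s) - k$ for the Legendre transformation of Φ; the additive constant does not affect the remainders. The Taylor polynomials are built from (8.34), while f(r) and ψ(r) themselves are computed directly from determinants. All names are illustrative.

```python
import numpy as np

def rho(j, z):
    """rho_j(z) = -ln(1 - z) - sum_{n=1}^{j} z^n / n, the remainder of -ln(1 - z), |z| < 1."""
    return -np.log1p(-z) - sum(z**n / n for n in range(1, j + 1))

def logdet_pd(m):
    """ln Det m for positive definite m, -inf otherwise."""
    w = np.linalg.eigvalsh(m)
    return np.sum(np.log(w)) if w.min() > 0 else -np.inf

def hat_eigenvalues(v, dv):
    w, q = np.linalg.eigh(v)
    vm = (q / np.sqrt(w)) @ q.T                     # v^{-1/2}
    return np.linalg.eigvalsh(vm @ dv @ vm)

def remainder_f(v, dv, r, j):
    """R^j(r) for f(r) = Phi(v + r dv); Taylor part: Phi(v) + sum_i sum_n (-r lam_i)^n / n."""
    lam = hat_eigenvalues(v, dv)
    taylor = -logdet_pd(v) + sum(np.sum((-r * lam) ** n) / n for n in range(1, j + 1))
    return -logdet_pd(v + r * dv) - taylor

def remainder_psi(v, dv, r, j):
    """R^j(r) for psi(r) = Phi(v + r dv) + Phi_*(v* + r dv*), Phi_*(s) = -ln Det(-s) - k."""
    k = v.shape[0]
    lam = hat_eigenvalues(v, dv)
    v_inv = np.linalg.inv(v)
    v_s, dv_s = -v_inv, v_inv @ dv @ v_inv          # conjugate co-arrow
    psi_r = -logdet_pd(v + r * dv) - logdet_pd(-(v_s + r * dv_s)) - k
    taylor = -k + sum(np.sum((-r * lam) ** n + (r * lam) ** n) / n for n in range(1, j + 1))
    return psi_r - taylor

rng = np.random.default_rng(4)
k = 4
a = rng.standard_normal((k, k))
v = a @ a.T + np.eye(k)
b = rng.standard_normal((k, k))
dv = (b + b.T) / 2
dv = dv / np.linalg.norm(hat_eigenvalues(v, dv))    # make (v, dv) an arrow: sum lam_i^2 = 1
zeta = 1.0 / np.max(np.abs(hat_eigenvalues(v, dv)))
r = 0.8 * zeta
print(remainder_f(v, dv, r, 3), zeta**2 * rho(3, 0.8))
```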

Now let us come back to our goal: investigating the forecast power of the long-step predictor-corrector scheme for the case of Semidefinite Programming. Thus, let us fix a pair (t, x) belonging to the path, so that t > 0 and $x = x^*(t) = \mathrm{argmin}_{y \in G}[t c^T y + F(y)]$. We use the following notation:

• I is the unit k × k matrix;

• u = πx + p;

• $dx \in \mathbb{R}^n$ is the $|\cdot|_{F''(x)}$-unit direction parallel to the line l, and

$$du = \pi\,dx$$

is the direction of the image L of the line l in the space $\mathbf{S}^k$;

• $s \equiv \Phi'(u) = -u^{-1}$; $ds \equiv \Phi''(u)du = u^{-1}\,du\,u^{-1}$.

Let us first find out what the quantity ∆(x) is.

Exercise 8.5.6 Prove that (u, du) is an arrow, (s, ds) is the conjugate co-arrow and that

∆ = ζ(u, du).

Now we are ready to answer the question of what the prediction power of the long-step predictor-corrector scheme is.

Exercise 8.5.7 Consider the Long-Step Predictor-Corrector Updating scheme with linesearch as applied to Semidefinite Programming; the linesearch chooses, as the stepsize, the largest value of dt which passes the Acceptability Test. Prove that the prediction power of the scheme is at least

$$\Omega^*(x) = O(1)\,\Delta^{1/2}(x),$$

with O(1) depending on the threshold κ only¹.

¹ Recall that, for the sake of simplicity, the pair (t, x) to be updated was assumed to be exactly on the path. If it is κ-close to the path, then a similar result holds true, with O(1) depending on both κ and κ.

Thus, the long-step scheme indeed has a "nontrivial" prediction power.

An interesting question is to bound from above the prediction power of an arbitrary predictor-corrector path-following scheme of the aforementioned type. Recall that the main restrictions on the scheme were that

• in order to form the forecast $x_f$, we move along the tangent line l to the path [in principle, we could use higher-order polynomial approximations of the path; here we ignore this possibility];

• the residual $F_{t_+}(x_f) - \min_y F_{t_+}(y)$ should be ≤ O(1).

It can be proved that in the case of Linear (and, consequently, Semidefinite) Programming, the prediction power of any predictor-corrector scheme subject to the above restrictions cannot be better than $O(1)\Delta^{2/3}(x)$. This is slightly better than the prediction power $O(1)\Delta^{1/2}(x)$ of our method. I do not know the origin of the gap: it may come from drawbacks of the long-step method in question, or from a too optimistic upper bound. You are welcome to investigate the problem.
Chapter 9

How to construct self-concordant barriers

So far we are acquainted with four interior point methods. The "interior point toolbox" contains more of them, but we are forced to stop somewhere, and I think it is the right time to stop. Let us think about how we could exploit our knowledge in order to solve a convex program by one of our methods. Our actions are clear:
(a) we should reformulate our problem in the standard form

$$\text{minimize } c^T x \ \text{ s.t. } x \in G \tag{9.1}$$

of a problem of minimizing a linear objective over a closed convex domain. Alternatively, we could use the conic form, i.e., a problem of minimizing a linear objective over the intersection of a convex cone and an affine plane; for the sake of definiteness, let us speak about the standard form.
In principle, (a) does not cause any difficulty: we know that both the standard and the conic problems are universal forms of convex programs.
(b) we should equip the domain/cone given by step (a) with a "computable" self-concordant barrier.
Sometimes we need something more - e.g., to apply the potential reduction methods, we are
interested in logarithmically homogeneous barriers, possibly, along with their Legendre trans-
formations, and to use the long-step path-following scheme, we need a barrier satisfying the
Structural assumption from Lecture 8.
Now, our current knowledge of the crucial issue of constructing self-concordant barriers is rather restricted. We know exactly 3 "basic" self-concordant barriers:

• (I) the 1-self-concordant barrier −ln x for the nonnegative axis (Example 3.1.2, Lecture 3);

• (II) the m-self-concordant barrier −ln Det x for the cone $\mathbf{S}^m_+$ of positive semidefinite m × m matrices (Exercise 3.3.3);

• (III) the 2-self-concordant barrier $-\ln(t^2 - x^T x)$ for the second-order cone $\{(t, x) \in \mathbb{R} \times \mathbb{R}^k \mid t \ge |x|_2\}$ (Example 5.3.2, Lecture 5).

Note that the latter two examples were not justified in the lectures, and it is not that easy to prove that (III) indeed is a self-concordant barrier for the second-order cone.
Given the aforementioned basic barriers, we can produce many other self-concordant barriers by applying the combination rules, namely, by taking sums of these barriers, their direct sums, and superpositions with affine mappings (Proposition 3.1.1, Lecture 3). These rules, although very simple, are surprisingly powerful. What should be mentioned first is that the rules allow us to treat all constraints defining the feasible set G separately. We mean the following. Normally the feasible set G is defined by a finite number m of constraints. Each of them defines its own feasible set $G_i$, so that the resulting feasible set G is the intersection of the $G_i$:

$$G = \bigcap_{i=1}^m G_i.$$

According to Proposition 3.1.1.(ii), in order to find a self-concordant barrier for G, it suffices to find similar barriers for all $G_i$ and then to take the sum of these "partial" barriers. Thus, we have at our disposal the Decomposition rule, which makes the problem of step (b) "separable with respect to constraints".
The next basic tool is the Substitution rule given by Proposition 3.1.1.(i):
In order to get a ϑ-self-concordant barrier F for a given convex domain G, it suffices to represent the domain as the inverse image, under a certain affine mapping A, of another domain, $G^+$, with a known ϑ-self-concordant barrier $F^+$:

$$G = A^{-1}(G^+) \equiv \{x \mid A(x) \in G^+\}$$

(the image of A should intersect the interior of $G^+$). Given such a representation, you can take as F the superposition
$$F(x) = F^+(A(x))$$
of $F^+$ and the mapping A.
The Decomposition and the Substitution rules, as applied to the particular self-concordant barriers (I) - (III), allow us to obtain barriers required by several important generic Convex Programming problems. For example, they immediately imply self-concordance of the standard logarithmic barrier
$$F(x) = -\sum_{i=1}^m \ln(b_i - a_i^T x)$$
for the polyhedral set
$$G = \{x \mid a_i^T x \le b_i,\ i = 1, ..., m\}.$$
This latter fact covers all needs of Linear Programming. Thus, we cannot say that we are completely unequipped; at the same time, our equipment is not too rich. Consider, for example, the problem of the best $|\cdot|_p$-approximation:
(Lp): given a sample $u_j \in \mathbb{R}^n$, j = 1, ..., N, of "regressors" along with the responses $v_j \in \mathbb{R}$, find the linear model
$$v = x^T u$$
which optimally fits the observations in the $|\cdot|_p$-norm, i.e., minimizes the quantity
$$f(x) = \sum_{j=1}^N |v_j - x^T u_j|^p.$$
In fact, the $|\cdot|_p$-criterion is $f^{1/p}(x)$, but of course it makes no difference whether we minimize f or $f^{1/p}$.
f(·) clearly is a convex function, so that our approximation problem is a convex program. In order to solve it by an interior point method, we can write the problem down in the standard form, which is immediate:

$$\text{minimize } t \ \text{ s.t. } (t, x) \in G = \{(t, x) \mid f(x) \le t\};$$

now we need a self-concordant barrier for G, and where can we get it?


At the beginning of the "interior point science" for nonlinear convex problems, we were forced to invent an "ad hoc" self-concordant barrier for each new domain we met. We then had to prove that the invented barrier actually is self-concordant, which in many cases required a lot of unpleasant computations. Recently it became clear that there is a very powerful technique for constructing self-concordant barriers. It allows us to obtain all previously known barriers, as well as a number of new ones, without any computations, "from nothing": more exactly, from the fact that the function −ln x is a 1-self-concordant barrier for the nonnegative half-axis. This technique is based on extending the Substitution rule by replacing affine mappings A by a wider family of certain nonlinear mappings. The essence of the matter is, of course, which nonlinear mappings A are appropriate for our goals. It is clear in advance that these cannot be arbitrary mappings, even smooth ones: at the very least, we should ensure convexity of $G = A^{-1}(G^+)$.

9.1 Appropriate mappings and Main Theorem


Let us fix a closed convex domain $G^+ \subset \mathbb{R}^N$. An important role in what follows is played by the recessive cone $R(G^+)$ of the domain, defined as
$$R(G^+) = \{h \in \mathbb{R}^N \mid u + th \in G^+\ \ \forall t \ge 0\ \forall u \in G^+\}.$$

It is immediately seen that $R(G^+)$ is a closed convex cone in $\mathbb{R}^N$.


Now we are able to define the family of mappings A appropriate for us.

Definition 9.1.1 Let $G^+ \subset \mathbb{R}^N$ be a closed convex domain, and let $K = R(G^+)$ be the recessive cone of $G^+$. A mapping
$$A(x): \mathrm{int}\,G^- \to \mathbb{R}^N,$$
defined and C³ smooth on the interior of a closed convex domain $G^- \subset \mathbb{R}^n$, is called β-appropriate for $G^+$ (here β ≥ 0) if
(i) A is concave with respect to K, i.e.,

$$D^2A(x)[h, h] \le_K 0 \quad \forall x \in \mathrm{int}\,G^-\ \forall h \in \mathbb{R}^n$$

(from now on we write $a \le_K b$ if $b - a \in K$);

(ii) A is compatible with $G^-$ in the sense that

$$D^3A(x)[h, h, h] \le_K -3\beta D^2A(x)[h, h]$$

whenever $x \in \mathrm{int}\,G^-$ and $x \pm h \in G^-$.

For example, an affine mapping $A: \mathbb{R}^n \to \mathbb{R}^N$, restricted to any closed convex domain $G^- \subset \mathbb{R}^n$, clearly is 0-appropriate for any $G^+ \subset \mathbb{R}^N$.
The definition of compatibility looks strange; its justification is that it works. Namely, there
is the following central result (it will be proved in Section 9.4):

Theorem 9.1.1 Let

• $G^+ \subset \mathbb{R}^N$ be a closed convex domain;

• $F^+$ be a $\vartheta^+$-self-concordant barrier for $G^+$;

• $A: \mathrm{int}\,G^- \to \mathbb{R}^N$ be a mapping β-appropriate for $G^+$;

• $F^-$ be a $\vartheta^-$-self-concordant barrier for $G^-$.

Assume that the set
$$G^0 = \{x \in \mathrm{int}\,G^- \mid A(x) \in \mathrm{int}\,G^+\}$$
is nonempty. Then $G^0$ is the interior of a closed convex domain

$$G \equiv \mathrm{cl}\,G^0,$$

and the function
$$F(x) = F^+(A(x)) + \max[1, \beta^2]F^-(x)$$
is a ϑ-self-concordant barrier for G, with

$$\vartheta = \vartheta^+ + \max[1, \beta^2]\vartheta^-.$$

The above Theorem resembles the Substitution rule: an affine mapping A in the latter rule can be replaced by a nonlinear mapping, which, however, should be appropriate for $G^+$. The substitution $F^+(\cdot) \mapsto F^+(A(\cdot))$ should then be accompanied by adding to the result a self-concordant barrier for the domain of A. Let us call this new rule "Substitution rule (N)" (nonlinear). To distinguish between this rule and the initial one, let us call the latter "Substitution rule (L)" (linear). In fact, Substitution rule (L) is a very particular case of Substitution rule (N). Indeed, an affine mapping A, as we know, is appropriate for any domain $G^+$. Since the domain of A is the whole $\mathbb{R}^n$, one can set $F^- \equiv 0$ (this is a 0-self-concordant barrier for $\mathbb{R}^n$), thus coming to the Substitution rule (L).

9.2 Barriers for epigraphs of functions of one variable


As an immediate consequence of the Substitution rule (N), we get a number of self-concordant
barriers for the epigraphs of functions on the axis. These barriers are given by the following
construction:
Proposition 9.2.1 Let f(t) be a 3 times continuously differentiable real-valued concave function on the ray {t > 0} such that

$$|f'''(t)| \le 3\beta t^{-1}|f''(t)|, \quad t > 0.$$

Then the function
$$F(x, t) = -\ln(f(t) - x) - \max[1, \beta^2]\ln t$$
is a $(1 + \max[1, \beta^2])$-self-concordant barrier for the 2-dimensional convex domain

$$G_f = \mathrm{cl}\{(x, t) \in \mathbb{R}^2 \mid t > 0,\ x \le f(t)\}.$$

Proposition 9.2.2 Let f(x) be a 3 times continuously differentiable real-valued convex function on the ray {x > 0} such that

$$|f'''(x)| \le 3\beta x^{-1} f''(x), \quad x > 0.$$

Then the function
$$F(t, x) = -\ln(t - f(x)) - \max[1, \beta^2]\ln x$$
is a $(1 + \max[1, \beta^2])$-self-concordant barrier for the 2-dimensional convex domain

$$G_f = \mathrm{cl}\{(t, x) \in \mathbb{R}^2 \mid x > 0,\ t \ge f(x)\}.$$

To prove Proposition 9.2.1, let us set

• $G^+ = \mathbb{R}_+$ [$K = \mathbb{R}_+$],

• $F^+(u) = -\ln u$ [$\vartheta^+ = 1$],

• $G^- = \{(x, t) \mid t \ge 0\}$,

• $F^-(x, t) = -\ln t$ [$\vartheta^- = 1$],

• $A(x, t) = f(t) - x$,

which results in
$$G = \mathrm{cl}\{(x, t) \mid t > 0,\ x \le f(t)\}.$$
The assumptions on f say exactly that A is β-appropriate for $G^+$, so that the conclusion in Proposition 9.2.1 is immediately given by Theorem 9.1.1.
To get Proposition 9.2.2, it suffices to apply Proposition 9.2.1 to the image of the set $G_f$ under the mapping $(x, t) \mapsto (t, -x)$.

Example 9.2.1 [epigraph of the increasing power function] Whenever p ≥ 1, the function

$$-\ln t - \ln(t^{1/p} - x)$$

is a 2-self-concordant barrier for the epigraph

$$\{(x, t) \in \mathbb{R}^2 \mid t \ge (x_+)^p \equiv [\max\{0, x\}]^p\}$$

of the power function $(x_+)^p$, and the function

$$-2\ln t - \ln(t^{2/p} - x^2)$$

is a 4-self-concordant barrier for the epigraph

$$\{(x, t) \in \mathbb{R}^2 \mid t \ge |x|^p\}$$

of the function $|x|^p$.

The result on the epigraph of $(x_+)^p$ is given by Proposition 9.2.1 with $f(t) = t^{1/p}$, $\beta = \frac{2p-1}{3p}$. To get the result on the epigraph of $|x|^p$, take the sum of the already known barriers for the epigraphs $E_+$, $E_-$ of the functions $(x_+)^p$ and $([-x]_+)^p$. This gives a barrier for $E_- \cap E_+$, which is exactly the epigraph of $|x|^p$.

Example 9.2.2 [epigraph of the decreasing power function] The function

$$\begin{cases} -\ln x - \ln(t - x^{-p}), & 0 < p \le 1,\\ -\ln t - \ln(x - t^{-1/p}), & p > 1 \end{cases}$$

is a 2-self-concordant barrier for the epigraph

$$\mathrm{cl}\{(x, t) \in \mathbb{R}^2 \mid t \ge x^{-p},\ x > 0\}$$

of the function $x^{-p}$, p > 0.

The case of 0 < p ≤ 1 is given by Proposition 9.2.2 applied with $f(x) = x^{-p}$, $\beta = \frac{2+p}{3}$. The case of p > 1 can be reduced to the former one by swapping x and t.

Example 9.2.3 [epigraph of the exponent] The function

$$-\ln t - \ln(\ln t - x)$$

is a 2-self-concordant barrier for the epigraph

$$\{(x, t) \in \mathbb{R}^2 \mid t \ge \exp\{x\}\}$$

of the exponent.

This follows from Proposition 9.2.1 applied with $f(t) = \ln t$, $\beta = \frac{2}{3}$.
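Example 9.2.3 can be spot-checked numerically, assuming the standard definition of self-concordance used in (9.4). Along random directions, the closed-form derivatives of s ↦ F(x + s h_x, t + s h_t) should satisfy |D³F| ≤ 2(D²F)^{3/2}. For the parameter 2, they should also satisfy (DF)² ≤ 2D²F. This is a sanity check on sampled points, not a proof, and the function name is hypothetical.

```python
import math
import random

def directional_derivatives(x, t, hx, ht):
    """First three derivatives at s = 0 of s -> F(x + s hx, t + s ht) for
    F(x, t) = -ln t - ln(ln t - x), computed in closed form."""
    # -ln(t + s ht) part
    a1, a2, a3 = -ht / t, ht**2 / t**2, -2 * ht**3 / t**3
    # g(s) = ln(t + s ht) - x - s hx and its derivatives at s = 0
    g = math.log(t) - x
    g1 = ht / t - hx
    g2 = -ht**2 / t**2
    g3 = 2 * ht**3 / t**3
    # derivatives of -ln g(s)
    b1 = -g1 / g
    b2 = -g2 / g + g1**2 / g**2
    b3 = -g3 / g + 3 * g1 * g2 / g**2 - 2 * g1**3 / g**3
    return a1 + b1, a2 + b2, a3 + b3

d1, d2, d3 = directional_derivatives(x=0.2, t=3.0, hx=0.6, ht=-0.8)
print(d1**2 / d2, abs(d3) / d2**1.5)   # should not exceed 2 and 2, respectively
```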

Example 9.2.4 [epigraph of the entropy function] The function

$$-\ln x - \ln(t - x\ln x)$$

is a 2-self-concordant barrier for the epigraph

$$\mathrm{cl}\{(x, t) \in \mathbb{R}^2 \mid t \ge x\ln x,\ x > 0\}$$

of the entropy function $x\ln x$.

This follows from Proposition 9.2.2 applied to $f(x) = x\ln x$, $\beta = \frac{1}{3}$.
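The values of β quoted in Examples 9.2.1 - 9.2.4 come from the ratio t|f'''(t)|/(3|f''(t)|). For these four functions, this ratio does not depend on t. The sketch below evaluates the ratio from hand-computed closed-form derivatives, so it checks the arithmetic rather than the differentiation. It uses the illustrative exponents p = 2.5 and p = 0.5. It also confirms that β ≤ 1 in each case, so the barrier parameter 1 + max[1, β²] equals 2.

```python
p_inc = 2.5   # exponent of the increasing power function (any p >= 1)
p_dec = 0.5   # exponent of the decreasing power function (0 < p <= 1)

# name: (f'', f''', beta claimed in the text)
examples = {
    "t^(1/p), Ex. 9.2.1": (
        lambda t: (1 / p_inc) * (1 / p_inc - 1) * t ** (1 / p_inc - 2),
        lambda t: (1 / p_inc) * (1 / p_inc - 1) * (1 / p_inc - 2) * t ** (1 / p_inc - 3),
        (2 * p_inc - 1) / (3 * p_inc),
    ),
    "x^(-p), Ex. 9.2.2": (
        lambda x: p_dec * (p_dec + 1) * x ** (-p_dec - 2),
        lambda x: -p_dec * (p_dec + 1) * (p_dec + 2) * x ** (-p_dec - 3),
        (2 + p_dec) / 3,
    ),
    "ln t, Ex. 9.2.3": (lambda t: -1 / t**2, lambda t: 2 / t**3, 2 / 3),
    "x ln x, Ex. 9.2.4": (lambda x: 1 / x, lambda x: -1 / x**2, 1 / 3),
}

def beta_ratio(f2, f3, t):
    """t |f'''(t)| / (3 |f''(t)|): the smallest beta for which
    |f'''(t)| <= 3 beta t^{-1} |f''(t)| holds at the point t."""
    return t * abs(f3(t)) / (3 * abs(f2(t)))

for name, (f2, f3, beta) in examples.items():
    ratios = [beta_ratio(f2, f3, t) for t in (0.1, 1.0, 7.3, 100.0)]
    print(name, [round(q, 12) for q in ratios], beta, 1 + max(1.0, beta**2))
```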

The indicated examples allow us to handle those constraints defining the feasible set G which are separable, i.e., are of the type
$$\sum_i f_i(a_i^T x + b_i) \le 0,$$
with each $f_i$ a convex function on the axis. To make this point clear, let us look at a typical example: the $|\cdot|_p$-approximation problem (Lp). Introducing N additional variables $t_i$, we can rewrite this problem equivalently as
$$\text{minimize } \sum_{i=1}^N t_i \ \text{ s.t. } t_i \ge |v_i - u_i^T x|^p,\ i = 1, ..., N,$$

so that now there are N "simple" constraints rather than a single "complicated" one.
Now, the feasible set of the i-th "simple" constraint is the inverse image of the epigraph of the function $|\cdot|^p$ under an affine mapping. Hence the feasible domain G of the reformulated problem admits the following explicit self-concordant barrier (Example 9.2.1 plus the usual Decomposition and Substitution rules):
$$F(t, x) = -\sum_{i=1}^N \left[\ln\left(t_i^{2/p} - (v_i - u_i^T x)^2\right) + 2\ln t_i\right]$$

with the parameter 4N.
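For the reformulated (Lp) problem, the parameter 4N can be spot-checked. For a ϑ-self-concordant barrier, (DF(z)[h])² ≤ ϑD²F(z)[h, h] in every direction h. The sketch computes the barrier and its first two directional derivatives in closed form at a strictly feasible point. It uses random data with p = 1.5; the matrix U stores the regressors $u_i$ as rows, and the function name is hypothetical.

```python
import numpy as np

def lp_barrier_derivs(t, x, U, v, p, dt, dx):
    """Value and first two directional derivatives of
    F(t, x) = -sum_i [ln(t_i^(2/p) - (v_i - u_i^T x)^2) + 2 ln t_i]
    at (t, x) along (dt, dx). Returns (inf, None, None) outside the domain."""
    r = v - U @ x                 # residuals r_i = v_i - u_i^T x
    dr = -U @ dx                  # their derivatives along the direction
    if np.any(t <= 0):
        return np.inf, None, None
    g = t ** (2 / p) - r ** 2     # must be > 0 inside the domain
    if np.any(g <= 0):
        return np.inf, None, None
    g1 = (2 / p) * t ** (2 / p - 1) * dt - 2 * r * dr
    g2 = (2 / p) * (2 / p - 1) * t ** (2 / p - 2) * dt ** 2 - 2 * dr ** 2
    val = -np.sum(np.log(g) + 2 * np.log(t))
    d1 = np.sum(-g1 / g - 2 * dt / t)
    d2 = np.sum(-g2 / g + g1 ** 2 / g ** 2 + 2 * dt ** 2 / t ** 2)
    return val, d1, d2

rng = np.random.default_rng(5)
N, n, p = 6, 3, 1.5
U = rng.standard_normal((N, n))
v = rng.standard_normal(N)
x = rng.standard_normal(n)
r = v - U @ x
t = np.abs(r) ** p + 0.5          # strictly feasible: t_i > |v_i - u_i^T x|^p
val, d1, d2 = lp_barrier_derivs(t, x, U, v, p, rng.standard_normal(N), rng.standard_normal(n))
print(val, d1**2 / d2, 4 * N)     # the ratio should not exceed 4N
```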



9.3 Fractional-Quadratic Substitution


Now let me indicate a truly remarkable nonlinear substitution: the fractional-quadratic one. The simplest form of this substitution is

$$A(\tau, \xi, \eta) = \tau - \frac{\xi^2}{\eta}$$

(ξ, η, τ are real variables and η > 0). The general case is obtained by "vectorization" of the numerator and the denominator and looks as follows.
Given are

• [numerator] a symmetric bilinear mapping

$$Q[\xi', \xi'']: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^N,$$

so that the coordinates $Q_i[\xi', \xi'']$ of the image are of the form

$$Q_i[\xi', \xi''] = (\xi')^T Q_i \xi''$$

with symmetric n × n matrices $Q_i$;

• [denominator] a symmetric n × n matrix A(η) affinely depending on a vector $\eta \in \mathbb{R}^q$.

The indicated data define the general fractional-quadratic mapping

$$A(\tau, \xi, \eta) = \tau - Q[A^{-1}(\eta)\xi, \xi]: \mathbb{R}^q_\eta \times \mathbb{R}^n_\xi \times \mathbb{R}^N_\tau \to \mathbb{R}^N.$$

It turns out that this mapping is, under reasonable restrictions, appropriate for domains in $\mathbb{R}^N$. To formulate the restrictions, note first that A is not necessarily everywhere defined, since the matrix A(η) may, for some η, be singular. Therefore it is reasonable to restrict η to vary in a certain closed convex domain $Y \subset \mathbb{R}^q_\eta$. Thus, from now on the mapping A is considered along with the domain Y where η varies. The conditions which ensure that A is appropriate for a given closed convex domain $G^+ \subset \mathbb{R}^N$ are as follows:
(A): A(η) is positive definite for η ∈ int Y;
(B): the bilinear form $Q[A^{-1}(\eta)\xi', \xi'']$ of ξ', ξ'' is symmetric in ξ', ξ'' for any η ∈ int Y;
(C): the quadratic form Q[ξ, ξ] takes its values in the recessive cone K of the domain $G^+$.

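Proposition 9.3.1 Under assumptions (A) - (C), the mappings

$$A(\tau, \xi, \eta) = \tau - Q[A^{-1}(\eta)\xi, \xi]: G^- \equiv Y \times \mathbb{R}^n_\xi \times \mathbb{R}^N_\tau \to \mathbb{R}^N$$

and
$$B(\xi, \eta) = -Q[A^{-1}(\eta)\xi, \xi]: G^- \equiv Y \times \mathbb{R}^n_\xi \to \mathbb{R}^N$$
are 1-appropriate for $G^+$.
In particular, if $F^+$ is a $\vartheta^+$-self-concordant barrier for $G^+$ and $F_Y$ is a $\vartheta_Y$-self-concordant barrier for Y, then
$$F(\tau, \xi, \eta) = F^+(\tau - Q[A^{-1}(\eta)\xi, \xi]) + F_Y(\eta)$$
is a $(\vartheta^+ + \vartheta_Y)$-self-concordant barrier for the closed convex domain

$$G = \mathrm{cl}\{(\tau, \xi, \eta) \mid \tau - Q[A^{-1}(\eta)\xi, \xi] \in \mathrm{int}\,G^+,\ \eta \in \mathrm{int}\,Y\}.$$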



The proof of the proposition is given in Section 9.5. What we are about to do now is to present
several examples.

Example 9.3.1 [epigraph of convex quadratic form] Let $f(x) = x^T P^T P x + b^T x + c$ be a convex quadratic form on $\mathbb{R}^n$; then the function

$$F(t, x) = -\ln(t - f(x))$$

is a 1-self-concordant barrier for the epigraph

$$\{(x, t) \in \mathbb{R}^n \times \mathbb{R} \mid t \ge f(x)\}.$$

Let the "fractional-quadratic" data be defined as follows:

• $G^+ = \mathbb{R}_+$ [N = 1];

• $Q[\xi', \xi''] = (\xi')^T\xi''$, $\xi', \xi'' \in \mathbb{R}^n$;

• $\mathbb{R}^q_\eta = \mathbb{R} = Y$, A(η) ≡ I
(from now on I stands for the identity operator).

Conditions (A) - (C) clearly are satisfied; Proposition 9.3.1 applied with

$$F^+(\tau) = -\ln\tau, \qquad F_Y(\cdot) \equiv 0$$

says that the function

$$F(\tau, \xi, \eta) = -\ln(\tau - \xi^T\xi)$$
is a 1-self-concordant barrier for the closed convex domain

$$G = \{(\tau, \xi, \eta) \mid \tau \ge \xi^T\xi\}.$$

The epigraph of the quadratic form f clearly is the inverse image of G under the affine mapping
$$(t, x) \mapsto \begin{pmatrix} \tau = t - b^T x - c \\ \xi = Px \\ \eta = 0 \end{pmatrix},$$

and it remains to apply the Substitution rule (L).


The result stated in the latter example is not difficult to establish directly, which can hardly be said about the following one.

Example 9.3.2 [barrier for the second-order cone] The function

$$F(t, x) = -\ln(t^2 - x^T x)$$

is a 2-logarithmically homogeneous self-concordant barrier for the second-order cone

$$K^n_2 = \{(t, x) \in \mathbb{R} \times \mathbb{R}^n \mid t \ge \sqrt{x^T x}\}.$$

Let the "fractional-quadratic" data be defined as follows:

• $G^+ = \mathbb{R}_+$ [N = 1];

• $Q[\xi', \xi''] = (\xi')^T\xi''$, $\xi', \xi'' \in \mathbb{R}^n$;

• $Y = \mathbb{R}_+ \subset \mathbb{R} \equiv \mathbb{R}^q_\eta$, A(η) ≡ ηI.

Conditions (A) - (C) clearly are satisfied; Proposition 9.3.1 applied with

$$F^+(\tau) = -\ln\tau, \qquad F_Y(\eta) = -\ln\eta$$

says that the function

$$F(\tau, \xi, \eta) = -\ln(\tau - \eta^{-1}\xi^T\xi) - \ln\eta \equiv -\ln(\tau\eta - \xi^T\xi)$$

is a 2-self-concordant barrier for the closed convex domain

$$G = \mathrm{cl}\{(\tau, \xi, \eta) \mid \tau > \eta^{-1}\xi^T\xi,\ \eta > 0\}.$$

The second-order cone $K^n_2$ clearly is the inverse image of G under the affine mapping
$$(t, x) \mapsto \begin{pmatrix} \tau = t \\ \xi = x \\ \eta = t \end{pmatrix},$$

and to prove that F(t, x) is a 2-self-concordant barrier for the second-order cone, it remains to apply the Substitution rule (L). Logarithmic homogeneity of F(t, x) is evident.
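Example 9.3.2 can be checked numerically on random data (the names are illustrative). 2-logarithmic homogeneity means F(λz) = F(z) − 2 ln λ, which implies DF(z)[z] = −2 and D²F(z)[z, z] = 2. The parameter 2 means (DF(z)[h])² ≤ 2D²F(z)[h, h] for every direction h.

```python
import numpy as np

def soc_barrier(t, x):
    """F(t, x) = -ln(t^2 - x^T x); +inf outside the interior t > |x|_2 of the cone."""
    gap = t * t - x @ x
    return np.inf if t <= 0 or gap <= 0 else -np.log(gap)

def soc_grad_hess(t, x):
    """Gradient and Hessian of F with respect to z = (t, x)."""
    z = np.concatenate(([t], x))
    gap = t * t - x @ x
    dgap = np.concatenate(([2 * t], -2 * x))                      # gradient of t^2 - x^T x
    d2gap = np.diag(np.concatenate(([2.0], -2.0 * np.ones_like(x))))
    grad = -dgap / gap
    hess = -d2gap / gap + np.outer(dgap, dgap) / gap**2
    return z, grad, hess

rng = np.random.default_rng(6)
x = rng.standard_normal(4)
t = np.linalg.norm(x) + 1.0        # an interior point of the cone
z, g, H = soc_grad_hess(t, x)
print(g @ z, z @ H @ z)            # -2 and 2 up to rounding
```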
The next example originally required a roughly 15-page "brute force" justification, which was by far more complicated than the justification of the general results presented in this lecture.
Example 9.3.3 [epigraph of the spectral norm of a matrix] The function

$$F(t, x) = -\ln\mathrm{Det}(tI - t^{-1}x^T x) - \ln t$$

is an (m + 1)-logarithmically homogeneous self-concordant barrier for the epigraph

$$\{(t, x) \mid t \in \mathbb{R},\ x \text{ is a } k \times m \text{ matrix of spectral norm} \le t\}$$

of the spectral norm of a k × m matrix x¹.

¹ The spectral norm of a k × m matrix x is the square root of the maximum eigenvalue of the matrix $x^T x$. Equivalently, it is the norm
$$\max\{|x\xi|_2 \mid \xi \in \mathbb{R}^m,\ |\xi|_2 \le 1\}$$
of the linear operator from $\mathbb{R}^m$ into $\mathbb{R}^k$ given by x.

Let the "fractional-quadratic" data be defined as follows:

• $G^+ = \mathbf{S}^m_+$ is the cone of positive semidefinite m × m matrices [N = m(m + 1)/2];

• $Q[\xi', \xi''] = \frac{1}{2}[(\xi')^T\xi'' + (\xi'')^T\xi']$, where ξ', ξ'' are k × m matrices;

• $Y = \mathbb{R}_+ \subset \mathbb{R} \equiv \mathbb{R}^q_\eta$, A(η) ≡ ηI.

Conditions (A) - (C) clearly are satisfied; Proposition 9.3.1 applied with

$$F^+(\tau) = -\ln\mathrm{Det}\,\tau, \qquad F_Y(\eta) = -\ln\eta$$

says that the function

$$F(\tau, \xi, \eta) = -\ln\mathrm{Det}(\tau - \eta^{-1}\xi^T\xi) - \ln\eta$$

is an (m + 1)-self-concordant barrier for the closed convex domain

$$G = \mathrm{cl}\{(\tau, \xi, \eta) \mid \tau - \eta^{-1}\xi^T\xi \in \mathrm{int}\,\mathbf{S}^m_+,\ \eta > 0\}.$$

The spectral norm of a k × m matrix x is < t if and only if the maximum eigenvalue of the matrix $x^T x$ is < t². Equivalently, the m × m matrix $tI - t^{-1}x^T x$ is positive definite. Thus, the epigraph of the spectral norm of x is the inverse image of G under the affine mapping
$$(t, x) \mapsto \begin{pmatrix} \tau = tI \\ \xi = x \\ \eta = t \end{pmatrix},$$

and to prove that F(t, x) is an (m + 1)-self-concordant barrier for the epigraph of the spectral norm, it suffices to apply the Substitution rule (L). The logarithmic homogeneity of F(t, x) is evident.
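Two facts from Example 9.3.3 can be checked numerically. The first is the equivalence used at the end of the example: the spectral norm of x is below t if and only if $tI - t^{-1}x^Tx$ is positive definite. The second is the (m + 1)-logarithmic homogeneity. In this NumPy sketch, `np.linalg.norm(x, 2)` returns the largest singular value of a matrix x. The data are random and the function name is hypothetical.

```python
import numpy as np

def spectral_epi_barrier(t, x):
    """F(t, x) = -ln Det(t I - t^{-1} x^T x) - ln t for a k x m matrix x;
    +inf outside the interior {(t, x) : spectral norm of x < t}."""
    if t <= 0:
        return np.inf
    m = x.shape[1]
    w = np.linalg.eigvalsh(t * np.eye(m) - x.T @ x / t)
    return np.inf if w.min() <= 0 else -np.sum(np.log(w)) - np.log(t)

rng = np.random.default_rng(7)
k, m = 3, 2
x = rng.standard_normal((k, m))
s = np.linalg.norm(x, 2)           # spectral norm = largest singular value of x
t = 1.5 * s
print(s, spectral_epi_barrier(1.01 * s, x), spectral_epi_barrier(0.99 * s, x))
```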

The indicated examples of self-concordant barriers are sufficient for applications which will
be our goal in the remaining lectures; at the same time, these examples explain how to use the
general results of the lecture to obtain barriers for other convex domains.

9.4 Proof of Theorem 9.1.1
A. Let us prove that $G^0$ is an open convex domain in $\mathbb{R}^n$. Indeed, since A is continuous on int $G^-$, $G^0$ clearly is open. Thus, all we need is to demonstrate that $G^0$ is convex. Let $x', x'' \in G^0$, so that x', x'' are in int $G^-$ and $y' = A(x')$, $y'' = A(x'')$ are in int $G^+$, and let α ∈ [0, 1]. We should prove that $x \equiv \alpha x' + (1 - \alpha)x'' \in G^0$, i.e., that $x \in \mathrm{int}\,G^-$ (which is evident) and that $y = A(x) \in \mathrm{int}\,G^+$. To prove the latter inclusion, it suffices to demonstrate that

$$y \ge_K \alpha y' + (1 - \alpha)y''. \tag{9.2}$$

Indeed, the right hand side in this inequality belongs to int $G^+$ together with y', y''. Since K is the recessive cone of $G^+$, the translation of any vector from int $G^+$ by a vector from K also belongs to int $G^+$. So (9.2), which says that y is a translation of the right hand side by a direction from K, would imply that $y \in \mathrm{int}\,G^+$.
To prove (9.2) is the same as to demonstrate that

$$s^T y \ge s^T(\alpha y' + (1 - \alpha)y'') \tag{9.3}$$

for any $s \in K^* \equiv \{s \mid s^T u \ge 0\ \forall u \in K\}$ (why?). But (9.3) is immediate: the real-valued function
$$f(z) = s^T A(z)$$
is concave on int $G^-$, since $D^2A(z)[h, h] \le_K 0$ (Definition 9.1.1.(i)) and, consequently,

$$D^2 f(z)[h, h] = s^T D^2A(z)[h, h] \le 0$$

(recall that $s \in K^*$). Since f(z) is concave, we have

$$s^T y = f(\alpha x' + (1 - \alpha)x'') \ge \alpha f(x') + (1 - \alpha)f(x'') = \alpha s^T y' + (1 - \alpha)s^T y'',$$

as required in (9.3).

B. Now let us prove self-concordance of F. To this end, let us fix $x \in G^0$ and $h \in \mathbb{R}^n$ and verify that
$$|D^3F(x)[h, h, h]| \le 2\{D^2F(x)[h, h]\}^{3/2}, \tag{9.4}$$
$$|DF(x)[h]| \le \vartheta^{1/2}\{D^2F(x)[h, h]\}^{1/2}. \tag{9.5}$$
B.1. Let us start by writing down the derivatives of F. With the notation

$$a = A(x),\quad a' = DA(x)[h],\quad a'' = D^2A(x)[h, h],\quad a''' = D^3A(x)[h, h, h],$$

we have
$$DF(x)[h] = DF^+(a)[a'] + \gamma^2 DF^-(x)[h], \qquad \gamma = \max[1, \beta], \tag{9.6}$$
$$D^2F(x)[h, h] = D^2F^+(a)[a', a'] + DF^+(a)[a''] + \gamma^2 D^2F^-(x)[h, h], \tag{9.7}$$
$$D^3F(x)[h, h, h] = D^3F^+(a)[a', a', a'] + 3D^2F^+(a)[a', a''] + DF^+(a)[a'''] + \gamma^2 D^3F^-(x)[h, h, h]. \tag{9.8}$$
B.2. Now let us summarize our knowledge of the quantities involved in (9.6) - (9.8).
Since $F^+$ is a $\vartheta^+$-self-concordant barrier, we have
$$|DF^+(a)[a']| \le p\sqrt{\vartheta^+}, \qquad p \equiv \sqrt{D^2F^+(a)[a', a']}, \tag{9.9}$$

$$|D^3F^+(a)[a', a', a']| \le 2p^3. \tag{9.10}$$

Similarly, since $F^-$ is a $\vartheta^-$-self-concordant barrier, we have
$$|DF^-(x)[h]| \le q\sqrt{\vartheta^-}, \qquad q \equiv \sqrt{D^2F^-(x)[h, h]}, \tag{9.11}$$

$$|D^3F^-(x)[h, h, h]| \le 2q^3. \tag{9.12}$$

Besides this, from Corollary 3.2.1 (Lecture 3) we know that $DF^+(a)[\cdot]$ is nonpositive on the recessive directions of $G^+$:
$$DF^+(a)[g] \le 0, \quad g \in K, \tag{9.13}$$
and even that
$$\{D^2F^+(a)[g, g]\}^{1/2} \le -DF^+(a)[g], \quad g \in K. \tag{9.14}$$
B.3. Let us prove that
$$3\beta q\,a'' \le_K a''' \le_K -3\beta q\,a''. \tag{9.15}$$
Indeed, let a real t be such that |t|q ≤ 1, and let $h_t = th$. Then $D^2F^-(x)[h_t, h_t] = t^2q^2 \le 1$ and, consequently, $x \pm h_t \in G^-$ (I., Lecture 2). Therefore Definition 9.1.1.(ii) implies that

$$t^3 a''' \equiv D^3A(x)[h_t, h_t, h_t] \le_K -3\beta D^2A(x)[h_t, h_t] \equiv -3\beta t^2 a''.$$

Since the inequality $t^3 a''' \le_K -3\beta t^2 a''$ is valid for all t with |t|q ≤ 1, (9.15) follows.
Note that from (9.13) and (9.15) it follows that the quantity
$$r \equiv \sqrt{DF^+(a)[a'']} \tag{9.16}$$

is well-defined and is such that

$$|DF^+(a)[a''']| \le 3\beta q r^2. \tag{9.17}$$
Besides this, by Cauchy's inequality
$$|D^2F^+(a)[a', a'']| \le \sqrt{D^2F^+(a)[a', a']}\,\sqrt{D^2F^+(a)[a'', a'']} \le p r^2 \tag{9.18}$$

(the concluding inequality follows from (9.14)).


B.4. Substituting (9.9), (9.11) into (9.6), we come to
$$|DF(x)[h]| \le p\sqrt{\vartheta^+} + q\gamma^2\sqrt{\vartheta^-}; \tag{9.19}$$

substituting (9.16) into (9.7), we get

$$D^2F(x)[h, h] = p^2 + r^2 + \gamma^2 q^2, \tag{9.20}$$

while substituting (9.10), (9.12), (9.17), (9.18) into (9.8), we obtain

$$|D^3F(x)[h, h, h]| \le 2\left[p^3 + \tfrac{3}{2}p r^2 + \tfrac{3}{2}\beta q r^2\right] + 2\gamma^2 q^3. \tag{9.21}$$

By passing from q to s = γq, we come to the inequalities
$$|DF(x)[h]| \le \sqrt{\vartheta^+}\,p + \sqrt{\vartheta^-}\,\gamma s, \qquad D^2F(x)[h, h] = p^2 + r^2 + s^2,$$

and
$$|D^3F(x)[h, h, h]| \le 2\left[p^3 + \tfrac{3}{2}p r^2 + \tfrac{3\beta}{2\gamma}s r^2\right] + 2\gamma^{-1}s^3 \le 2\left[p^3 + s^3 + \tfrac{3}{2}r^2(p + s)\right] \le 2[p^2 + r^2 + s^2]^{3/2}.$$
The second inequality uses γ ≥ β and γ ≥ 1; the third is a straightforward computation. Thus,
$$|DF(x)[h]| \le \sqrt{\vartheta^+ + \gamma^2\vartheta^-}\,\{D^2F(x)[h, h]\}^{1/2}, \qquad |D^3F(x)[h, h, h]| \le 2\{D^2F(x)[h, h]\}^{3/2}. \tag{9.22}$$
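One way to carry out the "straightforward computation" is to set σ = p + s and w = r² − 2ps. Then p² + r² + s² = σ² + w and p³ + s³ + (3/2)r²(p + s) = σ³ + (3/2)σw. The claim thus becomes the tangent-line inequality $(\sigma^2 + w)^{3/2} \ge \sigma^3 + \tfrac{3}{2}\sigma w$ for the convex function $u \mapsto u^{3/2}$ at u = σ², with equality exactly when r² = 2ps. The NumPy snippet below spot-checks the identities and the inequality on random nonnegative samples; it is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
p, r, s = rng.exponential(size=(3, 100_000))      # random nonnegative triples

lhs = p**3 + s**3 + 1.5 * r**2 * (p + s)
rhs = (p**2 + r**2 + s**2) ** 1.5

# the identities behind the inequality
sigma, w = p + s, r**2 - 2 * p * s
assert np.allclose(lhs, sigma**3 + 1.5 * sigma * w)
assert np.allclose(p**2 + r**2 + s**2, sigma**2 + w)

# the equality case r^2 = 2 p s
r_eq = np.sqrt(2 * p * s)
gap_eq = (p**2 + r_eq**2 + s**2) ** 1.5 - (p**3 + s**3 + 1.5 * r_eq**2 * (p + s))

print(((rhs - lhs) / rhs).min(), (np.abs(gap_eq) / (1 + sigma**3)).max())
```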

C. (9.22) says that F satisfies the differential inequalities required by the definition of a $(\vartheta^+ + \gamma^2\vartheta^-)$-self-concordant barrier for $G = \mathrm{cl}\,G^0$. To complete the proof, we should demonstrate that F is a barrier for G, i.e., that $F(x_i) \to \infty$ whenever $x_i \in G^0$ are such that $x \equiv \lim_i x_i \in \partial G$. To prove the latter statement, set
$$y_i = A(x_i)$$
and consider two possible cases:
C.1: $x \in \mathrm{int}\,G^-$;
C.2: $x \in \partial G^-$.
In the easy case C.1, there exists $y = \lim_i y_i = A(x)$, since A is continuous on the interior of $G^-$ and, consequently, in a neighbourhood of x. Since $x \notin G^0$, we have $y \notin \mathrm{int}\,G^+$. So the sequence $y_i$, comprised of interior points of $G^+$, converges to a boundary point of $G^+$, and therefore $F^+(y_i) \to \infty$. Since $x_i$ converge to an interior point of $G^-$, the sequence $F^-(x_i)$ is bounded. Hence the sequence $F(x_i) = F^+(y_i) + \gamma^2F^-(x_i)$ diverges to +∞, as required.
Now consider the more difficult case when $x \in \partial G^-$. Here we know that $F^-(x_i) \to \infty$, since $x_i$ converge to a boundary point of the domain $G^-$ for which $F^-$ is a self-concordant barrier. In order to prove that $F(x_i) \equiv F^+(y_i) + \gamma^2F^-(x_i) \to \infty$, it therefore suffices to prove that the sequence $F^+(y_i)$ is bounded from below. Let us fix a point $x^0 \in G^0$. From concavity of A we have (compare with A.)

$$y_i = A(x_i) \le_K A(x^0) + DA(x^0)[x_i - x^0] \equiv z_i,$$

whence, by Corollary 3.2.1, Lecture 3,

$$F^+(y_i) \ge F^+(z_i).$$

Now boundedness from below of $F^+(y_i)$ is an immediate consequence of the fact that the sequence $F^+(z_i)$ is bounded from below. Indeed, $\{x_i\}$ is a bounded sequence, and consequently its image $\{z_i\}$ under an affine mapping also is bounded; and the convex function $F^+$ is bounded from below on any bounded subset of its domain.

9.5 Proof of Proposition 9.3.1
A. Look at the definition of an appropriate mapping and take into account that B is the restriction of A onto the cross-section of the domain of A by the affine plane τ = 0. We immediately conclude that it suffices to prove that A is 1-appropriate for $G^+$. Of course, A is 3 times continuously differentiable on the interior of $G^-$.
B. The coordinates of the vector $Q[A^{-1}(\eta)\xi', \xi'']$ are the bilinear forms $(\xi')^T A^{-1}(\eta)Q_i\xi''$ of ξ', ξ''. By (B), they are symmetric in ξ', ξ'', so that the matrices $A^{-1}(\eta)Q_i$ are symmetric. Since both $A^{-1}(\eta)$ and $Q_i$ are symmetric, their product can be symmetric if and only if the matrices commute. So $A^{-1}(\eta)$, and hence A(η), commutes with $Q_i$ for all η from the open set int Y. Since A(·) is affine, A(η) commutes with $Q_i$ for all η. Thus, we come to the following crucial conclusion:
for every i ≤ N, the matrix A(η) commutes with $Q_i$ for all η.
C. Let us compute the derivatives of A at a point $X = (\tau, \xi, \eta) \in \mathrm{int}\,G^-$ in a direction Ξ = (t, x, y). In what follows, the subscript i marks the i-th coordinate of a vector from $\mathbb{R}^N$. Note that from B. it follows that $Q_i$ commutes with $\alpha(\cdot) \equiv A^{-1}(\cdot)$ and therefore with all derivatives of α(·). With this observation, we immediately obtain

$$A_i(X) = \tau_i - \xi^T\alpha(\eta)Q_i\xi;$$

$$DA_i(X)[\Xi] = t_i - 2x^T\alpha(\eta)Q_i\xi - \xi^T[D\alpha(\eta)[y]]Q_i\xi;$$

$$D^2A_i(X)[\Xi, \Xi] = -2x^T\alpha(\eta)Q_i x - 4x^T[D\alpha(\eta)[y]]Q_i\xi - \xi^T[D^2\alpha(\eta)[y, y]]Q_i\xi;$$
$$D^3A_i(X)[\Xi, \Xi, \Xi] = -6x^T[D\alpha(\eta)[y]]Q_i x - 6x^T[D^2\alpha(\eta)[y, y]]Q_i\xi - \xi^T[D^3\alpha(\eta)[y, y, y]]Q_i\xi.$$
Now, denoting
$$\alpha = \alpha(\eta), \qquad a' = DA(\eta)[y], \tag{9.23}$$
we immediately get
$$D\alpha(\eta)[y] = -\alpha a'\alpha, \qquad D^2\alpha(\eta)[y, y] = 2\alpha a'\alpha a'\alpha,$$
$$D^3\alpha(\eta)[y, y, y] = -6\alpha a'\alpha a'\alpha a'\alpha.$$
Substituting these expressions for the derivatives of α(·) into the expressions for the derivatives of $A_i$, we come to
$$D^2A_i(X)[\Xi, \Xi] = -2\zeta^T\alpha Q_i\zeta, \qquad \zeta = x - a'\alpha\xi, \tag{9.24}$$
and
$$D^3A_i(X)[\Xi, \Xi, \Xi] = 6\zeta^T\alpha a'\alpha Q_i\zeta. \tag{9.25}$$
The simplest way to see why "we come to" is to substitute the expression for ζ into the latter two right hand sides and to open the parentheses. Here one takes into account that α and a' are symmetric and commute with $Q_i$.

D. Now we are basically done. First, α commutates with Qi and is positive definite in view
of condition (A) (since α = A−1 (η) and η ∈ int Y ). It follows that α1/2 also commutates with
Qi , so that (9.24) can be rewritten as
√ √
D2 Ai (X)[Ξ] = −2[ αζ]T Qi [ αζ],

which means that
$$D^2A(X)[\Xi, \Xi] = -2Q[\omega, \omega]$$
for a certain vector $\omega$ (namely, $\omega = \alpha^{1/2}\zeta$), so that
$$D^2A(X)[\Xi, \Xi] \le_K 0$$
according to (C). Thus, $A$ is concave with respect to the recessive cone $K$ of the domain $G^+$, as required by item (i) of Definition 9.1.1.
It remains to verify item (ii) of the Definition for the case $\beta = 1$, i.e., to prove that
$$D^3A(X)[\Xi, \Xi, \Xi] + 3D^2A(X)[\Xi, \Xi] \le_K 0$$

whenever $\Xi$ is such that $X \pm \Xi \in G^-$. This latter inclusion means that $\eta \pm y \in Y$, so $A(\eta \pm y)$ is positive semidefinite. Since $A(\cdot)$ is affine, we conclude that
$$B = A(\eta) - DA(\eta)[y] \equiv \alpha^{-1} - a' \ge 0$$

(as always, $\ge 0$ for symmetric matrices stands for "positive semidefinite"), whence also
$$\gamma = \alpha[\alpha^{-1} - a']\alpha \ge 0.$$

From (9.24), (9.25) it follows that
$$D^3A_i(X)[\Xi, \Xi, \Xi] + 3D^2A_i(X)[\Xi, \Xi] = -6\zeta^T\gamma Q_i\zeta.$$
The matrix $\gamma$ is positive semidefinite and, due to its origin, commutes with $Q_i$ (since $\alpha$ and $a'$ do). Therefore $\zeta^T\gamma Q_i\zeta = \zeta^T\gamma^{1/2}Q_i\gamma^{1/2}\zeta$, so that
$$D^3A(X)[\Xi, \Xi, \Xi] + 3D^2A(X)[\Xi, \Xi] = -6Q[\gamma^{1/2}\zeta, \gamma^{1/2}\zeta] \le_K 0$$
(the concluding inequality follows from (C)).
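The conclusion of D can be probed numerically on a hypothetical diagonal instance. Take $A(\eta) = \operatorname{Diag}(\eta)$ and $Q_i = \operatorname{Diag}(q_i)$ with $q_i \ge 0$; then $Q[\omega, \omega] = (\omega^TQ_i\omega)_i \ge 0$, so $K = \mathbb{R}^N_+$ satisfies (C). This setup is an assumption made for illustration. The sketch relies on the closed forms (9.24), (9.25). It samples directions with $\eta \pm y \in Y = \mathbb{R}^n_+$ and checks that $D^3A + 3D^2A \le_K 0$. It also shows that the inequality may fail once $\eta - y$ leaves $Y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 5, 3
# Hypothetical instance: A(eta) = Diag(eta), Q_i = Diag(q[i]) with q >= 0,
# so Q[w, w] = (w^T Q_i w)_i lies in K = R^N_+ and condition (C) holds for this K.
q = rng.uniform(0.5, 2.0, size=(N, n))
eta = rng.uniform(1.0, 2.0, size=n)

def d2_d3(xi, x, y):
    """Closed forms (9.24), (9.25) with alpha = Diag(1/eta), a' = Diag(y)."""
    zeta = x - y * xi / eta
    return -2 * q @ (zeta**2 / eta), 6 * q @ (zeta**2 * y / eta**2)

worst = -np.inf
for _ in range(1000):
    xi, x = rng.standard_normal(n), rng.standard_normal(n)
    y = eta * rng.uniform(-1.0, 1.0, size=n)      # keeps eta +- y in Y = R^n_+
    d2, d3 = d2_d3(xi, x, y)
    worst = max(worst, float(np.max(d3 + 3 * d2)))
print("largest entry of D3 + 3 D2 over the samples:", worst)

# If eta - y leaves Y, the inequality can fail:
xi, x, y = np.zeros(n), np.eye(n)[0], 3 * eta[0] * np.eye(n)[0]
d2, d3 = d2_d3(xi, x, y)
violation = float(np.max(d3 + 3 * d2))
print("D3 + 3 D2 with y_0 = 3 eta_0:", violation)
```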



9.6 Exercises on constructing self-concordant barriers


The goal of the exercises below is to derive some new self-concordant barriers.

9.6.1 Epigraphs of functions of Euclidean norm


Exercise 9.6.1 #+ Let $G^+$ be a closed convex domain in $\mathbb{R}^2$ which contains a point with both coordinates positive and is "antimonotone in the x-direction", i.e., such that $(u, s) \in G^+ \Rightarrow (v, s) \in G^+$ whenever $v \le u$. Let $F^+$ be a $\vartheta^+$-self-concordant barrier for $G^+$. Prove that

1) The function
$$F^1(t, x) = F^+(x^Tx, t)$$
is a $\vartheta^+$-self-concordant barrier for the closed convex domain
$$G^1 = \{(x, t) \in \mathbb{R}^n \times \mathbb{R} \mid (x^Tx, t) \in G^+\}.$$
Derive from this observation that if $p \ge 2$, then the function
$$F(t, x) = -\ln(t^{2/p} - x^Tx) - \ln t$$
is a 2-self-concordant barrier for the epigraph of the function $|x|_2^p$ on $\mathbb{R}^n$.

2) The function
$$F^2(t, x) = F^+\Big(\frac{x^Tx}{t}, t\Big) - \ln t$$
is a $(\vartheta^+ + 1)$-self-concordant barrier for the closed convex domain
$$G^2 = \operatorname{cl}\Big\{(x, t) \in \mathbb{R}^n \times \mathbb{R} \;\Big|\; \Big(\frac{x^Tx}{t}, t\Big) \in G^+,\ t > 0\Big\}.$$
Derive from this observation that if $1 \le p \le 2$, then the function
$$F(t, x) = -\ln(t^{2/p} - x^Tx) - \ln t$$
is a 3-self-concordant barrier for the epigraph of the function $|x|_2^p$ on $\mathbb{R}^n$.

9.6.2 How to guess that − ln Det x is a self-concordant barrier


Our current knowledge of concrete self-concordant barriers is as follows. We know two "building blocks": the barrier $-\ln t$ for the nonnegative half-axis and the barrier $-\ln \operatorname{Det} x$ for the cone of positive semidefinite symmetric matrices. That these barriers are self-concordant was justified by straightforward computation, completely trivial for the former and not that difficult for the latter. All other self-concordant barriers were obtained from these two via the Substitution rule (N). It turns out that the barrier $-\ln \operatorname{Det} x$ can not only be guessed but also derived from the barrier $-\ln t$ via the same Substitution rule (N), so in fact only one barrier has to be guessed.

Exercise 9.6.2 #

1) Let
$$A = \begin{pmatrix} \tau & \xi^T \\ \xi & \eta \end{pmatrix}$$
be a symmetric matrix ($\tau$ is $p \times p$, $\eta$ is $q \times q$). Prove that $A$ is positive definite if and only if both matrices $\eta$ and $\tau - \xi^T\eta^{-1}\xi$ are positive definite. In other words, the cone $S^{p+q}_+$ of positive semidefinite symmetric $(p+q) \times (p+q)$ matrices is the inverse image $G$, in terms of the Substitution rule (N), of the cone $G^+ = S^p_+$ under the fractional-quadratic mapping
$$\mathcal{A} : (\tau, \xi, \eta) \mapsto \tau - \xi^T\eta^{-1}\xi$$
with the domain of the mapping $\{(\tau, \xi, \eta) \mid \eta \in Y \equiv S^q_+\}$.
2) Applying Proposition 9.3.1, derive from 1) that if $F_p$ and $F_q$ are self-concordant barriers for $S^p_+$, $S^q_+$ with parameters $\vartheta_p$, $\vartheta_q$, respectively, then the function
$$F(A) \equiv F(\tau, \xi, \eta) = F_p(\tau - \xi^T\eta^{-1}\xi) + F_q(\eta)$$
is a $(\vartheta_p + \vartheta_q)$-self-concordant barrier for $S^{p+q}_+$.
3) Use the observation that $-\ln \eta$ is a 1-self-concordant barrier for $S^1_+ \equiv \mathbb{R}_+$ to prove by induction on $p$ that $F_p(x) = -\ln \operatorname{Det} x$ is a $p$-self-concordant barrier for $S^p_+$.
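Here is a quick numerical illustration of 1) and of the determinant identity that drives 2) and 3). For a random positive definite matrix (hypothetical data), positive definiteness is equivalent to positive definiteness of both $\eta$ and the Schur complement $\tau - \xi^T\eta^{-1}\xi$. Moreover, $-\ln \operatorname{Det} A = -\ln \operatorname{Det}(\tau - \xi^T\eta^{-1}\xi) - \ln \operatorname{Det}\eta$. This is only a sanity check under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 2

def is_pd(S):
    return bool(np.all(np.linalg.eigvalsh(S) > 0))

def neg_logdet(S):
    _, logabsdet = np.linalg.slogdet(S)
    return -logabsdet

def schur(A):
    """tau - xi^T eta^{-1} xi for A = [[tau, xi^T], [xi, eta]], tau of size p x p."""
    tau, xi, eta = A[:p, :p], A[p:, :p], A[p:, p:]
    return tau - xi.T @ np.linalg.solve(eta, xi)

# Hypothetical positive definite (p+q) x (p+q) matrix
M = rng.standard_normal((p + q, p + q))
A = M @ M.T + 0.1 * np.eye(p + q)
eta = A[p:, p:]
print(is_pd(A), is_pd(eta) and is_pd(schur(A)))                # True True
print(neg_logdet(A), neg_logdet(schur(A)) + neg_logdet(eta))   # equal up to rounding

# Destroying positive definiteness through the tau block is detected by the Schur complement
B = A.copy()
B[0, 0] -= 1000.0
print(is_pd(B), is_pd(B[p:, p:]) and is_pd(schur(B)))          # False False
```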

9.6.3 ”Fractional-quadratic” cone and Truss Topology Design


Consider the following hybrid of the second-order cone and the cone $S^{\cdot}_+$. Let $\xi_1, \dots, \xi_q$ be variable matrices of sizes $n_1 \times m, \dots, n_q \times m$, and let $\tau$ be an $m \times m$ variable matrix. Let $y_j(\eta)$, $j = 1, \dots, q$, be symmetric $n_j \times n_j$ matrices which are linear homogeneous functions of $\eta \in \mathbb{R}^k$. Let $Y$ be a cone in $\mathbb{R}^k$ (closed, convex and with a nonempty interior) such that the $y_j(\eta)$ are positive definite when $\eta \in \operatorname{int} Y$.
Consider the set
$$K = \operatorname{cl}\{(\tau; \eta; \xi_1, \dots, \xi_q) \mid \tau \ge \xi_1^Ty_1^{-1}(\eta)\xi_1 + \dots + \xi_q^Ty_q^{-1}(\eta)\xi_q,\ \eta \in \operatorname{int} Y\}.$$
Let also $F_Y(\eta)$ be a $\vartheta_Y$-self-concordant barrier for $Y$.
Exercise 9.6.3 + Prove that $K$ is a closed convex cone with a nonempty interior, and that the function
$$F(\tau; \eta; \xi_1, \dots, \xi_q) = -\ln \operatorname{Det}\big(\tau - \xi_1^Ty_1^{-1}(\eta)\xi_1 - \dots - \xi_q^Ty_q^{-1}(\eta)\xi_q\big) + F_Y(\eta) \tag{9.26}$$
is an $(m + \vartheta_Y)$-self-concordant barrier for $K$. This barrier is logarithmically homogeneous if $F_Y$ is.


Prove that $K$ is the inverse image of the cone $S^N_+$ of positive semidefinite $N \times N$ symmetric matrices, $N = m + n_1 + \dots + n_q$, under the linear homogeneous mapping
$$\mathcal{L} : (\tau; \eta; \xi_1, \dots, \xi_q) \mapsto \begin{pmatrix} \tau & \xi_1^T & \xi_2^T & \xi_3^T & \cdots & \xi_q^T \\ \xi_1 & y_1(\eta) & & & & \\ \xi_2 & & y_2(\eta) & & & \\ \xi_3 & & & y_3(\eta) & & \\ \vdots & & & & \ddots & \\ \xi_q & & & & & y_q(\eta) \end{pmatrix}$$
(blank space corresponds to zero blocks). Is the barrier (9.26) the barrier induced, via the Substitution rule (L), by the mapping $\mathcal{L}$ and the standard barrier $-\ln \operatorname{Det}(\cdot)$ for $S^N_+$?
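The inverse-image claim rests on a Schur-complement identity for the arrow-shaped block matrix above. For positive definite $y_j(\eta)$, its determinant equals $\operatorname{Det}\big(\tau - \sum_j \xi_j^Ty_j^{-1}(\eta)\xi_j\big)\prod_j \operatorname{Det} y_j(\eta)$. The Python/NumPy sketch below checks this identity numerically for hypothetical sizes $m = 2$, $n_1 = 3$, $n_2 = 2$. Random positive definite matrices stand in for $y_j(\eta)$ at some $\eta \in \operatorname{int} Y$. Relating the identity to the barrier (9.26) is left to the exercise.

```python
import numpy as np

rng = np.random.default_rng(4)
m, sizes = 2, [3, 2]        # hypothetical: q = 2 blocks, n_1 = 3, n_2 = 2

def rand_pd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)

ys = [rand_pd(nj) for nj in sizes]               # stand-ins for y_j(eta), eta in int Y
xis = [rng.standard_normal((nj, m)) for nj in sizes]
S = sum(xi.T @ np.linalg.solve(y, xi) for xi, y in zip(xis, ys))
tau = S + rand_pd(m)                              # tau - S is positive definite

# Assemble the arrow-shaped block matrix; all other blocks are zero
N = m + sum(sizes)
L = np.zeros((N, N))
L[:m, :m] = tau
off = m
for xi, y in zip(xis, ys):
    nj = y.shape[0]
    L[off:off + nj, :m] = xi
    L[:m, off:off + nj] = xi.T
    L[off:off + nj, off:off + nj] = y
    off += nj

logdet_L = np.linalg.slogdet(L)[1]
logdet_split = np.linalg.slogdet(tau - S)[1] + sum(np.linalg.slogdet(y)[1] for y in ys)
print(np.all(np.linalg.eigvalsh(L) > 0), logdet_L, logdet_split)
```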

Now we are in a position to complete, in a sense, our considerations related to the Truss Topology Design problem (Section 5.7, Lecture 5). So far we know two formulations of the problem:
Dual form (TTD$_d$): minimize $t$ by choice of the vector $x = (t; \lambda_1, \dots, \lambda_k; z_1, \dots, z_k)$ ($t$ and $\lambda_j$ are reals, $z_j \in \mathbb{R}^n$) subject to the constraints
$$t \ge \sum_{j=1}^k \left[2z_j^Tf_j + V\frac{(b_i^Tz_j)^2}{\lambda_j}\right],\quad i = 1, \dots, m;$$
$$\lambda \ge 0;\qquad \sum_j \lambda_j = 1.$$

Primal form (ψ): minimize $t$ by choice of $x = (t; \phi; \beta_{ij})$ ($t$ and $\beta_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, k$, are reals, $\phi \in \mathbb{R}^m$) subject to the constraints
$$t \ge \sum_{i=1}^m \frac{\beta_{ij}^2}{\phi_i},\quad j = 1, \dots, k;$$
$$\phi \ge 0;\qquad \sum_{i=1}^m \phi_i = V;$$
$$\sum_{i=1}^m \beta_{ij}b_i = f_j,\quad j = 1, \dots, k.$$

Both forms are respectable convex problems. The question, however, is whether we are equipped well enough to solve them by interior point machinery. In other words, can we point out explicit self-concordant barriers for the corresponding feasible domains? The answer is positive.
Exercise 9.6.4 Consider the problem (TTD$_d$), and let
$$x = (t; \lambda_1, \dots, \lambda_k; z_1, \dots, z_k)$$
be the design vector of the problem.

1) Prove that (TTD$_d$) can be equivalently written down as the standard problem
$$\text{minimize } c^Tx \equiv t \text{ s.t. } x \in G \subset E,$$
where
$$E = \Big\{x \;\Big|\; \sum_{j=1}^k \lambda_j = 1\Big\}$$
is an affine hyperplane in $\mathbb{R}^{\dim x}$ and
$$G = \{x \in E \mid x \text{ is feasible for (TTD}_d)\}$$
is a closed convex domain in $E$.


2)+ Let $u = (s_i; t_{ij}; r_j)$ ($i$ runs over $\{1, \dots, m\}$, $j$ runs over $\{1, \dots, k\}$, and $s_\cdot$, $t_\cdot$, $r_\cdot$ are reals), and let
$$\Phi(u) = -\sum_{i=1}^m \ln\Big(s_i - \sum_{j=1}^k \frac{t_{ij}^2}{r_j}\Big) - \sum_{j=1}^k \ln r_j.$$
Prove that $\Phi$ is an $(m+k)$-logarithmically homogeneous self-concordant barrier for the closed convex cone
$$G^+ = \operatorname{cl}\Big\{u \;\Big|\; r_j > 0,\ j = 1, \dots, k;\ s_i \ge \sum_{j=1}^k r_j^{-1}t_{ij}^2,\ i = 1, \dots, m\Big\},$$
and that the Legendre transformation of the barrier is given by
$$\Phi^*(\sigma_i; \tau_{ij}; \rho_j) = -\sum_{j=1}^k \ln\Big(-\rho_j + \sum_{i=1}^m \frac{\tau_{ij}^2}{4\sigma_i}\Big) - \sum_{i=1}^m \ln(-\sigma_i) - (m + k),$$
the domain of $\Phi^*$ being the set
$$G' = \Big\{\sigma_i < 0,\ i = 1, \dots, m;\ -\rho_j + \sum_{i=1}^m \frac{\tau_{ij}^2}{4\sigma_i} > 0,\ j = 1, \dots, k\Big\}.$$

3) Prove that the domain $G$ of the standard reformulation of (TTD$_d$) given by 1) is the inverse image of $G^+$ under the affine mapping
$$x \mapsto \pi x + p = \begin{pmatrix} s_i = t - 2\sum_{j=1}^k z_j^Tf_j \\ t_{ij} = (b_i^Tz_j)\sqrt{V} \\ r_j = \lambda_j \end{pmatrix}$$
(the mapping should be restricted onto $E$).
Conclude from this observation that one can equip $G$ with the $(m+k)$-self-concordant barrier
$$F(x) = \Phi(\pi x + p)$$
and thus solve (TTD$_d$) by the long-step path-following method.
Note also that the problem
$$\text{minimize } c^Tx \equiv t \text{ s.t. } x \in E,\ \pi x + p \in G^+$$
is a conic reformulation of (TTD$_d$), and that $\Phi$ is an $(m+k)$-logarithmically homogeneous self-concordant barrier for the underlying cone $G^+$. Since we know the Legendre transformation of $\Phi$, we can solve the problem by the primal-dual potential reduction method as well.
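The Legendre transformation claimed in part 2) can be spot-checked numerically. At $s = \Phi'(u)$, the Fenchel equality gives $\Phi^*(s) = s^Tu - \Phi(u)$, and logarithmic homogeneity gives $s^Tu = -(m+k)$. The Python/NumPy sketch below evaluates both sides at a random interior point of $G^+$. It uses hypothetical random data with $m = 3$, $k = 2$, and the gradient formulas in `grad_Phi` were derived by hand from the definition of $\Phi$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 3, 2

def Phi(s, t, r):
    """Phi(u) = -sum_i ln(s_i - sum_j t_ij^2 / r_j) - sum_j ln r_j; t has shape (m, k)."""
    D = s - (t**2 / r).sum(axis=1)
    return -np.log(D).sum() - np.log(r).sum()

def grad_Phi(s, t, r):
    """Partial derivatives of Phi with respect to s_i, t_ij, r_j (derived by hand)."""
    D = s - (t**2 / r).sum(axis=1)
    sigma = -1.0 / D
    tau = -2.0 * sigma[:, None] * t / r
    rho = (sigma[:, None] * t**2).sum(axis=0) / r**2 - 1.0 / r
    return sigma, tau, rho

def Phi_star(sigma, tau, rho):
    """Claimed Legendre transformation of Phi (part 2 of the exercise)."""
    inner = -rho + (tau**2 / (4.0 * sigma[:, None])).sum(axis=0)
    return -np.log(inner).sum() - np.log(-sigma).sum() - (m + k)

# Hypothetical interior point of G+: r_j > 0 and s_i > sum_j t_ij^2 / r_j
r = rng.uniform(0.5, 2.0, size=k)
t = rng.standard_normal((m, k))
s = (t**2 / r).sum(axis=1) + rng.uniform(0.5, 2.0, size=m)

sigma, tau, rho = grad_Phi(s, t, r)
euler = sigma @ s + (tau * t).sum() + rho @ r
print(euler)                                             # approximately -(m + k) = -5
print(Phi_star(sigma, tau, rho), euler - Phi(s, t, r))   # should agree
```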

Note that the primal formulation (ψ) of TTD can be treated in a completely similar way. Its formal structure is similar to that of (TTD$_d$), up to the presence of a larger number of linear equality constraints, and linear equalities do not influence our ability to point out self-concordant barriers, due to the Substitution rule (L).

9.6.4 Geometrical mean


The problems below are motivated by the following observation. The function $\xi^2/\eta$ of two scalar variables is convex on the half-plane $\{\eta > 0\}$, and we know how to write down a self-concordant barrier for its epigraph: it is given by our marvellous fractional-quadratic substitution. How can we get a similar barrier for the epigraph of the function $(\xi_+)^p/\eta^{p-1}$ ($p > 1$ an integer), which, as is easily seen, is also convex when $\eta > 0$?
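The epigraph of the function $f(\xi, \eta) = (\xi_+)^p/\eta^{p-1}$ is the set
$$\operatorname{cl}\{(\tau, \xi, \eta) \mid \eta > 0,\ \tau\eta^{p-1} \ge (\xi_+)^p\}.$$
This is a cone in $\mathbb{R}^3$, which clearly is the inverse image of the hypograph
$$G = \{(t, y_1, \dots, y_p) \in \mathbb{R}^{p+1} \mid y_1, \dots, y_p \ge 0,\ t \le \phi(y) = (y_1 \cdots y_p)^{1/p}\}$$
under the affine mapping
$$L : (\tau, \xi, \eta) \mapsto (\xi, \tau, \eta, \eta, \dots, \eta)$$
($\eta$ repeated $p - 1$ times),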
so the problem in question reduces to finding a self-concordant barrier for the hypograph $G$ of the geometric mean. This latter question is answered by the following observation:
(G): the mapping
$$A(t, y_1, \dots, y_p) = (y_1 \cdots y_p)^{1/p} - t : G^- \to \mathbb{R},\qquad G^- = \{(t, y) \in \mathbb{R}^{p+1} \mid y \ge 0\}$$
is 1-appropriate for the domain $G^+ = \mathbb{R}_+$.

Exercise 9.6.5 + Prove (G).

Exercise 9.6.6 + Prove that the mapping
$$B(\tau, \xi, \eta) = \tau^{1/p}\eta^{(p-1)/p} - \xi : \operatorname{int} G^- \to \mathbb{R},\qquad G^- = \{(\tau, \xi, \eta) \mid \tau \ge 0,\ \eta \ge 0\}$$
is 1-appropriate for $G^+ = \mathbb{R}_+$.
Conclude from this observation that the function
$$F(\tau, \xi, \eta) = -\ln(\tau^{1/p}\eta^{(p-1)/p} - \xi) - \ln \tau - \ln \eta$$
is a 3-logarithmically homogeneous self-concordant barrier for the cone
$$\operatorname{cl}\{(\tau, \xi, \eta) \mid \eta > 0,\ \tau \ge (\xi_+)^p\eta^{-(p-1)}\},$$
which is the epigraph of the function $(\xi_+)^p\eta^{-(p-1)}$.
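The following small numerical check covers two properties claimed in Exercise 9.6.6. It assumes $p = 3$ and uses a hypothetical interior point. The first property is logarithmic homogeneity, $F(\lambda X) = F(X) - 3\ln\lambda$. The second is the resulting Euler identity $\langle F'(X), X\rangle = -3$, with the gradient computed by central differences. The check does not test self-concordance itself.

```python
import math

p = 3   # hypothetical choice of the integer p > 1

def F(tau, xi, eta):
    """F = -ln(tau^(1/p) eta^((p-1)/p) - xi) - ln tau - ln eta."""
    g = tau ** (1.0 / p) * eta ** ((p - 1.0) / p) - xi
    return -math.log(g) - math.log(tau) - math.log(eta)

X = (2.0, 1.0, 1.2)     # hypothetical point: tau * eta**(p-1) = 2.88 > max(xi, 0)**p = 1
assert X[0] * X[2] ** (p - 1) > max(X[1], 0.0) ** p

# Logarithmic homogeneity with parameter 3: F(lam X) = F(X) - 3 ln(lam)
lam = 2.7
homog_gap = F(*(lam * v for v in X)) - (F(*X) - 3.0 * math.log(lam))

# Euler identity <F'(X), X> = -3, with the gradient from central differences
h = 1e-6
grad = []
for i in range(3):
    Xp, Xm = list(X), list(X)
    Xp[i] += h
    Xm[i] -= h
    grad.append((F(*Xp) - F(*Xm)) / (2 * h))
euler = sum(gi * vi for gi, vi in zip(grad, X))
print(homog_gap, euler)
```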


Chapter 10

Applications in Convex Programming

By now we know several general schemes of polynomial time interior point methods. In the previous lecture we also developed a technique for constructing the self-concordant barriers these methods are based on. It is time now to look at how this machinery works. To this end, let us consider several standard classes of convex programming problems. For each class of problems in question, I shall present, in this order: the usual description of the problem instances, the standard and conic reformulations required by the interior point approach, the related self-concordant barriers and, finally, the complexities (Newton and arithmetic) of the resulting methods.
In what follows, unless the opposite is explicitly stated, we always assume that the constraints involved in the problem satisfy the Slater condition.

10.1 Linear Programming


Consider an LP problem in the canonical form:
$$\text{minimize } c^Tx \text{ s.t. } x \in G \equiv \{x \mid Ax \le b\}, \tag{10.1}$$
$A$ being an $m \times n$ matrix of full column rank¹.

Path-following approach can be applied immediately:


Standard reformulation: the problem is in the standard form from the very beginning;
Barrier: as we know, the function
$$F(x) = -\sum_{j=1}^m \ln(b_j - a_j^Tx)$$
is an $m$-self-concordant barrier for $G$;


Structural assumption from Lecture 8 is satisfied: indeed,
$$F(x) = \Phi(b - Ax),\qquad \Phi(u) = -\sum_{j=1}^m \ln u_j : \operatorname{int}\mathbb{R}^m_+ \to \mathbb{R} \tag{10.2}$$
¹ The assumption that the rank of $A$ is $n$ is quite natural. Otherwise the homogeneous system $Ax = 0$ has a nontrivial solution, so the feasible domain of the problem, if nonempty, contains lines. Consequently, the problem, if feasible, is unstable: a small perturbation of the objective makes it unbounded below, so problems of this type might be only of theoretical interest.

and $\Phi$ is an $m$-logarithmically homogeneous self-concordant barrier for the $m$-dimensional nonnegative orthant. The Legendre transformation of $\Phi$, as is immediately seen, is
$$\Phi^*(s) = -\sum_{j=1}^m \ln(-s_j) - m : \operatorname{int}\mathbb{R}^m_- \to \mathbb{R}. \tag{10.3}$$

Thus, to solve an LP problem, we can use both the basic and the long-step versions of the
path-following method.
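Formula (10.3) can be sanity-checked in two ways. The first is the Fenchel equality at $s = \Phi'(u) = -(1/u_1, \dots, 1/u_m)$. The second is the Fenchel-Young inequality $s^Tv - \Phi(v) \le \Phi^*(s)$ at random $v > 0$. The data in the Python/NumPy sketch below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 5

def Phi(u):
    """(10.2): barrier for the nonnegative orthant."""
    return -np.sum(np.log(u))

def Phi_star(s):
    """(10.3): defined for s in int R^m_-."""
    return -np.sum(np.log(-s)) - m

u = rng.uniform(0.5, 3.0, size=m)   # hypothetical point of int R^m_+
s = -1.0 / u                        # s = Phi'(u)

# Fenchel equality at s = Phi'(u): Phi*(s) = s^T u - Phi(u)
fenchel_gap = Phi_star(s) - (s @ u - Phi(u))

# Fenchel-Young inequality: s^T v - Phi(v) <= Phi*(s) for every v > 0
vs = rng.uniform(0.1, 5.0, size=(1000, m))
fy_max = max(s @ v - Phi(v) for v in vs)
print(fenchel_gap, fy_max <= Phi_star(s))
```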
Complexity: as we remember, the Newton complexity of finding an $\varepsilon$-solution by a path-following method associated with a $\vartheta$-self-concordant barrier is $M = O(1)\sqrt{\vartheta}\ln(V\varepsilon^{-1})$. Here $O(1)$ is a certain absolute constant² and $V$ is a data-dependent scale factor. Consequently, the arithmetic cost of an $\varepsilon$-solution is $MN$, where $N$ is the arithmetic cost of a single Newton step. We see that the complexity of the method is completely characterized by the quantities $\vartheta$ and $N$. Note that the product
$$C = \sqrt{\vartheta}N$$
is, up to an absolute constant factor, the coefficient of $\ln(V\varepsilon^{-1})$ in the expression for the arithmetic cost of an $\varepsilon$-solution. Thus, $C$ can be thought of as the arithmetic cost of an accuracy digit in the solution, since $\ln(V\varepsilon^{-1})$ can naturally be interpreted as the number of accuracy digits in an $\varepsilon$-solution.
Now, in the situation in question, $\vartheta = m$ is the larger size of the LP problem, and it remains to understand the cost $N$ of a Newton step. At a step we are given an $x$ and should form and solve with respect to $y$ a linear system of the type
$$F''(x)y = -tc - F'(x).$$
The gradient and the Hessian of the barrier in our case, as is immediately seen, are given by
$$F'(x) = \sum_{j=1}^m d_ja_j,\qquad F''(x) = A^TD^2A,$$
where
$$d_j = [b_j - a_j^Tx]^{-1}$$
are the inverse residuals in the constraints at the point $x$ and
$$D = \operatorname{Diag}(d_1, \dots, d_m).$$
It is immediately seen that the arithmetic cost of assembling the Newton system (i.e., the cost of computing $F'$ and $F''$) is $O(mn^2)$. Solving the system once it is assembled takes $O(n^3)$ more operations³. Since $m \ge n$ (recall that $\operatorname{Rank} A = n$), the arithmetic complexity of a step is dominated by the cost $O(mn^2)$ of assembling the Newton system. Thus, we come to
$$\vartheta = m;\quad N = O(mn^2);\quad C = O(m^{3/2}n^2). \tag{10.4}$$
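The assembly just described can be sketched as follows in Python/NumPy. The function name `lp_newton_direction` and the random test data are hypothetical. The sketch forms $F'(x) = A^Td$ and $F''(x) = A^TD^2A$ in $O(mn^2)$ operations and solves $F''(x)y = -tc - F'(x)$ with a dense solver. It therefore follows the dense "unstructured" cost model behind (10.4), not what a production LP code would do.

```python
import numpy as np

def lp_newton_direction(A, b, c, x, t):
    """Newton direction y for t*c^T x + F(x), F(x) = -sum_j ln(b_j - a_j^T x):
    solves F''(x) y = -t c - F'(x).  Requires A x < b."""
    d = 1.0 / (b - A @ x)                      # inverse residuals d_j
    if np.any(d <= 0):
        raise ValueError("x must be strictly feasible (Ax < b)")
    grad = A.T @ d                             # F'(x) = sum_j d_j a_j
    hess = A.T @ (d[:, None] ** 2 * A)         # F''(x) = A^T D^2 A, O(m n^2)
    y = np.linalg.solve(hess, -t * c - grad)   # dense solve, O(n^3)
    return y, grad, hess

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))                # full column rank with probability 1
b = rng.uniform(1.0, 2.0, size=m)              # so that x = 0 is strictly feasible
c = rng.standard_normal(n)
x = np.zeros(n)
y, grad, hess = lp_newton_direction(A, b, c, x, t=1.0)
print(y)
```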

Potential reduction approach also is immediate:


² Provided that the parameters of the method are fixed once and for all: the path tolerance κ and the penalty rate γ in the case of the basic method, and the path tolerance κ and the threshold κ in the case of the long-step one.
³ If traditional Linear Algebra is used (Gauss elimination, Cholesky decomposition, etc.). There exists, at least in theory, "fast" Linear Algebra which allows one to invert an $N \times N$ matrix in $O(N^\gamma)$ operations for some $\gamma < 3$ rather than in $O(N^3)$ operations.

Conic reformulation of the problem is given by
$$\text{minimize } f^Ty \text{ s.t. } y \in \{L + b\} \cap K, \tag{10.5}$$
where
$$K = \mathbb{R}^m_+,\qquad L = A(\mathbb{R}^n),$$
and $f$ is an $m$-dimensional vector which "expresses the objective $c^Tx$ in terms of $y = b - Ax$", i.e., is such that
$$f^T(b - Ax) \equiv c^Tx + \text{const},\quad\text{i.e.,}\quad A^Tf = -c.$$
One can set, e.g.,
$$f = -A[A^TA]^{-1}c$$
(nonsingularity of $A^TA$ is ensured by the assumption that $\operatorname{Rank} A = n$).
The cone $K = \mathbb{R}^m_+$ clearly is self-dual, so the conic dual to (10.5) is
$$\text{minimize } b^Ts \text{ s.t. } s \in \{L^\perp + f\} \cap \mathbb{R}^m_+; \tag{10.6}$$
as is immediately seen, the dual feasible plane $L^\perp + f$ is given by
$$L^\perp + f = \{s \mid A^Ts = -c\}$$
(see Exercise 5.4.11).
A logarithmically homogeneous barrier for $K = \mathbb{R}^m_+$ is, of course, the barrier $\Phi$ given by (10.2). The parameter of the barrier is $m$, and its Legendre transformation $\Phi^*$ is given by (10.3). Thus, we can apply both the method of Karmarkar and the primal-dual method.
Complexity of the primal-dual method for LP is, as is easily seen, completely similar to that of the path-following method; it is given by
$$\vartheta = m;\quad N = O(mn^2);\quad C = O(m^{3/2}n^2).$$
The method of Karmarkar has the same arithmetic cost $N$ of a step, but worse Newton complexity (proportional to $\vartheta = m$ rather than to $\sqrt{\vartheta}$), so for this method one has
$$N = O(mn^2),\qquad C = O(m^2n^2).$$

Comments. 1) Karmarkar acceleration. The above expressions for $C$ assume by default that we solve the sequential Newton systems "from scratch", independently of each other. This is not the only possible policy. The matrices of the systems arising at neighbouring steps are close to each other, so the Linear Algebra can be implemented in a way that reduces the average (over steps) arithmetic cost of finding Newton directions. I am not going to describe the details of the corresponding Karmarkar acceleration. Let me just say that this acceleration results in an (average over iterations) value of $N$ equal to $O(m^{1/2}n^2)$ instead of the initial value $O(mn^2)$⁴. As a result, for the accelerated path-following and primal-dual methods we have $C = O(mn^2)$, and for the accelerated method of Karmarkar $C = O(m^{3/2}n^2)$. Thus, the arithmetic complexity of an accuracy digit in LP turns out to be the same as that of solving systems of linear equations by the traditional Linear Algebra technique.
⁴ Provided that the problem is not "too thin", namely, that $n \ge O(\sqrt{m})$.

2) Practical performance. One should be aware that the outlined complexity estimates for interior point LP solvers give a very poor impression of their actual performance. There are two reasons for this:

• first, when evaluating the arithmetic cost of a Newton step, we have implicitly assumed that the matrix of the problem is dense and "unstructured". This case never occurs in actual large-scale computations, so the arithmetic cost of a Newton step normally has nothing in common with the above $O(mn^2)$ and depends heavily on the specific structure of the problem;

• second, and more important, the "long-step" versions of the methods (like the potential reduction ones and the long-step path-following method) in practice possess much better Newton complexity than stated by the theoretical worst-case efficiency estimate. According to the latter estimate, the Newton complexity should be proportional at least to the square root of the larger size $m$ of the problem. In practice the dependence turns out to be much better, something like $O(\ln m)$. For real-world problem sizes, this means that the Newton complexity of long-step interior point methods for LP is basically independent of the size of the problem and is something like 20-50 iterations. This is the source of the "competitive potential" of interior point methods versus the Simplex method.

3) Infeasible start. All interior point schemes known to us so far share a common practical drawback: they are indeed "interior point schemes", and to start a method, one should know in advance a strictly feasible solution to the problem. In real-world computations this might be a rather restrictive requirement. There are several ways to avoid this drawback, e.g., the following "big M" approach. To solve (10.1), let us extend $x$ by an artificial design variable $t$ and pass from the original problem to the new one
$$\text{minimize } c^Tx + Mt \text{ s.t. } Ax + t(b - e) \le b,\ -t \le 0;$$
here $e = (1, \dots, 1)^T$. The new problem admits an evident strictly feasible solution $x = 0$, $t = 1$. On the other hand, when $M$ is large, the $x$-component of an optimal solution to the new problem is "almost feasible almost optimal" for the initial problem. (Theoretically, for large enough $M$ the $x$-components of all optimal solutions to the modified problem are optimal solutions to the initial one.) Thus, we can apply our methods to the modified problem, where we have no difficulties with an initial strictly feasible solution, and thus get a good approximate solution to the problem of interest. The same trick can be used in our forthcoming situations.
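Here is a minimal Python/NumPy sketch of the big-M construction. The helper name `big_m_problem`, the value of $M$ and the random data are illustrative assumptions. The sketch writes the extended problem in the form $\min \tilde c^Tz$ s.t. $\tilde Az \le \tilde b$ with $z = (x, t)$. It then confirms that $z = (0, 1)$ has all slacks equal to 1, i.e., that this point is strictly feasible whatever $b$ is.

```python
import numpy as np

def big_m_problem(A, b, c, M):
    """Data of: minimize c^T x + M t  s.t.  A x + t (b - e) <= b,  -t <= 0,
    written as: minimize c_big^T z  s.t.  A_big z <= b_big,  z = (x, t)."""
    m, n = A.shape
    e = np.ones(m)
    A_big = np.zeros((m + 1, n + 1))
    A_big[:m, :n] = A
    A_big[:m, n] = b - e
    A_big[m, n] = -1.0                 # last row encodes -t <= 0
    b_big = np.append(b, 0.0)
    c_big = np.append(c, M)
    return A_big, b_big, c_big

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)             # x = 0 need not be feasible for Ax <= b
c = rng.standard_normal(3)
A_big, b_big, c_big = big_m_problem(A, b, c, M=1e4)

z0 = np.append(np.zeros(3), 1.0)       # x = 0, t = 1
slack = b_big - A_big @ z0
print(slack)                           # every entry is (up to rounding) 1
```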

10.2 Quadratically Constrained Quadratic Programming


The problem here is to minimize a convex quadratic function $g(x)$ over a set given by finitely many convex quadratic constraints $g_j(x) \le 0$. By adding an extra variable $t$ and an extra constraint $g(x) - t \le 0$ (note that it is also a convex quadratic constraint), we can pass from the problem to an equivalent one with a linear objective and convex quadratic constraints. It is convenient to assume that this reduction is done from the very beginning, so that the initial problem of interest is
$$\text{minimize } c^Tx \text{ s.t. } x \in G = \{x \mid f_j(x) = x^TA_jx + b_j^Tx + c_j \le 0,\ j = 1, \dots, m\}, \tag{10.7}$$
$A_j$ being $n \times n$ positive semidefinite symmetric matrices.


Due to positive semidefiniteness and symmetry of $A_j$, we can always decompose these matrices as $A_j = B_j^TB_j$, $B_j$ being $k(B_j) \times n$ rectangular matrices, $k(B_j) \le n$. In applications we normally need not compute these matrices, since the $B_j$, together with the $A_j$, form the "matrix" part of the input data.
Path-following approach is immediate:

Standard reformulation: the problem is in the standard form from the very beginning.
Barrier: as we know from Lecture 9, the function
$$-\ln(t - f(x))$$
is a 1-self-concordant barrier for the epigraph $\{t \ge f(x)\}$ of a convex quadratic function $f(x) = x^TB^TBx + b^Tx + c$. The Lebesgue set $G_f = \{x \mid f(x) \le 0\}$ of $f$ is the inverse image of this epigraph under the linear mapping $x \mapsto (0, x)$. We conclude from the Substitution rule (L) (Lecture 9) that the function $-\ln(-f(x))$ is a 1-self-concordant barrier for $G_f$, provided that $f(x) < 0$ at some $x$. Applying the Decomposition rule (Lecture 9), we see that the function
$$F(x) = -\sum_{j=1}^m \ln(-f_j(x)) \tag{10.8}$$
is an $m$-self-concordant barrier for the feasible domain $G$ of problem (10.7).


Structural assumption. Let us demonstrate that the above barrier satisfies the Structural assumption from Lecture 8. Indeed, let us set
$$r(B_j) = k(B_j) + 1$$
and consider the second order cones
$$K^2_{r(B_j)} = \Big\{(\tau, \sigma, \xi) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}^{k(B_j)} \;\Big|\; \tau \ge \sqrt{\sigma^2 + \xi^T\xi}\Big\}.$$

Representing the quantity $b_j^Tx + c_j$ as
$$b_j^Tx + c_j = \left[\frac{1 + b_j^Tx + c_j}{2}\right]^2 - \left[\frac{1 - b_j^Tx - c_j}{2}\right]^2,$$
we come to the following representation of the set $G_j = \{x \mid f_j(x) \le 0\}$:
$$\{x \mid f_j(x) \le 0\} \equiv \{x \mid [B_jx]^T[B_jx] + b_j^Tx + c_j \le 0\} = \left\{x \;\middle|\; \left[\frac{1 - b_j^Tx - c_j}{2}\right]^2 \ge \left[\frac{1 + b_j^Tx + c_j}{2}\right]^2 + [B_jx]^T[B_jx]\right\} =$$
[note that for $x$ in the latter set $b_j^Tx + c_j \le 0$, so that $\frac{1 - b_j^Tx - c_j}{2} > 0$]
$$= \left\{x \;\middle|\; \frac{1 - b_j^Tx - c_j}{2} \ge \sqrt{\left[\frac{1 + b_j^Tx + c_j}{2}\right]^2 + [B_jx]^T[B_jx]}\right\}.$$
Thus, we see that $G_j$ is exactly the inverse image of the second order cone $K^2_{r(B_j)}$ under the affine mapping
$$x \mapsto \pi_jx + p_j = \begin{pmatrix} \tau = \frac12[1 - b_j^Tx - c_j] \\ \sigma = \frac12[1 + b_j^Tx + c_j] \\ \xi = B_jx \end{pmatrix}.$$
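The identity behind this representation is $\tau^2 - \sigma^2 - \xi^T\xi = -(b_j^Tx + c_j) - [B_jx]^T[B_jx] = -f_j(x)$. It holds because $\tau^2 - \sigma^2 = (\tau - \sigma)(\tau + \sigma) = -(b_j^Tx + c_j)$. The Python/NumPy sketch below uses hypothetical random $B_j$, $b_j$, $c_j$. It checks the identity and confirms that $f_j(x) \le 0$ exactly when $\pi_jx + p_j$ lies in the second order cone.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 4, 3
B = rng.standard_normal((k, n))    # hypothetical B_j, b_j, c_j
b = rng.standard_normal(n)
c = -5.0

def f(x):
    Bx = B @ x
    return Bx @ Bx + b @ x + c

def pi_p(x):
    """Affine map x -> (tau, sigma, xi) = pi_j x + p_j from the text."""
    return 0.5 * (1 - b @ x - c), 0.5 * (1 + b @ x + c), B @ x

def in_soc(tau, sigma, xi):
    return tau >= np.sqrt(sigma**2 + xi @ xi)

identity_err, mismatches = 0.0, 0
for _ in range(1000):
    x = rng.standard_normal(n)
    tau, sigma, xi = pi_p(x)
    identity_err = max(identity_err, abs(tau**2 - sigma**2 - xi @ xi + f(x)))
    mismatches += int((f(x) <= 0) != in_soc(tau, sigma, xi))
print(identity_err, mismatches)
```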
It is immediately seen that the above barrier $-\ln(-f_j(x))$ for $G_j$ is the superposition of the standard barrier
$$\Psi_j(\tau, \sigma, \xi) = -\ln(\tau^2 - \sigma^2 - \xi^T\xi)$$
for the cone $K^2_{r(B_j)}$ and the affine mapping $x \mapsto \pi_jx + p_j$. Consequently, the barrier $F(x)$ for the feasible domain $G$ of our quadratically constrained problem can be represented as
$$F(x) = \Phi(\pi x + p),\qquad \pi x + p = \begin{pmatrix} \tau_1 = \frac12[1 - b_1^Tx - c_1] \\ \sigma_1 = \frac12[1 + b_1^Tx + c_1] \\ \xi_1 = B_1x \\ \vdots \\ \tau_m = \frac12[1 - b_m^Tx - c_m] \\ \sigma_m = \frac12[1 + b_m^Tx + c_m] \\ \xi_m = B_mx \end{pmatrix}, \tag{10.9}$$
where
$$\Phi(\tau_1, \sigma_1, \xi_1, \dots, \tau_m, \sigma_m, \xi_m) = -\sum_{j=1}^m \ln(\tau_j^2 - \sigma_j^2 - \xi_j^T\xi_j) \tag{10.10}$$

is the direct sum of the standard self-concordant barriers for the second order cones $K^2_{r(B_j)}$. As we know from Proposition 5.3.2.(iii), $\Phi$ is a $(2m)$-logarithmically homogeneous self-concordant barrier for the direct product $K$ of the cones $K^2_{r(B_j)}$. The barrier $\Phi$ possesses the immediately computable Legendre transformation
$$\Phi^*(s) = \Phi(-s) + 2m\ln 2 - 2m \tag{10.11}$$
with the domain $-\operatorname{int} K$.
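Formula (10.11) can be spot-checked numerically. At $s = \Phi'(z)$ we should have $s^Tz = -2m$ (logarithmic homogeneity) and $\Phi^*(s) = s^Tz - \Phi(z)$ (Fenchel equality). In the hypothetical Python/NumPy sketch below, the blocks $(\tau_j, \sigma_j, \xi_j)$ are stored as the rows of an array, and the gradient of (10.10) is written out by hand.

```python
import numpy as np

rng = np.random.default_rng(10)
m, k = 3, 2    # hypothetical: m cones, each xi_j in R^k

def Phi(z):
    """(10.10); row j of z is (tau_j, sigma_j, xi_j)."""
    tau, sigma, xi = z[:, 0], z[:, 1], z[:, 2:]
    return -np.sum(np.log(tau**2 - sigma**2 - np.sum(xi**2, axis=1)))

def grad_Phi(z):
    tau, sigma, xi = z[:, 0], z[:, 1], z[:, 2:]
    qv = tau**2 - sigma**2 - np.sum(xi**2, axis=1)
    g = np.empty_like(z)
    g[:, 0] = -2 * tau / qv
    g[:, 1] = 2 * sigma / qv
    g[:, 2:] = 2 * xi / qv[:, None]
    return g

def Phi_star(s):
    """Closed form (10.11)."""
    return Phi(-s) + 2 * m * np.log(2) - 2 * m

# Hypothetical interior point of K: tau_j > sqrt(sigma_j^2 + |xi_j|^2)
z = rng.standard_normal((m, k + 2))
z[:, 0] = np.sqrt(z[:, 1]**2 + np.sum(z[:, 2:]**2, axis=1)) + rng.uniform(0.5, 1.5, size=m)

s = grad_Phi(z)
euler = np.sum(s * z)
print(euler)                               # approximately -2m = -6
print(Phi_star(s), euler - Phi(z))         # should agree
```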


Complexity. The complexity characteristics of the path-following method associated with barrier (10.8), as is easily seen, are given by
$$\vartheta = m;\quad N = O([m + n]n^2);\quad C = O(m^{1/2}[m + n]n^2) \tag{10.12}$$
(as in the LP case, the expressions for $N$ and $C$ correspond to the case of dense "unstructured" matrices $B_j$; for sparse matrices with reasonable nonzero patterns these characteristics become better).

Potential reduction approach also is immediate:


Conic reformulation of the problem is a byproduct of the above considerations; it is
$$\text{minimize } f^Ty \text{ s.t. } y \in \{L + p\} \cap K, \tag{10.13}$$
where $K = \prod_{j=1}^m K^2_{r(B_j)}$ is the above product of second order cones and $L + p$ is the image of the above affine mapping $x \mapsto \pi x + p$. Further, $f$ is the vector which "expresses the objective $c^Tx$ in terms of $y = \pi x$", i.e., such that
$$f^T\pi x = c^Tx.$$
It is immediately seen that such a vector $f$ does exist, provided that the problem in question is solvable.
The direct product $K$ of the second order cones is self-dual (Exercise 5.4.7), so the conic dual to (10.13) is the problem
$$\text{minimize } p^Ts \text{ s.t. } s \in \{L^\perp + f\} \cap K \tag{10.14}$$
with the dual feasible plane $L^\perp + f$ given by
$$L^\perp + f = \{s \mid \pi^Ts = c\}$$
(see Exercise 5.4.11).


A logarithmically homogeneous self-concordant barrier with parameter $2m$ for the cone $K$ is, as already mentioned, given by (10.10); the Legendre transformation of $\Phi$ is given by (10.11). Thus, we have at our disposal computable primal and dual barriers for (10.13)-(10.14). We can therefore solve the problems by the method of Karmarkar or by the primal-dual method associated with these barriers.
Complexity: it is immediately seen that the complexity characteristics of the primal-dual method are given by (10.12). For the method of Karmarkar, the cost $N$ of a step is the same, while its Newton complexity, and hence $C$, is $O(\sqrt{m})$ times worse than the corresponding characteristic of the primal-dual method.

10.3 Approximation in Lp norm


The problem of interest is
$$\text{minimize } \sum_{j=1}^m |v_j - u_j^Tx|^p, \tag{10.15}$$
where $1 < p < \infty$, $u_j \in \mathbb{R}^n$ and $v_j \in \mathbb{R}$.

Path-following approach seems to be the only one which can be easily carried out (in the
potential reduction scheme there are difficulties with explicit formulae for the Legendre trans-
formation of the primal barrier).
Standard reformulation of the problem is obtained by adding $m$ extra variables $t_j$ and rewriting the problem in the equivalent form
$$\text{minimize } \sum_{j=1}^m t_j \text{ s.t. } (t, x) \in G = \{(t, x) \in \mathbb{R}^{m+n} \mid |v_j - u_j^Tx|^p \le t_j,\ j = 1, \dots, m\}. \tag{10.16}$$

Barrier: a self-concordant barrier for the feasible set $G$ of problem (10.16) was constructed in Lecture 9 (Example 9.2.1, Substitution rule (L) and Decomposition rule):
$$F(t, x) = \sum_{j=1}^m F_j(t_j, x),\qquad F_j(t_j, x) = -\ln\big(t_j^{2/p} - (v_j - u_j^Tx)^2\big) - 2\ln t_j,\qquad \vartheta = 4m.$$

Complexity of the path-following method associated with the indicated barrier is characterized by
$$\vartheta = 4m;\quad N = O([m + n]n^2);\quad C = O(m^{1/2}[m + n]n^2).$$
The above expression for the arithmetic complexity $N$ needs clarification. Our barrier depends on $m + n$ variables, so its Hessian is an $(m + n) \times (m + n)$ matrix. How can we assemble and invert this matrix at the cost of $O(n^2[m + n])$ operations, rather than at the "normal" cost $O([m + n]^3)$?
The estimate for $N$ is given by the following reasoning. Since the barrier is separable, its Hessian $H$ is the sum of the Hessians of the "partial barriers" $F_j(t_j, x)$. The latter Hessians, as is easily seen, can be computed at the arithmetic cost $O(n^2)$ and are of a very specific form: the $m \times m$ block corresponding to the $t$-variables contains only one nonzero entry (coming from $\frac{\partial^2}{\partial t_j^2}$). It follows that $H$ can be computed at the cost $O(mn^2)$ and is an $(m + n) \times (m + n)$ matrix of the form
$$H = \begin{pmatrix} T & P^T \\ P & Q \end{pmatrix},$$
where the $m \times m$ block $T$ corresponding to the $t$-variables is diagonal, $P$ is $n \times m$ and $Q$ is $n \times n$. It is immediately seen that the gradient of the barrier can be computed at the cost $O(mn)$. Thus, the arithmetic cost of assembling the Newton system is $O(mn^2)$, and the system itself is of the type
$$Tu + P^Tv = p,\qquad Pu + Qv = q$$
with an $m$-dimensional vector of unknowns $u$, an $n$-dimensional vector of unknowns $v$ and diagonal $T$. To solve the system, we can express $u$ via $v$:
$$u = T^{-1}[p - P^Tv],$$
and substitute this expression into the remaining equations to get an $n \times n$ system for $v$:
$$[Q - PT^{-1}P^T]v = q - PT^{-1}p.$$
Assembling this latter system costs $O(mn^2)$ operations and solving it costs $O(n^3)$ operations. The subsequent computation of $u$ takes $O(mn)$ operations, so the total arithmetic cost of assembling and solving the entire Newton system is indeed $O([m + n]n^2)$.
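The elimination just described can be sketched in Python/NumPy as follows. The function name and the random test system are hypothetical. The right-hand sides $p$, $q$ are renamed `p_rhs`, `q_rhs` to avoid a clash with the exponent $p$. The sketch never forms $T^{-1}$ as a matrix, since dividing the columns of $P$ by the diagonal of $T$ costs only $O(mn)$.

```python
import numpy as np

def solve_arrow_system(T_diag, P, Q, p_rhs, q_rhs):
    """Solve T u + P^T v = p, P u + Q v = q, where T = Diag(T_diag) is m x m,
    P is n x m and Q is n x n, by eliminating u first."""
    P_Tinv = P / T_diag                              # P T^{-1}: scale columns, O(mn)
    S = Q - P_Tinv @ P.T                             # Q - P T^{-1} P^T, O(m n^2)
    v = np.linalg.solve(S, q_rhs - P_Tinv @ p_rhs)   # n x n solve, O(n^3)
    u = (p_rhs - P.T @ v) / T_diag                   # u = T^{-1}[p - P^T v], O(mn)
    return u, v

rng = np.random.default_rng(11)
m, n = 50, 4
T_diag = rng.uniform(1.0, 2.0, size=m)
P = rng.standard_normal((n, m))
G = rng.standard_normal((n, n))
Q = (P / T_diag) @ P.T + G @ G.T + np.eye(n)         # makes the full matrix positive definite
p_rhs, q_rhs = rng.standard_normal(m), rng.standard_normal(n)

u, v = solve_arrow_system(T_diag, P, Q, p_rhs, q_rhs)
H = np.block([[np.diag(T_diag), P.T], [P, Q]])
residual = np.linalg.norm(H @ np.concatenate([u, v]) - np.concatenate([p_rhs, q_rhs]))
print(residual)
```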
What should be noticed here is not the particular expression for N , but the general rule which
is illustrated by this expression: the Newton systems which arise in the interior point machinery
normally possess nontrivial structure, and a reasonable solver should use this structure in order
to reduce the arithmetic cost of Newton steps.

10.4 Geometrical Programming


The problem of interest is
$$\text{minimize } f_0(x) = \sum_{i\in I_0} c_{i0}\exp\{a_i^Tx\} \text{ s.t. } f_j(x) = \sum_{i\in I_j} c_{ij}\exp\{a_i^Tx\} \le d_j,\ j = 1, \dots, m. \tag{10.17}$$
Here $x \in \mathbb{R}^n$, the $I_j$ are subsets of the index set $I = \{1, \dots, k\}$ and all coefficients $c_{ij}$ are positive, $j = 1, \dots, m$.
Note that in the standard formulation of a Geometrical Programming problem the objective and the constraints are sums, with nonnegative coefficients, of "monomials" $\xi_1^{\alpha_1}\cdots\xi_n^{\alpha_n}$, where the $\xi_i$ are the design variables (which are restricted to be positive). The exponential form (10.17) is obtained from the "monomial" one by passing from $\xi_i$ to the new variables $x_i = \ln \xi_i$.
Here again it is difficult to compute the Legendre transformation of the barrier associated with the conic reformulation of the problem, so we restrict ourselves to the path-following approach only.
Standard reformulation: to get it, we introduce $k$ additional variables $t_i$, one for each of the exponents $\exp\{a_i^Tx\}$ involved in the problem, and rewrite (10.17) in the following equivalent form:
$$\text{minimize } \sum_{i\in I_0} c_{i0}t_i \text{ s.t. } (t, x) \in G, \tag{10.18}$$
with
$$G = \Big\{(t, x) \in \mathbb{R}^k \times \mathbb{R}^n \;\Big|\; \sum_{i\in I_j} c_{ij}t_i \le d_j,\ j = 1, \dots, m;\ \exp\{a_i^Tx\} \le t_i,\ i = 1, \dots, k\Big\}.$$

Barrier. The feasible domain $G$ of the resulting standard problem is given by a number of linear constraints and a number of exponential inequalities $\exp\{a_i^Tx\} \le t_i$. We know how to penalize the feasible set of a linear constraint. There is also no difficulty in penalizing the feasible set of an exponential inequality, since this set is the inverse image of the epigraph
$$\{(\tau, \xi) \mid \tau \ge \exp\{\xi\}\}$$
under an affine mapping.
Now, a 2-self-concordant barrier for the epigraph of the exponential function, namely, the function
$$\Psi(\tau, \xi) = -\ln(\ln \tau - \xi) - \ln \tau,$$
was found in Lecture 9 (Example 9.2.3). Consequently, the barrier for the feasible set $G$ is
$$F(t, x) = \sum_{i=1}^k \Psi(t_i, a_i^Tx) - \sum_{j=1}^m \ln\Big(d_j - \sum_{i\in I_j} c_{ij}t_i\Big) = \Phi\Big(\pi\begin{pmatrix} t \\ x \end{pmatrix} + p\Big),$$
where
$$\Phi(\tau_1, \xi_1, \dots, \tau_k, \xi_k; \tau_{k+1}, \tau_{k+2}, \dots, \tau_{k+m}) = \sum_{i=1}^k \Psi(\tau_i, \xi_i) - \sum_{j=1}^m \ln \tau_{k+j}$$
is a self-concordant barrier with parameter $2k + m$, and the affine substitution $\pi\binom{t}{x} + p$ is given by
$$\pi\begin{pmatrix} t \\ x \end{pmatrix} + p = \begin{pmatrix} \tau_1 = t_1 \\ \xi_1 = a_1^Tx \\ \vdots \\ \tau_k = t_k \\ \xi_k = a_k^Tx \\ \tau_{k+1} = d_1 - \sum_{i\in I_1} c_{i1}t_i \\ \tau_{k+2} = d_2 - \sum_{i\in I_2} c_{i2}t_i \\ \vdots \\ \tau_{k+m} = d_m - \sum_{i\in I_m} c_{im}t_i \end{pmatrix}.$$

Structural assumption. To demonstrate that the indicated barrier satisfies the Structural assumption, it suffices to point out the Legendre transformation of $\Phi$. This latter barrier is the direct sum of $k$ copies of the barrier
$$\Psi(\tau, \xi) = -\ln(\ln \tau - \xi) - \ln \tau$$
and $m$ copies of the barrier
$$\psi(\tau) = -\ln \tau,$$
so the Legendre transformation of $\Phi$ is the direct sum of the indicated number of copies of the Legendre transformations of $\Psi$ and $\psi$. The latter transformations can be computed explicitly:
$$\Psi^*(\sigma, \eta) = (\eta + 1)\ln\Big(\frac{\eta + 1}{-\sigma}\Big) - \eta - \ln \eta - 2,\qquad \operatorname{Dom}\Psi^* = \{\sigma < 0,\ \eta > 0\},$$
$$\psi^*(\sigma) = -\ln(-\sigma) - 1,\qquad \operatorname{Dom}\psi^* = \{\sigma < 0\}.$$
Thus, we can solve Geometrical Programming problems by both the basic and the long-step path-following methods.
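Both closed forms can be spot-checked through the Fenchel equality $\Psi^*(\Psi'(x)) = \langle\Psi'(x), x\rangle - \Psi(x)$, and likewise for $\psi$. With $q = \ln\tau - \xi$, one gets by hand $\Psi'(\tau, \xi) = \big(-(1 + 1/q)/\tau,\ 1/q\big)$, which indeed lies in $\{\sigma < 0, \eta > 0\}$. The sample points in the Python sketch below are arbitrary (hypothetical) interior points of the epigraph.

```python
import math

def Psi(tau, xi):
    """Barrier for the epigraph {tau >= exp(xi)}."""
    return -math.log(math.log(tau) - xi) - math.log(tau)

def grad_Psi(tau, xi):
    q = math.log(tau) - xi
    return -(1.0 + 1.0 / q) / tau, 1.0 / q       # (dPsi/dtau, dPsi/dxi)

def Psi_star(sigma, eta):
    """Closed form from the text, sigma < 0, eta > 0."""
    return (eta + 1) * math.log((eta + 1) / -sigma) - eta - math.log(eta) - 2

def psi_star(sigma):
    """Legendre transformation of psi(tau) = -ln tau, sigma < 0."""
    return -math.log(-sigma) - 1

gaps = []
for tau, xi in [(2.0, 0.1), (5.0, -3.0), (1.5, 0.3)]:   # points with exp(xi) < tau
    sigma, eta = grad_Psi(tau, xi)
    gaps.append(Psi_star(sigma, eta) - (sigma * tau + eta * xi - Psi(tau, xi)))
for tau in [0.5, 1.0, 4.0]:
    sigma = -1.0 / tau                                   # psi'(tau)
    gaps.append(psi_star(sigma) - (sigma * tau + math.log(tau)))
print(max(abs(g) for g in gaps))
```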
Complexity of the path-following method associated with the aforementioned barrier is given by
$$\vartheta = 2k + m;\quad N = O(mk^2 + k^3 + n^3);\quad C = O((k + m)^{1/2}[mk^2 + k^3 + n^3]).$$

10.5 Exercises on applications of interior point methods


The problems below deal with a topic from Computational Geometry: computing extremal
ellipsoids related to convex sets.
There are two basic problems on extremal ellipsoids:
(Inner): given a solid Q ⊂ Rn (a closed and bounded convex domain with a nonempty interior),
find the ellipsoid of the maximum volume contained in Q.
(Outer): given a solid Q ⊂ Rn , find the ellipsoid of the minimum volume containing Q.
Let us first explain where the problems come from.
I know exactly one source of problem (Inner) - the Inscribed Ellipsoid method InsEll for
general convex optimization. This is an algorithm for solving problems of the type

minimize f (x) s.t. x ∈ Q,

where Q is a polytope in $\mathbb{R}^n$ and f is a convex function. InsEll, which can be regarded as a
multidimensional extension of the usual bisection, generates a decreasing sequence of polytopes
$Q_i$ which cover the optimal set of the problem; these localizers are defined as
$$
Q_0 = Q;\qquad Q_{i+1} = \{x \in Q_i \mid (x - x_i)^T f'(x_i) \le 0\},
$$
where $x_i$ is the center of the maximum volume ellipsoid inscribed into $Q_i$.


It can be proved that in this method the inaccuracy $f(\bar x_i) - \min_Q f$ of the best point $\bar x_i$ (the one with the
smallest value of f) among the search points $x_1, \dots, x_i$ admits the upper bound
$$
f(\bar x_i) - \min_Q f \le \exp\Big\{-\kappa \frac{i}{n}\Big\}\,\big[\max_Q f - \min_Q f\big],
$$
κ > 0 being an absolute constant. It is also known that this rate of convergence is, in a certain
rigorous sense, the best a convex minimization method can achieve, so that InsEll is optimal.
To run the method, you must solve at each step an auxiliary problem of type (Inner) for a
polytope given by a list of linear inequalities defining it.
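Read as an iteration count, the bound tells how many steps suffice for a prescribed relative accuracy ε ∈ (0, 1). Since κ is only known to be some absolute constant, the following is an order-of-magnitude estimate, not a precise step count:

```latex
\exp\{-\kappa\, i/n\}\,\big[\max_Q f - \min_Q f\big] \le \varepsilon\,\big[\max_Q f - \min_Q f\big]
\iff i \ge \frac{n}{\kappa}\,\ln\frac{1}{\varepsilon},
\qquad\text{i.e. } O\!\left(n \ln\tfrac{1}{\varepsilon}\right) \text{ steps suffice.}
```

Each of these steps requires one solution of an (Inner) problem, which is why an efficient method for (Inner) matters.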
As for problem (Outer), the applications known to me come from Control. Consider a
discrete time linear controlled plant given by

x(t + 1) = Ax(t) + Bu(t), t = 0, 1, ...,

where x(t) ∈ Rn and u(t) ∈ Rk are the state of the plant and the control at moment t and A,
B are given n × n and n × k matrices, A being nonsingular. Assume that u(·) can take values
in a polytope U ⊂ Rk given as a convex hull of finitely many points u1 , ..., um :

U = Conv{u1 , ..., um }.

Let the initial state of the plant be known, say, be zero. The question is: what is the set XT of
possible states of the plant at a given moment T ?
This is a difficult question which, in the multi-dimensional case, normally cannot be answered
in a "closed analytic form". One way to obtain some numerical information here is to
compute outer ellipsoidal approximations of the sets $X_t$, t = 0, ..., T, i.e., ellipsoids $E_t$ which cover
the sets $X_t$. The advantage of this approach is that these approximations have a fixed,
"tractable" geometry, in contrast to the sets $X_t$, which may become more and more complicated

as t grows. There is an evident possibility to form the $E_t$'s recurrently: indeed, if we already
know that $X_t$ belongs to an ellipsoid $E_t$, then the set $X_{t+1}$ certainly belongs to the set
$$
\widehat{E}_t = A E_t + B U.
$$
Since U is the convex hull of $u_1, \dots, u_m$, the set $\widehat{E}_t$ is nothing but the convex hull $Q_{t+1}$ of the
union of the ellipsoids $E_t^i = A E_t + B u_i$, i = 1, ..., m. Thus, a convex set contains $\widehat{E}_t$ if and only if it contains $Q_{t+1}$.
Now, it is, of course, reasonable to look for "tight" approximations, i.e., to choose $E_{t+1}$ as
close as possible to the set $Q_{t+1}$ (unfortunately, $Q_{t+1}$ usually is not an ellipsoid, so in any
case $E_{t+1}$ will be strictly larger than $Q_{t+1}$). A convenient integral measure of the quality of outer
approximation is the volume of the approximating set: the smaller it is, the better the approximation.
Thus, to approximate the sets $X_t$, we should solve a sequence of problems (Outer) with Q given
as the convex hull of a union of ellipsoids.
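One step of this recursion can be sketched in plain Python (our own illustration for n = 2, with hypothetical plant data). It uses the parameterization $I(c, X) = \{c + Xu \mid u^T u \le 1\}$ of Section 10.5.1: the image of $I(c, X)$ under $y \mapsto Ay + Bu_i$ is $I(Ac + Bu_i, AX)$, so every piece $E_t^i$ has volume $|\mathrm{Det}\, A|$ times that of $E_t$. Finding the minimum-volume ellipsoid covering all pieces is the (Outer) problem itself and is not attempted here.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def propagate(center, X, A, B, controls):
    """Pieces E_t^i = A E_t + B u_i of Q_{t+1}, for E_t = I(center, X).

    A I(c, X) + B u = I(A c + B u, A X).  A X is in general not symmetric; it
    describes the same ellipsoid as the symmetric factor (A X X^T A^T)^{1/2},
    which this sketch does not compute.
    """
    shape = mat_mul(A, X)
    base = mat_vec(A, center)
    return [([p + q for p, q in zip(base, mat_vec(B, u))], shape) for u in controls]


# Hypothetical 2-state, 1-control plant with extreme controls u = -1 and u = +1.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [1.0]]
pieces = propagate([0.0, 0.0], [[0.5, 0.0], [0.0, 0.5]], A, B, [[-1.0], [1.0]])
for c, S in pieces:
    print(c, abs(det2(S)))   # volume of each piece is |Det A| * vol(E_t) (up to kappa_2)
```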

10.5.1 (Inner) and (Outer) as convex programs


Problems (Inner) and (Outer) can be reformulated as convex programs. To this end recall that
there are two basic ways to describe an ellipsoid

• an ellipsoid $W \subset \mathbb{R}^n$ is the image of the unit Euclidean ball under a one-to-one affine
mapping of $\mathbb{R}^n$ onto itself:
$$
\text{(I)}\qquad W = I(x, X) \equiv \{y = x + Xu \mid u^T u \le 1\};
$$
here $x \in \mathbb{R}^n$ is the center of the ellipsoid and X is a nonsingular n × n matrix. This matrix is
defined uniquely up to multiplication from the right by an orthogonal matrix; by an appropriate
choice of this orthogonal "scale factor" we can make X symmetric positive definite, and
from now on our convention is that the matrix X involved in (I) is symmetric positive definite.
Thus, (I) allows us to parameterize n-dimensional ellipsoids by the pairs (x, X), with $x \in \mathbb{R}^n$ and
X an n × n positive definite symmetric matrix.
Recall that the volume of ellipsoid (I) is $\kappa_n \mathrm{Det}\, X$, $\kappa_n$ being the volume of the
n-dimensional Euclidean ball.

• an ellipsoid W is the set given by a strictly convex quadratic inequality:
$$
\text{(II)}\qquad W = E(r, x, X) \equiv \{u \mid u^T X u + 2x^T u + r \le 0\};
$$
here X is a positive definite symmetric n × n matrix, $x \in \mathbb{R}^n$ and $r \in \mathbb{R}$. The above relation
can be equivalently rewritten as
$$
W = \{u \mid (u + X^{-1}x)^T X (u + X^{-1}x) + r - x^T X^{-1} x \le 0\};
$$
thus, it indeed defines an ellipsoid if and only if
$$
\delta(r, x, X) \equiv x^T X^{-1} x - r > 0.
$$
The representation of W via r, x, X is not unique (positively proportional triples define the same ellipsoid).
Therefore we can always enforce δ ≤ 1, and in what follows this is our default
convention on the parameterization in question.
It is easily seen that the volume of the ellipsoid E(r, x, X) is
$$
\kappa_n\, \delta^{n/2}(r, x, X)\, \mathrm{Det}^{-1/2} X.
$$
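Both volume formulas are easy to check numerically. The plain-Python sketch below (our own illustration, with hand-rolled Gaussian elimination to avoid external dependencies) computes $\mathrm{Vol}\, I(x, X)$ and $\mathrm{Vol}\, E(r, x, X)$, using $\kappa_n = \pi^{n/2}/\Gamma(n/2 + 1)$, and compares them on one ellipse written in both forms.

```python
import math


def ball_volume(n):
    """kappa_n = pi^{n/2} / Gamma(n/2 + 1), the volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [v] for row, v in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (A[r][n] - sum(A[r][c] * y[c] for c in range(r + 1, n))) / A[r][r]
    return y


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[piv][col] == 0.0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            d = -d
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return d


def volume_I(X):
    """Vol I(x, X) = kappa_n Det X (X symmetric positive definite)."""
    return ball_volume(len(X)) * det(X)


def volume_E(r, x, X):
    """Vol E(r, x, X) = kappa_n delta^{n/2} Det^{-1/2} X, delta = x^T X^{-1} x - r."""
    n = len(x)
    delta = sum(p * q for p, q in zip(x, solve(X, x))) - r
    if delta <= 0:                 # empty set (delta < 0) or a single point (delta = 0)
        return 0.0
    return ball_volume(n) * delta ** (n / 2) / math.sqrt(det(X))


# Ellipse with semi-axes 2 and 3 centred at c = (1, -1), written in form (II) ...
c = [1.0, -1.0]
Xq = [[1 / 4, 0.0], [0.0, 1 / 9]]
x = [-(Xq[i][0] * c[0] + Xq[i][1] * c[1]) for i in range(2)]   # x = -Xq c
r = sum(c[i] * Xq[i][i] * c[i] for i in range(2)) - 1.0         # r = c^T Xq c - 1 (Xq diagonal)
# ... and in form (I) with X = Diag(2, 3); both volumes should equal 6 pi.
print(volume_E(r, x, Xq), volume_I([[2.0, 0.0], [0.0, 3.0]]), 6 * math.pi)
```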



Now let us look at problem (Inner). From the above discussion we see that it can be written
down as
$$
\text{(Inner')}\qquad \text{minimize } F(X) = -\ln \mathrm{Det}\, X \ \text{ s.t. } (x, X) \in G_I,
$$
with
$$
G_I = \{(x, X) \mid X \in S^n_+,\ I(x, X) \subset Q\};
$$
here $S^n_+$ is the cone of positive semidefinite matrices in the space $S^n$ of symmetric n × n matrices.

To get (Inner’), we have passed from the problem of maximizing
$$
\mathrm{Vol}_n(I(x, X)) = \kappa_n \mathrm{Det}\, X
$$
to the equivalent problem of minimizing − ln Det X.

Exercise 10.5.1 Prove that (Inner’) is a convex program: its feasible domain GI is closed
and bounded convex set with a nonempty interior in the space Rn × Sn , and the objective is a
continuous convex function (taking values in R ∪ {+∞}) on GI and finite on the interior of the
domain GI .

Similarly, (Outer) also can be posed as a convex program
$$
\text{(Outer')}\qquad \text{minimize } -\ln \mathrm{Det}\, X \ \text{ s.t. } (r, x, X) \in G_O = \mathrm{cl}\, G^0,
$$
$$
G^0 = \{(r, x, X) \in \mathbb{R} \times \mathbb{R}^n \times \mathrm{int}\, S^n_+ \mid \delta(r, x, X) \le 1,\ E(r, x, X) \supset Q\}.
$$

Exercise 10.5.2 + Prove that (Outer’) is a convex program: $G_O$ is a closed convex
domain, and F is a continuous convex function on $G_O$ taking values in R ∪ {+∞} and finite on
int $G_O$. Prove that the problem is equivalent to (Outer).

Thus, both (Inner) and (Outer) can be reformulated as convex programs. This does not,
however, mean that the problems are computationally tractable. Indeed, the minimal "well-posedness"
requirement on a convex problem which allows one to speak about its numerical solution
is as follows:
(!) given a candidate solution to the problem, you should be able to check whether the
solution is feasible and, if it is, to compute the value of the objective
at this solution⁵.
Whether (!) is satisfied for problems (Inner) and (Outer) depends on what the
set Q is and how it is represented; and, as we shall see in a while, "well-posed" cases for one of
our problems can be "ill-posed" for the other. Note that "well-posedness" for (Inner) means the
possibility, given an ellipsoid W, to check whether W is contained in Q; for (Outer) you should
be able to check whether W contains Q.
Consider a couple of examples.

• Q is a polytope given "by facets", more exactly, by a list of linear inequalities (not all of
them need to represent facets; some may be redundant).
This leads to a well-posed (Inner) (indeed, to check whether W is contained in Q, i.e., in
the intersection of a given finite family of half-spaces, is the same as to check whether W
is contained in each of the half-spaces, and this is immediate). In contrast to this, in the
case in question (Outer) is ill-posed: to check whether, say, a Euclidean ball W contains
a polytope given by a list of linear inequalities is, basically, the same as to maximize a
convex quadratic form (namely, $|x|_2^2$) under linear inequality constraints, and this is an
NP-hard problem.

• Q is a polytope given "by vertices", i.e., represented as the convex hull of a given finite set
S.
Here (Outer) is well-posed (indeed, W contains Q if and only if it contains S, which can
be immediately verified), and (Inner) is ill-posed (it is NP-hard).

⁵ To apply interior point methods, you need, of course, much stronger assumptions: you should be able to point
out a "computable" self-concordant barrier for the feasible set.
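In the "by vertices" case the well-posedness check for (Outer) is one quadratic-form evaluation per point of S, i.e. $O(|S| n^2)$ arithmetic operations. A minimal plain-Python sketch, written for the parameterization E(r, x, X) of Section 10.5.1 (the function name `contains_points` is our own):

```python
def contains_points(r, x, X, points, tol=1e-12):
    """True iff every point u satisfies u^T X u + 2 x^T u + r <= 0 (up to tol),
    i.e. iff the ellipsoid E(r, x, X) contains the finite set `points`.
    E(r, x, X) is convex, so this is the same as containing Conv(points)."""
    def q(u):
        Xu = [sum(a * b for a, b in zip(row, u)) for row in X]
        return (sum(a * b for a, b in zip(u, Xu))
                + 2 * sum(a * b for a, b in zip(x, u)) + r)
    return all(q(u) <= tol for u in points)


unit_disk = (-1.0, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
small_square = [[0.7, 0.7], [0.7, -0.7], [-0.7, 0.7], [-0.7, -0.7]]
big_square = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(contains_points(*unit_disk, small_square), contains_points(*unit_disk, big_square))
```

No comparably cheap test exists for the "by facets" description, which is exactly the ill-posedness discussed above.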

As we shall see in a while, in the case of a polytope Q our problems can be efficiently solved
by interior point machinery, provided that they are well-posed.

10.5.2 Problem (Inner), polyhedral case


In this section we assume that

Q = {x | aTj x ≤ bj , j = 1, ..., m}

is a polytope in Rn given by m linear inequalities.

Exercise 10.5.3 Prove that in the case in question problem (Inner) can be equivalently formu-
lated as follows:

(Inner Lin) minimize t s.t. (t, x, X) ∈ G,
with
$$
G = \{(t, x, X) \mid |X a_j|_2 \le b_j - a_j^T x,\ j = 1, \dots, m;\ X \in S^n_+;\ -\ln \mathrm{Det}\, X \le t\}.
$$
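The constraint $|X a_j|_2 \le b_j - a_j^T x$ in G expresses $I(x, X) \subset \{y \mid a_j^T y \le b_j\}$, because for symmetric X the maximum of $a_j^T(x + Xu)$ over the unit ball is $a_j^T x + |X a_j|_2$. This is also the "immediate" containment check behind the well-posedness of polyhedral (Inner). A plain-Python sketch (our own illustration; the name `inscribed` is hypothetical):

```python
import math


def inscribed(x, X, A, b, tol=1e-12):
    """True iff I(x, X) = {x + X u | u^T u <= 1} lies in {y | a_j^T y <= b_j, all j}.

    For symmetric X, max over |u| <= 1 of a^T (x + X u) equals a^T x + |X a|_2,
    so each half-space is checked with one matrix-vector product."""
    for a_j, b_j in zip(A, b):
        Xa = [sum(p * q for p, q in zip(row, a_j)) for row in X]
        if math.sqrt(sum(v * v for v in Xa)) > b_j - sum(p * q for p, q in zip(a_j, x)) + tol:
            return False
    return True


# The box [-1, 1]^2 given by four inequalities.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 1.0, 1.0, 1.0]
print(inscribed([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], A, b),   # unit disk: fits
      inscribed([0.0, 0.0], [[1.1, 0.0], [0.0, 1.1]], A, b))   # radius 1.1: does not
```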

To solve (Inner Lin) by interior point machinery, we need a self-concordant barrier for the feasible
set of the problem. This set is given by a number of constraints, and in our "barrier toolbox"
we have self-concordant barriers for the feasible sets of all of these constraints except the last
one. This shortcoming, however, can be immediately overcome.

Exercise 10.5.4 ∗ Prove that the function
$$
\Phi(t, X) = -\ln(t + \ln \mathrm{Det}\, X) - \ln \mathrm{Det}\, X
$$
is an (n + 1)-self-concordant barrier for the epigraph
$$
\mathrm{cl}\{(t, X) \in \mathbb{R} \times \mathrm{int}\, S^n_+ \mid t + \ln \mathrm{Det}\, X \ge 0\}
$$
of the function − ln Det X. Derive from this observation that the function
$$
F(t, x, X) = -\sum_{j=1}^{m} \ln\big([b_j - a_j^T x]^2 - a_j^T X^T X a_j\big) - \ln(t + \ln \mathrm{Det}\, X) - \ln \mathrm{Det}\, X
$$
is a (2m + n + 1)-self-concordant barrier for the feasible domain G of problem (Inner Lin). What
are the complexity characteristics of the path-following method associated with this barrier?
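To make the barrier concrete, here is a plain-Python sketch (our own, restricted to n = 2 so that Det X is explicit) that evaluates F(t, x, X) at a point, returning +∞ outside the interior of G. With the box $[-1, 1]^2$ we have m = 4, so the parameter $2m + n + 1$ equals 11. The sketch only evaluates F; a path-following method would also need its gradient and Hessian.

```python
import math


def inner_lin_barrier(t, x, X, A, b):
    """F(t, x, X) = -sum_j ln([b_j - a_j^T x]^2 - a_j^T X^T X a_j)
                    - ln(t + ln Det X) - ln Det X,
    for n = 2 (X symmetric 2x2); returns +inf outside the interior of G."""
    det_x = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    if X[0][0] <= 0 or det_x <= 0:        # X must be positive definite
        return math.inf
    epi = t + math.log(det_x)
    if epi <= 0:                          # (t, X) outside the epigraph of -ln Det
        return math.inf
    value = -math.log(epi) - math.log(det_x)
    for a_j, b_j in zip(A, b):
        s = b_j - (a_j[0] * x[0] + a_j[1] * x[1])
        Xa0 = X[0][0] * a_j[0] + X[0][1] * a_j[1]
        Xa1 = X[1][0] * a_j[0] + X[1][1] * a_j[1]
        gap = s * s - (Xa0 ** 2 + Xa1 ** 2)
        if s <= 0 or gap <= 0:            # need |X a_j|_2 < b_j - a_j^T x
            return math.inf
        value -= math.log(gap)
    return value


A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]   # box [-1, 1]^2, m = 4
b = [1.0] * 4
print(inner_lin_barrier(2.0, [0.0, 0.0], [[0.5, 0.0], [0.0, 0.5]], A, b))
```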

10.5.3 Problem (Outer), polyhedral case


Now consider problem (Outer) with the set Q given by
$$
Q = \Big\{\sum_{j=1}^{m} \lambda_j a_j \;\Big|\; \lambda \ge 0,\ \sum_j \lambda_j = 1\Big\}.
$$

Exercise 10.5.5 Prove that in the case in question problem (Outer’) becomes the problem
(Outer Lin) minimize t s.t. (t, r, x, X) ∈ G,
with
$$
G = \{(t, r, x, X) \mid a_j^T X a_j + 2x^T a_j + r \le 0,\ j = 1, \dots, m;\ X \in S^n_+;\ -\ln \mathrm{Det}\, X \le t;\ \delta(r, x, X) \le 1\}.
$$
Prove+ that the function
$$
F(t, r, x, X) = -\sum_{j=1}^{m} \ln(-a_j^T X a_j - 2x^T a_j - r) - \ln(1 + r - x^T X^{-1} x) - \ln(t + \ln \mathrm{Det}\, X) - 2 \ln \mathrm{Det}\, X
$$
is an (m + 2n + 2)-self-concordant barrier for G. What are the complexity characteristics of the
path-following method associated with this barrier?

10.5.4 Problem (Outer), ellipsoidal case


The polyhedral versions of problems (Inner) and (Outer) considered so far are, in a sense,
particular cases of "ellipsoidal" versions, where Q is an intersection of a finite family of ellipsoids
(problem (Inner)) or the convex hull of a finite number of ellipsoids (problem (Outer); recall that our
motivation of this latter problem leads to the "ellipsoidal" version of it). Indeed, the polyhedral
(Inner) relates to the case when Q is an intersection of a finite family of half-spaces, and a
half-space is nothing but a "very large" ellipsoid. Similarly, polyhedral (Outer) relates to the
case when Q is the convex hull of finitely many points, and a point is nothing but a "very small"
ellipsoid. What we are about to do is to develop polynomial time methods for the ellipsoidal
version of (Outer). The basic question of well-posedness here reads as follows:
(?) Given two ellipsoids, decide whether the second of them contains the first one.
This question can be efficiently answered, and the nontrivial observation underlying this answer
is, I think, more important than the question itself.
We shall consider (?) in the situation where the first ellipsoid is given as E(r, x, X), and the
second one as E(s, y, Y). Let us start with an equivalent reformulation of the question.
The ellipsoid E(r, x, X) is contained in E(s, y, Y) if and only if every solution u to the
inequality
$$
u^T X u + 2x^T u + r \le 0
$$
satisfies the inequality
$$
u^T Y u + 2y^T u + s \le 0.
$$
Substituting u = v/t, we can reformulate this as follows:

E(r, x, X) ⊂ E(s, y, Y) if and only if from the inequality
$$
v^T X v + 2t x^T v + r t^2 \le 0
$$
and from t ≠ 0 it always follows that
$$
v^T Y v + 2t y^T v + s t^2 \le 0.
$$
In fact we can omit here "t ≠ 0", since for t = 0 the first inequality can be valid only when
v = 0 (recall that X is positive definite), and then the second inequality is also valid. Thus, we
come to the following conclusion:

E(r, x, X) ⊂ E(s, y, Y) if and only if the following implication is valid:
$$
w^T S w \le 0 \;\Rightarrow\; w^T R w \le 0,
$$
where
$$
S = \begin{pmatrix} X & x \\ x^T & r \end{pmatrix}, \qquad R = \begin{pmatrix} Y & y \\ y^T & s \end{pmatrix}.
$$

We have reduced (?) to the following question


(??) given two symmetric matrices R and S of the same size, detect whether all directions w
where the quadratic form $w^T S w$ is nonpositive are also directions where the quadratic form
$w^T R w$ is nonpositive:
(Impl) $w^T S w \le 0 \Rightarrow w^T R w \le 0$.

In fact we can say something additional about the quadratic forms S and R we actually are
interested in:
(*) in the case of matrices coming from ellipsoids there is a direction w with negative $w^T S w$,
and there is a direction w' with positive $(w')^T R w'$.

Exercise 10.5.6 + Prove (*).

Now, there is an evident sufficient condition which allows us to give a positive answer to (??): if
R ≤ λS with some nonnegative λ, then, of course, (Impl) is valid. It is a kind of miracle that
this sufficient condition is also necessary, provided that $w^T S w < 0$ for some w:

Exercise 10.5.7 ∗ Prove that if S and R are symmetric matrices of the same size such that
the implication (Impl) is valid and S is such that $w^T S w < 0$ for some w, then there exists a
nonnegative λ such that
$$
R \le \lambda S;
$$
if, in addition, $(w')^T R w' > 0$ for some w', then the above λ is positive.
Conclude from the above that if S and R are symmetric matrices of the same size such that
$w_S^T S w_S < 0$ for some $w_S$ and $w_R^T R w_R > 0$ for some $w_R$, then implication (Impl) is valid if and
only if
$$
R \le \lambda S
$$
for some positive λ.
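A plain-Python sketch of the resulting containment test (our own illustration): build S from the inner ellipsoid and R from the outer one as above, then look for λ > 0 with λS − R positive semidefinite. Positive semidefiniteness is tested by attempting a Cholesky factorization of the matrix plus $10^{-12} I$ (a tolerance of our choosing). The grid search over λ is only a crude stand-in chosen for brevity; in (Outer Ell) below the multipliers are decision variables handled by the interior point method itself.

```python
def is_psd(M, tol=1e-12):
    """Numerical PSD test: attempt a Cholesky factorization of M + tol * I."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += tol
                if s <= 0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def ellipsoid_matrix(r, x, X):
    """The (n+1) x (n+1) matrix [[X, x], [x^T, r]] attached to E(r, x, X)."""
    return [X[i][:] + [x[i]] for i in range(len(x))] + [list(x) + [r]]


def find_lambda(S, R, lam_max=100.0, steps=10000):
    """Grid search for lambda in (0, lam_max] with R <= lambda * S; None if none found."""
    for k in range(1, steps + 1):
        lam = lam_max * k / steps
        M = [[lam * s - q for s, q in zip(row_s, row_r)] for row_s, row_r in zip(S, R)]
        if is_psd(M):
            return lam
    return None


I2 = [[1.0, 0.0], [0.0, 1.0]]
S = ellipsoid_matrix(-1.0, [0.0, 0.0], I2)                       # inner: unit disk
print(find_lambda(S, ellipsoid_matrix(-4.0, [0.0, 0.0], I2)),    # outer: radius 2
      find_lambda(S, ellipsoid_matrix(-0.25, [0.0, 0.0], I2)))   # outer: radius 1/2
```

For the concentric disks, $\lambda S - R = \mathrm{Diag}(\lambda - 1, \lambda - 1, 4 - \lambda)$, so the feasible multipliers form the interval [1, 4]; in the second case no λ exists, matching the fact that the unit disk does not fit into a disk of radius 1/2.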



It is worth explaining why the statement given in the latter exercise is so amazing. (Impl)
says exactly that the quadratic form $f_1(w) = -w^T R w$ is nonnegative whenever the quadratic
form $f_2(w) = w^T S w$ is nonpositive, or, in other words, that the function
$$
f(w) = \max\{f_1(w), f_2(w)\}
$$
is nonnegative everywhere and therefore attains its minimum at w = 0. If the functions $f_1$ and $f_2$
were convex, we could conclude from this that a certain convex combination $\mu f_1(w) + (1 - \mu) f_2(w)$
of these functions also attains its minimum at w = 0, so that $-\mu R + (1 - \mu) S$ is positive
semidefinite; this conclusion is exactly what our statement says (it says also that μ > 0,
so that the matrix inequality can be rewritten as R ≤ λS with $\lambda = (1 - \mu)\mu^{-1}$; this additional
information is readily given by the assumption that $w^T S w < 0$ and causes no surprise). Thus,
the conclusion is the same as in the situation of convex $f_1$ and $f_2$; but we did not assume the
functions to be convex! Needless to say, the "statement" of the type
$$
\max\{f_1, f_2\} \ge 0 \text{ everywhere} \;\Rightarrow\; \exists \mu \in [0, 1]:\ \mu f_1 + (1 - \mu) f_2 \ge 0 \text{ everywhere}
$$
fails to be true for arbitrary $f_1$ and $f_2$, but, as we have seen, it is true for homogeneous quadratic
forms. Let me add that the implication
$$
\max\{w^T S_1 w, \dots, w^T S_k w\} \ge 0\ \ \forall w \;\Rightarrow\; \text{a certain convex combination of the } S_i \text{ is } \ge 0
$$
is, in general, valid only for k ≤ 2.


Now we are ready to apply interior point machinery to the ellipsoidal version of (Outer).
Consider problem (Outer) with Q given as the convex hull of ellipsoids $E(p_i, a_i, A_i)$, i =
1, ..., m. An ellipsoid E(r, x, X) is a convex set; therefore it contains the convex hull Q of our
ellipsoids if and only if it contains each of the ellipsoids. As we know from Exercise 10.5.7 and
(*), the latter is equivalent to the existence of m positive reals $\lambda_1, \dots, \lambda_m$ such that
$$
R(r, x, X) \equiv \begin{pmatrix} X & x \\ x^T & r \end{pmatrix} \le \lambda_i S_i,\quad i = 1, \dots, m,
$$
where $S_i = R(p_i, a_i, A_i)$.

Exercise 10.5.8 Prove that in the case in question problem (Outer) can be equivalently formulated
as the following convex program:
(Outer Ell) minimize t s.t. (t, r, x, X, λ) ∈ G,
where
$$
G = \mathrm{cl}\{(t, r, x, X, \lambda) \mid X \in \mathrm{int}\, S^n_+,\ t + \ln \mathrm{Det}\, X \ge 0,\ \delta(r, x, X) \le 1,\ R(r, x, X) \le \lambda_i S_i,\ i = 1, \dots, m\}.
$$
Prove that the function
$$
F(t, r, x, X, \lambda) = -\ln(t + \ln \mathrm{Det}\, X) - 2 \ln \mathrm{Det}\, X - \ln(1 + r - x^T X^{-1} x) - \sum_{i=1}^{m} \ln \mathrm{Det}\,\big(\lambda_i S_i - R(r, x, X)\big)
$$
is a $((m + 2)(n + 1))$-self-concordant barrier for the feasible domain G of the problem (each of the m matrices $\lambda_i S_i - R(r, x, X)$ is of size (n + 1) × (n + 1)). What are the
complexity characteristics of the path-following method associated with this barrier?
Chapter 11

Semidefinite Programming

This concluding lecture is devoted to an extremely interesting and important class of convex
programs - the so-called Semidefinite Programming.

11.1 A Semidefinite program


The canonical form of a semidefinite program is as follows:
(SD) minimize a linear objective $c^T x$ of $x \in \mathbb{R}^n$ under Linear Matrix Inequality constraints
$$
A_j(x) \ge 0,\quad j = 1, \dots, M,
$$
where $A_j(x)$ are symmetric matrices affinely depending on x (i.e., each entry of $A_j(\cdot)$ is an affine
function of x), and A ≥ 0 for a symmetric matrix A stands for "A is positive semidefinite".
Note that a system of M Linear Matrix Inequality constraints (LMI's) $A_j(x) \ge 0$, j = 1, ..., M,
is equivalent to a single LMI
$$
A(x) \ge 0,\qquad A(x) = \mathrm{Diag}\{A_1(x), \dots, A_M(x)\} = \begin{pmatrix} A_1(x) & & & \\ & A_2(x) & & \\ & & \ddots & \\ & & & A_M(x) \end{pmatrix}
$$
(blank space corresponds to zero blocks). Further, an affine in x matrix-valued function A(x)
can be represented as
$$
A(x) = A_0 + \sum_{i=1}^{n} x_i A_i,
$$
$A_0, \dots, A_n$ being fixed matrices of the same size; thus, in a semidefinite program we should
minimize a linear form of $x_1, \dots, x_n$ provided that a linear combination of given matrices $A_i$ with
the coefficients $x_i$ plus the constant term $A_0$ is positive semidefinite.
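The two observations above, stacking LMIs block-diagonally and writing A(x) as $A_0 + \sum_i x_i A_i$, can be sketched in plain Python (our own illustration, matrices stored as lists of lists). The example uses the diagonal LP embedding of Section 11.2.1 for the two scalar constraints 1 − x ≥ 0 and x ≥ 0.

```python
def affine_matrix(A0, As, x):
    """A(x) = A0 + sum_i x_i A_i for square matrices stored as lists of lists."""
    n = len(A0)
    return [[A0[r][c] + sum(xi * Ai[r][c] for xi, Ai in zip(x, As)) for c in range(n)]
            for r in range(n)]


def block_diag(blocks):
    """Diag{A_1, ..., A_M}: the blocks on the diagonal, zeros elsewhere."""
    size = sum(len(B) for B in blocks)
    D = [[0.0] * size for _ in range(size)]
    off = 0
    for B in blocks:
        for r, row in enumerate(B):
            for c, v in enumerate(row):
                D[off + r][off + c] = v
        off += len(B)
    return D


# Two scalar LMIs 1 - x >= 0 and x >= 0, merged into one 2x2 LMI.
A0 = block_diag([[[1.0]], [[0.0]]])
A1 = block_diag([[[-1.0]], [[1.0]]])
print(affine_matrix(A0, [A1], [0.25]))   # [[0.75, 0.0], [0.0, 0.25]]
```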
The indicated problem may seem rather artificial. Let me start by indicating several
examples of important problems covered by Semidefinite Programming.

11.2 Semidefinite Programming: examples


11.2.1 Linear Programming
Linear Programming problem

minimize cT x s.t. aTj x ≤ bj , j = 1, ..., M


is a very particular semidefinite program: the corresponding matrix A(x) is the M × M diagonal
matrix with the diagonal entries $b_j - a_j^T x$ (indeed, a diagonal matrix is positive semidefinite if
and only if its diagonal entries are nonnegative, so that A(x) ≥ 0 if and only if x is feasible in
the initial LP problem).

11.2.2 Quadratically Constrained Quadratic Programming


A convex quadratic constraint
$$
f(x) \equiv x^T B^T B x + b^T x + c \le 0,
$$
B being a k × n matrix, can be expressed in terms of positive semidefiniteness of a certain
(k + 1) × (k + 1) symmetric matrix $A_f(x)$ affine in x, namely, the matrix
$$
A_f(x) = \begin{pmatrix} -c - b^T x & [Bx]^T \\ Bx & I \end{pmatrix}.
$$

Indeed, it is immediately seen that a symmetric matrix
$$
A = \begin{pmatrix} P & R^T \\ R & Q \end{pmatrix}
$$
with positive definite block Q is positive semidefinite if and only if the matrix $P - R^T Q^{-1} R$ is
positive semidefinite¹; thus, $A_f(x)$ is positive semidefinite if and only if $-c - b^T x \ge x^T B^T B x$,
i.e., if and only if f(x) ≤ 0.
Thus, a convex quadratic constraint can be equivalently represented by an LMI; it follows
that a convex quadratically constrained quadratic problem can be represented as a problem of
optimization under LMI constraints, i.e., as a semidefinite program.
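A quick numerical check of the equivalence $f(x) \le 0 \Leftrightarrow A_f(x) \ge 0$ (plain Python, our own sketch). Positive semidefiniteness is tested by attempting a Cholesky factorization of $A_f(x) + 10^{-12} I$, a tolerance of our choosing, and the sample constraint $|x|^2 - 1 \le 0$ (B = I, b = 0, c = −1) is hypothetical.

```python
def is_psd(M, tol=1e-12):
    """Numerical PSD test: attempt a Cholesky factorization of M + tol * I."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += tol
                if s <= 0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def qc_lmi(B, b, c, x):
    """A_f(x) = [[-c - b^T x, (B x)^T], [B x, I_k]] for f(x) = x^T B^T B x + b^T x + c."""
    Bx = [sum(p * q for p, q in zip(row, x)) for row in B]
    k = len(Bx)
    top = [-c - sum(p * q for p, q in zip(b, x))] + Bx
    rest = [[Bx[i]] + [1.0 if j == i else 0.0 for j in range(k)] for i in range(k)]
    return [top] + rest


def f(B, b, c, x):
    Bx = [sum(p * q for p, q in zip(row, x)) for row in B]
    return sum(v * v for v in Bx) + sum(p * q for p, q in zip(b, x)) + c


# f(x) = |x|^2 - 1: the constraint f(x) <= 0 describes the unit disk.
B, b, c = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], -1.0
for x in ([0.6, 0.6], [1.0, 1.0], [0.0, 0.9], [1.2, 0.0]):
    print(x, f(B, b, c, x) <= 0, is_psd(qc_lmi(B, b, c, x)))
```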
The outlined examples are not that convincing: there are direct ways to deal with LP and
QCQP, and it hardly makes sense to reduce these problems to evidently more complicated
semidefinite programs. In the forthcoming examples LMI constraints come from the nature of
the problem in question.

11.2.3 Minimization of Largest Eigenvalue and Lovasz Capacity of a graph


The Linear Eigenvalue problem is to find x which minimizes the maximum eigenvalue of sym-
metric matrix B(x) affinely depending on the design vector x (there are also nonlinear versions
of the problem, but I am not speaking about them). This is a traditional area of Convex
Optimization; the problem can be immediately reformulated as the semidefinite program

minimize λ s.t. A(λ, x) = λI − B(x) ≥ 0.

As an application of the Eigenvalue problem, let us look at the computation of the Lovasz capacity
number of a graph. Consider a graph Γ with the set of vertices V and the set of arcs E. One of
the fundamental characteristics of the graph is its inner stability number α(Γ) - the maximum
cardinality of an independent subset of vertices (a subset is called independent if no two vertices
in it are linked by an arc). Computing α(Γ) is an NP-hard problem.
¹ To verify this statement, note that the minimum of the quadratic form $v^T P v + 2v^T R^T u + u^T Q u$ with respect
to u is attained at $u = -Q^{-1} R v$, and the corresponding minimum value is $v^T P v - v^T R^T Q^{-1} R v$; A is positive
semidefinite if and only if this latter quantity is ≥ 0 for all v.

There is another interesting characteristic of a graph - the Shannon capacity number σ(Γ)
defined as follows. Let us interpret the vertices of Γ as letters of a certain alphabet. Assume that
we are transmitting words comprised of these letters via an unreliable communication channel;
unreliability of the channel is described by the arcs of the graph, namely, letter i on input can
become letter j on output if and only if i and j are linked by an arc in the graph. Now, what is
the maximum number $s_k$ of k-letter words which you can send through the channel without risk
that one of the words will be converted to another? When k = 1, the answer is clear: exactly
α(Γ); you can use, as these words, the letters from (any) maximum independent set $V^*$ of vertices.
Now, $s_k \ge s_1^k$: the words comprised of letters which cannot be "mixed" also cannot be mixed.
In fact $s_k$ can be greater than $s_1^k$, as is seen from simple examples. E.g., if Γ is the 5-letter
graph-pentagon, then $s_1 = \alpha(\Gamma) = 2$, but $s_2 = 5 > 4 = s_1^2$ (you can draw the 25 2-letter words in
our alphabet and find 5 of them which cannot be mixed). Similarly to the inequality $s_k \ge s_1^k$,
you can prove that $s_{p \times q} \ge s_p^q$ (consider $s_p$ p-letter words which cannot be mixed as your new
alphabet and note that the words comprised of q of these "macro-letters" also cannot be mixed).
From the relation $s_{p \times q} \ge s_p^q$ (combined with the evident relation $s_p \le |V|^p$) it follows that there
exists
$$
\sigma(\Gamma) = \lim_{p \to \infty} s_p^{1/p} = \sup_p s_p^{1/p};
$$
this limit is exactly the Shannon capacity number. Since $\sigma(\Gamma) \ge s_p^{1/p}$ for every p, and, in
particular, for p = 1, we have
$$
\sigma(\Gamma) \ge \alpha(\Gamma);
$$
for the above 5-letter graph we also have $\sigma(\Gamma) \ge \sqrt{s_2} = \sqrt{5}$.
The Shannon capacity number is an upper bound for the inner stability number, which is
good news; the bad news is that σ(Γ) is even less computationally tractable than α(Γ). E.g., for
more than 20 years nobody knew whether the Shannon capacity of the above 5-letter graph is
equal to $\sqrt{5}$ or is greater than this quantity.
In 1979, Lovasz introduced a "computable" upper bound for σ(Γ) (and, consequently, for
α(Γ)) - the Lovasz capacity number θ(Γ), which is defined as follows: let N be the number of
vertices in the graph, and let the vertices be numbered by 1, ..., N. Let us associate with each arc
γ in the graph its own variable $x_\gamma$, and let B(x) be the following symmetric matrix depending
on the collection x of these variables: $B_{ij}(x)$ is 1 if either i = j, or the vertices i and j are not
adjacent; if the vertices are linked by arc γ, then $B_{ij}(x) = x_\gamma$. For the above 5-letter graph,
e.g.,
$$
B(x) = \begin{pmatrix}
1 & x_{12} & 1 & 1 & x_{51} \\
x_{12} & 1 & x_{23} & 1 & 1 \\
1 & x_{23} & 1 & x_{34} & 1 \\
1 & 1 & x_{34} & 1 & x_{45} \\
x_{51} & 1 & 1 & x_{45} & 1
\end{pmatrix}.
$$
Now, by definition the Lovasz capacity number is the minimum, over all x's, of the maximum
eigenvalue of the matrix B(x). Lovasz has proved that his capacity number is an upper bound
for the Shannon capacity number and the inner stability number:
$$
\theta(\Gamma) \ge \sigma(\Gamma) \ge \alpha(\Gamma).
$$
Thus, the Lovasz capacity number (which can be computed via solving a semidefinite program)
gives important information on the fundamental combinatorial characteristic of a graph. In
many cases the information is complete, as it happens in our example, where $\theta(\Gamma) = \sqrt{5}$;
consequently, $\sigma(\Gamma) = \sqrt{5}$, since we know that for the graph in question $\sigma(\Gamma) \ge \sqrt{5}$; and since
α(Γ) is an integer, we can rewrite the Lovasz inequality as $\alpha(\Gamma) \le \lfloor \theta(\Gamma) \rfloor$ and get for our example
the correct answer α(Γ) = 2.
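The value $\theta = \sqrt{5}$ for the pentagon can be reproduced numerically. The sketch below (plain Python, our own illustration) restricts to $x_\gamma = t$ for all five arcs. Since $\lambda_{\max}(B(x))$ is convex in x and unchanged when x is rotated along the pentagon, averaging any x over the five rotations cannot increase it, so the restriction loses nothing for this particular graph. It then minimizes $\lambda_{\max}(B(t))$ over t by golden-section search, computing eigenvalues with the cyclic Jacobi rotation method. For a general graph one would instead solve the semidefinite program with an SDP solver.

```python
import math


def sym_eigenvalues(M, sweeps=50, tol=1e-20):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi rotation method."""
    A = [row[:] for row in M]
    n = len(A)
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):       # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):       # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]


def pentagon_B(t):
    """B(x) of the 5-cycle with every arc variable x_gamma set to t."""
    B = [[1.0] * 5 for _ in range(5)]
    for i in range(5):
        j = (i + 1) % 5              # vertices i and i+1 are joined by an arc
        B[i][j] = B[j][i] = t
    return B


def golden_min(f, lo, hi, iters=80):
    """Approximate min of a unimodal (here: convex) function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:                        # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return min(fc, fd)


theta = golden_min(lambda t: max(sym_eigenvalues(pentagon_B(t))), -5.0, 5.0)
print(theta, math.sqrt(5))
```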

11.2.4 Dual bounds in Boolean Programming


Consider another application of semidefinite programming in combinatorics. Assume that you
should solve a Boolean Programming problem
$$
\text{minimize } \sum_{j=1}^{k} d_j u_j \ \text{ s.t. } \sum_{j=1}^{k} p_{ij} u_j = q_i,\ i = 1, \dots, n,\quad u_j \in \{0; 1\}.
$$

One of the standard ways to solve the problem is to use the branch-and-bound scheme, and for
this scheme it is crucial to generate lower bounds for the optimal value in the subproblems arising
in the course of running the method. These subproblems are of the same structure as the initial
problem, so we may think of how to bound from below the optimal value in the problem. The
traditional way here is to pass from the Boolean problem to its Linear Programming relaxation
by replacing the Boolean restrictions $u_j \in \{0; 1\}$ with the linear inequalities $0 \le u_j \le 1$. Some years
ago Shor suggested using a nonlinear relaxation, which is as follows. We can rewrite the Boolean
constraints equivalently as quadratic equalities
$$
u_j(1 - u_j) = 0,\quad j = 1, \dots, k;
$$
further, we can add to our initial linear equations their quadratic implications like
$$
\Big[q_i - \sum_{j=1}^{k} p_{ij} u_j\Big]\Big[q_{i'} - \sum_{j=1}^{k} p_{i'j} u_j\Big] = 0,\quad i, i' = 1, \dots, n.
$$

Thus, we can equivalently rewrite our problem as a problem of continuous optimization with
linear objective and quadratic equality constraints
$$
\text{minimize } d^T u \ \text{ s.t. } K_i(u) = 0,\ i = 1, \dots, N, \tag{11.1}
$$
where all $K_i$ are quadratic functions of u. Let us form the Lagrange function
$$
L(u, x) = d^T u + \sum_{i=1}^{N} x_i K_i(u) = u^T A(x) u + 2 b^T(x) u + c(x),
$$
where A(x), b(x), c(x) clearly are affine functions of the vector x of Lagrange multipliers. Now
let us pass to the "dual" problem
$$
\text{maximize } f(x) \equiv \inf_u L(u, x). \tag{11.2}
$$


u

If our primal problem (11.1) were convex, then, under mild regularity assumptions, the optimal value
$c^* = \sup_x f(x)$ of the dual would be the same as the optimal value of the primal problem. Our situation has
nothing in common with convexity, so we should not hope that $c^*$ is the optimal value of
(11.1); however, independently of any convexity assumptions, $c^*$ is a lower bound for the primal
optimal value²; this is the bound suggested by Shor.

² The proof is immediate: if u is primal feasible, then, for any x, $L(u, x) = d^T u$ (since $K_i(u) = 0$), and therefore
$f(x) \le d^T u$; consequently, $c^* = \sup_x f(x) \le d^T u$. Since the latter inequality is valid for all primal feasible u, $c^*$
is ≤ the primal optimal value, as claimed.

Let us look at how to compute Shor's bound. We have
$$
f(x) = \inf_u \{u^T A(x) u + 2 b^T(x) u + c(x)\},
$$
so that f(x) is the largest real f for which the quadratic form of u
$$
u^T A(x) u + 2 b^T(x) u + [c(x) - f]
$$

is nonnegative for all u; substituting $u = t^{-1} v$, we see that the latter quadratic form of u is
nonnegative for all u if and only if the homogeneous quadratic form of v, t
$$
v^T A(x) v + 2 b^T(x) v t + [c(x) - f] t^2
$$
is nonnegative whenever t ≠ 0. By continuity, the resulting form is nonnegative for all
v, t with t ≠ 0 if and only if it is nonnegative for all v, t, i.e., if and only if the matrix
$$
A(f, x) = \begin{pmatrix} c(x) - f & b^T(x) \\ b(x) & A(x) \end{pmatrix}
$$

is positive semidefinite. Thus, f (x) is the largest f for which the matrix A(f, x) is positive
semidefinite; consequently, the quantity supx f (x) we are interested in is nothing but the optimal
value in the following semidefinite program:

maximize f s.t. A(f, x) ≥ 0.

It can be easily seen that the lower bound $c^*$ given by Shor's relaxation is not worse than the
one given by the usual LP relaxation. Normally the "semidefinite" bound is better, as is the
case, e.g., in the following toy problem
$$
\begin{aligned}
40x_1 + 90x_2 + 28x_3 + 22x_4 &\to \min \\
30x_1 + 27x_2 + 11x_3 + 33x_4 &= 41 \\
28x_1 + 2x_2 + 46x_3 + 46x_4 &= 74 \\
x_1, x_2, x_3, x_4 &\in \{0, 1\}
\end{aligned}
$$
with optimal value 68 ($x_1^* = x_3^* = 1$, $x_2^* = x_4^* = 0$); here Shor's bound is 43, and the LP-based
bound is 40.
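With only $2^4 = 16$ Boolean points, the optimal value 68 of the toy problem is easy to confirm by enumeration (plain Python, our own sketch). Enumeration does not reproduce Shor's bound 43 or the LP bound 40, which require solving the semidefinite program and the LP relaxation, and it is of course useless for realistic k.

```python
from itertools import product


def brute_force_boolean(d, P, q):
    """Minimize d^T u over u in {0,1}^k subject to P u = q by full enumeration."""
    best = None
    for u in product((0, 1), repeat=len(d)):
        if all(sum(p * ui for p, ui in zip(row, u)) == qi for row, qi in zip(P, q)):
            value = sum(dj * uj for dj, uj in zip(d, u))
            if best is None or value < best[0]:
                best = (value, u)
    return best


print(brute_force_boolean([40, 90, 28, 22],
                          [[30, 27, 11, 33], [28, 2, 46, 46]],
                          [41, 74]))   # (68, (1, 0, 1, 0))
```

In fact (1, 0, 1, 0) is the only feasible point here, since 30 + 11 is the only way to reach 41 in the first equation.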

11.2.5 Problems arising in Control


An extremely powerful source of semidefinite problems is modern Control; there are tens of
problems which are naturally formulated as semidefinite programs. Let me present two generic
examples.
Proving Stability via Quadratic Lyapunov function³. Consider a polytopic differential inclusion
$$
x'(t) \in Q(x(t)), \tag{11.3}
$$
where
$$
Q(x) = \mathrm{Conv}\{Q_1 x, \dots, Q_M x\},
$$
$Q_i$ being k × k matrices. Thus, every vector $x \in \mathbb{R}^k$ is associated with the polytope Q(x),
and the trajectories of the inclusion are differentiable functions x(t) such that their derivatives
x'(t) belong, for any t, to the polytope Q(x(t)). When M = 1, we come to the usual linear
time-invariant system
$$
x'(t) = Q_1 x(t).
$$
³ This example was the subject of exercises to Lecture 7, see Section 7.6.1.

The general case M > 1 allows one to model time-varying systems with uncertainty; indeed, a
trajectory of the inclusion is a solution to the time-varying equation
$$
x'(t) = A(t) x(t),\quad A(t) \in \mathrm{Conv}\{Q_1, \dots, Q_M\},
$$
and a trajectory of any time-varying equation of this type clearly is a trajectory of the inclusion.
One of the most fundamental questions about a dynamic system is its stability: what happens
with the trajectories as t → ∞? Do they tend to 0 (this is stability), remain bounded,
or do some of them go to infinity? A natural way to prove stability is to point out a quadratic
Lyapunov function $f(x) = x^T L x$, L being a positive definite symmetric matrix, which "proves the
decay rate α of the system", i.e., satisfies, for some α, the inequality
$$
\frac{d}{dt} f(x(t)) \le -\alpha f(x(t))
$$
along all trajectories x(t) of the inclusion. From this differential inequality it immediately follows
that
$$
f(x(t)) \le f(x(0)) \exp\{-\alpha t\};
$$
if α > 0, this proves stability (the trajectories approach the origin at a known exponential rate);
if α = 0, the trajectories remain bounded; if α < 0, we do not know whether the system is
stable, but we have a certain upper bound for the rate at which the trajectories may go to infinity.
It is worth noting that in the case of a linear time-invariant system the existence of a quadratic
Lyapunov function which "proves a positive decay rate α > 0" is a necessary and sufficient stability
condition (this is stated by the famous Lyapunov Theorem); in the general case M > 1 this
condition is only sufficient, and is no longer necessary.
Now, where could we find a quadratic Lyapunov function which proves stability? The derivative of the function $x^T(t) L x(t)$ in $t$ is $2x^T(t) L x'(t)$; if $L$ proves the decay rate $\alpha$, this quantity should be $\le -\alpha x^T(t) L x(t)$ for all trajectories $x(\cdot)$. Now, $x(t)$ can be an arbitrary point of $R^k$, and for given $x = x(t)$ the vector $x'(t)$ can be an arbitrary vector from $Q(x)$. Thus, $L$ "proves the decay rate $\alpha$" if and only if it is symmetric positive definite (this is our a priori restriction on the Lyapunov function) and is such that

$$2x^T L y \le -\alpha x^T L x$$

for all $x$ and all $y \in Q(x)$. Since the required inequality is linear in $y$, it is valid for all $y \in Q(x)$ if and only if it is valid for $y = Q_i x$, $i = 1, ..., M$ (recall that $Q(x)$ is the convex hull of the points $Q_i x$). Thus, a positive definite symmetric $L$ proves the decay rate $\alpha$ if and only if

$$x^T [L Q_i + Q_i^T L] x \equiv 2x^T L Q_i x \le -\alpha x^T L x$$

for all $x$, i.e., if and only if $L$ satisfies the system of Linear Matrix Inequalities

$$\alpha L + L Q_i + Q_i^T L \le 0,\ i = 1, ..., M; \qquad L > 0.$$

Due to homogeneity with respect to $L$, we can impose on $L$ the nonstrict inequality $L \ge I$ instead of the strict (and therefore inconvenient) inequality $L > 0$, and thus come to the necessity to solve the system

$$L \ge I; \quad \alpha L + L Q_i + Q_i^T L \le 0,\ i = 1, ..., M, \qquad (11.4)$$

of Linear Matrix Inequalities, which is a semidefinite program with trivial objective.
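Checking whether a given candidate $L$ satisfies (11.4) only requires eigenvalue computations. The sketch below assumes NumPy; the function name `proves_decay_rate`, the tolerance and the two vertex matrices are illustrative choices. It only verifies a candidate: finding $L$ is the semidefinite program itself and needs an SDP solver.

```python
import numpy as np

def proves_decay_rate(L, Qs, alpha, tol=1e-9):
    """Return True if L satisfies the LMI system (11.4):
    L >= I and alpha*L + L Q_i + Q_i^T L <= 0 for every vertex matrix Q_i."""
    L = np.asarray(L, dtype=float)
    k = L.shape[0]
    # L >= I  <=>  the smallest eigenvalue of L - I is nonnegative
    if np.linalg.eigvalsh(L - np.eye(k)).min() < -tol:
        return False
    for Q in Qs:
        Q = np.asarray(Q, dtype=float)
        M = alpha * L + L @ Q + Q.T @ L  # symmetric because L is symmetric
        # M <= 0  <=>  the largest eigenvalue of M is nonpositive
        if np.linalg.eigvalsh(M).max() > tol:
            return False
    return True

# Two vertex matrices of a differential inclusion in R^2
Q1 = -np.eye(2)
Q2 = np.array([[-1.0, 1.0], [0.0, -1.0]])
print(proves_decay_rate(np.eye(2), [Q1, Q2], 1.0))  # True
print(proves_decay_rate(np.eye(2), [Q1, Q2], 1.5))  # False
```

With $L = I$ the vertex $Q_2$ gives $\alpha I + Q_2 + Q_2^T$ with eigenvalues $\alpha - 1$ and $\alpha - 3$, so this particular $L$ proves decay rates up to $\alpha = 1$ and no more.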
Feedback synthesis via quadratic Lyapunov function. Now let us pass from the differential inclusion (11.3) to a controlled plant

$$x'(t) \in Q(x(t), u(t)), \qquad (11.5)$$

where

$$Q(x, u) = \mathrm{Conv}\{Q_1 x + B_1 u, ..., Q_M x + B_M u\}$$

with $k \times k$ matrices $Q_i$ and $k \times l$ matrices $B_i$. Here $x \in R^k$ denotes the state of the system and $u \in R^l$ denotes the control. Our goal is to "close" the system by a linear time-invariant feedback

$$u(t) = K x(t),$$

$K$ being an $l \times k$ feedback matrix, in a way which ensures stability of the closed-loop system

$$x'(t) \in Q(x(t), K x(t)). \qquad (11.6)$$

Here again we can try to achieve our goal via quadratic Lyapunov function xT Lx. Namely,
if, for some given α > 0, we are able to find simultaneously a k × l matrix K and a positive
definite symmetric k × k matrix L in such a way that

d T
(x (t)Lx(t)) ≤ −αxT (t)Lx(t) (11.7)
dt

for all trajectories of (11.6), then we will get both the stabilizing feedback and a sertificate that
it indeed stabilizes the system.
As above, (11.7) and the initial requirement that $L$ be positive definite result in the system of matrix inequalities

$$[Q_i + B_i K]^T L + L [Q_i + B_i K] \le -\alpha L,\ i = 1, ..., M; \qquad L > 0; \qquad (11.8)$$

the unknowns in the system are both $L$ and $K$. The system is not linear in $(L, K)$; nevertheless, the LMI-based approach still works. Namely, let us perform the nonlinear substitution

$$(L, K) \mapsto (R = L^{-1},\ P = K L^{-1}) \qquad [L = R^{-1},\ K = P R^{-1}].$$
In the new variables the system becomes

$$Q_i^T R^{-1} + R^{-1} Q_i + R^{-1} P^T B_i^T R^{-1} + R^{-1} B_i P R^{-1} \le -\alpha R^{-1},\ i = 1, ..., M; \qquad R > 0,$$

or, which is the same (multiply by $R$ from the left and from the right),

$$R Q_i^T + Q_i R + P^T B_i^T + B_i P \le -\alpha R,\ i = 1, ..., M; \qquad R > 0.$$

Due to homogeneity with respect to $(R, P)$, we can reduce the latter system to

$$R Q_i^T + Q_i R + P^T B_i^T + B_i P \le -\alpha R,\ i = 1, ..., M; \qquad R \ge I,$$

which is a system of LMI's in the variables $R$, $P$, or, which is the same, a semidefinite program with trivial objective.
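The change of variables can be checked numerically: for any $R > 0$ and $P$, the $(R, P)$-matrix equals $R(\cdot)R$ applied to the $(L, K)$-matrix, so the two systems are feasible together. The sketch assumes NumPy; the example data (a double integrator with $k = 2$, $l = 1$, $M = 1$) and the values of $R$, $P$, $\alpha$ were picked by hand for illustration and do not come from the text. In practice $(R, P)$ would be produced by an SDP solver.

```python
import numpy as np

def lmi_RP(Q, B, R, P, alpha):
    """R Q^T + Q R + P^T B^T + B P + alpha R (must be <= 0)."""
    return R @ Q.T + Q @ R + P.T @ B.T + B @ P + alpha * R

def lmi_LK(Q, B, L, K, alpha):
    """[Q + B K]^T L + L [Q + B K] + alpha L (must be <= 0)."""
    A = Q + B @ K  # closed-loop matrix
    return A.T @ L + L @ A + alpha * L

def recover_LK(R, P):
    """Undo the substitution: L = R^{-1}, K = P R^{-1}."""
    L = np.linalg.inv(R)
    return L, P @ L

# Double integrator x1' = x2, x2' = u (one vertex, k = 2, l = 1)
Q = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[6.0, -2.0], [-2.0, 2.0]])  # R >= I
P = np.array([[-2.0, -2.0]])
alpha = 0.5

L, K = recover_LK(R, P)
M_rp = lmi_RP(Q, B, R, P, alpha)
M_lk = lmi_LK(Q, B, L, K, alpha)
print(K)                                   # [[-1. -2.]]
print(np.allclose(R @ M_lk @ R, M_rp))     # True: the two forms are congruent
print(np.linalg.eigvalsh(M_rp).max() < 0)  # True: the LMI holds strictly
```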
There are many other examples of semidefinite problems arising in Control (and in other areas like Structural Design), but I believe that the examples already given demonstrate that Semidefinite Programming possesses a wide variety of important applications.

11.3 Interior point methods for Semidefinite Programming


Semidefinite Programming is a nice field for interior point methods; in fact this family of problems, due to some intrinsic mathematical properties, is very similar to Linear Programming. Let us look at how interior point methods can be applied to a semidefinite program

$$\text{minimize } c^T x \ \text{ s.t. } \ x \in G = \{x \in R^n \mid A(x) \ge 0\}, \qquad (11.9)$$

$A(x)$ being an $m \times m$ symmetric matrix affinely depending on $x \in R^n$:

$$A(x) = A_0 + \sum_{i=1}^n x_i A_i.$$

It is reasonable to assume that $A(\cdot)$ possesses a certain structure, namely, that it is a block-diagonal matrix with a certain number $M$ of diagonal blocks of row sizes $m_1, ..., m_M$. Indeed, normally $A(\cdot)$ represents a system of LMI's rather than a single LMI, and when assembling a system of LMI's

$$A_i(x) \ge 0,\ i = 1, ..., M$$

into a single LMI

$$A(x) = \mathrm{Diag}\{A_1(x), ..., A_M(x)\} \ge 0,$$

we get a block-diagonal $A$. Note also that the "unstructured" case ($A(\cdot)$ has no nontrivial block-diagonal structure, as, e.g., in the problem associated with the Lovász capacity number) is also covered by our assumption (it corresponds to $M = 1$, $m_1 = m$).
Path-following approach is immediate:
Standard reformulation of the problem: the problem is in the standard form from the very beginning.
Barrier: by definition, the feasible set of the problem is the inverse image of the cone $S^\mu_+$ of all positive semidefinite symmetric $m \times m$ matrices belonging to the space $S^\mu$ of symmetric matrices with the block-diagonal structure

$$\mu = (m_1, ..., m_M)$$

($M$ diagonal blocks of sizes $m_1, ..., m_M$) under the mapping

$$x \mapsto A(x): R^n \to S^\mu.$$

Due to our standard combination rules, the function

$$\Phi(X) = -\ln \mathrm{Det}\, X : \mathrm{int}\, S^\mu_+ \to R$$

is an $m$-logarithmically homogeneous self-concordant barrier for the cone $S^\mu_+$; by construction, $G$ is the inverse image of the cone under the affine mapping

$$x \mapsto A(x),$$

so that the function

$$F(x) = \Phi(A(x))$$

is an $m$-self-concordant barrier for $G$.
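Since the determinant of a block-diagonal matrix is the product of the determinants of its blocks, $F(x) = \Phi(A(x))$ can be evaluated block by block, and a failed Cholesky factorization signals that $x \notin \mathrm{int}\, G$. A sketch assuming NumPy; the names `logdet_barrier` and `barrier_F` and the storage of each $A_i$ as a list of its diagonal blocks are illustrative choices.

```python
import numpy as np

def logdet_barrier(blocks):
    """Phi(X) = -ln Det X for a block-diagonal X given by its list of diagonal
    blocks; returns +inf when some block is not positive definite."""
    value = 0.0
    for Xb in blocks:
        try:
            C = np.linalg.cholesky(Xb)  # raises LinAlgError unless Xb > 0
        except np.linalg.LinAlgError:
            return np.inf
        # ln Det Xb = 2 * sum(ln diag(C)) for the Cholesky factor C of Xb
        value -= 2.0 * np.sum(np.log(np.diag(C)))
    return value

def barrier_F(x, A0_blocks, A_blocks):
    """F(x) = Phi(A(x)), A(x) = A_0 + sum_i x_i A_i; A_blocks[i][b] is block b of A_{i+1}."""
    X_blocks = [np.array(A0b, dtype=float) for A0b in A0_blocks]  # copies of A_0's blocks
    for xi, Ai in zip(x, A_blocks):
        for b, Aib in enumerate(Ai):
            X_blocks[b] += xi * Aib
    return logdet_barrier(X_blocks)

# m_1 = 2, m_2 = 1 (so m = 3) and n = 2 variables
A0 = [np.eye(2), np.eye(1)]
A1 = [np.eye(2), np.zeros((1, 1))]
A2 = [np.zeros((2, 2)), np.eye(1)]
print(barrier_F([1.0, 2.0], A0, [A1, A2]))   # equals -ln(4 * 3) = -ln 12
print(barrier_F([-2.0, 0.0], A0, [A1, A2]))  # inf: A(x) is not positive definite
```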

Structural assumption is satisfied simply by the origin of the barrier $F$: it comes from the $m$-logarithmically homogeneous self-concordant barrier $\Phi$ for $S^\mu_+$, and the latter barrier possesses the explicit Legendre transformation

$$\Phi^*(S) = \Phi(-S) - m.$$
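A numerical check of the formula $\Phi^*(S) = \Phi(-S) - m$, assuming the usual definition $\Phi^*(S) = \sup_{X > 0}\{\mathrm{Tr}(SX) - \Phi(X)\}$ (finite for negative definite $S$): the supremum is attained where $S + X^{-1} = 0$, i.e., at $X = -S^{-1}$, and the value there is $-m + \ln \mathrm{Det}(-S^{-1}) = \Phi(-S) - m$. A sketch assuming NumPy; $S$ is random, for illustration only.

```python
import numpy as np

def phi(X):
    """Phi(X) = -ln Det X for a symmetric positive definite X."""
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("X must be positive definite")
    return -logdet

def legendre_objective(S, X):
    """Tr(S X) - Phi(X): the function maximized over X > 0 in Phi*(S)."""
    return np.trace(S @ X) - phi(X)

rng = np.random.default_rng(0)
m = 3
G = rng.standard_normal((m, m))
S = -(G @ G.T + m * np.eye(m))   # a negative definite S

X_star = -np.linalg.inv(S)       # stationary point: S + X^{-1} = 0
closed_form = phi(-S) - m        # Phi*(S) = Phi(-S) - m
print(np.isclose(legendre_objective(S, X_star), closed_form))  # True

# The objective is concave in X, so nearby X > 0 give smaller values.
for _ in range(5):
    E = rng.standard_normal((m, m))
    Y = X_star + 0.05 * (E + E.T)
    if np.linalg.eigvalsh(Y).min() > 0:
        assert legendre_objective(S, Y) <= legendre_objective(S, X_star) + 1e-12
```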

Complexity. The only complexity characteristic which needs special investigation is the arithmetic cost $N$ of a Newton step. Let us look at what this step is computationally. First of all, a straightforward computation results in the following expressions for the derivatives of the barrier $\Phi$:

$$D\Phi(X)[H] = -\mathrm{Tr}\{X^{-1} H\}; \qquad D^2\Phi(X)[H, H] = \mathrm{Tr}\{X^{-1} H X^{-1} H\}.$$

Therefore the derivatives of the barrier $F(x) = \Phi(A(x))$ are given by the relations

$$\frac{\partial}{\partial x_i} F(x) = -\mathrm{Tr}\{A^{-1}(x) A_i\}$$

(recall that $A(x) = A_0 + \sum_{i=1}^n x_i A_i$),

$$\frac{\partial^2}{\partial x_i \partial x_j} F(x) = \mathrm{Tr}\{A^{-1}(x) A_i A^{-1}(x) A_j\}.$$
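The two formulas translate directly into code. The sketch below (assuming NumPy; `barrier_grad_hess` is an illustrative name, and dense matrices are used, ignoring the block structure) assembles $F'(x)$ and $F''(x)$ through the matrices $X^{-1} A_i X^{-1}$ and compares them with central finite differences on random data.

```python
import numpy as np

def A_of_x(x, A0, As):
    """A(x) = A_0 + sum_i x_i A_i."""
    return A0 + sum(xi * Ai for xi, Ai in zip(x, As))

def F(x, A0, As):
    """F(x) = -ln Det A(x), assuming A(x) is positive definite."""
    return -np.linalg.slogdet(A_of_x(x, A0, As))[1]

def barrier_grad_hess(x, A0, As):
    """dF/dx_i = -Tr(X^{-1} A_i),  d2F/dx_i dx_j = Tr(X^{-1} A_i X^{-1} A_j)."""
    Xinv = np.linalg.inv(A_of_x(x, A0, As))
    n = len(As)
    grad = np.array([-np.trace(Xinv @ Ai) for Ai in As])
    Ahat = [Xinv @ Ai @ Xinv for Ai in As]      # \hat A_i = X^{-1} A_i X^{-1}
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):                   # n(n+1)/2 traces, then mirror
            hess[i, j] = hess[j, i] = np.trace(Ahat[i] @ As[j])
    return grad, hess

rng = np.random.default_rng(1)
m, n = 4, 3
A0 = 10.0 * np.eye(m)
As = [(G + G.T) / 2 for G in rng.standard_normal((n, m, m))]
x = 0.1 * rng.standard_normal(n)

g, H = barrier_grad_hess(x, A0, As)
eps = 1e-5
g_fd = np.array([(F(x + eps * e, A0, As) - F(x - eps * e, A0, As)) / (2 * eps)
                 for e in np.eye(n)])
H_fd = np.array([(barrier_grad_hess(x + eps * e, A0, As)[0]
                  - barrier_grad_hess(x - eps * e, A0, As)[0]) / (2 * eps)
                 for e in np.eye(n)])
print(np.allclose(g, g_fd, atol=1e-6), np.allclose(H, H_fd, atol=1e-6))  # True True
```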

We see that in order to assemble the Newton system

$$F''(x) y = -tc - F'(x)$$

we should perform the following computations (the expressions in braces $\{\cdot\}$ give the arithmetic cost of the computation; for the sake of clarity, I omit absolute constant factors):
• given $x$, compute $X = A(x)$ $\{n \sum_{i=1}^M m_i^2$: one should multiply $n$ block-diagonal matrices $A_i$ by the $x_i$'s and add the sum of these matrices to the matrix $A_0\}$;
• given $X$, compute $X^{-1}$ $\{\sum_{i=1}^M m_i^3$; recall that $X$ is block-diagonal$\}$;
• given $X^{-1}$, compute the $n$ components $-\mathrm{Tr}\{X^{-1} A_i\}$ of the vector $F'(x)$ $\{n \sum_{i=1}^M m_i^2\}$;
• given $X^{-1}$, compute the $n$ matrices $\hat A_i = X^{-1} A_i X^{-1}$ $\{n \sum_{i=1}^M m_i^3\}$ and then compute the $n(n+1)/2$ quantities $F''(x)_{ij} = \mathrm{Tr}\{\hat A_i A_j\}$, $1 \le i \le j \le n$ $\{n^2 \sum_{i=1}^M m_i^2\}$.

The total arithmetic cost of assembling the Newton system is therefore

$$N_{ass} = O\Big(n^2 \sum_{i=1}^M m_i^2 + n \sum_{i=1}^M m_i^3\Big).$$

It takes $O(n^3)$ more operations to solve the Newton system after it is assembled. Note that we may assume that $A(\cdot)$ is an embedding: otherwise the feasible set $G$ of the problem contains lines, and the problem is unstable, since a small perturbation of the objective makes the problem unbounded below. Assuming from now on that $A(\cdot)$ is an embedding (as a byproduct, this assumption ensures nonsingularity of $F''(\cdot)$), we see that $n \le \sum_{i=1}^M m_i(m_i+1)/2$, simply because the latter quantity is the dimension of the space where the mapping $A(\cdot)$ takes its values. In particular, $n \le \sum_{i=1}^M m_i^2$, so that $n^3 \le n^2 \sum_{i=1}^M m_i^2$. Thus, here, as in the (dense) Linear Programming case, the cost of assembling the Newton system (which is at least $O(n^2 \sum_{i=1}^M m_i^2)$) dominates the cost $O(n^3)$ of solving the system, and we come to $N = O(N_{ass})$. Thus, the complexity characteristics of the path-following method for solving semidefinite programs are

$$\vartheta = m = \sum_{i=1}^M m_i; \qquad N = O\Big(n^2 \sum_{i=1}^M m_i^2 + n \sum_{i=1}^M m_i^3\Big); \qquad C = N\sqrt{m}. \qquad (11.10)$$

Potential reduction approach is also immediate. Conic reformulation of the problem is given by

$$\text{minimize } \mathrm{Tr}\{f y\} \ \text{ s.t. } \ y = A(x) \in S^\mu_+, \qquad (11.11)$$

where $f \in S^\mu$ "represents the objective $x^T c$ in terms of $y = \sum_{i=1}^n x_i A_i$", i.e., is such that

$$\mathrm{Tr}\{f A_i\} = c_i,\ i = 1, ..., n.$$
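The matrix $f$ is not unique: any $f$ satisfying the $n$ trace conditions will do. One convenient choice, assuming the $A_i$ are linearly independent (which holds when $A(\cdot)$ is an embedding), is to look for $f$ in the span of the $A_i$, which reduces to an $n \times n$ Gram system. A sketch assuming NumPy; `objective_matrix` is an illustrative name and the data are random.

```python
import numpy as np

def objective_matrix(As, c):
    """Return f = sum_j lam_j A_j with Tr(f A_i) = c_i for all i.

    The coefficients solve the Gram system G lam = c, G_ij = Tr(A_i A_j),
    which is nonsingular when the A_i are linearly independent."""
    n = len(As)
    G = np.array([[np.trace(As[i] @ As[j]) for j in range(n)] for i in range(n)])
    lam = np.linalg.solve(G, np.asarray(c, dtype=float))
    return sum(lam_j * Aj for lam_j, Aj in zip(lam, As))

rng = np.random.default_rng(2)
As = [(M + M.T) / 2 for M in rng.standard_normal((4, 3, 3))]  # n = 4, m = 3
A0 = np.eye(3)
c = rng.standard_normal(4)
f = objective_matrix(As, c)

# Tr(f A(x)) = c^T x + Tr(f A_0): the two objectives differ by a constant.
x = rng.standard_normal(4)
Ax = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
print(np.isclose(np.trace(f @ Ax), c @ x + np.trace(f @ A0)))  # True
```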

The conic dual to (11.11) is, as is easily seen, the problem

$$\text{minimize } \mathrm{Tr}\{A_0 s\} \ \text{ s.t. } \ s \in S^\mu_+,\ \mathrm{Tr}\{A_i s\} = c_i,\ i = 1, ..., n. \qquad (11.12)$$

Logarithmically homogeneous self-concordant barrier: we already know that $S^\mu_+$ admits the explicit $m$-logarithmically homogeneous self-concordant barrier $\Phi(X) = -\ln \mathrm{Det}\, X$ with the explicit Legendre transformation $\Phi^*(S) = \Phi(-S) - m$; thus, we have no conceptual difficulties with applying the method of Karmarkar or the primal-dual method.
Complexity: it is easily seen that the complexity characteristics of the primal-dual method associated with the indicated barrier are given by (11.10); the characteristic $C$ for the method of Karmarkar is $O(\sqrt{m})$ times worse than that given by (11.10).
Comments. One should take into account that in the case of Semidefinite Programming, as in the Linear Programming case, the complexity characteristics (11.10) give a very poor impression of the actual performance of the algorithms. The first source of this phenomenon is that "real-world" semidefinite programs normally possess additional structure which was ignored in our evaluation of the arithmetic cost of a Newton step; e.g., for the Lyapunov Stability problem (11.4) we have $m_i = k$, $i = 1, ..., M$, $k$ being the dimension of the state space of the system, and $n = O(k^2)$ (the number of design variables equals the number of free entries in a $k \times k$ symmetric matrix $L$). Our general considerations result in

$$N = O(k^6 M)$$

(see (11.10)) and in the qualitative conclusion that the cost of a step is dominated by the cost of assembling the Newton system. It turns out, however, that the structure of our LMI's allows us to reduce $N_{ass}$ to $O(k^4 M)$, which results in $N = O(k^6 + k^4 M)$; in particular, if $M \ll k^2$, then the cost of assembling the Newton system is negligible compared to the cost of solving the system.
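Substituting the Lyapunov data into (11.10) makes both estimates explicit. A worked version of the arithmetic, with $n = k(k+1)/2$ design variables and blocks of size $m_i = k$:

```latex
n = \tfrac{k(k+1)}{2} = O(k^2),\qquad \sum_{i=1}^M m_i^2 = Mk^2,\qquad \sum_{i=1}^M m_i^3 = Mk^3,\\[2pt]
N_{ass} = O\big(n^2 \cdot Mk^2 + n \cdot Mk^3\big) = O\big(Mk^6 + Mk^5\big) = O(k^6 M),\\[2pt]
\text{structured assembly: } N = O\big(\underbrace{k^4 M}_{N_{ass}} + \underbrace{n^3}_{O(k^6)}\big),
\qquad \frac{k^4 M}{k^6} = \frac{M}{k^2} \ll 1 \ \text{ if } M \ll k^2.
```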
Further, numerical experiments demonstrate that the Newton complexity of finding an $\varepsilon$-solution of a semidefinite program by a long-step path-following or a potential reduction interior point method normally is significantly less than its theoretical $O(\sqrt{m})$ upper bound; in practice the number of Newton steps looks like a moderate constant (something like 30-60). Thus, Semidefinite Programming is, basically, as computationally tractable as Linear Programming.

11.4 Exercises on Semidefinite Programming


The goal of the exercises below is to demonstrate additional possibilities to represent convex problems via semidefinite restrictions. Let us start with a useful definition: let $G$ be a closed convex domain in $R^n$. We call $G$ semidefinite representable (SDR) if there exists an affine mapping

$$A_G(x, u): R^n_x \times R^l_u \to S^k$$

taking values in the space $S^k$ of symmetric matrices of a certain row size $k$ such that the image of $A_G$ intersects the interior of the cone $S^k_+$ of positive semidefinite symmetric $k \times k$ matrices and

$$G = \{x \mid \exists u: A_G(x, u) \in S^k_+\}.$$

The above $A_G$ is called a semidefinite representation of $G$.


Example: the mapping

$$A(x, u) = \begin{pmatrix} u_3 - x_5 & & & & & & \\ & u_2 & u_3 & & & & \\ & u_3 & u_1 & & & & \\ & & & x_4 & u_2 & & \\ & & & u_2 & x_3 & & \\ & & & & & x_2 & u_1 \\ & & & & & u_1 & x_1 \end{pmatrix} : R^5_x \times R^3_u \to S^7$$

(blank space corresponds to zero entries) represents the hypograph

$$G = \{x \in R^5 \mid x_1, x_2, x_3, x_4 \ge 0,\ x_5 \le [x_1 x_2 x_3 x_4]^{1/4}\}$$

of the geometric mean of four variables $x_1, ..., x_4$.


Indeed, positive semidefiniteness of $A(x, u)$ says that the north-western entry $u_3 - x_5$ is nonnegative, i.e.,

$$x_5 \le u_3,$$

and that the remaining $2 \times 2$ diagonal blocks of $A$ are positive semidefinite symmetric matrices, i.e., that $x_1, ..., x_4, u_1, u_2$ are nonnegative and

$$u_1 \le \sqrt{x_1 x_2}, \quad u_2 \le \sqrt{x_3 x_4}, \quad u_3 \le \sqrt{u_1 u_2}.$$

It is clear that a given $x$ can be extended, by a certain $u$, to a collection satisfying the indicated inequalities if and only if $x_1, ..., x_4$ are nonnegative and $x_5 \le [x_1 \cdots x_4]^{1/4}$, i.e., if and only if $x \in G$.
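A numerical illustration of the example (a sketch assuming NumPy; the helper names and the test points are ours). For $x_1, ..., x_4 \ge 0$, the natural choice $u_1 = \sqrt{x_1 x_2}$, $u_2 = \sqrt{x_3 x_4}$, $u_3 = \sqrt{u_1 u_2}$ makes all $2 \times 2$ blocks positive semidefinite, so $A(x, u) \ge 0$ reduces to $x_5 \le u_3 = [x_1 x_2 x_3 x_4]^{1/4}$.

```python
import numpy as np

def geo_mean_sdr(x, u):
    """The 7x7 block-diagonal matrix A(x, u) of the example."""
    x1, x2, x3, x4, x5 = x
    u1, u2, u3 = u
    blocks = [np.array([[u3 - x5]]),
              np.array([[u2, u3], [u3, u1]]),
              np.array([[x4, u2], [u2, x3]]),
              np.array([[x2, u1], [u1, x1]])]
    A = np.zeros((7, 7))
    pos = 0
    for b in blocks:                 # place the blocks along the diagonal
        s = b.shape[0]
        A[pos:pos + s, pos:pos + s] = b
        pos += s
    return A

def natural_u(x):
    """u1 = sqrt(x1 x2), u2 = sqrt(x3 x4), u3 = sqrt(u1 u2) (needs x1..x4 >= 0)."""
    u1 = np.sqrt(x[0] * x[1])
    u2 = np.sqrt(x[2] * x[3])
    return u1, u2, np.sqrt(u1 * u2)

def is_psd(A, tol=1e-9):
    return np.linalg.eigvalsh(A).min() >= -tol

# (1 * 4 * 2 * 8)^(1/4) = 64^(1/4) = 2*sqrt(2), about 2.828
x_in = (1.0, 4.0, 2.0, 8.0, 2.8)
x_out = (1.0, 4.0, 2.0, 8.0, 2.9)
print(is_psd(geo_mean_sdr(x_in, natural_u(x_in))))    # True
# Fails for this u; the argument above shows that no other u can succeed either.
print(is_psd(geo_mean_sdr(x_out, natural_u(x_out))))  # False
```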
The relation of the introduced notion to Semidefinite Programming is clear from the following

Exercise 11.4.1 # Let $G$ be an SDR domain with semidefinite representation $A_G$. Prove that the convex program

$$\text{minimize } c^T x \ \text{ s.t. } \ x \in G$$

is equivalent to the semidefinite program

$$\text{minimize } c^T x \ \text{ s.t. } \ A_G(x, u) \ge 0.$$

SDR domains admit a kind of calculus:



Exercise 11.4.2 #. 1) Let $G^+ \subset R^n$ be SDR, and let $x = B(y)$ be an affine mapping from $R^l$ into $R^n$ with the image intersecting $\mathrm{int}\, G^+$. Prove that $G = B^{-1}(G^+)$ is SDR, and that a semidefinite representation of $G^+$ induces, in an explicit manner, a semidefinite representation of $G$.
2) Let $G = \cap_{i=1}^m G_i$ be a closed convex domain in $R^n$, and let all $G_i$ be SDR. Prove that $G$ also is SDR, and that semidefinite representations of $G_i$ induce, in an explicit manner, a semidefinite representation of $G$.
3) Let $G_i \subset R^{n_i}$ be SDR, $i = 1, ..., m$. Prove that the direct product $G = G_1 \times G_2 \times ... \times G_m$ is SDR, and that semidefinite representations of $G_i$ induce, in an explicit manner, a semidefinite representation of $G$.
The above exercises demonstrate that the possibilities to pose convex problems as semidefinite programs are limited only by our abilities to find semidefinite representations for the constraints involved in the problem. The family of convex sets which admit explicit semidefinite representations is surprisingly wide. Lecture 11 already gives us a number of examples which are summarized in the following
Exercise 11.4.3 # Verify that the sets below are SDR and point out their explicit semidefinite representations:
• half-space
• Lebesgue set $\{x \mid f(x) \le 0\}$ of a convex quadratic form such that $f(x) < 0$ for some $x$
• the second order cone $K^2 = \{(t, x) \in R \times R^n \mid t \ge |x|_2\}$
• the epigraph $\{(t, X) \in R \times S^k \mid t \ge \lambda_{\max}(X)\}$ of the maximal eigenvalue of a symmetric $k \times k$ matrix $X$
Now some more examples.
Exercise 11.4.4 Prove that

$$A(t, x) = \mathrm{Diag}\{t - x_1, t - x_2, ..., t - x_n\}$$

is an SDR for the epigraph

$$\{(t, x) \in R \times R^n \mid t \ge x_i,\ i = 1, ..., n\}$$

of the function $\max\{x_1, ..., x_n\}$.
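Exercise 11.4.5 Prove that

$$A(t, x, X) = \begin{pmatrix} t & x^T \\ x & X \end{pmatrix}$$

is an SDR for the epigraph

$$\mathrm{cl}\{(t, x, X) \in R \times R^n \times (\mathrm{int}\, S^n_+) \mid t \ge x^T X^{-1} x\}$$

of the fractional-quadratic function $x^T X^{-1} x$ of a vector $x$ and a symmetric positive semidefinite matrix $X$.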
Exercise 11.4.6 The above Example gives an SDR of the hypograph of the geometric mean $[x_1 \cdots x_4]^{1/4}$ of four nonnegative variables. Find an SDR for the hypograph of the geometric mean of $2^l$ nonnegative variables.
Exercise 11.4.7 Find a semidefinite representation of the epigraph

$$\{(t, x) \in R^2 \mid t \ge (x_+)^p\}, \qquad x_+ = \max[0, x],$$

of the power function for
1) $p = 1$; 2) $p = 2$; 3) $p = 3$; 4) arbitrary integer $p > 0$.

11.4.1 Sums of eigenvalues and singular values


For a symmetric $k \times k$ matrix $X$ let $\lambda_1(X) \ge \lambda_2(X) \ge ... \ge \lambda_k(X)$ be the eigenvalues of $X$ written down with their multiplicities in descending order. So far, all we know about convexity of eigenvalues is that the maximum eigenvalue $\lambda_1(X)$ is convex; we even know an SDR for this function (Exercise 11.4.3). The remaining eigenvalues $\lambda_i(X)$, $i \ge 2$, are simply nonconvex in $X$. Nevertheless, they possess a nice monotonicity property:

$$X, X' \in S^k,\ X \le X' \ \Rightarrow\ \lambda_i(X) \le \lambda_i(X'),\ i = 1, ..., k.$$

This is an immediate corollary of the Courant-Fischer characterization of eigenvalues⁴:

$$\lambda_i(X) = \max_{E \in \mathcal{E}_i} \min_{u \in E,\, |u| = 1} u^T X u,$$

$\mathcal{E}_i$ being the family of all linear subspaces in $R^k$ of dimension $i$.
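A quick numerical illustration (not a proof) of the monotonicity property, assuming NumPy: adding a positive semidefinite matrix to a symmetric $X$ can only increase each ordered eigenvalue. The random data are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 5
G = rng.standard_normal((k, k))
X = (G + G.T) / 2                 # an arbitrary symmetric X
D = rng.standard_normal((k, k))
X_prime = X + D @ D.T             # X <= X' since D D^T >= 0

# eigvalsh returns ascending order; reverse it to get lambda_1 >= ... >= lambda_k
lam = np.linalg.eigvalsh(X)[::-1]
lam_prime = np.linalg.eigvalsh(X_prime)[::-1]
print(np.all(lam <= lam_prime + 1e-12))  # True
```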


An important fact is that the functions

$$S_m(X) = \sum_{i=1}^m \lambda_i(X), \quad 1 \le m \le k,$$

are convex.

Exercise 11.4.8 + Prove that

$$A_m(t, X; \tau, U) = \begin{pmatrix} t - m\tau - \mathrm{Tr}\, U & 0 & 0 \\ 0 & \tau I + U - X & 0 \\ 0 & 0 & U \end{pmatrix}$$

($\tau$ is a scalar, $U$ is a symmetric $k \times k$ matrix) is an SDR for the epigraph

$$\{(t, X) \in R \times S^k \mid t \ge S_m(X)\};$$

in particular, $S_m(X)$ is a convex (since its epigraph is SDR and is therefore convex) monotone function.

For an arbitrary $k \times k$ matrix $X$ let $\sigma_i(X)$ be the singular values of $X$, i.e., the square roots of the eigenvalues of the matrix $X^T X$. In what follows we always use the descending order of singular values:

$$\sigma_1(X) \ge \sigma_2(X) \ge ... \ge \sigma_k(X).$$

Let also

$$\Sigma_m(X) = \sum_{i=1}^m \sigma_i(X).$$

The importance of singular values is seen from the following fundamental Singular Value Decomposition Theorem (which for non-symmetric matrices plays basically the same role as the theorem that a symmetric matrix is orthogonally equivalent to a diagonal matrix):
If $X$ is a $k \times k$ matrix with singular values $\sigma_1, ..., \sigma_k$, then there exists a pair of orthonormal bases $\{e_i\}$ and $\{f_i\}$ such that

$$X = \sum_{i=1}^k \sigma_i e_i f_i^T$$

(geometrically: the mapping $x \mapsto Xx$ takes the coordinates of $x$ in the basis $\{f_i\}$, multiplies them by the singular values and makes the results the coordinates of $Xx$ in the basis $\{e_i\}$).
In particular, the spectral norm of $X$ (the quantity $\max_{|x|_2 \le 1} |Xx|_2$) is nothing but the largest singular value $\sigma_1$ of $X$.
In the symmetric case we, of course, have $e_i = \pm f_i$ (plus corresponds to eigenvectors $f_i$ of $X$ with positive eigenvalues, minus to those with negative eigenvalues).

⁴ I strongly recommend that those who do not know this characterization pay attention to it; a good (and not difficult) exercise is to prove the characterization.
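The SVD statements can be checked with NumPy's `np.linalg.svd`, which returns $X = U\,\mathrm{Diag}(s)\,V^T$ with the singular values $s$ in descending order; the columns of $U$ play the role of the $e_i$ and the columns of $V$ (rows of $V^T$) the role of the $f_i$. A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(4)
k = 4
X = rng.standard_normal((k, k))

U, s, Vt = np.linalg.svd(X)       # X = U diag(s) Vt, s[0] >= s[1] >= ...
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

# singular values are the square roots of the eigenvalues of X^T X
eig_XtX = np.clip(np.linalg.eigvalsh(X.T @ X), 0.0, None)[::-1]

print(np.allclose(X, X_rebuilt))                 # X = sum_i sigma_i e_i f_i^T
print(np.isclose(np.linalg.norm(X, 2), s[0]))    # spectral norm = sigma_1
print(np.allclose(np.sqrt(eig_XtX), s))
```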
What we are about to do is to prove that the functions $\Sigma_m(X)$ are convex, and to find their SDR's. To this end we make the following important observation: let $A$ and $B$ be two $k \times k$ matrices. Then the sequences of eigenvalues (counted with their multiplicities) of the matrices $AB$ and $BA$ are equal (more exactly, become equal under appropriate reordering). The proof is immediate: we should prove that the characteristic polynomials $\mathrm{Det}(\lambda I - AB)$ and $\mathrm{Det}(\lambda I - BA)$ are equal to each other. By continuity, it suffices to establish this identity when $A$ is nondegenerate. But then it is evident:

$$\mathrm{Det}(\lambda I - AB) = \mathrm{Det}(A(\lambda I - BA)A^{-1}) = (\mathrm{Det}\, A)\, \mathrm{Det}(\lambda I - BA)\, (\mathrm{Det}\, A^{-1}) = \mathrm{Det}(\lambda I - BA).$$
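A numerical check of the claim, including a degenerate $A$ of the kind handled by the continuity argument (a sketch assuming NumPy; `np.poly` applied to a square matrix returns the coefficients of its characteristic polynomial $\mathrm{Det}(\lambda I - M)$, highest degree first):

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4
A = rng.standard_normal((k, k))
A[:, 0] = 0.0                      # A is deliberately singular
B = rng.standard_normal((k, k))

p_AB = np.poly(A @ B)              # coefficients of Det(lambda I - AB)
p_BA = np.poly(B @ A)              # coefficients of Det(lambda I - BA)
print(np.allclose(p_AB, p_BA))     # True
```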

Now we are sufficiently equipped to construct SDR's for sums of singular values.

Exercise 11.4.9 + Given a $k \times k$ matrix $X$, form the symmetric $2k \times 2k$ matrix

$$Y(X) = \begin{pmatrix} 0 & X \\ X^T & 0 \end{pmatrix}.$$

Prove that the eigenvalues of this matrix are as follows: the first $k$ of them are $\sigma_1(X), \sigma_2(X), ..., \sigma_k(X)$, and the remaining $k$ are $-\sigma_k(X), -\sigma_{k-1}(X), ..., -\sigma_1(X)$. Derive from this observation that

$$\Sigma_m(X) = S_m(Y(X))$$

and use the SDR's for $S_m(\cdot)$ given by Exercise 11.4.8 to get SDR's for $\Sigma_m(X)$.

The results stated in the exercises of this subsection play the central role in constructing semidefinite representations for the epigraphs of functions of eigenvalues/singular values of symmetric/arbitrary matrices.
Hints to Exercises

Hints to Section 2.3


Exercise 2.3.7: apply (P) to the scalar symmetric forms $u^T A[h_1, ..., h_k]$, $u$ being a vector with

$$\|u\|_* \equiv \sup\{u^T v \mid v \in R^k,\ \|v\| \le 1\} \le 1.$$

Hints to Section 3.3


Exercise 3.3.2+:
1): the function

$$F(x) = -\sum_{j=1}^m \ln(-f_j(x)) \equiv \sum_{j=1}^m F_j(x)$$

is a self-concordant barrier for $G$ (Exercise 3.3.1). Since $G$ is bounded, $F$ attains its minimum on $\mathrm{int}\, G$ at a certain point $x^*$ (V., Lecture 3). Choosing appropriate coordinates in $R^n$, we may assume that $F''(x^*)$ is the unit matrix. Now let $j^*$ be the index of the one of the matrices $F_j''(x^*)$ which has the minimal trace; eliminate the $j^*$-th inequality and look at the Newton decrement of the self-concordant function $\sum_{j \ne j^*} F_j(x)$ at $x^*$.
2): we clearly can eliminate from the list of the sets $G_\alpha$ all elements which coincide with the whole space, without violating boundedness of the intersection. Now, every closed convex set which differs from the whole space is an intersection of closed half-spaces, and these half-spaces can be chosen in such a way that their interiors have the same intersection as the half-spaces themselves. Representing all $G_\alpha$ as intersections of the above type, we see that the statement in question clearly can be reduced to a similar statement with all $G_\alpha$ being closed half-spaces such that the intersection of the interiors of these half-spaces is the same as the intersection of the half-spaces themselves. Prove that if $\cap_{\alpha \in I} G_\alpha$ is bounded and nonempty, then there exists a finite $I' \subset I$ such that $\cap_{\alpha \in I'} G_\alpha$ also is bounded (and, of course, nonempty); after this is proved, apply 1).
Exercise 3.3.5: this is an immediate consequence of II., Lecture 3.
Exercise 3.3.6: without loss of generality we may assume that $\Delta = (a, 0)$ with some $a < 0$. Choose an arbitrary $x \in \Delta$ and look at the conclusions of II., III., Lecture 3, when $y \to -0$.
To complete the proof of (P), note that if $G$ differs from $R^n$, then the intersection of $G$ with a certain line is a segment $\Delta$ with a nonempty interior which is a proper part of the line, and choose as $f$ the restriction of $F$ onto $\Delta$ (this restriction is a $\vartheta$-self-concordant barrier for $\Delta$ in view of Proposition 3.1.1.(i)).
Exercise 3.3.7: note that the standard basis orths $e_i$ are recessive directions of $G$ (see Corollary 3.2.1) and therefore, according to the Corollary,

$$-DF(x)[e_i] \ge \{D^2 F(x)[e_i, e_i]\}^{1/2}. \qquad (12.13)$$

To prove (3.17), combine (12.13) and the fact that $D^2 F(x)[e_i, e_i] \ge x_i^{-2}$, $1 \le i \le m$ (since $x - x_i e_i \notin \mathrm{int}\, G$, while the open unit Dikin ellipsoid of $F$ centered at $x$ is contained in $\mathrm{int}\, G$ (I., Lecture 2)).
To derive from (3.17) the lower bound $\vartheta \ge m$, note that, in view of II., Lecture 3, it should be

$$\vartheta \ge DF(x)[0 - x],$$

while (3.17) says that the latter quantity is at least $m$.
Exercise 3.3.9: as was already explained, we can reduce the situation to the case of

$$G \cap U = \{x \in U \mid x_i \ge h_i(x),\ i = 1, ..., m\},$$

where $h_i(0) = 0$, $h_i'(0) = 0$. It follows that the interval

$$x(r) = r \sum_{i=1}^m e_i, \quad 0 < r < r_0,$$

associated with a certain $r_0 > 0$, belongs to $G$; here $e_i$ are the standard basis orths. Now, let $\Delta_i(r)$ be the set of those $t$ for which the vector $x(r) - (t + r) e_i$ belongs to $G$. Prove that $\Delta_i(r)$ is a segment of the type $[-a_i(r), b_i(r)]$ which contains $-r$ in its interior (the value $t = -r$ corresponds to the point $x(r)$ itself), and that $b_i(r)/r \to 0$, $a_i(r)/r \to \infty$ as $r \to +0$. Derive from these observations and the statement of Exercise 3.3.8 that $-DF(x(r))[e_i]\, r \ge 1 - \alpha(r)$, $i = 1, ..., m$, with a certain $\alpha(r) \to 0$, $r \to +0$. To complete the proof of (Q), apply the Semiboundedness inequality I., Lecture 3, to $x = x(r)$ and $y = 0$.
Hints to Section 7.6
Exercise 7.6.7: (Pr') could be used, but not when we intend to solve it by the primal-dual method. Indeed, it is immediately seen that if (7.44) is solvable, i.e., in the case we actually are interested in, the objective in (Pr') is unbounded below, so that the problem dual to (Pr') is infeasible (why?). Thus, we simply would be unable to start the method!
Hints to Section 8.5
Exercise 8.5.1: we could, of course, assume that the Legendre transformation $F^*$ of $F$ is known; but it would be less restrictive to assume instead that the solution to the problem is given in advance. Indeed, knowledge of $F^*$ means, in particular, the ability to solve "in one step" any equation of the type $F'(x) = d$ (the solution is given by $x = (F^*)'(d)$); thus, setting $x = (F^*)'(-10^{20} c)$, we could get, in one step, the point of the path $x^*(\cdot)$ associated with $t = 10^{20}$.
Exercise 8.5.3: to get (8.34), prove by induction that

$$D^j \Phi(v)[h, ..., h] = (-1)^j (j-1)!\, \mathrm{Tr}\{[v^{-1} h]^j\}$$

(use the rule $\frac{d}{dt}\big|_{t=0} (v + th)^{-1} = -v^{-1} h v^{-1}$). To derive (8.35) from (8.34), pass to the eigenbasis of $\hat h$.
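The formula can be sanity-checked against central finite differences of $t \mapsto \Phi(v + th) = -\ln \mathrm{Det}(v + th)$ for $j = 1, 2, 3$. A sketch assuming NumPy; the random data, the step size and the tolerances are illustrative choices, and this is a numerical check, not the requested inductive proof.

```python
import numpy as np
from math import factorial

def Dj_formula(v, h, j):
    """(-1)^j (j-1)! Tr{(v^{-1} h)^j}."""
    M = np.linalg.solve(v, h)                       # v^{-1} h
    return (-1) ** j * factorial(j - 1) * np.trace(np.linalg.matrix_power(M, j))

def phi(v, h, t):
    """Phi(v + t h) = -ln Det(v + t h)."""
    return -np.linalg.slogdet(v + t * h)[1]

rng = np.random.default_rng(6)
m = 4
G = rng.standard_normal((m, m))
v = G @ G.T + m * np.eye(m)                         # v > 0
H = rng.standard_normal((m, m))
h = (H + H.T) / 2                                   # symmetric direction

d = 1e-3                                            # finite-difference step
p = {s: phi(v, h, s * d) for s in (-2, -1, 0, 1, 2)}
fd = [(p[1] - p[-1]) / (2 * d),                                # first derivative
      (p[1] - 2 * p[0] + p[-1]) / d ** 2,                      # second derivative
      (p[2] - 2 * p[1] + 2 * p[-1] - p[-2]) / (2 * d ** 3)]    # third derivative
exact = [Dj_formula(v, h, j) for j in (1, 2, 3)]
print(np.allclose(fd, exact, rtol=1e-3, atol=1e-4))  # True
```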
Exercise 8.5.5: combine the result of Exercise 5.4.4, the statement "symmetric" to this result, and the result of Exercise 8.5.2.
Hints to Section 10.5
Exercise 10.4: prove+ that the mapping

$$A(t, X) = t + \ln \mathrm{Det}\, X : R \times \mathrm{int}\, S^n_+ \to R$$

is $\frac{2}{3}$-appropriate for the domain $G^+ = R_+$ and apply Superposition rule (N) from Lecture 9.

Exercise 10.5.7+: for a vector $v$ let the set $L_v$ on the axis be defined as

$$L_v = \{\lambda \ge 0 \mid v^T R v \le \lambda v^T S v\}.$$

This is a closed convex set, and the premise of the statement we are proving says that the set is nonempty for every $v$; the statement we should prove is that all these sets have a point in common. Of course, the proof should use the Helly Theorem; according to this theorem, all we should prove is that
(a) $L_v \cap L_{v'} \ne \emptyset$ for any pair $v, v'$;
(b) $L_v$ is bounded for some $v$.
Solutions to Exercises

Solutions to Section 2.3


Exercise 2.3.3: let $\mathcal{A}$ be the set of all multiindices $\alpha = (\alpha_1, ..., \alpha_k)$ with nonnegative integer entries $\alpha_i$ and the sum of entries equal to $k$, let $S_k$ be the number of elements in $\mathcal{A}$, and for $\alpha \in \mathcal{A}$ let

$$A_\alpha[h_1, ..., h_k] = A[\underbrace{h_1, ..., h_1}_{\alpha_1 \text{ times}}, \underbrace{h_2, ..., h_2}_{\alpha_2 \text{ times}}, ..., \underbrace{h_k, ..., h_k}_{\alpha_k \text{ times}}].$$

For a $k$-dimensional vector $r = (r_1, ..., r_k)$ we have, identically in $h_1, ..., h_k \in R^n$:

$$A\Big[\sum_{i=1}^k r_i h_i, \sum_{i=1}^k r_i h_i, ..., \sum_{i=1}^k r_i h_i\Big] = \sum_{\alpha \in \mathcal{A}} \omega_\alpha(r) A_\alpha[h_1, ..., h_k] \qquad (13.14)$$

(open the parentheses and take into account the symmetry of $A$), with $\omega_\alpha(r)$ being certain polynomials of $r$.
What we are asked to do is to find a certain number $m$ of vectors $r^1, r^2, ..., r^m$ and certain weights $w_1, ..., w_m$ in such a way that when substituting $r = r^l$ into (13.14) and taking the sum of the resulting identities with the weights $w_1, ..., w_m$, we get in the right hand side the only term $A[h_1, ..., h_k] \equiv A_{(1,...,1)}[h_1, ..., h_k]$, with the unit coefficient; then the resulting identity will be the required representation of $A[h_1, ..., h_k]$ as a linear combination of the values of the restriction of $A[\cdot]$ onto the diagonal.
Our reformulated problem is to choose $m$ vectors from the family

$$\mathcal{F} = \{\hat\omega(r) = (\omega_\alpha(r) \mid \alpha \in \mathcal{A})\}_{r \in R^k}$$

of $S_k$-dimensional vectors in such a way that a certain given $S_k$-dimensional vector (unit at a certain specified place, zeros at the remaining places) will be a linear combination of the selected vectors. This is certainly possible, with $m = S_k$, if the linear span of the vectors from $\mathcal{F}$ is the entire space $R^{S_k}$ of $S_k$-dimensional vectors; and we are about to prove that this is actually the case (this will complete the proof). Assume, on the contrary, that the linear span of $\mathcal{F}$ is a proper subspace of $R^{S_k}$. Then there exists a nonzero linear functional on the space which vanishes on $\mathcal{F}$, i.e., there exists a set of coefficients $\lambda_\alpha$, not all zero, such that

$$p(r) \equiv \sum_{\alpha \in \mathcal{A}} \lambda_\alpha \omega_\alpha(r) = 0$$

identically in $r \in R^k$. Now, it is immediately seen what $\omega_\alpha$ is:

$$\omega_\alpha(r) = \frac{k!}{\alpha_1! \alpha_2! \cdots \alpha_k!} r_1^{\alpha_1} r_2^{\alpha_2} \cdots r_k^{\alpha_k}.$$

It follows that the partial derivative $\frac{\partial^k p}{\partial r_1^{\alpha_1} \partial r_2^{\alpha_2} \cdots \partial r_k^{\alpha_k}}$ of $p(\cdot)$ is identically equal to $k!\,\lambda_\alpha$; if $p \equiv 0$, then all these derivatives, and, consequently, all $\lambda_\alpha$'s, are zero, which is the desired contradiction.
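For a concrete instance of the representation whose existence is proved above, one can take the classical polarization formula: the $2^k$ vectors $r = \varepsilon \in \{-1, 1\}^k$ with weights $\varepsilon_1 \cdots \varepsilon_k / (k!\, 2^k)$. (Expanding $A[\sum_i \varepsilon_i h_i, ..., \sum_i \varepsilon_i h_i]$ and summing over $\varepsilon$ with these signs, only the terms in which every $h_i$ appears exactly once survive, and there are $k!$ of them.) This choice is ours, not the text's; the sketch below, assuming NumPy, checks it for a random symmetric trilinear form.

```python
import itertools
from math import factorial
import numpy as np

def polarize(diag, hs):
    """Recover A[h_1, ..., h_k] from the diagonal restriction diag(v) = A[v, ..., v]."""
    k = len(hs)
    total = 0.0
    for eps in itertools.product((1, -1), repeat=k):
        v = sum(e * h for e, h in zip(eps, hs))
        total += np.prod(eps) * diag(v)
    return total / (factorial(k) * 2 ** k)

rng = np.random.default_rng(7)
n = 3
T = rng.standard_normal((n, n, n))
# symmetrize over all permutations of the three indices
T = sum(np.transpose(T, p) for p in itertools.permutations(range(3))) / 6

def trilinear(a, b, c):
    return np.einsum('ijk,i,j,k->', T, a, b, c)

h1, h2, h3 = rng.standard_normal((3, n))
lhs = trilinear(h1, h2, h3)
rhs = polarize(lambda v: trilinear(v, v, v), [h1, h2, h3])
print(np.isclose(lhs, rhs))  # True
```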

Exercise 2.3.5: first of all, $e_1$ and $e_2$ are linearly independent since $T_1 \ne T_2$; therefore $h \ne 0$, $q \ne 0$. Let $(Qx, y) = A[x, y, e_3, ..., e_l]$; then $Q$ is a symmetric matrix.
Since $\{T_1, ..., T_l\}$ is an extremal, we have

$$\omega = |(Qe_1, e_2)| = \max\{|(Qu, v)| \mid \|u\|, \|v\| \le 1\}.$$

Therefore, if $E^+ = \{x \in R^n \mid Qx = \omega x\}$, $E^- = \{x \in R^n \mid Qx = -\omega x\}$ and $E = (E^+ + E^-)^\perp$, then at least one of the subspaces $E^+$, $E^-$ is nonzero, and $\|Qx\| \le \omega' \|x\|$, $x \in E$, where $\omega' < \omega$. $R^n$ is the direct sum of $E^+$, $E^-$ and $E$. Let $x = x^+ + x^- + x^0$ be the decomposition of $x \in R^n$ corresponding to the decomposition $R^n = E^+ + E^- + E$. Since each of the subspaces $E^+$, $E^-$ and $E$ is invariant for $Q$,

$$\begin{aligned}
\omega = |(Qe_1, e_2)| &\le |\omega(e_1^+, e_2^+) - \omega(e_1^-, e_2^-)| + \omega' \|e_1^0\| \|e_2^0\| \\
&\le \omega(\|e_1^+\| \|e_2^+\| + \|e_1^-\| \|e_2^-\|) + \omega' \|e_1^0\| \|e_2^0\| \\
&\le \omega\{\|e_1^+\|^2 + \|e_1^-\|^2\}^{1/2} \{\|e_2^+\|^2 + \|e_2^-\|^2\}^{1/2} + \omega' \|e_1^0\| \|e_2^0\| \le \omega
\end{aligned}$$

(we have taken into account that $\|e_i^+\|^2 + \|e_i^-\|^2 + \|e_i^0\|^2 = 1$, $i = 1, 2$). We see that all the inequalities in the above chain are equalities. Therefore we have

$$\|e_1^0\| = \|e_2^0\| = 0; \quad \|e_1^+\| = \|e_2^+\|; \quad \|e_1^-\| = \|e_2^-\|;$$

moreover, $|(e_1^+, e_2^+)| = \|e_1^+\| \|e_2^+\|$ and $|(e_1^-, e_2^-)| = \|e_1^-\| \|e_2^-\|$, which means that $e_1^+ = \pm e_2^+$ and $e_1^- = \pm e_2^-$. Since $e_1$ and $e_2$ are linearly independent, only two cases are possible:
(a) $e_1^+ = e_2^+ \ne 0$, $e_1^- = -e_2^- \ne 0$, $e_1^0 = e_2^0 = 0$;
(b) $e_1^+ = -e_2^+ \ne 0$, $e_1^- = e_2^- \ne 0$, $e_1^0 = e_2^0 = 0$.
In case (a) $h$ is proportional to $e_1^+$ and $q$ is proportional to $e_1^-$; therefore

$$\{Rh, Rh, T_3, ..., T_l\} \in \mathcal{T}$$

and

$$\{Rq, Rq, T_3, ..., T_l\} \in \mathcal{T}.$$

The same arguments can be used in case (b).
Exercise 2.3.6: let $e \in T$ and $f \in S$ be unit vectors with the angle between them equal to $\alpha(T)$. Without loss of generality we can assume that $t \le s$ (note that reordering of an extremal leads to an extremal, since $A$ is symmetric). By virtue of Exercise 2.3.5, in the case of $\alpha(T) \ne 0$ the collection

$$T' = \{\underbrace{R(e + f), ..., R(e + f)}_{2t \text{ times}}, \underbrace{S, ..., S}_{s - t \text{ times}}\}$$

belongs to $\mathcal{T}_*$ and clearly $\alpha(T') = \alpha(T)/2$. Thus, either $\mathcal{T}_*$ contains an extremal $T$ with $\alpha(T) = 0$, or we can find a sequence $\{T_i \in \mathcal{T}_*\}$ with $\alpha(T_i) \to 0$. In the latter case the sequence $\{T_i\}$ contains a subsequence converging (in the natural sense) to a certain collection $T$, which clearly belongs to $\mathcal{T}_*$, and $\alpha(T) = 0$. Thus, $\mathcal{T}$ contains an extremal $T$ with $\alpha(T) = 0$, or, which is the same, an extremal of the type $\{T, ..., T\}$.
Solutions to Section 3.3

Exercise 3.3.1: $F$ clearly is $C^3$ smooth on $Q = \mathrm{int}\, G$ and possesses the barrier property, i.e., tends to $\infty$ along every sequence of interior points of $G$ converging to a boundary point. Let $x \in Q$ and $h \in R^n$. We have

$$F(x) = -\ln(-f(x)); \qquad DF(x)[h] = -\frac{Df(x)[h]}{f(x)};$$

$$D^2F(x)[h, h] = \frac{[Df(x)[h]]^2}{f^2(x)} - \frac{D^2 f(x)[h, h]}{f(x)} = [DF(x)[h]]^2 + \frac{D^2 f(x)[h, h]}{|f(x)|};$$

$$D^3F(x)[h, h, h] = -2\frac{[Df(x)[h]]^3}{f^3(x)} + 3\frac{Df(x)[h]\, D^2 f(x)[h, h]}{f^2(x)}$$

(recall that $f$ is quadratic, so that $D^3 f \equiv 0$). Since $f$ is convex, we immediately conclude that

$$D^2F(x)[h, h] = r^2 + s^2, \qquad r = \sqrt{\frac{D^2 f(x)[h, h]}{|f(x)|}}, \quad s = \frac{|Df(x)[h]|}{|f(x)|},$$

$$|DF(x)[h]| = s \le \sqrt{D^2F(x)[h, h]}$$

and

$$|D^3F(x)[h, h, h]| \le 2s^3 + 3sr^2 \le 2(s^2 + r^2)^{3/2}$$

(verify the concluding inequality yourself!). The resulting bound on $D^3F$ demonstrates that $F$ is self-concordant, and the bound on $DF$ shows that $\lambda(F, \cdot) \le 1$, so that $F$ is a 1-self-concordant barrier for $G$.
The concluding statement of the exercise in question follows from the already proved one and Proposition 3.1.1.
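One way to verify the inequality $2s^3 + 3sr^2 \le 2(s^2 + r^2)^{3/2}$ for $s, r \ge 0$ used in the solution of Exercise 3.3.1: both sides are homogeneous of degree 3 in $(s, r)$, and the case $s = r = 0$ is trivial, so it suffices to normalize. A short derivation:

```latex
s^2 + r^2 = 1,\ s \in [0, 1]:\qquad
2s^3 + 3sr^2 = 2s^3 + 3s(1 - s^2) = 3s - s^3 =: g(s),\\
g'(s) = 3(1 - s^2) \ge 0 \ \text{ on } [0, 1]
\ \Longrightarrow\ g(s) \le g(1) = 2 = 2(s^2 + r^2)^{3/2}.
```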
Exercise 3.3.2:
1): according to Exercise 3.3.1, $F$ is a self-concordant barrier for $G$; since $G$ is bounded, $F$ is nondegenerate (II., Lecture 2) and attains its minimum at a certain point $x^*$ (V., Lecture 3). Choosing appropriate coordinates in $R^n$, we may assume that $F''(x^*) = I$, $I$ being the unit matrix. Now let $F_j(x) = -\ln(-f_j(x))$, $Q_j = F_j''(x^*)$, so that $F = \sum_j F_j$ and $I = \sum_j Q_j$. We have $\sum_{j=1}^m \mathrm{Tr}\, Q_j = \mathrm{Tr}\, I = n$, so that $\mathrm{Tr}\, Q_{j^*} \le n/m$, $j^*$ being the index of the $Q_j$ with the smallest trace. To simplify notation, in what follows we assume that $j^* = 1$. An immediate computation implies that

$$Q_1 = g g^T + H, \qquad g = F_1'(x^*), \qquad H = \frac{f_1''(x^*)}{|f_1(x^*)|};$$

it is seen that $H \ge 0$, so that $\frac{n}{m} \ge \mathrm{Tr}\, Q_1 \ge \mathrm{Tr}\{g g^T\} = |g|_2^2$.
Now let us compute the Newton decrement of the function

$$\Phi(x) = \sum_{j=2}^m F_j(x)$$

at the point $x^*$. Since the gradient of $F$ at this point is 0, the gradient of $\Phi$ is $-g$; since the Hessian of $F$ at $x^*$ is $I$, the Hessian of $\Phi$ is $I - Q_1 \ge (1 - \frac{n}{m}) I$ (the latter inequality immediately follows from the fact that $Q_1 \ge 0$ and $\mathrm{Tr}\, Q_1 \le \frac{n}{m}$). We see that

$$\lambda^2(\Phi, x^*) = [\Phi'(x^*)]^T [\Phi''(x^*)]^{-1} \Phi'(x^*) = g^T [\Phi''(x^*)]^{-1} g \le |g|_2^2 \Big(1 - \frac{n}{m}\Big)^{-1} \le \frac{n}{m - n} < 1$$

(we have used the already proved estimate $|g|_2^2 \le \frac{n}{m}$ and the fact that $m > 2n$). Thus, the Newton decrement of a nondegenerate (in view of $\Phi''(x^*) > 0$) self-concordant barrier (in view of Exercise 3.3.1) $\Phi$ for the convex domain $G^+ = \{x \in R^n \mid f_j(x) \le 0,\ j = 2, ..., m\}$ is $< 1$; therefore $\Phi$ attains its minimum on $\mathrm{int}\, G^+$ (VII., Lecture 2). Since $\Phi$ is a nondegenerate self-concordant barrier for $G^+$, the latter is possible only when $G^+$ is bounded (V., Lecture 3).

2): as explained in the Hints, we can reduce the situation to the one with $G_\alpha$ being closed half-spaces such that the intersection of the interiors of these half-spaces coincides with the intersection of the half-spaces themselves; in particular, the intersection of any finite subfamily of the half-spaces $G_\alpha$ possesses a nonempty interior. Let us first prove that there exists a finite $I' \subset I$ such that $\cap_{\alpha \in I'} G_\alpha$ is bounded. Without loss of generality we may assume that $0 \in G_\alpha$, $\alpha \in I$ (since the intersection of all $G_\alpha$ is nonempty). Assume that for every finite subset $I'$ of $I$ the intersection $G^{I'} = \cap_{\alpha \in I'} G_\alpha$ is unbounded. Then for every $R > 0$ and every $I'$ the set $G^{I'}_R = \{x \in G^{I'} \mid |x|_2 = R\}$ is a nonempty compact set. For a fixed $R$ these compact sets have the finite intersection property ($G^{I' \cup I''}_R \subset G^{I'}_R \cap G^{I''}_R$), and therefore their intersection is nonempty, which means that $\cap_{\alpha \in I} G_\alpha$ contains, for every $R > 0$, a vector of norm $R$ and is therefore an unbounded set, which is not the case.
Thus, we can reduce the situation to a similar one for a finite family of closed half-spaces $G_\alpha$ with the intersection of the interiors being bounded and nonempty; for this case the required statement is given by 1).
Remark 13.0.1 I do not think that the above proof of item 1) of Exercise 3.3.2 is the simplest
one; please try to find a better proof.
Exercise 3.3.3: it is clear that $F$ is $C^3$ smooth on the interior of $\mathbf{S}^m_+$ and possesses the barrier property, i.e., tends to $\infty$ along every sequence of interior points of the cone converging to a boundary point of it. Now, let $x$ be an interior point of $\mathbf{S}^m_+$ and $h$ be an arbitrary direction in the space $\mathbf{S}^m$ of symmetric $m \times m$ matrices, which is the embedding space of the cone. We have

$$F(x) = -\ln\operatorname{Det} x;$$
$$DF(x)[h] = \frac{\partial}{\partial t}\Big|_{t=0}\left[-\ln\operatorname{Det}(x + th)\right] = \frac{\partial}{\partial t}\Big|_{t=0}\left[-\ln\operatorname{Det} x - \ln\operatorname{Det}(I + tx^{-1}h)\right] = -\frac{\frac{\partial}{\partial t}\big|_{t=0}\operatorname{Det}(I + tx^{-1}h)}{\operatorname{Det}(I)} = -\operatorname{Tr}(x^{-1}h)$$
(to understand the concluding step, look at the matrix $I + tx^{-1}h$: its diagonal entries are $1 + t[x^{-1}h]_{ii}$, and the entries outside the diagonal are of order $t$. Representing the determinant as the sum of products, we obtain $m!$ terms, one of them being $\prod_i(1 + t[x^{-1}h]_{ii})$ and the remaining ones being of the type $t^kp$ with $k \ge 2$ and $p$ independent of $t$. These latter terms do not contribute to the derivative with respect to $t$ at $t = 0$, and the contribution of the "diagonal" term is exactly $\sum_i [x^{-1}h]_{ii} = \operatorname{Tr}(x^{-1}h)$).
Thus,
$$DF(x)[h] = -\operatorname{Tr}(x^{-1}h),$$
whence
$$D^2F(x)[h, h] = \operatorname{Tr}(x^{-1}hx^{-1}h)$$
(we have already met the relation $DB(x)[h] = -B(x)hB(x)$, $B(x) \equiv x^{-1}$; to prove it, differentiate the identity $B(x)x \equiv I$).
Differentiating the expression for $D^2F$, we come to
$$D^3F(x)[h, h, h] = -2\operatorname{Tr}(x^{-1}hx^{-1}hx^{-1}h)$$
(we again have used the rule for differentiating the mapping $x \mapsto x^{-1}$). Now, $x$ is a positive definite symmetric matrix; therefore there exists a positive definite symmetric $y$ such that $x^{-1} = y^2$. Replacing $x^{-1}$ by $y^2$ and taking into account that $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$, we come to the expressions
$$DF(x)[h] = -\operatorname{Tr}\xi, \quad D^2F(x)[h, h] = \operatorname{Tr}\xi^2, \quad D^3F(x)[h, h, h] = -2\operatorname{Tr}\xi^3, \qquad \xi = yhy$$
(compare these relations with the expressions for the derivatives of the function $-\ln t$). The matrix $\xi$ clearly is symmetric; expressing the traces via the eigenvalues $\lambda_1, \dots, \lambda_m$ of the matrix $\xi$, we come to
$$DF(x)[h] = -\sum_{i=1}^m \lambda_i; \quad D^2F(x)[h, h] = \sum_{i=1}^m \lambda_i^2; \quad D^3F(x)[h, h, h] = -2\sum_{i=1}^m \lambda_i^3,$$
which (by the Cauchy-Schwarz inequality and the bound $|\sum_i \lambda_i^3| \le \max_i|\lambda_i|\sum_i \lambda_i^2 \le (\sum_i \lambda_i^2)^{3/2}$) immediately implies the desired inequalities
$$|DF(x)[h]| \le \sqrt{m}\sqrt{D^2F(x)[h, h]}$$
and
$$|D^3F(x)[h, h, h]| \le 2\left[D^2F(x)[h, h]\right]^{3/2}.$$
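The closed-form derivatives above are easy to compare with finite differences. A minimal NumPy sketch (the test point, the direction and the step size are arbitrary choices of ours) evaluates $-\operatorname{Tr}(x^{-1}h)$, $\operatorname{Tr}(x^{-1}hx^{-1}h)$ and $-2\operatorname{Tr}(x^{-1}hx^{-1}hx^{-1}h)$, compares them with difference quotients of $t \mapsto F(x + th)$, and tests the two inequalities.

```python
import numpy as np

def F(x):
    # F(x) = -ln Det x for a positive definite symmetric matrix x
    _, logdet = np.linalg.slogdet(x)
    return -logdet

def derivatives(x, h):
    xh = np.linalg.solve(x, h)                 # x^{-1} h
    d1 = -np.trace(xh)
    d2 = np.trace(xh @ xh)
    d3 = -2.0 * np.trace(xh @ xh @ xh)
    return d1, d2, d3

rng = np.random.default_rng(1)
m = 4
B = rng.standard_normal((m, m))
x = B @ B.T + m * np.eye(m)                    # an interior point of S^m_+
h = rng.standard_normal((m, m))
h = (h + h.T) / 2                              # a symmetric direction

d1, d2, d3 = derivatives(x, h)

# central difference quotients of t -> F(x + t h)
eps = 1e-3
f = lambda t: F(x + t * h)
fd1 = (f(eps) - f(-eps)) / (2 * eps)
fd2 = (f(eps) - 2 * f(0.0) + f(-eps)) / eps ** 2
fd3 = (f(2 * eps) - 2 * f(eps) + 2 * f(-eps) - f(-2 * eps)) / (2 * eps ** 3)

print(d1, fd1)
print(d2, fd2)
print(d3, fd3)
print(abs(d1) <= np.sqrt(m * d2), abs(d3) <= 2 * d2 ** 1.5)
```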

Exercise 3.3.8: If $\Delta = (-\infty, 0]$, then the statement in question is given by Corollary 3.2.1. From now on we assume that $\Delta$ is finite (i.e., that $a < +\infty$). Then $f$ attains its minimum on $\operatorname{int}\Delta$ at a unique point $t^*$ (V., Lecture 3), and $t^*$ partitions $\Delta$ in the ratio not exceeding $(\vartheta + 2\sqrt{\vartheta}) : 1$ (this is the centering property stated by the same V.). Thus, $t^* \le -a/(\vartheta + 2\sqrt{\vartheta} + 1)$; the latter quantity is $< t$, since $\gamma t \in \Delta$ and therefore $t \ge -a/\gamma$. Since $t^* < t$, we have $f'(t) > 0$. Note that we have also established that
$$t/t^* \le \frac{(1 + \sqrt{\vartheta})^2}{\gamma}.$$
Let $\lambda$ be the Newton decrement of the self-concordant function $f$ at $t$; since $f'(t) > 0$, we have
$$\lambda = f'(t)/\sqrt{f''(t)}.$$
Note that $f''(t) \ge t^{-2}$ (because the open Dikin ellipsoid of $f$ centered at $t$should be contained in $\operatorname{int}\Delta$ and $0$ is a boundary point of $\Delta$), and therefore
$$\lambda \le -tf'(t). \qquad (13.15)$$

It is possible, first, that $\lambda \ge 1$. If this is the case, then (3.19) is an immediate consequence of (13.15).
It remains to consider the case when $\lambda < 1$. In this case, in view of VIII., Lecture 2, we have
$$f(t) \le f(t^*) + \rho(\lambda), \qquad \rho(s) = -\ln(1 - s) - s.$$
On the other hand, from the Lower bound on $f$ (III., Lecture 3) it follows that
$$f(t) \ge f(t^*) + f'(t^*)(t - t^*) - \ln(1 - \pi_{t^*}(t)) - \pi_{t^*}(t) \equiv f(t^*) + \rho(\pi_{t^*}(t)).$$
Thus, we come to
$$\rho(\lambda) \ge \rho(\pi_{t^*}(t)),$$
whence (since $\rho$ is increasing on $[0, 1)$)
$$\lambda \ge \pi_{t^*}(t) \equiv |(t - t^*)/t^*| \ge 1 - \frac{(1 + \sqrt{\vartheta})^2}{\gamma}.$$
Combining this inequality with (13.15), we come to
$$-tf'(t) \ge 1 - \frac{(1 + \sqrt{\vartheta})^2}{\gamma},$$
as required in (3.19). (3.20) is nothing but (3.19) applied to the restriction of $F$ onto the part of the line passing through $x$ and $z$ that is contained in $G$.
Solutions to Section 5.4

Exercise 5.5.4: let $\alpha(\tau, s)$ be the homogeneous part of the affine mapping $A$. A vector $w = (r, q_1, \dots, q_k)$ is in $c + L^\perp$ if and only if
$$(w, \alpha(\tau, s)) = (c, \alpha(\tau, s))$$
identically in $(\tau, s) \in E$. This relation, in view of the construction of $A$, can be rewritten as
$$r^Ts + \sum_{j=1}^k\left[\lambda_j\tau + \operatorname{Tr}\{\sigma_jA(s)\}\right] = \tau$$
identically in $(\tau, s)$ with zero sum of the $s_i$, which immediately results in (5.20), (5.21).
To complete the derivation of the dual problem, we should determine $(b, w)$ for $w \in c + L^\perp$. This is immediate:
$$(b, w) = \left[e^Tr + \sum_{j=1}^k \operatorname{Tr}\{A(e)\sigma_j\}\right] + 2\sum_{j=1}^k z_j^Tf_j,$$
and the quantity in the brackets $[\,\cdot\,]$ is nothing but $V\rho$ in view of (5.21).


Exercise 5.5.5: let us perform in (TTD$_d$) partial optimization over $\sigma_j$ and $r$. Given a feasible plan of (TTD$_d$), we have in our standard notation:
$$\lambda_j \ge 0; \quad \lambda_j = 0 \Rightarrow z_j = 0; \quad \sigma_j \ge \lambda_j^{-1}z_jz_j^T$$
(these relations say exactly that the symmetric matrix $q_j = \begin{pmatrix} \lambda_j & z_j^T \\ z_j & \sigma_j \end{pmatrix}$ is positive semidefinite, cf. Exercise 5.5.3).
From these observations we immediately conclude that replacing in the feasible plan in question the matrices $\sigma_j$ by the matrices $\sigma_j' = \lambda_j^{-1}z_jz_j^T$ for $\lambda_j > 0$ and zero matrices for $\lambda_j = 0$, we preserve positive semidefiniteness of the updated matrices $q_j$ and ensure that $\sum_j b_i^T\sigma_j'b_i \le \sum_j b_i^T\sigma_jb_i$; these latter quantities were equal to $\rho - r_i$ with nonnegative $r_i$, so that the former ones also can be represented as $\rho - r_i'$ with nonnegative $r_i'$. Thus, we may pass from a feasible plan of (TTD$_d$) to another feasible plan with the same value of the objective, and with $\sigma_j$ being of the dyadic form $\lambda_j^{-1}z_jz_j^T$; the remaining simplifications are straightforward.
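The two facts used here can be illustrated numerically: for $\lambda_j > 0$, $q_j \succeq 0$ holds exactly when $\sigma_j \succeq \lambda_j^{-1}z_jz_j^T$ (a Schur complement argument), and the dyadic replacement $\sigma_j'$ can only decrease $b^T\sigma_jb$. The NumPy sketch below uses random data of our own choosing; `is_psd` and `q_matrix` are hypothetical helper names.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    # smallest eigenvalue of the symmetric part is (numerically) nonnegative
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

def q_matrix(lam, z, sigma):
    # q = [[lam, z^T], [z, sigma]]
    return np.block([[np.array([[lam]]), z[None, :]],
                     [z[:, None], sigma]])

rng = np.random.default_rng(2)
n, lam = 3, 0.7
z = rng.standard_normal(n)
C = rng.standard_normal((n, n))
sigma = np.outer(z, z) / lam + C @ C.T           # some sigma >= lam^{-1} z z^T
sigma_dyadic = np.outer(z, z) / lam              # the replacement sigma'
b = rng.standard_normal(n)

print(is_psd(q_matrix(lam, z, sigma)))           # the original plan
print(is_psd(q_matrix(lam, z, sigma_dyadic)))    # still PSD after the replacement
print(b @ sigma_dyadic @ b <= b @ sigma @ b)     # b^T sigma' b does not increase
print(is_psd(q_matrix(lam, z, sigma_dyadic - 0.1 * np.eye(n))))  # below the Schur bound: not PSD
```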
Exercise 5.5.6: as we know, $K$ is self-dual, so that the formalism presented in Exercise 5.4.11 results in the following description of the problem dual to ($\pi$):
minimize $\beta^T\eta$ by choice of
$$\eta = (\zeta, \pi_\cdot) \in K$$
and real $r$ subject to the constraint that the equality
$$(\eta, \mathcal{A}(\xi)) = \chi^T\xi + krp^T\xi \qquad (13.16)$$
holds true identically in $\xi$; here
$$\beta = \mathcal{A}(p).$$
Indeed, the requirement that (13.16) is an identity in $\xi$ is exactly the same as the relation
$$A^T\eta = \chi + P^Tr,$$
$A$ being the matrix of the mapping $\mathcal{A}$ (in our case this mapping is linear homogeneous); we have taken into account that $P^Tr = krp$, see the description of the data of ($\pi$).
Now, using in the straightforward manner the description of the data in ($\pi$) and denoting
$$\pi_{ij} = \begin{pmatrix} \alpha_{ij} & \beta_{ij} \\ \beta_{ij} & \gamma_{ij} \end{pmatrix},$$
we can rewrite identity (13.16) as the following identity with respect to $f$, $\lambda$, $y_{ij}$ and $z_j$ (in what follows $i$ varies from 1 to $m$, $j$ varies from 1 to $k$):
$$\sum_i \zeta_i\Big\{f - \sum_j\left[2z_j^Tf_j + Vy_{ij}\right]\Big\} + \sum_{i,j}\left[y_{ij}\alpha_{ij} + 2\beta_{ij}b_i^Tz_j + \lambda_j\gamma_{ij}\right] = f + r\sum_j \lambda_j,$$
which results in the following equations on $\eta$:
$$\sum_i \zeta_i = 1; \qquad (13.17)$$
$$V\zeta_i = \alpha_{ij}; \qquad (13.18)$$
$$\Big(\sum_i \zeta_i\Big)f_j = \sum_i \beta_{ij}b_i; \qquad (13.19)$$
$$\sum_i \gamma_{ij} = r. \qquad (13.20)$$
Now, the constraint $\eta \in K$ is equivalent to
$$\zeta_i \ge 0; \qquad \pi_{ij} \equiv \begin{pmatrix} \alpha_{ij} & \beta_{ij} \\ \beta_{ij} & \gamma_{ij} \end{pmatrix} \ge 0, \qquad (13.21)$$
and the objective $\beta^T\eta \equiv (\mathcal{A}(p))^T\eta$ is nothing but
$$k^{-1}\sum_{ij}\gamma_{ij}.$$
Expressing via equations (13.17)-(13.20) all components of $\eta$ in terms of the variables $\phi_i \equiv V\zeta_i$, $\beta_{ij}$ and $r$, taking into account that the condition $\pi_{ij} \ge 0$ is equivalent to $\alpha_{ij} \ge 0$, $\gamma_{ij} \ge 0$, $\alpha_{ij}\gamma_{ij} \ge \beta_{ij}^2$, and eliminating in the resulting problem the variables $\gamma_{ij}$ by partial optimization with respect to these variables, we immediately come to the desired formulation of the problem dual to ($\pi$).
Exercise 5.5.7: let $(\phi, \beta_\cdot)$ be a feasible solution to ($\psi$), and let $I$ be the set of indices of nonzero $\phi_i$. Then $\beta_{ij} = 0$ whenever $i \notin I$: otherwise the objective of ($\psi$) at the solution would be infinite (this is our rule for interpreting fractions with zero denominators), while the solution is assumed to be feasible. Let us fix $j$ and consider the following optimization problem:
$$(P_j): \quad \text{minimize } \sum_{i \in I} v_i^2\phi_i^{-1} \quad \text{s.t.} \quad \sum_{i \in I} v_ib_i = f_j,$$
$v_i$ being the control variables. The problem clearly is feasible: a feasible plan is given by $v_i = \beta_{ij}$, $i \in I$. Now, $(P_j)$ is a quadratic problem with nonnegative objective and linear equality constraints; therefore it is solvable. Let $\beta^*_{ij}$, $i \in I$, be an optimal solution to the problem, and let $\beta^*_{ij} = 0$ for $i \notin I$. From the optimality conditions for $(P_j)$ it follows that there is an $n$-dimensional vector $2x_j$ (the vector of Lagrange multipliers for the equality constraints) such that $\beta^*_{ij}$, $i \in I$, is an optimal solution to the unconstrained problem
$$\text{minimize } \sum_{i \in I} v_i^2\phi_i^{-1} + 2x_j^T\Big(f_j - \sum_{i \in I} v_ib_i\Big),$$
so that for $i \in I$ one has
$$\beta^*_{ij} = \phi_ix_j^Tb_i; \qquad (13.22)$$
this relation, of course, is valid also for $i \notin I$ (where both sides are zero). Since $(\beta^*_{ij})_i$ is feasible for $(P_j)$, we have $\sum_i \beta^*_{ij}b_i = f_j$, which in view of (13.22) implies that
$$f_j = \Big(\sum_i \phi_i(b_ib_i^T)\Big)x_j \equiv A(\phi)x_j. \qquad (13.23)$$
This latter relation combined with (13.22) says that the plan $(\phi, \beta^*_\cdot)$ is the image of the feasible plan $(\phi, x_1, \dots, x_k)$ under the mapping (5.35).
What are the compliances $c_j$ associated with the plan $(\phi, x_1, \dots, x_k)$? In view of (13.22)-(13.23) we have
$$c_j = x_j^Tf_j = x_j^T\sum_i \beta^*_{ij}b_i = \sum_{i \in I}\beta^*_{ij}(x_j^Tb_i) = \sum_{i \in I}[\beta^*_{ij}]^2\phi_i^{-1};$$
and since the $\beta_{ij}$ form a feasible, and the $\beta^*_{ij}$ an optimal plan to $(P_j)$, we come to
$$c_j \le \sum_i \beta_{ij}^2\phi_i^{-1}.$$
Thus, the value of the objective (i.e., $\max_j c_j$) of (TTD$_{ini}$) at the plan $(\phi, x_1, \dots, x_k)$ does not exceed the value of the objective of ($\psi$) at the plan $(\phi, \beta_\cdot)$, and we are done.
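The closed-form solution of $(P_j)$ used above, $\beta^*_{ij} = \phi_ix_j^Tb_i$ with $A(\phi)x_j = f_j$, and the identity "optimal value $= x_j^Tf_j$" can be checked numerically. In this NumPy sketch the dimensions and random data are our own, all $\phi_i$ are positive (so $I$ is the whole index set), and the vectors $b_i$ are stored as the columns of a matrix `Bmat` (a name of ours).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 5
Bmat = rng.standard_normal((n, m))          # columns b_1, ..., b_m
phi = rng.uniform(0.5, 2.0, m)              # phi_i > 0 for all i
f = rng.standard_normal(n)                  # the load f_j (j fixed)

def objective(v):
    # objective of (P_j): sum_i v_i^2 / phi_i
    return np.sum(v ** 2 / phi)

# (13.23): A(phi) x = f with A(phi) = sum_i phi_i b_i b_i^T; then (13.22)
A_phi = (Bmat * phi) @ Bmat.T
x = np.linalg.solve(A_phi, f)
v_star = phi * (Bmat.T @ x)                 # beta*_i = phi_i x^T b_i

# Other feasible plans: v_star plus vectors from the kernel of v -> sum_i v_i b_i
_, _, Vt = np.linalg.svd(Bmat)
N = Vt[n:].T                                # m x (m - n) basis of that kernel
others = [v_star + N @ rng.standard_normal(m - n) for _ in range(100)]

print(objective(v_star), x @ f)             # optimal value equals the compliance x^T f
print(min(objective(w) for w in others) >= objective(v_star))
```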
Solutions to Section 6.7
Exercise 6.7.3: if the set $K^\sigma = \{y \in K \cap M \mid \sigma^Ty = 1\}$ were bounded, the set $K(\sigma) = \{y \in K \cap M \mid \sigma^Ty \le 1\}$ also would be bounded (since, as we know from (6.7), $\sigma^Ty$ is positive on $M \cap \operatorname{int} K$). From this latter fact it would follow that $\sigma$ is strictly positive on the cone $K' = K \cap M$ (see basic statements on convex cones in Lecture 5). The optimal solution $x^*$ is a nonzero vector from the cone $K'$ and we know that $\sigma^Tx^* = 0$; this is the desired contradiction.
All remaining statements are immediate: $\phi$ is a nondegenerate self-concordant barrier for $K^\sigma$ (regarded as a domain in its affine hull) due to Proposition 5.3.1; $\operatorname{Dom}\phi$ is unbounded and therefore $\phi$ is unbounded below on its domain (V., Lecture 3); since $\phi$ is unbounded below, its Newton decrement is $\ge 1$ at any point (VIII., Lecture 2) and therefore the damped Newton step decreases $\phi$ at least by $\rho(-1) = 1 - \ln 2$ (V., Lecture 2).
Exercise 6.7.5: 1) is an immediate consequence of III. To prove 2), note that $(S, \chi^*) = 0$ for a certain positive semidefinite $\chi^* = I - \delta$ with $\delta \in \Pi$ (IVb.). Since $(S, I) = 1$ (III.), we have $(\delta, S) = 1$; since $\eta$ is the orthoprojection of $S$ onto $\Pi$ and $\delta \in \Pi$, we have $(\delta, \eta) = (\delta, S)$, whence $(\delta, \eta) = 1$. Now, $(\eta, I) = 0$ (recall that $\eta \in \Pi$ and $\Pi$ is contained in the subspace of matrices with zero trace, see II.). Thus, we come to $(I - \delta, \eta) \equiv (\chi^*, \eta) = -1$. Writing down the latter relation in the eigenbasis of $\eta$, we come to
$$\sum_{i=1}^n \chi_ig_i = -1,$$
$\chi_i$ being the diagonal entries of $\chi^*$ with respect to this basis; since $\chi_i \ge 0$ (recall that $\chi^*$ is positive semidefinite) and $\sum_i \chi_i = n$ (see IVb.), we conclude that $\max_i|g_i| \ge n^{-1}$.
Exercise 6.7.6: one clearly has $\tau \in T$, and, consequently, $\tau \in \operatorname{Dom}\phi$. We have
$$\phi(0) - \phi(\tau) = \sum_i \ln(1 - \tau g_i) - n\ln(1 - \tau|g|_2^2) \ge$$
[due to concavity of $\ln$]
$$\ge \sum_i \ln(1 - \tau g_i) + n\tau|g|_2^2 = -\sum_{j=1}^\infty\sum_i j^{-1}(\tau g_i)^j + n\tau|g|_2^2 =$$
[since $\sum_i g_i = 0$, see Exercise 6.7.5, 1)]
$$= -\sum_{j=2}^\infty\sum_i j^{-1}(\tau g_i)^j + n\tau|g|_2^2 \ge$$
$$\ge -\sum_{j=2}^\infty j^{-1}[\tau|g|_2]^2[\tau|g|_\infty]^{j-2} + n\tau|g|_2^2 =$$
$$= -\frac{|g|_2^2}{|g|_\infty^2}\sum_{j=2}^\infty j^{-1}(\tau|g|_\infty)^j + n\tau|g|_2^2 =$$
$$= \frac{|g|_2^2}{|g|_\infty^2}\left[\ln(1 - \tau|g|_\infty) + \tau|g|_\infty\right] + n\tau|g|_2^2.$$
Substituting into the resulting lower bound for $\phi(0) - \phi(\tau)$ the value of $\tau$ indicated in the exercise, we come to the lower bound
$$\alpha \ge \frac{|g|_2^2}{|g|_\infty^2}\left[n|g|_\infty - \ln(1 + n|g|_\infty)\right];$$
it remains to use Exercise 6.7.5, 2).
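The heart of the chain above is the inequality $\sum_i\ln(1 - \tau g_i) \ge \frac{|g|_2^2}{|g|_\infty^2}[\ln(1 - \tau|g|_\infty) + \tau|g|_\infty]$, valid whenever $\sum_i g_i = 0$ and $0 \le \tau < 1/|g|_\infty$. The NumPy sketch below tests it on random vectors. The value $\tau = n/(1 + n|g|_\infty)$ is our reconstruction, not taken from the text: it is the value that turns the last line of the chain into the stated bound for $\alpha$; the authoritative choice is the one given in the exercise.

```python
import numpy as np

def chain_lhs(g, tau):
    # sum_i ln(1 - tau g_i) + n tau |g|_2^2
    return np.sum(np.log1p(-tau * g)) + g.size * tau * (g @ g)

def chain_rhs(g, tau):
    # (|g|_2^2 / |g|_inf^2) [ln(1 - tau |g|_inf) + tau |g|_inf] + n tau |g|_2^2
    ginf = np.max(np.abs(g))
    u = tau * ginf
    return (g @ g) / ginf ** 2 * (np.log1p(-u) + u) + g.size * tau * (g @ g)

def final_bound(g):
    # (|g|_2^2 / |g|_inf^2) [n |g|_inf - ln(1 + n |g|_inf)]
    n, ginf = g.size, np.max(np.abs(g))
    return (g @ g) / ginf ** 2 * (n * ginf - np.log1p(n * ginf))

rng = np.random.default_rng(4)
n = 6
ineq_ok, value_ok = True, True
for _ in range(1000):
    g = rng.standard_normal(n)
    g -= g.mean()                               # enforce sum_i g_i = 0
    tau = n / (1 + n * np.max(np.abs(g)))       # assumed value of tau (see above)
    ineq_ok &= bool(chain_lhs(g, tau) >= chain_rhs(g, tau) - 1e-9)
    value_ok &= bool(np.isclose(chain_rhs(g, tau), final_bound(g)))
print(ineq_ok, value_ok)
```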


Solutions to Section 7.6
Exercise 7.6.3: by construction, $K$ is the direct product of $M + r$ copies of the cone $\mathbf{S}^\nu_+$ of positive semidefinite symmetric $\nu \times \nu$ matrices. The latter cone is self-dual (Exercise 5.4.7), and therefore $K$ also is self-dual (Exercise 5.4.9). Now, $-\ln\operatorname{Det} y$ is a $\nu$-logarithmically homogeneous self-concordant barrier for the cone $\mathbf{S}^\nu_+$ (Example 5.3.3, Lecture 5), and the Legendre transformation of this barrier is $-\ln\operatorname{Det}(-r) - \nu$ (Exercise 5.4.10). From Proposition 5.3.2.(iii) it follows that the direct sum of the above barriers for the direct factors of $K$, which is nothing but the barrier $F(x) = -\ln\operatorname{Det} x$, is an $(M + 2)\nu$-logarithmically homogeneous self-concordant barrier for $K$. The Legendre transformation of a direct sum clearly is the direct sum of the Legendre transformations.
Solutions to Section 8.5
Exercise 8.5.4: by definition of $\zeta \equiv \zeta(v, dv)$ we have $v + r\,dv \in \operatorname{int}\mathbf{S}^k_+$ whenever $|r| < \zeta$, so that $f(r)$ is well-defined. Now, the function $f(r)$ clearly is analytic on its domain, and its Taylor expansion at $r = 0$ is
$$\sum_{i=0}^\infty \frac{f^{(i)}(0)}{i!}r^i = \sum_{i=0}^\infty \frac{D^i\Phi(v)[dv, \dots, dv]}{i!}r^i =$$
[Exercise 8.5.3]
$$= f(0) + f'(0)r + \sum_{i=2}^\infty (-1)^i\frac{\operatorname{Tr}\{\hat h^i\}}{i}r^i, \qquad \hat h = v^{-1/2}dv\,v^{-1/2}.$$
In view of (8.35) the absolute values of the coefficients in the latter series are bounded from above by $i^{-1}|\hat h|_2^2|\hat h|_\infty^{i-2}$, so that the series converges (and, consequently, represents $f$; recall that $f$ is analytic on its domain) when $r < \zeta(v, dv) \equiv |\hat h|_\infty^{-1}$ (see (8.33)). It follows that, for the aforementioned $r$, the remainder of the Taylor expansion of order $j$ is bounded from above by the series
$$\frac{|\hat h|_2^2}{|\hat h|_\infty^2}\sum_{i=j+1}^\infty i^{-1}(r|\hat h|_\infty)^i,$$
and, taking into account that $|\hat h|_2 = 1$ in view of (8.32), we come to (8.36).
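The remainder estimate can be checked numerically if, as the expansion through $\operatorname{Tr}\{\hat h^i\}$ indicates, $\Phi(v) = -\ln\operatorname{Det} v$ (an assumption of this sketch). Then, with $\mu_i$ the eigenvalues of $\hat h$, $f(r) = f(0) - \sum_i\ln(1 + r\mu_i)$, so both the order-$j$ Taylor polynomial and its remainder are explicit. The NumPy sketch normalizes $dv$ so that $|\hat h|_2 = 1$ (Frobenius norm), as in (8.32); the test matrices, the order $j = 3$ and the grid of $r$ values are our own choices.

```python
import numpy as np

rng = np.random.default_rng(5)
k, j = 4, 3                                   # matrix size and Taylor order

C = rng.standard_normal((k, k))
v = C @ C.T + np.eye(k)                       # interior point of S^k_+
dv = rng.standard_normal((k, k))
dv = (dv + dv.T) / 2

w, Q = np.linalg.eigh(v)
v_inv_half = Q @ np.diag(w ** -0.5) @ Q.T     # v^{-1/2}
h = v_inv_half @ dv @ v_inv_half              # \hat h
scale = np.linalg.norm(h)                     # Frobenius norm |h|_2
h, dv = h / scale, dv / scale                 # normalization |h|_2 = 1

mu = np.linalg.eigvalsh(h)
h_inf = np.max(np.abs(mu))                    # |h|_inf
zeta = 1.0 / h_inf                            # zeta(v, dv)

def f(r):
    # f(r) = Phi(v + r dv) with Phi = -ln Det (assumed)
    return -np.linalg.slogdet(v + r * dv)[1]

def taylor(r):
    # f(0) + sum_{i=1}^{j} (-1)^i Tr(h^i) r^i / i
    return f(0.0) + sum((-1) ** i * np.sum(mu ** i) * r ** i / i for i in range(1, j + 1))

def remainder_bound(r):
    # (1 / |h|_inf^2) sum_{i > j} (r |h|_inf)^i / i, summed in closed form
    u = r * h_inf
    return (-np.log1p(-u) - sum(u ** i / i for i in range(1, j + 1))) / h_inf ** 2

rs = np.linspace(0.05, 0.95, 19) * zeta
gaps = [abs(f(r) - taylor(r)) for r in rs]
bounds = [remainder_bound(r) for r in rs]
print(all(gap <= bnd + 1e-12 for gap, bnd in zip(gaps, bounds)))
```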
Exercise 8.5.6: Since $x \in \operatorname{int} G$, we have $u \in \operatorname{int}\mathbf{S}^k_+$; further,
$$|du|_{\Phi''(u)} = |\pi dx|_{\Phi''(u)} = |dx|_{\pi^T\Phi''(u)\pi} = |dx|_{F''(x)} = 1,$$
so that $(u, du)$ indeed is an arrow; by construction, $(s, ds)$ is the co-arrow conjugate to $(u, du)$. It remains to note that by definition of $\Delta$ and due to the normalization $|dx|_{F''(x)} = 1$ we have
$$\Delta = \max\{p \mid x \pm p\,dx \in G\} = \max\{p \mid u \pm p\,du \in \mathbf{S}^k_+\} \equiv \zeta(u, du).$$

Exercise 8.5.7: by Lemma 8.3.1 and Proposition 8.3.1, the upper bound $v(r)$ for the residual $F_{t+dt}(x + dx(dt)) - \min_y F_{t+dt}(y)$ is bounded from above by the remainder $\rho^*(r)$ in the third order Taylor expansion of the function $\Phi(u + r\,du(dt)) + \Phi^*(s + r\,ds(dt))$; here $dt$ is an arbitrary positive scale factor, and we are free to choose $dt$ in a way which ensures that $|dx(dt)|_{F''(x)} = 1$; with this normalization, $\Omega = \Omega(x)$ will be exactly the quantity $\delta t/dt$, where $\delta t$ is the stepsize given by the linesearch. The quantity $\Omega$ is therefore such that $v(\Omega) = O(1)$ (since we use linesearch to get the largest $r$ which results in $v(r) \le \kappa$); consequently, $\rho^*(\Omega) \ge O(1)$. On the other hand, in view of Exercises 8.5.5 and 8.5.6, $\rho^*(r)$ is exactly $R^3_{(u,du)}(r)$; combining (8.37) and the inequality $\rho^*(\Omega) \ge O(1)$, we come to
$$\zeta^2(u, du)\rho_3(\Omega/\zeta(u, du)) \ge O(1),$$
and since $\zeta(u, du) = \Delta \equiv \Delta(x)$ by Exercise 8.5.6, we obtain
$$\rho_3(\Omega/\Delta) \ge O(1)\Delta^{-2}.$$
Since $\rho_3(z) \le O(1)z^4$ for $|z| \le 1/2$, we conclude that
$$\Omega/\Delta \le 1/2 \Rightarrow \Omega \ge O(1)\sqrt{\Delta};$$
the resulting inequality surely holds also if $\Omega/\Delta > 1/2$, since, as we know, $\Delta \ge 1$.
Solutions to Section 9.6
Exercise 9.6.1:
1): the "general" part is an immediate consequence of the Substitution rule (N) as applied to the mapping
$$B: (t, x) \mapsto \begin{pmatrix} x^Tx \\ t \end{pmatrix} \qquad [G^- = \mathbf{R} \times \mathbf{R}^n],$$
which is 1-appropriate for $G^+$ in view of Proposition 9.3.1.
The "particular" part is given by the general one as applied to
$$G^+ = \{(u, s) \mid u \le s^{2/p},\ s \ge 0\}$$
and the 2-self-concordant barrier $F^+(u, s) = -\ln(s^{2/p} - u) - \ln s$ for $G^+$, see Example 9.2.1.
2): the "general" part is an immediate consequence of the Substitution rule (N) applied to the mapping
$$B: (t, x) \mapsto \begin{pmatrix} \frac{x^Tx}{t} \\ t \end{pmatrix} \qquad [G^- = \mathbf{R}_+ \times \mathbf{R}^n],$$
which is appropriate for $G^+$ in view of Proposition 9.3.1.
The "particular" part is given by the general one as applied to
$$G^+ = \{(u, s) \mid u \le s^{2/p - 1},\ s \ge 0\},$$
the 2-self-concordant barrier
$$F^+(u, s) = -\ln(s^{2/p - 1} - u) - \ln s$$
for $G^+$ (see Example 9.2.1) and the 1-self-concordant barrier
$$F^-(t, x) = -\ln t$$
for the domain $G^-$ of the mapping $B$.


Exercise 9.6.3: apply Proposition 9.3.1 to the data
• $G^+ = \mathbf{S}^m_+$, $F^+(\tau) = -\ln\operatorname{Det}\tau$;
• $Q[\xi', \xi''] = \frac{1}{2}\sum_{j=1}^q\left[(\xi_j')^T\xi_j'' + (\xi_j'')^T\xi_j'\right]$, $\xi = (\xi_1, \dots, \xi_q)$;
• $A(\eta)\xi = (y_1(\eta)\xi_1, \dots, y_q(\eta)\xi_q)$, $F^-(\eta) = F_Y(\eta)$.

Exercise 9.6.4.2): specify the data in Exercise 9.6.3 as
• $q = k$, $n_1 = \dots = n_k = m$;
• $Y = \mathbf{R}^k_+$, $y_j(\eta) = \eta_jI$, $j = 1, \dots, k$;
• $F_Y(\eta) = -\sum_{j=1}^k \ln\eta_j$.
The resulting cone $K$ clearly is comprised of collections $(\tau; \eta; \xi_j)$ ($\tau$ is an $m \times m$ symmetric matrix, $\eta \in \mathbf{R}^k$, $\xi_j$ are $m \times m$ matrices) for which
$$\eta \ge 0; \qquad \tau - \sum_{j=1}^k \eta_j^{-1}\xi_j^T\xi_j \ge 0.$$
The cone $G^+$ is the inverse image of the "huge" cone $K$ under the linear mapping
$$(s_i; t_{ij}; r_j) \mapsto \begin{pmatrix} \tau = \operatorname{Diag}\{s_1, \dots, s_m\} \\ \xi_j = \operatorname{Diag}\{t_{1j}, \dots, t_{mj}\} \\ \eta_j = r_j \end{pmatrix},$$
and $\Phi$ is nothing but the superposition of the barrier $F$ for $K$ given by the result of Exercise 9.6.3 and this mapping.
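Exercise 9.6.5: let us compute derivatives of $A$ at a point $u = (t, y) \in \operatorname{int} G^-$ in a direction $du = (dt, dy)$ such that $u \pm du \in G^-$; what we should prove is that
$$D^2A(u)[du, du] \le 0 \qquad (13.24)$$
and that
$$D^3A(u)[du, du, du] \le -3D^2A(u)[du, du]. \qquad (13.25)$$
Let us set $\eta_i = dy_i/y_i$, $\sigma_k = \sum_{i=1}^p \eta_i^k$, so that, in the clear notation, $d\sigma_k = -k\sigma_{k+1}$, and let $\phi(t, y) = (y_1 \cdots y_p)^{1/p}$. We have
$$DA(t, y)[du] = p^{-1}\sigma_1\phi(t, y) - dt;$$
$$D^2A(t, y)[du, du] = p^{-2}\sigma_1^2\phi(t, y) - p^{-1}\sigma_2\phi(t, y) = p^{-2}[\sigma_1^2 - p\sigma_2]\phi(t, y),$$
$$D^3A(t, y)[du, du, du] = -p^{-2}[2\sigma_1\sigma_2 - 2p\sigma_3]\phi(t, y) + p^{-3}\sigma_1[\sigma_1^2 - p\sigma_2]\phi(t, y).$$
Now, let
$$\lambda = p^{-1}\sigma_1, \qquad \alpha_i = \eta_i - \lambda.$$
We clearly have $\sum_i \alpha_i = 0$ and
$$\sigma_1 = p\lambda; \quad \sigma_2 = \sum_{i=1}^p \eta_i^2 = p\lambda^2 + \sum_{i=1}^p \alpha_i^2; \quad \sigma_3 = \sum_{i=1}^p \eta_i^3 = p\lambda^3 + 3\lambda\sum_{i=1}^p \alpha_i^2 + \sum_{i=1}^p \alpha_i^3. \qquad (13.26)$$
Substituting these expressions for $\sigma_k$ in the expressions for the second and the third derivative of $A$, we come to
$$d_2 \equiv -D^2A(t, y)[du, du] = p^{-1}\phi(t, y)\sum_{i=1}^p \alpha_i^2 \ge 0, \qquad (13.27)$$
as required in (13.24), and
$$d_3 \equiv D^3A(t, y)[du, du, du] = -2p^{-2}\phi(t, y)\Big[p^2\lambda^3 + p\lambda\sum_{i=1}^p \alpha_i^2 - p^2\lambda^3 - 3p\lambda\sum_{i=1}^p \alpha_i^2 - p\sum_{i=1}^p \alpha_i^3\Big] - p^{-1}\phi(t, y)\lambda\sum_{i=1}^p \alpha_i^2 =$$
$$= \frac{3}{p}\phi(t, y)\lambda\sum_{i=1}^p \alpha_i^2 + \frac{2}{p}\phi(t, y)\sum_{i=1}^p \alpha_i^3 = \frac{3}{p}\phi(t, y)\sum_{i=1}^p\Big[\lambda + \frac{2}{3}\alpha_i\Big]\alpha_i^2 =$$
$$= \frac{3}{p}\phi(t, y)\sum_{i=1}^p\Big[\frac{1}{3}\lambda + \frac{2}{3}\eta_i\Big]\alpha_i^2. \qquad (13.28)$$
Now, the inclusion $u \pm du \in G^-$ means exactly that $-1 \le \eta_i \le 1$, $i = 1, \dots, p$, whence also $|\lambda| \le 1$; therefore $|\frac{1}{3}\lambda + \frac{2}{3}\eta_i| \le 1$, and comparing (13.28) and (13.27), we come to (13.25).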
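The derivative formulas and the inequalities (13.24), (13.25) can be sanity-checked numerically. The NumPy sketch below (sample sizes, ranges and the finite-difference step are our own choices) samples $y > 0$ and $\eta_i \in [-1, 1]$, which is exactly $u \pm du \in G^-$. It evaluates $D^2A$ and $D^3A$ by the $\sigma_k$-formulas and compares them with finite differences of $s \mapsto A(u + s\,du)$ and with the final form (13.28). It then checks both inequalities.

```python
import numpy as np

def A(t, y):
    # A(t, y) = (y_1 ... y_p)^{1/p} - t
    return np.exp(np.mean(np.log(y))) - t

def second_third(t, y, dt, dy):
    p = y.size
    eta = dy / y
    s1, s2, s3 = eta.sum(), (eta ** 2).sum(), (eta ** 3).sum()
    phi = np.exp(np.mean(np.log(y)))
    D2 = p ** -2 * (s1 ** 2 - p * s2) * phi
    D3 = -p ** -2 * (2 * s1 * s2 - 2 * p * s3) * phi + p ** -3 * s1 * (s1 ** 2 - p * s2) * phi
    lam = s1 / p
    alpha = eta - lam
    D3_alt = 3.0 / p * phi * np.sum((lam / 3 + 2 * eta / 3) * alpha ** 2)   # (13.28)
    return D2, D3, D3_alt

rng = np.random.default_rng(6)
p, h = 5, 1e-3
max_fd_err, ineq_ok = 0.0, True
for _ in range(200):
    y = rng.uniform(0.5, 2.0, p)
    eta = rng.uniform(-1.0, 1.0, p)         # u +- du in G^-  <=>  |eta_i| <= 1
    t, dt = rng.standard_normal(2)
    dy = eta * y
    D2, D3, D3_alt = second_third(t, y, dt, dy)
    g = lambda s: A(t + s * dt, y + s * dy)
    fd2 = (g(h) - 2 * g(0.0) + g(-h)) / h ** 2
    fd3 = (g(2 * h) - 2 * g(h) + 2 * g(-h) - g(-2 * h)) / (2 * h ** 3)
    max_fd_err = max(max_fd_err, abs(D2 - fd2), abs(D3 - fd3), abs(D3 - D3_alt))
    ineq_ok &= bool(D2 <= 1e-12 and D3 <= -3 * D2 + 1e-12)
print(max_fd_err, ineq_ok)
```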
Exercise 9.6.6: The mapping $B(\cdot)$ is the superposition $A(L(\cdot))$ of the mapping
$$A(t, y_1, \dots, y_p) = (y_1 \cdots y_p)^{1/p} - t: H \to \mathbf{R}$$
with the domain
$$H = \{(t, y_1, \dots, y_p) \mid y \ge 0\}$$
and the linear mapping
$$L(\tau, \xi, \eta) = (\xi, \tau, \eta, \dots, \eta): \mathbf{R}^3 \to \mathbf{R}^{p+1};$$
namely, the set $G^-$ is exactly $L^{-1}(H)$, and on the interior of $G^-$ we have $B(\cdot) \equiv A(L(\cdot))$.
From Exercise 9.6.5 we know that $A$ is 1-appropriate for $\mathbf{R}_+$; the fact that $B$ also is 1-appropriate for $\mathbf{R}_+$ is given by the following immediate observation:
Let $A: \operatorname{int} H \to \mathbf{R}^N$ ($H$ is a closed convex domain in $\mathbf{R}^K$) be $\beta$-appropriate for a closed convex domain $G^+ \subset \mathbf{R}^N$, let $L$ be an affine mapping from a certain $\mathbf{R}^M$ to $\mathbf{R}^K$, and let $G^-$ be a closed convex domain in $\mathbf{R}^M$ such that $L(\operatorname{int} G^-) \subset \operatorname{int} H$. Then the composite mapping
$$B(x) = A(L(x)): \operatorname{int} G^- \to \mathbf{R}^N$$
is $\beta$-appropriate for $G^+$.
Thus, our particular $B$ indeed is 1-appropriate for $\mathbf{R}_+$; the remaining claims of the Exercise are given by Theorem 9.1.1 applied with $F^+(z) = -\ln z$ and $F^-(\tau, \xi, \eta) = -\ln\tau - \ln\eta$.
Solutions to Section 10.5
Exercise 10.5.2: what we should prove is that $G_O$ is convex and that the solutions to (Outer') are exactly the minimum volume ellipsoids which contain $Q$.
To prove convexity, assume that $(r', x', X')$ and $(r'', x'', X'')$ are two points of $G_O$, $\lambda \in [0, 1]$ and $(r, x, X) = \lambda(r', x', X') + (1 - \lambda)(r'', x'', X'')$; we should prove that $(r, x, X) \in G_O$. Indeed, by the definition of $G_O$ we have for all $u \in Q$
$$u^TX'u + 2(x')^Tu + r' \le 0, \qquad u^TX''u + 2(x'')^Tu + r'' \le 0,$$
whence, after taking the weighted sum,
$$u^TXu + 2x^Tu + r \le 0.$$
Thus, the points of $Q$ indeed satisfy the quadratic inequality associated with $(r, x, X)$; since $X$ clearly is symmetric positive definite and $Q$ possesses a nonempty interior, this quadratic inequality does define an ellipsoid, and, as we have seen, this ellipsoid $E(r, x, X)$ contains $Q$. It remains to prove that the triple $(r, x, X)$ satisfies the normalizing condition $\delta(r, x, X) \le 1$; but this is an immediate consequence of convexity of the function $x^TX^{-1}x - r$ on the set of triples $(r, x, X)$ with $X \in \operatorname{int}\mathbf{S}^n_+$ (see the section on the fractional-quadratic mapping in Lecture 9).
It remains to prove that optimal solutions to (Outer') represent exactly the minimum volume ellipsoids which cover $Q$. Indeed, let $(r, x, X)$ be a feasible solution to (Outer') with finite value of the objective. I claim that $\delta(r, x, X) > 0$. Indeed, $X$ is positive definite (since it is in $\mathbf{S}^n_+$ and $F$ is finite at $X$), therefore the set $E(r, x, X)$ is empty, a point or an ellipsoid, depending on whether $\delta(r, x, X)$ is negative, zero or positive; since $(r, x, X) \in G_O$, the set $E(r, x, X)$ contains $Q$, and is therefore neither empty nor a point (since $\operatorname{int} Q \ne \emptyset$), so that $\delta(r, x, X)$ must be positive. Thus, feasible solutions $(r, x, X)$ to (Outer') with finite value of the objective are such that the sets $E(r, x, X)$ are ellipsoids containing $Q$; it is immediately seen that every ellipsoid with the latter property comes from a certain feasible solution to (Outer'). Note that the objective in (Outer') is "almost" (a monotone transformation of) the objective in (Outer):
$$\ln\operatorname{Vol}(E(r, x, X)) = \ln\kappa_n + \frac{n}{2}\ln\delta(r, x, X) - \frac{1}{2}\ln\operatorname{Det} X,$$
and the objective in (Outer') is $F(X) = -\ln\operatorname{Det} X$. We conclude that (Outer) is equivalent to the problem (Outer'') which is obtained from (Outer') by replacing the inequality $\delta(r, x, X) \le 1$ with the equation $\delta(r, x, X) = 1$. It remains to verify that (Outer') is equivalent to (Outer''), and this is immediate: if $(r, x, X)$ is a feasible solution to (Outer') with finite value of the objective, then, as we know, $\delta(r, x, X) > 0$; setting $\gamma = \delta^{-1}(r, x, X)$ (so that $\gamma \ge 1$) and $(r', x', X') = \gamma(r, x, X)$, we come to $E(r, x, X) = E(r', x', X')$, $\delta(r', x', X') = 1$, so that $(r', x', X') \in G_O$, and $F(X') = F(X) - n\ln\gamma \le F(X)$. From this latter observation it immediately follows that (Outer') is equivalent to (Outer''), and this latter problem, as we have just seen, is nothing but (Outer).
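The rescaling step and the volume formula can be checked numerically. In the NumPy sketch below we take $\delta(r, x, X) = x^TX^{-1}x - r$ (consistent with the convexity remark above and with $E(r, x, X) = \{u : u^TXu + 2x^Tu + r \le 0\}$); the test data and the helper names `delta` and `center_and_shape` are ours.

```python
import numpy as np

def delta(r, x, X):
    return x @ np.linalg.solve(X, x) - r

def center_and_shape(r, x, X):
    # u^T X u + 2 x^T u + r <= 0  <=>  (u - c)^T (X / delta) (u - c) <= 1,  c = -X^{-1} x
    return -np.linalg.solve(X, x), X / delta(r, x, X)

def F(X):
    return -np.linalg.slogdet(X)[1]             # F(X) = -ln Det X

rng = np.random.default_rng(7)
n = 3
M = rng.standard_normal((n, n))
X = M @ M.T + np.eye(n)
x = rng.standard_normal(n)
r = x @ np.linalg.solve(X, x) - 0.4             # delta = 0.4, so delta <= 1 holds

gamma = 1.0 / delta(r, x, X)                    # gamma >= 1
r2, x2, X2 = gamma * r, gamma * x, gamma * X    # the rescaled triple

c1, P1 = center_and_shape(r, x, X)
c2, P2 = center_and_shape(r2, x2, X2)
print(delta(r2, x2, X2))                        # equals 1
print(np.allclose(c1, c2), np.allclose(P1, P2)) # the same ellipsoid
print(F(X2), F(X) - n * np.log(gamma))          # F(X') = F(X) - n ln gamma <= F(X)

# ln Vol(E) - ln kappa_n = -(1/2) ln Det P1 should equal (n/2) ln delta - (1/2) ln Det X
lhs = -0.5 * np.linalg.slogdet(P1)[1]
rhs = 0.5 * n * np.log(delta(r, x, X)) - 0.5 * np.linalg.slogdet(X)[1]
print(np.isclose(lhs, rhs))
```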
Exercise 10.5.4: to prove that $A$ is $\frac{2}{3}$-appropriate for $G^+$, note that a direct computation says that for positive definite symmetric $X$ and any $(dt, dX)$ one has
$$d_2 \equiv D^2A(t, X)[(dt, dX), (dt, dX)] = -\operatorname{Tr}\{X^{-1}dXX^{-1}dX\} = -\operatorname{Tr}\{[\delta X]^2\}, \qquad \delta X = X^{-1/2}dXX^{-1/2},$$
and
$$d_3 \equiv D^3A(t, X)[(dt, dX), (dt, dX), (dt, dX)] = 2\operatorname{Tr}\{X^{-1}dXX^{-1}dXX^{-1}dX\} = 2\operatorname{Tr}\{[\delta X]^3\}.$$
Since the recessive cone $K$ of $G^+$ is the nonnegative ray, the evident relation $d_2 \le 0$ says that $A$ is concave with respect to $K$. Besides this, if $X \pm dX$ is positive semidefinite, then $-I \le \delta X \le I$, whence $\operatorname{Tr}\{[\delta X]^3\} \le \operatorname{Tr}\{[\delta X]^2\}$ (look at what happens in the eigenbasis of $\delta X$), so that $d_3 \le -2d_2$. Thus, $A$ indeed is $\frac{2}{3}$-appropriate for $G^+$.
Exercise 10.5.5: the feasible set in question is given by the following list of constraints:
$$a_j^TXa_j + 2x^Ta_j + r \le 0, \quad j = 1, \dots, m$$
(the corresponding 1-self-concordant barriers are $-\ln(-a_j^TXa_j - 2x^Ta_j - r)$);
$$-\ln\operatorname{Det} X \le t$$
(the corresponding $(n + 1)$-self-concordant barrier is $-\ln(t + \ln\operatorname{Det} X) - \ln\operatorname{Det} X$, Exercise 10.5.4);
and, finally,
$$\operatorname{cl}\{(r, x, X) \mid X \in \operatorname{int}\mathbf{S}^n_+,\ 1 + r - x^TX^{-1}x \ge 0\}.$$
The set $H$ defined by the latter constraint is the inverse image of $G^+ = \mathbf{R}_+$ under the nonlinear mapping
$$(r, x, X) \mapsto 1 + r - x^TX^{-1}x: \operatorname{int} G^- \to \mathbf{R},$$
$G^- = \{(r, x, X) \mid X \in \mathbf{S}^n_+\}$. Proposition 9.3.1 says that the function
$$-\ln(1 + r - x^TX^{-1}x) - \ln\operatorname{Det} X$$
is an $(n + 1)$-self-concordant barrier for $H$.
To get a self-concordant barrier for $G$, it remains to take the sum of the indicated barriers.
Exercise 10.5.6: since $Y$ is positive definite, any direction $w'$ of the type $(u, 0)$ is such that $(w')^TRw' > 0$. Now, $E(r, x, X)$ is an ellipsoid, not a point or the empty set, and therefore there is a vector $v$ such that
$$v^TXv + 2x^Tv + r < 0;$$
setting $w = (v, 1)$, we get $w^TSw < 0$.
Exercise 10.5.7: (b) is immediate: we know that $w^TSw < 0$ for some $w$, so that $L_w$ clearly is bounded. A nontrivial task is to prove (a). Thus, let us fix $v$ and $v'$ and prove that $L_v$ and $L_{v'}$ have a point in common.
$1^0$. Consider the quadratic forms
$$S[p, q] = (pv + qv')^TS(pv + qv'), \qquad R[p, q] = (pv + qv')^TR(pv + qv')$$
on $\mathbf{R}^2$, and let
$$S = \begin{pmatrix} a & d \\ d & b \end{pmatrix}, \qquad R = \begin{pmatrix} \alpha & \delta \\ \delta & \beta \end{pmatrix}$$
be the matrices of these forms. What we should prove is that there exists a nonnegative $\lambda$ such that
$$\alpha \le \lambda a, \qquad \beta \le \lambda b. \qquad (13.29)$$
The following four cases are possible:
Case A: $a > 0$, $b > 0$. In this case (13.29) is valid for all large enough positive $\lambda$.
Case B: $a \le 0$, $b \le 0$. Since $a = v^TSv$ and $\alpha = v^TRv$, in the case of $a \le 0$ we also have $\alpha \le 0$ (this is given by (Impl)). Similarly, $b \le 0 \Rightarrow \beta \le 0$. Thus, in the case in question $\alpha \le 0$, $\beta \le 0$, and (13.29) is satisfied by $\lambda = 0$.
Case C: $a \le 0$, $b > 0$; Case D: $a > 0$, $b \le 0$. These are the only nontrivial cases which we should consider; due to the symmetry, we may restrict ourselves to case C only. Thus, from now on $a \le 0$, $b > 0$.
$2^0$. Assume (case C.1) that $a < 0$. Then the determinant $ab - d^2$ of the matrix $S$ is negative, so that in appropriate coordinates $p', q'$ on the plane the matrix $S'$ of the quadratic form $S[\cdot]$ becomes $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. Let $R' = \begin{pmatrix} \xi & \zeta \\ \zeta & -\eta \end{pmatrix}$ be the matrix of the form $R[\cdot]$ in the coordinates $p', q'$. (Impl) says to us that for any 2-dimensional vector $z = (p', q')^T$ we have
$$z^TS'z \equiv (p')^2 - (q')^2 \le 0 \Rightarrow z^TR'z = \xi(p')^2 + 2\zeta p'q' - \eta(q')^2 \le 0. \qquad (13.30)$$
The premise in this implication is satisfied by $z = (0, 1)^T$, $z = (1, 1)^T$ and $z = (1, -1)^T$, and its conclusion says to us that $\eta \ge 0$, $\xi - \eta \pm 2\zeta \le 0$, whence
$$\eta \ge 0; \qquad \eta - \xi \ge 2|\zeta|. \qquad (13.31)$$

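$2^0$.1. Assume, first, that the quantity
$$\lambda = \frac{\eta + \xi}{2}$$
is nonnegative. Then the matrix
$$\lambda S' - R' = \begin{pmatrix} \frac{\eta - \xi}{2} & -\zeta \\ -\zeta & \frac{\eta - \xi}{2} \end{pmatrix}$$
is positive semidefinite (see (13.31)), and, consequently, the matrix
$$\lambda S - R$$
is positive semidefinite, so that $\lambda$ satisfies (13.29).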


$2^0$.2. Now assume that $\eta + \xi < 0$. In the case in question $-\xi = |\xi| > \eta \ge 0$ (the latter inequality is given by (13.31)). Let $\rho_\varepsilon = \sqrt{(\eta + \varepsilon)|\xi|^{-1}}$, where $\varepsilon > 0$ is so small that $0 \le \rho_\varepsilon \le 1$. The premise in (13.30) is satisfied by $z = (\rho_\varepsilon, \pm1)^T$, so that from the conclusion of the implication it follows that
$$-|\xi|\rho_\varepsilon^2 - \eta \pm 2\zeta\rho_\varepsilon \le 0,$$
or
$$|\zeta| \le \frac{(2\eta + \varepsilon)\sqrt{|\xi|}}{2\sqrt{\eta + \varepsilon}}$$
for all small enough positive $\varepsilon$. Passing to the limit as $\varepsilon \to 0$, we come to $|\zeta| \le \sqrt{\eta|\xi|}$. Thus, in the case in question $R' = \begin{pmatrix} -|\xi| & \zeta \\ \zeta & -|\eta| \end{pmatrix}$ is a $2 \times 2$ matrix with nonpositive diagonal entries and nonnegative determinant; consequently, this matrix is negative semidefinite, so that $R$ also is negative semidefinite, and (13.29) is satisfied by $\lambda = 0$.
$3^0$. It remains to consider the case C.2 when $b > 0$, $a = 0$. Here we have $a = v^TSv = 0$, so that $\alpha = v^TRv \le 0$ by (Impl). Since $b > 0$, (13.29) is satisfied for all large enough positive $\lambda$.
Solutions to Section 11.4
Exercise 11.4.8: we should prove that
(i) if $A_m(t, X; \tau, U)$ is positive semidefinite, then $S_m(X) \le t$;
(ii) if $S_m(X) \le t$, then there exist $\tau$ and $U$ such that $A_m(t, X; \tau, U)$ is positive semidefinite.
Let us start with (i). Due to the construction of $A_m(\cdot)$, both matrices $\tau I + U - X$ and $U$ are positive semidefinite; in particular, $X \le \tau I + U$, whence, due to monotonicity of $S_m(\cdot)$, $S_m(X) \le S_m(\tau I + U)$. The latter quantity clearly is $m\tau + S_m(U) \le m\tau + \operatorname{Tr} U$ (recall that $U$ is positive semidefinite). Thus, $S_m(X) \le m\tau + \operatorname{Tr} U$, while $t \ge m\tau + \operatorname{Tr} U$, again due to the construction of $A_m(\cdot)$. Thus, $S_m(X) \le t$, as required.
To prove (ii), let us denote by $\lambda_1 \ge \dots \ge \lambda_k$ the eigenvalues of $X$, and let $U$ have the same eigenvectors as $X$ and the eigenvalues
$$\lambda_1 - \lambda_m, \lambda_2 - \lambda_m, \dots, \lambda_{m-1} - \lambda_m, 0, \dots, 0.$$
Set also $\tau = \lambda_m$. Then $U$ is positive semidefinite, while $\tau I + U - X$ is the matrix with the eigenvalues
$$0, 0, \dots, 0, \lambda_m - \lambda_{m+1}, \dots, \lambda_m - \lambda_k,$$
so that it also is positive semidefinite. At the same time $m\tau + \operatorname{Tr} U = S_m(X) \le t$, so that $A_m(t, X; \tau, U)$ is positive semidefinite.
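The construction in (ii) is explicit and can be checked numerically. The NumPy sketch below builds $\tau$ and $U$ from the eigendecomposition of $X$ and verifies the three properties used above, plus the bound of (i) for another feasible pair. The test matrix and $m$ are our own choices, and `S_m` and `certificate` are hypothetical helper names.

```python
import numpy as np

def S_m(X, m):
    # sum of the m largest eigenvalues of a symmetric matrix X
    return np.sort(np.linalg.eigvalsh(X))[::-1][:m].sum()

def certificate(X, m):
    lam, Q = np.linalg.eigh(X)               # ascending eigenvalues
    lam, Q = lam[::-1], Q[:, ::-1]           # now lambda_1 >= ... >= lambda_k
    tau = lam[m - 1]                         # tau = lambda_m
    u = np.concatenate([lam[:m - 1] - tau, np.zeros(lam.size - m + 1)])
    U = Q @ np.diag(u) @ Q.T                 # same eigenvectors as X
    return tau, U

rng = np.random.default_rng(8)
k, m = 6, 3
B = rng.standard_normal((k, k))
X = (B + B.T) / 2
tau, U = certificate(X, m)
I = np.eye(k)

print(np.linalg.eigvalsh(U).min() >= -1e-10)                # U >= 0
print(np.linalg.eigvalsh(tau * I + U - X).min() >= -1e-10)  # tau I + U - X >= 0
print(np.isclose(m * tau + np.trace(U), S_m(X, m)))         # m tau + Tr U = S_m(X)

C = rng.standard_normal((k, 2))
tau2, U2 = tau + 0.3, U + C @ C.T            # another feasible pair, as in (i)
print(S_m(X, m) <= m * tau2 + np.trace(U2))
```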
Exercise 11.4.9: let $\lambda_i$, $i = 1, \dots, 2k$, be the eigenvalues of $Y(X)$, and $\sigma_1, \dots, \sigma_k$ be the singular values of $X$. It is immediately seen that $Y^2(X) = \begin{pmatrix} XX^T & 0 \\ 0 & X^TX \end{pmatrix}$. We know that the sequence of eigenvalues of $XX^T$ is the same as the sequence of eigenvalues of $X^TX$, and the latter sequence is the same as the sequence of squared singular values of $X$, by definition of the singular values. Since $Y^2(X)$ is block diagonal with diagonal blocks $XX^T$ and $X^TX$, and both blocks have the same sequences of eigenvalues, to get the sequence of eigenvalues of $Y^2(X)$, you should double the multiplicity of each eigenvalue of $X^TX$. Thus, the sequence of eigenvalues of $Y^2(X)$ is
$$\text{(I)} \qquad \sigma_1^2, \sigma_1^2, \sigma_2^2, \sigma_2^2, \dots, \sigma_k^2, \sigma_k^2.$$
On the other hand, the sequence of eigenvalues of $Y^2(X)$ is comprised of the (possibly reordered) squared eigenvalues of $Y(X)$. Thus, the sequence
$$\lambda_1^2, \lambda_2^2, \dots, \lambda_{2k}^2$$
differs from (I) only by order. To derive from this intermediate conclusion the statement in question, it suffices to prove that if a certain $\lambda \ne 0$ is an eigenvalue of $Y(X)$ of multiplicity $s$, then $-\lambda$ also is an eigenvalue of the same multiplicity $s$. But this is simple. Let $L$ be the eigenspace of $Y(X)$ associated with the eigenvalue $\lambda$. In other words, $L$ is comprised of all vectors $\begin{pmatrix} u \\ v \end{pmatrix}$, $u, v \in \mathbf{R}^k$, for which
$$\begin{pmatrix} Xv \\ X^Tu \end{pmatrix} = \lambda\begin{pmatrix} u \\ v \end{pmatrix}. \qquad (13.32)$$
Now consider the space
$$L^- = \left\{\begin{pmatrix} Xv \\ -X^Tu \end{pmatrix} \,\middle|\, \begin{pmatrix} u \\ v \end{pmatrix} \in L\right\}.$$
It is immediately seen that $L^-$ is comprised of eigenvectors of $Y(X)$ with the eigenvalue $-\lambda$:
$$\begin{pmatrix} 0 & X \\ X^T & 0 \end{pmatrix}\begin{pmatrix} Xv \\ -X^Tu \end{pmatrix} = \begin{pmatrix} -XX^Tu \\ X^TXv \end{pmatrix} = -\lambda\begin{pmatrix} Xv \\ -X^Tu \end{pmatrix}.$$
If we could prove that the mapping $\begin{pmatrix} u \\ v \end{pmatrix} \mapsto \begin{pmatrix} Xv \\ -X^Tu \end{pmatrix}$, restricted to $L$, has trivial kernel, we could conclude that $\dim L^- \ge \dim L$, so that the multiplicity of the eigenvalue $-\lambda$ is at least that of the eigenvalue $\lambda$; by swapping $\lambda$ and $-\lambda$, we would conclude that the multiplicities of both eigenvalues are equal, as required. Thus, it remains to verify that if $\begin{pmatrix} u \\ v \end{pmatrix} \in L$ and $Xv = 0$, $X^Tu = 0$, then $u$ and $v$ are both zeros. But this is an immediate consequence of (13.32) and the assumption that $\lambda \ne 0$.
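A quick numerical illustration, with $Y(X) = \begin{pmatrix} 0 & X \\ X^T & 0 \end{pmatrix}$ as in the computation above: the spectrum of $Y(X)$ should consist of $\pm\sigma_1, \dots, \pm\sigma_k$. The random test matrix in this NumPy sketch is ours; we make it singular on purpose, so that the zero eigenvalue (excluded from the multiplicity argument by $\lambda \ne 0$) also appears.

```python
import numpy as np

rng = np.random.default_rng(9)
k = 4
X = rng.standard_normal((k, k))
X[:, -1] = X[:, 0]                          # repeated column: X is singular
Z = np.zeros((k, k))
Y = np.block([[Z, X], [X.T, Z]])            # Y(X) = [[0, X], [X^T, 0]]

eig_Y = np.sort(np.linalg.eigvalsh(Y))
sv = np.linalg.svd(X, compute_uv=False)     # singular values of X
expected = np.sort(np.concatenate([sv, -sv]))

Y2 = Y @ Y                                  # should be Diag(X X^T, X^T X)
print(np.allclose(eig_Y, expected))
print(np.allclose(Y2[:k, :k], X @ X.T), np.allclose(Y2[k:, k:], X.T @ X), np.allclose(Y2[:k, k:], 0))
```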
