MIT OpenCourseWare
http://ocw.mit.edu
6.854J / 18.415J Advanced Algorithms
Fall 2008
��
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
18.415/6.854 Advanced Algorithms October 3, 2001
Lecture 7
Lecturer: Michel X. Goemans Scribe: Nick Hanssens and Nick Matsakis
1 Properties of Barrier Problem
Last lecture, we used the strict convexity of the logarithmic barrier function to show that B P ( p )
has at most one solution. Now we will prove that a solution exists. We assume throughout and
without loss of generality that the rank of A = m where A is rn x n.
T h e o r e m 1 Assume that 3 2 > 0 : Ax = b (2.e. B P ( p ) is feasible), and 3 S > 0, 3 y : ATjj +S = C.
Then B P ( p ) is finite and has a unique solution.
P r o o f of T h e o r e m 1:
Take any x > 0 such that Ax = b. We have:
= (s^T+jjT~)~-pCln~i
j
= STx+eT~x
-pClnxj
j
= jjTb +C ( i j x j - p ln xj), (this sum cannot be arbitrarily negative)
j
implying that the objective function is lower bounded by a constant (this follows from the fact that
+
for p > 0, S j x p l n x tends to +oo as x goes to 0 or to + m ) . Therefore the infimum of B P ( p ) is
finite for every p > 0. To show that the infimum is attained (that there exists an optimum solution),
it is sufficient to notice that the argument above also leads to upper and lower bounds on x j in order
to have a value below the one for 2 , which means that we can restrict our attention to a compact
set; this implies that the infimum is attained. Finally, we have shown last time that if an optimum
solution exists then it is unique.
For any p > 0, the unique solution to B P ( p ) is called the p-center.
2 Karush, Kuhn, and Tucker (KKT) Conditions
Remember the optimality conditions from last lecture. The solution x is optimum for B P ( p ) if 3 y
such that
where
By setting s to be pXV1e, these conditions can be re-written as 3 y, s such that
2.1 Definition of Algorithm
To find the p-center, we need to solve (1)-(3). However, the constraints (3) are quadratic, making
this hard to solve1. Instead, in order to find (or approximate) the p-center, we use an iterative
method based on Netwon's method. We assume we have a solution that satisfies (1) and (2), but
not necessarily (3). We will then linearize equations (3) around our values of x and s, and solve the
corresponding linear system. This gives us new values for x and s and we proceed. We will show
that if we start "close enough" from the p-center then after this update step we will be even closer,
and this iterative process will converge to the p-center of B P ( p ) .
2.2 Update Derivation
Replacing x, y and s with
and ignoring Ax . As in (3), we arrive at
AAx = 0,
A ~ A Y + A S= 0,
x j . sj + +
Axj - sj x j . Asj = /L.
We claim this system has the unique solution,
where
lif such a system was easy to solve then this would give a simple algorithm for linear programming by setting p to
0 and replacing the strict inequalities by inequalities.
Indeed, (6) implies that Axj +xjsjlAsj = psj' - x j , or in vector notation,
Premultiplying by A, using (4) and the fact that Arr: = b, we get
Observe that this is not a square system of equations (but we have m equations in n unknowns).
Substituting As by -ATAy (because of (5)), we get
But AXS-'AT is an m x m matrix of rank m since A has rank m and X and S-' are diagonal
matrices with positive diagonal elements. Thus AXS-'AT is invertible and we derive (7). (5) then
immediately implies (8), and (10) implies (9).
+ +
At each step, then, replace x and s with the values x Ax and s As (y can always be derived
from x and s). We will show that this iteration will converge to the p-center of BP(p).
3 Definitions and Properties
3.1 Proximity Measure
1 11
Let o(x, s, p) = lv be the proximity measure where vj = - 1. Note that this will be zero at
the p-center. We will show that llvll decreases with each iteration.
3.2 ds and dx
As (x,S,p) + (x + Ax, s + As, p), our proximity vector v becomes w where:
-
p + Axj - Asj - 1
lu
- Axj - Asj which we are hoping will be small.
P
For the analysis, it will be useful to rescale the x-space and the s-space so that the the current
iterates x and s are equal, but in a way that x j s j remains constant. For this, we will rescale
component j of any vector in the x-space by and component j of any vector in the s space
by f i .Rescaling Ax and As, we express wj as wj = dxj . dsj where dxj = (3. c)
and
3.3 Properties
Property 1 A x l A s .
Proof of Property 1: This is true because Ax is in the nullspace of A while As is in the
columnspace of AT. Indeed, premultiplying (5) by AxT and using (4), we get that AxTAs = 0.
+ +
Observe that although x Ax and s As do not necessarily satisfy (and will not) that (xj +
+
Axj) (sj Asj) = p, on average they do since the duality gap (x + +
AX)^ (s As) = Cj(xjs j +
Axjsj + xjAsj + AxjAsj) = n p by the above property and (6).
Property 2 d x l d s .
Proof of Property 2:
dsj and Property 1.
dxTds = Cjdxj - ds3. -
- P
'CjAxj . Asj = 0, using the definition of dxj,
Property 3
Proof of Property 3:
4 Theorems 2 and 3
Theorem 2 If a(x, s, p) < 1, then x + Ax > 0 + As > 0.
and s
Theorem 3 If a(x, s, p) < a < 1, then o(x + AX,s + As, p) < -.
These two theorems guarantee that, provided a(x, s, p) is sufficiently small, repeated updates
+ +
x Ax, s As given in the algorithm will converge to the p-center. Theorem 2 was not proved in
this lecture. The proof for theorem 3 is provided below. Theorem 3 shows that the convergence is
quadratic (provided we are close enough).
+ +
Proof of Theorem 3: We have that o2(x Ax, s As, p) = 1 lw1I2 = CjW: = Cj dx: . ds:.
< +
Using the fact that 4 a . b (a b)2 and Property 2,
Using property 3,
Taking the square root, we get
Now, consider these two terms. The first, x. vz, is equal to a2 by definition. The second,
? 3
maxj p / (xj . sj), is at most 1/(1 - a ) by the following argument:
since 1 - a is strictly positive under the conditions of the theorem. Using these upper bounds in
( l l ) , we can conclude the proof by stating,
a2
a(x + Ax, s + As, p) 5 2 - (1-0)'
Corollary 4 I f a < $, then -< a < $.
This corollary gives us a necessary initial bound on a to guarantee convergence.
5 Theorem 5
Theorems 2 and 3 show that the updates Ax, As given in the algorithm will converge to the p-center.
However, instead of making the Newton updates to converge to the p-center for a fixed value of p,
we take one step to get closer to the p-center ( a becomes now a2/(2(1 - a))) and then decrease p
(since our goal is let p. tend to 0) in such a way that our new iterate is within the original value of
a but with respect to the updated p. Theorem 5 shows how the proximity measure changes as we
change p.
Theorem 5 Suppose xTs = n p and a = a(x, s,p), then o(x, s, p(1- 8)) = &do2 + O2 .n.
~ As) = np,
Observe that our new iterate $+Ax and s + As satisfy the condition that ( x + A x ) (sf
and hence Theorem 5 can be applied to see how the proximity measure changes when we modify p
after a Newton iterate.
Proof of Theorem 5: Let v' be the vector v after having changed p, i.e. v(i =
have that
a - 1. We
Thus u' = &v + m6 e .
Our assumption that zTs = np can be translated into uTe = 0 since uTe = C3. u3. = C j (y- 1) =
- -
zTs n. Therefore, we get that
P
Taking square roots, we get the desired result.