doi:10.1093/imanum/drn058
Numerically stable LDL^T factorizations in interior point methods for convex quadratic programming
D. GOLDFARB
Industrial Engineering and Operations Research Department,
Columbia University, New York, NY 10027, USA
AND
K. SCHEINBERG
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
[Received on 3 April 2007; revised on 6 September 2008]
Dedicated to Prof. M. J. D. Powell on the occasion of his 70th birthday.
Fletcher & Powell (1974, Math. Comput., 28, 1067-1087) proposed a numerically stable method for updating the LDL^T factorization of a symmetric positive-definite matrix when a symmetric low-rank term is added to it. In Goldfarb & Scheinberg (2004, Math. Program., 99, 1-34) we proposed a product-form version of the method of Fletcher and Powell for use in interior point methods for linear programming and studied its numerical stability. In this paper we extend these results to convex quadratic programming, where the Hessian matrix of the objective function is, or can be approximated by, a diagonal matrix plus a matrix of low rank. Our new results are based on showing that the elements of the unit lower triangular matrix in the LDL^T factorizations that arise in this context are uniformly bounded as the duality gap is driven to zero. Practicable versions of our approach are described for structured quadratic programs that arise in support vector machines and portfolio optimization.

Keywords: product-form LDL^T factorization; Cholesky factorization; convex quadratic programming; interior point methods; normal equations; support vector machines; portfolio optimization.
1. Introduction
In this paper we are concerned with the application of interior point methods (IPMs) to convex quadratic programming (QP) problems of the form

    minimize   (1/2) x^T Q x + c^T x
    subject to Ax = b,                    (1.1)
               x >= 0,

where the symmetric positive-semidefinite matrix Q ∈ R^{n×n}, the rank-m matrix A ∈ R^{m×n}, and the vectors c ∈ R^n and b ∈ R^m are the data for the problem, and x ∈ R^n is the vector of variables. We are particularly interested here in problems in which the matrix Q has the special structure Q = D_Q + VV^T, where D_Q ∈ R^{n×n} is a non-negative diagonal matrix and V ∈ R^{n×k}, with k ≪ n.
Email: katyas@us.ibm.com
© The author 2008. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
[Downloaded from http://imajna.oxfordjournals.org/ at Azarbaijan Tarbiat Moallem University on 8 November 2011]
The Karush-Kuhn-Tucker (KKT) necessary and sufficient optimality conditions for problem (1.1) are

    A^T y − Qx + s = c,
    Ax = b,
    Xs = μe,
    x >= 0,  s >= 0,                      (1.2)

where μ = 0. Here y ∈ R^m is the vector of dual variables corresponding to the equality constraints and s ∈ R^n is the vector of dual slack variables. Note that if x is primal feasible and y and s are dual feasible then nμ is the duality gap. By X and S we denote the diagonal matrices whose diagonal entries are the elements of the vectors x and s, respectively, and we use I to denote the identity matrix and e to denote a vector of all ones.
Primal-dual IPMs generate a sequence {x(μ), y(μ), s(μ)} of iterates, with x(μ) > 0 and s(μ) > 0, that approximate the unique solutions of the perturbed KKT optimality conditions (1.2) for a decreasing sequence of parameter values μ_1 > μ_2 > ⋯ > 0. In a typical primal-dual IPM each iteration involves computing a Newton step (Δx, Δy, Δs) for the system of nonlinear equations (1.2) for a particular value of μ, i.e., solving

    [ −Q  A^T  I ] [ Δx ]   [ c − (A^T y − Qx + s) ]
    [  A   0   0 ] [ Δy ] = [ b − Ax               ]      (1.3)
    [  S   0   X ] [ Δs ]   [ μe − Xs              ]

In feasible IPMs we have b − Ax = 0 and c − (A^T y − Qx + s) = 0.
By eliminating Δs from the above system, one obtains a smaller linear system, usually referred to as the augmented system, whose matrix is

    T ≡ [ −F  A^T ]
        [  A   0  ],                      (1.4)

where F = Q + D^2 and D^2 = X^{−1}S. In the normal equations approach for solving this system one reduces it further by eliminating the Δx variables (forming the Schur complement (SC) of −F in the matrix (1.4)) to yield a system of equations with a symmetric positive-definite matrix

    M ≡ AF^{−1}A^T = A(Q + D^2)^{−1}A^T,  (1.5)

which can be solved by forming its LDL^T¹ or Cholesky factorization. In purely primal and purely dual IPMs one has to solve similar systems. In these cases the diagonal matrix D^2 equals μX^{−2} and μ^{−1}S^2, respectively.
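To make the reductions above concrete, the following NumPy sketch (illustrative only; random data stand in for an actual IPM iterate) solves one reduced Newton-type system both through the augmented system (1.4) and through the normal equations (1.5), and checks that the two routes agree:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
V = rng.standard_normal((n, 2))
Q = V @ V.T                    # positive-semidefinite Hessian
x = rng.uniform(1.0, 2.0, n)   # strictly positive primal iterate
s = rng.uniform(1.0, 2.0, n)   # strictly positive dual slacks
D2 = np.diag(s / x)            # D^2 = X^{-1} S
F = Q + D2

r1 = rng.standard_normal(n)    # right-hand sides of the reduced system
r2 = rng.standard_normal(m)

# Route 1: the augmented system with matrix T = [-F, A^T; A, 0].
T = np.block([[-F, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(T, np.concatenate([-r1, r2]))
dx_aug, dy_aug = sol[:n], sol[n:]

# Route 2: the normal equations with M = A F^{-1} A^T.
M = A @ np.linalg.solve(F, A.T)
dy = np.linalg.solve(M, r2 - A @ np.linalg.solve(F, r1))
dx = np.linalg.solve(F, A.T @ dy + r1)

assert np.allclose(dx, dx_aug) and np.allclose(dy, dy_aug)
```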
In the case of linear programming (LP) the normal equations matrix (1.5) reduces to AD^{−2}A^T since Q = 0. In Goldfarb & Scheinberg (2004) we proposed a product-form factorization (PFF) method to handle dense columns in the constraint matrix A in an LP problem that gives rise to a normal equations
[Footnote 1: LDL^T is the name of a particular symmetric matrix factorization. The diagonal matrix D in it is not the same as the diagonal matrix D in (1.5).]
matrix consisting of a dense low-rank part and a (potentially) sparse part. In Goldfarb & Scheinberg (2004) we showed that this method is comparable in efficiency to, and is more stable in practice than, the Sherman-Morrison-Woodbury (SMW) approach for handling the dense low-rank part. We also proved that, unlike the SMW approach, the PFF approach is numerically stable by using the analysis of Fletcher & Powell (1974) and a new property of the LDL^T factorization that we also proved in Goldfarb & Scheinberg (2004): the unit lower triangular matrix in this factorization of the normal equations matrix AD^2A^T remains uniformly bounded (in norm) for any fixed matrix A and all positive-definite diagonal matrices D. This means that the norm of L remains uniformly bounded as the IPM iterates converge to a solution (in an arbitrary manner) even though D converges to a singular matrix and AD^2A^T may also converge to a singular matrix.
In the case of second-order cone programming (SOCP), which includes convex QP as a special case, the normal equations matrix has the form AF̄^{−1}A^T, where F̄^{−1} is the sum of a diagonal matrix and a block diagonal matrix, each of whose blocks is the difference of two rank-one outer products (Alizadeh & Goldfarb, 2003). In Goldfarb & Scheinberg (2005) we proposed applying the PFF approach to this matrix. This required extending the PFF method to handle subtraction as well as addition of positive-semidefinite low-rank terms. The PFF for SOCP was incorporated into the software package SeDuMi (J. F. Sturm, personal communication) and its efficiency and stability have been verified in practice. To provide a theoretical proof of this numerical stability, again using an analysis based on the work of Fletcher & Powell (1974), we proved that, as in the LP case, the norm of the lower triangular matrix in the LDL^T factorization of the normal equations matrix AF̄^{−1}A^T is uniformly bounded as the iterates of an IPM converge to a solution. However, in contrast with the LP case, this result is only valid under the assumption that the iterates remain in a fixed neighbourhood of the central path.
In this paper we extend to the convex QP case results similar to those that we obtained for the LP case, which are stronger than their SOCP counterparts. First, we show that the elements of L in the LDL^T factorizations of Q + D^2 and the normal equations matrix A(Q + D^2)^{−1}A^T remain uniformly bounded as the duality gap is driven to zero, even though the elements of D become unbounded during the iterations of an IPM. Second, we develop PFFs for these two matrices that are practicable when Q has low rank or equals a diagonal matrix plus a matrix of low rank (i.e., Q = VV^T or Q = D_Q + VV^T, where V ∈ R^{n×k} and k ≪ n) and, for the second factorization, when m, the number of rows of A, is similar in size to n and a well-conditioned basis matrix B can easily be chosen from the columns of A. Then, because of the boundedness results just mentioned, it follows that these factorizations are numerically stable.
In Fine & Scheinberg (2001) the PFF approach considered here for factorizing Q + D^2 was applied to convex QP problems that arise in machine learning, namely in support vector machines (SVMs) (Cristianini & Shawe-Taylor, 2000; Schölkopf & Smola, 2002). In these QP problems the Hessian Q is, or can be well approximated by, a large-scale dense low-rank matrix and A contains only a few rows. It was shown in Fine & Scheinberg (2001) that, for certain SVM applications, the PFF approach provides a significant improvement in the computational workload of an IPM; however, the stability of the PFF approach used was not analysed, nor was an extension of it to large-scale problems with many constraints considered. In this paper we provide this analysis. We note that even lower workloads can be achieved by using the SMW approach or separable equivalent approaches (see Section 3.4) and that, in spite of the fact that these approaches can be numerically unstable, excellent results using them have been reported in Ferris & Munson (2003) for solving large SVM problems and in Vanderbei & Carpenter (1993) and Gondzio & Grothey (2007) for general convex QP problems. We also note that the PFF approach proposed in this paper can be incorporated into the block-structured framework developed by Gondzio & Grothey (2007) and used in their object-oriented parallel solver OOPS.
The paper is organized as follows. In Section 2 we show that the unit lower triangular matrix in the LDL^T factorizations of Q + D^2 and the normal equations matrix A(Q + D^2)^{−1}A^T remains uniformly bounded for any fixed matrix A and positive-semidefinite matrix Q and all positive-definite diagonal matrices D. Note that we do not need any assumptions on D and, consequently, on the iterates of the IPM. Hence, in terms of this property, QP is closer to LP than to SOCP.

We then turn our attention in Section 3 to efficient and numerically stable methods for implementing the linear algebra at each IPM iteration, i.e., for solving the systems of equations (D^2 + Q)u = w and A(D^2 + Q)^{−1}A^T u = w when Q can be expressed as a low-rank outer product Q = VV^T or Q = D_Q + VV^T. We develop product-form LDL^T factorizations of D^2 + Q and A(D^2 + Q)^{−1}A^T, effectively extending the product-form LDL^T factorization approach in Goldfarb & Scheinberg (2004) to such structured QP problems. This PFF approach and the analysis of its numerical stability are based heavily on the earlier work of Fletcher & Powell (1974). We also discuss the use of the SMW update formula, approaches that transform the objective function into separable form, and product-form techniques for factorizing the symmetric indefinite matrix (1.4) of the augmented system.

Finally, in Sections 4 and 5 we describe two areas where IPMs for QP using our PFF approach can be applied: (i) SVMs for solving classification problems and (ii) portfolio optimization.
2. Uniform boundedness of the LDL^T factor L

The results in this section apply to arbitrary positive-semidefinite matrices Q and hence to IPMs applied to general convex QP problems. These results are based on the following lemma and theorem, which are proved in Goldfarb & Scheinberg (2004).

LEMMA 2.1 If L is a lower triangular matrix with ones on the diagonal and subdiagonal elements that depend on a parameter μ and are uniformly bounded as μ → 0, then the elements of L^{−1} are also uniformly bounded as μ → 0.

THEOREM 2.2 Let Ā be a fixed m × n matrix of rank m, D̄ be a positive diagonal matrix whose diagonal elements depend on μ and LΛL^T be the LDL^T factorization of ĀD̄^2Ā^T. Then the elements of L are uniformly bounded as μ → 0.
We now extend Theorem 2.2 to the LDL^T factorizations of F ≡ Q + D^2 and M ≡ AF^{−1}A^T, where Q is a symmetric positive-semidefinite matrix and D is a positive diagonal matrix whose elements depend on a parameter. This property is crucial for proving in Section 3 below that the PFFs of F and M are numerically stable.

THEOREM 2.3 Let A be a fixed m × n matrix of rank m, Q be a fixed n × n symmetric positive-semidefinite matrix and D be a positive diagonal matrix whose diagonal elements depend on μ. Let L_F Λ_F L_F^T and LΛL^T be the LDL^T factorizations of F ≡ Q + D^2 and M ≡ AF^{−1}A^T, respectively. Then the elements of L_F and L are uniformly bounded as μ → 0.
Proof. If the rank of Q is k then Q can be expressed as VV^T, where V ∈ R^{n×k} and k ≤ n. Letting Ā ≡ [I, V] and D̄^2 ≡ [D^2, 0; 0, I], the uniform boundedness of the elements of L_F follows immediately from Theorem 2.2 since

    F ≡ Q + D^2 = VV^T + D^2 = [I, V] [D^2, 0; 0, I] [I, V]^T = L_F Λ_F L_F^T.   (2.1)
To prove that the elements of L are uniformly bounded, let E ∈ R^{(n−m)×n} be any matrix such that the matrix K^T ≡ [A^T, E^T] is nonsingular, and let M̄ ≡ KF^{−1}K^T. For example, if A = [B, N], where B is a nonsingular m × m matrix, then we can let E = [0, I], which gives

    K ≡ [ A ]   [ B  N ]
        [ E ] = [ 0  I ].                 (2.2)
Also let

    P ∈ R^{n×n}                           (2.3)

be the reverse-order permutation matrix (ones on the antidiagonal, zeros elsewhere) and L̃Λ̃L̃^T be the LDL^T factorization of PM̄^{−1}P. Then

    M̄ = PL̃^{−T}Λ̃^{−1}L̃^{−1}P = (PL̃^{−T}P)(PΛ̃^{−1}P)(PL̃^{−1}P) = L̄Λ̄L̄^T   (2.4)

is the LDL^T factorization of M̄, where L̄ ≡ PL̃^{−T}P and Λ̄ ≡ PΛ̃^{−1}P. Since M = [I, 0]M̄[I, 0]^T, the LDL^T factorization of M is M = LΛL^T, where

    L̄ = [ L, 0; L̄_{21}, L̄_{22} ]   and   Λ̄ = [ Λ, 0; 0, Λ̄_{22} ].   (2.5)
Furthermore, if we let H ≡ PK^{−T} then it follows from (2.1) that

    PM̄^{−1}P = H(VV^T + D^2)H^T = [H, HV] [D^2, 0; 0, I] [H, HV]^T.

Now, after defining Ā ≡ [H, HV] and D̄^2 ≡ [D^2, 0; 0, I], we can apply Theorem 2.2 to the LDL^T factorization L̃Λ̃L̃^T of PM̄^{−1}P = ĀD̄^2Ā^T and conclude that the elements of L̃ are uniformly bounded for all D. Then, by Lemma 2.1, the elements of L̄ = PL̃^{−T}P are uniformly bounded for all D, and from (2.5) so are those of L.
Note that, since the dependence of the diagonal elements of D on μ in Theorems 2.2 and 2.3 is totally arbitrary, the uniform boundedness of L holds for all D in the open set of positive-definite diagonal matrices.
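The theorem can also be observed numerically. In the sketch below (illustrative; the helper `unit_ldl_factor` and the data are ours, not the paper's), some diagonal entries of D^2 are driven toward zero and others toward infinity, yet the entries of the unit lower triangular factor of M = A(Q + D^2)^{−1}A^T stay modest. The parameter is kept at only moderately extreme values here, since the plain dense solve used to form M would itself lose accuracy for far more ill-conditioned F:

```python
import numpy as np

def unit_ldl_factor(M):
    """Unit lower triangular L with M = L diag(d^2) L^T, via Cholesky."""
    C = np.linalg.cholesky(M)
    return C / np.diag(C)  # scale each column by its diagonal pivot

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))
V = rng.standard_normal((n, 2))
Q = V @ V.T  # fixed positive-semidefinite Q

max_entries = []
for mu in [1.0, 1e-3, 1e-6]:
    # half of the entries of D^2 shrink to zero, half blow up
    d2 = np.concatenate([np.full(n // 2, mu), np.full(n - n // 2, 1 / mu)])
    M = A @ np.linalg.solve(Q + np.diag(d2), A.T)
    M = (M + M.T) / 2  # symmetrize against roundoff
    max_entries.append(np.abs(unit_ldl_factor(M)).max())

# The largest entry of L stays bounded as mu varies over six orders of magnitude.
assert all(np.isfinite(e) and e < 1e3 for e in max_entries)
```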
3. Low-rank updates

Solving the normal equations or the augmented system (depending on the approach taken) is the most computationally intensive part of each interior point iteration. There are two potentially expensive steps if the normal equations are solved: one is the factorization of the matrix F ≡ Q + D^2 and the other is the formation and factorization of M ≡ AF^{−1}A^T. If m, the number of linear constraints, is small
then performing the latter two operations is relatively inexpensive. Even if m is large, if A, F and F^{−1} are sparse then solving the normal equations involves the factorization of two sparse matrices and can be done inexpensively. If Q is dense then both matrices F and M are dense, and computing their LDL^T factorizations takes O(n^3) and O(m^3) operations, respectively. Here, and in the rest of the paper, by operations we mean multiplications and/or divisions. In most cases the number of additions and/or subtractions required by a procedure is approximately the same as the number of multiplications and/or divisions required. It also takes O(mn^2) operations to form M. However, Q often has low rank, i.e., it can be represented as Q = VV^T, where V ∈ R^{n×k} has full column rank and k ≪ n, or it can be approximated by such a low-rank outer product. Note that if Q is the sum of a low-rank outer product and a diagonal matrix then the special structure of Q + D^2 is essentially the same as when Q = VV^T, since the diagonal component of Q can be incorporated into D^2. In this section we consider several methods for exploiting this special structure of Q to decrease the computational cost of each interior point iteration.
We first consider two approaches, one based on the SMW updating formula and the other based on the PFF method proposed in Goldfarb & Scheinberg (2004), for efficiently solving the systems (D^2 + VV^T)u = w and A(D^2 + VV^T)^{−1}A^T u = w when A is sparse or only has a few dense columns and m ≫ k. The workload and storage requirements of the SMW approach are smaller than those of the PFF approach applied to these systems, but the SMW approach can experience numerical instability. We then consider a third approach based on transforming the quadratic objective function into separable form by introducing auxiliary variables. Finally, we briefly discuss how to take advantage of the structure of Q in applying a symmetric indefinite factorization method to the matrix T in (1.4).
3.1 SMW updates

One can avoid computing and storing the LDL^T factorization of the n × n dense matrix D^2 + VV^T when solving the system (D^2 + VV^T)u = w by applying the SMW formula

    (D^2 + VV^T)^{−1} = D^{−2} − D^{−2}V(I + V^T D^{−2} V)^{−1} V^T D^{−2}   (3.1)

as follows: compute
    S:        S = I + V^T D^{−2} V,
    L_v, Δ_v: L_v Δ_v L_v^T = S,
    z:        z = D^{−2} w,
    t̄:        L_v t̄ = V^T z,              (3.2)
    t:        L_v^T t = Δ_v^{−1} t̄,
    u:        u = z − D^{−2} V t.
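In NumPy, procedure (3.2) can be sketched as follows (an illustrative sketch, not the paper's implementation; for brevity the LDL^T factorization of S and the two triangular solves with L_v are folded into one dense solve):

```python
import numpy as np

def smw_solve(d2, V, w):
    """Solve (D^2 + V V^T) u = w via the SMW steps (3.2).
    d2 holds the diagonal of D^2."""
    k = V.shape[1]
    z = w / d2                                # z = D^{-2} w
    S = np.eye(k) + V.T @ (V / d2[:, None])   # S = I + V^T D^{-2} V
    t = np.linalg.solve(S, V.T @ z)           # t = S^{-1} V^T z
    return z - (V / d2[:, None]) @ t          # u = z - D^{-2} V t

rng = np.random.default_rng(2)
n, k = 8, 3
d2 = rng.uniform(0.5, 2.0, n)
V = rng.standard_normal((n, k))
w = rng.standard_normal(n)
u = smw_solve(d2, V, w)
assert np.allclose((np.diag(d2) + V @ V.T) @ u, w)
```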
Forming the k × k matrix S = I + V^T D^{−2} V takes (1/2)k^2 n + O(kn) operations, while computing its LDL^T factorization L_v Δ_v L_v^T takes (1/6)k^3 operations. Since all other computations require at most O(kn) operations, if k is small with respect to n then the overall work required to solve (D^2 + VV^T)u = w is (1/6)k^3 + (1/2)nk^2 + O(kn). Next, to solve the system of normal equations

    A(D^2 + VV^T)^{−1}A^T u = w,          (3.3)
we first note that, using (3.1) above, we have that

    A(D^2 + VV^T)^{−1}A^T = AD^{−2}A^T − AD^{−2}V(I + V^T D^{−2} V)^{−1} V^T D^{−2}A^T   (3.4)
                          = AD^{−2}A^T − Â S^{−1} Â^T,

where Â ≡ AD^{−2}V and S is defined in (3.2). Hence, forming the above m × m matrix M ≡ A(D^2 + VV^T)^{−1}A^T by using the LDL^T factorization L_v Δ_v L_v^T of S, computing the LDL^T factorization of M and solving the normal equations system (3.3) requires (1/6)m^3 + (1/2)(k^2 m + m^2 k) + O(m^2) operations, assuming that the sparsity in A enables AD^{−2}A^T to be computed using only O(m^2) operations.
A better approach when A is sparse and k ≪ m is to apply the SMW formula again, this time to (3.4). Specifically, letting the LDL^T factorization of AD^{−2}A^T be LΛL^T and defining V̂ ≡ L^{−1}Â = L^{−1}AD^{−2}V and Ŝ ≡ S − V̂^T Λ^{−1} V̂, we obtain after some algebraic manipulation

    [A(D^2 + VV^T)^{−1}A^T]^{−1} = L^{−T}Λ^{−1}L^{−1} + L^{−T}Λ^{−1} V̂ Ŝ^{−1} V̂^T Λ^{−1} L^{−1}.
Therefore we can solve the system (3.3) by the following procedure:

    L, Λ:   LΛL^T = AD^{−2}A^T,
    V̂:      LV̂ = AD^{−2}V,
    Ŝ:      Ŝ = S − V̂^T Λ^{−1} V̂,
    L̂, Δ̂:   L̂Δ̂L̂^T = Ŝ,                    (3.5)
    z:      Lz = w,
    t̄:      L̂t̄ = V̂^T Λ^{−1} z,
    t:      L̂^T t = Δ̂^{−1} t̄,
    u:      L^T u = Λ^{−1}(z + V̂t).

Assuming that L has only O(m) nonzero subdiagonal elements and that m^2 > n, the total operation count for (3.5) is O(m^2) + k^2 m + O(km) + (1/6)k^3.
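Procedure (3.5) admits a similar sketch (illustrative only; the LDL^T factorizations are derived here from dense Cholesky factors with Λ kept as a vector, which is adequate for a small dense check but not the sparse implementation the text has in mind):

```python
import numpy as np

def unit_ldl(M):
    """Return (L, lam) with M = L diag(lam) L^T and unit diagonal L."""
    C = np.linalg.cholesky(M)
    d = np.diag(C)
    return C / d, d ** 2

def normal_eq_solve(A, d2, V, w):
    """Solve A (D^2 + V V^T)^{-1} A^T u = w by the steps of (3.5)."""
    ADinv = A / d2[None, :]                       # A D^{-2}
    L, lam = unit_ldl(ADinv @ A.T)                # L Lam L^T = A D^{-2} A^T
    Vh = np.linalg.solve(L, ADinv @ V)            # L Vhat = A D^{-2} V
    S = np.eye(V.shape[1]) + V.T @ (V / d2[:, None])
    Sh = S - Vh.T @ (Vh / lam[:, None])           # Shat = S - Vhat^T Lam^{-1} Vhat
    z = np.linalg.solve(L, w)                     # L z = w
    t = np.linalg.solve(Sh, Vh.T @ (z / lam))     # Shat t = Vhat^T Lam^{-1} z
    return np.linalg.solve(L.T, (z + Vh @ t) / lam)

rng = np.random.default_rng(3)
m, n, k = 4, 9, 2
A = rng.standard_normal((m, n))
d2 = rng.uniform(0.5, 2.0, n)
V = rng.standard_normal((n, k))
w = rng.standard_normal(m)
u = normal_eq_solve(A, d2, V, w)
M = A @ np.linalg.solve(np.diag(d2) + V @ V.T, A.T)
assert np.allclose(M @ u, w)
```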
The SMW update (3.2) has been used in an IPM for solving the QP problems that arise in linear SVM problems (Ferris & Munson, 2003). It has also been widely used in IPMs for LP (Marxen, 1989; Choi et al., 1990). In the latter context the method often runs into numerical difficulties. While modifications to the SMW update have been proposed to deal with its numerical instability, these modifications do not always help and they increase the computational cost (Andersen, 1996; Scheinberg & Wright, 2000). Intuition suggests that numerical problems occur when D^2 is very ill-conditioned. The following example, which is similar to one considered in Fine & Scheinberg (2001), illustrates this.
EXAMPLE 3.1 Let

    D^2 = [ ε^2, 0, 0; 0, ε^2, 0; 0, 0, 1 ],   V = [ 1, 1; −1, 1; 1, 1 ]   and   A^T = [ 1, 0; 0, 1; 1, 1 ],

where ε^2 is smaller than machine precision, i.e., 1 + ε^2 is computed as 1. The systems (D^2 + VV^T)u = w and A(D^2 + VV^T)^{−1}A^T u = w are well conditioned and solving them directly yields
u = ((3/2)w_1 − w_3, (1/2)w_2, w_3 − w_1)^T and u = (2w_1, (2/3)w_2)^T, which are accurate up to machine precision. Applying the SMW procedure (3.2) to solve (D^2 + VV^T)u = w yields
    S = [ 2ε^{−2}, 1; 1, 2ε^{−2} ],   L_v = [ 1, 0; ε^2/2, 1 ],   Δ_v = [ 2ε^{−2}, 0; 0, 2ε^{−2} ],   z = ( w_1/ε^2, w_2/ε^2, w_3 )^T,

    t̄ = ( (w_1 − w_2)/ε^2, (w_1 + w_2)/ε^2 )^T,   t = ( (w_1 − w_2)/2, (w_1 + w_2)/2 )^T   and   u = ( 0, 0, w_3 − w_1 )^T

(quantities are shown as computed in floating point, to leading order in ε^2).
The first two components of the SMW solution are totally inaccurate due to the extreme loss of information in computing the matrix S = I + V^T D^{−2} V. The size of the error depends on the size of ε^2: e.g., if ε^2 ≈ 10^{−10} (so 1 + ε^2 is not computed as 1) then some of the intermediate computed quantities and the first two components of u will have only five correct digits.
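This failure mode is easy to reproduce. The sketch below (illustrative; the data are chosen in the same spirit as Example 3.1, with ε^2 below double-precision unit roundoff) solves (D^2 + VV^T)u = w both directly and by the SMW steps (3.2):

```python
import numpy as np

eps2 = 1e-17                        # epsilon^2, below machine precision
d2 = np.array([eps2, eps2, 1.0])    # diagonal of D^2
V = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 2.0, 3.0])

# Direct solve of the well-conditioned system (D^2 + V V^T) u = w.
u_direct = np.linalg.solve(np.diag(d2) + V @ V.T, w)

# SMW steps (3.2): the O(1) terms in S are lost against the 1/eps2 terms.
z = w / d2
S = np.eye(2) + V.T @ (V / d2[:, None])
t = np.linalg.solve(S, V.T @ z)
u_smw = z - (V / d2[:, None]) @ t

# For this data the exact limiting solution is (3/2 w1 - w3, w2/2, w3 - w1).
assert np.allclose(u_direct, [-1.5, 1.0, 2.0])
# The SMW route destroys all accuracy in the first two components.
assert np.linalg.norm(u_smw - u_direct) > 0.5
```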
Applying procedure (3.5) yields

    AD^{−2}A^T = [ ε^{−2}, 1; 1, ε^{−2} ],   L = [ 1, 0; ε^2, 1 ],   Λ = [ ε^{−2}, 0; 0, ε^{−2} ],

    V̂ = [ ε^{−2}, ε^{−2}; −ε^{−2}, ε^{−2} ]   and   Ŝ = [ 0, 1; 1, 0 ].

The matrix Ŝ is indefinite and does not have an LDL^T factorization. Hence procedure (3.5) breaks down.

In the next two subsections we present alternative methods that avoid many of the numerical difficulties of the above SMW approaches.
3.2 Product-form LDL^T factorization of D^2 + VV^T

An efficient alternative to using the SMW approach (3.2) for solving the system (D^2 + VV^T)u = w is based on computing the product-form LDL^T factorization of D^2 + VV^T. This factorization has the form

    L L̃^1 L̃^2 ⋯ L̃^k Λ̃^k (L̃^k)^T ⋯ (L̃^2)^T (L̃^1)^T L^T,   (3.6)

where L and each L̃^i are unit lower triangular matrices and Λ̃^k is a non-negative diagonal matrix. Moreover, each L̃^i has a special structure.
To see how to compute (3.6), let us assume that we have the LDL^T factorization of a matrix F̄: F̄ = LΛL^T. For simplicity, we shall ignore any symmetric permutations that may have been applied to F̄. Then

    F̄ + vv^T = LΛL^T + vv^T = L(Λ + pp^T)L^T = L L̃ Λ̃ L̃^T L^T,

where p is the solution of Lp = v and L̃Λ̃L̃^T is the LDL^T factorization of Λ + pp^T. As shown in Fletcher & Powell (1974), Gill et al. (1975) and Goldfarb & Scheinberg (2004, 2005), L̃ has the
special form

    L̃ = [ 1
          p_2 β_1,  1
          p_3 β_1,  p_3 β_2,  1
          ⋮          ⋮          ⋱
          p_{n−1} β_1,  p_{n−1} β_2,  p_{n−1} β_3,  ⋯,  1
          p_n β_1,  p_n β_2,  p_n β_3,  ⋯,  p_n β_{n−1},  1 ],   (3.7)

where β ∈ R^n and Λ̃ = diag{λ̃_1, …, λ̃_n}, and the elements of these matrices can be computed from the following recurrence relations:

    t_0 := 1;
    for j = 1, 2, …, n,
        t_j := t_{j−1} + p_j^2 / λ_j,     (3.8)
        λ̃_j := λ_j t_j / t_{j−1},
        β_j := p_j / (λ_j t_j).
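The recurrences (3.8) are easy to verify numerically. The sketch below (illustrative) computes β and Λ̃ for a random pair (Λ, p), assembles L̃ in the special form (3.7) and confirms that L̃Λ̃L̃^T = Λ + pp^T:

```python
import numpy as np

def rank_one_ldl_update(lam, p):
    """Recurrences (3.8): given diag(lam) + p p^T, return beta and the
    updated diagonal lam_t, where L-tilde[i, j] = p[i] * beta[j] for i > j."""
    n = len(lam)
    beta, lam_t = np.empty(n), np.empty(n)
    t = 1.0
    for j in range(n):
        t_new = t + p[j] ** 2 / lam[j]
        lam_t[j] = lam[j] * t_new / t
        beta[j] = p[j] / (lam[j] * t_new)
        t = t_new
    return beta, lam_t

rng = np.random.default_rng(4)
n = 6
lam = rng.uniform(0.5, 2.0, n)
p = rng.standard_normal(n)
beta, lam_t = rank_one_ldl_update(lam, p)

Lt = np.eye(n) + np.tril(np.outer(p, beta), k=-1)  # special form (3.7)
assert np.allclose(Lt @ np.diag(lam_t) @ Lt.T, np.diag(lam) + np.outer(p, p))
```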
Now, assuming that we have an LDL^T factorization LΛL^T of F̄, we can obtain a PFF of F̄ + VV^T = F̄ + v^1(v^1)^T + v^2(v^2)^T + ⋯ + v^k(v^k)^T by repeating the above procedure k times, once for each rank-one term v^i(v^i)^T. It is important to note that there is no need to form and store the k matrices L̃^1, …, L̃^k. Rather, only the pairs of vectors (p^1, β^1), …, (p^k, β^k) that are needed to define these matrices are stored.
For our first system (D^2 + VV^T)u = w the PFF method is directly applicable. Since D^2 is diagonal, L = I and Λ = D^2 are its LDL^T factors. Because of the special structure of each L̃^i, which enables systems involving L̃^i to be solved in only O(n) operations, the work required to compute the PFF (3.6) and from it u is k^2 n + 8kn + o(kn), where o(kn) denotes terms that are of lower order than kn. Thus for k much smaller than n this computation is very inexpensive (i.e., O(n)) even though it is roughly twice as expensive as the SMW computation (3.2). Alternative PFF methods are presented in Goldfarb & Scheinberg (2005) for updating the LDL^T factorization of a matrix when a rank-k matrix of the form VSV^T is added to it, where V ∈ R^{n×k} is of full column rank and S is a k × k nonsingular diagonal matrix. In the present context S = I and any of these methods could be used in place of the one given above.
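Putting the pieces together, the following sketch (illustrative, not the paper's code) computes the PFF (3.6) of D^2 + VV^T, storing only the (p^i, β^i) pairs, and solves (D^2 + VV^T)u = w using the O(n) forward and backward solves that the special form (3.7) permits:

```python
import numpy as np

def lt_solve(p, beta, b):
    """Solve L-tilde x = b in O(n), with L-tilde as in (3.7)."""
    x, s = np.empty_like(b), 0.0
    for i in range(len(b)):
        x[i] = b[i] - p[i] * s
        s += beta[i] * x[i]
    return x

def ltT_solve(p, beta, b):
    """Solve L-tilde^T x = b in O(n)."""
    x, s = np.empty_like(b), 0.0
    for i in reversed(range(len(b))):
        x[i] = b[i] - beta[i] * s
        s += p[i] * x[i]
    return x

def pff_solve(d2, V, w):
    """Solve (D^2 + V V^T) u = w via the product-form factorization (3.6);
    only the (p^i, beta^i) pairs and the running diagonal are stored."""
    lam = d2.astype(float).copy()     # L = I and Lambda = D^2 to start
    pairs = []
    for v in V.T:                     # one rank-one update per column of V
        p = v.astype(float).copy()
        for pj, bj in pairs:          # p solves L~^1 ... L~^{i-1} p = v^i
            p = lt_solve(pj, bj, p)
        beta = np.empty_like(p)
        t = 1.0
        for j in range(len(p)):       # recurrences (3.8)
            lam_old = lam[j]
            t_new = t + p[j] ** 2 / lam_old
            lam[j] = lam_old * t_new / t
            beta[j] = p[j] / (lam_old * t_new)
            t = t_new
        pairs.append((p, beta))
    x = w.astype(float).copy()
    for pj, bj in pairs:              # forward solves
        x = lt_solve(pj, bj, x)
    x /= lam                          # diagonal solve
    for pj, bj in reversed(pairs):    # backward solves
        x = ltT_solve(pj, bj, x)
    return x

rng = np.random.default_rng(5)
n, k = 7, 3
d2 = rng.uniform(0.5, 2.0, n)
V = rng.standard_normal((n, k))
w = rng.standard_normal(n)
u = pff_solve(d2, V, w)
assert np.allclose((np.diag(d2) + V @ V.T) @ u, w)
```

Note that computing the k vectors p^i requires k(k − 1)/2 of the special triangular solves in total, in line with the operation counts given in Section 3.3 below.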
3.3 Product-form LDL^T factorization of A(D^2 + VV^T)^{−1}A^T

To apply the PFF method to the matrix M = AF^{−1}A^T, where F = D^2 + VV^T, we can write this matrix as the sum of a sparse matrix and a low-rank matrix via the SMW updating formula, as we have shown in Subsection 3.1. However, this would defeat the purpose of using the PFF because of the numerical instability of the SMW update.

We now show how the PFF procedure can be used to compute a factorization of M in a numerically stable manner in roughly O(n^2) + k^2 n operations, assuming that A is suitably sparse and well conditioned. The computational cost depends on n rather than m, even though M is m × m. Therefore,
if m ≪ n or A is dense then it is preferable to directly factorize the normal equations matrix M. Our method embeds M in the n × n matrix

    M̄ ≡ KF^{−1}K^T = [ AF^{−1}A^T, AF^{−1}E^T; EF^{−1}A^T, EF^{−1}E^T ],

where K^T = [A^T, E^T] is the nonsingular matrix introduced in Section 2. Specifically, our method applies the PFF procedure to

    M̃ ≡ PM̄^{−1}P = PK^{−T}FK^{−1}P,      (3.9)
where P is the reverse-order permutation matrix (2.3), and makes use of the fact that the LDL^T factorizations of M̄ and M̃ are related, i.e., if M̃ = L̃Λ̃L̃^T then

    M̄ = PM̃^{−1}P = PL̃^{−T}Λ̃^{−1}L̃^{−1}P = (PL̃^{−T}P)(PΛ̃^{−1}P)(PL̃^{−1}P) = L̄Λ̄L̄^T,

where L̄ ≡ PL̃^{−T}P and Λ̄ ≡ PΛ̃^{−1}P.
We first find the sparse LDL^T factorization L^0 Λ^0 (L^0)^T of KD^{−2}K^T. If there are any dense columns of A then they are handled as they are in the LP case (see Goldfarb & Scheinberg, 2004). In such a case K is partitioned as [K_s, K_d], where K_s (respectively, K_d) is composed of the columns of K corresponding to the sparse (respectively, dense) columns of A, and the PFF of KD^{−2}K^T is computed. This yields an L^0 that is the product of a sparse lower triangular matrix with specially structured lower triangular matrices of the form (3.7). Then, using the fact that PK^{−T}D^2K^{−1}P = L̃^0 Λ̃^0 (L̃^0)^T, where L̃^0 = P(L^0)^{−T}P and Λ̃^0 = P(Λ^0)^{−1}P, we compute the PFF of M̃ = PK^{−T}(D^2 + VV^T)K^{−1}P = L̃^0 Λ̃^0 (L̃^0)^T + ṼṼ^T, where Ṽ = PK^{−T}V, i.e.,

    M̃ = L̃^0 L̃^1 L̃^2 ⋯ L̃^k Λ̃^k (L̃^k)^T ⋯ (L̃^2)^T (L̃^1)^T (L̃^0)^T,   (3.10)
using the procedure described in Section 3.2. Note that at the start of this procedure we compute the LDL^T factorization L^0 Λ^0 (L^0)^T of KD^{−2}K^T and not that of PK^{−T}D^2K^{−1}P, since K^{−1} is likely to be less sparse than K. It then follows from the relationship between the LDL^T factorizations of M̄ and M̃ that
    AF^{−1}A^T = L^0_{11} L̄^1_{11} L̄^2_{11} ⋯ L̄^k_{11} Λ̄^k_{11} (L̄^k_{11})^T ⋯ (L̄^2_{11})^T (L̄^1_{11})^T (L^0_{11})^T   (3.11)
               = L^0_{11} P_1 (L̃^1_{22})^{−T} (L̃^2_{22})^{−T} ⋯ (L̃^k_{22})^{−T} (Λ̃^k_{22})^{−1} (L̃^k_{22})^{−1} ⋯ (L̃^2_{22})^{−1} (L̃^1_{22})^{−1} P_1 (L^0_{11})^T,

where, for j = 1, …, k, we have

    L̄^j ≡ [ L̄^j_{11}, 0; L̄^j_{21}, L̄^j_{22} ]
        = [ P_1 (L̃^j_{22})^{−T} P_1, 0; −P_2 (L̃^j_{11})^{−T} (L̃^j_{21})^T (L̃^j_{22})^{−T} P_1, P_2 (L̃^j_{11})^{−T} P_2 ]
        = P (L̃^j)^{−T} P^T,

    Λ̄^j ≡ [ Λ̄^j_{11}, 0; 0, Λ̄^j_{22} ] = [ P_1 (Λ̃^j_{22})^{−1} P_1, 0; 0, P_2 (Λ̃^j_{11})^{−1} P_2 ] = P (Λ̃^j)^{−1} P^T,

    L̃^j = [ L̃^j_{11}, 0; L̃^j_{21}, L̃^j_{22} ]   and   P = [ 0, P_1; P_2, 0 ].
Note that L̄^j_{11} and L̃^j_{22}, Λ̄^j_{11} and Λ̃^j_{22}, and P_1 are m × m unit lower triangular, diagonal and reverse permutation matrices, respectively, and that L̄^j_{22} and L̃^j_{11}, Λ̄^j_{22} and Λ̃^j_{11}, and P_2 are (n − m) × (n − m) unit lower triangular, diagonal and reverse permutation matrices, respectively. Also, while we need the full matrices L̃^1, …, L̃^k, as in Section 3.2 there is no need to form and store them. Rather, because of their special structure, indicated in (3.7), only the pair of vectors p^j and β^j that is needed to define each L̃^j is stored.
To compute the factorization (3.11), assuming that K is given by (2.2), we need first to compute the LU factorization of the matrix B and to compute K^{−T}V. To ensure that our entire procedure is numerically stable, we have to choose the matrix B in (2.2) so that it is well conditioned and compute its LU factorization in a numerically stable manner (i.e., with pivoting). This, of course, is possible only if A is well conditioned. Assuming that A is suitably sparse (i.e., A has O(n) elements and a matrix B can be chosen that has an LU factorization whose factors have O(m) elements), this can be done in O(m^2) + O(kn) operations and needs to be done only once, on the first interior point iteration.
For those applications, such as portfolio optimization problems with trading constraints, where the above factorization is practicable, it is easy to choose such a matrix B and form its LU factorization. See the discussion and computational analysis for such problems near the end of Section 5.
At every iteration one needs to do the following.

(i) Form KD^{−2}K^T and compute its LDL^T factorization L^0 Λ^0 (L^0)^T. This requires O(n^2) operations, assuming that the sparsity in K results in L^0 having only O(n) nonzero elements.

(ii) Compute (L̃^0)^{−1}Ṽ = P(L^0)^T (K^{−T}V). This requires O(kn) operations.

(iii) Solve k(k − 1)/2 special triangular systems of the form L̃^j p = q. This requires k(k − 1)n operations.

(iv) Compute the elements of the k sets of vectors λ̃^j, p^j and β^j from which the special matrices Λ̃^j and L̃^j, j = 1, …, k, are composed, using the recurrence formulas (3.8). This requires 5kn operations.

Thus the total number of operations needed at each iteration to compute the factorization (3.11) is O(n^2) + k^2 n plus lower-order terms.
To solve the normal equations system, i.e., a system of the form AF^{−1}A^T u = w, requires an additional O(km) operations. One also needs to solve a system of the form (D^2 + VV^T)u = w to obtain the Newton step Δx. Note that this system can be solved in O(kn) + O(m^2) operations without computing the PFF of F = D^2 + VV^T since, from (3.9) and (3.10), F = K^T P M̃ P K, i.e.,

    u = F^{−1}w = K^{−1} L^0 P (L̃^1)^{−T} ⋯ (L̃^k)^{−T} (Λ̃^k)^{−1} (L̃^k)^{−1} ⋯ (L̃^1)^{−1} P (L^0)^T K^{−T} w.
Choices for the matrix K other than the one defined by (2.2) are possible. For example, one can set K^T = [A^T, Q_2], where Q_2 is obtained from the QR factorization A^T = QR = [Q_1, Q_2][R_1; 0]. Although this choice is numerically stable, it is generally much more expensive to implement than (2.2).

One is most likely to use the above updates when n and m are similar in size, so the loss in computational efficiency caused by factorizing M̃ to obtain M̄ instead of factorizing M = A(Q + D^2)^{−1}A^T is not significant. A concrete example is presented at the end of Section 5 to illustrate the computational savings that can be obtained by using the PFF approach presented in this section. When m ≪ n or A is a dense matrix it is preferable to form and factorize M directly. However, it is still worthwhile to compute the PFF of F if V has a relatively small number of columns. Using the PFF of F, the approximate cost of forming M and directly factorizing it is (1/6)m^3 + (1/2)m^2 n + 2kmn in addition to the k^2 n + 4nk approximate cost of forming the PFF of F.
LDL^T FACTORIZATIONS IN IPMS FOR CONVEX QP 817
The PFF method is numerically more stable than the SMW method in practice as well as in theory. To illustrate this, it is easy to check that the PFF procedures presented in this subsection and in Subsection 3.2 produce correct results up to machine precision when applied to Example 3.1 in Subsection 3.1. In Fine & Scheinberg (2001) some examples of convex QP problems arising in SVMs are presented on which an implementation of an IPM using the SMW approach (3.2) fails numerically, whereas an implementation of the same algorithm using the PFF works well. The PFF approach has been implemented in interior point codes for handling dense columns in the constraint matrix A in LP problems and for handling the special structure that arises in the matrix F in (1.4) in SOCP problems, and in both cases it has worked well in practice. In Goldfarb & Scheinberg (2004) an analysis of the stability of the PFF is given for the LP case. The core of that analysis is the proof of the uniform boundedness of the factors L and \bar{L}. We have shown that L is uniformly bounded for the QP case in Theorem 2.3. The uniform boundedness of \bar{L}, and hence the numerical stability of the PFF procedure proposed in this subsection (assuming that the matrix B in the definition of K is chosen to be well conditioned), follows from the following theorem.
THEOREM 3.2 Let v be a fixed vector and let M = A(Q + D^2)^{-1} A^T and A(Q + D^2 + v v^T)^{-1} A^T have the factorizations L \Lambda L^T and \tilde{L} \tilde{\Lambda} \tilde{L}^T = L \bar{L} \tilde{\Lambda} \bar{L}^T L^T, respectively. Then the elements of \bar{L} are uniformly bounded as \mu \to 0.
Proof. By Theorem 2.3, the elements of \tilde{L} and L are uniformly bounded as \mu \to 0. However, we can represent \tilde{L} as a product of factors L\bar{L}. By Lemma 2.1, the elements of L^{-1} are also uniformly bounded. Since the factorization \tilde{L}\tilde{\Lambda}\tilde{L}^T is unique, \bar{L} = L^{-1}\tilde{L} and hence it follows that the elements of \bar{L} are uniformly bounded. □

\bar{T} = \begin{pmatrix} D^2 & V & A^T \\ V^T & -I & 0 \\ A & 0 & 0 \end{pmatrix}, (3.12)

where the first to the third column blocks of \bar{T} correspond to a Newton step (\Delta x, \Delta z, \Delta y) in the x, z and y variables, respectively.
There are several ways to solve this system. If one first eliminates the \Delta z variables, or equivalently forms the SC of -I in \bar{T}, then one obtains the augmented systems matrix T in (1.4) and one is back to the situation faced before the auxiliary z variables were introduced. If instead one forms the SC of D^2 in \bar{T} then one obtains the block factorization \bar{T} = C, where

C = \begin{pmatrix} I & 0 & 0 \\ V^T D^{-2} & L_v & 0 \\ A D^{-2} & \bar{A} L_v^{-T} \Delta_v^{-1} & \bar{L} \end{pmatrix}
\begin{pmatrix} D^2 & 0 & 0 \\ 0 & -\Delta_v & 0 \\ 0 & 0 & -\bar{\Delta} \end{pmatrix}
\begin{pmatrix} I & D^{-2} V & D^{-2} A^T \\ 0 & L_v^T & \Delta_v^{-1} L_v^{-1} \bar{A}^T \\ 0 & 0 & \bar{L}^T \end{pmatrix},
where \bar{A} = A D^{-2} V, and L_v \Delta_v L_v^T and \bar{L} \bar{\Delta} \bar{L}^T are, respectively, the LDL^T factorizations of S = I + V^T D^{-2} V and

\bar{M} = A(D^2 + V V^T)^{-1} A^T = A D^{-2} A^T - \bar{A} S^{-1} \bar{A}^T.

The above form of the dense m \times m normal equations matrix \bar{M} is identical to the one given in (3.4), and the above approach is clearly equivalent to the first SMW approach described in Section 3.1.
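As a sanity check, the identity \bar{M} = A D^{-2} A^T - \bar{A} S^{-1} \bar{A}^T can be verified numerically; the following numpy sketch (random illustrative data, not from the paper) compares it with forming A(D^2 + V V^T)^{-1} A^T directly.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 8, 2
A = rng.standard_normal((m, n))
V = rng.standard_normal((n, k))
d = rng.uniform(0.5, 2.0, n)        # diagonal of D
D2 = np.diag(d ** 2)                # D^2
Dm2 = np.diag(d ** -2)              # D^{-2}

# Direct formation of the normal equations matrix
M_direct = A @ np.linalg.solve(D2 + V @ V.T, A.T)

# SMW-based formation: M = A D^{-2} A^T - Abar S^{-1} Abar^T
Abar = A @ Dm2 @ V                  # Abar = A D^{-2} V
S = np.eye(k) + V.T @ Dm2 @ V       # S = I + V^T D^{-2} V
M_smw = A @ Dm2 @ A.T - Abar @ np.linalg.solve(S, Abar.T)

assert np.allclose(M_direct, M_smw)
```

With well-scaled d the two routes agree to machine precision; the instability discussed in Section 3.1 only appears when some d_i are pushed towards zero.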
A better approach from a sparsity perspective, assuming that A D^{-2} A^T has a sparse LDL^T factorization, is to symmetrically permute the blocks of C so that A D^{-2} A^T appears in the (1, 1) block position, or equivalently to symmetrically permute the second and third blocks of columns and rows of the matrix \bar{T}, which yields

\begin{pmatrix} I & 0 & 0 \\ A D^{-2} & L & 0 \\ V^T D^{-2} & \bar{V}^T \Delta^{-1} & \tilde{L} \end{pmatrix}
\begin{pmatrix} D^2 & 0 & 0 \\ 0 & -\Delta & 0 \\ 0 & 0 & -\tilde{\Delta} \end{pmatrix}
\begin{pmatrix} I & D^{-2} A^T & D^{-2} V \\ 0 & L^T & \Delta^{-1} \bar{V} \\ 0 & 0 & \tilde{L}^T \end{pmatrix},
where \bar{V} = L^{-1} A D^{-2} V = L^{-1} \bar{A}, and L \Delta L^T and \tilde{L} \tilde{\Delta} \tilde{L}^T are, respectively, the LDL^T factorizations of A D^{-2} A^T and the k \times k dense matrix

\bar{S} = S - \bar{V}^T \Delta^{-1} \bar{V}.

This is the same matrix \bar{S} that arises in procedure (3.5). In fact, the above approach is equivalent to the second SMW approach (3.5) described in Section 3.1.
As already discussed in Section 3.1, the above two approaches can be numerically unstable. The cause is in the formation of C or, equivalently, in the initial elimination of the \Delta x variables. To see this, consider the upper leftmost 2 \times 2 block submatrix of \bar{T}. The LDL^T factorization of this matrix,

\begin{pmatrix} D^2 & V \\ V^T & -I \end{pmatrix} = \begin{pmatrix} I & 0 \\ V^T D^{-2} & I \end{pmatrix} \begin{pmatrix} D^2 & 0 \\ 0 & -(I + V^T D^{-2} V) \end{pmatrix} \begin{pmatrix} I & D^{-2} V \\ 0 & I \end{pmatrix}, (3.14)

can be numerically unstable because V^T D^{-2} blows up when some of the diagonal elements of D become very small. The matrix on the left-hand side of (3.14) is an example of a quasi-definite matrix, which, as has been shown in Vanderbei (1995), has an L D' L^T factorization, where D' is a diagonal matrix with both positive and negative diagonal elements, for any symmetric permutation of its rows
and columns. Finding such an L D' L^T factorization of a symmetrically permuted quasi-definite matrix can in fact be shown to be equivalent to a nested sequence of SMW computations. In spite of its lack of guaranteed numerical stability, this factorization has been successfully used in the interior point software packages LOQO and OOPS to solve convex QP problems, as reported in Vanderbei & Carpenter (1993) and Gondzio & Grothey (2007). In Gill et al. (1996) the numerical stability of computing the Cholesky factorization of a quasi-definite matrix is analysed and conditions indicating when this factorization can be computed with reasonable accuracy are given.
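The 2 \times 2 factorization (3.14), and the growth of its V^T D^{-2} block as a diagonal element of D shrinks, can be checked with a few lines of numpy (illustrative data only; with a much smaller d_i the floating-point reconstruction itself starts losing digits, which is precisely the instability described above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
V = rng.standard_normal((n, k))
d = np.array([1.0, 1.0, 1.0, 1.0, 1e-3])    # one small diagonal element of D
D2, Dm2 = np.diag(d ** 2), np.diag(d ** -2)

# Quasi-definite matrix [[D^2, V], [V^T, -I]] and its block LDL^T as in (3.14)
Tq = np.block([[D2, V], [V.T, -np.eye(k)]])
L = np.block([[np.eye(n), np.zeros((n, k))],
              [V.T @ Dm2, np.eye(k)]])
Dblk = np.block([[D2, np.zeros((n, k))],
                 [np.zeros((k, n)), -(np.eye(k) + V.T @ Dm2 @ V)]])

assert np.allclose(L @ Dblk @ L.T, Tq, atol=1e-7)   # (3.14) holds
print(np.abs(V.T @ Dm2).max())                      # grows like 1/d_i^2
```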
3.5 Augmented system factorizations

In some implementations of IPMs for LP the symmetric indefinite augmented systems matrix T defined by (1.4), where Q = 0 in the LP case, is factorized rather than the positive-definite normal equations matrix (1.5). One way to do this was proposed by Fourer and Mehrotra (FM) (Fourer & Mehrotra, 1993; see also Duff et al., 1991). Their approach applies a version of the Bunch–Parlett (BP) (Bunch & Parlett, 1971) symmetric indefinite factorization to T, i.e., it computes

P T P^T = L \Lambda L^T, (3.15)

where P is a permutation matrix, L is a unit lower triangular matrix and \Lambda is a block diagonal matrix with 1 \times 1 and 2 \times 2 diagonal blocks. Unlike in the positive-definite case, the permutation P cannot be determined prior to the numerical phase of the factorization. The FM implementation of the BP factorization determines P and the block structure of \Lambda during the numerical factorization by choosing a pivot sequence so that the elements of L remain bounded to ensure numerical stability while at the same time controlling the growth of fill-in as much as possible. The permutation determined on the first interior point iteration is reused on subsequent iterations until numerical difficulties are encountered, since this speeds up the algorithm considerably. If such difficulties are detected then the full sparse BP procedure is reapplied, usually resulting in a denser factorization. Computational experience has shown that LP problems can usually be solved by the FM approach using only one or two pivot orders and with 50–100% more time than that required by computing LDL^T factorizations of the normal equations matrices A D^{-2} A^T (see Wright, 1997, p. 221).
The FM procedure can be applied to QP, resulting in sparse and numerically stable factorizations if both Q and A are suitably sparse, but it is not able to directly take advantage of the special structure of Q considered in this paper. However, it is possible to develop a product-form symmetric indefinite factorization. Specifically, given the factorization (3.15), where T is the matrix in (1.4), we can write

P \begin{pmatrix} -(Q + v v^T + D^2) & A^T \\ A & 0 \end{pmatrix} P^T = L(\Lambda - p p^T) L^T = L \bar{P}^T \bar{L} \bar{\Lambda} \bar{L}^T \bar{P} L^T, (3.16)

where p is the solution of L p = P(v^T, 0)^T and \bar{L} \bar{\Lambda} \bar{L}^T is the symmetric indefinite factorization of \bar{P}(\Lambda - p p^T)\bar{P}^T. Let \bar{p} \equiv \bar{P} p and \hat{\Lambda} \equiv \bar{P} \Lambda \bar{P}^T, and let \bar{p} be partitioned into one- and two-element subvectors \bar{p}_j according to the block structure of \bar{\Lambda}, i.e., assuming that \bar{\Lambda} = diag\{\bar{\Lambda}_1, \bar{\Lambda}_2, \ldots, \bar{\Lambda}_k\}, \bar{p} can be written as \bar{p}^T = (\bar{p}_1^T, \bar{p}_2^T, \ldots, \bar{p}_k^T). We note that, as in the FM and BP cases, the permutation \bar{P} and the block structure of \bar{\Lambda} can only be determined during the numerical factorization phase, i.e., during the computation of the recurrence formulas (3.18) below. It is easy to verify that \bar{L} has a special form,
L has a special form,
a
t
A
z
a
r
b
i
j
a
n
T
a
r
b
i
a
t
M
o
a
l
l
e
m
U
n
i
v
e
r
s
i
t
y
o
n
N
o
v
e
m
b
e
r
8
,
2
0
1
1
h
t
t
p
:
/
/
i
m
a
j
n
a
.
o
x
f
o
r
d
j
o
u
r
n
a
l
s
.
o
r
g
/
D
o
w
n
l
o
a
d
e
d
f
r
o
m
820 D. GOLDFARB AND K. SCHEINBERG
namely

\bar{L} = \begin{pmatrix}
I & & & & \\
\bar{p}_2 \gamma_1^T & I & & & \\
\bar{p}_3 \gamma_1^T & \bar{p}_3 \gamma_2^T & I & & \\
\vdots & \vdots & & \ddots & \\
\bar{p}_{k-1} \gamma_1^T & \bar{p}_{k-1} \gamma_2^T & \bar{p}_{k-1} \gamma_3^T & \cdots & I & \\
\bar{p}_k \gamma_1^T & \bar{p}_k \gamma_2^T & \bar{p}_k \gamma_3^T & \cdots & \bar{p}_k \gamma_{k-1}^T & I
\end{pmatrix}, (3.17)
where \gamma^T \equiv (\gamma_1^T, \gamma_2^T, \ldots, \gamma_k^T) and \hat{\Lambda} = diag\{\hat{\Lambda}_1, \hat{\Lambda}_2, \ldots, \hat{\Lambda}_k\} are partitioned to conform with the partitions of \bar{p} and \bar{\Lambda}, and can be computed from the following recurrence relations:

\beta_0 := -1;
for j = 1, 2, \ldots, k,
    \bar{\Lambda}_j := \hat{\Lambda}_j + \beta_{j-1} \bar{p}_j \bar{p}_j^T,
    \gamma_j := \beta_{j-1} \bar{\Lambda}_j^{-1} \bar{p}_j, (3.18)
    \beta_j := \beta_{j-1} - \gamma_j^T \bar{\Lambda}_j \gamma_j.
The above recursion is a specialization of a method first proposed by Bennett (1965). We have used it rather than an analogous version of the recurrence formulas (3.8) first proposed by Fletcher & Powell (1974) and Gill et al. (1975) since we no longer need to guarantee positive semidefiniteness of \bar{\Lambda} in the presence of round-off errors, as required by the PFF approach. Since testing for growth in the elements in the factorization and determining the permutation \bar{P} must be done while computing the factorization, and this is expensive, one can reuse the same permutation \bar{P} on subsequent interior point iterations, as in the FM procedure, until numerical difficulties are encountered.
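A minimal dense numpy sketch of Bennett-style rank-one updating of an LDL^T factorization illustrates the idea behind recursion (3.18); for simplicity it uses scalar (1 \times 1) pivots, no pivoting and a positive update, so no 2 \times 2 blocks arise. All names are illustrative, not the paper's.

```python
import numpy as np

def ldl(A):
    """Plain LDL^T of a symmetric matrix with nonzero leading minors."""
    n = A.shape[0]
    L, d = np.eye(n), np.zeros(n)
    S = A.astype(float).copy()
    for j in range(n):
        d[j] = S[j, j]
        L[j + 1:, j] = S[j + 1:, j] / d[j]
        # Schur-complement update of the trailing block
        S[j + 1:, j + 1:] -= np.outer(L[j + 1:, j], L[j + 1:, j]) * d[j]
    return L, d

def bennett_update(L, d, alpha, z):
    """Overwrite L, d so that L diag(d) L^T gains alpha * z z^T (Bennett, 1965)."""
    n = d.size
    w = z.astype(float).copy()
    a = alpha
    for j in range(n):
        p = w[j]
        dj = d[j] + a * p * p
        b = a * p / dj
        a = a * d[j] / dj
        d[j] = dj
        w[j + 1:] -= p * L[j + 1:, j]       # forward-substitute the update vector
        L[j + 1:, j] += b * w[j + 1:]       # correct the j-th column of L
    return L, d

rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # safely positive definite
z = rng.standard_normal(n)

L, d = ldl(A)
L, d = bennett_update(L, d, 1.0, z)         # positive update: no pivoting issues
assert np.allclose(L @ np.diag(d) @ L.T, A + np.outer(z, z))
```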
4. Support vector machines

An SVM (see the seminal work of Vapnik, 1996; Cristianini & Shawe-Taylor, 2000; Schölkopf & Smola, 2002) is an algorithm for solving the following problem: given a set of data points in a k-dimensional space V = \{v_i \in R^k \mid i = 1, \ldots, n\} and a set of labels \{a_i \in \{1, -1\} \mid i = 1, \ldots, n\} that partition the set V into positive and negative subsets, V^+ \equiv \{v_i \in V \mid a_i = 1\} and V^- \equiv \{v_i \in V \mid a_i = -1\}, find a surface, or classifier, that separates these two sets of points and then use it to classify new unlabelled data points by assigning each new point to V^+ or V^-. When the two sets cannot be fully separated, violations of the separation constraints are allowed through nonnegative slack variables \xi_i, whose sum is penalized with some positive penalty parameter c, which leads to the so-called one-norm soft-margin classification problem:
\min_{\xi, w, y} \; \frac{1}{2} w^T w + c \sum_{i=1}^{n} \xi_i
s.t. \; a_i (w^T v_i + y) \ge 1 - \xi_i, \quad i = 1, \ldots, n, (SVM)
\xi_i \ge 0, \quad i = 1, \ldots, n.
This is a convex QP problem. From QP duality, the optimal solution w^* = \sum_{i=1}^{n} x_i^* a_i v_i, where x \equiv (x_1, \ldots, x_n)^T is an optimal solution to the dual problem

\min_{x, y, \xi} \; \frac{1}{2} x^T D_a V V^T D_a x + c e^T \xi
s.t. \; D_a V V^T D_a x + a y \ge e - \xi, (D_SVM)
\xi \ge 0, \quad 0 \le x \le ce.

Here V \in R^{n \times k} is a matrix whose rows are the data vectors v_i^T, i = 1, \ldots, n, and D_a \in R^{n \times n} is a diagonal matrix with diagonal elements a_1, \ldots, a_n.
The strength of the SVM methodology is that problem (D_SVM) can be easily extended to the nonlinear case. If the data are not linearly separable in the original space then it may be desirable to lift them into a higher-dimensional feature space using some nonlinear mapping \phi(\cdot), where the data become linearly separable. Hence we need to solve the linear SVM with the data vectors \phi(v_i), i = 1, \ldots, n. This can be done by means of a kernel operation, K(\cdot, \cdot), which is a scalar function of two vectors that satisfies K(v_i, v_j) = \phi(v_i)^T \phi(v_j). For a given kernel operation K(\cdot, \cdot) the matrix V V^T in (D_SVM) is replaced by the matrix K whose (i, j)th element is K(v_i, v_j).
The dual to problem (D_SVM) corresponding to the kernel K is the convex QP problem

\min_{x} \; \frac{1}{2} x^T Q x - e^T x
s.t. \; a^T x = 0, (P_SVM)
0 \le x \le ce,

where Q = D_a K D_a and the optimal solution to (P_SVM), x^* \in R^n, is also the optimal solution to (D_SVM).
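For a linear kernel the matrix Q in (P_SVM) inherits the low-rank structure directly, since Q = D_a V V^T D_a = (D_a V)(D_a V)^T has rank at most k. A toy assembly (synthetic data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 3                        # n points in a k-dimensional feature space
V = rng.standard_normal((n, k))     # rows are the data vectors v_i
a = rng.choice([-1.0, 1.0], n)      # labels
Da = np.diag(a)

Q = Da @ (V @ V.T) @ Da             # Q = D_a K D_a with linear kernel K = V V^T

assert np.allclose(Q, Q.T)
assert np.linalg.matrix_rank(Q) <= k           # rank at most k << n
assert np.all(np.linalg.eigvalsh(Q) > -1e-10)  # positive semidefinite
```

With a nonlinear kernel Q is generally full rank, which motivates the low-rank approximations discussed later in this section.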
For large-scale problems, which are common in real-world applications such as speech, document classification, optical character recognition, etc., the training set is typically large and Q is a large dense matrix. Often, however, Q has low rank, which happens when the dimension of the feature space is small, or Q = V V^T and V is a large sparse matrix with a small number of dense columns, which happens when the dimension of the feature space is large but there are a few features that appear in most of the data. In such cases one can significantly improve the performance of an IPM by using the techniques described in Section 3. Since there is only one equality constraint in (P_SVM), the only major work lies in factorizing Q + D^2. Here, due to the upper bound constraints x \le ce,
D^2 = X^{-1} S + (cI - X)^{-1} U, where U is a diagonal matrix whose ith diagonal element is the dual variable corresponding to the upper bound constraint on x_i.
When a nonlinear kernel is used the rank of Q can be as large as the dimension of the feature space, which can be infinite. However, it has been observed (Williamson et al., 1998; Smola & Schölkopf, 2000; Williams & Seeger, 2001) that for many classes of nonlinear SVM problems the eigenvalues of Q rapidly decay to zero. Hence one can approximate Q by a low-rank positive-semidefinite matrix to speed up an IPM, as proposed in Fine & Scheinberg (2001). The bound on the error in the objective function when such an approximation is used is \frac{1}{2} c^2 l_{max}, where l_{max} is the maximum of the number of support vectors in the original and the perturbed problems. Other low-rank approximations of the kernel matrix have been proposed in Smola & Schölkopf (2000) and Williams & Seeger (2001).
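One such low-rank approximation, in the spirit of Fine & Scheinberg (2001), is a partial Cholesky factorization of the kernel matrix with greedy diagonal pivoting; the following compact numpy sketch is illustrative only (names and data are not from the paper):

```python
import numpy as np

def partial_cholesky(K, r):
    """Rank-r approximation K ~= G G^T via Cholesky with greedy diagonal pivoting."""
    n = K.shape[0]
    G = np.zeros((n, r))
    resid = np.diag(K).astype(float).copy()   # diagonal of K - G G^T
    for j in range(r):
        i = int(np.argmax(resid))             # pivot on the largest residual diagonal
        G[:, j] = (K[:, i] - G @ G[i, :]) / np.sqrt(resid[i])
        resid -= G[:, j] ** 2
    return G

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq) + 1e-6 * np.eye(8)            # RBF kernel, kept safely positive definite

G = partial_cholesky(K, 8)                    # full rank reproduces K exactly
assert np.allclose(G @ G.T, K)

G4 = partial_cholesky(K, 4)                   # rank-4 approximation
assert np.trace(K - G4 @ G4.T) >= -1e-10      # residual trace is nonnegative
```

The trace of the residual gives a computable bound on the approximation quality, which is why greedy diagonal pivoting is a natural stopping heuristic.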
5. Portfolio optimization

Portfolio optimization refers to the allocation of capital over a number of available assets in order to maximize the return on the investment while minimizing the risk. Different portfolio optimization problems arise from the way that these two conflicting objectives are handled. Markowitz (1952, 1959) first formulated mathematical models for this class of problems, measuring the return and risk by the expected value and the variance, respectively, of the random portfolio return. Markowitz showed that, given a lower bound on the acceptable return, the optimal (minimum risk) portfolio can be obtained by solving a convex QP problem.

We consider the following factor model of the market that has n traded assets. The vector of asset returns over a single market period is denoted by r \in R^n (i.e., asset i returns (1 + r_i) dollars for every dollar invested in it). It is assumed that r is a vector of random variables given by

r = \mu + V f + \epsilon, (5.1)

where \mu \in R^n is the vector of mean returns, f \in R^k is the vector of returns of the factors that drive the market, with mean zero and a positive-definite covariance matrix G, V^T \in R^{k \times n} is the matrix of factor loadings of the n assets and \epsilon \in R^n is the vector of residual returns, with mean zero and the positive-semidefinite covariance matrix D = diag(d). In addition, we assume that the residual returns \epsilon are independent of the factor returns f.
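Under model (5.1) the covariance of r is V G V^T + D, and replacing V by \bar{V} = V C for any factor G = C C^T leaves it in the diagonal-plus-rank-k form exploited in Section 3. A quick numpy check with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 7, 2
V = rng.standard_normal((n, k))         # factor loadings
C = rng.standard_normal((k, k))
G = C @ C.T                             # positive-definite factor covariance
D = np.diag(rng.uniform(0.1, 1.0, n))   # residual covariance

Sigma = V @ G @ V.T + D                 # Cov(r) for r = mu + V f + eps
Vbar = V @ C                            # any factor of G works: G = C C^T
assert np.allclose(Sigma, Vbar @ Vbar.T + D)
```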
A portfolio is a vector \phi \in R^n, where the ith component \phi_i represents the fraction of the total wealth invested in asset i. If no short selling is allowed then \phi_i \ge 0 for all i. The return r_\phi of the portfolio is given by

r_\phi = r^T \phi = \mu^T \phi + f^T V^T \phi + \epsilon^T \phi.

The Markowitz portfolio selection model chooses the portfolio \phi that minimizes the risk Var[r_\phi] = \phi^T (\bar{V} \bar{V}^T + D) \phi, where \bar{V} \equiv V G^{1/2}, from all portfolios that have an expected return E[r_\phi] = \mu^T \phi of at least \rho, i.e., it solves the convex QP problem

minimize \; \phi^T (\bar{V} \bar{V}^T + D) \phi \quad subject to \; \mu^T \phi \ge \rho, \; e^T \phi = 1, \; \phi \ge 0. (5.2)

A closely related model maximizes the Sharpe ratio (E[r_\phi] - r_f)/\sqrt{Var[r_\phi]}, where r_f is the risk-free rate, subject to the portfolio constraints e^T \phi = 1 and \phi \ge 0. If the expected
return of at least one asset is greater than r_f then, using the fact that \mu^T \phi - r_f = (\mu - r_f e)^T \phi is a homogeneous function of \phi, a solution to the maximum Sharpe ratio problem can be obtained by solving the convex QP problem

minimize \; \bar{\phi}^T (\bar{V} \bar{V}^T + D) \bar{\phi}
subject to \; (\mu - r_f e)^T \bar{\phi} = 1, \; \bar{\phi} \ge 0 (5.3)

and then setting \phi = \bar{\phi}/e^T \bar{\phi}.
If the portfolio is constrained to satisfy the additional inequality constraints A\phi \le b and the equality constraints \bar{A}\phi = \bar{b} then these can be handled by introducing an auxiliary variable \tau to homogenize them and the constraint e^T \phi = 1. Specifically, the constraints

A\phi - b\tau \le 0, \quad \bar{A}\phi - \bar{b}\tau = 0, \quad e^T \phi = \tau, \quad \tau \ge 0 (5.4)

need to be added to (5.3).
In practice, these basic models are augmented with additional constraints depending on the specific application (e.g., see Perold, 1984), including upper bounds on the fraction of wealth \phi_i invested in each asset i and on the total fractions of the portfolio invested in subsets of assets (e.g., telecommunications stocks). One may also have constraints of the form \phi_i \le \phi_j for various asset pairs (i, j) to capture the preference for asset j over asset i. In most applications the optimal portfolio is obtained by modifying an existing portfolio and there are transaction costs that are often different for buying and selling a particular asset. In such situations, if \hat{\phi}_i is the current fraction of asset i in the portfolio then trading constraints of the form

\phi_i - z_i + y_i = \hat{\phi}_i,
0 \le z_i \le \bar{u}_i - \hat{\phi}_i, (5.5)
0 \le y_i \le \hat{\phi}_i
are added to the basic model, where z_i and y_i are, respectively, the amounts of asset i added to and removed from the portfolio (in terms of fractions of the total wealth) and \bar{u}_i is an upper bound on \phi_i. If transaction costs are linear in z_i and y_i (i.e., \alpha_i z_i and \beta_i y_i, respectively) then the constraint e^T \phi = 1 is replaced by the total wealth constraint e^T \phi + \alpha^T z + \beta^T y = 1 to account for the effect of these costs on the value of the portfolio. For any asset i not in the current portfolio \phi_i = z_i and y_i = 0, so it is not necessary to introduce variables z_i and y_i and trading constraints for such assets.

Consider problem (5.2) modified as described above to handle transaction costs and in which the constraint \mu^T \phi \ge \rho is, without loss of generality, treated as an equality constraint. In a typical situation the universe of assets might be the n = 500 stocks in the S&P 500 index, the number of factors used might be 25 and the number of stocks in an existing portfolio might be 400. Hence the number of rows m in the matrix A in Section 3.3 would be 402.

Let us compare the computational approach suggested in Section 3.3 with that of forming and factorizing M. Because the \bar{m} = 400 rows of the matrix A corresponding to the equations in (5.5) have the form [I, 0, -I, I], where the 0 block corresponds to the variables \phi_i for assets that are not in the current portfolio, the m \times m matrix B in (2.2) will have a diagonal \bar{m} \times \bar{m} block equal to the identity
matrix or a diagonal matrix, bordered by a small number of dense rows and columns coming from the expected return and total wealth constraints; its triangular factors inherit this bordered structure. Clearly, for typical values of \alpha and \beta that arise in practice one can choose B so that it is well conditioned.
Furthermore, one can verify that the matrix L^0 in the LDL^T factorization of the \bar{n} \times \bar{n} matrix K D^2 K^T, where \bar{n} = n + 2\bar{m} = 1300, also has a very special structure if the rows and columns of K D^2 K^T are first suitably and symmetrically permuted. Specifically, L^0 is a 3 \times 3 block lower triangular matrix, where L^0_{11} is a (3\bar{m} - 2) \times (3\bar{m} - 2) identity matrix with two subdiagonals starting in the first column in rows \bar{m} + 1 and 2\bar{m} + 1, L^0_{21} = 0 \in R^{(n - \bar{m}) \times (3\bar{m} - 2)} and L^0_{22} = I \in R^{(n - \bar{m}) \times (n - \bar{m})}. The remaining three block submatrices, L^0_{31}, L^0_{32} and L^0_{33}, all have two rows and are dense. Hence L^0 has at most 2\bar{n} + 3\bar{m} - 5 < 3\bar{n} nonzero elements below the diagonal.
Therefore, for this example, at each IPM iteration computing the PFF of A F^{-1} A^T and using it to solve the systems of equations A F^{-1} A^T u = w and F u = w requires the following:

(i) 2\bar{n} + 22\bar{m} operations to form K D^2 K^T and compute its LDL^T factorization L^0 \Lambda^0 (L^0)^T,
(ii) 2\bar{m}k operations to compute (\bar{L}^0)^{-1} V = P (L^0)^{-T} (K^{-T} V),
(iii) k(k - 1)\bar{n} operations to solve k(k - 1)/2 special triangular systems of the form \bar{L}^j p = q,
(iv) 5k\bar{n} operations to compute the elements of the k sets of vectors \beta_j, p_j and \gamma_j needed for forming the matrices \bar{\Lambda}_j and \bar{L}_j, j = 1, \ldots, k, using the recurrence formulas (3.8),
(v) (4k + 1)\bar{n} + 2\bar{m} operations to solve A F^{-1} A^T u = w and
(vi) (4k + 1)\bar{n} + 2\bar{m} + 4\bar{n} + 10m operations to solve F u = w.

In addition, less than kn operations are needed to compute the LU factorization of the matrix B and K^{-T} V at the first interior point iteration.
For the typical values of n, \bar{n}, m, \bar{m} and k mentioned above, the PFF method of Section 3.3 requires approximately 1.24 \times 10^6 operations per iteration. This is almost a factor of 100 less than the 117 \times 10^6 operations required if one uses the PFF method of Section 3.2 to factorize D^2 + V V^T, uses it to form A F^{-1} A^T, computes the Cholesky factorization of the latter and then obtains u and \Delta x, and a factor of more than 325 less than the 408 \times 10^6 operations needed if just standard Cholesky factorizations are used.
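Plugging n = 500, k = 25, \bar{m} = 400, \bar{n} = n + 2\bar{m} = 1300 and m = 402 into items (i)–(vi) reproduces the quoted 1.24 \times 10^6 figure (a rough flop count, matching the text's approximations):

```python
# Operation counts (i)-(vi) for the portfolio example, per IPM iteration
n, k, m_bar, m = 500, 25, 400, 402
n_bar = n + 2 * m_bar                 # 1300

ops = [
    2 * n_bar + 22 * m_bar,           # (i)   factorize K D^2 K^T
    2 * m_bar * k,                    # (ii)  back-solve for the k columns of V
    k * (k - 1) * n_bar,              # (iii) special triangular systems
    5 * k * n_bar,                    # (iv)  recurrence formulas (3.8)
    (4 * k + 1) * n_bar + 2 * m_bar,  # (v)   solve A F^{-1} A^T u = w
    (4 * k + 1) * n_bar + 2 * m_bar + 4 * n_bar + 10 * m,  # (vi) solve F u = w
]
total = sum(ops)
print(total)                          # 1247320, i.e. about 1.24e6 operations
```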
In some applications disjunctive constraints of the form either \phi_i \ge \ell_i or \phi_i = 0, and cardinality constraints that limit the maximum number of assets in the portfolio, are imposed. These lead to a mixed-integer QP problem that can be solved within a branch-and-cut framework by repeatedly solving convex QP problems, whose structure is the one considered here, to which polyhedral cuts have been added (e.g., see Bienstock, 1996).
Acknowledgements
We are grateful to the anonymous referees for many helpful suggestions.

Funding
National Science Foundation (DMS 01-04282, DMS 06-06712); Department of Energy (DE-FG02-92EQ25126, DE-FG02-08ER58562); Office of Naval Research (N00014-03-0514, N00014-08-1-1118).
REFERENCES
ALIZADEH, F. & GOLDFARB, D. (2003) Second-order cone programming. Math. Program., 95, 3–51.
ANDERSEN, K. D. (1996) A modified Schur complement method for handling dense columns in interior point methods for linear programming. ACM Trans. Math. Softw., 22, 348–356.
BENNETT, J. M. (1965) Triangular factors of modified matrices. Numer. Math., 7, 217–221.
BIENSTOCK, D. (1996) Computational study of a family of mixed-integer quadratic programming problems. Math. Program., 74, 121–140.
BUNCH, J. R. & PARLETT, B. N. (1971) Direct methods for solving symmetric indefinite systems of linear equations. SIAM J. Numer. Anal., 8, 639–655.
CHOI, I. C., MONMA, C. L. & SHANNO, D. F. (1990) Further development of primal–dual interior point methods. ORSA J. Comput., 2, 304–311.
CRISTIANINI, N. & SHAWE-TAYLOR, J. (2000) An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, UK: Cambridge University Press.
DUFF, I. S., GOULD, N. I. M., REID, J. K., SCOTT, J. A. & TURNER, K. (1991) The factorization of sparse indefinite matrices. IMA J. Numer. Anal., 11, 181–204.
FERRIS, M. C. & MUNSON, T. S. (2003) Interior point methods for massive support vector machines. SIAM J. Optim., 13, 783–804.
FINE, S. & SCHEINBERG, K. (2001) Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res., 2, 243–264.
FLETCHER, R. & POWELL, M. J. D. (1974) On the modification of LDL^T factorizations. Math. Comput., 28, 1067–1087.
FOURER, R. & MEHROTRA, S. (1993) Solving symmetric indefinite systems in an interior point method for linear programming. Math. Program., 62, 15–39.
GILL, P. E., MURRAY, W. & SAUNDERS, M. A. (1975) Methods for computing and modifying the LDV factors of a matrix. Math. Comput., 29, 1051–1077.
GILL, P. E., SAUNDERS, M. A. & SHINNERL, J. R. (1996) On the stability of Cholesky factorization for symmetric quasidefinite systems. SIAM J. Matrix Anal. Appl., 17, 35–46.
GOLDFARB, D. & SCHEINBERG, K. (2004) A product-form Cholesky factorization method for handling dense columns in interior point methods for linear programming. Math. Program., 99, 1–34.
GOLDFARB, D. & SCHEINBERG, K. (2005) Product-form Cholesky factorization in interior point methods for second-order cone programming. Math. Program., 103, 153–179.
GONDZIO, J. & GROTHEY, A. (2007) Parallel interior point solvers for structured quadratic programs: applications to financial planning problems. Ann. Oper. Res., 152, 319–339.
MARKOWITZ, H. M. (1952) Portfolio selection. J. Financ., 7, 77–91.
MARKOWITZ, H. M. (1959) Portfolio Selection. New York: Wiley.
MARXEN, A. (1989) Primal barrier methods for linear programming. Technical Report. Stanford, CA: Department of Operations Research, Stanford University.
PEROLD, A. F. (1984) Large-scale portfolio optimization. Manage. Sci., 30, 1143–1160.
SCHEINBERG, K. & WRIGHT, S. (2000) A note on modified Cholesky and Schur complement in interior point methods for linear programming (unpublished manuscript).
SCHÖLKOPF, B. & SMOLA, A. J. (2002) Learning with Kernels. Cambridge, MA: MIT Press.
SMOLA, A. J. & SCHÖLKOPF, B. (2000) Sparse greedy matrix approximation for machine learning. Proceedings of the 17th International Conference on Machine Learning, Stanford University, CA. Cambridge, MA: Morgan Kaufmann Publishers, pp. 911–918.
VANDERBEI, R. J. (1995) Symmetric quasi-definite matrices. SIAM J. Optim., 5, 100–113.
VANDERBEI, R. J. & CARPENTER, T. J. (1993) Symmetric indefinite systems for interior point methods. Math. Program., 58, 1–32.
VAPNIK, V. N. (1996) The Nature of Statistical Learning Theory. New York: Wiley & Sons.
WILLIAMS, C. & SEEGER, M. (2001) Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems 13 (T. K. Leen, T. G. Dietterich & V. Tresp eds). Cambridge, MA: MIT Press, pp. 682–688.
WILLIAMSON, R. C., SMOLA, A. J. & SCHÖLKOPF, B. (1998) Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators. IEEE Trans. Inf. Theory, 47, 2516–2532.
WRIGHT, S. (1997) Primal–Dual Interior Point Methods. Philadelphia, PA: SIAM.