

SIAM J. OPTIMIZATION (C) 1992 Society for Industrial and Applied Mathematics
Vol. 2, No. 4, pp. 575-601, November 1992

ON THE IMPLEMENTATION OF A PRIMAL-DUAL INTERIOR POINT METHOD*

SANJAY MEHROTRA†

Abstract. This paper gives an approach to implementing a second-order primal-dual interior point method. It uses a Taylor polynomial of second order to approximate a primal-dual trajectory. The computations for the second derivative are combined with the computations for the centering direction. Computations in this approach do not require that primal and dual solutions be feasible. Expressions are given to compute all the higher-order derivatives of the trajectory of interest. The implementation ensures that a suitable potential function is reduced by a constant amount at each iteration.

There are several salient features of this approach. An adaptive heuristic for estimating the centering parameter is given. The approach used to compute the step length is also adaptive. A new practical approach to compute the starting point is given. This approach treats primal and dual problems symmetrically.

Computational results on a subset of problems available from netlib are given. On mutually tested problems the results show that the proposed method requires approximately 40 percent fewer iterations than the implementation proposed in Lustig, Marsten, and Shanno [Tech. Rep. TR J-89-11, Georgia Inst. of Technology, Atlanta, 1989]. It requires approximately 50 percent fewer iterations than the dual affine scaling method in Adler, Karmarkar, Resende, and Veiga [Math. Programming, 44 (1989), pp. 297-336], and 35 percent fewer iterations than the second-order dual affine scaling method in the same paper. The new approaches for estimating the centering parameter and finding the step length and the starting point have contributed to the reduction in the number of iterations. However, the contribution due to the use of the second derivative is most significant.

On the tested problems, on the average the implementation shown was found to be approximately two times faster than OB1 (version 02/90) described in Lustig, Marsten, and Shanno and 2.5 times faster than MINOS 5.3 described in Murtagh and Saunders [Tech. Rep. SOL 83-20, Dept. of Operations Research, Stanford Univ., Stanford, CA, 1983].

Key words. linear programming, interior point methods, primal-dual methods, power series methods, predictor-corrector methods

AMS(MOS) subject classifications. 90C05, 90C06, 90C20, 49M15, 49M35

1. Introduction. This paper considers interior point algorithms for simultaneously solving the primal linear program

(P)   minimize $c^Tx$
      s.t. $Ax = b$,
           $x \ge 0$,

and its dual

(D)   maximize $b^T\pi$
      s.t. $A^T\pi + s = c$,
           $s \ge 0$,

where $c, x, s \in \mathbb{R}^n$, $\pi, b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. It is assumed that A has full row rank. This can be ensured by removing the linearly dependent rows in the beginning. The
* Received by the editors June 18, 1990; accepted for publication (in revised form) August 30, 1991. This paper was presented at the Second Asilomar Workshop on Progress in Mathematical Programming, Monterey, CA, February 5-7, 1990. This research was supported in part by National Science Foundation research initiation grant CCR-8810107 and by a grant from the GTE Laboratories.
† Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208 (mehrotra@iems.nwu.edu).


primal-dual methods, which use solutions of both (P) and (D) in the scaling matrix,
are of primary interest.
The primal-dual algorithms have their roots in Megiddo [21]. These were further developed and analyzed by Kojima, Mizuno, and Yoshise [15] and Monteiro and Adler [27]. They showed that the central trajectory can be followed to the optimal solution in $O(\sqrt{n}L)$ iterations by taking "short steps." Kojima, Mizuno, and Yoshise [16] showed that the primal-dual potential function [31], which is a variant of Karmarkar's potential function [13], can also be reduced by a constant amount at each iteration, and therefore they developed a primal-dual large step potential reduction algorithm.
McShane, Monma, and Shanno [20] were the first to develop an implementation
of this method. They found it to be a viable alternative to the then popular dual affine
scaling method [1], [26] for solving large sparse problems. They also found that this
method typically takes fewer iterations than the dual affine scaling method. However,
it was found to be only competitive with the dual affine scaling method because of
the additional computations that their implementation had to perform. This
implementation created some artificial problems (by adding artificial variables/constraints to the original problem) and maintained primal and dual feasible solutions of
these problems. Further developments on this implementation were reported in Choi,
Monma, and Shanno [4].
Lustig, Marsten, and Shanno [18] implemented a variant of the primal-dual
method which is based on an earlier work of Lustig [17]. This method is developed
by considering the Newton direction on the optimality conditions for the
logarithmic barrier problem. An important feature of their approach is that it did
not explicitly require that feasible solutions for the primal or the dual problem be
available. They showed that the resulting direction is a particular combination of
primal-dual affine scaling direction, feasibility direction, and centering direction. In
support of their method they reported success in solving all the problems in the netlib
[7] test set.
This paper builds on the work of Lustig, Marsten, and Shanno [18]. In doing so
it makes use of the work of Monteiro, Adler, and Resende [28] and Karmarkar,
Lagarias, Slutsman, and Wang [14]. The discussion in this paper assumes that direct
methods are preferred over iterative methods for solving linear equations arising at
each iteration of the algorithm. In our view, the following accomplishments are reported
in this paper:
It gives an algorithm and describes its implementation by using a Taylor polynomial built from first and second derivatives of the primal-dual affine scaling trajectory and by effectively combining the second derivative with a centering direction.
A comparison with the results reported in the literature (on mutually tested problems) shows that the method developed in this paper takes approximately 50 percent fewer iterations than the dual affine scaling method as implemented by Adler, Karmarkar, Resende, and Veiga [1], 40 percent fewer iterations than the primal-dual method implemented in Lustig, Marsten, and Shanno [18], and 55 percent fewer iterations than the logarithmic barrier function method implemented in Gill, Murray, and Saunders [8]. It requires 35 percent fewer iterations than the second-order dual affine scaling method implemented in Adler, Karmarkar, Resende, and Veiga [1] and 20 percent fewer iterations than the "optimal three-dimensional method" implemented by Domich, Boggs, Donaldson, and Witzgall [5].
An efficient preliminary implementation of the proposed approach was
developed. On average, it was found to be two times faster than OB1 (version 02/1990)
[18]. On average, it was also found to be 2.5 times faster than MINOS 5.3.

While developing our implementation we ensure that a suitable potential function is reduced by a constant amount. This is accomplished by taking a two-tier approach. Most of the work is performed at the first level, which uses extensive heuristic arguments. The second level ensures the robustness of the implementation.
It gives expressions for computing first and higher derivatives of a primal-dual trajectory. This trajectory starts from any positive point and goes to the optimum.
It gives an adaptive approach to computing the centering parameter.
It gives a modified heuristic for computing the step length at each iteration. The approach allows us to adaptively take steps much closer to the boundary.
It gives an approach to generating primal and dual starting points, which treats these problems symmetrically.
We find it convenient to outline the proposed approach first. This is done in the next section. The organization of this paper is given in that section. The following notation and terminology are used throughout this paper.
Notation and terminology. $x^k$, $\pi^k$, and $s^k$ represent the estimates of solutions of (P) and (D) at the beginning of iteration k. $X^k$ and $S^k$ are used to represent diagonal matrices whose elements are $x_1^k, x_2^k, \dots, x_n^k$ and $s_1^k, s_2^k, \dots, s_n^k$, respectively. $\xi_x^k \equiv Ax^k - b$ and $\xi_s^k \equiv A^T\pi^k + s^k - c$; $\xi_x$ and $\xi_s$ are referred to as error vectors. $D^2$ represents the matrix $(S^k)^{-1}X^k$. $e$ is used to represent a vector of all ones. $e_i$ represents column $i$ of an identity matrix. $\|\cdot\|$ is used to represent the Euclidean norm of a vector.

The term search direction in primal space is used for a direction $p_x$, which is constructed from a combination of directions $(p_x1, p_x2)$. The term primal blocking variable is used for a variable that will first become negative when moving in a direction. The term step factor represents the fraction of the step length that makes the blocking variable zero. Similar terminology is used for directions in the dual space.

The central trajectory is the set of feasible points in (P) and (D) satisfying $x_i(\mu)s_i(\mu) = \mu$, for $i = 1, \dots, n$. In all references to the central trajectory we assume that $x(\mu)$, $s(\mu)$ exist for all $\mu > 0$.
2. Implementation of an interior point method. This section outlines the approach we take to implement an interior point method. We do this to provide a complete picture of this paper and to fix certain additional notation used throughout. Various steps in the development of this implementation are discussed in more detail in §§3-7. The procedure is outlined in Exhibit 2.1 and is called AIPM (an interior point method). We now discuss the procedure.
Procedure AIPM
Input: Let $x^0 > 0$, $s^0 > 0$, $\pi^0$ be the given starting points.
For $k = 0, 1, \dots$ until a stopping criterion is satisfied do:
Step 0
    $\xi_s^k := A^T\pi^k + s^k - c$,
    $\xi_x^k := Ax^k - b$,
    $D^2 := (S^k)^{-1}X^k$.
Step 1
c find the first derivative of the primal-dual affine scaling trajectory.
    $p_\pi1 := -(AD^2A^T)^{-1}(b - AD^2\xi_s^k)$,
    $p_s1 := \xi_s^k - A^Tp_\pi1$,
    $p_x1 := x^k - D^2p_s1$.

Step 2
c compute centering parameter $\mu^k$.
    CALL CENPAR ($x^k$, $p_x1$, $s^k$, $p_s1$, $\mu^k$).
Step 3
c compute the second derivative of the primal-dual trajectory with the centering direction.
    $v_i := -2\,((p_x1)_i(p_s1)_i - \mu^k)/s_i^k$ for $i = 1, 2, \dots, n$,
    $p_\pi2 := -(AD^2A^T)^{-1}Av$,
    $p_s2 := -A^Tp_\pi2$,
    $p_x2 := v - D^2p_s2$.
Step 4
c construct a Taylor polynomial and find maximum steps $(e_x, e_s)$ using this polynomial.
    CALL SFSOP ($x^k$, $p_x1$, $p_x2$, $e_x$, $s^k$, $p_s1$, $p_s2$, $e_s$).
Step 5
c construct a search direction.
    $p_x := e_x\,p_x1 - .5\,e_x^2\,p_x2$,
    $p_s := e_s\,p_s1 - .5\,e_s^2\,p_s2$,
    $p_\pi := e_s\,p_\pi1 - .5\,e_s^2\,p_\pi2$.
Step 6
c compute step factors $(f_x, f_s)$.
    CALL GTSF ($x^k$, $p_x$, $s^k$, $p_s$, $f_x$, $f_s$).
Step 7
c generate trial points.
    $\bar{x} := x^k - f_x\,p_x$,
    $\bar{s} := s^k - f_s\,p_s$,
    $\bar{\pi} := \pi^k - f_s\,p_\pi$.
Step 8
c test if the trial point is acceptable.
    If an appropriate potential function is reduced, then
        $x^{k+1} := \bar{x}$,
        $s^{k+1} := \bar{s}$,
        $\pi^{k+1} := \bar{\pi}$,
    else
        perform a line search/if necessary compute
        additional vectors and ensure reduction in the potential function.
    endif

EXHIBIT 2.1. A pseudo-code for implementing a second-order primal-dual method.
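To make the flow of Exhibit 2.1 concrete, the following is a minimal dense-algebra sketch of one AIPM iteration in Python/NumPy. It is not the FORTRAN production code of §8: CENPAR, SFSOP, and GTSF appear here in deliberately simplified inline form (fuller sketches accompany their definitions in §§4-6), the normal-equations matrix is solved densely rather than through a sparse Cholesky factorization, and the Step 8 potential-function test is omitted. All function and variable names are illustrative.

```python
import numpy as np

def aipm_iteration(A, b, c, x, pi, s, nu=3, gamma_f=0.9):
    """One iteration of Procedure AIPM (Exhibit 2.1), heavily simplified."""
    n = x.size
    # Step 0: error vectors and the primal-dual scaling matrix D^2.
    xi_s = A.T @ pi + s - c
    xi_x = A @ x - b
    d2 = x / s                              # diagonal of D^2 = (S^k)^{-1} X^k
    M = (A * d2) @ A.T                      # A D^2 A^T; factor once, reuse
    # Step 1: first derivative of the primal-dual affine scaling trajectory.
    p_pi1 = -np.linalg.solve(M, b - (A * d2) @ xi_s)
    p_s1 = xi_s - A.T @ p_pi1
    p_x1 = x - d2 * p_s1
    # Step 2 (CENPAR, simplified): see Exhibit 5.1 for the full heuristic.
    ex1 = min(1.0, (x / np.where(p_x1 > 0, p_x1, np.inf)).min())
    es1 = min(1.0, (s / np.where(p_s1 > 0, p_s1, np.inf)).min())
    mdg = (x - ex1 * p_x1) @ (s - es1 * p_s1)
    mu = (mdg / (x @ s)) ** nu * (x @ s) / n
    # Step 3: second derivative combined with the centering direction.
    v = -2.0 * (p_x1 * p_s1 - mu) / s
    p_pi2 = -np.linalg.solve(M, A @ v)
    p_s2 = -A.T @ p_pi2
    p_x2 = v - d2 * p_s2
    # Step 4 (SFSOP, simplified): largest e in [0,1] keeping the Taylor
    # polynomial (4.7)-(4.8) positive, located here by coarse sampling.
    grid = np.linspace(0.0, 1.0, 101)
    e_x = max(e for e in grid if np.all(x - e * p_x1 + 0.5 * e**2 * p_x2 > 0))
    e_s = max(e for e in grid if np.all(s - e * p_s1 + 0.5 * e**2 * p_s2 > 0))
    # Step 5: search directions from the second-order Taylor polynomial.
    p_x = e_x * p_x1 - 0.5 * e_x**2 * p_x2
    p_s = e_s * p_s1 - 0.5 * e_s**2 * p_s2
    p_pi = e_s * p_pi1 - 0.5 * e_s**2 * p_pi2
    # Step 6 (GTSF, simplified): conventional fixed fraction of the step to
    # the boundary; Exhibit 6.1 gives the adaptive rule actually proposed.
    rx, rs = np.max(p_x / x), np.max(p_s / s)
    f_x = min(gamma_f / rx, 1.0) if rx > 0 else 1.0
    f_s = min(gamma_f / rs, 1.0) if rs > 0 else 1.0
    # Step 7: trial points (the Step 8 acceptance test is omitted here).
    return x - f_x * p_x, pi - f_s * p_pi, s - f_s * p_s
```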



The approach used to generate a starting point is discussed in §7.


Given $x^k$, $\pi^k$, and $s^k$, Step 0 computes error vectors $\xi_x^k$ and $\xi_s^k$ representing the amounts by which the primal and dual constraints are violated. $D^2$ holds the primal-dual scaling matrix.
Step 1 computes direction $p_x1$ in the primal space and $p_\pi1$, $p_s1$ in the dual space. These directions are tangent to a primal-dual trajectory. This trajectory is discussed in §4. Expressions to compute all derivatives of this trajectory at a point are also developed in §4.

The primal and dual directions computed at Step 1 are used in procedure CENPAR to estimate the centering parameter $\mu^k$. Our approach to estimating the centering parameter is given in Exhibit 5.1. This approach is discussed in §5.
Step 3 computes the second derivative of the primal-dual trajectory and the centering direction. These directions could be computed separately. However, in the current implementation we prefer to combine their computation in order to save a forward and a back solve. We use the tangent direction and the direction in Step 3 to construct a Taylor polynomial and to find a maximum step to the boundary (in primal and dual spaces separately) using this polynomial. This is done in Procedure SFSOP given in Exhibit 4.1. The computations performed in this procedure are also discussed in §4.

In Step 5 we use the maximum step in a Taylor polynomial to generate search directions $p_x$, $p_\pi$, and $p_s$. In Procedure GTSF (Exhibit 6.1) we compute a fraction $(f_x, f_s)$ of the total step to the boundary in the search direction. This is discussed further in §6. Using the search directions and the step factors, trial points $\bar{x}$, $\bar{\pi}$, $\bar{s}$ are generated in the primal and dual spaces.

Step 8 ensures the robustness of the overall procedure. It is loosely defined here. It depends on the choice of the function used to measure the progress of the algorithm and the best possible theoretical results that could be proved for this function. The potential function we used to ensure the progress is developed in the next section (§3), and our motivations for using it are discussed there.

If the potential function is not reduced by the desired amount at the trial points, we may perform a line search and, if necessary, compute additional directions to ensure a reduction in this function. This actually happened for the potential function we discuss in the next section. If this happens, we generate an additional three trial points by using $e_x := e_s := \min(e_x, e_s)$; $e_x := e_x$, $e_s := 0$; and $e_x := 0$, $e_s := e_s$ in Step 5 to compute $p_x$, $p_\pi$, $p_s$. The potential function was always reduced by the desired amount at one of the new trial points. Therefore, on the tested problems, additional vectors were never computed and explicit line searches were never performed.

3. A potential function. In our view the interior point methods generate one or
more interesting search directions at each iteration and effectively combine these
directions to ensure that sufficient progress in a suitable convergence function is made.
Various proposed methods differ in the directions they compute, in how they combine
these directions (implicitly or explicitly), and in the convergence function they use to
measure the progress [12]. Unfortunately, to our knowledge, the current theoretical
understanding of these methods has not reached a point where a clear superiority of
one method is established. Hence, practical implementations [1], [5], [18], [19], [20],
[22], [26] rely on heuristic arguments and empirical evidence obtained from performing
experiments on a set of real problems.
Use of a suitable potential function is frequently ignored while developing fast
implementations. In our experience, an important reason, among others, is that the

cost of performing line searches in one- or higher-dimensional subspaces is significant on sparse problems, and it is frequently not justified by the return.
In our opinion use of heuristic arguments is justified, but not at the cost of the
robustness of the solution procedures. Hence, even though in this paper several different
heuristics are proposed and their use justified solely on the basis of empirical evidence,
in the actual implementation we recommend the use of a potential function.
We now develop the potential function that was used to measure progress in our implementation. We find it instructive to go through some construction to motivate this function. Some steps used in this construction appeared in Karmarkar [13] and others in Goldfarb and Mehrotra [9] and Todd and Ye [31].

Let $x^0 > 0$, $\pi^0$, $s^0 > 0$ be any given point. Let $\xi_x^0 \equiv Ax^0 - b$ and $\xi_s^0 \equiv A^T\pi^0 + s^0 - c$. In order to solve (P) and (D), it is enough to find an optimal solution of

(3.1)   minimize $\lambda$
        s.t. $Ax - \lambda\xi_x^0 = b$,
             $A^T\pi + s - \lambda\xi_s^0 = c$,
             $c^Tx - b^T\pi + \lambda(b^T\pi^0 - c^Tx^0) = 0$,
             $x_i, s_i, \lambda \ge 0$, $i = 1, 2, \dots, n$.

$(x^0, \pi^0, s^0, 1)$ is a feasible interior solution of (3.1). Let Z be a matrix whose columns are the basis for the null space of A. Multiplying the second set of equations in (3.1) with $[A^T\ Z]^T$ and solving for the free variables $\pi$ results in

(3.2)   $\pi = (AA^T)^{-1}(Ac - As + \lambda A\xi_s^0)$;

therefore, solving (3.1) is the same as:

(PD0)   minimize $\lambda$
        s.t. $Ax - \lambda\xi_x^0 = b$,
             $Z^Ts - \lambda Z^T\xi_s^0 = Z^Tc$,
             $c^Tx + b^T(AA^T)^{-1}As + \lambda\xi_g^0 = b^T(AA^T)^{-1}Ac$,
             $x_i, s_i, \lambda \ge 0$,

where $\xi_g^0 \equiv (b^T\pi^0 - c^Tx^0) - b^T(AA^T)^{-1}A\xi_s^0$. Consider the potential function

(3.3)   $F(x, s, \lambda) \equiv \rho \ln \lambda - \sum_{i=1}^{n} \ln x_is_i$

for $\rho = 2n + \sqrt{2n+1}$. In the Appendix we show that $F(x, s, \lambda)$ can be reduced by a constant amount (.25) at any feasible solution of (PD0).

Let $\xi^k \equiv ((b - Ax^k)^T, (c - s^k)^TZ, b^T(AA^T)^{-1}A(c - s^k) - c^Tx^k)^T$. Note that $\xi^k$ is $\lambda^k$ times the last column in (PD0). If $x^k, \pi^k, s^k, \lambda^k$ and $x^{k+1}, \pi^{k+1}, s^{k+1}, \lambda^{k+1}$ are feasible solutions of (PD0), then

$F(x^{k+1}, s^{k+1}, \lambda^{k+1}) - F(x^k, s^k, \lambda^k) = \rho \ln \frac{\lambda^{k+1}}{\lambda^k} - \sum_{i=1}^{n} \ln \frac{x_i^{k+1}s_i^{k+1}}{x_i^ks_i^k} = \rho \ln \frac{\|Q\xi^{k+1}\|}{\|Q\xi^k\|} - \sum_{i=1}^{n} \ln \frac{x_i^{k+1}s_i^{k+1}}{x_i^ks_i^k}$

for any nonsingular matrix $Q \in \mathbb{R}^{(n+1)\times(n+1)}$. An important consequence of this observation is that it ensures that the potential function

(3.4)   $F(x, s, \xi, Q) \equiv \rho \ln \|Q\xi\| - \sum_{i=1}^{n} \ln x_is_i$

can be reduced by a constant amount at each iteration. We use the following potential function

(3.5)   $F(x, s, \xi) \equiv \rho \ln \|(\kappa_x\xi_x^T, \kappa_s\xi_s^T, \xi_g)\| - \sum_{i=1}^{n} \ln x_is_i$,

where $\kappa_x$ and $\kappa_s$ are some prespecified constants. The potential function (3.5) is used for the following reasons: (i) We think that a potential function of the form (3.5) is superior for developing implementations because it allows for numerical errors [10]. (ii) It is easily computable without having to know Z. (iii) It allows us to separately update primal and dual solutions and the corresponding error vectors. (iv) There is no unknown that has to be determined during the algorithm. (v) Finally, it is possible to compute directions which ensure that (3.3), and therefore (3.5), is reduced by a constant amount.
The potential function (3.5) is, however, dependent on the scaling of rows (in general, the choice of Q in (3.4)). Because of this, a search direction that may be acceptable while using one scaling matrix may become unacceptable for a different choice. However, in our implementation we use it to our advantage. We think that the construction of directions $p_x1, p_\pi1, p_s1$ and $p_x2, p_\pi2, p_s2$ discussed in the next section is inherently biased towards finding (nearly) feasible solutions first. The values of $\kappa_x$ and $\kappa_s$ are chosen so that they emphasize primal and dual feasibility over the feasibility of the last equality constraint in (PD0). As a consequence of this, search directions that reduce $\|\xi_x\|$ and/or $\|\xi_s\|$ significantly, and do not reduce (or possibly increase) the error in the last equality constraint of (3.1), become acceptable.

$\kappa_x = 100 \max_i \{s_i^0\}$ and $\kappa_s = 100 \max_i \{x_i^0\}$ were used for all the problems in our implementation. The construction of $x^0$ and $s^0$ is described in §7.

A reduction by a constant amount in (3.5) at each iteration ensures convergence to an optimal solution provided that $\sum_{i=1}^{n} \ln x_is_i$ remains bounded. On the other hand, if (3.5) cannot be reduced by a constant amount at some iteration, then either (P) or (D) or both do not have a feasible solution. We may introduce a constraint providing an upper bound on x and s if we detect (through some tests) that the method is not converging.
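A direct transcription of (3.5) in Python/NumPy follows, as a sketch. The reading of the last residual component $\xi_g$ as the residual of the last equality constraint of (PD0), $b^T(AA^T)^{-1}A(c - s) - c^Tx$, is taken from the construction above; that reading, like the function and variable names, is an assumption rather than a verified transcription of the original.

```python
import numpy as np

def potential_3_5(A, b, c, x, pi, s, kappa_x, kappa_s):
    """Potential function (3.5), a sketch."""
    n = x.size
    rho = 2 * n + np.sqrt(2 * n + 1)
    xi_x = A @ x - b                          # primal error vector
    xi_s = A.T @ pi + s - c                   # dual error vector
    # Assumed form of xi_g: residual of the last equality in (PD0).
    xi_g = b @ np.linalg.solve(A @ A.T, A @ (c - s)) - c @ x
    resid = np.concatenate([kappa_x * xi_x, kappa_s * xi_s, [xi_g]])
    return rho * np.log(np.linalg.norm(resid)) - np.sum(np.log(x * s))
```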

4. Derivatives of a primal-dual trajectory. This section provides motivation for using directions $p_x1, p_\pi1, p_s1, p_x2, p_\pi2, p_s2$ in Procedure AIPM. It was mentioned that these directions use first and second derivative information of a primal-dual trajectory at a given point. This section defines the trajectory of interest and also shows how to compute all of its derivatives at a given point. While we used the potential function (3.5) to measure the progress, derivatives of the primal-dual trajectory being considered are used because they are easily computed and found to be effective in practice.

The results in Monteiro, Adler, and Resende [28] are used frequently to develop these expressions. Monteiro, Adler, and Resende [28] assume that feasible solutions are available. We do not assume this here. The expressions are given in the context of linear programming problems. Extensions to convex quadratic programming are straightforward.

Assume that $x^k > 0$, $\pi^k$, $s^k > 0$ is the current point. Consider the following system of nonlinear equations:

(4.1)   $X(\alpha)s(\alpha) = \alpha X^ks^k$,
        $Ax(\alpha) = b + \alpha\xi_x^k$,
        $A^T\pi(\alpha) + s(\alpha) = c + \alpha\xi_s^k$,
        $x(\alpha) \ge 0$, $s(\alpha) \ge 0$,

for $\alpha \in [0, 1]$. Let $w(\alpha) = (x(\alpha), \pi(\alpha), s(\alpha))$ represent the solutions of (4.1) for a given $\alpha$.

PROPOSITION 4.1. If the system of equations (4.1) has a solution for $\alpha = 0$, then it has a solution for all $\alpha \in [0, 1]$. Furthermore, the solution is unique for $\alpha \in (0, 1]$.
Proof. For $\alpha \in (0, 1]$, (4.1) gives the optimality conditions for the weighted logarithmic barrier problems

(Pα)   minimize $B(x, \alpha) \equiv (c + \alpha\xi_s^k)^Tx - \alpha\sum_{i=1}^{n} x_i^ks_i^k \ln x_i$
       s.t. $Ax = b + \alpha\xi_x^k$,
            $x > 0$,

and

(Dα)   maximize $(b + \alpha\xi_x^k)^T\pi + \alpha\sum_{i=1}^{n} x_i^ks_i^k \ln s_i$
       s.t. $A^T\pi + s = c + \alpha\xi_s^k$,
            $s > 0$.

Let $x(0), \pi(0), s(0)$ represent a solution of (4.1) for $\alpha = 0$. For a fixed $\alpha \in [0, 1]$, $\bar{x}(\alpha) \equiv (1-\alpha)x(0) + \alpha x^k$ is a feasible solution for (Pα), and $\bar{\pi}(\alpha) \equiv (1-\alpha)\pi(0) + \alpha\pi^k$, $\bar{s}(\alpha) \equiv (1-\alpha)s(0) + \alpha s^k$ is a feasible solution for (Dα). If the feasible set of (Pα) is bounded, then obviously (Pα) has a solution.

We now consider the case when the feasible set of (Pα) is unbounded. Since $\bar{\pi}(\alpha)$, $\bar{s}(\alpha)$ is a feasible solution for (Dα), it can be shown that the set $\{d \mid Ad = 0, d \ge 0, d \ne 0, (c + \alpha\xi_s^k)^Td \le 0\}$ is empty. Hence, the set $P_t \equiv \{x \mid Ax = b + \alpha\xi_x^k, x > 0, (c + \alpha\xi_s^k)^Tx = t\}$ is bounded for all values of $t < \infty$. Furthermore, since the feasible region of (Pα) is nonempty and unbounded, $P_t$ is nonempty for all values of $t > t^*$, where $t^*$ is the minimum value of $(c + \alpha\xi_s^k)^Tx$ subject to $Ax = b + \alpha\xi_x^k$, $x \ge 0$. Clearly, $B(x, \alpha)$ is bounded over $P_t$ for all values of $t$. Now, to complete the proof for the existence of the solution, note that in $B(x, \alpha)$, $(c + \alpha\xi_s^k)^Tx$ increases linearly in $t$, while $\max \alpha\sum_{i=1}^{n} (x_i^ks_i^k) \ln x_i$ subject to $\{x \mid Ax = b + \alpha\xi_x^k, x > 0, (c + \alpha\xi_s^k)^Tx = t\}$ increases only logarithmically in $t$.

The proof for the uniqueness of the solution follows from the strict convexity of $B(x, \alpha)$.
Let $d^jw(\alpha)/d\alpha^j$ be the jth derivative of $w(\alpha)$. It is now shown that $d^jw(\alpha)/d\alpha^j$ can be computed for all $j$ at $\alpha = 1$. Differentiating (4.1) for the first time gives

(4.2)   $S(\alpha)\frac{dx(\alpha)}{d\alpha} + X(\alpha)\frac{ds(\alpha)}{d\alpha} = X^ks^k$,
        $A\frac{dx(\alpha)}{d\alpha} = \xi_x^k$,
        $A^T\frac{d\pi(\alpha)}{d\alpha} + \frac{ds(\alpha)}{d\alpha} = \xi_s^k$.

The solution of (4.2) at $\alpha = 1$ is given by

(4.3)   $\frac{d\pi(1)}{d\alpha} = -(AD^2A^T)^{-1}(b - AD^2\xi_s^k)$,
        $\frac{ds(1)}{d\alpha} = \xi_s^k - A^T\frac{d\pi(1)}{d\alpha}$,
        $\frac{dx(1)}{d\alpha} = x^k - D^2\frac{ds(1)}{d\alpha}$.
Further differentiating (4.2) gives

(4.4)   $\sum_{l=0}^{j} \binom{j}{l}\frac{d^lx_i(1)}{d\alpha^l}\frac{d^{(j-l)}s_i(1)}{d\alpha^{(j-l)}} = 0$, $j \ge 2$, $i = 1, \dots, n$,
        $A\frac{d^jx(1)}{d\alpha^j} = 0$,
        $A^T\frac{d^j\pi(1)}{d\alpha^j} + \frac{d^js(1)}{d\alpha^j} = 0$.

From (4.4) it is clear that $d^jw(\alpha)/d\alpha^j$ can be computed recursively. The derivatives can be computed explicitly from

(4.5)   $\frac{d^j\pi(1)}{d\alpha^j} = -(AX^k(S^k)^{-1}A^T)^{-1}A(S^k)^{-1}u$,
        $\frac{d^js(1)}{d\alpha^j} = -A^T\frac{d^j\pi(1)}{d\alpha^j}$,
        $\frac{d^jx(1)}{d\alpha^j} = (S^k)^{-1}u - (S^k)^{-1}X^k\frac{d^js(1)}{d\alpha^j}$,

where

$u_i = -\sum_{l=1}^{j-1} \binom{j}{l}\left(\frac{d^lx_i(1)}{d\alpha^l}\right)\left(\frac{d^{(j-l)}s_i(1)}{d\alpha^{(j-l)}}\right)$, $i = 1, \dots, n$; $j \ge 2$.

The recursion (4.5) is the same as the recursion given in Monteiro, Adler, and Resende [28] and Karmarkar et al. [14] (see also Megiddo [21] and Bayer and Lagarias [2]). The derivatives resulting from the computations would be the same if $\xi_x^k = 0$ and $\xi_s^k = 0$ is assumed, and in the latter paper if no centering and reparameterization is done.
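The recursion (4.3)-(4.5) translates directly into code. The sketch below (Python with SciPy; illustrative names) computes the pure affine-scaling derivatives, that is, without the centering modification of Step 3, under the assumption that $AD^2A^T$ is positive definite. A single Cholesky factorization serves every order, which is what makes higher-order methods affordable.

```python
import numpy as np
from math import comb
from scipy.linalg import cho_factor, cho_solve

def trajectory_derivatives(A, b, c, x, pi, s, r):
    """Derivatives d^j w(1)/d alpha^j, j = 1..r, via (4.3) and (4.5).

    Returns lists dx, dpi, ds indexed so that dx[j] is the jth derivative
    (index 0 is unused).  Assumes A has full row rank and x, s > 0."""
    xi_s = A.T @ pi + s - c
    d2 = x / s
    M = cho_factor((A * d2) @ A.T)            # factor A D^2 A^T once
    # First derivative, (4.3).
    dpi = [None, -cho_solve(M, b - (A * d2) @ xi_s)]
    ds = [None, xi_s - A.T @ dpi[1]]
    dx = [None, x - d2 * ds[1]]
    # Higher derivatives, recursion (4.5).
    for j in range(2, r + 1):
        u = -sum(comb(j, l) * dx[l] * ds[j - l] for l in range(1, j))
        dpi.append(-cho_solve(M, A @ (u / s)))
        ds.append(-A.T @ dpi[j])
        dx.append(u / s - d2 * ds[j])
    return dx, dpi, ds
```

Evaluating the Taylor polynomial (4.6) below is then a matter of accumulating $\sum_{j} ((-\varepsilon)^j/j!)\,d^jw(1)$ onto the current point.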
The point $w(1-\varepsilon)$ for $1 > \varepsilon > 0$ can be approximated by using the rth-order Taylor polynomial

(4.6)   $w((1-\varepsilon), r) \equiv w(1) + \sum_{j=1}^{r} \frac{(-\varepsilon)^j}{j!}\frac{d^jw(1)}{d\alpha^j}$.

The Taylor polynomial is considered for the following two reasons.
(1) In the special case Monteiro, Adler, and Resende [28] established powerful results (near the central path) for this approximation.
(2) Computational results indicate that near the optimal solution the first- and second-order approximations result in nearly unit steps. Our results also indicate that, asymptotically, $w(\alpha)$ is well represented by a Taylor polynomial of lower order. In this context Megiddo [21] has argued that for problems with a unique optimal solution, if we start close to an optimal solution, the primal-dual paths take us approximately in a straight line to the optimal solution. Most of the tested problems do not satisfy this assumption; however, they still show this property.

In addition to other things, practical implementations that compute more than one direction must offset the cost of doing extra work. Adler et al. [1] were the first to show that in the dual affine scaling method the information from a second derivative can be used to significantly reduce the number of iterations. However, on problems in the netlib test set they found that reduction in the number of iterations did not always translate into reduction in cpu time on sparse problems. Computational results were also given in Karmarkar et al. [14] on a small set of "representative problems" using methods implemented in the AT&T KORBX system [3].

This paper restricts itself to using the second-order Taylor polynomial. In fact, we compute only two directions. The tangent direction $dw(\alpha)/d\alpha$ is computed at Step 1 of Procedure AIPM. Step 3 combines the computation of a second derivative with that of a centering direction. This saves a forward and a back solve. We must compute two directions at each iteration in order to use the adaptive approach for computing the centering parameter (§5).
Our strategy of combining the computations for the second derivative and the centering direction seems to work in practice for the following reasons. (i) The performance of interior point methods in practice depends only weakly on the choice of centering parameter. (ii) If we view the computations in constructing the Taylor polynomial as that of finding a search direction, then in practice it appears that a wide range of $\varepsilon$ can be used without adversely affecting the performance of the implementation. To illustrate this, we would like the reader to compare the iteration counts reported in Mehrotra [24], [25] for a predictor-corrector method with those in Table 8.2. The predictor-corrector method results if we take $\varepsilon = 1$ at each iteration.
We find that taking different steps in primal and dual spaces generally results in superior performance. This is similar to the experience of Choi, Monma, and Shanno [4] for their method. We construct different polynomials

(4.7)   $x(e_x, 2) \equiv x^k - e_x\,p_x1 + \tfrac{1}{2}e_x^2\,p_x2$,

(4.8)   $s(e_s, 2) \equiv s^k - e_s\,p_s1 + \tfrac{1}{2}e_s^2\,p_s2$

in primal and dual spaces. The computations for $e_x$ and $e_s$ that use (4.7)-(4.8) are described in Procedure SFSOP (step from second-order polynomial) of Exhibit 4.1. A procedure that finds the roots of a quadratic equation is used to implement SFSOP.
Procedure SFSOP ($x^k$, $p_x1$, $p_x2$, $e_x$, $s^k$, $p_s1$, $p_s2$, $e_s$)
    Find maximum $0 \le e_x \le 1$ such that $x(e_x, 2)$ is feasible.
    Find maximum $0 \le e_s \le 1$ such that $s(e_s, 2)$ is feasible.

EXHIBIT 4.1. Computations for step size using the Taylor polynomial.
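Because $x(e, 2)$ and $s(e, 2)$ are componentwise quadratics in $e$, the maximum feasible step of Procedure SFSOP can be computed exactly from the roots of each component quadratic. The sketch below (illustrative names) treats one side; the other side is identical with $s^k$, $p_s1$, $p_s2$. It caps the step at the first boundary crossing, which is the natural reading of "maximum $e$ such that $x(e, 2)$ is feasible."

```python
import numpy as np

def sfsop_side(zk, p1, p2):
    """Largest e in [0, 1] with z(e, 2) = z^k - e*p1 + 0.5*e^2*p2 > 0."""
    e_max = 1.0
    for z0, a1, a2 in zip(zk, p1, p2):
        # Component quadratic q(e) = 0.5*a2*e^2 - a1*e + z0, with q(0) > 0.
        if a2 == 0.0:
            roots = [z0 / a1] if a1 > 0.0 else []     # linear component
        else:
            disc = a1 * a1 - 2.0 * a2 * z0
            if disc < 0.0:
                continue                              # q never reaches zero
            rt = np.sqrt(disc)
            roots = [(a1 - rt) / a2, (a1 + rt) / a2]
        pos = [t for t in roots if t > 0.0]
        if pos:
            e_max = min(e_max, min(pos))              # first crossing
    return e_max

# e_x = sfsop_side(xk, px1, px2); e_s = sfsop_side(sk, ps1, ps2)
```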

Before concluding this section, we point out that if there are reasons to believe that at the current iterate it is better to target a solution satisfying $XSe = We$ for some positive diagonal matrix W, then expressions for derivatives of a trajectory taking us to such a point can be obtained in a similar manner. The only difference would be to replace "$\alpha X^ks^k$" in (4.1) with "$\alpha X^ks^k + (1-\alpha)We$." In particular, we may use Heuristic CENPAR (given in the next section) to compute the centering parameter in order to decide a target point on the central path, and then go back and find the desired derivatives of a trajectory going to this point.

5. Centering. In §4 expressions were developed to construct a Taylor polynomial at a given point in order to approximate a path going to an optimal solution. Obviously, the performance of the algorithm depends to a great extent on how well a "small-order" Taylor polynomial approximates this path at the current point, and on the domain in which the Taylor polynomial results in good approximations.

The results in Monteiro, Adler, and Resende [28] and the convergence results of large step polynomial time algorithms proved by Freund [6], Gonzaga and Todd [11], and Ye [32] implicitly or explicitly use the properties of the central path. The projected gradient of the potential function used for analysis in these papers encourages centering. On the other hand, it is not clear if the central path (with equal weights) is the best path to follow, particularly since it is affected by the presence of redundant constraints [30]. Furthermore, the points on (or near) the central path are only intermediate to solving the linear programming problem. It is only the limit point on this path that is of interest to us.
In view of this, we make our implementation weakly dependent on centering. The centering direction is obtained by solving the equations

$\pi_c = -\mu^k(AD^2A^T)^{-1}A(S^k)^{-1}e$,  $s_c = -A^T\pi_c$,  $x_c = \mu^k(S^k)^{-1}e - D^2s_c$.

$\mu^k$ is called the centering parameter. It was mentioned in §4 that the computation for the centering direction is combined with the computations for the second derivative to save an extra forward and backward solve.
Heuristic CENPAR given in Exhibit 5.1 was used to compute $\mu^k$. In the description of this heuristic we assume that the direction tangent to the primal-dual affine scaling trajectory has been computed. The heuristic is adaptive. It attempts to generate a value of $\mu^k$ depending on the progress that could be made by moving in the tangent direction.

Heuristic CENPAR ($x^k$, $p_x1$, $s^k$, $p_s1$, $\mu^k$)
Step 1. Let $e_x1$, $e_s1$ be computed as follows:
    $e_x1 = \min\left(\frac{x_{l_x}^k}{(p_x1)_{l_x}}, 1\right)$,  $l_x = \arg\min_i \left\{\frac{x_i^k}{(p_x1)_i} \mid (p_x1)_i > 0\right\}$,
    $e_s1 = \min\left(\frac{s_{l_s}^k}{(p_s1)_{l_s}}, 1\right)$,  $l_s = \arg\min_i \left\{\frac{s_i^k}{(p_s1)_i} \mid (p_s1)_i > 0\right\}$.
Step 2. Let $mdg = (x^k - e_x1\,p_x1)^T(s^k - e_s1\,p_s1)$.
Step 3. Let $\mu^k = \left(\frac{mdg}{x^{kT}s^k}\right)^{\nu} \frac{x^{kT}s^k}{n}$.
Step 4. Let $ef = \frac{(p_x1)^TD^{-2}(p_x1) + (p_s1)^TD^{2}(p_s1)}{x^{kT}s^k}$.
Step 5. If ($ef > 1.1$) then $\mu^k := \mu^k/\min(e_x1, e_s1)$.

EXHIBIT 5.1. A heuristic to compute centering parameter.
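A direct transcription of Exhibit 5.1 in Python/NumPy follows, as a sketch. Step 3 is written in the form $\mu^k = (mdg/x^Ts)^{\nu}\,(x^Ts)/n$, which for $\nu = 1$ reduces to $mdg/n$ as the discussion below requires; since the Step 3 display above is itself reconstructed from a garbled original, this form is an assumption.

```python
import numpy as np

def cenpar(x, px1, s, ps1, nu=3):
    """Heuristic CENPAR (Exhibit 5.1): adaptive centering parameter."""
    n = x.size
    # Step 1: largest steps (capped at 1) to the boundary along px1, ps1.
    ex1 = min(1.0, (x / np.where(px1 > 0, px1, np.inf)).min())
    es1 = min(1.0, (s / np.where(ps1 > 0, ps1, np.inf)).min())
    # Step 2: minimum duality gap reachable along the tangent direction.
    mdg = (x - ex1 * px1) @ (s - es1 * ps1)
    gap = x @ s
    # Step 3: for nu = 1 this targets the central-path point with gap mdg.
    mu = (mdg / gap) ** nu * gap / n
    # Step 4: error factor ef; equals 1 when xi_x = 0 and xi_s = 0.
    d2 = x / s
    ef = (px1 @ (px1 / d2) + ps1 @ (d2 * ps1)) / gap
    # Step 5: if the error vectors are hurting the step, center more.
    if ef > 1.1:
        mu /= min(ex1, es1)
    return mu
```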

The motivation behind various steps in Heuristic CENPAR is now discussed. For the moment assume that $\xi_x = 0$ and $\xi_s = 0$. If this is the case, then $x^Ts$ is the current duality gap and mdg is the minimum duality gap that one can achieve by moving in directions $p_x1$ and $p_s1$ in primal and dual spaces, respectively. $e_x1$ and $e_s1$ are always taken smaller than one, because at this value the computations for $p_x1$ and $p_s1$ ensure that $\xi_x = 0$ and $\xi_s = 0$ if no numerical error is present. Hence, for $\nu = 1$ the choice of $\mu^k$ is such that it targets the point on the central path at which the duality gap is mdg.

The ratio $mdg/x^Ts$ provides us "some indication" of how well the primal-dual affine scaling trajectory is being approximated locally. A value of the ratio $mdg/x^Ts$ near 1 means that the local approximations are not good, whereas $mdg/x^Ts$ near zero indicates that the approximations of the trajectory are good.
Table 5.1 gives the number of iterations required to solve the problems for choices of $\nu = 1, 2, 3, 4$. All other parameters were the same as those for the results in Table 8.2. The last column of this table gives the number of iterations required to solve the problem if no centering was done. The results in Table 5.1 on the test problems show only a moderate variation in the number of iterations for values of $\nu$ between two and four.
The discussion on the computation of the centering parameter thus far assumed that $\xi_x = 0$ and $\xi_s = 0$. If this is not the case, then ef (error factor) is used as an indicator of their contribution to the search direction. If $\xi_x = 0$ and $\xi_s = 0$, then it is easy to see that

$Dp_s1 = [DA^T(AD^2A^T)^{-1}AD](X^ks^k)^{1/2}e$,
$D^{-1}p_x1 = [I - DA^T(AD^2A^T)^{-1}AD](X^ks^k)^{1/2}e$;

hence ef as defined in Step 4 of Exhibit 5.1 is equal to 1. If ef is smaller than 1, it indicates that the presence of $\xi_x$ and $\xi_s$ is probably reducing the norm of the search direction and, therefore, it is expected to allow for larger steps in primal and/or dual spaces. Since $\xi_x^k$ and $\xi_s^k$ reduce linearly in step size when moving in directions $p_x1$ and $p_s1$, respectively, larger steps result in a greater reduction in the error vectors. Hence, a value of ef smaller than one is not likely to hurt the performance of the implementation.
Now if ef > 1, then empirical results indicate that the presence of $\xi_x$ and/or $\xi_s$ results in a reduction in the step length, which adversely affects the improvement in the duality gap as well as the reduction in the error vectors. Therefore, it might be indicating trouble ahead. If this happens in practice, we seem to quickly get out of the trouble spots by placing more emphasis on centering. In the current implementation this is accomplished in Step 5.

6. Step length. Standard practice [1], [5], [18], [19], [20], [26] has been to move a certain fixed fraction (step factor) of the distance to the boundary to avoid one-dimensional line searches. The step factor in the case of primal-dual methods has typically been .995 or .9995 [18].

Although the performance of step factor = .995 or .9995 appears satisfactory in practice, in our view it has a major drawback as a heuristic: it limits the asymptotic rate of convergence of the algorithm. Furthermore, during the earlier phase of the algorithm it is overly aggressive. A modified approach to computing the step factor is given in Exhibit 6.1. It adaptively allows for larger (and smaller) step factors. In an extreme case it may allow a full step to the boundary and generate a point with zero duality gap.

TABLE 5.1
Performance of implementation for different choices of ν. + Stopped after 100 iterations with one digit of accuracy in objective function.

Problem      ν=1  ν=2  ν=3  ν=4   nc
afiro          9    8    7    7    8
adlittle      14   11   10   10   11
scagr7        17   14   13   13   15
stochfor1     16   16   16   16   21
sc205         14   12   11   11   11
share2b       14   12   12   13   14
share1b       23   22   22   20   25
scorpion      13   12   12   12   12
scagr25       19   17   16   17   18
sctap1        17   15   15   15   17
brandy        21   20   20   19   19
scsd1          9    8    8    8    8
israel        30   25   24   24   26
bandm         20   17   17   19   17
scfxm1        21   19   18   18   19
e226          25   21   20   20   21
agg           27   22   25   27   56
scrs8         24   21   21   21   22
beaconfd      11    8    7    7    7
scsd6         13   10   10   10   10
ship04s       15   12   13   13   15
agg2          23   21   24   23   26
agg3          22   19   21   21   23
scfxm2        22   18   19   19   20
ship04l       14   12   12   12   12
fffff800      36   38   38   38  100+
ship08s       16   13   13   13   13
sctap2        15   13   12   12   13
scfxm3        25   20   20   20   19
ship12s       18   16   16   17   19
scsd8         12   10    9    9    9
czprob        39   33   35   35   38
ship08l       18   14   14   14   14
ship12l       19   16   16   17   17
25fv47        32   27   26   25   27

Procedure GTSF ($x^k$, $p_x$, $s^k$, $p_s$, $f_x$, $f_s$)
Let
    $l_x = \arg\min_i \left\{\frac{x_i^k}{(p_x)_i} \mid (p_x)_i > 0\right\}$
and
    $l_s = \arg\min_i \left\{\frac{s_i^k}{(p_s)_i} \mid (p_s)_i > 0\right\}$.
Compute $\tilde{f}_x$ such that

(6.1)   $(x_{l_x}^k - \tilde{f}_x(p_x)_{l_x})(s_{l_x}^k - (p_s)_{l_x}) = \frac{(x^k - p_x)^T(s^k - p_s)}{n\,\gamma_a}$,
        $f_x := \max(\tilde{f}_x, \gamma_f)$,

and $\tilde{f}_s$ such that

(6.2)   $(s_{l_s}^k - \tilde{f}_s(p_s)_{l_s})(x_{l_s}^k - (p_x)_{l_s}) = \frac{(x^k - p_x)^T(s^k - p_s)}{n\,\gamma_a}$,
        $f_s := \max(\tilde{f}_s, \gamma_f)$.

EXHIBIT 6.1. Computation of step factor.
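Equations (6.1)-(6.2) are linear in the unknown step factors once the blocking indices are fixed, so GTSF costs only a few vector operations. The Python sketch below makes that explicit; it assumes a blocking component exists on each side and that the fixed trial factors at the blocking indices stay positive, guards a production code would have to add.

```python
import numpy as np

def gtsf(x, px, s, ps, gamma_f=0.9, gamma_a=10.0):
    """Procedure GTSF (Exhibit 6.1): adaptive step factors, a sketch."""
    n = x.size
    # Target blocking product: the pairwise product on the central
    # trajectory whose duality gap is (x - px)^T (s - ps) / gamma_a.
    target = (x - px) @ (s - ps) / (n * gamma_a)
    # Blocking indices: components first driven to zero along the moves.
    lx = int(np.argmin(np.where(px > 0, x / np.where(px > 0, px, 1.0), np.inf)))
    ls = int(np.argmin(np.where(ps > 0, s / np.where(ps > 0, ps, 1.0), np.inf)))
    # (6.1): (x_lx - f_x (px)_lx)(s_lx - (ps)_lx) = target, linear in f_x.
    ftx = (x[lx] - target / (s[lx] - ps[lx])) / px[lx]
    # (6.2): (s_ls - f_s (ps)_ls)(x_ls - (px)_ls) = target, linear in f_s.
    fts = (s[ls] - target / (x[ls] - px[ls])) / ps[ls]
    # gamma_f safeguards against very small or negative computed factors.
    return max(ftx, gamma_f), max(fts, gamma_f)
```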

The step factor in the primal space is chosen so that the product of the primal blocking variable ($l_x$) and the corresponding dual slack is nearly equal to the value their product would take at the point on the central trajectory at which the duality gap is equal to $(x^k - p_x)^T(s^k - p_s)/\gamma_a$. Note that if $\xi_x = 0$ and $\xi_s = 0$, then $(x^k - p_x)^T(s^k - p_s)$ is the duality gap at the point obtained after moving the full step. The parameter $\gamma_a > 1$ should be used. The parameter $0 < \gamma_f \le 1$ is used to safeguard against very small or negative steps. The explanation for the computation of the step factor for the dual variables is similar. In essence, the choice of $f_x$ and $f_s$ is guessing the minimizer of potential function (3.5) in directions $p_x$ and $p_s$ while implicitly assuming that $\xi_x = 0$ and $\xi_s = 0$.

Provided that the computations for the search directions were performed with sufficient accuracy, the computational experience on the tested problems shows that the number of iterations required to solve the problems is relatively insensitive to the choice of $\gamma_f$ and $\gamma_a$ over a large range. We experimented with values of $\gamma_f = .5, .75, .9, .99, .999$ and $\gamma_a = 1/(1 - \gamma_f)$. In our experience we found that on most problems the number of iterations was smaller for a larger choice of $\gamma_f$, but the difference was small for $\gamma_f$ in the range .75 to .999.
However, we observed a very interesting phenomenon. The implementation showed signs of instability for larger values of $\gamma_f$ for problems brandy, scfxm1, scfxm2, and scfxm3. For these problems, to obtain eight digits of accuracy in the solutions, at the last two iterations of the algorithm the conjugate gradient method was needed to improve the accuracy in the search direction. These problems were successfully solved to the desired accuracy for $\gamma_f = .5, .75$, and $.9$.

An examination of the problem data of brandy, scfxm1, scfxm2, and scfxm3 shows that for these problems the set of optimal primal solutions is unbounded. An examination of various stages of our implementation (with different choices of parameters) revealed that allowing for a larger step factor may result in premature convergence of dual slacks corresponding to primal variables unbounded in the optimal set. At a later iteration, this further causes the primal unbounded variables to become very large.

Hence, $x_i/s_i$ corresponding to these variables becomes disproportionately large. This results in cancellation of large numbers when computing the Cholesky factor, causing computations to become less stable.
The reader is referred to Mehrotra [23] for a more elaborate discussion of the
precision of computations in the context of interior point methods and to Mehrotra
[22] for a discussion of issues involved in developing implementations based on the
preconditioned conjugate gradient method, which we may want to use to improve the
numerical accuracy.
7. Initial point. In all of our implementations the initial point is generated as follows. We first compute

(7.1)   $\hat{\pi} = (AA^T)^{-1}Ac$;  $\hat{s} = c - A^T\hat{\pi}$;  $\hat{x} = A^T(AA^T)^{-1}b$,

and $\delta_x = \max(-1.5 \min_i \{\hat{x}_i\}, 0)$ and $\delta_s = \max(-1.5 \min_i \{\hat{s}_i\}, 0)$. We then obtain

(7.2)   $\hat{\delta}_x = \delta_x + .5\,\frac{(\hat{x} + \delta_xe)^T(\hat{s} + \delta_se)}{e^T(\hat{s} + \delta_se)}$,

(7.3)   $\hat{\delta}_s = \delta_s + .5\,\frac{(\hat{x} + \delta_xe)^T(\hat{s} + \delta_se)}{e^T(\hat{x} + \delta_xe)}$,

and generate $\pi^0 = \hat{\pi}$, $s_i^0 = \hat{s}_i + \hat{\delta}_s$, $i = 1, \dots, n$, and $x_i^0 = \hat{x}_i + \hat{\delta}_x$, $i = 1, \dots, n$, as an initial point.
We first discuss the validity of the above approach in generating $x^0 > 0$ and $s^0 > 0$. In the cases in which it fails to produce such a point, either the problem reduces to that of finding a feasible solution of (P) or (D), or an optimal solution is generated.

From the definition of $\delta_x$ and $\delta_s$ we know that $\hat{x} + \delta_xe \ge 0$ and $\hat{s} + \delta_se \ge 0$. A positive point is always generated if $\delta_x > 0$ and $\delta_s > 0$. Furthermore, to show that $x^0 > 0$ and $s^0 > 0$, it is sufficient to show that $\hat{\delta}_x > 0$ and $\hat{\delta}_s > 0$.

First consider the case when $\delta_x = 0$ and $\delta_s = 0$. Clearly, in this case $\hat{x}$ is a feasible solution for (P) and $\hat{\pi}$, $\hat{s}$ is a feasible solution for (D). If $\hat{x}^T\hat{s} = 0$, then these solutions are optimal for the respective problems. Otherwise, $\hat{x}^T\hat{s} > 0$ and hence $\hat{\delta}_x > 0$ and $\hat{\delta}_s > 0$.

Now consider the case when $\delta_x = 0$ and $\delta_s > 0$. In this case if $\hat{x}_i \ne 0$ for some $i$, then obviously $\hat{\delta}_x > 0$ and $\hat{\delta}_s > 0$. On the other hand, if $\hat{x}_i = 0$ for all $i$, then $b = 0$ and the problem reduces to that of finding a feasible solution of (D). This problem can then be solved separately or by generating a perturbed problem for which the right-hand side is $\delta Ae$ for any positive $\delta$. $\delta e$ can be used as a feasible interior solution of the perturbed problem, and (7.2)-(7.3) can be used to generate a feasible point of the perturbed problem. Finally, the case when $\delta_x > 0$ and $\delta_s = 0$ can be argued in a similar manner.
We now discuss some properties of the proposed approach.
7.1. Desirable properties of the proposed approach.
Shift in origin. The approach is independent of a shift in origin. To explain what we mean by this, consider the dual problem

(D(Δπ))   maximize $b^T(\pi + \Delta\pi)$
          s.t. $A^T(\pi + \Delta\pi) + s = c$,
               $s \ge 0$,

for any fixed choice of $\Delta\pi$. Clearly, the polytope defined by the constraints in (D(Δπ)) is the same as the polytope defined by the constraints in (D) except for a shift of origin. It is desirable that an initial point be the same in relation to the respective polytopes. Note that $\hat{s}$ in (7.1) is independent of the choice of $\Delta\pi$.

To demonstrate how similar arguments hold for (P), we consider an equivalent formulation of this problem. Let Z be a matrix whose columns form the basis for the null space of A, and let $x_0$ be any point satisfying $Ax_0 = b$. It is easy to see that (P) is equivalent to

(P(Δy))   minimize $(c^TZ)(y + \Delta y)$
          s.t. $Z(y + \Delta y) \ge -x_0$,

for $y \in \mathbb{R}^{n-m}$ and any (fixed) choice of $\Delta y$. An approach analogous to that of finding $\hat{s}$ computes

$\hat{y} = -(Z^TZ)^{-1}Z^Tx_0$.

The slacks in the constraints of (P(Δy)) are then given by

$x_0 - Z(Z^TZ)^{-1}Z^Tx_0 = [I - Z(Z^TZ)^{-1}Z^T]x_0 = A^T(AA^T)^{-1}Ax_0 = A^T(AA^T)^{-1}b = \hat{x}$.

Note that $\hat{x}$ is the orthogonal projection of any vector satisfying $Ax = b$ onto the range space of $A^T$. Since $\hat{x}$ and $\hat{s}$ are independent of $\Delta y$ and $\Delta\pi$, respectively, it is obvious that $x^0$ and $s^0$ are also independent of these.
Simple scaling. The initial point is not affected if all the constraints in (P) are
scaled by a constant or if c is scaled.
7.2. Undesirable properties of the proposed approach.
Column scaling. The initial point is affected by the scaling of columns of A. To illustrate this, in Table 7.1 we give the number of iterations required to solve the problems after they were scaled by using subroutine MSSCAL from MINOS. All other details of the implementation were kept the same as those for the results in Table 8.2. The results in Table 7.1 indicate that scaling of columns may affect the performance of the implementation. On most problems, use of MSSCAL improved the number of iterations required to solve the problem, or the number of iterations did not change significantly. The notable exception was problem fffff800. For this problem, scaling increased the number of iterations by about 40 percent.
We point out that (P) and (D) can be solved in one iteration if we know the
correct scaling of columns of A. Hence, the problem of finding the best scaling of
columns appears to be as difficult as solving the linear programming problem itself.
Presence of redundant constraints. The initial point generated by using this approach is affected by the presence of redundant constraints. This can be a problem, for example, if redundant dual constraints with large slacks (primal variables with huge costs) are present. In this regard we point out that the central trajectory itself is affected by the presence of such constraints.
A possible way to alleviate this problem would be to ask the user to give relative
importance to various primal variables (dual constraints) in solving the problem. A
clearly redundant variable could be assigned zero weight, and therefore it could be
removed while generating an initial point. In general, the possibility of solving weighted
least squares problems to generate initial points in the context of "warm start" should
be explored further.
All the approaches [1], [5], [18], [26] used for practical implementations that are
reported in the literature are dependent on column scaling and the presence of
redundant constraints. Furthermore, they lack one of the desirable properties that the
proposed approach has.

TABLE 7.1
Effect of scaling on the performance of the algorithm.

Problem      No scaling   Scaling
afiro             7           6
adlittle         10          10
scagr7           13          14
stochfor1        16           8
sc205            11          11
share2b          12          14
share1b          22          24
scorpion         12          12
scagr25          16          17
sctap1           15          15
brandy           20          17
scsd1             8           7
israel           24          17
bandm            17          15
scfxm1           18          17
e226             20          16
agg              25          16
scrs8            21          18
beaconfd          7           8
scsd6            10          10
ship04s          13          12
agg2             24          20
agg3             21          19
scfxm2           19          20
ship04l          12          12
fffff800         38          52
ship08s          13          14
sctap2           12          12
scfxm3           20          20
ship12s          17          14
scsd8             9           9
czprob           35          30
ship08l          14          15
ship12l          16          15
25fv47           26          23

The choice of the constants "1.5" and ".5" in computing $\delta_x$, $\delta_s$, $\hat{\delta}_x$, and $\hat{\delta}_s$ is arbitrary. The arguments about the validity of the approach (and its properties) do not change if we replace 1.5 with any constant larger than one and .5 with any constant larger than zero. We may do a one-dimensional line search (possibly in direction e) on the potential function (3.5) to generate these constants.

8. Computational performance. The basic ideas presented in this paper were implemented in a FORTRAN code. This section discusses our computational experience with this implementation and compares it with the results documented in the literature.

All testing was performed on a SUN 4/110 workstation. The code was compiled with the SUN Fortran version 1.0 compiler option "-O3." All the cpu times were obtained by using the utility etime. The test problems were obtained from netlib [7]. The test set includes small and medium size problems. The problem names and additional information on these problems are given in Table 8.1.
All problems were cleaned by using the procedure outlined in Mehrotra [22]. An in-house implementation of the minimum degree heuristic was used to permute the rows. A complete Cholesky factor was computed at each iteration to solve the linear equations. The procedure and the associated data structure, which we used to compute the Cholesky factor, were described in Mehrotra [23]. All the linear algebra subroutines were written by the author.

The algorithm was terminated when the relative duality gap satisfied

(8.1)   $\frac{c^Tx - b^T\pi}{1 + |b^T\pi|} \le \varepsilon_{exit}$.

In the actual implementation $\xi_x$ and $\xi_s$ were computed afresh at each iteration. However, $(\xi_s)_i$ was set to zero and absorbed in $s_i$ if it satisfied $|(\xi_s)_i|/s_i < .001$. This occasionally saved some computational effort.
The following parameters were set to obtain the results reported in Table 8.2:
$\varepsilon_{exit} = 10^{-8}$ (in (8.1)),
$\nu = 3$ (in Heuristic CENPAR),
$\gamma_f = .9$, $\gamma_a = 10$ (in Procedure GTSF),
$\kappa_x = 100 \max_i \{s_i^0\}$ (in (3.5)),
$\kappa_s = 100 \max_i \{x_i^0\}$ (in (3.5)).
The number of iterations required to solve the problems is given in the second column of Table 8.2. The primal objective value recorded at termination is given in column 3. The relative duality gap (8.1) is given in column 4. The primal infeasibility,

(8.2)   $\|Ax - b\|/(1 + \|x\|)$,

and the dual infeasibility,

(8.3)   $\|A^T\pi + s - c\|/(1 + \|s\|)$,

recorded at termination are given in columns 5 and 6, respectively. This information on the relative duality gap and primal and dual feasibility is the same as that given in Lustig, Marsten, and Shanno [18]. We use the same stopping criterion and provide similar information in order to be consistent while making comparisons.
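For completeness, the three reported measures (8.1)-(8.3) transcribe one-to-one into Python/NumPy; the function name is illustrative.

```python
import numpy as np

def termination_measures(A, b, c, x, pi, s):
    """Relative duality gap (8.1), primal (8.2) and dual (8.3) infeasibility."""
    rel_gap = (c @ x - b @ pi) / (1.0 + abs(b @ pi))
    prim_inf = np.linalg.norm(A @ x - b) / (1.0 + np.linalg.norm(x))
    dual_inf = np.linalg.norm(A.T @ pi + s - c) / (1.0 + np.linalg.norm(s))
    return rel_gap, prim_inf, dual_inf
```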
All the problems were accurately solved to eight digits. A comparison with the results in Lustig, Marsten, and Shanno [18] shows that on many problems in our implementation the accuracy in the objective value at termination was better. This is primarily due to our approach for computing the step factor.
In Table 8.3 we compare the number of iterations required to solve the test problems. Column 2 of this table gives the number of iterations taken by our implementation. Column 3 gives the number of iterations taken by the dual affine scaling method, as reported by Adler et al. [1]. Column 4 gives the number of iterations required by the second-order dual affine scaling method in Adler et al. [1]. Column 5 gives the number of iterations reported in Lustig, Marsten, and Shanno [18] for a primal-dual method. Column 6 gives the number of iterations reported in Gill, Murray, and Saunders [8] for a logarithmic barrier function method. Column 7 gives the number of iterations reported in Domich et al. [5] for a variant of the method of centers.
TABLE 8.1
Problem statistics.

Problem      Rows   Columns   Nonzeros
afiro          28        32         88
adlittle       57        97        465
scagr7        130       140        553
stochfor1     118       111        474
sc205         206       203        552
share2b        97        79        730
share1b       118       225      1,182
scorpion      389       358      1,708
scagr25       472       500      2,029
sctap1        301       480      2,052
brandy        221       249      2,150
scsd1          78       760      3,148
israel        175       142      2,358
bandm         306       472      2,659
scfxm1        331       457      2,612
e226          224       282      2,767
agg           489       163      2,541
scrs8         491     1,169      4,029
beaconfd      174       262      3,476
scsd6         148     1,350      5,666
ship04s       403     1,458      5,810
agg2          517       302      4,515
agg3          517       302      4,531
scfxm2        661       914      5,229
ship04l       403     2,118      8,450
fffff800      525       854      6,235
ship08s       779     2,387      9,501
sctap2      1,091     1,880      8,124
scfxm3        991     1,371      7,846
ship12s     1,152     2,763     10,941
scsd8         398     2,750     11,334
czprob        930     3,523     14,173
ship08l       779     4,283     17,085
ship12l     1,152     5,427     21,597
25fv47        822     1,571     11,127

TABLE 8.3
Comparison of number of iterations with other implementations.

Problem     AIPM  AKRV2  AKRV1  LMS  GMS  DBDW
afiro          7     15     20   13   20    10
adlittle      10     18     24   17   18    15
scagr7        13     19     24   22   24    18
stochfor1     16     19
sc205         11     20     28   16   32    17
share2b       12     21     29   17   46    14
share1b       22     33     38   40   35    28
scorpion      12     19     24   18   33    17
scagr25       16     21     29   24   28    21
sctap1        15     23     33   22   53    21
brandy        20     24     38   27   41    21
scsd1          8     16     19   12   13     8
israel        24     29     37   47   36    24
bandm         17     24     30   28   31    21
scfxm1        18     30     33   31   37    23
e226          20     30     34   31   38    24
agg           25     32
scrs8         21     29     39   50   59    25
beaconfd       7     17     23   21   34    17
scsd6         10     18     22   15   15    13
ship04s       13     22     30   21   40    15
agg2          24     32
agg3          21     32
scfxm2        19     29     39   37   42    29
ship04l       12     21     28   22   36    17
fffff800      38     59     55
ship08s       13     21     32   23   34    16
sctap2        12     25     34   23   41    16
scfxm3        20     30     40   39   42    31
ship12s       16     23     35   27   46    16
scsd8          9     18     23   15   15    13
czprob        35     35     52   57   56    41
ship08l       14     23     31   24   22    17
ship12l       16     23     32   27   24    17
25fv47        24     52     48   44   28

Our method and the second-order dual affine scaling method in Adler et al. [1] compute two directions at each iteration. The method implemented in Domich et al. [5] computes three directions at each iteration and solves a linear programming problem defined by

using these directions. All other implementations compute only one direction at each
iteration.
On the average the number of iterations required by our implementation to solve the mutually tested problems is 40 percent less than that reported in Lustig, Marsten, and Shanno [18]; it is 50 percent less than that reported in Adler et al. [1]; and it is 55 percent less than that reported by Gill, Murray, and Saunders [8]. Compared to the results in Adler et al. [1] for their second-order method, the results show that our method takes about 35 percent fewer iterations. Finally, our implementation required 20 percent fewer iterations than those required in Domich et al. [5].

It is useful to point out that it is possible to further reduce the total number of iterations needed to solve the problems by using higher-order derivatives. This is discussed in a subsequent paper.

The number of iterations was always fewer than the number of iterations required by the second-order dual affine scaling method implemented by Adler et al. [1]. Many of the problems were solved in practically half the number of iterations when compared with [1].
9. Comparison with OB1 and MINOS 5.3. This section compares the cpu times required by our implementation to those required by the implementation of the primal-dual method in OB1 [18] (02/90 version) and the simplex method in MINOS 5.3 [29]. The source codes (also written in FORTRAN) of OB1 (02/90 version) and MINOS 5.3 were compiled using compiler option "-O3." Hence, everything was identical while making these comparisons. All the default options of OB1 (02/90 version) and MINOS 5.3 were used. Printing was turned to the minimum level in both cases. In the case of OB1 (02/90 version), crush 2 was used for all problems.

The times for MINOS 5.3 are those for subroutine M5SOLV only. The TIMER subroutine in OB1 was used to compute its cpu times. The times for OB1 were calculated as follows:

OB1 Time = (end of hprep - after mpsink) + (end of obdriv - after getcmo).
Times required by MINOS 5.3, OB1 (02/90 version), and our implementation do not include the time spent in converting the MPS input file into a problem in the standard form. The times required by our implementation include all the time spent after the input files were converted into a problem in the standard form.

The times required by our implementation are given in the second column of Table 9.1. The times required by OB1 (02/90 version) are given in column 3 and the times required by MINOS are given in column 4. Column 5 gives the ratio of times required by OB1 (02/90 version) to our code. Column 6 gives the ratio of times required by MINOS 5.3 to our code.
From these results we find that, on the average, our implementation in the current
state performs two times better than the implementation in OB1 (02/90 version). The
ratio of cpu times with OB1 (02/90 version) is more or less uniform.
Comparing the results with MINOS 5.3 we find that the proposed implementation
is on the average better by a factor of 2.5. In this case, however, the ratio of cpu times
varies significantly. MINOS 5.3 was generally superior on problems with few relatively
dense columns, whereas our implementation of the primal-dual method was superior
on problems with sparse Cholesky factor.
10. Conclusions. Details of a particular implementation of the primal-dual method are given. This implementation requires a considerably smaller number of iterations and saves considerable computational effort.

TABLE 9.1
Comparison of cpu time with OB1 and MINOS 5.3 on SUN 4/110.

Problem        AIPM      OB1   MINOS 5.3   OB1/AIPM   MINOS 5.3/AIPM
afiro           .12      .60       .09        5.0          .7
adlittle        .64     1.81       .70        2.8         1.1
scagr7         1.11     2.75      1.66        2.5         1.5
stochfor1      1.48     2.91      1.50        2.0         1.0
sc205          1.49     3.33      2.16        2.2         1.4
share2b        1.50     3.00      1.46        2.0         1.0
share1b        3.27     9.21      3.98        2.8         1.2
scorpion       2.87     7.06      5.92        2.4         2.0
scagr25        5.23    10.43     15.32        2.0         2.9
sctap1         4.92     9.18      7.71        1.8         1.6
brandy         7.18    15.30     11.55        2.1         1.6
scsd1          2.46     5.46      6.15        2.2         2.5
israel        58.37   127.01      6.11        2.2          .1
bandm          8.01    17.61     22.09        2.2         2.7
scfxm1        10.55    20.82     12.76        2.0         1.2
e226           9.38    15.83     15.30        1.7         1.6
agg           32.88    47.46      7.32        1.4          .2
scrs8         13.31    43.95     40.86        3.3         3.1
beaconfd       2.56     9.28      1.97        3.6          .8
scsd6          5.66    10.56     31.19        1.9         5.5
ship04s        6.92    17.58      6.63        2.5          .95
agg2          65.86   100.92     10.05        1.5          .15
agg3          66.66    94.74     10.95        1.4          .16
scfxm2        21.69    46.74     51.42        2.1         2.37
ship04l        8.90    24.35     13.20        2.7          .54
fffff800      80.90   140.01     14.33        1.7          .17
ship08s        9.50    23.05     20.43        2.4         2.1
sctap2        30.04    47.75     56.02        1.6         1.9
scfxm3        33.31    72.84    107.40        2.1         3.2
ship12s       14.06    31.78     47.85        2.2         3.4
scsd8         10.38    21.98    230.23        2.1        22.2
czprob        33.78    81.93    166.03        2.4         4.9
ship08l       18.12    42.81     19.34        2.4         1.0
ship12l       28.56    63.11    120.31        2.2
25fv47       164.53   334.14    941.41        2.0

Total        766.20  1,507.29  2,011.4



We have given expressions to compute all the derivatives at a given point of a primal-dual affine scaling trajectory. The implementation described here effectively combines the second derivative with the centering vector. Heuristics for computing the centering parameter and step length were given and their effectiveness was demonstrated. A new approach to generating a starting point was used. In addition, the results demonstrate that it is possible to develop fast (robust) implementations of interior point methods which ensure sufficient reduction in a potential function at each iteration.

Comparison with OB1 (02/90 version) and the simplex method show that our implementation was faster by factors of 2 and 2.5, respectively.
Acknowledgments. I thank Professor Michael Saunders for making the source code
of MINOS 5.3 available. I thank Professor Roy E. Marsten for releasing a copy of
OB1 for comparison in this paper. Also, I thank Mr. I. C. Choi for helping out with
OB1 and Professor Donald Goldfarb for his constant encouragement, which made this
work possible. I also thank the two anonymous referees for their careful reading of
this paper and for their suggestions for improvement.
Appendix. Here we prove that the potential function (3.3) can be reduced by a constant amount at each iteration. The development of our proof is based on the analysis in Freund [6]. Let us define $\bar{l} \equiv 2n + 1$,

$\bar{A} \equiv \begin{pmatrix} A & 0 & -\xi_x^0 \\ 0 & Z^T & -Z^T\xi_s^0 \\ c^T & b^T(AA^T)^{-1}A & \xi_g^0 \end{pmatrix}$,  $\bar{b} \equiv (b^T, c^TZ, b^T(AA^T)^{-1}Ac)^T$,

and $y \equiv (x^T, s^T, \lambda)^T$. Hence, without loss of generality, consider the problem

(PD)   minimize $y_{\bar{l}}$
       s.t. $\bar{A}y = \bar{b}$,
            $y \ge 0$,

where $y \in \mathbb{R}^{\bar{l}}$. Let us consider the potential function

(A.1)   $F(y) \equiv \bar{\rho} \ln y_{\bar{l}} - \sum_{i=1}^{\bar{l}} \ln y_i$,

where $\bar{\rho} = \bar{l} + \sqrt{\bar{l}}$. The function F(y) in (A.1) is the same as the function (3.3). Let $y^k > 0$ be any feasible point of (PD) and let $Y^k$ be the diagonal matrix whose diagonal elements are $(y_1^k, \dots, y_{\bar{l}}^k)$. Let

$d \equiv [I - \tilde{A}^T(\tilde{A}\tilde{A}^T)^{-1}\tilde{A}]\left(\frac{\bar{\rho}}{y_{\bar{l}}^k}Y^ke_{\bar{l}} - e\right)$,

where $\tilde{A} \equiv \bar{A}Y^k$. Let $y^{k+1} = y^k - (\epsilon/\|d\|)Y^kd$, $\epsilon < 1$. Then

$F(y^{k+1}) - F(y^k) = \bar{\rho} \ln \frac{y_{\bar{l}}^{k+1}}{y_{\bar{l}}^k} - \sum_{i=1}^{\bar{l}} \ln \frac{y_i^{k+1}}{y_i^k} \le -\epsilon\|d\| + \frac{\epsilon^2}{2(1-\epsilon)}$.

The inequality above follows by using the facts that $\ln(1+\delta) \le \delta$ for $\delta > -1$, and $\ln(1+\delta) \ge \delta - \delta^2/(2(1-\epsilon))$ if $|\delta| \le \epsilon < 1$.

If $\|d\| \ge 1$, then for $\epsilon = .5$, F(y) is reduced by .25.

Otherwise, if $\|d\| < 1$, then from the definition of d we have

$\tilde{A}^T\bar{\pi} + (d + e) = \frac{\bar{\rho}}{y_{\bar{l}}^k}Y^ke_{\bar{l}}$

for some $\bar{\pi}$, which gives a feasible solution to the dual of (PD). Furthermore, the duality gap is given by

$\frac{y_{\bar{l}}^k}{\bar{\rho}}e^T(d + e) \le \frac{y_{\bar{l}}^k}{\bar{\rho}}\left(\sqrt{\bar{l}}\,\|d\| + \bar{l}\right) < y_{\bar{l}}^k$.

But $y_{\bar{l}}^k$ is also the current objective value. Therefore, the optimal objective value of (PD) must be positive. If this is the case, then either (P) or (D) has no feasible solution, and we would stop.

Hence, if (P) and (D) have a feasible solution, then F(y) can be reduced by .25 at each iteration. A failure to reduce F(y) by this amount would imply that either (P) or (D) has no feasible solution.

REFERENCES

[1] I. ADLER, N. KARMARKAR, M. G. C. RESENDE, AND G. VEIGA (1989), An implementation of Karmarkar's algorithm for linear programming, Math. Programming, 44, pp. 297-336.
[2] D. A. BAYER AND J. C. LAGARIAS (1989), The nonlinear geometry of linear programming: I. Affine and projective scaling trajectories, II. Legendre transform coordinates and central trajectories, Trans. Amer. Math. Soc., pp. 499-581.
[3] Y. C. CHENG, D. J. HOUCK, J. M. LIU, M. S. MEKETON, L. SLUTSMAN, R. J. VANDERBEI, AND P. WANG (1989), The AT&T KORBX System, AT&T Tech. J., 68, pp. 7-19.
[4] I. C. CHOI, C. MONMA, AND D. F. SHANNO (1990), Further development of a primal-dual interior point method, ORSA J. Comput., 2, pp. 304-311.
[5] P. D. DOMICH, P. T. BOGGS, J. R. DONALDSON, AND C. WITZGALL (1989), Optimal 3-dimensional methods for linear programming, NISTIR 89-4225, Center for Computing and Applied Mathematics, U.S. Dept. of Commerce, National Institute of Standards and Technology, Gaithersburg, MD.
[6] R. FREUND (1988), Polynomial-time algorithms for linear programming based only on primal scaling and projected gradient of a potential function, Working Paper OR 182-88, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA.
[7] D. M. GAY (1985), Electronic mail distribution of linear programming test problems, Mathematical Programming Society Committee on Algorithms Newsletter, 13, pp. 10-12.
[8] P. E. GILL, W. MURRAY, AND M. A. SAUNDERS (1988), A single-phase dual barrier method for linear programming, Report SOL 88-10, Systems Optimization Laboratory, Stanford Univ., Stanford, CA.
[9] D. GOLDFARB AND S. MEHROTRA (1988), A relaxed version of Karmarkar's algorithm, Math. Programming, 40, pp. 285-315.
[10] D. GOLDFARB AND S. MEHROTRA (1989), A self-correcting version of Karmarkar's algorithm, SIAM J. Numer. Anal., 26, pp. 1006-1015.
[11] C. GONZAGA AND M. J. TODD (1989), An O(√nL)-iteration large-step primal-dual affine algorithm for linear programming, Tech. Report, School of Operations Research and Industrial Engineering, Cornell Univ., Ithaca, NY.
[12] D. DEN HERTOG AND C. ROOS (1989), A survey of search directions in interior point methods for linear programming, Report 89-65, Faculty of Mathematics and Informatics/Computer Science, Delft Univ. of Technology, Delft, Holland.
[13] N. KARMARKAR (1984), A new polynomial time algorithm for linear programming, Combinatorica, 4, pp. 373-395.
[14] N. KARMARKAR, J. C. LAGARIAS, L. SLUTSMAN, AND P. WANG (1989), Power series variants of Karmarkar-type algorithms, AT&T Tech. J., May/June, pp. 20-36.
[15] M. KOJIMA, S. MIZUNO, AND A. YOSHISE (1989), A primal-dual interior point algorithm for linear programming, in Progress in Mathematical Programming, Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, New York, pp. 29-47.
[16] M. KOJIMA, S. MIZUNO, AND A. YOSHISE (1991), An O(√nL)-iteration potential reduction algorithm for linear complementarity problems, Math. Programming, 50, pp. 331-342.
[17] I. J. LUSTIG (1991), Feasibility issues in a primal-dual interior point method for linear programming, Math. Programming, 49, pp. 145-162.
[18] I. J. LUSTIG, R. E. MARSTEN, AND D. F. SHANNO (1989), Computational experience with a primal-dual interior point method for linear programming, TR J-89-11, Industrial and Systems Engineering Report Series, Georgia Inst. of Technology, Atlanta, GA.
[19] R. E. MARSTEN, M. J. SALTZMAN, D. F. SHANNO, G. S. PIERCE, AND J. F. BALLINTIJN (1989), Implementation of a dual affine interior point algorithm for linear programming, ORSA J. Comput., 1, pp. 287-297.
[20] K. A. MCSHANE, C. L. MONMA, AND D. SHANNO (1989), An implementation of a primal-dual interior point method for linear programming, ORSA J. Comput., 1, pp. 70-83.
[21] N. MEGIDDO (1986), Pathways to the optimal set in linear programming, in Progress in Mathematical Programming, N. Megiddo, ed., Springer-Verlag, New York, pp. 131-158.
[22] S. MEHROTRA (1992), Implementations of affine scaling methods: Approximate solutions of systems of linear equations using preconditioned conjugate gradient methods, ORSA J. Comput., 4, pp. 103-118.
[23] S. MEHROTRA (1989), Implementations of affine scaling methods: Towards faster implementations with complete Cholesky factor in use, TR-89-15R, Dept. of Industrial Engineering/Management Science, Northwestern Univ., Evanston, IL.
[24] S. MEHROTRA (1991), On finding a vertex solution using interior point methods, Linear Algebra Appl., 152, pp. 233-253.
[25] S. MEHROTRA (1990), On an implementation of primal-dual predictor-corrector algorithms, presented at the Second Asilomar Workshop on Progress in Mathematical Programming, Asilomar, CA.
[26] C. L. MONMA AND A. J. MORTON (1987), Computational experience with a dual affine variant of Karmarkar's method for linear programming, Oper. Res. Lett., 6, pp. 261-267.
[27] R. C. MONTEIRO AND I. ADLER (1989), An O(n³L) primal-dual interior point algorithm for linear programming, Math. Programming, 44, pp. 43-66.
[28] R. C. MONTEIRO, I. ADLER, AND M. G. C. RESENDE (1990), A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension, Math. Oper. Res., 15, pp. 191-214.
[29] B. A. MURTAGH AND M. A. SAUNDERS (1983), MINOS 5.0 user's guide, Tech. Rep. SOL 83-20, Dept. of Operations Research, Stanford Univ., Stanford, CA.
[30] M. J. TODD (1989), The affine-scaling direction for linear programming is a limit of projective-scaling directions, Tech. Rep., School of Operations Research and Industrial Engineering, Cornell Univ., Ithaca, NY.
[31] M. J. TODD AND Y. YE (1990), A centered projective algorithm for linear programming, Math. Oper. Res., 15, pp. 508-529.
[32] Y. YE (1991), An O(n³L) potential reduction algorithm for linear programming, Math. Programming, 50, pp. 239-258.
